How to Become a Web Scraping Pro with Python pt. 1

Scrapy is an excellent Python library for web scraping. For example, you could use it to build an API whose data is populated by scraping web pages. This article covers some basic Scrapy features: the shell and selectors.

Install Scrapy in a virtual environment on your machine:

$ virtualenv venv
$ source venv/bin/activate
$ pip install scrapy

To learn about scrapy, the shell is a good place to start, because it offers an interactive environment where you can try selectors on a concrete web page. Here is how to start the scrapy shell:

$ scrapy shell http://doc.scrapy.org/en/latest/topics/selectors.html

Selectors

Now, try out different selections.

You can select elements on a page with CSS and XPath, and these selectors can be chained together. For example, use css to select all a tags, then xpath to select the href attribute of those tags:

>>> for link in response.css('a').xpath('@href').extract():
...     print(link)

Documentation

Now you are ready to head over to the documentation to read more about how to become great at using Scrapy. Another tip is to follow the Scrapinghub blog.
