Scrapy is an excellent Python framework for web scraping. For example, you could build an API whose data is populated by scraping web pages. This article covers some basic Scrapy features, such as the shell and selectors.
Install Scrapy in a virtual environment on your machine:
$ virtualenv venv
$ source venv/bin/activate
$ pip install scrapy
The shell is a good place to start learning Scrapy because it offers an interactive environment where you can try selectors on a real web page. Start the Scrapy shell by passing it the URL of the page you want to explore:
$ scrapy shell http://doc.scrapy.org/en/latest/topics/selectors.html
Selectors
Once the page has loaded, the shell makes it available as response, and you can try out different selections on it.
You can select elements on a page with CSS and XPath, and the two kinds of selectors can be chained. For example, use CSS to select the a tags and XPath to select the href attribute of those tags:
>>> for link in response.css('a').xpath('@href').extract():
...     print(link)
Documentation
Now you are ready to head over to the Scrapy documentation to learn more about how to get the most out of Scrapy. Another tip is to follow the Scrapinghub blog (Scrapinghub has since been renamed Zyte).