Scrapy is an excellent Python library for web scraping. For example, you could create an API with data that is populated via web scraping. This article covers some basic scrapy features, such as the shell and selectors.
Install scrapy in a virtual environment on your machine:
$ virtualenv venv
$ source venv/bin/activate
$ pip install scrapy
To learn about scrapy, the shell is a good place to start, because it offers an interactive environment where you can try selectors on a concrete web page. Here is how to start the scrapy shell:
$ scrapy shell http://doc.scrapy.org/en/latest/topics/selectors.html
Now, try out different selections.
You can select elements on a page with CSS and XPath; these selectors can be chained together. For example, use css() to select all a tags and xpath() to select the href attribute of those tags:
>>> for link in response.css('a').xpath('@href').extract():
...     print(link)