How to scrape images from the web

I'm interested in object detection and other computer vision tasks. For example, I'm working on a teddy-bear detector with my son.

So, how do you quickly download images for a certain category? Here is an approach I learned from a course on Udemy, using the icrawler package:

# pip install icrawler
from icrawler.builtin import GoogleImageCrawler

keywords = ['cat', 'dog']
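for keyword in keywords:
    # Store each category in its own folder, e.g. images/cat
    google_crawler = GoogleImageCrawler(
        parser_threads=2,
        downloader_threads=4,
        storage={'root_dir': 'images/{}'.format(keyword)}
    )
    # Download at most 10 images of at least 200 x 200 pixels
    google_crawler.crawl(
        keyword=keyword, max_num=10, min_size=(200, 200))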

In the above example, the crawler finds images in two categories, cats and dogs, as if you had searched for 'cat' and 'dog' on Google Images and downloaded what you found.

Let's walk through the parameters used in the code. First, there is the constructor, which is called with three arguments in the example. The most important one is storage, whose root_dir entry specifies the directory where the images are saved; here each keyword gets its own subfolder under images/. The parser_threads and downloader_threads arguments set how many threads are used for parsing the search results and for downloading the files. Second, we have the call to the crawl method. Here, max_num specifies that at most 10 images per category should be downloaded, and min_size specifies that each image must be at least 200 x 200 pixels.
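
Since max_num is only an upper limit, it can be useful to check how many images actually ended up in each category folder after a crawl. The following is a minimal sketch of such a check, not part of icrawler: the helper name count_images and the list of file extensions are my own assumptions, and it only assumes the images/<keyword> folder layout from the example.

```python
from pathlib import Path

# File extensions treated as images; this set is an assumption, adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}


def count_images(root_dir="images"):
    """Return {category: number of image files} for each subfolder of root_dir."""
    root = Path(root_dir)
    if not root.is_dir():
        # Nothing has been downloaded yet (or the path is wrong).
        return {}
    counts = {}
    for category_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[category_dir.name] = sum(
            1
            for f in category_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


if __name__ == "__main__":
    for category, n in count_images("images").items():
        print(f"{category}: {n} images")
```

If a category comes back with fewer images than expected, raising max_num or lowering min_size in the crawl call is the first thing to try.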

That's it. Happy downloading.
