code

August 24, 2015

To crawl or not to crawl, that is the question

To write an efficient crawler, you must be smart about the content you download. When your crawler downloads an HTML page, it uses bandwidth, memory, and CPU, not only its […]
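As a rough illustration of being selective about what to download, here is a minimal sketch of a pre-download filter. It is not taken from the post: the function name `should_download`, the `SKIP_EXTENSIONS` set, and the rule of accepting only HTML media types are all assumptions made for this example.

```python
import posixpath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical policy: file extensions assumed not worth downloading.
SKIP_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".pdf", ".zip", ".mp4", ".css", ".js"}


def should_download(url: str, content_type: Optional[str] = None) -> bool:
    """Return True if the URL looks like an HTML page worth fetching."""
    parsed = urlparse(url)
    # Only plain web URLs are candidates (skips mailto:, javascript:, etc.).
    if parsed.scheme not in ("http", "https"):
        return False
    # Cheap check first: the file extension in the path, if there is one.
    ext = posixpath.splitext(parsed.path)[1].lower()
    if ext in SKIP_EXTENSIONS:
        return False
    # If a Content-Type value is already known (for example from a HEAD
    # request), accept only HTML media types and ignore parameters
    # such as "; charset=utf-8".
    if content_type is not None:
        media_type = content_type.split(";", 1)[0].strip().lower()
        return media_type in ("text/html", "application/xhtml+xml")
    # Nothing else is known, so assume the URL may be an HTML page.
    return True
```

Checking the extension costs nothing, while obtaining the Content-Type needs a round trip to the server, so the sketch applies the cheap test first and treats the header as optional.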

August 12, 2015

Tiny basic multi-threaded web crawler in Python

If you need a simple web crawler that will scour the web for a while and download content from random sites, this code is for you. Usage:

$ python tinyDirtyIffyGoodEnoughWebCrawler.py https://cnn.com

Where https://cnn.com is your seed site. It could […]
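The script itself is not reproduced in this excerpt, so the following is only a sketch of how a small multi-threaded crawler of this kind might be structured, not the author's code. The names `crawl`, `fetch`, `LINK_RE`, `max_pages`, and `num_threads` are hypothetical, and the regular-expression link extraction is a deliberate simplification of real HTML parsing.

```python
import queue
import re
import threading
from urllib.parse import urljoin
from urllib.request import urlopen

# Simplified link extraction (assumption): grab the value of every href attribute.
LINK_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def fetch(url):
    """Download a page and return its body as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seed, fetch_page=fetch, max_pages=20, num_threads=4):
    """Crawl outward from the seed URL and return the set of URLs queued."""
    seen = {seed}            # URLs already queued, shared by all workers
    lock = threading.Lock()  # protects `seen`
    todo = queue.Queue()
    todo.put(seed)

    def worker():
        while True:
            url = todo.get()
            if url is None:  # sentinel: no more work
                todo.task_done()
                break
            try:
                html = fetch_page(url)
                for href in LINK_RE.findall(html):
                    link = urljoin(url, href)  # resolve relative links
                    with lock:
                        if (link.startswith("http") and link not in seen
                                and len(seen) < max_pages):
                            seen.add(link)
                            todo.put(link)
            except Exception:
                pass  # a failed download should not stop the crawl
            finally:
                todo.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_threads)]
    for t in threads:
        t.start()
    todo.join()              # wait until every queued URL has been processed
    for _ in threads:
        todo.put(None)       # then tell each worker to exit
    for t in threads:
        t.join()
    return seen
```

Under these assumptions, `crawl("https://cnn.com")` would start from the seed site and stop once `max_pages` URLs have been queued. The `fetch_page` parameter exists so the downloader can be swapped out, for example with a stub when experimenting offline.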