Webhose.io Archive Access is now LIVE

Posted on February 8, 2016 by Webhose

By popular demand, we are excited to grant access to Webhose.io’s historical data archive. This is the first time that anyone can programmatically access a huge index of the internet for analytical purposes. We like to keep things simple here, so accessing the archive is as simple as one, two, three (and possibly...

Posted in API

The Top 10 Data & Analytics Articles of 2015

Posted on January 12, 2016 by Webhose

The online world of data and analytics is fast approaching epic proportions. It’s easy to get overwhelmed. Why? Because not only has big data been big business in 2015, but posts, articles, podcasts, webinars, and resources abound. Some are worth your time. Some … are not. To help you dig through the very best...

Posted in Big Data

Tiny basic multi-threaded web crawler in Python

Posted on August 12, 2015 by Ran Geva

If you need a simple web crawler that will scour the web for a while to download content from random sites, this code is for you.

Usage:

    $ python tinyDirtyIffyGoodEnoughWebCrawler.py https://cnn.com

where https://cnn.com is your seed site. It could be any site that contains content and links to other sites. My colleagues described this piece of code I wrote...
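
The excerpt above does not include the crawler itself, so the following is only a minimal sketch of what a small multi-threaded crawler of this kind could look like. It is not the author's code. The names `crawl`, `fetch`, `fetch_page`, `num_workers` and `max_pages` are assumptions made for illustration, and the regex-based link extraction is a deliberate simplification rather than a real HTML parser.

```python
import queue
import re
import threading
import urllib.request
from urllib.parse import urljoin

# Naive href extraction; an assumption for brevity, not a full HTML parser.
LINK_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def fetch(url):
    """Download a page and return its text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seed, fetch_page=fetch, num_workers=4, max_pages=50):
    """Crawl outward from seed; return {url: html} for fetched pages.

    At most max_pages distinct URLs are ever queued, so the crawl
    always terminates.
    """
    todo = queue.Queue()
    todo.put(seed)
    seen = {seed}            # every URL that has been queued so far
    pages = {}               # url -> downloaded content
    lock = threading.Lock()  # guards seen and pages

    def worker():
        while True:
            url = todo.get()
            try:
                if url is None:      # sentinel: shut this worker down
                    return
                try:
                    html = fetch_page(url)
                except Exception:
                    continue         # skip pages that fail to download
                with lock:
                    pages[url] = html
                    for href in LINK_RE.findall(html):
                        link = urljoin(url, href)  # resolve relative links
                        if (link.startswith("http")
                                and link not in seen
                                and len(seen) < max_pages):
                            seen.add(link)
                            todo.put(link)
            finally:
                todo.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    todo.join()              # wait until every queued URL is processed
    for _ in threads:
        todo.put(None)       # one sentinel per worker
    for t in threads:
        t.join()
    return pages
```

Under these assumptions, calling `crawl("https://cnn.com")` would start from the seed site named in the usage line and follow links until `max_pages` URLs have been queued. Passing a custom `fetch_page` makes it possible to try the crawler offline or to add politeness delays.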

Posted in API