Following popular demand, we are excited to open up access to Webhose.io’s historical data archive. This is the first time that anyone can programmatically access a huge index of the internet for analytical purposes. We like to keep things simple here, so accessing the archive is as easy as one, two, three (and possibly four).
Define your query filters
The first step is to define what you want to retrieve. You can filter the data using either our simple query builder or a free-text boolean query. To keep things simple, you can use the same parameters that you would use with the API.
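As an illustration of what a free-text boolean query might look like, here is a minimal Python sketch that assembles one from keywords and field filters. The helper `build_query` is hypothetical, and the `field:value` filter syntax and the `language` field name are assumptions made for the example; check the query builder or the API documentation for the filters that are actually supported.

```python
def build_query(all_of, any_of=(), filters=None):
    """Combine keywords and field filters into one boolean query string.

    The AND/OR operators and the field:value filter form are assumptions
    for illustration, not a confirmed description of the query syntax.
    """
    def quote(term):
        # Quote multi-word terms so they are treated as a single phrase.
        return f'"{term}"' if " " in term else term

    parts = [quote(term) for term in all_of]
    if any_of:
        # Group the alternatives so OR does not leak into the AND chain.
        parts.append("(" + " OR ".join(quote(term) for term in any_of) + ")")
    for field, value in (filters or {}).items():
        parts.append(f"{field}:{value}")
    return " AND ".join(parts)


query = build_query(
    ["electric cars"],
    any_of=["tesla", "nissan"],
    filters={"language": "english"},  # hypothetical filter name
)
print(query)  # "electric cars" AND (tesla OR nissan) AND language:english
```

Building the string in code makes it easy to reuse the same filters for the 30-day preview and for the final archive request.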
Review the data
The next step is to review the data and make sure your query brings in relevant posts. The data you will see is from the past 30 days. If you are not happy with the results, go back and adjust your query filters.
Set the timeframe
Now that you are happy with the sample data, it is time to set the timeframe that you wish to retrieve the data from. Our archive dates all the way back to mid-December 2014.
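If you plan your timeframe in a script before filling in the form, a small sanity check can catch ranges that fall outside the archive. The sketch below is only an illustration: `ASSUMED_ARCHIVE_START` is set to 15 December 2014 as a stand-in for "mid-December" (the exact first day is not stated here), and the conversion to millisecond Unix timestamps is an assumption about how you might want to store the bounds, not a requirement of the archive.

```python
from datetime import date, datetime, timezone

# Assumption: "mid-December 2014" is approximated as 15 December 2014.
ASSUMED_ARCHIVE_START = date(2014, 12, 15)


def to_epoch_ms(day):
    """Return midnight UTC of the given date as a Unix timestamp in milliseconds."""
    moment = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return int(moment.timestamp() * 1000)


def validate_timeframe(start, end, today=None):
    """Check that start..end is ordered and inside the assumed archive range."""
    today = today or date.today()
    if start > end:
        raise ValueError("start date must not be after end date")
    if start < ASSUMED_ARCHIVE_START:
        raise ValueError("start date is earlier than the assumed archive start")
    if end > today:
        raise ValueError("end date is in the future")
    return to_epoch_ms(start), to_epoch_ms(end)


start_ms, end_ms = validate_timeframe(date(2015, 1, 1), date(2015, 2, 1))
print(start_ms, end_ms)
```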
Get the data
The last step is to tell the system whether you would like the data in JSON or XML format, and which email address the download link should be sent to. Fill in the billing info and you are set to go. Once you have completed these steps, you will receive an email with an HTTP link to download the data. Depending on the timeframe you defined and the number of matching posts, this usually takes a matter of minutes but can take a few hours.
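Once the email arrives, fetching the file can be scripted. Here is a minimal sketch using only the Python standard library. It assumes you chose JSON and that the link points directly at a single, uncompressed JSON document; if the download turns out to be compressed or split into several files, the loading step would need adjusting. The function names and the example link are hypothetical.

```python
import json
import shutil
import urllib.request
from pathlib import Path


def download_archive(link, destination):
    """Stream the file behind the emailed link to a local path."""
    destination = Path(destination)
    with urllib.request.urlopen(link) as response, destination.open("wb") as out:
        # Copy in chunks so a large archive is never held fully in memory.
        shutil.copyfileobj(response, out)
    return destination


def load_json_archive(path):
    """Parse the downloaded file, assuming it is one JSON document."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


# Usage (the link below is a placeholder, not a real download URL):
# path = download_archive("http://example.com/your-download-link", "archive.json")
# data = load_json_archive(path)
```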