June 2018

Survey Results: What Matters to Web Data Collection Buyers

Posted on June 28, 2018

While structured web data presents exciting possibilities in many fields – including finance, cybersecurity, artificial intelligence and more – the market for data extraction platforms is still fairly young. Only a handful of companies provide online data at scale, and unlike other technologies that are covered extensively by analysts and professional publications,

Posted in Technology

Quick Guide to News APIs

Posted on October 10, 2017

Monitoring mass media has come a long way since the days of the press-cutting agency. The bulk of today’s news is published online, while modern technology lets us store, index and query massive amounts of textual data in milliseconds. Digitization presents clear advantages for consumers, who can now read or watch the news from the

Posted in API

Why Extracting Content From The Open Web Is Better than Surveys for Research

Posted on March 21, 2016

What’s the best way to find out how people feel about a given topic? Simply ask them, right? Well, at least that’s what we’ve been led to believe. Standard polling practice tells us that if you put together some questions, pose them to a group of people and then “normalize” the data to account for

Posted in API

Article’s publication date extractor – an overview

Posted on December 13, 2015

A few days ago I released an open source Python module that provides a simple way to extract and normalize the publication date of any online blog or news post. There are some commercial solutions out there, but why not just use this module for free?

The logic behind the code

Here

Posted in API

Dead simple {for devs} python crawler (script) for extracting structured data from any website into CSV

Posted on August 16, 2015

In my previous post I wrote about a very basic web crawler that can randomly scour the web and mirror/download websites. Today I want to share a very simple script that can extract structured data from <almost> any website. Use the following script to extract specific information from any website (e.g. prices, IDs, titles,

Posted in API