Avoid Biased Data Analysis with Clean and Structured Data

Posted on March 10, 2019 by Shai Schwartz

I want to share with you an unfortunate truth: All data is biased. Here at Webhose, we’ve written about this at length in posts explaining how surveys are biased and the danger of fake reviews. News headlines throughout 2018 were full of examples of disinformation, fake news and the questioning of its impact...

Posted in API

Meet the Online News Archive: Time for Some Historical Perspective

Posted on March 12, 2018 by Guy Mor

Today we’re very excited to announce the latest milestone in our journey to make structured web data easily accessible to every organization, developer and researcher: the Online News Archive has now been officially launched! TL;DR version: it’s a massive database of online news articles in structured format collected from thousands of sources in over...

Posted in News

Structuring the Dark Web!

Posted on January 24, 2018 by eranl

We’ve recently launched an exciting new addition to our dark web data feed (as featured on Betanews, ProgrammableWeb, and elsewhere): now, in addition to industry-leading breadth of coverage of the TOR network, we’ll also be structuring the extracted data so that it fits into a JSON format similar to that of our open web data feeds. The...

Posted in Dark Web

3 Steps to Turn Webpages into Machine-Readable Data

Posted on October 30, 2017 by eranl

The vast majority of us use the web every single day – for news, shopping, socializing and really any type of activity you can imagine. But when it comes to acquiring data from the web for analytical or research purposes, you need to start looking at web content in a more technical way – breaking...

Posted in Technology

3 Ways to Use eCommerce Product Data for Market Research

Posted on September 14, 2017 by eranl

The web is an invaluable source of data when it comes to competitive intelligence, research and creating a go-to-market strategy. A simple Google search will reveal incredible amounts of public information about your main competitors, and connect you to mounds of existing research, financial analysis and other information that could be pertinent to your next...

Posted in Big Data

The Race to Achieve 100% Coverage of the Web

Posted on September 19, 2016 by ohadf

In our new report, we deconstruct the all-too-familiar race to achieve 100% coverage of the web. Data acquisition efforts usually rely on one of three approaches – build an internal web crawling capability, rely on data providers, or implement a combination of both. The goal is to tap into as much structured web data as...

Posted in Big Data

Guide to Structured Web Data Consumption: How to get instant access to news, blogs, and online discussions

Posted on September 1, 2016 by ohadf

Hundreds of entrepreneurs, researchers, and data scientists contact us daily with questions about accessing structured web data. We put together our answers in our new guide to Structured Web Data Consumption. The consumerization of web data: It’s easy to fall into the trap of building a proprietary crawling and data structuring solution tailored to a particular...

Posted in API

How to Extract Data from a Website: 5 Steps to Transform Unstructured Data into Business Insights

Posted on December 8, 2015 by webhose

Big data is big business. And for good reason. As Harvard Business Review recently reported, an exhaustive study of 330 North American companies led by the MIT Center for Digital Business in conjunction with McKinsey’s Business Technology Office revealed that the use of data in business decisions like product development, hiring and firing, as well...

Posted in Machine Learning