July 2019

Avoid Biased Data Analysis with Clean and Structured Data

Posted on March 10, 2019 by

I want to share with you an unfortunate truth: all data is biased. Here at Webhose, we’ve written about this at length in posts explaining how surveys are biased and the dangers of fake reviews. News headlines throughout 2018 were full of examples of disinformation, fake news and the questioning of its impact

Continue reading

Posted in API

Auritus: Open-Source, Public Relations Monitoring Platform

Posted on February 19, 2019 by

I started Webhose.io because I experienced the difficulties of collecting web data at scale while working on a previous project named PRTrack.it. At PRTrack.it we wanted to create a simple solution that would monitor how well your press release performed. We worked hard on the UX/UI, but when it came to

Continue reading

Posted in Technology

Webhose 2018 Year in Review

Posted on December 21, 2018 by

The Year in Review at Webhose

What a year we’ve had at Webhose! With your ongoing feedback, we’ve been able to develop offerings that better serve you. As we stop to catch our breath before the year ends, we wanted to review how much was achieved in such a short time. And we thought the

Continue reading

Posted in News

Track Your News with RSS Instead of Social Media

Posted on August 29, 2018 by

One of the first popular sites to offer RSS (Rich Site Summary, or Really Simple Syndication) was the New York Times, in early 2002. These feeds brought the advantage of learning about the latest news or trends from a variety of sources at once, without being obligated to subscribe, see ads, or opt in to

Continue reading

Posted in API

Meet the Online News Archive: Time for Some Historical Perspective

Posted on March 12, 2018 by

Today we’re very excited to announce the latest milestone in our journey to make structured web data easily accessible to every organization, developer and researcher: the Online News Archive has now been officially launched! TL;DR version: it’s a massive database of online news articles in structured format collected from thousands of sources in over

Continue reading

Posted in News

Richer Media Analysis with Broadcast News Transcripts

Posted on November 23, 2017 by

After a few long nights and some very tired developers, we’re proud to introduce the Broadcast Data Feed: transcribed, structured, and machine-readable television and radio programming from 1091 US TV stations, 21 Canadian TV stations, 83 Spanish-US TV stations and 356 radio stations. This new product line is available and ready for you to use,

Continue reading

Posted in API

Quick Guide to News APIs

Posted on October 10, 2017 by

Monitoring mass media has come a long way since the days of the press-cutting agency. The bulk of today’s news is published online, while modern technology lets us store, index and query massive amounts of textual data in milliseconds. Digitization presents clear advantages for consumers, who can now read or watch the news from the

Continue reading

Posted in API

Can Data Science Deliver a Fake News Detector?

Posted on April 4, 2017 by

Regardless of your political opinion, fake news has dominated the conversation since the 2016 US presidential election. The crux of the problem is that the very definition of what qualifies as fake news is in dispute. Still, most of us would like to know if the news story we’re reading reflects actual events – or

Continue reading

Posted in Machine Learning

Top 10 Big Data Stories Leading the Conversation

Posted on September 26, 2016 by

In the right hands, crawled web data can tell an amazing story. We were interested in the top 10 news stories – sorted by social shares on Facebook and LinkedIn. So we set up a simple news API request. We were looking for the stories published over the past 30 days returned by an exact match query for the term “big data”. Here

Continue reading

Posted in Big Data
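
The selection described in this excerpt (an exact-phrase match, a 30-day window, and a ranking by social shares) can be sketched locally in Python. This is only an illustration of the logic, not the actual API request from the post: the record fields (`text`, `published`, `facebook_shares`, `linkedin_shares`), the sample posts, and the choice to rank by the sum of Facebook and LinkedIn shares are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def top_stories(posts, phrase, now, days=30, limit=10):
    """Return the most-shared posts that contain `phrase` and fall inside the window."""
    cutoff = now - timedelta(days=days)
    needle = phrase.lower()

    def shares(post):
        # Assumed ranking: Facebook plus LinkedIn shares; a missing count is treated as 0.
        return post.get("facebook_shares", 0) + post.get("linkedin_shares", 0)

    matches = [
        post for post in posts
        # Case-insensitive match on the contiguous phrase, plus the date window.
        if needle in post["text"].lower() and post["published"] >= cutoff
    ]
    return sorted(matches, key=shares, reverse=True)[:limit]


now = datetime(2016, 9, 26, tzinfo=timezone.utc)
sample_posts = [  # made-up records, for illustration only
    {"title": "A", "text": "Big data in retail",
     "published": now - timedelta(days=3), "facebook_shares": 120, "linkedin_shares": 40},
    {"title": "B", "text": "Why big data matters",
     "published": now - timedelta(days=45), "facebook_shares": 900, "linkedin_shares": 300},
    {"title": "C", "text": "Data that is big",
     "published": now - timedelta(days=1), "facebook_shares": 500, "linkedin_shares": 10},
    {"title": "D", "text": "The big data skills gap",
     "published": now - timedelta(days=10), "facebook_shares": 80, "linkedin_shares": 200},
]

for post in top_stories(sample_posts, "big data", now):
    print(post["title"])
```

With this sample data, post B is dropped for falling outside the 30-day window and post C for not containing the exact phrase, so only D and A remain, in that order.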

100% coverage of the Web

Posted on March 9, 2016 by

Well, that’s the holy grail. To be able to tap into the World Wide Web as a whole is something that anyone dealing with data would like to have, but is far FAR from achieving (except maybe for the NSA, we don’t know). The idea behind Webhose.io is that when you need data from the web,

Continue reading

Posted in API

Five Reasons a News Crawler Is Essential to Your Business

Posted on January 5, 2016 by

“Originality is the art of remembering something but forgetting where you heard it.” Case in point, I don’t remember where I heard that. Nonetheless, it’s absolutely true, especially when it comes to running an online business. Why? Because in today’s online marketplace, sales, brand management, and genuine engagement are all practices that shouldn’t begin with

Continue reading

Posted in API

30-Days of Historical Data Access for Webhose.io Now Available

Posted on September 10, 2015 by

I’m very happy to let you know about the launch of our extended access to 30 days of historical data from Webhose.io, which is available to our paying customers immediately. No waiting list. With 30-day data access, Webhose.io customers don’t have to worry about missing posts in the real-time stream since they can now

Continue reading

Posted in News

Webhose.io Tip: Search for top performing (viral) posts

Posted on April 30, 2015 by

Our crawlers download millions of posts a day from millions of sources. Sometimes you may want to sift through only the news or blog posts that had some kind of social impact. To provide you with this capability, we are introducing a new score we call the “Performance Score”.

Continue reading

Posted in API
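
As a rough illustration of what filtering by such a score could look like, here is a minimal Python sketch. The excerpt does not say how the Performance Score is computed or exposed, so the `performance_score` field name, the numeric scale, and the threshold below are all hypothetical.

```python
def high_impact(posts, min_score):
    """Keep posts whose score is at least `min_score`, highest score first."""
    # Hypothetical field: a post without a score is treated as having score 0.
    kept = [p for p in posts if p.get("performance_score", 0) >= min_score]
    return sorted(kept, key=lambda p: p["performance_score"], reverse=True)


sample_posts = [  # made-up records, for illustration only
    {"title": "Quiet post", "performance_score": 0},
    {"title": "Modest post", "performance_score": 3},
    {"title": "Viral post", "performance_score": 9},
    {"title": "Unscored post"},
]

for post in high_impact(sample_posts, min_score=5):
    print(post["title"])
```

Filtering on a single precomputed score keeps the query simple: the caller picks one threshold instead of combining share counts from several networks.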