Category: API

Track Your News with RSS Instead of Social Media

Posted on August 29, 2018 by

One of the first popular sites to offer RSS (Rich Site Summary, or Really Simple Syndication) was the New York Times, in early 2002. These feeds made it possible to learn about the latest news or trends from a variety of sources at once, without being obligated to subscribe, see ads, or opt in to

Posted in API
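Because RSS items are plain XML, reading several sources at once needs little more than an XML parser. Below is a minimal sketch using Python's standard library. It assumes RSS 2.0 feeds that have already been downloaded as strings and that every item carries a `<pubDate>`. The sample feed, its URLs, and the `read_items`/`merge_feeds` helpers are hypothetical.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 document; a real reader would fetch feeds over HTTP.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>https://example.com/</link>
    <description>Sample feed for illustration</description>
    <item>
      <title>First headline</title>
      <link>https://example.com/first</link>
      <pubDate>Wed, 29 Aug 2018 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/second</link>
      <pubDate>Wed, 29 Aug 2018 10:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml: str) -> list:
    """Return title, link and pubDate for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        }
        for item in root.iter("item")
    ]


def merge_feeds(*feeds: str) -> list:
    """Combine items from several feeds into one list, newest first.

    Assumes every item has an RFC 822-style <pubDate>.
    """
    merged = [item for feed in feeds for item in read_items(feed)]
    return sorted(merged, key=lambda item: parsedate_to_datetime(item["published"]), reverse=True)


for entry in merge_feeds(SAMPLE_FEED):
    print(entry["published"], entry["title"], entry["link"])
```

Passing more feed strings to `merge_feeds` interleaves their items by date, which is the "many sources at once" benefit described above.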

What is the Omgili Bot, and why is it Crawling Your Website?

Posted on December 28, 2017 by

Hi there. If you’re reading this, it’s probably because you’ve run into Omgilibot – perhaps in your web analytics or server logs (user agent: omgili/0.5 +https://omgili.com) – and turned to Google to decide whether this crawler is a benevolent creature that should be permitted to do as it will, or something more nefarious that deserves

Posted in API
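If you decide to limit what the crawler may see, the usual tool is a robots.txt rule addressed to its user agent. The sketch below only checks rules locally with Python's `urllib.robotparser`. It assumes the crawler honors robots.txt, which this code cannot verify. The `/private/` path and the rules themselves are hypothetical. The user agent string is the one quoted above. `robotparser` matches an entry by the token before the first `/`, so `User-agent: omgili` applies to `omgili/0.5 +https://omgili.com`.

```python
from urllib.robotparser import RobotFileParser

# User agent string as quoted in the post.
OMGILI_UA = "omgili/0.5 +https://omgili.com"

# Hypothetical robots.txt: keep the Omgili crawler out of /private/,
# leave the whole site open to every other crawler.
ROBOTS_TXT = """\
User-agent: omgili
Disallow: /private/

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)
for url in ("https://example.com/blog/post", "https://example.com/private/report"):
    print(url, "allowed" if rules.can_fetch(OMGILI_UA, url) else "blocked")
```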

Richer Media Analysis with Broadcast News Transcripts

Posted on November 23, 2017 by

After a few long nights and some very tired developers, we’re proud to introduce the Broadcast Data Feed: transcribed, structured, and machine-readable television and radio programming from 1,091 US TV stations, 21 Canadian TV stations, 83 Spanish-language US TV stations and 356 radio stations. This new product line is available and ready for you to use,

Posted in API

Quick Guide to News APIs

Posted on October 10, 2017 by

Monitoring mass media has come a long way since the days of the press-cutting agency. The bulk of today’s news is published online, while modern technology lets us store, index and query massive amounts of textual data in milliseconds. Digitization presents clear advantages for consumers, who can now read or watch the news from the

Posted in API

The Hackathon Award for Best API Mashup Goes to…

Posted on March 26, 2017 by

Programming competitions, commonly referred to as hackathons, offer a great opportunity for new talent to show what they can do. Much like in professional sports, industry leaders send recruiters to scout out the top performers. With high stakes on the line and limited resources, getting noticed as a hackathon winner not only looks good on

Posted in API

Webhose.io API Featured in New Guide to Web Development with Django

Posted on March 12, 2017 by

Last February, co-authors Leif Azzopardi and David Maxwell completed the latest edition of their book Tango with Django. It presents an excellent step-by-step approach to learning Python web development with the popular Django framework v1.9 (also compatible with v1.10). Although the book is designed as a beginner’s guide to web development, the material is packed with tips even

Posted in API

How to Use Online Review Ratings to Crush the Market

Posted on March 2, 2017 by

Sifting through millions of posts on review sites is both a massive undertaking and an incredible opportunity for influencer marketing. Some of the most successful app makers are capitalizing on that opportunity. Use your favorite media monitoring platform to sift through the reviews. As you might expect, the biggest opportunity is in reaching negative and neutral

Posted in API
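Once each review carries a numeric rating, pulling out the negative and neutral ones for outreach is a simple filter. This is a sketch under stated assumptions: ratings are on a 1 to 5 star scale, and anything at 3 stars or below counts as negative or neutral. The `Review` record, the cutoff, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Review:
    author: str
    site: str
    rating: float  # assumed 1-5 star scale
    text: str


def outreach_candidates(reviews: List[Review], max_rating: float = 3.0) -> List[Review]:
    """Return negative and neutral reviews (rating <= max_rating), lowest-rated first."""
    selected = [review for review in reviews if review.rating <= max_rating]
    return sorted(selected, key=lambda review: review.rating)


# Hypothetical reviews standing in for results from a monitoring platform.
sample = [
    Review("ana", "example-reviews.com", 5, "Works great."),
    Review("ben", "example-reviews.com", 2, "Crashes on startup."),
    Review("cam", "example-store.com", 3, "It is okay."),
]
for review in outreach_candidates(sample):
    print(review.rating, review.author, review.text)
```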

How to use rated reviews for sentiment classification

Posted on February 9, 2017 by

Sentiment classification is a fascinating use case for machine learning. Regardless of complexity, you need two core components to deliver meaningful results: a machine learning engine and a significant volume of structured data to train that engine. Last month, we added the new “rating” field for rated review sites covered in the Webhose.io threaded

Posted in API
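A rating is what turns a review into a labeled training example, which covers the "structured data" half of the recipe. The sketch below assumes a 1 to 5 star scale. Reviews with 1 to 2 stars become "negative", 4 to 5 stars become "positive", and 3-star reviews are dropped as ambiguous; these thresholds are our assumption. For the "engine" half, it uses a from-scratch multinomial Naive Bayes classifier as a simple stand-in, not a recommendation. The sample reviews are made up, and how the rating field is read from API results is not shown.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def label_from_rating(rating: float) -> Optional[str]:
    """Map a 1-5 star rating to a label; 3 stars is dropped as ambiguous (assumption)."""
    if rating <= 2:
        return "negative"
    if rating >= 4:
        return "positive"
    return None


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayes":
        self.doc_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {word for counts in self.word_counts.values() for word in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = "", -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up (text, star rating) pairs standing in for rated reviews.
rated_reviews = [
    ("Love it, works great", 5),
    ("Great value, very happy", 4),
    ("It is fine I guess", 3),
    ("Terrible, broke after a week", 1),
    ("Very disappointed, poor quality", 2),
]
labeled = [(text, label_from_rating(stars)) for text, stars in rated_reviews]
training = [(text, label) for text, label in labeled if label is not None]
model = NaiveBayes().fit([text for text, _ in training], [label for _, label in training])
print(model.predict("works great"), model.predict("poor quality, broke"))
```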

Guide to Structured Web Data Consumption: How to get instant access to news, blogs, and online discussions

Posted on September 1, 2016 by

Hundreds of entrepreneurs, researchers, and data scientists contact us daily with questions about accessing structured web data. We put together our answers in our new guide to Structured Web Data Consumption.

The consumerization of web data

It’s easy to fall into the trap of building a proprietary crawling and data structuring solution tailored to

Posted in API

Why Extracting Content From The Open Web Is Better than Surveys for Research

Posted on March 21, 2016 by

What’s the best way to find out how people feel about a given topic? Simply ask them, right? Well, at least that’s what we’ve been led to believe. Standard polling practice tells us that if you put together some questions, pose them to a group of people and then “normalize” the data to account for

Posted in API

100% coverage of the Web

Posted on March 9, 2016 by

Well, that’s the holy grail. Being able to tap into the World Wide Web as a whole is something anyone dealing with data would like to have, but it is far, FAR from achievable (except maybe for the NSA; we don’t know). The idea behind Webhose.io is that when you need data from the web,

Posted in API

How to Create a Custom RSS Feed for Content Monitoring

Posted on March 3, 2016 by

Imagine that you had the ability to track what’s being said, felt and published about a given topic, industry or brand. Whether you’re in marketing, sales, search engine optimization, management or just a curious person, there are some major benefits to staying on top of the latest discussions, trends, issues and developments happening in your

Posted in API
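A custom monitoring feed comes down to two steps: select the posts that match your topic, then serialize them as RSS items any feed reader can subscribe to. Below is a minimal sketch with Python's standard library. The keyword filter, the post dictionaries, and the URLs are hypothetical. In practice, the posts would come from whatever search or crawl you run.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, List


def matching_posts(posts: List[Dict], keyword: str) -> List[Dict]:
    """Keep posts whose title mentions the keyword (case-insensitive)."""
    return [post for post in posts if keyword.lower() in post["title"].lower()]


def build_rss(title: str, link: str, description: str, posts: List[Dict]) -> str:
    """Serialize posts as a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["url"]
        # RSS 2.0 expects RFC 822-style dates; usegmt requires UTC datetimes.
        ET.SubElement(item, "pubDate").text = format_datetime(post["published"], usegmt=True)
    return ET.tostring(rss, encoding="unicode")


# Hypothetical posts standing in for search results.
posts = [
    {"title": "Acme launches new widget", "url": "https://example.com/a",
     "published": datetime(2016, 3, 3, 8, 0, tzinfo=timezone.utc)},
    {"title": "Unrelated story", "url": "https://example.com/b",
     "published": datetime(2016, 3, 3, 9, 0, tzinfo=timezone.utc)},
]
feed = build_rss("Acme mentions", "https://example.com/", "Posts mentioning Acme",
                 matching_posts(posts, "acme"))
print(feed)
```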

How Crawled Data Gave One News Outlet the Edge in the Israeli Election

Posted on February 18, 2016 by

In the spring of 2015, as Israel prepared for general elections, virtually all of the mainstream media analysts believed that change was in the air. Conventional wisdom at that time had it that the Israeli populace was ready to turn its back on Prime Minister Benjamin Netanyahu and the government led by his Likud Party

Posted in API

Webhose.io Archive Access is now LIVE

Posted on February 8, 2016 by

Following popular demand, we are happy and excited to grant access to Webhose.io’s historical data archive. This is the first time that anyone can programmatically access a huge index of the internet for analytical purposes. We like to keep things simple here, so accessing the archive is as simple as one, two, three (and possibly

Posted in API

Five Reasons a News Crawler Is Essential to Your Business

Posted on January 5, 2016 by

“Originality is the art of remembering something but forgetting where you heard it.” Case in point: I don’t remember where I heard that. Nonetheless, it’s absolutely true, especially when it comes to running an online business. Why? Because in today’s online marketplace, sales, brand management, and genuine engagement are all practices that shouldn’t begin with

Posted in API

Extracting Data from Forums: 3 Sources to Discover What Your Market Really Thinks

Posted on December 29, 2015 by

Robert Collier, the great ad man of the early 20th century, once summarized the secret of all effective marketing as entering “the conversation already taking place in the customer’s mind.” That’s powerful advice … and difficult. Why? Because most of the sources we normally turn to for market research are woefully incomplete. For example, surveys

Posted in API

Article’s publication date extractor – an overview

Posted on December 13, 2015 by

A few days ago I released an open-source Python module that provides a simple way to extract and normalize the publication date of any online blog or news post. There are some commercial solutions out there, but why not just use this module for free?

The logic behind the code

Here

Posted in API
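A common first pass at this problem is to read the date metadata that publishers usually embed in the page. The sketch below is not the released module's code. It checks only a few assumed locations: `article:published_time`, `datePublished`, `pubdate` and `date` meta tags, plus `<time datetime>` attributes. It then normalizes the first ISO 8601 value it can parse to UTC, and treats dates without a time zone as UTC (also an assumption). Pages that print dates only in free text would need more work.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser
from typing import List, Optional

# Meta keys that commonly carry a publication date (assumed list, not exhaustive).
META_KEYS = {"article:published_time", "datepublished", "pubdate", "date"}


class DateCandidateParser(HTMLParser):
    """Collect date strings from <meta> tags and <time datetime="..."> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.candidates: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta":
            key = (attributes.get("property") or attributes.get("name")
                   or attributes.get("itemprop") or "").lower()
            if key in META_KEYS and attributes.get("content"):
                self.candidates.append(attributes["content"])
        elif tag == "time" and attributes.get("datetime"):
            self.candidates.append(attributes["datetime"])


def normalize(value: str) -> Optional[str]:
    """Parse an ISO 8601-like string and return it as UTC ISO 8601, or None."""
    try:
        parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.astimezone(timezone.utc).isoformat()


def extract_publication_date(html: str) -> Optional[str]:
    """Return the first parseable candidate date in document order, or None."""
    parser = DateCandidateParser()
    parser.feed(html)
    for candidate in parser.candidates:
        normalized = normalize(candidate)
        if normalized:
            return normalized
    return None


page = """<html><head>
<meta property="article:published_time" content="2015-12-13T09:30:00+02:00">
</head><body><time datetime="2015-12-14">Dec 14</time></body></html>"""
print(extract_publication_date(page))
```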

Dead simple {for devs} Python crawler (script) for extracting structured data from any website into CSV

Posted on August 16, 2015 by

In my previous post I wrote about a very basic web crawler that can randomly scour the web and mirror/download websites. Today I want to share a very simple script that can extract structured data from (almost) any website. Use the following script to extract specific information from any website (e.g. prices, IDs, titles,

Posted in API
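The core idea, pulling specific fields out of a page and writing them to CSV, can be sketched with the standard library alone. This is not the script from the post. It assumes the target page marks each field with a CSS class and that every record contains each field exactly once. The class names, the column mapping, and the sample HTML are hypothetical. Real pages often need per-site selectors.

```python
import csv
import io
from html.parser import HTMLParser
from typing import Dict, List

# Hypothetical mapping from CSS class names on the page to CSV columns.
FIELDS = {"product-title": "title", "product-price": "price", "product-id": "id"}


class FieldExtractor(HTMLParser):
    """Capture the text of elements whose class attribute names a wanted field."""

    def __init__(self, fields: Dict[str, str]) -> None:
        super().__init__()
        self.fields = fields
        self.current = None  # column currently being captured
        self.record: Dict[str, str] = {}
        self.rows: List[Dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        for css_class in (dict(attrs).get("class") or "").split():
            if css_class in self.fields:
                self.current = self.fields[css_class]

    def handle_data(self, data):
        if self.current and data.strip():
            self.record[self.current] = data.strip()
            self.current = None
            # A record is complete once every wanted field has been seen.
            if len(self.record) == len(self.fields):
                self.rows.append(self.record)
                self.record = {}


def to_csv(html: str, fields: Dict[str, str] = FIELDS) -> str:
    extractor = FieldExtractor(fields)
    extractor.feed(html)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields.values()))
    writer.writeheader()
    writer.writerows(extractor.rows)
    return buffer.getvalue()


page = """
<div class="item"><span class="product-id">A1</span>
  <h2 class="product-title">Blue mug</h2><span class="product-price">$8.50</span></div>
<div class="item"><span class="product-id">B2</span>
  <h2 class="product-title">Red mug</h2><span class="product-price">$9.00</span></div>
"""
print(to_csv(page))
```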

Tiny basic multi-threaded web crawler in Python

Posted on August 12, 2015 by

If you need a simple web crawler that will scour the web for a while and download random sites’ content, this code is for you.

Usage:

$ python tinyDirtyIffyGoodEnoughWebCrawler.py https://cnn.com

where https://cnn.com is your seed site. It could be any site that contains content and links to other sites. My colleagues described this piece of code I wrote

Posted in API
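The general shape of such a crawler (a seed URL, link extraction, a visited set, and a thread pool doing the downloads) can be sketched in standard-library Python. This is not the author's script. `crawl`, `fetch` and `LinkParser` are hypothetical names. The `fetcher` parameter exists so the logic can be exercised without network access. A real crawler should also honor robots.txt and throttle requests per host, which this sketch does not do.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect absolute http(s) URLs from <a href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            url, _fragment = urldefrag(urljoin(self.base_url, href))
            if url.startswith(("http://", "https://")):
                self.links.append(url)


def fetch(url: str) -> str:
    """Download a page as text; any error is treated as an empty page."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except Exception:
        return ""


def crawl(seed: str, max_pages: int = 50, workers: int = 8,
          fetcher: Callable[[str], str] = fetch) -> Dict[str, str]:
    """Breadth-first crawl from `seed`, downloading each batch in parallel threads."""
    pages: Dict[str, str] = {}
    seen = {seed}
    frontier = [seed]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(pages) < max_pages:
            batch = frontier[: max_pages - len(pages)]
            frontier = frontier[len(batch):]
            # pool.map runs fetcher concurrently but yields results in batch order.
            for url, html in zip(batch, pool.map(fetcher, batch)):
                pages[url] = html
                parser = LinkParser(url)
                parser.feed(html)
                for link in parser.links:
                    if link not in seen:
                        seen.add(link)
                        frontier.append(link)
    return pages


# Offline demo: a dictionary stands in for the network.
FAKE_WEB = {
    "https://a.test/": '<a href="/b">b</a> <a href="https://c.test/#top">c</a>',
    "https://a.test/b": '<a href="https://a.test/">home</a>',
    "https://c.test/": "<p>no links here</p>",
}
demo = crawl("https://a.test/", fetcher=lambda url: FAKE_WEB.get(url, ""))
print(sorted(demo))
```

Calling `crawl("https://cnn.com", max_pages=20)` without a custom `fetcher` would use `urlopen` for real downloads.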

Webhose.io Tip: Search for top performing (viral) posts

Posted on April 30, 2015 by

Our crawlers download millions of posts a day from millions of sources. Sometimes you may want to only sift through news or blog posts that had some kind of social impact. To provide you with this capability, we are introducing a new score we call the “Performance Score”.  

Posted in API
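Once posts carry a performance score, keeping only the ones with social impact is a threshold and a sort. The sketch below filters results on the client side. It assumes each post exposes the score at `post["thread"]["performance_score"]`. That location is our assumption, so adjust the lookup to the actual response format. The threshold and sample posts are hypothetical.

```python
from typing import Any, Dict, List


def performance_score(post: Dict[str, Any]) -> int:
    """Read the score, assuming it lives at post["thread"]["performance_score"]."""
    return post.get("thread", {}).get("performance_score") or 0


def top_performing(posts: List[Dict[str, Any]], min_score: int = 1) -> List[Dict[str, Any]]:
    """Keep posts scoring at least `min_score`, highest score first."""
    selected = [post for post in posts if performance_score(post) >= min_score]
    return sorted(selected, key=performance_score, reverse=True)


# Hypothetical posts standing in for API results.
posts = [
    {"title": "Quiet post", "thread": {"performance_score": 0}},
    {"title": "Viral post", "thread": {"performance_score": 9}},
    {"title": "Popular post", "thread": {"performance_score": 4}},
]
print([post["title"] for post in top_performing(posts, min_score=3)])
```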

Building a Better Search Query

Posted on December 10, 2014 by

Many factors can affect the relevancy of streaming data. When the data you consume is ordered not by relevancy but by the time it was crawled, filtering for the relevant posts is essential. I would like to share a few tips you can use to greatly increase the relevancy of the data you consume via the Webhose.io API

Posted in API
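A typical way to tighten relevancy is to require key phrases, allow a set of alternatives, attach filters, and exclude noise terms. The sketch below assembles such a query string. It assumes Lucene-style syntax: quoted phrases, `AND`/`OR`/`NOT`, parenthesized groups and `key:value` filters. Whether a given search API accepts exactly this syntax, and the `language:english` filter in the example, are assumptions to check against its documentation. `build_query` is a hypothetical helper.

```python
from typing import Dict, Iterable, Optional


def quote_term(term: str) -> str:
    """Quote multi-word terms so they are matched as exact phrases."""
    return f'"{term}"' if " " in term else term


def build_query(all_of: Iterable[str] = (), any_of: Iterable[str] = (),
                none_of: Iterable[str] = (), filters: Optional[Dict[str, str]] = None) -> str:
    """Compose a Lucene-style boolean query (syntax support is an assumption)."""
    parts = [quote_term(term) for term in all_of]
    alternatives = [quote_term(term) for term in any_of]
    if alternatives:
        parts.append("(" + " OR ".join(alternatives) + ")")
    parts += [f"{key}:{value}" for key, value in (filters or {}).items()]
    # Exclusions go last so the query never starts with a bare NOT.
    parts += [f"NOT {quote_term(term)}" for term in none_of]
    return " AND ".join(parts)


query = build_query(all_of=["apple watch"], any_of=["battery", "screen"],
                    none_of=["giveaway"], filters={"language": "english"})
print(query)
```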

Webhose.io Tips & Tricks: Search for Reviews

Posted on December 10, 2014 by

Are you looking to focus your data search specifically on consumer-generated reviews? Here are a couple of simple Webhose.io tricks that might help:

Limit your query to specific sites

You can limit your search to specific “review sites” like amazon.com, bestbuy.com, newegg.com, cnet.com, engadget.com, pcmag.com, etc. Here is an example of how you should

Posted in API
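Restricting a query to a list of review sites amounts to wrapping it in an OR-ed group of site filters. The sketch below assumes the search syntax accepts `site:domain` filters combined with `OR` and `AND`. Check that against the API's query documentation before relying on it. The site list comes from the post; the base query and the `restrict_to_sites` helper are hypothetical.

```python
from typing import List


def restrict_to_sites(query: str, sites: List[str]) -> str:
    """Limit a query to the given domains with an OR-ed group of site: filters."""
    site_clause = " OR ".join(f"site:{site}" for site in sites)
    return f"({query}) AND ({site_clause})"


# Review sites named in the post; the base query is a made-up example.
REVIEW_SITES = ["amazon.com", "bestbuy.com", "newegg.com", "cnet.com", "engadget.com", "pcmag.com"]
print(restrict_to_sites('"noise cancelling headphones"', REVIEW_SITES[:2]))
```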