Data at Scale: The Google News API vs Webhose

Posted on June 2, 2020 by Guy Mor


When you want to find a particular news article, your first thought is probably to Google it. From AltaVista to AOL and Netscape, many of the earliest search engines haven’t stood the test of time. Google has become the standard go-to for searches in an age of ever-expanding news articles and data.

If you’re in the market for regularly updated news articles, you’ve probably heard of the Google News API. The Google News API is an aggregate news service that integrates Google News search results into an application or web page. Results are available in 35 languages and updated regularly.

Beyond individual Google searches, businesses also rely on search engines to keep up with the latest trends in their industry and with the competition. Brands use search engines to monitor their reputation and the voice of the customer.

Hedge funds and investment management firms need to have access to the most accurate data and relevant news articles for predictive analysis. And sometimes a research institution or lone entrepreneur needs access to a specific data set for developing new machine-learning or natural language processing algorithms.  

So businesses might naturally turn to Google to collect data. But this might not be the best solution for every business.

4 Major Limitations of the Google News Search API

Google’s Programmable Search, once known as the Google News API, has come a long way since its early years. The original API was officially shut down in 2011 and replaced by the Custom Search API. The Custom Search API was a RESTful API for building applications that retrieve and display Google Custom Search results (web search or image search) programmatically, with results returned in JSON format. You could also restrict your search to particular topics and websites. As the Custom Search API gradually evolved into Google Programmable Search, these features remained.
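To make the RESTful request and JSON response concrete, here is a minimal Python sketch. It assumes the Custom Search JSON API endpoint and its `key`, `cx`, `q`, and `start` query parameters; the API key, engine ID, and the sample response body are placeholders written for illustration, not real API output, and no network call is made.

```python
import json
from urllib.parse import urlencode

# Endpoint of the Custom Search JSON API.
ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key: str, engine_id: str, query: str, start: int = 1) -> str:
    """Build the GET URL for one page of results."""
    params = {"key": api_key, "cx": engine_id, "q": query, "start": start}
    return f"{ENDPOINT}?{urlencode(params)}"


def extract_results(response_body: str) -> list:
    """Pull title, link, and snippet out of a JSON response body."""
    payload = json.loads(response_body)
    return [
        {
            "title": item.get("title"),
            "link": item.get("link"),
            "snippet": item.get("snippet"),
        }
        for item in payload.get("items", [])
    ]


# Hand-written sample body (an assumption for illustration only).
SAMPLE_BODY = json.dumps(
    {
        "items": [
            {
                "title": "Example headline",
                "link": "https://example.com/a",
                "snippet": "First lines of the story...",
            }
        ]
    }
)

url = build_search_url("YOUR_API_KEY", "YOUR_ENGINE_ID", "coronavirus markets")
results = extract_results(SAMPLE_BODY)
```

In a real application you would fetch `url` with an HTTP client and pass the response text to `extract_results`. Note that each result carries little more than a title, a link, and a text snippet.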

But there are a few limitations. 

It Doesn’t Deliver Structured Data 

While Google can extract structured data from meta tags in a page’s HTML, it mostly provides search results limited to a title, link, and text snippet. But what about other important information like the author of the post, the country, the language, or the name and date of the publication? An advanced news API like Webhose can also extract ranking and scoring data to provide a measure of traffic levels, social distribution volume, and relevance.

Businesses relying on Google’s API for search results face an extra step: structuring the data before plugging it into an algorithm for insights. That step consumes vast amounts of time and resources, especially when you’re talking about data at scale.
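As a rough illustration of that extra structuring step, the Python sketch below maps a bare search result onto a fuller record and reports which fields are still empty. The record layout and field names are hypothetical, chosen to match the fields mentioned above; they are not the schema of Google’s or Webhose’s API.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class NewsRecord:
    # Fields a basic search result usually carries.
    title: str
    url: str
    snippet: str
    # Fields a basic result typically lacks (hypothetical names).
    author: Optional[str] = None
    language: Optional[str] = None
    country: Optional[str] = None
    published: Optional[str] = None


def to_record(raw: dict) -> NewsRecord:
    """Map a raw result dict onto the record, leaving unknown fields as None."""
    return NewsRecord(
        title=raw.get("title", ""),
        url=raw.get("link", ""),
        snippet=raw.get("snippet", ""),
        author=raw.get("author"),
        language=raw.get("language"),
        country=raw.get("country"),
        published=raw.get("published"),
    )


def missing_fields(record: NewsRecord) -> list:
    """List the fields that some other extraction step would still have to fill."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


raw_result = {
    "title": "Example headline",
    "link": "https://example.com/a",
    "snippet": "First lines of the story...",
}
record = to_record(raw_result)
gaps = missing_fields(record)  # author, language, country, published
```

Every name in `gaps` is work left for your own pipeline, repeated for each article you collect.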

It is Beholden to Google’s Algorithm 

Google loves to tweak its search algorithm. It made more than 3,200 changes to its search algorithm in 2018 alone.  Many times these changes are minor, but occasionally there’s a big one that shakes up the Search Engine Results Page (SERP). Although the overall aim is to offer a better user experience, there are plenty of examples of top-quality (and low-quality) sites that were negatively impacted.

So if you rely on Google’s API for your search results, you’re beholden to their algorithm and the way their search engine ranks content. Newer sites and niche sites may not appear in the search results. Or they might – and then one day suddenly not. It’s best to find a web crawler that isn’t subject to these changes if at all possible. 

Google’s algorithm also determines how often it crawls and indexes data on the web. Webhose’s crawlers, by contrast, let users set the crawling frequency themselves, whether every day or every hour, so they can discover the most recent and relevant data for your business.

It Isn’t Affordable at Scale

If you’re a small startup or a mid-size business scaling up, you’ll need to be able to scale your data in a cost-effective way. For these customers, Google’s Search API is expensive: it allows 100 queries per day for free, and after that it costs $5 per 1000 queries. In addition, there is a maximum limit of 10,000 queries per day. In contrast, Webhose offers up to 1000 free API calls per month with its 10-day trial, with additional requests after that increasing incrementally in cost.
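To see what the Google figures above mean in practice, here is a small Python sketch of the arithmetic. It assumes billing is pro rata per query beyond the free daily quota, which is a simplification; the actual billing granularity may differ.

```python
FREE_QUERIES_PER_DAY = 100      # free daily quota quoted above
PRICE_PER_1000_QUERIES = 5.00   # USD, quoted above
MAX_QUERIES_PER_DAY = 10_000    # daily cap quoted above


def daily_cost(queries: int) -> float:
    """Estimated cost in USD for one day's queries (pro rata assumption)."""
    if queries < 0:
        raise ValueError("queries must be non-negative")
    if queries > MAX_QUERIES_PER_DAY:
        raise ValueError("exceeds the 10,000 queries per day limit")
    billable = max(0, queries - FREE_QUERIES_PER_DAY)
    return billable * PRICE_PER_1000_QUERIES / 1000


cost_at_cap = daily_cost(MAX_QUERIES_PER_DAY)  # 9,900 billable queries -> 49.5
monthly_at_cap = cost_at_cap * 30              # 30-day month -> 1485.0
```

Under these assumptions, running at the daily cap costs about $49.50 a day, or roughly $1,485 over a 30-day month, and the cap itself means you cannot buy more volume at any price.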

They Sometimes Shut Down Their Products 

One of Google’s more admirable qualities is its willingness to innovate and take risks on products. The other side of that is that it is also prone to suddenly shutting down products it decides aren’t profitable, sometimes without much warning. Remember Google Hangouts and Google Plus? Both were shut down after 8 years. Google Picasa? Shut down after 4 years.

Maybe you don’t want to cast your fate to the wind and would rather invest in a news search API that’s focused on bringing you relevant, accurate, structured data at scale for the foreseeable future.

Fuel Your Analysis with Structured Data at Scale 

Another solution would be to develop your own news API, but that can drain time and resources that could be better spent on analysis. Many businesses don’t have the resources to develop an in-house news API, and sometimes they are looking for a very targeted dataset. Take the case of MeaningCloud, which used Webhose’s free coronavirus-related news datasets to gain insights into the latest trends in the news media in Spain during that period. Webhose’s datasets, retrieved from our News API, enabled them to focus their time and resources on delivering insights rather than on collecting and extracting the relevant data.

Or take the case of SESAMm, a big data and AI company that analyzes billions of news articles from the web to provide machine learning tools for building investment strategies. As the coronavirus crisis unfolded and wreaked havoc in the markets, SESAMm relied heavily on Webhose’s News API to gauge investor sentiment in various geographic locations and financial markets.

So if you want structured, high-quality data at scale that isn’t beholden to Google’s algorithm, from a service you can rely on to keep running for the foreseeable future, check out an advanced News API like Webhose.