The Complete Guide to Selecting a News API

Posted on February 21, 2021 by Webhose


The Need for a News API  

From financial analysis and media and web monitoring to artificial intelligence and machine learning, organizations of all sizes rely on news APIs as the fuel behind their business insights. With the ever-expanding number of news sources online and news articles published every moment, it is critical that these organizations have comprehensive and continuous coverage of high-quality news sources to deliver accurate insights to their customers. 

That’s why media and web monitoring organizations are increasingly relying on tools and services to gather news data for them. 

The Challenge of Unstructured News Data

It’s not just the sheer mass of news data that makes collecting it a challenge – it’s being able to digest it into a format that is ready for analysis. Most of the news data on the web is unstructured, and it only becomes a structured dataset after data extraction techniques are applied. One study even suggests that 73% of the content in English articles is irrelevant and needs to be cleaned up.

That’s why a news API that takes raw data and transforms it into machine-readable data is a must for these organizations. Here at Webhose we offer an Enriched News API and a Structured News API, both of which collect news data for organizations based on their requirements.
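To make the gap between unstructured and structured news data concrete, here is a minimal sketch in Python that turns a raw HTML page into a small JSON-ready record. It is an illustration only, not how Webhose processes pages: the `ArticleExtractor` class, the `extract_article` function, the list of skipped tags, and the sample page are all assumptions made for this example, and real extraction has to cope with far messier markup.

```python
from html.parser import HTMLParser
import json


class ArticleExtractor(HTMLParser):
    """Collects <title> text and <p> text, ignoring scripts and page chrome.

    Hypothetical helper for illustration; the skipped tags are an assumption.
    """

    SKIP_TAGS = {"script", "style", "nav", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.paragraphs = []
        self._skip_depth = 0        # > 0 while inside a skipped element
        self._in_title = False
        self._in_paragraph = False
        self._current = []          # text pieces of the paragraph being read

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "p" and self._skip_depth == 0:
            self._in_paragraph = True
            self._current = []

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_paragraph:
            # Collapse runs of whitespace left over from the page layout.
            text = " ".join("".join(self._current).split())
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._in_paragraph and self._skip_depth == 0:
            self._current.append(data)


def extract_article(html, url):
    """Return a structured record (url, title, text) for one raw HTML page."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": "".join(parser.title_parts).strip(),
        "text": "\n".join(parser.paragraphs),
    }


# Made-up sample page: navigation, script, and footer text are noise.
RAW_HTML = """
<html><head><title>Acme Corp Announces Merger</title>
<script>var tracker = 1;</script></head>
<body>
<nav><p>Home | World | Business</p></nav>
<p>Acme Corp said on Monday it will merge with Globex.</p>
<p>The deal is   expected to close
next year.</p>
<footer><p>Subscribe to our newsletter</p></footer>
</body></html>
"""

record = extract_article(RAW_HTML, "https://example.com/acme-merger")
print(json.dumps(record, indent=2))
```

Even in this toy page, only two of the four paragraphs are article content; the rest is the kind of irrelevant material that has to be cleaned up before analysis.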

This guide is intended to serve as a resource for organizations that need a news API that continuously collects the extensive amount of online news articles available, delivering it into a digestible format so that they can focus on delivering insights, analysis, and more to their customers.

Enriched News API

A ready-made news feed with NLP-enriched data that can be quickly integrated into your application. 

More enriched data means more relevant results from our search engine, so that you can quickly detect important events that could create a significant rise in stock value or affect your brand or product: anything from a natural disaster or fraud to a recent IPO, merger, or acquisition.

  • Comprehensive coverage of the top 50,000+ global news sources in 6 different languages
  • Entity and article-level sentiment with 5 different types of article-level sentiment
  • Smart entity extraction includes the advanced classification of entities into over 200 categories 
  • Includes news coverage of the last 30 days
  • Full text + Automatically generated summary of the article with the most relevant phrases – you decide what needs to be read in more detail 
  • Content deduplication
  • Reader comment counts
  • Automatically finds similar articles

Structured News API 

A news data feed with comprehensive coverage of news articles from a massive news repository with basic NLP-enrichment. 

Access to a wider range of news articles is critical for organizations that want to stay ahead of larger industry trends, conduct competitive analysis of hundreds of products simultaneously, or provide comprehensive brand monitoring.

  • Wide coverage of over a million global news sources in 76 languages 
  • Basic sentiment analysis with structured data delivered in JSON and XML format 
  • Filters searches according to persons, locations, organizations, keywords, social shares, and more
  • Includes access to more than 10 years of archived news stories
  • Article full text
  • Includes full text of reader comments – for deeper data analysis based on news stories
  • Basic URL deduplication
  • Includes access to 25 TB of historical data going back as far as 2008
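As a rough illustration of how structured JSON records, filters, and basic URL deduplication fit together on the consumer side, here is a minimal Python sketch. The field names (`url`, `language`, `organizations`, `social_shares`), the list of tracking parameters, and the helper functions are hypothetical and chosen for this example; they are not the actual Webhose response schema or deduplication logic.

```python
import json
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Query parameters assumed (for this sketch) to carry no content meaning.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}


def normalize_url(url):
    """Lower-case scheme and host, drop tracking params, fragment, trailing slash."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(sorted(query)), "")
    )


def filter_and_dedupe(articles, language=None, organization=None, min_shares=0):
    """Keep articles matching the filters, skipping repeats of the same URL."""
    seen = set()
    kept = []
    for article in articles:
        if language and article.get("language") != language:
            continue
        if organization and organization not in article.get("organizations", []):
            continue
        if article.get("social_shares", 0) < min_shares:
            continue
        key = normalize_url(article["url"])
        if key in seen:
            continue  # same story reached through a different URL variant
        seen.add(key)
        kept.append(article)
    return kept


# Made-up feed in a hypothetical JSON layout.
SAMPLE_FEED = json.loads("""
[
  {"url": "https://example.com/story/?utm_source=feed", "language": "english",
   "organizations": ["Acme"], "social_shares": 120},
  {"url": "https://EXAMPLE.com/story", "language": "english",
   "organizations": ["Acme"], "social_shares": 120},
  {"url": "https://example.org/other", "language": "spanish",
   "organizations": ["Globex"], "social_shares": 5}
]
""")

result = filter_and_dedupe(
    SAMPLE_FEED, language="english", organization="Acme", min_shares=100
)
print(len(result), "article(s) kept")
```

URL-level deduplication like this only catches the same page reached through different links; it does not detect the same text republished on a different site.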

A Few of the Most Common Limitations of News APIs 

Mainstream news publications today produce news at an astonishing rate. Organizations that rely on collecting news data for their data analysis and insights find that keeping track of the number of news articles published every day is increasingly challenging, if not impossible. For many of these organizations, an on-demand news API can help them discover, collect, and structure these news articles – but they need to beware of a few common challenges in selecting a news API.

First, since there are so many news sites across so many different niches, organizations often struggle to find a news API that delivers the comprehensive coverage they need – including the ability to search articles in a range of different languages. Second, the news data many APIs deliver isn’t always in a machine-readable format that can be easily digested and integrated into your solution. Third, news data coverage isn’t always continuous – sometimes there’s high latency for all sites, or main news sites are crawled more frequently than smaller niche sites. Fourth, when organizations want to scale, the news API they selected for a one-time project often isn’t advanced enough to scale with them. Finally, many news APIs only include current data and don’t have a historical archive at all, much less one going back more than 10 years like the one Webhose offers.

All of these limitations create serious consequences for organizations. Incomplete and non-continuous coverage means you miss out on relevant data. That is unacceptable for organizations that deliver data to customers in the financial, news and media monitoring industries, not to mention brands that conduct constant competitive analysis and need near real-time data. Missing data points also lead to inaccurate AI and machine-learning algorithms. (You can read more about biased algorithms and ways to avoid biased data analysis in this blog post.) The inability to scale often raises the price of search queries to a level that is unaffordable for most organizations.
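One way to check a candidate news API for the latency and coverage gaps described above is to compare when each article was published with when the API made it available, grouped by site. The Python sketch below assumes each record carries hypothetical `site`, `published`, and `crawled` fields with ISO 8601 timestamps; those names and the sample values are assumptions for illustration, not a documented schema.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def median_latency_minutes(articles):
    """Median minutes between publication and crawl, per site."""
    by_site = defaultdict(list)
    for article in articles:
        published = datetime.fromisoformat(article["published"])
        crawled = datetime.fromisoformat(article["crawled"])
        minutes = (crawled - published).total_seconds() / 60
        by_site[article["site"]].append(minutes)
    return {site: median(values) for site, values in by_site.items()}


# Made-up trial data: a large site crawled quickly, a niche site crawled late.
SAMPLE = [
    {"site": "bignews.example",
     "published": "2021-02-21T10:00:00+00:00", "crawled": "2021-02-21T10:05:00+00:00"},
    {"site": "bignews.example",
     "published": "2021-02-21T11:00:00+00:00", "crawled": "2021-02-21T11:15:00+00:00"},
    {"site": "nicheblog.example",
     "published": "2021-02-21T09:00:00+00:00", "crawled": "2021-02-21T13:00:00+00:00"},
]

latency = median_latency_minutes(SAMPLE)
for site, minutes in sorted(latency.items()):
    print(f"{site}: {minutes:.0f} min")
```

Running a check like this over a few days of trial data for the sites you care about most shows whether smaller niche sources lag behind the main news sites.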

These challenges are particularly acute for organizations that require a lot of data for their data analysis or insights. That is why organizations should ask themselves the following questions before they start their product development:

  • What are your data requirements? Do you know the types of news sites you want to cover, and will that coverage change in the future? 

  • How do you need the data to be structured to integrate into your existing infrastructure? Do you have any specific technical limitations that need to be taken into account?   

  • Do you have the time and resources to crawl data (especially if you intend to scale in the near future)?  

  • How quickly do you intend to scale? Will these plans affect any of the previous considerations?

By taking time to evaluate these points early on in your product development, your organization can save countless hours and resources in the future and focus on what it does best.

When an Organization Might Not Need a News API 

Not every organization with a need for news data can benefit from a news API. Sometimes a simple news scraper that pulls a specific dataset from the web will suffice. Building your own in-house solutions also gives you full control of how often you decide to crawl for data, how to parse it, and what data to collect. If your organization is not looking to scale or if you need news data as a one-time project, your news data collection might be best done with in-house resources.

Because different organizations have different needs, we developed two separate news data products: our News API, which covers a wider range of news sources but leaves it to organizations to enrich the data themselves; and our Enriched News API, a ready-made news feed with NLP-enriched data collected from the top 50,000 news sources according to Alexa ranking. The Enriched News API was developed for organizations that are either starting their data enrichment from scratch or want to quickly enhance an existing product.

Our mission at Webhose since the beginning has been to deliver high-quality, accurate, machine-readable data to organizations that need it for their data analysis and insights. We deliver this data to organizations of all sizes and across a wide range of industries, and we’re proud to play a role in shaping the latest trends in artificial intelligence, machine-learning, financial analysis, web and media monitoring, and more.

How Global Organizations Use a News API 

For organizations interested in understanding wider trends across the web or as the basis for their data analysis and insights, news data is critical. These enterprise-level organizations and startups exist across a wide range of industries. 

Here are just a few examples of how they use a news API: 

Financial Analysis 

Financial management institutions all over the world have been scrambling to get access to high-quality alternative data, including web and news data, to deliver data-driven investment strategies to their customers. One example is a Webhose client, a fintech organization specializing in big data and artificial intelligence, that leveraged Webhose’s comprehensive set of news data to gauge the impact of sentiment and emotions on stocks on a global scale during the peak of the coronavirus crisis. This type of predictive modeling and natural language processing requires a massive dataset – the kind that Webhose’s News API, with 2 million data sources in 120 different languages and over 10 years of history, can deliver.

AI and Machine Learning

Another force heavily influencing the news data market is artificial intelligence. Experts estimate that the AI software market will reach $37 billion by 2025 as its applications are adopted in almost every industry. Machine learning and natural language processing are rapidly expanding fields that fall under the category of artificial intelligence. DataRobot, an AI platform serving customers ranging from the software industry to grocery retail and healthcare services, used Webhose’s News API and datasets to develop over 90 different machine learning models and select the one that predicted a viral score most accurately. This model served as an alternative monetization model for publishers to replace the standard clickbait model.

Market Research

Many global brands need access to high-quality, continuous news data to conduct research on trends and the industry in addition to competitive analysis. The Brick Factory, a Washington DC-based digital agency, uses Webhose’s news API to evaluate the effectiveness of its clients’ digital marketing campaigns. The ability to set up complex queries and track hundreds of thousands of mentions of general trends dramatically reduced the cost, time and resources the agency had previously dedicated to data management.

Media and Web Monitoring 

In today’s fiercely competitive digital age, having an in-depth understanding of the voice of the customer and your competitors is a must. News and media monitoring organizations need a news API with fast crawling cycles that delivers high-quality news to their customers. MeaningCloud, a text analytics platform, used Webhose’s coronavirus datasets as a solid basis to understand news trends in the Spanish online media. Researchers at Simon Fraser University compared Webhose’s news articles with fake news items from the Russian Internet Research Agency to serve as the basis for a fake news detector that identifies disinformation. 

Not all news APIs are created equal, however. Sometimes organizations don’t need a news API to understand large trends across the web; they need one with enriched data – such as Webhose’s Enriched News API – to lay the foundation for their AI application.

Check out our webinar with DataRobot to see how they used data to predict virality.

The Need For NLP-Enriched News Data  

The demand for NLP-enriched data has exploded as enterprise and startup organizations alike incorporate AI and machine-learning applications into their offerings. IBM needs data for its Watson AI application, Google is exploring neural networks, Amazon is incorporating AI into its Amazon Go grocery retail stores – and those are just the largest brands. These brands rely on NLP-enriched data not only to develop AI applications from scratch, but also to ensure these applications continue to become more intelligent.

When selecting a news API, it is important to understand that different news APIs serve different organizational needs. As we have already mentioned, some pull specific small datasets from the web while others can scale quickly. Likewise, some deliver basic NLP enrichment while others provide more advanced NLP enrichment of data.

Here are a few signs that indicate an organization should consider selecting a news API with NLP-enriched data:

  • You are scaling and in a high-growth phase but still want to focus your effort and resources mainly on product development

  • You have an existing product but want to quickly enhance it so that you get to the market as fast as possible  

  • You find most news APIs to have limited querying options with data that is not relevant 

With this in mind, Webhose has developed an Enriched News API as a ready-made news feed that organizations can quickly plug into their AI or machine-learning algorithms and applications. This news feed can also be easily customized for each customer’s specific needs. Its advanced features, such as article and entity-level sentiment and smart entity extraction with over 200 advanced classifications, are dedicated to fulfilling this mission. It also deduplicates articles with similar text, such as a press release republished in full in an online media publication. This feature, along with the ability to automatically discover articles on a similar subject, saves time and gives the Enriched News API the plug-and-play search relevance these organizations need.
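To give a feel for what deduplicating articles with similar text involves, here is a minimal Python sketch that compares articles by their overlapping three-word sequences (word shingles) and Jaccard similarity. This is a common textbook approach used here for illustration; it is not a description of how the Enriched News API deduplicates content, and the function names, the 0.8 threshold, and the sample texts are assumptions.

```python
import re


def shingles(text, size=3):
    """Set of overlapping word n-grams, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles two texts have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(articles, threshold=0.8):
    """Keep the first copy of each article; drop later near-duplicates."""
    kept = []
    kept_shingles = []
    for text in articles:
        current = shingles(text)
        if any(jaccard(current, other) >= threshold for other in kept_shingles):
            continue  # too similar to an article that was already kept
        kept.append(text)
        kept_shingles.append(current)
    return kept


# Made-up examples: a press release, a republished copy, an unrelated story.
press_release = (
    "Acme Corp today announced the launch of its new widget, "
    "which ships to customers in March."
)
republished = (
    "Press release: Acme Corp today announced the launch of its new widget "
    "which ships to customers in March"
)
unrelated = "Globex reported quarterly earnings that beat analyst expectations."

unique = deduplicate([press_release, republished, unrelated])
print(len(unique), "unique article(s)")
```

Comparing every article against every kept article is fine for a small batch but becomes slow on a large feed, which is one reason this step is often easier to leave to the data provider.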

Data preparation is still a huge challenge for organizations, with data scientists spending up to 45% of their time on it. That preparation, a process that includes discovering, collecting, structuring and delivering relevant online news articles, costs organizations considerable time and resources. For many, it is not feasible. Other organizations may find that these resources are better spent on getting the product ready to reach the market as quickly as possible.

At Webhose, we developed these new features for our Enriched News API with the goal of helping organizations of all sizes focus their time and energy on product development rather than data enrichment. We don’t want to see great products fail to reach the market in time because resources were spent on data preparation. And we’re excited to play an important role in the growth of AI and machine-learning applications in global enterprises and startups alike.

Ready to Select a News API?

After reading this guide, you should be more aware of the main points to consider as well as the mistakes many organizations make when selecting a news API. You should also have a better idea of how they could impact your organization’s data analysis or insights and AI or machine-learning applications. You’ll be able to evaluate whether or not your organization needs a news API that delivers basic or more advanced NLP-enrichment. With all of this information, you should finally feel more confident about selecting the right news API for your organization.

At this stage, we suggest trying out a few news APIs to see which one truly suits your needs. Solutions that offer flexible pricing plans are also worth considering for organizations that aren’t able to commit to a long-term agreement. While you’re using a free trial, take a look at each product’s technical documentation and see how responsive the support team is to your organization’s questions or concerns. Technical support is critical to successfully integrating a news API into your product or your AI and machine-learning application. Whichever tool you end up selecting, we wish you a lot of success along the way.

Find out if Webhose’s News API or Enriched News API is the right fit for your organization. Start a free 10-day trial today!