Monitoring mass media has come a long way since the days of the press-cutting agency. The bulk of today’s news is published online, while modern technology lets us store, index and query massive amounts of textual data in milliseconds.
Digitization presents clear advantages for consumers, who can now read or watch the news from the palm of their hand, anywhere and at any time. It also creates exciting possibilities for organizations that develop products around this newfound wealth of information.
For example, marketing agencies can identify important mentions of their customers and send them notifications in near real-time; financial analysts can extract insights on company and stock performance; and researchers can use sentiment analysis, machine learning and other data science tools to extract deeper insights from the day’s headlines.
In all of these examples, there is an unsung hero that connects the organization with the media it wishes to monitor, analyze or research: the news API. If you’re unsure what this means or are comparing a few different news APIs, this article is for you.
What’s a News API?
An application programming interface, or API, is simply a way for two websites or pieces of software to talk to each other. Programmers use APIs as building blocks that allow them to automate repetitive tasks, and to develop additional functionality on top of these blocks. For example, if a website allows you to create an account using Facebook, it would use the Facebook API to retrieve your details from facebook.com (instead of a developer manually writing code to do so); the website’s back-end then uses the retrieved details to create your account.
News APIs connect applications with online news stories. Whether you’re trying to create automatic coverage reports for your clients, predict the outcome of an election, or use news stories as a data source for sophisticated AI applications, you first need a way to automatically and methodically extract machine-readable data from news websites. This data can then be scanned, enriched, and analyzed to serve whichever exciting use case you had in mind.
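To make "machine-readable data" a little more concrete, here is a minimal Python sketch of the two halves of a typical request: building a query and parsing the JSON that comes back. Everything specific in it is an assumption made for illustration: the endpoint `https://api.example.com/news`, the `q` and `token` parameters, and the `articles`, `title`, `url` and `published` fields. A real news API will define its own names, so check its documentation.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used for illustration only.
BASE_URL = "https://api.example.com/news"


def build_query_url(keyword, token):
    """Return the request URL for a keyword search (parameter names are assumed)."""
    return BASE_URL + "?" + urlencode({"q": keyword, "token": token})


def parse_articles(raw_json):
    """Turn a JSON response body into a list of (title, url) pairs."""
    payload = json.loads(raw_json)
    return [(a["title"], a["url"]) for a in payload.get("articles", [])]


# A made-up response body, standing in for what a server might send back.
SAMPLE_RESPONSE = json.dumps({
    "articles": [
        {"title": "Markets rally", "url": "https://news.example.com/1",
         "published": "2020-01-15T08:30:00Z"},
        {"title": "Election preview", "url": "https://news.example.com/2",
         "published": "2020-01-15T09:00:00Z"},
    ]
})

if __name__ == "__main__":
    print(build_query_url("acme corp", "DEMO_TOKEN"))
    for title, url in parse_articles(SAMPLE_RESPONSE):
        print(title, "->", url)
```

The sketch deliberately skips the network call itself; in practice you would fetch the URL with an HTTP client and hand the response body to `parse_articles`.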
Why not just use Google?
Google search provides unparalleled coverage of the world wide web, not to mention a mind-boggling ability to serve the exact right content based on a search query. However, Google does not currently provide a way to extract these results, or to perform further data mining and analysis on the content of its indexed websites.
This means that the only viable way to use Google for these purposes is to have someone, most likely an increasingly disgruntled intern, regularly copy and paste Google search results into a spreadsheet. This does not scale for news monitoring, and it is useless for analysis: you still would not have the article text and date in a unified format, and these would have to be scraped manually from the news sites.
Types of News APIs
While the above definition should apply in the vast majority of cases, there are some nuances to the way the term “news API” is used. APIs that are commonly lumped into this category include:
- An API for a specific online news service, such as the New York Times website (ProgrammableWeb has a heap of these). The amount and type of data delivered depend on the specific website or service – e.g., the ESPN API lets you retrieve specific information about sports teams and matches, but does not provide the full text of the articles that appear on espn.com; The Guardian API is broader.
- A feed of news stories or headlines with links to the original publications, often delivered through an RSS feed or XML files. These services will usually provide a list of publications that can be useful for media monitoring.
- Structured data extracted from news websites and provided as a service: this includes the Webhose.io News API and a few of our competitors. In this case, you would get broader datasets with title, body content, and other data points that can be accessed via the API and used to power many different types of analyses.
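To illustrate the second category, the sketch below reads headlines and links out of a small RSS document using Python's standard library. The feed content is made up for the example. Note what it yields: a title and a link per story, with no article body, which is the main practical difference from a structured-data service.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document standing in for a real headline feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>Markets rally</title>
      <link>https://news.example.com/1</link>
    </item>
    <item>
      <title>Election preview</title>
      <link>https://news.example.com/2</link>
    </item>
  </channel>
</rss>"""


def headlines(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS document."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


if __name__ == "__main__":
    for title, link in headlines(SAMPLE_FEED):
        print(title, "->", link)
```

Getting the full text from a feed like this would still mean fetching and parsing each linked page yourself.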
4 Tips for Choosing the Right API for Your Needs
Hopefully this article has helped you understand what a news API is and what it’s good for. If you think you could benefit from using a news API, here are a few quick pointers on finding the right API to support your use case:
Full Text or Headlines
As covered in the previous section, many news APIs only provide snippets or headlines. This could be fine if you just need to find and list specific news stories, but is less helpful for textual analysis of the articles themselves. You might also ask yourself whether you want just the news story, or also the reader comments that often accompany it; most APIs cannot access these (Webhose.io being an exception).
Breadth of Coverage
Which types of media outlets do you need covered? Are you interested only in the major news websites like cnn.com, or would you consider a prominent blogger to be a news source as well? Are you looking for results in English only, or are other languages relevant as well? The breadth of coverage depends on the underlying data that your news API is accessing, so you should find one that covers every publication that could be relevant for your project.
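One quick way to test coverage during a trial is to run a few representative queries and compare the domains that come back against the list of outlets you care about. The Python sketch below assumes only that each result includes the article's URL; the function names and the sample domains are illustrative.

```python
from urllib.parse import urlparse


def covered_domains(article_urls):
    """Collect the distinct domains seen in a list of article URLs."""
    domains = set()
    for url in article_urls:
        host = urlparse(url).netloc.lower()
        # Treat www.cnn.com and cnn.com as the same outlet.
        if host.startswith("www."):
            host = host[4:]
        domains.add(host)
    return domains


def missing_sources(article_urls, required_domains):
    """Return the required domains that never appear in the results, sorted."""
    return sorted(set(required_domains) - covered_domains(article_urls))


if __name__ == "__main__":
    results = ["https://www.cnn.com/a", "https://blog.example.org/post"]
    wanted = ["cnn.com", "bbc.co.uk", "blog.example.org"]
    print(missing_sources(results, wanted))
```

A small sample cannot prove that an outlet is covered, but a domain that never shows up across many relevant queries is worth raising with the provider.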
Accessing the Extracted Data
How easy is it for you to “feed” the data delivered by the news API to your own algorithms? Is the data coming through in a consistent structure and in machine-friendly formats such as JSON, or do you need additional tools to transform the data before starting the analysis?
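A simple way to check this is to load a few records and confirm that the fields you rely on are always present and always in the same format. The Python sketch below does that for one record. The field names `title`, `text` and `published`, and the timestamp format, are assumptions for the example rather than any particular API's schema.

```python
import json
from datetime import datetime

# Assumed field names; substitute whatever your chosen API actually returns.
REQUIRED_FIELDS = ("title", "text", "published")


def load_article(raw_json):
    """Parse one JSON record, failing loudly if it lacks an expected field."""
    record = json.loads(raw_json)
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    # Convert the timestamp string into a datetime for later analysis.
    record["published"] = datetime.strptime(
        record["published"], "%Y-%m-%dT%H:%M:%S")
    return record


if __name__ == "__main__":
    sample = json.dumps({
        "title": "Markets rally",
        "text": "Stocks rose on Wednesday.",
        "published": "2020-01-15T08:30:00",
    })
    article = load_article(sample)
    print(article["title"], article["published"].date())
```

If records regularly fail a check like this, expect to spend time on cleanup code before any analysis can start.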
Ease of Use
Finally, as with any API, you need to see how comfortable it will be for your developers. Does the API have clear documentation? Does it follow conventional standards? How much effort will be required to integrate this API into your own software or workflows? The best way to answer these questions is to have your developers get their hands dirty working with the API, start running queries and see what results they’re getting. And if there isn’t an evaluation version or free trial available, run for the hills.