General FAQ

Get answers to all your questions about the Webhose service.

Can I Use Webhose to Capture Web Data?

Yes. Webhose crawls the open (or “clear”) web and lets users retrieve pre-filtered data in a machine-readable format. It indexes millions of posts from news sites, online discussions, blogs, and reviews. Users can query the API by keyword, author, sentiment, location, and organization. The internet constantly shifts in both size and content, and no single tool, including the major search engines, can ever index it completely. For organizations at any stage of growth, however, Webhose’s open web API can serve as the primary means of ingesting data from the open internet or as a supplement to existing sources.
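As a rough illustration of querying by keyword plus filters such as author and location, the sketch below assembles a request URL. The endpoint (a placeholder host), the `token`, `format`, and `q` parameter names, and the `field:"value"` filter syntax are all assumptions made for this example, so check the Webhose API documentation for the actual names before relying on them.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption for illustration, not the documented URL.
BASE_URL = "https://api.example.com/filterWebContent"


def build_query(keywords, **filters):
    """Combine free-text keywords with hypothetical field:"value" filters."""
    parts = [keywords]
    # Sort the filters so the same inputs always produce the same query string.
    for field, value in sorted(filters.items()):
        parts.append(f'{field}:"{value}"')
    return " ".join(parts)


def build_url(token, query, fmt="json"):
    """Encode the token, output format, and query into a request URL."""
    params = {"token": token, "format": fmt, "q": query}
    return f"{BASE_URL}?{urlencode(params)}"


# Hypothetical filter names; the real API may spell them differently.
query = build_query("electric vehicles", author="Jane Doe", location="Berlin")
url = build_url("YOUR_API_TOKEN", query)
print(url)
```

Keeping query construction separate from URL encoding makes it easy to swap in the real filter names once they are confirmed.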

Can I Use the News API for Financial Analysis?

Those trading the financial markets, particularly high-frequency traders, need access to real-time market data to stay on top of market and geopolitical trends. Webhose’s News API can give traders a broader perspective on the financial markets by adding global news, forums, reviews, and blogs to the information sources on which they base trading decisions, helping them make smarter, more profitable decisions. AI-powered decision-making is currently responsible for more than two-thirds of global financial transactions. A rapid supply of data is required to feed the continuously improving machine learning models that power these technologies. Additionally, by correlating historic market data with subsequent market movements, financial institutions can identify trends that inform future trading decisions and develop predictive analytics systems to estimate the future trajectory of financial instruments and markets.
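The idea of correlating a historic series with subsequent market movements can be sketched as a lagged Pearson correlation. This is a minimal illustration, not a trading method: it assumes the historic series is a daily score (for example, an aggregate news sentiment value) aligned day by day with returns, and the function names and sample numbers are made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(signal, returns, lag=1):
    """Correlate the signal on day t with the return on day t + lag."""
    if len(signal) != len(returns):
        raise ValueError("signal and returns must have the same length")
    if lag < 1 or lag >= len(signal) - 1:
        raise ValueError("lag must leave at least two aligned observations")
    # Drop the last `lag` signal values and the first `lag` returns so that
    # each remaining signal value is paired with a later return.
    return pearson(signal[:-lag], returns[lag:])


# Illustrative, made-up values: a daily score and same-day percentage returns.
daily_score = [0.2, -0.1, 0.4, 0.0, -0.3, 0.5]
daily_return = [0.1, 0.3, -0.2, 0.6, 0.1, -0.4]
print(lagged_correlation(daily_score, daily_return, lag=1))
```

A coefficient near zero suggests no linear relationship at that lag; even a strong coefficient on historical data does not by itself establish predictive power.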

Can I Use Webhose to Build My Own Datasets for Machine Learning?

Yes. Building artificial intelligence (AI) models that rely on machine learning requires a supply of datasets. Webhose delivers both historical and real-time data feeds at scale that can power use cases such as predictive analytics engines, natural language processing (NLP) tools, and financial analysis programs. Webhose users can draw on over 25TB of historical data through the open web and archived historical web data. In addition, those considering the service can evaluate it using free datasets that include information retrieved from blog posts, online message boards, and news articles. Webhose has been used successfully to power models that identify fake news.

How Can I Get Access to Data Feeds in Near Real-Time?

Webhose’s data feeds offer accurate, up-to-date information about your brand from relevant online sources, such as app reviews and rated discussions, which can be vital for keeping tabs on shifting product sentiment and the voice of the customer. Our web crawlers are scheduled to collect data from major websites several times a day and deliver it to you in a structured, machine-readable JSON format, ready for analysis.
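To give a sense of what "ready for analysis" can mean in practice, here is a minimal sketch that parses a JSON payload and lists the newest posts first. The payload shape (a top-level `posts` list whose items carry `title`, `published`, and `text`) is an assumption for this example, and the sample records are invented, so adjust the field names to match the actual feed.

```python
import json

# Invented sample payload; the field names are assumptions, not the real schema.
sample_response = """
{
  "posts": [
    {"title": "Crashes on launch", "published": "2020-01-01T08:30:00Z",
     "text": "App fails to open."},
    {"title": "Great update", "published": "2020-01-02T10:00:00Z",
     "text": "Love the new version."}
  ]
}
"""


def latest_posts(raw_json, limit=10):
    """Return (published, title) pairs, newest first."""
    payload = json.loads(raw_json)
    posts = payload.get("posts", [])
    # ISO 8601 timestamps in the same time zone sort correctly as strings.
    ordered = sorted(posts, key=lambda p: p.get("published", ""), reverse=True)
    return [(p.get("published"), p.get("title")) for p in ordered[:limit]]


for published, title in latest_posts(sample_response):
    print(published, title)
```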

What Tools Can I Use to Acquire Data from the Web?

Scraping tools automatically capture and export information from the internet, and can also detect and output both syntactic and semantic components, such as phone numbers, email addresses, and other contact details. Dedicated web scrapers, RSS feeds, and various web APIs are among the most popular tools for acquiring data from the internet.
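Detecting syntactic components such as email addresses and phone numbers is often done with pattern matching. The sketch below uses two deliberately simple regular expressions chosen for this example; they are illustrative assumptions, will miss many real-world formats, and do not represent how any particular scraping tool works.

```python
import re

# Simplified patterns for illustration only; real contact data is messier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional "+", a digit, then at least seven digits or separators, then a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_contacts(text):
    """Return the email addresses and phone-like strings found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


page_text = "Contact sales@example.com or call +1 (555) 010-9999 for details."
print(extract_contacts(page_text))
```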

Webhose provides an API that facilitates scraping the web at scale. Its spiders index, capture, and analyze millions of posts a day, including content from the dark and deep web, which is notoriously difficult to capture for structured analysis.

Does Webhose Offer Prepared Datasets?

Webhose offers a range of high-quality, free datasets spanning multiple content domains, including online reviews, news, blogs, and discussions. Additionally, millions of open and dark web posts are indexed every day and structured for delivery to clients by API. Both the API results and the static datasets available for free download include extracted elements (entities common to a particular source type), inferred elements such as language and author name, and enriched data such as web ranking and social distribution volume. Students, researchers, and commercial enterprises that want straightforward access to pre-structured web data in a unified format (the first stage in building viable AI and machine learning models) can all use Webhose’s datasets to further their objectives without having to spend time developing their own data collection and structuring systems.