Case Study: DataRobot
We’re all familiar with clickbait content — articles with catchy headlines but little substance behind them. As many online publishers continue to incentivize their writers to produce this kind of content, demand is growing for ways to flag it and make it easier for online readers to access more in-depth, quality articles instead.
DataRobot has developed a highly accurate predictive model for flagging which articles might be clickbait. Its algorithm relies heavily on the detected virality of the webpage, which is strongly correlated with this type of writing. Webhose supplied DataRobot with the metadata it needed to analyze internet articles, such as article title, text, publisher, author, detected sentiment, and virality.
Using Webhose’s data, DataRobot discovered correlations between the virality score Webhose provided and specific keywords, titles, and external links. This enabled the company to improve the predictive accuracy of its algorithm and to identify certain words, phrases, and attributes (such as a large number of external links) that are more likely to drive viral news.
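To make the idea of correlating virality with article attributes concrete, here is a minimal sketch in Python. It is not DataRobot's method or Webhose's schema: the field names (`title`, `external_links`, `virality`), the helper functions, and the sample records are all hypothetical and made up for illustration. It simply computes a Pearson correlation between a virality score and two kinds of features, an external link count and the presence of a keyword in the title.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant feature carries no signal; report zero correlation.
        return 0.0
    return cov / sqrt(var_x * var_y)


def feature_correlations(articles, keywords):
    """Correlate virality with link counts and keyword-in-title flags.

    `articles` is a list of dicts with hypothetical keys
    'title', 'external_links', and 'virality'.
    """
    virality = [a["virality"] for a in articles]
    result = {
        "external_links": pearson(
            [float(a["external_links"]) for a in articles], virality
        )
    }
    for kw in keywords:
        flags = [1.0 if kw in a["title"].lower() else 0.0 for a in articles]
        result["keyword:" + kw] = pearson(flags, virality)
    return result


# Made-up sample records, for illustration only (not real Webhose data).
sample_articles = [
    {"title": "Shocking trick doctors hate", "external_links": 1, "virality": 0.4},
    {"title": "Quarterly report on housing", "external_links": 8, "virality": 0.7},
    {"title": "Shocking news about your pet", "external_links": 2, "virality": 0.5},
    {"title": "In-depth analysis of energy policy", "external_links": 12, "virality": 0.9},
]

if __name__ == "__main__":
    for name, r in feature_correlations(sample_articles, ["shocking"]).items():
        print(f"{name}: {r:+.2f}")
```

A real pipeline would use far more records and features, and a correlation like this only suggests which attributes are worth feeding into a predictive model; it does not by itself show that an attribute causes virality.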
Thanks to the insights it derived from Webhose data, DataRobot has been able to demonstrate to clients that there are ways to drive viral content other than simply writing clickbait headlines. The company has since developed over 90 different machine learning models and rolled out to customers the ones that deliver the most accurate viral prediction score.
DataRobot is the category creator and leading provider of automated machine learning. Organizations worldwide use DataRobot to empower the teams they already have in place to rapidly build and deploy machine learning models.
Webhose.io, the brainchild of Ran Geva and Guy Mor, two entrepreneurs with extensive experience in technology, data mining, and product development, provides on-demand access to web data feeds. Webhose empowers you to build, launch, and scale data-driven operations as you grow. Every web data feed is optimized to deliver up-to-the-minute coverage of a specific content domain, including news, blogs, online discussions and forums, and more.