June 2019

Gooooolll! Who Will Win The World Cup?

Posted on July 15, 2018 by

It’s been a month since the World Cup began and, as usual, there have been quite a few surprises. Seriously – did anyone see Germany getting bumped in the first round?! Here at Webhose, everyone was psyched, so we held a friendly competition to predict the winners. Not surprisingly, the majority of

Continue reading

Posted in Machine Learning

How Alternative Data is Reshaping Finance

Posted on May 24, 2018 by

According to a report recently featured in the Financial Times (PDF), hedge funds are expected to spend upwards of $600m on digital datasets this year, and up to $1bn by 2020. What’s going on? Why are investment firms hoarding all this data, and what types of data are piquing their interest in particular? Read on

Continue reading

Posted in Big Data

What is the Omgili Bot, and why is it Crawling Your Website?

Posted on December 28, 2017 by

Hi there. If you’re reading this, it’s probably because you’ve run into Omgilibot – perhaps in your web analytics or server logs (user agent: omgili/0.5 +https://omgili.com) – and turned to Google to decide whether this crawler is a benevolent creature that should be permitted to do as it will, or something more nefarious that deserves

Continue reading

Posted in API
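If you want to check how often the Omgili bot described in "What is the Omgili Bot, and why is it Crawling Your Website?" visits your site, one option is to scan your server logs for its user agent string (omgili/0.5 +https://omgili.com). The sketch below assumes your logs use the Apache/Nginx "combined" format, where the user agent is the last double-quoted field on each line; the function name `omgili_hits` and the sample log lines are hypothetical.

```python
import re
from collections import Counter

# Assumption: "combined" log format, so the user agent is the last
# double-quoted field at the end of each line.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')

def omgili_hits(log_lines):
    """Count requests per client IP whose user agent mentions omgili."""
    hits = Counter()
    for line in log_lines:
        match = LAST_QUOTED_FIELD.search(line)
        if match and "omgili" in match.group(1).lower():
            # In the combined format the client IP is the first field.
            ip = line.split(" ", 1)[0]
            hits[ip] += 1
    return hits

# Hypothetical sample lines using documentation IP ranges.
sample = [
    '203.0.113.7 - - [28/Dec/2017:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "omgili/0.5 +https://omgili.com"',
    '198.51.100.2 - - [28/Dec/2017:10:00:05 +0000] "GET /about HTTP/1.1" 200 256 "-" "Mozilla/5.0"',
]
print(omgili_hits(sample))  # Counter({'203.0.113.7': 1})
```

Matching on the substring "omgili" rather than the exact string keeps the check working if the version number in the user agent changes.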

What is DaaS, BDaaS, DBaaS? And Why Should You Care?

Posted on August 8, 2017 by

The proliferation of data services has created a wide range of confusing buzzwords and acronyms – but at its core DaaS is still a meaningful concept. We are living in the age of everything as a service (EaaS?). What started as the simple and fairly easy-to-understand concept of software as a service has,

Continue reading

Posted in Big Data

Crawling the Dark Web to Detect the Next Market

Posted on July 25, 2017 by

Over the past few days, the internet has been abuzz with talk of the recent blows dealt by law enforcement to two major dark web “marketplaces”, AlphaBay and Hansa Market, and of the subsequent suicide of Alexander Cazes – the Canadian programmer-turned-criminal mastermind behind AlphaBay – who ended his own life in a Thai prison while awaiting

Continue reading

Posted in Dark Web

Can Data Science Deliver a Fake News Detector?

Posted on April 4, 2017 by

Regardless of your political opinion, fake news has dominated the conversation since the 2016 US presidential election. The crux of the problem is that the very definition of what qualifies as fake news is in dispute. Still, most of us would like to know if the news story we’re reading reflects actual events – or

Continue reading

Posted in Machine Learning

The Hackathon Award for Best API Mashup Goes to…

Posted on March 26, 2017 by

Programming competitions, commonly referred to as hackathons, offer a great opportunity for new talent to show what they can do. Much as in professional sports, industry leaders send recruiters to scout out the top performers. With high stakes on the line and limited resources, getting noticed as a hackathon winner not only looks good on

Continue reading

Posted in API

Webhose.io API Featured in New Guide to Web Development with Django

Posted on March 12, 2017 by

Last February, co-authors Leif Azzopardi and David Maxwell completed the latest edition of their book Tango with Django. It presents an excellent step-by-step approach to learning web development in Python with the popular Django framework v1.9 (also compatible with v1.10). Although the book is designed as a beginner’s guide to web development, the material is packed with tips even

Continue reading

Posted in API

How to use rated reviews for sentiment classification

Posted on February 9, 2017 by

Sentiment classification is a fascinating use case for machine learning. Regardless of complexity, you need two core components to deliver meaningful results: a machine learning engine and a significant volume of structured data to train that engine. Last month, we added a new “rating” field for rated review sites covered in the Webhose.io threaded

Continue reading

Posted in API
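The idea in "How to use rated reviews for sentiment classification" is that a review's rating can serve as a ready-made label for training a sentiment classifier. Here is a minimal sketch of that labeling step, assuming each post is a dict with a numeric `rating` on a 1-5 scale and a `text` field holding the review body; the `text` field name, the thresholds, and both function names are hypothetical, not the Webhose.io schema.

```python
from typing import Iterable, List, Optional, Tuple

def rating_to_label(rating: float) -> Optional[str]:
    """Map a 1-5 star rating to a coarse sentiment label.

    Hypothetical thresholds: 4-5 stars are positive, 1-2 stars are
    negative, and anything in between is treated as ambiguous.
    """
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return None

def build_training_set(posts: Iterable[dict]) -> List[Tuple[str, str]]:
    """Turn rated review posts into (text, label) pairs for a classifier."""
    examples = []
    for post in posts:
        rating = post.get("rating")
        if rating is None:  # unrated posts carry no label, so skip them
            continue
        label = rating_to_label(float(rating))
        if label is not None:  # drop ambiguous middle ratings
            examples.append((post["text"], label))
    return examples

posts = [
    {"text": "Great battery life", "rating": 5},
    {"text": "It's okay", "rating": 3},
    {"text": "Broke after a week", "rating": 1},
    {"text": "No stars given"},
]
print(build_training_set(posts))
# [('Great battery life', 'positive'), ('Broke after a week', 'negative')]
```

Dropping middle ratings trades some training volume for cleaner labels; whether that trade is worth it depends on how much rated data you have.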

Can Crawled Web Data Tell the Future?

Posted on November 14, 2016 by

Robert Tercek’s book Vaporized: Solid Strategies for Success in a Dematerialized World recently won getAbstract’s 2016 International Book of the Year award at the Frankfurt Book Fair. Based in Hollywood, Robert has spent his entire career creating interactive content and inspiring others to do the same. He was kind enough to share a few words

Continue reading

Posted in Big Data

Should you buy crawled web data or build your own solution?

Posted on October 10, 2016 by

In a technologically driven environment, the temptation to develop a proprietary web crawling solution is virtually irresistible. Our latest report examines the true cost of the computing and software development resources required to deliver a data crawling and structuring solution at scale:

Development & Maintenance

Development could mean coding a proprietary solution from scratch, or modifying an existing crawling

Continue reading

Posted in Technology

Guide to Structured Web Data Consumption: How to get instant access to news, blogs, and online discussions

Posted on September 1, 2016 by

Hundreds of entrepreneurs, researchers, and data scientists contact us daily with questions about accessing structured web data. We’ve compiled our answers in our new guide to Structured Web Data Consumption.

The consumerization of web data

It’s easy to fall into the trap of building a proprietary crawling and data structuring solution tailored to

Continue reading

Posted in API

5 Ways to Measure the Impact of Crawled Web Data on Your Business

Posted on July 27, 2016 by

The analysis you provide is only as good as the raw data you start with. Although data from the open web is often perceived as a commodity, not all crawled data is created equal. Whether you’re relying on a proprietary crawling technology, tapping into a vendor’s firehose, or implementing a combination of both strategies –

Continue reading

Posted in Technology

Why Extracting Content From The Open Web Is Better than Surveys for Research

Posted on March 21, 2016 by

What’s the best way to find out how people feel about a given topic? Simply ask them, right? Well, at least that’s what we’ve been led to believe. Standard polling practice tells us that if you put together some questions, pose them to a group of people and then “normalize” the data to account for

Continue reading

Posted in API

100% coverage of the Web

Posted on March 9, 2016 by

Well, that’s the holy grail. Being able to tap into the World Wide Web as a whole is something anyone dealing with data would like to have, but everyone is far, FAR from achieving it (except maybe the NSA; we don’t know). The idea behind Webhose.io is that when you need data from the web,

Continue reading

Posted in API

How Crawled Data Gave One News Outlet the Edge in the Israeli Election

Posted on February 18, 2016 by

In the spring of 2015, as Israel prepared for general elections, virtually all of the mainstream media analysts believed that change was in the air. Conventional wisdom at that time had it that the Israeli populace was ready to turn its back on Prime Minister Benjamin Netanyahu and the government led by his Likud Party

Continue reading

Posted in API

The Top 10 Data & Analytics Articles of 2015

Posted on January 12, 2016 by

The online world of data and analytics is fast approaching epic proportions. It’s easy to get overwhelmed. Why? Because not only was big data big business in 2015, but posts, articles, podcasts, webinars, and resources abound. Some are worth your time. Some … are not. To help you dig through the very best

Continue reading

Posted in Big Data

Social Media Analytics: Insights from Structured versus Unstructured Data

Posted on December 1, 2015 by

Let’s be honest … social media is a challenge. Not only is staying current, active, and “topped off” a chore, but crafting full-scale campaigns that contribute to your business’ and brand’s actual goals can be bewildering. At the same time, the market for social media continues to grow. According to recent data from eMarketer, “Social Network

Continue reading

Posted in Big Data

Building a Better Search Query

Posted on December 10, 2014 by

Many factors can affect the relevancy of streaming data. When the data you consume is ordered not by relevancy but by the time it was crawled, filtering for relevant posts is essential. I would like to share with you a few tips you can use to greatly increase the relevancy of the data you consume via the Webhose.io API

Continue reading

Posted in API

Webhose.io Tips & Tricks: Search for Reviews

Posted on December 10, 2014 by

Are you looking to focus your data search specifically on consumer-generated reviews? Here are a couple of simple Webhose.io tricks that might help:

Limit your query to specific sites

You can limit your search to specific “review sites” like amazon.com, bestbuy.com, newegg.com, cnet.com, engadget.com, pcmag.com, etc. Here is an example of how you should

Continue reading

Posted in API
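The exact Webhose.io query syntax from "Webhose.io Tips & Tricks: Search for Reviews" is cut off in this excerpt, so the sketch below does not reproduce it. Instead, it shows the same idea as a client-side filter: keep only posts whose URL belongs to one of the review sites listed in the post. It assumes each post is a dict with a `url` field; the field name and the function names are hypothetical.

```python
from urllib.parse import urlparse

# Review sites named in the post; extend as needed.
REVIEW_SITES = frozenset({
    "amazon.com", "bestbuy.com", "newegg.com",
    "cnet.com", "engadget.com", "pcmag.com",
})

def is_review_site(url: str, sites=REVIEW_SITES) -> bool:
    """True if the URL's host is a listed site or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # Match the bare domain or any subdomain (e.g. www.amazon.com),
    # but not look-alikes such as notamazon.com.
    return any(host == site or host.endswith("." + site) for site in sites)

def filter_reviews(posts):
    """Keep only posts hosted on one of the review sites."""
    return [post for post in posts if is_review_site(post["url"])]

posts = [
    {"url": "https://www.bestbuy.com/site/reviews/1"},
    {"url": "https://example.org/blog/2"},
]
print(filter_reviews(posts))
# [{'url': 'https://www.bestbuy.com/site/reviews/1'}]
```

Filtering inside the query itself is usually cheaper, since it avoids downloading posts you will discard; a client-side check like this is mainly useful as a safety net or when combining data from several sources.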

Crawling Horrors – Computer Vision Crawlers

Posted on November 26, 2014 by

So if RSS crawlers are bad and browser scraping isn’t efficient, what about computer vision web page analyzers? This technology uses machine learning and computer vision to extract information from web pages by interpreting them visually, as a human being might.

Continue reading

Posted in Technology