How we quadrupled the performance of Elasticsearch

Posted on July 19, 2015 by Ran Geva

Well, that’s a misleading title. We actually quadrupled the performance of our brand monitoring alert system, which uses Elasticsearch’s Percolator, but that would have been a much longer title. Some background: Buzzilla has two main products. The first is Webhose.io, which provides businesses worldwide with access to structured data from the open web, and the second...

Posted in Technology

Webhose.io Tip: Search for top performing (viral) posts

Posted on April 30, 2015 by Ran Geva

Here at Webhose, our crawlers download millions of posts a day from millions of sources. When searching for web data among these many sources, you may want to limit your results to news or blog posts that had some kind of social impact. To provide you with this capability, we are introducing a new score...

Posted in API

Building a Better Search Query

Posted on December 10, 2014 by Ran Geva

Many factors can affect the relevancy of streaming data. When the data you consume is ordered not by relevancy but by the time it was crawled, getting the relevant posts is essential. I would like to share a few tips you can use to greatly increase the relevancy of the data you consume via the Webhose.io API...

Posted in API

Webhose.io Tips & Tricks: Search for Reviews

Posted on December 10, 2014 by Ran Geva

Are you looking to focus your data search specifically on consumer-generated reviews? Here are a couple of simple Webhose.io tricks that might help. Limit your query to specific sites: You can limit your search to specific “review sites” like amazon.com, bestbuy.com, newegg.com, cnet.com, engadget.com, pcmag.com, etc. Here is an example for how you should...

Posted in API

Vertical aggregation & Pattern matching crawlers

Posted on November 27, 2014 by Ran Geva

After bashing various crawling techniques, I would like to describe the technique we use here at webhose.io, a technology that was developed over the past 8 years. Our crawlers were developed with the following demands in mind: efficient on server resources, i.e., CPU & bandwidth; fast in fetching and extracting content; easily add new sites...

Posted in API