FAQ
Do you have any filter that will return data only from the top sites you crawl?
Yes. There are several ways to get higher-quality posts, either by restricting results to popular websites or by selecting popular posts. The first is the domain_rank filter, which ranks a domain by popularity (monthly traffic). To search for posts from the top 1,000 sites, use the following:
Can you share the list of sites you are crawling?
Webhose.io doesn’t rely on a whitelist to crawl the web. Our crawlers discover new sites and new content dynamically, so any list we sent would be incomplete and misleading. To find out whether we crawl a particular source, either use the “site:” filter or email email@example.com with the list of sites you want to check.
Can I disable stemming when searching for an exact term?
Yes. Just append a dollar sign ($) to the end of the keyword. For example, searching for the keyword “simplivity” will also return hits for the word “simple”, since we index the stemmed version of each word. To find only documents that contain the exact term “simplivity”, search for “simplivity$”.
Stemmed searches are currently supported for English, Spanish, Arabic and Russian.
Do you rate limit API calls?
Yes. Rate limits are applied per access token. Each token can make one request per second. Exceeding the rate limit results in an HTTP 429 error.
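One way to stay under the limit is to space requests on the client side. The following is a minimal sketch, not part of the Webhose.io API: the Throttle class and the call_with_retry helper are hypothetical names, and send stands in for whatever function performs your HTTP request and returns a (status_code, body) pair.

```python
import time


class Throttle:
    """Spaces calls at least `min_interval` seconds apart (client side only)."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous call; None before the first one

    def wait(self):
        """Block until `min_interval` seconds have passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def call_with_retry(send, throttle, max_attempts=3):
    """Call `send()` up to `max_attempts` times, retrying only on HTTP 429.

    `send` is a hypothetical callable returning (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for _ in range(max_attempts):
        throttle.wait()
        status, body = send()
        if status != 429:
            break
    return status, body
```

Create one Throttle per access token and share it across all calls made with that token, since the limit is counted per token.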
Does the API support wildcard expressions as the query?
The query syntax is based on Elasticsearch query string syntax, which means you can use wildcards.
Do you limit the length of the query, or the maximum number of Boolean clauses I can use?
The maximum length of a query is 4,000 characters.
Does the API support nested boolean expressions as well?
Boolean expressions can be nested in as many levels as you want.
For example: (exp1 AND exp2 AND exp3) OR (exp4 AND (exp5 OR exp6)) -(exp7 AND (exp8 OR exp9))
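If you assemble such queries in code, small helpers can keep the parentheses balanced and check the 4,000-character limit before the request is sent. This is only a sketch: all_of, any_of, exclude and build_query are hypothetical helper names, not part of any Webhose.io client library.

```python
MAX_QUERY_LENGTH = 4000  # maximum query length stated in this FAQ


def all_of(*exprs):
    """Join expressions with AND inside one pair of parentheses."""
    return "(" + " AND ".join(exprs) + ")"


def any_of(*exprs):
    """Join expressions with OR inside one pair of parentheses."""
    return "(" + " OR ".join(exprs) + ")"


def exclude(expr):
    """Prefix an expression with the minus operator."""
    return "-" + expr


def build_query(*parts):
    """Join top-level parts with spaces and enforce the length limit."""
    query = " ".join(parts)
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(
            f"query is {len(query)} characters; the limit is {MAX_QUERY_LENGTH}"
        )
    return query


# Rebuilds the nested example shown above.
query = build_query(
    all_of("exp1", "exp2", "exp3"),
    "OR",
    all_of("exp4", any_of("exp5", "exp6")),
    exclude(all_of("exp7", any_of("exp8", "exp9"))),
)
print(query)
```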
Can I get the highlighted fragments that matched my query?
Yes. Just add highlight=true as a parameter to your call.
How can I get all the posts of a thread?
To extract an entire thread, use the “thread.url” filter. This will return all the posts belonging to the thread URL provided. Example:
Note that you must escape the http:// part of the URL, like so: http\:\/\/.
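The escaping can be done programmatically. The sketch below makes two assumptions: the filter is written as thread.url: followed by the value (mirroring the “site:” filter mentioned earlier), and only the scheme separator needs escaping, since that is the only part this FAQ calls out. The function name thread_url_filter and the example.com URL are illustrative only.

```python
def thread_url_filter(url):
    """Build a thread.url filter, escaping the '://' after the scheme.

    Assumption: only the scheme separator is escaped (http:// -> http\\:\\/\\/);
    the rest of the URL is passed through unchanged.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError("expected an absolute URL such as http://...")
    return "thread.url:" + scheme + "\\:\\/\\/" + rest


# Hypothetical thread URL, for illustration only.
print(thread_url_filter("http://example.com/forum/thread-1"))
```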
Pricing section says ‘100 results per request’. Does that mean we get only 100 results?
No. If your query produces more than 100 results, call the URL that appears under the “next” key of the result set to receive the next page, which contains the next set of up to 100 posts.
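Following the “next” key in a loop lets you walk through all pages. The sketch below is written under stated assumptions: fetch is a hypothetical function you supply that requests a URL and returns the parsed JSON as a dict, and the list of posts is assumed to live under a “posts” key. Only the “next” key is taken from this FAQ. The loop stops on an empty page, on a missing “next” value, or after an optional page cap.

```python
def iter_posts(fetch, first_url, max_pages=None):
    """Yield posts page by page by following the "next" key.

    `fetch(url)` is a hypothetical callable returning the parsed response
    as a dict. The "posts" key name is an assumption.
    """
    url = first_url
    pages = 0
    while url:
        page = fetch(url)
        posts = page.get("posts", [])
        if not posts:
            break  # an empty page means there is nothing more to read
        yield from posts
        pages += 1
        if max_pages is not None and pages >= max_pages:
            break
        url = page.get("next")
```

Because the API allows one request per second per token, fetch should pause between calls when many pages are read.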