Dark Web Monitoring FAQ
Get answers to all your questions about the Webhose Dark Web Monitoring and Data Breach Detection service.
What Tools Can I Use to Monitor the Dark Web?
Scanners, crawlers, and scraping tools can all be used to extract data from the dark web. Because much of the content criminals upload to these sites is ephemeral, low-latency extraction tools are preferable for capturing information for analysis.
Webhose’s Cyber API scans and extracts data from millions of dark web (.onion) sites, files, marketplaces, messaging apps, and forums and can serve the data extracted in both structured and unstructured formats.
Webhose’s technology also understands the meaning of abbreviations commonly used by criminals operating on these networks, such as “DUMP” (full credit card information), “fullz” (full package of an individual’s information), and “fishscale” (high-quality drugs). It also retrieves information from password-protected deep web sites and communities, indexes gated content, and can automatically solve complex triple CAPTCHA puzzles.
Because information posted on the dark web is often sensitive, prospective users of the Cyber API must pass a short approval process, which is not required for Webhose’s open web APIs. National security and law enforcement agencies are among those whose dark web interception and analysis efforts are powered in part by Webhose’s service.
How Does Webhose Extract Data from the Dark Web?
Gathering data from the dark web is difficult. Unlike the open web, the dark web offers no straightforward means of indexing its content, and criminals tend to migrate data between its websites, networks, and secret forums to keep law enforcement and national security agencies at bay.
To let customers search through this information with their own analysis tools, Webhose’s team of cyber analysts constantly monitors the dark web to develop and maintain a proprietary index of websites to crawl. This continuously updated index includes millions of active properties, many of which facilitate illicit activities.
The API is accessed through a simple RESTful call. It can output both structured and unstructured data in machine-readable formats and can be queried for entities such as email addresses, organizations, locations, and cryptocurrency wallet IDs. In particular, Webhose’s crawling bots focus on capturing non-public information (NPI), personally identifiable information (PII), and information that may have implications for national security.
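To make the request-and-query flow more concrete, here is a minimal Python sketch of how a client might build a query URL and pull one entity type out of a structured response. It is an illustration only: the endpoint URL, the parameter names (token, format, q), and the response layout (a "posts" list with an "entities" mapping) are assumptions made for this example, not Webhose’s documented interface, and the sample response is made-up data so the sketch runs offline.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, used only for illustration.
# They are NOT taken from the Webhose documentation.
BASE_URL = "https://api.example.com/cyber/search"


def build_query_url(token: str, query: str, fmt: str = "json") -> str:
    """Return the GET URL for one search request."""
    params = {"token": token, "format": fmt, "q": query}
    return f"{BASE_URL}?{urlencode(params)}"


def collect_entities(response_text: str, entity_type: str) -> list[str]:
    """Gather the unique values of one entity type across all returned posts."""
    payload = json.loads(response_text)
    found: list[str] = []
    for post in payload.get("posts", []):
        for value in post.get("entities", {}).get(entity_type, []):
            if value not in found:  # keep first-seen order, drop repeats
                found.append(value)
    return found


# Made-up response body in the assumed shape, so no network call is needed.
SAMPLE_RESPONSE = json.dumps({
    "posts": [
        {"title": "post A",
         "entities": {"emails": ["a@example.com"], "wallet_ids": ["wallet-1"]}},
        {"title": "post B",
         "entities": {"emails": ["a@example.com", "b@example.com"]}},
    ]
})

url = build_query_url("YOUR_TOKEN", "example.com")
emails = collect_entities(SAMPLE_RESPONSE, "emails")
```

In a real integration, the URL would be fetched with an HTTP client and the response body passed to collect_entities in place of SAMPLE_RESPONSE. The actual field names should be taken from the Cyber API documentation provided after approval.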