What Is a TOR Search Engine?

Posted on July 26, 2020 by Yuval Michaeli


What is a TOR search engine? How does it work? And why is it so difficult to monitor the TOR network, in comparison to the open web?

A TOR search engine is a tool for searching the TOR network, which makes up the vast majority of the dark web (also called the darknet).

But in order to understand what a TOR search engine is, you first need a basic understanding of the TOR network and browser. You’ll also need to understand how they work together to offer anonymous and private browsing of the internet.

Here at Webhose we’ve decided to put together a brief explanation of TOR, its history, and an overview of the different TOR search engines that exist. It’s important to note, though, that this review covers TOR search engines for public, non-commercial use. These search engines don’t have the same advantages as commercial technology.

TOR and the Onion Network 

TOR (The Onion Router) is software that allows for anonymous communication through onion routing. Data is wrapped in multiple layers of encryption and sent through network nodes called onion routers. Each router removes one layer of encryption before passing the data to the next node, until it arrives at its final destination. This multi-layered encryption, similar to the layers of an onion, is how TOR got its name.
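To make the layering idea concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how TOR is actually implemented: the XOR "cipher" built on SHA-256 is a stand-in for real encryption and is not secure, and the relay names and the wrap and peel helpers are our own illustrative choices. It only shows that the sender wraps the message once per relay and that each relay can strip exactly one layer and learn only the next hop.

```python
import hashlib
import os
from typing import List, Tuple


def keystream(key: bytes, length: int) -> bytes:
    """Stretch a key into `length` bytes with SHA-256 (toy construction, not secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer; XOR with the same keystream undoes itself."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(message: bytes, path: List[Tuple[str, bytes]]) -> bytes:
    """Sender side: build the onion from the inside out, one layer per relay."""
    onion, next_hop = message, "destination"
    for name, key in reversed(path):
        # Each layer hides the next hop's name plus everything inside it.
        onion = xor_layer(key, next_hop.encode() + b"\n" + onion)
        next_hop = name
    return onion


def peel(onion: bytes, key: bytes) -> Tuple[str, bytes]:
    """Relay side: remove one layer and learn only where to forward the rest."""
    next_hop, _, inner = xor_layer(key, onion).partition(b"\n")
    return next_hop.decode(), inner


# Hypothetical three-relay path; names and keys are illustrative only.
path = [(name, os.urandom(32)) for name in ("guard", "middle", "exit")]
packet = wrap(b"hello", path)

hops = []
for _, key in path:
    hop, packet = peel(packet, key)
    hops.append(hop)

print(hops, packet)
```

Each relay sees only the name of the next hop and an opaque blob, and the original message appears only after the last layer is removed.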

The TOR network is made up of volunteers who connect their computers to the network. Once connected, those computers act as nodes that support the network and increase its security. The relay of data through these nodes is what gives TOR the anonymity, privacy, and decentralization that make it safe for whistleblowers, journalists, and citizens of oppressive regimes to use, as well as for private citizens who simply don’t want their ISP to see their browsing activity. Since it offers such a high level of security, however, TOR is also used by criminals, hackers, and terrorists to buy and sell illegal goods and services, breach data, and plot attacks on countries and organizations.

The TOR Network and Browser 

Sites in the TOR network are easy to identify because their URLs end in “.onion” and they can’t be accessed with a regular browser. The TOR network has its own TOR browser so that you can access these .onion sites. The browser lets you view not just TOR sites, which make up the majority of the dark web, but also the “regular” internet (what we call the open web) anonymously and privately. Users of the open web can use TOR to prevent Internet Service Providers (ISPs) and advertisers from collecting data about their online activity.
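As a rough illustration of how a tool might recognize these addresses, the sketch below checks whether a URL points at a .onion host. The function names are our own, and the check is deliberately simple. The second helper relies on the fact that current (version 3) onion addresses are 56 characters drawn from the base32 alphabet; older, shorter addresses would fail that check even though they end in .onion.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Version 3 onion labels: 56 characters from the base32 alphabet (a-z, 2-7).
V3_LABEL = re.compile(r"^[a-z2-7]{56}$")


def onion_host(url: str) -> Optional[str]:
    """Return the hostname if the URL points at a .onion site, else None."""
    host = urlparse(url).hostname  # lowercased; None when the URL has no scheme
    if host and host.endswith(".onion"):
        return host
    return None


def looks_like_v3(host: str) -> bool:
    """Check the label directly before '.onion', ignoring any subdomains."""
    label = host[: -len(".onion")].split(".")[-1]
    return bool(V3_LABEL.match(label))


# Placeholder address for illustration only, not a real site.
example = "http://" + "a" * 56 + ".onion/index.html"
host = onion_host(example)
print(host is not None and looks_like_v3(host))
print(onion_host("https://example.com/"))
```

Note that urlparse only finds a hostname when the URL includes a scheme such as http://, so bare addresses would need a scheme added first.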

The TOR Browser and DuckDuckGo

In 2016, DuckDuckGo became the official search engine for the TOR browser.

DuckDuckGo is a search engine that is known for enabling private browsing because it does not store user information such as IP addresses or search history. When combined with the TOR network, it complements the user experience to provide even more anonymity and privacy. 

But here’s the thing: DuckDuckGo as a search engine does not return results from the onion network. It only returns search results for the clearnet, or open web. To dive deeper and get coverage of the dark web and the TOR network, you’ll need a TOR search engine, one with access to .onion sites.

The Need for a TOR Search Engine 

Multiple TOR search engines currently exist that explore the TOR network, such as Candle, Torch, Kilos, AHMIA, and Tor66. The problem is that most dark web search engines, aside from AHMIA, offer only partial coverage of dark web sites like marketplaces and forums. In addition, with the exception of Kilos, they don’t have granular filtering capabilities, so your search results aren’t always relevant. And none of them can return search results for content blocked by logins and paywalls.

There are also a few TOR directories, like Dark.fail, Darknetmarketslink and others, which list hyperlinks to active sites on the dark web and divide them into categories. However, these directories are often outdated and can contain broken links. Their coverage of the dark web is limited. And since different directories give you different links, you’ll need to constantly cross-reference them to make sure you didn’t miss a listed site somewhere.
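The cross-referencing chore can be sketched in a few lines of Python. This is only an illustration: the directory names and .onion addresses are made-up placeholders, and the normalization (trimming whitespace and a trailing slash) is a simplifying assumption about how links differ between listings.

```python
from typing import Dict, Set


def normalize(link: str) -> str:
    """Treat 'http://x.onion/' and 'http://x.onion' as the same link."""
    return link.strip().rstrip("/")


def missing_links(directories: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each directory, list the links that appear elsewhere but not in it."""
    cleaned = {
        name: {normalize(link) for link in links}
        for name, links in directories.items()
    }
    all_links = set().union(*cleaned.values())
    return {name: all_links - links for name, links in cleaned.items()}


# Hypothetical directory listings, for illustration only.
listings = {
    "directory_a": {"http://one.onion/", "http://two.onion"},
    "directory_b": {"http://two.onion/", "http://three.onion"},
}
gaps = missing_links(listings)
print(gaps)
```

The union of all listings is the best coverage you can get from directories alone, and each gap set shows what a single directory would have caused you to miss.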

In other words, if you want robust coverage of the TOR network and the most relevant search results, you can’t rely on the dark web search engines or directory listings that currently exist.

Why It’s Hard to Build a TOR Search Engine 

It turns out that the dark web search engines above have limitations because it’s fairly complicated to build a TOR search engine. As experts in developing technology that crawls and extracts data from the open web, we thought adding dark web data to our data repositories would be fairly simple. But here are just a few challenges we encountered.

A Lack of a Search Directory 

The dark web isn’t indexed in the same way as the open web. Unlike the open web, sites on the dark web do not wish to be discovered. External links to other sites are rare, and a limited number of directories exist that map the dark web. So while the ability to return search results can be automated, human intervention is still necessary to read and discover the important sources to crawl. And that’s not enough either, because the dark web is expanding exponentially. So you also have to rely on special dark web monitoring technology that can constantly discover new sites, networks, and marketplaces within the complexities of the dark web. 
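The automated part of discovery can be pictured as a crawl frontier: start from a handful of known pages, pull out every link, and queue any .onion address not seen before. The sketch below is a simplified illustration, not our production crawler. The fetch function is a stand-in that reads from an in-memory dictionary of made-up pages, where a real crawler would request pages through TOR.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Iterable, List, Optional, Set
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover(
    seeds: Iterable[str],
    fetch: Callable[[str], Optional[str]],
    max_pages: int = 100,
) -> Set[str]:
    """Breadth-first discovery of .onion URLs reachable from the seed pages."""
    seen = set(seeds)
    queue = deque(seen)
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:  # page unreachable or not available
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            host = urlparse(absolute).hostname or ""
            if host.endswith(".onion") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return seen


# Made-up pages standing in for real network requests.
FAKE_PAGES = {
    "http://seed.onion/": '<a href="http://market.onion/">m</a> '
                          '<a href="https://example.com/">open web</a>',
    "http://market.onion/": '<a href="/listings">listings</a>',
}
found = discover(["http://seed.onion/"], FAKE_PAGES.get)
print(sorted(found))
```

Because sites on the dark web rarely link to each other, a frontier like this tends to run dry quickly, which is why the seed list still depends on human research.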

Returning Relevant Results 

Since the dark web isn’t neatly indexed the way the open web is by Google and other search engines, it can be difficult to get relevant results. You’ll have a hard time finding what you want without a way of sorting data into categories such as drugs, terror, and PII (Personally Identifiable Information). That’s why, here at Webhose, we used machine learning to automatically categorize the content and extract multiple types of specific entities, such as credit card numbers, social security numbers, phone numbers, and emails, so that users can quickly locate relevant information in a simple and convenient way through our cyber API.
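To give a feel for what entity extraction involves, here is a small Python sketch. It is not the machine learning approach described above. It swaps in plain regular expressions plus a Luhn checksum to filter out digit runs that can't be card numbers, and the patterns are simplified assumptions that would miss many real-world formats.

```python
import re
from typing import Dict, List

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def extract_entities(text: str) -> Dict[str, List[str]]:
    """Return candidate emails, SSN-shaped strings, and Luhn-valid card numbers."""
    cards = [m.group() for m in CARD.finditer(text) if luhn_valid(m.group())]
    return {
        "emails": EMAIL.findall(text),
        "ssns": SSN.findall(text),
        "cards": cards,
    }


# Sample text uses a well-known test card number and made-up details.
sample = (
    "Contact seller@example.com, SSN 123-45-6789, "
    "card 4111 1111 1111 1111, fake 1234 5678 9012 3456"
)
entities = extract_entities(sample)
print(entities)
```

The second 16-digit string in the sample fails the checksum and is dropped, which shows why validation matters as much as pattern matching when hunting for leaked card data.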

CAPTCHAs and Passwords Block Crawling 

The majority of domains on the dark web are gated by a CAPTCHA or login page. You need an effective way of bypassing these form authentication processes or you won’t be able to access most of the data on the dark web, even if you are constantly discovering new sites and marketplaces. You’ll be blocked. 
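Before a crawler can deal with a gate, it has to notice one. The sketch below is a heuristic illustration only: it flags a page as gated when it contains a password field or mentions a CAPTCHA, and it does not attempt to get past either. The class name, the labels, and the rules themselves are our own assumptions, and real pages will often need more careful checks.

```python
from html.parser import HTMLParser


class GateDetector(HTMLParser):
    """Look for signs that a page is a login form or a CAPTCHA challenge."""

    def __init__(self) -> None:
        super().__init__()
        self.has_password_field = False
        self.mentions_captcha = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and (attributes.get("type") or "").lower() == "password":
            self.has_password_field = True
        # CAPTCHA widgets often reveal themselves in ids, classes, or image paths.
        if any("captcha" in (value or "").lower() for value in attributes.values()):
            self.mentions_captcha = True

    def handle_data(self, data):
        if "captcha" in data.lower():
            self.mentions_captcha = True


def classify_page(html: str) -> str:
    """Return 'captcha', 'login', or 'open' based on simple page heuristics."""
    detector = GateDetector()
    detector.feed(html)
    if detector.mentions_captcha:
        return "captcha"
    if detector.has_password_field:
        return "login"
    return "open"


print(classify_page('<form><input type="password" name="pw"></form>'))
print(classify_page('<img src="/captcha.png">'))
print(classify_page("<p>Welcome</p>"))
```

A crawler could use a label like this to route gated pages to a separate handling step instead of silently recording an empty result.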

Legal Implications 

Finally, there are legal challenges in providing dark web data to individuals and organizations. The dark web monitoring technology you ultimately select should abide by KYC (Know Your Customer) principles and have strict agreements about how clients can use it. At Webhose, we are careful to follow these regulations, and we are especially careful to sanitize sensitive information like credit card numbers and passwords to prevent abuse.
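As an illustration of what sanitizing can look like, the sketch below masks all but the last four digits of anything shaped like a card number and replaces values that follow a password label. It is a simplified example under our own assumptions about patterns and labels, not a description of the exact rules we apply.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
# A label such as "password:" or "pwd=" followed by a non-space value.
PASSWORD = re.compile(r"(?i)\b(password|pass|pwd)(\s*[:=]\s*)(\S+)")


def mask_card(match: "re.Match[str]") -> str:
    """Keep only the last four digits of a card-shaped number."""
    digits = re.sub(r"\D", "", match.group())
    return "*" * (len(digits) - 4) + digits[-4:]


def sanitize(text: str) -> str:
    """Mask card numbers and redact password values in a piece of text."""
    text = CARD.sub(mask_card, text)
    return PASSWORD.sub(lambda m: m.group(1) + m.group(2) + "[REDACTED]", text)


# Made-up sample using a well-known test card number.
clean = sanitize("card 4111 1111 1111 1111 password: hunter2")
print(clean)
```

Keeping the last four digits lets an analyst confirm that a specific card was exposed without the full number ever being stored or passed along.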

Shedding Light on the Dark Web 

The TOR search engine, network, and browser are tools that together let you explore the dark web and the open web with a high level of anonymity. As we’ve shown, many of the tools currently available to the public, including today’s dark web search engines and directories and even DuckDuckGo, have significant drawbacks. Here at Webhose we’ve worked hard to deliver comprehensive coverage of both open and dark web data, on the TOR network and beyond.

Want to learn more about how to monitor the TOR network with Webhose’s Cyber API? Schedule a call with our data experts today!