The web data business can get pretty tricky, especially when your job is to extract the broadest possible dataset from the planet’s biggest database. Last week, Webhose CEO Ran Geva ran a fun experiment to visualize Hillary Clinton’s web network. More precisely, he wanted to know: who are the top 100 people most frequently mentioned in news articles and blog posts that reference Hillary Clinton?
You might think a straightforward approach would be to simply list the names and rank them by number of mentions. But then you’d lose the networked component. Ran was interested not only in a list of names ranked by number of mentions, but in the relationship between them.
The Cool Useless Demo is Born
So he ran a simple script to query the Webhose.io index and return just the top 5 people mentioned most frequently alongside Hillary Clinton. Then for each of those names, he ran the same script again. Whenever the script returned a name that was already on the list, a new connection (also known as an “edge” in graph theory) was added to the network graph. The script kept running until the total number of names on the network graph reached 100.
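For the curious, here is a rough Python sketch of that expansion loop. It is not Ran’s actual script. The function `build_mention_graph` and the `top_co_mentions` lookup are hypothetical names made up for illustration; the lookup stands in for whatever query returns the people mentioned most often alongside a given name. The sketch also assumes that names are expanded in the order they were discovered and that a newly discovered name gets a connection to the person who led to it, neither of which the description above spells out.

```python
from collections import deque


def build_mention_graph(seed, top_co_mentions, per_person=5, max_names=100):
    """Grow a co-mention network outward from `seed`.

    `top_co_mentions(name, limit)` is a hypothetical callable supplied by
    the caller. It should return up to `limit` names most often mentioned
    alongside `name`. It is a stand-in, not a real Webhose.io API call.
    """
    names = [seed]          # nodes, in the order they were discovered
    seen = {seed}
    edges = set()           # undirected connections between two names
    queue = deque([seed])   # names still waiting to be queried

    while queue and len(names) < max_names:
        person = queue.popleft()
        for other in top_co_mentions(person, per_person):
            if other == person:
                continue
            if other not in seen:
                if len(names) >= max_names:
                    continue  # graph is full: skip brand-new names
                seen.add(other)
                names.append(other)
                queue.append(other)
            # Assumption: every returned pair becomes a connection,
            # whether the name is new or was already on the list.
            edges.add(frozenset((person, other)))

    return names, edges
```

Passing the lookup in as a parameter keeps the sketch independent of any particular data source, so it can be tried out with a small hand-written dictionary before pointing it at real data. The resulting names and connections are what a graph library would then draw as nodes and links.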
Ran then plugged the dataset into VivaGraphJS and voila! The interactive infographic and data viz Cool Useless Demo was born. Then something really weird happened. He decided that while it was really cool, there wasn’t much to do with the graph, so he was going to forget about it and move on to more important things. I begged him to show the world, and as a compromise promised to keep the geeky title of Cool Useless Demo.
You can learn more about how Ran created the Cool Useless Demo here, and if you know your way around the code you can even create your own version.
Have fun, stay cool, and don’t worry too much about usefulness!