Crawling Horrors – RSS Crawlers

Posted on November 24, 2014 by Ran Geva

One of the fastest, simplest and, unfortunately, wrong ways of extracting content from a website is to read its RSS feeds. I will show you how it’s done and why it’s useless. Each RSS feed already contains the data, structured and ready for harvesting, so content extraction is indeed simple and fast. Let’s take...

Posted in Technology