T3x Crawler results

About this website

This website is the result of a personal project. A crawler found all the sites listed here.

This project started as an idea during a Christmas vacation and was meant only as a test, an attempt to figure out how to crawl the internet.

What it does

The crawler was only a small script meant to grab the front page of a website, read some meta tags, and search for external links. Since it only fetches the front page, I expected the queue to fill up with a certain number of domains, after which the crawler would stop.
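
As a rough illustration of that step, here is a minimal Python sketch that reads the meta tags and the external link hosts from one front page. It is not the original script: the names FrontPageParser and parse_front_page are made up for this example, and fetching the page (for instance with urllib.request) is assumed to happen elsewhere.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class FrontPageParser(HTMLParser):
    """Collects meta tags and external link hosts from one HTML page.

    Hypothetical helper for illustration, not the original crawler code.
    """

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host.lower()
        self.meta = {}               # meta name (lowercased) -> content
        self.external_hosts = set()  # hosts of links that leave own_host

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            try:
                host = (urlparse(attrs["href"]).hostname or "").lower()
            except ValueError:
                return  # skip malformed URLs
            # Relative links have no host and are therefore internal.
            if host and host != self.own_host:
                self.external_hosts.add(host)


def parse_front_page(html, own_host):
    """Return (meta tags, external hosts) for the given front page HTML."""
    parser = FrontPageParser(own_host)
    parser.feed(html)
    return parser.meta, parser.external_hosts
```

Every host returned in the second value is a candidate for the queue, which is how a crawl that only reads front pages can still keep discovering new domains.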

But it didn't stop. It kept running and found more and more domains. As of June 2020, the crawler has looked at 62 million websites.

Since I was only interested in which Content Management System (CMS) each website uses, you will find only the filtered results here: the websites that contained the generator meta tag.
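
The filtering could look roughly like the following Python sketch. It assumes each crawl result is a (domain, meta tags) pair, which is only a guess at the data layout; the function names are likewise invented for the example.

```python
from collections import Counter


def filter_by_generator(records):
    """Keep only records whose meta tags include a non-empty generator.

    `records` is assumed to be an iterable of (domain, meta_dict) pairs.
    """
    kept = []
    for domain, meta in records:
        generator = (meta.get("generator") or "").strip()
        if generator:
            kept.append((domain, generator))
    return kept


def count_generators(filtered):
    """Count how often each generator string occurs in the filtered set."""
    return Counter(generator for _, generator in filtered)
```

Generator strings often carry a version number or extra wording, so a real result set would probably need some normalization before the counts are meaningful.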

I filled the queue with five domains and started the script on a Raspberry Pi 3. For a while, I had multiple Raspberry Pis running and searching. Now a single server runs several jobs: one crawls, one updates the result set, and others handle cleanup and backups.
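
A queue seeded with a few domains can be sketched in Python as below. This is an assumption about the general approach, not the actual job: crawl and fetch_external_domains are hypothetical names, and the real crawler presumably keeps its queue in a database rather than in memory.

```python
from collections import deque


def crawl(seed_domains, fetch_external_domains, max_domains=None):
    """Visit domains breadth-first, starting from the seed domains.

    `fetch_external_domains` is a hypothetical callable that returns the
    external domains linked from one domain's front page.
    """
    queue = deque(seed_domains)
    seen = set(seed_domains)  # domains that are queued or already visited
    visited = []
    while queue:
        if max_domains is not None and len(visited) >= max_domains:
            break
        domain = queue.popleft()
        visited.append(domain)
        for found in fetch_external_domains(domain):
            if found not in seen:
                seen.add(found)
                queue.append(found)
    return visited
```

The loop ends on its own only when no visited front page links to an unseen domain, so on the open web an optional limit such as max_domains is the only thing that stops it.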

The crawler does not run on this web server. The database behind this website is synchronized daily.