Google’s Index Reaches A Trillion URLs

Saturday, July 26, 2008


Google Logo (Image courtesy Google)
By Andrew Liszewski

The internet as we know it would be a very different place if Google didn’t exist. Sure, there’d be other search engines to fill the void, but none of Google’s competitors seems as obsessed with scouring and indexing every nook and cranny of the internet to make it easily searchable. According to the Google Blog, Google’s first index, way back in 1998, had 26 million pages, and by 2000 it had grown to 1 billion. Now, eight years later, Google has hit another milestone: its systems have passed the mark of 1 trillion unique URLs (not all of which, Google points out, actually end up in the index). Here’s an interesting quote from their blog that puts that amount of information in perspective:

To keep up with this volume of information, our systems have come a long way since the first set of web data Google processed to answer queries. Back then, we did everything in batches: one workstation could compute the PageRank graph on 26 million pages in a couple of hours, and that set of pages would be used as Google’s index for a fixed period of time. Today, Google downloads the web continuously, collecting updated page information and re-processing the entire web-link graph several times per day. This graph of one trillion URLs is similar to a map made up of one trillion intersections. So multiple times every day, we do the computational equivalent of fully exploring every intersection of every road in the United States. Except it’d be a map about 50,000 times as big as the U.S., with 50,000 times as many roads and intersections.
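For the curious, "computing PageRank" over a link graph boils down to repeatedly passing each page’s score along its outgoing links until the scores settle. Below is a minimal sketch of the textbook power-iteration version on a toy four-page web. The function name `pagerank`, the damping factor of 0.85 (the commonly cited default), and the stopping tolerance are all assumptions for illustration; this is not Google’s production code, which the post doesn’t describe.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Textbook power-iteration PageRank (illustrative sketch only).

    links maps each page to the list of pages it links to.
    """
    # Gather every page that appears as a link source or target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform score

    for _ in range(max_iter):
        # Pages with no outgoing links ("dangling") spread their score evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every other page splits its score equally among the pages it links to.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        # Stop once the total change between rounds is negligible.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# A hypothetical four-page web: c is linked to by a, b and d; nothing links to d.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(toy_web).items()):
    print(page, round(score, 4))
```

On this toy graph, page c comes out on top because three pages point to it, while d, which nothing links to, ends up with only the baseline (1 - 0.85) / 4 = 0.0375 share. Google’s version does this for a trillion URLs, several times a day.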

And to think, I actually complain about having to check the 100+ websites in my RSS reader every morning. Anyway, good work, Google, and we all look forward to celebrating when you pass the quadrillion mark!

[ Google Blog – We knew the web was big… ] VIA [ Slashdot ]