I tend to assume that, for all its flaws, The Economist gets its facts right, at least on technical issues. But a recent article on How Google Works in their technology section repeats a popular misconception about search. The article says, ‘Google is thought to have several complete copies of the web distributed across servers in California and Virginia’. Whatever they do have, it is nothing close to a complete copy of the web. Even a complete index of the text of the first 100Kb of each page on the publicly spidered web (the most they would even claim) would still miss the huge volume of information stored in web-accessible databases, like the "British Telecom phone book":http://www2.bt.com/edq_busnamesearch.
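To see how much that 100Kb cap alone leaves out, here's a minimal sketch (the cap value comes from the figure above; the function name and the 250Kb example page are my own invention, not anything Google has published):

```python
# Hypothetical sketch of a size-capped indexer: anything past the
# cap never makes it into the index, let alone content that sits
# behind a database query form and has no static page at all.
MAX_BYTES = 100 * 1024  # the oft-quoted 100Kb-per-page cap

def index_snippet(document: bytes, max_bytes: int = MAX_BYTES) -> bytes:
    """Return the portion of a document a capped indexer would see."""
    return document[:max_bytes]

page = b"x" * (250 * 1024)          # an imaginary 250Kb page
seen = index_snippet(page)
print(len(seen))                     # only 102400 bytes are indexed
```

In this toy model, 150Kb of that single page is already invisible to the index, and that's before counting database-backed content that a link-following spider never reaches at all.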
I believe that a search engine that managed to do a good job of searching this ‘invisible web’ alongside the ‘surface web’ would have a good shot at the number one spot.
P.S. While on the subject of search, here’s a tip: to get a (small) discount on your next Amazon purchase, check out their new A9 search engine.