A Needle in a Haystack

Anyone who looks for information on the Internet
has to marvel at its ability to provide content on almost any subject.
Search for ‘beekeeping’ or ‘atomic physics’ or ‘Tibetan cooking’ and you
will be sure to find something. But the quality and reliability of the
results vary widely, and how is one to sort through 50,000 references?

What the Internet has gained in breadth compared to traditional information sources, it appears to have lost in depth. Definitive reference sites vie for attention with the online musings of millions of web users. And while more often than not a search is fruitful, there is more luck than certainty in success. As the web grows exponentially in size, the ability to find authoritative material will be as significant as the opportunity to give everyone a voice.

Cyveillance, Inc., of Washington, D.C., estimated the number of publicly accessible web pages last summer at approximately 2 billion, and projected that this number would double by early 2001. Yet the majority of these pages remain unindexed by online search engines. An NEC study published in Nature found that the most comprehensive search engine, Northern Light, indexed less than 16% of the web. Broken links are becoming more common as the Internet grows and changes, and search engines are increasingly unable to keep up.

BrightPlanet.com LLC of Sioux Falls, S.D., has identified a much more extensive collection of online resources, a ‘deep web’ that is unreachable by conventional search engines. It estimated last summer that this hidden resource was about 500 times the size of the visible Internet, comprising more than 500 billion documents. Much of the information on the Internet today is stored in databases, where it is inaccessible to the software used by search engines to build their indexes. This content, BrightPlanet says, is “hidden in plain sight.”

“Searching on the Internet today,” the company says, “can be compared to dragging a net across the surface of the ocean. There is a wealth of information that is deep and therefore missed. The reason is simple: basic search methodology and technology have not evolved significantly since the inception of the Internet.”

BrightPlanet has developed new search technology for identifying, retrieving, qualifying, classifying and organizing ‘deep’ and ‘surface’ content from the web. It completed a study of the hidden Internet last year. Among the most significant findings:
BrightPlanet points out that if the most comprehensive search engine indexes only 16% of the surface web, and the deep web is some 500 times larger than the surface web, Internet searchers are currently accessing only 0.03% of available Internet resources.

BrightPlanet has created a portal (www.completeplanet.com) where users can search for relevant deep web sites. The portal also provides a comprehensive directory of some 40,000 sites on the invisible web, organized under 4,000 subject headings. Other online directories of deep web resources include Direct Search, Invisible Web, Lycos Invisible Web Catalog and Web Data.

While these portals can take the user to the ‘front door’ of a promising deep web site, they cannot search multiple sites at the same time, as a traditional Internet search engine does. BrightPlanet has addressed this need by developing LexiBot, a new software program that can search up to 60 deep web sites simultaneously. The software can be downloaded from the company’s web site for a 30-day free trial. After the trial the user must purchase the software, which is priced at US$89.95. LexiBot is currently configured to access 600 deep web sites.

While technology has made it possible for us to create an unprecedented global information resource, it currently stands in the way of our making effective use of it. Since the lure of good information is irresistible, one hopes that it is only a matter of time until there are new standards for information sharing, and a new generation of tools, like LexiBot, that will allow us to access the Internet’s hidden resources. Until then, it seems, we’re still destined to keep looking for a needle in a haystack.

RESOURCES:
– BrightPlanet white paper describing the Deep Web (Adobe Acrobat file)
– BrightPlanet largest deep web sites
– BrightPlanet LexiBot software
– CompletePlanet portal for searching the Deep Web
– Cyveillance estimates of Internet size
– Direct Search
– Invisible Web
– Ken Wiseman, Apple Distinguished Educator, Librarians’ Index to the Internet
– Lycos Invisible Web Catalog
– The Standard, “Diving Into the Deep Web”
– Web Data
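
The 0.03% figure quoted in the article can be checked with a few lines of arithmetic. The sketch below is only an illustration. It assumes that “available Internet resources” means the surface web plus the deep web, and that the deep web is 500 times the size of the surface web, as BrightPlanet estimated. The function and constant names are invented for this example.

```python
# Figures taken from the article; the names below are hypothetical.
SURFACE_COVERAGE = 0.16        # share of the surface web indexed by the best engine
DEEP_TO_SURFACE_RATIO = 500    # deep web size relative to the surface web


def searchable_share(coverage: float, deep_ratio: float) -> float:
    """Return the fraction of all web content reachable through one engine.

    Assumption: total content = surface web (1 unit) + deep web
    (deep_ratio units), and the engine indexes none of the deep web.
    """
    total_units = 1 + deep_ratio
    return coverage / total_units


share = searchable_share(SURFACE_COVERAGE, DEEP_TO_SURFACE_RATIO)
print(f"Share of all content indexed: {share:.2%}")
```

Because the NEC study put the best engine’s coverage at less than 16%, the result is best read as an upper bound on the share a single search engine could reach.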
www.innovationwatch.com