Anyone who looks for information
on the Internet has to marvel at its ability to provide content
on almost any subject. Search for ‘beekeeping’ or ‘atomic
physics’ or ‘Tibetan cooking’ and you will be sure to find
something. But the quality and reliability of the results
vary widely, and how is one to sort through 50,000 references?
What the Internet has gained in breadth compared to traditional
information sources, it appears to have lost in depth. Definitive
reference sites vie for attention with the online musings
of millions of web users. And while a search is fruitful
more often than not, success owes more to luck than to
certainty. As the web grows exponentially in size, the
ability to find authoritative material will be as significant
as the opportunity to give everyone a voice.
Cyveillance, Inc., of Washington, D.C., estimated the number
of publicly accessible web pages last summer at approximately
2 billion, and projected that this number would double by
early 2001. Yet the majority of these pages remain unindexed
by online search engines. An NEC study published in Nature
found that the most comprehensive search engine, Northern
Light, indexed less than 16% of the web. Broken links are
becoming more common as the Internet grows and changes,
and search engines are increasingly unable to keep up.
BrightPlanet.com LLC of Sioux Falls, S.D., has identified
a much more extensive collection of online resources, a
‘deep web’ that is unreachable by conventional search engines.
It estimated last summer that this hidden resource was about
500 times the size of the visible Internet, comprising more
than 500 billion documents. Much of the information on the
Internet today is stored in databases, where it is inaccessible
to the software used by search engines to build their indexes.
This content, BrightPlanet says, is “hidden in plain sight.”
“Searching on the Internet today,” the company says, “can
be compared to dragging a net across the surface of the
ocean. There is a wealth of information that is deep and
therefore missed. The reason is simple: basic search methodology
and technology have not evolved significantly since the
inception of the Internet.”
BrightPlanet has developed new search technology for identifying,
retrieving, qualifying, classifying and organizing ‘deep’
and ‘surface’ content from the web. It completed a study
of the hidden Internet last year. Among the most significant
findings:
- There are more than 100,000 deep web sites
- About 60 of the largest sites collectively contain about
750 terabytes of information, roughly 40 times the size
of the visible Internet
- On average, deep web sites receive about 50% greater
monthly traffic than surface sites, and are more highly
linked; however, the typical deep web site is not well known
to the public
- The deep web is the fastest-growing category of new
information on the Internet
- Deep web sites tend to be narrower in scope, with deeper
content than conventional surface sites
- Total quality content of the deep web is at least 1,000
to 2,000 times greater than that of the surface web
- Deep web content is highly relevant to every information
need, market and domain
- More than half of the deep web content resides in
topic-specific databases
- A full 95% of the deep web is publicly accessible information
that is not subject to fees or subscriptions
BrightPlanet points out that if the most comprehensive
search engine indexes only 16% of the surface web, and the
surface web is only about one five-hundredth the size of the
deep web, Internet searchers are currently reaching only
about 0.03% of available Internet resources.
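The 0.03% figure can be reproduced from two numbers quoted
earlier: the best engine's 16% coverage of the surface web,
and BrightPlanet's estimate that the deep web is about 500
times the size of the surface web. The short Python sketch
below is only an illustration; it assumes the two estimates
can be combined as simple ratios, and the function and
variable names are invented for the example.

```python
# Figures quoted in the article. Combining them as simple ratios is an
# assumption made for this illustration.
SURFACE_COVERAGE = 0.16  # best engine's share of the surface web
DEEP_TO_SURFACE = 500    # deep web size relative to the surface web


def share_of_total(surface_coverage: float, deep_to_surface: float) -> float:
    """Fraction of all web content (surface + deep) a surface-only engine reaches."""
    surface_fraction = 1 / (1 + deep_to_surface)  # surface web's share of the whole
    return surface_coverage * surface_fraction


share = share_of_total(SURFACE_COVERAGE, DEEP_TO_SURFACE)
print(f"{share:.2%}")  # prints 0.03%
```

In other words, 16% of roughly one five-hundredth of the web
comes to about three hundredths of one percent.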
BrightPlanet has created a portal (www.completeplanet.com)
where users can search for relevant deep web sites. The
portal also provides a comprehensive directory of some 40,000
sites on the invisible web, organized under 4,000 subject
headings. Other online directories of deep web resources
include Direct Search, Invisible Web, Lycos Invisible Web
Catalog and Web Data.
While these portals can take the user to the ‘front door’
of a promising deep web site, they lack a traditional
Internet search engine's ability to search multiple sites
at the same time. BrightPlanet has addressed this
need by developing LexiBot – a new software program that
can search up to 60 deep web sites simultaneously. The software
is available on the company’s web site, where it can be
downloaded for a 30-day free trial. After the trial the
user must purchase the software, which is priced at
US$89.95. LexiBot is currently configured to access 600 deep
web sites.
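The general idea of sending one query to several sites at
once can be sketched in a few lines of Python. This is not
LexiBot's implementation, which the article does not
describe. The site names, the sample titles and both
functions are hypothetical stand-ins, and a real tool would
submit the query to each site's own search form over the
network rather than look it up in a local table.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for deep web sites: each entry lists the titles
# held in that site's database. None of these names come from the article.
SITES = {
    "bee-journal-archive": ["Beekeeping in cold climates", "Hive design"],
    "physics-preprints": ["Atomic physics primer"],
    "recipe-database": ["Tibetan cooking basics", "Beekeeping and honey recipes"],
}


def search_site(name: str, query: str) -> list[tuple[str, str]]:
    """Return (site, title) pairs whose title contains the query, ignoring case."""
    return [(name, t) for t in SITES[name] if query.lower() in t.lower()]


def search_all(query: str, max_sites: int = 60) -> list[tuple[str, str]]:
    """Query up to max_sites sites concurrently and merge the results."""
    names = list(SITES)[:max_sites]
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        # map() runs the per-site searches in parallel and yields results
        # in the same order as the site names.
        per_site = pool.map(lambda n: search_site(n, query), names)
        return [hit for hits in per_site for hit in hits]


results = search_all("beekeeping")
for site, title in results:
    print(f"{site}: {title}")
```

The default limit of 60 sites simply mirrors the figure
quoted above; everything else in the sketch is illustrative.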
While technology has made it possible for us to create
an unprecedented global information resource, it currently
stands in the way of our making effective use of it. Since
the lure of good information is irresistible, one hopes
that it is only a matter of time until there are new standards
for information sharing, and a new generation of tools,
like LexiBot, that will allow us to access the Internet’s
hidden resources. Until then, it seems, we’re destined
to keep looking for a needle in a haystack.