The Invisible Web: Deep Web


The invisible web, also known as the deep web or hidden web, is the part of the World Wide Web that search engines like Google and Yahoo do not index. Although search engines and directories have no direct access to this monumental storehouse of online data, you can still reach it. You just have to know how and where to look.

Why is it called the invisible web?

Spiders wander the web and index the addresses they discover, but they don't know what to do with a page when they encounter one in the invisible web. These search engine crawlers can record the web address, but they can't tell you what information the page contains.

Content stays invisible for many reasons, but they mostly boil down to technical limitations or site owners' decisions to exclude their pages from spiders. For example, password-protected university library sites are excluded from search engine results. Script-based pages that search engine spiders can't read are left out as well.
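One common way a site owner keeps pages away from spiders is a robots.txt file. The sketch below uses Python's standard urllib.robotparser module to show how a well-behaved crawler might decide whether it may read a page. The rules, the example.edu addresses, and the "ExampleBot" name are all made up for illustration; a real site's rules will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for an imaginary university site.
# The paths below are assumptions chosen only for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /library/
Disallow: /search
"""


def is_crawlable(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.edu/about.html",
        "https://example.edu/library/catalog",
    ):
        status = "may be indexed" if is_crawlable(url) else "excluded from spiders"
        print(f"{url}: {status}")
```

Pages under /library/ are reported as excluded, which is one way content ends up in the invisible web by the owner's choice. Password protection and unreadable scripts work differently and are not covered by this sketch.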

Searching deep and wide

The invisible web is not some kind of X-Files deal that only people with numbers imprinted on their foreheads can see. You can find databases that contain invisible web pages by using search engines. Just search for a subject term plus the word "database". For example: "car crash database," "American literature database," or "real estate database."
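If you want to try the "subject plus database" trick for several topics at once, a few lines of Python can build the queries for you. This is only a sketch: the helper names are invented here, and the search address is a placeholder (www.example.com), not a real search engine's URL format.

```python
from urllib.parse import quote_plus


def database_queries(subjects):
    """Turn subject terms into '<subject> database' search queries."""
    return [f"{s.strip()} database" for s in subjects if s.strip()]


def search_url(query, base="https://www.example.com/search?q="):
    """Build a search link; `base` is a placeholder, not a real engine."""
    return base + quote_plus(query)


if __name__ == "__main__":
    for query in database_queries(["car crash", "American literature", "real estate"]):
        print(query, "->", search_url(query))
```

Swap the placeholder address for your preferred search engine's query URL, or simply paste the generated queries into its search box.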

Librarians' Internet Index, Virtual Library, Direct Search, and Invisible Web Directory are all valuable in academic research. The Resource Discovery Network is also a good resource, although resources come mostly from the UK. Infomine, maintained by the University of California, is an impressive resource that includes more than 100,000 links as well as access to thousands of databases.

The size of the invisible web

Amazingly, the invisible web eclipses the visible web in the amount of data it holds. By one estimate, the visible web is only around 167 terabytes, whereas the deep web is approximately 91,000 terabytes. What we see in common searches is just a pinch of what's on the web. It is estimated that the invisible web is roughly 500 times larger than the visible web.
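The "500 times" figure can be checked against the two terabyte estimates quoted above. The short calculation below takes those numbers at face value; they are estimates, not measurements, and the variable names are chosen only for this example.

```python
# Size estimates quoted in the article, in terabytes.
VISIBLE_WEB_TB = 167
DEEP_WEB_TB = 91_000


def size_ratio(deep_tb: float, visible_tb: float) -> float:
    """How many times larger the deep web is than the visible web."""
    return deep_tb / visible_tb


if __name__ == "__main__":
    ratio = size_ratio(DEEP_WEB_TB, VISIBLE_WEB_TB)
    print(f"The deep web is about {ratio:.0f} times the size of the visible web")
    # prints: The deep web is about 545 times the size of the visible web
```

The result, about 545, is in line with the commonly repeated "500 times larger" estimate.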

The ambiguity of the invisible web

It is difficult to predict which sites or portions of sites will or won't be part of the invisible web. Several factors are involved, such as which sites replicate some of their content in static pages, which replicate all of it, and which databases don't expose their pages through links at all. Search engines such as Google and Yahoo can also change their policies about which sites are included or excluded.