Built without a plan

A partial diagrammatic map of node connections on the internet • Image by Matt Britt (CC BY 2.5)

Originally published 11 October 1999

The World Wide Web is the first technological artifact that was not built from a blueprint.

Consider the most complex artifact you can think of: a supercomputer, the Space Shuttle, a high-energy particle accelerator, Boston’s new Central Artery. Somewhere in a bunch of file drawers is a set of plans that were in place before construction began. Even such simple artifacts as a stone chopper or copper bracelet are fashioned from a mental blueprint.

But the Web just grew like Topsy, willy-nilly, with astonishing speed. No one alive today understands it fully. Of course, the rules of the Web were designed from scratch, and the technology in which the Web resides was built to purpose, but the Web itself was created by millions of individuals worldwide, working in essential isolation from each other, and guided by a bewildering diversity of motivations.

The publicly accessible Web today consists of more than 800 million web pages, residing on several million servers and growing by a million pages a day. The great majority of these documents (more than 80 percent) contain commercial information, such as company home pages. Six percent are scientific and educational in content. Porn, government, and health-related websites account for one or two percent each. Personal websites add another few percent.

The documents are held together by billions of connections, called hyperlinks. These are the clickable words or phrases that let you jump from one page or site to another.

The Web resembles nothing so much as the human brain: hugely complex, mostly unmapped, an impenetrable tangle of nerves and connections.

Scientists have begun to study the Web as if it were a natural phenomenon, like the brain itself, trying to tease out patterns of order.

For example, three physicists at the University of Notre Dame asked themselves, “What is the Web’s diameter?” They did not mean the spatial diameter, since the Web physically spans the surface of the Earth, but rather the connectivity diameter: how many clicks does it take to get from any one place on the Web to any other?

They began by making a complete map of the nd.edu domain, the subset of the Web associated with their university. The domain contained (at the time of their study) 325,729 pages and 1,469,680 links. They found a mathematical formula for the smallest number of links that must be followed to find one’s way between any two pages in a web, as a function of the total number of pages. Their conclusion: no document on the present World Wide Web is more than 19 clicks away from any other.
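The “smallest number of links” between two pages is simply the shortest path in a directed graph whose nodes are pages and whose edges are hyperlinks. Here is a minimal sketch of how such distances can be computed with breadth-first search; the page names and links are invented for illustration and are not data from the Notre Dame study.

```python
from collections import deque

def clicks_between(links, start, target):
    """Breadth-first search: fewest hyperlink clicks from start to target.

    links maps each page to the list of pages it links to.
    Returns None if target cannot be reached by following links.
    """
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        for nxt in links.get(page, []):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# A toy link graph (hypothetical pages, not the real nd.edu map).
toy_web = {
    "home": ["news", "depts"],
    "news": ["story"],
    "depts": ["physics"],
    "physics": ["story", "home"],
}
print(clicks_between(toy_web, "home", "story"))  # 2 clicks
```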

The formula is logarithmic: when the Web grows by ten times, its “diameter” will increase only from 19 to 21.
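The two figures quoted here, about 19 clicks for roughly 800 million pages and about 21 for ten times as many, are consistent with a diameter that grows by roughly two clicks per tenfold increase in pages. The sketch below treats those numbers as fitted assumptions read off this article, not as the researchers’ published formula.

```python
import math

def estimated_diameter(pages, ref_pages=8e8, ref_diameter=19.0, per_decade=2.0):
    """Logarithmic growth fitted to the figures quoted above: about 19 clicks
    at roughly 800 million pages, rising by about 2 clicks per tenfold growth.
    The constants are assumptions taken from the article, not the published fit.
    """
    return ref_diameter + per_decade * math.log10(pages / ref_pages)

print(round(estimated_diameter(8e8)))   # 19
print(round(estimated_diameter(8e9)))   # 21, ten times as many pages
print(round(estimated_diameter(8e10)))  # 23, a hundred times as many
```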

This relatively small clickable diameter will be of interest to the people who provide us with tools for searching the Web. The Web’s vast wealth of information is useless unless we can find our way to it. So far there have been two main strategies for making it accessible.

Companies like Yahoo hire humans to surf the Web, evaluating content, and compiling hierarchical lists of sites that are likely to be of interest to their customers. These lists catalog only a small portion of the available sites.

The other strategy uses automated “search engines” to crawl the Web from link to link, indexing every word on every page. Type in a word or phrase, or a list of words or phrases, and the engine will return a ranked clickable list of every document that contains those words or phrases. The rank might depend upon such things as how many times the word appears on the page, whether it appears in titles, how early in the text it appears, and so on.
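A minimal sketch of the kind of scoring this paragraph describes: counting occurrences, rewarding words that appear in the title, and rewarding words that appear early in the text. The weights and page structure here are invented for illustration; real engines of the era used far more elaborate formulas.

```python
def keyword_score(query, title, body):
    """Toy relevance score in the spirit described above.

    Illustrative weights (assumptions, not any real engine's values):
      +1 for each occurrence of a query word in the body,
      +5 if the word appears in the title,
      +3 if the word appears in the first 100 characters of the body.
    """
    words = query.lower().split()
    title_l, body_l = title.lower(), body.lower()
    score = 0
    for w in words:
        score += body_l.count(w)            # raw frequency
        if w in title_l:
            score += 5                      # title match
        if w in body_l[:100]:
            score += 3                      # appears early in the text
    return score

# Hypothetical pages, keyed by filename, with (title, body) text.
pages = {
    "shuttle.html": ("Space Shuttle history", "The Space Shuttle first flew in 1981..."),
    "garden.html": ("Garden tips", "Shuttle your seedlings indoors before frost..."),
}
query = "space shuttle"
ranked = sorted(pages, key=lambda p: keyword_score(query, *pages[p]), reverse=True)
print(ranked)  # shuttle.html ranks first
```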

Still, the “hits” returned by a search engine can be a dauntingly large collection of useful info and garbage. And even the best search engines, like Northern Light and AltaVista, index less than half of the Web. Finding what one wants on the Web becomes increasingly difficult as the Web grows bigger.

A kind of “arms race” exists between designers of search engines and the webmasters who want to attract you to their pages. A commercial site, for example, might put on its home page every conceivable word or phrase that a potential customer might look for, repeating key words many times, in type that is the same color as the background. The words are invisible to the page viewer, but “visible” to the search engine. The intent is to force the site to the top of ranked lists.

The next generation of search engines will look at incoming and outgoing hyperlinks as a measure of a page’s importance. Webmasters will certainly subvert this strategy too, by including superfluous links. Popular pages will get more popular, and new pages will have an increasingly difficult time making it onto search engine listings.
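The crudest version of such link-based ranking is simply to count how many other pages point at a given page; later engines refined the idea by weighting each link by the importance of the page it comes from. A small sketch of the simple inbound-link count, over a hypothetical link graph:

```python
from collections import Counter

def inbound_link_counts(links):
    """Crude link-based importance: how many pages link to each page.

    links maps each page to the pages it links out to (a hypothetical graph).
    Schemes that weight a link by the importance of its source are harder,
    though not impossible, for webmasters to game with superfluous links.
    """
    counts = Counter()
    for page, outgoing in links.items():
        counts[page] += 0                   # ensure every page appears
        for target in outgoing:
            counts[target] += 1
    return counts

toy_web = {
    "a.html": ["hub.html"],
    "b.html": ["hub.html", "c.html"],
    "c.html": ["hub.html"],
    "hub.html": ["a.html"],
}
for page, n in inbound_link_counts(toy_web).most_common():
    print(page, n)   # hub.html scores highest, with 3 inbound links
```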

Competing forces such as these will drive the Web into undreamed-of patterns of connectivity — a kind of unplanned evolution by natural selection. Scientists who study the Web will be hard pressed to keep up.
