The Internet’s Data Structure
...or Lack Thereof
The World Wide Web is full of unstructured data. There are some light restrictions on how the web is organized (e.g., by nation or by top-level domains such as .edu and .gov), but the web itself is largely unstructured. For example, there are no restrictions on how domain names are semantically organized. Websites about “ferrets” do not necessarily have any distinguishing characteristics in their domain names, such as including the term “ferrets” (e.g., “ferrets.com”) or, better yet, a taxonomy like “mustela.mustelidae.carnivora.mammalia.chordata.animalia.”
Imagine that the web were organized this way.
- How would locating information be different?
- How would creating information be different?
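To make the thought experiment concrete, here is a minimal sketch in Python of how locating information might work if hostnames really did encode a taxonomy. It assumes the hypothetical convention from the example above, with labels running from most specific to most general. The function names and the sample hostnames are illustrative only and do not refer to real sites.

```python
def taxonomy_path(hostname):
    """Split a taxonomic hostname into labels ordered general -> specific."""
    return hostname.lower().split(".")[::-1]


def sites_under(hostnames, label):
    """Return the hostnames whose taxonomy path includes the given label."""
    label = label.lower()
    return [h for h in hostnames if label in taxonomy_path(h)]


# Hypothetical hostnames that follow the taxonomic naming convention.
SITES = [
    "mustela.mustelidae.carnivora.mammalia.chordata.animalia",
    "felis.felidae.carnivora.mammalia.chordata.animalia",
    "corvus.corvidae.passeriformes.aves.chordata.animalia",
]

# The most general label comes first after reversing the hostname.
print(taxonomy_path(SITES[0]))

# Every site about a carnivore can be found from the name alone.
print(sites_under(SITES, "carnivora"))
```

Under this assumption, finding every site about a group of animals needs only the names themselves. The cost falls on creators, who would have to know exactly where their topic sits in the taxonomy before publishing anything.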
Historically, there have been some attempts to overlay structure onto the web. One of the most prolific is the Open Directory Project (aka DMoz), which contains structured lists of links to individual pages on the web. The directories form the structure, and the links represent the data. Watch the following video, which demonstrates the difference between structured and unstructured searches for the key term “twins”:
Notice that finding information is extremely easy using key terms in Google. Type twins, hit enter, and BAM! Results. However, there is a lot of noise in the results. Minnesota Twins and Twins the movie are among the top hits (because Google apparently knows that the searcher loves baseball and Arnold Schwarzenegger movies).
On the other hand, searching via the directories of DMoz initially led us down the wrong path. In essence, a directory is only useful if you already know the organizing structure its creators chose. Note that Google actually provides some structure when you use its autocomplete feature, which could be considered a dynamic structuring schema.
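The contrast between the two kinds of search can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Google or DMoz actually work: the page titles, the category names, and the functions keyword_search and browse are all made up for illustration.

```python
# Unstructured: a flat list of page titles, searched by key term.
PAGES = [
    "Minnesota Twins official site",
    "Twins (1988 film)",
    "Raising twins: a guide for parents",
]

# Structured: nested categories (the directories) whose leaves are
# lists of links (the data). Category names are hypothetical.
DIRECTORY = {
    "Sports": {"Baseball": ["Minnesota Twins official site"]},
    "Arts": {"Movies": ["Twins (1988 film)"]},
    "Home": {"Family": {"Parenting": ["Raising twins: a guide for parents"]}},
}


def keyword_search(pages, term):
    """Return every page whose title contains the term, relevant or not."""
    term = term.lower()
    return [title for title in pages if term in title.lower()]


def browse(directory, path):
    """Follow a list of category names; return [] after a wrong turn."""
    node = directory
    for category in path:
        if not isinstance(node, dict) or category not in node:
            return []  # this path does not exist in the directory
        node = node[category]
    return node


# Fast but noisy: all three pages match the key term.
print(keyword_search(PAGES, "twins"))

# Precise, but only if you already know the right path.
print(browse(DIRECTORY, ["Home", "Family", "Parenting"]))
print(browse(DIRECTORY, ["Sports", "Parenting"]))
```

The keyword search returns everything that mentions the term, which mirrors the noise in the Google results. The directory lookup returns only the relevant link, but a single wrong category choice returns nothing at all.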
Try this type of experiment for yourself. Conduct both a key term search with Google and a directory search with DMoz on the same topic of your choosing. Compare and contrast the results.
To run the experiment:
- Choose a key term to search for.
- Prepare a stopwatch and time how long it takes you to find quality results for your key term with each of the following:
  - a key term search with Google
  - a directory search with DMoz
Submit the best link you find with each of your Google and DMoz searches, along with the time it took you to find each link. Write one paragraph that compares and contrasts the results and evaluates the effectiveness of each process. Be sure to use the terms unstructured and structured to describe the search processes, addressing both the quality and quantity of your results.