Understanding web classification

A fantastic white paper about the problems and potential of Web classification systems.

The hot new term in information organization is "ontology." Everybody's inventing, and writing about, ontologies, which are classifications, lists of indexing terms, or concept term clusters (Communications of the ACM, 2002). But here's the problem: "ontology" is a term taken from philosophy, where it refers to the philosophical issues surrounding the nature of being. If you call a classification or vocabulary an "ontology," you are telling the world that you believe you are describing the world as it truly is, in its essence, and that you have found the universe's one true nature and organization. But, in fact, we do not actually know how things "really" are. Put ten classificationists (people who devise classifications) in a room together and you will get ten views on how the world is organized.

Librarians had to abandon this "one true way" approach to classification in the early twentieth century. As many are (re-)discovering today, information indexing and description need to be adjusted and adapted to a myriad of different circumstances. Why, then, use the misleading term "ontology"?

Apart from philosophical issues, there is another, more important reason to abandon the term. Recorded information does not work the same way the natural world does. Information is a representation of something else. A book, or a Web site, can mix and match informational topics any way its developer likes. There's no such thing as a creature that is half squirrel and half cat, but there are many mixes of half-squirrel/half-cat topics in information resources and Web sites. Methods of information indexing have to recognize what's distinctive about information, as opposed to classifications of nature, and design their systems accordingly.

(Thanks, Chris!)