Taxonomies and Metadata

From CUNY Academic Commons

Some terms you might come across in exploring taxonomy and metadata

There is a fair amount of jargon when talking about organizing information; some terms you might come across include ontologies, taxonomies, controlled vocabularies, thesauri, folksonomies, tagging, and metadata. To add to the confusion, there is tremendous overlap in what some of these terms mean, so they are sometimes used interchangeably. However, there are some generally agreed-upon distinctions when it comes to terminology about structured information:


Controlled vocabulary

A controlled vocabulary is a closed list of indexable terms that assigns a single preferred term to an item that could be described in multiple ways. In everyday language, multiple words can describe a single phenomenon: a heart attack can also be called a myocardial infarction. Within a medical controlled vocabulary, a layperson who uses the term "heart attack" may be referred to "myocardial infarction" as the preferred term. A controlled vocabulary thus ensures consistent use of language to describe an item. For more information, see http://www.controlledvocabulary.com/ and http://en.wikipedia.org/wiki/Controlled_vocabulary.
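
One way to picture this is as a closed lookup table from every entry term a user might type to the single preferred term. The sketch below is a minimal illustration under that assumption: ENTRY_TERMS and preferred_term are hypothetical names, and a real medical vocabulary would be far larger and maintained by an authority.

```python
# Hypothetical entry vocabulary: each lowercased variant term maps to
# the one preferred term used for indexing.
ENTRY_TERMS = {
    "heart attack": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
}


def preferred_term(term: str) -> str:
    """Return the preferred term for `term`, rejecting anything outside the list."""
    key = term.strip().lower()
    if key not in ENTRY_TERMS:
        # The list is closed: unknown terms are refused rather than indexed.
        raise KeyError(f"{term!r} is not in the controlled vocabulary")
    return ENTRY_TERMS[key]


print(preferred_term("Heart Attack"))  # myocardial infarction
```

Rejecting unknown terms, instead of silently accepting them, is what distinguishes a controlled vocabulary from free tagging.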


Dublin Core

Dublin Core is the lingua franca of metadata standards, with 15 elements (fields) describing such things as title, creator, format, and date. It was developed to ensure that people describe at least the most basic elements of a digital object. Many libraries, archives, and institutional repositories use this metadata standard alongside more complex standards that suit the needs of a particular domain of practice, because it facilitates interoperability between multiple systems and local standards. The basic set of elements can be expanded into more fine-grained elements in Qualified Dublin Core (http://dublincore.org/documents/2000/07/11/dcmes-qualifiers/). More information on Dublin Core can be found at:

http://www.xml.com/pub/a/2000/10/25/dublincore/index.html, http://dublincore.org/about/ and http://en.wikipedia.org/wiki/Dublin_Core.
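
To make this concrete, the sketch below builds a simple Dublin Core record as XML using only Python's standard library. The 15 element names and the http://purl.org/dc/elements/1.1/ namespace come from the Dublin Core element set; the record wrapper element, the dc_record function, and the sample values are illustrative assumptions, since each repository or harvesting protocol defines its own container format.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dc_record(fields: dict) -> str:
    """Serialize {element: [values]} as XML; every element is optional and repeatable."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a simple Dublin Core element")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(dc_record({
    "title": ["Taxonomies and Metadata"],
    "type": ["Text"],
    "format": ["text/html"],
    "language": ["en"],
}))
```

Because every element is optional and repeatable, even a sparse record like this one can be exchanged with any system that understands the basic element set.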


Folksonomy

Folksonomies are ad hoc taxonomies created through a collaborative process by the content creators and users of a system. Related terms include collaborative tagging, social classification, social indexing, and social tagging. The public at large is familiar with the tagging process from many Web 2.0 applications such as Flickr and Delicious. More on folksonomies at http://en.wikipedia.org/wiki/Folksonomy.
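
A folksonomy emerges from aggregating many individual users' tags rather than from a list decided in advance. The sketch below assumes taggings arrive as hypothetical (user, item, tag) triples and simply counts how many users applied each tag to each item; the names and sample data are illustrative.

```python
from collections import Counter, defaultdict


def item_tag_counts(taggings):
    """Aggregate free-form (user, item, tag) taggings into per-item tag counts."""
    per_item = defaultdict(Counter)
    seen = set()
    for user, item, tag in taggings:
        norm = tag.strip().lower()  # light normalization only; no preferred terms
        if (user, item, norm) in seen:
            continue  # count each user's tag at most once per item
        seen.add((user, item, norm))
        per_item[item][norm] += 1
    return per_item


taggings = [
    ("ana", "photo1", "Sunset"), ("ben", "photo1", "sunset"),
    ("ben", "photo1", "beach"), ("cy", "photo1", "sunset "),
    ("ana", "photo2", "NYC"), ("ben", "photo2", "newyork"),
]
counts = item_tag_counts(taggings)
print(counts["photo1"].most_common())  # [('sunset', 3), ('beach', 1)]
```

Note that "nyc" and "newyork" stay separate tags for photo2: without a controlled vocabulary, nothing merges synonyms.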


Metadata

Metadata is a structured way of describing information -- data about data. There are many different standards for describing information, and a metadata scheme consists of standardized fields or elements that describe an object such as title, creator, etc. as well as instructions for how these fields can/should be used. In the context of a library, an archetypal document for metadata would be a card from the card catalog. It holds information in a particular way that describes another content-laden object, a book. Some good introductory links include: http://www.library.uq.edu.au/iad/ctmeta4.html and http://www.language-archives.org/documents/gentle-intro.html.


ID3 Tags

An ID3 tag is a data container, stored in a prescribed format within an MP3 audio file, that associates descriptive information such as title, artist, and album with the audio. To learn all about ID3 tags click here.


You'll find software (free and otherwise) that implements ID3 tagging here.
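
For illustration, the sketch below reads the older ID3v1 layout, which is simple enough to parse by hand: a fixed 128-byte block at the end of the file that starts with the bytes "TAG" and holds 30-byte title, artist, album, and comment fields, a 4-byte year, and a 1-byte genre index. The more widely used ID3v2 format sits at the start of the file and is far more complex. The function names here are hypothetical, and in practice dedicated tagging software is the safer choice.

```python
import os
import struct
from typing import Optional

# ID3v1: "TAG" + title(30) + artist(30) + album(30) + year(4) + comment(30) + genre(1)
ID3V1_FORMAT = "<3s30s30s30s4s30sB"
ID3V1_SIZE = struct.calcsize(ID3V1_FORMAT)  # 128 bytes


def _text(raw: bytes) -> str:
    # Fields are padded with NUL bytes (sometimes spaces); ID3v1 text is Latin-1.
    return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()


def parse_id3v1(data: bytes) -> Optional[dict]:
    """Return the ID3v1 fields found in the last 128 bytes of `data`, or None."""
    if len(data) < ID3V1_SIZE:
        return None
    tag, title, artist, album, year, comment, genre = struct.unpack(
        ID3V1_FORMAT, data[-ID3V1_SIZE:]
    )
    if tag != b"TAG":
        return None  # no ID3v1 tag present
    return {
        "title": _text(title),
        "artist": _text(artist),
        "album": _text(album),
        "year": _text(year),
        "comment": _text(comment),
        "genre_index": genre,  # index into the standard ID3v1 genre list
    }


def read_id3v1(path: str) -> Optional[dict]:
    """Read only the trailing 128 bytes of an MP3 file and parse them."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < ID3V1_SIZE:
            return None
        f.seek(-ID3V1_SIZE, os.SEEK_END)
        return parse_id3v1(f.read(ID3V1_SIZE))
```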


Ontology

Generally, an ontology refers to a network of relationships between terms, or even between taxonomies. In the broadest sense of the term, it is a mapping of a conceptual model that specifies entities and the relationships among them. See http://www-ksl.stanford.edu/kst/what-is-an-ontology.html for a description of ontology in artificial intelligence. A thoughtful analysis of ontology can be found at http://www.shirky.com/writings/ontology_overrated.html, which discusses the scenarios in which ontologies make more sense than other ways of structuring information. Shirky's list of the domains best suited to ontologies includes: a small corpus [or collection being described], formal categories, stable entities, restricted entities, and clear edges [or clearly defined boundaries for each category].
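
The "entities and relationships" idea can be sketched as subject-predicate-object triples, loosely in the style of RDF. In the sketch below the classes and the subClassOf/hasProperty predicates are illustrative assumptions; real ontology work would typically use a standard language such as OWL and dedicated tools. It shows one thing a network of relationships buys you over a flat list: properties can be inferred by following relationships.

```python
# Hypothetical mini-ontology stored as (subject, predicate, object) triples.
TRIPLES = {
    ("MP3File", "subClassOf", "AudioFile"),
    ("AudioFile", "subClassOf", "DigitalObject"),
    ("DigitalObject", "hasProperty", "title"),
    ("AudioFile", "hasProperty", "duration"),
    ("Person", "creatorOf", "DigitalObject"),
}


def superclasses(cls, triples):
    """All classes reachable from `cls` by following subClassOf edges."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def properties(cls, triples):
    """Properties declared on `cls` or inherited from any of its superclasses."""
    classes = {cls} | superclasses(cls, triples)
    return {o for s, p, o in triples if p == "hasProperty" and s in classes}


print(sorted(properties("MP3File", TRIPLES)))  # ['duration', 'title']
```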


Taxonomy

"Taxonomy (from Greek taxis meaning arrangement or division and nomos meaning law) is the science of classification according to a pre-determined system, with the resulting catalog used to provide a conceptual framework for discussion, analysis, or information retrieval. In theory, the development of a good taxonomy takes into account the importance of separating elements of a group (taxon) into subgroups (taxa) that are mutually exclusive, unambiguous, and taken together, include all possibilities. In practice, a good taxonomy should be simple, easy to remember, and easy to use." Retrieved from http://www.leadingtoday.org/weleadinlearning/sf05.htm, which includes a full index of the relevant article. An excellent introduction to taxonomies for laypeople can be found at http://www.digital-web.com/articles/better_living_through_taxonomies/
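
A taxonomy's tree structure can be sketched as nested dictionaries, where each key is a category and its value holds the subcategories. The categories and both helper functions below are hypothetical. labels_are_unique is only a partial check toward the "mutually exclusive, unambiguous" ideal in the quotation: it catches a label filed under two parents, not categories whose meanings overlap.

```python
# Hypothetical taxonomy: each key is a category, each value its subcategories.
TAXONOMY = {
    "Media": {
        "Audio": {"Music": {}, "Podcasts": {}},
        "Text": {"Books": {}, "Articles": {}},
    }
}


def path_to(node, tree, trail=()):
    """Return the chain of categories from the root down to `node`, or None."""
    for name, children in tree.items():
        here = trail + (name,)
        if name == node:
            return here
        found = path_to(node, children, here)
        if found:
            return found
    return None


def labels_are_unique(tree):
    """True if no category label appears in more than one place in the tree."""
    seen = set()

    def walk(subtree):
        for name, children in subtree.items():
            if name in seen:
                return False
            seen.add(name)
            if not walk(children):
                return False
        return True

    return walk(tree)


print(path_to("Podcasts", TAXONOMY))  # ('Media', 'Audio', 'Podcasts')
```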


Thesaurus

At a basic level, a thesaurus allows for the cross-referencing of terms and concepts. Some call a list of synonyms or "see also" terms (e.g. heart attack, see also myocardial infarction) a thesaurus. For others, a thesaurus covers a whole host of potential relationships between terms that extends beyond the standard tree structure of a classic taxonomy. ISO 2788 standardizes thesaurus conventions that might look familiar from your card catalog days. More information about ISO 2788 is available at: http://www.ontopia.net/topicmaps/materials/tm-vs-thesauri.html#N429. A more detailed discussion of the types of relationships can be found at: http://www.slis.kent.edu/~mzeng/Z3919/4relationship.htm
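
The sketch below models a few of the relationship indicators commonly associated with ISO 2788-style thesauri: USE/UF (use / used for), BT/NT (broader / narrower term), and RT (related term). Each relationship is recorded together with its reciprocal, so the cross-references stay consistent in both directions. The Thesaurus class and the sample terms are illustrative assumptions, not an implementation of the standard.

```python
from collections import defaultdict

# Each relationship indicator and its reciprocal.
RECIPROCAL = {"USE": "UF", "UF": "USE", "BT": "NT", "NT": "BT", "RT": "RT"}


class Thesaurus:
    def __init__(self):
        # term -> relationship indicator -> set of related terms
        self.rel = defaultdict(lambda: defaultdict(set))

    def add(self, term, indicator, other):
        """Record a relationship and its reciprocal in one step."""
        self.rel[term][indicator].add(other)
        self.rel[other][RECIPROCAL[indicator]].add(term)

    def entry(self, term):
        """Follow USE to the preferred term, then list its relationships."""
        use = self.rel.get(term, {}).get("USE")
        preferred = min(use) if use else term  # assumes at most one USE target
        relations = self.rel.get(preferred, {})
        return preferred, {ind: sorted(terms) for ind, terms in relations.items()}


th = Thesaurus()
th.add("heart attack", "USE", "myocardial infarction")
th.add("myocardial infarction", "BT", "heart diseases")
th.add("myocardial infarction", "RT", "coronary thrombosis")
print(th.entry("heart attack"))
# ('myocardial infarction', {'UF': ['heart attack'], 'BT': ['heart diseases'], 'RT': ['coronary thrombosis']})
```

Unlike the single parent link in a taxonomy tree, a term here can carry any mix of broader, narrower, and related links at once.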

Authors