Friday, November 04, 2005

TagClusters - Creating a tag hierarchy

As part of my quest to eventually become a full internet citizen, I'm trying to take more of my software development public. As part of that, I intend to write a series on mining the vast open metadata reserves that can now be found on the internet. The first few parts of this series are on making sense of tagged data.

Suppose one has the following tagged items:

Item A, Tags: code_snippets, php, dom, xml
Item B, Tags: code_snippets, javascript, dom, xml
Item C, Tags: code_snippets, javascript
Item D, Tags: code_snippets, php

In order to browse the items, it would be useful to organise the tags in a hierarchy. The simplest (and most commonly used) method is to have a root node for each tag and, as children of each root node, all the tags that are used together with the root tag on at least one item. For example, with the above tagged items, a partially expanded tree might look like this:

+ code_snippets
    + php
        + dom
        + xml
    + javascript
        + dom
        + xml
    + ...
+ javascript
    + code_snippets
    + ...
+ php
+ dom
+ xml
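
The naive method can be sketched in a few lines of JavaScript. This is only an illustration, not code from TagClusters: the `items` array is hypothetical sample data mirroring the four tagged items (with every tag in its underscore form), and `coOccurrenceTree` is a made-up helper name.

```javascript
// Hypothetical sample data: one array of tags per tagged item.
const items = [
  ["code_snippets", "php", "dom", "xml"],
  ["code_snippets", "javascript", "dom", "xml"],
  ["code_snippets", "javascript"],
  ["code_snippets", "php"],
];

// For each tag, collect every other tag that appears on at least one
// of the same items. Each key is a root node; its set holds the children.
function coOccurrenceTree(items) {
  const tree = new Map();
  for (const tags of items) {
    for (const tag of tags) {
      if (!tree.has(tag)) tree.set(tag, new Set());
      for (const other of tags) {
        if (other !== tag) tree.get(tag).add(other);
      }
    }
  }
  return tree;
}

const naive = coOccurrenceTree(items);
for (const [root, children] of naive) {
  console.log(`+ ${root}: ${[...children].sort().join(", ")}`);
}
```

Every tag ends up as a root, and most tags reappear under most other roots, which is why this kind of tree quickly becomes repetitive to browse.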

A better method would realise that all the items are tagged "code_snippets", that every item is tagged either "javascript" or "php", and that all items tagged "dom" are also tagged "xml".
In more formal terms, a better hierarchy for tags uses containment as a partial ordering: tag X sits below tag Y when every item tagged X is also tagged Y.
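
As a rough sketch of what ordering tags by containment could look like (this is not the TagClusters source, and every name in it is hypothetical), the following code maps each tag to the set of items carrying it, merges tags whose item sets are identical, and hangs each node under its smallest strict supersets. The `items` array is assumed sample data mirroring the tagged items above.

```javascript
// Hypothetical sample data: one array of tags per tagged item.
const items = [
  ["code_snippets", "php", "dom", "xml"],
  ["code_snippets", "javascript", "dom", "xml"],
  ["code_snippets", "javascript"],
  ["code_snippets", "php"],
];

// Map each tag to the set of item indices that carry it.
function itemsByTag(items) {
  const index = new Map();
  items.forEach((tags, i) => {
    for (const tag of tags) {
      if (!index.has(tag)) index.set(tag, new Set());
      index.get(tag).add(i);
    }
  });
  return index;
}

// True when every element of a is in b and b has at least one more.
function isStrictSubset(a, b) {
  if (a.size >= b.size) return false;
  for (const x of a) if (!b.has(x)) return false;
  return true;
}

function containmentHierarchy(items) {
  // Tags with identical item sets are merged into a single node.
  const groups = new Map();
  for (const [tag, set] of itemsByTag(items)) {
    const key = [...set].sort((a, b) => a - b).join(",");
    if (!groups.has(key)) groups.set(key, { tags: [], set });
    groups.get(key).tags.push(tag);
  }
  const nodes = [...groups.values()].map(({ tags, set }) => ({
    label: tags.sort().join(" / "),
    set,
    children: [],
  }));

  const roots = [];
  for (const node of nodes) {
    const supersets = nodes.filter((n) => isStrictSubset(node.set, n.set));
    // Direct parents are the supersets with no smaller superset inside them.
    const parents = supersets.filter(
      (p) => !supersets.some((q) => isStrictSubset(q.set, p.set))
    );
    if (parents.length === 0) roots.push(node);
    for (const p of parents) p.children.push(node);
  }
  return roots;
}

// Render the hierarchy as indented "+ tag" lines.
function renderLines(nodes, depth = 0) {
  return nodes.flatMap((n) => [
    "    ".repeat(depth) + "+ " + n.label,
    ...renderLines(n.children, depth + 1),
  ]);
}

console.log(renderLines(containmentHierarchy(items)).join("\n"));
```

For the sample data this yields a single root, code_snippets, with php, dom / xml and javascript as its children. In this sketch a node can have more than one direct parent, so in general the result is a directed acyclic graph rather than a strict tree, and a real implementation would have to decide how to display such nodes.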
This containment ordering is the basis for TagClusters, a JavaScript class for creating tag hierarchies. The tagged items above, rendered using TagClusters, can be found here.
I've also tried it on a much more complex example: a snapshot of my del.icio.us bookmarks. While there is a visible improvement over the naive approach, the result is still less than optimal.
