Saturday, June 10, 2006

RDF in a nutshell

[UPDATE: I forgot a very important part of the RDF model - the open-world assumption - which I've now added as point 4.]

I'm sick of hearing that the Semantic Web is too hard. The core of the Semantic Web is the RDF model, which is very easy to understand.

  1. A statement in RDF is an ordered triple of RDF terms, known as subject, predicate and object. An RDF statement expresses a relationship (named by the predicate) between the subject and the object, e.g. s:myCar p:hasColour o:Blue
  2. Terms should be URIs, but object terms can also be literals (strings).
  3. We can write statements about any URI, and we can connect statements about the same URI to build more complex statements, e.g. s:myCar p:hasMake o:Focus, s:Focus p:manufacturedBy o:Ford, which can be written in English as "The make of my car is the Focus; the Focus is manufactured by Ford".
  4. A missing triple is not significant in the RDF model, i.e. a missing triple only means that the piece of information the triple would describe is not known. For example, given the model [Jimbo hasChild Tombo], how many children does Jimbo have? It is common in programming for the answer to be one, but in RDF the answer is at least one. This is known as the open-world assumption (as opposed to the closed-world assumption used in object and relational models).
  5. (Optional) As well as all the terms being URIs, we can assign a URI to an RDF statement, which allows us to make assertions about statements. This is known as reification, and it allows higher-level models to be developed, such as building versioning or provenance support into RDF.
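To make points 3 to 5 concrete, here is a minimal Python sketch (deliberately not tied to any RDF library) that stores a graph as a plain set of triples. The example.org URIs, the helper functions and the assertedBy property are hypothetical; only the rdf:type, rdf:Statement, rdf:subject, rdf:predicate and rdf:object terms come from the RDF vocabulary itself.

```python
# Sketch: an RDF graph as a plain set of (subject, predicate, object) tuples.
# The example.org URIs are hypothetical stand-ins for the s:/p:/o: shorthand.
EX = "http://example.org/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

graph = {
    (EX + "myCar", EX + "hasColour", EX + "Blue"),
    (EX + "myCar", EX + "hasMake", EX + "Focus"),
    (EX + "Focus", EX + "manufacturedBy", EX + "Ford"),
    (EX + "Jimbo", EX + "hasChild", EX + "Tombo"),
}


def objects(graph, subject, predicate):
    """Every object o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def maker_of(graph, car):
    """Point 3: join hasMake and manufacturedBy statements through the shared URI."""
    return {
        maker
        for make in objects(graph, car, EX + "hasMake")
        for maker in objects(graph, make, EX + "manufacturedBy")
    }


def min_children(graph, person):
    """Point 4: under the open-world assumption this count is only a lower bound."""
    return len(objects(graph, person, EX + "hasChild"))


def reify(graph, statement_uri, triple):
    """Point 5: describe a statement using the rdf:Statement vocabulary."""
    s, p, o = triple
    graph.update({
        (statement_uri, RDF + "type", RDF + "Statement"),
        (statement_uri, RDF + "subject", s),
        (statement_uri, RDF + "predicate", p),
        (statement_uri, RDF + "object", o),
    })


print(maker_of(graph, EX + "myCar"))  # {'http://example.org/Ford'}
print("Jimbo has at least", min_children(graph, EX + "Jimbo"), "child")  # Jimbo has at least 1 child

# Once the statement has a URI we can say things about it, e.g. who asserted it.
reify(graph, EX + "statement1", (EX + "myCar", EX + "hasColour", EX + "Blue"))
graph.add((EX + "statement1", EX + "assertedBy", EX + "Jimbo"))
```

Note that min_children reports a lower bound rather than a total: nothing in the graph says Tombo is Jimbo's only child.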
This is all you need to know about the RDF model, but to build applications with it a few more concepts might be useful.

RDF Syntax
  1. The RDF model is independent of any particular syntax used to express it. Most RDF tools and libraries allow you to easily convert one syntax into another.
  2. RDF/XML is an XML syntax for expressing RDF models. It is designed mostly for machine-to-machine transfer, so humans are best off ignoring it. However, RDF/XML is the most common RDF format in the wild, and many writers confuse RDF (the model) with RDF/XML (the syntax); many criticisms of RDF are based on this misunderstanding.
  3. N3, Turtle and N-Triples are much more readable RDF syntaxes for human consumption.
  4. RDF can be expressed in POX (plain old XML) by using a custom transformation to convert it into something RDF parsers understand. These stylesheets can be attached to the document using GRDDL, so you don't have to tell the parser which transformation to use.
  5. Many transformations already exist for converting other data formats into an RDF-compatible syntax; in particular, there are several attempts at defining ways to embed RDF in (X)HTML, e.g. RDF/A, RDF in HTML.
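As a rough illustration of points 1 and 4, the sketch below takes a made-up plain-old-XML document, applies a custom transformation written in Python (standing in for the XSLT stylesheet a GRDDL-aware parser would fetch) and serialises the resulting triples as N-Triples. The element names, the example.org vocabulary and the function names are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/"  # hypothetical vocabulary namespace

# Hypothetical POX input: nothing here is RDF until we transform it.
POX = """<cars>
  <car id="myCar"><colour>Blue</colour><make>Focus</make></car>
</cars>"""


def pox_to_triples(xml_text):
    """Custom transformation: each <car> becomes a subject URI and each
    child element becomes a property whose text is a string literal."""
    root = ET.fromstring(xml_text)
    triples = []
    for car in root.findall("car"):
        subject = EX + car.get("id")
        for child in car:
            triples.append((subject, EX + child.tag, child.text or ""))
    return triples


def ntriples_literal(value):
    """Quote a string literal, escaping the characters N-Triples requires."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def to_ntriples(triples):
    """One line per statement: <subject> <predicate> "literal" ."""
    return "\n".join(f"<{s}> <{p}> {ntriples_literal(o)} ." for s, p, o in triples)


print(to_ntriples(pox_to_triples(POX)))
# <http://example.org/myCar> <http://example.org/colour> "Blue" .
# <http://example.org/myCar> <http://example.org/make> "Focus" .
```

The same list of triples could be handed to a different serialiser (Turtle, RDF/XML) without changing the model, which is the point of item 1.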
Vocabularies
  1. A collection of common terms that can be used to describe an object is called a vocabulary. For example, the FOAF vocabulary describes people and the relationships between them, and contains terms such as Person, Project, Organization, name, nick, knows and mbox.
  2. You are encouraged to mix and match vocabularies when creating RDF models, and to create your own when there are no known vocabularies that match your purpose.
  3. Vocabularies are expressed using the RDF model (i.e. terms are URIs). A vocabulary that helps define other vocabularies is RDF Schema, which defines terms such as Resource, Class, Literal, Property, type, range, domain, subClassOf, subPropertyOf, List, Seq, Bag etc.
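To show that a vocabulary is itself just more RDF, here is a sketch that mixes FOAF with a hypothetical home-made vocabulary (the example.org Driver class and the people URI are invented) and applies two RDF Schema rules by brute force: rdfs:subClassOf is transitive, and a resource typed with a subclass also has the superclass as a type. A real RDFS reasoner covers many more rules; this covers only those two.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/vocab#"          # hypothetical home-made vocabulary
JIMBO = "http://example.org/people#jimbo"  # hypothetical resource

graph = {
    # Vocabulary definitions are ordinary triples.
    (EX + "Driver", RDFS + "subClassOf", FOAF + "Person"),
    (FOAF + "Person", RDFS + "subClassOf", FOAF + "Agent"),
    # Instance data mixing our vocabulary with FOAF.
    (JIMBO, RDF + "type", EX + "Driver"),
    (JIMBO, FOAF + "name", "Jimbo"),
}


def infer_types(graph):
    """Repeatedly apply two RDFS rules until no new triples appear."""
    inferred = set(graph)
    while True:
        sub = {(s, o) for (s, p, o) in inferred if p == RDFS + "subClassOf"}
        # subClassOf is transitive: A < B and B < C gives A < C.
        new = {(a, RDFS + "subClassOf", c)
               for (a, b) in sub for (b2, c) in sub if b == b2}
        # Instances of a subclass are instances of its superclass.
        new |= {(x, RDF + "type", sup)
                for (x, p, cls) in inferred if p == RDF + "type"
                for (c, sup) in sub if c == cls}
        if new <= inferred:
            return inferred
        inferred |= new


for (s, p, o) in sorted(infer_types(graph)):
    if s == JIMBO and p == RDF + "type":
        print(o)
```

Because both the schema and the data are triples, the same set operations work on either, which is what "vocabularies are expressed using the RDF model" buys you.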
Ontologies
  1. A collection of RDF documents that describes a set of objects and is designed for reuse is known as an ontology. For example, the statement s:Focus p:manufacturedBy o:Ford could have come from a car manufacturers ontology.
  2. Ontologies tend to be either domain-specific or very general, e.g. WordNet (an ontology about words) and OpenCyc (an ontology about real-world concepts, or "all of human consensus reality" as they put it).
  3. The OWL Web Ontology Language is a vocabulary for describing ontologies.
  4. As with Vocabularies you are encouraged to reuse what's out there or build and publish your own.
  5. Building ontologies can be HARD, but they have the biggest payback of all semantic technologies, so don't be discouraged.

Semantic web is easy

I just read Dan Zambonini's post The 7 flaws of the semantic web, which is a good critique of the Semantic Web (big S, big W). It doesn't throw much new light on its flaws, but it does gather them together and put forward a very coherent argument. While I disagree vehemently, rather than writing an incoherent rebuttal, I'd like to tackle one of his points in a concrete manner. [I've already written a rebuttal in attempting to write an introduction; for concreteness see RDF in a nutshell.]

He writes

Web 2.0 applications are pretty easy to build, because most people can pick up the base technologies really quickly - HTML, CSS, XML, Javascript. You can then take a little step up to learn about DHTML and AJAX, and then use these to build a Web 2.0 application.

With the semantic web, there’s a much higher first rung to the ladder. Getting to grips with RDF/XML, SPARQL, and the other core technologies is a big ask for most developers. To then get useful semantic web applications out of these takes a couple more exhausting jumps of complexity.


First off, his diagram is misleading: he has neatly folded the bottom layers of the stack into one layer, which makes his comment that "there's a much higher first rung to the ladder" misleading too. Looking at TimBL's version of the stack, the bottom rung is URIs, technically a difficult topic, but one most developers (and even non-developers) have some familiarity with.

The next rung is XML, again something which is widely deployed and reasonably well understood by developers at large.
Developers can contribute to the Semantic Web simply by using URIs, especially if the URIs follow web conventions, i.e. expose RESTful interfaces. Even if you don't use any of the other layers in the stack, using URIs gives your resources a presence on the Semantic Web, and the more URIs you use, the more useful your data is in a Semantic Web way. For example, the RSS fragment (written as RDF)

http://www.news.com/news-story has a http://www.metadata.com/property/subject of “Science/Nature”

can be written

http://www.news.com/news-story has a http://www.metadata.com/property/subject of http://technorati.com/tag/Science&Nature

This makes the data more useful, allowing, for example, an aggregator to combine this data with other data about Science and Nature. To make this happen the author didn't need to know anything about the Semantic Web other than that URIs are better.
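Here is a toy illustration of why the URI helps. It assumes a naive aggregator that groups stories by the exact value of their subject property; the second publisher and its story URI are hypothetical, and real aggregators may be smarter about strings.

```python
from collections import defaultdict

SUBJECT = "http://www.metadata.com/property/subject"  # property from the example above
TAG = "http://technorati.com/tag/Science&Nature"

# Two publishers describe their stories. With free-text strings they happen
# to spell the topic differently; with the shared URI they cannot.
with_strings = [
    ("http://www.news.com/news-story", SUBJECT, "Science/Nature"),
    ("http://www.example.org/story-42", SUBJECT, "Science & Nature"),
]
with_uris = [
    ("http://www.news.com/news-story", SUBJECT, TAG),
    ("http://www.example.org/story-42", SUBJECT, TAG),
]


def group_by_topic(statements):
    """Bucket stories by the exact value of their subject property."""
    topics = defaultdict(set)
    for story, prop, value in statements:
        if prop == SUBJECT:
            topics[value].add(story)
    return dict(topics)


print(len(group_by_topic(with_strings)))  # 2: the two strings don't match
print(len(group_by_topic(with_uris)))     # 1: both stories share the tag URI
```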
Similarly, using XML makes your data more useful on the Semantic Web (the advantage in this case is that it is easier to transform your data into other data formats, e.g. using Atom means your data can be transformed to RSS 1.0 and parsed by an RDF parser, or transformed to HTML to be read in a web browser).
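A sketch of the Atom case: parse a minimal Atom entry with Python's standard XML library and emit triples using RSS 1.0 and Dublin Core property URIs. The mapping (entry id as the subject, title to rss:title, link href to rss:link, category scheme plus term to dc:subject) is an assumption made for illustration, not a standard Atom-to-RSS 1.0 transformation.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"    # ElementTree's {namespace}tag form
RSS = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"

# A hypothetical, minimal Atom entry.
ATOM_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://www.news.com/news-story</id>
  <title>Example story</title>
  <link href="http://www.news.com/news-story"/>
  <category scheme="http://technorati.com/tag/" term="Science&amp;Nature"/>
</entry>"""


def atom_entry_to_triples(xml_text):
    """Assumed mapping from one Atom entry to RSS 1.0 / Dublin Core triples."""
    entry = ET.fromstring(xml_text)
    subject = entry.findtext(ATOM + "id")
    triples = [(subject, RSS + "title", entry.findtext(ATOM + "title"))]
    link = entry.find(ATOM + "link")
    if link is not None:
        triples.append((subject, RSS + "link", link.get("href")))
    for cat in entry.findall(ATOM + "category"):
        # Hypothetical rule: scheme + term gives the topic URI.
        triples.append((subject, DC + "subject", cat.get("scheme", "") + cat.get("term")))
    return triples


for triple in atom_entry_to_triples(ATOM_ENTRY):
    print(triple)
```

Because the input is well-formed XML with a known namespace, the whole transformation is a few lookups; that is the advantage the paragraph above describes.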

The third rung of the ladder is the RDF model. This rung is where the good stuff happens, but it is the realm of the Semantic Web developer, not the developer-at-large. RDF is not difficult (though the RDF/XML syntax is ugly and is what puts most developers off); the intended aim of this post was to demonstrate that, but instead I got waylaid. To see why RDF is easy, see RDF in a Nutshell.

With the Semantic Web, there's a low first rung to the ladder. To get useful Semantic Web data, developers only need to use XML and URIs. To then build useful Semantic Web applications, developers can rely on RDF, SPARQL and the other higher-level technologies.