Saturday, June 10, 2006

RDF in a nutshell

[UPDATE: I forgot a very important part of the RDF model, the open-world assumption, which is now added as point 4.]

I'm sick of hearing that the Semantic Web is too hard. The core of the Semantic Web is the RDF model, which is very easy to understand.

  1. A statement in RDF is an ordered triple of RDF terms, known as subject, predicate and object. An RDF statement expresses a relationship (named by the predicate) between the subject and the object, e.g. s:myCar p:hasColour o:Blue
  2. Terms should be URIs, but object terms can also be literals (strings).
  3. We can write statements about any URI, and we can connect statements about the same URI together to build more complex statements, e.g. s:myCar p:hasMake o:Focus and s:Focus p:manufacturedBy o:Ford can be written in English as "The make of my car is the Focus, and the Focus is manufactured by Ford".
  4. A missing triple is not significant in the RDF model, i.e. a missing triple only means that the piece of information the triple describes is not known. For example, given the model [Jimbo hasChild Tombo], how many children does Jimbo have? In programming it is common for the answer to be one, but in RDF the answer is at least one. This is known as the open-world assumption (as opposed to the closed-world assumption used in object and relational models).
  5. (Optional) As well as all the terms being URIs, we can assign a URI to an RDF statement, which allows us to make assertions about statements. This is known as reification, and it allows higher-level models to be developed, such as building versioning or provenance support into RDF.
This is all you need to know about the RDF model, but to build applications with it a few more concepts might be useful.
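To make points 1 to 4 concrete, here is a minimal sketch in plain Python, with no RDF library involved. The http://example.org/ namespace and the helper names (Triple, objects, manufacturers_of, known_children) are made up for illustration and are not part of any RDF toolkit.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: str


# Hypothetical namespace, used only for this illustration.
EX = "http://example.org/"

# A graph is just a set of triples.
graph = {
    Triple(EX + "myCar", EX + "hasMake", EX + "Focus"),
    Triple(EX + "Focus", EX + "manufacturedBy", EX + "Ford"),
    Triple(EX + "Jimbo", EX + "hasChild", EX + "Tombo"),
}


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return {
        t.object
        for t in graph
        if t.subject == subject and t.predicate == predicate
    }


def manufacturers_of(graph, car):
    """Connect two statements through the URI they share (the make)."""
    return {
        maker
        for make in objects(graph, car, EX + "hasMake")
        for maker in objects(graph, make, EX + "manufacturedBy")
    }


def known_children(graph, parent):
    """Count the children we know about.

    Under the open-world assumption this is a lower bound, not a total:
    a missing triple means "not known", not "false".
    """
    return len(objects(graph, parent, EX + "hasChild"))


print(manufacturers_of(graph, EX + "myCar"))
print("Jimbo has at least", known_children(graph, EX + "Jimbo"), "child")
```

The graph is a set because the order of statements carries no meaning, and stating the same triple twice adds nothing.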

RDF-Syntax
  1. The RDF model is independent of any particular syntax used to express it. Most RDF tools and libraries allow you to easily convert one syntax into another.
  2. RDF/XML is an XML syntax for expressing RDF models. It is designed mostly for machine-to-machine transfer, so humans are best off ignoring it. However, RDF/XML is the most common RDF format in the wild, and many writers confuse RDF (the model) with RDF/XML (the syntax). Many criticisms of RDF are based on this misunderstanding.
  3. N3, Turtle and N-Triples are RDF syntaxes that are much easier for humans to read.
  4. RDF can be expressed in POX (plain-old-XML) by using a custom transformation to convert it into something RDF parsers understand. These stylesheets can be attached to the document using GRDDL, so you don't have to tell the parser which transformation to use.
  5. Many transformations already exist for converting other data formats into an RDF-compatible syntax. In particular, there are several attempts at defining ways to embed RDF in (X)HTML, e.g. RDF/A and RDF in HTML.
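To show that a syntax is only a way of writing the model down, here is a rough sketch that writes triples out as N-Triples lines using plain Python. The ("uri", value) and ("literal", value) tagging and the function names nt_term and to_ntriples are assumptions of this example, the http://example.org/ URIs are placeholders, and only the most common escapes are handled. For real work, use an RDF library's serializer.

```python
def nt_term(term):
    """Format one term, given as ("uri", value) or ("literal", value)."""
    kind, value = term
    if kind == "uri":
        return "<" + value + ">"
    # Escape backslashes first, then quotes and line breaks.
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def to_ntriples(triples):
    """Write each triple on its own line, terminated by " ."."""
    return "\n".join(
        " ".join(nt_term(term) for term in triple) + " ."
        for triple in triples
    )


# Hypothetical data: the colour is a literal here to show the quoting.
triples = [
    (
        ("uri", "http://example.org/myCar"),
        ("uri", "http://example.org/hasColour"),
        ("literal", "Blue"),
    ),
]

print(to_ntriples(triples))
# prints: <http://example.org/myCar> <http://example.org/hasColour> "Blue" .
```

The same list of triples could just as well be written out as Turtle or RDF/XML, and the model would not change.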
Vocabularies
  1. A collection of common terms that can be used to describe an object is called a vocabulary. For example, the FOAF vocabulary describes people and the relationships between them, and contains terms such as Person, Project, Organisation, name, nick, knows, mbox.
  2. You are encouraged to mix and match vocabularies when creating RDF models, and to create your own when there are no vocabularies that match your purpose.
  3. Vocabularies are expressed using the RDF model (i.e. terms are URIs). A vocabulary that helps define other vocabularies is RDF-Schema, which defines terms such as Resource, Class, Literal, Property, type, range, domain, subClassOf, subPropertyOf, List, Seq, Bag etc.
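As a rough illustration of mixing vocabularies, the sketch below combines FOAF and RDF-Schema terms with a made-up vocabulary, again in plain Python. The FOAF, RDF and RDFS namespace URIs are the published ones. The http://example.org/ terms (Employee, alice, bob) and the types_of helper are hypothetical, and the literal "Alice" is stored as a bare string only to keep the example short.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/"  # hypothetical vocabulary for this example

graph = {
    # Our own term, defined in terms of a FOAF term via RDF-Schema.
    (EX + "Employee", RDFS + "subClassOf", FOAF + "Person"),
    # Statements that mix the three vocabularies.
    (EX + "alice", RDF + "type", EX + "Employee"),
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "alice", FOAF + "knows", EX + "bob"),
}


def types_of(graph, resource):
    """Return stated types plus those implied by rdfs:subClassOf."""
    found = set()
    pending = [o for s, p, o in graph if s == resource and p == RDF + "type"]
    while pending:
        cls = pending.pop()
        if cls not in found:
            found.add(cls)
            # Every superclass of a type is also a type of the resource.
            pending.extend(
                o for s, p, o in graph
                if s == cls and p == RDFS + "subClassOf"
            )
    return found


for t in sorted(types_of(graph, EX + "alice")):
    print(t)
```

Because Employee is declared a subclass of foaf:Person, anything that understands FOAF can treat alice as a Person without knowing the made-up vocabulary.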
Ontologies
  1. A collection of RDF documents that describes a set of objects, and is designed for reuse, is known as an ontology. For example, the statement s:Focus p:manufacturedBy o:Ford could have come from a Car Manufacturers ontology.
  2. Ontologies tend to be either domain-specific or very general, e.g. WordNet (an ontology about words) and OpenCyc (an ontology about real-world concepts, or "all of human consensus reality" as they put it).
  3. The OWL Web Ontology language is a Vocabulary for describing Ontologies.
  4. As with Vocabularies you are encouraged to reuse what's out there or build and publish your own.
  5. Building Ontologies can be HARD, but they have the biggest payback of all semantic technologies, so don't be discouraged.
