Thursday, August 10, 2006

Making statements about statements in RDF

This SWIG thread, "Reifying Triples as unique URIs", is one of the most interesting I have read in weeks. It explores how best to make statements about other statements in RDF, and in particular how best to make statements about triples.

This W3C Working Note has a pretty good introduction to the problem and its solutions, so I suggest reading that and then reading the SWIG mailing list thread.
One other thing that pops up on the list but not in the working note is using Named Graphs.
Named Graphs add URIs to collections of RDF statements (graphs). For example, we could write (in TriG):

<http://example.org/bob>
{
_:a foaf:name "Bob" .
_:a foaf:mbox .
}

Since this graph is named with a URI, we can write statements about the graph itself.
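To make that concrete, here is a minimal sketch in plain Python (no RDF library) that models a named graph as a set of quads and then adds a statement whose subject is the graph's own URI. The metadata graph name, the `dc:creator` statement and the value "Alice" are assumptions made up for illustration; only the graph name and the `foaf:name` triple come from the example above.

```python
# A quad is (graph_name, subject, predicate, object).
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"

BOB_GRAPH = "http://example.org/bob"
META_GRAPH = "http://example.org/meta"  # hypothetical graph holding metadata

quads = {
    # The statement inside the named graph.
    (BOB_GRAPH, "_:a", FOAF + "name", '"Bob"'),
    # A statement *about* the graph: its URI appears in the subject position.
    # The creator "Alice" is invented for this sketch.
    (META_GRAPH, BOB_GRAPH, DC + "creator", '"Alice"'),
}


def triples_in(graph_name, store):
    """Return the triples contained in the graph with the given name."""
    return sorted((s, p, o) for (g, s, p, o) in store if g == graph_name)


def statements_about(resource, store):
    """Return (predicate, object) pairs of every statement whose subject is `resource`."""
    return sorted((p, o) for (_, s, p, o) in store if s == resource)


print(triples_in(BOB_GRAPH, quads))
print(statements_about(BOB_GRAPH, quads))
```

The point of the sketch is only that the graph name is an ordinary URI, so nothing special is needed to use it as the subject of another statement.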
This approach can be used to make statements about triples by putting each triple in its own named graph. However, this has the following drawbacks:
  • It requires using Named Graphs, which are still an immature technology.
  • Named graphs contain only statements, not other graphs, so if we name our triples (so that they are quads) we can't name our graphs of quads. For example, we can't have:
    :G3{
    :G1
    {
    :s1 :p1 :o1
    }

    :G2
    {
    :s2 :p2 :o2
    }
    }
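The one-triple-per-graph idea and its second drawback can be sketched in plain Python as follows. The `name_triple` helper, its hash-based naming scheme and the base URI are hypothetical, chosen only so that each triple gets a stable name; they are not taken from the thread or from any Named Graphs specification.

```python
import hashlib


def name_triple(s, p, o, base="http://example.org/stmt/"):
    """Place a triple alone in its own named graph and return the resulting quad.

    The graph name is a hash of the triple under a made-up base URI
    (an assumption for this sketch, not a standard scheme).
    """
    digest = hashlib.sha1(" ".join((s, p, o)).encode("utf-8")).hexdigest()
    return (base + digest, s, p, o)


g1 = name_triple(":s1", ":p1", ":o1")
g2 = name_triple(":s2", ":p2", ":o2")

# Each result is a flat 4-tuple: (graph_name, subject, predicate, object).
# The first slot holds a graph *name*; there is no further slot that could
# say "this quad belongs to :G3", which is why the nesting above is not possible.
print(g1)
print(g2)
```

Since the name is derived from the triple itself, naming the same triple twice yields the same graph name, so statements about it can be attached consistently.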

Wednesday, August 09, 2006

Code Churn

Just a mental note.

When discussing programmer productivity, kloc (kilo-lines-of-code) per year is often used as a general-purpose metric. I presume this is based on measuring the size of the code base and dividing by the time taken to write it. For example, I have written ~5k lines in 10 months, so I write approx. 6 kloc per year. This measurement hides the fact that those 5k lines have been rewritten many times, so I have probably written much more than 5 kloc. Measuring how many lines of code were altered at each Subversion commit and summing them up over the year would give a better idea of how much code I have actually written. I call this figure code turnover (analogous to (European) business turnover).
The ratio of turnover to the size of the code at the end (code profit?) I'll call the code churn ratio, or just churn. I wonder what kind of numbers are typical for churn. I guesstimate my churn on this project to be at least 2, and it wouldn't surprise me if it was higher.
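As a rough sketch of the arithmetic, the three figures can be computed like this in Python. The function names are my own, and the per-commit line counts are invented purely to illustrate a churn of 2; only the 5k lines and 10 months come from the note above.

```python
def kloc_per_year(final_size_lines, months):
    """Naive productivity: final code size scaled to a 12-month year, in kloc."""
    return final_size_lines / 1000 * 12 / months


def code_turnover(lines_changed_per_commit):
    """Total lines altered, summed over every commit in the period."""
    return sum(lines_changed_per_commit)


def churn_ratio(turnover_lines, final_size_lines):
    """Turnover divided by the size of the code at the end."""
    if final_size_lines <= 0:
        raise ValueError("final size must be positive")
    return turnover_lines / final_size_lines


final_size = 5000                      # ~5k lines, from the note above
commits = [4000, 3500, 2500]           # hypothetical lines altered per commit

print(kloc_per_year(final_size, 10))   # 6.0
turnover = code_turnover(commits)
print(turnover)                        # 10000
print(churn_ratio(turnover, final_size))  # 2.0
```

How the per-commit counts are extracted from the repository is left open here; the sketch assumes they are already available as a list of integers.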

PS. If you are wondering why I haven't posted in a while, it's because I am stealthily working on my new site.