Thursday, October 27, 2005

php - getFirstOrderedNode

I'm kicking off my code snippet series with a PHP function that mimics JavaScript's "first ordered node" XPath result type (XPathResult.FIRST_ORDERED_NODE_TYPE). This series isn't really for my regular readers (if I had any); it's more for the archives.


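function getFirstOrderedNode($xpath, $query, $parent = null){
    // Evaluate $query relative to $parent (the whole document if null)
    $children = $xpath->query($query, $parent);
    // query() returns false for a malformed expression, an empty list for no match
    if($children === false || $children->length == 0){ return null; }
    // DOMNodeList results are in document order, so item(0) is the first ordered node
    return $children->item(0);
}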
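A quick usage sketch, assuming PHP's DOM extension. The feed document is made up for illustration, and the function is restated here (optional context node, returning the first match in document order rather than requiring exactly one match) so the snippet runs on its own:

```php
<?php
// Return the first node (in document order) matched by $query, or null.
function getFirstOrderedNode(DOMXPath $xpath, $query, $parent = null) {
    $nodes = $xpath->query($query, $parent);
    if ($nodes === false || $nodes->length === 0) {
        return null; // malformed expression or no match
    }
    return $nodes->item(0);
}

// Hypothetical feed document, for illustration only.
$doc = new DOMDocument();
$doc->loadXML('<feed><entry>first</entry><entry>second</entry></feed>');
$xpath = new DOMXPath($doc);

echo getFirstOrderedNode($xpath, '//entry')->textContent, "\n";                      // first
echo getFirstOrderedNode($xpath, 'entry', $doc->documentElement)->textContent, "\n"; // first
var_dump(getFirstOrderedNode($xpath, '//missing'));                                  // NULL
```

The second call shows the relative form: `entry` is evaluated against the context node passed as `$parent`.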


Tuesday, October 25, 2005

Who needs RDF/XML?

[Note: This post is rough and unfinished, but it's been in draft status too long, so I thought I'd put it out there. Hopefully I will write a clearer follow-up.]

Seth Ladd joins a long list of semwebbers (see Tim Bray for an early example) striking out against the ugliness of RDF/XML.
Thinking about what is keeping RDF from wider adoption, I keep coming back to the fact that the serialization is, well, too complex. Why hasn't anyone stepped up to propose a XML syntax that plays well with the thousands of XML tools out there? Let the XML folk say, "I came for the simple syntax, I stayed for the powerful model."
I'm not going to dispute that RDF/XML is horrible for most applications. As Seth points out, it is:
  1. Hard to understand for someone from an XML background
  2. Unpredictable, i.e. multiple serialisations exist for the same model
  3. Not validatable against a schema
  4. Not usable with XSLT and similar XML processing tools
I think the major misconception here is that RDF/XML is the only serialisation of RDF. In fact, the nature of RDF is that many semantically equivalent serialisations exist, each with its own purpose, and many of these serialisations are plain old XML. For example, Seth and others talk about "view source" as a major driver in the adoption of the web, and argue that RDF/XML is too complicated for the average user. However, in this scenario (RDF delivered via a web browser), RDF is unlikely to reach the user as RDF/XML; instead the user will consume metadata embedded in XHTML, using microformats (and GRDDL), RDF/A or RDF-in-HTML.
As for the unpredictability of RDF/XML, this is unavoidable, since there is no canonical way to serialise a graph (and a complex graph at that, with edges as nodes and triples as nodes) to a tree. This is also why RDF does not sit well with XML schema languages and XML processing tools such as XSLT. It seems obvious to me that XML tools are never going to be a perfect fit for the task of RDF processing.
However, since XML is an established technology, it is important that there are good mappings between the two domains. SPARQL is the hero here, particularly the SPARQL results format, as it's predictable, reasonably pretty and understandable, and works well with conventional XML tools. The drawback, of course, is that you need some knowledge of the ontology in order to create a useful query. Other approaches that are useful when a good XML serialisation is required include RDF Twig (an updated RDF Twig might go a long way to convincing the naysayers) and Concise Bounded Descriptions.
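To make the "works well with conventional XML tools" point concrete, here is a minimal sketch that reads a SPARQL results document using nothing but PHP's DOM and XPath classes. The results document is a hand-written sample (the variable names and example.org URIs are made up), assuming the W3C results namespace http://www.w3.org/2005/sparql-results#:

```php
<?php
// Hand-written sample in the SPARQL Query Results XML Format; the data is made up.
$xml = <<<XML
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="name"/><variable name="blog"/></head>
  <results>
    <result>
      <binding name="name"><literal>Alice</literal></binding>
      <binding name="blog"><uri>http://example.org/alice</uri></binding>
    </result>
    <result>
      <binding name="name"><literal>Bob</literal></binding>
      <binding name="blog"><uri>http://example.org/bob</uri></binding>
    </result>
  </results>
</sparql>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
// XPath 1.0 cannot match a default namespace directly, so bind it to a prefix.
$xpath->registerNamespace('s', 'http://www.w3.org/2005/sparql-results#');

$rows = array();
foreach ($xpath->query('/s:sparql/s:results/s:result') as $result) {
    $row = array();
    foreach ($xpath->query('s:binding', $result) as $binding) {
        // Each binding wraps one uri/literal/bnode element; its text is the value.
        $row[$binding->getAttribute('name')] = trim($binding->textContent);
    }
    $rows[] = $row;
}

foreach ($rows as $row) {
    echo $row['name'], ' ', $row['blog'], "\n";
}
```

Every result has the same shape, one element per variable binding, which is exactly the predictability that RDF/XML lacks. The same document could just as easily be fed to XSLT.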
One other important serialisation I haven't mentioned is N3. N3 is, for me, the easiest way of writing an RDF document for input; however, it isn't so useful as an output serialisation because, as with RDF/XML, multiple serialisations can exist for the same set of triples. Subsets of N3 such as N-Triples may be a useful output serialisation in some circumstances.
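N-Triples is predictable in a way that N3 and RDF/XML aren't: one triple per line, with every IRI written out in full, so sorting the lines gives a single output for a given set of triples, provided there are no blank nodes (whose labels are arbitrary). A minimal sketch, assuming a hypothetical array-of-triples input format and made-up example.org URIs:

```php
<?php
// Hypothetical input format: [subject IRI, predicate IRI, object], where the
// object is array('iri' => ...) or array('literal' => ...). No blank nodes.
function toNTriples(array $triples) {
    $lines = array();
    foreach ($triples as $t) {
        list($s, $p, $o) = $t;
        if (isset($o['iri'])) {
            $object = '<' . $o['iri'] . '>';
        } else {
            // Escape backslashes, double quotes and line breaks in literals
            $object = '"' . addcslashes($o['literal'], "\\\"\n\r") . '"';
        }
        $lines[] = "<$s> <$p> $object .";
    }
    // Sorting makes the output independent of the input order
    sort($lines, SORT_STRING);
    return implode("\n", $lines) . "\n";
}

$dc = 'http://purl.org/dc/elements/1.1/';
$triples = array(
    array('http://example.org/post/1', $dc . 'title', array('literal' => 'Who needs RDF/XML?')),
    array('http://example.org/post/1', $dc . 'creator', array('iri' => 'http://example.org/me')),
);

echo toNTriples($triples);
// <http://example.org/post/1> <http://purl.org/dc/elements/1.1/creator> <http://example.org/me> .
// <http://example.org/post/1> <http://purl.org/dc/elements/1.1/title> "Who needs RDF/XML?" .
```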

All that being said, I don't think so many voices would have joined this conversation unless there were a need that isn't being met by current technologies. I think the need is not for a replacement of RDF/XML, but for a simpler way to write XML documents and schemas such that they can be reused for XML processing. I might call this beast SimpleRDF. It would be based on the ideas in Bob DuCharme's "Make your XML RDF-friendly" (and Danny Ayers' exposition of the same idea). I would also add CURIEs into the mix, and I'd probably attempt to remove rdf:ID, rdf:about and rdf:resource from the spec, and add support for header information, including a GRDDL profile and a mapping from triples to the XML format (via SPARQL and XSLT, for example). More on that in a later post, I think.



Sunday, October 09, 2005

Analysis of the Weblogs, Inc. deal with Metcalfe's law

Tristan Louis recently wrote a much-blogged analysis of the Weblogs, Inc./AOL deal. His analysis used incoming links as a measure of influence, and presumed that value (in dollars) scales linearly with influence, i.e. $kN = V$, where $k$ is the value per link, $N$ is the number of links, and $V$ is the value of the network. This fails to take into account Metcalfe's law, which suggests that value scales with the square of the number of users. A better formula for value would be $kN^2 = V$, where $k$ is now measured in dollars per link squared.

Running the numbers, we find $k = 0.012752733$. To find the value of your blog, multiply this number by the square of the number of sites linking to you according to Technorati.
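The same calculation as a minimal PHP sketch, using the post's value of $k$. The link counts are made-up round numbers, not real Technorati figures:

```php
<?php
// k from the post, in dollars per link squared
define('LINK_VALUE_K', 0.012752733);

// Metcalfe-style valuation: V = k * N^2, where N = number of sites linking to the blog
function blogValue($links) {
    return LINK_VALUE_K * $links * $links;
}

printf("%.2f\n", blogValue(1000)); // 12752.73
printf("%.2f\n", blogValue(2000)); // 51010.93
```

Doubling the link count quadruples the value; under the linear model it would only double it.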
Some example valuations:
BoingBoing: $3.5 million
Engadget (part of Weblogs, Inc.): $2.3 million
Scobleizer: $175,000
ongoing: $23,000

I think these numbers look a lot more convincing than valuations based directly on Tristan's analysis. It's also important to note that when this technique is applied to a network of sites like Jason's Weblogs, Inc., the sum of the values of the individual properties falls far short of the value of the whole network.
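A short derivation of why the parts fall short, assuming (as an idealisation that ignores sites linking to more than one property) that the network's link count is the sum of its $m$ properties' link counts $N_1, \dots, N_m$, each valued with the same $k$:

$$
V_{\text{network}} = k\Big(\sum_{i=1}^{m} N_i\Big)^2 = k\sum_{i=1}^{m} N_i^2 + 2k\sum_{i<j} N_i N_j \;\ge\; \sum_{i=1}^{m} k N_i^2 = \sum_{i=1}^{m} V_i
$$

Under this model the cross terms $2kN_iN_j$ are value that only appears at the network level, and the inequality is strict whenever at least two properties have any incoming links.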

[UPDATE: A few more links on the topic.
Inside Google has a list of valuations using Tristan's metric.
Business Bits is skeptical and points out the sum is worth more than the whole.
Jeremy Zawodny is selling his blog.]