Sunday, November 06, 2005

SPARQL and Web 2.0

Henry Story has just published a post describing SPARQL as the query language for Web 2.0. All his usage examples are good, but I think he has slightly missed the point. Henry suggests that Web 2.0 businesses will expose SPARQL endpoints over web services. This isn't going to happen, for several reasons:
  1. Economics: There is a lot of value stored in the databases Henry mentions, and most companies will not want competitors or users to have unrestricted access to this data. Current web service APIs are designed so that the expected increase in value from user-derived software is likely to exceed the value lost by exposing the data.
  2. Performance: Even if the data is completely open and the economics don't come into play, performance is a major issue. SPARQL queries are designed to be written by Semantic Web engineers, much as SQL queries are designed to be written by database engineers. As an example, consider the following query:

    PREFIX foaf: ...
    PREFIX dc: ...
    SELECT ?book
    WHERE {
      ?book dc:creator ?who .
      ?who foaf:name "J. K. Rowling" .
    }


    This query (if the WHERE clause is evaluated top to bottom) is highly inefficient: it first retrieves every triple with the property dc:creator, then keeps only those whose creator has the foaf:name "J. K. Rowling". A much more efficient query reverses the two patterns in the WHERE clause, so that the highly selective name lookup runs first and only Rowling's books are fetched afterwards. I believe automated query rewriting is beyond the state of the art at the moment and will remain so for the foreseeable future, especially when you consider the technical challenge of throwing inferencing into the mix, and the social challenge of open access (e.g. consider the query "SELECT ?s ?p ?o WHERE { ?s ?p ?o }").
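
To make the ordering argument concrete, here is a minimal Python sketch, not a real SPARQL engine. It assumes a toy in-memory store with no indexes, invented data (200 hypothetical authors with one book each, plus seven books by a node called `jkr`; all identifiers are made up for illustration), literals as plain strings, and an evaluator that joins the patterns strictly in the order written. It counts how many partial solutions are alive after each pattern, as a rough proxy for the work done.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are query variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in extended and extended[p_term] != t_term:
                return None  # variable already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def evaluate(patterns: List[Triple], store: List[Triple]) -> Tuple[List[Binding], List[int]]:
    """Join triple patterns strictly in the given order (no reordering, no indexes).

    Returns the final solutions and the number of partial solutions
    alive after each pattern.
    """
    solutions: List[Binding] = [{}]
    sizes: List[int] = []
    for pattern in patterns:
        next_solutions: List[Binding] = []
        for binding in solutions:
            for triple in store:  # full scan: this toy store has no indexes
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
        sizes.append(len(solutions))
    return solutions, sizes


def build_store(num_other_authors: int = 200) -> List[Triple]:
    """Hypothetical data: many authors with one book each, plus seven Rowling books."""
    store: List[Triple] = []
    for i in range(num_other_authors):
        store.append((f"book{i}", "dc:creator", f"author{i}"))
        store.append((f"author{i}", "foaf:name", f"Author {i}"))
    for n in range(1, 8):
        store.append((f"hp{n}", "dc:creator", "jkr"))
    store.append(("jkr", "foaf:name", "J. K. Rowling"))
    return store


STORE = build_store()

# The two patterns in the order they appear in the query...
AS_WRITTEN = [("?book", "dc:creator", "?who"), ("?who", "foaf:name", "J. K. Rowling")]
# ...and with that order reversed.
REORDERED = list(reversed(AS_WRITTEN))

written_solutions, written_sizes = evaluate(AS_WRITTEN, STORE)
reordered_solutions, reordered_sizes = evaluate(REORDERED, STORE)

print("as written:", written_sizes)   # [207, 7]
print("reordered: ", reordered_sizes)  # [1, 7]
```

With dc:creator first, the evaluator carries 207 partial solutions into the second pattern; with the foaf:name pattern first, it carries just one. Both orderings return the same seven books; only the amount of intermediate work differs.
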
That's not to say I don't see SPARQL becoming an integral part of Web 2.0. I envisage that the next generation of back-end storage products will be based on triples, inferencing, and rules, and SPARQL will be the query language used to interface with the back end. At the web tier, services will continue to be built on RESTful principles; however, more services will expose data as RDF and publish schemas based on RDFS, OWL, etc. to enhance the meaning of that data. On the client side, aggregators, smushers, inferencers and provers will be fundamental building blocks, with high-level, special-purpose APIs written to interface with them (e.g. BlogEd's RDF JavaBeans classes). I think there is room for SPARQL at this level too, but it's likely to be too general and complex for your average application programmer.
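
In FOAF practice, "smushing" usually means merging nodes that share a value for an inverse functional property such as foaf:mbox. The following is a minimal Python sketch of a client-side smusher, assuming that only foaf:mbox is treated as inverse functional, that each source arrives as plain (subject, predicate, object) string tuples, and using made-up blank-node IDs and example.org addresses. The union-find merging is simply one straightforward way to do it, not how any particular aggregator works.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Assumption: foaf:mbox is the only inverse functional property considered here.
INVERSE_FUNCTIONAL = {"foaf:mbox"}


def smush(triples: Iterable[Triple]) -> Set[Triple]:
    """Merge nodes that share a value for an inverse functional property."""
    triples = list(triples)
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller ID as canonical, for deterministic output.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    # Two subjects with the same (property, value) pair denote the same thing.
    first_subject: Dict[Tuple[str, str], str] = {}
    for s, p, o in triples:
        if p in INVERSE_FUNCTIONAL:
            key = (p, o)
            if key in first_subject:
                union(s, first_subject[key])
            else:
                first_subject[key] = s

    def canon(term: str) -> str:
        # Terms that were never merged (including literals) are returned unchanged.
        return find(term) if term in parent else term

    # Rewrite every triple with canonical node IDs; the set drops duplicates.
    return {(canon(s), p, canon(o)) for s, p, o in triples}


# Hypothetical data from two independent sources describing the same person.
source_a = [
    ("_:a1", "foaf:name", "Alice"),
    ("_:a1", "foaf:mbox", "mailto:alice@example.org"),
]
source_b = [
    ("_:b7", "foaf:mbox", "mailto:alice@example.org"),
    ("_:b7", "foaf:weblog", "http://blog.example.org/"),
    ("_:b9", "foaf:knows", "_:b7"),
]

merged = smush(source_a + source_b)
for triple in sorted(merged):
    print(triple)
```

Running it merges `_:a1` and `_:b7` into a single node, so the four unique triples that remain describe one Alice, and the foaf:knows link from `_:b9` now points at `_:a1`.
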
