RDF Triple Stores
Posted by Martin Homik | Posted in Semantic Web | Posted on 04-07-2007
For my competency matching task, I have been looking at different RDF triple stores. I am a newbie in this field, so I do not yet understand the rationale, advantages, and details of the different stores. So I will base my design decision on impressions. My requirements are:
- Easy to use.
- OWL DL support.
- Support for SPARQL.
- Should run in server mode and provide web services.
- Existing documentation.
- Active community.
- Roadmap.
I started to develop an ontology for competencies with Protégé. I was able to test the ontology using RacerPro and even to retrieve simple information with a few first SPARQL queries. However, I did not know how to formulate my exact matching query in SPARQL: “Find all job competency profiles whose competencies are a subset of a person’s competencies.” So the quest for a suitable RDF store began. Here are the candidates.
Jena. I think Jena is the best-known RDF triple store in the field. The documentation is very good, and it also comes with some extra tools developed by third parties. It provides a programmatic environment for RDF, RDFS, OWL, and SPARQL, and includes a rule-based inference engine. It can also be used as an RDF database via its Joseki layer. But after a while, I got the impression that Semantic Web people had recently come to favour Sesame. As far as I remember, they criticised Jena for lacking a web framework and a SPARQL interface, which is a bit puzzling, because it has both. Maybe it did not at the time. They also pointed me to Kowari and Mulgara. The Mulgara Semantic Store is an open source, massively scalable, transaction-safe, purpose-built database for the storage and retrieval of RDF, written in Java. It is an active fork of Kowari. Another reason why I dropped Jena from my list is that Sesame comes with a more modern architecture. Even Boca switched from a hardwired Jena RDF model to Sesame’s open RDF model.
Sesame. Sesame seems to be the most modern approach. It comes with a plugin architecture, which makes it very modular. Also, Version 2 offers interfaces to the Spring Framework, which makes it even more modular. Finally, it can be deployed to Tomcat so that one can upload an ontology and test queries in the browser. That’s not bad.
But what convinced me most is its documentation, active community, and roadmap. I have the impression that these guys will continue developing the system and really care about their community. When I posted my request on competency matching in the forum, I received an answer the next day. The suggested query was a bit complicated, but it gave me very good insight into how to formulate queries. The query was in SeRQL, a query language and engine developed by the Sesame people. I am not quite sure whether I can reformulate that query in SPARQL, because it uses nested WHERE clauses and existential quantifiers (in a negated form).
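The matching condition itself can be made concrete outside any query language. The following is a minimal sketch in plain Python, not the SeRQL query from the forum: it assumes competencies are modelled as sets of strings, and all profile and competency names are invented for illustration. A job profile matches if there does not exist a required competency that the person lacks, which is the negated existential form mentioned above.

```python
def matching_profiles(person_competencies, job_profiles):
    """Return the names of job profiles whose required competencies
    are all held by the person (a subset test, written as a double negation)."""
    matches = []
    for name, required in job_profiles.items():
        # "There exists a required competency that the person does not have."
        lacks_something = any(c not in person_competencies for c in required)
        # The profile matches when no such competency exists.
        if not lacks_something:
            matches.append(name)
    return sorted(matches)


# Hypothetical example data, not taken from the actual ontology.
person = {"java", "sparql", "owl"}
profiles = {
    "ontology_engineer": {"owl", "sparql"},
    "web_developer": {"java", "javascript"},
    "generalist": set(),
}

print(matching_profiles(person, profiles))
```

Note that a profile with no required competencies matches every person under this definition, so a real query might need to exclude empty profiles explicitly.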
Boca. Boca is another RDF triple store. I came across it via the IBM Semantic Layered Research Platform. Their goal is to come up with a full application framework based on a semantic RDF store. The framework is quite impressive and supports many sophisticated features such as client support, offline persistence with replication, notification, access control, versioning, and much more. It was easy to install the server application and to run some client examples.
Unfortunately, on the flip side, they do not seem to support any reasoning. That is, they just provide the RDF store and a SPARQL query interface, but there is no OWL reasoner. Also, though it is a very interesting project, the community is rather small, and I do not know whether the project will die one day. Finally, I have the impression that Sesame might catch up with Boca’s features in Version 2.
Conclusion. I chose the following setting. In the beginning, I use Protégé to develop the ontology and to include some individuals for testing. Then I load the ontology into Sesame and test my queries. This phase is just for testing what can be done and where the limitations are. Sesame fitted my requirements best.
In the next phase, a user-friendly interface for instantiating competencies and profiles is needed. I am not quite sure whether I want to save this information straight into the RDF store. A much more reliable and stable solution might be to base this phase on a classic application framework with persistence and a relational database, and to write a converter which converts data from the relational database into RDF statements that can then be uploaded to the RDF store. This alternative would let me use many state-of-the-art best practices. Or maybe I should also check Jastor. Jastor is an open source Java code generator that emits Java Beans from Web Ontologies (OWL). Let’s see.
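The converter idea could look roughly like the sketch below. It is only an illustration under several assumptions: the table `person_competency`, its columns, the namespace `http://example.org/competency#`, and the property `hasCompetency` are all hypothetical, and SQLite stands in for whatever relational database the application framework would use. The output is N-Triples, one statement per line, which could then be uploaded to the RDF store.

```python
import sqlite3
from urllib.parse import quote

# Hypothetical namespace; a real ontology would define its own.
BASE = "http://example.org/competency#"


def iri(local_name):
    """Build an N-Triples IRI term, percent-encoding unsafe characters."""
    return f"<{BASE}{quote(local_name, safe='')}>"


def rows_to_ntriples(conn):
    """Convert each (person, competency) row into one N-Triples statement."""
    cursor = conn.execute(
        "SELECT person, competency FROM person_competency "
        "ORDER BY person, competency"
    )
    return [
        f"{iri(person)} {iri('hasCompetency')} {iri(competency)} ."
        for person, competency in cursor
    ]


# In-memory database with invented sample rows, standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_competency (person TEXT, competency TEXT)")
conn.executemany(
    "INSERT INTO person_competency VALUES (?, ?)",
    [("alice", "sparql"), ("alice", "owl")],
)

for line in rows_to_ntriples(conn):
    print(line)
```

Keeping the conversion a one-way export like this means the relational database stays the system of record, and the RDF store can be rebuilt from it at any time.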


Hello.
The article is very good.
In our group, we are starting with something like this and we have the same problem.
There is also an API for manipulating OWL (OWL-API), but I think it gives no support for SPARQL. Its advantage may be that it supports OWL 1.1, which is the proposal for the next version of OWL.
….
We are working with Jena, but we want to evaluate other libraries, frameworks, etc.
Fernando Carpani.
[...] Now we have only three real candidates: Sesame, Open Anzo and Mulgara. Unfortunately, we dropped Jena from our list because it has the old annoying graph model and some problems with scalability (see more in “RDF Triple Stores”). [...]
Dear Fernando, thanks a lot for your comment. I haven’t investigated other triple stores any further. Instead, we decided to use Sesame, because it is a well-known and well-documented project. Moreover, I know some other researchers who use it, so I can ask them if I have questions. Also, the Sesame forum is quite active.
Anyway, if there is time, I’ll look into your two other suggestions. I once came across Mulgara, but I was advised not to consider it for serious use, as it was a fresh fork of Kowari at that time.