Search sites continued …

Posted by Martin Homik | Posted in Semantic Web | Posted on 14-08-2008

Search sites that base their knowledge on semantic information are clearly progressing. Watch the awesome video of Freebase Parallax. There is another interface here.

Java Data Binding / Data-driven approach

Posted by Martin Homik | Posted in Java, Semantic Web, WebApp | Posted on 04-12-2007

I have to write this article for myself, because I am very forgetful. In a world dominated by so many technical terms and products, it is not easy to find the right sources and solutions. My scenario is the following: I want to build a web application that includes database storage and remote services. This involves a set of classes, a database schema, and an XML representation of my data model. In addition, the XML representation can comply with some well-known XSD standard, or it can be formulated in a Semantic Web language such as OWL. Because so many components interact with each other, they share a common data model, but each component uses its own representation of the information.

  • The relational database uses a database schema formulated in SQL.
  • The Java classes comply with a Java specification.
  • Semantic triple stores are based on RDFS and/or OWL.
  • Web Services describe their interfaces via WSDL.
  • And finally, there are also XSD definitions of a data model for interoperability.

The ultimate question is, how can I select one representation and generate all other representations? This approach is data-driven.
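To make the data-driven idea concrete, here is a minimal sketch, written in Python for brevity rather than Java. It assumes a tiny, hypothetical model description (`Entity`, `Field`) and two hand-written generators; the type mappings are illustrative only, and none of this reflects how the tools discussed in this post work internally.

```python
from dataclasses import dataclass

# Hypothetical, minimal description of one entity in the shared data model.
@dataclass(frozen=True)
class Field:
    name: str
    kind: str  # one of "string", "integer", "boolean"

@dataclass(frozen=True)
class Entity:
    name: str
    fields: tuple

# Type mappings per target representation (illustrative, not exhaustive).
SQL_TYPES = {"string": "VARCHAR(255)", "integer": "INTEGER", "boolean": "BOOLEAN"}
XSD_TYPES = {"string": "xs:string", "integer": "xs:int", "boolean": "xs:boolean"}

def to_sql(entity):
    """Render a CREATE TABLE statement for the entity."""
    columns = ",\n".join(f"  {f.name} {SQL_TYPES[f.kind]}" for f in entity.fields)
    return f"CREATE TABLE {entity.name} (\n{columns}\n);"

def to_xsd(entity):
    """Render an XSD complexType fragment for the entity."""
    elements = "\n".join(
        f'    <xs:element name="{f.name}" type="{XSD_TYPES[f.kind]}"/>'
        for f in entity.fields
    )
    return (
        f'<xs:complexType name="{entity.name}">\n'
        f"  <xs:sequence>\n{elements}\n  </xs:sequence>\n"
        f"</xs:complexType>"
    )

# One source description, two generated representations.
person = Entity("Person", (Field("name", "string"), Field("age", "integer")))
print(to_sql(person))
print(to_xsd(person))
```

The point of the sketch is only that one representation is chosen as the source and every other one is derived from it, so the representations cannot drift apart.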

Hibernate Tools allow you to create a set of POJOs from a database schema and, vice versa, a database schema from a set of JPA-annotated classes. AppFuse applies Hibernate Tools to create the database from POJOs. POJOs also serve as sources for generating WSDL files. Java XML binding frameworks such as JAXB, JaxMe, or XMLBeans help to create POJOs from XSD files. POJOs created by these tools can be serialized to XML files that conform to the original XSD. And finally, RDFReactor can take an OWL file and create a POJO which can be used for persisting into a triple store.

So, you see, the flow is not easy, because it is interrupted. While you can create POJOs with Java XML binding tools, these POJOs are not annotated with JPA annotations, so an automatic generation of a database schema is not possible. In any case, if a solution exists that unifies all these streams, it should be easy to use. And this is hard to find.

Java XML Binding:

  • JAXB This is a reference implementation. There is a lot of activity here and a huge community.
  • JaxMe 2 In my opinion, this started as an ambitious project but lost much of its momentum. I do not know whether it has been stopped, but the last news message is from 2006. A pity, because it delivered a “complete” approach tying POJOs/Beans, XML files, and (XML) databases together.
  • XMLBeans Though still one of the most active projects, I have always considered this a package monster. Good for Java/XML binding.
  • Castor provides Java-to-XML binding, Java-to-SQL persistence, and more.
  • JiBX is a fast Java XML binding framework.

Java Database Binding:

Java RDF/OWL Binding:

  • Jastor This is an open source Java code generator that emits Java Beans from Web Ontologies (OWL), enabling convenient, type-safe access to and eventing of RDF stored in a Jena Semantic Web Framework model. As mentioned, it is programmed against Jena, which makes an application dependent on a particular RDF triple store.
  • RDFReactor RDFReactor views the RDF data model through object-oriented Java proxies. It makes using RDF easy for Java developers. It is independent of a specific RDF store. It is still under development and is more of a research project.

Application Frameworks

Further Reading:

RDF Triple Stores

Posted by Martin Homik | Posted in Semantic Web | Posted on 04-07-2007

For my competency matching task, I have been looking at different RDF triple stores. I am a newbie in this field, so I do not yet understand the reasons, advantages, and details of those different stores. So, I will base my design decision on an impression. My requirements are:

  • Easy to use.
  • OWL DL support.
  • Support for SPARQL.
  • Should run in server mode and provide web services.
  • Existing documentation.
  • Lively community.
  • Roadmap.

I started to develop an ontology for competencies with Protégé. I was able to test the ontology using RacerPro and even to retrieve simple information with some first SPARQL queries. However, I did not know how to formulate my exact matching query in SPARQL: “Find all job competency profiles whose competencies are a subset of a person’s competencies.” The quest for a suitable RDF store began. Here are the candidates.

Jena. I think Jena is the best-known RDF triple store in the field. The documentation is very good, and it also comes with some extra tools developed by third parties. It provides a programmatic environment for RDF, RDFS, OWL, and SPARQL, and includes a rule-based inference engine. It can also be used as an RDF database via its Joseki layer. But after a while, I somehow got the impression that Semantic Web people had recently come to favour Sesame. As far as I remember, they criticised Jena for lacking a web framework and a SPARQL interface, which is a bit annoying, because it has both. Maybe it did not at the time. They also pointed me to Kowari and Mulgara. The Mulgara Semantic Store is an open source, massively scalable, transaction-safe, purpose-built database for the storage and retrieval of RDF, written in Java. It is an active fork of Kowari. Another reason why I dropped Jena from my list is that Sesame comes with a more modern architecture. And even Boca turned from a hardwired Jena RDF model to Sesame’s open RDF model.

Sesame. Sesame seems to be the most modern approach. It comes with a plugin architecture which makes it very modular. Also, in Version 2 it offers interfaces to the Spring Framework, which makes it even more modular. Finally, it can be deployed to Tomcat such that one can upload an ontology and test queries in the browser. That’s not bad.

But what convinced me most is its documentation, lively community, and roadmap. I have the impression that these guys will continue developing the system and really care about their community. When I posted my request on competency matching in the forum, I received an answer the next day. The suggested query was a bit complicated, but it gave me a very good insight into how to formulate queries. The query was in SeRQL, a query language/engine developed by the Sesame people. I am not quite sure whether I can reformulate that query in SPARQL, because it uses nested WHEREs and existential quantifiers (in a negated form).
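The logic behind the negated existential form can be sketched outside of any query language. The following is a minimal illustration in plain Python with hypothetical sample data; it is not the SeRQL query from the forum. A job profile matches when there does not exist a required competency that the person lacks, which is exactly the subset condition.

```python
# Hypothetical sample data: competencies required by job profiles
# and competencies held by one person.
job_profiles = {
    "java_developer": {"java", "sql"},
    "ontology_engineer": {"owl", "sparql", "java"},
    "project_manager": {"planning"},
}
person_competencies = {"java", "sql", "owl"}

def matching_profiles(profiles, held):
    """Return the names of profiles for which no required competency is missing."""
    return sorted(
        name
        for name, required in profiles.items()
        # Negated existential: NOT EXISTS c in required such that c is not held.
        if not any(c not in held for c in required)
    )

result = matching_profiles(job_profiles, person_competencies)
print(result)  # ['java_developer']
```

In Python the same test could be written as `required <= held`; the spelled-out version mirrors the shape a query with a negated existential quantifier has to take.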

Boca. Boca is another RDF triple store. I came across it via the IBM Semantic Layered Research Platform. Their goal is to come up with a full application framework based on a semantic RDF store. The framework is quite impressive and supports many sophisticated features such as client support, offline persistence with replication, notification, access control, versioning, and much more. It was easy to install the server application and to run some client examples.

Unfortunately, on the flip side, it seems that they do not support any reasoning. That is, they just provide the RDF store and a SPARQL query interface, but there is no OWL reasoner. Also, though it is a very interesting project, the community is rather small and I do not know whether this project will die one day. Finally, I have the impression that Sesame might catch up with Boca’s features in Version 2.

Conclusion. I chose the following setup. In the beginning, I use Protégé to develop the ontology and to include some individuals for testing. Then I load the ontology into Sesame and test my queries. This phase is just for testing what can be done and where the limitations are. Sesame fitted my requirements best.

In the next phase, a user-friendly interface to instantiate competencies and profiles is needed. I am not quite sure whether I want to save this information straight into the RDF store. A much more reliable and stable solution might be to base this phase on a classic application framework including persistence and a relational database, and to write a converter which converts data from the relational database to RDF statements that can then be uploaded to the RDF store. This alternative makes it possible to use many state-of-the-art best practices. Or maybe I should also check Jastor. Jastor is an open source Java code generator that emits Java Beans from Web Ontologies (OWL). Let’s see.

Topic Maps vs. Tagging

Posted by Martin Homik | Posted in Semantic Web | Posted on 25-06-2007

When talking about references to topics in general, a fellow researcher always preferred to talk in terms of Topic Maps. The other day, I had time to read about Topic Maps. The following brief description is taken from Wikipedia:

Topic maps are an ISO standard for the representation and interchange of knowledge, with an emphasis on the findability of information. The standard is formally known as ISO/IEC 13250:2003.

Topic Maps illustration

A topic map can represent information using topics (representing any concept, from people, countries, and organizations to software modules, individual files, and events), associations (which represent the relationships between them), and occurrences (which represent relationships between topics and information resources relevant to them). They are thus similar to semantic networks and both concept and mind maps in many respects. In loose usage all those concepts are often used synonymously, though only topic maps are standardized.

The interested reader might also read the following documents:

When I finished reading the documents, I asked myself how Topic Maps differ from tagging with respect to usage. Tagging is so simple: you bookmark and annotate bookmarks using whatever terms come to mind. In this respect, tagging is a representation of our associative memory, and hence it is highly individual. Adding an extra level of complexity, as in Topic Maps, never seemed feasible to me in the sense that people would like to use it. They prefer simplicity. However, it is very easy to map tags into topic maps, and topic maps have some interesting advantages, as stated by Lars Marius Garshol in his weblog: multiple names, different topics with the same name, and relationships between topics. These features introduce a much more reliable level of structuring knowledge. Have a look at Fuzzy.com, for instance.
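As a rough illustration of how tags could be mapped into a topic map structure, here is a minimal Python sketch. The class names (`Topic`, `Association`), the helper `topics_from_tags`, and the sample bookmarks are all hypothetical; this is not the ISO data model, only enough structure to show the three advantages listed above.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    topic_id: str                                   # identity, independent of any name
    names: set = field(default_factory=set)         # a topic may carry several names
    occurrences: set = field(default_factory=set)   # resources, e.g. bookmarked URLs

@dataclass(frozen=True)
class Association:
    kind: str     # the type of relationship
    source: str   # topic_id of the first topic
    target: str   # topic_id of the second topic

def topics_from_tags(bookmarks):
    """Map each tag to one topic and each tagged URL to an occurrence of it."""
    topics = {}
    for url, tags in bookmarks.items():
        for tag in tags:
            topic = topics.setdefault(tag, Topic(topic_id=tag, names={tag}))
            topic.occurrences.add(url)
    return topics

bookmarks = {
    "http://example.org/a": ["java", "rdf"],
    "http://example.org/b": ["rdf"],
}
topics = topics_from_tags(bookmarks)

# Multiple names for one topic.
topics["rdf"].names.add("Resource Description Framework")
# A different topic that shares the name "java".
topics["java-island"] = Topic("java-island", {"java"})
# A relationship between two topics.
associations = [Association("used-with", "java", "rdf")]

print(sorted(t.topic_id for t in topics.values() if "java" in t.names))
```

Plain tags collapse everything named "java" into one bucket; keeping an identity separate from the names is what lets the two topics coexist.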

Semantic Knowledge Representations

Posted by Martin Homik | Posted in Semantic Web | Posted on 22-06-2007

This blog entry is mostly for myself. It serves as a note to remember the differences between common knowledge representations. All of the information is taken from the book A Semantic Web Primer, which I highly recommend. The Semantic Web is a huge topic, and it is not easy to grasp the concepts by only searching for web resources. I needed one place, written by one author (or group of authors), that explains in a clear and consistent way all the different knowledge representations that play a major role in the Semantic Web. Here, I will list the knowledge representations together with their “you have to know” facts. In the end, you will have a brief overview of which representation carries semantics and what kind of logic constructs it supports.

XML and XSL

  • XML allows the representation of information that is also machine-readable. Hence, XML can serve as a uniform exchange format between applications.
  • XML separates content from formatting.
  • XML is a meta-language for markup: it does not have a fixed set of tags but allows users to define tags of their own.
  • Nesting of tags introduces structure. The structure of XML documents can be defined/enforced by DTDs or by XML Schemas. Note, the nesting of tags has no standard meaning.
  • The semantics of XML documents is not accessible to machines, only to people.
  • Collaboration and exchange are supported if there is an underlying shared understanding of the vocabulary. XML is well-suited for close collaboration, where domain- or community-based vocabularies are used. It is not so well-suited for global communication.
  • Namespaces support the modularisation of DTDs and XML Schemas.
  • Accessing and querying of XML documents can be done by using XPath.
  • Transformation of XML documents can be done by using XSL and XSLT.

RDF and RDFS

  • RDF provides a foundation for representing and processing metadata.
  • RDF has a graph-based data model. Its key concepts are resource, property, and statement. A statement is a resource-property-value triple.
  • RDF has an XML-based syntax to support syntactic interoperability. XML and RDF complement each other because RDF supports semantic interoperability. Note that XML is just one possible representation, which is handy for interoperability.
  • RDF has a decentralised philosophy and allows incremental building of knowledge, and its sharing and reuse.
  • RDF is domain-independent. RDF Schema provides a mechanism for describing specific domains, that is, for defining a terminology.
  • RDF Schema is a primitive ontology language. It offers certain modelling primitives with fixed meaning. Key concepts of RDF Schema are class, subclass relations, property, subproperty relations, and domain and range restrictions.
  • XML Schema constrains the structure of XML documents, whereas RDF Schema defines the vocabulary used in RDF data models.
  • RDFS makes semantic information machine-accessible.
  • RDF supports reification: making statements about statements. This introduces some complexity.
  • In XML, namespaces are only used for disambiguation purposes. In RDF, external namespaces are expected to be RDF documents defining resources, which are then used in the importing RDF document.
  • RDF inference systems implement only a few dozen rules. All those rules can be efficiently implemented. These systems do not rely on first-order logic. The inference systems are sound and complete.
  • Range definitions in RDF Schema are not used to restrict the range of a property, but rather to infer that the values of the property are members of the range class.
  • There exist query languages for RDF and RDFS such as RQL or SPARQL. Those query languages do not need to understand the document structure. They operate on the graph data model.
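Several of the points above (statements as triples, querying the graph rather than a document, and range definitions used for inference) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the `ex:` resources are invented, `match` and `infer` are hypothetical helpers, and only two RDFS-style rules are implemented, not the full rule set.

```python
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"
RDFS_SUBCLASS = "rdfs:subClassOf"

# A graph is just a set of (resource, property, value) triples.
triples = {
    ("ex:teaches", RDFS_RANGE, "ex:Course"),
    ("ex:Course", RDFS_SUBCLASS, "ex:Offering"),
    ("ex:alice", "ex:teaches", "ex:logic101"),
}

def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return {
        (ts, tp, to)
        for (ts, tp, to) in graph
        if s in (None, ts) and p in (None, tp) and o in (None, to)
    }

def infer(graph):
    """Apply two RDFS-style rules until nothing new is derived."""
    inferred = set(graph)
    while True:
        new = set()
        # Range rule: (p range C) and (x p y)  =>  (y type C)
        for (prop, _, cls) in match(inferred, p=RDFS_RANGE):
            for (_, _, value) in match(inferred, p=prop):
                new.add((value, RDF_TYPE, cls))
        # Subclass rule: (x type C) and (C subClassOf D)  =>  (x type D)
        for (x, _, cls) in match(inferred, p=RDF_TYPE):
            for (_, _, parent) in match(inferred, s=cls, p=RDFS_SUBCLASS):
                new.add((x, RDF_TYPE, parent))
        if new <= inferred:
            return inferred
        inferred |= new

closure = infer(triples)
for triple in sorted(match(closure, s="ex:logic101", p=RDF_TYPE)):
    print(triple)
```

Note that the range definition rejects nothing: the graph never said that `ex:logic101` is a course, and the rule simply concludes that it is one.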

OWL

  • OWL is the proposed standard for Web ontologies. It allows us to describe the semantics of knowledge in a machine-accessible way.
  • OWL builds upon RDF and RDF Schema: (XML-based) RDF syntax is used; instances are defined using RDF descriptions; and most RDFS modelling primitives are used.
  • Formal semantics and reasoning support are provided through the mapping of OWL onto logics. Predicate logic and description logics have been used for this purpose.

Limitations of the Expressive Power of RDF Schema

  • Local scope of properties: We cannot declare range restrictions that apply to some classes only.
  • Disjointness of classes.
  • Boolean combinations of classes.
  • Cardinality restrictions.
  • Special characteristics of properties: transitive, inverse, functional, etc.

Formal Semantics and Reasoning Support

  • Formal semantics describes the meaning of knowledge precisely. It allows us to reason about knowledge:
  • Class membership.
  • Equivalence of classes.
  • Consistency.
  • Classification.

Formal semantics and reasoning support are usually provided by mapping an ontology language to a known logical formalism, and by using automated reasoners that already exist for those formalisms. OWL is (partially) mapped onto a description logic, and makes use of existing reasoners such as FaCT and RACER. Description logics are a subset of predicate logic for which efficient reasoning support is possible.
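To give a feel for two of the reasoning tasks listed above, class membership and consistency, here is a toy Python sketch. It assumes a hypothetical ontology given as plain dictionaries with subclass and disjointness statements only; it is in no way a description logic reasoner like FaCT or RACER.

```python
def superclasses(cls, subclass_of):
    """Return cls together with all its direct and indirect superclasses."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen

def inconsistencies(types, subclass_of, disjoint):
    """List individuals whose inferred classes contain a disjoint pair."""
    problems = []
    for individual, asserted in sorted(types.items()):
        # Class membership: follow subclass relations upwards.
        inferred = set()
        for cls in asserted:
            inferred |= superclasses(cls, subclass_of)
        # Consistency: no individual may belong to two disjoint classes.
        for a, b in disjoint:
            if a in inferred and b in inferred:
                problems.append((individual, a, b))
    return problems

# Hypothetical ontology and individuals.
subclass_of = {"Professor": ["Staff"], "Staff": ["Person"], "Student": ["Person"]}
disjoint = [("Staff", "Student")]
types = {"alice": ["Professor"], "bob": ["Professor", "Student"]}

print(inconsistencies(types, subclass_of, disjoint))  # [('bob', 'Staff', 'Student')]
```

The conflict for `bob` only shows up after his membership in `Staff` has been inferred, which is why class membership and consistency checking go hand in hand.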

Species of OWL

  • OWL Full:
    • the full language is powerful,
    • but undecidable (no complete or efficient reasoning support)
    • fully compatible with RDF
    • mapping to predicate logic needed
  • OWL DL:
    • Application of OWL’s constructors to each other is disallowed
    • Loses full compatibility with RDF, but every legal OWL DL document is a legal RDF document
    • gains efficiency due to a mapping to description logic
  • OWL Lite:
    • no enumerated classes
    • no disjointness statements
    • no arbitrary cardinality

This page will be continuously updated.