Topic Maps vs. Tagging

Posted by Martin Homik | Posted in Semantic Web | Posted on 25-06-2007

When talking about references to topics in general, a fellow researcher always preferred to talk in terms of Topic Maps. The other day, I had time to read about Topic Maps. The following brief description is taken from Wikipedia:

Topic maps are an ISO standard for the representation and interchange of knowledge, with an emphasis on the findability of information. The standard is formally known as ISO/IEC 13250:2003.

Topic Maps illustration

A topic map can represent information using topics (representing any concept, from people, countries, and organizations to software modules, individual files, and events), associations (which represent the relationships between them), and occurrences (which represent relationships between topics and information resources relevant to them). They are thus similar to semantic networks and both concept and mind maps in many respects. In loose usage all those concepts are often used synonymously, though only topic maps are standardized.

The interested reader might also read the following documents:

When I finished reading the documents, I asked myself how Topic Maps differ from tagging with respect to usage. Tagging is so simple: you bookmark something and annotate the bookmark with whatever terms come to mind. In this respect, tagging is a representation of our associative memory, and hence it is highly individual. Adding an extra level of complexity, as Topic Maps do, never seemed feasible to me in the sense that people would like to use it. They prefer simplicity. However, it is very easy to map tags into topic maps, and topic maps have some interesting advantages, as stated by Lars Marius Garshol in his weblog: multiple names, different topics with the same name, and relationships between topics. These features introduce a much more reliable level of structuring knowledge. Have a look at Fuzzy.com, for instance.
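The three advantages can be made concrete with a small sketch. It does not use any Topic Maps library or the standard's interchange syntax; the names Topic, TopicMap, add_tag and find_by_name, as well as the example URLs, are made up for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A subject with one or more names and the resources tagged with it."""
    topic_id: str
    names: set = field(default_factory=set)
    occurrences: set = field(default_factory=set)  # e.g. bookmark URLs


@dataclass
class TopicMap:
    """Hypothetical container: topics plus (topic, relation, topic) associations."""
    topics: dict = field(default_factory=dict)
    associations: set = field(default_factory=set)

    def add_tag(self, topic_id, tag, url):
        """Map a plain tag on a bookmark to a topic (created on first use)."""
        topic = self.topics.setdefault(topic_id, Topic(topic_id))
        topic.names.add(tag)
        topic.occurrences.add(url)
        return topic

    def find_by_name(self, name):
        """Several distinct topics may share one name."""
        return [t for t in self.topics.values() if name in t.names]


tm = TopicMap()
tm.add_tag("python-language", "python", "http://example.org/tutorial")
tm.add_tag("python-language", "py", "http://example.org/cheatsheet")  # second name, same topic
tm.add_tag("python-snake", "python", "http://example.org/reptiles")   # same name, other topic
tm.associations.add(("python-language", "is-a", "programming-language"))
```

With plain tags, the two "python" bookmarks would be indistinguishable; here find_by_name("python") returns two separate topics, and the association is something a flat tag list cannot express at all.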

Semantic Knowledge Representations

Posted by Martin Homik | Posted in Semantic Web | Posted on 22-06-2007

This blog entry is mostly for myself. It serves as a note to remember the differences between common knowledge representations. All information is taken from the book A Semantic Web Primer, which I highly recommend. The Semantic Web is a huge topic, and it is not easy to grasp the concepts by only searching for web resources. I needed one place, written by one author (or group of authors), that explains intelligibly and consistently all the different knowledge representations that play a major role in the Semantic Web. Here, I will list the knowledge representations together with their “you have to know” facts. In the end, you will have a brief overview of which representation carries semantics and what kind of logic constructs it supports.

XML and XSL

  • XML allows the representation of information that is also machine-readable. Hence, XML can serve as a uniform exchange format between applications.
  • XML separates content from formatting.
  • XML is a meta-language for markup: it does not have a fixed set of tags but allows users to define tags of their own.
  • Nesting of tags introduces structure. The structure of XML documents can be defined/enforced by DTDs or by XML Schemas. Note that the nesting of tags has no standard meaning.
  • The semantics of XML documents is not accessible to machines, only to people.
  • Collaboration and exchange are supported if there is an underlying shared understanding of the vocabulary. XML is well-suited for close collaboration, where domain- or community-based vocabularies are used. It is not so well-suited for global communication.
  • Namespaces support the modularisation of DTDs and XML Schemas.
  • Accessing and querying of XML documents can be done by using XPath.
  • Transformation of XML documents can be done by using XSL and XSLT.
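A minimal sketch of the XPath point, using Python's standard xml.etree.ElementTree module, which supports a limited XPath subset. The document and its tag names (courses, course, title, lecturer) are invented for this example; nothing tells the machine what "lecturer" means, it only matches the structure.

```python
import xml.etree.ElementTree as ET

# User-defined tags: XML itself prescribes none of these names.
doc = """
<courses>
  <course id="c1"><title>Discrete Mathematics</title><lecturer>Smith</lecturer></course>
  <course id="c2"><title>Logic</title><lecturer>Jones</lecturer></course>
</courses>
"""

root = ET.fromstring(doc)

# XPath-style query: titles of all courses whose <lecturer> child has the text "Smith".
titles = [t.text for t in root.findall("./course[lecturer='Smith']/title")]
print(titles)
```

The query depends on the exact nesting of the document. If another application nested course inside lecturer instead, the same information would need a different query, which is the interoperability problem the RDF section addresses.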

RDF and RDFS

  • RDF provides a foundation for representing and processing metadata.
  • RDF has a graph-based data model. Its key concepts are resource, property, and statement. A statement is a resource-property-value triple.
  • RDF has an XML-based syntax to support syntactic interoperability. XML and RDF complement each other because RDF supports semantic interoperability. Note that XML is just one possible representation, which is handy for interoperability.
  • RDF has a decentralised philosophy and allows incremental building of knowledge, and its sharing and reuse.
  • RDF is domain-independent. RDF Schema provides a mechanism for describing specific domains, that is, for defining a terminology.
  • RDF Schema is a primitive ontology language. It offers certain modelling primitives with fixed meaning. Key concepts of RDF Schema are class, subclass relations, property, subproperty relations, and domain and range restrictions.
  • XML Schema constrains the structure of XML documents, whereas RDF Schema defines the vocabulary used in RDF data models.
  • RDFS makes semantic information machine-accessible.
  • RDF supports reification: making statements about statements. This introduces some complexity.
  • In XML, namespaces are only used for disambiguation purposes. In RDF, external namespaces are expected to be RDF documents defining resources, which are then used in the importing RDF document.
  • RDF inference systems implement only a few dozen rules. All those rules can be efficiently implemented. These systems do not rely on first-order logic. The inference systems are sound and complete.
  • Range definitions in RDF Schema are not used to restrict the range of a property, but rather to infer the membership of the range.
  • There exist query languages for RDF and RDFS such as RQL or SPARQL. Those query languages do not need to understand the document structure. They operate on the graph data model.
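The triple model and the remark about range definitions can be illustrated with a toy forward-chaining loop in plain Python. This is only a sketch of four RDFS-style rules (domain, range, subclass transitivity, type inheritance), not a complete RDFS reasoner and not the API of any RDF framework; the ex: resources are invented.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Apply a handful of RDFS-style rules until no new triple appears."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # (P range C), (x P y)  =>  (y type C): range infers membership
                if p == RANGE and p2 == s:
                    new.add((o2, RDF_TYPE, o))
                # (P domain C), (x P y)  =>  (x type C)
                if p == DOMAIN and p2 == s:
                    new.add((s2, RDF_TYPE, o))
                # (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                # (x type A), (A subClassOf B)  =>  (x type B)
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


# Statements are resource-property-value triples (hypothetical example data).
triples = {
    ("ex:teaches", RANGE, "ex:Course"),
    ("ex:teaches", DOMAIN, "ex:Lecturer"),
    ("ex:Lecturer", SUBCLASS, "ex:StaffMember"),
    ("ex:alice", "ex:teaches", "ex:logic101"),
}
closure = rdfs_closure(triples)
```

Nothing in the input says that ex:logic101 is a course. The range definition does not reject the statement; it leads to the conclusion that ex:logic101 must be an ex:Course.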

OWL

  • OWL is the proposed standard for Web ontologies. It allows us to describe the semantics of knowledge in a machine-accessible way.
  • OWL builds upon RDF and RDF Schema: (XML-based) RDF syntax is used; instances are defined using RDF descriptions; and most RDFS modelling primitives are used.
  • Formal semantics and reasoning support is provided through the mapping of OWL on logics. Predicate logic and description logics have been used for this purpose.

Limitations of the Expressive Power of RDF Schema

  • Local scope of properties: We cannot declare range restrictions that apply to some classes only.
  • Disjointness of classes.
  • Boolean combinations of classes.
  • Cardinality restrictions.
  • Special characteristics of properties: transitive, inverse, functional, etc.

Formal Semantics and Reasoning Support

  • Formal semantics describes the meaning of knowledge precisely. It allows us to reason about knowledge:
  • Class membership.
  • Equivalence of classes.
  • Consistency.
  • Classification.

Formal semantics and reasoning support are usually provided by mapping an ontology language to a known logical formalism, and by using automated reasoners that already exist for those formalisms. OWL is (partially) mapped on a description logic, and makes use of existing reasoners such as FaCT and RACER. Description logics are a subset of predicate logic for which efficient reasoning support is possible.

Species of OWL

  • OWL Full:
    • the language is powerful,
    • but undecidable (incomplete, inefficient reasoning support)
    • fully compatible with RDF
    • mapping to predicate logic needed
  • OWL DL:
    • Application of OWL’s constructors to each other is disallowed
    • Loses full compatibility with RDF, but every legal OWL DL document is a legal RDF document
    • gains efficiency due to a mapping to description logic
  • OWL Lite:
    • no enumerated classes
    • no disjointness statements
    • no arbitrary cardinality

This page will be continuously updated.

Simple Reusable Competency Map

Posted by Martin Homik | Posted in e-portfolio | Posted on 19-06-2007

The drawback of the IEEE RCD specification is that competencies cannot be grouped or related to each other. Hence, the structure does not allow

  1. to build complex competencies from simpler ones, e.g. by using an isPartOf relationship,
  2. nor is it possible to establish or name relationships between concept definitions at all.

Claude Ostyn is working on a new standard data definition, Simple Reusable Competency Map (SRCM), which aims “to be used for describing, referencing, and exchanging data about the relationships between competencies, primarily in the context of online and distributed learning”. The basic idea is to link RCDs in a directed acyclic graph (DAG). This makes it possible to define relations between competencies in a top-down manner. For instance, suppose you have the following DAGs and you interpret edges as an isPartOf relationship. Each node in a DAG can denote either an RCD or an SRCM.

Some DAG topologies with different levels of complexity

  • Read the first DAG as follows. If Y is the only child of X then proficiency in X requires proficiency in Y and proficiency in Y implies proficiency in X. Similarly, proficiency in A requires proficiency in B AND C, and proficiency in B and C implies proficiency in A.
  • Different complex competencies might share simpler sub-competencies.
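The reading of the DAGs given above can be sketched in a few lines of Python. This is only my interpretation of the isPartOf edges, not code from the SRCM draft; nodes X, Y, A, B and C follow the text, while D and E are hypothetical additions to show a shared sub-competency.

```python
# parent -> children, every edge read as "child isPartOf parent"
parts = {
    "X": ["Y"],
    "A": ["B", "C"],
    "D": ["C", "E"],  # hypothetical: D shares sub-competency C with A
}


def is_proficient(node, mastered, parts):
    """A leaf counts if it was mastered directly; a composite node
    counts if all of its children count (the AND reading of the DAG)."""
    children = parts.get(node, [])
    if not children:
        return node in mastered
    return all(is_proficient(child, mastered, parts) for child in children)


mastered = {"B", "C", "Y"}
print(is_proficient("A", mastered, parts))  # both parts of A are mastered
print(is_proficient("D", mastered, parts))  # E is missing
```

Because C appears below both A and D, mastering it once contributes to both composite competencies. The recursion terminates because the graph is acyclic.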

The usage of DAGs and the implicit interpretation of edges as isPartOf solves problem 1). I am not quite sure whether the standard is also open for the description of relationships other than “isPartOf”, which would solve 2). The missing piece is an element in the data model that refers to or describes a relationship. I have to investigate this.

To understand SRCM’s data model, I made some graphical representations. I will briefly describe those. Red nodes denote mandatory elements, blue nodes are optional elements, and edges represent sub-information elements.

Simple Reusable Competency Map data model

  • It is unclear what rcdRef refers to. I could not find any explanation for this.
  • Referential is a boolean placeholder which indicates whether the structure is self-contained or whether it refers to some other SRCMs. In principle, this value could be deduced automatically while going through the graph nodes.
  • The Metadata element can include any information, but it is recommended to stick to LOM here and to put anything else in Extensions.
  • The crucial information is within the Graph element, which contains a list of nodes, a reference to the entry nodes (nodes without parents), and a reference to the “root” default entry node.

So, to break it down, the most interesting information of this structure is inside the node type.

SRCM node structure model

The node type has the following elements:

  • rcdRef refers to an RCD if the node is not a grouping. Otherwise it is nil.
  • SymLink might refer to an SRCM to include a map defined somewhere else.
  • A model element might refer to some model or vocabulary and use a “label” to define a class of nodes, or of RCDs respectively.
  • Each node refers to its parents.
  • Each node lists its children.
  • Each node lists a set of rules.

Relative to the node’s context, child nodes carry additional information:

  • they have a weight which contributes to the calculation of the parent’s proficiency,
  • they have a dataRequired field indicating whether the proficiency information should go into rollup calculations,
  • they have a required proficiency value,
  • and a desired proficiency value.

Note that a competency’s or map’s weight and proficiency values might differ in different contexts. For instance, one might expect higher proficiency values at higher educational levels.

To be honest, I do not understand the difference between required and desired proficiency.

Rules define rollup computations for all children, including the required/desired proficiency, the rollup method, and an additional value depending on the rollup method. Methods can be: all, any, fraction, units, mean, and other. A rule’s proficiency statements are made for the context, and they override the requirements specified for individual nodes of the context.

Examples are:

  • A required 80% proficiency value for any sub-competency, i.e., just one sub-competency needs to be fulfilled in order to imply proficiency in the current node.
  • A proficiency mean of 80% requires that the mean of the proficiency values of the sub-competencies is at least 80%.
  • Node RX references RCD X and specifies a required proficiency of 70%. However, node RX is a child of RA, which specifies that for child RX the required proficiency is 80%. When rolling up competency status information from RX into the competency status information of RA, the required proficiency used to evaluate whether a measure satisfies the requirement will be 80%.
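The three examples can be replayed with a small sketch. It covers only the methods all, any and mean, ignores weights, dataRequired and the desired proficiency, and works with integer percentages. The names Child and rollup are mine, not element names from the SRCM draft.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Child:
    name: str
    measured: int                   # measured proficiency in percent
    required: Optional[int] = None  # requirement stated on the node itself


def rollup(children, method, rule_required=None):
    """Return True if the parent node counts as satisfied.

    Assumption: a requirement stated for the context (rule_required)
    overrides whatever the individual node specifies.
    """
    def threshold(child):
        if rule_required is not None:
            return rule_required
        return child.required if child.required is not None else 0

    if method == "all":
        return all(c.measured >= threshold(c) for c in children)
    if method == "any":
        return any(c.measured >= threshold(c) for c in children)
    if method == "mean":
        if rule_required is None:
            raise ValueError("the mean method needs a required value")
        return mean(c.measured for c in children) >= rule_required
    raise ValueError(f"unsupported rollup method: {method}")


rx = Child("RX", measured=75, required=70)
print(rollup([rx], "all"))                    # judged against RX's own 70%
print(rollup([rx], "all", rule_required=80))  # judged against the context's 80%
```

A measure of 75% satisfies node RX on its own terms but fails once the 80% from the context takes over, which is exactly the override behaviour described in the last example.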

Some conclusions:

  1. Finally, a data structure model for relating competencies with each other. At least for isPartOf.
  2. The data structure model mixes competency structures/relations with proficiency information. I am not sure whether this approach meets my expectations.
  3. It is unclear what the difference is between required and desired proficiency.
  4. Why should proficiency information in rules (on the context) override individual node information for the context? Shouldn’t it be the other way round, since nodes are more specific than general rules?
  5. It is unclear what rcdRef means in the context of the rcm element.

Well, it is good to know that there is some solution for the interoperability of complex competency definitions. Whether I will use it at some point, I do not know.

Reusable Competency Definitions (RCD)

Posted by Martin Homik | Posted in e-portfolio | Posted on 18-06-2007

The IEEE 1484.20.1 standard defines a data model for describing, referencing, and sharing competency definitions. It is quite simple, and in this blog entry I want to explain only the elements that are difficult to understand. Below is a simplified graphical data model of RCD in the form of a diagram:

RCD Data Model

Arrows denote elements that are part of a data structure. Red nodes denote mandatory information, blue nodes represent optional data.

The most interesting part of the specification is the definition information. The definition is optional. Several definitions can exist in parallel. A definition can be stated in different ways.

  • Either there exists already a definition, so I reference it by the source element.
  • Or I make one or more statements about the RCD in the definition section.

The statement element itself may be made in different ways. Note that those alternatives are conditionally optional, i.e., only one of the alternatives is to be used. Alternatives are:

  • Statement identifier is a unique label within the scope of the definition. I do not understand whether this information is a label for the statement itself or whether it references some other statement. If it is a unique reference label, where is it referenced then? If it is a reference, where is the unique reference label stored? The fact that this element is conditionally optional suggests that it is used for referencing other statements.
  • Statement name is just a string name. Examples are: Condition, Action, Standard, Outcome, Criteria, etc.
  • Statement text is some more verbal text that describes the competency.
  • Statement token is a data structure that includes a pointer to some location which defines a vocabulary to use and a value chosen from this vocabulary.

Example. I have to add one. So far I have come across an example shipped together with the IMS Rubric/e-portfolio specification, but I do not like it very much. At least it shows how to use the definition source model element and some statements. Surprisingly, IMS uses both statement name and statement text for its statements. This breaks the conditionally optional concept.

The example basically defines the following:

  • It references a basic rubric model in the model source field.
  • It makes three statements:
    1. Regarding measure, all artifacts and work samples are clearly and directly related to one or more national, regional, or state teaching standards and provide evidence of professional practice.
    2. Regarding score, a value of 9 has been achieved.
    3. According to skill-level, the token exemplary has been chosen from the vocabulary source at URN:FICTIONAL:UWSTOUT:SKILL-LEVELS.
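To see where the example collides with the conditionally optional rule, the three statements can be written down as plain data. This is only a sketch: the classes Token and Statement and the method alternatives_used are my own naming, not the RCD or IMS binding. The measure text is shortened, and I assume that the score of 9 is carried as statement text.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Token:
    source: str  # location that defines the vocabulary
    value: str   # value chosen from that vocabulary


@dataclass
class Statement:
    # The four alternatives described above; only one is meant to be used.
    identifier: Optional[str] = None
    name: Optional[str] = None
    text: Optional[str] = None
    token: Optional[Token] = None

    def alternatives_used(self):
        """Names of the alternatives that are actually filled in."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]


statements = [
    # Text shortened for the sketch.
    Statement(name="measure", text="Artifacts and work samples relate to teaching standards."),
    # Assumption: the score is stored as statement text.
    Statement(name="score", text="9"),
    Statement(name="skill-level",
              token=Token("URN:FICTIONAL:UWSTOUT:SKILL-LEVELS", "exemplary")),
]

for statement in statements:
    print(statement.alternatives_used())
```

Every statement fills in two alternatives at once (a name plus either a text or a token), which is the deviation from the "only one of the alternatives" reading noted above.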

Well, I need to add more good examples, including one in the IMS RCDEO XML specification.

Metadata is optional. According to the specification, there is no restriction, but I guess that it is recommended to stick to the LOM standard. It is also suggested to record here the schema, and the schema version, that the information model follows.

By the way, what is the difference between a data model and an information model? The IEEE draft presents a data model, while IMS talks about an information model. I have to check this.

Competency Matching

Posted by Martin Homik | Posted in e-portfolio | Posted on 18-06-2007

The master’s thesis task mentioned in an earlier blog entry on competency matching has been cancelled. I will implement it myself. I plan to start with several steps:

  • Check for related work.
  • Check for competency models.
  • Check for technologies to be used.

Check for related work. There has been some work on competency matching, such as Kowien and Professional Learning. Kowien is a project that ended in 2004, and as far as I know, it has not been continued. However, it investigated how to use ontologies for the management of competency profiles.

Professional Learning is a project that starts where Kowien finished. It also favours a solution based on semantic web technologies. The goal was to propose a framework description that glues together several components which interrelate with each other in a business / on-the-job setting where being up to date educationally is a requirement. The components are: user profile, HR system, learning repository, and job task descriptions. The components are glued together by competencies. An individual has or lacks competencies; job tasks are described by required competencies; the goals of learning objects can be described in the form of competencies. HR systems identify competency gaps and recommend learning objects to close them.

Check for competency models. Claude Ostyn lists a few competency model standards. Among others, he lists the IEEE RCD data standard for reusable competency definitions, which IMS has implemented as an information model in IMS RCDEO. Right now, Ostyn is working on a new standard data definition, Simple Reusable Competency Map (SRCM), which aims “to be used for describing, referencing, and exchanging data about the relationships between competencies, primarily in the context of online and distributed learning”. The basic idea is to link RCDs in a directed acyclic graph. The phrase “data about the relationships between competencies” is a bit misleading, as the parent-children graph relation implies a semantic consistsOf/isPartOf relationship. I am not quite sure whether the standard is also open for the description of other relationships. I have to investigate this.

Another short but nice competency model (in this case an ontology) is presented in the Professional Learning project. The basic idea is to describe competencies in a hierarchical manner, including a description of the competency level (or even range). The subsumes and isComposedOf relationships make it possible to use competency substitutions and competency subsumptions to deduce correct matchings.
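How subsumption helps with matching can be sketched as follows. This is not the Professional Learning ontology; the competency names are invented, competency levels are left out, and I assume that "X subsumes Y" means that whoever holds X is also treated as holding Y.

```python
# Hypothetical subsumption relation: holder of the key also holds the values.
subsumes = {
    "java-expert": {"java-basics"},
    "java-basics": {"programming-basics"},
}


def covered(held, subsumes):
    """All competencies implied by the held ones, following subsumes transitively."""
    result = set()
    stack = list(held)
    while stack:
        competency = stack.pop()
        if competency not in result:
            result.add(competency)
            stack.extend(subsumes.get(competency, ()))
    return result


def gaps(required, held, subsumes):
    """Competencies a job task requires that the profile does not cover."""
    return set(required) - covered(held, subsumes)


task_requires = {"programming-basics", "sql-basics"}
profile = {"java-expert"}
print(gaps(task_requires, profile, subsumes))
```

A naive string comparison of the profile and the task would report two gaps here. With subsumption, only sql-basics remains as a gap, and that gap is what an HR system would then try to close with a learning object.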

Both models, SRCM and Professional Learning, aim to connect job/task/learning object descriptions with an individual’s profile. The competency descriptions are the glue.

Check for technologies to be used. A possible and my favourite solution to my task is a semantic web approach. The reasons are:

  • A clear distinction between model and logic.
  • Different ontologies can be devised for different purposes, depending on the context, while the same basic ontology model is used.
  • Depending on the ontology language (inference), some functionality is provided for free, e.g. class subsumption or transitivity (OWL).
  • There already exist several very good frameworks that additionally offer query facilities as another abstraction level. That is, we have a deductive modelling language, a set of possible inference engines, and a deductive query language.

I reviewed several open source frameworks: Jena, Sesame, and the IBM Semantic Layer Research Platform. While Jena has been around for a while, it is considered to be slow and too inflexible. I do not want to comment on this, as I am not an expert. Colleagues at DFKI recommended using Sesame. One of its main benefits is that it abstracts from the inference engine. Through a plugin mechanism it is possible to include your own inference implementations. Sesame itself comes with RDF and RDFS inference. However, OntoText offers an OWL inference plugin called Owlim. Finally, IBM’s Semantic Layer Research Platform is an application platform for applications that need a semantic data store and an inference layer. In its recent version, the decision was made to change from Jena to Sesame.

Given the information I have collected, and because I have to make a decision very soon, my choice is in favour of Sesame. I’ll see how far I get with that.

Bluenote: RDF Store complements MBase

Posted by Martin Homik | Posted in ActiveMath | Posted on 13-06-2007

Motivation. We wrote a great application for ActiveMath, called iCMap, which visualizes the structures of a mathematical domain and helps students to understand and recreate those structures. This tool is interactive and provides verification as well as a suggestion mechanism. One of the key implementation problems in this tool is the lack of separation between data (queries) and logic. Because iCMap requires inference mechanisms to compute transitivities or fault tolerances for verification, all those inferences are implemented in Java, making heavy use of the MBase. This approach has several problems:

  • The MBase interface does no inferencing. Hence, it does not calculate the transitive closure for a learning item. iCMap computes that itself, which costs resources. Consequently, for one query, iCMap needs to contact the MBase several times. A reduction to one query with one result set as response is desirable.
  • Because inferencing is inside iCMap Java methods, adding new queries is not easy and requires a rebuild and deployment of the application.
  • Queries are not reusable as they are embedded in iCMap. They are to some extent “cryptified”, i.e., you have to go through the code and understand it. A standard query language would solve the problem.

Solution. To overcome those deficiencies, I suggest implementing, in addition to MBase, an RDF store for the metadata of OMDoc learning objects. This store will be optimized for RDF models, will include a reasoner for OWL (Lite/DL), and will accept SPARQL queries. The component can run as a web service and will therefore be reusable by other applications. Possible technologies for the implementation are Sesame or Jena.
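The round-trip problem from the first bullet can be illustrated with a toy store. This is neither the MBase interface nor the Sesame or Jena API; the class Store, its methods, and the example dependencies between learning items are all invented for the sketch.

```python
class Store:
    """Toy stand-in for a metadata store; counts round trips."""

    def __init__(self, edges):
        self.edges = edges  # item -> items it directly depends on
        self.calls = 0

    def direct(self, item):
        """One round trip: only the direct dependencies (no inference)."""
        self.calls += 1
        return set(self.edges.get(item, ()))

    def closure(self, item):
        """One round trip: the transitive closure is computed inside the store."""
        self.calls += 1
        seen, todo = set(), [item]
        while todo:
            for dep in self.edges.get(todo.pop(), ()):
                if dep not in seen:
                    seen.add(dep)
                    todo.append(dep)
        return seen


def client_side_closure(store, item):
    """What the client has to do if the store offers no inference."""
    seen, todo = set(), [item]
    while todo:
        for dep in store.direct(todo.pop()):
            if dep not in seen:
                seen.add(dep)
                todo.append(dep)
    return seen


edges = {"integral": ["derivative"], "derivative": ["limit"], "limit": ["function"]}

without_inference = Store(edges)
result_a = client_side_closure(without_inference, "integral")

with_inference = Store(edges)
result_b = with_inference.closure("integral")

print(without_inference.calls, with_inference.calls)
```

Both variants return the same set, but the client-side version needs one round trip per visited item, whereas the store with built-in inference answers in a single request. That single request is what a declarative query against an inferencing RDF store would replace the Java code with.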

Bluenote: Web2.0 Application for ActiveMath

Posted by Martin Homik | Posted in ActiveMath | Posted on 13-06-2007

Yesterday, I watched Steve Jobs’s keynote speech at WWDC07. He mentioned that there won’t be any SDK for writing iPhone applications. Instead, they will support developers in writing Web2.0+Ajax applications running in Safari, which is shipped together with the iPhone. So the advantage is that developers can use well-established web technologies and do not need to deal with distribution: you just access the application via a URL.

A year ago, we started a brainstorming discussion about Web2.0 applications (running on a PDA) to be integrated into ActiveMath. We had only a few ideas, and nothing has been implemented yet. We are running late and should make sure that we do not miss the train.

Here is another suggestion: an ActiveMath Dictionary for the iPhone. I believe that in the long term, PDAs will vanish and merge with mobile devices. Actually, this is already the case; we call those devices “smartphones”, but they won’t make their way out of their business-targeted market. The iPhone addresses the regular customer and provides, apart from an awesome UI, Wi-Fi and EDGE for Internet access. In principle, the iPhone is the PDA we always dreamed of and a tiny “notebook” for everyone.

My idea is quite simple, and most of the implementation is already done. Just write an iPhone-ready GUI for “search”. (I still do not like this name. I’d like to call it “dictionary”; “search” sounds more like an action.) We could offer this task as a bachelor’s thesis as soon as the iPhone comes out in Europe. I am pretty sure there will be computer science students who buy one, and they will be thrilled to write an application for it. Maybe this task is too simple. I’d be happy to start another brainstorming session.