KMR > E-learning frameworks > Specification development > IMS metadata

Constructing an RDF Schema for IMS metadata: a comparison of approaches

In this document I will try to compare some different approaches to the modeling of IMS metadata in RDF. It is effectively a comparison of the five schemas I have seen; two by myself, called [flat] and [hier], one from KBS in Hannover, called [kbs], one from the UNIVERSAL project, called [univ], and one from Saba, called [saba]. They can all be found on the main RDF page.

I will concentrate on a few concrete aspects of these bindings. These are:

Structure
Data representation
Ordering
Compatibility
Type safety

The comparison consists of my own personal ideas and is not necessarily objective. Please do not see this a some sort of attack on your favorite binding. I will listen to any objections -- this is a Work In Progress. I only wish I had more pictures and examples. Please be patient with this.

1. Structure

There are two approaches to structure. One is a highly hierarchical one, where the nine categories are in some way explicitly represented in the model. The other extreme is a completely flat structure essentially consisting of (property, value) pairs.

The most hierarchical of the schemas is probably [hier]. This schema uses a separate resource for each category, letting the properties apply to the categories, and thus only indirectly to the resource. The [univ] schema lets the properties apply directly to the resource, but uses different namespaces for different categories. The advantages are that eliminates the need to differentiate between e.g. general.description and educational.description as they separate cleanly between the nine categories defined in the information model. The [hier] schema also gains in modularity. For example, it is easier to reuse parts like the whole metametadata category.

The [kbs] schema is more flat in nature, not trying to explicitly separate between the categories unless where necessary (only a few places actually). It still has an extensive data hierarchy in the property values. The [flat] and [saba] schemas try to eliminate this by modeling the schema on the Dublin Core idea of values and qualifiers. Thus most properties can have a single string literal as value. This can then be extended to include qualifying properties in a consistent way. The advantages with all three of the flatter bindings include the removal of complicating structures that only fill a very technical purpose.

2. Data representation

Three of the schemas ([hier], [kbs] and [univ]) use very similar constructs for the low-level data, which essentially consists of LangStrings and Dates. One concrete difference is that [hier] uses the standard XML xml:lang attribute for tagging strings with language, while the other two introduce language properties. The advantage of having an explicit RDF construct versus using the XML serialization for this is an interesting subject for discussion. Apart from this, I see no fundamental difference between them.

On the other hand, approaches differ significantly when it comes to vocabularies. The [univ] explicitly restricts the values of vocabularies to be from the sets defined in IEEE LOM. No extension mechanism (using the vocabulary.source element) is present. This is a serious compatibility problem, and needs to be addressed.

The other two schemas, [flat] and [saba], differ in the approach. As an example, they see the description property as having a single string value. If several translations are provided, they are put in an rdf:Alt container. Other structures, such as vocabularies, are modeled in the same way. The property status in the [flat] binding, for example, can have a string literal as value. If we want to name a source, we introduce an intermediate object with the properties source with the source as value, and rdf:value with the string literal as value. This is consistent with the Dublin Core and the VCard schemas, and uses standard RDF constructs to model lists and qualifiers. Although presently incomplete, the [saba] binding would work similarly.

A problem with this latter approach is that the specification becomes a lot more complicated to express formally. More on this below.

3. Ordering

Another problem that needs to be taken care of is the issue of ordered lists. The [hier], [kbs] and [univ] schemas have no explicit constructs for this, but instead depend on the XML serialization to provide ordered lists. This is problematic, as this order may not be visible to an application, and needs to be addressed.

The [flat] and [saba] schemas use standard RDF constructs such as rdf:Seq and rdf:Bag to separate ordered from unordered lists.

4. Compatibility

What kinds of compatibility problems could be relevant? I see at least four:

Dublin Core RDF binding. This is relevant as there are several overlapping elements in this specification. The translation from one into the other should either be very simple or possibly unnecessary (direct reuse of constructs).
VCard RDF binding. The VCard RDF binding should probably be reused directly.
Generic RDF tools. Reuse of standard RDF constructs (such as rdf:Alt/Seq/Bag and rdf:value, and xml:lang) is often desirable, as this would make it possible for IMS-unaware tools (such as semantic web agents, search engines etc.) to partly understand and even modify IMS metadata. This is the very idea behind RDF.
IMS metadata 1.2 XML DTD/schema. It should be a simple task to convert a IMS metadata XML record to RDF and the other way around. I also believe that the element names etc. used in the XML DTD should be reused to the extent possible, to eliminate any possibilities for misunderstandings.

How are the five schemas doing with respect to these issues?

The [kbs] binding is relatively compatible with the 1.2 XML DTD. It does reuse the standard RDF construct of a Bag in some places, but then introduces several constraint properties which a generic RDF tool would not be able to interpret. It is not directly compatible with Dublin Core and does not use VCard.
The [univ] binding is relatively compatible with the 1.2 XML DTD. It does not reuse any standard RDF constructs, but does not introduce their own properties either. It is not directly compatible with Dublin Core and does not use VCard.
The [hier] binding is designed to be maximally compatible with the 1.2 XML DTD to the extent possible, even in XML record structure (as can be seen in the example). It has the same problems as the above regarding the Dublin Core and VCard issues, and also does not reuse RDF constructs.
The [flat] binding is designed to be directly compatible with Dublin Core and VCard, to the extent that replacing the title property with the dc:title property is feasible, as are several other replacements. It is relatively compatible with names used in the XML DTD, even if the structure of a record would be completely different. The VCard binding is directly reused, and it relies heavily on standard RDF constructs. It has other problems, described below.
The [saba] binding is designed to be directly compatible with Dublin Core (but not VCard). Indeed, it reuses several Dublin Core constructs explicitly, and it relies heavily on standard RDF constructs. It has other problems, described below.

The conclusion is that this is an area where no real consensus exists.

5. Type safety

One important problem is to make the RDF schema type safe, i.e., specify rdfs:range and rdfs:domain (and possibly other) constraints on the properties in order to enable syntactical checking using standard RDF tools. How are the five schemas doing with respect to these issues?

The [kbs] binding is relatively type safe. It uses RDF Bags (which are type-unsafe) in some places, but tries to remedy this by introducing their own rdf container type restrictions. This is a good idea, but not very useful for generic RDF tools.
The [univ] binding is completely type safe.
The [hier] binding is completely type safe
The [flat] binding is not type safe at all. This is destroyed partially by using untyped containers, partially by using the rdf:value mechanism for adding qualifiers. This is a necessary consequence of the Dublin Core/VCard compatibility.
The [saba] binding has the same problems as the [flat] binding.

It should be noted that a type safe approach is a natural consequence of trying to write an RDF Schema for the binding, while a direct use of RDF-only constructs naturally leads to a type-unsafe approach. This is clearly exemplified in the examples contained in the respective specifications for RDF and RDF Schema.

Conclusions

I believe that the two fundamentally different approaches to the construction of an RDF binding are:

The type-safe approach, as exemplified in [kbs], [univ] and [hier].
The type-unsafe approach, as exemplified in [flat] and [saba].

The three type-safe bindings are not very different, really. It should be possible to reach a consensus on a single type-safe RDF binding. The data types and ordering problems are very similar, and the structural differences can be overcome. The same is true of the two type-unsafe bindings. But the two categories are fundamentally incompatible with each other. Thus, the most important decision to make is to choose one of these approaches. Perhaps there is some way to use the best of both?

One question we must ask ourselves is this: why do we want an RDF binding? My answer would be: "To be able to reuse RDF machinery in all forms". I think we must be very careful to use RDF in a native way, the way that it is intended to be used. Exactly what this means is of course not clear.

Please mail any comments to
Mikael Nilsson <mini@nada.kth.se>.