Constructing an RDF Schema for IMS metadata:
a comparison of approaches
This document compares several approaches to the modeling
of IMS metadata in RDF. In effect, it is a comparison of the five schemas I
have seen: two of my own, called [flat] and [hier]; one from
KBS in Hannover, called [kbs]; one from the UNIVERSAL project,
called [univ]; and one from Saba, called [saba].
They can all be found on the main RDF page.
I will concentrate on a few concrete aspects of these bindings. These are:
- Structure
- Data representation
- Ordering
- Compatibility
- Type safety
The comparison consists of my own personal ideas and is not necessarily objective.
Please do not see this as some sort of attack on your favorite binding. I will listen
to any objections -- this is a Work In Progress.
I only wish I had more pictures and examples. Please be patient with this.
1. Structure
There are two approaches to structure. One is a highly hierarchical one, where
the nine categories are in some way explicitly represented in the model. The other
extreme is a completely flat structure essentially consisting of (property, value) pairs.
The most hierarchical of the schemas is probably [hier]. This schema
uses a separate resource for each category, letting the properties apply to
the categories, and thus only indirectly to the resource. The [univ]
schema lets the properties apply directly to the resource, but uses different namespaces
for different categories. The advantage of both is that they eliminate the need to differentiate
between e.g. general.description and educational.description, since they separate
cleanly between the nine categories defined in the information model.
The [hier] schema also gains in modularity. For example,
it is easier to reuse parts like the whole metametadata category.
The [kbs] schema is flatter in nature, not trying to explicitly
separate between the categories except where necessary (in only a few places).
It still has an extensive data hierarchy in the property values. The
[flat] and [saba] schemas try to eliminate this
by modeling the schema on the Dublin Core idea of values and qualifiers.
Thus most properties can have
a single string literal as value. This can then be extended to include qualifying
properties in a consistent way. The advantages of all three flatter bindings include
the removal of complicating structures that serve only a very technical purpose.
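The difference between the two extremes can be sketched with plain (subject, predicate, object) triples. This is only an illustration: the `ims:` property names, the blank node labels and the resource URL are assumptions made for the example, not names taken from any of the five schemas.

```python
# Triples are (subject, predicate, object) tuples; all names are illustrative.
RESOURCE = "http://example.org/course42"

# Hierarchical style: each category is its own node hanging off the resource,
# so the same property name can be reused inside different categories.
hierarchical = {
    (RESOURCE, "ims:general", "_:general"),
    ("_:general", "ims:description", "An introduction to RDF"),
    (RESOURCE, "ims:educational", "_:educational"),
    ("_:educational", "ims:description", "Use as a first-week reading"),
}

# Flat style: properties apply directly to the resource, so the two
# descriptions need different (hypothetical) property names.
flat = {
    (RESOURCE, "ims:description", "An introduction to RDF"),
    (RESOURCE, "ims:educationalDescription", "Use as a first-week reading"),
}


def values(graph, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def hierarchical_description(graph, resource, category):
    """Follow the category node first, then read its description."""
    result = []
    for node in values(graph, resource, category):
        result.extend(values(graph, node, "ims:description"))
    return result
```

In the flat graph a single lookup on the resource finds the description, while in the hierarchical graph the same lookup finds nothing and the category node has to be traversed first.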
2. Data representation
Three of the schemas ([hier], [kbs] and [univ])
use very similar constructs for the low-level data, which essentially consists of
LangStrings and Dates. One concrete difference is that [hier] uses
the standard XML xml:lang attribute for tagging strings with language, while
the other two introduce language properties. The advantage of having an
explicit RDF construct versus using the XML serialization for this is an interesting
subject for discussion. Apart from this, I see no fundamental difference between them.
On the other hand, approaches differ significantly when it comes to vocabularies.
The [univ] schema explicitly restricts the values of vocabularies to be
from the sets defined in IEEE LOM. No extension mechanism (using the vocabulary.source
element) is present. This is a serious compatibility problem, and needs to be addressed.
The other two schemas, [flat] and [saba], take a different approach.
As an example, they
treat the description property as having a single string value.
If several translations are provided, they are put in an rdf:Alt container.
Other structures, such as vocabularies, are modeled in the same way. The property
status in the [flat] binding, for example,
can have a string literal as value. If we want
to name a source, we introduce an intermediate object with the properties
source with the source as value, and rdf:value
with the string literal as value. This is consistent with the Dublin Core and
the VCard schemas, and uses standard RDF constructs to model lists and qualifiers.
Although presently incomplete, the [saba] binding would work
similarly.
A problem with this latter approach is that the specification becomes a lot more
complicated to express formally. More on this below.
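As a rough sketch of the value-and-qualifier pattern described above, the following models the status property both as a plain literal and through an intermediate node carrying rdf:value and source, plus an rdf:Alt of translations. The `ims:` prefix, the node labels and the sample values are assumptions for illustration; language-tagged literals are represented here as (text, language) pairs.

```python
RESOURCE = "http://example.org/course42"

# Simple case: the property value is a plain string literal.
simple = {
    (RESOURCE, "ims:status", "Final"),
}

qualified = {
    # Qualified case: an intermediate node carries the value and its source.
    (RESOURCE, "ims:status", "_:status"),
    ("_:status", "rdf:value", "Final"),
    ("_:status", "ims:source", "LOMv1.0"),
    # Translations of one description, gathered in an rdf:Alt container.
    (RESOURCE, "ims:description", "_:desc"),
    ("_:desc", "rdf:type", "rdf:Alt"),
    ("_:desc", "rdf:_1", ("An introduction to RDF", "en")),
    ("_:desc", "rdf:_2", ("En introduktion till RDF", "sv")),
}


def objects(graph, subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def plain_value(graph, subject, predicate):
    """Return the string value, looking through an rdf:value node if present."""
    (obj,) = objects(graph, subject, predicate)
    inner = objects(graph, obj, "rdf:value")
    return inner[0] if inner else obj


def pick_translation(graph, container, lang):
    """Pick the member tagged with lang, else fall back to the first member."""
    members = {p: o for s, p, o in graph if s == container and p.startswith("rdf:_")}
    for text, tag in members.values():
        if tag == lang:
            return text
    return members["rdf:_1"][0]
```

A tool that knows only rdf:value can read the status in both forms without knowing anything about the source qualifier, which is the attraction of this style. The cost is that the same property may now point either at a literal or at a node.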
3. Ordering
Another problem that needs to be taken care of is the issue of ordered lists.
The [hier], [kbs] and [univ] schemas have no explicit
constructs for this, but instead depend on the XML serialization to provide ordered
lists. This is problematic: the RDF model itself is an unordered set of statements, so
the order of the XML may not be visible to an application working on the model. This needs
to be addressed.
The [flat] and [saba] schemas use
standard RDF constructs such as rdf:Seq
and rdf:Bag to separate ordered from unordered lists.
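To illustrate why an explicit container helps, here is a minimal sketch in which the triples are deliberately shuffled, as a store that ignores document order might do, and the order is then recovered from the rdf:_1, rdf:_2, ... membership properties of an rdf:Seq. The container label and the member values are made up for the example.

```python
import random

# An rdf:Seq whose members are numbered by the rdf:_n membership properties.
seq_triples = [
    ("_:authors", "rdf:type", "rdf:Seq"),
    ("_:authors", "rdf:_1", "First Author"),
    ("_:authors", "rdf:_2", "Second Author"),
    ("_:authors", "rdf:_3", "Third Author"),
]


def seq_members(triples, container):
    """Return container members sorted by the index n of their rdf:_n property."""
    indexed = [
        (int(p[len("rdf:_"):]), o)
        for s, p, o in triples
        if s == container and p.startswith("rdf:_")
    ]
    return [o for _, o in sorted(indexed)]


# Simulate a store that does not preserve the order of the serialization.
shuffled = seq_triples[:]
random.shuffle(shuffled)
```

Calling `seq_members(shuffled, "_:authors")` gives the members back in their intended order however the triples were shuffled, because the order is carried by the statements themselves rather than by their position in the XML.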
4. Compatibility
What kinds of compatibility problems could be relevant? I see at least four:
- Dublin Core RDF binding. This is relevant as there are several overlapping
elements in this specification. The translation from one into the other
should either be very simple or possibly unnecessary (direct reuse
of constructs).
- VCard RDF binding. The VCard RDF binding should probably
be reused directly.
- Generic RDF tools. Reuse of standard RDF constructs
(such as rdf:Alt/Seq/Bag and rdf:value,
and xml:lang) is often desirable,
as this would make it possible for IMS-unaware tools (such as
semantic web agents, search engines etc.) to
partly understand and even modify IMS metadata. This is the very idea
behind RDF.
- IMS metadata 1.2 XML DTD/schema. It should be a simple task to convert
an IMS metadata XML record to RDF and the other way around. I also
believe that the element names etc. used in the XML DTD should be
reused to the extent possible, to eliminate any possibilities for
misunderstandings.
How are the five schemas doing with respect to these issues?
- The [kbs] binding is relatively compatible with the 1.2 XML DTD.
It does reuse the standard RDF construct of a Bag in some places,
but then introduces several
constraint properties which a generic RDF tool would not be able
to interpret. It is not directly compatible with Dublin Core and does not
use VCard.
- The [univ] binding is relatively compatible with the 1.2 XML DTD.
It does not reuse any standard RDF constructs, but does not introduce its
own properties either. It is not directly compatible with Dublin Core
and does not use VCard.
- The [hier] binding is designed to be maximally compatible with
the 1.2 XML DTD to the extent possible, even in XML record structure
(as can be seen in the example). It has the same problems as
the above regarding the Dublin Core and VCard issues, and also does not reuse
RDF constructs.
- The [flat] binding is designed to be directly compatible
with Dublin Core and VCard, to the extent that replacing the title
property with the dc:title property is feasible, as
are several other replacements. It is relatively compatible
with names used in the XML DTD, even if the structure
of a record would be completely different. The VCard binding
is directly reused, and it relies heavily on standard RDF constructs. It has
other problems, described below.
- The [saba] binding is designed to be directly compatible
with Dublin Core (but not VCard). Indeed, it reuses several Dublin Core
constructs explicitly, and it relies heavily on standard RDF constructs. It has
other problems, described below.
The conclusion is that this is an area where no real consensus exists.
5. Type safety
One important problem is to make the RDF schema type safe, i.e., specify
rdfs:range and rdfs:domain (and possibly other)
constraints on the properties in order to enable syntactical
checking using standard RDF tools.
How are the five schemas doing with respect to these issues?
- The [kbs] binding is relatively type safe. It uses RDF Bags
(which are type-unsafe) in some places, but tries to remedy this
by introducing its own RDF container type restrictions. This is a good idea,
but not very useful for generic RDF tools.
- The [univ] binding is completely type safe.
- The [hier] binding is completely type safe.
- The [flat] binding is not type safe at all. Type safety is lost
partly through the use of untyped containers, partly through the
rdf:value mechanism for adding qualifiers. This
is a necessary consequence of the Dublin Core/VCard compatibility.
- The [saba] binding has the same problems as the
[flat] binding.
It should be noted that a type safe approach is a natural consequence of trying
to write an RDF Schema for the binding, while a direct use of RDF-only constructs
naturally leads to a type-unsafe approach. This is clearly exemplified in the
examples contained in the respective specifications for RDF and RDF Schema.
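The kind of check that a type-safe schema enables can be sketched as follows. The sketch treats rdfs:range as a constraint to be verified, in the sense used above; the class name `ims:Status`, the `ims:` properties and the resource names are hypothetical, and a real RDF tool would be considerably more thorough.

```python
# Hypothetical schema: every value of ims:status must be typed ims:Status.
RANGE = {"ims:status": "ims:Status"}


def types_of(graph, node):
    """Return the set of rdf:type values declared for a node."""
    return {o for s, p, o in graph if s == node and p == "rdf:type"}


def range_violations(graph, ranges):
    """List triples whose object lacks the type required by the range."""
    return sorted(
        (s, p, o)
        for s, p, o in graph
        if p in ranges and ranges[p] not in types_of(graph, o)
    )


# Type-safe style: the value is always a node of a declared class.
typed = {
    ("res1", "ims:status", "_:st1"),
    ("_:st1", "rdf:type", "ims:Status"),
    ("_:st1", "rdf:value", "Final"),
}

# Value-and-qualifier style: sometimes a literal, sometimes an untyped node.
flat_style = {
    ("res2", "ims:status", "Final"),
    ("res3", "ims:status", "_:st3"),
    ("_:st3", "rdf:value", "Final"),
    ("_:st3", "ims:source", "LOMv1.0"),
}
```

Here `range_violations(typed, RANGE)` is empty, while both uses in `flat_style` are reported. Since the property is allowed to point at either a literal or an intermediate node, no single range declaration accepts both forms.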
Conclusions
I believe that the two fundamentally different approaches to the construction of
an RDF binding are:
- The type-safe approach, as exemplified in [kbs],
[univ] and [hier].
- The type-unsafe approach, as exemplified in [flat] and
[saba].
The three type-safe bindings are not very different, really. It should be possible to
reach a consensus on a single type-safe RDF binding. The data types and ordering problems
are very similar, and the structural differences can be overcome. The same is true
of the two type-unsafe bindings. But the two categories are
fundamentally incompatible with each other. Thus, the most important
decision to make is to choose one of these approaches. Perhaps there is some
way to use the best of both?
One question we must ask ourselves is this: why do we want an RDF binding?
My answer would be: "To be able to reuse RDF machinery in all forms". I think
we must be very careful to use RDF in a native way, the way that it is intended to
be used. Exactly what this means is of course not clear.
Please mail any comments to
Mikael Nilsson <mini@nada.kth.se>.