Semantic issues with the LOM RDF binding

Mikael Nilsson 2003-01-15

This is my effort at documenting the issues that have been raised during the design of the LOM RDF binding. It has grown longer than I had envisioned, but it contains many things necessary to understand the problems. Please read when in a good mood!

1. Using RDF for Meta-data as Compared to Pure XML

1.1 Semantic modeling

In a pure XML approach such as the LOM XML Binding, the structure of the XML instance is the result of choosing the most convenient syntax, creating the element hierarchy that best matches the structure of the LOM data model.

By contrast, in RDF the precise data model is not only syntactic, but has semantic consequences. RDF is a highly object-oriented "language" where objects have properties that relate them to other objects. The type of an object or property defines its interpretation, and is thus not simply a syntactic placeholder. In the pure XML version of LOM each structure is represented by an element. In RDF there are several different possibilities for representing a LOM element: you can use Properties, Resources, Classes, or even namespaces to reflect the structure of LOM. And the choice matters, as those constructs have fundamentally different semantics. All of these are used in the current draft.

Thus, a considerable amount of effort is needed to extract the desired semantic quality of each element in order to be able to represent it appropriately. If this reinterpretation is not done, you risk not only losing clarity for the human consumer, but also seriously damaging the usefulness of the model. Much of the effort that has gone into this binding has focused on creating such well-formed (machine-interpretable) semantics for the model.

We therefore expect to see much richer structures on many levels in an RDF representation than in the corresponding XML binding instance. From this perspective, we should expect that meta-data expressed in RDF using this binding can probably be exported to XML format without many problems; however, an XML meta-data record cannot always be effortlessly translated to RDF, as the translation depends on the extensions and conventions used in the particular RDF setup.

1.2 Documents vs. statements

As a consequence of this, we cannot expect the RDF binding to fulfill the same purpose as the XML binding. The XML Binding defines an exchange format for meta-data. The meta-data might be contained in a database and an XML representation generated on demand, for export to other tools and environments. Thus, an XML meta-data record is a self-contained entity with a well-defined structure.

In RDF, the meta-data is not always self-contained, but rather forms part of a global network of information, where anyone has the capability of adding any kind of meta-data to any resource. It is not the case that meta-data for one resource need be contained in a single RDF document. Translations might be administrated separately, and different categories of meta-data might be separated. This dramatically strengthens the incentive both to reuse identical structures that are used repeatedly, as well as to create decentralized descriptions of resources. Both of these phenomena naturally lead to a fundamentally different approach to meta-data modelling than that found in the XML binding.

One way of putting it is this: while XML is document-centric, RDF is statement-centric. XML describes the structure of a complete document instance. RDF describes the structure of a single metadata statement. The RDF binding must therefore be designed one element at a time.
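The statement-centric view can be illustrated with a minimal sketch in plain Python, where each statement is a (subject, predicate, object) tuple. This is not an RDF library; the two sources and the literal notation are assumptions made for illustration. It shows that statements about one resource, published separately, merge without any document-level structure.

```python
# Each RDF statement is modeled as a plain (subject, predicate, object) tuple.
# The two sources below are hypothetical; objects are kept as simple strings.
RESOURCE = "http://www.myresource.com/"

# Statements published by the resource owner.
owner_statements = {
    (RESOURCE, "dc:title", '"My resource" (en)'),
    (RESOURCE, "dc:language", "en"),
}

# Statements administrated separately, e.g. by a translator.
translator_statements = {
    (RESOURCE, "dc:title", '"Min resurs" (sv)'),
}


def merge(*sources):
    """Merge any number of statement sets; identical statements collapse."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


def describe(statements, subject):
    """Return all (predicate, object) pairs known about one subject."""
    return sorted((p, o) for s, p, o in statements if s == subject)


all_statements = merge(owner_statements, translator_statements)
for predicate, obj in describe(all_statements, RESOURCE):
    print(predicate, obj)
```

Because every tuple is complete in itself, no source needs to know about the other for the merge to be meaningful.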

1.3 Semantic and structural extensions

Another aspect is that of compatibility. In the XML binding, there is no standard way to reuse other meta-data standards. The statement-centric design of RDF leads to naturally reusable constructs. Metadata elements can be extended both structurally (by adding more information) and semantically (by adding refinements of elements). This binding has been designed to be directly compatible with Dublin Core (including the DC Qualifiers, DC Type and DC Education vocabularies) and with the vCard RDF binding. However, this compatibility comes at the price of modeling freedom - some modeling restrictions are imposed on us. Fortunately, much of this compatibility comes for free when taking the approach that the data should be modeled to maximally exploit the expressivity of RDF.

Finally, as RDF is intended to be processed by software, and in many cases software with no explicit knowledge of LOM, it is important to use explicit data typing. This will be seen below in the representation of languages and dates. We have tried to avoid using string literals with implicit typing. Thus, a goal of this binding has been to define a set of RDF constructs that facilitates introduction of LOM meta-data into the semantic web in the most convenient way.

3. Design decisions

This binding has been in development since approximately March 2001, when a first attempt at encoding IMS metadata in RDF was made. The first version of the binding was released with version 1.2 of the IMS metadata standard. Most of the important design decisions had been made at that point (some of the discussion can be seen here). The current binding is a development of that binding, consisting of some clarifications, several minor changes in encoding, updated namespaces and a new introduction, and an update to LOM 1.0 (from draft 6.1, used in IMS). The most important design decisions resulting from these efforts have been:

The binding should extend the Dublin Core and DCQ RDF vocabularies whenever possible.
The binding should reuse the vCard RDF binding.
The binding should use URIs for all vocabulary terms, and not literals.
The binding should try to maintain the intended semantics of LOM and RDF, while not necessarily representing the exact structure of the LOM information model perfectly.
The binding should be relatively straightforward to translate into an XML format for LOM, without losing any LOM information (other information might be lost, however).

We will now discuss each of these points separately.

3.1 Dublin Core

The RDF representation of LOM relies heavily on the Dublin Core meta-data element set, and its representation in RDF. We try to model LOM elements similarly to how the Dublin Core qualifiers are represented. The Dublin Core RDF usage model is taken from the latest DC-Architecture RDF draft, found here. Understanding that work is helpful when trying to understand this binding. The decision to extend Dublin Core was made early, and was probably the single most important decision for this binding. This decision is therefore well-aligned with the efforts to improve interoperability between Dublin Core and LOM (see the memorandum of understanding here).

The RDF binding is designed to be almost fully Dublin Core RDF compatible, in the sense that meta-data constructed according to this guide can be directly understood by Dublin Core-aware software. All the elements of the LOM Dublin Core mapping (in Appendix B of LOM 1.0) are represented in a way compatible with both LOM 1.0 and with Qualified Dublin Core.

It is, however, at this time not possible to map any Dublin Core construct (made without reference to this guide) to a LOM element without some effort, as the LOM requires a more detailed structure in many elements. In short, this guide describes some restrictions to Qualified Dublin Core meta-data that are needed to be LOM compatible. The guiding principle has been that applying the "dumb-down" algorithm described in the Dublin Core Qualifiers in RDF draft should result in correct Dublin Core meta-data.

Please note that the Dublin Core Qualifiers work referred to above has not yet reached its final version, so some constructs described here might change.

See below for a more detailed description of the Dublin Core mapping.

3.2 vCard

This binding also makes use of the vCard RDF binding by the W3C in a fairly straightforward manner.

3.3 URIs for vocabularies

Of fundamental importance for RDF is the usage of URIs (or strictly speaking, URI references). Using URIs for all terms in any RDF vocabulary makes it possible to add RDF metadata to the vocabulary terms themselves. Examples of vocabulary metadata could be machine semantics such as "this term is a refinement of dc:contributor" or human-readable information such as "this term is called 'Skapare' in Swedish". It was quickly decided that this binding must use URIs for all vocabulary terms used.
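As a rough sketch of what such vocabulary metadata looks like, the following plain-Python fragment attaches both kinds of statements to a term. The term URI is hypothetical, and the tuple representation is an assumption made for illustration rather than part of the binding.

```python
# A hypothetical vocabulary term; the URI is an illustration only.
TERM = "http://example.org/vocab#creator"

statements = [
    # Machine semantics: the term is a refinement of dc:contributor.
    (TERM, "rdfs:subPropertyOf", "dc:contributor"),
    # Human-readable information, stored as (text, language) pairs.
    (TERM, "rdfs:label", ("Creator", "en")),
    (TERM, "rdfs:label", ("Skapare", "sv")),
]


def label_for(term, language, triples):
    """Return the label of a term in the given language, or None."""
    for subject, predicate, obj in triples:
        if subject == term and predicate == "rdfs:label" and obj[1] == language:
            return obj[0]
    return None


def refines(term, triples):
    """Return the properties that a term is declared to refine."""
    return [o for s, p, o in triples if s == term and p == "rdfs:subPropertyOf"]


print(label_for(TERM, "sv", statements))
print(refines(TERM, statements))
```

None of this would be possible if the term were a literal string, since a literal cannot be the subject of a statement.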

3.4 LOM semantics vs. syntax

The Dublin Core metadata specifications deal with terms such as "Element", "Element refinement", and "Element Encoding", that have obvious counterparts in RDF:

Dublin Core term	RDF term
Element	`rdf:Property`
Element refinement	`rdfs:subPropertyOf`
Element encoding	`rdfs:Class` (used as `rdfs:range` of the corresponding `rdf:Property`)

Dublin Core therefore has well-defined semantics for each element, a semantics that corresponds well with RDF semantics. One of the most important problems for this binding has been that the LOM Data Model does not seem to have an explicit semantics for its elements. It rather seems that the term "Element" in LOM more closely corresponds to the term "Element" in XML, representing a node in a hierarchy. LOM contains no facilities for refining elements or using another element encoding. It would therefore seem that LOM uses a structural model just like XML, while Dublin Core and RDF use a more semantic model.

Thus, in order to encode the LOM 1.0 data model in RDF in a manner compatible with Dublin Core, we have had to do some re-modeling of LOM, trying to interpret the element hierarchy in terms of "properties" and "values". This is discussed in more detail below. In the following, I will refer to Dublin Core Elements with the term "DC Element", and to LOM elements using "LOM Element" to disambiguate the term "element".

3.5 LOM XML

Close attention has been paid to the LOM data model and its XML binding, and no structure representable in LOM should be problematic to represent in the RDF binding. But there are some differences from the LOM XML binding in structure, naming, and representation. However, converting an RDF version of the LOM data to XML should be straightforward. Many users of the RDF binding will use highly customized versions of the binding, with many structural and semantic extensions, as well as application-specific conventions. It can therefore not be expected that a generic XML-to-RDF conversion tool will work for all situations.

4. The binding development process

The binding has been developed in a number of steps. Explaining this process helps when trying to explain specific problems, so I have chosen to include it here.

4.1 Isolate properties and objects

The first step involves extracting an object-oriented view of the LOM data model. What LOM elements are objects, and which are relations between objects?

The first kinds of LOM elements that were taken care of were the nine LOM categories. As they do not in themselves carry information, but only represent a context for other LOM elements, they were simply used as namespaces for the properties in each category. So the binding consists of nine category-specific namespaces plus one namespace for general information.

Exceptions to the "a category is only a namespace" rule were the categories 7. Relation, 8. Annotation and 9. Classification. This can be seen in that the categories themselves are repeatable, so that each occurrence of a category represents a distinct value of some property of the learning object [What is up with 5. Educational???]. However, these categories still use their own namespaces.

The second kind of elements that can easily be discerned are obvious objects: the basic data types. These include:

Dates
Durations
Languages (i.e. the values of 1.3, 3.4 and 5.11)
Langstrings

and so on. Generally, all leaf nodes in the LOM data model are objects. Note that several of these objects may have several properties of their own (e.g. dates may have descriptions).

"Vocabulary" items are sometimes, but not always, mapped to objects. The LOM element 5.4 Educational.SemanticDensity uses a property, called lom_edu:semanticDensity, with a value of type lom_edu:SemanticDensity. Five instances of that type are defined, which form the vocabulary given in the LOM data model, e.g. lom_edu:HighDensity. So it is now easy to make statements of the form "My resource has semantic density 'high'":

Subject	Predicate	Object
`<http://www.myresource.com/>`	`<lom_edu:semanticDensity>`	`<lom_edu:HighDensity>`

(this is a namespace-abbreviated N-TRIPLE RDF format). It is also very easy to make new vocabularies for this LOM element: just define your own instances of the Class lom_edu:SemanticDensity. (Note that this is a nice example of the statement-orientedness of RDF. The statement above is in itself a complete piece of RDF, and can live independently of any other metadata for that resource).
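The extension mechanism can be sketched in a few lines of plain Python. The tuple representation and the extension term ex:ExtremeDensity are assumptions made for illustration; only lom_edu:HighDensity, lom_edu:semanticDensity and lom_edu:SemanticDensity come from the text.

```python
# Vocabulary terms are instances of the class lom_edu:SemanticDensity.
schema = {
    ("lom_edu:HighDensity", "rdf:type", "lom_edu:SemanticDensity"),
}

# Extending the vocabulary: declare a new instance of the same class.
# ex:ExtremeDensity is a hypothetical term, not part of the binding.
schema.add(("ex:ExtremeDensity", "rdf:type", "lom_edu:SemanticDensity"))


def instances_of(cls, triples):
    """Return every resource declared to be an instance of cls."""
    return {s for s, p, o in triples if p == "rdf:type" and o == cls}


def is_valid_density(value, triples):
    """Check that a value is a declared semantic density term."""
    return value in instances_of("lom_edu:SemanticDensity", triples)


statement = (
    "http://www.myresource.com/",
    "lom_edu:semanticDensity",
    "lom_edu:HighDensity",
)
print(is_valid_density(statement[2], schema))
print(is_valid_density("ex:ExtremeDensity", schema))
```

Software that knows only the class can thus accept terms from any vocabulary, including ones defined after the software was written.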

But take as another example the 7.1 Relation.Kind element. This element corresponds very closely to an element refinement of the dc:relation Dublin Core element. In RDF, such refinements are represented as distinct properties, each marked as being refinements (sub-properties) of dc:relation. So a vocabulary for 7.1 Relation.Kind is actually a list of properties, not of values like in the Semantic Density example. To say that "My resource is part of http://www.w3.org/", one would simply say

Subject	Predicate	Object
`<http://www.myresource.com/>`	`<dcterms:isPartOf>`	`<http://www.w3.org/>`

where dcterms:isPartOf is a sub-property of dc:relation:

Subject	Predicate	Object
`<dcterms:isPartOf>`	`<rdfs:subPropertyOf>`	`<dc:relation>`

This fact is already recorded in the DC Qualifiers RDF schema, of course. So, creating a vocabulary for the 7.1 Relation.Kind LOM element boils down to defining sub-properties (=refinements) of dc:relation. Note the difference between this and the above example, where we defined new instances of lom_edu:SemanticDensity.
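The practical effect of declaring a sub-property is that software knowing only dc:relation can still use the refined statement. A minimal sketch of that inference follows, in plain Python; ex:isChapterOf is a hypothetical refinement, and the single-parent dictionary is a simplification, since RDF Schema allows a property to have several super-properties.

```python
# Declared refinements: property -> the property it refines.
SUBPROPERTY_OF = {
    "dcterms:isPartOf": "dc:relation",
    "ex:isChapterOf": "dcterms:isPartOf",  # hypothetical refinement
}


def super_properties(prop):
    """Yield every property that prop refines, directly or transitively."""
    seen = set()
    while prop in SUBPROPERTY_OF and prop not in seen:
        seen.add(prop)  # guards against accidental cycles
        prop = SUBPROPERTY_OF[prop]
        yield prop


def infer(triples):
    """Add the statements implied by the sub-property declarations."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        for parent in super_properties(predicate):
            inferred.add((subject, parent, obj))
    return inferred


data = {("http://www.myresource.com/", "dcterms:isPartOf", "http://www.w3.org/")}
for triple in sorted(infer(data)):
    print(triple)
```

Here the dc:relation statement is derived rather than stated, which is what lets a vocabulary for 7.1 Relation.Kind grow without breaking Dublin Core-aware software.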

For the rest of the elements (quite a few), things are not always obvious. I wish I could produce a complete listing of all the properties and objects, but it would unfortunately be too long. I will only mention those that actually cause problems.

4.2 Find related Dublin Core elements and encodings

As seen above in the 7.1 Relation.Kind example, some elements have obvious Dublin Core counterparts. Nothing would stop us from defining our own lom_rel:isPartOf property, and then saying

Subject	Predicate	Object
`<http://www.myresource.com/>`	`<lom_rel:isPartOf>`	`<http://www.w3.org/>`

But this would seem quite unnecessary, as the meaning would be exactly the same as when using dcterms:isPartOf. It would only cause interoperability problems, not solve any.

We have therefore tried to reuse Dublin Core vocabulary wherever that has been feasible. This has actually been quite successful; only in a few cases has it proven difficult. As has been mentioned, Dublin Core elements come in two kinds: DC Elements (including DC Element Refinements) and DC Element Encodings. A DC Element Encoding is a specification of the type of value for a certain DC Element. For example, the dc:language DC Element can include any kind of string value, such as "English" or "Swedish". One DC Element Encoding for that DC Element is dcterms:RFC1766, which specifies that the string must be encoded using RFC1766 (two-letter ISO language codes). Using this DC Element Encoding, we can say "The language of My Resource is something of type RFC1766, and with the value 'en'":

Subject	Predicate	Object
`<http://www.myresource.com/>`	`<dc:language>`	`_:XXX`
`_:XXX`	`<rdf:type>`	`<dcterms:RFC1766>`
`_:XXX`	`<rdf:value>`	`"en"`

The _:XXX object (the "something" object) is a so-called anonymous RDF node (it has no URI).
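To show how a consumer might read such a construct, here is a small plain-Python sketch that follows the anonymous node to its rdf:value and rdf:type. The helper names are hypothetical, and treating any object that does not start with "_:" as a plain literal is a simplifying assumption.

```python
triples = [
    ("http://www.myresource.com/", "dc:language", "_:XXX"),
    ("_:XXX", "rdf:type", "dcterms:RFC1766"),
    ("_:XXX", "rdf:value", "en"),
]


def objects(subject, predicate, statements):
    """Return all objects of the statements matching subject and predicate."""
    return [o for s, p, o in statements if s == subject and p == predicate]


def encoded_values(subject, predicate, statements):
    """Return (literal, encoding) pairs for a property.

    A plain literal object yields (literal, None); an anonymous node
    yields its rdf:value together with its rdf:type, if any.
    """
    results = []
    for obj in objects(subject, predicate, statements):
        if obj.startswith("_:"):
            types = objects(obj, "rdf:type", statements)
            encoding = types[0] if types else None
            for value in objects(obj, "rdf:value", statements):
                results.append((value, encoding))
        else:
            results.append((obj, None))
    return results


print(encoded_values("http://www.myresource.com/", "dc:language", triples))
```

Software that ignores the encoding can still use the literal, while software that understands dcterms:RFC1766 knows how to interpret it.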

These DC Element Encodings are very useful for specifying the interpretation of a literal string, where this is not given by the definition of the property. This kind of construct has been used in many places in the LOM RDF binding, and many of those are direct reuses of DC Element Encodings, for the very same interoperability reasons as above.

So this step of the process consisted of finding out what DC Elements and Element Encodings were related to a given LOM element. In some cases this caused reconsideration of the property-value status (step 1 above) of the LOM element.

4.3 Define relation to Dublin Core element

After having found the relevant Dublin Core elements, the precise relation to the Dublin Core element needed to be defined. There are essentially four ways in which a LOM element might be related to Dublin Core:

By being identical to some Dublin Core Element.
By being a sub-property (=refinement) of a Dublin Core Element.
By being a super-property of a Dublin Core Element.
By using literal values that could be specified using a Dublin Core Element Encoding.

Obviously, these relations could be clarified in the LOM standard. But the fact is that they are not, so we needed to specify them. This was done starting from the LOM-DC mapping in Appendix B of the LOM data model specification. As it (unfortunately) does not contain a mapping for the (many) terms in DC Qualifiers, DC Type vocabulary, or DC Education, this mapping had to be expanded to include those terms. The resulting list can be found in Appendix A.

4.4 Define recommended encoding wrt. ordering etc.

When the precise relation to the relevant Dublin Core element had been specified, we needed to make sure that the element could express all the information in LOM. Such information includes:

ordering constraints
language encoding
additional information not found in the corresponding Dublin Core element

With respect to the above, a number of design decisions, common for many elements, were made.

Ordering

In contrast to XML, RDF elements have no automatic ordering. Again, we can see the XML-isms in LOM, which specifies whether an element should be interpreted as ordered or not. In XML, elements are automatically ordered, so this is simply a question of interpretation. In RDF, ordering must be encoded explicitly. This is usually done using the rdf:Seq container. Using this container construct, a set of values of a property can be grouped together and placed in order. There are two other container types in RDF, the rdf:Bag and the rdf:Alt. Their usage is pretty straightforward:

rdf:Seq is used for a set of values that are ordered (more important first, for example)
rdf:Bag is used for a set of values that belong together, but with no inherent order (such as a group of authors)
rdf:Alt is used for a set of values that represent alternative, interchangeable values of the same property.

The decision was that, as RDF requires explicit encoding of ordering, no constraints on the use of ordering should be made in the specification. Instead, implementors should use the container with the semantics that most closely matches the semantics of the data. Thus, any LOM properties that the standard says should be ordered, could also be used in an unordered fashion. However, in order to be fully interoperable with other LOM tools (especially XML-based), the recommended encoding must be used, which corresponds to the requirement given in LOM.
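A minimal sketch of how a consumer could recover the order from a container is given below, in plain Python. The property ex:author, the node name and the member values are hypothetical; the point is only that the order lives in the rdf:_1, rdf:_2, ... membership properties and not in the order in which statements happen to arrive.

```python
import re

# Container membership properties have the form rdf:_1, rdf:_2, ...
MEMBER = re.compile(r"^rdf:_(\d+)$")

# Statements deliberately listed out of order.
triples = [
    ("http://www.myresource.com/", "ex:author", "_:authors"),  # hypothetical property
    ("_:authors", "rdf:_2", "Second Author"),
    ("_:authors", "rdf:type", "rdf:Seq"),
    ("_:authors", "rdf:_1", "First Author"),
]


def container_members(node, statements):
    """Return (container_type, members sorted by membership index)."""
    kind = None
    numbered = []
    for subject, predicate, obj in statements:
        if subject != node:
            continue
        if predicate == "rdf:type":
            kind = obj
            continue
        match = MEMBER.match(predicate)
        if match:
            numbered.append((int(match.group(1)), obj))
    members = [obj for _, obj in sorted(numbered, key=lambda pair: pair[0])]
    return kind, members


kind, members = container_members("_:authors", triples)
print(kind, members)
```

For an rdf:Seq the returned order is significant; for an rdf:Bag a consumer should treat the same list as an unordered group.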

Language Encoding

RDF allows literal strings to carry a language tag. The LangString LOM construct is encoded in RDF using this feature and the rdf:Alt container. Thus, a title with translations is given as

Subject	Predicate	Object
`<http://www.myresource.com/>`	`<dc:title>`	`_:XXX`
`_:XXX`	`<rdf:type>`	`<rdf:Alt>`
`_:XXX`	`<rdf:_1>`	`"My resource"-en`
`_:XXX`	`<rdf:_2>`	`"Min resurs"-sv`
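A consumer would typically pick one of the alternatives based on a preferred language. The following plain-Python sketch shows one way to do so; the function name is hypothetical, language-tagged literals are represented as (text, language) pairs for illustration, and the fallback to the first member reflects the RDF convention that rdf:_1 is the default value of an rdf:Alt.

```python
triples = [
    ("http://www.myresource.com/", "dc:title", "_:XXX"),
    ("_:XXX", "rdf:type", "rdf:Alt"),
    ("_:XXX", "rdf:_1", ("My resource", "en")),
    ("_:XXX", "rdf:_2", ("Min resurs", "sv")),
]


def langstring(subject, predicate, statements, preferred):
    """Return the text in the preferred language, else the default (rdf:_1)."""
    nodes = [o for s, p, o in statements if s == subject and p == predicate]
    if not nodes:
        return None
    node = nodes[0]
    # Collect the alternatives, ordered by their membership index.
    alternatives = sorted(
        (int(p[len("rdf:_"):]), o)
        for s, p, o in statements
        if s == node and p.startswith("rdf:_")
    )
    for _, (text, language) in alternatives:
        if language == preferred:
            return text
    return alternatives[0][1][0] if alternatives else None


print(langstring("http://www.myresource.com/", "dc:title", triples, "sv"))
print(langstring("http://www.myresource.com/", "dc:title", triples, "de"))
```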

Additional information

Several LOM elements correspond to Dublin Core elements, but carry more information than the Dublin Core element. For example, when using 9. Classification, LOM allows each taxon to have both an id (9.2.2.1 Id) and a textual entry (9.2.2.2 Entry). In the RDF binding, these are modeled as additional properties of the object of the dc:subject property. Dublin Core does not specify them, but RDF allows them, so this approach works seamlessly.

4.5 Construct RDF vocabulary

When the precise encoding has been defined, the corresponding RDF vocabulary needs to be specified. In essence, this boils down to inventing URIs for the relevant vocabulary terms. It would be beneficial for LOM if this vocabulary could be synchronized between XML and RDF.

4.6 Produce RDF Schema fragment and examples

The final step is to define the precise semantics of the vocabularies. This includes defining classes such as lom_edu:SemanticDensity, that can be used as a type for the vocabulary terms of the corresponding LOM Element. Defining new vocabulary for this LOM element then boils down to simply defining new instances of this class. In this way, all elements that can be extended have a well-defined, semantically correct method for extension.

5. Issues

Finally, I will try to summarize the issues that have been encountered. There are issues of four kinds: issues that can be resolved in this binding, issues that relate to LOM in general and should be solved by LOM, and issues that have to do with RDF and therefore cannot be solved in this binding (only worked around). The fourth kind is purely editorial in nature. Each issue is labeled by importance:

[1] means the issue is pressing
[2] means the issue needs to be dealt with
[3] means the issue can wait

5.1 Issues in the RDF binding

	Prio	Issue	Problem	Status	Suggestion
5.1.1	[1]	Contribute element	In LOM, the Contribute element models a contribution, not a contributor. Mapping this to DC is difficult.	Currently, the LOM RDF binding maps Contribute to subproperties of `dc:creator` etc.	Do not use DC compatible modeling - it just does not work!
5.1.2	[2]	Learning Resource Type	The Learning Resource Type element uses `rdf:type`. This LOM element is ordered, but `rdf:type` can not be ordered.	`rdf:type` is used, and order is not preserved!	`rdf:type` is the right mapping for this. This should probably not be changed. Or should we use `rdf:type` only for the first type? Probably not...
5.1.3	[2]	Identifier element	The identifier element uses URIs whenever possible, but what about the case when there is no URI binding, but only a Catalog/Entry Pair?	Currently uses an `rdf:value` solution (making it a DC Element Encoding, essentially), that works quite well.	Should we perhaps look at IMS Reusable competency definitions? Their XML binding has nice text on this. In any case, we need to sync with the XML folk.
5.1.4	[1]	Educational category	The Educational category in LOM 1.0 has multiplicity 1..*, which it did not have in earlier draft versions of LOM.	We use Educational as a namespace only.	The Educational category probably needs to be remodeled as a structure.
5.1.5	[3]	Educational.Context	Does Educational.Context map to `dc:audiencelevel`?	Currently it does not.	In light of 5.1.4, this question might be moot.
5.1.6	[2]	Taxonomies	The taxonomy model uses `lom_cls:taxon` for two different purposes: to specify the root taxons in a hierarchy, and to specify sub-taxons in the hierarchy. This leads to improper conclusions if classification inference is used.	Right now we cannot distinguish the two cases.	Need to introduce a new property.
5.1.7	[2]	Need to double- and triple-check LOM compatibility	The model might contain places where it is incompatible with LOM.	No known cases except as noted in this list.	Need to go through the whole model thoroughly. Independent person would be good.

5.2 Issues with LOM

	Prio	Issue	Problem	Status	Suggestion
5.2.1	[1]	Semantics	Some of the semantic modeling that has been done will need approval from LOM.	None of it has been cleared	?
5.2.2	[1]	DC/DCQ mapping	Need to specify the details of the full DCQ mapping, and have it approved in LOM.	To be done.	?
5.2.3	[2]	Need namespaces	Need to specify the exact namespaces to use.	Boyd is on this?	?

5.3 Issues with RDF

	Prio	Issue	Problem	Status	Suggestion
5.3.1	[1]	Use RDF datatyping?	The new RDF specs contain the notion of typed literals. This fits well into many elements.	Currently not used.	Delay until LOM/RDF version 2.
5.3.2	[3]	Very weak constraints - would need both Ontology & Rules	Many of the constraints and inference rules are not represented in a machine-processable manner. We would need to use both Ontology support and RDF Rules to fix that.	Only RDF Schema is used.	Wait until Ontology/Rules specs from W3C stabilize.
5.3.3	[3]	Ordering is sometimes ugly	Forcing ordering is in some places very ugly	Currently ordering is not required, but possible.	Leave as is

5.4 Editorial Issues

	Prio	Issue	Problem	Status	Suggestion
5.4.1	[2]	Examples - XML/Graphs/N-TRIPLE?	How should we present the examples? As XML, as Graphs or using a triple notation such as N-TRIPLE?	Only XML fragment.	Use more, at least. XML is not enough.
5.4.2	[2]	RDF Schema extracts?	DCQ/RDF includes Schema fragments in each example. Should we?	No schemas in examples.	Possibly include.
5.4.3	[3]	Check references	Are all references correct and up-to-date?	-	Check
5.4.4	[2]	Document format	Do we have a good doc format? More information? Does the table work?	-	-