
Many people who create GML, and in particular INSPIRE GML, hit some common challenges around identifying features. In part, these come from technical requirements of XML/GML, and in part they come from INSPIRE requirements.

An INSPIRE feature will generally have three properties that identify objects, each with a different purpose:

gml:id: This is the mandatory XML element ID, encoded as an attribute of the element. It uniquely identifies that element within the current document and serves as the target of XLink references. Its value has to match a defined pattern; for example, it must start with a letter or an underscore. It is first and foremost a technical identifier, but it should still be stable over time (e.g. across multiple transformation runs) and should therefore be derived from a property of the source feature. XLink references across documents only work if the gml:id is stable over time. The gml:id is also what the WFS standard stored query GetFeatureById uses.

inspireId: This is a specific, often mandatory, complex property of INSPIRE objects that consists of three sub-properties: localId, namespace, and versionId. The INSPIRE ID should be stable and is usually used to unambiguously identify the object within its specific domain. Often, existing keys are reused 1:1 as the localId.

gml:identifier: This is the optional external ID of the element; it should include a namespace so that it is globally unique, not just unique within the current document. It is a standard property of all GML objects, is encoded as an element, and is of the type gml:CodeWithAuthorityType. It is also a technical identifier that should be stable over time. INSPIRE recommends using the namespace and localId from the inspireId to build the identifier, and INSPIRE identifiers use this codespace: http://inspire.ec.europa.eu/ids
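To make the relationship between the three properties concrete, here is a minimal Python sketch that derives all of them from one dataset namespace and one localId. It is an illustration under assumptions, not a normative encoding: the class and function names are hypothetical, the namespace URL is made up, and joining namespace and localId with a "/" to form the gml:identifier value is just one possible convention.

```python
from dataclasses import dataclass
from typing import Optional

# Codespace used for INSPIRE identifiers (from the text above).
INSPIRE_CODESPACE = "http://inspire.ec.europa.eu/ids"


@dataclass(frozen=True)
class FeatureIds:
    """Hypothetical container for the three identifying properties of a feature."""
    gml_id: str                # gml:id attribute, unique within the document
    local_id: str              # inspireId/localId
    namespace: str             # inspireId/namespace (the dataset namespace)
    version_id: Optional[str]  # inspireId/versionId, may be left empty
    identifier: str            # gml:identifier element value
    codespace: str             # gml:identifier/@codeSpace


def build_ids(namespace: str, local_id: str,
              version_id: Optional[str] = None) -> FeatureIds:
    # Assumption: the gml:identifier value is namespace + "/" + localId.
    identifier = namespace.rstrip("/") + "/" + local_id
    return FeatureIds(
        gml_id=local_id,  # identical to localId, see "General Rules for IDs"
        local_id=local_id,
        namespace=namespace,
        version_id=version_id,
        identifier=identifier,
        codespace=INSPIRE_CODESPACE,
    )


ids = build_ids("https://example.org/inspire/au", "AdministrativeBoundary_932817")
print(ids.gml_id)      # AdministrativeBoundary_932817
print(ids.identifier)  # https://example.org/inspire/au/AdministrativeBoundary_932817
```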

Here is a complete example for a feature with all three properties:



The following guidelines on how to form these different types of IDs are partially based on the guidelines that the German AdV (Arbeitsgemeinschaft der Vermessungsverwaltungen der Länder Deutschlands) has developed. We’ve used those successfully in hundreds of transformation projects.

Namespaces

For both gml:identifier and the INSPIRE ID, you will need to define a dataset namespace. The dataset namespace needs to be unique across the entire INSPIRE infrastructure and is used for one dataset only. There are two common patterns for these namespaces:

  1. Technical namespace: There is one central namespace for all resources in your local spatial data infrastructure. All resources get a technical identifier, such as a UUID, which, together with the registry URL, forms the dataset namespace, as in this (fictitious) example: https://www.nationaalgeoregister.nl/c4b137b8-2317-42c2-aced-204c4216d68d
    Such namespaces are easy to generate, and collisions are very unlikely.
  2. Semantic namespace: A semantic namespace identifies the data owner, as well as some properties of the dataset, such as the INSPIRE theme it belongs to, and what data it was derived from. This is a real example: http://www.swisstopo.ch/inspire/au/4.0/swissboundaries3d/
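The technical pattern is simple enough to sketch in a few lines of Python. The registry URL below is a placeholder and `technical_namespace` is a hypothetical helper; the point is that the UUID is generated once (here with `uuid.uuid4()`), registered, and then reused, because a namespace that changes between runs would break every identifier built on it.

```python
import uuid
from typing import Optional


def technical_namespace(registry_url: str,
                        dataset_uuid: Optional[uuid.UUID] = None) -> str:
    """Combine a registry base URL and a dataset UUID into a dataset namespace."""
    if dataset_uuid is None:
        # Random (version 4) UUID: generate it once, then store and reuse it.
        dataset_uuid = uuid.uuid4()
    return f"{registry_url.rstrip('/')}/{dataset_uuid}"


# Reusing a previously registered UUID gives the same namespace on every run.
ns = technical_namespace("https://registry.example.org",
                         uuid.UUID("c4b137b8-2317-42c2-aced-204c4216d68d"))
print(ns)  # https://registry.example.org/c4b137b8-2317-42c2-aced-204c4216d68d
```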

Both approaches have advantages and disadvantages, so the choice comes down to what you want to achieve with your namespaces. For all kinds of namespaces, there is often a national or regional registry (such as the GDI-DE Registry) where INSPIRE implementers have to register their organisation and dataset namespaces.

General Rules for IDs

In most situations, we recommend making the values of gml:id, localId, and the local part of gml:identifier identical. Since we often generate multiple INSPIRE objects of different INSPIRE feature types from one source object, we need to differentiate these objects, so we prefix the domain key with the INSPIRE type name, like this:

AdministrativeBoundary_932817

We have used both underscores and periods to separate the INSPIRE type name from the domain key; there is no inherent difference. The domain key has to be a property that is unique across all source objects, or it has to be generated. Using a unique source property is strongly preferred, as only that guarantees a stable ID over multiple transformation or generation runs.
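A minimal Python sketch of this rule, using a hypothetical `feature_id` helper. The validity check is a simplified, ASCII-only approximation of the XML NCName rule that gml:id values must follow; a complete check would also admit non-ASCII letters.

```python
import re

# Simplified, ASCII-only approximation of an XML NCName: a letter or underscore,
# followed by letters, digits, underscores, hyphens or periods.
_NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def feature_id(type_name: str, domain_key: str, sep: str = "_") -> str:
    """Prefix a domain key with the INSPIRE type name, e.g. AdministrativeBoundary_932817."""
    if sep not in ("_", "."):
        raise ValueError("separator must be '_' or '.'")
    candidate = f"{type_name}{sep}{domain_key}"
    if not _NCNAME_ASCII.fullmatch(candidate):
        raise ValueError(f"not a valid gml:id: {candidate!r}")
    return candidate


print(feature_id("AdministrativeBoundary", "932817"))       # AdministrativeBoundary_932817
print(feature_id("AdministrativeBoundary", "932817", "."))  # AdministrativeBoundary.932817
```

A key containing a space, such as "12 34", raises a ValueError here rather than silently producing an invalid ID.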

In some cases, the source objects have a unique domain key in a problematic format (e.g. containing spaces or backslashes). If uniqueness is still guaranteed after removing the special characters, you can simply strip them. Otherwise, we recommend using the source domain key as input to generate either a UUID or a hash value. To generate a hash value, we recommend the SHA-256 algorithm.
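The decision described above can be sketched in Python as follows. `safe_keys` is a hypothetical helper that receives all source keys at once, because whether stripping is safe can only be judged across the whole set. For the UUID variant it uses a name-based UUID (`uuid.uuid5`), which, unlike a random UUID, yields the same value for the same key on every run; the choice of `uuid.NAMESPACE_URL` as the UUID namespace is an assumption.

```python
import hashlib
import re
import uuid
from typing import Dict, Iterable

# Everything except ASCII letters, digits, "_", "." and "-" is stripped.
_SPECIAL = re.compile(r"[^A-Za-z0-9_.\-]")


def safe_keys(keys: Iterable[str], method: str = "sha256",
              uuid_namespace: uuid.UUID = uuid.NAMESPACE_URL) -> Dict[str, str]:
    """Map each source key to a key that is safe to use inside a gml:id."""
    stripped = {k: _SPECIAL.sub("", k) for k in keys}
    # Stripping is only acceptable if distinct keys stay distinct and non-empty.
    if all(stripped.values()) and len(set(stripped.values())) == len(stripped):
        return stripped
    if method == "sha256":
        return {k: hashlib.sha256(k.encode("utf-8")).hexdigest() for k in stripped}
    if method == "uuid":
        # Name-based UUID: the same key always yields the same UUID.
        return {k: str(uuid.uuid5(uuid_namespace, k)) for k in stripped}
    raise ValueError(f"unknown method: {method!r}")


print(safe_keys(["A 1", "B\\2"]))  # {'A 1': 'A1', 'B\\2': 'B2'}
```

With the keys "A 1" and "A1", stripping would create a collision, so the sketch falls back to hashing every key.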

This approach has several advantages:

  • It guarantees a valid ID, which needs to start with a non-numerical character
  • It differentiates multiple objects created from one source object
  • It immediately tells a viewer what kind of object this is, which is especially useful in references

The question of how you can build references is often the key to determining which source domain key is best to use. This requires a stable, reproducible generation method that we can also apply wherever the original source object was referenced. So, when in doubt, your domain key should be the value that the existing data uses to create references (e.g. a foreign key in a database table).
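The essential property is that the same deterministic rule runs on both sides of a reference. In this hypothetical Python sketch, units are identified by a `unit_key` column and boundaries point to them through a foreign key column `unit_fk`; both column names and the record layout are assumptions. The href is written as a same-document fragment (`#id`); a reference into another document would additionally need that document's URL.

```python
from typing import Dict, List


def make_id(type_name: str, domain_key: str) -> str:
    # The same rule is used for the object itself and for every reference to it.
    return f"{type_name}_{domain_key}"


def local_href(type_name: str, foreign_key: str) -> str:
    # Fragment reference to an object encoded in the same document.
    return "#" + make_id(type_name, foreign_key)


# Hypothetical source records: units and boundaries that reference them.
units: List[Dict[str, str]] = [{"unit_key": "241"}, {"unit_key": "242"}]
boundaries: List[Dict[str, str]] = [{"boundary_key": "932817", "unit_fk": "241"}]

unit_ids = {make_id("AdministrativeUnit", u["unit_key"]) for u in units}
for b in boundaries:
    href = local_href("AdministrativeUnit", b["unit_fk"])
    # The reference resolves because it was built with the same rule as the target.
    print(href, href[1:] in unit_ids)  # #AdministrativeUnit_241 True
```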

Merging objects

There are many cases where we create an INSPIRE object from multiple source objects. For example, we merge a set of WaterwayLinks to create an InlandWaterway. In this case, we still want a stable ID for the merged object. We achieve this by concatenating the domain keys of all the merged objects and then calculating the SHA-256 hash of the resulting string. This gives us a long but still manageable ID:

InlandWaterway_e3b0c44298fc1c149afbf4c8...4649b934ca495991b7852b855
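A Python sketch of the merge rule, with the hypothetical helper `merged_id`. It makes two assumptions that go beyond the text: the member keys are sorted first, so that the ID does not depend on the order in which members are read, and they are joined with a separator (`|`, assumed not to occur in the keys), because plain concatenation would produce the same string for the keys "1" and "23" as for "12" and "3".

```python
import hashlib
from typing import Iterable


def merged_id(type_name: str, domain_keys: Iterable[str], sep: str = "|") -> str:
    """Stable ID for an object merged from several source objects."""
    keys = sorted(domain_keys)  # order-independent result
    if not keys:
        raise ValueError("a merged object needs at least one source key")
    digest = hashlib.sha256(sep.join(keys).encode("utf-8")).hexdigest()
    return f"{type_name}_{digest}"


# Same members in a different order produce the same ID.
a = merged_id("InlandWaterway", ["WL_17", "WL_3", "WL_9"])
b = merged_id("InlandWaterway", ["WL_9", "WL_17", "WL_3"])
print(a == b)  # True
```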

Another approach is to use the value that was used to group the objects as the domain key part. This creates a semantically meaningful identifier, as in this example with a stripped name:

InlandWaterway_DiepwaterrouteDuitslandWestFrieslandNoordHinder
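Turning a grouping value into such an ID can be sketched like this. The helper `name_based_id` and the input name are hypothetical, and folding accented letters to plain ASCII via Unicode NFKD normalisation is an added assumption. This approach only yields stable, unique IDs if the grouping value itself is unique across all groups and does not change between runs.

```python
import re
import unicodedata


def name_based_id(type_name: str, group_value: str) -> str:
    """Build an ID such as InlandWaterway_<StrippedName> from a grouping value."""
    # Decompose accented letters (e.g. "É" -> "E" + accent) and drop the accents.
    ascii_text = (unicodedata.normalize("NFKD", group_value)
                  .encode("ascii", "ignore").decode("ascii"))
    # Keep only ASCII letters and digits.
    stripped = re.sub(r"[^A-Za-z0-9]", "", ascii_text)
    if not stripped:
        raise ValueError(f"nothing left of {group_value!r} after stripping")
    return f"{type_name}_{stripped}"


print(name_based_id("InlandWaterway",
                    "Diepwaterroute Duitsland - West Friesland - Noord Hinder"))
# InlandWaterway_DiepwaterrouteDuitslandWestFrieslandNoordHinder
```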

Splitting Objects

In some cases, we need to disassemble a source object and create many INSPIRE objects of the same type from it. The most common use case for this is when the INSPIRE schema only allows simple geometries and we have to split up a MultiGeometry. In this case, we apply the same rules as for the simple 1:1 creation, but add a suffix to the ID that holds the index of the part of the property on which we split the object. In this example, we look at the 22nd object created by splitting a source object (we start counting at 1, not 0):

WaterwayLink_e3b0c44298fc1c149afbf4c8...4649b934ca495991b7852b855_22
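A sketch of the split rule, with the hypothetical helper `split_ids`. It assumes that the parts of the source object (e.g. the members of a MultiGeometry) always come out in the same order; if the order can change between runs, the suffixes, and therefore the IDs, are not stable.

```python
from typing import List, Sequence


def split_ids(base_id: str, parts: Sequence[object]) -> List[str]:
    """One ID per part, using a 1-based index as suffix."""
    return [f"{base_id}_{i}" for i, _ in enumerate(parts, start=1)]


# Hypothetical geometry parts of one source object.
parts = ["part-a", "part-b", "part-c"]
print(split_ids("WaterwayLink_4711", parts))
# ['WaterwayLink_4711_1', 'WaterwayLink_4711_2', 'WaterwayLink_4711_3']
```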

Joining Objects

In a join, we use multiple objects of different source types to form one INSPIRE object. For example, we might join Municipality and District objects to create an AdministrativeUnit with references to its lowerLevelUnits. If there is a reason why we can’t simply use the domain key of the District, we recommend using multi-component IDs in this case as well. In a join, there is always a “focus” or “root” object to which matching objects of other types are added. In this example, we look for all Municipalities belonging to the District, so the District is the focus object. We use the domain key of this root object as we would for a simple 1:1 creation. We then add a second component created from the domain keys of the joined objects (the Municipalities), as in the merge case: we concatenate the IDs of the Municipalities, calculate the SHA-256 hash of the result, and append it to the other parts of the ID:

District_241_e3b0c44298fc1c149afbf4c8...4649b934ca495991b7852b855
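Combining the root key with a hash of the joined keys could look like this in Python. Sorting the joined keys and separating them with `|` are assumptions meant to make the result order-independent and unambiguous; the helper name `joined_id` and the Municipality keys are hypothetical. The prefix is passed in explicitly, since the example ID above uses "District".

```python
import hashlib
from typing import Iterable


def joined_id(prefix: str, root_key: str, joined_keys: Iterable[str],
              sep: str = "|") -> str:
    """<prefix>_<root key>_<SHA-256 of the joined objects' keys>."""
    keys = sorted(joined_keys)  # order-independent result
    digest = hashlib.sha256(sep.join(keys).encode("utf-8")).hexdigest()
    return f"{prefix}_{root_key}_{digest}"


# Hypothetical: District 241 with three Municipalities found by the join.
print(joined_id("District", "241", ["M_0815", "M_0042", "M_0107"]))
```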

Summary

Creating stable IDs that can be referenced is somewhat complex. However, we’ve used the rules above, as well as some variants, in a few hundred projects by now, and they work very well. Do you have ideas on how to improve or complement them? Let us know!


One of the big debates surrounding INSPIRE in 2017 centers on the fitness for purpose of the INSPIRE data specifications. Earlier this year, the German National Mapping and Cadastral Agency BKG therefore asked us to perform a study identifying practical issues in the INSPIRE data specifications that make implementation and usage harder. They also asked us to document recommendations on how to improve the Technical Guidance and the Implementing Rules.

As you might know, one of our company’s goals is to help make standards better. For us that means that we use data-driven, analytic approaches to identify places of overspecification or underspecification as well as inefficient or overly complicated data structures. We also systematically look for mismatches between existing data and the targeted implementation platforms. In earlier posts, we’ve described some of the methods we use for that.

The BKG has now published the final version of the report on the GDI-DE website.

In this report, we analyse five data specifications (Buildings, Species distribution, Environmental monitoring facilities, Utility and governmental services, and Natural risk zones) from two perspectives:

  • Is there unnecessary complexity in the technical guidance that hinders adoption by users?
  • Do the specifications really support key use cases such as e-reporting or Copernicus in-situ data provision? We specifically looked for patterns that would be problematic for use cases such as Data Management, Data Exchange, Data Transformation, Data Analysis in a Desktop GIS, and Data Publishing through INSPIRE Services.

In the report, we describe proposals for how the INSPIRE Implementing Rules or Technical Guidance could be amended to ensure the interoperability of spatial data sets and services with reasonable effort for the authorities concerned. The proposals include concrete references to alternative encodings and simplifications (e.g. multiplicity, voidable, flattening, data type).
