Many people who create GML, and in particular INSPIRE GML, hit some common challenges around identifying features. In part, these come from technical requirements of XML/GML, and in part they come from INSPIRE requirements.
An INSPIRE feature will generally have three properties that identify objects, each with a different purpose:
gml:id
: This is the mandatory XML element ID, encoded as an attribute of the element. It uniquely identifies that element within the current document and serves as the target of an XLink. It has to match a defined pattern, e.g. it must start with a letter or underscore. It is first and foremost a technical identifier, but it should be stable over time (e.g. over multiple transformation runs) and should thus be grounded in a property of the source feature. XLink references across documents can only work if the ID is stable over time. The gml:id is used by the WFS standard query GetFeatureById.
inspireId
: This is a specific, often mandatory, complex property of INSPIRE objects, which consists of three sub-properties: localId, namespace, and version. The INSPIRE ID should be stable, and is usually used to clearly identify the object in its specific domain. Often, existing keys are re-used 1:1 as the localId.
gml:identifier
: This is the optional external element ID, i.e. it should include a namespace to make it globally unique, not just unique in the current document. It is a standard property of all GML objects, is encoded as an element, and is of the type gml:CodeWithAuthorityType. This is also a technical identifier which should be stable over time. INSPIRE recommends using the namespace and localId from the inspireId to build the identifier, and INSPIRE identifiers use this codespace: http://inspire.ec.europa.eu/ids
Here is a complete example for a feature with all three properties:
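As a minimal sketch of how the three identifiers relate to each other, the following Python snippet builds a feature skeleton that carries all of them. The dataset namespace `https://example.org/inspire/au/boundaries`, the helper name `build_feature`, the choice of the Administrative Units schema namespaces, and the `/` separator between namespace and localId in the gml:identifier are assumptions made for illustration. The result is not a schema-valid feature, since all other mandatory properties are left out.

```python
import xml.etree.ElementTree as ET

# XML namespaces (assumed schema versions, for illustration only)
GML = "http://www.opengis.net/gml/3.2"
BASE = "http://inspire.ec.europa.eu/schemas/base/3.3"
AU = "http://inspire.ec.europa.eu/schemas/au/4.0"
INSPIRE_CODESPACE = "http://inspire.ec.europa.eu/ids"

for prefix, uri in (("gml", GML), ("base", BASE), ("au", AU)):
    ET.register_namespace(prefix, uri)


def build_feature(type_name, domain_key, dataset_namespace):
    """Build a feature skeleton with gml:id, gml:identifier and inspireId."""
    local_id = f"{type_name}_{domain_key}"

    # gml:id is an attribute of the feature element itself.
    feature = ET.Element(f"{{{AU}}}{type_name}", {f"{{{GML}}}id": local_id})

    # gml:identifier is an element of type gml:CodeWithAuthorityType.
    identifier = ET.SubElement(
        feature, f"{{{GML}}}identifier", {"codeSpace": INSPIRE_CODESPACE}
    )
    # Assumption: namespace and localId are joined with a slash.
    identifier.text = f"{dataset_namespace.rstrip('/')}/{local_id}"

    # inspireId is a complex property with localId and namespace.
    inspire_id = ET.SubElement(feature, f"{{{AU}}}inspireId")
    base_identifier = ET.SubElement(inspire_id, f"{{{BASE}}}Identifier")
    ET.SubElement(base_identifier, f"{{{BASE}}}localId").text = local_id
    ET.SubElement(base_identifier, f"{{{BASE}}}namespace").text = dataset_namespace
    return feature


feature = build_feature(
    "AdministrativeBoundary", "932817", "https://example.org/inspire/au/boundaries"
)
print(ET.tostring(feature, encoding="unicode"))
```

The same local value is reused for the gml:id, the localId, and the local part of the gml:identifier, which keeps the three identifiers easy to relate to each other.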
The following guidelines on how to form these different types of IDs are partially based on the guidelines that the German AdV (Arbeitsgemeinschaft der Vermessungsverwaltungen der Länder Deutschlands) has developed. We’ve used those successfully in hundreds of transformation projects.
Both for gml:identifier and for the INSPIRE ID, you will need to define a dataset namespace. The dataset namespace needs to be unique within the whole INSPIRE infrastructure, and is used for one data set only. There are generally two common patterns for these namespaces:
https://www.nationaalgeoregister.nl/c4b137b8-2317-42c2-aced-204c4216d68d
http://www.swisstopo.ch/inspire/au/4.0/swissboundaries3d/
Both approaches have some advantages and disadvantages, so it comes down to what you want to achieve by using the namespaces. For all kinds of namespaces, there is often a national or regional registry (such as the GDI-DE Registry) where INSPIRE implementers have to register their organisation and dataset namespaces.
In most situations, we recommend making the values of gml:id, localId and the local part of the gml:identifier identical. Since we often generate multiple INSPIRE objects of different INSPIRE feature types from one source object, we need to differentiate these objects and thus prefix the domain key with the INSPIRE type name, e.g. like this:
AdministrativeBoundary_932817
We have used both underscores and points to separate the INSPIRE type name from the domain key; there is no inherent difference between the two. The domain key has to be a unique property across all source objects, or it has to be generated. Using a unique source property is highly preferred, as only that guarantees a stable ID over multiple transformation or generation runs.
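The prefixing rule can be sketched in a few lines of Python. The function name `make_local_id` and the regular expression are assumptions for illustration; the pattern is a simplified, ASCII-only approximation of the XML rule that a gml:id must start with a letter or underscore, not the full definition.

```python
import re

# Simplified ASCII-only approximation of the pattern a gml:id must match:
# starts with a letter or underscore, followed by letters, digits,
# underscores, points or hyphens.
_ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def make_local_id(type_name, domain_key, separator="_"):
    """Prefix the domain key with the INSPIRE type name and validate the result."""
    candidate = f"{type_name}{separator}{domain_key}"
    if not _ID_PATTERN.fullmatch(candidate):
        raise ValueError(f"not usable as gml:id: {candidate!r}")
    return candidate


print(make_local_id("AdministrativeBoundary", "932817"))
print(make_local_id("AdministrativeBoundary", "932817", separator="."))
```

Because the type name always starts with a letter, a purely numeric domain key such as 932817 becomes a valid ID once it is prefixed.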
In some cases, the source objects have a unique domain key that uses a problematic format (e.g. containing spaces or backslashes). If the keys remain unique after removing the special characters, you can just strip them. Otherwise, we recommend using the source domain key as input to generate either a UUID or a hash value. To generate a hash value, we recommend the SHA-256 algorithm.
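A minimal sketch of this decision, using only the Python standard library, could look as follows. The function names, the set of characters that are kept, and the use of a name-based UUID (`uuid.uuid5`) are assumptions for illustration; any deterministic UUID or hash scheme would serve the same purpose.

```python
import hashlib
import re
import uuid


def strip_special(key):
    """Remove every character that is not a letter, digit, '_', '.' or '-'."""
    return re.sub(r"[^A-Za-z0-9_.\-]", "", key)


def normalise_keys(keys):
    """Map each source key to a usable key: stripped if that stays unique, else hashed."""
    stripped = [strip_special(k) for k in keys]
    still_unique = len(set(stripped)) == len(set(keys)) and all(stripped)
    if still_unique:
        return dict(zip(keys, stripped))
    # Stripping would produce collisions or empty keys: fall back to SHA-256.
    return {k: hashlib.sha256(k.encode("utf-8")).hexdigest() for k in keys}


def key_to_uuid(key):
    """Alternative: derive a reproducible, name-based UUID from the source key."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))


print(normalise_keys(["A 1", "B\\2"]))   # stripping keeps the keys unique
print(normalise_keys(["A 1", "A1"]))     # stripping would collide, so keys are hashed
```

Both fallbacks are deterministic, so the same source key yields the same ID on every transformation run.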
This approach has several advantages:
The question of how you can build references is often the key to deciding which source domain key is best to use. References require a stable, reproducible generation method that we can also apply in places where the original source object was referenced. So, when in doubt, your domain key should always be the value that the existing data uses to create references (e.g. a foreign key in a database table).
There are many cases where we create an INSPIRE object from multiple source objects. As an example, we merge a set of WaterwayLinks to create InlandWaterways. In this case, we still want to create a stable ID for the merged object. We do this by concatenating the domain keys of all the merged objects and then calculating the SHA-256 hash value of the resulting string. This gives us a long, but still manageable ID:
InlandWaterway_e3b0c44298fc1c149afbf4c8...4649b934ca495991b7852b855
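The merge rule can be sketched like this. The function name `merged_id` is hypothetical, and sorting the keys and joining them with a delimiter are assumptions added here so that the result does not depend on the order in which the source objects happen to be processed.

```python
import hashlib


def merged_id(type_name, domain_keys, delimiter="|"):
    """Build a stable ID for an object merged from several source objects."""
    # Assumption: sort the keys so that processing order does not change the ID,
    # and use a delimiter so that ("1", "23") and ("12", "3") do not collide.
    joined = delimiter.join(sorted(domain_keys))
    digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return f"{type_name}_{digest}"


print(merged_id("InlandWaterway", ["link-17", "link-03", "link-42"]))
```

The resulting ID consists of the type name, an underscore, and 64 hexadecimal characters.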
Another approach would be to use the value that was used to group objects as the domain key part. This creates a semantically meaningful identifier, like in this example with a stripped name:
InlandWaterway_DiepwaterrouteDuitslandWestFrieslandNoordHinder
In some cases, we need to disassemble a source object and create many INSPIRE objects of the same type from it. The most common use case for this is when the INSPIRE schema only allows simple geometries, and we have to split up a MultiGeometry. In this case, we apply the same rules as for the simple 1:1 creation, but add a postfix to the ID that uses the index of the property on which we split the object. In this example, we look at the 22nd object created from splitting a source object (we start counting with 1, not with 0):
WaterwayLink_e3b0c44298fc1c149afbf4c8...4649b934ca495991b7852b855_22
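As a small sketch of the split rule, the following snippet assigns an index postfix to each part of a split object. The function name `split_ids` and the placeholder parts are hypothetical; the only rule taken from the text is that the index starts at 1.

```python
def split_ids(base_id, parts):
    """Pair each part of a split source object with an ID carrying a 1-based postfix."""
    return [(f"{base_id}_{index}", part) for index, part in enumerate(parts, start=1)]


# Placeholder geometries standing in for the members of a MultiGeometry.
parts = ["line-a", "line-b", "line-c"]
for new_id, part in split_ids("WaterwayLink_4711", parts):
    print(new_id, part)
```

The IDs only stay stable as long as the order of the parts in the source object does not change between runs.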
In a join, we use multiple objects of different source types to form an INSPIRE object. As an example, we might join a Municipality and a District object together to create an AdministrativeUnit with references to lowerLevelUnits. If there is a reason why we can’t just use the domain key of the District, our recommendation is to use multi-component IDs for this case as well. In a join, there is always a “focus” or “root” object, to which matching objects of other types are added. In this example, we try to find all Municipalities belonging to the District, so the District is the focus object. We use the domain key of this root object as we would for a simple 1:1 creation. However, we then add another key created by concatenating the domain keys of the joined objects (the Municipalities), as we do in the merge case. This means we take the concatenated IDs of the Municipalities and create a SHA-256 hash value from them, which is then added to the other parts of the ID:
District_241_e3b0c44298fc1c149afbf4c8...4649b934ca495991b7852b855
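The join rule combines the 1:1 rule with the merge rule, as in this sketch. The function name `joined_id`, the sorting of the joined keys, and the delimiter are assumptions for illustration.

```python
import hashlib


def joined_id(root_type, root_key, joined_keys, delimiter="|"):
    """Build an ID from the root object's key plus a hash over the joined objects' keys."""
    # Assumption: sorted keys and a delimiter keep the hash independent of join order.
    concatenated = delimiter.join(sorted(joined_keys))
    digest = hashlib.sha256(concatenated.encode("utf-8")).hexdigest()
    return f"{root_type}_{root_key}_{digest}"


# District 241 as the root object, joined with three hypothetical Municipality keys.
print(joined_id("District", "241", ["2411", "2412", "2413"]))
```

If the set of joined objects changes, the hash part changes too, so such an ID also reflects which objects went into the join.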
Creating stable IDs that can be referenced is somewhat complex. However, we’ve used the rules above, as well as some variants, in a few hundred projects by now, and they work very well. Do you have ideas on how to improve or complement them? Let us know!
One of the big debates surrounding INSPIRE in 2017 centers around the fitness for purpose of the INSPIRE data specifications. Earlier this year, the German National Mapping and Cadastral Agency BKG therefore asked us to perform a study to identify practical issues in the INSPIRE data specifications that make implementation and usage harder. They also asked us to document recommendations on how to improve the Technical Guidance and the Implementing Rules.
As you might know, one of our company’s goals is to help make standards better. For us that means that we use data-driven, analytic approaches to identify places of overspecification or underspecification as well as inefficient or overly complicated data structures. We also systematically look for mismatches between existing data and the targeted implementation platforms. In earlier posts, we’ve described some of the methods we use for that.
The BKG has now published the final version of the report on the GDI-DE website.
In this report, we analyse five data specifications (Buildings, Species distribution, Environmental monitoring facilities, Utility and governmental services and Natural risk zones) from two perspectives:
In the report, we describe proposals where the INSPIRE Implementing Rules or Technical Guidance can be amended to ensure the interoperability of spatial data sets and services with reasonable efforts for the authorities concerned. The proposals include concrete references to alternative encodings and simplifications (e.g., multiplicity, voidable, flattening, data type).
Resources:
The JRC has continuously worked to make access to the INSPIRE Data Specifications easier. Initially, interested people had to either read through hundreds of pages of PDFs or analyse the Enterprise Architect UML files. Thanks to the hard work of the IES team at the JRC, we now have a third option available: the INSPIRE Interactive Data Specifications. This online application provides two tools:
This new third option provides you with a hale studio mapping project. This project makes it easier for you to get started with producing INSPIRE interoperable data. Get your mapping project by following these steps:
You can save or open the mapping project that you’ll download. When you open it in hale studio 2.9.4+, it’s preconfigured to include…:
We’ll also add codelists soon. Let us know what else would make your life easier!