Over the last 15 years, Germany has developed its own set of GML-based spatial data exchange standards, known as ALKIS-AFIS-ATKIS (or 3A NAS for short). Surveying organisations in all states have implemented the standards, thus providing a common foundation for an INSPIRE implementation.
In 2016, the Arbeitsgemeinschaft der Vermessungsverwaltungen der Länder der Bundesrepublik Deutschland (AdV) commissioned wetransform to create a formal data transformation documentation, with 3A NAS and the “Hauskoordinaten” as a source and with 12 INSPIRE Annex I GML schemas as the target. This documentation was to be generated based on hale studio alignments, and validated against data sets from multiple German states.
This project has recently been completed, resulting in the first full, formal and executable data transformation specification. These results helped German authorities to achieve a breakthrough in the provision of harmonised INSPIRE data sets.
An additional challenge was the high complexity of the source data models, which are much larger than the individual INSPIRE annex data specifications. Furthermore, the 3A NAS schemas use a lot of special constructs to link features, and the German states have implemented individual variants due to different processes and legal requirements.
In this article, we explain how we created these highly complex alignments with up to 450 cells using hale studio, what methodologies we applied, and how implementers, e.g. the state of Rheinland-Pfalz, have already picked up the results to create INSPIRE-compliant data sets from their 3A NAS production databases.
The baseline for the project was a massive collection of Excel matching tables, equivalent to more than 200 A3 pages when printed out. We used these Excel tables to create the initial alignments. Furthermore, we worked with the AdV to define common rules for the transformation and for the resulting INSPIRE data sets, such as patterns for
During the initial analysis of the data models, we saw the need for specific functions and common mappings for all the alignments. As both the source and the target models are object-oriented models with rich inheritance hierarchies, we could define the common mappings once, in so-called base alignments. These base alignments are re-usable components that we then imported into all Annex I alignments:
The custom functions we wrote for this project included the following:
Using the custom functions, we avoided a lot of redundancy in the alignments and reduced their complexity.
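The base alignment idea described above can be illustrated with a minimal Python sketch. It is not hale studio's internal data model: the `Cell` and `Alignment` classes, the type names and the function names are all hypothetical and only show how cells defined once in a base alignment become part of every alignment that imports it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cell:
    """One mapping cell: a source entity mapped to a target entity by a function."""
    source: str
    target: str
    function: str


@dataclass
class Alignment:
    """An alignment with its own cells plus imported base alignments."""
    name: str
    cells: list = field(default_factory=list)
    imports: list = field(default_factory=list)

    def effective_cells(self):
        """Return the imported base cells first, then this alignment's own cells."""
        result = []
        for base in self.imports:
            for cell in base.effective_cells():
                if cell not in result:
                    result.append(cell)
        for cell in self.cells:
            if cell not in result:
                result.append(cell)
        return result


# Hypothetical example data: one shared cell on a common parent type,
# re-used by two theme-specific alignments.
base = Alignment("base", cells=[
    Cell("CommonSourceParent.id", "CommonTargetParent.inspireId", "custom-id-function"),
])
parcels = Alignment("cadastral-parcels", imports=[base], cells=[
    Cell("SourceParcel.area", "TargetParcel.areaValue", "rename"),
])
units = Alignment("administrative-units", imports=[base], cells=[
    Cell("SourceUnit.name", "TargetUnit.name", "rename"),
])

for alignment in (parcels, units):
    print(alignment.name, len(alignment.effective_cells()))
```

Because the shared cell lives only in `base`, a correction to it is picked up by every importing alignment, which is the redundancy reduction the text refers to.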
The core task in the project was to create the 14 concrete alignments used to generate the formal documentation. We applied the following development process:
In this project, we learned that the highly detailed matching tables captured only about 30% of all transformation cells in the final projects correctly or fully. Most of the work went into the review and improvement iterations that followed the initial implementation. The AdV stakeholders provided a lot of very important input, so that the alignments could be improved until they reached sufficient quality in all aspects. The following links lead you to the interactive mapping documentation for some of these:
These alignments are currently in the final resolution process of the AdV.
You might have noticed that there are three alignments that have Administrative Units as their target schema: In 3A, the geometry of Administrative Units is derived by creating the union of a set of land parcels. This process reduces redundancy in the data, but can be computationally expensive. As a consequence, we developed an alignment that creates these aggregated geometries for all levels of Administrative Units, but also made two variants that allow the specification of an additional data source with the respective pre-aggregated geometries.
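To make the aggregation step more concrete, here is a simplified Python sketch of deriving one outer boundary from a set of parcels. It is not the implementation used in the alignments. It assumes that the parcels form a topologically clean partition (no overlaps, shared boundaries use identical vertices) and that the result is a single ring without holes. Under these assumptions, every edge that belongs to exactly one parcel lies on the outer boundary. The function names are made up for this example; production code would normally rely on a geometry library.

```python
from collections import Counter


def outer_boundary_edges(parcels):
    """Return the undirected edges that occur in exactly one parcel ring.

    Each parcel is a list of (x, y) vertices describing a closed ring,
    without repeating the first vertex at the end.
    """
    counts = Counter()
    for ring in parcels:
        for i, start in enumerate(ring):
            end = ring[(i + 1) % len(ring)]
            counts[frozenset((start, end))] += 1
    return {edge for edge, n in counts.items() if n == 1}


def assemble_ring(edges):
    """Chain boundary edges into one closed ring (single ring, no holes assumed)."""
    neighbours = {}
    for edge in edges:
        a, b = tuple(edge)
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    start = min(neighbours)
    ring = [start]
    previous, current = None, start
    while True:
        nxt = [n for n in neighbours[current] if n != previous][0]
        if nxt == start:
            break
        ring.append(nxt)
        previous, current = current, nxt
    return ring


# Two unit-square parcels that share the edge from (1, 0) to (1, 1).
parcel_a = [(0, 0), (1, 0), (1, 1), (0, 1)]
parcel_b = [(1, 0), (2, 0), (2, 1), (1, 1)]

boundary = outer_boundary_edges([parcel_a, parcel_b])
unit_ring = assemble_ring(boundary)
print(len(boundary), len(unit_ring))
```

Even in this toy form, every edge of every parcel has to be visited, which hints at why the aggregation becomes expensive for real cadastral data and why variants with pre-aggregated geometries are useful.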
We set up a process to generate derived alignments for subsets of the 3A data models based on the “Modellart”. The “Modellart” is a mix of model and scale; for example, there are landscape models in scales from 1:25,000 to 1:1,000,000. Each “Modellart” includes a subset of the total 3A model, so the transformation also needs to be applied to a subset only, and some information is not available. We used annotations on the mapping cells to indicate which cell is relevant for which model. Thanks to hale’s declarative mapping, the derived alignments can be created easily by excluding mappings for feature types that are not part of the respective model.
We also set up another automated generation process to derive modified alignments that use the PostNAS database system instead of 3A XML as the source schema. One of the big advantages of a declarative system is that it makes such derivation processes and the re-use of transformation mappings feasible.
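The annotation-based derivation can be sketched in a few lines of Python. This only illustrates the filtering principle and is not the actual generation process: the `AnnotatedCell` class, the type names and the assignment of types to a “Modellart” are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedCell:
    """A mapping cell annotated with the models in which it is relevant."""
    source_type: str
    target_type: str
    modellarten: frozenset


def derive_alignment(cells, modellart):
    """Keep only the cells annotated as relevant for the given Modellart."""
    return [cell for cell in cells if modellart in cell.modellarten]


# Illustrative cells; the Modellart assignments below are assumptions.
full_alignment = [
    AnnotatedCell("ParcelType", "CadastralParcel", frozenset({"DLKM"})),
    AnnotatedCell("RoadType", "RoadLink", frozenset({"DLKM", "Basis-DLM"})),
    AnnotatedCell("SmallScaleRoadType", "RoadLink", frozenset({"DLM250"})),
]

basis_dlm = derive_alignment(full_alignment, "Basis-DLM")
print([cell.source_type for cell in basis_dlm])
```

Since the full alignment stays the single source of truth, a derived alignment can be regenerated at any time instead of being maintained by hand.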
For any kind of complex data processing, continuous testing is necessary. We set up an automated process that transformed and validated more than a dozen different data sets after each change to the mappings. This process was implemented with a Gradle script invoking the hale Command Line Interface. This interface has grown in capabilities with each release and can be used to control almost all aspects of hale – be it the transformation, the generation of artifacts such as the formal documentation or the validation of the results.
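The regression loop described above can be sketched as follows. The project used a Gradle script; this sketch uses Python instead and deliberately leaves the command line open. `build_command` is a hypothetical callback that returns the full argument list for one data set; the real hale CLI arguments have to be taken from the hale documentation. The sketch also assumes that the invoked tool signals failure through a non-zero exit code.

```python
import subprocess
import sys


def run_regression(datasets, build_command):
    """Run one transformation command per data set and collect the failures.

    build_command(dataset) must return the argument list to execute.
    Returns a dict that maps each failed data set to its error output.
    """
    failures = {}
    for dataset in datasets:
        completed = subprocess.run(
            build_command(dataset), capture_output=True, text=True
        )
        if completed.returncode != 0:
            failures[dataset] = completed.stderr.strip()
    return failures


def stand_in_command(dataset):
    """Stand-in for the real CLI call: 'fails' for files not ending in ok.xml."""
    code = "import sys; sys.exit(0 if sys.argv[1].endswith('ok.xml') else 'invalid')"
    return [sys.executable, "-c", code, dataset]


failures = run_regression(["state_a_ok.xml", "state_b_bad.xml"], stand_in_command)
for dataset, message in failures.items():
    print(f"{dataset}: {message}")
```

Running such a loop after every change to the mappings turns the sample data sets into a regression suite, so that a fix for one state's variant cannot silently break another.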
The final deliverable of the project was the formal documentation. For a long time, hale studio has had the capability to generate both matching tables and HTML documentation. Over the last releases, we have continuously improved the HTML documentation feature, so that the documentation now offers a lot more than any static document could provide. It includes a graphical representation of the mapping, a verbal description, and information on the related schema entities, notes and other information. It is also interactive: search and filter options make it possible to choose what information to display.
This project was a relatively complex undertaking, with more than 20 stakeholders reviewing the mappings and the transformed data to ensure completeness and correctness of the formal documentation. In the initial project, we used GitLab as an issue tracker and collaboration platform. GitLab is a very useful general-purpose project and source code management platform, much like GitHub. However, we also found some issues with the usage of GitLab for this specific use case:
We thus implemented collaboration features as part of the documentation itself. These collaboration features enable efficient teamwork in larger groups with diverse backgrounds:
These additional features require a central service to function, which we deployed as part of haleconnect.com. Over the last months, we have evaluated the use of hale connect to manage our internal transformation projects. We are now starting to use the same processes with our customers to build better transformation projects faster.