INSPIRE Pilots and Data Harmonisation Case Studies

To support INSPIRE implementation and understanding, the JRC coordinates several large-scale INSPIRE pilot projects, such as the Marine Pilot, the Transportation Pilot and the Danube Reference Data and Services Infrastructure (DRDSI).

DRDSI is an initiative that supports the implementation of the European Union Strategy for the Danube Region (EUSDR) in close cooperation with key scientific partners. The initiative covers many use cases and datasets that are inherently cross-border, such as:

  • River Basin District Management
  • Assessment of Water Resources
  • Environmental Impact Analysis

Data sets being transformed to INSPIRE using hale studio

Within the scope of the DRDSI project, we implemented a data mapping and transformation pilot. The pilot involved all steps of a data harmonisation project, from source data analysis through transformation to data publishing, and was conducted within a short timeframe and on a relatively small budget. The data harmonisation pilot was commissioned by the JRC and executed by wetransform.

Through this work, we created harmonised data that adds content and value to the existing DRDSI. The work aimed to fill gaps in regional datasets by creating harmonised data for bordering countries and documenting the results for use in the DRDSI platform.

Analysis

As a first step, we wanted to know whether the existing data is fit for INSPIRE harmonisation. We received seven data sets from Moldova and five data sets from Ukraine. For all data sets, we performed a quick quality analysis. This analysis included the following checks:

  • Completeness: Are all fields in the source data filled?
  • Consistency: Are there many inconsistent values, such as overlapping geometries, or different spellings of the same names?
  • Coverage: Can we likely get the minimum required information to fulfill INSPIRE requirements from the source data sets?
  • Encoding: Is the encoding clear and correct?
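
The completeness and consistency checks above can be sketched in a few lines of Python. This is only a minimal illustration and not the tooling used in the pilot: the function names, the record layout and the sample values are all made up for the example, and "different spellings" is narrowed here to values that differ only by case or surrounding whitespace.

```python
from collections import defaultdict


def completeness(records, fields):
    """Return the share of non-empty values per field (0.0 to 1.0)."""
    total = len(records)
    result = {}
    for field in fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        result[field] = filled / total if total else 0.0
    return result


def spelling_variants(records, field):
    """Group values that differ only by case or surrounding whitespace."""
    groups = defaultdict(set)
    for r in records:
        value = r.get(field)
        if value:
            groups[value.strip().casefold()].add(value)
    # Keep only the groups that contain more than one distinct spelling
    return {key: sorted(v) for key, v in groups.items() if len(v) > 1}


# Made-up sample records, not taken from the pilot data
sample = [
    {"name": "Prut", "type": "river"},
    {"name": "prut ", "type": ""},
    {"name": "Dniester", "type": "river"},
    {"name": None, "type": "river"},
]

print(completeness(sample, ["name", "type"]))
print(spelling_variants(sample, "name"))
```

Geometry checks such as overlap detection need a spatial library and are left out of this sketch.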

Based on the analysis, we decided to use data sets for three different INSPIRE themes for the pilot: Administrative Units, Hydro-Physical Waters and Railway Transport Networks.

Transformation

In many INSPIRE implementation projects, there are two steps: conceptual mapping and transformation development. With hale studio, both steps can be combined into one. We used several functions to make sure we got the mapping right, both conceptually and technically. In particular, we used hale studio’s real-time validation features, which work on the loaded source data, to assess whether our target data set is schema compliant. For the review of the mapping by the data providers, we generated the interactive documentation and worked on improvements together. You can check out two example transformation projects we created here:

Publishing

When the transformation projects were completed, the next step was to publish the data as INSPIRE View and Download services.

We generally provide two options for delivering services: as Docker containers or as public cloud services. As the data providers and research partners in the project didn’t have the resources to host the services, we agreed to use haleconnect.com to publish the data sets. However, we also provided instructions on how the project partners could set up services based on deegree directly.

Conclusions

The objective of this project was to quickly implement INSPIRE data sets and services to enable cross-border use cases for the Danube Reference Data and Services Infrastructure. We were able to work very effectively with the data stakeholders, who helped us with the analysis and the mapping through their profound understanding of the data. Using hale studio and hale connect, we acquired, analysed, transformed and published six INSPIRE data sets with a total effort of about 10 person days.


One thing we do a lot for our customers is create INSPIRE data sets from their original data. Usually these data sets are available in a national or organisation-specific schema and need to be restructured substantially to meet the INSPIRE requirements. This harmonisation process is one of the things that has given INSPIRE a bad reputation, in that it is a complex and time-intensive endeavour.

Recently, we passed the 100-datasets-harmonised mark. As we usually track the effort needed for each of these projects, we are starting to have a meaningful sample size to judge how much time the development of each of these transformation projects took – time to look at some numbers!

The data that we collect for every project includes the source schema, the target schema, the time spent and a few additional variables, such as schema complexity. In this post, we’re going to look at simple counts, at the mean time spent per target data model, and at the correlation between source model complexity and effort.

The dataset

Out of all the projects we’ve done, 68 have time tracking records, and are related to INSPIRE – either they use one of the 34 core data specifications, or an extension of one of those.

Data set counts by required time for transformation project development

As the graph shows, almost exactly half of the projects were completed in 8 hours or less, while only very few projects took more than 64 hours. With some overhead factored in, 64 hours equals about 10 productive person days.
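
To make the arithmetic explicit: 64 hours for about 10 person days implies roughly 6.4 productive hours per day. The sketch below counts projects per effort bin and converts hours to person days. The doubling bin edges are an assumption (the post only mentions 8 and 64 hours), the 6.4 figure is derived from the sentence above rather than stated by us anywhere, and the sample efforts are made up.

```python
from collections import Counter

# Assumed upper bin edges in hours; only 8 and 64 are mentioned in the text
BIN_EDGES = [1, 2, 4, 8, 16, 32, 64]


def bin_label(hours):
    """Return the label of the first bin whose upper edge covers `hours`."""
    for edge in BIN_EDGES:
        if hours <= edge:
            return f"<= {edge} h"
    return f"> {BIN_EDGES[-1]} h"


def effort_histogram(hours_per_project):
    """Count projects per effort bin."""
    return Counter(bin_label(h) for h in hours_per_project)


def person_days(hours, productive_hours_per_day=6.4):
    """Convert tracked hours to person days (6.4 h/day is implied by 64 h ~ 10 days)."""
    return hours / productive_hours_per_day


# Made-up efforts in hours, for illustration only
efforts = [3, 8, 12, 70, 1.5, 40]
print(effort_histogram(efforts))
print(round(person_days(64), 1))
```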

After looking at the general effort distribution, we wanted to dig a bit deeper: which INSPIRE Annex themes require the most effort from us?

Efforts to create a transformation project by target schema

The range the graph shows is pretty wide. While Addresses, Transport Networks and Hydrography Networks are all in the 30+ hour range, most of the other themes show mean times of 5 to 20 hours of required effort. As the orange line in the graph indicates, the number of datasets we’ve included for a given target data model is in many cases very small (1-3), so these numbers are certainly not stable.
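
A per-theme summary like the one in the graph can be computed as a grouped mean, with the project count kept next to it so that small samples stay visible. This is a minimal sketch: the function name and the sample records are hypothetical and do not reproduce our tracking data.

```python
from collections import defaultdict
from statistics import mean


def mean_effort_by_theme(projects):
    """Return {theme: (mean_hours, n_projects)} for (theme, hours) pairs."""
    hours = defaultdict(list)
    for theme, h in projects:
        hours[theme].append(h)
    return {theme: (mean(values), len(values)) for theme, values in hours.items()}


# Made-up (theme, hours) records, for illustration only
projects = [
    ("Addresses", 30.0),
    ("Addresses", 40.0),
    ("Protected Sites", 6.0),
]

for theme, (avg, n) in mean_effort_by_theme(projects).items():
    # A mean over one to three projects should be read with caution
    note = " (small sample)" if n <= 3 else ""
    print(f"{theme}: {avg:.1f} h over {n} project(s){note}")
```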

Maybe we need to look at the dataset from a different angle. As we often work on a fixed-price basis, we want to make sure the estimates we give are reliable, so it is important for us to know what drives effort up. Thus, the next thing we look at is source data model complexity. We measure complexity using an arbitrary set of measures that tests for the existence of some model features (such as foreign key relationships and inheritance) as well as model size, giving a number between 1 (e.g. a single shapefile) and 10 (a massive model with every modelling feature you can imagine).
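
To illustrate the idea of such a score, here is a hypothetical scoring function. The weights, the size scale and the parameter names are invented for this sketch and are not the measures we actually use; it only shows how feature flags and model size can be combined into a value clamped to the range 1 to 10.

```python
def complexity_score(n_types, has_foreign_keys=False, has_inheritance=False):
    """Hypothetical complexity score between 1 and 10 (weights are assumptions)."""
    score = 1  # baseline, e.g. a single flat table or shapefile
    if has_foreign_keys:
        score += 2
    if has_inheritance:
        score += 2
    # Size contribution: one point per doubling of the type count, capped at 5
    size_points = min(5, max(0, n_types.bit_length() - 1))
    score += size_points
    return min(10, max(1, score))


print(complexity_score(1))
print(complexity_score(8, has_foreign_keys=True))
print(complexity_score(100, has_foreign_keys=True, has_inheritance=True))
```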

Effort required for transformation project development by Source Model Complexity

This graph shows an interesting, though not really unexpected, relationship. On the x-axis, we see the source model complexity; on the y-axis, the time spent on each project. Each blue dot marks the effort and complexity of one project, and the orange dotted line is the trendline. The relationship is pretty clear: the more complex the model, the higher the mean effort. The trendline is almost linear and rises from about 3 hours to about 28 hours over the complexity range from 1 to 10, which is a factor of close to 10.
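
A linear trendline like this can be fitted with ordinary least squares. The sketch below is only an approximation of what the graph shows: instead of the per-project data, which is not reproduced here, it fits a line through the two trendline endpoints quoted above (about 3 at complexity 1 and about 28 at complexity 10), and it assumes that effort is measured in hours. The function names are ours.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_effort(complexity, slope, intercept):
    """Estimated effort for a given source model complexity (1 to 10)."""
    return intercept + slope * complexity


# Trendline endpoints read from the text, not the underlying project data
slope, intercept = fit_line([1, 10], [3, 28])
print(f"about {slope:.2f} additional hours per complexity point")
print(f"estimate for complexity 5: {estimate_effort(5, slope, intercept):.1f} h")
```

With real project data, the same function can be called with one complexity value and one effort value per project.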

Our conclusions?

  • Source model complexity is so far the best indicator for expected effort in a project;
  • Effort varies a lot across different INSPIRE themes;
  • Overall, about half of the INSPIRE harmonisation projects can be completed within one working day of 8 hours (caveat: we are quite experienced, so someone who knows less about INSPIRE and hale studio will need more time).

What are your experiences? How much time did you spend on transformation project setup?
