One thing we do a lot for our customers is to create INSPIRE data sets from their original data. Usually these data sets are available in a specific national or organisation-specific schema and need to be restructured substantially to meet the INSPIRE requirements. This harmonisation process is one of the things that has given INSPIRE a bad reputation, since it is a complex and time-intensive endeavour.

Recently, we passed the 100-datasets-harmonised mark. As we usually track the effort needed for each of these projects, we now have a meaningful sample size to judge how much time the development of each of these transformation projects took – time to look at some numbers!

The data that we collect for every project includes the source schema, the target schema, the time spent and a few additional variables, such as schema complexity. In this post, we’re going to look at the mean time spent per target data model, at the correlation between source model complexity and effort, and at some simple counts.
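
As a minimal sketch of the kind of aggregation this boils down to – assuming pandas, with hypothetical column names and values rather than our actual tracking schema:

```python
import pandas as pd

# Hypothetical tracking records; the real sample has 68 of them.
records = pd.DataFrame([
    {"source_schema": "ALKIS", "target_schema": "Addresses", "hours": 32, "source_complexity": 8},
    {"source_schema": "Shapefile", "target_schema": "ProtectedSites", "hours": 4, "source_complexity": 1},
    {"source_schema": "ATKIS", "target_schema": "TransportNetworks", "hours": 40, "source_complexity": 9},
])

# Mean effort per target data model, plus the number of datasets behind
# each mean – small counts make for unstable numbers, as discussed below.
summary = records.groupby("target_schema")["hours"].agg(["mean", "count"])
print(summary)
```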

The dataset

Out of all the projects we’ve done, 68 have time-tracking records and are related to INSPIRE – either they use one of the 34 core data specifications, or an extension of one of those.

Data set counts by required time for transformation project development

As the graph shows, almost exactly half of the projects can be completed in 8 hours or less, while only very few projects took more than 64 hours to complete. 64 hours equal about 10 productive person-days when we factor in some overhead.
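
For reference, the bucketing behind such a count is simple; here is a minimal sketch with made-up hour values, assuming pandas (the exact bin edges are inferred from the numbers mentioned above):

```python
import pandas as pd

# Made-up effort values in hours – for illustration only.
hours = pd.Series([2, 4, 6, 8, 12, 16, 30, 40, 70, 90])

# Count projects per effort bucket; the bin edges are inferred from the
# 8-hour and 64-hour marks mentioned in the text.
buckets = pd.cut(
    hours,
    bins=[0, 8, 16, 32, 64, float("inf")],
    labels=["<= 8 h", "8-16 h", "16-32 h", "32-64 h", "> 64 h"],
)
print(buckets.value_counts().sort_index())
```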

After looking at the general effort distribution, we wanted to dig a bit deeper – which INSPIRE Annex themes create a lot of effort for us?

Efforts to create a transformation project by target schema

The range the graph shows is pretty wide. While Addresses, Transport Networks and Hydrography Networks are all in the 30+ hour range, most of the other themes show mean times of 5 to 20 hours of required effort. As the orange line in the graph indicates, the number of datasets we’ve included for a given target data model is in many cases very small (1-3), so these numbers are certainly not stable.

Maybe we need to look at the dataset from a different angle. As we often work on a fixed-price basis, we want to make sure the estimates we give are reliable, so it is important for us to know what drives effort up. Thus, the next thing we look at is source data model complexity. We measure complexity using an admittedly arbitrary set of measures that tests for the existence of certain model features (such as foreign key relationships and inheritance) as well as model size, yielding a number between 1 (e.g. a single shapefile) and 10 (a massive model that uses every modelling feature you can imagine).
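
Purely as an illustration, such a scoring function could look like the sketch below – the individual checks and weights are hypothetical, not our actual scoring rules:

```python
# A hypothetical complexity score in the spirit described above.
def complexity_score(model: dict) -> int:
    score = 1  # a single flat shapefile starts at 1
    if model.get("has_foreign_keys"):
        score += 2
    if model.get("has_inheritance"):
        score += 2
    if model.get("has_complex_attributes"):
        score += 1
    # Model size: more types push the score up.
    n_types = model.get("type_count", 1)
    if n_types > 10:
        score += 2
    elif n_types > 3:
        score += 1
    # Further modelling features (e.g. unions, voidable attributes).
    score += min(2, len(model.get("extra_features", [])))
    return min(score, 10)

print(complexity_score({"type_count": 1}))  # single shapefile -> 1
print(complexity_score({
    "has_foreign_keys": True, "has_inheritance": True,
    "has_complex_attributes": True, "type_count": 40,
    "extra_features": ["unions", "voidable attributes"],
}))  # massive model -> 10
```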

Effort required for transformation project development by Source Model Complexity

This graph shows an interesting – and not really unexpected – relationship. On the x-axis, we see the source model complexity; on the y-axis, the time spent on the projects. Each project is indicated by a blue dot, the trendline by an orange dotted line. The relationship is pretty clear: the more complex the source model, the higher the mean effort. The trendline is almost linear and grows from about 3 to 28 hours over the complexity range from 1 to 10 – a factor close to 10.
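
To make the trendline arithmetic concrete, here is a minimal sketch using numpy; only the endpoints of roughly 3 and 28 hours come from the graph, the points in between are made up:

```python
import numpy as np

# Illustrative effort values, growing roughly linearly from about
# 3 hours at complexity 1 to about 28 hours at complexity 10.
complexity = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
hours = np.array([3, 5, 9, 11, 15, 16, 20, 23, 25, 28])

# Linear least-squares fit: effort = intercept + slope * complexity.
slope, intercept = np.polyfit(complexity, hours, 1)
print(f"effort = {intercept:.1f} + {slope:.1f} * complexity")
# The slope is close to (28 - 3) / 9, i.e. roughly 2.8 hours per
# complexity point.
```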

Our conclusions?

  • Source model complexity is so far the best indicator for expected effort in a project;
  • Effort varies a lot across different INSPIRE themes;
  • Overall, more than half of the INSPIRE harmonisation projects can be completed in less than a day (caveat: we are quite experienced, so a person knowing less about INSPIRE and hale studio will need more time).

What are your experiences? How much time did you spend on transformation project setup?

Michael Lutz and Athina Trakas decided to spice up the OGC Europe forum slot at the Delft Technical Meeting by asking for position papers around the question “What if we would start implementation of INSPIRE again today?”

Participants of the workshop engage in discussions, Photo by Michael Lutz

More than 40 participants joined the workshop and, first of all, watched eight three-minute presentations with a wide range of suggestions:

  • Satish Sankaran (Esri Inc.) asked “What are the right metrics to measure success?”, and suggested that adoption rates could be improved if people could contribute to the infrastructure without requiring full compliance.
  • Paul van Genuchten (GeoCAT) highlighted the potential of INSPIRE linked data as explored in the Geonovum testbed Geo4Web.
  • Thijs Brentjes (Geonovum) suggested a set of data specifications with a simple base and no mandatory extensions, built on top of the existing (INSPIRE) SDI. He also suggested using additional encodings with the objective of making INSPIRE usable by web developers.
  • Sylvain Grellet (BRGM) also suggested that adoption would be easier with alternative encodings (SF, JSON, …). He was the first presenter to suggest different levels or labels for compliance. He also argued that joint funding of development should be organised from the start, instead of leaving everything to the implementers, and that aspects such as trainings and hackathons should be better organised.
  • Clemens Portele (Interactive Instruments) explained how important stability and reliability are for a major infrastructure project like INSPIRE. He suggested improving specifications through small, iterative changes, made in an agile, fast, usage-driven way. He briefly outlined the work done by Geonovum and Interactive Instruments on making data accessible to web applications and suggested putting facades or proxies on top of the existing INSPIRE infrastructure.
  • Thorsten Reitz (wetransform) focused on the user experience of applications built directly to manage and explore INSPIRE data and services. He explained that, with most of the investment going into backend infrastructure, these applications leave a lot to be desired, even though they would be essential to show the value of the infrastructure.
  • A representative of Natural Resources Canada explained the objectives of the Maps for HTML standards working group and argued that adding capabilities such as MapML would foster usage and adoption.
  • Peter Baumann (rasdaman GmbH) asked “What if our services could talk?”, but wasn’t able to join in person.

Three recurring topics emerged:

  1. tiered or more flexible compliance,
  2. usage of web standards and improvements to data usability and
  3. adding proxy layers on top of the INSPIRE infrastructure.

Participants of the workshop engage in discussions, Photo by Michael Lutz

Following this agenda setting, we split up into four groups to discuss several key questions, using the World Café methodology:

  • What standards and technologies should the infrastructure be based on?
  • What architectural pattern would you recommend? What should be the main components of the infrastructure?
  • How would you organise the implementation process and make it cost-efficient?
  • How would you ensure a wide adoption and use of the infrastructure?

Athina and Michael asked me to facilitate the group discussion around the third question - “How would you organise the implementation process and make it cost-efficient?” Our objective was to define 2-3 recommendations and to suggest follow-up actions. As facilitator, I asked the following questions, and a lively discussion ensued:

What is the very first thing that should happen in the implementation process?

  • Define criteria for success early on (not “only” for compliance)
  • Define interoperability (when is data interoperable? -> when it is usable in clients); “general” interoperability isn’t the goal, usage is!
  • Provide end product specifications for high value use cases to large numbers of users
  • Define success for concrete users in addition to the overall objective

If you had one year to implement INSPIRE from scratch, how would you do it?

  • Start with simple data models, manage complexity (necessary vs. unnecessary, e.g. in ISO and metadata), expand over time with new use cases
  • Follow pragmatic approaches
  • Use a well-defined set of interfaces that are already proven (mainstream IT industry)
  • Follow the industry trend towards RESTful mechanisms
  • Focus on pushing data out (e.g. as done in Copernicus)

What is the process to continuously coordinate implementation and make re-use possible?

  • Use agile methods such as iterative development
  • Keep users closely involved
  • Identify anything
  • Keep existing infrastructure, build agile infrastructure on top, then let the market choose what works
  • Transparency, make it available
  • The metric could be: how many users do you have? Users drive continual improvement.

How can implementation of key components be coordinated in an efficient way?

  • Coordinate development of core components early on, in particular validation and registries (e.g. code lists, extensions)
  • Common components should be coordinated; harmonisation may be one such issue
  • No mandatory components, but all components for a reference implementation should be available
  • Make sure that requirements that are specific to INSPIRE are really understood well - Value vs. effort/costs on very specific INSPIRE requirements?
  • Clarify the business case for the implementation coordinator - is that organisation paid for by taxes?
  • Countries see lots of liability and low central investment
  • Central funding is low compared to the overall investment required for implementation, and compared to Copernicus
  • Harmonisation across INSPIRE and Copernicus currently looks like a Godzilla vs. King Kong fight and will be difficult to achieve effectively

Participants of the workshop engage in discussions, Photo by Michael Lutz

We then consolidated recommendations:

  1. We should treat INSPIRE not as a separate infrastructure, but rather as integrated with existing products and processes, e.g. by extending national models to meet additional INSPIRE requirements
  2. To be cost-effective, INSPIRE should not be something specific, but a general infrastructure and a natural part of what we are doing
  3. We would reframe INSPIRE in the context of the Open Data movement to limit competition between INSPIRE and Open Data / Linked Open Data
  4. We should also make sure products are designed from the user experience first
  5. We have to orient implementation guidance towards implementers’ questions and problems (“How do I provide a bridge in INSPIRE?” is a question a professional currently can’t get answered)
  6. Have a library of reference implementations that describes how it’s done for all annexes

… and Follow-Up actions:

  1. Collect reference implementations as concrete guides and publish those
  2. Provide compliance levels (?) as a means to get in easily
  3. Find out how to react to new use cases in an agile way

A side discussion that came up in our group, as well as in at least two of the other three groups, was what it really means to be compliant – but I’ll leave that for another post ☺.

All in all, I really enjoyed the highly interactive format of the What if…? workshop and the productive discussions, which did not just rehash previously discussed issues. Thanks to Athina and Michael!

hale connect enables highly simplified and largely automated workflows for analysing, transforming and publishing datasets. The configuration for these automated workflows is called a theme. Themes are reusable configuration objects that let you define transformation projects, metadata generation rules and automation triggers, among many other aspects. Watch this 5-minute video to learn how to create themes and schemas (data models) in hale connect:

If you’d like to try this out yourself, go over to hale connect to sign up for a demo account or reach out to us to learn more.

For a long time, the data harmonisation panel and specifically the Redmine installation at www.esdi-community.org was the home of hale studio. More than 1,400 people registered there over the course of six years and posted questions on INSPIRE and other standards. Since, among other things, no one maintained the existing infrastructure anymore, we have moved most resources away from Redmine over the course of the last year:

  1. The hale studio source code and issue tracking is now hosted at GitHub
  2. The end user and developer documentation has moved to halestudio.org
  3. Since last week, the hale support board has moved to discuss.wetransform.to

The old website will now be switched off. So, what do we have in store for you on the new site?

The new forum for hale at wetransform.to

The new forum looks pretty much like other discussion sites. It offers threaded discussions on a single board, where you can post anything related to hale studio, hale connect, INSPIRE and data harmonisation in general. What’s special about it is that it is fully integrated with hale connect, our cloud platform for collaborative data modelling and transformation. When you register at the forum, you automatically get a free hale connect account as well.

Step by step, we will integrate functions that are currently only available inside hale connect, such as discussions on individual cells of hale studio projects, with the public forum. We look forward to seeing more focused and effective teamwork on issues such as challenging transformation projects!

Happy transforming!

hale studio is, first and foremost, a schema-driven data transformation software. Many users have also found it very useful for loading, exploring and analysing their data models, be it large database models or massive XML schemas. This short video explains the basics of loading and navigating schemas:

If you’d like to try this out yourself, go over to the hale studio downloads page or reach out to us to learn more.
