If you have been creating INSPIRE GML, you have almost certainly encountered so-called codelists. They are an important part of INSPIRE data specifications and contribute substantially to interoperability. They are, however, not as straightforward as a simple enumeration. This post explains what codelists are, how you use them, and why they are important.

In general, a codelist contains several terms whose definitions are universally agreed upon and understood. Codelists support data interoperability and form a shared vocabulary for a community. They can even be multilingual.

Managing Codelists and Codelist Registries

INSPIRE codelists are commonly managed and maintained in codelist registers which provide search capabilities, so that both end users and client applications can easily access codelist values for reference. Registers provide unique and persistent identifiers for the published codelist values and ensure consistent versioning. There are many different INSPIRE registers which manage the identifiers of different resources commonly used in INSPIRE.

Codelists used in INSPIRE are maintained in the INSPIRE code list registry, in the codelist registry of a member state, or by an acknowledged external third party that maintains a domain-specific codelist.

To add a new codelist, you will have to either set up your own registry or work with the administration of one of the existing registries to get your codelist published. This can be quite an involved process, which is designed to prevent the uncontrolled growth of codelists.

Extending Codelists

One special feature of codelists in INSPIRE is that they may be extensible. If a codelist is extensible, it contains only a small set of common terms, but you can add your own terms. With respect to extensibility, we distinguish four types of codelists in INSPIRE:

  • None (Not extensible): A codelist that is not extensible includes only the values specified in the INSPIRE Implementing Rules (IR).
  • Narrower (Narrower extensible): A codelist that is narrower extensible includes the values specified in the IR and narrower values defined by the data providers.
  • Open (Freely extensible): A freely extensible codelist includes the values specified in the IR and additional values defined by data providers.
  • Empty (Any values allowed): An empty codelist can contain any values defined by the data providers.

You can recognize which type a codelist is either by looking at the UML model, where the type appears as a tagged value (“extensibility”), or by looking at its definition in the respective registry. For example, the Anthropogenic Geomorphologic Feature codelist is shown below.

Codelists have maintenance processes that enable their values to be updated. Even codelists of the type “Not extensible” can receive new values, which are then included in the next published version. Codelists of the type “Freely extensible” can include extended codelist values, but only if those values are managed in a register. Codelists of the type “Empty” often pose a challenge to users, as there is not always a readily applicable codelist available. In some cases, an empty codelist suggests the use of a standard external codelist commonly used in the domain.

Codelist Encoding

The conceptual schema language rules in the INSPIRE Generic Conceptual Model contain guidance on how to include codelists in INSPIRE GML application schemas, some of which you may recognize:

  • Code lists should use the stereotype codeList.
  • The name of the codelist or enumeration should include the suffix Value.
  • The documentation field of the codeList classes in the UML application schemas shall include the -- Name --, -- Definition --, and -- Description -- information.
  • The natural language name of the code list (given in the -- Name -- section) should not include the term Value.
  • The type of code list shall be specified using the tagged value extensibility on the codeList class.
  • For each code list, a tagged value called vocabulary shall be specified. The value of the tagged value shall be a persistent URI identifying the values of the code list.
  • A code list may also be used as a super-class for a number of specific codelists whose values may be used to specify the attribute value.
  • Values of INSPIRE-governed code lists and enumerations shall be in lowerCamelCase notation.

In UML, the usage of an extended code list is indicated by substituting the existing code list. The extended codelist is represented by a sub-type of the original codelist.

Codelist values are encoded in GML application schemas using gml:ReferenceType, which means that there is no formal link between the new subtype in the GML application schema and the extended codelist. The codelist itself must be published in a register and the register should be published in the INSPIRE register federation, however the application schema does not need to be adapted to use the extended or profiled codelist.

Using INSPIRE codelists in hale»studio

Both INSPIRE GML and INSPIRE metadata – which describes harmonized datasets and network services – include references to codelists in the form of XLinks. XLink is a World Wide Web Consortium recommendation for defining references within and across XML documents. Simple XLinks are the standard method for object references in GML. Properties encoded using XLink carry a URI to the remote object, or an internal document reference, in the xlink:href attribute.

It is standard practice to refer to items in the INSPIRE registry using HTTP URIs.
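
As an illustration, here is a minimal sketch of what such a reference typically looks like in INSPIRE GML. The property and codelist value used here (ef:mediaMonitored pointing to the MediaValue codelist) are only an example, and the namespace prefixes (ef, xlink) are assumed to be bound as usual in INSPIRE schemas; which of your properties take codelist values is defined by the respective application schema.

<!-- illustrative fragment: a codelist-valued property encoded as gml:ReferenceType -->
<ef:mediaMonitored
    xlink:href="http://inspire.ec.europa.eu/codelist/MediaValue/water"
    xlink:title="water"/>

The xlink:title attribute is optional, but it makes the encoded value easier to read for humans; the normative information is the value URI in xlink:href.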

If you are using hale»studio to create your harmonization project, you can load INSPIRE codelists directly from the INSPIRE registry for use in your project. The INSPIRE codelists are then referenced via HTTP URIs in the exported GML data.

To import an INSPIRE codelist into your hale»studio project, select “File” » “Import” » “Codelist”.

Next, select “From INSPIRE registry”. A list of all INSPIRE codelists will appear and you can either filter by name or search by INSPIRE theme. The selected codelist will be added to your project.

If all the target instances in your dataset will use the same codelist value, select the href attribute in the target property and apply the Assign function. In the Assign function dialog, select the icon with the yellow arrows to assign a codelist value from the codelist you loaded into your project.

Next steps

Codelists are a fundamental building block of any INSPIRE implementation: they promote data interoperability through the effective reuse of stable and persistent identifiers for universally defined concepts. INSPIRE harmonization projects are often stalled by empty codelists and missing values. wetransform has supported numerous customers with the UML encoding of custom codelist extensions, and with the development and maintenance of codelist registries. If you are interested in moving ahead with your project and overcoming these obstacles, please get in touch with our support team at support@wetransform.to.

If you’re interested in learning more about such topics, feel free to check out our post on INSPIRE IDs or our news page!

2021 has been an eventful year, and we’ve had many exciting developments including further support for the increasingly popular GeoPackage format, more CSW capabilities, and a host of other improvements.

Here’s what’s new:

Information for Users

New Features

  • Users uploading data to the hale»connect platform via URL can now add Authorization headers to HTTP(s) requests to provide the required authentication, as shown below.
  • Organisations that have their own CSW configured can now edit values in the CSW capabilities documents through use of variables on the organization profile page.
    Note: Activation of the CSW_INSPIRE_METADATA_CONFIG feature toggle is required.
  • Organisations can now be filtered via the text filter of the dataset resource list.
  • hale»connect now supports GeoPackage as source data input to online transformations via URL.
  • Service publishing is now enabled with custom SLDs that include multiple feature types in one layer.

Changes

  • Usage statistics graphs now use the same color for the same user agent across multiple graphs, as shown below.
  • Metadata input fields that allow you to select predefined values as well as enter free text now save the free text when you exit the field (and not only after pressing Enter).
  • In the WMS service settings, there is now the option of restricting rendering in view services to the bounding box in the metadata. Whether the option is activated by default depends on the system configuration. If the option is activated, data may not be displayed in a view service if the bounding box in the metadata is incorrect or the axes are swapped.
    Note: Currently, this setting for data set series cannot be adjusted via the user interface.
  • Data fields that are interpreted as a number in the file analysis are no longer treated as a floating point number if values are all integers.
  • To improve overall performance, the system-wide display level configuration for raster layers is now checked earlier.
  • Terms of use / useConstraints in the metadata: Descriptions for given code list values can now be adapted.
  • Use restrictions / useLimitations in the metadata: GDI-DE-specific rules only apply if the country specified in the metadata is Germany. If no country is specified and the dataset is an INSPIRE dataset (“INSPIRE” category), the GDI-DE default value is no longer set, so that INSPIRE Monitoring correctly recognizes the metadata as conforming to TG 2.0.
  • Outdated GDI-DE test suite tests were removed from hale»connect.

Fixes

  • Some unnecessary error messages that occur if a user does not have sufficient privileges to access certain information have been removed.
  • An error that caused status messages to be displayed incorrectly on the dataset overview page was fixed.
  • An error was fixed that caused services to remain password-protected after the password was removed.
  • File names of uploaded shape files can now start with a number.
  • Fixed a bug with downloading files that need to be converted before being added to a dataset.
  • The automatic process that fills metadata now waits for any outstanding attribute-coverage calculations so that it can access their results.
  • Services of datasets in a dataset series no longer count toward the capacity points (only the services of the series itself do).
  • An error that deleted uploaded files not associated with a dataset (e.g. uploaded logos for organizations) was fixed.
  • Several “priority dataset” keywords are now correctly represented in the metadata when published.
  • In hybrid mode, no geometries were saved if no geometry was referenced in the SLD. This has been fixed; the system now always tries to identify a default geometry in the feature type.
  • The Mapproxy cache for raster layers in a series no longer resets every time the series is changed. This now only happens when changes are made to the relevant individual data set.
  • An error has been corrected which caused the WFS to deliver invalid XML with missing namespace definitions. This also affected corresponding GetFeatureInfo queries.
  • GetFeatureInfo requests now return complete XML when the INFO_FORMAT parameter is of type text/xml.
  • GetFeatureInfo requests now return results for raster/vector datasets.
  • An issue was fixed that was related to schema location when the same schema definition file is referenced directly in a combined schema and imported by a schema contained in the same combined schema file.
  • The AuthorityUrl.name element can now only contain valid values for the data type NMTOKEN.
  • Added redirection handling for INSPIRE schemas.
  • A fix to prevent global capacity points updates from running during the day was implemented.
  • A fix was implemented to use string representations of number values as autofill results, when available.

Information for Systems Administrators

Mapproxy: Adjustments to the Docker Image

Until now, Mapproxy could become a bottleneck when processing WMS requests, as many parallel requests were handled poorly in the previous configuration. The runtime environment in which Mapproxy runs inside the Docker container has been adapted, as has the procedure for deleting caches. As a result, Mapproxy no longer runs as the root user within the container. Caches created so far, however, are still owned by the root user, so to ensure access to them, the permissions must be adjusted so that the container’s mapproxy user has read and write rights. This can be done, for example, via a shell in the new mapproxy container:

chown -R mapproxy:mapproxy /mapproxy/cache/

Note: As an alternative, there is also the possibility to keep Mapproxy running as root, but this should only be used as an interim solution - if you are interested, we can provide the appropriate configuration option.

Mapproxy: Extended configuration options

Mapproxy acts as a buffer in the system that intercepts GetMap requests to view services and, if possible, serves them from the built-up cache. It thereby determines which requests are processed by deegree. The behavior of Mapproxy can now be adjusted in several aspects. The configuration options are currently only available at the system configuration level, with the exception of the setting that restricts rendering to the bounding box in the metadata.

Important: Changes to the configuration are not automatically applied to existing publications. The new actions on the debug page of the service-publisher should be used for this purpose:

  • To update the mapproxy configuration only:
    1. “Update mapproxy configuration for all publications” for all existing publications
    2. “update-mapproxy” for a single publication
  • To update the mapproxy configuration and to reset the cache (e.g. when changing the cache backend):
    1. “Update mapproxy configuration and clear cache for all publications” for all existing publications
    2. “reset-mapproxy” for a single publication

The new configuration options are described below. More information on the individual options can also be found in the Mapproxy documentation.

Reduced re-start times of unresponsive WMS/WFS services

With many publications, the initialization of the OWS services can take a long time. If the feature toggle to divide the configuration workspace into sub-workspaces per organization is used, the configurations are initialized in parallel. This significantly accelerates the start of WMS/WFS services after a failure.

Before / after examples from our systems:

  • Before: approx. 5 minutes - after: approx. 90 seconds (10k+ services)
  • Before: between 30 and 50 minutes - after: between 5 and 8 minutes (60k+ services)

If you are not yet using sub-workspaces in your deployment and are interested in them, please contact us. Start-up time only improves significantly if the publications in the system are well distributed among different organizations.

Cache backend

By default, Mapproxy saves the individual cached tiles as individual files in a specific directory structure. This can quickly lead to millions of files being used for a cache. This, in turn, can be a problem if the file system’s limitations on the maximum number of files (or inodes) are reached. Once the limit has been reached and no more files can be created, it is particularly critical if data other than the caches are on the same file system. It is now possible to adapt the backend used for the caches. The options are as follows:

  • file - the default setting; tiles are stored as individual files
  • sqlite - each zoom level is stored in an SQLite file
  • geopackage - each zoom level is stored in a GeoPackage file

Recommendation: We recommend using the sqlite backend, which we are already using in production. You should check whether the number of files in the file system could become a problem (e.g. with df -i). Currently, we do not support any mechanism to migrate caches between different backends, so the old cache should be deleted when updating the configuration for existing publications. In principle, however, Mapproxy provides a tool with which a migration can be carried out.

Restriction of the cache to certain zoom levels

In hale»connect, Mapproxy uses a uniform tile grid based on EPSG:3857 for all publications:

GLOBAL_WEBMERCATOR:
Configuration:
    bbox*: [-20037508.342789244, -20037508.342789244, 20037508.342789244, 20037508.342789244]
    origin: 'nw'
    srs: 'EPSG:3857'
    tile_size*: [256, 256]
Levels: Resolutions, # x * y = total tiles
    00:  156543.03392804097,  #      1 * 1      =          1
    01:  78271.51696402048,   #      2 * 2      =          4
    02:  39135.75848201024,   #      4 * 4      =         16
    03:  19567.87924100512,   #      8 * 8      =         64
    04:  9783.93962050256,    #     16 * 16     =        256
    05:  4891.96981025128,    #     32 * 32     =       1024
    06:  2445.98490512564,    #     64 * 64     =       4096
    07:  1222.99245256282,    #    128 * 128    =      16384
    08:  611.49622628141,     #    256 * 256    =      65536
    09:  305.748113140705,    #    512 * 512    =     262144
    10:  152.8740565703525,   #   1024 * 1024   =      1.05M
    11:  76.43702828517625,   #   2048 * 2048   =      4.19M
    12:  38.21851414258813,   #   4096 * 4096   =     16.78M
    13:  19.109257071294063,  #   8192 * 8192   =     67.11M
    14:  9.554628535647032,   #  16384 * 16384  =    268.44M
    15:  4.777314267823516,   #  32768 * 32768  =   1073.74M
    16:  2.388657133911758,   #  65536 * 65536  =   4294.97M
    17:  1.194328566955879,   # 131072 * 131072 =  17179.87M
    18:  0.5971642834779395,  # 262144 * 262144 =  68719.48M
    19:  0.29858214173896974, # 524288 * 524288 = 274877.91M

Mapproxy can now be configured not to cache tiles from a certain zoom level onward, but to always forward such requests directly to deegree:

service_publisher:
    map_proxy:
        # Don't cache but use direct access beginning with the given level
        # (negative value to disable)
        # For example: a value of 18 means levels 0-17 are cached but levels >= 18 are not
        use_direct_from_level: -1

Restricting queries and cache to the bounding box of the metadata

Since the data of a view service rarely covers the whole world, it makes sense to spatially limit the cache and the requests to deegree. It is now possible to do this using the bounding box from the metadata. When activated, requests for areas outside the bounding box automatically return an empty image, without a request being made to deegree and without the cache having to be expanded to include that area. In addition to activating the restriction, a buffer can also be configured around the bounding box to avoid content being cut off (which can happen, for example, with raster data):

map_proxy:
    # limit mapproxy cache and source requests to the metadata bounding box
    # otherwise the cache may encompass the whole world-wide grid (see above)
    coverage:
        enabled: true
        # buffer for the WGS 84 bounding box (e.g. to compensate for rasters that
        # exceed the vector bounding box); 0.01 ~ 1 km
        buffer: 0.01

Monitoring: Alerts on file systems

The existing alerts on file systems, which should provide information when a file system is almost full or no more handles are available, have unfortunately not been fully functional due to a change in the metric names. These alerts have been revised and extended to also determine when a file system is approaching its maximum number of files (inodes). The default limit is 10% of available space / files, but it can be adjusted:

alerts:
    filesystem:
        # default limit in percent of available space / inodes, must be an integer value
        available_limit: 10

Wind turbines, helipads, industrial plants, and campsites - wondering what’s common among these?

According to §21 of Germany’s Air Traffic Regulations and the EU Drone Regulation of 2021, they are recognized as drone flight prohibition zones. However, not all such drone no-fly zones have been formally identified and communicated to the public.

In 2018, drones caused 158 disruptions to air traffic in Germany - almost double the number from the previous year. Drone flight clearly comes with safety and security risks, and not all no-fly zones are formally defined. The expected growth in drone usage (between 2020 and 2025, commercial drone usage is expected to increase by 200%) will worsen the situation and lead to more such incidents - unless no-fly zones are identified, monitored, and communicated actively.

Moreover, precise and safe drone navigation will expand possible drone mobility uses and add value to society. For example, drones will reduce the effort and cost of supply lines and operational execution. They could be used to transport vaccines or medical equipment such as oxygen canisters more effectively. Robotic camera drones could be used for inspections of high-voltage power lines or gas line maintenance. The applications of safe drone mobility are vast and diverse.

Eliminating the current drone navigation problems and making the most of drone transport is no small feat. Germany has 357,386 km² of terrain that needs to be analysed, classified, and visualized as fly or no-fly zones. Some of these terrain features have not yet been mapped. These problems are part of a sphere of innovation that needs further strategic development.

Deutsche Flugsicherung (DFS), Fraunhofer IGD, and wetransform have joined forces to tackle these problems through the fAIRport (Flight Area Artificial Intelligence Retrieval Portal) project. The three-year (2020–2023), 1.2 million EUR project is supported by the Federal Ministry of Transport and Digital Infrastructure (BMVI) as part of the mFUND (Modernity Fund) research initiative.

The main aim of fAIRport is to provide a comprehensive high-precision geodatabase for no-fly zones in Germany by merging existing datasets and creating new information through Fraunhofer IGD’s AI-based methods for orthoimage object detection.

wetransform is developing the centrepiece of the project – the fAIRport municipal portal – and creating the data flows needed to establish it. The portal will collate static data, dynamic data, local information, and real-time information. Fraunhofer IGD is creating the 2D and 3D visualisations of no-fly zones that will make the portal more intuitive and easier to use. The portal will allow local authorities to access, upload, maintain, and manage no-fly zones within their jurisdictions.

The project is in its initial stages, and we recently hosted the second user workshop on the functionalities of the portal for the City of Langen, an important project stakeholder. The goal of the workshop was to gain a deeper understanding of the required workflows and the portal requirements. The first draft of the portal based on user requirements is shown below:

This draft received positive feedback, but there’s still a long way to go. Nationwide data must be collected and classified as fly or no-fly zones, and then this data needs to be made accessible at scale and visualized effectively.

After project completion in early 2023, there will be clear rules about where drones can fly in all of Germany, and all no-fly zones will be visualized and maintained with ease. Moreover, community members will be able to add zones themselves, so spontaneous activities that lead to temporary no-fly zones can also easily be added to the fAIRport portal. These initiatives will open more flight corridors, as only dedicated areas will be restricted. The improved definition of zones will also let local authorities across Germany effectively manage no-fly zones in their jurisdictions.

We’re excited to see how this project will cause disruption across industries. If you’re interested in other similar initiatives, you can also read our article about how wetransform is harnessing the power of open data to save Germany’s forests.

Interested in staying updated about the latest happenings in the world of data interoperability? Sign up for our newsletter here.

Based on our work with 1000+ public organizations across the EU, we have found that the Environmental Monitoring Facilities theme poses challenges for many organizations. Through this blog post, you will learn more about:

  • What is the INSPIRE Environmental Monitoring Facilities theme and the related Observations and Measurements specification, and what are some of its use cases?
  • Why is the theme challenging to implement?
  • How can you deal with this complexity?

INSPIRE Environmental Monitoring Facilities (EMF) and Observations and Measurements (OM)

The Environmental Monitoring Facilities data specification is a generic model that has been developed for environmental monitoring of any kind, across any domain. The scope of EMF is defined in the INSPIRE directive as “Location and operation of environmental monitoring facilities includes observation and measurement of emissions, of the state of environmental media and of other ecosystem parameters (biodiversity, ecological conditions of vegetation, etc.) by or on behalf of public authorities.”

This scope definition includes two aspects:

  1. The environmental monitoring facility as a spatial object in the context of INSPIRE (location)
  2. Data obtained through observations and measurements linked to the environmental monitoring facility (operation).

The EMF data specification re-uses the OGC Observations and Measurements specification for the actual observations. Again, this is a generic data model that can be used to encode data from a very wide range of sensors. Observed properties recorded at environmental monitoring facilities include particulate matter concentrations in ppm, air temperature in degrees Celsius, precipitation in millimeters or water pressure in PSI.

Due to this open scope, data encoded according to the EMF specification has many use cases, such as monitoring water or air quality, noise monitoring, or detection of specific emissions. Such data is often mission-critical for businesses and organizations. For example, due to the Icelandic volcanic eruptions of 2014, air traffic routes over Iceland were severely disrupted by significant amounts of ash. These routes were used by many airline carriers that had heterogeneous data on air quality and weather patterns. As a result of this heterogeneity, coordination of navigation through the Icelandic airspace became increasingly complex and resulted in large costs for the airline carriers.

Why is this theme challenging to implement?

The EMF and OM specifications form a powerful, generic model covering a wide range of different measurements and use cases. This results in a very flexible model, i.e. one that doesn’t prescribe just one way to encode things. Different implementers may thus arrive at different ways to structure their EMF data, even for the same type of measurement. As an example, you can decide to reference observation data from the EnvironmentalMonitoringFacility either inLine or byReference, using the property hasObservation.

Both the option to nest OM_Observations in the EnvironmentalMonitoringFacility.hasObservation property and the option to create top-level OM_Observation features and reference them via href in EnvironmentalMonitoringFacility.hasObservation are equally valid in INSPIRE. If users want to primarily request OM_Observation features from the WFS (independent of EnvironmentalMonitoringFacility features), modelling them as top-level (byReference) is better. If users primarily want to request EnvironmentalMonitoringFacility features with all details (including the OM_Observation under the hasObservation property), then nesting (inLine) would be better. The decision to provide features inLine or byReference should be made before your project begins.

Example: modelling features inLine
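
The following sketch illustrates the inLine option: the OM_Observation is nested directly inside the ef:hasObservation property of the EnvironmentalMonitoringFacility. Namespaces, gml:ids, and most mandatory properties are omitted or purely illustrative here; this is not a complete, valid instance.

<ef:EnvironmentalMonitoringFacility gml:id="emf_1">
    <!-- ... other EnvironmentalMonitoringFacility properties ... -->
    <ef:hasObservation>
        <!-- inLine: the observation is nested inside the facility -->
        <om:OM_Observation gml:id="obs_1">
            <!-- ... phenomenonTime, procedure, observedProperty, featureOfInterest ... -->
            <om:result><!-- see the discussion of the result type below --></om:result>
        </om:OM_Observation>
    </ef:hasObservation>
</ef:EnvironmentalMonitoringFacility>

With the byReference option, the OM_Observation would instead be encoded as a top-level feature, and ef:hasObservation would carry only an xlink:href pointing to it.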

Another decision you will need to make is what structure the actual measurements in OM_Observation should have. By default, the OM_Observation.result property is of type anyType in the Environmental Monitoring Facilities INSPIRE schema.

In the XML Schema Definition Language (XSD), the data type anyType is at the root of the type definition hierarchy. An element of type anyType defines no constraints regarding the type of data it can contain. This does not mean that the contained data should not or cannot have a type defined in a schema. In INSPIRE, the type anyType does not always mean that we can add whatever data type we want. In some INSPIRE schemas, there are constraints documented in the INSPIRE UML models which describe the types that are acceptable. In the Environmental Monitoring Facilities schema, the constraint documented in the UML states: “{result type shall be suitable for observedProperty}”. As the property is mandatory in the INSPIRE data model, implementers need to provide a value in the result property. To be able to populate this property, a type definition must be added to the schema.

The data types Quantity and Range are common choices. For the following example, we selected the data type Quantity, found in this schema. Using Quantity, we can add the measurement value, the unit of measurement, and the allowed value range of the measurements.
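
As a sketch of what this can look like, the fragment below encodes the result as a SWE Common Quantity with a unit of measurement and a value. The observed property URI, uom code, and numeric value are purely illustrative, and the exact structure depends on the result type you add to your (profiled) schema.

<om:result>
    <!-- illustrative observed property URI -->
    <swe:Quantity definition="http://example.org/observedProperty/airTemperature">
        <!-- UCUM code for degrees Celsius; illustrative -->
        <swe:uom code="Cel"/>
        <swe:value>21.3</swe:value>
    </swe:Quantity>
</om:result>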

To further clarify this topic and to train INSPIRE implementers on how to transform data to the INSPIRE Environmental Monitoring Facilities theme, we will host an online training from 25 to 27 May 2021. In this training, you will:

  • Learn how to analyse your source data and determine the structure of your EMF dataset(s)
  • Learn how to combine and edit schemas for use in hale»studio
  • Interactively create an INSPIRE compliant EMF dataset through detailed, guided mapping instruction

You can learn more about this training, costs, and the registration process here.

From April 2021 to June 2021, we will provide tutorials, trainings, and 1:1 sessions on identifying and fixing INSPIRE compliance gaps. Here’s the schedule:

Webinars and Sessions

Description Date Price Registration
INSPIRE Solutions for Municipal Service Companies April 20 free Closed
GeoPackage: An alternative encoding for INSPIRE May 18 free Register here
Kommunale Lösungen für INSPIRE und XPlanung June 1 free Register here
INSPIRE Monitoring 2021: Identify Compliance Gaps April 1-30 free E-mail us
INSPIRE Monitoring 2021: Fix Compliance Gaps May 2-31 free E-mail us

INSPIRE Online Trainings

Format Description Audience Date Price
15 Std. Datentransformation nach INSPIRE mit hale»studio Beginners June 14-18 800€
15 hrs Transforming Data to INSPIRE with hale»studio Beginners July 6–10 800€
8 hrs Transformation for Environmental Monitoring Facilities Advanced May 25-27 400€
8 hrs Transformation for Geology and Mineral Resources Advanced June 21-23 400€
8 hrs Mastering complex INSPIRE transformations with Scripts Advanced June 8–10 400€

XPlanung Online Trainings

Format Description Audience Date Price
1 Std. Einführung in XPlanung und XPlanGML All April 7 free
15 Std. Datentransformation nach XPlanung mit hale»studio Beginners April 26-30 800€

To sign up for other events, e-mail us at info@wetransform.to with the list of events you want to attend and your name and organization. To learn more about the trainings, visit our workshops webpage.
