
Examples on dataset series

Riccardo Albertoni edited this page Nov 30, 2021 · 25 revisions

1. Examples 2020

ℹ️ Page created following discussion in https://www.w3.org/2020/10/20-dxwg-minutes#t03

Linked issues:

We collect here some examples to help tune DCAT support for dataset series. They are meant to ease the discussion and do not aim to provide official guidance on how dataset series must be handled in DCAT 3.

Let's consider an example of a dataset series, ex:budget, and list some of the points under discussion (hereafter, "aspects"). The group has discussed most of these aspects in issue 868 and related GitHub issues.

Example 1: Time-related dataset series

    # aspect 1: soft typing or a new class, e.g., dcat:DatasetSeries? Should dcat:DatasetSeries be a
    # subclass of dcat:Dataset or of dcat:Resource?

    # @andrea-perego: In the examples, I would stick to what is possible with the current version of DCAT,
    # and raise this issue among the aspects discussed after the examples.

    ex:budget a dcat:DatasetSeries;
      dct:type <http://inspire.ec.europa.eu/metadata-codelist/ResourceType/series> ;
    #aspect 2: how to express containment?
      dct:hasPart ex:budget2018 ;
      dct:hasPart ex:budget2019;
      dct:hasPart ex:budget2020;

    # aspect 3: temporal granularity

    # @andrea-perego: I think temporal resolution should (also) be specified at the child dataset level.
    # Moreover, the dataset series may have multiple statements about temporal resolution (in case child
    # datasets have different temporal resolutions).

	  dcat:temporalResolution "P1Y"^^xsd:duration;
    # aspect 4: temporal coverage 
	  dct:temporal [ a dct:PeriodOfTime ;
	    dcat:startDate "2018-01-01"^^xsd:date ;
	    dcat:endDate   "2020-12-31"^^xsd:date ;
	  ] .
      
    # aspect 5: are the parts of a dataset series datasets or distributions? In the following, they are
    # datasets. The discussion seems to have settled this aspect.

    # @andrea-perego: Following what is stated in the DCAT 2 NOTE accompanying the dcat:Distribution class,
    # the default approach should be to create separate datasets. However, dcat:Distribution can be used
    # instead, depending on the requirements of the application scenario.

    ex:budget2018 a dcat:Dataset;  
	     dct:title "Budget for 2018";
	     dct:temporal [ a dct:PeriodOfTime ;
		    dcat:startDate "2018-01-01"^^xsd:date ;
		    dcat:endDate   "2018-12-31"^^xsd:date ;
		  ] .

    ex:budget2019 a dcat:Dataset;
	     dct:title "Budget for 2019";
	     dct:temporal [ a dct:PeriodOfTime ;
		    dcat:startDate "2019-01-01"^^xsd:date ;
		    dcat:endDate   "2019-12-31"^^xsd:date ;
		  ] .

    ex:budget2020 a dcat:Dataset;
	     dct:title "Budget for 2020";
	     dct:temporal [ a dct:PeriodOfTime ;
		    dcat:startDate "2020-01-01"^^xsd:date ;
		    dcat:endDate   "2020-12-31"^^xsd:date ;
		  ] .

Example 2: Space-related dataset series

This example is a variant of Example 1. It's about 2018 budget data at a country level for all European countries, collected and processed following the same criteria.

This scenario usually leads to the creation of a single dataset, merging all the data. However, there may be cases where it would be desirable (or required) to publish them separately. E.g., if each dataset is produced by a different organisation (e.g., responsible for statistical reporting at the national level), it may be important to know who is the producer of each dataset, for attribution and/or accountability.

    ex:budget-2018 a dcat:Dataset;
      dct:type <http://inspire.ec.europa.eu/metadata-codelist/ResourceType/series> ;
    #aspect 2: how to express containment?
      dct:hasPart ex:budget-fr-2018 ;
      dct:hasPart ex:budget-be-2018 ;
      dct:hasPart ex:budget-it-2018 ;
      ...
    # aspect 3: temporal granularity
	  dcat:temporalResolution "P1Y"^^xsd:duration;
    # aspect 4: temporal coverage 
	  dct:temporal [ a dct:PeriodOfTime ;
	    dcat:startDate "2018-01-01"^^xsd:date ;
	    dcat:endDate   "2018-12-31"^^xsd:date ;
	  ] ;
    # aspect 4.A: spatial coverage 
	  dct:spatial <http://publications.europa.eu/resource/authority/country/FRA> ,
            <http://publications.europa.eu/resource/authority/country/BEL> ,
            <http://publications.europa.eu/resource/authority/country/ITA> ,
            ...
          .
      
    ex:budget-fr-2018 a dcat:Dataset;
	  dct:title "France budget for 2018";
	  dcat:temporalResolution "P1Y"^^xsd:duration;
	  dct:temporal [ a dct:PeriodOfTime ;
	    dcat:startDate "2018-01-01"^^xsd:date ;
	    dcat:endDate   "2018-12-31"^^xsd:date ;
	  ] ;
	  dct:spatial <http://publications.europa.eu/resource/authority/country/FRA> .

    ex:budget-be-2018 a dcat:Dataset;
	  dct:title "Belgium budget for 2018";
	  dcat:temporalResolution "P1Y"^^xsd:duration;
	  dct:temporal [ a dct:PeriodOfTime ;
	    dcat:startDate "2018-01-01"^^xsd:date ;
	    dcat:endDate   "2018-12-31"^^xsd:date ;
	  ] ;
	  dct:spatial <http://publications.europa.eu/resource/authority/country/BEL> .

    ex:budget-it-2018 a dcat:Dataset;
	  dct:title "Italy budget for 2018";
	  dcat:temporalResolution "P1Y"^^xsd:duration;
	  dct:temporal [ a dct:PeriodOfTime ;
	    dcat:startDate "2018-01-01"^^xsd:date ;
	    dcat:endDate   "2018-12-31"^^xsd:date ;
	  ] ;
	  dct:spatial <http://publications.europa.eu/resource/authority/country/ITA> .

     ...
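The motivation given for this example (knowing the producer of each dataset, for attribution and/or accountability) is not visible in the Turtle above. A minimal sketch of how it could be expressed, assuming dct:creator is attached to each child dataset; the ex: namespace, the agent IRIs and their names are hypothetical placeholders:

    @prefix dct:  <http://purl.org/dc/terms/> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix ex:   <http://example.org/> .   # hypothetical namespace

    # Each child dataset records its own producer, so attribution and
    # accountability survive the grouping into one series.
    ex:budget-fr-2018 dct:creator ex:fr-statistics-office .
    ex:budget-be-2018 dct:creator ex:be-statistics-office .
    ex:budget-it-2018 dct:creator ex:it-statistics-office .

    # Hypothetical agents standing in for the national producers.
    ex:fr-statistics-office a foaf:Agent ;
      foaf:name "National statistical office of France (placeholder)"@en .
    ex:be-statistics-office a foaf:Agent ;
      foaf:name "National statistical office of Belgium (placeholder)"@en .
    ex:it-statistics-office a foaf:Agent ;
      foaf:name "National statistical office of Italy (placeholder)"@en .

When the producing organisation is also the one publishing the data, dct:publisher could be used in addition to, or instead of, dct:creator.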

Example 3: Time-related and space-related dataset series

The previous two examples can be combined to create a dataset series containing datasets with annual budget data for each country.
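A possible sketch of such a combined series, reusing the soft-typing pattern of Example 2. The ex: namespace and the 2019 dataset names are hypothetical, and the country IRIs assume the EU Publications Office country authority table:

    @prefix dcat: <http://www.w3.org/ns/dcat#> .
    @prefix dct:  <http://purl.org/dc/terms/> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
    @prefix ex:   <http://example.org/> .   # hypothetical namespace

    # Series varying along both time (year) and space (country).
    ex:budget-by-country-year a dcat:Dataset ;
      dct:type <http://inspire.ec.europa.eu/metadata-codelist/ResourceType/series> ;
      dct:hasPart ex:budget-fr-2018 , ex:budget-fr-2019 ,
                  ex:budget-be-2018 , ex:budget-be-2019 ;
      dcat:temporalResolution "P1Y"^^xsd:duration ;
      dct:temporal [ a dct:PeriodOfTime ;
        dcat:startDate "2018-01-01"^^xsd:date ;
        dcat:endDate   "2019-12-31"^^xsd:date ;
      ] ;
      dct:spatial <http://publications.europa.eu/resource/authority/country/FRA> ,
                  <http://publications.europa.eu/resource/authority/country/BEL> .

    # Each child covers exactly one year and one country.
    ex:budget-fr-2019 a dcat:Dataset ;
      dct:title "France budget for 2019" ;
      dcat:temporalResolution "P1Y"^^xsd:duration ;
      dct:temporal [ a dct:PeriodOfTime ;
        dcat:startDate "2019-01-01"^^xsd:date ;
        dcat:endDate   "2019-12-31"^^xsd:date ;
      ] ;
      dct:spatial <http://publications.europa.eu/resource/authority/country/FRA> .

    # ex:budget-fr-2018 and ex:budget-be-2018 can be described as in Example 2;
    # ex:budget-be-2019 follows the same pattern as ex:budget-fr-2019.

The series-level temporal and spatial coverage are simply the union of what the children cover; which variable (year or country) is "the" varying one remains open, as discussed in aspects 6 and 10.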

Example 4: Another space-related dataset series

This example is about datasets of satellite images, each covering a specific geographical area (possibly overlapping), over the same time period (year 2018).

In the example, the satellite images are taken daily, with a spatial resolution of 10 m. The same temporal and spatial resolutions are specified at both the dataset and the dataset series level.

If the images are taken at different temporal and/or spatial resolutions, the dataset series should list all of them.

The spatial coverage of each dataset is specified by a geographic bounding box. The spatial coverage of the dataset series is a bounding box corresponding to the union of the bounding boxes of each dataset.

    ex:satellite a dcat:Dataset;
      dct:type <http://inspire.ec.europa.eu/metadata-codelist/ResourceType/series> ;
    #aspect 2: how to express containment?
      dct:hasPart ex:satellite-1 ;
      dct:hasPart ex:satellite-2 ;
      dct:hasPart ex:satellite-3 ;
      ...
    # aspect 3: temporal granularity
	  dcat:temporalResolution "P1D"^^xsd:duration;
    # aspect 3.A: spatial resolution
	  dcat:spatialResolutionInMeters "10"^^xsd:decimal ;
    # aspect 4: temporal coverage 
	  dct:temporal [ a dct:PeriodOfTime ;
	    dcat:startDate "2018-01-01"^^xsd:date ;
	    dcat:endDate   "2018-12-31"^^xsd:date ;
	  ] ;
    # aspect 4.A: spatial coverage 
	  dct:spatial [ a dct:Location ;
            dcat:bbox "POLYGON(...)"^^geosparql:wktLiteral
          ]
          .
      
    ex:satellite-1 a dcat:Dataset;
	  dct:title "Satellite images for area 1";
	  dcat:temporalResolution "P1D"^^xsd:duration;
	  dcat:spatialResolutionInMeters "10"^^xsd:decimal ;
	  dct:temporal [ a dct:PeriodOfTime ;
	    dcat:startDate "2018-01-01"^^xsd:date ;
	    dcat:endDate   "2018-12-31"^^xsd:date ;
	  ] ;
	  dct:spatial [ a dct:Location ;
            dcat:bbox "POLYGON(...)"^^geosparql:wktLiteral
          ]
          .

    ex:satellite-2 a dcat:Dataset;
	  dct:title "Satellite images for area 2";
	  dcat:temporalResolution "P1D"^^xsd:duration;
	  dcat:spatialResolutionInMeters "10"^^xsd:decimal ;
	  dct:temporal [ a dct:PeriodOfTime ;
	    dcat:startDate "2018-01-01"^^xsd:date ;
	    dcat:endDate   "2018-12-31"^^xsd:date ;
	  ] ;
	  dct:spatial [ a dct:Location ;
            dcat:bbox "POLYGON(...)"^^geosparql:wktLiteral
          ]
          .

    ex:satellite-3 a dcat:Dataset;
	  dct:title "Satellite images for area 3";
	  dcat:temporalResolution "P1D"^^xsd:duration;
	  dcat:spatialResolutionInMeters "10"^^xsd:decimal ;
	  dct:temporal [ a dct:PeriodOfTime ;
	    dcat:startDate "2018-01-01"^^xsd:date ;
	    dcat:endDate   "2018-12-31"^^xsd:date ;
	  ] ;
	  dct:spatial [ a dct:Location ;
            dcat:bbox "POLYGON(...)"^^geosparql:wktLiteral
          ]
          .

     ...
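A hypothetical variant of Example 4, sketched under stated assumptions: two areas are imaged at different spatial resolutions (10 m and 20 m). Following the text above, the series lists both resolutions, and its bounding box is the smallest box enclosing the children's boxes. The coordinates (longitude/latitude) and all ex: names are made up for illustration:

    @prefix dcat:      <http://www.w3.org/ns/dcat#> .
    @prefix dct:       <http://purl.org/dc/terms/> .
    @prefix xsd:       <http://www.w3.org/2001/XMLSchema#> .
    @prefix geosparql: <http://www.opengis.net/ont/geosparql#> .
    @prefix ex:        <http://example.org/> .   # hypothetical namespace

    # Hypothetical area A, imaged at 10 m.
    ex:satellite-a a dcat:Dataset ;
      dcat:spatialResolutionInMeters "10"^^xsd:decimal ;
      dct:spatial [ a dct:Location ;
        dcat:bbox "POLYGON((8 44, 10 44, 10 46, 8 46, 8 44))"^^geosparql:wktLiteral
      ] .

    # Hypothetical area B, imaged at 20 m (overlaps area A).
    ex:satellite-b a dcat:Dataset ;
      dcat:spatialResolutionInMeters "20"^^xsd:decimal ;
      dct:spatial [ a dct:Location ;
        dcat:bbox "POLYGON((9 45, 12 45, 12 47, 9 47, 9 45))"^^geosparql:wktLiteral
      ] .

    # The series lists every resolution used by its children; its bbox spans
    # the minimum and maximum coordinates of both boxes (x 8..12, y 44..47).
    ex:satellite-variant a dcat:Dataset ;
      dct:type <http://inspire.ec.europa.eu/metadata-codelist/ResourceType/series> ;
      dct:hasPart ex:satellite-a , ex:satellite-b ;
      dcat:spatialResolutionInMeters "10"^^xsd:decimal , "20"^^xsd:decimal ;
      dct:spatial [ a dct:Location ;
        dcat:bbox "POLYGON((8 44, 12 44, 12 47, 8 47, 8 44))"^^geosparql:wktLiteral
      ] .

Note that the series-level statements alone no longer tell which child has which resolution; that information stays on the children, which is the concern raised in aspect 10.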

Example 5: Content-related dataset series

This example is about datasets that each contain daily measurements of the emissions of a specific air pollutant, taken by sensors placed on average 500 m apart. The datasets cover the same geographical area (here, Milan, Italy) and the same time period (year 2018).

The dataset series groups them together to enable users to find data related to all the measured emissions of air pollutants.

    ex:emissions a dcat:Dataset;
      dct:type <http://inspire.ec.europa.eu/metadata-codelist/ResourceType/series> ;
    #aspect 2: how to express containment?
      dct:hasPart ex:emissions-co2 ;
      dct:hasPart ex:emissions-nh3 ;
      dct:hasPart ex:emissions-pm10 ;
      ...
    # aspect 3: temporal granularity
	  dcat:temporalResolution "P1D"^^xsd:duration;
    # aspect 3.A: spatial resolution
	  dcat:spatialResolutionInMeters "500"^^xsd:decimal ;
    # aspect 4: temporal coverage 
	  dct:temporal [ a dct:PeriodOfTime ;
	    dcat:startDate "2018-01-01"^^xsd:date ;
	    dcat:endDate   "2018-12-31"^^xsd:date ;
	  ] ;
    # aspect 4.A: spatial coverage 
	  dct:spatial <http://publications.europa.eu/resource/authority/place/ITA_MIL>
          .
      
    ex:emissions-co2 a dcat:Dataset;
	  dct:title "CO2 emissions";
	  dcat:temporalResolution "P1D"^^xsd:duration;
	  dcat:spatialResolutionInMeters "500"^^xsd:decimal ;
	  dct:temporal [ a dct:PeriodOfTime ;
	    dcat:startDate "2018-01-01"^^xsd:date ;
	    dcat:endDate   "2018-12-31"^^xsd:date ;
	  ] ;
	  dct:spatial <http://publications.europa.eu/resource/authority/place/ITA_MIL>
          .

    ex:emissions-nh3 a dcat:Dataset;
	  dct:title "NH3 emissions";
	  dcat:temporalResolution "P1D"^^xsd:duration;
	  dcat:spatialResolutionInMeters "500"^^xsd:decimal ;
	  dct:temporal [ a dct:PeriodOfTime ;
	    dcat:startDate "2018-01-01"^^xsd:date ;
	    dcat:endDate   "2018-12-31"^^xsd:date ;
	  ] ;
	  dct:spatial <http://publications.europa.eu/resource/authority/place/ITA_MIL>
          .

    ex:emissions-pm10 a dcat:Dataset;
	  dct:title "PM10 emissions";
	  dcat:temporalResolution "P1D"^^xsd:duration;
	  dcat:spatialResolutionInMeters "500"^^xsd:decimal ;
	  dct:temporal [ a dct:PeriodOfTime ;
	    dcat:startDate "2018-01-01"^^xsd:date ;
	    dcat:endDate   "2018-12-31"^^xsd:date ;
	  ] ;
	  dct:spatial <http://publications.europa.eu/resource/authority/place/ITA_MIL>
          .

     ...

Discussion

Other aspects we might want to consider:

  • #aspect 1: Soft typing or a new class, e.g., dcat:DatasetSeries? Should dcat:DatasetSeries be a subclass of dcat:Dataset or of dcat:Resource?

    @riccardo-albertoni: Just to complete the list of possible superclasses, I wonder whether dcat:Catalog might also be an option.

  • ...

  • #aspect 5: Are the parts of a dataset series datasets or distributions? In the examples above, they are datasets. The discussion seems to have settled this aspect.

    @andrea-perego: Following what is stated in the DCAT 2 NOTE accompanying the dcat:Distribution class, the default approach should be to create separate datasets. However, dcat:Distribution can be used instead, depending on the requirements of the application scenario.

  • #aspect 6: Should a dataset series declare its varying variable? E.g., in Example 1 the variable is time: the series develops along years (2018, 2019, 2020). For temporal and spatial series, dcat:temporalResolution and dcat:spatialResolutionInMeters, if added to the new class dcat:DatasetSeries, might convey the varying variable. Do we want to support series other than spatial and temporal ones? (If so, we might borrow some ideas, such as the data structures of the RDF Data Cube vocabulary.)

    @andrea-perego: RDF Data Cube, and in particular the notion of "slices", can indeed be a useful general reference about dataset subsetting criteria and dataset series.

  • #aspect 7: There are other types of data aggregation (see Makx's list in issue 868) that might be worth modeling, and we might want to provide a coherent way to model them, e.g., by mapping the examples above to qualified relations.

  • (@andrea-perego) #aspect 8: Should we support multi-level dataset series, as in ISO 19115? I.e., the child of a dataset series can be either a dataset or another series (which can in turn contain datasets and other series).

    @riccardo-albertoni: Yes, I guess multi-level dataset series might be needed, considering that they are already included in other standards. I wonder whether we could get this almost for free by modeling the children as datasets and dataset series as a subclass of dataset.

  • (@andrea-perego) #aspect 9: For time-related dataset series, besides the containment relationship, the relationships specifying the prev / next / latest dataset in the series are also relevant (this aspect is partially covered in the extended draft of the versioning section). Actually, depending on the use case, the latter can be more relevant than containment.

    @riccardo-albertoni: Yes, I think it does no harm to include this possibility, and the solution you identified in the draft pull request, based on ADMS's prev / next / latest, works fine for me.

  • (@andrea-perego) #aspect 10: In relation to #aspect 6: where should the variables be specified? At the level of datasets, series, or both? If the variables differ across child datasets (e.g., different spatial or temporal resolutions), how should this be specified at the dataset series level?

  • Are there other aspects to consider?

    @riccardo-albertoni: we might want to group the aspects to ease their discussion. For example:

    • Aspect 1 might relate to aspects 5 and 8. Let's say they deal with "how dataset series fit into the DCAT backbone".
    • Aspect 6 relates to aspects 7 and 10: aspect 10 explicitly refers to aspect 6, while the RDF Data Cube structures of aspect 6 could be an alternative or a complement to the aggregation by qualified relations of aspect 7. Let's say they deal with "aggregation vs. (qualified) relations and slicing".
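A rough sketch of how aspects 8 and 9 might look together, assuming (as suggested above) that dcat:DatasetSeries is a subclass of dcat:Dataset, so that a series can itself be a part of another series. The ordering link reuses the draft dcat:next property from Example 2.1, whose final name may differ; ex:budget-all-years and the 2019 names are hypothetical:

    @prefix dcat: <http://www.w3.org/ns/dcat#> .
    @prefix dct:  <http://purl.org/dc/terms/> .
    @prefix ex:   <http://example.org/> .   # hypothetical namespace

    # Aspect 8: a two-level series. The top-level series contains yearly
    # series, each of which contains per-country datasets (as in Example 2).
    ex:budget-all-years a dcat:DatasetSeries ;
      dct:hasPart ex:budget-2018 , ex:budget-2019 .

    ex:budget-2018 a dcat:DatasetSeries ;
      dct:hasPart ex:budget-fr-2018 , ex:budget-be-2018 , ex:budget-it-2018 ;
      # Aspect 9: ordering between sibling items, using the draft dcat:next
      # property (its final name may differ).
      dcat:next ex:budget-2019 .

    ex:budget-2019 a dcat:DatasetSeries ;
      dct:hasPart ex:budget-fr-2019 , ex:budget-be-2019 , ex:budget-it-2019 .

Here containment (dct:hasPart) and ordering (dcat:next) are independent statements, so a publisher could provide either or both, depending on which one the use case relies on.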

2. Examples using distributions as items of dataset series

This section is a work in progress providing additional examples. The idea under discussion is that in some cases, for example when series include items created with high frequency, distributions can be added to the series without proxy datasets.

This section is solely for the purpose of discussing and brainstorming issue #1429.

Example 2.1: Time- and space-related series using only distributions, without datasets

    ex:budget a dcat:DatasetSeries ;
      dcterms:title "Budget data"@en ;
      dcterms:issued "2019-01-01"^^xsd:date ;
      dcterms:modified "2021-01-01"^^xsd:date ;
      dcterms:accrualPeriodicity <http://purl.org/cld/freq/annual> ;
      dcat:temporalResolution "P1Y"^^xsd:duration ;
      dcterms:temporal [ a dcterms:PeriodOfTime ;
        dcat:startDate "2018-01-01"^^xsd:date ;
        dcat:endDate "2020-12-31"^^xsd:date ;
      ] ;
      dcterms:spatial <http://publications.europa.eu/resource/authority/country/BEL> ,
        <http://publications.europa.eu/resource/authority/country/FRA> ,
        <http://publications.europa.eu/resource/authority/country/ITA> ,
        ... ;
      dcat:distribution ex:budget-all ;
      .
	
    # the whole dataset series can be accessed via a data service
    ex:budget-all a dcat:Distribution ;
      dcterms:title "budget dataset" ;
      dcat:accessService ex:budgetService ;
      .

    ex:budgetService a dcat:DataService ;
      dcat:endpointDescription <http://example.org/budget> ;
      dcat:endpointURL <http://example.org/api/budget-all> ;
      dcat:servesDataset ex:budget ;
      .
	   
    # specific partitions of the dataset series are distributed as CSV files
    ex:budget-2018-be a dcat:Distribution ;
      dcterms:title "Belgium budget data for year 2018"@en ;
      dcat:inSeries ex:budget ;
      dcterms:issued "2019-01-01"^^xsd:date ;
      dcat:next ex:budget-2019-be ;
      dcterms:accrualPeriodicity <http://purl.org/cld/freq/annual> ;
      dcat:downloadURL <http://example.org/budget-2018-be-csv> ;
      dcat:mediaType <http://www.iana.org/assignments/media-types/text/csv> ;
      dcat:temporalResolution "P1Y"^^xsd:duration ;
      dcterms:temporal [ a dcterms:PeriodOfTime ;
        dcat:startDate "2018-01-01"^^xsd:date ;
        dcat:endDate "2018-12-31"^^xsd:date ;
      ] ;
      dcterms:spatial <http://publications.europa.eu/resource/authority/country/BEL> ;
      .
	... 
	  
    ex:budget-2018-fr a dcat:Distribution ;
      dcterms:title "France budget data for year 2018"@en ;
      dcat:inSeries ex:budget ;
      dcterms:issued "2019-01-01"^^xsd:date ;
      dcat:next ex:budget-2019-fr ;
      dcat:downloadURL <http://example.org/budget-2018-fr-csv> ;
      dcat:mediaType <http://www.iana.org/assignments/media-types/text/csv> ;
      dcterms:accrualPeriodicity <http://purl.org/cld/freq/annual> ;
      dcat:temporalResolution "P1Y"^^xsd:duration ;
      dcterms:temporal [ a dcterms:PeriodOfTime ;
        dcat:startDate "2018-01-01"^^xsd:date ;
        dcat:endDate "2018-12-31"^^xsd:date ;
      ] ;
      dcterms:spatial <http://publications.europa.eu/resource/authority/country/FRA> ;
      .
	... 

    ex:budget-2018-it a dcat:Distribution ;
      dcterms:title "Italy budget data for year 2018"@en ;
      dcat:inSeries ex:budget ;
      dcterms:issued "2019-01-01"^^xsd:date ;
      dcat:next ex:budget-2019-it ;
      dcat:downloadURL <http://example.org/budget-2018-it-csv> ;
      dcat:mediaType <http://www.iana.org/assignments/media-types/text/csv> ;
      dcterms:accrualPeriodicity <http://purl.org/cld/freq/annual> ;
      dcat:temporalResolution "P1Y"^^xsd:duration ;
      dcterms:temporal [ a dcterms:PeriodOfTime ;
        dcat:startDate "2018-01-01"^^xsd:date ;
        dcat:endDate "2018-12-31"^^xsd:date ;
      ] ;
      dcterms:spatial <http://publications.europa.eu/resource/authority/country/ITA> ;
      .
	...

Questions and Notes for the Brainstorming

Solution 1 - as in the second DCAT working draft and the current editor's draft

The current DCAT editor's draft and the second DCAT working draft allow some flexibility in what can be considered an item of a dcat:DatasetSeries, which already includes the use of distributions in place of datasets.

Indeed, the property dcat:inSeries, which links items to their dataset series, has no specified domain, and its usage note says:
Normally, child datasets in dataset series are represented as dcat:Dataset. The use of dcat:Distribution for typing child datasets is however recognized as a possible alternative, whenever it addresses more effectively the requirements of a given application scenario.

However, the drafts provide neither detailed guidelines nor examples for this solution.
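To make the two options concrete, here is a minimal sketch of the same item modelled both ways with dcat:inSeries; only one of the two would be used in practice. The IRIs follow Example 2.1, except ex:budget-2018-be-dataset, which is a hypothetical name for the proxy dataset:

    @prefix dcat: <http://www.w3.org/ns/dcat#> .
    @prefix dct:  <http://purl.org/dc/terms/> .
    @prefix ex:   <http://example.org/> .   # hypothetical namespace

    # Option A (default): the item is a dcat:Dataset, and the CSV file is
    # one of its distributions.
    ex:budget-2018-be-dataset a dcat:Dataset ;
      dct:title "Belgium budget data for year 2018"@en ;
      dcat:inSeries ex:budget ;
      dcat:distribution [ a dcat:Distribution ;
        dcat:downloadURL <http://example.org/budget-2018-be-csv> ;
        dcat:mediaType <http://www.iana.org/assignments/media-types/text/csv>
      ] .

    # Option B (alternative): the distribution itself is the item, with no
    # proxy dataset, as in Example 2.1.
    ex:budget-2018-be a dcat:Distribution ;
      dct:title "Belgium budget data for year 2018"@en ;
      dcat:inSeries ex:budget ;
      dcat:downloadURL <http://example.org/budget-2018-be-csv> ;
      dcat:mediaType <http://www.iana.org/assignments/media-types/text/csv> .

Option A adds one proxy dataset per item; that overhead is what Option B avoids when items are published at high frequency.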

In order to refine and clarify the recommendation, we might want to answer the following macro-question:

  • Should we keep dcat:Dataset as the primary solution for representing items in a dataset series, and suggest the use of dcat:Distribution only when strictly needed (e.g., when items are published at high frequency)?

Solution 2 - Alternative Modelling Pattern

Diagram showing an alternative modelling pattern for dataset series

Grounding Use Cases

  • Examples of theoretically possible use cases that might require datasets as series items:
  • Examples of use cases that might benefit from distributions as series items:
    • Dataset series updated at high frequency.