Dataset-as-a-Service

Dataset-as-a-Service (DaaS) is a new service concept for publishing metadata about datasets that are located close to our HPC services, often in our own environments. The goals are to:

  • promote access to large valuable datasets for research use
  • strengthen collaboration between research data management and HPC services
  • support data management during the research process
  • enable sharing and linking to documentation and tools during active research
  • support efficient and responsible AI and capacity management

This service concept is being developed in collaboration between LUMI AIF and the Finnish EOSC node.

For the data provider, the service offers:

  • support in planning and contracting data sharing
  • the opportunity to publish use copies of datasets outside repositories, with the possibility to link to the master data
  • visibility for the dataset
  • a documented lifecycle and terms of use for the dataset
  • in the future, more tools for managing and sharing access to data

For the data user, the service offers:

  • the opportunity to discover large accessible datasets that were previously hard to find
  • easier data management
  • possibility to discover recommended tools
  • better data lineage tracking
  • support for easy citation

The current DaaS datasets are visible in the Fairdata Etsin service, tagged with the project Dataset-as-a-Service. We are currently planning how to adapt the data model, likely with small extensions to the DCAT application profile. If you would like to discuss this, or if you want to publish data through the DaaS service (e.g. data in Allas, LUMI-O or an HPC environment), please get in touch.

Dataset-as-a-Service datasets in Fairdata Etsin