Example case 5: Sensitive data analysis

 

5.1 Analysis for statistics, file editing or annotation (files <100 GB; default options).

5.2 Analysis of large datasets programmatically via data streaming (files > 100 GB; advanced).

 

5.1 Analysis for statistics, file editing or annotation (files <100 GB; default options).

This research project collects sensitive data (consented for research purposes). We will analyze several file types (video, audio, spreadsheets, and questionnaires), and we want to process them in a secure environment. We work in Finnish academic organizations, but we are also collaborating with international researchers. We have never used CSC's services before and are unfamiliar with cloud services.

Each file is smaller than 100 GB, and we will not collect more than 200 TB of data in this project. We will use open-source software to edit the files directly, e.g. ELAN (to annotate audio and video files), OpenOffice Calc (an open-source alternative to Excel), and PSPP (an open-source alternative to SPSS). At the end of the analysis, we want to export our (non-sensitive) results from the secure environment.
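The size limits above decide which of the two example cases applies. The following is a minimal sketch of that check, not an official CSC tool. It assumes decimal units (1 GB = 10^9 bytes) and treats a file of exactly 100 GB as belonging to case 5.2; the function names `choose_workflow` and `sizes_in` are hypothetical.

```python
from pathlib import Path

# Assumption: decimal units. The guide does not say whether GB means 10^9 or 2^30 bytes.
GB = 10**9
TB = 10**12

FILE_LIMIT = 100 * GB      # per-file threshold between cases 5.1 and 5.2
PROJECT_LIMIT = 200 * TB   # above this, case 5.2 asks the service desk for resources


def choose_workflow(file_sizes):
    """Return (case, needs_extra_resources) for an iterable of file sizes in bytes."""
    sizes = list(file_sizes)
    largest = max(sizes, default=0)
    total = sum(sizes)
    # Case 5.1 covers files below 100 GB; anything larger falls under case 5.2.
    case = "5.1" if largest < FILE_LIMIT else "5.2"
    # More than 200 TB in total means contacting the service desk first.
    needs_extra_resources = total > PROJECT_LIMIT
    return case, needs_extra_resources


def sizes_in(folder):
    """Collect the size in bytes of every regular file below a local folder."""
    return [p.stat().st_size for p in Path(folder).rglob("*") if p.is_file()]
```

For a local copy of the data, `choose_workflow(sizes_in("my_dataset"))` would report the matching case, where `my_dataset` is a placeholder folder name.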

CSC offers services for managing data in all phases of the research project. Because in this case we are analysing sensitive data, we can use the Sensitive Data services described below.

If you're new to CSC, familiarizing yourself with these services might take approximately six hours. However, don't hesitate to contact us if you have any problems. We can often solve your request in a few minutes with the help of an expert.

Service

Links to necessary documentation and tutorials.

Benefit for this case

Requirements for the user

MyCSC portal

For creating an account and managing data access.

We first create a CSC account and set up a CSC project. Next, as we will process sensitive data, we complete the description of processing activities form, add service access to Allas and SD Desktop, and enable two-step verification for our accounts.

Access CSC's services on demand via a self-service portal.

Directly grant access to the same project/data.

Possibility to add collaborators as project members.

Comply with CSC's General terms of use and Data Processing Agreement.

If necessary, consult the academic organization's legal service for advice regarding agreements, DPIA, or adding CSC project members from non-EU/EEA areas.

SD Connect

For securely uploading and storing sensitive data to CSC.

Next, we log in to SD Connect. Using the drag-and-drop function, we can encrypt and upload files and folders.

The user interface has specific features that streamline sensitive data management, for example automated data encryption.

 

Modern web browser. 

Files need to be <100 GB (for larger files, see example 5.2 below).

No expertise is required. However, users must become familiar with the necessary steps in the user guide.

Maintain a backup copy of the data.

 

SD Desktop

For analysing data with open source software.

We log in to SD Desktop. Here, we can set up a virtual computer that will be available to all project members every time we access the service. For this type of analysis, we launch the small computation option and add a volume (200 GB). Once our virtual Desktop is ready, we import the datasets stored in SD Connect using a specific application, which automatically decrypts each file. Now we can start the analysis with the default programs available in SD Desktop.

The service allows users to easily start and access a private and secure computing environment to analyze sensitive data on demand.

Users directly manage data access.

Modern web browser.

No expertise is required. However, users must become familiar with the necessary steps in the user guide.

Only open-source software can be installed in SD Desktop.

servicedesk@csc.fi

For installing additional software.

 

We contact the service desk and request assistance in installing the additional software.

The virtual Desktop can be customised based on the research project.

Adding software requires importing Singularity containers. If users do not have the necessary technical skills, our experts can help with this operation.

SD Desktop

For exporting results and deleting the secure environment.

The PI (or the CSC project manager) can export the non-sensitive results from the virtual Desktop. The results will be available in SD Connect, where we can download and decrypt them.

Once the analysis phase of our research is complete, we can delete the virtual Desktop.

 

Data export is managed by the CSC project manager.

Additional copies of sensitive data created during the analysis phase can be deleted.

Once the virtual Desktop is deleted, the service will not consume additional resources.

Modern web browser.

No expertise is required. However, users should become familiar with the necessary steps in the user guide.

servicedesk@csc.fi

To obtain a copy of the original dataset.

Files collected during the initial phases of our research are still encrypted and stored in SD Connect. We contact the service desk for technical support and download a copy of the data.

 

Data stored in SD Connect and SD Desktop will be deleted 90 days after account termination or project closure, after which it cannot be retrieved.
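To plan the final download before the data is removed, the 90-day retention period can be turned into a concrete date. This is a small sketch of that arithmetic only. It assumes the period is counted in calendar days from the closure date; how CSC handles the exact cut-off time is not stated here, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Retention period stated in the guide: 90 days after account termination or project closure.
RETENTION_DAYS = 90


def deletion_date(closure: date) -> date:
    """Date on which the stored data is scheduled for deletion (assumed calendar days)."""
    return closure + timedelta(days=RETENTION_DAYS)


def days_left(closure: date, today: date) -> int:
    """Days remaining before deletion; zero or negative means the data may already be gone."""
    return (deletion_date(closure) - today).days
```

For example, `days_left(date(2025, 1, 1), date.today())` would show how much time remains for a project closed on 1 January 2025.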

 

 

5.2 Analysis of large datasets programmatically via data streaming (files > 100 GB; advanced).

In this research project, we want to collect and analyse sensitive research data in a secure environment. We are expert data scientists and we want to perform the data analysis in collaboration with researchers from other organizations.
We will collect files larger than 100 GB and store more than 200 TB of data. We will use open-source software (e.g. RStudio, Python) with specific packages and scripts. At the end of the research, we want to export non-sensitive findings from the secure environment.

CSC offers services for managing data in all phases of the research project. Because in this case we are analysing a large dataset, we can use the following Sensitive Data service components:

  • MyCSC portal for creating an account, a CSC project, and adding service access.
  • Sensitive Data Connect/Allas for storing encrypted data and importing specific scripts, packages, and Singularity containers programmatically.
  • Sensitive Data Desktop for analysis via data streaming.

Service

Links to necessary documentation and tutorials

Benefit for this case

Requirements for the user

MyCSC portal

For creating an account and managing data access.

We first create a CSC account and set up a CSC project. Next, as we will process sensitive data, we complete the description of processing activities form, add service access to Allas and SD Desktop and enable two-step verification for our accounts.

As we are processing more than 200 TB of data, we contact servicedesk@csc.fi to ask for advice and apply for additional resources.

Access CSC's services on demand via a self-service portal.

Directly grant access to the same project/data.

Possibility to add collaborators as project members.

Comply with CSC's General terms of use and Data Processing Agreement.

If necessary, consult the academic organization's legal service for advice regarding agreements, DPIA, or adding CSC project members from non-EU/EEA areas.

 

SD Connect

For securely uploading and storing encrypted sensitive data to CSC.

Next, we encrypt each file programmatically with multiple encryption keys and upload it to Allas.

After upload, the encrypted files will be accessible from SD Connect and SD Desktop.

A project-specific encryption key will allow users to download and decrypt the data from SD Connect.

 

Modern web browser.

Specific expertise is required (programmatic data upload).

Users must become familiar with the necessary steps in the user guide.

 

SD Desktop

For analysing data with open source software.

 

 

We log in to SD Desktop and, for this type of analysis, launch the Medium Computation option.

Once the virtual Desktop is ready, we access the data stored in SD Connect using the Data Gateway application. With the same application, we can also import Singularity containers and scripts. The application will automatically decrypt each file.

At the end of our research, we can export the results and delete the virtual Desktop. The service will not consume additional resources.

The original datasets will still be available in SD Connect, where they are safely stored.

Directly manage data access.

Users can customise their virtual Desktop.

 

The CSC project manager can export non-sensitive findings from the virtual Desktop.

At the end of the analysis phase, additional copies of sensitive data are deleted.

 

Adding software requires importing Singularity containers.

Data stored in SD Connect and SD Desktop will be deleted 90 days after account termination or project closure, after which it cannot be retrieved.
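The programmatic encryption step in the SD Connect row above (each file encrypted with multiple encryption keys) can be sketched as follows. The text does not name the encryption tool, so this example assumes the Crypt4GH command-line utility (`crypt4gh`) is installed and that the recipients' public keys are available as local files. The helper names, key file names, and the `.c4gh` output suffix are illustrative, and the upload to Allas is not covered.

```python
import subprocess
from pathlib import Path


def build_encrypt_command(recipient_keys, sender_key=None):
    """Build a `crypt4gh encrypt` command with one --recipient_pk per public key."""
    if not recipient_keys:
        raise ValueError("at least one recipient public key is required")
    cmd = ["crypt4gh", "encrypt"]
    if sender_key is not None:
        # Assumption: the sender's private key is passed only when one is provided.
        cmd += ["--sk", str(sender_key)]
    for key in recipient_keys:
        cmd += ["--recipient_pk", str(key)]
    return cmd


def encrypt_file(source, recipient_keys, sender_key=None):
    """Encrypt one file next to the original; crypt4gh reads stdin and writes stdout."""
    source = Path(source)
    target = source.with_name(source.name + ".c4gh")
    with source.open("rb") as fin, target.open("wb") as fout:
        subprocess.run(
            build_encrypt_command(recipient_keys, sender_key),
            stdin=fin,
            stdout=fout,
            check=True,  # raise if the tool reports an error
        )
    return target
```

A call such as `encrypt_file("recording.wav", ["project.pub", "collaborator.pub"], "my.sec")` would write `recording.wav.c4gh`, which any holder of a matching private key could decrypt. All file names here are placeholders.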