Secure Processing Environment in search of a metaphor 

The Sensitive Data Management unit at CSC fosters scientific research on sensitive data. We are building services that use elaborate setups to keep these data secure from prying eyes while scientists work on them. We are using all the tools and mechanisms thought out and developed for electronic computers over the past seven decades or so. But have we stopped to think through what we are doing?

Our work on sensitive data processing services got jump-started when the EU General Data Protection Regulation (GDPR, 2016) created Europe-wide guidelines. There are still furious debates about its exact interpretation, but at least we know what is expected of us and what we mean by sensitive data. GDPR was written with business use in mind. Since then, its scope has been expanded several times to better cover the academic, scientific use of sensitive data. The Data Act (2020) gave us the concept of a Secure Processing Environment (SPE), a safe place for legitimate users to work on sensitive data without worrying about its privacy or the security of the environment.

The focus of these laws, and of subsequent efforts to build secure processing environments, is security. However, that overlooks the main function of an SPE, which is to enable researchers to do science with sensitive data. What does enabling mean in this context, and how can we do it effectively?

Primary purpose is to advance research

We must look beyond the most obvious needs of researchers. Clearly, they need the sensitive data itself, supporting data, and the programmatic tools to do the actual data analysis.

The key factor in scientific work, especially empirical data science, is knowledge. We are getting increasingly comfortable with the idea that we can search for and instantly reach any known fact over the Internet. However, SPE users cannot be allowed to do this, because uncontrolled communication exposes the environment and the sensitive data to unknown risks.

SPE is a social environment 

One way of looking at the history of computing is as a progression of ever tighter security rules. That seems to hold for SPEs too, but then we are missing the elephant in the room. The tight isolation guaranteed by the SPE specification means that its users have a worry-free area without security concerns. Information and communication should flow freely and collaboratively among the researchers. Science is an increasingly collaborative effort, and a project's SPE brings together people who have a specific task at hand. An SPE should be seen as a social environment that promotes maximal collaboration with minimal restrictions.

This puts the SPE in a strange light. It is a state-of-the-art computing environment with enormous power and capacity compared to previous generations, but cutting the Internet out of it makes it more like a standalone PC of the 1980s, or one of those late-1960s mainframes that might have had only a few dedicated connections between them. Those mainframes had very limited or non-existent file protection, allowing anyone logging in to do almost whatever they wanted with the system. That era also saw an explosion of new ideas getting implemented, and information was freely exchanged. The roots of the open-source movement lie in those times.

We need to recreate the spirit of those exciting and open times inside the SPE. Our main target should be communication. SPEs are usually implemented as virtual computers with a graphical interface. If users happen to be inside the SPE at the same time, they should be able to see and talk to each other. A user logging in needs to know what the others have already done. A prudent PI will set up a shared text file and require everyone to log their actions there. This project logbook needs to live inside the SPE, both for timeliness and to protect any sensitive data that might get written into it. The next logical step is for the SPE environment itself to do most of the work of logging and informing project members about what has happened, what is happening right now, and possibly what will be expected of them next. Imagine one person submitting a large HPC job and logging out. The environment should inform the next person about the pending job and alert them when it finishes. In other words, the SPE needs a communication center.
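To give a sense of how small such a communication center could start, here is a rough Python sketch that watches the batch queue and appends job events to the shared project logbook. It assumes a Slurm-based HPC backend behind the SPE; the logbook path and the function names are hypothetical and only for illustration, not part of any existing CSC service.

#!/usr/bin/env python3
# Minimal sketch of an SPE "communication center", assuming a Slurm batch
# system and a shared project logbook file inside the SPE. The path and the
# names below (LOGBOOK, watch_jobs) are hypothetical.

import datetime
import subprocess
import time
from pathlib import Path

LOGBOOK = Path("/shared/project/logbook.txt")  # hypothetical shared location inside the SPE

def log_event(message: str) -> None:
    """Append a timestamped entry that every project member can read."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with LOGBOOK.open("a") as fh:
        fh.write(f"{stamp}  {message}\n")

def slurm_job_states() -> dict[str, str]:
    """Return {job_id: state} for jobs currently in the queue."""
    out = subprocess.run(
        ["squeue", "-h", "-o", "%i %T"],  # -h: no header, %i: job id, %T: job state
        capture_output=True, text=True, check=True,
    ).stdout
    return dict(line.split() for line in out.splitlines() if line.strip())

def watch_jobs(poll_seconds: int = 60) -> None:
    """Log submissions and completions so the next person logging in sees them."""
    previous = slurm_job_states()
    while True:
        time.sleep(poll_seconds)
        current = slurm_job_states()
        for job_id in current.keys() - previous.keys():
            log_event(f"HPC job {job_id} submitted ({current[job_id]})")
        for job_id in previous.keys() - current.keys():
            log_event(f"HPC job {job_id} finished or left the queue")
        previous = current

if __name__ == "__main__":
    watch_jobs()

A real communication center would of course do much more, such as showing who is logged in and pushing notifications to the desktop, but even a watcher like this turns the logbook from a manual chore into something the environment maintains for the project.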

From lone wolf to pack

For a long time, computers were built to be used by one person at a time; a PC is literally a personal computer. Only with modern cloud services have we started to see versions of the common tools that allow multiple users to work concurrently on the same data; Google Docs and Conceptboard are examples. An SPE provides a secure, private cloud for one project. Eventually, most of its tools should have this shared, concurrent ability. In practice, we haven't even started providing these to users.

When facing something new, we humans reuse something from the real world and from our previous experience. These metaphors help us communicate clearly with other people. The names we use to describe our interaction with computers reflect our past views: the command line, the PC, and the desktop are all about how one person looks at things. The computer desktop visible in an SPE should not only reflect what one person is doing but should express the shared experience and aims of the whole research group. Can you come up with a better metaphor?

Heikki Lehväslaiho
The author works as a Senior Application Specialist in the Sensitive Data Coordination group at CSC.