The Grand Challenge Pilot Projects are starting
 

CSC's Scientific Customer Panel selected 15 Grand Challenge Projects that will be run as pilot projects on CSC's new computing environment during its acceptance phase in June.

The approved projects are:

AI-EMO

Pekka Ruusuvuori, Tampere University.

Histopathological slides are routinely prepared and utilised in current clinical practice for the detection and diagnosis of breast cancer. Recently, molecular phenotyping has emerged as the natural next step towards making diagnosis more precise and comprehensive. In this project we will systematically determine to what extent morphological patterns in histopathology image data can be utilised to predict molecular phenotypes using deep convolutional neural network models. Because of the large number of molecular phenotypes to be predicted, the problem requires efficient modelling and optimization approaches together with extensive GPU-based computational resources. From a technical viewpoint, the large data volumes involved require a highly optimized computational pipeline that couples efficient use of disk hardware with parallel CPU and GPU computation.

AurkA-MYC

Antti Poso, University of Eastern Finland.

The undruggable oncoprotein MYC is involved in the progression of human solid tumours. Recently, a potential way to target this oncogene indirectly via Aurora kinase A (AurkA) was introduced. When bound to MYC, AurkA shields it from proteasomal degradation. Specific AurkA inhibitors, conformational shift inducers (CSIs), prevent MYC from binding to AurkA, leading to MYC's degradation. The precise mechanism of AurkA–MYC binding is unknown, and it is unclear how a CSI prevents the AurkA–MYC interaction. These uncertainties are currently the limiting step in the CSI design process and hinder ongoing drug development. We aim to characterize the AurkA–MYC interaction and the binding of CSI compounds to AurkA using long-timescale molecular dynamics (MD) simulations. Our main objectives are to understand the AurkA–MYC interaction at the molecular level and to provide a mechanistic explanation of why CSIs disrupt it. The final aim is to develop efficient MYC-targeting AurkA inhibitors.

Complexity

Vivek Sharma, University of Helsinki.

The respiratory complex I (~1 MDa) in the inner mitochondrial membrane contributes about 40% of total ATP generation. It accepts electrons derived from foodstuffs and pumps protons across the membrane. Besides being one of the key enzymes in biological energy conversion, it has been found to be associated with various neurodegenerative and mitochondrial disorders. The 3D structures of complex I have been solved only recently using X-ray crystallography and cryo-EM techniques. Despite this, the mechanism coupling electron transfer to proton pumping remains enigmatic. In this project we will perform fully atomistic classical molecular dynamics (MD) simulations on the entire structure of complex I (a million-atom model), combined with all-QM and QM/MM calculations. Our multiscale simulation approaches will not only shed light on the fundamental aspects of long-range coupling in complex I, but will also provide a molecular explanation for the diseases associated with complex I dysfunction.

DeepFin

Filip Ginter, University of Turku.

Recent advances in deep learning have made it possible to model human language at unprecedented levels of accuracy, leading to breakthroughs in machine translation, question answering, natural language dialogue, and more. However, due to the computational cost of training state-of-the-art methods on billions of words of text, large-scale applications of this technology have been primarily limited to large multinational companies working on English. Using web-scale Finnish language resources compiled by the University of Turku Natural Language Processing group and newly introduced CSC computational resources, the DeepFin Grand Challenge project will create a deep language model of Finnish that will be comparable in coverage and quality to the best language models available today for any language. This critical language resource will be made freely available under Open Data licensing to accelerate further advances in Finnish natural language processing.

DYNAMIC-IR

Ilpo Vattulainen, University of Helsinki.

The insulin receptor (IR) is a membrane-spanning glycoprotein whose impaired activation is known to result in type 2 diabetes; the IR is also involved in cancer development. The IR is activated by the binding of insulin to the active site in the extracellular α-subunit. Recent studies using single-particle electron microscopy (EM) have shed light on the conformational changes that take place in the IR upon insulin binding. However, high-resolution models describing the activation and deactivation mechanisms and their dynamics are still missing. The objective of this project is to provide an atomistic-level understanding of the insulin-induced conformational changes in the IR using all-atom molecular dynamics (MD) simulations tightly linked to EM and cell biology experiments. The project will shed light on the activation mechanism of the insulin receptor and thereby provide the basis for understanding how the altered conditions found in type 2 diabetes lead to impaired glucose uptake into cells.

GraSP

David Weir, University of Helsinki.

Gravitational waves represent a new window onto the physics of the early universe. The LISA space-based gravitational wave detector will launch before 2034, and the design of its analysis pipelines is under way, so now is the time to understand the possible signals that could be detected. A strong candidate for a cosmological source of gravitational waves is a first-order phase transition at the electroweak scale, around 10 picoseconds after the Big Bang, when the Higgs boson "turns on". In such a phase transition, bubbles of the new Higgs phase nucleate and expand. As they collide, they set up sound waves in the hot plasma of the early universe. This project will extend previous efforts to simulate this scenario to phase transitions with stronger fluid shocks in the plasma and more complicated reaction fronts around the walls of the colliding bubbles. These strong transitions are the most promising candidates for detection by LISA or similar future gravitational wave missions.

H2OINTE

Karoliina Honkala, University of Jyväskylä.

Solvent effects are known to play an important role in heterogeneous catalysis and electrocatalysis. The lack of detailed understanding of reactivity at the water-catalyst interface prevents a rationalization of how the solvent influences the activity and selectivity of a reaction. Herein, we will take the computation of water-catalyst interactions to the next level by performing ab initio molecular dynamics simulations for a series of metal surfaces and carbon materials. The results will allow us to parametrize a polarizable continuum model, which makes it possible to include solvent effects in reaction studies in a cost-effective manner. We will use the continuum model to investigate the impact of the solvent on the activity and selectivity of glycerol oxidation. Overall, the results will enable us to establish a framework for rationalizing water-catalyst interactions at the atomic level, set up a continuum solvent model, and explore the influence of the solvent on catalytic chemistry.

MultiMT

Jörg Tiedemann, University of Helsinki.

The purpose of the project is to train massively multilingual neural machine translation models on large data sets of previously translated documents. The models will share parameters across a large number of languages (over 200) covering various language families and linguistic properties. The aim is to scale up neural machine translation in terms of language coverage and to support low-resource languages through transfer learning and multi-task training. We will apply an architecture that learns a language-neutral semantic abstraction layer, which also supports translation between language pairs that are not seen in training. The emerging semantic representations can also be used in other downstream tasks that require natural language understanding. Testing such a model at this scale is unique; it requires extensive computing power, and training needs to be run in parallel on GPU nodes.

NANOVIR

Juha Huiskonen, University of Helsinki.

Viruses are obligate intracellular parasites that infect all known living organisms. Outside the living cell, the viral particle must protect its genes from sometimes harsh environmental conditions. These particles often assemble with near-perfect icosahedral symmetry. However, the view of viruses as mere static containers of their genes is far from accurate. In many cases, the virus particle harbours nanomachineries that package and release the viral genome. These machineries break the regular symmetry of the container, and their moving parts make them difficult to analyse using conventional structural biology methods. In NANOVIR, we apply cryogenic electron microscopy (cryo-EM) and novel expectation maximisation image processing methods, which require massively parallel computing, to analyse ~100,000 virus images in order to study the viral machineries in atomic detail. A detailed mechanistic understanding of these nanomachines is expected to inform the rational design of antiviral therapeutics.

POLARELAX

Gerrit Groenhof, University of Jyväskylä.

When photoactive molecules interact strongly with confined light modes in optical cavities, polaritons form; these are coherent superpositions of excitations of the molecules and of the cavity photon. Because the light-matter hybridisation changes the potential energy surface with respect to the bare molecules, cavities can be used to control chemical reactivity. Because polariton lifetimes are limited by the cavity lifetime, high-finesse cavities would be required. However, recent experiments suggest that even in very low-finesse cavities, the emissive lifetime of the lower polariton exceeds the intrinsic cavity lifetime by orders of magnitude. To resolve this paradox, we will perform massively parallel multiscale molecular dynamics simulations, on a large number of GPUs, of room-temperature ensembles of both reactive and non-reactive molecules strongly coupled to a single confined light mode in low-finesse cavities.

SATFire

Simo Hostikka, Aalto University.

Observed fire behavior during the World Trade Center fires showed that the assumption of a homogeneous temperature field in a large structure could be inaccurate, which led to the development of the traveling fire concept. Traveling fires spread non-uniformly, creating a heterogeneous temperature field that is argued to be highly detrimental to structures. Understanding the behavior of traveling fires is essential for designing safer structures. The research project aims to investigate traveling fire behavior through computational fluid dynamics (CFD) simulations of reactive flows. We will examine the probability of occurrence of traveling fires using Latin hypercube sampling (LHS) Monte Carlo simulations. The resulting compendium of time-temperature curves will be used for detailed structural analysis of the building frame. An open-source CFD code, Fire Dynamics Simulator, will be used to simulate the fire spread scenarios within a model of an actual warehouse building. The project is part of Academy of Finland project no. 289037.

SIBELIUS-DARK

Stuart McAlpine, University of Helsinki.

Cosmological simulations of the Universe must satisfy two criteria: a high resolution, to cover a wide dynamic range in mass, and a large volume, to incorporate the widest possible range of scales. Their goal is well defined: to produce a virtual universe that closely resembles our own, but only in a statistical sense, with no attempt to reproduce the observed local structures or, indeed, our home, the Local Group (LG). The LG is more than our cosmic home; it is an ideal laboratory for studying cosmic structure formation and the nature of dark matter, yet simulating the LG embedded within the correct cosmological environment is a challenging task. In the SIBELIUS project, we aim to apply our state-of-the-art simulation code to our novel initial conditions, which place the LG in its real cosmic environment. This will allow us to explain some of the LG's unique features, determine the role of the environment in its evolution, and make predictions beyond the immediate LG.

TREAT

Antti Karttunen, Aalto University.

XPEC

Miguel Caro, Aalto University.

X-ray spectroscopy is a powerful tool for unravelling the microscopic structure of materials, that is, how atoms are arranged at the nanoscale. This experimental technique would in principle allow us to precisely establish the nature of chemical bonding in functional materials and to link a material's atomic structure to its performance in real-life applications. This knowledge is of vital importance for optimizing the new generation of materials needed for environmentally friendly technologies, such as clean fuel production and the sequestration of atmospheric CO2. Unfortunately, interpreting the X-ray spectra of complex materials is extremely challenging, since signals coming from different atomic environments overlap. This project will integrate machine-learning data classification and high-dimensional regression with multilevel quantum chemistry methods (DFT and GW) to create the ultimate predictive tool for the computational generation of X-ray spectra.

ZLife

Lassi Paavolainen, University of Helsinki.

Transfer learning is an important technique in various applications of deep learning: it helps models converge faster and enables the training of high-performing models even when only a limited training set is available. Especially in image understanding, where convolutional neural networks (CNNs) are used, community-maintained collections of pre-trained models, i.e. model zoos, have played a major role in advancing the whole scientific field. However, the available pre-trained CNNs are trained with images representing everyday objects. Although the pre-trained models learn universal image features, their direct applicability to the life science imaging domain is suboptimal, because life science images have strikingly different channel correlations and pixel distributions that are entirely absent from datasets of natural images. The aim of this project is to train a selection of standard CNNs using two life-science-specific datasets and to openly share and host this resource (Zoo of Life) for the whole community.