Data Analytics

Data analytics and data-driven research have grown rapidly, creating a need for a wide spectrum of services.

We offer expert support, tools and computing resources for processing and analysing collections of data. Our services are designed to match the varying needs of our users, from researchers taking their first steps towards quantitative analysis to heavy users who need large computing resources.

Data analysis often starts with storing the data in the CSC environment, so that it is readily and efficiently available for processing. See the page on data storage to explore the different storage options.

See our data analysis guide to learn more about our data analysis tools and environments.

Our specialists can also help you choose the right tools and environment for your data analysis. Whether your question concerns data, computing or scientific methods, we can support you in your data analytics work.

CSC's services for a variety of cases

Our services fit into a wide range of research workflows. Below are the most typical user stories from scientific data analysis.

Getting into data-driven research

"I studied my discipline, not data and computing methods"

A researcher is skilled in their own field but now needs data-driven methods to form new kinds of hypotheses and reach new findings. A good way to get started is to attend introductory courses at CSC. When you have a question, take a look at the Support for Data Analytics page. Both technical questions and more general inquiries are welcome. You can also find a lot of material in our online documentation.

Scaling up from laptop to computing cluster

"It used to work but now running it takes forever on my laptop"

A scientist has developed an algorithm using a smaller problem size or a single dataset or model. Running the same solution on the full problem exceeds the capabilities of the researcher's laptop, or processing all the datasets or models would take too long. The researcher can run the same code on our Puhti supercomputer, which provides access to large amounts of memory and can run many processes simultaneously. Puhti can also be used for true parallel computing on large problems, which usually requires at least small code modifications.
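A common way to "run many processes simultaneously" without modifying the analysis itself is to launch one copy of the script per dataset and let each copy pick its own input. The minimal sketch below assumes a Slurm-style batch scheduler that sets the `SLURM_ARRAY_TASK_ID` environment variable for each task of an array job (for example one submitted with `sbatch --array=0-2 job.sh`, where `job.sh` is a hypothetical job script). The dataset list and the `analyse` function are hypothetical placeholders for the researcher's own data and code.

```python
"""Process one dataset per task so that many copies of this script can run side by side.

Assumption: a Slurm-style scheduler sets SLURM_ARRAY_TASK_ID for each array task.
When run on a laptop without a scheduler, the script falls back to task 0.
"""
from __future__ import annotations

import os
import statistics
from collections.abc import Mapping

# Hypothetical inputs: on a real cluster these would typically be paths to data files.
DATASETS = [
    [2.0, 4.0, 6.0],
    [1.0, 1.5, 2.5, 3.0],
    [10.0, 20.0],
]


def analyse(values: list[float]) -> dict[str, float]:
    """Hypothetical stand-in for the analysis originally developed on the laptop."""
    return {"n": len(values), "mean": statistics.mean(values)}


def task_index(env: Mapping[str, str] | None = None) -> int:
    """Return this task's index from the scheduler, defaulting to 0 when run locally."""
    env = os.environ if env is None else env
    return int(env.get("SLURM_ARRAY_TASK_ID", "0"))


def main(env: Mapping[str, str] | None = None) -> dict[str, float]:
    """Analyse only the dataset assigned to this task and report the result."""
    idx = task_index(env)
    result = analyse(DATASETS[idx])
    print(f"task {idx}: {result}")
    return result


if __name__ == "__main__":
    main()
```

Because each task handles a single dataset, the same unchanged analysis code scales from one input on a laptop to many inputs processed in parallel on the cluster.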

Big data processing on CSC's cloud

"My data is too big"

A scientist has collected a lot of data and needs to analyse it in a reasonable time. Apache Spark is designed for large-scale data analysis tasks that would be impractical with a traditional single-machine approach. We provide Apache Spark in the Rahti container cloud, where Spark runs on a cluster of machines that can be scaled up and down as needed. For more information, check the Big Data Computing page.

Course environments on CSC's cloud

"We have all these tricky dependencies"

A teacher runs one or more courses that involve programming, data analytics or a field that relies on them. The coursework includes exercises that require some programming and the ability to run examples interactively. Setting up a working environment with all the required dependencies for every student can be a hassle. Luckily, there is a solution: the Notebooks service.

The teacher maintains the course materials as a repository on GitHub or a similar service. Each course has a template environment that students can run in a web browser. The environments have a defined lifetime and are cleaned up automatically after use. Nothing needs to be installed on university computers or on students' own machines.

Deep learning with GPUs

"I want to train a deep neural network"

A researcher wants to apply an existing deep neural network model to a dataset, or to develop a novel network architecture for a new task. Training neural networks can easily require a lot of processing power. The GPU accelerator cards available at CSC are particularly well suited to such tasks. Several popular machine learning and deep learning frameworks are readily available on our compute cluster. For more information, check the Machine Learning page.