Data Analytics

Data analytics and data-driven research have grown rapidly, creating a need for a wide spectrum of services. CSC offers expert support, tools and computing resources for processing and analysing collections of data. The service offering is designed to match the varying needs of users, ranging from researchers taking their first steps towards quantitative analysis to heavy users looking for large computing resources.

Data analysis often starts with storing the data in the CSC environment so that it is efficiently and conveniently available. See Data storage for more information on the different storage options.

CSC specialists can help you choose the right tools and environment for your data analysis. Whether your question concerns data, computing or scientific methods, we are here to support you. See the Support for data analytics page.

You might also be interested in our Data analysis guide.

Our services can be part of very different research workflows. Here are the most typical user stories from scientific data analysis.

Getting into data-driven research

"I studied my discipline, not data and computing methods"

A researcher is skilled in their own field but now needs to use data-driven methods to form new kinds of hypotheses and make new findings. A good way to get started is to attend introductory courses at CSC. When you have a question, you can start from the Support for data analytics page. Both technical questions and more general inquiries are welcome. You can also find a lot of material in our online documentation.

Scaling up from laptop to computing cluster

"It used to work but now running it takes forever on my laptop"

A scientist has developed an algorithm using a smaller problem size or a single dataset or model. Running the same solution on the full problem goes beyond the capabilities of the researcher's laptop, or going through all the datasets or models would take too long. The researcher can run the same code on CSC's Puhti environment, which offers large memory and the ability to run many processes simultaneously. They can also use Puhti for true parallel computing on large problems, which usually requires at least small modifications to the code.
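As a minimal sketch of the "many processes running simultaneously" case, assume the researcher's per-dataset algorithm is an ordinary Python function. The names analyse and analyse_all are hypothetical placeholders, not part of any CSC tool. The sketch also assumes the job runs under the Slurm scheduler, which sets SLURM_CPUS_PER_TASK when CPUs per task are requested; on a laptop the variable is unset and the same script simply runs serially.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def analyse(dataset):
    """Hypothetical stand-in for the researcher's own per-dataset algorithm."""
    return sum(dataset) / len(dataset)


def analyse_all(datasets, workers=1):
    """Run analyse() on every dataset, using several processes if requested."""
    if workers <= 1:
        # Serial fallback: same behaviour as on a laptop.
        return [analyse(d) for d in datasets]
    # Each dataset is independent, so worker processes can handle them in parallel.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyse, datasets))


if __name__ == "__main__":
    # Assumption: the number of allocated cores is read from Slurm's environment.
    workers = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
    datasets = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [5.0, 5.0]]
    print(analyse_all(datasets, workers=workers))
```

Because the datasets are processed independently, the analysis function itself does not change between the laptop and the cluster; only the number of workers does. Problems that need communication between processes are the ones that typically require the code modifications mentioned above.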

Big data processing on CSC cloud

"My data is too big"

A scientist has collected a lot of data and needs to analyze it in a reasonable time. Apache Spark handles large-scale data analysis tasks that cannot otherwise be accomplished with a traditional approach. CSC provides Spark in the Rahti container cloud, where Spark runs on a cluster of machines that can be scaled up and down as needed. For more information, see the Big data computing page.

Course environments on CSC cloud

"We have all these tricky dependencies"

A teacher has one or more courses that touch on programming, data analytics or a field that uses these disciplines. Coursework includes exercises that require some programming and the ability to run examples interactively. Setting up a working environment with all the required dependencies for every student can be a hassle. Luckily, CSC has a solution: Notebooks.

The teacher maintains the course materials in a repository on GitHub or a similar service. Each course has a template environment that the students can run in a web browser. The environments have a lifetime of a few hours and are automatically cleaned up after use. Nothing new needs to be installed on machines maintained by university IT or on students' own laptops. For more information, see the Notebooks page.

Deep learning with GPUs

"I want to train a deep neural network"

A researcher wants to try out an existing deep neural network model on a dataset or to develop a novel network architecture to solve a new task. Training neural networks can easily require a lot of processing power, and the GPU accelerator cards available at CSC are particularly well suited to these tasks. Several popular machine learning and deep learning frameworks are readily available on CSC's compute cluster. For more information, see the Machine learning page.