Introduction to data science for researchers
Date: 05.05.2015 9:00 - 06.05.2015 16:15
Location details: -
Language: English
Lecturers: Seija Sirkiä, Harri Hämäläinen
Price: -

Data science has become a trendy subject in recent years. Along with big data, it appears to be the latest fashion, particularly in business. But data analysis and statistics have been around much longer, so why do we suddenly need a new concept? The short answer is that we actually don't, but the sheer amount of data available nowadays calls for new combinations of existing skills. Full analysis of a data-heavy problem requires skills that were previously divided more clearly among computer science, statistics and the subject matter at hand. The multidisciplinary activity at this intersection has become known as data science.

 

Researchers often already combine subject matter knowledge with basic statistics skills. The aim of this course is to extend those to basic data science skills. We explore topics such as:

  • R or Python instead of SPSS, Excel, etc.
  • harvesting data from various sources
  • handling large data sets (note however that this course is not focused on big data)
  • handling messy data
  • exploring data with both visual and numeric approaches
  • applying familiar analysis methods to data and learning new ones
  • understanding the limitations of pure data analysis in research
  • overview of CSC's services for data science (available free of charge to researchers at higher education institutions)
 

We use a lot of examples and exercises to illustrate these topics, so we assume that participants can follow examples given in either R or Python. Knowing both is not necessary: examples are given in both languages, and exercises can be done with either. Basic knowledge of one of them should suffice. However, participants should be prepared to write code themselves. A large part of the course (and of data science in general) consists of 'data wrangling', the art of making data structures and their contents behave the way they are supposed to, which in fact looks a lot like programming.
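To give a flavor of what data wrangling can look like in practice, here is a minimal Python sketch. It is not taken from the course material: the sample data, the column names and the helper functions parse_number and wrangle are all invented for illustration, and only the Python standard library is assumed. It shows three typical chores, namely normalizing inconsistent labels, converting decimal commas, and recognizing several spellings of a missing value.

```python
import csv
import io
import statistics

# Hypothetical messy input (invented for this sketch): stray whitespace,
# inconsistent capitalization, decimal commas, and two different
# markers for missing values.
RAW = """site;temperature
Helsinki; 4.5
 espoo ;5,1
Oulu;NA
helsinki ;6.5
Oulu;-
Espoo;4,9
"""

# Spellings that we choose to treat as "missing" (an assumption of this sketch).
MISSING = {"", "na", "n/a", "-"}


def parse_number(text):
    """Return a float, or None when the cell holds a missing-value marker."""
    cleaned = text.strip().replace(",", ".")
    if cleaned.lower() in MISSING:
        return None
    return float(cleaned)


def wrangle(raw):
    """Group the usable temperature readings by normalized site name."""
    by_site = {}
    reader = csv.DictReader(io.StringIO(raw), delimiter=";")
    for row in reader:
        site = row["site"].strip().title()  # " espoo " becomes "Espoo"
        value = parse_number(row["temperature"])
        by_site.setdefault(site, [])  # keep the site even if it has no data
        if value is not None:
            by_site[site].append(value)
    return by_site


readings = wrangle(RAW)
# Average only the sites that have at least one usable reading.
means = {site: statistics.mean(vals) for site, vals in readings.items() if vals}
print(means)
```

Most of the code deals with getting the values into a consistent shape; the actual analysis is a single line at the end. The same steps could equally be written in R.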

Course material is available at https://github.com/CSC-IT-Center-for-Science/data-stat-course

Program

Tuesday 5th of May

9:00-10:30 Lecture 1: Introduction and preliminaries

10:30-10:45 Coffee break

10:45-12:15 Lecture 2: Data acquisition

12:15-13:00 Lunch break

13:00-14:30 Lecture 3: Data wrangling, pt. 1

14:30-14:45 Coffee break

14:45-16:15 Lecture 4: Data wrangling, pt. 2

 

Wednesday 6th of May

9:00-10:30 Lecture 5: Data wrangling, pt. 3

10:30-10:45 Coffee break

10:45-12:15 Lecture 6: Visualization

12:15-13:00 Lunch break

13:00-14:30 Lecture 7: Research related topics

14:30-14:45 Coffee break

14:45-16:15 Final exercises

 

Each lecture will also contain exercises.