Introduction to data science for researchers - Training
Date: 06.05.2015 9:00 - 07.05.2015 16:15
Location details: -
Language: English
Lecturers: Seija Sirkiä, Harri Hämäläinen
Price: -
Data science has become a trendy subject in recent years. Along with big data, it appears to be the latest fashion, particularly in business. But data analysis and statistics have been around much longer, so why do we suddenly need a new concept? The short answer is that we actually don't, but the sheer amount of data available nowadays calls for new combinations of existing skills. A full analysis of a data-heavy problem requires skills that previously would have been more clearly divided among computer science, statistics and the subject matter at hand. The multidisciplinary activity happening at this intersection has become known as data science.
Researchers often already combine subject matter knowledge with basic statistics skills. The aim of this course is to extend those into basic data science skills. We explore topics such as:
- R or Python instead of SPSS, Excel, etc.
- harvesting data from various sources
- handling large data sets (note however that this course is not focused on big data)
- handling messy data
- exploring data with both visual and numeric approaches
- applying familiar analysis methods to data and learning new ones
- understanding the limitations of pure data analysis in research
- overview of CSC's services for data science (available free of charge to researchers at higher education institutions)
We teach these topics largely through examples and exercises, so we assume that participants can follow examples given in either R or Python. Knowing both is not necessary, as examples are given in both languages and exercises can be done with either. Basic knowledge of one of them should suffice. However, participants should be prepared to write code themselves. A large part of the course (and of data science in general) consists of 'data wrangling', the art of making data structures and their contents behave the way they are supposed to, which in fact looks a lot like programming.
Course material is available at https://github.com/CSC-IT-Center-for-Science/data-stat-course
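To give a rough idea of what 'data wrangling' can look like in practice, here is a minimal Python sketch. It is not taken from the course material: the input data, the column names, the set of missing-value markers and the helper functions `parse_number` and `wrangle` are all invented for illustration. It assumes a small CSV-like text with stray whitespace, decimal commas and missing values, and turns it into tidy records before computing a simple summary.

```python
import csv
import io
import statistics

# Hypothetical messy input (invented for this sketch):
# stray whitespace, a decimal comma, and two kinds of missing value.
RAW = """site, temperature ,unit
Helsinki, 12.5 ,C
 Espoo,"13,1",C
Oulu,NA,C
Turku,,C
"""

# Assumed markers for a missing value; real data sets use many others.
MISSING = {"", "NA", "N/A", "-"}


def parse_number(text):
    """Return a float, or None if the cell is empty or marked as missing."""
    cleaned = text.strip()
    if cleaned in MISSING:
        return None
    # Accept both decimal points and decimal commas.
    return float(cleaned.replace(",", "."))


def wrangle(raw):
    """Turn messy CSV text into a list of dicts with cleaned, typed values."""
    reader = csv.reader(io.StringIO(raw))
    header = [name.strip() for name in next(reader)]
    rows = []
    for record in reader:
        if not record:  # skip blank lines
            continue
        row = dict(zip(header, (cell.strip() for cell in record)))
        row["temperature"] = parse_number(row["temperature"])
        rows.append(row)
    return rows


rows = wrangle(RAW)
# Keep only the rows where a temperature was actually observed.
observed = [r["temperature"] for r in rows if r["temperature"] is not None]
mean_temperature = statistics.mean(observed)
print(len(rows), "rows,", len(observed), "with a temperature; mean =", mean_temperature)
```

Most of the code deals with whitespace, number formats and missing values rather than with the analysis itself, which is the sense in which wrangling resembles ordinary programming. In R or with a Python data analysis library the same steps would be shorter, but the decisions (what counts as missing, how numbers are written) would still have to be made explicitly.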
Tuesday 5th of May
9:00-10:30 Lecture 1: Introduction and preliminaries
10:30-10:45 Coffee break
10:45-12:15 Lecture 2: Data acquisition
12:15-13:00 Lunch break
13:00-14:30 Lecture 3: Data wrangling, pt. 1
14:30-14:45 Coffee break
14:45-16:15 Lecture 4: Data wrangling, pt. 2
Wednesday 6th of May
9:00-10:30 Lecture 5: Data wrangling, pt. 3
10:30-10:45 Coffee break
10:45-12:15 Lecture 6: Visualization
12:15-13:00 Lunch break
13:00-14:30 Lecture 7: Research related topics
14:30-14:45 Coffee break
14:45-16:15 Final exercises
Each lecture will also contain exercises.