Big data computing

Large data collections pose new kinds of challenges for data processing: running out of memory, out of disk space, and out of time are common problems for those working with big data. We support the Apache Spark framework for processing and analysing large data collections. Spark automatically distributes the workload across multiple servers.
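As a minimal sketch of what this looks like in practice, the PySpark snippet below reads a CSV file and computes a grouped average; Spark splits the data into partitions and runs the aggregation in parallel on the worker nodes. The file name and column names here are hypothetical placeholders, not part of any CSC example data.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Create (or connect to) a Spark session; on a cluster the master address
# is typically picked up from the environment or submit configuration.
spark = SparkSession.builder.appName("csv-summary").getOrCreate()

# Hypothetical input file: Spark reads it in partitions that are processed
# in parallel on the worker nodes.
df = spark.read.csv("measurements.csv", header=True, inferSchema=True)

# The aggregation runs distributed across the cluster; only the small
# result table is collected back to the driver.
summary = df.groupBy("station").agg(F.avg("temperature").alias("avg_temp"))
summary.show()

spark.stop()
```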

Spark is supported on the upcoming Rahti container cloud. To use Spark, you first need a CSC user account and access to the Rahti service. Once logged into Rahti, you can choose the Apache Spark template and deploy your own cluster with a couple of clicks.
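Once the cluster is running, you connect to it from your own code by pointing a Spark session at the cluster's master address. The sketch below assumes a hypothetical master URL; the actual address and port are shown in the Rahti web console for your deployment and will differ.

```python
from pyspark.sql import SparkSession

# Hypothetical master URL: replace with the address reported by your own
# Rahti-deployed Spark cluster.
spark = (
    SparkSession.builder
    .master("spark://spark-master.example.rahtiapp.fi:7077")
    .appName("rahti-connection-test")
    .getOrCreate()
)

# Quick sanity check that work is actually dispatched to the executors.
n = spark.sparkContext.parallelize(range(1_000_000)).count()
print(n)

spark.stop()
```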

Big data tools such as Spark, Hadoop, and Kafka can also be deployed by the user to the cPouta cloud. This allows maximum flexibility for tailoring the environment, but requires more expertise to accomplish.