Big data computing

Large data collections pose new kinds of challenges for data processing. Running out of memory, out of disk space, or out of time are common problems when working with big data.

Apache Spark on Rahti

For processing and analysing large data collections, the Apache Spark framework is available. It automatically distributes the workload across multiple servers.
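
As an illustration of what this distribution looks like in practice, here is a minimal PySpark sketch (assuming the pyspark package is installed). It runs across all local CPU cores via local[*]; on a cluster, the same code would instead be spread over worker nodes.

    from pyspark.sql import SparkSession

    # Run across all local CPU cores; on a cluster, the same code is
    # distributed across worker nodes instead.
    spark = (
        SparkSession.builder
        .appName("parallel-sum-sketch")
        .master("local[*]")
        .getOrCreate()
    )

    # Spark splits the range into partitions and processes them in parallel.
    rdd = spark.sparkContext.parallelize(range(1_000_000), numSlices=100)
    total = rdd.map(lambda x: x * x).sum()
    print(total)

    spark.stop()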

Apache Spark is supported on the Rahti container cloud service. To use Apache Spark, you first need a CSC user account and access to the Rahti service. When logged into Rahti, you can choose the Apache Spark template and deploy your own cluster with a couple of clicks.
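
Once a cluster is running, applications attach to it by pointing a Spark session at the cluster's master URL. The sketch below assumes a standalone Spark master at a hypothetical address; the actual address depends on your Rahti deployment.

    from pyspark.sql import SparkSession

    # The master URL below is hypothetical; use the address your own
    # Rahti deployment reports (typically spark://<master-host>:7077).
    spark = (
        SparkSession.builder
        .appName("rahti-cluster-job")
        .master("spark://spark-master.example.org:7077")
        .getOrCreate()
    )

    df = spark.range(10_000_000)  # a DataFrame partitioned across the cluster
    print(df.selectExpr("sum(id)").first()[0])

    spark.stop()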

Big data tools on cPouta

Various big data tools, such as Spark, Hadoop and Kafka, can also be deployed by the user to the cPouta cloud. This allows maximum flexibility for tailoring the environment, but requires more expertise to accomplish.
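
For example, once Kafka has been installed and started on a cPouta virtual machine, messages can be produced and consumed from Python. This sketch uses the kafka-python package and a hypothetical broker address; substitute the floating IP or hostname of your own VM.

    from kafka import KafkaConsumer, KafkaProducer

    BROKER = "kafka.example.org:9092"  # hypothetical broker address

    # Publish a few messages to a topic.
    producer = KafkaProducer(bootstrap_servers=BROKER)
    for i in range(3):
        producer.send("test-topic", value=f"message {i}".encode("utf-8"))
    producer.flush()

    # Read them back from the beginning of the topic.
    consumer = KafkaConsumer(
        "test-topic",
        bootstrap_servers=BROKER,
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,  # stop waiting after 5 s of silence
    )
    for record in consumer:
        print(record.value.decode("utf-8"))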