Geocomputing

CSC offers a wide range of computing services for any field of research. Moving to our supercomputers might make sense in these cases:

  • Computing something takes more than 2-4 hours
  • You need more memory
  • You are working with very big datasets
  • You need to work with GIS data or software already available in Puhti
  • You want to keep your desktop computer for normal usage and do computation elsewhere
  • You have a need for a server computer (cPouta)
  • You have a need for a lot of computers with the same set-up for courses (CSC Notebooks)
  • You are running GPU or MPI programs

Use of CSC's computing environments is mostly free-of-charge for users from Finnish universities and state research institutes.

Our supercomputers have fast data I/O and a lot more memory than normal desktop computers. In general, the computing speed of one CPU is not much higher than that of a normal desktop computer, but there are thousands of CPUs compared to a few in a desktop computer. Using our supercomputers can significantly reduce computing time if the analysis can run in parallel on several CPUs or on a GPU.

For GIS users, the Puhti supercomputer and the cPouta cloud service should be especially valuable computing environments.


Webinar recording on geocomputing in CSC's environments

Puhti for geocomputing

Our Puhti supercomputer could be the first option to consider for GIS users. Puhti has several GIS software packages preinstalled and also includes some bigger Finnish datasets. The environment is ready for computing: you just need to log in and start working! However, Puhti is not a desktop or web application. It requires Linux command-line skills. The main reasons why Puhti might not be suitable for some analyses are the user's limited Linux skills and software incompatibility.

Software

It is possible to install most of the software available for Linux on Puhti. A lot of common GIS software is already installed on Puhti by CSC. Puhti's GIS software currently includes: CloudCompare, FORCE, GDAL, GRASS, LasTools, OpenDroneMap, Orfeo ToolBox, PCL, PDAL, QGIS, SagaGIS, sen2cor, SNAP, SPLITS, WhiteboxTools and Zonation. Additionally, R and Python are available with preinstalled spatial packages.

Before using the installed software, the related module must always be loaded first. Please see the page of the specific software for details. It is also possible to install software in the CSC environment yourself for personal use. Puhti also has GPU partitions, which are mostly used for deep learning.

Puhti's operating system is Linux, so software available only for Windows, such as ArcGIS or Erdas, cannot be installed on it. Server-type software, such as PostGIS or GeoServer, is not suitable for Puhti either. In these cases you can use our cPouta service, described below.

Data

Puhti has a shared data folder for spatial data, which is available for all users and includes the most important open GIS datasets of Finland. These include NLS DEM, lidar data and topographic database, LUKE VMI, all SYKE open data, and many more.

You can also move your own data to Puhti. There are different directories available for various purposes. In the scratch directory, everybody has 1 TB of space by default, which can be extended on request. Scratch is cleaned up periodically, so also keep a copy of your important files in the Allas object storage. GDAL and all other software based on it support reading data directly from Allas. However, GDAL does not support direct writing to Allas, so normally you have to write your output files to scratch first, and then move them to Allas. With Python and R scripts it is possible to write directly to Allas.
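As a minimal sketch of reading directly from Allas with GDAL-based tools: GDAL exposes S3-compatible object storage through its `/vsis3/` virtual file system, so an object can be addressed with a path built from the bucket and object names. The helper name, the bucket and the object key below are hypothetical, and the sketch assumes that the S3 endpoint and credentials for Allas have already been configured for GDAL.

```python
def allas_gdal_path(bucket, key):
    """Build a GDAL /vsis3/ path for an object in S3-compatible storage.

    bucket and key are placeholders chosen by the caller; nothing here
    checks that the object actually exists.
    """
    return f"/vsis3/{bucket}/{key.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical bucket and object names, for illustration only.
    path = allas_gdal_path("my-project-bucket", "dem/tile_01.tif")
    print(path)
    # With GDAL installed and configured, the path could then be opened
    # for reading, for example:
    #     from osgeo import gdal
    #     dataset = gdal.Open(path)
    # Outputs would still be written to scratch first and moved to Allas
    # afterwards, as described above.
```

The same path form can usually be passed to other GDAL-based tools wherever a file name is expected.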

Working with Puhti

Normally, work on Puhti is done using scripts. The most commonly used scripting languages for GIS are R, Python and bash. MATLAB and Julia are also available. If you are moving your existing R or Python scripts to Puhti from a Windows environment, you usually only need to modify the file paths. You also have to check that the packages you use are available, but you can install additional packages for your own use.
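One way to keep path changes small when moving a script between a Windows desktop and Puhti is to build every path from a single base folder. The sketch below is only an illustration: the function name, the `GEO_DATA_DIR` environment variable and the folder names are hypothetical, not something defined by CSC.

```python
import os
from pathlib import Path


def data_path(*parts, base=None):
    """Join path parts onto one base folder.

    The base comes from the `base` argument, then from the hypothetical
    GEO_DATA_DIR environment variable, and finally falls back to the
    current working directory.
    """
    root = Path(base or os.environ.get("GEO_DATA_DIR", "."))
    return root.joinpath(*parts)


if __name__ == "__main__":
    # Only the base folder changes between environments; the rest of the
    # script keeps using data_path("dem", "tile_01.tif").
    print(data_path("dem", "tile_01.tif", base="/scratch/project_x"))
```

On Windows the same call could be given a base such as `C:/data`, leaving the rest of the script unchanged.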

The scripts are run as jobs on Puhti. Jobs make it possible to organize and balance the use of computing resources between different users. A job is started with a batch job file. In principle, there are three kinds of batch jobs:

  • Single-core serial jobs with "normal" GIS-software. You run your code as it is, just on Puhti. This will not be much faster than using a desktop computer, but it will free up your desktop during long computations.
  • Array jobs with several cores with "normal" GIS-software. The idea of an array job is to run the same script several times simultaneously. These jobs are unaware of each other, and the user has no control over the order in which they are executed. In a GIS context, array jobs are useful, for example, if you are doing the same analysis for different map sheets, different scenarios, or different time periods.
  • Parallel jobs with several cores. Many scientific software packages support this option, so this is the most common usage type on Puhti. Some GIS software packages support parallel computing out-of-the-box. Many programming languages, such as R and Python, also support parallel computing, so you can write your own scripts that run in parallel.
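As a rough sketch of the parallel option in Python, the standard library's `concurrent.futures` can spread independent pieces of work, such as map sheets, over several cores. The function names, the sheet names and the "analysis" (a plain average) are placeholders. The sketch assumes the job was submitted with Slurm's `--cpus-per-task` option, in which case Slurm sets `SLURM_CPUS_PER_TASK`; otherwise it falls back to one worker.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def mean_elevation(sheet):
    """Stand-in for a real per-sheet analysis: average a list of values."""
    name, values = sheet
    return name, sum(values) / len(values)


def process_sheets(sheets, workers=None, executor_cls=ProcessPoolExecutor):
    """Run mean_elevation on every sheet in parallel, return {name: mean}."""
    if workers is None:
        # Set by Slurm only when --cpus-per-task was requested.
        workers = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
    with executor_cls(max_workers=workers) as pool:
        return dict(pool.map(mean_elevation, sheets))


if __name__ == "__main__":
    # Hypothetical map sheets with made-up elevation values.
    demo = [("sheet_a", [10.0, 12.0, 14.0]), ("sheet_b", [100.0, 110.0])]
    print(process_sheets(demo, workers=2))
```

The executor class is a parameter so that the same function can be tried with `ThreadPoolExecutor` when the work is dominated by I/O rather than computation.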

We provide some example scripts for spatial data analysis on Puhti. The examples also include batch job scripts. Some of the examples include similar solutions for serial, array, and parallel jobs. Examples are available for Allas, Python, R, FORCE, GDAL, GRASS, PDAL, SNAP, and machine learning. GeoPortti also has some longer examples on GitHub.
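To illustrate the array-job idea in a few lines (this is not one of the CSC example scripts): each task of a Slurm array job receives its own index in the `SLURM_ARRAY_TASK_ID` environment variable, and the script can use it to pick the input it should process. The function name and the file names are hypothetical, and the sketch assumes the array was submitted with indices starting at 0, for example `--array=0-2`.

```python
import os


def select_task_item(items, env=None):
    """Return the item that this array task should process.

    Reads SLURM_ARRAY_TASK_ID, which Slurm exports for each array task.
    Outside Slurm the variable is missing, so index 0 is used instead.
    """
    env = os.environ if env is None else env
    task_id = int(env.get("SLURM_ARRAY_TASK_ID", "0"))
    if not 0 <= task_id < len(items):
        raise IndexError(f"array task id {task_id} has no matching item")
    return items[task_id]


if __name__ == "__main__":
    # Hypothetical map sheet files, one per array task.
    map_sheets = ["sheet_a.tif", "sheet_b.tif", "sheet_c.tif"]
    print("This task processes:", select_task_item(map_sheets))
```

Because the tasks do not know about each other, each one should write to its own output file, for example one named after its input.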

Interactive usage

Puhti has an interactive partition for working with tools interactively. It is meant for smaller interactive analysis tasks and for using software with a graphical user interface (GUI). This way you can use, for example, CloudCompare, QGIS, SNAP, GRASS GIS, SagaGIS, RStudio or Spyder for Python. The Puhti web interface is the best option for using software with a GUI.

cPouta for geocomputing

cPouta is an Infrastructure-as-a-Service offered by CSC. It offers different hardware setups where the user has to install everything needed from scratch (operating system, software, network configuration, and so on). Therefore, cPouta is not suitable for trivial computing needs. On the other hand, this gives the user the freedom to install custom computing environments. cPouta is ideal for running server-type software, such as PostGIS and GeoServer. Expert users can also set up their own computing clusters. cPouta requires skills in server administration, software installation, and Linux.

CSC provides some example GIS installation guidelines for cPouta: ArcGIS Server, Agisoft Metashape, PostGIS, and GeoServer. GeoPortti provides guidelines for installing OpenDroneMap.

cPouta practically supports only different versions of Linux, so setting up ArcGIS Pro or desktop is not easily possible. The easiest way to use some ArcGIS functionality is to install ArcGIS Server for Linux on cPouta and run ArcPy scripts.

Performance hints for geocomputing

Scripts

  • Use profiling tools to see which parts of your script are the slowest, and look for ways to make those parts faster. Most programming languages have their own profiling tools.
  • Algorithms and functions from different packages might use different amounts of time for the same computational task.
  • Watch out for 'for' loops and look for alternatives, such as vectorized functions.
  • Make the script run in parallel.
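As a small illustration of the profiling hint, Python's built-in `cProfile` and `pstats` modules can report where time is spent in a function call. The two length functions below are made-up stand-ins for "the same task done in two ways", and `profile_call` is a hypothetical helper. The sketch shows how to obtain a report, not which variant is faster.

```python
import cProfile
import io
import math
import pstats


def total_length_loop(segments):
    """Sum segment lengths with an explicit for loop."""
    total = 0.0
    for x1, y1, x2, y2 in segments:
        total += ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    return total


def total_length_builtin(segments):
    """Same computation using sum() and math.hypot()."""
    return sum(math.hypot(x2 - x1, y2 - y1) for x1, y1, x2, y2 in segments)


def profile_call(func, *args):
    """Run func under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # show the top 5 entries
    return result, stream.getvalue()


if __name__ == "__main__":
    segments = [(0.0, 0.0, 3.0, 4.0)] * 100_000
    for func in (total_length_loop, total_length_builtin):
        _, report = profile_call(func, segments)
        print(report)
```

Comparing the reports of the two variants shows whether rewriting the loop is worth the effort for your data.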

Data

  • When working with big raster datasets, virtual rasters might be very helpful.
  • When working with big vector datasets, consider using a database.
  • Remove unnecessary data (clip, select, generalize).
  • Index vector data if your software can use it.
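To show why an index helps, here is a toy grid index for points in plain Python. It is a simplified stand-in, not how any particular GIS package implements indexing (real software typically uses structures such as R-trees), and all names and coordinates are hypothetical. A bounding-box query only inspects the grid cells that overlap the box instead of scanning every point.

```python
from collections import defaultdict


def build_grid_index(points, cell_size):
    """Map each grid cell (col, row) to the indices of the points inside it."""
    index = defaultdict(list)
    for i, (x, y) in enumerate(points):
        index[(int(x // cell_size), int(y // cell_size))].append(i)
    return index


def query_bbox(points, index, cell_size, xmin, ymin, xmax, ymax):
    """Return sorted indices of points inside the bounding box (inclusive)."""
    hits = []
    for col in range(int(xmin // cell_size), int(xmax // cell_size) + 1):
        for row in range(int(ymin // cell_size), int(ymax // cell_size) + 1):
            # Only points in cells overlapping the box are checked.
            for i in index.get((col, row), []):
                x, y = points[i]
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    hits.append(i)
    return sorted(hits)


if __name__ == "__main__":
    pts = [(5, 5), (15, 5), (25, 25), (105, 105)]
    grid = build_grid_index(pts, cell_size=10)
    print(query_bbox(pts, grid, 10, 0, 0, 20, 10))
```

The same query also acts as a simple "select" step: the returned indices can be used to drop points outside the area of interest before a heavier analysis.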

Next steps for starting with Puhti

Documentation

Learning materials

Geocomputing-related news is sent to the gis-hpc mailing list. You are welcome to join!

In case you have any questions or comments, or you need some other software or data on Puhti, please contact our service desk.