Geocomputing - Services for Research
CSC offers a wide range of very high level computing services for any field of research. Moving to our services might make sense in these cases:
- Computing something takes more than 2-4 hours
- You need more memory
- You are working with very big datasets
- You need to work with GIS data or software already available in Puhti
- You want to keep your desktop computer for normal usage and do computation elsewhere
- You have a need for a server computer (cPouta)
- You have a need for a lot of computers with the same set-up for courses (CSC Notebooks)
- You are running GPU or MPI programs
Use of CSC's computing environments is mostly free of charge for users from Finnish universities and state research institutes. When the conditions for free-of-charge use are not met, it is possible to use many of our services by purchasing them. For supercomputer resources, companies are encouraged to use LUMI; see also the LUMI section below.
Our supercomputers have more storage space and a lot more memory than normal desktop computers. In general, the computing speed of one CPU is not much higher than that of a normal desktop computer, but there are thousands of CPUs compared to a few in desktop computers. Using our supercomputers could significantly reduce computing time if the analysis can run in parallel on several CPUs or GPUs.
For GIS users, the Puhti and LUMI supercomputers and the cPouta cloud service should be especially valuable computing environments. If you are working with sensitive data, CSC provides special sensitive data services.
Webinar recording on geocomputing in CSC's environments
Puhti supercomputer for geocomputing
Our Puhti supercomputer could be the first option to consider for GIS users. Puhti has several GIS software packages preinstalled and also includes some bigger Finnish datasets. The environment is ready for computing: you just need to log in and start working! However, Puhti is not a desktop or web application. It requires some Linux command-line skills. The main reasons why Puhti might not be suitable for some analyses are the user's limited Linux skills and software incompatibility.
It is possible to install most of the software available for Linux on Puhti. A lot of common GIS software is already installed on Puhti by CSC. Puhti's GIS software includes at the moment: CloudCompare, FORCE, GDAL, GRASS, LasTools, NASA Ames Stereo Pipeline, OpenDroneMap, Orfeo ToolBox, PCL, PDAL, QGIS, SagaGIS, sen2cor, sen2mosaic, SNAP, WhiteboxTools and Zonation. Additionally, R and Python are available with preinstalled spatial packages. MATLAB and Julia are also available.
Before using any of the installed software, the related module must always be loaded first; please see the linked pages above for details about the specific software. It is also possible to install software in the CSC environment yourself for your project's use. The easiest option for installations is often the Tykky tool. Puhti also has GPU partitions, which among GIS users are mostly used for deep learning.
Puhti's operating system is Linux, so software available only for Windows, such as ArcGIS or Erdas, cannot be installed on it. Also, server-type software, such as PostGIS or GeoServer, is not suitable for Puhti. In these cases you can use our cPouta service, described below.
Puhti has a shared data folder for spatial data, which is available for all users and includes the most important open GIS datasets of Finland. These include NLS DEM, lidar data and topographic database, LUKE VMI, all SYKE open data, and many more.
You can also move your own data to Puhti. There are different directories available for various purposes. In the scratch directory, everybody has 1 TB of space by default, which can be extended on request. Scratch is cleaned up periodically, so keep a copy of your important files on Allas object storage as well. GDAL and all other software based on it support reading data directly from Allas. However, GDAL does not support direct writing to Allas, so normally you have to write your output files to scratch first, and then move them to Allas. With Python and R scripts it is possible to write directly to Allas.
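The read-from-Allas, write-to-scratch pattern described above can be sketched in Python as follows. GDAL addresses S3-compatible object storage through its `/vsis3/` virtual file system. The bucket name, object key and scratch path used here are hypothetical, and the sketch assumes that the S3 endpoint and credentials for Allas have already been configured for GDAL as described in the Allas documentation.

```python
from pathlib import PurePosixPath


def allas_gdal_path(bucket: str, key: str) -> str:
    """Build a GDAL /vsis3/ path for an object in an S3-compatible bucket."""
    return f"/vsis3/{bucket}/{key.lstrip('/')}"


def scratch_output_path(scratch_dir: str, key: str, suffix: str = "_out.tif") -> str:
    """Name the output file after the input object and place it in scratch."""
    stem = PurePosixPath(key).stem
    return str(PurePosixPath(scratch_dir) / f"{stem}{suffix}")


def reproject_from_allas(bucket: str, key: str, scratch_dir: str, epsg: int) -> str:
    """Read a raster directly from Allas and write the result to scratch.

    Assumption: GDAL's Python bindings are installed and the S3 endpoint
    and credentials are already configured in the environment.
    """
    from osgeo import gdal  # imported here so the path helpers work without GDAL

    gdal.UseExceptions()
    src = allas_gdal_path(bucket, key)
    dst = scratch_output_path(scratch_dir, key)
    gdal.Warp(dst, src, dstSRS=f"EPSG:{epsg}")
    return dst


if __name__ == "__main__":
    # Hypothetical bucket, object key and scratch directory.
    print(allas_gdal_path("my-bucket", "dem/tile.tif"))
    print(scratch_output_path("/scratch/project_x", "dem/tile.tif"))
```

After the job has finished, the file written to scratch would still need to be copied to Allas with the usual data transfer tools.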
LUMI supercomputer for geocomputing
LUMI is one of the largest supercomputers in the world, providing especially a lot of GPU resources, but also CPU resources. Access to LUMI is applied for via a national or EU quota for academic projects. For spatial data researchers, LUMI could be valuable in international projects or public-private collaborations. Companies also have good possibilities to use LUMI.
Compared to Puhti, LUMI has no local spatial data and a very limited list of pre-installed software; users are expected to install the required tools themselves. The same possibilities and limitations apply as to tools on Puhti. The easiest option for installing many GIS tools is the Container wrapper (similar to Tykky in Puhti), which can create custom pip and conda installations or convert a Docker container to an Apptainer container, which is suitable for the supercomputer. CSC also provides help with installing tools on LUMI. If you are interested in using LUMI for spatial data analysis, contact the CSC service desk.
Working with a supercomputer
Normally, work with a supercomputer is done using scripts. The most commonly used scripting languages for GIS are R, Python and bash. If you are moving your existing scripts to a supercomputer from a Windows environment, you usually only need to modify the file paths. You also have to confirm that the packages you use are available, but you can install your own packages for your own use.
The scripts are run as batch jobs. Jobs make it possible to organize and balance the use of computing resources between different users. A job is started by a batch job file. In principle, there are three kinds of batch jobs:
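One way to keep a script portable between a Windows desktop and a supercomputer is to avoid hard-coded absolute paths and to build every file path from a single base directory. The sketch below is only an illustration: the environment variable `GEO_DATA_DIR`, the folder layout and the file name are hypothetical, not a CSC convention.

```python
import os
from pathlib import Path


def data_dir(env=None, default="data"):
    """Return the base data directory.

    Uses the hypothetical GEO_DATA_DIR variable if it is set,
    otherwise falls back to a relative default folder.
    """
    env = os.environ if env is None else env
    return Path(env.get("GEO_DATA_DIR", default))


def input_path(filename, env=None):
    """Build the path of an input file below the base data directory."""
    return data_dir(env) / "input" / filename


if __name__ == "__main__":
    # On a desktop the default relative folder is used; on a supercomputer
    # GEO_DATA_DIR could point to a scratch folder instead.
    print(input_path("roads.gpkg"))
```

With this layout, only the value of the environment variable changes between machines, while the script itself stays the same.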
- Single-core serial jobs with "normal" GIS-software. You run your code as it is, just on the supercomputer. This will not be much faster than using a desktop computer, but it will free up your desktop during long computations.
- Embarrassingly parallel jobs with "normal" GIS-software. The idea is to run the same script several times simultaneously. The jobs are unaware of each other, and the user has no control over the order in which they are executed. In a GIS context, these jobs are useful, for example, if you are doing the same analysis for different map sheets, different scenarios, or different time periods. Several tools are available to support this type of job.
- Parallel jobs with several cores. Many scientific software packages support this option, so this is the most common usage type on Puhti. Some GIS software packages support parallel computing out-of-the-box. Many programming languages, such as R and Python, support parallel computing. Using these features, you can write your own scripts that run in parallel.
We provide some example scripts for spatial data analysis in Puhti. The examples also include batch job scripts. Some of the examples include similar solutions for serial and parallel jobs. Examples are for Allas object storage, Python, R, FORCE, GDAL, GRASS, PDAL, SNAP, and machine learning. Also, GeoPortti provides some longer examples on GitHub.
Puhti has an interactive partition for using tools in a regular way. It is meant for smaller interactive analysis tasks and for using software with a graphical user interface (GUI). This way you can use, for example, CloudCompare, Jupyter, QGIS, SNAP, GRASS GIS, SagaGIS, Zonation, RStudio or Spyder for Python. The Puhti web interface is the best option for using software with a GUI.
The LUMI web interface is still under development, but it will be similar to the Puhti web interface.
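The embarrassingly parallel case can be sketched with an array job, assuming the batch system is Slurm: every task runs the same script and uses its task index to pick its own input. Slurm sets the `SLURM_ARRAY_TASK_ID` environment variable for each array task. The map sheet names and the `process_sheet` function below are hypothetical placeholders for a real analysis.

```python
import os


def select_sheet(sheets, env=None):
    """Pick the map sheet for this array task using its zero-based index."""
    env = os.environ if env is None else env
    # Outside an array job the variable is missing, so fall back to index 0.
    task_id = int(env.get("SLURM_ARRAY_TASK_ID", "0"))
    if not 0 <= task_id < len(sheets):
        raise IndexError(f"task id {task_id} has no matching map sheet")
    return sheets[task_id]


def process_sheet(sheet):
    """Placeholder for the real analysis of one map sheet."""
    return f"processed {sheet}"


if __name__ == "__main__":
    MAP_SHEETS = ["sheet_A", "sheet_B", "sheet_C", "sheet_D"]  # hypothetical names
    print(process_sheet(select_sheet(MAP_SHEETS)))
```

Submitted with an array range such as `--array=0-3`, each of the four tasks would handle one sheet independently of the others.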
cPouta for geocomputing
cPouta is an Infrastructure-as-a-Service offered by CSC. It offers different hardware setups where the user has to install everything needed from scratch (operating system, software, network configuration, and so on). Therefore, cPouta is not suitable for trivial computing needs. On the other hand, this gives the user the freedom to install custom computing environments. cPouta is ideal for running server-type software, such as PostGIS and GeoServer. Expert users can also set up their own computing clusters. cPouta requires skills in server administration, software installation, and Linux.
In practice, cPouta supports only different versions of Linux, so setting up ArcGIS Pro or ArcGIS Desktop is not possible. The easiest way to use some ArcGIS functionality is to install ArcGIS Server for Linux on cPouta and run ArcPy scripts.
Performance hints for geocomputing
- Use profiling tools to see which parts of your script are the slowest. Look for possibilities to make the slowest parts faster. All programming languages have their own profiling tools.
- Algorithms and functions from different packages might use different amounts of time for the same computational task.
- Watch out for 'for' loops and look for alternatives, such as vectorized operations.
- Make the script run in parallel.
- When working with big raster datasets, virtual rasters might be very helpful.
- When working with big vector datasets, consider using a database.
- Remove unnecessary data (clip, select, generalize).
- Index vector data if your software can use it.
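As a minimal sketch of the first hint, Python's standard library modules `cProfile` and `pstats` can report where a script spends its time. The two functions being compared are hypothetical stand-ins for a slow loop and a faster alternative that gives the same result.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Sum i*i for i in 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum_of_squares(n):
    """Same result from the closed form (n-1) * n * (2n-1) / 6."""
    return (n - 1) * n * (2 * n - 1) // 6


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_call(slow_sum_of_squares, 1_000_000)
    print(report)
    print(result == fast_sum_of_squares(1_000_000))
```

The report lists each function with its call count and time, which helps to decide which part of a script is worth optimizing or running in parallel.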
Next steps for starting with CSC computing services
- Steps to get started with CSC computing services: accounts, projects, services, billing units.
- Connecting to CSC supercomputers
- cPouta documentation
- Moving data to CSC services and back
- Linux tutorial
- CSC computing environment self-learning course
- GIS course materials, including Geocomputing using CSC resources, R and Python GIS courses, machine learning with spatial data, etc.
- Geocomputing seminar materials, including point cloud and EO workshops and several use case presentations.
Geocomputing-related news is sent to the gis-hpc mailing list; you are welcome to join!
In case you have any questions or comments, or you need some other software or data on Puhti, please contact our service desk.