Geocomputing

CSC offers a wide range of very high-level computing services for any field of research. A typical GIS user does daily work with desktop software. Moving to CSC computers might make sense in these cases:

  • Computing something takes more than 2-4 hours
  • Need more memory
  • Working with very big datasets
  • Keep your desktop computer for normal usage, do computation elsewhere
  • Need for a server computer (cPouta)
  • Need for a lot of computers with the same set-up (courses)
  • GPU or MPI programs
Use of CSC's computing environments is normally free of charge for users from Finnish universities. Other users are also welcome; for them the price list is available here.

CSC's supercomputers have fast data I/O and a lot more memory than normal desktop computers. In general, the computing speed of one CPU is not much better than that of a normal desktop computer, but there are thousands of CPUs compared to the few in a desktop computer. Taito also has GPU accelerators. Using CSC's computers can give significant benefits if the analysis can run in parallel on several CPUs.

 

Practical solutions in CSC's environment

So far the use of these services for GIS has been rather limited. One big problem for GIS users has been that the widely used ArcGIS software is available only for the Windows operating system, while CSC's supercomputers run Linux. In addition, GIS software is normally not designed for running in parallel or for using other supercomputing concepts.

Taito

Taito is CSC's supercluster and could be the first option to consider for GIS users. Most software available for Linux can be installed on Taito, and CSC has already installed a lot of software there. At the moment Taito's GIS software includes QGIS, GDAL/OGR, Proj.4, SagaGIS and R with several spatial packages. Taito is a Linux machine, so software available only for Windows cannot be installed there, and neither can server-type software. This means that ArcGIS, Erdas, PostGIS or GeoServer cannot be added there. You can also install software in the CSC environment yourself for personal use. If you need software that could also be useful for others, please send an e-mail to CSC servicedesk asking for it to be installed.

Programs written in C/C++, Fortran, Python or Perl should work relatively well. On the other hand, Java, JavaScript and other interpreted languages have less potential. A command-line interface or API and the possibility to write scripts are also a big advantage, although a graphical interface may also be used via X shell or NoMachine desktop.

Taito has a shared data folder for spatial data, which is available for all users.

Alternatives for using Taito:

  1. Using single-core serial jobs with "normal" GIS software. You run your code as it is, just on Taito. This will not be much faster than on a desktop, but for long computations just freeing up your desktop might be useful. You can also use the extra memory and faster input/output of Taito.
  2. Using several cores with array jobs, with "normal" GIS software. The idea of an array job is to start several jobs at the same time; these jobs are unaware of each other, and the user has no control over their execution order. In a GIS context array jobs are useful, for example, if you are doing the same analysis for different map sheets, different scenarios or different time periods.
  3. Using several cores with parallel jobs.
    • Many scientific software packages support this option, so this is the most common usage type in Taito. However, only very few GIS software packages support parallel computing out of the box; see the list below.
    • Many programming languages support parallel computing (for example snow or foreach in R, or multiprocessing or parallel in Python). Using these features the user has control over the workflow and which parts of the code are run in parallel.
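
The difference between alternatives 2 and 3 can be sketched in Python as follows. This is only a minimal illustration under several assumptions: the batch system is taken to be Slurm, which sets the environment variable SLURM_ARRAY_TASK_ID for each task of an array job (submitted for example with `sbatch --array=0-3`); the map sheet identifiers and the function `process_sheet` are hypothetical placeholders for your own inputs and analysis.

```python
import os
from multiprocessing import Pool

# Hypothetical map sheet identifiers; replace with your own inputs.
MAP_SHEETS = ["L4131", "L4132", "L4133", "L4134"]


def process_sheet(sheet_id):
    """Placeholder for the real analysis of one map sheet."""
    # Stand-in computation so the sketch runs without any GIS library.
    return sheet_id, sum(ord(ch) for ch in sheet_id)


def run_array_task():
    """Alternative 2: each array task handles exactly one sheet."""
    # Assumption: Slurm is the batch system and the array indices
    # start at 0, so the task index selects one entry of MAP_SHEETS.
    task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
    return [process_sheet(MAP_SHEETS[task_id])]


def run_parallel(n_workers=4):
    """Alternative 3: one job spreads the sheets over several cores."""
    with Pool(processes=n_workers) as pool:
        return pool.map(process_sheet, MAP_SHEETS)


if __name__ == "__main__":
    if "SLURM_ARRAY_TASK_ID" in os.environ:
        results = run_array_task()
    else:
        results = run_parallel()
    for sheet_id, value in results:
        print(sheet_id, value)
```

In the array job case every task runs the same script independently and never sees the other sheets. In the parallel case one script decides how the work is divided, which gives more control but also requires that all cores are reserved for a single job.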

 

cPouta

cPouta is an Infrastructure-as-a-Service type of service, so the user has to do all the setup work (software installation, network configuration etc.). This makes it unsuitable for smaller tasks, but on the other hand it gives the user a lot more freedom. In principle, Windows installations are also possible in cPouta. Any GIS software can be installed in cPouta; it is most attractive for software not suitable for Taito, for example ArcGIS, PostGIS or GeoServer. A wide range of virtual machine flavours is available in cPouta, some of them specially designed for HPC or fast I/O.

The easiest way to use ArcGIS functionality is to install ArcGIS Server for Linux and then run ArcPy scripts; see instructions for that.

For running open source software the easiest way might be installing OSGeoLive.

 

Some more hints for faster geocomputing

It is always recommended to take a critical look at your code and your data as well. Small changes in these might have a big impact on computing times.

Software:
  • Use profiling tools to see which parts of your workflow are the slowest.
  • Look for possibilities to make the slowest parts faster. Different algorithms and different software products may take quite different amounts of time for the same computation.
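
As a minimal sketch of the profiling step, the example below uses cProfile and pstats from the Python standard library. The functions `slow_step`, `fast_step` and `workflow` are hypothetical stand-ins for the parts of a real workflow; they compute the same sum in two ways to show how a different algorithm changes the time spent.

```python
import cProfile
import io
import pstats


def slow_step(n):
    """Hypothetical slow part of a workflow: a pure-Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_step(n):
    """Hypothetical cheap part: closed-form sum of squares below n."""
    return (n - 1) * n * (2 * n - 1) // 6


def workflow(n=200_000):
    """Run both steps; they return the same value."""
    return slow_step(n), fast_step(n)


def profile_workflow():
    """Run the workflow under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    workflow()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and show at most the five top entries.
    stats.sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_workflow())
```

The report lists the time spent in each function, so the slowest part of the workflow can be identified before deciding what to optimize.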

Data:

  • When working with big vector data sets, using a database could be appropriate.
  • Make sure vector data has indices.
  • Use an appropriate level of detail (generalize if needed).
  • Include only the needed area (clip if needed).
  • Keep only relevant data as attributes (delete some of the attributes if needed).
  • Avoid empty unused space in the .dbf files of shapefiles.
  • For some analyses it might be better to divide your data into parts. For example, in ArcGIS Tabulate Area was about 100 times faster with 1000 input files of one polygon each than with one file of 1000 polygons, when each polygon was calculated separately.
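
Some of the data hints above (a database, an index, only the needed area, only the relevant attributes) can be sketched with the sqlite3 module from the Python standard library. This is an illustration under assumptions, not a recommendation of a specific product: the table layout, the column names and the coordinates are hypothetical, and an ordinary index on the coordinate columns stands in for the real spatial index that a spatial database would provide.

```python
import sqlite3


def build_point_table(points):
    """Load (x, y, name, notes) tuples into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE points ("
        "id INTEGER PRIMARY KEY, x REAL, y REAL, name TEXT, notes TEXT)"
    )
    conn.executemany(
        "INSERT INTO points (x, y, name, notes) VALUES (?, ?, ?, ?)",
        points,
    )
    # Ordinary index on the coordinates; a stand-in for a spatial index.
    conn.execute("CREATE INDEX idx_points_xy ON points (x, y)")
    return conn


def clip_to_bbox(conn, xmin, ymin, xmax, ymax):
    """Return only the points inside the box, without the notes column."""
    cursor = conn.execute(
        "SELECT id, x, y, name FROM points "
        "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? "
        "ORDER BY id",
        (xmin, xmax, ymin, ymax),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    # Hypothetical sample data.
    sample = [
        (0.0, 0.0, "a", "long free-text note"),
        (5.0, 5.0, "b", "long free-text note"),
        (10.0, 10.0, "c", "long free-text note"),
    ]
    connection = build_point_table(sample)
    for row in clip_to_bbox(connection, 1.0, 1.0, 9.0, 9.0):
        print(row)
```

The query returns only the rows and columns the analysis needs, so later steps handle less data.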

 

Software suitable for supercomputers

Some international projects have developed GIS software for use with supercomputers. In these cases the software can make use of the special characteristics of supercomputers, such as running in parallel or using GPUs for processing.

GRASS has limited support for parallel GRASS jobs. Some script examples of running several GRASS jobs in parallel are also available. There has also been one attempt to use GPUs. Some related articles:

In the CyberGIS project, some open source GIS software packages with support for parallel runs were developed:

Some others:

If you have any questions or comments, or any interest in using CSC's supercomputers, contact CSC servicedesk.