Virtual datasets

Virtual datasets are useful for managing large datasets that are split into multiple files. For example, the 10m and 2m DEMs available in Taito are often convenient to use through virtual rasters.

Virtual rasters are just XML files that tell GDAL where the actual data can be found, but from the user's point of view they can be treated much like any other raster format. Virtual rasters are useful because they allow large datasets to be handled as if they were a single file, eliminating the need to locate the correct files in each part of your script.
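
To make the idea concrete, here is a minimal sketch of what a virtual raster can look like. The file names tile_a.tif and tile_b.tif, the raster sizes and the output name example.vrt are made up for illustration. A file produced by GDAL for real data would normally also carry georeferencing information (such as SRS and GeoTransform elements), which is left out here for brevity.

```shell
# Write a hand-made, illustrative virtual raster that places two
# hypothetical 3000 x 3000 pixel tiles side by side in one 6000 x 3000 raster.
cat > example.vrt <<'EOF'
<VRTDataset rasterXSize="6000" rasterYSize="3000">
  <VRTRasterBand dataType="Float32" band="1">
    <SimpleSource>
      <SourceFilename relativeToVRT="1">tile_a.tif</SourceFilename>
      <SourceBand>1</SourceBand>
      <SrcRect xOff="0" yOff="0" xSize="3000" ySize="3000"/>
      <DstRect xOff="0" yOff="0" xSize="3000" ySize="3000"/>
    </SimpleSource>
    <SimpleSource>
      <SourceFilename relativeToVRT="1">tile_b.tif</SourceFilename>
      <SourceBand>1</SourceBand>
      <SrcRect xOff="0" yOff="0" xSize="3000" ySize="3000"/>
      <DstRect xOff="3000" yOff="0" xSize="3000" ySize="3000"/>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>
EOF
```

Each SimpleSource entry names one real file and states where its pixels land in the combined raster, which is all the "virtual" file needs to contain.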

For example, the 2m DEM is available in Taito at /proj/ogiir-csc/mml/dem2m. It is, however, split into a number of tif files (map sheets). If we wanted, for example, to calculate zonal statistics for areas scattered around the whole of Finland, we would have to find out which elevation model file covers which area and compute the statistics from the correct file. Further complications would arise if an area we want statistics for happens to lie on the border between two or more map sheets. These issues can be avoided by creating a virtual raster for the whole study area, after which GDAL takes care of them automatically.


Creating virtual rasters with GDAL
As virtual rasters are just XML, we could even write one by hand in a text editor. This is of course impractical for any large number of files. GDAL has a very nice tool, gdalbuildvrt, which creates the virtual raster for us. To use GDAL in Taito we must first load the module with:

module load gdal

gdalbuildvrt is very simple to use. It takes a list of files and the name of the output virtual raster as parameters, like so:

gdalbuildvrt -input_file_list file_list.txt virtual_raster.vrt

Note that the tool has other options available, but for this example only the most basic functionality is required.

In our 2m DEM example, a list of files (including paths) can be generated using find:

find /proj/ogiir-csc/mml/dem2m/ -name "*.tif" > file_list.txt

The above command searches the dem2m folder recursively for all files ending in .tif and writes their paths to file_list.txt, which can then be supplied to gdalbuildvrt as an argument.
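
The two steps can also be combined into a small script. The following is a sketch only: the function names build_file_list and build_vrt are made up for this example, and the sort step and the empty-list check are additions that gdalbuildvrt itself does not require.

```shell
# Hypothetical helper: list all .tif files under directory $1 into file $2.
build_file_list() {
    find "$1" -name "*.tif" | sort > "$2"
}

# Hypothetical helper: build virtual raster $2 from the file list $1.
build_vrt() {
    # Stop early if the list is empty, e.g. because of a mistyped path.
    if [ ! -s "$1" ]; then
        echo "No .tif files listed in $1" >&2
        return 1
    fi
    gdalbuildvrt -input_file_list "$1" "$2"
}

# Possible use in Taito, after "module load gdal":
# build_file_list /proj/ogiir-csc/mml/dem2m/ file_list.txt
# build_vrt file_list.txt virtual_raster.vrt
```

Checking the list before calling gdalbuildvrt makes a wrong search path show up as a clear message rather than as a confusing error later on.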

Once the virtual raster has been created, it can be used and visualized like any other raster file in software that utilises GDAL, including many Python GIS modules, QGIS, GRASS and SAGA. It is worth noting that while running an analysis on a 2m DEM covering the whole of Finland is entirely feasible in Taito, viewing the data with, for example, QGIS is not practical for such a large dataset.

Working with virtual rasters in different GIS software

If your virtual raster is of a size that could be handled as, for example, a normal single tif file, then most GIS software should be able to use it without problems. For working with larger datasets:

  • QGIS can be used to view large virtual rasters (with hierarchical structure and overviews) smoothly.
  • Python packages such as rasterio and rasterstats can use large virtual rasters relatively efficiently (see training github).
  • In R, reading and querying virtual rasters with the raster package works fine.
  • GDAL allows you to specify an operating area, so you can work quite efficiently with large vrt files too if you don't actually need the entire data.
  • GRASS can link to external vrt files with the r.external tool and also allows setting a computational region, in a similar fashion to GDAL, to process only a small part of a vrt file or to process a vrt file in parallel tiles (see training github). However, r.external seems to take a long time for really large virtual rasters (the whole 2m DEM of Finland in Taito, for example). Viewing large vrt files is not as smooth as in QGIS.
  • TauDEM reads vrt files, but as the output files are rasters covering the same area as the input vrt, there is little point in using large vrt files with TauDEM.
  • SAGA GIS can import vrt files, but this simply results in one large SAGA grid file, so again there is not much advantage in using a large vrt to begin with.
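
As a sketch of the GDAL approach mentioned in the list, gdal_translate can cut a window out of a virtual raster with its -projwin option, so that GDAL should only need to read the source files that overlap the window. The function name extract_window and the coordinates in the usage comment are hypothetical, and the coordinates are assumed to be in the same coordinate system as the vrt.

```shell
# Hypothetical helper: copy a rectangular window of a virtual raster to a tif file.
# Arguments: input vrt, output tif, then the window corners in the order
# upper-left x, upper-left y, lower-right x, lower-right y.
extract_window() {
    if [ "$#" -ne 6 ]; then
        echo "usage: extract_window in.vrt out.tif ulx uly lrx lry" >&2
        return 1
    fi
    gdal_translate -projwin "$3" "$4" "$5" "$6" "$1" "$2"
}

# Example call with made-up coordinates:
# extract_window virtual_raster.vrt subset.tif 380000 6680000 390000 6670000
```

The resulting subset is an ordinary tif file, so it can then be opened in software that struggles with the full virtual raster.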