ArcGIS computing

ArcGIS on local machine

  • For best performance, use ArcGIS Pro, or ArcMap with ArcGIS for Desktop Background Geoprocessing (64-bit). On systems with large amounts of RAM, 64-bit processing may help with large datasets that would otherwise fail in a 32-bit environment.
  • Using the in_memory workspace instead of geometry objects is faster, but it uses RAM: if the calculation itself also needs memory and a big dataset is loaded, there might not be enough memory available.
  • It is not possible to use ArcGIS Pro or ArcMap in CSC's Taito computing environment. ArcPy scripts written for ArcMap can be run in cPouta only by installing ArcGIS Server for Linux.
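
The in_memory workspace mentioned above can be used for intermediate data roughly as follows. This is a minimal sketch, not a definitive implementation: buffer_via_in_memory and the dataset name temp_copy are hypothetical, and the arcpy module is passed in as the gp argument so that the function itself does not import it.

```python
def buffer_via_in_memory(gp, in_features, out_features, distance):
    """Copy the input to in_memory, buffer it there, then free the memory.

    gp is expected to be the arcpy module (passed in by the caller).
    """
    temp = "in_memory/temp_copy"  # hypothetical intermediate dataset name
    try:
        # Intermediate copy lives in RAM instead of on disk.
        gp.CopyFeatures_management(in_features, temp)
        # Only the final result is written to out_features.
        gp.Buffer_analysis(temp, out_features, distance)
    finally:
        # Clear the in_memory workspace so the RAM is released.
        gp.Delete_management("in_memory")
```

In an ArcGIS Python environment the call could look like buffer_via_in_memory(arcpy, "roads.shp", "roads_buffer.shp", "100 Meters"), where the file names are placeholders. Deleting the in_memory workspace afterwards matters because its contents otherwise stay in RAM for the rest of the session.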

 

ArcPy

For running ArcPy scripts in the CSC computing environment, the best option is to install ArcGIS Server in cPouta. The installation instructions can be found on GitHub. ArcGIS Server also includes the ArcPy library, so ArcPy scripts can be run on a server machine, for example in cPouta, just as with ArcGIS for Desktop.

 

ArcGIS in parallel

ArcMap and ArcGIS Pro both have some functions that can run in parallel.

It is also possible to write ArcPy scripts where parallelization is added with the Python Multiprocessing or Parallel Python packages.
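
As a minimal sketch of the Multiprocessing approach, the example below starts one task per input dataset with the standard library's multiprocessing.Pool. The file paths and the row-counting task are hypothetical placeholders. arcpy is imported inside the worker, and when it is missing the worker only reports that, which is an assumption made so that the sketch also runs without ArcGIS.

```python
import multiprocessing


def process_one(path):
    """Worker: handle a single input dataset in its own process."""
    try:
        # Imported here, so every worker process loads arcpy itself.
        import arcpy
    except ImportError:
        # Assumption for this sketch: without ArcGIS, just report it.
        return (path, "arcpy not available")
    # Hypothetical task: count the rows of the dataset.
    count = int(arcpy.GetCount_management(path).getOutput(0))
    return (path, count)


def run_parallel(paths, processes=2):
    """Run process_one for every path, one dataset per task."""
    pool = multiprocessing.Pool(processes=processes)
    try:
        return pool.map(process_one, paths)
    finally:
        # close/join instead of a with-block, so this also works on Python 2.7.
        pool.close()
        pool.join()


if __name__ == "__main__":
    # Hypothetical inputs: separate files, so no two workers share a dataset.
    inputs = [r"C:\data\tile_1.shp", r"C:\data\tile_2.shp"]
    for path, result in run_parallel(inputs):
        print(path, result)
```

The worker is a top-level function and the pool is started under the __main__ guard, which multiprocessing needs on Windows, where worker processes re-import the script.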

Some tips from these sites:

  • Locks:
    • Each process locks the data it uses, so only one process at a time can access data from the same file or file geodatabase (FileGDB).
    • Multiprocessing is possible if you are using several files (shapefiles, TIFFs, etc.) or a suitable database.
      • If you want to use multiprocessing for geocoding or routing, you need several copies of the road network data.
  • Each process has a start-up cost for loading the arcpy library (1-3 seconds). Depending on the complexity and size of the data, this can cause the multiprocessing script to take longer to run than a script without multiprocessing.
  • In many cases, the final step in the multiprocessing workflow is to aggregate all results together, which is an additional cost.
  • Memory: when a python.exe process that has loaded arcpy runs out of memory while running a geoprocessing task, it tends to throw rather obscure and unexpected errors in tools that otherwise (when run by themselves) run just fine.
  • Python Multiprocessing vs. Parallel Python: There doesn't seem to be much speed difference between the two, however:
    • Parallel Python requires an extra package to be installed (Multiprocessing is included with Python)
    • Parallel Python will fail if you use arcpy.AddMessage() or arcpy.AddError() within the parallelised function
    • Parallel Python has issues if you are using tools that must be 'checked out' before execution (such as Network Analyst)
    • I have not been able to get Multiprocessing to work with Geocoding
  • arcpy.Exists() is essential: it should be used to check every input dataset. This is not just because it catches input errors; without it, ArcGIS will regularly fail to do basic operations for no apparent reason. Possibly, checking for existence forces a file refresh.
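
The arcpy.Exists() advice can be wrapped in a small helper that checks every input before any processing starts. This is only a sketch: split_existing is a hypothetical helper name, and the existence check is passed in as a function so that arcpy.Exists can be supplied in an ArcGIS Python environment.

```python
def split_existing(datasets, exists):
    """Return (found, missing) lists, calling exists() on every dataset.

    exists is a function such as arcpy.Exists that takes a dataset path
    and returns True or False.
    """
    found = []
    missing = []
    for dataset in datasets:
        if exists(dataset):
            found.append(dataset)
        else:
            missing.append(dataset)
    return found, missing
```

In an ArcPy script the call would be found, missing = split_existing(inputs, arcpy.Exists). Datasets in missing can then be reported up front instead of being handed to a geoprocessing tool or a worker process.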