Determine Flow Accumulation GRID: Using Python, Map Algebra and NumPy

PeterWilson · 05-12-2013

I'm currently busy with a hydrology study in which I'm determining the location and structure requirements (culverts & bridges) for a proposed railway alignment. My initial assessment is based on the SRTM 90 m DEM, and my DEM is 34 651 x 24 934 cells = 863 988 034 cells. I can't afford to split my DEM or to increase the cell size, as it's already coarse. I've used the Flow Accumulation tool within Arc Hydro and the Flow Accumulation tool in the Spatial Analyst extension (Hydrology toolbox), and I've also tried running it from Python, hoping for better processing time. The Flow Accumulation algorithm seems to be hard-coded in that it doesn't use more than 2 GB of memory when processing the Flow Direction GRID to generate the Flow Accumulation GRID.

Is there a way to use Python, Map Algebra and NumPy to improve my processing time when generating the Flow Accumulation GRID for large DEMs?
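For reference, the core of D8 flow accumulation is straightforward to express in NumPy: each cell passes its count (plus one for itself) to the cell its flow direction points at, and cells are processed in upstream-to-downstream order. The sketch below is a minimal illustration, not what Arc Hydro or Spatial Analyst do internally. It assumes Esri's D8 encoding (1 = E, 2 = SE, 4 = S, 8 = SW, 16 = W, 32 = NW, 64 = N, 128 = NE), treats any other value (sinks, NoData, ambiguous codes) as having no outflow, and counts upstream cells only, which is what the Spatial Analyst tool reports without a weight raster. The names `downstream_index` and `flow_accumulation` are hypothetical.

```python
import numpy as np

# Assumed Esri D8 codes -> (row offset, column offset); rows increase southward.
D8_OFFSETS = {
    1: (0, 1),     # east
    2: (1, 1),     # southeast
    4: (1, 0),     # south
    8: (1, -1),    # southwest
    16: (0, -1),   # west
    32: (-1, -1),  # northwest
    64: (-1, 0),   # north
    128: (-1, 1),  # northeast
}


def downstream_index(fdir):
    """Flat index of the cell each cell drains into, or -1 (sink, edge, unknown code)."""
    fdir = np.asarray(fdir)
    nrows, ncols = fdir.shape
    down = np.full(fdir.size, -1, dtype=np.int64)
    for code, (dr, dc) in D8_OFFSETS.items():
        src = np.flatnonzero(fdir == code)
        r = src // ncols + dr
        c = src % ncols + dc
        inside = (r >= 0) & (r < nrows) & (c >= 0) & (c < ncols)
        down[src[inside]] = r[inside] * ncols + c[inside]
    return down


def flow_accumulation(fdir):
    """Number of upstream cells draining through each cell (hypothetical helper)."""
    fdir = np.asarray(fdir)
    down = downstream_index(fdir)
    n = down.size
    # In-degree: how many neighbours drain directly into each cell.
    indeg = np.bincount(down[down >= 0], minlength=n)
    acc = np.zeros(n, dtype=np.int64)
    # Start from ridge cells, which have nothing draining into them.
    frontier = np.flatnonzero(indeg == 0)
    while frontier.size:
        tgt = down[frontier]
        keep = tgt >= 0
        src, tgt = frontier[keep], tgt[keep]
        # Each finished cell passes its upstream count plus itself downstream.
        np.add.at(acc, tgt, acc[src] + 1)
        np.subtract.at(indeg, tgt, 1)
        # Receivers whose upstream cells are all finished form the next wavefront.
        frontier = np.unique(tgt[indeg[tgt] == 0])
    return acc.reshape(fdir.shape)
```

For example, `flow_accumulation([[2, 4], [1, 0]])` returns an array equal to `[[0, 0], [0, 3]]`: three cells drain into the bottom-right sink. Each loop pass handles a whole wavefront at once, so the number of passes equals the longest flow path in cells; cells caught in a flow-direction loop are simply never reached. Memory is the catch: each int64 work array for an 863 988 034-cell grid is about 6.4 GiB, so this only makes sense in 64-bit Python with a lot of RAM (the input could come from `arcpy.RasterToNumPyArray`), or applied one basin at a time.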

Generating the Flow Accumulation GRID currently takes almost 72 hrs.

Regards

curtvprice · 05-12-2013

Hi Peter!

Yes, that's a big raster: 34651 * 24934 * 4. / ( 1024 ** 3) > 3.2 GB uncompressed. The fact that the tools are not failing is a great credit to Esri's GP team. (Well done, folks!)
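The same arithmetic for a few cell sizes, as a quick sketch. The byte counts are assumptions about storage, not about what the Esri tools use internally: 1 byte/cell would fit a flow direction grid stored as uint8, 4 bytes/cell reproduces the estimate above, and 8 bytes/cell is NumPy's default float64/int64, which matters if the grid is pulled into a NumPy array. `raster_gib` is a hypothetical helper name.

```python
def raster_gib(n_cells, bytes_per_cell):
    """Uncompressed raster size in GiB (1 GiB = 1024 ** 3 bytes)."""
    # 1024.0 keeps the division floating point under Python 2.7 as well.
    return n_cells * bytes_per_cell / 1024.0 ** 3


N_CELLS = 34651 * 24934  # 863988034 cells

# Assumed storage widths: uint8, 32-bit, and NumPy's default 64-bit types.
for label, nbytes in [("8-bit", 1), ("32-bit", 4), ("64-bit", 8)]:
    print("{0:>6}: {1:.2f} GiB".format(label, raster_gib(N_CELLS, nbytes)))
```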

"I've used the Flow Accumulation tool within Arc Hydro, the Flow Accumulation tool within the Spatial Analyst Extension; Hydrology Toolbox as well as tried running it from Python hoping for better processing time."

In 10.x, the same underlying raster tools run whether you start them from a tool dialog, a model, Python, or even the ArcObjects "Op" interfaces. (Although this takes away some options that 9.x had, I'm happy to be free of the complications; if desperate, you can still get to MOMA, with its many limitations, through arcpy.gp.) So I don't expect any improvement there.

"The Flow Accumulation algorithm that is being used seems to be hard coded in that it doesn't use any more than 2 GB memory when processing the Flow Direction GRID to generate the Flow Accumulation GRID."

This is because the tools are developed for, and usually run in, 32-bit, which limits how much memory they can address directly at one time.
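A back-of-envelope check of why a 32-bit process cannot hold this job in memory all at once, so it has to work in pieces and swap to disk. Two figures are assumptions, labeled in the code: the roughly 2 GiB default user address space of a 32-bit Windows process, and holding a uint8 flow direction input plus a float32 accumulation output simultaneously. How the Esri tool actually manages its memory is not known here.

```python
GIB = 1024.0 ** 3  # float so the divisions below also work under Python 2.7
N_CELLS = 34651 * 24934

# A 32-bit pointer can address 2**32 bytes (4 GiB) in total.
addressable_gib = 2 ** 32 / GIB

# Assumption: default user-mode address space of a 32-bit Windows process.
user_space_gib = 2.0

# Assumption: uint8 flow direction (1 byte) + float32 accumulation (4 bytes) per cell.
needed_gib = N_CELLS * (1 + 4) / GIB

print("addressable {0:.1f} GiB, default user space {1:.1f} GiB, "
      "input + output {2:.2f} GiB".format(addressable_gib, user_space_gib, needed_gib))
```

Under these assumptions the full input plus output comes to a little over 4 GiB, roughly twice the default 2 GiB a 32-bit process can use.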

Two things I suggest trying (using a small test area!!):

1) If you have 10.1 SP1 installed, and are running Win7-64, try running the tool using x64 Background Geoprocessing. (Or more directly by running your script from 64-bit Python/arcpy.) It's possible the Flow Accumulation tool has been modified in 64 bit to use more RAM.
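When trying the 64-bit route, it is worth confirming which interpreter a script actually runs under, since the 32-bit and 64-bit Pythons can sit side by side. A small sketch using only the standard library (`struct`, `sys`, `platform`); the helper names are hypothetical, and the code is written to run under both Python 2.7 and Python 3.

```python
import platform
import struct
import sys


def python_bitness():
    """Pointer width (32 or 64) of the interpreter running this script."""
    return struct.calcsize("P") * 8


def describe_interpreter():
    """One-line summary: executable path, pointer width and platform's own report."""
    return "{0}: {1}-bit ({2})".format(
        sys.executable, python_bitness(), platform.architecture()[0])


print(describe_interpreter())
```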

2) Test using other raster data types and see if the performance varies. Since GIS processing is often I/O dependent, different datatypes sometimes work better with different tools, due to how the process interacts with the particular data structure. Candidates I'd compare first would be grid, file gdb, .img, and .tif. (For example, I have heard that Flow Direction is quite zippy for .tif output.)
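A minimal timing harness for that comparison, offered as a sketch. The output paths are placeholders, and `arcpy_flow_accumulation` is a hypothetical wrapper around the Spatial Analyst `FlowAccumulation` function; it imports `arcpy` only when called, so the harness itself runs anywhere. Varying the format of the flow direction input as well as the output would follow the same pattern.

```python
import time

# Placeholder output paths, one per raster format to compare (edit to suit).
CANDIDATES = [
    r"C:\fa_test\facc_grid",         # Esri GRID (no extension)
    r"C:\fa_test\fa_test.gdb\facc",  # file geodatabase raster
    r"C:\fa_test\facc.img",          # ERDAS IMAGINE
    r"C:\fa_test\facc.tif",          # GeoTIFF
]


def time_each(run, out_paths):
    """Call run(path) once per output path; return {path: elapsed seconds}."""
    timings = {}
    for path in out_paths:
        start = time.time()
        run(path)
        timings[path] = time.time() - start
    return timings


def arcpy_flow_accumulation(flow_dir, out_path):
    """Hypothetical runner: Spatial Analyst Flow Accumulation saved to out_path."""
    import arcpy  # only importable inside an ArcGIS Python installation
    from arcpy.sa import FlowAccumulation

    arcpy.CheckOutExtension("Spatial")
    arcpy.env.overwriteOutput = True  # allow repeated test runs
    FlowAccumulation(flow_dir).save(out_path)


# Usage inside ArcGIS, on a small test area only, e.g.:
# timings = time_each(
#     lambda p: arcpy_flow_accumulation(r"C:\fa_test\fdir_small", p), CANDIDATES)
```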

For best performance (you must be doing a lot of thrashing between memory and disk), you may be forced to tile your processing into basins, by building watersheds from your flow direction raster. The NHDPlus folks have taken this approach when hydro-processing DEMs for the continental U.S.
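Tiling by basin works because every cell upstream of a given cell drains to the same outlet, so each basin can be accumulated on its own and the results mosaicked back together. Within ArcGIS the Basin or Watershed tools are the natural way to build these zones; the NumPy sketch below only illustrates the idea by labeling each cell with the outlet it ultimately drains to, using pointer jumping. It assumes Esri D8 codes (1, 2, 4, ..., 128) and treats any other value, or flow off the grid edge, as a terminal cell. `downstream_index` and `basin_labels` are hypothetical names.

```python
import numpy as np

# Assumed Esri D8 codes -> (row offset, column offset); rows increase southward.
D8_OFFSETS = {1: (0, 1), 2: (1, 1), 4: (1, 0), 8: (1, -1),
              16: (0, -1), 32: (-1, -1), 64: (-1, 0), 128: (-1, 1)}


def downstream_index(fdir):
    """Flat index of the cell each cell drains into, or -1 (sink, edge, unknown code)."""
    fdir = np.asarray(fdir)
    nrows, ncols = fdir.shape
    down = np.full(fdir.size, -1, dtype=np.int64)
    for code, (dr, dc) in D8_OFFSETS.items():
        src = np.flatnonzero(fdir == code)
        r = src // ncols + dr
        c = src % ncols + dc
        inside = (r >= 0) & (r < nrows) & (c >= 0) & (c < ncols)
        down[src[inside]] = r[inside] * ncols + c[inside]
    return down


def basin_labels(fdir):
    """Label every cell with a compact id for the outlet it drains to."""
    fdir = np.asarray(fdir)
    down = downstream_index(fdir)
    n = down.size
    # Terminal cells (sinks, edge outlets) point to themselves.
    root = np.where(down >= 0, down, np.arange(n))
    # Pointer jumping: each pass doubles how far downstream every pointer reaches,
    # so about log2(n) passes suffice; the bound also stops flow-direction loops.
    for _ in range(max(1, n.bit_length())):
        nxt = root[root]
        if np.array_equal(nxt, root):
            break
        root = nxt
    # Renumber outlets 0..k-1 so each basin gets a small integer label.
    _, labels = np.unique(root, return_inverse=True)
    return labels.reshape(fdir.shape)
```

For example, `basin_labels([[1, 0], [1, 0]])` puts each row in its own basin, because each row drains east into a separate sink. Each basin's bounding box could then be clipped out and run through Flow Accumulation separately, although on a real DEM the largest basin may still be very large.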

Hope this is helpful. I'd look further into the NHDPlus docs to see if anything in their approach can help you.