MapInfo Pro

  • 1.  Denmark DTM – Building a trillion cell terrain raster

    Posted 09-12-2019 19:27
    Edited by Sam Roberts 09-13-2019 17:12

    This article contains data from The Danish Agency for Data Supply and Efficiency, The Danish Elevation Model, August 2019.
    The data was downloaded from the Danish Agency for Data Supply and Efficiency in July 2019.
    More information (in English) is available at  https://eng.sdfe.dk/product-and-services/the-danish-elevation-model-dk-dem/


    Introduction

    Denmark is a low-lying country with significant areas below sea level protected by man-made dykes and sea walls. For those who don't know where Denmark is, or think that Denmark and the Netherlands are the same place - Google Maps to the rescue.

    Denmark - sourced from Google Maps

    The image below shows the extent of dyke development in the South West of Denmark adjacent to the Wadden Sea. It is not surprising, therefore, that the Danes want to get a thorough understanding of their terrain and topography. To that end, they have flown at least two LiDAR surveys over the entire country to create high-resolution digital terrain models. I was given access to the most recent survey data which can be downloaded via FTP.

    Wadden Sea Dyke - sourced from Google Images


    In 2015, MapInfo acquired data from an earlier LiDAR survey of Denmark and we created a DTM raster in MRR format. This was a useful exercise at the time but the raster has a number of problems which I hoped to rectify using the new survey data in 2019.

    Both DTM (the ground surface) and DSM (the first-return surface, including trees, buildings etc.) rasters are available for download in TIFF format. The LiDAR point data is also available for download at the same site, although I have not tried to download or process it. Towards the end of June 2019, I downloaded the DTM rasters - a total of 668 ZIP archives, each containing up to 100 TIFF rasters. The total download size was 548 GB. The TIFF rasters have a horizontal resolution of 0.4 metres (40 centimetres), are 2500 columns by 2500 rows in size, and each provides coverage of one square kilometre. They use the ETRS TM Zone 32, Northern Hemisphere (ETRS89) projection. The RL data is stored in metres as 32-bit floating-point real numbers, which can represent data accurate to less than a millimetre vertically.
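As a sanity check on the "trillion cell" claim, a rough calculation shows the order of magnitude. The key point is that the joined raster spans Denmark's full bounding rectangle, not just the populated tiles; the national extents below are my own approximations, used purely for illustration:

```python
# Each downloaded tile: 2500 x 2500 cells covering 1 km^2,
# i.e. 2500 cells per kilometre at 0.4 m resolution.
cells_per_km = 2500
cells_per_tile = cells_per_km ** 2    # 6,250,000 cells per tile

# The joined raster spans Denmark's full bounding rectangle.
# Extents below are rough assumptions (~400 km x ~350 km).
cols = 400 * cells_per_km             # 1,000,000 columns
rows = 350 * cells_per_km             #   875,000 rows
print(f"{cols * rows:,} cells")       # 875,000,000,000 - order 10^12
```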

    My goal was to join all of these TIFF rasters back together into a single raster in MRR format. I wanted to remove any offshore data and I wanted to minimise the size of the final raster to make it as portable as possible. This article describes how I went about it.

    Processing challenges

    With a dataset of this size, the first challenge is disk space. I have a laptop with a small SSD and a slow HDD, and neither drive has room for the downloaded data or any of the downstream raster products. So, just about all processing had to be done from a source external HDD to a destination external HDD, using the SSD for temporary data whenever possible. Needless to say, this does not provide high performance and the first lesson is – get yourself a big, fast SSD. Secondly, I performed this processing on a laptop with a quad-core CPU. Whilst it is possible to work with huge rasters on modest hardware, do yourself a favour and get a decent desktop workstation.

    The next issue is how to efficiently extract all the TIFF files from the ZIP archives so that they can be processed. The TIFF files use no compression, so once they are extracted from the ZIP archive they explode in size. Consequently, it was necessary to process one archive at a time and clean up all the waste products as I proceeded.

    I decided to merge all the TIFF files in each of the 668 ZIP archives into a "naked" MRR – leaving me with 668 MRR raster files to work with thereafter. A "naked" MRR is an MRR that does not include an overview data pyramid or statistics. Computing statistics and building an overview pyramid takes time and consumes disk space, so if you are only going to use the rasters as intermediate products in a processing pipeline it can be advantageous to omit these processing tasks.
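The archive-at-a-time discipline can be sketched in Python. This is a hypothetical outline only - the real work was done by the RasterProcessor script - and `merge_tiles` here is a placeholder for the actual join step:

```python
import tempfile
import zipfile
from pathlib import Path

def process_archives(zip_dir: Path, out_dir: Path, merge_tiles) -> None:
    """Extract one ZIP at a time, merge its TIFFs, then clean up.

    merge_tiles(tiff_paths, destination) stands in for the actual
    join step (RasterProcessor's "Join" operation in this article).
    """
    for archive in sorted(zip_dir.glob("*.zip")):
        # Scratch space on the fast local drive, removed automatically,
        # so only one archive's worth of uncompressed TIFFs exists at once.
        with tempfile.TemporaryDirectory() as scratch:
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(scratch)
            tiffs = sorted(Path(scratch).glob("*.tif"))
            merge_tiles(tiffs, out_dir / f"{archive.stem}.mrr")
        # the scratch directory (and every extracted TIFF) is gone here
```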

    To achieve this I used "RasterProcessor". This is a stand-alone program that taps into the raster processing engine and executes processing operations following a script supplied in XML. At one stage we toyed with exposing this in MapInfo Pro Advanced, but it was withdrawn. It is available in the Raster API – look for an API function called "MIR_ExecuteProcess". My script, shown below, copies the ZIP file to a scratch directory, extracts the TIFFs, merges them together using the undocumented "Join" operation, moves the naked MRR back to the external HDD and then deletes all the rubbish. After running this I can forget about all the TIFFs – instead, I have 668 naked MRR rasters containing all the unmodified source data.

    <?xml version="1.0" encoding="UTF-8"?>
    <RasterProcessing>
    
    	<RasterOperationList Primary="True">
    		<GatherFiles Mode="OneByOne">
    			<TargetDir>F:\Data\OpenSource\Denmark\DTM_2019\Downloads\</TargetDir>
    			<TargetName>*.zip</TargetName>
    			<OpListName>ProcessZIP</OpListName>
    			<Parallel>False,8</Parallel>
    		</GatherFiles>
    	</RasterOperationList>
    	
    	<RasterOperationList Name="GatherAndJoin">
    		<GatherFiles Mode="GroupByAll">
    			<TargetDir>C:\Temp\DanishDTM\Working\</TargetDir>
    			<TargetName>*.tif</TargetName>
    			<OpListName>JoinTIFF</OpListName>
    		</GatherFiles>
    	</RasterOperationList>
    	
    	<RasterOperationList Name="GatherDeleteTIFF">
    		<GatherFiles Mode="OneByOne">
    			<TargetDir>C:\Temp\DanishDTM\Working\</TargetDir>
    			<TargetName>*.tif</TargetName>
    			<OpListName>DeleteFile</OpListName>
    			<Parallel>False</Parallel>
    		</GatherFiles>
    	</RasterOperationList>
    	
    	<RasterOperationList Name="GatherDeleteMD5">
    		<GatherFiles Mode="OneByOne">
    			<TargetDir>C:\Temp\DanishDTM\Working\</TargetDir>
    			<TargetName>*.md5</TargetName>
    			<OpListName>DeleteFile</OpListName>
    			<Parallel>False</Parallel>
    		</GatherFiles>
    	</RasterOperationList>
    	
    	<RasterOperationList Name="ProcessZIP">
    		<ExecuteShellCmd>
    			<CommandData Command="Copy" Target="%FULLPATHNAME%" Destination="C:\Temp\DanishDTM\Working\%FULLNAME%"/>
    			<CommandData Command="Execute" Executable="C:/Program Files/7-Zip/7z.exe" Target="C:\Temp\DanishDTM\Working\%FULLNAME%" Parameters="e -y"/>
    			<CommandData Command="Delete" Target="C:\Temp\DanishDTM\Working\%FULLNAME%"/>
    		</ExecuteShellCmd>
    		<ExecuteOpList OpListName="GatherDeleteMD5"/>
    		<ExecuteOpList OpListName="GatherAndJoin"/>
    		<ExecuteOpList OpListName="GatherDeleteTIFF"/>
    		<ExecuteShellCmd>
    			<CommandData Command="Move" Target="C:\Temp\DanishDTM\Working\%NAME%.mrr" Destination="F:\Data\OpenSource\Denmark\DTM_2019\NakedMRRTiles\%NAME%.mrr"/>
    		</ExecuteShellCmd>
    	</RasterOperationList>
    
    	<RasterOperationList Name="DeleteFile">
    		<ExecuteShellCmd>
    			<CommandData Command="Delete" Target="%FULLPATHNAME%"/>
    		</ExecuteShellCmd>
    	</RasterOperationList>
    	
    	<RasterOperationList Name="JoinTIFF">
    		<Join>
    			<DestinationFilename>C:\Temp\DanishDTM\Working\%NAME%.mrr</DestinationFilename>
    			<Format>MI_MRR</Format>
    			<InvalidToEmpty>True</InvalidToEmpty>
    			<CellCombineRule>Last</CellCombineRule>
    			<RasterInfo>
    				<Name>DTM</Name>
    				<UnderviewMapSize>32,32</UnderviewMapSize>
    				<UnderviewTileSize>256,256</UnderviewTileSize>
    				<BaseMapSize>32,32</BaseMapSize>
    				<BaseTileSize>1024,1024</BaseTileSize>
    				<OverviewMapSize>32,32</OverviewMapSize>
    				<OverviewTileSize>1024,1024</OverviewTileSize>
    				<FieldInfo>
    					<Name>DTM</Name>
    					<Type>Continuous</Type>
    					<Compression>LZMA,3</Compression>
    					<OverviewCellCoverage>Coverage_Half</OverviewCellCoverage>
    					<ValidFlagPerBand>True</ValidFlagPerBand>
    					<BandInfo>
    						<Name>RL</Name>
    						<DataType>REAL4</DataType>
    						<StoreDataType>REAL4</StoreDataType>
    						<RestrictDecimals>False</RestrictDecimals>
    						<Clip>False</Clip>
    						<Transform>False</Transform>
    						<PredEncode>False</PredEncode>
    						<NullValueType>Mask</NullValueType>
    						<DiscreteValue>False</DiscreteValue>
    					</BandInfo>
    				</FieldInfo>
    			</RasterInfo>
    			<Finalisation>
    				<BuildOverviews>None</BuildOverviews>
    				<ComputeStatistics>None</ComputeStatistics>
    				<StatisticsLevel>None</StatisticsLevel>
    				<CreateTABFile>False</CreateTABFile>
    			</Finalisation>
    		</Join>
    	</RasterOperationList>
    </RasterProcessing>


    The next step is to merge these MRR files together. I did this twice – once to produce an MRR with statistics and overviews so I could visualise the raster, and once to produce another "naked" MRR which I would use for further processing. To perform the merge I used the "Join" operation via RasterProcessor, as before. Here is what the raster looks like once you join all the TIFF tiles together.

    Denmark DTM - Joined TIFF files


    The completed raster revealed the following problems –

    • Oceans. Where a tile intersects the coast, the cells over the ocean are valid and set to zero. So the raster contains billions of useless cells filled with zero. These cells skew the distribution statistics and provide no useful information. About 15% of all the cells are zero, and I wanted to remove them. The picture below shows the typical extent of ocean cells in the raster.

    Ocean data shown in blue
    • Patches. I discovered four TIFF tiles that contained data that was clearly not valid. I guess the LiDAR processing on these tiles went wrong and the failures were not detected. It was not easy to find these tiles – they are needles in a haystack. Two I found by eye, the remainder I found later by comparing this raster to the raster made in 2015, displaying the differential using virtual rasters. The picture below shows a bad tile - it contains data, but that data is clearly no good.

    A bad tile
    • Invalid Cells. The statistics indicated that there were a number of cells that had invalid RL values. The DTM range in Denmark is between about -8 and +173 metres. But lurking within this trillion cell raster were cells with values much lower and higher than these limits. Some of this data is valid – for example, there are some quarries or excavations that are substantially below sea level – but the majority of these cells must surely be invalid.


    Removing Oceans

    I want to remove all the cells that are "ocean" and I know they all have a value of zero. You can use a calculator operation to hunt down and remove cells but this has the undesirable side effect of removing all cells with a value of zero, regardless of whether they are the ocean or not. In this dataset, there are many zero value cells that are landlocked and should not be removed. I wanted a better approach.

    An alternative approach is to edit the raster by "flood filling" the areas of ocean, replacing the valid cells with invalid cells. This has the advantage that it searches for contiguous regions of ocean and will not touch zero value cells that are not connected to the ocean. The first problem with a flood fill is that it has to be initiated somewhere (at a cell in the raster) - and to edit this raster would require many separate initiation points, which need to be located somehow. Secondly, the flood fill algorithm is recursive, which is a euphemism for "can consume an unpredictable amount of resources". Acting upon a trillion cell raster, a recursive algorithm has plenty of scope to consume all your processing resources - and fail.

    The algorithm I developed works in two stages. Firstly, I process the raster to identify cells that are valid, cells that are invalid, cells that are ocean and cells that are zero but not ocean. This uses a flood fill algorithm that self-initiates and is designed to keep recursion levels very low. The output of this is another raster (at the same resolution as the original raster) where the types of cells are clearly identified. As this "mask" raster will be used in the second stage of processing, I can manually edit it before using it to eliminate the ocean. The final ocean mask is shown below.

    Ocean mask
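A minimal sketch of the first stage, assuming a small 2D grid held in memory (the real implementation works on a trillion cell raster, tile by tile): an explicit work queue replaces recursion, and every zero cell on the raster edge self-initiates a fill, so no manual seed points are needed.

```python
from collections import deque

LAND, OCEAN = 0, 1

def ocean_mask(grid):
    """Classify zero-valued cells connected to the grid edge as ocean.

    grid: list of lists of cell values. Returns a parallel mask with
    OCEAN for edge-connected zeros and LAND for everything else,
    including landlocked zeros. An explicit deque replaces recursion,
    so memory use is bounded by the fill frontier, not the fill depth.
    """
    rows, cols = len(grid), len(grid[0])
    mask = [[LAND] * cols for _ in range(rows)]
    queue = deque()

    # Self-initiation: every zero cell on the boundary seeds a fill.
    for r in range(rows):
        for c in range(cols):
            if (r in (0, rows - 1) or c in (0, cols - 1)) and grid[r][c] == 0:
                queue.append((r, c))

    while queue:
        r, c = queue.popleft()
        if mask[r][c] == OCEAN or grid[r][c] != 0:
            continue
        mask[r][c] = OCEAN
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                queue.append((rr, cc))
    return mask
```

Note how a landlocked zero cell, entirely surrounded by non-zero land, is never reached by the fill and stays classified as land - exactly the behaviour a plain calculator operation cannot provide.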


    Manual editing was required in areas where the algorithm decided zero value cells were landlocked but the human eye could determine that they were not. For example, this can occur when bridges have not been removed from the LiDAR and two bridges cross a neck of the ocean - in between them, the algorithm incorrectly identifies landlocked zero value cells. I used a raster editing test harness to manually and visually correct these issues in the mask raster. In the second stage, I then "applied" the mask to the DTM raster and removed all the cells identified as ocean. The picture below shows a typical error of this kind, which I corrected manually.

    Mask error


    Correcting Patches

    For posterity, these are the UTM coordinate ranges of the patches that look to contain invalid data.

    (523000, 6142000) - (523800, 6143000), (523000, 6118000) - (523800, 6119000), (623000, 6133000) - (625000, 6134000), (533000, 6182000) - (534000, 6183000)

    I repaired these areas at the end of the processing by using the Merge operation in MapInfo Pro Advanced and taking advantage of its ability to edit an existing raster. I clipped out these areas from the older 2015 DTM raster and then inserted those four small rasters into the large DTM raster.

    Treating Invalid Cells

    A very small percentage of the cells in the raster – about 90 thousand out of a trillion - have invalid cell values. We know this from the raster statistics which report the minimum and maximum values of the raster. Finding those cells in a trillion cell raster is no easy thing to do. You cannot do it by visual inspection.

    I decided to find out where these bad cells are. To do so, I executed another Join operation on the naked MRR's but this time using data conditioning to reject all data between -7m and 170m RL. I was left with just those cells that are outside that RL range. As I expected the cells to be isolated and difficult to spot, I used a different method to generate the overview pyramid which ensures that any cells in the base level remain present in the overview levels. That way, you can see the rogue cells from far away and zoom in on them.
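Expressed with NumPy, the conditioning rule is simply an inverted range test - reject the plausible band and keep only the suspects. This is a sketch of the idea, not the Join operation's actual implementation:

```python
import numpy as np

def keep_outliers(rl, lo=-7.0, hi=170.0):
    """Invalidate all cells inside [lo, hi], keeping only suspect RLs.

    rl: 2D float array of reduced levels (NaN marks empty cells).
    Returns a copy where in-range cells are set to NaN, so only the
    out-of-range cells survive for inspection.
    """
    out = rl.copy()
    in_range = (out >= lo) & (out <= hi)
    out[in_range] = np.nan
    return out

tile = np.array([[1.5, -12.0], [250.0, 80.0]], dtype=np.float32)
suspects = keep_outliers(tile)   # only -12.0 and 250.0 survive
```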

    I found that much of the data between -20 and -7m RL is likely to be legitimate. It includes some known broad areas that are quite low lying and other features like quarries and excavations that can be significantly below sea level. Similarly, some of the RL's up to about 173m are also legitimate high ground. But much of the rogue data is clearly illegitimate and wrong. I think much of it probably originates from a single bad LiDAR reading that then gets turned into a bad area of cells as it gets incorporated into the surface triangulation. Then again, some areas seem to have systematically bad data.

    I addressed these invalid cells by simply clipping them out using data conditioning. I rejected all cells below -25m RL and above 200m RL. This leaves behind holes in the raster, which I did not attempt to fill or rectify. The image below shows a typical cluster of cells where the cell values are negative and invalid. The second image shows a region of cells with invalid high values - perhaps caused by surface water?

    Bad data - lows
    Bad data - high


    Keeping a lid on it

    The original TIFF files, stored in ZIP files, amounted to 548 GB. My goal is to end up with a smaller and more portable dataset and to achieve this I use the MRR format. Remember that an MRR will contain an overview pyramid which increases the amount of data in storage by at least 33%, so you start from behind.
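The "at least 33%" figure falls out of the overview geometry: each pyramid level halves the resolution, so it holds a quarter of the cells of the level below, and the total overhead converges to one third of the base level. An idealised calculation (tile alignment and index overheads push the real figure higher):

```python
# Overview pyramid overhead: each level holds 1/4 the cells of the
# level below, so the extra storage is a geometric series
# 1/4 + 1/16 + 1/64 + ... which converges to 1/3.
overhead = sum(0.25 ** level for level in range(1, 30))
print(f"pyramid overhead ~ {overhead:.4f} of the base level")  # ~0.3333
```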

    The primary way you can keep raster files small is to use compression. In this case, I used the lossless LZMA codec (level 9) to compress the cell data inside the MRR. Apart from compression, you can take advantage of the rich variety of data types MRR supports as well as the scaling and transformation properties for bands. You can also choose to sacrifice vertical resolution if you deem it to be appropriate. I chose to transform the data from 4-byte floating-point to 2-byte signed integer using a transform that reduced the vertical resolution of the data to 10 millimetres. As the data is being stored in the MRR as integers, I was then able to use predictive encoding. This transformation sacrifices no information but makes the data more compressible and at the end of the day you get a smaller file. With a horizontal resolution of 0.4 metres and a vertical resolution of 10 millimetres the final MRR file, including the overview pyramid, is 164 GB. I think that is a good result. The final DTM raster is shown below.

    Final DTM
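The transform-then-predict idea described above can be sketched with NumPy. Assumed details for illustration: a 10 mm quantisation step and a simple horizontal delta as the predictor - MRR's actual predictive encoder may differ:

```python
import numpy as np

STEP_MM = 10  # chosen vertical resolution: 10 millimetres

def to_int16(rl_metres):
    """Quantise float32 RLs (metres) to int16 counts of 10 mm.

    int16 spans -327.68 m to +327.67 m in 10 mm steps, which easily
    covers Denmark's roughly -25 m to +200 m clipped range.
    """
    counts = np.round(rl_metres * 1000.0 / STEP_MM)
    return counts.astype(np.int16)

def delta_encode(row):
    """First value verbatim, then differences between neighbours.

    Terrain varies slowly, so the deltas are small, repetitive
    integers - exactly what LZMA compresses well.
    """
    out = row.copy()
    out[1:] = row[1:] - row[:-1]
    return out

rl = np.array([1.234, 1.244, 1.254, 1.250], dtype=np.float32)
q = to_int16(rl)        # [123, 124, 125, 125]
d = delta_encode(q)     # [123, 1, 1, 0]
```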


    Summary


    By removing the ocean we eliminated billions of useless cells from the raster and this results in more useful statistics. Better statistics mean better rendering and better analysis. By patching the areas that were clearly bad and removing cells with crazy RL's, I made some attempt to improve the overall quality of the raster.

    The processing operations used – principally the Join operation and the automated flood edit operation – have now been tested in the heat of battle and may find their way into future releases of MapInfo Pro Advanced.

    Now that we have two rasters of Denmark (the first from 2015 and the second from 2019), we will be able to make comparisons to see what has changed over that time. To do this, I plan to use virtual rasters and I might report on that analysis in a future article.

    I have attached a DTM of Denmark that you can download, extracted from the 0.4-metre resolution raster. It is simply one of the overview resolution levels (level 6), extracted and written to a new raster. The horizontal resolution is 25.6 metres and the vertical resolution is 10 centimetres.
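The attached raster's cell size follows directly from the pyramid geometry - each overview level doubles the cell size, so level 6 of the 0.4 m base works out as:

```python
base_cell_m = 0.4
level = 6
cell_m = base_cell_m * 2 ** level   # each overview level doubles the cell size
print(cell_m)                       # 25.6 metres
```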



    ------------------------------
    Sam Roberts
    Engineer, MapInfo Pro Advanced (Raster)
    Australia
    ------------------------------

    Attachment(s)



  • 2.  RE: Denmark DTM – Building a trillion cell terrain raster

    Employee
    Posted 09-16-2019 08:02
    Great article, Sam. Thanks for sharing.

    Maybe I should consider adding support for the MIR_ExecuteProcess in the MapInfo Raster Tool, that I work on.

    ------------------------------
    Peter Horsbøll Møller
    Pitney Bowes
    ------------------------------



  • 3.  RE: Denmark DTM – Building a trillion cell terrain raster

    Posted 09-19-2019 06:09
    Nice article Sam. Pushing the limits!

    ------------------------------
    - Stevemann4@gmail.com
    Knowledge Community Shared Account
    ------------------------------



  • 4.  RE: Denmark DTM – Building a trillion cell terrain raster

    Posted 12-05-2019 01:00
    At the request of a client, I have now built the DSM for Denmark, which is at the same resolution as the DTM described in the original article.

    The DSM (digital surface model) differs from the DTM (digital terrain model) in that it includes all the surface features - trees, buildings, ships, cars, people, cows and anything else you can think of. Consequently, the raster is larger (on disk), as there is more information and it is less compressible. The DSM raster is 226 GB.

    The data is obtained from the Danish Mapping and Cadastre DTM site - ftp://ftp.kortforsyningen.dk/dhm_danmarks_hoejdemodel/DSM/
    Please note the following disclaimer - "This article contains data from The Danish Agency for Data Supply and Efficiency, The Danish Elevation Model, August 2019. The data was downloaded from the Danish Agency for Data Supply and Efficiency in July 2019. More information (in English) is available at https://eng.sdfe.dk/product-and-services/the-danish-elevation-model-dk-dem/"

    The directory contains 673 ZIP files containing TIFF rasters, total size 635,965,046,565 bytes (592.3 GB). Note that this is 5 more than the DTM dataset. The five extra files appear to be "DTM" rasters - not "DSM" - and they duplicate files that are already in the DSM list. I did not download them.

    To create this DSM raster I used the same procedure described in the original article, with less intervention.

    I executed Process_Batch_ZIP2TIFF2MRR_Join_DSM.xml to iteratively extract each ZIP and join the TIFFs to a naked MRR.

    I executed Process_Join_NakedTiles2MRR.xml to create a full MRR with statistics and overviews. Decimal precision was set to 3 (vertical resolution of 1 mm) and there was no data conditioning. Summary statistics indicated there were bad values in the raster, from lows of -300m up to very high values. The highest building in Denmark is 120m and the highest terrain is about 170m, so clipping from -25 to 300m ought to trap all valid data.

    I executed Process_Join_NakedTiles2NakedMRR.xml to create a base level MRR with clipping from -25 to 300m and 1mm vertical resolution. This raster will be the target for flood editing.

    I executed Process_AutoFloodEdit.xml to automatically remove as many of the ocean zero value cells as possible. This edit was applied to DenmarkDSM2019_400mmH_1mmV_Naked.mrr "in place". No pyramid was generated.

    I executed Process_ConvertNakedMRR2FinalMRR.xml to create a final MRR at 10mm vertical resolution.

    I found that the Auto Flood Edit had left behind significant regions of ocean, so I flood edited the raster manually using a test harness and saved changes back to the same file.

    I executed Process_Copy_LowRes.xml to create a reduced resolution MRR with cell size 25.6m and vertical resolution 0.1m by extracting level 6 from the source raster pyramid. This raster and processing scripts are attached.

    ------------------------------
    Sam Roberts
    Engineer, MapInfo Pro Advanced (Raster)
    Australia
    ------------------------------

    Attachment(s)

    zip
    ProcessingScripts.zip   5 KB 1 version


  • 5.  RE: Denmark DTM – Building a trillion cell terrain raster

    Posted 12-06-2019 00:02
    Wow, well done. I did a similar exercise two years ago - council purchased point cloud datasets for flood modelling purposes. The data packages included classified point cloud data and a DEM in individual tiles, and I created a DSM and a normalised DSM from them. But that was before MapInfo Pro Raster and MapInfo Pro v17, so it was a joint team effort across ArcGIS, FME, QGIS and wonderful Google. It's good to see that it can all be done in MapInfo Raster nowadays. What version do you use? Do you need to use Discover or Discover 3D?

    ------------------------------
    Alexis Qiao
    Spatial Information Systems Officer
    City of Warrnambool
    Warrnambool
    ------------------------------



  • 6.  RE: Denmark DTM – Building a trillion cell terrain raster

    Posted 12-08-2019 16:21
    Hi Alexis,

    The Denmark surveys also include LiDAR point cloud data but I have not downloaded this data or tried to process it. Needless to say, there is a vast amount of it. We now include LiDAR gridding capabilities in MapInfo Pro Advanced which might interest you. We support LAS/LAZ files as input and you can use any gridding method you want (I suggest Triangulation). You can filter the LiDAR data any way you want. We plan to upgrade this capability in future patches to 2019 to make it easier to use and to expose new LiDAR Forest Canopy analysis capabilities we have developed (Height, Density and Coverage).

    I am using the latest MapInfo Pro Advanced release (2019) and some of the functionality described is only shipped in this version. I do not use Discover or Discover3D. All the processing described above was done using XML scripts rather than through the user interface. It is not a capability that we have exposed to users yet, although there is an API function supporting it. This allows me to access processing capabilities that are not yet exposed in the mainstream product. 

    Did you use TUFLOW to do your flood modelling? I believe TUFLOW have some support for displaying the products in MapInfo and there has been some talk of improving it. I hope the modelling goes well and the good people of Warrnambool keep their heads above water!

    ------------------------------
    Sam Roberts
    Engineer, MapInfo Pro Advanced (Raster)
    Australia
    ------------------------------



  • 7.  RE: Denmark DTM – Building a trillion cell terrain raster

    Posted 12-08-2019 19:13
    Edited by Alexis Qiao 12-08-2019 19:15
    Hi Sam,

    Thanks for your reply.

    Yes, the LiDAR gridding capabilities in MapInfo Pro Advanced do sound interesting. Do they include point cloud classification tools as well? Also, does PB offer any MapInfo Pro Advanced courses (e.g. a short course running for 1 to 2 days) in the Melbourne, Australia region? I would love to learn more about how people utilise raster processing tools such as MapInfo Pro Advanced within local government environments.

    No, I didn't do the flood modelling from the point cloud data back then - an external consultancy did. Later on, though, I got the task of fiddling with the flood modelling data, preparing and finalising it for the purposes of planning scheme amendments. As the modelling data comes in raster format (grid files), I found it essential to get hold of proper raster tools for dealing with it.

    Also, I used the point cloud dataset to create a DEM, DSM and nDSM, just for elevation data provision and nice visual effects. As I said, that was all before I had MapInfo Pro Advanced and MapInfo Pro V17 installed.

    ------------------------------
    Alexis Qiao
    Spatial Information Systems Officer
    Warrnambool City Council
    ------------------------------



  • 8.  RE: Denmark DTM – Building a trillion cell terrain raster

    Posted 12-09-2019 23:13
    MapInfo Pro Advanced does not include tools that classify LiDAR point data. We assume that this work has already been done (probably by the contractor that conducted the survey) and that the data is ready to consume. I find it useful to execute a Stamp or Nearest Neighbour gridding operation on the classification code point data. This generates a raster you can use to visualise the point classification data and also provides statistics on the classification codes present in the dataset.
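The stamp idea can be sketched with NumPy. This is a hypothetical helper, not the product's implementation; it assumes row 0 sits at the grid's southern edge and that later points simply overwrite earlier ones in the same cell:

```python
import numpy as np

def stamp_grid(xs, ys, codes, origin, cell, shape, nodata=255):
    """'Stamp' gridding of LiDAR classification codes (a sketch).

    Each point's code is written into the cell it falls in; points
    outside the grid are ignored. xs/ys are point coordinates,
    origin is the grid's lower-left corner, cell is the cell size.
    """
    grid = np.full(shape, nodata, dtype=np.uint8)
    cols = ((xs - origin[0]) / cell).astype(int)
    rows = ((ys - origin[1]) / cell).astype(int)
    inside = (rows >= 0) & (rows < shape[0]) & (cols >= 0) & (cols < shape[1])
    grid[rows[inside], cols[inside]] = codes[inside]
    return grid
```

Rendering such a grid with a discrete colour table gives a quick visual check of the classification, and a histogram of the grid reveals which codes are present and in what proportions.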

    I have emailed Ashley Crane regarding training in case he can help.

    ------------------------------
    Sam Roberts
    Engineer, MapInfo Pro Advanced (Raster)
    Australia
    ------------------------------