What gridding method should I use? – Part 2 "The results"

1. What gridding method should I use? – Part 2 "The results"

Sam Roberts
Posted 06-13-2019 21:05
Edited by Sam Roberts 07-19-2019 01:19

In part 1 of this article, I discussed strategies for determining which gridding algorithm will best suit your data and provided tips for getting the best results. In this article, I will compare gridding methods on different kinds of point datasets using real data.

Previous article: What gridding method should I use? – Part 1 "The decision"

I have prepared six point datasets – each demonstrating a typical kind of arrangement discussed in the previous article. I have interpolated the value of a terrain raster at all of these points, and so we can compare how the different gridding methods reconstruct the original raster data from the sampled point data.

My six point arrangements are –

Regular isotropic

Points are arranged in a grid pattern with regular or semi-regular spacing in X and Y. My sample spacing of 80 metres is the same in X and Y, although I have introduced some random variation. This is not too dissimilar to LiDAR survey data (apart from the large sample spacing).

Regular anisotropic

Points are collected at a regular sample spacing along parallel lines that are regularly spaced. My line spacing is 160 metres and the along line sample spacing is 20 metres. This is fairly typical of an airborne magnetic field survey, for example.

Regular network

Points are collected at a regular sample spacing along a network of interconnected non-parallel lines (for example following a road network). My along line sample spacing is 80 metres (although in practice it is smaller than this), with some random variation.

Random

Points are randomly located in space. I have generated about 2500 randomly located points.

Random clustered

Points are randomly located but occur in clusters where point density is elevated. I have combined about 750 randomly located samples with about six clusters that each typically contain about 120 samples.

Mixed

A mixture of two or more of the source data arrangements described above. I have combined the road network dataset (with sample spacing of 80 metres) with an isotropic dataset (inclined) at about 200 metre spacing but with a significant degree of random variation.

I will compare grids generated with Heat Map (HEAT), Nearest Neighbour (NN), Natural Neighbour (NAN), Inverse Distance Weighted (IDW), Triangulation (TRI) and Minimum Curvature (MINC) methods. For each point data arrangement I will list all the relevant gridding properties for each method. Note that I did no clipping, coincident point analysis or input data conditioning.

The original raster has a 5 metre cell size and 641601 samples. In this exercise we are regenerating this raster from at most a few thousand samples, so we can expect to lose a significant amount of detail. I have tried to create the new rasters at the same cell size, except where indicated otherwise. The image below shows the sample locations on top of the original terrain raster.

Regular Isotropic

HEAT – Triweight kernel, Search Spherical 300m, Cell 5m.

NN – Search 100m, Cell 5m.

NAN – Search 100m, Smooth 2, Cell 5m.

IDW – Gaussian weight function, Range 100m, Search Spherical 200m, Cell 5m.

TRI – Triangle size 160m, Cell 5m.

MINC – Iterations High, Tension 0.0, Cell 10m.

In some scenarios you can work out the average data spacing by generating a Heat Map sample density raster from the point data and analysing its distribution statistics. In this case the statistical "mode" of the density raster is 0.0001556 samples per square metre. Because each sample occupies roughly one spacing × spacing square, the sample spacing is the square root of the inverse of this value: approximately 80 metres.
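The arithmetic is easy to script. The Python sketch below (the function names are hypothetical) converts a density in samples per square metre into a spacing, and cross-checks it against the sample count divided by the survey area. The area figure is an assumption: it treats the source raster as square (641601 = 801 × 801 cells of 5 m, about 4005 m on a side), which the article does not state explicitly.

```python
import math

def spacing_from_density(density_per_m2: float) -> float:
    """Average sample spacing (m) implied by a sample density (samples per m^2).

    Each sample is assumed to occupy roughly one spacing x spacing square,
    so density = 1 / spacing**2 and spacing = sqrt(1 / density).
    """
    return math.sqrt(1.0 / density_per_m2)

def spacing_from_count(area_m2: float, n_samples: int) -> float:
    """The same estimate from a sample count and the surveyed area."""
    return math.sqrt(area_m2 / n_samples)

# Mode of the Heat Map density raster quoted for the regular isotropic points.
print(round(spacing_from_density(0.0001556), 1))

# Cross-check, assuming a square 801 x 801 raster of 5 m cells (about 4005 m a side).
side_m = 801 * 5
print(round(spacing_from_count(side_m * side_m, 2500), 1))
```

Both estimates land at roughly 80 m for the 2500-point isotropic dataset.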

HEAT shows little variation or useful information, as you would expect with a point dataset that has an almost even spatial distribution.

NN shows a brick pattern, reflecting the regular isotropic sample spacing, where each brick is centred upon a data sample. I think this is a useful raster for data exploration, but not for interpretation.
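Nearest neighbour gridding gives each cell the value of the closest sample, which is why a regular survey produces tiles centred on the samples. Here is a minimal Python sketch, assuming a brute-force search, that cells with no sample inside the search radius are left empty (None), and that ties go to the first sample found; MapInfo's own handling of ties and empty cells may differ.

```python
import math
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[float, float, float]  # (x, y, value)

def nearest_neighbour(samples: Sequence[Sample], x: float, y: float,
                      search_radius: float) -> Optional[float]:
    """Value of the closest sample within search_radius of (x, y), or None if there is none."""
    best_value: Optional[float] = None
    best_dist = math.inf
    for sx, sy, value in samples:
        d = math.hypot(sx - x, sy - y)
        if d <= search_radius and d < best_dist:
            best_dist, best_value = d, value
    return best_value

def nearest_neighbour_grid(samples: Sequence[Sample], x0: float, y0: float,
                           cell: float, ncols: int, nrows: int,
                           search_radius: float) -> List[List[Optional[float]]]:
    """Evaluate nearest_neighbour at the centre of every cell of a regular grid.

    (x0, y0) is the lower-left corner of the grid; rows run from south to north.
    """
    return [[nearest_neighbour(samples, x0 + (c + 0.5) * cell,
                               y0 + (r + 0.5) * cell, search_radius)
             for c in range(ncols)]
            for r in range(nrows)]

# Two hypothetical samples 10 m apart; each cell takes the value of the closer one.
samples = [(0.0, 0.0, 1.0), (10.0, 0.0, 2.0)]
print(nearest_neighbour_grid(samples, x0=0.0, y0=-5.0, cell=5.0,
                             ncols=2, nrows=1, search_radius=20.0))
```

The boundaries between cell values fall halfway between neighbouring samples, so an offset (brick-like) sample layout yields offset tiles.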

All four interpolation methods do a reasonable job of reconstructing the original raster. MINC stands out as the best, even though it required a larger cell size. NAN seems to have preserved detail well, although it is not as smooth or visually appealing as MINC, even though post-processing smoothing was applied to it. IDW has also preserved detail well but shows high-frequency noise and could benefit from smoothing. TRI is visually unappealing because the orientation of the triangles tends to "flip", causing abrupt local changes in slope.
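Triangulation interpolates linearly within each triangle, so the result depends on which diagonal splits each quadrilateral of samples. The small Python sketch below is an illustration of linear interpolation on triangles (barycentric weights), not MapInfo's triangulation code; it evaluates the centre of a unit square with one raised corner under both possible diagonals.

```python
from typing import Tuple

Point = Tuple[float, float]

def interpolate_in_triangle(p: Point, a: Point, b: Point, c: Point,
                            va: float, vb: float, vc: float) -> float:
    """Planar (linear) interpolation at p from values va, vb, vc at the vertices a, b, c.

    Uses barycentric coordinates; p is assumed to lie inside or on the triangle.
    """
    denom = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / denom
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / denom
    wc = 1.0 - wa - wb
    return wa * va + wb * vb + wc * vc

# Four hypothetical samples on a unit square; only the north-east corner is raised.
sw, se, ne, nw = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)
values = {sw: 0.0, se: 0.0, ne: 1.0, nw: 0.0}
centre = (0.5, 0.5)

# Split along the SW-NE diagonal: the centre lies on the edge joining values 0 and 1.
split_a = interpolate_in_triangle(centre, sw, se, ne, values[sw], values[se], values[ne])
# Split along the SE-NW diagonal: the centre lies on the edge joining values 0 and 0.
split_b = interpolate_in_triangle(centre, sw, se, nw, values[sw], values[se], values[nw])
print(split_a, split_b)
```

The same four samples give 0.5 or 0 at the centre depending only on the diagonal chosen. When samples sit close to a regular grid, a small change in one position can flip which diagonal a Delaunay-style triangulation picks, which is one way the abrupt slope changes arise.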

Regular Anisotropic

HEAT – Triweight, Search Spherical 100m, Cell 5m.

NN – Search 100m, Cell 5m.

NAN – Search 100m, Cell 5m, Smooth 2.

IDW – Gaussian, Range 100m, Search Spherical 200m, Cell 5m.

TRI – Triangle size 200m, Cell 5m.

MINC – Iterations High, Tension 0, Cell 40m.

In some cases you can work out the sample spacing by looking at lag statistics generated from the point data (you will require statistical software to do this). For this point data I see a recurring sample count peak every 20 metres (corresponding to the along line sampling) and at 160 metres (corresponding to the line to line spacing).
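Lag statistics can be approximated without dedicated software by counting point pairs in distance bins. The Python sketch below is a brute-force O(n²) pair count (fine for a few thousand points; `math.dist` needs Python 3.8+). The survey is a made-up stand-in for the real data: three north-south lines 160 m apart, sampled every 20 m.

```python
import math
from collections import Counter
from itertools import combinations

def lag_counts(points, bin_width):
    """Count point pairs per lag bin; each pair goes to the nearest multiple of bin_width."""
    counts = Counter()
    for p, q in combinations(points, 2):
        d = math.dist(p, q)
        counts[int(d / bin_width + 0.5) * bin_width] += 1
    return counts

# Hypothetical line survey: 3 north-south lines 160 m apart, a sample every 20 m.
points = [(x, y) for x in (0, 160, 320) for y in range(0, 401, 20)]
counts = lag_counts(points, bin_width=5)
for lag in (20, 40, 140, 160, 180):
    print(lag, counts[lag])
```

Below 160 m the counts are non-zero only at multiples of 20 m (the along-line spacing), and the count jumps at 160 m where pairs across adjacent lines start to appear.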

The HEAT result is highly dependent on the search radius and gives completely different results for a 100 metre radius and a 200 metre radius. In the first case the kernel does not "see" from line to line, so higher density values lie along the lines. In the second case it does see from line to line, so higher density values lie between the lines.
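This radius effect can be reproduced with a bare-bones kernel density sum. The Python sketch assumes an unnormalised triweight weight (1 - (d/r)²)³ for samples closer than the search radius r; MapInfo's actual kernel scaling is not described here, so only comparisons made at the same radius are meaningful. The survey is synthetic: four lines 160 m apart with a sample every 20 m.

```python
def triweight_density(points, x, y, radius):
    """Sum of unnormalised triweight weights (1 - (d/r)^2)^3 over samples closer than radius."""
    r2 = radius * radius
    total = 0.0
    for px, py in points:
        u2 = ((px - x) ** 2 + (py - y) ** 2) / r2
        if u2 < 1.0:
            total += (1.0 - u2) ** 3
    return total

# Hypothetical survey: lines at x = 0, 160, 320, 480 m, a sample every 20 m along y.
points = [(x, 20 * k) for x in (0, 160, 320, 480) for k in range(-50, 51)]

for radius in (100, 200):
    on_line = triweight_density(points, 160, 0, radius)  # on the line at x = 160
    between = triweight_density(points, 240, 0, radius)  # midway between two lines
    print(radius, round(on_line, 3), round(between, 3))
```

With the 100 m radius the on-line value is far higher than the midpoint value; with 200 m the order reverses, matching the behaviour described above.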

NN shows abrupt changes between the sample lines, as well as triangular artefacts that are visually unappealing.

Of the four interpolation methods, only MINC has returned a visually appealing result. In the NAN raster you can see the sample lines clearly, but this time the discontinuity is along the lines rather than between them. IDW also shows a strong along-line correlation. I could have used a North-South oriented search ellipse (designed to capture points from line to line rather than along lines), but I found this did not achieve any visible improvement. TRI is visually unappealing, with long thin triangles between lines. MINC is smooth, but note the much larger cell size of 40 metres (a quarter of the line spacing).

It is worth noting that the anisotropic dataset has 4925 samples and the isotropic dataset has 2500 samples. In this test we have almost twice as many samples, but our raster results were worse in all cases.

Regular Network

HEAT – Triweight, Search Spherical 500m, Cell 5m.

NN – Search 500m, Cell 5m.

NAN – Search 500m, Cell 5m, No Smooth.

IDW – Gaussian, Range 300m, Search Spherical 500m, Cell 5m.

TRI – Triangle size 800m, Cell 5m.

MINC – Iterations High, Tension 0, Cell 20m.

The lag statistics for these points show a recurring sample count peak every 60 metres, corresponding to the along-line sampling. (Earlier I said the sample spacing was 80 metres, but because of the way the code interpolated between the nodes of the polylines, the actual average sample spacing came out at 60 metres.)

I now use a much larger search radius for HEAT to get a broader indication of where sampling is dense.

NN is quite unappealing, with long polygons bridging the gaps between lines. It is hard to see any use for this raster. TRI suffers from the same problem. Getting a result from IDW required a much larger Gaussian range and search radius. The result is smooth but lacks detail and is quite different to the original raster at both small and large scales.

NAN also shows the typical "knot"-like features following the network lines, but otherwise managed to produce a reasonable reconstruction of the original raster. Note that this gridding operation took much longer to perform than with any other dataset. MINC worked quite well, and I was able to bring the cell size down to preserve more detail.

This sample dataset is the smallest with only 707 points.

Random

HEAT – Triweight, Search Spherical 200m, Cell 5m.

NN – Search 150m, Cell 5m.

NAN – Search 150m, Cell 5m, No Smooth.

IDW – Gaussian, Range 150m, Search Spherical 200m, Cell 5m.

TRI – Triangle size 300m, Cell 5m.

MINC – Iterations High, Tension 0, Cell 10m.

You can use a sample density grid to help determine the average sample spacing in this dataset, which works out to about 85 metres. There are 2340 points in the dataset.

This time, I used a tighter search radius for HEAT to characterise local density variations. NN is more visually pleasing and I think this is a useful raster for data exploration.

All four interpolation methods have returned useable results. NAN still shows the "knot" features that are characteristic of this method. The IDW raster shows a reasonable amount of detail, although the slopes tend to "shelf" and are not as smooth as they should be. The TRI result is acceptable and could be smoothed to produce a reasonable raster. MINC produced a detailed and smooth raster with a cell size equal to the smallest so far for this method.

Clustered

HEAT – Triweight, Search Spherical 300m, Cell 5m.

NN – Search 250m, Cell 5m.

NAN – Search 250m, Cell 5m, No Smooth.

IDW – Gaussian, Range 300m, Search Spherical 500m, Cell 5m.

TRI – Triangle size 500m, Cell 5m.

MINC – Iterations High, Tension 0, Cell 10m.

Both the density grid and lag analysis methods are difficult to interpret with this arrangement. You may end up determining the cell size by eye. The dataset contains 1380 samples, so we might expect less detail in the rasters than the random dataset.

HEAT used a larger search radius to characterise the lower density data between clusters and clearly shows the location of the clusters, without showing any detail within them. NN is more visually pleasing and I think this is a useful raster for data exploration.

Once again, all four interpolation methods have returned useable results but NAN still shows the "knot" features and may require smoothing. A larger search radius was required for IDW and the lack of detail in the raster reflects this, although it is quite appealing. The TRI result suffers from the higher sample spacing. MINC produced a detailed and smooth raster and once again supported a smaller cell size than expected.

Mixed

HEAT – Triweight, Search Spherical 400m, Cell 5m.

NN – Search 200m, Cell 5m.

NAN – Search 200m, Cell 5m, No Smooth.

IDW – Gaussian, Range 200m, Search Spherical 300m, Cell 5m.

TRI – Triangle size 400m, Cell 5m.

MINC – Iterations High, Tension 0, Cell 10m.

The lag statistics for these points indicate three sample spacing populations: 60, 120 and 180 metres. There are 1083 samples in the dataset.

I used a smaller search radius for HEAT than for the Network data, reflecting the more even station spacing within the network voids, and this produced a more detailed density raster.

NN also used a smaller search radius than for the Network data and is considerably more interpretable. NAN also performed reasonably well, displaying the "knots" as usual.

IDW was able to use a smaller search distance and Gaussian range, resulting in a more detailed raster. However, the hill slopes tend to shelf between network lines rather than vary smoothly.

The TRI raster is quite visually appealing and could be smoothed in post-processing. Once again, MINC produced a detailed and smooth raster and supported a smaller cell size than expected.

I think this shows how much your rasters can be improved if you are able to collect some widely (but regularly) spaced data in addition to the data collected along the network.

Summary

I hope this helps guide your decision making when designing surveys and creating rasters using either density analysis or interpolation.

In general, it is better to have an even or random distribution of samples (even if there are fewer overall) than a collection of samples that are closely spaced along lines but with the lines spaced further apart. In practice, surveys are often conducted as lines or networks out of necessity, and in those scenarios gridding is more difficult.

Minimum curvature is the stand-out interpolator in this comparison. It generally returns smooth results that honour the sample data values closely, but bear in mind that it is only suitable for small and moderately sized surveys.

The workflow I recommend is to use Stamp or Nearest Neighbour initially to help explore the dataset. Often, there are too many points to visualise in MapInfo or the points are not in a format that MapInfo will display. Both of these high-performance rasterization options can help you to discover the structure of your data so that you can plan your gridding strategy. Once I have a good understanding of the sample dataset, I then go to the appropriate gridding method. As often as not that turns out to be minimum curvature, or triangulation for huge LiDAR datasets.

------------------------------
Sam Roberts
Engineer, MapInfo Pro Advanced (Raster)
Australia
------------------------------

Attachment(s):

SampleDatasetTABFiles.zip (364 KB)

SourceRaster.zip (1.92 MB)

2. RE: What gridding method should I use? – Part 2 "The results"

Employee

Peter Møller
Posted 06-14-2019 05:55

Great article, @Sam Roberts, thanks for writing this up and sharing your insight.
Is it fair to say that Heat Map/Hot Spot in this context never results in a grid that is even close to the original grid? If so, could you elaborate on why?
Is it caused by the nature of the Heat Map/Hot Spot interpolation method?

Thanks

------------------------------
Peter Horsbøll Møller
Pitney Bowes
------------------------------

3. RE: What gridding method should I use? – Part 2 "The results"

Sam Roberts
Posted 06-14-2019 20:35

In this article I used the term "Heat Map" and this is a term we will standardise on for the next major release of Pro. There will be four different types of heat map - Heat Map (Estimate), Heat Map (Weighted Estimate), Heat Map (Sample Count) and Heat Map (Sample Density). There will also be a Heat Map (Advanced) option which will give you access to all available properties.

In 17.03 all of these are available through the "Hot Spot Density" gridding option which is currently the equivalent of what we will call "Heat Map Advanced" in the next version. To get one of the four different kinds of heat maps you need to set the properties appropriately.

In this article, I used the "Heat Map (Estimate)" technique. This is not an interpolation method and will not produce an estimate of the data values, and neither will any other heat map technique. It produces an estimate of the sample density across the survey. It tells you where samples are, not what values samples have.

The "Heat Map (Weighted Estimate)" technique uses a column from your sample data to bias the sample density. You might think that if you chose the terrain value column it would produce a raster that looks like the terrain, because each sample turns from being "one count" to "value counts". Try it. It does not. In fact, it produces a raster that looks almost exactly the same as the unbiased heat map, except that the magnitude of the cell values has changed. Mathematically, this is because the weighted sum of the sample values at each cell is not divided by the sum of the weights (as it is in inverse distance weighted).
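The difference is easiest to see side by side. In the Python sketch below both estimators share the same kernel weights. The triweight weight is an assumption for illustration (the IDW runs in this article used a Gaussian weight function), but the normalisation point does not depend on that choice. The weighted heat map only sums weight × value, whereas the IDW-style estimate divides that sum by the sum of the weights, which turns it into a weighted average of the sample values.

```python
def kernel_weight(d2: float, radius: float) -> float:
    """Unnormalised triweight weight for squared distance d2; zero outside the radius."""
    u2 = d2 / (radius * radius)
    return (1.0 - u2) ** 3 if u2 < 1.0 else 0.0

def weighted_heat_map(samples, x, y, radius):
    """Sum of weight * value: a sample count scaled by the values, not a value estimate."""
    return sum(kernel_weight((sx - x) ** 2 + (sy - y) ** 2, radius) * v
               for sx, sy, v in samples)

def weighted_average(samples, x, y, radius):
    """Sum of weight * value divided by the sum of weights (IDW-style normalisation)."""
    num = den = 0.0
    for sx, sy, v in samples:
        w = kernel_weight((sx - x) ** 2 + (sy - y) ** 2, radius)
        num += w * v
        den += w
    return num / den if den > 0.0 else None  # None where no sample is in range

# Hypothetical flat terrain: every sample has the value 100, but sampling is
# twice as dense around x = 0 (every 10 m) as around x = 1000 (every 20 m).
samples = [(x, 0, 100.0) for x in range(-100, 101, 10)]
samples += [(x, 0, 100.0) for x in range(900, 1101, 20)]

for x in (0, 1000):
    print(x, round(weighted_heat_map(samples, x, 0, 50), 2),
          round(weighted_average(samples, x, 0, 50), 2))
```

On a perfectly flat surface the weighted heat map still roughly doubles where sampling is twice as dense, while the normalised estimate returns 100 wherever it has data.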

So the answer is no - there is no way that a heat map operation can produce a raster that resembles the original raster and that is not its purpose. To reiterate - a heat map tells you where samples are, not what values samples have.
