In part 1 of this article I will outline some of the issues you ought to consider when gridding point data. In part 2 I will look at how the different gridding methods perform when trying to reconstruct a raster from a point dataset.
You have a TAB file with point data, or a collection of LAS or LAZ files, or one or more ASCII files with delimited point data – and you want to grid this data to create a raster. To get a good result from a gridding operation you need to consider the following issues:
- What is the arrangement of my input source data?
- How large is my source data set?
- What methods produce the outcome I require?
- Of those methods, which will produce the best raster for my data?
- How should I design my output raster?
- How do I get the optimal results from that method?
Finally, you ought to give thought to the question –
- What is my definition of a good result and how do I measure it?
Source data arrangement
There are six typical source data configurations.
- Isotropic: points are arranged in a grid pattern with regular or semi-regular spacing in X and Y. Consider the sample spacing to be the smaller of the average distance between samples in the X direction and in the Y direction.
- Anisotropic: points are collected at a regular sample spacing along parallel lines that are regularly spaced. Consider the sample spacing to be the distance between the lines. Along the lines, the measurements will generally be oversampled.
- Network: points are collected at a regular sample spacing along a network of interconnected, non-parallel lines (for example, following a road network). Consider the sample spacing to be the average distance between lines, which you may only be able to judge by visual inspection.
- Random: points are randomly located in space. A sample density analysis can help you compute the average sample spacing, and histogram statistics can give you a sense of the sample spacing distribution (a simple version of this analysis is sketched at the end of this section).
- Clustered: points are randomly located but occur in clusters where the point density is elevated. As for random data, a sample density analysis and histogram statistics will help you characterise the sample spacing and its distribution.
- Mixed: a mixture of two or more of the arrangements described above. These are the hardest kinds of datasets to grid and are often considered multi-scale.
Source data observations are almost always point samples. Note that MapInfo also supports vector samples (as polylines and polygons) which are decomposed into point samples that are regularly spaced along the polyline (or polygon exterior boundary).
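To make the density analysis idea concrete, here is a rough sketch in Python with NumPy and SciPy (purely illustrative; it has nothing to do with the MapInfo Pro implementation) that estimates the sample spacing from nearest-neighbour distances:

```python
# Illustrative sample-spacing analysis: nearest-neighbour distances for a point set.
import numpy as np
from scipy.spatial import cKDTree

def sample_spacing_stats(xy):
    """xy is an (n, 2) array of point coordinates in projected (e.g. metre) units."""
    tree = cKDTree(xy)
    dist, _ = tree.query(xy, k=2)        # k=2: the nearest point to each sample is itself
    nn = dist[:, 1]                      # nearest-neighbour distance per sample
    return {"mean": float(nn.mean()),
            "median": float(np.median(nn)),
            "p10": float(np.percentile(nn, 10)),
            "p90": float(np.percentile(nn, 90))}

# A narrow p10-p90 spread suggests near-regular spacing; a wide spread suggests
# clustered or mixed arrangements.
xy = np.random.default_rng(0).uniform(0, 1000, size=(5000, 2))
print(sample_spacing_stats(xy))
```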
Source data size
I classify any dataset with less than one million samples as "small", a dataset with up to 100 million samples as "large" and a dataset with over 100 million samples as "huge". Some methods are better than others at handling large or huge datasets and this factor alone may dictate how you grid your data. Note that some methods have been used on datasets of more than one billion points.
I classify a raster as small if it contains up to 100 million cells, large if it contains up to 10 billion cells and huge if it contains more than that. At PB we have created rasters that contain in excess of one trillion cells.
The performance of a gridding method may be primarily related to the number of data samples or the number of raster cells, or both.
As a general rule, for methods that require the point data to be searched at each grid cell, use small source datasets and generate small rasters. This includes Heat map, Nearest neighbour, Natural neighbour and Inverse Distance Weighted. I recommend that you only generate small rasters using Minimum curvature. Triangulation and Stamp can be used to generate large rasters from huge datasets.
Method styles
There are four kinds of gridding operation available –
- Density
Generates an estimate of the density of data samples or events. These methods don't tell you what value your data has across the raster – they tell you how many source data samples you have spatially across the raster, or how many times some event has occurred over the raster space.
- Assignment
Reflects the exact source data values without interpolation. These methods provide you with a rasterised picture of the exact source data values which you can use for exploring your source data or for further analysis.
- Bounded Interpolation
Estimates source data values across the raster by interpolation without exceeding the statistical range of the source data values. These methods give you an estimate of a discontinuous measured parameter across the raster space.
- Non-bounded Interpolation
Estimates source data values smoothly across the raster by interpolation, possibly exceeding the statistical range of the source data values. These methods give you an estimate of a continuous measured parameter or field across the raster space.
We could add to this analytical methods like the "Distance" operation that populates the raster cells with the distance to the closest data sample. In a future version of Pro we will have a variety of LiDAR analysis operations to study tree canopy (coverage, density and height).
Method selection
Having determined the style of gridding required, you now need to determine exactly which method suits your needs.
- Heat map – estimate
Returns a representation of the spatial density of samples.
For example, you may have a dataset that records the location of car accidents in a city. A heat map will show you where the highest concentration of accidents occurs and help you identify traffic "black spots" for remedial action.
- Heat map – weighted estimate
Returns a representation of the spatial density of samples where a data value at the sample location records the number of events at that location and this is used to weight the density.
For example, you may have a dataset that records the location of car accidents in a city along with the number of people killed and injured in those accidents. A weighted heat map will show you where the highest concentration of human casualties is. Or, you may have a dataset that records the location of robberies in a city and, at each sample, an estimate of the value of the lost property. A weighted heat map will provide a representation of where the most financial damage is being inflicted by the criminal activity.
- Heat map – sample count
Returns an exact count of the number of samples within the search ellipse centred on each raster cell.
This measure may be more useful from a statistical analysis perspective than the smooth estimates of density returned by other methods.
- Heat map – sample density
Returns the exact spatial density of samples within the search ellipse centred on each raster cell.
This is the same as the sample count method, except that the count is converted to a spatial density (based on the horizontal units of the raster).
- Assignment – stamp
Populates raster cells that contain one or more data samples with a data value derived from those samples.
For example, you can investigate the classification codes in a LiDAR dataset by choosing a small cell size and stamping the LiDAR returns into a raster. You can then explore the raster statistics or visually inspect the raster.
- Assignment – nearest neighbour
Populates raster cells with the value of the closest sample. In effect, this generates a Voronoi tessellation of the sample data.
This is a fast way to investigate your source data samples by conversion to a more easily visualised raster.
- Bounded Interpolation – Triangulation
Generates a Delaunay triangulation from the sample points and populates raster cells by linear interpolation across the triangle that overlies the raster cell centre. If the cell size is small compared to the triangle size, the flat triangle surfaces will be readily visible in the raster.
Smoothing can be achieved by filtering the output raster.
- Bounded Interpolation – Natural Neighbour
Starting from a Voronoi tessellation generated by nearest neighbour, this method considers a hypothetical Voronoi polygon centred on each raster cell and estimates the value at this cell by an area weighted sum of the overlap of this polygon with the nearest neighbour polygons.
Only those samples that have some spatial correlation with the raster cell will be used to estimate its value, and that estimate will be weighted by the degree of spatial correlation. The result is generally smooth.
- Bounded Interpolation – Inverse Distance Weighted
Estimates the value of each raster cell by finding all sample values within a search ellipse and then summing the values, weighted by a function that decreases with distance from the cell centre.
By shaping and rotating the search ellipse, the user can apply bias to the estimates that may be in alignment with known spatial correlation information. It can also be used to ensure that the sample arrangement does not affect the estimates. In general, the raster will either honour the data samples but not be smooth, or be smooth but not honour the data samples.
- Non-bounded Interpolation – Minimum curvature
Fits a hypothetical smooth surface that minimises overall curvature through the sample locations and returns the value of this surface at each raster cell. The raster will honour the value at sample locations closely but, to maintain maximum smoothness, may produce undesired local overshoots or undershoots and may exceed the minimum and maximum sample values.
Output raster design
The most important design issue is cell size. If you want the raster to closely honour the input sample values, you will need a cell size 4 – 5 times smaller than the sample spacing. Balance this against the dependence of the method performance on the number of raster cells.
For minimum curvature (and triangulation to a lesser extent) the cell size is critical to success. If the cell size is too small then minimum curvature will fail to interpolate smoothly between samples. Follow the 4-5 cells per sample spacing rule.
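As a concrete illustration of this rule of thumb, here is a small sketch (a hypothetical helper, not a product API) that derives a cell size and raster dimensions from a known sample spacing:

```python
# Apply the 4-5 cells per sample spacing rule of thumb to design an output raster.
def design_raster(x_min, x_max, y_min, y_max, sample_spacing, cells_per_spacing=4):
    cell_size = sample_spacing / cells_per_spacing
    n_cols = int(round((x_max - x_min) / cell_size))
    n_rows = int(round((y_max - y_min) / cell_size))
    return cell_size, n_cols, n_rows

# 50 m sample spacing over a 20 km x 10 km area gives 12.5 m cells, 1600 x 800 cells.
print(design_raster(0, 20_000, 0, 10_000, sample_spacing=50))
```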
Sometimes source data samples in a regular anisotropic arrangement are located on lines that run east-west or north-south in a local grid coordinate system but are inclined in a map projection. For some methods (like minimum curvature or bi-directional splining), it can be better to grid the data in the local coordinate system. Thereafter, the raster will need to be warped for display in a map coordinate system. This can be done by running a warp operation or by using a virtual raster that implements the warp on the fly.
Tips for optimal results from each method
Heat map
Algorithm performance is determined by the raster cell size and the size of the search ellipse. For high performance, choose a larger cell size and a smaller search ellipse. A small search ellipse will capture more detail, but it may not be able to interpolate between samples, leading to gaps in the raster coverage. Aim for a search ellipse approximately two times the sample spacing.
To achieve smoothness, you need to select a kernel weighting function that tapers to zero. See Wikipedia for details - https://en.wikipedia.org/wiki/Kernel_(statistics). Use either "Quartic" or "Triweight". In our 2019 major release, we will add "Tricube" and also introduce a new kernel called "Gaussian Sharpened" that will allow you to use a larger search ellipse to ensure coverage and smoothness, whilst still capturing detail.
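For reference, the quartic (also called biweight) and triweight kernels are standard statistical kernels. A small sketch of their shapes, where u is the distance from the cell centre divided by the search radius:

```python
# Standard quartic (biweight) and triweight kernel shapes; both taper smoothly to zero at u = 1.
import numpy as np

def quartic(u):
    u = np.clip(np.abs(u), 0.0, 1.0)
    return (15.0 / 16.0) * (1.0 - u ** 2) ** 2

def triweight(u):
    u = np.clip(np.abs(u), 0.0, 1.0)
    return (35.0 / 32.0) * (1.0 - u ** 2) ** 3

u = np.array([0.0, 0.5, 1.0])
print(quartic(u), triweight(u))   # weights fall from the centre to zero at the search radius
```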
To compute an estimate, select a tapered kernel and turn off "use input column as frequency". To compute a weighted estimate, select the column and turn on "use input column as frequency". To compute a sample count select the "Data count" kernel option and to compute sample density turn on "normalize by area".
In general, you should not need to use smoothing or coincident point processing.
Stamp
Algorithm performance is determined by the size of the input dataset and less so by the raster size. You can choose a small cell size (to reflect the actual sample values) or a large cell size (to integrate sample values over a cell). The method is missing an option to count the number of samples in a cell (this will be added in the 2019 major release).
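As a rough illustration of what stamping does with a larger cell size, the sketch below (Python with SciPy, illustrative only) averages all samples falling in each cell; swapping the statistic to "count" gives a per-cell sample count in the same spirit as the option mentioned above:

```python
# Minimal stamp-style gridding sketch: average the samples that fall in each cell.
import numpy as np
from scipy.stats import binned_statistic_2d

def stamp_grid(x, y, values, x_edges, y_edges, statistic="mean"):
    # statistic="mean" integrates sample values over a cell; "count" counts samples per cell.
    stat, _, _, _ = binned_statistic_2d(x, y, values, statistic=statistic,
                                        bins=[x_edges, y_edges])
    return stat.T    # transpose so rows correspond to y; cells with no samples are NaN
```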
In general, you should not use smoothing or coincident point processing.
Nearest neighbour
Algorithm performance is determined by the raster cell size and the size of the search radius. Raster cell coverage is determined by the search radius, and the goal is to find the smallest search radius that provides the coverage you require. Start small and work upwards.
To contain the raster on the edges of the sample space, you may want to use a clipping polygon.
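For intuition, a Voronoi-style nearest-neighbour assignment with a maximum search radius can be sketched as follows (Python with SciPy, illustrative only; the parameter names are mine, not the product's):

```python
# Illustrative nearest-neighbour (Voronoi-style) assignment with a maximum search radius.
import numpy as np
from scipy.spatial import cKDTree

def nearest_neighbour_grid(xy, values, x_edges, y_edges, max_radius):
    xc = 0.5 * (x_edges[:-1] + x_edges[1:])          # cell centre coordinates
    yc = 0.5 * (y_edges[:-1] + y_edges[1:])
    gx, gy = np.meshgrid(xc, yc)
    centres = np.column_stack([gx.ravel(), gy.ravel()])
    dist, idx = cKDTree(xy).query(centres, distance_upper_bound=max_radius)
    grid = np.full(centres.shape[0], np.nan)
    hit = np.isfinite(dist)                          # cells with a sample inside the radius
    grid[hit] = values[idx[hit]]
    return grid.reshape(gy.shape)
```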
Triangulation
Algorithm performance is determined by the number of input samples (which must be triangulated) and to a lesser extent by the raster cell size.
You are required to set the "Maximum triangle size". Triangles with a side length longer than this will be excluded. This will prevent triangulation across large voids where there are no source data points. Also, it can prevent large triangles from being generated along the edges of the data set. If you set a very large triangle size (as large as the sample space) then you will generate a triangulation that extends to the convex hull of the samples for small datasets. For large or huge source datasets the convex hull is not guaranteed.
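The effect of the maximum triangle size is easy to picture: build the triangulation, then drop any triangle whose longest side exceeds the threshold before interpolating. A sketch using SciPy's Delaunay triangulation (illustrative only, not the product algorithm):

```python
# Illustrative Delaunay gridding with a "maximum triangle size" style filter.
import numpy as np
from scipy.spatial import Delaunay

def long_triangle_mask(xy, tri, max_side):
    """True for triangles whose longest side exceeds max_side (to be excluded)."""
    p = xy[tri.simplices]                                  # (n_triangles, 3, 2)
    sides = np.stack([np.linalg.norm(p[:, 0] - p[:, 1], axis=1),
                      np.linalg.norm(p[:, 1] - p[:, 2], axis=1),
                      np.linalg.norm(p[:, 2] - p[:, 0], axis=1)], axis=1)
    return sides.max(axis=1) > max_side

# Triangles flagged by the mask would be excluded from the interpolation,
# leaving voids and trimming long slivers along the edges of the dataset.
xy = np.random.default_rng(1).uniform(0, 100, size=(200, 2))
tri = Delaunay(xy)
mask = long_triangle_mask(xy, tri, max_side=15.0)
print(mask.sum(), "of", len(tri.simplices), "triangles would be excluded")
```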
To contain the raster on the edges of the sample space, you may want to use a clipping polygon.
Coincident point processing is recommended.
Natural Neighbour
Algorithm performance is determined by the raster cell size and the size of the search radius. Raster cell coverage is determined by the search radius, and the goal is to find the smallest search radius that provides the coverage you require. Start small and work upwards.
To contain the raster on the edges of the sample space, you may want to use a clipping polygon.
Because this implementation discretises the polygon area contributions by the raster cell size, the result is often not perfectly smooth. Some smoothing is recommended to counter this. Coincident point processing is recommended.
Inverse Distance Weighted
Algorithm performance is determined by the raster cell size and the size of the search ellipse. For high performance, choose a larger cell size and a smaller search ellipse. A small search ellipse will capture more detail, but it may not be able to interpolate between samples, leading to gaps in the raster coverage. Aim for a search ellipse approximately two times the sample spacing.
The search ellipse can be carefully designed in both shape and orientation and it can be divided into sectors from which samples will be selected. This creates a powerful sample selection tool but it presupposes that the same selection rules will be suitable across the entire sample space. This method is susceptible to sample density variations and regions of dense sampling will overly influence the estimations around them. Using search sectors is a partial solution to this.
The linear weighting model does not produce smooth rasters but is easy to control. Both the power and exponential models depend on the scale of the distance which makes them difficult to work with. In the 2019 major release we will introduce a new scaling parameter to improve the utility of these models. The Gaussian model can produce smooth rasters as long as the range parameter is carefully chosen. In all cases, experimentation is required and it may take many trial runs to tune the parameters appropriately. In the 2019 release we will offer new weighting functions that will produce smooth and more pleasing results.
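To get a feel for how these weighting models behave, here is an illustrative inverse distance weighted sketch (Python with SciPy; the product's exact formulas and parameter names may differ, and a circular search radius stands in for the full search ellipse):

```python
# Illustrative IDW with interchangeable weighting models and a circular search radius.
import numpy as np
from scipy.spatial import cKDTree

def w_linear(d, radius):
    return np.clip(1.0 - d / radius, 0.0, None)        # simple taper, not smooth at the edge

def w_power(d, p=2.0):
    return 1.0 / np.maximum(d, 1e-12) ** p              # classic 1/d^p, sensitive to distance scale

def w_gaussian(d, rng):
    return np.exp(-(d / rng) ** 2)                       # smooth if the range 'rng' suits the data

def idw_grid(xy, values, x_edges, y_edges, radius, weight):
    xc = 0.5 * (x_edges[:-1] + x_edges[1:])
    yc = 0.5 * (y_edges[:-1] + y_edges[1:])
    gx, gy = np.meshgrid(xc, yc)
    centres = np.column_stack([gx.ravel(), gy.ravel()])
    tree = cKDTree(xy)
    out = np.full(centres.shape[0], np.nan)
    for i, c in enumerate(centres):
        idx = tree.query_ball_point(c, radius)           # samples inside the search circle
        if not idx:
            continue                                      # no coverage: leave the cell empty
        d = np.linalg.norm(xy[idx] - c, axis=1)
        w = weight(d)
        s = w.sum()
        if s > 0:
            out[i] = np.sum(w * values[idx]) / s
    return out.reshape(gy.shape)

# e.g. idw_grid(xy, z, x_edges, y_edges, radius=100.0, weight=lambda d: w_gaussian(d, 50.0))
```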
Coincident point processing and a clipping polygon are recommended. Smoothing may be necessary.
Minimum curvature
Algorithm performance is determined by the raster cell size and the number of iterations performed. Critically, once the raster is sufficiently large that it cannot be held in memory and needs to be swapped out to file, performance will worsen dramatically.
A higher number of iterations will lead to a higher quality raster but iterations will cease once the change from one iteration to the next is lower than the "percentage change" value. To ensure iterations are performed, make this a very small number.
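The stopping rule itself is easy to picture. The toy sketch below is not the minimum curvature solver (it just applies a simple neighbour-averaging update while holding cells that contain samples fixed), but it shows how a percentage-change threshold ends the iteration:

```python
# Toy relaxation loop, only to illustrate the "percentage change" stopping rule.
# This is NOT the minimum curvature algorithm, just a simple smoothing update.
import numpy as np

def relax(grid, fixed_mask, max_iterations=1000, percentage_change=0.01):
    grid = grid.copy()
    for iteration in range(1, max_iterations + 1):
        new = grid.copy()
        # Replace interior cells with the average of their four neighbours.
        new[1:-1, 1:-1] = 0.25 * (grid[:-2, 1:-1] + grid[2:, 1:-1] +
                                  grid[1:-1, :-2] + grid[1:-1, 2:])
        new[fixed_mask] = grid[fixed_mask]               # hold cells that contain samples
        change = 100.0 * np.abs(new - grid).max() / (np.abs(grid).max() + 1e-12)
        grid = new
        if change < percentage_change:                   # a very small threshold forces more iterations
            break
    return grid, iteration
```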
"Spline tension" ranges from zero to one and defaults to zero. For many kinds of data it is appropriate to introduce a small amount of tension (for example 0.25). This will help control the raster where it is unconstrained.
A clipping polygon is recommended. Smoothing is not recommended.
What is a good result?
If you have chosen an assignment method then a good result is largely determined by the design of your output raster. Primarily, make sure the cell size is appropriate to represent the source data.
If you have chosen a heat map method then a good raster will strike some kind of balance between detail and smoothness. Where that balance lies is up to you. In general, smoothness will be achieved at the expense of detail.
If you have chosen an interpolation method then a good raster will strike a balance between honouring the input data samples (meaning that if you query the value of the raster at a sample data location, it ought to return a result as close as possible to the source data value) and smoothness. Where that balance lies is up to you and depends on how much confidence you have in the correctness of the sample values (or perhaps how much you value that correctness). You might value smoothness because a smooth raster tends to be easier to visualise and interpret.
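One simple way to put a number on how well a raster honours the input samples is to read the raster back at each sample location and examine the residuals. A sketch, assuming arrays of cell edges that describe the raster geometry:

```python
# Sketch: quantify how closely a gridded raster honours the input samples.
import numpy as np

def residual_stats(grid, x_edges, y_edges, xy, values):
    """Compare the value of the cell containing each sample with the sample value."""
    col = np.clip(np.searchsorted(x_edges, xy[:, 0]) - 1, 0, grid.shape[1] - 1)
    row = np.clip(np.searchsorted(y_edges, xy[:, 1]) - 1, 0, grid.shape[0] - 1)
    residuals = grid[row, col] - values
    residuals = residuals[np.isfinite(residuals)]        # ignore samples over empty cells
    return {"rmse": float(np.sqrt(np.mean(residuals ** 2))),
            "max_abs": float(np.abs(residuals).max())}
```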
Capabilities matrices
This table tells you if the performance of a gridding method is related to the number of source data samples or the number of raster cells, or both.
| Method | Samples | Cells | Comment |
| --- | --- | --- | --- |
| Heat Map | | X | Search radius critical |
| Stamp | X | | Cell size hardly matters |
| Nearest Neighbour | | X | Search radius critical |
| Triangulation | X | X | High performance interpolator |
| Natural Neighbour | | X | Search radius critical |
| Inverse Distance | | X | Search radius critical |
| Minimum Curvature | | X | Difficult to multi-thread, memory management critical |
This table tells you whether a gridding method is suitable for small, large or huge sample data collections.
| Method | Small | Large | Huge | Comment |
| --- | --- | --- | --- | --- |
| Heat Map | X | | | Control the search ellipse |
| Stamp | | | X | |
| Nearest Neighbour | | | X | Limit the maximum search distance |
| Triangulation | | | X | |
| Natural Neighbour | | X | | Limit the maximum search distance |
| Inverse Distance | X | | | Control the search ellipse |
| Minimum Curvature | | X | | Carefully choose the cell size |
This table tells you what kinds of sample arrangements are handled well by each gridding method. It should be taken as a guide only.
| Method | Isotropic | Anisotropic | Network | Random | Clustered | Mixed |
| --- | --- | --- | --- | --- | --- | --- |
| Heat Map | | | | X | X | X |
| Stamp | X | X | X | X | X | X |
| Nearest Neighbour | | | | X | X | X |
| Triangulation | | | | X | X | X |
| Natural Neighbour | X | | | X | X | X |
| Inverse Distance | X | | | X | | X |
| Minimum Curvature | X | X | X | X | X | X |
These tables provide commentary on each sample arrangement for each gridding method.
| Isotropic | Comment |
| --- | --- |
| Heat Map | Sample density is uniform, unlikely use case. |
| Stamp | Locate the centre of the raster cells on the sample locations. |
| Nearest Neighbour | Simulates a stamped raster. |
| Triangulation | Triangles can flip orientation randomly. |
| Natural Neighbour | Simulates a smoothed stamped raster. |
| Inverse Distance | Search distance can be isotropic and minimised. |
| Minimum Curvature | Choose a cell size ¼ the sample separation. |
| Anisotropic | Comment |
| --- | --- |
| Heat Map | Sample density is regular, unlikely use case. |
| Stamp | Choose a cell size suitable for the along line sample spacing. |
| Nearest Neighbour | Tessellation between lines tends to be unsightly. |
| Triangulation | Long triangles between lines tend to be unsightly. |
| Natural Neighbour | Tessellation between lines tends to be unsightly. |
| Inverse Distance | Orient the search ellipse to straddle lines. |
| Minimum Curvature | Choose a cell size ¼ the line separation. |
| Network | Comment |
| --- | --- |
| Heat Map | Difficult to obtain a meaningful result. |
| Stamp | Choose a cell size suitable for the along line sample spacing. |
| Nearest Neighbour | Tessellation can change rapidly and tends to be unsightly. |
| Triangulation | Triangles can change rapidly and tend to be unsightly. |
| Natural Neighbour | Tessellation can change rapidly and tends to be unsightly. |
| Inverse Distance | Difficult to design an appropriate search ellipse. |
| Minimum Curvature | Difficult to choose an appropriate cell size. |
| Random | Comment |
| --- | --- |
| Heat Map | Choose a weighting function that tapers to create a smooth raster. |
| Stamp | |
| Nearest Neighbour | Fill voids with a large search distance, but watch runtime performance. |
| Triangulation | Control voids and the concave hull with the maximum triangle size. |
| Natural Neighbour | Fill voids with a large search distance, but watch runtime performance. |
| Inverse Distance | Choose a weighting function that tapers to create a smooth raster. |
| Minimum Curvature | Difficult to choose an appropriate cell size. |
| Clustered | Comment |
| --- | --- |
| Heat Map | Choose a weighting function that tapers to create a smooth raster. |
| Stamp | |
| Nearest Neighbour | Fill voids with a large search distance, but watch runtime performance. |
| Triangulation | Control voids and the concave hull with the maximum triangle size. |
| Natural Neighbour | Fill voids with a large search distance, but watch runtime performance. |
| Inverse Distance | Choose a weighting function that tapers to create a smooth raster. |
| Minimum Curvature | Difficult to choose an appropriate cell size. |
| Mixed | Comment |
| --- | --- |
| Heat Map | Choose a weighting function that tapers to create a smooth raster. |
| Stamp | Choose a cell size based on the smallest sample separation along lines. |
| Nearest Neighbour | Fill voids with a large search distance, but watch runtime performance. |
| Triangulation | Control voids and the concave hull with the maximum triangle size. |
| Natural Neighbour | Fill voids with a large search distance, but watch runtime performance. |
| Inverse Distance | Choose a weighting function that tapers to create a smooth raster. |
| Minimum Curvature | Difficult to choose an appropriate cell size. |
Summary
MapInfo Pro Advanced provides a selection of gridding methods that are suitable for a wide variety of data scientists. Some specialist methods are not provided – for example, bi-directional splining (fast and effective for line-based potential field surveys) and Kriging in its various forms (popular in the geostatistics community).
If you are working with demographic data you will probably use the Heat Map (Estimate and Weighted Estimate) methods.
If you are working with LiDAR data you will probably use Triangulation, especially if the source datasets are large.
If you are estimating a field (temperature or a magnetic field, for example) then Minimum curvature is likely to provide the best results; consider using some spline tension.
If you are working with highly variable data where you want to get a sense of trends without seeing the high-frequency variations, then Inverse Distance Weighted may be a suitable choice.
And don't forget that if you find a bug or want to see a feature implemented then let us know about it!
Next article - What gridding method should I use? – Part 2 "The results"
------------------------------
Sam Roberts
Engineer, MapInfo Pro Advanced (Raster)
Australia
------------------------------