MapInfo Pro

 View Only
  • 1.  What gridding method should I use? – Part 1 "The decision"

    Posted 06-13-2019 20:43
    Edited by Sam Roberts 07-19-2019 01:18

    In part 1 of this article I will outline some of the issues you ought to consider when gridding point data. In part 2 I will look at how the different gridding methods perform when trying to reconstruct a raster from a point dataset.

    Raster ribbon showing Create Raster options in 17.03
    You have a TAB file with point data, or a collection of LAS or LAZ files, or one or more ASCII files with delimited point data – and you want to grid this data to create a raster. To get a good result from a gridding operation you need to consider the following issues- 

    • What is the arrangement of my input source data?
    • How large is my source data set?
    • What methods produce the outcome I require?
    • Of those methods, which will produce the best raster for my data?
    • How should I design my output raster?
    • How do I get the optimal results from that method?
     Finally, you ought to give thought to the question –

    • What is my definition of a good result and how do I measure it?

     Source data arrangement

    There are six typical source data configurations.

    • Regular isotropic

    Points are arranged in a grid pattern with regular or semi-regular spacing in X and Y. Consider the sample spacing to be the smallest of the average distance between samples in the X direction and the Y direction.

    Regular Isotropic Samples

    • Regular anisotropic

    Points are collected at a regular sample spacing along parallel lines that are regularly spaced. Consider the sample spacing to be the distance between the lines. Generally, along lines the measurements will be oversampled.

    Regular Anisotropic Samples

    • Regular network

    Points are collected at a regular sample spacing along a network of interconnected non-parallel lines (for example following a road network). Consider the sample spacing to be the average distance between lines, which you may only be able to judge by visual inspection.

    Regular Network Sampling

    • Random

    Points are randomly located in space. A sample density analysis can help you to compute the average sample spacing and histogram statistics can give you a sense of the sample spacing distribution.

    Random Sampling

    • Random clustered

    Points are randomly located but occur in clusters where point density is elevated. A sample density analysis can help you to compute the average sample spacing and histogram statistics can give you a sense of the sample spacing distribution.

    Random Clustered Sampling

    • Mixed

    A mixture of two or more of the source data arrangements described above. These are the hardest kinds of datasets to grid, and are often considered multi-scale.

    Mixed Sampling

    Source data observations are almost always point samples. Note that MapInfo also supports vector samples (as polylines and polygons) which are decomposed into point samples that are regularly spaced along the polyline (or polygon exterior boundary).

    Source data size

     I classify any dataset with less than one million samples as "small", a dataset with up to 100 million samples "large" and a dataset with over 100 million samples "huge". Some methods are better than others at handling large or huge datasets and this factor alone may dictate how you grid your data. Note that some methods have been used on datasets of more than one billion points.

     I classify a raster as small if it contains up to 100 million cells, large if it contains up to 10 billion cells and huge if it contains more than that. At PB we have created rasters that contain in excess of one trillion cells.

     The performance of a gridding method may be primarily related to the number of data samples or the number of raster cells, or both.

     As a general rule, for methods that require point data to be searched at each grid cell use small source datasets and generate small rasters. This includes Heat map, Nearest neighbour, Natural neighbour and Inverse Distance Weighted. I recommend that you only generate small rasters using Minimum curvature. Triangulation and Stamp can be used to generate large rasters from huge datasets.

     

    Method styles

     There are four kinds of gridding operation available – 

    • Heat map 

    Generates an estimate of the density of data samples or events. These methods don't tell you what value your data has across the raster – they tell you how many source data samples you have spatially across the raster, or how many times some event has occurred over the raster space.
     

    • Assignment 

    Reflects the exact source data values without interpolation. These methods provide you with a rasterised picture of the exact source data values which you can use for exploring your source data or for further analysis.
     

    • Bounded Interpolation 

    Estimates source data values across the raster by interpolation without exceeding the statistical range of the source data values. These methods give you an estimate of a discontinuous measured parameter across the raster space.
     

    • Non-bounded Interpolation 

    Estimates source data values smoothly across the raster by interpolation, possibly exceeding the statistical range of the source data values. These methods give you an estimate of a continuous measured parameter or field across the raster space.

    We could add to this analytical methods like the "Distance" operation that populates the raster cells with the distance to the closest data sample. In a future version of Pro we will have a variety of LiDAR analysis operations to study tree canopy (coverage, density and height).

     

    Method selection

     Having determined the style of gridding required, you now need to determine exactly which method suits your needs. 

    • Heat map – estimate 

    Returns a representation of the spatial density of samples. 

    For example, you may have a dataset that records the location of car accidents in a city. A heat map will show you where the highest concentration of accidents occur and help you identify traffic "black spots" for remedial action.

    • Heat map – weighted estimate 

    Returns a representation of the spatial density of samples where a data value at the sample location records the number of events at that location and this is used to weight the density. 

    For example, you may have a dataset that records the location of car accidents in a city along with the number of people killed an injured in those accidents. A weighted heat map will show you where the highest concentration of human casualties are. Or, you may have a dataset that records the location of robberies in a city and at each sample you have an estimate of the value of the lost property. A weighted heat map will provide a representation of where the most financial damage is being inflicted by the criminal activity.
     

    • Heat map – sample count 

    Returns an exact count of the number of samples within the search ellipse centred on each raster cell.

    This measure may be more useful from a statistical analysis perspective than the smooth estimates of density returned by other methods.
     

    • Heat map – sample density 

    Returns the exact spatial density of samples within the search ellipse centred on each raster cell. 

    This is the same as the sample count method, except that the count is converted to a spatial density (based on the horizontal units of the raster).

    • Assignment – stamp 
    Populates raster cells that contain one or more data samples with a data value derived from those samples.

    For example, you can investigate the classification codes in a LiDAR dataset by choosing a small cell size and stamping the LiDAR returns into a raster. You can then explore the raster statistics or visually inspect the raster.

     

    • Assignment – nearest neighbour
    Populates raster cells with the value of the closest sample. In effect, this generates a Voronoi tessellation of the sample data.

    This is a fast way to investigate your source data samples by conversion to a more easily visualised raster.
     
    • Bounded Interpolation – Triangulation

    Generates a Delaunay triangulation from the sample points and populates raster cells by linear interpolation across the triangle that overlies the raster cell centre. If the cell size is small compared to the triangle size, the flat triangle surfaces will be readily visible in the raster.

    Smoothing can be achieved by filtering the output raster.
     

    • Bounded Interpolation – Natural Neighbour
    Starting from a Voronoi tessellation generated by nearest neighbour, this method considers a hypothetical Voronoi polygon centred on each raster cell and estimates the value at this cell by an area weighted sum of the overlap of this polygon with the nearest neighbour polygons.

    Only those samples that have some spatial correlation with the raster cell will be used to estimate the value of the raster cell value, and that estimate will be weighted by the degree of spatial correlation. The result is generally smooth.
     
    • Bounded Interpolation – Inverse Distance Weighted 
    Estimates the value of each raster cell by finding all sample values within a search ellipse and then summing the values, weighted by a function that decreases with distance from the cell centre.

    By shaping and rotating the search ellipse, the user can apply bias to the estimates that may be in alignment with known spatial correlation information. It can also be used to ensure that the sample arrangement does not affect the estimates. In general, the raster will either honour the data samples but not be smooth, or be smooth but not honour the data samples.
     
    • Non-bounded Interpolation – Minimum curvature 

    Fits a hypothetical smooth surface that minimises overall curvature through the sample locations and returns the value of this surface at each raster cell. The raster will honour the value at sample locations closely but, to maintain maximum smoothness, may produce undesired local overshoots or undershoots and may exceed the minimum and maximum sample values.

     Output raster design 

    The most important design issue is cell size. If you want the raster to closely honour the input sample values, you will need a cell size 4 – 5 times smaller than the sample spacing. Balance this against the dependence of the method performance on the number of raster cells. 

    For minimum curvature (and triangulation to a lesser extent) the cell size is critical to success. If the cell size is too small then minimum curvature will fail to interpolate smoothly between samples. Follow the 4-5 cells per sample spacing rule. 

    Sometimes source data samples in a regular anisotropic arrangement are located on lines that are an east-west or north-south in a local grid coordinate system but inclined in a map projection. For some methods (like minimum curvature or bi-directional splining), it can be better to grid the data in the local coordinate system. Thereafter, the raster will need to be warped to be displayed in a map coordinate system. This can be done by running a warp operation or using a virtual raster that implements a warp on-the-fly.

    Tips for optimal results from each method

    Heat map

    Algorithm performance is determined by the raster cell size and the size of the search ellipse. For high performance, choose a larger cell size and smaller search ellipse. Whilst a small search ellipse will capture more detail, but it may not be able to interpolate between samples leading to gaps in the raster coverage. Aim for a search ellipse approximately two times the sample spacing.

    To achieve smoothness, you need to select a kernel weighting function that tapers to zero. See Wikipedia for details - https://en.wikipedia.org/wiki/Kernel_(statistics). Use either "Quartic" or "Triweight". In our 2019 major release, we will add "Tricube" and also introduce a new kernel called "Gaussian Sharpened" that will allow you to use a larger search ellipse to ensure coverage and smoothness, whilst still capturing detail.

    To compute an estimate, select a tapered kernel and turn off "use input column as frequency". To compute a weighted estimate, select the column and turn on "use input column as frequency". To compute a sample count select the "Data count" kernel option and to compute sample density turn on "normalize by area".

    In general, you should not need to use smoothing or coincident point processing.

     

    Stamp

    Algorithm performance is determined by the size of the input dataset and less so by the raster size. You can choose a small cell size (to reflect the actual sample values) or a large cell size (to integrate sample values over a cell). The method is missing an option to count the number of samples in a cell (this will be added in the 2019 major release).

    In general, you should not use smoothing or coincident point processing.

     

    Nearest neighbour

    Algorithm performance is determined by the raster cell size and the size of the search radius. Raster cell coverage is determined by the search radius and the goal is to find that smallest search radius that provides you with the coverage that you require. Start small, and work upwards.

    To contain the raster on the edges of the sample space, you may want to use a clipping polygon.

     

    Triangulation

    Algorithm performance is determined by the number of input samples (which must be triangulated) and to a lesser extent by the raster cell size.

    You are required to set the "Maximum triangle size". Triangles with a side length longer than this will be excluded. This will prevent triangulation across large voids where there are no source data points. Also, it can prevent large triangles from being generated along the edges of the data set. If you set a very large triangle size (as large as the sample space) then you will generate a triangulation that extends to the convex hull of the samples for small datasets. For large or huge source datasets the convex hull is not guaranteed.

    To contain the raster on the edges of the sample space, you may want to use a clipping polygon.

    Coincident point processing is recommended.

     

    Natural Neighbour

    Algorithm performance is determined by the raster cell size and the size of the search radius. Raster cell coverage is determined by the search radius and the goal is to find that smallest search radius that provides you with the coverage that you require. Start small, and work upwards.

    To contain the raster on the edges of the sample space, you may want to use a clipping polygon.

    Because this implementation discretises the polygon area contributions by the raster cell size, the result is often not perfectly smooth. Some smoothing is recommended to counter this. Coincident point processing is recommended.

     

    Inverse Distance Weighted

    Algorithm performance is determined by the raster cell size and the size of the search ellipse. For high performance, choose a larger cell size and smaller search ellipse. Whilst a small search ellipse will capture more detail, but it may not be able to interpolate between samples leading to gaps in the raster coverage. Aim for a search ellipse approximately two times the sample spacing.

    The search ellipse can be carefully designed in both shape and orientation and it can be divided into sectors from which samples will be selected. This creates a powerful sample selection tool but it presupposes that the same selection rules will be suitable across the entire sample space. This method is susceptible to sample density variations and regions of dense sampling will overly influence the estimations around them. Using search sectors is a partial solution to this.

    The linear weighting model does not produce smooth rasters but is easy to control. Both the power and exponential models depend on the scale of the distance which makes them difficult to work with. In the 2019 major release we will introduce a new scaling parameter to improve the utility of these models. The Gaussian model can produce smooth rasters as long as the range parameter is carefully chosen. In all cases, experimentation is required and it may take many trial runs to tune the parameters appropriately. In the 2019 release we will offer new weighting functions that will produce smooth and more pleasing results.

    Coincident point processing and a clipping polygon are recommended. Smoothing may be necessary.

     

    Minimum curvature

    Algorithm performance is determined by the raster cell size and the number of iterations performed. Critically, once the raster is sufficiently large that it cannot be held in memory and needs to be swapped out to file, performance will worsen dramatically.

    A higher number of iterations will lead to a higher quality raster but iterations will cease once the change from one iteration to the next is lower than the "percentage change" value. To ensure iterations are performed, make this a very small number.

    "Spline tension" ranges from zero to one and defaults to zero. For many kinds of data it is appropriate to introduce a small amount of tension (for example 0.25). This will help control the raster where it is unconstrained.

    A clipping polygon is recommended. Smoothing is not recommended.

     

    What is a good result?

    If you have chosen an assignment method then a good result is largely determined by the design on your output raster. Primarily, make sure the cell size is appropriate to represent the source data.

    If you have chosen a heat map method then a good raster will strike some kind of balance between detail and smoothness. Where that balance lies is up to you. In general, smoothness will be achieved at the expense of detail.

    If you have chosen an interpolation method then a good raster will strike a balance between honouring the input data samples (meaning that if you query the value of the raster at a sample data location, it ought to return a result as close as possible to the source data value) and smoothness. Where that balance lies is up to you and depends on the how much confidence you have in the sample value correctness (or perhaps how much you value that correctness). You might value smoothness because a smooth raster tends to be easier to visualise and interpret.

     

    Capabilities matrices

    This table tells you if the performance of a gridding method is related to the number of source data samples or the number of raster cells, or both. 

     

    Samples

    Cells

    Comment

    Heat Map

     

    X

    Search radius critical

    Stamp

    X

     

    Cell size hardly matters

    Nearest Neighbour

     

    X

    Search radius critical

    Triangulation

    X

    X

    High performance interpolator

    Natural Neighbour

     

    X

    Search radius critical

    Inverse Distance

     

    X

    Search radius critical

    Minimum Curvature

     

    X

    Difficult to multi-thread, memory management critical

     

    This table tells you whether a gridding method is suitable for small, large or huge sample data collections. 

     

    Small

    Large

    Huge

    Comment

    Heat Map

    X

     

     

    Control the search ellipse

    Stamp

     

     

    X

     

    Nearest Neighbour

     

     

    X

    Limit the maximum search distance

    Triangulation

     

     

    X

     

    Natural Neighbour

     

    X

     

    Limit the maximum search distance

    Inverse Distance

    X

     

     

    Control the search ellipse

    Minimum Curvature

     

    X

     

    Carefully choose the cell size

     

    This table tells you what kinds of sample arrangements are handled well by each gridding method. It should be taken as a guide only. 

     

    Isotropic

    Anisotropic

    Network

    Random

    Clustered

    Mixed

    Heat Map

     

     

     

    X

    X

    X

    Stamp

    X

    X

    X

    X

    X

    X

    Nearest Neighbour

     

     

     

    X

    X

    X

    Triangulation

     

     

     

    X

    X

    X

    Natural Neighbour

    X

     

     

    X

    X

    X

    Inverse Distance

    X

     

     

    X

     

    X

    Minimum Curvature

    X

    X

    X

    X

    X

    X

     

    These tables provide commentary on each sample arrangement for each gridding method.

    Isotropic

    Comment

    Heat Map

    Sample density is uniform, unlikely use case.

    Stamp

    Locate the centre of the raster cells on the sample locations.

    Nearest Neighbour

    Simulates a stamped raster.

    Triangulation

    Triangles can flip orientation randomly.

    Natural Neighbour

    Simulates a smoothed stamped raster.

    Inverse Distance

    Search distance can be isotropic and minimised.

    Minimum Curvature

    Choose a cell size ¼ the sample separation.

     

    Anisotropic

    Comment

    Heat Map

    Sample density is regular, unlikely use case.

    Stamp

    Choose a cell size suitable for the along line sample spacing.

    Nearest Neighbour

    Tessellation between lines tend to be unsightly.

    Triangulation

    Long triangles between lines tend to be unsightly.

    Natural Neighbour

    Tessellation between lines tend to be unsightly.

    Inverse Distance

    Orient the search ellipse to straddle lines.

    Minimum Curvature

    Choose a cell size ¼ the line separation.

     

    Network

    Comment

    Heat Map

    Difficult to obtain a meaningful result.

    Stamp

    Choose a cell size suitable for the along line sample spacing.

    Nearest Neighbour

    Tessellation can change rapidly and tends to be unsightly.

    Triangulation

    Triangles can change rapidly and tend to be unsightly.

    Natural Neighbour

    Tessellation can change rapidly and tends to be unsightly.

    Inverse Distance

    Difficult to design an appropriate search ellipse.

    Minimum Curvature

    Difficult to choose an appropriate cell size.

     

    Random

    Comment

    Heat Map

    Choose a weighting function that tapers to create a smooth raster.

    Stamp

     

    Nearest Neighbour

    Fill voids with a large search distance, but watch runtime performance.

    Triangulation

    Control or create voids or the concave hull with maximum triangle size.

    Natural Neighbour

    Fill voids with a large search distance, but watch runtime performance.

    Inverse Distance

    Choose a weighting function that tapers to create a smooth raster.

    Minimum Curvature

    Difficult to choose an appropriate cell size.

     

    Clustered

    Comment

    Heat Map

    Choose a weighting function that tapers to create a smooth raster.

    Stamp

     

    Nearest Neighbour

    Fill voids with a large search distance, but watch runtime performance.

    Triangulation

    Control or create voids or the concave hull with maximum triangle size.

    Natural Neighbour

    Fill voids with a large search distance, but watch runtime performance.

    Inverse Distance

    Choose a weighting function that tapers to create a smooth raster.

    Minimum Curvature

    Difficult to choose an appropriate cell size.

     


    Mixed

    Comment

    Heat Map

    Choose a weighting function that tapers to create a smooth raster.

    Stamp

    Choose a cell size based on the smallest sample separation along lines.

    Nearest Neighbour

    Fill voids with a large search distance, but watch runtime performance.

    Triangulation

    Control or create voids or the concave hull with maximum triangle size.

    Natural Neighbour

    Fill voids with a large search distance, but watch runtime performance.

    Inverse Distance

    Choose a weighting function that tapers to create a smooth raster.

    Minimum Curvature

    Difficult to choose an appropriate cell size.

     

    Summary

    MapInfo Pro Advanced provides a selection of gridding methods that are suitable for a wide variety of data scientists. Some specialist methods are not provided – for example, bi-directional splining is fast and effective for line based potential field surveys and Kriging in its various forms is popular in the Geostatistics community.

    If you are working with demographic data you will probably use the Heat Map (Estimate and Weighted Estimate) methods.

    If you are working with LiDAR data you will probably use Triangulation, especially if the source datasets are large.

    If you are estimating a field (like temperature or a magnetic field for example) then Minimum curvature is likely to provide the best results and consider using some spline tension.

    If you are working with highly variable data where you want to get a sense of trends without seeing the high-frequency variations, then Inverse Distance Weighted may be a suitable choice.

    And don't forget that if you find a bug or want to see a feature implemented then let us know about it!

    Next article - What gridding method should I use? – Part 2 "The results"

    ------------------------------
    Sam Roberts
    Engineer, MapInfo Pro Advanced (Raster)
    Australia
    ------------------------------


  • 2.  RE: What gridding method should I use? – Part 1 "The decision"

    Employee
    Posted 06-17-2019 04:36
    Great resource, @Sam Roberts! Thanks for writing this up.

    For the readers out there, do make sure to also read the follow up article: What gridding method should I use? – Part 2 "The results"

    ------------------------------
    Peter Horsbøll Møller
    Pitney Bowes
    ------------------------------