Last week, I was asked a question by @yogesh pawar from our Technical Support team:
A customer was looking for a method to divide points within a polygon into groups. The groups should consist of the same number of points, around 10 in this case. The goal was to automate the redistricting process.
This is often referred to as cluster analysis, where you create groups of records. It can be used for sales district optimization, for school districts, and for a wide range of other use cases.
RouteFinder for MapInfo from Routeware has a dedicated tool for this that supports both straight-line analysis and routing analysis, where you take a road network into consideration.
One of the methods for performing this analysis is k-means clustering. You can find Python implementations of this in the Python module scikit-learn. I'll come back to that in a later article.
In this article, I'll look at a very basic approach I took when trying to divide points into a given number of clusters.
Happy #MapBasicMonday.
Finding the Starting Point
For my basic approach to cluster analysis, I decided to use the extent of the data as a starting point. Around a dataset, you can draw a minimum bounding rectangle (MBR). I will let the tool assign a location on the MBR as the starting point. This could be the center of the MBR, the four corners, the midpoints of the four sides, or one specific corner.
With the four sides or the four corners option, the starting point changes from one cluster to the next as I go through finding points for each cluster.
If I use the center or a specific corner, I use the same position on the MBR every time. The actual location may still vary a bit, because the MBR is recalculated from the points that haven't yet been assigned to a cluster.
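To make the options above concrete, here is a minimal Python sketch of how a starting point could be picked from the MBR of the points that are still unassigned. It is not the MapBasic tool itself. The function names, the mode names ("center", "corner", "corners", "sides"), the choice of the lower-left corner as the fixed corner, and the order in which corners and sides are cycled are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def mbr(points: List[Point]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for a non-empty list of points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def starting_point(points: List[Point], mode: str = "center",
                   cluster_index: int = 0) -> Point:
    """Pick a location on the MBR of the still-unassigned points.

    mode is a hypothetical option name:
      "center"  - center of the MBR, same position every time
      "corner"  - one fixed corner (lower-left is assumed here)
      "corners" - cycle through the four corners, one per cluster
      "sides"   - cycle through the midpoints of the four sides
    """
    min_x, min_y, max_x, max_y = mbr(points)
    mid_x = (min_x + max_x) / 2
    mid_y = (min_y + max_y) / 2

    # Assumed cycling order: counter-clockwise, starting lower-left / bottom.
    corners = [(min_x, min_y), (max_x, min_y), (max_x, max_y), (min_x, max_y)]
    sides = [(mid_x, min_y), (max_x, mid_y), (mid_x, max_y), (min_x, mid_y)]

    if mode == "center":
        return (mid_x, mid_y)
    if mode == "corner":
        return corners[0]
    if mode == "corners":
        return corners[cluster_index % 4]
    if mode == "sides":
        return sides[cluster_index % 4]
    raise ValueError(f"Unknown mode: {mode}")


# The list passed in should hold only points not yet assigned to a cluster,
# so the MBR (and with it the starting point) shifts as clusters are built.
unassigned = [(0, 0), (10, 0), (10, 4), (0, 4), (5, 2)]
first_start = starting_point(unassigned, "corners", cluster_index=0)
second_start = starting_point(unassigned, "corners", cluster_index=1)
```

Because the function only ever sees the unassigned points, even the "center" and "corner" modes return slightly different coordinates as the remaining data shrinks.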
From the selected starting point, I create a buffer to find nearby points.
The size of the buffer is also related to the size of the MBR. I have set it to be 1/20 of the width of the MBR.
The buffer size is increased until at least one point is found. This point, or these points, are the starting points for the cluster.
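The growing buffer can be sketched in Python as a simple distance search. This is only an illustration under a few assumptions: the buffer is treated as a circle around the starting point, the radius grows by the same step (1/20 of the MBR width) on each pass, and distances are plain Euclidean distances in projected coordinates. The article does not say how the buffer is enlarged, and the name `find_seed_points` is made up for this example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def find_seed_points(points: List[Point], start: Point,
                     fraction: float = 1 / 20) -> Tuple[List[Point], float]:
    """Grow a circular buffer around start until it contains a point.

    Returns the points found and the radius at which they were found.
    """
    if not points:
        return [], 0.0

    # Initial radius: a fraction of the MBR width of the given points.
    xs = [x for x, _ in points]
    step = (max(xs) - min(xs)) * fraction

    if step == 0:
        # Assumed fallback for a zero-width MBR (all points share one x):
        # use the distance to the nearest point, or 1.0 if that is also 0.
        step = min(math.dist(start, p) for p in points) or 1.0

    radius = step
    while True:
        found = [p for p in points if math.dist(start, p) <= radius]
        if found:
            return found, radius
        # Assumption: enlarge the buffer by one more step on each pass.
        radius += step


unassigned = [(0, 0), (20, 0), (20, 10), (0, 10), (3, 0)]
seeds, radius = find_seed_points(unassigned, start=(10, 5))
```

With a small step the search may return a single point, while a larger step can return several at once; either way, the result is the seed for the cluster.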
------------------------------
Peter Horsbøll Møller
Principal Presales Consultant | Distinguished Engineer
Precisely | Trust in Data
------------------------------