Last week, I discussed cluster analysis in MapInfo Pro using MapBasic. Even though it was fun to try this out using MapBasic, it may not be very efficient, especially when you consider that you can easily use existing clustering methods in Python.
Let me show you how easy it is.
Happy #MapInfoMonday!
Using the K-Means Cluster Algorithm in Python
I started by asking my new friend, Copilot:
Write a Python script to load data from a csv file with locations (lat and long) and cluster this data using k-means
This was my first time asking Copilot a question; until now, I had used ChatGPT, so I suspect I need to tweak my prompts a bit. Anyway, after some small talk back and forth, Copilot gave me this Python code:
This code, however, does more than cluster the sample data. It also creates a small graph that illustrates the clustered data.
I copied the K-Means code and inserted it into the SQL Window in MapInfo Pro. I modified it slightly to not use hardcoded file names and added a call that exports the result to a CSV file.
As you can see, it only takes 6 lines of Python code to load data from a CSV file, cluster it, and save the result back out to another CSV file.
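A load, cluster, and save step of this kind might look like the minimal sketch below. It is not the exact code Copilot produced. The function name `cluster_csv`, the column names `lat` and `long`, and the output column `cluster` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_csv(csv_in, csv_out, n_clusters,
                lat_col="lat", lon_col="long", cluster_col="cluster"):
    """Load points from csv_in, cluster them with k-means, and write csv_out."""
    data = pd.read_csv(csv_in)
    # A fixed random_state keeps the cluster numbering repeatable between runs.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    # fit_predict returns one cluster label (0 .. n_clusters - 1) per row.
    data[cluster_col] = model.fit_predict(data[[lat_col, lon_col]])
    data.to_csv(csv_out, index=False)
    return data
```

Note that k-means measures straight-line distance in the units of the input columns, so for data covering a large area, projected coordinates may give more sensible clusters than raw latitude and longitude.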
That was easy. Now we only need to add some preprocessing and postprocessing.
I start by defining a few variables to hold the values relevant for the process, such as the table name, column names for the unique ID in the data, the cluster ID column, and the desired number of clusters.
I also define a folder where I want to store the temporary data and set a name for the CSV file for the exported data.
Finally, I query my table to extract the ID/name and centroid coordinates. This is all I use when clustering the data points. The query result is exported to the CSV file.
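The preprocessing described above could be sketched roughly as follows. The table name, column names, folder, and the helper `build_export_statements` are hypothetical. The MapBasic statements are my assumption of what such a query and export might look like and should be checked against the MapBasic reference. Inside MapInfo Pro, each statement would be passed to the do method; here they are only printed, so the sketch also runs outside MapInfo Pro.

```python
from os.path import join


def build_export_statements(table, id_col, csv_path, temp_table="__cluster_input"):
    """Return MapBasic statements that export the ID and centroid coordinates."""
    # Query the ID plus the centroid of each object into a temporary table.
    select_stmt = (
        f'Select {id_col}, CentroidX(obj) "long", CentroidY(obj) "lat" '
        f"From {table} Into {temp_table} NoSelect"
    )
    # Export the query result to a CSV file with column titles in the first row.
    export_stmt = (
        f'Export {temp_table} Into "{csv_path}" Type "CSV" Titles Overwrite'
    )
    return [select_stmt, export_stmt]


# Hypothetical values; replace them with your own table, columns, and folder.
table_name = "Customers"
id_column = "CustomerID"
cluster_column = "ClusterID"
num_clusters = 5
temp_folder = r"C:\Temp"
csv_input = join(temp_folder, "cluster_input.csv")

for stmt in build_export_statements(table_name, id_column, csv_input):
    print(stmt)  # inside MapInfo Pro: do(stmt)
```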
After this, the cluster code shown above will be executed, creating the resulting CSV file.
With the result saved to a CSV file, we need to load this back into MapInfo Pro and update the existing table with the resulting cluster information.
I created this as a Python method that can be called with the necessary parameters: the name of the CSV file, the table name, and the two columns for the ID and the cluster information.
I create a TAB file referencing the CSV file using the MapBasic Register Table statement. As you can see here and earlier, in some cases, the code executed is MapBasic code. This is executed from Python using the do method.
Then I open this table in MapInfo Pro.
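Registering and opening the CSV file might be sketched like this. The helper `register_and_open`, the alias, and the file path are hypothetical, and the `run` parameter stands in for the do method available inside MapInfo Pro. I use a MapBasic Open Table statement here as an assumption; the original script may open the table differently.

```python
from os.path import splitext


def register_and_open(csv_path, alias, run):
    """Create a TAB file for a comma-separated file and open it under alias."""
    tab_path = splitext(csv_path)[0] + ".tab"
    # Delimiter 44 is the character code for a comma; Titles means the
    # first row of the file holds the column names.
    run(f'Register Table "{csv_path}" Type ASCII Delimiter 44 Titles Into "{tab_path}"')
    run(f'Open Table "{tab_path}" As {alias}')
    return tab_path


# Outside MapInfo Pro, collect the statements instead of executing them.
sent = []
register_and_open("C:/Temp/cluster_output.csv", "cluster_result", sent.append)
print("\n".join(sent))
```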
With the table opened, I can now update the original table with the cluster ID.
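One way to express such an update is a MapBasic Add Column statement that joins the two tables on the ID column. This is a sketch under assumptions, not necessarily the statement used in the original script. It assumes the cluster column already exists in the original table and that the ID column has the same name in both tables. All table and column names are hypothetical.

```python
def build_update_statement(table, id_col, cluster_col, result_table,
                           result_cluster_col="cluster"):
    """Return a MapBasic statement that copies cluster IDs into table."""
    # The Where clause joins the destination column to the source column.
    return (
        f"Add Column {table} ({cluster_col}) "
        f"From {result_table} Set To {result_cluster_col} "
        f"Where {id_col} = {id_col}"
    )


statement = build_update_statement(
    "Customers", "CustomerID", "ClusterID", "cluster_result"
)
print(statement)  # inside MapInfo Pro: do(statement)
```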
When you run an install_module command like the ones shown below, the module is downloaded and installed through pip, the package installer for Python.
If the module has already been installed, you will instead see lines like these in the message window:
Requirement already satisfied: scikit-learn in c:\mapinfo\pro\v23.1\python310\lib\site-packages (1.6.1)
Requirement already satisfied: numpy>=1.19.5 in c:\mapinfo\pro\v23.1\python310\lib\site-packages (from scikit-learn) (1.24.3)
Requirement already satisfied: scipy>=1.6.0 in c:\mapinfo\pro\v23.1\python310\lib\site-packages (from scikit-learn) (1.11.1)
Requirement already satisfied: joblib>=1.2.0 in c:\mapinfo\pro\v23.1\python310\lib\site-packages (from scikit-learn) (1.5.1)
Requirement already satisfied: threadpoolctl>=3.1.0 in c:\mapinfo\pro\v23.1\python310\lib\site-packages (from scikit-learn) (3.6.0)
I am using several Python modules. I wasn't sure whether these are deployed automatically with Python when MapInfo Pro is installed, so on the first run of my code, I installed them:
install_module('pandas')
install_module('matplotlib')
install_module('scikit-learn')
After the initial run, I added a hash (#) in front of each of these three lines, as the modules were now installed and I no longer needed to install them. The hash marks a line as a comment in Python, meaning it won't be executed.
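As an alternative to commenting the lines out by hand, you could check which modules are missing before installing anything. The sketch below uses only the Python standard library. The helper `missing_packages` and the `REQUIRED` mapping are my own illustration. Note that scikit-learn is installed under that name but imported as `sklearn`.

```python
from importlib.util import find_spec

# pip package name -> name used in the import statement
REQUIRED = {
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
}


def missing_packages(required=None):
    """Return the pip names of packages that cannot be imported yet."""
    required = REQUIRED if required is None else required
    return [pip_name for pip_name, import_name in required.items()
            if find_spec(import_name) is None]


for name in missing_packages():
    print(f"Not installed yet: {name}")  # inside MapInfo Pro: install_module(name)
```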
Finally, you also need to import the modules you are using in your source code. These are my imports:
import os
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from os.path import join, dirname, abspath
from MapInfo.Types.Data import OpenTableFlags
You can probably skip matplotlib, as I ended up not using this module. The graph it creates can also be seen in my map window using a thematic map on the cluster ID.
I have also included the script as a MapInfo Script file (MIS) in the attached ZIP file.
I hope you find this useful and that you can see how utilizing the Python support within MapInfo Pro significantly expands the possibilities.
------------------------------
Peter Horsbøll Møller
Principal Presales Consultant | Distinguished Engineer
Precisely | Trust in Data
------------------------------