MapInfo Monday: Cluster Analysis in MapInfo Pro using Python


Peter Møller
Posted 06-09-2025 03:48
Edited by Peter Møller 06-09-2025 03:56

Last week, I discussed cluster analysis in MapInfo Pro using MapBasic. Even though it was fun to try this out using MapBasic, it may not be very efficient, especially when you consider that you can easily use existing clustering methods in Python.

Let me show you how easy it is.

Happy #MapInfoMonday!

Using the K-Means Cluster Algorithm in Python

I started by asking my new friend, Copilot:

Write a Python script to load data from a csv file with locations (lat and long) and cluster this data using k-means

So, this was my first time asking Copilot a question. Earlier, I used ChatGPT. I suspect I need to tweak my prompts a bit. Anyway, after some small talk back and forth, Copilot gave me this Python code:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Load the data from the CSV file
file_path = 'locations.csv'
data = pd.read_csv(file_path)

# Extract the latitude and longitude columns
X = data[['LAT', 'LONG']]

# Perform k-means clustering
kmeans = KMeans(n_clusters=3, random_state=0).fit(X)
data['Cluster'] = kmeans.labels_

# Plot the clusters
plt.figure(figsize=(10, 6))
plt.scatter(data['LAT'], data['LONG'], c=data['Cluster'], cmap='viridis', marker='o')
plt.xlabel('Latitude')
plt.ylabel('Longitude')
plt.title('K-Means Clustering of Locations')
plt.colorbar(label='Cluster')
plt.show()

This code, however, does more than cluster the sample data. It also creates a small graph that illustrates the clustered data.

I copied the K-Means code and inserted it into the SQL Window in MapInfo Pro. I modified it slightly to not use hardcoded file names and added a call that exports the result to a CSV file.

data = pd.read_csv(filecsv, header=0)
X = data[['Lat', 'Long']]
kmeans = KMeans(n_clusters=numclusters, random_state=0).fit(X)
data['Cluster'] = kmeans.labels_
filecsv = eval('TempFileName$("")')
data.to_csv(filecsv, index=False)

As you can see, it only takes 6 lines of Python code to load data from a CSV file, cluster it, and save the result back out to another CSV file.

That was easy. Now we only need to add some preprocessing and postprocessing.
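As a side note, if you are curious what happens inside KMeans, here is a minimal pure-Python sketch of the basic k-means loop: assign each point to its nearest center, move each center to the mean of its points, and repeat until nothing changes. This is only an illustration. The function name kmeans_sketch and the sample points are made up for this example, and scikit-learn's implementation is far more refined, so keep using KMeans for real work.

```python
import random

def kmeans_sketch(points, k, iterations=100, seed=0):
    """Cluster 2-D points into k groups; returns (labels, centers).

    Illustrative only: hypothetical helper, not part of scikit-learn
    or MapInfo Pro.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k of the input points
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centers

# Made-up sample: two small groups of (x, y) points.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans_sketch(sample, k=2)
print(labels)
```

The cluster numbers themselves carry no meaning; only which points share a number matters. That is also why the cluster ID works well as the basis for an individual-value thematic map.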

Initializing Process and Exporting Data

I start by defining a few variables to hold the values relevant for the process, such as the table name, column names for the unique ID in the data, the cluster ID column, and the desired number of clusters.

I also define a folder where I want to store the temporary data and set a name for the CSV file for the exported data.

Finally, I query my table to extract the ID/name and centroid coordinates. This is all I use when clustering the data points. The query result is exported to the CSV file.

tabname = 'SiteFile'
colSiteID = 'SiteName'
colClusterID = 'ClusterID'
numclusters = 12

foldertemp = eval('PathToDirectory$(TempFileName$(""))')
filecsv = os.path.join(foldertemp, 'locations.csv')

do(f'Set CoordSys Table {tabname}')
do(f'Select {colSiteID} As "ID", CentroidX(OBJ) As "Long", CentroidY(OBJ) As "Lat" From {tabname} into q_export NoSelect Hide')
do(f'Export q_export Into "{filecsv}" Type "ASCII" Delimiter "," Titles Overwrite')
do('Close Table q_export')

After this, the cluster code shown above will be executed, creating the resulting CSV file.

Loading Results into the Original Table

With the result saved to a CSV file, we need to load this back into MapInfo Pro and update the existing table with the resulting cluster information.

I created this as a Python method that can be called with the necessary parameters: the name of the CSV file, the table name, and the two columns for the ID and the cluster information.

I create a TAB file referencing the CSV file using the MapBasic Register Table statement. As you can see here and earlier, in some cases, the code executed is MapBasic code. This is executed from Python using the do method.

Then I open this table in MapInfo Pro.

With the table opened, I can now update the original table with the cluster ID.

def UpdateTableWithClusterID(filecsv, tabname, colSiteID, colClusterID):
    table = None  # defined up front so the except block can test it safely
    try:
        if os.path.exists(filecsv):
            # Register the CSV file as a TAB file so MapInfo Pro can open it
            filetemp = eval('TempFileName$("")')
            do(f'Register Table "{filecsv}" TYPE ASCII Delimiter 44 Titles Charset "UTF-8" Into "{filetemp}"')
            pro.Catalog.OpenTable(filetemp, int(OpenTableFlags.Hide), 'csv_hidden')
            table = pro.Catalog.HiddenTable('csv_hidden')
            if table:
                # Join on the ID column and copy the cluster number across
                do(f'Add Column {tabname} ({colClusterID}) From {table.Alias} Set To Cluster Where {colSiteID} = ID')
                print(f"The clustering results have been written back to table '{tabname}'.")
                table.Close()
            else:
                print('Cluster Result could not be opened')
    except Exception as e:
        print("Error: {}".format(e))
        if table:
            table.Close()
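Before loading the result into MapInfo Pro, it can be handy to check that the CSV file really contains the ID and Cluster columns that the Add Column statement relies on, and to see how many sites ended up in each cluster. Here is a small sketch of such a check using only the Python standard library. The function name cluster_sizes is made up for this example; it is not part of MapInfo Pro or pandas.

```python
import csv
from collections import Counter

def cluster_sizes(lines):
    """Count rows per cluster in the clustered CSV.

    `lines` is any iterable of text lines, for example an open file.
    Raises ValueError if the ID or Cluster column is missing.
    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(lines)
    fields = reader.fieldnames or []
    for column in ('ID', 'Cluster'):
        if column not in fields:
            raise ValueError(f"Column '{column}' not found in {fields}")
    return Counter(int(row['Cluster']) for row in reader)

# Inside the script, assuming filecsv points at the file written by to_csv:
# with open(filecsv, newline='', encoding='utf-8') as f:
#     print(cluster_sizes(f))
```

A very uneven distribution of sites per cluster can be a hint to try a different number of clusters.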

Initializing the Python Tool

This Python tool uses several Python modules that need to be imported and also installed for MapInfo Pro to be able to use them.

When you are writing Python code in MapInfo Pro, you may have to install additional Python modules. There is a very easy way to do this. When you start MapInfo Pro, make sure to right-click on the MapInfo Pro icon and then select Run as Administrator. This will elevate your permissions to also write files to the MapInfo Pro installation folder, which is necessary to be able to install additional Python modules.

Now you can use a command in Python to download and install a new module:

install_module('scikit-learn')

When you run this command, the module will be downloaded and installed through pip, the package installer for Python.

If the module has already been installed, you can see lines like these in the message window:

Requirement already satisfied: scikit-learn in c:\mapinfo\pro\v23.1\python310\lib\site-packages (1.6.1)
Requirement already satisfied: numpy>=1.19.5 in c:\mapinfo\pro\v23.1\python310\lib\site-packages (from scikit-learn) (1.24.3)
Requirement already satisfied: scipy>=1.6.0 in c:\mapinfo\pro\v23.1\python310\lib\site-packages (from scikit-learn) (1.11.1)
Requirement already satisfied: joblib>=1.2.0 in c:\mapinfo\pro\v23.1\python310\lib\site-packages (from scikit-learn) (1.5.1)
Requirement already satisfied: threadpoolctl>=3.1.0 in c:\mapinfo\pro\v23.1\python310\lib\site-packages (from scikit-learn) (3.6.0)

I am using several Python modules. I wasn't sure if we deploy these automatically with Python when we install MapInfo Pro, so at the first run of my code, I tried to install these:

install_module('pandas')
install_module('matplotlib')
install_module('scikit-learn')

After the initial run, I added a hash (#) in front of each of these 3 lines, as the modules were now installed and I no longer needed to install them. The hash marks a line as a comment in Python, meaning it won't get executed.
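As an alternative to commenting the lines out by hand, you could let the script check which modules are missing and only install those. The sketch below uses importlib from the Python standard library. The names REQUIRED, missing_packages and ensure_installed are made up for this example, and the installer is passed in as a parameter so that the sketch also runs outside MapInfo Pro. Note that the name you install and the name you import can differ, as with scikit-learn and sklearn.

```python
import importlib.util

# pip package name -> name used in the import statement (they can differ)
REQUIRED = {
    'pandas': 'pandas',
    'matplotlib': 'matplotlib',
    'scikit-learn': 'sklearn',
}

def missing_packages(required):
    """Return the pip names whose import name cannot be found."""
    return [pip_name for pip_name, import_name in required.items()
            if importlib.util.find_spec(import_name) is None]

def ensure_installed(required, installer):
    """Call installer(pip_name) for each missing package.

    Returns the list of pip names that were passed to the installer.
    Hypothetical helper for illustration only.
    """
    missing = missing_packages(required)
    for pip_name in missing:
        installer(pip_name)
    return missing

# Inside MapInfo Pro, assuming install_module is available as shown above:
# ensure_installed(REQUIRED, install_module)
```

You would still need to run MapInfo Pro as Administrator the first time, since the installation itself writes to the MapInfo Pro installation folder.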

Finally, you also need to import the modules you are using in your source code. These are my imports:

import os
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from os.path import join, dirname, abspath
from MapInfo.Types.Data import OpenTableFlags

You can probably skip matplotlib, as I ended up not using this module. The graph that it creates I can also see in my map window, using a thematic map on the cluster ID.

The K-Means method has plenty of parameters that can help fine-tune the result. You can find the documentation for the method on the scikit-learn documentation page.
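To give an idea of how these parameters can be used, here is a small sketch that runs KMeans for several values of n_clusters and prints the inertia_ of each result, that is, the sum of squared distances from each point to its cluster center. A sharp drop followed by a flattening is often used as a hint for a sensible number of clusters. The helper name inertia_by_k and the sample points are made up for this example; n_clusters, init, n_init, random_state and inertia_ are existing scikit-learn names.

```python
from sklearn.cluster import KMeans

def inertia_by_k(points, k_values, seed=0):
    """Fit KMeans once per k and return {k: inertia}.

    Hypothetical helper for illustration only.
    """
    result = {}
    for k in k_values:
        # n_init=10 runs the algorithm 10 times and keeps the best result
        model = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=seed)
        model.fit(points)
        result[k] = model.inertia_
    return result

# Made-up sample: three tight groups of four points each.
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 10), (20, 0)]
          for dx, dy in [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]]

inertias = inertia_by_k(points, [1, 2, 3, 4])
for k, value in inertias.items():
    print(k, round(value, 2))
```

In the script in this post, the points would come from the Lat and Long columns of the exported CSV file instead of the made-up sample.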

If you want more control, you can also consider using the method KMeansConstrained. It does, for example, allow you to set the minimum and maximum size of the resulting clusters.

Here is the full Python script. Note that the 3 commands for installing the additional Python modules have been commented out at the top.

#install_module('pandas')
#install_module('matplotlib')
#install_module('scikit-learn')

import os
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from os.path import join, dirname, abspath
from MapInfo.Types.Data import OpenTableFlags

def UpdateTableWithClusterID(filecsv, tabname, colSiteID, colClusterID):
    table = None  # defined up front so the except block can test it safely
    try:
        if os.path.exists(filecsv):
            # Register the CSV file as a TAB file so MapInfo Pro can open it
            filetemp = eval('TempFileName$("")')
            do(f'Register Table "{filecsv}" TYPE ASCII Delimiter 44 Titles Charset "UTF-8" Into "{filetemp}"')
            pro.Catalog.OpenTable(filetemp, int(OpenTableFlags.Hide), 'csv_hidden')
            table = pro.Catalog.HiddenTable('csv_hidden')
            if table:
                # Join on the ID column and copy the cluster number across
                do(f'Add Column {tabname} ({colClusterID}) From {table.Alias} Set To Cluster Where {colSiteID} = ID')
                print(f"The clustering results have been written back to table '{tabname}'.")
                table.Close()
            else:
                print('Cluster Result could not be opened')
    except Exception as e:
        print("Error: {}".format(e))
        if table:
            table.Close()

tabname = 'Sites'
colSiteID = 'SiteName'
colClusterID = 'ClusterID'
numclusters = 2

foldertemp = eval('PathToDirectory$(TempFileName$(""))')
filecsv = os.path.join(foldertemp, 'locations.csv')

# Export the ID and centroid coordinates of each site to a CSV file
do(f'Set CoordSys Table {tabname}')
do(f'Select {colSiteID} As "ID", CentroidX(OBJ) As "Long", CentroidY(OBJ) As "Lat" From {tabname} into q_export NoSelect Hide')
do(f'Export q_export Into "{filecsv}" Type "ASCII" Delimiter "," Titles Overwrite')
do('Close Table q_export')

# Cluster the exported points and write the result to a new CSV file
data = pd.read_csv(filecsv, header=0)
X = data[['Lat', 'Long']]
kmeans = KMeans(n_clusters=numclusters, random_state=0, init='k-means++').fit(X)
data['Cluster'] = kmeans.labels_
filecsv = eval('TempFileName$("")')
data.to_csv(filecsv, index=False)

# Resetting the dataframe (the slice must be assigned back to take effect)
data = data.iloc[:0]

UpdateTableWithClusterID(filecsv, tabname, colSiteID, colClusterID)

I have also included it as a MapInfo Script file (MIS) in the attached ZIP file.

I hope you find this useful and that you can see how utilizing the Python support within MapInfo Pro significantly expands the possibilities.

------------------------------
Peter Horsbøll Møller
Principal Presales Consultant | Distinguished Engineer
Precisely | Trust in Data
------------------------------
