MapInfo Monday: Cluster Analysis in MapInfo Pro using Python

    Posted 06-09-2025 03:48
    Edited by Peter Møller 06-09-2025 03:56

    Last week, I discussed cluster analysis in MapInfo Pro using MapBasic. Even though it was fun to try this out using MapBasic, it may not be very efficient, especially when you consider that you can easily use existing clustering methods in Python.

    Let me show you how easy it is.

    Happy #MapInfoMonday!

    Using the K-Means Cluster Algorithm in Python

    I started by asking my new friend, Copilot:

    Write a Python script to load data from a csv file with locations (lat and long) and cluster this data using k-means

    This was my first time asking Copilot a question; earlier, I used ChatGPT. I suspect I need to tweak my prompts a bit. Anyway, after some small talk back and forth, Copilot gave me this Python code:

    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    
    # Load the data from the CSV file
    file_path = 'locations.csv'
    data = pd.read_csv(file_path)
    
    # Extract the latitude and longitude columns
    X = data[['LAT', 'LONG']]
    
    # Perform k-means clustering
    kmeans = KMeans(n_clusters=3, random_state=0).fit(X)
    data['Cluster'] = kmeans.labels_
    
    # Plot the clusters
    plt.figure(figsize=(10, 6))
    plt.scatter(data['LAT'], data['LONG'], c=data['Cluster'], cmap='viridis', marker='o')
    plt.xlabel('Latitude')
    plt.ylabel('Longitude')
    plt.title('K-Means Clustering of Locations')
    plt.colorbar(label='Cluster')
    plt.show()

    This code, however, does more than cluster the sample data. It also creates a small graph that illustrates the clustered data.
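
    To give a feeling for what the KMeans call does behind the scenes, here is a minimal sketch of the basic k-means loop in plain Python. It is only an illustration and not how scikit-learn implements it; the function name kmeans_sketch and the sample coordinates are made up for this example.

```python
import random

def kmeans_sketch(points, k, iterations=20, seed=0):
    """Tiny illustration of the k-means loop (not scikit-learn's implementation)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center
        labels = [
            min(range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centers

# Made-up (lat, long) pairs: three points near Copenhagen, three near Aarhus
sample = [(55.68, 12.57), (55.67, 12.55), (55.70, 12.60),
          (56.16, 10.20), (56.15, 10.21), (56.17, 10.19)]
labels, centers = kmeans_sketch(sample, k=2)
print(labels)   # the first three points share one label, the last three the other
print(centers)
```

    The two steps, assigning points to the nearest center and then moving each center to the mean of its points, are repeated a fixed number of times here. The library version adds smarter initialization and a proper stopping criterion.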

    I copied the K-Means code and inserted it into the SQL Window in MapInfo Pro. I modified it slightly to not use hardcoded file names and added a call that exports the result to a CSV file.

    data = pd.read_csv(filecsv, header=0)
    
    X = data[['Lat', 'Long']]
    
    kmeans = KMeans(n_clusters=numclusters, random_state=0).fit(X)
    data['Cluster'] = kmeans.labels_
    
    filecsv = eval('TempFileName$("")')
    data.to_csv(filecsv, index=False)

    As you can see, it only takes six lines of Python code to load data from a CSV file, cluster it, and save the result to another CSV file.

    That was easy. Now we only need to add some preprocessing and postprocessing.

    Initializing Process and Exporting Data 

    I start by defining a few variables to hold the values relevant for the process, such as the table name, column names for the unique ID in the data, the cluster ID column, and the desired number of clusters.

    I also define a folder where I want to store the temporary data and set a name for the CSV file for the exported data.

    Finally, I query my table to extract the ID/name and centroid coordinates. This is all I use when clustering the data points. The query result is exported to the CSV file. 

    tabname = 'SiteFile'
    colSiteID = 'SiteName'
    colClusterID = 'ClusterID'
    numclusters = 12
    
    foldertemp = eval('PathToDirectory$(TempFileName$(""))')
    filecsv = os.path.join(foldertemp, 'locations.csv')
    
    do(f'Set CoordSys Table {tabname}')
    do(f'Select {colSiteID} As "ID", CentroidX(OBJ) As "Long", CentroidY(OBJ) As "Lat" From {tabname} into q_export NoSelect Hide')
    do(f'Export q_export Into "{filecsv}" Type "ASCII" Delimiter "," Titles Overwrite') 
    do('Close Table q_export')

    After this, the cluster code shown above will be executed, creating the resulting CSV file.
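
    Since the clustering code relies on the column names set in the query (ID, Long and Lat), a small check of the exported file before clustering can save some head-scratching. The sketch below is one possible way to do that with the standard csv module. The helper name missing_export_columns is hypothetical, and it assumes the exported file can be read as UTF-8.

```python
import csv
import os
import tempfile

def missing_export_columns(csv_path, required=("ID", "Long", "Lat")):
    """Return the names in `required` that are absent from the CSV header row."""
    # Assumption: the export is readable as UTF-8; adjust the encoding if it is not
    with open(csv_path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])
    return [name for name in required if name not in header]

# Small demonstration with a made-up file standing in for the exported locations.csv
demo_path = os.path.join(tempfile.mkdtemp(), "locations.csv")
with open(demo_path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows([["ID", "Long", "Lat"], ["Site A", 12.57, 55.68]])

print(missing_export_columns(demo_path))                     # → []
print(missing_export_columns(demo_path, ("ID", "Cluster")))  # → ['Cluster']
```

    In the actual script, the helper could be called with filecsv right after the Export statement, and the clustering skipped if the returned list is not empty.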

    Loading Results into the Original Table

    With the result saved to a CSV file, we need to load this back into MapInfo Pro and update the existing table with the resulting cluster information.

    I created this as a Python method that can be called with the necessary parameters: the name of the CSV file, the table name, and the two columns for the ID and the cluster information.

    I create a TAB file referencing the CSV file using the MapBasic Register Table statement. As you can see here and earlier, in some cases, the code executed is MapBasic code. This is executed from Python using the do method.

    Then I open this table in MapInfo Pro.

    With the table opened, I can now update the original table with the cluster ID.

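    def UpdateTableWithClusterID(filecsv, tabname, colSiteID, colClusterID):
        table = None  # set before the try block so the cleanup never hits an unassigned name
        try:
            if os.path.exists(filecsv):
                filetemp = eval('TempFileName$("")')
                
                # Register the CSV file as a MapInfo table (delimiter 44 is the comma) and open it hidden
                do(f'Register Table "{filecsv}" TYPE ASCII Delimiter 44 Titles Charset "UTF-8" Into "{filetemp}"')
                pro.Catalog.OpenTable(filetemp, int(OpenTableFlags.Hide), 'csv_hidden')
                table = pro.Catalog.HiddenTable('csv_hidden')
                
                if table:
                    # Copy the cluster number to the original table by matching the ID columns
                    do(f'Add Column {tabname} ({colClusterID}) From {table.Alias} Set To Cluster Where {colSiteID} = ID')
                    print(f"The clustering results have been written back to table '{tabname}'.")
                else:
                    print('Cluster Result could not be opened')
            else:
                print(f"The file '{filecsv}' does not exist.")
                        
        except Exception as e:
            print("Error: {}".format(e))
        finally:
            # Close the hidden CSV table whether or not the update succeeded
            if table:
                table.Close()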
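
    The Add Column statement with its Where clause works like a keyed lookup: for each record in the original table, the matching ID is found in the CSV table and its Cluster value is copied over. The plain Python sketch below illustrates the idea with made-up rows. The function join_cluster_ids is hypothetical, and the None used for unmatched sites is a choice made in this sketch only; it says nothing about how MapInfo Pro treats unmatched records.

```python
def join_cluster_ids(sites, results, site_key="SiteName", cluster_col="ClusterID"):
    """Copy the Cluster value from `results` onto each site whose key matches an ID."""
    lookup = {row["ID"]: row["Cluster"] for row in results}
    for site in sites:
        # Sites without a match get None in this sketch
        site[cluster_col] = lookup.get(site[site_key])
    return sites

# Made-up rows standing in for the original table and the clustered CSV file
sites = [{"SiteName": "Site A"}, {"SiteName": "Site B"}, {"SiteName": "Site C"}]
results = [{"ID": "Site A", "Cluster": 1}, {"ID": "Site B", "Cluster": 0}]

for site in join_cluster_ids(sites, results):
    print(site)
```

    This is also why the ID column should hold unique values: with duplicate IDs, a lookup like this can only return one of the matching cluster values.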

    Initializing the Python Tool

    This Python tool uses several Python modules that need to be imported and also installed for MapInfo Pro to be able to use them.

    When you are writing Python code in MapInfo Pro, you may have to install additional Python modules. There is a very easy way to do this. When you start MapInfo Pro, make sure to right-click on the MapInfo Pro icon and then select Run as Administrator. This will elevate your permissions to also write files to the MapInfo Pro installation folder, which is necessary to be able to install additional Python modules.

    Now you can use a command in Python to download and install a new module:

    install_module('scikit-learn')

    When you run this command, the module will be downloaded through PIP, the Package Installer for Python.

    If the module has already been installed, you can see lines like these in the message window:

    Requirement already satisfied: scikit-learn in c:\mapinfo\pro\v23.1\python310\lib\site-packages (1.6.1)
    Requirement already satisfied: numpy>=1.19.5 in c:\mapinfo\pro\v23.1\python310\lib\site-packages (from scikit-learn) (1.24.3)
    Requirement already satisfied: scipy>=1.6.0 in c:\mapinfo\pro\v23.1\python310\lib\site-packages (from scikit-learn) (1.11.1)
    Requirement already satisfied: joblib>=1.2.0 in c:\mapinfo\pro\v23.1\python310\lib\site-packages (from scikit-learn) (1.5.1)
    Requirement already satisfied: threadpoolctl>=3.1.0 in c:\mapinfo\pro\v23.1\python310\lib\site-packages (from scikit-learn) (3.6.0)

    I am using several Python modules. I wasn't sure if we deploy these automatically with Python when we install MapInfo Pro, so at the first run of my code, I tried to install these:

    install_module('pandas')
    install_module('matplotlib')
    install_module('scikit-learn')

    After the initial run, I added a hash (#) in front of each of these three lines, as I had now deployed the modules and no longer needed to install them. The hash marks the lines as comments in Python, meaning they won't get executed.
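
    An alternative to commenting out the install lines is to check whether each module can already be found and only install the ones that are missing. Here is a minimal sketch of that idea. The helper ensure_modules is hypothetical, and the installer is passed in as a parameter so that the sketch also runs outside MapInfo Pro, where install_module is not available. Note that the package name and the import name can differ, as with scikit-learn and sklearn.

```python
import importlib.util

def ensure_modules(modules, installer):
    """Call installer(package) for each package whose import name cannot be found.

    `modules` maps the package name used for installing to the name used for importing.
    """
    installed = []
    for package, import_name in modules.items():
        if importlib.util.find_spec(import_name) is None:
            installer(package)
            installed.append(package)
    return installed

# Inside MapInfo Pro, the call could look like this:
# ensure_modules({'pandas': 'pandas', 'matplotlib': 'matplotlib', 'scikit-learn': 'sklearn'}, install_module)

# Demonstration with a stand-in installer that only records what it was asked to install
requested = []
ensure_modules({"no-such-package-xyz": "no_such_package_xyz", "os": "os"}, requested.append)
print(requested)  # → ['no-such-package-xyz']
```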

    Finally, you also need to import the modules you are using in your source code. These are my imports:

    import os
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    from os.path import join, dirname, abspath
    from MapInfo.Types.Data import OpenTableFlags

    You can probably skip matplotlib, as I ended up not using this module. The graph it creates can also be seen in my map window by using a thematic map on the cluster ID.
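
    Besides the thematic map, a quick count of sites per cluster can serve as a sanity check of the result. The sketch below counts the values in the Cluster column with the standard library. The helper cluster_sizes is hypothetical, and the sample content is made up to mimic the layout of the clustered CSV file (ID, Long, Lat, Cluster).

```python
import csv
import io
from collections import Counter

def cluster_sizes(csv_file, cluster_col="Cluster"):
    """Count how many rows fall into each cluster label (labels are read as text)."""
    return Counter(row[cluster_col] for row in csv.DictReader(csv_file))

# Made-up content in the same layout as the clustered CSV file
demo = io.StringIO(
    "ID,Long,Lat,Cluster\n"
    "Site A,12.57,55.68,0\n"
    "Site B,12.55,55.67,0\n"
    "Site C,10.20,56.16,1\n"
)
for label, count in sorted(cluster_sizes(demo).items()):
    print(label, count)
```

    With the real file, the same function could be given an open file handle, for example via open(filecsv, newline="").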

    The K-Means method has plenty of parameters that can help fine-tune the result. You can find the documentation for the method on the scikit-learn documentation page.

    If you want more control, you can also consider using the method KMeansConstrained, which comes from the separate k-means-constrained package rather than from scikit-learn itself. It does, for example, allow you to set the minimum and maximum size of the resulting clusters.

    Here is the full Python script. Note that the three commands for installing the additional Python modules have been commented out at the top.

    #install_module('pandas')
    #install_module('matplotlib')
    #install_module('scikit-learn')
    
    import os
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    from os.path import join, dirname, abspath
    from MapInfo.Types.Data import OpenTableFlags
    
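    def UpdateTableWithClusterID(filecsv, tabname, colSiteID, colClusterID):
        table = None  # set before the try block so the cleanup never hits an unassigned name
        try:
            if os.path.exists(filecsv):
                filetemp = eval('TempFileName$("")')
                
                # Register the CSV file as a MapInfo table (delimiter 44 is the comma) and open it hidden
                do(f'Register Table "{filecsv}" TYPE ASCII Delimiter 44 Titles Charset "UTF-8" Into "{filetemp}"')
                pro.Catalog.OpenTable(filetemp, int(OpenTableFlags.Hide), 'csv_hidden')
                table = pro.Catalog.HiddenTable('csv_hidden')
                
                if table:
                    # Copy the cluster number to the original table by matching the ID columns
                    do(f'Add Column {tabname} ({colClusterID}) From {table.Alias} Set To Cluster Where {colSiteID} = ID')
                    print(f"The clustering results have been written back to table '{tabname}'.")
                else:
                    print('Cluster Result could not be opened')
            else:
                print(f"The file '{filecsv}' does not exist.")
                        
        except Exception as e:
            print("Error: {}".format(e))
        finally:
            # Close the hidden CSV table whether or not the update succeeded
            if table:
                table.Close()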
    
    tabname = 'Sites'
    colSiteID = 'SiteName'
    colClusterID = 'ClusterID'
    numclusters = 2
    
    foldertemp = eval('PathToDirectory$(TempFileName$(""))')
    filecsv = os.path.join(foldertemp, 'locations.csv')
    
    do(f'Set CoordSys Table {tabname}')
    do(f'Select {colSiteID} As "ID", CentroidX(OBJ) As "Long", CentroidY(OBJ) As "Lat" From {tabname} into q_export NoSelect Hide')
    do(f'Export q_export Into "{filecsv}" Type "ASCII" Delimiter "," Titles Overwrite') 
    do('Close Table q_export')
    
    data = pd.read_csv(filecsv, header=0)
    
    X = data[['Lat', 'Long']]
    
    kmeans = KMeans(n_clusters=numclusters, random_state=0, init='k-means++').fit(X)
    data['Cluster'] = kmeans.labels_
    
    filecsv = eval('TempFileName$("")')
    data.to_csv(filecsv, index=False)
    
    # Reset the dataframe by replacing it with an empty one (same columns, no rows)
    data = data.iloc[:0]
    
    UpdateTableWithClusterID(filecsv, tabname, colSiteID, colClusterID)

    I have also included it as a MapInfo Script file (MIS) in the attached ZIP file.

    I hope you find this useful and that you can see how utilizing the Python support within MapInfo Pro significantly expands the possibilities.


    ------------------------------
    Peter Horsbøll Møller
    Principal Presales Consultant | Distinguished Engineer
    Precisely | Trust in Data
    ------------------------------