MapInfo Pro

 View Only
  • 1.  MapInfo Monday: Get that Column Indexed

    Employee
    Posted 08-15-2022 04:51
    Happy #MapInfoMonday!

    In today's post, we will look at indexing your columns. We will look at how you do it, and when you should index a column. We will also look at the cost of indexing a column.

    What is a Column Index?

    What does it in fact mean when a column has been indexed? Basically, an index makes it faster for MapInfo Pro to find a specific value as it can do a lookup in an ordered list.

    Typically, the data in your table appear in a random order in the table. So when you are looking for a specific value, you need to run through all the values to check if that's the value you are looking for. This may take some time, especially for large tables. I often compare this to the index at the back of a book telling you a specific word is found on what page. Imagine, you didn't have that index. If you don't, you will have to read the entire book to locate the word.

    The index in the book tells you what page to look at. Similarly, the index of a MapInfo table, tells MapInfo Pro what record to look at. This makes searching for a specific value much faster.

    Later on, I have some examples showing you the difference between having an indexed column and not having one.

    How do I Index a Column?

    You can tell MapInfo Pro to create an index on a column using the Table Structure dialog.

    Right-click on the table in the Table List and select Table Structure.

    This will open up the Table Structure dialog for the table.

    You can the list of columns in the table. To the right, there is a checkbox that tells MapInfo Pro if the column is indexed or not. In the example above, my table has been indexed on the columns Postal_address, Address, and Postal.

    Index the columns that you often use for querying or for thematic maps. But don't overdo it. An index comes with a price.

    What is the Cost of Indexing a Column?

    There is no such thing as a free lunch, unfortunately. That means that you want to carefully consider which columns to index. It doesn't make sense to index all columns in your table.

    The price of an indexed column comes in two parts:
    1. Declined performance on data editing
    2. Larger footprint on your disk
    If a column has been indexed, this index needs to be maintained. This means that MapInfo Pro will update the index if you modify the table. This can take time, especially for large tables.

    When you start editing a table, MapInfo Pro creates what is referred to as transaction files. This is where your changes are stored until you save the edits to your table. For large tables, the index file can be hundreds of MBs or even a few GBs. This will take time to write to your hard drive.

    And then the index itself needs to be maintained. For adding or removing a few records, it will not take a lot of time. But if you are updating the column for all the records, it can radically slow down the process.

    That's why it's recommended to remove the index on a column before running an Update Column process on the column.

    You can also enable Smart Indexing which will automatically remove an index and create it again when you are running batch updates. You enable this under Performance Settings from the Options page on the Backstage.

    In the example below, I will be using a table with address points for Denmark. It contains a bit more than 3.7 million addresses. I have indexed two character columns and an integer column. This creates an index file (.IND) for the table of a size of 2.7GB on disk.

    That's the other price you pay when indexing columns. The table will take up more space on your hard drive.

    What are the Benefits of an Index?

    You will find that it is worth paying that price as it will save you processing time.

    There are a few things in MapInfo Pro where the index is required. That could be creating a thematic map for the table, using the Find operation, or joining the table to another table. In these cases, MapInfo Pro will automatically index the column when needed if it hasn't already been indexed.

    And then there are the operations where an index isn't required but can improve the performance. Let me give you an example.

    As I just said, I have a table with 3.6 million points holding address information.

    It's a common task to be able to query this table for specific addresses. That can be to locate a specific address, the addresses for a specific road, or even all addresses within a specific postal area.

    To do this, you can use the Select by Attribute dialog that allows you to query specific columns of a table.

    Above, you can see an example of such a query where I'm looking for addresses on a given street in a specific postal area. I'm using the start with (Like) operator to find all addresses that start with the specified street name.

    In order to test what difference the index makes, I have tested 4 different queries with and without indexes on the columns. These are the four Where conditions:
    1. Where a.Postal_Address = "3460, Lupinkrogen 4"
    2. Where a.Postal_Address Like "3460, Lupinkrogen%"
    3. Where a.Address = "Lupinkrogen 4" And a.Postal = 3460
    4. Where a.Address Like "Lupinkrogen%" And a.Postal = 3460
    The fourth example is the one you saw in the Select by Attribute dialog.

    The queries are quite similar 1 & 2 and 4 & 5. The difference is that the first two are querying one column holding both address and postal information as a single string. The last two have two columns, one for address information and one for postal information. Below you can see the 3 columns used in a browser.

    In Denmark, we specify the address as street name + house number which is different from for example the US, where it's house number + street name.

    I timed the queries with and without an index, and the results are like this:
    1. Runs for 19 seconds without an index and under 1 second with an index
    2. Runs for 14 seconds without an index and for 12 seconds with an index
    3. Runs for 20 seconds without an index and under 1 second with an index
    4. Runs for 15 seconds without an index and under 1 second with an index
    The numbers above are averages from 5 subsequent runs of each query.

    All the queries are faster - way faster - if you have an index on the columns. The improvement for the Like query with just one column isn't very big but if you use two columns it helps as you can narrow down your search using the postal column where you are looking for a specific value.

    I got inspired to write this article after reading the article Performance Tips for MapInfo Pro written by SGSI. You can find a lot of good information in that article.

    What is your best performance improvement tip?

    ------------------------------
    Peter Horsbøll Møller
    Principal Presales Consultant | Distinguished Engineer
    Precisely | Trust in Data
    ------------------------------


  • 2.  RE: MapInfo Monday: Get that Column Indexed

    Posted 08-18-2022 09:46
    Hi Peter
    The Select by Attribute function does not have the option to re-centre the map or show extent of Selection, does it?

    ------------------------------
    John Ievers
    CDR Group
    Hope Valley, United Kingdom
    ------------------------------



  • 3.  RE: MapInfo Monday: Get that Column Indexed

    Posted 08-18-2022 09:56

    Do you mean this option, John?

    Nick

    ------------------------------
    Nick Hall
    Mapchester LTD
    Manchester
    ------------------------------



  • 4.  RE: MapInfo Monday: Get that Column Indexed

    Posted 08-18-2022 12:14
    That's the one Nick
    It wasn't playing ball with me for a while.
    The gremlin has flown now and all working as expected.
    Cheers

    ------------------------------
    John Ievers
    CDR Group
    Hope Valley, United Kingdom
    ------------------------------



  • 5.  RE: MapInfo Monday: Get that Column Indexed

    Employee
    Posted 08-19-2022 01:49
    One thing that has thrown me off from time to time, is why MapInfo Pro doesn't zoom to the extent of the query result but just pan to the center of it. This can sometimes result in the map showing an area where you can't see any of the selected records.

    This is controlled by a Map Preference on the Editing tab: Zoom on Find Selection


    ------------------------------
    Peter Horsbøll Møller
    Principal Presales Consultant | Distinguished Engineer
    Precisely | Trust in Data
    ------------------------------