Happy #MapInfoMonday
This article is based on a question our technical support was given last week.
A customer needed to update one table with values from another table. Not that uncommon a question and the answer is straightforward: Use Update Column.
From the Table tab, click on the Update Column control.
In the Update Column dialog, select the table and column to update, select the table and the column to get the value from, and finally check the join condition to ensure MapInfo Pro got that right too.
In my example, I'm using a table of addresses that I want to enrich with values from a table of parcels. I'm using a spatial join: where parcels contains addresses parcels
.
You can use attribute joins too if there is no spatial relationship.
If you have the MapBasic window open when running the Update Column process, you will see a statement like this:
Add Column "Addresses" (ejerlavkode) From Parcels Set To ELAVSKODE Where contains
Back to the question to our support team. The question wasn't as much about how to do it but more about optimizing it to run faster. He had to update not just one column, but multiple columns with values from a second table.
The obvious solution is to repeat the command above as many times as you have columns to update.
Add Column "Addresses" (ejerlavkode) From Parcels Set To ELAVSKODE Where contains
Add Column "Addresses" (ejerlavnavn) From Parcels Set To ELAVSNAVN Where contains
Add Column "Addresses" (matrikelnr) From Parcels Set To MATRNR Where contains
But when you start to look into the performance of it - that's not a good idea. It isn't a big issue if you only have a few hundred thousand records. But when you cross into the millions, you will start noticing the performance issue.
I ran the test with a table of addresses and parcels for Denmark: 3.7 million addresses and 2.5 million parcels.
Running one of the Add Column
statements above take 5-6 minutes with my sample data. The second also takes this amount of time and the third and fourth. I think you get the point.
There is another approach where you will start seeing the benefit when you have more than one column to update.
This other approach has two steps to it:
- Join the two tables and include the columns to write to and read from.
- Use an
Update
statement to update the columns in the resulting query.
This will typically be faster than running 2 of the Add Column
statements.
Here's an example of the Select
statement:
Select a.ejerlavkode As "WriteTo1"
, a.ejerlavnavn As "WriteTo2"
, a.matrikelnr As "WriteTo3"
, p.ELAVSKODE As "ReadFrom1"
, p.ELAVSNAVN As "ReadFrom2"
, p.MATRNR As "ReadFrom3"
From Addresses As "a", Parcels As "p"
Where a.OBJ Within p.OBJ
Into __To_Update NoSelect
Notice that I give my column aliases. This makes it easier for me when creating the Update
statement. That's especially the case when the columns share the same names in the two tables.
After that, we have to update the resulting query with an Update
statement:
Update __To_Update
Set WriteTo1 = ReadFrom1
, WriteTo2 = ReadFrom2
, WriteTo3 = ReadFrom3
Do note that the Update
statement updates all three columns in one statement. You don't have to use a Update
statement for each column. This is also one of the time-saving elements of the second method.
Using this approach and updating all 3 columns took around 8-10 minutes - for all three columns. The initial approach took around 15-18 minutes for all three columns.
I used the MapBasic window to build these scripts but you can also use the SQL window.
This is the full script you can see in the MapBasic window. It also includes some timing variables making it possible to compare the two methods:
------------------------------
Peter Horsbøll Møller
Principal Presales Consultant | Distinguished Engineer
Precisely | Trust in Data
------------------------------