Happy #MapInfoMonday! Yet another post with a couple of tips & tricks around MapInfo Pro.
This time I will give you some hints around analyzing attribute completeness in your data. This is interesting when you start looking at the quality of your data. Is it complete or how complete is it?
Sorting and Filtering using the Browser window
In MapInfo Pro, you can use the browser window to get an overview of the completeness of the data.
With a single click on the column title, you can sort the table by that column and quickly see empty strings or 0 values. You can also right-click in the browser window, and use the
Filter option to find all records where a numerical column equals 0 or where a character column is empty.
But if you have many records and many columns, it can be hard to keep an overview.
Analyzing the values and aggregating statistics
Another approach that has been made possible after the introduction of the
IIf()
function in MapInfo Pro v17.0, is to aggregate the column values and count how many are valid and how many are considered not valid, or not filled in.
The
IIf()
function makes it possible for you to convert one value to another depending on the value. This is useful when you want to establish the number of valid and invalid values in your data.
Syntax:
IIf( expression, truepart, falsepart )
The
IIf()
function takes three parameters:
- A condition in the form of an expression
- The value to return if the condition is true
- The value to return if the condition is false
When it comes to analyzing the attribute completeness of your data, you often want to compare the attribute to a valid or invalid value to determine whether it has been filled in or not.
Here are three examples:
IIf(t.frednr="",0,1)
: If the value in the column named frednr
is empty, the value has not been filled in and the IIf() function returns 0. Otherwise, it will return 1.
IIf(t.anlnr=0,0,1)
: If the value in the column named anlnr
is 0, the value has not been filled in and the IIf() function returns 0. Otherwise, it will return 1.
IIf(t.stednavn In ("", "Udefineret"),0,1)
: if the value in the column named stednavn
is either empty or "Udefineret" which is Danish for undefined, the value has not been filled in and the IIf() function returns 0. Otherwise, it will return 1.
I hope these three basic examples give you some idea of how to use the IIf() function to identify values that haven't been filled in.
I set the return value of the
IIf()
function to either 0 or 1. This makes it possible for me to use this return value in two aggregates to be able to count the number of values that have been filled in and also to calculate a percentage of the filled-in values using the average aggregate.
In the
SQL window below, you can see how I have built a Select statement to count the total number of records in the table, and the number of valid values, and the average for three columns in my dataset. The summed value (
Sum
) gives me a count of records with a valid value. The average value (
Avg
) gives me a percentage, as a decimal value, of the attribute completeness for the individual fields.
Select Count(*)
, Sum(IIf(t.frednr="",0,1)) "Count_frednr"
, Avg(IIf(t.frednr="",0,1)) "Avg_frednr"
, Sum(IIf(t.anlnr=0,0,1)) "Count_anlnr"
, Avg(IIf(t.anlnr=0,0,1)) "Avg_anlnr"
, Sum(IIf(t.stednavn In ("", "Udefineret"),0,1)) "Count_anlnr"
, Avg(IIf(t.stednavn In ("", "Udefineret"),0,1)) "Avg_anlnr"
From anlaeg_all_25832 As "t"
Into Selection
Below you can see the result of my query, and the completeness of the three fields. The field
frednr
only has a valid value in 16% of the records, where the other two columns have valid values in just shy of 100% of the records.
Grouping the result
In the example above I only get one record back from the Select statement which gives me a single number for all the records in the table.
Sometimes it would make sense to dive a bit more into these numbers to understand how they have originated, and maybe identify why some fields have bad completeness.
You can do this by grouping the values by a column. In the example below, I have grouped the result by an area of interest but it could also be grouped by time or by users to see if the is a correlation between any of these and the missing values in a field.
Maybe the field was added recently so all the records that originated prior to this date don't have any values in this field.
I hope this gives you some ideas on how you can use SQL Select in MapInfo Pro to analyze your data attributes.
If you have ideas for topics in an upcoming #MapInfoMonday post, please leave a comment with your idea.
------------------------------
Peter Horsbøll Møller
Principal Presales Consultant | Distinguished Engineer
Precisely | Trust in Data
------------------------------