MapInfo Pro

Expand all | Collapse all

MapInfo Monday: Attribute Completeness

  • 1.  MapInfo Monday: Attribute Completeness

    Posted 12-14-2020 10:42
    Happy #MapInfoMonday! Yet another post with a couple of tips & tricks around MapInfo Pro.

    This time I will give you some hints around analyzing attribute completeness in your data. This is interesting when you start looking at the quality of your data. Is it complete or how complete is it?

    Sorting and Filtering using the Browser window

    In MapInfo Pro, you can use the browser window to get an overview of the completeness of the data.

    With a single click on the column title, you can sort the table by that column and quickly see empty strings or 0 values. You can also right-click in the browser window, and use the Filter option to find all records where a numerical column equals 0 or where a character column is empty.

    But if you have many records and many columns, it can be hard to keep an overview.

    Analyzing the values and aggregating statistics

    Another approach that has been made possible after the introduction of the IIf() function in MapInfo Pro v17.0, is to aggregate the column values and count how many are valid and how many are considered not valid, or not filled in.

    The IIf() function makes it possible for you to convert one value to another depending on the value. This is useful when you want to establish the number of valid and invalid values in your data.

    Syntax:
    IIf( expression, truepart, falsepart )

    The IIf() function takes three parameters:
    1. A condition in the form of an expression
    2. The value to return if the condition is true
    3. The value to return if the condition is false
    When it comes to analyzing the attribute completeness of your data, you often want to compare the attribute to a valid or invalid value to determine whether it has been filled in or not.

    Here are three examples:
    • IIf(t.frednr="",0,1):  If the value in the column named frednr is empty, the value has not been filled in and the IIf() function returns 0. Otherwise, it will return 1.
    • IIf(t.anlnr=0,0,1): If the value in the column named anlnr is 0, the value has not been filled in and the IIf() function returns 0. Otherwise, it will return 1.
    • IIf(t.stednavn In ("", "Udefineret"),0,1): if the value in the column named stednavn is either empty or "Udefineret" which is Danish for undefined, the value has not been filled in and the IIf() function returns 0. Otherwise, it will return 1.
    I hope these three basic examples give you some idea of how to use the IIf() function to identify values that haven't been filled in.

    I set the return value of the IIf() function to either 0 or 1. This makes it possible for me to use this return value in two aggregates to be able to count the number of values that have been filled in and also to calculate a percentage of the filled-in values using the average aggregate.

    In the SQL window below, you can see how I have built a Select statement to count the total number of records in the table, and the number of valid values, and the average for three columns in my dataset. The summed value (Sum) gives me a count of records with a valid value. The average value (Avg) gives me a percentage, as a decimal value, of the attribute completeness for the individual fields.

    Select Count(*)
      , Sum(IIf(t.frednr="",0,1)) "Count_frednr"
      , Avg(IIf(t.frednr="",0,1)) "Avg_frednr"
      , Sum(IIf(t.anlnr=0,0,1)) "Count_anlnr"
      , Avg(IIf(t.anlnr=0,0,1)) "Avg_anlnr"
      , Sum(IIf(t.stednavn In ("", "Udefineret"),0,1)) "Count_anlnr"
      , Avg(IIf(t.stednavn In ("", "Udefineret"),0,1)) "Avg_anlnr"
    From anlaeg_all_25832 As "t"
    Into Selection

    Below you can see the result of my query, and the completeness of the three fields. The field frednr only has a valid value in 16% of the records, where the other two columns have valid values in just shy of 100% of the records.

    Grouping the result

    In the example above I only get one record back from the Select statement which gives me a single number for all the records in the table.

    Sometimes it would make sense to dive a bit more into these numbers to understand how they have originated, and maybe identify why some fields have bad completeness.

    You can do this by grouping the values by a column. In the example below, I have grouped the result by an area of interest but it could also be grouped by time or by users to see if the is a correlation between any of these and the missing values in a field.

    Maybe the field was added recently so all the records that originated prior to this date don't have any values in this field.

    I hope this gives you some ideas on how you can use SQL Select in MapInfo Pro to analyze your data attributes.

    If you have ideas for topics in an upcoming #MapInfoMonday post, please leave a comment with your idea.​

    ------------------------------
    Peter Horsbøll Møller
    Principal Presales Consultant | Distinguished Engineer
    Precisely | Trust in Data
    ------------------------------


  • 2.  RE: MapInfo Monday: Attribute Completeness

    Posted 12-15-2020 17:11
    Thanks Peter, good article.

    However, I am not sure I understand how your query returned decimal values (251.997 and 251.995) for a count aggregation? Also, I would have thought the reuse of the same column alias ("Count_anlnr" and "Avg_anlnr") for two different calculations would have caused some problems?

    ------------------------------
    Matthew Lee
    Deep End Services
    South Yarra VIC
    ------------------------------



  • 3.  RE: MapInfo Monday: Attribute Completeness

    Posted 12-16-2020 02:30
    Thanks for reaching out, Matthew.

    I'm using Danish regional settings so my decimal symbol is a comma, not a point. The point you see in the numbers is a digit grouping symbol. MapInfo Pro is using the Windows Regional Settings when representing numbers and dates, for example in browser windows.

    For you, the number might look more correct this way: 251,997 or even 251997.

    With regards to the columns alias, that's certainly a mistake on my part. A "Copy & Paste" error where I have forgotten to rename the column alias.

    It will not fail when running the query but if you try to save the result into a new table, or use the query in a new query you can run into trouble. MapInfo Pro will probably rename these columns to _COL6 and _COL7 or similar.

    ------------------------------
    Peter Horsbøll Møller
    Principal Presales Consultant | Distinguished Engineer
    Precisely | Trust in Data
    ------------------------------