Data Points

  • 1.  Cleansing and Curation of POI Data

    Employee
    Posted 03-20-2019 12:01

    As the POI Product Manager, I often get questions about whether a particular business exists at a given location. A common one is, "Why does it take so long for a business to appear in the database?"

    The short answer is that we take extra steps to cleanse and curate our data so that we provide the best possible information about each location. How do we do that?

    Our curation process is multi-layered. We have committed to publishing only information that we consider trusted and validated. Therefore, we run all of our data feeds through a rigorous process of cleansing, de-duplication, and quality assurance before they are added to the final database. So what does that mean?

    Our data comes from over 3000 unique sources, many of which are authoritative. Public registries, telephone directories, and local mercantile registries are just a few of the sources we use to gather information about a location. All of this information is collected and standardized into a central database. Until enough sources agree on the information for a given location, the record is held but not published. This is very important, because it builds a confidence check on the validity of our data directly into the process. Many other POI data providers on the market do not cleanse their data. Instead, they publish everything they find and leave the cleanup to the user.
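    To make the "hold until enough sources agree" rule concrete, here is a minimal Python sketch. It is not the actual production pipeline: the `SourceRecord` fields, the `normalize` helper, and the two-source threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical threshold: how many distinct sources must agree before publishing.
MIN_AGREEING_SOURCES = 2


@dataclass(frozen=True)
class SourceRecord:
    source_id: str      # which feed reported this (illustrative)
    location_key: str   # standardized identifier for the physical location
    business_name: str  # business name as reported by the source


def normalize(name: str) -> str:
    """Rough standardization: case-fold and collapse runs of whitespace."""
    return " ".join(name.casefold().split())


def publishable(records: list[SourceRecord],
                min_sources: int = MIN_AGREEING_SOURCES) -> dict[str, str]:
    """Return {location_key: name} for locations where enough sources agree.

    Locations below the threshold are simply omitted, i.e. held, not published.
    """
    seen: set[tuple[str, str, str]] = set()
    votes: dict[str, Counter] = {}
    for r in records:
        name = normalize(r.business_name)
        vote = (r.location_key, name, r.source_id)
        if vote in seen:  # a source gets one vote per (location, name)
            continue
        seen.add(vote)
        votes.setdefault(r.location_key, Counter())[name] += 1

    published = {}
    for location, counter in votes.items():
        name, count = counter.most_common(1)[0]  # best-supported name
        if count >= min_sources:
            published[location] = name
    return published


if __name__ == "__main__":
    records = [
        SourceRecord("registry_a", "loc-1", "Joe's Coffee"),
        SourceRecord("directory_b", "loc-1", "JOE'S  coffee"),
        SourceRecord("directory_b", "loc-2", "New Bakery"),
    ]
    print(publishable(records))  # {'loc-1': "joe's coffee"}
```

    Here "loc-2" is reported by only one source, so it stays held. Counting each source at most once per location and name keeps a single feed that repeats itself from manufacturing its own agreement.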

    While this process may delay updates to a particular point, it adds confidence and cleanliness to the finished product.

    ------------------------------
    Thomas McKean
    Product Manager, Location Data
    PITNEY BOWES SOFTWARE, INC
    Boulder, CO
    ------------------------------


  • 2.  RE: Cleansing and Curation of POI Data

    Employee
    Posted 03-22-2019 11:35
    Hi Tom, 

    One issue I've encountered in managing business listing data is that the information is often inconsistent. Fill rates can vary greatly from source to source, and even between fields in the same source. Data can also differ considerably from one geography to another.
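    For anyone following along, "fill rate" here means the share of records that actually carry a value for a given field. A minimal Python sketch of measuring it per source and per field might look like the following; the record layout (plain dicts with a hypothetical `"source"` key) and the rule that blank strings count as missing are assumptions.

```python
from collections import defaultdict


def is_filled(value) -> bool:
    """Treat None and blank strings as missing; anything else counts as filled."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    return True


def fill_rates(records: list[dict]) -> dict[str, dict[str, float]]:
    """Per source, the fraction of its records with a usable value in each field.

    A field that a source never supplies counts as 0.0 for that source.
    """
    totals: dict[str, int] = defaultdict(int)
    filled: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    fields: set[str] = set()
    for rec in records:
        source = rec["source"]
        totals[source] += 1
        for field, value in rec.items():
            if field == "source":
                continue
            fields.add(field)
            if is_filled(value):
                filled[source][field] += 1
    return {
        source: {f: filled[source][f] / count for f in sorted(fields)}
        for source, count in totals.items()
    }


if __name__ == "__main__":
    records = [
        {"source": "registry", "name": "Joe's Coffee", "phone": "555-0100", "hours": ""},
        {"source": "registry", "name": "New Bakery", "phone": None, "hours": ""},
        {"source": "directory", "name": "Joe's Coffee", "phone": "555-0100"},
    ]
    print(fill_rates(records))
```

    In this toy data the "registry" source has a phone fill rate of 0.5 and "directory" has 1.0, while neither supplies usable hours.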

    How do you work towards standardizing and enriching the data when information is sparse?



    ------------------------------
    Dan Adams
    Pitney Bowes
    White River Junction VT
    ------------------------------



  • 3.  RE: Cleansing and Curation of POI Data

    Employee
    Posted 03-22-2019 14:50
    Hey Dan,

    Great question! Inconsistency in the data is natural, because every major business database has to rely on a variety of sources to create a well-rounded product. We've built a machine learning ecosystem to address this issue. Our methodology, which analyzes every aspect of a record and uses that analysis to build trends and confidence, will allow us not only to standardize the information about businesses but also to enrich locations with information that may otherwise have been inaccessible. We are currently undertaking a very large-scale project focused on standardizing brands and their parentage.

    A great example of this is the fast food industry. Our ML process can look at a location named "Taco Bell" that may not have any associated information. The process validates that the location is indeed the franchised fast food restaurant whose parent company is Yum Brands, and then fills in the relevant Yum Brands information in the "Taco Bell" record.
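    As a rough illustration of the enrichment step only, here is a minimal Python sketch. A plain dictionary lookup on the normalized name stands in for the ML matching and validation described above, and the `BRAND_PARENTS` table and its field names are hypothetical.

```python
# Hypothetical brand reference table; a real one would come from curated data.
BRAND_PARENTS = {
    "taco bell": {"brand": "Taco Bell", "parent": "Yum Brands", "category": "Fast Food"},
}


def enrich(record: dict, brand_table: dict = BRAND_PARENTS) -> dict:
    """Return a copy of `record` with brand and parent fields filled in.

    The exact-match lookup here is a stand-in for ML-based brand matching.
    """
    key = " ".join((record.get("name") or "").casefold().split())
    enriched = dict(record)  # never mutate the caller's record
    reference = brand_table.get(key)
    if reference is None:
        return enriched
    for field, value in reference.items():
        # Fill only empty fields; keep any value a source already supplied.
        if not enriched.get(field):
            enriched[field] = value
    return enriched


if __name__ == "__main__":
    print(enrich({"name": "TACO BELL", "address": "123 Main St"}))
```

    Filling only empty fields is a deliberate choice in this sketch: source-supplied values are treated as more specific than the brand-level defaults.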

    ------------------------------
    Thomas McKean
    Product Manager, Location Data
    PITNEY BOWES SOFTWARE, INC
    Maitland FL
    ------------------------------