Note: This was originally posted by an inactive account. Content was preserved by moving under an admin account.
Hi Jona, Pat,
We are currently working on functionality to support auto data discovery. This will also enable new functionality for auto curation (including auto relationships between technical and business assets).
The steps we are taking are:
- Allow for the storage and retrieval of descriptives of assets in Govern
- Perform analytics on data to extract the information needed to classify it (e.g.: signatures, semantics, patterns)
- Make the above consumable within Govern, Analyze and other third-party tools
- Allow users to manage semantics and how to discover them
- Enhance workflow capabilities with new actions, triggers and algorithms (supporting an internal recommendation engine) to facilitate auto curation
Last sprint we introduced support for visualizing profiles in the side panel. When the profiles are generated by the profiler node from Analyze, we have access to Semantic Type (1) discovery as well as data and structure signature creation. The signatures allow us to easily link a technical asset with similar or duplicate assets based on data, as shown below (2).

Clicking on the counts in Match Detection will take you to grids showing those assets:

Match Detection will be released to dev environments tomorrow. This feature will evolve over the next sprints to allow bulk tagging of the assets in the grids (e.g.: to tag them as ‘Duplicates’). Capabilities to be added later include acting on them as well (e.g.: triggering a workflow to ask whether the duplicates are needed or to get them removed).
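To give a rough idea of how signature-based matching of this kind can work, here is a minimal Python sketch. It is not how Govern or the Analyze profiler computes signatures; the hashing scheme, the function names and the sample asset names are all assumptions made up for illustration.

```python
import hashlib
from collections import defaultdict


def data_signature(values):
    """Hash the sorted, stringified values so that row order does not matter."""
    digest = hashlib.sha256()
    for value in sorted(str(v) for v in values):
        digest.update(value.encode("utf-8"))
        digest.update(b"\x00")  # separator so ["ab", "c"] differs from ["a", "bc"]
    return digest.hexdigest()


def structure_signature(columns):
    """Hash the (name, type) pairs of an asset's columns, ignoring column order."""
    normalized = sorted((name.lower(), dtype.lower()) for name, dtype in columns)
    return hashlib.sha256(repr(normalized).encode("utf-8")).hexdigest()


def group_matches(signatures):
    """Group asset names that share a signature; keep only groups of two or more."""
    groups = defaultdict(list)
    for asset, signature in signatures.items():
        groups[signature].append(asset)
    return [sorted(assets) for assets in groups.values() if len(assets) > 1]


# Hypothetical technical assets (column name -> sample values).
assets = {
    "crm.customers.email": ["a@x.com", "b@y.com", "c@z.com"],
    "dw.dim_customer.email": ["c@z.com", "a@x.com", "b@y.com"],
    "erp.orders.status": ["open", "closed", "open"],
}
duplicates = group_matches(
    {name: data_signature(values) for name, values in assets.items()}
)
print(duplicates)
```

An exact hash like this only finds identical data; detecting merely similar assets would need a fuzzier signature, which is outside the scope of this sketch.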
For Semantic Types, tomorrow’s release includes an endpoint for retrieving assets based on Semantic Type (GET /api/v2/dataprofiles/type/{typeQualifier}/{minConfidence}). This will enable recommendation engines based on Semantic Types (e.g.: one could be built in Analyze). It will also enable a future internal recommendation engine integrated with workflow, which will allow for auto mapping of technical assets to business assets (e.g.: Glossary entries) and other relevant objects like policies and rules.
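As an illustration, a call to this endpoint from Python might look like the sketch below. Only the path comes from the description above; the base URL, the bearer-token authentication, the JSON response and the scale of minConfidence (shown here as 0.8) are assumptions and should be checked against the actual API.

```python
import json
import urllib.parse
import urllib.request


def build_semantic_type_url(base_url, type_qualifier, min_confidence):
    """Build the URL for GET /api/v2/dataprofiles/type/{typeQualifier}/{minConfidence}."""
    qualifier = urllib.parse.quote(str(type_qualifier), safe="")
    confidence = urllib.parse.quote(str(min_confidence), safe="")
    return f"{base_url.rstrip('/')}/api/v2/dataprofiles/type/{qualifier}/{confidence}"


def fetch_assets_by_semantic_type(base_url, token, type_qualifier, min_confidence):
    """Call the endpoint and decode the body as JSON.

    Assumptions: bearer-token auth and a JSON response body.
    """
    request = urllib.request.Request(
        build_semantic_type_url(base_url, type_qualifier, min_confidence),
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Hypothetical host and type qualifier, shown without making a network call.
url = build_semantic_type_url("https://govern.example.com/", "email", 0.8)
print(url)
```

A recommendation engine could call fetch_assets_by_semantic_type for each Semantic Type it cares about and propose mappings for the assets returned.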
Being able to classify data is key for this effort. Analyze includes a library of preconfigured Semantics, but we realize that users need to add their own (e.g.: product codes, customer accounts). To address this, we are looking at enhancing how Semantics are managed by adding a dedicated UX. This UX will allow users to add new Semantics by defining patterns to look for in the data, patterns in labels, include/exclude lists, locales, etc. These definitions can then be consumed by the profiling flow in Analyze to label the data properly.
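To make the idea more concrete, a user-defined Semantic could be modelled roughly as in the Python sketch below. The class, its field names and the confidence calculation are hypothetical and do not reflect the actual Govern or Analyze data model; locales are left out to keep it short.

```python
import re
from dataclasses import dataclass


@dataclass
class SemanticDefinition:
    """Hypothetical user-defined Semantic: patterns plus include/exclude lists."""
    name: str
    value_pattern: str                # regex a data value must fully match
    label_pattern: str = ""           # regex searched for in the column label
    include: frozenset = frozenset()  # values that always count as a match
    exclude: frozenset = frozenset()  # values that never count as a match

    def matches_value(self, value):
        text = str(value).strip()
        if text in self.exclude:
            return False
        if text in self.include:
            return True
        return re.fullmatch(self.value_pattern, text) is not None

    def matches_label(self, label):
        if not self.label_pattern:
            return False
        return re.search(self.label_pattern, label, re.IGNORECASE) is not None

    def confidence(self, values):
        """Fraction of the values that match this Semantic (0.0 for no values)."""
        values = list(values)
        if not values:
            return 0.0
        return sum(self.matches_value(v) for v in values) / len(values)


product_code = SemanticDefinition(
    name="Product code",
    value_pattern=r"[A-Z]{3}-\d{4}",
    label_pattern=r"prod(uct)?[_ ]?(code|id)",
    include=frozenset({"LEGACY01"}),
    exclude=frozenset({"XXX-0000"}),
)
column = ["ABC-1234", "XYZ-0042", "LEGACY01", "XXX-0000"]
score = product_code.confidence(column)
label_hit = product_code.matches_label("PRODUCT_CODE")
print(score, label_hit)
```

A profiling flow could combine the value score and the label match to decide whether to attach the Semantic to a column.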
Hope this gives you an idea of the direction we are taking with Govern for data discovery and auto curation.
Thanks,
Franco