This brings to mind the issue of ethics around data curation and data use, a subject that is growing in importance as we collect, use, and analyze personal data. Here's a checklist of questions to ask when collecting and using (curating) data that can help guide us, from Ethics and Data Science by Mike Loukides, Hilary Mason & DJ Patil (Kindle Version, location 157,
https://www.amazon.com/Ethics-Data-Science-Mike-Loukides-ebook/dp/B07GTC8ZN7). Since Dan started this conversation around data curation, I've highlighted some items below that I think are key in that context.
"
As a community we have the opportunity to ask important questions about our new industry as it and we develop:
❏ Have we listed how this technology can be attacked or abused?
❏ Have we tested our training data to ensure it is fair and representative?
❏ Have we studied and understood possible sources of bias in our data?
❏ Does our team reflect diversity of opinions, backgrounds, and kinds of thought?
❏ What kind of user consent do we need to collect to use the data?
❏ Do we have a mechanism for gathering consent from users?
❏ Have we explained clearly what users are consenting to?
❏ Do we have a mechanism for redress if people are harmed by the results?
❏ Can we shut down this software in production if it is behaving badly?
❏ Have we tested for fairness with respect to different user groups?
❏ Have we tested for disparate error rates among different user groups?
❏ Do we test and monitor for model drift to ensure our software remains fair over time?
❏ Do we have a plan to protect and secure user data?"
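To make one of these items concrete, the question about disparate error rates among user groups could be checked with something like the rough sketch below. It is only an illustration, not anything from the book: the function names, the (group, actual, predicted) record layout, and the sample data are all made up for the example, and a real review would need an agreed definition of the groups and of what counts as an acceptable gap.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return {group: error rate} for (group, actual, predicted) records.

    Hypothetical helper for illustration; the record layout is an assumption.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, actual, predicted in records:
        totals[group] += 1
        if actual != predicted:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def max_error_rate_gap(rates):
    """Largest difference in error rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Made-up sample data: (user group, actual label, predicted label)
records = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 1), ("B", 1, 1), ("B", 0, 0),
]

rates = error_rates_by_group(records)
gap = max_error_rate_gap(rates)
print(rates, gap)
```

A large gap between groups does not prove the model is unfair on its own, but it is a signal that the training data and the model deserve a closer look for that group.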
Thanks,
Cecily
------------------------------
Cecily Herzig
PITNEY BOWES SOFTWARE, INC
Maitland FL
------------------------------
Original Message:
Sent: 07-18-2019 09:28
From: Samantha Martino
Subject: What is "Data Curation"
Hi Dan -
In light of this fascinating Forbes article, I thought I would comment on the mention of data curation and its responsibilities to the data providers, as well as on communicating intended use cases to potential customers/clients.
Here is the article for review:
https://www.forbes.com/sites/johnkoetsier/2019/07/17/viral-app-faceapp-now-owns-access-to-more-than-150-million-peoples-faces-and-names/#55c2f33362f1
FaceApp has now curated over 150 million pieces of important, not-so-private data. What does FaceApp owe the providers of the data (consumers of the app), and to what degree is FaceApp now responsible for how the data can or will be used in the future?
In my opinion, consumers have fully embraced the shift to data-driven apps as a necessary evil. In doing so, they also believe they are somehow protected.
The responsibility then ultimately falls on the data curators, who should be held accountable for "preserving, describing and delivering the value of the information contained in data to users."
In this use case, FaceApp has the responsibility to find thoughtful uses for this curated data.
Thanks
Sam
------------------------------
Samantha Martino
Pitney Bowes
White River Junction, VT, USA
Original Message:
Sent: 05-09-2019 10:49
From: Dan Adams
Subject: What is "Data Curation"
The term Data Curation has taken hold and is now mainstream enough to be picked up by both marketing teams and MBAs as one of the ways they explain "what the data people do".
Years ago, "data steward" was used to describe the caretakers of data within an organization. Data Steward is still widely used to describe a person who is responsible for the care and feeding of a specific database or process. (I'll draft a definition of that term if there's interest, or feel free to post one.)
Both Data Curation and Data Steward can mean many different things to the speaker and the listener, with enough ambiguity that they can talk past each other and leave the conversation with different understandings of who is doing what, without knowing it.
With that in mind, I'll offer this definition for Data Curation:
Data Curation: the process of gathering, sorting, formatting, cleansing, standardizing and maintaining data with a sense of responsibility for preserving, describing and delivering the value of the information contained in data to users.
My assertion is that Data Curation is an action of responsibility and authority that offers a service to other data users. Describing the specific applications of the data, and its value in those intended uses, may be the most important aspect of this definition. A conversation with Colleen Reed, a co-worker and community member, left me with the question: does curation include documentation of such aspects as use cases?
Please, let me know your thoughts on this. I've been called a Data Curator a couple of times in the last week, and I'm curious to dig into what that means within the broader data community.
------------------------------
Dan Adams
Pitney Bowes
White River Junction VT
------------------------------