Test-Driven Anonymization in Health Data: A Case Study on Assistive Reproduction

Aug 3, 2020

Cristian Augusto

Miguel Olivero

Jesús Morán

Leticia Morales

Claudio de la Riva

Javier Aroba

Javier Tuya

Abstract

Artificial intelligence (AI) is a broad field whose prevalence in the health sector has increased in recent years. Clinical data are the staple that feeds intelligent healthcare applications, but because of their sensitive nature, their sharing with and use by third parties require compliance with both confidentiality agreements and security measures. Data anonymization addresses this need by modifying the data to increase privacy and reduce the risk of unintentional disclosure of sensitive information. Although anonymization improves privacy, these modifications also harm the functional suitability of the data. They can affect the applications that use the anonymized data, especially data-centric ones such as AI tools. To reach a trade-off between both qualities (privacy and functional suitability), we use the Test-Driven Anonymization (TDA) approach, which anonymizes the data incrementally, trains the AI tools on the anonymized data, and validates them against the real data until quality is maximized. The approach is evaluated on a real-world dataset from the Spanish Institute for the Study of the Biology of Human Reproduction (INEBIR). The anonymized datasets are used to train AI tools, and the dataset that achieves the best trade-off between privacy and functional quality requirements is selected. The results show that TDA can be successfully applied to anonymize INEBIR's clinical data, so that the data can be transferred to third parties without violating user privacy and those parties can develop useful AI tools with the anonymized data.

Type

Conference paper

Publication

In Proceedings - 2020 IEEE International Conference on Artificial Intelligence Testing, AITest 2020