Test-driven anonymization for artificial intelligence

Apr 4, 2019

Cristian Augusto

Jesús Morán

Claudio de la Riva

Javier Tuya



Abstract

In recent years, the amount of data published and shared with third parties to develop artificial intelligence (AI) tools and services has increased significantly. When regulatory or internal requirements govern data privacy, anonymization techniques are used to preserve privacy by transforming the data. A side effect is that anonymization may render the data useless for training and testing the AI, because the AI is highly dependent on data quality. To overcome this problem, we propose a test-driven anonymization approach for artificial intelligence tools. The approach tests different anonymization efforts to achieve a trade-off between privacy (non-functional quality) and the functional suitability of the artificial intelligence technique (functional quality). The approach has been validated on two real-life datasets from the healthcare and health insurance domains. Each dataset is anonymized with several privacy protections and then used to train classification AIs. The results show how the data can be anonymized to achieve adequate functional suitability in the AI context while keeping the privacy of the anonymized data as high as possible.

Type

Conference paper

Publication

In Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, San Francisco, USA

Last updated on Apr 4, 2019

Authors

Cristian Augusto

Assistant Professor of Software Engineering
