Data Anonymization


In recent years, advances in technology have increased the capability to record and store personal data about consumers and individuals. This has raised concerns about the many ways in which personal data can be misused. To mitigate these issues, several de-identification methodologies have recently been proposed that, in some well-controlled circumstances, allow personal data to be re-used in privacy-preserving ways. One such method is data anonymization. Data anonymization aims to ensure that even if de-identified data is stolen, it is very hard to re-identify.

Tools are available that transform a dataset into an anonymized dataset. However, figuring out how much anonymization can be applied without degrading the analytics algorithms is difficult. The main objective of this web application is to make it easy for a user to anonymize data and to run experiments to find the right level of anonymization. The user can upload a dataset and either choose an existing configuration or create a new one within the web interface for de-identification; the de-identified (anonymized) data can then be downloaded. Optionally, the user can run data mining algorithms such as regression or decision trees to compare the impact of anonymization on the quality of the results. The key task of this project is to create an interface that lets the user develop the most suitable anonymization configuration for the dataset.
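The workflow described above (apply a configuration, then compare analytics on the raw and anonymized data) can be illustrated with a minimal sketch. This is not the project's implementation: the records, column names, the CONFIG format, and the helper functions anonymize, smallest_group, fit_line and mean_abs_error are all hypothetical, chosen only for illustration. The size of the smallest group of records sharing the same quasi-identifier values serves as a simple privacy indicator, and the error of a one-variable least-squares regression stands in for the "quality of results".

```python
from collections import Counter

# Hypothetical toy dataset; the column names are assumptions for illustration.
RECORDS = [
    {"name": "A", "age": 23, "zip": "56001", "income": 31},
    {"name": "B", "age": 27, "zip": "56002", "income": 35},
    {"name": "C", "age": 34, "zip": "56011", "income": 44},
    {"name": "D", "age": 36, "zip": "56012", "income": 43},
    {"name": "E", "age": 45, "zip": "56021", "income": 52},
    {"name": "F", "age": 48, "zip": "56029", "income": 58},
]

# Hypothetical configuration format: one rule per column.
CONFIG = {
    "name": {"action": "suppress"},
    "age": {"action": "bin", "width": 10},
    "zip": {"action": "truncate", "keep": 3},
}

QUASI_IDENTIFIERS = ["age", "zip"]


def anonymize(records, config):
    """Return anonymized copies of the records; the input is not modified."""
    result = []
    for row in records:
        new_row = dict(row)
        for column, rule in config.items():
            value = row[column]
            if rule["action"] == "suppress":
                new_row[column] = "*"
            elif rule["action"] == "bin":
                # Replace the number with the midpoint of its bin.
                low = (value // rule["width"]) * rule["width"]
                new_row[column] = low + rule["width"] / 2
            elif rule["action"] == "truncate":
                # Keep a prefix and mask the remaining characters.
                keep = rule["keep"]
                new_row[column] = value[:keep] + "*" * (len(value) - keep)
        result.append(new_row)
    return result


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[c] for c in quasi_identifiers) for r in records)
    return min(groups.values())


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(xs, ys):
    """Mean absolute error of the fitted line on the data it was fitted to."""
    slope, intercept = fit_line(xs, ys)
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / len(xs)


anonymized = anonymize(RECORDS, CONFIG)

for label, data in (("raw", RECORDS), ("anonymized", anonymized)):
    ages = [r["age"] for r in data]
    incomes = [r["income"] for r in data]
    print(label,
          "smallest group:", smallest_group(data, QUASI_IDENTIFIERS),
          "regression MAE:", round(mean_abs_error(ages, incomes), 2))
```

Changing the bin width or the number of retained ZIP digits in CONFIG and re-running the comparison is the kind of experiment the interface is meant to make easy: a stronger setting tends to enlarge the smallest group while possibly increasing the regression error.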

Expected Social Impact:

The code for the data anonymization tool will be released as open source on GitHub so that people outside the project can use it. Once the code and its documentation are uploaded, the GitHub link will be available below:

Team Members:

Prof. Chandrashekar Ramnathan (PI)
2 project students