Geometric Data Perturbation for Privacy-preserving Data Classification

Keke Chen and Ling Liu

 


This project investigates a data-perturbation approach, based on random geometric transformations, for privacy-preserving data classification. The approach has two goals: preserving the utility of the data for classification modeling, and preserving the privacy of the data. For the first goal, we observe that many classification models rely on geometric properties of the dataset that geometric transformations preserve. We prove that three types of well-known classifiers deliver the same (or very similar) performance on the geometrically perturbed dataset as on the original dataset, so the perturbation incurs almost no loss of accuracy for these classifiers. For the second goal, we propose a multi-column privacy model that addresses the problem of evaluating privacy quality for multidimensional perturbation, and we develop an attack-resilient perturbation optimization method. Using the proposed privacy metric, we analyze three types of inference attacks: naive estimation, ICA-based reconstruction, and distribution-based attacks. Based on this attack analysis, we develop a randomized method for optimizing the perturbation. Our initial experiments show that the approach can provide a high privacy guarantee while preserving accuracy for the classifiers discussed.
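To illustrate why a geometric transformation can leave distance-based classification unchanged, here is a minimal Python sketch. It is not the project's Matlab code. It assumes, for illustration only, that each record x is mapped to R x + t + noise, where R is a random orthogonal matrix (a rotation, possibly combined with a reflection), t is a random translation, and the noise term is optional. The names `random_orthogonal`, `perturb`, and `distance` are hypothetical. With the noise switched off, pairwise Euclidean distances between records are unchanged up to floating-point error.

```python
import math
import random


def random_orthogonal(d, rng):
    """Return a random d x d orthogonal matrix as a list of orthonormal rows.

    Built by Gram-Schmidt on Gaussian random vectors. This is an
    illustrative choice, not necessarily how the authors sample theirs.
    """
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components along the rows accepted so far.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip (rare) nearly dependent draws
            rows.append([a / norm for a in v])
    return rows


def perturb(records, R, t, noise_sigma, rng):
    """Map each record x to R x + t + noise (noise_sigma=0 disables noise)."""
    out = []
    for x in records:
        y = [sum(r_ij * x_j for r_ij, x_j in zip(row, x)) + t_i
             for row, t_i in zip(R, t)]
        if noise_sigma > 0:
            y = [v + rng.gauss(0.0, noise_sigma) for v in y]
        out.append(y)
    return out


def distance(a, b):
    """Euclidean distance between two records."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


rng = random.Random(0)  # fixed seed so the run is repeatable
d = 3
data = [[rng.random() for _ in range(d)] for _ in range(5)]
R = random_orthogonal(d, rng)
t = [rng.random() for _ in range(d)]
perturbed = perturb(data, R, t, noise_sigma=0.0, rng=rng)

# Largest change in any pairwise distance after perturbation.
max_gap = max(
    abs(distance(data[i], data[j]) - distance(perturbed[i], perturbed[j]))
    for i in range(len(data)) for j in range(i + 1, len(data))
)
print(f"max pairwise-distance change: {max_gap:.2e}")
```

Setting `noise_sigma` above zero in this sketch makes the distances only approximately preserved, which is one way to picture a trade-off between accuracy and resistance to reconstruction.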

We will investigate further related geometric transformations to meet the requirements of different privacy-preserving mining tasks and models.

 

Matlab code. Note: add the whole directory, including the subdirectory, to the Matlab path. The main function is adv_geo_pert1().

 

Representative papers:

  • Keke Chen and Ling Liu: "Towards Attack Resilient Geometric Data Perturbation", SIAM International Conference on Data Mining, 2007 (SDM07).
  • Keke Chen and Ling Liu: "Privacy-Preserving Data Classification with Rotation Perturbation", Proc. of IEEE Intl. Conf. on Data Mining, 2005 (ICDM05).