Classification of Privacy Preserving Data Mining Algorithms: A Review

       Dedi Gunawan


Data from various sources are now routinely gathered and stored in databases. Collecting data has little impact on its own unless the database owner analyzes it, for example by applying data mining techniques. Advances in data mining techniques and algorithms have significantly improved the information extraction process in terms of the quality, accuracy, and precision of its results. However, running available data mining algorithms on a database may disclose sensitive information about the data subjects it contains, so the data owner must take steps to protect privacy. Privacy preserving data mining (PPDM) has therefore become an emerging field of study in the data mining research community. The main purpose of PPDM is to investigate the side effects of data mining methods that arise from intrusion into the privacy of individuals and organizations. PPDM also aims to ensure that data miners cannot reveal any sensitive personal information contained in a database, while keeping the data utility of the sanitized database close to that of the original. In this paper, we present a broad view of current PPDM techniques, classifying them according to a taxonomy that differentiates the characteristics of each approach. The PPDM methods are reviewed comprehensively to give researchers and practitioners a thorough understanding of the methods, along with their advantages, challenges, and directions for future development.



Database; data mining; privacy preserving data mining; sensitive information




Copyright (c) 2020 Jurnal Elektronika dan Telekomunikasi

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.