Adaptive Data Pipeline Architectures for Evolving Fraud Patterns Using Graph ML

Dharam Pal Singh

doi:10.56830/IJSIE202603

Keywords:

Fraud detection, graph machine learning, graph neural networks, adaptive systems, streaming data pipelines, concept drift, continual learning

Abstract

Financial fraud has grown more sophisticated alongside digital transformation, real-time payment systems, and the expansion of online financial services, causing substantial financial losses and operational challenges. Organizations counter these evolving threats with advanced analytics and machine learning models that detect unusual patterns across large volumes of transactional and behavioral data. This approach creates a tension: the financial data needed for effective fraud detection, particularly personally identifiable information and transaction records, is inherently sensitive and subject to significant privacy protections.

This paper introduces a framework for integrating machine learning into ETL pipelines while preserving privacy, enabling real-time financial fraud detection that is both data-driven and secure. The architecture embeds privacy, security, and regulatory compliance in every pipeline stage, from data ingestion and transformation through model training to real-time fraud alert generation. The approach combines several privacy-enhancing technologies (differential privacy, homomorphic encryption, federated learning, and secure multi-party computation) so that organizations can perform collaborative analytics without exposing raw or sensitive data to unauthorized parties. The system also incorporates temporal and behavioral modeling, external data enrichment, and an automated fraud registry, which improve its ability to identify sophisticated fraud patterns as they evolve. Pipeline orchestration provides scalability and near real-time processing, so that fraud risk assessments are delivered promptly. Experimental results indicate substantial improvements in both detection accuracy and processing speed relative to conventional approaches, and dimensionality reduction techniques amplify these gains further. The framework lets data processing systems scale dynamically in response to changing demand while preserving operational efficiency and resilience. The resulting ML-enhanced ETL pipeline gives financial institutions an effective mechanism for minimizing fraud losses while maintaining operational agility and regulatory compliance.
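To make one of the privacy-enhancing technologies named above concrete, the following is a minimal sketch of differential privacy applied to an aggregation step in an ETL transform. It is an illustration under stated assumptions, not the paper's implementation: the function names `laplace_noise` and `private_sum`, the clipping bound `clip`, and the privacy budget `epsilon` are all hypothetical. It uses the standard Laplace mechanism, in which each amount is clipped to a known range so that a single record can change the sum by at most `clip`, and noise with scale `clip / epsilon` is added to the result.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_sum(amounts, clip: float, epsilon: float, rng=None) -> float:
    """Return a differentially private sum of transaction amounts.

    Assumption (hypothetical parameters): amounts are clipped to
    [0, clip], so adding or removing one record changes the sum by at
    most `clip`. Laplace noise with scale clip / epsilon then gives
    epsilon-differential privacy for this single query.
    """
    if clip <= 0 or epsilon <= 0:
        raise ValueError("clip and epsilon must be positive")
    rng = rng or random.Random()
    clipped = [min(max(a, 0.0), clip) for a in amounts]
    return sum(clipped) + laplace_noise(clip / epsilon, rng)


if __name__ == "__main__":
    # Illustrative values only; not data from the paper.
    amounts = [120.0, 35.5, 980.0, -5.0]
    noisy_total = private_sum(amounts, clip=500.0, epsilon=1.0)
    print(f"noisy total: {noisy_total:.2f}")
```

A smaller `epsilon` gives stronger privacy at the cost of a noisier aggregate, so in a sketch like this the budget would be chosen per query and tracked across the pipeline.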

