Scalable Data Archival & Purging Mechanism for High-Volume Applications

Zahir Sayyed

doi:10.56830/IJSIE202412

Authors

Zahir Sayyed, Software Engineer, Jamesburg, New Jersey, USA

DOI:

https://doi.org/10.56830/IJSIE202412

Keywords:

Scalable Data Archival, Policy-Driven Purging, Tiered Storage Management, Metadata-Driven Decisioning, Hybrid Machine-Learning Heuristics

Abstract

IoT, multimedia, financial, and healthcare workloads are driving exponential data growth in modern applications. As a result, storage scalability, cost, and privacy compliance have become major obstacles. This paper proposes an automated, policy-driven framework for scalable data archival and purging in high-volume applications. The framework performs lifecycle operations in real time and uses tiered storage, metadata-driven decision-making, and distributed coordination to scale to petabytes. The architecture decomposes the pipeline into ingestion, metadata indexing, policy evaluation, and execution stages, which isolates faults and lets each stage scale horizontally. It uses a hybrid indexing strategy: write-heavy metadata ingestion goes to wide-column stores, and complex policy queries are served by a search engine. It also supports configurable retention policies based on data age, access frequency, data size, and business-specific tags. Candidate selection uses three algorithms: a rule-based heuristic, a supervised classifier trained on historical access patterns, and a hybrid of the two. Experiments on synthetic and real-world e-commerce workloads show decision throughput of up to 20,000 requests per second and policy latency well below one second. Storage costs fall by an order of magnitude or more, while compliance is maintained through safe-delete pipelines with audit trails and rollback capabilities. The analysis characterizes the trade-offs between latency and cost savings across the three strategies and shows that the framework scales elastically to petabyte-scale loads. The results support the feasibility of automated, adaptive data-lifecycle management on both cloud-native and on-premises infrastructure. This offers a credible approach to the data-management requirements of modern systems, with improved operational resilience.
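The abstract describes retention decisions driven by data age, access frequency, size, and business tags. It names three selection strategies: rule-based, supervised classifier, and a hybrid of the two. The Python sketch below shows one plausible way to compose such a hybrid decision. It is not the author's implementation. All names (`ObjectMeta`, `RetentionPolicy`, `decide`), the thresholds and weights, the `legal_hold` tag, and the blending weight `alpha` are hypothetical. `model_score` is a hand-weighted logistic function that stands in for a classifier, which would in practice be trained on historical access logs.

```python
from dataclasses import dataclass, field
from enum import Enum
import math


class Action(Enum):
    RETAIN = "retain"
    ARCHIVE = "archive"   # move to a colder storage tier
    PURGE = "purge"       # hand off to a safe-delete pipeline


@dataclass
class ObjectMeta:
    # Attributes named in the abstract: age, access frequency, size, tags.
    age_days: float
    accesses_last_30d: int
    size_gb: float
    tags: set = field(default_factory=set)


@dataclass
class RetentionPolicy:
    # Hypothetical thresholds; a real deployment would load these from config.
    archive_after_days: float = 90
    purge_after_days: float = 365
    cold_access_threshold: int = 2
    protected_tags: frozenset = frozenset({"legal_hold"})


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids math.exp overflow).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rule_score(meta: ObjectMeta, policy: RetentionPolicy) -> float:
    """Rule-based heuristic: 0.0 (keep hot) .. 1.0 (strong archival candidate)."""
    age = min(meta.age_days / policy.archive_after_days, 1.0)
    cold = 1.0 if meta.accesses_last_30d <= policy.cold_access_threshold else 0.0
    big = min(meta.size_gb / 100.0, 1.0)  # larger objects free more hot storage
    return 0.5 * age + 0.4 * cold + 0.1 * big


def model_score(meta: ObjectMeta) -> float:
    """Stand-in for a trained classifier: P(object is not accessed again).

    The weights are illustrative only, not learned from real access logs.
    """
    z = 0.02 * meta.age_days - 0.8 * meta.accesses_last_30d - 1.0
    return _sigmoid(z)


def decide(meta: ObjectMeta, policy: RetentionPolicy,
           alpha: float = 0.5, threshold: float = 0.6) -> Action:
    """Hybrid decision: hard compliance rules first, then a blended score."""
    if meta.tags & policy.protected_tags:
        return Action.RETAIN  # compliance tags override every score
    if meta.age_days >= policy.purge_after_days and meta.accesses_last_30d == 0:
        return Action.PURGE
    hybrid = alpha * rule_score(meta, policy) + (1 - alpha) * model_score(meta)
    return Action.ARCHIVE if hybrid >= threshold else Action.RETAIN
```

Consider an object 200 days old with no recent accesses. Both components score it high, so it is archived. An object 400 days old with no accesses is routed to purge unless it carries a protected tag. The sketch keeps compliance tags as a hard override outside the weighted blend, so no combination of scores can outweigh a retention obligation.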

