Addressing power efficiency challenges in AI hardware through verification

Authors

  • Vikas Nagaraj, Advanced Micro Devices (United States)

DOI:

https://doi.org/10.56830/IJSIE202401

Keywords:

Power-aware verification, Unified Power Format (UPF), Dynamic Voltage and Frequency Scaling (DVFS), AI accelerators, Power-state coverage

Abstract

AI accelerators already operate under tight energy and thermal budgets, and small inefficiencies are amplified across an entire fleet, increasing costs and emissions. This work reframes power efficiency as a verifiable requirement rather than a post-silicon afterthought. It specifies power intent in IEEE 1801 (UPF), encodes protocol correctness with SystemVerilog/PSL assertions, and quantifies progress with power-state, transition, and cross coverage (DVFS × workload phase × thermal bin). A reproducible dataset schema time-aligns microarchitectural counters with voltage, frequency, temperature, and power, measured across real workloads (ResNet, BERT, and attention/GEMM) in simulation, emulation, and instrumented silicon. Telemetry is synchronised using triggers and PTP/NTP; rails are calibrated and error budgets are reported. Continuous integration gates merges on quantitative thresholds (e.g., a >2% regression in p95 energy per inference), and dashboards auto-bisect offending changes. Experiments show that a hybrid analytical-plus-ML estimator achieves 3.8-6.1% MAPE at millisecond latency, that emulation delivers 30-60x the throughput of simulation, and that verification-driven fixes yield mid-single-digit percentage energy reductions. Case studies include eliminating standby leakage by restoring isolation, smoothing a DVFS table to remove 10-15 ms oscillations, and fixing compiler schedules that made L2 miss models inaccurate and increased DRAM traffic. The result is a practical end-to-end pipeline (UPF, ABV/formal, emulation/FPGA, calibrated rigs, and CI) that makes watts a first-class test metric and delivers lasting efficiency improvements in GPU/NPU/ASIC accelerators. The scope covers training and inference across 14-5 nm nodes, adhering to rigorous safety, ethics, and licensing practices.
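
The CI gate mentioned in the abstract (blocking a merge when p95 energy per inference regresses by more than 2%) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `p95` and `energy_gate`, the nearest-rank percentile definition, and the assumption that samples are per-inference energies in joules are all hypothetical choices made for illustration.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # 1-based nearest-rank index: ceil(0.95 * n), computed with integers
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def energy_gate(baseline_j, candidate_j, max_regression=0.02):
    """Compare p95 energy per inference between two runs.

    baseline_j, candidate_j: per-inference energies in joules (assumed unit).
    max_regression: allowed relative increase; 0.02 mirrors the 2% example
    threshold quoted in the abstract.
    Returns (passed, regression), where regression is the relative change
    of the candidate p95 against the baseline p95.
    """
    base = p95(baseline_j)
    cand = p95(candidate_j)
    regression = (cand - base) / base
    return regression <= max_regression, regression


if __name__ == "__main__":
    # Synthetic numbers only; no measured data is implied.
    baseline = [1.00] * 100
    candidate = [1.00] * 94 + [1.05] * 6
    passed, regression = energy_gate(baseline, candidate)
    print("merge allowed" if passed else "merge blocked", f"({regression:+.1%})")
```

A tail percentile is used rather than the mean because a change can leave average energy unchanged while making the worst inferences noticeably more expensive; a real gate would also need to account for the measurement error budget before blocking a merge.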

Published

2026-03-06

Section

Articles