Artificial Intelligence (AI) is no longer confined to high-performance computing centers. It has ventured into the compact, resource-constrained world of embedded systems, thanks to innovations in model optimization and hardware capabilities. A recent study showcases how a self-supervised audio spectrogram transformer (SSAST) can efficiently operate on a low-power NVIDIA Jetson Orin Nano System-on-Chip (SoC), paving the way for AI’s integration into devices like IoT gadgets, wearable tech, and more.
Transformers Meet Embedded Systems
Transformers, like the SSAST, are known for their prowess in natural language processing and audio recognition tasks. However, their deployment on embedded systems has been held back by the heavy compute and memory demands of inference. This study breaks new ground by demonstrating how to run these models efficiently on low-power GPUs, highlighting optimization techniques that keep resource usage low without compromising performance.
Key Optimizations for Efficiency
- Batch Size Tuning: Larger batch sizes sharply reduce per-sample inference time and energy consumption, but they must be managed carefully to stay within the device's memory limits. In this study, a batch size of 16 struck the best balance among time, energy, and memory.
- Model Compilation with TensorRT: NVIDIA’s TensorRT framework delivered accelerated inference through precision-optimized kernels and reduced memory overhead. Compiled models ran up to 2x faster than their non-compiled counterparts (a minimal compilation sketch follows this list).
- Precision Reduction: Lowering data precision to half-precision floating point (FP16) or even 8-bit integers yielded significant gains in speed and energy efficiency. Accuracy degradation was negligible, with less than 1% loss observed under 8-bit post-training quantization.
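The paper does not publish its deployment scripts, so the snippet below is only a minimal sketch of an FP16 Torch-TensorRT compilation flow on a Jetson-class device; the torchvision ViT stands in for the actual SSAST checkpoint, and the input shape and batch size are illustrative assumptions.

```python
import torch
import torch_tensorrt  # NVIDIA's Torch-TensorRT bridge, shipped in Jetson PyTorch containers
from torchvision.models import vit_b_16

# Stand-in transformer: the study's SSAST checkpoint is not available here, so a torchvision
# ViT is used purely to illustrate the compilation flow.
model = vit_b_16().eval().cuda().half()

# Batch of 16 inputs, matching the batch size the study found to balance time, energy, and memory.
example = torch.randn(16, 3, 224, 224, device="cuda", dtype=torch.half)

# Compile with TensorRT, permitting FP16 precision-optimized kernels.
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input(example.shape, dtype=torch.half)],
    enabled_precisions={torch.half},
)

with torch.no_grad():
    logits = trt_model(example)  # accelerated inference on the compiled engine
```

INT8 deployment follows the same pattern, except that TensorRT additionally needs a small calibration dataset (or a quantization-aware checkpoint) to choose the scaling factors.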
Experimental Insights
The team used the Google Speech Commands Dataset to test the SSAST model on the Jetson Orin Nano SoC, which houses a six-core ARM CPU and an NVIDIA Ampere GPU. Results showed that:
- The GPU was 6x faster than the CPU for single-sample inference and up to 32x faster for batched inputs (a minimal timing sketch follows this list).
- Compiled models consistently reduced energy consumption and inference time, even at larger batch sizes.
- Memory utilization increased with batch size, but the optimized configurations left enough headroom for other tasks running on the device.
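The article does not include the measurement harness, but a rough CPU-versus-GPU latency comparison of the kind reported can be reproduced with a short PyTorch loop; the model and input shapes below are stand-ins, and on a Jetson board energy is typically read separately with tegrastats or the jetson-stats (jtop) utility rather than from Python.

```python
import time
import torch
from torchvision.models import vit_b_16  # stand-in for the SSAST model

def avg_latency(model, batch, repeats=30):
    """Average per-batch latency after a warm-up, synchronizing CUDA where needed."""
    model.eval()
    with torch.no_grad():
        for _ in range(5):              # warm-up so one-time initialization is excluded
            model(batch)
        if batch.is_cuda:
            torch.cuda.synchronize()    # wait for queued GPU kernels before timing
        start = time.perf_counter()
        for _ in range(repeats):
            model(batch)
        if batch.is_cuda:
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats

model = vit_b_16()
for batch_size in (1, 4, 16):
    x = torch.randn(batch_size, 3, 224, 224)
    cpu_t = avg_latency(model.cpu(), x)
    gpu_t = avg_latency(model.cuda(), x.cuda())
    print(f"batch {batch_size:2d}: CPU {cpu_t*1e3:7.1f} ms | GPU {gpu_t*1e3:7.1f} ms | "
          f"speedup {cpu_t / gpu_t:4.1f}x")
```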
Real-World Applications
These findings open the door to deploying AI in areas like:
- Healthcare: Wearable devices for real-time health monitoring.
- IoT: Smart home systems capable of real-time voice recognition.
- Automotive: Low-power AI solutions for autonomous vehicle features.
Looking Ahead
The study sets the stage for broader adoption of AI on embedded systems. Future work will explore:
- Quantization-Aware Training (QAT) to further minimize accuracy loss during precision reduction (see the sketch after this list).
- Deployment of larger models, including Large Language Models (LLMs), on edge devices using tools like TensorRT-LLM.
- Utilization of advanced hardware like the Jetson Xavier for even greater performance.
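The paper leaves QAT for future work, so the following is only a generic sketch of PyTorch's eager-mode quantization-aware training flow on a toy stand-in network, not the authors' pipeline; the class count and backend choice are assumptions.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (QuantStub, DeQuantStub,
                                   get_default_qat_qconfig, prepare_qat, convert)

class TinyAudioClassifier(nn.Module):
    """Toy stand-in network; a real SSAST would need per-module qconfig handling."""
    def __init__(self, num_classes=35):           # 35 keyword classes in Speech Commands
        super().__init__()
        self.quant = QuantStub()                  # marks where tensors enter the quantized domain
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(8, num_classes),
        )
        self.dequant = DeQuantStub()              # back to float for the loss / softmax

    def forward(self, x):
        return self.dequant(self.features(self.quant(x)))

model = TinyAudioClassifier()
model.qconfig = get_default_qat_qconfig("qnnpack")  # ARM-friendly backend; use "fbgemm" on x86
model.train()
prepare_qat(model, inplace=True)                  # insert fake-quantization observers

# ... run the usual fine-tuning loop here so the weights adapt to quantization noise ...

model.eval()
int8_model = convert(model)                       # materialize the INT8 model for deployment
```

Because the fake-quantization observers expose the network to quantization noise during fine-tuning, QAT can recover much of the accuracy that plain post-training quantization would otherwise lose.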
Conclusion
This research shows that cutting-edge transformer models can thrive in low-power environments, balancing time, energy, and memory efficiency without sacrificing accuracy. By leveraging optimization techniques such as TensorRT compilation and precision reduction, we can bring the transformative power of AI to embedded systems, driving innovation across industries.
Article derived from: Martin-Salinas, I., Badia, J.M., Valls, O. et al. Evaluating and accelerating vision transformers on GPU-based embedded edge AI systems. J Supercomput 81, 349 (2025). https://doi.org/10.1007/s11227-024-06807-1
Check out the cool NewsWade YouTube video about this article!