A Comparative Study of CNN Optimization Methods for Edge AI: Exploring the Role of Early Exits

April 16, 2026 · Grace Period · + Add venue

Authors Nekane Fernandez, Ivan Valdes, Steven Van Vaerenbergh, Idoia de la Iglesia, Julen Arratibel arXiv ID 2604.14789 Category cs.AI: Artificial Intelligence Citations 0

Abstract

Deploying deep neural networks on edge devices requires balancing accuracy, latency, and resource constraints under realistic execution conditions. To fit models within these constraints, two broad strategies have emerged: static compression techniques such as pruning and quantization, which permanently reduce model size, and dynamic approaches such as early-exit mechanisms, which adapt computational cost at runtime. While both families are widely studied in isolation, they are rarely compared under identical conditions on physical hardware. This paper presents a unified deployment-oriented comparison of static compression and dynamic early-exit mechanisms, evaluated on real edge devices using ONNX based inference pipelines. Our results show that static and dynamic techniques offer fundamentally different trade-offs for edge deployment. While pruning and quantization deliver consistent memory footprint reduction, early-exit mechanisms enable input-adaptive computation savings that static methods cannot match. Their combination proves highly effective, simultaneously reducing inference latency and memory usage with minimal accuracy loss, expanding what is achievable at the edge.