Embedded vision is a top-tier, rapidly growing market with a host of challenges and conflicting requirements. Complex algorithms can pose immense computation (>50 GOPs) and communication (>8 GB/s) demands, especially adaptive vision algorithms, which employ machine learning techniques and concurrently update a scene model. Yet their embedded deployment demands low-power operation consuming only a few watts. Multiprocessor Systems-on-Chip (MPSoCs) have emerged as one main architectural approach to improving power efficiency while increasing computational performance. Still, current vision MPSoCs either shy away from supporting adaptive vision algorithms or operate at very limited resolutions because of these communication and computation demands. This dissertation identifies four major challenges that hinder embedded realizations of advanced vision algorithms. (1) Immense communication demands (>8 GB/s) render efficient embedded implementations infeasible. (2) Constructing larger vision applications out of independent vision processing elements is challenging, even if (1) were solved, because of their combined communication demand. (3) To recover design cost, sales quantities need to be increased, which can be achieved by targeting a domain of applications rather than a single application. This, however, requires novel architectures that simultaneously provide efficiency (performance and power) and flexibility (to execute multiple applications).
Finally, (4) system architects often start from a system specification model and rely on their evolving knowledge to architect vision platforms. Abstraction levels and automation tools need to be identified to guide system architects on the path from market requirements to a system specification. This dissertation makes three major contributions to address these challenges. First, it outlines how to reduce the communication demand of adaptive vision algorithms, which removes a tremendous hurdle for their embedded realization. To this end, we identify two communication types commonly present in these algorithms: streaming traffic and algorithm-intrinsic traffic. Separating these traffic types enables application-specific management of algorithm-intrinsic data (e.g., through compression or prioritization). We demonstrate the benefits using Mixture of Gaussians (MoG) background subtraction, where compression reduces the memory bandwidth by 60% without impacting quality. Through an architecture template, we show how traffic separation can be realized in a platform. We furthermore demonstrate the benefits of traffic separation when constructing complete vision applications. Our complete object-tracking vision flow (image smoothing, MoG background subtraction, morphology, component labeling, and histogram checking), realized on a Zynq-based architecture, processes 1080p video at 30 Hz. It executes 40 GOPs at only 1.7 W of on-chip power. Second, this dissertation introduces a novel processor class to efficiently support a set of vision applications within a market. In particular, we introduce the Function-Level Processor (FLP), which offers efficiency similar to custom hardware yet is sufficiently flexible to execute different applications of the same market. An FLP achieves efficiency by coarsening architecture programmability from instructions (as in an Instruction-Level Processor, ILP) to functions.
We demonstrate the benefits using Analog Devices’ Pipeline Vision Processor (PVP). We show how 10 different Automotive Driver Assistance System (ADAS) applications can be mapped entirely to the PVP. The PVP performs up to 22.4 GOPs while consuming 314 mW, which is 14x to 18x less power than the compared ILP-based solutions. Third, this dissertation provides guidance for system architects in the early stages of design, i.e., from market requirements to a system specification model. For this, we introduce Conceptual Abstraction Levels (CALs). CALs identify a sequence of critical areas for early architecture exploration and resolve interdependent challenges through iteration. CALs help system architects identify, at early stages of design, applications that can benefit from traffic separation and application blocks that are suited to function-level processing.
Overall, this dissertation tackles the complexities of architecting embedded vision MPSoCs from three different angles: (1) the abstract design phase, (2) the realization of individual algorithms in hardware, and (3) the embedded realization of a complete flow, even when simultaneously targeting multiple applications. The dissertation's contributions can also provide guidance for other challenging streaming domains, such as radar processing, wireless baseband processing or