Accelerator-based Chip Multi-Processors (ACMPs), which combine application-specific HW ACCelerators (ACCs) with host processor core(s), are promising architectures for high-performance and power-efficient computing. However, ACMPs with many ACCs have scalability limitations. The ACCs’ performance benefits can be overshadowed by bottlenecks on shared resources: the processor core(s), the communication fabric/DMA, and the on-chip memory. These bottlenecks are rooted primarily in the ACCs’ data access and orchestration dependencies. Because ACC communication semantics are only loosely defined and ACMPs rely on general architectures, the resource bottlenecks hamper performance. This article explores and alleviates the scalability limitations of ACMPs. To this end, the article first proposes ACMPerf, an analytical model that captures the impact of the resource bottlenecks on the achievable ACC benefits. Then, the article identifies and formalizes ACC communication semantics, which pave the way toward a more scalable integration of ACCs. The semantics describe four primary aspects: data access, data granularity, data marshalling, and synchronization. Finally, the article proposes a novel architecture of Transparent Self-Synchronizing ACCelerators (TSS). TSS efficiently realizes the identified communication semantics for direct ACC-to-ACC connections, which often occur in streaming applications. TSS delivers more of the ACCs’ benefits than conventional ACMP architectures. Given the same set of ACCs, TSS has up to 130x higher throughput and 78x lower energy consumption, mainly due to reducing the load on shared architectural resources by 78.3x.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Novel Architecture for Streaming Applications