Improving Scalability of CMPs with Dense ACCs Coverage
Utilizing Hardware Accelerators (ACCs) is a promising solution to improve performance and power efficiency. However, new challenges arise with the trend of shifting from few ACCs (sparse ACC coverage) to many ACCs (denser ACC coverage) on a chip. The primary challenges are the lack of clear semantics for communicating with ACCs and the processor-centric view of orchestrating the entire system. This paper opens a path toward efficient integration of many ACCs on a single chip. To this end, the paper first identifies four major semantic aspects that arise when two ACCs need to communicate with each other: data access model, data granularity, marshalling, and synchronization. Based on the identified semantics, the paper proposes Transparent Self-Synchronizing (TSS) as an efficient architecture solution that realizes these semantics in the underlying architecture. In principle, TSS proposes a shift from the current processor-centric view to a more equal, peer view between ACCs and the host processors. TSS minimizes the interaction with the host processor and reduces the volume of direct ACC-to-ACC communication traffic exposed to the system fabric. Our results using 8 streaming applications with different numbers of ACC-to-ACC connections demonstrate significant benefits of TSS, including a 3x speedup over current ACC-based architectures.
Appeared in:
DATE
Year:
2016
Presentation Place:
Dresden, Germany
Related Research:
Novel Architecture for Streaming Applications