Cache Architectures for Wire-Delay Dominated CMP Systems
SOLINAS, MARCO
2009
Abstract
Increasing on-chip wire delay and growing off-chip miss latency present two key challenges in designing large Level-2 (L2) CMP caches. Currently, some CMPs use a shared L2 cache to maximize cache capacity and minimize off-chip misses. Others use private L2 caches, replicating data to limit the delay from slow on-chip wires and minimize cache access time. Ideally, to improve performance across a wide variety of workloads, a CMP would combine the capacity of a shared cache with the access latency of private caches. In this context, NUCA caches have been shown to tolerate wire-delay effects while maintaining a large on-chip storage capacity. In this thesis, we investigate the choice of coherence strategy (MESI or MOESI) and the overall system topology as design tradeoffs for S-NUCA based CMP systems. We also propose and evaluate a novel block migration scheme for D-NUCA based systems, which addresses two specific problems that can arise from the presence of multiple traffic sources. Results show that, in S-NUCA based CMP systems, the choice between MESI and MOESI does not have a significant impact on performance, while the system topology can lead to very different behaviors. Block migration is introduced in NUCA caches to reduce access latency in a shared cache. Our results show that the migration mechanism is effective in reducing the average L1 miss latency, but the impact on overall performance is smaller because the L1 miss rate is very low.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| TesiDottoratoMarcoSolinas.pdf | embargoed until 29/05/2049 | Other attached material | All rights reserved | 3.59 MB | Adobe PDF |
| CopertinaTesiMarcoSolinas.pdf | open access | | All rights reserved | 34.39 kB | Adobe PDF |
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/132926
URN:NBN:IT:UNIPI-132926