Summary

| | Direct-Mapped | Set Associative | Fully Associative |
| --- | --- | --- | --- |
| Block Placement | Only one block, defined by index | Any one of N blocks within the set defined by index | Any cache block |
| Block Identification | Tag match with only one block | Tag match for all blocks within set | Tag match for all blocks within cache |
| Block Replacement | Choose block based on index (no choice) | Choose block based on replacement policy | Choose block based on replacement policy |

Context

If operands are in memory, they must first be loaded into the processor's registers, operated on, and then stored back into memory.

Memory Technology

DDR SDRAM

Double Data Rate Synchronous Dynamic Random Access Memory

Motivation

Requirement: Big and fast memory

Key concept

Hierarchy of memory technologies should be used:

  • Small but fast near CPU
  • Large but slow farther away from CPU (for cost)
(Memory hierarchy: registers → SRAM → DRAM → hard disk; speed decreases and size increases down the hierarchy.)

Cache

Keep frequently and recently used data in smaller but faster memory

Principle of locality

Program accesses only a small portion of the memory address space within a small time interval

Temporal locality

If an item is referenced, it will tend to be referenced again soon

Spatial locality

If an item is referenced, nearby items will tend to be referenced soon.
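
A minimal C sketch of both kinds of locality (the array, sizes, and loop below are invented for illustration):

```c
#include <stdio.h>

int main(void) {
    int a[1024];
    for (int i = 0; i < 1024; i++) a[i] = i;

    int sum = 0;
    for (int i = 0; i < 1024; i++) {
        /* Spatial locality: a[i] and a[i+1] are adjacent in memory,
         * so they usually reside in the same cache block.           */
        sum += a[i];
        /* Temporal locality: sum, i, and the loop's instructions are
         * reused on every iteration.                                 */
    }
    printf("%d\n", sum);
    return 0;
}
```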

Working Set

Working set

Set of locations accessed during a time interval

Aim

Capture working set and keep it in memory closest to CPU

Memory Access

To make slow main memory appear faster:

  • Cache
  • Hardware management

To make small main memory appear bigger:

  • Virtual memory
  • OS management

Terminology

Hit

Data is in cache

Hit rate

Fraction of memory accesses found in the cache

Hit time

Time to access cache

Miss

Data is not in cache

Miss rate

Fraction of memory accesses not found in the cache (= 1 − Hit rate)

Miss penalty

Time to load a block from main memory into the cache and deliver the data to the processor; by definition much larger than the hit time

Average access time

Average access time = Hit rate × Hit time + (1 − Hit rate) × Miss penalty
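
A hedged worked example of the formula (the 95% hit rate, 1-cycle hit time, and 100-cycle miss penalty are assumed values, not from these notes):

```latex
\text{Average access time} = 0.95 \times 1 + (1 - 0.95) \times 100 = 5.95 \text{ cycles}
```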

Memory-Cache Mapping

Cache block/line

Unit of transfer between memory and cache

A memory address is split into a block number (bits [31:N]) and a byte offset (bits [N−1:0]) for 2^N-byte blocks. Observations:

  1. Blocks are aligned at 2^N-byte boundaries.
  2. Addresses within the same block have identical 32−N most significant bits.

Direct Mapped Cache

With 2^M cache blocks, the block number is further split into a tag and an M-bit cache index. Multiple memory blocks map to the same cache block (they share the same cache index) but have unique tag values.

Address fields: TAG = bits [31:N+M] (32−(N+M) bits), INDEX = bits [N+M−1:N] (M bits), OFFSET = bits [N−1:0] (N bits). Number of cache blocks = 2^M; size of a cache block = 2^N bytes.

Each cache block stores a valid bit, a tag, and the data. Cache hit: the tag of the memory address equals the stored tag AND the valid bit is TRUE; otherwise it is a cache miss. On a hit, the offset selects (via a multiplexer) the requested word from the block's data.
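
A minimal C sketch of the field extraction, assuming N = 4 offset bits (16-byte blocks) and M = 8 index bits (256 cache blocks); these values and the example address are assumptions for illustration:

```c
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 4                       /* N: 2^4 = 16-byte blocks (assumed)   */
#define INDEX_BITS  8                       /* M: 2^8 = 256 cache blocks (assumed) */

int main(void) {
    uint32_t addr   = 0x12345678;           /* example 32-bit address */
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);                 /* bits [N-1:0]   */
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); /* bits [N+M-1:N] */
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);               /* bits [31:N+M]  */
    printf("tag=0x%x index=0x%x offset=0x%x\n",
           (unsigned)tag, (unsigned)index, (unsigned)offset);
    return 0;
}
```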

Reading Data

The read address is sent from the processor to the cache.

  • READ HIT ([tags match] && [valid]): use the offset to deliver the data to the processor.
  • READ MISS ([tags mismatch] || [not valid]): access the memory block at the read address, allocate a cache line, load the block into it, set the tag and valid bit, then deliver the data.
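
A minimal C sketch of this read flow for a direct-mapped cache; the geometry, the simulated `main_memory`, and the `fetch_block_from_memory` helper are assumptions for illustration, not part of the notes:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define OFFSET_BITS 4                        /* 16-byte blocks (assumed)   */
#define INDEX_BITS  8                        /* 256 cache blocks (assumed) */
#define BLOCK_SIZE  (1u << OFFSET_BITS)
#define NUM_BLOCKS  (1u << INDEX_BITS)

typedef struct {
    bool     valid;
    uint32_t tag;
    uint8_t  data[BLOCK_SIZE];
} CacheLine;

static CacheLine cache[NUM_BLOCKS];
static uint8_t   main_memory[1u << 16];      /* small simulated main memory */

/* Simulated miss handling: copy one whole block from memory. */
static void fetch_block_from_memory(uint32_t block_addr, uint8_t *dst) {
    memcpy(dst, &main_memory[block_addr], BLOCK_SIZE);
}

static uint8_t read_byte(uint32_t addr) {
    uint32_t offset = addr & (BLOCK_SIZE - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    CacheLine *line = &cache[index];
    if (!(line->valid && line->tag == tag)) {
        /* READ MISS: load the whole block, then set tag and valid bit. */
        fetch_block_from_memory(addr & ~(BLOCK_SIZE - 1), line->data);
        line->tag   = tag;
        line->valid = true;
    }
    /* READ HIT (or serviced miss): use the offset to deliver the data. */
    return line->data[offset];
}

int main(void) {
    main_memory[0x1234] = 42;
    printf("%u\n", read_byte(0x1234));       /* miss, then returns 42 */
    printf("%u\n", read_byte(0x1234));       /* hit in the same block */
    return 0;
}
```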
Cache Misses
Compulsory misses (cold start / first reference): occur on the first access to a block, since the block must first be brought into the cache.
Conflict misses (collision / interference): occur in direct-mapped and set-associative caches when several memory blocks map to the same cache block/set.
Capacity misses: occur when blocks are discarded because the cache cannot contain all the blocks needed.

Writing Policy

Motivation

When writing data, the modified data is only in the cache and not yet in main memory.

Solutions:

  1. Write-through cache
  2. Write-back cache

Write-through cache

Write data both to cache and to main memory

Write-back cache

Writes only to the cache during a write operation; the block is written to main memory only when it is evicted from the cache.

Write-through cache with a write buffer: the processor writes data to the cache and to the write buffer, and the memory controller then writes the contents of the buffer to main memory, so the processor does not wait on every write.

Write-back cache: add a dirty bit to each cache block. A write sets the dirty bit; when the block is replaced, it is written back to memory only if the dirty bit = 1.
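
A minimal C sketch of the write-back bookkeeping (dirty bit set on a write hit, write-back only on eviction); the block size, structure fields, and simulated memory are illustrative assumptions:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16u

typedef struct {
    bool     valid;
    bool     dirty;                 /* set when the cached copy differs from memory */
    uint32_t tag;
    uint32_t block_addr;            /* remembered so we know where to write back */
    uint8_t  data[BLOCK_SIZE];
} CacheLine;

static uint8_t main_memory[1u << 16];

/* Write hit under a write-back policy: update only the cache. */
static void write_hit_writeback(CacheLine *line, uint32_t offset, uint8_t value) {
    line->data[offset] = value;
    line->dirty = true;             /* main memory is now stale */
}

/* Eviction: write the block back only if it was modified. */
static void evict(CacheLine *line) {
    if (line->valid && line->dirty)
        memcpy(&main_memory[line->block_addr], line->data, BLOCK_SIZE);
    line->valid = false;
    line->dirty = false;
}

int main(void) {
    CacheLine line = { .valid = true, .dirty = false, .tag = 0, .block_addr = 0x40 };
    write_hit_writeback(&line, 3, 99);   /* only the cache changes           */
    evict(&line);                        /* dirty, so 99 reaches memory now  */
    return main_memory[0x43] == 99 ? 0 : 1;
}
```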

Handling Cache Misses

Write allocate

  • load complete block into cache
  • change only required word in cache
  • write to main memory (based on write policy)

Write around

  • write directly to main memory
The write address and value are sent from the processor to the cache.

  • WRITE HIT ([tags match] && [valid]): handle the write according to the write policy (write-back or write-through).
  • WRITE MISS ([tags mismatch] || [not valid]): handle the miss according to the write-miss policy (write allocate or write around).
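
A compact sketch of this decision flow, combining the write policy with the write-miss policy; the enum and function names are hypothetical, introduced only for illustration:

```c
#include <stdbool.h>
#include <stdio.h>

typedef enum { WRITE_THROUGH, WRITE_BACK }    WritePolicy;     /* used on a hit  */
typedef enum { WRITE_ALLOCATE, WRITE_AROUND } WriteMissPolicy; /* used on a miss */

/* Hypothetical dispatcher: prints the actions each policy combination takes. */
static void handle_write(bool hit, WritePolicy wp, WriteMissPolicy wmp) {
    if (hit) {
        if (wp == WRITE_THROUGH)
            puts("hit: update cache block and main memory (via write buffer)");
        else
            puts("hit: update cache block only and set the dirty bit");
    } else {
        if (wmp == WRITE_ALLOCATE)
            puts("miss: load block into cache, update the word, then follow the write policy");
        else
            puts("miss: write the word directly to main memory, cache unchanged");
    }
}

int main(void) {
    handle_write(true,  WRITE_BACK,    WRITE_ALLOCATE);
    handle_write(false, WRITE_THROUGH, WRITE_AROUND);
    return 0;
}
```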

Block Size Trade-offs

Larger blocks exploit spatial locality, but for a fixed cache size they leave fewer blocks and so compromise temporal locality; the miss penalty also grows with block size. Average access time is therefore minimized at an intermediate block size (the "sweet spot").

Set-Associative Cache

Motivation

To address conflict misses.

N-way set associative cache

A memory block can be placed in a fixed number (N) of locations in the cache, where N is the associativity (the number of blocks per set).

Effectively:

  • instead of mapping to a single cache block, a memory block maps to a unique set of cache blocks and can be placed in any of the cache blocks within that set.
  • requires searching all blocks (ways) in the set to look for a memory block; see the sketch below.
(Set-associative lookup: the address is split into tag, index, and offset; the index selects a set, the tag is compared against every way's tag in parallel, and an N×1 multiplexer selects the data from the way that hits.)
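
A minimal C sketch of the set lookup (hardware compares all ways in parallel; the loop below does it sequentially); the 2-way geometry and structure names are assumptions:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 4
#define INDEX_BITS  7                          /* 2^7 = 128 sets (assumed)        */
#define NUM_WAYS    2                          /* 2-way set associative (assumed) */
#define NUM_SETS    (1u << INDEX_BITS)

typedef struct {
    bool     valid;
    uint32_t tag;
} Way;

static Way cache[NUM_SETS][NUM_WAYS];

/* Returns the way index on a hit, or -1 on a miss. */
static int lookup(uint32_t addr) {
    uint32_t index = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    for (int w = 0; w < NUM_WAYS; w++) {       /* hardware checks all ways at once */
        if (cache[index][w].valid && cache[index][w].tag == tag)
            return w;
    }
    return -1;
}

int main(void) {
    uint32_t addr = 0xABCD;
    uint32_t idx  = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
    cache[idx][1].valid = true;
    cache[idx][1].tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    printf("hit in way %d\n", lookup(addr));   /* prints: hit in way 1 */
    return 0;
}
```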

Advantage of associativity

A direct-mapped cache of size N has approximately the same miss rate as a 2-way set-associative cache of size N/2.

Fully-Associative Cache

Fully-associative cache

A memory block can be placed in any location in the cache.

Block number serves as tag in FA cache.

No conflict misses

No conflict misses occur, since data can go anywhere in the cache

Capacity misses

Cache cannot contain all blocks needed

Cache Performance

  1. Cold/compulsory miss remains the same irrespective of cache size/associativity
  2. Conflict miss goes down with increasing associativity on the same cache size
  3. Conflict miss is 0 for FA caches
  4. For the same cache size, capacity miss remains the same irrespective of associativity
  5. Capacity miss decreases with increasing cache size
Cache Misses
Cold/compulsory misses: same irrespective of cache size/associativity
Conflict misses: for the same cache size, decrease with increasing associativity
Capacity misses: decrease with increasing cache size

Block Replacement Policy

Motivation

In set-associative and fully-associative caches, a memory block can choose where to be placed, potentially replacing another cache block if the set (or cache) is full.

Least recently used

When replacing a block, choose one which has not been accessed for the longest time. (Temporal locality)

Hard to track if there are many choices; alternative policies include the following (a minimal LRU sketch follows the list):

  • FIFO
  • RR
  • LFU
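
A minimal C sketch of LRU tracking for a single set using per-way timestamps; the 4-way geometry, counter scheme, and names are assumptions for illustration:

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_WAYS 4                             /* 4-way set (assumed) */

typedef struct {
    uint32_t tag;
    uint64_t last_used;                        /* timestamp of most recent access */
} Way;

static Way      set[NUM_WAYS];
static uint64_t now = 1;                       /* global access counter */

/* On every access to a way, record the current time. */
static void touch(int way) { set[way].last_used = now++; }

/* Victim selection: the way that has gone unused the longest. */
static int lru_victim(void) {
    int victim = 0;
    for (int w = 1; w < NUM_WAYS; w++)
        if (set[w].last_used < set[victim].last_used)
            victim = w;
    return victim;
}

int main(void) {
    touch(0); touch(1); touch(2); touch(3);
    touch(0); touch(2);                        /* way 1 is now least recently used */
    printf("evict way %d\n", lru_victim());    /* prints: evict way 1 */
    return 0;
}
```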