Chapter 5: Memory Hierarchy Design
Zheng Qinghua, Computer Dept. of XJTU, May 2015

Outline
- Principle of Storage System
- Principle and Technique of Cache System
- Management of Main Memory
- Principle of Virtual Memory

5.1 Overview of Hierarchy Memory System

Typical levels (capacity, access time, cost):
- CPU registers: 100s of bytes, ~1 ns
- Cache: KBytes, ~4 ns, 1-0.1 cents/bit
- Main memory: MBytes, 100-300 ns, 10^-4 to 10^-5 cents/bit
- Disk: GBytes, ~10 ms (10,000,000 ns), 10^-5 to 10^-6 cents/bit
- Tape: effectively infinite, seconds to minutes, ~10^-8 cents/bit

[Figure: staging/transfer units between adjacent levels. Registers <-> Cache: instructions and operands, 1-8 bytes, managed by the program/compiler. Cache <-> main memory: blocks, 8-128 bytes, managed by the cache controller. Main memory <-> disk: pages, 512 B-4 KB, managed by the OS. Disk <-> CD/tape: files, MBytes, managed by the user/operator. Upper levels are faster; lower levels are larger.]

Design principle of the hierarchy memory system
Basic principles: locality, and the cost/performance of the available memory technologies. The principle of locality: programs tend to reuse data and instructions they have used recently. A rule of thumb: a program spends 80% of its execution time in only 20% of its code.

Reason 1: locality
An implication of locality is that we can predict, with reasonable accuracy, which instructions and data a program will use in the near future based on its accesses in the recent past.
- Temporal locality: recently accessed items are likely to be accessed again in the near future.
- Spatial locality: items whose addresses are near one another tend to be referenced close together in time.

Desktop, drawer, and file cabinet analogy
[Figure: items on a desktop (registers) or in a drawer (cache) are more readily accessible than those in a file cabinet (main memory). Once the "working set" is in the drawer, very few trips to the file cabinet are needed.]

Temporal and spatial localities
[Figure: memory addresses plotted against time, from Peter Denning's CACM paper, July 2005 (Vol. 48, No. 7, pp. 19-24).] Temporal: accesses to the same address are typically clustered in time. Spatial: when a location is accessed, nearby locations tend to be accessed as well.

Design resolution: the principle of locality, combined with the fact that smaller memories are faster, leads to a hierarchy built from memories of different speeds and sizes.

Typical levels in a hierarchical memory [figure]

Reason 2: processor-DRAM memory gap (latency)
[Figure: performance versus year, 1980-2000. Processor performance grows about 60% per year (2x every 1.5 years, "Moore's Law"); DRAM performance grows about 9% per year (2x every 10 years). The processor-memory performance gap grows about 50% per year.]

The need for a memory hierarchy
- The widening speed gap between CPU and main memory: processor operations take on the order of 1 ns, while a memory access requires 10 ns or even 100s of ns.
- Memory bandwidth limits the instruction execution rate: each instruction executed involves at least one memory access, so a few to 100s of MIPS is the best that can be achieved.
- A fast buffer memory can help bridge the CPU-memory gap; the fastest memories are expensive and thus not very large, so a second or even third intermediate cache level is often used.

The levels of the hierarchy usually subset one another: all data in one level is also found in the level below, and all data in that lower level is found in the one below it, and so on until the bottom of the hierarchy is reached. Each level maps addresses from a larger memory to a smaller but faster memory higher in the hierarchy. Along with this address mapping, address checking and protection schemes are applied as well.
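As a concrete illustration of the spatial-locality point in Section 5.1 (not part of the original slides), the following minimal C sketch sums the same matrix in row-major and in column-major order. With cache blocks of 8-128 bytes, the row-major loop touches consecutive addresses inside each fetched block, while the column-major loop jumps a whole row apart on every access; the function names and the array size N are illustrative choices.

```c
#include <stdio.h>

#define N 1024

static double a[N][N];          /* C stores this array row by row (row-major) */

/* Good spatial locality: consecutive iterations touch consecutive
 * addresses, so most accesses hit in the block brought in by the
 * first miss. */
double sum_row_major(void)
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Poor spatial locality: consecutive iterations are N*sizeof(double)
 * bytes apart, so nearly every access falls in a different block. */
double sum_col_major(void)
{
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

int main(void)
{
    /* Both loops compute the same sum; only their miss rates differ. */
    printf("%f %f\n", sum_row_major(), sum_col_major());
    return 0;
}
```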
5.1.2 Performance of Storage System

Capacity, speed, and price. Consider a hierarchy of levels M1 (T1, S1, C1), M2 (T2, S2, C2), M3 (T3, S3, C3), ..., where Ti, Si, and Ci are the access time, capacity, and price per bit of level i. The average price per bit of the overall storage system is the capacity-weighted average C = (C1·S1 + C2·S2 + …) / (S1 + S2 + …). From the user's viewpoint the hierarchy should behave as if it had the speed of the fastest level, the capacity of the largest level, and a price per bit close to that of the cheapest level: T ≈ min(T1, T2, …, Tn), S = max(S1, S2, …, Sn), C ≈ min(C1, C2, …, Cn).

Access time T
- Hit: the data is found in some block of the upper level (e.g., block X).
- Hit rate H: the fraction of memory accesses found in the upper level.
- Hit time: the time to access the upper level, i.e., the RAM access time plus the time to determine hit/miss.
- Miss: the data must be retrieved from a block in the lower level (block Y).
- Miss rate = 1 - hit rate.
- Miss penalty: the time to replace a block in the upper level plus the time to deliver the block to the processor.
Taken together, the average access time is T = hit time + miss rate × miss penalty, and the hit time is normally far smaller than the miss penalty.

Cache address format: (block number b, within-block offset W). Since the blocks in main memory and the blocks in the Cache are the same size, both addresses share the same W. Key points in choosing an address translation algorithm: ease of hardware implementation, speed, a low probability of block conflicts, and high utilization of the storage space.

5.2.2 Address Translation Methods

Cache address directory table and its structure: the table stores the correspondence between main-memory block numbers and Cache block numbers.

Fully associative mapping and its translation
Concept: any block of main memory may be mapped into any block of the Cache, so there are Cb × Mb possible mappings, where Cb and Mb are the number of blocks in the Cache and in main memory, respectively. Each directory entry has the form: (main-memory block number, Cache block number b, valid bit).

Mapping rules of the fully associative scheme:
1) Main memory and the Cache are divided into data blocks of the same size.
2) Any data block of main memory may be loaded into any block frame of the Cache.
[Figure: with block size B, Cache capacity C, and main-memory capacity M, main memory holds blocks 0 .. M/B-1 and the Cache holds blocks 0 .. C/B-1; any memory block may be placed in any Cache block.]

Fully associative mapping (cont.)
[Figure: N = number of blocks per zone; the shaded area indicates the range that must be searched.]

Characteristics of fully associative mapping: the probability of block conflicts is the lowest and Cache utilization is the highest, but a very fast associative memory with Cb entries is required, which is costly. Virtual memory systems generally adopt fully associative mapping, because it allows the CPU to switch quickly to another task when scheduling multiple tasks.

5.2.3 Direct Mapping and Its Translation
The Cache and main memory are still organized in blocks, but main memory is additionally divided into zones of exactly the same size as the Cache, and each zone is divided into blocks. A Cache block therefore corresponds to many main-memory blocks: (BA0, BA1, …, BAn) → b (Cache), satisfying b = B mod Cb, where b is the Cache block number, B is the main-memory block number, and Cb is the total number of blocks in the Cache.

Main-memory address: (zone number A, block number B, within-block offset W). Cache address: (Cache block number, within-block offset). The essence of Cache address translation is converting the former into the latter.

Direct mapping (cont.): if the zone number recorded for the slot differs from that of the requested address but tag = 1 (the entry is valid), the block currently stored in the Cache is still valid; it must first be written back to RAM, and then the operation proceeds as in Case 2.
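To make the direct-mapped translation above concrete, here is a minimal C sketch (not from the slides) that splits a main-memory block number B into a Cache block number b = B mod Cb and a zone number, checks the directory entry, and writes a valid block of a different zone back to RAM before replacing it, as in the write-back case described above. The names cache_dir, access_block, write_back and the value of CB are illustrative assumptions, and the actual data movement is omitted.

```c
#include <stdbool.h>
#include <stdio.h>

#define CB 256u   /* Cb: total number of blocks in the Cache (illustrative value) */

/* One directory entry per Cache block: the zone number of the block it
 * currently holds, plus a valid bit ("tag = 1" in the slides). A real
 * write-back cache would also keep a dirty bit so that unmodified blocks
 * need not be written back; that refinement is omitted here. */
struct dir_entry {
    unsigned zone;
    bool     valid;
};

static struct dir_entry cache_dir[CB];

/* Placeholder for copying a cached block back to main memory. */
static void write_back(unsigned zone, unsigned b)
{
    printf("write back block %u of zone %u to RAM\n", b, zone);
}

/* Translate main-memory block number B into Cache block number b and
 * service the access; returns true on a hit, false on a miss. */
bool access_block(unsigned B)
{
    unsigned b    = B % CB;          /* direct mapping: b = B mod Cb */
    unsigned zone = B / CB;          /* zone number A                */
    struct dir_entry *e = &cache_dir[b];

    if (e->valid && e->zone == zone)
        return true;                 /* hit: use Cache block b directly */

    /* Miss. If the slot holds a valid block of a different zone, write it
     * back to RAM first, then load the requested block (Case 2 in the slides). */
    if (e->valid && e->zone != zone)
        write_back(e->zone, b);

    /* Fetch block B from RAM into Cache block b (data transfer omitted). */
    e->zone  = zone;
    e->valid = true;
    return false;
}

int main(void)
{
    access_block(7);        /* miss: loads block 7 of zone 0          */
    access_block(7);        /* hit                                    */
    access_block(7 + CB);   /* miss: same slot b = 7, different zone  */
    return 0;
}
```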