Motivating and Guiding Students to Learn and Think Creatively in Architecture and Systems Classes

Xiaodong Zhang
Ohio State University, USA

Systems/Architecture are Critical in Curriculum
- Computer Science major graduates are technology-developing driving forces, with strong CS foundations and insights into core technology: algorithms, computer systems, and architecture.
- Non-Computer Science major graduates are technology-using driving forces, with a major focus on the cores of their own fields.
- If architecture and systems are not our cores:
  - What is our identity?
  - How could we train innovators for core technology?
[Figure: chart values 59% and 71%]

Surviving in the Highly Competitive World
- Darwin: survival of the fittest.
- Tom Friedman, in "The World Is Flat", gives 4 survival rules to the next generation of youth:
  - Learn how to learn: finding the best teachers and the best learning environment.
  - Learn with passion and curiosity: endless power.
  - To be liked by others and like others: accepting diverse personalities and cultures for large collaborations.
  - Express well: making convincing arguments orally and in writing (weak expression of thinking = no thinking).
- Make our classrooms into such an environment.

Excellence for Architecture/System Teaching
- What are the trend and the state of the art of the field?
- What are the fundamental concerns and issues?
- How to raise questions within/beyond textbooks?
- How to establish an experimental environment?
- How to get insights via intensive experiments?
- How to reach deep understanding via discussions?
- Can we formulate problems for students to think about?
- How do we train students to be articulate?
- What is the best way to test students?

A Big Picture of Systems and Architecture Field

Good News
- CPU cycles: oversupplied for many applications.
- Memory bandwidth: improved dramatically.
- Memory capacity: increasingly large and low cost.
- I/O bandwidth: improved dramatically.
- Disk capacity: huge and cheap.
- Cluster and Internet bandwidths: very rich.

Bad News
- CPU cycles per Watt decreases
  (less energy efficient).
- Cache capacity: always limited.
- Improvement of data access latencies at any level significantly lags behind!

Adam Smith: the balance is guided by an "invisible hand". Balancing supply/demand is the key for performance and cost optimization.

CPU-DRAM Gap is no longer Major Bottleneck
[Figure: CPU and DRAM performance curves, labeled "50% per year"]
- Cache optimization only.
- Limited cache capacity would not hold the working sets of data-intensive applications.
- Caches are highly efficient, with little space for improvement.
- A cache miss latency has been reduced to 50-90 ns.
- Memory with a large capacity becomes a working place for fast data accesses.

Device Name          1980 (ns)     2000 (ns)    Improvement
CPU Cycle Time           1,000           1.6        625.00x
SRAM Access Time           300            20         15.00x
DRAM Access Time           625           100          6.25x
Disk Seek Time      87,000,000     8,000,000         10.87x

Limited by their mechanical components, disk performance is seriously lagging behind the CPU and memory. In 1980, one disk seek cost 87,000 cycles; in 2000, one disk seek costs over 5,000,000 cycles. Measured in CPU cycles, the disks of 2000 are more than 57 times "SLOWER" than their ancestors of 1980.

Data Communication in Computer Systems
[Figure: transfer from Source to Destination, split into Latency Time and Transfer Bandwidth Time]
Destination-perceived latency reduction is still limited due to the imbalanced improvement of bandwidth and latency.

Latency Lags Bandwidth (CACM, Patterson)
In the last 20 years:
- 100-2000X improvement in bandwidth
- 5-20X improvement in latency
- Between CPU and on-chip L2: bandwidth: 2250X increase; latency: 20X reduction
- Between L3 cache and DRAM: bandwidth: 125X increase; latency: 4X reduction
- Between DRAM and disk: bandwidth: 150X increase; latency: 8X reduction
- Between two nodes via a LAN: bandwidth: 100X increase; latency: 15X reduction

Top 10 High End Systems in Top-500 (6/06)
- Top 1: IBM Blue Gene/L: 280.6 TeraFlops (131,072 nodes)
  - Located in Lawrence Livermore National Lab, USA.
- Top 2: IBM Blue Gene/L: 91.29 TeraFlops (40,960 nodes)
  - Located in IBM T. J. Watson Research Center, USA.
- Top 4: SGI Columbia: 51.87 TeraFlops (10,160 nodes)
  - Located in NASA Ames Research Center, USA.
- Top 10: NEC Earth Simulator: 35.86 TeraFlops (5,120 nodes)
  - Located in Earth Simulator Center, Japan. (Rank in 02-04: 1; in 05: 4)

How is Resource Supply/Demand Balanced?
- Slow down CPU speed:
  - Earth Simulator: NEC AP, 500 MHz (4-way SU, a VU).
  - Blue Gene/L: IBM Power PC 440, 700 MHz.
  - Columbia: SGI Altix 3700 (Intel Itanium 2), 1.5 GHz (commodity processors; no choice over the high speed).
- Very low latency on-chip data accesses:
  - Earth Simulator: 128K L1 cache and 128 large registers.
  - Blue Gene/L: on-chip L3 cache (2 MB).
  - Columbia: on-chip L3 cache (6 MB).
- Fast accesses to huge and shared main memory:
  - Earth Simulator: crossbar switches between AP and memory.
  - Blue Gene/L: cached DRAM memory, and 3-D torus connection.
  - Columbia: SGI NUMALinks data block transfer time: 50 ns.
- Further latency reductions: prefetching and caching.

Computing Operations Versus Data Movement
- Computation is much cheaper than data m
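
The disk-seek figures quoted with the device latency table (87,000 cycles in 1980, over 5,000,000 cycles in 2000, more than 57 times slower) can be re-derived from the table itself. The following is a minimal sketch in Python, not part of the original slides; the names `LATENCY_NS`, `improvement`, and `in_cpu_cycles` are hypothetical helpers introduced only for illustration, and the numbers are the ones listed in the table.

```python
# Latencies in nanoseconds as (1980, 2000), copied from the slide's table.
# The dictionary and helper names are illustrative assumptions.
LATENCY_NS = {
    "CPU cycle time": (1_000.0, 1.6),
    "SRAM access time": (300.0, 20.0),
    "DRAM access time": (625.0, 100.0),
    "Disk seek time": (87_000_000.0, 8_000_000.0),
}


def improvement(old_ns, new_ns):
    """Absolute speedup: how many times shorter the latency became."""
    return old_ns / new_ns


def in_cpu_cycles(latency_ns, cycle_ns):
    """Express a latency as a count of CPU cycles of the same year."""
    return latency_ns / cycle_ns


cycle_1980, cycle_2000 = LATENCY_NS["CPU cycle time"]
seek_1980, seek_2000 = LATENCY_NS["Disk seek time"]

# One disk seek measured in CPU cycles of its own era.
seek_cycles_1980 = in_cpu_cycles(seek_1980, cycle_1980)
seek_cycles_2000 = in_cpu_cycles(seek_2000, cycle_2000)

# How much more expensive a seek became relative to the CPU.
relative_slowdown = seek_cycles_2000 / seek_cycles_1980

if __name__ == "__main__":
    for name, (old, new) in LATENCY_NS.items():
        print(f"{name}: {improvement(old, new):.2f}x faster in absolute time")
    print(f"Disk seek in 1980: {seek_cycles_1980:,.0f} CPU cycles")
    print(f"Disk seek in 2000: {seek_cycles_2000:,.0f} CPU cycles")
    print(f"Relative slowdown: {relative_slowdown:.1f}x")
```

The point of the sketch is that the disk improved roughly tenfold in absolute time while the CPU cycle shrank 625-fold, so the cost of a seek counted in cycles grew by the ratio of the two factors, which is a little over 57.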