Computer Organization and Architecture, 7th Edition, Chapter 12: Processor Structure and Function

William Stallings
Computer Organization and Architecture, 7th Edition
Chapter 12: CPU Structure and Function

Key Terms
- instruction cycle
- instruction pipeline
- instruction prefetch
- pipeline
- condition code
- flag
- branch prediction
- delayed branch
- program status word (PSW)

CPU Structure
- The CPU must:
  - Fetch instructions
  - Interpret instructions
  - Fetch data
  - Process data
  - Write data

[Figure: CPU with the system bus]

[Figure: CPU internal structure]

Registers
- The CPU must have some working space (temporary storage), called registers
- Number and function vary between processor designs
- One of the major design decisions
- Top level of the memory hierarchy

User Visible Registers
- General purpose
- Data
- Address
- Condition codes

General Purpose Registers (1)
- May be true general purpose
- May be restricted
- May be used for data or addressing
- Data: accumulator
- Addressing: segment

General Purpose Registers (2)
- Make them general purpose
  - Increases flexibility and programmer options
  - Increases instruction size and complexity
- Make them specialized
  - Smaller (faster) instructions
  - Less flexibility

How Many GP Registers?
- Between 8 and 32
- Fewer registers mean more memory references
- More registers do not noticeably reduce memory references further and take up processor real estate (chip area)
- See also RISC

How Big?
- Large enough to hold a full address
- Large enough to hold a full word
- Often possible to combine two data registers to hold a double-length value
- C programming: long long int a; (double length, e.g. a register pair on a 32-bit machine) versus long int a;

Condition Code Registers
- Sets of individual bits, e.g. "result of last operation was zero"
- Can be read (implicitly) by programs, e.g. "jump if zero"
- Cannot (usually) be set directly by programs

Control & Status Registers
- Program Counter (PC)
- Instruction Register (IR)
- Memory Address Register (MAR)
- Memory Buffer Register (MBR)
- Revision: what do these all do?
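The Condition Code Registers slide says flags record properties of the last result and are read implicitly by instructions such as "jump if zero". A minimal Python sketch, assuming a hypothetical 8-bit ALU (the function names `add_with_flags` and `jump_if_zero` are illustrative, not any real processor's definition), shows how an addition could set the zero (Z), sign (S), carry (C) and overflow (V) bits and how a conditional jump then reads Z.

```python
def add_with_flags(a: int, b: int, width: int = 8) -> tuple[int, dict[str, bool]]:
    """Add two width-bit values; return the truncated result and its condition codes."""
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    a, b = a & mask, b & mask
    full = a + b
    result = full & mask
    flags = {
        "Z": result == 0,                # result was zero
        "S": bool(result & sign_bit),    # sign of result (two's complement)
        "C": full > mask,                # carry out of the top bit
        # Overflow: operands share a sign but the result's sign differs.
        "V": bool(~(a ^ b) & (a ^ result) & sign_bit),
    }
    return result, flags


def jump_if_zero(flags: dict[str, bool], target: int, next_pc: int) -> int:
    """A conditional jump reads the Z flag implicitly; it does not set it."""
    return target if flags["Z"] else next_pc


result, flags = add_with_flags(0xFF, 0x01)   # -1 + 1 in 8-bit two's complement
print(hex(result), flags)   # 0x0 {'Z': True, 'S': False, 'C': True, 'V': False}
print(jump_if_zero(flags, target=100, next_pc=4))   # 100
```

Note that carry and overflow are independent: 0xFF + 0x01 carries out but does not overflow as a signed sum, while 0x7F + 0x01 overflows without a carry.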
Program Status Word (PSW)
- A set of bits
- Includes condition codes:
  - Sign of last result
  - Zero
  - Carry
  - Equal
  - Overflow
- Interrupt enable/disable
- Supervisor mode

Supervisor Mode
- Intel ring zero
- Kernel mode
- Allows privileged instructions to execute
- Used by the operating system
- Not available to user programs

Other Registers
- May have registers pointing to:
  - Process control blocks (see O/S)
  - Interrupt vectors (see O/S)
- N.B. CPU design and operating system design are closely linked

[Figure: example register organizations]

Instruction Cycle
- Revision: Stallings Chapter 3

Indirect Cycle
- May require memory access to fetch operands
- Indirect addressing requires more memory accesses
- Can be thought of as an additional instruction subcycle

[Figure: instruction cycle with indirect]

[Figure: instruction cycle state diagram]

Data Flow (Instruction Fetch)
- Depends on CPU design
- In general, fetch:
  - PC contains address of next instruction
  - Address moved to MAR
  - Address placed on address bus
  - Control unit requests memory read
  - Result placed on data bus, copied to MBR, then to IR
  - Meanwhile PC incremented by 1

Data Flow (Data Fetch)
- IR is examined
- If indirect addressing, the indirect cycle is performed:
  - Rightmost N bits of MBR transferred to MAR
  - Control unit requests memory read
  - Result (address of operand) moved to MBR

[Figure: data flow, fetch cycle]

[Figure: data flow, indirect cycle]

[Figure: data flow, indirect cycle (detailed, steps 1 to 8)]

Data Flow (Execute)
- May take many forms
- Depends on instruction being executed
- May include:
  - Memory read/write
  - Input/output
  - Register transfers
  - ALU operations

Data Flow (Interrupt)
- Simple
- Predictable
- Current PC saved to allow resumption after interrupt:
  - Contents of PC copied to MBR
  - Address of a special memory location (e.g. from the stack pointer) loaded into MAR
  - MBR written to memory
- PC loaded with address of interrupt handling routine
- Next instruction (first of interrupt handler) can be fetched

[Figure: data flow, interrupt cycle]

Prefetch
- Fetch accesses main memory
- Execution usually does not access main memory
- Can fetch next instruction during execution of current instruction
- Called instruction prefetch

Improved Performance
- But not doubled:
  - Fetch usually shorter than execution
    - Prefetch more than one instruction?
  - Any jump or branch means that prefetched instructions are not the required instructions
- Add more stages to improve performance

Pipelining
- Fetch instruction
- Decode instruction
- Calculate operand addresses (i.e. effective addresses, EAs)
- Fetch operands
- Execute instruction
- Write result
- Overlap these operations

[Figure: two-stage instruction pipeline]

[Figure: timing diagram for instruction pipeline operation]

[Figure: the effect of a conditional branch on instruction pipeline operation (branch penalty)]

[Figure: six-stage instruction pipeline]

[Figure: alternative pipeline depiction]

Speedup Factors with Instruction Pipelining
[Figure: speedup factors with instruction pipelining]
- k = 6, n = 100, Sk = ?

Dealing with Branches
- Multiple streams
- Prefetch branch target
- Loop buffer
- Branch prediction
- Delayed branching

Multiple Streams
- Have two pipelines
- Prefetch each branch into a separate pipeline
- Use appropriate pipeline
- Leads to bus and register contention
- Multiple branches lead to further pipelines being needed
- Example:
    ADD A,B
    JZ  100
    SUB A,B
    JNZ 200

Prefetch Branch Target
- Target of branch is prefetched in addition to instructions following branch
- Keep target until branch is executed
- Used by IBM 360/91

Loop Buffer
- Very fast memory
- Maintained by fetch stage of pipeline
- Check buffer before fetching from memory
- Very good for small loops or jumps
- cf. cache
- Used by CRAY-1

[Figure: loop buffer diagram]

Branch Prediction (1)
- Predict never taken
  - Assume that jump will not happen
  - Always fetch next instruction
  - 68020 and VAX 11/780
  - VAX will not prefetch after branch if a page fault would result (O/S vs CPU design)
- Predict always taken
  - Assume that jump will happen
  - Always fetch target instruction

Branch Prediction (2)
- Predict by opcode
  - Some instructions are more likely to result in a jump than others
  - Can get up to 75% success
- Taken/not taken switch
  - Based on previous history
  - Good for loops

Branch Prediction (3)
- Delayed branch
  - Do not take jump until you have to
  - Rearrange instructions

[Figure: branch prediction flowchart]

[Figure: branch prediction state diagram, states (1 1), (1 0), (0 0), (0 1)]

[Figure: dealing with branches; example code includes "100 DJNZ 120" and "130 If (Acc=0) goto 150"]

Intel 80486 Pipelining
- Fetch
  - From cache or external memory
  - Put in one of two 16-byte prefetch buffers
  - Fill buffer with new data as soon as old data consumed
  - Average 5 instructions fetched per load
  - Independent of other stages to keep buffers full
- Decode stage 1 (D1)
  - Opcode and address-mode info
  - At most first 3 bytes of instruction
  - Can direct D2 stage to get rest of instruction
- Decode stage 2 (D2)
  - Expand opcode into control signals
  - Computation of complex address modes
- Execute
  - ALU operations, cache access, register update
- Writeback
  - Update registers and flags
  - Results sent to cache and bus interface write buffers

[Figure: 80486 instruction pipeline examples]

[Figure: Pentium 4 registers]

[Figure: EFLAGS register]

[Figure: control registers]

MMX Register Mapping
- MMX uses several 64-bit data types
- Uses 3-bit register address fields, so 8 registers
- No MMX-specific registers
- Aliased to the lower 64 bits of the existing floating-point registers

[Figure: mapping of MMX registers to floating-point registers]

Pentium Interrupt Processing
- Interrupts
  - Maskable
  - Nonmaskable
- Exceptions
  - Processor detected
  - Programmed
- Interrupt vector table
  - Each interrupt type assigned a number
  - Number used as index to vector table
  - 256 × 32-bit interrupt vectors
- 5 priority classes

[Figure: PowerPC user visible registers]

[Figure: PowerPC register formats]

Foreground Reading
- Processor examples
- Stallings Chapter 12
- Manufacturer web sites and specs
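The Data Flow slides above describe the fetch, indirect and interrupt cycles as register transfers. The Python sketch below replays those transfers on a hypothetical word-addressed machine. The 16-bit instruction format (bit 15 as an indirect flag, the rightmost 12 bits as the address field), the class `CPU` and its method names are illustrative assumptions, not a real processor.

```python
from dataclasses import dataclass

ADDR_BITS = 12                      # assumed width N of the address field
ADDR_MASK = (1 << ADDR_BITS) - 1
INDIRECT_BIT = 1 << 15              # assumed indirect-addressing flag in the instruction


@dataclass
class CPU:
    """Hypothetical word-addressed CPU exposing only PC, MAR, MBR and IR."""
    memory: list[int]
    pc: int = 0
    mar: int = 0
    mbr: int = 0
    ir: int = 0

    def memory_read(self) -> None:
        # Control unit requests a read: the word at address MAR lands in MBR.
        self.mbr = self.memory[self.mar]

    def memory_write(self) -> None:
        # Control unit requests a write: MBR is stored at address MAR.
        self.memory[self.mar] = self.mbr

    def fetch_cycle(self) -> None:
        self.mar = self.pc          # PC holds the next instruction's address -> MAR
        self.memory_read()          # instruction arrives over the data bus into MBR
        self.ir = self.mbr          # MBR -> IR
        self.pc += 1                # meanwhile PC incremented by 1

    def indirect_cycle(self) -> None:
        self.mar = self.mbr & ADDR_MASK   # rightmost N bits of MBR -> MAR
        self.memory_read()                # MBR now holds the operand's address

    def operand_address(self) -> int:
        """Fetch one instruction and return its operand's effective address."""
        self.fetch_cycle()
        if self.ir & INDIRECT_BIT:        # IR is examined
            self.indirect_cycle()
        return self.mbr & ADDR_MASK

    def interrupt_cycle(self, save_addr: int, handler_addr: int) -> None:
        self.mbr = self.pc          # contents of PC copied to MBR
        self.mar = save_addr        # special location (e.g. from the stack pointer) -> MAR
        self.memory_write()         # MBR written to memory
        self.pc = handler_addr      # PC loaded with the interrupt handler's address


memory = [0] * 16
memory[0] = INDIRECT_BIT | 5    # indirect: the operand's address is stored at word 5
memory[1] = 7                   # direct: the operand is at word 7
memory[5] = 10
cpu = CPU(memory)
print(cpu.operand_address())    # 10
print(cpu.operand_address())    # 7
cpu.interrupt_cycle(save_addr=15, handler_addr=12)
print(memory[15], cpu.pc)       # 2 12
```

The indirect instruction costs one extra memory read compared with the direct one, which is the extra subcycle the Indirect Cycle slide refers to.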
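The Speedup Factors slide asks for Sk with k = 6 and n = 100. Under the usual idealized assumptions (every stage takes one cycle time τ, no branches or stalls), a k-stage pipeline finishes n instructions in Tk = [k + (n - 1)]τ, compared with T1 = nkτ without pipelining, so Sk = T1 / Tk = nk / (k + n - 1). A small Python sketch of that arithmetic:

```python
def pipelined_time(k: int, n: int, tau: float = 1.0) -> float:
    """Time for n instructions on an ideal k-stage pipeline: T_k = [k + (n - 1)] * tau."""
    return (k + (n - 1)) * tau


def speedup(k: int, n: int) -> float:
    """S_k = T_1 / T_k = n*k*tau / ([k + (n - 1)] * tau) = n*k / (k + n - 1)."""
    return (n * k) / (k + n - 1)


print(pipelined_time(6, 100))     # 105.0
print(round(speedup(6, 100), 2))  # 5.71
```

So the answer is Sk = 600 / 105 ≈ 5.71. As n grows, Sk approaches k (here 6); the conditional-branch penalty shown earlier pushes real pipelines below this bound.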
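The Loop Buffer slide describes a small, very fast memory that the fetch stage checks before going to main memory. The sketch below is a simplification under stated assumptions: it keeps the most recently fetched instructions in a FIFO keyed by address (a real loop buffer, such as the one on the CRAY-1, may be organized differently), and the class `LoopBuffer` and helper `run_loop` are hypothetical.

```python
from collections import OrderedDict


class LoopBuffer:
    """Small FIFO of recently fetched instructions, keyed by address (simplified)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: OrderedDict[int, int] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fetch(self, addr: int, memory: list[int]) -> int:
        if addr in self.entries:              # check buffer before fetching from memory
            self.hits += 1
            return self.entries[addr]
        self.misses += 1
        word = memory[addr]                   # slow main-memory access
        self.entries[addr] = word
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the oldest buffered instruction
        return word


def run_loop(buffer: LoopBuffer, memory: list[int], body: range, iterations: int) -> None:
    """Fetch every instruction of a loop body, repeated `iterations` times."""
    for _ in range(iterations):
        for addr in body:
            buffer.fetch(addr, memory)


memory = list(range(64))                     # stand-in instruction words
small_loop = LoopBuffer(capacity=8)
run_loop(small_loop, memory, range(10, 14), iterations=5)
print(small_loop.hits, small_loop.misses)    # 16 4
```

Only the first pass through the 4-instruction loop goes to main memory. With a buffer smaller than the loop body (e.g. capacity 2), this FIFO version evicts each instruction before it is needed again and every fetch misses, which is why the slide stresses small loops.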
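The "taken/not taken switch" and the branch prediction state diagram keep a little history per branch. As a hedged sketch, the Python below models one common variant, a 2-bit saturating counter, and compares it with a 1-bit "repeat the last outcome" predictor on a loop-closing branch. The slide's exact state encoding and transitions may differ from this counter, and the class and function names are illustrative.

```python
class OneBitPredictor:
    """Predict that a branch does what it did last time."""

    def __init__(self, taken: bool = True) -> None:
        self.last_taken = taken

    def predict(self) -> bool:
        return self.last_taken

    def update(self, taken: bool) -> None:
        self.last_taken = taken


class TwoBitPredictor:
    """2-bit saturating counter: states 3 and 2 predict taken, 1 and 0 not taken."""

    def __init__(self, state: int = 3) -> None:
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the actual outcome, saturating at 0 and 3, so a
        # single surprise does not flip a strongly held prediction.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)


def count_mispredictions(predictor, outcomes: list[bool]) -> int:
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop-closing branch: taken 9 times, then falls through; the loop runs 3 times.
loop_branch = ([True] * 9 + [False]) * 3
print(count_mispredictions(OneBitPredictor(), loop_branch))   # 5
print(count_mispredictions(TwoBitPredictor(), loop_branch))   # 3
```

The 2-bit scheme mispredicts only each loop exit, while the 1-bit scheme also mispredicts the first iteration every time the loop is re-entered, which is why history-based prediction is described as good for loops.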