资源预览内容
第1页 / 共68页
第2页 / 共68页
第3页 / 共68页
第4页 / 共68页
第5页 / 共68页
第6页 / 共68页
第7页 / 共68页
第8页 / 共68页
第9页 / 共68页
第10页 / 共68页
亲,该文档总共68页,到这儿已超出免费预览范围,如果喜欢就下载吧!
资源描述
Google Cluster Computing Faculty Training Workshop Module VII: Other Google TechnologiesOverview BigTable ChubbyBigTableA Conventional Database Data structure: arbitrary # of rows Fixed number and type of columns Supports search based on values in all cells Supports synthesis of output reports based on multiple tables (relational operators)Googles Needs Data reliability High speed retrieval Storage of huge numbers of records (several TB of data) (Multiple) past versions of records should be availableAssumptions Many times more reads than writes Individual component failures common Disks are cheap If they control database design as well as application design, the interface need not be standardReasonable Questions Are structured queries necessary? Can data be organized such that related data is physically close by nature? What is the minimum coordination required to retrieve data? Can existing components be leveraged to provide reliability and abstraction?From Needs to Constraints Simplified data retrieval mechanism (row, col, timestamp) value lookup, only No relational operators Atomic updates only possible at row levelBut Some Additional Flexibility Arbitrary number of columns per row Arbitrary data type for each column New constraint: data validation must be performed by application layer!Logical Data Representation Rows value lookup involves only one disk seek to disk blockPhysical Representation (2) A logical “table” is divided into multiple tablets Each tablet is one or more SSTable files Each tablet stores an interval of table rows If a tablet grows beyond a certain size, it is split into two new tabletsNetwork Interface One master server Communicates only with tablet servers Several tablet servers Perform actual client accesses “Chubby” lock server provides coordination and mutual exclusion GFS servers provide underlying storageBigtable ArchitectureMaster Responsibilities Determine which tablet server should hold a given (new) tablet Interface with GFS to garbage collect stale SSTable files Detect tablet server failures/resumption and load balance accordinglyTablet Server FailureTablet Server FailureTablet Server FailureTable Access StructureWrite Procedure Writes to a tablet are recorded in a GFS- enabled commit log New data is then stored in memory on tablet server supercedes underlying SSTable filesMinor Compactions Old data is stored in SSTable files Newer values are stored in memory in a memtable When a memtable exceeds a certain size, it is converted to an SSTable and written to disk Thus a tablet may be multiple SSTables underneath!Merging Compactions Multiple SSTable files are now involved in a single lookup operation slow! Merging compactions read multiple SSTables and create a new SSTable containing the most recent data Old SSTable files are discarded If only one SSTable remains for a tablet, called a major compactionCommit Logs stored in same SSTable Fast compression algorithms conserve space in SSTable by compressing related data Bloom filters If multiple SSTables comprise a tablet, bloom filters allow quick discarding of irrelevant SSTables from a lookup operationPerformanceConclusions Simple data schemas work Provided you design clients ground-up for this ahead of time Layered application building simplifies protocols other replicas can read this file to discover the chosen master name. Chubby doubles as a name serviceDistributed Consensus Chubby cell is usually 5 replicas 3 must be alive for cell to be viable How do replicas in Chubby agree on their own master, official lock values? PAXOS algorithmPAXOS Paxos is a family of algorithms (by Leslie Lamport) designed to provide distributed consensus in a network of several processors.Processor Assumptions Operate at arbitrary speed Independent, random failures Procs with stable storage may rejoin protocol after failure Do not lie, collude, or attempt to maliciously subvert the protocolNetwork Assumptions All processors can communicate with (“see”) one another Messages are sent asynchronously and may take arbitrarily long to deliver Order of messages is not guaranteed: they may be lost, reordered, or duplicated Messages, if delivered, are not corrupted in the processA Fault Tolerant Memory of Facts Paxos provides a memory for individual “facts” in the network. A fact is a binding from a variable to a value. Paxos between 2F+1 processors is reliable and can make progress if up to F of them fail.Roles Proposer An agent that proposes a fact Leader the authoritative proposer Acceptor holds agreed-upon facts in its memory Learner May retrieve a fact from the systemSafety Guarantees Nontriviality: Only proposed values can be learned Consistency: Only at most one value can be learned Liveness: If at least one value V has been propos
收藏 下载该资源
网站客服QQ:2055934822
金锄头文库版权所有
经营许可证:蜀ICP备13022795号 | 川公网安备 51140202000112号