Big Data Technology Conference

Apsara Cloud Platform (open platform)

About Aliyun
  China's largest cloud service provider
  Hundreds of thousands of customers
  Billions of accesses every day
  Providing foundation services of the cloud ecosystem
  Pay by usage, elasticity, safety (like "tap water")

The Nature of Cloud Computing
  Scale: large scale
  Economy: low cost
  Public utility: operated as a service
  Internet-scale computing
    2.5EB generated per day, doubling every 40 months
    Billions of transactions on Taobao every day, which must be processed in 6 hours
  Economy means more than low prices
    Leading to behavior changes (like "telephone")
    Key is scheduling (like "power grid")

Two Design Principles
  Large-scale general computing platform as the base
    One system supporting both offline and online services
    Multi-tenancy, resource sharing, load shifting
  Web-based API as the delivery mechanism
    Online activation, pay by usage
    Location transparency

[Figure: Apsara architecture. Labels recovered from the diagram:]
  Linux Cluster, IDC
  Resource Management (Fuxi, 伏羲)
  Security (Zhongkui, 钟馗)
  RPC (Kuafu, 夸父)
  Naming/Coordination (Nuwa, 女娲)
  Cluster Deployment (Dayu, 大禹)
  Cluster Monitor (Shennong, 神农)
  Distributed File System (Pangu, 盘古)
  Job Scheduling (Fuxi, 伏羲)
  ACE, OSS, OTS, ODPS, OSPS, ECS/SLB, RDS
  Map, Mail, Search, etc.
  Cloud Mart, Other Cloud Services

Cloud Computing Services
  Elastic Computing
    ECS: virtualized server instances that can be created and tailored to meet application requirements
    SLB: software load balancing that can elastically expand service capacity on demand
    ACE: convenient and efficient execution environment for Web services, supporting Java, PHP, Node.js
  Storage and Databases (mass storage and databases)
    OSS: large-scale object storage service for unstructured data such as photos, music, or video
    OTS: large-scale storage service for structured or semi-structured data, with real-time query
    RDS: managed relational database instances with automatic backup and failover
  Large-scale Data Computing
    ODPS: large-scale batch data processing and computation, supporting SQL and MapReduce-style programming
    OSPS: stream data processing service, supporting a SQL-like query language and automatic failure recovery

A Comparison of Storage and Database Services
  [Table: not recoverable from the extracted text]

Apsara Technical Highlights
  A common platform supporting both offline and online services
    Search: 24B pages processed, 13B online index
    Mail: 100M mails received, 10M mails sent, 10ms latency
  Capability-based security management framework, enforcing the Principle of Least Privilege
  Distributed deployment, monitoring, and diagnostics
  Zero SPOF (single point of failure): availability 99.9%
  All data has 3 replicas: data reliability 99.99999999%

5K
  2013/08/15: First-ever 5000-node Apsara cluster (ODPS) went into production
    100K CPU cores, 100PB raw storage
    Processing petabytes per day
  2013/09/24: Opened access to ODPS for 4 universities and research institutions
  Sorting 100TB in 30 minutes
    Current known record: 72 minutes (Yahoo!, 2013/07/03)

Pangu: Large-scale Distributed File System
  Master-slave architecture
    Master for metadata management, slave (Chunk Server) for IO management
  Paxos-based multi-master architecture, failure recovery time 1 minute
  End-to-end inline checksum
  Scales to 1 billion files
  [Diagram labels: chunk servers (CS)]

Separated IO Pipeline and Storage Management
  Adaptive IO pipeline
    Replication master: chunk server vs. client
    Replication policy: chaining vs. star replication
    Chunking policy: fixed, variable, or RAID
    Durability guarantee: transaction logging vs. sequential write
  Common storage management
    Physical IO management
    Priority and QoS
    Background re-replication
    Chunk placement

Staged Event-driven Physical IO Management
  The Chunk Server reorders IO requests to support priorities and QoS and to reduce IO seek overhead

Distributed Re-replication
  Typical: mirroring (10 hours)
  Pangu: distributed re-replication (20 min, 50 nodes)
  [Diagram labels: 1TB disks]
  Intelligent scheduling
  Balanced storage
  Bandwidth throttling
  Minimizing data loss

RAID
  Built into the core system instead of an add-on layer (as in HDFS RAID)
  Better management of data integrity, recovery, and chunk placement
  Synchronous redundancy block generation
  Low-latency failure recovery
  Small file support

[Figure: Fuxi architecture. Components: Client, Fuxi Master (more than one), App Master, App Worker, Tubo. Interactions: job submission, resource requests, job control, node control.]

Fuxi Resource Scheduling
  Multi-dimensional resources
  Elastic quota
  CGroup-based isolation
  Fuxi Master HA
  App Master failover
  Incremental scheduling

Fuxi Job Programming Model
  Job: a DAG
  Vertex: a task
    Each task may have multiple instances, based on input data chunks
  Edge: a data flow
    Each task may have multiple input/output flows
  A data flow connecting two tasks represents data shuffling

Example: Find Best-Sellers
  SELECT prod_id, SUM(count) AS quantity FROM orders GROUP BY prod_id ORDER BY quantity DESC;
  [Diagram label: orders (input table)]
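
The job model described for Fuxi (a job is a DAG, each task runs one instance per input chunk, and an edge between two tasks is a shuffle) can be illustrated with the best-sellers query. The following is a minimal single-process Python sketch, not Fuxi's actual API: the function names, the three-task split (partial aggregation, final aggregation, sort), and the hash partitioning are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_chunk(chunk, num_partitions):
    """Task 1 (hypothetical): one instance per input chunk of `orders`.

    Sums counts per prod_id within the chunk, then splits the partial
    sums into partitions so that each prod_id always maps to the same one.
    """
    partial = defaultdict(int)
    for prod_id, count in chunk:
        partial[prod_id] += count
    partitions = [dict() for _ in range(num_partitions)]
    for prod_id, quantity in partial.items():
        partitions[hash(prod_id) % num_partitions][prod_id] = quantity
    return partitions


def merge_partition(partials):
    """Task 2 (hypothetical): one instance per partition, final SUM per prod_id."""
    totals = defaultdict(int)
    for partial in partials:
        for prod_id, quantity in partial.items():
            totals[prod_id] += quantity
    return dict(totals)


def best_sellers(chunks, num_partitions=2):
    """Runs the three-task DAG in-process and returns (prod_id, quantity) rows."""
    # Vertex 1: one instance per input chunk.
    task1_out = [aggregate_chunk(chunk, num_partitions) for chunk in chunks]
    # Edge 1 (shuffle): partition p of every task-1 output goes to task-2 instance p.
    task2_out = [
        merge_partition([out[p] for out in task1_out])
        for p in range(num_partitions)
    ]
    # Edge 2 + vertex 3: gather all partitions and ORDER BY quantity DESC.
    rows = [(pid, qty) for part in task2_out for pid, qty in part.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Two chunks of (prod_id, count) rows standing in for the orders table.
    orders_chunks = [[(1, 2), (2, 5)], [(1, 4), (3, 1)]]
    print(best_sellers(orders_chunks))  # [(1, 6), (2, 5), (3, 1)]
```

In a real cluster each instance would run on a different node and the shuffle would move data over the network; the sketch only shows how the GROUP BY and ORDER BY map onto tasks and edges.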