BIG DATA

Every minute:
  Didi rides hailed: 1,388 cabs and 2,777 private cars
  WeChat: 395,833 people log in; 194,444 people are video or audio chatting
  Youku Tudou: 625,000 videos are being watched
  Weibo: 64,814 posts and reposts
  Search: 4,166,667 search queries
  Alibaba: 774 people buy something on Alibaba's marketplaces; US$1,133,942 is spent

CONTENTS
  1 Definition
  2 Characteristic
  3 NoSQL
  4 RDBMS
  5 MapReduce
  6 Applications

1 Definition

BIG DATA
  volume of data
  important data
  on a day-to-day basis
  for better decisions

2 Characteristic

Volume: The quantity of generated and stored data.
Variety: The type and nature of the data.
Velocity: In this context, the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development.
Variability: Inconsistency of the data set can hamper processes to handle and manage it.
Veracity: The quality of captured data can vary greatly, affecting accurate analysis.

3 NoSQL

In this deck, NoSQL refers mainly to document-oriented databases (the term also covers key-value, column-oriented, and graph stores).
SQL databases do not scale well horizontally.
NoSQL is schemaless, but not formless (JSON format).
JSON: a data interchange format.
Examples: MongoDB, CouchDB.

3 NoSQL: BASE model

Basic Availability: Spread data across many storage systems with a high degree of replication.
Soft State: Data consistency is the developer's problem and should not be handled by the database.
Eventual Consistency: At some point in the future, data will converge to a consistent state.
No guarantees are made as to when.

3 NoSQL: JSON structure

    {
      field1: value1,
      field2: value2,
      ...
      fieldN: valueN
    }

    var mydoc = {
      _id: ObjectId("5099803df3f4948bd2f98391"),
      name: { first: "Alan", last: "Turing" },
      birth: new Date("Jun 23, 1912"),
      death: new Date("Jun 07, 1954"),
      contribs: [ "Turing machine", "Turing test" ],
      views: NumberLong(1250000)
    }

3 NoSQL: RDBMS vs NoSQL

Row DB (row ID: ID, last name, first name, salary):
    001:10,Smith,Joe,40000;
    002:12,Jones,Mary,50000;
    003:11,Johnson,Cathy,44000;
    004:22,Jones,Bob,55000;

Index on salary:
    001:40000;002:50000;003:44000;004:55000;

Column DB (value: row ID):
    10:001,12:002,11:003,22:004;
    Smith:001,Jones:002,Johnson:003,Jones:004;
    Joe:001,Mary:002,Cathy:003,Bob:004;
    40000:001,50000 ;

Last-name column with each repeated value stored once:
    Smith:001,Jones:002,004,Johnson:003;

3 NoSQL: Benefits

Column-oriented organizations are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data.

Column-oriented organizations are more efficient when new values of a column are supplied for all rows at once, because that column data can be written efficiently and replace old column data without touching any other columns for the rows.

Row-oriented organizations are more efficient when many columns of a single row are required at the same time, and when row size is relatively small, as the entire row can be retrieved with a single disk seek.

Row-oriented organizations are more efficient when writing a new row if all of the column data is supplied at the same time, as the entire row can be written with a single disk seek.

3 NoSQL: SQL vs NoSQL

A good compromise is to design your system with three logical databases:
1. A normal SQL database used by your admin application to create content.
2. A NoSQL database for the front-end, high-volume application used by the public internet.
3. A database for the analytical reporting system, built on cubes and similar tooling.
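As a rough illustration of this three-database split, the sketch below models each logical database as an in-memory Python structure. Every name in it (admin_db, client_db, report_db, create_content, publish, record_view, run_report_job) is hypothetical and chosen only for this example; a real system would use an actual SQL database, a document store, and a reporting warehouse.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for the three logical databases.
admin_db = {}          # SQL-style authoring store: content_id -> draft record
client_db = {}         # NoSQL-style public store: content_id -> document
report_db = Counter()  # reporting store: content_id -> total views


def create_content(content_id, title, body):
    """The admin application writes content to the admin database only."""
    admin_db[content_id] = {"title": title, "body": body}


def publish(content_id):
    """Publishing copies the content into the public-facing store."""
    record = admin_db[content_id]
    client_db[content_id] = {**record, "views": 0}


def record_view(content_id):
    """The public store serves reads and records the user interaction."""
    doc = client_db[content_id]
    doc["views"] += 1
    return doc["body"]


def run_report_job():
    """Scheduled job: pull interaction counts into the reporting store."""
    for content_id, doc in client_db.items():
        report_db[content_id] = doc["views"]


create_content("post-1", "Big data", "Volume, variety, velocity")
publish("post-1")
record_view("post-1")
record_view("post-1")
run_report_job()
print(report_db["post-1"])
```

Each store keeps the shape that suits its application: the admin record has no view counter, the public document carries one, and the reporting store holds only the aggregate.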
Data flows from the admin DB to the client NoSQL DB when someone publishes a piece of content. The client (NoSQL) DB provides very fast read access and records user interactions with the content. A scheduled job then pulls the data from the client DB into the reporting system. Since admin, client, and reporting are often separate apps, each application team can work with data in the format that best serves its application, and the transition from one system to the other is handled in the service layers.

4 RDBMS

Fixed-schema, row-oriented databases with ACID properties and a sophisticated SQL query engine.
The emphasis is on strong consistency, referential integrity, abstraction from the physical layer, and complex queries through the SQL language.
You can easily create secondary indexes, perform complex inner and outer joins, and count, sum, sort, group, and page your data across a number of tables, rows, and columns.

5 MapReduce

Dividing and conquering
Highly fault tolerant
Every data block replicated on 3 nodes
Difficult to implement

5 Comparison

                RDBMS                    MapReduce
    Data size   GB                       PB
    Access      Interactive and batch    Batch
    Updates     Read/write many times    Write once, read many times
    Structure   Static schema            Dynamic schema
    Integrity   High (ACID)              Low
    Scaling     Nonlinear                Linear
    DBA ratio   1:40                     1:3000

5 How does MapReduce work

Two steps: Map and Reduce.
Map: MapReduce works on key/value pairs (where an RDBMS traditionally works on rows and columns); the map function emits intermediate key/value pairs.
Reduce: All the intermediate values for a given output key are combined together into a list.
Reduce: The reduce function then combines the intermediate values into one or more final values for the same key.

6 Applications

6 Government
The use and adoption of big data within governmental processes is beneficial and allows efficiencies in terms of cost, productivity, and innovation, but does not come without its flaws.
Data analysis often requires multiple parts of government (central and local) to work in collaboration and create new and innovative processes to deliver the desired outcome. Below are the thought-leading examples within the governmental big data space.

6 Healthcare
Big data analytics has helped healthcare improve by providing personalized medicine and prescriptive analytics, clinical risk intervention and predictive analytics, waste and care variability reduction, automated external and internal reporting of patient data, standardized medical terms and patient registries, and fragmented point solutions.

6 Education
A McKinsey Global Institute study found a shortage of 1.5 million highly trained data professionals and managers. A number of universities, including University of Tennessee and UC Berkeley, have created master's programs to meet this demand. Private bootcamps have also developed programs to meet that demand, including free programs like The Data Incubator and paid programs like General Assembly.

6 Internet of Things
Big data and the IoT work in conjunction. From a media perspective, data is the key derivative of device inter-connectivity and allows accurate targeting. The Internet of Things, with the help of big data, therefore transforms the media industry, companies, and even governments, opening up a new era of economic growth and competitiveness. The intersection of people, data, and intelligent algorithms has far-reaching impacts on media efficiency. The wealth of data generated allows an elaborate layer on top of the industry's present targeting mechanisms.

6 Sports
Big data can be used to improve training and to understand competitors, using sport sensors. It is also possible to predict winners in a match using big data analytics. Future performance of players could be predicted as well.
Thus, players' value and salary are determined by data collected throughout the season.

THANKS

5 Comparison: data size units

    1 KB = 2^10 B  = 1024 B
    1 MB = 2^10 KB = 1024 KB
    1 GB = 2^10 MB = 1024 MB
    1 TB = 2^10 GB = 1024 GB
    1 PB = 2^10 TB = 1024 TB
    1 EB = 2^10 PB = 1024 PB
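
As a small illustration of the Map and Reduce steps from section 5, the sketch below runs a word count on a single machine in plain Python. It is not how a real MapReduce framework is invoked; the function names (map_phase, shuffle, reduce_phase) and the sample input are assumptions made for this example, and the distribution across nodes is left out entirely.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit one (word, 1) key/value pair per word in the input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Combine all intermediate values for each output key into a list."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the list of values for one key into a final value."""
    return key, sum(values)


lines = ["big data big decisions", "data every minute"]

# Run the map step over every input line and collect the intermediate pairs.
intermediate = [pair for line in lines for pair in map_phase(line)]

# Group by key, then reduce each group to a single count.
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts)
```

Because each call to map_phase depends only on its own line, and each call to reduce_phase only on its own key, both steps could run in parallel on different machines, which is the divide-and-conquer idea the deck refers to.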