Recent Developments in Data Warehousing
Hugh J. Watson
Terry College of Business
University of Georgia
hwatson@terry.uga.edu
http://www.terry.uga.edu/hwatson/dw_tutorial.ppt

Tutorial Objectives
- Provide an overview of data warehousing
- Provide materials to support the teaching of data warehousing
- Discuss recent developments in data warehousing

The Importance of Data Warehousing
- Provide a "single version of the truth"
- Improve decision making
- Support key corporate initiatives such as performance management, B2C and B2B e-commerce, and customer relationship management
- Estimated to be a $113.5 billion market in 2002 for systems, software, services, and in-house expenditures (Palo Alto Management Group)

Data Warehouse Characteristics
- Subject oriented: data are organized around sales, products, etc.
- Integrated: data are integrated to provide a comprehensive view
- Time variant: historical data are maintained
- Nonvolatile: data are not updated by users

Topics Covered
- Definitions and concepts
- Two case studies: Harrah's Entertainment (first) and Owens […]

[…] street number and street name; and city and state.

Correcting
- Corrects parsed individual data components using sophisticated data algorithms and secondary data sources.
- Examples include replacing a vanity address and adding a zip code.

Standardizing
- Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules.
- Examples include adding a prename, replacing a nickname, and using a preferred street name.
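The parsing, correcting, and standardizing steps can be illustrated with a minimal Python sketch. It assumes a one-line "NUMBER STREET, CITY, STATE" address layout, and the lookup tables (`NICKNAMES`, `STREET_SUFFIXES`, `ZIP_BY_CITY_STATE`) are hypothetical stand-ins for the business rules and secondary data sources a real cleansing tool would use.

```python
# Hypothetical rule tables standing in for "standard and custom business rules".
NICKNAMES = {"bob": "Robert", "bill": "William", "liz": "Elizabeth"}
STREET_SUFFIXES = {"st": "Street", "ave": "Avenue", "rd": "Road", "blvd": "Boulevard"}
# Hypothetical secondary data source used by the correcting step.
ZIP_BY_CITY_STATE = {("Athens", "GA"): "30601"}


def parse_address(line: str) -> dict:
    """Parsing: split 'NUMBER STREET, CITY, STATE' into individual components."""
    street_part, city, state = (part.strip() for part in line.split(","))
    number, _, street_name = street_part.partition(" ")
    return {"street_number": number, "street_name": street_name,
            "city": city, "state": state}


def standardize_first_name(name: str) -> str:
    """Standardizing: replace a known nickname with the preferred given name."""
    cleaned = name.strip()
    return NICKNAMES.get(cleaned.lower(), cleaned.capitalize())


def standardize_street_name(street_name: str) -> str:
    """Standardizing: expand an abbreviated suffix and normalize capitalization."""
    words = [word.capitalize() for word in street_name.split()]
    if words:
        suffix = words[-1].lower().rstrip(".")
        words[-1] = STREET_SUFFIXES.get(suffix, words[-1])
    return " ".join(words)


def cleanse_record(first_name: str, address: str) -> dict:
    """Run parse -> standardize -> correct on one customer record."""
    record = parse_address(address)
    record["first_name"] = standardize_first_name(first_name)
    record["street_name"] = standardize_street_name(record["street_name"])
    record["city"] = record["city"].title()
    record["state"] = record["state"].upper()
    # Correcting: fill in a missing ZIP code from the secondary source.
    record["zip"] = ZIP_BY_CITY_STATE.get((record["city"], record["state"]), "")
    return record


print(cleanse_record("bob", "12 main st., athens, ga"))
```

In this sketch the ZIP lookup runs after city and state are normalized, because the secondary source is keyed on the standardized values.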
Matching
- Searches and matches records within and across the parsed, corrected, and standardized data, based on predefined business rules, to eliminate duplicates.
- Examples include identifying similar names and addresses.

Consolidating
- Analyzes and identifies relationships between matched records and consolidates/merges them into ONE representation.

Data Staging
- Often used as an interim step between data extraction and later steps
- Accumulates data from asynchronous sources using native interfaces, flat files, FTP sessions, or other processes
- At a predefined cutoff time, data in the staging file are transformed and loaded into the warehouse
- There is usually no end-user access to the staging file
- An operational data store may be used for data staging

Data Transformation
- Transforms the data in accordance with the business rules and standards that have been established
- Examples include format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates

Data Loading
- Data are physically moved to the data warehouse
- The loading takes place within a "load window"
- The trend is toward near-real-time updates of the data warehouse as the warehouse is increasingly used for operational applications

Meta Data
- Data about data
- Needed by both information technology personnel and users
- IT personnel need to know data sources and targets; database, table, and column names; refresh schedules; data usage measures; etc.
- Users need to know entity/attribute definitions, available report/query tools, report distribution information, help desk contact information, etc.
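Matching and consolidating (the deduplication also listed under data transformation) can be sketched with Python's standard-library `difflib`. The string-similarity measure, the 0.85 threshold, and the same-ZIP rule are hypothetical stand-ins for the "predefined business rules"; commercial cleansing tools use more sophisticated matching.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score for two strings (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_match(r1: dict, r2: dict, threshold: float = 0.85) -> bool:
    """Hypothetical business rule: same ZIP plus similar name and street."""
    return (r1["zip"] == r2["zip"]
            and similarity(r1["name"], r2["name"]) >= threshold
            and similarity(r1["street"], r2["street"]) >= threshold)


def consolidate(group: list) -> dict:
    """Merge matched records into ONE representation.

    Each field keeps the first non-empty value seen in the group.
    """
    merged: dict = {}
    for record in group:
        for field, value in record.items():
            if not merged.get(field):
                merged[field] = value
    return merged


def match_and_consolidate(records: list, threshold: float = 0.85) -> list:
    """Group records that match the first record of a group, then merge each group."""
    groups: list = []
    for record in records:
        for group in groups:
            if is_match(group[0], record, threshold):
                group.append(record)
                break
        else:
            groups.append([record])
    return [consolidate(group) for group in groups]


customers = [
    {"name": "Robert Smith", "street": "12 Main Street", "zip": "30601", "phone": ""},
    {"name": "Robert Smyth", "street": "12 Main Street", "zip": "30601", "phone": "555-0100"},
    {"name": "Mary Jones", "street": "4 Oak Avenue", "zip": "30601", "phone": ""},
]
for customer in match_and_consolidate(customers):
    print(customer)
```

Comparing every record against every group grows quickly with data volume; matching tools commonly restrict comparisons to records that share a blocking key (such as ZIP code) before scoring similarity.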
Recent Development: Meta Data Integration
- A growing realization that meta data is critical to data warehousing success
- Progress is being made on getting vendors to agree on standards and to support the sharing of meta data among their tools
- Vendors like Microsoft, Computer Associates, and Oracle have entered the meta data marketplace with significant product offerings

Database Vendors
- High-end (i.e., terabyte-plus) vendors include IBM (DB2) and NCR-Teradata (Teradata)
- Oracle (8i) and Microsoft (SQL Server 7) are major players for smaller databases

On-line Analytical Processing (OLAP)
- A set of functionality that facilitates multidimensional analysis
- Allows users to analyze data in ways that are natural to them
- Comes in many varieties: ROLAP, MOLAP, DOLAP, etc.

ROLAP
- Relational OLAP
- Uses an RDBMS to implement an OLAP environment
- Typically involves a star schema to provide the multidimensional capabilities
- The OLAP tool manipulates RDBMS star schema data
- Called "SLOWLAP" by MOLAP vendors

MOLAP
- Multidimensional OLAP
- Uses an MDDBS (e.g., Essbase) to store and access data
- Usually requires proprietary (non-SQL) data access tools
- Provides exceptionally fast response times

Star Schema
- Creates denormalized data structures
- Easier for users to understand
- Optimized for OLAP
- Uses fact tables (the facts or measures of the business) and dimension tables (which establish the context of the facts)

OLAP Tools
- Products come from vendors such as Brio, Cognos, Hyperion, and BusinessObjects
- Typically available as a fat or thin (i.e., browser) client
- In a web environment, the browser communicates with a web server, which talks to an application server, which connects to back-end databases
- The application server provides query, reporting, and OLAP analysis functionality ove […]
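A ROLAP tool ultimately issues SQL against a star schema. The sketch below uses Python's built-in `sqlite3` as a stand-in RDBMS; the table names (`sales_fact`, `product_dim`, `date_dim`), columns, and rows are hypothetical illustrations, not a schema from the tutorial. The fact table holds the measures, and the dimension tables supply the context used for grouping.

```python
import sqlite3


def build_star_schema(conn: sqlite3.Connection) -> None:
    """Create one fact table and two dimension tables with illustrative rows."""
    conn.executescript("""
        CREATE TABLE product_dim (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT,
            category     TEXT
        );
        CREATE TABLE date_dim (
            date_key  INTEGER PRIMARY KEY,
            full_date TEXT,
            month     INTEGER,
            year      INTEGER
        );
        CREATE TABLE sales_fact (
            product_key  INTEGER REFERENCES product_dim(product_key),
            date_key     INTEGER REFERENCES date_dim(date_key),
            units_sold   INTEGER,
            sales_amount REAL
        );
    """)
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)", [
        (1, "Widget", "Hardware"),
        (2, "Gadget", "Hardware"),
        (3, "Manual", "Books"),
    ])
    conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?, ?)", [
        (20020115, "2002-01-15", 1, 2002),
        (20020210, "2002-02-10", 2, 2002),
    ])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
        (1, 20020115, 10, 100.0),
        (2, 20020115, 5, 75.0),
        (3, 20020210, 2, 40.0),
        (1, 20020210, 3, 30.0),
    ])


def sales_by_category_and_month(conn: sqlite3.Connection) -> list:
    """A typical ROLAP query: join facts to dimensions, then aggregate."""
    return conn.execute("""
        SELECT p.category, d.year, d.month, SUM(f.sales_amount) AS total_sales
        FROM sales_fact AS f
        JOIN product_dim AS p ON f.product_key = p.product_key
        JOIN date_dim AS d ON f.date_key = d.date_key
        GROUP BY p.category, d.year, d.month
        ORDER BY p.category, d.year, d.month
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
for row in sales_by_category_and_month(conn):
    print(row)
```

Slicing by another dimension (for example, product name instead of category) only changes the `SELECT` and `GROUP BY` columns; the fact table stays the same, which is part of what makes the star schema easy for users and OLAP tools to navigate.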