Representing Data Elements

How to lay out data on disk
How to move it to memory
The principles are rather simple, but there are many variations in the details.

Main topics
- How are SQL data types represented as fields?
- How are tuples represented as records?
- How are collections of records or tuples represented in storage blocks?
- How is a relation represented and stored as a collection of blocks?
- What if different tuples have different record sizes?
- What if a record's size changes because of a modification?

Principles of Data Layout
- Attributes of relational tuples (or objects) are represented by sequences of bytes called fields.
- Fields are grouped together into records: the representation of tuples or objects.
- Records are stored in blocks.
- File: a collection of blocks that forms a relation (or the extent of an object class).

Overview
- Data Items
- Records
- Blocks
- Files
- Memory

What are the data items we want to store?
- a salary
- a name
- a date
- a picture

To represent:
- Integer (short): 2 bytes (roughly -32000 to +32000), e.g., 35 is 00000000 00100011. Arithmetic interpretation by hardware.
- Real, floating point: n bits for mantissa, m for exponent.
- Characters: various coding schemes suggested; most popular is ASCII.
  Example (8-bit ASCII): A: 01000001, a: 01100001, 5: 00110101, LF: 00001010
- Boolean, e.g., TRUE: 1111 1111, FALSE: 0000 0000
- Application specific, e.g., RED: 1, GREEN: 2, BLUE: 3, YELLOW: 4
- Dates, e.g.:
  - Integer: # days since Jan 1, 1900
  - 8 chars: YYYYMMDD
  - 7 chars: YYYYDDD
  - 10 chars: YYYY-MM-DD (SQL2)
  (not YYMMDD! Why?)
- Time, e.g.:
  - Integer: seconds since midnight
  - chars: HH:MM:SS.FF (SQL2)
- String of characters:
  - Null terminated
  - Length given (e.g., 3, then the characters)
  - Fixed length
- Bag of bits: Length | Bits

Record: collection of related fields
E.g., Employee record: name field, salary field, date-of-hire field, ...

Types of records
Main choices:
- FIXED vs VARIABLE FORMAT
- FIXED vs VARIABLE LENGTH

Fixed format
A SCHEMA (not the record) contains the following information:
- number of fields
- type of each field
- order in record
- meaning of each field

Example: fixed format and length
Employee record schema:
(1) E#, 2 byte integer
(2) E.name, 10 char.
(3) Dept, 2 byte code
Records:
  46 | Ford  | 02
  83 | Jones | 01

Variable format
The record itself contains its format: "self-describing".
Layout of a self-describing record: # fields; code identifying field as E#; integer type; code for Ename; string type; length of string.
Field name codes could also be strings, i.e. tags (XML as a data interchange format).

Variable format useful for:
- "sparse" records, e.g., patient records with thousands of possible tests
- repeating fields
- information integration from heterogeneous sources

EXAMPLE: variable format record with repeating fields
Employee with one or more children:
  3 | E_name: Fred | Child: Sally | Child: Tom

Variant between FIXED/VAR format
Hybrid format: one part is fixed, the other variable.

Many variations in internal organization of record
Just to show one pair:
- Length stored in front of each field:
  3 | 10, F1 | 5, F2 | 12, F3 | * * *
- Total size and offsets stored in a record header:
  3 | 32 | 5 | 15 | 20 | F1 | F2 | F3
  (3 fields; total size 32; offsets 5, 15, 20; byte positions 0 1 2 3 4 5 15 20 32)

Schema information
Schema information is mainly the information that appears in the CREATE TABLE statement:
- the attributes of the relation
- the attribute types
- the order in which attributes appear in a tuple
- constraints on attributes or on the relation itself

SQL Server data row structure
- Status bits A (bits 1-3: record type; bit 5: whether variable-length columns exist)
- Status bits B
- Length of the fixed-length portion
- Fixed-length data
- Number of columns
- NULL bitmap (1 bit per column)
- Number of variable-length columns
- Column offset array
- Variable-length column data

Next: placing records into blocks
A sequence of blocks forms a file.

Issues in storing records in blocks:
(1) separating records
(2) spanned vs. unspanned
(3) mixed record types: clustering
(4) split records
(5) sequencing
(6) addressing records

(1) Separating records
Block: R1 | R2 | R3
(a) fixed size recs.: no need to separate
(b) special marker
(c) give record lengths (or offsets)
  - within each record
  - in block header (see later)

(2) Spanned vs. Unspanned
- Unspanned: each record lies within one block (block 1, block 2, ...).
- Spanned: records may span block boundaries (block 1, block 2, ...).

Spanned vs. unspanned:
- Unspanned is much simpler, but may waste space.
- Spanned is necessary if record size > block size (e.g., fields containing large "BLOB"s for, say, MPEG video clips).

Example (of unspanned records)
- 10^6 records, each of size 2,050 bytes (fixed)
- block size = 4096 bytes
Only one record fits in each block, so the space used is about 4 x 10^9 B, about half of it wasted.

(3) Mixed record types
Mixed: records of different types (e.g. DEPT, EMPLOYEE) allowed in the same block.
E.g., a block: Dep d1 | Emp e1 | Emp e2

Why do we want to mix?
Answer: CLUSTERING
Records that are frequently accessed together should be in the same block.

Example
Q1: SELECT DEPT.Name, EMP.Name FROM DEPT, EMP WHERE DEPT.Name = EMP.DeptName
A block: DEPT Name=Toy, ... | EMP DeptName=Toy, ... | EMP DeptName=Toy, ...

If Q1 is frequent, clustering is good.
But consider Q2: SELECT * FROM DEPT
If Q2 is frequent, clustering is counterproductive.

(4) Split records
- Fixed part in one block
- Variable part in another block
Typically used for hybrid format.
Block with fixed recs.: R1 (a)
Blocks with variable recs.: R1 (b)

(5) Sequencing
Ordering records in file (and block) by some key value: sequential file (sequenced)
Typ
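The arithmetic behind the unspanned example (10^6 records of 2,050 bytes in 4096-byte blocks) can be checked with a short sketch. The function names `unspanned_usage` and `spanned_blocks` are hypothetical and do not come from the slides. The sketch also assumes that block headers and record headers take no space, which a real system would not get for free.

```python
def unspanned_usage(num_records, record_size, block_size):
    """Space used when no record may cross a block boundary.

    Assumption: block and record headers are ignored.
    """
    per_block = block_size // record_size  # whole records that fit in one block
    if per_block == 0:
        raise ValueError("record larger than block: spanned storage is required")
    blocks = -(-num_records // per_block)  # integer ceiling division
    used = blocks * block_size             # bytes occupied on disk
    wasted_fraction = 1 - (num_records * record_size) / used
    return per_block, blocks, used, wasted_fraction


def spanned_blocks(num_records, record_size, block_size):
    """Blocks needed if records may span block boundaries (headers ignored)."""
    return -(-(num_records * record_size) // block_size)


per_block, blocks, used, wasted = unspanned_usage(10**6, 2050, 4096)
print(per_block, blocks, used, round(wasted, 3))  # 1 1000000 4096000000 0.5
print(spanned_blocks(10**6, 2050, 4096))          # 500489
```

Under these assumptions the unspanned layout needs 10^6 blocks, while a spanned layout would need roughly half as many, at the cost of the extra bookkeeping for records that cross block boundaries.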