资源预览内容
第1页 / 共35页
第2页 / 共35页
第3页 / 共35页
第4页 / 共35页
第5页 / 共35页
第6页 / 共35页
第7页 / 共35页
第8页 / 共35页
第9页 / 共35页
第10页 / 共35页
亲,该文档总共35页,到这儿已超出免费预览范围,如果喜欢就下载吧!
资源描述
BOCN 数据仓库 ETL 性能分析与调整方法2 *前言随着数据仓库 基础数据和应用的不断增加,数据仓库 的性能问 题日益突出。 目前超过1小时的作业有2个 30分钟到1小时的作业有10个 10分钟到30分钟的作业有48个因此对数据仓库进 行性能优化的需求越来越迫切3 *作业时长 分析 共有12个作业运行时间 在30分钟以上系统 作业运行时间 SDDT05_EVENT_FUND_TXN_EXP1:47:03 ALSALM_DEPOSIT_FIX_DATA1:05:00 BRSBRS_EXP_DATA0:51:52 MAST_AGMT_INFO_A0:47:01 SDDSRDFTPONE0:46:08 MAST_AGMT_INFO_B0:46:02 PCRPCR_EXPORT0:43:20 SDDSRDFTPTHREE0:43:07 MASOTHER0:39:33 SDDSRDFTPTWO0:38:43 OCM99_OCRM_JOB_END0:37:20 ALSALM_GROUP_INT_H0:33:194 *共有48个作业运行时间为 10分钟到30分钟系统 作业运行时 间系统作业运行时间 BAKBAK_ESSAPS_SRC0:29:14DQCT03_AGMT_AMT_H0:15:18 T03T03_AGREEMENT_RT0:28:20CCRGRDJKFEXPDATA0:15:03 PMMJXKH_C_LDM0:28:19DQCDQC_CHECK_KPI0:14:50 BAKBAK_ETLENV_710:27:34PMMJXKH_P_LDM_NET0:14:08 T03T03_CARD_ACCT_BAL_COMPO_H0:24:39ALSALM_LOAN_DATA0:13:57 T03T03_AGREEMENT_S020:24:19MAST_MA_BALANCE0:13:53 T03T03_AGMT_ACCU_H0:23:59CBST_CBS_BAL_DL0:13:18 MAST_AGMT_INFO_C0:22:49BAKBAK_ETLENV_400:12:58 T03T03_AGMT_BAL_H0:22:17MASFTPFILE0:12:32 PMMJXKH_P_LDM_SAV0:21:33ACMCUST_FIX_IND_POST0:12:31 CFST_CFS_AC_TREE0:21:31T05T05_CRDCARD_TXN_DTL0:12:05 SDDBCNTDEPO0:20:57ECLNB_CUST_TXN_SUMM0:12:03 DQCT03_CARD_ACCT_BAL_COMPO_H0:20:32ECLNB_NEW_CUST_SUMM0:11:58 MDLM03_FIN_ACCT0:19:37BDET03_ACCT_CRDCARD0:11:53 BDET03_AGREEMENT0:19:26IFST_IFS_BAL_BASIC0:11:20 IRPT_DEP_POT_INT_RATE0:17:24MDLM01_CUST_SIGN_INFO0:11:16 PCRCC_CR_ID_I0:17:08IRPT_LOAN_POT_INT_RATE0:11:07 SDDREPAYDET0:17:06RIRDEPOS_BAL_DATA0:11:07 T03T03_AGMT_BAL_H_S010:17:01T05T05_EVT_MESSAGE_SEND0:10:47 ACMCUST_BASIC_INFO_PRE0:16:51CRST_REP_GJYW_ZB0:10:46 CABCAB_EXP_DATA0:16:40MDLM03_CARD_TXN_STAT0:10:44 PMMJXKH_P_LDM_CARD0:16:37IRPT_IRP_FLUIDITY_BASIC0:10:37 MDLM03_FIN_ACCT_STAT0:16:20PCRCC_CR_BASE_I0:10:28 PFXT05_VCH_FINANCE_TMP0:15:31ACMTHIRD_PTY_MON_SUMM0:10:205 *月末超长作业列表 月末跑批:系统名作业名跑批时间(分钟/min)IRPT_IRP_SAME_BIZ_AVGBALSQL语句无法跑批IRPT_IRP_DEPOSIT_ACTIVITYSQL语句无法跑批IFST_IFS_AC_DLSQL语句无法跑批OCMOCRM_HIGH_CUST_INFOSQL语句无法跑批6 *ETL脚本性能问题与原因分析 业务规则理解 数据访问路径选取的合理性 SQL策略与算法 Multi-Statement 实现并行InsertSelect 比对算法不标准 加载策略不优-大数据量Update、 向非空表进行Insert 一个任务内多脚本,过多的数据库录入/录出 ACRM中居多 模型优化 PPI和PI的选择 对小表或代码表的部分字段进行Collect Statistics,优化sql的执行路径 实际并发运行环境的影响7 *Duplicate RowsA duplicate row is a row of a table whose column values are all identical to another row in the same table.col_a col_bcol_c2050A2550A2550ADuplicate RowsBecause a PK uniquely identifies each row, ideally a relational table should not have duplicate rows! The ANSI standard, however, permits duplicate rows for specialized situations, thus Teradata permits them as well. You may select whether your table will or will not allow them.* Note: If a UPI is selected on a SET table, the duplicate row check is replaced by a check for duplicate index values.CREATE SET TABLE table_A:CREATE MULTISET TABLE table_B:Checks for * and disallows duplicate rows.Doesnt check for and allows duplicate rows.The Teradata defaultThe ANSI default8 *Row Distribution Using a UPI Case 1Notes: Often, but not always, the PK column(s) will be used as a UPI. PI values for Order_Number are known to be unique (its a PK). Teradata will distribute different index values evenly across all AMPs. Resulting row distribution among AMPs is very uniform. Assures maximum efficiency for parallel operations.AMPAMPAMPAMPo_#c_#o_dto_st720224/09C741514/13Co_#c_#o_dto_st732524/13O710314/10O740234/16Co_#c_#o_dto_st718814/13C722524/15Co_#c_#o_dto_st732434/13O738414/12COrder9 *Row Distribution Using a NUPI Case 2Notes: Customer_Number may be the preferred access column for ORDER table, thus a good index candidate. Values for Customer_Number are somewhat non-unique. Choice of Customer_Number is therefore a NUPI. Rows with the same PI value distribute to the same AMP. Row distribution is less uniform or skewed.o_#c_#o_dto_st732524/13O720224/09C722524/15Co_#c_#o_dto_st738414/12C710314/10O741514/13C718814/13Co_#c_#o_dto_st740234/16C732434/13OAMPAMPAMPAMPOrder10 *Row Distribution Using a Highly Non-Unique Primary Index (NUPI) Case 3OrderNotes: Values for Order_Status are “highly” non- unique. Choice of Order_Status column is a NUPI. Only two values exist, so only two AMPs will ever be used for this table. Table will not perform well in parallel operations. Highly non-unique columns are poor PI choices generally. The degree of uniqueness is critical to efficiency.AMPAMPAMPAMPo_#c_#o_dto_st740234/16C720224/09C722524/15C741514/13C718814/13C738414/12Co_#c_#o_dto_st710314/10O732434/13O732524/13O11 *Join RedistributionSELECT. . . FROMTable1 T1 INNER JOINTable2 T2 ONT1.A = T2.A;Join columns are from the same domain.No Redistribution needed.T1 ABCPI100214433T2 ABCPI100725002SELECT. . . FROMTable3 T3 INNER JOINTable4 T4 ONT3.A = T4.B;Join columns are from the same domain.Redistribution needed.T3 ABCPI255345225T4 ABCPI867255566SPOOL ABCPI867255566Redistribute T4 rows in spool on column B.12 *Join Redistribution (cont.)Join is on columns that isnt the Primary Index of either
收藏 下载该资源
网站客服QQ:2055934822
金锄头文库版权所有
经营许可证:蜀ICP备13022795号 | 川公网安备 51140202000112号