8/23/2024  BUPT-AI&DM
AI & Data Mining (人工智能与数据挖掘) teaching slides, lect-4-13

Chapter 3 Basic Data Mining Techniques
3.2 Generating Association Rules

1. What Is Association Mining?

Applications: cross-sell, catalog design, store layout, promotion design, etc. Also called (market) basket analysis. Note the difference between association and classification.

Examples:
- If customers purchase milk, they also purchase bread.
- If customers purchase bread, they also purchase milk.

Rule confidence (可信度, 置信度): given a rule of the form "If A then B", rule confidence is the conditional probability that B is true when A is known to be true. It indicates the certainty of the association rule.

Rule support (支持度; coverage, 覆盖度): the minimum percentage of instances in the database that contain all items listed in a given association rule. It indicates the usefulness of the association rule.
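The two measures defined above can be computed directly from a list of transactions. A minimal sketch (the transaction data below is made up purely for illustration):

```python
from itertools import combinations

# Toy transaction database (hypothetical data, for illustration only)
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "diapers", "beer"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent) = support(A and B) / support(A)."""
    return (support(set(antecedent) | set(consequent), transactions)
            / support(antecedent, transactions))

# "If customers purchase milk, they also purchase bread"
print(support({"milk", "bread"}, transactions))      # 0.5
print(confidence({"milk"}, {"bread"}, transactions)) # 2/3
```

Note that confidence is asymmetric: confidence(milk → bread) and confidence(bread → milk) generally differ, even though the support of the rule is the same in both directions.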
Typical rule notation:
- buys(x, "diapers") → buys(x, "beers")  [0.5%, 60%]  (support, confidence)
- major(x, "CS") ∧ takes(x, "DB") → grade(x, "A")  [1%, 75%]

Cross-sell & Up-sell

Cross-sell is a marketing term for the practice of suggesting related products or services to a customer who is considering buying something. If you're buying a book on Amazon.com, for example, you may be shown a list of books similar to the one you've chosen, or books purchased by other customers who bought the same book you did. A search on a company's Web site for bed linens might also bring up listings of matching draperies. The most ubiquitous example of cross-sell is likely the oft-spoken fast-food phrase: "Would you like fries with that?" (Cross-sell: recommending additional items beyond what the customer came to buy.)

Up-sell is a marketing term for the practice of suggesting higher-priced products or services to a customer who is considering a purchase. An up-sell offer is typically for a better version of the same product or service you are considering, such as a gym membership with more privileges, or a faster computer. The most ubiquitous example of up-sell is likely the oft-spoken (if ungrammatical) fast-food phrase: "Would you like to biggie-size that?" (Up-sell: recommending a higher-priced product or service to a customer who is considering a purchase.)

Why Cross-sell?

First, it strengthens customer loyalty. The more of a company's products and services a customer buys, the less likely the customer is to leave. Data from banks show that customers holding two products churn at a rate of 55%, while customers holding four or more products or services churn at a rate of almost 0.

Second, cross-selling also increases profit. Practice has shown that the cost of selling another product or service to an existing customer is far lower than the cost of acquiring a new customer. Data from credit card companies show that, on average, a credit card customer does not become profitable until the third year. Acquiring new customers is therefore very expensive, and cross-selling to existing customers has naturally become a shortcut for many companies to increase their return on investment.

How to Find the Products

There are currently two approaches: business intuition and data mining.

Sometimes business intuition tells a company which products to cross-sell. For example, a home loan is the natural next product to offer a mortgage customer. Likewise, if a company has just launched a strategically important product, that product itself is a good cross-sell candidate.

Business intuition is indeed a fast way to identify cross-sell products. Relying on intuition alone, however, may miss many opportunities, because in some cases good cross-sell products are not intuitively obvious. To find those latent cross-sell opportunities, the best available tool is data mining.
Association analysis is a data mining method that finds correlations between products in historical data, and can thus suggest the most appropriate cross-sell products or services. The results of association analysis, however, must be vetted against business knowledge to confirm their accuracy and value. In practice, therefore, business intuition and data mining are usually combined to determine suitable cross-sell products.

How to Find the Customers

The first method is association analysis: understanding which products are purchased together, or one after another, in order to provide valuable suggestions for bundled or cross-selling.

Association analysis originated in retail; its classic example is the story of beer and diapers. (By analyzing transaction data, data miners found that beer and diapers were bought together with high frequency. Further investigation revealed that fathers of newborns, when buying beer for themselves, would often also buy diapers for their babies.) Acting on such findings, supermarket staff can rearrange product placement to induce more purchases. Similar data mining techniques are now widely used in many banks abroad to market different products and services to existing customers.

The second method is to apply a classification model to predict, for every customer, the probability of buying a specified product, and thus to find out who is most likely to buy it.

Each data mining approach has its strengths; which one is best must be decided from the actual application and the model results.

What is "cross-sell"? A story illustrates it.

The general manager of a company was astonished to find that one of his employees had sold $300,000 of goods in a single day, so he went to ask how.

"Well," the salesman said, "a man came in to buy something. First I sold him a small fish hook. Then I told him a small hook would never catch a big fish, so he bought a large hook. Then I reminded him that a medium-sized fish would get away from a hook that was either too large or too small, so he bought a medium hook as well. Next I sold him small, medium, and finally heavy fishing line.

"Then I asked him where he was going to fish, and he said by the sea. I suggested he buy a boat, so I took him to the boat counter and sold him a twenty-foot schooner with twin engines. He said his car might not be able to tow such a big boat, so I took him to the automotive section and sold him a new luxury Toyota Land Cruiser."

The manager stepped back and asked, almost in disbelief: "A customer came in just to buy a fish hook, and you sold him all that?" "No," the salesman replied, "he came in to buy a needle for his wife. I just told him: your weekend is ruined anyway, so why not go fishing?"
Dell's up-sell is exactly this pattern: the customer comes in for a needle and leaves with a pile of new things.

For example, a customer calls intending to buy only an ordinary laptop. The Dell salesperson asks, with apparent concern, what it will be used for. When the customer says he is a consultant, the salesperson suggests a model with a wireless network card, for mobile work; with a CD burner, for backing up files; plus a spare battery, for use on airplanes. Finally, once the customer has decided to buy, the salesperson asks whether he would also like a leather bag with the Dell logo.

Predictably, the customer ends up spending far beyond his budget, yet is very satisfied with his computer. Some of the features are still almost never used, but the wireless card and the CD burner have genuinely proved their worth.

Dell's up-sell is clearly a business model, one that profits by fully tapping customer potential. But what is the essence of profit? Profit is the fair return on customer value. Why do people generally respect profitable companies? Because profit represents a company's ability to discover and exploit customer value. That McDonald's could build a simple hamburger business, and Coca-Cola a plain carbonated-drink business, into Fortune 500 companies has no explanation other than first-class ability to mine and operate on customer value.

Only from this angle can we truly understand the essence of Dell's up-sell model. Under the IBM and HP distribution model, the computer delivers functional value to the customer: computation, word processing, office automation, and other powerful capabilities. Under Dell's one-to-one up-sell model, the computer delivers customer value itself: different users have different needs, and what the Dell salesperson does is simply understand and grasp each user's individual value, so that those differing needs can be drawn out. That is capability; that is profit!
Personalized Recommendation

Rule Measures: Support and Confidence

Find all rules X ∧ Y → Z with minimum confidence and support:
- support, s: the probability that a transaction contains X ∪ Y ∪ Z
- confidence, c: the conditional probability that a transaction having X ∪ Y also contains Z

(The original slide shows a Venn diagram: customers who buy diapers, customers who buy beer, and customers who buy both.)

With minimum support 50% and minimum confidence 50%, we have:
- A → C  (50%, 66.6%)
- C → A  (50%, 100%)

2. Mining Association Rules: An Example (in a transactional database)

For rule A → C:
- support = support(A ∪ C) = 50%
- confidence = support(A ∪ C) / support(A) = 66.6%

The Apriori principle (Agrawal, 1993): any subset of a frequent itemset (频繁项集) must be frequent.

Apriori Algorithm: Mining Frequent Itemsets Is the Key Step

Find all frequent itemsets, i.e., the sets of items that have minimum support. The Apriori algorithm:
1. Join: Ck (the candidate k-itemsets) is generated by joining Lk-1 with itself.
2. Prune (剪除) the candidate sets: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset (the Apriori principle); i.e., if A ∪ B is a frequent itemset, both A and B must be frequent itemsets.
3. Find frequent itemsets: use the support counts to delete infrequent itemsets from the candidate itemsets.
4. Repeat steps 1-3 for itemset sizes 1 through k.
Then use the frequent itemsets to generate association rules that meet the minimum confidence.

Apriori Algorithm Example

(The original slide shows the tables for each pass over database D: scan D to get C1 and L1; join and scan again for C2 and L2; then C3 and L3.)

How to Generate Candidates?

Suppose the items in Lk-1 are listed in an order.

Step 1: self-join Lk-1 to generate Ck:
  select p.item1, p.item2, ..., p.itemk-1, q.itemk-1
  from Lk-1 p, Lk-1 q
  where p.item1 = q.item1, ..., p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1

Step 2: pruning:
  for all itemsets c in Ck do
    for all (k-1)-subsets s of c do
      if (s is not in Lk-1) then delete c from Ck

Example of Generating Candidates

L3 = {abc, abd, acd, ace, bcd}
Self-joining L3 * L3:
- abcd from abc and abd
- acde from acd and ace
Pruning:
- acde is removed because ade is not in L3
C4 = {abcd}

Summary: Apriori Algorithm

Join step: Ck is generated by joining Lk-1 with itself.
Prune step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.

Pseudo-code (Ck: candidate itemsets of size k; Lk: frequent itemsets of size k):

  L1 = {frequent items};
  for (k = 1; Lk != empty; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
      increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
  end
  return the union over k of Lk;

Example 2:

  TID   List of item IDs
  T100  I1, I2, I5
  T200  I2, I4
  T300  I2, I3
  T400  I1, I2, I4
  T500  I1, I3
  T600  I2, I3
  T700  I1, I3
  T800  I1, I2, I3, I5
  T900  I1, I2, I3

  Min-support = 20%, min-confidence = 70%
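The pseudo-code above can be turned into a short program. Running it on the Example 2 transactions with min-support 20% (a support count of at least 2 out of 9 transactions) reproduces the frequent itemsets. A minimal sketch; for simplicity the join step unions pairs of frequent k-itemsets instead of the ordered prefix join, which yields the same candidates after pruning:

```python
from itertools import combinations

# Example 2 transactions from the slides
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def apriori(transactions, min_count):
    """Return {frozenset: support_count} for all frequent itemsets."""
    def count(c):
        return sum(1 for t in transactions if c <= t)

    # L1: frequent 1-itemsets
    items = {i for t in transactions for i in t}
    Lk = {frozenset([i]) for i in items if count(frozenset([i])) >= min_count}
    frequent = {c: count(c) for c in Lk}
    k = 1
    while Lk:
        # Join: combine frequent k-itemsets into (k+1)-item candidates
        Ck1 = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
        # Prune: every k-subset of a candidate must itself be frequent
        Ck1 = {c for c in Ck1
               if all(frozenset(s) in Lk for s in combinations(c, k))}
        # Count support; keep candidates that reach min_count
        Lk = {c for c in Ck1 if count(c) >= min_count}
        frequent.update({c: count(c) for c in Lk})
        k += 1
    return frequent

freq = apriori(transactions, min_count=2)
# The largest frequent itemsets are {I1,I2,I3} and {I1,I2,I5}, each with count 2
print(sorted((sorted(s), n) for s, n in freq.items()))
```

With these transactions every single item is frequent, six 2-itemsets survive, and only {I1, I2, I3} and {I1, I2, I5} remain at size 3; the join of those two produces {I1, I2, I3, I5}, which pruning eliminates, so the algorithm stops.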
3. Mining Association Rules: Example 3

(The original slide shows the example dataset and its item-set tables.)

Min support = 4

Frequent three-item sets:

  Watch Promotion = No & Life Insurance Promotion = No & Credit Card Insurance = No : 4

4. Confidence of Association Rules

For each frequent itemset l, generate all nonempty subsets s of l.
Confidence test: for each nonempty subset s of l, form the rule s → (l - s) with
  confidence = support-count(l) / support-count(s)
For example, if l is abcd and s is ab, then l - s is cd, the rule is ab → cd, and
  confidence = support(abcd) / support(ab)

Two Possible Two-Item Set Rules

IF Magazine Promotion = Yes
THEN Life Insurance Promotion = Yes  (5/7)

IF Life Insurance Promotion = Yes
THEN Magazine Promotion = Yes  (5/5)

(What if min confidence = 80%?)

Three-Item Set Rules

IF Watch Promotion = No & Life Insurance Promotion = No
THEN Credit Card Insurance = No  (4/4)

IF Watch Promotion = No
THEN Life Insurance Promotion = No & Credit Card Insurance = No  (4/6)
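The confidence test above can be coded directly. Applied to the frequent itemset {I1, I2, I5} from Example 2 (support count 2), it shows which of the six possible rules pass a 70% minimum confidence. A minimal sketch:

```python
from itertools import combinations

# Example 2 transactions from the slides
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def support_count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

def rules_from(l, min_conf):
    """For frequent itemset l, emit s -> (l - s) whenever confidence passes."""
    l = frozenset(l)
    out = []
    for r in range(1, len(l)):              # all nonempty proper subsets s
        for s in combinations(sorted(l), r):
            conf = support_count(l) / support_count(s)
            if conf >= min_conf:
                out.append((set(s), l - set(s), conf))
    return out

for s, c, conf in rules_from({"I1", "I2", "I5"}, min_conf=0.7):
    print(f"{sorted(s)} -> {sorted(c)}  confidence = {conf:.0%}")
```

Only three rules survive the 70% threshold ({I5} → {I1, I2}, {I1, I5} → {I2}, and {I2, I5} → {I1}, each at 100%); rules such as {I1, I2} → {I5} fail at 2/4 = 50%.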
5. General Considerations (1)

Association rules are particularly popular because of their ability to find relationships in large databases without the restriction of choosing a single dependent variable.

We are interested in association rules that show a lift in product sales, where the lift is the result of the product's association with one or more other products. We are also interested in association rules that show a lower-than-expected confidence for a particular association.

A good scenario is to specify an initially high value for the itemset coverage criterion. If more rules are desired, the coverage criterion can be lowered and the entire process repeated.
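The slide mentions lift without defining it; the standard definition is lift(A → B) = confidence(A → B) / support(B), which compares how often B occurs with A against how often B occurs overall. A minimal sketch (toy data invented for illustration):

```python
# Standard lift definition (not given on the slide):
#   lift(A -> B) = confidence(A -> B) / support(B)
#                = support(A and B) / (support(A) * support(B))
# lift > 1: buying A raises the chance of buying B; lift < 1: it lowers it.

transactions = [
    {"beer", "diapers"}, {"beer", "diapers", "milk"},
    {"beer"}, {"diapers", "milk"}, {"milk"},
]

def support(itemset):
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

def lift(a, b):
    return support(set(a) | set(b)) / (support(a) * support(b))

print(lift({"beer"}, {"diapers"}))  # about 1.11, a positive association
```

A rule can have high confidence yet lift below 1 when its consequent is very common on its own; this is exactly the "lower-than-expected confidence" situation the slide alludes to.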
5. General Considerations (2): Performance Bottlenecks

The core of the Apriori algorithm:
- Use frequent (k-1)-itemsets to generate candidate frequent k-itemsets.
- Use database scans and pattern matching to collect counts for the candidate itemsets.

The bottleneck of Apriori is candidate generation:
- Huge candidate sets: 10^4 frequent 1-itemsets will generate on the order of 10^7 candidate 2-itemsets. To discover a frequent pattern of size 100, e.g., {a1, a2, ..., a100}, one needs to generate 2^100, roughly 10^30, candidates.
- Multiple scans of the database: Apriori needs (n + 1) scans, where n is the length of the longest pattern.

Homework

A database has 4 transactions. Let min-support = 60% and min-confidence = 80%. Find the longest frequent itemset(s).
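The candidate-explosion figures quoted above are easy to check with a few lines of arithmetic:

```python
import math

# 10^4 frequent 1-itemsets yield C(10^4, 2) candidate 2-itemsets
pairs = math.comb(10_000, 2)
print(pairs)  # 49_995_000, i.e. on the order of 10^7

# A frequent pattern of size 100 forces all 2^100 - 1 of its nonempty
# subsets to be frequent, so roughly 2^100 candidates must be considered
print(f"{2 ** 100:.2e}")  # about 1.27e+30
```

This is why later algorithms (FP-growth, for example) avoid explicit candidate generation altogether, though the slides here cover only Apriori.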
List all association rules that satisfy the above requirements, with their supports and confidences.

  TID   Date      items_bought
  T100  10/15/99  K, A, D, B
  T200  10/15/99  D, A, C, E, B
  T300  10/19/99  C, A, B, E
  T400  10/22/99  B, A, D

Using Association and Classification Together

From the data mining perspective, the task is mainly to select the rules with relatively high confidence and support; from the business perspective, it is mainly to evaluate the rules data mining has picked out, so as to select the correct and valuable cross-sell rules.

Once the rules are selected, the customers who satisfy the conditions but have not yet shown the "consequent" are the potential customers. One can then cross-sell to all of them, or use classification methods from data mining to score them, in order to find the customers most likely to buy and thereby further raise the purchase rate.

In some situations, however, we may not care about correlations between products, and only need to find, among existing customers, those most likely to buy a specified product, without restricting which products those customers already hold. In that case we can apply a classification model directly.