Machine Learning on Spark
Shivaram Venkataraman
UC Berkeley

[Diagram: Computer Science, Machine learning, Statistics]

Machine learning
- Spam filters
- Recommendations
- Click prediction
- Search ranking

Machine learning techniques
- Classification
- Regression
- Clustering
- Active learning
- Collaborative filtering

Implementing Machine Learning
- Machine learning algorithms are
  - Complex, multi-stage
  - Iterative
- MapReduce/Hadoop unsuitable
- Need efficient primitives for data sharing
- Spark RDDs: efficient data sharing
- In-memory caching accelerates performance
  - Up to 20x faster than Hadoop
- Easy to use high-level programming interface
  - Express complex algorithms in 100 lines

Machine Learning using Spark
Machine learning techniques: Classification, Regression, Clustering, Active learning, Collaborative filtering

K-Means Clustering using Spark
Focus: Implementation and Performance

Clustering
Grouping data according to similarity
E.g. archaeological dig
[Figure: axes Distance East, Distance North]

K-Means Algorithm
Benefits
- Popular
- Fast
- Conceptually straightforward

K-Means: preliminaries
[Figure: axes Feature 1, Feature 2]
- Data: Collection of values
    data = lines.map(line => parseVector(line))
- Dissimilarity: Squared Euclidean distance
    dist = p.squaredDist(q)
- K = Number of clusters
- Data assignments to clusters: S1, S2, ..., SK

K-Means Algorithm
[Figure: axes Feature 1, Feature 2]
- Initialize K cluster centers
- Repeat until convergence:
  - Assign each data point to the cluster with the closest center.
  - Assign each cluster center to be the mean of its cluster's data points.

    centers = data.takeSample(false, K, seed)
    closest = data.map(p => (closestPoint(p, centers), p))
    pointsGroup = closest.groupByKey()
    newCenters = pointsGroup.mapValues(ps => average(ps))
    while (dist(centers, newCenters) )
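
The slide snippets leave closestPoint, average, and the convergence test undefined, so the following is only a minimal sketch of how the pieces might fit together. It is not the speaker's implementation. Plain Scala collections stand in for Spark RDDs (groupBy instead of groupByKey) so that it runs without a cluster. The object name KMeansSketch, the Point alias, the epsilon parameter, and the choice to measure center movement as a sum of squared distances are all assumptions made for illustration. The initial centers are passed in explicitly so that a run is reproducible.

```scala
// Local sketch of the k-means loop from the slides.
// Assumption: plain Scala collections replace Spark RDDs.
object KMeansSketch {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def squaredDist(p: Point, q: Point): Double =
    p.zip(q).map { case (a, b) => (a - b) * (a - b) }.sum

  // Index of the center nearest to p (ties go to the lowest index).
  def closestPoint(p: Point, centers: Seq[Point]): Int =
    centers.indices.minBy(i => squaredDist(p, centers(i)))

  // Component-wise mean of a non-empty group of points.
  def average(ps: Seq[Point]): Point =
    ps.reduce((a, b) => a.zip(b).map { case (x, y) => x + y }).map(_ / ps.size)

  // Sample k distinct elements of data, in the spirit of takeSample(false, K, seed).
  def initialCenters(data: Seq[Point], k: Int, seed: Long): Seq[Point] =
    new scala.util.Random(seed).shuffle(data).take(k)

  // Repeat the assign/update steps until the centers move by at most epsilon.
  // Assumption: epsilon and this movement measure are not given in the slides.
  def kmeans(data: Seq[Point], initial: Seq[Point], epsilon: Double): Seq[Point] = {
    var centers = initial
    var moved = Double.MaxValue
    while (moved > epsilon) {
      // Assignment step: key each point by the index of its closest center.
      val closest = data.map(p => (closestPoint(p, centers), p))
      // Update step: group by key, then average each group.
      val newCenters = closest.groupBy(_._1).map {
        case (i, ps) => i -> average(ps.map(_._2))
      }
      // A center with no assigned points keeps its previous position.
      val updated = centers.indices.map(i => newCenters.getOrElse(i, centers(i)))
      moved = centers.zip(updated).map { case (c, n) => squaredDist(c, n) }.sum
      centers = updated
    }
    centers
  }
}
```

A call such as KMeansSketch.kmeans(data, KMeansSketch.initialCenters(data, 2, 42L), 1e-9) would mirror the slide flow. K-means can settle in a poor local optimum for some initial centers, so the result depends on the sample drawn.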