Statistics Success Stories and Cautionary Tales

LESSON 1

1.1 WHAT IS STATISTICS?

Statistics is a collection of procedures and principles for gathering data and analyzing information in order to help people make decisions when faced with uncertainty.

The odds of finding two identical fingerprints were 1 in 64 billion.
-Francis Galton

The probability that two random individuals have the same DNA profile is 3×10^-11; if two probes are compared at the same time, the probability that two individuals match completely is less than 5×10^-19.

Every gun barrel has unique characteristics, and these characteristics mark every bullet it fires.
-Forensic ballistics

Airlines save money through sampling.

Beat the Dealer (the earliest English original edition)

1.2 SEVEN STATISTICAL STORIES WITH MORALS

There are three kinds of lies: lies, damned lies and statistics.
-Benjamin Disraeli (British Prime Minister, 1804-1881)

CASE STUDY 1: Who Are Those Speedy Drivers?

“What's the fastest you have ever driven a car?”
-Penn State University, 1994

Males (87 students):
110 109  90 140 105 150 120 110 110  90 115  95 145 140 110
105  85  95 100 115 124  95 100 125 140  85 120 115 105 125
102  85 120 110 120 115  94 125  80  85 140 120  92 130 125
110  90 110 110  95  95 110 105  80 100 110 130 105 105 120
 90 100 105 100 120 100 100  80 100 120 105  60 125 120 100
115  95 110 101  80 112 120 110 115 125  55  90

Females (102 students):
 80  75  83  80 100 100  90  75  95  85  90  90  90 120  85
100 120  75  85  80  70  85 110  85  75 105  95  75  70  90
 70  82  85 100  90  95  90 110  80  80 110 110  95  75 130
 95 110 110  80  90 105  90 110  75 100  90 110  85  90  80
 80  85  50  80 100  80  80  80  95 100  90 100  95  80  80
 50  88  90  90  85  70  90  30  85  85  87  85  90  85  75
 90 102  80 100  95 110  80  95  90  80  90

[Figure: dotplots of the responses to “What's the fastest you've ever driven?”, one panel for males and one for females; horizontal axis: fastest speed (mph)]

Responses to “What's the fastest you've ever driven?”
Five-number summary (mph)

             Males            Females
             (87 students)    (102 students)
Median           110               89
Quartiles     95     120       80      95
Extremes      55     150       30     130

A river with an average depth of 0.4 m is by no means safer than a swimming pool with an average depth of 0.6 m.

Definition: The median is the value in the middle when the numbers are put in order. The lower quartile and upper quartile are (roughly) the medians of the lower and upper halves of the data.

Moral of the story
Simple summaries of data can tell an interesting story and are easier to digest than long lists.

CASE STUDY 2: Disaster in the Skies?

“Planes get closer in midair as traffic control errors rise. Errors by air traffic controllers climbed from 746 in fiscal 1997 to 878 in fiscal 1998, an 18% increase”
-USA TODAY, Levin, 1999

“The errors per million flights handled by controllers climbed from 4.8 to 5.5”

5.5 ÷ 4.8 = 114.6%, so the error rate rose by about 14.6%, less than the 18% rise in the raw count.

Definition: The rate is simply the number of times something occurs per number of opportunities for it to occur. The baseline rate is the rate at a beginning time period or under specific conditions.

Moral of the story
When discussing the change in the rate or risk of occurrence of something, make sure you also include the base rate or baseline risk.

CASE STUDY 3: Did Anyone Ask You Whom You've Been Dating?

“According to a new USA Today/Gallup Poll of teenagers across the country, 57 percent of teens who go out on dates say they've been out with someone of another race or ethnic group.”
-USA TODAY, Peterson, 1997

“In most cases, parents aren't a major obstacle.
Sixty-four percent of teens say their parents don't mind that they date interracially, or wouldn't mind if they did.”
-Sacramento Bee, Hiram, 1997

Question 1: How could the polltakers manage to ask so many teenagers these questions?

Question 2: Could such a small sample possibly tell us anything about the millions of teenagers in the United States?
Yes, if those teens constituted a random sample from the population of interest.

Question 3: How accurate could this sample possibly be?
The results of this poll are accurate to within a margin of error of about 4.5% (95% confidence interval).

Moral of the story
A representative sample of only a few thousand, or perhaps even a few hundred, can give reasonably accurate information about a population of many millions.

CASE STUDY 4: Who Are Those Angry Women?

“A well-conducted survey can be very informative, but a poorly conducted one can be a complete disaster.”
-Statistics: Concepts and Controversies, David S. Moore

“The women who responded were fed up with men and eager to fight them. For example, 91% of those who were divorced said they had initiated the divorce. The anger of women toward men became the theme of the book.”
-Women and Love, Shere Hite

Shere Hite sent questionnaires to 100,000 women asking about love, sex, and relationships.

The Hite sample exemplifies one of the most common problems with surveys: the sample data may not represent the population.

Extensive nonresponse from a random sample, or the use of a self-selected (i.e., all-volunteer) sample, will probably produce biased results.

Moral of the story
An unrepresentative sample, even a large one, tells you almost nothing about the population.

Definition: Nonresponse bias can occur when many people who are selected for the sample either do not respond at all or do not respond to some of the key survey questions.
This may occur even when an appropriate random sample is selected and contacted.

Magazines, television stations, and internet websites routinely conduct surveys based on nonrepresentative samples, usually made up of those who feel strongly about the issues. Such a sample is called a self-selected sample or a volunteer sample. It tells you nothing about the larger population; it tells you only about those who responded.

CASE STUDY 5: Does Prayer Lower Blood Pressure?

Prayer Can Lower Blood Pressure
“Attending religious services lowers blood pressure more than tuning into religious TV or radio, a new study says.”
-USA TODAY, Davis, 1998

Background
People who attended a religious service once a week and prayed or studied the Bible once a day were 40% less likely to have high blood pressure than those who did not go to church every week and who prayed and studied the Bible less.
This was an observational study conducted by the U.S. National Institutes of Health, which followed 2,391 people aged 65 or older for six years.

Moral of the story
Cause-and-effect conclusions cannot generally be made based on an observational study.

Definition: An observational study is one in which participants are merely observed and measured. A confounding variable is a variable that is not the main concern of the study, but may be partially responsible for the observed results.

CASE STUDY 6: Does Aspirin Reduce Heart Attack Rates?

Background
Time: 1983-1988
Organizer: Steering Committee of the Physicians' Health Study Research Group
Objective: To determine if taking aspirin reduces the risk of a heart attack.
Subjects: 22,071 male physicians between the ages of 40 and 80
Method: A five-year randomized experiment

Method: 2x2 factorial design
- active aspirin and active beta-carotene (5,517)
- active aspirin and beta-carotene placebo (5,520)
- aspirin placebo and active beta-carotene (5,519)
- aspirin placebo and beta-carotene placebo (5,515)

Aspirin reduced the risk of first myocardial infarction by 44% (P less than 0.00001).
There were too few strokes or deaths upon which to base sound clinical judgment regarding aspirin and stroke or mortality.
Source of result: New England Journal of Medicine, 1989, 321(3):129-135

The Effect of Aspirin on Heart Attacks*

Treatment    Heart Attacks    Doctors in Group    Attacks per 1000 Doctors
Aspirin           104              11,037                  9.42
Placebo           189              11,034                 17.13

*More than 170 other findings have emerged from the trial so far.

Moral of the story
Unlike with observational studies, cause-and-effect conclusions can generally be made on the basis of randomized experiments.

Definition: A randomized experiment is a study in which treatments are randomly assigned to participants. A statistically significant relationship or difference is one that is large enough to be unlikely to have occurred in the sample if there was no relationship or difference in the population.

CASE STUDY 7: Does the Internet Increase Loneliness and Depression?

“greater use of the Internet was associated with declines in participants' communication with family members in the household, declines in the size of their social circle, and increases in their depression and loneliness.”
-Internet paradox: A social technology that reduces social involvement and psychological well-being? American Psychologist, 1998, 53(9):1017-1031

Sad, Lonely World Discovered in Cyberspace
-Amy Harmon, August 30, 1998, Sunday

Background: The study included 169 individuals in 73 households in Pittsburgh, Pennsylvania, who were given free computers and internet service in 1995.
The participants answered a series of questions at the beginning of the study and again either one or two years later, measuring social contacts, stress, loneliness, and depression.

Conclusion
People who spend even a few hours a week online have higher levels of depression and loneliness than they would if they used the computer less frequently. One hour a week on the internet was associated, on average, with an increase of 0.03, or 1 percent, on the depression scale.

Measure                                   Before    After
Local social network (average people)      23.94    22.90
Loneliness                                  1.99     1.89
Depression                                  0.73     0.62

Moral of the story
- A “statistically significant” finding does not necessarily have practical importance.
- The implied direction of cause and effect may be wrong. (In this case, it could be that people who were more lonely and depressed were more prone to use the internet.)

1.3 THE COMMON ELEMENTS IN THE SEVEN STORIES
- How should we collect the data, and how much data is needed?
- How can we effectively summarize the data?
- What decisions or generalizations are possible based on the observed data?
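
As one illustration of the second question, here is a minimal Python sketch that recomputes a few of the summaries quoted in the case studies: the five-number summary of the male speeds from Case Study 1, the percent changes from Case Study 2, and the heart attack rates per 1000 doctors from Case Study 6. The function names (five_number_summary, percent_change, per_thousand) are made up for this illustration and are not part of the lesson. The quartiles follow the "median of each half" definition given in Case Study 1, and leaving the middle value out of both halves when the count is odd is an assumption about how that definition is applied.

```python
from statistics import median

# Fastest speeds (mph) reported by the 87 male students in Case Study 1.
MALES = [
    110, 109, 90, 140, 105, 150, 120, 110, 110, 90, 115, 95, 145, 140, 110,
    105, 85, 95, 100, 115, 124, 95, 100, 125, 140, 85, 120, 115, 105, 125,
    102, 85, 120, 110, 120, 115, 94, 125, 80, 85, 140, 120, 92, 130, 125,
    110, 90, 110, 110, 95, 95, 110, 105, 80, 100, 110, 130, 105, 105, 120,
    90, 100, 105, 100, 120, 100, 100, 80, 100, 120, 105, 60, 125, 120, 100,
    115, 95, 110, 101, 80, 112, 120, 110, 115, 125, 55, 90,
]


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    The quartiles are the medians of the lower and upper halves of the
    sorted data. With an odd number of values, the middle value is left
    out of both halves (an assumption made for this sketch).
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower_half = data[:half]
    upper_half = data[-half:]
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])


def percent_change(old, new):
    """Percent change from a baseline value `old` to a later value `new`."""
    return 100 * (new - old) / old


def per_thousand(events, group_size):
    """Rate of occurrence per 1000 opportunities."""
    return 1000 * events / group_size


if __name__ == "__main__":
    # Case Study 1: five-number summary for the male students.
    print(five_number_summary(MALES))
    # Case Study 2: change in the raw error count vs. change in the rate.
    print(round(percent_change(746, 878), 1))
    print(round(percent_change(4.8, 5.5), 1))
    # Case Study 6: heart attacks per 1000 doctors, aspirin vs. placebo.
    print(round(per_thousand(104, 11037), 2))
    print(round(per_thousand(189, 11034), 2))
```

Run as a script, this reproduces the values shown in the lesson: extremes 55 and 150, quartiles 95 and 120, and median 110 for the males; increases of about 17.7% in the error count and 14.6% in the error rate; and 9.42 versus 17.13 heart attacks per 1000 doctors.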