BEO2255 Applied Statistics for Business
Week Six. Analysing categorical data: Chi-squared tests

This week's lecture covers:
- Analysing categorical (nominal) data
- Chi-square test of differences between proportions
- Chi-square test of independence

SPSS one-sample nonparametric tests: chi-square test of a population distribution

(1) Purpose: to infer from sample data whether the distribution of the population differs significantly from a known distribution (a goodness-of-fit test). The test is suited to statistical inference on categorical data.

(2) Basic hypothesis: $H_0$: the population distribution does not differ significantly from the theoretical distribution.

(3) Basic method: use the known population proportions to compute the expected frequency of each category in the sample, then measure the gap between the observed and expected frequencies by computing the chi-square value. A small chi-square value means the observed and expected frequencies are close. If the p-value is greater than $\alpha$, $H_0$ cannot be rejected and the population distribution is taken not to differ significantly from the known distribution. Otherwise $H_0$ is rejected.

(4) Basic steps in SPSS:
- Menu: Analyze > Nonparametric Tests > Chi-Square.
- Move the variable to be tested into the Test Variable List box.
- Set the range of values to be tested (Expected Range):
  - Get from data: use the whole sample.
  - Use specified range: a user-defined range of cases.
- Specify the expected frequencies (Expected Values):
  - All categories equal: every category has the same proportion.
  - Values: user-defined proportions.

Categorical variables
- Variables that describe categories of entities.
- We deal with them all the time in statistics.
- Making comparisons among variables: for example, whether consumers prefer a particular brand of a product over competing brands.
- Checking whether there is a relationship between two categorical variables: for example, gender and preference for a product, that is, whether preference for a product is independent of gender.

Chi-square test for differences between proportions
- This test involves nominal data produced by a multinomial experiment.
- A multinomial experiment is a generalisation of a binomial experiment.
- The test examines the null hypothesis that data in the target population have a particular probability distribution.

Example 1
We might test whether consumers are indifferent to which of four materials (glass, plastic, steel or aluminium) is used to make soft drink containers. The null hypothesis is that they are indifferent (that is, equal numbers prefer glass, plastic, steel and aluminium).

Example 1: Data
Let $p_G$ be the probability that an individual selected at random will nominate glass as his or her preference if required to make a choice.
Similarly for $p_P$ (plastic), $p_S$ (steel) and $p_A$ (aluminium).

Hypotheses
$H_0$: $p_G = p_P = p_S = p_A = 0.25$.
$H_A$: at least one $p_i \neq 0.25$.
The alternative is that at least one material is more preferred (or less preferred) than the others.

Example 1 cont.
Procedure: select a random sample of, say, 100 consumers and determine their preferences.

Under the null hypothesis we expect 25 consumers to nominate glass, 25 to nominate plastic, 25 to nominate steel and 25 to nominate aluminium. These are the expected frequencies, $E_i = n p_i$.

We compare the expected frequencies with the sample results, the observed frequencies $O_i$. If they are approximately the same, the data give no reason to doubt the null hypothesis: $O_i \approx E_i$ for every $i$ suggests that $H_0$ is probably true.

Example 1 cont.: Chi-square
We require a test statistic to decide whether the difference is large enough to reject the null hypothesis. We use chi-square with $G - 1$ degrees of freedom, where $G$ is the number of groups:

$$\chi^2_{G-1} = \sum_{i=1}^{G} \frac{(O_i - E_i)^2}{E_i}$$

Suppose that in our example 39 prefer glass, 16 prefer plastic, 20 prefer steel and 25 prefer aluminium. Recall that the expected frequencies were all 25, so

$$\chi^2_3 = \frac{(39-25)^2}{25} + \frac{(16-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(25-25)^2}{25} = 7.84 + 3.24 + 1.00 + 0 = 12.08$$

Critical value: at the 5% significance level with 3 d.f., the critical value is $\chi^2_3 = 7.82$ (Table E4, page 742, Berenson et al. 2013); that is, there is only a 5 percent chance that $\chi^2_3 \geq 7.82$ if $H_0$ is true.

Comparison of chi-square values: $\chi^2_3 = 12.08 > 7.82$, so reject $H_0$.

Conclusion: at the 5% significance level there is sufficient evidence to reject the null hypothesis. At least one of the probabilities $p_i$ is different from 0.25. The sample results indicate that the materials are not equally preferred by consumers in the target population. Thus the preferences for at least two of the materials differ.

Chi-square test using SPSS
Example: Suppose that we want to test whether or not customers have a colour preference for packaging. Three different colours, Blue, Green and Purple, are considered.
The null hypothesis is that they do not have a colour preference.

Use Analyze > Nonparametric Tests > Chi-Square. The default is that the probabilities are equal.

Example: We test the null hypothesis that consumers in the target population have no preference for any of three colours of packaging.

Main display colour

            Observed N   Expected N   Residual
Blue            26          30.0        -4.0
Green           37          30.0         7.0
Purple          27          30.0        -3.0
Total           90

- Observed N: the numbers of consumers actually choosing particular colours.
- Expected N: the numbers of consumers expected to choose particular colours if the null is true.
- The observed and expected counts are different, but are they different enough to reject the null?

Test Statistics

                  Main Display Colour
Chi-Square (a)         2.467
df                     2
Asymp. Sig.            .291

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 30.0.

- Chi-Square: the chi-square statistic.
- df: degrees of freedom, groups - 1.
- Asymp. Sig.: check this value to test the null.

$H_0$: Consumers in the target population have no preference for any of the three colours of packaging.
$H_1$: Consumers in the target population have a preference for at least one of the three colours of packaging.

Check the Sig. value to test $H_0$. We cannot reject the null ($H_0$) that all three colours are equally preferred, because Sig. = .291 > 0.05.

Conclusion: At the 5% significance level there is insufficient evidence to conclude that consumers in the target population have a preference for at least one of the three colours of packaging.

Tests of independence: chi-squared test of a contingency table
This test satisfies two different problem objectives:
Are two nominal variables related?
Are there differences among two or more populations of a nominal variable?

Consider the following three features: height in centimetres, weight in kilograms, and colour of eyes.
- Whilst some people are tall and thin, on average taller people weigh more than shorter people. Weight and height are not independent.
- It seems unlikely that people with blue eyes weigh more, on average, than people with brown eyes. Weight and eye colour are almost certainly independent.

Frequency analysis under cross-classification
Purpose: to understand how the data are distributed across the levels of different variables.
- Example: Is academic performance associated with gender? (two variables)
- Example: Are occupation, gender and fondness for shopping associated? (three variables)

Main steps of the analysis:
- Produce the cross-tabulation (contingency table).
- Analyse the relationship between the variables in the table.

Producing a cross-tabulation
What is a contingency table?
[Figure: an example contingency table with its parts labelled: column variable, row variable, region, control variable, frequency]

Basic steps in SPSS:
(1) Menu: Analyze > Descriptive Statistics > Crosstabs.
(2) Move one variable into the Row box as the row variable.
(3) Move one variable into the Column box as the column variable.
(4) Optionally move one or more variables into the Layer box as control variables. Layer settings for control variables: variables placed in the same layer add their numbers of levels; variables placed in different layers multiply them.
(5) Choose whether to display a bar chart for each group (Display clustered bar charts).

Further calculation: the Cells option selects which percentages are output in the frequency table.
- Row: row percentages (Row pct)
- Column: column percentages (Col pct)
- Total: total percentages (Tot pct)

Analysing the relationship between the variables in a contingency table
Purpose: to use the contingency table to test whether the row and column variables are independent.
Method: the chi-square test, which measures association in qualitative (categorical) data.

Cross-tabulations of age and wage income

            Low   Middle   High
Young       400      0       0
Middle        0    500       0
Old           0      0     600

            Low   Middle   High
Young         0      0     500
Middle        0    600       0
Old         400      0       0

Basic steps of the chi-square test:
(1) $H_0$: the row and column variables are not associated, that is, they are independent.
(2) Construct the chi-square statistic from the observed frequencies $O_{ij}$ and expected frequencies $E_{ij}$ of the $r \times c$ cells:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Under $H_0$ the statistic follows a chi-square distribution with $(r-1)(c-1)$ degrees of freedom.

Terms in the SPSS output:
- Count: the observed (actual) frequency.
- Expected Count: the expected frequency, which reflects how the data would be distributed if $H_0$ were true.
- Residual: the observed frequency minus the expected frequency.

Judging visually whether two categorical variables are related:

1. Contingency table

              No lung cancer   Lung cancer   Total
Non-smoker         7775             42        7817
Smoker             2099             49        2148
Total              9874             91        9965

2. Three-dimensional column chart
[Figure: 3-D column chart of the four cell frequencies (smoker / non-smoker by lung cancer / no lung cancer), vertical axis from 0 to 8000]
The three-dimensional column chart shows clearly the relative sizes of the frequencies.

3. Two-dimensional bar chart
[Figure: 2-D bar chart of lung cancer / no lung cancer for smokers and non-smokers]
The two-dimensional bar chart shows that the proportion with lung cancer is higher among smokers (49 of 2148, about 2.3%) than among non-smokers (42 of 7817, about 0.5%).

Tests of independence cont.
Example 2
Suppose we interviewed 400 people and asked them which of three age groups they are in (under 25, 25 to 60, and over 60). We also asked for their response to the statement "All imports of automobiles should be banned in order to protect the local industry" (agree, no view either way, disagree).

Attitudes towards banning imports, by age group

Age group    Agree   No view   Disagree   Total
Under 25       19       53        25        97
25 - 60        46       94        47       187
Over 60        30       56        30       116
Total          95      203       102       400

Example 2 cont.
Null hypothesis: answers to the two questions are independent.

Under the null, by the multiplication rule for independent events:
$P(\text{over 60 and agree}) = P(\text{over 60}) \times P(\text{agree})$
Expected frequency $= P(\text{over 60}) \times P(\text{agree}) \times$ sample size.

Procedure:
- We set up a cross-tabulation showing the observed frequencies of answers to the two questions.
- We calculate the expected frequencies.

Test: our test is based on a comparison of the observed and expected frequencies.

Short-cut for expected frequencies: $E_{ij} = (\text{row } i \text{ total} \times \text{column } j \text{ total}) / n$. For example, the expected frequency of "agree" and "over 60" is $95 \times 116 / 400 = 27.55$, shown as 27.6 in the table.

Age group * Attitude to banning imports cross-tabulation

Age group                     Agree   No view   Disagree   Total
Under 25   Count               19.0     53.0      25.0      97.0
           Expected Count      23.0     49.2      24.7      96.9
25-60      Count               46.0     94.0      47.0     187.0
           Expected Count      44.4     94.9      47.7     187.0
Over 60    Count               30.0     56.0      30.0     116.0
           Expected Count      27.6     58.9      29.6     116.1
Total      Count               95.0    203.0     102.0     400.0
           Expected Count      95.0    203.0     102.0     400.0

The observed counts and the expected counts are different, but are they different enough to reject the null?

Chi-squared test for independence
Rationale: $O_{ij} \approx E_{ij}$ for every cell suggests that $H_0$ is probably true.
Test statistic: we require a test statistic to decide whether the difference is large enough to reject the null hypothesis.

Chi-Square Tests

                                Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square              1.438 (a)   4         .837
Likelihood Ratio                1.517       4         .805
Linear-by-Linear Association    1.307       1         .758
N of Valid Cases                400

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 23.0.

- Value (Pearson Chi-Square): the calculated value of chi-square.
- df: degrees of freedom, (rows - 1) × (columns - 1).
- Asymp. Sig.: we cannot reject the null that attitude and age are independent, because Sig. = .837 > 0.05.

$H_0$: attitudes and age are independent.
$H_1$: attitudes and age are dependent.

Conclusion: At the 5% significance level we are unable to conclude that age and attitudes towards banning automobile imports are dependent.
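
The chi-square values quoted in this lecture can be cross-checked outside SPSS. The following is a minimal sketch in plain Python, not part of the lecture material: the function names (`chi_square_stat`, `goodness_of_fit`, `independence`, `p_value_even_df`) are illustrative assumptions, and the p-value helper relies on a closed form of the chi-square upper tail that holds only for even degrees of freedom, which happens to cover the two SPSS outputs (df = 2 and df = 4). Example 1 (df = 3) is compared with the tabulated critical value 7.82, as in the lecture.

```python
import math

def chi_square_stat(observed, expected):
    """Sum of (O - E)^2 / E over categories or cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def goodness_of_fit(observed, probs=None):
    """Return (statistic, df) for H0: category probabilities equal `probs`.
    With probs=None, all categories are assumed equally likely."""
    n = sum(observed)
    groups = len(observed)
    if probs is None:
        probs = [1 / groups] * groups
    expected = [n * p for p in probs]          # E_i = n * p_i
    return chi_square_stat(observed, expected), groups - 1

def independence(table):
    """Return (statistic, df, expected) for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Short-cut: E_ij = row total * column total / n
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    stat = sum(chi_square_stat(o, e) for o, e in zip(table, expected))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected

def p_value_even_df(stat, df):
    """Upper-tail chi-square probability (assumption: even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch handles positive even df only")
    half = stat / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

# Example 1: container materials, compared with the critical value 7.82
stat1, df1 = goodness_of_fit([39, 16, 20, 25])
print(round(stat1, 2), df1, "reject H0" if stat1 > 7.82 else "cannot reject H0")

# Colour preference example (Blue, Green, Purple)
stat2, df2 = goodness_of_fit([26, 37, 27])
print(round(stat2, 3), df2, round(p_value_even_df(stat2, df2), 3))

# Example 2: age group by attitude to banning imports
ages_by_attitude = [[19, 53, 25], [46, 94, 47], [30, 56, 30]]
stat3, df3, expected3 = independence(ages_by_attitude)
print(round(stat3, 3), df3, round(p_value_even_df(stat3, df3), 3))
```

The three printed lines should agree with the lecture's figures: 12.08 on 3 d.f. for Example 1, 2.467 with Sig. .291 for the colour example, and 1.438 with Sig. .837 for Example 2. For odd degrees of freedom a statistics library or a chi-square table is needed instead of `p_value_even_df`.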