Outline of the problem
Missing values in longitudinal trials are a big issue.
The first aim should be to reduce their proportion.
Ethics dictate that they cannot be avoided entirely.
There is no magic method to fix them.
The magnitude of the problem varies across areas:
8-week depression trial: 25% to 50% may drop out by the final visit
12-week asthma trial: maybe only 5% to 10%

Outline of the lecture
Part I: Missing data
Part II: Multiple imputation
Example: The analgesic trial

Part I: Missing data
In real datasets, e.g., surveys and clinical trials, it is quite common to have observations with missing values for one or more input features. The first issue in dealing with the problem is determining whether the missing data mechanism has distorted the observed data. Little and Rubin (1987) and Rubin (1987) distinguish between three missing data mechanisms. Data are said to be missing at random (MAR) if the mechanism resulting in their omission is independent of their (unobserved) values. If the omission is also independent of the observed values, the missingness process is said to be missing completely at random (MCAR). In any other case the process is missing not at random (MNAR), i.e., the missingness process depends on the unobserved values.
http://www.emea.europa.eu/pdfs/human/ewp/177699EN.pdf

1. Introduction to missing data
[Figure: data matrix of cases (rows) by variables (columns); ? = missing]

What is missing data?
The missingness hides a real value that is useful for analysis purposes.
Survey questions:
1. What is your total annual income for FY 2008?
2. Who are you voting for in the 2009 election for the European Parliament?
Clinical trials:
[Figure: subject timeline from Start to Finish along a time axis, censored at this point in time]

Missingness
It matters why data are missing.
Suppose you are modelling weight (Y) as a function of sex (X). Some respondents wouldn't disclose their weight, so you are missing some values for Y. There are three possible mechanisms for the nondisclosure:
1. There may be no particular reason why some respondents told you their weights and others didn't. That is, the probability that Y is missing has no relationship to X or Y. In this case the data are missing completely at random.
2. One sex may be less likely to disclose its weight. That is, the probability that Y is missing depends only on the value of X. Such data are missing at random.
3. Heavy (or light) people may be less likely to disclose their weight. That is, the probability that Y is missing depends on the unobserved value of Y itself. Such data are not missing at random.

Missing data patterns & mechanisms
Pattern: Which values are missing?
Mechanism: Is missingness related to the response?
$(Y_i, R_i)$ = data matrix, with COMPLETE DATA
$R_{ij} = 1$ if $Y_{ij}$ is missing, $R_{ij} = 0$ if $Y_{ij}$ is observed
$R = (R_{ij})$ = missing data indicator matrix
$Y_{\mathrm{obs}}$ = observed part of $Y$
$Y_{\mathrm{mis}}$ = missing part of $Y$
“Pattern” concerns the distribution of $R$.
“Mechanism” concerns the distribution of $R$ given $Y$.
Rubin (Biometrika 1976) distinguishes between:
Missing Completely at Random (MCAR): $P(R \mid Y) = P(R)$ for all $Y$
Missing at Random (MAR): $P(R \mid Y) = P(R \mid Y_{\mathrm{obs}})$ for all $Y_{\mathrm{mis}}$
Not Missing at Random (NMAR): $P(R \mid Y)$ depends on $Y_{\mathrm{mis}}$

Missing At Random (MAR)
What are the most general conditions under which a valid analysis can be done using only the observed data, and no information about the missingness mechanism? The answer is: when, given the observed data, the missingness mechanism does not depend on the unobserved data. Mathematically,
$P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = P(R \mid Y_{\mathrm{obs}})$ for all $Y_{\mathrm{mis}}$.
This is termed Missing At Random, and is equivalent to saying that two units that share the same observed values have the same statistical behaviour on the other observations, whether observed or not.

[Figure: Example, units 1 and 2 with their observed and missing variables]
As units 1 and 2 have the same values where both are observed, given these observed values, under MAR, variables 3, 5 and 6 from unit 2 have the same distribution (NB not the same value!) as variables 3, 5 and 6 from unit 1. Note that under MAR the probability of a value being missing will generally depend on observed values, so it does not correspond to the intuitive notion of "random". The important idea is that the missing value mechanism can be expressed solely in terms of observations that are observed. Unfortunately, this can rarely be definitively determined from the data at hand!

If data are MCAR or MAR, you can ignore the missing data mechanism and use multiple imputation or maximum likelihood. If data are NMAR, you can't ignore the missing data mechanism; two approaches to NMAR data are selection models and pattern mixture.

Suppose Y is weight in pounds; if someone is heavy, they may be less inclined to report their weight. So the value of Y affects whether Y is missing; the data are NMAR. Two possible approaches for such data are selection models and pattern mixture.
Selection models. In a selection model, you simultaneously model Y and the probability that Y is missing. Unfortunately, a number of practical difficulties are often encountered in estimating selection models.
Pattern mixture (Rubin 1987). When data are NMAR, an alternative to selection models is multiple imputation with pattern mixture. In this approach, you perform multiple imputations under a variety of assumptions about the missing data mechanism. In ordinary multiple imputation, you assume that people who report their weights are similar to those who don't. In a pattern-mixture model, you may assume that people who don't report their weights are an average of 20 pounds heavier. This is of course an arbitrary assumption; the idea of pattern mixture is to try out a variety of plausible assumptions and see how much they affect your results. Pattern mixture is a more natural, flexible, and interpretable approach.

Simple analysis strategies (1)
Complete Case (CC) analysis
Advantages: Complete
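As a rough sketch of the missing data indicator matrix $R$ and of complete-case (CC) analysis, the Python snippet below builds $R$ from a small, made-up data matrix (None marks a missing entry), counts the distinct missing data patterns (rows of $R$), and keeps only the complete cases, i.e. rows of $R$ that are all zeros. The matrix values and the helper names `indicator_matrix`, `missing_patterns` and `complete_cases` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical 4 x 3 data matrix Y (cases x variables); None marks a missing value.
Y = [
    [1.2, None, 3.4],
    [2.0, 5.1, 0.7],
    [None, None, 1.1],
    [0.3, 4.4, 2.2],
]

def indicator_matrix(Y):
    """Missing data indicator: R[i][j] = 1 if Y[i][j] is missing, 0 if observed."""
    return [[1 if value is None else 0 for value in row] for row in Y]

def missing_patterns(R):
    """Count the cases that share each distinct row of R (the missing data pattern)."""
    return Counter(tuple(row) for row in R)

def complete_cases(Y, R):
    """Complete-case (CC) analysis: keep only cases whose row of R is all zeros."""
    return [row for row, r in zip(Y, R) if not any(r)]

R = indicator_matrix(Y)
# Positions (i, j) making up the observed part Y_obs and the missing part Y_mis.
Y_obs = [(i, j) for i, r in enumerate(R) for j, flag in enumerate(r) if flag == 0]
Y_mis = [(i, j) for i, r in enumerate(R) for j, flag in enumerate(r) if flag == 1]

print(R)                      # [[0, 1, 0], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
print(missing_patterns(R))
print(complete_cases(Y, R))   # [[2.0, 5.1, 0.7], [0.3, 4.4, 2.2]]
```

Note how CC analysis discards the first and third cases entirely, even though most of their values are observed.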
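The three nondisclosure mechanisms in the weight-by-sex example can be illustrated with a small simulation. This is a sketch under made-up assumptions, none of which come from the lecture: sex is 0 or 1 with equal probability, weight in pounds is 150 + 30·sex plus Gaussian noise with standard deviation 20, and the missingness probabilities are arbitrary. It compares the true mean weight with the complete-case mean under each mechanism. Under MCAR the complete cases stay representative, whereas in the MAR and NMAR settings chosen here heavier respondents go missing more often, so the complete-case mean is pulled down.

```python
import random
import statistics

def simulate(mechanism, n=20000, seed=1):
    """Return (all weights, reported weights) under a hypothetical mechanism."""
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        sex = rng.randint(0, 1)                       # X, always observed
        weight = 150 + 30 * sex + rng.gauss(0, 20)    # Y, assumed data model
        if mechanism == "MCAR":
            p_missing = 0.3                           # unrelated to X or Y
        elif mechanism == "MAR":
            p_missing = 0.5 if sex == 1 else 0.1      # depends only on X
        elif mechanism == "NMAR":
            p_missing = 0.5 if weight > 180 else 0.1  # depends on the unobserved Y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        full.append(weight)
        if rng.random() >= p_missing:                 # weight is reported
            observed.append(weight)
    return full, observed

for mech in ("MCAR", "MAR", "NMAR"):
    full, obs = simulate(mech)
    print(f"{mech}: true mean {statistics.mean(full):.1f}, "
          f"complete-case mean {statistics.mean(obs):.1f}")
```

Because the same seed drives the same sequence of draws, the underlying weights are identical across the three runs; only the missingness rule changes.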
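Under MAR, conditioning on the observed data is what makes the missingness ignorable. The sketch below assumes a hypothetical data model (sex is 0 or 1 with equal probability; weight in pounds is 150 + 30·sex plus Gaussian noise with standard deviation 20) in which men (sex = 1) withhold their weight more often, so missingness depends only on X. The pooled complete-case mean is biased, but within each sex the reported weights are a random subsample. Reweighting the per-sex means by the fully observed sex proportions (a post-stratification step chosen here for illustration, not a method given in the slides) recovers the overall mean.

```python
import random
import statistics

def simulate_mar(n=20000, seed=2):
    """Hypothetical MAR data: missingness of weight depends only on sex."""
    rng = random.Random(seed)
    full, rows = [], []
    for _ in range(n):
        sex = rng.randint(0, 1)
        weight = 150 + 30 * sex + rng.gauss(0, 20)
        p_missing = 0.5 if sex == 1 else 0.1   # depends on observed X only
        full.append(weight)
        rows.append((sex, None if rng.random() < p_missing else weight))
    return full, rows

full, rows = simulate_mar()
true_mean = statistics.mean(full)
naive_mean = statistics.mean(w for _, w in rows if w is not None)

# Given sex, the reported weights are a random subsample under this MAR setup,
# so each per-sex complete-case mean is unbiased for that sex.
mean_by_sex = {
    s: statistics.mean(w for x, w in rows if x == s and w is not None)
    for s in (0, 1)
}
# Sex is never missing, so its distribution is known from all cases.
share_by_sex = {s: sum(1 for x, _ in rows if x == s) / len(rows) for s in (0, 1)}
adjusted_mean = sum(share_by_sex[s] * mean_by_sex[s] for s in (0, 1))

print(f"true {true_mean:.1f}, naive CC {naive_mean:.1f}, adjusted {adjusted_mean:.1f}")
```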
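The pattern-mixture idea of assuming that non-reporters are heavier (for example, by 20 pounds) can be sketched as a delta-adjustment sensitivity analysis. Assumptions, all hypothetical: the data are simulated (weight = 150 + 30·sex plus Gaussian noise with standard deviation 20, with men more likely to withhold their weight); each missing weight is imputed by drawing a reported weight from a respondent of the same sex, a simple hot-deck step standing in for the ordinary multiple imputation model; `delta` is then added to every imputed value; and the m completed-data means are averaged, which is Rubin's rule for the point estimate. The function name `delta_adjusted_mean` is illustrative only.

```python
import random
import statistics

def delta_adjusted_mean(rows, delta, m=5, seed=3):
    """Pattern-mixture sensitivity sketch for the mean weight.

    rows  -- list of (sex, weight) pairs, weight None if not reported
    delta -- assumed extra weight (pounds) of non-reporters vs similar reporters
    m     -- number of imputations
    """
    rng = random.Random(seed)
    # Donor pools: reported weights for each sex (hypothetical hot-deck model).
    donors = {s: [w for x, w in rows if x == s and w is not None] for s in (0, 1)}
    estimates = []
    for _ in range(m):
        completed = [
            w if w is not None else rng.choice(donors[x]) + delta
            for x, w in rows
        ]
        estimates.append(statistics.mean(completed))
    # Rubin's rule: the point estimate is the average of the m estimates.
    return statistics.mean(estimates)

# Hypothetical simulated data with men more likely to withhold their weight.
rng = random.Random(0)
rows = []
for _ in range(2000):
    sex = rng.randint(0, 1)
    weight = 150 + 30 * sex + rng.gauss(0, 20)
    missing = rng.random() < (0.4 if sex == 1 else 0.1)
    rows.append((sex, None if missing else weight))

for delta in (0, 10, 20):
    print(f"delta = {delta:2d} lb: estimated mean weight "
          f"{delta_adjusted_mean(rows, delta):.1f}")
```

With a fixed seed the donor draws are identical for every `delta`, so the estimate rises by exactly `delta` times the fraction of missing weights; in a real analysis the question is whether the conclusions change over the range of `delta` values judged plausible.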