multimedia innovative item types that are not feasible in the PBT format, and the ability to measure response time (Boo; Klein; Bennett, 2001, 2002; Parshall, Spray, Kalohn and Davey, 2002).

Factor analytic studies (Swinton and Powers, 1980; Bachman and Palmer, 1981, 1982; Upshur and Homburg, 1983; Vollmer, 1983; Vollmer and Sang, 1983; Sang et al., 1986; Boldt, 1988), as well as specific studies on oral communication (Hinofotis, 1983) and pronunciation (Purcell, 1983), have confirmed that language proficiency is multi-componential, not unitary as Oller proposed. In addition, the results of subsequent research using more powerful factor analytic approaches were likewise not in compliance with the unitary trait hypothesis, namely that one general factor sufficiently accounts for all of the common variance in language tests (e.g., Bachman; Carroll, 1983; Bachman, Davidson, Ryan and Choi, 1995; Kunnan, 1995; Sasaki, 1996; Shin, 2005).

Douglas (2000: 25) concluded that "As has become clear in recent years through empirical studies conducted by language testers and others, language knowledge is multi-componential; however, what is extremely unclear is precisely what those components may be and how they interact in actual language use." Recent studies on the competence structure of language tests such as the computer-based TOEFL (for example, Stricker et al., 2005, 2008; Sawaki et al., 2008, 2009) identify specific first-order factors, such as listening, reading, speaking and writing factors corresponding to the four language skills, providing more support for the view that language ability is divisible.

The current consensus in the field of language testing is that second language ability is multi-componential, with a general factor as well as smaller factors (Oller, 1983; Carroll, 1983). In general, recent research agrees that language proficiency most probably consists of a general higher-order factor and several distinct first-order ability factors (e.g., Bachman; Carroll, 1983; Bachman, Davidson; Fouly et al., 1990; Bachman, Davidson, Ryan and Choi, 1995; Sasaki, 1996; Choi et al., 2003; Shin, 2005; Sawaki et al., 2008, 2009; Stricker et al., 2008). However, there is no consensus on the exact factor structures identified. Some studies found correlated first-order factors (e.g., Bachman; Sang et al., 1986; Kunnan, 1995; Stricker et al., 2005), while others found first-order factors together with a higher-order general factor (e.g., Bachman; Sasaki, 1996; Shin, 2005; Stricker et al., 2008; Sawaki et al., 2008, 2009; Bae; Parshall et al., 2002).

2.2.1 Comparability between PBT and CBT

Score equivalence across administration modes has been defined as follows (Bugbee, 1996): Scores from conventional and computer administrations may be considered equivalent when (a) the rank orders of scores of individuals tested in alternative modes closely approximate each other, and (b) the means, dispersions and shapes of the score distributions are approximately the same, or have been made approximately the same by rescaling the scores from the computer mode.
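To make criteria (a) and (b) concrete, the sketch below shows one way they might be operationalized, assuming a counterbalanced design in which the same examinees take both the paper and the computer form. This is a minimal illustration, not a procedure from the studies reviewed here: the function names, the thresholds (a minimum Spearman correlation of .90, a ±10% band on the SD ratio, alpha of .05) and the choice of statistical tests are all illustrative assumptions, as is the linear mean-sigma rescaling used for the "made approximately the same by rescaling" option.

```python
# Minimal sketch of the equivalence criteria quoted above (Bugbee, 1996).
# Assumes a counterbalanced design: each examinee has both a PBT and a
# CBT score. All names and thresholds are illustrative, not from the text.
import numpy as np
from scipy import stats


def check_mode_equivalence(pbt, cbt, min_rank_r=0.90, alpha=0.05):
    """Return pass/fail flags for criteria (a) and (b)."""
    pbt = np.asarray(pbt, dtype=float)
    cbt = np.asarray(cbt, dtype=float)

    # (a) Rank orders of individuals across modes should closely agree:
    # Spearman rank correlation between the two score vectors.
    rank_r, _ = stats.spearmanr(pbt, cbt)

    # (b) Means, dispersions and shapes approximately the same.
    _, p_means = stats.ttest_rel(pbt, cbt)        # paired test of means
    sd_ratio = cbt.std(ddof=1) / pbt.std(ddof=1)  # dispersion check
    _, p_shape = stats.ks_2samp(pbt, cbt)         # rough shape check; KS
                                                  # assumes independent
                                                  # samples, so approximate
    return {
        "rank_order_ok": rank_r >= min_rank_r,
        "means_ok": p_means >= alpha,  # no detectable mean difference
        "dispersion_ok": 0.90 <= sd_ratio <= 1.10,
        "shape_ok": p_shape >= alpha,
    }


def rescale_cbt_to_pbt(pbt, cbt):
    """Linear (mean-sigma) rescaling of CBT scores onto the PBT scale,
    one simple reading of 'made approximately the same by rescaling'."""
    pbt = np.asarray(pbt, dtype=float)
    cbt = np.asarray(cbt, dtype=float)
    z = (cbt - cbt.mean()) / cbt.std(ddof=1)
    return z * pbt.std(ddof=1) + pbt.mean()
```

In operational testing programs the rescaling step is usually carried out with IRT-based or equipercentile equating rather than a linear transformation; the moment-matching version above is only the simplest instance of criterion (b).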
Since the 1990s, score comparability between the paper-based test and the computer-based test has been investigated extensively; however, the results of previous studies contradict one another. Some studies indicated that the CBT scores were equivalent to the PBT scores (Bergstrom, 1992; Bugbee, 1996; Boo; Wang, Newman; Choi; Johnson; Godwin, 1999; Pommerich), while others, reviewed below, found mode differences (Choi and Tinkler, 2002; Coon, McLeod, and Thissen, 2002). For the purpose of this paper, the review of comparability studies between the paper-based and the computer-based tests focuses on those addressing the commonality of score meaning across delivery modes, that is, on paper or by computer.

Choi and Tinkler (2002) evaluated Oregon students in the third and tenth grades (800 students for each grade) with multiple-choice reading and mathematics items delivered on paper and by computer. They found that items presented on computer were more difficult than those presented on paper, and that this difference was more pronounced for third-grade students and for reading items.

Coon, McLeod, and Thissen (2002) conducted a similar study with students from the North Carolina Department of Public Instruction. They assessed third graders on reading items and fifth graders on math items, all in multiple-choice form, with roughly 1,300 students in each grade taking paper test forms and 400 students taking the same test forms on computer. Results indicated that scores were not comparable for either grade: the scale scores of the paper tests were higher than those of the online tests. In addition, the mode differences were not the same across forms within grades, and a delivery-mode by ethnic-group interaction, for example, indicated that mode differences might vary among population groups. The authors further pointed out that this lack of consistency suggested that comparability between these particular tests in the two administration modes could not be achieved by a simple score equating using data from the total population.

Choi et al. (2003) addressed the issue of the comparability between the comp