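Computer assessment, Web-based assessment, and computer adaptive testing (CAT) all refer to a class of personnel instruments that use computer technology for selection and assessment. The most general of these terms, computer assessment, refers to any assessment instrument presented through a computer interface. Web-based assessment is a specialized form of computer assessment that relies on the capabilities of the World Wide Web to deliver the assessment. Finally, CAT uses computer technology to administer tests in an unconventional manner. Essentially, this form of testing adapts to the test taker based on his or her past responses, producing a testing experience tailored to that particular individual.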
Computer Assessment
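The simplest form of computer assessment involves taking a paper-and-pencil instrument and presenting its items on a computer. These tests are often referred to as page-turner tests because the computer's technology is used only to move test takers from one item to the next, much like turning from page to page in a traditional paper-and-pencil assessment. However, more extensive use of computer technology can be integrated into the assessment system. For example, computers allow test developers to include multimedia elements such as audio and video files in their assessments. In addition, computer technology affords more opportunities for interactive assessment than a paper-and-pencil format allows. For example, an assessment could include a computerized "in-basket" that responds to an individual's actions or provides information when queried.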
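Web-Based Assessment
Web-based assessment takes computerized assessment one step further by incorporating the Internet into the assessment process. The capabilities of the Internet allow for greater flexibility: assessments can be administered in a wide variety of locations and at different times without the need for specialized software. Web-based assessment allows test takers to complete assessment batteries in the comfort and privacy of their own homes, at whatever time is most convenient, without a proctor to administer the test. Internet technology also creates opportunities for unique assessments that are not possible using computers alone. For example, Web-based interviews can use videoconferencing technology to conduct interviews that fall somewhere between face-to-face and telephone interviews.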
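Computer Adaptive Testing
Computer adaptive testing represents a distinctly different application of computer technology in assessment. Conventional tests typically consist of a fixed set of items to which all test takers are exposed. Because most tests contain a mixture of easy, moderately difficult, and difficult items, some test takers are exposed to items that are inappropriate for their ability. For example, high-ability test takers are required to answer some very easy items, and low-ability examinees are forced to wrestle with some extremely difficult items. Because high performers tend to get all of the easy items correct, these items do not help to differentiate among high-ability examinees. The same is true for low-ability examinees, who have little chance of success on difficult items. Because these inappropriate items do not differentiate among test takers of similar ability, a more efficient solution is to ask test takers to respond only to items that are appropriate for their ability level. This is where CAT comes into play.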
The process of CAT is as follows: An examinee responds to an initial item that is presented. The adaptive test then uses a statistical model called item response theory to generate an estimate of the examinee’s ability, and based on this estimate, the computer selects an item of appropriate difficulty to be presented next. This procedure continues iteratively after each item has been answered until some criterion for stopping the test is reached, such as answering a pre-specified number of items, reaching a time limit, or achieving a certain level of measurement precision. Because the presentation of items is tailored to examinees, test takers no longer have to answer questions that are extremely easy or exceedingly difficult for them. Because inappropriate items have been eliminated, adaptive testing procedures are much more efficient. In addition, because the items presented are tailored to a particular individual, CAT provides a more precise estimate of a test taker’s true score.
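The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production algorithm: it assumes the one-parameter (Rasch) item response model, estimates ability as a posterior mean over a grid with a standard normal prior, picks the unused item whose difficulty is closest to the current estimate, and stops at an item limit or a precision target. The function names (`p_correct`, `eap_estimate`, `run_cat`), the item bank, the simulated examinee, and the thresholds are all hypothetical choices made for illustration.

```python
import math
import random

def p_correct(theta, b):
    """Rasch (1PL) probability of a correct answer to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Candidate ability values from -4 to 4 (an assumed range).
GRID = [-4.0 + 0.1 * i for i in range(81)]

def eap_estimate(responses):
    """Posterior mean and SD of ability over GRID, standard normal prior.

    responses is a list of (difficulty, correct) pairs.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for b, correct in responses:
            p = p_correct(theta, b)
            w *= p if correct else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(item_bank, answer, max_items=30, target_sd=0.45):
    """Administer items adaptively until a stopping rule is met."""
    remaining = list(item_bank)
    responses = []
    theta, sd = eap_estimate(responses)  # starts at the prior mean
    while remaining and len(responses) < max_items and sd > target_sd:
        # Under the Rasch model, item information p(1 - p) is largest
        # when the item difficulty is closest to the ability estimate.
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        responses.append((b, answer(b)))
        theta, sd = eap_estimate(responses)  # re-estimate after each answer
    return theta, sd, responses

# Simulated examinee with an assumed true ability of 1.0.
rng = random.Random(0)
bank = [-3.0 + 0.1 * i for i in range(61)]
true_theta = 1.0
theta, sd, responses = run_cat(
    bank, lambda b: rng.random() < p_correct(true_theta, b)
)
print(f"items used: {len(responses)}, estimate: {theta:.2f}, SD: {sd:.2f}")
```

The two stopping rules in `run_cat` correspond to the item-count and measurement-precision criteria mentioned above; a time limit could be added as a third condition in the same `while` test.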
This process of tailoring items to a particular examinee creates a testing process that is quite unlike conventional tests. An obvious result of CAT is that all test takers do not receive the same items. Unlike conventional tests, which administer a fixed set of items to all examinees, CAT presents items based on individual response patterns. Thus, two examinees taking the test in the same place and at the same time might receive two completely different sets of questions. Computer adaptive tests also differ in the way an individual’s score is calculated. On conventional tests, an individual’s test score is determined by the number of questions he or she answered correctly. However, on an adaptive test, scores are not based solely on the number of items answered correctly but also on which items were answered correctly. Test takers are rewarded more for answering difficult questions correctly than for answering easy questions correctly. Unlike traditional paper-and-pencil tests, which allow test takers to skip items and return to them later, review their answers to previous items, and change answers to items already completed, adaptive tests usually do not permit any of these actions. Instead, test takers must advance through adaptive tests linearly, answering each question before moving on to the next one, with no opportunity to go back.
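The scoring difference can be illustrated with a small sketch, again assuming the Rasch model and using hypothetical item difficulties. Two examinees each answer three of five items correctly, but one was routed to easy items and the other to hard items, so their maximum-likelihood ability estimates differ even though their number-correct scores are identical. The helper `ml_ability` is an illustrative name, and bisection is just one simple way to solve the likelihood equation.

```python
import math

def p_correct(theta, b):
    """Rasch (1PL) probability of a correct answer to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ml_ability(responses, lo=-6.0, hi=6.0, tol=1e-6):
    """Maximum-likelihood ability under the Rasch model, found by bisection.

    responses is a list of (difficulty, correct) pairs and must contain
    at least one correct and one incorrect answer.
    """
    n_correct = sum(1 for _, correct in responses if correct)

    def score_gap(theta):
        # Observed number correct minus the expected number correct;
        # the estimate is the ability at which this gap is zero.
        return n_correct - sum(p_correct(theta, b) for b, _ in responses)

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score_gap(mid) > 0:
            lo = mid  # ability must be higher to explain the score
        else:
            hi = mid
    return (lo + hi) / 2.0

# Both examinees answer 3 of 5 items correctly (hypothetical data).
easy_items = [(-2.0, True), (-1.5, True), (-1.0, True), (-0.5, False), (0.0, False)]
hard_items = [(0.5, True), (1.0, True), (1.5, True), (2.0, False), (2.5, False)]

theta_easy = ml_ability(easy_items)
theta_hard = ml_ability(hard_items)
print(f"easy items: {theta_easy:.2f}, hard items: {theta_hard:.2f}")
```

In the Rasch model the difference arises from which items each examinee was given; models that also include item discrimination parameters weight individual items differently as well.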
Advantages
The integration of technology and assessment confers a number of advantages, the most obvious being the potential for new and different types of assessment. Computerized multimedia or interactive assessments have the potential to increase the perceived realism of the assessment, thereby improving face validity and even criterion-related validity. In addition, novel performance indexes can be collected using computer technology, such as response latencies, which may further enhance validity or reduce adverse impact. In the case of CAT, these assessments are considerably more precise and efficient, taking one third to one half the time of a conventional test. Adaptive tests also provide increased test security. Because test takers receive items that are tailored specifically to them, it is virtually impossible to cheat. Similarly, conventional computerized tests may be more secure because there are no test forms that can be compromised.
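Another advantage of computer assessments is their ability to give test takers instantaneous feedback about their performance. Computerized assessments are also economically advantageous, despite the high up-front cost of the technology. Because there are no printing costs, the cost of creating new tests or switching to a different test form is negligible. No manpower is needed to score the assessments or compile data from individuals' responses, which makes the results less prone to error. In the case of Web-based assessment, the cost of test proctors can be eliminated. Web-based assessment confers additional advantages and cost savings because it can be administered anywhere that Internet access is available.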
Critical Issues
As with any new technology, there are a number of potential pitfalls that must be avoided to make full use of these techniques. One major concern with the use of technologically sophisticated assessments is adverse impact, especially because of the known disparity in access to technology among different groups. Test security must also be managed differently when technology is involved. Computerized assessments can enhance test security because there is no opportunity for test forms or booklets to be compromised. However, test administrators must protect the computerized item banks as well as the computerized records of individuals’ responses. Unproctored Web-based assessment creates the additional security dilemma of not knowing exactly who might be taking a particular assessment device or whether the respondent is working alone or getting assistance.
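It is also important to consider the measurement equivalence of the new procedures. The concept of measurement equivalence concerns whether a test administered by computer produces scores equivalent to those obtained on the paper-and-pencil version of the same test. Research indicates that adaptively administered tests are equivalent to conventionally administered assessments. Tests of cognitive ability also yield similar scores whether they are administered in paper-and-pencil or computerized format. However, the equivalence of scores decreases dramatically on speeded tests (tests with stringent time limits). Care should be taken when computerizing noncognitive assessments because the measurement equivalence of noncognitive batteries remains relatively unknown.
The issue of measurement equivalence also extends to proctored versus unproctored computerized tests. Web-based procedures afford more opportunities for assessments to be completed in unproctored settings, but the question remains whether scores obtained without a proctor are equivalent to those that would be obtained under proctored administration.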
Finally, because technologically sophisticated assessment procedures are very different from traditional procedures that applicants are accustomed to, test takers’ reactions to the procedures must be taken into account. This is particularly true for testing procedures that use novel item types or for CAT, which uses an unfamiliar testing procedure.
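Conclusion
Computer assessment, Web-based assessment, and CAT offer numerous advantages over conventionally administered assessments and are likely to dominate the field of selection and assessment in the future. However, care must be taken when implementing these technologically sophisticated assessments to ensure that the procedures are reliable and valid.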