Validity Definition
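Validity refers to the correctness of the inferences made on the basis of some measurement. That is, when we measure something, we need to ask whether the measurements we have taken accurately and completely reflect what we intended to measure. For example, inferences about individual differences in people's height based on the observed scores produced by a (normal) tape measure or ruler are highly valid. When used appropriately, the tape measure yields observed measurements (e.g., inches, millimeters, feet) that correspond closely to actual differences in height.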
Common Misconceptions
It is common to hear people refer to the “validity of the test,” which might give the impression that validity is a property of the measurement device. However, this is incorrect. Validity is not a property of any assessment device; rather, it is a property of the inferences that you—the test user—make. For example, consider once again the tape measure. We might be tempted to say that “the tape measure has validity.” However, if we made inferences about differences in intelligence based on that same set of measurements rather than differences in height, those inferences would likely be highly incorrect. Nothing has changed about the tape measure or the set of measurements generated from its application. What has changed is the inference about what is being measured.
Although this might seem an absurd example (presumably no one would use a tape measure to measure intelligence), it demonstrates that validity is not a property of the measurement instrument but of the inference being made. The phrase “the test has validity,” though technically inappropriate, is often used because there is a general assumption about which inferences are (and are not) to be made from the use of a well-known measurement device. For example, testing experts may say, “The Wonderlic has good validity.” On the surface, this may seem profoundly inaccurate; however, it should be understood that this statement actually means (or at least, should mean), “Inferences regarding individual differences in general mental ability, and inferences regarding the probability of future outcomes such as job performance, are generally appropriate by relying on observed scores generated from the appropriate use of the Wonderlic.” That we sometimes use shorthand to abbreviate such a long statement should not be taken to imply that validity is a property of the test. Rather, it should be interpreted as suggesting there is reliable and verifiable evidence to support the intended set of inferences from the use of a given measurement device.
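A second common misconception is that there are different types of validity. Instead, validity is best thought of as a unitary concept that addresses how completely and accurately a measure assesses what it is intended to measure. However, no single method or strategy can provide all the evidence needed to make accurate or confident inferences. Consequently, multiple strategies exist for generating such evidence. Often, these strategies (or more appropriately, the evidence generated from these strategies) are referred to as types of validity. This is an unfortunate choice of words because it frequently leads to the misconception that validity is many different things and that some types of validity are more or less useful than others. Validity is a single, unified idea: it concerns the degree to which the differences we observe in measurements can be used to make accurate and confident inferences about some unobservable phenomenon.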
Typical Approaches to Generating Validity Evidence
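Industrial and organizational (I/O) psychologists are often concerned with whether a given measurement device can be relied on with confidence to make accurate decisions about hiring and promotion. To this end, I/O psychologists attempt to relate the knowledge, skills, or abilities required for some job (identified through job analysis) to some identified job demand or criterion. However, this process requires making many different inferences, which in turn requires a great deal of evidence to support them. For example, it is necessary to ensure that the predictor and criterion measures accurately and fully reflect the job requirements and job demands they are intended to reflect. It is also necessary to obtain evidence that the two measures are systematically related and that the relationship is not the result of some extraneous factor that was inadvertently assessed. To obtain the evidence needed to support such a large set of inferences, I/O psychologists typically use three general approaches: (a) content validity, (b) criterion-related validity, and (c) construct validity.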
Content Validity Inferences
The term content validity typically refers to inferences regarding the degree to which the content on a measurement device adequately represents the universe of possible content denoting the targeted construct or performance domain. There are a variety of methods or strategies that are useful for generating evidence to support content validity inferences; however, to establish the relevance of any evidence, it is first necessary to clearly define the performance domain or construct of interest and to identify the specific objectives for the assessment tool’s use (i.e., develop test specifications). These two activities circumscribe the universe of relevant content and constrain the set of inferences that one hopes to support.
Criterion-Related Validity Inferences
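Criterion-related validity refers to the degree to which observed scores can be used to make useful inferences (i.e., accurate predictions) about future behavior or outcomes. Typically, evidence for criterion-related validity comes from the correlation between a predictor and a criterion measure. Of course, to support useful inferences of criterion-related validity, it is first necessary to identify a theoretically meaningful criterion construct (i.e., what types of future behaviors or outcomes should be related to, or affected by, the construct represented by the predictor) and to ensure that the measure of that criterion construct has strong evidence of content validity.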
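The predictor-criterion correlation mentioned above can be illustrated with a minimal Python sketch. It assumes the Pearson product-moment correlation is the statistic of interest, and the data (hypothetical test scores at hiring and later supervisor ratings) are invented purely for illustration; the names `pearson_r`, `predictor_scores`, and `criterion_ratings` are not from any particular library or study.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable has zero variance")
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data (invented for illustration only):
# predictor = test scores at hiring, criterion = later supervisor ratings
predictor_scores = [12, 15, 19, 22, 25, 28, 30, 34]
criterion_ratings = [2.1, 2.8, 2.5, 3.4, 3.9, 3.6, 4.4, 4.7]

validity_coefficient = pearson_r(predictor_scores, criterion_ratings)
print(f"predictor-criterion correlation: {validity_coefficient:.2f}")
```

A large correlation of this kind is evidence for the predictive inference, not proof of it. As the paragraph above notes, the evidence is only as meaningful as the criterion measure itself.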
Construct Validity Inferences
The attempt to establish evidence for construct validity inferences is tantamount to theory testing. Construct validity encompasses a wide set of inferences regarding the nature of the psychological construct and its place in a larger nexus of constructs. In a sense, all validity inferences are part of construct validity. For example, strong support for content validity inferences can be used to support claims concerning the construct that is being measured by the assessment device. Criterion-related validity evidence is useful, too; a content-valid measure of a given construct should be related to (content-valid measures of) other constructs nearby in the nomological network and should not be related to (content-valid measures of) constructs that are far removed from the nomological network. Often, this type of evidence is referred to as convergent and discriminant validity, respectively. It is in this sense that construct validity is similar to theory testing. The definition of the construct and its relation to other constructs is in fact a mini-theory that produces specific hypotheses regarding the results of the measurement process. If most or all of those hypotheses are supported, we can be confident in the assessment device’s utility for generating observed scores, which, in turn, can be used to make a limited set of accurate inferences.
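The convergent and discriminant pattern described above can be sketched in Python as a pair of hypotheses about correlations. Everything here is a hypothetical illustration: the three score lists are invented, the choice of height as the "far removed" construct simply echoes the tape measure example, and the helper `pearson_r` is written out by hand rather than taken from a library.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical scores for the same eight people (invented for illustration)
new_ability_test = [10, 14, 18, 21, 25, 27, 31, 35]           # measure under study
established_ability_test = [11, 13, 19, 20, 26, 29, 30, 36]   # nearby construct
height_cm = [172, 165, 180, 168, 175, 163, 178, 170]          # far-removed construct

# Hypothesis 1 (convergent): strong correlation with a measure of a nearby construct
convergent_r = pearson_r(new_ability_test, established_ability_test)

# Hypothesis 2 (discriminant): little or no correlation with an unrelated construct
discriminant_r = pearson_r(new_ability_test, height_cm)

print(f"convergent r   = {convergent_r:.2f}")
print(f"discriminant r = {discriminant_r:.2f}")
```

If both hypotheses hold (a high convergent correlation and a near-zero discriminant correlation), the pattern supports the mini-theory about the construct. A failure of either would call the intended inferences into question.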