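An important criterion by which psychological measures are judged is the degree to which their scores reflect persons' true standing on an attribute of interest, such as cognitive ability or conscientiousness. Measurement theories recognize that scores on a measure reflect at least two components: a true component and an error component. Although theories differ in how they define these components, in the degree to which the components are related, and in the types of error on which they focus, they all share a concern for measurement error. Generalizability theory (G-theory) is a measurement theory that provides methods for estimating the contribution of multiple sources of error to scores, along with a single index of their combined effect, namely a generalizability coefficient (G-coefficient).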
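Foundations of G-Theory
At the root of G-theory is the idea that the variability in persons' scores that is attributable to error (i.e., error variance) can be partitioned into components, each reflecting a different source of error. For example, in attempting to measure a person's level of interpersonal skill using an interview, error may arise from
- the specific questions asked, such as differences in how interviewees interpret a question;
- the particular interviewer conducting the interview, such as differences in the interviewer's familiarity with each interviewee; and
- the particular occasion on which the interview is conducted, such as the interviewee's mood on the day of the interview.
All of the differences noted above may affect a person's interview score for reasons that have nothing to do with the person's interpersonal skill. By adopting a fine-grained approach to examining error, G-theorists gain critical insight into the factors that degrade the quality of their measures.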
Partitioning Variance in G-Theory
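In G-theory, variance in scores is typically partitioned using analysis of variance (ANOVA). The type of ANOVA conducted follows from the measurement design, which describes how a given attribute is measured. G-theorists describe measurement designs in terms of facets of measurement, that is, the sets of measurement conditions under which data on the objects of measurement (the entities being measured) are gathered. Continuing the interview example, the facets of measurement might include questions, interviewers, and occasions, whereas the objects of measurement would be the interviewees. In G-theory, the facets and objects of measurement serve as factors in an ANOVA model, which is used to generate estimates of their contributions (and those of their interactions) to variance in scores.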
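The partitioning described above can be sketched for the simplest case. The example below assumes a fully crossed interviewees-by-questions design with one score per cell and uses the usual expected-mean-square equations for a random-effects ANOVA. The function name and the toy data are hypothetical, and with one score per cell the interviewee-by-question interaction cannot be separated from residual error.

```python
def variance_components_pxq(scores):
    """Estimate variance components for a crossed persons x questions design.

    `scores` is a list of rows, one row per person and one column per
    question, with exactly one score per cell (an assumption of this sketch).
    """
    n_p = len(scores)
    n_q = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_q)
    person_means = [sum(row) / n_q for row in scores]
    question_means = [sum(row[j] for row in scores) / n_p for j in range(n_q)]

    # Sums of squares for the two main effects and the residual.
    ss_p = n_q * sum((m - grand) ** 2 for m in person_means)
    ss_q = n_p * sum((m - grand) ** 2 for m in question_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_q

    # Mean squares.
    ms_p = ss_p / (n_p - 1)
    ms_q = ss_q / (n_q - 1)
    ms_res = ss_res / ((n_p - 1) * (n_q - 1))

    # Solve the expected-mean-square equations; negative estimates are
    # truncated at zero, which is a common convention rather than a rule.
    return {
        "person": max((ms_p - ms_res) / n_q, 0.0),
        "question": max((ms_q - ms_res) / n_p, 0.0),
        "residual": ms_res,  # interaction confounded with random error
    }


# Toy data: three interviewees rated on two questions.
components = variance_components_pxq([[1, 2], [3, 4], [5, 6]])
print(components)
```

In this toy data set every score is exactly an interviewee effect plus a question effect, so the residual component is zero and all variance is assigned to the two main effects.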
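Defining True Variance and Error Variance in G-Theory
The estimates of variance attributable to the objects of measurement, the facets, and their interactions are commonly called variance components. The variance component associated with the objects of measurement is interpreted as an estimate of true variance: the amount of variability in scores attributable to differences between the objects of measurement (e.g., interviewees) on the attribute of interest (e.g., interpersonal skill). G-theorists refer to this variance as universe score variance. Whether a particular variance component is interpreted as error depends on the types of inferences the researcher wishes to draw about the objects of measurement and on the facets of measurement across which the researcher wishes to generalize scores.
To illustrate this dependence, consider the interview example discussed earlier. If inferences are limited to the relative ordering of interviewees on interpersonal skill, then only those sources of variation that cause interviewees to be ordered differently on interpersonal skill would be defined as error. In G-theory this type of error is referred to as relative error. Relative error arises from interactions between the objects of measurement (e.g., interviewees) and the facets of measurement (e.g., questions and interviewers). For example, the larger the interviewee-by-question interaction, the more the relative ordering of interviewees on interpersonal skill depends on the question asked. When error is defined in relative terms, G-coefficients reflect the consistency with which the objects of measurement are ordered across the facets of measurement. Technically, a G-coefficient is defined as the ratio of universe score variance to universe score variance plus error variance, and it ranges from 0 to 1.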
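As a worked illustration, assume a design in which every interviewee (p) answers the same n_q questions (q), with one score per interviewee-question pair, and scores are averaged over questions. The symbols below follow common G-theory notation and are not taken from the text above. Under those assumptions, relative error variance and the G-coefficient can be written as:

```latex
\sigma^2_{\delta} = \frac{\sigma^2_{pq,e}}{n_q},
\qquad
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{\delta}}
```

Here \sigma^2_p is universe score variance and \sigma^2_{pq,e} is the interviewee-by-question interaction confounded with residual error. Because the interaction term is divided by n_q, averaging over more questions shrinks relative error and moves the coefficient toward 1.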
Alternatively, someone might wish to make inferences about persons’ true standing on some attribute compared with a fixed standard such as a cut score or performance standard. Such absolute comparisons are labeled criterion-referenced comparisons. In this case any source of variation causing an observed score to differ from a true score would be defined as error. In G-theory this type of error is referred to as absolute error. Absolute error includes not only interactions between the objects and facets of measurement but also main effects of the facets (e.g., variation in mean interpersonal skill scores across questions because of differences in question difficulty). Facet main effects do not contribute to relative error because they do not affect the relative ordering of objects of measurement; rather, they affect only the distance between objects’ observed scores and true scores. When error is defined in absolute terms, G-coefficients (often called phi-coefficients in this case) reflect an estimate of absolute agreement regarding the standing of the objects of measurement on the attribute of interest across facets of measurement.
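For the same kind of illustration, assume interviewees (p) crossed with n_q questions (q), one score per cell, and scores averaged over questions. The notation is conventional rather than drawn from the text. Absolute error variance then adds the question main effect to the interaction term, and the phi-coefficient takes the form:

```latex
\sigma^2_{\Delta} = \frac{\sigma^2_q}{n_q} + \frac{\sigma^2_{pq,e}}{n_q},
\qquad
\Phi = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{\Delta}}
```

Because \sigma^2_{\Delta} contains every term of relative error variance plus the nonnegative question main effect, \Phi can never exceed the relative G-coefficient computed from the same variance components.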
Decisions regarding which sources of variance are defined as error also depend on the facets across which the researcher wishes the scores to generalize. Returning to the interview example, to generalize interpersonal skill scores across questions, any inconsistency in the relative ordering of interviewees (or in mean score differences, if absolute error is a concern) across questions would be considered error.
Although the aforementioned example describes generalizing across a single facet (i.e., questions), there may be a need to generalize across other facets as well, such as interviewers. When considering two or more sources of error, there is the potential for interactions between the sources. For example, an interviewee’s interpersonal skill score may depend not only on the question used to assess interpersonal skill but also on the specific interviewer who rated the interviewee’s response to that question (i.e., an interviewee-by-question-by-interviewer interaction).
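To make the two-facet case concrete, assume a fully crossed design in which each interviewee (p) is rated on every question (q) by every interviewer (i). The notation is conventional and the fully crossed layout is an assumption of this sketch. Observed score variance then decomposes into seven components, and relative error collects the three that involve interviewees:

```latex
\sigma^2(X_{pqi}) = \sigma^2_p + \sigma^2_q + \sigma^2_i
  + \sigma^2_{pq} + \sigma^2_{pi} + \sigma^2_{qi} + \sigma^2_{pqi,e}

\sigma^2_{\delta} = \frac{\sigma^2_{pq}}{n_q} + \frac{\sigma^2_{pi}}{n_i}
  + \frac{\sigma^2_{pqi,e}}{n_q \, n_i}
```

The three-way term \sigma^2_{pqi,e} is the interviewee-by-question-by-interviewer interaction, confounded with residual error when there is one rating per cell.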
Limitations of Measurement Designs
A key insight made clear by G-theory is that not all measurement designs allow researchers to estimate the sources of error that may be of concern to them. For example, assume that in implementing the interview described previously, the same interviewer conducts one interview with each interviewee. Although error may arise from the particular interviewer used, as well as the particular occasion on which the interview was conducted, it is not possible to estimate the contribution of these sources of error to observed interview scores based on this measurement design. To determine whether the relative ordering of interviewees on interpersonal skill differs across interviewers or occasions, it is necessary to obtain ratings for each interviewee from multiple interviewers on multiple occasions. Thus the fact that this particular measurement design involved only one interviewer and a single administration of the interview prevents assessing the generalizability of interview scores across interviewers and occasions.
The measurement design in the aforementioned example is also problematic in that the estimate for true variance in interpersonal skill (if variance in observed interview scores were decomposed) partially reflects variance arising from the interviewee-by-interviewer and interviewee-by-occasion interactions. Eliminating the variance attributable to these interactions from the estimate of true variance requires multiple interviewers to rate each interviewee on multiple occasions. Thus the fact that a given measurement design prohibits estimating the impact of a source of error on observed scores does not imply that the error is eliminated from a measure. Indeed, the error is still present but hidden from the researcher’s view and, in this example, inseparable from the estimate of true variance.
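The confounding described above can be written out under a simplifying assumption: interviewers (i) and occasions (o) are random facets that are each sampled only once, so the design never observes them varying. The notation is conventional rather than taken from the text. In that case the quantity estimated as interviewee variance has the expected value:

```latex
E\!\left[\hat{\sigma}^2_p\right]
  = \sigma^2_p + \sigma^2_{pi} + \sigma^2_{po} + \sigma^2_{pio}
```

Only the first term is universe score variance. The remaining terms are interactions with the hidden facets, which is why a G-coefficient computed from such a design tends to overstate generalizability across interviewers and occasions.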
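G-Theory as a Process
When G-theory was introduced by Lee J. Cronbach and his colleagues more than 40 years ago, its application was conceptualized as a two-step process for developing and implementing generalizable measurement procedures. The first step is to conduct a generalizability study (G-study). The purpose of a G-study is to gather data on a given measure using a measurement design that allows the researcher to generate estimates of all sources of error of concern (thus avoiding the limitations raised in the previous section). With such estimates, the researcher can estimate the generalizability of the measure under a variety of potential measurement conditions. For example, based on the findings of a G-study, a researcher can estimate the number of observations needed for each facet of the measurement design (e.g., number of questions, interviewers, occasions) to achieve a desired level of generalizability. The second step, called a decision study (D-study), involves implementing the measurement procedure identified through the G-study to gather data on the persons about whom decisions are to be made.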
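The projection step can be sketched in a few lines. The example assumes a single facet, relative error only, and variance components already estimated in a G-study. The function names and the numeric values are hypothetical and serve only to show the arithmetic.

```python
def relative_g(var_person, var_residual, n_conditions):
    """Projected relative G-coefficient when scores are averaged over
    `n_conditions` conditions of a single facet (e.g., questions)."""
    relative_error = var_residual / n_conditions
    return var_person / (var_person + relative_error)


def min_conditions_for_target(var_person, var_residual, target,
                              max_conditions=100):
    """Smallest number of conditions whose projected G-coefficient reaches
    `target`, or None if it is not reached within `max_conditions`."""
    for n in range(1, max_conditions + 1):
        if relative_g(var_person, var_residual, n) >= target:
            return n
    return None


# Hypothetical G-study estimates: person variance 4.0, residual variance 6.0.
for n_questions in (1, 2, 5, 10):
    print(n_questions, round(relative_g(4.0, 6.0, n_questions), 3))

print(min_conditions_for_target(4.0, 6.0, target=0.75))
```

With these made-up components, five questions are the fewest that bring the projected coefficient to at least .75. A two-facet version would divide each interaction component by the number of conditions of the facets it involves.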
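Although practical constraints often preclude this two-step approach, it has important value for improving industrial/organizational (I/O) research. Specifically, it forces researchers to exercise forethought and to gain knowledge of the sources of error that are of concern in their measures. Armed with this knowledge, researchers can take steps to improve their measurement procedures by identifying where refinements are likely to have the greatest effect (i.e., by targeting the largest sources of error) before the measures are implemented to make real decisions (e.g., whom to hire, whom to promote). In other words, G-theory offers a clear process for building improved measures.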
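References:
- Brennan, R. L. (2001). Generalizability theory. New York: Springer-Verlag.
- Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1973). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley.
- DeShon, R. P. (1998). A cautionary note on measurement error corrections in structural equation models. Psychological Methods, 3, 412-423.
- DeShon, R. P. (2002). Generalizability theory. In F. Drasgow & N. Schmitt (Eds.), Measuring and analyzing behavior in organizations: Advances in measurement and data analysis (pp. 189-220). San Francisco: Jossey-Bass.
- Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 105-146). New York: American Council on Education and Macmillan.
- Shavelson, R. J., & Webb, N. M. (2001). Generalizability theory: A primer. Thousand Oaks, CA: Sage.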