Validity refers to evidence that supports the way test scores are used and the impact these uses can have on individuals. We use test scores to make inferences about what students know and can do. Validity determines which of these inferences we can justifiably make from the scores.
Content validity is one source of evidence that allows us to make claims about what a test measures. It is the degree to which the content of a test is representative of the domain it is intended to cover. To use a test to describe achievement, we must have evidence that the test measures what it is intended to measure. For instance, if we want to make statements about how a student reads after administering a test, the test must comprehensively measure the most important and relevant topics and skills in reading. All educational assessments aim to reason from specific things students do, make, or say to broader inferences about their knowledge and abilities. Without evidence of content validity, we cannot have confidence in these inferences.
The purposes of a test define how the test should be used, who should use it, who should take it, and what types of interpretations should be based on the results. This is why the purposes of a test must be clearly stated at the outset of the assessment development process. Once the test purpose is defined, the test can be developed such that the outlined purposes are always at the forefront of the development process. Then it becomes possible to evaluate how the items are selected, how a test is used, and what is done with the results relative to the articulated test purpose.
Once the test purpose is clear, it is possible to develop an understanding of what the test is intended to cover. It is the test developers’ responsibility to provide specific evidence related to the content the test measures. In evaluating large-scale assessments, such as the Iowa Assessments™, this requires a very specific statement of the test content, or test domain. Often this comes in the form of content and performance standards as well as test specifications, which together outline what can be covered on an assessment. It is possible to think of the process of defining test content in terms of concentric circles (Figure 1). The largest and most encompassing circle is the construct. The construct is the concept or characteristic that a test is designed to measure. It may be a broad range of knowledge and skills represented by subject area domains. Next, it is necessary to identify the student behaviors that are examples of those constructs, and then determine what types of tasks or situations can be used to elicit those behaviors.
Once criterion behaviors are established, it is possible to develop content and performance standards that appropriately communicate them. From there, we can define the target domain and the types of items that appropriately sample that domain by creating test specifications to guide development of the test.