Test validity


Test validity is an indicator of the extent to which a test measures what it is supposed to measure, as defined by E. Carmines and R. Zeller [1]. In practical terms, validity tells you how accurately a test captures the characteristic you are interested in.

A test's validity is established in reference to a specific purpose and specific groups, called reference groups. Test developers must determine whether their test can be used appropriately with the particular type of people (the target group) to be tested. Most importantly, the test should measure what it claims to measure, not some other characteristic [2]. In other words, the purpose of testing and the use of the information gathered must always be taken into account. On a test with high validity, the tested fields or competencies are closely linked to the test's intended purpose. Consequently, the higher the test's validity, the more relevant the outcome and information gathered from the assessment are to that purpose.

Types of validity - methods for conducting validation studies

As discussed by E. Carmines and R. Zeller, traditionally there are three main types of validity:

  • Criterion-related validity or Instrumental validity (concurrent and predictive) - calculates the correlation between your measurement and an established standard of comparison
  • Content-related validity or Logical validity - checks whether an assessment is the right representation of all aspects to be measured
  • Construct-related validity - ensures that the method of measurement relates to the construct you want to test

Another type of validity that various sources refer to is face-related validity. This concept refers to the extent to which an assessment appears to measure what it is supposed to measure [3].

These types tend to overlap - depending on the circumstances, one or more may be applicable.
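Criterion-related validity, described above as the correlation between a measurement and an established standard, is often reported as a correlation coefficient. The following is a minimal sketch, assuming the Pearson coefficient is an appropriate choice for the data; the function name `pearson_r` and the sample scores are hypothetical and serve only as an illustration.

```python
from math import sqrt


def pearson_r(test_scores, criterion_scores):
    """Pearson correlation between test scores and a criterion measure.

    Assumption: both sequences are numeric, paired by person, and of
    equal length (at least two pairs, each with non-zero variance).
    """
    if len(test_scores) != len(criterion_scores) or len(test_scores) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    n = len(test_scores)
    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n

    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(test_scores, criterion_scores))
    ss_x = sum((x - mean_x) ** 2 for x in test_scores)
    ss_y = sum((y - mean_y) ** 2 for y in criterion_scores)

    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: scores on a new test and an established criterion
# (for example, a supervisor rating) for the same six people.
new_test = [52, 61, 70, 74, 83, 90]
criterion = [3.1, 3.4, 3.9, 3.8, 4.5, 4.7]
print(f"criterion-related validity coefficient: {pearson_r(new_test, criterion):.2f}")
```

If the criterion is measured at the same time as the test, the coefficient speaks to concurrent validity; if it is measured later, it speaks to predictive validity.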

Measuring validity

The method of measuring content validity was developed by C. Lawshe in 1975 as a way of gauging agreement among raters about how essential a particular item is. Each expert responds to the following question for each item: "Is the skill or knowledge measured by this item 'essential,' 'useful, but not essential,' or 'not necessary' to the performance of the construct?". Based on these studies, Lawshe developed the content validity ratio formula \[ CVR = \frac{n_e - N/2}{N/2} \] where \( n_e \) = number of subject matter experts (SMEs) indicating "essential", \( N \) = total number of SMEs, and \( CVR \in [-1, +1] \).

Positive values indicate that more than half of the experts rated the item as essential, so that item has some content validity; a value of 0 means that exactly half did. The more panelists agree that a particular item is essential, the greater the level of content validity that item has [4].
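The content validity ratio can be computed directly from the panel's counts. Below is a minimal sketch of that calculation; the function name `content_validity_ratio` and the example panel of ten experts are hypothetical and are not taken from Lawshe's paper.

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Content validity ratio: CVR = (n_e - N/2) / (N/2).

    n_essential -- number of experts who rated the item "essential" (n_e)
    n_panelists -- total number of experts on the panel (N)
    """
    if n_panelists <= 0:
        raise ValueError("the panel must contain at least one expert")
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must lie between 0 and n_panelists")

    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 experts: number of "essential" ratings per item.
essential_votes = {"item_1": 10, "item_2": 5, "item_3": 2}
panel_size = 10

for item, votes in essential_votes.items():
    print(item, content_validity_ratio(votes, panel_size))
```

In this example, unanimous agreement gives a ratio of 1.0, an even split gives 0.0, and an item that only two of ten experts consider essential gives a negative ratio.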

Validity and reliability of tests

Fig. 1 Visual representation of reliability and validity

Both concepts of test theory are used to evaluate the quality of a test and to determine whether the way it measures something is adequate. They are closely related but refer to different properties. Reliability is about the consistency of repeated measurements, that is, the reproducibility of results, while validity refers to their accuracy.

A valid test should be reliable, but a reliable one is not necessarily valid, as reproducible results may not be correct.
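The difference can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not a standard psychometric procedure: it treats low reliability as large random spread across repeated measurements and low validity as a systematic offset from the true value. The helper names `simulate_scores` and `summarize`, and all numeric values, are hypothetical.

```python
import random
from statistics import mean, stdev


def simulate_scores(true_value, bias, noise_sd, n=1000, seed=0):
    """Repeated measurements of one true value.

    bias     -- systematic error (assumed here to stand for low validity)
    noise_sd -- random error spread (assumed here to stand for low reliability)
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]


def summarize(scores, true_value):
    """Average offset from the true value and spread of the measurements."""
    return {"offset": mean(scores) - true_value, "spread": stdev(scores)}


TRUE_VALUE = 100  # hypothetical true score

# Reliable and valid: small spread, no systematic offset.
valid_reliable = summarize(simulate_scores(TRUE_VALUE, bias=0, noise_sd=1), TRUE_VALUE)

# Reliable but not valid: small spread, but consistently off target.
reliable_only = summarize(simulate_scores(TRUE_VALUE, bias=10, noise_sd=1), TRUE_VALUE)

print("reliable and valid:    ", valid_reliable)
print("reliable but not valid:", reliable_only)
```

Both simulated tests produce tightly clustered results, but only the first clusters around the true value, which mirrors the point that reproducible results may still be incorrect.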

Footnotes

  1. Carmines E., Zeller R., 1979
  2. U.S. Department of Labor Employment and Training Administration, 1999
  3. English F., 2006
  4. Lawshe C., 1975

References

Author: Anna Strzelecka