What type of validity is achieved when a measure measures what it is presumed to measure?

Validity refers to whether a measure actually measures what it claims to be measuring. Some key types of validity are explored below.

Face validity

Face validity is a measure of whether, on subjective inspection, a tool appears to measure what it is supposed to.

  • e.g. It might be observed that people with higher scores in exams are getting higher scores on an IQ questionnaire; you cannot be sure that these are directly linked, but on the surface it appears that exam scores are a reasonable indication of IQ scores, so your measure shows good face validity.

Internal validity

Internal validity is a measure of whether results obtained are solely affected by changes in the variable being manipulated (i.e. by the independent variable) in a cause-and-effect relationship. Two key types of internal validity are:

  • Construct validity – asks whether a measure successfully measures the concept it is supposed to (e.g. does a questionnaire measure IQ, or something related but crucially different?).
  • Concurrent validity – asks whether a measure is in agreement with pre-existing measures that are validated to test for the same [or a very similar] concept (gauged by correlating measures against each other).

Internal validity can be assessed based on whether extraneous (i.e. unwanted) variables that could also affect results are successfully controlled or eliminated; the greater the control of such variables, the greater the confidence that a cause and effect relevant to the construct being investigated can be found.
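
Concurrent validity, as described above, is gauged by correlating a new measure against an established one. The following is a minimal sketch of that check in Python, assuming a Pearson correlation is an acceptable index of agreement. The function name `pearson_r` and both score lists are hypothetical, made up purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a measure has no variance")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same six people (illustrative numbers only).
new_questionnaire = [98, 105, 110, 120, 92, 101]
established_test = [100, 103, 112, 118, 95, 99]

r = pearson_r(new_questionnaire, established_test)
print(f"Correlation between the two measures: r = {r:.2f}")
```

A correlation close to 1 would suggest the new questionnaire agrees with the validated test. How high it needs to be is a judgment call that this sketch does not settle.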

External validity

External validity is a measure of whether data can be generalised to other situations outside of the research environment they were originally gathered in. Two key types of external validity are:

  • Temporal validity – this is high when research findings successfully apply across time (certain variables in the past may no longer be relevant now or in the future).
    • e.g. Changes in attitude towards gender roles over time could lower the temporal validity of data from past experiments when applied to modern day research.
  • Ecological validity – whether data is generalisable to the real world, based on the conditions research is conducted under and procedures involved.
    • e.g. Laboratory research can exert a high degree of control over extraneous variables that would otherwise vary in a natural environment, so results might be considered too ‘artificial’ and thus possess low ecological validity.
      • However, mice, for example, might behave in the same way in a laboratory and in the wild, so laboratory experiments could arguably still maintain high ecological validity here.

The external validity of an experiment can be assessed and improved by replicating a study at different times and places, and obtaining similar results. For example, confidence in the generalisability [and in turn external validity] of results is increased when research is successfully replicated across different cultures.

Chapter 4.


1. What is the difference between the reliability and validity of a measurement?

The validity of a measure is the extent to which differences in scores on the instrument reflect true differences among individuals on the characteristic the instrument is supposed to measure. The reliability of a measure is its consistency.

2. What is construct validity and how do you test for it? Give an example.

Construct validity has to do with your "construct" or "concept" being a single unidimensional one, and with your measuring instrument's ability to measure the intended construct and nothing more.

If the construct has several aspects or components, it will be difficult for any one measure to encompass all of them in all cases. Different approaches to measuring the construct produce different results because each gets at different aspects or because each comes from a different perspective. In this kind of situation you could say that you don't have a single unidimensional construct; you have a family of somewhat related but different constructs. In other words, you have a problem with construct validity. Bluntly, your construct is not valid.

If the construct is solid and coherent and doesn't have a range of aspects that interact with your measurement approach, and if various operational definitions of the construct result in identical or similar measurements, then the construct and measurement approach together have construct validity.

The best way to assess construct validity is to use an approach called "multiple methods, multiple measures." To do this, you use several approaches to measurement, where each uses different methods or comes at the construct from a different direction. If the various approaches produce results that agree with one another, you can feel more confident that you in fact have only one construct. If they disagree, the validity of the construct would be called into question. However, it doesn't necessarily mean that you have multiple constructs -- you may simply have problems with your measurement procedures.

In a study of the effects of watching lots of violent films and TV programs, for example, the researchers might be attempting to measure increases in the amount of aggressive behavior performed by a large group of young men in some specified social setting. If they understand "aggressive behavior" as overt physical belligerence and assault, their measurements will miss both explicit verbal aggression and more subtle forms of implicit or indirect aggression that are manifested by excluding others from conversation or from participating in social activities. A different group of researchers might have a different conceptual understanding of aggression and might thus tap into a different set of behaviors, resulting in measurements that disagree with those of the first team. There is more than one construct going under the name "aggressive behavior" -- there is a problem with construct validity here.
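
The "multiple methods, multiple measures" check can be sketched as a table of pairwise correlations between the different measurement approaches. The Python below is only an illustration under stated assumptions: the three measure names echo the aggression example, the scores are invented, and Pearson correlation is assumed as the index of agreement. `pearson_r` and `pairwise_agreement` are hypothetical helper names.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pairwise_agreement(measures):
    """Map each pair of measure names to the correlation of their scores."""
    return {
        (a, b): pearson_r(measures[a], measures[b])
        for a, b in combinations(measures, 2)
    }


# Invented aggression scores for the same five participants,
# each list obtained with a different (hypothetical) measurement approach.
measures = {
    "physical": [1, 2, 3, 4, 5],
    "verbal": [2, 3, 3, 5, 6],
    "indirect": [5, 3, 4, 2, 1],
}

agreement = pairwise_agreement(measures)
for (a, b), r in agreement.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```

In this toy data the physical and verbal measures agree closely, while the indirect measure runs in the opposite direction. As noted above, such disagreement may point to separate constructs or simply to problems with a measurement procedure.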

3. What are the consequences of a lack of construct validity? Give an example to illustrate your answer.

When there is a lack of construct validity, different approaches to measuring the construct will produce different results because each gets at different aspects or because each comes from a different perspective. See the example in the last paragraph of the answer to Question 2.

4. What is it about construct validity that makes it more difficult to assess than the other types of validity?

Construct validity is probably the most difficult issue to deal with when you are studying abstract constructs. To assess it you use a complex, time-consuming approach called "multiple methods, multiple measures." To do this, you use several approaches to measurement, where each uses different methods or comes at the construct from a different direction. If the various approaches produce results that agree with one another, you can feel more confident that you in fact have only one construct. If they disagree, the validity of the construct would be called into question. Even if the results disagree, though, you can't be certain that you have multiple constructs -- you may simply have problems with your measurement procedures.

5. What is the difference between construct validity and predictive validity? What would the consequences be if you confused the two and thought you had construct validity when you only had predictive validity?

Predictive or pragmatic validity means that the measurement makes correct predictions. Construct validity means that your "construct" or "concept" is a single unidimensional one, and that your measuring instrument is able to measure the intended construct and nothing more. If you think you have construct validity but you only have predictive validity, you may be able to make correct predictions based on the results of your measurement, but your predictions will not agree with those made by a person who takes a different approach to measure the same construct, or your predictions will only be correct for some of the people you measure; for other people they will be quite incorrect. This would mean that the measure only has predictive validity in a limited range of situations; outside of that limited range, the measure gives misleading results.

6. What is the difference between random error and systematic error?

Systematic error is what is normally called "bias." It shows up as results consistently being distorted in the same direction. When you have systematic error you have a problem with validity. Random error is not consistently in one direction. It varies from case to case, both in magnitude and in direction. It is what is normally called "unreliability."
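
The distinction can be made concrete with a small Python sketch, assuming the true value is known so that each reading's error can be computed. The mean error then estimates the systematic component (bias) and the standard deviation of the errors estimates the random component. The scale readings and the name `error_summary` are hypothetical.

```python
from statistics import mean, stdev


def error_summary(readings, true_value):
    """Split measurement error into an average shift (bias) and scatter (spread)."""
    errors = [reading - true_value for reading in readings]
    return {"bias": mean(errors), "spread": stdev(errors)}


true_weight = 70.0  # assumed known true value in kg (hypothetical)

# Invented readings: one scale reads consistently high, the other scatters.
biased_scale = [72.0, 72.1, 71.9, 72.0, 72.0]
noisy_scale = [68.5, 71.5, 70.5, 69.0, 70.5]

biased = error_summary(biased_scale, true_weight)
noisy = error_summary(noisy_scale, true_weight)

print(f"biased scale: bias = {biased['bias']:.2f}, spread = {biased['spread']:.2f}")
print(f"noisy scale:  bias = {noisy['bias']:.2f}, spread = {noisy['spread']:.2f}")
```

The first scale is consistent but distorted in one direction (systematic error), while the second is centred on the true value but varies from reading to reading (random error).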

7. What is the difference between the reliability of a measurement and its accuracy?

To say that a measure is reliable means that it will be consistent -- it will give the same results if you repeat the measurement. This doesn't say anything about how accurate the results are; they may be consistently wrong and severely biased. When you estimate the amount of time it will take you to do something, your estimate may be more or less accurate. The closer your estimate is to the actual amount of time it takes, the more accurate it is.

8. Why is the multiple methods, multiple measures approach used to assess construct validity?

Construct validity means that your construct is unidimensional and coherent and that it doesn't have a range of aspects that interact with your measurement approach -- that you have only one construct and not a family of related but different ones all mixed up under one name. If this is the case, then various operational definitions of your construct should result in identical or similar measurements. If you have a lack of construct validity, your construct will have several aspects or components, and it will be difficult for any one measure to encompass all of them in all cases. Different approaches to measuring the construct will produce different results because each gets at different aspects or because each comes from a different perspective. Using the multiple methods, multiple measures approach means that you are using different operational definitions and taking different approaches to measuring the construct. You can see that this procedure will allow you to tell whether or not you have a single unidimensional construct -- whether or not you have construct validity.

9. What is the difference between the accuracy and the validity of a measurement? (pages 30-33)

The validity of a measure is the extent to which it is sensitive only to differences in what it is supposed to be measuring. A measure can be valid but not completely reliable (see pages 31 and 32), which means that there is random error in the results (but not systematic error). The accuracy of a measure is how close to the actual correct value it gives. An accurate measure would thus be a valid one in which there is little or no measurement error.

10. A professor is studying learning and academic performance and uses GPA as a measure of how much her students have learned. Discuss why (or why not) this is a valid measure of how much her students have learned.

Think about this and consider how valid and reliable GPA is. Ask yourself what GPA actually measures. To help formulate an answer to this question, look at your own grades and think about what things, other than how much you learned, may influence your grades. For example, how much did you already know about the subject before the course began? What stressful events made your performance on examinations suffer?

What is the type of validity that is achieved when a measure measures what it is presumed to measure?

Validity, often called construct validity, refers to the extent to which a measure adequately represents the underlying construct that it is supposed to measure.

What is the validity of measurements?

Validity refers to how accurately a method measures what it is intended to measure. If research has high validity, that means it produces results that correspond to real properties, characteristics, and variations in the physical or social world. High reliability is one indicator that a measurement may be valid, although a reliable measure can still be consistently wrong.

Which type of validity exists when an inspection of the items used to measure a concept suggests that they are appropriate on their face?

Face validity is the type of validity that exists when an inspection of the items used to measure a concept suggests that they are appropriate "on their face". We can say that a measure has face validity if it obviously pertains to the concept being measured more than to other concepts.

What are the 3 main types of measurement validity?

Here we consider three basic kinds: face validity, content validity, and criterion validity.