Threats to Internal Validity

Quasi-Experimental Designs and Program Evaluation

Applied Research

      Goal: To improve the conditions in which people live and work.

      Natural settings: Messy, “real world” — hard to establish experimental control.

      Quasi-experiments: Experimental procedures that approximate the conditions of highly controlled laboratory experiments.

      Program Evaluation: Applied research used to learn whether real-world treatments work.

Characteristics of True Experiments

      In true experiments, researchers manipulate an independent variable with treatment and comparison condition(s), and exercise a high degree of control (especially through random assignment to conditions).

      A true experiment is one that leads to an unambiguous outcome regarding what caused a result on the dependent variable.

Obstacles to Conducting True Experiments in Natural Settings

      Researchers may experience difficulty obtaining permission to conduct true experiments in natural settings and gaining access to participants.

      People frequently view random assignment to treatment as unfair, because some people who may need treatment don’t receive it.

      However, random assignment is the best and fairest way to determine whether a new treatment really is effective.

      A waiting-list control group may be used so that people randomly assigned to the control group receive treatment after the study is completed.
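As a rough illustration of random assignment with a waiting-list control group, the sketch below shuffles a list of participant IDs and splits it in half. The function name assign_waiting_list, the 20 participant IDs, and the seed are assumptions made for this example, not part of the original material.

```python
import random

def assign_waiting_list(participants, seed=None):
    """Randomly split participants into treatment and waiting-list groups.

    Hypothetical helper for illustration; the waiting-list group would
    receive the treatment after the study is completed.
    """
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "waiting_list": shuffled[half:]}

# Assumed participant IDs 1..20
groups = assign_waiting_list(range(1, 21), seed=1)
print(groups)
```

Because chance alone decides who lands in each group, preexisting differences between participants should balance out across groups on average.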

Advantage of True Experiments: Threats to Internal Validity Are Controlled

      Threats to internal validity are confounds that serve as plausible alternative explanations for a research finding.

      In order to make a causal inference, researchers rule out these alternative explanations.

      There are eight general classes of confounds referred to as “threats to internal validity”: history, maturation, testing, instrumentation, regression, selection, subject attrition, and additive effects with selection.

Threats to Internal Validity


      History

When an event occurs at the same time as treatment and changes participants’ behavior, this event becomes an alternative explanation for the changes in participants’ behavior (rather than treatment); thus, participants’ “history” includes events other than treatment.


Threats to Internal Validity (continued)


      Maturation

Participants naturally change over time; these maturational changes, not treatment, may explain any changes in participants during the experiment.


Threats to Internal Validity (continued)


      Testing

Taking a test generally affects subsequent testing; thus, participants’ performance on a measure at the end of the study may differ from their initial performance, not because of treatment but because they have become familiar with the measure.


Threats to Internal Validity (continued)


      Instrumentation

Instruments used to measure participants’ performance may change over time (e.g., observers may become bored or tired); thus, changes in participants’ performance may not be due to treatment but to changes in the instruments used to measure performance.

Threats to Internal Validity (continued)


      Regression

Participants sometimes perform very well or very poorly on a measure because of chance factors (e.g., luck). These chance factors are not likely to be present in a second testing, so participants’ scores will not be so extreme; that is, the scores “regress to the mean.” These regression effects, not the effect of treatment, may account for changes in participants’ performance over time.

Regression (continued)

Test score = true score + error (chance factors, etc.)

One definition of an unreliable test or measure is that it measures with a lot of error.

If people score very high or low on the test, it’s possible that chance factors produced the extreme score.

On a second testing, those chance factors are less likely to be present (otherwise they wouldn’t be chance).
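The "test score = true score + error" idea can be sketched with a small simulation. Everything numeric here is an assumption chosen for illustration (10,000 simulated people, true scores with mean 100 and SD 10, error with SD 10); no treatment is applied between the two testings.

```python
import random
from statistics import mean

rng = random.Random(0)   # fixed seed so the simulation is repeatable
N = 10_000               # assumed number of simulated participants

# Each person has a stable true score; each testing adds fresh chance error.
true_scores = [rng.gauss(100, 10) for _ in range(N)]
test1 = [t + rng.gauss(0, 10) for t in true_scores]
test2 = [t + rng.gauss(0, 10) for t in true_scores]

overall_mean = mean(test1)

# Select the people with the most extreme (top 10%) scores on the first test.
cutoff = sorted(test1)[int(0.9 * N)]
top = [i for i in range(N) if test1[i] >= cutoff]

top_mean_1 = mean(test1[i] for i in top)
top_mean_2 = mean(test2[i] for i in top)

print(f"Overall mean, test 1:   {overall_mean:.1f}")
print(f"Top scorers, test 1:    {top_mean_1:.1f}")
print(f"Same people, test 2:    {top_mean_2:.1f}")
```

The selected group's second-test mean falls back toward the overall mean even though nothing was done to them, which is why regression can masquerade as a treatment effect when participants are chosen for their extreme scores.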

Threats to Internal Validity (continued)

      Subject Attrition

When participants are lost from the study (attrition), the group equivalence formed at the start of the study may be destroyed; thus, differences between treatment and control groups at the end of the study may be due to differences in those who remained in each group rather than to the effects of treatment.


Threats to Internal Validity (continued)


      Selection

When differences exist between individuals in treatment and control groups at the start of the study, these differences become alternative explanations for any differences observed at the end of the study (rather than treatment).

Threats to Internal Validity (continued)

      Additive Effects with Selection

When one group of participants responds differently to an external event (history), matures differently, or is measured more sensitively by a test (instrumentation), these threats (rather than treatment) may account for any group differences at the end of a study.


Threats to Internal Validity (continued)

      Important Points to Remember:

  When there is no comparison group in the study, the following threats to internal validity must be considered:

history, maturation, testing, instrumentation, regression, subject attrition, selection

  When a comparison group is added, the following threats to internal validity must be considered:

selection, additive effects with selection

Threats to Internal Validity (continued)

      Threats to internal validity that true experiments may not eliminate:


  Contamination,

  Experimenter expectancy effects, and

  Novelty effects (including Hawthorne effect)

      Threats to external validity occur when treatment effects may not be generalized beyond the particular people, setting, treatment, and outcome of the experiment.

  The best way to assess the external validity of findings is to replicate the experiment.




Threats to Internal Validity (continued)

      Contamination: This occurs when there is communication about the experiment between groups of participants.

  Three possible outcomes of contamination:

    resentment: some participants’ performance may worsen because they resent being in a less desirable condition;

    rivalry: participants in a less desirable condition may boost their performance so they don’t look bad; and

    diffusion of treatments: control participants learn about a treatment and apply it to themselves.



Threats to Internal Validity (continued)

      Expectancy Effects: These occur when an experimenter unintentionally influences the results of an experiment.

  Experimenters can make systematic errors in their interpretation of participants’ performance based on their expectations.

  Experimenters can make errors in recording data based on their expectations for participants’ performance.

Threats to Internal Validity (continued)

      Novelty Effects: These are changes in people’s behavior that occur simply because an innovation (e.g., a treatment) produces excitement, energy, and enthusiasm.

   A special case of novelty effects is the Hawthorne effect: performance changes when people know “significant others” (e.g., researchers, company bosses) are interested in them or care about their living or work conditions.

      Because of contamination, expectancy effects, and novelty effects, researchers may have difficulty concluding whether a treatment was effective.


      Quasi- (“resembling”) experiments provide an important alternative when true experiments are not possible.

      Quasi-experiments lack the degree of control found in true experiments.

      Researchers must seek additional evidence to eliminate threats to internal validity in a quasi-experiment.

The One-Group Pretest-Posttest Design

      This is a “bad experiment” and is sometimes referred to as a “pre-experimental design.”

  An intact group is selected for a treatment (e.g., a classroom of children, a group of employees).

  A pretest measure is used to record participants’ performance before treatment (O1, or “Observation 1”).

  The treatment (X) is implemented.

  A posttest measure is used to record performance following the treatment (O2).


                              O1 X O2



The One-Group Pretest-Posttest Design (continued)

      The one-group pretest-posttest design is a bad experiment because none of the threats to internal validity are controlled.

      Any change between pretest (O1) and posttest (O2) scores may be due to the treatment (X) or due to:

   History (some other event that coincided with treatment),

   Testing (the effects of repeated testing),

   Maturation (natural changes in participants over time),

   any of the other threats to internal validity.

Three Quasi-Experimental Designs

      Nonequivalent Control Group Design:

  A group similar to the treatment group serves as a comparison group.

  Researchers obtain pretest and posttest measures for individuals in both groups.

  Random assignment to groups is not used.

  Pretest scores are used to determine whether the groups are equivalent.

Example: Research Methods and Reasoning Ability

      Compare students in research methods courses with students in developmental psychology courses.

      DV: 7-item test of methodological and statistical reasoning ability

Nonequivalent Control Group (continued)

      Suppose group differences are observed at a posttest.

      Rule out threats to internal validity:

  By adding a comparison group, researchers can rule out threats due to history, maturation, testing, instrumentation, and regression.

  We assume that these threats affect both groups in the same way; therefore, they can’t be used to explain posttest differences.
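One common way to summarize a nonequivalent control group design is to check the pretest gap between groups and then compare their pretest-to-posttest gains. The sketch below assumes that approach; the scores on the 7-item test are invented for illustration and are not data from any study, and the group and function names are hypothetical.

```python
from statistics import mean

# Hypothetical scores (0-7) for five students per course, pretest and posttest.
methods = {"pre": [3, 4, 2, 3, 4], "post": [5, 6, 4, 5, 6]}
developmental = {"pre": [3, 3, 2, 4, 3], "post": [3, 4, 2, 4, 4]}

def mean_gain(group):
    """Average posttest-minus-pretest change within one group."""
    return mean(post - pre for pre, post in zip(group["pre"], group["post"]))

# How different were the groups before the course? (a selection check)
pretest_gap = mean(methods["pre"]) - mean(developmental["pre"])

# Did the research methods group gain more than the comparison group?
gain_difference = mean_gain(methods) - mean_gain(developmental)

print(f"Pretest gap between groups: {pretest_gap:.2f}")
print(f"Difference in mean gains:   {gain_difference:.2f}")
```

Threats that affect both groups equally would raise both gains by the same amount and cancel in the difference; a nonzero pretest gap is a reminder that selection remains a possible explanation.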

Nonequivalent Control Group (continued)

      What threats are not ruled out?


      Selection

   Because individuals are not randomly assigned to conditions, the two groups are not likely to be equivalent before the intervention (hence, “nonequivalent control”).

   These preexisting differences may account for group differences in the outcome at the end of the experiment.

Nonequivalent Control Group (continued)

Additive Effects with Selection: The two groups

   may have different experiences (selection-history effect), or

   may mature at different rates (selection-maturation effect), or

   may be measured more or less sensitively by the instruments (selection-instrumentation effect), or…

Nonequivalent Control Group (continued)


  Additive Effects with Selection: The two groups:

   may drop out of the study at different rates (differential subject attrition), or

   may differ in terms of regression to the mean (differential regression).



Simple Interrupted Time-Series Design

      Observe a dependent variable for some time before and after a treatment is introduced (often, archival data are used).

            O1   O2   O3   O4   X   O5    O6   O7   O8


      Look for a clear discontinuity in the time-series graph as evidence of treatment effectiveness.


Example: Study Habits

      Intervention: An instructional course to change students’ study habits, implemented during the summer following the sophomore year (after semester 4).

      DV: semester GPA
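One way to put a number on a discontinuity is to fit a trend line to the pre-treatment observations, project it forward, and compare the projection with what was actually observed after X. The sketch below assumes that approach for the study-habits example; the eight semester GPAs are invented for illustration, and fit_line is a hypothetical helper.

```python
from statistics import mean

# Hypothetical semester GPAs (O1..O8); the course (X) falls after semester 4.
gpa = [2.5, 2.6, 2.5, 2.6, 3.0, 3.1, 3.0, 3.1]
n_pre = 4

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

semesters = list(range(1, len(gpa) + 1))

# Trend estimated from the pre-treatment semesters only.
slope, intercept = fit_line(semesters[:n_pre], gpa[:n_pre])

# What the pre-treatment trend predicts for semesters 5-8.
projected = [intercept + slope * s for s in semesters[n_pre:]]

# Observed minus projected: a consistent gap suggests a discontinuity at X.
gaps = [obs - proj for obs, proj in zip(gpa[n_pre:], projected)]
jump = mean(gaps)

print(f"Pre-treatment slope per semester: {slope:.3f}")
print(f"Mean gap after X:                 {jump:.2f}")
```

A gap like this shows only that something changed when X was introduced; an event that happened to coincide with the course would produce the same pattern.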

Simple Interrupted Time-Series Design (continued)

      Suppose a discontinuity is observed when treatment (X) is introduced.

      Rule out threats to internal validity:

  History threats are the most troublesome in this design.

  Instrumentation threats are also likely in some studies.


Simple Interrupted Time-Series Design (continued)

      What threats are more easily ruled out?

  Maturation: We assume maturational changes are gradual, not abrupt discontinuities.

  Testing: If testing influences responses, these effects are likely to show up in the initial observations (i.e., before the intervention). Also, testing effects are less likely with archival data.

  Regression: If scores regress to the mean, they will do so in the initial observations.

Time Series with Nonequivalent Control Group Design

      Add a comparison group to the simple interrupted time series design:


            O1   O2   O3   O4   X   O5    O6   O7   O8


            O1   O2   O3   O4         O5    O6   O7   O8
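With a comparison series available, the pre-to-post change in the treated group can be set against the change in the untreated group over the same period. The sketch below assumes this simple comparison of changes; both series are invented for illustration, and pre_post_change is a hypothetical helper.

```python
from statistics import mean

# Hypothetical observations O1..O8 for each group; X occurs after O4
# in the treated group only.
treated    = [2.5, 2.6, 2.5, 2.6, 3.0, 3.1, 3.0, 3.1]
comparison = [2.6, 2.6, 2.7, 2.6, 2.7, 2.6, 2.7, 2.7]
n_pre = 4

def pre_post_change(series, n_pre):
    """Mean of the post-X observations minus mean of the pre-X observations."""
    return mean(series[n_pre:]) - mean(series[:n_pre])

treated_change = pre_post_change(treated, n_pre)
comparison_change = pre_post_change(comparison, n_pre)

# Change in the treated group beyond what the comparison group also showed.
difference = treated_change - comparison_change

print(f"Treated group change:    {treated_change:.2f}")
print(f"Comparison group change: {comparison_change:.2f}")
print(f"Difference in changes:   {difference:.2f}")
```

An outside event that affected everyone would appear in both series and drop out of the difference, whereas an event experienced by only one group would not.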




Example: Study Habits

      Suppose a nonequivalent control group is added — these students don’t participate in the study habits course.

      Who should be in the comparison group?

      What threats are you able to rule out?

Program Evaluation

      Goal: To provide feedback to administrators of human service organizations in order to help them decide:

   what services they will provide

   to whom

   how to provide them most effectively and efficiently.

      This is a big growth area — particularly in the field of mental health (managed health care).

      Program evaluators in social services assess





Four Questions of Program Evaluation

      Needs: Is an agency or organization meeting the needs of the people it serves? (survey designs)

      Process: How is a program being implemented (is it going as planned)? (observational designs)

      Outcome: Has a program been effective in meeting its stated goals? (experimental, quasi-experimental designs; archival data)

      Efficiency: Is a program cost-efficient relative to alternative programs? (experimental, quasi-experimental designs; archival data)

Basic Research and Applied Research

       Program evaluation is the most extreme case of applied research — the goal of program evaluation is practical, not theoretical.

       The relationship between basic research and applied research is reciprocal:

    Basic research provides scientifically based principles about behavior and mental processes.

    These principles are applied in the complex, real world.

    New complexities are recognized (e.g., the scientific principles may not always apply in real-world settings) and new hypotheses must be tested in the lab using basic research.
