Randomized trials for policy: a review of the external validity of treatment effects
The paper provides a first survey of the literature on external validity, using as a starting point recent debates regarding the use of randomized evaluations to inform policy. Besides synthesising contributions to the programme evaluation literature, I consider definitions of external validity from other sub-disciplines within economics, such as experimental economics and the time-series forecasting literature, as well as from disciplines such as philosophy and medicine. Following Cook and Campbell (1979), I argue that the fundamental challenge arises from interactive functional forms. This somewhat neglected point provides a framework in which to understand how and why extrapolation may fail. In particular, it suggests that replication cannot resolve the external validity problem unless informed by some prior theoretical understanding of the causal relationship of interest. Finally, the problem of interaction can be used to show that the assumptions required for simple external validity are conceptually equivalent to those required for obtaining unbiased estimates of treatment effects using non-experimental methods, undermining the idea that internal validity needs to be rigorously assessed whereas external validity can be ascertained subjectively. Theory may play a role in aiding extrapolation, but the extent to which this will be possible in practice remains an open question.
The external validity of class size effects: teacher quality in Project STAR
The external validity of treatment effects is of fundamental importance for policy.
The paper explores this issue in the context of experimental evaluations of
class size effects on student outcomes. While the existing literature assumes
an additively separable educational production function, the way in which class
size is hypothesised to affect outcomes more plausibly implies an alternative
specification in which the marginal effect of size depends on teacher/class
quality. To investigate this possibility a novel measure of quality is used to
estimate possible interaction effects between teacher quality and class size in
the Tennessee Project STAR dataset. Results are mixed across grades and
subjects but include statistically and economically significant effects that
suggest dependence between the class size effect and class quality. It is
straightforward to show that interaction effects have implications for external
validity. Together, these results suggest that the external validity of
class size effects will depend on the distribution of teacher quality, which
is typically unobserved, in the populations of interest.
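The link between interaction effects and external validity can be illustrated with a minimal linear sketch. The specification and all symbols below are illustrative assumptions, not the paper's own model: S_i is a small-class indicator, Q_i is teacher/class quality, P is the experimental population and P' is the population of interest. It further assumes that S_i is randomly assigned (so it is independent of Q_i and the error term) and that the coefficients are the same in both populations.

```latex
\begin{align}
Y_i &= \alpha + \beta S_i + \gamma Q_i + \delta\, S_i Q_i + \varepsilon_i, \\
\tau_P &= E_P[Y_i \mid S_i = 1] - E_P[Y_i \mid S_i = 0] = \beta + \delta\, E_P[Q_i], \\
\tau_{P'} - \tau_P &= \delta \left( E_{P'}[Q_i] - E_P[Q_i] \right).
\end{align}
```

Under these assumptions the experimental estimate carries over to P' only if there is no interaction (\delta = 0) or mean quality is the same in both populations. In this linear case only the mean of Q_i matters; with a nonlinear interaction, other features of the quality distribution would generally matter as well.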
Constructing a value-added teacher quality measure using data from a randomized trial
Value-added measures of teacher quality are typically constructed using longitudinal, non-experimental administrative datasets. The primary challenge to identification is the non-random matching of teachers and students. We instead outline an approach to constructing a value-added measure from experimental data in which teachers are only observed for a single time period but students and teachers are randomly assigned to classes of certain types. Specifically, we use the Project STAR data to construct a teacher quality measure that is independent of class size and compare the ranking of teachers obtained in this way to alternative quality measures.
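One simple way to obtain a class-level measure that is mean-independent of class type is sketched below. This is not the paper's procedure, only a hedged illustration on simulated data: the function names (`simulate_classes`, `quality_measure`, `rank`), the data layout, and every parameter value are hypothetical. Each teacher is observed with a single class, class type (small or regular) is assigned at random, and the measure is the class mean score minus the average class mean for that class type.

```python
import random
from statistics import mean


def simulate_classes(n_classes=40, seed=0):
    """Simulate hypothetical data: one class per teacher, class type assigned at random."""
    rng = random.Random(seed)
    types = [True, False] * (n_classes // 2)  # True = small class
    rng.shuffle(types)                        # random assignment of class type
    classes = []
    for class_id, small in enumerate(types):
        size = 15 if small else 24            # assumed class sizes
        teacher_effect = rng.gauss(0.0, 1.0)  # unobserved "true" quality
        size_effect = 2.0 if small else 0.0   # assumed additive class size effect
        scores = [50.0 + size_effect + teacher_effect + rng.gauss(0.0, 3.0)
                  for _ in range(size)]
        classes.append({"id": class_id, "small": small,
                        "true_quality": teacher_effect, "scores": scores})
    return classes


def quality_measure(classes):
    """Class mean score minus the average class mean within the same class type."""
    class_means = {c["id"]: mean(c["scores"]) for c in classes}
    type_means = {
        t: mean(class_means[c["id"]] for c in classes if c["small"] == t)
        for t in (True, False)
    }
    return {c["id"]: class_means[c["id"]] - type_means[c["small"]] for c in classes}


def rank(values):
    """Rank class ids from highest (1) to lowest measured quality."""
    order = sorted(values, key=values.get, reverse=True)
    return {cid: position + 1 for position, cid in enumerate(order)}


classes = simulate_classes()
quality = quality_measure(classes)
ranking = rank(quality)
```

By construction the measure averages to zero within each class type, so it is uncorrelated with the class type indicator. It still mixes teacher quality with class-level shocks and peer composition, and demeaning by type would not remove a class size effect that itself varies with quality.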