Chapter Three
METHODOLOGICAL ISSUES
Synthesizing the literature on the causal effects of the reform policies described in the last chapter is the goal of this report. To do so, we first must assess the quality of each individual study addressing a particular topic (i.e., a policy-outcome combination). We then must assess the quality and quantity of the entire body of evidence available on that topic. Three methodological issues influence our assessment of individual studies. The first issue concerns the methods that a study uses to draw causal inferences. The second issue concerns the nature and characterization of the policy variation that the studies use to estimate the effects of the policy. The third issue concerns the data used to measure the outcomes of interest.
In this chapter, we first discuss these factors and how they contribute to our assessment of individual studies. We then provide summary information about the studies we include in our synthesis. We conclude by discussing how we weigh the results from multiple studies to draw conclusions about the effects of a particular policy on a particular outcome.
3.1. METHODS OF CAUSAL INFERENCE IN INDIVIDUAL STUDIES
This synthesis aims to answer the question: What is the effect of a given policy (e.g., a lower benefit reduction rate or a time limit) on a given welfare-related outcome (e.g., the caseload or child development), holding all else equal? To do this, we review studies that attempt to assess the causal effects of welfare reform. Although we restrict our attention to causal studies, it is important to note that not all welfare reform studies attempt to assess causation. Nor are causal studies the only types of analyses that are useful for assessing the success of welfare reform.
Examples of some important noncausal studies are the leavers studies that track the behavior of families that have left the welfare rolls. The results from 15 such studies, most of which were funded by USDHHS, are summarized by USDHHS (2001a). Those results show that 17 to 38 percent of leavers return to the welfare rolls within one year after their exit, and between 62 and 90 percent work at some time during that same year. On average, leaver families have post-exit incomes that are similar to the incomes that they had while on aid.15
Such studies provide information that is essential for monitoring the status of families that leave aid. However, they do not provide estimates of the causal effects of welfare reform. They cannot compare the behavior we do observe to the behavior we would have observed if reform had not taken place. In the language of the evaluation literature, they provide no "counterfactual" against which to assess the effects of reform.
We focus this synthesis on the causal effects of welfare reform because this is what a policymaker needs to know to make informed policy choices. For example, suppose Congress considers eliminating the federal time limit. In considering such a policy change, members of Congress would want to know how welfare-related outcomes would differ between two different policy scenarios: (1) a baseline scenario that leaves the time limit in place and (2) an alternative scenario that eliminates the time limit. When making the comparison, the legislator would want to hold everything else constant, such as the effect of the economy.
This thought experiment defines what we mean by the effect of a policy, but it does not tell us how to measure it. The reason is that we do not observe the outcome under the counterfactual scenario. Rather, we observe only the outcome corresponding to the policy that was actually chosen. The challenge facing the analyst is to devise a research design that predicts what outcomes would have been under the counterfactual policy.
This requires the researcher to design and implement a research strategy that holds all else constant. If the researcher fails to hold constant other factors that could independently influence the outcome, such as the economy, then the resulting estimates of the effects of the policy may be misleading. In the evaluation literature, such estimates are termed biased or inconsistent. They may reflect not only the effect of the policy of interest, but also the effects of the other factors. These other factors that could yield misleading results are referred to as confounding influences, or simply confounders.
The literature on the effects of welfare policies has adopted two general research strategies for dealing with confounding influences: random assignment and econometric analyses of observational data. Both of these methods have strengths and weaknesses. Because both approaches contribute to our understanding of the effects of welfare reform, we discuss each of them in some detail. In the core synthesis chapters, we consider evidence from both types of studies.
3.1.1. Random Assignment
One attractive approach to the problem of confounding factors is random assignment.16 Rather than relying on existing variation in policies or programs, the analyst induces random variation. To test a new program, a study population is chosen, such as all persons receiving aid at a particular time. Then, each member of the study population is assigned either to the control group, which is subject to the baseline policy environment, or the treatment group, which is subject to the new policy environment. The assignment is determined by the logical equivalent of a coin toss.
In principle, this approach holds everything constant except the policy whose effect the analyst seeks to estimate. Since families assigned to the new program differ from those assigned to the baseline program only by a flip of a coin, confounding influences, such as the economy, should be identical for the two groups. If randomization is implemented properly, there should be no systematic differences across the two groups other than those attributable to the different policy environments. Thus, the average effect of the policy, which is referred to as the "treatment effect" or the "impact" of the policy, can be estimated by the difference in mean welfare-related outcomes between the two groups.17
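The difference-in-means estimator described above can be sketched in a few lines. This is a minimal illustration rather than the procedure of any particular evaluation: the function name `estimate_impact` and the earnings figures are hypothetical, and an actual evaluation would also report standard errors.

```python
from statistics import mean


def estimate_impact(treatment_outcomes, control_outcomes):
    """Difference in mean outcomes between the treatment and control groups."""
    return mean(treatment_outcomes) - mean(control_outcomes)


# Hypothetical quarterly earnings (dollars) for a tiny illustrative sample.
# Zeros represent sample members with no earnings in the quarter.
treatment = [1200, 0, 1500, 900, 0, 1800]
control = [1000, 0, 1100, 700, 0, 1400]

impact = estimate_impact(treatment, control)
print(f"Estimated impact on mean quarterly earnings: {impact:.1f}")
```

Because assignment is random, no adjustment for the economy or other confounders appears in the calculation; both groups face the same environment by construction.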
Such random assignment experiments can be a powerful evaluation tool. The crucial importance of controlling for confounding factors and the potential of random assignment for doing so led ACF-USDHHS to require random assignment evaluations as a component of section 1115 waivers.18 We review many of the studies emerging from such waiver evaluations in the chapters that follow.
Despite their advantages, random assignment studies have a number of disadvantages. First, random assignment evaluations can be conducted only when random assignment was performed at implementation. Random assignment is not always feasible. Even when it is feasible, it is expensive and difficult to implement. As a result, random assignment is not always built into a program’s implementation.
Second, random assignment evaluations of welfare reform capture the effect of the new policy only from the point of randomization, almost always for women who are on welfare. Some reforms, however, such as work requirements and policies designed to affect fertility and family formation, are expected to deter people from ever using welfare. Since individuals deterred from entering the program will never be included in the study population, conventional random assignment evaluations will not capture the effects of reform on welfare entries. This is important because recent evidence suggests that more than half of the decline in the welfare caseload results from changes in entry rates rather than from changes in exit rates (Haider, Klerman, and Roth, 2001).
Third, random assignment experiments may not reproduce the environment of a universally implemented program. Broader implementations may affect labor markets or service providers (e.g., the capacity of educational institutions). Experiments may be more likely to be implemented in locations with above-average management capability and may attract the best managers. Thus, implementation in other sites or more broadly in a given site may yield smaller effects.
Fourth, random assignment studies are not immune to problems that can bias their findings. For example, experimental contamination may result when treatment group members "cross over" by moving to a location that is not part of the evaluation, or when control group members become eligible to receive the program services. Sample attrition in subsequent follow-up data collection may be nonrandom and may therefore bias the estimated treatment effects (Heckman, Smith, and Taber, 1998).
A final important problem involves participants’ perceptions of the rules that apply to them. In several of the random assignment studies we summarize below, members of the treatment group were confused about which of the new policies applied to them. In others, members of the control group incorrectly believed that they were subject to the new reforms. This latter form of confusion is of particular concern in exceptional-control evaluation designs, where almost all the population is subject to the new "treatment" rules and only a small fraction of the study population is held back under the old "control" rules. In an environment where most recipients are subject to welfare reform and welfare reform receives considerable public and media attention, it may be difficult to persuade the controls that they are not subject to the new rules themselves.19 Consequently, the control group may behave more like the treatment group, thereby biasing the estimated program impacts toward zero.
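A back-of-the-envelope sketch shows why this kind of confusion pulls estimates toward zero. It assumes, purely for illustration, that a given share of control group members behaves exactly as if treated and that all other sample members are unaffected; the function name and the dollar figure are hypothetical.

```python
def observed_impact(true_impact, confused_share):
    """Impact an experiment would measure if some controls acted as if treated.

    Assumes confused controls respond exactly like treatment group members,
    so the control-group mean shifts toward the treatment-group mean.
    """
    if not 0.0 <= confused_share <= 1.0:
        raise ValueError("confused_share must lie between 0 and 1")
    return (1.0 - confused_share) * true_impact


# Hypothetical true impact of $200 in quarterly earnings.
for share in (0.0, 0.25, 0.5):
    print(share, observed_impact(200.0, share))
```

Under these assumptions the measured impact shrinks in proportion to the share of confused controls, whatever the sign of the true effect.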
3.1.2. Econometric Methods for Observational Data
An alternative to random assignment is to analyze observational data, that is, to compare outcomes across different policy regimes (time-place combinations), typically using administrative data or national survey data. This is the primary method available to evaluate reforms that were not incorporated into the random assignment experiments. Unlike conventional random assignment, analyses of observational data can capture the effects of reform on welfare entries.
The key methodological problem with the analysis of observational data is that, while the policy environment will vary across observations, many confounding factors will vary as well. For policy analysis, we want to measure the effect of the policies, holding all else equal. To estimate that effect, we need to control for these confounding influences.
Regression analysis is the standard approach to this problem. To control for the effect of the economy, for example, econometric studies usually include the local unemployment rate as an independent variable in a linear regression model. In this way, standard regression methods control for observable confounding factors under the implicit assumptions that their effects are linear and additive. If these assumptions are correct and the observed variable (i.e., the unemployment rate) adequately represents the potentially confounding factor (i.e., the economy), then regression techniques eliminate any bias that could otherwise arise as a result of such a confounding factor.
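In symbols, a regression of this kind might be written as follows. The notation is ours and is not drawn from any particular study: y is the welfare-related outcome in state s and year t, P indicates whether the policy is in place, U is the local unemployment rate, and the last term is an error.

```latex
y_{st} = \alpha + \beta P_{st} + \gamma U_{st} + \varepsilon_{st}
```

If the linearity and additivity assumptions hold and the unemployment rate adequately represents the economy, the coefficient on P measures the effect of the policy with the economy held constant.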
Because many of the factors influencing welfare-related outcomes are difficult or even impossible to measure, the principal challenge facing econometric studies is controlling for unobserved, or unmeasurable, confounding influences. Now even more than before PRWORA, states determine their welfare programs. However, states differ in many ways besides their welfare programs, and some of those differences (in general attitudes or political sentiment, for example) may affect both welfare-related outcomes and welfare policy in the state. If such unobservable differences are not somehow controlled for, then the analyst may erroneously attribute changes in welfare-related outcomes to changes in welfare policy that are, in fact, attributable to unobserved factors. This problem of unobservable influences that may confound the relationship between welfare policy and welfare-related outcomes goes by many names in the research literature, including "unobserved heterogeneity," "policy endogeneity," "omitted variable bias," and "spurious correlation." It is the central threat to the validity of econometric studies.
In the literature on welfare reform, the standard approach to this problem is known as the difference-of-differences (DoD) method. DoD controls for two types of unobservable influences: those that vary between states, but are constant within a state over time; and those that vary over time, but similarly for all states. An example of a state-specific, time-invariant unobservable might be the state’s general political leaning. An example of a uniform national trend might be the macroeconomic policy environment.
To illustrate how the DoD approach works, we consider a very simple example in which there are two states and two time periods. Both state 1 and state 2 have the baseline policy in the first period. State 1 adopts the new policy in the second period, but state 2 does not. The DoD estimator of the effects of the new policy compares the before-and-after change in welfare-related outcomes in state 1 to the before-and-after change in state 2. Because both before-and-after changes are computed within the respective states, both control implicitly for time-invariant, state-specific factors that could confound the relationship between the policy change and the welfare-related outcome. Because both changes are computed between the same two time periods, their difference (that is, the difference of differences) nets out the influence of any nationwide trends.
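The two-state, two-period calculation can be written out directly. The sketch below uses hypothetical caseload figures (in thousands), and the function and variable names are ours.

```python
def difference_of_differences(outcomes, treated, comparison, before, after):
    """Change in the treated state minus the change in the comparison state.

    `outcomes` maps (state, period) to the observed outcome.
    """
    change_treated = outcomes[(treated, after)] - outcomes[(treated, before)]
    change_comparison = outcomes[(comparison, after)] - outcomes[(comparison, before)]
    return change_treated - change_comparison


# Hypothetical caseloads: state 1 adopts the new policy in period 2.
caseload = {
    ("state1", 1): 100.0, ("state1", 2): 80.0,
    ("state2", 1): 60.0, ("state2", 2): 55.0,
}

dod = difference_of_differences(caseload, "state1", "state2", before=1, after=2)
print(dod)  # (80 - 100) - (55 - 60)
```

The different starting levels in the two states drop out of each within-state change, and the decline common to both states drops out of the difference between the two changes.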
Most of the econometric literature evaluating the effects of welfare reform uses a generalization of this DoD approach.20 In that generalization, this basic DoD insight is implemented in a multiple regression framework using dummy variables for each state and for each year. Because it controls for such state fixed-effects, the DoD estimator is often referred to as the "state fixed-effects" estimator. The multiple regression framework allows for more than two states and more than two periods. It also allows for the fact that states adopt new policies at different times. It can allow for state-specific linear (or quadratic) time trends to capture smoothly trending changes in a state over time. Finally, it allows the analyst to control explicitly for observable factors such as the unemployment rate and benefit levels.21
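A minimal sketch of the regression version follows, assuming a balanced panel, a single policy dummy, and no other covariates. In that special case, including a dummy variable for each state and each year is equivalent to subtracting state means and year means from both the outcome and the policy variable, which is what the code does. The data are synthetic and noise-free, and all names and numbers are hypothetical.

```python
from statistics import mean


def two_way_demean(panel, states, years):
    """Subtract state means and year means, adding back the grand mean.

    `panel` maps (state, year) to a number and must be balanced:
    every state is observed in every year.
    """
    state_mean = {s: mean(panel[(s, t)] for t in years) for s in states}
    year_mean = {t: mean(panel[(s, t)] for s in states) for t in years}
    grand_mean = mean(panel[(s, t)] for s in states for t in years)
    return {
        (s, t): panel[(s, t)] - state_mean[s] - year_mean[t] + grand_mean
        for s in states
        for t in years
    }


def fixed_effects_estimate(outcome, policy, states, years):
    """Policy coefficient from a regression with state and year dummies."""
    y = two_way_demean(outcome, states, years)
    p = two_way_demean(policy, states, years)
    return sum(p[k] * y[k] for k in p) / sum(p[k] ** 2 for k in p)


# Hypothetical balanced panel: three states, four years, staggered adoption.
states = ["A", "B", "C"]
years = [1993, 1994, 1995, 1996]
adoption_year = {"A": 1994, "B": 1996, "C": None}  # C never adopts
state_effect = {"A": 10.0, "B": 20.0, "C": 30.0}  # time-invariant differences
year_effect = {1993: 0.0, 1994: -1.0, 1995: -2.0, 1996: -3.0}  # national trend
TRUE_EFFECT = -5.0

policy = {
    (s, t): 1.0 if adoption_year[s] is not None and t >= adoption_year[s] else 0.0
    for s in states
    for t in years
}
outcome = {
    (s, t): state_effect[s] + year_effect[t] + TRUE_EFFECT * policy[(s, t)]
    for s in states
    for t in years
}

beta = fixed_effects_estimate(outcome, policy, states, years)
print(round(beta, 6))
```

Because the synthetic outcome contains only state effects, a common year effect, and the policy effect, the estimate recovers the assumed effect up to rounding error. With real data, state-specific trends or other omitted factors would break this equivalence between the estimate and the true effect.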
3.2. MEASURING THE POLICY ENVIRONMENT
Ideally, we would like to learn the effect of each individual policy embedded in the TANF reforms. Furthermore, we would like to learn the extent to which policies interact with each other, such that the effect of adopting two policies together is greater than (or less than) the sum of the effects of adopting each policy alone.22 To do so, we need to observe outcomes when a single policy or a given bundle of policies is implemented. Several issues arise in measuring the policy environment that are relevant for both random assignment and econometric studies. In this section, we discuss the nature and characterization of the policy environment, first in random assignment studies and then in econometric studies.
3.2.1. Characterizing the Policy Environment in Random Assignment Studies
Random assignment studies are designed to measure the impact of the "treatment," that is, the program features that differ between the experimental and control groups. The experimenter thus controls the policy environment being evaluated through the design of the program. In the case of welfare reform evaluations, these program features include financial incentives to encourage work, requirements to participate in work-related activities, sanctions policies, parental responsibility requirements, and so on.
For a few of the random assignment studies we consider, the treatment consists of a single policy reform, such as a family cap or a parental responsibility requirement. Two studies employ dual-treatment designs. These involve two experimental groups (in addition to the control group), both of which experience financial work incentives and one of which additionally is subject to mandated work-related activities. The dual-treatment design provides information about the effect of financial work incentives and the incremental effect of the work-related activity mandate. However, most of the random assignment studies were implemented to evaluate multifaceted state waiver programs rather than specific reform policies. Therefore most studies involve a single treatment group that is subject to multiple policy reforms. Such designs shed light on the effects of reform as a bundle, although they generally cannot be used to estimate the impact of specific reforms.
While the policy or policies being evaluated in a random assignment study may be known in principle, another issue that affects experimental studies is program implementation. Programs with the same name are often implemented very differently in one place than they are in another. Insight into implementation issues is often provided by a process study involving analyses of program records, a caseworker survey, or a recipient survey. These process analyses might reveal that some locations have the management capacity to train and motivate employees to implement a new program, while other places appear to be less successful. One might argue that we should be interested in the "average" effect of implementation. There is concern, however, that the sites participating in demonstration programs have more management capacity (or use it more effectively in the demonstration sites) than would be the case for the average site trying to implement the reforms in the context of a statewide program. In the individual synthesis chapters, we sometimes note external evidence that suggests that some of the variation across sites may result from differences in the quality of implementation.
Finally, while our synthesis focuses on a set of random assignment studies implemented during the 1990s, primarily as part of implementing welfare waivers, the individual policy components or even the bundle of components evaluated in the experiments do not necessarily equate with the state-level reforms implemented under TANF. Thus, while the impact estimates from the evaluations may capture the effect of implementing a given reform or set of reforms in a given setting during a given time period, they may not generalize to the impact that we would expect to see for the TANF program as implemented in an entire state in the post-TANF era. The findings from the random assignment studies are most useful in demonstrating the direction and magnitude of the effects for a particular reform or group of reforms, but we must be more circumspect in the inferences we draw from these studies about the impacts of policies as actually implemented across the states.
3.2.2. Characterizing the Policy Environment in Econometric Studies
While random assignment studies generate their own policy variation to estimate the effects of the policy, econometric studies make use of the variation in policies that exists between states and over time. This requires the analyst to characterize the welfare policies that are in place in each state in each year. This has proven to be a difficult undertaking, in part because there are so many policy components to characterize and in part because a single policy component can vary along several dimensions. Moreover, those dimensions may be difficult to quantify and even more difficult to measure as actually implemented at the state or local level.23
The approach taken by most analysts has been to specify a policy change in terms of the date on which the policy (or policy bundle) was adopted. The analyst constructs a dummy variable that is equal to one after the policy is in place and equal to zero before the policy is in place.24 The analyst then includes that dummy variable in her regression model. In the context of a DoD regression, the coefficient attached to such a variable indicates how much the welfare-related outcome changed in the state that adopted the policy once the policy was adopted, implicitly using states that did not adopt the policy change at the same time to control for unobservable confounding factors.
There are some ambiguities associated with this approach. For example, some analysts assume that the policy was in place once it was passed into law, whereas others assume that it was not in place until it was officially implemented.25 In practice, the appropriate date is probably even later, when the program is rolled out, staff are trained, and news of the new environment reaches recipients. Such in-the-field implementation dates are difficult to operationalize, which explains why they have never been used in observational studies. As a result, the dummy variable typically switches on before the policy is fully operative, so some periods coded as post-reform reflect little or no actual exposure to the policy, and estimates of program effects are likely to be too conservative (i.e., too small).
The main virtue of the dummy-variable approach is that it is simple and transparent. Analysts may disagree about whether the legal adoption date or the official implementation date best characterizes the date on which the policy was put in place, but such disagreements are narrow. Moreover, one can test whether the difference is important empirically.
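The coding itself, and the narrowness of the disagreement, can be illustrated with a short sketch. The dates are hypothetical: a single state is assumed to have passed its reform in 1995 and officially implemented it in 1996.

```python
def policy_dummy(year, start_year):
    """1 once the policy is deemed in place, 0 before (or if never adopted)."""
    return int(start_year is not None and year >= start_year)


# Hypothetical dates for one state: law passed in 1995, implemented in 1996.
years = range(1993, 1999)
by_adoption = [policy_dummy(y, 1995) for y in years]
by_implementation = [policy_dummy(y, 1996) for y in years]

# State-years whose coding depends on which date the analyst chooses.
disputed = [y for y, a, b in zip(years, by_adoption, by_implementation) if a != b]
print(disputed)
```

Only the state-years that fall between the two dates are coded differently, so re-estimating the model under each definition is a direct check on whether the choice matters.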
However, an important drawback of using adoption dates alone to characterize reform policies is that they capture only one dimension of policy variation. They provide no information about other dimensions of variation that might have important effects on behavior. For example, many states implemented financial work incentives during the waiver period. Some states cut the benefit reduction rate from essentially 100 percent to 75 percent, whereas others cut it to 50 percent or less. Economic theory predicts that bigger incentives should have stronger effects on employment, but the dummy-variable approach treats all financial work incentives as being equal. Thus, it misses an important dimension of policy variation.
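To see how much these incentives differ, consider what a recipient keeps from an additional $100 of earnings under each benefit reduction rate. The sketch below is deliberately simplified: it ignores earnings disregards, taxes, and other programs, and the function name is ours.

```python
def net_gain_from_earnings(extra_earnings, benefit_reduction_rate):
    """Increase in total income when earnings rise, after the benefit is reduced.

    Simplification: ignores earnings disregards, taxes, and other programs.
    """
    return extra_earnings * (1.0 - benefit_reduction_rate)


# Benefit reduction rates of 100, 75, and 50 percent.
for rate in (1.0, 0.75, 0.5):
    print(rate, net_gain_from_earnings(100.0, rate))
```

Under these assumptions, the recipient keeps nothing at a 100 percent rate, $25 at a 75 percent rate, and $50 at a 50 percent rate, yet a reform dummy would code the last two reforms identically.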
Furthermore, the dummy variable approach may be more susceptible to confounding factors than policy variables that capture additional dimensions of variation. Dummy welfare reform variables tend to equal zero in the early part of the sample period and to equal one at the end of the sample period. Thus, they are correlated with trends, or more precisely, in the context of DoD regressions, with state-specific trends that deviate from national trends. If the analyst fails to control for such trends, the results may be biased estimates of the effects of reform, since the reform dummy is correlated with the trends.
One approach to this problem is to characterize the reform more completely. Rather than simply including a dummy variable, it is sometimes possible to include a variable describing the intensity of the reform (e.g., the size of a financial work incentive). In this case, we would expect to find not only an effect when the policy is adopted, but also a larger effect when a stronger form of the policy is adopted. The variation across states allows more precise estimates. In addition, the additional implication that the effect should vary with the strength of the reform can be tested.
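One way to write such a specification, in notation that is ours rather than that of any particular study, replaces the reform dummy in the DoD regression with an intensity measure I that equals zero before the reform is adopted and equals the size of the reform afterward (for example, the percentage-point cut in the benefit reduction rate):

```latex
y_{st} = \alpha_s + \delta_t + \theta I_{st} + \varepsilon_{st}
```

Here the first two terms are the state and year effects, and the coefficient on I measures the change in the outcome per unit of reform intensity. The dummy-variable approach is the special case in which I takes the same value in every state that adopts the reform.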
Of course, defining other dimensions of variation is not always simple. One example is provided by several analysts’ characterizations of sanction policies. Starting in the waiver era and continuing into the post-PRWORA period, many states stiffened the sanctions they impose on recipients who violate their work requirements or personal responsibility mandates. In some states, an initial violation results in the adult’s portion of the grant being eliminated until the recipient comes into compliance with the requirement. At the other end of the spectrum, some states cancel the family’s entire grant until the family comes into compliance (and in some states for a minimum duration). In many states, sanctions become more stringent for repeat offenders. Seven states impose lifetime, full-family sanctions for repeated noncompliance with work requirements, even if the family comes back into compliance.26
Four different sets of analysts have offered characterizations of states’ sanction policies, coding them as lenient, stringent, or in between. The characterizations are shown in Table 3.1. The aspects of state policies used to rate the different states vary between analysts; as a result, the summary ratings vary to a considerable extent. Pennsylvania is a noteworthy case in point; its sanction policies are rated as lenient by two sets of analysts, intermediate by another, and stringent by another. Indeed, the four sets of ratings are in agreement for only 25 of the 51 states. This poses a problem for comparing results across studies. If analysts cannot agree on what a strict sanction policy is, the effects of a "strict" sanction policy may vary across studies for reasons that have more to do with measurement than with real behavior. Moreover, none of the rankings incorporates information about the monetary value of the sanctions. This is important because a partial-family sanction in a high-benefit state may result in the same financial penalty as a full-family sanction in a low-benefit state. Likewise, none of the rankings incorporates information about the rate at which sanctions are imposed, which is shown in the last two columns of the table. As mentioned in Chapter 2, standard deterrence theory predicts that both the severity of the sanction and the probability of detection should affect behavior. This suggests that any characterization of sanctions that omits information about the likelihood that they are applied is incomplete.
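Both points can be made concrete with a short sketch. The ratings are taken from four rows of Table 3.1, in the table's column order; the grant amounts and the adult's share of the grant are hypothetical and chosen only to illustrate the comparison.

```python
# Ratings for a few states, in the column order of Table 3.1.
ratings = {
    "Alaska": ["Lenient", "Lenient", "Lenient", "Lenient"],
    "Florida": ["Stringent", "Stringent", "Stringent", "Stringent"],
    "Iowa": ["Intermed.", "Lenient", "Intermed.", "Stringent"],
    "Pennsylvania": ["Stringent", "Lenient", "Lenient", "Intermed."],
}


def all_agree(state_ratings):
    """True when every set of analysts assigns the same category."""
    return len(set(state_ratings)) == 1


agreement = {state: all_agree(r) for state, r in ratings.items()}
print(agreement)


def monthly_penalty(grant, share_removed):
    """Dollar value of a sanction that removes a share of the monthly grant."""
    return grant * share_removed


# Hypothetical grants: a partial-family sanction in a high-benefit state
# versus a full-family sanction in a low-benefit state.
partial_high = monthly_penalty(600.0, 0.25)  # adult's share assumed to be 25%
full_low = monthly_penalty(150.0, 1.0)
print(partial_high, full_low)
```

With these assumed figures the two sanctions impose the same dollar penalty, even though any of the categorical ratings would place the two policies at opposite ends of the scale.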
Another obstacle to characterizing states’ welfare reform policies is associated with implementation issues. Policy dummies, or even more detailed measures of policy characteristics, usually only capture variation in official statutes and regulations. However, the de facto variation in implementing a statute may be as important as the de jure variation in the actual statutes. For example, states with the same full-family sanction policy have varying numbers of people who have actually been sanctioned; states with similar time-limit policies vary in the fraction of people who receive extensions when they reach the time limit.
Finally, even assuming that the policy environment could be accurately captured for econometric analysis, the fact that most states implemented policies in bundles rather than individually poses an additional hurdle for statistical inference. This is similar to the problem of policy bundling in the random assignment studies. If policies are adopted together, there is less independent variation along each policy dimension with which to measure the separate effect of an individual policy. The limited number of post-reform state-year cells available for study, resulting from both the recency of the reforms and the lags in data release, makes it particularly difficult to distinguish the effect of one policy from that of another. As a result, econometric studies have typically been more successful in estimating the effects of reform-as-a-bundle than in estimating the separate effects of individual reforms.
| | Study | | | | | | |
|---|---|---|---|---|---|---|---|
| State | CEA (1999) | GAO (2000) | Burke and | Pavetti and | All | Percentage | Percentage |
| | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
| Alabama | Intermed. | Intermed. | Intermed. | Stringent | | 4.5 | 9.8 |
| Alaska | Lenient | Lenient | Lenient | Lenient | Yes | 0.0 | 4.3 |
| Arizona | Intermed. | Intermed. | Intermed. | Intermed. | Yes | 5.1 | 5.1 |
| Arkansas | Stringent | Lenient | Stringent | Lenient | | 0.3 | 3.2 |
| California | Lenient | Lenient | Lenient | Lenient | Yes | 0.1 | 2.3 |
| Colorado | Intermed. | Intermed. | Intermed. | Intermed. | Yes | 0.0 | 5.4 |
| Connecticut | Intermed. | Stringent | Intermed. | Intermed. | | 2.7 | 3.1 |
| Delaware | Stringent | Stringent | Intermed. | Stringent | | 0.8 | 15.3 |
| DC | Lenient | Lenient | Lenient | Lenient | Yes | | 7.4 |
| Florida | Stringent | Stringent | Stringent | Stringent | Yes | 0.3 | 1.8 |
| Georgia | Stringent | Stringent | Stringent | Stringent | Yes | 2.3 | 2.3 |
| Hawaii | Lenient | Stringent | Lenient | Stringent | | 0.0 | 0.0 |
| Idaho | Stringent | Stringent | Stringent | Stringent | Yes | 1.1 | 1.1 |
| Illinois | Intermed. | Intermed. | Intermed. | Intermed. | Yes | 1.2 | 6.0 |
| Indiana | Intermed. | Lenient | Lenient | Lenient | | 0.9 | 5.5 |
| Iowa | Intermed. | Lenient | Intermed. | Stringent | | 0.6 | 3.3 |
| Kansas | Stringent | Stringent | Stringent | Stringent | Yes | 0.0 | 1.0 |
| Kentucky | Intermed. | Lenient | Lenient | Intermed. | | 1.6 | 5.7 |
| Louisiana | Intermed. | Stringent | Intermed. | Stringent | | 0.4 | 3.2 |
| Maine | Lenient | Lenient | Lenient | Lenient | Yes | 0.1 | 5.3 |
| Maryland | Stringent | Stringent | Intermed. | Stringent | | 0.0 | 11.3 |
| Massachusetts | Intermed. | Lenient | Intermed. | Stringent | | 4.7 | 4.7 |
| Michigan | Intermed. | Intermed. | Intermed. | Stringent | | 2.4 | 4.5 |
| Minnesota | Lenient | Lenient | Lenient | Lenient | Yes | 0.3 | 7.6 |
| Mississippi | Stringent | Stringent | Stringent | Stringent | Yes | 0.0 | 0.9 |
| Missouri | Lenient | Lenient | Lenient | Lenient | Yes | 1.4 | 12.3 |
| Montana | Lenient | Lenient | Lenient | Lenient | Yes | 1.0 | 8.0 |
| Nebraska | Stringent | Lenient | Stringent | Stringent | | 2.2 | 2.2 |
| Nevada | Stringent | Intermed. | Intermed. | Intermed. | | 0.8 | 3.2 |
| New Hampshire | Lenient | Lenient | Intermed. | Lenient | | 0.0 | 4.8 |
| New Jersey | Intermed. | Stringent | Intermed. | Stringent | | 2.8 | 8.0 |
| New Mexico | Intermed. | Intermed. | Intermed. | Intermed. | Yes | 0.0 | 0.0 |
| New York | Lenient | Lenient | Lenient | Lenient | Yes | 0.0 | 0.0 |
| North Carolina | Lenient | Lenient | Lenient | Intermed. | | 0.5 | 29.1 |
| North Dakota | Intermed. | Intermed. | Intermed. | Stringent | | 7.0 | 7.0 |
| Ohio | Stringent | Stringent | Stringent | Stringent | Yes | 0.0 | 0.9 |
| Oklahoma | Stringent | Lenient | Stringent | Stringent | | 0.0 | 2.2 |
| Oregon | Intermed. | Intermed. | Intermed. | Intermed. | Yes | 0.5 | 0.6 |
| Pennsylvania | Stringent | Lenient | Lenient | Intermed. | | 0.0 | 6.3 |
| Rhode Island | Lenient | Lenient | Lenient | Lenient | Yes | 0.2 | 3.0 |
| South Carolina | Stringent | Stringent | Stringent | Stringent | Yes | 0.2 | 5.7 |
| South Dakota | Intermed. | Stringent | Intermed. | Stringent | | 0.9 | 0.9 |
| Tennessee | Stringent | Stringent | Stringent | Stringent | Yes | 0.3 | 0.3 |
| Texas | Lenient | Lenient | Intermed. | Intermed. | | 0.0 | 15.5 |
| Utah | Intermed. | Intermed. | Intermed. | Stringent | | 0.0 | 4.0 |
| Vermont | Intermed. | Lenient | Lenient | Intermed. | | 0.0 | |
| Virginia | Stringent | Intermed. | Stringent | Stringent | | 0.7 | 0.7 |
| Washington | Lenient | Lenient | Lenient | Lenient | Yes | 0.0 | 5.6 |
| West Virginia | Stringent | Intermed. | Intermed. | Intermed. | | 0.0 | |
| Wisconsin | Stringent | Stringent | Stringent | Stringent | Yes | 4.6 | 22.8 |
| Wyoming | Stringent | Stringent | Stringent | Stringent | Yes | 2.0 | 2.0 |
| Total/Average | | | | | 25 | 1.1 | 6.1 |

NOTES: Columns (6) and (7) are from GAO (2000) and pertain to 1998. The terminology used to describe the severity of sanctions differs among authors. Our "lenient" category corresponds to the categories described as "partial/partial" by CEA (1999); "partial" by GAO (2000); "weak" by Rector and Youssef (1999); and "lenient" by Pavetti and Bloom (2001). Our "intermediate" category corresponds to the categories described as "partial/full" by CEA (1999); "graduated" by GAO (2000); "moderate" or "delayed full-check" by Rector and Youssef (1999); and "moderate" by Pavetti and Bloom (2001). Our "stringent" category corresponds to the categories described as "full/full" by CEA (1999); "full-family" by GAO (2000); "initial full-check" by Rector and Youssef (1999); and "stringent" by Pavetti and Bloom (2001).
3.3. DATA SOURCES FOR WELFARE OUTCOMES
To use either randomized trials or econometric methods to estimate the effects of a policy or program, the analyst requires data on welfare-related outcomes. In this section, we review the most commonly used data sources and discuss their utility. We also discuss sample sizes and statistical power, issues that are relevant for both econometric and experimental analyses of welfare-related outcomes.
3.3.1. Data Sources
Tables 3.2 and 3.3 summarize the major sources of administrative and survey data, respectively. Randomized trials often abstract their own data from these administrative records and from their own surveys. Econometric studies usually analyze existing sources of administrative and survey data.
As seen in Table 3.2, administrative data cover many of the outcomes of interest, and, for welfare program participants, they cover the entire caseload at a given time. The drawbacks of administrative data include issues of data quality, lack of coverage of non-TANF participants (e.g., leavers) and eligible nonparticipants (e.g., those choosing not to enter), and limited information on individual socioeconomic characteristics. Certain outcomes, such as measures of child well-being, are not typically available in administrative data sources. There is also considerable variation in state-level data systems and in their suitability for research, as well as in the extent of historical information and cross-state comparability of data systems.
| Data Source | Outcomes | Coverage | Notes |
|---|---|---|---|
| State reports to | Caseload | All states, pre- and | Aggregate program |
| State reports to | Work activities; | All states, post-TANF | Aggregate data only on |
| State-specific | Caseload; aid | Within a single state; | Issues of data quality; |
| Unemployment | Employment; earnings | Within a single state; | Gaps in coverage; data |
| Other social | Participation in Food | Within a single state; | |
| Other administrative | Births, etc. | Nationwide (e.g., births) | Welfare recipients not identified |
The major sources of survey data shown in Table 3.3 also cover many of the welfare-related outcomes of interest, often for large nationally representative samples observed both before and after welfare reform. These databases are typically rich in the socioeconomic information they contain, and they usually cover both program participants and nonparticipants. Some surveys track respondents over time so that the dynamics of behavior can be studied over both short and long horizons. The limitations of these databases include relatively small samples of welfare program participants, which is an acute problem for most longitudinal surveys; nonrandom survey nonresponse in cross-section surveys; attrition in panel surveys; and reporting errors for participation in many welfare programs. State-level survey data based on samples drawn from administrative records of welfare recipients can be hampered by problems with locating and tracking those no longer on aid.
| Data Source | Outcomes | Coverage | Notes |
|---|---|---|---|
| Current Population Survey (CPS) | Program participation and income/poverty status in previous calendar year; employment and earnings at interview and in previous calendar year; family structure | Nationwide, relatively consistent survey content back to 1968 | Sample size: 55,000 households; increasing in March 2001 and beyond |
| Survey of Income and Program Participation (SIPP) and | Monthly data for same outcomes as CPS, plus (monthly) program entry and exit | Nationwide, relatively consistent survey content back to 1984 | Sample size: varies, about 30,000 households at any point in time; survey is a panel, following respondents for about two-and-a-half years |
| Panel Study of Income Dynamics (PSID) | Same as CPS, plus program entry and exit and measures of child well-being | Nationwide sample of families followed since 1968 | Sample size: 4,800 families in original cohort augmented by split-offs to 8,000 families in 2001 |
| National Longitudinal Survey of Youth (NLSY) | Same as CPS, plus program entry and exit and measures of child well-being | Nationwide cohort of youth followed since 1979 | Sample size: about 11,500 youth in original cohort plus the children of the original cohort of young women followed since 1986 |
| National Survey of America’s Families (NSAF) | Similar to CPS plus hardship, housing, health status and health care use, attitudes, knowledge of service availability | Representative population in 13 states interviewed in 1997 and 1999 | Sample size: about 44,000 households in repeated cross-sections |
| State surveys | Vary | Current and recent recipients | Details vary across states; issues with locating former recipients; limited cross-state comparability |
| Program evaluation surveys | Vary | Treatment and control groups | Details vary across evaluations |
3.3.2. Statistical Power
Whether the available data are sufficient to estimate policy effects precisely using econometric methods is the subject of some discussion in the literature. (See, in particular, Adams and Hotz, 2001.) The econometric studies discussed in this synthesis almost all use DoD methods. As such, they require that the outcomes be consistently measured across time, both before and after reform, and across states, and that there be enough observations in each state-year cell to construct at least a rough estimate of the outcomes of interest in the population of interest.27
These seemingly simple requirements make most conventional data sources unusable for econometric analyses. The requirement of consistent data across states rules out state-specific administrative data files. The requirement of consistent data across years rules out most single-interview studies and studies that began after reform was under way. The requirement that it be possible to construct a rough estimate of the outcomes of interest in the population of interest rules out almost all other data sources. To understand this, note that welfare participation is a relatively rare behavior. The peak national welfare caseload (in persons) was about 15 million, and in early 2001 the figure was 5.5 million; in percentage terms, these are 5.5 and 2.1 percent of the population, respectively. Whether these numbers are "large" from a social perspective is part of the policy debate. However, from a survey research perspective, these are quite small numbers. Even a moderate-sized random sample of 10,000 households is likely to yield only a few hundred households with any welfare recipients. The resulting state-specific estimates of both the rate of welfare receipt and the rate of change of welfare receipt will be noisy (i.e., they will differ considerably from the true values because of sampling variability). Much of the variation across estimates will result not from variation in the true number of people receiving welfare but from variation in who happens to be sampled, that is, from classical sampling variability.
Thus, the requirements for consistent national data, across multiple years and for large samples, appear to rule out all survey data except the CPS, the SIPP, and perhaps the NLSY and the PSID.
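The sampling-variability argument can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the 3 percent household receipt rate and the 200-household state subsample are assumptions chosen for the example, not figures from the studies reviewed, and the formula assumes simple random sampling.

```python
import math


def se_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


def se_difference(p1: float, n1: int, p2: float, n2: int) -> float:
    """Standard error of the difference between two independent proportions."""
    return math.sqrt(se_proportion(p1, n1) ** 2 + se_proportion(p2, n2) ** 2)


# Illustrative assumptions, not figures from the report.
NATIONAL_HOUSEHOLDS = 10_000  # a "moderate-sized" national sample
STATE_HOUSEHOLDS = 200        # hypothetical number of sampled households in one state
RECEIPT_RATE = 0.03           # hypothetical share of households with any welfare receipt

expected_recipients = NATIONAL_HOUSEHOLDS * RECEIPT_RATE
se_national = se_proportion(RECEIPT_RATE, NATIONAL_HOUSEHOLDS)
se_state = se_proportion(RECEIPT_RATE, STATE_HOUSEHOLDS)
# Year-to-year change within one state, assuming independent samples in each year.
se_state_change = se_difference(RECEIPT_RATE, STATE_HOUSEHOLDS,
                                RECEIPT_RATE, STATE_HOUSEHOLDS)

print(f"expected recipient households nationally: {expected_recipients:.0f}")
print(f"SE of national receipt rate:              {se_national:.4f}")
print(f"SE of one state's receipt rate:           {se_state:.4f}")
print(f"SE of one state's year-to-year change:    {se_state_change:.4f}")
```

Under these assumptions, the standard error of a single state-year estimate is about 1.2 percentage points on a true rate of 3 percent, and the standard error of a year-to-year change is larger still. Clustered survey designs would typically imply even larger standard errors than this simple-random-sampling approximation.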
Finally, it is worth noting that power issues may also arise in the experimental evaluations, especially in the analysis of effects for population subgroups. While power calculations are used to ensure that the total sample in treatment and control groups is sufficiently large to detect an impact of a given size with a high probability, the likelihood of detecting impacts of the same size on smaller subgroup samples may be much smaller. Thus, detecting impacts in subgroups will require larger (often much larger) samples and thus a much higher cost of evaluation. Furthermore, some sites are simply not large enough to support the required samples. One approach for addressing this issue is to pool results across studies and then consider subgroup differences. This is the strategy adopted by Michalopoulos and Schwartz (2000).
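The loss of power in subgroup analyses can be illustrated with a standard minimum-detectable-effect calculation for a binary outcome. This is a sketch under stated assumptions: the 50 percent control-group outcome rate and the 20 percent subgroup share are hypothetical, the significance level and power are conventional defaults, and the normal approximation ignores any covariate adjustment an actual evaluation might use. The treatment and control sample sizes are those reported for Florida's FTP in Table 3.4.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(p_control: float, n_treatment: int, n_control: int,
                              alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest true difference in a binary outcome rate that a two-sided test
    at level alpha would detect with the given power (normal approximation)."""
    z = NormalDist()
    multiplier = z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(power)
    se = sqrt(p_control * (1.0 - p_control) * (1.0 / n_treatment + 1.0 / n_control))
    return multiplier * se


BASELINE_RATE = 0.50   # hypothetical control-group outcome rate
SUBGROUP_SHARE = 0.20  # hypothetical share of the sample in the subgroup
n_t, n_c = 1405, 1410  # FTP treatment and control samples (Table 3.4)

mde_full = minimum_detectable_effect(BASELINE_RATE, n_t, n_c)
mde_subgroup = minimum_detectable_effect(BASELINE_RATE,
                                         round(n_t * SUBGROUP_SHARE),
                                         round(n_c * SUBGROUP_SHARE))

print(f"MDE, full sample:     {mde_full:.3f}")
print(f"MDE, 20% subgroup:    {mde_subgroup:.3f}")
print(f"ratio subgroup/full:  {mde_subgroup / mde_full:.2f}")
```

Because the minimum detectable effect scales with the inverse square root of the sample size, a subgroup one-fifth the size of the full sample can detect only impacts that are more than twice as large, which is why pooling across studies is attractive for subgroup comparisons.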
3.4. SUMMARY OF STUDIES INCLUDED IN THE SYNTHESIS
As noted above, our synthesis draws on both random assignment evaluations and econometric studies. To better understand these studies, we review their key features. We begin with the details of the random assignment studies.28
3.4.1. Features of Random Assignment Studies
Tables 3.4 and 3.5 summarize the features of the experimental evaluations that we draw on for the synthesis and provide a useful reference for the discussion of these evaluations in the chapters that follow. We include only those studies that reasonably approximate the types of policies implemented under TANF. As a result, we exclude some evaluations that consider very specialized reforms such as those that focus on service delivery for teen parents on welfare or at risk of welfare participation (e.g., the Learning, Earning, and Parenting (LEAP) program, the New Chance program, and the Teen Parent Demonstration program), child support policies (e.g., New York Child Assistance Program), and specialized service delivery (e.g., the Postemployment Services Demonstration program). Furthermore, we exclude some of the earlier welfare-to-work experiments that predate the 1988 Family Support Act (e.g., San Diego’s Saturation Work Initiative Model and the early GAIN experiments in several California counties). We also exclude Project Independence, which was Florida’s initial JOBS program. It was the precursor to Florida’s Family Transition Program (FTP), which we do include.
Table 3.4 describes basic features of the experiments such as the location of the demonstration, whether it was part of a statewide reform, the population served, the period of randomization and length of the follow-up, the sample sizes in treatment and control groups, the policy environment for the controls (typically AFDC/JOBS), and contextual information in the form of the unemployment rate and welfare benefit level. Table 3.5 provides details on the policy reforms applicable to the treatment group. This includes information on the central reform components of financial work incentives, mandatory work-related activities, and time limits, as well as other reforms such as sanctions, family caps, parental responsibility requirements, transitional child care and health insurance, changes in eligibility for two-parent families, and various other features (e.g., changes in asset limits and use of personal responsibility agreements).
| Name | State | Sites | Demo part of statewide reform? | Cases served | RA period start | RA period end | F/U length | Sample size: Total | Sample size: T | Sample size: C | Controls | Initial conditions in state (for RA start year): U rate (%) | Initial conditions in state (for RA start year): Max $ grant (a) | Cite(s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A. Programs that focus on financial work incentives | ||||||||||||||
| California Work Pays Demonstration Project (CWPDP) | CA | 3 counties | No | Single-parent recipients (b) | Oct 92 | Dec 92 | 42 months | 7,841 | 5,211 | 2,630 | AFDC/JOBS | 9.3 | 663 | Becerra et al. (1998); Hu (2000) |
| Welfare Restructuring Project Incentives Only (WRP-IO) | VT | 6 welfare districts | Yes | Single-parent recipients and applicants (b) | Jul 94 | Jun 95 | 42 months | 2,196 | 1,087 | 1,109 | AFDC/JOBS | 4.7 | 638 | Bloom et al. (1998); Hendra and Michalopoulos (1999); Bloom, Hendra and Michalopoulos (2000) |
| Minnesota Family Investment Program Incentives Only (MFIP-IO) | MN | 3 urban counties | No | Urban single-parent long-term (> 24 mos. in last 36 mos.) recipients | Apr 94 | Mar 96 | through 6/98 | 1,769 | 835 | 934 | AFDC/JOBS | 4.0 | 532 | Miller et al. (1997); Miller et al. (2000); Gennetian and Miller (2000) |
| | | | | Urban single-parent recent applicants | Apr 94 | Mar 96 | through 6/98 | 3,113 | 980 | 2,133 | | | | |
| B. Programs that focus on financial work incentives tied to hours of work | ||||||||||||||
| New Hope | WI | 2 areas of Milwaukee | No | Poor families employed FT at RA | Jul 94 | Dec 95 | through 12/98 | 418 | 218 | 200 | No New Hope benefits | 4.7 | 518 | Bos et al. (1999); Bos and Varga (2001) |
| | | | | Poor families not employed FT at RA | Jul 94 | Dec 95 | through 12/99 | 935 | 459 | 476 | | | | |
| Self-Sufficiency Project (SSP) (c) | Canada (BC, NB) | Province-wide | No | Single-parent recipients | Nov 92 | Mar 95 | 36 months | 5,729 | 2,880 | 2,849 | Traditional Income Assistance | 10.5 (BC); 12.8 (NB) | 1,131 (BC); 747 (NB) | Michalopoulos et al. (2000); Morris and Michalopoulos (2000) |
| Self-Sufficiency Project Plus (SSP-Plus) (c) | NB, Canada | lower NB | No | Single-parent recipients | Nov 94 | Mar 95 | 18 months | 596 | 293 | 303 | Traditional Income Assistance | N.A. | N.A. | Quets et al. (1999) |
| Self-Sufficiency Project Applicants (SSP-A) (c) | BC, Canada | Vancouver and lower British Columbia | No | Single-parent applicants (no IA for at least 6 months prior to RA) | Feb 94 | Feb 95 | 30 months | 2,852 | 1,422 | 1,430 | Traditional Income Assistance | N.A. | N.A. | Michalopoulos, Robins and Card (1999) |
| C. Programs that focus on mandatory work-related activities | ||||||||||||||
| LA Jobs-1st GAIN | CA | Los Angeles County | No | Single parent recipients and applicants (b) | Apr 96 | Sep 96 | 24 months | 15,683 | 11,521 | 4,162 | AFDC/JOBS plus Work Pays (d) | 7.2 | 607 | Freedman et al. (2000b) |
| Atlanta Labor Force Attachment (LFA) | GA | Atlanta | No | Recipients and applicants | Jan 92 | Jan 94 | 5 years (e) | 2,938 | 1,441 | 1,497 | AFDC/JOBS plus "fill-the-gap" budgeting (f) | 7.0 | 280 | Freedman et al. (2000a); McGroder et al. (2000); Hamilton et al. (2001) |
| Grand Rapids Labor Force Attachment (LFA) | MI | Grand Rapids | No | Recipients and applicants | Sep 91 | Jan 94 | 5 years (g) | 3,012 | 1,557 | 1,455 | AFDC/JOBS | 9.3 | 474 | Same as above |
| Riverside Labor Force Attachment (LFA) | CA | Riverside | No | Recipients and applicants | Jun 91 | Jun 93 | 5 years | 6,726 | 3,384 | 3,342 | AFDC/JOBS plus Work Pays after late 1993 (d) | 7.7 | 694 | Same as above |
| Portland | OR | Portland | No | Recipients and applicants; no cases with substantial barriers | Feb 93 | Dec 94 | 5 years | 4,028 | 3,529 | 499 | AFDC/JOBS | 7.3 | 460 | Same as above |
| Atlanta Human Capital Development (HCD) | GA | Atlanta | No | Recipients and applicants | Jan 92 | Jan 94 | 5 years (e) | 2,992 | 1,495 | 1,497 | AFDC/JOBS | 7.0 | 280 | Same as above |
| Grand Rapids Human Capital Development (HCD) | MI | Grand Rapids | No | Recipients and applicants | Sep 91 | Jan 94 | 5 years (g) | 2,997 | 1,542 | 1,455 | AFDC/JOBS | 9.3 | 474 | Same as above |
| Riverside Human Capital Development (HCD) | CA | Riverside | No | Recipients and applicants, low education | Jun 91 | Jun 93 | 5 years | 4,938 | 1,596 | 3,342 | AFDC/JOBS plus Work Pays after late 1993 (d) | 7.7 | 694 | Same as above |
| Columbus Integrated | OH | Columbus | No | Recipients and applicants | Sep 92 | Jul 94 | 5 years (h) | 4,672 | 2,513 | 2,159 | AFDC/JOBS | 7.3 | 334 | Same as above |
| Columbus Traditional | OH | Columbus | No | Recipients and applicants | Sep 92 | Jul 94 | 5 years (h) | 4,729 | 2,570 | 2,159 | AFDC/JOBS | 7.3 | 334 | Same as above |
| Detroit | MI | Detroit | No | Recipients and applicants | May 92 | Jun 94 | 5 years (i) | 4,459 | 2,226 | 2,233 | AFDC/JOBS | 8.9 | 459 | Same as above |
| Oklahoma City | OK | Oklahoma City | No | Applicants | Sep 91 | May 93 | 5 years (e) | 8,677 | 4,309 | 4,368 | AFDC/JOBS | 6.7 | 341 | Same as above |
| Indiana Manpower Placement and Comprehensive Training Program (IMPACT) Basic Track | IN | Statewide | Yes | Recipients and applicants, less job ready | May 95 | Dec 95 (j) | 2 years (k) | 3,856 | 3,090 | 766 | AFDC/JOBS | 4.7 | 288 | Fein et al. (1998) |
| D. Programs that focus on financial work incentives and mandatory work-related activities | ||||||||||||||
| Welfare Restructuring Project (WRP) | VT | 6 welfare districts | Yes | Single-parent recipients and applicants (b) | Jul 94 | Jun 95 | 42 months | 4,376 | 3,267 | 1,109 | AFDC/JOBS | 4.7 | 638 | Bloom et al. (1998); Hendra and Michalopoulos (1999); Bloom, Hendra and Michalopoulos (2000) |
| Minnesota Family Investment Program (MFIP) | MN | 3 urban counties (l) | No | Urban single parent long-term (> 24 mos. in last 36 mos.) recipients (b) | Apr 94 | Mar 96 | through 6/98 | 1,780 | 846 | 934 | AFDC/JOBS | 4.0 | 532 | Miller et al. (1997); Miller et al. (2000); Gennetian and Miller (2000) |
| | | | | Urban single parent recent applicants (b) | Apr 94 | Mar 96 | through 6/98 | 4,049 | 1,916 | 2,133 | | | | |
| To Strengthen Michigan Families (TSMF) | MI | 4 local service offices | Yes | Single-parent recipients (b) | Oct 92 | Oct 92 | 4 years | 8,739 (m) | 4,462 | 4,277 | Until 10/94: AFDC/JOBS; after 10/94: modified AFDC/JOBS | 8.9 | 459 | Werner and Kornfeld (1997) |
| | | | | Single-parent applicants (b) | Oct 92 | Sept 95 | 1 to 2 years | 6,042 (m) | 3,017 | 3,025 | | | | |
| Family Investment Program (FIP) | IA | 9 counties | Yes | Recipients | Sept 93 | Sept 93 | 14 quarters | 6,684 | 4,461 | 2,223 | AFDC/JOBS | 4.0 | 426 | Fraker and Jacobson (2000) |
| | | | | Applicants | Oct 93 | Mar 95 | 8 quarters | 6,009 | 3,973 | 2,036 | | | | |
| E. Programs that focus on other individual reforms | ||||||||||||||
| Arkansas Welfare Waiver Demonstration Project (AWWDP) | AR | 10 counties | N.A. | Recipients and applicants | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | N.A. | Turturro et al. (1997) |
| Family Development Program (FDP) (n) | NJ | 10 counties (o) | Yes | Recipients | Oct 92 | | through 12/96 | 4,875 | 3,268 | 1,607 | AFDC/JOBS | 8.5 | 424 | Camasso, Harvey and Jagannathan (1996); Camasso et al. (1998, 1999) |
| | | | | Applicants | Oct 92 | Dec 94 | through 12/96 | 3,518 | 2,233 | 1,285 | | | | |
| Primary Prevention Initiative (PPI) | MD | 6 welfare offices (4 urban, 2 rural) | Yes | Recipients and applicants | Jun 92 | Aug 95 | 1 to 2 years | 1,775 (p) | 911 | 864 | AFDC/JOBS | 6.7 | 377 | Minkovitz et al. (1999) |
| Preschool Immunization Project (PIP) | GA | Muscogee County | Yes | Recipients (q) | Nov 92 | Nov 92 | 4 years | 2,801 (r) | 1,076 | 1,725 | AFDC/JOBS | 7.0 | 280 | Kerpelman, Connell, and Gunn (2000) |
| F. Programs that focus on TANF-like bundle of reforms (time limits with financial incentives, work-related activities, or both) | ||||||||||||||
| Employing and Moving People Off Welfare and Encouraging Responsibility (EMPOWER) | AZ | 3 in Phoenix; 1 on Navajo reservation | Yes | Recipients (including those receiving TMA) | Oct 95 | Oct 95 (s) | 36 months (t) | 2,934 | 1,476 | 1,458 | AFDC/JOBS | 5.1 | 347 | Kornfeld et al. (1999) |
| Indiana Manpower Placement and Comprehensive Training Program (IMPACT) Placement Track | IN | Statewide | Yes | Recipients and applicants, more job ready | May 95 | Dec 95 (j) | 2 years (k) | 5,595 | 4,537 | 1,058 | AFDC/JOBS | 4.7 | 288 | Fein et al. (1998) |
| Virginia Independence Program (VIP) and Virginia Initiative for Employment not Welfare (VIEW) (u) | VA | 3 cities: Lynchburg, Petersburg, and Portsmouth (v) | Yes | Recipients (w) | Jul 95 | Jul 95 | 27 months | 7,568 | 3,784 | 3,784 | AFDC/JOBS | 4.5 | 354 | Gordon and Agodini (1999) |
| A Better Chance (ABC) | DE | 5 pilot offices | Yes | Single parent recipients and applicants (x) | Oct 95 | Sept 96 (y) | max. 18 mos. (z) | 3,959 | 2,138 | 1,821 | AFDC/JOBS | 4.3 | 338 | Fein and Karweit (1997); Fein (1999); Fein and Lee (2000); Fein et al. (2001) |
| Family Transition Program (FTP) | FL | Escambia County | No | Recipients and applicants | May 94 | Feb 95 | 4 years | 2,815 | 1,405 | 1,410 | AFDC/JOBS | 6.6 | 303 | Bloom et al. (1999); Bloom et al. (2000a) |
| Jobs First | CT | Manchester; New Haven | Yes | Recipients and applicants | Jan 96 | Feb 97 | 4 years | 4,803 | 2,396 | 2,407 | AFDC/JOBS | 5.7 | 636 | Bloom et al. (2000b); Hendra, Michalopoulos and Bloom (2001); Bloom et al. (2002) |
| NOTES: Abbreviations: T=treatment; C=control; U=unemployment; BC=British Columbia; NB=New Brunswick; N.A.=not available; IA=Canadian Income Assistance; FT=full-time; RA=random assignment; TMA=transitional medical assistance. a. For one adult and two children. b. Evaluation also includes sample of two-parent families with results reported separately. c. All monetary values in Canadian dollars. d. Under Work Pays, the earnings disregard was $120 and the BRR was 67 percent, and a higher needs standard was used for "fill-the-gap" budgeting. e. Controls became subject to treatment conditions beginning in the fourth quarter of 1996. f. A higher needs standard (equal to $424 in 1993 for a family of three) was used for "fill-the-gap" budgeting. g. Controls assigned before January 1993 became subject to treatment conditions three years after RA. h. Controls became subject to treatment conditions beginning in the fourth quarter of 1997. i. Controls became subject to treatment conditions three years after RA. j. Randomization scheduled to end in December 1999; evaluation includes participants in first 8 months. k. Those entering after June 1995 observed for up to 6 months after Basic and Placement track distinction was eliminated in June 1997. l. Demonstration also implemented in 4 rural counties with results reported separately. m. Sample sizes are for combined one- and two-parent families. 87% of recipient cases and 80% of applicant cases are one-parent families. n. FDP provisions phased in between October 1992 and October 1993. o. Implementation of the FDP provisions was delayed in two counties until January 1995. Some results only pertain to 8 counties with implementation by October 1993. p. Sample sizes refer to number of children age 3 to 24 months with complete medical records abstraction for analysis of vaccination status at 1- and 2-year followups; a larger sample of families were in the experiment. q. Reforms applied to recipients and applicants but only former group included in evaluation. r. Sample sizes refer to number of children up to age 6 with complete medical records abstraction; 2,500 families were in the treatment and control groups. s. Randomization continued for new applicants from November 1995 to July 1997; evaluation includes only recipients as of October 1995. t. EMPOWER REDESIGN implemented in August 1997 under TANF applied to both treatment and controls. u. VIP provisions implemented in July 1995 but VIEW provisions phased in at five demonstration sites between July 1995 and October 1997. v. Evaluation also includes 2 counties but staggered implementation means no exposure to new rules at time of follow-up. w. Evaluation also includes sample of applicants between July 1995 and September 1996 but staggered implementation means little exposure to new rules at time of follow-up. x. Evaluation also includes sample of two-parent families but sample sizes were too small for separate analysis. y. Randomization continued through February 1997; evaluation includes participants in first 12 months. z. Controls became subject to treatment conditions beginning March 1997. |
The studies are grouped in both tables according to their central reform or reforms. Our categorization contains six groups. The first group, shown in Panel A of Table 3.4, consists of three experiments that focus on financial work incentives: the California Work Pays Demonstration Project (CWPDP) and the Incentives-Only components of the Vermont Welfare Restructuring Project (WRP-IO) and the Minnesota Family Investment Program (MFIP-IO). MFIP-IO and WRP-IO were parts of dual-treatment experiments in which the Incentives-Only groups experienced only the financial work incentives, while the other experimental groups were subject to work-related activity mandates as well.
The programs listed in Panel B also provide financial work incentives, but they are implemented as earnings supplements outside the welfare system. Moreover, the earnings supplements were available only to participants who worked at least a minimum number of hours. New Hope was conducted in Wisconsin; the Self-Sufficiency Project (SSP) programs were carried out in Canada.
The third group of studies, shown in Panel C, consists of programs that imposed or strengthened requirements for mandatory work-related activities. These studies include Los Angeles Jobs-First GAIN (Greater Avenues for Independence), the 11 programs evaluated in the National Evaluation of Welfare-to-Work Strategies (NEWWS), and the Basic Track of the Indiana Manpower Placement and Comprehensive Training Program (IMPACT).
The studies listed in Panel D combine mandatory work-related activities and a financial work incentive. In addition to the full WRP and MFIP programs, programs in Michigan (To Strengthen Michigan Families, TSMF) and Iowa (Family Investment Program, FIP) are included in this group. As Table 3.5 shows, MFIP and FIP provided more generous financial work incentives than WRP or TSMF.
Category E consists of four programs that focus on various other reforms. The Arkansas Welfare Waiver Demonstration Project (AWWDP) and New Jersey Family Development Program (FDP) each evaluate a family cap provision alone or with other reforms. The Maryland Primary Prevention Initiative (PPI) and Georgia Preschool Immunization Project (PIP) evaluate parental responsibility requirements focused on immunizations or preventative health care for children more generally.
The six evaluations in category F add time limits to a program with financial work incentives and/or mandatory work-related activities. Arizona’s EMPOWER (Employing and Moving People Off Welfare and Encouraging Responsibility) program combines a time limit with somewhat stricter JOBS sanctions. The other five programs–Indiana’s IMPACT Placement Track program, the Virginia Independence Program/Virginia Initiative for Employment not Welfare (VIP/VIEW), Delaware’s A Better Chance (ABC) program, Florida’s FTP, and Connecticut’s Jobs First program–combine a time limit with financial work incentives and a work requirement. Of all the programs we consider, these incorporate the most TANF-like bundles of reforms. They provide information about the effects of reform as a bundle.
Table 3.4 reveals that the evaluations we draw on were implemented in the 1990s under state waivers prior to the passage of PRWORA, with randomization periods that range from mid-1991 to early 1997. Thus, the reforms implemented under the studies listed in Table 3.4 are not necessarily representative of the range of individual reforms or range of policy bundles implemented across the states under PRWORA, especially at the less generous end of the spectrum (i.e., lower benefit levels, weaker financial work incentives, stricter work requirements and sanctions, and shorter time limits). Furthermore, for most settings, the economy was steadily improving during the period of randomization and follow-up. Even so, there is considerable variation across the evaluations in the initial state of the economy and the generosity of the state welfare program in terms of benefit levels (see Table 3.4).
In terms of measuring outcomes, Table 3.4 shows that about two years of follow-up data are typically available, although some programs have observed participants for up to five years post-randomization. Unless otherwise noted, the sample sizes shown are the maximum number of study participants available for analysis. In some cases, results discussed in subsequent chapters are based on smaller samples, especially when outcomes derive from survey data where the samples are often a subset of the full study population.
Most programs served both longer-term recipients and new applicants, with both single-parent and two-parent families eligible. When results are available separately for single parents, we show sample sizes specific to that group and report results in the synthesis chapters that exclude two-parent families. (When results are only available for a combined sample, the single-parent families usually dominate the sample.) Likewise, when available, we separately report sample sizes and results for recipients (those on welfare at the time of randomization) and applicants (those randomized at the time of application to welfare). In addition to stratifying results for one- and two-parent families and for recipients and applicants, many of the random assignment studies we analyze also report results for other subgroups of the study population, for example, subgroups defined by educational attainment, employment history, or various composite measures of disadvantage. Appendix A discusses the key results from these subgroup analyses for each of the outcomes we consider in the synthesis. These findings are referenced in the individual chapters as well.
Finally, it is worth noting at this stage that a number of the methodological issues summarized in Section 3.1.1 above apply to the random assignment studies summarized in Tables 3.4 and 3.5. These methodological concerns will affect the weight we place on these particular studies throughout the chapters that follow. In particular, experiments will only yield valid estimates of the effect of ongoing (nonexperimental) implementation if they mimic the conditions of ongoing (nonexperimental) implementation. Failure of recipients to understand the policies that affected them–at a level similar to the level that would be expected in an ongoing (nonexperimental) program–violates this condition. When that happens, the estimated effects may be too small.
Confusion about program rules was a problem in a number of evaluations. In Arizona, 56 percent of the control group versus 61 percent of the treatment group thought they were affected by time limits (Kornfeld et al., 1999). In Delaware, a similar problem occurred with 66 percent of the controls reporting that they thought they had a time limit, compared with 84 percent of the treatment group (Fein and Karweit, 1997). Indiana is another example of this type of control group contamination. This confusion over key policy provisions leads us to place less weight on these programs in the chapters that follow.
In addition, in VIP/VIEW in Virginia and FDP in New Jersey, implementation of the treatment reforms was staggered, so that the "exposure" to the treatment varies across the study population. Another issue is that there are several cases where the treatment changed during the period of randomization (e.g., Michigan’s TSMF) or during the period of follow-up (Indiana’s IMPACT). This complicates the interpretation of the treatment impacts, which become a mixture of the two regimes. In several other studies (e.g., Arizona’s EMPOWER and Delaware’s ABC), the experiment was terminated and the reforms (or a modified set of reforms) were applied to all study participants. Some of the long-term results from NEWWS may also be affected by control-group crossover, because at many of the sites, the control groups became eligible for program services during the fourth or fifth year of the follow-up. This control-group crossover limits the period during which "pure" treatment effects can be measured. This type of crossover was most likely to occur when a state implemented its TANF program in the late 1990s.
3.4.2. Outcomes Covered by Synthesis Studies
Table 3.6 summarizes the outcomes covered by the econometric and random assignment studies we include in our synthesis. The columns of the table pertain to the outcome chapters that follow: welfare caseload, employment and earnings, use of other government programs, and so on. In the case of the econometric studies, we tally the number of studies that examine a given outcome and note which studies analyze the CPS, which studies analyze administrative data (the two primary sources of data for econometric studies that meet our quality criteria), and which studies analyze other data sources. For random assignment studies, we simply indicate when the impact analysis includes one or more measures in each outcome category.29
| Study | State | Welfare use | Employment and earnings | Use of other government programs | Fertility and marriage | Income and poverty | Other measures of well-being | Child outcomes |
|---|---|---|---|---|---|---|---|---|
| I. Econometric Studies | ||||||||
| Econometric -- Administrative Data | -- | 13 | 4 | 1 | ||||
| Econometric -- CPS Data | -- | 6 | 5 | 1 | 3 | 4 | ||
| Econometric -- Other Data | -- | 3 | 3 | |||||
| II. Experimental studies (random assignment) | ||||||||
| A. Programs that focus on financial work incentives | ||||||||
| CWPDP | CA | X | X | X | X | |||
| WRP-IO | VT | X | X | X | X | X | X | |
| MFIP-IO | MN | X | X | X | X | X | X | X |
| B. Programs that focus on financial work incentives tied to hours of work | ||||||||
| New Hope | WI | X | X | X | X | X | X | |
| SSP | Canada | X | X | X | X | X | X | |
| SSP Plus | Canada | X | X | X | ||||
| SSP Applicants | Canada | X | X | X | ||||
| C. Programs that focus on mandatory work-related activities | ||||||||
| LA Jobs-1st GAIN | CA | X | X | X | X | X | X | X |
| Atlanta LFA | GA | X | X | X | X | X | X | X |
| Grand Rapids LFA | MI | X | X | X | X | X | X | X |
| Riverside LFA | CA | X | X | X | X | X | X | X |
| Portland | OR | X | X | X | X | X | X | X |
| Atlanta HCD | GA | X | X | X | X | X | X | X |
| Grand Rapids HCD | MI | X | X | X | X | X | X | X |
| Riverside HCD | CA | X | X | X | X | X | X | X |
| Columbus Integrated | OH | X | X | X | X | X | X | X |
| Columbus Traditional | OH | X | X | X | X | X | X | X |
| Detroit | MI | X | X | X | X | X | X | X |
| Oklahoma City | OK | X | X | X | X | X | X | X |
| IMPACT Basic Track | IN | X | X | X | X | |||
| D. Programs that focus on financial work incentives and mandatory work-related activities | ||||||||
| WRP | VT | X | X | X | X | X | X | |
| MFIP | MN | X | X | X | X | X | X | X |
| TSMF | MI | X | X | X | X | X | ||
| FIP | IA | X | X | X | X | |||
| E. Programs that focus on other individual reforms | ||||||||
| AWWDP | AR | (X) | (X) | (X) | X | |||
| FDP | NJ | (X) | (X) | X | ||||
| PPI | MD | X | ||||||
| PIP | GA | X | ||||||
| F. Programs that focus on TANF-like bundle of reforms (time limits with financial incentives, work-related activities, or both) | ||||||||
| EMPOWER | AZ | X | X | X | X | X | X | |
| IMPACT Placement Track | IN | X | X | X | X | |||
| VIP/VIEW | VA | X | X | X | X | |||
| ABC | DE | X | X | X | X | X | X | |
| FTP | FL | X | X | X | X | X | X | X |
| Jobs First | CT | X | X | X | X | X | X | X |
| NOTES: For full program names and citations, see Table 3.4. X=results discussed in relevant synthesis chapter; (X)=results not discussed in synthesis chapter. |
This tabulation reveals that, with the exception of the random assignment studies in category E (other reforms), all the random assignment studies and the bulk of the econometric studies cover welfare utilization. All of the experimental studies also cover employment and earnings, use of government programs (with the exception of CWPDP and the Canadian SSP), and income and poverty, but far fewer econometric analyses examine these outcomes. A smaller number of demonstration studies examine family structure, other measures of well-being, and child well-being. For the last two outcome areas, virtually all the evidence comes from random assignment studies, since survey data with the other required characteristics (i.e., panel data and large, national samples) and administrative data generally do not cover these outcomes.
3.5. ASSESSING RESULTS FROM MULTIPLE STUDIES
The previous section illustrates the range of studies available for the synthesis, as well as the outcomes covered by those studies. However, it is not enough to simply tally all the findings across the available studies. Rather, we need an approach for weighing the findings from each analysis and assessing the strength of the cumulative evidence for each policy-outcome pair. In this section, we discuss how we take the methodological issues raised in this chapter into account when synthesizing findings across studies.
In general, because even the best studies have limitations, the results from a single study provide a weak basis for making policy decisions. One’s confidence would increase if the results were consistent when different data were analyzed and when different, valid methods were employed to deal with the problem of confounding influences. When different studies yield similar results, it is less likely that the results stem from either data problems or inadequate controls for confounding factors. Such robust results are more likely to represent the true effects of the policy in question.
Ideally, in assessing the effects of a particular policy on a particular outcome, we would have a large number of studies on which to draw, where the studies were based on different data and employed different methodological approaches. Of course, this ideal is not entirely realistic, since only ten years have passed since states first began experimenting with statewide waivers to the AFDC program, and only six years have passed since the enactment of PRWORA. As seen in Table 3.6, there are a few policy-outcome combinations for which many studies are available, but many more for which only a few exist.
Moreover, it is not only the number of studies that matters; rather, it is the number of studies that use different data and a mix of methods and still reach the same conclusions. In some cases, there appear to be several studies on a particular topic, but the studies are based on the same (or nearly the same) underlying data and use similar methods. In other cases, we have studies using different methods and data to explore the same topic. Confluent results from studies based on different data provide stronger evidence than confluent results from studies based on largely similar data.
Finally, as the discussion from the preceding section suggests, quality matters at least as much as quantity. Some random assignment studies carried out randomization properly, used data from multiple sources, and succeeded in communicating to the study participants which set of rules applied to them. These are the highest-quality random assignment studies, to which we give considerable weight. We give less weight to lower-quality studies, particularly those where the study participants (i.e., treatment and control group members) were unclear on the rules that applied to them.
Similarly, econometric studies differ in terms of their quality. Some provide rigorous controls for unobservable confounding factors and employ policy measures that capture multiple dimensions of policy variation. These represent the highest-quality econometric studies, to which we also give considerable weight. We give somewhat less weight to studies that employ controls for unobservable confounding factors but use only dummy variables to represent reform policies. We give little weight to studies that provide no controls for unobservables.
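The weighting rules described in the two preceding paragraphs can be written down as a simple rubric. The Python sketch below is only an illustration under stated assumptions: the class names, field names, and function names are hypothetical, and the labels it returns are qualitative because the text does not assign numeric weights to studies.

```python
from dataclasses import dataclass


@dataclass
class RandomAssignmentStudy:
    # Hypothetical fields mirroring the three criteria named in the text.
    randomization_sound: bool
    multiple_data_sources: bool
    participants_understood_rules: bool


@dataclass
class EconometricStudy:
    # Hypothetical fields mirroring the two criteria named in the text.
    controls_for_unobservables: bool
    multidimensional_policy_measures: bool  # False means dummy variables only


def random_assignment_weight(study: RandomAssignmentStudy) -> str:
    """Label the weight given to a random assignment study."""
    if (study.randomization_sound
            and study.multiple_data_sources
            and study.participants_understood_rules):
        return "considerable"
    return "less"


def econometric_weight(study: EconometricStudy) -> str:
    """Label the weight given to an econometric study."""
    if not study.controls_for_unobservables:
        return "little"
    if study.multidimensional_policy_measures:
        return "considerable"
    return "somewhat less"


# Example: a study with controls for unobservables but dummy-only policy measures.
example = EconometricStudy(controls_for_unobservables=True,
                           multidimensional_policy_measures=False)
print(econometric_weight(example))
```

In this sketch the check for controls for unobservables comes first, since the text gives little weight to any study that lacks them, however its policy measures are constructed.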
Since random assignment and econometric methods represent quite different approaches to the problem of confounding influences, we consider confluent results from high-quality random assignment studies and high-quality econometric studies to provide the strongest type of evidence on the effects of a particular policy. Of course, a multiplicity of high-quality studies of both types that point in the same direction yields stronger evidence still.
In the chapters that follow, we will see that there are very few policy-outcome combinations for which evidence of this quality and quantity is available. Thus, there are relatively few cases, albeit some important ones, where the research literature allows us to draw definitive conclusions about the effects of welfare reform. There are many policy-outcome pairs for which only one or two high-quality studies exist, and still more for which a few lower-quality studies are available. When the results from such studies point in the same direction, they provide suggestive evidence about the effects of the policy, particularly when the results are consistent with theoretical predictions. Nevertheless, evidence of this type is necessarily less conclusive than that from higher-quality studies. For a large number of policy-outcome pairs, little if any evidence is available.
15 In addition to information about leavers' welfare use, employment, and income, USDHHS (2001a) summarizes results regarding leavers' earnings, use of other government programs, and other forms of material hardship. Other leaver studies include Ahn et al. (2000), Coulton and Verma (2000), Du et al. (2000), Fogarty and Kranley (2000), Foster (1999), Julnes et al. (2000), Loprest (2000), Loprest and Acs (2000), Midwest Research Institute (2000), Moses and Macuso (1999), Rockefeller Institute (1999), Ryan et al. (1999), Verma (2000), and Westra and Routley (1999, 2000). For summaries of these and other leaver studies, see GAO (1999c), Committee on Ways and Means (2000), Acs and Loprest (2000), Isaacs and Lyon (2000), and Cancian et al. (1999a, 1999b, 2000).
16 For a discussion of random assignment in the social science context, see Burtless (1995) and Heckman and Smith (1995).
17 In practice, studies usually report more efficient results that use regression methods partially to control for the remaining (random) differences between the two groups.
18 On the ACF-USDHHS experience with waivers, see Harvey, Camasso, and Jagannathan (2000).
19 See Meyers, Glaser, and MacDonald (1998) on financial incentive changes as part of California's Work Pays Experiment. Harvey, Camasso, and Jagannathan (2000) note that at least some control group subjects in the section 1115 waiver demonstration studies believed they were subject to time limits, a family cap, or one of the other state welfare waiver provisions being evaluated. Similarly, Miller et al. (2000, Table B.1) report that many members of the Minnesota Family Investment Program (MFIP) treatment and control groups thought they were subject to time limits, even though MFIP did not involve time limits.
20 On DoD, see Meyer (1995). See also the discussion in Moffitt and Ver Ploeg (1999).
21 Because data from multiple states are necessary to implement the DoD procedure, we omit from the synthesis analyses based on data from single states, including Figlio and Ziliak (2000), Henry et al. (2000), and Klerman and Haider (2000).
22 We are also interested in how the effects of specific policies vary along other dimensions. Does a policy's effect vary across subgroups (e.g., whites versus blacks)? Does a policy's effect vary when the economy is good versus when the economy is bad? These interactions are potentially important (e.g., perhaps a work-based strategy will work when the economy is robust, because jobs are available, but would not be as effective when the economy experiences a downturn), but they are much harder to estimate. The synthesis chapters that follow find few studies that address these important issues.
23 Some states initially implemented their policy changes in only a portion of the state, which further complicates efforts to characterize the states' policies.
24 When the new policy was introduced within a calendar year, the dummy variable is usually replaced by a variable measuring the fraction of the year during which the new policy was in place.
25 Even when analysts agree on the appropriate way to conceptualize the policy environment, the devolution of welfare policy to the state (and even local) level has created challenges for researchers who want to assemble the required information about the timing and nature of specific reforms as adopted and implemented. Some of the information is recorded in official documents (e.g., waiver applications), and some has been collected in a coordinated fashion (e.g., the Urban Institute's Assessing the New Federalism project, and the State Policy Documentation Project of the Center for Law and Social Policy and the Center on Budget and Policy Priorities), but the characterization of programs as implemented remains incomplete.
26 One study suggests that as many as 540,000 recipients may have received full-family sanctions between 1997 and 1999 (Goldberg and Schott, 2000). However, that study fails to account for families that would have left welfare anyway. Thus it probably overstates the net effect of sanctions on the welfare caseload.
27 Only a "rough estimate" is needed because, to some extent, statistical methods can be used to smooth over the sampling variability in individual cells.
28 Since the econometric studies typically focus on one outcome, we defer a discussion of the methods for these studies to the synthesis chapters that follow. In contrast, the random assignment studies typically consider multiple outcomes, so the summary provided in this section serves as a reference for all the synthesis chapters.
29 For the Arkansas and New Jersey family cap demonstrations listed in Panel E of Table 3.6, we note that although the impact analyses provide some results for welfare utilization, employment and earnings, or the use of other government programs, we do not discuss these findings in Chapters 4, 5, and 6, respectively. Instead, since the main focus of these demonstrations is the impact on fertility, we only discuss these experiments in terms of their impact on this outcome in Chapter 7. Likewise, the impact analyses for the two parental responsibility demonstrations (PPI and PIP) are really relevant only for Chapter 10 on child outcomes.