ACF
Department of Health and Human Services
Administration for Children and Families
Office of Planning, Research & Evaluation (OPRE)
Chapter Three

METHODOLOGICAL ISSUES

Synthesizing the literature on the causal effects of the reform policies described in the last chapter is the goal of this report. To do so, we first must assess the quality of each individual study addressing a particular topic (i.e., a policy-outcome combination). We then must assess the quality and quantity of the entire body of evidence available on that topic. Three methodological issues influence our assessment of individual studies. The first issue concerns the methods that a study uses to draw causal inferences. The second issue concerns the nature and characterization of the policy variation that the studies use to estimate the effects of the policy. The third issue concerns the data used to measure the outcomes of interest.

In this chapter, we first discuss these factors and how they contribute to our assessment of individual studies. We then provide summary information about the studies we include in our synthesis. We conclude by discussing how we weigh the results from multiple studies to draw conclusions about the effects of a particular policy on a particular outcome.

3.1. METHODS OF CAUSAL INFERENCE IN INDIVIDUAL STUDIES

This synthesis aims to answer the question: What is the effect of a given policy (e.g., a lower benefit reduction rate or a time limit) on a given welfare-related outcome (e.g., the caseload or child development), holding all else equal? To do this, we review studies that attempt to assess the causal effects of welfare reform. Although we restrict our attention to causal studies, it is important to note that not all welfare reform studies attempt to assess causation. Nor are causal studies the only types of analyses that are useful for assessing the success of welfare reform.

Examples of some important noncausal studies are the leavers studies that track the behavior of families that have left the welfare rolls. The results from 15 such studies, most of which were funded by USDHHS, are summarized by USDHHS (2001a). Those results show that 17 to 38 percent of leavers return to the welfare rolls within one year after their exit, and between 62 and 90 percent work at some time during that same year. On average, leaver families have post-exit incomes that are similar to the incomes that they had while on aid.15

Such studies provide information that is essential for monitoring the status of families that leave aid. However, they do not provide estimates of the causal effects of welfare reform. They cannot compare the behavior we do observe to the behavior we would have observed if reform had not taken place. In the language of the evaluation literature, they provide no "counterfactual" against which to assess the effects of reform.

We focus this synthesis on the causal effects of welfare reform because this is what a policymaker needs to know to make informed policy choices. For example, suppose Congress considers eliminating the federal time limit. In considering such a policy change, members of Congress would want to know how welfare-related outcomes would differ between two different policy scenarios: (1) a baseline scenario that leaves the time limit in place and (2) an alternative scenario that eliminates the time limit. When making the comparison, the legislator would want to hold everything else constant, such as the effect of the economy.

This thought experiment defines what we mean by the effect of a policy, but it does not tell us how to measure it. The reason is that we do not observe the outcome under the counterfactual scenario. Rather, we observe only the outcome corresponding to the policy that was actually chosen. The challenge facing the analyst is to devise a research design that predicts what outcomes would have been under the counterfactual policy.

This requires the researcher to design and implement a research strategy that holds all else constant. If the researcher fails to hold constant other factors that could independently influence the outcome, such as the economy, then the resulting estimates of the effects of the policy may be misleading. In the evaluation literature, such estimates are termed biased or inconsistent. They may reflect not only the effect of the policy of interest, but also the effects of the other factors. These other factors that could yield misleading results are referred to as confounding influences, or simply confounders.

The literature on the effects of welfare policies has adopted two general research strategies for dealing with confounding influences: random assignment and econometric analyses of observational data. Both of these methods have strengths and weaknesses. Because both approaches contribute to our understanding of the effects of welfare reform, we discuss each of them in some detail. In the core synthesis chapters, we consider evidence from both types of studies.

3.1.1. Random Assignment

One attractive approach to the problem of confounding factors is random assignment.16 Rather than relying on existing variation in policies or programs, the analyst induces random variation. To test a new program, a study population is chosen, such as all persons receiving aid at a particular time. Then, each member of the study population is assigned either to the control group, which is subject to the baseline policy environment, or the treatment group, which is subject to the new policy environment. The assignment is determined by the logical equivalent of a coin toss.

In principle, this approach holds everything constant except the policy whose effect the analyst seeks to estimate. Since families assigned to the new program differ from those assigned to the baseline program only by a flip of a coin, confounding influences, such as the economy, should be identical for the two groups. If randomization is implemented properly, there should be no systematic differences across the two groups other than those attributable to the different policy environments. Thus, the average effect of the policy, which is referred to as the "treatment effect" or the "impact" of the policy, can be estimated by the difference in mean welfare-related outcomes between the two groups.17
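The difference-in-means estimator described above can be sketched in a few lines of Python. This is a minimal illustration rather than code from any of the studies reviewed: the function name `estimate_impact` and the earnings figures are hypothetical, and the standard error assumes independent samples.

```python
from math import sqrt
from statistics import mean, variance


def estimate_impact(treatment, control):
    """Return (impact, standard_error) for a randomized experiment.

    The impact is the difference in mean outcomes between the treatment
    and control groups; the standard error assumes independent samples.
    """
    impact = mean(treatment) - mean(control)
    se = sqrt(variance(treatment) / len(treatment)
              + variance(control) / len(control))
    return impact, se


# Hypothetical quarterly earnings (dollars) for a handful of families.
treatment_earnings = [1200, 900, 1500, 1100, 1300]
control_earnings = [1000, 800, 1200, 900, 1100]

impact, se = estimate_impact(treatment_earnings, control_earnings)
print(f"estimated impact: {impact:.1f} (standard error {se:.1f})")
```

Because assignment is random, no control variables are needed for the estimate to be unbiased. In practice, evaluations often add covariates to improve precision.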

Such random assignment experiments can be a powerful evaluation tool. The crucial importance of controlling for confounding factors and the potential of random assignment for doing so led ACF-USDHHS to require random assignment evaluations as a component of section 1115 waivers.18 We review many of the studies emerging from such waiver evaluations in the chapters that follow.

Despite their advantages, however, random assignment studies have a number of disadvantages. First, random assignment evaluations can be conducted only when random assignment was performed at implementation. Random assignment is not always feasible. Even when it would be feasible, it is expensive and difficult to implement. As a result, random assignment is not always built into a program’s implementation.

Second, random assignment evaluations of welfare reform capture the effect of the new policy only from the point of randomization, almost always for women who are on welfare. Some reforms, however, such as work requirements and policies designed to affect fertility and family formation, are expected to deter people from ever using welfare. Since individuals deterred from entering the program will never be included in the study population, conventional random assignment evaluations will not capture the effects of reform on welfare entries. This is important because recent evidence suggests that more than half of the decline in the welfare caseload results from changes in entry rates rather than from changes in exit rates (Haider, Klerman, and Roth, 2001).

Third, random assignment experiments may not reproduce the environment of a universally implemented program. Broader implementations may affect labor markets or service providers (e.g., the capacity of educational institutions). Experiments may be more likely to be implemented in locations with above-average management capability and may attract the best managers. Thus, implementation in other sites or more broadly in a given site may yield smaller effects.

Fourth, random assignment studies are not immune to problems that can bias their findings. For example, experimental contamination may result when treatment group members "cross over" by moving to a location that is not part of the evaluation, or when control group members become eligible to receive the program services. Sample attrition in subsequent follow-up data collection may be nonrandom and may therefore bias the estimated treatment effects (Heckman, Smith, and Taber, 1998).

A final important problem involves participants’ perceptions of the rules that apply to them. In several of the random assignment studies we summarize below, members of the treatment group were confused about which of the new policies applied to them. In others, members of the control group incorrectly believed that they were subject to the new reforms. This latter form of confusion is of particular concern in exceptional-control evaluation designs, where almost all the population is subject to the new "treatment" rules and only a small fraction of the study population is held back under the old "control" rules. In an environment where most recipients are subject to welfare reform and welfare reform receives considerable public and media attention, it may be difficult to persuade the controls that they are not subject to the new rules themselves.19 Consequently, the control group may behave more like the treatment group, thereby biasing the estimated program impacts toward zero.
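A stylized calculation shows why this form of confusion pulls estimated impacts toward zero. The notation is introduced here only for illustration: assume that the true impact is a constant tau, that untreated families have mean outcome mu, and that a fraction c of control group members behave exactly as if they were subject to the new rules.

```latex
\begin{aligned}
E[\bar{Y}_T] &= \mu + \tau, \\
E[\bar{Y}_C] &= (1-c)\,\mu + c\,(\mu + \tau) = \mu + c\,\tau, \\
E[\bar{Y}_T - \bar{Y}_C] &= (1-c)\,\tau .
\end{aligned}
```

Under these assumptions, if 20 percent of controls behaved as though they were treated (c = 0.2), the experiment would recover only 80 percent of the true impact.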

3.1.2. Econometric Methods for Observational Data

An alternative to random assignment is to analyze observational data, that is, to compare outcomes across different policy regimes (time-place combinations), typically using administrative data or national survey data. This is the primary method available to evaluate reforms that were not incorporated into the random assignment experiments. Unlike conventional random assignment, analyses of observational data can capture the effects of reform on welfare entries.

The key methodological problem with the analysis of observational data is that, while the policy environment will vary across observations, many confounding factors will vary as well. For policy analysis, we want to measure the effect of the policies, holding all else equal. To estimate that effect, we need to control for these confounding influences.

Regression analysis is the standard approach to this problem. To control for the effect of the economy, for example, econometric studies usually include the local unemployment rate as an independent variable in a linear regression model. In this way, standard regression methods control for observable confounding factors under the implicit assumptions that their effects are linear and additive. If these assumptions are correct and the observed variable (i.e., the unemployment rate) adequately represents the potentially confounding factor (i.e., the economy), then regression techniques eliminate any bias that could otherwise arise as a result of such a confounding factor.
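In symbols, a regression of this kind might be written as follows. The notation is a sketch added for illustration: y_st is the welfare-related outcome in state s and year t, P_st indicates whether the policy is in place, U_st is the local unemployment rate, and epsilon_st is the error term.

```latex
y_{st} = \alpha + \beta\,P_{st} + \gamma\,U_{st} + \varepsilon_{st}
```

Here beta is the policy effect of interest. The estimate of beta is free of bias from the economy only if the unemployment rate adequately captures economic conditions and enters linearly and additively, as the specification assumes.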

Because many of the factors influencing welfare-related outcomes are difficult or even impossible to measure, the principal challenge facing econometric studies is controlling for unobserved, or unmeasurable, confounding influences. Now even more than before PRWORA, states determine their welfare programs. However, states differ in many ways besides their welfare programs, and some of those differences (in general attitudes or political sentiment, for example) may affect both welfare-related outcomes and welfare policy in the state. If such unobservable differences are not somehow controlled for, then the analyst may erroneously attribute to welfare policy changes in welfare-related outcomes that are, in fact, attributable to unobserved factors. This problem of unobservable influences that may confound the relationship between welfare policy and welfare-related outcomes goes by many names in the research literature, including "unobserved heterogeneity," "policy endogeneity," "omitted variable bias," and "spurious correlation." It is the central threat to the validity of econometric studies.

In the literature on welfare reform, the standard approach to this problem is known as the difference-of-differences (DoD) method. DoD controls for two types of unobservable influences: those that vary between states, but are constant within a state over time; and those that vary over time, but similarly for all states. An example of a state-specific, time-invariant unobservable might be the state’s general political leaning. An example of a uniform national trend might be the macroeconomic policy environment.

To illustrate how the DoD approach works, we consider a very simple example in which there are two states and two time periods. Both state 1 and state 2 have the baseline policy in the first period. State 1 adopts the new policy in the second period, but state 2 does not. The DoD estimator of the effects of the new policy compares the before-and-after change in welfare-related outcomes in state 1 to the before-and-after change in state 2. Because both before-and-after changes are computed within the respective states, both control implicitly for time-invariant, state-specific factors that could confound the relationship between the policy change and the welfare-related outcome. Because both changes are computed between the same two time periods, their difference (that is, the difference of differences) nets out the influence of any nationwide trends.
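The two-state, two-period calculation can be written out directly. The sketch below uses hypothetical caseload figures, and the function name `difference_of_differences` is ours.

```python
def difference_of_differences(adopter_before, adopter_after,
                              comparison_before, comparison_after):
    """Two-state, two-period DoD estimate of a policy's effect."""
    change_in_adopter = adopter_after - adopter_before
    change_in_comparison = comparison_after - comparison_before
    return change_in_adopter - change_in_comparison


# Hypothetical caseloads (thousands of families).
# State 1 adopts the new policy in period 2; state 2 does not.
state1 = {"before": 100.0, "after": 80.0}
state2 = {"before": 60.0, "after": 55.0}

effect = difference_of_differences(state1["before"], state1["after"],
                                   state2["before"], state2["after"])
print(effect)
```

In this example the caseload falls by 20 in state 1 and by 5 in state 2, so the DoD estimate attributes a decline of 15 to the policy. The different starting levels in the two states (100 versus 60) do not affect the estimate, because each change is computed within a state.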

Most of the econometric literature evaluating the effects of welfare reform uses a generalization of this DoD approach.20 In that generalization, this basic DoD insight is implemented in a multiple regression framework using dummy variables for each state and for each year. Because it controls for such state fixed-effects, the DoD estimator is often referred to as the "state fixed-effects" estimator. The multiple regression framework allows for more than two states and more than two periods. It also allows for the fact that states adopt new policies at different times. It can allow for state-specific linear (or quadratic) time trends to capture smoothly trending changes in a state over time. Finally, it allows the analyst to control explicitly for observable factors such as the unemployment rate and benefit levels.21
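A minimal sketch of the state fixed-effects estimator follows, assuming a balanced panel and a single policy variable with no other covariates. Under those assumptions, subtracting state means and year means from each variable is equivalent to including a dummy variable for each state and each year. The function name and the data are hypothetical, and published studies use full regression software rather than code of this kind.

```python
from statistics import mean


def two_way_fixed_effects(outcome, policy):
    """Within estimator for a balanced state-by-year panel.

    outcome[s][t] and policy[s][t] hold the value for state s in year t.
    """
    n_states, n_years = len(outcome), len(outcome[0])

    def demean(x):
        # Remove state means and year means, then add back the grand mean.
        state_means = [mean(row) for row in x]
        year_means = [mean([x[s][t] for s in range(n_states)])
                      for t in range(n_years)]
        grand_mean = mean(state_means)
        return [[x[s][t] - state_means[s] - year_means[t] + grand_mean
                 for t in range(n_years)] for s in range(n_states)]

    y, p = demean(outcome), demean(policy)
    cells = [(s, t) for s in range(n_states) for t in range(n_years)]
    numerator = sum(p[s][t] * y[s][t] for s, t in cells)
    denominator = sum(p[s][t] ** 2 for s, t in cells)
    return numerator / denominator


# Hypothetical balanced panel: three states observed for four years,
# constructed with a built-in policy effect of -15.
state_effects = [100.0, 60.0, 80.0]
year_effects = [0.0, -2.0, -5.0, -9.0]
policy = [[0, 1, 1, 1],   # adopts in the second year
          [0, 0, 0, 0],   # never adopts
          [0, 0, 1, 1]]   # adopts in the third year
outcome = [[state_effects[s] + year_effects[t] - 15.0 * policy[s][t]
            for t in range(4)] for s in range(3)]

beta = two_way_fixed_effects(outcome, policy)
print(round(beta, 6))
```

With two states and two periods, the same function reproduces the simple DoD calculation, which is one way to see that the regression is a generalization of it.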

3.2. MEASURING THE POLICY ENVIRONMENT

Ideally, we would like to learn the effect of each individual policy embedded in the TANF reforms. Furthermore, we would like to learn the extent to which policies interact with each other, such that the effect of adopting two policies together is greater than (or less than) the sum of the effects of adopting each policy alone.22 To do so, we need to observe outcomes when a single policy or a given bundle of policies is implemented. Several issues arise in measuring the policy environment that are relevant for both random assignment and econometric studies. In this section, we discuss the nature and characterization of the policy environment, first in random assignment studies and then in econometric studies.

3.2.1. Characterizing the Policy Environment in Random Assignment Studies

Random assignment studies are designed to measure the impact of the "treatment," that is, the program features that differ between the experimental and control groups. The experimenter thus controls the policy environment being evaluated through the design of the program. In the case of welfare reform evaluations, these program features include financial incentives to encourage work, requirements to participate in work-related activities, sanctions policies, parental responsibility requirements, and so on.

For a few of the random assignment studies we consider, the treatment consists of a single policy reform, such as a family cap or a parental responsibility requirement. Two studies employ dual-treatment designs. These involve two experimental groups (in addition to the control group), both of which experience financial work incentives and one of which additionally is subject to mandated work-related activities. The dual-treatment design provides information about the effect of financial work incentives and the incremental effect of the work-related activity mandate. However, most of the random assignment studies were implemented to evaluate multifaceted state waiver programs rather than specific reform policies. Therefore most studies involve a single treatment group that is subject to multiple policy reforms. Such designs shed light on the effects of reform as a bundle, although they generally cannot be used to estimate the impact of specific reforms.

While the policy or policies being evaluated in a random assignment study may be known in principle, another issue that affects experimental studies is program implementation. Programs with the same name are often implemented very differently in one place than they are in another. Insight into implementation issues is often provided by a process study involving analyses of program records, a caseworker survey, or a recipient survey. These process analyses might reveal that some locations have the management capacity to train and motivate employees to implement a new program, while other places appear to be less successful. One might argue that we should be interested in the "average" effect of implementation. There is concern, however, that the sites participating in demonstration programs have more management capacity (or use it more effectively in the demonstration sites) than would be the case for the average site trying to implement the reforms in the context of a statewide program. In the individual synthesis chapters, we sometimes note external evidence that suggests that some of the variation across sites may result from differences in the quality of implementation.

Finally, while our synthesis focuses on a set of random assignment studies implemented during the 1990s, primarily as part of implementing welfare waivers, the individual policy components or even the bundle of components evaluated in the experiments do not necessarily equate with the state-level reforms implemented under TANF. Thus, while the impact estimates from the evaluations may capture the effect of implementing a given reform or set of reforms in a given setting during a given time period, they may not generalize to the impact that we would expect to see for the TANF program as implemented in an entire state in the post-TANF era. The findings from the random assignment studies are most useful in demonstrating the direction and magnitude of the effects for a particular reform or group of reforms, but we must be more circumspect in the inferences we draw from these studies about the impacts of policies as actually implemented across the states.

3.2.2. Characterizing the Policy Environment in Econometric Studies

While random assignment studies generate their own policy variation to estimate the effects of the policy, econometric studies make use of the variation in policies that exists between states and over time. This requires the analyst to characterize the welfare policies that are in place in each state in each year. This has proven to be a difficult undertaking, in part because there are so many policy components to characterize and in part because a single policy component can vary along several dimensions. Moreover, those dimensions may be difficult to quantify and even more difficult to measure as actually implemented at the state or local level.23

The approach taken by most analysts has been to specify a policy change in terms of the date on which the policy (or policy bundle) was adopted. The analyst constructs a dummy variable that is equal to one after the policy is in place and equal to zero before the policy is in place.24 The analyst then includes that dummy variable in her regression model. In the context of a DoD regression, the coefficient attached to such a variable indicates how much the welfare-related outcome changed in the state that adopted the policy once the policy was adopted, implicitly using states that did not adopt the policy change at the same time to control for unobservable confounding factors.
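Constructing such a dummy variable is straightforward, as the following sketch shows. The state names and adoption years are hypothetical, and treating the adoption year itself as fully covered by the policy is an assumption of this example rather than a convention taken from the studies.

```python
def policy_dummy(year, adoption_year):
    """Return 1 once the policy is in place, 0 before (or if never adopted)."""
    if adoption_year is None:
        return 0
    return 1 if year >= adoption_year else 0


# Hypothetical adoption years; None marks a state that never adopts.
adoption_years = {"State A": 1994, "State B": 1996, "State C": None}
years = range(1992, 1999)

# One row of zeros and ones per state, one entry per year.
panel = {state: [policy_dummy(year, adopted) for year in years]
         for state, adopted in adoption_years.items()}

for state, dummies in panel.items():
    print(state, dummies)
```

In a DoD regression, State C and the not-yet-reformed years of State B serve as the implicit comparison for State A after 1994.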

There are some ambiguities associated with this approach. For example, some analysts assume that the policy was in place once it was passed into law, whereas others assume that it was not in place until it was officially implemented.25 In practice, the appropriate date is probably even later, when the program is rolled out, staff are trained, and news of the new environment reaches recipients. Such in-the-field implementation dates are difficult to operationalize, which explains why they have never been used in observational studies. As a result, the policy dummy typically switches on before the policy is fully in effect, so estimates of program effects are likely to be too conservative (i.e., too small).

The main virtue of the dummy-variable approach is that it is simple and transparent. Analysts may disagree about whether the legal adoption date or the official implementation date best characterizes the date on which the policy was put in place, but such disagreements are narrow. Moreover, one can test whether the difference is important empirically.

However, an important drawback of using adoption dates alone to characterize reform policies is that they capture only one dimension of policy variation. They provide no information about other dimensions of variation that might have important effects on behavior. For example, many states implemented financial work incentives during the waiver period. Some states cut the benefit reduction rate from essentially 100 percent to 75 percent, whereas others cut it to 50 percent or less. Economic theory predicts that bigger incentives should have stronger effects on employment, but the dummy variable approach treats all financial work incentives as being equal. Thus, it misses an important dimension of policy variation.

Furthermore, the dummy variable approach may be more susceptible to confounding factors than policy variables that capture additional dimensions of variation. Dummy welfare reform variables tend to equal zero in the early part of the sample period and to equal one at the end of the sample period. Thus, they are correlated with trends, or more precisely, in the context of DoD regressions, with state-specific trends that deviate from national trends. If the analyst fails to control for such trends, the results may be biased estimates of the effects of reform, since the reform dummy is correlated with the trends.

One approach to this problem is to characterize the reform more completely. Rather than simply including a dummy variable, it is sometimes possible to include a variable describing the intensity of the reform (e.g., the size of a financial work incentive). In this case, we would expect to find not only an effect when the policy is adopted, but also a larger effect when a stronger form of the policy is adopted. The variation across states allows more precise estimates. In addition, the additional implication that the effect should vary with the strength of the reform can be tested.
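The contrast between a dummy and an intensity measure can be illustrated with the benefit reduction rates mentioned earlier. The sketch below is hypothetical: the state names and function names are ours, and the income calculation ignores taxes, earnings disregards, and other program rules.

```python
def incentive_intensity(baseline_rate, reformed_rate):
    """Percentage-point cut in the benefit reduction rate."""
    return baseline_rate - reformed_rate


def net_gain_from_earnings(earnings, benefit_reduction_rate):
    """Income kept from extra earnings after the grant is reduced."""
    return earnings * (1 - benefit_reduction_rate / 100)


# Two hypothetical reform states, both starting from a 100 percent rate.
baseline_rate = 100
reformed_rates = {"State A": 75, "State B": 50}

coded = {}
for state, rate in reformed_rates.items():
    coded[state] = {
        "dummy": 1,  # identical for both states
        "intensity": incentive_intensity(baseline_rate, rate),
        "kept_per_100_earned": net_gain_from_earnings(100, rate),
    }
    print(state, coded[state])
```

The dummy equals one in both states, whereas the intensity variable records that State B's incentive is twice as large: a recipient there keeps 50 dollars of every additional 100 dollars earned, compared with 25 dollars in State A.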

Of course, defining other dimensions of variation is not always simple. An example is provided by several analysts’ characterizations of sanction policies. Starting in the waiver era and continuing into the post-PRWORA period, many states stiffened the sanctions they impose on recipients who violate their work requirements or personal responsibility mandates. In some states, an initial violation results in the adult’s portion of the grant being eliminated until the recipient comes into compliance with the requirement. At the other end of the spectrum, some states cancel the family’s entire grant until the family comes into compliance (and in some states for a minimum duration). In many states, sanctions become more stringent for repeat offenders. Seven states impose lifetime, full-family sanctions for repeated noncompliance with work requirements, even if the family comes back into compliance.26

Four different sets of analysts have offered characterizations of states’ sanction policies, coding them as lenient, stringent, or in between. The characterizations are shown in Table 3.1. The aspects of state policies used to rate the different states vary between analysts; as a result, the summary ratings vary to a considerable extent. Pennsylvania is a noteworthy case in point; its sanction policies are rated as lenient by two sets of analysts, moderate by another, and severe by another. Indeed, the four sets of ratings are in agreement for only 25 of the 51 jurisdictions (the 50 states and the District of Columbia). This poses a problem for comparing results across studies. If analysts cannot agree on what a strict sanction policy is, the effects of a "strict" sanction policy may vary across studies for reasons that have more to do with measurement than with real behavior. Moreover, none of the rankings incorporates information about the monetary value of the sanctions. This is important because a partial-family sanction in a high-benefit state may result in the same financial penalty as a full-family sanction in a low-benefit state. Likewise, none of the rankings incorporates information about the rate at which sanctions are imposed, which is shown in the last two columns of the table. As mentioned in Chapter 2, standard deterrence theory predicts that both the severity of the sanction and the probability of detection should affect behavior. This suggests that any characterization of sanctions that omits information about the likelihood that they are applied is incomplete.
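The agreement measure reported in Table 3.1 can be computed mechanically from the four ratings. The sketch below uses three rows copied from the table; the function name `all_measures_agree` is ours, and the share computed at the end refers only to these three example rows, not to the full table.

```python
def all_measures_agree(ratings):
    """True when every analyst assigns the same severity category."""
    return len(set(ratings)) == 1


# Ratings from Table 3.1, in the order CEA (1999), GAO (2000),
# Rector and Youssef (1999), Pavetti and Bloom (2001).
ratings_by_state = {
    "Alaska": ("Lenient", "Lenient", "Lenient", "Lenient"),
    "Alabama": ("Intermed.", "Intermed.", "Intermed.", "Stringent"),
    "Pennsylvania": ("Stringent", "Lenient", "Lenient", "Intermed."),
}

agreement = {state: all_measures_agree(ratings)
             for state, ratings in ratings_by_state.items()}
share_agreeing = sum(agreement.values()) / len(agreement)

for state, agrees in agreement.items():
    print(state, "Yes" if agrees else "")
```

Only Alaska receives the same rating from all four sources in this subset. A single differing rating, as in Alabama, is enough to break agreement.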

Another obstacle to characterizing states’ welfare reform policies is associated with implementation issues. Policy dummies, or even more detailed measures of policy characteristics, usually only capture variation in official statutes and regulations. However, the de facto variation in implementing a statute may be as important as the de jure variation in the actual statutes. For example, states with the same full-family sanction policy have varying numbers of people who have actually been sanctioned; states with similar time-limit policies vary in the fraction of people who receive extensions when they reach the time limit.

Finally, even assuming that the policy environment could be accurately captured for econometric analysis, the fact that most states implemented policies in bundles rather than individually poses an additional hurdle for statistical inference. This is similar to the problem of policy bundling in the random assignment studies. If policies are adopted together, there is less variation along each policy dimension with which to measure the effect of an individual policy separately. The limited number of post-reform state-year cells available for study, resulting both from the recency of the reforms and from the lags in data release, makes it particularly difficult to distinguish the effect of one policy from that of another. As a result, econometric studies have typically been more successful in estimating the effects of reform-as-a-bundle than in estimating the separate effects of individual reforms.

Table 3.1-Four Characterizations of States' Initial Sanction Policies

| State | CEA (1999) | GAO (2000) | Burke and Gish (1998), as cited by Rector and Youssef (1999) | Pavetti and Bloom (2001) | All measures agree? | Percentage under full-family sanction | Percentage under any sanction |
|---|---|---|---|---|---|---|---|
| | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
| Alabama | Intermed. | Intermed. | Intermed. | Stringent | | 4.5 | 9.8 |
| Alaska | Lenient | Lenient | Lenient | Lenient | Yes | 0.0 | 4.3 |
| Arizona | Intermed. | Intermed. | Intermed. | Intermed. | Yes | 5.1 | 5.1 |
| Arkansas | Stringent | Lenient | Stringent | Lenient | | 0.3 | 3.2 |
| California | Lenient | Lenient | Lenient | Lenient | Yes | 0.1 | 2.3 |
| Colorado | Intermed. | Intermed. | Intermed. | Intermed. | Yes | 0.0 | 5.4 |
| Connecticut | Intermed. | Stringent | Intermed. | Intermed. | | 2.7 | 3.1 |
| Delaware | Stringent | Stringent | Intermed. | Stringent | | 0.8 | 15.3 |
| DC | Lenient | Lenient | Lenient | Lenient | Yes | | 7.4 |
| Florida | Stringent | Stringent | Stringent | Stringent | Yes | 0.3 | 1.8 |
| Georgia | Stringent | Stringent | Stringent | Stringent | Yes | 2.3 | 2.3 |
| Hawaii | Lenient | Stringent | Lenient | Stringent | | 0.0 | 0.0 |
| Idaho | Stringent | Stringent | Stringent | Stringent | Yes | 1.1 | 1.1 |
| Illinois | Intermed. | Intermed. | Intermed. | Intermed. | Yes | 1.2 | 6.0 |
| Indiana | Intermed. | Lenient | Lenient | Lenient | | 0.9 | 5.5 |
| Iowa | Intermed. | Lenient | Intermed. | Stringent | | 0.6 | 3.3 |
| Kansas | Stringent | Stringent | Stringent | Stringent | Yes | 0.0 | 1.0 |
| Kentucky | Intermed. | Lenient | Lenient | Intermed. | | 1.6 | 5.7 |
| Louisiana | Intermed. | Stringent | Intermed. | Stringent | | 0.4 | 3.2 |
| Maine | Lenient | Lenient | Lenient | Lenient | Yes | 0.1 | 5.3 |
| Maryland | Stringent | Stringent | Intermed. | Stringent | | 0.0 | 11.3 |
| Massachusetts | Intermed. | Lenient | Intermed. | Stringent | | 4.7 | 4.7 |
| Michigan | Intermed. | Intermed. | Intermed. | Stringent | | 2.4 | 4.5 |
| Minnesota | Lenient | Lenient | Lenient | Lenient | Yes | 0.3 | 7.6 |
| Mississippi | Stringent | Stringent | Stringent | Stringent | Yes | 0.0 | 0.9 |
| Missouri | Lenient | Lenient | Lenient | Lenient | Yes | 1.4 | 12.3 |
| Montana | Lenient | Lenient | Lenient | Lenient | Yes | 1.0 | 8.0 |
| Nebraska | Stringent | Lenient | Stringent | Stringent | | 2.2 | 2.2 |
| Nevada | Stringent | Intermed. | Intermed. | Intermed. | | 0.8 | 3.2 |
| New Hampshire | Lenient | Lenient | Intermed. | Lenient | | 0.0 | 4.8 |
| New Jersey | Intermed. | Stringent | Intermed. | Stringent | | 2.8 | 8.0 |
| New Mexico | Intermed. | Intermed. | Intermed. | Intermed. | Yes | 0.0 | 0.0 |
| New York | Lenient | Lenient | Lenient | Lenient | Yes | 0.0 | 0.0 |
| North Carolina | Lenient | Lenient | Lenient | Intermed. | | 0.5 | 29.1 |
| North Dakota | Intermed. | Intermed. | Intermed. | Stringent | | 7.0 | 7.0 |
| Ohio | Stringent | Stringent | Stringent | Stringent | Yes | 0.0 | 0.9 |
| Oklahoma | Stringent | Lenient | Stringent | Stringent | | 0.0 | 2.2 |
| Oregon | Intermed. | Intermed. | Intermed. | Intermed. | Yes | 0.5 | 0.6 |
| Pennsylvania | Stringent | Lenient | Lenient | Intermed. | | 0.0 | 6.3 |
| Rhode Island | Lenient | Lenient | Lenient | Lenient | Yes | 0.2 | 3.0 |
| South Carolina | Stringent | Stringent | Stringent | Stringent | Yes | 0.2 | 5.7 |
| South Dakota | Intermed. | Stringent | Intermed. | Stringent | | 0.9 | 0.9 |
| Tennessee | Stringent | Stringent | Stringent | Stringent | Yes | 0.3 | 0.3 |
| Texas | Lenient | Lenient | Intermed. | Intermed. | | 0.0 | 15.5 |
| Utah | Intermed. | Intermed. | Intermed. | Stringent | | 0.0 | 4.0 |
| Vermont | Intermed. | Lenient | Lenient | Intermed. | | | 0.0 |
| Virginia | Stringent | Intermed. | Stringent | Stringent | | 0.7 | 0.7 |
| Washington | Lenient | Lenient | Lenient | Lenient | Yes | 0.0 | 5.6 |
| West Virginia | Stringent | Intermed. | Intermed. | Intermed. | | | 0.0 |
| Wisconsin | Stringent | Stringent | Stringent | Stringent | Yes | 4.6 | 22.8 |
| Wyoming | Stringent | Stringent | Stringent | Stringent | Yes | 2.0 | 2.0 |
| Total/Average | | | | | 25 | 1.1 | 6.1 |

NOTES: Columns (6) and (7) are from GAO (2000) and pertain to 1998. The terminology used to describe the severity of sanctions differs among authors. Our "lenient" category corresponds to the categories described as "partial/partial" by CEA (1999); "partial" by GAO (2000); "weak" by Rector and Youssef (1999); and "lenient" by Pavetti and Bloom (2001). Our "intermediate" category corresponds to the categories described as "partial/full" by CEA (1999); "graduated" by GAO (2000); "moderate" or "delayed full-check" by Rector and Youssef (1999); and "moderate" by Pavetti and Bloom (2001). Our "stringent" category corresponds to the categories described as "full/full" by CEA (1999); "full-family" by GAO (2000); "initial full-check" by Rector and Youssef (1999); and "stringent" by Pavetti and Bloom (2001).

3.3. DATA SOURCES FOR WELFARE OUTCOMES

To use either randomized trials or econometric methods to estimate the effects of a policy or program, the analyst requires data on welfare-related outcomes. In this section, we review the most commonly used data sources and discuss their utility. We also discuss sample sizes and statistical power, issues that are relevant for both econometric and experimental analyses of welfare-related outcomes.

3.3.1. Data Sources

Tables 3.2 and 3.3 summarize the major sources of administrative and survey data, respectively. Randomized trials often abstract their own data from these administrative records and from their own surveys. Econometric studies usually analyze existing sources of administrative and survey data.

As seen in Table 3.2, administrative data cover many of the outcomes of interest, and, for welfare program participants, they cover the entire caseload at a given time. The drawbacks of administrative data include issues of data quality, lack of coverage of non-TANF participants (e.g., leavers) and eligible nonparticipants (e.g., those choosing not to enter), and limited information on individual socioeconomic characteristics. Certain outcomes, such as measures of child well-being, are not typically available in administrative data sources. There is also considerable variation in state-level data systems and in their suitability for research, as well as in the extent of historical information and cross-state comparability of data systems.

Table 3.2-Sources of Administrative Data for Analysis of Welfare Reform

Data Source | Outcomes | Coverage | Notes
State reports to ACF-USDHHS | Caseload | All states, pre- and post-TANF | Aggregate program counts; some aggregate information on distribution of caseload by demographic group
State reports to ACF-USDHHS | Work activities; participation rate | All states, post-TANF | Aggregate data only on total work activities and participation rates, and numbers in specific program components; some JOBS data available
State-specific welfare program data | Caseload; aid payments; sanctions; program activities | Within a single state; availability of historical data varies widely | Issues of data quality; systems are not consistent across states, making cross-state comparisons difficult
Unemployment insurance data | Employment; earnings | Within a single state; availability of historical data varies widely | Gaps in coverage; data relatively comparable across states and some cross-state efforts have been mounted; limited numbers of covariates (difficult to identify at-risk population)
Other social welfare program administrative data | Participation in Food Stamp Program, Medicaid, subsidized housing, child care subsidies, foster care, child support, etc. | Within a single state; availability of historical data varies widely |
Other administrative data (e.g., birth certificates) | Births, etc. | Nationwide (e.g., births) or state-specific; historical data varies | Welfare recipients not identified

The major sources of survey data shown in Table 3.3 also cover many of the welfare-related outcomes of interest, often for large nationally representative samples observed both before and after welfare reform. These databases are typically rich in the socioeconomic information they contain, and they usually cover both program participants and nonparticipants. Some surveys track respondents over time so that the dynamics of behavior can be studied over both short and long horizons. The limitations of these databases include relatively small samples of welfare program participants, which is an acute problem for most longitudinal surveys; nonrandom survey nonresponse in cross-section surveys; attrition in panel surveys; and reporting errors for participation in many welfare programs. State-level survey data based on samples drawn from administrative records of welfare recipients can be hampered by problems with locating and tracking those no longer on aid.

Table 3.3-Sources of Survey Data for Analysis of Welfare Reform

Data Source | Outcomes | Coverage | Notes
Current Population Survey (CPS) | Program participation and income/poverty status in previous calendar year; employment and earnings at interview and in previous calendar year; family structure | Nationwide, relatively consistent survey content back to 1968 | Sample size: 55,000 households; increasing in March 2001 and beyond
Survey of Income and Program Participation (SIPP) and Survey of Program Dynamics (SPD) | Monthly data for same outcomes as CPS, plus (monthly) program entry and exit | Nationwide, relatively consistent survey content back to 1984 | Sample size: varies, about 30,000 households at any point in time; survey is a panel, following respondents for about two-and-a-half years
Panel Study of Income Dynamics (PSID) | Same as CPS, plus program entry and exit and measures of child well-being | Nationwide sample of families followed since 1968 | Sample size: 4,800 families in original cohort augmented by split-offs to 8,000 families in 2001
National Longitudinal Survey of Youth (NLSY) | Same as CPS, plus program entry and exit and measures of child well-being | Nationwide cohort of youth followed since 1979 | Sample size: about 11,500 youth in original cohort plus the children of the original cohort of young women followed since 1986
National Survey of America’s Families (NSAF) | Similar to CPS plus hardship, housing, health status and health care use, attitudes, knowledge of service availability | Representative population in 13 states interviewed in 1997 and 1999 | Sample size: about 44,000 households in repeated cross-sections
State surveys | Vary | Current and recent recipients | Details vary across states; issues with locating former recipients; limited cross-state comparability
Program evaluation surveys | Vary | Treatment and control groups | Details vary across evaluations

3.3.2. Statistical Power

Whether the available data are sufficient to estimate policy effects precisely using econometric methods is the subject of some discussion in the literature. (See, in particular, Adams and Hotz, 2001.) The econometric studies discussed in this synthesis almost all use DoD methods. As such, they require that the outcomes be consistently measured across time, both before and after reform, and across states, and that there be enough observations in each state-year cell to construct at least a rough estimate of the outcomes of interest in the population of interest.27

These seemingly simple requirements make most conventional data sources unusable for econometric analyses. The requirement of consistent data across states rules out state-specific administrative data files. The requirement of consistent data across years rules out most single-interview studies and studies that began after reform was under way. The requirement that it be possible to construct a rough estimate of the outcomes of interest in the population of interest rules out almost all other data sources. To understand this, note that welfare participation is a relatively rare behavior. The peak national welfare caseload (in persons) was about 15 million, and in early 2001 the figure was 5.5 million; in percentage terms, these figures are 5.5 and 2.1 percent of the population, respectively. Whether these numbers are "large" from a social perspective is part of the policy debate. However, from a survey research perspective, these are quite small numbers. Even a moderate-sized random sample of 10,000 households is likely to yield only a few hundred households with any welfare recipients. Spread across the states, that is on average only a handful of recipient households per state. The resulting state-specific estimates of both the rate of welfare receipt and the rate of change of welfare receipt will therefore be noisy (i.e., they will differ considerably from the true values because of sampling variability). Much of the variation in the estimates will reflect not variation in the true number of people receiving welfare but variation in who happens to be sampled, that is, classical sampling variability.
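To make the sampling-variability point concrete, the following minimal sketch computes the standard error of an estimated receipt rate in a single state-year cell. It assumes simple random sampling, an equal allocation of the 10,000 households across 51 state cells (50 states plus the District of Columbia), and, purely for illustration, treats the person-level receipt rates of 5.5 and 2.1 percent as household rates. None of these assumptions comes from the studies reviewed here; actual surveys use complex designs with unequal state samples.

```python
import math

def binomial_se(p: float, n: int) -> float:
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)

def se_of_change(se_before: float, se_after: float) -> float:
    """Standard error of the difference between two independent estimates."""
    return math.sqrt(se_before ** 2 + se_after ** 2)

TOTAL_HOUSEHOLDS = 10_000  # sample size used as the example in the text
N_STATES = 51              # assumption: 50 states plus DC, equal allocation
n_cell = TOTAL_HOUSEHOLDS // N_STATES  # households per state-year cell

# Illustrative receipt rates (assumption: person-level rates used as household rates)
rates = {"peak": 0.055, "early 2001": 0.021}

cell_se = {label: binomial_se(rate, n_cell) for label, rate in rates.items()}

for label, rate in rates.items():
    se = cell_se[label]
    print(f"{label}: rate={rate:.3f}, cell n={n_cell}, "
          f"se={se:.4f}, relative se={se / rate:.0%}")

# Noise in the estimated change between the two periods for one state
change_se = se_of_change(cell_se["peak"], cell_se["early 2001"])
print(f"se of estimated change in one state: {change_se:.4f}")
```

Under these assumptions each cell holds about 196 households, and the standard error of a 2.1 percent receipt rate is roughly one percentage point, nearly half the size of the rate itself.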

Thus, the combined requirements of nationally consistent data, coverage of multiple years, and large samples appear to rule out all survey data except the CPS, the SIPP, and perhaps the NLSY and the PSID.

Finally, it is worth noting that power issues may also arise in the experimental evaluations, especially in the analysis of effects for population subgroups. Power calculations are used to ensure that the total sample in the treatment and control groups is large enough to detect an impact of a given size with high probability, but the likelihood of detecting an impact of the same size in a smaller subgroup sample may be much lower. Detecting impacts in subgroups therefore requires larger (often much larger) samples and hence a much higher evaluation cost. Furthermore, some sites are simply not large enough to support the required samples. One approach for addressing this issue is to pool results across studies and then consider subgroup differences. This is the strategy adopted by Michalopoulos and Schwartz (2000).
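The loss of power in subgroups can be illustrated with a short calculation. The sketch below uses a normal approximation for a two-sided test of a difference in proportions. The sample sizes, the control-group employment rate of 50 percent, and the 5-percentage-point impact are hypothetical values chosen for illustration, not figures from any evaluation discussed here.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p_control: float, p_treat: float,
                          n_control: int, n_treat: int,
                          alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test for a difference in proportions."""
    std_normal = NormalDist()
    # Standard error of the estimated treatment-control difference
    se = sqrt(p_control * (1 - p_control) / n_control
              + p_treat * (1 - p_treat) / n_treat)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(p_treat - p_control) / se
    # Probability that the test statistic falls in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Hypothetical evaluation: 2,000 cases per arm, same impact in every subgroup
full_power = power_two_proportions(0.50, 0.55, 2_000, 2_000)

# Hypothetical subgroup making up one-tenth of the sample: 200 cases per arm
subgroup_power = power_two_proportions(0.50, 0.55, 200, 200)

print(f"power, full sample: {full_power:.2f}")
print(f"power, subgroup:    {subgroup_power:.2f}")
```

With these illustrative numbers, power is close to 0.89 in the full sample but below 0.20 in the subgroup, even though the true impact is identical in both.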

3.4. SUMMARY OF STUDIES INCLUDED IN THE SYNTHESIS

As noted above, our synthesis draws on both random assignment evaluations and econometric studies. To better understand these studies, we review their key features. We begin with the details of the random assignment studies.28

3.4.1. Features of Random Assignment Studies

Tables 3.4 and 3.5 summarize the features of the experimental evaluations that we draw on for the synthesis and provide a useful reference for the discussion of these evaluations in the chapters that follow. We include only those studies that reasonably approximate the types of policies implemented under TANF. As a result, we exclude some evaluations that consider very specialized reforms such as those that focus on service delivery for teen parents on welfare or at risk of welfare participation (e.g., the Learning, Earning, and Parenting (LEAP) program, the New Chance program, and the Teen Parent Demonstration program), child support policies (e.g., New York Child Assistance Program), and specialized service delivery (e.g., the Postemployment Services Demonstration program). Furthermore, we exclude some of the earlier welfare-to-work experiments that predate the 1988 Family Support Act (e.g., San Diego’s Saturation Work Initiative Model and the early GAIN experiments in several California counties). We also exclude Project Independence, which was Florida’s initial JOBS program. It was the precursor to Florida’s Family Transition Program (FTP), which we do include.

Table 3.4 describes basic features of the experiments such as the location of the demonstration, whether it was part of a statewide reform, the population served, the period of randomization and length of the follow-up, the sample sizes in treatment and control groups, the policy environment for the controls (typically AFDC/JOBS), and contextual information in the form of the unemployment rate and welfare benefit level. Table 3.5 provides details on the policy reforms applicable to the treatment group. This includes information on the central reform components of financial work incentives, mandatory work-related activities, and time limits, as well as other reforms such as sanctions, family caps, parental responsibility requirements, transitional child care and health insurance, changes in eligibility for two-parent families, and various other features (e.g., changes in asset limits and use of personal responsibility agreements).

Table 3.4: Selected Design Features of Random Assignment Studies Included in Synthesis
Name | State | Sites | Demo part of statewide reform? | Cases served | RA period start | RA period end | F/U length | Sample size: Total | Sample size: T | Sample size: C | Controls | Initial conditions in state (for RA start year): U rate (%) | Initial conditions in state (for RA start year): Max $ grant (a) | Cite(s)
A. Programs that focus on financial work incentives
California Work Pays Demonstration Project (CWPDP) | CA | 3 counties | No | Single-parent recipients (b) | Oct 92 | Dec 92 | 42 months | 7,841 | 5,211 | 2,630 | AFDC/JOBS | 9.3 | 663 | Becerra et al. (1998); Hu (2000)
Welfare Restructuring Project Incentives Only (WRP-IO) VT 6 welfare
districts
Yes Single-parent recipients and applicants (b) Jul 94 Jun 95 42 months 2,196 1,087