Department of Health and Human Services
Administration for Children and Families
          

Office of Planning, Research & Evaluation (OPRE)

10. ANALYSIS OF OUTLIERS

As previously documented, welfare-to-work interventions vary in their effectiveness. In this section, we look more closely at interventions that stand out because they achieve especially high or especially low impacts. We refer to these interventions as “outlier interventions” and the atypically large or small impacts they produce as “outliers.” Somewhat arbitrarily, we define impacts that are outliers as those that are at least one standard deviation higher or lower than the mean impact of the interventions in our database. We identify outliers for all four impact measures. As in previous analyses, we focus on four selected quarters after random assignment: quarters 3, 7, 11, and 15.

Using our definition of outliers (impacts that are at least one standard deviation above or below the mean of all the impact estimates available in a given quarter), we identify outliers in two ways. First, we simply compare each impact estimate in a given quarter to the weighted mean of all the impact estimates available for that quarter. We refer to these outliers as Type A outliers. Second, we identify outliers by regressing the impacts on the same explanatory variables that we used previously. In other words, we use the weighted regressions that appear in Tables 4-7 and contain 13 explanatory variables to control for factors that affect the effectiveness of welfare-to-work programs. We refer to the outliers identified in this way as Type B outliers.
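The two identification rules described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used for Tables 14-17: the function names, the toy data, and the use of NumPy are all assumptions, and the text does not say which standard deviation serves as the yardstick for Type B outliers. Here the weighted standard deviation of the unadjusted impacts is assumed for both types.

```python
import numpy as np

def weighted_mean_sd(values, weights):
    """Weighted mean and weighted standard deviation of a set of impacts."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.average(values, weights=weights)
    sd = np.sqrt(np.average((values - mean) ** 2, weights=weights))
    return mean, sd

def type_a_outliers(impacts, weights):
    """Type A: impact is at least one SD away from the weighted mean."""
    impacts = np.asarray(impacts, dtype=float)
    mean, sd = weighted_mean_sd(impacts, weights)
    return np.abs(impacts - mean) >= sd

def type_b_outliers(impacts, covariates, weights):
    """Type B: impact is at least one SD away from its regression prediction.

    Assumption: the threshold is the weighted SD of the unadjusted impacts.
    """
    y = np.asarray(impacts, dtype=float)
    w = np.asarray(weights, dtype=float)
    X = np.asarray(covariates, dtype=float).reshape(len(y), -1)
    X = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    root_w = np.sqrt(w)
    # Weighted least squares: scale each row by the square root of its weight.
    beta, *_ = np.linalg.lstsq(X * root_w[:, None], y * root_w, rcond=None)
    residuals = y - X @ beta
    _, sd = weighted_mean_sd(y, w)
    return np.abs(residuals) >= sd

# Hypothetical data: impacts fully explained by a single covariate.
x = [0, 1, 2, 3, 4, 5]
y = [0, 2, 4, 6, 8, 10]
w = [1, 1, 1, 1, 1, 1]
flags_a = type_a_outliers(y, w)
flags_b = type_b_outliers(y, x, w)
```

In this toy case the two extreme impacts are Type A outliers, but neither is a Type B outlier, because the covariate accounts for all of the variation in the impacts.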

10.1 THE PREVALENCE OF OUTLIER PROGRAMS AND OUTLIER ESTIMATES

Table 14 summarizes details about the prevalence of outliers among the quarterly estimates of the four impact measures. Across the four quarters, the evaluations of welfare-to-work programs included in our database recorded between 232 (percentage receiving AFDC) and 259 (earnings) quarterly impact estimates. Of these, between 71 (earnings) and 125 (percentage employed) are designated as outliers. In relative terms, between about a quarter (earnings) and a half (percentage employed and percentage receiving AFDC) of the impact estimates are classified as outliers.

Between one-third (earnings) and two-fifths (AFDC payment and percentage employed) of the outliers are only Type A outliers. About half as many impact estimates are designated as only Type B outliers—that is, they become outliers only after controlling for explanatory factors. The remaining outliers are of both types. Thus, Type B outliers (i.e., those listed in the last two columns of Table 14) account for around 60 percent of all the impact estimates designated as outliers.

These figures demonstrate that controlling for independent influences on program impacts substantially reduces the number designated as outliers. Nonetheless, there are considerable numbers of Type B outliers. There are two reasons for this: First, because of limitations on the availability of the data, as discussed in considerable detail earlier, we could not control for all the factors that may cause impact estimates to vary systematically from one another. Second, as also previously discussed, sampling error causes impact estimates to vary across evaluated programs, but regression analysis does not control for this. Thus, the “true” impacts of some interventions may not deviate very much from the mean for our sample of interventions, even when their estimated impacts differ considerably.

In Table 15, we show the number of interventions for which outliers for a given type of impact occur in more than one quarter. We refer to such interventions as “multiple outlier interventions.” Because the statistics in Table 15 are based on four quarters of impact estimates, however, and not all evaluations estimated impacts for all four quarters, not all interventions have the same chance of having multiple outliers. Nevertheless, in focusing on interventions with multiple outliers, we aim to reduce the risk of highlighting exceptionally high or low impacts that perhaps reflect isolated instances instead of interventions that consistently under- or over-performed. Moreover, by considering multiple outlier interventions, we are able to identify programs that over- or under-perform two or more years after random assignment, thereby highlighting those that may have required time to ‘settle in’ before showing an impact. The latter might be the case for programs that focused on developing human capital, instead of emphasizing immediate job placement.

Over the four quarters of data, earnings impacts are recorded for 99 interventions, employment percentage impacts for 87 interventions, AFDC payment impacts for 90, and AFDC percentage receipt impacts for 84 interventions. Only a minority of these interventions produced multiple outliers. Seventeen of the 99 interventions with earnings impacts contain multiple outliers that are either Type A outliers, Type B outliers, or both. The same is true for 42 of the 87 interventions with percentage employed impact estimates; 27 of the 90 interventions with AFDC payment impact estimates; and 32 of the 84 interventions with percentage receiving AFDC impact estimates. The fourth column in Table 15 indicates the number of outlier impacts accounted for by these interventions. A comparison of this column with the third column in Table 14 indicates that the interventions with multiple outliers account for most of the impact estimates designated as outliers.

About half of the interventions with multiple outliers produced two or more impact estimates classified as Type B outliers, although this rises to over 60 percent in the case of the impact measure for the percentage in receipt of AFDC (20 of 32). Thus, after controlling for external and program-specific influences, far fewer interventions can be classified as multiple outlier interventions. Moreover, interventions with multiple Type B earnings outliers account for just under half of all Type B earnings outliers (23 of 14 + 38 = 52; cf. Table 14) and between 60 and 70 percent of the outliers for the other three impact measures.
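The shares quoted in this section can be checked directly from the counts given in the text. The short Python sketch below does only that arithmetic; the variable names are illustrative, and reading 14 and 38 as the two Type B counts for earnings in Table 14 is an interpretation of the parenthetical above rather than something the table itself is reproduced here to confirm.

```python
def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

# Interventions with multiple outliers, out of all interventions with
# impact estimates for that measure (counts taken from the text).
multiple_outlier_counts = {
    "earnings": (17, 99),
    "percentage employed": (42, 87),
    "AFDC payments": (27, 90),
    "percentage receiving AFDC": (32, 84),
}
multiple_outlier_shares = {
    measure: share(part, whole)
    for measure, (part, whole) in multiple_outlier_counts.items()
}

# Share of Type B earnings outliers that come from interventions with
# multiple Type B earnings outliers (23 of 14 + 38).
type_b_earnings_share = share(23, 14 + 38)

# Share of multiple outlier interventions with multiple Type B outliers
# for the percentage receiving AFDC (20 of 32).
afdc_receipt_type_b_share = share(20, 32)

for measure, pct in multiple_outlier_shares.items():
    print(f"{measure}: {pct:.1f} percent")
```

The results are consistent with the wording of the text: every measure stays below one-half, 23 of 52 is about 44 percent ("just under half"), and 20 of 32 is 62.5 percent ("over 60 percent").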

10.2 SEPARATING POSITIVE AND NEGATIVE OUTLIERS

So far, we have presented summary results for both positive and negative outliers. We now turn to identifying the individual interventions with multiple outliers, distinguishing between those with multiple positive outliers (Table 16) and those with multiple negative outliers (Table 17). These tables show the number of quarterly outliers for interventions with multiple outliers both before and after regression adjustments that control for factors affecting program impacts (Type A and Type B outliers, respectively). Single Type B outliers are reported in parentheses if multiple Type A outliers occur for the same impact measure, and vice-versa.

Positive Outliers. Table 16 indicates that 22 interventions targeted at one-parent families and 10 interventions targeted at two-parent families have multiple positive outliers for one or more of the impact measures. Together, these 32 interventions account for 210 outliers. Interventions implemented under the GAIN program or assessed as part of the NEWWS evaluation are more likely to contain multiple outliers for several, rather than just one or two, impact measures than most other interventions.

Two-parent interventions account for only about one-fifth of the interventions in our overall sample (see Table 3) but comprise around one-third of the interventions listed in Table 16. This likely reflects the much smaller samples usually used to evaluate interventions targeted at two-parent families than those targeted at one-parent families. Smaller samples increase the statistical variance and, as a result, raise the likelihood of impacts being classified as outliers.
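The link between sampling variance and the chance of being classified as an outlier can be illustrated with a simple calculation. The sketch below assumes, purely for illustration, that an impact estimate is normally distributed around the overall mean, with a variance equal to the variance of the true impacts plus the sampling variance of the estimate. The function name and all numerical values are hypothetical and are not taken from the database.

```python
import math

def prob_outlier(threshold, true_sd, sampling_se):
    """Probability that an estimate lies at least `threshold` from the mean.

    Assumes the estimate is Normal(mean, true_sd**2 + sampling_se**2).
    """
    total_sd = math.sqrt(true_sd ** 2 + sampling_se ** 2)
    z = threshold / total_sd
    # For a standard normal Z, P(|Z| >= z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))

# Hypothetical values: a one-SD threshold of 1.0 and true impacts with SD 0.8.
p_large_sample = prob_outlier(threshold=1.0, true_sd=0.8, sampling_se=0.3)
p_small_sample = prob_outlier(threshold=1.0, true_sd=0.8, sampling_se=0.9)
```

Under these assumptions, the estimate with the larger sampling error (the smaller sample) has the higher probability of crossing the one-standard-deviation threshold, even though the true impacts are drawn from the same distribution in both cases.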

Most interventions that produce multiple positive Type A outliers also produce multiple positive Type B outliers, although, for a majority of interventions, there are fewer Type B outliers than Type A outliers. However, there are numerous exceptions. For example, one-parent interventions that were assessed as part of the California Work Pays Demonstration Program (CWPD) or the New York State Child Assistance Program (CAP) tend to be associated with more Type B outliers than Type A outliers. GAIN’s Riverside intervention and some of the interventions assessed as part of the NEWWS evaluation, in contrast, lose outliers as a result of regression adjustment. Most interventions that are targeted at two-parent families tend to retain the same number of multiple outliers before and after regression adjustment.

Negative Outliers. Eighteen one-parent family interventions and ten two-parent family interventions produced multiple negative outliers. These 28 interventions account for 164 outliers in total, a smaller number than in the case of positive outliers. Slightly fewer than half the interventions listed in Table 17 (13 of 28) provided financial incentives. In contrast, in the overall sample, about a third of the interventions offered these incentives (see Table 3).

Among the one-parent interventions, negative outliers less frequently stretch across more than two impact measures than positive outliers do. Moreover, although there are a few exceptions, the negative outliers tend to occur for either the AFDC impact measures or the labor market impact measures but not both. One-parent interventions with multiple negative outliers, thus, under-performed either in terms of increasing employment and earnings or in terms of reducing reliance on AFDC; but in stark contrast to interventions with multiple positive outliers, they rarely under-performed on both accounts. This pattern is absent in the case of interventions targeted at two-parent families.

Comparing Interventions with Multiple Positive and Negative Outliers. As indicated by Table 16, besides GAIN’s Riverside intervention and the NEWWS evaluation’s Portland program, which are already celebrated cases of high-impact one-parent family interventions, other especially notable high performers among interventions targeted at one-parent families include the New York State Child Assistance Program (CAP) and the NEWWS evaluation’s Labor Force Attachment (LFA) interventions in Grand Rapids, Michigan and Riverside, California. Among interventions targeted at two-parent families, repeated high-impact performers include the GAIN program in Butte County and the California Work Pays Demonstration sites in Alameda County and San Bernardino County.

By contrast, as indicated by Table 17, Minnesota’s Family Investment Program (MFIP), Vermont’s Welfare Restructuring Project (WRP), and GAIN’s Tulare intervention under-performed for both one-parent and two-parent families. In addition, Baltimore’s Employment Initiative generated negative outliers for two-parent families, but not for one-parent families.

Most interventions appear in either Table 16 or Table 17 but not both. Thus, they produce either positive or negative impact outliers. Of course, evaluations that encompass multiple interventions, such as GAIN or NEWWS, might have positive outliers for some interventions and negative outliers for others. However, the same intervention records both positive and negative outliers in only four instances. First, Minnesota’s Family Investment Program (MFIP) in urban Minnesota achieved positive employment outliers, but negative AFDC payment and AFDC receipt outliers. This may reflect the presence of financial incentives as part of the MFIP treatment. Second, New York State’s Child Assistance Program (CAP) in Niagara County produced atypically large increases in earnings, but was less effective in reducing the percentage of program group members receiving AFDC. This may again reflect the provision of financial incentives. However, the regression adjustment has the interesting effect of increasing the positive earnings outliers from one to two, but decreasing the negative AFDC receipt outliers from two to one. Third, GAIN’s two-parent family program in San Diego shows positive AFDC payment outliers, albeit only before regression adjustment, but negative employment impact outliers.

The fourth and final case of concurrent positive and negative outliers is also the most complex and perplexing. Vermont’s incentives-only variant of its Welfare Restructuring Project (WRP) resulted in exceptionally large increases in the earnings of two-parent AFDC recipients, but substantially under-performed in terms of the other three impact indicators. Financial incentives may have been responsible for retaining a high proportion of program group members on AFDC, but the program’s under-performance in terms of the employment indicator is more difficult to explain. That said, it may reflect the fact that the program’s financial incentive structure slightly penalized program group members during their first few months of employment, but substantially rewarded them after that (see note 20). It is conceivable that, for this reason, the program initially discouraged and delayed employment, leading to negative employment and AFDC impact outliers. However, the reason why the program had substantially better-than-average impacts on earnings is unclear.

10.3 WHAT CAUSES TYPE B OUTLIERS TO OCCUR?

We noted earlier that Type B outliers occur both because it is not possible through regression analysis to control for all the factors that cause impacts to vary across interventions and because sampling error causes some impact estimates to be exceptionally large or small. However, the relative importance of these factors cannot be measured. Moreover, almost by definition, uncertainty is inevitable about the role of factors omitted from the regression equations. Thus, one can only speculate as to why a specific intervention produces impacts that are outliers. This section contains such speculation for a few of the interventions listed in Tables 16 and 17. Useful lessons may be learned by a more in-depth study of the interventions listed in these tables than we are able to provide here.

Because of limited information in the evaluation reports used to construct our database, it has not been possible to use regression analysis to control for how welfare-to-work programs deliver their services, a factor that other authors suggest plays a significant role in determining program impacts (Bloom et al., 2003). For example, in GAIN’s Riverside intervention and in the NEWWS evaluation’s Riverside and Grand Rapids Labor Force Attachment programs, staff put particular emphasis on placing program group members into employment as quickly as possible. As shown in Table 16, these programs all produced exceptionally large positive impacts. Leadership and direction provided to program staff by program management, which have been particularly associated with GAIN’s Riverside intervention, may also be important. NEWWS’ Portland program may have benefited from a feature that allowed program group members to wait for a good job before accepting employment and from arrangements that provided for close cooperation between the welfare agency and various partner organizations. Other interventions made potentially important changes in their AFDC programs that may well have proved to the detriment of their performance. For example, Minnesota’s Family Investment Program provided financial incentives that increased the earnings level at which families could continue to receive AFDC, which may explain why the impacts of this program on AFDC payments and the receipt of AFDC were exceptionally low. Although the regressions included a dummy variable that was intended to control for the provision of financial incentives, it may have inadequately done so in the case of the Minnesota program.

Sampling error is most likely to be important when the sample used to estimate impacts is small. In evaluating welfare-to-work programs, much smaller samples are usually used to measure the impacts of interventions targeted at two-parent families than at one-parent families, reflecting the smaller number of two-parent families participating in AFDC. For example, 2,823 one-parent families but only 337 two-parent families were used in evaluating the Employment Initiative Demonstration in Baltimore. The corresponding figures for the GAIN program in Alameda County were 1,205 one-parent families and 182 two-parent families. This may help explain why negative outliers are associated with Baltimore and Alameda’s two-parent interventions but not the one-parent interventions. However, the GAIN program in San Diego County produced similar findings; yet the impact estimates for two-parent families were based on a large sample of 3,277 observations.
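To give a rough sense of scale for the sample sizes just cited, the sketch below compares the standard errors that would be expected for the two-parent and one-parent impact estimates. It rests on two simplifying assumptions that are not drawn from the evaluation reports: the outcome has the same variance in both populations, and both samples are split between program and control groups in the same proportion. Under those assumptions the standard error of an impact estimate is proportional to one over the square root of the sample size. The function name is hypothetical.

```python
import math

def relative_standard_error(n_large, n_small):
    """Ratio of the standard error with n_small to that with n_large.

    Assumes equal outcome variance and the same program/control split,
    so that the standard error is proportional to 1 / sqrt(n).
    """
    return math.sqrt(n_large / n_small)

# Sample sizes cited in the text (one-parent versus two-parent families).
baltimore_ratio = relative_standard_error(2823, 337)
alameda_ratio = relative_standard_error(1205, 182)

print(f"Baltimore: two-parent SE is about {baltimore_ratio:.1f} times larger")
print(f"Alameda:   two-parent SE is about {alameda_ratio:.1f} times larger")
```

On these assumptions, the two-parent estimates in both sites would carry standard errors well over twice as large as the one-parent estimates, which is in line with the suggestion that sampling error contributes to the two-parent outliers.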




20. We estimate that a family of three would be $12 worse off per month under WRP during the first year of employment than under Vermont’s traditional welfare program. After that, they would be $168 better off.

 
