8. THE PREDICTIVE ABILITY OF THE REGRESSIONS
The regressions that were reported earlier can be assessed in terms of their predictive ability. We do this in two ways:
First, we re-estimate the 7th quarter regressions reported in Tables 4-7, leaving out a randomly selected observation. We then see how successfully the regressions can predict the impact of the omitted observation. This test is repeated for ten randomly selected observations for each of the four impact measures. The selection of the ten observations is conducted by independently generating random numbers for each of the four impact measures. Thus, 40 separate predictive regressions were estimated in all.
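The leave-one-out procedure just described can be sketched as follows. This is a minimal illustration on synthetic data, not the report's actual estimation: it assumes ordinary least squares (the regressions in Tables 4-7 may be weighted), draws the ten observations without replacement, and uses hypothetical names such as `leave_one_out_prediction`.

```python
import numpy as np

def leave_one_out_prediction(X, y, held_out):
    """Fit OLS without row `held_out`, then predict the impact for that row."""
    keep = np.ones(len(y), dtype=bool)
    keep[held_out] = False
    # Add an intercept column to the explanatory variables.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
    return float(design[held_out] @ coef)

rng = np.random.default_rng(0)
n_obs, n_vars = 60, 3                      # hypothetical sample size and regressors
X = rng.normal(size=(n_obs, n_vars))       # synthetic program characteristics
y = 100 + X @ np.array([30.0, -10.0, 5.0]) + rng.normal(scale=40, size=n_obs)

# Ten randomly selected observations for one impact measure (assumed drawn
# without replacement); each one gets its own re-estimated regression.
selected = rng.choice(n_obs, size=10, replace=False)
predictions = {int(i): leave_one_out_prediction(X, y, i) for i in selected}
```

Repeating the draw independently for each of the four impact measures would give the 40 predictive regressions mentioned above.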
Second, we re-estimate the 7th quarter regressions reported in Tables 4-7, leaving out the five observations that resulted from the three most recently evaluated mandatory welfare-to-work programs in our database.17 We then see how well the impacts of these newer programs are predicted on the basis of findings from the older welfare-to-work programs. This is important because it indicates how reliably evaluations of existing programs can predict the impacts of future programs.
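A sketch of this second test, again under stated assumptions: the data are synthetic, the fit is ordinary least squares, and the arrays `program_ids` and `years` are hypothetical stand-ins for the program identifier and evaluation date attached to each observation in the database.

```python
import numpy as np

def recent_program_mask(program_ids, years, n_recent=3):
    """Mark every observation that belongs to the n_recent newest programs."""
    # One evaluation year per program (assumes a program has a single year).
    program_year = dict(zip(program_ids.tolist(), years.tolist()))
    newest = sorted(program_year, key=program_year.get, reverse=True)[:n_recent]
    return np.isin(program_ids, newest)

def fit_and_predict(X, y, test_mask):
    """Fit OLS on the older programs only, then predict the held-out ones."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design[~test_mask], y[~test_mask], rcond=None)
    return design[test_mask] @ coef

rng = np.random.default_rng(1)
program_ids = np.repeat(np.arange(10), 2)  # 10 hypothetical programs, 2 observations each
years = 1985 + program_ids                 # hypothetical evaluation years
X = rng.normal(size=(20, 2))
y = 80 + X @ np.array([25.0, -15.0]) + rng.normal(scale=30, size=20)

mask = recent_program_mask(program_ids, years)
predicted_recent = fit_and_predict(X, y, mask)
```

Holding out whole programs rather than single observations keeps every observation from a newer program out of the estimation sample, which is what makes the exercise resemble predicting a future program.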
In assessing both sets of predictions, it is important to recognize that their accuracy can be determined only by comparing them to estimates of program impacts; they cannot be compared to the “true” impact of a welfare-to-work intervention. As previously discussed, because each estimated impact is subject to sampling error, it is unlikely to measure the “true” effect exactly. Indeed, unless the standard error of the estimated impact is small, it could diverge considerably from the “true” value. Consequently, it is quite possible that a predicted impact and an estimated impact differ, but the former is actually closer to the “true” impact than the latter. There is no way to know for sure.
Table 12 compares 7th quarter estimated impacts with 7th quarter predicted impacts. The first panel presents the comparison for the ten randomly selected observations and the second panel reports the comparison for the five observations drawn from the three most recent evaluations. The ten observations in the first panel differ for each of the four impact measures because they were selected by four independent random draws, but the five observations in the second panel remain constant across the impact measures.
It is evident that many of the individual predicted impacts in Table 12 differ considerably from their estimated counterparts. Indeed, sometimes they even have a different sign. Moreover, for the ten randomly selected interventions, the average of the absolute value of the difference, which is shown in the bottom row of the panel for each impact measure, is larger than the average estimated impact for all four impact measures. By this comparison, however, the regressions predict considerably better for the five most recent interventions. This suggests that the regressions can provide useful information about how well future welfare-to-work programs are likely to function, even though a prediction for a specific individual program is likely to be subject to considerable error.
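The comparison statistics used here are simple to compute. The sketch below uses made-up numbers (they are not taken from Table 12) to show the average absolute difference, the average estimated impact it is compared against, and a count of sign disagreements.

```python
import numpy as np

# Hypothetical estimated and predicted impacts for five interventions.
estimated = np.array([120.0, -35.0, 60.0, 210.0, 15.0])
predicted = np.array([40.0, 55.0, 130.0, 95.0, -20.0])

# Average of the absolute value of the difference (bottom row of each panel).
mean_abs_diff = np.abs(estimated - predicted).mean()

# Benchmark used in the text: the average estimated impact.
mean_estimated = estimated.mean()

# Number of cases where the prediction has a different sign from the estimate.
sign_flips = int(np.sum(np.sign(estimated) != np.sign(predicted)))
```

With these illustrative values the average absolute difference (78) exceeds the average estimated impact (74), the same pattern the text reports for the ten randomly selected interventions.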
The divergence between the estimated and predicted impacts occurs for two reasons. First, as already discussed, the estimated impacts are subject to sampling error. Second, the regressions on which the predictions are based undoubtedly fail to capture some of the systematic factors that cause the “true” impacts of welfare-to-work programs to vary. This is really an omitted variables problem. Unfortunately, it is not possible to determine the relative importance of these two sources of divergence. However, by averaging across the observations, both sources of divergence tend to wash out to some extent. Thus, except for the pair of figures appearing in the bottom right-hand corner of Table 12, the averages for the estimated and predicted impacts that are presented at the bottom of each panel are quite similar to one another. It is also interesting to note that the predicted impacts suggest that the five most recent interventions should have substantially larger impacts, on average, than the ten randomly selected interventions (with the exception of the impact on the receipt of AFDC), and the estimated impacts indicate that this is indeed the case.
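The two sources of divergence can be written out explicitly. The notation below is our own shorthand rather than the report's, and the variance formula assumes, for illustration only, that the sampling errors $e_i$ and the omitted-factor terms $u_i$ have mean zero, constant variances, and are independent of one another and across observations. It also treats the regression coefficients as known, ignoring error in estimating them.

```latex
\begin{align*}
\hat{\Delta}_i &= \Delta_i + e_i
  && \text{(estimated impact = true impact + sampling error)} \\
\Delta_i &= x_i'\beta + u_i
  && \text{(true impact = measured factors + omitted factors)} \\
P_i &= x_i'\beta
  && \text{(predicted impact)} \\
\hat{\Delta}_i - P_i &= e_i + u_i \\
\frac{1}{n}\sum_{i=1}^{n}\bigl(\hat{\Delta}_i - P_i\bigr) &= \bar{e} + \bar{u},
  \qquad \operatorname{Var}(\bar{e} + \bar{u}) = \frac{\sigma_e^2 + \sigma_u^2}{n}
\end{align*}
```

Under these assumptions the signed differences shrink toward zero as they are averaged over more observations, which is why the panel averages of estimated and predicted impacts can be close even when the individual absolute differences are large. Only the sum $e_i + u_i$ is observed, so the two components cannot be separated.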
Assuming, as seems likely, that the omitted variables are an important source of the differences between the estimated and predicted impacts, the results in Table 12 suggest that although it is possible to predict impacts for groups of welfare-to-work programs with reasonable accuracy, predictions of the impacts of individual interventions will often be subject to considerable error. Yet, the fact that a number of the coefficients on which the predictions are based are statistically significant implies it is possible to say something useful about a “typical” (i.e., average) welfare-to-work program. However, because most welfare-to-work programs are likely to differ from a “typical” program in various ways that are difficult to measure, probably few individual programs are sufficiently “typical” that their impacts can be predicted with precision.
Nonetheless, the regression findings are still of use for predictive purposes. For example, local welfare officials might want to know if it is likely that a proposed welfare-to-work program will do better than already existing programs. Alternatively, they may want an idea of whether it is likely that their current welfare-to-work program is performing better than a “typical” welfare-to-work program, but not have to conduct a full evaluation to find out. The regressions can potentially provide useful information about such issues if the values pertaining to the local program are available for each of the explanatory variables used in the regressions.
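As a rough illustration of this use, the sketch below plugs a local program's values into a set of regression coefficients and compares the result with the impact of a typical program. The coefficient values and variable names are hypothetical placeholders, not the estimates in Tables 4-7. The benchmark of $99 is the weighted average 7th quarter earnings impact from Table 2.

```python
# Hypothetical coefficients; a real application would use the estimates
# reported for the relevant impact measure.
coefficients = {
    "intercept": 40.0,
    "job_search_rate": 150.0,
    "basic_education_rate": -60.0,
    "unemployment_rate": -5.0,
}

# Hypothetical values of the explanatory variables for a local program.
local_program = {
    "job_search_rate": 0.45,
    "basic_education_rate": 0.10,
    "unemployment_rate": 6.0,
}

def predict_impact(coefs, values):
    """Linear prediction: intercept plus coefficient times value for each variable."""
    return coefs["intercept"] + sum(coefs[name] * v for name, v in values.items())

typical_impact = 99.0  # weighted average 7th quarter earnings impact (Table 2)
predicted = predict_impact(coefficients, local_program)
better_than_typical = predicted > typical_impact
```

A prediction of this kind needs a value for every explanatory variable in the regression. If any are unavailable for the local program, the comparison cannot be made.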
To illustrate, the weighted average for 7th quarter earnings impacts in Table 2 is $99, but the predicted earnings impact for Case 1 in Table 12 is less than this amount. This implies that this intervention performed worse than a typical program. The fact that the estimated impact for Case 1 is also below the average impact indicates that this conclusion is probably correct. The predicted impact for Case 2, on the other hand, is well above $99, suggesting that this program performs better than a typical welfare-to-work program, a conclusion that is again verified by comparing the estimated impact with the impact for a typical welfare-to-work program. Conclusions based on the regressions about whether individual programs are performing better or worse than typical programs are not confirmed every time, of course (for example, they are not confirmed for earnings impacts for Cases 7 and 8), but as the following tabulation demonstrates, in most cases they are:
| Impact Measure | Randomly Selected Interventions | Most Recent Interventions |
|---|---|---|
| Earnings | 80% | 40% |
| Percent Employed | 80% | 80% |
| Average AFDC Payments | 60% | 60% |
| Percent Receiving AFDC Payments | 60% | 40% |
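Percentages of this kind can be computed by checking, case by case, whether the predicted and estimated impacts fall on the same side of the typical impact. The sketch below shows the calculation with made-up impacts, not the Table 12 values; `agreement_rate` is a hypothetical helper name.

```python
import numpy as np

def agreement_rate(estimated, predicted, typical):
    """Share of cases where estimate and prediction lie on the same side of typical."""
    est_above = np.asarray(estimated) > typical
    pred_above = np.asarray(predicted) > typical
    return float(np.mean(est_above == pred_above))

typical = 99.0                                  # Table 2 weighted average earnings impact
estimated = [60.0, 180.0, 140.0, 20.0, 110.0]   # hypothetical estimated impacts
predicted = [75.0, 150.0, 90.0, 130.0, 125.0]   # hypothetical predicted impacts

rate = agreement_rate(estimated, predicted, typical)  # 3 of 5 cases agree
```

With only five or ten cases per cell, each case moves the rate by 20 or 10 percentage points, so small differences between cells of the tabulation should be read cautiously.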
An important reason why a specific welfare-to-work program may perform better or worse than a “typical” program is that potentially important factors (for example, leadership and staff morale) could not be measured and included in the regressions. Thus, using the regressions to compare a particular program with a “typical” program can only be viewed as suggestive. However, by comparing the explanatory variable values for a local program with those for a “typical” program (see Table 3), an administrator can obtain pertinent information about how the former differs from the latter. Such a comparison may suggest, for example, that participation in job search might be usefully increased, while participation in basic education is decreased.
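One way to organize such a comparison is to compute, for each explanatory variable, the gap between the local and typical values and the contribution that gap makes to the predicted impact. The values and coefficients below are hypothetical. A real comparison would take the typical values from Table 3 and the coefficients from the relevant regression.

```python
# Hypothetical participation rates and coefficients, for illustration only.
typical = {"job_search_rate": 0.30, "basic_education_rate": 0.20}
local = {"job_search_rate": 0.15, "basic_education_rate": 0.35}
coefs = {"job_search_rate": 150.0, "basic_education_rate": -60.0}

# Gap between the local program and a typical program on each variable.
gaps = {name: local[name] - typical[name] for name in typical}

# Contribution of each gap to the difference in predicted impact.
contributions = {name: coefs[name] * gaps[name] for name in gaps}

# Variables ordered from the largest drag on predicted impact to the smallest.
ranked = sorted(contributions, key=contributions.get)
```

In this made-up case both gaps lower the predicted impact, with the shortfall in job search participation accounting for more of the difference. As the text cautions, such a breakdown is suggestive only, since unmeasured factors are left out.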

