D.7 RESULTS FROM THE SERVICE INTENSITY ANALYSIS
Families in the program group received different amounts of Early Head Start services. The amount and nature of services that a particular family received was determined in part by family members themselves (because Early Head Start is a voluntary program), as well as by the amount and nature of services they were offered. Thus, the level of services received by families differed both within programs and across programs.
An important policy issue is the extent to which impacts on key outcomes varied for families who received different levels of service intensity. In Chapter III of Volume I, we identified family and site characteristics that are associated with high levels of service receipt. We then used this information to examine whether estimated impacts on key outcomes were larger for subgroups of families who received intensive services than for subgroups of families who received less intensive services. This approach only indirectly assesses whether service intensity matters, because there may be other factors besides differences in service intensity that can account for differences in impacts across subgroups.
This appendix describes our analysis to more directly assess the extent to which service intensity matters. First, we present our methodological approach, and second, the analysis findings.
1. Methodological Approach
As discussed in Chapter II, the estimation of dosage effects is complicated by the potential presence of unobservable differences between families who received different amounts of services that are correlated with child and family outcomes. If uncorrected, this “sample selection” problem can lead to seriously biased estimates of dosage effects. This section discusses our approach for adjusting for this potential selection problem.
a. Propensity Scoring
We used “propensity scoring” (Rosenbaum and Rubin 1985) as our primary approach to account for sample selection bias when estimating dosage effects. In our context, this procedure identified control group members who would have been likely to receive intensive services, and those who would not have been likely to receive them, if they had instead been assigned to the program group. Impacts for the high-service intensity group were then estimated by comparing the outcomes of program and control group families in the high-service intensity group, and similarly for the low-service intensity group. We then compared these two sets of impact estimates.
We used two versions of the propensity scoring approach: (1) the “matching method” and (2) the “cutoff method.”
The Matching Method. This method was implemented as follows:
- Using the program group only, we estimated logit regression models predicting whether a family received intensive services. For analytic simplicity and sample size considerations, we conducted the analysis by classifying program group families into two groups: a high-service intensity group and a low-service intensity group (including those who received no services). We then estimated a logit model where the probability a program group family received intensive services was regressed on child and family characteristics measured at baseline and site indicator variables. The explanatory variables used in these logit models were posited to be associated with service intensity and with the child and family outcome measures, and were the same ones as those used in the regression models for the basic impact analysis (see Table II.6 in the main report).9
- Predicted probabilities (propensity scores) were calculated for each program and control group member. The propensity scores were constructed using the parameter estimates from the logit models and the sample members’ explanatory variable values. The propensity scores are a function (weighted average) of the observable characteristics of the families.
- Using the propensity scores, we matched a control group family to each program group family. A control group family was selected as a match for a program group family if, among all controls, it had the closest propensity score value to that of the program group family. Matching was performed with replacement, so that a control group family could be a match for multiple program group families.10
- Dosage effects were then estimated by comparing the outcomes of program group members to their matched controls for each service intensity group. Impacts for those who received intensive services were estimated by comparing the average outcomes of program group members who received intensive services to the average outcomes of their matched controls. Similarly, impacts for those in the low-service intensity group were estimated by comparing the average outcomes of program group families who did not receive intensive services with their matched controls.
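The matching steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function names, the coefficient arguments, and all numbers are hypothetical, and ties in the nearest-score search are broken by taking the first control found.

```python
import math

def propensity_score(coefs, intercept, x):
    """Logit predicted probability for one family's baseline characteristics."""
    index = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-index))

def match_with_replacement(program_scores, control_scores):
    """For each program family, return the index of the control family
    whose propensity score is closest (a control may be reused)."""
    matches = []
    for p in program_scores:
        best = min(range(len(control_scores)),
                   key=lambda j: abs(control_scores[j] - p))
        matches.append(best)
    return matches

def matched_impact(program_outcomes, control_outcomes, matches):
    """Mean program outcome minus mean outcome of the matched controls."""
    n = len(program_outcomes)
    mean_program = sum(program_outcomes) / n
    mean_matched = sum(control_outcomes[j] for j in matches) / n
    return mean_program - mean_matched

# Hypothetical high-service intensity program families and all controls.
program_scores = [0.62, 0.48, 0.53]
program_outcomes = [95, 90, 92]
control_scores = [0.20, 0.50, 0.60, 0.35]
control_outcomes = [85, 88, 93, 86]

matches = match_with_replacement(program_scores, control_scores)
impact_high = matched_impact(program_outcomes, control_outcomes, matches)
```

In this toy example the second control is matched twice, which is what matching with replacement allows. The same two functions would be applied separately to the low-service intensity program families.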
This propensity scoring procedure uses a flexible functional form to match control group members to program group members, based on their observable characteristics (that is, it adjusts for selection on observable variables). The procedure assumes that if the distributions of observable characteristics are similar for program group families and their matched controls in each service intensity group, then the distributions of unobservable characteristics for program and control group families should also be similar in each service intensity group. Under this (untestable) assumption, the procedure yields unbiased estimates of dosage effects.11
The Cutoff Method. We also estimated dosage effects using a variant of the matching method, which we refer to as the cutoff method. The cutoff method is based on the fact that, because of random assignment, the expected percentage of control group members who would have received intensive services if they had instead been assigned to the program group should be equal to the percentage of program group members who actually received intensive services (which, as described below, is about 33 percent using the self-reported measure from the PSI data). Similarly, we expect that 67 percent of control group families would have received less-intensive services. Thus, we can divide both the program and control groups into those with the largest propensity scores (the high-service intensity group) and those with the lowest propensity scores (the low-service intensity group), and estimate impacts for each group.
Specifically, the cutoff method was implemented as follows:
- The high-service intensity group was created by selecting program and control group members with large propensity scores, and the low-service intensity group was created by selecting those with smaller propensity scores. The high-service intensity group included the 33 percent of program group members with the largest propensity scores among all program group members, and the 33 percent of control group members with the largest propensity scores among all controls. Similarly, the low-service intensity group included the remaining 67 percent of sample members with smaller propensity scores.
- Dosage effects were then estimated by comparing the average outcomes of program and control group members within each service intensity group. Impacts for those who received intensive services were obtained by comparing the average outcomes of program and control group members in the high-service intensity group. Similarly, impacts for those who received fewer services were obtained by comparing the average outcomes of program and control group families in the low-service intensity group.
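As a rough sketch of the cutoff steps above, the Python below splits each research group at the top 33 percent of propensity scores and takes simple differences in means. The function names and data are hypothetical, and the sketch uses unadjusted means and simple rounding of the group size, which are simplifying assumptions.

```python
def split_by_cutoff(scores, high_share=0.33):
    """Return (high_idx, low_idx): indices of the top `high_share`
    of propensity scores and of the remaining sample members."""
    n_high = round(high_share * len(scores))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n_high], order[n_high:]

def mean(values):
    return sum(values) / len(values)

def cutoff_impacts(prog_scores, prog_y, ctrl_scores, ctrl_y, high_share=0.33):
    """Program-minus-control difference in mean outcomes within the
    predicted high- and low-service intensity groups."""
    p_high, p_low = split_by_cutoff(prog_scores, high_share)
    c_high, c_low = split_by_cutoff(ctrl_scores, high_share)
    impact_high = mean([prog_y[i] for i in p_high]) - mean([ctrl_y[i] for i in c_high])
    impact_low = mean([prog_y[i] for i in p_low]) - mean([ctrl_y[i] for i in c_low])
    return impact_high, impact_low

# Hypothetical scores and outcomes for six program and six control families.
prog_scores = [0.9, 0.1, 0.5, 0.7, 0.3, 0.2]
prog_y = [10, 4, 6, 8, 5, 5]
ctrl_scores = [0.8, 0.6, 0.4, 0.2, 0.1, 0.3]
ctrl_y = [7, 7, 5, 4, 4, 3]

impact_high, impact_low = cutoff_impacts(prog_scores, prog_y, ctrl_scores, ctrl_y)
```

Note that the split is made separately within the program group and within the control group, so each research group contributes the same share of its members to the high-service intensity group.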
Importantly, the matching and cutoff methods should produce similar results if the propensity scores are capturing important differences between high- and low-service intensity families that are correlated with the outcome measures. Thus, as discussed in the next section, we examined the similarity of the impact results using the two methods to test the reliability of the propensity scoring approach.
Interpretation of the Impact Estimates. A subtle, but important, point concerns the interpretation of the impact estimates using the matching and cutoff methods. The estimated impacts for the high service-intensity group tell us about the effects of Early Head Start for those families who chose to receive or had access to a significant amount of services. Similarly, the estimated impacts for the low service-intensity group tell us about program effects for those families who chose to receive or had access to smaller amounts of services. The two types of families are very different. Thus, the impact findings do not tell us about how those families in the low service intensity group would have fared if they had received more services. Nor do the impact estimates tell us about the extent to which the outcomes of an average family would have improved if that family received additional services. Instead, the findings shed light on the effectiveness of Early Head Start for those who opt to receive significant amounts of services and for those who opt to receive fewer services. We believe that these are the policy-relevant questions, because Early Head Start is a voluntary program and not a mandatory one; thus, families cannot be forced to receive a minimum amount of services.
Goodness-of-Fit Tests. The propensity scoring approach uses the predicted probabilities from the logit models to classify sample members into high- or low-service intensity groups. A fundamental question, however, is: Are families classified correctly? Clearly, we can only obtain credible impact estimates for the two service intensity groups if families are partitioned correctly into the two groups (and in particular, for control group families whose service intensity measures are not observed).
We use three categories of statistical goodness-of-fit tests to assess the success of the propensity scoring procedure: (1) those based on the parameter estimates from the logit models; (2) those based on the quality of the matches and group designations; and (3) those based on the outcome variables, which are the best tests.
The first category includes goodness-of-fit measures for the parameter estimates from the logit models. For each model, we examine the pseudo-R2 value (which is based on the likelihood ratio statistic and can range from 0 to 1) and the magnitude and statistical significance of the estimated parameters. If a model has a large pseudo-R2 value and many significant and large estimated parameters, then the explanatory variables in the model can effectively distinguish between high- and low-dosage families. In this case, the propensity scoring procedure may produce unbiased estimates, because many sample members are likely to be classified correctly. The problem with these goodness-of-fit measures, however, is that a low pseudo-R2 value or few significant explanatory variables does not necessarily imply that the propensity scoring approach is unsuccessful, because there may, in fact, be few differences between those who received intensive services and those who did not. Furthermore, even if the goodness-of-fit measures are favorable, the propensity scoring procedure may not be successful if the explanatory variables are not highly correlated with the outcome variables (which is usually the case; see Chapter II).
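The text does not say which pseudo-R2 definition was used. As one illustration, the sketch below computes McFadden's version, which compares the log-likelihood of the fitted logit model with that of an intercept-only model. The choice of McFadden's measure, the function names, and the data are assumptions made for this example.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def mcfadden_pseudo_r2(y, p):
    """1 - LL(model) / LL(intercept-only model)."""
    base_rate = sum(y) / len(y)
    ll_model = log_likelihood(y, p)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    return 1.0 - ll_model / ll_null

# Hypothetical intensive-service indicators and fitted propensity scores.
received = [1, 0, 1, 0]
r2_uninformative = mcfadden_pseudo_r2(received, [0.5, 0.5, 0.5, 0.5])
r2_informative = mcfadden_pseudo_r2(received, [0.9, 0.1, 0.9, 0.1])
```

A model whose predictions are no better than the overall rate of intensive service receipt scores zero on this measure, and values rise toward 1 as the predicted probabilities separate the two groups more sharply.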
The second category of goodness-of-fit measures is based on the quality of the matches and group designations. We conducted the following tests:
- For the matching method, we compared, for each service intensity group, the distribution of the explanatory variables and propensity scores of program group members and their matched controls within each of five propensity scoring groups. We sorted the program group on the basis of their propensity scores from largest to smallest, and used this ordering to divide the program group into five propensity scoring groups of equal size. This analysis was done separately for high- and low-dosage program group families. We then compared the distribution of the baseline characteristics and propensity scores of program families and their matched controls within each propensity scoring group. If the matching process was determined to be unsatisfactory on the basis of these statistical tests, we re-estimated the logit regression models by including interaction terms as additional explanatory variables in the models (see Dehejia and Wahba 1999; Rubin 2001). The process was continued until a satisfactory model specification was found.
- For the matching method, we computed the proportion of matched controls who were assigned to both the high-service and low-service intensity groups. As discussed, the matching process was conducted with replacement so that a control group family could be a match for more than one program group family. The overlap between matched controls in the low- and high-dosage groups should be less for models that predict well than for models with less predictive power. Thus, we compared the overlap from our matching process to the overlap that would be expected if controls were randomly matched with replacement to each program group family. Similarly, we calculated the percentage of all control group members who were in the matched control group samples.
- For the cutoff method, we examined the proportion of program group families who were “assigned” to the high-dosage group who actually received intensive services, and similarly for program group families who were assigned to the low-dosage group. These proportions (that is, correct classification rates) were compared to the correct classification rates that would be expected if program group families were randomly assigned to the two dosage groups.
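The correct classification rate for the cutoff method can be illustrated as follows. The sketch assumes that, under random assignment of program families to dosage groups, the expected correct classification rate for the predicted high-dosage group equals the overall share of families that received intensive services. Function names and data are hypothetical.

```python
def correct_classification_rate(predicted_high, actual_high):
    """Share of families predicted to be high-dosage that actually
    received intensive services."""
    hits = [actual for pred, actual in zip(predicted_high, actual_high) if pred]
    return sum(hits) / len(hits)

def random_benchmark(actual_high):
    """Expected rate if families were placed in the high-dosage group at random:
    the overall share that received intensive services."""
    return sum(actual_high) / len(actual_high)

# Hypothetical program group of nine families.
predicted_high = [True, True, True, False, False, False, False, False, False]
actual_high = [True, True, False, True, False, False, False, False, False]

rate = correct_classification_rate(predicted_high, actual_high)
benchmark = random_benchmark(actual_high)
```

The analogous rate for the low-dosage group would be computed over families with `predicted_high` equal to `False`, counting those that did not receive intensive services.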
The final category of goodness-of-fit tests is based on the mean values of, and the impacts on, the outcome variables. Because these tests are based directly on the outcomes of interest, they are the best tests to assess the success of the propensity scoring procedure. Specifically, we conducted the following tests:
- For the matching method, we tested, for each outcome measure, whether the weighted average of the mean outcome for the controls in the high- and low-dosage groups equals the mean outcome for the full control group. The aim of the matching method is to partition the full control group into two dosage groups. Thus, if this procedure was successful, the weighted average of the mean outcome for controls in the two dosage groups should equal the mean outcome for the full control group, where the weights are .33 and .67, respectively. Similarly, we assessed whether the weighted average of the impact estimates for the two dosage groups is similar to the impact estimates for the full sample, as should be the case for any subgroup analysis that divides the sample into mutually exclusive groups.
- For the cutoff method, we compared the mean outcomes of “predicted” high-dosage (low-dosage) program group members to those of actual high-dosage (low-dosage) program group members. We expect that, if the mean outcomes for those in the “predicted” and “actual” dosage groups are similar for the program group, then it is likely that the mean outcomes for control group families in the two dosage groups are also accurate, and hence, that unbiased impact estimates can be obtained.
- We compared impact results using the cutoff and matching methods. As discussed, the cutoff and matching methods should yield similar impact results because they are both based on the same propensity scores and both partition the sample into two dosage groups.12
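The weighted-average check for the matching method amounts to simple arithmetic, sketched below. The weights of .33 and .67 come from the text; the function names and all outcome and impact values are made up for illustration.

```python
def weighted_control_mean(mean_high, mean_low, w_high=0.33, w_low=0.67):
    """Weighted average of the matched-control means in the two dosage groups."""
    return w_high * mean_high + w_low * mean_low

def error_as_percent_of_impact(weighted_mean, full_control_mean, impact):
    """Gap between the weighted average and the full control group mean,
    expressed as a percentage of the full-sample impact estimate."""
    return 100.0 * (weighted_mean - full_control_mean) / impact

# Hypothetical values: matched-control means of 100 and 90, a full
# control group mean of 93.0, and a full-sample impact of 1.5 points.
weighted = weighted_control_mean(100.0, 90.0)
error_pct = error_as_percent_of_impact(weighted, 93.0, 1.5)
```

A gap that looks small in nominal terms can still be a sizable fraction of the impact when the impact itself is small, which is why the error is scaled by the impact estimate.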
b. Fixed-Effects Method
In order to test the robustness of our findings using the propensity scoring approach, we also estimated dosage effects by (1) calculating, for each program group member, the difference between their 14- and 36-month outcomes (that is, the growth in their outcomes), and (2) comparing the mean difference in these growth rates for those who received intensive services and those who did not. This “fixed-effects” or “difference-in-difference” approach adjusts for selection bias by assuming that permanent unobservable differences between families in the two service intensity groups are captured by their 14-month measures. This analysis was conducted using only those outcomes that were measured at multiple time points.
Mathematically, dosage effects using the fixed-effects approach were obtained using variants of the following model:
$$
y_{36} - y_{14} = \alpha_0 + \alpha_1 H + X\beta + e
$$

where $y_{36}$ is the outcome at 36 months, $y_{14}$ is the outcome at 14 months, $H$ is an indicator variable equal to 1 for high service-intensity program group members and to 0 for low service-intensity program group members, the $X$s are explanatory variables, $e$ is the disturbance term, and the $\alpha$s and $\beta$s are parameters to be estimated. In some specifications, we did not include the explanatory variables (that is, the $X$s), and in other specifications we included the 14-month outcome measure as an explanatory variable rather than as part of the dependent variable. The parameter $\alpha_1$ represents the difference in the growth of the outcome between high service-intensity and low service-intensity program group members (that is, the dosage effect).
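In the simplest specification, with no explanatory variables, the least squares estimate of the coefficient on H equals the difference in mean growth between the two service intensity groups. The sketch below computes that difference directly. The function name and data are hypothetical, and the sketch does not cover the regression-adjusted variants.

```python
def growth_difference(y14, y36, high):
    """Mean 14-to-36-month growth for high service-intensity families
    minus mean growth for low service-intensity families."""
    growth = [later - earlier for earlier, later in zip(y14, y36)]
    high_growth = [g for g, h in zip(growth, high) if h == 1]
    low_growth = [g for g, h in zip(growth, high) if h == 0]
    return sum(high_growth) / len(high_growth) - sum(low_growth) / len(low_growth)

# Hypothetical program group outcomes at 14 and 36 months.
y14 = [90, 95, 88, 92]
y36 = [96, 99, 90, 95]
high = [1, 1, 0, 0]

dosage_effect = growth_difference(y14, y36, high)
```

Only program group families with outcomes at both time points can enter this calculation, which is the sample restriction the text notes for this approach.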
Although intuitively appealing and widely used, this approach has several serious problems in our context. First, ideally, we would want to use baseline measures of the outcomes rather than 14-month measures, because program group families had already received some services at the 14-month point. Furthermore, the high service-intensity group had received more services on average than the low service-intensity group. Thus, the 14-month measures for the two groups are likely to have already been affected by Early Head Start in different ways, which could lead to biased estimates of dosage effects. Second, the fixed-effects approach assumes that in the absence of Early Head Start, the growth trajectories of outcomes for the low and high service-intensity groups would have been similar. This assumption, however, may not be realistic for some outcome measures. Finally, this analysis is restricted to those who have available data at 14 and 36 months.
c. Measures of Service Intensity
As discussed in Chapter II of Volume I, we estimated dosage effects using two overall measures of service intensity. First, we constructed a measure using data from the PSI and exit interviews. Families were categorized as receiving intensive services if they remained in the program for at least two years and received more than a threshold level of services. The threshold level for those in center-based sites was the receipt of at least 900 total hours of Early Head Start center care during the 26-month follow-up period. The threshold level for those in home-based sites was the receipt of home visits at least weekly in at least two of the three follow-up periods. Families categorized as receiving intensive services in mixed-approach sites were those who exceeded the threshold level for either center-based or home-based services. About one-third of program group families received intensive services using this definition. The service intensity rate varied from 8 to 56 percent across sites, but 9 of the 17 sites had a rate greater than 33 percent. This measure is missing for about 8 percent of program group families.
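The PSI-based definition can be written as a short classification rule. The sketch below is illustrative only: the function and argument names are hypothetical, and it assumes that "at least two years" means 24 or more months of enrollment.

```python
def received_intensive_services(approach, months_enrolled,
                                center_hours=0.0, periods_with_weekly_visits=0):
    """Apply the PSI-based thresholds described in the text.

    approach: "center", "home", or "mixed" (site program approach)
    months_enrolled: months the family remained in the program
    center_hours: total hours of Early Head Start center care
    periods_with_weekly_visits: follow-up periods (0 to 3) with at least
        weekly home visits
    """
    if months_enrolled < 24:  # assumption: two years = 24 months
        return False
    center_ok = center_hours >= 900
    home_ok = periods_with_weekly_visits >= 2
    if approach == "center":
        return center_ok
    if approach == "home":
        return home_ok
    if approach == "mixed":
        return center_ok or home_ok
    raise ValueError(f"unknown program approach: {approach!r}")

# Hypothetical families.
family_a = received_intensive_services("center", 26, center_hours=950)
family_b = received_intensive_services("home", 25, periods_with_weekly_visits=1)
family_c = received_intensive_services("mixed", 24, periods_with_weekly_visits=2)
family_d = received_intensive_services("center", 18, center_hours=1200)
```

Family D shows that a family leaving before two years is classified as low intensity even when its hours of center care exceed the threshold.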
Second, we used a measure of program engagement provided by the sites for each family in the program group. Program staff rated each family as (1) consistently highly involved throughout their enrollment, (2) involved at varying levels during their enrollment, (3) consistently involved at a low level throughout their enrollment, (4) not involved in the program at all, or (5) they could not remember how involved the family was. Those 40 percent of families who were rated as consistently highly involved were considered to have received intensive services in our analysis. The program engagement rate ranged from 20 to 77 percent across sites, although 10 sites had a rate greater than 40 percent. The program engagement measure is missing for 7 percent of program group families.
There is some overlap between the two intensity measures, although many families are classified as having received intensive services according to one measure but not the other. For example, about 58 percent of those classified as high dosage using the PSI measure were also classified as high dosage using the program engagement measure. Similarly, about half of those classified as high dosage using the program engagement measure were also classified as high dosage using the PSI measure.
The lack of perfect overlap between the two intensity measures reflects the different aspects of program involvement that they measure. The first measure is based on duration of enrollment and hours of center care or frequency of home visits and reflects the quantity of services received, while the second measure captures staff assessments of families’ level of involvement in program services in terms of both attendance and emotional engagement in program activities.
To keep the presentation manageable, we present impact estimates for 28 key outcome variables spanning a range of types of outcomes.
2. Analysis Results
In this section, we first report results from the logit models, then present the impact findings.
a. Logit Model Results and Goodness-of-Fit Tests
Table D.7A displays, for each measure of service intensity, results from a logit model where the probability that a program group family received intensive services was regressed on family, child, and site characteristics. For ease of presentation, these models are a simplified version of the models actually used in the propensity scoring analysis, which included additional explanatory variables (see the previous section) and site indicator variables (rather than variables signifying key site characteristics). The table displays the regression-adjusted probability that a family received intensive services (that is, marginal probabilities) for each family, child, and site characteristic included in the models. The table also displays the significance of these marginal probabilities.
The parameter estimates on the explanatory variables are jointly statistically significant at the 1 percent significance level. This result holds for both the PSI intensity measure and the program engagement measure.
| Variable¹ | Probability Family Received Intensive Services: Self-Reported PSI Measure | Probability Family Received Intensive Services: Program Engagement Measure |
|---|---|---|
| Total | 32.7 | 40.3 |
| Site Characteristics | ||
| Program Approach | ||
| Center-based | 26.9 | 43.6 |
| Home-based | 39.0** | 34.3*** |
| Mixed (L) | 28.5 | 46.1 |
| Overall Implementation Level | ||
| Early | 40.6*** | 45.2 |
| Late | 32.8** | 35.4 |
| Incomplete (L) | 21.9 | 40.9 |
| Urban or Rural | ||
| Urban | 32.2 | 41.2 |
| Rural (L) | 33.2 | 39.4 |
| Unemployment Rate | ||
| Higher than 5 percent | 22.9*** | 48.2** |
| 5 percent or less (L) | 35.7 | 38.1 |
| Family and Parent Characteristics | ||
| Mother's Age at Birth of Focus Child | ||
| Less than 20 | 35.8 | 36.4 |
| 20 to 25 | 30.1 | 41.2 |
| Older than 25 (L) | 31.7 | 44 |
| Race and Ethnicity | ||
| White non-Hispanic (L) | 34.2 | 40.6 |
| Black non-Hispanic | 30.4 | 36.3 |
| Hispanic | 35.1 | 46 |
| Other | 21.5** | 34.4 |
| Primary Language | ||
| English | 32.9 | 41 |
| Other (L) | 31.9 | 38.2 |
| Mother's Education | ||
| Less than grade 12 (L) | 27 | 36.6 |
| Grade 12 or earned a GED | 39.8*** | 41.2 |
| Greater than grade 12 | 35.2* | 45.8* |
| Primary Occupation | ||
| Employed (L) | 33.3 | 48.6 |
| In school or training | 35.8 | 41 |
| Neither | 31.2 | 36.5*** |
| Living Arrangements | ||
| With spouse | 33.5 | 44.8 |
| With other adults | 34.7 | 36.9 |
| Alone (L) | 29.8 | 40.5 |
| Received AFDC/TANF | ||
| Yes | 29.6 | 37.8 |
| No (L) | 33.9 | 41.3 |
| Received Food Stamps | ||
| Yes | 32.5 | 38 |
| No (L) | 32.8 | 42.2 |
| Random Assignment Date | ||
| Before 10/96 (L) | 38.2 | 45.3 |
| 10/96 to 6/97 | 30.3** | 35.3** |
| After 6/97 | 28.8** | 39.8 |
| Child Characteristics | ||
| Age of Focus Child | ||
| Unborn | 30 | 35.7 |
| Less than 5 months | 33.3 | 40.1 |
| 5 months or older (L) | 33.9 | 43.3 |
| First Born | ||
| Yes | 29.7** | 40.1 |
| No (L) | 37.6 | 40.7 |
| Gender | ||
| Male | 32.9 | 41 |
| Female (L) | 32.4 | 39.7 |
| Mother or Anyone Else Had Concerns About Child's Overall Health and Development | ||
| Yes | 34.4 | 42.2 |
| No (L) | 32.5 | 40.2 |
| Child Received an Evaluation Because of Concerns About the Child's Overall Health and Development or Because of Suspected Developmental Delay | ||
| Yes | 37.1 | 40.8 |
| No | 32.5 | 40.3 |
| Has Established or Biological/Medical Risks | ||
| Yes | 30 | 38.5 |
| No | 33.1 | 40.7 |
| Sample Size | 1,076 Program Group Families | 1,076 Program Group Families |

SOURCE: HSFIS and PSI data.

NOTES:
1. All estimates are regression-adjusted using logistic regression procedures where the probability a family in the program group received intensive services was regressed on the explanatory variables listed in the table.
2. For the PSI measure, families were categorized as receiving intensive services if they remained in the program for at least two years and received more than a threshold level of services. The threshold level for those in center-based sites was the receipt of at least 900 total hours of Early Head Start center care during the 26-month follow-up period. The threshold level for those in home-based sites was the receipt of home visits at least weekly in at least 2 of the 3 follow-up periods. Families categorized as receiving intensive services in mixed-approach sites were those who exceeded the threshold level for either center-based or home-based services. The program engagement measure pertains to the family’s level of engagement in Early Head Start as reported by site staff.

¹An “L” signifies that the variable was left out of the regression models.

*Difference between the regression-adjusted percentage for the subgroup relative to the percentage for the left-out subgroup is statistically significant at the .10 level, two-tailed test.

We find some differences in service intensity levels across sites. Families in home-based programs were more likely to receive intensive services than those in center-based or mixed programs using the PSI intensity measure, but the opposite result holds using the program engagement measure. There is some evidence that service intensity levels were higher for families in sites that were early implementers than for families in other sites.
We find that better-off families were somewhat more likely to receive intensive services than were more disadvantaged families. For example, families were more likely to receive intensive services if the mother (1) had a high school degree, (2) was employed (for the program engagement measure), (3) was not receiving welfare, and (4) was living with her spouse or other adults. Importantly, however, the subgroup differences are not large, and few of the other family and child measures are statistically significant. The pseudo-R2 values from the logit models used in the propensity scoring analysis are about .12 for both service intensity measures. These relatively low values suggest that the explanatory variables included in the models do not have substantial predictive power. As a further illustration of this point, only about 58 percent of those predicted to be in the high-dosage group using the cutoff method actually received high-intensity services (using the PSI measure). This correct classification rate is substantially larger than the 33 percent that would be expected if random classifications were performed, but still suggests that the predicted high-dosage group contains a substantial number of misclassified families (and similarly for the low-dosage group).13
For the matching method, we find that the distributions of the baseline characteristics of program group families and their matched controls are similar for each service intensity group (see Table D.7B, which shows results for the PSI measure). Very few of the differences in key family and child characteristics between program and control group families in each dosage group are statistically significant, and program group members are clearly more similar to their matched controls than to the full control group.
| Variable | High-Dosage Group: Program Group | High-Dosage Group: Matched Control Group | Low-Dosage Group: Program Group | Low-Dosage Group: Matched Control Group | All Controls |
|---|---|---|---|---|---|
| Site Characteristics | |||||
| Program Approach | |||||
| Center-based | 16.7 | 17.3 | 22.8 | 24.3 | 20.2 |
| Home-based | 49.7 | 51.5 | 43.3 | 41.2 | 44.8 |
| Mixed | 33.6 | 31.2 | 34 | 34.6 | 35 |
| Overall Implementation Level | |||||
| Early | 44.8 | 39.8 | 30.7 | 32.6 | 36.3 |
| Late | 35.8 | 41 | 41.3 | 36.4 | 37 |
| Incomplete | 19.4 | 19.1 | 28 | 31 | 26.7 |
| Urban | 51.2 | 47.2 | 57.6 | 63.6** | 58.2 |
| Unemployment Rate Higher than 5 Percent | 17.3 | 19.4 | 25.3 | 25.9 | 21.9 |
| Family and Parent Characteristics | |||||
| Mother's Age at Birth of Focus Child | |||||
| Less than 20 | 37 | 36.5 | 39.5 | 42.4 | 38.9 |
| 20 to 25 | 33.8 | 34.6 | 31.8 | 30.4 | 33.4 |
| Older than 25 | 29.3 | 28.8 | 28.7 | 27.2 | 27.8 |
| Race and Ethnicity | |||||
| White non-Hispanic | 47.8 | 47.9 | 35 | 29.4* | 38.2 |
| Black non-Hispanic | 28.2 | 30.7 | 36.2 | 41.8 | 34.1 |
| Hispanic | 21.5 | 18.2 | 24.5 | 23 | 22.8 |
| Other | 2.5 | 3.2 | 4.4 | 5.8 | 4.9 |
| Primary Language is English | 83 | 80.9 | 77.7 | 77.7 | 78.2 |
| Mother's Education | |||||
| Less than grade 12 | 38 | 39.3 | 49.9 | 53.9 | 46.2 |
| Grade 12 or earned a GED | 34.5 | 31 | 25 | 23.4 | 29.2 |
| Greater than grade 12 | 27.5 | 29.7 | 25.1 | 22.7 | 24.6 |
| Primary Occupation | |||||
| Employed | 25.2 | 24.2 | 22.5 | 21.1 | 23.2 |
| In school or training | 21.7 | 23.9 | 23.1 | 26.6 | 21 |
| Neither | 53.2 | 51.9 | 54.4 | 52.3 | 55.8 |
| Living Arrangements | |||||
| With spouse | 30.9 | 27.8 | 24.6 | 23.2 | 26.9 |
| With other adults | 38.9 | 43.5 | 38.9 | 40 | 40.4 |
| Alone | 30.2 | 28.7 | 36.5 | 36.8 | 32.7 |
| Received AFDC/TANF | 30 | 27.1 | 35.4 | 35.4 | 33.2 |
| Received Food Stamps | 43.9 | 44.4 | 46.2 | 53.1** | 46.8 |
| Random Assignment Date | |||||
| Before 10/96 | 42.9 | 38.9 | 32.8 | 31 | 35.4 |
| 10/96 to 6/97 | 29.6 | 34.6 | 31.3 | 32.6 | 32.3 |
| After 6/97 | 27.5 | 26.5 | 35.9 | 36.4 | 32.3 |
| Child Characteristics | |||||
| Age of Focus Child | |||||
| Unborn | 25.9 | 30.9 | 24.6 | 24.7 | 27.5 |
| Less than 5 months | 33 | 29.6 | 35.3 | 38.6 | 34.2 |
| 5 months or older | 41 | 39.5 | 40.1 | 36.7 | 38.3 |
| First Born | 58.1 | 53.9 | 63.7 | 66.3 | 60.6 |
| Male | 50.9 | 46.9 | 49.9 | 48.8 | 50.3 |
| Mother or Anyone Else Had Concerns About Child's Overall Health and Development | 12.4 | 12 | 12.2 | 14.8 | 14.6 |
| Child Received an Evaluation Because of Concerns About the Child's Overall Health and Development or Because of Suspected Developmental Delay | 5.7 | 8.2 | 5.4 | 5.5 | 6.4 |
| Has Established or Biological/Medical Risks | 19.6 | 21.2 | 22 | 21.4 | 19.8 |
| Sample Size | 324 | 324 | 668 | 668 | 1,011 |

SOURCE: PSI and HSFIS data.

NOTE: Controls were matched to program group families with replacement using the propensity scoring approach (matching method) described in the text.

*Difference between program and matched control group is significantly different from zero at the .10 level, two-tailed test.

Thus, the procedure succeeded in producing equivalent groups on the basis of observable characteristics. However, only about 55 percent of control group families were matched to program group families, which is much lower than one might expect. Furthermore, the overlap in the matched high- and low-dosage control group samples is about 12 percent of the full control group, which is not substantially smaller than the 15 percent that would be expected if random matching were performed.
In sum, the goodness-of-fit tests based on the logit regression results give a mixed picture of the success of the propensity scoring procedure and, on the whole, are disappointing. On the positive side, the parameter estimates on the explanatory variables are jointly significant. Furthermore, the matching method yielded program and matched control group families with similar observable characteristics within each service intensity group. However, the pseudo-R² values from the logit models are low (about .12), many program group families were misclassified into the high- and low-dosage groups using the cutoff method, and only slightly more than half of control group families were matched to program group families using the matching method. In addition, many of the parameters in the logit models are not statistically significant.
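The two logit-based diagnostics mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code: it uses McFadden's pseudo-R² (the text does not say which pseudo-R² was reported), and it assumes the cutoff rule labels the families with the largest propensity scores as high-dosage so that the predicted and actual group sizes are equal. All names and the toy data are hypothetical.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood; each p must lie strictly between 0 and 1."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def mcfadden_pseudo_r2(y, p):
    """1 - LL(model) / LL(null), where the null model predicts the base rate."""
    base_rate = sum(y) / len(y)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    return 1.0 - log_likelihood(y, p) / ll_null

def classify_by_cutoff(p, n_high):
    """Label the n_high largest scores as high-dosage (1), the rest as low (0)."""
    order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    predicted = [0] * len(p)
    for i in order[:n_high]:
        predicted[i] = 1
    return predicted

def misclassification_rate(y, predicted):
    """Share of families whose predicted group differs from their actual group."""
    return sum(1 for a, b in zip(y, predicted) if a != b) / len(y)

# Toy data (assumption): 1 = actually received intensive services.
actual = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.7, 0.3, 0.4, 0.5, 0.2, 0.6, 0.45, 0.1]

predicted = classify_by_cutoff(scores, n_high=sum(actual))
print("pseudo-R2:", round(mcfadden_pseudo_r2(actual, scores), 3))
print("misclassified:", misclassification_rate(actual, predicted))
```

A low pseudo-R² means the scores separate the two groups only weakly, so a cutoff on those scores will tend to misclassify many families.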
The results from the goodness-of-fit tests based on the outcome measures are also mixed. Table D.7C displays test results for the matching method, in which mean outcomes for the full control group are compared with the weighted averages of the mean outcomes for the matched controls in the low- and high-dosage groups. We find that, as expected, the mean outcomes of matched controls in the high-dosage group usually were more favorable than those of matched controls in the low-dosage group, because, as discussed, those in the high-dosage group were somewhat less disadvantaged. The differences between the full control group mean outcomes and the weighted averages of the mean outcomes for the two dosage groups usually are small in nominal terms, but are often large relative to the estimated full sample impacts on the outcomes.
| Variable | High-Service Intensity Controls (1) | Low-Service Intensity Controls (2) | Weighted Average of (1) and (2) (3) | Full Control Group (4) | Error {(3)-(4)} as a Percent of the Impact on the Outcome |
|---|---|---|---|---|---|
| Bayley Mental Development Index (MDI) | 92.02 | 89.36 | 90.46 | 90.16 | 28 |
| Percentage with Bayley MDI Below 85 | 29.65 | 34.07 | 32.24 | 31.55 | -24 |
| PSI: Parental Distress | 25.8 | 25.09 | 25.39 | 25.55 | -50 |
| Center for Epidemiologic Studies Depression Scale (CES-D) Total Score | 8.13 | 7.47 | 7.74 | 7.91 | -46 |
| Percentage of Parents Who Spanked the Child in the Previous Week | 52.11 | 51.37 | 51.68 | 53.44 | -30 |
| Index of Severity of Discipline Strategies | 3.29 | 3.55 | 3.44 | 3.47 | -27 |
| Percentage of Parents Suggesting Only Mild Responses to Hypothetical Situations | 48.7 | 39.65 | 43.4 | 41.97 | 33 |
| Percentage of Parents Who Read to Their Child Every Day | 53.23 | 49.91 | 51.29 | 51.8 | 14 |
| Home Observation for Measurement of the Environment (HOME): Total Score | 26.64 | 26.36 | 26.48 | 26.93 | 90 |
| HOME: Support of Language and Learning | 10.05 | 10.16 | 10.12 | 10.35 | 100 |
| HOME: Warmth | 2.36 | 2.44 | 2.41 | 2.48 | 100 |
| Parent Supportiveness (Semistructured Play) | 3.89 | 3.78 | 3.82 | 3.87 | 63 |
| Parent Intrusiveness (Semistructured Play) | 1.76 | 1.58 | 1.66 | 1.59 | NA |
| Parent Detachment (Semistructured Play) | 1.32 | 1.31 | 1.32 | 1.25 | -350 |
| Parent Engagement (Semistructured Play) | 4.54 | 4.6 | 4.57 | 4.63 | 50 |
| Sustained Attention with Objects (Semistructured Play) | 4.73 | 4.74 | 4.74 | 4.83 | 90 |
| Negativity Toward Parent (Semistructured Play) | 1.48 | 1.3 | 1.38 | 1.31 | -117 |
| Persistence (Puzzle Challenge Task) | 4.58 | 4.46 | 4.51 | 4.55 | -400 |
| Child Behavior Checklist: Aggressive Behavior | 11.17 | 11.26 | 11.22 | 11.3 | -25 |
| Peabody Picture Vocabulary Test (PPVT-III) Standard Score | 83.89 | 82.16 | 82.88 | 82.49 | 27 |
| Percentage with PPVT <85 | 51.4 | 56.15 | 54.18 | 53.27 | -26 |
| Percentage of Caregivers Ever Employed During the 26 Months After Random Assignment | 81.73 | 82.34 | 82.09 | 83.04 | 35 |
| Percentage of Caregivers Ever in an Education or Training Program During the 26 Months After Random Assignment | 54.06 | 51 | 52.27 | 50.25 | 25 |
| Average Parent-Reported Health Status of Child | 4.07 | 4.09 | 4.08 | 4.02 | -600 |
| Continuous Biological Father Presence Child Age 14 to 36 Months | 75 | 67.06 | 70.35 | 70.25 | -3 |
| Continuous Male Presence Child Age 14 to 36 Months | 90.61 | 81.34 | 85.18 | 84.89 | -7 |
| Sample Size | 324 | 668 | | 1,011 | |
| SOURCE: PSI and PI Data and Bayley and Video Assessments at 36 Months. |
| NOTE: Controls were matched to program group families with replacement using the propensity scoring approach. NA = Not applicable because the impact was zero for the outcome variable. |
This suggests that the estimates of dosage effects may be biased. We find similar results when the mean outcomes of program group families predicted to be in a particular dosage group using the cutoff method are compared to the mean outcomes of program group families who were actually in that dosage group (see Table D.7D).
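The arithmetic behind the last column of Table D.7C can be sketched for the Bayley MDI row. The weights used to form column (3) are not stated in this section, so the sketch backs out the weight implied by the published means. It also uses the unadjusted difference between the full program group mean (91.25, Table D.7D) and the full control group mean (90.16) as a stand-in for the impact; that is an assumption, since the reported impact estimates may be adjusted differently. Function names are illustrative.

```python
def weighted_average(high_mean, low_mean, high_weight):
    """Column (3): weighted average of the high- and low-dosage control means."""
    return high_weight * high_mean + (1.0 - high_weight) * low_mean

def implied_high_weight(high_mean, low_mean, weighted_mean):
    """Weight on the high-dosage group implied by the three published means."""
    return (weighted_mean - low_mean) / (high_mean - low_mean)

def error_as_percent_of_impact(weighted_mean, full_control_mean, impact):
    """Last column: 100 * {(3) - (4)} / impact; None when the impact is zero (NA)."""
    if impact == 0:
        return None
    return 100.0 * (weighted_mean - full_control_mean) / impact

# Bayley MDI row of Table D.7C.
high, low, weighted, full_control = 92.02, 89.36, 90.46, 90.16

w = implied_high_weight(high, low, weighted)  # about 0.41
assert abs(weighted_average(high, low, w) - weighted) < 1e-9

# Stand-in impact (assumption): unadjusted program-control difference in means.
impact = 91.25 - full_control
pct = error_as_percent_of_impact(weighted, full_control, impact)
print(f"implied weight on high-dosage group: {w:.3f}")
print(f"error as percent of impact: {pct:.1f}")  # about 27.5, i.e. the 28 in the table
```

An error of 0.30 points looks negligible against a mean of about 90, but it is more than a quarter of an impact of roughly one point, which is why small nominal discrepancies can still signal meaningful bias in the dosage estimates.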
b. Impact Results
The impact results using the matching method strongly suggest that service intensity matters (Tables D.7E and D.7F). Across a wide range of outcome variables, the estimated impacts are more beneficial for those in the high-dosage group than for those in the low-dosage group. For example, the impact on the Bayley MDI was 2.35 points and statistically significant at the 5 percent level for those in the high-dosage group, but was only 0.39 points and not statistically significant for those in the low-dosage group. Similarly, the impact on the PPVT was more than 3 points for the high-dosage group, but was small and not statistically significant for the low-dosage group. A similar pattern holds across other key child and family outcomes, and for both the PSI intensity measure and the program engagement measure. The results using the fixed-effects method support the findings using the matching method for some outcomes.
The findings using the cutoff method, however, do not support the conclusion that program impacts were larger for families who received intensive services than for families who received less intensive or no services. There is no evidence that the estimated impacts using the cutoff method were systematically larger for those in the high-dosage group than for those in the low-dosage group for either the PSI or the program engagement measure.
In sum, it is unclear whether impacts for the full sample are concentrated in those families who received substantial amounts of Early Head Start services. We do find evidence of dosage effects using one version of the propensity scoring approach (the matching method), but do not find this evidence using another version of this approach (the cutoff method).
| Variable | High-Service Intensity Group: Predicted | High-Service Intensity Group: Actual | Low-Service Intensity Group: Predicted | Low-Service Intensity Group: Actual | Full Program Group |
|---|---|---|---|---|---|
| Bayley Mental Development Index (MDI) | 94.14 | 93.08 | 89.92 | 90.27 | 91.25 |
| Percentage with Bayley MDI Below 85 | 20.18 | 22.62 | 32.66 | 31.99 | 28.73 |
| PSI: Parental Distress | 25.12 | 24.69 | 25.29 | 25.51 | 25.23 |
| Center for Epidemiologic Studies Depression Scale (CES-D) Total Score | 7.76 | 7.26 | 7.42 | 7.67 | 7.53 |
| Percentage of Parents Who Spanked the Child in the Previous Week | 37.33 | 40.68 | 52.68 | 51.04 | 47.53 |
| Index of Severity of Discipline Strategies | 2.95 | 3.09 | 3.57 | 3.5 | 3.36 |
| Percentage of Parents Suggesting Only Mild Responses to Hypothetical Situations | 57.38 | 53.11 | 38.45 | 40.4 | 44.69 |
| Percentage of Parents Who Read to Their Child Every Day | 60.48 | 61.02 | 52.93 | 52.61 | 55.41 |
| Home Observation for Measurement of the Environment (HOME): Total Score | 28 | 27.97 | 27.14 | 27.13 | 27.42 |
| HOME: Support of Language and Learning | 10.83 | 10.78 | 10.46 | 10.48 | 10.58 |
| HOME: Warmth | 2.53 | 2.55 | 2.56 | 2.56 | 2.55 |
| Parent Supportiveness (Semistructured Play) | 4.11 | 4.05 | 3.87 | 3.89 | 3.95 |
| Parent Intrusiveness (Semistructured Play) | 1.44 | 1.53 | 1.67 | 1.63 | 1.6 |
| Parent Detachment (Semistructured Play) | 1.21 | 1.27 | 1.24 | 1.21 | 1.23 |
| Parent Engagement (Semistructured Play) | 4.9 | 4.83 | 4.68 | 4.72 | 4.75 |
| Sustained Attention with Objects (Semistructured Play) | 5.08 | 5.07 | 4.84 | 4.84 | 4.92 |
| Negativity Toward Parent (Semistructured Play) | 1.19 | 1.28 | 1.29 | 1.25 | 1.26 |
| Persistence (Puzzle Challenge Task) | 4.77 | 4.78 | 4.43 | 4.41 | 4.54 |
| Child Behavior Checklist: Aggressive Behavior | 11.6 | 10.75 | 10.68 | 11.1 | 10.98 |
| Peabody Picture Vocabulary Test (PPVT-III) Standard Score | 86 | 86.06 | 82.77 | 82.61 | 83.9 |
| Percentage with PPVT <85 | 44.44 | 44.59 | 52.62 | 52.85 | 49.76 |
| Percentage of Caregivers Ever Employed During the 26 Months After Random Assignment | 85.45 | 87.35 | 85.93 | 85.01 | 85.77 |
| Percentage of Caregivers Ever in an Education or Training Program During the 26 Months After Random Assignment | 57.99 | 58.31 | 58.66 | 58.51 | 58.44 |
| Average Parent-Reported Health Status of Child | 4.06 | 4.03 | 3.99 | 4.01 | 4.01 |
| Continuous Biological Father Presence Child Age 14 to 36 Months | 70.35 | 69.46 | 64.87 | 65.22 | 66.77 |
| Continuous Male Presence Child Age 14 to 36 Months | 80.43 | 84.52 | 80.48 | 78.13 | 80.46 |
| Sample Size | 324 | 324 | 668 | 668 | 992 |
| SOURCE: PSI and PI Data and Bayley and Video Assessments at 36 Months. |
| NOTE: Analysis was conducted using program group families only. Families were predicted to be in the high- or low-service intensity group on the basis of the size of their propensity scores and using the cutoff method described in the text. |
| Variable | Impact for the Full Sample (a) | Matching Method | Cutoff Method | Fixed-Effects Method | ||||
|---|---|---|---|---|---|---|---|---|

