Appendix 4.3: Impact Regression Procedures
This appendix describes the regression procedures used to calculate and test the statistical significance of all impact estimates in the report, building on the less detailed discussion of this topic in Chapter 4. It begins by explaining the analysis samples used before proceeding through the different impact regression models from the simplest to the most complex: regressions without covariates, regressions with covariates, and regressions with interactions to examine subgroup and moderator effects.
Analysis Samples Used
The unit of analysis for all impact analyses is the child. This is true irrespective of the outcome measure or data source considered; even outcomes reported by parents and caregivers (the majority) are weighted and analyzed according to the children they described. This makes all impact findings representative of all Head Start children in the nation in 2002. The child weights applied during analysis (see Appendix 1.2) make each child in this universe count equally, not each parent/caregiver nor each Head Start center nor each grantee/delegate agency. Weighting adjustments are made to account for the exclusion from the frame of “saturated” programs and centers.
Different collections of observations are used for different impact purposes. The most important variation concerns the division of the available spring 2003 data into two distinct age-level cohorts: all findings are derived and presented separately for the age 3 cohort and the age 4 cohort. A given cohort is split further into language groups for some purposes, based on the primary language used in the initial cognitive assessments of children in fall 2002 (Spanish versus English + Other). Very rarely, for certain subgroup and moderator analyses noted elsewhere in the report, a sample is deliberately restricted below this level, such as when any child with a deceased biological parent is excluded from the analysis of parents’ marital status as a moderator. Further small variations occur due to missing data in the spring 2003 outcome observation period. These are described below, first for cognitive outcomes derived from in-person child assessments and then for all other outcomes taken from interviews with parents and primary caregivers (hereafter referred to for brevity simply as “parents”).
For most cognitive outcomes, impact estimates are calculated using data from all children assessed in spring 2003.[1] This gives a nearly identical sample for all findings on a given age cohort and, where relevant, language group, with variations arising from the inability to compute all desired test scores for all assessed children; a small number of cases within the common sample of all assessed children had to be omitted on an outcome-by-outcome basis for this reason. Similar slight variations within a uniform basic sample occur for outcomes measured in parent interviews. Here, item nonresponse in otherwise completed interviews creates case-by-case omissions from an otherwise uniform sample in a small number of cases.
Variations in the analysis sample across data collection instruments are more common, because some assessed children lack parent interviews and some parent interviews lack child assessments. Total sample sizes (i.e., number of respondents) for each data type are provided in Exhibit A.4.3.1, as is information on the extent of overlap between the parent interview sample and the child assessment sample. Overlap is considerable for both age cohorts and all language groups, and sample sizes, prior to the outcome-by-outcome exclusions described above, track closely between the two data sources. There are only two ways to move closer to a single, totally uniform sample for each age cohort, so that impacts on all outcomes would derive from exactly the same set of cases: (1) impute missing outcomes (and entirely missing data collection instruments) for cases with available data for some but not all outcome measures, or (2) discard available data by excluding observations with less than complete spring 2003 data from all analyses. We do neither. The latter would waste information and cut sample sizes unnecessarily (if only modestly), while the former would require assumptions too closely intertwined with the program impacts the study intends to measure using observed data rather than imputed values.
| Child Assessments | Parent Interview: Respondents | Parent Interview: Nonrespondents | Total |
|---|---|---|---|
| Respondents | 3,808 | 90 | 3,898 |
| Nonrespondents | 79 | 690 | 769 |
| Total | 3,887 | 780 | 4,667 |
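The marginal totals and overlap rates implied by the counts in the table above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the variable names are ours, and the four cell counts are copied from the table.

```python
# Cell counts copied from the table above (rows: child assessment status,
# columns: parent interview status). Variable names are illustrative.
both = 3808            # assessed child and interviewed parent
assessed_only = 90     # assessed child, no parent interview
interview_only = 79    # parent interview, no child assessment
neither = 690          # neither instrument completed

assessed_total = both + assessed_only        # child assessment respondents
interviewed_total = both + interview_only    # parent interview respondents
sample_total = both + assessed_only + interview_only + neither

# Share of each respondent sample that also appears in the other one.
share_assessed_with_interview = both / assessed_total
share_interviewed_with_assessment = both / interviewed_total

print(f"{share_assessed_with_interview:.1%} of assessed children have a parent interview")
print(f"{share_interviewed_with_assessment:.1%} of interviewed parents have an assessed child")
```

Both shares exceed 97 percent, which is the sense in which the two respondent samples overlap considerably.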
Regressions Without Covariates
For continuous outcome variables (e.g., PPVT III adapted scale score), impact estimates are based on ordinary least-squares (OLS) regression models applied to the weighted data.[2] These models replicate the difference-in-means calculation by expressing spring 2003 outcomes as the sum of an intercept term and a shift in the intercept produced by a dummy variable for inclusion in the Head Start group. Using Y to represent the outcome measure, this equation is
Y = A + BH ,
where H is a dummy variable for inclusion in the Head Start group. The estimated coefficients from this model, a and b (estimating A and B, respectively), have the following equivalence to calculated measures from the difference-in-means approach:
$$a = \bar{Y}_c$$
$$b = \bar{Y}_h - \bar{Y}_c ,$$
where $\bar{Y}_h$ is the weighted mean value of Y for the Head Start sample and $\bar{Y}_c$ is the weighted mean value of Y for the non-Head Start comparison sample. By either formulation, b gives an estimate of the impact of access to Head Start that is unbiased by selection into and out of the program. No systematic differences can exist between the two samples (assuming complete follow-up data on Y), because both were chosen as random subsamples of all children randomly assigned and hence of the universe of interest (the national population of newly entering Head Start children in communities with more potential Head Start participants than funded Federal Head Start slots).
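The equivalence between the regression coefficients and the weighted group means can be verified numerically. The sketch below is a minimal illustration, not the report's estimation code: the outcome values, group indicators, and weights are hypothetical, and the function names are ours. It solves the weighted normal equations for Y = A + BH directly.

```python
def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_ols_dummy(y, h, w):
    """Fit Y = A + B*H by weighted least squares and return (a, b).

    Solves the 2x2 weighted normal equations directly. Because H is 0/1,
    H*H equals H, which is why the solution collapses to group means.
    """
    sw = sum(w)
    swh = sum(wi * hi for wi, hi in zip(w, h))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swhy = sum(wi * hi * yi for wi, hi, yi in zip(w, h, y))
    det = sw * swh - swh * swh          # nonzero when both groups are present
    a = (swh * swy - swh * swhy) / det
    b = (sw * swhy - swh * swy) / det
    return a, b


# Hypothetical data: three Head Start children (h=1), three comparison (h=0).
y = [98.0, 105.0, 110.0, 92.0, 101.0, 95.0]
h = [1, 1, 1, 0, 0, 0]
w = [1.5, 0.8, 1.2, 1.0, 2.0, 0.7]

a, b = weighted_ols_dummy(y, h, w)
ybar_h = weighted_mean([yi for yi, hi in zip(y, h) if hi == 1],
                       [wi for wi, hi in zip(w, h) if hi == 1])
ybar_c = weighted_mean([yi for yi, hi in zip(y, h) if hi == 0],
                       [wi for wi, hi in zip(w, h) if hi == 0])

print("a matches control mean:", abs(a - ybar_c) < 1e-9)
print("b matches difference in means:", abs(b - (ybar_h - ybar_c)) < 1e-9)
```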
When divided by its standard error, b follows Student’s t distribution with 51 degrees of freedom under the null hypothesis that the true impact, B, is 0, where 51 is the number of degrees of freedom associated with the jackknife estimate of the variance of b. (This makes the usual OLS assumption that the dependent variable, Y, and hence all estimated coefficients, have normal distributions.) An unbiased standard error for b that reflects how the sample was drawn and weighted is obtained using the replicates and weights described in Appendix 1.2. The last step conducts a two-tailed test of the null hypothesis of no Head Start impact (i.e., B=0), allowing for the possibility of program effects in either direction. Three different levels of statistical significance (i.e., three different probabilities of rejecting a true null hypothesis) are used and reported in the tables of results in the body of the report: 0.05, 0.01, and 0.001.[3]
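The testing step can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually run for the report: the impact estimate and standard error are hypothetical, the function names are ours, and the p-value is obtained by simple numerical integration of the t density so that the example needs only the standard library. The 51 degrees of freedom and the three significance levels come from the text.

```python
import math

JACKKNIFE_DF = 51  # degrees of freedom stated in the text


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat, df=JACKKNIFE_DF, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t| and |t|.

    The area from 0 to |t| is computed with Simpson's rule (steps must be
    even) and doubled by symmetry. Accurate enough for illustration.
    """
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    width = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * width, df)
    half_area = total * width / 3
    return max(0.0, 1.0 - 2.0 * half_area)


def significance_level(p_value, levels=(0.001, 0.01, 0.05)):
    """Return the strictest reported level the p-value falls below, else None."""
    for level in levels:
        if p_value < level:
            return level
    return None


# Hypothetical impact estimate and jackknife standard error.
b, se_b = 3.0, 1.2
t_stat = b / se_b
p = two_tailed_p(t_stat)
print(f"t = {t_stat:.2f}, p = {p:.4f}, significant at: {significance_level(p)}")
```

In practice a statistical library would supply the t distribution directly; the hand-rolled integration here only keeps the sketch self-contained.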
Logistic regressions are used in place of OLS regressions for discrete (0/1) outcome variables such as the use of dental care. Here, the specification is non-linear to accommodate the non-normal distribution of Y, which must always take on a value of 0 or 1. However, the model can essentially parallel that of continuous variables with a non-linear transformation added. Specifically, the model expresses the natural log of the odds (the probability that Y=1 divided by the probability that Y=0) as the sum of an intercept term and a shift in the intercept produced by a dummy variable for inclusion in the Head Start group:
ln [P/(1-P)] = C + DH,
where P is the probability that Y = 1 and, hence, 1-P is the probability Y = 0. The coefficients in this model, C and D, are estimated using a maximum-likelihood statistical routine in the SUDAAN software package that again takes appropriate account of the complex sampling and weighting structure of the data. Standard errors for these estimates are derived using jackknife replication.
Though it occupies a position similar to B’s in this model, the coefficient D does not replicate the difference-in-means calculation on Y. It does, however, capture the difference in the typical outcome between the Head Start sample and the non-Head Start sample in an appropriate fashion, calibrated in the non-natural units of log-odds. Once C and D are estimated as c and d, respectively, the meaning of d can be recovered in more intuitive units that show it to be the impact of access to Head Start on the probability of a positive outcome, such as a dental visit occurring. Favorable results in this respect translate into more frequent occurrence of the desired outcome in the real world—a positive Head Start impact. To make the translation out of log-odds space, we calculate the difference between two quantities:
- The log-odds for children in the Head Start group, c + d (that is, c + dH with H=1), converted into the probability of a positive outcome given access to Head Start by the inverse transformation,
  P(H=1) = exp (c+d) / [1 + exp (c+d)] , and
- the log-odds for children in the non-Head Start group, c, converted into the probability of a positive outcome absent access to Head Start by the same transformation,
  P(H=0) = exp (c) / [1 + exp (c)] .
Hence, the impact estimate from a logistic model is
P(H=1) – P(H=0) = exp (c+d) / [1 + exp (c+d)] - exp (c) / [1 + exp (c)] .
Logically, this quantity differs from 0 if and only if d differs from 0. Thus, if assignment to the Head Start group has a statistically significant impact on the log-odds through D (as estimated by d) when tested using the maximum-likelihood assumptions of the logistic model, we can conclude that it also significantly influences the probability of a favorable outcome, P(H=1) – P(H=0). Significance test results reported in the tables in the body of the report are determined in this manner.
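The conversion from log-odds back to probability units can be sketched in a few lines. The proportions below are hypothetical and the function names are ours. The example relies on the fact that, in a logit model containing only an intercept and H, the fitted probabilities reproduce the two group proportions, so c and d can be written down directly from those proportions.

```python
import math


def inverse_logit(x):
    """Convert log-odds x into a probability; equals exp(x) / [1 + exp(x)]."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Convert a probability p into log-odds."""
    return math.log(p / (1.0 - p))


def impact_on_probability(c, d):
    """P(H=1) - P(H=0) implied by estimated logit coefficients c and d."""
    return inverse_logit(c + d) - inverse_logit(c)


# Hypothetical weighted proportions with a dental visit in each group.
p_control, p_head_start = 0.60, 0.70

# Coefficients a no-covariate logit would return for these proportions.
c = logit(p_control)
d = logit(p_head_start) - logit(p_control)

impact = impact_on_probability(c, d)
print(f"d = {d:.4f} (log-odds units), impact = {impact:.4f} (probability units)")
```

Setting d to 0 makes the two probabilities identical, which is the sense in which the probability difference is nonzero only when d is nonzero.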
Adding Controls for Fall 2002 Factors
The linear and logistic impact models described above can be extended to include fall 2002 characteristics of children and families as predictors of spring 2003 outcomes. The addition of these covariates is represented through the addition of a set of background variables, represented here collectively by the symbol X, to the models already presented:
Y = A + BH + EX
for continuous outcome variables, and
ln [P/(1-P)] = C + DH + FX
for dichotomous (0/1) outcome variables.
These additions do not change sample sizes in any way, since any background X variable used as a covariate is imputed where it was not observed, for all cases in the analysis sample (i.e., for all children with completed spring 2003 assessments when analyzing cognitive outcomes other than the PELS and for all children with completed parent interviews when analyzing other outcomes; see Appendix 4.1 for details). Methods of estimation and significance testing involving the key parameters, B and D, which still capture the impact of access to Head Start, are also unchanged from those described in the no-covariates case.[4] Selection of the particular X variables to be included in each impact regression is discussed elsewhere in the report (including the Chapter 4 text, Appendix 4.5, and Chapters 5 through 8 of impact findings by domain).
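With covariates in the logit model, the conversion of D into probability units requires a choice of X values; footnote 4 states that the weighted sample means are used. A minimal sketch of that calculation follows. All coefficient values, covariate rows, and weights are hypothetical, and the function names are ours.

```python
import math


def weighted_means(rows, weights):
    """Weighted mean of each covariate column across all observations."""
    total = sum(weights)
    n_cols = len(rows[0])
    return [sum(w * row[j] for row, w in zip(rows, weights)) / total
            for j in range(n_cols)]


def adjusted_impact(c, d, f, x_bar):
    """Impact on probability with covariates held at x_bar.

    Evaluates the fitted logit at H=1 and H=0 with the same X values
    and takes the difference of the two probabilities.
    """
    base = c + sum(fj * xj for fj, xj in zip(f, x_bar))
    p1 = 1.0 / (1.0 + math.exp(-(base + d)))
    p0 = 1.0 / (1.0 + math.exp(-base))
    return p1 - p0


# Hypothetical covariates (e.g., a pretest score and a 0/1 indicator).
x_rows = [[1.0, 0.0], [3.0, 1.0], [2.0, 1.0]]
weights = [1.0, 3.0, 2.0]
x_bar = weighted_means(x_rows, weights)

# Hypothetical estimates of C, D, and the F coefficients.
c_hat, d_hat, f_hat = -0.4, 0.35, [0.2, -0.1]
print(f"impact at mean X: {adjusted_impact(c_hat, d_hat, f_hat, x_bar):.4f}")
```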
Separate “English PPVT III adapted” scores are included in the impact regressions for children assessed initially in English and children assessed initially in Spanish to allow each covariate to play a distinct explanatory role in predicting spring outcomes. Vocabulary measures as indicators of child development depend heavily on the skills of the assessed child in the particular language in which the test is administered, in this case English. As a result, PPVT III adapted scores likely measure different aspects of language and literacy skills for predominantly English-speaking children and predominantly Spanish-speaking children in the fall of 2002, given their substantially different capabilities with the English language. Separate variables are needed to allow the regressions to take advantage of these distinct meanings when predicting outcomes.
In order to do this in analyses that include children with both Spanish- and English-language backgrounds in the same regression, a numeric value for the “English PPVT III adapted” variable is artificially assigned to children originally assessed in Spanish and a numeric value for the “Spanish PPVT III adapted” variable is artificially assigned to children originally assessed in English. The value 0 is used in each case, though any single common number would work. However, if included among the X variables with no further adjustments to the model, these “artificial zeroes” would distort estimates of the coefficients in the model since they do not have the same meaning as true 0s nor can they extend the linear scale followed by other real values to the 0 point on the axis without distorting how the model uses real values to predict outcomes.
We neutralized this potentially distorting effect by including in the set of X variables in the model a pair of 0/1 dummy variables that flag observations where artificial 0s have been inserted, one dummy variable for artificial 0s in the “English PPVT III adapted” variable and one dummy variable for artificial 0s in the “Spanish PPVT III adapted” variable. To see how this averts distortions of the other regression coefficients, consider how the linear models used (straight linear for continuous outcomes, linear in the log-odds ratio for categorical outcomes) would seek to accommodate the 0 values if no dummy variables are added. The first problem is that true 0s do not exist on the PPVT III adapted scale. This could be countered by inserting an artificial value that does fall within the defined range. However, that does not fully address the problem, nor is it actually necessary as opposed to using the more transparent device of inserting artificial 0s. Whatever the value chosen to “fill the gap” in these special X variables, its relationship to the outcome variable Y will have to be reflected by the same estimated e or f coefficient on PPVT III adapted scores as the influence of other real values on these variables. This clearly would create inaccuracies in how the model accounts for pre-test scores and, if those scores are correlated with other variables in the model, open the door to distortions in how the model represents those factors as well. Moreover, it will seriously diminish the amount of predictive information the model can extract from the pre-test measures, the very purpose for including them in the regressions in the first place.
A dummy variable for observations with artificial 0s in the English PPVT field and another dummy variable for observations with artificial 0s in the Spanish PPVT field remove this threat by giving the model the perfect ability to predict the average outcome of those cases from the coefficients on the dummy variables alone. If the Y outcome variables of these cases are distinctive at all (as we would expect), their tendency to be above or below the point at which a properly fitted linear model (the model we do not want to distort) hits the axis (the 0 point on the PPVT score number line) will be fully reflected in the coefficient on the associated dummy variable. That coefficient will provide precisely the upward or downward shift needed to account for what is distinctive about the artificial 0 cases’ outcomes on average without disturbing any of the other coefficients in the model. Hence, the simultaneous addition of covariates with artificial 0s and, in each case, the corresponding “neutralizing” dummy variable will leave all the other estimates from the model unchanged, including, crucially, the estimate of Head Start’s impact on that outcome, the coefficient b or d. At the same time, this paired insertion of covariates helps account for more of the variation in outcomes within the group of children who have real values in the English PPVT and Spanish PPVT fields.
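The role of the “neutralizing” dummy variable can be seen in a small numerical sketch. The example is deliberately simplified and is not the report's model: it uses a single pretest variable, unweighted OLS, no Head Start dummy, and hypothetical data, and all function names are ours. It compares the pretest slope from three fits: artificial 0s with the flag, artificial 0s without the flag, and a benchmark using only cases with real scores.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * solution[k] for k in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def ols(design, y):
    """Ordinary least squares via the normal equations X'X b = X'y."""
    n_cols = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(n_cols)]
           for i in range(n_cols)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(n_cols)]
    return solve(xtx, xty)


# Hypothetical data: fall pretest score (None where the child has no real
# value for this language-specific variable) and a spring outcome y.
scores = [310.0, 290.0, 335.0, 300.0, None, None, None]
y = [52.0, 47.0, 58.0, 50.0, 40.0, 44.0, 39.0]

filled = [0.0 if s is None else s for s in scores]   # artificial 0s
flag = [1.0 if s is None else 0.0 for s in scores]   # neutralizing dummy

# (1) Artificial 0s with the flag: columns are intercept, score, flag.
full = ols([[1.0, x, f] for x, f in zip(filled, flag)], y)
# (2) Artificial 0s without the flag: columns are intercept, score.
no_flag = ols([[1.0, x] for x in filled], y)
# (3) Benchmark: only the cases with real scores.
real = [(s, yi) for s, yi in zip(scores, y) if s is not None]
real_only = ols([[1.0, s] for s, _ in real], [yi for _, yi in real])

print("slope, real-score cases only:", round(real_only[1], 4))
print("slope, artificial 0s + flag: ", round(full[1], 4))
print("slope, artificial 0s alone:  ", round(no_flag[1], 4))
```

In this toy setup the flagged fit reproduces the benchmark slope and intercept, while the flag coefficient absorbs the mean outcome of the artificial-0 cases; the fit without the flag does not reproduce the benchmark slope.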
An identical approach is used to take advantage of the distinctive predictive information in the fall 2002 Woodcock-Johnson III Letter-Word Identification scores of children in one instance assessed predominantly in English and, in the other instance, children assessed predominantly in Spanish. Like the PPVT III adapted, the Woodcock-Johnson III Letter-Word Identification test was administered to both sets of children in English, so that its meaning depends on the quite different English language skills of the two groups. Dual insertion of language-specific versions of this variable from the fall, together with “neutralizing” dummy variables flagging the cases where those variables contain meaningless artificial 0 values, again addresses this issue effectively.
Adding Interaction Terms
A final extension of the regression analysis interacts selected demographic characteristics and pretest measures with the Head Start group dummy variable, H, in order to explore if and how the intervention’s impact varies among different types of children. A single regression provides information on two questions of interest, both addressed in Chapter 9 of the text: how impacts vary with the moderating factor examined and, if that factor is a 0/1 indicator of membership in a particular group, how large an impact Head Start had on each of the subgroups defined by the moderator variable (or the set of moderator variables, if a given dimension defines more than two subgroups; e.g., race/ethnicity). Letting Z represent the moderating variable or set of variables of interest, where Z may or may not have been among the covariates previously included in the regressions through X, the impact regression equations become
Y = A + BH + EX + QZ + RHZ
for continuous outcome variables, and
ln [P/(1-P)] = C + DH + FX + SZ + THZ
for dichotomous (0/1) outcome variables.
A number of different coefficients in these models play important roles in addressing the questions of interest. Suppose Y is a continuous outcome variable, such as the number of time outs used by parents to discipline their children in the last week, and Z is a simple two-category dummy variable distinguishing boys (Z=0) from girls (Z=1). The logit model for dichotomous outcome variables is exactly the same for a given moderator variable—although the type of moderator will matter. In this example,
- B, the coefficient on the dummy variable for assignment to the Head Start sample, H, represents the impact of Head Start on the subgroup of children not flagged by the dummy variable Z (e.g., boys, for whom Z=0). For these children, the regression equation reduces to Y = A + BH + EX, paralleling the equation used previously to determine impacts on all children.
- Q, the coefficient on the moderator variable Z, shows how much higher or lower average outcomes are for girls (Z=1) than for boys (Z=0), when children do not have access to Head Start, that is, for children in the non-Head Start sample for whom H=0. For these children, the regression model simplifies to Y = A + EX + QZ, highlighting the role of Z in influencing spring 2003 outcomes but not telling us anything about the impact of Head Start.
- R, the coefficient on the interaction of the Head Start sample dummy variable and the moderator variable, HZ, indicates the difference between the average impact of the intervention on girls (Z=1) and the average impact on boys (Z=0), the latter previously identified as B. From this we infer that
- B+R is the average impact of Head Start on girls, the Z=1 group. For these children, the model simplifies to Y = A + BH + EX + Q + RH or, rearranging terms, Y = [A + Q] + [B+R]H + EX, again paralleling in terms of the intercept and H and X explanatory variables the equation previously used to determine impacts on all children.
This way of looking at the results (once A, B, E, Q, and R are estimated as a, b, e, q, and r) highlights impacts on subgroups: b for the Z=0 group, b+r for the Z=1 group. Statistical significance tests on this coefficient and sum (i.e., linear combination) of coefficients tell us whether either impact differs significantly from 0, using test procedures available for this purpose in OLS (for continuous outcomes) and logit (for dichotomous outcomes).[5]
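Testing the Z=1 subgroup impact requires the standard error of the linear combination b + r, which depends on the covariance between the two estimates as well as their variances. The sketch below illustrates the arithmetic only. The coefficient values, variances, and covariance are hypothetical inputs (in the report they would come from the jackknife replicates), and the function name is ours.

```python
import math


def subgroup_impacts(b, r, var_b, var_r, cov_br):
    """Impacts for the Z=0 and Z=1 subgroups with standard errors.

    The Z=1 impact is the linear combination b + r, whose variance is
    var(b) + var(r) + 2 * cov(b, r). Each entry is (impact, se, t ratio).
    """
    impact_z0 = b
    impact_z1 = b + r
    se_z0 = math.sqrt(var_b)
    se_z1 = math.sqrt(var_b + var_r + 2.0 * cov_br)
    return {
        "Z=0": (impact_z0, se_z0, impact_z0 / se_z0),
        "Z=1": (impact_z1, se_z1, impact_z1 / se_z1),
    }


# Hypothetical estimates for a continuous outcome.
result = subgroup_impacts(b=0.30, r=-0.10, var_b=0.04, var_r=0.09, cov_br=-0.04)
for group, (impact, se, t_ratio) in result.items():
    print(f"{group}: impact = {impact:.3f}, se = {se:.3f}, t = {t_ratio:.2f}")
```

Ignoring the covariance term would misstate the standard error of b + r whenever the two estimates are correlated, which they typically are because both involve H.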
Another perspective on the same set of regression coefficients and related tests highlights Z’s role as a moderator of the size of impact without reference to average impact on any particular subgroup. The third bullet above can be restated as
- R, the coefficient on the interaction of the Head Start sample dummy variable and the moderator variable, HZ, indicates the degree to which Z (in this case gender) alters the size of Head Start’s impact.
This perspective shines through when the complete regression equation is reordered and certain terms taken apart and regrouped:
Y = A + EX + QZ + [B + RZ] H .
This formulation emphasizes how the size of Head Start’s influence— [B + RZ] — may vary with the background variable Z; i.e., how Z may moderate the impact of the intervention to create differences in the degree to which different types of children benefit from the program. The estimate of R and its test of significance measure the existence and strength of this moderating influence.
Continuous moderator variables, such as maternal depression scores or the pretest values of a cognitive assessment, fit easily into this last formulation. If Z can take on a range of values over an expansive ordinal and cardinal scale, no one value of Z carves out a subgroup of Head Start participants for special focus and an exclusive impact estimate. Rather, the main question is whether this factor exerts an influence on the size of impact across its entire range. The [B + RZ] H term in the preceding equation conveys a linearized version of this influence: an impact may occur when children gain access to Head Start (H=1, rather than H=0) and, if so, that impact may vary in size with the moderating factor Z along the slope and intercept defined by B + RZ.[6] Thus, the test of the statistical significance of r, our estimate of R (or of t, the estimate of T in the logit version of the model for a dichotomous outcome variable), and consideration of its magnitude tell us about the influence of Z as a moderator, whether it be a categorical dummy variable distinguishing one subgroup of children from another or a continuous measure describing the characteristics of a whole range of individual children.
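For a continuous moderator, the impact is a straight line in Z for continuous outcomes but a curve in Z once a logit model is converted to probability units, which is why footnote 6 describes a “local” slope at the mean of Z. The sketch below illustrates both cases. All coefficient values are hypothetical, the function names are ours, and the local slope is approximated by a central finite difference, which is our choice for illustration rather than a method stated in the report.

```python
import math


def linear_impact(b, r, z):
    """Impact on a continuous outcome at moderator value z: b + r*z."""
    return b + r * z


def logit_impact(c, d, s, t, fx_bar, z):
    """Impact on probability at moderator value z.

    fx_bar is the covariate contribution F*X evaluated at the mean of X.
    """
    base = c + fx_bar + s * z
    p1 = 1.0 / (1.0 + math.exp(-(base + d + t * z)))
    p0 = 1.0 / (1.0 + math.exp(-base))
    return p1 - p0


def local_slope(impact_fn, z_bar, step=1e-4):
    """Central-difference slope of an impact function at z_bar."""
    return (impact_fn(z_bar + step) - impact_fn(z_bar - step)) / (2.0 * step)


# Hypothetical estimates and a hypothetical mean moderator value.
z_bar = 10.0
ols_slope = local_slope(lambda z: linear_impact(0.5, -0.02, z), z_bar)
logit_slope = local_slope(
    lambda z: logit_impact(c=-0.2, d=0.4, s=0.03, t=-0.02, fx_bar=0.1, z=z),
    z_bar,
)
print(f"OLS: slope of impact in Z = {ols_slope:.4f} (equals r everywhere)")
print(f"logit: local slope of impact at mean Z = {logit_slope:.5f}")
```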
A final variation of the moderator/subgroup analysis occurs by replacing Z with a collection of two or more categorical dummy variables. In the current report, this arises only when looking at the moderating influence of race/ethnicity and at subgroup impacts for children in different racial/ethnic groups. Here, we use two Z variables, call them Zh and Zb, which flag Hispanic and non-Hispanic Black children respectively. The OLS version of the regression equation in this instance expands slightly into
Y = A + BH + EX + Q[Zh] + RH[Zh] + U[Zb] + VH[Zb]
Here we have the standard subgroup model with the main effect of the moderator and its interaction with the intervention echoed twice at the end of the equation, once for Hispanic children (Q[Zh] + RH[Zh]) and once for non-Hispanic Black children (U[Zb] + VH[Zb]). As before, the coefficient on H alone (i.e., B) gives the impact of Head Start on the Z=0 subgroup, only here that subgroup is defined by all Z moderator variables in the model equaling 0: Zh = Zb = 0. In other words, b, our estimate of B, measures the impact of the program on non-Hispanic, non-Black children.
This point echoes that made first in the original set of four bullet points above—how the model captures the impact of Head Start on the “omitted” group. The interpretations of estimated coefficients in the other three original bullets also apply here, repeated twice, once for Hispanic children and once for Black children. Most important in analyzing subgroup impacts are the two versions of the final bullet in that collection:
- B+R is the average impact of Head Start on Hispanic children (the Zh=1, Zb=0 subgroup). For these children, the model simplifies to Y = A + BH + EX + Q + RH or, rearranging terms, Y = [A + Q] + [B+R]H + EX, paralleling the equation originally used to determine impacts on all children.
- B+V is the average impact of Head Start on Black children (the Zh=0, Zb=1 subgroup). For these children, the model simplifies to Y = A + BH + EX + U + VH or, rearranging terms, Y = [A + U] + [B+V]H + EX, again paralleling the equation originally used to determine impacts on all children.
These, together with the interpretation of B just given, are the basis for the reported magnitudes and tests of the statistical significance of Head Start’s impact on the three racial/ethnic subgroups.
[1] The exception is the PELS, which comes from parent interviews and fits under discussion of that instrument and its sample definitions.
[2] See Appendix 1.2 for a description of how analysis weights and replicate weights were constructed based on initial probabilities of selection at different levels of sampling, adjustments for follow-up data nonresponse, and raking to external control totals.
[3] Operationally, the set of tests is accomplished by determining whether the calculated probability of obtaining an impact estimate at least as extreme as the observed b when B=0 (known as the “p value” of the estimate) falls below the 0.05, 0.01, and/or 0.001 significance levels of the respective tests.
[4] With covariates added, the conversion of D from log-odds space into an estimate of impact on the probability of a positive outcome is computed using the (weighted) mean values of the new X variables across all observations in the analysis sample for a given age cohort.
[5] Actual computation of impact estimates for different subgroups requires conversion from log-odds space to probability units for the dichotomous outcome variables. For this purpose, the covariates in the model (other than Z) are set to their (weighted) mean values for the entire analysis sample in a given age cohort. The moderator Z is first set to 0, and the probabilities with H equal to 1 (for the Head Start group) and H equal to 0 are differenced to get the impact estimate for the Z=0 subgroup. Then Z is set to 1, and the probabilities with H equal to 1 and H equal to 0 are differenced to get the impact estimate for the Z=1 subgroup.
[6] Converted from log-odds space to probability units, the size of impact on a dichotomous outcome variable will vary with Z in a nonlinear fashion that depends on the values of the other covariates in the model (the Xs) and the precise level of Z itself (through the SZ term). As always, the Xs are set to their (weighted) mean values for the entire analysis sample in a given age cohort. For parallelism, the measure of variation in impact with changes in Z (e.g., maternal depression) is then calculated at the sample-wide (weighted) mean value of Z, recognizing that this is a “local” slope coefficient showing how impact on probability varies in size with the moderating factor in the near vicinity of Z’s mean value and may not apply elsewhere in the range of Z values.