Administration for Children and Families  
Office of Planning, Research & Evaluation (OPRE)

II. EVALUATION DESIGN, DATA, AND ANALYTIC APPROACHES

ACYF designed a thorough and rigorous evaluation to examine the impacts of Early Head Start on key child and family outcomes. This chapter summarizes the study design, the data sources and outcome variables used in this report, and our approach to conducting the impact analysis.

A. STUDY DESIGN

The evaluation was conducted in 17 sites where Early Head Start research programs were located. Once selected for participation in the study, programs began enrolling families and worked with MPR staff to coordinate with the requirements of random assignment.

1. Site Selection

When the 68 Early Head Start programs in the first wave were funded in late 1995, they agreed, as a condition of funding, to participate in local and national research if selected to do so. In March 1996, 41 university research teams submitted proposals to the Head Start Bureau—in partnership with Wave I Early Head Start program grantees—to conduct local research and participate in the national evaluation. ACYF purposively selected 15 research sites, using a number of criteria: (1) programs had to be able to recruit twice as many families as they could serve; (2) programs had to have a viable research partner; and (3) in aggregate, programs had to provide a national geographic distribution that represented the major programmatic approaches and settings and reflected diverse family characteristics thought to be typical of Early Head Start families nationally. Applying these criteria resulted in fewer center-based programs than desired, so in 1996 ACYF selected one additional center-based program from Wave I, and in late 1997 selected another center-based program (without a local research partner) from Wave II programs (75 of which were funded in mid-1996), resulting in the full sample of 17 programs.

Because the 17 research programs were not randomly selected, the impact results cannot be formally generalized to all Early Head Start programs funded during 1995 and 1996. Instead, the results can be generalized only to the 17 programs themselves (that is, the impact results are internally valid). However, as shown in Chapter I (Table I.2), the features of the 17 programs, as well as the characteristics of their enrolled families and children, are similar to those of all Early Head Start programs in 1995 and 1996. Thus, to the extent that the quality and quantity of services offered in the 17 programs are similar to those offered nationwide, our findings about effective program practices and their impacts on children and families are likely to pertain to Early Head Start programs more broadly.

2. Sample Enrollment

Although Wave I grantees entered Head Start with varying degrees and types of experiences (see Chapter I), all had been asked not to enroll any families until it was decided whether they would be selected for the research sample. Because all programs had agreed, in submitting their original proposals, to participate in the random assignment process if they were selected for the research sample, it was not necessary to persuade any of the programs to cooperate. Thus, as soon as the programs were selected, beginning in spring 1996, MPR staff began working with their staffs to implement the random assignment process in conjunction with each program’s regular enrollment procedures. Except for recruiting about twice as many families as they could serve, programs were expected to recruit as they would in the absence of the research, with special instructions to be sure to include all the types of families that their program was designed to serve (including those whose babies had disabilities). MPR and ACYF created detailed procedures (outlined in a “frequently asked questions” document—see Appendix E.II.A) to guide the sample enrollment process.

3. Random Assignment

As soon as programs determined through their application process that families met the Early Head Start eligibility guidelines, they sent the names to MPR, and we entered the names and identifying information into a computer program that randomly assigned the families either to the program or to the control group (with equal probabilities). Program staff then contacted the program group families, while representatives of the local research partners notified the control group families of their status.
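As a rough illustration of assignment with equal probabilities, the sketch below gives each eligible family an independent one-half chance of landing in either group. The function name `assign_families`, the family identifiers, and the seed are hypothetical; the computer program MPR actually used is not described here and may have worked differently (for example, by balancing assignments within sites).

```python
import random


def assign_families(family_ids, seed=12345):
    """Assign each family to 'program' or 'control' with probability 1/2 each.

    Illustrative only: an independent coin flip per family, driven by a
    seeded pseudo-random number generator so the result is reproducible.
    """
    rng = random.Random(seed)
    return {
        family_id: ("program" if rng.random() < 0.5 else "control")
        for family_id in family_ids
    }


# Hypothetical identifiers for 3,001 applicant families.
assignments = assign_families([f"family-{i}" for i in range(1, 3002)])
n_program = sum(1 for group in assignments.values() if group == "program")
n_control = len(assignments) - n_program
print(n_program, n_control)
```

With independent coin flips the two groups come out close to, but not exactly, equal in size, which is consistent with the slightly uneven group sizes a simple scheme of this kind would produce.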

Control group families could not receive Early Head Start services until their applicant child reached the age of 3 (and was no longer eligible for Early Head Start), although they could receive other services in the community. This ensures that our analytic comparisons of program and control group outcomes represent the effects of Early Head Start services relative to the receipt of all other community services that would be available to families in the absence of Early Head Start.

Some program staff were concerned that random assignment might, by chance, result in denial of services to families with particularly high service needs. ACYF was very clear, however, that the study findings should pertain to all families and children that Early Head Start was designed to serve, including infants and toddlers with disabilities. To address program concerns, however, ACYF and MPR established a process by which programs could apply to have a family declared exempt from participating in the research. ACYF received only one request for an exemption, and it was not considered to be warranted.

Sample enrollment and random assignment began in July 1996 and were completed in September 1998. In most sites, sample intake occurred over a two-year period, although some took less time. The extended enrollment period was due in part to the extra work involved in recruiting twice as many families as programs were funded to serve, and in part to the process of new programs working out their recruitment procedures. Two programs completed sample enrollment in late 1997, and one (the 17th site) did not begin sample intake until fall 1997. Thus, the study population for the evaluation includes Early Head Start-eligible families who applied to the program between mid-1996 and late 1998.

During the sample intake period, 3,001 families were randomly assigned to the program (1,513) and control (1,488) groups (Table II.1). The samples in most sites include between 150 and 200 families, divided fairly evenly between the two research groups.

Early Head Start staff implemented random assignment procedures well. We estimate that about 0.7 percent of all control group members received any Early Head Start services (that is, were “crossovers”), and most sites had no crossovers.1 Furthermore, our discussions with site staff indicate that information on nearly all eligible families who applied to the program during the sample intake period was sent to MPR for random assignment. Program staff did not provide Early Head Start services to families who were not submitted for random assignment. Hence, we believe that the research sample is representative of the intended study population of eligible families, and that any bias in the impact estimates due to contamination of the control group is small.

Random assignment yielded equivalent groups: the average baseline characteristics of program and control group members are very similar (Appendix D). This is as expected, because MPR used computer-generated random numbers to assign families. Therefore, the only difference between the two research groups at random assignment was that the program group was offered Early Head Start services and the control group was not. Thus, differences in the subsequent outcomes of the two groups can be attributed to the offer of Early Head Start services with a known degree of statistical precision.
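The reasoning above can be sketched as a simple difference in group means together with its standard error. This is a minimal, unadjusted illustration under stated assumptions: the function name `impact_estimate` and the outcome values are made up, and the sketch omits the regression adjustment that the analysis applies (see Section C).

```python
from math import sqrt
from statistics import mean, variance


def impact_estimate(program_outcomes, control_outcomes):
    """Return (difference in means, standard error of the difference).

    Uses the usual two-sample formula with separate sample variances.
    """
    difference = mean(program_outcomes) - mean(control_outcomes)
    standard_error = sqrt(
        variance(program_outcomes) / len(program_outcomes)
        + variance(control_outcomes) / len(control_outcomes)
    )
    return difference, standard_error


# Made-up outcome scores, for illustration only.
difference, standard_error = impact_estimate([2, 4, 6], [1, 3, 5])
print(difference, standard_error)
```

Because assignment was random, the standard error is what gives the estimated difference its "known degree of statistical precision."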


TABLE II.1

EVALUATION SAMPLE SIZES, BY SITE AND RESEARCH STATUS

Site          Program Group  Control Group  Combined Sample
1                        74             77              151
2                        93             86              179
3                        84             78              162
4                        75             72              147
5                        74             76              150
6                       115            110              225
7                       104            108              212
8                        98             98              196
9                        98             95              193
10                       71             70              141
11                      104             96              200
12                       73             79              152
13                      104             98              202
14                       75             71              146
15                       90             92              182
16                       95             95              190
17                       86             87              173
All Sites             1,513          1,488            3,001

NOTE: Sites are in random order.
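As a quick arithmetic check on Table II.1, the sketch below confirms that each row and the column totals add up, and counts how many sites fall in the 150-to-200 range mentioned in the text. The values are transcribed from the table; the names `TABLE_II_1` and `check_sample_sizes` are illustrative.

```python
# (program group, control group, combined sample) for sites 1 through 17,
# transcribed from Table II.1.
TABLE_II_1 = [
    (74, 77, 151), (93, 86, 179), (84, 78, 162), (75, 72, 147),
    (74, 76, 150), (115, 110, 225), (104, 108, 212), (98, 98, 196),
    (98, 95, 193), (71, 70, 141), (104, 96, 200), (73, 79, 152),
    (104, 98, 202), (75, 71, 146), (90, 92, 182), (95, 95, 190),
    (86, 87, 173),
]


def check_sample_sizes(rows):
    """Return the column totals and the number of sites with 150 to 200 families."""
    for program, control, combined in rows:
        # Each site's combined sample must equal program plus control.
        assert program + control == combined
    totals = tuple(sum(column) for column in zip(*rows))
    mid_sized = sum(1 for _, _, combined in rows if 150 <= combined <= 200)
    return totals, mid_sized


totals, mid_sized = check_sample_sizes(TABLE_II_1)
print(totals, mid_sized)  # (1513, 1488, 3001) 11
```

Eleven of the 17 sites have between 150 and 200 families; three are slightly smaller and three slightly larger.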

B. DATA SOURCES AND OUTCOME MEASURES

Comprehensive data from multiple sources were used to examine the effects of Early Head Start participation on a wide range of child, parenting, and family outcomes. This section provides an overview of data sources and outcome measures used for the analysis, the response rates to the interviews and assessments, and the timing of interviews. These topics are discussed in greater detail in the Appendixes.

1. Data Sources

The follow-up data used for the analysis were collected at time points based on (1) the number of months since random assignment, and (2) the age of the focus child. Each family’s use of services and progress toward self-sufficiency were seen as likely to be a function of the amount of time since the family applied for Early Head Start services. Therefore, these data were collected at selected intervals following random assignment. Other data—particularly those related to child and family development—were more likely to be a function of the increasing age of the focus child over time. Thus, the data collection schedule for these developmental outcomes was tied to children’s birth dates. The data sources used in this report include:

  1. Parent Services Follow-Up Interview (PSI) Data Targeted for Collection 6, 15, and 26 Months After Random Assignment. These data contain information on (1) the use of services both in and out of Early Head Start (such as the receipt of home visits, and of services related to case management, parenting, health, employment, and child care); (2) progress toward economic self-sufficiency (such as employment, welfare receipt, and participation in education and training programs); (3) family health; and (4) children’s health. Most PSIs were conducted by telephone with the focus child’s primary caregiver, although some interviews were conducted in person for those not reachable by phone.

  2. Exit Interview When Children Reached 36 Months of Age. These interviews were conducted only with program group families when their children were 36 months old and had to transition out of Early Head Start. The exit interviews obtained information on the use of services in Early Head Start. Whenever possible, the interviews were conducted in conjunction with the 36-month parent interviews (see below), but in some cases were conducted in conjunction with the 26-month parent services interviews.

  3. Parent Interview (PI) Data Targeted for Collection When Children Were 14, 24, and 36 Months Old. These interviews obtained a large amount of information from the primary caregivers about their child’s development and family functioning. These data usually were collected in person, but some PIs or portions of them were conducted by telephone when necessary.

  4. Child and Family Assessments Targeted for Collection When Children Were 14, 24, and 36 Months Old. Field interviewers provided data on their observations of children’s behavior and home environments. Interviewers conducted direct child assessments (such as Bayley assessments) and videotaped structured parent-child interactions. Several measures constructed using these data overlap with those constructed from the PI data, which allowed us to compare impact findings using the two data sources.

  5. Father Interviews Targeted for Collection When Children Were 24 and 36 Months Old. In addition to asking mothers about the children’s father, we interviewed the men directly about fathering issues at the time of the 24- and 36-month birthday-related interviews.2 The father study was conducted in 12 sites only. Father observational data were collected in 7 sites.

  6. Baseline Data from the Head Start Family Information System (HSFIS) Program Application and Enrollment Forms. We used these forms, completed by families at the time of program application, to create subgroups defined by family characteristics at baseline, and to adjust for differences in the characteristics of program and control group members when estimating program impacts. We also used the forms to compare the characteristics of interview respondents and nonrespondents, and to construct weights to adjust for potential nonresponse bias.

  7. Baseline Data from Selected Sites on Mother’s Risk of Depression. Local researchers in eight sites administered the Center for Epidemiologic Studies Depression Scale (CES-D) at baseline. These data were used in the subgroup analysis to assess whether impacts differed for mothers at risk of depression and for those who were not.

  8. Data from the Implementation Study. Finally, the analysis used data from the implementation study to define subgroups based on program characteristics (such as program approach and level of program implementation) and site characteristics (such as urban or rural status and welfare regulations).
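The two timing rules described above (months since random assignment for the PSIs, and the child's age for the PIs and assessments) can be sketched as follows. This is only an illustration of how target dates follow from the two anchor dates; the helper names and the example dates are hypothetical, and actual interviews were conducted around, not exactly on, these targets.

```python
import calendar
from datetime import date

PSI_MONTHS_AFTER_ASSIGNMENT = (6, 15, 26)  # PSI targets from the text
PI_CHILD_AGE_MONTHS = (14, 24, 36)         # birthday-related targets from the text


def add_months(start, months):
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month when needed.
    """
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(start.day, last_day))


def target_dates(assignment_date, birth_date):
    """Target dates for the PSIs and for the birthday-related instruments."""
    return {
        "PSI": [add_months(assignment_date, m) for m in PSI_MONTHS_AFTER_ASSIGNMENT],
        "PI": [add_months(birth_date, m) for m in PI_CHILD_AGE_MONTHS],
    }


# Hypothetical family: assigned 15 July 1996, focus child born 10 January 1996.
schedule = target_dates(date(1996, 7, 15), date(1996, 1, 10))
print(schedule)
```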

MPR prepared all the follow-up data collection instruments and trained all field staff. In all sites but one (where MPR collected the data), data collection field staff were hired by the local research teams, who were responsible, under subcontract to MPR, for collecting the data and monitoring data quality. Respondents were offered modest remuneration and a small gift to complete each set of interviews and assessments. Appendix B describes the data collection procedures in greater detail. Details about all the measures can be found in Chapter V and in Appendix C.3

It is important to recognize that linking PIs and child and family assessments to the age of the child, rather than to a fixed period after random assignment, means that at the time those instruments were administered, families were exposed to the program for different lengths of time. Nevertheless, questions about children’s development at particular ages are policy relevant. It is also of policy interest, however, to assess impacts for children and families with similar lengths of exposure to the program. Therefore, as described in Section C, we estimated impacts by doing subgroup analyses based on the child’s age at random assignment (so that program exposure times would be similar within each age group).

It is also important to recognize that at the 14-month birthday-related interviews, many families had been exposed to Early Head Start for only a short time, especially families whose focus children were older at random assignment. Thus, we did not expect impacts to appear at 14 months. In this report, we focus on the child, parenting, and family outcomes when children are 2 and 3 years old.
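To make the exposure point concrete, the sketch below computes the months between random assignment and each birthday-related wave for a child of a given age at assignment. It is a simplification: the function name and the example ages are illustrative, and time since random assignment is used as a stand-in for actual program exposure.

```python
ASSESSMENT_AGES_MONTHS = (14, 24, 36)  # birthday-related waves from the text


def exposure_by_wave(age_at_assignment_months):
    """Months elapsed since random assignment at each birthday-related wave."""
    return {
        age: max(0, age - age_at_assignment_months)
        for age in ASSESSMENT_AGES_MONTHS
    }


# Illustrative ages at random assignment: 2 months and 10 months.
younger_child = exposure_by_wave(2)
older_child = exposure_by_wave(10)
print(younger_child, older_child)
```

A child who was 10 months old at random assignment has only 4 months of potential exposure at the 14-month wave, compared with 12 months for a child assigned at 2 months of age.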

In sum, in this report we present impact findings using follow-up data from the 6-, 15-, and 26-month PSIs, from the exit interview, and from the 14-, 24-, and 36-month PIs and child and family assessments. Thus, our impact findings cover the first three years of the focus children’s lives. A longitudinal study is underway that will follow and interview program and control group families just before the focus children enter kindergarten to assess the longer-term effects of Early Head Start.

2. Response Rates

Table II.2 displays overall response rates for key data sources by research status,4 as well as response rates for various combinations of interviews. Interview respondents are sample members who provided data that could be used to construct key outcome variables. Nonrespondents include those who could not be located, as well as those who could be located but for whom complete or usable data were not obtained (Appendix B).

Response rates were higher for the PSIs and the PIs than for the Bayley and video assessments. Furthermore, as expected, response rates decreased somewhat over time. The rate was about 82 percent to the 6-month PSI, 75 percent to the 15-month PSI, and 70 percent to the 26-month PSI. It was 78 percent to the 14-month PI, 72 percent to the 24-month PI, and 70 percent to the 36-month PI. At 14 months, it was 63 percent to the Bayley assessment and 66 percent to the video assessment, while at 36 months, it was about 55 percent to each. About 57 percent of sample members completed all three PIs, 39 percent completed all three video assessments, and 35 percent completed all three Bayley assessments.5 The percentages who completed both the 24- and 36-month interviews were about 5 percentage points higher than those who completed all three interviews.6

Importantly, response rates were similar for program and control group members for all data sources. Although response rates were consistently 2 to 6 percentage points higher for the program group, this differential did not result in any attrition bias, as the following analyses demonstrate.

TABLE II.2

RESPONSE RATES TO KEY DATA SOURCES
(Percentages)

Data Source                           Program Group  Control Group  Combined Sample
Parent Services Interviews (PSIs):
     6-Month                                   83.9           79.3             81.6
     15-Month                                  76.1           74.4             75.2
     26-Month                                  71.1           67.9             69.5
     15- and 26-Month                          63.0           59.9             61.5
     All three                                 58.6           54.4             56.5
Parent Interviews (PIs):
     14-Month                                  79.1           77.1             78.1
     24-Month                                  73.9           70.4             72.2
     36-Month                                  73.2           67.4             70.3
     24- and 36-Month                          64.4           58.2             61.4
     All three                                 59.4           53.9             56.7
Bayley Assessments:
     14-Month                                  64.2           61.2             62.7
     24-Month                                  61.5           57.1             59.4
     36-Month                                  58.1           52.4             55.3
     24- and 36-Month                          46.5           40.6             43.6
     All three                                 37.0           32.6             34.8
Video Assessments:
     14-Month                                  66.5           65.2             65.8
     24-Month                                  62.2           57.5             59.9
     36-Month                                  57.8           52.7             55.3
     24- and 36-Month                          48.1           42.7             45.4
     All three                                 40.8           37.0             38.9
Combinations:
     PSI 15 and PI 24                          65.6           63.2             64.4
     PSI 26 and PI 36                          63.9           58.7             61.3
     PI 24 and Bayley 24                       60.5           56.5             58.6
     PI 24 and Video 24                        61.5           57.1             59.4
     Bayley 24 and Video 24                    55.9           51.9             53.9
     PI 24, Bayley 24, and Video 24            55.4           51.5             53.5
     PI 36 and Bayley 36                       57.4           52.0             54.7
     PI 36 and Video 36                        57.4           52.4             54.9
     Bayley 36 and Video 36                    53.2           47.9             50.6
     PI 36, Bayley 36, and Video 36            52.8           47.6             50.2
     PI 24 and Bayley 36                       52.2           46.0             49.2
     PI 24 and Video 36                        52.4           47.0             49.7
     Video 24 and PI 36                        55.8           48.8             52.3
     Video 24 and Bayley 36                    47.2           40.9             44.1
Sample Size                                   1,513          1,488            3,001

 

In general, the same families responded to the different interviews (Table II.2). For example, among those who completed a 36-month PI, about 87 percent completed a 24-month PI, and 81 percent completed both a 14- and 24-month PI. Similarly, among those who completed a 36-month video assessment, about 99 percent also completed a 36-month PI, and about 92 percent also completed a 36-month Bayley assessment.
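The percentages in this paragraph follow from dividing a joint response rate by the corresponding single-instrument rate in Table II.2. The sketch below reproduces that arithmetic from the combined-sample column; the dictionary keys and the function name `conditional_rate` are illustrative labels, not terms from the report.

```python
# Combined-sample response rates (percent), transcribed from Table II.2.
RATES = {
    "PI 36": 70.3,
    "PI 24 and 36": 61.4,
    "PI all three": 56.7,
    "Video 36": 55.3,
    "PI 36 and Video 36": 54.9,
    "Bayley 36 and Video 36": 50.6,
}


def conditional_rate(joint_key, given_key, rates=RATES):
    """Percent of `given_key` respondents who also completed the `joint_key` set."""
    return 100 * rates[joint_key] / rates[given_key]


pi24_given_pi36 = conditional_rate("PI 24 and 36", "PI 36")
all_pi_given_pi36 = conditional_rate("PI all three", "PI 36")
pi36_given_video36 = conditional_rate("PI 36 and Video 36", "Video 36")
bayley36_given_video36 = conditional_rate("Bayley 36 and Video 36", "Video 36")
print(round(pi24_given_pi36), round(all_pi_given_pi36),
      round(pi36_given_video36), round(bayley36_given_video36))  # 87 81 99 92
```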

Response rates differed across sites (Table II.3). The rate to the 26-month PSI ranged from 55 percent to 81 percent, although it was 70 percent or higher in 11 sites. Similarly, response rates to the 36-month PI ranged from 51 percent to 81 percent; 12 sites had a rate greater than 70 percent, but 3 sites had a rate less than 60 percent (for the control group). The response rate to the 36-month Bayley and video assessments varied more, ranging from about 27 percent to 76 percent, with less than half the sites having a response rate greater than 60 percent. Response rates for the program group were substantially larger than those for the control group in some sites, although the reverse was true in a few sites.7

TABLE II.3

RESPONSE RATES TO THE 26-MONTH PSI, 36-MONTH PI AND 36-MONTH ASSESSMENTS, BY SITE
(Percentages)

       26-Month PSI            36-Month PI             36-Month Bayley         36-Month Video
Site   Program Control   Total Program Control   Total Program Control   Total Program Control   Total
1           86      73      79      86      77      81      78      65      72      84      69      76
2           62      62      62      70      57      64      55      41      48      60      44      53
3           76      77      77      76      77      77      56      53      54      62      59      60
4           60      61      61      88      67      78      65      54      60      72      56      64
5           76      67      71      80      64      72      61      36      48      59      45      52
6           54      57      56      65      65      65      49      46      48      45      42      44
7           62      69      65      51      52      51      46      46      46      35      40      37
8           80      83      81      82      72      77      63      56      60      68      62      65
9           58      52      55      53      49      51      40      35      37      27      27      27
10          61      60      60      61      64      62      61      57      59      58      59      58
11          79      68      74      69      73      71      53      55      54      55      53      54
12          79      61      70      75      67      71      52      46      49      45      46      45
13          74      73      74      82      70      76      60      57      58      65      60      63
14          67      73      70      68      79      73      47      65      55      47      54      50
15          73      75      74      77      76      76      59      62      60      57      63      60
16          78      74      76      77      74      75      75      71      73      74      71      72
17          91      71      81      94      68      81      78      49      64      81      54      68
Total       71      68      70      73      67      70      58      52      55      58      53      55

NOTE: Sites are in random order.

Table II.4 displays response rates for key subgroups defined by site and family characteristics at random assignment. The family subgroups were constructed using HSFIS data collected at the time of program application, which are available for both interview respondents and nonrespondents. Asterisks in the table signify whether differences in the variable distributions for respondents and the full sample of respondents and nonrespondents are statistically significant at the 10 percent level. We conducted separate statistical tests for the program and control groups. Appendix D presents detailed results from the nonresponse analysis.

TABLE II.4

RESPONSE RATES TO THE 26-MONTH PSI, 36-MONTH PI AND 36-MONTH ASSESSMENTS, BY SUBGROUPS DEFINED BY SITE AND FAMILY CHARACTERISTICS
(Percentages)

                                 26-Month PSI    36-Month PI     36-Month Bayley 36-Month Video
Subgroup                         Program Control Program Control Program Control Program Control
Site Characteristics
Program Approach                                       *               *               *       *
     Center-based                     75      67      83      69      71      56      74      59
     Home-based                       69      67      71      66      56      52      56      51
     Mixed                            72      70      70      68      53      51      50      51
Overall Implementation Level           *       *       *       *       *               *       *
     Early Implementers               70      71      74      69      58      58      59      56
     Later Implementers               78      72      79      69      64      53      66      56
     Incomplete Implementers          65      60      65      63      52      46      48      44
Family Characteristics at Random Assignment
Mother's Age at Birth of Focus Child
     Less than 20                     71      67      71      66      57      55      56      54
     20 or older                      71      69      74      69      58      51      59      52
Mother's Education                     *       *       *       *                               *
     Less than grade 12               68      66      69      65      57      51      57      51
     Grade 12 or earned a GED         73      67      78      66      59      51      62      50
     Greater than grade 12            74      75      75      77      58      57      55      60
Race and Ethnicity                     *               *       *       *               *       *
     White non-Hispanic               71      70      78      73      59      57      60      59
     Black non-Hispanic               73      67      70      66      56      48      53      48
     Hispanic                         70      67      73      62      62      52      63      54
Welfare Receipt                        *               *               *               *       *
     Received welfare                 66      65      68      66      54      50      54      48
     Did not receive welfare          74      69      77      70      60      53      60      56
Primary Occupation                                     *               *               *
     Employed                         75      66      80      68      67      52      66      55
     In school or training            71      67      72      66      60      52      60      50
     Neither                          69      76      70      69      53      53      53      53
Primary Language                                               *
     English                          70      69      73      70      57      54      57      54
     Other                            72      67      72      62      61      50      60      51
Living Arrangements                            *                                       *       *
     With spouse                      73      72      76      72      56      54      59      57
     With other adults                72      70      73      67      61      53      61      54
     Alone                            69      63      71      65      56      51      53      49
Random Assignment Date                                 *               *               *
     Before 10/96                     70      66      72      66      56      51      56      51
     10/96 to 6/97                    71      71      69      68      54      54      53      54
     After 6/97                       72      67      78      68      64      51      64      53
Total                                 71      68      73      67      58      52      58      53
SOURCE: HSFIS, 26-month PSI, 36-month PI, 36-month Bayley, and 36-month video data.

*Difference between the variable distribution for interview respondents and the full sample of respondents and nonrespondents is statistically significant at the 10 percent level.

We find some differences in response rates across groups of sites. Response rates for the program group were higher in the center-based programs than in the home-based and mixed-approach ones, although rates for the control group were more similar across program approaches. Thus, differences in response rates between the program and control groups were largest in the center-based programs. Interestingly, rates for both research groups were higher in sites that were fully implemented than in the incompletely implemented sites.

Response rates also differed across some subgroups defined by family characteristics. They increased with the education level of the primary caregiver. In addition, they were higher if the primary caregiver was employed at random assignment (for the program group), if she was married or living with other adults, and if the family was not receiving welfare. Response rates were also slightly higher for whites than for African Americans and Hispanics for some data sources, and for those randomly assigned later than earlier. The pattern of response rates across subgroups was similar for the program and control groups.

Importantly, we find fewer differences in the baseline characteristics of program and control group respondents (Appendix D). For each data source, very few of the differences in the distributions of the baseline variables for respondents in the two research groups are statistically significant. None of the joint tests of the hypothesis that the distributions of the baseline variables are similar across the two groups is statistically significant. Thus, although we find some differences in the characteristics of respondents and nonrespondents, the characteristics of respondents in the two research groups appear to be similar.

Our main procedure to adjust for potential nonresponse bias was to estimate impacts using regression models that control for differences in the baseline characteristics of program and control group respondents (see Section C below). We used a large number of control variables from the HSFIS forms to adjust for observable baseline differences between the two groups. We gave each site equal weight in the analysis (regardless of the response rates in each site). In addition, as discussed in Appendix D, we calculated sample weights to adjust for nonresponse, so that the weighted characteristics of respondents matched those of the full sample of respondents and nonrespondents. We used these weights in some analyses to check the robustness of study findings (see Appendix D).
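The idea of weighting respondents so that they match the full sample can be sketched with a simple weighting-class adjustment. This is a generic technique shown under stated assumptions, not the procedure documented in Appendix D: the function name, the single classifying variable, and the counts are all made up for illustration.

```python
from collections import Counter


def nonresponse_weights(full_sample_cells, respondent_cells):
    """Weight for each cell = (full-sample count) / (respondent count).

    Applying these weights to respondents reproduces the full sample's
    distribution across the cells.
    """
    full_counts = Counter(full_sample_cells)
    respondent_counts = Counter(respondent_cells)
    return {
        cell: full_counts[cell] / respondent_counts[cell]
        for cell in respondent_counts
    }


# Made-up example: 10 sample members classified by one baseline characteristic.
full_sample = ["received welfare"] * 6 + ["did not receive welfare"] * 4
respondents = ["received welfare"] * 3 + ["did not receive welfare"] * 4

weights = nonresponse_weights(full_sample, respondents)
weighted_counts = Counter()
for cell in respondents:
    weighted_counts[cell] += weights[cell]
print(weights, dict(weighted_counts))
```

In this made-up case, respondents from the under-responding cell each count twice, so the weighted respondent counts (6 and 4) match the full sample.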

These procedures adjust for nonresponse by controlling for measurable differences between respondents and nonrespondents in the two research groups. To be sure, there may have been unmeasured differences between the groups. However, because of the large number of baseline data items in the HSFIS forms, we believe that our procedures account for some important differences between the groups. Therefore, we are confident that our procedures yielded meaningful estimates of program impacts.

3. Timing of Interviews

Most interviews were conducted near their target dates (Appendix B). For example, the average 15-month PSI was conducted 16.6 months after random assignment, and about 80 percent were conducted between 12 and 18 months. Similarly, the average 26-month PSI was conducted 28.4 months after random assignment, and about 76 percent were conducted within 30 months. The average 24-month PI was conducted when the child was 25.1 months old, and about 88 percent were conducted when the child was between 23 and 27 months old. The average 36-month PI was completed when the child was 37.5 months old, and about 82 percent were completed before the child was 40 months old. The corresponding figures for the Bayley and video assessments are very similar to those of the PIs.

On average, the 6-, 15-, and 26-month PSIs were conducted about 5 months before the 14-, 24-, and 36-month birthday-related instruments, respectively (Appendix B). Thus, at the 36-month birthday-related interviews and assessments, some families who remained in the program for a long period probably had received more Early Head Start services than we report here.

The distributions of interview completion times were similar for program and control group families. Thus, it is not likely that impact estimates on outcomes (such as the child language measures) were affected by differences in the ages of program and control group children at the time the data were collected.8 As discussed in Appendix C, we did not have a pertinent norming sample to age-norm some measures.

4. Outcome Variables

The Early Head Start evaluation was designed to examine the extent to which Early Head Start programs influence a wide range of outcomes. Four main criteria guided specification of the major outcome variables for the analysis: (1) selecting outcomes that are likely to be influenced significantly by Early Head Start on the basis of programs’ theories of change and the results of previous studies, (2) selecting outcomes that have policy relevance, (3) measuring outcomes reliably and at reasonable cost, and (4) selecting outcomes that could be reliably compared over time.

The primary outcome variables for the analysis can be grouped into three categories:

  1. Service use

  2. Child development and parenting

  3. Family development

Table II.5 summarizes the key categories of outcome variables in each area, as well as the data sources used to construct them. In the analysis, we first describe the Early Head Start experiences of program group members and examine impacts for the service use outcomes, because we would not expect meaningful impacts on the child, parenting, and family outcomes unless program group families received substantial amounts of Early Head Start services and received more and higher-quality services than the control group. Examining the services received by control group families is crucial for defining the counterfactual for the evaluation, and for interpreting impact estimates on all other outcomes. These results are presented in Chapter IV. Impact results for the child, parent, and family outcomes are presented in Chapters V, VI, and VII. A detailed discussion of the specific outcome variables for the analysis, the reasons they were selected, and the way they were constructed can be found at the start of each chapter.

TABLE II.5

CATEGORIES OF OUTCOME VARIABLES REFERRED TO IN THIS REPORT, AND THEIR DATA SOURCES
| Outcome Measure | Data Source |
|---|---|
| Service Use | |
| Home visits | 6-, 15-, and 26-Month Parent Services Interviews |
| Case management | 6-, 15-, and 26-Month Parent Services Interviews |
| Parenting-related services | 6-, 15-, and 26-Month Parent Services Interviews |
| Child care and child development services | 6-, 15-, and 26-Month Parent Services Interviews |
| Services for children with disabilities | 6-, 15-, and 26-Month Parent Services Interviews |
| Child health services and status | 6-, 15-, and 26-Month Parent Services Interviews |
| Family health and other family development services | 6-, 15-, and 26-Month Parent Services Interviews |
| Father participation in program-related activities | 36-Month Father Interview |
| Parenting Behavior, Knowledge, and the Home Environment | |
| Knowledge of child development, discipline strategies, and safety precautions | 24- and 36-Month Parent Interviews |
| Parent supportiveness, detachment, intrusiveness, and negative regard | Coding from Videotaped Parent-Child Semistructured Play Task (24 and 36 Months) |
| Parent quality of assistance, detachment, and intrusiveness | Coding from Videotaped Puzzle Challenge Task (36 Months) |
| Parent warmth, harshness, and stimulation of language and learning | 24- and 36-Month Parent Interviews |
| Quality of cognitive and emotional support provided in the home environment | 24- and 36-Month Parent Interviews and Interviewer Observations |
| Father Involvement | 24- and 36-Month Parent Interviews |
| Child Development | |
| Child social and emotional well-being | |
| Child engagement, negativity toward parent, and sustained attention with objects | Coding from Videotaped Parent-Child Semistructured Play Task (24 and 36 Months) |
| Child engagement, persistence, and frustration | Coding from Videotaped Puzzle Challenge Task (36 Months) |
| Emotional regulation, orientation/engagement | Interviewer Observations (24 and 36 Months) |
| Aggressive behavior | 24- and 36-Month Parent Interviews |
| Child cognitive and language development | |
| Bayley Mental Development Index (MDI) | Direct Child Assessment (24 and 36 Months) |
| Vocabulary production and sentence complexity | 24-Month Parent Interviews |
| Receptive vocabulary | Direct Child Assessment (36 Months) |
| Child Health Status | 24- and 36-Month Parent Interviews |
| Family Outcomes | |
| Parent's Health and Mental Health | |
| Depression | 24- and 36-Month Parent Interviews |
| Parenting stress | 24- and 36-Month Parent Interviews |
| Family Functioning | |
| Family conflict | 24- and 36-Month Parent Interviews |
| Self-Sufficiency | |
| Education and training | 6-, 15-, and 26-Month Parent Services Interviews |
| Welfare receipt | 6-, 15-, and 26-Month Parent Services Interviews |
| Employment and income | 6-, 15-, and 26-Month Parent Services Interviews |
| Father Presence, Behavior, and Well-Being | |
| Father presence | 14-, 24-, and 36-Month Parent Interviews |
| Father caregiving, social, cognitive, and physical play activities | 36-Month Father Interview |
| Father discipline strategies | 36-Month Father Interview |
| Father supportiveness and intrusiveness | Coding from Videotaped Father-Child Semistructured Play Task (36 Months) |
| Father quality of assistance and intrusiveness | Coding from Videotaped Father-Child Puzzle Challenge Task (36 Months) |
| Father's Mental Health | |
| Depression | 36-Month Father Interview |
| Parenting stress | 36-Month Father Interview |
| Family Functioning | |
| Family conflict | 36-Month Father Interview |
| Child Behavior With the Father | |
| Child engagement of the father, negativity toward the father, and sustained attention with objects | Coding from Videotaped Father-Child Semistructured Play Task (36 Months) |
| Child engagement of father, persistence, and frustration | Coding from Videotaped Father-Child Puzzle Challenge Task (36 Months) |

5. Analysis Samples

We used different analysis samples, depending on the data source and type of analysis. The primary sample used to estimate “point-in-time” impacts on outcomes from the 24-month or 36-month PI data includes those who completed 24-month or 36-month PIs. Similarly, the primary sample for the point-in-time analysis based on the birthday-related child and family assessment data includes those who completed the assessments at each time point. In sum, we conducted separate point-in-time analyses using each of these samples in order to maximize the sample available for the analyses.

However, the primary sample used to examine impacts on the growth in child and family outcomes (that is, the growth curve analysis) includes only those for whom data are available for all three time points. Similarly, the primary sample used to examine the extent to which impacts on mediating (24-month) variables correlate with impacts on longer-term (36-month) outcomes (that is, the mediated analysis) includes only those for whom both 24-month and 36-month data are available.

For the analysis of the service use and self-sufficiency outcomes, we used the sample of those who completed 26-month PSIs (regardless of whether a 6- or 15-month PSI was completed). Most of the service use and self-sufficiency outcomes pertain to the entire 26-month period since random assignment (for example, the receipt of any home visits, the average hours per week the child spent in center-based child care, and the average number of hours the mother spent in education and training programs), so data covering the entire 26-month period were required to construct these outcomes. About 88 percent of those who completed a 26-month PSI also completed a 15-month PSI, and 97 percent completed either a 6-month or a 15-month PSI. In the 26-month PSI, respondents were asked about their experiences since the previous PSI interview (or since random assignment if no previous PSI was completed). Thus, complete data covering the 26-month period are available for all those in the 26-month analysis sample.

We did estimate impacts, however, using alternative sample definitions to test the robustness of study findings (see Appendix D). For example, we estimated point-in-time impacts on 36-month outcomes using those who completed both the 24- and 36-month PIs (the mediated analysis sample), as well as those who completed all birthday-related interviews and assessments (the growth curve analysis sample). As another example, we estimated impacts on service use and self-sufficiency outcomes using those who completed both the 15- and 26-month PSIs. Our results using alternative samples were very similar, so, in the main body of this report, we present only results that were obtained using the primary analysis samples described above.

C. ANALYTIC APPROACHES

The Early Head Start impact analysis addresses the effectiveness of Early Head Start services on key child, parenting, and family outcomes from several perspectives. The global analysis examines the overall impacts of Early Head Start across all 17 sites combined, while the targeted analysis addresses the important policy questions of what works and for whom.

1. Global Analysis

In this section, we discuss our approach for answering the question: Do Early Head Start programs have an effect on child, parenting, and family outcomes overall? Stated another way, we discuss our approach for examining the extent to which the 17 programs, on average, changed the outcomes of program participants relative to what their outcomes would have been had they not received Early Head Start services. First, we discuss our primary approach for estimating impacts per eligible applicant. Second, we discuss our approach for estimating impacts per participant (that is, for families that received Early Head Start services). Finally, we discuss our approach for estimating impacts using growth curve models.

a. Estimating Point-in-Time Impacts per Eligible Applicant

Random assignment was performed at the point that applicant families were determined to be eligible for the program. Thus, we obtained estimates of impacts per eligible applicant by computing differences in the average outcomes of all program and control group families at each time point. This approach yields unbiased estimates of program impacts on the offer of Early Head Start services, because the random assignment design ensures that no systematic differences between program and control group members existed at the point of random assignment except for the opportunity to receive Early Head Start services.

We used regression procedures to estimate program impacts, for two reasons. First, the regression procedures produce more precise impact estimates. Second, they can adjust for any differences in the observable characteristics of program and control group members due to random sampling and interview nonresponse. However, we also estimated impacts using simple differences-in-means procedures to test the sensitivity of our findings to alternative estimation strategies (see Appendix D). The two procedures yielded very similar results; we present the regression-adjusted estimates in the main body of this report.

We estimated variants of the following regression model:

$$y = \sum_{j=1}^{17} \alpha_j (S_j \times T) + X\beta + \epsilon \qquad (1)$$

where $y$ is an outcome variable at a specific time point, $S_j$ is an indicator variable equal to 1 if the family is in site $j$, $T$ is an indicator variable equal to 1 if the family is in the program group, $X$s are explanatory variables measured at baseline (that include site indicator variables), $\epsilon$ is a mean zero disturbance term, and $\alpha_j$ and $\beta$ are parameters to be estimated. In this formulation, the estimate of $\alpha_j$ represents the regression-adjusted impact estimate for site $j$.9

An important aspect of our analytic approach was to give each site equal weight regardless of sample sizes within the sites. Early Head Start services are administered at the site level and differ substantially across programs; thus, the site is the relevant unit of analysis. Accordingly, the global impact estimates were obtained by taking the simple average of the regression-adjusted impact estimates in each site.10 The associated t-tests were used to test the statistical significance of the impact estimates.
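The equal-weighting step can be illustrated with a minimal sketch on simulated data. It assumes a single pooled least-squares fit with site-by-program-group interaction terms and site indicators, then takes the simple average of the site coefficients. The function names (`site_impacts`, `global_impact`) and all numbers are hypothetical and are not the evaluation's software or results.

```python
import numpy as np

def site_impacts(y, site, treat, X=None):
    """Regression-adjusted impact per site from one pooled OLS fit.

    Design columns: (site indicator * program indicator) for each site,
    then the site indicators, then any optional baseline covariates X.
    """
    sites = np.unique(site)
    S = (site[:, None] == sites[None, :]).astype(float)  # site indicators
    design = [S * treat[:, None], S]                      # interactions, site effects
    if X is not None:
        design.append(X)
    D = np.hstack(design)
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef[: len(sites)]                             # one impact per site

def global_impact(alpha):
    """Simple (equal-weight) average of the site-specific impacts."""
    return float(np.mean(alpha))

# Simulated illustration: three sites of unequal size, true impacts 2, 4, 6.
rng = np.random.default_rng(0)
sizes, true = [400, 1200, 800], [2.0, 4.0, 6.0]
site = np.repeat(np.arange(3), sizes)
treat = rng.integers(0, 2, site.size).astype(float)
y = 10 + site + np.array(true)[site] * treat + rng.normal(0, 1, site.size)

alpha = site_impacts(y, site, treat)
print(alpha, global_impact(alpha))
```

Because each site counts once in the average, the largest site does not dominate the global estimate, which would happen if all families were pooled with equal weight per family.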

We included a large number of explanatory variables in the regression models (Table II.6 lists the categories of variables, and Appendix Table E.II.B provides variable descriptions and means). These variables were constructed using HSFIS data and pertain to characteristics and experiences of families and children prior to random assignment. We used two main criteria to select the explanatory variables: (1) they should have some predictive power in the regression models for key outcome variables (to increase the precision of the impact estimates); and (2) they should be predictors of interview nonresponse (to adjust for differences in the characteristics of program and control group respondents).11 There was no theoretical reason to include different explanatory variables by site or to assume that the parameter estimates on the explanatory variables would differ by site. Thus, we used the same model specification for each site.12 The regression R² values for key 36-month outcomes ranged from about .10 (for maternal depression and distress measures) to .15 (for parent-child interaction scales from the video assessments) to .30 (for measures of child cognitive and language development and the home environment) to .50 (for measures of welfare receipt).

TABLE II.6

CATEGORIES OF EXPLANATORY VARIABLES FOR REGRESSIONS
Family and Parent Characteristics
Age of Mother
Race
English-Language Ability
Education Level
Primary Occupation
Living Arrangements
Number of Children in the Household
Poverty Level
Welfare Receipt (AFDC/TANF; Food Stamps; WIC; SSI)
Has Inadequate Resources (Food, Housing, Money, Medical care, Transportation)
Previously Enrolled in Head Start or Another Child Development Program
Mobility in the Previous Year
Random Assignment Date
Child Characteristics
Age of Focus Child at Random Assignment
Age of Focus Child at Interview or Assessment
Birthweight Less than 2,500 Grams
Gestational Age
Gender
Evaluation History
Risk Categories (Established, Biological/Medical, Environmental)
SOURCE: HSFIS application and enrollment forms.

 

As discussed, we constructed weights to adjust for interview nonresponse. Our basic approach was not to use these weights in the regression models, because there is no theoretical reason to use them in this context (DuMouchel and Duncan 1983). However, to test the robustness of study findings, we estimated some regression models using the weights (see Appendix D). We also used weights to obtain all estimates of impacts using simple differences-in-means procedures. The weighted and unweighted impact results are very similar (see Appendix D).

b. Estimating Point-in-Time Impacts per Participant

Random assignment occurred at the point of eligibility and not when families started receiving services. Hence, program and control group differences yield combined impact estimates for those who participated in Early Head Start and those who enrolled but did not participate.

An important evaluation goal, however, is to estimate impacts on those who received program services. Estimating impacts for this group is complicated by the fact that a straightforward comparison of the outcomes of program group participants and all control group members does not yield the desired impact on participants. Ideally, we would compare the outcomes of program group participants with control group families who would have participated in Early Head Start had they been in the program group. However, we cannot identify these control group families.

As discussed in Appendix D, we can overcome these complications by assuming that Early Head Start had no effect on families who enrolled but did not receive Early Head Start services. In this case, the impact per participant in a site can be obtained by dividing the impact per eligible applicant in that site by the site’s program group participation rate (Bloom 1984). The estimated global impact per participant across all sites can then be calculated as the average of the estimated impacts per participant in each site.

A crucial issue is how to define a program participant. The key assumption that allows us to estimate impacts for participants is that the outcomes of those in the program group who enrolled but did not receive services would have been the same if they had instead been assigned to the control group (that is, the program had no effect on nonparticipants). Thus, in order to be confident that this (untestable) assumption holds, we need a conservative definition of a program participant.

A program group family was considered to be an Early Head Start participant if, during the 26 months after random assignment, the family received more than one home visit, met with a case manager more than once, enrolled its child in center care for at least two weeks, or participated in a group activity. This participation rate was 91 percent for the full program group. It ranged from 68 percent to 97 percent across the program sites, but was at least 88 percent in 15 of the 17 sites. Because the participation rate was fairly high in most sites, the estimated impacts per eligible applicant and the estimated impacts per participant are very similar.13
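The per-participant adjustment can be sketched as follows, assuming (as the text does) that the program had no effect on nonparticipants. The function names and the numbers are hypothetical and serve only to illustrate the arithmetic.

```python
def impact_per_participant(impact_per_applicant, participation_rate):
    """Bloom (1984) adjustment for one site: divide the impact per eligible
    applicant by the program group participation rate."""
    if not 0 < participation_rate <= 1:
        raise ValueError("participation_rate must be in (0, 1]")
    return impact_per_applicant / participation_rate

def global_impact_per_participant(site_impacts, site_rates):
    """Equal-weight average of the site-level impacts per participant."""
    adjusted = [impact_per_participant(i, r)
                for i, r in zip(site_impacts, site_rates)]
    return sum(adjusted) / len(adjusted)

# Hypothetical impacts per eligible applicant and participation rates
# for three sites (not study results).
impacts = [1.8, 2.7, 0.9]
rates = [0.90, 0.90, 0.90]
result = global_impact_per_participant(impacts, rates)
print(result)
```

With participation rates near 90 percent, the adjustment raises each site's estimate by roughly one-ninth, which is why the two sets of estimates are close.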

c. Crossovers in the Control Group and Spillover Effects

As discussed, about 0.7 percent of control group members participated in Early Head Start. These “crossovers” were treated as control group members in the analysis, to preserve the integrity of the random assignment design. Thus, the presence of these crossovers could yield impact estimates that are biased slightly downward if the crossovers benefited from program participation.

The procedure to estimate impacts for participants can be adapted to accommodate the control group crossovers (Angrist et al. 1996). This involves dividing the impacts per eligible applicant by the difference between the program group participation rate and the control group crossover rate. The key assumption underlying this procedure is that the outcomes of control group crossovers would have been the same if they had instead been assigned to the program group. These estimates, however, are very similar to the impacts per participant, because of the small number of crossovers. For example, the impacts per participant in most sites were obtained by dividing the impacts per eligible applicant by about .91, whereas the impacts that adjust for the crossovers were typically obtained by dividing the impacts per eligible applicant by .903 (.91 - .007). Thus, for simplicity, we do not present the impacts that adjust for crossovers.
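A short sketch of the crossover adjustment, using the participation rate of about .91 and the crossover rate of about 0.7 percent reported in the text. The function name and the unit impact are hypothetical.

```python
def impact_adjusted_for_crossovers(impact_per_applicant,
                                   participation_rate, crossover_rate):
    """Divide the impact per eligible applicant by the difference between the
    program group participation rate and the control group crossover rate."""
    denominator = participation_rate - crossover_rate
    if denominator <= 0:
        raise ValueError("participation rate must exceed crossover rate")
    return impact_per_applicant / denominator

impact = 1.0                           # hypothetical impact per eligible applicant
per_participant = impact / 0.91        # no crossover adjustment
adjusted = impact_adjusted_for_crossovers(impact, 0.91, 0.007)
relative_change = adjusted / per_participant - 1
print(per_participant, adjusted, relative_change)
```

The two divisors differ by less than one percent, so the adjusted and unadjusted estimates per participant are nearly identical.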

About one-third of control group families reported during the PSIs that they knew at least one family in Early Head Start. Thus, “spillover” effects could lead to impact estimates that are biased downwards if control group families, through their interactions with Early Head Start families, learned some of the parenting skills that program group families acquired in Early Head Start. It is difficult to ascertain the extent of these spillover effects, because we did not collect detailed information on the extent to which control group families benefited from their interactions with program group families. Furthermore, we cannot use the same statistical procedures to adjust for spillover effects as for crossover effects, because it is not reasonable to assume that the outcomes of control group families who had contact with program group families would have been the same had these controls instead been assigned to the program group (and directly received Early Head Start services). Thus, we do not adjust for spillover effects, and our impact estimates are likely to be conservative.

d. Growth Curve Models

We also used longitudinal statistical methods (or, more specifically, growth curve or hierarchical linear modeling) to estimate the effects of Early Head Start participation on child and family outcomes that were measured when the focus children were 14, 24, and 36 months old. These methods were used to examine impacts (program and control group differences) on the growth trajectories of child and family outcomes during the follow-up period.

In our context, the growth curve models can be estimated using the following two steps:

  1. Fit a regression line through the three data points for each program and control group member, and save the estimated intercepts and slopes of the fitted lines. Mathematically, the following equation is estimated for each sample member:

    $$y_{it} = \alpha_{0i} + \alpha_{1i}(\text{age}_{it} - 15) + u_{it} \qquad (2)$$

    where $y_{it}$ is the outcome variable of sample member $i$ at time $t$, $\text{age}_{it}$ is the age of the child (in months) at the interview or assessment, $u_{it}$ is a mean zero disturbance term, and $\alpha_{0i}$ and $\alpha_{1i}$ are parameters to be estimated.14 We use 15 months as the base period, because this was the average age of the children at the 14-month interviews and assessments.

  2. Compute impacts on the intercepts and slopes from Step 1. Mathematically, variants of the following equations are estimated:

    $$\alpha_0 = \beta_0 + \gamma_0 T + X\delta + \epsilon_0 \qquad (3)$$

    $$\alpha_1 = \beta_1 + \gamma_1 T + X\theta + \epsilon_1 \qquad (4)$$

    where $\alpha_0$ is the vector of intercepts from equation (2) (and which are replaced by their estimates), $\alpha_1$ is the vector of slopes from equation (2) (and which are replaced by their estimates), $T$ is an indicator variable equal to 1 if the family is in the program group, $X$s are explanatory variables, $\epsilon_0$ and $\epsilon_1$ are mean zero disturbance terms (that are assumed correlated with each other and with the error term in equation (2) for the same individual but not across individuals), and the $\beta$s, $\gamma$s, $\delta$s, and $\theta$s are parameters to be estimated.

In this formulation, the estimate of the slope, $\gamma_1$, represents the program and control group difference in the mean growth of the outcome variable between the 14- and 36-month data collection points. The estimate of the intercept, $\gamma_0$, represents the point-in-time impact of Early Head Start on the outcome variable at 15 months (the base period).15,16

For each outcome measure, the growth curve approach produces an overall regression line for the program group (defined by the mean estimated intercept and mean estimated slope across all program group members) and, similarly, an overall regression line for the control group. The difference between these overall regression lines at any given time point yields a point-in-time impact estimate.
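The two-step procedure can be sketched on simulated data as follows. This is a simplified illustration, not the evaluation's estimation code: Step 2 omits the explanatory variables, so the impact on each parameter reduces to a program-control difference in means, and the correlated error structure is ignored. All names and numbers are hypothetical.

```python
import numpy as np

BASE_AGE = 15.0  # base period in months, as stated in the text

def fit_individual_lines(ages, outcomes):
    """Step 1: least-squares line through each person's time points.

    ages, outcomes: arrays of shape (n_people, n_timepoints).
    Returns intercepts (fitted value at BASE_AGE) and slopes (per month).
    """
    x = ages - BASE_AGE
    x_bar = x.mean(axis=1, keepdims=True)
    y_bar = outcomes.mean(axis=1, keepdims=True)
    slopes = ((x - x_bar) * (outcomes - y_bar)).sum(axis=1) / ((x - x_bar) ** 2).sum(axis=1)
    intercepts = y_bar[:, 0] - slopes * x_bar[:, 0]
    return intercepts, slopes

def impact_on(param, treat):
    """Step 2 without covariates: program-control difference in a parameter."""
    return param[treat == 1].mean() - param[treat == 0].mean()

# Simulated sample: true impact of 1.0 at 15 months, growing 0.10 per month.
rng = np.random.default_rng(1)
n = 2000
treat = rng.integers(0, 2, n)
ages = np.array([15.0, 25.0, 37.0]) + rng.normal(0, 1, (n, 3))
y = ((50 + 1.0 * treat)[:, None]
     + (0.5 + 0.10 * treat)[:, None] * (ages - BASE_AGE)
     + rng.normal(0, 2, (n, 3)))

a0, a1 = fit_individual_lines(ages, y)
g0, g1 = impact_on(a0, treat), impact_on(a1, treat)
impact_at_36 = g0 + g1 * (36 - BASE_AGE)  # gap between the two overall lines
print(g0, g1, impact_at_36)
```

The last line shows how a point-in-time impact at any age follows from the intercept and slope impacts, which is the sense in which the difference between the two overall regression lines yields an impact estimate.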

The growth curve approach has several advantages over our basic point-in-time analysis. First, the growth curve approach may yield more precise impact estimates because it assumes that outcomes grow linearly over time. This functional form assumption “smoothes” the data points, which can lead to estimates with smaller standard errors. Second, because of the linearity assumption, the growth curve approach can account directly for differences in the ages of children at a particular interview or assessment (which occurred because it took more time to locate some families than others). Finally, the approach produces important descriptive summary information about the growth in outcomes over time, and can be used to predict future impacts.

There are, however, several important disadvantages of the growth curve approach. The main disadvantage is that the relationship between some outcomes and a child’s age may not be linear. In this case, the growth curve approach can lead to biased impact estimates. A related issue is that the linearity assumption implies that the estimated impacts can only grow or diminish over time; they cannot grow and then diminish, or vice versa. As discussed in this report, this assumption is often violated. Another disadvantage of the growth curve approach is that it can be used only on those outcomes that were measured at all three time points (Chapter V discusses the specific outcome measures that were used in the growth curve analysis).17 Finally, the sample for the growth curve approach includes only those sample members who completed interviews and assessments at every time point, whereas the point-in-time analysis uses all available data at each time point.18

Importantly, despite these advantages and disadvantages, impacts obtained using the growth curve approach and our point-in-time approach are very similar. This is not surprising, because the growth curve approach essentially fits a regression line through the mean outcomes of program group members at each time point and, similarly, for the control group. Thus, if the growth of an outcome measure is roughly linear over time, then the overall regression line for the program group that is produced by the growth curve approach should pass close to the observed mean outcome for the program group at each time point, and, similarly, for the control group. Consequently, we view the growth curve approach as a supplementary analysis to our basic point-in-time analysis, and use it primarily to test the robustness of study findings. Results from the growth curve models are presented in Appendix D.5 and are discussed in Chapter V as we present our main findings.

e. Presentation of Results

In Chapters V through VII, where we report program effects on child, parenting, and family outcomes, and the effects on these outcomes for population subgroups, we present impact results for participants.19 However, in Chapter IV, where we report program effects for the service use outcomes, we present results for eligible applicants, in order to understand the extent to which Early Head Start programs are serving eligible families, and to understand the services available to eligible families in the absence of Early Head Start. This analysis is critical to understanding program operations and implementation, as well as program impacts.

In the impact tables in Chapters V to VII, we present the following statistics:

  1. The Mean Outcome for Participants in the Program Group. This mean was calculated using the 91 percent of program group members who participated in Early Head Start (using the definition of participation discussed above).

  2. The Mean Outcome for Control Group Members Who Would Have Been Early Head Start Participants if They Had Instead Been Assigned to the Program Group. This mean is not observed, but is estimated as the difference between the program group participant mean and the estimated impact per participant. We sacrifice technical accuracy for simplicity in the text, and refer to this mean as the “control group mean.”

  3. The Estimated Impact per Participant. As discussed, this impact was obtained by (1) dividing the regression-adjusted impacts per eligible applicant in each site by the program group participation rate in each site; and (2) averaging these site-specific impacts across sites.

  4. The Size of the Impact in Effect Size Units. This statistic was calculated as the impact per participant divided by the standard deviation of the outcome variable for the control group times 100.

  5. The Significance Level of the Estimated Impact. We indicate whether the estimated impact is statistically significant at the 1 percent, 5 percent, or 10 percent level, using a two-tailed test.20 We indicate marginally significant findings at the 10 percent level, because we seek to identify patterns of program effects across the large number of outcomes and subgroups under investigation, and thus, relax the traditional 5 percent significance level threshold (see Section 3 below). We present similar statistics in Chapter IV for the impact findings on service use outcomes, except that the statistics pertain to eligible applicants rather than to participants only.
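The derived statistics in the list above (the estimated control group mean and the impact in effect size units) follow directly from the participant mean, the impact per participant, and the control group standard deviation. A minimal sketch, with hypothetical function names and made-up numbers:

```python
def control_group_mean(participant_mean, impact_per_participant):
    """Estimated mean for control group members who would have participated:
    program group participant mean minus the impact per participant."""
    return participant_mean - impact_per_participant

def effect_size_units(impact_per_participant, control_sd):
    """Impact per participant divided by the control group standard
    deviation of the outcome, times 100."""
    if control_sd <= 0:
        raise ValueError("control_sd must be positive")
    return 100.0 * impact_per_participant / control_sd

# Hypothetical values for one outcome (not study results).
participant_mean, impact, control_sd = 91.4, 1.6, 12.8
ctrl_mean = control_group_mean(participant_mean, impact)
effect = effect_size_units(impact, control_sd)
print(ctrl_mean, effect)
```

Under this convention an impact equal to one-eighth of a control group standard deviation is reported as 12.5.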

2. Targeted Analysis

The targeted analysis uses a more refined approach than the global analysis to examine the effects of Early Head Start on key outcomes. The targeted analysis addresses the important policy questions of what works, and for whom. It focuses on estimating whether impacts differ (1) for sites with different program approaches, implementation levels, and community contexts; (2) for families with different characteristics at the time of program application; and (3) for families who received different amounts of Early Head Start services. The analysis also examines the extent to which impacts on shorter-term (24-month) mediating variables correlate with impacts on longer-term (36-month) outcomes.

Specifically, the targeted analysis addresses the following research questions:

  1. Do different program approaches have different program impacts?

  2. Do different levels of program implementation result in different impacts?

  3. Do different community contexts result in different impacts?

  4. Do program impacts differ for children and parents with different baseline characteristics?

  5. Are impacts on mediating variables consistent with impacts on longer-term outcomes?

a. Program Approach, Implementation Level, and Community Context

Early Head Start programs tailor their program services to meet the needs of eligible low-income families in their communities, and select among program options specified in the Head Start Program Performance Standards. ACYF selected the 17 research sites to reflect Early Head Start sites more broadly; thus the Early Head Start programs participating in the evaluation varied in their approach to serving families. Furthermore, they differed in their pattern of progress in implementing key elements of the revised Head Start Program Performance Standards. Accordingly, we examined how impacts varied by program approach, implementation level, and community context.

Impact results by program approach can provide important information on how to improve program services, as well as to develop and expand the program. Variations in impacts across programs that achieved different levels of implementation may provide insights into the importance of fully implementing key program services. Because Early Head Start programs are required to tailor services to meet local community needs, it is very important to understand the conditions under which they can have various effects.

The specific subgroups defined by key site characteristics that we examined are displayed in Table II.7. The table also displays the number of sites and the percentage of research families (at the time of random assignment) who are included in each subgroup. Table II.8 displays these variables by site (so that the overlap in these site subgroups can be examined). We selected these groupings in consultation with ACYF and the Early Head Start Research Consortium. Because of the small number of sites included in the evaluation, we limited the analysis to a few key subgroups that would capture distinguishing features of Early Head Start programs that are policy relevant and could be accurately measured.

For the analysis of impacts by program approach, we divided programs into four center-based, seven home-based, and six mixed-approach programs on the basis of their program approaches in 1997 (see Chapter I). As discussed throughout this report, because the three approaches offer different configurations of services, we expect differences in the pattern of impacts by approach (see, especially, discussions of the hypotheses relating to expected impacts in Chapter VI).

TABLE II.7

SUBGROUPS DEFINED BY PROGRAM APPROACH, IMPLEMENTATION PATTERN, AND COMMUNITY CONTEXT
| Subgroup | Number of Sites | Percentage of Families |
|---|---|---|
| Program Approach | | |
| Center-based | 4 | 20 |
| Home-based | 7 | 46 |
| Mixed Approach | 6 | 34 |
| Overall Implementation Pattern | | |
| Early implementers | 6 | 35 |
| Later implementers | 6 | 35 |
| Incomplete implementers | 5 | 30 |
| Overall Implementation Among Home-Based Programs | | |
| Early or later implementers | 4 | 55 |
| Incomplete implementers | 3 | 45 |
| Overall Implementation Among Mixed-Approach Programs | | |
| Early implementers | 3 | 54 |
| Later or incomplete implementers | 3 | 46 |
| Implementation of Child and Family Development Services | | |
| Full implementers in both areas in both time periods | 4 | 24 |
| Not full implementers in both areas in both time periods | 13 | 76 |
| Whether Program is in a Rural or Urban Area | | |
| Rural | 7 | 41 |
| Urban | 10 | 59 |
| Whether State or County Has Work Requirements for TANF Mothers with Children Younger Than 1 | | |
| State has requirements | 7 | 42 |
| State has no requirements | 10 | 58 |
SOURCE: Data from 1997 and 1999 site visits.

TABLE II.8

SUBGROUPS DEFINED BY SITE CHARACTERISTICS, BY SITE
| Site | Program Approach | Overall Implementation Pattern (a) | Strong Full Implementation (b) | Work Requirements for TANF Mothers With Infants | In an Urban Area |
|---|---|---|---|---|---|
| 1 | Center | Early | Yes | Yes | No |
| 2 | Home | Later | No | No | Yes |
| 3 | Mixed | Later | No | Yes | Yes |
| 4 | Center | Early | No | Yes | Yes |
| 5 | Mixed | Incomplete | No | No | Yes |
| 6 | Home | Incomplete | No | Yes | No |
| 7 | Mixed | Early | Yes | No | Yes |
| 8 | Home | Later | No | Yes | Yes |
| 9 | Home | Incomplete | No | No | Yes |
| 10 | Center | Incomplete | No | No | Yes |
| 11 | Home | Incomplete | No | No | Yes |
| 12 | Mixed | Later | No | No | No |
| 13 | Home | Early | Yes | No | No |
| 14 | Mixed | Early | Yes | Yes | No |
| 15 | Mixed | Early | No | No | Yes |
| 16 | Home | Later | No | No | No |
| 17 | Center | Later | No | Yes | No |

SOURCE: Implementation study data.

NOTE: Sites are in random order.

(a) “Early” indicates program was rated as fully implementing the key elements of the Head Start Program Performance Standards in 1997, “later” means the program was fully implemented in 1999 but not 1997, and “incomplete” means full implementation was not achieved by 1999 (see Appendix C for more details of the implementation ratings).

(b) “Strong full implementation” indicates that a program fully implemented both child and family development services early and sustained full implementation of both areas in 1999.

We used data collected from the implementation study site visits in fall 1997 and fall 1999 to assess the degree of implementation in each of the research programs (see Chapter I). We then divided programs into (1) early implementers (six sites), (2) later implementers (six sites), and (3) incomplete implementers (five sites). The early implementers became “fully implemented” by 1997 and remained so at the time of the 1999 site visits, while the later implementers were not fully implemented in 1997 but were by 1999. The incomplete implementers had still not achieved full implementation by 1999, although they demonstrated a number of strengths in particular programmatic areas.21 We also identified programs that achieved an especially strong pattern of full implementation—these were the four programs that fully implemented both child and family development services early and remained fully implemented in these areas in 1999.

To be rated as fully implemented overall, programs had to be fully implemented in most of the five component areas. Reflecting the Head Start Bureau’s focus on child development, special consideration was given to the child development rating, and it was weighted more heavily in arriving at the consensus rating for overall implementation. The rating panel judged that three programs that were not rated “fully implemented” in child development should be rated as “fully implemented” overall because they were strong in all other component areas, were exceptionally strong in several aspects of child development services, and close to full implementation in the remaining areas.

Clearly, we expect impacts on child, parenting, and family outcomes to be larger in the fully implemented programs than in the incompletely implemented programs, because the fully implemented programs delivered services that were more intensive, more comprehensive, and of higher quality. Similarly, we expect impacts on child, parenting, and family outcomes to be even larger in the strong fully implemented programs. We also expect impacts to be larger in the programs that became fully implemented earlier than in those implemented later.

Assessing impacts by the level of implementation is complicated by the fact that the fully implemented programs were not evenly distributed across the program approaches, as can be seen in Table II.8. For example, only one of the seven home-based programs was an early implementer, as compared to two of the four center-based programs and three of the six mixed-approach programs. Thus, comparing all implementers to all nonimplementers confounds impact differences by implementation level with impact differences by program approach. Therefore, we also estimated impacts for subgroups defined by interacting program approach and implementation level. Because of sample size constraints, this analysis focused on comparing estimated impacts for the three mixed programs that were early implementers to those of the three mixed programs that were not early implementers, and for the four home-based programs that were implemented (whether early or later) to those of the three that were not implemented (see Chapter VI and Appendix E.VI). There were too few center-based programs to make this comparison across implementation patterns.

We created two additional site-level subgroups: one defined by whether or not the state or county had work requirements for mothers who were receiving TANF and who had children younger than 12 months, and one defined by whether the program was located in an urban area. Hypotheses of expected impacts for these groups are discussed in Chapter VII.

The ability of the national evaluation to assess the community context was somewhat limited. A number of the local research teams conducted in-depth research in their program communities, however. Examples of their research are included in boxes in appropriate places in the report.

Estimation Issues. The random assignment design allows us to estimate unbiased impacts for sites with a specific characteristic by comparing the outcomes of program and control group members in those sites. For example, we obtained unbiased impacts for sites with center-based programs by estimating the regression models discussed above, using program and control group members in those four locations. Similarly, we estimated impacts for early implementers using only program and control group families in those six sites. Sites were given equal weight in all analyses. We conducted statistical tests to gauge the statistical significance of the subgroup impacts as well as whether the impacts differed across subgroups (for example, whether impacts for center-based, home-based, and mixed-approach sites differed).

Interpretation of Estimates. The results from this analysis should be interpreted cautiously, for several interrelated reasons. First, there are only a small number of programs in each subgroup, so the estimates are imprecise. Second, program features were not randomly assigned to the research sites. Instead, as specified in the Head Start Program Performance Standards, the programs designed their services on the basis of their community needs and contexts. Accordingly, the configuration of services offered, the program structure, and the characteristics of families served all varied across sites. Consequently, our results tell us about the effectiveness of specific program features for programs that adopted those features, given their community contexts and eligible population. The results do not tell us how successful a particular program feature would have been if it had been implemented in another site, or how well a family in one type of program would have fared in another. We are comparing the outcomes of program and control group families within sites, not comparing families across sites. Thus, for example, our results inform us about the effectiveness of mixed-approach programs for the research sites that implemented this program approach. These results, however, cannot necessarily be used to assess how the mixed approach would have succeeded in sites that chose to adopt home-based or center-based approaches, because of other differences in the characteristics of these sites.

These important qualifications can be further illustrated by noting that the characteristics of families differed by program approach (Table II.9). For example, compared to families in home-based and mixed-approach programs, families in center-based programs were much more likely to have been employed or in school at the time of program application, and to have older children. They were also less likely to be receiving welfare. Furthermore, community characteristics, as well as implementation levels, differed by program approach. Because of these important differences, our results do not provide strong evidence that one particular program approach is better than another. Instead, our analysis addresses the important policy question of whether programs that purposively select and provide a particular array of services to meet perceived needs can effectively improve various outcomes for program participants in their communities.

We did attempt to isolate the effects of particular program features from others using two related approaches, although these results must be interpreted cautiously. First, we estimated regression models in which impacts for subgroups defined by program and family characteristics were estimated simultaneously. These models were estimated by including as explanatory variables terms formed by interacting the treatment status indicator variable with several key subgroup indicator variables. This method examines the effects of a particular program feature (for example, program approach), holding constant the effects of other site features with which it may be correlated (such as implementation level and the characteristics of families served by the program).

TABLE II.9

KEY FAMILY, PARENT, AND CHILD CHARACTERISTICS AT BASELINE, BY PROGRAM APPROACH
(Percentages)

                                                   Program Approach
Characteristic                              Center-Based  Home-Based  Mixed
Mother a Teenager at Birth of Focus Child        41           36       42
Mother's Education
     Less than grade 12                          45           49       48
     Grade 12 or earned a GED                    29           28       29
     Greater than grade 12                       26           23       23
Race and Ethnicity
     White non-Hispanic                          30           41       37
     Black non-Hispanic                          37           28       42
     Hispanic                                    27           27       17
Received Welfare                                 26           39       37
Primary Occupation
     Employed                                    34           22       19
     In school or training program               28           18       23
     Neither                                     39           60       58
Living Arrangements
     With spouse                                 19           29       24
     With other adults                           43           30       48
     Alone                                       38           41       28
Maternal Risk Index [a]
     0 or 1 (low risk)                           21           17       18
     2 or 3 (moderate risk)                      57           56       54
     4 or 5 (high risk)                          23           27       29
Age of Focus Child
     Unborn                                      12           26       33
     Less than 5 months                          32           36       37
     5 months or older                           56           39       30

SOURCE: HSFIS application and enrollment forms.

[a] This index was constructed by summing the number of the following risk factors that the mother faced: (1) being a teenage mother; (2) having no high school credential; (3) receiving public assistance; (4) not being employed or in school or training; and (5) being a single mother.


Second, as discussed, we estimated program impacts for finer subgroups of sites by combining across the site categories discussed above (see Appendix D). For example, we estimated impacts by combining the implementation and program approach categories. While these results were sometimes unstable because of small sample sizes, they provided important information about the pattern of program impacts across the important subgroups defined by site characteristics.

The results from these two analyses are very similar to the results where the site subgroups were estimated separately. For example, our results indicate that certain program approaches were not responsible for the results by implementation status, and that the results by program approach were not driven by the particular levels of implementation in the program approach subgroups. These analyses, however, could only control for a small number of site features, because of the relatively small number of sites in the sample. Consequently, it is likely that our models do not adequately control for other important differences across sites that could affect impacts. Thus, as discussed, the subgroup results must be interpreted cautiously.

b. Child and Family Characteristics

Determining the extent to which Early Head Start programs benefit children and families with different personal characteristics has important policy implications, both for the operation of Early Head Start and for the development of other programs designed to serve this population. Policymakers and program staff can use findings from this subgroup analysis to improve program services and target them appropriately. Even where equity considerations prevent targeting of services, subgroup impacts could provide insights into how the program generates large or small overall impacts.

We constructed the child and family subgroups for the analysis using HSFIS data. The variables were measured at baseline (that is, prior to random assignment), because variables pertaining to the post-random assignment period are outcomes (that is, they could have been affected by Early Head Start participation) and therefore cannot be used to define valid subgroups. We selected the subgroups in consultation with ACYF and the Early Head Start Research Consortium to capture key variations in the program needs and experiences of families served by Early Head Start.

We examined the following subgroups (Table II.10 displays subgroup sample sizes):

  • Mother’s Age at Birth of Focus Child. It is likely that a number of developmental outcomes vary by the mother’s age, and the difficulty of supporting mothers in various aspects of parenting might also vary by the mother’s age. About 39 percent of mothers were teenagers when the Early Head Start focus child was born (including those born after random assignment). We created a group consisting of mothers under 20 years of age in order to have a subgroup of teenagers sufficiently large for analysis.

  • Mother’s Education. Considerable research has shown the mother’s education to be a predictor of children’s development and well-being. We created three subgroups (completion of less than 12th grade, completion of grade 12 or attainment of a GED, and education beyond high school). About half the mothers had not completed high school by the time they applied to Early Head Start, and about one-fourth were in each of the other groups.

  • Race and Ethnicity. A little more than one-third of the program applicants were white non-Hispanic, about one-third were African American non-Hispanic, and one-quarter were Hispanic. (The “other” group is too small to constitute a subgroup.)

  • Whether Mother Received AFDC/TANF Cash Assistance. As noted in Chapter I, Early Head Start began just as TANF was enacted. Issues related to public assistance and employment are of keen interest to policymakers, so it was important to examine the extent to which Early Head Start programs benefited families receiving such assistance (about 35 percent of mothers were receiving AFDC/TANF at the time they applied to their local Early Head Start program).

  • Primary Occupation. Three subgroups were used to distinguish applicants who were employed, in school or training, or neither. About 55 percent were neither working nor in school, with about 24 percent employed and 22 percent in school or training (Table II.10).

    TABLE II.10

    SUBGROUPS DEFINED BY FAMILY AND CHILD CHARACTERISTICS AT BASELINE
    Columns (left to right):
      (1) Subgroup
      Sample in All Sites:
        (2) Sample Size
        (3) Percent of Families
      Sample in Sites With at Least 10 Program Group Participants and 10 Controls in the Subgroup [a]:
        (4) Sample Size
        (5) Number of Sites
        (6) Number of Sites in 36-Month Bayley Sample

    Parent and Family Characteristics
    Mother's Age at Birth of Focus Child
         Less than 20 1,142 39 1,116 16 14
         20 or older 1,771 61 1,754 16 16
         Missing 88        
    Mother's Age at Birth of First Child
          Less than 19 1,247 42 1,247 17 14
         19 or older 1,720 58 1,691 16 16
         Missing 34        
    Mother's Education
          Less than grade 12 1,375 48 1,375 17 15
         Grade 12 or attained a GED 822 29 773 14 9
         Greater than grade 12 682 24 664 15 8
         Missing 122        
    Race and Ethnicity [b]
         White Non-Hispanic 1,091 37 1,017 11 7
         Black Non-Hispanic 1,014 35 952 10 9
         Hispanic 693 24 643 8 4
         Missing 68        
    Welfare Receipt [c]
         Received welfare 842 35 769 13 7
         Did not receive welfare 1,554 65 1,554 17 16
         Missing 41        
    Primary Occupation
         Employed 677 24 651 15 8
         In school or training 630 22 564 12 6
         Neither 1,590 55 1,590 17 16
         Missing 104        
    Primary Language
         English 2,265 79 2,265 17 16
         Other 615 21 560 9 4
         Missing 121        
    Living Arrangements
         With spouse 752 25 657 11 8
         With other adults 1,157 39 1,157 17 14
         Alone 1,080 36 1,021 14 13
         Missing 12        
    Presence of Adult Male in the Household
         Male present 1,153 39 1,145 16 15
         Male not present 1,836 61 1,836 17 17
         Missing 12        
    Random Assignment Date
         Before 10/96 1,088 36 1,062 13 10
         10/96 to 6/97 916 31 916 16 10
         After 6/97 997 33 952 15 11
         Missing 0        
    Maternal Risk Index [d]
         0 or 1 (low risk) 483 18 336 8 4
         2 or 3 (moderate risk) 1,478 55 1,478 17 16
         4 or 5 (high risk) 713 27 665 13 6
         Missing 327        
    Mother at Risk for Depression [e]
         Yes (CES-D at least 16) 617 48 617 8 7
         No (CES-D less than 16) 658 52 658 8 8
    Focus Child Characteristics
    Age
         Unborn 761 25 678 12 8
         Less than 5 months 1,063 35 1,051 16 16
         5 months or older 1,177 39 1,172 16 14
         Missing 0        
    Gender
         Male 1,510 51 1,510 17 17
         Female 1,448 49 1,448 17 17
         Missing 43        
    First Born
         Yes 1,858 63 1,858 17 17
         No 1,112 37 1,097 15 13
         Missing 31        
    Sample Size 3,001        

    SOURCE: HSFIS application and enrollment data.

    [a] Data for the subgroup analysis pertain to sites that have at least 10 program group participants and 10 control group members in the subgroup.

    [b] About 5 percent of cases (135 cases) were American Indian, Eskimo, Aleut, and Asian or Pacific Islander. Sample sizes for these groups were too small to support separate impact estimates for them.

    [c] Data pertain to families with focus children who were born at baseline.

    [d] This index was constructed by summing the number of the following risk factors that the mother faced: (1) being a teenage mother; (2) having no high school credential; (3) receiving public assistance; (4) not being employed or in school or training; and (5) being a single mother.

    [e] The CES-D was administered at baseline to sample members in eight sites only.

  • Living Arrangements. We created three categories: (1) lives with a spouse, (2) lives with other adults, and (3) lives alone. The sample is divided, with about 25, 39, and 36 percent in each of these groups, respectively.

  • Age of the Focus Child. We created three subgroups based on the age of the child at random assignment: (1) unborn, (2) under 5 months, and (3) 5 to 12 months, with 25, 35, and 39 percent of the sample in each group, respectively.

  • Gender of the Focus Child. About 50 percent of the sample children are boys and 50 percent girls.

  • Birth Order of Focus Child. About 63 percent were first-born.

  • Mother’s Risk of Depression. Local researchers in eight sites administered the CES-D at baseline. For that subset of sites, we grouped families into those in which the primary caregiver was at risk for depression (CES-D at least 16) and those in which the primary caregiver was not at risk for depression. About 48 percent of primary caregivers were at risk according to this measure.

Because many of the family subgroups are correlated with each other, we constructed a maternal risk index to reduce the dimensionality of the subgroup analysis. We defined the index as the number of risk factors that the mother faced, including (1) being a teenage mother, (2) having no high school credential, (3) receiving public assistance, (4) not being employed or in school or training, and (5) being a single mother. We created three subgroups for the impact analysis: (1) those with 0 or 1 risk factor (low risk; 18 percent of mothers); (2) those with 2 or 3 factors (moderate risk; 55 percent of cases), and (3) those with 4 or 5 factors (high risk; 27 percent of cases). Because the high and low risk groups were relatively small, we also looked at two additional subgroups: families with 0 to 2 risk factors and families with 3 to 5 risk factors.
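The construction of the maternal risk index can be illustrated with a minimal sketch. The class, field, and function names below are hypothetical and chosen only for illustration; the five risk factors and the cutoffs for the subgroups are those stated above.

```python
from dataclasses import dataclass


@dataclass
class Mother:
    # Hypothetical field names for the five baseline risk factors.
    teen_mother: bool
    no_hs_credential: bool
    receives_public_assistance: bool
    not_employed_or_in_school: bool
    single_mother: bool


def risk_index(m: Mother) -> int:
    """Count how many of the five risk factors apply (0 to 5)."""
    return sum([
        m.teen_mother,
        m.no_hs_credential,
        m.receives_public_assistance,
        m.not_employed_or_in_school,
        m.single_mother,
    ])


def risk_group(index: int) -> str:
    """Three-way grouping: 0-1 low, 2-3 moderate, 4-5 high."""
    if index <= 1:
        return "low"
    if index <= 3:
        return "moderate"
    return "high"


def risk_group_two_way(index: int) -> str:
    """Alternative two-way grouping: 0 to 2 versus 3 to 5 risk factors."""
    return "0 to 2" if index <= 2 else "3 to 5"


# Illustrative case: a teenage, single mother without a high school credential.
example = Mother(True, True, False, False, True)
example_index = risk_index(example)
example_groups = (risk_group(example_index), risk_group_two_way(example_index))
```

In this illustrative case the mother has three risk factors, so she falls in the moderate-risk group under the three-way grouping and in the 3-to-5 group under the two-way grouping.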

Estimation Issues. Random assignment simplifies estimating impacts for subgroups defined by child and family characteristics measured at the time of application to Early Head Start. Differences in the mean outcomes between program and control group members in a particular subgroup provide unbiased estimates of the impact of Early Head Start for the subgroup. For example, we estimated impacts for teenage mothers by comparing the mean outcomes of teenage mothers in the program and control groups. Similarly, we estimated impacts for female focus children by comparing the outcomes of girls in the program and control groups. We used similar regression procedures, as discussed above, to estimate impacts per eligible applicant and per participant only. We conducted statistical tests to gauge the statistical significance of the subgroup impact estimates, and the difference in impacts across levels of a subgroup.

Our primary approach was to weight each site equally in the analysis. To avoid unstable results under this approach, we included sites in a particular subgroup analysis only if their sample included at least 10 program group participants and 10 control group members in that subgroup. Most sites were included in each of the subgroup analyses, although this was not always the case (Table II.10). For example, for the full sample, only 8 sites had the requisite number of Hispanic families, only 11 had the requisite number of primary caregivers who lived with a spouse or partner, and only 12 had enough families with unborn focus children. Furthermore, fewer sites were included for outcomes constructed from data sources with lower response rates, such as the Bayley and video assessments. Thus, the subgroup results must be interpreted cautiously, because they are somewhat confounded with impacts by site.

We conducted several analyses to examine the sensitivity of the subgroup impact results to alternative estimation strategies. First, as described in the previous section, we estimated regression models in which impacts for subgroups defined by program and family characteristics were estimated simultaneously. The purpose of this analysis was to try to isolate the effects of a particular subgroup (for example, the mother’s age), holding constant the effects of other family and site features with which it may be correlated (such as education level). Second, we estimated impacts using different weighting schemes. For example, we estimated subgroup impacts where members of a subgroup from all sites were pooled, so that sites with more subgroup members were given a larger weight in the analysis than sites with fewer subgroup members. In most cases, our conclusions about impacts on subgroups defined by family and child characteristics are similar using these alternative estimation strategies. The figures presented in this report are based on our primary estimation approach discussed above.
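The site inclusion rule and the two weighting schemes can be sketched as follows. This is a simplified illustration, not the estimation procedure itself: it uses simple differences in means rather than regression-adjusted impacts, the function names and the record layout are hypothetical, and pooling over all sites without applying the inclusion rule is an assumption.

```python
from collections import defaultdict
from statistics import mean

MIN_PER_ARM = 10  # minimum program and control group members per site


def equal_weight_impact(records, min_per_arm=MIN_PER_ARM):
    """Average of site-level impacts, each included site weighted equally.

    records: iterable of (site, is_program, outcome) for ONE subgroup.
    Returns (impact, number_of_sites_included).
    """
    by_site = defaultdict(lambda: {True: [], False: []})
    for site, is_program, outcome in records:
        by_site[site][bool(is_program)].append(outcome)

    site_impacts = []
    for arms in by_site.values():
        # Inclusion rule: at least min_per_arm members in each research group.
        if len(arms[True]) >= min_per_arm and len(arms[False]) >= min_per_arm:
            site_impacts.append(mean(arms[True]) - mean(arms[False]))
    if not site_impacts:
        raise ValueError("no site meets the inclusion rule")
    return mean(site_impacts), len(site_impacts)


def pooled_impact(records):
    """Pooled alternative: larger sites implicitly receive more weight."""
    program = [y for _, is_program, y in records if is_program]
    control = [y for _, is_program, y in records if not is_program]
    return mean(program) - mean(control)


# Hypothetical data: site C is too small to enter the equal-weight analysis.
records = (
    [("A", True, 12.0)] * 10 + [("A", False, 10.0)] * 10
    + [("B", True, 14.0)] * 30 + [("B", False, 10.0)] * 30
    + [("C", True, 20.0)] * 3 + [("C", False, 10.0)] * 3
)
equal_impact, sites_used = equal_weight_impact(records)
pooled = pooled_impact(records)
```

In this hypothetical example, the equal-weight estimate averages the impacts for sites A and B only, while the pooled estimate is pulled toward site B because it contributes more families.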

c. Presentation of Results for Child, Family, and Site Subgroups

The results from the targeted analysis are presented in the same way as the results from the global analysis. We present subgroup impact results per participant for the child, parenting, and family outcomes. Focusing on the impacts per participant in the subgroup analyses is particularly important because of some subgroup differences in participation rates (see Chapter IV). For example, if participation rates were high in center-based programs and low in home-based programs (which is not the case), comparing impacts per eligible applicant would be misleading, because the impacts would be “diluted” more for the home-based programs. Thus, focusing on the impacts per participant facilitates the comparison of impacts across subgroups. As with the global analysis, however, we present impact results per eligible applicant for the service use outcomes. For all outcomes, we indicate not only whether impact estimates for each subgroup are statistically significant, but also whether the differences between impacts across levels of a subgroup are statistically significant.
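The dilution argument can be made concrete with a small sketch. It assumes the common adjustment in which the impact per participant equals the impact per eligible applicant divided by the participation rate, which in turn assumes no impact on families who did not participate. The function name and all numbers are hypothetical; the participation rates are chosen only to mirror the counterfactual example above.

```python
def impact_per_participant(impact_per_applicant: float,
                           participation_rate: float) -> float:
    """Rescale an impact per eligible applicant to an impact per participant.

    Assumes nonparticipants in the program group experienced no impact.
    """
    if not 0 < participation_rate <= 1:
        raise ValueError("participation rate must be in (0, 1]")
    return impact_per_applicant / participation_rate


# Hypothetical: both approaches have the same impact per participant (5.0),
# but participation is 95 percent in one subgroup and 60 percent in the other.
center_per_applicant = 5.0 * 0.95  # 4.75, only slightly diluted
home_per_applicant = 5.0 * 0.60    # 3.00, heavily diluted

center_adjusted = impact_per_participant(center_per_applicant, 0.95)
home_adjusted = impact_per_participant(home_per_applicant, 0.60)
```

Comparing 4.75 with 3.00 would suggest a difference between the subgroups that reflects only participation rates; after rescaling, both impacts per participant are about 5.0.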

We view the subgroup impact results by site characteristics as particularly important, and present these results in Chapter VI. We present the results for the subgroups based on family and child characteristics together in Chapter VII. The emphasis we place on various subgroups in our presentation varies, depending on the outcome variable and our hypotheses about the extent and nature of expected program impacts.

d. Impacts by Level of Service Intensity and Program Engagement

Families in the program group received different amounts of Early Head Start services. The amount and nature of services that a particular family received were determined in part by family members themselves (because Early Head Start is a voluntary program), as well as by the amount and nature of services they were offered. Thus, the level of services received by families differed both within and across programs.

An important policy issue is the extent to which impacts on key outcomes varied for families who received different levels of service intensity. Evidence that service intensity matters (that is, that impacts are larger for families who received more services than for those who received fewer services) would indicate a need to promote program retention, and might justify focusing future recruiting efforts on those groups of families who are likely to remain in the program for a significant period of time.

We took two approaches to assessing evidence that service intensity matters: (1) an indirect approach that relies on service use data for groups of families and programs and that draws on the experimental subgroup analysis, and (2) a direct approach that relies on service use data at the individual family level and employs statistical techniques to account for the fact that families were not randomly assigned to receive more or less intensive services.

For the indirect approach, we compared impacts on key child and family outcomes for subgroups of families likely to receive intensive services to impacts for subgroups that were less likely to receive intensive services. Our hypothesis is that, if impacts are generally larger for the subgroups of families who received intensive services, then these results suggest that service intensity matters. Of course, there are likely to be other factors that could explain impact differences across subgroups besides differences in the amount and types of services received. However, a consistent pattern of findings across subgroups is indicative of dosage effects. An advantage of this approach is that it uses the subgroup impact estimates, which are based on the experimental design, to assess dosage effects indirectly. In Chapter III, we discuss variations in service intensity across key subgroups, and in Chapter IV, we discuss the linkages between service intensity and impacts on child and family outcomes as we present our subgroup findings.

We also attempted to directly assess the extent to which service intensity matters by using service use data on individual families. This analysis is complicated by the fact that families were not randomly assigned to different levels of service intensity. Rather, the amount of services a family received was based on the family’s own decisions, as well as on the services offered to the family in their site. Thus, estimating dosage effects is complicated by the potential presence of unobservable differences between those families who received different amounts of services that are correlated with child and family outcome measures and are difficult to account for in the analysis. If uncorrected, this “sample selection” problem can lead to seriously biased estimates of dosage effects.

For example, we generally find that less disadvantaged families were more likely to receive intensive services than more disadvantaged families. Thus, the simple comparison of the average outcomes of program group families who received intensive services with the average outcomes of program group families who received less intensive services is likely to yield estimates that are biased upward (that is, they are too large), because the outcomes of the high service-intensity group (better-off families) probably would have been more favorable regardless of the amount of services that they received. Multivariate regression analysis can be used to control for observable differences between the high and low service-intensity families. However, there are likely to be systematic unobservable differences between the two groups, which could lead to biased regression results.22 A similar sample selection problem would exist if we were to compare high service-intensity program group families to the full control group.

As discussed in detail in Appendix D.7, we used propensity scoring procedures (Rosenbaum and Rubin 1983) as our primary approach to account for selection bias. This procedure uses a flexible functional form to match control group members to program group members based on their observable characteristics. The procedure assumes that, if the distributions of observable characteristics are similar for program group members and their matched controls, then the distributions of unobservable characteristics for the two research groups should also be similar. Under this (untestable) assumption, we can obtain unbiased impact estimates for those who received intensive services by comparing the average outcomes of program group members who received intensive services to the average outcomes of their matched controls. Similarly, impacts for those in the low service-intensity group can be obtained by comparing the average outcomes of program group families who did not receive intensive services with those of their matched controls. The two sets of impact estimates can then be compared.

In order to test the robustness of our findings using the propensity scoring approach, we also estimated dosage effects by (1) calculating, for each program group member, the difference between their 14- and 36-month outcomes (that is, the growth in their outcomes), and (2) comparing the mean difference in these growth rates for those in the low and high service-intensity groups. This “fixed-effects” or “difference-in-difference” approach adjusts for selection bias by assuming that permanent unobservable differences between families in the two service intensity groups are captured by their 14-month measures. This analysis was conducted using only those outcomes that were measured at multiple time points. The details and limitations of this approach are discussed in Appendix D.7.
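The growth comparison described above can be sketched in a few lines. The record layout, function names, and data are hypothetical, and the sketch omits any covariates or standard errors; it only shows how differencing removes a fixed gap between the two groups.

```python
from statistics import mean


def dosage_effect_did(families):
    """Difference in mean 14-to-36-month growth, high minus low intensity.

    families: list of dicts with keys 'y14', 'y36', and 'high_intensity',
    covering program group members only.
    """
    high = [f["y36"] - f["y14"] for f in families if f["high_intensity"]]
    low = [f["y36"] - f["y14"] for f in families if not f["high_intensity"]]
    if not high or not low:
        raise ValueError("both intensity groups must be non-empty")
    return mean(high) - mean(low)


def naive_gap_at_36(families):
    """Simple comparison of 36-month outcomes, ignoring earlier differences."""
    high = [f["y36"] for f in families if f["high_intensity"]]
    low = [f["y36"] for f in families if not f["high_intensity"]]
    return mean(high) - mean(low)


# Hypothetical data: high-intensity families already score higher at 14 months.
families = [
    {"y14": 100.0, "y36": 106.0, "high_intensity": True},
    {"y14": 102.0, "y36": 108.0, "high_intensity": True},
    {"y14": 90.0, "y36": 93.0, "high_intensity": False},
    {"y14": 92.0, "y36": 95.0, "high_intensity": False},
]
naive = naive_gap_at_36(families)   # 13.0, mostly a pre-existing gap
did = dosage_effect_did(families)   # 3.0, the difference in growth
```

In this hypothetical example, most of the 13-point gap at 36 months was already present at 14 months, so the growth comparison attributes only 3 points to service intensity.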

The service intensity analyses using the propensity scoring and fixed-effects approaches did not yield consistent, reliable results. Thus, we discuss these results in Appendix D.7 rather than in the main body of the report.

We estimated dosage effects using two overall measures of service intensity. First, we constructed a measure using data from the PSI and exit interviews. Families were categorized as receiving intensive services if they remained in the program for at least two years and received more than a threshold level of services. The threshold level for those in center-based sites was the receipt of at least 900 total hours of Early Head Start center care during the 26-month follow-up period. The threshold level for those in home-based sites was the receipt of home visits at least weekly in at least two of the three follow-up periods. Families categorized as receiving intensive services in mixed-approach sites were those who exceeded the threshold level for either center-based or home-based services. About one-third of program group families received intensive services using this definition.
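The first intensity measure amounts to a classification rule, sketched below. The function and parameter names are hypothetical, and the sketch assumes that "at least two years" means 24 or more months of enrollment and that the number of follow-up periods with at least weekly home visits has already been counted for each family.

```python
def received_intensive_services(approach: str,
                                months_enrolled: float,
                                center_hours: float,
                                weekly_visit_periods: int) -> bool:
    """Classify a program group family under the first intensity measure.

    approach: 'center', 'home', or 'mixed' (the site's program approach).
    center_hours: total hours of Early Head Start center care received.
    weekly_visit_periods: number of the three follow-up periods in which
        home visits occurred at least weekly.
    """
    if approach not in {"center", "home", "mixed"}:
        raise ValueError(f"unknown program approach: {approach}")
    if months_enrolled < 24:  # must remain in the program at least two years
        return False

    meets_center = center_hours >= 900
    meets_home = weekly_visit_periods >= 2

    if approach == "center":
        return meets_center
    if approach == "home":
        return meets_home
    # Mixed-approach sites: exceeding either threshold is sufficient.
    return meets_center or meets_home


# Illustrative call: a family in a mixed-approach site with few center hours
# but weekly home visits in all three follow-up periods.
mixed_example = received_intensive_services("mixed", 30, 100, 3)
```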

Second, we used a measure of program engagement provided by the sites for each family in the program group. Program staff rated each family as (1) consistently highly involved throughout their enrollment, (2) involved at varying levels during their enrollment, (3) consistently involved at a low level throughout their enrollment, (4) not involved in the program at all, or (5) involvement unknown (they could not remember how involved the family was). The 40 percent of families who were rated as consistently highly involved were considered to have received intensive services in our analysis.

There is some overlap between the two intensity measures, although there are many families who are classified as having received intensive services according to one measure but not the other. For example, about 58 percent of those classified as high dosage using the PSI measure were also classified as high dosage using the program engagement measure. Similarly, about half of those classified as high dosage using the program engagement measure were also classified as high dosage using the PSI measure.
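These overlap figures can be checked for rough consistency with the shares reported above (about one-third of families high dosage on the PSI measure, 40 percent on the engagement measure). The calculation below is a back-of-the-envelope sketch; it assumes both shares are computed over the same set of program group families, which the text does not state. Let P denote high dosage on the PSI measure and E denote high dosage on the program engagement measure.

```latex
\begin{align*}
\Pr(P \cap E) &= \Pr(E \mid P)\,\Pr(P) \approx 0.58 \times 0.33 \approx 0.19, \\
\Pr(P \mid E) &= \frac{\Pr(P \cap E)}{\Pr(E)} \approx \frac{0.19}{0.40} \approx 0.48 .
\end{align*}
```

Under that assumption, roughly 19 percent of program group families are high dosage on both measures, and the implied 48 percent is in line with the "about half" figure.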

The lack of perfect overlap between the two intensity measures reflects the different aspects of program involvement that they measure. The first measure is based on duration of enrollment and hours of center care or frequency of home visits, and reflects the quantity of services received, while the second measure captures staff assessments of families' level of involvement in program services in terms of both attendance and emotional engagement in program activities.

e. Mediated Analysis

The analyses described so far have not addressed the mechanisms whereby outcomes at one point in time (the mediators) might influence subsequent outcomes, or the extent to which impacts on mediating variables at an earlier age are consistent with impacts on later outcomes. We therefore conducted mediated analyses to examine how Early Head Start impacts on parenting outcomes when children were 2 years old are associated with impacts on children’s age 3 outcomes.

In presenting the results, we describe hypotheses based on child development theory and program theory of change that suggest age 2 parenting variables that could be expected to contribute to 3-year-old child impacts. The results of the mediated analyses permit us to estimate the extent to which the relationships between the 3-year-old child impacts and the parenting outcomes when children were 2 are consistent with the hypotheses. They suggest explanations for the impacts that Early Head Start programs produced when the children were 3 years old.

Mediated analyses serve several additional purposes:

  • They can be used to examine whether impact estimates for the evaluation are internally consistent (that is, they “make sense”) based on the theoretical relationships between mediating and longer-term outcomes.

  • Through these analyses, we provide plausible support for, or raise questions about, programs’ theories of change that suggest the programs can have an impact on children through earlier impacts on parenting behavior.

  • Program staff can use the results to focus efforts on improving mediating variables that Early Head Start has large impacts on and that are highly correlated with longer-term child outcomes. For example, if Early Head Start has a significant impact on the time that parents spend reading to their children, and if time spent reading is highly correlated with children’s language development, then policymakers could use this information to increase program efforts to promote reading.

The specific mediated analyses that we conducted, and the results from these analyses, are discussed in Chapters V and VI and Appendix D.9. The discussion in the remainder of this section focuses on the statistical procedures.

The approach to the mediated analysis can be considered a three-stage process. In the first stage, a longer-term outcome measure was regressed on mediators and other explanatory variables (moderators). In the second stage, the regression coefficient on each mediator was multiplied by the impact on that mediator. These products are what we would expect the impacts on the longer-term outcome to be, based on the relationship between the mediators and the longer-term outcome. We label them “implied” impacts. Finally, the implied impacts were compared to the actual impact on the longer-term outcome. These results indicate the extent to which impacts on the longer-term outcome variable can be partitioned into impacts due to each mediator.

Formally, we conducted the mediated analysis by first estimating the following regression model:

$$ y = \alpha_0 + \alpha_1 T + \sum_i \gamma_i M_i + X\beta + \varepsilon \tag{6} $$

where $y$ is a longer-term (36-month) outcome, $T$ is an indicator variable equal to 1 for program group members, $M_i$ is a mediating (24-month) variable, $X$ are explanatory variables (moderators), $\varepsilon$ is a mean zero disturbance term, and the other Greek letters are parameters to be estimated. The estimated parameters from this model were then used to partition the impact on $y$ (denoted by $I_y$) as follows:

$$ I_y = \alpha_1 + \sum_i \gamma_i I_{M_i} \tag{7} $$

where $I_{M_i}$ is the impact on the mediator.

In this formulation, the parameter $\gamma_i$ represents the marginal effect of a particular mediator on the longer-term outcome variable, holding constant the effects of the other mediators and moderators. That is, it represents the change in the longer-term outcome variable if the value of the mediator were increased by one unit, all else equal.23 Thus, the impact of Early Head Start on the longer-term outcome in equation (7) can be decomposed into two parts: (1) a part due to the mediators (the “implied” impacts), and (2) a part due to residual factors (represented by the parameter $\alpha_1$). Our analysis focuses on the part due to the mediators and the extent to which these implied impacts account for the impact on the longer-term outcome.
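The three stages described above can be sketched on a small synthetic data set. This is only an illustration, not the evaluation's actual estimation code: the data, the single mediator, the single moderator, the variable names (`alpha1`, `gamma`, `beta`), and the simple difference-in-means impact estimates are all assumptions made for the example.

```python
# Illustrative sketch of the three-stage mediated analysis (hypothetical data).

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def mean_diff(values, t):
    """Program-control difference in means (a simple impact estimate)."""
    prog = [v for v, ti in zip(values, t) if ti == 1]
    ctrl = [v for v, ti in zip(values, t) if ti == 0]
    return sum(prog) / len(prog) - sum(ctrl) / len(ctrl)


# Synthetic, balanced data: t is program status, x a moderator, m a mediator.
n = 110
t = [i % 2 for i in range(n)]
x = [float(i % 5) for i in range(n)]
m = [1.0 + 0.5 * t[i] + 0.1 * x[i] + ((7 * i) % 11) / 10 for i in range(n)]
y = [2.0 + 0.3 * t[i] + 0.8 * m[i] + 0.2 * x[i] for i in range(n)]

# Stage 1: regress the longer-term outcome on T, the mediator, and the moderator.
design = [[1.0, t[i], m[i], x[i]] for i in range(n)]
alpha0, alpha1, gamma, beta = ols(design, y)

# Stage 2: implied impact = coefficient on the mediator times impact on the mediator.
impact_m = mean_diff(m, t)
implied = gamma * impact_m

# Stage 3: compare the implied impact with the actual impact on the outcome.
impact_y = mean_diff(y, t)
residual = impact_y - implied
print(f"impact on y: {impact_y:.3f}, implied: {implied:.3f}, residual: {residual:.3f}")
```

In this constructed example the outcome has no noise, so the regression recovers the coefficients used to generate the data, and the residual part of the impact equals the coefficient on the program indicator. With real data the two would agree only approximately, and the cautions about bias in the mediator coefficients would apply.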

As important as the mediated analyses are, we interpret them cautiously, for a number of reasons. Like correlation coefficients, they describe relationships without necessarily attributing causality. In addition, they do not allow us to test the structural model specifying the relationships between the two sets of measures. In general, interpretations of the results of mediated analyses are difficult because of the complex relationships between the parent and child measures, and the likely bias in these estimated relationships due to simultaneity (sample selection) problems. In other words, the estimated parameter on a particular parent outcome may be capturing the effects of other factors influencing the child outcome that are not controlled for in the regression models. We interpret the results cautiously for another reason: It is likely that the estimated relationships are biased upwards (that is, they overstate the strength of the relationship), because child outcomes tend to be better in families with better parent outcomes. With these considerations in mind, our goal is to examine the broad relationships between the mediators and longer-term outcomes to suggest explanations for the impacts that Early Head Start programs produced when the children were 3 years old.

3. Criteria for Identifying Program Effects

The global and targeted analyses generated impact estimates for a very large number of outcome measures and for many subgroups. In each analysis, we conducted formal statistical tests to determine whether program-control group differences exist for each outcome measure. However, an important challenge for the evaluation is to interpret the large number of impact estimates, to assess whether, to what extent, and in which areas Early Head Start programs make a difference.

The initial guide we use to determine whether programs have had an impact on a particular outcome variable at this interim stage is the p-value associated with the t-statistic or chi-square statistic for the null hypothesis of no program impact on that outcome variable. We adopt the convention of reporting as significant only those program-control differences that are statistically significant. So that we can examine patterns of effects, we include differences significant at p<.05 and p<.01, but we also note marginally significant findings, where p<.10, when they contribute to a consistent pattern of impacts across multiple outcomes.24 However, criteria more stringent than the p-values are needed to identify “true” program impacts, because significant test statistics are likely to occur by chance (even when impacts may not exist) given the large number of outcomes and subgroups under investigation. For example, when testing program-control group differences for statistical significance at the 5 percent level, about 1 out of 20 independent tests can be expected to be significant when, in fact, no real difference exists.
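The arithmetic behind this concern can be illustrated with a short calculation. The sketch assumes that the tests are independent and that no real differences exist for any of them; the function names are chosen for this example only.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of significant results when no real differences exist."""
    return n_tests * alpha


def prob_any_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent tests is significant by chance."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n_tests in (1, 20, 100):
    print(
        n_tests,
        expected_false_positives(n_tests),
        round(prob_any_false_positive(n_tests), 3),
    )
```

With 20 independent tests at the 5 percent level, one significant result is expected by chance alone, and the probability of seeing at least one is roughly 64 percent.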

Thus, we apply several additional criteria to identify potential program impacts:

  1. We examine the magnitude of the significant impact estimates to determine whether the differences are large enough to be policy relevant. To provide a common benchmark that allows comparison across various findings that are based on different scales, we assess impacts in reference to effect size units. As noted earlier, the effect size is expressed as a percentage, calculated by dividing the magnitude of the impact by the standard deviation of the outcome variable for the control group and multiplying the result by 100.

  2. We check that the sign and magnitude of the estimated impacts and effect sizes are similar for related outcome variables and subgroups.

  3. We analyze subgroup impacts from the targeted analysis to examine whether impacts follow the pattern predicted (see below).

  4. We determine whether the sign and magnitude of the impact estimates are robust to the alternative sample definitions, model specifications, and estimation techniques discussed in this chapter.

  5. We draw on local research through discussion of findings with local researchers and include summaries of some of their research throughout the remaining chapters of this volume, and in Volume III.
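The effect size benchmark in the first criterion can be computed as in the following sketch. The scores and the impact estimate are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) rather than the population version is an assumption of this example.

```python
import statistics


def effect_size_percent(impact, control_values):
    """Impact divided by the control-group standard deviation, times 100."""
    control_sd = statistics.stdev(control_values)  # sample SD: an assumption
    return 100.0 * impact / control_sd


# Hypothetical control-group scores and a hypothetical impact estimate.
control_scores = [90.0, 100.0, 110.0]
impact_estimate = 2.0
effect_size = effect_size_percent(impact_estimate, control_scores)
print(f"Effect size: {effect_size:.1f} percent")
```

Here the control-group standard deviation is 10 points, so an impact of 2 points corresponds to an effect size of 20 percent.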

In discussing subgroup findings, we compare impacts across subgroups and focus primarily on those differences in impacts that are statistically significant according to the chi-square statistic. The chi-square test is conservative, however, so we use it as a guide rather than an absolute rule. We also discuss impacts within particular subgroups that are statistically significant or relatively large (in terms of effect sizes), without comparison to their counterpart subgroups. Some of the demographic subgroups are small, and power to detect significant differences is low. In these subgroups, especially, we note relatively larger impacts even when they are not statistically significant, in order to identify patterns of findings. In drawing conclusions from the impact estimates, we focus on patterns of impacts across outcomes, rather than giving undue emphasis to isolated impacts.

In sum, we identify program effects by examining the pattern of results rather than by focusing on isolated results. At this early stage in the evolution of Early Head Start programs, it is important to be able to see the range of potential impacts, while at the same time using rigorous criteria for interpreting meaning across the outcome areas and various subgroups that are of the greatest interest to the Head Start Bureau, other policymakers, and the hundreds of Early Head Start programs around the country.




1. Site staff reported that 10 control group families in 5 programs received Early Head Start services. One program had 4 crossovers, one program had 3 crossovers, and 3 programs had 1 crossover each.

2. The father study is supported with funding from the National Institute of Child Health and Human Development, the Ford Foundation, ACYF, and the Office of the Assistant Secretary for Planning and Evaluation.

3. Early Head Start evaluation data on the quality of child care used by families in the sample will be the subject of a special policy report.

4. Response rates to the father interviews are discussed in Appendix B.

5. The sample that completed all three interviews is used in the growth curve analysis as described later in this chapter.

6. The sample that completed the 24- and 36-month interviews is used in the mediated analysis as described later in this chapter.

7. Appendix D.2 in the interim report displays response rates by site to the 15-month PSI and the 24-month PI and Bayley and video assessments. The 24-month findings are very similar to the 36-month ones.

8. To further test the age bias, we estimated impacts separately by the age of the child at interview completion by including in the regression models explanatory variables formed by interacting child’s age with an indicator of whether the family is in the program group. These results indicate that the estimated impacts on key outcomes do not differ by the age of the child at interview completion (that is, the interaction terms are not statistically significant at the 5 percent level). Thus, we are confident that the impact estimates are not biased due to age differences of the children at interview completion.

9. The estimated standard errors of the impact estimates take into account the variance of outcomes within sites, but not the variance of impacts across sites. Thus, from a statistical standpoint, the impact estimates can be generalized to the 17 research sites only (that is, are internally valid), but not more broadly (that is, are not externally valid).

10. Appendix D presents impact estimates where sites are weighted by their sample sizes. These results are very similar to those presented in the main body of this report.

11. We imputed missing values for the explanatory variables. If an explanatory variable was missing for 5 percent of cases or less, then missing cases were assigned the mean of the explanatory variable for nonmissing cases by site, research status, and race. If an explanatory variable was missing for more than 5 percent of cases, then we set the variable equal to zero for the missing cases and included as an explanatory variable an indicator variable that was set to 1 for missing cases and to zero otherwise.

12. Several explanatory variables, however, did not pertain to some sites (Appendix Table E.II.B). For example, only 12 programs served families whose English was “poor,” so the control variable for this measure varied only for families in those 12 programs.

13. The impact estimates per participant are slightly less precise than the impact estimates per eligible applicant, because the standard errors of the impact estimates per participant must take into account the estimation error of the participation rate in each site.

14. With only three data points, it is necessary to posit a linear relationship between the outcome measure and the child’s age. With additional follow-up data, it would be possible to include quadratic age terms as additional explanatory variables in the model.

15. To increase the precision of the estimates, the growth curve models were estimated in one stage rather than two by inserting equations (3) and (4) into equation (2) and by setting the 383to zero. Generalized least squares techniques were used to estimate this regression model where the explanatory variables included a treatment status indicator variable, a variable signifying the age of the child at the interview or assessment relative to 15 months, a term formed by interacting child’s age relative to 15 months and the treatment status indicator variable, and the X variables.

16. The estimates from the growth curve model represent impacts per eligible applicant. We did not estimate impacts for participants using this approach because of the analytic complications of obtaining these impacts and their correct standard errors.

17. In particular, we select outcome measures that are continuous variables (not binary or categorical variables) and that are not age-normed.

18. We also estimated growth curve models using sample members who had available data for at least two data points by specifying a simplified (random effects) error structure in equations (2) to (4). These results are very similar to those using the sample that has three data points, and are not presented in this report. We did not use statistical procedures to impute missing outcome data for our analysis, because response rates were similar for program and control group members. Thus, we are confident that our impact estimates are unbiased. Furthermore, we were concerned that imputing a large amount of outcome data could generate biased estimates.

19. For completeness, we also present impacts on eligible applicants for selected child, parenting, and family impacts in Appendix D. These show essentially the same patterns of impacts as the analysis of impacts for participants that we present in the main body of this report. In addition, as discussed, we only present impacts on eligible applicants for the growth curve analysis.

20. We used a two-tailed test because it was not reasonable to assume a priori that Early Head Start would have only beneficial impacts on all outcomes, given that control group families could obtain other services in the community. The convention used throughout the Early Head Start evaluation reports is that * indicates p<.10, ** indicates p<.05, and *** indicates p<.01.

21. The assessment of levels of implementation is directly linked to the revised Head Start Program Performance Standards, and involved a systematic and rigorous process that is described fully in Chapter II of Leading the Way, Volume III (Administration on Children, Youth and Families 2000) and summarized in Appendix C of this report.

22. In logit regression models where the probability a family received intensive services was regressed on baseline measures from HSFIS and on site-level indicator variables, the pseudo-R² values were only about .10. Thus, service receipt decisions can be explained only in small part by observable variables.

23. For simplicity, we assume that the effect of the mediator on the longer-term outcome variable is the same for the program and control groups. This assumption can be relaxed by including in the model terms formed by interacting the mediators and the program status indicator variable.

24. The majority of significant impacts reported are significant at the .05 or .01 level, and in each set of related child or family outcomes for which we found any significant impacts, the pattern of significant impacts includes some (or all) impacts that are significant at the .01 or .05 level.

 
