Appendix 2.3: The Racial/Ethnic Composition of the Study Sample
This appendix examines the distribution of the research sample by race/ethnicity at each stage of sampling for the study, in relation to both the Head Start Program Information Report data system that served as the original starting point for sampling and the newer HSNRS. It shows how the set of newly entering Head Start children studied in this report came to differ somewhat from published information regarding the share of Head Start program participants in different racial/ethnic groups. It also demonstrates that these differences—a somewhat higher share of the age 3 cohort in the Black (non-Hispanic) category than is true for the sampling frame defining the population studied and a somewhat higher share of the age 4 cohort in the Hispanic category—are due largely to normal sampling variation in the selection of the programs, centers, and children for study.
Exhibit A.2.3.1 shows the distribution of the portion of national Head Start enrollment covered by the selected research sample at each stage of the sampling process, by race/ethnicity. It shows a small but steady increase in the percentage of Hispanic children and a small but steady reduction in the percentage of Black children (lines 1 to 10 of the exhibit). There are several causes of the apparent shift in the racial/ethnic distribution:
- Exclusion of programs and Head Start centers in communities saturated by Head Start (i.e., communities where all eligible families interested in Head Start are already served and vacancies exist);
- Chance sampling error when picking the geographically based PSUs at the beginning of the process, as well as in later selection of programs within PSU and centers within program;
- Differences in race/ethnicity reporting procedures among the PIR, HSNRS, and instruments used by the study to measure additional characteristics of individual Head Start centers (the CIF and the applicant rosters); and
- Definitional differences in the populations being compared, newly entering children versus all children served by the program.
Additional deviations occur when the sample is divided by age cohort (lines 11 to 14 of the exhibit). These reflect previously unmeasured variations in the types of children Head Start serves, particularly newly entering children, at different age levels. These various steps are discussed in more detail below.
Racial/Ethnic Distribution at the Program Level (Lines 1 to 6)
The initial sampling frame appears at the top of the exhibit: all grantees in the PIR data system for the 1998-1999 Head Start program year. These were the most recent data available when sampling began in late 2000 (when commitments to include certain sections of the country in the study had to be made to stay on schedule for the research as a whole). Data on race and ethnicity from the PIR are self-reported by agencies and do not break down Head Start enrollees by age reflective of the two analysis cohorts used in this report. As a result, the initial rows of the exhibit provide numbers for the combined group of all children potentially eligible for inclusion in the study.
Line 1 of the exhibit looks at the racial/ethnic composition of the 1,715 Head Start programs (i.e., grantees) that existed in the 1998-1999 program year, with the race/ethnicity data updated to the 1999-2000 program year where feasible,1 and with the PIR data items defined as in the PIR guidelines given to reporting agencies:
- Actual Enrollment. “The total number of children who have been enrolled in your program for any length of time, provided they have attended at least one class or, for home-based children, received at least one home visit. This includes children who have dropped out or enrolled late. Those children funded by other sources who are part of the Head Start program and receive Head Start services are to be included in the actual enrollment figures.”
- Race/Ethnicity. “Of the total actual enrollment, the number of children in the following ethnic categories: AMERICAN INDIAN OR ALASKAN NATIVE. (A person having origins in any of the original peoples of North and South America, and who maintains tribal affiliation.); ASIAN. (A person having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent.); BLACK OR AFRICAN AMERICAN. (A person having origins in any of the Black racial groups of Africa.); HISPANIC OR LATINO. (A person of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin, regardless of race.); NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER. (A person having origins in any of the original peoples of Hawaii, Guam, Samoa, or other Pacific Islands.); and, WHITE. (A person having origins in any of the original peoples of Europe, the Middle East, or North Africa.).”
Line 2 of the exhibit describes the reduced frame of 355 programs in the 25 PSUs selected for the study (see Chapter 1), weighted by the inverse of each PSU’s probability of selection. The source of enrollment data here is again the PIR. The slight increase in the percentage of Hispanic children and decrease in the percentage of Black children at this point are most likely due to chance sampling errors when selecting the 25 PSUs at random out of a much larger universe of PSUs spanning the entire United States.
Line 3 reduces the frame to just 261 programs through subsampling in the largest PSUs and exclusion of certain unusable programs. These programs are weighted to reflect the multiple steps used to arrive at this frame, i.e., by the product of (1) the inverse of the probability of selection of the PSU, (2) the inverse of the probability of a particular program’s being selected when subsampling programs in the largest three PSUs, and (3) an adjustment for excluding from the frame eight programs already involved in extensive data collection for FACES. The source of race/ethnicity data here is again the PIR. Estimates from the subsample of 261 selected programs in line 3 closely match those of the frame from which they were selected (line 2), indicating that chance sampling variation and the eight systematic exclusions did not lead to any shift away from the universe of interest.
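The weighting described for lines 2 and 3 can be illustrated with a minimal Python sketch. It assumes each program carries PIR enrollment counts by group plus the three weight components named above; the `Program` class, its field names, and all numbers are hypothetical and are not taken from the study's files.

```python
from dataclasses import dataclass


@dataclass
class Program:
    # All field names and values are hypothetical, for illustration only.
    hispanic: int            # PIR enrollment count, Hispanic
    black: int               # PIR enrollment count, Black
    other: int               # PIR enrollment count, White and other
    p_psu: float             # probability that the program's PSU was selected
    p_subsample: float       # probability of selection when subsampling (1.0 outside the largest PSUs)
    faces_adjustment: float  # adjustment for the excluded FACES programs


def program_weight(p: Program) -> float:
    """Product of the three components described for line 3."""
    return (1.0 / p.p_psu) * (1.0 / p.p_subsample) * p.faces_adjustment


def weighted_shares(programs):
    """Weighted share of enrollment in each racial/ethnic group."""
    totals = {"hispanic": 0.0, "black": 0.0, "other": 0.0}
    for p in programs:
        w = program_weight(p)
        totals["hispanic"] += w * p.hispanic
        totals["black"] += w * p.black
        totals["other"] += w * p.other
    grand_total = sum(totals.values())
    return {group: value / grand_total for group, value in totals.items()}


programs = [
    Program(hispanic=100, black=50, other=50, p_psu=0.5, p_subsample=1.0, faces_adjustment=1.0),
    Program(hispanic=20, black=120, other=60, p_psu=0.25, p_subsample=0.5, faces_adjustment=1.0),
]
shares = weighted_shares(programs)
print(shares)
```

In this toy example the second program has a weight of 8 against 2 for the first, so its enrollment dominates the weighted shares even though the two programs are the same size.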
Line 4 reflects a frame of 223 programs, dropping from line 3: (1) the programs found to be in saturated communities after screening by study staff and (2) a small number of programs that had closed since the 1998-99 program year. Programs are weighted here as in the previous line: by the product of the inverse of the probability of selection for the PSU, the inverse of the probability of selection for subsampling in the three PSUs, and an adjustment for excluding the eight FACES programs. The source of enrollment data is the PIR. After dropping the saturated and closed programs, the estimates in line 4 remain close to those in line 3 in terms of racial/ethnic composition.
Line 5 of the exhibit looks at the subsample of 90 programs selected for actual participation in the study from among the 223 identified as part of the frame at the previous step. These are weighted by the inverse of each program’s overall probability of selection through all steps in the sampling to this point. Based on race/ethnicity data from the PIR once again, the estimates from the 90 sampled programs very closely match those of the frame from which they were selected (line 4). Confidence intervals are provided for the estimates of the share of children in each racial/ethnic group, indicating the range of values that is very likely to contain the true overall population share once sampling variation is taken into account. These 95 percent confidence intervals show a fairly wide potential for true population shares to differ from the sample-based estimates, though the latter remain the single best indicators of how the population represented by the data may have (or in this case, has not) shifted as the set of programs to be studied narrowed.
Line 6 drops three additional saturated programs (ones identified as saturated only once study staff began working with the 90 sampled programs to determine the centers appropriate for inclusion in the study), two that were part of intensive data collection that same year for Head Start’s QRC program and one that had closed. Again using race/ethnicity data from the PIR, the percentage of Hispanic children in the universe represented by the data now jumps 3 percentage points, while the percentage of Black children drops by 6 percentage points. The 7 percentage-point difference in the percentage Black enrollment here compared to the first row of the table is probably not due solely to sampling error, since (from the previous step) a 95 percent confidence interval ranges just plus or minus 6 percentage points from its midpoint. Some may be sampling error, however. Still, an analysis of the programs excluded to this point because they are in saturated communities shows them to be less Hispanic than the norm, accounting for another portion of the change. (One excluded saturated program had a very large enrollment that was more than 90 percent Black.) Another contributing factor to the shift in the race/ethnicity distribution may be that the saturation adjustment to the program weights (see Appendix 1.2) does not sufficiently control for race/ethnicity, as race/ethnicity data were only collected for all Head Start enrollees.
Racial/Ethnic Distribution at the Center Level (Lines 7 to 10)
Step 7 of the process moves from programs to centers as the unit of sampling, using the CIF. This form was developed by the study and filled out jointly with program staff to identify and gather information about each center relevant to the sampling process. In total, the 84 remaining programs provided a frame of 1,423 centers. Data on race/ethnicity were collected on the CIF using the same total enrollment concept and the same race/ethnicity categories as the PIR. However, the CIF data in this row are not strictly comparable to PIR information in the previous rows for two reasons. First, the CIF collected its information 2 to 3 years later than the PIR. Second, the PIR figures include any child who attended Head Start at any time during the program year involved (1999-2000), whereas the CIF collected enrollment counts as of a single point in time (October 1, 2001).
Moreover, the CIF data are not 100 percent complete. Eight percent of the centers operated by the 84 sample programs provided no information on their Hispanic enrollment, and 9 percent were missing information on Black enrollment. Values were imputed for these cases by multiplying each center’s reported total enrollment by the average percentage Black and percentage Hispanic enrollment for other sampled centers in the same zip code, city, county, program, or PSU (moving outward geographically as far as was required to obtain available data). As a check on the accuracy of the reporting of race/ethnicity enrollments by the centers, the sum of enrollment across all race/ethnicity categories was compared to the reported total enrollment in each center and found to be fairly consistent (i.e., their sum came very close to the reported total enrollment in 95 percent of the centers involved).
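The geographic fallback used for imputation can be sketched as follows. This is only an illustration under stated assumptions: the text does not say whether the "average percentage" was a simple or enrollment-weighted mean, so the sketch uses a simple mean of center-level shares, and the record layout and field names are hypothetical.

```python
# Geographic levels, from narrowest to widest, as listed in the text.
GEO_LEVELS = ["zip", "city", "county", "program", "psu"]


def impute_share(center, others, field):
    """Mean share for `field` among other centers at the narrowest level with data.

    Assumption: a simple (unweighted) mean of center-level shares.
    Returns None if no other center at any level reports the field.
    """
    for level in GEO_LEVELS:
        peer_shares = [
            c[field] / c["total"]
            for c in others
            if c[level] == center[level] and c.get(field) is not None and c["total"] > 0
        ]
        if peer_shares:
            return sum(peer_shares) / len(peer_shares)
    return None


def impute_count(center, others, field):
    """Imputed enrollment count: reported total enrollment times the peer share."""
    share = impute_share(center, others, field)
    return None if share is None else center["total"] * share


# Hypothetical centers: A and B report Black enrollment, C does not.
center_a = {"zip": "11111", "city": "X", "county": "K", "program": "P1", "psu": "S1", "total": 100, "black": 40}
center_b = {"zip": "22222", "city": "X", "county": "K", "program": "P1", "psu": "S1", "total": 50, "black": 10}
center_c = {"zip": "33333", "city": "X", "county": "K", "program": "P1", "psu": "S1", "total": 80, "black": None}

imputed_black = impute_count(center_c, [center_a, center_b], "black")
print(imputed_black)
```

Here no other center shares center C's zip code, so the search moves outward to the city level, where the two reporting centers average 30 percent Black enrollment.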
Because the line 7 estimates are based on a complete census of all the centers in each sampled program, the sum of enrollments across centers for any program is weighted according to that program’s final weight, and no new sampling variability is added. The shift in measurement methods nonetheless results in a slight increase in the percentage Hispanic and a moderate decrease in the percentage Black enrollment. These differences appear to be within the overall sampling error of the process to this point, as indicated by the width of the 95 percent confidence intervals shown in line 7.
Line 8 drops saturated centers from the frame but makes no other changes (i.e., estimates are still weighted by the final program weight and the source of enrollment data is the CIF). This produces another slight increase in the percentage of Hispanic enrollees and a further slight decrease in the percentage of Black enrollees as compared with line 7, though enrollee differences are still well within the overall sampling error to this point. However, the total upward “creep” in percentage Hispanic enrollment from the original PIR program frame at step 1 has, by this point, reached 8 percentage points, with an offsetting downward shift in the percentage of Black enrollment of equal magnitude. The percentage of White enrollment is essentially unchanged from its starting point.
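The cumulative shift cited here can be checked directly against the exhibit. The short Python snippet below uses only the line 1 and line 8 percentages reported in Exhibit A.2.3.1; the variable names are illustrative.

```python
# Percentages copied from Exhibit A.2.3.1
# (line 1 = PIR program frame, line 8 = restricted frame of centers from the CIF).
line1 = {"hispanic": 28, "black": 37, "white_other": 36}
line8 = {"hispanic": 36, "black": 29, "white_other": 35}

# Cumulative shift in percentage points from line 1 to line 8.
shift = {group: line8[group] - line1[group] for group in line1}

for group, points in shift.items():
    print(f"{group}: {points:+d} percentage points")
```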
Line 9 estimates are based on the final sample of selected centers, 458 of the 1,254 total centers at the previous step, with each center weighted by the inverse of its overall probability of selection (incorporating sampling probabilities at the PSU and program as well as center level). As shown, the race/ethnicity distribution of the sample of centers matches the frame from which it was selected, again based on CIF data.
Line 10 shows the consequences of the removal from the sample of 80 centers because of the late discovery of saturation and closure and, in a very small number of cases, refusal by agency leadership to implement random assignment and participate in the study. Estimates are based on CIF data and weights equal to the inverse of the overall probability of selection of a given center with an adjustment to compensate for the dropped saturated and refusing centers. This adjustment inflates the weights of the 378 remaining centers to reach the same total enrollment of newly entering children (as reported on the CIF) as the original 458 centers sampled (i.e., the 378 represent all sampled centers still in operation). The net result of the deletions and adjustments is another slight increase in the percentage Hispanic and a slight decrease in the percentage Black enrollment, but again, differences at this particular step are well within the overall sampling error.
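One way to picture the weight adjustment at line 10 is as a single ratio adjustment, sketched below in Python. The text does not state whether the adjustment was applied overall or within subgroups of centers, so the single overall factor is an assumption, and the field names and numbers are hypothetical.

```python
def adjust_weights(centers):
    """Inflate the weights of retained centers so that their weighted total of
    newly entering children equals that of all originally sampled centers.

    Assumption: one overall ratio factor is applied to every retained center.
    """
    total_all = sum(c["weight"] * c["new_entrants"] for c in centers)
    total_kept = sum(c["weight"] * c["new_entrants"] for c in centers if c["retained"])
    factor = total_all / total_kept
    return {c["id"]: c["weight"] * factor for c in centers if c["retained"]}


# Hypothetical sampled centers; center C was dropped (for example, found to be saturated).
centers = [
    {"id": "A", "weight": 10.0, "new_entrants": 20, "retained": True},
    {"id": "B", "weight": 5.0, "new_entrants": 40, "retained": True},
    {"id": "C", "weight": 10.0, "new_entrants": 10, "retained": False},
]

adjusted = adjust_weights(centers)
print(adjusted)
```

In the toy data the three sampled centers represent 500 newly entering children, the two retained centers only 400, so each retained weight is multiplied by 1.25.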
Racial/Ethnic Distribution at the Child Level (Lines 11 to 14)
Line 11 shifts again to a new data source, moving progressively closer to counts of enrollees that will actually flow into the frame during child-level sampling (at step 12 below). Once agreement was reached on the exact centers selected for participation in the study and applications for the 2002-2003 program year started to be submitted, grantee and delegate agency staff, supported by the research team, began filling out rosters of all applicants. When assembled on a cumulative basis, these rosters were considered a census of all the Head Start applicants at a given center over the study’s intake period and so were weighted using the same final center weights used at the preceding step. The source of the new counts of children by race/ethnicity at this stage was the race/ethnicity field on the pre-formatted roster form, again patterned after the PIR (and hence CIF) categories. Despite using a new data source and a later program year (i.e., the upcoming 2002-2003 year, as compared with the enrollment experiences at the start of the 2001-2002 program year captured by the CIF), the figures for all children on the roster almost identically match those from the earlier CIF data in line 10. The slight shift that does occur continues the very gradual upward creep of the percentage Hispanic and the corresponding incremental decline of the percentage Black enrollments.
The rosters of applicants included information on each child’s age group, i.e., whether the child was thought by program staff to be 1 year away from kindergarten entry (the 4-year-old group) or 2 years away (the 3-year-old group). It thus became possible at this stage to conduct sampling separately for the two age groups. Line 11 of the exhibit breaks out the figures for each racial/ethnic group into separate distributions for the two age groups. Eight percent of the children on the rosters had missing data for either age or race/ethnicity and were not included in these figures. As shown in Exhibit A.2.3.1, a major shift occurs when the overall numbers for percentage Hispanic and percentage Black enrollment are broken down into separate numbers by age group: the 3-year-old group is found to be several percentage points more Black and less Hispanic than the average, and the 4-year-old group is several percentage points more Hispanic than the average. These racial/ethnic distinctions by age level have not previously been documented in national Head Start data since the two factors are not cross-tabulated in the PIR.
Another large shift in the populations described also occurs at line 12. Returning children who had already participated in Head Start (or Early Head Start) and a very small number of children considered “high-risk” by participating grantees and delegate agencies were excluded from random assignment based on information on the rosters (a check for duplicate entries further pruned the rosters). Next, using the local agency’s eligibility criteria (usually a numerical score), the list of newly entering children that would ordinarily have been enrolled was “extended” to provide for a specified number of children who would subsequently be randomly assigned to the non-Head Start group and not enrolled in the program. (The children added were those who would be “next in line” for admission based on the agency’s eligibility criteria.) This extended list became the sampling frame for actual random assignment at step 13 below. Together, the restrictions just described shrank the sampling frame on the rosters from 27,526 children to 14,439 children, with most of the deletions resulting from the exclusion of children who had previously participated in Head Start or Early Head Start.
The line 12 estimates are based on this restricted frame, with each child weighted by the final center weights consistent with the fact that the rosters constituted a census of the relevant children at each center. The source of the race/ethnicity categories is again the rosters of applicants, with the 8 percent of children missing either age or race data excluded from the calculations for Exhibit A.2.3.1 (though not from randomization). Note that, as with all applying children, the distribution of race/ethnicity for the “top priority” newly entering applicants in the 3-year-old group differs markedly from that of the 4-year-old group. The percentage Black enrollment is now very much higher for the 3-year-old group than the 4-year-old group, and the percentage Hispanic enrollment follows an equally sharp reverse pattern. For the 3-year-old group, this offsets the gradual drop in percentage Black enrollment in previous rows of the exhibit, while for the 4-year-old group it exacerbates the rise in the Hispanic enrollment. Thus, compared with the race/ethnicity distribution estimated from CIF data just two steps earlier (line 10), the 3-year-old group looks hardly any different (the percentage of Black children has risen 3 percentage points by line 12, mostly at the expense of the percentage of White children). The 4-year-old group distribution has changed radically, however, to 52 percent Hispanic and 17 percent Black enrollment compared to 38 and 28 percent at the earlier point. It is important to recall that the reference population for line 12 is the population of newly entering children, whereas the population for line 10 is the population of all Head Start enrollees.
Line 13 moves from the restricted frame of 14,439 children to the 4,747 children sampled into the Head Start and non-Head Start research groups through random assignment. The randomization algorithm allocated children in the right proportions into statistically equivalent Head Start and non-Head Start samples and into a group of children admitted to Head Start (to provide for full enrollment) but who were not included in the study. The children in the two research samples are banded together in line 13 and weighted by the inverse of each child’s overall probability of selection, including child-level sampling at random assignment and all prior stages of selection. The within-center probability of selection was approximated as the ratio of the number of sampled Head Start and non-Head Start children in each center to the newly entering enrollment for the center as a whole. This reflects actual program size rather than the artificial construct of the impact study created by all children included in the random assignment pool. Total newly entering enrollment by center was collected on the CIF as of October 1, 2001, and updated in about half the centers to reflect fall 2002 numbers. The goal was for each research sample to weight up to the national population of newly entering Head Start enrollees as of fall 2002.
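The approximation of the within-center selection probability lends itself to a small worked example. The Python sketch below assumes a child's weight is the final center weight divided by that within-center probability; the function name and the numbers are hypothetical.

```python
def child_weight(center_weight, n_sampled, new_enrollment):
    """Child weight = center weight / within-center probability of selection.

    The within-center probability is approximated, as in the text, by the ratio
    of sampled (Head Start plus non-Head Start) children to the center's
    newly entering enrollment.
    """
    if n_sampled <= 0 or new_enrollment <= 0:
        raise ValueError("counts must be positive")
    p_within = n_sampled / new_enrollment
    return center_weight / p_within


# Hypothetical center: final center weight 12, 15 sampled children,
# 45 newly entering enrollees.
w = child_weight(center_weight=12.0, n_sampled=15, new_enrollment=45)
print(w)  # approximately 36

# The 15 sampled children together weight up to the center's weighted
# newly entering enrollment (12 * 45 = 540).
print(15 * w)
```

Using enrollment rather than the size of the random assignment pool in the denominator is what makes the sampled children weight up to actual program size.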
The source of the race/ethnicity data for this population is again the application roster. At this point, about 9 percent of child records had missing age or race and were excluded from line 13. The population represented by the selected sample of study children closely matches the frame from which it was selected (line 12)—although the 3-year-old group continues to shift incrementally to greater representation of Black children and less representation of Hispanic children.
The last line in the exhibit, line 14, provides estimates of the population represented by the baseline data used in this report: the sampled Head Start and non-Head Start children for whom cognitive assessments were completed in fall 2002. Each child’s weight combines the inverse of his or her overall probability of selection from the previous step, a nonresponse adjustment to account for children who did not complete fall 2002 assessments, and, for the 4-year-old group, a poststratification adjustment to the race/ethnicity proportions for the newly entering 4-year-olds from the HSNRS. This last adjustment, which could not be done for the 3-year-old group because the HSNRS samples only 4-year-olds, is undertaken to reduce sampling error, as explained below. However, the race/ethnicity data collected by the HSNRS do not follow any type of standardized definitions; they are reported by category using definitions decided by individual grantees and may not be comparable to those of the PIR, CIF, or application roster. As before, about 9 percent of the relevant records were missing age or race information on the roster and are excluded from the calculations.
As can be seen by comparing lines 13 and 14, poststratification to the HSNRS race/ethnicity distribution significantly reduces the estimated proportion of Hispanic children and increases the proportion of Black children in the 4-year-old group. This closes most of the gap between the line 13 numbers—the starting point for poststratification—and the control totals for newly entering 4-year-olds shown at the bottom of the exhibit and provided by the HSNRS. Differences remain, however, because the data were not poststratified directly to the overall national distribution for newly entering 4-year-olds from the HSNRS. Rather, for each race/ethnicity category, we poststratified to the ratio of the HSNRS percentage for all programs reporting in the HSNRS to the study sample percentage for 84 programs, using HSNRS first year enrollment data. This poststratification adjustment does not remove real differences in concepts and measurement between the two data sources but is intended to reduce the PSU and program component of sampling error (the change from line 1 to line 5 in the Exhibit).
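The ratio form of the poststratification adjustment can be sketched as follows. This is a simplified illustration, not the study's actual procedure: it assumes one factor per race/ethnicity category, equal to the HSNRS percentage for all reporting programs divided by the HSNRS percentage for the 84 sample programs, and every number and name below is made up.

```python
def poststrat_factors(pct_all_programs, pct_sample_programs):
    """One factor per category: HSNRS share for all programs / HSNRS share for sample programs."""
    return {c: pct_all_programs[c] / pct_sample_programs[c] for c in pct_all_programs}


def apply_factors(children, factors):
    """Return copies of the child records with weights scaled by their category's factor."""
    return [{**ch, "weight": ch["weight"] * factors[ch["race"]]} for ch in children]


def weighted_shares(children):
    """Weighted share of children in each race/ethnicity category."""
    totals = {}
    for ch in children:
        totals[ch["race"]] = totals.get(ch["race"], 0.0) + ch["weight"]
    grand_total = sum(totals.values())
    return {race: value / grand_total for race, value in totals.items()}


# Hypothetical HSNRS shares (not the study's figures).
pct_all = {"hispanic": 0.30, "black": 0.30, "other": 0.40}     # all reporting programs
pct_sample = {"hispanic": 0.40, "black": 0.25, "other": 0.35}  # the sampled programs

factors = poststrat_factors(pct_all, pct_sample)

# Hypothetical 4-year-old records, one aggregate record per category for brevity.
children = [
    {"race": "hispanic", "weight": 50.0},
    {"race": "black", "weight": 20.0},
    {"race": "other", "weight": 30.0},
]

before = weighted_shares(children)
after = weighted_shares(apply_factors(children, factors))
print(before)
print(after)
```

Because the factors correct only for how the sampled programs differ from all programs, the adjusted shares move toward the national distribution without being forced to equal it, which is consistent with the remaining differences noted above.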
Since the poststratification adjustment closed most of the gap, we learn from this procedure that the difference in the racial/ethnic composition of the 4-year-old group is partially due to sampling error from the sampling of PSUs and programs. Differences in race/ethnicity reporting procedures among the PIR, the HSNRS, and the NHIS and in the populations being compared (newly entering vs. all children) also contribute to the differences observed. These differences do not necessarily indicate that there is a systematic bias in the NHIS sample with respect to race/ethnicity. Presumably the same is true of the 3-year-old group, which was generated in precisely the same manner at every step of the process described in this appendix.
| Data Source - Units Measured | Observations Examined | Percent Hispanic | 95 Percent Confidence Interval | Percent Black | 95 Percent Confidence Interval | Percent White and Other |
|---|---|---|---|---|---|---|
| 1. PIR - Total Enrollment | All programs in the National PIR Data System (N=1,715) | 28% | | 37% | | 36% |
| 2. PIR - Total Enrollment | Frame of programs in selected PSU's (N=355) | 30% | | 38% | | 32% |
| 3. PIR - Total Enrollment | Subsample of programs in selected PSU's (N=261) | 30% | | 38% | | 32% |
| 4. PIR - Total Enrollment | Restricted frame of programs (less saturated) (N=223) | 31% | | 38% | | 30% |
| 5. PIR - Total Enrollment | Selected programs [grantees/delegate agencies] (N=90) | 31% | [25%,36%] | 39% | [33%,45%] | 30% |
| 6. PIR - Total Enrollment | Final sample of programs (N=84) | 34% | | 33% | | 33% |
| 7. CIF - Total Enrollment | Frame of centers (N=1,423) | 35% | [30%,40%] | 30% | [24%,36%] | 34% |
| 8. CIF - Total Enrollment | Restricted frame of centers (less saturated) (N=1,254) | 36% | [31%,41%] | 29% | [24%,34%] | 35% |
| 9. CIF - Total Enrollment | Selected centers (N=458) | 36% | [30%,42%] | 29% | [23%,35%] | 35% |
| 10. CIF - Total Enrollment | Selected centers where RA was done (N=378) | 38% | [32%,44%] | 28% | [22%,34%] | 34% |
| 11. Roster - All Applicants | Frame of children (including exempt) (N=27,562) | 39% | | 27% | | 34% |
| | All 3-Year-Old Group | 35% | | 31% | | 34% |
| | All 4-Year-Old Group | 42% | | 25% | | 33% |
| 12. Roster: Nonexempt, Newly entering Applicants | Restricted frame of children (N=14,439) | 44% | | 24% | | 32% |
| | Newly Entering 3-Year-Old Group | 37% | | 31% | | 32% |
| | Newly Entering 4-Year-Old Group | 52% | | 17% | | 31% |
| 13. Roster: Nonexempt, Newly entering Applicants | Sampled children (N=4,747) | 43% | | 25% | | 32% |
| | Newly Entering 3-Year-Old Group | 34% | | 33% | | 33% |
| | Newly Entering 4-Year-Old Group | 53% | | 16% | | 31% |
| 14. Roster: Nonexempt, Newly entering Applicants | Poststratification (N=3,723 child respondents) | 37% | | 30% | | 33% |
| | Fall 2002 NHIS - Newly Entering 3-Year-Old Group | 33% | | 35% | | 32% |
| | Fall 2002 NHIS - Newly Entering 4-Year-Old Group w/ poststratified weight | 43% | | 24% | | 33% |
| HSNRS | Newly Entering 4-Year-Old Group | 36% | | 25% | | 39% |
| HSNRS | Returning 4-Year-Old Group | 26% | | 35% | | 39% |
1 Updates were not made for programs absent from the 1999-2000 PIR data or with missing data that year.

