Department of Health and Human Services
Administration for Children and Families
          
Office of Planning, Research & Evaluation (OPRE)

Appendix

Methodology

I. Sample Design Overview

The third cohort of Head Start children for FACES was selected as a two-stage sample.28  The first stage sampling units were Head Start programs; the second stage units were classes within sampled programs. In each sampled classroom, all eligible children in their first year of Head Start were taken into the sample.

A. Programs

The sampling frame of eligible Head Start programs was constructed from the 1998-1999 Program Information Report (PIR). Migrant and Seasonal Head Start programs, American Indian/Alaska Native Head Start programs, Early Head Start programs, programs in the territories, and programs that do not serve children directly were excluded, resulting in a frame of 1,675 programs. The programs were stratified by Census region (NE, NC, S, W), percent minority (above/below 50%), and metro or urban/rural status (MSA/non-MSA). These are the same stratification variables used in sampling programs for the first (spring 1997) and second (fall 1997 – spring 2000) FACES cohorts.

A sample of 45 programs was selected in spring 2000. The sample size in each stratum was proportional to the stratum first year Head Start enrollment. The programs were selected with probability proportional to the program’s first year enrollment using systematic sampling. The first year enrollment was calculated from the PIR by subtracting the reported second and third year enrollment from the total enrollment. A Keyfitz procedure was used to minimize the overlap with the 40 programs sampled previously by Abt Associates for the first and second FACES cohorts. As a result, there was no overlap with the previous program sample. Of the 45 programs selected for the third cohort, two were later discovered to be ineligible because they had been defunded.
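The selection step described above can be illustrated with a minimal Python sketch of systematic sampling with probability proportional to size. It is not the study's actual procedure: the function names, the random start, and the treatment of oversized units are assumptions, and the Keyfitz overlap control and the stratum-by-stratum allocation are omitted.

```python
import random

def first_year_enrollment(total, second_year, third_year):
    """First year enrollment derived from PIR counts, as described in the text."""
    return total - second_year - third_year

def pps_systematic_sample(sizes, n, rng=None):
    """Select n unit indices with probability proportional to size.

    Assumes no unit is larger than the sampling interval (no certainty
    selections); that restriction is a simplification made for this sketch.
    """
    rng = rng or random.Random()
    total = float(sum(sizes))
    interval = total / n
    if max(sizes) > interval:
        raise ValueError("a unit exceeds the sampling interval; handle certainty units first")
    next_point = rng.random() * interval  # random start in [0, interval)
    selected = []
    cumulative = 0.0
    for index, size in enumerate(sizes):
        cumulative += size
        # A unit is selected when a selection point falls inside its size range.
        if len(selected) < n and next_point < cumulative:
            selected.append(index)
            next_point += interval
    return selected
```

Under this scheme each unit's probability of selection is n times its size divided by the total size in the stratum, which is the quantity later inverted to form the program weight.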

B. Classrooms

In the 43 remaining programs, lists of the anticipated classes for fall 2000 were obtained in late summer 2000. The programs also provided the expected number of first year Head Start children in each class. These lists formed the basis for the classroom sampling frame, after excluding classes with no first year children. Classes with fewer than five first year children expected were combined with another class in the same center to form a “class group.” The class groups were treated as a single unit for sampling purposes and sample size calculations. The total target sample size of first year children was 2,825, or 66 per program. In general, the desired sample size of classes in each program was determined as 66 / (average class size for the program), where the average class size was in terms of number of first year children. The actual initial sample size was increased by 2 classes to allow for a reserve sample in each program. In programs where the total first year enrollment as obtained from the class rosters was more than twice the measure of size used to sample the program, the initial class sample size was increased to prevent large variation in class weights. In small programs where the initial sample size exceeded the number of classes available, all classes were taken with certainty.
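The class sample size rule for a single program can be sketched in Python as follows. The rounding rule (rounding up) and the function name are assumptions, since the text does not state them. The sketch leaves out the formation of class groups and the increase applied to programs whose roster enrollment was more than twice the measure of size.

```python
import math

def initial_class_sample_size(first_year_counts, target_children=66, reserve=2):
    """Initial number of classes (or class groups) to sample in one program.

    first_year_counts holds the expected number of first year children in
    each class or class group on the program's frame.
    """
    n_classes = len(first_year_counts)
    average_size = sum(first_year_counts) / n_classes
    desired = math.ceil(target_children / average_size)  # rounding up is an assumption
    initial = desired + reserve
    # In small programs every class is taken with certainty.
    return min(initial, n_classes)
```

For example, a program with 20 classes averaging 10 first year children would need 7 main sample classes plus 2 reserve classes under these assumptions.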

Classes were sorted by center within program and were sampled with equal probabilities. A subsample of the initial sample was selected with equal probabilities to obtain a main sample of the desired sample size and a reserve sample of two classes in each program. A total of 367 classes were selected: 279 classes for the main sample and 88 for the reserve sample. The number of main sample classes in each program varied from 3 to 15, with an average of 6. (In terms of collapsed classrooms, a total of 252 classroom groups were sampled for the main sample and 82 for the reserve sample, for an average of 6 per program and a range of 3 to 10.)

In fall 2000 the eligibility status of the main sample classes was determined. One or two reserve classes were added in some programs to prevent a shortfall in the target number of first year children for the study. The final sample for weighting purposes included all sampled classes where an attempt was made to collect data from the classroom, including those discovered to be ineligible. The rationale is that ineligible classes in the sample represent ineligible classes on the Head Start program frame. A total of 307 main and reserve classes were in the sample in fall 2000. Twenty of the 307 classes were discovered to be ineligible when the program was contacted by field staff in fall 2000 because they no longer existed, did not receive Head Start funding, or had no first year Head Start children. In 286 of the remaining 287 eligible classes, all first year children were taken into the sample; in the one remaining class, the teacher refused to allow the children in her class to be sampled.

II. Response Rates

Fall 2000

  • 2,508 child assessments were completed out of 2,790 for a completion rate of 90 percent.

  • 2,488 parent interviews were completed out of 2,790 families selected for the sample (89 percent).

  • Teacher report forms were obtained on 2,532 of the sample children (91 percent).

  • Assessment, parent, and teacher data were obtained on 2,396 of the 2,790 sample children (86 percent).

  • A total of 278 classrooms were observed out of 286 in the sample for a completion rate of 97 percent.

Spring 2001

  • 2,232 child assessments were completed out of 2,288, representing 98 percent of the children who remained in the program, and 80 percent of the original sample (2,790).

  • 2,166 parent interviews were completed out of 2,288, representing 95 percent of the children who remained in the program, and 78 percent of the original sample.

  • Teacher report forms were obtained on 2,236 of the sample children, representing 98 percent of the children who remained in the program and 80 percent of the original sample.

  • Assessment, parent, and teacher data were obtained on 2,115 of the 2,288 sample children who remained in the program (92 percent).

  • A total of 275 classrooms were observed out of 284 in the sample for a completion rate of 97 percent.

Spring 2002 (Kindergartners Only)

  • 831 child assessments were completed out of 979, representing 85 percent of the children who were in kindergarten in spring 2002.

  • 901 parent interviews were completed out of 979, representing 92 percent of the children who were in kindergarten in spring 2002.

  • Teacher report forms were obtained on 681 of the children, representing 70 percent of the children who were in kindergarten in spring 2002.

  • Assessment, parent, and teacher data were obtained on 624 of the 979 children who were in kindergarten in spring 2002 (64 percent).

III. Program Weights

The program weight was calculated as the inverse of the program’s probability of selection. As mentioned earlier, a Keyfitz procedure was used to minimize the overlap with the cohort 1 and 2 program sample drawn earlier by Abt Associates. This procedure involved calculating conditional probabilities of selection, which are based on whether the program was sampled previously or not, and whether its probability of selection increased compared with the previous sample. Prior to sampling for cohort 3, the unconditional probability of selection for each program on the cohort 3 frame was calculated as

NEWPSELi = NEWSMPSZh * FIRSTYRi / (sum of FIRSTYRj over the Nh programs j on the frame in stratum h)

where Nh is the number of programs on the frame in stratum h, NEWSMPSZh is the sample size for stratum h for the cohort 3 design, and FIRSTYRi is the first year enrollment for program i from the PIR. The probability of selection for each program under the Abt sample design for cohorts 1 and 2 was also calculated as

ORIGPSELi = SAMPSIZEh * ENRTOTi / (sum of ENRTOTj over the Nh programs j on the Abt frame in stratum h)

where Nh was the number of programs on the Abt frame in stratum h, SAMPSIZEh was the sample size for stratum h under the Abt design, and ENRTOTi was the total enrollment for the i-th program from an earlier PIR.

The conditional probability of selection was calculated for each program on the cohort 3 frame according to the Keyfitz procedure as:

Case 1: NEWPSEL ≥ 1 - ORIGPSEL

CONDPROB = (NEWPSEL + ORIGPSEL - 1) / ORIGPSEL if the program was sampled for cohorts 1 and 2,

CONDPROB = 1 if the program was not sampled for cohorts 1 and 2.

Case 2: NEWPSEL < 1 - ORIGPSEL

CONDPROB = 0 if the program was sampled for cohorts 1 and 2,

CONDPROB = NEWPSEL / (1 - ORIGPSEL) if the program was not sampled for cohorts 1 and 2.

These conditional probabilities of selection were the measures of size used to select the cohort 3 program sample. It can be shown that the Keyfitz procedure preserves the unconditional cohort 3 program probabilities of selection, while at the same time minimizing the overlap. Thus the cohort 3 program weight is the inverse of NEWPSEL, the unconditional probability of selection under the cohort 3 design.
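The claim that the unconditional probability is preserved can be checked numerically. The Python sketch below is an illustration only. The two closed-form expressions are not quoted from the study's code; they are the values implied by requiring ORIGPSEL * P(selected | previously sampled) + (1 - ORIGPSEL) * P(selected | not previously sampled) = NEWPSEL while making the first conditional probability as small as possible. The function names are hypothetical, and ORIGPSEL is assumed to lie strictly between 0 and 1.

```python
def keyfitz_conditional_prob(new_psel, orig_psel, previously_sampled):
    """Conditional probability of selection that minimizes overlap.

    Assumes 0 < orig_psel < 1 and 0 <= new_psel <= 1.
    """
    if new_psel >= 1 - orig_psel:
        # Case 1: overlap cannot be avoided entirely.
        if previously_sampled:
            return (new_psel + orig_psel - 1) / orig_psel
        return 1.0
    # Case 2: previously sampled programs can be excluded altogether.
    if previously_sampled:
        return 0.0
    return new_psel / (1 - orig_psel)

def unconditional_prob(new_psel, orig_psel):
    """Average the conditional probabilities over the earlier sampling outcome."""
    if_sampled = keyfitz_conditional_prob(new_psel, orig_psel, True)
    if_not_sampled = keyfitz_conditional_prob(new_psel, orig_psel, False)
    return orig_psel * if_sampled + (1 - orig_psel) * if_not_sampled
```

In both cases the averaged probability returns NEWPSEL, which is why the program weight can be taken as the inverse of NEWPSEL.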

All 43 eligible programs cooperated with the study, so that nonresponse adjustments at the program level were unnecessary.

For each program, a set of 43 jackknife replicate weights was created for calculating standard errors. The replicate weights were created using a standard stratified jackknife procedure. One program at a time was dropped (i.e. given a zero replicate weight) and the weights of the remaining programs in the same stratum were adjusted by a factor of nh/(nh-1), where nh is the number of sampled programs in stratum h. The program weights in the other strata were left unchanged. By repeating this 43 times, 43 replicate weights were obtained for each program. For estimates involving child or classroom data from all 43 programs, the degrees of freedom for the variance of the estimate is #PSUs - #varstrat = 43 - 12 = 31. (One of the 13 original sampling strata was collapsed with an adjacent stratum for variance estimation purposes because it contained only one eligible sampled program.)
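The replicate weight construction can be sketched in Python as follows. The function names are hypothetical. The variance function uses the usual stratified jackknife (JKn) estimator, the sum over replicates of (nh - 1)/nh times the squared deviation of the replicate estimate from the full-sample estimate. That estimator is an assumption about how the replicates were meant to be used, since the text does not spell it out. Every variance stratum must contain at least two programs, which is why the single-program stratum was collapsed.

```python
def jackknife_replicate_weights(weights, strata):
    """Return one replicate weight vector per program (drop-one stratified jackknife)."""
    counts = {}
    for stratum in strata:
        counts[stratum] = counts.get(stratum, 0) + 1
    replicates = []
    for dropped, dropped_stratum in enumerate(strata):
        n_h = counts[dropped_stratum]
        replicate = []
        for i, (weight, stratum) in enumerate(zip(weights, strata)):
            if i == dropped:
                replicate.append(0.0)                      # dropped program
            elif stratum == dropped_stratum:
                replicate.append(weight * n_h / (n_h - 1))  # same stratum: inflate
            else:
                replicate.append(weight)                    # other strata unchanged
        replicates.append(replicate)
    return replicates

def jackknife_variance(full_estimate, replicate_estimates, strata):
    """JKn variance: replicate j corresponds to dropping program j in strata[j]."""
    counts = {}
    for stratum in strata:
        counts[stratum] = counts.get(stratum, 0) + 1
    return sum(
        (counts[s] - 1) / counts[s] * (estimate - full_estimate) ** 2
        for estimate, s in zip(replicate_estimates, strata)
    )
```

With 43 programs in 12 variance strata this yields 43 replicates, and the degrees of freedom are the number of programs minus the number of strata.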

A. Classroom Weighting

Two sets of class weights were produced for classroom level estimation: one set for fall 2000 cross-sectional estimates and a second set for fall 2000 – spring 2001 longitudinal classroom analysis. Class base weights were first created that reflected the overall probability of selection for the class, including the program probability of selection. These base weights were adjusted for classroom level nonresponse, using the following criteria for a complete classroom:

Fall 2000 cross-sectional estimates: the classroom must have complete fall 2000 observation data. Classroom observation data include counts of children and adults, Assessment Profile (Scheduling, Learning Environment, and Individualizing), ECERS-R, Arnett Caregiver Interaction Scale, Teacher-Directed Activities Checklist and Wrap-Up measures.

Fall 2000-Spring 2001 longitudinal analysis: the classroom must have complete observation data for either fall 2000 or spring 2001 and child assessment data for both fall 2000 and spring 2001.

A1. Class Base Weights
A class base weight was created for each of the 367 initially sampled classes in fall 2000. Fifty-four reserve classes that were never used were given base weights of zero. In six cases, a teacher had both a morning and an afternoon class in the main sample, and field staff subsampled one class from the pair on an ad hoc basis to reduce burden and to maintain independence between classes. The six classes that were sampled out were assigned base weights of zero, since they were not part of the final sample.

The remaining 307 classes considered to constitute the sample were each assigned a class base weight equal to the inverse of their overall probability of selection. The overall probability of selection is the product of the program probability of selection and the probability of selecting the class within the program. The inverse of the overall probability of selection can also be written as the product of the program weight and the within-program class weight:

Class Base Weight = Program Weight * (Total # Classes in Program / # sampled classes fielded)

Collapsed classrooms were counted as one classroom in the base weight calculations, since they were treated as a single unit in sampling. The ad hoc subsampling was reflected by multiplying the base weight of the retained class in the am/pm pair by a factor of 2 and the dropped class by zero. One class that had merged with another was given a zero base weight, and the newly merged class had its base weight multiplied by a factor of .5 to reflect its increased probability of selection.
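The base weight formula and its special-case factors can be written as a short Python sketch. The function and parameter names are hypothetical; the factors of 2, 0, and .5 are those given in the text.

```python
def class_base_weight(program_weight, total_classes, fielded_classes, adjustment=1.0):
    """Class base weight = program weight * within-program class weight * adjustment.

    adjustment is 2.0 for the retained class of a morning/afternoon pair,
    0.0 for a dropped or unused class, 0.5 for a newly merged class,
    and 1.0 otherwise.
    """
    within_program_weight = total_classes / fielded_classes
    return program_weight * within_program_weight * adjustment
```

For instance, in a program with a weight of 40 where 6 of 12 classes were fielded, an ordinary class would receive a base weight of 80 and the retained class of a subsampled pair would receive 160.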

Forty-three jackknife class replicate base weights were created from the program replicate weights:

Class Replicate Base Weight j = Program Replicate Weight j * (Total #Classes in Program / #sampled classes fielded); j = 1, 2, …43.

A2. Cross-sectional Fall 2000 Class Weights
Of the 307 sampled classes that were fielded in fall 2000, 279 were eligible and had complete classroom data, 8 were eligible but didn’t complete data collection, and 20 were discovered to be ineligible. A class nonresponse adjustment factor was applied to the class base weights of the 279. The nonresponse adjustment factor was computed separately by program. Both the 8 incomplete and 20 ineligible classes were given a zero final class weight. The classroom replicate base weights were also adjusted for nonresponse by program, so that the sampling variability in the nonresponse adjustments will be reflected in the standard error estimates.
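A weighting-class nonresponse adjustment of this kind can be sketched in Python. The adjustment factor used here, the sum of base weights of all eligible cases in a cell divided by the sum of base weights of the complete cases in that cell, is the conventional form and is an assumption; the text states only that the factor was computed by program. The record layout and names are hypothetical.

```python
from collections import defaultdict

def nonresponse_adjust(records):
    """Return final weights, one per record, in the order given.

    Each record is a dict with keys "cell" (e.g. a program ID),
    "base_weight", and "status" ("complete", "incomplete", or "ineligible").
    """
    eligible_total = defaultdict(float)
    complete_total = defaultdict(float)
    for record in records:
        if record["status"] in ("complete", "incomplete"):
            eligible_total[record["cell"]] += record["base_weight"]
        if record["status"] == "complete":
            complete_total[record["cell"]] += record["base_weight"]

    final_weights = []
    for record in records:
        if record["status"] == "complete":
            cell = record["cell"]
            factor = eligible_total[cell] / complete_total[cell]
            final_weights.append(record["base_weight"] * factor)
        else:
            # Incomplete and ineligible cases receive a zero final weight.
            final_weights.append(0.0)
    return final_weights
```

The adjusted weights of the complete cases in a cell sum to the base weights of all eligible cases in that cell. Applying the same function to each set of replicate base weights carries the variability of the adjustment into the standard errors.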

The sum of the nonresponse-adjusted fall 2000 classroom weights is 34,638. The unweighted and weighted completion rates are both 97%, excluding ineligibles from both numerator and denominator. The unweighted and weighted eligibility rates are both 94%. The class base weight was used in calculating the weighted rates.

A3. Longitudinal Fall 2000 - Spring 2001 Class Weights
Of the 286 eligible classes in fall 2000, 280 completed data collection in spring 2001. Note that the 279 fall 2000 classroom completes are not a subset of the 280 spring 2001 completes. Five classes that completed fall 2000 data collection did not complete spring 2001 data collection, and six classes that completed spring 2001 data collection did not complete fall 2000 data collection. There were 79 new classes added in spring 2001 because children who switched classes after the fall 2000 data collection were followed to the new class. However, no classroom observations were done at these new classes, so they were not considered to be part of the classroom sample and were assigned a zero base weight.

A class nonresponse adjustment factor was applied to the class base weights of the 280 eligible completes. The nonresponse adjustment factor was computed separately by program. The classroom replicate base weights were also adjusted for nonresponse by program, so that the sampling variability in the nonresponse adjustments will be reflected in the standard error estimates. The incomplete and ineligible classes, along with the 79 new classes, were given a zero final class weight.

The sum of the nonresponse-adjusted fall 2000–spring 2001 classroom weights is 34,768. The unweighted and weighted completion rates are both 98%, excluding ineligibles from both numerator and denominator. Both unweighted and weighted eligibility rates are 94%. The class base weight was used in calculating the weighted rates.

B. Child Weights

Two sets of child weights were produced: a cross-sectional set for fall 2000 estimates, and a fall 2000 -spring 2001 set for longitudinal analysis. Child base weights were first created that reflected the overall probability of selection for the child, including the program and classroom stages of sampling. These base weights were adjusted for child nonresponse, using the following criteria for a complete child case:

Fall 2000 cross-sectional analysis: a child is considered a complete case if the child has a parent interview from either fall 2000 or spring 2001, and a fall 2000 child assessment or teacher rating.

Fall 2000-Spring 2001 longitudinal analysis: a child is considered a complete case if the child has either a fall 2000 or spring 2001 parent interview, and one of the following data pairs: a child assessment for both fall 2000 and spring 2001, or a teacher rating for both fall 2000 and spring 2001.

B1. Child Base Weights
In 286 eligible fall 2000 classes, all eligible children in their first year of Head Start were taken into the sample with certainty. A base weight was created for each child as the product of their program weight and nonresponse-adjusted classroom weight. Note that these nonresponse-adjusted class weights are not the same as those described earlier, which were designed for use in classroom level analysis. The creation of special classroom weights for the child weights was necessary because there were eligible classrooms that did not have complete classroom observations, but did allow their children to be sampled, and vice versa. To create this special classroom weight, the classroom base weight was adjusted for classes that had eligible children but where “sampling” of children did not take place. This nonresponse-adjusted classroom weight was then used in calculating the child base weight. Since there was no subsampling of children within classrooms, the within-classroom child weight is equal to one, and the overall child weight can be written as:

Child Base Weight = Program Weight * Nonresponse-adjusted Classroom Weight.

A set of 43 jackknife (JKn) replicate base weights was also created for each child using the program replicate weights and the special full-sample nonresponse-adjusted classroom weight:

Child Replicate Base Weight j = Program Replicate Weight j * Nonresponse-adjusted Classroom Weight; j = 1, 2, …43.

B2. Child Fall 2000 Cross-Sectional Weights
Of the 3,100 children in the fall 2000 sample, 2,535 were considered completes for the fall 2000 data collection, 251 were eligible but incomplete (30 of these had assessments but no parent interview), and 314 were ineligible. Children could be ineligible if they came from classrooms that were ineligible, or they were discovered to be in their second year of Head Start, or were otherwise ineligible when fall 2000 data collection began.

The child base weights of the eligible, complete children in each classroom were adjusted for nonresponse separately by classroom. The ineligible and incomplete children were given a zero final child weight and were dropped from the sample for the spring 2001 data collection. The replicate child base weights were also adjusted for nonresponse by classroom, so that the sampling variability in the nonresponse adjustments will be reflected in the standard error estimates.

The sum of the nonresponse-adjusted fall 2000 child weights is 337,247. The unweighted and weighted completion rates are both 91%, excluding ineligibles from both the numerator and denominator. The unweighted and weighted eligibility rates are 90% and 91%, respectively. The child base weight was used in calculating the weighted rates.

B3. Child Fall 2000-Spring 2001 Longitudinal Weights
In spring 2001 the eligible first year children were again given assessments and teacher ratings, and an attempt was made to interview each child’s parent(s). Of the 2,535 eligible children who had completed fall 2000 data collection, 2,359 were eligible, complete cases for the fall 2000 – spring 2001 data collection; 171 were eligible, incompletes; and five became ineligible because they moved out of the area.

Children who had switched to new classes in the spring 2001 were followed up, but classroom observations were not done at the new classes. There were 91 children from the fall 2000 sample who were followed to 79 new classrooms in spring 2001. In calculating their base weights, these children were given the classroom probability of selection associated with the classroom from which they were originally sampled in fall 2000.

The child base weights of the eligible, complete children in each classroom were adjusted for nonresponse separately by classroom. The ineligible and incomplete children were given a zero final child weight. The replicate child base weights were also adjusted for nonresponse by classroom, so that the sampling variability in the nonresponse adjustments will be reflected in the standard error estimates.

The sum of the nonresponse-adjusted fall 2000–spring 2001 child weights is 338,047. The unweighted and weighted conditional spring 2001 completion rates are both 93%. The conditional rate is the percent of fall 2000 eligible completes who also completed the spring 2001 data collection. The overall (unconditional) completion rate is the product of the completion rates for the fall 2000 and spring 2001 data collections: 91% * 93% = 85%. This rate is the percent of eligible, sampled children in fall 2000 who completed the spring 2001 data collection.
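The unweighted versions of these rates can be reproduced from the case counts reported in this section, as the following Python sketch shows. The function names are hypothetical. Excluding the five spring 2001 ineligibles from the conditional rate mirrors the fall 2000 treatment and is an assumption; the rounded rate is the same either way.

```python
def completion_rate(complete, incomplete):
    """Completes as a share of eligible cases (ineligibles excluded)."""
    return complete / (complete + incomplete)

def eligibility_rate(complete, incomplete, ineligible):
    """Eligible cases as a share of all sampled cases."""
    return (complete + incomplete) / (complete + incomplete + ineligible)

# Counts reported in the text.
fall_rate = completion_rate(complete=2535, incomplete=251)
fall_eligibility = eligibility_rate(complete=2535, incomplete=251, ineligible=314)
spring_conditional_rate = completion_rate(complete=2359, incomplete=171)
overall_rate = fall_rate * spring_conditional_rate

print(round(100 * fall_rate), round(100 * fall_eligibility),
      round(100 * spring_conditional_rate), round(100 * overall_rate))
```

The weighted rates reported in the text replace the counts with sums of child base weights.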

IV. Data Collection Instruments

A. Direct Child Assessment

A1. Peabody Picture Vocabulary Test - Third Edition - Revised
The Peabody Picture Vocabulary Test (PPVT-III) (Dunn & Dunn, 1997) is designed to assess children’s knowledge of the meaning of words by asking them to say or indicate by pointing which of four pictures best shows the meaning of a word that is said aloud by the assessor. A series of words is presented, ranging from easy to difficult for children of a given age, each accompanied by a picture plate consisting of four line drawings. The test requires about 10 minutes to administer. It is suitable for a wide range of ages from 2 ½ through adulthood and has established age norms based on a national sample of 2,725 children and adults tested at 240 sites across the U.S.

The PPVT-III has been extensively revised from earlier versions of the test. These revisions were undertaken to promote easier testing and more accurate scoring. Also, new drawings have been added and dated illustrations dropped so as to achieve better gender and ethnic balance. Individual test items that showed statistical bias by race or ethnicity, gender, or region were deleted from the item pool for the scale prior to standardization. In FACES data, the PPVT-III was highly reliable, with internal-consistency reliability (alpha) coefficients of .96 for fall 2000 and .97 for spring 2001.
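For readers unfamiliar with the internal-consistency (alpha) coefficients reported for the instruments in this section, the following Python sketch computes Cronbach's alpha from item-level scores. It is a generic illustration and not the procedure used to produce the FACES reliabilities. The function name and data layout are assumptions, and it presumes every child has a score on every item.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows, one row of item scores per child.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(item_scores[0])
    item_variances = [variance([row[i] for row in item_scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Values near 1 indicate that the items rank children consistently; lower values, such as those reported for some of the shorter scales below, indicate less consistent items or fewer items.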

A Spanish-language test, the Test de Vocabulario en Imagenes Peabody (TVIP), is also available, but has not been updated to be directly comparable to the PPVT-III. For FACES, the TVIP was used with children whose primary language was Spanish.

A screener was used to determine whether English-language learners were to be administered the direct child assessment battery in English. The screener drew on information provided by teachers and assessors to determine the language of administration. In fall 2000, English-language learners who were determined to be primarily Spanish-speaking received the entire direct child assessment battery in Spanish (e.g., TVIP, Woodcock-Muñoz Letter-Word Identification, Applied Problems, and Dictation). They were also administered the PPVT-III and Woodcock-Johnson Letter-Word Identification in English. In spring 2001, these same children received the entire direct child assessment battery in English. They were also administered the TVIP and Woodcock-Muñoz Letter-Word Identification in Spanish for the purpose of comparison. In fall 2000, English-language learners who were determined to primarily speak a language other than Spanish did not receive any portion of the direct child assessment battery in their native languages. In spring 2001, these same children received the entire direct child assessment battery in English.

A2. Woodcock-Johnson Psycho-Educational Battery - Revised
The updated edition of the Woodcock-Johnson Battery (WJ-R) is a carefully constructed and widely used test battery. The set of individually administered tests is designed to assess the intellectual and academic development of individuals from preschool through adulthood (Woodcock & Johnson, 1989; Salvia & Ysseldyke, 1991). FACES used three subtests from the Achievement Battery that together constitute an “Early Development—Skills” cluster, according to the test developers. The cluster comprises the Letter-Word Identification, Applied Problems, and Dictation tests. The same three subtests of the Spanish version (Woodcock-Muñoz Pruebas de Aprovechamiento-Revisada) were used in the Spanish version of the FACES assessment battery.

Letter-Word Identification. The first five Letter-Word Identification items involve symbolic learning, or the ability to match a rebus (pictographic representation of a word) with an actual picture of the object. The remaining items measure children’s reading identification skills in identifying isolated letters and words that appear in large type on the pages of the test book. As well as being part of the Early Development cluster, this subtest is also part of the Basic Reading Skills cluster. The internal consistency of the Letter-Word Identification subtest with FACES children averaged .84 for fall 2000 and .86 for spring 2001.

Letter Naming. The Letter Naming task is a test developed for use in the Head Start Quality Research Centers curricular intervention studies. Children are shown all 26 upper-case letters of the alphabet, divided into three groups of 8, 9, and 9 letters, arranged in approximate order of item difficulty. They are asked to identify the letters they know by name. The task has the virtue of providing specific numeric information about how many letters Head Start children learn and which ones they are more or less likely to acquire. The Letter Naming task provides complementary information to the Woodcock-Johnson Letter-Word Identification task regarding children’s knowledge and awareness of letters. Children’s knowledge and awareness of letters is an essential prerequisite to their learning how to read.

Applied Problems. This subtest measures children’s skill in analyzing and solving practical problems in mathematics. In order to solve the problems, the child must recognize the procedure to be followed and then perform relatively simple counting, addition, or subtraction operations. Because many of the problems include extraneous stimuli or information, the child must also decide which data to include in the count or calculation. As well as being part of the Early Development cluster, the subtest is also part of a Broad Mathematics cluster. The internal consistency of the Applied Problems subtest with FACES children averaged .90 for fall 2000 and .91 for spring 2001.

Dictation. The first six items in this subtest measure prewriting skills, such as drawing lines and copying letters. The remaining items measure the child’s skill in providing written responses when asked to write specific upper- or lower-case letters of the alphabet. Later parts of the test ask the child to write specific words and phrases and to apply punctuation and capitalization. The internal consistency of the Dictation subtest with FACES children averaged .77 for both fall 2000 and spring 2001.

A3. McCarthy Scales of Children’s Abilities
The McCarthy Scales of Children’s Abilities is a widely used and well-documented test battery. FACES employed one subtest from the battery, the Draw-A-Design Task. The Draw-A-Design Task was used to assess children’s perceptual-motor skills. This task asks the child to draw copies of a series of increasingly complex geometric figures. For FACES, this task was directly translated as part of the Spanish version of the assessment. In FACES data, the Draw-A-Design Task had internal-consistency reliability (alpha) coefficients of .58 for fall 2000 and .68 for spring 2001.

A4. Story and Print Concepts
The Story and Print Concepts task was an adaptation of earlier prereading assessment procedures developed by Marie Clay (1979), William Teale (1988, 1990), and Mason and Stewart (1989). In these procedures, a child is handed a children’s storybook (FACES Battery - Where’s My Teddy? (Alborough, 1992) or ¿Dónde Está Mi Osito? (Alborough, Castro, Trans. 1992)) upside down and backwards. The assessor asks a series of questions designed to test the children’s knowledge of books. These include questions regarding the location of the front of the book, the point at which one should begin reading, and information relating to the title and author of the book. The assessor reads the story to the child and asks basic questions about both the mechanics (print conventions) of reading and the content (comprehension) of the story. The print conventions questions pertain to children’s knowledge of the left-to-right and up-and-down conventions of reading, while the comprehension questions pertain to children’s recall of key facts from the story. Additionally, for FACES, questions were added tapping rhyming awareness (e.g., “I’ll say some words from the story and you tell me whether they rhyme, OK - bawl and small, etc.”) and phonological awareness (e.g., “What word would be left if I took “teh” away from Ted?”). These additions were only included in the fall 2000 direct child assessment battery. FACES reliabilities (internal consistencies) for these concepts for both the fall and the spring were as follows: 1.) Book Knowledge (.57 and .59); 2.) Print Conventions (.73 and .74); and 3.) Comprehension (.43 and .41).

A5. Social Awareness
This measure was adapted from a subtest of the Comprehensive Assessment Program (CAP) Early Childhood Diagnostic Instrument used by Snow et al. (1995) among others to test children’s general knowledge and awareness of the social environment. The child is asked to give his/her “full name,” which includes both first and last name, his/her age (either verbally, which is given full credit, or by holding up the correct number of fingers, which is given partial credit), and month/day of birth. The FACES reliabilities for the Social Awareness measure were .63 for fall 2000 and .61 for spring 2001.

A6. Color Names and One-to-One Counting
This was also a subtest of the CAP Early Childhood Diagnostic Instrument used by Snow et al. (1995) and developed by Marie Clay (1979), William Teale (1988, 1990) and Mason and Stewart (1989) as a battery of emergent literacy and school readiness measures. For the FACES battery, 10 teddy bears of different colors are presented randomly arranged on a page and the child is asked to point to each in turn and name the color. Following the Color Names task, the child is asked to count the bears and the assessor marks the final number the child arrives at when finished counting (correct answer is “10”). After this, the child is asked to report the total number of bears. The verbatim response is then recorded. Following these questions, the assessor must rate the child’s one-to-one counting performance using a 5-point scale. At the extremes, a score of 5 indicated that the child made no mistakes and a score of 1 indicated that the child could not count or did not try to count. The FACES reliabilities for the Color Naming task were .95 for fall 2000 and .94 for spring 2001.

A7. Leiter International Performance Scale - Revised (Leiter-R) - Attention Sustained
The Leiter-R by Roid and Miller (1997) assesses cognitive function in children and adolescents. The battery includes measures of nonverbal intelligence in fluid reasoning and visualization, as well as appraisals of visuospatial memory and attention. In spring 2001, the Leiter-R AS (Attention Sustained) Subtest was added to the FACES direct child assessment battery to permit assessments of children’s visuospatial memory and attention. The subtest is primarily nonverbal and is administered in two subsections, the first for children 2-3 years of age and the second for children 4-5 years of age. Assessors provide minimal instructions throughout the administration of the Leiter-R AS. Children are presented with a series of pages containing pictures and are instructed to mark off all pictures that resemble a reference picture. The assessor times the child, with times ranging from 30 seconds to 120 seconds allotted for completion of the tasks. FACES reliabilities for the Leiter Attention Sustained subtask by age groupings for spring 2001 were as follows: 1.) 2 - 3 year olds - .71; and 2.) 4 - 5 year olds - .81.

A8. Interviewer Ratings
At the end of the one-on-one testing sessions with the children, the assessor completes a set of rating scales evaluating the child’s behavior in the test situation, including the child’s approaches to learning and problem behaviors. There are two sections to these ratings. The first consists of eight scales rating the child’s response during the assessment on eight different domains: task persistence, attention span, body movement, attention to directions, comprehension of directions, verbalization, ease of relationship, and the child’s level of confidence. Ratings use 4-point scales with descriptive anchors at each point. For example, the “task persistence” scale consists of the following anchor points: persists with task (4), attempts task briefly (3), attempts task after much encouragement (2), refuses (1). The FACES reliabilities for the Interviewer Ratings were .82 for fall 2000 and .81 for spring 2001.

The second section asks the assessor to indicate any special concerns regarding the child’s ability to complete the assessment: responding nonverbally, using nonstandard English such as dialect, speaking English as a second language, having limited English proficiency, experiencing difficulty hearing or seeing the assessor or test materials, or having speech that was difficult to understand. These items use 3-point ratings to indicate the degree to which the child displayed any of these characteristics (i.e., “not at all,” “somewhat,” and “very much”).

A9. Kindergarten Follow-Up ECLS-K Measures
Two additional measures were included in the follow-up kindergarten assessment battery (spring 2002): the Reading scale and the General Knowledge scale, which were adapted from the Early Childhood Longitudinal Study - Kindergarten Cohort (ECLS-K).

In ECLS-K, the Reading scale taps a variety of skills that indicate reading ability, including familiarity with print, recognition of letters and phonemes, vocabulary, and reading comprehension skills (e.g., children’s understanding of the text), as well as their personal reflection on and critical evaluation of the text. The General Knowledge scale taps skills in the natural sciences (e.g., children’s conceptual understanding of why things occur as they do, and their ability to pose questions and investigate answers in the natural sciences) and social studies (e.g., their basic knowledge of history, government, and culture). Both scales follow the guidelines of the 1996 National Assessment of Educational Progress, have been reviewed by curriculum experts and elementary school teachers, and have been found to be both reliable and valid measures of reading achievement and basic knowledge acquisition.29

The Reading assessment was administered in two stages. First, a routing test was administered to estimate the child’s reading ability. Based on his/her performance on the routing test (either “high,” “medium,” or “low”), an appropriate “second stage” test was administered. The Reading assessment had three levels of second stage tests: low (red), medium (yellow), and high (blue). For the General Knowledge assessment, each child was administered only the routing test. Estimates of reliability with FACES data, as measured by Cronbach’s coefficient alpha, will be provided at a later point when the data become available.
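The two-stage routing described above can be sketched in a few lines. This is only an illustration: the cut scores below are hypothetical placeholders, since this appendix does not report the actual ECLS-K routing cut points, and the function name is likewise an assumption.

```python
from bisect import bisect_right

# Hypothetical cut scores (NOT from the report): routing scores below 5
# route to the low form, 5 to 9 to the medium form, 10 or more to the high form.
CUT_SCORES = [5, 10]
SECOND_STAGE_FORMS = ["low (red)", "medium (yellow)", "high (blue)"]


def select_second_stage(routing_score, cut_scores=CUT_SCORES):
    """Return the second-stage Reading form for a routing-test score."""
    if routing_score < 0:
        raise ValueError("routing score cannot be negative")
    # bisect_right counts how many cut scores are <= the routing score,
    # which is the index of the form to administer.
    return SECOND_STAGE_FORMS[bisect_right(cut_scores, routing_score)]


if __name__ == "__main__":
    for score in (2, 7, 12):
        print(score, "->", select_second_stage(score))
```

For the General Knowledge assessment no such selection would be needed, because each child received only the routing test.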

B. Classroom Observation Instruments

In FACES, two distinctive types of observation instruments (i.e., classroom observation and child observation) were used to measure peer interactions, friendships of children, and the extent to which Head Start programs employed skilled teachers and provided developmentally appropriate environments and curricula for their children.

B1. Counts of Children and Adults
The Counts of Children and Adults provide information needed to calculate child/adult ratios and for other calculations to be used in assessing specific measures of classroom quality. Classroom observers are tasked with counting the number of children (boys and girls), the number of paid staff, and the number of adult volunteers at two separate time periods during the classroom observation. The two counts must be at least an hour apart and must involve one structured (teacher-directed) activity and one unstructured activity.
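As a rough sketch of how the two counts might be turned into a child/adult ratio, the example below checks the two stated rules (counts at least an hour apart, one structured and one unstructured activity) and then averages the two ratios. Counting volunteers as adults and averaging the two ratios are assumptions made for illustration; the report does not spell out the calculation, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClassroomCount:
    minutes_into_visit: int  # when the count was taken
    structured: bool         # True for a teacher-directed activity
    boys: int
    girls: int
    paid_staff: int
    volunteers: int

    def child_adult_ratio(self):
        # Assumption: paid staff and adult volunteers both count as adults.
        adults = self.paid_staff + self.volunteers
        if adults == 0:
            raise ValueError("no adults recorded for this count")
        return (self.boys + self.girls) / adults


def mean_child_adult_ratio(first, second):
    """Average the ratio over two counts after checking the stated rules."""
    if abs(second.minutes_into_visit - first.minutes_into_visit) < 60:
        raise ValueError("counts must be at least an hour apart")
    if first.structured == second.structured:
        raise ValueError("need one structured and one unstructured activity")
    return (first.child_adult_ratio() + second.child_adult_ratio()) / 2


if __name__ == "__main__":
    circle_time = ClassroomCount(0, True, boys=8, girls=8, paid_staff=2, volunteers=0)
    free_play = ClassroomCount(75, False, boys=7, girls=8, paid_staff=2, volunteers=1)
    print(mean_child_adult_ratio(circle_time, free_play))
```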

B2. Assessment Profile
The Assessment Profile (Abbott-Shim & Sibley, 1987) is a structured observation guide designed to provide a quantitative assessment of classrooms and teaching practices that facilitate the learning and development of children. Three subscales were used in FACES: Scheduling, Learning Environment, and Individualizing.

The Scheduling subscale assesses the written plans for classroom scheduling and how classroom activities are implemented. The appropriateness and completeness of the classroom activity plan are also noted. The subscale also assesses the balance and variety of learning contexts (e.g., individual, small group, and large group) and learning opportunities (i.e., child- vs. teacher-directed and active vs. quiet activities). The 14 observation items are scored in a yes/no format. High scores on this measure indicate a teacher who uses a “planful” approach to classroom activities. The reliability of the Scheduling subscale was reported as .89 for fall 2000 and .87 for spring 2001.

The Learning Environment subscale focuses on the accessibility of a variety of learning materials to children in the classroom. Variety is assessed across various conceptual areas, such as science, math, language, fine motor, etc. and also within each conceptual area. The subscale also assesses how classroom space is arranged to determine whether the classroom encourages independence (e.g., whether the learning materials are located on low shelves and clearly labeled) and reflects the child as an individual. When materials are both available and accessible, and in sufficient numbers (typically a minimum of three in each group) the item is given a positive score. High scores on this 7-item measure indicate a “learning rich” environment, filled with toys and learning materials that address a variety of developmental domains. The reliability of the Learning Environment subscale was reported as .68 for fall 2000 and .77 for spring 2001.

The Individualizing subscale focuses on the extent to which emphasis is placed on children, individually, in the classroom setting. This includes whether or not there are periodic individual assessments of each child’s performance using portfolios of his/her work, performance inventories, and teacher notations. Also included is whether or not child assessment information is used for planning individualized learning experiences. The final inclusion involves whether or not teachers have the ability to make provisions for children with special needs. The reliability of the Individualizing subscale was reported as .50 for fall 2000 and .54 for spring 2001.

B3. Early Childhood Environment Rating Scale-Revised (ECERS-R)
The Early Childhood Environment Rating Scale-Revised (ECERS-R) is a global rating of classroom quality based on structural features of the classroom (Harms & Clifford, 1980). It has been widely used in child development research and has predicted optimal child outcomes in a number of studies (e.g., Phillips, Voran, Kisker, Howes, & Whitebook, 1994). The revised version improves the items and the standardization of the observational methods. In addition, it is easier to train observers on the ECERS-R and to reach inter-rater reliability. The ECERS-R contains 37 items representative of classroom quality. Each item is coded on a 7-point scale with a score of 1 representing “inadequate,” a score of 3 representing “minimal quality,” a score of 5 representing “good quality,” and a score of 7 representing “excellent quality.” The internal consistency of the ECERS-R mean score for all combined items was .92 for both fall 2000 and spring 2001.

Seven subscales were derived from the ECERS-R for use in the analysis of FACES classroom quality, each pertaining to a different element of classroom quality. They are as follows:

1.) Personal Care Routines are measured using six items: greeting/departing; meals/snacks; nap/rest; toileting/diapering; health practices; and safety practices.

2.) Furnishings are measured using four items: indoor space; furniture for routine care, play, and learning; furniture for relaxation and comfort; and room arrangement for play.

3.) Language Skills are measured using four items: books and pictures; encouraging children to communicate; using language to develop reasoning skills; and informal use of language.

4.) Motor Skills are measured using four items: space for gross motor play; gross motor equipment; fine motor activities; and supervision of gross motor activities.

5.) Creativity is measured using six items: child-related display; art; music/movement; blocks; sand/water; and dramatic play.

6.) Social Skills are measured using four items: supervision other than gross motor activity; discipline; staff-child interactions; and interactions among children.

7.) Program Structure is measured using four items: space for privacy; schedule; free play; and group time.

Five items were not incorporated into any of the subscales: nature/science; math/numbers; use of TV, video, and/or computers; promoting acceptance of diversity; and provisions for children with disabilities. Thus, only 32 of the 37 available items were included in the subscales.
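The item tally above can be checked with a short sketch. The item counts are taken from the text; the dictionary and function names are hypothetical.

```python
# Item counts per ECERS-R subscale, as listed in the text above.
SUBSCALE_ITEM_COUNTS = {
    "Personal Care Routines": 6,
    "Furnishings": 4,
    "Language Skills": 4,
    "Motor Skills": 4,
    "Creativity": 6,
    "Social Skills": 4,
    "Program Structure": 4,
}
TOTAL_ECERS_R_ITEMS = 37
UNASSIGNED_ITEMS = 5  # items not placed in any subscale


def items_in_subscales(counts=SUBSCALE_ITEM_COUNTS):
    """Return the number of items covered by the seven subscales."""
    return sum(counts.values())


if __name__ == "__main__":
    covered = items_in_subscales()
    print(covered, "items in subscales")
    # The subscale items plus the unassigned items should account for all 37.
    print(covered + UNASSIGNED_ITEMS == TOTAL_ECERS_R_ITEMS)
```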

A separate subscale, labeled ECERS-R Language, consisted of four items and was devised to assess the quality of the language environment in Head Start classrooms. Additional information about this subscale can be found in Chapter 4.

B4. Classroom Observation of Teacher-Directed Activities
The Classroom Observation of Teacher-Directed Activities is a checklist on which classroom observers record teacher-directed activities in 21 specific areas (e.g., reading stories and singing songs). The classroom observer indicates whether observed activities were directed toward individual children (Individual Attention), a small group of children (Small Group = 3 to 8 children), or a whole group of children (Whole Group = entire classroom). Observers were instructed to mark each item only once, recording any teacher-directed activity observed during the classroom observation and whether it was directed toward individuals, a small group of children, or the entire classroom. This checklist was introduced in spring 2001.

B5. Arnett Caregiver Interaction Scale
The Arnett Caregiver Interaction Scale is a rating scale of teacher behavior towards the children in the classroom. It consists of 26 items that assess five areas of teacher behavior: sensitivity, punitiveness, detachment, permissiveness, and prosocial interaction (Arnett, 1989). The version of the Arnett Caregiver Interaction Scale utilized in the current round of FACES consists of 30 items and five subscales with the subscale labels being as follows: Sensitivity, Harshness, Detachment, Permissiveness, and Independence. At the end of the observational period, the observer completes the scale for an individual teacher, typically the lead teacher in the classroom. For example, in evaluating whether the teacher “speaks warmly to the children,” the observer will assign ratings indicating the extent to which the statement is characteristic of the teacher, from 1 “never seen” to 4 “always or almost always.” The Cronbach Coefficient Alpha for all of the items was .94 for fall 2000 and .69 for spring 2001.

C. Teacher’s Child Ratings and Teacher Background

Teacher ratings of children were important sources of information about children’s learning and behavior because teachers see children over extended periods of time and in a variety of settings. Using a rating form known as the Teacher’s Child Report (TCR), teachers were first asked to rate each child on a set of behaviors that assessed the child’s basic social skills and classroom behavior. In these two sections, the teacher is asked to indicate the extent to which a given statement (e.g., “follows the teacher’s directions”) is characteristic of the child, from 1 “never” to 3 “very often.” The items making up these ratings form two scales:

C1. Cooperative classroom behavior:
There are 12 rating items on which the teacher indicates how often the child engages in cooperative classroom behaviors such as following the teacher’s directions, helping put things away, complimenting classmates, and following rules when playing games. The ratings include items drawn from the Personal Maturity Scale (Alexander & Entwisle, 1988) and the Social Skills Rating System (Elliott, Gresham, Freeman, & McCloskey, 1988) to assess positive behavior such as cooperation, sharing, and expression of feelings. A summary score is created from the 3-point scale items; it ranges from zero to 24, with high scores indicating more frequent cooperative behavior. The internal consistency for this measure was .88 in both fall 2000 and spring 2001.
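One reading that reconciles the 1-to-3 rating labels with a summary score of zero to 24 is that each of the 12 ratings is shifted to a 0-to-2 scale before summing. The sketch below assumes that recoding, which the report does not state explicitly; the function name and defaults are illustrative.

```python
def summary_score(ratings, n_items=12, low=1, high=3):
    """Sum item ratings after shifting each so the lowest category scores 0.

    Assumption: ratings recorded as 1-3 are recoded to 0-2 before summing,
    which yields the 0-24 range reported for the 12 items.
    """
    if len(ratings) != n_items:
        raise ValueError(f"expected {n_items} ratings, got {len(ratings)}")
    if any(not low <= r <= high for r in ratings):
        raise ValueError(f"ratings must lie between {low} and {high}")
    return sum(r - low for r in ratings)


if __name__ == "__main__":
    print(summary_score([1] * 12))           # lowest possible score
    print(summary_score([3] * 12))           # highest possible score
    print(summary_score([2, 3, 1, 2] * 3))   # a mixed profile
```

The same arithmetic would give the zero-to-28 range reported for the 14 behavior problem items by calling the function with `n_items=14`.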

C2. Total behavior problems:
The Behavior Problems scale is based on measures of negative child behaviors that are associated with learning problems and later grade retention. Items come from an abbreviated adaptation of the Personal Maturity Scale (Alexander & Entwisle, 1988), the Child Behavior Checklist for Preschool-Aged Children, Teacher Report (Achenbach, Edelbrock, & Howell, 1987) and The Behavior Problems Index (Zill, 1990). The items ask about the frequency of aggressive behavior (e.g., hits/fights with others), hyperactive behavior (e.g., is very restless), and anxious or depressed and withdrawn behavior (e.g., is unhappy). The summary score from the scale’s 14 behavior items ranges from zero to 28, with higher scores representing more frequent or severe negative behavior. The reliabilities (internal consistency) for these measures for both Fall and Spring are as follows:

1.) Total Problem Behaviors - .86 for both; 2.) Aggression - .83 and .85; 3.) Hyperactivity - .72 for both; and 4.) .77 and .76.

The teacher is then asked to rate the child’s problem solving skills and initiative, social relationships, creative representations, music/movement skills, and language/math skills. The teacher is asked to rate the child’s highest level of behavior in each of the above domains observed in the past week. Scale points for each item are described on paper, and a glossary provides concrete examples of each anchor point. For the purpose of FACES, fourteen items from the Child Observation Record (COR; High/Scope Educational Research Foundation, 1992) were selected, with a demonstrated reliability of .94 for both fall 2000 and spring 2001. These 14 items were further divided into the following scales: social relationships, creative representations, music and movement, and cognitive.

C3. Social Relationships (3 items):
A composite score based on teacher’s ratings of how well the child makes friends, works with other children, and understands and expresses feelings. Each item is rated on a five-point scale with higher scores representing greater skill in coping with social situations and expressing feelings appropriately. The summary score is the average of the three items and ranges from one to five. The measure shows good reliability with the FACES study, with Alpha Coefficients of .83 for both fall 2000 and spring 2001.

C4. Creative Representations (3 items):
A composite score based on the teacher’s ratings of how well the child uses creative materials for self-expression in making and building things, drawing and painting, and engaging in pretend play. Each item is rated on a five-point scale with higher scores representing greater proficiency. The summary score is the average of the three items and ranges from one to five. The measure shows good reliability with the FACES study, with Alpha Coefficients of .80 for fall 2000 and .81 for spring 2001.

C5. Music and Movement (4 items):
A composite score based on teacher’s ratings of how well the child can imitate movements to a steady beat, follow music and movement directions, exhibit body coordination, and manipulate small objects and perform precise actions. Each item is rated on a five-point scale with higher scores representing greater proficiency. The summary score is the average of the four items and ranges from one to five. The measure shows good reliability with the FACES study, with Alpha Coefficients of .88 for both fall 2000 and spring 2001.

C6. Cognitive (4 items):
A composite score based on teacher’s ratings of how well the child can solve problems, engage in complex play, show interest in reading, and exhibit classification skills by sorting objects. Each item is rated on a five-point scale with higher scores representing greater proficiency. The summary score is the average of the four items and ranges from one to five. The measure shows good reliability with the FACES study, with Alpha Coefficients of .82 for fall 2000 and .83 for spring 2001.
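The four composite scores described above share one calculation: the average of a scale's items, each rated from one to five. A minimal sketch follows, with hypothetical scale keys and function name.

```python
from statistics import mean

# Number of items in each scale, as given in the headings above.
COR_SCALE_ITEMS = {
    "social_relationships": 3,
    "creative_representations": 3,
    "music_and_movement": 4,
    "cognitive": 4,
}


def composite_score(scale, ratings):
    """Average the five-point item ratings for one scale."""
    expected = COR_SCALE_ITEMS[scale]
    if len(ratings) != expected:
        raise ValueError(f"{scale} needs {expected} ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


if __name__ == "__main__":
    print(composite_score("cognitive", [2, 3, 4, 5]))
    # The four scales together account for the fourteen selected items.
    print(sum(COR_SCALE_ITEMS.values()))
```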

The Lead Teacher Background Information consists of questions asking the teacher about himself/herself, including sociodemographic and educational background and professional experience. Information about the curriculum being used, his/her attitude and knowledge about early childhood education practice (see Teacher Beliefs Scale write-up referenced in Chapter 4), and accommodations he/she has made or that others have made to meet the learning needs of children in his/her classroom, particularly children with special needs, are included, as well.

D. Parent Interview

Data from the FACES Parent Interview, administered in fall 2000 and spring 2001, provide Head Start with a comprehensive understanding of the families that they serve, including the characteristics of households and household members, levels and types of participation in the program and in other community services, involvement with their children, and understanding of their children’s development.

Parents were also asked to rate each child on a set of behaviors that assessed the child’s basic social skills and behavior problems. In this section, the parent is asked to indicate the extent to which a given statement (e.g., “makes friends easily”) is characteristic of the child, from 1 “not true” to 3 “very true or often true.” The items making up these ratings were drawn from two well-known measures of children’s positive behavior and behavior problems: the Entwisle scale of Personal Maturity (Entwisle, Alexander, Cadigan, & Pallis, 1987) and the Child Behavior Checklist for Preschool-Aged Children (Achenbach, Edelbrock, & Howell, 1987). Two scales were formed to assess children’s social competence:

D1. Social skills and positive approaches to learning:
Parents were asked to rate their child’s social skills and positive approaches to learning by describing the child’s skills in making friends and accepting friends’ ideas, as well as enjoying learning and trying new things. A summary score based on the scale’s seven items ranges from zero to 14, with higher scores representing more positive behavior. Table A-10 shows the reliabilities for the Social Skills measure in both fall 2000 and spring 2001.

D2. Total Problem Behaviors:
Parents were also asked to rate their children on negative behaviors that are relatively common among preschool children and that are associated with adjustment problems in elementary school. Parents were asked about three domains of problem behavior: hyperactive behavior, aggressive behavior, and depressed or withdrawn behavior. The 12 behavior items were combined in a summary score ranging from zero to 24, with higher scores representing more frequent or severe negative behavior. Table A-10 shows the reliabilities for all of these behavior problem measures in both fall 2000 and spring 2001.

D3. Other Parent Interview Scales/Measures Referenced in the Report: (see the chart below)

V. Field Staff Training

A weeklong training was conducted prior to each data collection period to prepare field staff for successful completion of data collection. The training included a wide variety of activities covering all the procedures, techniques, and contents required to carry out successful data collection in the Head Start centers:

  • Lecture, incorporating slides, overheads, and videotapes;

  • Exercises that simulate various procedures such as assessing classroom scheduling;

  • Video demonstration of assessment techniques and components of classroom scoring procedures;

  • Exercises to achieve pre-established levels of inter-rater reliability;

  • Participatory involvement of all trainees in small groups so that trainers may evaluate individual performance;

  • Multiple occasions of practice in real classroom settings that simulate what they are expected to do in the field, with the presence of a trainer and a small group of trainees to discuss the classroom ratings and provide valuable guidance on scoring reliability and agreement; and

  • One-on-one practice and role-play in the administration of child assessment procedures under supervision of training staff.

NAMES AND SOURCES FOR OTHER PARENT INTERVIEW SCALES/MEASURES REFERENCED IN THE REPORT
Name | Source
Pearlin Mastery Scale (Locus of Control) | Pearlin, L. I., & Schooler, C. (1978). The structure of coping. Journal of Health and Social Behavior, 19, 2-21.
CES-D Depression Scale | Radloff, L. S. (1977). The CES-D: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1, 385-401.
Family Activities with Children | National Household Education Survey - FACES Research Team
Parental Involvement in Head Start | Head Start Quality Research Consortium (QRC)
Exposure to Violence | FACES Research Team
Domestic Violence Screener | Feldhaus, K. M., Koziol-McLain, J., Amsbury, H. L., et al. (1997). Accuracy of 3 brief screening questions for detecting partner violence in the emergency department. JAMA, 277(17), 1357.
Substance Abuse Screener | Administration for Children and Families (1997). National Impact Evaluation of the Comprehensive Child Development Program. Washington, D.C.: U.S. Department of Health and Human Services.
Involvement with Criminal Justice System | FACES Research Team
Parenting Style | National Longitudinal Survey of Youth (NLSY), Early Head Start Evaluation (EHS), QRC

The field procedures manual contained information about working with a research team, appropriate behaviors within a classroom, and how to orchestrate Head Start center visits. Moreover, the manual covered an overview of all data collection instruments and administrative and travel procedures. Complete scoring rules and question-by-question specifications for the child assessment and child and classroom observation instruments were also discussed in the manual.

During the training, trainees were introduced to the purpose and goals of the study and background information on Head Start. Trainees were also introduced to the data collection materials and general issues regarding children and early childhood learning environments. Each day of training included a morning question and answer period regarding the previous day’s training, a daily review of the current day’s material, and a brief discussion of the next day’s events.

An additional practice session was given to provide trainees with more practice in either observation or assessment. Assignment of this practice was based on the measures in which the trainees needed more practice. For administering child assessments in Spanish, a special training for English-Spanish speaking trainees was held. The bilingual trainees had an opportunity to practice assessments with Spanish-speaking children.

VI. Data Collection Procedures

A. Site Visit Arrangements

The research team obtained feasible dates for the 2-week site visit from each of the sampled Head Start programs. Site visit dates for each program were coordinated within the data collection period, and programs were notified about the visit dates. Three weeks before the site visit, a scheduling packet was sent to the on-site coordinator (OSC). The packet contained the final visit schedule, a master list organized by classroom, a reminder list, and a request for maps and directions to aid the research team. OSCs are members of the Head Start program staff specially designated to coordinate the data collection efforts by scheduling parent interviews, arranging classroom visits with program teachers, and obtaining consent forms, among other related duties.

VII. Quality Control Visits

In FACES, Quality Control (QC) visits were built into every step of the data collection to ensure the highest quality data possible. The QC visitors consisted of the FACES project staff who were involved in designing the instruments, preparing the training materials, and conducting the training. The QC visitors were trained in both observation and assessment data collection and also served as technical consultants in the field. During the fall 2000 data collection, one 3-day QC visit to program sites was made.

VIII. Data Preparation & Data File Creation

A. Data Entry

Key entry and verification were performed on the study instruments using a sophisticated production data entry system. This system provides entry form layout, application of edit specifications, and data verification control, and it produces data entry quality and production reports.

B. Frequency Review

The frequencies of responses to all data items (both individually and in conjunction with related data items) were reviewed to ensure that appropriate skip patterns were followed. Members of the data preparation team checked each item to make sure the correct number of responses was represented for all items. If a discrepancy was discovered, the problem case was identified and reviewed.
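A frequency review of this kind can be illustrated with a small sketch. The records, item names, and the form of the skip rule below are all hypothetical; the report does not describe the actual software or rules used.

```python
from collections import Counter


def item_frequencies(records, item):
    """Tabulate how often each response (None means blank) occurs."""
    return Counter(record.get(item) for record in records)


def skip_violations(records, gate_item, gate_value, follow_up_item):
    """Return the case IDs that break a skip rule.

    Rule assumed here: follow_up_item is answered if and only if
    gate_item equals gate_value.
    """
    violations = []
    for record in records:
        should_answer = record.get(gate_item) == gate_value
        did_answer = record.get(follow_up_item) is not None
        if should_answer != did_answer:
            violations.append(record["case_id"])
    return violations


# Hypothetical interview records; item names are illustrative only.
records = [
    {"case_id": 1, "employed": "yes", "hours_worked": 40},
    {"case_id": 2, "employed": "no", "hours_worked": None},
    {"case_id": 3, "employed": "no", "hours_worked": 20},     # should be blank
    {"case_id": 4, "employed": "yes", "hours_worked": None},  # should be answered
]

if __name__ == "__main__":
    print(item_frequencies(records, "employed"))
    print(skip_violations(records, "employed", "yes", "hours_worked"))
```

Cases returned by `skip_violations` correspond to the problem cases that would then be identified and reviewed by hand.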

C. Data Edit

To code and edit questionnaire data, an integrated collection of software was utilized. Through this system of software, coding manuals and codebooks were developed, data editing was performed, and SAS source code was generated.

D. Data File Creation

Data files were created and analysis performed to provide summaries and assessments of Head Start children and their families during this period and to assess the reliability and validity of information contained within the data collection instruments. Numerous derived variables were created to increase the magnitude and scope of analytical capabilities. The coding for these derived variables may be obtained upon request.

IX. Reliability and Data Summary

In FACES, various data collection instruments were used to assess the accomplishments and behaviors of children in Head Start programs, as well as the educational and familial support that is provided to them. As noted in Section IV: Data Collection Instruments, these instruments are widely used and report mostly high reliabilities. The reliabilities and summary statistics for each data collection instrument are provided in the following tables: Table A-2 through Table A-11.
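Most reliabilities in these tables are Cronbach's coefficient alpha, which for k items equals k/(k-1) times one minus the ratio of the summed item variances to the variance of the total score. The sketch below is a plain-Python illustration of that standard formula on made-up data; it is not the software used to produce the FACES estimates.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Compute Cronbach's alpha.

    scores: list of cases, each a list with one numeric value per item.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != k for row in scores):
        raise ValueError("every case must have a value for every item")
    # Sample variance of each item across cases.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of the total (summed) score across cases.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Made-up ratings for four children on three items.
    example = [
        [1, 2, 2],
        [2, 2, 3],
        [3, 4, 3],
        [4, 4, 5],
    ]
    print(round(cronbach_alpha(example), 2))
```

Values near 1 indicate that the items move together across cases; this simple version assumes no missing responses.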

 

Table A-1. Summary of Measures Administered from Fall 2000 to Spring 2001
Fall 2000 (Head Start) | Spring 2001 (Head Start)
Social Awareness | Social Awareness
PPVT-III / TVIP | PPVT-III / TVIP
McCarthy Draw-A-Design | McCarthy Draw-A-Design
----- | Leiter-R AS (Attention Sustained) Subtest
Color Names and Counting | Color Names and Counting
Woodcock Johnson (Munoz): Letter-Word Identification | Woodcock Johnson (Munoz): Letter-Word Identification
Woodcock Johnson (Munoz): Applied Problems | Woodcock Johnson (Munoz): Applied Problems
Woodcock Johnson (Munoz): Dictation | Woodcock Johnson (Munoz): Dictation
Story and Print Concepts | Story and Print Concepts
Interviewer Rating: Assessment Behavior | Interviewer Rating: Assessment Behavior

 

Table A-2. Reliability of Fall 2000 and Spring 2001 FACES Child Assessment Data - English Assessments Only (Spring 2001 Leiter Results are for Children Assessed in Both Eng. & Span.)
Scales | Fall 2000: Number of Items | Fall 2000: Number of Cases | Fall 2000: Cronbach Alphas | Spring 2001: Number of Items | Spring 2001: Number of Cases | Spring 2001: Cronbach Alphas
Social Awareness | 5 | 2,068 | .63 | 5 | 1,948 | .61
PPVT-III | 144 | 2,116 | .96 | 144 | 1,980 | .97
McCarthy: Draw-A-Design | 9 | 2,068 | .58 | 9 | 1,943 | .68
Leiter-R AS - Ages 2 to 3 | - | - | - | 4 | 406 | .71
Leiter-R AS - Ages 4 to 5 | - | - | - | 4 | 1,758 | .81
Color Names | 10 | 2,055 | .95 | 10 | 1,940 | .94
WJR: Letter-Word Identification | 23 | 1,054 | .84 | 23 | 1,595 | .86
WJR: Applied Problems | 23 | 1,054 | .90 | 23 | 1,595 | .91
WJR: Dictation | 12 | 1,054 | .77 | 12 | 1,595 | .77
Story and Print Concepts: Print Conventions | 2 | 2,116 | .73 | 2 | 1,980 | .74
Story and Print Concepts: Book Knowledge | 5 | 2,116 | .57 | 5 | 1,980 | .59
Story and Print Concepts: Comprehension | 2 | 2,116 | .43 | 2 | 1,980 | .41
Interviewer Rating: Assessment Behavior | 8 | 2,021 | .82 | 8 | 1,901 | .81

 

Table A-3. Reliability of Fall 2000 and Spring 2001 FACES Child Assessment Data - Spanish Assessments Only (Spring 2001 Leiter Results are Referenced in Table A-2.)
Scales | Fall 2000: Number of Items | Fall 2000: Number of Cases | Fall 2000: Cronbach Alphas | Spring 2001: Number of Items | Spring 2001: Number of Cases | Spring 2001: Cronbach Alphas
Social Awareness | 5 | 385 | .36 | 5 | 356 | .45
TVIP | 144 | 392 | .92 | 144 | 364 | .92
McCarthy: Draw-A-Design | 9 | 375 | .57 | 9 | 355 | .74
Leiter-R AS - Ages 2 to 3 | - | - | - | - | - | -
Leiter-R AS - Ages 4 to 5 | - | - | - | - | - | -
Color Names | 10 | 378 | .92 | 10 | 358 | .93
WM: Letter-Word Identification | 23 | 219 | .75 | 23 | 307 | .78
WM: Applied Problems | 23 | 219 | .85 | 23¹ | 307 | .89
WM: Dictation | 12 | 219 | .77 | 12¹ | 307 | .73
Story and Print Concepts: Print Conventions | 2 | 392 | .59 | 2 | 364 | .77
Story and Print Concepts: Book Knowledge | 5 | 392 | .43 | 5 | 364 | .48
Story and Print Concepts: Comprehension | 2 | 392 | .39 | 2 | 364 | .43
Interviewer Rating: Assessment Behavior | 8 | 372 | .77 | 8 | 353 | .68
¹ Spring 2001 Applied Problems & Dictation are Woodcock Johnson, not Woodcock Munoz.

 

Table A-4. Summary Statistics for Fall 2000 and Spring 2001 FACES Child Assessment Data - English Assessments Only (Spring 2001 Leiter Results are for Children Assessed in Both Eng. & Span.)
Scales | Fall 2000: Number of Cases | Fall 2000: Mean | Fall 2000: SD | Fall 2000: Reported Response Range | Fall 2000: Possible Response Range | Spring 2001: Number of Cases | Spring 2001: Mean | Spring 2001: SD | Spring 2001: Reported Response Range | Spring 2001: Possible Response Range
Social Awareness | 2,101 | 3.36 | 1.69 | 0 - 6 | 0 - 6 | 1,967 | 3.98 | 1.58 | 0 - 6 | 0 - 6
PPVT-III* | 2,031 | 35.06 | 17.65 | 0 - 98 | 0 - 144 | 1,932 | 45.30 | 18.72 | 1 - 98 | 0 - 144
McCarthy: Draw-A-Design | 2,112 | 2.92 | 1.33 | 0 - 13 | 0 - 19 | 1,980 | 3.52 | 1.70 | 0 - 15 | 0 - 19
Leiter-R AS - Ages 2 to 5 | - | - | - | - | - | 2,253 | 40.72 | 10.79 | 1 - 70 | 0 - 70
Color Names | 2,101 | 11.32 | 7.37 | 0 - 20 | 0 - 20 | 1,969 | 15.59 | 5.98 | 0 - 20 | 0 - 20
WJR: Letter-Word Identification* | 948 | 5.30 | 2.61 | 0 - 21 | 0 - 23 | 1,511 | 6.59 | 3.19 | 0 - 22 | 0 - 23
WJR: Applied Problems* | 963 | 7.52 | 4.36 | 0 - 21 | 0 - 23 | 1,542 | 8.98 | 4.70 | 0 - 22 | 0 - 23
WJR: Dictation* | 916 | 5.11 | 1.83 | 0 - 12 | 0 - 12 | 1,491 | 5.64 | 2.11 | 0 - 12 | 0 - 12
Story and Print Concepts: Print Conventions | 2,089 | 0.23 | 0.57 | 0 - 2 | 0 - 2 | 1,968 | 0.37 | 0.69 | 0 - 2 | 0 - 2
Story and Print Concepts: Book Knowledge | 2,087 | 1.62 | 1.27 | 0 - 5 | 0 - 5 | 1,961 | 2.41 | 1.30 | 0 - 5 | 0 - 5
Story and Print Concepts: Comprehension | 2,102 | 0.54 | 0.70 | 0 - 2 | 0 - 2 | 1,967 | 0.71 | 0.75 | 0 - 2 | 0 - 2
Interviewer Rating: Assessment Behavior | 2,094 | 17.14 | 5.02 | 0 - 24 | 0 - 24 | 1,950 | 19.05 | 4.38 | 0 - 24 | 0 - 24
*Raw scores were used.

 

Table A-5. Summary Statistics for Fall 2000 and Spring 2001 FACES Child Assessment Data - Spanish Assessments Only (Spring 2001 Leiter Results are Referenced in Table A-4.)
Scales | Fall 2000: Number of Cases | Fall 2000: Mean | Fall 2000: SD | Fall 2000: Reported Response Range | Fall 2000: Possible Response Range | Spring 2001: Number of Cases | Spring 2001: Mean | Spring 2001: SD | Spring 2001: Reported Response Range | Spring 2001: Possible Response Range
Social Awareness | 390 | 2.62 | 1.21 | 0 - 6 | 0 - 6 | 360 | 2.56 | 1.30 | 0 - 6 | 0 - 6
TVIP* | 369 | 11.34 | 8.38 | 1 - 47 | 0 - 144 | 322 | 16.27 | 10.04 | 1 - 48 | 0 - 144
McCarthy: Draw-A-Design | 392 | 3.37 | 1.34 | 0 - 13 | 0 - 19 | 364 | 4.05 | 1.90 | 0 - 12 | 0 - 19
Leiter-R AS - Ages 2 to 5 | - | - | - | - | - | - | - | - | - | -
Color Names | 386 | 8.90 | 6.62 | 0 - 20 | 0 - 20 | 362 | 13.46 | 6.49 | 0 - 20 | 0 - 20
WM: Letter-Word Identification* | 195 | 4.37 | 1.20 | 0 - 10 | 0 - 23 | 295 | 5.01 | 1.68 | 0 - 12 | 0 - 23
WM: Applied Problems* | 200 | 5.29 | 3.40 | 0 - 14 | 0 - 23¹ | 294 | 5.81 | 4.13 | 0 - 19 | 0 - 23
WM: Dictation* | 188 | 4.99 | 1.28 | 1 - 11 | 0 - 12¹ | 293 | 5.72 | 1.74 | 0 - 11 | 0 - 12
Story and Print Concepts: Print Conventions | 391 | 0.17 | 0.53 | 0 - 3 | 0 - 2 | 360 | 0.14 | 0.47 | 0 - 2 | 0 - 2
Story and Print Concepts: Book Knowledge | 376 | 1.25 | 1.13 | 0 - 5 | 0 - 5 | 355 | 1.70 | 1.13 | 0 - 5 | 0 - 5
Story and Print Concepts: Comprehension | 386 | 0.47 | 0.67 | 0 - 2 | 0 - 2 | 360 | 0.50 | 0.68 | 0 - 2 | 0 - 2
Interviewer Rating: Assessment Behavior | 383 | 17.54 | 4.41 | 0 - 24 | 0 - 24 | 360 | 17.99 | 3.50 | 3 - 24 | 0 - 24
*Raw scores were used.
¹ Spring 2001 Applied Problems & Dictation are Woodcock Johnson, not Woodcock Munoz.

 

Table A-6. Reliability of Fall 2000 and Spring 2001 FACES Classroom Observation Data - Selected Measures
Scales | Fall 2000: Number of Items | Fall 2000: Number of Cases | Fall 2000: Cronbach Alphas | Spring 2001: Number of Items | Spring 2001: Number of Cases | Spring 2001: Cronbach Alphas
Assessment Profile: Scheduling | 14 | 227 | .89 | 14 | 243 | .87
Assessment Profile: Learning Environment | 18 | 228 | .68 | 18 | 228 | .77
Assessment Profile: Individualizing | 5 | 250 | .50 | 5 | 250 | .54
ECERS Total Mean | 37 | 270 | .92 | 37 | 235 | .92
Personal Care | 6 | 146 | .73 | 6 | 269 | .70
Furnishings | 4 | 263 | .52 | 4 | 263 | .60
Language 4 2