Appendix
Methodology
I. Sample Design Overview
The third cohort of Head Start children for FACES was selected as a two-stage sample.28 The first stage sampling units were Head Start programs; the second stage units were classes within sampled programs. In each sampled classroom, all eligible children in their first year of Head Start were taken into the sample.
A. Programs
The sampling frame of eligible Head Start programs was constructed from the 1998-1999 Program Information Report (PIR). Migrant and Seasonal Head Start programs, American Indian/Alaska Native Head Start programs, Early Head Start programs, programs in the territories, and programs that do not serve children directly were excluded, resulting in a frame of 1,675 programs. The programs were stratified by Census region (NE, NC, S, W), percent minority (above/below 50%), and metro or urban/rural status (MSA/non-MSA). These are the same stratification variables used in sampling programs for the first (Spring 1997) and second (fall 1997 – spring 2000) FACES cohorts.
A sample of 45 programs was selected in spring 2000. The sample size in each stratum was proportional to the stratum first year Head Start enrollment. The programs were selected with probability proportional to the program’s first year enrollment using systematic sampling. The first year enrollment was calculated from the PIR by subtracting the reported second and third year enrollment from the total enrollment. A Keyfitz procedure was used to minimize the overlap with the 40 programs sampled previously by Abt Associates for the first and second FACES cohorts. As a result, there was no overlap with the previous program sample. Of the 45 programs selected for the third cohort, two were later discovered to be ineligible because they had been defunded.
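The selection step described above can be sketched in a few lines. This is a minimal illustration of systematic sampling with probability proportional to size, not the procedure actually run for the study; the function name `pps_systematic`, its `start` argument, and the toy enrollment figures are assumptions made for the example.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, start=None, rng=None):
    """Pick n units with probability proportional to size (systematic PPS).

    sizes: measure of size per unit (e.g., first year enrollment), in frame order.
    start: optional random start in [0, step); drawn at random when omitted.
    Returns the list of selected unit indices.
    """
    total = sum(sizes)
    step = total / n                      # sampling interval
    if start is None:
        rng = rng or random.Random()
        start = rng.random() * step       # random start in [0, step)
    cum = list(accumulate(sizes))         # cumulative measure of size
    picks, idx = [], 0
    for k in range(n):
        point = start + k * step          # k-th selection point
        # advance to the unit whose cumulative range contains the point
        while cum[idx] <= point and idx < len(cum) - 1:
            idx += 1
        picks.append(idx)
    return picks


# Toy stratum of four programs (hypothetical enrollments); two are selected.
enrollment = [10, 20, 30, 40]
selected = pps_systematic(enrollment, n=2, start=5.0)
```

A unit whose size exceeds the sampling interval can be hit more than once in this sketch; practical designs usually set such units aside as certainty selections.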
B. Classrooms
In the 43 remaining programs, lists of the anticipated classes for fall 2000 were obtained in late summer 2000. The programs also provided the expected number of first year Head Start children in each class. These lists formed the basis for the classroom sampling frame, after excluding classes with no first year children. Classes with fewer than five first year children expected were combined with another class in the same center to form a “class group.” The class groups were treated as a single unit for sampling purposes and sample size calculations. The total target sample size of first year children was 2,825, or 66 per program. In general, the desired sample size of classes in each program was determined as 66 / (average class size for the program), where the average class size was in terms of number of first year children. The actual initial sample size was increased by 2 classes to allow for a reserve sample in each program. In programs where the total first year enrollment as obtained from the class rosters was more than twice the measure of size used to sample the program, the initial class sample size was increased to prevent large variation in class weights. In small programs where the initial sample size exceeded the number of classes available, all classes were taken with certainty.
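The class sample size rule can be illustrated with a short calculation. This is a sketch under stated assumptions: the function name is hypothetical, rounding up to a whole class is assumed (the text gives only the ratio), and the increase applied to programs whose roster enrollment exceeded twice the measure of size is not modeled.

```python
import math


def initial_class_sample_size(avg_first_year_per_class, classes_available,
                              target_children=66, reserve=2):
    """Initial number of classes (or class groups) to draw in one program.

    Rounding up is an assumption; the text gives only the ratio.
    """
    desired = math.ceil(target_children / avg_first_year_per_class)
    # Small programs: take every class when the request exceeds the supply.
    return min(desired + reserve, classes_available)


# Hypothetical program averaging 11 first year children per class.
typical = initial_class_sample_size(11, classes_available=20)  # 6 main + 2 reserve
small = initial_class_sample_size(11, classes_available=5)     # all classes taken
```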
Classes were sorted by center within program and were sampled with equal probabilities. A subsample of the initial sample was selected with equal probabilities to obtain a main sample of the desired sample size and a reserve sample of two classes in each program. A total of 367 classes were selected: 279 classes for the main sample and 88 for the reserve sample. The number of main sample classes in each program varied from 3 to 15, with an average of 6. (In terms of collapsed classrooms, a total of 252 classroom groups were sampled for the main sample and 82 for the reserve sample, for an average of 6 per program and a range of 3 to 10.)
In fall 2000 the eligibility status of the main sample classes was determined. One or two reserve classes were added in some programs to prevent a shortfall in the target number of first year children for the study. The final sample for weighting purposes included all sampled classes where an attempt was made to collect data from the classroom, including those discovered to be ineligible. The rationale is that ineligible classes in the sample represent ineligible classes on the Head Start program frame. A total of 307 main and reserve classes were in the sample in fall 2000. Twenty of the 307 classes were discovered to be ineligible when the program was contacted by field staff in fall 2000 because they no longer existed, they did not receive Head Start funding, or they had no first year Head Start children. In 286 of the remaining 287 eligible classes, all first year children were taken into the sample; in the one exception, the teacher refused to allow the children in her class to be sampled.
II. Response Rates
Fall 2000
- 2,508 child assessments were completed out of 2,790 for a completion rate of 90 percent.
- 2,488 parent interviews were completed out of 2,790 families selected for the sample (89 percent).
- Teacher report forms were obtained on 2,532 of the sample children (91 percent).
- Assessment, parent, and teacher data were obtained on 2,396 of the 2,790 sample children (86 percent).
- A total of 278 classrooms were observed out of 286 in the sample for a completion rate of 97 percent.
Spring 2001
- 2,232 child assessments were completed out of 2,288, representing 98 percent of the children who remained in the program, and 80 percent of the original sample (2,790).
- 2,166 parent interviews were completed out of 2,288, representing 95 percent of the children who remained in the program, and 78 percent of the original sample.
- Teacher report forms were obtained on 2,236 of the sample children, representing 98 percent of the children who remained in the program and 80 percent of the original sample.
- Assessment, parent, and teacher data were obtained on 2,115 of the 2,288 sample children who remained in the program (92 percent).
- A total of 275 classrooms were observed out of 284 in the sample for a completion rate of 97 percent.
Spring 2002 (Kindergartners Only)
- 831 child assessments were completed out of 979, representing 85 percent of the children who were in kindergarten in spring 2002.
- 901 parent interviews were completed out of 979, representing 92 percent of the children who were in kindergarten in spring 2002.
- Teacher report forms were obtained on 681 of the children, representing 70 percent of the children who were in kindergarten in spring 2002.
- Assessment, parent, and teacher data were obtained on 624 of the 979 children who were in kindergarten in spring 2002 (64 percent).
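The reported percentages follow directly from the counts. As a quick check, the sketch below recomputes the fall 2000 rates listed above; the dictionary layout and the function name `completion_rate` are illustrative choices, and rounding to a whole percent is assumed.

```python
# (completed, base) pairs for fall 2000, taken from the counts reported above.
fall_2000 = {
    "child assessment": (2508, 2790),
    "parent interview": (2488, 2790),
    "teacher report": (2532, 2790),
    "all three sources": (2396, 2790),
    "classroom observation": (278, 286),
}


def completion_rate(completed, base):
    """Completion rate as a whole-number percentage."""
    return round(100 * completed / base)


fall_rates = {name: completion_rate(done, base)
              for name, (done, base) in fall_2000.items()}
```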
III. Program Weights
The program weight was calculated as the inverse of the program’s probability of selection. As mentioned earlier, a Keyfitz procedure was used to minimize the overlap with the cohort 1 and 2 program sample drawn earlier by Abt Associates. This procedure involved calculating conditional probabilities of selection, which are based on whether the program was sampled previously or not, and whether its probability of selection increased compared with the previous sample. Prior to sampling for cohort 3, the unconditional probability of selection for each program on the cohort 3 frame was calculated as

$$\mathrm{NEWPSEL}_i = \mathrm{NEWSMPSZ}_h \cdot \frac{\mathrm{FIRSTYR}_i}{\sum_{k=1}^{N_h} \mathrm{FIRSTYR}_k}$$

where $N_h$ is the number of programs on the frame in stratum h, $\mathrm{NEWSMPSZ}_h$ is the sample size for stratum h for the cohort 3 design, and $\mathrm{FIRSTYR}_i$ is the first year enrollment for program i from the PIR. The probability of selection for each program under the Abt sample design for cohorts 1 and 2 was also calculated as

$$\mathrm{ORIGPSEL}_i = \mathrm{SAMPSIZE}_h \cdot \frac{\mathrm{ENRTOT}_i}{\sum_{k=1}^{N_h} \mathrm{ENRTOT}_k}$$

where $N_h$ was the number of programs on the Abt frame in stratum h, $\mathrm{SAMPSIZE}_h$ was the sample size for stratum h under the Abt design, and $\mathrm{ENRTOT}_i$ was the total enrollment for the i-th program from an earlier PIR.
The conditional probability of selection was calculated for each program on the cohort 3 frame according to the Keyfitz procedure as:
Case 1: NEWPSEL ≥ 1 - ORIGPSEL

$$\mathrm{CONDPROB} = \frac{\mathrm{NEWPSEL} + \mathrm{ORIGPSEL} - 1}{\mathrm{ORIGPSEL}}$$

if the program was sampled for cohorts 1 and 2,
CONDPROB = 1 if the program was not sampled for cohorts 1 and 2.
Case 2: NEWPSEL < 1 - ORIGPSEL
CONDPROB = 0 if the program was sampled for cohorts 1 and 2,

$$\mathrm{CONDPROB} = \frac{\mathrm{NEWPSEL}}{1 - \mathrm{ORIGPSEL}}$$

if the program was not sampled for cohorts 1 and 2.
These conditional probabilities of selection were the measures of size used to select the cohort 3 program sample. It can be shown that the Keyfitz procedure preserves the unconditional cohort 3 program probabilities of selection, while at the same time minimizing the overlap. Thus the cohort 3 program weight is the inverse of NEWPSEL, the unconditional probability of selection under the cohort 3 design.
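The preservation property can be checked numerically. The sketch below assumes the standard overlap-minimizing expressions, which are the only ones consistent with the two cases and with the requirement that the unconditional probability equal NEWPSEL; the function names and the example probabilities are hypothetical.

```python
def keyfitz_condprob(newpsel, origpsel, sampled_before):
    """Conditional selection probability that minimizes overlap.

    newpsel: unconditional probability under the new design.
    origpsel: probability under the earlier design (0 < origpsel < 1 assumed).
    sampled_before: True if the unit was in the earlier sample.
    """
    if newpsel >= 1 - origpsel:                       # Case 1
        if sampled_before:
            return (newpsel + origpsel - 1) / origpsel
        return 1.0
    if sampled_before:                                # Case 2
        return 0.0
    return newpsel / (1 - origpsel)


def implied_unconditional(newpsel, origpsel):
    """Average the conditional probabilities over the earlier sampling outcome."""
    return (origpsel * keyfitz_condprob(newpsel, origpsel, True)
            + (1 - origpsel) * keyfitz_condprob(newpsel, origpsel, False))


case1 = implied_unconditional(0.9, 0.4)   # NEWPSEL >= 1 - ORIGPSEL
case2 = implied_unconditional(0.3, 0.2)   # NEWPSEL <  1 - ORIGPSEL
```

In both cases the implied unconditional probability equals NEWPSEL up to floating-point error, which is why the program weight can be taken as the inverse of NEWPSEL.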
All 43 eligible programs cooperated with the study, so that nonresponse adjustments at the program level were unnecessary.
For each program, a set of 43 jackknife replicate weights was created for calculating standard errors. The replicate weights were created using a standard stratified jackknife procedure. One program at a time was dropped (i.e. given a zero replicate weight) and the weights of the remaining programs in the same stratum were adjusted by a factor of nh/(nh-1), where nh is the number of sampled programs in stratum h. The program weights in the other strata were left unchanged. By repeating this 43 times, 43 replicate weights were obtained for each program. For estimates involving child or classroom data from all 43 programs, the degrees of freedom for the variance of the estimate is #PSUs - #varstrat = 43 - 12 = 31. (One of the 13 original sampling strata was collapsed with an adjacent stratum for variance estimation purposes because it contained only one eligible sampled program.)
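The delete-one construction described above can be sketched as follows. The function name, the stratum labels, and the toy weights are assumptions for illustration; the example uses five programs in two strata rather than the 43 programs in 12 variance strata.

```python
from collections import Counter


def jackknife_replicates(strata, weights):
    """Stratified delete-one jackknife replicate weights.

    strata: stratum label for each sampled program (each stratum needs >= 2).
    weights: full-sample program weights, in the same order.
    Returns one replicate (a list of weights) per dropped program.
    """
    n_h = Counter(strata)
    replicates = []
    for j, dropped_stratum in enumerate(strata):
        factor = n_h[dropped_stratum] / (n_h[dropped_stratum] - 1)
        rep = []
        for i, (s, w) in enumerate(zip(strata, weights)):
            if i == j:
                rep.append(0.0)             # the dropped program
            elif s == dropped_stratum:
                rep.append(w * factor)      # same stratum: scale up
            else:
                rep.append(w)               # other strata: unchanged
        replicates.append(rep)
    return replicates


strata = ["A", "A", "B", "B", "B"]
weights = [10.0, 10.0, 20.0, 20.0, 20.0]
reps = jackknife_replicates(strata, weights)
degrees_of_freedom = len(strata) - len(set(strata))   # PSUs minus strata
```

Because the adjustment factor divides by one less than the stratum count, a stratum with a single sampled program cannot be handled, which is the reason such a stratum has to be collapsed with a neighbor.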
A. Classroom Weighting
Two sets of class weights were produced for classroom level estimation: one set for fall 2000 cross-sectional estimates and a second set for fall 2000 – spring 2001 longitudinal classroom analysis. Class base weights were first created that reflected the overall probability of selection for the class, including the program probability of selection. These base weights were adjusted for classroom level nonresponse, using the following criteria for a complete classroom:
Fall 2000 cross-sectional estimates: the classroom must have complete fall 2000 observation data. Classroom observation data include counts of children and adults, Assessment Profile (Scheduling, Learning Environment, and Individualizing), ECERS-R, Arnett Caregiver Interaction Scale, Teacher-Directed Activities Checklist and Wrap-Up measures.
Fall 2000-Spring 2001 longitudinal analysis: the classroom must have complete observation data for either fall 2000 or spring 2001 and child assessment data for both fall 2000 and spring 2001.
A1. Class Base Weights
A class base weight was created for each of the 367 initially sampled classes in fall 2000. Fifty-four reserve classes that were never used were given base weights of zero. In six cases, a teacher had both a morning and an afternoon class in the main sample; field staff subsampled one class from each such pair on an ad hoc basis to reduce burden and to maintain independence between classes. The six classes that were sampled out were assigned base weights of zero, since they were not part of the final sample.
The remaining 307 classes considered to constitute the sample were each assigned a class base weight equal to the inverse of their overall probability of selection. The overall probability of selection is the product of the program probability of selection and the probability of selecting the class within the program. The inverse of the overall probability of selection can also be written as the product of the program weight and the within-program class weight:
Class Base Weight = Program Weight * (Total # Classes in Program / # sampled classes fielded)
Collapsed classrooms were counted as one classroom in the base weight calculations, since they were treated as a single unit in sampling. The ad hoc subsampling was reflected by multiplying the base weight of the retained class in the am/pm pair by a factor of 2 and the dropped class by zero. One class that had merged with another was given a zero base weight, and the newly merged class had its base weight multiplied by a factor of .5 to reflect its increased probability of selection.
Forty-three jackknife class replicate base weights were created from the program replicate weights:
Class Replicate Base Weight j = Program Replicate Weight j * (Total #Classes in Program / #sampled classes fielded); j = 1, 2, …43.
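A small worked example of the two formulas above, with made-up numbers. The function name, the `factor` argument (used here for the am/pm and merged-class adjustments), and all values are hypothetical.

```python
def class_base_weight(program_weight, total_classes, fielded_classes, factor=1.0):
    """Program weight times the within-program class weight.

    factor: 2.0 for the retained class of an am/pm pair, 0.5 for a merged
    class, 0.0 for a dropped class, 1.0 otherwise.
    """
    return program_weight * (total_classes / fielded_classes) * factor


program_weight = 40.0                     # hypothetical full-sample program weight
program_replicates = [0.0, 45.0, 40.0]    # hypothetical replicate weights (3 shown)

base = class_base_weight(program_weight, total_classes=12, fielded_classes=6)
retained_am_pm = class_base_weight(program_weight, 12, 6, factor=2.0)
replicate_bases = [class_base_weight(rw, 12, 6) for rw in program_replicates]
```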
A2. Cross-sectional Fall 2000 Class Weights
Of the 307 sampled classes that were fielded in fall 2000, 279 were eligible
and had complete classroom data, 8 were eligible but didn’t complete
data collection, and 20 were discovered to be ineligible. A class nonresponse
adjustment factor was applied to the class base weights of the 279. The
nonresponse adjustment factor was computed separately by program. Both
the 8 incomplete and 20 ineligible classes were given a zero final class
weight. The classroom replicate base weights were also adjusted for nonresponse
by program, so that the sampling variability in the nonresponse adjustments
will be reflected in the standard error estimates.
The sum of the nonresponse-adjusted fall 2000 classroom weights is 34,638. The unweighted and weighted completion rates are both 97%, excluding ineligibles from both numerator and denominator. The unweighted and weighted eligibility rates are both 94%. The class base weight was used in calculating the weighted rates.
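The text does not spell out the nonresponse adjustment factor. A common choice, assumed in the sketch below, is the ratio of the summed base weights of all eligible units to the summed base weights of the eligible completes within each adjustment cell (here, the program). The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def nonresponse_adjust(units):
    """Return final weights after a weighting-class nonresponse adjustment.

    units: list of (cell, base_weight, status) with status in
    {"complete", "incomplete", "ineligible"}.
    Assumed factor per cell: sum of eligible base weights divided by
    sum of complete base weights.
    """
    eligible = defaultdict(float)
    complete = defaultdict(float)
    for cell, w, status in units:
        if status != "ineligible":
            eligible[cell] += w
        if status == "complete":
            complete[cell] += w
    final = []
    for cell, w, status in units:
        if status == "complete":
            final.append(w * eligible[cell] / complete[cell])
        else:
            final.append(0.0)           # incompletes and ineligibles get zero
    return final


classes = [
    ("P1", 10.0, "complete"),
    ("P1", 10.0, "complete"),
    ("P1", 10.0, "incomplete"),
    ("P1", 10.0, "ineligible"),
    ("P2", 8.0, "complete"),
]
final_weights = nonresponse_adjust(classes)
```

Under this assumption the adjusted weights of the completes in each cell sum to the base weights of all eligible units in that cell. Applying the same function to each set of replicate base weights carries the variability of the adjustment into the standard errors.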
A3. Longitudinal Fall 2000 - Spring 2001 Class Weights
Of the 286 eligible classes in fall 2000, 280 completed data collection
in spring 2001. Note that the 279 fall 2000 classroom completes are not
a subset of the 280 spring 2001 completes. Five classes that completed
fall 2000 data collection did not complete the spring 2001 collection, and six classes
that completed spring 2001 data collection did not complete the fall 2000 collection.
There were 79 new classes added in spring 2001 because children who switched
classes after the fall 2000 data collection were followed to the new class.
However, no classroom observations were done at these new classes, so
they were not considered to be part of the classroom sample and were assigned
a zero base weight.
A class nonresponse adjustment factor was applied to the class base weights of the 280 eligible completes. The nonresponse adjustment factor was computed separately by program. The classroom replicate base weights were also adjusted for nonresponse by program, so that the sampling variability in the nonresponse adjustments will be reflected in the standard error estimates. The incomplete and ineligible classes, along with the 79 new classes, were given a zero final class weight.
The sum of the nonresponse-adjusted fall 2000–spring 2001 classroom weights is 34,768. The unweighted and weighted completion rates are both 98%, excluding ineligibles from both numerator and denominator. Both unweighted and weighted eligibility rates are 94%. The class base weight was used in calculating the weighted rates.
B. Child Weights
Two sets of child weights were produced: a cross-sectional set for fall 2000 estimates, and a fall 2000 – spring 2001 set for longitudinal analysis. Child base weights were first created that reflected the overall probability of selection for the child, including the program and classroom stages of sampling. These base weights were adjusted for child nonresponse, using the following criteria for a complete child case:
Fall 2000 cross-sectional analysis: a child is considered a complete case if the child has a parent interview from either fall 2000 or spring 2001, and a fall 2000 child assessment or teacher rating.
Fall 2000-Spring 2001 longitudinal analysis: a child is considered a complete case if the child has either a fall 2000 or spring 2001 parent interview, and one of the following data pairs: a child assessment for both fall 2000 and spring 2001, or a teacher rating for both fall 2000 and spring 2001.
B1. Child Base Weights
In 286 eligible fall 2000 classes, all eligible children in their first
year of Head Start were taken into the sample with certainty. A base weight
was created for each child as the product of their program weight and
nonresponse-adjusted classroom weight. Note that these nonresponse adjusted
class weights are not the same as those described earlier, which were
designed for use in classroom level analysis. The creation of special
classroom weights for the child weights was necessary because there were
eligible classrooms that did not have complete classroom observations,
but did allow their children to be sampled, and vice versa. To create
this special classroom weight, the classroom base weight was adjusted
for classes that had eligible children but where “sampling”
of children did not take place. This nonresponse-adjusted classroom weight
was then used in calculating the child base weight. Since there was no
subsampling of children within classrooms, the within-classroom child
weight is equal to one, and the overall child weight can be written as:
Child Base Weight = Program Weight * Nonresponse-adjusted Classroom Weight.
A set of 43 jackknife (JKn) replicate base weights was also created for each child using the program replicate weights and the special full-sample nonresponse-adjusted classroom weight:
Child Replicate Base Weight j = Program Replicate Weight j * Nonresponse-adjusted Classroom Weight; j = 1, 2, …43.
B2. Child Fall 2000 Cross-Sectional Weights
Of the 3,100 children in the fall 2000 sample, 2,535 were considered completes
for the fall 2000 data collection, 251 were eligible but incomplete (30
of these had assessments but no parent interview), and 314 were ineligible.
Children could be ineligible if they came from classrooms that were ineligible,
or they were discovered to be in their second year of Head Start, or were
otherwise ineligible when fall 2000 data collection began.
The child base weights of the eligible, complete children in each classroom were adjusted for nonresponse separately by classroom. The ineligible and incomplete children were given a zero final child weight and were dropped from the sample for the spring 2001 data collection. The replicate child base weights were also adjusted for nonresponse by classroom, so that the sampling variability in the nonresponse adjustments will be reflected in the standard error estimates.
The sum of the nonresponse-adjusted fall 2000 child weights is 337,247. The unweighted and weighted completion rates are both 91%, excluding ineligibles from both the numerator and denominator. The unweighted and weighted eligibility rates are 90% and 91%, respectively. The child base weight was used in calculating the weighted rates.
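The unweighted rates can be reproduced from the counts of complete, incomplete, and ineligible children given above. The sketch below is illustrative: the function name is hypothetical, and passing the child base weights in place of the unit weights would yield the weighted versions, which cannot be reproduced here because the weights themselves are not listed.

```python
def completion_and_eligibility(cases):
    """cases: list of (status, weight); returns (completion, eligibility) rates.

    Completion excludes ineligibles from both numerator and denominator.
    """
    total = sum(w for _, w in cases)
    eligible = sum(w for s, w in cases if s != "ineligible")
    complete = sum(w for s, w in cases if s == "complete")
    return complete / eligible, eligible / total


# Unweighted rates: every child counts once (weight 1.0).
counts = {"complete": 2535, "incomplete": 251, "ineligible": 314}
cases = [(status, 1.0) for status, n in counts.items() for _ in range(n)]
completion, eligibility = completion_and_eligibility(cases)
```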
B3. Child Fall 2000-Spring 2001 Longitudinal Weights
In spring 2001 the eligible first year children were again assessed and rated
by their teachers, and an attempt was made to interview each child’s
parent(s). Of the 2,535 eligible children who had completed fall 2000
data collection, 2,359 were eligible, complete cases for the fall 2000
– spring 2001 data collection; 171 were eligible, incompletes; and
five became ineligible because they moved out of the area.
Children who had switched to new classes in the spring 2001 were followed up, but classroom observations were not done at the new classes. There were 91 children from the fall 2000 sample who were followed to 79 new classrooms in spring 2001. In calculating their base weights, these children were given the classroom probability of selection associated with the classroom from which they were originally sampled in fall 2000.
The child base weights of the eligible, complete children in each classroom were adjusted for nonresponse separately by classroom. The ineligible and incomplete children were given a zero final child weight. The replicate child base weights were also adjusted for nonresponse by classroom, so that the sampling variability in the nonresponse adjustments will be reflected in the standard error estimates.
The sum of the nonresponse-adjusted fall 2000–spring 2001 child weights is 338,047. The unweighted and weighted conditional spring 2001 completion rates are both 93%. The conditional rate is the percent of fall 2000 eligible completes who also completed the spring 2001 data collection. The overall (unconditional) completion rate is the product of the completion rates for the fall 2000 and spring 2001 data collections: 91% * 93% = 85%. This rate is the percent of eligible, sampled children in fall 2000 who completed the spring 2001 data collection.
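The same overall rate can be approximated from the unweighted counts reported in this section. This is a rough check only; the variable names are illustrative, and the published figures multiply the rounded rates.

```python
# Fall 2000: completes over eligible children (completes + incompletes).
fall_rate = 2535 / (2535 + 251)

# Spring 2001, conditional on a fall 2000 complete; the five children who
# became ineligible are excluded from the denominator.
spring_conditional = 2359 / (2359 + 171)

# Overall (unconditional) completion rate is the product of the two.
overall = fall_rate * spring_conditional
```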
IV. Data Collection Instruments
A. Direct Child Assessment
A1. Peabody Picture Vocabulary Test - Third Edition - Revised
The Peabody Picture Vocabulary Test (PPVT-III) (Dunn & Dunn, 1997)
is designed to assess children’s knowledge of the meaning of words
by asking them to say or indicate by pointing which of four pictures best
shows the meaning of a word that is said aloud by the assessor. A series
of words is presented, ranging from easy to difficult for children of
a given age, each accompanied by a picture plate consisting of four line
drawings. The test requires about 10 minutes to administer. It is suitable
for a wide range of ages from 2 ½ through adulthood and has established
age norms based on a national sample of 2,725 children and adults tested
at 240 sites across the U.S.
The PPVT-III has been extensively revised from earlier versions of the test to promote easier testing and more accurate scoring. New drawings have been added and dated illustrations dropped to achieve better gender and ethnic balance. Individual test items that showed statistical bias by race or ethnicity, gender, or region were deleted from the item pool for the scale prior to standardization. The PPVT-III was highly reliable in the FACES data, with internal-consistency reliability (alpha) coefficients of .96 for fall 2000 and .97 for spring 2001.
A Spanish-language test, the Test de Vocabulario en Imagenes Peabody (TVIP), is also available, but has not been updated to be directly comparable to the PPVT-III. For FACES, the TVIP was used with children whose primary language was Spanish.
A screener was used to determine whether English-language learners were to be administered the direct child assessment battery in English. The screener drew on information provided by teachers and assessors to determine the language of administration. In fall 2000, English-language learners who were determined to be primarily Spanish-speaking received the entire direct child assessment battery in Spanish (e.g., TVIP, Woodcock-Muñoz Letter-Word Identification, Applied Problems, Dictation). They were also administered the PPVT and Woodcock-Johnson Letter-Word Identification in English. In spring 2001, these same children received the entire direct child assessment battery in English. They were also administered the TVIP and Woodcock-Muñoz Letter-Word Identification in Spanish for the purpose of comparison. In fall 2000, English-language learners who were determined to primarily speak a language other than Spanish did not receive any portion of the direct child assessment battery in their native languages. In spring 2001, these same children received the entire direct child assessment battery in English.
A2. Woodcock-Johnson Psycho-Educational Battery - Revised
The updated edition of the Woodcock-Johnson Battery (WJ-R) is a carefully
constructed and widely used test battery. The set of individually administered
tests is designed to assess the intellectual and academic development
of individuals from preschool through adulthood (Woodcock & Johnson,
1989; Salvia & Ysseldyke, 1991). FACES used three subtests from the
Achievement Battery that together constitute an “Early Development—Skills”
cluster, according to the test developers. The cluster is comprised of
the Letter-Word Identification, Applied Problems, and Dictation tests.
The same three subtests of the Spanish version (Woodcock-Muñoz
Pruebas de Aprovechamiento-Revisada) were used in the Spanish version
of the FACES assessment battery.
Letter-Word Identification. The first five Letter-Word Identification items involve symbolic learning, or the ability to match a rebus (pictographic representation of a word) with an actual picture of the object. The remaining items measure children’s skills in identifying isolated letters and words that appear in large type on the pages of the test book. As well as being part of the Early Development cluster, this subtest is also part of the Basic Reading Skills cluster. The internal consistency of the Letter-Word Identification subtest with FACES children averaged .84 for fall 2000 and .86 for spring 2001.
Letter Naming. The Letter Naming task is a test developed for use in the Head Start Quality Research Centers curricular intervention studies. Children are shown all 26 upper-case letters of the alphabet, divided into three groups of 8, 9, and 9 letters, arranged in approximate order of item difficulty. They are asked to identify the letters they know by name. The task has the virtue of providing specific numeric information about how many letters Head Start children learn and which ones they are more or less likely to acquire. It complements the Woodcock-Johnson Letter-Word Identification task by providing further information on children’s knowledge and awareness of letters, which is an essential prerequisite to learning how to read.
Applied Problems. This subtest measures children’s skill in analyzing and solving practical problems in mathematics. In order to solve the problems, the child must recognize the procedure to be followed and then perform relatively simple counting, addition, or subtraction operations. Because many of the problems include extraneous stimuli or information, the child must also decide which data to include in the count or calculation. As well as being part of the Early Development cluster, the subtest is also part of a Broad Mathematics cluster. The internal consistency of the Applied Problems subtest with FACES children averaged .90 for fall 2000 and .91 for spring 2001.
Dictation. The first six items in this subtest measure prewriting skills, such as drawing lines and copying letters. The remaining items measure the child’s skill in providing written responses when asked to write specific upper- or lower-case letters of the alphabet. Later parts of the test ask the child to write specific words and phrases and assess punctuation and capitalization. The internal consistency of the Dictation subtest with FACES children averaged .77 for both fall 2000 and spring 2001.
A3. McCarthy Scales of Children’s Abilities
The McCarthy Scales of Children’s Abilities is a widely used and
well-documented test battery. FACES employed one subtest from the battery,
the Draw-A-Design Task. The Draw-A-Design Task was used to assess children’s
perceptual-motor skills. This task asks the child to draw copies of a
series of increasingly complex geometric figures. For FACES, this task
was directly translated as part of the Spanish version of the assessment.
In the FACES data, the Draw-A-Design Task had internal-consistency
reliability (alpha) coefficients of .58 for fall 2000 and .68
for spring 2001.
A4. Story and Print Concepts
The Story and Print Concepts task was an adaptation of earlier prereading
assessment procedures developed by Marie Clay (1979), William Teale (1988,
1990), and Mason and Stewart (1989). In these procedures, a child is handed
a children’s storybook (FACES Battery - Where’s My Teddy?
(Alborough, 1992) or ¿Dónde Está Mi Osito?
(Alborough, Castro, Trans. 1992)) upside down and backwards. The assessor
asks a series of questions designed to test the children’s knowledge
of books. These include questions regarding the location of the front
of the book, the point at which one should begin reading, and information
relating to the title and author of the book. The assessor reads the story
to the child and asks basic questions about both the mechanics (print
conventions) of reading and the content (comprehension) of the story.
The print conventions questions pertain to children’s knowledge
of the left-to-right and up-and-down conventions of reading, while the
comprehension questions pertain to children’s recall of key facts
from the story. Additionally, for FACES, questions were added tapping
rhyming awareness (e.g., “I’ll say some words from the story
and you tell me whether they rhyme, OK - bawl and small, etc.”)
and phonological awareness (e.g., “What word would be left if I
took “teh” away from Ted?”). These additions were only
included in the fall 2000 direct child assessment battery. FACES reliabilities
(internal consistencies) for these concepts for fall 2000 and spring 2001,
respectively, were as follows: (1) Book Knowledge (.57 and .59); (2) Print Conventions
(.73 and .74); and (3) Comprehension (.43 and .41).
A5. Social Awareness
This measure was adapted from a subtest of the Comprehensive Assessment
Program (CAP) Early Childhood Diagnostic Instrument used by Snow et al.
(1995) among others to test children’s general knowledge and awareness
of the social environment. The child is asked to give his/her “full
name,” which includes both first and last name, his/her age (either
verbally, which is given full credit, or by holding up the correct number
of fingers, which is given partial credit), and month/day of birth. The
FACES reliabilities for the Social Awareness measure were .63 for fall
2000 and .61 for spring 2001.
A6. Color Names and One-to-One Counting
This was also a subtest of the CAP Early Childhood Diagnostic Instrument
used by Snow et al. (1995) and developed by Marie Clay (1979), William
Teale (1988, 1990) and Mason and Stewart (1989) as a battery of emergent
literacy and school readiness measures. For the FACES battery, 10 teddy
bears of different colors are presented randomly arranged on a page and
the child is asked to point to each in turn and name the color. Following
the Color Names task, the child is asked to count the bears and the assessor
marks the final number the child arrives at when finished counting (correct
answer is “10”). After this, the child is asked to report
the total number of bears. The verbatim response is then recorded. Following
these questions, the assessor must rate the child’s one-to-one counting
performance using a 5-point scale. At the extremes, a score of 5 indicated
that the child made no mistakes and a score of 1 indicated that the child
could not count or did not try to count. The FACES reliabilities for the
Color Names task were .95 for fall 2000 and .94 for spring 2001.
A7. Leiter International Performance Scale - Revised (Leiter-R) - Attention Sustained
The Leiter-R by Roid and Miller (1997) assesses cognitive function in
children and adolescents. The battery includes measures of nonverbal intelligence
in fluid reasoning and visualization, as well as appraisals of visuospatial
memory and attention. In spring 2001, the Leiter-R AS (Attention Sustained)
Subtest was added to the FACES direct child assessment battery to permit
assessments of children’s visuospatial memory and attention. The
subtest is primarily nonverbal and is administered in two subsections—the
first being for those 2-3 years of age and the second being for those
4-5 years of age. Assessors provide minimal instructions throughout the
administration of the Leiter-R AS. Children are presented with a series
of pages containing pictures and are instructed to mark off all pictures
that resemble a reference picture. The assessor times the child, with
times ranging from 30 seconds to 120 seconds allotted for completion of
the tasks. FACES reliabilities for the Leiter Attention Sustained subtest
by age group for spring 2001 were as follows: (1) 2-3 year olds,
.71; and (2) 4-5 year olds, .81.
A8. Interviewer Ratings
At the end of the one-on-one testing sessions with the children, the assessor
completes a set of rating scales evaluating the child’s behavior
in the test situation, including the child’s approaches to learning
and problem behaviors. There are two sections to these ratings. The first
consists of eight scales rating the child’s response during the
assessment on eight different domains: task persistence, attention span,
body movement, attention to directions, comprehension of directions, verbalization,
ease of relationship, and the child’s level of confidence. Ratings
use 4-point scales with descriptive anchors at each point. For example,
the “task persistence” scale consists of the following anchor
points: persists with task (4), attempts task briefly (3), attempts task
after much encouragement (2), refuses (1). The FACES reliabilities for
the Interviewer Ratings were .82 for fall 2000 and .81 for spring 2001.
The second section asks the assessor to indicate any special concerns regarding the child’s ability to complete the assessment: responding nonverbally, using nonstandard English such as dialect, speaking English as a second language, having limited English proficiency, experiencing difficulty hearing or seeing the assessor/test materials, or having speech that was difficult to understand. These items use 3-point ratings to indicate the degree to which the child displayed any of these characteristics (i.e., “not at all,” “somewhat,” and “very much”).
A9. Kindergarten Follow-Up ECLS-K Measures
Two additional measures were included in the follow-up kindergarten assessment
battery (spring 2002): the Reading scale and the General Knowledge scale,
which were adapted from the Early Childhood Longitudinal Study - Kindergarten
Cohort (ECLS-K).
In ECLS-K, the Reading scale taps a variety of skills that indicate reading ability, including familiarity with print, recognition of letters and phonemes, vocabulary, and reading comprehension (e.g., children’s understanding of the text, as well as their personal reflection on and critical evaluation of it). The General Knowledge scale taps skills in the natural sciences (e.g., children’s conceptual understanding of why things occur as they do, and their ability to pose questions and investigate answers) and social studies (e.g., their basic knowledge of history, government, and culture). Both scales follow the guidelines of the 1996 National Assessment of Educational Progress, have been reviewed by curriculum experts and elementary school teachers, and have been found to be both reliable and valid measures of reading achievement and basic knowledge acquisition.29
The Reading assessment was administered in two stages. First, a routing test was administered to estimate the child’s reading ability. Based on his/her performance on the routing test (either “high,” “medium,” or “low”), an appropriate “second stage” test was administered. The Reading assessment had three levels of second stage tests: low (red), medium (yellow), and high (blue). For the General Knowledge assessment, each child was administered only the routing test. Estimates of reliability with FACES data, as measured by Cronbach’s coefficient alpha, will be provided at a later point when the data become available.
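The two-stage administration described above can be sketched in a few lines of Python. This is only an illustration: the cut points `LOW_MAX` and `MEDIUM_MAX` are hypothetical placeholders, since the actual ECLS-K routing cutoffs are not given in this appendix. Only the three levels and their form colors come from the text.

```python
# Hypothetical cut points on the routing-test raw score (assumptions,
# not the actual ECLS-K cutoffs, which this appendix does not report).
LOW_MAX = 5
MEDIUM_MAX = 11

# Second-stage form colors as named in the text.
SECOND_STAGE_FORM = {"low": "red", "medium": "yellow", "high": "blue"}


def route(routing_score: int) -> str:
    """Classify routing-test performance as 'low', 'medium', or 'high'."""
    if routing_score < 0:
        raise ValueError("routing score cannot be negative")
    if routing_score <= LOW_MAX:
        return "low"
    if routing_score <= MEDIUM_MAX:
        return "medium"
    return "high"


def second_stage_form(routing_score: int) -> str:
    """Return the color of the second-stage Reading form to administer."""
    return SECOND_STAGE_FORM[route(routing_score)]
```

Under these assumed cutoffs, a child with a routing score of 6 would receive the medium (yellow) form. For the General Knowledge assessment no second-stage lookup is needed, because only the routing test was administered.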
B. Classroom Observation Instruments
In FACES, two distinctive types of observation instruments (i.e., classroom observation and child observation) were used to measure peer interactions, friendships of children, and the extent to which Head Start programs employed skilled teachers and provided developmentally appropriate environments and curricula for their children.
B1. Counts of Children and Adults
The Counts of Children and Adults provide information needed to calculate
child/adult ratios and for other calculations to be used in assessing
specific measures of classroom quality. Classroom observers are tasked
with counting the number of children (boys and girls), the number of paid
staff, and the number of adult volunteers at two separate time periods
during the classroom observation. The two counts must be at least an hour
apart and must involve one structured (teacher-directed) activity and
one unstructured activity.
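As a rough illustration of how the two counts might feed a child/adult ratio, the Python sketch below checks the two stated rules (counts at least an hour apart, one structured and one unstructured activity) and then averages the counts. The averaging rule, the inclusion of volunteers as adults, and all field names are assumptions made for the example, not the FACES scoring procedure.

```python
from dataclasses import dataclass


@dataclass
class Count:
    minutes_into_observation: int  # when the count was taken
    structured: bool               # True for a teacher-directed activity
    boys: int
    girls: int
    paid_staff: int
    volunteers: int

    @property
    def children(self) -> int:
        return self.boys + self.girls

    @property
    def adults(self) -> int:
        # Assumption: volunteers are counted as adults alongside paid staff.
        return self.paid_staff + self.volunteers


def counts_are_valid(first: Count, second: Count) -> bool:
    """Counts must be at least an hour apart and cover one structured
    and one unstructured activity."""
    gap = abs(second.minutes_into_observation - first.minutes_into_observation)
    return gap >= 60 and first.structured != second.structured


def child_adult_ratio(first: Count, second: Count) -> float:
    """Average the two counts, then divide children by adults (assumed rule)."""
    if not counts_are_valid(first, second):
        raise ValueError("counts do not satisfy the observation rules")
    children = (first.children + second.children) / 2
    adults = (first.adults + second.adults) / 2
    if adults == 0:
        raise ValueError("no adults counted")
    return children / adults
```

For example, counts of 17 children with 2 adults and 15 children with 3 adults average to 16 children per 2.5 adults, a ratio of 6.4.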
B2. Assessment Profile
The Assessment Profile (Abbott-Shim & Sibley, 1987) is a structured
observation guide designed to provide a quantitative assessment of classrooms
and teaching practices that facilitate the learning and development of
children. Three subscales were used in FACES: Scheduling, Learning Environment,
and Individualizing.
The Scheduling subscale assesses the written plans for classroom scheduling and how classroom activities are implemented. The appropriateness and completeness of the classroom activity plan are also noted. The subscale also assesses the balance and variety of learning contexts (e.g., individual, small group, and large group) and learning opportunities (i.e., child- vs. teacher-directed and active vs. quiet activities). The 14 observation items are scored in a yes/no format. High scores on this measure are indicative of a teacher that uses a “planful” approach to classroom activities. The reliability of the Scheduling subscale was reported as .89 for fall 2000 and .87 for spring 2001.
The Learning Environment subscale focuses on the accessibility of a variety of learning materials to children in the classroom. Variety is assessed across various conceptual areas, such as science, math, language, fine motor, etc. and also within each conceptual area. The subscale also assesses how classroom space is arranged to determine whether the classroom encourages independence (e.g., whether the learning materials are located on low shelves and clearly labeled) and reflects the child as an individual. When materials are both available and accessible, and in sufficient numbers (typically a minimum of three in each group) the item is given a positive score. High scores on this 7-item measure indicate a “learning rich” environment, filled with toys and learning materials that address a variety of developmental domains. The reliability of the Learning Environment subscale was reported as .68 for fall 2000 and .77 for spring 2001.
The Individualizing subscale focuses on the extent to which emphasis is placed on children, individually, in the classroom setting. This includes whether or not there are periodic individual assessments of each child’s performance using portfolios of his/her work, performance inventories, and teacher notations. Also included is whether or not child assessment information is used for planning individualized learning experiences. The final inclusion involves whether or not teachers have the ability to make provisions for children with special needs. The reliability of the Individualizing subscale was reported as .50 for fall 2000 and .54 for spring 2001.
B3. Early Childhood Environment Rating Scale-Revised (ECERS-R)
The Early Childhood Environment Rating Scale-Revised (ECERS-R) is a global
rating of classroom quality based on structural features of the classroom
(Harms & Clifford, 1980). It has been widely used in child development
research and has predicted optimal child outcomes in a number of studies
(e.g., Phillips, Voran, Kisker, Howes, & Whitebook, 1994). The revised
version of the ECERS improves both the items and the standardization of
the observational methods. In addition, observers are easier to train on
the ECERS-R, and inter-rater reliability is easier to achieve.
The ECERS-R contains 37 items representative of classroom quality. Each
item is coded on a 7-point scale with a score of 1 representing “inadequate,”
a score of 3 representing “minimal quality,” a score of 5
representing “good quality,” and a score of 7 representing
“excellent quality.” The internal consistency of the ECERS-R
mean score for all combined items was .92 for both fall 2000 and spring
2001.
Seven subscales were derived from the ECERS-R for use in the analysis of FACES classroom quality, each pertaining to a different element of classroom quality:
1.) Personal Care Routines (six items): greeting/departing; meals/snacks; nap/rest; toileting/diapering; health practices; and safety practices.
2.) Furnishings (four items): indoor space; furniture for routine care, play, and learning; furniture for relaxation and comfort; and room arrangement for play.
3.) Language Skills (four items): books and pictures; encouraging children to communicate; using language to develop reasoning skills; and informal use of language.
4.) Motor Skills (four items): space for gross motor play; gross motor equipment; fine motor activities; and supervision of gross motor activities.
5.) Creativity (six items): child-related display; art; music/movement; blocks; sand/water; and dramatic play.
6.) Social Skills (four items): supervision other than gross motor activity; discipline; staff-child interactions; and interactions among children.
7.) Program Structure (four items): space for privacy; schedule; free play; and group time.
Five items were not incorporated into any of the subscales: nature/science; math/numbers; use of TV, video, and/or computers; promoting acceptance of diversity; and provisions for children with disabilities. Thus, only 32 of the 37 available items were included in the subscales.
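The item-to-subscale assignment above can be written down as a simple lookup, which also makes the 32-of-37 item count easy to check. The Python sketch below is illustrative only. The item labels are taken from the text, but scoring each subscale as the mean of its 1-7 item ratings is an assumption, and the function name is hypothetical.

```python
# Item labels follow the text; the grouping reproduces the seven subscales.
SUBSCALES = {
    "Personal Care Routines": [
        "greeting/departing", "meals/snacks", "nap/rest",
        "toileting/diapering", "health practices", "safety practices",
    ],
    "Furnishings": [
        "indoor space", "furniture for routine care, play, and learning",
        "furniture for relaxation and comfort", "room arrangement for play",
    ],
    "Language Skills": [
        "books and pictures", "encouraging children to communicate",
        "using language to develop reasoning skills", "informal use of language",
    ],
    "Motor Skills": [
        "space for gross motor play", "gross motor equipment",
        "fine motor activities", "supervision of gross motor activities",
    ],
    "Creativity": [
        "child-related display", "art", "music/movement",
        "blocks", "sand/water", "dramatic play",
    ],
    "Social Skills": [
        "supervision other than gross motor activity", "discipline",
        "staff-child interactions", "interactions among children",
    ],
    "Program Structure": ["space for privacy", "schedule", "free play", "group time"],
}

# Items that belong to no subscale.
UNASSIGNED_ITEMS = [
    "nature/science", "math/numbers", "use of TV, video, and/or computers",
    "promoting acceptance of diversity", "provisions for children with disabilities",
]


def subscale_means(ratings: dict) -> dict:
    """Mean 1-7 rating per subscale (assumed scoring rule); unrated items are skipped."""
    means = {}
    for name, items in SUBSCALES.items():
        scored = [ratings[item] for item in items if item in ratings]
        if scored:
            means[name] = sum(scored) / len(scored)
    return means
```

Summing the list lengths gives 32 subscale items, and adding the five unassigned items gives the 37 items of the full scale.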
A separate subscale, labeled ECERS-R Language, consisted of four items and was devised to assess the quality of the language environment in Head Start classrooms. Additional information about this subscale can be found in Chapter 4.
B4. Classroom Observation of Teacher-Directed Activities
The Classroom Observation of Teacher-Directed Activities is a checklist
on which classroom observers record teacher-directed activities in 21
specific areas (e.g., reading stories, singing songs). The classroom
observer indicates whether observed activities were directed toward individual
children (Individual Attention), a small group of children (Small Group
= 3 to 8 children), or a whole group of children (Whole Group = entire
classroom). Observers were instructed to mark each item only once, noting
any teacher-directed activity observed during the classroom observation
and whether it was directed toward individuals, a small group of children,
or the entire classroom. This checklist was introduced in spring 2001.
B5. Arnett Caregiver Interaction Scale
The Arnett Caregiver Interaction Scale is a rating scale of teacher behavior
towards the children in the classroom. It consists of 26 items that assess
five areas of teacher behavior: sensitivity, punitiveness, detachment,
permissiveness, and prosocial interaction (Arnett, 1989). The version
of the Arnett Caregiver Interaction Scale utilized in the current round
of FACES consists of 30 items and five subscales with the subscale labels
being as follows: Sensitivity, Harshness, Detachment, Permissiveness,
and Independence. At the end of the observational period, the observer
completes the scale for an individual teacher, typically the lead teacher
in the classroom. For example, in evaluating whether the teacher “speaks
warmly to the children,” the observer will assign ratings indicating
the extent to which the statement is characteristic of the teacher, from
1 “never seen” to 4 “always or almost always.”
The Cronbach Coefficient Alpha for all of the items was .94 for fall 2000
and .69 for spring 2001.
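Cronbach's coefficient alpha, the internal-consistency statistic reported here and throughout this appendix, is computed from the number of items k, the variance of each item, and the variance of the summed score: alpha = k/(k-1) x (1 - sum of item variances / variance of total). The Python sketch below shows the standard formula on complete cases. It is a minimal illustration, not the software used in FACES, and how FACES handled missing item responses is not stated here.

```python
from statistics import variance


def cronbach_alpha(scores: list) -> float:
    """Cronbach's alpha for a cases-by-items table of scores.

    scores: one row per case, one column per item; complete cases only.
    """
    if len(scores) < 2:
        raise ValueError("need at least two cases")
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every case must have a score on every item")

    # Sample variance of each item across cases.
    item_variances = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of each case's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)
```

When items move together perfectly, alpha equals 1. As items become less related to one another, the item variances account for more of the total variance and alpha falls.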
C. Teacher’s Child Ratings and Teacher Background
Teacher ratings of children were important sources of information about children’s learning and behavior because teachers see children over extended periods of time and in a variety of settings. Using a rating form known as the Teacher’s Child Report (TCR), teachers were first asked to rate each child on a set of behaviors that assessed the child’s basic social skills and classroom behavior. In these two sections, the teacher is asked to indicate the extent to which a given statement (e.g., “follows the teacher’s directions”) is characteristic of the child, from 1 “never” to 3 “very often.” The items making up these ratings form two scales:
C1. Cooperative classroom behavior:
There are 12 ratings items for the teacher to indicate how often the
child engages in cooperative classroom behaviors such as following
teacher’s directions, helping put things away, complimenting
classmates, and following rules when playing games. The ratings include
items drawn from the Personal Maturity Scale (Alexander & Entwisle,
1988) and the Social Skills Rating System (Elliott, Gresham, Freeman,
& McCloskey, 1988) to assess positive behavior such as cooperation,
sharing, and expression of feelings. A summary score is created from
the 3-point scale items which ranges from zero to 24, with high scores
indicating more frequent cooperative behavior. The internal consistency
for this measure was .88 in both Fall 2000 and Spring 2001.
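A summary score of zero to 24 from 12 items rated 1 to 3 implies that each rating is shifted so that the lowest category counts as zero before summing. That recoding is an inference from the stated ranges rather than something the text spells out, so the Python sketch below should be read as one plausible scoring rule. The function name is hypothetical.

```python
def summary_score(ratings: list, low: int = 1, high: int = 3) -> int:
    """Sum item ratings after shifting them so the lowest rating scores zero.

    Assumes every item is answered; the handling of missing items
    is not described in the text.
    """
    for rating in ratings:
        if not low <= rating <= high:
            raise ValueError(f"rating {rating} is outside {low}-{high}")
    return sum(rating - low for rating in ratings)
```

With 12 items this yields scores from 0 (all "never") to 24 (all "very often"). The same rule applied to 14 items gives a range of 0 to 28.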
C2. Total behavior problems:
The Behavior Problems scale is based on measures of negative child behaviors
that are associated with learning problems and later grade retention.
Items come from an abbreviated adaptation of the Personal Maturity Scale
(Alexander & Entwisle, 1988), the Child Behavior Checklist for Preschool-Aged
Children, Teacher Report (Achenbach, Edelbrock, & Howell, 1987) and
The Behavior Problems Index (Zill, 1990). The items ask about the frequency
of aggressive behavior (e.g., hits/fights with others), hyperactive behavior
(e.g., is very restless), and anxious or depressed and withdrawn behavior
(e.g., is unhappy). The summary score from the scale’s 14 behavior
items ranges from zero to 28, with higher scores representing more frequent
or severe negative behavior. The reliabilities (internal consistency)
for these measures in fall 2000 and spring 2001, respectively, are as follows:
1.) Total Problem Behaviors - .86 for both; 2.) Aggression - .83 and .85; 3.) Hyperactivity - .72 for both; and 4.) Anxious or Depressed and Withdrawn - .77 and .76.
The teacher is then asked to rate the child’s problem solving skills and initiative, social relationships, creative representations, music/movement skills, and language/math skills. The teacher is asked to rate the child’s highest level of behavior in each of the above domains observed in the past week. Scale points for each item are described on paper and there is a glossary that provides concrete examples of each anchor point. For the purpose of FACES, fourteen items from the Child Observation Record (COR; High/Scope Educational Research Foundation, 1992) were selected, with a demonstrated reliability of .94 for both fall 2000 and spring 2001. These 14 items were further divided into the following scales: social relationships, creative representations, music and movement, and cognitive.
C3. Social Relationships (3 items):
A composite score based on teacher’s ratings of how well the child
makes friends, works with other children, and understands and expresses
feelings. Each item is rated on a five-point scale with higher scores
representing greater skill in coping with social situations and expressing
feelings appropriately. The summary score is the average of the three
items and ranges from one to five. The measure shows good reliability
with the FACES study, with Alpha Coefficients of .83 for both fall 2000
and spring 2001.
C4. Creative Representations (3 items):
A composite score based on the teacher’s ratings of how well
the child uses creative materials for self-expression in making and
building things, drawing and painting, and engaging in pretend play.
Each item is rated on a five-point scale with higher scores representing
greater proficiency. The summary score is the average of the three
items and ranges from one to five. The measure shows good reliability
with the FACES study, with Alpha Coefficients of .80 for fall 2000
and .81 for spring 2001.
C5. Music and Movement (4 items):
A composite score based on teacher’s ratings of how well the child
can imitate movements to a steady beat, follow music and movement directions,
exhibit body coordination, and manipulate small objects and perform precise
actions. Each item is rated on a five-point scale with higher scores representing
greater proficiency. The summary score is the average of the four items
and ranges from one to five. The measure shows good reliability with the
FACES study, with Alpha Coefficients of .88 for both fall 2000 and spring
2001.
C6. Cognitive (4 items):
A composite score based on teacher’s ratings of how well the child
can solve problems, engage in complex play, show interest in reading,
and exhibit classification skills by sorting objects. Each item is rated
on a five-point scale with higher scores representing greater proficiency.
The summary score is the average of the four items and ranges from one
to five. The measure shows good reliability with the FACES study, with
Alpha Coefficients of .82 for fall 2000 and .83 for spring 2001.
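The four COR composites described above (C3 to C6) share one scoring rule: the summary score is the average of the items, each rated from one to five. A minimal Python sketch follows. The item counts come from the text, while the function name and the requirement that all items be rated are assumptions of the example.

```python
# Number of items in each COR scale, as stated in the text.
COR_SCALES = {
    "Social Relationships": 3,
    "Creative Representations": 3,
    "Music and Movement": 4,
    "Cognitive": 4,
}


def cor_composite(scale: str, item_ratings: list) -> float:
    """Average of the item ratings for one COR scale (range one to five)."""
    expected = COR_SCALES[scale]
    if len(item_ratings) != expected:
        raise ValueError(f"{scale} expects {expected} item ratings")
    if any(not 1 <= rating <= 5 for rating in item_ratings):
        raise ValueError("each rating must be between 1 and 5")
    return sum(item_ratings) / expected
```

The item counts sum to the fourteen COR items selected for FACES, and a child rated 3, 4, and 5 on the Social Relationships items would receive a composite of 4.0.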
The Lead Teacher Background Information consists of questions asking the teacher about himself/herself, including sociodemographic and educational background and professional experience. Information about the curriculum being used, his/her attitude and knowledge about early childhood education practice (see Teacher Beliefs Scale write-up referenced in Chapter 4), and accommodations he/she has made or that others have made to meet the learning needs of children in his/her classroom, particularly children with special needs, are included, as well.
D. Parent Interview
Data from the FACES Parent Interview, administered in fall 2000 and spring 2001, provide Head Start with a comprehensive understanding of the families that they serve, including the characteristics of households and household members, levels and types of participation in the program and in other community services, involvement with their children, and understanding of their children’s development.
Parents were also asked to rate each child on a set of behaviors that assessed the child’s basic social skills and behavior problems. In this section, the parent is asked to indicate the extent to which a given statement (e.g., “makes friends easily”) is characteristic of the child, from 1 “not true” to 3 “very true or often true.” The items making up these ratings were drawn from two well-known measures of children’s positive behavior and behavior problems: the Entwisle scale of Personal Maturity (Entwisle, Alexander, Cadigan, & Pallis, 1987) and the Child Behavior Checklist for Preschool-Aged Children (Achenbach, Edelbrock, & Howell, 1987). Two scales were formed to assess children’s social competence:
D1. Social skills and positive approaches to learning:
Parents were asked to rate their child’s social skills and positive
approaches to learning by describing their children’s skills in
making friends and accepting their ideas, as well as enjoying learning
and trying new things. A summary score based on the scale’s seven
items ranges from zero to 14, with higher scores representing more positive
behavior. Table A-10 shows the reliabilities for the Social Skills measure
in both fall 2000 and spring 2001.
D2. Total Problem Behaviors:
Parents were also asked to rate their children on negative behaviors
that are relatively common among preschool children and that are associated
with adjustment problems in elementary school. Parents were asked
about three domains of problem behavior: hyperactive behavior, aggressive
behavior, and depressed or withdrawn behavior. The 12 behavior items
were combined in a summary score ranging from zero to 24, with higher
scores representing more frequent or severe negative behavior. Table
A-10 shows the reliabilities for all of these behavior problem measures
in both fall 2000 and spring 2001.
D3. Other Parent Interview Scales/Measures Referenced in the Report: (see the chart below)
V. Field Staff Training
A weeklong training was conducted prior to each data collection period to prepare field staff for successful completion of data collection. The training included a wide variety of activities covering all the procedures, techniques, and contents required to carry out successful data collection in the Head Start centers:
- Lecture, incorporating slides, overheads, and videotapes;
- Exercises that simulate various procedures such as assessing classroom scheduling;
- Video demonstration of assessment techniques and components of classroom scoring procedures;
- Exercises to achieve pre-established levels of inter-rater reliability;
- Participatory involvement of all trainees in small groups so that trainers may evaluate individual performance;
- Multiple occasions of practice in real classroom settings that simulate what they are expected to do in the field, with the presence of a trainer and a small group of trainees to discuss the classroom ratings and provide valuable guidance on scoring reliability and agreement; and
- One-on-one practice and role-play in the administration of child assessment procedures under supervision of training staff.
NAMES AND SOURCES FOR OTHER PARENT INTERVIEW SCALES/MEASURES REFERENCED IN THE REPORT

| Name | Source |
|---|---|
| Pearlin Mastery Scale (Locus of Control) | Pearlin, L. I., & Schooler, C. (1978). The structure of coping. Journal of Health and Social Behavior, 19, 2-21. |
| CES-D Depression Scale | Radloff, L. S. (1977). The CES-D: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1, 385-401. |
| Family Activities with Children | National Household Education Survey - FACES Research Team |
| Parental Involvement in Head Start | Head Start Quality Research Consortium (QRC) |
| Exposure to Violence | FACES Research Team |
| Domestic Violence Screener | Feldhaus, K. M., Koziol-McLain, J., Amsbury, H. L., et al. (1997). Accuracy of three brief screening questions for detecting partner violence in the emergency room. JAMA, 277(17), 1357. |
| Substance Abuse Screener | Administration for Children and Families (1997). National Impact Evaluation of the Comprehensive Child Development Program. Washington, D.C.: U.S. Department of Health and Human Services. |
| Involvement with Criminal Justice System | FACES Research Team |
| Parenting Style | National Longitudinal Survey of Youth (NLSY), Early Head Start Evaluation (EHS), QRC |

The field procedures manual contained information about working with a research team, appropriate behaviors within a classroom, and how to orchestrate Head Start center visits. Moreover, the manual covered an overview of all data collection instruments and administrative and travel procedures. Complete scoring rules and question-by-question specifications for the child assessment and child and classroom observation instruments were also discussed in the manual.
During the training, trainees were introduced to the purpose and goals of the study and background information on Head Start. Trainees were also introduced to the data collection materials and general issues regarding children and early childhood learning environments. Each day of training included a morning question and answer period regarding the previous day’s training, a daily review of the current day’s material, and a brief discussion of the next day’s events.
An additional practice session was given to provide trainees with more practice in either observation or assessment. Assignment of this practice was based on the measures in which the trainees needed more practice. For administering child assessments in Spanish, a special training for English-Spanish speaking trainees was held. The bilingual trainees had an opportunity to practice assessments with Spanish-speaking children.
VI. Data Collection Procedures
A. Site Visit Arrangements
The research team obtained feasible dates for the 2-week site visit from each of the sampled Head Start programs. Site visit dates for each program were coordinated within the data collection period and programs were notified about the visit dates. Three weeks before the site visit, a scheduling packet was sent to the on-site coordinator (OSC). The packet contained the final visit schedule, a master list organized by classroom, a reminder list, and a request for maps and directions to aid the research team. OSCs are members of the Head Start program staff specially designated to coordinate the data collection efforts by scheduling parent interviews and classroom visits with program teachers and by obtaining consent forms, among other related duties.
VII. Quality Control Visits
In FACES, Quality Control (QC) visits were built into every step of the data collection to ensure the highest quality data possible. The QC visitors consisted of the FACES project staff who were involved in designing the instruments, preparing the training materials, and conducting the training. The QC visitors were trained in both observation and assessment data collection and also served as technical consultants in the field. During the fall 2000 data collection, one 3-day QC visit to program sites was made.
VIII. Data Preparation & Data File Creation
A. Data Entry
Key entry and verification were performed on the study instruments using a production data entry system. This system provides entry form layout, application of edit specifications, and data verification control, and it produces data entry quality and production reports.
B. Frequency Review
The frequencies of responses to all data items (both individually and in conjunction with related data items) were reviewed to ensure that appropriate skip patterns were followed. Members of the data preparation team checked each item to make sure the correct number of responses was represented for all items. If a discrepancy was discovered, the problem case was identified and reviewed.
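The kind of check described above can be sketched in Python: tabulate the responses to an item, then flag cases where a follow-up item was answered although it should have been skipped, or left blank although it should have been answered. The item names `Q1` and `Q2` and the single gate-value rule are hypothetical; the actual FACES skip specifications are not reproduced in this appendix.

```python
from collections import Counter


def frequencies(records: list, item: str) -> Counter:
    """Tabulate responses to one item; missing responses are counted as None."""
    return Counter(record.get(item) for record in records)


def skip_violations(records: list, gate: str, gate_value, dependent: str) -> list:
    """Return indices of cases that break a skip rule.

    Rule (illustrative): the dependent item should be answered only
    when the gate item equals gate_value.
    """
    problem_cases = []
    for index, record in enumerate(records):
        should_answer = record.get(gate) == gate_value
        answered = record.get(dependent) is not None
        if should_answer != answered:
            problem_cases.append(index)
    return problem_cases
```

A usage example with made-up data: given `records = [{"Q1": 1, "Q2": 3}, {"Q1": 2, "Q2": None}, {"Q1": 2, "Q2": 1}]`, calling `skip_violations(records, "Q1", 1, "Q2")` flags the third case, whose follow-up was answered despite the skip.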
C. Data Edit
To code and edit questionnaire data, an integrated collection of software was utilized. Through this system of software, coding manuals and codebooks were developed, data editing was performed, and SAS source code was generated.
D. Data File Creation
Data files were created and analysis performed to provide summaries and assessments of Head Start children and their families during this period and to assess the reliability and validity of information contained within the data collection instruments. Numerous derived variables were created to increase the magnitude and scope of analytical capabilities. The coding for these derived variables may be obtained upon request.
IX. Reliability and Data Summary
In FACES, various data collection instruments were used to assess the accomplishments and behaviors of children in Head Start programs, as well as the educational and familial support that is provided to them. As noted in Section IV: Data Collection Instruments, these instruments are widely used and report mostly high reliabilities. The reliabilities and summary statistics for each data collection instrument are provided in Tables A-2 through A-11.
| Fall 2000 (Head Start) | Spring 2001 (Head Start) |
|---|---|
| Social Awareness | Social Awareness |
| PPVT-III / TVIP | PPVT-III / TVIP |
| McCarthy Draw-A-Design | McCarthy Draw-A-Design |
| ----- | Leiter-R AS (Attention Sustained) Subtest |
| Color Names and Counting | Color Names and Counting |
| Woodcock Johnson (Munoz): Letter-Word Identification | Woodcock Johnson (Munoz): Letter-Word Identification |
| Woodcock Johnson (Munoz): Applied Problems | Woodcock Johnson (Munoz): Applied Problems |
| Woodcock Johnson (Munoz): Dictation | Woodcock Johnson (Munoz): Dictation |
| Story and Print Concepts | Story and Print Concepts |
| Interviewer Rating: Assessment Behavior | Interviewer Rating: Assessment Behavior |

| Scales | Fall 2000: Number of Items | Fall 2000: Number of Cases | Fall 2000: Cronbach Alpha | Spring 2001: Number of Items | Spring 2001: Number of Cases | Spring 2001: Cronbach Alpha |
|---|---|---|---|---|---|---|
| Social Awareness | 5 | 2,068 | .63 | 5 | 1,948 | .61 |
| PPVT-III | 144 | 2,116 | .96 | 144 | 1,980 | .97 |
| McCarthy: Draw-A-Design | 9 | 2,068 | .58 | 9 | 1,943 | .68 |
| Leiter-R AS - Ages 2 to 3 | - | - | - | 4 | 406 | .71 |
| Leiter-R AS - Ages 4 to 5 | - | - | - | 4 | 1,758 | .81 |
| Color Names | 10 | 2,055 | .95 | 10 | 1,940 | .94 |
| WJR: Letter-Word Identification | 23 | 1,054 | .84 | 23 | 1,595 | .86 |
| WJR: Applied Problems | 23 | 1,054 | .90 | 23 | 1,595 | .91 |
| WJR: Dictation | 12 | 1,054 | .77 | 12 | 1,595 | .77 |
| Story and Print Concepts: Print Conventions | 2 | 2,116 | .73 | 2 | 1,980 | .74 |
| Story and Print Concepts: Book Knowledge | 5 | 2,116 | .57 | 5 | 1,980 | .59 |
| Story and Print Concepts: Comprehension | 2 | 2,116 | .43 | 2 | 1,980 | .41 |
| Interviewer Rating: Assessment Behavior | 8 | 2,021 | .82 | 8 | 1,901 | .81 |

| Scales | Fall 2000: Number of Items | Fall 2000: Number of Cases | Fall 2000: Cronbach Alpha | Spring 2001: Number of Items | Spring 2001: Number of Cases | Spring 2001: Cronbach Alpha |
|---|---|---|---|---|---|---|
| Social Awareness | 5 | 385 | .36 | 5 | 356 | .45 |
| TVIP | 144 | 392 | .92 | 144 | 364 | .92 |
| McCarthy: Draw-A-Design | 9 | 375 | .57 | 9 | 355 | .74 |
| Leiter-R AS - Ages 2 to 3 | - | - | - | - | - | - |
| Leiter-R AS - Ages 4 to 5 | - | - | - | - | - | - |
| Color Names | 10 | 378 | .92 | 10 | 358 | .93 |
| WM: Letter-Word Identification | 23 | 219 | .75 | 23 | 307 | .78 |
| WM: Applied Problems | 23 | 219 | .85 | 23¹ | 307 | .89 |
| WM: Dictation | 12 | 219 | .77 | 12¹ | 307 | .73 |
| Story and Print Concepts: Print Conventions | 2 | 392 | .59 | 2 | 364 | .77 |
| Story and Print Concepts: Book Knowledge | 5 | 392 | .43 | 5 | 364 | .48 |
| Story and Print Concepts: Comprehension | 2 | 392 | .39 | 2 | 364 | .43 |
| Interviewer Rating: Assessment Behavior | 8 | 372 | .77 | 8 | 353 | .68 |

¹ Spring 2001 Applied Problems & Dictation are Woodcock Johnson, not Woodcock Munoz.

| Scales | Fall 2000: Number of Cases | Fall 2000: Mean | Fall 2000: SD | Fall 2000: Reported Response Range | Fall 2000: Possible Response Range | Spring 2001: Number of Cases | Spring 2001: Mean | Spring 2001: SD | Spring 2001: Reported Response Range | Spring 2001: Possible Response Range |
|---|---|---|---|---|---|---|---|---|---|---|
| Social Awareness | 2,101 | 3.36 | 1.69 | 0 - 6 | 0 - 6 | 1,967 | 3.98 | 1.58 | 0 - 6 | 0 - 6 |
| PPVT-III* | 2,031 | 35.06 | 17.65 | 0 - 98 | 0 - 144 | 1,932 | 45.30 | 18.72 | 1 - 98 | 0 - 144 |
| McCarthy: Draw-A-Design | 2,112 | 2.92 | 1.33 | 0 - 13 | 0 - 19 | 1,980 | 3.52 | 1.70 | 0 - 15 | 0 - 19 |
| Leiter-R AS - Ages 2 to 5 | - | - | - | - | - | 2,253 | 40.72 | 10.79 | 1 - 70 | 0 - 70 |
| Color Names | 2,101 | 11.32 | 7.37 | 0 - 20 | 0 - 20 | 1,969 | 15.59 | 5.98 | 0 - 20 | 0 - 20 |
| WJR: Letter-Word Identification* | 948 | 5.30 | 2.61 | 0 - 21 | 0 - 23 | 1,511 | 6.59 | 3.19 | 0 - 22 | 0 - 23 |
| WJR: Applied Problems* | 963 | 7.52 | 4.36 | 0 - 21 | 0 - 23 | 1,542 | 8.98 | 4.70 | 0 - 22 | 0 - 23 |
| WJR: Dictation* | 916 | 5.11 | 1.83 | 0 - 12 | 0 - 12 | 1,491 | 5.64 | 2.11 | 0 - 12 | 0 - 12 |
| Story and Print Concepts: Print Conventions | 2,089 | 0.23 | 0.57 | 0 - 2 | 0 - 2 | 1,968 | 0.37 | 0.69 | 0 - 2 | 0 - 2 |
| Story and Print Concepts: Book Knowledge | 2,087 | 1.62 | 1.27 | 0 - 5 | 0 - 5 | 1,961 | 2.41 | 1.30 | 0 - 5 | 0 - 5 |
| Story and Print Concepts: Comprehension | 2,102 | 0.54 | 0.70 | 0 - 2 | 0 - 2 | 1,967 | 0.71 | 0.75 | 0 - 2 | 0 - 2 |
| Interviewer Rating: Assessment Behavior | 2,094 | 17.14 | 5.02 | 0 - 24 | 0 - 24 | 1,950 | 19.05 | 4.38 | 0 - 24 | 0 - 24 |

*Raw scores were used.

| Scales | Fall 2000: Number of Cases | Fall 2000: Mean | Fall 2000: SD | Fall 2000: Reported Response Range | Fall 2000: Possible Response Range | Spring 2001: Number of Cases | Spring 2001: Mean | Spring 2001: SD | Spring 2001: Reported Response Range | Spring 2001: Possible Response Range |
|---|---|---|---|---|---|---|---|---|---|---|
| Social Awareness | 390 | 2.62 | 1.21 | 0 - 6 | 0 - 6 | 360 | 2.56 | 1.30 | 0 - 6 | 0 - 6 |
| TVIP* | 369 | 11.34 | 8.38 | 1 - 47 | 0 - 144 | 322 | 16.27 | 10.04 | 1 - 48 | 0 - 144 |
| McCarthy: Draw-A-Design | 392 | 3.37 | 1.34 | 0 - 13 | 0 - 19 | 364 | 4.05 | 1.90 | 0 - 12 | 0 - 19 |
| Leiter-R AS - Ages 2 to 5 | - | - | - | - | - | - | - | - | - | - |
| Color Names | 386 | 8.90 | 6.62 | 0 - 20 | 0 - 20 | 362 | 13.46 | 6.49 | 0 - 20 | 0 - 20 |
| WM: Letter-Word Identification* | 195 | 4.37 | 1.20 | 0 - 10 | 0 - 23 | 295 | 5.01 | 1.68 | 0 - 12 | 0 - 23 |
| WM: Applied Problems* | 200 | 5.29 | 3.40 | 0 - 14 | 0 - 23¹ | 294 | 5.81 | 4.13 | 0 - 19 | 0 - 23 |
| WM: Dictation* | 188 | 4.99 | 1.28 | 1 - 11 | 0 - 12¹ | 293 | 5.72 | 1.74 | 0 - 11 | 0 - 12 |
| Story and Print Concepts: Print Conventions | 391 | 0.17 | 0.53 | 0 - 3 | 0 - 2 | 360 | 0.14 | 0.47 | 0 - 2 | 0 - 2 |
| Story and Print Concepts: Book Knowledge | 376 | 1.25 | 1.13 | 0 - 5 | 0 - 5 | 355 | 1.70 | 1.13 | 0 - 5 | 0 - 5 |
| Story and Print Concepts: Comprehension | 386 | 0.47 | 0.67 | 0 - 2 | 0 - 2 | 360 | 0.50 | 0.68 | 0 - 2 | 0 - 2 |
| Interviewer Rating: Assessment Behavior | 383 | 17.54 | 4.41 | 0 - 24 | 0 - 24 | 360 | 17.99 | 3.50 | 3 - 24 | 0 - 24 |

*Raw scores were used.
¹ Spring 2001 Applied Problems & Dictation are Woodcock Johnson, not Woodcock Munoz.

| Scales | Fall 2000: Number of Items | Fall 2000: Number of Cases | Fall 2000: Cronbach Alpha | Spring 2001: Number of Items | Spring 2001: Number of Cases | Spring 2001: Cronbach Alpha |
|---|---|---|---|---|---|---|
| Assessment Profile: Scheduling | 14 | 227 | .89 | 14 | 243 | .87 |
| Assessment Profile: Learning Environment | 18 | 228 | .68 | 18 | 228 | .77 |
| Assessment Profile: Individualizing | 5 | 250 | .50 | 5 | 250 | .54 |
| ECERS Total Mean | 37 | 270 | .92 | 37 | 235 | .92 |
| Personal Care | 6 | 146 | .73 | 6 | 269 | .70 |
| Furnishings | 4 | 263 | .52 | 4 | 263 | .60 |
| Language | 4 | 2 | ||||

