Appendix A
Measures Used in the NSCAW
Battelle Developmental Inventory (BDI)
The BDI (Newborg, Stock, Wnek, Guidubaldi, & Svinicki, 1984) was used to assess development in children aged three years and younger. The instrument is designed to evaluate five domains of development for children from birth to eight years: cognitive, adaptive (self-help), motor, communication, and personal-social; for this study only the cognitive domain was administered. The cognitive domain measures skills and abilities that are conceptual in nature. There are four subdomains: perceptual discrimination, memory, reasoning and academic skills, and conceptual development. The normative sample was composed of more than 800 children, with approximately 100 in each year age group. A total of 75% were from urban areas; 50% were male; and 84% were white, with the remaining 16% being of other ethnicities. Test-retest reliability ranges from .90 to .99. For concurrent validity, correlations between the 10 BDI components and the Vineland Social Maturity Scale (VSMS) range from .79 to .93 (Newborg et al., 1984).
Bayley Infant Neurodevelopmental Screener (BINS)
The BINS is a screening tool to identify infants between the ages of 3 and 24 months with developmental delays or neurological impairments for further diagnostic testing. It has four conceptual assessment areas: Basic Neurological Functions/Intactness (of the infant’s central nervous system), Receptive Functions (sensation and perception), Expressive Functions (fine, oral, and gross motor skills), and Cognitive Processes (memory/learning and thinking/reasoning) (Aylward, 1995).
The BINS was standardized with a nonclinical and clinical sample. The nonclinical sample consisted of 600 infants with a normal length of gestation (38 to 42 weeks) and no prenatal, perinatal, or neonatal medical complications. This sample was stratified on age, race, gender, geographic region, and parent education level; it is representative of the U.S. population according to the 1988 update of the U.S. Census. The clinical sample was composed of 303 infants from clinics across the nation that deal with infants with neurodevelopmental problems. Most infants had more than one medical complication (Aylward, 1995).
Internal consistency was acceptable as indicated by Cronbach’s alpha ranging from .73 to .85 for the various age groups. Inter-rater reliability was higher at older ages, as indicated by .79 for 6 months, .91 for 12 months, and .96 for 24 months. Construct validity was moderate, as evidenced by correlations with the Mental Development (.63) and Psychomotor Development (.47) indexes of the Bayley Scales of Infant Development—Second Edition (BSID-II) and the BDI at 12 months for the Communication (.50), Cognitive (.51), and Motor (.50) domains (Aylward, 1995).
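Cronbach's alpha, the internal-consistency statistic reported here and for most of the other measures in this appendix, can be computed directly from item-level responses. The following is a minimal sketch in Python, not code used in NSCAW: the function name `cronbach_alpha` and the response data are hypothetical, and the sample variance (n - 1 denominator) is assumed.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows (one item score per column)."""
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_columns = list(zip(*responses))  # transpose rows into item columns
    sum_item_variances = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

# Hypothetical responses: 5 respondents x 4 items scored 0-2.
scores = [
    [2, 2, 1, 2],
    [0, 1, 0, 0],
    [1, 1, 1, 2],
    [2, 1, 2, 2],
    [0, 0, 0, 1],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

Alpha approaches 1 when items rise and fall together across respondents, which is why it is read as an index of how consistently a set of items measures one construct.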
Child Behavior Checklist (CBCL)
The CBCL was “designed to provide standardized descriptions of behavior rather than diagnostic inferences” (Achenbach, 1991a, p. iii) about competencies, problem behaviors, and other problems. Items are on a 3-point Likert-type scale (0 = not true, 1 = somewhat or sometimes true, and 2 = very true or often true). It contains 100 items for 2- to 3-year-olds and 113 items for 4- to 18-year-olds. The problem scale is composed of eight syndromes (Withdrawn, Somatic Complaints, Anxious/Depressed, Social Problems, Thought Problems, Attention Problems, Delinquent Behavior, and Aggressive Behavior) and an Other Problems category (26 items for the 2- to 3-year-olds and 33 items for the 4- to 18-year-olds). Behaviors are also categorized as externalizing—containing the Delinquent and Aggressive Behavior syndromes—or internalizing—containing the Withdrawn, Somatic Complaints, and Anxious/Depressed syndromes. A Total Problems score may be derived from the total of the syndromes and Other Problems items (Achenbach, 1991a).
The problem syndromes were normed by gender and age, using a nationally representative sample of 2,368 children aged 4 to 18 years old who had not received mental health services or special remedial school classes in the previous 12 months (Achenbach, 1991a).
Cronbach’s alpha for the different samples ranged from .54 for Sex Problems for 4- to 11-year-old females to .96 for Total Problems. Very high inter-rater reliability was found as indicated by an intraclass correlation coefficient (ICC) of .96 for the problem items. Construct validity is good, as the problem syndromes correlate fairly well (.59 to .88) with similar scales from other instruments (Parent Questionnaire, Quay-Peterson Revised Behavior Problem Checklist, and ACQ Behavior Checklist) (Achenbach, 1991a).
Children classified as having clinical/borderline problem behaviors had scores 60 and above for externalizing, internalizing, and Total Problem behaviors. These cutoffs were the same for the 2- to 3- and 4- to 18-year-olds.
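The cutoff rule above can be expressed as a short sketch. This is an illustration only, assuming the three broadband scores are already available as standardized scores; the function names and the example values are hypothetical.

```python
CLINICAL_BORDERLINE_CUTOFF = 60  # "scores 60 and above", per the text

def in_clinical_borderline_range(score, cutoff=CLINICAL_BORDERLINE_CUTOFF):
    """Return True when a score is at or above the cutoff."""
    return score >= cutoff

def classify_cbcl(externalizing, internalizing, total_problems):
    """Flag each of the three scores separately against the same cutoff."""
    return {
        "externalizing": in_clinical_borderline_range(externalizing),
        "internalizing": in_clinical_borderline_range(internalizing),
        "total_problems": in_clinical_borderline_range(total_problems),
    }

# Hypothetical scores for one child.
flags = classify_cbcl(externalizing=63, internalizing=58, total_problems=60)
print(flags)
```

Because the same cutoff applies to both age forms, the sketch takes no age argument.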
Children’s Depression Inventory (CDI)
The CDI measures depression by asking various questions of children aged 7 to 17 about their engagement in certain activities or their experience of certain feelings (e.g., sad, enjoy being around other people). The CDI contains 27 items, each with a 3-point Likert-type scale (0 = absence of symptom, 1 = mild symptom, and 2 = definite symptom) that addresses a range of depressive symptoms as indicated by five factors: Negative Mood, Interpersonal Problems, Ineffectiveness, Anhedonia, and Negative Self-Esteem. The normative sample consisted of 1,266 Florida public school students aged 7 to 16 (Kovacs, 1992).
In studies conducted from 1983 to 1991, internal consistency has been good, with Cronbach’s alpha ranging from .71 to .86. Alphas for the five factors range from .59 to .68, suggesting that the subscales are not robust. Test-retest reliability has ranged from .38 to .87 depending on the time interval and sample. Studies (cited in Kovacs, 1992) have established concurrent validity with the Coopersmith Self-Esteem Inventory (-.72 for girls and -.67 for boys), Center for Epidemiological Studies Depression Scale (.44), and Social Adjustment Scale-Self-Report (.50). Although discriminant validity results have been mixed, significant differences were found between normative and clinical groups (Kovacs, 1992). Children were classified as depressed if they fell at or above the 91st percentile for their age and gender group. This clinical cutoff is based on the CDI normative sample’s rates of depression in the CDI manual (Kovacs, 1992).
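As an illustration of the scoring and classification described above, the sketch below validates a set of 27 item responses, totals them, and applies the 91st-percentile rule. It assumes the total is the simple sum of the item scores and that the percentile for the child's age and gender group has already been looked up in the manual's norm tables, which are not reproduced here. All names and values are hypothetical.

```python
CDI_ITEM_COUNT = 27
VALID_ITEM_SCORES = (0, 1, 2)  # absence, mild, definite symptom
PERCENTILE_CUTOFF = 91         # "at or above the 91st percentile"

def cdi_total(item_scores):
    """Sum the 27 item scores after checking count and range (assumed scoring)."""
    if len(item_scores) != CDI_ITEM_COUNT:
        raise ValueError(f"expected {CDI_ITEM_COUNT} items, got {len(item_scores)}")
    if any(score not in VALID_ITEM_SCORES for score in item_scores):
        raise ValueError("item scores must be 0, 1, or 2")
    return sum(item_scores)

def classified_as_depressed(percentile):
    """Apply the cutoff to a percentile already taken from age/gender norms."""
    return percentile >= PERCENTILE_CUTOFF

# Hypothetical responses: 20 absent, 5 mild, 2 definite symptoms.
responses = [0] * 20 + [1] * 5 + [2] * 2
total = cdi_total(responses)
print(total, classified_as_depressed(93))
```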
Home Observation Measure of the Environment—Short Form (HOME-SF)
The HOME measures the quality and quantity of stimulation and support in the home environment of children from birth to 10 years (Bradley, 1994; Bradley, Corwyn, Burchinal, McAdoo, and Coll, 2001). The number of items ranges from 20 to 24, depending on the age of the child. Items address the mother’s behaviors toward the child and various aspects of the physical environment (e.g., safe play environment, size of living space), asking whether these conditions exist, do not exist, or were not observed. Although the observer’s presence may influence the parent-child interaction, the duration of the caregiver interview increases the likelihood that any such alteration in behavior will be reduced, for the mother will have more difficulty inhibiting her usual reactions over this extended period (Caldwell, Bradley, & Staff, 1979).
The initial normative sample was composed of 174 infants (aged 4 to 36 months) and 117 preschoolers. Since then, the HOME has been adapted for many national studies, although national norms have never been established. The version this study duplicates is the shorter version of the HOME used in the National Longitudinal Survey of Youth (NLSY), a study that includes many low-income families. Reference to the NLSY scores is most useful for interpreting these NSCAW OYFC scores (e.g., Center for Human Resource Research, 1999). In keeping with Bradley’s designation, this measure is labeled as the HOME-SF in this report.
Estimates of internal consistency have been greater than .80 for total scores, whereas coefficients for subscales range from .30 to .80. When percentage has been used to measure inter-observer agreement, levels have always been at least 85%. When a coefficient has been used to measure agreement, the coefficient was at least .80 (Bradley, 1994). No independent tests of inter-observer agreement were conducted for this study.
Kaufman Brief Intelligence Test (K-BIT)
The K-BIT is a brief, individually administered measure of verbal and nonverbal intelligence for children, adolescents, and adults, ranging in age from 4 to 90 years. Verbal items assess word knowledge and verbal concept formation. Matrices (nonverbal) items assess ability to perceive relationships and complete analogies. The normative sample was composed of a nationally representative sample of 2,022 people aged 4 to 90 years tested at 60 sites in the United States. The sample was stratified based on gender, geographic region, socioeconomic status, and race/ethnicity. Children aged 4 to 16 made up 66% (1,342) of the sample (Kaufman & Kaufman, 1990).
Internal consistency for the Vocabulary subscale was high for 4- to 19-year-olds, ranging from .89 to .98, and moderate for Matrices, ranging from .74 to .95. Test-retest reliability for 5- to 12-year-olds was good for Vocabulary (.86) and moderate for Matrices (.83). Test-retest reliability for 13- to 19-year-olds was higher for Vocabulary (.96) and moderate for Matrices (.80) (Kaufman & Kaufman, 1990).
Limited Maltreatment Classification System (L-MCS)
In the present study we used a modification of the Maltreatment Classification System (MCS; Barnett et al., 1993) to capture information about the report of alleged maltreatment that preceded the investigation that triggered the child’s entrance into the study. Although the MCS was designed for case record reviews, in this study we collected data about maltreatment in an interview with the child welfare worker who knew the most about the investigation and had immediate access to case record materials. Although the MCS gathers information about all types of maltreatment and then classifies each of them according to severity, this was not feasible in an interview setting because of interview length. Data were collected about all the types of maltreatment that had been recorded in the allegation, but only the one judged to be most serious was coded in greater detail. For this type of maltreatment, the onset was recorded and the severity was rated from 1 (least) to 5 (most) on closed-ended scales provided by the MCS and modified by the investigators to create 5-point scales for each. The investigators also added examples of parameters of maltreatment that could anchor each of these scale points. These were based on the instructions to the coders of the case materials. Thus the L-MCS offers the following dimensions of maltreatment: the number of types, the combination of types, the severity of the most serious type, the onset of the maltreatment, and who was responsible for the maltreatment.
Parent-Child Conflict Tactics Scale (CTS-PC)
The CTS-PC’s theoretical basis is conflict theory, which assumes that conflict is an inevitable part of all human association, whereas physical assault as a tactic to deal with conflict is not. CTS-PC uses an 8-point Likert-type scale (1 time, 2 times, 3 to 5 times, 6 to 10 times, 11 to 20 times, more than 20 times, not in the past 12 months, never) to measure frequency and extent to which a parent has carried out specific acts of physical and psychological aggression (Straus, Hamby, Finkelhor, Moore, & Runyan, 1998). This measure consists of three subscales that assess Nonviolent Discipline, Psychological Aggression, and Physical Assault. Two additional supplemental subscales measuring Neglect and Sexual Abuse (total 22 items) were available but not administered on NSCAW.
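The eight response categories can be turned into numeric scores in more than one way. The sketch below shows one possibility and should be read as an assumption rather than as the NSCAW scoring procedure: each category is recoded to an approximate past-year count (the midpoint of its range, with 25 assumed for the open-ended top category, and 0 for the two categories that indicate no occurrence in the past 12 months). The function names and the example responses are hypothetical.

```python
# Response categories in the order listed in the text, mapped to an
# ASSUMED past-year frequency (range midpoints; 25 for the open-ended top).
ASSUMED_FREQUENCY = {
    "1 time": 1,
    "2 times": 2,
    "3 to 5 times": 4,
    "6 to 10 times": 8,
    "11 to 20 times": 15,
    "more than 20 times": 25,
    "not in the past 12 months": 0,
    "never": 0,
}

def subscale_frequency(responses):
    """Sum the assumed past-year counts over the items of one subscale."""
    return sum(ASSUMED_FREQUENCY[response] for response in responses)

def past_year_prevalence(responses):
    """True if any item on the subscale occurred at least once in the past year."""
    return any(ASSUMED_FREQUENCY[response] > 0 for response in responses)

# Hypothetical responses to three items of one subscale.
answers = ["1 time", "3 to 5 times", "never"]
print(subscale_frequency(answers), past_year_prevalence(answers))
```

Keeping "not in the past 12 months" distinct from "never" in the raw data preserves the option of computing lifetime occurrence separately from past-year occurrence.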
The CTS-PC was tested on a nationally representative sample of 1,000 U.S. children. Internal consistency was marginal, as indicated by Cronbach’s alpha ranging from .55 (Physical Assault) to .70 (Nonviolent Discipline). Construct validity for the CTS-PC has been moderate, with a correlation of -.34 between Corporal Punishment and child’s age and no significant correlation between Severe Assault and child’s age (-.06). Analysis of covariance found no significant differences between Euro-American and African-American parents on corporal punishment, but significant differences on severe physical assault were found, which is consistent with past findings in the literature. Gender differences consistent with the literature were also found in this study of construct validity (Straus, Hamby, Finkelhor, Moore, & Runyan, 1998).
Preschool Language Scale-3 (PLS-3)
The PLS-3 measures language development of children from birth to six years (in this study it was administered to children from birth to five years). The Auditory Comprehension subscale measures precursors of receptive communication skills with tasks focusing on attention abilities. The Expressive Communication subscale measures precursors of expressive communication skills with tasks that focus on social communication and vocal development. A Total Language score combines these two subscales. The PLS-3 was standardized with a sample of 1,200 children aged 2 weeks to 6 years, 11 months, with equal percentages of males and females in each age group. Representative sampling based on 1980 U.S. Census data and the 1986 update was stratified by parent education level, geographic region, and race (Zimmerman, Steiner, & Pond, 1992).
Internal consistency using Cronbach’s alpha is, on average, acceptable for Auditory Comprehension (mean = .76; range of .47 to .88) and higher for Expressive Communication (mean = .81; range of .68 to .91), and Total Language (mean = .87; range of .74 to .94). Test-retest reliabilities ranged from .89 to .90 for Auditory Comprehension, from .82 to .92 for Expressive Communication, and from .91 to .94 for Total Language. Inter-rater agreement is 89% with correlation between scores = .98 (Zimmerman, Steiner, & Pond, 1992).
Using discriminant analysis, the PLS-3 identified language-disordered children from 66% to 80% of the time; the majority of incorrect distinctions were for those children previously classified as language-disordered. Concurrent validity was assessed by comparing the PLS-3 to the PLS-Revised Edition (PLS-R) and the Clinical Evaluation of Language Fundamentals—Revised (CELF-R). Correlation with the PLS-R was .66 for Auditory Comprehension and .86 for Expressive Communication. Correlation with the CELF-R was .69 for Auditory Comprehension and .75 for Expressive Communication (Zimmerman, Steiner, & Pond, 1992).
Rochester Assessment Package for Schools—Student (RAPS-S)
A shortened version of the Relatedness scale from the RAPS-S was used to measure children’s feelings about their relationship with their primary and secondary caregivers. There are two sets of questions, one for each caregiver. Four subscales were used for NSCAW: Parental Emotional Security, Involvement, Autonomy Support, and Structure. Children answer how true each statement is (1 = not at all true, 2 = not very true, 3 = sort of true, and 4 = very true). Parental Emotional Security asks how true it is that the child feels good, mad, or happy with his or her caregiver. Involvement asks questions about the caregiver’s interest in, time spent with, and things done to help the child. Autonomy Support inquires about the caregiver’s trust of the child and whether the child is allowed to make his or her own decisions. Structure asks about the caregiver’s fair treatment of the child, belief in the child’s abilities, and the child’s understanding of what the caregiver wants (Connell, 1990; Wellborn and Connell, 1987, as cited in Lynch and Cicchetti, 1991).
Internal consistency for the overall Relatedness score was high (.84), and this was the only score used. Subscale scores were not used because, while Cronbach’s alpha for Parental Emotional Security and Involvement was fair (.64 to .76), alpha was very low for Autonomy Support and Structure (.06 to .52).
Self-Report Delinquency (SRD)
The Self-Report Delinquency measure (Elliott & Ageton, 1980) was designed for use in the National Longitudinal Survey of Youth (NLSY), a nationally representative sample of 12,686 males and females who were 14 to 22 years old when first surveyed in 1979 (U.S. Bureau of Labor Statistics, 2001). A total of 72 questions were taken from the SRD version used for Wave 7 (1987) of the NLSY. These questions ask about the occurrence of 36 specific acts and their frequency (1 = once to 5 = 5 or more times).
Short Form Health Survey (SF-12)
The SF-12, a shorter version of the SF-36 (12 versus 36 items), measures mental and physical health. Descriptive statistics for the SF-12 scores by gender and age using the National Survey of Functional Mental Health (NSFMH), the normative sample from the SF-36, were very similar to the SF-36 descriptive statistics, indicating support for use of norms and other interpretation guidelines from the SF-36 (Ware, Kosinski, & Keller, 1998). Test-retest reliability was acceptable for mental health (.76) and higher for physical health (.89). Data to test the validity of the SF-12 came from the NSFMH and the Medical Outcome Study, an observational study of health outcomes for patients with chronic conditions. In 12 validity tests involving physical criteria, relative validity estimates ranged from .43 to .78 (median = .67). In four validity tests involving mental health criteria, relative validity estimates ranged from .93 to .98 (Ware, Kosinski, & Keller, 1998).
Social Skills Rating System (SSRS)
The SSRS measures child, parent, and teacher perception of the child’s social skills. NSCAW used the parent and teacher reports, which address social skills in four domains: cooperation, assertion, responsibility, and self-control. The SSRS was standardized on a national sample of 4,170 children, 1,027 parents, and 259 teachers during the spring of 1988. Children ranged from third- to twelfth-graders; 51% were male; and 17% were “handicapped,” compared with 11% of the U.S. population. The handicapped designation was given to students in nonmainstreamed special education classes by teacher rating if the child was learning disabled, behaviorally disordered, mentally handicapped, or other. Black children and white children were slightly over-represented, and Hispanic and other groups were slightly under-represented (Gresham & Elliott, 1990).
Internal consistency was high for preschoolers and secondary-age children (.90) and for elementary-age children (.87); test-retest reliability was also good (.87). Construct validity was indicated by a correlation of .58 between the SSRS and CBCL-Parent Social Competence scale.
Teacher’s Report Form (TRF)
The TRF is almost identical to the CBCL, including the problem syndromes and Other Problems items. Some questions are worded differently to make them more appropriate for teacher response. The TRF also contains academic and adaptive functioning scales, though this information was not collected for NSCAW. The normative sample was drawn from two sources: a nationally representative sample of children (7 to 18 years) assessed with the CBCL, and another contract that identified 5- to 6-year-olds in these homes and randomly selected one child to assess when more than one nonhandicapped child was in the home. Teachers completed TRFs for 1,613 children aged 5 to 18 years. The normative sample was composed of the 1,391 children who had not received mental health services or special remedial school classes within the past 12 months (Achenbach, 1991b).
Test-retest reliability after 15 days for a sample of 44 children was .95 for Total Problems, .92 for Externalizing Behaviors, and .91 for Internalizing Behaviors. Construct validity was particularly good as indicated by TRF scale correlations with the Conners Revised Teacher Rating Scales: .83 for Total Problems, .80 between the TRF Attention Problems and Conners Inattention/Passivity; and for Conners Conduct Problem, .80 with TRF Aggressive Behavior, and .83 with TRF externalizing behaviors. Cronbach’s alpha for the different age ranges and genders ranged from .63 for Thought Problems to .98 for Total Problems for 5- to 11-year-old females. The entire sample averaged .97 for Total Problems, .96 for Internalizing Behaviors, and .91 for Externalizing Behaviors (Achenbach, 1991b).
Vineland Adaptive Behavior Scale Screener (VABS)
The Vineland Screener was used to measure daily living skills among children aged 1 to 10 years. The 45-item screener was developed from the 261-item Vineland Adaptive Behavior Scale. Screener items were selected based on ease of administration, reliability, domain coverage, and strength of correlation with the total scales. The Screener was developed for research purposes only, for screening large groups, rather than for making clinical judgments (Sparrow, Carter, & Cicchetti, 1993). While there are three domains (Communication, Daily Living Skills, and Socialization), NSCAW used only the Daily Living Skills domain. This domain measures personal (e.g., how the child eats, dresses, and performs personal hygiene), domestic (household tasks the child performs), and community skills (how the child spends his or her time, and telephone skills). The normative sample comprises a nationally representative sample in terms of gender, ethnicity, geographic region, and parent education level (compared with 1980 U.S. Census data) of children from birth to 18 years, 11 months (Sparrow, Balla, & Cicchetti, 1984).
Internal consistency for the Daily Living Skills domain of the full Vineland was high, with a mean of .88 (median of .90); inter-rater reliability was also high (.98). Criterion-related validity was as expected: a low but positive correlation with the Peabody Picture Vocabulary Test-Revised, ranging from .12 for Daily Living Skills to .37 for Communication (Sparrow, Balla, & Cicchetti, 1984). Correlation between the Screener and full Vineland is good for all age groups, ranging from .87 to .98. Inter-rater reliability is high as well for the Screener (r=.98). A comparison of 300 inpatient, outpatient, and control children found high external validity for the Screener, ranging from .89 to .97 for 0 to 12 years (Sparrow et al., 1993).
The Violence Exposure Scale for Children—Revised (VEX-R)
The VEX-R was used to assess frequency of exposure to violent and criminal events in children aged 5 and older. The VEX-R is a 23-item child self-report measure in a cartoon format that has been previously administered to minority, inner-city children and elementary school children in Israel (Stein et al., 2001). Children are shown cards depicting violent and criminal acts and are asked to respond on a 4-point scale (never, once, a few times, lots of times) regarding their experiences. The VEX-R inquires about being a victim or witness to 13 types of violent and criminal events.
Internal consistency for the VEX-R as indicated by Cronbach’s alpha ranged from .72 to .86 in a sample of inner-city minority preschool children (Shahinfar, Fox, & Leavitt, 2000). A recent factor analysis of the VEX-R on a sample of 134 children by Raviv et al. (2001) indicated two dimensions grouping into mild and severe violence categories. This was consistent with another factor analytic study of this instrument conducted by Raviv, Raviv, Shimoni, Fox, and Leavitt (1999), which found alpha reliabilities to be .84 and .85 for mild and severe violence. A major indicator of the validity of the VEX-R was its ability to discriminate between low-violence school communities and high-violence ones (Raviv et al., 1999). Also it has been found to have moderate significant correlations with children’s total reported distress symptoms (Shahinfar, Fox, & Leavitt, 2000).
Woodcock-McGrew-Werder Mini-Battery of Achievement (MBA)
The MBA is a brief, wide-range test of basic skills and knowledge, including tests of reading, mathematics, writing, and factual knowledge (science, social studies, and humanities). The MBA may be used with children and adults aged 4 to over 90 (Woodcock, McGrew, & Werder, 1994). NSCAW utilized the MBA with children aged 6 and older and administered only the Reading and Math tests. Because the MBA is a subset of the Woodcock-Johnson Psycho-Educational Battery—Revised (WJ-R; Woodcock & Johnson, 1989), norms for the MBA are based on data from the WJ-R norming sample. This sample included 6,026 individuals aged 4 to 95 years, from 100 geographically diverse U.S. communities. Subjects were randomly selected within a stratified sampling design controlling for 10 community and individual variables. These data were gathered throughout the school year from September 1986 to August 1988 (Woodcock, McGrew, & Werder, 1994).
Internal consistency is high across all age groups as indicated by medians for Reading (.94), Writing (.92), Mathematics (.93), Factual Knowledge (.87), and Basic Skills (.93). Test-retest reliability after one week for a sample of 52 sixth graders was .89 for Reading, .85 for Writing, .86 for Mathematics, .88 for Factual Knowledge, and .96 for Basic Skills. Concurrent validity studies using the same sample indicated that the five tests of the MBA do correlate fairly well with sections of other instruments, such as the WJ-R, KTEA (Brief), PIAT-R, and WRAT-R (Woodcock, McGrew, & Werder, 1994).
Youth Self Report (YSR)
The YSR was designed to “obtain self-report of feelings and behavior in a standardized fashion for comparison with reports by normative groups of 11- to 18-year-olds” (Achenbach, 1991c, p. iii). The YSR is almost identical to the CBCL in content and structure, including the competence scales, problem syndromes, and other problems. The normative sample was drawn from a group of 1,719 children who completed the YSR. The normative sample is nationally representative and consisted of those children who were 11 to 18 years old when they completed the YSR and who had not received mental health services or special remedial school classes within the past 12 months (Achenbach, 1991c).
One-week test-retest reliabilities for the whole sample were .79 for Total Problems, .81 for Externalizing, and .80 for Internalizing. These are somewhat higher than the seven-month test-retest reliabilities of .56 for Total Problems, .49 for Externalizing, and .52 for Internalizing. Cronbach’s alpha ranged from a low of .59 for the Withdrawn syndrome scale to a high of .95 for Total Problems. Alpha tends to be directly related to the length of the scale; alphas for scales with fewer items therefore tend to be lower (Achenbach, 1991c).
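The dependence of alpha on scale length can be illustrated with the standardized form of alpha, which is a function only of the number of items k and the mean inter-item correlation r: alpha = k * r / (1 + (k - 1) * r). The sketch below is a worked illustration, not an analysis of YSR data; the item counts and the mean correlation of .25 are hypothetical.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha from item count k and mean inter-item correlation."""
    if k < 2:
        raise ValueError("alpha requires at least two items")
    return (k * mean_r) / (1 + (k - 1) * mean_r)

# Same hypothetical mean inter-item correlation, different scale lengths.
MEAN_R = 0.25
for k in (5, 10, 30, 100):
    print(k, round(standardized_alpha(k, MEAN_R), 2))
```

With the inter-item correlation held fixed, alpha rises steadily as items are added, so a short syndrome scale and a long total score can differ substantially in alpha even when their items are equally interrelated.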
Other investigators of children in out-of-home care (OOHC) have found YSR scores that are lower than those reported by their caregivers using the CBCL, and a modest correspondence between CBCL and YSR scores (Courtney & Zinn, 1996; Handwerk, Larzelere, Soper, & Friman, 1999).