Department of Health and Human Services, Administration for Children and Families (ACF)
Office of Planning, Research & Evaluation (OPRE)

2. Study Design: Sampling, Estimation, and Measures

2.1 Target Population

The target population for the NSCAW CPS sample included all children in the U.S. who were subjects of child abuse or neglect investigations (or assessments) conducted by CPS agencies during the sampling period, with one exception. Four states had laws requiring that the first contact with the caregiver of a child selected for the study be made by CPS agency staff rather than by an NSCAW Field Representative (FR). Despite numerous attempts to engage these families, response rates achieved under these conditions were close to zero, and these four states were excluded from the study. Thus, the target population for the NSCAW CPS sample was modified to be “all children in the U.S. who are subjects of child abuse or neglect investigations (or assessments) conducted by CPS and who live in states not requiring agency first contact.”

The study design did not include all children reported for maltreatment, because many reports (about 38%, according to national sources) are screened out as inappropriate (DHHS, 2001). Many other reports are never investigated because, although they are considered appropriate reports to a child welfare hotline, they are not judged serious enough to warrant a full face-to-face investigation. In this study, both screened-out cases and cases that do not receive a full investigation are excluded from the sample.

Among the cases that are investigated following a credible report of child abuse or neglect, a significant proportion do not receive any ongoing services. The size of this proportion has been widely debated. Child Maltreatment 1999, which is based on state administrative data (DHHS, 2001), indicates that 55% of families that are investigated or assessed go on to receive services, but this figure is likely an overcount because many states count the investigation itself as a service. Other researchers (e.g., Waldfogel, 1998) have put the proportion of cases that receive services closer to 40%. NSCAW sampled on the basis of whether or not cases were opened to child welfare services following the investigation. Although many families are involved with other human services agencies before and after CWS investigations, this study focused on those served by public child welfare agencies. This group could include families whose cases were managed by child welfare agencies but who received services from private agencies.

Sample eligibility was not restricted to new entrants to CWS. The target population included children who had previously been involved with child welfare services as well as those who were new to CWS. Studying entry cohorts has many virtues, and doing so would have simplified interpretation of the “impact” of the episode of child welfare services on children's outcomes, but that is only one objective of the study. The study also intends to describe children and families representative of all children entering CWS during the sampling period (along with their child welfare histories), so that we can understand who they are, what services they receive, and what outcomes result. By not restricting the sample to cases with no prior child welfare involvement, we have created a sample that reflects the full group of children entering child welfare services. We collected information about prior contacts because prior research (e.g., Fluke, Yuan, & Edwards, 1999) leads us to expect that many of these families have had multiple prior contacts with child welfare services. The report will clarify the proportion of children who had previously been involved with CWS, and the frequency and timing of that involvement.

To be eligible for the sample, children had to enter child welfare services through an investigation of child abuse or neglect by Child Protective Services (CPS). Children who received child welfare services through some other pathway, not involving a CPS investigation, were ineligible for the study. (Although the initial study plan included a cohort of children and youths who received child welfare services but entered through other gateways, this group proved practically impossible to sample and was dropped from the study.)

Although the study design collects data relevant to substantiated child abuse cases, cases that were not substantiated following investigation were also included in the sample. Thus, families may have entered the study even though they never committed child abuse or neglect. Likewise, receipt of services does not mean that a family had a substantiated episode of child abuse or neglect. Among the families who received services, some are likely not to have been substantiated for maltreatment and not to have been required to obtain services (e.g., families who obtained services voluntarily). National data indicate that approximately one-fifth of children who were not found to be substantiated victims of maltreatment nonetheless received services (Children's Bureau, 2002). Orr's 1999 analysis notes that the percentage of investigations that were substantiated dropped from a high of 61% in 1973 to 31% in 1996. Because unsubstantiated cases make up the majority of investigations, they need to be better understood.

Including unsubstantiated cases created some problems for recruitment and sampling, because a few jurisdictions had statutory, regulatory, or practice constraints on providing the names of unsubstantiated cases to the study. In the four states that interpreted their state law as requiring the agency to contact families to inform them of our interest in recruiting them into the study, the response rate was so low (25% or less) that the sample was unusable. These four states were dropped from the study. In one state, employees of the department's research branch recruited unsubstantiated cases into the study; this arrangement was slow but reasonably successful. The vast majority of primary sampling units (PSUs) agreed to work with us to develop procedures that allowed the study to contact families with unsubstantiated cases while still meeting the requirements of the human subjects protocols under which the study operated.

2.2 Study Design

Familiarity with the NSCAW design is crucial to understanding the challenges of study implementation and the significance of the findings. The NSCAW cohort included 6,231 children, aged birth to 15 years, who had contact with CWS within a 15-month period starting in October 1999. These children comprise two distinct cohorts: 5,504 interviewed from among those entering the system during the reference period (October 1999 through December 2000), and 727 from among children who had been in out-of-home placement for 12 months at the time of sampling. These 6,231 children were selected from 92 PSUs, sampled with probability proportional to size, in 97 counties (parts of 36 states) nationwide.

This report is about the larger group of cases investigated for maltreatment. A prior report describes the children who were one year in foster care (OYFC) (DHHS; available at http://www.acf.hhs.gov/programs/core/ongoing_research/afc/nscaw_oyfc/). The sample of investigated/assessed cases includes cases that received ongoing services and cases that did not, either because they were not substantiated or because it was determined that services were not required. Open cases were oversampled to ensure that NSCAW had the statistical power to examine the experiences of the children and families who did receive services.

The sample design also required oversampling of infants (to ensure that enough cases would go through permanency planning) and of sexual abuse cases (to provide enough statistical power to analyze this type of abuse on its own). For this study, the age of children at investigation was capped at 14 years at the time of sampling, to increase the likelihood that youths could be located during subsequent waves of data collection; locating youths becomes much harder once they have emancipated.

There are four possible respondents for each “case”: the caregiver (the biological parent, another responsible adult, or the out-of-home caregiver), the child welfare worker, the child, and the child's teacher. This is reduced to three when the child is below school age. The information in this report is based primarily on “baseline” interviews with these respondents, which were conducted following the close of the investigation.

A series of steps was needed before these interviews could be conducted. At the end of the month following the close of the investigations, the child welfare agency sent Research Triangle Institute (RTI) the sampling frame of all completed investigations, along with the additional information needed for sampling: the age of the child, whether the case was open or closed following the investigation, whether the allegation was sexual abuse, and whether the child was placed in out-of-home care. RTI then completed the sampling and informed the field representative of the selected cases. The field representative worked with the agency to locate the families, contact them to ascertain their interest in participating in the study, and complete the interview. Interviews were completed, on average, about six months after the initial report of maltreatment was accepted by the child welfare agency. Although only 7 to 14 children were sampled from each PSU in a given month, the time needed for sampling, acquiring family contact information, scheduling interviews, traveling, tracing respondents, and completing interviews was substantially greater than originally estimated; those estimates were based on experience from single-site studies of child welfare populations and large surveys of low-income populations.

2.3 Sampling

The NSCAW sample was designed to maximize the precision of estimates related to children in CWS. The sample design may be described as a stratified cluster sample of all children in the target population. In response to the mandate in the authorizing legislation, the sample was designed to support state-level estimates for the eight states with the largest numbers of CPS cases; each of these states forms one stratum. The ninth and final stratum consists of the remaining states, with a few exceptions described below.

Within these strata, PSUs were formed, each defined as a geographic area encompassing the population served by a CPS agency. In most cases, PSUs are counties, but in a few cases two or three contiguous counties were grouped to form a single PSU. Conversely, several counties comprising large metropolitan areas were split into two or more PSUs along CPS agency jurisdiction boundaries to facilitate sampling and data collection. Each PSU was then assigned a selection probability, and a random sample of 100 PSUs was selected accordingly. The selection probability for a PSU was computed using composite size measures derived from eight population subgroups (or sampling domains) whose selection rates were to be controlled during the second-stage selection of the CPS sample component (Biemer et al., 1998; see also detailed information on the sample design at www.ndacan.cornell.edu/NDACAN/Datasets/Abstracts).
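The first-stage selection can be illustrated with a short sketch. It is not the NSCAW procedure itself, which is documented in Biemer et al. (1998); it assumes a common form of composite size measure, in which a PSU's size is the expected number of sample children it would contribute if every domain were sampled at its target overall rate, and it uses systematic probability-proportional-to-size (PPS) selection as one standard way to draw the PSUs. The function names and inputs are hypothetical.

```python
import random


def composite_size(counts, rates):
    """Composite size measure for one PSU (a common form, assumed here).

    counts: {domain: projected number of children in the PSU}
    rates:  {domain: target overall sampling rate for that domain}
    Returns the expected number of sample children the PSU would yield if
    every domain were sampled at its target rate.
    """
    return sum(rates[d] * n for d, n in counts.items())


def pps_systematic(sizes, m, seed=0):
    """Select m PSUs by systematic sampling with probability proportional to size.

    Returns (indices of selected PSUs, inclusion probabilities). PSUs whose
    probability would exceed 1 (certainty PSUs) are not handled here.
    """
    total = sum(sizes)
    probs = [m * s / total for s in sizes]
    if any(p > 1 for p in probs):
        raise ValueError("certainty PSUs must be removed before PPS selection")
    start = random.Random(seed).random()        # random start in [0, 1)
    points = [start + j for j in range(m)]      # m points spaced one unit apart
    selected, cum, k = [], 0.0, 0
    for i, p in enumerate(probs):
        cum += p                                # PSU i covers [cum - p, cum)
        last = i == len(probs) - 1              # last PSU absorbs round-off
        while k < m and (points[k] < cum or last):
            selected.append(i)
            k += 1
    return selected, probs
```

Given composite sizes for every PSU in a stratum, `pps_systematic(sizes, m)` draws that stratum's m PSUs, and each PSU's inclusion probability is `m * size / total`. Using the composite measure as the size is what lets the later within-PSU rates hit the domain targets with roughly equal workloads across PSUs.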

After the PSUs were selected, six child welfare agencies indicated that they were unable or unwilling to participate in NSCAW; they were replaced in the sample by six new PSUs that were similar with regard to the sampling control variables. In addition, problems arose in four states in the remaining stratum because of state laws requiring that information on CPS children and their caregivers be released to the study only with the consent of the current caregiver. As a result, response rates in those states were essentially zero, and it was necessary to cease data collection there for both the OYFC and CPS components. These four states were subsequently removed from the target population; consequently, inferences are restricted to children living in states without laws restricting direct access to the children for research purposes, who make up about 92% of all children originally eligible for the CPS sample. Because only about 8% of the original target population was excluded, it is unlikely that the results would change appreciably if these agency first-contact states were included.
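A back-of-the-envelope check of the 8% argument: if a share $e$ of the target population is excluded, a mean computed from the covered population differs from the full-population mean by $e$ times the difference between the covered and excluded means. The sketch below applies that identity with $e = 0.08$; the group means are hypothetical values chosen only for illustration.

```python
def coverage_bias(excluded_share, mean_covered, mean_excluded):
    """Difference between the covered-population mean and the full-population mean.

    The full mean is (1 - e) * mean_covered + e * mean_excluded, so the
    covered-only mean is off by e * (mean_covered - mean_excluded).
    """
    return excluded_share * (mean_covered - mean_excluded)


# Hypothetical example: 40% of covered children vs. 55% of excluded children
# have some characteristic; 8% of the target population is excluded.
print(coverage_bias(0.08, 0.40, 0.55))   # about -0.012 (1.2 percentage points)
```

For a proportion, the covered and excluded groups can differ by at most 1, so the shift can never exceed 8 percentage points; a 15-point difference between groups, as in the example, moves the estimate by only about 1.2 points.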

The within-PSU sampling frame for selecting children for the CPS sample was constructed from lists or files of children who were investigated for child abuse or neglect within the sample PSUs between October 1999 and December 2000. Within each PSU, eight mutually exclusive and exhaustive categories of children were created and sampled independently. These within-PSU sampling strata are referred to as sampling domains to avoid confusion with the nine sampling strata formed for the primary stage selection process. The eight within-PSU sampling domains are shown in Table 2-1.

Table 2-1. Within-PSU Sampling Domains
Domain Description
1 Infants (aged < 1 year old) who are not receiving CPS agency-funded services
2 Children aged 1 to 14 years who are not receiving CPS agency-funded services
3 Infants (aged < 1 year old) who are receiving CPS agency-funded services and are not in out-of-home care
4 Children aged 1 to 14 years who are receiving CPS agency-funded services, are not in out-of-home care, and are investigated for allegations of sexual abuse
5 Children aged 1 to 14 years who are receiving CPS agency-funded services, are not in out-of-home care, and are investigated for allegations of other abuse or neglect
6 Infants (aged < 1 year old) who are receiving CPS agency-funded services and are in out-of-home care
7 Children aged 1 to 14 years who are receiving CPS agency-funded services, are in out-of-home care, and are investigated for allegations of sexual abuse
8 Children aged 1 to 14 years who are receiving CPS agency-funded services, are in out-of-home care, and are investigated for allegations of other abuse or neglect

Essentially, the domains are formed by successively classifying children on four characteristics. At the first level, children are divided into “not receiving services” (Domains 1 and 2) and “receiving services” (Domains 3-8). The group “not receiving services” is further subdivided by age into children less than 1 year old (Domain 1) and older children (Domain 2). The group “receiving services” is subdivided into six domains, first by age (less than 1 year old and 1 to 14 years old) and then, within each age group, by type of care (in-home care and out-of-home care). Finally, each of the two older-child groups (in-home and out-of-home care) is further subdivided by type of abuse/neglect (children investigated for sexual abuse allegations and all other children).
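The classification rules above can be written as a small function that maps a child's characteristics to a domain number in Table 2-1. The function and parameter names are hypothetical; the function simply encodes the rules as described.

```python
def sampling_domain(age, receiving_services, out_of_home, sexual_abuse):
    """Return the within-PSU sampling domain (1-8) of Table 2-1.

    age is in completed years; children aged 15 or older are outside the frame.
    out_of_home and sexual_abuse are ignored where Table 2-1 does not use them.
    """
    if not 0 <= age <= 14:
        raise ValueError("only children aged 0 to 14 are eligible")
    infant = age < 1
    if not receiving_services:
        return 1 if infant else 2          # not split by care type or abuse type
    base = 6 if out_of_home else 3         # in-home care: 3-5; out-of-home: 6-8
    if infant:
        return base                        # infants are not split by abuse type
    return base + 1 if sexual_abuse else base + 2
```

For example, a 9-year-old receiving in-home services after a sexual abuse investigation falls in Domain 4, and an infant placed in out-of-home care falls in Domain 6.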

The NSCAW sampling process was conducted over a 15-month period and included all children investigated between October 1999 and December 2000. Each month, the agencies in the sample provided files listing all children who had been investigated for child abuse or neglect in the previous month. Only children aged 0 to 14 years were eligible for the study; children 15 years old or older were removed from the frame. Children who had appeared on a prior month's file were deleted from the current month's file so that no child could be selected more than once. In addition, to limit the burden on families, children who were members of the same family as a previously selected child (for example, siblings of a previously selected child) were also deleted from the current month's file.
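The monthly frame-construction rules can be sketched as a filter over one month's agency file. The record layout (`child_id`, `family_id`, `age`) is a hypothetical assumption; the actual agency files and NSCAW software are not described at this level of detail.

```python
def build_monthly_frame(records, previous_children, selected_families):
    """Construct one month's sampling frame from an agency file.

    records: list of dicts with hypothetical keys 'child_id', 'family_id', 'age'
    previous_children: child_ids that appeared on any earlier month's file
    selected_families: family_ids that already have a selected child
    The caller is expected to update both sets after each month.
    """
    frame = []
    for r in records:
        if not 0 <= r["age"] <= 14:
            continue                      # children aged 15+ are ineligible
        if r["child_id"] in previous_children:
            continue                      # already on a prior month's file
        if r["family_id"] in selected_families:
            continue                      # family member of a selected child
        frame.append(r)
    return frame
```

Keeping the two exclusion sets separate mirrors the two rules in the text: a child is dropped for having appeared on any earlier file, but a family member is dropped only if someone in the family was actually selected.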

2.3.1 Within-PSU Sampling

As the sample agencies were recruited, we worked with them to refine our projections of the expected sizes of the domains of analysis for sampling in 1999. From these projected domain sizes, the initial sampling rates by domain were specified. Software was developed that applied these sampling rates to the domains during the 15-month second-stage sampling period.

Because state and local record-keeping procedures vary widely, two different systems were developed for within-PSU sampling. One, the File Transfer (FT) system, was used for the majority of PSUs (about 85%): those that were able and willing to transmit files in electronic format. The FT system (1) formats the files provided by the sites into usable form; (2) constructs the sampling frame for the current period; (3) unduplicates the current period's frame against the frames of all previous months; (4) selects children according to the specified sampling rates; and (5) delivers the selected sample to the survey control system.
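Step (4) can be illustrated as follows. This sketch assumes independent (Bernoulli) selection at each domain's rate, which is only one way to apply a sampling rate; the NSCAW software may have used a different method, such as systematic selection. It also shows how a child's base weight would follow from the two stages of selection, as the inverse of the product of the PSU and within-PSU selection probabilities. All names are hypothetical.

```python
import random


def select_by_domain(frame, rates, seed=None):
    """Select children independently at domain-specific rates (Bernoulli sampling).

    frame: list of (child_id, domain) pairs for one PSU and month.
    rates: {domain: within-PSU sampling rate in [0, 1]}.
    Returns (child_id, domain, selection probability) for each selected child.
    """
    rng = random.Random(seed)
    sample = []
    for child_id, domain in frame:
        p = rates[domain]
        if rng.random() < p:      # random() is in [0, 1), so p = 1 always selects
            sample.append((child_id, domain, p))
    return sample


def base_weight(psu_prob, within_prob):
    """Inverse of the overall selection probability across the two stages."""
    return 1.0 / (psu_prob * within_prob)
```

For instance, a child selected at a within-PSU rate of 0.1 from a PSU with inclusion probability 0.25 would carry a base weight of 40, before any nonresponse or other adjustments.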

The remaining PSUs used a computer-assisted data entry (CADE) system, which allowed the sampling frame to be constructed in the field. With this system, Field Representatives obtained information about completed investigations from the child welfare agency, entered the necessary information into a laptop computer to construct the sampling frame, and then transferred the file to the RTI central office for sampling.

The second-stage sampling period was from early September 1999 through December 2000. The sample was selected in segments on a monthly basis during this period. Monthly sampling allows the workload to be distributed in such a way that it is feasible for agency personnel and NSCAW field staff to accomplish the task. Sample children were selected from those cases for which an investigation/assessment was completed in the previous month.

2.3.2 Description of the Achieved Sample and Response Rates

The achieved sample closely approximated the intended sample. Table 2-2 presents the targeted number of CPS respondents, the number selected, and the number of final respondents in each of the first- and second-stage strata. The actual number of respondents is very close to, and in many cases exceeds, the targeted number. Sampling rates and achieved sample sizes were monitored monthly, and the sampling rates were adjusted as necessary so that, by the end of data collection, the number of interviews in each domain would be as close as possible to the targeted sample size. The adjustments were also designed to keep each PSU's monthly workload in an acceptable range, given the interviewing staff available for the PSU, and to keep the unequal weighting effect for each domain as small as possible within each PSU.
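The unequal weighting effect is commonly measured with Kish's formula, $1 + CV^2$, where $CV$ is the coefficient of variation of the weights; equivalently, $n \sum w_i^2 / (\sum w_i)^2$. Assuming that is the measure intended here, it can be computed as follows.

```python
def unequal_weighting_effect(weights):
    """Kish's unequal weighting effect, 1 + CV^2 of the weights.

    Computed as n * sum(w^2) / (sum(w))^2, which equals 1 when all weights
    are equal and grows as the weights become more variable.
    """
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2
```

Because this factor approximately multiplies the variance of an estimate, abrupt month-to-month changes in a domain's sampling rate (which spread out its weights) carry a precision cost, which is why the rate adjustments tried to keep it small.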

Table 2-2. Comparison of CPS Allocated Sample, Number Selected, and Responding Sample Size, for First and Second Stage Strata
Columns D1-D8 are the within-PSU sampling domains of Table 2-1:
  Not receiving services: D1 = < 1 yr. old; D2 = 1-14 yrs. old
  Receiving services, not placed in out-of-home care: D3 = < 1 yr. old; D4 = 1-14 yrs. old, sexual abuse; D5 = 1-14 yrs. old, other
  Receiving services, placed in out-of-home care: D6 = < 1 yr. old; D7 = 1-14 yrs. old, sexual abuse; D8 = 1-14 yrs. old, other

Stratum        Total     D1     D2     D3     D4     D5     D6     D7     D8
Allocated Sample Size (Targeted Number of Respondents)
Key State 1      703     52    121     98     47    220     39     19    107
Key State 2      304      5     27     47     29    124     19     10     43
Key State 3      284     18     52     41     19     86     19     11     38
Key State 4      297     26     53     44     25     90     15      8     36
Key State 5      402     27     67     59     32    124     27     10     56
Key State 6      293     17     54     39     21     90     21     12     39
Key State 7      300     16     43     37     22    110     18     15     39
Key State 8      473     27     81     77     38    145     28     14     63
Remainder      2,381    151    397    341    179    760    148     78    327
Total          5,437    339    895    783    412  1,749    334    177    748
Number Selected
Key State 1    1,359     89    241    179    102    449     70     38    191
Key State 2      503     17     54     75     39    209     33     14     62
Key State 3      445     19     72     67     31    147     32     22     55
Key State 4      435     43     96     60     35    132     18      1     50
Key State 5      686     63    160     73     29    213     45      9     94
Key State 6      433     27     85     60     32    128     30     19     52
Key State 7      439     27     75     51     32    150     28     22     54
Key State 8      683     48    133     97     54    202     41     23     85
Remainder      3,978    262    999    472    264  1,187    204    104    486
Total          8,961    595  1,915  1,134    618  2,817    501    252  1,129
Responding Sample Size
Key State 1      695     53    113    105     53    191     45     21    114
Key State 2      298      8     28     45     26    114     21     11     45
Key State 3      285     15     45     43     15     87     27     15     38
Key State 4      336     33     64     48     26    107     16      1     41
Key State 5      408     47     97     47     18    119     28      4     48
Key State 6      314     17     53     46     22     91     27     13     45
Key State 7      301     20     53     36     21    104     18     12     37
Key State 8      485     29     84     78     37    144     33     16     64
Remainder      2,382    138    524    321    157    703    155     71    313
Total          5,504    360  1,061    769    375  1,660    370    164    745
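The statement that the achieved sample is close to its targets can be checked directly against the totals rows of Table 2-2. This short script uses only the figures in the table, with columns in the table's order (overall total, then the eight domains numbered as in Table 2-1), and reports each responding sample size as a percentage of its allocated size.

```python
# Totals rows of Table 2-2: overall total, then Domains 1-8 (Table 2-1 order).
ALLOCATED = [5437, 339, 895, 783, 412, 1749, 334, 177, 748]
RESPONDING = [5504, 360, 1061, 769, 375, 1660, 370, 164, 745]
LABELS = ["Total"] + [f"Domain {d}" for d in range(1, 9)]


def percent_of_target(allocated, responding):
    """Responding sample size as a percentage of the allocated (target) size."""
    return [100 * got / want for want, got in zip(allocated, responding)]


if __name__ == "__main__":
    for label, pct in zip(LABELS, percent_of_target(ALLOCATED, RESPONDING)):
        print(f"{label:<9} {pct:6.1f}% of target")
```

Overall, respondents total about 101% of the allocation; by domain the ratio ranges from about 91% (Domain 4) to about 119% (Domain 2).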

In discussing the results for NSCAW sampling and recruitment, both weighted and unweighted response rates are relevant. The unweighted response rate is the number of respondents divided by the number of respondents and nonrespondents in the sample; this is a useful indicator of the success of the field effort because it conveys the actual rate at which eligible sample members were interviewed. However, the weighted response rates (simply the sum of the weights for respondents to the survey, divided by the sum of the weights of respondents and nonrespondents) are a more relevant indicator of the potential for bias in the results due to nonresponse.
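The two definitions can be contrasted in a few lines. The example data are hypothetical: when a heavily weighted case fails to respond, the weighted rate falls below the unweighted rate, signaling that nonrespondents represent a larger share of the population than their count suggests.

```python
def response_rates(cases):
    """Unweighted and weighted response rates over eligible sample members.

    cases: iterable of (weight, responded) pairs; ineligible cases excluded.
    """
    cases = list(cases)
    n_resp = sum(1 for _, responded in cases if responded)
    w_resp = sum(w for w, responded in cases if responded)
    w_all = sum(w for w, _ in cases)
    return n_resp / len(cases), w_resp / w_all


# Hypothetical: the one nonrespondent carries half of the total weight.
unweighted, weighted = response_rates([(10, True), (10, True), (50, False), (30, True)])
print(unweighted, weighted)   # 0.75 0.5
```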

As mentioned earlier, NSCAW data were obtained through interviews with several respondents, including the current caregiver, the former caregiver (if different), the child, the child welfare worker, and the child's teacher. Any or all of these interviews may be missing for a sample child, so exactly what constitutes a “response” to NSCAW is not obvious. One possible definition requires a full response from all four or five possible respondents. That definition is too strict, however, because the key analysis variables may still be available even if the teacher, former caregiver, or child welfare worker does not respond. Therefore, for operational reasons, we defined a response as a completed interview with the key respondent: the current caregiver if the child was younger than 11 years, or the child if the child was 11 years or older. Using this definition, the overall weighted response rate for the CPS sample component was 64.3%, and the unweighted response rate was 69.22%.

Table 2-3 includes the weighted response rates for all the major control variables and Table 2-4 presents weighted response rate by respondent type.

Table 2-5 summarizes the final Wave 1 case dispositions for the key respondents of the CPS sample component. It shows the number of selected children; the numbers of completed interviews, partial interviews, and final noninterviews; and the numbers of children and adults who were key respondents.

A total of 8,961 children were selected for the CPS sample. Of these, 2,114 (24%) were children aged 11 and older (child was key respondent), and 6,847 (76%) were children under age 11 (caregiver was key respondent). From the CPS sample, 5,487 key respondent interviews were completed, along with 17 partial interviews. Interviews were deemed complete if they met specific criteria established by the NSCAW project team. For child interviews, at least one well-being measure had to have been obtained.

Final noninterview cases included 1,014 ineligibles (11% of the 8,961 selected children), as well as 1,028 refusals (13%), 649 unlocatables (8%), and 502 cases that could not be reached after repeated attempts (6%); these last three percentages are based on the 7,947 eligible cases. Cases were deemed ineligible if:

  • the selected child was found to be 15 years old or older at the time of sampling
  • the selected child was determined to be the sibling of another child in the study
  • the selected child was not the target of the investigation into abuse or neglect (for example, there were cases in which other members of the family were the focus of the investigation or the selected child was the alleged abuser rather than the victim)
  • the investigation date for the selected child occurred outside the sampling period
  • the selected child was deceased.

Refusal cases included those in which (1) the key respondent refused to consent to the interview or (2) parental or legal guardian consent could not be obtained for the child interview. Unlocatable cases included those in which the key respondent could not be located after extensive tracing in the field and by the central office. Cases that could not be completed after repeated attempts included those in which the key respondent either could not be reached or was unavailable for the interview during the data collection period. Cases received a “final out of area” disposition code when the key respondent lived more than 65 miles (one way) from an NSCAW field representative, a firm appointment could not be obtained, or the cost of securing the interview was considered prohibitive. Final “other” noninterview codes were assigned when the child's case records were sealed because of the case's high profile or because of completed or ongoing adoption proceedings.

Table 2-3. Key Respondents Weighted Response Rates for CPS Cases by Various Case and Location Characteristics
Sampling Stratification Variable: Category Weighted Response Rate (Percent)
Overall 64.3
Case Type: Substantiated 67.3
Case Type: Unsubstantiated 62.2
Case Type: Status Not Provided on Sampling File 67.7
Service Receipt: Receiving Services 68.1
Service Receipt: Not Receiving Services 62.9
Abuse Type: Sexual Abuse 60.6
Abuse Type: Other Abuse 64.7
Out-of-Home Placement: In Out-of-Home Placement 88.6
Out-of-Home Placement: Not in Out-of-Home Placement 62.0
Location of PSU^: Urban 63.5
Location of PSU^: Rural 67.4
Size of Agency^^: Small 67.0
Size of Agency^^: Medium 62.7
Size of Agency^^: Large 64.2

^ Based on 1990 U.S. Census data for the county. Counties that were more than 50% urban were classified as Urban; the remaining counties were classified as Rural.

^^ The size of the agency was determined by the frame count of the number of CPS children in the sample. Small, medium, and large classifications were based on the 33rd and 66th percentiles of the distribution.

Table 2-4. Weighted Response Rates, by Respondent Type
Component Number Interviewed Weighted Response Rate (Percent)
Child 5,154 66
Current caregiver 5,468 70
Child welfare worker 5,101 86
Teacher^ 1,339

^ For the teacher survey, a completion rate is reported, computed as the number of interviews divided by the number known to be eligible for the component. To be eligible for the teacher survey, children had to be aged 4 or older, in school in grades K-12, not home schooled, and have a signed authorization from the legal guardian or caregiver.

Table 2-5. Key Respondent Final Case Dispositions
Disposition N Percent
Children selected 8,961 100
Selected cases with child as key respondent 2,114 23.6
Selected cases with caregiver as key respondent 6,847 76.4
Key respondent case status: Completed full interview 5,487 69.0
Key respondent case status: Completed partial interview 17 0.2
Key respondent case status: Final ineligible 1,014 11.3
Key respondent nonresponse: Unavailable after repeated attempts 502 6.3
Key respondent nonresponse: Final refusal 1,028 12.9
Key respondent nonresponse: Final unlocatable 649 8.2
Key respondent nonresponse: Final out of area 79 0.99
Key respondent nonresponse: Physically/mentally incapable 31 0.39
Key respondent nonresponse: Incarcerated, interview not obtained 5 0.06
Key respondent nonresponse: Institutionalized, interview not obtained 8 0.1
Key respondent nonresponse: Final other 88 1.1

Note: Percentages for children selected, key respondent type, and final ineligible cases are based on all 8,961 selected children; percentages for completed interviews and nonresponse categories are based on the 7,947 eligible cases (8,961 minus 1,014 ineligible).

2.3.3 Characteristics of the Final Achieved Sample

Table 2-6 presents the distribution of the selected and final achieved samples by age, race, and ethnicity. The percentages shown are the unweighted distribution of the achieved sample; other tables in this report provide the weighted distribution, which reflects the distribution of the CPS population. The final achieved sample was nearly evenly divided between males (50.6%) and females (49.4%). Half of the children (50%) were younger than 5 years of age, and only 21% were older than 10. There were more White children (51%) than African American children (34%); only a small group (15%) were identified as another race, primarily American Indian (6%) and Asian (3%). Participation in the study appears to be unaffected by any interaction between age and gender or between race and gender.

Table 2-6. Distribution of Sample by Age, Race, and Ethnicity
Characteristic N Percent
Age: Birth - 2 years 1,701 30.9
Age: 2 - 5 years 1,131 20.5
Age: 6 - 10 years 1,492 27.2
Age: 11 - 14 years 1,180 21.4
Race: American Indian or Alaskan Native 342 6.2
Race: Asian, Hawaiian, or Pacific Islander 142 2.7
Race: Black or African American 1,863 33.8
Race: White 2,817 51.2
Race: Some Other Race 335 6.1
Race: Unknown/Not Ascertained 5 0.1
Ethnicity: Hispanic 956 17.4
Ethnicity: Not Hispanic 4,531 82.3
Ethnicity: Unknown/Not Ascertained 17 0.3

2.4 Weighting and Estimation

The design of the CPS sample component is complex and deliberately samples some subpopulations (e.g., children less than one year old, those receiving services, victims of sexual abuse) at higher rates to ensure enough completed cases for precise statistical analysis. Given the complex design and oversampling, sample weights must be applied to the observations to obtain unbiased estimates of population parameters. Thus, an estimate of the population total, denoted by $\hat{T}$, takes the form

$$\hat{T} = \sum_i w_i y_i$$

where $w_i$ is the sample weight and $y_i$ is the observation for the $i$th child. An estimate of the population mean, denoted by $\hat{\theta}$, is a ratio and takes the form

$$\hat{\theta} = \frac{\sum_i w_i y_i}{\sum_i w_i}$$
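The two estimators translate directly into code. In this sketch the weights and observations are hypothetical; the example shows how an unweighted mean is pulled toward an oversampled group.

```python
def weighted_total(w, y):
    """Estimated population total: the sum of w_i * y_i."""
    return sum(wi * yi for wi, yi in zip(w, y))


def weighted_mean(w, y):
    """Ratio estimator of the mean: estimated total divided by the sum of weights."""
    return weighted_total(w, y) / sum(w)


# Hypothetical: the first two children come from an oversampled domain
# (small weights); the third represents many more children (large weight).
w = [100, 100, 400]
y = [1, 0, 1]
print(weighted_total(w, y))            # 500
print(round(weighted_mean(w, y), 3))   # 0.833, versus an unweighted mean of 0.667
```

The denominator $\sum_i w_i$ is itself an estimate (of the population size), which is why the mean is a ratio and why its standard error calls for the linearization methods discussed below.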

To the extent that the weight adjustments for nonresponse and sampling frame noncoverage are effective, they reduce the bias in estimates due to these sources of error. Thus, the use of sampling weights in analysis is necessary to properly represent the target population of the NSCAW CPS sample component. Although comparisons between weighted and unweighted analyses sometimes showed only minor differences, many of the differences were substantial. Hence, all analyses reported here are weighted, because weighted estimates account for the unequal selection probabilities and nonresponse adjustments and therefore better represent the target population.

Moreover, because the observations are clustered within PSUs, standard errors must account for the potential correlation between observations within the same PSU to be statistically valid. Consequently, standard errors produced by software procedures that assume simple random sampling (such as the standard procedures in SAS or SPSS) are likely to be understated. This implies that the true alpha levels for standard hypothesis tests will likely be somewhat larger than the nominal level, and the true confidence levels of confidence intervals will be somewhat lower than the nominal levels. To account for these properties of the sample design, the analyses were completed using the SUDAAN™ software package (Research Triangle Institute, 2001), which appropriately accounts for the unequal weighting, stratification, and clustering of the observations inherent in the NSCAW sample design. SUDAAN uses Taylor series linearization to estimate the standard errors of nonlinear statistics, such as ratios (Cochran, 1977). Statistical software that does not properly account for unequal weighting and clustering may produce invalid estimates, standard errors, and significance tests.
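The idea behind Taylor series linearization for a weighted mean can be sketched in a few lines. This is not SUDAAN's implementation; it is a textbook version that assumes PSUs were sampled with replacement within strata (the usual approximation in design-based software) and that each stratum has at least two sampled PSUs. Variable names are hypothetical.

```python
import math
from collections import defaultdict


def linearized_se_mean(strata, psus, w, y):
    """Weighted mean and its Taylor-linearization standard error.

    strata, psus, w, y are parallel lists (one entry per child). PSUs are
    treated as sampled with replacement within strata.
    """
    W = sum(w)
    theta = sum(wi * yi for wi, yi in zip(w, y)) / W
    # Linearized value of each child, w_i * (y_i - theta) / W, summed by PSU.
    z = defaultdict(float)
    for h, psu, wi, yi in zip(strata, psus, w, y):
        z[(h, psu)] += wi * (yi - theta) / W
    by_stratum = defaultdict(list)
    for (h, _), z_hi in z.items():
        by_stratum[h].append(z_hi)
    var = 0.0
    for h, totals in by_stratum.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least two PSUs")
        z_bar = sum(totals) / n_h
        var += n_h / (n_h - 1) * sum((t - z_bar) ** 2 for t in totals)
    return theta, math.sqrt(var)
```

With one child per PSU and equal weights, the result reduces to the familiar $s/\sqrt{n}$. When children within a PSU resemble one another, the PSU totals vary more and the standard error grows; for example, values [1, 1, 4, 4] split into two PSUs as [1, 1] and [4, 4] give a standard error of 1.5, versus about 0.87 under simple random sampling. That gap is the understatement that SRS-based software misses.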

In this report, precision is expressed as the standard error of the estimate for means and as the endpoints of the 95% confidence interval for proportions. The confidence intervals for proportions were computed using the logit transformation of the proportion.
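A logit-transformed interval can be sketched as follows, assuming the usual delta-method standard error on the logit scale, $se(p)/(p(1-p))$. The interval is built symmetrically on the logit scale and back-transformed, so it always stays inside (0, 1) and is asymmetric for proportions near 0 or 1. SUDAAN's exact computation may differ in detail (for example, in the degrees of freedom used for the critical value).

```python
import math


def logit_ci(p, se, z=1.959964):
    """95% confidence interval for a proportion via the logit transformation.

    se is the design-based standard error of p. By the delta method, the
    standard error of logit(p) is se / (p * (1 - p)).
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    center = math.log(p / (1 - p))
    half_width = z * se / (p * (1 - p))
    return expit(center - half_width), expit(center + half_width)
```

For a proportion of 0.05 with a standard error of 0.02, for instance, a symmetric Wald interval would dip below zero, while the logit interval stays positive and extends farther above the estimate than below it.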

2.5 Analysis of Nonresponse

Child welfare services research has been characterized by studies with poor sample construction and low response rates (Rossi, 1992), leaving the studies open to the criticism that they capture a biased view of the population of concern. To determine the potential for nonresponse to bias the NSCAW results, we conducted an analysis of the nonresponse bias for these data. For a large proportion of key nonrespondents, data were available from the child welfare worker. These data were used to estimate the nonresponse bias and then destroyed. An estimate of the nonresponse bias for the population mean of some variable, y, is given by

$$\text{Bias}(\bar{y}_R) = (1 - r)\,(\bar{y}_R - \bar{y}_{NR})$$

where $r$ is the response rate, $\bar{y}_R$ is the mean for respondents, and $\bar{y}_{NR}$ is the mean for nonrespondents.
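The bias formula can be checked with a few lines of Python. The inputs below are hypothetical and chosen only to show the arithmetic; in NSCAW the nonrespondent means came from child welfare worker data.

```python
def nonresponse_bias(r, mean_resp, mean_nonresp):
    """Bias of the respondent mean: (1 - r) * (ybar_R - ybar_NR)."""
    if not 0 <= r <= 1:
        raise ValueError("response rate r must lie in [0, 1]")
    return (1 - r) * (mean_resp - mean_nonresp)


# Hypothetical inputs: 80% response; 40% of respondents and 30% of
# nonrespondents have some characteristic.
r, y_r, y_nr = 0.80, 0.40, 0.30
bias = nonresponse_bias(r, y_r, y_nr)
full_mean = r * y_r + (1 - r) * y_nr   # mean for the whole sample
```

The last line shows why the formula holds: the full-sample mean is the response-rate-weighted average of the two group means, so the respondent mean exceeds it by exactly $(1 - r)(\bar{y}_R - \bar{y}_{NR})$, here 0.02.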

As a general indicator of the potential for nonresponse to bias the results, we counted the number of variables in the nonresponse analysis for which the bias is significantly different from zero (two-sided test). At the p < .05 level, one would expect about 5% of such tests to be significant by chance alone. Thus, if substantially more than 5% of the tests of nonzero bias are significant, that would be evidence of nonresponse bias in some of the study variables. Likewise, at a significance level of p < .01, one would expect approximately 1% of the tests of nonzero bias to be significant by chance.
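The chance benchmark can be made concrete with a binomial calculation. This sketch treats the item-level tests as independent, which they are not (many items are correlated), so it gives only a rough reference point; the function name is illustrative.

```python
from math import comb


def prob_at_least(k, n, alpha):
    """P(X >= k) for X ~ Binomial(n, alpha).

    This is the chance that k or more of n tests reject at level alpha when
    every null hypothesis is true and the tests are independent.
    Direct summation is adequate for n in the hundreds.
    """
    return sum(comb(n, i) * alpha**i * (1 - alpha) ** (n - i)
               for i in range(k, n + 1))


n_items, alpha = 500, 0.05
expected = n_items * alpha            # 25 rejections expected by chance
p_21 = prob_at_least(21, n_items, alpha)
p_81 = prob_at_least(81, n_items, alpha)
```

For 500 items tested at α = .05, about 25 rejections are expected by chance; a count such as 21 is entirely consistent with chance, while a count such as 81 would be extremely unlikely if no item were biased.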

Table 2-7 presents the number of times that the null hypothesis was rejected at α = .05, using both sets of weights (the base weight and the final analysis weight). The table shows that for the CPS sample, with the final analysis weight, the percentage of items with practically significant relative bias (that is, items for which the null hypothesis |Relative Bias| < 5% was rejected) is about 4%, or about what would be expected by chance. Thus, we conclude that nonresponse bias in the CPS sample is unlikely to be consequential for most types of analyses. Variables showing practically significant bias in the CPS sample were those related to the type and severity of abuse/neglect, the relationship of the primary caregiver to the child, the likelihood of abuse/neglect in the next 12 months without services, child placement in a group home, and a substantiated investigation outcome. The actual bias in these variables was small (less than 10%).

Table 2-7. Number of Significant Biases Observed by Type of Respondent for the CPS Sample
                                                                  CPS Sample
Respondent / Item                                                 Base Weight   Final Analysis Weight
Caregiver
  Items with more than 20 cases in the denominator                500           500
  Items where Null Hypothesis: Bias = 0 was rejected              81 (16.2%)    59 (11.8%)
  Items where Null Hypothesis: |Relative Bias| < 5% was rejected  34 (6.8%)     21 (4.2%)
Child
  Items with more than 20 cases in the denominator                478           478
  Items where Null Hypothesis: Bias = 0 was rejected              47 (9.8%)     32 (6.7%)
  Items where Null Hypothesis: |Relative Bias| < 5% was rejected  44 (9.2%)     20 (4.2%)

This result does not necessarily mean that the CPS data were free of nonresponse bias, only that the data available for this analysis were insufficient to detect it. Nor is there any indication that the bias was large enough to justify the additional effort required to incorporate bias estimates into the data analyses.

2.6 Description of Analyses

2.6.1 Comparisons Conducted and Interpretation of Comparisons

One of the key questions this study addresses is how children are faring in out-of-home care. To answer this question, we must first understand the differences and similarities between children who have been placed in out-of-home care and those who have not. To further understand the relationship between child and family characteristics and receipt of services, we also need to compare the types of services received. To identify what differences, if any, exist among these children, we routinely tested differences between the following subgroups:

  • children living at home versus children in out-of-home placements
  • children living at home who have not received services versus children living at home who have received services
  • children in foster homes versus children in kin-care settings versus children in group homes
  • children in various age categories
  • children in various race categories.

The analysis approach balances the goal of identifying key relationships among case factors against the risk that the findings may be spurious. Although this work is exploratory, and an important goal of this report is to identify relationships that deserve further analysis, the authors also recognize that conducting so many tests can result in a large number of relationships identified as “significant” simply by chance. For that reason, differences are interpreted as “significant” in the text only when p < .01, that is, when an association at least as strong as the one observed would arise by chance less than 1% of the time if no true association existed. Although this high and uniform standard is used in textual interpretations of the data, the tables show 95% confidence intervals so that findings meeting the more conventional standard for significance can also be identified (p < .05 is our standard for a “trend” and is referred to as such in the text). Providing this information lets readers attend to associations in the tables that have some probability of being meaningful, even though they are not as definitive as those meeting the p < .01 standard. When a relationship reaches the level of p < .001, we call it “highly significant” in the text. This does not mean that the relationship is more important than one at the p < .01 level, only that the evidence against the hypothesis of no difference is stronger. All t-tests were two-tailed.
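The reporting conventions above amount to a simple decision rule, sketched below in Python. The function and the “not significant” label are illustrative only; they restate the thresholds in the text.

```python
def describe_p(p):
    """Wording used in this report's text for a two-sided p-value."""
    if p < 0.001:
        return "highly significant"
    if p < 0.01:
        return "significant"
    if p < 0.05:
        return "trend"
    return "not significant"


labels = [describe_p(p) for p in (0.0004, 0.004, 0.03, 0.2)]
```

Note that a p-value of exactly .01 falls into the “trend” category, because the text requires p < .01 for significance.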

2.6.2 The Approach to Bivariate and Multivariate Analyses

As a general approach, we conducted statistical tests for differences (usually t-tests or χ² tests) on bivariate relationships between our outcome variables and the age, race/ethnicity, and type of child welfare setting (e.g., in-home vs. out-of-home care) experienced by the child. The exact variables and categories varied somewhat, depending on the analysis. Limited multivariate analyses were then performed on these dependent variables to control for the possible joint dependency between such variables as age and race/ethnicity, race/ethnicity and type of setting, and age and type of setting. Gender is also included as an independent variable in the multivariate analyses. If a multivariate analysis simply confirms the results of the related bivariate analysis, with no additional or contradictory findings, the results of the multivariate analysis are discussed in the text but not presented in a table.

Standard reference groups were used in the multivariate analyses as follows:

  • Age—11 and older

  • Race/ethnicity—White

  • Setting—in-home, not receiving Child Welfare Services

  • Gender-female
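The reference groups listed above determine how categorical predictors enter the regression models. The sketch below shows indicator (dummy) coding in which each coefficient becomes a contrast against its reference group. Only the four reference groups come from the text; the other category labels and the record layout are hypothetical. Rerunning a model with a different reference category, as described next, changes only the `reference` argument.

```python
# Reference groups from the list above; other category labels are hypothetical.
REFERENCE = {
    "age": "11 and older",
    "race": "White",
    "setting": "in-home, no CWS",
    "gender": "female",
}

LEVELS = {  # hypothetical category sets for illustration
    "age": ["0-2", "3-5", "6-10", "11 and older"],
    "race": ["White", "Black", "Hispanic", "Other"],
    "setting": ["in-home, no CWS", "in-home, CWS", "out-of-home"],
    "gender": ["female", "male"],
}


def dummy_code(record, levels=LEVELS, reference=REFERENCE):
    """Return 0/1 indicators for every non-reference level of each variable.

    A record in all reference groups gets all zeros, so each regression
    coefficient is a contrast against its reference group.
    """
    row = {}
    for var, values in levels.items():
        ref = reference[var]
        if ref not in values:
            raise ValueError(f"reference level {ref!r} not among {var} levels")
        for value in values:
            if value != ref:
                row[f"{var}={value}"] = int(record[var] == value)
    return row


child = {"age": "3-5", "race": "Hispanic",
         "setting": "out-of-home", "gender": "female"}
default_row = dummy_code(child)
# Rerun with Hispanic as the race/ethnicity reference group.
alt_row = dummy_code(child, reference={**REFERENCE, "race": "Hispanic"})
```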


When a particular analysis called for a more appropriate reference group, this is reflected in the table, and the reason for choosing a different reference group is explained in the text. (Typically, we chose a different reference group because the literature or prior studies suggested comparing this group with others. In other words, we modified the reference group to test theoretically derived hypotheses, to answer questions about disputed findings from prior studies, or to double-check differences identified in bivariate findings.) In addition, when the 95% confidence interval for a category other than the reference group did not overlap with that of the standard reference group, suggesting that this category might differ significantly from other categories, the multivariate model was rerun with that category as the reference group to check for such differences.

This report does not attempt to control the overall probability of falsely rejecting any null hypothesis by limiting item-level testing when the overall “family” of items is not significant. Although withholding item-level comparisons when the family of items does not differ significantly is a common approach, it risks missing important differences between two subgroups because of the pattern of variation across all subgroups. Significant findings in multivariate analyses such as linear and logistic regressions are interpreted and presented for contrasts between categories within a particular variable rather than for the main effect of that variable. That is, even if the overall family of “race/ethnicity” is not significant, we still compared the individual races/ethnicities to each other. Thus, the pairwise t-test for White vs. Hispanic children is presented rather than the F statistic for the main effect of race/ethnicity. Although this approach increases the risk of Type I errors (indicating that a relationship exists when it does not), ignoring significant differences between item-level responses when the family-wise F statistic is not significant increases the risk of Type II errors (failing to detect existing relationships). We view this as consistent with the exploratory goals of this study. Because this is the first child welfare study of such breadth and depth, the report errs on the side of overestimating rather than underestimating differences. As researchers examine these data in more depth, using more highly specified models, some spurious findings in this report will almost certainly be identified.

Controlling the family-wise error rate is most important when the number of individual contrasts is high and the consequences of drawing an erroneous conclusion from a Type I error are severe. In most multivariate analyses in this study we compare no more than four groups (e.g., four race/ethnicity groups), so the maximum number of pairwise contrasts is six. Under these conditions the probability of making a Type I error should not be extremely high, and it is further limited by our use of .01 as the significance level. Thus, for this report's analyses of general patterns and associations related to the receipt of services, the risk of making a Type I error is not large.
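The argument about six contrasts can be checked numerically. This sketch computes the Bonferroni bound, which holds under any dependence among the tests, and the family-wise error that would apply if the contrasts were independent.

```python
from math import comb


def familywise_error(n_groups, alpha):
    """Family-wise Type I error for all pairwise contrasts among n_groups.

    Returns (number of contrasts, Bonferroni upper bound, value under
    independence). The Bonferroni bound holds for any dependence structure.
    """
    k = comb(n_groups, 2)
    bonferroni = min(1.0, k * alpha)
    independent = 1 - (1 - alpha) ** k
    return k, bonferroni, independent


k, bound, indep = familywise_error(4, 0.01)        # this report's setting
k05, bound05, indep05 = familywise_error(4, 0.05)  # conventional level
```

With four groups and α = .01, the six pairwise contrasts have a family-wise Type I error of at most 6% (about 5.9% if they were independent), comparable to a single test at .05; at α = .05 the Bonferroni bound would be 30%.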

Note that, unless otherwise indicated, all proportions and means presented in tables are based on weighted data. The data are weighted to the population of all CWS cases in the U.S., and the standard errors presented also serve as a rough indicator of the underlying sample size (other things being equal, the larger the standard error, the smaller the sample size). The minimum number of cases used for individual analyses is 10; cells with fewer than 10 cases are indicated by “---” in the tables.

2.7 Instrumentation

The NSCAW instruments were designed to measure a broad range of constructs identified from the research questions guiding the study. The instruments selected and developed had to be able to answer the key research questions as well as the subquestions and the specific analytic questions identified by RTI, subcontractors, ACYF, and Technical Work Group (TWG) members. Operationalization of the constructs that would be used in the analysis required us to consider:

  • measures across a broad age span
  • the most cost-effective measurement procedures
  • respondent burden
  • protection of subjects from the consequences of responding to sensitive topics
  • measures appropriate for, and sensitive to, a diverse, multicultural population.

Whenever possible, we chose standardized instruments with national norms, or instruments and questions that had been used in previous studies with large and diverse national samples of children and families. Instruments were assembled into interviews for each of the survey informants, resulting in six separate interviews: current caregiver, former caregiver, child, teacher, child welfare worker, and agency personnel. Instruments are further described below.

At every step in the instrument development process, we included discussion and outside review by TWG members and consultants. Cognitive testing, pretesting, and reviews by focus groups were conducted with volunteer clients and personnel from child welfare agencies. With the exception of teacher and agency questionnaires, all instruments and assessments were computerized to assist lay interviewers in consistently administering questions and in obtaining reliable assessment information.

Many measures used in the following analyses were simply single items (e.g., the race and age of the child); others were derived after consolidating a number of single items intended to capture key case characteristics; and some (described at the end of this chapter) were standardized measures. Most of these items and scales measure child functioning as rated by:

  • caregivers (e.g., the Child Behavior Checklist [CBCL] and the Social Skills Rating System [SSRS])
  • teachers (e.g., the Teacher's Report Form [TRF] and the Teacher version of the Social Skills Rating System [SSRS])
  • field representatives during standard assessment procedures (e.g., the Battelle Developmental Inventory [BDI], the Bayley Infant Neurodevelopmental Screener [BINS], the Kaufman Brief Intelligence Test [K-BIT], the Mini-Battery of Achievement [MBA], and the Preschool Language Scale-3 [PLS-3]).

A few are self-report child measures (e.g., the Children's Depression Inventory [CDI], the Research Assessment Package for Schools [RAPS], the Violence Exposure Scale for Children-Revised [VEX-R], and the Youth Self-Report [YSR]) that were completed by older children (aged 5 to 14 years, depending on the measure).

Also administered were the 12-Item Short Form Health Survey (SF-12), a measure of caregivers' health and well-being, and the NLSY short form of the Home Observation for Measurement of the Environment (HOME-SF). As used in NSCAW, the HOME-SF includes some items based on parent report and some based on the field representative's observations. In addition, many items allow respondents to describe their experiences; some of these were later scaled or scored, some clusters of items are presented in their entirety, and some are not discussed in this report.

Instruments used in NSCAW are described in detail in Appendix B.

2.7.1 Battelle Developmental Inventory (BDI)

BDI (Newborg et al., 1984) was used to assess development in children aged 3 years and younger. The instrument is designed to evaluate five domains of development for children from birth to 8 years: cognitive, adaptive (self-help), motor, communication, and personal-social. For this study, only the cognitive domain was administered. This domain measures skills and abilities that are conceptual in nature and has four subdomains: perceptual discrimination, memory, reasoning and academic skills, and conceptual development. The normative sample was composed of more than 800 children, with approximately 100 in each 1-year age group. A total of 75% were from urban areas, 50% were male, and 84% were White, with the remaining 16% of other races/ethnicities. Test-retest reliability ranged from .90 to .99. For concurrent validity, correlations between the 10 BDI components and the Vineland Social Maturity Scale (VSMS) ranged from .79 to .93 (Newborg et al., 1984).

2.7.2 Bayley Infant Neurodevelopmental Screener (BINS)

BINS is a screening tool to identify infants between the ages of 3 and 24 months with developmental delays or neurological impairments for further diagnostic testing. It has four conceptual assessment areas: Basic Neurological Functions/Intactness (of the infant's central nervous system), Receptive Functions (sensation and perception), Expressive Functions (fine, oral, and gross motor skills), and Cognitive Processes (memory/learning and thinking/reasoning) (Aylward, 1995).

BINS was standardized with a nonclinical and a clinical sample. The nonclinical sample consisted of 600 infants with a normal length of gestation (38 to 42 weeks) and no prenatal, perinatal, or neonatal medical complications. This sample was stratified by age, race, gender, geographic region, and parent education level; it is representative of the U.S. population according to the 1988 update of the 1980 U.S. Census. The clinical sample was composed of 303 infants from clinics across the nation that deal with infants with neurodevelopmental problems. Most infants had more than one medical complication (Aylward, 1995).

Inter-rater reliability was higher at older ages, as indicated by .79 for 6 months, .91 for 12 months, and .96 for 24 months. Construct validity was moderate, as evidenced by correlations with the Mental Development (.63) and Psychomotor Development (.47) indexes of the Bayley Scales of Infant Development—Second Edition (BSID-II) and BDI at 12 months for the Communication (.50), Cognitive (.51), and Motor (.50) domains (Aylward, 1995). Internal consistency in the NSCAW study is acceptable as indicated by Cronbach's alphas ranging from .73 to .84 for the various age groups.
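Because Cronbach's alpha is the internal-consistency statistic reported for the NSCAW sample throughout this section, a minimal Python sketch of the standard formula may help. It is unweighted and ignores missing data; the report does not state how missing items or sampling weights were handled, so this is not necessarily how the NSCAW values were computed.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance).

    items: list of k item-score lists, each ordered by the same respondents.
    Unweighted; no handling of missing responses.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("every item needs one score per respondent")
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


parallel = [[1, 2, 3, 4]] * 3              # identical items
unrelated = [[1, 2, 1, 2], [1, 1, 2, 2]]   # zero inter-item covariance
```

Identical items give an alpha of 1, and items with zero covariance give an alpha of 0; real scales fall between these extremes.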

2.7.3 Child Behavior Checklist (CBCL)

CBCL was “designed to provide standardized descriptions of behavior rather than diagnostic inferences” (Achenbach, 1991a, p. iii) about competencies, problem behaviors, and other problems. Items are on a 3-point Likert-type scale (0 = not true, 1 = somewhat or sometimes true, 2 = very true or often true). It contains 100 items for 2- to 3-year-olds and 113 items for 4- to 18-year-olds. The problem scale is composed of eight syndromes (Withdrawn, Somatic Complaints, Anxious/Depressed, Social Problems, Thought Problems, Attention Problems, Delinquent Behavior, and Aggressive Behavior) and an Other Problems category (26 items for the 2- to 3-year-olds and 33 items for the 4- to 18-year-olds). Behaviors are also categorized as Externalizing (containing the Delinquent and Aggressive Behavior syndromes) or Internalizing (containing the Withdrawn, Somatic Complaints, and Anxious/Depressed syndromes). A Total Problems score may be derived from the total of the syndromes and Other Problems items (Achenbach, 1991a).

The problem syndromes were normed by gender and age, using a nationally representative sample of 2,368 children aged 4 to 18 years who had not received mental health services or special remedial school classes in the previous 12 months (Achenbach, 1991a).

Cronbach's alpha for the different samples for 4- to 11-year-old females ranged from .54 for Sex Problems to .96 for Total Problems. Very high inter-rater reliability was found, as indicated by an intraclass correlation coefficient (ICC) of .96 for the problem items. Construct validity was good, as the problem syndromes correlated fairly well (.59 to .88) with similar scales from other instruments (Parent Questionnaire, Quay-Peterson Revised Behavior Problem Checklist, and ACQ Behavior Checklist) (Achenbach, 1991a). Internal consistency in the NSCAW sample is high for 2- to 3-year-olds (Externalizing = .91, Internalizing = .80, and Total Problem Behavior = .95) and for 4- to 15-year-olds (Externalizing = .92, Internalizing = .90, and Total Problem Behavior = .96).

Children were classified as having clinical/borderline problem behaviors on a scale if their T score on that scale (Externalizing, Internalizing, or Total Problems) was above 60. The same cutoff was used for 2- to 3-year-olds and 4- to 15-year-olds.
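The classification rule can be expressed as a one-line check per scale. This sketch assumes the T scores have already been obtained from the CBCL norms and treats “above 60” as strictly greater than 60; the function name is illustrative.

```python
CLINICAL_CUTOFF = 60  # T score; "above 60" is treated as strictly greater


def cbcl_flags(t_scores, cutoff=CLINICAL_CUTOFF):
    """Flag each CBCL broadband scale whose T score exceeds the cutoff."""
    return {scale: t > cutoff for scale, t in t_scores.items()}


flags = cbcl_flags({"Externalizing": 64, "Internalizing": 60,
                    "Total Problems": 61})
```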

2.7.4 Children's Depression Inventory (CDI)

CDI measures depression by asking various questions of children aged 7 to 17 about their engagement in certain activities or their experience of certain feelings (e.g., sad, enjoy being around other people). CDI contains 27 items, each with a 3-point Likert-type scale (0 = absence of symptom, 1 = mild symptom, 2 = definite symptom) that addresses a range of depressive symptoms as indicated by five factors: Negative Mood, Interpersonal Problems, Ineffectiveness, Anhedonia, and Negative Self-Esteem. The normative sample consisted of 1,266 Florida public school students aged 7 to 16 (Kovacs, 1992).

In studies conducted from 1983 to 1991, internal consistency has been good, with Cronbach's alpha ranging from .71 to .86. Alpha for the five factors ranged from .59 to .68, suggesting that the subscales are not robust. Test-retest reliability ranged from .38 to .87 depending on the time interval and sample. Studies (cited in Kovacs, 1992) have established concurrent validity with the Coopersmith Self-Esteem Inventory (-.72 for girls and -.67 for boys), the Center for Epidemiologic Studies Depression Scale (.44), and the Social Adjustment Scale Self-Report (.50). Although discriminant validity results have been mixed, significant differences were found between normative and clinical groups (Kovacs, 1992). In the NSCAW sample, internal consistency is good, averaging .81 for 7- to 12-year-olds and .87 for 13- to 15-year-olds.

Children were classified as depressed if they fell at or above the 91st percentile for their age and gender group. This clinical cutoff is based on the CDI normative sample's rates of depression in the CDI manual (Kovacs, 1992).

2.7.5 Closeness to Caregiver

Closeness to the caregiver was measured with single-item questions taken from the National Longitudinal Study of Adolescent Health (AddHealth) (Carolina Population Center, 1998). Four questions, two about the primary caregiver and two about the secondary caregiver, asked children how close they felt to each caregiver and how much they thought that caregiver cared about them. The questions were summed to create a closeness to caregiver score. Scores range from 1 to 5, with 5 indicating the highest degree of closeness to the caregiver. Reliability is good (α = .75).

An additional 20 questions (10 for the primary caregiver and 10 for the secondary caregiver) concerned joint activities in which the child and caregiver participated within the past four weeks. Children could endorse 10 possible activities, such as shopping, discussing things, working on a school project, attending a religious service, or playing sports together.

2.7.6 Community Environment

The community environment was measured using the abridged community environment scale that was developed by Abt Associates (1996) for use on the National Evaluation of Family Support Programs. The scale consists of nine items that ask caregivers about their neighborhood. Reliability for this scale in the NSCAW population is good (α = .86).

2.7.7 Composite International Diagnostic Interview Short Form (CIDI-SF)

CIDI-SF is a highly standardized interview that screens for mental health and substance use disorders using the criteria established in the Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association, 1994). The presence of eight disorders is evaluated: major depression, generalized anxiety, specific phobia, social phobia, agoraphobia, panic attack, alcohol dependence, and drug dependence. For this study, only the sections on major depression, alcohol dependence, and drug dependence were administered. Questions are scripted to ask about the previous 12-month period (Nelson, Kessler, & Mroczek, 2001); the section on depression was administered by an in-person interview, while the sections on alcohol and drug dependence were administered using an audio computer-assisted self-interview (ACASI).

CIDI-SF is a shortened form of the Composite International Diagnostic Interview, which was developed from the NIMH-Diagnostic Interview Schedule (DIS) by the Joint Project on Diagnosis and Classification of Mental Disorders in Alcohol and Drug-related Problems (funded by the World Health Organization and the former Alcohol, Drug Abuse, and Mental Health Administration). CIDI-SF was developed using data from the U.S. National Comorbidity Survey and has been shown to reproduce accurately the CIDI diagnostic classifications (Kessler et al., 1998).

The reliability and validity of CIDI have been widely studied (Wittchen, 1994). Internal consistency for the alcohol and drug dependence sections ranged from .70 to .94 (Cottler et al., 1991; Ustun et al., 1997). Inter-rater reliability has ranged from .67 to 1.0 (Andrews et al., 1995; Wittchen et al., 1991). Test-retest data have shown kappas of .62 to .78 for the three disorders included in this study (Wittchen, 1994). Concordance with clinical diagnoses ranged from .76 to .84 (Janca et al., 1992), while comparisons with the Schedules for Clinical Assessment in Neuropsychiatry (SCAN) ranged from .66 for lifetime to .69 for current diagnoses (Andrews et al., 1995).

2.7.8 Conflict Tactics Scale (CTS1)

CTS1 is a self-report or interview measure designed to assess the overt means by which family members respond to conflicts (Straus, 1979). CTS1's physical violence scale was used to assess caregivers' experiences with intimate partner violence (IPV). This measure is divided into minor and severe subscales, based on the severity of the violent act. The minor violence items include being pushed, grabbed, shoved, or slapped, whereas the severe violence items inquire about experiences that include being choked, beaten, and threatened with a knife or gun. Response categories range from 0 (never) to 6 (more than 20 times), indicating the frequency of occurrence of the violent acts in the preceding 12 months. For events that did not occur in the previous 12 months, the respondent is asked to indicate if they ever happened.

CTS1 has been used in national surveys of IPV and is the most frequently employed and thoroughly validated measure of IPV. The reliability (α = .88) and validity of the physical violence section of CTS1 have been well documented (Straus, 1990; Straus & Gelles, 1990). The violence items have face or content validity since they all describe acts of actual physical force being used by one family member on another. In the use of the CTS1 with the NSCAW sample, internal consistency is good for Any Domestic Violence (α = .90), Minor Violence (α = .77), and Severe Violence (α = .86).

2.7.9 Conflict Tactics Scale—Parent Child (CTS-PC)

CTS-PC was developed to assess the uses of discipline. There are two versions: one in which the children report their experience of disciplinary actions and one in which permanent caregivers report their use of those disciplinary tactics with their study child. The “disciplinary” actions include more than those ordinarily considered part of parental discipline and range from time out to burning a child. The underlying assumption is that much maltreatment is justified by parents as discipline and understood by children as discipline.

CTS-PC's theoretical basis is conflict theory, which assumes that conflict is an inevitable part of all human association, whereas physical assault as a tactic for dealing with conflict is not. CTS-PC uses an 8-point Likert-type scale (1 time, 2 times, 3 to 5 times, 6 to 10 times, 11 to 20 times, more than 20 times, not in the past 12 months, never) to measure the frequency and extent to which a parent has carried out specific acts of physical and psychological aggression (Straus et al., 1998). The measure consists of three scales that assess Nonviolent Discipline, Psychological Aggression, and Physical Assault. The Physical Assault scale can be further divided into three subscales: minor physical assault (corporal punishment), severe physical assault, and very severe physical assault. Two supplemental subscales measuring Neglect and Sexual Abuse (22 items in total) were also available; in NSCAW they were administered to caregivers but not to children.

CTS-PC was tested on a nationally representative sample of 1,000 U.S. children. Internal consistency was marginal, with Cronbach's alpha ranging from .55 (Physical Assault) to .70 (Nonviolent Discipline). Construct validity for the CTS-PC was moderate: Corporal Punishment correlated -.34 with Child Age, whereas Severe Assault was not significantly correlated with Child Age (-.06). Analysis of covariance found no significant differences between White and African American parents in corporal punishment but did find significant differences in severe physical assault, consistent with past findings in the literature. Gender differences consistent with the literature were also found in this study of construct validity (Straus et al., 1998).

In the NSCAW study, internal consistency for the child and caregiver report on the CTS-PC scales varies. Cronbach's alpha for Total score on the child report is .85, with subscales ranging from .50 for Nonviolent Discipline to .85 for Total Physical Assault. Cronbach's alpha for Total score on the caregiver report is .79, with subscales ranging from .39 for Neglect to .77 for Nonviolent Discipline.

2.7.10 Home Observation for Measurement of the Environment (HOME-SF)

HOME measures the quality and quantity of stimulation and support in the home environment of children from birth to 10 years (Bradley, 1994; Bradley et al., 2001). The number of items ranges from 20 to 24, depending on the age of the child. Items address the mother's behaviors toward the child and various aspects of the physical environment (e.g., safe play environment, size of living space), asking whether these conditions exist, do not exist, or were not observed. Although the observer's presence may influence the parent-child interaction, the duration of the caregiver interview increases the likelihood that any such alteration in behavior will be reduced, because the mother will have more difficulty inhibiting her usual reactions over this extended period (Caldwell, Bradley, & Staff, 1979).

The initial normative sample was composed of 174 infants (aged 4 to 36 months) and 117 preschoolers. Since then, HOME has been adapted for many national studies, although national norms have never been established. The version used in this study is the short form of HOME used in the National Longitudinal Survey of Youth (NLSY), a study that includes many low-income families. Following Bradley's designation, this measure is labeled the HOME-SF in this report.

Estimates of internal consistency have been greater than .80 for total scores, whereas coefficients for subscales ranged from .30 to .80. When percentage has been used to measure inter-observer agreement, levels have always been at least 85%. When a coefficient has been used to measure agreement, the coefficient was at least .80 (Bradley, 1994). No independent tests of inter-observer agreement were conducted for this study.

Internal consistency is generally low for HOME-SF in its use for NSCAW. Cronbach's alphas for HOME-SF scales for children aged 2 years and younger are less than .45. Cronbach's alphas for measures for 3- to 5-year-olds are somewhat higher, ranging from .41 for Emotional Support to .71 for Physical Environment. For 6- to 10-year-olds, Cronbach's alphas range from .48 for Cognitive Stimulation and Emotional Support to .74 for Physical Environment.

2.7.11 Kaufman Brief Intelligence Test (K-BIT)

K-BIT is a brief, individually administered measure of verbal and nonverbal intelligence for children, adolescents, and adults, ranging in age from 4 to 90 years. Verbal items assess word knowledge and verbal concept formation. Nonverbal items (matrices) assess ability to perceive relationships and complete analogies. The normative sample was composed of a nationally representative sample of 2,022 people aged 4 to 90 years tested at 60 sites in the U.S. The sample was stratified based on gender, geographic region, socioeconomic status, and race/ethnicity. Children aged 4 to 16 years made up 66% (1,342) of the sample (Kaufman & Kaufman, 1990).

Internal consistency for the Vocabulary subscale was high for 4- to 19-year-olds, ranging from .89 to .98, and moderate for Matrices, ranging from .74 to .95. Test-retest reliability for 5- to 12-year-olds was good for Vocabulary (.86) and moderate for Matrices (.83). Test-retest reliability for 13- to 19-year-olds was higher for Vocabulary (.96) and moderate for Matrices (.80) (Kaufman & Kaufman, 1990). In the NSCAW sample internal consistency is good for Composite (.84), Verbal (.76), and Matrices (.79) scores.

2.7.12 Limited Maltreatment Classification System (L-MCS)

In the present study, we used a modification of the Maltreatment Classification System (MCS) (Barnett et al., 1993) to capture information about the report of alleged maltreatment that preceded the investigation that triggered the child's entrance into the study. Although the MCS was designed for case record reviews, in this study we collected data about maltreatment in an interview with the child welfare worker who knew the most about the investigation and had immediate access to case record materials. The MCS gathers information about all types of maltreatment and classifies each according to severity, but this was not feasible in an interview setting because of interview length. Instead, data were collected about all types of maltreatment recorded in the allegation, and only the maltreatment judged to be most serious was coded in greater detail. For this type of maltreatment, the onset was recorded and the severity was rated from 1 (least) to 5 (most) on closed-ended scales provided by the MCS, which the investigators modified to create 5-point scales for each. The investigators also added examples of parameters of maltreatment to anchor each scale point; these were based on the instructions to the coders of the case materials. Thus, the L-MCS offers five dimensions of maltreatment: the number of types, the combination of types, the severity of the most serious type, the onset of the maltreatment, and who was responsible for the maltreatment.

2.7.13 Peer Loneliness and Social Dissatisfaction

Peer relations were measured using a slight modification of the Loneliness and Social Dissatisfaction Scale (Asher & Wheeler, 1985). This instrument asks children how true various statements are, such as “It's easy for me to make new friends at school,” “It's hard for me to get kids in school to like me,” and “I don't have anyone to play with at school.” A modified version, with questions rather than statements and fewer response options (yes, no, sometimes), was used for children aged 5 to 7 years. Children aged 8 years and older chose among five responses to indicate how often each statement was true (never, hardly ever, sometimes, most of the time, always). Item scores were summed to create an overall score for each child. Possible scores range from 16 to 48 for 5- to 7-year-olds and from 16 to 80 for children aged 8 years and older; higher scores reflect more loneliness. Internal consistency is high (α = .90) for elementary school children (Asher & Wheeler, 1985) and middle school children (Parkhurst & Asher, 1992). In NSCAW, internal consistency is good for 5- to 7-year-olds (α = .70) and high for children aged 8 years and older (α = .89).
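
The reported score ranges (16 to 48 and 16 to 80) are consistent with 16 scored items coded 1 to 3 for the younger version and 1 to 5 for the older version. The sketch below assumes exactly that coding, and also assumes that items are already keyed so that a higher code means more loneliness; neither assumption is stated explicitly in the text.

```python
# Assumptions (inferred, not stated in the text): 16 scored items, each coded
# 1..k (k = 3 for ages 5-7, k = 5 for ages 8+), already keyed so that a
# higher code means more loneliness.
N_SCORED_ITEMS = 16


def loneliness_score(responses: list[int], age_years: int) -> int:
    """Sum item codes into an overall loneliness score."""
    if age_years < 5:
        raise ValueError("the scale was administered to children aged 5 and older")
    n_options = 3 if age_years <= 7 else 5  # yes/no/sometimes vs. never..always
    if len(responses) != N_SCORED_ITEMS:
        raise ValueError(f"expected {N_SCORED_ITEMS} item responses")
    if any(not 1 <= r <= n_options for r in responses):
        raise ValueError(f"responses must be coded 1..{n_options}")
    return sum(responses)


# The score ranges follow directly from 16 items coded 1..k.
print(loneliness_score([1] * 16, age_years=6), loneliness_score([3] * 16, age_years=6))  # 16 48
print(loneliness_score([5] * 16, age_years=10))  # 80
```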

2.7.14 Poverty Level

The poverty level was determined from the family's income and the number of adults and children in the household, following the procedures used by the U.S. Census Bureau (Dalaker, 2000). Average thresholds range from $11,239 for a two-member household to $35,060 for a household of nine or more members. Income was collected in $5,000 increments, from “$0 to $5,000” per year up to “over $50,000” per year, and the midpoint of each increment was used as the household's income. Households reporting an income “over $50,000” were all assigned an income of $75,000 for the purposes of calculating poverty. This choice was based on information from the National Survey of America's Families indicating that twice as many families had incomes at or above 300% of the poverty level as had incomes between 200% and 300% of the poverty level (Urban Institute, 2002).
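
The midpoint and top-coding rules can be sketched as follows. The numeric category codes (0 to 9 for the $5,000 brackets, 10 for “over $50,000”) are a hypothetical encoding, and the example uses the $11,239 average two-member threshold quoted above purely for illustration; the actual Census procedure looks up a threshold by household size and number of children, which this sketch does not reproduce.

```python
# Hypothetical coding of the income question: categories 0-9 are the
# $5,000-wide brackets from $0-$5,000 up to $45,000-$50,000, and
# category 10 is "over $50,000".
BRACKET_WIDTH = 5_000
TOP_CATEGORY = 10
TOP_CODED_INCOME = 75_000  # value assigned to all "over $50,000" households


def imputed_income(category: int) -> float:
    """Return the income value used for poverty calculations."""
    if category == TOP_CATEGORY:
        return float(TOP_CODED_INCOME)
    if 0 <= category < TOP_CATEGORY:
        # Midpoint of the bracket, e.g. category 2 ($10,000-$15,000) -> 12,500.
        return category * BRACKET_WIDTH + BRACKET_WIDTH / 2
    raise ValueError(f"unknown income category: {category}")


def income_to_poverty_ratio(category: int, threshold: float) -> float:
    """Ratio of imputed income to the household's poverty threshold.

    The threshold itself must be looked up by household size and number of
    children; that lookup is not reproduced here.
    """
    return imputed_income(category) / threshold


# Two-member household reporting $10,000-$15,000, using the average
# two-member threshold for illustration only.
ratio = income_to_poverty_ratio(2, 11_239)
print(f"{ratio:.2f}")  # 1.11, i.e. about 111% of the poverty level
```

Because every “over $50,000” household receives the same value, the top category cannot distinguish households just above $50,000 from much wealthier ones; the $75,000 choice only matters for households whose threshold puts that boundary near a poverty cutoff of interest.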

2.7.15 Preschool Language Scale-3 (PLS-3)

PLS-3 measures language development of children from birth to 6 years (in this study it was administered to children from birth to 5 years). The Auditory Comprehension subscale measures precursors of receptive communication skills with tasks focusing on attention abilities. The Expressive Communication subscale measures precursors of expressive communication skills with tasks that focus on social communication and vocal development. A Total Language score combines these two subscales. PLS-3 was standardized with a sample of 1,200 children aged 2 weeks to 6 years, 11 months, with equal percentages of males and females in each age group. Representative sampling based on 1980 U.S. Census data and the 1986 update was stratified by parent education level, geographic region, and race (Zimmerman, Steiner, & Pond, 1992).

Internal consistency using Cronbach's alpha has, on average, been acceptable for Auditory Comprehension (mean = .76; range of .47 to .88) and higher for Expressive Communication (mean = .81; range of .68 to .91) and Total Language (mean = .87; range of .74 to .94). Test-retest reliabilities ranged from .89 to .90 for Auditory Comprehension, from .82 to .92 for Expressive Communication, and from .91 to .94 for Total Language. Inter-rater agreement is 89%, and the correlation between raters' scores is .98 (Zimmerman, Steiner, & Pond, 1992).

Using discriminant analysis, PLS-3 identified language-disordered children 66% to 80% of the time; most misclassifications involved children previously classified as language-disordered. Concurrent validity was assessed by comparing PLS-3 with the PLS-Revised Edition (PLS-R) and the Clinical Evaluation of Language Fundamentals–Revised (CELF-R). Correlations with PLS-R were .66 for Auditory Comprehension and .86 for Expressive Communication. Correlations with the CELF-R were .69 for Auditory Comprehension and .75 for Expressive Communication (Zimmerman, Steiner, & Pond, 1992). In NSCAW, missing data prevented the calculation of Cronbach's alphas.
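
Cronbach's alpha is reported for most measures in this section. The sketch below computes it with the standard formula, alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores). It drops any respondent with a missing item (listwise deletion); that missing-data rule is an assumption, since the text does not say how NSCAW handled missing items. Under such a rule, sparse item-level data can leave too few complete cases to compute alpha at all.

```python
from statistics import variance


def cronbach_alpha(rows: list[list]) -> float:
    """Cronbach's alpha with listwise deletion (an assumed missing-data rule).

    rows: one list of item scores per respondent; None marks a missing item.
    """
    complete = [r for r in rows if all(x is not None for x in r)]
    if len(complete) < 2:
        raise ValueError("need at least two respondents with no missing items")
    k = len(complete[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item (column) and of respondents' total scores.
    item_variances = [variance(item) for item in zip(*complete)]
    total_variance = variance([sum(r) for r in complete])
    if total_variance == 0:
        raise ValueError("total scores have no variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


data = [
    [1, 2],
    [2, 1],
    [3, 3],
    [None, 2],  # dropped: has a missing item
]
print(round(cronbach_alpha(data), 3))  # 0.667
```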

2.7.16 Punitiveness/Hostility

This subscale of HOME-SF was developed by Linver, Fuligni, and Brooks-Gunn (2001) to measure the level of observed caregiver punitiveness/hostility. The subscale consists of items that ask whether the caregiver shouts at the child, expresses annoyance with or hostility toward the child, slaps or spanks the child during the visit, scolds or criticizes the child, and interferes with the child more than three times. All five items were measured for caregivers of children younger than 3 years old, but only the last three were measured for caregivers of children between 3 and 5 years old. The scale's reliability was tested using data from the Infant Health and Development Program, the Early Head Start Research and Evaluation Project, and the Project on Human Development in Chicago Neighborhoods. In NSCAW, reliability is good for children younger than 3 years old (α = .69) but poor for children between 3 and 5 years old (α = .12). Findings on punitiveness among caregivers of children between 3 and 5 years old should therefore be interpreted with extra care.
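
The age-dependent item set can be made concrete with a short scoring sketch. The text does not state the scoring rule; counting the behaviors observed (each item scored yes = 1, no = 0) is an assumption, as are the item keys and the use of whole-year ages for the cutoffs.

```python
# Hypothetical item keys, in the order the items are listed in the text.
ITEMS = (
    "shouts_at_child",
    "annoyance_or_hostility",
    "slaps_or_spanks",
    "scolds_or_criticizes",
    "interferes_more_than_three_times",
)


def punitiveness_score(observed: dict[str, bool], age_years: int) -> int:
    """Count the punitive/hostile behaviors observed (assumed scoring rule)."""
    if 0 <= age_years < 3:
        items = ITEMS          # all five items
    elif 3 <= age_years <= 5:
        items = ITEMS[2:]      # only the last three items
    else:
        raise ValueError("subscale described only for children aged 0 to 5")
    # A missing item raises KeyError rather than being silently scored as 0.
    return sum(observed[item] for item in items)


visit = dict.fromkeys(ITEMS, False)
visit["shouts_at_child"] = True
visit["scolds_or_criticizes"] = True
print(punitiveness_score(visit, age_years=2), punitiveness_score(visit, age_years=4))  # 2 1
```

Under this rule the same observed behavior yields different scores by age group, because shouting and expressed hostility are not scored for 3- to 5-year-olds; raw scores from the two groups are therefore not directly comparable.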

2.7.17 Rochester Assessment Package for Schools—Student (RAPS-S)

A shorter version of the Relatedness scale from RAPS-S was used to measure children's feelings about their relationship with their primary and secondary caregivers. There are two sets of questions, one for each caregiver. Four subscales were used for NSCAW: Parental Emotional Security, Involvement, Autonomy Support, and Structure. Children answered how true each statement was (1 = not at all true, 2 = not very true, 3 = sort of true, and 4 = very true). Parental Emotional Security asked how true it was that the child felt good, mad, or happy with his or her caregiver. Involvement asked questions about the caregiver's interest in, time spent with, and things done to help the child. Autonomy Support inquired about the caregiver's trust of the child and whether the child was allowed to make his or her own decisions. Structure asked about the caregiver's fair treatment of the child, the caregiver's belief in the child's abilities, and the child's understanding of what the caregiver wants (Connell, 1990; Wellborn & Connell, 1987, as cited in Lynch & Cicchetti, 1991).

A mean Relatedness score, rather than a summed score, was created because not all children answered the same number of questions (e.g., not all children answered questions about a secondary caregiver). Internal consistency for the overall Relatedness score was high (.88), and it was the only score used. Subscale scores were not used because, although Cronbach's alphas for Parental Emotional Security and Involvement were fair (.65 to .76), alphas for Autonomy Support and Structure were very low (.28 to .66).
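
The reason for using a mean instead of a sum can be shown with a short sketch. The 1 to 4 response coding comes from the text; representing unasked or unanswered items as `None`, and assuming that negatively worded items (such as feeling mad) have already been reverse-coded, are assumptions made for illustration.

```python
from typing import Optional


def relatedness_score(responses: list[Optional[int]]) -> Optional[float]:
    """Mean of answered Relatedness items; None marks an unasked/unanswered item.

    Assumes responses are coded 1 (not at all true) to 4 (very true) and that
    negatively worded items have already been reverse-coded.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return None
    if any(not 1 <= r <= 4 for r in answered):
        raise ValueError("responses must be coded 1 to 4")
    return sum(answered) / len(answered)


# A child reporting on two caregivers answers more items than a child with one,
# but the mean keeps both scores on the same 1-4 scale.
one_caregiver = [4, 3, 3, 2] + [None] * 4
two_caregivers = [4, 3, 3, 2, 4, 3, 3, 2]
print(relatedness_score(one_caregiver), relatedness_score(two_caregivers))  # 3.0 3.0
```

Summed scores for the same two children would be 12 and 24, differing only because of how many questions each child was asked.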

2.7.18 Satisfaction with Caseworker and Services

NSCAW's current caregiver instrument contains 17 items addressing current caregivers' satisfaction with their caseworker(s). Current caregivers of children remaining in the home were asked whether they had talked to a caseworker since the start of the child welfare investigation. Only caregivers who answered yes to this item continued to answer the remaining questions in this section.

Caregivers who reported speaking with a caseworker since the start of the investigation were first asked how many caseworkers they had met with and how long ago they had last spoken with a caseworker. Six questions in the instrument inquired about respondents' relationship with their caseworker(s): how often their caseworker(s) listened to their concerns, understood their situation, treated them with respect and fairness, explained treatment and service options to them, and met with them to develop an action plan to address their needs and concerns. Three questions addressed the extent to which caregivers have been satisfied with the amount of c