Administration for Children and Families  
Office of Planning, Research & Evaluation (OPRE)

ASSESSMENT TOOLS AND PRACTICES

Issues and Observations

Parents, educators, policymakers, and presidents have all emphasized preparing children for school as a priority. Whether children are ready to learn when they reach school can be determined only when the concept is properly defined and assessed (Schweinhart, 1993). Several sources observe that there may be a mismatch between the learning styles of AI-AN children and the tests intended to determine their knowledge:

  • American Indian and Alaska Native people share with other minority groups the concern that IQ scores and results from achievement tests have not recognized potential bias against speakers of other languages and people who are not members of a Caucasian, middle-class culture (Deyhle and Swisher, 1997).

  • Observers note that standardized assessment methods may be inadequate as indicators of American Indian and Alaska Native children’s abilities because the tools do not match the culture, language patterns, learning styles, and strengths of AI-AN children (Banks and Neisworth, 1995; Bordeaux, 1995; Estrin and Nelson-Barber, 1995; Harris, 1985). Others caution, however, against using test bias as grounds for rejecting standardized assessments (Shields, 1997).

  • Relatedly, there are concerns that some assessors may attribute low scores or measures to an AI-AN child’s culture or use of “Indian English” when the child has a genuine deficit or disability that should be addressed (Harris, 1985; Saxton, 2001).

  • Among many AI-AN populations, the extent of acquired knowledge is often demonstrated in actual practice, rather than measured through assessments and test scores (Estrin and Nelson-Barber, 1995; Kawagley and Barnhardt, 1998).

  • Although creating locally developed norms for standardized assessments may produce a better fit to measure the abilities of certain populations, some caution against this practice because it may lower expectations for those populations (Harris, 1985).

Research Findings: Assessment Tools and Practices
Author | Sample, Measures, and Methods | Major Findings Reported by Author
Banks, 1997 | Twenty AI-AN parents/caregivers and 11 professionals who assess AI-AN children completed a mail questionnaire on their perceptions of assessment practices. | Assessors reported using norm-referenced instruments, criterion-referenced instruments, and curriculum-based assessment, but upon closer examination almost all of the instruments actually used are standardized, norm-referenced tools. Parents reported lower levels of involvement in the assessment process (as observers or in interviews) than the assessors reported about them. All respondents agreed that most testing took place in schools, but about half of the assessors said they did additional testing in the home, compared to 10 percent of the parents. The author notes that findings cannot be generalized because of the small sample size. Only limited research has been conducted on assessment practices among AI-AN children; more studies are necessary to investigate reported differences.
Brachlow, Jordan, and Tervo, 2001 | The Denver II Developmental Screening Test (a standardized tool) and the Child Development Review (CDR, an open-ended parent report tool) were administered to 38 children from the Cheyenne River Reservation (two-thirds were AI) and 35 children from an urban area (only one was AI). | Overall, children who were administered the Denver II had higher rates of failure, higher rates of borderline scores, and lower passing scores than those administered the CDR. Among reservation-based children, more passed the CDR than the Denver II; there were no significant differences in passing rates for urban children. The authors caution that testers should not over-interpret Denver II outcomes among populations in culturally unique settings and recommend not relying on a single test.
Browne, 1984 | To examine scoring patterns and cognitive processing strengths of AI children, the Wechsler Intelligence Scale for Children–Revised (WISC–R) was administered to 197 American Indian children, ages 6-16. The sample was drawn from children attending the St. Joseph’s boarding school in South Dakota. Results were compared to those of a standardization population. | WISC–R scoring patterns of AI children were different from those of the predominantly non-Native standardization sample. The AI children had strengths in performance tasks of subtests that have been associated with “right brain” functioning. The author suggests that these cognitive strengths may reflect a preference for right-hemisphere information processing in AI children.
Crowe, McClain, and Provost, 1999 | The Peabody Developmental Motor Scales (PDMS) is a standardized assessment often used to determine whether children need early intervention. No AI children are included in the normative sample. The PDMS was administered to 44 Pueblo children ages 24-35 months, and a family member completed a 20-item questionnaire on demographic characteristics and child development. | Results show significant differences between the Pueblo children’s scores and the normative sample scores. Pueblo boys and girls in two age groups (24-29 months and 30-35 months) scored lower on fine motor scales, and Pueblo girls in the younger age group scored lower on gross motor skills. The authors recommend that therapists and others administering and interpreting the PDMS be cautious when using it with AI children, who have not been included in the normative sample.
Kazimour and Reschly, 1981 | The Adaptive Behavior Inventory for Children (ABIC) is an assessment instrument used to determine in part whether to classify a child as mildly retarded. It is one measure within the System of Multicultural Pluralistic Assessment (SOMPA) for children age 5-11. To investigate the overrepresentation of minorities in programs for students with mild retardation, the ABIC was administered to a random sample of 146 white, 120 black, 112 Hispanic, and 105 Papago children in grades 1, 3, and 5, enrolled in Pima County, Arizona schools. Means were compared to results from a normative sample for SOMPA from California. To study the concurrent validity of the ABIC, correlations were assessed among ABIC scores, the Metropolitan Achievement Test, the Teacher Rating Scale, the WISC–R IQ scales, and SOMPA sociocultural measures. | Papago children had significantly lower mean scores on the ABIC than the rest of the sample. Although there were few differences between Hispanic subgroups in Arizona and California, there were differences for white, black, and Papago children, suggesting that issues exist for using ABIC norms for classification and placement. There was little evidence of concurrent validity of the ABIC with the other instruments examined.
Long, 1998 | The Preschool Language Scale–3 (PLS–3) was administered to 60 children enrolled in Cherokee Nation Head Start, Tahlequah, OK, and to 20 other AI-AN and 20 non-AI-AN children to test the validity of the measure for use among Cherokee children. | Results suggest the PLS–3 is a valid measure of language skills among 3- and 4-year-old Cherokee children but is questionable for use with 5-year-olds. Although all tested children fell within one standard deviation of the total test scores, Cherokee Nation children ages 4 and 5 had better language comprehension than language production, and 5-year-old children scored lower than 3- to 4-year-old children. The test scores may reflect cultural factors. Children from AI-AN populations that value silence and listening may not score well on tests that require verbal expression.
Mishra and Lord, 1982 | To examine the reliability and predictive validity of the Wechsler Intelligence Scale for Children–Revised (WISC–R) for culturally diverse children, the test was administered to 40 4th and 5th grade Navajo children attending school on a Navajo reservation. The arithmetic, spelling, and reading sections of the Wide Range Achievement Test (WRAT) were then administered. | Results indicate the WISC–R has low reliability and predictive validity for Navajo children. The highest validity was obtained for the performance scale. The authors conclude that the WISC–R may not be an appropriate test to use with Navajo children.
Mitchell, 1985 | To compare the potential cultural bias of two different assessments, the Kaufman Assessment Battery for Children (K–ABC) and the WISC–R were administered to 29 preschool-age Cherokee and 30 non-Native children from eastern Oklahoma, and to 15 Kiowa and 12 non-Native children from southwestern Oklahoma. | Overall scores of the AI children on the WISC–R and K–ABC were not significantly different from those of non-Native children. Additionally, there was no difference in scores on either test between Cherokee and Kiowa children. However, scores of AI children on two subtests of the WISC–R Verbal Scale (Vocabulary and Comprehension) were significantly lower than those of non-Native children. The author notes that children from bilingual homes or from a cultural minority may experience difficulty on these subtests. Because there were no performance differences on the K–ABC subtests, the author concludes that the K–ABC may have fewer issues of cultural bias than the WISC–R.
Morgan and Whorton, 1991 | To compare differences in assessment instruments for children from culturally diverse backgrounds, the WISC–R and the Diagnostic Achievement Battery for Children were administered to 12 AI and 13 white children between the ages of 6-15. | Differences in scores between AI and white children on both the WISC–R and the Diagnostic Achievement Battery for Children were not statistically significant.
Oesterheld and Haber, 1997 | Four focus groups were held with a total of 33 Dakotan/Lakotan parents, in urban, rural, and reservation locations. Participants reviewed the Conners Parent Rating Scale (CPRS) and Child Behavior Checklist (CBCL), which are often used to screen for attention-deficit/hyperactivity disorder. | All parents said that the CPRS and CBCL were not insensitive to their Native culture. They nonetheless identified three types of problems with the assessment tools: (1) some questions could not be answered because they contained words or idioms that were not understood (e.g., rather than “sassy,” as used in the CPRS, a better word would be “disrespectful”); (2) some questions implied a cultural value of the dominant society that did not account for Native traditions (e.g., a phrase reflecting a child’s lack of independence would not acknowledge that it is culturally acceptable for children to cling to adults in Dakotan/Lakotan families); and (3) some responses would be misunderstood by those who did not understand the culture (e.g., the CBCL asks about hearing sounds or voices that aren’t there, but some Natives embrace the presence of spirits).
Plank, 2001 | Eight AI children who had been diagnosed as intellectually disabled (mentally retarded) using the WISC–III and WAIS–R were re-evaluated using a cross-battery approach to establish intelligence scores, or by using the Visual Processing component from the Woodcock-Johnson Revised Test of Cognitive Ability or the Universal Nonverbal Intelligence Test. | When AI children who had previously been diagnosed as mentally retarded (on the basis of intelligence scores ranging from 58-69) were reassessed using the cross-battery approach, intelligence scores of 90-105 were produced, meaning that none was mentally retarded or had impaired intelligence. The author states that psychologists working with AI children should use multiple measures and methods to avoid poor decisions and inaccurate diagnoses.
Powless and Elliott, 1993 | Parents and teachers used the Social Skills Rating System (SSRS) for a sample of 50 AI Head Start children from the Oneida Indian Reservation and 50 white Head Start children from Madison, WI programs. The SSRS is a standardized, norm-referenced tool. The teacher version of the SSRS has 52 items that reflect three prosocial domains (cooperation, assertion, and self-control) and one domain for problem behavior (interfering behaviors); teachers also rate how each behavior is related to classroom success. The parent version of the SSRS has 70 items that measure prosocial and problem behaviors at home and asks parents to designate the importance of each behavior. | According to ratings of both teachers and parents, white children demonstrated the measured social skills more frequently than AI children. This was particularly the case for a subscale measuring assertive behavior. AI and white teachers did not designate the same specific social behaviors as important, but AI teachers and parents did share a common sense of important social skills. The authors note that the SSRS may not capture social skills that children exhibit but that go unnoticed. For example, some AI children may show assertive skills nonverbally rather than verbally. The authors also posit that similarities between AI teachers and parents may be due to common cultural experiences.
Reschly and Reschly, 1979 | Scores on the WISC–R for 212 white, 289 black, 184 Hispanic, and 202 Papago children in grades 1, 3, 5, 7, and 9 from Pima County, Arizona were analyzed to compare children’s performance and the test’s predictive value for school achievement and attention. | Findings confirmed a strong relationship between WISC–R scores and school achievement for all groups, with the exception of the Papago children. The authors suggest that the Papago children may have experienced bias in test administration techniques.
Ross-Reynolds and Reschly, 1983 | Six subtests (Information, Similarities, Arithmetic, Vocabulary, Comprehension, and Picture Completion) of the WISC–R were examined for item bias. Scores on the WISC–R were examined for 252 white, 237 black, 223 Hispanic, and 238 Papago children from Pima County, Arizona. | The analyses found no evidence of item bias for black or Hispanic children. Ambiguous results were found for the Papago children. Depending on the analyses and criteria used, bias ranged from none or negligible to substantial for Papago children.
Spiegel, 1986 | To investigate the psychometric properties of the Peabody Picture Vocabulary Test–Revised (PPVT–R), the test was administered to 343 Sioux children enrolled in Head Start on four reservations (Crow Creek, Lower Brule, Santee, and Standing Rock). Scores on the PPVT–R were compared with scores on two subtests (Communications and Concepts) of the Developmental Indicators for the Assessment of Learning (DIAL). | Mean scores of the Sioux Head Start children were significantly (often greatly) lower than mean scores of children in the PPVT–R standardization sample (the standard deviation of the Sioux children was smaller, suggesting a homogeneous group different from the national standardization sample). The effects of age and gender on Sioux children’s PPVT–R test scores were consistent with the effects of age and gender for the standardization sample. Item-response analysis suggests consistency, namely that more able children succeeded in answering more difficult items correctly. Moderate correlations were found between PPVT–R scores and the DIAL Communications and Concepts subtests, and a discriminant analysis accurately predicted, for 78 percent of the Sioux children, whether they would pass or fail the DIAL. The author concludes that the PPVT–R has many of the same psychometric properties when used with the Sioux children as with the standardization sample. However, because the Sioux children’s mean scores were about one standard deviation below those of the standardization sample, the author suggests that local norms should be examined when a local population is unique.
Thomas, 1987 | Grades were reviewed on report cards of 44 Caucasian and 38 AI kindergarten children in a rural midwestern town. | In several categories of grades (expression, work habits, physical development, reading and number readiness, art appreciation and music participation, and overall), there were no statistically significant differences in the grades teachers gave Caucasian and AI children. Teachers gave lower grades to AI children for their general characteristics and social development. The author suggests that the grade differences could reflect cultural differences. Further research could examine whether clothing, speech, and manners, along with family demographics, affect grades given in kindergarten.
Ukrainetz, Harpell, Walsh, and Coyle, 2000 | Twenty-three Arapahoe/Shoshone kindergarten students were assessed using dynamic assessment techniques. Dynamic assessment, which looks at responses to learning situations instead of relying on traditional assessment methods, is a process that is being tested as a way to accurately diagnose language impairments in minority populations. | The results of the dynamic assessment were consistent with teachers’ identification of children as weaker or stronger language learners. The study shows the potential of dynamic assessment as a more valid method for distinguishing cultural differences from disorders in minority children.


 

 
