Department of Health and Human Services
Administration for Children and Families
Office of Planning, Research & Evaluation (OPRE)

Section 3
INFORMATION INCLUDED FOR EACH INSTRUMENT
 
The purpose of this resource document is to provide information, in one place, about existing screening and assessment instruments designed for use with children under age 3 and their families, as well as instruments designed for assessing services provided by programs serving them. Thus, we cast a broad net and include a wide range of screening and assessment tools of potential use to programs. Many of the instruments described are established instruments that yield a standard score that places the child’s performance in the context of other children of the same age. We also include some data collection tools that may be useful, such as implementation rating scales and questionnaires that include questions on family practices, health, and health care receipt from the national Early Head Start Research and Evaluation Project (EHSRE).

We did not set strict inclusion criteria, but tried to provide information on a range of features for each instrument so programs can make informed decisions in selecting instruments. Each program must determine the purposes for considering a particular instrument and evaluate how well the instrument fulfills those purposes.

In general, because of their limited applicability for programs serving infants and toddlers, we did not include measures for which the lowest appropriate age for administration was older than 2 years. We made an exception for certain instruments, such as the Woodcock-Johnson III and the Peabody Picture Vocabulary Test, that Head Start programs sometimes use and that may be helpful for continuity when children go on to Head Start.

We consulted multiple sources of information to identify instruments for inclusion in this resource document. We reviewed the EHSRE to identify instruments used by the national and local research teams and by the research programs. We held group discussions with Early Head Start program staff at the 2002 Birth to Three Institute to learn about the screening and assessment tools they are currently using in their programs. We consulted with researchers and technical assistance experts. Finally, we conducted a literature review to identify instruments that are widely used and have been developed and/or normed within the past 15 years (that is, after 1987).

The instruments included in this document were developed for a variety of purposes and by individuals from different disciplines. As a result, you may find that some instrument names are overly technical or may seem offensive. In these cases, you may want to present the instruments to parents using a less technical name that describes, in terms parents will understand, what the instrument measures. For example, you might refer to the Parent-Child Conflict Tactics Scale as a questionnaire on discipline and responses to children’s behavior.

The screening and assessment instruments in this resource document are presented in three groups: (1) instruments for measuring child development; (2) instruments for measuring parenting, the home environment, and parent well-being; and (3) instruments for measuring program implementation and quality. Within each group, instruments are in alphabetical order. Summary tables listing the instruments are presented at the beginning of each group of instruments.³ This resource document is intended to be a living document that will be updated as new screening and assessment instruments are identified or become available.

We gathered information about each instrument from different sources, depending on the type of instrument. For the more formal, copyrighted instruments, we relied primarily on the manuals or Web-based information available from the authors or their publishers. If we found a key research article about a formal instrument, we also reviewed it and included the pertinent information. For the more experimental, less formal instruments, we reviewed the instrument itself and the supporting material we were able to locate, such as research reports and published articles, and reviews conducted by others. Each entry includes a reference section that identifies the sources of information we used.

Many of these instruments are grounded in developmental theory and research. Developers of standardized tests for children usually begin with their theory of how abilities develop and identify areas to be assessed. Then they create items to measure the identified areas and try them with children to determine whether the items discriminate among children by age. After a core set of items is identified, test developers often launch a large, nationally representative study to test the items and obtain statistical information about how the study participants performed on each item. From the study findings, the test developers determine the best set of items, develop rules about where to begin and end the test, and decide on procedures for converting raw scores (based on summing the number of items answered correctly or on the average rating across items on a rating scale) to norm-referenced scores. The norm-referenced scores take advantage of the nationally representative study and allow comparisons between how an individual child performed on the test and how children of the same age in the study performed. The nationally representative study also provides information about how the instrument works with diverse and low-income populations.
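The conversion from raw scores to norm-referenced scores described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any publisher's procedure: the NORMS table and its values are hypothetical, and rescaling to a mean of 100 and a standard deviation of 15 is one common convention for standard scores. Actual instruments supply their own conversion tables, which are usually finer-grained than a single mean and standard deviation per age band.

```python
from statistics import mean

# Hypothetical norms: mean and standard deviation of raw scores for each
# age band (in months) in a norming sample. Real instruments publish their
# own conversion tables.
NORMS = {
    (12, 17): (20.0, 5.0),
    (18, 23): (28.0, 6.0),
}


def raw_score_correct(item_results):
    """Raw score as the number of items answered correctly (True = correct)."""
    return sum(item_results)


def raw_score_rating(ratings):
    """Raw score as the average rating across items on a rating scale."""
    return mean(ratings)


def standard_score(raw, age_months, norms=NORMS, scale_mean=100, scale_sd=15):
    """Place a raw score relative to same-age children in the norming sample.

    The raw score is expressed in standard deviation units (a z-score) from the
    age band's mean, then rescaled to mean 100 and SD 15 (an assumed convention).
    """
    for (low, high), (norm_mean, norm_sd) in norms.items():
        if low <= age_months <= high:
            z = (raw - norm_mean) / norm_sd
            return scale_mean + scale_sd * z
    raise ValueError(f"no norms for a child aged {age_months} months")


if __name__ == "__main__":
    raw = raw_score_correct([True] * 25 + [False] * 5)  # 25 items correct
    print(standard_score(raw, 14), standard_score(raw, 20))  # 115.0 92.5
```

With these made-up norms, the same raw score of 25 yields a standard score of 115 for a 14-month-old but 92.5 for a 20-month-old, which is why the comparison is always made with children of the same age.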

Other types of research also provide important information about a screening or assessment instrument. Studies that use a new instrument in conjunction with established instruments that measure the same ability or skill show whether the new instrument measures what it was intended to measure. Other studies examine how well the new instrument predicts children’s performance in a given skill area many years later. Because they take a long time to conduct, these studies are not available for very new instruments, but they can be valuable in evaluating an instrument administered when children are young.

No screening or assessment instrument performs perfectly across all the dimensions practitioners and researchers believe are important (such as the statistical properties of the instrument or how easily the resulting information feeds back into individualized intervention planning) and for all the purposes for which the instrument may be used. We encourage you to weigh the information described for each instrument according to your program’s theory of change, your comprehensive plan for gathering and analyzing data, and the purposes for which you will use the information. Consultation with an expert may help you sort through this information and select screening and assessment instruments.

The language that describes screening and assessment instruments is filled with jargon. Box 4 defines the key terms used in this document.

The rest of this chapter includes a summary of what you will find described for each instrument included in this resource document. Each entry includes a summary table and a more detailed description of the topics we identified as most useful for making comparisons across instruments. The topics in the summary table include:

  • Authors, Publisher, Ordering, and Initial Material Cost Information. This information will allow you to obtain the instruments. Some publishers will provide an inspection copy of the materials for a short period of time at no charge. Some publishers require that only trained psychologists or other assessment professionals purchase and use the materials, because the content of the instruments must be kept confidential and the instruments must be administered and used in accordance with professional guidelines. We list the cost for the initial materials required to use the instruments. For some copyrighted materials, you will be required to purchase a score sheet for each screening or assessment you conduct. You may be able to negotiate with the publisher for a reduced price if you are buying in bulk.

  • Representativeness of Norming Sample. As described in Box 4, knowing whether the norming sample was nationally representative or representative of the children or parents in your program is important in deciding whether to use an instrument. Your screening and assessment plan will include the purpose for each screening and assessment. If you are interested in how the children in your program are performing compared with children nationally, you will want to choose an instrument with a nationally representative norming sample.

    Knowing how children from low-income families in the norming sample performed compared with all children nationally can also be important for interpreting assessment results. For example, the Early Head Start Research and Evaluation study found that children’s standardized scores on the Bayley Mental Development Index decreased between 14 and 24 months of age and remained at the 24-month level at 36 months. This pattern has also been found in other studies of low-income children and in the Bayley norming sample. In this case, the decrease in standardized scores reflects differences in the composition of the test at different ages: at 14 months, the Bayley includes few items directly focused on language development, whereas at 24 and 36 months it includes many items that tap language development. The decrease in standardized scores among low-income children as they get older indicates that low-income children score lower than children nationally as language development becomes a more important part of the test.

  • Languages. We included the languages in which the instruments are available. Some instruments have unofficial translations used in the field, but we restricted our listing to the languages that are available from the authors or publishers. If you are planning to use an instrument to compare the children in your program with those in the instrument’s norming sample, using an unofficial translation or directly translating the instrument into another language will result in scores that may not be comparable to the norming sample scores. According to the strictest standards, such scores are not valid.

  • Type of Instrument. We categorized the child and parent instruments as one of three types: (1) direct child or parent instruments, in which a trained individual works one-on-one with the child or parent to administer the instrument; (2) observation, in which a trained individual observes the child or parent and either rates or scores the behaviors of interest; and (3) parent report or self-report, in which the parent reports about the child or himself or herself. These basic categories apply to most of the other areas we reviewed as well, such as quality of program services. As needed, we used different descriptors to make our meaning as clear as possible.

  • Age Range and Administration Interval. We include the age range for which the instrument is appropriate and, if the authors or publishers give one, the recommended time between administrations. For instruments designed to be administered at regular intervals, we also note the recommended schedule.

  • Personnel, Training, Administration, and Scoring Requirements. We described whether the instrument requires administration by a consultant or expert with clinical training, a highly trained program staff member, or a clerical program staff member. We included an estimate of how much time a person at the level required would need to learn, conduct, and score the instrument. Some of the authors and publishers suggest that trainees have an administration reviewed by an experienced assessor. If so, we also included this requirement. Some of the authors and publishers offer group training on the use of their instruments, and we included that information and the cost of the training, if it is available.

  • Summary. We chose five key features of the instruments to include in the summary table. Each feature has descriptors numbered from 1 to 3. For reliability, validity, and norming sample characteristics, a descriptor of 1 indicates a lack of information or lower-level performance on the feature, a descriptor of 3 indicates a higher level of performance, and 2 is intermediate; the descriptors for initial material cost and for ease of administration and scoring are defined individually below. We include this summary section to help you compare the features of the instruments, but this information should not be taken as a recommendation of one instrument over another. Only you and your staff can decide which features are most important to you. The purposes of your screening and assessment must guide your choices about which instruments to use. The features we include in the summary section are:

- Initial material cost: 1 (under $100), 2 ($100 to $200), 3 (more than $200).

- Reliability: 1 (none described); 2 (all or mostly under .65); 3 (all or mostly .65 or higher). See Box 4 for a brief definition of the various types of reliability. We chose these groupings based on the prevalent rule of thumb researchers and assessment developers use. Other things being equal, the higher the reliability is, the better the instrument is.

- Validity: 1 (none described); 2 (all or mostly under .5 for concurrent; all or mostly under .4 for predictive); 3 (all or mostly .5 or higher for concurrent; all or mostly .4 or higher for predictive). See Box 4 for a brief definition of the various types of validity. We chose these groupings based on the prevalent rules of thumb researchers and instrument developers use. Generally, the higher the validity is, the better. It is especially challenging to create instruments for infants and toddlers that strongly predict how the children will do as preschoolers. Therefore, the grouping for predictive validity reflects a less stringent criterion for the highest grouping.

- Norming sample characteristics: 1 (none described); 2 (normed more than 15 years ago, or not nationally representative or representative of the low-income population enrolled by Head Start programs serving infants and toddlers); 3 (normed within the past 15 years and nationally representative or representative of the low-income population enrolled by Head Start programs serving infants and toddlers). See Box 4 for a brief definition of representativeness of the norming sample. This section also includes information on the date that the norming sample was obtained. The more time that has elapsed since the norming sample was obtained, the less likely it is to be representative. Many authors and publishers re-norm their assessments every 10 to 12 years to keep them up to date. We chose 15 years as the cutoff here.

- Ease of administration and scoring: 1 (not described); 2 (self-administered or administered and scored by someone with basic clerical skills); 3 (administered and scored by a highly trained individual). The administration and scoring requirements for each instrument vary and these descriptors help you determine what is involved for these steps.
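The descriptor rules above can be expressed as small functions, which may help a program apply them consistently when reviewing its own candidate instruments. This is a hypothetical sketch: the function names are illustrative, and reading "all or mostly" as "more than half of the reported coefficients" is an assumption, since the text does not define "mostly" precisely. Ease of administration and scoring is left out because it is a direct lookup of who administers the instrument.

```python
def cost_descriptor(cost_dollars):
    """Initial material cost: 1 (under $100), 2 ($100 to $200), 3 (more than $200)."""
    if cost_dollars < 100:
        return 1
    if cost_dollars <= 200:
        return 2
    return 3


def mostly(flags):
    """Assumed reading of 'all or mostly': more than half of the flags are True."""
    flags = list(flags)
    return sum(flags) > len(flags) / 2


def reliability_descriptor(coefficients):
    """1 (none described); 2 (all or mostly under .65); 3 (all or mostly .65 or higher)."""
    if not coefficients:
        return 1
    return 3 if mostly(r >= 0.65 for r in coefficients) else 2


def validity_descriptor(concurrent, predictive):
    """Concurrent coefficients are judged against .5, predictive ones against .4."""
    if not concurrent and not predictive:
        return 1
    flags = [r >= 0.5 for r in concurrent] + [r >= 0.4 for r in predictive]
    return 3 if mostly(flags) else 2


def norming_descriptor(norm_year, representative, current_year, max_age=15):
    """1 (none described); 2 (too old or not representative); 3 (recent and representative)."""
    if norm_year is None:
        return 1
    recent = current_year - norm_year <= max_age
    return 3 if recent and representative else 2
```

Keeping the thresholds inside the functions mirrors the groupings in the list; a program that prefers the stricter .70 reliability criterion mentioned in Box 4 could change the single number.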

The other topics included for each instrument are:

  • Description. This section provides an overview of what the instrument was designed to measure, the age range of individuals it may be used with, the number of items, how it is administered, and what types of information can be derived (including any scores and subscale scores).

  • Uses of Information. To help you match your intended purposes for an instrument with the results, we included a summary of how the information that comes from an instrument may be used. Some of the instruments are clearly designed for screening children, some for in-depth assessment, some for allowing comparisons to a national norming sample, some for parent education, and some for feeding back into individual intervention planning and continuous program improvement.

  • Reliability. Indicators of an instrument’s reliability help determine whether an instrument is dependable. For example, a dependable instrument is stable: the results would be similar if the instrument were administered to the same individual several times in a short period. Box 4 summarizes key information about what to look for in reports of an instrument’s reliability. The types of reliability summarized in the resource document entries include:

- Measures of internal consistency (split-half reliability, internal consistency reliability) that indicate the extent to which the items in the instrument “hang together” and tell a coherent story about the child’s or adult’s functioning

- Measures of stability (test-retest reliability, alternate form reliability) that indicate the extent to which the instrument yields the same results when used at different times or using a different form of the instrument (for those that have multiple forms)

- Measures of the reliability of administration (inter-rater reliability) that indicate the extent to which two different observers or instrument administrators would interpret and record the information in the same way

  • Validity. Indicators of an instrument’s validity help determine whether the instrument really measures what it is supposed to measure for the purpose for which it is being used. For example, if an instrument is supposed to provide an estimate of a toddler’s language production, how the child performs on the instrument should be similar to how the child performs on another established instrument of language production. We summarize key information about what to look for in reports of an assessment’s validity in Box 4. The types of validity summarized in the resource document entries include:

- Content validity, which relies on expert judgment to determine that an instrument actually measures what it is intended to measure

- Criterion-related validity, including concurrent validity, which indicates how well the instrument results relate to other information collected at the same time, and predictive validity, which indicates the extent to which the instrument results are related to later functioning

  • Method of Scoring. Child screening and assessment instruments may be scored using a simple pass/fail point system, or they may use a broader range of response categories, such as whether the child usually exhibits a particular behavior, is just starting to show the behavior, or does not yet display the behavior. In this section, we summarize the response categories used in the instrument and the types of scores it is possible to compute.
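As a hypothetical illustration of the two scoring approaches described above, the sketch below totals a pass/fail item set and a three-category item set. The point values assigned to "usually," "just starting," and "not yet" are assumptions for illustration only; each instrument's manual defines its own response categories and how they are scored.

```python
# Hypothetical point values for a three-category response scale.
CATEGORY_POINTS = {"usually": 2, "just starting": 1, "not yet": 0}


def pass_fail_score(results):
    """Count the items the child passed on a simple pass/fail instrument."""
    return sum(1 for result in results if result == "pass")


def category_score(responses, points=CATEGORY_POINTS):
    """Sum the points for each observed response category."""
    return sum(points[response] for response in responses)


def percent_of_maximum(responses, points=CATEGORY_POINTS):
    """Express a category score as a percentage of the highest possible score."""
    maximum = max(points.values()) * len(responses)
    return 100 * category_score(responses, points) / maximum


if __name__ == "__main__":
    observed = ["usually", "just starting", "not yet", "usually"]
    print(pass_fail_score(["pass", "fail", "pass"]))  # 2
    print(category_score(observed), percent_of_maximum(observed))  # 5 62.5
```

The three-category version credits emerging skills that a pass/fail scale would record as failures, which is one reason instruments differ in the scores it is possible to compute.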

  • Interpretability. Many instrument authors and publishers provide guidance on what a score or range of scores means, such as whether the child is functioning at the level expected for his or her age or whether additional information may be needed. These guidelines are helpful in making sense of the results. In this section, we summarize what is available to help you interpret the information that comes from each instrument.

  • Training Support. In this section, we summarize the training in the use of the instrument that the authors and publishers recommend, and we describe available training materials, products, or sessions. Some authors and publishers include a lot of information about how to prepare to administer their instruments, while others provide little. Some provide training videotapes or exercises as part of the purchase of the instrument. This summary is meant to help you identify who needs to administer the instrument and what resources are available for training them.

  • Adaptations/Special Instructions for Individuals with Disabilities. Some instruments are designed specifically to assess the abilities or performance of individuals with disabilities, but most are not. In this section, we describe adaptations or instructions the authors or publishers included for working with people with disabilities.

  • Report Preparation Support. Some instruments include summary sheets or software to help you prepare individual reports based on the results. These reports may be designed to help you customize the program for a given child or parent or to help you share information with parents. Some instruments also include recommendations on how to present reports to parents.

  • References. In this section, we give the full citations for the instruments, manuals, and other sources of information we used to complete each entry. We also include citations for any other materials the authors/publishers make available about the instrument, such as training videotapes and computer scoring programs.

The entries are organized alphabetically in three groups: (1) measures of child development; (2) measures of parenting, the home environment, and family well-being; and (3) measures of program implementation and quality. In front of each group of entries is a summary table that lists the instruments profiled in that section and summarizes their main features.


3 THE INCLUSION OF AN INSTRUMENT IN THIS RESOURCE DOCUMENT DOES NOT CONSTITUTE ENDORSEMENT OF THE INSTRUMENT BY THE AUTHORS, MATHEMATICA POLICY RESEARCH, OR THE U.S. GOVERNMENT.

 

Box 4: BRIEF DEFINITIONS OF KEY TERMS

Assessment. Assessment is a generic term referring to a variety of procedures for obtaining systematic information on a child’s, parent’s, family’s, or program’s strengths or needs. As noted in Chapter I, the Head Start Program Performance Standards focus on the child and family assessment purposes of identifying “(i) the child’s unique strengths and needs and the services appropriate to meet those needs; and (ii) the resources, priorities, and concerns of the family and the supports and services necessary to enhance the family’s capacity to meet the developmental needs of their child.” These two major purposes of assessment are sometimes described as providing information for individual diagnosis and program planning. The purposes of a diagnostic assessment are to (1) identify whether an individual has special needs, (2) determine what the problems are, (3) suggest the cause of the problems, and/or (4) propose strategies to address the problems (Meisels and Provence 1992). The purposes of an assessment for program planning are to (1) learn about an individual’s ability to perform particular tasks or achieve mastery of particular skills, and (2) design intervention activities for the individual that support the completion of tasks and mastery of skills over time. Depending on the purpose of the assessment process, it may include norm-referenced tests; observations in the home, child care, early intervention, program, or school setting; interviews with family members, child care providers, or others who may provide important information about the individual; and ratings by adults knowledgeable about the child (including a parent, caregiver, or teacher) (Sattler 1992). The performance standards also require programs to conduct an “assessment of community strengths, needs, and resources,” as well as an annual program self-assessment of “effectiveness and progress in meeting program goals and objectives and in implementing federal regulations.”

Screening. Screening is made up of a set of activities designed to identify individuals who have a high probability of exhibiting delayed, abnormal, or problematic development. The screening is intended to identify problems at an early stage and to use this information to flag individuals for further, in-depth assessment activities.

Basal. A basal is established on a standardized test when the individual successfully completes the first few items administered. On most standardized tests, the tester begins administering items at a point based on the individual’s age, starting later in the test for older individuals. If the individual passes the number of items specified in the test manual for establishing a basal, the tester can assume that the individual would have passed all of the earlier items and credits those untested items along with the items actually passed. If the individual does not pass the specified number of items, the tester administers earlier items until the prescribed number of items is passed or the tester reaches the start of the test. Using a basal rule saves time during the testing session and reduces fatigue.

Ceiling. A ceiling is established on a standardized test when the individual fails a few of the later items administered. On most standardized tests, the tester continues administering items until a specified number are failed, either in a row or as a proportion (such as six out of eight consecutive items). Once the individual fails the number of items specified in the test manual for establishing a ceiling, the tester ends the test and can assume that the individual would fail all later items as well. This saves time during the testing session and reduces fatigue.
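The basal and ceiling rules can be simulated to show how they determine which items are administered and how untested items are credited. This is a simplified sketch under stated assumptions: the child's responses are supplied in advance as a list standing in for actual performance, both rules are expressed as runs of consecutive passes or failures (proportion-based ceiling rules are not modeled), and the function name is hypothetical. Each test manual specifies its own start points and rule lengths.

```python
def administer_with_basal_ceiling(responses, start, basal_run, ceiling_run):
    """Simulate basal/ceiling administration of a standardized test.

    responses[i] is True if the child would pass item i (index 0 is the first
    item). Returns (raw_score, number_of_items_administered).
    """
    administered = {}

    # Forward phase: begin at the age-based start point and stop once the child
    # fails ceiling_run consecutive items (the ceiling) or the test runs out.
    failures_in_a_row = 0
    item = start
    while item < len(responses) and failures_in_a_row < ceiling_run:
        administered[item] = responses[item]
        failures_in_a_row = 0 if responses[item] else failures_in_a_row + 1
        item += 1

    # Basal phase: if the basal_run items beginning at the lowest item given
    # were not all passed, step back one item at a time until they are, or
    # until the first item of the test has been given.
    basal = start
    while basal > 0 and not all(
        administered.get(i, False) for i in range(basal, basal + basal_run)
    ):
        basal -= 1
        administered[basal] = responses[basal]

    # Items below the basal are credited as passed without being administered.
    raw_score = basal + sum(administered.values())
    return raw_score, len(administered)


if __name__ == "__main__":
    child = [True] * 10 + [False] * 5
    print(administer_with_basal_ceiling(child, start=4, basal_run=3, ceiling_run=3))  # (10, 9)
```

In the example, the child passes the six items from the start point onward and then fails three in a row, so testing stops after nine items; the four items below the start point are credited without being administered, giving the same raw score of 10 that giving all 15 items would have produced.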

Criterion-Referenced Test. This type of test compares an individual’s performance to an established measure of performance rather than to the performance of others. Criterion-referenced tests will usually include a measure of mastery, or how well a child is able to complete a task. For example, if a test required that a child identify all of the letters of the alphabet, that would be a criterion-referenced test. We would be able to describe the child’s mastery of the test by using statements such as, “The child is able to identify 80 percent of the letters in the alphabet.”

Norm-Referenced Test. This type of test compares an individual’s performance to the performance of others on the same measure. Usually, the norms are developed from data collected from a large, nationally representative group of individuals.
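To contrast the two kinds of tests, the sketch below expresses one child's letter-identification result both ways: as percent mastery of a fixed criterion (all 26 letters) and as a percentile rank within a norming sample. The child's score and the norming scores are made up for illustration, and percentile rank is computed here as the percentage of the sample scoring below the child, which is one of several conventions in use.

```python
def percent_mastery(items_correct, items_in_criterion):
    """Criterion-referenced: share of the defined task the child has mastered."""
    return 100 * items_correct / items_in_criterion


def percentile_rank(score, norm_scores):
    """Norm-referenced: percentage of the norming sample scoring below this score."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)


if __name__ == "__main__":
    letters_identified = 21
    hypothetical_norm_sample = [10, 15, 18, 21, 24]  # made-up same-age scores
    print(round(percent_mastery(letters_identified, 26), 1))  # 80.8
    print(percentile_rank(letters_identified, hypothetical_norm_sample))  # 60.0
```

The mastery figure does not change with the comparison group, while the percentile rank depends entirely on who is in the norming sample.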

Reliability. Indicators of reliability tell how dependable an assessment or screening tool is for the purpose for which it is used. Reliable tools are stable over time and include items that measure the same thing in different ways. For tools that require standardized observation (for example, child care quality observations or ratings of children’s behavior), the scores obtained by two different, well-trained observers must be similar for the tool to be considered reliable. Statistical measures of reliability are typically reported as correlation coefficients or similar indices; reliability values generally range from 0 to 1.0, with a higher value reflecting greater reliability. Many researchers and test developers require that assessment and screening tools have reliability values of 0.7 or higher. For our summary descriptors, we adopted a criterion of 0.65, which reflects a rule of thumb commonly used in the field. Typical indicators of reliability include measures of consistency of results and stability over time:

  • Internal consistency. If the individual items in an instrument measure the same thing (for example, they all assess motor ability or language development), the instrument is considered to be internally consistent. One measure of internal consistency is split-half reliability. To demonstrate this, test developers and researchers test a group of individuals, then split the test items in half, usually by grouping the odd- and even-numbered items. If the scores on the two halves are highly correlated with each other, the split-half reliability is considered to be acceptable. Another measure of internal consistency reliability is based on the correlations among all of the individual test items. This index of internal consistency is called Cronbach’s alpha (named after the researcher who developed the statistical formula).

  • Stability. By this measure, an assessment is reliable to the extent that the procedure yields the same result on two different occasions. Test-retest reliability involves testing the same group of individuals at least twice, with a relatively short interval between assessments, usually no more than a few days or weeks. The higher the test-retest reliability, the more stable the assessment tool is considered to be. Longer periods between administrations of the same assessment will reduce the reliability, partly because the individual’s situation (for example, skill) can be expected to change. Some assessment tools have two versions of the same test so that the same skills or behaviors can be assessed a second or third time (as in a pre-post or longitudinal study). In such cases, test developers include information on alternate form reliability. To demonstrate that both forms of the test are essentially equivalent, a large group of individuals takes both forms, typically with a random half taking one form first and the other half taking the other form first, so that the order of administration does not bias the results. Alternate form reliability is demonstrated if individuals’ scores on the two forms are highly correlated.

  • Reliability of administration. Another reliability consideration applies to assessment tools that require an observer to score a child’s or parent’s behavior or complete a rating or checklist describing the behavior observed. To use such assessments in evaluation, researchers and test developers want to be sure that these ratings can be made consistently. One index of consistency is the extent to which two trained observers obtain the same scores when they do their observations at the same time, although independently. This index is referred to as inter-rater reliability. It is usually reported either as the correlation between the scores or ratings obtained by the two observers or as the percentage of items on which the two agree.

Representativeness of Norming Sample. Standardized screening and assessment tools provide information about how the children and parents in your program are doing compared to the group (or sample) of individuals the test developers or researchers included in their norming group. Knowing whether the norming sample was nationally representative or representative of the children or parents in your program is important in deciding whether to use a screening or assessment tool. Most test authors include this information in their manuals. In general, it is better if the norming sample includes individuals of the same age group that you will be assessing, as well as geographic and racial/ethnic diversity, so that the assessment results will be relevant to the families in your program.

Validity. Indicators of a screening or assessment tool’s validity provide information about whether the tool measures what it is supposed to measure for the purpose for which it is being used. Several types of validity are commonly used:

  • Content validity. This indicator of validity provides information about whether the screening or assessment tool includes items that are a good representation of the area the tool is supposed to measure. There are no statistics associated with content validity. Instead, it is based on professional judgment from reviews of the items to verify that what they are measuring represents the domain of development that the developer intended them to measure and that they provide variety and a range of difficulty. A good manual will include a description of the procedures followed in ensuring that the content is appropriate and representative.

  • Criterion-related validity. Criterion-related validity indicates how well performance on the screening or assessment tool compares with a criterion, or an independent measure of what the assessment is designed to predict. The criterion measure can be obtained at about the same time or after some interval:

- To establish concurrent validity, test developers and researchers administer the new screening or assessment tool and a similar, established tool to the same individuals within a few hours or days of each other. If the correlation between the two measures is high, concurrent validity is established. Strict interpretations require concurrent validity coefficients of .70 or higher, but as a rule of thumb, many researchers consider .50 or higher acceptable. Sometimes concurrent validity is expressed as the percent agreement between the two measures. In this compendium, we consider 80 percent agreement or higher acceptable.

- To establish predictive validity, researchers and test developers determine whether scores on the screening or assessment tool, obtained from a group of individuals at one time point, are correlated with their later functioning (these studies often span two to five years or more). If the correlation between the two measures obtained across the time interval is high, predictive validity is established. For example, if a measure of vocabulary at age 3 is highly correlated with a test of reading ability in second grade, the vocabulary test could be said to have predictive validity. In some cases, researchers use other activities or events as the criterion rather than another assessment; for example, predictive validity might be established by correlating age 3 vocabulary with children’s second-grade language report card grades. In general, the younger the child being assessed, the poorer the predictive validity. There is a long history of poor predictive validity among infant tests, with almost none reaching high levels of validity, such as .80. Researchers have advanced many explanations for this, including the important contributions of the different environments to which children are exposed. Because the predictive validity of infant and toddler assessment tools is known to be low, in this compendium we consider a correlation of .40 adequate for establishing predictive validity.
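The criterion-related validity cutoffs used in this compendium (.70 strict and .50 rule of thumb for concurrent correlations, 80 percent for concurrent agreement, .40 for predictive correlations) can be written as simple checks. The Python below is a hypothetical sketch. The constant and function names are our own, and the "refer"/"pass" screening results for ten children are invented to show how percent agreement between a new and an established tool is computed.

```python
# Cutoffs stated in this compendium for criterion-related validity
STRICT_CONCURRENT_R = 0.70         # strict interpretation
RULE_OF_THUMB_CONCURRENT_R = 0.50  # accepted by many researchers
CONCURRENT_AGREEMENT_PCT = 80.0    # percent agreement between two measures
PREDICTIVE_R = 0.40                # adequate for infant/toddler tools


def rate_concurrent_r(r):
    """Classify a concurrent-validity correlation against the cutoffs above."""
    if r >= STRICT_CONCURRENT_R:
        return "strict"
    if r >= RULE_OF_THUMB_CONCURRENT_R:
        return "rule of thumb"
    return "below"


def classification_agreement(new_tool, established_tool):
    """Percent of children classified identically by the two tools."""
    if len(new_tool) != len(established_tool) or not new_tool:
        raise ValueError("need two non-empty, equal-length lists")
    same = sum(1 for a, b in zip(new_tool, established_tool) if a == b)
    return 100.0 * same / len(new_tool)


def predictive_adequate(r):
    """True if a predictive-validity correlation meets the .40 cutoff."""
    return r >= PREDICTIVE_R


# Hypothetical results for ten children screened with both tools
new = ["refer", "pass", "pass", "pass", "refer", "pass", "pass", "pass", "pass", "refer"]
established = ["refer", "pass", "pass", "refer", "refer", "pass", "pass", "pass", "pass", "pass"]
pct = classification_agreement(new, established)
print(pct, pct >= CONCURRENT_AGREEMENT_PCT)  # 80.0 True
print(rate_concurrent_r(0.62))               # rule of thumb
print(predictive_adequate(0.45))             # True
```

Keeping the cutoffs as named constants makes it easy to see, and to change, which standard is being applied when a program reviews the validity evidence in a manual.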

Scoring. On their own, the raw scores from screening and assessment instruments have limited value. A child’s score becomes meaningful only when it is compared against a similar group (or norming sample) of children with known characteristics. For this reason, instrument developers often provide tables for converting raw scores into scores normed to a comparison sample. Below are some of the more frequently used normative scores:

  • Percentile rank. The percentile rank indicates where a score ranks, on a scale of 0 to 100, relative to the other scores in the norming sample. A child whose score is at the 65th percentile has scored higher than 65 percent of the children in the norming sample. However, percentile ranks are not easily compared with one another, because the raw-score difference between two percentiles varies with where they fall in the distribution: raw-score differences between percentiles at the extreme ends of the distribution are larger than those in the middle.

  • Stanine score. Like percentile ranks, stanine scores describe children’s performance relative to children in the norming sample, but they can be compared with one another more readily. Stanines divide the normal curve into nine intervals, with the lowest scores falling into the first stanine, the highest scores falling into the ninth stanine, and the fifth stanine straddling the midpoint of the distribution. Except for the two extreme stanines (the first and the ninth), each stanine spans one-half of a standard deviation, so equal differences between two pairs of stanines represent equal differences in performance. A disadvantage of stanine scores is that they magnify small differences between raw scores that fall on either side of the boundary between adjacent stanines.

  • Standardized score. Standardized scores express the difference between a raw score and the mean score in standard deviation units. Standard scores have the properties of the normal curve and preserve the relative differences between raw scores. Thus, the difference in performance between standard scores of 85 and 90 is the same as the difference between standard scores of 55 and 60. Three types of standard scores are often used: T-scores, quotients, and normal curve equivalents (NCEs). T-scores have a mean of 50 and a standard deviation of 10, quotients have a mean of 100 and a standard deviation of 15, and NCEs have a mean of 50 and a standard deviation of 21.06. Most tests of cognitive abilities have a mean of 100 and a standard deviation of 15. For tests on this scale, we consider scores within two standard deviations (30 points) of the mean, from 70 to 130, to be in the “normal” range.

  • Age-equivalent scores. An age-equivalent score is the age at which children in the norming sample obtained, on average, a given raw score. The age-equivalent score corresponding to a child’s raw score therefore describes the child’s performance in terms of the age at which that level of performance would be expected, based on the performance of children in the norming sample.

  • Sensitivity is a measure of an instrument’s ability to correctly identify persons who have the disorder as having it, that is, the proportion of persons with the disorder whom the instrument correctly identifies.

  • Specificity is a measure of an instrument’s ability to correctly identify persons who do not have the disorder as not having it, that is, the proportion of persons without the disorder whom the instrument correctly identifies as not having it.
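The score types and screening indexes listed above can be illustrated with a short Python sketch. It assumes that scores in the norming sample are normally distributed and that a raw score is first converted to standard deviation units (a z-score) using a hypothetical norming mean of 40 and standard deviation of 8. Real instruments use their own published conversion tables, which should take precedence. The stanine formula, which uses half-standard-deviation bands with open-ended first and ninth stanines, and all function names are our own illustrations. The screening counts are invented.

```python
import math
from statistics import NormalDist


def percentile_rank(raw, norming_scores):
    """Percent of the norming sample scoring strictly below `raw` (0 to 100)."""
    below = sum(1 for s in norming_scores if s < raw)
    return 100.0 * below / len(norming_scores)


def z_score(raw, norm_mean, norm_sd):
    """Distance of a raw score from the norming-sample mean, in SD units."""
    return (raw - norm_mean) / norm_sd


def t_score(z):
    return 50 + 10 * z     # mean 50, SD 10


def quotient(z):
    return 100 + 15 * z    # mean 100, SD 15


def nce(z):
    return 50 + 21.06 * z  # normal curve equivalent: mean 50, SD 21.06


def stanine(z):
    """Half-SD bands: stanine 5 spans z -0.25 to +0.25; 1 and 9 are open-ended."""
    return min(9, max(1, math.floor(2 * z + 5.5)))


def sensitivity(true_pos, false_neg):
    """Share of children WITH the disorder whom the instrument identifies."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of children WITHOUT the disorder whom the instrument does not flag."""
    return true_neg / (true_neg + false_pos)


print(percentile_rank(35, [10, 20, 30, 40, 50]))  # 60.0

# Hypothetical norming sample: mean 40, SD 8; a child's raw score of 52
z = z_score(52, norm_mean=40, norm_sd=8)
print(t_score(z), quotient(z), round(nce(z), 2), stanine(z))  # 65.0 122.5 81.59 8

# Equal raw-score steps (half an SD) cover very different percentile spans
normal = NormalDist()  # standard normal curve
for lo, hi in [(0.0, 0.5), (2.0, 2.5)]:
    gain = 100 * (normal.cdf(hi) - normal.cdf(lo))
    print(f"z {lo} to {hi}: +{gain:.1f} percentile points")  # +19.1, then +1.7

# Hypothetical screening results: 18 of 20 delayed children flagged,
# 72 of 80 typically developing children not flagged
print(sensitivity(18, 2), specificity(72, 8))  # 0.9 0.9
```

The loop over `NormalDist().cdf` shows why percentile ranks are hard to compare. A half-standard-deviation gain near the mean moves a child about 19 percentile points, but the same gain far above the mean moves a child less than 2 points.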

1 This discussion is important for interpreting scores from standardized instruments. Scores from other instruments can also be interpreted meaningfully if you compare the performance of children or parents at two points in time (such as at the beginning and end of their program experience).

2 A standard deviation is a measure of the dispersion, or variability, of scores in a sample. For a normal distribution, the proportion of scores falling within a given number of standard deviations of the mean is known. For example, 68 percent of all scores fall between one standard deviation below and one standard deviation above the mean. Thus, scores expressed in standard deviation units enable the user to understand how a child has performed relative to other children in the sample.
