Administration for Children and Families  
Office of Planning, Research & Evaluation (OPRE)

Technical Report #5

B-Spline Modeling Techniques

Prepared by Charles Katholi, Ph.D.

Figure 12.2 in Chapter 12 shows the growth patterns of the four Woodcock-Johnson subscales used in the study over an age range that includes the ages of the study participants. Each curve is monotone increasing and could plausibly be approximated over the range of interest by a low-order polynomial, but the required degree would not be the same in all cases. For example, the curve for the Letter-Word Identification subscale would require at least a cubic polynomial because of the apparent inflection points at about 6 and 9 years of age, whereas the curve for Applied Problems could be well approximated by a quadratic. For the other two subscales, no reference scores are available before age 4; both curves are flat until ages 5.5 and 5.0, respectively, and then increase rapidly. Since some of the children in the study are in this age group, a polynomial model based on a single functional form will not adequately describe these data.

Spline models are made up of polynomial pieces joined together at selected points called break points. The pieces are joined in such a way as to guarantee continuity of the model across the break points. Such models are well suited to the flat-then-rising behavior noted above for the two subscales. The polynomial pieces can be of any order, but for this analysis we have chosen polynomials of order 2 (degree 1), so that the model consists of a sequence of straight line segments. Such models are easily represented mathematically as a linear combination of special spline functions called B-Splines. Thus, in its simplest form, the model to be used has the form

$$y(t) = \sum_{i} \beta_i \, B_i(t;\boldsymbol{\xi}) + \varepsilon$$

where the $\beta_i$ are parameters to be estimated, the $B_i(t;\boldsymbol{\xi})$ are the B-Splines, which depend both on $t$ and on a set of break points denoted in the equation by the vector parameter $\boldsymbol{\xi}$, and $\varepsilon$ is a random error. It is well known that the exact placement of the break points is not critical, and so we have chosen values corresponding to the approximate median ages at each grade where testing was carried out. Thus we have taken interior break points at ages 6, 7, 8 and 9. In addition to these break points, the spline models require the selection of two additional points, one at each end of the range of values of the independent variable. Again, the placement of these values is not critical; they need only be well outside the range of the data.

Finally, in specifying the model, it is necessary to specify two end conditions. For the data in this analysis, the left-hand break point is taken at age 2, and the value of the spline there is set equal to the reference W score for that age. At the right end of the curve several options are available. One is again to set the value of the spline equal to the reference W score at the right break point. The alternative, and the one we used in our analysis, is to require that the slope of the curve be constant across the break point at age 9. Experiments with the two methods showed no difference in the results of the analysis. In addition, the lack of sensitivity of the analysis to the position of the interior break points was confirmed by simulation.

For the model we have used, the B-Splines are the "hat" functions defined as follows. Let the break point set be $\boldsymbol{\xi} = (\xi_0, \xi_1, \ldots, \xi_{k+1})$, where $\xi_0 < \xi_1 < \cdots < \xi_{k+1}$. Then,

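$$B_i(t) = \begin{cases} \dfrac{t - \xi_{i-1}}{\xi_i - \xi_{i-1}}, & \xi_{i-1} \le t \le \xi_i, \\[1ex] \dfrac{\xi_{i+1} - t}{\xi_{i+1} - \xi_i}, & \xi_i \le t \le \xi_{i+1}, \\[1ex] 0, & \text{otherwise.} \end{cases}$$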
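The hat functions can be evaluated directly from their piecewise definition. The following is a minimal Python sketch, not the report's code (the analysis itself was run in SAS). The names `hat` and `KNOTS` are illustrative, and the right-hand end point 12 is an assumption: the report gives the left end point (2) and the interior break points (6, 7, 8, 9) but does not state the right end point.

```python
import numpy as np

# Break points: 2 is the left end point from the report, 6-9 are the interior
# break points, and 12 is a HYPOTHETICAL right end point chosen for illustration.
KNOTS = np.array([2.0, 6.0, 7.0, 8.0, 9.0, 12.0])


def hat(t, knots, i):
    """Evaluate the piecewise-linear B-spline ("hat") centered on knots[i]."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    out = np.zeros_like(t)

    # Rising side: goes from 0 at knots[i-1] up to 1 at knots[i].
    if i > 0:
        left, mid = knots[i - 1], knots[i]
        rising = (t >= left) & (t <= mid)
        out[rising] = (t[rising] - left) / (mid - left)

    # Falling side: goes from 1 at knots[i] down to 0 at knots[i+1].
    if i < len(knots) - 1:
        mid, right = knots[i], knots[i + 1]
        falling = (t >= mid) & (t <= right)
        out[falling] = (right - t[falling]) / (right - mid)

    return out


# A child aged 6.5 sits halfway between the break points at 6 and 7,
# so the two neighboring hat functions each take the value 0.5.
weights_at_6_5 = [hat(6.5, KNOTS, i)[0] for i in range(len(KNOTS))]
```

At any age between the two end points, at most two hat functions are nonzero and their values sum to one, so the fitted curve at that age is a weighted average of the two neighboring coefficients.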

These functions have the desirable property that

$$B_i(\xi_j) = \begin{cases} 1, & i = j, \\ 0, & i \ne j, \end{cases}$$

so that the coefficients are easily interpreted; that is, $E[y(\xi_i)] = \beta_i$. For our purposes this means that the coefficients represent the population marginal mean score at each break point. The parameters themselves are actually linear combinations of effect coding variables, and so we can fit models to test for differences among sites, treatment conditions, gender and so forth. Because the subjects in the study are of many different ages at each examination point, and because the curves indicate that age is a factor in the expected performance of the subject, the spline model brings the information for each subject into the estimation process through the functions $B_i(t)$, and so each subject's observations affect the population means in an appropriate way (see the two figures that follow).
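The interpretation of the coefficients as values of the curve at the break points can be checked numerically. The sketch below is illustrative only: it builds the hat-function design matrix with `numpy.interp`, fits it by ordinary least squares, and ignores the end conditions, the effect-coded covariates and the within-subject correlation used in the actual analysis. The right end point 12 and the values in `true_means` are made up for the example and are not W scores from the report.

```python
import numpy as np

# 2 and 6-9 come from the report; the right end point 12 is HYPOTHETICAL.
KNOTS = np.array([2.0, 6.0, 7.0, 8.0, 9.0, 12.0])


def design_matrix(ages, knots):
    """Return the matrix whose (n, i) entry is the i-th hat function at ages[n]."""
    ages = np.asarray(ages, dtype=float)
    eye = np.eye(len(knots))
    # Linearly interpolating the i-th unit vector over the knots gives a
    # function that is 1 at knots[i], 0 at every other knot, and linear
    # in between, which is exactly the i-th hat function.
    return np.column_stack(
        [np.interp(ages, knots, eye[i]) for i in range(len(knots))]
    )


# Made-up mean scores at the six break points (NOT values from the report).
true_means = np.array([380.0, 420.0, 450.0, 470.0, 485.0, 510.0])

# Children are observed at many ages, not only at the break points.
ages = np.linspace(4.0, 10.0, 61)
X = design_matrix(ages, KNOTS)
y = X @ true_means  # noise-free scores on the piecewise-linear curve

# Least squares recovers the curve's value at each break point.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Each row of `X` has at most two nonzero entries, so an observation at age 6.5, for example, informs only the coefficients at ages 6 and 7. This is the sense in which observations taken at ages between break points still contribute to the estimated means at the break points.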

Reference W Scores for the Four Woodcock-Johnson Subscales Used in the Study


Detailed View of Reference W Scores for the Woodcock-Johnson Passage Comprehension and Calculation Subscales for Ages 4 Through 6


Finally, the data for analysis are longitudinal, and so we cannot assume that the observations within a subject over time are statistically independent. As a result we must assume a model for the error structure in the data which reflects this possible dependence. The sequence of observations on each subject is in the nature of a time series, and so a reasonable model for the serial dependence within a subject is

$$\varepsilon(t_j) = \rho^{\,t_j - t_{j-1}} \, \varepsilon(t_{j-1}) + u(t_j)$$

where $-1 < \rho < 1$, $u \sim \text{Normal}(0, \sigma^2)$, and the $t_j$ are the time points at which data are available. This model for the error structure allows for the fact that the time points for each subject are not equally spaced throughout the duration of the study. It is further assumed that observations from different subjects are independent. The parameters $\rho$ and $\sigma^2$ are two nonlinear parameters which must be estimated as part of the estimation process. Estimates of the parameters in the models were calculated and all hypothesis tests were performed using the SAS PROC MIXED procedure.
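One common way to express serial dependence with unequally spaced time points is a power correlation, in which the correlation between two observations on the same subject decays as a parameter raised to the length of the time gap between them. The Python sketch below assumes that structure; PROC MIXED offers a covariance type of this form, TYPE=SP(POW), but the report does not say which option was used. The names `power_covariance`, `rho` and `sigma2` and the example ages are illustrative, and the code restricts `rho` to the interval (0, 1) because a negative base cannot be raised to a fractional gap.

```python
import numpy as np


def power_covariance(times, rho, sigma2):
    """Within-subject covariance sigma2 * rho**|t_i - t_j| for unequal spacing."""
    times = np.asarray(times, dtype=float)
    if not 0.0 < rho < 1.0:
        # Assumption of this sketch: fractional gaps require a positive base.
        raise ValueError("rho must lie strictly between 0 and 1")
    gaps = np.abs(times[:, None] - times[None, :])  # pairwise time gaps
    return sigma2 * rho ** gaps


# Hypothetical ages (in years) at which one child was tested; the gaps differ.
test_ages = [5.2, 6.1, 7.3, 8.0]
V = power_covariance(test_ages, rho=0.6, sigma2=4.0)

# Observations from different subjects are taken to be independent, so the
# covariance matrix for the whole sample is block diagonal, one block per child.
```

Observations that are closer together in time receive a larger covariance than those farther apart, without requiring the testing occasions to fall on a regular grid.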



 

 
