Department of Health and Human Services
Administration for Children and Families
Office of Planning, Research & Evaluation (OPRE)

CHAPTER VII

EVALUATING INTERVENTIONS TO STRENGTHEN FAMILIES

Earlier chapters of this report described interventions aimed at increasing the likelihood of unwed parents entering a healthy, stable marriage, with the ultimate goal of improving the lives of their children. Strong evidence, however, is lacking on the effectiveness of these interventions for the populations they currently serve. And little or no evidence exists on the effectiveness of these interventions for low-income, unwed parents. Yet policymakers, program officials, and parents need to know whether these interventions are effective, whether they are effective for some populations but not for others, and whether some interventions are more effective than others. They also need to know how the interventions can be improved to better meet the needs of unwed parents and their families. Providing this information requires a thorough and comprehensive evaluation of the interventions.

An evaluation of these interventions would address five main questions.

  1. What are the interventions, and how are they implemented? What are the goals of the interventions? How were they developed and funded? What is the setting in which they are implemented? What is the target population? What are the components of the interventions? How are they implemented? What do they cost? How much are they used?
  2. Do the interventions work? Do the interventions increase the likelihood that the parents will marry? Do they affect the type or quality of the parents’ relationship? Do they affect parenting behavior, father involvement, and family functioning? Do they affect the well-being of the parents? Do they affect children’s development and well-being?
  3. Do the interventions work better for some population groups than others? Do the interventions work best with couples who are more committed at the birth of their children? Couples who are having their first child? Couples in which neither parent has children with other partners? Younger or older couples? The most or least needy families? Other subgroups of the target population?
  4. Do some interventions work better than others? Which interventions work best? Does it depend on how the interventions are implemented?

  5. How do the interventions work? Is a minimum exposure to the intervention required for it to work? Do the interventions affect child development and child and parental well-being by improving the parents’ relationship or through another mechanism?

Policymakers also may want to know whether the benefits from the interventions outweigh the costs of providing the services or changing the policy. Because the interventions are expected to yield benefits to the participants’ children throughout their lives, answering this question adequately may require researchers to follow the children in the evaluation into their adulthood.

This chapter describes how these interventions can be evaluated. It begins by explaining why answering the first research question, and thereby developing a thorough understanding of the intervention, is important both for understanding the evaluation’s findings and for replicating successful interventions (Section A). The chapter then discusses the issues involved in addressing the four remaining research questions, including those related to experimental evaluations (Section B), program size (Section C), data needs and sources (Section D), and estimation approaches (Section E). Section F provides some concluding comments.

A. DESCRIBING THE INTERVENTION

Information on which interventions are most effective is useful to policymakers and practitioners only if there is a clear understanding of the model for the successful interventions and how it is implemented. A detailed description of each intervention will allow practitioners to replicate successful interventions in other sites. It will also provide context for interpreting differences in the effectiveness of interventions implemented in different sites and may shed light on why the interventions work well.

While the areas covered in the description may vary depending on the type of intervention, Table VII.1 lists the main topics the description should include. The topics fall into four categories:

  1. Foundation of the Interventions. This includes the goals, the type of organization providing the services, how the interventions were developed, whom they serve, and how they are funded. This information explains why the interventions were designed as they were and provides important context for other sites considering replicating the intervention.

 

Table VII.1. Topics for the Description of the Intervention
FOUNDATION OF THE INTERVENTION
Goals: objectives, outcomes expected to be affected, theory underlying the intervention
Organizational Background: type of organization providing services, history of organization, decision to provide services
Development: agencies involved in developing the intervention, whether it was based on other programs
Target Population: eligibility criteria, age of child at intervention, whether targeting first-time parents, other demographic characteristics of target population
Funding: sources of funding for the interventions
OPERATIONS AND SERVICE DELIVERY
Recruitment and Sustaining Participation: outreach approaches, intake procedures, procedure to encourage participation, approaches to sustaining participation
Components of Intervention: relationship skills services, policy changes, types of services provided to improve marriageability, approach to providing those services (integrated/assessment and referral/information sharing)
Assessments: formality of assessment, types of assessments used, actions taken as a result of assessments
Curriculum: type of curricula used, topics covered, any modifications made
Mode of Service Delivery: whether delivered via classes/workshops, support groups, home visits, case management
Tracking Success: how success is defined and tracked
Staffing: number, background and experience, training, turnover, ease of recruitment
Program Message: extent and content of messages to client about marriage, father involvement, and out-of-wedlock births
COSTS
Staff Costs: wages and salaries, fringe benefits
Other Resources Used: overhead, contracted services, donations and volunteers
INTERVENTION USE
Participation: number of mothers/fathers who use interventions, characteristics of participants
Intensity of Use: average length of time participants spend in the program, frequency of interactions with program, amount and types of services used

 

  2. Operations and Service Delivery. This covers how programs recruit participants; the components of the intervention (including services to promote couple relationships as well as services to improve marriageability); any assessments and curricula used; how services are delivered (for example, through case management, home visits, classes, or support groups); and the background, experience, and training of the staff. It also describes the extent to which staff articulate a clear message about healthy marriage to clients.
  3. Intervention Costs. Information on the cost of providing services or implementing policy changes is important for other sites considering replicating the intervention. Estimates of service costs should include staff costs, costs of contractors, and overhead costs. For policy changes, costs may include changes in the amount of TANF or other benefits paid to parents.
  4. Intervention Use. Understanding how much participants are exposed to the intervention is critical for interpreting any differences in program impacts across populations or sites. This information can also be used to estimate whether greater use of the intervention increases its effectiveness. Dimensions of intervention use should include the number of mothers and fathers who participate with and without their partners, the length of time they participate, and the amount, types, and intensity of services used.

Trained researchers can obtain most of these data during periodic site visits to the programs. Researchers would conduct staff interviews, observe service provision, review case files, and conduct focus groups of participants. Data on service use is best collected by the program staff and maintained on a management information system designed specifically for the study.

B. AN EXPERIMENT: THE MOST RIGOROUS EVALUATION

The most rigorous approach to determining whether the interventions are effective in strengthening families is to conduct an experiment in which families are randomly assigned to one or more program groups and a control group. Families in the program group are offered the program services or are subject to new policies; those in the control group do not receive the program services and are not subject to the new policies. Compared with other possible evaluation designs, the overwhelming advantage of an experimental design is that any difference in the outcomes of program and control group members can be attributed to the intervention alone, with a known degree of certainty.1
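A small simulation can illustrate why random assignment supports this attribution. In the hypothetical Python sketch below, an unobserved family trait (labeled `motivation`) raises the chance of marrying, but because a random shuffle decides who is in the program group, the trait is balanced across the two groups on average and the simple difference in mean outcomes lands close to the assumed true effect. The 40,000 families, the 9 percent baseline marriage rate, the 4-point effect, and the function names are illustrative assumptions, not data from any evaluation.

```python
import random
from statistics import mean

def randomly_assign(family_ids, seed=None):
    """Shuffle families and split them evenly into program and control groups."""
    rng = random.Random(seed)
    shuffled = list(family_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])

def estimate_impact(outcomes, program, control):
    """Difference in mean outcomes between program and control group members."""
    return mean(outcomes[f] for f in program) - mean(outcomes[f] for f in control)

# Hypothetical simulation: a true effect of 4 percentage points on marriage and an
# unobserved trait ("motivation") that also raises the chance of marrying.
TRUE_EFFECT = 0.04
rng = random.Random(2024)
families = range(40_000)
motivation = {f: rng.random() for f in families}
program, control = randomly_assign(families, seed=7)

outcomes = {}
for f in families:
    # Baseline chance of marrying averages about 9 percent; program members get the effect.
    p_marry = 0.05 + 0.08 * motivation[f] + (TRUE_EFFECT if f in program else 0.0)
    outcomes[f] = 1 if rng.random() < p_marry else 0

impact = estimate_impact(outcomes, program, control)
print(f"estimated impact: {impact:.3f} (true effect {TRUE_EFFECT})")
```

Because assignment ignores motivation entirely, any remaining gap between the estimate and the true effect is chance variation, which shrinks as the sample grows.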

Some interventions cannot be evaluated using an experimental design. It would be difficult, for example, to create a control group to evaluate a community-wide campaign to promote the importance of healthy marriages. When experimental designs are not feasible, the best approach is to develop a comparison group of families similar to those affected by the intervention. For example, outcomes for families in the community with the marriage-promotion campaign could be compared with outcomes for families in similar communities without the campaign. The problem with all nonexperimental approaches is the strong possibility that program group members differ from comparison group members in unobservable ways (such as motivation, attitudes, and culture). Some differences in the outcomes of program and comparison group members may then result from these underlying differences rather than from the intervention, in which case the estimated impacts of the intervention would be biased.
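A hypothetical simulation can show how this bias arises. In the Python sketch below, families choose whether to enroll, and more motivated families enroll more often. Under these assumptions enrollees have an expected motivation of 2/3 versus 1/3 for non-enrollees, so the naive participant-versus-comparison difference is expected to be around 0.07 rather than the true 0.04. All numbers and the `motivation` trait are illustrative assumptions.

```python
import random
from statistics import mean

# Hypothetical nonexperimental comparison: families self-select into the program,
# and the same unobserved trait that drives enrollment also raises marriage rates.
TRUE_EFFECT = 0.04
rng = random.Random(11)

participants, comparison = [], []
for _ in range(40_000):
    motivation = rng.random()              # unobserved by the evaluator
    enrolled = rng.random() < motivation   # more motivated families enroll more often
    p_marry = 0.05 + 0.08 * motivation + (TRUE_EFFECT if enrolled else 0.0)
    married = 1 if rng.random() < p_marry else 0
    (participants if enrolled else comparison).append(married)

naive_estimate = mean(participants) - mean(comparison)
print(f"naive estimate: {naive_estimate:.3f} vs. true effect {TRUE_EFFECT}")
```

No amount of extra sample fixes this: the gap comes from who chose to enroll, not from chance.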

1. Potential Resistance to Random Assignment

Many program managers find it extremely difficult to deny services to members of a control group, and may see doing so as contrary to their mission or even unethical. These concerns are understandable, but because it is not known which interventions truly make a difference, formal experimentation has a sound ethical basis. Program managers often are convinced of their program’s effectiveness. Many programs may look effective because some clients had strong outcomes, but these clients might have had the same outcomes even in the absence of the programs. In medical science, it is considered ethical to withhold a drug until its efficacy has been established through randomized trials. Program participants, staff, and the public, who pay for the interventions through taxes, deserve good evidence that they are using their time and money well, evidence that often only an experimental design can provide.

Public relations concerns can arise when deserving, eligible applicants for services are turned away from the program for the purpose of creating a control group. These concerns can be addressed if more eligible families want to participate in the program than there are available slots. Random assignment can be viewed as a lottery, a fair way to decide who gets access to the services and who does not. As long as the flow of applicants is large enough to keep the program operating at the desired capacity and to create a control group, the same number of families will receive services; random assignment will simply create a different rationing mechanism. Even if the flow of applicants to a program is not sufficient to create a control group, the flow can be increased through intensified outreach efforts, unless the program is serving a large proportion of the eligible population.
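A lottery of this kind is simple to implement. The Python sketch below, with a hypothetical function name and example data, gives every eligible applicant the same chance at one of a fixed number of program slots and places everyone else in the control group, so the number of families served is unchanged.

```python
import random

def lottery(applicants, slots, seed=None):
    """Give each eligible applicant an equal chance at one of `slots` program places.

    Applicants who are not selected form the control group.
    """
    applicants = list(applicants)
    if len(applicants) <= slots:
        raise ValueError("need more eligible applicants than slots to form a control group")
    rng = random.Random(seed)
    selected = set(rng.sample(applicants, slots))  # uniform sample without replacement
    program = [a for a in applicants if a in selected]
    control = [a for a in applicants if a not in selected]
    return program, control

# Hypothetical example: 120 eligible applicants for 60 program slots.
program, control = lottery([f"family-{i}" for i in range(120)], slots=60, seed=1)
print(len(program), "assigned to program;", len(control), "assigned to control")
```

Recording the seed alongside the assignments makes the lottery auditable after the fact.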

An experimental study may impose some burden on program staff, who face the daunting task of dealing with disappointed applicants assigned to the control group. The task can be made easier if staff members are trained and provided with materials on how to explain the study to applicants. The key points they need to make are that getting into the program is a true lottery, that being selected or not does not reflect on the applicant personally, and that each applicant has an equal probability of getting into the program.

Program staff usually agree to random assignment once they fully understand the benefits of an experimental design. The principal benefit to the staff is that an experiment is the only way to provide rigorous and defensible evidence that the program works. Such evidence can be extremely helpful in obtaining additional program funding and in encouraging participation in the program. Random assignment has another benefit as well: how easily additional eligible applicants can be recruited indicates the extent of unmet demand for the program.

2. Defining the Intervention and the Counterfactual

The difference in the services that program and control group members can receive, or the policies they are subject to, determines the question that can be addressed with the evaluation. The outcomes of the program group members measure what the outcomes are with the intervention; the outcomes of the control group members measure the counterfactual—what the outcomes would be in the absence of the intervention. To ensure that the evaluation addresses an interesting policy question and to increase the likelihood that the study will find the intervention to have meaningful impacts, there should be a significant difference between the services offered to the program and control groups and/or in the policies affecting the two groups.

Evaluating a Whole Program. For some program models, such as modified existing relationship or marriage education programs, the policy question of interest may require evaluating a whole program. In this case, the control group should not be able to receive any services from the program. The embargo on receiving program services must last at least as long as the researchers intend to follow the outcomes of the program participants. (In the evaluation of Baby Makes Three, for example, couples in the control group are prevented from entering the program until three years later, when all follow-up data will have been collected.)

The control group members should be allowed to receive services from other programs in the community. This allows the evaluation to address a policy-relevant question: whether the program has any incremental impact relative to the services already available, rather than relative to a hypothetical situation in which no family-strengthening services at all are available. In fact, few services designed to strengthen relationships are available at an affordable cost to low-income populations in most communities, although some services that could improve marriageability (such as employment and training programs) are more readily available.

Evaluating the Addition of a Relationship Component to an Existing Program. Interventions to strengthen families may include adding a relationship component to an existing program, such as Early Head Start or Healthy Start (as discussed in Chapter VI). These programs may be unwilling to deny all program services to the control group. This is especially likely in programs, such as Early Head Start and Healthy Start, which already have been evaluated (Love et al. 2002; Devaney et al. 2000).

An alternative design for these programs would be to deny only the new relationship component to the control group. This evaluation still would address a meaningful policy question: what is the incremental effectiveness of the relationship component? An advantage of this approach is that all the impacts could be attributed to the relationship component. If the control group were denied all services, determining the roles of the relationship component and other program services would require statistical modeling. With this alternative design, consideration must be given to whether the additional relationship component on its own is a strong enough intervention that its impact can be detected with the sample size available.

Evaluating an Enhanced Versus Standard Intervention. Another potential design is to provide members of the program group with a full set of program services, perhaps including a relationship component and other services, and to provide the control group with a smaller set of services. This evaluation would address the question: what is the incremental effectiveness of the enhanced services? An advantage of this design is that members of the control group would still receive services, which may make random assignment more acceptable to program staff. Again, however, consideration must be given to whether the difference between the enhanced and the standard intervention is large enough that its impact can be detected with the sample size available.

Evaluating More Than One Intervention. An evaluation could address the effectiveness of more than one intervention by evaluating different interventions in different sites. This is important when existing evidence does not suggest that one intervention is clearly more effective than others. The downside of this approach is that differences in the effectiveness of different interventions may be attributable either to the interventions themselves or to differences in the sites in which they are implemented.

A second approach would be to randomly assign families to more than one program group. With more than one program group, the design could test more than one intervention, or it could test the incremental effect of adding components to an intervention. For example, the evaluation of Baby Makes Three has two program groups: one group receives a weekend workshop only, while the other group receives the weekend workshop plus a series of support groups. Comparisons of the outcomes of the two program groups will indicate the additional effectiveness of the support groups for those who have received the workshop. The downside of this approach is that the sample size needed to detect policy-meaningful impacts increases substantially with the number of program groups (see Section C).

3. Fitting Random Assignment into the Program’s Intake Procedures

Random assignment needs to fit into the program’s intake procedures in a way that balances several, often competing, research and operational objectives. One objective is to maximize the proportion of sample members who participate in the program. The higher this proportion, the larger the impacts are likely to be and the more precise the impact estimates will be for a given sample size. In addition, fewer research resources will be spent tracking and interviewing program group families who are not exposed to the intervention. Because in most programs some people change their minds about participating during the intake procedures, the later in the intake procedures random assignment occurs, the greater the proportion of sample members who will participate in the program.
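The link between participation and precision can be made concrete. Assuming the program affects only families who actually participate and no control group members receive program services, the average impact across the whole program group equals the participation rate times the impact on participants. Because the minimum detectable impact shrinks only with the square root of the sample size, the sample needed to detect that diluted impact grows by a factor of one over the participation rate squared. The function names and the 8-point example impact below are hypothetical.

```python
def program_group_impact(impact_on_participants: float, participation_rate: float) -> float:
    """Average impact over all program group members (assumes no effect on nonparticipants)."""
    if not 0 < participation_rate <= 1:
        raise ValueError("participation_rate must be in (0, 1]")
    return participation_rate * impact_on_participants

def sample_size_multiplier(participation_rate: float) -> float:
    """Sample size needed, relative to full participation, to detect the diluted impact."""
    if not 0 < participation_rate <= 1:
        raise ValueError("participation_rate must be in (0, 1]")
    # The minimum detectable impact is proportional to 1/sqrt(n). To detect an impact
    # that is p times as large as under full participation, n must grow by 1/p**2.
    return 1 / participation_rate ** 2

for rate in (1.0, 0.8, 0.5):
    print(f"participation {rate:.0%}: impact {program_group_impact(8.0, rate):.1f} points, "
          f"sample x{sample_size_multiplier(rate):.2f}")
```

Under these assumptions, a program in which only half of the program group participates needs four times the sample of one with full participation to detect the same effect on participants.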

A disadvantage of conducting random assignment late in the intake process is that because the program has had more contact with the family before random assignment, the assignment process is more likely to disrupt program operations. The later random assignment is conducted, the more time and effort families will have invested in the program at random assignment and the greater the cost to them of being assigned to the control group. And although increasing the proportion of program group members who receive program services has advantages (as described above), it does reduce the opportunity to learn about the reasons some couples do not participate after beginning the intake process.

4. Monitoring the Integrity of Random Assignment

The main threat to the integrity of an experimental design is poor implementation of random assignment. To ensure that random assignment is implemented correctly, close monitoring is crucial. The monitoring should ensure adherence to two basic principles:

  • Every eligible family is randomly assigned and assigned only once. No family should receive program services during the study without having been randomly assigned. If a family reapplies for the program during the study, it will remain in the research group to which it was first assigned.

  • Families assigned to the control group cannot receive services designated for the program group only. Although program staff may be tempted to provide services to families they believe will benefit from them, doing so will contaminate the control group and bias the impact estimates toward showing no impact of the program.
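The two principles above lend themselves to routine automated checks. The Python sketch below assumes a hypothetical record layout: a list of (family ID, research group) pairs from the random assignment system, and a list of (family ID, service) pairs for program-only services from the program's management information system. It flags any family assigned more than once, any family served without being assigned, and any control group family that received program-only services. A family that reapplies is judged by its first assignment.

```python
from collections import Counter

def audit_random_assignment(assignments, service_log):
    """Return a list of problems found in hypothetical assignment and service records.

    assignments: list of (family_id, group) tuples, group being "program" or "control"
    service_log: list of (family_id, service) tuples for program-only services
    """
    problems = []
    # Principle 1: every family is randomly assigned only once.
    for fid, n in Counter(fid for fid, _ in assignments).items():
        if n > 1:
            problems.append(f"family {fid} was randomly assigned {n} times")
    # A family that reapplies stays in the group to which it was first assigned.
    first_group = {}
    for fid, group in assignments:
        first_group.setdefault(fid, group)
    # No services without assignment, and no program-only services for controls.
    for fid, service in service_log:
        group = first_group.get(fid)
        if group is None:
            problems.append(f"family {fid} received '{service}' without being randomly assigned")
        elif group == "control":
            problems.append(f"control group family {fid} received program-only service '{service}'")
    return problems

# Hypothetical example records.
assignments = [("A", "program"), ("B", "control"), ("B", "program")]
service_log = [("A", "workshop"), ("B", "workshop"), ("C", "workshop")]
for problem in audit_random_assignment(assignments, service_log):
    print(problem)
```

Running a check like this weekly, rather than at the end of intake, lets problems be corrected while they affect only a few families.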

C. PROGRAM SIZE CONSIDERATIONS

Even the most rigorous experimental design will fail if sample sizes are not large enough to detect impacts that are meaningful to policymakers and practitioners. Table VII.2 displays the minimum impacts that can be detected for a given target sample size. Because some survey nonresponse is inevitable, the number of persons who need to be randomly assigned will exceed these target sample sizes. These impacts are calculated assuming the program and control groups are of equal size (a balanced design), because this is the most statistically efficient design.

Table VII.2 shows the minimum detectable impacts for five outcome variables. Data on the first three outcome variables (the percentage of parents who marry, whether the parents maintain or improve their relationship status, and whether the father is present at all in the life of his biological child) can be collected from either parent, so response rates will be high. Data on whether the father believes the marriage of parents is beneficial for children can be obtained only from the father, so a lower response rate should be expected for this outcome measure. The minimum detectable impacts for the score on the Peabody Picture Vocabulary Test, a measure of a child’s cognitive and language development, were included in the table to show the minimum detectable impacts for variables that can be collected only through an in-person assessment of the child.

A sample of 1,000 families (500 program and 500 control group members) would be sufficient to detect policy-meaningful impacts for the full sample. For example, it would be sufficient to detect an impact of 4 percentage points in the percentage of parents in the sample who marry. Similar impacts on marriage rates have been found in studies of programs that did not focus on family formation. For example, the Minnesota Family Investment Program (MFIP) increased by 4 percentage points the proportion of long-term welfare recipients, unmarried at random assignment, who were married 36 months later (Miller et al. 2000). PREP increased by 24 percentage points the likelihood that couples were still married three years after the program (Markman et al. 1988). Early Head Start was found to increase scores for three-year-old children on the Peabody Picture Vocabulary Test by 2.1 standard score points (Love et al. 2002), which is the minimum detectable impact with a sample size of 1,000.

Table VII.2. Minimum Impacts Detectable by Sample Size, for Key Outcomes

Sample Size (Program/Control) | Percent Married | Percent of Couples Who Maintain or Improve Relationship Status (a) | Percent of Biological Fathers Present in Life of Their Children | Percent of Fathers Who Believe Marriage Is Better for Kids | Child Assessment: Peabody Picture Vocabulary Test Standard Score
Outcomes Expected in Absence of Strategy | 9% (b) | 61% (b) | 71% (c) | 77% (b) | 81.1 (c)
250 (125/125) | 8.1 | 13.7 | 12.8 | 11.8 | 4.2
500 (250/250) | 5.7 | 9.7 | 9.0 | 8.4 | 3.0
1,000 (500/500) | 4.0 | 6.9 | 6.4 | 5.9 | 2.1
1,500 (750/750) | 3.3 | 5.6 | 5.2 | 4.8 | 1.7
2,000 (1,000/1,000) | 2.9 | 4.9 | 4.5 | 4.2 | 1.5

Minimum detectable impacts are in percentage points for the four percentage outcomes and in standard score points for the Peabody Picture Vocabulary Test. Calculations assume: (1) an equal number of treatment and control members; (2) a 95 percent confidence level with an 80 percent level of power; (3) a one-tailed test; (4) a 20 percent reduction in variance from the use of regression models; and (5) a variance of 225 for the Peabody Picture Vocabulary Test score.

a A relationship is viewed as "improved" if the couple moves up the ladder of relationships identified in the Fragile Families study (McLanahan et al. 2001) and described in Section E.
b Based on findings from the Fragile Families 12-month follow-up survey.
c Based on findings from the Early Head Start evaluation when the child was about 36 months old (Love et al. 2002).
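The figures in Table VII.2 can be approximately reproduced from the assumptions listed in its notes with the standard minimum detectable impact formula for a two-group comparison: the sum of the normal critical values for a one-tailed 95 percent test and 80 percent power (about 1.645 + 0.842), times the square root of (1 − R²) × variance × (1/n_program + 1/n_control). The Python sketch below is an illustration under that assumption, not the report's own calculation, and the function and dictionary names are hypothetical. Binary outcomes use the variance p(1 − p), scaled to squared percentage points. Its results match every cell of the table to within about 0.1, so the original calculations may have rounded slightly differently in a few cells.

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_impact(variance, n_program, n_control,
                              confidence=0.95, power=0.80, r_squared=0.20):
    """Minimum detectable impact for a one-tailed two-group comparison (hypothetical helper)."""
    z = NormalDist().inv_cdf
    multiplier = z(confidence) + z(power)  # about 1.645 + 0.842
    # Regression adjustment is assumed to remove r_squared (20 percent) of the outcome variance.
    return multiplier * sqrt((1 - r_squared) * variance * (1 / n_program + 1 / n_control))

# Outcome variances: p * (1 - p) in squared percentage points for the binary outcomes,
# and the variance of 225 assumed in the table notes for the PPVT standard score.
OUTCOME_VARIANCES = {
    "Percent married": 0.09 * 0.91 * 100**2,
    "Maintain or improve relationship": 0.61 * 0.39 * 100**2,
    "Father present": 0.71 * 0.29 * 100**2,
    "Father believes marriage better": 0.77 * 0.23 * 100**2,
    "PPVT standard score": 225.0,
}

def table_vii_2(per_group_sizes=(125, 250, 500, 750, 1000)):
    """Return {per-group size: {outcome: MDI rounded to one decimal}} for balanced designs."""
    return {
        n: {name: round(minimum_detectable_impact(var, n, n), 1)
            for name, var in OUTCOME_VARIANCES.items()}
        for n in per_group_sizes
    }

if __name__ == "__main__":
    for n, row in table_vii_2().items():
        print(f"{2 * n:>5} ({n}/{n}):", row)
```

Changing `r_squared`, `confidence`, or `power` shows how sensitive the required sample is to each assumption in the table notes.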

If the evaluation includes more than one program group, the sample size would need to be larger to obtain the same minimum detectable impacts. To obtain the same minimum detectable impacts with two program groups rather than one would require the sample size to increase by 50 percent.
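Assuming equal-sized groups and that each program group is compared with the same control group, the 50 percent figure follows from simple arithmetic: each pairwise comparison keeps its minimum detectable impact only if every group keeps the same size $n$, so the total sample grows from two groups of $n$ to three.

```latex
N_{1} = n_{P} + n_{C} = 2n, \qquad
N_{2} = n_{P_1} + n_{P_2} + n_{C} = 3n, \qquad
\frac{N_{2} - N_{1}}{N_{1}} = \frac{3n - 2n}{2n} = 0.5
```

With $n = 500$, for example, the total sample rises from 1,000 to 1,500.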

Some samples smaller than 1,000 would allow policy-meaningful impacts on marriage rates to be detected for the full sample, but the samples would not be large enough to detect impacts for important subgroups. With a sample size of 1,000, impacts of 5.7 percentage points on the likelihood of marriage could be detected for 50-percent subgroups (such as couples cohabiting at random assignment), and impacts of 8.1 percentage points could be detected for 25-percent subgroups (such as teen mothers or mothers who believe the chance of marrying the baby’s father is less than 50 percent).
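Under the same assumptions as Table VII.2, a subgroup analysis simply applies the minimum detectable impact formula to the subgroup's share of the sample. The Python sketch below, with hypothetical helper names and assuming each subgroup is split evenly between the program and control groups, recovers the 5.7- and 8.1-point figures for the marriage outcome (9 percent baseline) from a total sample of 1,000.

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_impact(variance, n_program, n_control,
                              confidence=0.95, power=0.80, r_squared=0.20):
    """One-tailed minimum detectable impact (hypothetical helper; table assumptions)."""
    z = NormalDist().inv_cdf
    return (z(confidence) + z(power)) * sqrt(
        (1 - r_squared) * variance * (1 / n_program + 1 / n_control))

def subgroup_mdi(total_sample, subgroup_share, variance):
    """MDI for a subgroup making up `subgroup_share` of a balanced sample."""
    per_group = total_sample * subgroup_share / 2  # assumes an even program/control split
    return minimum_detectable_impact(variance, per_group, per_group)

MARRIAGE_VARIANCE = 0.09 * 0.91 * 100**2  # squared percentage points, 9 percent baseline

for share in (1.0, 0.5, 0.25):
    print(f"{share:.0%} subgroup: {subgroup_mdi(1000, share, MARRIAGE_VARIANCE):.1f} points")
```

Halving the analyzed sample raises the minimum detectable impact by a factor of about 1.4 (the square root of 2), which is why subgroup findings need much larger total samples.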

 

Table VII.3. Size of Programs Included in This Study

Program | Estimated Size | Other Locations of Program
Baby Makes Three, Seattle, WA | 79 couples | None
Becoming Parents Program, Naperville, IL | 50 couples annually | Curriculum used in other states
Bienvenidos Family Services, East Los Angeles, CA | Annually: 359 families in home visiting programs; 54 fathers in Con Los Padres; 426 persons in parenting classes | None
Boot Camp for New Dads, Denver, CO | 1,500 fathers annually | 128 programs in 35 states
Building Strong and Ready Families, United States Army | 435 couples total | None
Center for Fathers, Families, and Workforce Development (CFWD), Baltimore, MD | 180 to 200 men annually | None
Children First, 77 counties in OK | 3,900 to 4,000 new clients annually statewide | David Olds's Nurse Home Visitation programs also implemented in 23 other states
Family Star, Early Head Start, Denver, CO | 75 families (the average Early Head Start program serves 85 families, but size varies from 30 to 200, with most programs serving 60 to 100 families) | 644 grantees nationwide
First Things First, Community-Wide Initiative, Chattanooga, TN | 2,300 people annually attending a variety of marriage seminars | Other community-wide initiatives include the Greater Grand Rapids Community Marriage Project
Healthy Start, Heart of America United Way, Kansas City, KS | 400 clients annually | 94 programs nationwide
Healthy Start, Allegheny County, PA | 1,300 clients annually | 94 programs nationwide
Responsible Choices TANF Agency, MD | 112 families annually | None

 

To meet a sample size of 1,000, during the sample intake period the program must: (1) be able to serve 500 families and (2) identify 1,000 families eligible for the program. Table VII.3 presents the approximate size of the programs in this study’s telephone survey that could provide an estimate of the number of clients served. Of the programs listed, five or six are large enough to serve 500 families in one year.

The difficulty of reaching a sample of 1,000 depends on the target population for the intervention. A change in child support enforcement policy, for example, may be targeted to all low-income unmarried parents. In this case, the sample of 1,000 will be easy to obtain because it is small compared with the population of low-income, unmarried parents. In the United States as a whole, there are currently about 3.6 million single parents living in poverty. However, if the intervention is targeted at low-income unmarried couples at or near the time of the birth of their baby, a sample of 1,000 is a substantially larger proportion of the target population. Currently, there are approximately 600,000 births annually to unmarried couples living in poverty in the United States.2 Hence, any program targeted at low-income unmarried couples at or near the time of the birth of their baby would need to be located in populous low-income areas so that the target population the program could potentially serve is sufficiently large.

If existing relationship skills and marriage programs are modified to focus on the needs of low-income couples at or around the time of the birth of their baby, the programs will need to both deploy a large number of staff and invest considerable funds in outreach efforts to obtain a research sample of 1,000. Of the three programs we interviewed that target couples at around the time of the birth, none target low-income or unmarried couples. Although Boot Camp for New Dads would be large enough to yield 1,000 couples a year, the other programs are small, serving fewer than 100 couples annually (Table VII.3). Other existing relationship skills and marriage programs—such as PREP, Relationship Enhancement, PREPARE, and ENRICH—are currently large enough to serve 500 families each annually, but they do not focus on low-income populations or unmarried couples around the time of the birth of their baby.

Meeting the sample size requirements may be less of a challenge when adding a relationship skills component to, or strengthening this component in, a program that provides services to low-income families, not only because of the target population but also because of program size and well-established recruitment procedures. Some programs already serve 500 or more families per year. For example, as shown in Table VII.3, Healthy Start in Allegheny County and Children First in Oklahoma both serve more than 1,000 clients annually (although not all of these clients may wish to participate in a relationship or marriage education program). In addition, these two programs, along with Early Head Start, already have well-developed recruitment procedures and serve low-income families. Some programs, such as Healthy Start and Early Head Start, currently recruit families around the time of the birth. Others, such as Bienvenidos, already recruit more families than they can serve and have waiting lists. Serving 500 program group families, however, would require some programs to grow substantially. Early Head Start programs, for example, serve only 85 families on average. The growth required to meet the sample size requirements may change the nature of the program substantially, so an evaluation of an expanded program should not begin until any problems related to its expansion have been resolved.

Larger samples could be obtained by evaluating a group of similar programs together. For example, a group of six Early Head Start programs together could yield a sufficient sample during a one-year sample intake period. For the evaluation to yield meaningful findings, however, the programs evaluated together would have to serve similar target populations and provide similar services.

Lengthening the sample intake period would also allow smaller programs to be evaluated. Many studies extend the sample intake period to two or even three years. The duration of the sample intake period is limited, however, by concerns about program burden, the potential for the program to change during the sample intake period, increased survey costs as the survey fielding period lengthens, and delays in obtaining evaluation findings.

For programs that would find it especially difficult to increase recruitment, it may be preferable to have a larger program group than control group. The disadvantage of this unbalanced design is that it is less statistically efficient. Hence, the total sample size would need to be larger, and data collection costs higher, to obtain the same minimum detectable impacts as a balanced design. For example, approximately the same minimum detectable impact for the marriage rate variable could be obtained with a balanced sample of 500 program group members and 500 control group members (1,000 in total) as with an unbalanced sample of 700 program group members and 400 control group members (1,100 in total).
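The trade-off can be made concrete with the standard formula for the minimum detectable impact (MDI) on a binary outcome such as marriage: MDI = M * sqrt(p(1 - p)) * sqrt(1/n_p + 1/n_c), where n_p and n_c are the program and control group sizes and M (about 2.8) corresponds to 80 percent power for a two-tailed test at the 5 percent level. The sketch below assumes a hypothetical control-group marriage rate of 20 percent (not a figure from this report) and ignores regression adjustment, clustering, and nonresponse.

```python
from math import sqrt
from statistics import NormalDist

def mdi_binary(p: float, n_program: int, n_control: int,
               alpha: float = 0.05, power: float = 0.80) -> float:
    """Minimum detectable impact (in proportion points) for a binary outcome.

    Assumes a two-tailed test on a simple difference in means with no
    covariate adjustment.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)  # about 2.80
    return multiplier * sqrt(p * (1 - p)) * sqrt(1 / n_program + 1 / n_control)

p = 0.20  # hypothetical control-group marriage rate, for illustration only
balanced = mdi_binary(p, 500, 500)
unbalanced = mdi_binary(p, 700, 400)
print(f"500/500: {balanced:.4f}   700/400: {unbalanced:.4f}")
```

Under these assumptions both designs can detect an impact of about 7 percentage points, with the 700/400 split marginally smaller. This is why the unbalanced design needs roughly 100 more sample members to match the balanced one.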

D. DATA NEEDS AND SOURCES

An evaluation of an intervention to strengthen families has three main data needs: (1) data on family outcomes expected to be affected by the intervention, (2) data on the use of services to strengthen families by both program and control group members, and (3) data on the characteristics of the families at random assignment (baseline).

1. Outcome Measures

The heart of the evaluation involves comparing the outcomes of program and control group members. The outcome measures to be collected (Table VII.4) are dictated by the model of family formation and child outcomes presented in Chapter II. The list of outcomes includes intermediate outcomes (such as the status of the mother-father relationship) as well as long-term outcomes (such as improved child development and improved child and parent well-being). The list of outcomes also includes measures of domestic violence because some programs aim to reduce domestic violence and because some concern has been expressed that programs to promote healthy marriages could inadvertently increase domestic violence.

Table VII.4. Outcome Measures and Their Potential Sources
Outcome: Source
Marriage and Other Aspects of Mother-Father Relationship
- Marital status: Survey, administrative data
- Type of relationship, living arrangement: Survey
- Stability, quality: Survey, observation
- Attitudes and expectations about marriage: Survey
Father Involvement and Cooperation in Childrearing
- Frequency of visits, frequency of father's involvement in different activities: Survey
- Contributions in cash or in-kind: Survey
- Trust between parents: Survey
- Agreement about how to parent, father's influence in child's upbringing: Survey
- Attitudes and expectations about father's role: Survey
Parent Well-Being
- Health status: Survey
- Mental health and emotional well-being: Survey
- Substance abuse: Survey
- Criminal behavior: Survey
- Employment, earnings: Survey, administrative data
- Receipt of TANF, food stamps, and other public assistance: Survey, administrative data
- Amount of child support ordered and received: Survey, administrative data
Child Well-Being and Development
- Aggressive, hyperactive, anxious behavior: Survey, child assessment
- Emotionality, adaptability, and sociability: Survey, child assessment
- Cognitive and language development: Survey, child assessment
- Reaching developmental milestones: Survey
- Involvement with child welfare system: Survey, administrative data
- Health status: Survey
Family Structure
- Stability of relationship with other romantic partners: Survey
- Subsequent children of parents: Survey
- Out-of-wedlock births: Survey
- Child's living arrangements: Survey
Parenting, Home Environment, and Parent-Child Relationship
- Parenting activities: Survey
- Discipline strategies: Survey
- Support of language and learning in home: Survey, observation
- Physical environment of home: Survey, observation
- Warmth and harshness of parent-child interaction: Observation
- Child-care arrangements: Survey
- Parent's feelings about parenting and child: Survey
Family Functioning
- Family organization, control, conflict: Survey
- Domestic violence: Survey
- Whether child observes violence: Survey

Mother-Father Relationship. A key outcome in evaluating the interventions is whether the biological parents marry. Equally important is whether the parents’ marriage is healthy and stable. Even if they do not marry, however, the well-being of the parents and children may improve if the biological parents become more committed to each other and have a more family-like relationship. Hence, the type and stability of the couple relationship are outcomes in their own right, in addition to marriage.

To measure the type of relationship between parents, the Fragile Families study categorized relationships into four types (McLanahan et al. 2001): married, cohabiting, “visiting” (romantically involved but living apart), and not in a romantic relationship. Whether couples maintain their relationship status or move up this “ladder” of relationships (e.g., visiting couples remain visiting or begin to cohabit) could be used as a measure of relationship stability.
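One way to operationalize the relationship “ladder” as an outcome is to code the four Fragile Families categories as ordered levels and classify each couple's change between baseline and follow-up. The sketch below is illustrative only: the ordering, the names `RelationshipType` and `classify_transition`, and the three transition labels are assumptions, not the Fragile Families study's own coding.

```python
from enum import IntEnum

class RelationshipType(IntEnum):
    """Relationship types ordered from least to most committed (assumed ordering)."""
    NONE = 0        # not in a romantic relationship
    VISITING = 1    # romantically involved but living apart
    COHABITING = 2
    MARRIED = 3

def classify_transition(baseline: RelationshipType,
                        follow_up: RelationshipType) -> str:
    """Label a couple's movement on the relationship ladder between two waves."""
    if follow_up > baseline:
        return "moved up"
    if follow_up == baseline:
        return "stable"
    return "moved down"

print(classify_transition(RelationshipType.VISITING, RelationshipType.COHABITING))
print(classify_transition(RelationshipType.MARRIED, RelationshipType.MARRIED))
print(classify_transition(RelationshipType.COHABITING, RelationshipType.NONE))
```

Under the stability measure described in the text, couples labeled “stable” or “moved up” would count as maintaining or strengthening their relationship. Note that, in this coding, a couple not romantically involved at either wave is also “stable,” so an analysis might report that group separately.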

Many argue it is the quality rather than the type of relationship between parents that is important for child well-being. Relationship quality also is a good predictor of the future status of the relationship. The Locke-Wallace Marital Adjustment Test, a widely used measure of marital satisfaction, has good reliability and validity for identifying distressed couples (Locke and Wallace 1959; Gottman et al. 1977). However, this scale was designed for married couples and is criticized for giving too much weight to one question about respondents’ degree of happiness in their marriages. The Spanier Dyadic Adjustment Scale may be a preferable scale. It derives from the Locke-Wallace scale but includes seven additional items and is worded so it can be used for unmarried couples (Spanier 1976).

Although the Locke-Wallace and Spanier scales have been used widely in analyses of the effectiveness of marital therapy, both have a shortcoming. Couples can score well on these measures if they agree on such topics as family finances, recreation, sex, friends, and in-laws. While agreement can indicate marital satisfaction, it may instead reflect conflict avoidance (Ryan and Gottman 2002). The Global Relationship Satisfaction Scale (Gottman 1999) avoids this problem.

Other aspects of relationship quality also can be measured. Scales have been developed to measure the degree of commitment in a relationship, such as the Stanley-Markman Relationship Dynamics Scale (Stanley and Markman 1992), which predicts the likelihood of future relationship failure (Stanley and Markman 1997).

Observations of couple interactions are widely used to assess the effectiveness of marital interventions. Observations involve videotaping couples interacting (discussing an area of disagreement, for example) and then coding their interactions. Such observations are more likely than interviews to detect impacts on relationship quality. Among studies of programs to improve couple interaction, those that conducted couple observation have more frequently detected impacts than studies that used only interviews (Silliman et al. 2001). These advantages should be balanced against the disadvantages of conducting observations: response rates to observations would likely be low and correlated with the quality of the relationship, and conducting and coding observations is very costly.

Other Intermediate Outcomes. The interventions could affect other intermediate outcomes, such as father involvement, parenting, and family functioning, in three ways. First, the interventions could directly affect these outcomes. For example, the programs may include parenting instruction that affects parenting behavior. Early Head Start was found to have impacts on a wide range of parenting behaviors (Love et al. 2002). Second, improvements in the couple relationship may lead to changes in these outcomes. A healthy marriage between the mother and father, for example, is likely to make the family structure more stable and increase father involvement and cooperation in parenting. Third, the programs may affect one intermediate outcome via their effects on another. For example, increased father involvement has been associated with a more cognitively stimulating home environment for children (Williams 1997).

Impacts on these outcomes are important to measure because they may in turn affect child well-being and development. Studies find that changes in family structure have deleterious effects on children (Wu 1996; Najman et al. 1997; and Kurdek et al. 1995). Other studies have shown an association between increased father involvement and child well-being (Cox et al. 1992; Pedersen et al. 1980; and Yogman et al. 1995). The Early Head Start evaluation finds that reductions in children’s negativity and aggressiveness at age three were associated with less physical punishment, lower levels of distress, and greater warmth in parenting (Love et al. 2002).

Long-Term Outcomes: Child Development and Well-Being and Parent Well-Being. Although research clearly showing causal relationships between improved mother-father relationships and child well-being is sparse, many studies have shown a statistical association. Some find that relationship quality and union stability are correlated with good parenting and better child outcomes (Cummings and Davies 1994; Emery 1999). Others find that parental conflicts, marital disruptions, and divorce are associated with behavior disorders in children (Zill and Peterson 1983; Grych and Fincham 1990). Rutter (1971) finds that the longer the discord preceding separation of parents, the greater the resulting antisocial behavior. Using data from the National Surveys of Children, Peterson and Zill (1986) showed that the incidence of child behavior problems increased substantially as the degree of parenting conflict increased. The Cowans (Cowan and Cowan 2002) find that their programs, aimed at improving couple relationships and parenting skills, reduced children’s aggressive and withdrawn behaviors at school and improved their academic performance.

This research suggests that child outcome measures should include aggressive and withdrawn behavior problems, social development, involvement with the child welfare system, cognitive development, and health. Aggressive behavior problems are especially important developmental indicators because they are good predictors of conduct problems later in school (Love 1997).

The interventions to strengthen families may also affect the well-being of parents. Services to improve marriageability may improve parent well-being directly, and an increase in healthy marriages may improve it further. Studies find that marriage and stronger couple relationships are associated with a wide range of measures of parent well-being (Waite and Gallagher 2000; Cowan and Cowan 1995; Kitson and Morgan 1990). Measures of parent well-being include health, economic measures (such as employment), substance abuse, and involvement with the criminal justice system.

Data Sources. As most outcome data for an evaluation of interventions to strengthen families can only be collected via parent interviews, an evaluation would require follow-up surveys. These surveys would need to collect data from the mother and the father, and perhaps from the child, too. Some outcome data, such as whether a marriage occurred, can be collected from either parent. Other data, such as expectations of marriage and relationship quality, need to be collected from both parents. While some child outcome measures can be constructed from data collected from parents, others require data that can only be collected by specially trained interviewers. Data on some outcomes, such as marriage, divorce, earnings, receipt of public assistance, and involvement with the child welfare system, can be collected from administrative records.

2. Use of Family-Strengthening Services

Differences in the amount, types, and intensity of services received by program and control group members indicate the extent of the intervention. Data should be collected on the use of relationship-strengthening services as well as any services that could increase the marriageability of the parents. To assess the intensity of the intervention, data should be collected on the receipt of these services from any program and for both program and control group members.
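The service contrast can be summarized, for example, as the difference between the program and control groups in the share receiving any relationship-strengthening services and in the mean hours of such services received from any provider. This is a minimal sketch; the function name and the hours data are hypothetical.

```python
from statistics import mean

def service_contrast(program_hours: list[float],
                     control_hours: list[float]) -> dict[str, float]:
    """Program-control differences in service receipt from any provider."""
    def share_any(hours: list[float]) -> float:
        # Fraction of families with any hours of service.
        return sum(h > 0 for h in hours) / len(hours)

    return {
        "share_any_difference": share_any(program_hours) - share_any(control_hours),
        "mean_hours_difference": mean(program_hours) - mean(control_hours),
    }

program = [12.0, 0.0, 20.0, 8.0]  # hypothetical hours received, all sources
control = [0.0, 4.0, 0.0, 0.0]
print(service_contrast(program, control))
```

Counting hours from any source, rather than only from the family-strengthening program, keeps control group members' use of similar community services in the comparison.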

Data on the receipt of services at the family-strengthening program are best collected by program staff. Data on services received from other programs by both the program and control group members should be collected by a follow-up survey that occurs shortly after the end of program participation. All follow-up surveys should collect data on service use because one impact of the intervention may be that families learn to take advantage of resources available in the community on a long-term basis.

3. Baseline Characteristics

Collecting data on the characteristics of all sample members at baseline (random assignment) is important because these data can be used to: define subgroups (to address whether the intervention is more effective for participants with particular characteristics), improve the precision of the impact estimates (by controlling for baseline characteristics in regression models), and adjust for survey nonresponse. To minimize survey nonresponse, it is important to collect good contact information on all sample members at baseline. Table VII.5 provides a list of potential baseline data needs.

 

Table VII.5. Baseline Data Needs
Locating Information for Mother and Father
Name, address, telephone numbers, social security number, contact information for relatives and friends of both mother and father
Family Structure
Number of persons in family, ages and relationship of persons in families, support from extended family
Mother-Father Relationship
Whether married, relationship status, whether father visited during hospital stay, living arrangements, length of marriage and relationship, attitudes and expectations about marriage, history of domestic violence, paternity establishment
Employment and Income
Whether mother/father is employed, earnings, whether father pays child support, household income, receipt of government assistance
Prior Marriages and Childbearing
Marital history, number of children with other parent, number of children with others
Demographic Characteristics
Age of mother, father, and child; race/ethnicity of father and child; country of birth of mother and father; religion of mother and father
Child's Characteristics
Weight at birth, gender, health/disabilities
Education
Highest grade completed by mother and father

 

Ideally, data on baseline characteristics should be collected on all sample members just before random assignment. A common way of collecting these data is to ask sample members to complete a short form just before they are randomly assigned. To keep this form short, the first follow-up surveys also can be used to collect some data about the family at random assignment, as long as these data are not susceptible to recall error.

E. ESTIMATING THE IMPACTS OF THE INTERVENTION

Random assignment, if well implemented, eliminates the need to use sophisticated statistical models to obtain unbiased estimates of impacts. In an experiment, the simple difference in mean outcomes between the program and control groups is an unbiased estimate of the impact of the intervention. Regression and related statistical models that include baseline characteristics to explain some of the variance of the outcome measures can increase the precision of the estimates for a given sample size.
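The two estimators described above can be sketched in plain Python. With a single baseline covariate, the coefficient on the program indicator in an ordinary least squares regression of the outcome on that indicator and the covariate (with an intercept) equals the difference in means minus the pooled within-group slope times the program-control difference in the covariate's mean. The function names and data are hypothetical.

```python
from statistics import mean

def difference_in_means(y, treat):
    """Unadjusted impact estimate: program group mean minus control group mean."""
    y1 = [yi for yi, t in zip(y, treat) if t == 1]
    y0 = [yi for yi, t in zip(y, treat) if t == 0]
    return mean(y1) - mean(y0)

def regression_adjusted_impact(y, treat, x):
    """Coefficient on the program indicator in OLS of y on [1, treat, x]."""
    sxy = sxx = 0.0
    means = {}
    for g in (0, 1):
        pairs = [(xi, yi) for xi, yi, t in zip(x, y, treat) if t == g]
        xbar = mean(p[0] for p in pairs)
        ybar = mean(p[1] for p in pairs)
        means[g] = (xbar, ybar)
        sxy += sum((xi - xbar) * (yi - ybar) for xi, yi in pairs)
        sxx += sum((xi - xbar) ** 2 for xi, _ in pairs)
    slope = sxy / sxx  # pooled within-group slope on the covariate
    (x0, y0), (x1, y1) = means[0], means[1]
    return (y1 - y0) - slope * (x1 - x0)

treat = [1, 1, 1, 0, 0, 0]
x = [1, 2, 3, 2, 3, 4]  # hypothetical baseline relationship-quality score
y = [3, 4, 5, 2, 3, 4]  # hypothetical follow-up score
print(difference_in_means(y, treat), regression_adjusted_impact(y, treat, x))
```

In a well-implemented experiment both estimates are unbiased. The adjusted estimate typically has a smaller standard error when the covariate predicts the outcome; in this tiny hypothetical sample it also offsets a chance baseline difference between the groups.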

1. Analyzing Impacts by Subgroup

It is straightforward to estimate impacts for different population groups: the outcomes of program group members with a particular characteristic can be compared with the outcomes of control group members with the same characteristic. Estimates can be obtained for any subgroup as long as it is defined by a baseline characteristic, because random assignment then ensures that program and control group members within the subgroup are comparable. Characteristics that could be used to define subgroups of interest in an evaluation of interventions to strengthen families include:

  • Age of the mother and father
  • Race/ethnicity of the mother and father
  • Age of baby (including gestational age of babies not yet born)
  • Status and length of parents’ relationship at birth of child
  • Whether the parents are cohabiting
  • Whether the parents have previous children together
  • Whether the mother or father has children with previous partners
  • Whether the mother or father has barriers to a healthy marriage, such as substance abuse, mental health problems, or poor labor market prospects
  • Parents’ attitudes toward and expectations of marriage
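Subgroup impacts are the same program-control comparison repeated within each category of a baseline characteristic. A minimal sketch, using a hypothetical “cohabiting at baseline” flag and made-up marriage outcomes (1 = married at follow-up):

```python
from collections import defaultdict
from statistics import mean

def subgroup_impacts(records):
    """Difference in mean outcomes (program minus control) within each subgroup.

    Each record is (subgroup, treat, outcome). The subgroup must be a baseline
    characteristic so that random assignment holds within every subgroup.
    """
    cells = defaultdict(lambda: {0: [], 1: []})
    for subgroup, treat, outcome in records:
        cells[subgroup][treat].append(outcome)
    return {g: mean(c[1]) - mean(c[0]) for g, c in cells.items()}

records = [
    ("cohabiting", 1, 1), ("cohabiting", 1, 1), ("cohabiting", 0, 1), ("cohabiting", 0, 0),
    ("not cohabiting", 1, 1), ("not cohabiting", 1, 0),
    ("not cohabiting", 0, 0), ("not cohabiting", 0, 1),
]
print(subgroup_impacts(records))
```

With real data, each subgroup estimate would also need a standard error, and differences between subgroup impacts would be tested formally rather than judged by inspection.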

2. Dealing with Program Nonparticipation and Attrition

In many programs designed to strengthen families, a high proportion of couples recruited for the program either do not show up or leave the program prematurely. In a study of PREP in Denver, for example, only about half of the couples offered a place in the program participated (Markman et al. 1988; Stanley et al. 1995). Only about 50 percent of the contacts made by the Bienvenidos program lead to a family participating in one of the programs.

Because program group members who choose to participate in the program may differ from those who choose not to in ways that are related to the status, stability, and quality of their relationship with the other parent of their children, comparing the outcomes of those program group members who actually receive services with the outcomes of all control group members may lead to biased impact estimates. For example, if couples who are more committed to their relationships are more likely to participate in the program and are more likely to marry or stay married, then comparing the outcomes of those who participate in the program with the outcomes of all control group members will bias the impact estimates in favor of finding the program effective.

One approach that has been used to estimate impacts for programs with high nonparticipation is to match each program group member to a control group member with similar baseline characteristics, such as relationship status or satisfaction (Markman et al. 1988). The outcomes of the program group members who participate are then compared only with the “matched” control group members. This controls for some observable differences between those who participate and those who do not. However, the estimates of the program’s impacts still may be biased because of unobservable differences between those who participate and those who do not, such as motivation, attitudes, and personality.

Differences in the mean outcomes of all members of the program group (including those who did not receive services) and the mean outcomes of all members of the control group produce unbiased estimates of the impact of an “offer” of a place in the program. An estimate of the more policy-relevant impact of the program on those who participate can be obtained by dividing the impact estimate for those offered a place in the program by the proportion of program group members who participated in the program (Bloom 1984). This adjustment assumes that the offer has no effect on program group members who do not participate. Although this approach yields unbiased impact estimates for those who participate, the estimates will be imprecise if the participation rate is low, because dividing by the participation rate inflates the standard error by the same factor as the impact estimate.
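The no-show adjustment attributed to Bloom (1984) can be sketched in a few lines. The numbers below are hypothetical, the participation rate is treated as known, and the calculation rests on the assumption that the offer has no effect on program group members who do not participate.

```python
def impact_on_participants(itt_impact: float, itt_se: float,
                           participation_rate: float) -> tuple[float, float]:
    """Bloom (1984) no-show adjustment: scale the offer impact and its SE by 1/participation."""
    if not 0 < participation_rate <= 1:
        raise ValueError("participation_rate must be in (0, 1]")
    return itt_impact / participation_rate, itt_se / participation_rate

# Hypothetical: the offer raises the marriage rate by 3 points (SE 2 points),
# and half of the program group participates.
tot, tot_se = impact_on_participants(0.03, 0.02, 0.5)
print(f"impact on participants: {tot:.2f} (SE {tot_se:.2f})")
```

In this simple version, halving the participation rate doubles both the point estimate and its standard error, so the ratio of the two is unchanged; what low participation costs is precision in the participant-level estimate.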

3. Dealing with Sample Attrition

Some survey nonresponse is inevitable, especially in surveys of young fathers, who are often difficult to locate. Sample attrition because of survey nonresponse is problematic if those who respond differ from nonresponders in ways that are correlated with the outcome variable. This would occur if, for example, fathers with stronger relationships with their children were more likely to respond. Although the best approach to survey nonresponse is prevention, statistical techniques, such as propensity-scoring methods (Rosenbaum and Rubin 1983), can be used to adjust for observable differences between responders and nonresponders.
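A simple version of this nonresponse adjustment estimates each sample member's probability of responding from baseline characteristics and weights respondents by the inverse of that probability. The sketch below uses a deliberately simplified propensity model: the observed response rate within cells defined by categorical baseline characteristics, which is equivalent to a fully saturated model for those characteristics. A real evaluation would more likely fit a logistic regression. The function name, cell labels, and data are hypothetical.

```python
from collections import defaultdict

def nonresponse_weights(sample):
    """Inverse-probability weights for respondents.

    `sample` is a list of (cell, responded) pairs, where `cell` is a tuple of
    baseline characteristics. The response propensity for a cell is estimated
    as that cell's observed response rate.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [respondents, total]
    for cell, responded in sample:
        counts[cell][0] += int(responded)
        counts[cell][1] += 1
    weights = []
    for cell, responded in sample:
        if responded:
            n_resp, n_total = counts[cell]
            weights.append(n_total / n_resp)  # 1 / estimated propensity
        else:
            weights.append(0.0)  # nonrespondents contribute no outcome data
    return weights

sample = [(("father", "visiting"), True), (("father", "visiting"), False),
          (("father", "cohabiting"), True), (("father", "cohabiting"), True)]
print(nonresponse_weights(sample))
```

Respondents from cells with low response rates count more, so weighted survey means resemble the full randomized sample on the characteristics used to form the cells. Like any propensity adjustment, this cannot correct for differences in characteristics that were not measured at baseline.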

4. Estimating the Impacts of Different Levels of Exposure to the Intervention

Questions often arise about whether interventions are more effective if participants receive greater exposure to the intervention, either by staying longer or by receiving more intensive services. Attrition from many programs designed to strengthen families can be high, resulting in large variation in exposure to the services. For example, only about 40 percent of men who attend the orientation for the Center for Fathers, Families, and Workforce Development (CFWD) program complete the program, and only 25 percent of mothers who participate in Children First complete two years of the program.

As families who choose to receive more services may differ from those who choose to receive fewer services, comparing the outcomes of participants who have different exposures to the intervention with the control group is not a valid measure of program impacts. Statistical techniques, such as propensity scoring, can be used to predict the level of exposure to the intervention that the control group members would have received had they been assigned to the program group. Impacts for families with different exposures to the intervention can then be estimated by comparing the outcomes of program group families who received a specific exposure to the intervention with control group families with the same predicted exposure.
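A stripped-down version of this exposure analysis learns from the program group how exposure relates to baseline characteristics, uses that relationship to assign each control family a predicted exposure level, and then compares program families with a given actual exposure to control families with the same predicted exposure. The sketch below substitutes a crude predictor, the most common exposure level among program families in the same baseline cell, for a propensity-score model; the function name, cells, exposure labels, and data are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean

def impacts_by_exposure(program, control):
    """Estimate impacts by exposure level.

    program: list of (cell, exposure, outcome) for program group families
    control: list of (cell, outcome) for control group families
    """
    # Predict exposure for each baseline cell from the program group.
    by_cell = defaultdict(Counter)
    for cell, exposure, _ in program:
        by_cell[cell][exposure] += 1
    predicted = {cell: c.most_common(1)[0][0] for cell, c in by_cell.items()}

    prog_out = defaultdict(list)
    for _, exposure, outcome in program:
        prog_out[exposure].append(outcome)
    ctrl_out = defaultdict(list)
    for cell, outcome in control:
        if cell in predicted:  # controls in cells unseen in the program group are dropped
            ctrl_out[predicted[cell]].append(outcome)

    return {level: mean(prog_out[level]) - mean(ctrl_out[level])
            for level in prog_out if ctrl_out[level]}

program = [("A", "high", 5), ("A", "high", 7), ("A", "low", 4),
           ("B", "low", 3), ("B", "low", 5)]
control = [("A", 4), ("A", 4), ("B", 2), ("B", 4)]
print(impacts_by_exposure(program, control))
```

Because program families' actual exposure is compared with control families' predicted exposure, the estimates are only as credible as the exposure model; with a weak predictor, the two groups may still differ in motivation or other unmeasured traits.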

5. Determining the Role Played by Intermediate Outcomes in Long-Term Outcomes

Many programs articulate intermediate and long-term goals. For example, one intermediate goal of the CFWD program is to help fathers find work, while a long-term goal is for fathers to be more active in their children’s lives for the well-being of both father and child.

An important question is whether any impacts on long-term outcomes are achieved because of the impacts on intermediate outcomes. For example, an interesting policy question is whether improvements in child well-being result from a stronger relationship between the parents or are attributable to other factors, such as higher family income. Estimates of the relative role of intermediate outcomes can be obtained using statistical techniques referred to as “mediation analysis.”
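A minimal product-of-coefficients version of mediation analysis for a single intermediate outcome works as follows. The program's effect on the mediator (a) times the mediator's association with the long-term outcome, holding program status fixed (b), gives the indirect effect; the remainder of the total impact is the direct effect. The sketch assumes linear models, a single mediator, and no unmeasured confounding of the mediator-outcome relationship. That last assumption is strong because the mediator is not randomly assigned. The function name and data are hypothetical.

```python
from statistics import mean

def mediation(y, m, treat):
    """Decompose the total impact on y into indirect (via m) and direct parts.

    a: difference in mean mediator, program minus control
    b: coefficient on m in OLS of y on [1, treat, m] (pooled within-group slope)
    """
    def split(v):
        return ([vi for vi, t in zip(v, treat) if t == 1],
                [vi for vi, t in zip(v, treat) if t == 0])

    (y1, y0), (m1, m0) = split(y), split(m)
    total = mean(y1) - mean(y0)
    a = mean(m1) - mean(m0)
    sxy = sxx = 0.0
    for ys, ms in ((y1, m1), (y0, m0)):
        ybar, mbar = mean(ys), mean(ms)
        sxy += sum((mi - mbar) * (yi - ybar) for mi, yi in zip(ms, ys))
        sxx += sum((mi - mbar) ** 2 for mi in ms)
    b = sxy / sxx
    indirect = a * b
    return {"total": total, "indirect": indirect, "direct": total - indirect}

treat = [1, 1, 1, 0, 0, 0]
m = [3, 4, 5, 1, 2, 3]    # hypothetical couple relationship-quality score
y = [7, 9, 11, 1, 3, 5]   # hypothetical child well-being score
print(mediation(y, m, treat))
```

In linear models fit this way, the direct effect equals the regression-adjusted program coefficient with the mediator as a covariate, so the total impact splits exactly into the two parts.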

F. SUMMARY

A thorough and comprehensive evaluation of an intervention to strengthen families is the only way to provide policymakers and other stakeholders with good information about whether the intervention is effective. The evaluation should include a detailed description of the design and implementation of the intervention as well as an impact evaluation. To be most defensible, the impact evaluation should be based on random assignment. This will allow differences in the outcomes of program and control group members to be attributed to the intervention alone. Data should be collected on a sample of couples that is large enough to ensure that all policy-relevant impacts can be detected. Data should be collected on the wide range of outcomes that the interventions are expected to affect and for a long enough follow-up period to detect long-term impacts.




1 Eight programs described in this report have either been or are currently being evaluated experimentally: (1) Couple Communication (Russell et al. 1984); (2) Baby Makes Three (Shapiro and Gottman, in progress); (3) Becoming a Family (Cowan and Cowan 1992); (4) Nurse-Family Partnership (Olds et al. 2000); (5) PREP (Markman et al. 1988; Stanley et al. 1995); (6) Relationship Enhancement (Ridley et al. 1981, 1982; Ridley and Bain 1983; Heitland 1986); (7) Becoming Parents (Jordan et al. 2000); and (8) Marriage Moments (Hawkins 2002).

2 This is based on the assumption that the proportion of single parents in poverty at the time of the birth of their baby is the same as the poverty rate among all single-parent households (44.5 percent).

 
