Assets for Independence Act Evaluation:
Design Phase, Concept Paper
February 16, 2000
6. Impact Analysis

- Mandated Experimental Design
- Research Questions
- Measures
- Challenges Of An Experimental Design
- Nonexperimental Impact Analysis

This section
describes the general proposed strategy for estimating program effects.
We first introduce the statutory mandate for an experimental design.
Subsequent materials address the research questions and outcome measures.
We then discuss the challenges of experimental research and develop
one possible option for estimating impacts through a nonexperimental
approach.
Mandated Experimental Design
The Assets for Independence Act specifies that the research organization shall "for at least one site, use control groups to compare participants with nonparticipants." In the experimental site(s), individuals will be randomly assigned to either a treatment group, which is allowed to participate in the program, or a control group, which is not. In addressing the research questions through an experimental design, Congress has properly sought to establish the strongest empirical foundation for drawing policy implications from the demonstration.
Experimental impact analyses estimate the effects of a program by measuring its outcomes against those that would have occurred in its absence. Measures of this sort provide the best possible indication of a program's effectiveness in achieving its desired outcomes. For policy makers, the experimental evaluation provides the best policy counterfactual: a control group whose experiences can be interpreted as representing what would have happened to the treatment group in the absence of the demonstration. Observed differences between the treatment and control groups, beyond those expected from chance variation, can be attributed to the program.
Properly implemented, an experimental design through random assignment assures that the control group does not differ from the treatment group in any systematic way other than the receipt of program services. Thus, any subsequent differences in outcomes between the two groups that exceed the bounds of statistical fluctuation can be confidently attributed to the intervention. With any non-random comparison group, there is always a chance that differences in outcomes are the result of pre-existing differences between the two groups, rather than the program itself.
An experimental impact analysis will strive to answer the key research
questions posed by the evaluation by collecting data from the research
sample over a period of time, initially at baseline (i.e., immediately
prior to random assignment) and then at one or more prescribed follow-up
interval(s). Experimental impact studies typically consist of four
elements: baseline data collection; random assignment of program applicants
to treatment and control groups; follow-up data collection; and impact
estimation.
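The assignment and estimation steps can be sketched in a few lines of Python. This is an illustrative sketch only, not the evaluation's actual procedure: the even 50/50 split, the fixed seed, and the function names `randomly_assign` and `impact_estimate` are assumptions introduced here, and the outcome values are made up.

```python
import random
from math import sqrt
from statistics import mean, variance


def randomly_assign(applicant_ids, seed=0):
    """Shuffle applicants and split them evenly into treatment and control.

    The even split and the seed are assumptions made for illustration.
    """
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(applicant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])


def impact_estimate(treatment_outcomes, control_outcomes):
    """Difference in mean follow-up outcomes and its standard error."""
    diff = mean(treatment_outcomes) - mean(control_outcomes)
    se = sqrt(variance(treatment_outcomes) / len(treatment_outcomes)
              + variance(control_outcomes) / len(control_outcomes))
    return diff, se


# Hypothetical usage: ten applicants, then made-up follow-up outcomes.
treatment_ids, control_ids = randomly_assign(range(10), seed=1)
diff, se = impact_estimate([3, 5, 7], [1, 2, 3])
```

A difference that is large relative to its standard error is the kind of result that "exceeds the bounds of statistical fluctuation" in the sense used earlier in this section.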
Research Questions
In general, the experimental component of the evaluation will seek to quantify program impacts, or the influence of IDA programs on participating individuals. As a result, many of the research questions concern the difference between participants' pre-program baseline status and their status after participating in an IDA program.
Most fundamentally, AFI programs, and IDA programs more generally, are intended to increase the savings rates and assets of program participants. The experimental research questions will address whether these effects occur, and whether they have longer-term implications for individual well-being. Three major categories of program effects have been identified from the "factors to evaluate" in the AFI legislation. These categories, described below, include effects on savings and asset accumulation, on employment and income, and on the personal well-being of IDA program participants.
Effects on savings and asset accumulation
The most immediate, short-term impact of IDA program participation is expected to be an increase in the savings of the individual participant. The increased savings then enable the participant to acquire the types of assets favored by the program. These include financial assets, purchased assets, and human capital that reflect allowable uses of IDA funds. Key questions the experimental evaluation will address include:
- What is the difference between baseline (pre-participation) savings account balances and post-participation savings account balances? What portion of IDA program participants' savings balances are their own deposits? What portion are matching funds? What portion (if any) reflect other public investments?
- What is the difference between baseline (pre-participation) purchased assets and post-participation purchased assets?
- Do IDA program participants experience greater improvements in their education levels (including employment training) than members of the control group?
- Do IDA program participants purchase homes at a greater rate than control group members?
- Do IDA program participants start businesses at a greater rate than control group members?
- Does IDA program participation affect individuals' debt holdings?
Effects on employment and income
It is hypothesized that a number of longer-term changes will occur as a result of IDA program participation. Some of these changes concern one's employment status and income situation. The experimental evaluation will seek to monitor such conditions over time. These issues are addressed by a number of experimental research questions which, as above, will be answered through comparisons between the experiences of IDA program participants and a control group:
- What is the effect of program participation on total income levels?
- To what extent does IDA program participation influence levels and rates of employment?
- What effect does program participation have on earned income? On self-employment, sole-proprietorship, or other microenterprise business income?
- To what extent does IDA program participation influence dependency on public assistance programs (e.g., cash assistance, food stamps, Medicaid)? To what extent does any difference in public assistance usage reflect avoidance of receiving public assistance versus helping people to leave public assistance?
Effects on personal well-being
Finally, it is suggested that IDA program participation can result in a second type of long-term impact concerning participants' quality of life. By providing people with working assets, IDA programs are intended to result in increased feelings of self-efficacy, community involvement, future orientation, and other effects. These impacts may be somewhat less tangible than those cited above, but will nonetheless be addressed in the experimental evaluation. Key research questions will include whether IDA program participation influences any of the following outcomes:
- Participants' feelings of self-efficacy?
- Participants' future orientation, planning horizon, or timeline?
- Participants' maintenance and utilization of their personal assets?
- Participants' feelings of financial well-being or financial planning
activities?
Measures
To answer these research questions, a variety of measures will be employed to reach a series of program impact estimates. These estimates will seek to quantify the effect of the program versus the 'counterfactual' -- what would have happened to the IDA participants in the absence of the IDA program. This will be accomplished by comparing the differential between the pre- and post-program experiences of program participants (the "treatment group") versus the same differential experienced by a similar group of people who do not receive IDA services (the "control group").
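The comparison described above, the treatment group's pre-to-post change set against the control group's pre-to-post change, is commonly called a difference-in-differences estimate. A minimal sketch follows; the function name and the dollar figures are hypothetical and are not study data.

```python
from statistics import mean


def difference_in_differences(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change for the treatment group minus change for the control group."""
    treat_change = mean(treat_post) - mean(treat_pre)
    ctrl_change = mean(ctrl_post) - mean(ctrl_pre)
    return treat_change - ctrl_change


# Hypothetical savings balances in dollars (made-up numbers).
estimate = difference_in_differences(
    treat_pre=[100, 200, 300], treat_post=[400, 500, 600],
    ctrl_pre=[100, 200, 300], ctrl_post=[150, 250, 350],
)
print(estimate)
```

In this invented example the treatment group's average balance rises by 300 and the control group's by 50, so the estimated program effect is the remaining 250.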
Importantly, all measures used to draw comparisons between program participants and nonparticipants require data sources that are equivalent for both groups. That is, the method of data collection must be the same for both groups to avoid the possibility of biases inherent in the collected data ("measurement bias").
Two methods are available for collecting the required information and satisfying the requirement of identical sources for the treatment and control groups: 1) survey data (baseline and follow-up); 2) common administrative data (as reported by employers through State unemployment insurance systems). The majority of measures required are not available from existing public data sources, but are conducive to survey collection. As a result, surveys will be the primary source of data required for the experimental evaluation. Original survey instruments will be developed for this project based on its specific requirements.
To the extent that public data can be utilized, however, they may capture certain important measures well (such as earned income and unemployment). Public data sources have several benefits. First, they offer the potential to track individuals' employment records longitudinally. Second, they may be a more reliable and consistent source of information, given that income-related questions can sometimes be difficult and/or uncomfortable for people to answer in a survey. Third, they would reduce the burden of original data collection on the treatment and control groups.
A focused set of measures flows directly from the research questions posed for the study, as discussed in Section One. Key measures are presented below. Unless otherwise indicated, these measures will be collected using baseline and follow-up surveys:
Effects on savings and asset accumulation
- Savings level at baseline and follow-up
- Self-investment between baseline and follow-up
- Matching funds received (treatment group only)
- Funds from any other sources
- Net savings increase: savings at follow-up, minus savings at baseline, plus self-investment between baseline and follow-up
- Home ownership and improvement/maintenance
- Business startup
- Other assets and their value (e.g., vehicles, property, other accounts)
- Own educational activity, including employment training
- Debts, by type
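The net savings increase measure listed above can be written as a one-line calculation. The sketch below assumes that "self-investment" means funds the household took out of savings and put into an asset between the two surveys, so that adding it back keeps such spending from appearing as a drop in savings. That reading, the function name, and the dollar amounts are assumptions made for illustration.

```python
def net_savings_increase(savings_baseline, savings_followup, self_investment):
    """Savings at follow-up, minus savings at baseline, plus self-investment."""
    return savings_followup - savings_baseline + self_investment


# Hypothetical household: $200 at baseline, $500 at follow-up,
# and $1,000 invested in an asset in between (made-up figures).
increase = net_savings_increase(200, 500, 1000)
print(increase)
```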
Effects on employment and income
- Employment status, tracked over time (public data source)
- Hours worked per week and hourly wage
- Other private (own) income
- Public assistance use (cash assistance, food stamps, Medicaid)
- Other income sources
Effects on personal well-being
- Outlook (feelings of self-efficacy, regard for the future, expectations for children)
- Financial well-being / avoidance of hardship
- Activities to improve status (e.g., looked at home purchase or job change opportunities)
- Financial planning activities (e.g., budgeting, goal-setting,
encouraging children to save)
Challenges Of An Experimental Design
Implementation of random assignment and an experimental impact analysis in the context of an ongoing program is not a simple task. It requires careful design and planning, in close consultation with program staff, to ensure that the approach taken is consistent with the overall design and institutional context of the program, both to minimize the intrusion of the evaluation on program activities and to ensure that randomization is not compromised by events in the field.
Along with the benefits of an experimental impact study come some drawbacks that need to be considered. Careful design of random assignment procedures and thorough training of program staff in those procedures will help ensure that the experimental design is implemented as intended. Nevertheless, it is essential to anticipate potential threats to the experimental design and to develop procedures that will minimize the likelihood of their occurrence.
The following is a list of issues or concerns that need to be considered in the design of any experimental impact study.
- Recruiting representative site(s) to participate in the study. Selecting appropriate site(s) to participate in the experimental impact study is always a challenge. Site(s) selected for the experimental impact study must have the capacity to recruit and serve large numbers of individuals and be willing to participate in the study. However, any site(s) large enough to enroll a sufficient research sample may not be representative of the other programs being funded. The findings may therefore have limited generalizability, applying only to the particular program model at the experimental site. As a result, the most important task in site selection will be to establish a broad understanding of the programs funded to date under AFIA and thereby assess the appropriateness of particular programs to serve as evaluation sites.
- Ethical issues of denying services to control group members. This is especially a concern among program staff. However, one must recognize that the programs funded under the Assets for Independence Act have limited funding; as a result, not everyone who wants to be enrolled in a program will necessarily be served, even in nonexperimental sites.
- Need to compensate sites for additional burden placed on them for participating in the experimental study. Sites participating in the experimental study will need additional resources for outreach and recruitment of the research sample, as they are required to recruit twice as many eligible applicants as non-experimental sites. Staff from the research organization must be available to provide technical assistance to staff at the experimental site on various issues pertaining to establishing and maintaining the treatment-control regime. Prior to implementing the study, staff from both organizations will be involved in designing an efficient approach to recruitment and random assignment that will be the least burdensome for the experimental site(s). This includes a recruitment strategy and operational procedures for transferring data from the program site to the research organization and back again.
- Need to minimize mid-course program changes. Experimental impacts are based on the treatment-control differences that emerge over time. To ensure the validity of the results, it is important that all individuals in the treatment group receive a consistently administered set of program benefits and services. If the program intervention shifts after enrollment of the research sample, it becomes extremely difficult to interpret the observed treatment-control differences in outcomes. One way to avoid such confounding is to select site(s) that appear to have settled into a stable set of program rules and are less likely to make changes. Again, this emphasizes the need for a careful and comprehensive site selection process. It is important to discuss this requirement with the site(s) before the study begins, so that any contemplated changes can be incorporated into the program before recruitment of the sample. At the initial stages of recruitment, a site visit should be conducted to document all recruitment efforts, program eligibility requirements, and services provided by the program.
- Inability to attribute estimated program impacts to specific program features. The experimental design is likely to include only one treatment group at each experimental site. As a consequence, it will not be possible to determine whether any estimated treatment-control differences in outcomes are attributable, for instance, to the availability of matching funds, to the counseling and training services offered to account-holders, or to some combination of these features. Although multiple treatment groups would address this limitation, they require either a larger research sample or some loss of "statistical power" (i.e., a reduced ability to detect program effects).
- Expense of primary data collection. It is costly to conduct
primary data collection over a multi-year period with households
in both a treatment group and a control group, even at minimally
acceptable initial sample sizes and survey response rates. To provide
appreciable statistical power to detect program effects under conventional
assumptions, impact estimates are typically based on at least 300
households per group, with available data at both baseline and one
or more follow-up intervals. Follow-up response rates of 70 percent
or higher are generally considered necessary to minimize the risk
of nonresponse bias (i.e., the risk that those not interviewed differ
systematically from the respondents). These requirements entail
a substantial, labor-intensive effort by telephone interviewers
and field survey staff to track, locate, and interview sample cases.
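The sample-size figures in the data collection item above can be related to statistical power with a standard minimum detectable effect calculation. The sketch below is illustrative: the source says only "conventional assumptions", so the two-sided 5 percent test, 80 percent power, equal group sizes, and normal approximation used here are assumptions, as is the function name.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n_per_group, alpha=0.05, power=0.80):
    """Smallest true effect, in standard-deviation units, detectable with the
    given power in a two-sided test comparing two equal-sized groups.

    Uses the normal approximation; alpha and power defaults are assumptions.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


# 300 households per group at baseline, then a 70 percent follow-up response.
full_sample = minimum_detectable_effect(300)
respondents = minimum_detectable_effect(round(300 * 0.70))
print(round(full_sample, 2), round(respondents, 2))
```

Under these assumptions, 300 households per group can detect an effect of about 0.23 standard deviations, and about 0.27 if only 70 percent respond at follow-up. The same function shows why dividing a fixed sample among several treatment groups costs statistical power: fewer households per group raises the minimum detectable effect.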
Nonexperimental Impact Analysis
As noted above, there are a number of drawbacks to an experimental design. One option is to undertake nonexperimental impact analysis. Under this alternative approach, instead of using a randomly assigned control group to represent the policy counterfactual, one uses available data on nonparticipants within the general population. Comparable data would then be collected on program participants. Multivariate statistical techniques would be employed to account for observable differences between participants and nonparticipants on individual background characteristics and other contextual factors, such as local economic conditions.
Nonexperimental analysis requires that one has adequate data to parcel out program effects from non-program "external" effects on savings and asset outcomes. If one is unable to control adequately for the external factors, the resulting impact estimates could falsely attribute to the program the effects of underlying demographic or socioeconomic differences between participants and nonparticipants. This is especially problematic in programs such as IDAs, where one expects that participants have greater motivation and initiative than nonparticipants. Such personal traits are typically unmeasured in available data; without any means to properly control for them, one tends to overstate the program's effects.
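The overstatement described above can be illustrated with a small simulation. Everything in it is invented for illustration: the true effect of 100, the way motivation drives both enrollment and savings, and the function name `naive_gap` are assumptions, not estimates from any IDA program.

```python
import random
from statistics import mean


def naive_gap(n=20000, true_effect=100.0, self_selected=True, seed=0):
    """Participant-minus-nonparticipant gap in mean savings.

    'motivation' stands in for an unmeasured trait that raises savings
    whether or not the household participates.
    """
    rng = random.Random(seed)
    participants, nonparticipants = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)
        if self_selected:
            # More motivated households are more likely to enroll.
            joins = motivation + rng.gauss(0, 1) > 0
        else:
            # Random assignment: enrollment is unrelated to motivation.
            joins = rng.random() < 0.5
        savings = 200 * motivation + rng.gauss(0, 100)
        if joins:
            participants.append(savings + true_effect)
        else:
            nonparticipants.append(savings)
    return mean(participants) - mean(nonparticipants)


print(naive_gap(self_selected=True))   # well above the true effect of 100
print(naive_gap(self_selected=False))  # close to the true effect of 100
```

With self-selection, the simple comparison credits the program with savings that motivated households would have accumulated anyway; with random assignment, the same comparison recovers roughly the true effect.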
With these limitations in mind, it is nonetheless worth considering the merits of nonexperimental approaches. To be feasible, this strategy requires a database that would enable one to measure the savings and asset patterns among households who participate in an AFIA-funded program and also among those who would qualify for, but are not participating in, such a program. For the program participants, as noted above, comparable data would need to be collected through a separate primary data collection effort, because participants would appear only in very small numbers in any national database.
Such a database would need to meet the following criteria:
- It would contain national data with oversampling of the low-income population, to provide sufficient numbers of AFIA-eligible households.
- It would provide detail on income, savings, assets, and liabilities, both to identify the AFIA-eligible households and to track outcomes on savings and asset accumulation.
- It would follow households longitudinally (i.e., over multiyear intervals), to enable one to profile the year-to-year changes in household savings and asset-holdings.
- The data for nonparticipants would cover a time period that coincides with the operation of AFIA programs, to avoid the need to control for the effects of recent changes in external factors such as economic conditions, institutional arrangements, or technology (such as direct deposit or other electronic transfers).
The one dataset that appears to meet these requirements is the Survey of Income and Program Participation (SIPP), which is administered by the U.S. Census Bureau. The features of SIPP that make it well-suited for such analysis are as follows:
- The survey is a series of national "panels" or household samples. The members of each panel are interviewed in successive "waves" every four months, over periods of 2½ to 4 years. The most recent panel, the 1996 panel, was introduced in April 1996 and will be interviewed over 12 waves, encompassing 4 years. The twelfth and final wave is about to begin in December 1999. (A 2000 panel will be introduced in February 2000; a 2001 panel will be introduced in February 2001. Each of these is now assumed to cover 2½ years.)
- Each panel is a stratified sample of the U.S. civilian noninstitutional population, with oversampling of low-income households. The 1996 panel consists of 36,700 households. (The sizes of the 2000 and 2001 panels have not yet been announced.)
- Detailed financial information is collected for each household. The "core module" of questions administered to each panel at each wave includes items on income sources and amounts, labor force status, living arrangements, and participation in income support programs. Such basic information is recorded for each of the last four months. Additionally, asset information is asked as of the last day of the four-month reference period. The latter items include checking account balances, value of U.S. savings bonds, amounts in individual retirement accounts (IRAs), and outstanding debts and obligations, including unpaid bank loans and credit card bills.
- At each wave, the core questions are supplemented by several "topical modules" that address particular household circumstances. One of the topical modules pertains to "Assets and Liabilities." It is administered every year (i.e., every third wave for each panel).[3] The items include savings accounts, stocks, mutual funds, bonds, Keogh and IRA accounts, and unsecured liabilities (e.g., loans, credit cards, medical bills). One can calculate each individual's net worth in conjunction with other information, including the value of homes and automobiles, collected through another topical module on "Housing Costs and Energy Usage."
One possible approach would be to combine the forthcoming SIPP data from the 2000 and 2001 panels with data that could be collected on a supplementary national sample of AFIA program participants. The intent would be to conduct the supplementary data collection contemporaneously with the SIPP data collection, using the same survey instruments (i.e., the core module and the topical modules on "Assets and Liabilities" and "Housing Costs and Energy Usage"). The joint dataset would then become the basis for a statistical analysis of program effects. One would model savings and asset outcomes as a function of household-specific explanatory variables, including whether or not one participates in an AFIA-funded project.
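The modeling step described above can be sketched as an ordinary least squares regression with a participation indicator among the explanatory variables. The source does not specify a model form, so the linear specification, the single income covariate, the function name `ols`, and the six made-up households below are all assumptions for illustration.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical joint dataset: (participates in an AFIA-funded project,
# household income in $1,000s). Savings are constructed without noise as
# 50 + 300 * participates + 10 * income, so the fit is exact.
households = [(0, 10), (0, 15), (0, 20), (1, 12), (1, 18), (1, 25)]
savings = [50 + 300 * p + 10 * inc for p, inc in households]

X = [[1.0, p, inc] for p, inc in households]  # leading 1.0 is the intercept
intercept, participation_effect, income_slope = ols(X, savings)
```

The coefficient on the participation indicator is the nonexperimental impact estimate, adjusted for the observed covariate. As the discussion of unmeasured traits makes clear, it is only as credible as the set of controls available.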
Undeniably, this proposed strategy for nonexperimental analysis would be an ambitious undertaking. One drawback of relying on SIPP data from upcoming panels is that such data typically do not become available for analysis until approximately two years after they are collected. Thus, the information on "Assets and Liabilities" from Waves 3 and 6 of the 2000 panel (collected in October of 2000 and 2001) would not become available until late 2002 and 2003, respectively. For the 2001 panel, the corresponding dates would be one year later.
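The dates above follow from the four-month wave spacing and the two-year lag, as the short calculation below shows. It assumes that waves begin exactly four months apart starting in the panel's introduction month and that the lag is exactly two years; the function names are hypothetical.

```python
from datetime import date


def wave_start(panel_start, wave, months_between_waves=4):
    """First month of a given wave, assuming waves begin exactly
    `months_between_waves` months apart from the panel's start month."""
    months = (panel_start.month - 1) + (wave - 1) * months_between_waves
    return date(panel_start.year + months // 12, months % 12 + 1, 1)


def release_date(collected, lag_years=2):
    """Approximate availability date under an assumed two-year lag."""
    return date(collected.year + lag_years, collected.month, 1)


panel_2000 = date(2000, 2, 1)  # 2000 panel introduced in February 2000
for wave in (3, 6):
    collected = wave_start(panel_2000, wave)
    print(wave, collected, release_date(collected))
```

For the 2000 panel this gives collection in October 2000 and October 2001, with release in October 2002 and October 2003. The same spacing places Wave 12 of the 1996 panel, introduced in April 1996, in December 1999.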
Another drawback of a SIPP-based approach is the risk of measurement
bias in conducting the supplementary data collection for AFIA program
participants separately from the data collection (through SIPP) for
nonparticipants. Even with identical instruments, different interviewing
methods between Census and non-Census interviewers might lead to different
response patterns and thus spurious estimates of program effects.
The preferable strategy for any SIPP-based approach of this type would
be to arrange, if possible, for Census interviewers to conduct the
supplementary data collection for AFIA program participants.
Notes
[3] Nonexperimental impact analysis, to the extent that it makes use of national data, may also potentially include information for all funded sites.