Keep It Simple: Picking the Right Data Science Method to Improve Workforce Training Programs

Publication Date: May 16, 2023


Introduction

Research Questions

  1. What factors are important in predicting a participant’s outcome in a career pathways program?
  2. Are participant outcomes predictable using simple data science methods such as basic risk indicators?
  3. What is the added value and cost to practitioners of incorporating regression and complex data science methods such as machine learning (ML)?

ACF’s Office of Planning, Research, and Evaluation oversees the Career Pathways Secondary Data Analysis Grants (CPSDA) to support policy-relevant secondary analyses of career pathways programs. Grant recipients use data collected through the Pathways for Advancing Careers and Education (PACE) Project, the Health Profession Opportunity Grants (HPOG) 1.0 Impact Study, the HPOG 1.0 National Implementation Evaluation (NIE), and the HPOG 2.0 National Evaluation to deepen understanding of the implementation and effectiveness of career pathways programs.

Authored by MDRC, this brief summarizes findings and recommendations from a study designed to measure and compare the added value of models used to predict participant success within career pathways programs. The selected models vary in complexity and in how accurately they predict participant success outcomes. While more complex models may give providers more accurate predictions on which to base program improvements, they can also bring tradeoffs such as increased costs and racial bias; the study team aims to explore whether those tradeoffs are worth it.

Purpose

As part of program enrollment and participation, workforce program providers collect data in their management information systems (MIS), and analyzing these data has the potential to improve program outcomes. In particular, analysis can help providers identify participants at greater risk of dropping out and tailor the program to those participants.

From a provider perspective, predictive models vary in their value to the program, their costs, and their complexity. For example, machine learning methods can identify hidden patterns without close oversight from a data analyst; however, they also involve additional costs and more complex processes. To help inform practitioner decision-making and explore these differences using real-world program data, the brief examines the tradeoffs of analyzing career pathways program data with methods ranging from simple to complex.

Key Findings and Highlights

Key findings from the brief include:

  • Program outcomes are predictable even when simple, cost-effective methods of data science are used. 
  • Within HPOG 1.0 programs, the most important factor for predicting participant success is prior education level.
  • When a single powerful indicator is used to predict program outcomes, the resulting simple model is only marginally less accurate than the best machine learning algorithm.
  • Complex methods such as machine learning provide small gains in predicting program outcomes compared to simple methods. These small gains should be balanced with consideration of program staff resources, decreased transparency in machine learning methods, and bias from the algorithm that can reinforce existing discrimination and inequity.

Methods

Using participant data (N = 5,566) from the Health Profession Opportunity Grants (HPOG) Program evaluation, the study team defined participant success outcomes such as completing training, being in ongoing training, or being employed in a healthcare position 15 months after program enrollment. The study team then tested several prediction models of increasing complexity to explore the added value of each.

The methods included models with and without a specific indicator, regression analyses, and complex machine learning algorithms. To compare the value of the models, the study team used the F0.5 score to measure each model’s accuracy in predicting program outcomes.
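
As an illustration of how such a comparison might be set up, the sketch below computes the F0.5 score, which weights precision more heavily than recall, for a single-indicator rule and a regression model. This is not the study’s actual code: the synthetic data, the education cutoff, and all variable names are hypothetical assumptions for illustration only.

# A minimal sketch (not the study's code) comparing a single-indicator rule
# against a logistic regression using the F0.5 score.
# All data here are synthetic; the features and cutoff are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000

# Hypothetical participant features: prior education (years) plus two other predictors.
prior_education = rng.integers(10, 17, size=n)
other = rng.normal(size=(n, 2))

# Hypothetical binary "success" outcome, loosely driven by prior education.
signal = 0.6 * (prior_education - 12) + other[:, 0] + rng.normal(scale=1.5, size=n)
success = (signal > 0).astype(int)

X = np.column_stack([prior_education, other])
X_train, X_test, y_train, y_test = train_test_split(X, success, test_size=0.3, random_state=0)

# Simple model: a single powerful indicator (here, prior education at or above a cutoff).
simple_pred = (X_test[:, 0] >= 13).astype(int)

# More complex model: logistic regression on all features.
reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
reg_pred = reg.predict(X_test)

# F0.5 weights precision more heavily than recall (beta = 0.5).
print("single-indicator F0.5:", fbeta_score(y_test, simple_pred, beta=0.5))
print("regression F0.5:      ", fbeta_score(y_test, reg_pred, beta=0.5))

Applying the same score to every candidate model makes the gain (or lack of gain) from added complexity directly comparable; a beta of 0.5 emphasizes precision over recall, while other beta values would shift that balance.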

Recommendations

To help decide whether to use machine learning models, workforce program providers should weigh:

  • The need for improvements in predicting performance. Consider how the results will be used and how crucial it is to see improvements in performance predictions.
  • The size of the data set available for modeling. Depending on the size of the program data, simple methods may be able to capture patterns adequately. For larger datasets, more complex models such as machine learning may be worth considering.
  • The budget and study timeline. Staff resources and time will be required to test and learn new methods of predictive modeling. Consider the capacity and ability to develop a reliable model.
  • The potential sources of bias in the predictive study. Consider where bias might be inherently present in the dataset or introduced through the planned prediction model. Identify solutions to mitigate inherent or introduced biases to support the transparency and equity of the study.

Citation

Preel-Dumas, Camille, Richard Hendra, and Dakota Denison (2023). Keep It Simple: Picking the Right Data Science Method to Improve Workforce Training Programs, OPRE Report #2023-058, Washington, DC: Office of Planning, Research and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services.

Glossary

ACF:
Administration for Children & Families
HPOG:
Health Profession Opportunity Grants
ML:
Machine Learning. The use of computer algorithms and statistical models to find patterns and make predictions from data.
MIS:
Management Information System
PACE:
Pathways for Advancing Careers and Education