"Towards Accurate and Fair Prediction of College Success: Evaluating Different Sources of Student Data"
Qiujie Li is a postdoctoral researcher in the Learning Analytics Research Network (LEARN) at New York University. For her doctoral work at UCI, Li specialized in Learning, Teaching, Cognition, and Development. During her doctoral studies, she was advised by Professor Mark Warschauer and Assistant Professor Rachel Baker.
Fischer is an assistant professor of Educational Effectiveness at the Hector Research Institute of Education Sciences and Psychology at the University of Tübingen in Germany and a Research Affiliate with the UCI School of Education. Previously, he was a distinguished postdoctoral scholar at UC Irvine working under the mentorship of Warschauer and Dean Richard Arum.
Xu studies labor market returns to different degree programs and major areas in higher education, and explores the impacts of educational programs, interventions, and instructional practices on student course performance, persistence, and degree completion. Xu is co-director of UCI's Online Learning Research Center (OLRC).
Doroudi studies educational data sciences, educational technology, and learning sciences. He is particularly interested in studying the prospects and limitations of data-driven algorithms in learning technologies, including lessons that can be drawn from the rich history of educational technology.
In higher education, predictive analytics can provide actionable insights to diverse stakeholders such as administrators, instructors, and students. Separate feature sets are typically used for different prediction tasks, e.g., student activity logs for predicting in-course performance and registrar data for predicting long-term college success. However, little is known about the overall utility of different data sources across prediction tasks and the fairness of their predictions with respect to different subpopulations. Using data from over 2,000 college students at a large public university, we examined the utility of institutional data, learning management system (LMS) data, and survey data for accurately and fairly predicting short-term and long-term student success. We found that institutional data and LMS data both have decent predictive power, but survey data show very little predictive utility. Combining institutional data with LMS data leads to even higher accuracy than using either alone. In terms of fairness, using institutional data consistently underestimates historically disadvantaged student subpopulations more than their peers, whereas LMS data tend to overestimate some of these groups more often. Combining the two data sources does not fully neutralize the biases and still leads to high rates of underestimation among disadvantaged groups. Moreover, algorithmic biases affect not only demographic minorities but also students with acquired disadvantages. These analyses serve to inform more cost-effective and equitable use of student data for predictive analytics applications in higher education.
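To make the fairness comparison above more concrete, here is a minimal sketch of how underestimation and overestimation rates could be compared across student subgroups. It is an illustration under stated assumptions, not the authors' method: it assumes a binary success label, treats "underestimation" as predicting non-success for a student who actually succeeded (a false negative) and "overestimation" as the reverse (a false positive), and uses a hypothetical helper name `group_error_rates` with made-up toy data.

```python
from collections import defaultdict


def group_error_rates(y_true, y_pred, groups):
    """Per-group underestimation and overestimation rates (hypothetical helper).

    Assumed definitions (not taken from the paper):
      underestimation = share of truly successful students predicted as 0
      overestimation  = share of truly unsuccessful students predicted as 1
    A rate is None when the group has no students in the relevant class.
    """
    counts = defaultdict(lambda: {"fn": 0, "pos": 0, "fp": 0, "neg": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts[group]
        if truth == 1:
            c["pos"] += 1
            if pred == 0:
                c["fn"] += 1  # succeeded, but predicted not to
        else:
            c["neg"] += 1
            if pred == 1:
                c["fp"] += 1  # did not succeed, but predicted to

    rates = {}
    for group, c in counts.items():
        rates[group] = {
            "underestimation": c["fn"] / c["pos"] if c["pos"] else None,
            "overestimation": c["fp"] / c["neg"] if c["neg"] else None,
        }
    return rates


# Toy, made-up labels and predictions for two illustrative subgroups.
y_true = [1, 1, 1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_error_rates(y_true, y_pred, groups)
for group, r in sorted(rates.items()):
    print(group, r)
```

In this toy data, group A is underestimated more often than group B, while group B is overestimated more often. That is the kind of gap a subgroup comparison is meant to surface. The same function could be run separately on predictions from each data source to see how the gaps shift.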