document/10564463

Full identifier: https://ieeexplore.ieee.org/document/10564463

Assigned to 3 classes:

Described in 2 nanopublications:

References

Nanopublication Part Subject Predicate Object Published By Published On
links a nanopublication to its assertion http://www.nanopub.org/nschema#hasAssertion assertion
document/10564463
Emily Regalado
2026-01-21T17:46:40.759Z
links a nanopublication to its assertion http://www.nanopub.org/nschema#hasAssertion assertion
document/10564463
Addressing the Challenge of Missing Medical Data in Healthcare Analytics: A Focus on Machine Learning Predictions for ICU Length of Stay
Emily Regalado
2026-01-21T17:46:40.759Z
links a nanopublication to its assertion http://www.nanopub.org/nschema#hasAssertion assertion
document/10564463
This research investigates the impact of missing data on the performance of machine learning algorithms, with a particular focus on the MIMIC-IV dataset. The project examines the extent to which missing data degrades the training of machine learning algorithms, and whether demographic groups with a higher proportion of missing data (i.e., groups defined by ethnicity) have lower predictive accuracy. Using machine learning and data analysis techniques, our results highlight important considerations related to missing data in medical datasets and provide useful insights for improving predictive modeling and decision support systems in clinical practice. Major findings: This investigation used the MIMIC-IV v2.2 dataset, which contains de-identified data from 73,141 ICU admissions at Beth Israel Deaconess Medical Center, to study the impact of missing data on machine learning. The research found that while electronic health records (EHRs) offer massive clinical datasets, they are often non-standardized and riddled with missing values. By predicting hospital Length of Stay (LOS), the study showed that when data are missing "not at random," algorithm performance (measured by RMSE) degrades. Specifically, when datasets were intentionally biased to have more missing entries for certain racial groups (Asian, Black, Hispanic, etc.), the predictive error for those specific groups increased in 83% of "aggressive" data removal tests. This highlights that simply imputing or completing missing data can entrench existing healthcare inequities.
Emily Regalado
2026-01-21T17:46:40.759Z
links a nanopublication to its assertion http://www.nanopub.org/nschema#hasAssertion assertion
document/10564463
2024
Emily Regalado
2026-01-21T17:46:40.759Z
links a nanopublication to its assertion http://www.nanopub.org/nschema#hasAssertion assertion
document/10564463
2023
Emily Regalado
2026-01-21T17:46:40.759Z
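
The abstract above describes removing entries at a higher rate for some demographic groups and then comparing RMSE per group. The following is a minimal sketch of that idea in plain Python, not the authors' pipeline: the function names, the group labels "A" and "B", the drop rates, and the use of mean imputation are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def rmse_by_group(y_true, y_pred, groups):
    """Return {group: RMSE}, computed separately for each group label."""
    sq_err = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        sq_err[g].append((t - p) ** 2)
    return {g: math.sqrt(sum(v) / len(v)) for g, v in sq_err.items()}


def remove_not_at_random(values, groups, drop_rate, seed=0):
    """Replace values with None at a group-specific rate.

    drop_rate maps a group label to a probability in [0, 1]; groups that
    are not listed keep all of their values. Because the chance of removal
    depends on the group, the resulting gaps are "not at random".
    """
    rng = random.Random(seed)
    return [
        None if rng.random() < drop_rate.get(g, 0.0) else v
        for v, g in zip(values, groups)
    ]


def mean_impute(values):
    """Fill None entries with the mean of the observed values.

    Assumes at least one value is observed. Mean imputation is a stand-in
    here; the source does not say which imputation method was used.
    """
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


# Toy data with hypothetical groups "A" and "B".
groups = ["A", "A", "B", "B"]
feature = [1.0, 2.0, 3.0, 4.0]

# "Aggressive" removal that targets group "B" only, followed by imputation.
degraded = remove_not_at_random(feature, groups, {"B": 1.0})
imputed = mean_impute(degraded)

# Per-group error for some hypothetical length-of-stay predictions (days).
los_true = [2.0, 3.0, 5.0, 7.0]
los_pred = [2.5, 3.0, 3.0, 4.0]
print(rmse_by_group(los_true, los_pred, groups))
```

In an actual experiment, a model would be trained once on the original features and once on the degraded, imputed features, and `rmse_by_group` would be applied to both sets of predictions to see whether the error grows for the groups that lost more data.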