Ismail Chitaouy, Abdelaziz BERRADO
Research Team AMIPS
Mohammed V University in Rabat
Rabat, Morocco
ismail.chitaouy@research.emi.ac.ma, berrado@emi.ac.ma
Strategies and Obstacles in Curating Data for Breast Cancer Prediction Models: A Foundation for AI-Driven Models
Abstract
Breast cancer continues to impact millions globally, driving urgent demand for innovative tools to improve early detection. While AI and machine learning offer transformative potential in predicting breast cancer, their success hinges on overcoming significant data-related barriers. This work examines practical approaches to assembling robust datasets, such as tapping into open-access medical repositories, forging partnerships with hospitals and research networks, and combining diverse data streams, including clinical histories, imaging, and molecular biomarkers. A key focus is ensuring datasets reflect varied demographics, geographies, and cancer subtypes to reduce algorithmic bias and enhance real-world applicability.
Despite these efforts, obstacles persist. Strict patient privacy laws, institutional hesitancy to share sensitive health data, and fragmented annotation practices often stall progress. Limited availability of comprehensively labeled datasets and disparities in imaging equipment and protocols further compound these issues. To address these constraints, this study advocates for adaptive methodologies, such as cross-institutional data alliances that anonymize and aggregate records ethically, semi-supervised learning to extract insights from partially labeled data, and expert-guided annotation frameworks to improve label consistency. Additionally, harmonizing imaging standards and promoting interoperable formats could streamline data integration. By prioritizing these collaborative and methodological innovations, researchers can cultivate richer, more inclusive datasets while adhering to ethical guidelines. Advancing these strategies not only improves the accuracy of AI models but also strengthens trust in their clinical deployment, offering a pathway to earlier interventions and equitable care for diverse patient populations.
Keywords
Breast cancer risk prediction, Healthcare data integration, Breast cancer datasets, Longitudinal data, AI-driven diagnosis.
Biography
Ismail CHITAOUY is a PhD researcher at the EMI School of Engineering in Rabat. His research focuses on the development and optimization of AI-driven models for medical applications, particularly in breast cancer treatment and prognosis, leveraging longitudinal data. Prior to his PhD, he worked in the field of software engineering, gaining expertise in system development and data engineering.
He holds an integrated Master of Engineering in Computer Science from Durham University in the United Kingdom, where he explored the use of computer vision in autonomous driving as part of his bachelor's dissertation and worked on various interdisciplinary projects. His master's dissertation focused on the application of AI in serious collaborative games for teaching. His academic and professional journey reflects a strong commitment to leveraging artificial intelligence for real-world challenges, with a particular emphasis on healthcare innovation and predictive analytics.
Abdelaziz BERRADO is Professor of Industrial Engineering at the EMI School of Engineering at Mohammed V University in Rabat. He was previously Deputy Director of Research and Cooperation and Industrial Engineering Department Chair at the same institution. Prior to his current position, he was a faculty member of Engineering Management at Al Akhawayn University in Ifrane. He holds a Ph.D. in Decision Systems and Industrial Engineering from Arizona State University. He researches advanced analytical methods and frameworks for knowledge generation and decision support in organizations. He focuses on data analytics for operations and supply chain modelling, planning, improvement, and control, with applications in healthcare, education, and other industries. He has led several funded applied research projects with local and international impact and published research papers in renowned journals. In addition to academic work, he interacts closely with industry through training and consulting projects. He is a Fellow of the IEOM Society and a member of INFORMS and IEEE. Previously, he was a senior engineer and data analytics lead at Intel.