Biomathematicus

Science, Technology, Engineering, Art, Mathematics

See Data and AI Intensive Research with Rigor and Reproducibility (DAIR³)
https://dair-3.org/

Training Biomedical Research Teams for Rigor and Reproducibility in Data Science. The project team will develop a training program to teach skills and tools for rigor and reproducibility in biomedical data science, through a bootcamp with a flexible syllabus and collaborative projects, follow-on mentoring, and an online bilingual study guide that is interactive and self-paced. The trainees will be teams of faculty and technical personnel who will collaborate in the learning process and complement each other’s skills and perspectives. The program will strengthen trainees’ research and enable them to teach rigor and reproducibility at their institutions, which could be especially important for researchers from underrepresented groups and institutions with limited resources.

Funding Agency: National Institute of General Medical Sciences (NIGMS): Training Biomedical Research Teams for Rigor and Reproducibility in Data Science
Research Team: Juan B. Gutiérrez (PI), Department of Mathematics, University of Texas at San Antonio
Jing Liu (PI), Michigan Institute for Data Science (MIDAS), University of Michigan
Funding:  $450K (UTSA) of $2.25M (Project Total)
Project Period:  2023-2028

Abstract: We will develop a training program to shape the thinking of biomedical researchers, impart skills and tools for rigor and reproducibility in biomedical data science, and ensure that these skills and tools are applied across a wide range of biomedical research. The program has a learning phase (a bootcamp with collaborative learning) and an implementation phase (mentoring). In addition, we will enable our trainees to teach their newly acquired skills at their institutions. Our short-term goal is to shape the thinking of biomedical researchers from diverse backgrounds and equip them with skills and tools to improve the rigor and reproducibility of their research. Our long-term goal is to have a lasting impact on rigor and reproducibility through the transfer of skills from our trainees to their trainees, to improve research outcomes and their benefits to society, and to strengthen a diverse biomedical data science workforce.
Research projects with long data manipulation pipelines face rigor and reproducibility challenges throughout their lifecycle. Despite the research community's efforts to promote rigor and reproducibility, researchers lack systematic training to build the technical know-how needed to achieve them in practice. Our program will focus on six topics: 1) ethical issues in biomedical data science; 2) data management, representation, data sharing with confidentiality considerations, and metadata; 3) rigorous statistical design; 4) design and reporting of predictive modeling; 5) reproducible workflows; and 6) meta-analysis.
Our program will support diversity at four levels. Scientifically, we will train researchers who use diverse types of data (from -omics data to population data) to address research questions at various scales. Professionally, we will train faculty and technical personnel at any career stage. Demographically, we will ensure that researchers from underrepresented groups have a strong presence in our program, through intensive recruitment and a welcoming learning environment. Institutionally, we will train researchers from major research universities as well as from institutions with limited resources, and we will especially welcome researchers from Minority-Serving Institutions.
Our training program will focus on teams of faculty (project PIs) and technical personnel. Both play critical roles in ensuring rigor and reproducibility, but they may approach it from different perspectives. Training them together will allow them to benefit from each other’s scientific expertise and technical skills and to address rigor and reproducibility collaboratively. We will combine several training components (lectures, small-group intensive sessions, and team projects), delivered through an online adaptive learning tool, to accommodate the highly variable scientific and technical backgrounds of our trainee teams.