Multimodal Learning for Medicine & Healthcare: Challenges and Opportunities (IN2107)
Clinicians typically make decisions by combining several modalities, such as images, clinical text, and tabular data. Deep learning offers a powerful framework for integrating these heterogeneous modalities to support automated, data-driven decision-making. Effective multimodal learning remains non-trivial, however: some modalities may be under-optimized during joint training [1], and others may be missing entirely [2]. These challenges are particularly pronounced in real-world clinical settings, where data availability, quality, and alignment across modalities vary substantially. In this seminar, we will discuss recent advances in multimodal learning, covering key paradigms such as fusion and alignment mechanisms, self-supervised and contrastive pretraining across modalities, and the emergence of multimodal foundation models for medical AI. We will also examine strategies that address real-world challenges, including handling missing or noisy modalities, improving cross-modal generalization, and enhancing data efficiency and robustness in clinical applications.
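To make the "fusion with missing modalities" theme concrete, here is a minimal toy sketch (not any specific method from the reading list): masked late fusion, where random linear maps stand in for trained per-modality encoders and absent modalities are simply excluded from the fused average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality encoders: random linear maps projecting each
# modality into a shared 8-dimensional embedding space (stand-ins for
# trained networks).
W_img = rng.normal(size=(16, 8))   # image features: 16 -> 8
W_tab = rng.normal(size=(5, 8))    # tabular features: 5 -> 8

def fuse(img_feat, tab_feat):
    """Masked late fusion: modalities passed as None are excluded from
    the mean, so any non-empty subset of modalities can be fused."""
    embeddings = []
    if img_feat is not None:
        embeddings.append(img_feat @ W_img)
    if tab_feat is not None:
        embeddings.append(tab_feat @ W_tab)
    if not embeddings:
        raise ValueError("at least one modality is required")
    return np.mean(embeddings, axis=0)

# Both the complete case and the missing-tabular case yield a joint embedding.
z_full = fuse(rng.normal(size=16), rng.normal(size=5))
z_img_only = fuse(rng.normal(size=16), None)
print(z_full.shape, z_img_only.shape)  # (8,) (8,)
```

Real systems replace the averaging with learned fusion (e.g. attention or mixture-of-experts routing over available modalities, as in [4]), but the masking idea is the same.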
References

[1] Shicai Wei, Chunbo Luo, and Yang Luo. Boosting multimodal learning via disentangled gradient learning. arXiv preprint arXiv:2507.10213, 2025.
[2] Sijie Li, Chen Chen, and Jungong Han. SimMLM: A simple framework for multi-modal learning with missing modality. arXiv preprint arXiv:2507.19264, 2025.
[3] Zhenbang Wu et al. Multimodal patient representation learning with missing modalities and labels. The Twelfth International Conference on Learning Representations (ICLR), 2024.
[4] Sukwon Yun et al. Flex-MoE: Modeling arbitrary modality combination via the flexible mixture-of-experts. Advances in Neural Information Processing Systems 37 (2024): 98782-98805.
[5] Kai Zhang et al. A generalist vision–language foundation model for diverse biomedical tasks. Nature Medicine 30.11 (2024): 3129-3141.
[6] Alec Radford et al. Learning transferable visual models from natural language supervision. International Conference on Machine Learning, PMLR, 2021.
[7] Jun Ma et al. Segment anything in medical images. Nature Communications 15.1 (2024): 654.
[8] Songtao Li and Hao Tang. Multimodal alignment and fusion: A survey. arXiv preprint arXiv:2411.17040, 2024.
Key topics to be covered include:
- Introduction to multimodal learning in medicine
- Challenges of multimodal learning in clinical applications, including missing and noisy data
- Multimodal pretraining for medicine
- State-of-the-art methods
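As a small taste of the pretraining topic, the sketch below implements the symmetric InfoNCE objective used in CLIP-style contrastive pretraining [6] in plain NumPy. The embeddings here are random placeholders; in practice they come from trained image and text encoders, and matched image-text pairs occupy the diagonal of the similarity matrix.

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss in the style of CLIP: for each image the
    matching text is the positive, all other texts in the batch are
    negatives (and vice versa)."""
    # L2-normalize so the dot product is cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # (N, N) similarity matrix
    labels = np.arange(len(logits))          # correct pairs on the diagonal

    def xent(l):
        # Cross-entropy with diagonal targets, numerically stabilized.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the image->text and text->image directions.
    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
img_emb = rng.normal(size=(4, 8))   # placeholder image embeddings
txt_emb = rng.normal(size=(4, 8))   # placeholder text embeddings
loss = clip_loss(img_emb, txt_emb)  # high: embeddings are not aligned
```

Perfectly aligned embeddings (`clip_loss(x, x)`) drive this loss toward zero, which is exactly what contrastive pretraining optimizes for across modalities.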
Requirements:
- Background in image processing and machine learning/deep learning
- Interest in medical multimodal learning
- Interest in research
Please register via the TUM matching system: https://matching.in.tum.de
Check the intro slides here: