AI for Vision-Language Models in Medical Imaging (IN2107)


Time: Wednesday 14-16.

Location: Garching (in-person), FMI, 5610.01.11

Vision-language models (VLMs) in medical imaging combine visual data with textual information to learn richer representations. Pre-training such models yields representations that transfer to a wide range of downstream applications. This seminar will explore foundational concepts, current methodologies, and recent advances in applying VLMs to diverse tasks in medical imaging, such as:

  • Image synthesis
  • Anomaly detection
  • Clinical report generation
  • Visual question answering
  • Classification
  • Segmentation

Please register via the TUM matching system: or write an e-mail to

Check the intro slides here:


Cosmin I. Bercea
Research Scientist

I am a postdoctoral researcher specializing in vision and multimodal learning for medical image analysis, with a current focus on developing vision-language models for generative downstream tasks.

Jun Li
Doctoral Researcher

My research interests include Vision and Language, Multi-Modal Learning, and Cross-Modality Generation.