Integrating Location Context in Patch-based 3D Medical Image Segmentation
Abstract:
Deep learning-based semantic segmentation is highly effective for detecting and segmenting tumors, anomalies, and organs-at-risk in medical imaging. Unlike general computer vision tasks, medical imaging poses unique challenges, such as large 3D volumes from CT and MRI scans that demand substantial computational resources. To manage memory constraints, state-of-the-art models like nnUNet and SwinUNETR use patch-based methods, dividing 3D scans into smaller subvolumes. The choice of patch size is crucial: large enough to retain meaningful anatomical context, yet small enough to fit within hardware limitations. Because patches are processed independently, the network receives little information about where in the body each patch lies, even though correct labeling often depends on that anatomical position. Additional location information can be incorporated using image coordinates, physical scanner coordinates, or body-part regression tools. This project aims to evaluate different methods for integrating location information into CNNs and Transformers for 3D medical image segmentation. Techniques under consideration include coordinate images as additional channels, CoordConv layers, attention mechanisms, and positional embeddings.
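
As a concrete illustration of the first two techniques, the minimal sketch below shows a CoordConv-style 3D layer that appends normalized coordinate channels to its input before a standard convolution. It assumes PyTorch; the class name CoordConv3d and its arguments are illustrative and not taken from nnUNet, SwinUNETR, or any other cited model.

    import torch
    import torch.nn as nn

    class CoordConv3d(nn.Module):
        """Conv3d whose input is augmented with three normalized coordinate channels."""

        def __init__(self, in_channels, out_channels, **conv_kwargs):
            super().__init__()
            # Three extra input channels: normalized z, y, and x coordinates.
            self.conv = nn.Conv3d(in_channels + 3, out_channels, **conv_kwargs)

        def forward(self, x):
            b, _, d, h, w = x.shape
            # Coordinate grids in [-1, 1] along each spatial axis of the patch.
            zs = torch.linspace(-1.0, 1.0, d, device=x.device)
            ys = torch.linspace(-1.0, 1.0, h, device=x.device)
            xs = torch.linspace(-1.0, 1.0, w, device=x.device)
            zz, yy, xx = torch.meshgrid(zs, ys, xs, indexing="ij")
            coords = torch.stack([zz, yy, xx]).unsqueeze(0).expand(b, -1, -1, -1, -1)
            # Concatenate coordinates as extra channels, then convolve as usual.
            return self.conv(torch.cat([x, coords], dim=1))

    # Usage: one single-channel 64x64x64 CT patch.
    layer = CoordConv3d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
    out = layer(torch.randn(1, 1, 64, 64, 64))
    print(out.shape)  # torch.Size([1, 16, 64, 64, 64])

Note that the coordinates in this sketch are local to the patch and therefore carry no information about the patch's position in the body; to encode that, the coordinate grids would instead be computed over the whole scan (in image or physical scanner coordinates, as discussed above) and cropped together with the patch.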