Cross-scale Attention Guided Multi-instance Learning for Crohn's Disease Diagnosis with Pathological Images


Ruining Deng, Can Cui, Lucas W. Remedios, Shunxing Bao, R. Michael Womick, Sophie Chiron, Jia Li, Joseph T. Roland, Ken S. Lau, Qi Liu, Keith T. Wilson, Yaohong Wang, Lori A. Coburn, Bennett A. Landman, Yuankai Huo. Multiscale Multimodal Medical Imaging: Third International Workshop, MMMI 2022, held in conjunction with MICCAI 2022, Singapore, September 22, 2022, Proceedings. 2022 Oct 12; 13594: 24-33.


Multi-instance learning (MIL) is widely used in the computer-aided interpretation of pathological Whole Slide Images (WSIs) to address the lack of pixel-wise or patch-wise annotations. Often, this approach directly applies “natural image driven” MIL algorithms, which overlook the multi-scale (i.e., pyramidal) nature of WSIs. Off-the-shelf MIL algorithms are typically deployed on a single scale of WSIs (e.g., 20× magnification), while human pathologists usually aggregate the global and local patterns in a multi-scale manner (e.g., by zooming in and out between different magnifications). In this study, we propose a novel cross-scale attention mechanism to explicitly aggregate inter-scale interactions into a single MIL network for Crohn’s Disease (CD), which is a form of inflammatory bowel disease. The contribution of this paper is two-fold: (1) a cross-scale attention mechanism is proposed to aggregate features from different resolutions with multi-scale interaction; and (2) differential multi-scale attention visualizations are generated to localize explainable lesion patterns. By training on ~250,000 H&E-stained Ascending Colon (AC) patches from 20 CD patient and 30 healthy control samples at different scales, our approach achieved an Area under the Curve (AUC) score of 0.8924, outperforming the baseline models. The official implementation is publicly available at
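
To make the idea of cross-scale attention inside a MIL network concrete, the following is a minimal, dependency-free sketch and not the official implementation. It assumes that each region of a WSI already has one feature vector per magnification, that attention scores are plain dot products with a learned vector, and that the bag is classified with a logistic layer. All function names, parameters (`w_scale`, `w_inst`, `w_cls`), and the toy feature values are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(vectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Weighted sum of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]


def cross_scale_fuse(scale_feats: Sequence[Vector],
                     w_scale: Vector) -> Tuple[Vector, List[float]]:
    """Fuse one region's per-scale features with attention across scales.

    Assumption: the score of each scale is a dot product with `w_scale`.
    The returned weights sum to 1 and indicate how much each scale contributes.
    """
    weights = softmax([dot(w_scale, f) for f in scale_feats])
    return weighted_sum(scale_feats, weights), weights


def mil_attention_pool(instances: Sequence[Vector],
                       w_inst: Vector) -> Tuple[Vector, List[float]]:
    """Attention pooling over the instances (regions) of a bag."""
    weights = softmax([dot(w_inst, x) for x in instances])
    return weighted_sum(instances, weights), weights


def bag_probability(bag: Sequence[Sequence[Vector]], w_scale: Vector,
                    w_inst: Vector, w_cls: Vector,
                    bias: float = 0.0) -> Tuple[float, List[float]]:
    """Bag-level probability plus per-region attention weights.

    `bag[r][s]` is the (hypothetical) feature vector of region r at scale s.
    """
    fused = [cross_scale_fuse(region, w_scale)[0] for region in bag]
    bag_vec, inst_weights = mil_attention_pool(fused, w_inst)
    logit = dot(w_cls, bag_vec) + bias
    return 1.0 / (1.0 + math.exp(-logit)), inst_weights


# Toy bag: 2 regions, 3 scales each, 2-dimensional features (made-up values).
toy_bag = [
    [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],
    [[0.2, 0.8], [0.4, 0.6], [0.9, 0.1]],
]
prob, region_weights = bag_probability(
    toy_bag, w_scale=[1.0, -1.0], w_inst=[0.5, 0.5], w_cls=[1.0, 1.0]
)
```

In this sketch the scale-level weights returned by `cross_scale_fuse` play the role of the multi-scale attention that could be visualized per region, while `region_weights` indicates which regions drive the bag-level prediction. In a trained model the vectors `w_scale`, `w_inst`, and `w_cls` would be learned rather than fixed by hand.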