Swin-AFR-FPN: Attention and fusion refined object detection model with swin-transformer backbone for Escherichia coli (E. coli) bacterial cell cycle identification and antibiotic phenotyping


BAYDİLLİ Y. Y.

Journal of Supercomputing, vol.82, no.3, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 82 Issue: 3
  • Publication Date: 2026
  • DOI: 10.1007/s11227-026-08244-8
  • Journal Name: Journal of Supercomputing
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, zbMATH
  • Keywords: Bacteria detection, Deep learning, Escherichia coli, Object detection, Vision transformers
  • Hakkari University Affiliated: Yes

Abstract

Escherichia coli (E. coli) bacteria are single-celled, rod-shaped microorganisms that can be pathogenic and cause severe diseases in both humans and animals. Studying the environments in which these microorganisms live, multiply, and colonize can enhance our understanding of their behavior. In addition, overcoming antibiotic resistance in bacteria is a major challenge in the fight against pathogenic microbes. For these reasons, E. coli strains are cultured in suitable media and monitored under a microscope; however, this approach is both error-prone and time-consuming. Because these cells are small and morphologically very similar, detecting and classifying them is a challenging task that requires high-resolution, multi-scale feature representations and imposes substantial computational demands when analyzing large-scale microscopic image data. In this study, a two-stage object detection model with a Swin Transformer backbone is proposed for the analysis of growth stages and antibiotic resistance phenotyping of E. coli bacteria. The Attention and Feature Refined Feature Pyramid Network (AFR-FPN), comprising two key components, is introduced to uncover complex hierarchical relationships in the feature maps produced by the backbone and to enhance the model's contextual focus capabilities. The Hybrid Channel and Spatial Attention Module (HCSAM) weights the important regions of the feature maps, while the Multi-Scale Feature Fusion Gate (MSFFG) preserves semantic and spatial information during the fusion phase. Finally, regions of interest (ROIs) are identified using a region proposal network (RPN). Ablation experiments were performed to investigate the impact of the backbone, the training strategy, and the proposed network on the results. The results of the study indicate that models built on the self-attention mechanism of vision transformers outperform convolution-based models in detecting small, densely packed microscopic objects. Additionally, AFR-FPN contributed positively to the model's overall performance and interpretability. Experimental results demonstrate that the proposed approach achieves a favorable balance between detection accuracy, computational efficiency, and scalability for large-scale microscopic image analysis.
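
The abstract describes HCSAM only at a high level (channel and spatial weighting of FPN feature maps). The sketch below is a minimal PyTorch-style illustration of such a hybrid attention block, assuming a CBAM-like channel-then-spatial design; the class name HCSAMSketch, the reduction ratio, and the internal layout are illustrative assumptions and not the paper's actual implementation.

```python
# Illustrative sketch only: the paper's real HCSAM design is not given in the
# abstract; this follows the common channel-then-spatial attention pattern.
import torch
import torch.nn as nn


class HCSAMSketch(nn.Module):
    """Hypothetical hybrid channel + spatial attention over an FPN feature map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel branch: squeeze spatial dims, re-weight channels via a shared MLP.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial branch: 7x7 conv over pooled per-pixel channel descriptors.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention: average- and max-pooled descriptors share one MLP.
        avg = torch.mean(x, dim=(2, 3))                       # (B, C)
        mx = torch.amax(x, dim=(2, 3))                        # (B, C)
        ch_w = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x = x * ch_w.view(b, c, 1, 1)
        # Spatial attention: highlight informative regions of the re-weighted map.
        sp = torch.cat([x.mean(dim=1, keepdim=True),
                        x.amax(dim=1, keepdim=True)], dim=1)  # (B, 2, H, W)
        sp_w = torch.sigmoid(self.spatial_conv(sp))           # (B, 1, H, W)
        return x * sp_w


# Example: refine one 256-channel pyramid level of size 64x64.
if __name__ == "__main__":
    feat = torch.randn(1, 256, 64, 64)
    refined = HCSAMSketch(256)(feat)
    print(refined.shape)  # torch.Size([1, 256, 64, 64])
```

In a pipeline like the one described, a block of this kind would be applied to individual pyramid levels before an MSFFG-style fusion step and the RPN; the exact placement and the gate design are only outlined at a high level in the abstract.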