Author: Chunyu Hou (Tianjin University)

Visual attention has been widely used to steer neural networks toward target feature representations, improving object detection accuracy. In practical power-system deployments, however, access to high-quality data involving privacy is often restricted. From the perspective of network learning, this study reexamines overfitting and attention bias when training data are limited and features are imbalanced, and proposes a Foreground Attention Mechanism (FAM). FAM trains a universal attention module on the deepest outputs of the backbone network, extracting common foreground features and anchoring targets effectively through deep-shallow feature fusion and similarity computation over deep feature activations. The deep feature activation map is further incorporated as an auxiliary objective, combined with the original detection loss, so the detector is trained end-to-end. FAM is integrated into a two-stage object detection framework consisting of a convolutional neural network backbone, a region proposal network, and a spatially aware classifier. The study demonstrates that FAM not only helps the network focus more effectively on foreground objects but also enables the classifier to concentrate on key positive samples. Extensive experiments on the substation operation and maintenance (SOM) object detection dataset and the large-scale COCO 2017 benchmark show that the proposed FAM surpasses current state-of-the-art two-stage object detectors.
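The pipeline sketched in the abstract (a deep activation map, deep-shallow feature fusion, and an auxiliary loss added to the detection loss) could look roughly like the following NumPy sketch. All function names, the channel-mean activation, nearest-neighbor upsampling, and the MSE auxiliary term are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def foreground_attention(deep_feat, shallow_feat):
    """Sketch of FAM-style fusion: reweight shallow features by a
    foreground attention map derived from deep feature activations.
    deep_feat: (C, Hd, Wd), shallow_feat: (C, Hs, Ws) with Hs/Hd integer."""
    # 1. Deep feature activation map: channel-wise mean (an assumption;
    #    the paper may use a learned or similarity-based aggregation).
    act = deep_feat.mean(axis=0)                      # (Hd, Wd)
    # 2. Upsample to the shallow resolution (nearest-neighbor via kron).
    sh = shallow_feat.shape[1] // act.shape[0]
    sw = shallow_feat.shape[2] // act.shape[1]
    act_up = np.kron(act, np.ones((sh, sw)))          # (Hs, Ws)
    # 3. Min-max normalize to [0, 1] attention weights.
    attn = (act_up - act_up.min()) / (np.ptp(act_up) + 1e-8)
    # 4. Deep-shallow fusion: gate shallow features with the attention map.
    fused = shallow_feat * attn[None, :, :]
    return fused, attn

def total_loss(det_loss, attn, fg_mask, lam=0.5):
    """Auxiliary objective (illustrative): align the activation map with a
    foreground mask via MSE, combined with the original detection loss."""
    aux = np.mean((attn - fg_mask) ** 2)
    return det_loss + lam * aux
```

In this sketch the attention map plays both roles described in the abstract: it gates shallow features to anchor foreground targets, and it feeds an auxiliary term that is summed with the detection loss for end-to-end training.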