With the growing scale of image applications and the development of the artificial intelligence industry, the demand for accurate and robust image feature representation is steadily increasing. Convolutional neural networks (CNNs) have attracted much attention in computer vision in recent years because of their excellent ability to learn image features. To further improve the accuracy and robustness of image classification, this study proposes a second-order statistical convolutional neural network with multilayer fusion and cross-convolutional-layer pooling, which captures feature information of different dimensions from different convolutional layers. To address the increased computing and storage costs, an image classification method based on the robust covariance of deep convolutional features is also proposed, and two regularized maximum likelihood estimators (RMLE) are introduced. Finally, the proposed methods are evaluated in experiments on fine-grained classification, image retrieval and image classification. The results show that the MFBP (4,5) structure achieves accuracies of 92.4%, 74.2% and 72.9% in fine-grained image classification. Compared with MFBP (3,4,5), its accuracy decreases slightly, but its FPS improves by more than 40%, demonstrating the advantage of this structure in real-time applications. In addition, compared with the LW method, RMLE-VND improves by 4.1%, 8.4% and 7.8%, and RMLE-IM improves by 7.1%, 12% and 9.8%, on the different data sets. The image classification results show that MFBP+RMLE (VND+IM) offers better performance and applicability than current mainstream classification methods. This research is valuable for improving the efficiency and accuracy of image processing in big-data environments, and it provides a new research direction and a practical tool for future real-time image applications and computer vision system development.
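To make the idea of cross-convolutional-layer second-order pooling concrete, the following is a minimal NumPy sketch. It is not the MFBP implementation evaluated in this study. The function name `cross_layer_bilinear_pool`, the tensor shapes, and the signed square-root plus L2 normalization steps are assumptions taken from common bilinear-pooling practice, and the two feature maps are assumed to share the same spatial size.

```python
import numpy as np


def cross_layer_bilinear_pool(feat_a, feat_b):
    """Pool second-order statistics between two convolutional feature maps.

    feat_a: array of shape (C_a, H, W) from one convolutional layer.
    feat_b: array of shape (C_b, H, W) from another layer (same H, W assumed).
    Returns a unit-norm descriptor of length C_a * C_b.
    """
    c_a, h, w = feat_a.shape
    c_b = feat_b.shape[0]
    if feat_b.shape[1:] != (h, w):
        raise ValueError("feature maps must share the same spatial size")

    # Flatten the spatial grid so each column is one spatial location.
    a = feat_a.reshape(c_a, h * w)
    b = feat_b.reshape(c_b, h * w)

    # Average of outer products over all locations: a (C_a, C_b) matrix
    # of cross-layer channel interactions.
    pooled = a @ b.T / (h * w)

    # Signed square root followed by L2 normalization (assumed post-processing).
    vec = pooled.ravel()
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


# Hypothetical feature maps standing in for two convolutional layers.
rng = np.random.default_rng(0)
conv4 = rng.standard_normal((8, 7, 7))
conv5 = rng.standard_normal((16, 7, 7))
descriptor = cross_layer_bilinear_pool(conv4, conv5)
print(descriptor.shape)  # (128,)
```

The descriptor length grows as the product of the two channel counts, which illustrates why second-order pooling raises computing and storage costs as more layers are fused.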