The safety of people’s lives and property is a primary factor in the success of urban construction. To better maintain social stability and harmony, using computer technology to detect violence effectively and to provide decision support therefore has both theoretical and practical significance. To address the shortcomings of traditional hand-crafted feature extraction methods, this paper proposes an automatic violence detection method for AI systems that combines deep learning with trajectory information. Firstly, to address the high time complexity and low accuracy of traditional hand-crafted feature extraction, a deep spatiotemporal violence detection method based on three-dimensional convolution and trajectories is proposed. We improve the improved dense trajectories (IDT) algorithm to extract target trajectories, and apply three-dimensional convolution and pooling operations to compute the deep temporal and spatial information in the video frames, so as to realize end-to-end detection. Secondly, to make better use of the acquired deep spatiotemporal information and achieve a high detection rate, we propose fusing the features of two-stream convolution and three-dimensional convolution. Features are extracted from continuous video frame sequences by a three-dimensional convolutional neural network (C3D), and the fused spatiotemporal features are passed to the classification layer to obtain the final classification result. Finally, to solve the problems of excessive network depth and slow convergence, dense convolution is introduced, which reduces the number of parameters and the time complexity of the network model. Experimental results show that, compared with other mainstream algorithms, this method is more effective and stable, and can be applied to the detection of violent abnormal behavior in video.
Meanwhile, the proposed method has theoretical value and practical significance for decision support in AI-based video surveillance systems.
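The three-dimensional convolution and pooling step mentioned above can be illustrated with a minimal sketch. It is not the network proposed in this paper: the toy clip, the temporal-difference kernel, and the function names `conv3d_valid` and `max_pool3d` are assumptions chosen only to show how a 3D kernel responds to change along the time axis, and how pooling then reduces the resulting feature volume.

```python
from itertools import product


def conv3d_valid(clip, kernel):
    """Stride-1 3D cross-correlation without padding over a T x H x W clip."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    ot, oh, ow = T - kt + 1, H - kh + 1, W - kw + 1
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(ot)]
    for t, y, x in product(range(ot), range(oh), range(ow)):
        # Weighted sum over the spatiotemporal neighbourhood of (t, y, x).
        out[t][y][x] = sum(
            clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
            for dt, dy, dx in product(range(kt), range(kh), range(kw))
        )
    return out


def max_pool3d(volume, size=2):
    """Non-overlapping 3D max pooling with a cubic window.

    Trailing frames, rows, or columns that do not fill a window are dropped.
    """
    T = len(volume) // size
    H = len(volume[0]) // size
    W = len(volume[0][0]) // size
    return [
        [
            [
                max(
                    volume[t * size + dt][y * size + dy][x * size + dx]
                    for dt, dy, dx in product(range(size), repeat=3)
                )
                for x in range(W)
            ]
            for y in range(H)
        ]
        for t in range(T)
    ]


# Toy clip (assumption): 4 frames of 4 x 4 pixels whose value equals the
# frame index, so intensity changes only along the time axis.
clip = [[[float(t)] * 4 for _ in range(4)] for t in range(4)]

# Hypothetical 2 x 1 x 1 temporal-difference kernel: next frame minus current.
kernel = [[[-1.0]], [[1.0]]]

features = conv3d_valid(clip, kernel)  # feature volume of shape 3 x 4 x 4
pooled = max_pool3d(features, size=2)  # pooled volume of shape 1 x 2 x 2
```

In a trained C3D-style network the kernels are learned and many of them are applied per layer; the fixed kernel here only makes the temporal response easy to inspect by hand.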