JISE



Journal of Information Science and Engineering, Vol. 38 No. 3, pp. 499-515


Monaural Instrument Sound Segregation by Stacked Recurrent Neural Network


WEN-HSING LAI¹ AND SIOU-LIN WANG²
¹Department of Computer and Communication Engineering
²Program in Engineering Science and Technology, College of Engineering
National Kaohsiung University of Science and Technology
Kaohsiung, 824 Taiwan
E-mail: {lwh; 0015901}@nkust.edu.tw


A stacked recurrent neural network (sRNN) with gated recurrent units (GRUs) and a jointly optimized soft time-frequency mask was proposed for extracting target musical instrument sounds from a mixture of instrumental sounds. The sRNN model stacks and links multiple simple recurrent neural networks (RNNs), giving it both temporal dynamic behavior and genuine depth. The GRU refines the gating mechanism of long short-term memory (LSTM) and reduces computation time. Experiments were conducted to test the proposed method. A musical dataset collected from real instrumental music was used for training and testing; electric guitar and drum sounds were the target sounds. Objective and subjective assessment scores obtained for the proposed method were compared with those obtained for two existing models, namely Wave-U-Net and SH-4stack, and for a conventional RNN model. The results indicated that electric guitar and drum sounds can be successfully extracted through the proposed method.
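The soft time-frequency mask mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the mask formula, so the ratio form below, widely used in mask-based source separation, is an assumption rather than the authors' exact definition. The function name `soft_mask_separate`, its parameters, and the random arrays standing in for network outputs and the mixture magnitude spectrogram are all hypothetical.

```python
import numpy as np


def soft_mask_separate(est_target, est_other, mixture_mag, eps=1e-8):
    """Apply a ratio-form soft time-frequency mask (assumed form).

    est_target, est_other: raw magnitude estimates for the two sources,
        shape (freq_bins, frames); here they stand in for network outputs.
    mixture_mag: magnitude spectrogram of the mixture, same shape.
    eps: small constant that avoids division by zero.
    """
    est_target = np.abs(est_target)
    est_other = np.abs(est_other)
    # Each mask value lies in [0, 1] and gives the share of the mixture
    # energy assigned to the target in that time-frequency bin.
    mask = est_target / (est_target + est_other + eps)
    target = mask * mixture_mag
    other = (1.0 - mask) * mixture_mag
    return target, other, mask


# Toy data: 4 frequency bins by 6 frames of random magnitudes.
rng = np.random.default_rng(0)
mixture_mag = rng.random((4, 6))
raw_target = rng.random((4, 6))
raw_other = rng.random((4, 6))

target, other, mask = soft_mask_separate(raw_target, raw_other, mixture_mag)
```

Under this assumed form, the two masked outputs always sum to the mixture magnitude. In a jointly optimized setup the mask is computed inside the network, so the training loss is measured on the masked outputs rather than on the raw estimates.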
 


Keywords: electric guitar, drums, sound separation, stacked recurrent neural network, gated recurrent unit, time-frequency mask
