JISE


Journal of Information Science and Engineering, Vol. 38 No. 3, pp. 499-515

Monaural Instrument Sound Segregation by Stacked Recurrent Neural Network

WEN-HSING LAI¹ AND SIOU-LIN WANG²
¹Department of Computer and Communication Engineering
²Program in Engineering Science and Technology, College of Engineering
National Kaohsiung University of Science and Technology
Kaohsiung, 824 Taiwan
E-mail: {lwh; 0015901}@nkust.edu.tw

A stacked recurrent neural network (sRNN) with gated recurrent units (GRUs) and a jointly optimized soft time-frequency mask was proposed for extracting target musical instrument sounds from a mixture of instrumental sounds. The sRNN model stacks and links multiple simple recurrent neural networks (RNNs), giving it both temporal dynamic behavior and genuine depth. The GRU refines the gating mechanism of long short-term memory (LSTM) and reduces the operating time. The proposed method was evaluated on a musical dataset collected from real instrumental music, which was used for both training and testing; electric guitar and drum sounds were the target sounds. Objective and subjective assessment scores obtained for the proposed method were compared with those obtained for three baselines: Wave-U-Net, SH-4stack, and a conventional RNN model. The results indicated that electric guitar and drum sounds can be successfully extracted through the proposed method.

Keywords: electric guitar, drums, sound separation, stacked recurrent neural network, gated recurrent unit, time-frequency mask
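
As an illustration of the soft time-frequency mask mentioned in the abstract, the following minimal sketch shows the commonly used ratio form of such a mask for two sources. It is an assumption that the paper uses this exact form; the abstract does not give the formula. The function names `soft_masks` and `apply_mask`, the `eps` constant, and the list-of-lists spectrogram layout are hypothetical and chosen only for this example.

```python
def soft_masks(mag_a, mag_b, eps=1e-8):
    """Compute soft (ratio) masks from two magnitude estimates.

    mag_a, mag_b: lists of frames, each frame a list of magnitudes per
    frequency bin (stand-ins for the outputs of a separation network).
    eps: small constant (assumed) that avoids division by zero.
    """
    mask_a, mask_b = [], []
    for frame_a, frame_b in zip(mag_a, mag_b):
        row_a, row_b = [], []
        for a, b in zip(frame_a, frame_b):
            total = abs(a) + abs(b) + eps
            row_a.append(abs(a) / total)  # share of the bin assigned to source A
            row_b.append(abs(b) / total)  # share of the bin assigned to source B
        mask_a.append(row_a)
        mask_b.append(row_b)
    return mask_a, mask_b


def apply_mask(mask, mixture_mag):
    """Multiply a mask with the mixture magnitude spectrogram, bin by bin."""
    return [[m * x for m, x in zip(mask_row, mix_row)]
            for mask_row, mix_row in zip(mask, mixture_mag)]
```

Because the two masks sum to (almost exactly) one in every bin, the masked estimates add back up to the mixture magnitude. Folding the masking step into the network is what allows the mask and the network weights to be trained together.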
