JISE




Journal of Information Science and Engineering, Vol. 35 No. 2, pp. 447-469


DSM: A Low-Overhead, High-Performance, Dynamic Stream Mapping Approach for MongoDB
 


TRONG-DAT NGUYEN AND SANG-WON LEE+
College of Information and Communication Engineering
Sungkyunkwan University
Suwon, 16419 Korea
E-mail: {datnguyen; swlee}@skku.edu


For write-intensive workloads, reclaiming free blocks in flash SSDs is expensive because of data fragmentation, which degrades performance. This paper addresses that problem in MongoDB, a currently popular document store, by introducing a novel stream mapping scheme that exploits the unique characteristics of MongoDB and multi-streamed SSD technology. The scheme dynamically assigns streams to writes according to their hotness values and distinguishes writes on primary index files from writes on secondary index files. The proposed method is high-performance, low-overhead, and independent of data models and workloads. Empirical results on the Linkbench benchmark show that, compared to the original WiredTiger, our approach improves throughput by up to 65% and reduces the 99th-percentile latency by up to 46.2%. Compared to the best-performing method in prior research, our approach improves throughput by up to 23% and reduces the 99th-percentile latency by up to 28.5%. Distinguishing writes on primary index files from writes on secondary index files improves throughput by up to 11.7% and the 99th-percentile latency by up to 15.7%. Moreover, tuning the leaf page size of the B+Tree in MongoDB significantly improves throughput, by a factor of 1.6 to 2.1 in Linkbench.


Keywords: data fragmentation, hot/cold data identification, multi-streamed SSD, NoSQL database, MongoDB
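
The idea summarized in the abstract, choosing a stream from both the kind of file being written and how hot the written region is, can be illustrated with a small sketch. This is not the paper's algorithm: the class name StreamMapper, the stream IDs, the 1 MiB region size, the fixed write-count threshold, and the file names are all assumptions made for illustration only.

```python
from collections import defaultdict
from enum import Enum


class FileKind(Enum):
    PRIMARY_INDEX = "primary"
    SECONDARY_INDEX = "secondary"


class StreamMapper:
    """Toy hotness-based stream mapper (illustrative, not the paper's method)."""

    def __init__(self, hot_threshold=3, region_size=1 << 20):
        # Hypothetical stream IDs: one cold and one hot stream per file kind.
        self.stream_ids = {
            (FileKind.PRIMARY_INDEX, False): 1,
            (FileKind.PRIMARY_INDEX, True): 2,
            (FileKind.SECONDARY_INDEX, False): 3,
            (FileKind.SECONDARY_INDEX, True): 4,
        }
        self.hot_threshold = hot_threshold  # assumed: writes before a region counts as hot
        self.region_size = region_size      # assumed: hotness tracked per 1 MiB region
        self.write_counts = defaultdict(int)

    def assign(self, kind, file_name, offset):
        """Record one write and return the stream ID it should be tagged with."""
        region = (file_name, offset // self.region_size)
        self.write_counts[region] += 1
        is_hot = self.write_counts[region] >= self.hot_threshold
        return self.stream_ids[(kind, is_hot)]


mapper = StreamMapper()
# Three writes to the same region of a primary index file: the third one turns hot.
primary_streams = [
    mapper.assign(FileKind.PRIMARY_INDEX, "collection.wt", 0) for _ in range(3)
]
# A first write to a secondary index file goes to that kind's cold stream.
secondary_stream = mapper.assign(FileKind.SECONDARY_INDEX, "index.wt", 0)
```

Keeping separate streams per file kind means that hot regions of primary and secondary index files never share a stream, which mirrors the distinction the abstract reports as beneficial. A real implementation would also have to age hotness over time and respect the small number of streams a device exposes.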
