JISE

Document-stores leverage the flexibility of structured documents to pack closely related data into a single autonomous aggregate (i.e., a document). Selecting an appropriate set of aggregates for a document database is a non-trivial task because: (i) there are no clear-cut rules for transforming a conceptual design into a document design; (ii) a large space of design options must often be considered; and (iii) most importantly, it is difficult, if not impossible, to find a single set of aggregates that suits all data access patterns.
In previous work, we proposed distorted replicas: a replication scheme that leverages the ubiquitous replication in document-stores and restructures the replicated data in different ways to better cope with the heterogeneity of data access patterns. In this paper, we tackle the problems of aggregate selection and replication in an integrated manner. In particular, given a database with a replication factor of C and a workload W, we propose novel cost-driven techniques that: (i) determine the most interesting aggregates; and (ii) pack the most interesting aggregates into C disjoint and complete subsets in such a way that the execution time of W is minimized. Experimental results obtained over two real-world workloads show that distorted replicas run queries up to tens of times faster than state-of-the-art approaches.