Cube Optimizations

Next: Scheduling Aggregation Operations Up: Implementation and Optimizations Previous: Number of aggregates to

Cube Optimizations

Several optimizations can be done over the naive method of calculating each aggregate separately from the initial data [GBLP96].

Smallest Parent computes a group-by by selecting the smallest of the previously computed group-bys from which it is possible to compute the group-by. Consider a four attribute cube (ABCD). Group-by AB can be calculated from ABCD, ABD and ABC. Clearly sizes of ABC and ABD are smaller than that of ABCD and are better candidates.
Cache Results: Compute the group-bys in an order in which the next group-by calculation can benefit from the cached results of the previous calculation. This can be extended to disk based data cubes by reducing disk scans and I/O. For example, after computing ABC from ABCD we compute AB followed by A.
An important multi-processor optimization is to minimize inter-processor communication. The order of computation should minimize the communication among the processors because inter-processor communication costs are typically higher than computation costs.
For partial cubes, as in mining m-way association rules, minimizing the number of intermediate cubes, at a level higher than m.

A lattice framework to represent the hierarchy of the group-bys was introduced in [HRU96]. This is an elegant model for representing the dependencies in the calculations and also to model costs of the aggregate calculations. A scheduling algorithm can be applied to this framework substituting the appropriate costs of computation and communication. A lattice for the group-by calculations for a five-dimensional cube (ABCDE) is shown in Figure 3(a). Each node represents an aggregate and an arrow represents a possible aggregate calculation which is also used to represent the cost of the calculation. Applying the optimizations to this lattice will result in a cube ordering DAG.

Sanjay Goil
Fri Aug 7 14:58:04 CDT 1998