Next: Scheduling Aggregation Operations
Up: Implementation and Optimizations
Previous: Number of aggregates to
Several optimizations can be done over the naive method of calculating
each aggregate separately from the initial data [GBLP96].
- Smallest Parent computes a group-by by selecting the smallest of
the previously computed group-bys from which it is possible to compute
the group-by. Consider a four attribute cube (ABCD). Group-by AB
can be calculated from ABCD, ABD and ABC. Clearly sizes of
ABC and ABD are smaller than that of ABCD and are better
candidates.
- Cache Results: Compute the group-bys in an order in which the next
group-by calculation can benefit from the cached results of the previous calculation.
This can be extended to disk based data cubes by reducing disk scans and I/O.
For example, after computing ABC from ABCD we compute AB followed by A.
- An important multi-processor optimization is to minimize inter-processor
communication. The order of computation should minimize the
communication among the processors because inter-processor
communication costs are typically higher than computation costs.
- For partial cubes, as in mining m-way association rules,
minimizing the number of intermediate cubes, at a level higher
than m.
A lattice framework to represent the hierarchy of the group-bys was
introduced in [HRU96]. This is an elegant model for
representing the dependencies in the calculations and also to model
costs of the aggregate calculations. A scheduling algorithm can be
applied to this framework substituting the appropriate costs of
computation and communication. A lattice for the group-by
calculations for a five-dimensional cube (ABCDE) is shown in Figure
3(a). Each node represents an aggregate and an arrow
represents a possible aggregate calculation which is also used to
represent the cost of the calculation. Applying the optimizations to
this lattice will result in a cube ordering DAG.
Sanjay Goil
Fri Aug 7 14:58:04 CDT 1998