next up previous
Next: Chunk management Up: Implementation and Optimizations Previous: Cube Optimizations

Scheduling Aggregation Operations

Calculation of the order in which the GROUP-BYs are created depends on the cost of deriving a lower order (one with a lower number of attributes) group-by from a higher order (also called the parent) group-by. For example, between ABD tex2html_wrap_inline804 BD and BCD tex2html_wrap_inline804 BD one needs to select the one with the lower cost. Cost estimation of the aggregation operations can be done by establishing a cost model. Some calculations do not involve communication and are local, others involving communication are labeled as non-local. Details of these techniques for a parallel implementation using multidimensional arrays can be found in [GC97a]. For higher dimensions and sparse data sets, a chunk based implementations using BESS needs to determine the order in which chunks are accessed. To reduce disk I/O, chunk reuse has to be maximized. During aggregation, both source and destination chunks can be reused. The aggregation order can either be a depth-first traversal of the DAG or a breadth-first traversal. Each of these can be either done at the individual cube reuse level or at the chunk reuse level, thus resulting in 4 different schedules. At the cube reuse level, a child cube is calculated from the parent by aggregating each chunk of the parent along the aggregation dimension, in a pipelined manner. For a depth-first traversal this would result in the following order of calculation ABCD tex2html_wrap_inline804 ABC tex2html_wrap_inline804 AB tex2html_wrap_inline804 A, and then other calculations from ABCD and so on. Similarly, for a breadth-first traversal, ABCD tex2html_wrap_inline804 ABC, ABCD tex2html_wrap_inline804 ABD, ABCD tex2html_wrap_inline804 ACD, ABCD tex2html_wrap_inline804 BCD are materialized and then other calculations from ABC and so on.

In a chunk-reuse mechanism, the order of calculations are the same but this order is followed on a chunk by chunk basis. We are evaluating these for the effect of various optimizations on performance and results of these will be presented at the conference.


next up previous
Next: Chunk management Up: Implementation and Optimizations Previous: Cube Optimizations

Sanjay Goil
Fri Aug 7 14:58:04 CDT 1998