OLAP and Data Mining

OLAP is used to summarize, consolidate, view, apply formulae to, and synthesize data according to multiple dimensions. Traditionally, such systems have been built and queried on top of relational databases (relational OLAP). However, a complex analytical query is cumbersome to express in SQL and may not execute efficiently. Alternatively, multi-dimensional database techniques (multi-dimensional OLAP) have been applied to decision-support applications. Data is stored in multi-dimensional structures, which are a more natural way to express the multi-dimensionality of enterprise data and are better suited for analysis. A "cell" in multi-dimensional space represents a tuple: the attributes of the tuple identify its location in the multi-dimensional space, and the measure values are the content of the cell. Earlier methods have applied various sparse storage techniques to deal with sparse data. We use a bit-encoded sparse structure (BESS) to compress chunks, since it is well suited for fast dimensional operations.
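To make the idea of a bit-encoded cell location concrete, the following is a minimal sketch and not the BESS implementation itself. It assumes that each dimension index within a chunk is packed into just enough bits to cover that dimension's extent, and that the per-dimension bit fields are concatenated into one integer. The function names (`bits_needed`, `encode`, `decode`, `extract_dimension`) and the chunk extents are hypothetical and chosen only for illustration.

```python
def bits_needed(size):
    """Bits required to represent the indices 0..size-1 (at least one bit)."""
    return max(1, (size - 1).bit_length())


def encode(indices, dim_sizes):
    """Pack one index per dimension into a single integer, first dimension highest."""
    code = 0
    for idx, size in zip(indices, dim_sizes):
        if not 0 <= idx < size:
            raise ValueError("index out of range for dimension")
        code = (code << bits_needed(size)) | idx
    return code


def decode(code, dim_sizes):
    """Recover the tuple of dimension indices from a packed code."""
    indices = []
    for size in reversed(dim_sizes):
        width = bits_needed(size)
        indices.append(code & ((1 << width) - 1))
        code >>= width
    return tuple(reversed(indices))


def extract_dimension(code, dim_sizes, d):
    """Read the index of dimension d with a shift and a mask, without full decoding."""
    shift = sum(bits_needed(s) for s in dim_sizes[d + 1:])
    return (code >> shift) & ((1 << bits_needed(dim_sizes[d])) - 1)


# Assumed chunk extents for three dimensions (illustrative values only).
dim_sizes = (4, 8, 3)

# A sparse chunk stored as (packed location, measure) pairs for non-empty cells.
chunk = [(encode((2, 5, 1), dim_sizes), 17.0),
         (encode((0, 7, 2), dim_sizes), 3.5)]

# Dimension-oriented access: select the cells whose second dimension equals 5.
matches = [m for code, m in chunk if extract_dimension(code, dim_sizes, 1) == 5]
```

In this sketch a dimensional operation only needs a shift and a mask per stored cell, which is one plausible reading of why a bit-encoded layout suits fast dimensional operations.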

In [GC97b] we evaluated the nature of query operations in OLAP and SSDB. These operations are dimension-oriented and are classified as:

Retrieval of a random cell element
Retrieval along a dimension or a combination of dimensions
Retrieval for values of dimensions within a range (Range Queries)
Aggregation operations on dimensions
Multi-dimensional Aggregation (Generalization/Consolidation: lower to higher level in hierarchy)
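
The five classes of query listed above can be sketched on a toy sparse cube. This is an illustrative sketch only: it stores non-empty cells in a plain dictionary keyed by coordinate tuples rather than in BESS or any structure analyzed in [GC97b], and the function names, sample cells, and hierarchy mapping are all assumptions.

```python
# Non-empty cells of a 3-dimensional cube: coordinates -> measure value.
cells = {
    (0, 0, 0): 10.0,
    (0, 1, 0): 4.0,
    (1, 1, 0): 7.5,
    (2, 0, 1): 3.0,
    (2, 1, 2): 6.0,
}


def get_cell(cells, coords):
    """Retrieval of a single cell; returns None if the cell is empty."""
    return cells.get(coords)


def retrieve_along(cells, fixed):
    """Retrieval along dimensions: fixed maps dimension number -> required value."""
    return {c: v for c, v in cells.items()
            if all(c[d] == x for d, x in fixed.items())}


def range_query(cells, ranges):
    """Range query: ranges maps dimension number -> inclusive (low, high) bounds."""
    return {c: v for c, v in cells.items()
            if all(lo <= c[d] <= hi for d, (lo, hi) in ranges.items())}


def aggregate(cells, keep_dims):
    """Sum the measure over every dimension that is not listed in keep_dims."""
    out = {}
    for c, v in cells.items():
        key = tuple(c[d] for d in keep_dims)
        out[key] = out.get(key, 0.0) + v
    return out


def roll_up(cells, dim, hierarchy):
    """Consolidate one dimension from a lower to a higher hierarchy level."""
    out = {}
    for c, v in cells.items():
        key = c[:dim] + (hierarchy[c[dim]],) + c[dim + 1:]
        out[key] = out.get(key, 0.0) + v
    return out


one_cell = get_cell(cells, (0, 1, 0))
along = retrieve_along(cells, {0: 2})
in_range = range_query(cells, {0: (1, 2)})
by_first_dim = aggregate(cells, (0,))
# Hypothetical hierarchy: values 0 and 1 of dimension 0 belong to group "A", 2 to "B".
rolled = roll_up(cells, 0, {0: "A", 1: "A", 2: "B"})
```

Every operation here scans all non-empty cells; the point of comparing sparse data structures is precisely how much of that scan each one can avoid for a given query class.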

In [GC97b], an analysis is provided for each of these query operations using various sparse data structures and it is shown that BESS is superior to others.

Data can be organized into a data cube by calculating all possible combinations of GROUP-BYs [GBLP96]. This operation is useful for answering OLAP queries that aggregate over different combinations of attributes. For a data set with k attributes this leads to 2^k GROUP-BY calculations, one for each subset of the attributes. A data cube treats each of the k aggregation attributes as a dimension in k-space. An aggregate of a particular set of attribute values is a point in this space, and the set of points forms a k-dimensional cube. Data cube operators generalize the histogram, cross-tabulation, roll-up, drill-down and sub-total constructs required by financial databases.
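
As a small illustration of the GROUP-BY count, the sketch below computes every GROUP-BY of a toy table by brute force, one pass over the rows per attribute subset. It is a naive sketch with SUM assumed as the aggregate, not an efficient cube algorithm; the function name `data_cube`, the attribute names, and the sample rows are made up for the example.

```python
from itertools import combinations


def data_cube(rows, dims, measure):
    """Return {attribute subset: {group key: summed measure}} for every subset of dims."""
    cube = {}
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):
            agg = {}
            for row in rows:
                key = tuple(row[d] for d in group)
                agg[key] = agg.get(key, 0) + row[measure]
            cube[group] = agg
    return cube


# Hypothetical sales table with k = 3 aggregation attributes.
rows = [
    {"product": "pen", "region": "east", "year": 1997, "sales": 5},
    {"product": "pen", "region": "west", "year": 1997, "sales": 3},
    {"product": "ink", "region": "east", "year": 1998, "sales": 2},
    {"product": "pen", "region": "east", "year": 1998, "sales": 4},
]

cube = data_cube(rows, ("product", "region", "year"), "sales")

# One GROUP-BY per subset of the three attributes, including the empty subset
# (the grand total) and the full set (the finest level of detail).
num_group_bys = len(cube)
grand_total = cube[()][()]
pen_in_east = cube[("product", "region")][("pen", "east")]
```

Practical cube algorithms avoid this repeated scanning by deriving coarser GROUP-BYs from finer ones that have already been computed.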

Attribute-Oriented Mining On Data Cubes

Sanjay Goil
Fri Aug 7 14:58:04 CDT 1998