6. Grouping Sets, Cube, and Rollup
Grouping Sets
(generate multiple grouping sets in reporting)
It a set of columns by which you group by using the
GROUP BY
clauseIt is denoted by a comma separated list of columns placed inside parenthesis i.e
(column1, column2, ...)
Example
The sales table stores the number of products sold by brand and segment

To get number of products sold by brand and segment
SELECT
brand,
segment,
SUM (quantity)
FROM
sales
GROUP BY
brand,
segment;

To get number of products sold by a brand
SELECT
brand,
SUM (quantity)
FROM
sales
GROUP BY
brand;

To get number of products sold by a segment
SELECT
segment,
SUM (quantity)
FROM
sales
GROUP BY
segment;

To get number of products sold for all brands and segments
SELECT SUM (quantity) FROM sales;

To get all of above in a single query
SELECT
brand,
segment,
SUM (quantity)
FROM
sales
GROUP BY
brand,
segment
UNION ALL
SELECT
brand,
NULL,
SUM (quantity)
FROM
sales
GROUP BY
brand
UNION ALL
SELECT
NULL,
segment,
SUM (quantity)
FROM
sales
GROUP BY
segment
UNION ALL
SELECT
NULL,
NULL,
SUM (quantity)
FROM
sales;

Because UNION ALL requires all result sets to have the same number of columns with compatible data types, you need to adjust the queries by adding NULL to the selection list
it has two main problems.
First, it is quite lengthy.
Second, it has a performance issue because PostgreSQL has to scan the sales table separately for each query.
Using the GROUPING SETS clause
SELECT
brand,
segment,
SUM (quantity)
FROM
sales
GROUP BY
GROUPING SETS (
(brand, segment),
(brand),
(segment),
()
);

Grouping function
GROUPING()
function returns bit 0 if argument is member of current grouping set and 1 otherwise
SELECT
GROUPING(brand) grouping_brand,
GROUPING(segment) grouping_segment,
brand,
segment,
SUM (quantity)
FROM
sales
GROUP BY
GROUPING SETS (
(brand),
(segment),
()
)
ORDER BY
brand,
segment;

Cube
(define multiple grouping sets that include all possible combinations of dimensions)
It allows us to generate multiple grouping sets
CUBE subclause is a short way to define multiple grouping sets so the following are equivilant
CUBE (c1, c2, c3)
GROUPING SETS (
(c1, c2, c3),
(c1, c2),
(c1, c3),
(c2, c3),
(c1),
(c2),
(c3),
()
)
In general, if the number of columns specified in
CUBE
isn
, then we will have 2n combinations
Example
Consider the sales
table

The following query uses CUBE
subclause to generate multiple grouping sets
SELECT
brand,
segment,
sum (quantity)
FROM
sales
GROUP BY
CUBE (brand, segment)
ORDER BY
brand,
segment;

Rollup
(generate reports that contain totals and subtotals)
It is different from
CUBE
subclause,ROLLUP
does not generate all possible grouping sets based on specified columns, it just makes a subset of thoseIt assumes a heirarchy among the input columns and generates all grouping sets that make sense considering the heirarchy. This is the reason why
ROLLUP
is often used to generate the subtotals and grand total for reportsExample,
CUBE (c1, c2, c3)
makes all eight possible grouping sets
(c1, c2, c3)
(c1, c2)
(c2, c3)
(c1, c3)
(c1)
(c2)
(c3)
()
However,
ROLLUP (c1, c2, c3)
generates only four grouping sets, assuming the heirarchyc1 > c2 > c3
as follows
(c1, c2, c3)
(c1, c2)
(c1)
()
Common use of
ROLLUP
is to calculate aggregations of data by year, month and date, considering the heirarchyyear > month > date
It is also possible to do a partial roll up to reduce the number of subtotals generated.
SELECT
c1,
c2,
c3,
aggregate(c4)
FROM
table_name
GROUP BY
c1,
ROLLUP (c2, c3);
Example (number of products sold by brand (subtotal) and by all brands and segments (total))
SELECT
brand,
segment,
SUM (quantity)
FROM
sales
GROUP BY
ROLLUP (brand, segment)
ORDER BY
brand,
segment;

If you change the order of brand and segment, the result will be different as follows:
SELECT
segment,
brand,
SUM (quantity)
FROM
sales
GROUP BY
ROLLUP (segment, brand)
ORDER BY
segment,
brand;

The following statement performs a partial roll-up:
SELECT
segment,
brand,
SUM (quantity)
FROM
sales
GROUP BY
segment,
ROLLUP (brand)
ORDER BY
segment,
brand;

Example (finds the number of rental per day, month, and year)
This is rental
table

SELECT
EXTRACT (YEAR FROM rental_date) y,
EXTRACT (MONTH FROM rental_date) m,
EXTRACT (DAY FROM rental_date) d,
COUNT (rental_id)
FROM
rental
GROUP BY
ROLLUP (
EXTRACT (YEAR FROM rental_date),
EXTRACT (MONTH FROM rental_date),
EXTRACT (DAY FROM rental_date)
);

Last updated