Grouping Sets

The GROUPING SETS clause is used in conjunction with the GROUP BY clause to allow you to easily summarize data by aggregating a fact over as many dimensions as you like.  

SQL Group By

Recall that the SQL GROUP BY clause allows you to summarize an aggregation such as SUM or AVG over the distinct members, or groups, of a categorical variable or dimension. 

You can extend the functionality of the GROUP BY clause using SQL clauses such as CUBE and ROLLUP to select multiple dimensions and create multi-dimensional summaries. These two clauses also generate grand totals, like a report you might see in a spreadsheet application or an accounting style sheet. Just like CUBE and ROLLUP, the SQL GROUPING SETS clause allows you to aggregate data over multiple dimensions but does not generate grand totals. 

Data

Let’s start with an example of a regular GROUP BY aggregation and then compare the result to that of using the GROUPING SETS clause. We’ll use data from a fictional company called Shiny Auto Sales. The schema for the company’s warehouse is displayed in the entity-relationship diagram in Figure below:

  • Let’s say we already created a materialized view of a completely denormalized fact table from the sales star schema
  • This DNsales table was created by joining all the dimension tables to the central fact table and selecting only the columns which are displayed.
  • Each record in DNsales contains details for an individual sales transaction
  • It looks like this:

Group By

Here is what using GROUP BY on the auto class dimension to summarize total sales of new autos by auto class would look like:

SELECT   autoclassname, SUM(amount)
FROM     DNsales
WHERE    isNew=True
GROUP BY autoclassname

Grouping Sets

Now suppose you want to generate a similar view, but

  • You also want to include the total sales by salesperson
  • You can use the GROUPING SETS clause to access both the auto class and salesperson dimensions in the same query

Here is the SQL code you can use to summarize total sales of new autos, both by auto class and by salesperson, all in one expression: 

SELECT   autoclassname,salespersonname, SUM(amount)
FROM     DNsales
WHERE    isNew=True
GROUP BY
GROUPING SETS(autoclassname, salespersonname)

Essentially, applying GROUPING SETS to the two dimensions, salespersonname and autoclassname, provides the same result that you would get by appending the two individual results of applying GROUP BY to each dimension separately