Data grouping and aggregation techniques allow us to effectively summarize data, making it easier to perform analytical queries. SQL offers various ways to group and filter aggregated data

Table of Contents
1. GROUP BY
Definition:
The GROUP BY clause groups rows with the same values in specified columns are combined into summary rows. It is commonly used with aggregate functions like SUM, COUNT, AVG, etc. to perform calculations on each group.
Syntax:
SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;
Example: Using an Orders table with order information for various customers:
Orders Table:
OrderID | CustomerID | OrderDate | Amount |
1 | 101 | 2024-01-05 | 500 |
2 | 102 | 2024-01-10 | 300 |
3 | 101 | 2024-01-12 | 200 |
4 | 103 | 2024-01-15 | 700 |
5 | 102 | 2024-02-01 | 150 |
Query:
SELECT CustomerID, SUM(Amount) AS TotalAmount
FROM Orders
GROUP BY CustomerID;
Output:
CustomerID | TotalAmount |
101 | 700 |
102 | 450 |
103 | 700 |
2. HAVING
Definition:
The HAVING clause is used to filter groups created by the GROUP BY clause, similar to how the WHERE clause is used with individual rows. You use HAVING to set conditions on aggregated data.
Syntax:
SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1
HAVING aggregate_function(column2) condition;
Example: Continuing with the Orders table, to filter customers with a total order amount greater than 500:
Query:
SELECT CustomerID, SUM(Amount) AS TotalAmount
FROM Orders
GROUP BY CustomerID
HAVING SUM(Amount) > 500;
Output:
CustomerID | TotalAmount |
101 | 700 |
103 | 700 |
3. CUBE and ROLLUP for Multi-Level Aggregation
Definition:
CUBE and ROLLUP are extensions of the GROUP BY clause that allow for multi-level aggregations.
- ROLLUP generates a hierarchical summary (from most detailed to least detailed).
- CUBE generates all possible combinations of specified columns, resulting in a full cross-tabulation.
Syntax for ROLLUP:
SELECT column1, column2, aggregate_function(column3)
FROM table_name
GROUP BY ROLLUP (column1, column2);
Syntax for CUBE:
SELECT column1, column2, aggregate_function(column3)
FROM table_name
GROUP BY CUBE (column1, column2);
Note: MySQL and PostgreSQL support both ROLLUP and CUBE syntax. Some databases may require enabling additional settings for CUBE.
Example Using ROLLUP
Suppose we want a hierarchical summary of total sales by CustomerID and OrderDate using ROLLUP.
Query:
SELECT CustomerID, OrderDate, SUM(Amount) AS TotalAmount
FROM Orders
GROUP BY ROLLUP (CustomerID, OrderDate);
Output:
CustomerID | OrderDate | TotalAmount |
101 | 2024-01-05 | 500 |
101 | 2024-01-12 | 200 |
101 | NULL | 700 |
102 | 2024-01-10 | 300 |
102 | 2024-02-01 | 150 |
102 | NULL | 450 |
103 | 2024-01-15 | 700 |
103 | NULL | 700 |
NULL | NULL | 1850 |
In the ROLLUP output:
- NULL in OrderDate indicates totals for each CustomerID.
- NULL in both CustomerID and OrderDate represents the grand total.
Example Using CUBE
Now, let’s generate all combinations of CustomerID and OrderDate for total sales.
Query:
SELECT CustomerID, OrderDate, SUM(Amount) AS TotalAmount
FROM Orders
GROUP BY CUBE (CustomerID, OrderDate);
GROUP BY CUBE (CustomerID, OrderDate);
Output:
CustomerID | OrderDate | TotalAmount |
101 | 2024-01-05 | 500 |
101 | 2024-01-12 | 200 |
101 | NULL | 700 |
102 | 2024-01-10 | 300 |
102 | 2024-02-01 | 150 |
102 | NULL | 450 |
103 | 2024-01-15 | 700 |
103 | NULL | 700 |
NULL | 2024-01-05 | 500 |
NULL | 2024-01-10 | 300 |
NULL | 2024-01-12 | 200 |
NULL | 2024-01-15 | 700 |
NULL | 2024-02-01 | 150 |
NULL | NULL | 1850 |
In the CUBE output:
- Each possible combination of CustomerID and OrderDate is shown.
- The grand total is represented with NULL in both columns.
Conclusion
Data grouping and aggregation in SQL, especially with the GROUP BY, HAVING, ROLLUP, and CUBE clauses, are essential for analyzing large datasets and obtaining summary insights. GROUP BY allows you to break down data into manageable groups while HAVING lets you filter these groups based on aggregate conditions. For advanced scenarios, ROLLUP and CUBE enable multi-level aggregations, providing hierarchical and full cross-tabulated summaries, respectively. Understanding these tools and their syntax variations across SQL databases such as MySQL and PostgreSQL allows you to structure efficient queries tailored to complex analytical needs.
Most Commented