4. Grouping Data

Group By

(divide rows into groups and applies an aggregate function on each)

  • It divides the rows returned from SELECT statement into groups

  • For each group, we can apply an aggregate function

  • PostgreSQL evaluates this clause something like this

Example 1 (Group By without an aggregate function)

SELECT
	customer_id
FROM
	payment
GROUP BY
	customer_id;

In this case, it works like DISTINCT clause that removes duplicate rows from the result set

Example 2 (Group By with SUM() function)

To select the total amount that each customer has been paid, you use the GROUP BY clause to divide the rows in the payment table into groups grouped by customer id

SELECT
	customer_id,
	SUM(amount)
FROM
	payment
GROUP BY
	customer_id;
ORDER BY
	SUM(amount) DESC;

Example 3 (Group By with JOIN clause)

The following statement uses the GROUP BY clause with the INNER JOIN clause the get the total amount paid by each customer. Unlike the previous example, this query joins the payment table with the customer table and group customers by their names.

SELECT
	first_name || ' ' || last_name full_name,
	SUM(amount) amount
FROM
	payment
INNER JOIN
	customer USING (customer_id)
GROUP BY
	first_name || ' ' || last_name
ORDER BY
	amount DESC;

Example 4 (Group By with COUNT() function)

To find the number of payment transactions that each staff has processed, we can

SELECT
	staff_id,
	COUNT (payment_id)
FROM
	payment
GROUP BY
	staff_id;

Example 5 (Group By with multiple columns)

SELECT
	customer_id,
	staff_id,
	SUM(amount)
FROM
	payment
GROUP BY
	staff_id,
	customer_id
ORDER BY
	customer_id;

Example 6 (Group By with date column)

To group the payment by dates, we can use DATE() function to convert timestamps

SELECT
	DATE(payment_date) paid_date,
	SUM(amount) sum
FROM
	payment
GROUP BY
	DATE(payment_date);

Having

(apply conditions to groups)

  • It specifies a search condition for a group or an aggregate

  • It is often used with GROUP BY clause to filter groups or aggregates

  • Its evaluation sequence is

  • Since, it is evaluated before SELECT clause, we cannot use column aliases in HAVING clause

HAVING vs WHERE

  • WHERE allows us to filter rows based on a specified condition

  • HAVING allows us to filter groups of rows according to specified condition

Example 1 (HAVING with SUM() function)

To find total amount of each customer who have been spending more than 200

SELECT
	customer_id,
	SUM(amount)
FROM
	payment
GROUP BY
	customer_id
HAVING
	SUM(amount) > 200;

Example 2 (HAVING with COUNT)

To select a store that has more than 300 customers

SELECT
	store_id,
	COUNT (customer_id)
FROM
	customer
GROUP BY
	store_id
HAVING
	COUNT (customer_id) > 300;

Last updated