Learning SQL - Part 2

I learned the magic of CTEs

Welcome to part 2 of my Learn SQL series! Learning advanced SQL has been a fun ride so far, and to be honest, I’ve been seeing the benefits of understanding SQL so much more and more as I look into advanced data engineering and analytics.

Today’s topic is about CTEs, or also commonly known as Common Table Expressions.

The reason I want to go into CTEs is quite simple - it’s easy to read and understand. I’m sure, many of you would also somewhat agree!

What is a CTE?

A Common Table Expression is a type of temporary table which exists throughout the length of a long query. You can kind of think of it like a temporary result that you can use within main table queries via SELECT, UPDATE, or DELETE statements.

Why CTEs?

As I mentioned before, the main reason why I like CTEs is that they’re so much easier to parse out by eye compared to other temporary expressions such as derived tables.

Consider the following example.

Given the following table, write a SQL query which finds the largest number that only appears once.

num
8
8
3
3
1
4
5
6

For the sample table above, the query should return:

num
6

There are two ways to compute this solution. We’ll go over a derived table solution, and a solution which utilizes CTEs.

Solution

The idea behind the solution is simple. We want to do the following:

  1. Find the set of numbers where total count is not greater than 1
  2. Find the max number from the set specified in 1

Using a Derived Table

SELECT 
    MAX(num) AS num
FROM (
    SELECT 
        num 
    FROM my_numbers
    GROUP BY num
    HAVING COUNT(*) = 1
) valid_nums

Using a CTE

WITH ValidNums AS (
    SELECT 
        num 
    FROM my_numbers
    GROUP BY num
    HAVING COUNT(*) = 1
)

SELECT 
    MAX(num) AS num 
FROM ValidNums

Summary

Did you notice anything different between the two queries?

To be fair, both queries look relatively the same since this problem doesn’t require complicated table joins or filtering. I, however, think it’s important to see that the logical grouping (at eye level) in the CTE solution parses better than the derived query solution.

It is possible to give longer names to derived queries as well as CTEs. For example, the derived query solution calls our temporary table: valid_nums.

This method of naming can cause glance-over parsing difficulty when it involves multiple complex joins. The join names and definitions are usually cluttered on the same lines, which makes it harder to understand.

Kind of like programming best practices, right?

Ready to try it out? I promise you won’t regret it!

Yiping Su
Yiping Su
Engineering | Analytics

I am interested in data, software engineering, and the application of computer science concepts in real-world scenarios.

Related