Day 06 of SQL for Data Engineers: CASE Expressions & Business Rules
Part of Week 1: Query Essentials
With todays post. we’ll turn messy real-world data into tidy, useful columns using CASE
. You’ll build flags, buckets, and canonical labels, learn how condition order affects results, and see when to replace big CASE
blocks with a lookup table for maintainability.
TL;DR
CASE
lets you compute a value based on conditions (like IF/ELSE).Use
ELSE
to catch “everything else” (avoid accidentalNULL
s).Order matters:
CASE
returns the first matching branch.Prefer lookup tables over giant
CASE
statements for business rules.
The Patterns
Searched CASE: The everyday workhorse
“Searched” CASE
checks boolean conditions in order and returns the first match.
-- Flag orders by status
SELECT
order_id,
status,
CASE
WHEN status = 'completed' THEN 'success'
WHEN status = 'cancelled' THEN 'cancelled'
ELSE 'other'
END AS status_flag
FROM analytics.orders;
Simple CASE: Convenient equality mapping
When you’re matching a single value to several options, simple CASE
is tidy.
-- Canonicalise site names
SELECT
order_id,
site,
CASE site
WHEN 'dudcorpuk' THEN 'uk'
WHEN 'dudcorpes' THEN 'es'
ELSE 'other'
END AS site_region
FROM analytics.orders;
Bucketing values (dates, numbers)
Turn continuous values into segments, great for dashboards and cohorting.
-- Bucket orders by time of day
SELECT
order_id,
order_ts,
CASE
WHEN EXTRACT(HOUR FROM order_ts) < 12 THEN 'morning'
WHEN EXTRACT(HOUR FROM order_ts) < 18 THEN 'afternoon'
ELSE 'evening'
END AS day_part
FROM analytics.orders;
Building business flags (and avoiding divide-by-zero)
You can combine conditions; just guard risky expressions first.
-- Mark suspicious orders (toy example)
SELECT
order_id,
status,
CASE
WHEN status = 'cancelled' THEN 'suspicious'
WHEN status = 'completed' THEN 'ok'
ELSE 'review'
END AS review_flag
FROM analytics.orders;
Replace giant CASE with a lookup table
Hardcoding dozens of business rules in SQL makes changes risky. A small reference table is cleaner and testable.
-- 1) A tiny reference table for status mapping
CREATE OR REPLACE TABLE ref_status_map (
raw_status VARCHAR PRIMARY KEY,
canonical VARCHAR NOT NULL
);
INSERT INTO ref_status_map VALUES
('completed','completed'),
('complete','completed'),
('cancelled','cancelled'),
('canceled','cancelled');
-- 2) Use it instead of a long CASE
SELECT o.order_id, COALESCE(m.canonical, 'other') AS status_canonical
FROM analytics.orders o
LEFT JOIN ref_status_map m
ON LOWER(o.status) = m.raw_status;
Segment customers using a CASE over aggregates
Derive segments from metrics you compute with GROUP BY
.
-- Segment by total lifetime orders
WITH cust_orders AS (
SELECT customer_id, COUNT(*) AS total_orders
FROM analytics.orders
GROUP BY customer_id
)
SELECT
c.customer_id,
c.email,
COALESCE(co.total_orders, 0) AS total_orders,
CASE
WHEN COALESCE(co.total_orders,0) = 0 THEN 'new'
WHEN COALESCE(co.total_orders,0) BETWEEN 1 AND 2 THEN 'active'
ELSE 'vip'
END AS customer_segment
FROM analytics.customers c
LEFT JOIN cust_orders co ON co.customer_id = c.customer_id
ORDER BY customer_segment, c.customer_id;
Use CASE to control ordering & presentation
Consistent reporting order matters, CASE
can drive custom sort order.
-- Show segments in a stable custom order
ORDER BY CASE customer_segment
WHEN 'vip' THEN 1
WHEN 'active' THEN 2
WHEN 'new' THEN 3
ELSE 4
END,
customer_id;
Data Engineering Wisdom
Always include
ELSE
and log unexpected values, don’t hide them asNULL
.Put complex or changing rules in a reference table; your SQL becomes a simple join.
Keep
CASE
logic idempotent and documented (comments or a design note).Beware performance: huge
CASE
trees in hot paths can be slower than a join to a small mapping table.Reuse: materialise frequently used mappings as views (e.g.,
v_orders_canonical
).
Mini-exercises (5–10 minutes)
Email quality flag: On
analytics.customers
, addemail_quality
with valuesvalid
(non-blank),missing
(NULL/blank).Hint:
TRIM(email) = '' OR email IS NULL
.Expected: id
1,4 → valid
;2,3 → missing
.
Order day part: Add
day_part
(morning/afternoon/evening
) usingEXTRACT(HOUR FROM order_ts)
.Expected: check against your
order_ts
values.
Customer segment: Build the
cust_orders
CTE and assignnew/active/vip
as in the example.Expected with seed: cust
1 → active (2 orders)
,2 → active (1)
,3 → active (Aug order)
,4 → new (0)
.
Lookup vs CASE: Create
ref_site_region(site, region)
with rows('jdsportsuk','uk')
,('jdsportses','es')
. Join it to orders and produceregion
, defaulting toother
when missing.Compare: implement once with
CASE
, once with the lookup; note readability.
Knowledge Check - Day 06 + Day 05 Answers
Answers to yesterdays quiz:
Q1: Choice 1
Q2: Choice 2
Q3: Choice 1
Tips for Different Engines
Postgres/DuckDB/SparkSQL: support searched & simple
CASE
;EXTRACT(HOUR FROM ts)
works;FILTER
on aggregates is available (not needed here).SQL Server: same
CASE
syntax; useDATEPART(HOUR, order_ts)
instead ofEXTRACT
.Short-circuiting: Engines generally return on first match; still guard risky cases early to avoid errors (e.g., check divisor before dividing).
What’s next?
Day 07: Text Cleaning & Regex:
We’ll standardise text with TRIM
, UPPER/LOWER
, REPLACE
, and dip into regex to extract patterns, plus performance caveats and where to put heavy text cleaning in your medallion flow.
As always, have a good one and Happy Coding!