Day 06 of SQL for Data Engineers: CASE Expressions & Business Rules
Part of Week 1: Query Essentials
With todays post. we’ll turn messy real-world data into tidy, useful columns using CASE. You’ll build flags, buckets, and canonical labels, learn how condition order affects results, and see when to replace big CASE blocks with a lookup table for maintainability.
TL;DR
CASElets you compute a value based on conditions (like IF/ELSE).Use
ELSEto catch “everything else” (avoid accidentalNULLs).Order matters:
CASEreturns the first matching branch.Prefer lookup tables over giant
CASEstatements for business rules.
The Patterns
Searched CASE: The everyday workhorse
“Searched” CASE checks boolean conditions in order and returns the first match.
-- Flag orders by status
SELECT
order_id,
status,
CASE
WHEN status = 'completed' THEN 'success'
WHEN status = 'cancelled' THEN 'cancelled'
ELSE 'other'
END AS status_flag
FROM analytics.orders;Simple CASE: Convenient equality mapping
When you’re matching a single value to several options, simple CASE is tidy.
-- Canonicalise site names
SELECT
order_id,
site,
CASE site
WHEN 'dudcorpuk' THEN 'uk'
WHEN 'dudcorpes' THEN 'es'
ELSE 'other'
END AS site_region
FROM analytics.orders;Bucketing values (dates, numbers)
Turn continuous values into segments, great for dashboards and cohorting.
-- Bucket orders by time of day
SELECT
order_id,
order_ts,
CASE
WHEN EXTRACT(HOUR FROM order_ts) < 12 THEN 'morning'
WHEN EXTRACT(HOUR FROM order_ts) < 18 THEN 'afternoon'
ELSE 'evening'
END AS day_part
FROM analytics.orders;Building business flags (and avoiding divide-by-zero)
You can combine conditions; just guard risky expressions first.
-- Mark suspicious orders (toy example)
SELECT
order_id,
status,
CASE
WHEN status = 'cancelled' THEN 'suspicious'
WHEN status = 'completed' THEN 'ok'
ELSE 'review'
END AS review_flag
FROM analytics.orders;Replace giant CASE with a lookup table
Hardcoding dozens of business rules in SQL makes changes risky. A small reference table is cleaner and testable.
-- 1) A tiny reference table for status mapping
CREATE OR REPLACE TABLE ref_status_map (
raw_status VARCHAR PRIMARY KEY,
canonical VARCHAR NOT NULL
);
INSERT INTO ref_status_map VALUES
('completed','completed'),
('complete','completed'),
('cancelled','cancelled'),
('canceled','cancelled');
-- 2) Use it instead of a long CASE
SELECT o.order_id, COALESCE(m.canonical, 'other') AS status_canonical
FROM analytics.orders o
LEFT JOIN ref_status_map m
ON LOWER(o.status) = m.raw_status;Segment customers using a CASE over aggregates
Derive segments from metrics you compute with GROUP BY.
-- Segment by total lifetime orders
WITH cust_orders AS (
SELECT customer_id, COUNT(*) AS total_orders
FROM analytics.orders
GROUP BY customer_id
)
SELECT
c.customer_id,
c.email,
COALESCE(co.total_orders, 0) AS total_orders,
CASE
WHEN COALESCE(co.total_orders,0) = 0 THEN 'new'
WHEN COALESCE(co.total_orders,0) BETWEEN 1 AND 2 THEN 'active'
ELSE 'vip'
END AS customer_segment
FROM analytics.customers c
LEFT JOIN cust_orders co ON co.customer_id = c.customer_id
ORDER BY customer_segment, c.customer_id;Use CASE to control ordering & presentation
Consistent reporting order matters, CASE can drive custom sort order.
-- Show segments in a stable custom order
ORDER BY CASE customer_segment
WHEN 'vip' THEN 1
WHEN 'active' THEN 2
WHEN 'new' THEN 3
ELSE 4
END,
customer_id;Data Engineering Wisdom
Always include
ELSEand log unexpected values, don’t hide them asNULL.Put complex or changing rules in a reference table; your SQL becomes a simple join.
Keep
CASElogic idempotent and documented (comments or a design note).Beware performance: huge
CASEtrees in hot paths can be slower than a join to a small mapping table.Reuse: materialise frequently used mappings as views (e.g.,
v_orders_canonical).
Mini-exercises (5–10 minutes)
Email quality flag: On
analytics.customers, addemail_qualitywith valuesvalid(non-blank),missing(NULL/blank).Hint:
TRIM(email) = '' OR email IS NULL.Expected: id
1,4 → valid;2,3 → missing.
Order day part: Add
day_part(morning/afternoon/evening) usingEXTRACT(HOUR FROM order_ts).Expected: check against your
order_tsvalues.
Customer segment: Build the
cust_ordersCTE and assignnew/active/vipas in the example.Expected with seed: cust
1 → active (2 orders),2 → active (1),3 → active (Aug order),4 → new (0).
Lookup vs CASE: Create
ref_site_region(site, region)with rows('jdsportsuk','uk'),('jdsportses','es'). Join it to orders and produceregion, defaulting tootherwhen missing.Compare: implement once with
CASE, once with the lookup; note readability.
Knowledge Check - Day 06 + Day 05 Answers
Answers to yesterdays quiz:
Q1: Choice 1
Q2: Choice 2
Q3: Choice 1
Tips for Different Engines
Postgres/DuckDB/SparkSQL: support searched & simple
CASE;EXTRACT(HOUR FROM ts)works;FILTERon aggregates is available (not needed here).SQL Server: same
CASEsyntax; useDATEPART(HOUR, order_ts)instead ofEXTRACT.Short-circuiting: Engines generally return on first match; still guard risky cases early to avoid errors (e.g., check divisor before dividing).
What’s next?
Day 07: Text Cleaning & Regex:
We’ll standardise text with TRIM, UPPER/LOWER, REPLACE, and dip into regex to extract patterns, plus performance caveats and where to put heavy text cleaning in your medallion flow.
As always, have a good one and Happy Coding!


