Day 02 of SQL for Data Engineers: Sorting & Paging
Part of Week 1: Query Essentials
Welcome back to the SQL for Data Engineering series, sticking with the weekly theme of query fundamentals we are going to be touching on sorting and paging, essentially… making results deterministic. You’ll learn reliably with ORDER BY, handle ties, control where NULLs appear, and paginate results, first with OFFSET/LIMIT, then with the faster keyset/seek approach for bigger datasets.
TL;DR
Results are unordered unless you add
ORDER BY.Add a tie-breaker (e.g.,
ORDER BY event_ts, id) for stable, repeatable results.Control nulls with
NULLS FIRST/LAST(or aCASEexpression).OFFSET/LIMITis simple but slow at deep pages; prefer keyset pagination.
The Core Patterns
Deterministic ordering with tie-breaking
ORDER BYtells the database exactly how to arrange rows.If the first sort column has duplicate values, the final order among those ties is undefined unless you add another column.
A unique tie-breaker (often the surrogate key) makes the order stable across runs.
-- July 2025 orders, oldest→newest, break ties by order_id
SELECT order_id, customer_id, order_ts, status
FROM analytics.orders
WHERE order_ts >= TIMESTAMP '2025-07-01 00:00:00'
AND order_ts < TIMESTAMP '2025-08-01 00:00:00'
ORDER BY order_ts ASC, order_id ASC;Put real emails first and the missing ones last
By default, different engines place
NULLs differently (some first, some last).You can control placement explicitly: use
NULLS FIRST/LAST(where supported) or sort by a boolean/CASEexpression that groups missing values.
-- Real emails first, newest customers first within each group
SELECT customer_id, email, created_at
FROM analytics.customers
ORDER BY
(TRIM(email) = '' OR email IS NULL) ASC, -- false(0)=real first, true(1)=missing last
created_at DESC;
OFFSET/LIMIT paging (simple, but costly at deep pages)
OFFSET n LIMIT mskipsnrows and returnsm.It’s great for beginner paging and small pages, but for page 500 the engine still has to scan and sort all previous rows before skipping them, which gets slow and expensive.
-- Page 2 of July orders, 2 rows per page
WITH ordered AS (
SELECT order_id, order_ts
FROM analytics.orders
WHERE order_ts >= TIMESTAMP '2025-07-01 00:00:00'
AND order_ts < TIMESTAMP '2025-08-01 00:00:00'
ORDER BY order_ts, order_id
)
SELECT * FROM ordered
OFFSET 2 LIMIT 2; -- skips first 2, returns next 2
Keyset/seek pagination (fast and scalable)
Instead of “skip N rows,” ask for “rows after the last row you saw.”
You remember the last page’s sort keys (e.g.,
last_ts,last_id) and fetch strictly greater than that position.This avoids scanning/skipping thousands of rows and scales with big tables.
-- After previous page ended at (:last_ts, :last_id)
SELECT order_id, order_ts
FROM analytics.orders
WHERE order_ts >= TIMESTAMP '2025-07-01'
AND order_ts < TIMESTAMP '2025-08-01'
AND (order_ts > :last_ts OR (order_ts = :last_ts AND order_id > :last_id))
ORDER BY order_ts, order_id
LIMIT :page_size;Standard alternative to LIMIT
Many engines support the ANSI form
FETCH FIRST n ROWS ONLY.It’s functionally similar to
LIMIT nbut more portable in some enterprise environments.
SELECT ...
FROM ...
ORDER BY ...
FETCH FIRST 10 ROWS ONLY;Data Engineering Wisdom
If a query feeds downstream jobs or tests, always specify
ORDER BYwith a unique final key.Be explicit about
NULLplacement to keep CSV exports/reports consistent.Use
OFFSET/LIMITfor quick prototypes; switch to keyset when performance matters.Log your paging cursor (the last seen keys) so retries can resume safely.
Mini-exercises (5–10 minutes)
July 2025 ordered list: Return
order_id, order_tsfor July 2025 ascending, tie-break byorder_id.Expected order_ids:
1001, 1002, 1003.
Customers: real emails first: Sort with non-blank emails first, then
created_at DESC.Expected customer_id order:
4, 1, 3, 2.
Paging practice: Using the result from #1 and page size = 2, what does page 2 return with
OFFSET 2 LIMIT 2?Expected order_ids:
1003.
Knowledge Check - Day 02 + Day 01 Answers
Let’s get the answers to yesterdays quiz out the way first:
Q1: Choice 3
Q2: Choice 2
Tips for Different Engines
Postgres/DuckDB/SparkSQL: Support
LIMIT/OFFSET,FETCH FIRST, andNULLS FIRST/LAST.SQL Server: Use
ORDER BY … OFFSET … ROWS FETCH NEXT … ROWS ONLY; emulateNULLS LASTwith aCASEexpression.
What’s Next?
Day 03: JOINS 101
We’ll cover combining tables safely: picking correct join keys, understanding null expansion in LEFT JOINs, and avoiding accidental cartesian products, with clear visuals and runnable examples. JOINs will be across two separate posts to make sure we cover these properly.
Until then, see you for the next one and… Happy Coding!


