Stax

SQL Cheat Sheet

Comprehensive SQL reference: SELECT, JOINs, GROUP BY, window functions, CTEs, INSERT/UPDATE/DELETE, indexes, and PostgreSQL/MySQL/SQLite differences.

Basic SELECT structure

SELECT column_a, column_b, AGG(column_c) AS total
FROM table_a
JOIN table_b ON table_a.id = table_b.a_id
WHERE table_a.status = 'active'
GROUP BY column_a, column_b
HAVING AGG(column_c) > 100
ORDER BY total DESC
LIMIT 50;

JOIN types

TypeReturnsUse case
INNER JOINRows that match in both tablesCustomers who have placed orders
LEFT JOINAll left rows + matching right (NULL otherwise)All customers, with orders if any
RIGHT JOINAll right rows + matching leftRare — usually rewritten as LEFT JOIN
FULL OUTER JOINAll rows from both, NULL where no matchReconciling two datasets
CROSS JOINCartesian productRare — combinations, calendar dimensions
SELF JOINSame table joined to itselfHierarchies, employee→manager

Aggregate functions

FunctionNotes
COUNT(*)All rows including NULLs
COUNT(col)Non-NULL values in col
COUNT(DISTINCT col)Unique non-NULL values
SUM / AVGIgnore NULLs
MIN / MAXSmallest / largest non-NULL
STRING_AGG / GROUP_CONCATConcatenate strings (PostgreSQL / MySQL)
ARRAY_AGGCollect into array (PostgreSQL)

Window functions

-- Running total
SELECT
  date,
  amount,
  SUM(amount) OVER (ORDER BY date) AS running_total
FROM transactions;

-- Rank within partitions
SELECT
  department,
  name,
  salary,
  RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;

-- Previous row's value
SELECT
  date,
  amount,
  LAG(amount) OVER (ORDER BY date) AS prev_amount,
  amount - LAG(amount) OVER (ORDER BY date) AS day_over_day_change
FROM transactions;
FunctionReturns
ROW_NUMBER()1, 2, 3, … no ties
RANK()1, 2, 2, 4 — ties get same rank, gap after
DENSE_RANK()1, 2, 2, 3 — ties same rank, no gap
NTILE(n)Bucket into n groups (quartiles, deciles)
LAG(col) / LEAD(col)Previous / next row's value
FIRST_VALUE / LAST_VALUEFirst / last in window

CTEs (Common Table Expressions)

WITH active_customers AS (
  SELECT id, name FROM customers WHERE status = 'active'
),
recent_orders AS (
  SELECT customer_id, SUM(total) AS total
  FROM orders
  WHERE created_at >= NOW() - INTERVAL '30 days'
  GROUP BY customer_id
)
SELECT ac.name, COALESCE(ro.total, 0) AS spent_30d
FROM active_customers ac
LEFT JOIN recent_orders ro ON ro.customer_id = ac.id
ORDER BY spent_30d DESC;

-- Recursive CTE: organisation hierarchy
WITH RECURSIVE tree AS (
  SELECT id, name, manager_id, 1 AS level
  FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, e.manager_id, t.level + 1
  FROM employees e JOIN tree t ON e.manager_id = t.id
)
SELECT * FROM tree;

INSERT / UPDATE / DELETE

-- Insert single
INSERT INTO users (name, email) VALUES ('Alice', 'a@x.com');

-- Insert from query
INSERT INTO users_archive (id, name)
SELECT id, name FROM users WHERE created_at < '2024-01-01';

-- Upsert (PostgreSQL)
INSERT INTO users (id, email) VALUES (1, 'a@x.com')
ON CONFLICT (id) DO UPDATE SET email = EXCLUDED.email;

-- Update with JOIN-like logic
UPDATE orders SET status = 'shipped'
WHERE id IN (SELECT order_id FROM shipments WHERE shipped_at IS NOT NULL);

-- Delete with safety
DELETE FROM logs WHERE created_at < NOW() - INTERVAL '90 days';
-- Always SELECT first to verify what will be deleted!

Indexes

TypeWhen to use
B-tree (default)Equality, range, sorting — covers 95% of cases
CompositeWHERE on multiple columns. Order matters: most-selective first
UniqueEmail, slug, business keys — enforces uniqueness
PartialIndex only WHERE active = true — smaller, faster
Expression / functionalWHERE LOWER(email) = ? → index on LOWER(email)
GIN (PostgreSQL)Full-text search, JSONB, arrays
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at DESC);
CREATE UNIQUE INDEX idx_users_email ON users (email);
CREATE INDEX idx_active ON users (created_at) WHERE status = 'active';

Dialect quick reference

OperationPostgreSQLMySQLSQLite
String concata || b or CONCATCONCAT(a, b)a || b
PaginationLIMIT 10 OFFSET 20LIMIT 20, 10LIMIT 10 OFFSET 20
Auto-incrementSERIAL or GENERATEDAUTO_INCREMENTINTEGER PRIMARY KEY
Now()NOW() or CURRENT_TIMESTAMPNOW()CURRENT_TIMESTAMP
UpsertON CONFLICTON DUPLICATE KEY UPDATEON CONFLICT
BooleanNativeTINYINT(1)INTEGER 0/1

Performance hints

  • Always run EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL) on slow queries — it shows the actual plan and timing.
  • SELECT only the columns you need — avoid SELECT * in production code.
  • WHERE before JOIN where possible — filtering early reduces the rows that need joining.
  • Use EXISTS or IN for membership tests rather than counting all matches.
  • Don't wrap indexed columns in functions — that disables the index.
  • Run ANALYZE (or its equivalent) after large data loads — keeps stats fresh for the planner.

The SQL execution order (the one mental model that matters)

SQL is written in this order: SELECT … FROM … WHERE … GROUP BY … HAVING … ORDER BY … LIMIT. But it's executed in a different order: FROM & JOIN first (the rows exist), WHERE filters, GROUP BY collapses, HAVING filters groups, SELECT picks columns, ORDER BY sorts, LIMIT cuts. This explains why you can't reference an alias from SELECT in WHERE — WHERE runs before SELECT exists. You can in ORDER BY because it runs after.

Dialect differences

The core (SELECT/JOIN/GROUP BY) is universal. But pagination, string concatenation, date functions, and DDL diverge. PostgreSQL: LIMIT 10 OFFSET 20, || for concat. MySQL: LIMIT 20, 10, CONCAT(). SQL Server: OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY, + concat. SQLite is mostly PostgreSQL-compatible. When portability matters, prefer LIMIT/OFFSET and CONCAT().

Frequently asked questions

What's the difference between INNER JOIN and LEFT JOIN?
INNER JOIN returns only rows where the join condition matches in both tables — drops anything unmatched. LEFT JOIN keeps all rows from the left table, fills NULLs where the right table has no match. Practical rule: 'I want all customers, plus their orders if any' → LEFT JOIN customers→orders. 'I want only customers who have orders' → INNER JOIN. Most reporting queries you write are LEFT JOINs.
When should I use a CTE instead of a subquery?
CTEs (WITH clauses) make complex queries readable. Use a CTE when you reference the same subquery multiple times, when nesting depth would exceed 2 levels, or when you want named intermediate steps for debugging. Modern engines (PostgreSQL 12+, MySQL 8+) inline CTEs unless you use WITH ... MATERIALIZED, so there's no performance penalty. For one-shot derived tables, plain subqueries are fine.
What does GROUP BY actually do?
GROUP BY collapses rows that share the same value(s) in the grouped columns into one row, applying aggregate functions (COUNT, SUM, AVG, MIN, MAX) across the collapsed rows. Every column in SELECT must either be in GROUP BY or be inside an aggregate. Common mistake: SELECT name, COUNT(*) FROM users — fails because name isn't grouped. Either GROUP BY name or wrap it in an aggregate.
How are window functions different from GROUP BY?
GROUP BY collapses rows. Window functions (OVER clause) keep all rows but compute aggregates across a 'window' of related rows. Use window functions for running totals, ranks, percentiles, lag/lead — anything where you need to know about other rows but want to keep the original row in the output. Example: SELECT name, salary, AVG(salary) OVER (PARTITION BY dept) FROM employees gives every employee with their department's average salary.
Why is my query slow even with an index?
Common reasons: (1) Function on indexed column — WHERE LOWER(email) = 'x' kills the index; use citext or store lowercased values. (2) Leading wildcard — WHERE name LIKE '%foo' can't use a B-tree. (3) OR conditions across columns — engine often picks one index, sequential-scans the rest. (4) Wrong column order in compound index — index on (a,b) doesn't help WHERE b = 1. (5) Out-of-date statistics — run ANALYZE. EXPLAIN is your friend; always check the actual plan.

Related tools