What's the difference between INNER JOIN and LEFT JOIN?

INNER JOIN returns only rows where the join condition matches in both tables — drops anything unmatched. LEFT JOIN keeps all rows from the left table, fills NULLs where the right table has no match. Practical rule: 'I want all customers, plus their orders if any' → LEFT JOIN customers→orders. 'I want only customers who have orders' → INNER JOIN. Most reporting queries you write are LEFT JOINs.

When should I use a CTE instead of a subquery?

CTEs (WITH clauses) make complex queries readable. Use a CTE when you reference the same subquery multiple times, when nesting depth would exceed 2 levels, or when you want named intermediate steps for debugging. Modern engines (PostgreSQL 12+, MySQL 8+) inline CTEs unless you use WITH ... MATERIALIZED, so there's no performance penalty. For one-shot derived tables, plain subqueries are fine.

What does GROUP BY actually do?

GROUP BY collapses rows that share the same value(s) in the grouped columns into one row, applying aggregate functions (COUNT, SUM, AVG, MIN, MAX) across the collapsed rows. Every column in SELECT must either be in GROUP BY or be inside an aggregate. Common mistake: SELECT name, COUNT(*) FROM users — fails because name isn't grouped. Either GROUP BY name or wrap it in an aggregate.

How are window functions different from GROUP BY?

GROUP BY collapses rows. Window functions (OVER clause) keep all rows but compute aggregates across a 'window' of related rows. Use window functions for running totals, ranks, percentiles, lag/lead — anything where you need to know about other rows but want to keep the original row in the output. Example: SELECT name, salary, AVG(salary) OVER (PARTITION BY dept) FROM employees gives every employee with their department's average salary.

Why is my query slow even with an index?

Common reasons: (1) Function on indexed column — WHERE LOWER(email) = 'x' kills the index; use citext or store lowercased values. (2) Leading wildcard — WHERE name LIKE '%foo' can't use a B-tree. (3) OR conditions across columns — engine often picks one index, sequential-scans the rest. (4) Wrong column order in compound index — index on (a,b) doesn't help WHERE b = 1. (5) Out-of-date statistics — run ANALYZE. EXPLAIN is your friend; always check the actual plan.

SQL Cheat Sheet

Comprehensive SQL reference: SELECT, JOINs, GROUP BY, window functions, CTEs, INSERT/UPDATE/DELETE, indexes, and PostgreSQL/MySQL/SQLite differences.

Basic SELECT structure

SELECT column_a, column_b, AGG(column_c) AS total
FROM table_a
JOIN table_b ON table_a.id = table_b.a_id
WHERE table_a.status = 'active'
GROUP BY column_a, column_b
HAVING AGG(column_c) > 100
ORDER BY total DESC
LIMIT 50;

JOIN types

Type	Returns	Use case
`INNER JOIN`	Rows that match in both tables	Customers who have placed orders
`LEFT JOIN`	All left rows + matching right (NULL otherwise)	All customers, with orders if any
`RIGHT JOIN`	All right rows + matching left	Rare — usually rewritten as LEFT JOIN
`FULL OUTER JOIN`	All rows from both, NULL where no match	Reconciling two datasets
`CROSS JOIN`	Cartesian product	Rare — combinations, calendar dimensions
`SELF JOIN`	Same table joined to itself	Hierarchies, employee→manager

Aggregate functions

Function	Notes
`COUNT(*)`	All rows including NULLs
`COUNT(col)`	Non-NULL values in col
`COUNT(DISTINCT col)`	Unique non-NULL values
`SUM / AVG`	Ignore NULLs
`MIN / MAX`	Smallest / largest non-NULL
`STRING_AGG / GROUP_CONCAT`	Concatenate strings (PostgreSQL / MySQL)
`ARRAY_AGG`	Collect into array (PostgreSQL)

Window functions

-- Running total
SELECT
  date,
  amount,
  SUM(amount) OVER (ORDER BY date) AS running_total
FROM transactions;

-- Rank within partitions
SELECT
  department,
  name,
  salary,
  RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;

-- Previous row's value
SELECT
  date,
  amount,
  LAG(amount) OVER (ORDER BY date) AS prev_amount,
  amount - LAG(amount) OVER (ORDER BY date) AS day_over_day_change
FROM transactions;

Function	Returns
`ROW_NUMBER()`	1, 2, 3, … no ties
`RANK()`	1, 2, 2, 4 — ties get same rank, gap after
`DENSE_RANK()`	1, 2, 2, 3 — ties same rank, no gap
`NTILE(n)`	Bucket into n groups (quartiles, deciles)
`LAG(col)` / `LEAD(col)`	Previous / next row's value
`FIRST_VALUE / LAST_VALUE`	First / last in window

CTEs (Common Table Expressions)

WITH active_customers AS (
  SELECT id, name FROM customers WHERE status = 'active'
),
recent_orders AS (
  SELECT customer_id, SUM(total) AS total
  FROM orders
  WHERE created_at >= NOW() - INTERVAL '30 days'
  GROUP BY customer_id
)
SELECT ac.name, COALESCE(ro.total, 0) AS spent_30d
FROM active_customers ac
LEFT JOIN recent_orders ro ON ro.customer_id = ac.id
ORDER BY spent_30d DESC;

-- Recursive CTE: organisation hierarchy
WITH RECURSIVE tree AS (
  SELECT id, name, manager_id, 1 AS level
  FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, e.manager_id, t.level + 1
  FROM employees e JOIN tree t ON e.manager_id = t.id
)
SELECT * FROM tree;

INSERT / UPDATE / DELETE

-- Insert single
INSERT INTO users (name, email) VALUES ('Alice', 'a@x.com');

-- Insert from query
INSERT INTO users_archive (id, name)
SELECT id, name FROM users WHERE created_at < '2024-01-01';

-- Upsert (PostgreSQL)
INSERT INTO users (id, email) VALUES (1, 'a@x.com')
ON CONFLICT (id) DO UPDATE SET email = EXCLUDED.email;

-- Update with JOIN-like logic
UPDATE orders SET status = 'shipped'
WHERE id IN (SELECT order_id FROM shipments WHERE shipped_at IS NOT NULL);

-- Delete with safety
DELETE FROM logs WHERE created_at < NOW() - INTERVAL '90 days';
-- Always SELECT first to verify what will be deleted!

Indexes

Type	When to use
B-tree (default)	Equality, range, sorting — covers 95% of cases
Composite	WHERE on multiple columns. Order matters: most-selective first
Unique	Email, slug, business keys — enforces uniqueness
Partial	Index only WHERE active = true — smaller, faster
Expression / functional	WHERE LOWER(email) = ? → index on LOWER(email)
GIN (PostgreSQL)	Full-text search, JSONB, arrays

CREATE INDEX idx_orders_user_created ON orders (user_id, created_at DESC);
CREATE UNIQUE INDEX idx_users_email ON users (email);
CREATE INDEX idx_active ON users (created_at) WHERE status = 'active';

Dialect quick reference

Operation	PostgreSQL	MySQL	SQLite
String concat	`a \|\| b` or `CONCAT`	`CONCAT(a, b)`	`a \|\| b`
Pagination	`LIMIT 10 OFFSET 20`	`LIMIT 20, 10`	`LIMIT 10 OFFSET 20`
Auto-increment	`SERIAL` or `GENERATED`	`AUTO_INCREMENT`	`INTEGER PRIMARY KEY`
Now()	`NOW()` or `CURRENT_TIMESTAMP`	`NOW()`	`CURRENT_TIMESTAMP`
Upsert	`ON CONFLICT`	`ON DUPLICATE KEY UPDATE`	`ON CONFLICT`
Boolean	Native	TINYINT(1)	INTEGER 0/1

Performance hints

Always run EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL) on slow queries — it shows the actual plan and timing.
SELECT only the columns you need — avoid SELECT * in production code.
WHERE before JOIN where possible — filtering early reduces the rows that need joining.
Use EXISTS or IN for membership tests rather than counting all matches.
Don't wrap indexed columns in functions — that disables the index.
Run ANALYZE (or its equivalent) after large data loads — keeps stats fresh for the planner.

SQLの実行順序（最も重要なメンタルモデル）

方言の違い

SQLはこの順序で書きます：SELECT … FROM … WHERE … GROUP BY … HAVING … ORDER BY … LIMIT。しかし実行される順序は異なります：まずFROM & JOIN（行が存在する）、WHEREでフィルタリング、GROUP BYで集約、HAVINGでグループをフィルタリング、SELECTで列を選択、ORDER BYで並び替え、LIMITで切り取り。これがWHERE句でSELECTのエイリアスを参照できない理由です——WHEREはSELECTが存在する前に実行されます。ORDER BYでは参照できます——SELECTの後に実行されるからです。

コア（SELECT/JOIN/GROUP BY）は共通です。しかしページネーション、文字列連結、日付関数、DDLは異なります。PostgreSQL：LIMIT 10 OFFSET 20、連結には||。MySQL：LIMIT 20, 10、CONCAT()。SQL Server：OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY、+で連結。SQLiteはほぼPostgreSQL互換です。移植性が重要な場合はLIMIT/OFFSETとCONCAT()を使用してください。

バックエンドエンジニアは、ページネーションされたユーザーレコードの取得、分析イベントの集計、レポート用のユーザーと注文テーブルの結合にクエリを書きます。データアナリストはウィンドウ関数とCTEを使ってコホート分析、ファネルドロップオフレポート、7日間のローリング平均を構築します。QAエンジニアはテスト実行後のデータベース状態を確認するアドホッククエリを書きます。基本的なSQLを持つプロダクトマネージャーは、データリクエストを待たずにシンプルなSELECT … WHERE … GROUP BYクエリでビジネスの質問に答えます。DevOpsエンジニアはエラー率やレイテンシのパーセンタイルのためにログテーブルをクエリします。

本番にクエリをデプロイする前に必ずEXPLAIN ANALYZE（PostgreSQL）またはEXPLAIN（MySQL）を実行してください——プランにはインデックスが必要な大きなテーブルへのシーケンシャルスキャンが現れることがよくあります。外部キー列にインデックスを付けてください：JOIN条件でほぼ常に使用され、よく見落とされます。アプリケーションクエリではSELECT *を避けてください——未使用の列のフェッチはI/Oを無駄にし、スキーマ変更時に壊れます。大きな分析クエリでは、5レベルのサブクエリをネストするより中間結果を一時テーブルに展開してください——デバッグしやすく、プランナーが一時テーブルにインデックスを作成できるため多くの場合より速くなります。

SQLデータベース（PostgreSQL、MySQL、SQLite）は、データが固定された明確なスキーマを持つ場合、エンティティ間の関係（ユーザー→注文→製品）を柔軟にクエリする必要がある場合、ACID トランザクション保証が重要な場合（財務記録、在庫）に優れています。NoSQLデータベース（MongoDB、DynamoDB、Cassandra）は、スキーマが流動的な場合、書き込みスループットが読み取りの複雑さをはるかに超える場合、数千ノードでの水平スケーリングが必要な場合に有利です。1億行までのほとんどのウェブアプリケーションでは、適切にインデックスが設定されたPostgreSQLデータベースがNoSQLの複雑さなしにすべてを処理します。

よくある質問

What's the difference between INNER JOIN and LEFT JOIN?: INNER JOIN returns only rows where the join condition matches in both tables — drops anything unmatched. LEFT JOIN keeps all rows from the left table, fills NULLs where the right table has no match. Practical rule: 'I want all customers, plus their orders if any' → LEFT JOIN customers→orders. 'I want only customers who have orders' → INNER JOIN. Most reporting queries you write are LEFT JOINs.
When should I use a CTE instead of a subquery?: CTEs (WITH clauses) make complex queries readable. Use a CTE when you reference the same subquery multiple times, when nesting depth would exceed 2 levels, or when you want named intermediate steps for debugging. Modern engines (PostgreSQL 12+, MySQL 8+) inline CTEs unless you use WITH ... MATERIALIZED, so there's no performance penalty. For one-shot derived tables, plain subqueries are fine.
What does GROUP BY actually do?: GROUP BY collapses rows that share the same value(s) in the grouped columns into one row, applying aggregate functions (COUNT, SUM, AVG, MIN, MAX) across the collapsed rows. Every column in SELECT must either be in GROUP BY or be inside an aggregate. Common mistake: SELECT name, COUNT(*) FROM users — fails because name isn't grouped. Either GROUP BY name or wrap it in an aggregate.
How are window functions different from GROUP BY?: GROUP BY collapses rows. Window functions (OVER clause) keep all rows but compute aggregates across a 'window' of related rows. Use window functions for running totals, ranks, percentiles, lag/lead — anything where you need to know about other rows but want to keep the original row in the output. Example: SELECT name, salary, AVG(salary) OVER (PARTITION BY dept) FROM employees gives every employee with their department's average salary.
Why is my query slow even with an index?: Common reasons: (1) Function on indexed column — WHERE LOWER(email) = 'x' kills the index; use citext or store lowercased values. (2) Leading wildcard — WHERE name LIKE '%foo' can't use a B-tree. (3) OR conditions across columns — engine often picks one index, sequential-scans the rest. (4) Wrong column order in compound index — index on (a,b) doesn't help WHERE b = 1. (5) Out-of-date statistics — run ANALYZE. EXPLAIN is your friend; always check the actual plan.