What's the difference between INNER JOIN and LEFT JOIN?

INNER JOIN returns only rows where the join condition matches in both tables — drops anything unmatched. LEFT JOIN keeps all rows from the left table, fills NULLs where the right table has no match. Practical rule: 'I want all customers, plus their orders if any' → LEFT JOIN customers→orders. 'I want only customers who have orders' → INNER JOIN. Most reporting queries you write are LEFT JOINs.

When should I use a CTE instead of a subquery?

CTEs (WITH clauses) make complex queries readable. Use a CTE when you reference the same subquery multiple times, when nesting depth would exceed 2 levels, or when you want named intermediate steps for debugging. Modern engines (PostgreSQL 12+, MySQL 8+) inline CTEs unless you use WITH ... MATERIALIZED, so there's no performance penalty. For one-shot derived tables, plain subqueries are fine.

What does GROUP BY actually do?

GROUP BY collapses rows that share the same value(s) in the grouped columns into one row, applying aggregate functions (COUNT, SUM, AVG, MIN, MAX) across the collapsed rows. Every column in SELECT must either be in GROUP BY or be inside an aggregate. Common mistake: SELECT name, COUNT(*) FROM users — fails because name isn't grouped. Either GROUP BY name or wrap it in an aggregate.

How are window functions different from GROUP BY?

GROUP BY collapses rows. Window functions (OVER clause) keep all rows but compute aggregates across a 'window' of related rows. Use window functions for running totals, ranks, percentiles, lag/lead — anything where you need to know about other rows but want to keep the original row in the output. Example: SELECT name, salary, AVG(salary) OVER (PARTITION BY dept) FROM employees gives every employee with their department's average salary.

Why is my query slow even with an index?

Common reasons: (1) Function on indexed column — WHERE LOWER(email) = 'x' kills the index; use citext or store lowercased values. (2) Leading wildcard — WHERE name LIKE '%foo' can't use a B-tree. (3) OR conditions across columns — engine often picks one index, sequential-scans the rest. (4) Wrong column order in compound index — index on (a,b) doesn't help WHERE b = 1. (5) Out-of-date statistics — run ANALYZE. EXPLAIN is your friend; always check the actual plan.

Hoja de referencia SQL

Referencia completa SQL: SELECT, JOINs, GROUP BY, funciones de ventana, CTEs, INSERT/UPDATE/DELETE, índices y diferencias entre PostgreSQL/MySQL/SQLite.

Basic SELECT structure

SELECT column_a, column_b, AGG(column_c) AS total
FROM table_a
JOIN table_b ON table_a.id = table_b.a_id
WHERE table_a.status = 'active'
GROUP BY column_a, column_b
HAVING AGG(column_c) > 100
ORDER BY total DESC
LIMIT 50;

JOIN types

Type	Returns	Use case
`INNER JOIN`	Rows that match in both tables	Customers who have placed orders
`LEFT JOIN`	All left rows + matching right (NULL otherwise)	All customers, with orders if any
`RIGHT JOIN`	All right rows + matching left	Rare — usually rewritten as LEFT JOIN
`FULL OUTER JOIN`	All rows from both, NULL where no match	Reconciling two datasets
`CROSS JOIN`	Cartesian product	Rare — combinations, calendar dimensions
`SELF JOIN`	Same table joined to itself	Hierarchies, employee→manager

Aggregate functions

Function	Notes
`COUNT(*)`	All rows including NULLs
`COUNT(col)`	Non-NULL values in col
`COUNT(DISTINCT col)`	Unique non-NULL values
`SUM / AVG`	Ignore NULLs
`MIN / MAX`	Smallest / largest non-NULL
`STRING_AGG / GROUP_CONCAT`	Concatenate strings (PostgreSQL / MySQL)
`ARRAY_AGG`	Collect into array (PostgreSQL)

Window functions

-- Running total
SELECT
  date,
  amount,
  SUM(amount) OVER (ORDER BY date) AS running_total
FROM transactions;

-- Rank within partitions
SELECT
  department,
  name,
  salary,
  RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;

-- Previous row's value
SELECT
  date,
  amount,
  LAG(amount) OVER (ORDER BY date) AS prev_amount,
  amount - LAG(amount) OVER (ORDER BY date) AS day_over_day_change
FROM transactions;

Function	Returns
`ROW_NUMBER()`	1, 2, 3, … no ties
`RANK()`	1, 2, 2, 4 — ties get same rank, gap after
`DENSE_RANK()`	1, 2, 2, 3 — ties same rank, no gap
`NTILE(n)`	Bucket into n groups (quartiles, deciles)
`LAG(col)` / `LEAD(col)`	Previous / next row's value
`FIRST_VALUE / LAST_VALUE`	First / last in window

CTEs (Common Table Expressions)

WITH active_customers AS (
  SELECT id, name FROM customers WHERE status = 'active'
),
recent_orders AS (
  SELECT customer_id, SUM(total) AS total
  FROM orders
  WHERE created_at >= NOW() - INTERVAL '30 days'
  GROUP BY customer_id
)
SELECT ac.name, COALESCE(ro.total, 0) AS spent_30d
FROM active_customers ac
LEFT JOIN recent_orders ro ON ro.customer_id = ac.id
ORDER BY spent_30d DESC;

-- Recursive CTE: organisation hierarchy
WITH RECURSIVE tree AS (
  SELECT id, name, manager_id, 1 AS level
  FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, e.manager_id, t.level + 1
  FROM employees e JOIN tree t ON e.manager_id = t.id
)
SELECT * FROM tree;

INSERT / UPDATE / DELETE

-- Insert single
INSERT INTO users (name, email) VALUES ('Alice', 'a@x.com');

-- Insert from query
INSERT INTO users_archive (id, name)
SELECT id, name FROM users WHERE created_at < '2024-01-01';

-- Upsert (PostgreSQL)
INSERT INTO users (id, email) VALUES (1, 'a@x.com')
ON CONFLICT (id) DO UPDATE SET email = EXCLUDED.email;

-- Update with JOIN-like logic
UPDATE orders SET status = 'shipped'
WHERE id IN (SELECT order_id FROM shipments WHERE shipped_at IS NOT NULL);

-- Delete with safety
DELETE FROM logs WHERE created_at < NOW() - INTERVAL '90 days';
-- Always SELECT first to verify what will be deleted!

Indexes

Type	When to use
B-tree (default)	Equality, range, sorting — covers 95% of cases
Composite	WHERE on multiple columns. Order matters: most-selective first
Unique	Email, slug, business keys — enforces uniqueness
Partial	Index only WHERE active = true — smaller, faster
Expression / functional	WHERE LOWER(email) = ? → index on LOWER(email)
GIN (PostgreSQL)	Full-text search, JSONB, arrays

CREATE INDEX idx_orders_user_created ON orders (user_id, created_at DESC);
CREATE UNIQUE INDEX idx_users_email ON users (email);
CREATE INDEX idx_active ON users (created_at) WHERE status = 'active';

Dialect quick reference

Operation	PostgreSQL	MySQL	SQLite
String concat	`a \|\| b` or `CONCAT`	`CONCAT(a, b)`	`a \|\| b`
Pagination	`LIMIT 10 OFFSET 20`	`LIMIT 20, 10`	`LIMIT 10 OFFSET 20`
Auto-increment	`SERIAL` or `GENERATED`	`AUTO_INCREMENT`	`INTEGER PRIMARY KEY`
Now()	`NOW()` or `CURRENT_TIMESTAMP`	`NOW()`	`CURRENT_TIMESTAMP`
Upsert	`ON CONFLICT`	`ON DUPLICATE KEY UPDATE`	`ON CONFLICT`
Boolean	Native	TINYINT(1)	INTEGER 0/1

Performance hints

Always run EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL) on slow queries — it shows the actual plan and timing.
SELECT only the columns you need — avoid SELECT * in production code.
WHERE before JOIN where possible — filtering early reduces the rows that need joining.
Use EXISTS or IN for membership tests rather than counting all matches.
Don't wrap indexed columns in functions — that disables the index.
Run ANALYZE (or its equivalent) after large data loads — keeps stats fresh for the planner.

The SQL execution order (the one mental model that matters)

SQL is written in this order: SELECT … FROM … WHERE … GROUP BY … HAVING … ORDER BY … LIMIT. But it's executed in a different order: FROM & JOIN first (the rows exist), WHERE filters, GROUP BY collapses, HAVING filters groups, SELECT picks columns, ORDER BY sorts, LIMIT cuts. This explains why you can't reference an alias from SELECT in WHERE — WHERE runs before SELECT exists. You can in ORDER BY because it runs after.

Dialect differences

The core (SELECT/JOIN/GROUP BY) is universal. But pagination, string concatenation, date functions, and DDL diverge. PostgreSQL: LIMIT 10 OFFSET 20, || for concat. MySQL: LIMIT 20, 10, CONCAT(). SQL Server: OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY, + concat. SQLite is mostly PostgreSQL-compatible. When portability matters, prefer LIMIT/OFFSET and CONCAT().

Common SQL use cases by role

Backend engineers write queries to fetch paginated user records, aggregate analytics events, and join user and order tables for reporting. Data analysts use window functions and CTEs to build cohort analyses, funnel drop-off reports, and rolling 7-day averages. QA engineers write ad-hoc queries to verify database state after test runs. Product managers with basic SQL use simple SELECT … WHERE … GROUP BY queries to answer business questions without waiting for a data request. DevOps engineers query log tables for error rates and latency percentiles.

Performance tips that matter for real queries

Always run EXPLAIN ANALYZE (PostgreSQL) or EXPLAIN (MySQL) before deploying a query to production — the plan often reveals sequential scans on large tables that need an index. Index foreign key columns: they are almost always used in JOIN conditions and are frequently missed. Avoid SELECT * in application queries — fetching unused columns wastes I/O and breaks when schema changes. For large analytical queries, materialize intermediate results into temp tables rather than nesting 5-level subqueries — it's easier to debug and often faster because the planner can create indexes on temp tables.

SQL vs NoSQL — when SQL wins

SQL databases (PostgreSQL, MySQL, SQLite) excel when data has a fixed, well-understood schema; when relationships between entities (users → orders → products) need to be queried flexibly; and when ACID transaction guarantees matter (financial records, inventory). NoSQL databases (MongoDB, DynamoDB, Cassandra) win when schema is fluid, when write throughput vastly exceeds read complexity, or when horizontal scaling across thousands of nodes is required. For most web applications up to 100M rows, a well-indexed PostgreSQL database handles everything without NoSQL complexity.

Preguntas frecuentes

What's the difference between INNER JOIN and LEFT JOIN?: INNER JOIN returns only rows where the join condition matches in both tables — drops anything unmatched. LEFT JOIN keeps all rows from the left table, fills NULLs where the right table has no match. Practical rule: 'I want all customers, plus their orders if any' → LEFT JOIN customers→orders. 'I want only customers who have orders' → INNER JOIN. Most reporting queries you write are LEFT JOINs.
When should I use a CTE instead of a subquery?: CTEs (WITH clauses) make complex queries readable. Use a CTE when you reference the same subquery multiple times, when nesting depth would exceed 2 levels, or when you want named intermediate steps for debugging. Modern engines (PostgreSQL 12+, MySQL 8+) inline CTEs unless you use WITH ... MATERIALIZED, so there's no performance penalty. For one-shot derived tables, plain subqueries are fine.
What does GROUP BY actually do?: GROUP BY collapses rows that share the same value(s) in the grouped columns into one row, applying aggregate functions (COUNT, SUM, AVG, MIN, MAX) across the collapsed rows. Every column in SELECT must either be in GROUP BY or be inside an aggregate. Common mistake: SELECT name, COUNT(*) FROM users — fails because name isn't grouped. Either GROUP BY name or wrap it in an aggregate.
How are window functions different from GROUP BY?: GROUP BY collapses rows. Window functions (OVER clause) keep all rows but compute aggregates across a 'window' of related rows. Use window functions for running totals, ranks, percentiles, lag/lead — anything where you need to know about other rows but want to keep the original row in the output. Example: SELECT name, salary, AVG(salary) OVER (PARTITION BY dept) FROM employees gives every employee with their department's average salary.
Why is my query slow even with an index?: Common reasons: (1) Function on indexed column — WHERE LOWER(email) = 'x' kills the index; use citext or store lowercased values. (2) Leading wildcard — WHERE name LIKE '%foo' can't use a B-tree. (3) OR conditions across columns — engine often picks one index, sequential-scans the rest. (4) Wrong column order in compound index — index on (a,b) doesn't help WHERE b = 1. (5) Out-of-date statistics — run ANALYZE. EXPLAIN is your friend; always check the actual plan.