
URL Encoding Explained: Why Your Links Break and How Percent-Encoding Fixes Them

What percent-encoding actually is, which characters must be encoded in a URL and why, the difference between encodeURI and encodeURIComponent in JavaScript, and common bugs it causes in production.

Harshil · 5 min read


You paste a URL into a browser. It works. You embed the same URL in a query string parameter. It breaks. You spend 20 minutes wondering why a working link suddenly 404s — until you realise the problem is a space, an ampersand, or an accented character that means something different to the HTTP parser than it does to you.

This is a URL encoding problem. It affects every developer who builds links programmatically, processes form submissions, or works with APIs that accept URLs as parameters.


Why URLs can only contain a restricted character set

The URL specification (RFC 3986) defines a strict set of characters that are allowed in a URL without encoding:

  • Unreserved characters (always safe): A–Z a–z 0–9 - _ . ~
  • Reserved characters (have special meaning in URLs): : / ? # [ ] @ ! $ & ' ( ) * + , ; =

Every other character — spaces, accented letters, emoji, CJK characters, angle brackets, pipes — must be percent-encoded before appearing in a URL.

Percent-encoding replaces each byte of the character's UTF-8 representation with % followed by two uppercase hex digits. A space becomes %20. An é (U+00E9) is encoded in UTF-8 as the bytes 0xC3 0xA9, so it becomes %C3%A9. A # in a query string parameter becomes %23 — because unencoded, it would be interpreted as the fragment identifier.
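To make the byte-level rule concrete, here is a minimal sketch of percent-encoding done by hand. It assumes a runtime that provides TextEncoder (modern browsers and Node.js both do). percentEncode is a hypothetical helper written for this example, not a built-in; in real code you would reach for encodeURIComponent instead.

```javascript
// Hypothetical helper, for illustration only: percent-encode every UTF-8 byte
// except the RFC 3986 unreserved characters (A–Z a–z 0–9 - _ . ~).
function percentEncode(text) {
  const unreserved = /^[A-Za-z0-9\-._~]$/;
  const bytes = new TextEncoder().encode(text); // TextEncoder always emits UTF-8
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (unreserved.test(ch)) {
      out += ch; // safe character, copied as-is
    } else {
      // "%" followed by two uppercase hex digits, zero-padded (e.g. 0x0A → "%0A")
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode(" ")); // → %20
console.log(percentEncode("é")); // → %C3%A9
console.log(percentEncode("#")); // → %23
```

Because é is two bytes in UTF-8, it produces two %XX triplets, exactly as described above.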


The production bug that URL encoding causes

Here is the scenario. Your application builds a redirect URL:

const searchTerm = "noise & vibration safety";
const url = `https://stax.tools/search?q=${searchTerm}`;
// Produces: https://stax.tools/search?q=noise & vibration safety

The & in the search term ends the q parameter early and starts what the parser treats as a second parameter named " vibration safety" (leading space included, no value). Your server receives q=noise with a trailing space, not q=noise & vibration safety. The spaces are a separate problem: they aren't valid in a URL at all, so depending on the client and server they may be percent-encoded to %20, passed through as-is, or rejected.
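You can reproduce the server's view of the broken query string with the standard URLSearchParams class, which parses query strings the way form-style parsers do. Treat this as a sketch: your own framework's parser may differ in details.

```javascript
// The query string produced by the unencoded template literal.
const brokenQuery = "q=noise & vibration safety";

// URLSearchParams splits on "&", then on the first "=" in each piece.
const params = new URLSearchParams(brokenQuery);

console.log(JSON.stringify(params.get("q")));    // → "noise "  (trailing space)
console.log(JSON.stringify([...params.keys()])); // → ["q"," vibration safety"]
```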

The fix:

const url = `https://stax.tools/search?q=${encodeURIComponent(searchTerm)}`;
// Produces: https://stax.tools/search?q=noise%20%26%20vibration%20safety

This is one of the most common URL encoding bugs in production systems. It appears in redirect URLs, OAuth state parameters, email confirmation links, deep links, and any place where user-provided text becomes part of a URL.
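An alternative to hand-assembled template literals is to let the standard URL and URLSearchParams APIs (available in modern browsers and Node.js) do the escaping for you. One thing to be aware of: searchParams serializes spaces as + rather than %20, following the application/x-www-form-urlencoded convention. Servers that parse form-style query strings decode both the same way.

```javascript
const searchTerm = "noise & vibration safety";

// Build the URL from parts; searchParams.set() encodes the value.
const url = new URL("https://stax.tools/search");
url.searchParams.set("q", searchTerm);

console.log(url.href);
// → https://stax.tools/search?q=noise+%26+vibration+safety

// Reading it back returns the original, fully decoded value.
console.log(url.searchParams.get("q")); // → noise & vibration safety
```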

Encode and decode any string instantly at the Stax URL Encoder.


encodeURI vs encodeURIComponent: the difference that matters

JavaScript provides two encoding functions and they do very different things.

Function | Leaves unencoded (everything else is percent-encoded) | Use case
encodeURI | A–Z a–z 0–9 - _ . ! ~ * ' ( ) plus ; , / ? : @ & = + $ # | Encoding a complete URL
encodeURIComponent | A–Z a–z 0–9 - _ . ! ~ * ' ( ) | Encoding a single query parameter value or path segment
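Running both functions over the full RFC 3986 reserved set shows exactly which characters each one leaves alone. Note that encodeURI still encodes [ and ], even though RFC 3986 lists them as reserved.

```javascript
// Every character RFC 3986 lists as reserved (gen-delims + sub-delims).
const reserved = ":/?#[]@!$&'()*+,;=";

console.log(encodeURI(reserved));
// → :/?#%5B%5D@!$&'()*+,;=

console.log(encodeURIComponent(reserved));
// → %3A%2F%3F%23%5B%5D%40!%24%26'()*%2B%2C%3B%3D
```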

The distinction matters because reserved characters like &, =, +, and # have structural meaning in URLs. encodeURI leaves them unencoded so the URL structure stays intact. encodeURIComponent encodes them, which is what you need when those characters appear as values rather than URL structure.

Rule of thumb: Always use encodeURIComponent when constructing individual query parameter values. Use encodeURI only when encoding a full URL that you've already constructed correctly.

// Wrong: encodeURI doesn't encode & and =
encodeURI("key=value&other=thing")
// → "key=value&other=thing"  (unchanged — still breaks)

// Right: encodeURIComponent encodes & and =
encodeURIComponent("key=value&other=thing")
// → "key%3Dvalue%26other%3Dthing"  (safe as a parameter value)
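One gap the example above doesn't show: encodeURIComponent leaves ! ' ( ) * unencoded, because JavaScript's definition follows the older RFC 2396 rather than RFC 3986, where those characters are reserved sub-delimiters. If a system you talk to expects strict RFC 3986 output (some request-signing schemes do), a small wrapper closes the gap. encodeRFC3986Component is a hypothetical name; the pattern is similar to the one in MDN's encodeURIComponent reference.

```javascript
// Hypothetical helper: encodeURIComponent plus the five characters it leaves alone.
function encodeRFC3986Component(str) {
  return encodeURIComponent(str).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("it's (50%)!"));     // → it's%20(50%25)!
console.log(encodeRFC3986Component("it's (50%)!")); // → it%27s%20%2850%25%29%21
```

The replace runs after encoding, so it only touches the five leftover characters; every % already in the output belongs to a valid escape and is left intact.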

The + vs %20 confusion

HTML form submissions encode spaces as + (plus sign) rather than %20. This is the application/x-www-form-urlencoded format, which predates RFC 3986 and follows different rules.

When a browser submits a form with method="GET", spaces become +. When JavaScript encodes a string with encodeURIComponent, spaces become %20.

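Most web frameworks decode both correctly in query strings. But under strict RFC 3986 rules, + is a literal plus sign, not a space, so if you're building a URL manually for a system that parses it that way, use %20: it's unambiguous. If you're reproducing what an HTML form would submit, + is expected. In either format, a literal plus sign in your data must be sent as %2B.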

Method | Space becomes | Standard
encodeURIComponent | %20 | RFC 3986
HTML form submission (GET) | + | HTML spec / application/x-www-form-urlencoded
encodeURIComponent, then replace %20 with + | + | Common convention for query strings
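The "encodeURIComponent, then replace" convention and its matching decode step can be sketched as two small helpers. formEncode and formDecode are hypothetical names. The key detail is that decodeURIComponent does not treat + as a space, so form-encoded input needs + replaced before decoding. The output is close to, but not byte-identical with, URLSearchParams, which also encodes a few extra characters such as ! and ~.

```javascript
// Hypothetical helpers for application/x-www-form-urlencoded style values.
function formEncode(value) {
  // Encode normally, then switch spaces from %20 to +.
  // A literal "+" has already become %2B, so it can't be confused with a space.
  return encodeURIComponent(value).replace(/%20/g, "+");
}

function formDecode(value) {
  // Turn + back into spaces *before* percent-decoding, so %2B survives as "+".
  return decodeURIComponent(value.replace(/\+/g, " "));
}

console.log(formEncode("C++ tips & tricks"));       // → C%2B%2B+tips+%26+tricks
console.log(formDecode("C%2B%2B+tips+%26+tricks")); // → C++ tips & tricks
console.log(decodeURIComponent("a+b"));             // → a+b (no space conversion)
```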

Path segments vs query strings: different rules

The slash / in a URL path is a structural delimiter. If a file name or resource ID contains a /, it must be encoded as %2F in the path segment — otherwise the server interprets it as a directory separator.

GET /files/report/Q1 2026.pdf       ← broken (unencoded space)
GET /files/report/Q1%202026.pdf     ← correct
GET /files/folder/subfolder/file    ← server sees four path segments: files, folder, subfolder, file
GET /files/folder%2Fsubfolder/file  ← server sees three: files, folder%2Fsubfolder, file
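A minimal way to get this right is to encode each path segment separately with encodeURIComponent and join the results with literal slashes. buildPath is a hypothetical helper, and the segment values are the example names above.

```javascript
// Hypothetical helper: encode each segment on its own, then join with "/".
function buildPath(segments) {
  return "/" + segments.map((s) => encodeURIComponent(s)).join("/");
}

console.log(buildPath(["files", "report", "Q1 2026.pdf"]));
// → /files/report/Q1%202026.pdf

console.log(buildPath(["files", "folder/subfolder", "file"]));
// → /files/folder%2Fsubfolder/file
```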

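Some servers and proxies reject %2F in path segments by default, or decode it back into / before routing, so supporting it takes extra configuration (Apache httpd, for example, returns 404 for encoded slashes unless AllowEncodedSlashes is enabled). The same compatibility issue comes up with storage services such as AWS S3, whose object keys often contain slashes.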


Decoding: avoiding double-encoding

If you receive a URL-encoded string from a user or external system and then encode it again before storing or forwarding it, you get double-encoding: %20 becomes %2520 (the % itself gets encoded to %25). Subsequent decoding gives you %20 as a literal string, not a space.

Always decode before re-encoding if you're transforming a value. Never call encodeURIComponent on a string that might already be encoded.

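// Safe decode-then-encode pattern
// decodeURIComponent throws URIError on malformed input such as "100%";
// in that case, treat the value as not yet encoded.
let raw;
try { raw = decodeURIComponent(potentiallyEncoded); } catch { raw = potentiallyEncoded; }
const safe = encodeURIComponent(raw);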
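The double-encoding failure is easy to reproduce, which also makes it easy to cover in a test:

```javascript
const once = encodeURIComponent("Q1 2026"); // space becomes %20
const twice = encodeURIComponent(once);     // the % itself is encoded again as %25

console.log(once);                      // → Q1%202026
console.log(twice);                     // → Q1%25202026
console.log(decodeURIComponent(twice)); // → Q1%202026 (literal %20, not a space)
```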

International domain names (IDN) and Punycode

Domain names have their own encoding system, separate from percent-encoding. A hostname such as münchen.de cannot be used as-is in DNS lookups or HTTP requests; it must first be converted to its Punycode representation, xn--mnchen-3ya.de. Browsers and most HTTP libraries do this automatically, but if you're processing domains programmatically, you need something that performs the conversion for non-ASCII hostnames: the URL API (which converts the hostname when it parses a URL), Node.js's url.domainToASCII(), or a Punycode library such as punycode.js, which provides toASCII().

The URL API in modern browsers handles this correctly:

new URL("https://münchen.de/path").hostname
// → "xn--mnchen-3ya.de"
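The same parse also shows the two encoding systems working side by side: Punycode for the hostname, UTF-8 percent-encoding for the path and query. This is a sketch using the standard URL API; the path and query values are made up for illustration.

```javascript
const url = new URL("https://münchen.de/straße?q=ä");

console.log(url.hostname); // → xn--mnchen-3ya.de  (Punycode)
console.log(url.pathname); // → /stra%C3%9Fe       (ß is 0xC3 0x9F in UTF-8)
console.log(url.search);   // → ?q=%C3%A4          (ä is 0xC3 0xA4 in UTF-8)
```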

Quick reference: commonly encoded characters

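Character | Encoded | Role in URL syntax
Space | %20 | Not allowed anywhere in a URL
& | %26 | Separates query string parameters
= | %3D | Separates a key from its value
+ | %2B | Decoded as a space by form-encoding parsers
# | %23 | Starts the fragment identifier
% | %25 | Introduces a percent-encoded byte
/ | %2F | Path segment delimiter
? | %3F | Starts the query string
@ | %40 | Separates user info from the host
: | %3A | Ends the scheme; separates host from port

encodeURIComponent encodes all of these. Strictly, RFC 3986 allows /, ?, : and @ to appear literally inside a query, but encoding them in a parameter value is always safe.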
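The Encoded column can be regenerated directly from encodeURIComponent, which is also a quick way to check any character not listed here:

```javascript
const chars = [" ", "&", "=", "+", "#", "%", "/", "?", "@", ":"];

for (const ch of chars) {
  // JSON.stringify wraps each character in quotes so the space is visible.
  console.log(JSON.stringify(ch), "→", encodeURIComponent(ch));
}
```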

Use the Stax URL Encoder / Decoder to encode, decode, and inspect any string in real time.


By Harshil Shah, developer and founder at Stax Tools. Encoding rules verified against RFC 3986 and the WHATWG URL Standard.

Sources & methodology

  1. RFC 3986 — Uniform Resource Identifier (URI): Generic Syntax, IETF, January 2005
  2. WHATWG URL Standard — url.spec.whatwg.org (living standard)
  3. HTML Living Standard — forms section, html.spec.whatwg.org
Harshil

Developer & Founder, stax.tools

Harshil is the developer behind stax.tools, building privacy-first tools that run entirely in your browser.
