Why is my Base64 output 33% larger than the input?

Base64 groups bytes in sets of 3 and represents each group with 4 ASCII characters. Three bytes (24 bits) become 4 characters (32 bits), a size ratio of 4/3 ≈ 1.333, or about 33% overhead. A 100 KB image encoded as Base64 will be roughly 133 KB (closer to 137 KB once MIME line breaks are added). This overhead is unavoidable: it's the cost of representing arbitrary binary data using a printable 64-symbol alphabet.

Base64 vs hexadecimal: when should I use each one?

Hex represents every byte as 2 characters (100% overhead); Base64 averages 1.33 characters per byte (33% overhead), so Base64 is more compact. Hex is easier to read when debugging short values like SHA-256 hashes or memory addresses, since each nibble maps directly to a digit. For transporting large binary payloads — images, certificates, cryptographic keys — Base64 is the better choice. For displaying a hash or a digital signature, hex wins on readability.
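To see the difference on a concrete value, the sketch below encodes the same SHA-256 digest both ways with Python's standard library. The input message is an arbitrary placeholder.

```python
import base64
import hashlib

# A SHA-256 digest is always 32 raw bytes; the message is an arbitrary example.
digest = hashlib.sha256(b"example message").digest()

hex_text = digest.hex()                               # 2 characters per byte
b64_text = base64.b64encode(digest).decode("ascii")   # 4 characters per 3 bytes

print(len(digest), len(hex_text), len(b64_text))  # 32 64 44
```

Both strings carry exactly the same 32 bytes; only the textual representation differs.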

Why does my JWT have dots (.) between the Base64 segments?

A JWT (JSON Web Token) is made of three independently Base64url-encoded parts — header, payload, and signature — joined by dots. Each part is encoded using Base64url (- instead of +, _ instead of /, no padding =). The dots are fixed delimiters defined in RFC 7519; they are a JWT formatting convention, not something special about Base64 itself.

My Base64 string contains spaces or line breaks — is that normal?

Yes, in email (MIME) attachments and PEM files. RFC 2045 limits encoded lines to 76 characters for compatibility with older mail systems. A .pem certificate file starts with -----BEGIN CERTIFICATE----- and contains Base64 wrapped at 64 columns. If you're decoding Base64 from an email or a certificate, strip whitespace and newlines before decoding, or use a library that handles this automatically.

Standard Base64 vs URL-safe Base64: how do I know which one to use?

Simple rule: if your Base64 string will travel inside a URL (query parameter, path segment), a cookie, or a JWT, use Base64url (RFC 4648 §5) with - instead of + and _ instead of /. For email MIME attachments, PEM certificates, JSON payloads, or any other textual context outside of URLs, use standard Base64. The two variants are identical except for those two characters and the optional padding.

Base64 explained: encoding, security, use cases

What Base64 is — and what it is not

Base64 is an encoding scheme, not encryption. This distinction matters enormously and is routinely misunderstood, even by experienced developers. Encoding transforms data from one representation to another in a fully reversible, secret-free way. Encryption transforms data into a form that is unreadable without a secret key. Base64 protects nothing: anyone can decode a Base64 string in a single line of code, with no password, no key, and no specialized tool.

The technical definition: Base64 is an encoding that represents arbitrary binary data as ASCII text using an alphabet of 64 printable characters. Its historical purpose was to allow binary payloads — images, attachments, executables — to travel over channels that could only transmit 7-bit ASCII text, namely the email protocols of the 1980s and 1990s.

Origins: MIME and RFC 4648

Base64 emerged from the MIME standard (Multipurpose Internet Mail Extensions), formalized in RFC 1521 in 1993 and later in RFC 2045. Mail servers of the era reliably forwarded only printable ASCII characters (codes 32–126). A binary file attachment — a Word document, a JPEG image — had to be converted to text before it could be sent by email.

Today, the canonical specification is RFC 4648 (2006), which also covers the Base64url, Base32, and Base16 variants. It is the reference document consulted by implementors of cryptographic libraries, JWT parsers, and web protocols.

Encoding vs encryption vs hashing

A precise taxonomy:

Encoding (Base64, URL encoding, UTF-8): reversible transformation with no secret. Anyone can reverse it. Purpose: transport compatibility or representation.
Encryption (AES-256, RSA, ChaCha20): reversible only with the correct secret key. Purpose: confidentiality.
Hashing (SHA-256, bcrypt, Argon2): one-way, irreversible transformation. Purpose: integrity verification or secure password storage.

Base64 belongs firmly in the first category. Treating it as encryption is a serious security mistake — more on that shortly.

How Base64 works, byte by byte

The mechanics of Base64 rest on a straightforward idea: regroup the input bits into 6-bit blocks, then represent each 6-bit block as a character from a 64-symbol alphabet (2⁶ = 64). Here is the full algorithm.

The 64-character alphabet

The standard Base64 alphabet (RFC 4648 §4):

Index  0–25  : A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Index 26–51  : a b c d e f g h i j k l m n o p q r s t u v w x y z
Index 52–61  : 0 1 2 3 4 5 6 7 8 9
Index 62     : +
Index 63     : /
Padding      : =

All 64 characters are printable ASCII that posed no problem on legacy text-only protocols. The = sign is used exclusively as padding at the end of the encoded string.

Groups of 3 bytes become 4 characters

The algorithm reads input in 3-byte blocks (24 bits). Those 24 bits are split into four 6-bit groups, each mapped to a character in the alphabet. Example with "Man" (ASCII 77, 97, 110):

M          a          n
01001101   01100001   01101110
└──────────────────────────────┘ 24 bits grouped
 010011  010110  000101  101110
   19      22       5      46
    T       W       F       u
→ "TWFu"

This is the canonical result: btoa("Man") in any browser returns "TWFu". You can verify this instantly with our Base64 encoder tool.

Encoding "Hello" step by step

Let's take "Hello" (5 bytes: 72, 101, 108, 108, 111):

Group 1: H(72)   e(101)  l(108)
  Binary: 01001000 01100101 01101100
  6-bits: 010010 000110 010101 101100
  Index:  18     6      21     44
  Chars:  S      G      V      s   → "SGVs"

Group 2: l(108)  o(111)  [2 bytes only → one = of padding]
  Binary: 01101100 01101111
  Padded: 01101100 01101111 00000000
  6-bits: 011011 000110 111100 [padding]
  Index:  27     6      60
  Chars:  b      G      8      =  → "bG8="

Result: "SGVsbG8="

Verify: atob("SGVsbG8=") in any browser console returns "Hello".
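The walkthrough above can be reproduced with a small hand-written encoder. This is a minimal sketch for illustration only; the names ALPHABET and b64encode_manual are made up here, and real code should call the standard library instead.

```python
import base64

ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def b64encode_manual(data: bytes) -> str:
    """Encode bytes as standard Base64, one 3-byte group at a time."""
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        missing = 3 - len(group)              # 0, 1, or 2 bytes short
        # Pack the group into a 24-bit integer, zero-filling missing bytes.
        value = int.from_bytes(group + b"\x00" * missing, "big")
        # Slice the 24 bits into four 6-bit indices, most significant first.
        indices = [(value >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        chars = [ALPHABET[idx] for idx in indices]
        # Characters built only from filler bits are replaced by '='.
        if missing:
            chars[-missing:] = "=" * missing
        out.append("".join(chars))
    return "".join(out)

print(b64encode_manual(b"Man"))    # TWFu
print(b64encode_manual(b"Hello"))  # SGVsbG8=

# Cross-check against the standard library.
assert b64encode_manual(b"Hello") == base64.b64encode(b"Hello").decode("ascii")
```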

The `=` padding character

When the input byte count is not a multiple of 3, the last block is incomplete. Base64 pads it with zero bits and marks the unused positions with =:

1 remaining byte → 2 Base64 characters + ==
2 remaining bytes → 3 Base64 characters + =
Input multiple of 3 → no padding needed

Padding allows the decoder to know exactly how many bytes were in the last block. Some variants — notably Base64url — omit the padding entirely; the decoder then infers the length from the total string length modulo 4.
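Many decoders, Python's included, still expect the padding, so unpadded input usually has to be re-padded first. A minimal sketch of that length-modulo-4 inference follows; the helper name decode_unpadded is hypothetical.

```python
import base64

def decode_unpadded(text: str) -> bytes:
    """Decode Base64url text whose trailing '=' characters were stripped."""
    if len(text) % 4 == 1:
        # A lone leftover character carries only 6 bits: not a whole byte.
        raise ValueError("invalid length for unpadded Base64")
    # Remainder 2 -> add '==', remainder 3 -> add '=', remainder 0 -> nothing.
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded)

print(decode_unpadded("SGVsbG8"))  # b'Hello'
```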

Variants: standard, URL-safe, MIME

RFC 4648 defines several alphabets for different contexts. The algorithm is always the same (6-bit groups); only the characters at positions 62 and 63 change.

Standard Base64 (RFC 4648 §4)

Index 62 = +, index 63 = /. This is the original variant, found in PEM certificates, MIME attachments, and the browser's btoa() function. The problem with + and /: both characters have special meanings in URLs. The + represents a space in form-encoded data (application/x-www-form-urlencoded), and / is a path separator. Placing standard Base64 directly in a URL without percent-encoding is a common source of subtle bugs.

Base64url (RFC 4648 §5)

Index 62 = -, index 63 = _. These two characters are safe in URLs and file names. Padding = is typically omitted, because = already serves as the key/value separator in query strings. This variant is used by:

JWT (JSON Web Tokens, RFC 7519) — all three segments are Base64url-encoded
OAuth 2.0 and PKCE (Proof Key for Code Exchange)
WebAuthn / FIDO2 — credential identifiers
URL parameters carrying binary data (tokens, nonces, state values)
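The difference between the two alphabets only shows up when the data happens to hit indexes 62 or 63. The sketch below uses two bytes picked for that purpose; note that Python's urlsafe_b64encode keeps the padding, so stripping it is a separate step.

```python
import base64

# Two bytes chosen so that the encoding hits indexes 62 and 63.
raw = b"\xfb\xff"

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
# Strip the '=' padding when a protocol such as JWT expects none.
url_safe_unpadded = url_safe.rstrip("=")

print(standard)           # +/8=
print(url_safe)           # -_8=
print(url_safe_unpadded)  # -_8
```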

MIME Base64 (RFC 2045)

Uses the same alphabet as standard Base64, but adds a formatting constraint: encoded lines may be at most 76 characters long, each terminated by CRLF. This line-length limit existed for compatibility with legacy MUAs (Mail User Agents) that processed mail line by line. You find it in:

Email attachments encoded with Content-Transfer-Encoding: base64
.pem files (X.509 certificates, RSA keys) which conventionally use 64-column lines
MIME multipart bodies

Important: when decoding MIME Base64, line breaks must be ignored. Some libraries handle this automatically; others do not. This is a frequent source of "invalid Base64" errors.
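In Python, for example, the behavior depends on how the decoder is called: base64.b64decode() silently skips line breaks by default but rejects them when validate=True. The sketch below, using arbitrary sample data, shows the strict failure and the strip-then-decode workaround.

```python
import base64
import binascii

payload = bytes(range(100))  # arbitrary sample data

# encodebytes() emits MIME-style output: a newline after every 76 characters.
wrapped = base64.encodebytes(payload).decode("ascii")

# A strict decoder rejects the embedded newlines...
try:
    base64.b64decode(wrapped, validate=True)
    strict_ok = True
except binascii.Error:
    strict_ok = False

# ...so remove all whitespace first, then decode strictly.
compact = "".join(wrapped.split())
decoded = base64.b64decode(compact, validate=True)

print(strict_ok, decoded == payload)  # False True
```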

Legitimate use cases for Base64

Base64 solves real problems. Here are the main use cases with their technical justification.

Data URIs: inline images and resources in HTML/CSS

The Data URI scheme (RFC 2397) lets you embed binary files directly in HTML or CSS:

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...">

Or in CSS:

.icon {
    background-image: url("data:image/svg+xml;base64,PHN2ZyB4bWxucz0i...");
}

This is useful for small icons (avoids an extra HTTP request), but counterproductive for large images: browsers cannot cache an inline resource independently, and the 33% size overhead inflates the HTML/CSS itself. Practical rule of thumb: Data URIs are worthwhile below 2–4 KB.
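Generating a Data URI is a one-step string operation. A minimal sketch, assuming a hypothetical helper to_data_uri and a tiny placeholder SVG:

```python
import base64

def to_data_uri(data: bytes, mime_type: str) -> str:
    """Build a Base64 data URI (RFC 2397) from raw bytes."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# A tiny placeholder SVG icon, small enough to be worth inlining.
svg = b'<svg xmlns="http://www.w3.org/2000/svg" width="8" height="8"/>'
uri = to_data_uri(svg, "image/svg+xml")

# Compare the raw size with the size of the resulting URI.
print(len(svg), len(uri))
```

Comparing the two lengths for your own assets is a quick way to decide whether inlining is worth it.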

JWT (JSON Web Tokens)

A JWT has three dot-separated parts: header.payload.signature. The header and payload are JSON objects encoded in Base64url; the signature is raw bytes encoded the same way. The benefit: the token is a compact, URL-safe string that travels safely in an HTTP header (Authorization: Bearer <token>), a cookie, or a URL parameter, with no special-character issues.

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIn0
.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c

Decode the header: {"alg":"HS256","typ":"JWT"}. Decode the payload: {"sub":"1234567890","name":"John Doe"}. These fields are not secret: they are simply encoded. The only secret involved is the signing key, which is used to compute the signature and lets the server detect tampering. Never put a password, or any data the user must not see, in a JWT payload.
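To make the point concrete, the sketch below reads the example token with nothing but the standard library. It only decodes; it does not verify the signature, which is what a real JWT library would do. The helper name b64url_decode is illustrative.

```python
import base64
import json

token = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
    ".eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIn0"
    ".SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
)

def b64url_decode(segment: str) -> bytes:
    """Decode an unpadded Base64url segment by restoring its padding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

header_b64, payload_b64, signature_b64 = token.split(".")

header = json.loads(b64url_decode(header_b64))
payload = json.loads(b64url_decode(payload_b64))
signature = b64url_decode(signature_b64)   # raw bytes, not JSON

print(header)          # {'alg': 'HS256', 'typ': 'JWT'}
print(payload)         # {'sub': '1234567890', 'name': 'John Doe'}
print(len(signature))  # 32
```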

HTTP Basic Authentication

The HTTP Basic Auth scheme (RFC 7617) encodes credentials as Base64 in the Authorization header:

Authorization: Basic dXNlcjpwYXNzd29yZA==

Here, dXNlcjpwYXNzd29yZA== is the Base64 of user:password. There is zero intrinsic security: anyone who intercepts this header can decode the credentials instantly. Basic Auth is only acceptable over HTTPS, never over plain HTTP.
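Building the header and reversing it are equally trivial, as this short sketch shows. The function name basic_auth_header is hypothetical.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")

header = basic_auth_header("user", "password")
print(header)  # Basic dXNlcjpwYXNzd29yZA==

# Anyone who sees the header can reverse it just as easily.
recovered = base64.b64decode(header.split(" ", 1)[1])
print(recovered)  # b'user:password'
```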

PEM certificates and cryptographic keys

X.509 certificates, RSA/ECDSA private keys, and CSRs (Certificate Signing Requests) are all binary DER (Distinguished Encoding Rules) structures. The PEM format (Privacy-Enhanced Mail) wraps them in Base64 with human-readable markers:

-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA2a2rwplB...
-----END PUBLIC KEY-----

The "PEM" name is a historical artifact: these files are no longer tied to the Privacy-Enhanced Mail protocol, but the format convention stuck. OpenSSL, Let's Encrypt, and virtually every modern web server use this format.
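The PEM wrapping itself is simple enough to sketch by hand. The helpers der_to_pem and pem_to_der below are illustrative names, and the input bytes are a placeholder rather than a real DER structure; production code should rely on a cryptography library.

```python
import base64

def der_to_pem(der: bytes, label: str) -> str:
    """Wrap DER bytes in PEM armor with 64-column Base64 lines."""
    body = base64.b64encode(der).decode("ascii")
    lines = [body[i:i + 64] for i in range(0, len(body), 64)]
    return "\n".join(
        [f"-----BEGIN {label}-----", *lines, f"-----END {label}-----"]
    ) + "\n"

def pem_to_der(pem: str) -> bytes:
    """Drop the BEGIN/END marker lines and decode the Base64 body."""
    body = [
        line for line in pem.splitlines()
        if line and not line.startswith("-----")
    ]
    return base64.b64decode("".join(body))

# Placeholder bytes standing in for a real DER structure.
fake_der = bytes(range(120))
pem = der_to_pem(fake_der, "CERTIFICATE")
print(pem)
```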

API tokens and binary data in JSON

JSON is a text format: it cannot represent raw binary data natively. To include an image, a cryptographic signature, or a binary hash in a JSON response, Base64 is the standard solution:

{
    "user_id": "abc123",
    "avatar": "data:image/jpeg;base64,/9j/4AAQSkZJRgAB...",
    "signature": "3q2+7w=="
}

This is also the pattern used by AWS, Google Cloud, and countless REST APIs that return binary outputs (generated images, exported files, audio streams).
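A minimal round trip looks like the sketch below, where the four signature bytes are a stand-in for real binary data.

```python
import base64
import json

signature = bytes.fromhex("deadbeef")  # stand-in for real binary data

# json.dumps() cannot serialize bytes, so encode them as Base64 text first.
document = json.dumps({
    "user_id": "abc123",
    "signature": base64.b64encode(signature).decode("ascii"),
})
print(document)  # {"user_id": "abc123", "signature": "3q2+7w=="}

# The receiver reverses the step after parsing the JSON.
restored = base64.b64decode(json.loads(document)["signature"])
```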

MIME email attachments

A MIME-encoded attachment looks like:

Content-Type: application/pdf; name="invoice.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="invoice.pdf"

JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwov
UGFnZXMgMiAwIFIKPj4KZW5kb2JqCjIgMCBvYmoKPDwKL1R5cGUgL1Bh
Z2VzCi9LaWRzIFszIDAgUl0KL0NvdW50IDEKPj4KZW5kb2JqCg==

The same binary PDF content, represented as text to pass through legacy SMTP relays.

The critical pitfall: Base64 is not security

This is the most important section in this guide, and the one the web documents least clearly. Base64 is completely transparent: decoding a Base64 string takes one line of code in any language, zero prior knowledge, and no specialized tool. Yet Base64 regularly appears as a protection mechanism in production applications.

Concrete examples of bad practice

Case 1 — Base64-encoded password in a config file:

# config.yml — NEVER DO THIS
database:
    host: db.example.com
    password: cGFzc3dvcmQxMjM=   # "password123" encoded as Base64

Any developer who clones the repository, any attacker who reads the config file, can decode this in seconds. The fact that it is not plain text provides zero security — it is obfuscation, not encryption.
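For reference, this is the entire effort required to recover the password from the config value shown in Case 1:

```python
import base64

# The value copied from the config file in Case 1.
stored = "cGFzc3dvcmQxMjM="

recovered = base64.b64decode(stored).decode("utf-8")
print(recovered)  # password123
```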

Case 2 — "Encrypting" sensitive data as Base64 in a database:

-- Column "ssn" (social security number) in a users table
-- Some apps store: base64_encode("123-45-6789")
-- Result: "MTIzLTQ1LTY3ODk="
-- Actual protection: none

Case 3 — Base64-"secured" API token:

// Header sent by some poorly designed apps
X-Api-Key: dXNlcjEyMzpzZWNyZXQ0NTY=
// Immediate decode: "user123:secret456"

Case 4 — Malware obfuscation: malicious scripts frequently use Base64 to hide their code from naive textual signature detection tools. A classic pattern:

eval(atob("dmFyIHggPSBkb2N1bWVudC5jb29raWU7..."));

This is not secure for the attacker either — modern sandboxes and antivirus tools decode and analyze Base64 payloads. But it fooled basic detection tools in the early 2010s.

What to use instead

If you need confidentiality: use authenticated encryption (AES-256-GCM, ChaCha20-Poly1305). For storing secrets in configs: use environment variables, a secrets manager (HashiCorp Vault, AWS Secrets Manager, Doppler), or at minimum an encrypted file with git-crypt or sops.

If you need integrity verification: use an HMAC-SHA256 or an asymmetric signature (Ed25519, RSA-PSS). That is exactly what the third segment of a JWT provides.

If you need to store a password: use Argon2id, bcrypt, or scrypt — slow hashing functions designed to make brute-force attacks computationally expensive. See our password generator for context on what makes a password strong, and how that strength depends on the server-side hashing algorithm.
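For the integrity case, here is a minimal HMAC-SHA256 sketch using only Python's standard library. The key, message, and the helper names sign and verify are illustrative assumptions; Base64url appears only as the transport encoding for the tag, not as the protection.

```python
import base64
import hashlib
import hmac
import secrets

key = secrets.token_bytes(32)        # shared secret, never sent with the message
message = b'{"sub":"1234567890"}'    # arbitrary example payload

def sign(key: bytes, message: bytes) -> str:
    """Return an HMAC-SHA256 tag, Base64url-encoded for transport."""
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(tag).rstrip(b"=").decode("ascii")

def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(key, message), tag)

tag = sign(key, message)
print(verify(key, message, tag))                 # True
print(verify(key, message + b" tampered", tag))  # False
```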

Performance and size: the 33% overhead

The direct consequence of the 3-bytes-to-4-characters ratio is a size increase of 33% (precisely: 4/3 − 1 ≈ 0.333). In practice, with MIME line breaks and padding, the real overhead often reaches 36–37%.
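These figures can be checked with a few lines of arithmetic. The sketch below assumes one line terminator per output line, including the last one, and uses a hypothetical helper base64_size.

```python
def base64_size(n_bytes: int, line_length: int = 0, newline_len: int = 2) -> int:
    """Size in bytes of the Base64 text produced for n_bytes of input.

    line_length=0 means no wrapping; 76 with newline_len=2 models MIME (CRLF).
    """
    chars = (n_bytes + 2) // 3 * 4                  # ceil(n / 3) * 4
    if line_length:
        lines = (chars + line_length - 1) // line_length
        chars += lines * newline_len
    return chars

n = 100 * 1024  # a 100 KB file
plain = base64_size(n)
mime = base64_size(n, line_length=76)

print(plain, f"{plain / n - 1:.1%}")  # 136536 33.3%
print(mime, f"{mime / n - 1:.1%}")    # 140130 36.8%
```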

Impact on bandwidth and caching

For HTML/CSS Data URIs, this overhead has direct consequences:

A 50 KB image embedded as a Data URI in CSS adds roughly 67 KB to the CSS file.
That larger CSS file is reloaded with every cache invalidation.
An image referenced by URL benefits from an independent HTTP cache and can be shared across multiple pages.

Under compression: Base64 text compresses well with gzip/brotli because each 8-bit character carries only 6 bits of information, so the compressor recovers much of the 4/3 expansion. In practice, the overhead after compression often drops to 10–15%. Most modern web servers compress HTML/CSS/JS automatically, which softens the real network impact. But the parsing and decoding overhead on the browser side remains.
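The effect is easy to measure on your own payloads. The sketch below uses random bytes as a stand-in for already-compressed image data and zlib's DEFLATE as a proxy for gzip; the exact numbers vary with the input and the compressor, so no output is shown.

```python
import base64
import os
import zlib

raw = os.urandom(30_000)          # incompressible stand-in for image bytes
encoded = base64.b64encode(raw)   # 40,000 Base64 characters

compressed = zlib.compress(encoded, 9)   # DEFLATE, the algorithm behind gzip

# Compare: original size, Base64 size, Base64 size after compression.
print(len(raw), len(encoded), len(compressed))
```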

When to use vs avoid Data URIs

Use Data URIs for:

Small inline SVG icons (under 2 KB)
Very small CSS sprites
Critical above-the-fold images where you want to eliminate the loading flash

Avoid Data URIs for:

Any image over 5 KB (the size overhead exceeds the savings from avoiding one HTTP request)
Images repeated across multiple pages (cannot be cached independently)
Logos and brand assets (lazy loading with normal URLs is more efficient)

Common developer pitfalls

Base64 looks simple to use, but it hides several traps that produce subtle bugs, often difficult to diagnose.

UTF-8 vs Latin-1: the character encoding bug

This is the most frequent trap. The browser's btoa() function only accepts strings containing Latin-1 characters (codes 0–255). Encoding a string with Unicode characters outside this range throws an exception:

// Works (pure ASCII)
btoa("Hello World")  // → "SGVsbG8gV29ybGQ="

// Throws InvalidCharacterError
btoa("こんにちは")  // Japanese characters: ERROR

The correct approach for encoding Unicode text as Base64 in a browser:

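// Modern approach (recent browsers)
function base64EncodeUnicode(str) {
    const bytes = new TextEncoder().encode(str);
    // Build the binary string byte by byte; spreading a large array into
    // String.fromCodePoint(...bytes) can exceed the engine's argument limit.
    const binString = Array.from(bytes, (byte) => String.fromCodePoint(byte)).join("");
    return btoa(binString);
}

// Classic approach (alternative for older browsers)
function base64EncodeUnicode(str) {
    return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
        (match, p1) => String.fromCharCode(parseInt(p1, 16))
    ));
}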

In Python, the issue is different: base64.b64encode() expects bytes, not a str. You must encode the string first:

import base64

# Incorrect (TypeError in Python 3)
base64.b64encode("Hello")

# Correct: encode the string to bytes first
base64.b64encode("Hello".encode("utf-8"))
# → b'SGVsbG8='

# Decode back
base64.b64decode(b"SGVsbG8=").decode("utf-8")
# → "Hello"

API differences across languages: reference table

Each language has its own Base64 API, with its own quirks:

# Python 3
import base64
encoded = base64.b64encode(b"data")           # Standard
url_safe = base64.urlsafe_b64encode(b"data")  # URL-safe (- and _)
decoded = base64.b64decode(encoded)

// JavaScript (Browser)
const encoded = btoa("data");          // Standard, Latin-1 only
const decoded = atob(encoded);
// Base64url: btoa() has no URL-safe mode; replace + with - and / with _, and strip trailing = if unpadded output is expected

// Node.js
const buf = Buffer.from("data", "utf8");
const encoded = buf.toString("base64");        // Standard
const urlSafe = buf.toString("base64url");     // URL-safe (Node 16+)
const decoded = Buffer.from(encoded, "base64").toString("utf8");

// PHP
$encoded = base64_encode("data");     // Standard
$decoded = base64_decode($encoded);
// URL-safe: strtr(base64_encode($data), '+/', '-_')

// Go
import "encoding/base64"
encoded := base64.StdEncoding.EncodeToString([]byte("data"))
urlSafe := base64.URLEncoding.EncodeToString([]byte("data"))

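// Java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
// Pass the charset explicitly; getBytes() alone uses the platform default.
String encoded = Base64.getEncoder().encodeToString("data".getBytes(StandardCharsets.UTF_8));
String urlSafe = Base64.getUrlEncoder().encodeToString("data".getBytes(StandardCharsets.UTF_8));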

Base64 vs URL encoding: do not confuse them

A common mix-up pairs Base64 with URL encoding (percent-encoding, RFC 3986). These are distinct mechanisms with distinct goals:

URL encoding (%XX): encodes characters that are forbidden in URLs by replacing them with their hex code preceded by %. Applies to text strings. Example: "Hello World" → "Hello%20World".
Base64: encodes binary data as printable ASCII. Applies to any sequence of bytes. Example: a PNG file → a string of printable characters.

You can apply both in sequence: to put Base64 inside a URL, you sometimes need to percent-encode the + and / characters — or simply use Base64url, which avoids them altogether. Our QR code generator actually uses both encodings: text content is URL-encoded for the QR payload, and the resulting QR image can be exported as a Base64 Data URI.
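The interaction between the two encodings can be seen in a few lines. This sketch reuses two bytes that encode to + and / and shows what a query-string parser does with them.

```python
import base64
from urllib.parse import parse_qs, quote

raw = b"\xfb\xff"
standard = base64.b64encode(raw).decode("ascii")                      # "+/8="
url_safe = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")  # "-_8"

# Dropped into a query string as-is, '+' is read back as a space.
parsed = parse_qs("token=" + standard)
print(parsed)                        # {'token': [' /8=']}

# Option 1: percent-encode the standard Base64 value.
print(quote(standard, safe=""))      # %2B%2F8%3D

# Option 2: Base64url needs no escaping at all.
print(quote(url_safe, safe=""))      # -_8
```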

Validating a Base64 string before decoding

Always validate the incoming string before attempting to decode it, to avoid exceptions and corrupted data. The validation regex for standard Base64:

// JavaScript
// The empty string is accepted: it is the valid encoding of zero bytes.
const isValidBase64 = (str) => /^[A-Za-z0-9+/]*={0,2}$/.test(str)
    && str.length % 4 === 0;

# Python
import re
def is_valid_base64(s):
    return bool(re.fullmatch(r'[A-Za-z0-9+/]*={0,2}', s)) and len(s) % 4 == 0

For Base64url (no required padding):

// JavaScript
const isValidBase64url = (str) => /^[A-Za-z0-9_-]+$/.test(str);

Note: a string that passes the regex is syntactically well-formed Base64, but not necessarily meaningful data. Decoding may still produce content that does not match your expectations.
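A Python counterpart for unpadded Base64url might look like the sketch below. It adds one check the regex alone cannot express: a length of 4k+1 can never occur, because a single leftover character holds only 6 bits. The function name is hypothetical.

```python
import re

_B64URL_RE = re.compile(r"[A-Za-z0-9_-]+")

def is_valid_base64url(text: str) -> bool:
    """Syntactic check for unpadded Base64url text."""
    return bool(_B64URL_RE.fullmatch(text)) and len(text) % 4 != 1

print(is_valid_base64url("SGVsbG8"))  # True
print(is_valid_base64url("SGVs+G8"))  # False: '+' belongs to the standard alphabet
print(is_valid_base64url("SGVsb"))    # False: impossible length
```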

Security and best practices

Rules to follow systematically whenever you work with Base64.

1. Always specify the character encoding

When encoding text (not raw binary data) as Base64, explicitly specify the character encoding — almost always UTF-8. Without this, developers on different systems may get different results for the same string if their default encoding is ISO-8859-1, Windows-1252, or something else.

# Good practice: document and fix the encoding
import base64

data = "Data with special characters: éàü"
encoded = base64.b64encode(data.encode("utf-8"))
# Always decode with the same charset
decoded = base64.b64decode(encoded).decode("utf-8")
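The effect of a mismatched charset is visible even on a single character. In this sketch the same string produces two different Base64 values, and decoding one with the other's charset fails.

```python
import base64

text = "é"

utf8_b64 = base64.b64encode(text.encode("utf-8")).decode("ascii")
latin1_b64 = base64.b64encode(text.encode("latin-1")).decode("ascii")

print(utf8_b64)    # w6k=
print(latin1_b64)  # 6Q==

# Decoding the Latin-1 variant as UTF-8 raises instead of returning "é".
try:
    base64.b64decode(latin1_b64).decode("utf-8")
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True
```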

2. Never use Base64 to hide secrets

Any Base64-encoded value should be treated as public. If you cannot show the data in plain text, you cannot show it in Base64 either. Practical consequences:

Never commit a Base64-encoded API key, password, or secret to a git repository — even a private one.
Never store sensitive data Base64-encoded in a cookie or localStorage without prior encryption.
During a security audit, always decode suspicious Base64 values found in configs, headers, or tokens.

3. Watch out for injections via decoded Base64

If your application decodes user-supplied Base64 and uses the result in an SQL query, shell command, or XML/HTML parser, you are vulnerable to injection attacks. Base64 is a transport wrapper, not a security filter. Always validate and sanitize the data after decoding; checking only the Base64 text beforehand protects nothing.

// Bad: validating the Base64 does not protect against SQL injection
const username = atob(req.query.user); // may decode to "admin'--"
db.query(`SELECT * FROM users WHERE name = '${username}'`); // SQL INJECTION

// Correct: parameterize the query after decoding
const username = atob(req.query.user);
db.query("SELECT * FROM users WHERE name = ?", [username]);
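The same pattern in Python, as a self-contained sketch with an in-memory SQLite database. The table, its contents, and the find_user helper are made up for the example.

```python
import base64
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('admin')")

def find_user(encoded_name: str) -> list:
    """Decode a Base64 parameter, then query with a bound parameter."""
    name = base64.b64decode(encoded_name, validate=True).decode("utf-8")
    # The '?' placeholder keeps the decoded text out of the SQL syntax.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()

malicious = base64.b64encode(b"admin'--").decode("ascii")
legitimate = base64.b64encode(b"admin").decode("ascii")

print(find_user(malicious))   # []
print(find_user(legitimate))  # [('admin',)]
```

The injected quote and comment marker are treated as part of the name being searched for, so the query simply finds no match.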

4. Pick the right variant for the context

Simple rule:

In a URL or a JWT → Base64url (- and _, no padding)
In a MIME email or PEM certificate → MIME Base64 (with line wrapping)
Everywhere else (JSON, storage, API payloads) → standard Base64

5. Expected length and integrity check

A correctly padded Base64 string always has a length that is a multiple of 4. The formula: ceil(n / 3) * 4 characters for n input bytes. If an incoming Base64 string does not have a length that is a multiple of 4 (and is not Base64url without padding), that is a signal of truncation or corruption.

Conclusion: Base64 is a transport tool, not a protection mechanism

Base64 is a 1993 tool that still solves its original problem with remarkable efficiency: carrying binary data over text-only channels. In 2026 it is everywhere in the modern web — JWT tokens, Data URIs, REST APIs, TLS certificates, WebAuthn. Understanding its internals (6-bit groups, 64-symbol alphabet, 4/3 overhead) lets you avoid the two most common error categories: character encoding bugs (UTF-8 vs Latin-1) and the dangerous confusion with encryption.

The golden rule: Base64 = visibility, not security. Any data you cannot show in plain text should not simply be Base64-encoded — it must be encrypted with an appropriate algorithm, protected by access controls, or hashed irreversibly.

You can experiment with everything covered in this guide using our Base64 encoder and decoder tool — it handles standard Base64, URL-safe, and variants with or without padding. For further reading on data security, see also our password generator and our QR code generator, which puts several of the encodings described here into practice.
