System design questions

Base62 Encoding

Base62 encoding is a technique for representing integers or binary data as short strings using an alphabet of 62 alphanumeric characters: A-Z, a-z, and 0-9. Since the encoded output contains only letters and digits, it is naturally URL-safe and does not require additional escaping when used in URLs, filenames, or HTML.

The encoding process is similar to converting a decimal number into another numeral system. Instead of repeatedly dividing by 10 (decimal) or 2 (binary), the number is repeatedly divided by 62 until the quotient reaches 0. At each step, the remainder is used as an index into the Base62 alphabet. The encoded string is formed by reading the remainders in reverse order.

Alphabet:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789

Example: Encode 123

123 ÷ 62 = 1 remainder 61
  1 ÷ 62 = 0 remainder 1

Read the remainders in reverse:

1  -> 'B'
61 -> '9'

Encoded value: "B9"
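
The repeated-division steps above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the names ALPHABET and base62_encode are hypothetical, and the sketch assumes non-negative integers and the alphabet order shown above (A-Z, then a-z, then 0-9).

```python
# Alphabet order assumed from the example above: A-Z, a-z, 0-9.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"


def base62_encode(number: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if number < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if number == 0:
        return ALPHABET[0]  # zero maps to the first symbol, 'A'

    symbols = []
    while number > 0:
        # divmod returns the quotient and the remainder in one step.
        number, remainder = divmod(number, 62)
        symbols.append(ALPHABET[remainder])

    # Remainders come out least-significant first, so reverse them.
    return "".join(reversed(symbols))


print(base62_encode(123))  # B9
```

Other Base62 variants order the alphabet differently (digits first, for example), so the same integer can encode to a different string unless both sides agree on the alphabet.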

Decoding performs the reverse operation. Starting from the leftmost character, each character is converted back to its numeric value and multiplied by the appropriate power of 62 before summing the results to reconstruct the original number.

Decode "B9"

'B' = 1
'9' = 61

1 × 62¹ + 61 × 62⁰
= 62 + 61
= 123
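
Decoding can be sketched the same way. The snippet below is an assumption-laden illustration, not a definitive implementation: the names INDEX and base62_decode are hypothetical, and the alphabet order is the one used in the example above. Instead of computing each power of 62 explicitly, it multiplies a running total by 62 before adding each symbol's value, which yields the same sum.

```python
# Alphabet order assumed from the example above: A-Z, a-z, 0-9.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

# Reverse lookup table: symbol -> numeric value.
INDEX = {symbol: value for value, symbol in enumerate(ALPHABET)}


def base62_decode(text: str) -> int:
    """Decode a Base62 string back into a non-negative integer."""
    if not text:
        raise ValueError("cannot decode an empty string")

    total = 0
    for symbol in text:  # leftmost (most significant) symbol first
        if symbol not in INDEX:
            raise ValueError(f"invalid Base62 symbol: {symbol!r}")
        # Shift the running total one Base62 place, then add this symbol.
        total = total * 62 + INDEX[symbol]
    return total


print(base62_decode("B9"))  # 123
```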

Why use Base62?

It produces shorter strings than decimal representations, making identifiers easier to read, copy, and share. Each Base62 character carries about 5.95 bits of information, compared with about 3.32 bits for a decimal digit, so large values need a little more than half as many characters.

The output consists entirely of alphanumeric characters, making it safe for use in URLs, query parameters, filenames, and most text-based protocols without requiring percent encoding.

Encoding and decoding involve only repeated division, multiplication, and table lookups, making the algorithm simple and efficient to implement.
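
To make the length comparison concrete, the following sketch counts how many symbols a value needs in a given base. The helper name digits_needed and the sample values are chosen purely for illustration.

```python
def digits_needed(value: int, base: int) -> int:
    """Count the symbols needed to write a non-negative value in a base."""
    count = 1
    while value >= base:
        value //= base
        count += 1
    return count


one_million = 1_000_000
largest_64_bit = 2**64 - 1  # 18446744073709551615

print(digits_needed(one_million, 10), digits_needed(one_million, 62))        # 7 4
print(digits_needed(largest_64_bit, 10), digits_needed(largest_64_bit, 62))  # 20 11
```

Under these assumptions, any unsigned 64-bit identifier fits in at most 11 Base62 characters, against up to 20 decimal digits.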

Common use cases

URL shortening services, where long numeric identifiers are converted into compact links (for example, https://example.com/B9).

Generating compact public identifiers from database primary keys while avoiding long decimal values.

Creating short invitation codes, coupon codes, referral links, or other human-readable tokens.