How Strings Work
Strings as character arrays in memory. How they are stored, accessed, and what makes them different from regular arrays.
What Is It?
A string is just an array of characters stored one after another in memory. The word "HELLO" is stored as five characters sitting in a row — like five boxes in a line, each box holding one letter.
But strings come with some extra features that a plain array doesn't have. They track where they end. They often come with built-in tools like search or compare. And in many languages, once you create a string you can't change it.
Strings show up everywhere in programming — URLs, usernames, messages, file contents. Knowing how they actually work will help you avoid slow code and confusing bugs.
Analogy
The Scrabble Rack
Imagine a Scrabble tile rack. Each slot holds exactly one tile, and each tile has a letter on it. The tiles sit side by side in a row.
This is a string. Each slot is one spot in memory. Each tile is one character. The slot number is the index.
But a rack by itself doesn't tell you how many tiles are on it. You need some way to know where the word ends. Two common approaches:
- Put a special "end" marker after the last tile — like a blank tile that means "stop reading here." This is called null termination.
- Write the count on a sticky note attached to the rack — "this rack has 5 tiles." This is called length-prefixed.
Both solve the same problem: knowing where the string ends. But they work differently.
How It Works
Characters Are Numbers
Computers don't actually store the letter 'A' — they store the number 65. Every character is mapped to a number. This mapping is called a character encoding.
The most basic encoding is ASCII, which assigns the numbers 0 through 127 to common characters such as letters, digits, and punctuation.
A few things worth noticing: 'A' is 65 and 'a' is 97 — they differ by 32. The digit character '5' is stored as 53, not 5. To turn the character '5' into the number 5, you subtract 48 (the code for '0'): 53 - 48 = 5.
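As a rough illustration of the mapping above, the following Python sketch uses the built-in ord() and chr() functions. The helper name digit_value is made up for this example and is not part of any library.

```python
# ord() returns the numeric code of a character; chr() goes the other way.
code_upper_a = ord("A")                    # 65
code_lower_a = ord("a")                    # 97
case_gap = code_lower_a - code_upper_a     # 32

# The digit character '5' is stored as 53, not 5.
code_five = ord("5")


def digit_value(ch: str) -> int:
    """Hypothetical helper: turn one digit character ('0'..'9') into its number."""
    return ord(ch) - ord("0")


print(code_upper_a, code_lower_a, case_gap)
print(code_five, digit_value("5"))
print(chr(65))
```

Subtracting ord("0") rather than a hard-coded 48 keeps the intent readable and gives the same result.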
ASCII covers basic English characters. For other languages and emoji, Unicode is used; it defines over 140,000 characters. The most common Unicode encoding is UTF-8, which uses 1 byte for ASCII characters and 2 to 4 bytes for everything else.
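One way to see the variable width of UTF-8 is to encode a few characters and count the bytes. This is a small sketch; the sample characters are chosen only for illustration and are written as escapes so the file encoding does not matter.

```python
# One character from each UTF-8 width class.
samples = [
    "A",           # basic Latin letter
    "\u00e9",      # e with acute accent
    "\u20ac",      # euro sign
    "\U0001F600",  # grinning face emoji
]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(repr(ch), "takes", len(encoded), "byte(s)")

# Byte counts for the four samples, in order: 1, 2, 3, 4.
utf8_sizes = [len(ch.encode("utf-8")) for ch in samples]

# A string can have fewer characters than bytes.
emoji_text = "\U0001F600" * 5
char_count = len(emoji_text)                   # 5 characters
byte_count = len(emoji_text.encode("utf-8"))   # 20 bytes
```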
String Memory Layout
The string "HELLO" stored in memory looks like this:
Position:  0    1    2    3    4
         ┌────┬────┬────┬────┬────┐
         │ 72 │ 69 │ 76 │ 76 │ 79 │
         └────┴────┴────┴────┴────┘
          'H'  'E'  'L'  'L'  'O'
Each box holds one character as a number.
5 characters = 5 spots in memory.
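The same layout can be inspected from Python by encoding the text to bytes. Note the assumption: this shows the ASCII-encoded bytes, not necessarily how a particular interpreter stores its own string objects internally.

```python
text = "HELLO"

# Encode to raw bytes: one byte per character for ASCII text.
raw = text.encode("ascii")

# Iterating over a bytes object yields the numeric codes.
codes = list(raw)

for position, code in enumerate(codes):
    print(position, code, chr(code))
```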
But how does the program know the string is 5 characters long and not 500?
Null Termination vs. Length-Prefixed
Approach 1: Null Termination
A special value of 0 (called the null character, written as '\0') is placed right after the last character:
For "HELLO", memory holds 72, 69, 76, 76, 79, 0. The null byte (0) marks the end, so finding the length means scanning until you hit 0, which is O(n).
Pros: Simple — just one extra spot used.
Cons: To find the length, you have to scan through every character until you hit the 0. That's O(n) — it takes longer the longer the string is.
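The scan described above can be sketched in Python over a bytes buffer. The function name null_terminated_length is hypothetical; it mimics the idea behind C's strlen and assumes a terminating 0 byte is actually present.

```python
def null_terminated_length(buffer: bytes) -> int:
    """Count the bytes before the first 0 byte.

    Assumes the buffer contains a terminator; without one, the loop
    would run off the end and raise IndexError.
    """
    length = 0
    while buffer[length] != 0:
        length += 1
    return length


hello = b"HELLO\x00"
print(null_terminated_length(hello))

# A 0 byte in the middle of the data ends the string early.
truncated = b"HI\x00THERE\x00"
print(null_terminated_length(truncated))
```

The second call shows a side effect of this design: the data itself cannot contain a 0 byte, because the scan stops at the first one.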
Approach 2: Length-Prefixed
The string stores its own length upfront, before the characters:
For "HELLO", memory holds the length 5 followed by 72, 69, 76, 76, 79. The length is read from the header in O(1), and the string can contain null bytes as data.
Pros: Length is O(1) — you just read the number at the front.
Cons: Uses a little extra space for that length number.
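A length-prefixed layout might look like the sketch below. The 4-byte little-endian header and the function names are assumptions made for this example; real languages and file formats choose their own header size and layout.

```python
import struct

# Assumed layout: a 4-byte little-endian unsigned length, then the data.
HEADER = struct.Struct("<I")


def pack_string(data: bytes) -> bytes:
    """Prepend the length header to the raw character data."""
    return HEADER.pack(len(data)) + data


def read_length(buffer: bytes) -> int:
    """Read the length straight from the header, with no scanning."""
    (length,) = HEADER.unpack_from(buffer, 0)
    return length


def read_data(buffer: bytes) -> bytes:
    """Return exactly the bytes the header says belong to the string."""
    length = read_length(buffer)
    return buffer[HEADER.size:HEADER.size + length]


# The data contains a 0 byte, which is fine here.
packed = pack_string(b"HEL\x00LO")
print(read_length(packed))
print(read_data(packed))
```

The cost is visible in the total size: 6 data bytes become 10 bytes once the header is included.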
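Most modern languages (Java, Python, Go, and Rust, for example) store the length with the string rather than scanning for a terminator. C uses null termination, and C++'s std::string mixes both: it stores the length and also keeps a trailing null byte for compatibility with C.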
Accessing Individual Characters
Because strings are arrays, you can grab any character by its index instantly:
string = "HELLO"
string[0] → 'H'
string[3] → 'L'
string[4] → 'O'
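This is O(1): however long the string is, jumping to any position takes the same amount of time. That holds when every character occupies the same number of bytes. With a variable-width encoding such as UTF-8, jumping to a byte is still O(1), but finding the nth character can require scanning from the start.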
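In Python, index access looks like this. The sketch also shows one language-specific detail: indexing a str returns a one-character string, while indexing the encoded bytes returns the numeric code.

```python
word = "HELLO"

first = word[0]    # 'H'
fourth = word[3]   # 'L'
last = word[4]     # 'O'

# The same positions in the ASCII-encoded bytes hold the character codes.
raw = word.encode("ascii")
first_code = raw[0]   # 72
last_code = raw[4]    # 79

print(first, fourth, last)
print(first_code, last_code)
```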
Strings vs. Character Arrays: What's Different?
At a low level they look the same. The difference is in what rules come with them:
Character Array (raw bytes)
- Contiguous bytes in memory
- O(1) index access
- Usually mutable
- Manual length tracking
- You handle termination
- No built-in operations
- Just bytes, with no type safety
String (language-level type)
- Contiguous bytes in memory
- O(1) index access
- Often immutable
- Automatic length tracking
- System handles termination
- Concat, search, etc. built in
- Enforced as text
A string is a character array with guardrails: it tracks its own length, comes with built-in operations like search and compare, and often prevents you from changing individual characters directly.
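Python happens to offer both flavors, which makes the contrast easy to try. In this sketch, bytearray stands in for a plain character array and str for a string type; other languages draw the line differently.

```python
# A bytearray behaves like a raw, mutable character array.
raw = bytearray(b"HELLO")
raw[0] = ord("J")          # overwrite one slot in place
print(raw)                 # the buffer now holds the bytes for "JELLO"

# A str tracks its own length and comes with text operations.
text = "HELLO"
length = len(text)             # no manual counting or terminator
position = text.find("LL")     # built-in search
lowered = text.lower()         # returns a new string; text is unchanged
is_same = text == "HELLO"      # built-in comparison

print(length, position, lowered, is_same)
```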
Examples
Example 1: What "CAT" looks like in memory
string = "CAT"
In memory (null-terminated):
Position:  0    1    2    3
         ┌────┬────┬────┬────┐
         │ 67 │ 65 │ 84 │  0 │
         └────┴────┴────┴────┘
          'C'  'A'  'T'  end
string[0] = 67 → 'C'
string[1] = 65 → 'A'
string[2] = 84 → 'T'
Example 2: Finding string length
Null-Terminated
- Walk through every character
- Count until \0 is reached
- Must scan entire string
- Simple, with minimal overhead
- Result: scan until the null byte is found, so finding the length is O(n)
Length-Prefixed
- Length stored in header
- Read one value instantly
- No scanning required
- Can contain null bytes as data
- Result: read the length from the metadata header, so finding the length is O(1)
Common Mistakes
- Assuming one character = one byte. This is true for basic English with ASCII. But with UTF-8, a single emoji or accented character can take 2-4 bytes. The string "hello" is 5 bytes, but a string of 5 emoji might be 20 bytes.
- Treating character codes as the actual number. The character '5' is stored as 53, not 5. To convert the character '5' to the number 5, subtract the code of '0' (which is 48): '5' - '0' = 5.
- Trying to modify a string that can't be changed. In many languages, string[2] = 'X' is not allowed: strings are immutable (can't be changed after creation). You have to make a new string instead.
- Comparing strings like numbers. Strings are compared character by character. "9" is greater than "10" in string comparison because '9' > '1'. This trips people up when sorting numbers stored as strings.
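Two of the mistakes above are easy to reproduce in Python, where strings are immutable and compared character by character. This is a sketch of the symptoms and one common workaround for each, not the only way to handle them.

```python
word = "HELLO"

# Mistake: assigning to one position of an immutable string.
try:
    word[2] = "X"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Workaround: build a new string from the pieces around that position.
changed = word[:2] + "X" + word[3:]

# Mistake: comparing numbers while they are still text.
string_order = "9" > "10"              # True, because '9' > '1'
numeric_order = int("9") > int("10")   # False

values = ["10", "9", "2"]
sorted_as_text = sorted(values)                # ['10', '2', '9']
sorted_as_numbers = sorted(values, key=int)    # ['2', '9', '10']

print(assignment_failed, changed)
print(string_order, numeric_order)
print(sorted_as_text, sorted_as_numbers)
```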
Best Practices
- Know which character encoding your system uses — it affects how length and indexing work
- Never assume one character equals one byte if your text might include non-English characters
- If finding the length is O(n), as with null-terminated strings, cache it in a variable before a loop instead of recalculating it every iteration
- When you get a character by index in a language like C, remember you're getting a number (its code), not a visual symbol; other languages, such as Python, give you a one-character string instead