StringsLesson 3

String Immutability and Builders

Why modifying a string creates a new one, the hidden O(n²) cost of repeated concatenation, and how string builders fix it.

What Is It?

In many languages, strings are immutable — that means once a string is created, it cannot be changed. You can't modify the third character. You can't add to the end. You can't delete from the middle. Every "change" actually creates a brand new string.

This sounds wasteful. Sometimes it is. But it also makes strings much safer to work with — you can pass a string to a function and know it won't be silently changed on you.

The problem shows up when you need to build a string piece by piece — like assembling a message one character at a time, or joining many small strings together. With immutable strings, every concatenation copies everything you've built so far. Do this in a loop and the cost explodes. A string builder is the solution: a mutable buffer where you can append cheaply, then convert to a real string at the end.

Analogy

Carving in Stone vs. Writing on a Whiteboard

Immutable strings are like words carved in stone. Each stone tablet is permanent. If you want to add a word, you don't chisel it onto the existing tablet — you get a brand new tablet and carve the entire message from scratch, old words and new.

Building a long message one word at a time with stone tablets:

  1. Carve "Hello" on tablet 1 (5 characters carved)
  2. Want to add "World"? Carve "Hello World" on tablet 2 (11 characters carved), discard tablet 1
  3. Want to add "How"? Carve "Hello World How" on tablet 3 (15 characters carved), discard tablet 2

Every step, you re-carve everything you've already written. The total carving effort keeps growing.

A string builder is the whiteboard. You write with a marker. Adding a word just means writing it after the last word — you don't rewrite anything. When you're done, you take a photo of the whiteboard (convert to a string). One copy, done.

How It Works

Why Immutability?

Immutability has real benefits. Here are the two most important:

1. Safe to share

If a string can never change, you can pass it to any function without worrying that the function will secretly alter it:

pseudocode
1
2
3
4
5
6
7
8
9
// With mutable strings — risky:
name = "Alice"
greetUser(name) // could this change name to something else?
saveToDatabase(name) // are we saving the original or a changed version?
// With immutable strings — safe:
name = "Alice"
greetUser(name) // name cannot change, period
saveToDatabase(name) // still "Alice", guaranteed

2. Safe as dictionary keys

When you store something in a dictionary (also called a hash map) with a string key, the system uses the key's characters to decide where to store it. If the key changed after insertion, the system would look in the wrong place and never find your value again:

pseudocode
1
2
3
4
5
6
7
8
// With mutable key — broken:
key = "hello"
map.put(key, 42) // stored based on "hello"
key[0] = 'j' // key is now "jello"
map.get(key) // looks for "jello" → not found!
// the entry is stuck, unreachable
// With immutable key — always works

The O(n²) Concatenation Problem

Here's the expensive trap. Suppose you build a string by adding one character at a time in a loop:

pseudocode
1
2
3
result = ""
for i = 0 to n - 1:
result = result + characters[i] // creates a NEW string every time

Trace with n = 5, building "ABCDE":

1
2
3
4
5
6
7
Iteration 0: result = "" + "A" → copy 1 char → "A"
Iteration 1: result = "A" + "B" → copy 2 chars → "AB"
Iteration 2: result = "AB" + "C" → copy 3 chars → "ABC"
Iteration 3: result = "ABC" + "D" → copy 4 chars → "ABCD"
Iteration 4: result = "ABCD" + "E" → copy 5 chars → "ABCDE"
Total characters copied: 1 + 2 + 3 + 4 + 5 = 15

For n characters: 1 + 2 + 3 + ... + n = n(n+1)/2 = O(n²)

The wasted work is easy to see:

1
2
3
4
5
6
7
8
Iteration 0: [A] → 1 copy, then thrown away
Iteration 1: [A][B] → 2 copies, then thrown away
Iteration 2: [A][B][C] → 3 copies, then thrown away
Iteration 3: [A][B][C][D] → 4 copies, then thrown away
Iteration 4: [A][B][C][D][E] → 5 copies ← only this survives
─────────────────────────────────
15 total copies for a 5-character string
4 intermediate strings created and immediately discarded

With 10,000 characters, that's around 50 million copies. With 100,000 characters, around 5 billion. This is one of the most common performance bugs in software.

The String Builder Solution

A string builder is a mutable buffer — a resizable array of characters that you can keep appending to. When you're done, you convert it to an immutable string once.

pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
class StringBuilder:
buffer = allocate(16) // start with space for 16 characters
size = 0
capacity = 16
function append(str):
strLen = length(str)
while size + strLen > capacity:
resize(capacity * 2) // double the buffer if needed
for i = 0 to strLen - 1:
buffer[size + i] = str[i]
size = size + strLen
function appendChar(ch):
if size == capacity:
resize(capacity * 2)
buffer[size] = ch
size = size + 1
function toString():
result = allocate(size)
for i = 0 to size - 1:
result[i] = buffer[i]
return createImmutableString(result)
function resize(newCapacity):
newBuffer = allocate(newCapacity)
for i = 0 to size - 1:
newBuffer[i] = buffer[i]
buffer = newBuffer
capacity = newCapacity

Now the same loop:

pseudocode
1
2
3
4
builder = new StringBuilder()
for i = 0 to n - 1:
builder.appendChar(characters[i]) // each append is O(1)
result = builder.toString() // one final copy at the end: O(n)

Trace with n = 5, building "ABCDE":

1
2
3
4
5
6
7
8
appendChar('A'): buffer=['A',_,_,...], size=1 → 1 copy
appendChar('B'): buffer=['A','B',_,...], size=2 → 1 copy
appendChar('C'): buffer=['A','B','C',...], size=3 → 1 copy
appendChar('D'): size=4 → 1 copy
appendChar('E'): size=5 → 1 copy
toString(): copy 5 chars to final string → 5 copies
Total copies: 10 (vs. 15 with repeated concatenation, and much better at scale)

For large n, the difference is dramatic:

n = 100
5,050concat copies
n = 1,000
500,500concat copies
n = 10,000
50Mconcat copies
n = 100,000
5Bconcat copies

When Immutability Helps vs. Hurts

Immutable Strings

  • ✓ Safe to pass to multiple functions
  • ✓ Hash map keys never change
  • ✓ No locks needed for threads
  • ✓ Can share/intern repeated strings
  • ✗ O(n²) for building in loops
  • ✗ Must copy entire string to modify

safe by default

Mutable Strings

  • ✗ Must copy defensively
  • ✗ Keys could mutate (hash breaks)
  • ✗ Needs synchronization for threads
  • ✗ Each variable needs its own copy
  • ✓ O(n) with buffer for building
  • ✓ O(1) to modify single char

flexible but risky

The pattern in practice: use immutable strings for storing, passing around, and comparing. Switch to a builder only when building a string step by step. Convert back to immutable when done.

Examples

Example 1: Building a CSV line

pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
// BAD — O(n²): each concatenation copies everything so far
function buildCSV_bad(values):
line = ""
for i = 0 to length(values) - 1:
if i > 0:
line = line + "," // copies the whole line built so far
line = line + values[i] // copies it again
return line
// GOOD — O(n): each append just writes to the end of the buffer
function buildCSV_good(values):
builder = new StringBuilder()
for i = 0 to length(values) - 1:
if i > 0:
builder.append(",")
builder.append(values[i])
return builder.toString() // one copy at the very end

Example 2: Why changing one character means creating a whole new string

pseudocode
1
2
3
4
5
6
7
8
9
// Want to change "Hello" to "Jello" (swap first character)
// With immutable strings — you must build a new one:
original = "Hello"
newStr = "J" + substring(original, 1, 5) // copies 5 characters total
// With a mutable char array — you just change the one character:
chars = ['H','e','l','l','o']
chars[0] = 'J' // O(1), no copy needed

Common Mistakes

  1. Concatenating in a loop without realizing the cost. This is the most common string performance mistake. Every concatenation of immutable strings copies everything built so far. The fix is always a string builder.
  1. Using a string builder for just two or three strings. If you're combining a small number of strings once (not in a loop), plain concatenation is fine. The builder adds complexity that isn't worth it for simple cases.
  1. Forgetting to call toString() at the end. A string builder's internal buffer is not a string. You must explicitly call toString() when you're done to get the actual string out.
  1. Not pre-allocating when you know the size. If you know your final string will be about 1,000 characters, create the builder with that initial capacity. It avoids the buffer growing and being copied multiple times during construction.

Best Practices

  • Use a string builder any time you concatenate strings inside a loop
  • If you know roughly how long the final string will be, set that as the initial capacity
  • Keep strings immutable by default — only use a builder during construction, then convert back
  • A few concatenations outside of loops are perfectly fine — only optimize when you're in a loop