Strings: Lesson 1

How Strings Work

Strings as character arrays in memory. How they are stored, accessed, and what makes them different from regular arrays.

What Is It?

A string is just an array of characters stored one after another in memory. The word "HELLO" is stored as five characters sitting in a row — like five boxes in a line, each box holding one letter.

But strings come with some extra features that a plain array doesn't have. They track where they end. They often come with built-in tools like search or compare. And in many languages, once you create a string you can't change it.

Strings show up everywhere in programming — URLs, usernames, messages, file contents. Knowing how they actually work will help you avoid slow code and confusing bugs.

Analogy

The Scrabble Rack

Imagine a Scrabble tile rack. Each slot holds exactly one tile, and each tile has a letter on it. The tiles sit side by side in a row.

String as a Scrabble rack: each slot holds one character

  Slot:  [0]  [1]  [2]  [3]  [4]

This is a string. Each slot is one byte in memory, each tile is a character code, and the slot number is the index.

But a rack by itself doesn't tell you how many tiles are on it. You need some way to know where the word ends. Two common approaches:

Put a special "end" marker after the last tile — like a blank tile that means "stop reading here." This is called null termination.
Write the count on a sticky note attached to the rack: "this rack has 5 tiles." This is called length prefixing.

Both solve the same problem: knowing where the string ends. But they work differently.


How It Works

Characters Are Numbers

Computers don't actually store the letter 'A' — they store the number 65. Every character is mapped to a number. This mapping is called a character encoding.

The most basic encoding is ASCII, which assigns numbers to common characters like letters, digits, and symbols:

Character   Code   Meaning
'A'         65     Uppercase A
'B'         66     Uppercase B
'Z'         90     Uppercase Z
'a'         97     Lowercase a
'z'         122    Lowercase z
'0'         48     Digit zero
'9'         57     Digit nine
' '         32     Space
'\n'        10     Newline

A few things worth noticing: 'A' is 65 and 'a' is 97 — they differ by 32. The digit character '5' is stored as 53, not 5. To turn the character '5' into the number 5, you subtract 48 (the code for '0'): 53 - 48 = 5.
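In Python, the built-ins `ord()` and `chr()` convert between a character and its code, so the arithmetic above can be checked directly. This is a minimal sketch; unlike C, Python won't let you subtract one character from another, so `ord()` makes the conversion explicit.

```python
# ord() gives a character's code; chr() goes the other way.
upper_a = ord("A")
lower_a = ord("a")
gap = lower_a - upper_a
print(upper_a, lower_a, gap)      # 65 97 32
print(chr(65))                    # A

# Turning the digit character '5' into the number 5.
digit_char = "5"
print(ord(digit_char))            # 53: the character code, not the value 5
value = ord(digit_char) - ord("0")  # 53 - 48
print(value)                      # 5
```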

ASCII covers basic English characters. For other languages and emoji, Unicode is used; it defines over 140,000 characters. The most common Unicode encoding is UTF-8, which uses 1 byte for ASCII characters and 2 to 4 bytes for everything else.
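To see the variable width of UTF-8, you can encode single characters and count the resulting bytes. A minimal sketch; the sample characters (A, é, €, and a grinning-face emoji) are just illustrative picks, written as escapes so the source stays plain ASCII.

```python
# How many bytes UTF-8 uses for different characters.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, é, €, grinning face

byte_counts = []
for ch in samples:
    encoded = ch.encode("utf-8")
    byte_counts.append(len(encoded))
    print(repr(ch), len(encoded), "byte(s):", encoded.hex(" "))

print(byte_counts)  # [1, 2, 3, 4]
```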

String Memory Layout

The string "HELLO" stored in memory looks like this:

Position:  0    1    2    3    4
          ┌────┬────┬────┬────┬────┐
          │ 72 │ 69 │ 76 │ 76 │ 79 │
          └────┴────┴────┴────┴────┘
           'H'  'E'  'L'  'L'  'O'
  Each box holds one character as a number.
  5 characters = 5 spots in memory.
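You can reproduce this layout by encoding the string to bytes and listing the value stored in each position. A minimal sketch, assuming plain ASCII text, where every character fits in exactly one byte.

```python
word = "HELLO"
codes = list(word.encode("ascii"))  # one byte per character in ASCII
print(codes)                        # [72, 69, 76, 76, 79]
print(len(codes))                   # 5
```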

But how does the program know the string is 5 characters long and not 500?

Null Termination vs. Length-Prefixed

Approach 1: Null Termination

A special value of 0 (called the null character, written as '\0') is placed right after the last character:

Null-terminated "HELLO" (6 bytes for a 5-character string)

Address:   200  201  202  203  204  205
          ┌────┬────┬────┬────┬────┬────┐
          │ 72 │ 69 │ 76 │ 76 │ 79 │  0 │
          └────┴────┴────┴────┴────┴────┘
           'H'  'E'  'L'  'L'  'O'  '\0'

A null byte (0) marks the end of the string.

Pros: Simple — just one extra spot used.

Cons: To find the length, you have to scan through every character until you hit the 0. That's O(n) — it takes longer the longer the string is.
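The scan described above can be sketched in Python with a hypothetical helper, `c_strlen`, that mimics what C's `strlen()` does: walk a byte buffer until it meets a 0. This is an illustration on a `bytes` object, not the C library's implementation. One difference: if the buffer has no null byte, this Python version raises `IndexError`, whereas C would keep reading past the end of the memory.

```python
def c_strlen(buf: bytes) -> int:
    """Count bytes before the first null byte, the way C's strlen() does."""
    length = 0
    while buf[length] != 0:   # visit characters one by one: O(n)
        length += 1
    return length

# "HELLO" followed by the null terminator, as laid out above.
hello = b"HELLO\x00"
print(c_strlen(hello))          # 5

# Anything after the terminator is never read.
print(c_strlen(b"HI\x00JUNK"))  # 2
```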

Approach 2: Length-Prefixed

The string stores its own length upfront, before the characters:

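Length-prefixed "HELLO" (4 bytes for the length + 5 bytes for the characters = 9 bytes total)

Address:    200-203  204  205  206  207  208
          ┌─────────┬────┬────┬────┬────┬────┐
          │    5    │ 72 │ 69 │ 76 │ 76 │ 79 │
          └─────────┴────┴────┴────┴────┴────┘
            length   'H'  'E'  'L'  'L'  'O'

The length sits in a header, so reading it is O(1). Because nothing scans for a 0 byte, the characters can include null bytes as data.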

Pros: Length is O(1) — you just read the number at the front.

Cons: Uses a little extra space for that length number.
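Here is a minimal sketch of the 9-byte layout above using Python's standard `struct` module. The helper names `pack_prefixed`, `read_length`, and `read_text` are hypothetical, and storing the length as a 4-byte little-endian unsigned integer is an assumption chosen to match the diagram; real languages use their own internal formats.

```python
import struct

def pack_prefixed(text: bytes) -> bytes:
    """Store a 4-byte little-endian length, then the raw characters."""
    return struct.pack("<I", len(text)) + text

def read_length(buf: bytes) -> int:
    """Read the length from the 4-byte header: O(1), no scanning."""
    (length,) = struct.unpack_from("<I", buf, 0)
    return length

def read_text(buf: bytes) -> bytes:
    """Slice out exactly `length` characters after the header."""
    return buf[4:4 + read_length(buf)]

packed = pack_prefixed(b"HELLO")
print(len(packed))             # 9: 4 header bytes + 5 characters
print(read_length(packed))     # 5

# A null byte inside the data is fine, because nothing scans for 0.
with_null = pack_prefixed(b"HE\x00LO")
print(read_length(with_null))  # 5
print(read_text(with_null))    # b'HE\x00LO'
```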

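Most modern languages store the length with the string instead of relying on a terminator. Some use a mix of both: C++'s std::string tracks its length but also keeps a null byte after the last character so the text can be passed to C functions.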

Accessing Individual Characters

Because strings are arrays, you can grab any character by its index instantly:

string = "HELLO"
string[0] → 'H'
string[3] → 'L'
string[4] → 'O'

This is O(1) — it doesn't matter how long the string is, jumping to any position takes the same amount of time.
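Indexing is O(1) because the position of character i is simple arithmetic: the starting address plus i times the size of one character. The sketch below uses a hypothetical `address_of` helper and a made-up starting address of 200 (matching the diagrams) to show the calculation, then indexes a real Python string. Note that indexing a Python `str` returns a one-character string, not a numeric code.

```python
def address_of(base: int, index: int, item_size: int = 1) -> int:
    """Address of element `index`: one multiply and one add, whatever the length."""
    return base + index * item_size

base = 200  # hypothetical starting address, as in the diagrams
print(address_of(base, 0))  # 200 -> 'H'
print(address_of(base, 4))  # 204 -> 'O'

s = "HELLO"
print(s[0], s[3], s[4])     # H L O
```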

Strings vs. Character Arrays: What's Different?

At a low level they look the same. The difference is in what rules come with them:

                Character Array          String
Memory          Contiguous bytes         Contiguous bytes
Index access    O(1)                     O(1)
Mutability      Usually mutable          Often immutable
Length          Tracked manually         Tracked automatically
Termination     You handle it            System handles it
Operations      None built in            Concat, search, etc. built in
Type safety     None, just raw bytes     Enforced as text
In short        Raw bytes                A language-level type

A string is a character array with guardrails: it tracks its own length, comes with built-in operations like search and compare, and often prevents you from changing individual characters directly.
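Python happens to offer both sides of this comparison: `bytearray` is a mutable sequence of byte values, close to a raw character array, while `str` is the immutable text type with text operations built in. A minimal sketch; Python's `bytearray` still ships with some helper methods of its own, so the analogy is about mutability and element type rather than a perfect match.

```python
# A bytearray behaves like a raw character array: mutable, elements are numbers.
raw = bytearray(b"HELLO")
raw[0] = ord("J")        # change one slot in place
print(raw)               # bytearray(b'JELLO')
print(raw[1])            # 69, a number, not a letter

# A str is a language-level text type: immutable, with built-in operations.
text = "HELLO"
print(text.find("LL"))   # 2
print(text.lower())      # hello
print(text + " WORLD")   # HELLO WORLD
```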

Examples

Example 1: What "CAT" looks like in memory

string = "CAT"
In memory (null-terminated):
  Position:  0    1    2    3
            ┌────┬────┬────┬────┐
            │ 67 │ 65 │ 84 │  0 │
            └────┴────┴────┴────┘
             'C'  'A'  'T'  end
  string[0] = 67 → 'C'
  string[1] = 65 → 'A'
  string[2] = 84 → 'T'
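Python's standard `ctypes` module can allocate a real C-style buffer, which makes the hidden terminator from Example 1 visible. A minimal sketch using `ctypes.create_string_buffer`, which, when given only the initial bytes, allocates one extra byte for the null terminator.

```python
import ctypes

# Copies b"CAT" into a C char array and appends a trailing null byte.
buf = ctypes.create_string_buffer(b"CAT")
print(ctypes.sizeof(buf))  # 4: three characters + the null terminator
print(list(buf.raw))       # [67, 65, 84, 0]
print(buf.value)           # b'CAT' (reads up to the null byte)
```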

Example 2: Finding string length

Null-Terminated                         Length-Prefixed
Walk through every character            Length stored in a header
Count until \0 is reached               Read one value instantly
Must scan the entire string             No scanning required
Simple, minimal overhead                Can contain null bytes as data
Scan until the null byte is found       Read the length from the metadata header
Cost: O(n) to find the length           Cost: O(1) to find the length

Common Mistakes

Assuming one character = one byte. This is true for basic English with ASCII. But with UTF-8, a single emoji or accented character can take 2-4 bytes. The string "hello" is 5 bytes, but a string of 5 emoji might be 20 bytes.
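The gap between characters and bytes is easy to measure in Python, where `len()` on a `str` counts code points while `len()` on the encoded `bytes` counts bytes. A minimal sketch using five copies of one emoji (U+1F600); some other emoji are built from several code points, so their counts can differ.

```python
word = "hello"
smileys = "\U0001F600" * 5   # five grinning-face emoji

print(len(word), len(word.encode("utf-8")))        # 5 5
print(len(smileys), len(smileys.encode("utf-8")))  # 5 20
```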

Treating character codes as the actual number. The character '5' is stored as 53, not 5. To convert the character '5' to the number 5, subtract the code of '0' (which is 48): '5' - '0' = 5.

Trying to modify a string that can't be changed. In many languages, string[2] = 'X' is not allowed — strings are immutable (can't be changed after creation). You have to make a new string instead.
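Python's `str` is one of the immutable cases: assigning to an index raises a `TypeError`. A minimal sketch of the error and of the usual workaround, building a new string from slices.

```python
s = "HELLO"
assignment_failed = False
try:
    s[2] = "X"               # str does not support item assignment
except TypeError as err:
    assignment_failed = True
    print("TypeError:", err)

# Build a new string instead; the original is untouched.
changed = s[:2] + "X" + s[3:]
print(changed)               # HEXLO
print(s)                     # HELLO
```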

Comparing strings like numbers. Strings are compared character by character. "9" is greater than "10" in string comparison because '9' > '1'. This trips people up when sorting numbers stored as strings.
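A quick Python check of this pitfall, plus the usual fix when sorting: pass `key=int` so the values are compared as numbers rather than character by character.

```python
print("9" > "10")               # True: '9' (57) beats '1' (49) at position 0

values = ["10", "9", "2", "100"]
by_text = sorted(values)              # ['10', '100', '2', '9']  (character order)
by_number = sorted(values, key=int)   # ['2', '9', '10', '100']  (numeric order)
print(by_text)
print(by_number)
```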

Best Practices

Know which character encoding your system uses — it affects how length and indexing work
Never assume one character equals one byte if your text might include non-English characters
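Cache the string length in a variable before a loop instead of recalculating it every iteration, especially with null-terminated strings, where each length check rescans the whole string
In lower-level languages such as C, remember that getting a character by index gives you a number (its code), not a visual symbol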
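The caching advice matters most for null-terminated strings, where asking for the length means rescanning. The sketch below uses a hypothetical `scan_length` helper that counts every byte it reads, and compares a loop that recomputes the length on each check with one that computes it once. In Python itself, `len()` on a `str` is O(1), so this models the C situation rather than giving a Python performance tip.

```python
reads = 0  # total bytes examined by scan_length

def scan_length(buf: bytes) -> int:
    """Hypothetical strlen that counts every byte it looks at."""
    global reads
    n = 0
    while True:
        reads += 1
        if buf[n] == 0:
            return n
        n += 1

text = b"HELLO\x00"

# Recomputing the length in the loop condition: a full rescan on every check.
reads = 0
i = 0
while i < scan_length(text):
    i += 1
slow_reads = reads

# Computing the length once before the loop.
reads = 0
length = scan_length(text)
i = 0
while i < length:
    i += 1
fast_reads = reads

print(slow_reads, fast_reads)  # 36 6
```

For a string of n characters, the first loop reads (n + 1) × (n + 1) bytes while the second reads n + 1, so the gap grows quickly with longer strings.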
