
String Iteration (bytes vs runes)

Last Updated: May 17, 2026

13 min read

Go gives you two different ways to walk through a string, and they look almost identical until you put a name like José or an emoji in the data. One iteration mode hands you raw bytes; the other decodes UTF-8 and hands you whole characters. Picking the wrong one is the source of most string bugs in Go, and the fix usually changes one line. This lesson covers both modes, the byte-index gotcha that trips up everyone the first time, and how to do the common tasks (counting characters, reversing, taking the first N characters) correctly.

Two Loops, Two Different Things

The setup is simple. A Go string is a read-only sequence of bytes, and those bytes happen to be UTF-8 encoded text. A rune represents a Unicode code point, and UTF-8 packs a single code point into one, two, three, or four bytes, which is why len(s) measures bytes rather than characters.

The two iteration styles are:
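As a minimal sketch of the byte-index form on an ASCII-only value (the name value and the byteLines helper are illustrative assumptions, not part of the original lesson):

```go
package main

import "fmt"

// byteLines walks s one byte at a time and formats each step.
// Hypothetical helper, used only to keep the example easy to check.
func byteLines(s string) []string {
	var lines []string
	for i := 0; i < len(s); i++ {
		// s[i] is a single byte (uint8), not a rune.
		lines = append(lines, fmt.Sprintf("%d: %c", i, s[i]))
	}
	return lines
}

func main() {
	name := "Jose" // ASCII only, so one byte per character
	for _, line := range byteLines(name) {
		fmt.Println(line)
	}
}
```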

For an ASCII string, byte and character line up one to one, so the loop above looks fine. Now compare it with the range form on the same value:
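A sketch of the range form on an ASCII-only value might look like this (runeLines is a hypothetical helper and "Jose" is an assumed sample value):

```go
package main

import "fmt"

// runeLines walks s with range, which decodes one rune per iteration.
// Hypothetical helper, used only to keep the example easy to check.
func runeLines(s string) []string {
	var lines []string
	for i, r := range s {
		// i is the byte offset of the rune, r is the decoded rune.
		lines = append(lines, fmt.Sprintf("%d: %c", i, r))
	}
	return lines
}

func main() {
	name := "Jose" // ASCII only, so offsets run 0, 1, 2, 3
	for _, line := range runeLines(name) {
		fmt.Println(line)
	}
}
```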

Same output, different machinery. The byte loop reads one byte at a time. The range loop decodes UTF-8 and yields a rune for each character. With an ASCII-only string the two are indistinguishable, which is why the difference is so easy to miss.

The difference shows up the moment a non-ASCII character appears. The string "José" has four characters, but é takes two bytes in UTF-8 (0xC3 0xA9), so the byte length is 5, not 4. The two loops now disagree.
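To see the disagreement concretely, here is a small sketch that runs both loops over "José". The loopCounts helper is an illustrative name, not something from the original lesson.

```go
package main

import "fmt"

// loopCounts reports how many iterations each loop style performs on s.
func loopCounts(s string) (byteSteps, runeSteps int) {
	for i := 0; i < len(s); i++ {
		byteSteps++ // one step per byte
	}
	for range s {
		runeSteps++ // one step per decoded rune
	}
	return byteSteps, runeSteps
}

func main() {
	name := "José"
	for i := 0; i < len(name); i++ {
		fmt.Printf("byte %d: %d\n", i, name[i])
	}
	for i, r := range name {
		fmt.Printf("rune at byte %d: %c\n", i, r)
	}
	b, r := loopCounts(name)
	fmt.Println(b, r) // 5 4
}
```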

The byte loop runs 5 times, one per byte, and hands back two raw bytes (195 and 169) for the é. The range loop runs 4 times and yields the é as a single rune. Same string, same data in memory, two different views of it.

The range loop is what most code wants when it's working with text. The byte loop is what you want when you're working with bytes (parsing a wire protocol, scanning for an ASCII delimiter, computing a checksum).

Byte Iteration With an Index

Byte iteration uses a classic C-style for loop. Each step pulls one byte out of the string with the index operator s[i]. The type of s[i] is byte, which is just an alias for uint8.
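A minimal sketch of such a loop, assuming a made-up ASCII product code ("SKU-4821") and a hypothetical countDigits helper that does a typical piece of byte-oriented work:

```go
package main

import "fmt"

// countDigits counts ASCII digits by inspecting one byte at a time.
func countDigits(code string) int {
	n := 0
	for i := 0; i < len(code); i++ {
		if b := code[i]; b >= '0' && b <= '9' {
			n++
		}
	}
	return n
}

func main() {
	code := "SKU-4821" // hypothetical ASCII-only product code
	for i := 0; i < len(code); i++ {
		// code[i] is a byte; %c prints it as a character, %d as a number.
		fmt.Printf("%d: %c (%d)\n", i, code[i], code[i])
	}
	fmt.Println("digits:", countDigits(code)) // digits: 4
}
```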

A few things to note:

  • len(code) is the byte length, which is what you want for this loop.
  • code[i] returns a byte, not a rune. With %c it still prints as a character because every byte under 128 is a valid one-byte UTF-8 sequence and matches the ASCII character.
  • For an all-ASCII string this loop visits each character exactly once.

The byte loop is the right tool for byte-oriented work. Parsing a product code that's guaranteed to be ASCII, checking that a string starts with "SKU-", computing a hash byte by byte, all of these are byte work. The loop is straightforward, allocates nothing, and runs in O(n) where n is the byte length.

The problem starts when the data isn't all ASCII and you forget. The byte loop will happily walk through "José" and give you the bytes 74, 111, 115, 195, 169. If your code treats each of those as a character, you'll print garbage for the last two and you'll think the string has 5 characters when it has 4.

Rune Iteration With Range

The for ... range form over a string is rune-aware. The Go specification defines this specifically: ranging over a string decodes UTF-8 one code point at a time and yields the byte index of where that code point starts together with the rune itself.
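The sketch below ranges over an assumed review string chosen so that its first star begins at byte 15; the runeStarts helper is hypothetical and only collects the index that range yields.

```go
package main

import "fmt"

// runeStarts returns the byte offset at which each rune of s begins.
func runeStarts(s string) []int {
	var starts []int
	for i := range s {
		starts = append(starts, i)
	}
	return starts
}

func main() {
	review := "Great product! ⭐⭐⭐⭐⭐" // illustrative value: 15 ASCII bytes, then 5 stars
	for i, r := range review {
		// Columns: byte index, quoted rune, code point as a number.
		fmt.Printf("%2d  %q  %d\n", i, r, r)
	}
	fmt.Println(runeStarts(review)[15:]) // [15 18 21 24 27]
}
```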

Look at the index column. It runs 0, 1, 2, ... 15, then jumps to 18, then 21, then 24, then 27. The star takes three bytes in UTF-8 (0xE2 0xAD 0x90), so each successive star's starting byte is three positions later than the last.

This is the part that surprises everyone the first time. The i in for i, r := range s is not a sequential 0-based character counter. It is the byte offset of the first byte of the current rune. For an ASCII-only string the two are the same number, which is exactly why the bug is hard to find by accident.

Why does Go work this way? Because the byte index is the only thing that's safe to use as a substring boundary. If you have the rune at index 15 and you want everything before it, you can write review[:15]. If i were a character count, you couldn't slice with it at all without re-walking the string.

The rune type behind r is an alias for int32. A rune holds a single Unicode code point, which is a number, not a string. That's why printing it with %d shows 11088 and with %q shows '⭐'.

The Byte-Index Gotcha

The single most common bug in this area is treating the index from range as a sequential counter. Here's the wrong pattern:
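A sketch of the mistake, assuming the name "José" as input. The bytesAtRuneStarts helper is a hypothetical name that isolates the faulty s[i] lookup.

```go
package main

import "fmt"

// bytesAtRuneStarts mirrors the mistake: it reads s[i] for each range index.
func bytesAtRuneStarts(s string) []byte {
	var out []byte
	for i := range s {
		out = append(out, s[i]) // a raw byte, not the decoded rune
	}
	return out
}

func main() {
	name := "José"
	for i, r := range name {
		// Wrong: name[i] is the first byte of the rune, not the rune itself.
		fmt.Printf("i=%d r=%c (%d) name[i]=%d\n", i, r, r, name[i])
	}
	fmt.Println(bytesAtRuneStarts(name)) // [74 111 115 195]
}
```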

Look at the last line. The rune is 'é' (code point 233), but name[i] is 195, which is the first byte of é's UTF-8 encoding, not the rune itself. The two Printf arguments disagree because they're measuring different things. r is the decoded rune; name[i] is one raw byte sitting at offset i.

The fix is to just use r. The whole point of the range form is that it hands you the rune already. Reaching back to name[i] defeats the purpose and reintroduces the byte view.

The other side of the same bug is trying to index a string with a "character number" you've been counting yourself:

What's wrong with this code?
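A sketch of the kind of code in question, assuming the intent is to print the fourth character of "José" (fourthByte is a hypothetical helper):

```go
package main

import "fmt"

// fourthByte formats s[3] with %c. It assumes len(s) >= 4.
func fourthByte(s string) string {
	return fmt.Sprintf("%c", s[3]) // s[3] is the fourth byte, not the fourth character
}

func main() {
	name := "José"
	fmt.Println(fourthByte(name))
}
```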

This prints Ã, not é. The intent is "the fourth character", but name[3] is the fourth byte, which is the first byte of the two-byte sequence that encodes é. Printed alone, that byte is the Latin-1 character Ã, which is not what anyone wanted.

Fix:
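One way to write the fix, sketched with a hypothetical nthRune helper that adds a bounds check:

```go
package main

import "fmt"

// nthRune returns the n-th character (0-based) of s.
// The bool is false when n is out of range.
func nthRune(s string, n int) (rune, bool) {
	chars := []rune(s) // decodes the whole string once
	if n < 0 || n >= len(chars) {
		return 0, false
	}
	return chars[n], true
}

func main() {
	name := "José"
	if r, ok := nthRune(name, 3); ok {
		fmt.Printf("%c\n", r) // é
	}
}
```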

Converting to []rune gives you a slice where index n is the n-th character. That's what we'll cover next.

Why Range Decodes UTF-8

This is a deliberate language design decision. The Go team made range over a string rune-aware because text iteration is the common case, and forcing every programmer to call utf8.DecodeRuneInString by hand would be miserable. The byte-index loop is still available for the cases where bytes are what you want, but the easier form (range) is the safer one for text.

The decoding is real work, not magic. Each step of the range loop:

  1. Reads the byte at the current position.
  2. Looks at the high bits to figure out how many bytes this rune uses (1, 2, 3, or 4).
  3. Reads the remaining bytes of the rune.
  4. Assembles them into a rune value.
  5. Advances the byte position by the rune's byte length.
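The steps above can be sketched by hand with utf8.DecodeRuneInString from the standard library. This is an approximation of what range does for you, not the runtime's actual implementation, and decodeAll is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeAll walks s the way a range loop would, recording where each
// rune starts and which rune was decoded there.
func decodeAll(s string) (starts []int, runes []rune) {
	for i := 0; i < len(s); {
		// Decode one rune from the current position; size is its byte length.
		r, size := utf8.DecodeRuneInString(s[i:])
		starts = append(starts, i)
		runes = append(runes, r)
		i += size // advance by the rune's byte length
	}
	return starts, runes
}

func main() {
	starts, runes := decodeAll("Hi⭐")
	for k := range runes {
		fmt.Printf("byte %d: %c\n", starts[k], runes[k])
	}
}
```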

The visual below shows what happens for the string "Hi⭐". The byte indices and the rune indices line up for H and i, then the star takes three bytes and the next rune would start at byte 5.

The cyan boxes are single-byte ASCII characters. The orange boxes are the three bytes that together encode the star. On the right side, the range loop produces three iterations. The first two indices match the byte position one-to-one. The third one starts at byte 2 and the loop's next position would jump to byte 5, which is past the end of the string, so the loop stops.

Reading this diagram, the byte-index gotcha makes sense. The i value in the range loop is the byte where the rune starts. After the star, the next iteration would carry i = 5, not i = 3.

Invalid UTF-8 in a Range Loop

A Go string can hold any sequence of bytes, including bytes that don't form valid UTF-8. The compiler doesn't check, and you can construct invalid UTF-8 by reading bytes off the network, opening a file with an unexpected encoding, or just slicing in the middle of a multibyte rune.

When the range loop hits an invalid byte sequence, it yields the special rune utf8.RuneError (which equals U+FFFD, the Unicode replacement character) and advances one byte. Then it continues from the next byte. It doesn't panic, and it doesn't skip the bad data.
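A small sketch of that behavior, using an assumed input "Jos\xc3" in which the first byte of é is present but the second is missing (rangeRunes is a hypothetical helper):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// rangeRunes collects the runes a range loop yields for s.
func rangeRunes(s string) []rune {
	var out []rune
	for _, r := range s {
		out = append(out, r)
	}
	return out
}

func main() {
	broken := "Jos\xc3" // 0xC3 starts a two-byte rune, but the string ends there
	fmt.Println(utf8.ValidString(broken)) // false
	for i, r := range broken {
		fmt.Printf("i=%d r=%U\n", i, r)
	}
	fmt.Println(rangeRunes(broken)[3] == utf8.RuneError) // true
}
```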

The first three iterations are fine. At byte 3, the loop sees 0xC3, which is the start of a two-byte rune, but there's no second byte. The range loop yields U+FFFD with i=3, advances one byte, and then ends because no bytes remain.

In production code you usually want to either reject the input (string came from an untrusted source, validate it with utf8.ValidString before processing) or replace bad runes with U+FFFD and continue. The range loop already does the second one by default, which is convenient and occasionally surprising.

Converting to []rune

When you actually need random access by character position, convert the string to []rune:
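A minimal sketch of the conversion, with a hypothetical replaceRuneAt helper to show that the rune slice, unlike the string, can be edited:

```go
package main

import "fmt"

// replaceRuneAt returns a copy of s with the n-th character replaced by r.
// It returns s unchanged when n is out of range.
func replaceRuneAt(s string, n int, r rune) string {
	chars := []rune(s) // allocates a new slice of code points
	if n < 0 || n >= len(chars) {
		return s
	}
	chars[n] = r
	return string(chars) // encodes the runes back to UTF-8
}

func main() {
	name := "José"
	chars := []rune(name)
	fmt.Println(len(name), len(chars))        // 5 4
	fmt.Printf("%c\n", chars[3])              // é
	fmt.Println(replaceRuneAt(name, 3, 'e')) // Jose
}
```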

[]rune(s) walks the entire string, decodes every rune, and stores the runes in a freshly allocated slice. After the conversion you have a slice of int32 values where index n is the n-th character. Indexing is O(1), as it is for any slice.

This is what you reach for when:

  • You need the N-th character of a string that may contain multibyte runes, so byte N is not guaranteed to be the start of character N.
  • You need to walk the string in reverse character order.
  • You need to mutate characters (recall that strings themselves are immutable; the rune slice can be edited).

The cost is real. The conversion is O(n) in the byte length and it allocates a new slice big enough to hold all the runes. For a short customer name this is invisible. For a 10 KB product description that you only need to read forward, it's wasted work; the range loop already does what you need with zero allocations.

Once you have the rune slice, iterating it with range works the way you'd expect, with sequential 0, 1, 2, ... indices:
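For illustration, a sketch that ranges over a rune slice built from an assumed sample string (runeSliceIndices is a hypothetical helper):

```go
package main

import "fmt"

// runeSliceIndices returns the indices produced by ranging over []rune(s).
func runeSliceIndices(s string) []int {
	chars := []rune(s)
	idx := make([]int, 0, len(chars))
	for i := range chars {
		idx = append(idx, i) // plain slice index: 0, 1, 2, ...
	}
	return idx
}

func main() {
	chars := []rune("José⭐")
	for i, r := range chars {
		fmt.Printf("%d: %c (chars[i] == r: %t)\n", i, r, chars[i] == r)
	}
	fmt.Println(runeSliceIndices("José⭐")) // [0 1 2 3 4]
}
```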

The indices are now true character positions, because chars is a slice of runes, not a string. chars[i] is the same as r for every iteration, and the byte-index gotcha is gone. You traded one allocation for a saner iteration model.

Counting Characters

A common question: how many characters are in this string? The answer is not len(s), which counts bytes. There are two correct ways:
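A sketch of both approaches on an assumed five-star rating string (countChars is a hypothetical helper):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countChars counts code points in s using both approaches.
func countChars(s string) (viaUTF8, viaSlice int) {
	return utf8.RuneCountInString(s), len([]rune(s))
}

func main() {
	rating := "⭐⭐⭐⭐⭐" // illustrative value
	fmt.Println(len(rating)) // 15 (bytes)
	a, b := countChars(rating)
	fmt.Println(a, b) // 5 5
}
```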

Both utf8.RuneCountInString(s) and len([]rune(s)) give the same answer. They count code points, not bytes. The byte length is much larger because each star takes three bytes.

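The two methods are not equivalent in cost. utf8.RuneCountInString(s) only walks the string and counts, so it allocates nothing. []rune(s) is a conversion that builds a whole rune slice; a compiler may optimize the allocation away when the result is only passed to len, but that is an implementation detail, so the utf8 call is the safer default.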

A common pattern in product description validation is to enforce a character limit, not a byte limit. Using len(s) > 200 would be wrong because a description full of emoji or accented characters might be well under 200 visual characters but still over 200 bytes. Use the rune count:
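A sketch of such a check. The withinLimit helper, the sample descriptions, and the limit of 20 are illustrative assumptions.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// withinLimit reports whether desc has at most max characters (code points).
func withinLimit(desc string, max int) bool {
	return utf8.RuneCountInString(desc) <= max
}

func main() {
	descriptions := []string{
		"Soft cotton t-shirt",     // ASCII only
		"Café blend ⭐⭐⭐⭐⭐", // accented letter plus stars
	}
	for _, d := range descriptions {
		fmt.Printf("bytes=%d chars=%d ok=%t\n",
			len(d), utf8.RuneCountInString(d), withinLimit(d, 20))
	}
}
```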

For the all-ASCII description the two numbers match. For the second, bytes are well above characters because the accented é and the five stars each take more than one byte.

Reversing a String

"Reverse a string" is a popular interview question, and it's a perfect demonstration of the bytes vs runes split. The naive approach uses bytes:

What's wrong with this code?
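A sketch of the naive byte-wise version (reverseBytes is an illustrative name). Printing with %q makes the broken bytes visible instead of relying on how a terminal renders them.

```go
package main

import "fmt"

// reverseBytes reverses s one byte at a time.
// This is only correct for ASCII input.
func reverseBytes(s string) string {
	b := []byte(s)
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return string(b)
}

func main() {
	fmt.Println(reverseBytes("hello"))        // olleh
	fmt.Printf("%q\n", reverseBytes("José")) // "\xa9\xc3soJ"
}
```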

For ASCII input the function works. For "José" it produces ©Ã©soJ, which is mojibake (corrupted text). The byte loop swapped the two bytes of é independently, splitting a single rune into garbage. The second swap put one of those bytes adjacent to the wrong neighbor, which decoded back as à and ©.

Fix:
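A sketch of the rune-wise version. The reverseRunes name and the sample input with a trailing star are assumptions.

```go
package main

import "fmt"

// reverseRunes reverses s one code point at a time.
func reverseRunes(s string) string {
	r := []rune(s) // decode once so each element is a whole code point
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

func main() {
	fmt.Println(reverseRunes("José⭐")) // ⭐ésoJ
}
```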

Now é stays intact through the reversal, and the star comes out as a single character on the left side of the result. The fix is to swap whole runes, not bytes. The body of the loop is otherwise the same.

This is also a hint about what "reverse a string" even means once you allow non-ASCII text. There are characters in Unicode that are technically composed of multiple code points (a letter plus a combining accent, for example), and reversing by rune can still split those. For most practical e-commerce inputs (names, descriptions, search terms) the rune-by-rune reverse is the right answer and gets the job done.

Taking the First N Characters

Another common task: truncate a product description to the first 50 characters and add an ellipsis. Slicing by byte (s[:50]) is wrong as soon as the cut falls inside a multibyte rune: the rune is cut in half and the result ends in an invalid UTF-8 fragment. It also panics when the string is shorter than 50 bytes.

Here's the correct version:
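A minimal sketch of the approach. The names firstN and truncate, the guard for non-positive n, and the three-dot ellipsis are assumptions made for this example.

```go
package main

import "fmt"

// firstN returns the first n characters of s, cutting on a rune boundary.
func firstN(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	for i := range s {
		if count == n {
			return s[:i] // i is the start of the first excluded rune
		}
		count++
	}
	return s // s has n characters or fewer
}

// truncate shortens s to n characters and appends an ellipsis if it was cut.
func truncate(s string, n int) string {
	head := firstN(s, n)
	if len(head) < len(s) {
		return head + "..."
	}
	return s
}

func main() {
	fmt.Println(firstN("José⭐ shop", 4)) // José
	fmt.Println(truncate("José⭐ shop", 4))
	fmt.Println(truncate("José", 10))
}
```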

How it works:

  • The for i := range s form is the rune-aware iteration. The i is the byte position where each rune starts.
  • We count how many runes we've seen. When the count reaches n, the rune the loop is about to visit is the first one we want to exclude, so its starting byte position i is the correct cut point.
  • We slice s[:i] and return. That slice is a valid string because i is always on a rune boundary.
  • If the string is shorter than n, the loop exits normally and we return the whole string.

Why is this safer than s[:n]? Because s[:n] would cut at byte n, which has no relation to character n. For an ASCII string those are the same, but for any string with multibyte characters, byte slicing can produce garbage.

You can do the same job with []rune(s)[:n] and a conversion back, but that allocates the rune slice for no real reason. The range-loop version above is O(n) in characters and allocates nothing beyond the result string itself.

Choosing Between Byte and Rune Iteration

A short decision table for the common cases:

Goal | Iteration
Walk every character of text in order | for i, r := range s
Process raw bytes (parse a fixed ASCII format, compute a checksum) | for i := 0; i < len(s); i++
Get the N-th character for random access | runes := []rune(s); runes[n]
Count characters | utf8.RuneCountInString(s)
Truncate to the first N characters | range loop + count + slice at byte index
Reverse a string | convert to []rune, reverse, convert back
Check a prefix or suffix where you know it's ASCII | byte index or strings.HasPrefix
Check that a string is valid UTF-8 | utf8.ValidString(s)

When in doubt, use the range loop. It's correct for text, fast enough for almost any workload, and doesn't allocate. Reach for []rune(s) only when you actually need random access by character position, and reach for the byte loop only when you actually need bytes.

A short example tying it together. Imagine you're processing a list of customer reviews. You want to print each review's character count and the first 30 characters as a preview.
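A sketch of that program. The review texts and the firstN and summarize helpers are made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstN returns the first n characters of s, cutting on a rune boundary.
// For n <= 0 this sketch simply returns s unchanged.
func firstN(s string, n int) string {
	count := 0
	for i := range s {
		if count == n && n > 0 {
			return s[:i]
		}
		count++
	}
	return s
}

// summarize returns a review's character count and a preview of n characters.
func summarize(review string, n int) (int, string) {
	return utf8.RuneCountInString(review), firstN(review, n)
}

func main() {
	reviews := []string{ // illustrative data
		"Great product, fast shipping, would buy again!",
		"Très bon café, livraison rapide et emballage soigné",
		"Love it ⭐⭐⭐⭐⭐ best purchase this year",
	}
	for _, r := range reviews {
		count, preview := summarize(r, 30)
		fmt.Printf("%d chars: %s\n", count, preview)
	}
}
```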

Three reviews, three different character counts, and each preview cuts cleanly at a character boundary. The accented é in the French review is preserved intact, and the stars never appear half-encoded. The range loop is doing all the real work; utf8.RuneCountInString is just there for the count.

Summary

  • A Go string can be walked in two ways. for i := 0; i < len(s); i++ reads one byte at a time, and for i, r := range s decodes UTF-8 and yields one rune at a time.
  • In the range form, i is the byte index where the current rune starts, not a 0, 1, 2 character counter. For strings with multibyte runes, i jumps by more than 1 between iterations.
  • s[i] is always a byte. Inside a range loop, reaching back to s[i] reintroduces the byte view and undoes the rune decode that range just did.
  • The range loop handles invalid UTF-8 by yielding utf8.RuneError (U+FFFD) and advancing one byte at a time. It never panics on malformed input.
  • []rune(s) converts the string into a slice of code points, allocating O(n) memory but giving you O(1) random access by character. Use it when you need indexed access or in-place edits.
  • Counting characters: prefer utf8.RuneCountInString(s) over len([]rune(s)). Both are O(n), but the first allocates nothing.
  • Reversing a string and slicing to the first N characters both need to operate on runes, not bytes. The range loop or a rune slice gets the right answer; a byte loop corrupts multibyte runes.
  • When in doubt, use range. It's correct for text and walks a string by character without allocating.

The next lesson, String Comparison and Equality, looks at how strings compare with == and <, why those compare bytes rather than characters, and what to do when you need case-insensitive or locale-aware comparison from the strings package.