Last Updated: May 17, 2026
Go gives you two different ways to walk through a string, and they look almost identical until you put a name like José or an emoji like ⭐ in the data. One iteration mode hands you raw bytes; the other decodes UTF-8 and hands you whole characters. Picking the wrong one is the source of most string bugs in Go, and the fix usually changes one line. This lesson covers both modes, the byte-index gotcha that trips up everyone the first time, and how to do the common tasks (counting characters, reversing, taking the first N characters) correctly.
The setup is simple. A Go string is a read-only sequence of bytes, and those bytes happen to be UTF-8 encoded text. A rune represents a Unicode code point, and UTF-8 packs a single code point into one, two, three, or four bytes, which is why len(s) measures bytes rather than characters.
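The two iteration styles are the index-based byte loop, for i := 0; i < len(s); i++, and the range loop, for i, r := range s.
For an ASCII string, byte and character line up one to one, so the byte loop looks fine, and the range form prints exactly the same thing for the same value.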
Same output, different machinery. The byte loop reads one byte at a time. The range loop decodes UTF-8 and yields a rune for each character. With an ASCII-only string the two are indistinguishable, which is why the difference is so easy to miss.
The difference shows up the moment a non-ASCII character appears. The string "José" has four characters, but é takes two bytes in UTF-8 (0xC3 0xA9), so the byte length is 5, not 4. The two loops now disagree.
The byte loop runs 5 times, one per byte, and hands back two raw bytes (195 and 169) for the é. The range loop runs 4 times and yields the é as a single rune. Same string, same data in memory, two different views of it.
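The comparison above can be reproduced with a short program. This is a minimal sketch; the helper names byteSteps and runeSteps are made up for illustration and are not part of any library.

```go
package main

import "fmt"

// byteSteps walks s one byte at a time and returns the iteration count.
// (Hypothetical helper, for illustration only.)
func byteSteps(s string) int {
	steps := 0
	for i := 0; i < len(s); i++ {
		fmt.Printf("  byte loop:  i=%d value=%d\n", i, s[i])
		steps++
	}
	return steps
}

// runeSteps walks s with range and returns the iteration count.
// (Hypothetical helper, for illustration only.)
func runeSteps(s string) int {
	steps := 0
	for i, r := range s {
		fmt.Printf("  range loop: i=%d value=%q\n", i, r)
		steps++
	}
	return steps
}

func main() {
	// "Jose" is pure ASCII; "José" contains one two-byte rune.
	for _, name := range []string{"Jose", "José"} {
		b := byteSteps(name)
		r := runeSteps(name)
		fmt.Printf("%s: %d byte iterations, %d range iterations\n", name, b, r)
	}
}
```

For the ASCII name both counts are 4. For "José" the byte loop runs 5 times and the range loop 4 times.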
The range loop is what most code wants when it's working with text. The byte loop is what you want when you're working with bytes (parsing a wire protocol, scanning for an ASCII delimiter, computing a checksum).
Byte iteration uses a classic C-style for loop. Each step pulls one byte out of the string with the index operator s[i]. The type of s[i] is byte, which is just an alias for uint8.
A few things to note:
- len(code) is the byte length, which is what you want for this loop.
- code[i] returns a byte, not a rune. With %c it still prints as a character because every byte under 128 is a valid one-byte UTF-8 sequence and matches the ASCII character.

The byte loop is the right tool for byte-oriented work. Parsing a product code that's guaranteed to be ASCII, checking that a string starts with "SKU-", computing a hash byte by byte, all of these are byte work. The loop is straightforward, allocates nothing, and runs in O(n) where n is the byte length.
Cost: byte iteration is O(n) with zero allocations. s[i] is a constant-time array index. This is the fastest way to walk a string when you don't need to interpret UTF-8.
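As a sketch of byte-oriented work, the program below walks an ASCII product code byte by byte. The sample code "SKU-42" and the helpers hasSKUPrefix and byteSum are assumptions for illustration; byteSum is a toy checksum, not a real hash.

```go
package main

import "fmt"

// hasSKUPrefix reports whether code starts with the ASCII bytes "SKU-".
// (Hypothetical helper; strings.HasPrefix does the same job.)
func hasSKUPrefix(code string) bool {
	const prefix = "SKU-"
	if len(code) < len(prefix) {
		return false
	}
	for i := 0; i < len(prefix); i++ {
		if code[i] != prefix[i] { // both sides are bytes
			return false
		}
	}
	return true
}

// byteSum adds up the raw byte values of code (a toy checksum).
func byteSum(code string) int {
	sum := 0
	for i := 0; i < len(code); i++ {
		sum += int(code[i])
	}
	return sum
}

func main() {
	code := "SKU-42" // assumed sample value, ASCII only
	for i := 0; i < len(code); i++ {
		// code[i] is a byte; %c prints it as a character because it is ASCII.
		fmt.Printf("i=%d byte=%d char=%c\n", i, code[i], code[i])
	}
	fmt.Println(hasSKUPrefix(code)) // true
	fmt.Println(byteSum(code))      // 390
}
```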
The problem starts when the data isn't all ASCII and you forget. The byte loop will happily walk through "José" and give you the bytes 74, 111, 115, 195, 169. If your code treats each of those as a character, you'll print garbage for the last two and you'll think the string has 5 characters when it has 4.
The for ... range form over a string is rune-aware. The Go specification defines this specifically: ranging over a string decodes UTF-8 one code point at a time and yields the byte index of where that code point starts together with the rune itself.
Look at the index column. It runs 0, 1, 2, ... 15, then jumps to 18, then 21, then 24, then 27. The star takes three bytes in UTF-8 (0xE2 0xAD 0x90), so each successive star's starting byte is three positions later than the last.
This is the part that surprises everyone the first time. The i in for i, r := range s is not a sequential 0-based character counter. It is the byte offset of the first byte of the current rune. For an ASCII-only string the two are the same number, which is exactly why the bug is hard to find by accident.
Why does Go work this way? Because the byte index is the only thing that's safe to use as a substring boundary. If you have the rune at index 15 and you want everything before it, you can write review[:15]. If i were a character count, you couldn't slice with it at all without re-walking the string.
The rune type behind r is an alias for int32. A rune holds a single Unicode code point, which is a number, not a string. That's why printing it with %d shows 11088 and with %q shows '⭐'.
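To see the byte offsets directly, here is a small sketch. The review text is a made-up sample chosen so that the first star lands at byte 15, and startOffsets is a hypothetical helper.

```go
package main

import "fmt"

// startOffsets collects the index that range reports for each rune in s.
// (Hypothetical helper, for illustration only.)
func startOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	review := "Great product! ⭐⭐⭐⭐⭐" // assumed sample: 15 ASCII bytes, then 5 stars

	for i, r := range review {
		if r > 127 { // only show the non-ASCII runes
			fmt.Printf("i=%d  as %%d: %d  as %%q: %q\n", i, r, r)
		}
	}

	fmt.Println(startOffsets(review)[15:]) // [15 18 21 24 27]
	fmt.Println(review[:15])               // the byte offset is a safe slice boundary
}
```

The offsets advance by 3 for each star, and slicing at offset 15 returns exactly the text before the first star.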
The single most common bug in this area is treating the index from range as a sequential counter. Here's the wrong pattern:
Look at the last line. The rune is 'é' (code point 233), but name[i] is 195, which is the first byte of é's UTF-8 encoding, not the rune itself. The two Printf arguments disagree because they're measuring different things. r is the decoded rune; name[i] is one raw byte sitting at offset i.
The fix is to just use r. The whole point of the range form is that it hands you the rune already. Reaching back to name[i] defeats the purpose and reintroduces the byte view.
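The disagreement between r and name[i] is easy to demonstrate. In this sketch, mismatches is a hypothetical helper that counts the positions where the raw byte and the decoded rune differ.

```go
package main

import "fmt"

// mismatches counts the positions where the raw byte s[i] differs from
// the rune that range decoded at the same offset.
// (Hypothetical helper, for illustration only.)
func mismatches(s string) int {
	n := 0
	for i, r := range s {
		if rune(s[i]) != r {
			n++
		}
	}
	return n
}

func main() {
	name := "José"
	for i, r := range name {
		// r is the decoded rune; name[i] is only the first byte at offset i.
		fmt.Printf("i=%d  r=%q (%d)  name[i]=%d\n", i, r, r, name[i])
	}
	fmt.Println(mismatches("Jose")) // 0: ASCII hides the bug
	fmt.Println(mismatches(name))   // 1: the é
}
```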
The other side of the same bug is trying to index a string with a "character number" you've been counting yourself:
What's wrong with this code?
This prints Ã, not é. The intent is "the fourth character", but name[3] is the fourth byte, which is the first byte of the two-byte sequence that encodes é. Printed alone, that byte is the Latin-1 character Ã, which is not what anyone wanted.
Fix:
Converting to []rune gives you a slice where index n is the n-th character. That's what we'll cover next.
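A short sketch of the difference between indexing the string and indexing a rune slice. The helper name nthRune is an assumption, and it performs no bounds checking beyond what the slice index itself does.

```go
package main

import "fmt"

// nthRune returns the n-th character (0-based) of s.
// It assumes 0 <= n < number of runes in s and panics otherwise.
// (Hypothetical helper, for illustration only.)
func nthRune(s string, n int) rune {
	return []rune(s)[n]
}

func main() {
	name := "José"
	fmt.Printf("%c\n", name[3])          // Ã: the byte 0xC3 printed as a code point
	fmt.Printf("%c\n", nthRune(name, 3)) // é: the fourth character
}
```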
This is a deliberate language design decision. The Go team made range over a string rune-aware because text iteration is the common case, and forcing every programmer to call utf8.DecodeRuneInString by hand would be miserable. The byte-index loop is still available for the cases where bytes are what you want, but the easier form (range) is the safer one for text.
The decoding is real work, not magic. At each step the range loop reads the lead byte at the current position, determines from it how many bytes the encoding occupies (one to four), and combines those bytes into a single rune value.
Cost: range iteration is O(n) in the number of bytes, with no allocation. The UTF-8 decode adds a few instructions per character compared to a byte loop. For most workloads the difference is invisible, but in a tight hot path that only needs bytes, the byte loop is faster.
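The decode that range performs can be approximated by hand with utf8.DecodeRuneInString from the standard library. This is a sketch of the idea rather than how the compiler actually implements range; manualOffsets is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// manualOffsets walks s the way a range loop does, but by hand:
// decode one rune at the current byte offset, then advance by its width.
// (Hypothetical helper, for illustration only.)
func manualOffsets(s string) []int {
	var offsets []int
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		fmt.Printf("i=%d rune=%q width=%d\n", i, r, size)
		offsets = append(offsets, i)
		i += size // width is 1 to 4 bytes; invalid input advances by 1
	}
	return offsets
}

func main() {
	fmt.Println(manualOffsets("José")) // [0 1 2 3]
}
```

Writing the loop this way makes it clear why range is the more convenient form for text: the bookkeeping for the width is done for you.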
The visual below shows what happens for the string "Hi⭐". The byte indices and the rune indices line up for H and i, then the star takes three bytes and the next rune would start at byte 5.
The cyan boxes are single-byte ASCII characters. The orange boxes are the three bytes that together encode the star. On the right side, the range loop produces three iterations. The first two indices match the byte position one-to-one. The third one starts at byte 2 and the loop's next position would jump to byte 5, which is past the end of the string, so the loop stops.
Reading this diagram, the byte-index gotcha makes sense. The i value in the range loop is the byte where the rune starts. After the star, the next iteration would carry i = 5, not i = 3.
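The same layout can be printed as text. This sketch assumes valid UTF-8 input; layout is a hypothetical helper, and utf8.RuneLen reports how many bytes a rune needs when encoded.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// layout renders each rune of s with the half-open byte range it occupies
// and the raw bytes in hex. It assumes s is valid UTF-8.
// (Hypothetical helper, for illustration only.)
func layout(s string) string {
	out := ""
	for i, r := range s {
		end := i + utf8.RuneLen(r)
		out += fmt.Sprintf("rune %q: bytes [%d,%d) % x\n", r, i, end, s[i:end])
	}
	return out
}

func main() {
	fmt.Print(layout("Hi⭐"))
	// rune 'H': bytes [0,1) 48
	// rune 'i': bytes [1,2) 69
	// rune '⭐': bytes [2,5) e2 ad 90
}
```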
A Go string can hold any sequence of bytes, including bytes that don't form valid UTF-8. The compiler doesn't check, and you can construct invalid UTF-8 by reading bytes off the network, opening a file with an unexpected encoding, or just slicing in the middle of a multibyte rune.
When the range loop hits an invalid byte sequence, it yields the special rune utf8.RuneError (which equals U+FFFD, the Unicode replacement character) and advances one byte. Then it continues from the next byte. It doesn't panic, and it doesn't skip the bad data.
The first three iterations are fine. At byte 3, the loop sees 0xC3, which is the start of a two-byte rune, but there's no second byte. The range loop yields U+FFFD with i=3, advances one byte, and ends because that was the last byte of the string.
In production code you usually want to either reject the input (string came from an untrusted source, validate it with utf8.ValidString before processing) or replace bad runes with U+FFFD and continue. The range loop already does the second one by default, which is convenient and occasionally surprising.
Cost: the range loop's behavior on invalid UTF-8 is to emit U+FFFD and advance one byte at a time. A long run of bad bytes turns into one iteration per byte, not one per rune. For trusted input this never matters; for untrusted input, validate up front with utf8.ValidString.
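A sketch of both behaviors: range substituting U+FFFD, and up-front validation with utf8.ValidString. The input strings are constructed with \x escapes to force invalid UTF-8, and countReplacements is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countReplacements reports how many range iterations over s yield
// utf8.RuneError. Note that a genuine U+FFFD in valid input is counted too.
// (Hypothetical helper, for illustration only.)
func countReplacements(s string) int {
	n := 0
	for _, r := range s {
		if r == utf8.RuneError {
			n++
		}
	}
	return n
}

func main() {
	bad := "Jos\xc3" // 0xC3 starts a two-byte rune, but the second byte is missing
	for i, r := range bad {
		fmt.Printf("i=%d r=%U\n", i, r) // the last line shows U+FFFD at i=3
	}

	fmt.Println(utf8.ValidString(bad))           // false
	fmt.Println(countReplacements("a\xff\xfeb")) // 2: one iteration per bad byte
}
```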
When you actually need random access by character position, convert the string to []rune:
[]rune(s) walks the entire string, decodes every rune, and stores the runes in a freshly allocated slice. After the conversion you have a slice of int32 values where index n is the n-th character. Indexing is O(1), as it is for any slice.
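This is what you reach for when you need the character at a given position (random access) or want to edit individual characters in place.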
The cost is real. The conversion is O(n) in the byte length and it allocates a new slice big enough to hold all the runes. For a short customer name this is invisible. For a 10 KB product description that you only need to read forward, it's wasted work; the range loop already does what you need with zero allocations.
Cost: []rune(s) is O(n) time and allocates O(rune count) memory. Use it when random access or in-place edits are needed. For a single forward walk, use range and skip the allocation entirely.
Once you have the rune slice, iterating it with range works the way you'd expect, with sequential 0, 1, 2, ... indices:
The indices are now true character positions, because chars is a slice of runes, not a string. chars[i] is the same as r for every iteration, and the byte-index gotcha is gone. You traded one allocation for a saner iteration model.
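A minimal sketch of the conversion, the sequential indices, and an in-place edit. The helper replaceAt is hypothetical and assumes the index is in range.

```go
package main

import "fmt"

// replaceAt returns s with the n-th character (0-based) replaced by r.
// It assumes 0 <= n < number of runes in s and panics otherwise.
// (Hypothetical helper, for illustration only.)
func replaceAt(s string, n int, r rune) string {
	chars := []rune(s) // O(n) decode plus one allocation
	chars[n] = r       // O(1) indexed write
	return string(chars)
}

func main() {
	chars := []rune("José")
	for i, r := range chars {
		// i is now a true character position: 0, 1, 2, 3.
		fmt.Printf("i=%d r=%q chars[i]==r: %t\n", i, r, chars[i] == r)
	}
	fmt.Println(len(chars))                // 4
	fmt.Println(replaceAt("José", 3, 'e')) // Jose
}
```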
A common question: how many characters are in this string? The answer is not len(s), which counts bytes. There are two correct ways:
Both utf8.RuneCountInString(s) and len([]rune(s)) give the same answer. They count code points, not bytes. The byte length is much larger because each star takes three bytes.
The two methods are not equivalent in cost.
Cost: utf8.RuneCountInString(s) is O(n) in bytes with zero allocation. len([]rune(s)) is also O(n) but allocates a full rune slice that's immediately discarded. For counting alone, prefer the utf8 function. The slice form is only worthwhile if you'll keep the slice around for further work.
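The three measurements side by side, as a sketch. The sample text and the helper counts are assumptions for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts returns the byte length and the character (code point) count of s.
// (Hypothetical helper, for illustration only.)
func counts(s string) (byteLen, runeCount int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "Loved it ⭐⭐⭐⭐⭐" // assumed sample: 9 ASCII bytes, then 5 stars

	b, r := counts(s)
	fmt.Println(b)              // 24 bytes
	fmt.Println(r)              // 14 characters, no allocation
	fmt.Println(len([]rune(s))) // 14 characters, but allocates a throwaway slice
}
```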
A common pattern in product description validation is to enforce a character limit, not a byte limit. Using len(s) > 200 would be wrong because a description full of emoji or accented characters might be well under 200 visual characters but still over 200 bytes. Use the rune count:
For the all-ASCII description the two numbers match. For the second, bytes are well above characters because the accented é and the five stars each take more than one byte.
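A sketch of the character-limit check. The constant maxChars uses the 200 from the text; withinLimit and the sample description are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// maxChars is the 200-character limit mentioned in the text.
const maxChars = 200

// withinLimit reports whether desc has at most maxChars characters
// (code points), regardless of how many bytes they occupy.
// (Hypothetical helper, for illustration only.)
func withinLimit(desc string) bool {
	return utf8.RuneCountInString(desc) <= maxChars
}

func main() {
	desc := strings.Repeat("é", 150) // 150 characters, 300 bytes

	fmt.Println(len(desc) > maxChars) // true: a byte check would reject it
	fmt.Println(withinLimit(desc))    // true: the character check accepts it
}
```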
"Reverse a string" is a popular interview question, and it's a perfect demonstration of the bytes vs runes split. The naive approach uses bytes:
What's wrong with this code?
For ASCII input the function works. For "José" it produces the bytes 0xA9 0xC3 followed by soJ, which is not valid UTF-8. A UTF-8 terminal typically shows the first two bytes as replacement characters, and a viewer that assumes Latin-1 shows ©ÃsoJ, which is mojibake (corrupted text). The byte loop moved the two bytes of é independently, so they ended up in reversed order, continuation byte first, and no longer decode as a single rune.
Fix:
Now é stays intact through the reversal, and the star comes out as a single character on the left side of the result. The fix is to swap whole runes, not bytes. The body of the loop is otherwise the same.
This is also a hint about what "reverse a string" even means once you allow non-ASCII text. There are characters in Unicode that are technically composed of multiple code points (a letter plus a combining accent, for example), and reversing by rune can still split those. For most practical e-commerce inputs (names, descriptions, search terms) the rune-by-rune reverse is the right answer and gets the job done.
Cost: reverseRunes is O(n) in characters and allocates one rune slice plus one new string. That's the price of correctness. The byte-loop version is faster but only safe for pure ASCII.
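Both versions side by side, as a sketch. The name reverseRunes comes from the text; reverseBytes is a hypothetical name for the naive byte-swapping version.

```go
package main

import "fmt"

// reverseBytes swaps individual bytes. It is only correct for pure ASCII.
// (Hypothetical name for the naive version.)
func reverseBytes(s string) string {
	b := []byte(s)
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return string(b)
}

// reverseRunes swaps whole code points, so multibyte characters stay intact.
func reverseRunes(s string) string {
	r := []rune(s) // one allocation for the rune slice
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r) // one allocation for the result
}

func main() {
	fmt.Printf("%q\n", reverseBytes("José")) // "\xa9\xc3soJ": invalid UTF-8
	fmt.Printf("%q\n", reverseRunes("José")) // "ésoJ"
}
```

Printing with %q makes the broken bytes visible as escapes instead of relying on how the terminal renders them.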
Another common task: truncate a product description to the first 50 characters and add an ellipsis. Slicing by byte (s[:50]) is wrong as soon as the cut falls inside a multibyte rune, because the result then ends with a partial, invalid UTF-8 sequence. Even when the cut lands on a rune boundary, 50 bytes is not 50 characters, and s[:50] panics if the string is shorter than 50 bytes.
Here's the correct version:
How it works:
- The for i := range s form is the rune-aware iteration. The i is the byte position where each rune starts.
- A counter tracks how many runes have been seen. Once it reaches n, the next rune is the one we want to exclude, so its starting byte position i is the correct cut point.
- At that point we slice s[:i] and return. That slice is a valid string because i is always on a rune boundary.
- If the string has no more than n characters, the loop exits normally and we return the whole string.

Why is this safer than s[:n]? Because s[:n] would cut at byte n, which has no relation to character n. For an ASCII string those are the same, but for any string with multibyte characters, byte slicing can produce garbage.
You can do the same job with []rune(s)[:n] and a conversion back, but that allocates the rune slice for no real reason. The range-loop version above is O(n) in characters and allocates nothing beyond the result string itself.
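The truncation approach described above can be sketched as follows. The function name firstN is an assumption, as is the choice to return the whole string when n is negative.

```go
package main

import "fmt"

// firstN returns the first n characters (code points) of s.
// If s has n or fewer characters, or n is negative, s is returned unchanged.
// (Hypothetical helper name, for illustration only.)
func firstN(s string, n int) string {
	count := 0
	for i := range s { // i is the byte offset where each rune starts
		if count == n {
			return s[:i] // cut just before the (n+1)-th rune
		}
		count++
	}
	return s
}

func main() {
	fmt.Println(firstN("José García", 4)) // José
	fmt.Println(firstN("⭐⭐⭐ great", 2))    // ⭐⭐
	fmt.Println(firstN("short", 50))      // short
}
```

Slicing returns a view into the original string's bytes, so the loop itself does no decoding into a separate buffer.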
A short decision table for the common cases:
| Goal | Iteration |
|---|---|
| Walk every character of text in order | for i, r := range s |
| Process raw bytes (parse a fixed ASCII format, compute a checksum) | for i := 0; i < len(s); i++ |
| Get the N-th character for random access | runes := []rune(s); runes[n] |
| Count characters | utf8.RuneCountInString(s) |
| Truncate to the first N characters | range loop + count + slice at byte index |
| Reverse a string | convert to []rune, reverse, convert back |
| Check a prefix or suffix where you know it's ASCII | byte index or strings.HasPrefix |
| Check that a string is valid UTF-8 | utf8.ValidString(s) |
When in doubt, use the range loop. It's correct for text, fast enough for almost any workload, and doesn't allocate. Reach for []rune(s) only when you actually need random access by character position, and reach for the byte loop only when you actually need bytes.
A short example tying it together. Imagine you're processing a list of customer reviews. You want to print each review's character count and the first 30 characters as a preview.
Three reviews, three different character counts, and each preview cuts cleanly at a character boundary. The accented é in the French review is preserved intact, and the stars never appear half-encoded. The range loop is doing all the real work; utf8.RuneCountInString is just there for the count.
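A sketch of such a program is below. The review texts are invented samples, and preview is a hypothetical helper that appends "..." only when it actually truncates.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// preview returns the first n characters of s followed by "...",
// or s unchanged if it has n or fewer characters.
// (Hypothetical helper, for illustration only.)
func preview(s string, n int) string {
	count := 0
	for i := range s {
		if count == n {
			return s[:i] + "..." // i is on a rune boundary, so the cut is clean
		}
		count++
	}
	return s
}

func main() {
	// Invented sample data.
	reviews := []string{
		"Works as described, would buy again.",
		"Café très agréable, service rapide et souriant!",
		"⭐⭐⭐⭐⭐ Best purchase this year",
	}

	for _, review := range reviews {
		fmt.Printf("%d chars: %s\n",
			utf8.RuneCountInString(review), preview(review, 30))
	}
}
```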
- for i := 0; i < len(s); i++ reads one byte at a time, and for i, r := range s decodes UTF-8 and yields one rune at a time.
- In a range loop, i is the byte index where the current rune starts, not a 0, 1, 2 character counter. For strings with multibyte runes, i jumps by more than 1 between iterations.
- s[i] is always a byte. Inside a range loop, reaching back to s[i] reintroduces the byte view and undoes the rune decode that range just did.
- The range loop handles invalid UTF-8 by yielding utf8.RuneError (U+FFFD) and advancing one byte at a time. It never panics on malformed input.
- []rune(s) converts the string into a slice of code points, allocating O(n) memory but giving you O(1) random access by character. Use it when you need indexed access or in-place edits.
- To count characters, prefer utf8.RuneCountInString(s) over len([]rune(s)). Both are O(n), but the first allocates nothing.

The next lesson, String Comparison and Equality, looks at how strings compare with == and <, why those compare bytes rather than characters, and what to do when you need case-insensitive or locale-aware comparison from the strings package.