Overview
A string is a read-only slice of arbitrary bytes. It is not required to hold Unicode, UTF-8 text, or any other predefined format
A string literal, absent byte-level escapes, always holds valid UTF-8 sequences
Representation
A string is represented at runtime by a stringStruct struct with two fields:
- str is a pointer to an immutable backing array of bytes
- len is the total number of bytes in the backing array
For example, the string "hello" is stored as a pointer to the bytes h, e, l, l, o plus a length of 5
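The two-word layout can be observed from ordinary code. The following is a minimal sketch, assuming Go 1.20 or later for unsafe.StringData; the helper name headerSize is hypothetical and only serves the illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerSize (hypothetical helper) returns the size of the string value
// itself and the size of one machine word on the current platform.
func headerSize(s string) (valueSize, wordSize uintptr) {
	return unsafe.Sizeof(s), unsafe.Sizeof(uintptr(0))
}

func main() {
	s := "hello"
	valueSize, wordSize := headerSize(s)

	// The string value is two words (pointer + length), whatever its content.
	fmt.Println("words in string value:", valueSize/wordSize)

	// len reads the length word; it does not scan the bytes.
	fmt.Println("length:", len(s))

	// unsafe.StringData exposes the pointer word as a *byte.
	fmt.Printf("first byte: %c\n", *unsafe.StringData(s))
}
```

The size reported by unsafe.Sizeof does not change with the length of the text, because only the pointer and the length live in the string value.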
Strings are immutable, so there is no need for a capacity (you can’t grow them)
It is safe for multiple strings to share the same storage, so slicing s results in a new 2-word structure with a potentially different pointer and length that still refers to the same byte sequence
This means that slicing can be done without allocation or copying, making string slices as efficient as passing around explicit indexes
When a string is assigned to another string, the two-word value is copied, resulting in two different string values that share the same backing array. The cost of copying a string is therefore the same regardless of its size: a two-word copy
Casting and Memory Allocation
Because the underlying byte array is immutable, converting a []byte to a string, and vice versa, results in a copy
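The copy is what keeps strings immutable: writes to the byte slice never reach the string on either side of the conversion. A minimal sketch, with hypothetical helper names roundTrip and snapshot:

```go
package main

import "fmt"

// roundTrip (hypothetical helper) converts s to a []byte, modifies the
// copy, and returns both the untouched original and the modified result.
func roundTrip(s string) (orig, changed string) {
	b := []byte(s) // copies the bytes of s into a new slice
	if len(b) > 0 {
		b[0] = 'H' // modifies the copy only
	}
	return s, string(b) // string(b) copies again
}

// snapshot (hypothetical helper) converts b to a string and then
// overwrites b; the returned string keeps the earlier content.
func snapshot(b []byte) string {
	t := string(b)
	for i := range b {
		b[i] = 'x'
	}
	return t
}

func main() {
	orig, changed := roundTrip("hello")
	fmt.Println(orig, changed)

	fmt.Println(snapshot([]byte("abc")))
}
```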
However, there are some optimizations that the compiler makes to avoid copies:
- For a map m of type map[string]T and a []byte b, m[string(b)] doesn't allocate
- No allocation when converting a string into a []byte for ranging over the bytes
- No allocation when converting a []byte into a string for comparison purposes
- No allocation for a conversion from []byte to string that is used in a string concatenation in which at least one of the concatenated string values is a non-blank string constant
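One way to observe the first three optimizations is to count allocations with testing.AllocsPerRun, which can also be called outside of tests. The sketch below assumes the standard Go compiler; the helper names, the sink variable, and the sample key are hypothetical, and the key is deliberately longer than 32 bytes so that small temporary buffers do not hide a copy. Results are compiler-specific and may change between releases.

```go
package main

import (
	"fmt"
	"testing"
)

// longKey is an arbitrary sample value, longer than 32 bytes.
const longKey = "a-sample-key-that-is-longer-than-thirty-two-bytes"

// sink keeps results alive so the measured work is not optimized away.
var sink int

// lookupAllocs measures allocations per map lookup using m[string(b)].
func lookupAllocs() float64 {
	b := []byte(longKey)
	m := map[string]int{longKey: 1}
	return testing.AllocsPerRun(100, func() {
		sink += m[string(b)]
	})
}

// compareAllocs measures allocations for string(a) == string(b).
func compareAllocs() float64 {
	a, b := []byte(longKey), []byte(longKey)
	return testing.AllocsPerRun(100, func() {
		if string(a) == string(b) {
			sink++
		}
	})
}

// rangeAllocs measures allocations for ranging over []byte(s).
func rangeAllocs() float64 {
	s := longKey
	return testing.AllocsPerRun(100, func() {
		for _, c := range []byte(s) {
			sink += int(c)
		}
	})
}

func main() {
	fmt.Println("map lookup:", lookupAllocs())
	fmt.Println("comparison:", compareAllocs())
	fmt.Println("range:", rangeAllocs())
}
```

With the standard compiler each figure is expected to be zero. Storing the converted value in a variable that outlives the expression is a different case and falls back to the copying conversion.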
Substrings and Memory Leaks
When performing a substring operation, the Go specification doesn't specify whether the resulting string and the original string should share the same data. However, the standard Go compiler does let them share the same backing array
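Thus, there can be memory leaks: the string returned by a substring operation is backed by the same byte array, so holding on to a small substring keeps the whole original array reachable. The solution is to make a copy of the substring so that it gets its own backing array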
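One way to make that copy is strings.Clone, available since Go 1.18. The sketch below contrasts a shared prefix with a cloned one; the helper names prefixShared, prefixCopied, and sharesStorage are hypothetical, and unsafe.StringData requires Go 1.20 or later.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// prefixShared returns the first n bytes of s. The result still points
// into the backing array of s, which keeps the whole array reachable.
func prefixShared(s string, n int) string {
	return s[:n]
}

// prefixCopied returns the first n bytes of s in a fresh allocation,
// so s can be garbage collected independently of the result.
func prefixCopied(s string, n int) string {
	return strings.Clone(s[:n])
}

// sharesStorage reports whether a and b start at the same address.
func sharesStorage(a, b string) bool {
	return unsafe.StringData(a) == unsafe.StringData(b)
}

func main() {
	big := strings.Repeat("x", 1<<20) // 1 MiB string

	shared := prefixShared(big, 8)
	copied := prefixCopied(big, 8)

	fmt.Println(shared == copied)           // same content
	fmt.Println(sharesStorage(shared, big)) // still backed by big's array
	fmt.Println(sharesStorage(copied, big)) // backed by its own 8 bytes
}
```

On Go versions before 1.18, string([]byte(s[:n])) achieves a similar effect at the cost of an extra intermediate copy.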
References
- research!rsc: Go Data Structures
- 100 Go Mistakes and How to Avoid Them. Teiva Harsanyi
- Source code: string.go
- Strings in Go - Go 101
- Go Wiki: Compiler And Runtime Optimizations - The Go Programming Language
- Strings, bytes, runes and characters in Go - The Go Programming Language
- Ultimate Go Notebook. William Kennedy, Hoanh An