The answer depends on the Java version you're working with. Two fields have been present in every version: the value array, which stores the contents of the string (a char[] up to Java 8, a byte[] since Java 9), and the int hash field, which caches the hash code after its first calculation. Caching is an optimization that the hashCode method contract allows, because a string is immutable and its hash code never changes.
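The lazy hash caching described above can be sketched as follows. This is a simplified illustration, not the JDK source: the class name CachedHashText is hypothetical, and only the field names value and hash and the documented String.hashCode() formula are taken from String itself.

```java
// Simplified illustration only; this is not the JDK source of java.lang.String.
final class CachedHashText {
    private final char[] value; // the characters of the text
    private int hash;           // cached hash code; 0 means "not computed yet"

    CachedHashText(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            // Polynomial documented for String.hashCode():
            // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
            for (char c : value) {
                h = 31 * h + c;
            }
            // Unsynchronized write: every thread would compute the same value.
            hash = h;
        }
        return h;
    }

    public static void main(String[] args) {
        CachedHashText t = new CachedHashText("hello");
        System.out.println(t.hashCode() == "hello".hashCode()); // prints true
    }
}
```

Note that a text whose hash code happens to be 0 is recomputed on every call in this sketch, because 0 doubles as the "not computed" marker.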
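Before Java 7 (update 6), there were two additional fields: offset and count. They allowed a string to reuse a portion of an existing array without creating a new one (useful for string builders and substrings). However, these fields were removed to reduce memory consumption: each string object became smaller, and a short substring could no longer keep a large shared array alive.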
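A minimal sketch of how offset and count made array sharing possible, assuming a hypothetical SharedSlice class (the field names follow the old String layout, everything else is made up for illustration):

```java
// Simplified illustration of the old layout; not the JDK source.
final class SharedSlice {
    private final char[] value; // backing array, possibly shared with other slices
    private final int offset;   // index of the first character of this slice
    private final int count;    // number of characters in this slice

    private SharedSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    static SharedSlice of(String s) {
        char[] chars = s.toCharArray();
        return new SharedSlice(chars, 0, chars.length);
    }

    // O(1): no characters are copied, the result points into the same array.
    SharedSlice substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedSlice(value, offset + begin, end - begin);
    }

    boolean sharesArrayWith(SharedSlice other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedSlice whole = SharedSlice.of("hello world");
        SharedSlice part = whole.substring(6, 11);
        System.out.println(part);                        // prints world
        System.out.println(part.sharesArrayWith(whole)); // prints true
    }
}
```

The downside is visible in the sketch: as long as part is reachable, the whole backing array stays in memory, even though only five characters of it are needed.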
Initially, all strings were stored in UTF-16 encoding, where each code unit occupied 2 bytes and was stored in a char. However, it turned out that most strings in practice contain only ASCII characters, which need only 1 byte each and fit within the LATIN-1 encoding. This meant that the upper byte of most char values remained unused, and strings were effectively half empty. Meanwhile, strings take up a large portion of an application's memory (around a quarter).
In Java 6 (update 21), an experimental feature called Compressed Strings was introduced. It was enabled with the -XX:+UseCompressedStrings option and allowed strings containing only ASCII characters to be stored in a byte[] instead of a char[]. However, due to several issues, this feature was removed again in Java 7.
String compression returned in Java 9 with the introduction of Compact Strings, which is enabled by default. A new coder field was added to the String class; it identifies the encoding of the value array (LATIN-1 or UTF-16). The type of the value field was also changed from char[] to byte[]. A static COMPACT_STRINGS flag allows the feature to be turned off entirely; the JVM sets it to false when started with the -XX:-CompactStrings option.
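The idea of a byte[] plus a coder field can be sketched like this. It is a rough illustration under stated assumptions, not the JDK implementation: the CompactText class and its methods are hypothetical, the UTF-16 bytes are written big-endian for simplicity, and the COMPACT_STRINGS switch is left out.

```java
// Simplified illustration of the Compact Strings idea; not the JDK source.
final class CompactText {
    static final byte LATIN1 = 0; // 1 byte per character
    static final byte UTF16 = 1;  // 2 bytes per character

    private final byte[] value;
    private final byte coder;

    CompactText(String s) {
        boolean fitsLatin1 = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) { // outside the LATIN-1 range
                fitsLatin1 = false;
                break;
            }
        }
        if (fitsLatin1) {
            coder = LATIN1;
            value = new byte[s.length()];
            for (int i = 0; i < s.length(); i++) {
                value[i] = (byte) s.charAt(i);
            }
        } else {
            coder = UTF16;
            value = new byte[s.length() * 2];
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                value[2 * i] = (byte) (c >> 8); // high byte first (assumption: big-endian)
                value[2 * i + 1] = (byte) c;    // low byte
            }
        }
    }

    byte coder() { return coder; }

    int byteCount() { return value.length; }

    // Shifting by the coder halves the byte count for UTF-16 content.
    int length() { return value.length >> coder; }

    char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new StringIndexOutOfBoundsException(index);
        }
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF);
        }
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    public static void main(String[] args) {
        System.out.println(new CompactText("hello").byteCount());                              // prints 5
        System.out.println(new CompactText("\u043f\u0440\u0438\u0432\u0435\u0442").byteCount()); // prints 12
    }
}
```

A single character outside LATIN-1 switches the whole text to 2 bytes per character, which is why the saving applies only to strings that are entirely LATIN-1.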