What is a String made of?

The answer depends on the Java version you're working with. However, two fields have been present in every version: value, the array that stores the string's characters (a char[] up to Java 8, a byte[] since Java 9), and hash, an int that caches the hash code the first time it is calculated. The caching is an optimization rather than a requirement of the hashCode contract; it is safe because strings are immutable, so the hash code can never change.
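
The lazy caching described above can be sketched roughly as follows. CachedHashText is a hypothetical, simplified class written for illustration, not the JDK source; it assumes the char[] layout used up to Java 8 and treats 0 as "not computed yet".

```java
// Hypothetical simplified class; illustrates lazy hash caching only.
final class CachedHashText {
    private final char[] value; // the characters (char[] layout, as up to Java 8)
    private int hash;           // cached hash code; 0 means "not computed yet"

    CachedHashText(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            // The polynomial documented for String.hashCode:
            // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h; // remember the result for later calls
        }
        return h;
    }

    public static void main(String[] args) {
        CachedHashText text = new CachedHashText("hello");
        System.out.println(text.hashCode() == "hello".hashCode());
    }
}
```

In this sketch a string whose hash code happens to be 0 is recomputed on every call, which is the price of using 0 as the "empty" marker.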

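Before Java 7 update 6, there were two additional fields: offset and count. They allowed several strings to share portions of one array without creating new ones (useful for string builders and substrings): substring, for example, returned a String that pointed into its parent's array instead of copying it. These fields were removed to reduce memory consumption: every String object became smaller, and a short substring could no longer keep a large parent array alive.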
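
To make the role of offset and count concrete, here is a minimal model of the old layout. SharedSliceText is a hypothetical class; only the field names value, offset, and count come from the text above, the rest is an assumption made for illustration.

```java
// Hypothetical model of the old layout; not the JDK source.
final class SharedSliceText {
    final char[] value; // backing array, possibly shared with other instances
    final int offset;   // index of this string's first character in value
    final int count;    // number of characters that belong to this string

    private SharedSliceText(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    static SharedSliceText of(String s) {
        char[] chars = s.toCharArray();
        return new SharedSliceText(chars, 0, chars.length);
    }

    // No copying: the result points into the same array as its parent.
    SharedSliceText substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException("begin " + begin + ", end " + end);
        }
        return new SharedSliceText(value, offset + begin, end - begin);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedSliceText parent = SharedSliceText.of("hello world");
        SharedSliceText child = parent.substring(6, 11);
        System.out.println(child + " shares array: " + (child.value == parent.value));
    }
}
```

The sketch also shows the downside: as long as the child is reachable, the whole parent array stays in memory, however short the child is.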

Initially, all strings were stored in UTF-16 encoding, where each code unit occupies 2 bytes and fits exactly into a char. In practice, however, most strings contain only ASCII characters, which need just 1 byte each and fit within the Latin-1 (ISO 8859-1) encoding. This meant that the upper byte of most char values was always zero, so the arrays behind such strings were effectively half empty. Meanwhile, strings take up a large portion of a typical application's memory (around a quarter).
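
The arithmetic is easy to check with the standard charset encoders. The class and method names below are hypothetical helpers; the sketch measures only the encoded character data, not object headers or array overhead.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for comparing the size of the encoded character data.
final class PayloadSize {
    // UTF_16BE is used because it does not prepend a byte order mark.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Only meaningful for strings whose characters all fit into Latin-1.
    static int latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // True when the upper byte of every char is zero, i.e. it carries no data.
    static boolean upperBytesUnused(String s) {
        for (int i = 0; i < s.length(); i++) {
            if ((s.charAt(i) >>> 8) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(utf16Bytes(s) + " bytes as UTF-16, "
                + latin1Bytes(s) + " bytes as Latin-1");
    }
}
```

For an ASCII-only string the UTF-16 form is exactly twice as large as the Latin-1 form, which is the waste the paragraph above describes.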

In Java 6 (update 21), an experimental feature called Compressed Strings was introduced, enabled with the -XX:+UseCompressedStrings option. It allowed strings that could be represented as pure ASCII to be stored in a byte[] instead of a char[]. However, due to several issues, the feature was removed again in Java 7.

String compression returned in Java 9 with the introduction of Compact Strings, which is enabled by default. A new coder field was added to the String class; it records which encoding the string uses (Latin-1 or UTF-16). The type of the value field was also changed from char[] to byte[]. A static flag, COMPACT_STRINGS, records whether the feature is active; it is set by the JVM, and the feature can be turned off entirely with the -XX:-CompactStrings option.
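
A rough sketch of the idea: pick the coder once, at construction time, and use it to interpret the byte[] afterwards. CompactTextSketch is a hypothetical class, not the JDK implementation. In particular, the big-endian byte order in the UTF-16 branch is an assumption made to keep the example simple.

```java
// Hypothetical sketch of a coder-tagged byte[] representation.
final class CompactTextSketch {
    static final byte LATIN1 = 0; // one byte per character
    static final byte UTF16 = 1;  // two bytes per character

    final byte[] value;
    final byte coder;

    CompactTextSketch(String s) {
        if (fitsLatin1(s)) {
            coder = LATIN1;
            value = new byte[s.length()];
            for (int i = 0; i < s.length(); i++) {
                value[i] = (byte) s.charAt(i);
            }
        } else {
            coder = UTF16;
            value = new byte[s.length() * 2];
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                value[2 * i] = (byte) (c >>> 8); // high byte first (assumption)
                value[2 * i + 1] = (byte) c;     // then the low byte
            }
        }
    }

    // A string is compressible only if every char fits into a single byte.
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // The coder doubles as a shift: 0 for Latin-1, 1 for UTF-16.
    int length() {
        return value.length >> coder;
    }

    char charAt(int index) {
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF);
        }
        return (char) (((value[2 * index] & 0xFF) << 8) | (value[2 * index + 1] & 0xFF));
    }

    public static void main(String[] args) {
        CompactTextSketch text = new CompactTextSketch("hello");
        System.out.println("coder=" + text.coder + ", bytes=" + text.value.length);
    }
}
```

A single character outside Latin-1 is enough to switch the whole string to the two-byte form, so the saving applies per string, not per character.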