Understanding the garbage collector

Introduction to garbage collection
Memory spaces in the heap
Mark and sweep algorithm
Events that trigger garbage collection
Types of garbage collectors
Marking objects as inaccessible

Introduction to Garbage Collection

In Java, garbage collection is an automatic process that manages memory by identifying and removing objects that are no longer in use. This helps prevent memory leaks and ensures that the heap remains available for new object creation without requiring manual memory management by developers.

The garbage collector operates using a process known as the mark-and-sweep algorithm. In this approach, the collector first marks all objects that are still reachable from the program's root references. Any objects that are not marked during this phase are considered unreachable and therefore eligible for removal.

During the sweeping phase, the collector scans through the heap, freeing up memory occupied by these unreachable objects. This reclaimed memory can then be reused for new object allocations, which keeps the program's memory usage in check and makes out-of-memory errors less likely.

To further improve performance, the JVM divides the heap into distinct regions, or generations. This separation allows garbage collection to focus more frequently on short-lived objects, which typically occupy the younger generation, while scanning long-lived objects less often in the older generation. This generational approach helps optimize memory management and reduces the performance impact of frequent garbage collection.

In the following sections, we'll take a closer look at the different generations within the JVM's memory heap and explore how the mark-and-sweep algorithm works in greater detail.

Memory Spaces in the Heap

The JVM heap is divided into multiple regions, each optimized for managing objects based on their lifespan. This generational design allows the garbage collector to efficiently handle short-lived and long-lived objects without scanning the entire heap each time. The main areas are the Eden space, Survivor spaces, and the Tenured (or Old) space.

Eden Space

The Eden space is where most new objects are initially created (very large objects may be allocated directly in the old generation). Most objects in Java are short-lived, meaning they quickly become unreachable after creation. When the Eden space becomes full, a minor garbage collection is triggered. The collector discards objects that are no longer reachable and moves the remaining live objects to one of the Survivor spaces. The Eden space is part of the Young Generation in the heap.

Survivor Spaces

There are two Survivor spaces in the JVM: Survivor 0 (S0) and Survivor 1 (S1). These spaces are also part of the Young Generation and work in rotation. After each minor garbage collection, objects that survive are copied from Eden into one of the Survivor spaces. With every subsequent collection, objects move back and forth between these two spaces until they have survived enough cycles to be promoted to the Tenured space.
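The rotation between the two Survivor spaces can be sketched as a toy simulation. This is only an illustrative model, not how HotSpot is implemented: the class `YoungGenSketch`, the `Obj` type, and the tenuring threshold of 3 are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the young generation; all names and values are hypothetical. */
class YoungGenSketch {
    /** A simulated object: an id, a liveness flag, and the number of minor GCs survived. */
    static final class Obj {
        final String id;
        boolean live = true;
        int age = 0;
        Obj(String id) { this.id = id; }
    }

    static final int TENURING_THRESHOLD = 3; // made-up value for this sketch

    final List<Obj> eden = new ArrayList<>();
    List<Obj> fromSurvivor = new ArrayList<>();
    List<Obj> toSurvivor = new ArrayList<>();
    final List<Obj> tenured = new ArrayList<>();

    Obj allocate(String id) {
        Obj o = new Obj(id);
        eden.add(o); // new objects start in Eden
        return o;
    }

    /** One minor collection: copy live objects out of Eden and the "from" space. */
    void minorGc() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSurvivor);
        for (Obj o : candidates) {
            if (!o.live) continue;        // dead objects are simply not copied
            o.age++;
            if (o.age >= TENURING_THRESHOLD) {
                tenured.add(o);           // survived enough cycles: promote
            } else {
                toSurvivor.add(o);        // otherwise copy to the empty Survivor space
            }
        }
        eden.clear();
        fromSurvivor.clear();
        // Swap roles: the space that just received objects becomes "from".
        List<Obj> tmp = fromSurvivor;
        fromSurvivor = toSurvivor;
        toSurvivor = tmp;
    }
}
```

In this model an object that stays live is copied between the two Survivor lists on the first two collections and lands in `tenured` on the third, while a dead object never leaves Eden.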

Tenured (Old) Space

The Tenured or Old space holds long-lived objects that have survived several garbage collection cycles in the Young Generation. This region is typically larger than the Eden or Survivor spaces, and garbage collection here happens less frequently through what is called a major garbage collection. Objects that are still referenced but no longer needed tend to end up here, so this is where memory leaks usually become visible.

How These Spaces Improve Efficiency

Dividing the heap into these distinct regions makes garbage collection far more efficient. Most new objects are short-lived and can be quickly cleaned up from the Eden space during minor collections, which are fast and frequent. Meanwhile, objects that survive longer are moved to areas that are checked less frequently. This design reduces the time spent on garbage collection and minimizes performance overhead during program execution.

Because the Tenured space is typically much larger than the Eden space, it fills up more slowly and is collected less often. The trade-off is that objects which are no longer needed but are still referenced (for example, entries left behind in a long-lived cache) accumulate here and are never reclaimed, which is a common way for memory leaks to show up in Java.
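To see these regions on a running JVM, the standard `java.lang.management` API exposes one `MemoryPoolMXBean` per memory pool. The sketch below lists the heap pools only. The class name `HeapPools` is an assumption for this example, and the pool names printed depend on the collector in use (on HotSpot with G1 they typically look like "G1 Eden Space", "G1 Survivor Space", and "G1 Old Gen").

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class HeapPools {
    /** Returns one descriptive line per heap memory pool reported by the JVM. */
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue; // skip non-heap pools such as Metaspace
            MemoryUsage usage = pool.getUsage();
            if (usage == null) continue;                     // the pool is no longer valid
            long max = usage.getMax();                       // -1 means the maximum is undefined
            lines.add(pool.getName()
                    + ": used=" + usage.getUsed() + " bytes"
                    + ", max=" + (max < 0 ? "undefined" : max + " bytes"));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeHeapPools().forEach(System.out::println);
    }
}
```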

Mark and Sweep Algorithm

The Java Virtual Machine's garbage collector primarily relies on the mark-and-sweep algorithm to identify and reclaim unused memory. This algorithm ensures that objects no longer referenced by the application are efficiently removed from the heap, preventing memory leaks and optimizing performance.

The algorithm operates in two main phases: the mark phase and the sweep phase. When an object is first created in the heap, it has a mark bit that is initially set to 0 (false), meaning it has not been visited or marked as active by the garbage collector.

During the mark phase, the garbage collector starts from a set of root references—these include active local variables, static fields, and references held by running threads. From these roots, the collector traverses the object graph, following references from one object to another. Every object that is reachable from a root is marked by setting its mark bit to 1 (true). Objects that are not reachable remain unmarked, with their bits still set to 0.

Once the marking process is complete, the sweep phase begins. In this phase, the garbage collector scans through the entire heap and reclaims memory from all objects whose mark bits remain 0, meaning they are no longer reachable. These unreferenced objects are effectively removed, and their memory becomes available for future allocations.
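The two phases described above can be sketched on a small, hand-built object graph. This is a teaching model under stated assumptions, not the JVM's implementation: `MarkSweepSketch`, `Node`, and the list standing in for the heap are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class MarkSweepSketch {
    /** A simulated heap object with a mark bit and outgoing references. */
    static final class Node {
        final String name;
        boolean marked = false;                    // the mark bit, initially 0
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Mark phase: set the mark bit on everything reachable from the roots. */
    static void mark(List<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) continue;                // already visited; this also handles cycles
            n.marked = true;
            for (Node child : n.refs) {
                pending.push(child);
            }
        }
    }

    /** Sweep phase: remove unmarked nodes and reset the bit on survivors. */
    static List<String> sweep(List<Node> heap) {
        List<String> reclaimed = new ArrayList<>();
        Iterator<Node> it = heap.iterator();
        while (it.hasNext()) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;                  // ready for the next cycle
            } else {
                reclaimed.add(n.name);
                it.remove();
            }
        }
        return reclaimed;
    }
}
```

Note that two nodes referencing each other are still reclaimed if no root reaches them, which is why reachability-based collection handles cycles that simple reference counting would miss.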

A downside of the sweep phase is that it can result in memory fragmentation, where free memory becomes scattered across the heap in small, non-contiguous blocks. To mitigate this, some garbage collectors perform an additional compact phase, shifting surviving objects toward one end of the heap and consolidating free memory into a large continuous region. This mark-sweep-compact strategy is commonly used in the old generation of the heap, where long-lived objects tend to accumulate and fragmentation becomes more pronounced. For example, the Serial and Parallel collectors compact the old generation in this way, and G1 applies a variation that compacts by copying live objects from one region to another.
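The compact phase can be illustrated with an array standing in for the heap, where `null` slots are the gaps left by sweeping. This is a minimal sketch under that assumption. `CompactSketch` and its forwarding map are hypothetical names, and a real collector would also use such a map to update every reference to a moved object.

```java
import java.util.HashMap;
import java.util.Map;

class CompactSketch {
    /**
     * Slides live entries (non-null slots) toward index 0.
     * Returns a forwarding map from old index to new index for every object that moved.
     */
    static Map<Integer, Integer> compact(String[] heap) {
        Map<Integer, Integer> forwarding = new HashMap<>();
        int free = 0;                          // next free slot at the compacted end
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) continue;     // a gap left behind by the sweep phase
            if (i != free) {
                heap[free] = heap[i];          // move the live object down
                heap[i] = null;
                forwarding.put(i, free);       // remember where it went
            }
            free++;
        }
        return forwarding;
    }
}
```

After compaction, all free slots form one contiguous run at the end of the array, so a large allocation that would not have fit in any single gap can now succeed.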

However, one limitation of this approach is that it can temporarily pause program execution during collection, especially when the heap is large. Modern JVM implementations therefore build on this algorithm—introducing optimizations such as concurrent marking, generational garbage collection, and compacting—to minimize pauses and improve efficiency.

Events that trigger garbage collection

Garbage collection in the JVM is not a constant process; it occurs in response to specific events that signal the need to reclaim memory. These events are typically categorized into three types — minor, mixed, and major — depending on which parts of the heap are affected.

Minor events

Minor garbage collection events occur when the Eden space in the young generation becomes full. When this happens, the garbage collector identifies live objects in the Eden space and moves them to one of the survivor spaces. Objects that survive multiple minor collections may eventually be promoted to the old generation. These events are relatively frequent but fast, as they focus only on the young generation.

Mixed events

Mixed garbage collection events occur in certain collectors, such as the G1 Garbage Collector. These events reclaim memory not only from the young generation but also from selected regions of the old generation. The goal is to balance performance and memory usage by cleaning up areas that contribute most to heap occupancy.

Major events

Major garbage collection events collect the old generation. When the entire heap is collected in one pass, young and old generations together, the event is usually called a full GC, although the two terms are often used interchangeably. Both are more expensive than minor collections, as they typically require pausing application threads for a longer time. These events typically occur when the old generation becomes full or when memory fragmentation prevents efficient allocation of new objects.

In summary, minor events handle short-lived objects quickly, mixed events optimize heap usage across generations, and major events perform deep cleanups when memory pressure becomes high.
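One way to observe these events is through the standard `GarbageCollectorMXBean` counters. The sketch below allocates short-lived arrays to fill Eden and compares the collection count before and after. `GcEventCounter` and `churn` are names invented for this example, and whether any collection actually happens depends on the heap size and the collector, so no particular output is guaranteed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcEventCounter {
    /** Sums the collection counts of all collector beans; undefined counts (-1) are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) total += count;
        }
        return total;
    }

    /** Allocates short-lived arrays; the checksum keeps the loop from being optimized away. */
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] garbage = new byte[64 * 1024]; // unreachable again after this iteration
            garbage[0] = (byte) i;
            checksum += garbage[0];
        }
        return checksum;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        churn(100_000);
        long after = totalCollections();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
        System.out.println("collections during churn: " + (after - before));
    }
}
```

On a typical generational collector, most of the increase shows up on the bean that represents young-generation collections, which matches the description of minor events as frequent but fast.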

Types of garbage collectors

The Java Virtual Machine (JVM) provides several types of garbage collectors, each optimized for different workloads and performance goals. Some collectors prioritize throughput, while others focus on minimizing pause times or handling large heaps efficiently. Choosing the right collector depends on the nature of the application and its performance requirements.

Serial garbage collector

The Serial Garbage Collector is the simplest type of collector, designed primarily for single-threaded applications or environments with small data sets. It uses a single thread to perform garbage collection and triggers a stop-the-world pause, halting all other application threads while it reclaims memory. Although efficient in smaller systems with limited resources, it is generally a poor fit for multi-core servers with large heaps or for applications that require high responsiveness.

Parallel garbage collector

The Parallel Garbage Collector, also known as the Throughput Collector, was the JVM's default garbage collector through JDK 8; since JDK 9, G1 has been the default on server-class machines. The Parallel Collector improves performance by using multiple threads to perform garbage collection tasks in parallel, taking advantage of modern multi-core processors. This approach enhances overall throughput by reducing the time spent in collection cycles. However, like the Serial Collector, it still performs stop-the-world pauses during garbage collection, making it less ideal for applications requiring low latency.

Concurrent Mark and Sweep (CMS) collector

The Concurrent Mark and Sweep Collector aims to minimize pause times by performing most of its garbage collection work concurrently with the application threads. It uses multiple threads to mark and sweep objects, significantly reducing the duration of stop-the-world events. This makes it a better choice for interactive or user-facing applications, where long pauses would degrade the user experience. However, CMS can only collect objects in the old generation concurrently. It still requires pauses when collecting the young generation, and it tends to consume more CPU resources due to concurrent execution. CMS was deprecated in JDK 9 and removed in JDK 14, so it is only relevant on older JVMs.

Garbage-First (G1) Garbage Collector

The Garbage-First (G1) garbage collector was designed as a long-term replacement for the Concurrent Mark-Sweep (CMS) collector, with a strong focus on predictable pause times and reduced fragmentation. Unlike earlier collectors that performed large, monolithic collections, G1 aims to meet user-defined pause time goals by doing its work incrementally.

G1 uses a region-based heap layout, dividing the heap into many small, fixed-size regions rather than strictly separating it into young and old generations. Each region can belong to either generation, allowing the collector to flexibly choose which regions to collect based on how much garbage they contain.

During a collection cycle, G1 prioritizes regions with the highest amount of reclaimable space, collecting the “most garbage first.” This selective approach avoids scanning the entire old generation at once and helps keep pause times short and more predictable.

To combat fragmentation, G1 performs incremental compaction as part of its evacuation process, moving live objects between regions. While this work is spread across multiple pauses—resulting in shorter individual pauses—it may lead to more frequent pauses overall. Most of G1's marking and cleanup phases run concurrently and in parallel with application threads, balancing throughput with responsiveness.

By combining region-based memory management, predictive pause-time modeling, and mostly concurrent reclamation, G1 provides more consistent performance across a wide range of workloads while giving users control over maximum pause-time targets.
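The "most garbage first" idea can be sketched as a greedy selection under a pause budget. This is a deliberate simplification and not G1's actual algorithm, which uses its own efficiency estimates and prediction model. `GarbageFirstSketch`, `Region`, and the per-region cost figures are assumptions made for illustration. On a real JVM the pause goal is set with the `-XX:MaxGCPauseMillis` flag.

```java
import java.util.ArrayList;
import java.util.List;

class GarbageFirstSketch {
    /** Hypothetical region summary: reclaimable bytes and estimated evacuation cost. */
    static final class Region {
        final int id;
        final long garbageBytes;
        final double evacuationMillis;
        Region(int id, long garbageBytes, double evacuationMillis) {
            this.id = id;
            this.garbageBytes = garbageBytes;
            this.evacuationMillis = evacuationMillis;
        }
    }

    /** Picks regions with the most garbage first, skipping any that would exceed the budget. */
    static List<Region> chooseCollectionSet(List<Region> regions, double pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort((a, b) -> Long.compare(b.garbageBytes, a.garbageBytes)); // descending garbage
        List<Region> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.evacuationMillis > pauseBudgetMillis) continue;    // would overrun the pause goal
            chosen.add(r);
            spent += r.evacuationMillis;
        }
        return chosen;
    }
}
```

With a budget of 8 ms and regions costing 4, 5, 3, and 1 ms (in descending order of garbage), the sketch takes the first, skips the second because it would overrun the budget, and then takes the remaining two.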

Z Garbage Collector (ZGC)

The Z Garbage Collector, or ZGC, is one of the most advanced collectors in the JVM. It is designed for extremely low-latency performance, even with massive heap sizes that can scale up to 16 terabytes. ZGC keeps pause times consistently below one millisecond by performing almost all of its work concurrently with application threads, including marking and relocation. This makes ZGC ideal for large-scale, memory-intensive, and real-time applications where predictable response times are essential.

ZGC achieves its performance targets through two main innovations: colored pointers and load barriers. These mechanisms allow the collector to track object states and relocate objects concurrently without requiring long stop-the-world pauses.

Colored Pointers: ZGC embeds metadata directly into an object's 64-bit pointer by using a set of unused high-order bits. These bits, referred to as pointer "colors", encode the object's current status during the garbage collection cycle. Some examples include Marked0 and Marked1, which indicate whether an object has been marked as live in the current cycle (with ZGC alternating between them each cycle), and Remapped, which indicates whether a pointer already refers to the object's updated location after relocation. Because this metadata is stored inside the pointer itself, ZGC can quickly determine an object's state without performing extra memory lookups, significantly reducing latency.

Load Barriers: A load barrier is a small piece of code that the JVM inserts into compiled application code. It runs whenever a reference is loaded from the heap (for example, when accessing a non-primitive field). The barrier checks the pointer's color to determine whether the reference is up-to-date. If the color indicates a potential issue, such as the Remapped bit showing that the object may have been moved, the load barrier performs a corrective action.

This corrective action involves determining the object's current, correct location (often via a forwarding table), updating the stale pointer in memory to the relocated address, and returning the corrected pointer to the application thread. This "self-healing" mechanism allows ZGC to relocate and compact objects concurrently without requiring long stop-the-world pauses to fix references.
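The interplay of colored pointers, load barriers, and self-healing can be modeled with plain `long` values. The bit positions, the `ColoredPointerSketch` class, and the map used as a forwarding table are assumptions chosen for readability. They do not reproduce ZGC's real pointer layout or barrier code.

```java
import java.util.HashMap;
import java.util.Map;

class ColoredPointerSketch {
    // Hypothetical layout: the low 42 bits hold the address, the next three bits hold the color.
    static final long ADDRESS_MASK = (1L << 42) - 1;
    static final long MARKED0 = 1L << 42;
    static final long MARKED1 = 1L << 43;
    static final long REMAPPED = 1L << 44;
    static final long COLOR_MASK = MARKED0 | MARKED1 | REMAPPED;

    // In this sketch the relocation phase is active, so REMAPPED is the "good" color.
    static final long GOOD_COLOR = REMAPPED;

    // Forwarding table: old address -> new address of relocated objects.
    static final Map<Long, Long> forwarding = new HashMap<>();

    static long address(long pointer) { return pointer & ADDRESS_MASK; }
    static long color(long pointer) { return pointer & COLOR_MASK; }

    /** Simulated load of a reference field, with the barrier applied. */
    static long loadBarrier(long[] fields, int index) {
        long pointer = fields[index];
        if (color(pointer) == GOOD_COLOR) {
            return pointer;                        // fast path: the pointer is already up to date
        }
        long oldAddress = address(pointer);
        long newAddress = forwarding.getOrDefault(oldAddress, oldAddress);
        long healed = newAddress | GOOD_COLOR;
        fields[index] = healed;                    // self-heal: fix the stale pointer in place
        return healed;
    }
}
```

Because the healed pointer is written back to the field, the next load of the same field takes the fast path, which is the property that lets the slow path cost be paid at most once per stale reference.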

As a result, ZGC delivers predictable, sub-millisecond pause times while supporting extremely large heaps, making it one of the most efficient low-latency garbage collectors available in modern JVMs.

Why is stop-the-world necessary?

Garbage collectors use stop-the-world (STW) pauses because suspending the application is the most reliable way to obtain a consistent snapshot of the heap. During an STW pause, all application threads (mutators) are temporarily suspended, ensuring that object references do not change while the garbage collector is examining memory.

The primary reason for this pause is correctness. To safely reclaim memory, the garbage collector must accurately determine which objects are still reachable (live) and which are no longer referenced (garbage). If application threads continue to modify references while the GC is marking objects, the collector could mistakenly reclaim memory that is still in use, leading to data corruption or crashes.

Stop-the-world is especially important during compaction. Over time, memory becomes fragmented, leaving small gaps scattered throughout the heap. Compaction relocates live objects to create larger contiguous free regions. Moving an object requires updating every reference that points to it, which is extremely difficult to do safely while the application is actively reading and writing those references.

Finally, STW pauses simplify the design of the garbage collector and improve efficiency for throughput-oriented workloads. A stop-the-world collector does not need complex synchronization mechanisms to coordinate with running application threads, allowing it to complete its work faster and with less overhead, at the cost of brief pauses in application execution.

Summary

In summary, each garbage collector offers a trade-off between throughput, pause time, and resource usage. The Serial Collector is simple but limited, the Parallel Collector boosts throughput, CMS reduces pause time, G1 balances both, and ZGC scales to very large heaps while keeping pauses in the sub-millisecond range.
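To check which of these collectors a running JVM is actually using, one option is to look at the names of its `GarbageCollectorMXBean` instances. The mapping below is a heuristic based on names commonly reported by HotSpot builds. Those names are not part of any specification, and `CollectorInUse` and `guessCollector` are hypothetical helpers for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class CollectorInUse {
    /** Names of the collector beans registered in this JVM. */
    static List<String> currentBeanNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Best-effort guess at the collector family from bean names (HotSpot naming assumed). */
    static String guessCollector(List<String> beanNames) {
        for (String name : beanNames) {
            if (name.startsWith("ZGC")) return "ZGC";                  // selected with -XX:+UseZGC
            if (name.startsWith("G1")) return "G1";                    // selected with -XX:+UseG1GC
            if (name.startsWith("PS ")) return "Parallel";             // selected with -XX:+UseParallelGC
            if (name.equals("ConcurrentMarkSweep")) return "CMS";      // only on JVMs older than JDK 14
        }
        for (String name : beanNames) {
            if (name.equals("Copy") || name.equals("MarkSweepCompact")) {
                return "Serial";                                       // selected with -XX:+UseSerialGC
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        List<String> names = currentBeanNames();
        System.out.println("collector beans: " + names);
        System.out.println("best guess: " + guessCollector(names));
    }
}
```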

Marking objects as inaccessible

In Java, an object becomes eligible for garbage collection when it is no longer reachable by any active reference in the program. The garbage collector identifies such objects as inaccessible and reclaims their memory during the next collection cycle. Below are several common scenarios that make an object inaccessible.

Creating an object inside a method

When you create an object inside a method, the reference to that object typically exists only within the method's local scope. Once the method finishes executing, all local variables are removed from the stack. Provided no reference was returned or stored elsewhere (for example, in a field or a collection), the objects they referenced become unreachable and therefore eligible for garbage collection.
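A small example of this scenario is shown here. The class and method names are invented for illustration. The array is referenced only by a local variable and never leaves the method.

```java
class LocalScopeExample {
    /** Sums the squares 1^2 + 2^2 + ... + n^2 using a temporary array. */
    static int sumOfSquares(int n) {
        int[] squares = new int[n];   // referenced only by this local variable
        int sum = 0;
        for (int i = 0; i < n; i++) {
            squares[i] = (i + 1) * (i + 1);
            sum += squares[i];
        }
        return sum;                   // only the int is returned; the array does not escape
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(3));
        // Once sumOfSquares has returned, nothing refers to its array,
        // so the array is eligible for collection.
    }
}
```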

Nullifying a reference variable

Another way to make an object unreachable is by setting its reference variable to null. When all references to an object are set to null, the object can no longer be accessed by any part of the program, allowing the garbage collector to reclaim its memory.

Reassigning a reference variable

You can also make an object inaccessible by reassigning its reference variable to another object. When this happens, the old object loses its last reference and becomes unreachable. Whether through nullifying or reassigning, once no active references exist, the object is considered inaccessible and is eventually removed by the garbage collector.
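Both nullifying and reassigning can be observed with a `WeakReference`, which does not keep its referent alive. In the sketch below, `UnreachableExample` and `clearedAfterGc` are hypothetical names. `System.gc()` is only a request that the JVM may ignore, so the program reports what it observes rather than promising a result.

```java
import java.lang.ref.WeakReference;

class UnreachableExample {
    /** Requests a collection a few times and reports whether the referent was cleared. */
    static boolean clearedAfterGc(WeakReference<?> probe) {
        for (int attempt = 0; attempt < 10 && probe.get() != null; attempt++) {
            System.gc();                              // a hint, not a guarantee
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve the interrupt and stop waiting
                break;
            }
        }
        return probe.get() == null;
    }

    public static void main(String[] args) {
        // Nullifying: the only strong reference is set to null.
        StringBuilder first = new StringBuilder("first");
        WeakReference<StringBuilder> firstProbe = new WeakReference<>(first);
        first = null;

        // Reassigning: the variable now points at a different object.
        StringBuilder second = new StringBuilder("second");
        WeakReference<StringBuilder> secondProbe = new WeakReference<>(second);
        second = new StringBuilder("replacement");

        System.out.println("nullified object cleared: " + clearedAfterGc(firstProbe));
        System.out.println("reassigned object cleared: " + clearedAfterGc(secondProbe));
        System.out.println("still reachable: " + second);
    }
}
```

The example uses `StringBuilder` instead of string literals because literals are interned and stay reachable for as long as the defining class is loaded.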

Creating an anonymous object

Anonymous objects are created without assigning them to a variable, which means they have no named reference in the program. Once the expression that uses them has finished, nothing refers to them any longer, so they become unreachable and are eligible for removal during the next collection cycle.
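As a brief illustration (the class and method names are invented for this example), the `StringBuilder` below is never stored in a variable and is used only to produce the returned string:

```java
class AnonymousObjectExample {
    static String greeting(String name) {
        // The StringBuilder has no named reference; after toString() returns,
        // nothing refers to it and it is eligible for collection.
        return new StringBuilder("Hello, ").append(name).append('!').toString();
    }

    public static void main(String[] args) {
        System.out.println(greeting("GC"));
    }
}
```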

In summary, garbage collection in Java depends on the concept of reachability. Whether an object loses its reference because a method ends, a variable is nullified, or it is replaced by another reference, the result is the same — the object becomes inaccessible and is eventually cleared from memory.