Introduction to JVM

JVM Function and Its Components

The primary function of the Java Virtual Machine (JVM) is to load and execute your application. The application is typically a .class file, which is generated by compiling a .java source file. When you run the command java MyApp, a new JVM instance is created to execute the program.

So, how does the JVM instance load and execute a class file? This process involves three main components: the Class Loader, the Runtime Data Area, and the Execution Engine.

The Class Loader is responsible for locating and loading .class files into memory. These files contain bytecode, the intermediate instructions that the JVM understands. Once a class is loaded, its bytecode is handed to the Execution Engine.

The Execution Engine interprets or compiles the bytecode into native machine code, depending on the implementation. To execute this native code, the Execution Engine interacts with the underlying operating system. This is often done through native method calls, allowing the JVM to leverage platform-specific functionality while maintaining platform independence at the bytecode level.

Class Loader Subsystem

The Class Loader subsystem has three main phases: Load, Link, and Initialize.

Load Phase

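The Load phase is responsible for loading bytecode into memory. Three class loaders are involved in this phase, and each one delegates to its parent before trying to load a class itself:

  • Bootstrap Class Loader: Loads the core Java classes (for example, java.lang and java.util). In Java 8 and earlier these are found in the rt.jar file; since Java 9 they come from the runtime's modules. The loader itself is built into the JVM.
  • Extension Class Loader: Loads classes from the jre/lib/ext directory in Java 8 and earlier. These are typically optional extension libraries. Since Java 9 this loader has been replaced by the Platform Class Loader.
  • Application Class Loader: Loads classes from the class path, which is set with the -cp option or the CLASSPATH environment variable. This includes your application classes and their dependencies (e.g., the JARs that a build tool resolves from pom.xml).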
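The loader hierarchy described above can be observed from a running program. The following is a minimal sketch; the class name ClassLoaderChain and the helper describe are made up for illustration, and the exact loader class names printed depend on the Java version.

```java
class ClassLoaderChain {

    /** Describes a loader; the bootstrap loader is represented by null. */
    static String describe(ClassLoader loader) {
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        // Core classes such as String are loaded by the bootstrap loader.
        System.out.println("String loaded by: " + describe(String.class.getClassLoader()));

        // Walk upward from the loader used for classes on the class path.
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        while (loader != null) {
            System.out.println(describe(loader));
            loader = loader.getParent();
        }
        // The chain ends at the bootstrap loader, which Java code sees as null.
        System.out.println(describe(null));
    }
}
```

On Java 8 the middle entry is typically the extension loader; on Java 9 and later it is the platform loader.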

Link Phase

The Link phase consists of three sub-phases: Verify, Prepare, and Resolve.

  • Verify: Ensures the bytecode loaded by the class loader adheres to the JVM specification and is safe to execute.
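  • Prepare: Allocates memory for all static variables of the class and sets them to their default values (0, false, or null, depending on the type).
  • Resolve: Converts symbolic references (e.g., class names) into direct references in memory. If the class references another class, those references are resolved during this phase, although the JVM specification allows an implementation to defer resolution until a reference is first used.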

In Java, class references are initially stored as symbolic references. This means that instead of directly pointing to a memory address, they refer to the class using a symbolic name (e.g., com.example.OtherClass). These symbolic references act as placeholders and are resolved during the Resolve sub-phase.

During resolution, the JVM looks up the symbolic name in the Metaspace—a region of memory dedicated to storing class metadata, such as methods and static fields. If the referenced class (e.g., OtherClass) has already been loaded, the JVM retrieves its memory address. Otherwise, the JVM loads it first, and then resolves the reference to point directly to the loaded class.
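The default values assigned in the Prepare sub-phase are normally invisible, but a static initializer that runs early can observe them. The sketch below assumes a hypothetical class PrepareDefaults; the helper peek reads counter before its initializer has run.

```java
class PrepareDefaults {
    // Initializers run in textual order, so this one runs before
    // `counter` has been assigned its value of 10.
    static int seenEarly = peek();
    static int counter = 10;

    static int peek() {
        return counter; // still the default value set during Prepare
    }

    public static void main(String[] args) {
        System.out.println("seenEarly = " + seenEarly); // prints seenEarly = 0
        System.out.println("counter   = " + counter);   // prints counter   = 10
    }
}
```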

Initialize Phase

In the Initialize phase, the class's static variable initializers and static blocks are executed in the order in which they appear in the source. This is when static variables receive their defined values.
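Initialization does not happen when the program starts; it is triggered by the first active use of the class, such as reading a non-constant static field. A small sketch follows, in which the names InitOnFirstUse, Lazy, and touch are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

class InitOnFirstUse {
    static final List<String> EVENTS = new ArrayList<>();

    static class Lazy {
        static int value = 42; // not final, so reading it triggers initialization
        static {
            EVENTS.add("Lazy initialized");
        }
    }

    /** Records events around a read of Lazy.value and returns the value. */
    static int touch() {
        EVENTS.add("before first use");
        int v = Lazy.value;
        EVENTS.add("after first use");
        return v;
    }

    public static void main(String[] args) {
        touch();
        touch(); // the static block does not run a second time
        System.out.println(EVENTS);
    }
}
```

The static block runs between the first "before" and "after" entries and appears only once in the list, no matter how often touch is called.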

Runtime Data Areas

Metaspace

The Metaspace is the area of memory where class-related metadata is stored. This includes:

  • Class names, superclass names, and implemented interfaces
  • Information about static variables (the actual values are stored on the heap)
  • Bytecode for methods and constructors
  • Constant pool (literals and symbolic references to classes)
  • Field information (names, types, access modifiers)
  • Method metadata

Metaspace grows automatically as needed, unlike the older PermGen memory space, which had a fixed maximum size. Its growth can be capped with the -XX:MaxMetaspaceSize JVM option.

public class HelloWorld {
    static String message = "Hello";

    public static void main(String[] args) {
        System.out.println(message);
    }
}

In the Metaspace for the above example, you would find:

  • Metadata for the HelloWorld class
  • Metadata for the static field message (its name, type, and modifiers)
  • Bytecode for the main method
  • Constant pool entry for the string literal "Hello"
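Metaspace usage can be inspected at runtime through the standard java.lang.management API. The sketch below assumes that the JVM exposes a memory pool named "Metaspace", which is the name HotSpot uses; other JVMs may name it differently or not expose it at all, in which case the helper returns -1. The class name MetaspaceUsage is illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class MetaspaceUsage {

    /** Returns bytes used by the pool named "Metaspace", or -1 if none is found. */
    static long usedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Assumption: the pool name is HotSpot-specific.
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1 : usage.getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("Metaspace used (bytes): " + usedBytes());
    }
}
```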

Heap

The Heap is where all object data and instance variables are stored. This memory area is shared among all threads and can be tuned using the -Xms (initial size) and -Xmx (maximum size) JVM options.
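The effect of these options can be checked from inside the program with java.lang.Runtime. This is a rough sketch: maxMemory corresponds approximately to the -Xmx limit and totalMemory to the heap currently reserved by the JVM, but the exact figures depend on the garbage collector in use. The class name HeapLimits is made up for the example.

```java
class HeapLimits {

    /** Upper bound the JVM will try to use for the heap (roughly -Xmx). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap memory currently reserved by the JVM. */
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        System.out.println("max heap (MB):       " + maxHeapBytes() / mb);
        System.out.println("committed heap (MB): " + committedHeapBytes() / mb);
        System.out.println("free in heap (MB):   " + Runtime.getRuntime().freeMemory() / mb);
    }
}
```

Running it once with java -Xmx256m HeapLimits and once with a larger value should show the maximum changing accordingly.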

Program Counter (PC) Registers

Each thread has its own PC Register, which holds the address of the bytecode instruction that the thread is currently executing. This lets the JVM keep track of each thread's position independently in multithreaded environments.

Java Stacks

Each thread has its own Java stack, which stores stack frames for method execution. A frame is pushed when a method is called and popped when it returns. A stack frame contains method arguments, local variables, the return address, and intermediate results (the operand stack).
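The growth of a thread's stack with each method call can be made visible by counting stack trace elements. The following sketch uses hypothetical names (StackFrames, depthAfter, framesAdded) and relies only on Thread.getStackTrace.

```java
class StackFrames {

    /** Number of frames currently on the calling thread's stack trace. */
    static int currentDepth() {
        return Thread.currentThread().getStackTrace().length;
    }

    /** Makes `calls` additional nested calls, then reports the depth. */
    static int depthAfter(int calls) {
        return calls == 0 ? currentDepth() : depthAfter(calls - 1);
    }

    /** Each nested call should add exactly one frame. */
    static int framesAdded(int calls) {
        return depthAfter(calls) - depthAfter(0);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("frames added by 5 calls: " + framesAdded(5)); // 5

        // A new thread starts with its own, separate stack.
        Thread worker = new Thread(() ->
                System.out.println("worker thread depth: " + currentDepth()));
        worker.start();
        worker.join();
        System.out.println("main thread depth: " + currentDepth());
    }
}
```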

Execution Engine

The execution engine is the core component of the JVM responsible for executing bytecode. It consists of:

  • Interpreter
  • Just-In-Time (JIT) Compiler
  • HotSpot Profiler
  • Garbage Collector

Interpreter

Once the bytecode is loaded, the interpreter reads and executes it one instruction at a time. When the code calls a native method, the call goes through the Native Method Interface, which connects to native libraries present in the JVM.

For example, on Windows systems, you'll find native libraries as .dll files in the bin folder of the Java installation, while Linux systems use .so shared libraries.

The interpreter is useful for short-lived or rarely used code. However, it is slow for code that runs many times, because it re-interprets the same instructions on every execution.

JIT Compiler

The Just-In-Time (JIT) compiler improves performance by compiling frequently executed (hot) bytecode instructions into native machine code at runtime. Once compiled, these sections are executed directly, bypassing interpretation.

This is particularly beneficial for performance-critical methods or loops. The native code produced by JIT runs significantly faster than interpreted bytecode. Compilation only occurs for hot methods, as determined by the HotSpot Profiler.
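JIT activity can be observed indirectly through the standard CompilationMXBean. The sketch below runs a small method many times so that it is likely to become hot, then reports the accumulated compilation time. Whether and when the method is actually compiled depends on the JVM and its settings, so the numbers are only indicative; JitWarmup and sumOfSquares are illustrative names.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class JitWarmup {

    /** A small, loop-heavy method that becomes hot when called repeatedly. */
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    /** Total JIT compilation time in ms, or -1 if the JVM does not report it. */
    static long jitMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        long before = jitMillis();
        long checksum = 0;
        for (int round = 0; round < 20_000; round++) {
            checksum += sumOfSquares(1_000);
        }
        long after = jitMillis();
        System.out.println("checksum: " + checksum);
        System.out.println("JIT time before/after (ms): " + before + " / " + after);
    }
}
```

For comparison, HotSpot's -Xint option disables the JIT compiler entirely, which usually makes the loop noticeably slower.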

HotSpot Profiler

The HotSpot Profiler monitors bytecode execution and collects runtime statistics, such as:

  • Which methods and loops are frequently used (hot spots)
  • Type information
  • Branching behavior

This profiling data is used by the JIT compiler to apply advanced optimizations such as inlining, loop unrolling, and optimizing for the branches that are taken most often.

Interpreter vs. JIT Compiler

Both the interpreter and JIT compiler are used in the JVM for different purposes:

  • Fast Startup: The interpreter enables immediate execution of bytecode, making it ideal for short-lived applications or command-line tools.
  • Resource Efficiency: Most code is executed only a few times. The interpreter avoids wasting time and memory by not compiling such code.
  • Better Optimizations: Runtime profiling data gathered by the interpreter and HotSpot Profiler allows the JIT to perform smarter optimizations.

This hybrid approach allows the JVM to balance quick startup with long-term performance.

From .class File to Metaspace

When you compile Java source code using javac, the compiler produces a .class file. This file is a binary representation of the class and contains the class header, constant pool, field and method definitions, bytecode instructions, and additional metadata such as annotations, line numbers, and exception tables.

The .class file itself is a static, on-disk artifact defined by the JVM specification. It does not execute on its own; instead, it describes how the class should behave once loaded by the Java Virtual Machine.

When a program runs, the JVM performs class loading. A ClassLoader locates and reads the .class file—whether from disk, a JAR, or another source—and parses its binary structure according to the Class File Format. From this, the JVM constructs an internal runtime representation of the class.
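The start of the binary structure that the JVM parses is easy to read by hand: every class file begins with the magic number 0xCAFEBABE, followed by the minor and major format versions. The sketch below reads these from the class file of java.lang.Object. It assumes that the runtime makes the class file available as a resource, which is the case on standard JDKs; the class name ClassFileHeader is illustrative.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeader {

    /** Returns {magic, minorVersion, majorVersion} of the given class's file. */
    static int[] read(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream raw = type.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IOException("class file not available: " + resource);
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();            // 4 bytes, big-endian
            int minor = in.readUnsignedShort();  // 2 bytes
            int major = in.readUnsignedShort();  // 2 bytes
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = read(Object.class);
        System.out.printf("magic=0x%08X minor=%d major=%d%n",
                header[0], header[1], header[2]);
    }
}
```

The major version printed depends on the Java release that produced the class file.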

This internal representation, often referred to as the class descriptor, is stored in Metaspace. Metaspace is a native-memory region used by the JVM to hold class metadata, including method and field information, the runtime constant pool, the class hierarchy, and the bytecode for each method. Once loading is complete, the JVM can execute methods directly from the metadata stored in Metaspace and no longer needs to access the .class file.

At the same time, the JVM creates a corresponding java.lang.Class<?> object on the Java heap. This heap object acts as a handle that points to the class's runtime metadata stored in Metaspace and is the object through which reflection and runtime type checks operate.

During the linking and resolution phases, symbolic references in the constant pool are resolved into direct references to runtime structures. After resolution, references to a class effectively become pointers to its runtime metadata in Metaspace, while Java code reaches that metadata indirectly through the associated java.lang.Class<?> object on the heap.

In summary, saying that a class is "loaded into Metaspace" means that the JVM has parsed the .class file, created an internal runtime representation of the class, stored its metadata in Metaspace, and established a corresponding java.lang.Class<?> object on the heap that serves as a gateway to that metadata.
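Because a loaded class has a single Class object per class loader, every way of obtaining that object yields the same instance. A brief sketch follows, with ClassObjectIdentity and lookup as illustrative names.

```java
class ClassObjectIdentity {

    /** Looks a class up by its name, wrapping the checked exception. */
    static Class<?> lookup(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("no such class: " + name, e);
        }
    }

    public static void main(String[] args) {
        Class<?> viaLiteral = String.class;
        Class<?> viaInstance = "hello".getClass();
        Class<?> viaName = lookup("java.lang.String");

        // All three refer to the same heap object.
        System.out.println(viaLiteral == viaInstance); // true
        System.out.println(viaLiteral == viaName);     // true
    }
}
```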

class Example {
    static int counter = 10;
    static final String msg = "Hello";
}

In this example, the class Example has a static integer field counter and a static final string msg. The metadata describing these fields (such as their names, types, and modifiers) is stored in Metaspace. The actual value of counter (which is 10) and the String object representing "Hello" reside in the Java heap.
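The field metadata mentioned above is what reflection exposes through the Class object. The sketch below nests a copy of Example inside a hypothetical wrapper class FieldMetadata so that the block is self-contained, and prints each field's modifiers, type, and name.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class FieldMetadata {

    // Copy of the Example class from the text, nested for self-containment.
    static class Example {
        static int counter = 10;
        static final String msg = "Hello";
    }

    /** Renders a field's metadata as "modifiers type name". */
    static String describe(String fieldName) {
        try {
            Field field = Example.class.getDeclaredField(fieldName);
            return Modifier.toString(field.getModifiers())
                    + " " + field.getType().getSimpleName()
                    + " " + field.getName();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no such field: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("counter")); // static int counter
        System.out.println(describe("msg"));     // static final String msg
    }
}
```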

In essence, the .class file on disk serves as the source of bytecode and metadata, while the JVM's Metaspace is where this information resides and is used at runtime. Once loaded, all necessary data for class execution exists entirely in memory, and the original .class file is no longer required.

Therefore, while the bytecode originates from the .class file, it ultimately resides and runs inside the JVM's Metaspace, the region of memory dedicated to holding each class's runtime representation.