Wonderful Rust

Performance

The Sources of Rust's Performance

Rust is a language designed to achieve "both safety and performance." This chapter explains why Rust can achieve such high performance, along with the technical basis for it.

Zero-Cost Abstractions

Zero-cost abstractions are a core concept of Rust's design principles, originating from a principle articulated by C++ designer Bjarne Stroustrup:

What you don't use, you don't pay for. And further: What you do use, you couldn't hand code any better.

Rust follows this principle thoroughly. Even when using high-level abstractions (iterators, traits, generics, etc.), the generated machine code is equivalent to hand-written low-level code.

Iterator Example

// High-level style
let sum: i32 = (0..1000)
    .filter(|x| x % 2 == 0)
    .map(|x| x * x)
    .sum();

// Hand-written low-level style
let mut sum: i32 = 0;
let mut i = 0;
while i < 1000 {
    if i % 2 == 0 {
        sum += i * i;
    }
    i += 1;
}

After optimization, these two compile to identical machine code. The overhead of the iterator-chain abstraction is literally zero.

The reason this is achievable lies in monomorphization, described later.

The Choice of No GC

Many modern languages (Java, Go, Python, JavaScript, C#, ...) employ garbage collection (GC). GC automates memory management and reduces the programmer's burden, but it comes with unavoidable performance costs:

  • Stop-the-world (STW) pauses: Application execution is temporarily halted when the GC reclaims memory. Generational and concurrent GC can mitigate this, but cannot completely eliminate it
  • Memory overhead: to operate efficiently, a GC needs headroom beyond the memory the program actually uses. GC-equipped languages often consume several times more memory than comparable GC-free programs
  • Unpredictable latency: GC timing is non-deterministic, which can be problematic for latency-sensitive applications (real-time systems, game engines, HFT, etc.)

Rust has no GC. Instead, it determines memory deallocation timing at compile time through the ownership system. When a value goes out of scope, its memory is freed immediately (RAII). This means:

  • No STW pauses
  • Predictable memory usage
  • Deterministic deallocation timing

Eliminating GC while still guaranteeing memory safety -- this is the ownership system's greatest achievement.
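The deterministic deallocation described above can be observed directly. Below is a minimal sketch; the `Resource` type and the `DROPS` log are illustrative helpers used only to make the drop order visible:

```rust
use std::cell::RefCell;

// Illustrative log that records drop events, so the order is observable.
thread_local! {
    static DROPS: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

struct Resource(&'static str);

impl Drop for Resource {
    // Runs deterministically when the value goes out of scope (RAII).
    fn drop(&mut self) {
        DROPS.with(|d| d.borrow_mut().push(self.0));
    }
}

fn demo() -> Vec<&'static str> {
    DROPS.with(|d| d.borrow_mut().clear()); // reset the log for repeatability
    let _outer = Resource("outer");
    {
        let _inner = Resource("inner");
    } // `_inner` is dropped here, immediately, at the end of its scope -- no GC
    DROPS.with(|d| d.borrow().clone()) // at this point only "inner" was dropped
} // `_outer` is dropped here, when `demo` returns

fn main() {
    let during = demo();
    let after = DROPS.with(|d| d.borrow().clone());
    println!("dropped during demo: {:?}", during); // ["inner"]
    println!("dropped in total:   {:?}", after);   // ["inner", "outer"]
}
```

No runtime decides when memory is reclaimed: every deallocation point is fixed at compile time by scope.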

The LLVM Backend

Rust's compiler (rustc) uses LLVM for code generation. LLVM is one of the world's most mature compiler infrastructures, used as a backend by many languages including Clang (C/C++), Swift, and Julia.

Examples of optimizations LLVM performs:

  • Function inlining: Eliminates function call overhead
  • Loop unrolling: When loop iteration counts are small, unrolls the loop to eliminate branching
  • Constant propagation: Pre-computes values that can be determined at compile time
  • Dead code elimination: Removes code that's never executed
  • Auto-vectorization: Automatically generates code using SIMD instructions for parallel processing
  • Alias analysis: Analyzes where pointers point to discover optimization opportunities

Rust can provide LLVM with rich compile-time information (ownership, lifetimes, aliasing rules), which in some cases enables optimizations equal to or better than C/C++.

Particularly important is the exclusivity guarantee of &mut. In Rust, while a mutable reference (&mut T) to a value exists, the compiler guarantees no other references to that value exist. This provides semantics equivalent to C's restrict pointer qualifier, but in C, restrict is merely a programmer's self-declaration with no actual guarantee it's upheld. In Rust, the borrow checker verifies this statically.

This guarantee allows LLVM to perform more aggressive alias analysis and optimization.
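A small sketch of what the exclusivity guarantee buys (the function `add_twice` is illustrative):

```rust
// Because `total: &mut i32` and `x: &i32` can never alias in safe Rust,
// the compiler is free to keep `*x` in a register across both additions.
// A C compiler must assume the two pointers might alias unless the
// programmer promises otherwise with `restrict` -- a promise C never checks.
fn add_twice(total: &mut i32, x: &i32) {
    *total += *x;
    *total += *x;
}

fn main() {
    let mut sum = 10;
    let v = 5;
    add_twice(&mut sum, &v);
    println!("{}", sum); // 20

    // The aliasing case is rejected at compile time by the borrow checker:
    // add_twice(&mut sum, &sum);
    // error[E0502]: cannot borrow `sum` as immutable because it is
    //               also borrowed as mutable
}
```

In C, passing the same address for both parameters of a `restrict`-annotated function is undefined behavior that the compiler silently trusts you to avoid; in Rust, the equivalent call simply does not compile.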

Monomorphization

Rust's generics are implemented through a mechanism called monomorphization. This means that for each concrete type that a generic function or type is actually used with, separate code is generated at compile time.

fn max<T: PartialOrd>(a: T, b: T) -> T {
    if a >= b { a } else { b }
}

fn main() {
    max(1_i32, 2_i32);       // max::<i32> is generated
    max(1.0_f64, 2.0_f64);   // max::<f64> is generated
    max("a", "b");            // max::<&str> is generated
}

The compiler generates three independent functions: max::<i32>, max::<f64>, max::<&str>. Each function is fully optimized for the concrete type, so there's no runtime overhead from using generics.

This is a fundamentally different approach from Java's generics (runtime casting via type erasure) or Go's generics (partial monomorphization via GC shape stenciling).

The tradeoff of monomorphization is increased compile time and binary size. However, the runtime performance cost is zero.

Static Dispatch vs Dynamic Dispatch

Rust's traits support both static and dynamic dispatch:

trait Drawable {
    fn draw(&self);
}

// Static dispatch: concrete type is known at compile time
fn draw_static(item: &impl Drawable) {
    item.draw();
}

// Dynamic dispatch: method is called through a vtable at runtime
fn draw_dynamic(item: &dyn Drawable) {
    item.draw();
}

With impl Trait static dispatch, monomorphization determines the concrete function call at compile time, eliminating virtual function table (vtable) overhead. With dyn Trait dynamic dispatch, calls go indirectly through a vtable, at a cost comparable to a C++ virtual function call.

The key point is that the programmer explicitly chooses which to use. In Java and C#, virtual method calls are the default, and optimization is delegated to the JIT compiler's devirtualization. In Rust, the decision is in the programmer's hands: choose static dispatch when performance matters, and dynamic dispatch when flexibility is needed.
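A typical case where dynamic dispatch is the right choice is a heterogeneous collection, where the concrete type is only known at runtime. A minimal sketch (the `Shape` trait and the `Circle`/`Square` types are illustrative stand-ins for `Drawable` above):

```rust
// Illustrative trait with a return value, so the dispatch is observable.
trait Shape {
    fn area(&self) -> f64;
}

struct Circle { r: f64 }
struct Square { s: f64 }

impl Shape for Circle {
    fn area(&self) -> f64 { std::f64::consts::PI * self.r * self.r }
}
impl Shape for Square {
    fn area(&self) -> f64 { self.s * self.s }
}

// Static dispatch: one specialized copy per concrete type (monomorphized).
fn area_static(shape: &impl Shape) -> f64 {
    shape.area()
}

// Dynamic dispatch: one function; each call resolves through a vtable.
// Required here, because the slice mixes concrete types.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![
        Box::new(Circle { r: 1.0 }),
        Box::new(Square { s: 2.0 }),
    ];
    println!("total area: {}", total_area(&shapes));
    println!("one square: {}", area_static(&Square { s: 3.0 }));
}
```

`area_static` cannot express the mixed `Vec` above, and `total_area` pays the vtable indirection even when called with a single known type: the tradeoff is visible in the signatures, which is exactly the explicitness the text describes.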

Memory Layout Control

Rust lets programmers control data memory layout:

// Field ordering is optimized by the compiler (padding minimization)
struct Optimized {
    a: u8,
    b: u64,
    c: u8,
}

// Force C-compatible layout
#[repr(C)]
struct CCompatible {
    a: u8,
    b: u64,
    c: u8,
}

// Specify a particular alignment
#[repr(align(64))]
struct CacheAligned {
    data: [u8; 64],
}

By default, Rust's compiler is free to reorder fields to minimize padding. #[repr(C)] forces C-compatible layout, making it safe for FFI use. #[repr(align(N))] controls alignment, enabling placement aligned to cache line boundaries.
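The layout differences can be checked with std::mem::size_of. A sketch using the structs above; note that the language only guarantees the sizes for #[repr(C)] and #[repr(align)], while the default-layout size is unspecified (16 bytes on typical 64-bit targets):

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
struct Optimized { a: u8, b: u64, c: u8 }

#[allow(dead_code)]
#[repr(C)]
struct CCompatible { a: u8, b: u64, c: u8 }

#[allow(dead_code)]
#[repr(align(64))]
struct CacheAligned { data: [u8; 64] }

fn main() {
    // #[repr(C)] is fully specified (on a typical 64-bit target):
    // `a` at offset 0, 7 bytes padding, `b` at 8, `c` at 16,
    // then tail padding up to the 8-byte alignment => 24 bytes.
    println!("CCompatible:  {}", size_of::<CCompatible>()); // 24

    // The default layout may reorder fields (e.g. b, a, c),
    // typically shrinking the struct to 16 bytes.
    println!("Optimized:    {}", size_of::<Optimized>());

    // align(64) rounds both alignment and size up to 64.
    println!("CacheAligned: {} (align {})",
             size_of::<CacheAligned>(), align_of::<CacheAligned>());
}
```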

Compile-Time Computation

Rust's const fn makes functions evaluable at compile time:

const fn factorial(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        _ => n * factorial(n - 1),
    }
}

// Computed at compile time
const FACT_10: u64 = factorial(10);

fn main() {
    // FACT_10 is just a constant at runtime
    println!("{}", FACT_10); // 3628800
}

Functions defined with const fn are evaluated at compile time when called in a const context (constant definitions, array length specifications, etc.). The runtime cost is zero.

Rust's const fn capabilities have been expanding with each version, with loops, conditionals, reference taking, and many other operations now executable at compile time.
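As a sketch of those expanded capabilities, the const fn below uses a while loop and feeds its result into an array length, another context that exists only at compile time (the names `sum_to`, `SUM_100`, and `BUFFER` are illustrative):

```rust
// A const fn containing a `while` loop -- fully evaluated at compile time
// when called in a const context (stable since Rust 1.46).
const fn sum_to(n: u64) -> u64 {
    let mut total = 0;
    let mut i = 1;
    while i <= n {
        total += i;
        i += 1;
    }
    total
}

// Evaluated during compilation; just a constant at runtime.
const SUM_100: u64 = sum_to(100);

// const fn results can also size arrays.
const N: usize = sum_to(4) as usize; // 1 + 2 + 3 + 4 = 10
const BUFFER: [u8; N] = [0; N];

fn main() {
    println!("{}", SUM_100);      // 5050
    println!("{}", BUFFER.len()); // 10
}
```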

Summary

Rust's performance isn't the result of a single technical factor, but is achieved through the accumulation of multiple design choices:

  • No GC: predictable latency, low memory usage
  • RAII via ownership: deterministic resource deallocation
  • Monomorphization: zero-cost generics
  • LLVM backend: state-of-the-art optimization
  • &mut exclusivity: aggressive alias optimization
  • Static/dynamic dispatch choice: programmer-controlled costs
  • Memory layout control: cache efficiency optimization
  • const fn: compile-time computation

Combined, these allow Rust to achieve execution performance on par with C/C++ while using high-level abstractions. There's no need to sacrifice speed for safety -- that's Rust's answer on performance.
