Wonderful Rust

The Borrow Checker

Ownership: The Foundation of Rust

In Bitterless Rust, we described ownership loosely as "dynamic stuff disappears when you pass it." Now let's define it precisely.

Rust's ownership system is defined by three rules:

  1. Every value in Rust has an owner (typically the variable that holds it)
  2. When the owner goes out of scope, the value is dropped
  3. Ownership can be moved, but a value can never have multiple owners at the same time

fn main() {
    let s1 = String::from("hello");  // s1 is the owner
    let s2 = s1;                     // Ownership moves from s1 to s2
    // println!("{}", s1);           // Compile error: s1 is no longer valid
    println!("{}", s2);              // OK: s2 is the owner
}  // <- s2 goes out of scope, and String's memory is freed

This is called "move semantics." Types that don't implement the Copy trait (String, Vec<T>, Box<T>, etc. -- types that hold data on the heap) have their ownership moved on assignment or when passed to functions.
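By contrast, types that implement Copy (integers, floats, bool, char, shared references, and so on) are duplicated on assignment, so the original variable stays usable. For a heap-owning type like String, you ask for a copy explicitly with .clone(). A minimal sketch; the function names take_ownership and double are illustrative only:

```rust
// Taking a String by value moves ownership into the function.
fn take_ownership(s: String) -> usize {
    s.len()
} // s is dropped here

// Taking an i32 by value copies it; the caller's variable stays valid.
fn double(n: i32) -> i32 {
    n * 2
}

fn main() {
    let x = 5;
    let y = x; // i32 implements Copy: x is copied, not moved
    println!("x = {}, y = {}, doubled = {}", x, y, double(x)); // x is still usable

    let s1 = String::from("hello");
    let s2 = s1.clone(); // explicit deep copy of the heap data
    let len = take_ownership(s1); // s1 is moved into the function
    // println!("{}", s1); // Compile error: s1 was moved
    println!("s2 = {}, len = {}", s2, len); // s2 is an independent copy
}
```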

Borrowing

When you want to use a value without moving ownership, you take a reference. This is called borrowing:

fn calculate_length(s: &String) -> usize {
    s.len()
}

fn main() {
    let s = String::from("hello");
    let len = calculate_length(&s);  // Borrow s (ownership doesn't move)
    println!("'{}' has length {}", s, len);  // s is still usable
}

&s creates an immutable reference to s. calculate_length doesn't take ownership of s -- it just borrows it temporarily.
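To let a function modify a borrowed value, pass a mutable reference (&mut T) instead; the variable itself must also be declared mut. A short sketch, where append_world is an illustrative name:

```rust
// Takes a mutable borrow: the function can modify the String but does not own it.
fn append_world(s: &mut String) {
    s.push_str(" world");
}

fn main() {
    let mut s = String::from("hello"); // must be declared mut to be mutably borrowed
    append_world(&mut s);              // lend s mutably for the duration of the call
    println!("{}", s);                 // prints "hello world"; s was never moved
}
```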

Borrowing Rules

Rust's borrow checker enforces these rules at compile time:

  1. At any given time, you can have either one mutable reference (&mut T) or any number of immutable references (&T), but not both
  2. References must always be valid: a reference can never outlive the value it points to (no dangling references)

fn main() {
    let mut s = String::from("hello");

    let r1 = &s;      // OK: first immutable reference
    let r2 = &s;      // OK: second immutable reference (multiple immutable refs can coexist)
    println!("{} {}", r1, r2);

    let r3 = &mut s;  // OK: r1, r2 are no longer used, so we can take a mutable reference
    r3.push_str(" world");
    println!("{}", r3);
}

In contrast, this version does not compile:

fn main() {
    let mut s = String::from("hello");

    let r1 = &s;
    let r2 = &mut s;  // Compile error: r1 is still used below, so the immutable and mutable borrows overlap
    println!("{}", r1);
}

Bugs these rules prevent:

  • Data Races: two threads accessing the same data at the same time, at least one of them writing, without synchronization
  • Iterator Invalidation: modifying a collection while iterating over it
  • Use-After-Free: accessing memory after it has been freed

These are among the most common and most severe bugs in C/C++, and in safe Rust (code without unsafe blocks) the compiler rules out all of them at compile time.
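Iterator invalidation makes a good illustration. A for loop over a Vec holds an immutable borrow of it, so calling a &mut self method such as remove inside the loop is rejected. The idiomatic alternative is a method that iterates and mutates in one controlled place, such as Vec::retain. A minimal sketch; remove_evens is a hypothetical helper:

```rust
// Removes every even number from the vector in place.
fn remove_evens(v: &mut Vec<i32>) {
    // Rejected by the borrow checker: the loop borrows `v` immutably,
    // so `v.remove(i)` (which needs `&mut v`) cannot be called inside it.
    //
    // for (i, x) in v.iter().enumerate() {
    //     if x % 2 == 0 {
    //         v.remove(i);
    //     }
    // }

    // retain keeps only the elements for which the closure returns true.
    v.retain(|x| x % 2 != 0);
}

fn main() {
    let mut v = vec![1, 2, 3, 4, 5, 6];
    remove_evens(&mut v);
    println!("{:?}", v); // [1, 3, 5]
}
```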

Lifetimes

Lifetimes are a mechanism for telling the compiler how long a reference remains valid. In many cases, the compiler infers lifetimes automatically (lifetime elision rules), but explicit annotations are sometimes needed.

Why Lifetimes Are Needed

// This function returns one of two references
// Which reference's lifetime does the return value correspond to?
fn longer(a: &str, b: &str) -> &str {
    if a.len() > b.len() { a } else { b }
}
// Compile error: missing lifetime specifier

The compiler can't determine whether the return reference is tied to a's or b's lifetime. We make it explicit with lifetime annotations:

fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() > b.len() { a } else { b }
}

'a is a lifetime parameter. This annotation means "a and b are both valid for at least 'a, and the returned reference is valid for 'a." At the call site, the compiler chooses 'a to be no longer than the shorter of the two borrows, so the result cannot be used once either argument is gone.

fn main() {
    let s1 = String::from("long string");

    {
        let s2 = String::from("xyz");
        let result = longer(&s1, &s2);
        println!("{}", result);  // OK: result is used within s2's scope
    }

    // s2 doesn't exist here, so trying to use
    // the result of longer(&s1, &s2) here would be a compile error
}

Lifetime Elision Rules

In many cases, lifetime annotations can be omitted. The compiler infers them with these rules:

  1. Each elided lifetime in the parameters gets its own distinct lifetime parameter
  2. If there's exactly one input lifetime, it is assigned to all elided output lifetimes
  3. If there are multiple input lifetimes but one of them is &self or &mut self (a method), self's lifetime is assigned to all elided output lifetimes
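
// Elided form
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// Expanded form (what the compiler infers); renamed only so both versions can coexist
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}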
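Rule 3 is what lets most methods skip annotations even when they take several reference parameters. In this sketch (the Config type and its get method are hypothetical), get has two input lifetimes, &self and key, so rule 2 doesn't apply; rule 3 ties the output to self. As a result, the returned value stays usable after key is dropped:

```rust
struct Config {
    entries: Vec<(String, String)>,
}

impl Config {
    // Elided:            fn get(&self, key: &str) -> Option<&str>
    // Expanded (rule 3): fn get<'a, 'b>(&'a self, key: &'b str) -> Option<&'a str>
    fn get(&self, key: &str) -> Option<&str> {
        self.entries
            .iter()
            .find(|(k, _)| k.as_str() == key)
            .map(|(_, v)| v.as_str())
    }
}

fn main() {
    let config = Config {
        entries: vec![(String::from("name"), String::from("ferris"))],
    };
    let value;
    {
        let key = String::from("name"); // key is dropped at the end of this block
        value = config.get(&key);
    }
    // OK: value borrows from config (rule 3), not from key
    println!("{:?}", value); // Some("ferris")
}
```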

Lifetimes in Structs

Structs that hold references as fields require lifetime parameters:

struct Excerpt<'a> {
    text: &'a str,
}

fn main() {
    let novel = String::from("Call me Ishmael. Some years ago...");
    let first_sentence = novel.split('.').next().unwrap();

    let excerpt = Excerpt { text: first_sentence };
    println!("{}", excerpt.text);
}
// excerpt can't outlive novel (guaranteed by lifetimes)
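Methods on a struct with a lifetime parameter declare that lifetime on the impl block too. In this sketch (word_count and text are hypothetical methods), text returns &'a str explicitly rather than relying on elision, which ties the result to the original string (novel) instead of to the borrow of the Excerpt itself:

```rust
struct Excerpt<'a> {
    text: &'a str,
}

// The lifetime is declared after `impl` and used in the type name.
impl<'a> Excerpt<'a> {
    // No reference in the output, so no lifetime is involved.
    fn word_count(&self) -> usize {
        self.text.split_whitespace().count()
    }

    // Returning &'a str (not the elided &self lifetime) lets the result
    // outlive this Excerpt value, as long as the underlying text lives.
    fn text(&self) -> &'a str {
        self.text
    }
}

fn main() {
    let novel = String::from("Call me Ishmael. Some years ago...");
    let first;
    {
        let excerpt = Excerpt { text: novel.split('.').next().unwrap() };
        println!("{} words", excerpt.word_count()); // 3 words
        first = excerpt.text(); // borrows from novel, not from excerpt
    } // excerpt is dropped here
    println!("{}", first); // Call me Ishmael
}
```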

The 'static Lifetime

'static is a lifetime that is valid for the entire duration of the program:

// 'static bound: the value contains no references, or all contained references are 'static
fn spawn_thread<F: FnOnce() + Send + 'static>(f: F) -> std::thread::JoinHandle<()> {
    // f is passed to a thread that may run for an arbitrary duration
    std::thread::spawn(f)
}

fn main() {
    // String literals have the 'static lifetime
    let s: &'static str = "hello";
    spawn_thread(move || println!("{}", s)).join().unwrap();
}

'static doesn't mean "leaked memory" or "lives forever." As a bound (T: 'static), it means "can remain valid for as long as needed." Owned values that contain no non-'static references (String, Vec<i32>, etc.) satisfy the 'static bound because they own their data rather than borrowing it, so they can be held for any duration.
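The difference between a &'static reference and a T: 'static bound shows up with a generic function. In this sketch (require_static is a hypothetical helper), a String built at runtime passes the bound because it borrows nothing, while a reference to a local variable would not:

```rust
// Accepts any value that satisfies the 'static bound and hands it back.
fn require_static<T: 'static>(value: T) -> T {
    value
}

fn main() {
    // An owned String created at runtime: it borrows nothing, so it is 'static.
    let owned = require_static(String::from("built at runtime"));

    // A string literal: &'static str, valid for the whole program.
    let literal = require_static("hello");

    let local = String::from("local");
    let borrowed: &str = &local;
    // require_static(borrowed); // Compile error: `local` does not live long enough
    println!("{} / {} / {}", owned, literal, borrowed);
}
```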

Generic Lifetimes

Lifetime parameters can be declared alongside type parameters:

use std::fmt::Display;

fn longest_with_announcement<'a, T>(
    x: &'a str,
    y: &'a str,
    ann: T,
) -> &'a str
where
    T: Display,
{
    println!("Announcement: {}", ann);
    if x.len() > y.len() { x } else { y }
}

The lifetime parameter 'a and the type parameter T are declared in the same angle brackets (lifetimes first). Like the trait bounds from the generics chapter, lifetimes are part of the type system and are checked the same way.

Multiple Lifetime Parameters

fn first_of_second<'a, 'b>(first: &'a str, second: &'b str) -> &'a str {
    // 'a and 'b are independent lifetimes
    // The return value has the same lifetime as first
    first
}
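Because 'b is independent of 'a, the caller can drop whatever second borrows from while still using the result; with a single shared lifetime, as in longer, the same code would be rejected. A short usage sketch, repeating the function (with the unused parameter renamed _second) so the block compiles on its own:

```rust
// The return value is tied only to `first`'s lifetime 'a.
fn first_of_second<'a, 'b>(first: &'a str, _second: &'b str) -> &'a str {
    first
}

fn main() {
    let a = String::from("kept");
    let result;
    {
        let b = String::from("temporary"); // b lives only in this block
        result = first_of_second(&a, &b);
    } // b is dropped here
    // OK: result borrows from a only, so it is still valid
    println!("{}", result); // kept
}
```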

Lifetime Bounds

// 'b is at least as long as 'a
fn example<'a, 'b: 'a>(x: &'a str, y: &'b str) -> &'a str {
    if x.len() > y.len() { x } else { y }
}

'b: 'a means "'b outlives 'a." This lets you precisely express relationships between lifetimes.

Concurrency and Ownership: Send and Sync

Rust's ownership system also guarantees safety in concurrent programming. This is supported by two marker traits, Send and Sync:

  • Send: Values of this type can be transferred (moved) to another thread
  • Sync: References (&T) to this type can be safely accessed from multiple threads (i.e., &T is Send)

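Send and Sync are auto traits: the compiler implements them automatically for any type whose components are all Send (or all Sync), so most types get both for free. Types that aren't thread-safe, such as Rc<T> (whose reference count is not updated atomically), are excluded, and the compiler refuses to move or share them across threads.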
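Because Send and Sync are ordinary traits, you can check them with a trait bound. A common trick, shown here with hypothetical helpers assert_send and assert_sync (they are not std functions), is an empty generic function that only compiles when the bound holds:

```rust
use std::sync::{Arc, Mutex};

// These do nothing at runtime; a call only type-checks if T implements the trait.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

fn main() {
    assert_send::<String>();
    assert_sync::<Vec<i32>>();
    assert_send::<Arc<Mutex<i32>>>();
    assert_sync::<Arc<Mutex<i32>>>();

    // Cell<T> can be moved to another thread but not shared between threads:
    assert_send::<std::cell::Cell<i32>>();
    // assert_sync::<std::cell::Cell<i32>>(); // Compile error: Cell<i32> is not Sync

    // assert_send::<std::rc::Rc<i32>>(); // Compile error: Rc<i32> cannot be sent between threads safely
    println!("all checks passed at compile time");
}
```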

use std::thread;

fn main() {
    let data = vec![1, 2, 3];

    // data's ownership moves to the thread
    let handle = thread::spawn(move || {
        println!("{:?}", data);
    });

    // println!("{:?}", data);  // Compile error: ownership has been moved
    handle.join().unwrap();
}

Arc and Mutex

When sharing data across multiple threads, use Arc (Atomic Reference Counted) and Mutex:

Arc: Thread-Safe Reference Counting

use std::sync::Arc;
use std::thread;

fn main() {
    let data = Arc::new(vec![1, 2, 3, 4, 5]);

    let mut handles = vec![];

    for i in 0..3 {
        let data = Arc::clone(&data);  // Increments reference count (doesn't copy data)
        handles.push(thread::spawn(move || {
            println!("Thread {}: {:?}", i, data);
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }
}

Arc is a thread-safe reference-counted smart pointer. Arc::clone() only atomically increments the reference count -- it doesn't copy the inner data. When the reference count reaches zero, the data is freed.
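You can observe the "count, not copy" behavior directly: Arc::strong_count reports how many Arc handles exist, and Arc::ptr_eq confirms that two handles point to the same allocation. A minimal sketch:

```rust
use std::sync::Arc;

fn main() {
    let a = Arc::new(vec![1, 2, 3]);
    println!("count = {}", Arc::strong_count(&a)); // count = 1

    let b = Arc::clone(&a); // atomically increments the count
    println!("count = {}", Arc::strong_count(&a)); // count = 2
    println!("same allocation: {}", Arc::ptr_eq(&a, &b)); // same allocation: true

    drop(b); // decrements the count; the Vec is still alive
    println!("count = {}", Arc::strong_count(&a)); // count = 1
} // the last Arc is dropped here, the count reaches zero, and the Vec is freed
```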

Use Rc<T> for single-threaded code and Arc<T> for multithreaded code. Rc<T> doesn't implement Send, so attempting to move it into a spawned thread is a compile error rather than a runtime failure.

Mutex: Exclusive Access

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0));

    let mut handles = vec![];

    for _ in 0..10 {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            let mut num = counter.lock().unwrap();
            *num += 1;
            // <- When the lock's scope ends, it's automatically unlocked (Drop)
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Result: {}", *counter.lock().unwrap()); // 10
}

Mutex<T> provides exclusive locking. .lock() acquires the lock, and the returned MutexGuard automatically unlocks when it goes out of scope (RAII via the Drop trait).
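Because unlocking is tied to the guard's lifetime, you control how long the lock is held by controlling the guard's scope: end it with an inner block, or release it early with drop(guard). A sketch, where add_and_report is a hypothetical function:

```rust
use std::sync::Mutex;

// Adds `amount` under the lock and returns the new value, holding the lock briefly.
fn add_and_report(counter: &Mutex<i32>, amount: i32) -> i32 {
    let mut guard = counter.lock().unwrap(); // lock acquired
    *guard += amount;
    let new_value = *guard;
    drop(guard); // lock released explicitly, before the function ends

    // Slower work here (logging, I/O, ...) no longer blocks other threads.
    // With the guard still alive, a second lock() on this thread would deadlock or panic.
    println!("counter is now {}", new_value);
    new_value
}

fn main() {
    let counter = Mutex::new(0);
    {
        let guard = counter.lock().unwrap();
        println!("initial value: {}", *guard);
    } // guard goes out of scope here: lock released
    add_and_report(&counter, 5);
    println!("final: {}", *counter.lock().unwrap());
}
```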

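Crucially, Arc<T> only hands out shared references (&T), so trying to mutate the data inside it without a synchronization type such as Mutex<T> (or RwLock<T>, or an atomic type) is a compile error. Rust's type system enforces at compile time that mutating shared data requires synchronization.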

RwLock: Read-Write Lock

use std::sync::RwLock;

fn main() {
    let lock = RwLock::new(5);

    // Multiple read locks can be acquired simultaneously
    {
        let r1 = lock.read().unwrap();
        let r2 = lock.read().unwrap();
        println!("{} {}", r1, r2);
    } // both read guards are dropped here

    // Write locks are exclusive
    {
        let mut w = lock.write().unwrap();
        *w += 1;
    } // write guard dropped here
}

RwLock can be understood as a runtime counterpart to the borrow checker's rule of "multiple immutable references or one mutable reference": the same rule, enforced by locking while the program runs instead of by the compiler.
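In multithreaded code, RwLock<T> is usually wrapped in Arc just like Mutex<T>. A sketch assuming several reader threads followed by one writer; the function run is hypothetical:

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Spawns `readers` threads that read the value concurrently, then performs one write.
fn run(readers: usize) -> String {
    let config = Arc::new(RwLock::new(String::from("v1")));

    let mut handles = Vec::new();
    for i in 0..readers {
        let config = Arc::clone(&config);
        handles.push(thread::spawn(move || {
            let value = config.read().unwrap(); // shared lock: readers don't block each other
            println!("reader {} sees {}", i, *value);
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }

    {
        let mut value = config.write().unwrap(); // exclusive lock
        value.push_str("-updated");
    } // write lock released

    // Bind to a local so the read guard is dropped before `config` goes out of scope.
    let result = config.read().unwrap().clone();
    result
}

fn main() {
    println!("{}", run(3)); // v1-updated
}
```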

The Value of the Borrow Checker

The borrow checker is known as the biggest barrier to learning Rust. "Fighting with the compiler," "getting yelled at by the borrow checker" -- these experiences are a rite of passage for Rust programmers.

However, considering the categories of bugs the borrow checker eliminates, the cost is well justified:

Bug category             Handling in C/C++                               Handling in safe Rust
Use-After-Free           Undefined behavior at runtime                   Compile error
Dangling pointers        Crash at runtime (if you're lucky)              Compile error
Data races               Found at runtime by tools like ThreadSanitizer  Compile error
Double free              Crash or heap corruption at runtime             Compile error
Iterator invalidation    Undefined behavior at runtime                   Compile error

In safe Rust, all of these are caught at compile time rather than at runtime. Instead of discovering these bugs in production, you find them as soon as you compile.

The borrow checker isn't just a constraint: it's a system that automatically proves specific properties of your code. It does not prove that the program's logic is correct, but safe Rust code that compiles is structurally guaranteed to be free of the bug categories above.
