Bitterless Rust

Practice: Operator Precedence and Finishing Up

In the previous chapter, 2 + 3 * 4 = 20 -- a sad result. It should be 14. Let's fix it.

Why It's Wrong

The current parse treats all operators with the same precedence, left to right:

2 + 3 * 4
-> (2 + 3) * 4
-> 20

* should bind tighter than +:

2 + 3 * 4
-> 2 + (3 * 4)
-> 14

The Fix: Split Parsing into Two Levels

Lower-precedence operators are handled in the "upper layer":

parse (entry point)
  -> parse_add_sub (handles + and -. Precedence: low)
       -> parse_mul_div (handles * and /. Precedence: high)
            -> parse_primary (handles numbers)

Because parse_mul_div is called first, * and / get grouped first. This is the basic idea behind a recursive descent parser.

Split parse

Turn the current parse into parse_add_sub and parse_mul_div:

fn parse(&mut self) -> Expr {
    self.parse_add_sub()
}

// Handles + and - (precedence: low)
fn parse_add_sub(&mut self) -> Expr {
    let mut left = self.parse_mul_div(); // <- parse_mul_div, not parse_primary

    loop {
        match self.peek() {
            Some(Token::Plus) | Some(Token::Minus) => {
                let op = self.next_token().unwrap();
                let right = self.parse_mul_div(); // <- here too
                left = Expr::BinOp {
                    op,
                    left: Box::new(left),
                    right: Box::new(right),
                };
            }
            _ => break,
        }
    }

    left
}

// Handles * and / (precedence: high)
fn parse_mul_div(&mut self) -> Expr {
    let mut left = self.parse_primary();

    loop {
        match self.peek() {
            Some(Token::Star) | Some(Token::Slash) => {
                let op = self.next_token().unwrap();
                let right = self.parse_primary();
                left = Expr::BinOp {
                    op,
                    left: Box::new(left),
                    right: Box::new(right),
                };
            }
            _ => break,
        }
    }

    left
}

The structure of parse_add_sub and parse_mul_div is almost identical. The differences are:

  • Which operators they handle (+/- vs *//)
  • Where sub-expressions are parsed from (parse_mul_div vs parse_primary)

Let's trace how 2 + 3 * 4 gets parsed:

  1. parse_add_sub starts
  2. Calls parse_mul_div -> parse_primary -> Number(2) becomes left
  3. peek is Plus -> not * or /, so parse_mul_div returns Number(2)
  4. Back in parse_add_sub. peek is Plus -> enter the loop
  5. Consume Plus, call parse_mul_div
  6. parse_primary -> Number(3) becomes left
  7. peek is Star -> enter the loop! Consume Star, parse_primary -> Number(4)
  8. left = BinOp { Star, 3, 4 } -> parse_mul_div returns
  9. Back in parse_add_sub. left = BinOp { Plus, 2, BinOp { Star, 3, 4 } }

Star gets grouped first.

Add Parentheses

While we're at it, let's add parentheses. For when you want to change precedence with (1 + 2) * 3.

Add to Token:

#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Plus,
    Minus,
    Star,
    Slash,
    LParen,     // <- added
    RParen,     // <- added
}

Add to Lexer:

'(' => {
    tokens.push(Token::LParen);
    self.pos += 1;
}
')' => {
    tokens.push(Token::RParen);
    self.pos += 1;
}

Extend parse_primary:

fn parse_primary(&mut self) -> Expr {
    match self.next_token() {
        Some(Token::Number(n)) => Expr::Number(n),
        Some(Token::LParen) => {
            // Parse the contents starting from the lowest precedence
            let expr = self.parse_add_sub();
            // Consume the closing paren
            match self.next_token() {
                Some(Token::RParen) => {}
                other => panic!("expected ')', got {:?}", other),
            }
            expr
        }
        other => panic!("expected number or '(', got {:?}", other),
    }
}

When we see (, we call parse_add_sub() again. The contents of the parentheses are parsed from the lowest precedence level, so nested parentheses work correctly too.

Try It Out

fn main() {
    let tests = vec![
        ("2 + 3 * 4", 14.0),
        ("(2 + 3) * 4", 20.0),
        ("1 + 2 * 3", 7.0),
        ("10 - 2 - 3", 5.0),
        ("2 * 3 + 4 * 5", 26.0),
        ("100 / (5 + 5)", 10.0),
        ("(1 + 2) * (3 + 4)", 21.0),
        ("3.14 * 2", 6.28),
    ];

    for (input, expected) in tests {
        let mut lexer = Lexer::new(input.to_string());
        let tokens = lexer.tokenize();
        let mut parser = Parser::new(tokens);
        let ast = parser.parse();
        let result = eval(ast);

        let status = if (result - expected).abs() < 0.0001 { "OK" } else { "NG" };
        println!("[{}] {} = {}", status, input, result);
    }
}
[OK] 2 + 3 * 4 = 14
[OK] (2 + 3) * 4 = 20
[OK] 1 + 2 * 3 = 7
[OK] 10 - 2 - 3 = 5
[OK] 2 * 3 + 4 * 5 = 26
[OK] 100 / (5 + 5) = 10
[OK] (1 + 2) * (3 + 4) = 21
[OK] 3.14 * 2 = 6.28

All OK.

Build a REPL

Let's make it interactive:

use std::io;
use std::io::Write;

fn main() {
    println!("Bitterless Calc v0.1");
    println!("Type 'exit' to quit.");
    println!();

    loop {
        print!("> ");
        io::stdout().flush().unwrap();

        let mut input = String::new();
        io::stdin().read_line(&mut input).unwrap();
        let input = input.trim().to_string();

        if input == "exit" {
            break;
        }

        if input.is_empty() {
            continue;
        }

        let mut lexer = Lexer::new(input);
        let tokens = lexer.tokenize();
        let mut parser = Parser::new(tokens);
        let ast = parser.parse();
        let result = eval(ast);

        println!("{}", result);
    }
}
cargo run
Bitterless Calc v0.1
Type 'exit' to quit.

> 1 + 2 * 3
7
> (1 + 2) * 3
9
> 100 / 3
33.333333333333336
> exit

New elements:

  • use std::io -- declares we're using the standard library's I/O module
  • io::stdout().flush().unwrap() -- print! doesn't add a newline, so we flush to ensure output appears
  • io::stdin().read_line(&mut input) -- reads keyboard input into input
  • .trim() -- strips leading/trailing whitespace and newlines

&mut input -- &mut shows up here. read_line declares "I will modify this variable's contents." For now, just treat it as boilerplate.

Final Complete Code

The finished calculator, built over 5 chapters:

use std::io;
use std::io::Write;

// ---- Token ----

#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Plus,
    Minus,
    Star,
    Slash,
    LParen,
    RParen,
}

// ---- Lexer ----

struct Lexer {
    input: Vec<char>,
    pos: usize,
}

impl Lexer {
    fn new(input: String) -> Lexer {
        Lexer {
            input: input.chars().collect(),
            pos: 0,
        }
    }

    fn tokenize(&mut self) -> Vec<Token> {
        let mut tokens = Vec::new();

        while self.pos < self.input.len() {
            let ch = self.input[self.pos];

            match ch {
                ' ' | '\t' => {
                    self.pos += 1;
                }
                '+' => {
                    tokens.push(Token::Plus);
                    self.pos += 1;
                }
                '-' => {
                    tokens.push(Token::Minus);
                    self.pos += 1;
                }
                '*' => {
                    tokens.push(Token::Star);
                    self.pos += 1;
                }
                '/' => {
                    tokens.push(Token::Slash);
                    self.pos += 1;
                }
                '(' => {
                    tokens.push(Token::LParen);
                    self.pos += 1;
                }
                ')' => {
                    tokens.push(Token::RParen);
                    self.pos += 1;
                }
                '0'..='9' => {
                    let token = self.read_number();
                    tokens.push(token);
                }
                _ => {
                    self.pos += 1;
                }
            }
        }

        tokens
    }

    fn read_number(&mut self) -> Token {
        let start = self.pos;

        while self.pos < self.input.len()
            && (self.input[self.pos].is_ascii_digit() || self.input[self.pos] == '.')
        {
            self.pos += 1;
        }

        let num_str: String = self.input[start..self.pos].iter().collect();
        let num: f64 = num_str.parse().unwrap();

        Token::Number(num)
    }
}

// ---- AST ----

#[derive(Debug, Clone)]
enum Expr {
    Number(f64),
    BinOp {
        op: Token,
        left: Box<Expr>,
        right: Box<Expr>,
    },
}

// ---- Parser ----

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Parser {
        Parser { tokens, pos: 0 }
    }

    fn peek(&self) -> Option<Token> {
        if self.pos < self.tokens.len() {
            Some(self.tokens[self.pos].clone())
        } else {
            None
        }
    }

    fn next_token(&mut self) -> Option<Token> {
        if self.pos < self.tokens.len() {
            let token = self.tokens[self.pos].clone();
            self.pos += 1;
            Some(token)
        } else {
            None
        }
    }

    fn parse(&mut self) -> Expr {
        self.parse_add_sub()
    }

    fn parse_add_sub(&mut self) -> Expr {
        let mut left = self.parse_mul_div();

        loop {
            match self.peek() {
                Some(Token::Plus) | Some(Token::Minus) => {
                    let op = self.next_token().unwrap();
                    let right = self.parse_mul_div();
                    left = Expr::BinOp {
                        op,
                        left: Box::new(left),
                        right: Box::new(right),
                    };
                }
                _ => break,
            }
        }

        left
    }

    fn parse_mul_div(&mut self) -> Expr {
        let mut left = self.parse_primary();

        loop {
            match self.peek() {
                Some(Token::Star) | Some(Token::Slash) => {
                    let op = self.next_token().unwrap();
                    let right = self.parse_primary();
                    left = Expr::BinOp {
                        op,
                        left: Box::new(left),
                        right: Box::new(right),
                    };
                }
                _ => break,
            }
        }

        left
    }

    fn parse_primary(&mut self) -> Expr {
        match self.next_token() {
            Some(Token::Number(n)) => Expr::Number(n),
            Some(Token::LParen) => {
                let expr = self.parse_add_sub();
                match self.next_token() {
                    Some(Token::RParen) => {}
                    other => panic!("expected ')', got {:?}", other),
                }
                expr
            }
            other => panic!("expected number or '(', got {:?}", other),
        }
    }
}

// ---- Eval ----

fn eval(expr: Expr) -> f64 {
    match expr {
        Expr::Number(n) => n,
        Expr::BinOp { op, left, right } => {
            let l = eval(*left);
            let r = eval(*right);
            match op {
                Token::Plus => l + r,
                Token::Minus => l - r,
                Token::Star => l * r,
                Token::Slash => l / r,
                _ => panic!("unknown operator: {:?}", op),
            }
        }
    }
}

// ---- main ----

fn main() {
    println!("Bitterless Calc v0.1");
    println!("Type 'exit' to quit.");
    println!();

    loop {
        print!("> ");
        io::stdout().flush().unwrap();

        let mut input = String::new();
        io::stdin().read_line(&mut input).unwrap();
        let input = input.trim().to_string();

        if input == "exit" {
            break;
        }

        if input.is_empty() {
            continue;
        }

        let mut lexer = Lexer::new(input);
        let tokens = lexer.tokenize();
        let mut parser = Parser::new(tokens);
        let ast = parser.parse();
        let result = eval(ast);

        println!("{}", result);
    }
}

Wrapping Up

We built a calculator over 5 chapters. Looking back:

Chapter What we added What we learned
8 Number parsing enum, struct, impl, match, Vec
9 Addition Box, Parser struct, recursion
10 Subtraction Getting used to the pattern
11 Multiplication & division Discovering the precedence problem
12 Precedence, parentheses, REPL Recursive descent parser, standard I/O

By taking it one step at a time, repeating the same pattern, you've naturally learned to write Rust code.

Next Steps

Some ideas to extend the project:

  • Unary minus: Handle -5 or -(1 + 2)
  • Variables: Define variables with x = 10 and use them in x + 5 (use HashMap)
  • Error handling: Replace panic! with Result to handle invalid input gracefully
  • Module splitting: Split Token, Lexer, Parser, and Eval into separate files

From here, read The Book to learn about ownership and lifetimes properly.

We've also prepared a sequel: Wonderfull Rust. It covers the "bitter" parts we intentionally skipped in Bitterless Rust -- ownership, lifetimes, the borrow checker, traits, macros, unsafe -- and explains precisely why these features are wonderful. If you want a conceptual understanding of what makes Rust's design excellent, give it a read.

We hope this Bitterless beginning leads to a deeper journey with Rust.

Related

Back to book