Bitterless Rust

Practice: Parse and Display a Number

From here on, it's hands-on. We'll use everything we've learned to incrementally build a calculator that can do basic arithmetic.

The end goal:

"1 + 2 * 3" -> [Lexer] -> token stream -> [Parser] -> AST -> [Eval] -> 7.0

But we're not building all of that at once. In this chapter, we'll start by reading a single number and displaying it.

Create the Project

cargo new calc
cd calc

From here, we'll be editing src/main.rs.

Step 1: Define Token

A lexer breaks a string into "tokens." Let's define the token types as an enum. For now, just numbers:

#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
}

Number(f64) -- a token representing a number with an f64 value inside.

Step 2: Build the Lexer

struct Lexer {
    input: Vec<char>,
    pos: usize,
}

Convert the string to Vec<char> and process one character at a time. pos tracks where we're reading.

Why Vec<char>?: Rust's String is stored as UTF-8 bytes, so it can't be indexed by character position -- input[0] on a String doesn't even compile, and byte-based slicing can land in the middle of a multi-byte character. Converting to Vec<char> up front lets you simply do input[0], input[1], etc.

impl Lexer {
    fn new(input: String) -> Lexer {
        Lexer {
            input: input.chars().collect(),
            pos: 0,
        }
    }

    fn tokenize(&mut self) -> Vec<Token> {
        let mut tokens = Vec::new();

        while self.pos < self.input.len() {
            let ch = self.input[self.pos];

            match ch {
                // Skip whitespace
                ' ' | '\t' => {
                    self.pos += 1;
                }

                // Digit -> read a number
                '0'..='9' => {
                    let token = self.read_number();
                    tokens.push(token);
                }

                // Skip everything else (for now)
                _ => {
                    self.pos += 1;
                }
            }
        }

        tokens
    }

    fn read_number(&mut self) -> Token {
        let start = self.pos;

        // Keep reading while we see digits or a decimal point
        while self.pos < self.input.len()
            && (self.input[self.pos].is_ascii_digit() || self.input[self.pos] == '.')
        {
            self.pos += 1;
        }

        // Collect the range into a string and convert to f64
        let num_str: String = self.input[start..self.pos].iter().collect();
        let num: f64 = num_str.parse().unwrap();

        Token::Number(num)
    }
}

Key points:

  • '0'..='9' matches characters from '0' to '9'
  • read_number keeps reading digits and decimal points, then converts to f64
  • .parse().unwrap() converts a string to a number and panics on failure. read_number only feeds it digits and dots, so the one input that can still trip it is a malformed number like "1.2.3" -- we'll live with that for now

Step 3: AST and Parsing

AST (Abstract Syntax Tree) represents the structure of an expression. Right now we only have numbers so it's barely a "tree," but we're laying the groundwork:

#[derive(Debug, Clone)]
enum Expr {
    Number(f64),
}

Parsing is dead simple for now -- just grab the first number (note that tokens[0] panics if the lexer produced no tokens):

fn parse(tokens: Vec<Token>) -> Expr {
    let token = tokens[0].clone();
    match token {
        Token::Number(n) => Expr::Number(n),
    }
}

Step 4: Eval

Takes an AST, returns the result. For now, just returns the number:

fn eval(expr: Expr) -> f64 {
    match expr {
        Expr::Number(n) => n,
    }
}

Step 5: Wire It All Together

fn main() {
    let input = String::from("42");

    let mut lexer = Lexer::new(input);
    let tokens = lexer.tokenize();
    println!("Tokens: {:?}", tokens);

    let ast = parse(tokens);
    println!("AST: {:?}", ast);

    let result = eval(ast);
    println!("Result: {}", result);
}

cargo run
Tokens: [Number(42.0)]
AST: Number(42.0)
Result: 42

Decimals work too:

fn main() {
    let input = String::from("3.14");

    let mut lexer = Lexer::new(input);
    let tokens = lexer.tokenize();
    let ast = parse(tokens);
    let result = eval(ast);

    println!("{}", result); // 3.14
}

Complete Code for This Chapter

#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
}

struct Lexer {
    input: Vec<char>,
    pos: usize,
}

impl Lexer {
    fn new(input: String) -> Lexer {
        Lexer {
            input: input.chars().collect(),
            pos: 0,
        }
    }

    fn tokenize(&mut self) -> Vec<Token> {
        let mut tokens = Vec::new();

        while self.pos < self.input.len() {
            let ch = self.input[self.pos];

            match ch {
                ' ' | '\t' => {
                    self.pos += 1;
                }
                '0'..='9' => {
                    let token = self.read_number();
                    tokens.push(token);
                }
                _ => {
                    self.pos += 1;
                }
            }
        }

        tokens
    }

    fn read_number(&mut self) -> Token {
        let start = self.pos;

        while self.pos < self.input.len()
            && (self.input[self.pos].is_ascii_digit() || self.input[self.pos] == '.')
        {
            self.pos += 1;
        }

        let num_str: String = self.input[start..self.pos].iter().collect();
        let num: f64 = num_str.parse().unwrap();

        Token::Number(num)
    }
}

#[derive(Debug, Clone)]
enum Expr {
    Number(f64),
}

fn parse(tokens: Vec<Token>) -> Expr {
    let token = tokens[0].clone();
    match token {
        Token::Number(n) => Expr::Number(n),
    }
}

fn eval(expr: Expr) -> f64 {
    match expr {
        Expr::Number(n) => n,
    }
}

fn main() {
    let input = String::from("42");

    let mut lexer = Lexer::new(input);
    let tokens = lexer.tokenize();
    println!("Tokens: {:?}", tokens);

    let ast = parse(tokens);
    println!("AST: {:?}", ast);

    let result = eval(ast);
    println!("Result: {}", result);
}

What we did is modest, but the Lexer -> Parser -> Eval pipeline skeleton is in place. Starting next chapter, we'll add operators to this skeleton.


Next chapter: we implement addition.
