Practice: Addition
In the previous chapter we parsed a single number. In this chapter, we implement addition.
"1 + 2" -> [Lexer] -> [Number(1), Plus, Number(2)] -> [Parser] -> AST -> [Eval] -> 3.0
Add Plus to Token
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Plus, // <- added
}
Recognize + in the Lexer
Add '+' to the match in tokenize:
fn tokenize(&mut self) -> Vec<Token> {
    let mut tokens = Vec::new();
    while self.pos < self.input.len() {
        let ch = self.input[self.pos];
        match ch {
            ' ' | '\t' => {
                self.pos += 1;
            }
            '+' => { // <- added
                tokens.push(Token::Plus);
                self.pos += 1;
            }
            '0'..='9' => {
                let token = self.read_number();
                tokens.push(token);
            }
            _ => {
                self.pos += 1;
            }
        }
    }
    tokens
}
Now "1 + 2" becomes [Number(1.0), Plus, Number(2.0)].
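To check that claim in isolation, here's a minimal sketch. It assumes a free function `tokenize` (a hypothetical, condensed stand-in for `Lexer::tokenize` with the number-reading step inlined) so the snippet compiles on its own. The rules are meant to match the ones above.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Plus,
}

// Hypothetical condensed stand-in for Lexer::tokenize (not the chapter's struct).
fn tokenize(input: &str) -> Vec<Token> {
    let chars: Vec<char> = input.chars().collect();
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < chars.len() {
        match chars[pos] {
            '+' => {
                tokens.push(Token::Plus);
                pos += 1;
            }
            '0'..='9' => {
                // Read digits and dots, then parse them as one f64.
                let start = pos;
                while pos < chars.len() && (chars[pos].is_ascii_digit() || chars[pos] == '.') {
                    pos += 1;
                }
                let text: String = chars[start..pos].iter().collect();
                tokens.push(Token::Number(text.parse().unwrap()));
            }
            // Whitespace and any unrecognized character are skipped.
            _ => pos += 1,
        }
    }
    tokens
}
```

With this sketch, `tokenize("1 + 2")` yields `[Number(1.0), Plus, Number(2.0)]`, and since whitespace is skipped, `tokenize("1+2")` yields the same tokens.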
Add Binary Operations to the AST
We need to represent "left + right." Extend Expr:
#[derive(Debug, Clone)]
enum Expr {
    Number(f64),
    BinOp { // <- added
        op: Token,
        left: Box<Expr>,
        right: Box<Expr>,
    },
}
BinOp stands for Binary Operation. left and right each hold a sub-expression.
What's Box
We want Expr to contain another Expr, but Rust needs to know data sizes at compile time. If Expr directly contains another Expr, the size would be infinite.
Box<Expr> means "put the Expr on the heap and just hold a pointer."
Oversimplification warning: Think of Box as "the thing you use for recursive data structures." Wrap a value with Box::new(value). When you use it, the contents are automatically accessible.
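Here's a small sketch of those two points: a Box is just a pointer-sized handle, and you can call methods on the contents straight through it. The cut-down `Expr` with an `Add` variant and the helper names are made up for this illustration only; they are not the chapter's types.

```rust
use std::mem::size_of;

// Hypothetical cut-down Expr, just for this illustration.
#[derive(Debug)]
enum Expr {
    Number(f64),
    Add(Box<Expr>, Box<Expr>),
}

impl Expr {
    fn is_number(&self) -> bool {
        matches!(self, Expr::Number(_))
    }
}

// A Box<Expr> is one pointer wide, no matter how deep the tree behind it is.
fn box_is_pointer_sized() -> bool {
    size_of::<Box<Expr>>() == size_of::<usize>()
}

fn left_is_number(expr: &Expr) -> bool {
    match expr {
        // `left` is a &Box<Expr>, yet the Expr method can be called on it directly.
        Expr::Add(left, _right) => left.is_number(),
        Expr::Number(_) => false,
    }
}
```

For `Expr::Add(Box::new(Expr::Number(1.0)), Box::new(Expr::Number(2.0)))`, `left_is_number` returns true, and `box_is_pointer_sized()` returns true as well. If you remove the Box from `Add`, the compiler rejects the enum as a recursive type with infinite size.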
Turn Parser into a Struct
The simple parse function from before isn't enough anymore. We need to track position as we read through multiple tokens:
struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Parser {
        Parser { tokens, pos: 0 }
    }

    // Peek at the current token (don't advance)
    fn peek(&self) -> Option<Token> {
        if self.pos < self.tokens.len() {
            Some(self.tokens[self.pos].clone())
        } else {
            None
        }
    }

    // Take the current token and advance
    fn next_token(&mut self) -> Option<Token> {
        if self.pos < self.tokens.len() {
            let token = self.tokens[self.pos].clone();
            self.pos += 1;
            Some(token)
        } else {
            None
        }
    }
}
peek and next_token are helpers we'll keep using.
.clone() copies the token out of the Vec, so the caller gets its own Token instead of a reference into the parser. It's the usual trick to avoid ownership issues.
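If the cloning bothers you, one possible variation is to let peek hand out a reference instead. This is only a sketch of an alternative, not what the rest of the chapter uses; it relies on the standard `Vec::get` and `Option::cloned` methods.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Plus,
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Parser {
        Parser { tokens, pos: 0 }
    }

    // Alternative peek: borrow the current token instead of cloning it.
    // `get` returns None when pos is past the end, so no bounds check is needed.
    fn peek(&self) -> Option<&Token> {
        self.tokens.get(self.pos)
    }

    // next_token still returns an owned Token, cloned once when it is consumed.
    fn next_token(&mut self) -> Option<Token> {
        let token = self.tokens.get(self.pos).cloned();
        if token.is_some() {
            self.pos += 1;
        }
        token
    }
}
```

With tokens `[Number(1.0), Plus]`, peek returns `Some(&Number(1.0))` as often as you call it, and next_token then returns the two tokens in order, followed by None. For a token type this small the clone is cheap, so the chapter's version is a perfectly reasonable choice.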
Implement parse
impl Parser {
    // ... new, peek, next_token omitted ...

    fn parse(&mut self) -> Expr {
        // First, read the left side (a number)
        let mut left = self.parse_primary();
        // As long as Plus follows, read the right side and combine into BinOp
        while let Some(Token::Plus) = self.peek() {
            let op = self.next_token().unwrap(); // consume Plus
            let right = self.parse_primary();
            left = Expr::BinOp {
                op,
                left: Box::new(left),
                right: Box::new(right),
            };
        }
        left
    }

    // Read a single number
    fn parse_primary(&mut self) -> Expr {
        match self.next_token() {
            Some(Token::Number(n)) => Expr::Number(n),
            other => panic!("expected number, got {:?}", other),
        }
    }
}
while let Some(Token::Plus) = self.peek() means "loop as long as the next token is Plus."
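If `while let` is new to you, it may help to see it next to the loop-and-match form it abbreviates. The two functions below are hypothetical helpers that just count how many Plus tokens sit at the front of a slice; they should behave identically.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Plus,
}

// Keeps looping while the token at `pos` matches the pattern.
fn count_plus_while_let(tokens: &[Token]) -> usize {
    let mut pos = 0;
    while let Some(Token::Plus) = tokens.get(pos) {
        pos += 1;
    }
    pos
}

// The same logic spelled out with loop + match.
fn count_plus_loop(tokens: &[Token]) -> usize {
    let mut pos = 0;
    loop {
        match tokens.get(pos) {
            Some(Token::Plus) => pos += 1,
            _ => break, // a Number or the end of input stops the loop
        }
    }
    pos
}
```

For `[Plus, Plus, Number(1.0), Plus]` both return 2: the loop stops at the first token that is not Plus, which is also how parse knows the expression has ended.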
Let's trace how 1 + 2 + 3 gets parsed:
- parse_primary() -> Number(1.0) becomes left
- peek is Plus -> enter the loop
- Consume Plus, parse_primary() -> Number(2.0) becomes right
- left = BinOp { Plus, Number(1), Number(2) }
- peek is Plus again -> another iteration
- Consume Plus, parse_primary() -> Number(3.0) becomes right
- left = BinOp { Plus, BinOp { Plus, 1, 2 }, Number(3) }
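The trace shows the tree nesting to the left, that is, (1 + 2) + 3. A minimal sketch to make that visible: `parse` and `primary` here are condensed, hypothetical free-function versions of the chapter's methods working on a slice, and `show` is a made-up helper that prints the tree with explicit parentheses.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Plus,
}

#[derive(Debug, Clone)]
enum Expr {
    Number(f64),
    BinOp { op: Token, left: Box<Expr>, right: Box<Expr> },
}

// Hypothetical condensed version of the parse loop, working on a slice.
fn parse(tokens: &[Token]) -> Expr {
    let mut pos = 0;
    let mut left = primary(tokens, &mut pos);
    while let Some(Token::Plus) = tokens.get(pos) {
        pos += 1; // consume Plus
        let right = primary(tokens, &mut pos);
        // The tree built so far becomes the left child of the new node.
        left = Expr::BinOp {
            op: Token::Plus,
            left: Box::new(left),
            right: Box::new(right),
        };
    }
    left
}

fn primary(tokens: &[Token], pos: &mut usize) -> Expr {
    match tokens.get(*pos) {
        Some(Token::Number(n)) => {
            *pos += 1;
            Expr::Number(*n)
        }
        other => panic!("expected number, got {:?}", other),
    }
}

// Hypothetical helper: render the tree with explicit parentheses.
fn show(expr: &Expr) -> String {
    match expr {
        Expr::Number(n) => format!("{}", n),
        Expr::BinOp { left, right, .. } => format!("({} + {})", show(left), show(right)),
    }
}
```

For the tokens of 1 + 2 + 3, `show(&parse(&tokens))` gives `((1 + 2) + 3)`. For addition the grouping doesn't change the result much, but it will start to matter once other operators join in.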
Extend eval
fn eval(expr: Expr) -> f64 {
    match expr {
        Expr::Number(n) => n,
        Expr::BinOp { op, left, right } => { // <- added
            let l = eval(*left);
            let r = eval(*right);
            match op {
                Token::Plus => l + r,
                _ => panic!("unknown operator: {:?}", op),
            }
        }
    }
}
*left moves the Expr out of the Box<Expr>. We recursively evaluate both sides and add the results.
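Because eval takes its Expr by value, the tree is consumed by the call. If you'd like to keep the AST around afterwards (for printing, say), one possible variation borrows it instead. `eval_ref` is a hypothetical name for this sketch, not something the chapter defines.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Plus,
}

#[derive(Debug, Clone)]
enum Expr {
    Number(f64),
    BinOp { op: Token, left: Box<Expr>, right: Box<Expr> },
}

// Borrowing variant: the caller keeps ownership of the tree.
fn eval_ref(expr: &Expr) -> f64 {
    match expr {
        Expr::Number(n) => *n,
        Expr::BinOp { op, left, right } => {
            // &Box<Expr> coerces to &Expr, so no explicit unboxing is needed.
            let l = eval_ref(left);
            let r = eval_ref(right);
            match op {
                Token::Plus => l + r,
                _ => panic!("unknown operator: {:?}", op),
            }
        }
    }
}
```

With an AST for 1 + 2, `eval_ref(&ast)` returns 3.0, and you can call it again on the same `ast` because nothing was moved.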
Try It Out
fn main() {
    let inputs = vec!["1 + 2", "10 + 20 + 30", "3.14 + 0.86"];
    for input in inputs {
        let mut lexer = Lexer::new(input.to_string());
        let tokens = lexer.tokenize();
        let mut parser = Parser::new(tokens);
        let ast = parser.parse();
        let result = eval(ast);
        println!("{} = {}", input, result);
    }
}
1 + 2 = 3
10 + 20 + 30 = 60
3.14 + 0.86 = 4
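You might wonder why the output shows 3 rather than 3.0. The result is still an f64; `{}` simply drops a trailing `.0` when printing, while `{:?}` keeps it. A tiny sketch (`display_and_debug` is a made-up helper, not part of the chapter's code):

```rust
// Format the same f64 with Display ({}) and Debug ({:?}).
fn display_and_debug(x: f64) -> (String, String) {
    (format!("{}", x), format!("{:?}", x))
}
```

`display_and_debug(3.0)` returns `("3", "3.0")`, whereas a value with a fractional part such as 0.5 prints as `0.5` either way.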
Complete Code for This Chapter
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Plus,
}

struct Lexer {
    input: Vec<char>,
    pos: usize,
}

impl Lexer {
    fn new(input: String) -> Lexer {
        Lexer {
            input: input.chars().collect(),
            pos: 0,
        }
    }

    fn tokenize(&mut self) -> Vec<Token> {
        let mut tokens = Vec::new();
        while self.pos < self.input.len() {
            let ch = self.input[self.pos];
            match ch {
                ' ' | '\t' => {
                    self.pos += 1;
                }
                '+' => {
                    tokens.push(Token::Plus);
                    self.pos += 1;
                }
                '0'..='9' => {
                    let token = self.read_number();
                    tokens.push(token);
                }
                _ => {
                    self.pos += 1;
                }
            }
        }
        tokens
    }

    fn read_number(&mut self) -> Token {
        let start = self.pos;
        while self.pos < self.input.len()
            && (self.input[self.pos].is_ascii_digit() || self.input[self.pos] == '.')
        {
            self.pos += 1;
        }
        let num_str: String = self.input[start..self.pos].iter().collect();
        let num: f64 = num_str.parse().unwrap();
        Token::Number(num)
    }
}

#[derive(Debug, Clone)]
enum Expr {
    Number(f64),
    BinOp {
        op: Token,
        left: Box<Expr>,
        right: Box<Expr>,
    },
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Parser {
        Parser { tokens, pos: 0 }
    }

    fn peek(&self) -> Option<Token> {
        if self.pos < self.tokens.len() {
            Some(self.tokens[self.pos].clone())
        } else {
            None
        }
    }

    fn next_token(&mut self) -> Option<Token> {
        if self.pos < self.tokens.len() {
            let token = self.tokens[self.pos].clone();
            self.pos += 1;
            Some(token)
        } else {
            None
        }
    }

    fn parse(&mut self) -> Expr {
        let mut left = self.parse_primary();
        while let Some(Token::Plus) = self.peek() {
            let op = self.next_token().unwrap();
            let right = self.parse_primary();
            left = Expr::BinOp {
                op,
                left: Box::new(left),
                right: Box::new(right),
            };
        }
        left
    }

    fn parse_primary(&mut self) -> Expr {
        match self.next_token() {
            Some(Token::Number(n)) => Expr::Number(n),
            other => panic!("expected number, got {:?}", other),
        }
    }
}

fn eval(expr: Expr) -> f64 {
    match expr {
        Expr::Number(n) => n,
        Expr::BinOp { op, left, right } => {
            let l = eval(*left);
            let r = eval(*right);
            match op {
                Token::Plus => l + r,
                _ => panic!("unknown operator: {:?}", op),
            }
        }
    }
}

fn main() {
    let input = String::from("1 + 2");
    let mut lexer = Lexer::new(input);
    let tokens = lexer.tokenize();
    println!("Tokens: {:?}", tokens);
    let mut parser = Parser::new(tokens);
    let ast = parser.parse();
    println!("AST: {:?}", ast);
    let result = eval(ast);
    println!("Result: {}", result);
}
Addition works. Next chapter, we add subtraction. It's almost the same thing -- get used to the pattern.