Practice: Parse and Display a Number
From here on, it's hands-on. We'll use everything we've learned to incrementally build a calculator that can do basic arithmetic.
The end goal:
"1 + 2 * 3" -> [Lexer] -> token stream -> [Parser] -> AST -> [Eval] -> 7.0
But we're not building all of that at once. In this chapter, we'll start by reading a single number and displaying it.
Create the Project
cargo new calc
cd calc
From here, we'll be editing src/main.rs.
Step 1: Define Token
A lexer breaks a string into "tokens." Let's define the token types as an enum. For now, just numbers:
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
}
Number(f64) -- a token representing a number with an f64 value inside.
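Those derives aren't decoration: Debug enables {:?} printing, Clone lets us duplicate tokens, and PartialEq lets us compare them (handy for tests later). A quick standalone sketch of what each one buys us:

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
}

fn main() {
    let t = Token::Number(1.5);
    let copy = t.clone();                          // Clone: duplicate the token
    assert_eq!(t, copy);                           // PartialEq: compare tokens
    assert_eq!(format!("{:?}", t), "Number(1.5)"); // Debug: {:?} formatting
    println!("ok");
}
```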
Step 2: Build the Lexer
struct Lexer {
    input: Vec<char>,
    pos: usize,
}
Convert the string to Vec<char> and process one character at a time. pos tracks where we're reading.
Why Vec<char>? Rust's String can't be indexed by character position (it's UTF-8 encoded, so characters have variable byte widths). Vec<char> lets you simply do input[0], input[1], etc.
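To see the difference concretely, here's a small standalone sketch (not part of the calculator) comparing byte length and character count:

```rust
fn main() {
    let s = String::from("héllo");
    // let c = s[1]; // compile error: `String` cannot be indexed by `usize`

    let chars: Vec<char> = s.chars().collect();
    assert_eq!(chars[1], 'é');  // Vec<char> gives direct positional access
    assert_eq!(s.len(), 6);     // byte length: 'é' takes 2 bytes in UTF-8
    assert_eq!(chars.len(), 5); // character count
    println!("ok");
}
```

The trade-off is memory: Vec<char> uses 4 bytes per character, but for a calculator's input that's irrelevant.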
impl Lexer {
    fn new(input: String) -> Lexer {
        Lexer {
            input: input.chars().collect(),
            pos: 0,
        }
    }

    fn tokenize(&mut self) -> Vec<Token> {
        let mut tokens = Vec::new();
        while self.pos < self.input.len() {
            let ch = self.input[self.pos];
            match ch {
                // Skip whitespace
                ' ' | '\t' => {
                    self.pos += 1;
                }
                // Digit -> read a number
                '0'..='9' => {
                    let token = self.read_number();
                    tokens.push(token);
                }
                // Skip everything else (for now)
                _ => {
                    self.pos += 1;
                }
            }
        }
        tokens
    }

    fn read_number(&mut self) -> Token {
        let start = self.pos;
        // Keep reading while we see digits or a decimal point
        while self.pos < self.input.len()
            && (self.input[self.pos].is_ascii_digit() || self.input[self.pos] == '.')
        {
            self.pos += 1;
        }
        // Collect the range into a string and convert to f64
        let num_str: String = self.input[start..self.pos].iter().collect();
        let num: f64 = num_str.parse().unwrap();
        Token::Number(num)
    }
}
Key points:
- '0'..='9' matches any character from '0' to '9'
- read_number keeps reading digits and decimal points, then converts the result to f64
- .parse().unwrap() converts a string to a number. It panics on failure, but we're only feeding it digits and decimal points, so it's fine for well-formed input
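As a quick aside on that last point: str's parse::<f64>() returns a Result, and read_number can still hand it strings it rejects (multiple decimal points, or a lone dot), in which case unwrap panics. A small sketch of the behavior, so you know where the sharp edges are:

```rust
fn main() {
    // Well-formed numbers parse cleanly
    assert_eq!("42".parse::<f64>(), Ok(42.0));
    assert_eq!("3.14".parse::<f64>(), Ok(3.14));

    // read_number would happily collect "1.2.3" from input like "1.2.3",
    // but parse rejects it -- unwrap would panic here
    assert!("1.2.3".parse::<f64>().is_err());

    // A lone '.' is also rejected
    assert!(".".parse::<f64>().is_err());
    println!("ok");
}
```

Proper error handling is a topic for a later chapter; for now, panicking on malformed input is an acceptable simplification.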
Step 3: AST and Parsing
AST (Abstract Syntax Tree) represents the structure of an expression. Right now we only have numbers so it's barely a "tree," but we're laying the groundwork:
#[derive(Debug, Clone)]
enum Expr {
    Number(f64),
}
Parsing is dead simple for now -- just grab the first number:
fn parse(tokens: Vec<Token>) -> Expr {
    let token = tokens[0].clone();
    match token {
        Token::Number(n) => Expr::Number(n),
    }
}
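One caveat worth knowing: tokens[0] panics if the token stream is empty (input like "" or "   "). The chapter's parse() keeps the simple panicking form, but a fallible variant is a natural alternative. The try_parse name below is hypothetical, purely a sketch using slice::first():

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Number(f64),
}

// Hypothetical fallible variant: returns None instead of panicking
// when there are no tokens.
fn try_parse(tokens: &[Token]) -> Option<Expr> {
    match tokens.first()? {
        Token::Number(n) => Some(Expr::Number(*n)),
    }
}

fn main() {
    assert_eq!(try_parse(&[]), None);
    assert_eq!(try_parse(&[Token::Number(7.0)]), Some(Expr::Number(7.0)));
    println!("ok");
}
```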
Step 4: Eval
Takes an AST, returns the result. For now, just returns the number:
fn eval(expr: Expr) -> f64 {
    match expr {
        Expr::Number(n) => n,
    }
}
Step 5: Wire It All Together
fn main() {
    let input = String::from("42");
    let mut lexer = Lexer::new(input);
    let tokens = lexer.tokenize();
    println!("Tokens: {:?}", tokens);

    let ast = parse(tokens);
    println!("AST: {:?}", ast);

    let result = eval(ast);
    println!("Result: {}", result);
}
cargo run
Tokens: [Number(42.0)]
AST: Number(42.0)
Result: 42
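Note that the last line prints 42, not 42.0: Rust's Display formatting for f64 drops the trailing .0 on whole numbers, while Debug ({:?}) keeps it. A one-liner confirms:

```rust
fn main() {
    assert_eq!(format!("{}", 42.0_f64), "42");    // Display drops the .0
    assert_eq!(format!("{:?}", 42.0_f64), "42.0"); // Debug keeps it
    assert_eq!(format!("{}", 3.14_f64), "3.14");
    println!("ok");
}
```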
Decimals work too:
fn main() {
    let input = String::from("3.14");
    let mut lexer = Lexer::new(input);
    let tokens = lexer.tokenize();
    let ast = parse(tokens);
    let result = eval(ast);
    println!("{}", result); // 3.14
}
Complete Code for This Chapter
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
}

struct Lexer {
    input: Vec<char>,
    pos: usize,
}

impl Lexer {
    fn new(input: String) -> Lexer {
        Lexer {
            input: input.chars().collect(),
            pos: 0,
        }
    }

    fn tokenize(&mut self) -> Vec<Token> {
        let mut tokens = Vec::new();
        while self.pos < self.input.len() {
            let ch = self.input[self.pos];
            match ch {
                ' ' | '\t' => {
                    self.pos += 1;
                }
                '0'..='9' => {
                    let token = self.read_number();
                    tokens.push(token);
                }
                _ => {
                    self.pos += 1;
                }
            }
        }
        tokens
    }

    fn read_number(&mut self) -> Token {
        let start = self.pos;
        while self.pos < self.input.len()
            && (self.input[self.pos].is_ascii_digit() || self.input[self.pos] == '.')
        {
            self.pos += 1;
        }
        let num_str: String = self.input[start..self.pos].iter().collect();
        let num: f64 = num_str.parse().unwrap();
        Token::Number(num)
    }
}

#[derive(Debug, Clone)]
enum Expr {
    Number(f64),
}

fn parse(tokens: Vec<Token>) -> Expr {
    let token = tokens[0].clone();
    match token {
        Token::Number(n) => Expr::Number(n),
    }
}

fn eval(expr: Expr) -> f64 {
    match expr {
        Expr::Number(n) => n,
    }
}

fn main() {
    let input = String::from("42");
    let mut lexer = Lexer::new(input);
    let tokens = lexer.tokenize();
    println!("Tokens: {:?}", tokens);

    let ast = parse(tokens);
    println!("AST: {:?}", ast);

    let result = eval(ast);
    println!("Result: {}", result);
}
What we did is modest, but the Lexer -> Parser -> Eval pipeline skeleton is in place. Starting next chapter, we'll add operators to this skeleton.
Next chapter: we implement addition.