Actually doing stuff with ASTs
So I spent this week trying to change a reverse-polish expression into an infix one.
but the docs are dense and (though i got some basic stuff working) it all felt pretty gross.
After a few days of looking at Rackets built-in tools and feeling underwhelmed, I decide to start coding my own.
I started with a pretty basic reverse-polish notation grammar:
#lang brag
program: (definition | assignment | expression)*
definition: def id lparen (id)* rparen lbrac (expression)* rbrac
assignment: (id eq expression)+
expression: ((arg | (lparen expression rparen))+ (id| op))+
arg: ( num | id )
op: OP
id: ID
num: NUM
def: DEF
eq: EQ
lparen: LPAREN
rparen: RPAREN
lbrac: LBRAC
rbrac: RBRAC
and a dead simple lexer:
#lang racket
#| reversePolishLexer.rkt
DESCRIPTION:
A simple reverse-polish lexer/tokenizer.
TO USE:
To use, import into a new file and call
(tokenize <some-input-port>).
The resulting object should be given to
a BRAG parser to be contructed into an AST.
Racket 6.3 | 10/25/16 | License MIT
|#
(require parser-tools/lex)
(require parser-tools/lex-sre)
(require brag/support)
(define-lex-abbrev number?
(or (+ (char-range #\0 #\9))
(: (+ (char-range #\0 #\9)) "." (* (char-range #\0 #\9)))))
(define-lex-abbrev identifier? (: (+ alphabetic)
(* (or alphabetic
#\- #\> #\< #\_ #\+ #\. #\/ #\= #\! #\?
#\: #\$ #\% #\& #\~ #\^))))
; A lexer to tokenize LISP files
(define tokenize
(lambda (file-port)
(port-count-lines! file-port)
(define lisp-lexer
(lexer
["define" (token 'DEF lexeme)]
[#\( (token 'LPAREN lexeme)]
[#\) (token 'RPAREN lexeme)]
[#\{ (token 'LBRAC lexeme)]
[#\} (token 'RBRAC lexeme)]
[(or #\+ #\- #\* #\/) (token 'OP lexeme)]
["=" (token 'EQ lexeme)]
[identifier? (token 'ID lexeme)]
[number? (token 'NUM lexeme)]
[(or #\newline #\space "") (token 'WHITESPACE lexeme #:skip? #t)] ;; Ignore
[(eof) (void)]))
(define (next-token) (lisp-lexer file-port))
next-token))
(provide (all-defined-out))
Which, when fed "4 5 6 (5 7 z *) +" produces:
'(program (expression (arg (num "4")) (arg (num "5")) (arg (num "6"))
(lparen "(") (expression (arg (num "5")) (arg (num "7")) (arg (id "z"))
(op "*")) (rparen ")") (op "+")))
I made a partial (to-be-finished) AST transformer:
#lang racket
#|
tree-transform.rkt
DESCRIPTION:
An AST-to-AST (both represented as a datum)
translator that consumes
a reverse-polish-notation AST and turns
it into a infix-AST while remaining
semantically consistent
TO USE:
Import this file into another project and call
(translate <some-datum-that-was-generated-by-brag>)
It will return another datum that is the AST of the
resulting infix program.
TODO:
allow multi-line files, define statements, and
assignments.
Racket 6.6 | 10/25/16 | MIT license
|#
; Translates a AST-datum to another AST-datum.
(define transform-tree
(lambda (item)
(cond
;;;-------------------------------- TOP-LEVEL PROGRAM -------------------------------------------;;;
[(equal? (first item) 'program)
(let ([content (rest item)]
[result '(program )])
(for ([i content])
(cond [(equal? (first i) 'expression)
(set! result (append result
(list (transform-tree i))))]))
result)]
;;;-------------------------------- INFIX TERMS -------------------------------------------------;;;
[(and (equal? (first item) 'expression)
(equal? (first (last item)) 'op)) ;;; start with 'expression and end with 'op
(let ([op (last item)] ;;; reconstruct as infix
[args (middle item)]
[infix-expr '(expression (lparen "("))]) ;;;boiler plate
(for ([i args])
(cond [(equal? (first i) 'arg) ;;; if terminal, append it
(set! infix-expr (append infix-expr
(list (second i))
(list op)))]
[(equal? (first i) 'expression) ;;;if expression, rec-trans it
(set! infix-expr (append infix-expr
(list (transform-tree i))
(list op)))]))
(append (shave-off-end infix-expr)
(list (list 'rparen ")"))))]
;;;--------------------------------- FUNCTION CALLS ----------------------------------------------;;;
[(and (equal? (first item) 'expression) (equal? (first (last item)) 'id)) ;;; is an expression that ends in a function-id
(let ([func (last item)] ;;; contruct with commas seperating args
[args (middle item)]
[args-with-commas '()])
(for ([i args])
(cond [(equal? (first i) 'arg)
(set! args-with-commas (append args-with-commas
(list (second i))
(list '(comma ", "))))]
[(equal? (first i) 'expression)
(set! args-with-commas (append args-with-commas
(list (transform-tree i))
(list '(comma ", "))))]))
(append (list 'func-call func '(lparen "("))
(shave-off-end args-with-commas)
(list '(rparen ")"))))]))) ;;; insert into func call
; return a list without either end item
(define middle
(lambda (lst)
(reverse (rest (reverse (rest lst))))))
; return list without last item
(define shave-off-end
(lambda (lst)
(reverse (rest (reverse lst)))))
(provide (all-defined-out))
Which, when given the previous AST, produces:
'(program (expression (lparen "(")(num "4") (op "+") (num "5")
(op "+") (num "6") (op "+") (expression (lparen "(") (num "5")
(op "*") (num "7") (op "*") (id "z") (rparen ")")) (rparen ")")))
Lastly, I made a small script to unpack the resulting AST into a string:
#lang racket
#|
translator.rkt
DESCRIPTION:
Unpacks the AST-datum of an infix language
into a single string.
TO USE:
Import this file into another project and call
(translate <some-AST-datum-from-tree-transform.rkt>)
It will return a string form of that AST.
TODO:
allow multi-line files, define statements, and
assignments.
Racket 6.6 | 10/25/16 | MIT license
|#
; Unpacks an AST-datum into a string.
(define translate
(lambda (expr)
(cond
;;;-------------------------------- TOP LEVEL PROGRAM -------------------------------------------;;;
[(equal? (first expr) 'program)
(let ([result ""]
[content (rest expr)])
(for ([i content])
(set! result (string-append result (translate i) ";\n ")))
result)]
;;;-------------------------------- INFIX TERMS -------------------------------------------------;;;
[(equal? (first expr) 'expression) ;;;in-fix notation
(let ([statement ""]
[content (rest expr)])
(for ([i content])
(cond
[(or (equal? (first i) 'expression) (equal? (first i) 'func-call))
(set! statement (string-append statement (translate i)))]
[else (set! statement (string-append statement (second i)))]))
statement)]
;;;--------------------------------- FUNCTION CALLS ----------------------------------------------;;;
[(equal? (first expr) 'func-call) ;;;func call
(let ([statement ""]
[content (rest expr)])
(for ([i content])
(cond
[(or (equal? (first i) 'expression)
(equal? (first i) 'func-call))
(set! statement (string-append statement
(translate i)))]
[else (set! statement (string-append statement
(second i)))]))
statement)])))
(provide (all-defined-out))
Which, when all is said and done prints this out: "(4+5+6+(5*7*z));\n "
I also created a small test script (which will be expanded) using
Rackunit.
It is included in the .zip as "test-cases.rkt".
Lastly, using my dot-graph converter, I did a visualization.
The post-fix ast:
The infix ast:
So, ah...yeah.