Jim's
Tutorials

Fall 2016
course
navigation

Actually doing stuff with ASTs

So I spent this week trying to change a reverse-polish expression into an infix one.
The BRAG docs have a starting place for this conversation.
So I tried to read as much as I could about using racket syntax objects,
but the docs are dense and (though i got some basic stuff working) it all felt pretty gross.
After a few days of looking at Rackets built-in tools and feeling underwhelmed, I decide to start coding my own.
I started with a pretty basic reverse-polish notation grammar: #lang brag program: (definition | assignment | expression)* definition: def id lparen (id)* rparen lbrac (expression)* rbrac assignment: (id eq expression)+ expression: ((arg | (lparen expression rparen))+ (id| op))+ arg: ( num | id ) op: OP id: ID num: NUM def: DEF eq: EQ lparen: LPAREN rparen: RPAREN lbrac: LBRAC rbrac: RBRAC and a dead simple lexer: #lang racket #| reversePolishLexer.rkt DESCRIPTION: A simple reverse-polish lexer/tokenizer. TO USE: To use, import into a new file and call (tokenize <some-input-port>). The resulting object should be given to a BRAG parser to be contructed into an AST. Racket 6.3 | 10/25/16 | License MIT |# (require parser-tools/lex) (require parser-tools/lex-sre) (require brag/support) (define-lex-abbrev number? (or (+ (char-range #\0 #\9)) (: (+ (char-range #\0 #\9)) "." (* (char-range #\0 #\9))))) (define-lex-abbrev identifier? (: (+ alphabetic) (* (or alphabetic #\- #\> #\< #\_ #\+ #\. #\/ #\= #\! #\? #\: #\$ #\% #\& #\~ #\^)))) ; A lexer to tokenize LISP files (define tokenize (lambda (file-port) (port-count-lines! file-port) (define lisp-lexer (lexer ["define" (token 'DEF lexeme)] [#\( (token 'LPAREN lexeme)] [#\) (token 'RPAREN lexeme)] [#\{ (token 'LBRAC lexeme)] [#\} (token 'RBRAC lexeme)] [(or #\+ #\- #\* #\/) (token 'OP lexeme)] ["=" (token 'EQ lexeme)] [identifier? (token 'ID lexeme)] [number? (token 'NUM lexeme)] [(or #\newline #\space "") (token 'WHITESPACE lexeme #:skip? #t)] ;; Ignore [(eof) (void)])) (define (next-token) (lisp-lexer file-port)) next-token)) (provide (all-defined-out))
Which, when fed "4 5 6 (5 7 z *) +" produces:
'(program (expression (arg (num "4")) (arg (num "5")) (arg (num "6")) (lparen "(") (expression (arg (num "5")) (arg (num "7")) (arg (id "z")) (op "*")) (rparen ")") (op "+")))
I made a partial (to-be-finished) AST transformer: #lang racket #| tree-transform.rkt DESCRIPTION: An AST-to-AST (both represented as a datum) translator that consumes a reverse-polish-notation AST and turns it into a infix-AST while remaining semantically consistent TO USE: Import this file into another project and call (translate <some-datum-that-was-generated-by-brag>) It will return another datum that is the AST of the resulting infix program. TODO: allow multi-line files, define statements, and assignments. Racket 6.6 | 10/25/16 | MIT license |# ; Translates a AST-datum to another AST-datum. (define transform-tree (lambda (item) (cond ;;;-------------------------------- TOP-LEVEL PROGRAM -------------------------------------------;;; [(equal? (first item) 'program) (let ([content (rest item)] [result '(program )]) (for ([i content]) (cond [(equal? (first i) 'expression) (set! result (append result (list (transform-tree i))))])) result)] ;;;-------------------------------- INFIX TERMS -------------------------------------------------;;; [(and (equal? (first item) 'expression) (equal? (first (last item)) 'op)) ;;; start with 'expression and end with 'op (let ([op (last item)] ;;; reconstruct as infix [args (middle item)] [infix-expr '(expression (lparen "("))]) ;;;boiler plate (for ([i args]) (cond [(equal? (first i) 'arg) ;;; if terminal, append it (set! infix-expr (append infix-expr (list (second i)) (list op)))] [(equal? (first i) 'expression) ;;;if expression, rec-trans it (set! infix-expr (append infix-expr (list (transform-tree i)) (list op)))])) (append (shave-off-end infix-expr) (list (list 'rparen ")"))))] ;;;--------------------------------- FUNCTION CALLS ----------------------------------------------;;; [(and (equal? (first item) 'expression) (equal? (first (last item)) 'id)) ;;; is an expression that ends in a function-id (let ([func (last item)] ;;; contruct with commas seperating args [args (middle item)] [args-with-commas '()]) (for ([i args]) (cond [(equal? (first i) 'arg) (set! args-with-commas (append args-with-commas (list (second i)) (list '(comma ", "))))] [(equal? (first i) 'expression) (set! args-with-commas (append args-with-commas (list (transform-tree i)) (list '(comma ", "))))])) (append (list 'func-call func '(lparen "(")) (shave-off-end args-with-commas) (list '(rparen ")"))))]))) ;;; insert into func call ; return a list without either end item (define middle (lambda (lst) (reverse (rest (reverse (rest lst)))))) ; return list without last item (define shave-off-end (lambda (lst) (reverse (rest (reverse lst))))) (provide (all-defined-out))
Which, when given the previous AST, produces: '(program (expression (lparen "(")(num "4") (op "+") (num "5") (op "+") (num "6") (op "+") (expression (lparen "(") (num "5") (op "*") (num "7") (op "*") (id "z") (rparen ")")) (rparen ")")))
Lastly, I made a small script to unpack the resulting AST into a string: #lang racket #| translator.rkt DESCRIPTION: Unpacks the AST-datum of an infix language into a single string. TO USE: Import this file into another project and call (translate <some-AST-datum-from-tree-transform.rkt>) It will return a string form of that AST. TODO: allow multi-line files, define statements, and assignments. Racket 6.6 | 10/25/16 | MIT license |# ; Unpacks an AST-datum into a string. (define translate (lambda (expr) (cond ;;;-------------------------------- TOP LEVEL PROGRAM -------------------------------------------;;; [(equal? (first expr) 'program) (let ([result ""] [content (rest expr)]) (for ([i content]) (set! result (string-append result (translate i) ";\n "))) result)] ;;;-------------------------------- INFIX TERMS -------------------------------------------------;;; [(equal? (first expr) 'expression) ;;;in-fix notation (let ([statement ""] [content (rest expr)]) (for ([i content]) (cond [(or (equal? (first i) 'expression) (equal? (first i) 'func-call)) (set! statement (string-append statement (translate i)))] [else (set! statement (string-append statement (second i)))])) statement)] ;;;--------------------------------- FUNCTION CALLS ----------------------------------------------;;; [(equal? (first expr) 'func-call) ;;;func call (let ([statement ""] [content (rest expr)]) (for ([i content]) (cond [(or (equal? (first i) 'expression) (equal? (first i) 'func-call)) (set! statement (string-append statement (translate i)))] [else (set! statement (string-append statement (second i)))])) statement)]))) (provide (all-defined-out)) Which, when all is said and done prints this out: "(4+5+6+(5*7*z));\n "
I also created a small test script (which will be expanded) using Rackunit.
It is included in the .zip as "test-cases.rkt".
Lastly, using my dot-graph converter, I did a visualization.
The post-fix ast:
The infix ast:
So, ah...yeah.
http://cs.marlboro.edu/ courses/ fall2016/jims_tutorials/ ldavis/ Oct_25
last modified Tuesday October 25 2016 11:39 pm EDT

attachments [paper clip]

     name last modified size
[IMG]infix.png Oct 25 2016 11:37 pm 103kB [IMG]post-fix.png Oct 25 2016 11:37 pm 106kB    reversePolishNotation.zip Oct 25 2016 11:37 pm 50.3kB