Implementing a loosened form of syntax-rules (wrong :()

found out this is quite wrong, kept for archival purposes only :3

(editor note: if you want to actually learn about Scheme frontends, read mflatt's Binding as Sets of Scopes)

Expanding out syntax-rules macros is the first step a Scheme program goes through in a compiler. Syntax-rules is the only macro system present in the R5RS standard and the R7RS-smal standard, two standards most implemented by implementers.

There are multiple approaches to implementing syntax-rules in a Scheme implementation, including implementing explicit renaming macros and writing syntax-rules in terms of it as Chibi does, implementing syntax-case and writing syntax-rules in terms of it as Chez does, and implementing it directly. Implementing syntax-rules directly could be considered the most simple approach since it does not require the full Scheme language to be available at compile time, and this is the approach we will try out in this article.

A note: the “loosened” in the title does not mean that the implemented pattern language is not in its fullest form, with ellipses and hygiene, but rather that error handling is loosened, as in many toy language implementations. However, you could add syntax checking as a separate pass in your compiler rather easily.

We will start by implementing a simple pattern matcher, a more complex one, and finally completing the puzzle by implementing the transcriber for syntax-rules templates.

A simple pattern matcher

Let’s consider a minimal pattern matching language. We support matching conses of conses and symbols with another cons, creating bindings for the symbols present in the pattern. We express the bindings as an alist, so we could implement the matcher as a function:

(define match
  (lambda (pattern form)
    (letrec ((iter (lambda (pattern form acc)
                     (cond ((not acc) #f)
                           ((symbol? pattern)
                            (cons (cons pattern form) acc))
                           ((and (cons? pattern) (cons? form))
                            (iter (cdr pattern)
                                  (cdr form)
                                  (iter (car pattern) (car form) acc)))
                           (else #f)))))
      (iter pattern form ‘()))))

The full pattern matcher

Our previous pattern matcher is lacking ellipses support and the ability to specify symbols used as literals. However, before we advance in that route, we need to settle on a data structure for expressing the match result, since each symbol could be matched multiple times and nested differently.

One might think of addressing that by representing match results in a list of alists for symbols nested one level deep, list of lists of alists for those nested two levels deep and so on, but that won’t work since in the template, the different template variables could be split and joined at will, as long as the nesting levels are correct in the standard.

That was close to a valid solution though. We were getting a table of type index -> symbol -> match, but we could “transpose” the table to get an alist of symbol to a list or a match. That is what the following code does:

(define lookup
  (lambda (env symbol)
    (letrec ((lookup-frame (lambda (frame)
                             (cond ((null? frame) #f)
                                   ((eq? (caar frame) symbol) (car frame))
                                   (else (lookup-frame (cdr frame))))))
             (iter-frame (lambda (env)
                           (if (null? env)
                               #f
                               (or (lookup-frame (car env)) (iter-frame (cdr env)))))))
      (iter-frame env))))

(define reverse-lookup
  (lambda (env symbol)
    (letrec ((lookup-frame (lambda (frame)
                             (cond ((null? frame) #f)
                                   ((eq? (cdar frame) symbol) (car frame))
                                   (else (lookup-frame (cdr frame))))))
             (iter-frame (lambda (env)
                           (if (null? env)
                               #f
                               (or (lookup-frame (car env)) (iter-frame (cdr env)))))))
      (iter-frame env))))

(define fand
  (lambda (l)
    (if (null? l)
        #t
        (if (car l)
            (fand (cdr l))
            #f))))

(define transpose-matches
  (lambda (matches keys)
    (map (lambda (key)
           (cons key
                 (map (lambda (al)
                        (cdr (assq key al)))
                      matches)))
         keys)))

(define scan-variables
  (lambda (template acc)
    (cond ((symbol? template)
           (if (or (eq? template '...) (memq template acc))
               acc
               (cons template acc)))
          ((vector? template)
           (scan-variables (vector->list template) acc))
          ((pair? template)
           (scan-variables (cdr template)
                           (scan-variables (car template) acc)))
          (else
           acc))))

;; returns alist of matches or #f
(define match-syntax-rules
  (lambda (rule form literals acc)
    (cond ((not acc)
           #f)
          ((and (null? rule) (null? form))
           acc)
          ((eq? rule '_)
           acc)
          ((and (symbol? rule) (not (memq rule literals)))
           (cons (cons rule form) acc))
          ((and (memq rule literals) (eq? rule form)) ; hygiene condition satisfied by renaming first
           acc)
          ((and (list? rule)
                (= (length rule) 2)
                (eq? (cadr rule) '...)
                (list? form))
           (let* ((pattern (car rule))
                  (matches (map (lambda (form)
                                  (match-syntax-rules pattern form literals '()))
                                form)))
             (if (fand matches)
                 (append (transpose-matches matches (scan-variables pattern '())) acc)
                 #f)))
          ((and (vector? rule)
                (vector? form))
           (match-syntax-rules (vector->list rule)
                               (vector->list form)
                               literals
                               acc))
          ((and (pair? rule) (pair? form))
           (match-syntax-rules (car rule)
                               (car form)
                               literals
                               (match-syntax-rules (cdr rule)
                                                   (cdr form)
                                                   literals
                                                   acc)))
          ((equal? rule form)
           acc)
          (else #f))))

Note that we are misusing the list for representing multiple matches since a symbol could match a list as well. You could maintain a “levels of nesting” table for strict error checking.

Transcribing the template

Transcribing the template has its own difficulties. Let's take a look at this example:

(define-syntax let
    (syntax-rules ()
     ((_ ((name val) ...) body1 body2 ...)
      ((lambda (name ...) body1 body2 ...)
       val ...))))
(define values 0)   
(define-syntax id
  (syntax-rules ()
    ((id formals ...) (lambda (formals ...) (values formals ...)))))
(let ((lambda 'not-anymore))
  (id a))

Our transcription output must be an s-expression since we might need to repeatedly apply syntax transformers, and by the definition of syntactic closures, lambda in the transcription output must still refer to the lambda defined in the definition environment of the macro, which is the core form in Scheme. It is obvious that no matter how we choose to rename identifiers in the output since we have no way of referring to lambda the core form.

In fact, what we need is to rename all identifiers. We need to interleave renaming and expansion, so as to avoid the following bug: the expanded result of

(define-syntax let
  (syntax-rules ()
    ((_ ((name val) ...) body1 body2 ...)
     ((lambda (name ...) body1 body2 ...)
        val ...))))
(define values 0)
(define-syntax id
  (syntax-rules ()
    ((id formals ...) (lambda (formals ...) (values formals ...)))))
(let ((lambda 'not-anymore))
  (id a))

should NOT have the 'not-anymore be recognized as a formal list.

During different transformations, it might not be obvious how an identifier may be used in the end, since a syntax transformer may be recursive. That's why we need to rename everything that is unbound after each transformation. Take a look at the following example:

(define-syntax let
  (syntax-rules ()
    ((_ ((name val) ...) body1 body2 ...)
     ((lambda (name ...) body1 body2 ...)
      val ...))))
(define-syntax letrec
  (syntax-rules ()
    ((letrec ((var1 init1) ...) body ...)
     (letrec "generate temp names"
       (var1 ...)
       ()
       ((var1 init1) ...)
       body ...))
    ((letrec "generate temp names" ()
             (temp1 ...)
             ((var1 init1) ...)
             body ...)
     (let ((var1 (if #f #f)) ...)
       (let ((temp1 init1) ...)
         (set! var1 temp1)
         ...
         body ...)))
    ((letrec "generate temp names" (x y ...)
             (temp ...)
             ((var1 init1) ...)
             body ...)
     (letrec "generate temp names" (y ...)
             (newtemp temp ...)
             ((var1 init1) ...)
             body ...))))
(letrec ((a 1)
         (b 2)
         (c 3))
  (a b c))

In this example, letrec expands to itself to generate as many temporary names as there are bindings to create.

In the end, the code boils down to this:

under construction

The driver is easily written, using separate passes for renaming and expanding:

under construction

I have also compiled a simple test for a syntax-rules expander:

(expand-macros
 '((define-syntax let
     (syntax-rules ()
       ((_ ((name val) ...) body1 body2 ...)
        ((lambda (name ...) body1 body2 ...)
         val ...))))
   (define-syntax cond
     (syntax-rules (else =>)
       ((_ (else result1 result2 ...))
        (begin result1 result2 ...))
       ((_ (test => result))
        (let ((temp test))
          (if temp (result temp))))
       ((_ (test => result) clause1 clause2 ...)
        (let ((temp test))
          (if temp
              (result temp)
              (cond clause1 clause2 ...))))
       ((_ (test)) test)
       ((_ (test) clause1 clause2 ...)
        (let ((temp test))
          (if temp
              temp
              (cond clause1 clause2 ...))))
       ((_ (test result1 result2 ...))
        (if test (begin result1 result2 ...)))
       ((_ (test result1 result2 ...)
           clause1 clause2 ...)
        (if test
            (begin result1 result2 ...)
            (cond clause1 clause2 ...)))))
   (define-syntax letrec
     (syntax-rules ()
       ((letrec ((var1 init1) ...) body ...)
        (letrec "generate temp names"
          (var1 ...)
          ()
          ((var1 init1) ...)
          body ...))
       ((letrec "generate temp names" ()
                (temp1 ...)
                ((var1 init1) ...)
                body ...)
        (let ((var1 (if #f #f)) ...)
          (let ((temp1 init1) ...)
            (set! var1 temp1)
            ...
            body ...)))
       ((letrec "generate temp names" (x y ...)
                (temp ...)
                ((var1 init1) ...)
                body ...)
        (letrec "generate temp names" (y ...)
                (newtemp temp ...)
                ((var1 init1) ...)
                body ...))))
   (letrec ((a 1)
            (b 2)
            (c 3))
     (a b c))
   (define odd? (lambda (x) x))
   (define even? (lambda (x) x))
   (let-syntax ((given-that (syntax-rules ()
                              ((given-that test stmt1 stmt2 ...)
                               (if test
                                   (begin stmt1
                                          stmt2 ...))))))
     (let ((if #t))
       (given-that if (set! if 'now))
       if))
   (let ((x 'outer))
     (let-syntax ((m (syntax-rules () ((m) x))))
       (let ((x 'inner))
         (m))))
   (letrec-syntax
       ((my-or (syntax-rules ()
                 ((my-or) #f)
                 ((my-or e) e)
                 ((my-or e1 e2 ...)
                  (let ((temp e1))
                    (if temp
                        temp
                        (my-or e2 ...)))))))
     (let ((x #f)
           (y 7)
           (temp 8)
           (let odd?)
           (if even?))
       (my-or x
              (let temp)
              (if y)
              y)))
   (let-syntax
       ((my-or (syntax-rules ()
                 ((my-or) #f)
                 ((my-or e) e)
                 ((my-or e1 e2 ...)
                  (let ((temp e1))
                    (if temp
                        temp
                        (my-or e2 ...)))))))
     (let ((x #f)
           (y 7)
           (temp 8)
           (let odd?)
           (if even?))
       (my-or x
              (let temp)
              (if y)
              y)))
   (let ((=> #f))
     (cond (#t => 'ok)))
   (define values 0)
   (define-syntax id
     (syntax-rules ()
       ((id formals ...) (lambda (formals ...) (values formals ...)))))
   (let ((lambda 'not-anymore))
     (id a))))