SQL WHERE clauses in Clojure from S-Expressions

SQL WHERE and HAVING clause strings can be rendered from neat, structured S-expressions with this simple Clojure macro:

(defmacro sql-expand
  "Transforms nested s-expressions into SQL, for use in WHERE or HAVING clauses.

e.g.: (sql-expand (and (> foo 3)
                       (< bar 4)))

     -> '(foo > 3 AND bar < 4)'


 Nested clauses:

      (sql-expand (or (and (> foo 3) (< bar 4)) 
                      (> baz 6)))

      -> '((foo > 3 AND bar < 4) OR baz > 6)'

 Embedded arbitrary SQL:
         (sql-expand (and (> foo 3)
                          (< bar \"(SELECT max(foo) + 10 FROM bar)\")))
      -> '(foo > 3 AND bar < (SELECT max(foo) + 10 FROM bar))'

"
  [form]
  (let [head (first form)]
    (if (includes? '(and or) head)
      `(str "(" (sql-expand ~(second form)) " "
            ~(.toUpperCase (str head)) " " 
            (sql-expand ~(last form)) ")")
      (str (second form) " " head " " (last form)))))

Generating SQL from Lisp has been approached in a few different ways:

CL-SQL version

CL-SQL uses reader macros to do something similar, but with square brackets and new syntax:

;; From the CL-SQL docs
(sql [select [foo] [bar] :from [baz]] 'having [= [foo id] [bar id]] 
     'and [foo val] '< 5)
=> "SELECT FOO,BAR FROM BAZ HAVING (FOO.ID = BAR.ID) AND FOO.VAL < 5"

I like my version better as it works with regular S-expressions. No new syntactic constructs are necessary, so you can re-use the parsers already set up in your brain for quick and easy mental code scanning and construction.

Some people mentioned that you have to jump through a lot of hoops to make sure that CL-SQL’s reader macros are activated and deactivated correctly around your code. Apparently it’s the only major Common Lisp package that actually uses reader macros extensively. (Clojure doesn’t support user-provided reader macros anyway, at least not without messy unsupported hackery.)

Cynojure’s functional version

Cynojure has a similar approach to mine:

;; From Cynojure's blog
 :where (sql-and (sql-= 'role +role-programmer+)
                                 (sql-< 'age 30)))

Cynojure uses special sql-* function names instead of transforming an s-expression with a macro. Since and and or are already in use by Clojure itself, he had to use other names. Symbols naming SQL fields (role and age, in this case) must also be quoted. This adds a lot of unnecessary noise to the signal and, as he notes, “interrupts the flow” of the code when reading.

By virtue of not automatically evaluating its arguments, a macro-based version can re-use already reserved symbols like and and or, and the user doesn’t have to bother with messily quoting symbols. This results in much cleaner and more readable code. The user-supplied data for the macro version contain 100% signal and no noise.

Chouser’s recursive function within a macro version

Chouser has a different macro-based approach involving a recursive anonymous function:

;; By Chouser, from http://groups.google.com/group/clojure/msg/62437f3b96835b58
(defmacro where [e]
  (let [f (fn f [e]
            (if-not (list? e)
              [(str e)]
              (let [[p & r] e]
                (if (= p `unquote)
                  r
                  (apply concat (interpose [(str " " p " ")]
                                           (map f r)))))))]
    (list* `str "where " (f e)))) 

His macro allows variable interpolation with quasiquotes, which is really cool. For example, you can do:

;; Chouser's examples
user=> (where (and (> i (- 3 1)) (< i ~(+ 3 1))))
"where i > 3 - 1 and i < 4"
user=> (let [my-name "'chouser'"] (where (and (> id 0) (= name ~my-name))))
"where id > 0 and name = 'chouser'" 

This version also avoids the inelegant repetition of the SQL template at lines 25-28 in my version, at the price of slightly more complexity.

The only problem is that it doesn’t write grouping parentheses in the output SQL string, which discards potentially important grouping information embedded in the s-expression tree and leaves evaluation grouping order to the SQL server:

(where (or (<= baz 123)
           (and (> foo 3) (> bar 6))))
-> "baz <= 123 or foo > 3 and bar > 6"

My version

By being aware of the and and or list heads as special case operators which join other clauses together, my version can wrap their output in parentheses, which preserves the evaluation priority implicit within the input S-expression in the final SQL output. It seems that there should be a way to do the same thing both generically and recursively (that is, without repetition of similar code and without conditional recursion), but I can’t make that work without making the macro significantly longer and more complex (read: fragile), so for the moment, I’m going to live with simple and slightly repetitious.

One problem lies in knowing when to add parentheses. Although bluntly wrapping the result of every SQL operator and outputting SQL like “((foo < 3) AND (bar > 4))” will execute fine on the SQL server, the extra parentheses are superfluous. Ideally, we want to use parentheses only when strictly necessary to preserve the semantic intent of the S-expression tree, not blindly wrap every single expression.

Some sort of awareness of clause-combinatory SQL operators such as AND and OR as special cases is therefore necessary to know when to add parentheses in the output, but I don’t like having to repeat what’s essentially the same almost identical output template. In all cases, we have 3 elements, and we’re just swapping the first two and then taking a string output.

The other issue is that the AND/OR case needs to recurse, but the ordinary case doesn’t; the ordinary case can be translated directly to a string. The logic would have to be adjusted to either differentiate between these cases and recurse conditionally, or else recurse unconditionally and then have a way to deal with the base case of translating non-trinary lists. That is, with unconditional recursion, a foo symbol will eventually be fed into the macro by itself and must return just the string "foo", which complicates the management of the SQL string template significantly by forcing you to have multiple templating procedures and then decide between them. (Well, we’re doing that anyway now, but that would be even worse.)

I messed around with conditionally interpolateing the template result into a '("(" ")") list, and with conditionally stringifying directly or else building a trinary list based on the structure of the input, but the result was ugly, far longer and far more complicated and confusing than just repeating the template. Any suggestions?

I also haven’t figured out the best way to add unquote support without making a huge mess. That would be a very cool feature. Any ideas on that are also welcome.

Someday, this macro could be a small part of a user-transparent Clojure object persistence library like Common Lisp’s Elephant. That would truly be a killer feature for the Clojure language.

2 Responses to SQL WHERE clauses in Clojure from S-Expressions

  1. The approach we take with ClojureQL is function based. In ClojureQL you first define a query (which might end up as SQL, but does not have to do so), which is later on compiled when we actually know which DB we target.

    The WHERE-clause is basically turned into a function which acts on the query to turn it into a valid SQL string.

    Examples can be found here: http://kotka.de/blog/2010/02/A_preview_of_ClojureQL.html

  2. There’s a project called NAMED-READTABLES that aims to take the pain out of having custom readtables in your projects. CL-SQL’s bracket reader is indeed painful to integrate into things like SLIME, but I expect things will get better as NAMED-READTABLES is widely adopted.