XML DTD Validation in Clojure: Turning It Off, Parsing Malformed XML

I wanted to parse some externally-generated and malformed HTML, so naturally I went to the short and sweet clojure.xml/parse function. I got a nasty error:

error: java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd

It seems that the W3C blocked access to the DTDs two years ago, but Java still tries to load them by default anyway. The following allows clojure.xml to run without checking external DTDs:

;; A startparse function for clojure.xml/parse that never fetches external DTDs.
(defn startparse-sax-non-validating [s ch]
  (.. (doto (javax.xml.parsers.SAXParserFactory/newInstance)
        (.setValidating false)
        (.setFeature "http://apache.org/xml/features/nonvalidating/load-dtd-grammar" false)
        (.setFeature "http://apache.org/xml/features/nonvalidating/load-external-dtd" false)
        (.setFeature "http://xml.org/sax/features/validation" false)
        (.setFeature "http://xml.org/sax/features/external-general-entities" false)
        (.setFeature "http://xml.org/sax/features/external-parameter-entities" false))
      (newSAXParser)
      (parse s ch)))

Then, with clojure.xml aliased as xml, you can simply call (xml/parse "sourcefile.xml" startparse-sax-non-validating). This is not the ideal solution (ideally, we would use a locally cached DTD), but it works well enough for one-off code. Read on for further information.
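For the locally cached DTD approach, one option is to hand the parser a SAX EntityResolver that answers DTD requests from disk. The following is only a sketch under stated assumptions: the names cached-dtd-resolver, startparse-sax-cached-dtd and local-dtds are hypothetical, and the dtd-cache/ path is a placeholder for wherever a copy of the DTD has been saved.

```clojure
(require '[clojure.xml :as xml]
         '[clojure.java.io :as io])
(import '(javax.xml.parsers SAXParserFactory)
        '(org.xml.sax EntityResolver InputSource))

;; Hypothetical cache table: DTD system ID -> path of a local copy.
(def local-dtds
  {"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
   "dtd-cache/xhtml1-transitional.dtd"})

(defn cached-dtd-resolver
  "Returns an EntityResolver that serves the system IDs listed in dtds
   from local files. Returning nil for anything else tells the parser
   to fall back to its default behaviour (which may hit the network)."
  [dtds]
  (reify EntityResolver
    (resolveEntity [_ public-id system-id]
      (when-let [path (get dtds system-id)]
        ;; Keep the original system ID so relative references inside
        ;; the DTD still resolve against the original location.
        (doto (InputSource. (io/input-stream path))
          (.setSystemId system-id))))))

(defn startparse-sax-cached-dtd
  "Returns a startparse function for clojure.xml/parse that resolves
   DTDs through dtds. s must be a URI string or an InputStream."
  [dtds]
  (fn [s ch]
    (doto (.. (SAXParserFactory/newInstance) newSAXParser getXMLReader)
      (.setContentHandler ch)
      (.setEntityResolver (cached-dtd-resolver dtds))
      (.parse (InputSource. s)))))

;; Usage, assuming the cached file exists:
;; (xml/parse "sourcefile.xml" (startparse-sax-cached-dtd local-dtds))
```

A DTD that pulls in further entity files (as the XHTML DTDs do) would need each of those files added to the table as well.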

Goldman’s Blankfein before Congress

Goldman CEO Lloyd Blankfein testifying before Congress

And it came to pass, that Goldman’s then-chief overlord was order’d to parade full 7 days and 7 nights through the streets of lower Manhattan, clad in naught save his drawers, bearing tether’d to heavy chains every SEC filing that Goldman hath wrought for 10 years previous; while the peasants of the borough were to gather round to hurl rotting vegetables thence, and cry, “Lo! How the mighty have fallen!”

And only then, once the indignity were suffer’d in full, might he retreat to his penthouse to recover, or to his yacht, or to the coffers of his chalet in southern France, or to his Bahamas estate countinghouse, or to any other sequester’d location whatsoever.

Randomness with Clojure

Psyleron sells a hardware random number generator and associated software package for experimentation on the interaction of consciousness and randomness. No, don’t laugh. Princeton Engineering Anomalies Research conducted decades of methodologically rigorous research into the nature of randomness. They concluded that deliberate conscious intention can produce a small but statistically significant and reproducible effect on the outcome of stochastic processes. Psyleron is a commercial offshoot of PEAR.

The Psyleron system, unfortunately, costs hundreds of dollars and only runs on Windows, so I wrote some quick Clojure to do experimentation with randomness, using /dev/random as the source. (It appears that /dev/random on Mac OS X does not provide the same quality guarantees as it does on Linux, but it’s fine for a prototype and the implementation allows easy substitution of another randomness source later.)
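To give a flavour of what such an experiment might look like, here is a minimal sketch, not the code from the full post. It assumes a trial consists of reading n bytes from a source and counting the 1 bits; the function names (read-random-bytes, count-one-bits, bit-trial) are hypothetical. The source is a parameter, so another randomness source can be swapped in.

```clojure
(require '[clojure.java.io :as io])

(defn read-random-bytes
  "Reads exactly n bytes from source, which may be a path such as
   \"/dev/random\" or anything else clojure.java.io/input-stream accepts."
  [source n]
  (with-open [in (java.io.DataInputStream. (io/input-stream source))]
    (let [buf (byte-array n)]
      (.readFully in buf)   ; blocks until n bytes have arrived
      buf)))

(defn count-one-bits
  "Total number of 1 bits in a byte array."
  [bs]
  (reduce + (map #(Long/bitCount (bit-and (long %) 0xff)) bs)))

(defn bit-trial
  "Runs one trial: reads n bytes and reports how far the count of
   1 bits deviates from the expected value of 4 per byte."
  [source n]
  (let [ones (count-one-bits (read-random-bytes source n))]
    {:bits (* 8 n)
     :ones ones
     :deviation (- ones (* 4 n))}))

;; Usage (may block on systems where /dev/random waits for entropy):
;; (bit-trial "/dev/random" 25)
```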

SQL WHERE clauses in Clojure from S-Expressions

SQL WHERE and HAVING clause strings can be rendered from neat, structured S-expressions with this simple Clojure macro:

(defmacro sql-expand
  "Transforms nested s-expressions into SQL, for use in WHERE or HAVING clauses.

e.g.: (sql-expand (and (> foo 3)
                       (< bar 4)))

     -> '(foo > 3 AND bar < 4)'

 Nested clauses:

      (sql-expand (or (and (> foo 3) (< bar 4)) 
                      (> baz 6)))

      -> '((foo > 3 AND bar < 4) OR baz > 6)'

 Embedded arbitrary SQL:
         (sql-expand (and (> foo 3)
                          (< bar \"(SELECT max(foo) + 10 FROM bar)\")))
      -> '(foo > 3 AND bar < (SELECT max(foo) + 10 FROM bar))'"
  [form]
  (let [head (first form)]
    (if (contains? #{'and 'or} head)
      `(str "(" (sql-expand ~(second form)) " "
            ~(.toUpperCase (str head)) " " 
            (sql-expand ~(last form)) ")")
      (str (second form) " " head " " (last form)))))


Clojure SQL Dates and Times with Joda

The date and time classes built into Java are a horrible mess. What are Clojure programmers to do? Use Joda Time instead. Joda Time is coherently designed and easy to use.

JDBC (at least, the PostgreSQL driver) can’t use Joda Time directly (without explicit type mapping). One way to convert a Joda LocalDate into something JDBC can use:

(import 'org.joda.time.LocalDate)
(defn to-sql-date
  "Convert any Joda-readable date object (including a string) to a java.sql.Date"
  [date]
  (java.sql.Date. (.. (LocalDate. date) toDateMidnight toInstant getMillis)))

It’s not pretty, but it works. You can follow a similar procedure for any of the Joda classes.

Setting Clojure’s Log Level

clojure.contrib.logging doesn’t have any way to set the log level. This is obviously a problem if you want to use the various log levels (debug, warn, etc.) to separate different logging depths. Here’s a function to set the logging level on my default clojure.contrib.logging setup:

Update (Feb 2012): I suggest that you use clj-logging-config for all new projects, and bypass this mess entirely! What’s below is here for historical purposes only.

;;; This version works when (impl-get-log "") returns an org.apache.commons.logging.impl.Jdk14Logger
(use 'clojure.contrib.logging)
(defn set-log-level!
  "Sets the root logger's level, and the level of all of its Handlers, to level.
   Level should be one of the constants defined in java.util.logging.Level."
  [level]
  (let [logger (.getLogger (impl-get-log ""))]
    (.setLevel logger level)
    (doseq [handler (.getHandlers logger)]
      (.setLevel handler level))))

;;; This version works when (impl-get-log "") returns a java.util.logging.LogManager$RootLogger
(use 'clojure.contrib.logging)
(defn set-log-level!
  "Sets the root logger's level, and the level of all of its Handlers, to level.
   Level should be one of the constants defined in java.util.logging.Level."
  [level]
  (let [logger (impl-get-log "")]
    (.setLevel logger level)
    (doseq [handler (.getHandlers logger)]
      (.setLevel handler level))))

This log level setting function works with a standard out-of-the-box clojure.contrib.logging on my system. Depending on which logging libraries it finds on your classpath (see the docs), it may not work for you. In particular, you need to be using clojure.contrib.logging to wrap an Apache Commons Logging instance, which in turn wraps a java.util.logging instance. This is the way my system works without any configuration; YMMV. Hopefully, something like this will be assimilated into a universal wrapper in the next clojure.contrib.logging.
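If java.util.logging is what ultimately sits underneath, the root logger can also be reached directly through the JDK, without going through any wrapper. This is a sketch of that alternative rather than part of the original function; the name set-jul-root-level! is made up for illustration.

```clojure
(import '(java.util.logging Level Logger))

(defn set-jul-root-level!
  "Sets the java.util.logging root logger, and every Handler attached
   to it, to level (a java.util.logging.Level constant).
   Returns the root logger."
  [level]
  (let [root (Logger/getLogger "")]   ; the empty name is the root logger
    (.setLevel root level)
    (doseq [handler (.getHandlers root)]
      (.setLevel handler level))
    root))

;; Let debug-level (FINE) messages through:
(set-jul-root-level! Level/FINE)
```

Setting the handlers as well as the logger matters because the default console handler filters at INFO on its own, regardless of the logger's level.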

For the gory details of how this was constructed…

Libcurl bindings for Java on Macintosh

Libcurl’s Java bindings now compile on Macintosh, with a few minor modifications to the Makefile. Get the code from my Github account.

Update: More recent Java bindings, which do not seem to be linked anywhere on the libcurl site, are available at http://www.gknw.net/viewvc/trunk/?root=curl-java. Still no multi support, though.

Remote debugging Clojure and Leiningen

Debuggers like JSwat and Eclipse can be remotely attached to live Clojure processes via TCP, but you have to tell the JVM to enable remote debugging when you start it. Leiningen does not presently have an (obvious) way to set java command line flags in project.clj, but it does pass the JAVA_OPTS environment variable to the JVM, so you can do:

JAVA_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n" lein swank 
# or "lein repl", etc.

to enable remote debugging. Watch the output for the port it allocates, and put that into JSwat/Eclipse/etc.