BZip2 compressed files can easily be read in Clojure thanks to Apache Commons Compress. Sample code inside!
Add an Apache Commons Compress dependency to your Leiningen
project.clj file like so:
    (defproject bz2reader "0.1.0-SNAPSHOT"
      :dependencies [[org.clojure/clojure "1.3.0"]
                     ;; Read/write compressed files (BZip2, etc.)
                     [org.apache.commons/commons-compress "1.4"]])
Then you can write a function that returns a BZip2-capable reader, as follows:
    (ns bz2reader.util
      (:require [clojure.java.io :as io])
      (:import (org.apache.commons.compress.compressors.bzip2
                 BZip2CompressorInputStream)))

    (defn bz2-reader
      "Returns a streaming Reader for the given compressed BZip2 file.
      Use within (with-open)."
      [filename]
      (-> filename
          io/file
          io/input-stream
          BZip2CompressorInputStream.
          io/reader))
This is based on a line from the fs utilities project, which includes a function to uncompress the file on disk. Rather than sending the output of the BZip2 stream to a copy function that writes it to disk, we simply return the reader for the caller to use in the program. You can use the BZip2-enabled reader with any of the normal Clojure I/O functions, just like any other reader.
Here’s an example of how you can print the contents of a BZip2-compressed file to stdout using the above function:
    (defn print-bz2-file [filename]
      (with-open [rdr (bz2-reader filename)]
        (doseq [line (line-seq rdr)]
          (println line))))
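One thing to watch for: line-seq is lazy, so any work on the lines has to finish before with-open closes the reader. As a small sketch of that pattern, here is a line counter built on the bz2-reader function above. The bz2reader.stats namespace and the count-bz2-lines name are made up for this example.

```clojure
(ns bz2reader.stats
  (:require [bz2reader.util :refer [bz2-reader]]))

(defn count-bz2-lines
  "Counts the lines in a BZip2-compressed text file."
  [filename]
  (with-open [rdr (bz2-reader filename)]
    ;; count walks the whole lazy seq while the reader is still open.
    ;; Returning (line-seq rdr) itself would hand the caller a seq
    ;; backed by a reader that has already been closed.
    (count (line-seq rdr))))
```

If you need the lines themselves outside of with-open, realize them first, for example with (doall (line-seq rdr)) or (vec (line-seq rdr)).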
It’s easy to create similar readers for GZip files, zip files, and so on using the other Apache Commons Compress classes. You can also write BZip2-compressed files in the same way, just by wrapping your regular OutputStream in the matching compressor output stream.
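As a rough sketch of both ideas, the following assumes the same commons-compress dependency and uses its GzipCompressorInputStream and BZip2CompressorOutputStream classes. The bz2reader.more namespace and the gz-reader and bz2-writer names are invented for illustration.

```clojure
(ns bz2reader.more
  (:require [clojure.java.io :as io])
  (:import (org.apache.commons.compress.compressors.gzip
             GzipCompressorInputStream)
           (org.apache.commons.compress.compressors.bzip2
             BZip2CompressorOutputStream)))

(defn gz-reader
  "Returns a streaming Reader for the given GZip file.
  Use within (with-open)."
  [filename]
  (-> filename
      io/file
      io/input-stream
      GzipCompressorInputStream.
      io/reader))

(defn bz2-writer
  "Returns a Writer that BZip2-compresses everything written to it.
  Use within (with-open) so the compressed stream is finished and closed."
  [filename]
  (-> filename
      io/file
      io/output-stream
      BZip2CompressorOutputStream.
      io/writer))
```

Writing then looks like any other Clojure writer usage, for example (with-open [w (bz2-writer "out.txt.bz2")] (.write w "hello\n")). Closing the writer matters here, since the compressor only emits its final block when the stream is closed.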