Posted on March 21, 2020

Using Hakyll as a Lightweight CMS

The PL research group at UPenn felt like building a new website recently. The old one (archive) had been essentially unchanged for nearly 15 years, and its build system used Java and something called OMake. If you’ve heard of OMake before, that makes one of us.

I did most of the backend work for this project and decided that using Haskell’s Hakyll library seemed like a good idea. I knew going in that Hakyll isn’t a CMS and that I’d had trouble in the past wrestling with some of its design aspects, which are pretty narrowly tailored. In particular, Hakyll is not a build system, at least not a general-purpose one. Nonetheless, a few months after initial deployment, Hakyll still seems like a good choice. But its use for our purposes requires some mild thoughtfulness so that you’re not reinventing Common Lisp, or specifically reinventing GNU Make.

Near the end I’ll explain our process and why we chose Hakyll, but I know you’re just here for a tour of the code :)

The code

Frontend

Our frontend uses Bootstrap with jQuery. There isn’t anything particularly special about this. The relevant snippet is here.

match "vendor/**" $ do
    route   idRoute
    compile copyFileCompiler

Our frontend libraries are dumped in a top-level directory vendor/ and simply copied to the output folder _site/. In particular we don’t try to build Bootstrap from its Sass sources or something crazy. I think it can be tempting for PL enthusiasts to over-engineer things in order to obtain (real or perceived) correctness. Here, that would mean tracking Bootstrap upstream as a submodule in git, then compiling it manually, in order to achieve something like “up-to-date-ness” or “flexibility”, or worse yet to patch Bootstrap. Instead I made a personal vow to keep it simple. There’s no particular need to stick with the latest sources from upstream, so we can just copy stuff here if we feel like updating. If we make a change in this design, it would be to switch to using a CDN.

Meetings

The PL Club has regular meetings on Fridays where one of us or a guest presents a topic for about 45 minutes. We thought it would be nice to advertise those publicly, listing the speaker, title, and an abstract. Our solution is very simple: The speaker writes a Markdown file and stores it here, files a pull request, and then a simple site rebuild picks up on the new content and updates the site.

Here’s some of the relevant code. This is pure normal Hakyll, in the domain where it particularly shines.

match "meetings/*" $ do
    route   $ idRoute <!> setExtension "html" <!> canonizeRoute
    compile $ do
        pandocCompiler
            >>= loadAndApplyTemplate "templates/meeting.html" siteContext 
            >>= loadAndApplyTemplate "templates/default.html"  siteContext
            >>= relativizeUrls
-- ... truncated ...
match "club.html" $ do
    route   $ idRoute <!> canonizeRoute
    compile $ do
        meetings <- recentFirst =<< loadAll "meetings/*"
        let meetingsCtx =
                listField "meetings" siteContext (return meetings) `mappend`
                constField "title" "Penn PL Club" `mappend`
                siteContext
        getResourceBody
            >>= applyAsTemplate meetingsCtx
            >>= loadAndApplyTemplate "templates/default.html" meetingsCtx
            >>= relativizeUrls            

People

The front page also shows the members of the group, sorted by categories. This is handled very nearly like meetings are. Each person gets a Markdown file here. This is nice for the same reasons as above: People can very easily add themselves, change the spelling of their names, update their website URL, etc., without anything but a text editor and a pull request. The Markdown files are mostly just metadata. It would make sense to use the body of the file to give people a space to write about their research, although we don’t currently do that.

We use Hakyll’s tags mechanism to sort people into categories (faculty, postdoc, etc.). This lets us use all of the tags infrastructure, although mostly we just use it to group people by category. A person’s metadata includes a tags field that lists which category they fall into. This has the downside of potential run-time errors for misspelled tags, possibly silent errors in which people are quietly dropped. An easy modification would be to pull the same information from a folder hierarchy, e.g. people/faculty/person.md instead.
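Both ideas in the paragraph above can be sketched in a few lines of plain Haskell. This is a hypothetical illustration rather than code from our repository: the helper names knownCategories, unknownTags, and categoryFromPath are made up, and the "student" category is a placeholder (only faculty and postdoc are named in this post).

```haskell
import System.FilePath (splitDirectories)

-- Hypothetical list of accepted categories; only "faculty" and "postdoc"
-- are named in the post, "student" is a placeholder.
knownCategories :: [String]
knownCategories = ["faculty", "postdoc", "student"]

-- | Return the tags that match no known category, so a misspelling can be
-- reported instead of silently dropping a person from the page.
unknownTags :: [String] -> [String]
unknownTags = filter (`notElem` knownCategories)

-- | Alternative: derive the category from a path such as
-- "people/faculty/person.md" rather than from a metadata field.
categoryFromPath :: FilePath -> Maybe String
categoryFromPath path =
  case splitDirectories path of
    ["people", category, _file] -> Just category
    _                           -> Nothing
```

With the folder approach a misspelled category shows up as an unexpected directory, which is arguably easier to spot in a pull request than a typo in a metadata field.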

Displaying people is a little harder. Our design currently puts people into groups of three, and then we display those groups as rows. Making this work with Hakyll isn’t great. The code is here:

peopleContext :: Tags -> Context String
peopleContext ptags = 
  let faculty  = (unbindList 3) <$> loadTag ptags "faculty" :: Compiler [[Item String]]
        ... etc ...
  in
    nestedListField "facultyGroup" "faculty" siteContext faculty `mappend`
    ... etc ...
    
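-- | Split a list into consecutive chunks of length n (the last may be shorter).
-- Assumes n > 0; otherwise a non-empty input yields an endless list of empty chunks.
unbindList :: Int -> [a] -> [[a]]
unbindList _ [] = []
unbindList n as =
    take n as : unbindList n (drop n as)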
    

We start by loading (categories of) people from the tags structure. Let’s pretend a person has type Person, so the type we have loaded is [Person]. Now we want to split these people into groups of three to yield a [[Person]]. We want to iterate over this list, and for each element of type [Person] we want to instantiate a template that iterates over its list of three persons and pretty prints a row.
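To make the grouping step concrete, here is the same chunking logic applied to plain strings instead of Hakyll items. It is only a sketch: groupsOf is a stand-alone copy of unbindList under a hypothetical name, and the seven people are invented.

```haskell
-- | Same logic as unbindList, renamed so the example stands alone.
-- Assumes a positive chunk size.
groupsOf :: Int -> [a] -> [[a]]
groupsOf _ [] = []
groupsOf n xs = take n xs : groupsOf n (drop n xs)

-- Seven hypothetical people split into rows of three.
exampleRows :: [[String]]
exampleRows = groupsOf 3 ["p1", "p2", "p3", "p4", "p5", "p6", "p7"]
-- exampleRows == [["p1","p2","p3"],["p4","p5","p6"],["p7"]]
```

Note that the last row is simply shorter, so the template has to cope with rows of one or two people.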

Hakyll has its own template system including basic support for iteration. What’s challenging is that we need nested iteration. In Hakyll terms this means we want a listField of listFields. This is the code we use (here):

nestedListField :: String -- outer key
                -> String -- inner key
                -> Context a
                -> Compiler [[Item a]]
                -> Context b
nestedListField ko ki ctx items =
    listField ko innerctx ((Item "" <$>) <$> items)
  where
    innerctx = listFieldWith ki ctx (\(Item _ as) -> return as)

Basically we accept a list of lists, a Hakyll context designed to run on the individual Item a elements, and two key names. We create a list field whose individual values have type Item [Item a], and this list is rendered within another context that does the following: accepts an argument of type Item [Item a], strips off the outer constructor, and renders the resulting list in the original context.
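If the wrapping and unwrapping is hard to picture, a toy model without Hakyll may help. The Item type below is a simplified stand-in that I am assuming for illustration (the real one carries an Identifier, not a String), and wrapRows and renderNested are hypothetical names. The model only mimics the shape of the data flow, not the template engine itself.

```haskell
-- Simplified stand-in for Hakyll's Item; the real type uses an Identifier
-- rather than a String.
data Item a = Item { itemIdentifier :: String, itemBody :: a }

-- | Wrap each row in an outer Item with an empty identifier, mirroring
-- `(Item "" <$>) <$> items` in nestedListField.
wrapRows :: [[Item a]] -> [Item [Item a]]
wrapRows = map (Item "")

-- | Model of the nested loop: the outer pass visits each wrapped row, the
-- inner pass unwraps it and renders every element with the given function.
renderNested :: (Item a -> String) -> [Item [Item a]] -> [String]
renderNested renderOne rows =
  [ unwords (map renderOne (itemBody row)) | row <- rows ]
```

For example, renderNested itemBody (wrapRows [[Item "a" "Ann", Item "b" "Bob"], [Item "c" "Cy"]]) evaluates to ["Ann Bob", "Cy"]: one string per row, built from the people in that row.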

Unfortunately this code is hard to understand without knowing about how Hakyll’s template system works. And that’s where we’re somewhat fighting against Hakyll, since its purpose isn’t to be a powerful template engine or general CMS.

Making People

I said that we create people as Markdown files, which is mostly correct. But when we were first creating the site we found it easiest to let people use a spreadsheet on Google Docs and fill in their information there. Then we exported that list to CSV and used a little Python script to split it into Markdown files.

Now when a new class of PhD students joins the group, we will likely do the same thing. But if we naïvely added to the original list of people and re-exported, we’d break things. Since we encourage people to make changes to their Markdown files, the original CSV file doesn’t reflect the latest updates. We could have people edit the CSV file instead, but this has several serious downsides compared to using our Markdown files. Among them, Hakyll isn’t built to generate several Items from a single file like this, so we’d still have to split the CSV into individual files. If we tried doing that inside Hakyll, we’d be using Hakyll as a multi-stage build system, which it’s certainly not. This would also incur a dependency on Python just to build a site written mostly in Haskell.

Our “problem” can be solved by simply using our Python script as a one-way CSV -> Markdown process, only used to create new people in bulk. But this is an issue where we are clearly at the edge of what Hakyll’s good at. In a way, though, I’ve found this to be a good thing. If it’s hard to do something in Hakyll, it’s a sign that we might be overengineering things when a simpler solution exists.
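For readers curious what such a one-way conversion involves, here is a rough sketch of the idea. Our actual script is written in Python and is not reproduced here; this version is in Haskell only to match the rest of the post. The function names are hypothetical, the column names in the usage note are invented, and the comma splitting is deliberately naive (no quoted fields).

```haskell
-- | Split one CSV line on commas. Deliberately naive: no quoted fields.
splitCommas :: String -> [String]
splitCommas s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitCommas rest

-- | Pair header names with the values of one row and render them as a
-- Markdown file that consists only of a metadata block.
rowToMarkdown :: [String] -> [String] -> String
rowToMarkdown header row =
  unlines (["---"] ++ [k ++ ": " ++ v | (k, v) <- zip header row] ++ ["---"])

-- | Convert CSV text (first line is the header) into one Markdown
-- document per remaining line.
csvToMarkdown :: String -> [String]
csvToMarkdown csv = case lines csv of
  []           -> []
  (hdr : rows) -> map (rowToMarkdown (splitCommas hdr) . splitCommas) rows
```

Given the input "name,tags\nAda,faculty\n", csvToMarkdown returns a single document with the lines ---, name: Ada, tags: faculty, and ---. Writing each document to its own file is left out, since that step only ever runs once per batch of new people.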

Publications

Our professors have a long-maintained system for listing their publications on the PL Club website by merging several .bib files together and using bibtex2html. There is a Makefile, probably many years old, which orchestrates this here. It sounds easy enough to call a Makefile with Hakyll’s unixFilter, but this wasn’t good enough for a few reasons:

  • The output it produces is an extremely rigid HTML file that doesn’t fit into the website design at all
  • It doesn’t work without glue. unixFilter reads from stdout but the Makefile produces a set of files

Our solution is seen here. Namely, we simply use Hakyll’s unsafeCompiler escape hatch and use regular Haskell to run the Makefile in a temporary directory. Then we read the generated HTML with Pandoc and run some quick and dirty parsing to split the resulting table into a list of individual Strings, one for each publication. Those strings are put into a Hakyll Context and used to generate the publications page with a listField.
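The "run an external build in a scratch directory, then read what it produced" step can be sketched with nothing but the process and directory libraries. This is an assumption-laden illustration, not our actual code: runInScratchDir is a hypothetical helper, the fixed directory name is not safe against concurrent builds, and the Pandoc parsing is omitted entirely. Inside Hakyll, the resulting IO action would be lifted into the Compiler monad with unsafeCompiler.

```haskell
import Control.Exception (finally)
import System.Directory
  ( createDirectoryIfMissing, getTemporaryDirectory, removeDirectoryRecursive )
import System.FilePath ((</>))
import System.Process (CreateProcess (cwd), proc, readCreateProcess)

-- | Run a command inside a scratch directory and return the contents of one
-- file it generated there. The directory is removed afterwards, even if the
-- command fails (readCreateProcess throws on a non-zero exit code).
runInScratchDir :: String -> FilePath -> [String] -> FilePath -> IO String
runInScratchDir label cmd args outFile = do
  tmp <- getTemporaryDirectory
  let dir = tmp </> label
  createDirectoryIfMissing True dir
  build dir `finally` removeDirectoryRecursive dir
  where
    build dir = do
      _ <- readCreateProcess (proc cmd args) { cwd = Just dir } ""
      out <- readFile (dir </> outFile)
      -- Force the lazily read contents before the directory is removed.
      length out `seq` return out
```

A call might look like runInScratchDir "pubs" "make" ["-f", makefilePath] "output.html", where makefilePath and output.html are placeholders for whatever the real Makefile expects and produces.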

The results look pretty good, no?

Unfortunately this bypasses Hakyll’s caching mechanism and requires rebuilding the publications page every time the site is built. That’s because the Makefile fetches each professor’s .bib file from the internet. It would be possible to simply download these lists periodically and check them into the repo, but I doubt they would be kept up to date for a whole semester, much less for years to come. With the current setup, each week that there’s a PL Club talk, the website is rebuilt in order to add the latest talk announcement, which also rebuilds the publications page. Note that the build is incremental modulo the publications.

At any rate, this is a good case study in integrating a completely separate build system into Hakyll.

Grafting on the old site

Despite its age, there was plenty of content from the old website which the department wanted to maintain—old conference and project pages, for example. We had to find a way to serve this content and not break old URLs.

Our solution was simple.

match "old_site/**" $ do
  route   $ routeTail
  compile $ copyFileCompiler
 
-- | Move /foo/bar/bang.ext to /bar/bang.ext 
routeTail :: Routes
routeTail = customRoute $
  joinPath . tail . splitPath . toFilePath

All that happens here is that content from the old site (which was also mostly static) was copied into a top-level directory old_site/. At build time, this content is grafted on to the top-level of our overall website. This has the advantage of keeping URLs working but prevents the content from cluttering the repo with top-level directories for content no maintainers will ever touch.
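The path manipulation inside routeTail can be tried out without Hakyll. The sketch below uses a hypothetical name, dropFirstDir, and swaps tail for drop 1 so that an empty path does not raise an exception; otherwise it is the same composition of splitPath and joinPath.

```haskell
import System.FilePath (joinPath, splitPath)

-- | Drop the first path component. Uses `drop 1` rather than `tail` so that
-- an empty path yields an empty result instead of an exception.
dropFirstDir :: FilePath -> FilePath
dropFirstDir = joinPath . drop 1 . splitPath
```

For example, dropFirstDir "old_site/conf/index.html" evaluates to "conf/index.html", which is the route the old page keeps on the new site.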

The final product

Overall we’re quite pleased with the end product. The site is very easy to use, both on the programmatic side and the content-creation side. Other people can successfully build the site and preview any changes they make without deploying, which is a particularly neat advantage over a true CMS like WordPress. It doesn’t require any particular accounts system or central administrator to set others up with access besides basic GitHub controls. It requires no onboarding to explain how to create content, including adding new students, pictures, talks, or publications.

The site has a “professional” feel to an extent—it looks good on phones for example, and the content feels as if there is a CMS on the backend. But it also has a cozy feel, like you’re looking at something some academics made for their department website. I know several of us liked the charm of the old website, at least to some extent, and I’d say this iteration of the website preserves that old school feeling with a definitively modern touch.

The priorities

Why Hakyll? Some of the major priorities for the new site included

  • Longevity
  • Ease of maintenance
  • Transparency
  • Flexibility

These points aren’t totally redundant.

  • Longevity: The tools used to write, compile, deploy (etc…) the site must be reasonably expected to exist in usable form in 5 years.
  • Ease of maintenance: The tools must be usable and customizable by other maintainers without an unusually significant learning curve.
  • Transparency: The tools must not run on black magic, even if they are easy to use.
  • Flexibility: The site must be able to accommodate departmental needs, even if via a reasonable escape hatch.

Bootstrap certainly satisfies these requirements as well as any other frontend design package.

Hakyll also meets the requirements. Contrary to the assertions of certain poorly-written articles, Haskell will exist in 5 years.[1] Hakyll itself was first uploaded to Hackage in 2009 and has massively expanded since then to become the primary static site generator in Haskell. It is reasonable to expect that any programming languages group will have people who know Haskell and probably even Hakyll for some time to come. Besides, using Haskell is a healthy form of dogfooding for a PL department.

Hakyll’s internals can be a little mysterious, but not too much so. Since it generates static sites only, its deployment is particularly simple. Hakyll is flexible enough to accommodate other components. Hakyll is also easy enough to use, except for Haskell’s general build infrastructure.

What else we considered

Shake

I gave a lot of thought to using Shake and Slick. Ultimately this was a dead end. The flexibility here is off the charts, but as far as I can tell Shake seems to run off of actual black magic. Don’t be fooled by the public exports, because the behind-the-scenes view is much worse. I wouldn’t feel comfortable using something this complicated, and I could hardly inflict this on some poor soul who inherits the repo. It’s also hard to say what this will look like in 5 years.

Jekyll

Nothing against Jekyll but I don’t know Ruby and it seems appropriate for a PL department to use Haskell.

Nix

I thought about Nix and Styx. This could have proved to be an extremely flexible and robust system, but at the end of the day the arguments against it are the same as for Shake. Besides, it’s not clear that using Nix actually gives you the sort of reproducibility that you might think. I’d say Nix derivations are unambiguously identified, but that doesn’t mean you can rebuild a Nix derivation in 5 years and reasonably expect it to work without modifications. At least, not without pinning Nixpkgs. (And Nix doesn’t try to offer this; it’s just a misimpression one might have.)

Wordpress or Drupal

Our wonderful Penn Database Group uses Drupal and it offers certain advantages… but no :) One of the biggest factors here is that by default our technology folks already had us set up with a means of deploying static content, which is honestly so convenient I would sacrifice other features for it. And like I said, dogfooding. (Come to think of it, the database group using a real CMS is probably dogfooding too.)

Bulma

Shoutout to Bulma! I love this frontend library and would probably have chosen it over Bootstrap if I had been doing the design, although our frontend team went with Bootstrap which I also think is a good choice.


  1. Even if only as a Coq backend.