running bash commands in jekyll (plus git commit info)

I started vibe-coding a little jekyll extension in Ruby to embed the git commit sha and timestamp into the blog, then accidentally came up with something even cooler.

the original goal - git info in the jekyll build output

When I push a change to the site, github takes a minute to build and publish the new version. I have to either navigate to the github page and watch the build status, or refresh the webpage over and over to see if it’s changed yet, while making sure the browser isn’t serving a cached copy. For subtle or invisible changes, it’s even harder to tell whether the new version is live.

I wanted to be able to have the sha and timestamp from the git commit embedded somewhere in the webpage where I could quickly check it.

I frequently embed the short commit hash in the build artifacts of other projects I work on, even the firmware ones. I find that it’s a more reliable way to track what code a software build was made from than a build number (which can be arbitrary and require manual association with a commit) or a version number (which typically requires manual updating and only gets changed at a formal release time). With the hash I can generate a URL that links straight back to that commit on github.

I also like to append a -dirty suffix to the sha if the working tree has any uncommitted changes (git status --short reports untracked files too).

the original solution

Since I’m not familiar with Ruby, I used a little extra AI help to quickly figure out the script. It was also helpful for figuring out the exact git log commands to get the timestamp in my preferred timezone.

For now it's on the about page so I can be sure of what version of the site I am looking at.

the git jekyll code

module Jekyll
  class GitCommitGenerator < Generator
    priority :highest
    def generate(site)
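      # Prefer the revision supplied by the build environment; fall back to the local short hash.
      commit_hash = ENV['JEKYLL_BUILD_REVISION'] || `git rev-parse --short HEAD`.strip
      # Mark builds made from a working tree with uncommitted or untracked changes.
      commit_hash += "-dirty" unless `git status --short`.strip.empty?
      site.config['commit_hash'] = commit_hash
      # Committer date of HEAD, rendered in the America/New_York timezone.
      site.config['commit_timestamp'] = `TZ="America/New_York" git log -1 --format=%cd --date=iso-local`.strip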
    end
  end
end
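
Following up on the commit URL idea from earlier, here is a minimal sketch of turning the stored hash into a link. The commit_url helper and the example-user/example-site repository are hypothetical placeholders for illustration, not part of the plugin; the only thing the sketch relies on is github’s usual /commit/<sha> URL layout.

```ruby
# Hypothetical helper, not part of the original plugin.
# repo_url is an assumed placeholder for the site's github repository URL.
def commit_url(repo_url, commit_hash)
  # A dirty build has no commit of its own, so link to the commit it was based on.
  sha = commit_hash.sub(/-dirty\z/, '')
  "#{repo_url.chomp('/')}/commit/#{sha}"
end

puts commit_url('https://github.com/example-user/example-site', 'abc1234-dirty')
# prints https://github.com/example-user/example-site/commit/abc1234
```

The generator could store the result in site.config alongside commit_hash, so a template can render the hash as a link.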

the next goal - embedding code

While starting to write this blog post, I was trying to figure out how best to embed or include the plugin code in the post. I could have just copied and pasted the code, as it currently is, into this post inside a code tag and been done with it…

But I have the file right here - why can’t I embed it somehow? However, since it’s in a directory with an underscore, jekyll won’t include it in the output by default.

Maybe I could just include the plugins folder manually? But that seems too heavy-handed, and I don’t want to be dumping other unnecessary files into the build output.

Since I’m including files relative to my posts with the jekyll-postfiles extension, maybe I could just symlink the file into this directory? Nah, that didn’t seem to work - the symlink was getting copied instead of the file… Maybe I could go the other way around, and have the actual code in this directory and have a symlink in the plugins folder to the file here? But that seems like poor organization and pretty brittle to have my actual code living inside some random blog post.

I started exploring some other relative-include solutions and found one that mentioned executables. Hmm, this got me thinking - could I just call cat _plugins/... from within the jekyll file? Or any other bash command?

Imagine the possibilities… I could invoke some build steps in other languages, or even fetch remote content and pipe it right into my markdown.

the bonus solution - executing system commands from within jekyll

Here’s the code I came up with to create a run_cmd liquid tag that lets me straight-up run commands and drop their stdout into the jekyll files as they are rendered.

module Jekyll
  class RunCommandTag < Liquid::Tag
    def initialize(tag_name, command, tokens)
      super
      @command = command.strip
    end

    def render(context)
      `#{@command}`.rstrip
    rescue => e
      "Error executing command: #{e.message}"
    end
  end
end

Liquid::Template.register_tag('run_cmd', Jekyll::RunCommandTag)

And here’s the markdown/liquid I had to write to generate the code block above:

```ruby
{% run_cmd cat _plugins/run_cmd.rb %}
```

Side-note - to generate this second code block, I had to escape the liquid using raw / endraw.

I could even use this to replace the git plugin I just made. I’ll get around to that later…

And just for the hell of it, here’s the output of grep --version on the runner that built the file you’re reading right now (i.e. run_cmd grep --version wrapped in liquid brackets, wrapped in a markdown code fence):

grep (GNU grep) 3.11
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others; see
<https://git.savannah.gnu.org/cgit/grep.git/tree/AUTHORS>.

grep -P uses PCRE2 10.42 2022-12-11

some security considerations

I did a quick test to make sure that a run_cmd ... invocation whose output contains another run_cmd ... won’t be recursively evaluated. If it were, and my command involved fetching some external file, a bad actor could modify that external file to make my builder run arbitrary commands. But thankfully that’s not the case.
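
A related gap worth knowing about when a command fetches external content: backticks return whatever the command printed even if it exits non-zero, and the rescue in the tag only fires when the command cannot be started at all. Below is a minimal sketch of a stricter runner using Open3 from the standard library. The run_cmd_checked name is hypothetical and this is not the plugin’s actual code, just one way it could be hardened.

```ruby
require 'open3'

# Hypothetical helper, not part of the original plugin.
# Runs a command and returns its stdout, raising if it exits non-zero.
def run_cmd_checked(command)
  stdout, stderr, status = Open3.capture3(command)
  unless status.success?
    raise "command failed (exit #{status.exitstatus}): #{command}\n#{stderr.strip}"
  end
  stdout.rstrip
end

puts run_cmd_checked('echo hello')
```

Letting the exception propagate out of render should make the build fail loudly instead of publishing a page with half of a download in it; whether that is preferable to an inline error message is a judgment call.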

some performance considerations

If I include a file using another liquid directive, and that file contains a run_cmd, then it gets evaluated, which is nice. But this means that if I were to use a run_cmd in a common file that’s included in multiple places, such as _includes/footer.html, the command would be invoked multiple times, once for each page that includes the footer. So this kind of tag, which runs at render time, is best kept in individual pages and posts rather than in shared includes or layouts.
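
If a run_cmd really does belong in a shared include, one option is to cache the output by command string so each distinct command runs once per build. This is a minimal sketch under that assumption; RunCmdCache is a hypothetical name and nothing like it exists in the plugin above.

```ruby
# Hypothetical cache, not part of the original plugin.
module RunCmdCache
  @results = {}

  # Returns the cached output for `command`, running it only on first use.
  def self.fetch(command)
    @results.fetch(command) { @results[command] = `#{command}`.rstrip }
  end

  # Forgets all cached output, e.g. between rebuilds.
  def self.clear
    @results.clear
  end
end

first  = RunCmdCache.fetch('echo cached')
second = RunCmdCache.fetch('echo cached') # served from the hash, no second process
puts first == second
```

Inside the tag, render would call RunCmdCache.fetch(@command) instead of using backticks directly. The cache lives as long as the Ruby process, so under jekyll serve it would need clearing between rebuilds (for example from a :site, :after_reset hook), otherwise a changed file would keep showing its old contents.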