Written: 21-Jan-2023 (Updated: 24-Mar-2024)
So I wrote my own site generator… sorta. It’s a script that just loops a single pandoc command over a directory of markdown files. And despite thinking about custom headers, automatic dates and parsing headers on markdown files, it just doesn’t need to be that complex. So well done me on not over-engineering it (as may or may not be a perennial habit of mine).
TL;DR: for fun.
So previously I was using Hugo, so why change?
- I spent as much time making sure the features worked as I would have if I’d just written them by hand (with some copy-paste from simple templates).
- This is a function of the fact that I write so little, and so sporadically, that I forget things.
- The benefit is I can make each page as unique as I want without worrying about breaking other pages when they next get generated.
Also, Hugo’s function stuff wasn’t working too swell with my fav markdown notes database, Obsidian. This method lets me link a folder for the website into the “vault”, where it plays 95% nicely with my systems.
Convert markdown into HTML.
It’ll insert a header and footer file, and link an appropriate style-sheet.
Beyond that, not much.
Update: I now have a separate RSS generator. It works off the index file and seems to work okay. See my notes on it.
Or, more importantly, what could it do that I’d like it to, but that’s not worth the effort?
- A table of contents (--toc): I spent too long chasing it.
This, however, can all be done manually. Given the volume of writing I do (i.e. a couple of times a year), the cost-benefit works out well.
Another thing this script cannot (or shouldn’t) do is overwrite existing HTML files. I consider this a feature, not a bug². This means that to update from the markdown, you’ll need to manually delete the existing HTML and regenerate it. No overwriting means that you can go in and manually massage the HTML file to whatever your heart desires, and not worry about it being destroyed or causing an error the next time you write something.
The downside is that it makes updating a footer a pain if you have been massaging the HTML. If not, just delete the output folder 🙂️
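Getting a page back from the markdown is just delete-then-rebuild. A minimal sketch of that as two shell helpers, assuming the ./md to ./site layout the script uses (sub-dirs mirrored); regen_page and clean_site are made-up names, not part of my script:

```shell
#!/bin/sh
# Hypothetical helpers, not part of the build script.
# Assumes markdown lives under ./md and HTML under ./site, with sub-dirs mirrored.

# Delete the generated HTML for one markdown file so the next build recreates it.
regen_page() {
  md_file=$1                      # e.g. ./md/notes/foo.md
  rel=${md_file#./md/}            # -> notes/foo.md
  html="./site/${rel%.md}.html"   # -> ./site/notes/foo.html
  if [ -f "$html" ]; then
    rm -- "$html"
    echo "Removed $html"
  fi
}

# Remove the whole output folder, so every page is rebuilt on the next run.
clean_site() {
  rm -rf -- ./site
}
```

Run either one, then re-run the script; only the missing pages get converted.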
I do want to take a second and just say: “Wow, pandoc… Wow”. Can you blame me? First things first, the basics are already pretty neat. Secondly, it seems to do everything. I was starting to think about how to put in a header, footer, etc., but after reading the docs, it turns out it’s all there. I reckon (if I chased it) I could get a good --toc going, even if I had to delve into a filter³ or something.
So the meat of the operation is the pandoc line in the script. The rest is just a wrapper. Minus the shell variables, it goes as such:
pandoc -f markdown -t html -c style.css -H head.html -s --metadata pagetitle="DMW" -B header.html -A footer.html --highlight-style tango --mathml test.md > test.html
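Written out one option per line, the same command can be easier to read against the notes on each flag. A sketch only: build_page is a hypothetical function name, and the file names are the ones from the command above.

```shell
#!/bin/sh
# Hypothetical wrapper: the pandoc line above, one option per line.
build_page() {
  in_md=$1      # e.g. test.md
  out_html=$2   # e.g. test.html
  pandoc \
    -f markdown \
    -t html \
    -c style.css \
    -H head.html \
    -s \
    --metadata pagetitle="DMW" \
    -B header.html \
    -A footer.html \
    --highlight-style tango \
    --mathml \
    "$in_md" > "$out_html"
}
```

Usage would be build_page test.md test.html. Note that sh doesn’t allow a comment after a trailing backslash, so notes on each flag still have to live outside the command.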
So, all those arguments, eh? These are my notes on them:
-f markdown: from markdown. This is pandoc-flavoured markdown, not my usual GitHub-flavoured, but it works for footnotes.
-t html: to HTML.
-s: standalone, meaning it writes a “full” HTML file (with <head> and <body>), not just a body fragment.
-c style.css: link the style.css style-sheet.
-H head.html: include any extra bits for the <head> portion of the file.
--metadata pagetitle="DMW": set the title of the page (the <title>) to “DMW”. Setting title instead would also make it put the title in as another h1 element.
-B header.html: include header.html at the top of the body.
-A footer.html: append footer.html to the bottom.
--highlight-style tango: syntax highlighting in one of the pre-set styles.
--mathml: convert equations to MathML, which the browser should support natively.
And that’s it. Notably, there’s no --toc flag, because the table of contents is not worth it (wrong place, style, level, etc.), despite how swish it had the potential to be.
Interestingly, the --self-contained (or -s --embed-resources) option didn’t work for me. It would be a clean way to get round some of the finagling required for the CSS sheet to work with sub-dirs, but oh well…
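Most of that finagling is working out how many ../ hops lead from a page in a sub-dir back to the site root, where style.css gets copied. A sketch of a loop-based alternative to the tr/cut/sed chain in the script below, assuming input paths look like ./md/<sub-dirs>/<page>.md (as find produces them there); css_prefix is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: relative path from a generated page back to the site
# root (where style.css lives). Assumes paths look like ./md/<sub>/<page>.md
css_prefix() {
  rel=${1#./md/}        # strip the input folder, e.g. sub/dir/page.md
  prefix="."
  while :; do
    # Each remaining "/" is one sub-directory level below the root.
    case $rel in
      */*) rel=${rel#*/} ;;
      *) break ;;
    esac
    if [ "$prefix" = "." ]; then prefix=".."; else prefix="$prefix/.."; fi
  done
  printf '%s\n' "$prefix"
}
```

It would slot into the pandoc line as -c "$(css_prefix "$file")/style.css". For ./md/a.md, ./md/sub/a.md and ./md/x/y/a.md it gives ., .. and ../.., the same values the script’s CSS_DIR line produces.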
Also want to shout out this CSS guide as it was perfect for the touch of CSS I wanted to make this look simple, but readable. Also see the advice on 55 bytes of CSS for similar ideas.
Shout out to Alex Rutar for some inspiration.
The script, for those interested (or in case I’ve not made it public on GitHub):
#!/bin/sh
FOLD_IN="./md" # Source files location (a.k.a. input folder)
# No Sub Dirs!
FOLD_OUT="./site" # Website files location (a.k.a. output folder)
RESC_DIR="./resources"
IMG_FOLD_NAME="files"
# Finding all the markdown files
find "$FOLD_IN" -name '*.md' > filelist.temp
# Looping over all the markdown files
while IFS= read -r file
do
    # Getting the filename
    # In future might preserve the sub dir.
    FILE_NAME=$(basename "$file" .md)
    SUB_DIR=$(dirname "$file" | cut -c 5-)
    # Hacky way to get the CSS dir to work
    CSS_DIR=$(echo "$file" | tr -cd "/" | cut -c 3- | sed "s|/|../|g" | sed "s|.$||")
    if [ "$CSS_DIR" = "" ]
    then
        CSS_DIR="."
    fi
    # Making the subdir if required
    [ ! -d "$FOLD_OUT$SUB_DIR" ] && mkdir -p "$FOLD_OUT$SUB_DIR"
    # Output filename w/ folders
    OUT_NAME="$FOLD_OUT$SUB_DIR/$FILE_NAME"
    #echo "$OUT_NAME.html from $file"
    # Checking if the file exists
    if [ -f "$OUT_NAME.html" ]; then
        echo "Exists (delete to overwrite): $OUT_NAME.html"
    else
        echo "Converting $file"
        # Running the pandoc line
        pandoc -f markdown -t html -c "$CSS_DIR/style.css" -H "$RESC_DIR/head.html" -s --metadata pagetitle="DMW" -B "$RESC_DIR/header.html" -A "$RESC_DIR/footer.html" --highlight-style tango --mathml "$file" > "$OUT_NAME.html"
        # Changing the ".md" links to ".html" links so it works with both pandoc and Obsidian
        # Not perfect, but if anything else gets changed I'll be shocked.
        # (\. = a literal dot; g = every link on the line, not just the first)
        sed -i 's/\.md">/.html">/g' "$OUT_NAME.html"
    fi
done < filelist.temp
# Copying the `.css` file
cp "$RESC_DIR/style.css" "$FOLD_OUT/style.css"
# Copying the image folder
cp -r "$FOLD_IN/$IMG_FOLD_NAME/" "$FOLD_OUT/"
# Copying a static folder? - Just copy it manually.
#cp -r "$RESC_DIR"/static/* "$FOLD_OUT/"
rm filelist.temp
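The .md"> substitution in the script is a bit blunt: it rewrites any text containing .md">, not only link targets. A slightly tighter sketch that only rewrites local href targets ending in .md (optionally followed by a #heading), assuming pandoc writes links as href="..." with double quotes; fix_links is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: turn links to local .md files into .html links.
# Only touches href="..." values; anything containing ":" (e.g. https://...)
# is skipped, and a trailing #heading is kept. Filters stdin to stdout.
fix_links() {
  sed 's/\(href="[^":#]*\)\.md\([#"]\)/\1.html\2/g'
}
```

Used as fix_links < page.html > page.tmp && mv page.tmp page.html, which also sidesteps sed -i being a GNU extension (BSD/macOS sed wants -i '').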
1. In fairness, the messing around with images was half of Hugo’s issues for me. One potential solution could be post-processing the generated HTML with jampack, which does image optimisation (and more).
2. Esp. because I had to wire it in explicitly.
3. The pandoc filters look immensely powerful, but I must restrain myself from over-engineering. 🥸️