
Thinking about publishing and maintaining a personal blog based on pure HTML/CSS, I was confronted with the question of whether I really wanted to spend so much time and energy modifying the HTML file by hand. So what did I do? I spent time and energy building a program in Python that integrates my Markdown files, one of which holds the content of the post you are reading right now, into the HTML file that is used to build the website (probably) in front of your nose.

How does it work?

The foundation of my Python program is a reference point in the HTML. I chose the visual header you see at the top of my blog as that reference point, so the program injects the next post directly after the header div.
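The injection step could look roughly like the following. It is only a sketch: the function name `insert_after_marker`, the comment marker and the example page are assumptions made for illustration, not the names used in the actual program.

```python
def insert_after_marker(page_html, marker, post_html):
    """Return page_html with post_html inserted right after the first marker."""
    position = page_html.find(marker)
    if position == -1:
        raise ValueError("reference point not found in the html")
    cut = position + len(marker)
    return page_html[:cut] + post_html + page_html[cut:]


# hypothetical page: a header div, a marker comment and one older post
page = '<div id="header">My Blog</div><!-- posts --><div class="post">old</div>'
page = insert_after_marker(page, "<!-- posts -->", '<div class="post">new</div>')
print(page)
```

Because every new post lands directly after the reference point, the post inserted last ends up at the top of the page.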

The Markdown file is split into sections and lines, which are scanned one by one. Every section is surrounded by opening and closing p tags, and every line break within a section becomes a br tag in front of the next line.
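As a rough illustration of that wrapping, here is a minimal sketch. It assumes that sections are separated by one empty line, and the name `wrap_sections` is made up for this example.

```python
def wrap_sections(markdown_body):
    """Wrap blank-line-separated sections in <p> and join their lines with <br>."""
    paragraphs = []
    for section in markdown_body.split("\n\n"):
        lines = [line for line in section.split("\n") if line]
        if not lines:
            continue  # skip empty sections caused by extra blank lines
        paragraphs.append("<p>" + "<br>".join(lines) + "</p>")
    return "\n".join(paragraphs)


example = "first line\nsecond line\n\nnext section"
print(wrap_sections(example))
# <p>first line<br>second line</p>
# <p>next section</p>
```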

Metadata

You may have noticed some additional information at the beginning of the post. How did I do that? Well, everything we need for the post is in the Markdown file. We use standard Markdown frontmatter, which includes a timestamp (date and time) as well as a title. You can also mark your Markdown file as a draft, which decides whether it gets integrated into the HTML.

---
timestamp: 24. April 2024 - 19:47
title: Markdown into single-html blog programmed
draft: False
---

First, the frontmatter section is extracted by splitting the file's content at "---". Its lines are then analysed one by one: the leading keyword of each line becomes a dictionary key, and the text after the first colon becomes its value:

def getMdHeaderParams(md):
    # the frontmatter is the text between the first two "---" markers
    markdownHeader = md.split("---")[1]
    markdownHeaderItems = markdownHeader.split("\n")[1:-1]
    markdownHeaderItems = list(filter(None, markdownHeaderItems)) # MarkText just puts empty lines around the frontmatter properties
    markdownHeaderItemsDict = {}
    for markdownHeaderItem in markdownHeaderItems:
        # split at the first colon only, so values like "19:47" stay intact
        key, value = markdownHeaderItem.split(":", 1)
        markdownHeaderItemsDict[key] = value[1:]  # drop the space after the colon

    return markdownHeaderItemsDict
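To show what comes out for the frontmatter above, here is a compact variant of the same idea. It is a sketch, not the code the blog runs: `parse_frontmatter` is a made-up name and it uses `str.partition` instead of slicing.

```python
def parse_frontmatter(md):
    """Return the frontmatter between the first two '---' markers as a dict."""
    header = md.split("---")[1]
    params = {}
    for line in header.splitlines():
        if not line.strip():
            continue  # ignore empty lines around the properties
        key, _, value = line.partition(":")  # splits at the first colon only
        params[key.strip()] = value.strip()
    return params


post = """---
timestamp: 24. April 2024 - 19:47
title: Markdown into single-html blog programmed
draft: False
---

Post content starts here.
"""
params = parse_frontmatter(post)
print(params["draft"] == "False")  # the value is a string, not a boolean
```

All values stay strings, which is why the draft flag has to be compared against the text "True" later on.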

Then, these values are added to the html text:

html = ""
html += wrapText("<p>", mdHeaderParams["timestamp"], class_="timestamp")
html += wrapText("<h1>", mdHeaderParams["title"], class_="title")
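The helper wrapText() is not shown in this post. Judging only from the two calls above, it might look something like this. The body is a guess, so treat it as a sketch under that assumption.

```python
def wrapText(tag, text, class_=None):
    """Wrap text in the given tag, optionally adding a class attribute."""
    name = tag.strip("<>")  # "<p>" -> "p"
    attributes = f' class="{class_}"' if class_ else ""
    return f"<{name}{attributes}>{text}</{name}>"


print(wrapText("<p>", "24. April 2024 - 19:47", class_="timestamp"))
# <p class="timestamp">24. April 2024 - 19:47</p>
```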

Translating Markdown symbols

You may wonder how I implemented the translation of Markdown markup symbols, like those for bold or italic text, into HTML syntax. The title is the exception: as the code block above shows, it is simply wrapped in h1, the same way paragraphs are wrapped in p.

After adding the timestamp and title, turning Markdown # headlines into h1 headlines, wrapping p tags around sections and div tags around the content of the post, the Markdown symbols are translated: every odd occurrence of a symbol (the first, the third, and so on) is replaced by the corresponding opening HTML tag and every even occurrence by the closing one. A leftover single symbol at the end is ignored, so there won't be an opening tag without a closing one. Still, I am noticing some weaknesses while writing this post. I suppose a stray single symbol could be interpreted as an opening tag before the intended opening tag, which then becomes the corresponding closing tag, messing things up.

The html is passed to translateSymbols(), where every occurrence of a key of the dictionary "translation" (the Markdown symbols) is replaced by the tags in its dictionary value. The result is returned and used for the integration into the HTML file:

def translateSymbol(html, mdSymbol, htmlSymbol):
    # htmlSymbol holds [opening tag, closing tag]
    count = str(html).count(mdSymbol)
    # only complete pairs are translated, a leftover single symbol stays as it is
    for i in range(count // 2):
        html = html.replace(mdSymbol, htmlSymbol[0], 1)
        html = html.replace(mdSymbol, htmlSymbol[1], 1)

    return html

def translateSymbols(html):
    # important to list * after **
    translation = {
        "```": ['<pre class="code">', "</pre>"],
        "**": ["<b>", "</b>"],
        "*": ["<i>", "</i>"]
    }

    for mdSymbol in translation:
        html = translateSymbol(html, mdSymbol, translation[mdSymbol])

    return html

But somehow I have to make sure that at least code blocks are skipped. If you could see what I am seeing right now.. Embedding the code parts in code tags helps, but the programmatic method of setting those needs adjustment because of markdown symbols within the code. Codeblocks seem to be the trickiest part. With the messiest consequences.
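One way to skip code blocks is to split the text at the code fences and translate only the parts outside of them. The following is a sketch of that idea, not necessarily the workaround the blog ended up with. The names `translate_pairs` and `translate_outside_code` are made up, and it assumes fences always come in pairs.

```python
FENCE = "`" * 3  # the Markdown code fence, built this way to keep the example readable


def translate_pairs(text, md_symbol, opening, closing):
    """Replace pairs of md_symbol with opening/closing tags; a leftover single symbol stays."""
    for _ in range(text.count(md_symbol) // 2):
        text = text.replace(md_symbol, opening, 1)
        text = text.replace(md_symbol, closing, 1)
    return text


def translate_outside_code(text):
    """Translate ** and * only outside of fenced code blocks."""
    parts = text.split(FENCE)
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # odd parts sit between two fences: keep the code itself untouched
            parts[index] = '<pre class="code">' + part + "</pre>"
        else:
            part = translate_pairs(part, "**", "<b>", "</b>")  # ** before *
            parts[index] = translate_pairs(part, "*", "<i>", "</i>")
    return "".join(parts)


sample = "some **bold** text " + FENCE + "a = b * c * d" + FENCE + " and *italic*"
print(translate_outside_code(sample))
# some <b>bold</b> text <pre class="code">a = b * c * d</pre> and <i>italic</i>
```

The trade-off is that symbols are now paired per text part, so a bold or italic span can no longer stretch across a code block.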

A little time later: It works. I lost track, but it works.

Rewriting the beginning is probably advisable..

Also, some code parts still have unnecessarily big (probably double) spaces, and the same goes for the space after code blocks, but that's for another time. The latter can be passed off as an intentional modern design choice, I guess.

PS: all fixed with workarounds now, as of 26. April 2024

PPS: it gets better and better :)

Sorting Posts

The posts are inserted into the HTML with a for-loop in insertAllPosts(), which iterates over a sorted list of filenames returned by sortDirForTimestamp().

First, the md/ directory is listed. Then, the timestamp of each file is extracted and appended to a list, together with the filename, as a nested list.

timeFiles.append([timestamp, mdFilename])

Finally, the timestamps are converted into datetime objects and inserted into the timeFileObjects list, again together with the filename. That list is sorted, and the filenames are collected in filenamesSortedForTimestamp, which is used by the for-loop mentioned before to insert one post after another.

import datetime
import os

def sortDirForTimestamp(mdDir):
    # mdDir is assumed to be the path of the Markdown directory, e.g. "md/"
    months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]

    timeFiles = []
    timeFileObjects = []
    filenamesSortedForTimestamp = []

    for mdFilename in os.listdir(mdDir):
        with open(os.path.join(mdDir, mdFilename)) as f:
            md = f.read()
            mdHeaderParams = getMdHeaderParams(md)
            timestamp = mdHeaderParams["timestamp"]

            # drafts are left out
            if mdHeaderParams["draft"] != "True":
                timeFiles.append([timestamp, mdFilename])

    for date in timeFiles:
        # timestamp format: "24. April 2024 - 19:47"
        dateTimeSplit = date[0].split(" - ")

        time = dateTimeSplit[1]
        timeSplit = time.split(":")
        hour = timeSplit[0]
        minute = timeSplit[1]

        dateSplit = dateTimeSplit[0].split(" ")
        day = dateSplit[0][:-1]  # drop the trailing dot
        # string month to int of month (i.e. April -> 4)
        month = months.index(dateSplit[1]) + 1
        year = dateSplit[2]

        timeFileObjects.append([datetime.datetime(int(year), int(month), int(day), int(hour), int(minute)), date[1]])

    datetimeObjectsSorted = sorted(timeFileObjects, key=lambda dateDict: dateDict[0])
    for datetimeItem in datetimeObjectsSorted:
        filenamesSortedForTimestamp.append(datetimeItem[1])

    return filenamesSortedForTimestamp
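The manual date parsing could probably be shortened with datetime.strptime(). This is a sketch of an alternative, not what the program does: the filenames are invented, and %B only matches English month names as long as the program runs with the default (English) locale.

```python
import datetime

# matches timestamps like "24. April 2024 - 19:47"
TIMESTAMP_FORMAT = "%d. %B %Y - %H:%M"


def parse_timestamp(timestamp):
    """Convert a frontmatter timestamp string into a datetime object."""
    return datetime.datetime.strptime(timestamp, TIMESTAMP_FORMAT)


# hypothetical [timestamp, filename] pairs, as collected in timeFiles
posts = [
    ["24. April 2024 - 19:47", "markdown-blog.md"],
    ["3. March 2024 - 08:05", "older-post.md"],
]
posts_sorted = sorted(posts, key=lambda post: parse_timestamp(post[0]))
filenames = [filename for _, filename in posts_sorted]
print(filenames)
# ['older-post.md', 'markdown-blog.md']
```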

Wanna try?

At the moment of writing this blog and its tools, it's not public. But since I am thinking about publishing it on GitLab Codeberg, you may be able to view the source of those as you are reading. I am sure you will find it, if you haven't already.

Did you already think about starting a blog? A basic blog isn't really technically hard to set up. I am just nerding around. Hosted with GitLab Codeberg Pages, or similar, it's free too.