Markdown into single-html blog programmed
24. April 2024 - 19:47
Thinking about publishing and maintaining a personal blog based on pure html/css, I was confronted with the question of whether I really wanted to spend so much time and energy modifying the HTML file by hand. So what did I do? I spent time and energy building a program in Python that integrates my Markdown files, one of which holds the content of the post you are reading right now, into the html file that is used to build the website (probably) in front of your nose.
How does it work?
My Python program depends on a reference point in the html. I chose the visual header you see at the top of my blog. So, the program injects the next post directly after the header div.
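As a rough illustration of the reference-point idea, here is a minimal sketch. The function name insert_after_header, the class name "header" and the sample page are assumptions made up for this example and are not taken from the blog's actual code. It also assumes the header div contains no nested div.

```python
def insert_after_header(page_html, post_html, header_open='<div class="header">'):
    """Insert post_html directly after the (assumed) header div of page_html."""
    start = page_html.find(header_open)
    if start == -1:
        raise ValueError("header div not found")
    # Assumption: the header has no nested divs, so the next </div> closes it
    end = page_html.find("</div>", start)
    if end == -1:
        raise ValueError("header div is not closed")
    end += len("</div>")
    return page_html[:end] + "\n" + post_html + page_html[end:]


# Hypothetical page with one older post already in place
page = '<body><div class="header">Blog</div><div class="post">old</div></body>'
page = insert_after_header(page, '<div class="post">new</div>')
print(page)
```

Because the insert position is always right behind the header, the post inserted last ends up on top.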
The Markdown file is split into sections and lines, which are scanned one by one. Every section is surrounded by opening and closing p tags, and every line break becomes one br tag in front of the next line.
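The splitting step could look roughly like the sketch below. It assumes that sections are separated by blank lines, and the function name sections_to_html is invented for this example.

```python
def sections_to_html(md_body):
    """Wrap each blank-line separated section in <p> tags and join its lines with <br>."""
    paragraphs = []
    for section in md_body.split("\n\n"):
        section = section.strip()
        if not section:
            continue  # skip empty sections caused by extra blank lines
        lines = section.split("\n")
        # Every line break becomes a <br> in front of the next line
        paragraphs.append("<p>" + "<br>".join(lines) + "</p>")
    return "\n".join(paragraphs)


print(sections_to_html("first line\nsecond line\n\nnext section"))
# <p>first line<br>second line</p>
# <p>next section</p>
```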
Metadata
You may have noticed some additional information at the beginning of the post. How did I do that? Well, everything we need for the post is in the Markdown file. We use frontmatter, a common convention at the top of Markdown files, which includes a timestamp (date and time) as well as a title. You can also set whether your Markdown file is a draft, and it only gets integrated into the html if it is not.
---
timestamp: 24. April 2024 - 19:47
title: Markdown into single-html blog programmed
draft: False
---
First, the frontmatter section is extracted by splitting the file's content at "---". Then the section's lines are analysed one by one, and each leading keyword is added to a dictionary together with its associated value:
def getMdHeaderParams(md):
    # The frontmatter sits between the first two "---" separators
    markdownHeader = md.split("---")[1]
    markdownHeaderItems = markdownHeader.split("\n")[1:-1]
    markdownHeaderItems = list(filter(None, markdownHeaderItems)) # MarkText just puts empty lines around the frontmatter properties
    markdownHeaderItemsDict = {}
    for markdownHeaderItem in markdownHeaderItems:
        # Split at the first ":" only, so values like "19:47" stay intact
        key, value = markdownHeaderItem.split(":", 1)
        markdownHeaderItemsDict[key] = value[1:] # drop the space after the colon
    return markdownHeaderItemsDict
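One detail worth keeping in mind is that every frontmatter value stays a string, so the draft flag has to be compared against the text "True". The sketch below shows this with a slightly different parsing approach based on str.partition. The names parse_front_matter and is_draft are made up for this example, and treating a missing draft key as "not a draft" is an assumption.

```python
def parse_front_matter(md):
    """Return the frontmatter between the first two '---' separators as a dict of strings."""
    header = md.split("---")[1]
    params = {}
    for line in header.splitlines():
        # partition splits at the first ":" only, so "19:47" is not cut apart
        key, separator, value = line.partition(":")
        if separator:
            params[key.strip()] = value.strip()
    return params


def is_draft(params):
    # Values are plain strings; a missing key counts as "not a draft" (assumption)
    return params.get("draft", "False") == "True"


post = (
    "---\n"
    "timestamp: 24. April 2024 - 19:47\n"
    "title: Markdown into single-html blog programmed\n"
    "draft: False\n"
    "---\n"
    "Post content\n"
)
params = parse_front_matter(post)
print(params["timestamp"], is_draft(params))
```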
Then, these values are added to the html text:
html = ""
html += wrapText("<p>", mdHeaderParams["timestamp"], class_="timestamp")
html += wrapText("<h1>", mdHeaderParams["title"], class_="title")
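The wrapText helper itself is not shown in this post. A stand-in that fits the calls above might look like the following sketch. The name wrap_text and its behaviour are guesses based only on how the helper is called here.

```python
def wrap_text(tag, text, class_=None):
    """Wrap text in the given tag, e.g. "<p>", optionally adding a class attribute."""
    name = tag.strip("<>")  # "<h1>" -> "h1"
    attributes = f' class="{class_}"' if class_ else ""
    return f"<{name}{attributes}>{text}</{name}>"


print(wrap_text("<p>", "24. April 2024 - 19:47", class_="timestamp"))
# <p class="timestamp">24. April 2024 - 19:47</p>
```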
Translating Markdown symbols
You may wonder how I have implemented the translation of Markdown markup symbols, like those for bold or italic text, into html syntax. The title is the exception: it is just wrapped in h1, like paragraphs are wrapped in p, as indicated in the code block above.
After adding timestamp and title, converting Markdown # headlines to h1 headlines, wrapping p tags around sections and div tags around the content of the post, the Markdown symbols are replaced in pairs: the first, third, fifth occurrence and so on becomes the corresponding opening html element, and every occurrence in between becomes the closing one. A trailing unpaired symbol is ignored, so there won't be an opening element without a closing one. Still, I am noticing some weaknesses while writing this post. I suppose a stray single symbol could be interpreted as an opening tag before the intended opening tag, and the intended opening tag as the corresponding closing tag, messing things up.
The html is passed to the function, where every string equal to a key of the dictionary "translation" (the Markdown symbols) is replaced by its dictionary value. The result is returned and used for the integration into the html file:
def translateSymbol(html, mdSymbol, htmlSymbol):
    count = str(html).count(mdSymbol)
    # Replace the symbols in pairs; a trailing unpaired symbol is left untouched
    for i in range(count // 2):
        html = html.replace(mdSymbol, htmlSymbol[0], 1) # opening element
        html = html.replace(mdSymbol, htmlSymbol[1], 1) # closing element
    return html

def translateSymbols(html):
    # important to list * after **
    translation = {
        "```": ['<pre class="code">', "</pre>"],
        "**": ["<b>", "</b>"],
        "*": ["<i>", "</i>"]
    }
    for mdSymbol in translation:
        html = translateSymbol(html, mdSymbol, translation[mdSymbol])
    return html
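The weakness with stray symbols described earlier can be reproduced in a few lines. The sketch below repeats the pairwise replacement in a compact form (renamed to translate_symbol so it runs on its own) and feeds it a sentence with a lone asterisk. The sample sentences are invented for this example.

```python
def translate_symbol(html, md_symbol, html_tags):
    # Same pairwise idea as above: first occurrence opens, second closes
    for _ in range(html.count(md_symbol) // 2):
        html = html.replace(md_symbol, html_tags[0], 1)
        html = html.replace(md_symbol, html_tags[1], 1)
    return html


italic = ["<i>", "</i>"]

# Properly paired symbols work as intended
print(translate_symbol("a *b* c", "*", italic))
# a <i>b</i> c

# A stray * before the intended pair shifts the pairing by one
print(translate_symbol("2 * 3 is *six*", "*", italic))
# 2 <i> 3 is </i>six*
```

In the second case the multiplication sign is taken as the opening symbol, the intended opening symbol becomes the closing tag, and the last asterisk is left over.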
But somehow I have to make sure that at least code blocks are skipped. If you could see what I am seeing right now... Embedding the code parts in code tags helps, but the programmatic method of setting those needs adjustment because of Markdown symbols within the code. Code blocks seem to be the trickiest part, with the messiest consequences.
A little time later: It works. I lost the overview, but it works.
Rewriting the beginning is probably advisable..
Also, some code parts still have unnecessarily big (probably double) spaces, and the same goes for the spaces after code blocks, but that's for another time. The latter can be passed off as an intentional modern design choice, I guess.
PS: all worked around and fixed now, as of 26. April 2024
PPS: it gets better and better :)
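One possible way to keep Markdown symbols inside code blocks untouched is to split the text at the triple-backtick fences first and translate only the parts outside of them. This is a sketch of the general idea and not necessarily how the blog's program ended up solving it. The names translate_inline and translate_skipping_code are hypothetical.

```python
FENCE = "`" * 3  # the Markdown code fence, three backticks


def translate_inline(text):
    """Pairwise replacement of ** and * outside of code blocks (** must come first)."""
    for symbol, (open_tag, close_tag) in (("**", ("<b>", "</b>")), ("*", ("<i>", "</i>"))):
        for _ in range(text.count(symbol) // 2):
            text = text.replace(symbol, open_tag, 1)
            text = text.replace(symbol, close_tag, 1)
    return text


def translate_skipping_code(text):
    segments = text.split(FENCE)
    # An even number of segments means the last fence was never closed:
    # glue it back onto the preceding text instead of opening a code block
    if len(segments) % 2 == 0:
        segments[-2:] = [segments[-2] + FENCE + segments[-1]]
    parts = []
    for index, segment in enumerate(segments):
        if index % 2 == 1:
            # Odd segments sit between two fences: code, copied unchanged
            parts.append('<pre class="code">' + segment + "</pre>")
        else:
            parts.append(translate_inline(segment))
    return "".join(parts)


sample = "**bold** " + FENCE + "x = a * b * c" + FENCE + " *italic*"
print(translate_skipping_code(sample))
# <b>bold</b> <pre class="code">x = a * b * c</pre> <i>italic</i>
```

The asterisks inside the code block survive because that segment never reaches translate_inline.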
Sorting Posts
The posts are inserted into the html with a for-loop in insertAllPosts(). The loop iterates over a sorted list of filenames returned by sortDirForTimestamp().
First, the md/ directory is listed. Then, for each file, the timestamp is extracted and appended as a nested list, together with the filename.
timeFiles.append([timestamp, mdFilename])
Finally, the timestamps are converted into datetime objects and inserted into the timeFileObjects list, again together with the filename. That list is then sorted, and the filenames are collected in filenamesSortedForTimestamp. That is used for the for-loop mentioned before, to insert one post after another.
import datetime
import os

def sortDirForTimestamp(mdDir):
    months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
    timeFiles = []
    timeFileObjects = []
    filenamesSortedForTimestamp = []
    # Collect [timestamp, filename] for every post that is not a draft.
    # The directory passed in is used instead of a hard-coded "md/".
    for mdFilename in os.listdir(mdDir):
        with open(os.path.join(mdDir, mdFilename)) as f:
            md = f.read()
        mdHeaderParams = getMdHeaderParams(md)
        timestamp = mdHeaderParams["timestamp"]
        if mdHeaderParams["draft"] != "True":
            timeFiles.append([timestamp, mdFilename])
    # Convert timestamps like "24. April 2024 - 19:47" into datetime objects
    for date in timeFiles:
        dateTimeSplit = date[0].split(" - ")
        time = dateTimeSplit[1]
        timeSplit = time.split(":")
        hour = timeSplit[0]
        minute = timeSplit[1]
        dateSplit = dateTimeSplit[0].split(" ")
        day = dateSplit[0][:-1]
        # string month to int of month (i.e. April -> 4)
        month = months.index(dateSplit[1]) + 1
        year = dateSplit[2]
        timeFileObjects.append([datetime.datetime(int(year), int(month), int(day), int(hour), int(minute)), date[1]])
    datetimeObjectsSorted = sorted(timeFileObjects, key=lambda dateDict: dateDict[0])
    for datetimeItem in datetimeObjectsSorted:
        filenamesSortedForTimestamp.append(datetimeItem[1])
    return filenamesSortedForTimestamp
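The manual splitting of the timestamp could also be handed over to datetime.strptime. The following is a sketch of that alternative, not the code the blog uses. It assumes English month names and Python's default locale, and the filenames in the example are invented.

```python
from datetime import datetime

# Matches timestamps like "24. April 2024 - 19:47"
TIMESTAMP_FORMAT = "%d. %B %Y - %H:%M"


def parse_timestamp(timestamp):
    # %B expects the full month name in the current locale (assumed English here)
    return datetime.strptime(timestamp, TIMESTAMP_FORMAT)


def sort_by_timestamp(time_files):
    """Take [timestamp, filename] pairs and return the filenames, oldest first."""
    ordered = sorted(time_files, key=lambda pair: parse_timestamp(pair[0]))
    return [filename for _, filename in ordered]


time_files = [
    ["26. April 2024 - 08:00", "second-post.md"],
    ["24. April 2024 - 19:47", "first-post.md"],
]
print(sort_by_timestamp(time_files))
# ['first-post.md', 'second-post.md']
```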
Wanna try?
Yes, the section's title is a nod to Wanna cry.
At the moment of writing this blog and its tools, it's not public. But as I am thinking about publishing it on Codeberg, you may be able to view the source of both as you are reading. I am sure you will find it, if you haven't already.
Did you already think about starting a blog? A basic blog isn't really technically hard to set up. I am just nerding around. Hosted with Codeberg Pages, or similar, it's free too.
I am sure there are people that want to hear what you have to say. I am sure you will have an effect on people. Maybe you write a technical post, which then helps others solve technical issues they face or start a project. Maybe you write a more abstract or philosophical post about why to blog and what effects it can have on communities, which encourages others to blog. Maybe you write about your unique perspectives on things and enrich other people's understanding, or you share experiences that resonate with others and give people who feel unheard the feeling of not being alone.
It's your choice. It's your blog. Just start!
If you have questions about code or starting a blog, your email is welcome at lewin@tuta.io