Adding a Mastodon feed to a static HTML site

I do not update my private web page that often, and while I do post in this blog every now and then, it can be several months without activity. I have links to the latest few posts from my private web site, using the RSS feed and a script that converts it to HTML, but as I am more active other places, I felt like I wanted to include that information as well.

I did have a limited presence on Twitter, but when it turned all nazi last year, I, and many others, left. A number of the people I follow left for Mastodon, the decentralized “Twitter alternative”, including George Takei and Charles Stross, I did so too. I had been toying with the idea of including the last few “toots” on my website for a while, but never got around to.

First I was looking at using an easy solution like emfed, which adds some scripting that downloads and displays the posts in the page. But as I mentioned in the previous post, my website is old-school with static HTML, so I wanted something that matched that, including the content as text and links, not as singing and dancing stuff, so I ended up writing my own stuff. I ended up with a shell script that downloads the last few posts, and a Python script that converts the posts into a HTML snippet that I can include in the HTML using Apache server-side includes.

Everything in the Mastodon API is JSON, which is the hype nowadays (I’m old enough to remember when XML was new and all the hype, so I don’t think JSON is the solution to everything, either, but it does the job). To parse JSON in my shell script I found jq, which was already installed on my hosting service and packaged for all systems I am running. While I know my used ID doesn’t change, I made the script resilient to that by first looking up the user ID and then download the feed:

#!/bin/sh
SERVER=mastodon.example.com
USERNAME=myusername
MAX=10
USERID="$(curl --silent "https://$SERVER/api/v1/accounts/lookup?acct=$USERNAME" | jq -r .id)"
if [ -z "$USERID" ]; then
	echo "Failed getting user ID" 1>&2
	exit 1
fi
curl --silent -o output.json "https://$SERVER/api/v1/accounts/$USERID/statuses?limit=$MAX"

This script writes the file as output.json, which I then feed into a simple Python script that reads the latest (i.e. first in the file) posts and writes a short HTML snippet that I can include. Since toots does not have headings like blog posts, there’s no clean markup-free text that can be copied to the page, everything is provided as HTML, so I have added some code to strip the markup and just give me the text. I also completely ignore any attachments and stuff, you have to click to go to media yourself:

#!/usr/bin/python3

import sys
import json
from io import StringIO
from html.parser import HTMLParser

# https://stackoverflow.com/a/925630
class MLStripper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.reset()
        self.strict = False
        self.convert_charrefs= True
        self.text = StringIO()
    def handle_data(self, d):
        self.text.write(d)
    def get_data(self):
        return self.text.getvalue()

# Strip markup from HTML input
def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()

def latest(file, url):
    """Fetch entries from JSON and print them"""
    # Slurp JSON
    try:
        with open(file, 'rb') as jsondata:
            data = json.load(jsondata)
    except:
        return

    # Output headers
    print("<ul>")

    # Print the latest five
    num = 0
    for item in data:
        # Hide sensitive and unlisted toots
        if item['sensitive']:
             continue
        if item['visibility'] == 'unlisted':
            continue
        # Extract information
        link = url + '/' + item['id']
        date = item['created_at']
        reply = item['in_reply_to_id']
        is_reblog = 'reblog' in item and item['reblog'] is not None
        is_reply = 'in_reply_to_id' in item and item['in_reply_to_id'] is not None
        html = item['content']
        text = 'Toot'
        if is_reply:
            text = 'Reply in thread'
        if is_reblog:
            text = 'Boost @' + item['reblog']['account']['acct']
            html = item['reblog']['content']

        content = strip_tags(html.replace('</p><p>', '</p>\n<p>')).replace('\n', '<br>')
            
        # Truncate date to YYYY-MM-DD
        datestr = date[0:10]
        outhtml = ' <li><a href="%s">%s</a> (%s):<br>%s</li>' % (link, text, datestr, content)
        print(outhtml.encode('ascii', 'xmlcharrefreplace').decode('utf-8'))
        num += 1
        if num == 5:
            break

    print("</ul>")

if sys.argv[1] == 'output':
    latest('output.json', 'https://mastodon.example.com/@myusername')

In both scripts, replace “mastodon.example.com” with the actual host name of your instance and “myusername” with the handle.

All scripts are included without any warranties. If it breaks, you get to keep both parts.