Adding a Mastodon feed to a static HTML site

I do not update my private web page that often, and while I do post in this blog every now and then, it can go several months without activity. My private web site links to the latest few posts, using the RSS feed and a script that converts it to HTML, but as I am more active in other places, I wanted to include that activity as well.

I did have a limited presence on Twitter, but when it turned all nazi last year, I, and many others, left. A number of the people I follow, including George Takei and Charles Stross, left for Mastodon, the decentralized “Twitter alternative”, and so did I. I had been toying with the idea of including the last few “toots” on my website for a while, but never got around to it.

At first I looked at an easy solution like emfed, which adds some scripting that downloads and displays the posts in the page. But as I mentioned in the previous post, my website is old-school static HTML, so I wanted something that matched that, including the content as text and links, not as singing and dancing stuff. I ended up writing my own: a shell script that downloads the last few posts, and a Python script that converts the posts into an HTML snippet that I can include in the page using Apache server-side includes.

Everything in the Mastodon API is JSON, which is the hype nowadays (I’m old enough to remember when XML was new and all the hype, so I don’t think JSON is the solution to everything, either, but it does the job). To parse JSON in my shell script I found jq, which was already installed on my hosting service and is packaged for all the systems I am running. While I know my user ID doesn’t change, I made the script resilient to that by first looking up the user ID and then downloading the feed:

#!/bin/sh
SERVER=mastodon.example.com
USERNAME=myusername
MAX=10
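# "// empty" makes jq print nothing (instead of the string "null") when the reply has no id
USERID="$(curl --silent "https://$SERVER/api/v1/accounts/lookup?acct=$USERNAME" | jq -r '.id // empty')"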
if [ -z "$USERID" ]; then
	echo "Failed getting user ID" 1>&2
	exit 1
fi
curl --silent -o output.json "https://$SERVER/api/v1/accounts/$USERID/statuses?limit=$MAX"
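
For anyone who would rather not depend on curl and jq, the same two steps can be sketched in Python with only the standard library. This is a rough equivalent and not what I actually run; the function names (lookup_url, statuses_url, extract_user_id, download_feed) are made up for this example, and the endpoints are simply the ones used in the shell script above.

```python
import json
import urllib.parse
import urllib.request


def lookup_url(server, username):
    """Build the account lookup URL (same endpoint as in the shell script)."""
    query = urllib.parse.urlencode({"acct": username})
    return "https://%s/api/v1/accounts/lookup?%s" % (server, query)


def statuses_url(server, user_id, limit=10):
    """Build the URL for the most recent statuses of an account."""
    return "https://%s/api/v1/accounts/%s/statuses?limit=%d" % (server, user_id, limit)


def extract_user_id(body):
    """Return the "id" field of a lookup reply, or None if there is none."""
    try:
        data = json.loads(body)
    except ValueError:
        return None
    if not isinstance(data, dict):
        return None
    return data.get("id")


def fetch(url):
    """Download a URL and return the body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def download_feed(server, username, limit=10, outfile="output.json"):
    """Look up the user ID, then save the latest statuses to outfile."""
    user_id = extract_user_id(fetch(lookup_url(server, username)))
    if not user_id:
        raise RuntimeError("Failed getting user ID")
    with open(outfile, "w", encoding="utf-8") as out:
        out.write(fetch(statuses_url(server, user_id, limit)))
```

Calling download_feed("mastodon.example.com", "myusername") should then leave the same output.json behind as the shell script does.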

This script writes the feed to output.json, which I then feed into a simple Python script that reads the latest (i.e. first in the file) posts and writes a short HTML snippet that I can include. Since toots do not have headings like blog posts do, there is no clean markup-free text that can be copied to the page; everything is provided as HTML, so I have added some code to strip the markup and just give me the text. I also completely ignore any attachments; you have to click through to see the media yourself:

#!/usr/bin/python3

import sys
import json
from io import StringIO
from html.parser import HTMLParser

# https://stackoverflow.com/a/925630
class MLStripper(HTMLParser):
    def __init__(self):
        # The base constructor already calls reset(); decode &amp; and friends
        super().__init__(convert_charrefs=True)
        self.text = StringIO()
    def handle_data(self, d):
        self.text.write(d)
    def get_data(self):
        return self.text.getvalue()

# Strip markup from HTML input
def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()

def latest(file, url):
    """Fetch entries from JSON and print them"""
    # Slurp JSON
    try:
        with open(file, 'rb') as jsondata:
            data = json.load(jsondata)
    except (OSError, ValueError):
        # Missing file or broken JSON: print nothing rather than a broken list
        return

    # Output headers
    print("<ul>")

    # Print the latest five
    num = 0
    for item in data:
        # Hide sensitive and unlisted toots
        if item['sensitive']:
            continue
        if item['visibility'] == 'unlisted':
            continue
        # Extract information
        link = url + '/' + item['id']
        date = item['created_at']
        # A boost carries the original post in 'reblog'; a reply has a parent ID
        is_reblog = item.get('reblog') is not None
        is_reply = item.get('in_reply_to_id') is not None
        html = item['content']
        text = 'Toot'
        if is_reply:
            text = 'Reply in thread'
        if is_reblog:
            text = 'Boost @' + item['reblog']['account']['acct']
            html = item['reblog']['content']

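        # Keep paragraph breaks as line breaks, then drop the remaining markup
        content = strip_tags(html.replace('</p><p>', '</p>\n<p>')).replace('\n', '<br>')

        # Truncate date to YYYY-MM-DD
        datestr = date[0:10]
        outhtml = ' <li><a href="%s">%s</a> (%s):<br>%s</li>' % (link, text, datestr, content)
        # Emit pure ASCII; non-ASCII characters become numeric character references
        print(outhtml.encode('ascii', 'xmlcharrefreplace').decode('ascii'))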
        num += 1
        if num == 5:
            break

    print("</ul>")

if len(sys.argv) > 1 and sys.argv[1] == 'output':
    latest('output.json', 'https://mastodon.example.com/@myusername')

In both scripts, replace “mastodon.example.com” with the actual host name of your instance and “myusername” with your handle.
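
One caveat with the Python script above: because the parser decodes character references, a toot containing a literal “&lt;” or “&amp;” comes out of the stripper as a bare “<” or “&”, and that text is then printed straight into the page. A possible fix is to escape the text again after stripping. The following is a minimal sketch of that idea, not the code I run; TextExtractor and toot_to_html are names invented for this example.

```python
from html import escape
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text content of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def toot_to_html(content):
    """Strip markup from a toot, then re-escape the text for inclusion in HTML."""
    parser = TextExtractor()
    # Put a newline between paragraphs so that the break survives stripping
    parser.feed(content.replace('</p><p>', '</p>\n<p>'))
    parser.close()
    text = ''.join(parser.parts)
    # Escape <, > and & again, then turn paragraph breaks into <br>
    safe = escape(text, quote=False).replace('\n', '<br>')
    # Non-ASCII characters become numeric character references
    return safe.encode('ascii', 'xmlcharrefreplace').decode('ascii')
```

With this, a toot whose content is `<p>1 &lt; 2</p>` ends up as `1 &lt; 2` in the page instead of as a stray angle bracket.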

All scripts are included without any warranties. If it breaks, you get to keep both parts.

Moving a website to a new host

It is when you are moving a website to a new hosting provider that you find all the little things you have taken for granted, everything that “just worked” but that has to be adjusted. My private website is ancient: I registered the domain in 1999, and at first it was hosted on a server that a couple of friends from the university ran for free for several years, until about five years ago, when the server became too old to keep running. It was just a simple Debian Linux server, and to avoid overloading it, the site is mostly static HTML files, with a few CGI scripts (some shell scripts, some Perl scripts, some compiled C binaries) for “interactivity”.

When I suddenly had to find a new home for the server five years ago, I already had an account on Dreamhost, which hosts the domain for this server. On Dreamhost I have been playing around with WordPress (for this site) and other stuff, but I just basically copied the other site over as-is. I have it all in version control (it was running CVS for a long while, but is now in Git), so it was fairly simple to get the stuff over. The problem was fixing up all the references in scripts, paths to home directories and the subtle Apache server configuration differences. Dreamhost was running Ubuntu, which is based on Debian, so the differences weren’t too big.

My friends also ran my email (with me doing all the spam filtering in Procmail). I didn’t want all my email to flow across the pond and back, so I got that hosted over at ABC-Klubben, which I have been a member of since around 1996. While their mail server was running BSD, it still supported Procmail, so after an initial mail loop and some mail ending up being nuked on receive, it worked mostly with the old setup.

I recently moved the site again. I didn’t really want a Swedish domain name hosted over in the US, especially with the worrying political trends over there. I have lived in Norway since 2000, and after looking for a while, I ended up choosing Nordhost as my local provider. They are running Debian, which means that most of the stuff worked. Some of the supporting scripts have started to show their age (who writes Perl nowadays, besides me?) so I had to do some rewrites. Also, I didn’t manage to get my hacks for including an RSS feed “dynamically” working (having curl download the RSS and running a Perl script to output HTML for dynamic include), so I had to rewrite that into a Python script that writes the HTML to disk, which I then include statically instead. Still very Web 1.0, but that’s how I like it.
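
To give an idea of what such an RSS-to-HTML step can look like, here is a minimal sketch using only the Python standard library. It is not my actual script: the function names rss_to_html and write_snippet are hypothetical, and it assumes a plain RSS 2.0 feed with title and link elements inside each item.

```python
import xml.etree.ElementTree as ET
from html import escape


def rss_to_html(rss_data, limit=5):
    """Turn the first few items of an RSS 2.0 feed into an HTML list."""
    root = ET.fromstring(rss_data)
    lines = ["<ul>"]
    for item in root.findall("./channel/item")[:limit]:
        title = item.findtext("title", default="(untitled)")
        link = item.findtext("link", default="")
        # Escape both values, since the parser has already decoded entities
        lines.append(' <li><a href="%s">%s</a></li>'
                     % (escape(link, quote=True), escape(title)))
    lines.append("</ul>")
    return "\n".join(lines) + "\n"


def write_snippet(rss_data, path):
    """Write the list to disk, ready to be included statically."""
    with open(path, "w", encoding="utf-8") as out:
        out.write(rss_to_html(rss_data))
```

Run from cron or a similar scheduler after downloading the feed, this leaves a small file on disk that the page can include as-is.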

I also moved my email to the new host, so now it is hosted in Norway (outside the reach of FRA, perhaps, but now the Norwegians can spy on me instead). They didn’t support Procmail, so I had to learn to configure Sieve instead. Fortunately I didn’t really need all that complicated Procmail stuff to just sort my mailing lists, and it was time to clean up anyway.

I’m very happy with Nordhost. My next project is moving this site over to them, and to stop using Dreamhost before the next bi-yearly billing cycle, mostly for economic reasons. Dreamhost finally figured out that they do need to charge Norwegian VAT (25%), and the US dollar is about twice as expensive now as when I set up hosting there fifteen years ago.