Adding a Mastodon feed to a static HTML site

I do not update my private web page that often, and while I do post on this blog every now and then, it can go several months without activity. I have links to the latest few posts on my private web site, generated from the RSS feed by a script that converts it to HTML, but as I am more active in other places, I felt I wanted to include that information as well.

I did have a limited presence on Twitter, but when it turned all nazi last year, I, and many others, left. A number of the people I follow moved to Mastodon, the decentralized “Twitter alternative”, including George Takei and Charles Stross, so I did too. I had been toying with the idea of including my last few “toots” on my website for a while, but never got around to it.

At first I looked at an easy solution like emfed, which adds some scripting that downloads and displays the posts in the page. But as I mentioned in the previous post, my website is old-school static HTML, so I wanted something that matched that, including the content as plain text and links rather than as singing and dancing widgets. I ended up with a shell script that downloads the last few posts, and a Python script that converts them into an HTML snippet that I can include in the page using Apache server-side includes.
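
For completeness, the snippet is pulled into the page with an ordinary mod_include directive, something like the line below (the path is just an example, and it assumes server-side includes are enabled for the page):

<!--#include virtual="/include/mastodon.html" -->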

Everything in the Mastodon API is JSON, which is the hype nowadays (I’m old enough to remember when XML was new and all the hype, so I don’t think JSON is the solution to everything either, but it does the job). To parse JSON in my shell script I found jq, which was already installed on my hosting service and packaged for all the systems I am running. While I know my user ID doesn’t change, I made the script resilient anyway by first looking up the user ID and then downloading the feed:

#!/bin/sh
SERVER=mastodon.example.com
USERNAME=myusername
MAX=10
# Look up the numeric account ID for the user name
USERID="$(curl --silent "https://$SERVER/api/v1/accounts/lookup?acct=$USERNAME" | jq -r .id)"
if [ -z "$USERID" ] || [ "$USERID" = "null" ]; then
	echo "Failed getting user ID" 1>&2
	exit 1
fi
# Download the latest posts for the account
curl --silent -o output.json "https://$SERVER/api/v1/accounts/$USERID/statuses?limit=$MAX"

This script writes the file as output.json, which I then feed into a simple Python script that reads the latest (i.e. first in the file) posts and writes a short HTML snippet that I can include. Since toots do not have headings like blog posts, there is no clean markup-free text that can be copied to the page; everything is provided as HTML, so I have added some code to strip the markup and just give me the text. I also completely ignore attachments and other extras; you have to click through to the post to see any media:

#!/usr/bin/python3

import sys
import json
from io import StringIO
from html.parser import HTMLParser

# https://stackoverflow.com/a/925630
class MLStripper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.reset()
        self.strict = False
        self.convert_charrefs= True
        self.text = StringIO()
    def handle_data(self, d):
        self.text.write(d)
    def get_data(self):
        return self.text.getvalue()

# Strip markup from HTML input
def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()

def latest(file, url):
    """Fetch entries from JSON and print them"""
    # Slurp JSON
    try:
        with open(file, 'rb') as jsondata:
            data = json.load(jsondata)
    except (OSError, ValueError):
        # Could not read or parse the file; output nothing
        return

    # Output headers
    print("<ul>")

    # Print the latest five
    num = 0
    for item in data:
        # Extract information
        link = url + '/' + item['id']
        date = item['created_at']
        is_reblog = 'reblog' in item and item['reblog'] is not None
        is_reply = 'in_reply_to_id' in item and item['in_reply_to_id'] is not None
        html = item['content']
        text = 'Toot'
        if is_reply:
            text = 'Reply in thread'
        if is_reblog:
            text = 'Boost @' + item['reblog']['account']['acct']
            html = item['reblog']['content']

        content = strip_tags(html.replace('</p><p>', '</p>\n<p>')).replace('\n', '<br>')
            
        # Truncate date to YYYY-MM-DD
        datestr = date[0:10]
        outhtml = ' <li><a href="%s">%s</a> (%s):<br>%s</li>' % (link, text, datestr, content)
        print(outhtml.encode('ascii', 'xmlcharrefreplace').decode('utf-8'))
        num += 1
        if num == 5:
            break

    print("</ul>")

if len(sys.argv) > 1 and sys.argv[1] == 'output':
    latest('output.json', 'https://mastodon.example.com/@myusername')

In both scripts, replace “mastodon.example.com” with the actual host name of your instance and “myusername” with your handle.
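
How the two scripts are run is up to you; I will not pretend the crontab entry below is exactly what I use (the script names, directories and output file are all made up), but a periodic job along these lines is the obvious way to keep the snippet fresh:

# Hypothetical crontab entry: refresh the Mastodon snippet once an hour
0 * * * * cd $HOME/mastodon && ./fetch-toots.sh && ./toots-to-html.py output > $HOME/www/include/mastodon.html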

All scripts are included without any warranties. If it breaks, you get to keep both parts.

Moving a website to a new host

It is when you are moving a website to a new hosting provider that you find all the little things you have taken for granted, everything that “just worked” but now has to be adjusted. My private website is ancient: I registered the domain in 1999, and at first the site was hosted for free on a server run by a couple of friends from university, until that machine became too old to keep running about five years ago. It was just a simple Debian Linux server, and to avoid overloading it, the site is mostly static HTML files, with a few CGI scripts (some shell scripts, some Perl scripts, some compiled C binaries) for “interactivity”.

When I suddenly had to find a new home for the site five years ago, I already had an account on Dreamhost, which hosts the domain for this server. On Dreamhost I had been playing around with WordPress (for this site) and other things, but I basically just copied the other site over as-is. I have it all in version control (it ran in CVS for a long while, but is now in Git), so it was fairly simple to get the content over. The problem was fixing up all the references in scripts, the paths to home directories and the subtle Apache server configuration differences. Dreamhost was running Ubuntu, which is based on Debian, so the differences weren’t too big.

My friends also ran my email (with me doing all the spam filtering in Procmail). I didn’t want all my email to flow across the pond and back, so I got that hosted over at ABC-Klubben, which I have been a member of since around 1996. While their mail server was running BSD, it still supported Procmail, so after an initial mail loop and some mail getting nuked on receipt, it mostly worked with the old setup.

I recently moved the site again. I didn’t really want a Swedish domain name hosted over in the US, especially with the worrying political trends over there. I have lived in Norway since 2000, and after looking around for a while, I ended up choosing Nordhost as my local provider. They are running Debian, which means that most of the stuff just worked. Some of the supporting scripts have started to show their age (who writes Perl nowadays, besides me?), so I had to do some rewrites. I also didn’t manage to get my hack for including an RSS feed “dynamically” working (having curl download the RSS and a Perl script output HTML for a dynamic include), so I rewrote that into a Python script that writes the HTML snippet to disk, to be included statically instead. Still very Web 1.0, but that’s how I like it.
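
The replacement is nothing fancy; a minimal sketch of the idea, using only the Python standard library (file names are examples, and it assumes a plain RSS 2.0 feed without namespaces), looks something like this:

#!/usr/bin/python3
# Minimal sketch: convert a downloaded RSS 2.0 file into an HTML snippet
# that can be included statically. File names are examples only.

import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def rss_to_html(rssfile, htmlfile, maxitems=5):
    tree = ET.parse(rssfile)
    items = tree.findall('./channel/item')[:maxitems]
    with open(htmlfile, 'w', encoding='utf-8') as out:
        out.write('<ul>\n')
        for item in items:
            title = item.findtext('title', default='(untitled)')
            link = item.findtext('link', default='#')
            out.write(' <li><a href="%s">%s</a></li>\n' %
                      (escape(link, {'"': '&quot;'}), escape(title)))
        out.write('</ul>\n')

if __name__ == '__main__':
    rss_to_html('feed.rss', 'latest-posts.html')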

I also moved my email to the new host, so now it is hosted in Norway (outside the reach of FRA, perhaps, but now the Norwegians can spy on me instead). They didn’t support Procmail, so I had to learn to configure Sieve instead. Fortunately I didn’t really need all that complicated Procmail machinery just to sort my mailing lists, and it was time to clean up anyway.
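
What I needed boils down to very little; a Sieve rule for sorting a mailing list into its own folder looks something like this (the list ID and folder name here are made up, and the folder separator depends on the server):

require ["fileinto"];

# File messages from one mailing list into its own folder
if header :contains "List-Id" "<users.lists.example.org>" {
    fileinto "Lists/users";
}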

I’m very happy with Nordhost. My next project is moving this site over to them as well, and to stop using Dreamhost before the next bi-yearly billing cycle, mostly for economic reasons. Dreamhost finally figured out that they do need to charge Norwegian VAT (25%), and the US dollar is about twice as expensive now compared to when I set up hosting there fifteen years ago.

On the futility of using Microsoft as an email provider

My $DAYJOB, like many others, has an Office 365 subscription, which includes services like Microsoft’s Outlook for email. At a previous job I had mixed experiences with Microsoft and their upside-down view of how email should work: after fighting with their abysmal Outlook software, and their even worse web interface, my previous employer eventually managed to get the company they had subcontracted for email services to open up IMAP, and that was the end of that. With my current employer, I have used IMAP from the start, first continuing to use the Opera Mail client, despite it being out of support, and recently moving to its spiritual successor in Vivaldi.

This was working fine; I could mostly ignore Outlook and its idiosyncrasies. Until 2022-11-01, that is.

The powers that be at Microsoft were rolling out several changes. One was to remove support for password log-in, instead forcing everyone to use a web browser for logging in (fortunately, that was postponed a bit); the other was some kind of update to the IMAP servers that failed horribly. Since it was the email client that started complaining, I first thought it was a bug in Vivaldi, and posted about it there. After researching it, I found that Vivaldi was speaking IMAP correctly, but the servers didn’t understand it. I was pointed to a thread on Microsoft’s support forum, where I was far from the only one having the problem, across different accounts and mail clients.
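
If you want to check something like this yourself, you do not need a mail client at all: you can talk IMAP to the server by hand over TLS. A sketch, assuming the standard Office 365 IMAP endpoint (the “a1”/“a2” tags are arbitrary):

openssl s_client -quiet -crlf -connect outlook.office365.com:993
a1 CAPABILITY
a2 LOGOUT

A healthy server answers each tagged command with a matching “a1 OK”/“a2 OK” line.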

One would think that a company providing a paid-for email hosting service would be quick to fix an update that breaks its support for a standardized network protocol. But, no. I am writing this at the end of January 2023, and it is still broken. I guess now that the tech oligopoly of Microsoft and Google has embraced and twisted email into whatever they want it to mean, they can do what they want and ignore end-users, who have little recourse.

A sad state of affairs.

Anything can be imported to WordPress

A neighborhood association I am involved in has a web site for local news that grew out of some hand-made PHP scripts from 20 years ago. The scripts have really shown their age, but the work involved in updating the site is enormous, as each page is its own PHP script with partial HTML and parts spread across various templates. It has over a thousand news articles that are worth keeping. What to do, in 2022, to make it manageable?

Well, it seems WordPress is the One True System™ for easy publishing these days, so import it there, then? But as I said, this is a home-grown system with over a thousand PHP pages with hand-edited code.

Perl to the rescue. Fortunately, there were two index files, one with “modern” articles (from after the initial scripting broke down) and one with “archived” pages (from before, where the page display quality was rather low), so at least I had a list of all the pages to import. Right? Well, not really: some pages had chained links to sub-pages (picture carousels and such), so I had to add those. And making something that parses a thousand “almost similar” PHP documents does require some work, but after several hours I managed to get the tool to read the indices, insert the missing pages where they belong, read the pages and parse their contents and metadata (headers, dates), and spit out a WordPress-compatible, 3-megabyte XML file for import.
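
The import format, WXR, is just RSS with a couple of extra namespaces. Boiled down to the bare minimum, each imported article is an item along these lines (the values are invented for illustration, and a real export carries several more fields per item):

<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:wp="http://wordpress.org/export/1.2/">
 <channel>
  <wp:wxr_version>1.2</wp:wxr_version>
  <item>
   <title>Example article title</title>
   <wp:post_date>2008-05-17 12:00:00</wp:post_date>
   <wp:status>publish</wp:status>
   <wp:post_type>post</wp:post_type>
   <content:encoded><![CDATA[<p>Article body as HTML.</p>]]></content:encoded>
  </item>
 </channel>
</rss>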

Thanks to WP Sandbox, I have managed to test the XML (and iteratively fix the bugs in it), so that I now have one XML file that imports all the articles (minus one that, no matter what I do, ends up being identified as a duplicate of another article, but I will cut my losses and drop that one; it was just a link to another site anyway). Of course I have none of the images and other pages on the sandbox, so I cannot really test that everything works out as expected; that will have to be done on the target site.

Oh, and, of course, I did make the script generate a giant .htaccess file with Redirect directives mapping the old self-published URLs to the new WordPress URLs. We can’t have old links become invalid, can we?
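
Each mapping is a single directive; they look something like this (the paths are invented for the example):

# Map an old hand-made page to its new WordPress permalink
Redirect permanent /news/2008/spring-party.php /2008/05/spring-party/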

Since this is obviously never ever going to be useful for anyone else, as this is a unique home-grown system, I have published the script over at GitHub.

Displaying EML attachments in Vivaldi on Linux

Until recently, I had been using the old Opera Mail client for my work email. Opera Mail 12 was released in 2013; being almost ten years old by now, it is starting to show its age, with its lack of support for newer standards. Since I really like the way the Opera Mail client works, I was really looking forward to the work done by the amazing Vivaldi team (where a lot of my old colleagues from Opera Software ended up, together with a lot of other amazing people) on their Vivaldi Mail client.

As my $DAYJOB uses Microsoft 365 for its email, we have several Outlook users in the house, and sometimes I happen upon a forwarded email sent in .eml format. While Vivaldi actually uses this format to store email internally, it cannot display an attachment in this format (at least not as of October 2022). An eml file is basically just a bare email message, so it should be easy to read, right? Apparently, not so much. Googling around mostly turned up recommendations to install Thunderbird and import the file there, but one question did have an interesting answer recommending mhonarc, using it to convert the message to HTML. I ended up expanding on that, writing a script that converts to HTML and then calls back to my stand-alone Vivaldi e-mail instance. I have this as /home/peter/bin/eml-viewer (if you’re using the main instance, remove the --user-data-dir option, otherwise replace it with the path to your configuration directory):

#!/bin/bash
if [ ! -e "$1" ]; then
    echo 'eml-viewer file.eml'
    exit 1
fi
OUT=$(mktemp --suffix .html)
/bin/mhonarc -single "$1" > "$OUT"
/bin/vivaldi-stable --user-data-dir=/home/peter/.config/vivaldi-standalone "$OUT"

The next step is to connect this up to open .eml files. For this, we need to create a matching .local/share/applications/eml-viewer.desktop file:

[Desktop Entry]
Version=1.0
Name=EML Viewer
GenericName=EML Viewer
Comment=View Outlook EML file
Exec=/home/peter/bin/eml-viewer %U
StartupNotify=false
Terminal=false
Icon=mail-mark-unread
Type=Application
Categories=Viewer;
MimeType=message/rfc822;

And finally, make it the default application for opening eml files:

xdg-mime default eml-viewer.desktop message/rfc822
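
To check that the association took, ask xdg-mime what it will now use:

xdg-mime query default message/rfc822

which should print eml-viewer.desktop.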

This will leave the temporary HTML files in your temporary directory, so take care with sensitive data if you are on a shared computer. Set $TMPDIR to somewhere private and clear it out periodically.
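
One way to handle that, assuming you point $TMPDIR at a directory of your own in the script, is a periodic clean-up along these lines (the directory name is just an example):

# Remove converted messages older than a day from the private temp directory
find "$HOME/.cache/eml-viewer" -maxdepth 1 -name 'tmp*.html' -mtime +0 -delete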

Watching the WWE Network on Linux: an update

A few years back, I wrote about how I had to set up WINE to be able to watch the WWE Network from Linux, as they are using an incompatible DRM (Digital Restrictions Management) system that is not supported on Linux. I have been using that setup pretty much unchanged since, while updating the various components.

At one point I had to stop upgrading Firefox, as it started using features from newer versions of Windows that were not supported under WINE. I had configured WINE to make Firefox think it was running under Windows XP, and Firefox had since dropped XP support; if I tried to claim a newer version of Windows, Firefox failed to connect to the Internet. The version I stopped at was 53.0, which means that I am running an old, vulnerable Firefox version for this. This means: do not use this browser for anything but watching the WWE Network! Doing so may trigger a vulnerability somewhere.
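
For reference, one way to pin the Windows version a WINE prefix reports, assuming winetricks is installed, is a one-liner (winecfg can do the same from its GUI):

# Make the WINE prefix report Windows XP to applications
winetricks winxp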

I did keep my Flash Player up to date, but the latest Adobe Flash Player update (version 32) failed to run under WINE. Why, I do not know; perhaps they also started using APIs that WINE does not support. I upgraded my WINE installation to the latest stable version (4.0), but to no avail; it was still hanging.

The only solution I have found so far has been to downgrade to Adobe Flash Player version 31. I had to dig a bit to find the download page for archived versions of Flash Player, and from there I ran the uninstaller and then downloaded the 31.0.0.153 archive. From that archive I installed the NPAPI Flash Player (31_0_r0_153/flashplayer31_0r0_153_win.exe) and now I have a working setup again.

So: an up-to-date WINE version, but an ancient Firefox and Flash Player. I hope WWE update their web site soon to something a bit more modern that works from Linux as well, but nothing has happened for the last five years, so I am not holding my breath…

Watching the WWE Network on Linux

Okay, I confess, I am a fan of pro wrestling. You know, that weird US-American show-style wrestling where people pretend to beat each other up? Hulk Hogan and Ric Flair? No, okay, then you don’t need to continue reading.

Anyway, I am a fan, I even have a website dedicated to it, and I am subscribed to the WWE Network, an on-line channel where WWE broadcast their live events and where I have access to their back archive. I subscribed when they opened up international subscriptions back in August 2014, and among other things, I have watched it on my PCs running Linux. It had worked flawlessly until a few weeks ago, when it started throwing error messages and then stopped playing completely.

Contacting their technical support didn’t help; once they heard I was running Linux they just stopped responding, both on Facebook and by e-mail. Despite it having worked perfectly before, since Linux is unsupported they apparently do not want to look at ways of fixing it. So, what to do?

I ended up finding a workaround: installing the Windows (32-bit) versions of Firefox and Flash Player under WINE. While it was easy enough to find the download link for Firefox, finding a working installer for the Flash plug-in was a bit more difficult. The normal plug-in download page didn’t work, as the installer was just a placeholder that downloads the real installer, something it was unable to do under WINE. I managed to find a page with an off-line installer, one that started with a big warning that it is going to be taken away next year.
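
If you want to keep this mess away from your normal WINE setup, a separate 32-bit prefix is a good idea; a sketch, where the prefix path and installer file name are just examples:

# Create a dedicated 32-bit WINE prefix and run the Firefox installer in it
WINEPREFIX="$HOME/.wine-wwe" WINEARCH=win32 wineboot --init
WINEPREFIX="$HOME/.wine-wwe" wine "Firefox Setup.exe"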

Installing those and launching the Windows Firefox, I am able to play videos again. There are a few issues (the audio is not 100 % synchronized with the video), but it is at least better than not playing at all.

I now have a workaround, but I still hope they will fix it properly soon.