Adding a Mastodon feed to a static HTML site

I do not update my private web page that often, and while I do post to this blog every now and then, it can go several months without activity. My private web site links to the latest few posts here, using the RSS feed and a script that converts it to HTML, but as I am more active in other places, I felt like I wanted to include that information as well.

I did have a limited presence on Twitter, but when it turned all nazi last year, I, and many others, left. A number of the people I follow moved to Mastodon, the decentralized “Twitter alternative”, including George Takei and Charles Stross, so I did too. I had been toying with the idea of including my last few “toots” on my website for a while, but never got around to it.

At first I looked at using an easy solution like emfed, which adds some scripting that downloads and displays the posts in the page. But as I mentioned in the previous post, my website is old-school static HTML, so I wanted something that matched that, including the content as plain text and links rather than as a singing and dancing widget. So I wrote my own: a shell script that downloads the last few posts, and a Python script that converts them into an HTML snippet that I pull into the page using Apache server-side includes.
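
The include on the page itself is just the standard Apache SSI directive; a minimal sketch, assuming the generated snippet is saved as mastodon.html next to the page (the file name is my example, nothing in the scripts below mandates it):

<!--#include virtual="mastodon.html" -->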

Everything in the Mastodon API is JSON, which is all the hype nowadays (I’m old enough to remember when XML was new and all the hype, so I don’t think JSON is the solution to everything either, but it does the job). To parse JSON in my shell script I found jq, which was already installed on my hosting service and is packaged for all the systems I am running. While I know my user ID doesn’t change, I avoided hard-coding it by first looking up the user ID and then downloading the feed:

#!/bin/sh
SERVER=mastodon.example.com
USERNAME=myusername
MAX=10
# Look up the numeric user ID for the account name
USERID="$(curl --silent "https://$SERVER/api/v1/accounts/lookup?acct=$USERNAME" | jq -r .id)"
# jq prints "null" if the lookup returned an error document
if [ -z "$USERID" ] || [ "$USERID" = "null" ]; then
	echo "Failed getting user ID" 1>&2
	exit 1
fi
# Fetch the latest posts for that user
curl --silent -o output.json "https://$SERVER/api/v1/accounts/$USERID/statuses?limit=$MAX"

This script writes the file as output.json, which I then feed into a simple Python script that reads the latest posts (i.e. the first in the file) and writes a short HTML snippet that I can include. Since toots do not have headings like blog posts do, there is no clean markup-free text that can be copied to the page; everything is provided as HTML, so I have added some code to strip the markup and just give me the text. I also completely ignore attachments and other extras; you have to click through to the post to see any media:

#!/usr/bin/python3

import sys
import json
from io import StringIO
from html.parser import HTMLParser

# https://stackoverflow.com/a/925630
class MLStripper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.reset()
        self.strict = False
        self.convert_charrefs= True
        self.text = StringIO()
    def handle_data(self, d):
        self.text.write(d)
    def get_data(self):
        return self.text.getvalue()

# Strip markup from HTML input
def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()

def latest(file, url):
    """Fetch entries from JSON and print them"""
    # Slurp JSON
    try:
        with open(file, 'rb') as jsondata:
            data = json.load(jsondata)
    except (OSError, ValueError):
        # Could not read or parse the JSON; output nothing
        return

    # Output headers
    print("<ul>")

    # Print the latest five
    num = 0
    for item in data:
        # Extract information
        link = url + '/' + item['id']
        date = item['created_at']
        is_reblog = 'reblog' in item and item['reblog'] is not None
        is_reply = 'in_reply_to_id' in item and item['in_reply_to_id'] is not None
        html = item['content']
        text = 'Toot'
        if is_reply:
            text = 'Reply in thread'
        if is_reblog:
            text = 'Boost @' + item['reblog']['account']['acct']
            html = item['reblog']['content']

        content = strip_tags(html.replace('</p><p>', '</p>\n<p>')).replace('\n', '<br>')
            
        # Truncate date to YYYY-MM-DD
        datestr = date[0:10]
        outhtml = ' <li><a href="%s">%s</a> (%s):<br>%s</li>' % (link, text, datestr, content)
        print(outhtml.encode('ascii', 'xmlcharrefreplace').decode('utf-8'))
        num += 1
        if num == 5:
            break

    print("</ul>")

# "output" matches the base name of the file written by the shell script
if len(sys.argv) > 1 and sys.argv[1] == 'output':
    latest('output.json', 'https://mastodon.example.com/@myusername')

In both scripts, replace “mastodon.example.com” with the actual host name of your instance and “myusername” with your handle.
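
To keep the snippet up to date, the two scripts can be chained from cron. A sketch of a crontab entry, with hypothetical paths and script names (fetch-toots.sh and toots-to-html.py stand in for the two scripts above):

# Regenerate the Mastodon snippet once an hour
0 * * * * cd $HOME/mastodon && ./fetch-toots.sh && ./toots-to-html.py output > $HOME/www/mastodon.html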

All scripts are included without any warranties. If it breaks, you get to keep both pieces.

Moving a website to a new host

It is when you are moving a website to a new hosting provider that you find all the little things you have taken for granted, everything that “just worked” but now has to be adjusted. My private website is ancient: I registered the domain in 1999, and at first it was hosted for free on a server run by a couple of friends from university, until the machine became too old to keep running, about five years ago. It was just a simple Debian Linux server, and to avoid overloading it, the site is mostly static HTML files, with a few CGI scripts (some shell scripts, some Perl scripts, some compiled C binaries) for “interactivity”.

When I suddenly had to find a new home for the site five years ago, I already had an account at Dreamhost, which hosts the domain for this blog. On Dreamhost I had been playing around with WordPress (for this site) and other things, but I basically just copied the other site over as-is. I have it all in version control (it was in CVS for a long while, but is now in Git), so getting the files over was fairly simple. The problem was fixing up all the references in scripts, the paths to home directories, and the subtle Apache configuration differences. Dreamhost was running Ubuntu, which is based on Debian, so the differences weren’t too big.

My friends also ran my email (with me doing all the spam filtering in Procmail). I didn’t want all my email to flow across the pond and back, so I got that hosted over at ABC-Klubben, which I have been a member of since around 1996. While their mail server was running BSD, it still supported Procmail, so after an initial mail loop and some mail being nuked on receipt, the old setup mostly worked.

I recently moved the site again. I didn’t really want a Swedish domain name hosted in the US, especially with the worrying political trends over there. I have lived in Norway since 2000, and after looking around for a while I ended up choosing Nordhost as my local provider. They run Debian, which means that most of the stuff just worked. Some of the supporting scripts have started to show their age (who writes Perl nowadays, besides me?), so I had to do some rewrites. I also didn’t manage to get my hack for including an RSS feed “dynamically” working (curl downloading the RSS and a Perl script outputting HTML for a dynamic include), so I rewrote it as a Python script that writes the HTML snippet to disk, to be included statically instead. Still very Web 1.0, but that’s how I like it.
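
The new setup boils down to two steps run periodically; a sketch with hypothetical names (rss-to-html.py stands in for the actual converter script):

# Download the feed, then regenerate the static snippet for inclusion
curl --silent https://blog.example.com/feed/ -o feed.rss
./rss-to-html.py feed.rss > $HOME/www/rss-snippet.html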

I also moved my email to the new host, so now it is hosted in Norway (outside the reach of FRA, perhaps, but now the Norwegians can spy on me instead). Nordhost doesn’t support Procmail, so I had to learn to configure Sieve instead. Fortunately I didn’t really need all that complicated Procmail machinery just to sort my mailing lists, and it was time to clean up anyway.
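
Sorting a mailing list in Sieve turns out to be pleasantly short; a minimal sketch, with a hypothetical list address:

require ["fileinto"];
# File messages from one mailing list into its own folder
if header :contains "list-id" "users.lists.example.org" {
    fileinto "lists.users";
}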

I am very happy with Nordhost. My next project is to move this site over to them too, and to stop using Dreamhost before the next bi-yearly billing cycle, mostly for economic reasons: Dreamhost finally figured out that they do need to charge Norwegian VAT (25%), and the US dollar is about twice as expensive now as when I set up the hosting there fifteen years ago.

On the futility of using Microsoft as an email provider

My $DAYJOB, like many others, has an Office 365 subscription, which includes services like Microsoft’s Outlook for email. At a previous job I had mixed experiences with Microsoft and their upside-down view of how email should work: after fighting with their abysmal Outlook software, and their even worse web interface, my previous employer eventually managed to get the company they had subcontracted for email services to open up IMAP, and that was the end of that. With my current employer I have used IMAP from the start, first continuing with the Opera Mail client, despite it being out of support, and recently moving to its spiritual successor, Vivaldi.

This was working fine; I could mostly ignore Outlook and its idiosyncrasies. Until 2022-11-01, that is.

The powers that be at Microsoft were rolling out several changes. One was to remove support for password log-in, instead forcing everyone to use a web browser for logging in (fortunately, that was postponed a bit), but the other was some kind of update to the IMAP servers that failed horribly. Since it was the email client that started complaining, I first thought it was a bug in Vivaldi, and posted about it there. After researching it, I found that Vivaldi was speaking IMAP correctly but the servers didn’t understand it. I was pointed to a thread at Microsoft’s support forum, where I was far from the only one having the problem, across different accounts and mail clients.

One would think that a company providing a paid-for email hosting service would be quick to fix an update that breaks its support for a standardized network protocol. But, no. I am writing this at the end of January 2023, and it is still broken. I guess now that the tech oligopoly of Microsoft and Google has embraced and twisted email into whatever they want it to mean, they can do what they want and not care about end-users, who have little recourse.

A sad state of affairs.

Anything can be imported to WordPress

A neighborhood association I am involved in has a web site for local news that grew out of some hand-made PHP scripts from 20 years ago. It has really shown its age, but the work involved in updating it is enormous, as each page is its own PHP script with partial HTML, with parts in various templates. It has over a thousand news articles that are worth keeping. What to do, in 2022, to get it manageable?

Well, it seems WordPress is the One True System™ for easy publishing these days, so import it there, then? But as I said, this is a home-grown system with over a thousand PHP pages with hand-edited code.

Perl to the rescue. Fortunately, there were two index files, one with “modern” articles (from after the initial scripting broke down) and one with “archived” pages (from before, where the page display quality was rather low), so at least I had a list of all the pages to import. Right? Well, not really: some pages had chained links to sub-pages (picture carousels and such), so I had to add those. And making something that parses a thousand “almost similar” PHP documents does require some effort, but after several hours of work I had a tool that reads the indices, inserts the missing pages where they belong, reads the pages and parses their contents and meta-data (headers, dates), and spits out a WordPress-compatible, 3-megabyte XML file for import.
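
The import format is WXR, WordPress’s RSS dialect with extra elements in a wp: namespace. A rough sketch of what one article becomes in the generated file (from memory, so treat the details as approximate):

<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:wp="http://wordpress.org/export/1.2/">
 <channel>
  <wp:wxr_version>1.2</wp:wxr_version>
  <item>
   <title>Article headline</title>
   <wp:post_type>post</wp:post_type>
   <wp:status>publish</wp:status>
   <wp:post_date>2008-05-17 12:00:00</wp:post_date>
   <content:encoded><![CDATA[<p>Article body as HTML.</p>]]></content:encoded>
  </item>
 </channel>
</rss>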

Thanks to WP Sandbox, I have managed to test the XML (and iteratively fix the bugs in it), so that I now have one XML file that imports all the articles (minus one that, no matter what I do, ends up being identified as a duplicate of another article; I will cut my losses and drop that one, it was just a link to another site anyway). Of course, I have none of the images and other pages on the sandbox, so I cannot really test that everything works out as expected; that will have to be done on the target site.

Oh, and, of course, I did make the script generate a giant .htaccess file with Redirect directives to map the old self-publish URLs to the new WordPress URLs. We can’t have old links become invalid, can we?
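
The generated directives are plain one-to-one mappings, one per article; a sketch with made-up URLs:

# Map an old hand-made PHP page to its new WordPress permalink
Redirect permanent /news/2008-05-17.php /2008/05/17/some-article-slug/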

Since this is obviously never ever going to be useful for anyone else, as this is a unique home-grown system, I have published the script over at GitHub.

Displaying EML attachments in Vivaldi on Linux

Until recently, I was using the old Opera Mail client for my work email. Opera Mail 12 was released in 2013; being almost ten years old by now, it is starting to show its age, with a lack of support for newer standards. Since I really like the way the Opera Mail client works, I was really looking forward to the work done by the amazing Vivaldi team (where a lot of my old colleagues from Opera Software ended up, together with a lot of other amazing people) on their Vivaldi Mail client.

As my $DAYJOB uses Microsoft 365 for their email, we have several Outlook users in the house, and sometimes I happen upon a forwarded email sent in .eml format. While Vivaldi actually uses this format to store email internally, it cannot display an attachment in this format (at least not as of October 2022). An .eml file is basically just a bare email message, so it should be easy to read, right? Apparently, not so much. Googling around mostly turned up recommendations to install Thunderbird and import it there. One question did have an interesting answer recommending mhonarc, using it to convert the message to HTML. I ended up expanding on that, writing a script that converts to HTML and then calls back to my stand-alone Vivaldi e-mail instance. I have this as /home/peter/bin/eml-viewer (if you’re using the main Vivaldi instance, remove the --user-data-dir option, otherwise replace it with the path to your configuration directory):

#!/bin/bash
if [ ! -e "$1" ]; then
    echo 'eml-viewer file.eml'
    exit 1
fi
# Convert the bare email message to HTML with MHonArc, then open the
# result in the stand-alone Vivaldi instance
OUT=$(mktemp --suffix .html)
/bin/mhonarc -single "$1" > "$OUT"
/bin/vivaldi-stable --user-data-dir=/home/peter/.config/vivaldi-standalone "$OUT"

The next step is to connect this up to open .eml files. For this, we need to create a matching .local/share/applications/eml-viewer.desktop file:

[Desktop Entry]
Version=1.0
Name=EML Viewer
GenericName=EML Viewer
Comment=View Outlook EML file
Exec=/home/peter/bin/eml-viewer %U
StartupNotify=false
Terminal=false
Icon=mail-mark-unread
Type=Application
Categories=Viewer;
MimeType=message/rfc822;

And finally, make it the default application for opening eml files:

xdg-mime default eml-viewer.desktop message/rfc822

This will leave the converted HTML files in your temporary directory, so take care if you handle sensitive email on a shared computer: set $TMPDIR to somewhere private and clear it periodically.
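
A sketch of that, as a few lines at the top of the eml-viewer script (the directory name and the one-day retention are my choices):

# Keep converted emails out of the shared /tmp; mktemp honours TMPDIR
export TMPDIR="$HOME/.cache/eml-viewer"
mkdir -p "$TMPDIR"
# Throw away converted files older than a day
find "$TMPDIR" -name '*.html' -mtime +0 -delete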

Reading iCalendar (.ics) files from Outlook on Linux

At $DAYJOB, email is handled through Microsoft’s Office 365, and with that I occasionally get event invitations in Microsoft’s internal format. As I am using an IMAP-based e-mail client (since I cannot stand Outlook Web Access), actually reading those invites can be a bit difficult.

With the default settings, the invitations are presented as a link into the Outlook Web Access client, with only the subject of the event readable (as the email subject). Everything else is completely hidden from the user. Thunderbird does have some built-in code that downloads the calendaring information and displays it to the user, but I am using a different email client and only get the web link.

In the settings for Outlook Web Access, there is an option to present invites as iCalendar files (MIME type text/calendar, extension .ics). Enabling this changes the emails so that the event text is presented in the message body, but all the important details, such as start time and location, are only present in the iCalendar file. And while the calendar is “readable” in the sense that it is a text file, it is not readable in the sense that it is easy to find out what it says.

I am running Linux on my desktop and do not have any calendaring software installed, so nothing wants to pick up the .ics file. And reading it in a text editor isn’t easy. There are several timestamps, and it takes a while to figure out that it is the third DTSTART entry that contains the event start time; the first two belong to the VTIMEZONE definition’s standard-time and daylight-saving-time rules (hence the year 1601):

$ grep DT attachment.ics
DTSTART:16010101T030000
DTSTART:16010101T020000
DTSTART;TZID=W. Europe Standard Time:20211103T100000
DTEND;TZID=W. Europe Standard Time:20211103T142500
DTSTAMP:20211102T150149Z

Trying to find software that will just view an ics file in a readable format isn’t easy. I don’t need calendaring software on my desktop (I do have a calendar app on my phone that I could use, though), but it would be nice to display it.

After some intense web searching, I found mutt-ics, a plug-in for the textual Mutt e-mail client. I am not using Mutt, but running the script on the ics file did produce readable output:

$ python ./mutt_ics/mutt_ics.py /tmp/attachment857.ics
[...]
Start: Wednesday, 03 November 2021, 10:00 CET
End: Wednesday, 03 November 2021, 14:25 CET

That’s a step forward. The next issue is that I am using a graphical e-mail client, and this is a text-mode script. The e-mail software runs “xdg-open” to open the file, so I had to create a few pieces to get it working. First, a wrapper that runs the script and shows the output using “xmessage” (other software would also work; I have not yet found out how to get xmessage to display UTF-8 text properly, so I might need to replace it eventually):

#!/bin/bash
# Render the iCalendar file as text with mutt-ics and show the result in
# an xmessage window; iconv works around xmessage's lack of UTF-8 support
python /home/peter/mutt-ics/mutt_ics/mutt_ics.py "$1" | iconv -c -f UTF-8 -t ISO8859-1 | xmessage -file -
exit 0

The next step was to make a .desktop file that defines the script as a handler for the text/calendar MIME type:

$ cat /home/peter/.local/share/applications/view-ics.desktop
[Desktop Entry]
Type=Application
Version=1.0
Name=View iCalendar
Exec=/home/peter/bin/view_ics
Terminal=false
MimeType=text/calendar;
StartupNotify=false

And to tie it all together, I have to register it as the default handler for text/calendar by running xdg-mime:

xdg-mime default view-ics.desktop text/calendar

There, now running “xdg-open file.ics” opens an xmessage dialog showing the calendar details in a new window. I managed to get it working just in time; the meeting starts in twenty minutes…

Watching the WWE Network on Linux: an update

A few years back, I wrote about how I had to set up WINE to be able to watch the WWE Network from Linux, as they are using an incompatible DRM (Digital Restrictions Management) system that is not supported on Linux. I have been using that setup pretty much unchanged since, while updating various components.

At one point I had to stop upgrading Firefox, as it started using features from newer versions of Windows that are not supported under WINE. I had configured WINE to make Firefox think it was running under Windows XP, and Firefox had since dropped XP support; if I tried to claim a newer version of Windows, Firefox failed to connect to the Internet. The version I stopped at is 53.0, which means that I am running an old, vulnerable Firefox version for this. So: do not use this browser for anything but watching the WWE Network! Doing so may trigger a vulnerability somewhere.

I did keep my Flash Player up to date, but the latest Adobe Flash Player update (version 32) failed to run under WINE. Why, I do not know; perhaps they also started using APIs that WINE does not support. I upgraded my WINE installation to the latest stable version (4.0), but to no avail: it was still hanging.

The only solution I have found so far has been to downgrade to Adobe Flash Player version 31. I had to dig a bit to find the download page for archived versions of Flash Player, and from there I ran the uninstaller and then downloaded the 31.0.0.153 archive. From that archive I installed the NPAPI Flash Player (31_0_r0_153/flashplayer31_0r0_153_win.exe) and now I have a working setup again.

So, an up-to-date WINE version, but an ancient Firefox and Flash Player. I hope WWE updates their web site soon to something a bit more modern that works on Linux as well, but nothing has happened for the last five years, so I am not holding my breath…

Gemini PDA first impressions

I loved the Psion range of PDAs back in the late 1990s and early 2000s. For a while I had a Revo, which was perfect for keeping a calendar and such on the go, but it eventually broke down and was replaced by smartphones.

My first smartphone was the Sony Ericsson P800; its OS was Symbian, the successor to the EPOC OS of the Psions, but now with just a touch screen and no keyboard. The follow-up units in the P series all had keyboards, and so did my first Android phone, Sony Ericsson’s X10 Mini Pro. But after that, it has been touchscreens only. Nice for watching movies and reading web sites, but a nightmare for writing longer texts, like this one.

Enter the Gemini. It launched as a crowdfunding project on Indiegogo in February 2017, and I fell in love with the idea immediately. With the same form factor as the Psion 5mx, an almost-full keyboard just large enough to type on with more than two fingers, and a design by ex-Psion folks, it looked like the device I had been missing since the Revo died.

Sometimes a laptop is just too big, and the netbooks from a few years ago were simply too cheaply made, but this form factor is perfect for typing on the go, like on board the bus where I am writing this.

I got my device earlier this week, and already it has replaced my laptop on several occasions. There is room for improvement, sure: the keyboard is not quite perfect yet, and Android might not be the best fit. I have not had time to install the Linux dual-boot image just yet, but I expect to spend most of my time there, as long as the 4G data works.

All in all, I am very happy with the device. I backed it immediately after reading about it in The Register, as backer number 10. There are now over 5000 backers, so there seems to be a market for a kind of device that has been missing for 15 years.

Running memtest86 on a Mac Mini

At $DAYJOB, we are having issues with a Mac Mini that is acting up. It crashes on boot, and re-installing macOS didn’t help, as it complained about the file system being damaged no matter whether I reformatted (“erased” in Apple-speak) or repartitioned the disk. The built-in Apple Diagnostics tool crashed after about 16 minutes, so I thought I’d run memtest86+ on the machine. But without a working OS to boot, I was unable to get it up and running, and googling for information didn’t help.

To get it running, I had to create a bootable USB stick, for which I had to find a Windows machine and run their USB Key installer. However, the stick did not show up in the list of boot options when booting the Mac Mini with the Option key held down. To find it, I had to install rEFInd on a second USB stick (they have a USB flash image ready for download, so no Windows machine is needed).
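
Writing the rEFInd flash image to a stick is a single dd invocation; a sketch, with the version number and device name as placeholders (double-check the device, dd will happily overwrite the wrong disk):

# Write the rEFInd USB flash image to the stick (replace X.Y.Z and sdX)
sudo dd if=refind-flashdrive-X.Y.Z.img of=/dev/sdX bs=4M status=progress
sync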

With both USB sticks in the Mac, booting with the Option key let me select the rEFInd USB stick, which in turn found the memtest86+ stick as a “Legacy Windows” image. Now the test started fine.

Sound output from the wrong jack

Debian recently released an update to their stable release, version 8.7, and with it an update to a slightly more recent Linux kernel version (up to 3.16 from 3.2). Well, that would be nice to have, I thought, so I updated my office workstation and rebooted. Everything looked fine; it even picked up and updated the Nvidia graphics driver that I always have problems with. But then, when I tried to play radio over the Internet, the sound suddenly started blaring out of a speaker inside the chassis that I didn’t even know the machine had, instead of the proper speakers I have connected.

At first I thought the driver was broken, so I rebooted back to the old kernel. Still wrong. Then I turned the power off and on again and started the old kernel: still the wrong output. Strange.

I have an HP Z220 Workstation (from 2013) at the office, with an “Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)” and a Realtek ALC221 codec chip (as per the output of lspci -v and /proc/asound/card0/codec#0). It took me an hour of intense googling to find the right keywords to turn up anything useful; apparently most English-language threads use “jack” for the outputs. I should have known that.

I eventually stumbled on an ArchLinux thread from 2014 which mentioned a tool called hdajackretask that can be used to rearrange the outputs from HDA cards. Debian distributes this utility in the alsa-tools-gui package. After installing the package and changing the output type, I managed to get sound playing through my speakers again.
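
For reference, installing and starting the tool on Debian is just:

# Install the ALSA GUI tools and launch the jack retasking utility
sudo apt-get install alsa-tools-gui
hdajackretask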

Screenshot from hdajackretask, setting “Green Line Out, Rear side” to “Line out (back)”.

Now to actually get some work done. That is Mondays for you.