Growing difficulties with rsnapshot spurred me into a long-overdue rethink
of how I do my backups. I decided to evaluate restic and rustic for this
purpose, and here are some notes on that.
A brief introduction on how rsnapshot works
For about two decades I have done all my backups with the venerable
rsnapshot.
rsnapshot's deal is:
- You'd usually run it on a central server.
- It connects out to the thing you want backed up using
rsync over ssh
and brings back all the data into a sequence of snapshot directories, for
example daily.0, daily.1, … daily.6, weekly.0, … weekly.3,
monthly.0, … monthly.56.
The names of these snapshots are arbitrary; the actual age of them simply
depends upon when you called rsnapshot and how many of them you told it to
keep.
For example if you used the configuration:
interval daily 7
…then every time you ran a backup it would do:
rm -r daily.6
mv daily.5 daily.6
mv daily.4 daily.5
mv daily.3 daily.4
mv daily.2 daily.3
mv daily.1 daily.2
mv daily.0 daily.1
cp -al daily.1 daily.0
By this means the oldest daily backup is destroyed, each earlier one gets
shifted back one day, and then the most recent one gets copied with hard links
as the basis for the new snapshot. Since hard links are used here, daily.0
and daily.1 are at this point identical but the data doesn't take up any
extra space (space is still needed for the inodes that contain the filesystem
metadata, so it's not entirely free, but no laws of physics are broken).
In reality you would likely have more intervals like:
interval daily 7
interval weekly 4
interval monthly 72
This would tell it to also move daily.6 to weekly.0, weekly.3 to
monthly.0 and so on.
The clever thing here is that the rsync that rsnapshot calls is only going
to mess with the file in the daily.0 directory if the current file on the
target differs from it in some way that rsync is able to determine. If
rsync sees no change then the file remains a hardlink to its previous
version and takes up almost no space.
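The rsync invocation behind this is nothing exotic; stripped of rsnapshot's
extra options it amounts to roughly:
# simplified version of what rsnapshot runs for one host; files rsync decides
# are unchanged get left alone, so the hard links from the cp -al step stay
# intact
rsync -a --delete root@foo.example.com:/etc/ daily.0/foo.example.com/etc/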
In a real rsnapshot setup there's a subdirectory in each interval for the
particular thing (e.g. a host) that is being backed up. So, the directory tree
will end up looking like:
$ tree /srv/rsnapshot
├── daily.0
│ ├── foo.example.com
│ │ ├── home
│ │ │ ├── andy
│ │ │ │ ├── .bash_profile
│ │ │ │ │
. . . . .
│ ├── bar.example.com
│ │ ├── home
│ │ │ ├── andy
│ │ │ │ ├── .bash_profile
│ │ │ │ │
. . . . .
├── daily.1
│ ├── foo.example.com
│ │ ├── home
│ │ │ ├── andy
│ │ │ │ ├── .bash_profile
│ │ │ │ │
. . . . .
│ ├── bar.example.com
│ │ ├── home
│ │ │ ├── andy
│ │ │ │ ├── .bash_profile
│ │ │ │ │
. . . . .
Advantages of rsnapshot
Simple deployment
You don't get much simpler than rsync over ssh and a Perl script with no
non-core dependencies, which you only have to have in one place. All the
backup targets require is a working ssh and an rsync binary.
Simple restores
All the files are just there on disk in a regular filesystem.
Limitations of rsnapshot
Quickly becomes unwieldy
Unless you're only backing up a fairly trivial amount of data, one inode per
file per interval quickly becomes an unwieldy filesystem tree to work with.
Operations on a tree of hundreds of millions of files are not cheap, even if
most of them are just hardlinks.
This is actually the worst limitation that rsnapshot has, though it's a
pretty short explanation. At every backup the rsync component has to
traverse the entire tree of that host's previous backup and for each file
decide whether it will do nothing (file hasn't changed, so hard link can stay
where it is) or if it must transfer the file (file changed so hard link must
be broken and new content stored). For any appreciable number of files this
will cause the scan time of the backup to be far in excess of the time spent
actually transferring data.
But so what? As long as you only want to do a backup say once a day, you have
all day to do it, right?
That's true, but that's not where the most pain exists. The most pain exists
when trying to manage the backups — when trying to do the basic admin tasks
that are inevitable in a working system. Things like trying to work out…
- how much data changed between two different backup runs for a given host
- which files exactly changed between two different backup runs
…and things of that nature.
Detecting change with hard links
Thankfully checking if two exact file paths are identical is still pretty
cheap, thanks to hard links.
Under rsnapshot's design, if two versions of a file path are identical then
they should be hard links to each other, and hard links all have the same
inode number. That is, if daily.0/foo.example.com/home/andy/.bash_profile
and daily.1/foo.example.com/home/andy/.bash_profile have the same inode
number then they are by definition the same file, which means there were no
detectable changes between the times that rsnapshot ran. There's no need to
look at the files' content; a stat() system call will do.
Of course, if they aren't the same inode number then they're probably
different files and you would have to confirm that by looking at the file
content.
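Using the example paths from above, a check along these lines is all that's
needed, falling back to a content comparison only when the inodes differ:
old=daily.1/foo.example.com/home/andy/.bash_profile
new=daily.0/foo.example.com/home/andy/.bash_profile

if [ "$(stat -c %i "$old")" = "$(stat -c %i "$new")" ]; then
    echo "same inode: hard-linked, so no change between the two backups"
elif cmp -s "$old" "$new"; then
    echo "different inodes but identical content (metadata-only change)"
else
    echo "content changed between the two backups"
fi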
I want to be clear that this only becomes a problem when you have a really
large number of files that you are keeping for many rsnapshot intervals.
Exactly how many you can have before it becomes unwieldy will depend upon how
beefy your backup server is, mainly in the form of how many random I/O
operations per second its storage can do. If you're far off from that point
then rsnapshot is a really good system because it's so simple!
The rest of the limitations are ones I could have put up with, but let's talk
about them anyway.
Crude deduplication
When most people think about the term deduplication, they think of it in
terms of chunks of data. rsnapshot can only do deduplication in terms of
whole files.
Due to the use of hardlinks, entirely identical files do not consume any
additional space for their data, just an inode for the hardlink. The "entirely
identical" part of that sentence is concealing a multitude of caveats. As
soon as anything about the file changes, rsync will send a new version of
the entire file.
If just the ownership, permissions or any other metadata like modification
time changes, the hard link gets broken and the file's data gets stored again
in full.
If just one byte is added (or removed!) you'll get an entirely new copy of
all the file's data.
What's more, the file paths have to remain the same too! If you have
/srv/immense_tree_of_files and rename it to
/srv/why_did_i_store_this_immense_tree_of_files, your backups will contain
the entire content of both file trees until the oldest backup ages out. It
doesn't matter that every file within that tree is still identical to the same
path within the other tree.
If you're backing up a big log file that gets a little bit appended and then
rotated, you'll get a full copy under the new name (e.g. syslog.1) and
multiple full copies under its current name.
As a consequence of all of the above it also follows that the same file path
and contents between backup targets will always be backed up multiple times,
because there is no deduplication between targets, i.e.
daily.0/foo.example.com/home/andy/hugefile and
daily.0/bar.example.com/home/andy/hugefile both get stored even if they are
actually the same file.
Even crude deduplication can get you a long way though. The seven daily.*
intervals plus the four weekly.* intervals of my current rsnapshot
repository reference 13.1 times as much data as actually appears on disk. That
is, if I'd simply stored a copy every day for the last seven days plus also a
copy once per week for four weeks prior to that, I would need more than 13
times as much storage to do it.
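If you want to see that sort of ratio for yourself on a plain hard-link-based
tree, one rough way is to lean on the fact that GNU du only counts each
hard-linked inode once per invocation:
# referenced data: each interval measured separately, so shared files are
# counted once per interval
for d in daily.* weekly.*; do du -sB1 "$d"; done | awk '{total += $1} END {print total}'

# actual data on disk: a single du invocation counts each hard-linked inode
# only once
du -csB1 daily.* weekly.* | tail -n 1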
No compression
The file tree is just stored on disk in a regular filesystem. There's no
compression going on unless the filesystem offers it.
No encryption
The file tree is just stored on disk in a regular filesystem. There's no
encryption going on unless the filesystem offers it.
Not very portable
This wasn't a factor for me because all of my machines started off being Unix
and later ended up being only Linux, but I suppose some people might have
difficulty getting their backup targets to run an sshd and have a working
rsync.
A short-term band-aid
Running up against all of those limitations I took some steps to prolong the
life of what I was doing.
Firstly I used LUKS to format the backup filesystem, so on disk it's encrypted
and hopefully anyone coming into possession of the backup server would have a
power cycle in between and so wasn't going to be able to get at it.
Secondly, I was running out of storage capacity so I put the backup filesystem
on btrfs with a mount option of compress-force=zstd so that it would
always try to compress everything. That reduced the total size on disk by
about 24%. Approximately 57% of my backed up data (by bytes) is not
compressible at all. The rest did compress down to around 45% of its original
size on average.
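For reference, the whole arrangement is only a few commands; the device name
and mount point here are made up:
# one-time: encrypt the backup block device and make a filesystem inside it
cryptsetup luksFormat /dev/sdb1
cryptsetup open /dev/sdb1 backup
mkfs.btrfs /dev/mapper/backup

# at each boot: unlock, then mount with forced compression; a level can be
# appended to the option, e.g. compress-force=zstd:1
cryptsetup open /dev/sdb1 backup
mount -o compress-force=zstd /dev/mapper/backup /srv/rsnapshot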
Thirdly, in an attempt to make deduplication slightly more effective, I
changed from using rsync's hard links support to doing a
cp -a --reflink=always and made rsync do --in-place. That uses a
reflinked copy instead of a hardlink and then hopefully when the file is
changed, only the parts of it that actually did change will get new extents on
the filesystem.
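Conceptually that replaces the cp -al rotation step shown earlier with
something like the following; the rsync line is a simplified stand-in for
what rsnapshot actually runs:
# make the new snapshot as a reflinked (copy-on-write) copy of the previous one
cp -a --reflink=always daily.1 daily.0

# then update it in place so unchanged extents stay shared with daily.1
rsync -a --delete --in-place \
    root@foo.example.com:/etc/ daily.0/foo.example.com/etc/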
This last change still did not address the problem of renamed paths since
rsync will write a whole new file if there's not one in place already.
I don't have the data to make a determination of how well this worked for
deduplication but my gut feeling is that it wasn't a huge amount. What also
happened is that it made it much harder to manage, so if I could wind back
time I would probably stop at encryption and compression.
Breaking point
Recently the disk space I had available for storing the backups was exhausted
again and I found myself having a very hard time actually working out why. I
mean, obviously the high level why is because I backed up too much stuff!
But answering questions like:
- which interval grew by the most?
- which host in an interval grew the most?
- which files changed between two specific pairs of interval/host (e.g.
daily.1/foo.example.com vs daily.0/foo.example.com)?
…were extremely slow to compute given a tree of hardlinks. There are currently
over 400 million files in my rsnapshot tree.
My use of reflinking made it even more difficult. Before reflinks it was a
cheap operation to look at the inode numbers of two files. Remember: if
they're the same then by definition the files are the same file. With reflinks
in use each file is a collection of extents and there's no way to tell if
they're identical without listing out the extent ranges and seeing if they
match (and then if they don't you probably have to compare the content
anyway, just to be sure).
Just walking through a few trees out of the filesystem was taking more than an
hour and that's before trying to actually do anything with any of the files.
My gut feeling was also that there was more scope for deduplication, but I
could think of no practical way to do it. By this point my backup host had
four HDDs in a RAID-10 and 64GiB of RAM but trying to run something like
duperemove across the whole backup tree would take all day (during which
time no more backups could happen!) before running out of RAM and dying.
I gave BEES a go since this was by now a btrfs filesystem, but never got
it to work properly. I considered zfs with its native deduplication but I
found even 64GiB RAM wouldn't be enough. I could do without zfs's
deduplication because the inherent deduplication of zfs snapshots would
probably be enough, but my tests showed I'd still need more expensive hardware
and my other off-site backups of this would have to be done differently as
well (much more expense).
In the end I did determine that there wasn't actually anything
unexpected about my rsnapshot backups. They had simply outgrown the storage
I had available.
Fixing that would require a new backup server though and I started to wonder
to myself if just setting up the same rsnapshot on a new server was the best
that I could do. Could I face spending all the effort just to end up with a
thing that had more storage but still so many irritating limitations?
Having a look at the most recommended modern Unix backup systems, I decided to
evaluate BorgBackup, restic and rustic as alternatives.
First impressions of alternatives
BorgBackup
rsnapshot is a pull-based backup system: the backup host connects out to
every machine to be backed up and pulls the data back to itself. Every other
system I was going to look at is push-based: The machine that is being
backed up runs something that connects out to the backup server to push its
data in to the backup repository.
Under rsnapshot each machine only had to run an sshd and have an rsync
binary, but with any of the alternatives I looked at each machine would have
to run the backup software itself.
Since BorgBackup is a Python application, this ruled it out for me straight
away. I do not generally enjoy deploying Python applications and the idea of
having to do it on pretty much every system I run was not appealing. I don't
care if it could possibly be done by any of the means of packing a Python app
into a single executable blob. I don't want to deal with it.
Moving on.
restic
restic at first glance has all the attributes I was looking for:
- It's a Go application, so its single binary is completely static and will
run anywhere. I embarrassingly do still have a couple of 32-bit (i686)
hosts, so I'd just need two different
restic binaries.
- It does encrypted, deduplicated backups.
- It supports many storage backends. I really only needed sftp, but it can
also talk to a thing called rest-server, which is a separate application
that exposes a restic repository over HTTP(S). This performs better than
sftp, offers more interesting authentication options, and has some other
useful properties like "append-only mode":
The --append-only mode allows creation of new backups but prevents
deletion and modification of existing backups. This can be useful when
backing up systems that have a potential of being hacked.
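Running rest-server in that mode is a one-liner; this is a minimal sketch
with the data path made up and the TLS and authentication options left out:
# serve repositories under /srv/restic over HTTP, refusing deletion or
# modification of existing data
rest-server --path /srv/restic --listen :8000 --append-only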
restic seems to have a fairly active set of developers and a reasonably
large user community.
rustic
rustic is a re-implementation of restic but in Rust. restic is
well-documented and is of course open source, so its repository format is
known. This has enabled a few different tools to work with restic
repositories, perhaps the most ambitious being rustic which aims to be a
complete but compatible alternative.
As far as deployment goes, Rust applications are almost as easy as Go ones as
they statically link everything except the C library. This does mean that a
few different compiles can be required depending on the version of glibc
present on the host. Alternatively, a completely static binary can be compiled
that uses musl libc instead of glibc, and that should work anywhere with the
same architecture. Deployment was not going to be an issue.
Initial experimentation suggested that rustic might have a slightly flashier
user interface (better progress bars, etc.), a few more convenient commands
and slightly lower memory usage. I decided to start with restic though,
owing to it being the more established project.
Importing data from rsnapshot
It would have been quite easy to just cut over from using rsnapshot to using
the new thing, but I decided that it would be wise to import as much data as
possible so that I could get a good idea of how the new solution would scale
when it actually had an appreciable amount of data in it.
This was going to take quite a long time. The machine with the rsnapshot
data on it is at a remote data centre, and the new machine I was experimenting
with is at a different data centre of the same hosting company. While there
was only 1.6 TiB of data on disk, any new system was going to have to read it
all as though it were not deduplicated, which in this case meant more than 14 TiB of
data to be processed.
Pathname conundrum
The first stumbling block was restic's handling of path names. If you
recall, I was starting with an rsnapshot repository with top level directory
trees like this:
$ tree -L 1 daily.0/foo.example.com/
daily.0/foo.example.com/
├── etc
├── home
├── opt
├── srv
└── var
The daily.0 here is the interval of that backup that rsnapshot took. The
directory foo.example.com contains the backups for a host by that name, and
data for directories like /etc/, /home/ and so on are within that. So,
given that I can tell restic both the host name and the time of the backup
it is about to do (as opposed to having it assume hostname and current time), I
could just script it being run against every host directory in every interval,
right? Right???
Well it turns out that restic always wants to fully-qualify path names, so
in the example above it would not be backing up /etc/, it would be backing
up /srv/rsnapshot/daily.0/foo.example.com/etc/! The time and host name would
be correctly faked, but those paths would not be the same as a later real
backup run on the real foo.example.com.
The consequences of this are not dire.
restic uses a content-addressed store, so these path mismatches don't affect
deduplication — data will be chunked, and if that content is already in there
then it won't be sent or stored twice no matter where it is found again; all
that gets stored again is a small amount of metadata for the paths.
What it does affect is historical context: the ability to tell that
/srv/rsnapshot/daily.0/foo.example.com/etc/fstab at time t is the same
file as /etc/fstab at time t+1, where time t was a backup taken with
rsnapshot and t+1 one taken later with restic. A diff command run
between these two snapshots will just show all the paths as removed files
and then all the files put back again as new.
This could just be accepted: It would only matter for queries across that
single boundary in time. restic's developers don't consider it a big deal
and don't seem to have any plans to do anything about it. Fair enough. It
bothered me though.
This is one area where rustic is a bit friendlier. It supports relative
paths! So, if you:
- change into an rsnapshot directory like daily.0/foo.example.com
- tell it to back up etc/, home/, opt/, srv/ and var/ (relative paths)
- go to the real foo.example.com host
- change to /
- again back up etc/, home/, opt/, srv/ and var/ (relative paths)
then it will
- correctly determine that the first backup can be the "parent" of the
current one, and
- store matching paths which can be properly diffed afterwards!
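In command terms the rustic side of that looks roughly like this; the
repository settings are assumed to come from rustic's configuration file and
the option spellings should be treated as illustrative:
# import from the rsnapshot tree while pretending to be foo.example.com
cd /srv/rsnapshot/daily.0/foo.example.com
rustic backup --host foo.example.com etc home opt srv var

# later, on the real foo.example.com
cd /
rustic backup etc home opt srv var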
Parent snapshots
The idea of "parent" snapshots is just to speed up change detection. If
restic sees that it already did a backup for this host with this same set of
paths then it will consider the most recent snapshot to be the parent of the
new one. That just enables restic to compare its stored file metadata with
the file metadata on the host being backed up, so it can decide whether it has
changed and thus requires backing up again.
If restic can't identify a parent snapshot then it has to read all of the
file content before it can determine whether any of it needs sending and
storing.
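restic picks the parent automatically, but it can also be pointed at a
specific snapshot with its --parent option (the snapshot ID here is
invented):
$ sudo restic backup --parent 1a2b3c4d /etc /home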
Another nicety of rustic is that it can fake the time of a backup on the
command line, whereas restic can only do so after the backup has completed —
you basically issue a command to alter the metadata of the snapshot to say it
happened at a different time. This creates a new snapshot with the desired
time, leaving you to forget and prune the old one afterwards. That wasn't
a huge deal, it was just a bit more convenient.
I decided to build a script around rustic that would start importing the
rsnapshot backups.
restic vs rustic terminology
Even though at this point I was using the rustic binary to import backups, I
will still generally call this system restic in this article because it's a
restic structured repository and the protocol is what restic says it is.
When I say rustic it will be in regard to the specific use of the rustic
binary.
The import script
I wrote a script that would operate on an individual interval directory of the
rsnapshot tree, e.g. daily.0 or weekly.3 and so on. The script would:
- Get the modification time of the parent directory and assume the backup
happened at that time
- Iterate through each subdirectory, using the subdirectory as the host name
for that backup and the time from step 1 as the time of the backup
This was using rustic over HTTPS to a rest-server in append-only mode.
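The script isn't worth reproducing in full, but it boiled down to something
like this sketch; the option names and time format are from memory so treat
them as illustrative, and the repository details come from rustic's config
file:
#!/bin/bash
# Usage: import-interval /srv/rsnapshot/daily.0
set -eu

interval_dir=$1
# assume the backup happened when the interval directory was last modified
backup_time=$(date -d "@$(stat -c %Y "$interval_dir")" '+%Y-%m-%d %H:%M:%S')
# e.g. daily.0 -> rsnapshot_daily_0
interval_tag=rsnapshot_$(basename "$interval_dir" | tr '.' '_')

for host_dir in "$interval_dir"/*/; do
    host=$(basename "$host_dir")
    (
        # back up relative paths from inside the host's directory so the
        # stored paths match a later real backup of that host
        cd "$host_dir"
        rustic backup \
            --host "$host" \
            --time "$backup_time" \
            --tag "from_rsnapshot,$interval_tag" \
            */
    )
done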
Importing the first interval got most of the unique data into the restic
repository and after that each interval took about 2 hours to process. This
time was quite predictable and I assume this was because rustic's work was
dominated by processing through the data on disk, checking it against the
repository and mostly finding that there was nothing to send.
Additionally, backups in restic can have tags as part of their metadata. I
decided it would probably be helpful to tag all of these imports with
from_rsnapshot and a tag based on the rsnapshot interval, e.g.
rsnapshot_daily_0, rsnapshot_weekly_2, etc.
This was all going very well! I'd got all the daily.* intervals imported and
was pleased to note that the 568 GiB of rsnapshot data this comprised had
translated into 515 GiB in restic. The btrfs filesystem that the
rsnapshot data was in had zstd:1 compression on it, and restic is using
zstd for compression too, so this was probably due to the better
deduplication.
Some steps towards real backups
Since I had quite a lot of spare time waiting for all these scripted imports
to complete, I decided to work on my configuration management (Ansible) to set
up each host to push a backup into restic every day.
Ignoring things that shouldn't be backed up
A tedious part of this was converting from my old way of ignoring files and
directories into something that rustic supported.
My rsnapshot backups of course were using rsync which has its own filter
language. I had set that to look for files called .bitfolk-rsync-filter in
each directory. The rules in that file would only apply to entries in that
directory. I had files like this:
- cache/
- .cache/
rustic has a --glob-file option where you specify a file that contains a
list of glob patterns to exclude (or include). Exclude lines to replicate the
above could be in just one file and would look like:
!/var/cache/
!/home/andy/.cache/
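That single file then gets passed at backup time; the path here is of course
just an example:
$ sudo rustic backup --glob-file /etc/rustic/excludes /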
It's pretty clear how to convert from one to the other but I've got many
hosts and I hadn't been disciplined about deploying the
.bitfolk-rsync-filter files from config management. I ended up running this
bit of bash:
for f in $(sudo find /data /etc /usr/local /var \
-type f -name .bitfolk-rsync-filter); do
echo "# $f"
dn=$(dirname "$f")
sudo grep -Ev '^(#|$)' "$f" | sed "s|^- |!$dn/|"
done
That finds all the .bitfolk-rsync-filter files on the system and produces
output like:
# /usr/share/.bitfolk-rsync-filter
!/usr/share/doc-base/
!/usr/share/doc/
!/usr/share/locale/
!/usr/share/zoneinfo/
# /etc/.bitfolk-rsync-filter
!/etc/logcheck/
# /var/.bitfolk-rsync-filter
!/var/cache/
!/var/lock/
!/var/log/
# /var/spool/.bitfolk-rsync-filter
!/var/spool/exim4/
!/var/spool/uptimed/
# /var/backups/.bitfolk-rsync-filter
!/var/backups/dpkg.status.*.gz
# /var/lib/.bitfolk-rsync-filter
!/var/lib/apt-xapian-index/
!/var/lib/apticron/
!/var/lib/arpwatch/
!/var/lib/greylistd/
!/var/lib/logcheck/
!/var/lib/logrotate/
!/var/lib/mysql/
!/var/lib/node_exporter/
!/var/lib/pengine/
!/var/lib/percentilemon/
!/var/lib/php5/
!/var/lib/schroot/
!/var/lib/smokeping/
!/var/lib/spamassassin/
!/var/lib/sudo/
…and so on.
I took the opportunity to check all of these ignores were still relevant, and
put them into config management. This took a really long time!
Spreading the jobs out a bit
The vast majority of the hosts being backed up are virtual machines on a
smaller number of bare metal servers. I didn't want them all to start running
their backup jobs at the same time.
In a systemd timer unit you can do a RandomizedDelaySec= to spread
activations out randomly. I wanted there to be 24 hours between runs on any
given host though. Since Debian 12 (bookworm) you can also specify
FixedRandomDelay=true and then the delay will be random per host but
deterministic on that host.
So, here's an example of a timer that triggers within 6 hours from 21:00:00:
[Unit]
Description=Do a daily backup starting at a random time \
within 6 hours from 21:00
[Timer]
OnCalendar=*-*-* 21:00:00
RandomizedDelaySec=6h
FixedRandomDelay=true
Persistent=false
Unit=rustic-backup.service
[Install]
WantedBy=timers.target
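The rustic-backup.service it triggers isn't shown above; it's just a oneshot
unit along these lines, with the binary path and invocation being
illustrative (the backup sources and repository live in the config file):
[Unit]
Description=Back up this host

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/rustic backup
Nice=10
IOSchedulingClass=idle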
Repository corruption incident
At this point I had a scripted import running from the rsnapshot host and
had most of the production hosts ready to start doing daily backups overnight.
The next day I found that my new backup host had filled its filesystem, and
due to this my import script had bailed out and there were several
partially-completed overnight backups. I had obviously missed some things on
the production hosts that should have been excluded from being backed up.
I was able to get a list of all backups ("snapshots", in restic terminology)
that had completed so I tried to do a diff between the most recent snapshot
for a host and its corresponding one tagged rsnapshot_daily_0 in order to
attempt to work out what I had accidentally backed up. This resulted in
rustic complaining about missing pack files and crashing!
I thought perhaps it was because the filesystem was full. I grew the
filesystem a bit. Same problem.
I thought perhaps it was because some of the last backups had only partially
completed and the repository might need the check command to be run,
possibly followed by some of the repair commands. I ran check and got:
$ sudo rustic check
[INFO] using config /etc/rustic/rustic.toml
[INFO] repository local:/srv/restic/repo: password is correct.
[INFO] using no cache
[00:00:00] getting snapshots... ████████████████████████████████████████
[00:00:01] reading index... ████████████████████████████████████████ 577/577
[00:00:00] listing packs...
[WARN] pack e04d6298 not referenced in index. Can be a parallel backup job. To repair: 'rustic repair index'.
[WARN] pack e0351889 not referenced in index. Can be a parallel backup job. To repair: 'rustic repair index'.
[WARN] pack 51cab1d1 not referenced in index. Can be a parallel backup job. To repair: 'rustic repair index'.
[WARN] pack 515975ce not referenced in index. Can be a parallel backup job. To repair: 'rustic repair index'.
[WARN] pack 51942075 not referenced in index. Can be a parallel backup job. To repair: 'rustic repair index'.
[…many more of this similar message…]
[00:00:00] listing packs...
[00:00:00] checking trees... ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 0/904
[ERROR] dir "boot" subtree blob 64b7176d is missing in index
[ERROR] dir "data" subtree blob f614dc41 is missing in index
[ERROR] dir "etc" subtree blob 9ff42fd0 is missing in index
[ERROR] dir "home" subtree blob f4f1f48e is missing in index
error: `rustic_core` experienced an error related to `internal operations`.
Message:
Tree ID `64b7176d` not found in index
Some additional details ...
Backtrace:
disabled backtrace (set 'RUST_BACKTRACE="1"' environment variable to enable)
So, repair index then?
$ sudo rustic repair index
[INFO] using config /etc/rustic/rustic.toml
[INFO] repository local:/srv/restic/repo: password is correct.
[INFO] using no cache
[WARN] error reading pack 5e7121d3 (-> removing from index): Error: Data is too short (less than 16 bytes), cannot decrypt. (kind: related to cryptographic operations)
[WARN] error reading pack 50209f8e (-> removing from index): Error: Data is too short (less than 16 bytes), cannot decrypt. (kind: related to cryptographic operations)
[…many more…]
$
That one was at least completing without crashing, but afterwards check
still crashed and repair index when run again still reported the same
problems.
The documentation warns against trying to continue doing backups if the
check command did not report a healthy repository, so my import was now
stalled and future backups were not happening. I spent two days trying to get
help with this, but only the main author of rustic was responding to me. I
started to become uncomfortable that if I had problems with my backups there
would be limited resources to help me out.
I decided to stop evaluating rustic at this point and spend more time with
restic, allowing for its shortcomings that I'd already identified (the path
names and specifying fake times things).
A return to restic
Since I'd decided to start over again with restic I was prepared to destroy
the data in my restic repository that had been put in by rustic, which was
currently corrupt and unusable anyway. The restic documentation did suggest
that running repair snapshots might sort things out. Of course, they would
never promise to support a repository with snapshots that different software
(rustic) had put into it, but I had nothing to lose.
I ran restic repair snapshots and this claimed to work, after removing a few
of the partially-completed backups. A later restic check came back clean.
rustic most likely did nothing incorrect
I want to stress that I've no reason to believe that I couldn't have fixed my
repository by running rustic repair snapshots and carried on using rustic.
The rustic author just hadn't yet got as far as suggesting that I run
repair snapshots. I also have no reason to be sure that restic would have
behaved any differently if I had allowed it to fill the filesystem of the
repository. The repository format is supposed to be compatible so they
probably would have behaved the same.
What made the difference, and what made me decide to carry on with restic
for now, is that the documentation and support available allowed me to feel
more confident in what I was doing.
I was able to see both from logs on the machines being backed up and from the
list of snapshots which ones had added the most data, and by doing a
restic diff between the snapshot tagged rsnapshot_daily_0 and the most
recent snapshot for a given host tagged auto (the tag I was using for the
new, regular daily backups) I could see lots of things I had failed to
exclude.
It is possible to tell restic to remove some paths from an existing
snapshot. It just makes a new snapshot without the specified data present and
then you tell it to forget and prune the original snapshot(s). This
unwanted data was only in some of the previous night's snapshots tagged auto
though, so I just got rid of those snapshots entirely.
I was now ready to resume my imports and daily backups but I was still a
little paranoid about the health of the repository. check and every other
command I ran showed me sensible output and really I probably shouldn't have
been concerned because, well, this store is encrypted, right? If it was
corrupted I wouldn't be able to run check successfully nor be able to list
off the files within the snapshots.
As a compromise I decided to run the imports again but first:
- Mark all existing imports with tag suspect so I could identify them later.
- Plan to run new imports with extra tag second_try, again so they could be
distinguished.
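The first step was a single command along these lines, selecting the imports
by the tag they already carried:
$ sudo restic tag --add suspect --tag from_rsnapshot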
My thought here was that if I am importing the exact same data over again then
yes it will still take ages to read all of that, but as long as the data
that's in the repository already is still correct then restic is just going
to see the same data and not actually send it in.
Dealing with the absolute path names
As mentioned, one of my major reasons for exploring rustic instead of
restic was so I could make the path names in the backups match. I wasn't
going to let this beat me.
Given that restic is a static binary it's extremely easy to run it in a
chroot, because it just doesn't need much else besides the binary itself.
For example, let's say I want to back up the directories etc/, home/ and
srv/ that are found within /srv/rsnapshot/daily.0/foo.example.com/, but I
want restic to think those directories are actually at /. I just have to
do this:
# mount --bind /srv/rsnapshot/daily.0/foo.example.com /mnt/fake_restic_root
# cp /usr/local/sbin/restic /mnt/fake_restic_root/restic
# cp /usr/local/etc/restic/passwd /mnt/fake_restic_root/restic_passwd
# mkdir /mnt/fake_restic_root/tmp
# mount --bind /tmp /mnt/fake_restic_root/tmp
# chroot /mnt/fake_restic_root \
/restic --insecure-tls \
-r 'https://user:pass@192.168.10.20/' \
-p /restic_passwd \
--verbose \
backup \
--host foo.example.com \
--tag from_rsnapshot,rsnapshot_daily_0,second_try \
/etc /home /srv
This is an abomination, but it works, and I only had to do it once. It made
restic see all the data directories as if they are rooted at /. Since I
hadn't set up resolv.conf or anything there's no DNS resolution so the
(remote) repository had to be specified by IP address. As there's no CA store
inside there the --insecure-tls flag had to be used.
It wasn't hard to add that to my import script and set it going again.
Dealing with the time faking
The other nicety of rustic was the ability to specify the fake backup time
on the command line. With restic I found it easier to just do the import and
then afterwards do:
$ for d in daily.*; do stat -c '%n %y' $d; done
daily.0 2025-08-17 21:03:22.000000000 +0000
daily.1 2025-08-16 20:45:51.000000000 +0000
daily.2 2025-08-15 21:00:44.000000000 +0000
daily.3 2025-08-14 20:46:47.000000000 +0000
daily.4 2025-08-13 20:51:27.000000000 +0000
daily.5 2025-08-12 21:50:05.000000000 +0000
daily.6 2025-08-11 20:33:44.000000000 +0000
I then had a script over on the repository server go through every snapshot
that was tagged both rsnapshot_daily_0 and second_try and adjust its time
with the equivalent of:
$ sudo restic rewrite \
--forget \
--tag rsnapshot_daily_0,second_try \
--new-time "2025-08-17 21:03:22"
and so on for each other interval.
It was easier to do this on the repository host because clients are accessing
the repository through a rest-server that is in append-only mode: they can't
actually forget and prune old snapshots.
Import all done
Finally the import was all done and I was satisfied with it. I did a
forget --prune on all the snapshots tagged with suspect and had a look at
the new situation.
The 1.6 TiB of data from rsnapshot was all in restic where it took up 920
GiB.
Memory usage can be problematic
Daily backups have been happening for a while now using restic. I have quite
a few low spec virtual machines that have only 1 GiB or 1.5 GiB of memory and
this has proven to be a problem. restic uses between 600 and 800 MiB of
memory, which is just too much for those tiny VMs even though they don't have a
lot of data to back up.
Searching around I found a recommendation to set the environment variable
GOGC=20. That does seem to reduce usage by about 10% for me.
I was able to make it work on these low memory VMs by giving them another 1
GiB of swapfile. Obviously this isn't ideal as it makes the backup run take
longer, and also blows out the disk cache of the VM every night.
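On the affected VMs that worked out to something like this; the GOGC setting
really lives in the backup service's environment rather than an interactive
shell:
# ask the Go runtime to garbage collect more aggressively
export GOGC=20

# one-off: add a 1 GiB swapfile for some extra headroom during backups
fallocate -l 1G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile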
It's possible I may dare to go back to trying rustic on these VMs.
TODO
Pruning
I haven't yet set old backups to expire. That's basically going to be
something like this:
$ sudo restic forget \
--tag from_rsnapshot \
--group-by host \
--keep-within-daily 8d \
--keep-within-weekly 1m7d \
--keep-within-monthly 13m \
--keep-within-yearly 6y1m
$ sudo restic prune
(and then again for the auto tag.)
Decide about the additional remote backups
There is more than one additional remote backup, but those are paused at the
moment because they all used to come out of rsnapshot. I need to decide how
to re-do them. At least one of them probably should not be inside an opaque
blob like a restic repository.
For things like databases I've generally been doing a daily dump into the
filesystem, compressing that with gzip --rsyncable and then backing that
file up. That works but it's not ideal as:
- It stores data twice on the filesystem
- It changes every time even when the database doesn't
- Even small changes in data produce large deltas in the
gzipped file
restic supports backups directly from standard input. That will solve the
above issues and will still be stored compressed in the repository.
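For a database that would look something like the line below, with restic
chunking and compressing the dump as it reads it; the dump command and the
name given to the stream are just examples:
$ mysqldump --all-databases --single-transaction \
    | restic backup --stdin --stdin-filename mysql/all-databases.sql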
Move to multiple repositories
I've come to the decision that a single restic repository for all hosts
being backed up is too risky from a security point of view.
The issue is that for the backup to be automated the repository secret key
must be on the client machine. If the client machine — any client machine —
is compromised then the attacker has the secret key needed to decrypt the
backups for absolutely everything.
The append-only mode of the rest-server means that the attacker would be
unable to destroy the backup data, but having access to absolutely all data is
unacceptable.
The good news is that rest-server can talk to multiple repositories each
with its own keys. Client machines can have individual credentials and an
individual repository URL and they will not be able to access or decrypt
anything else.
The additional good news is that there's a copy command to copy snapshots
from one repository to another, so I can fairly easily reconfigure clients and
move their old snapshots over. I've started to do this.
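The copy itself is just a matter of pointing restic at both repositories; the
paths and password files below are placeholders:
$ sudo restic -r /srv/restic/foo.example.com \
    --password-file /etc/restic/foo.example.com.pass \
    copy \
    --from-repo /srv/restic/megarepo \
    --from-password-file /etc/restic/megarepo.pass \
    --host foo.example.com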
The bad news of course is that there won't be any cross-host deduplication
after this. I'm going to have to live with that. There isn't any cross-host
deduplication in rsnapshot either; in fact there is no deduplication there
at all except for between exact path matches. If things went from 1.6 TiB in
rsnapshot to 920 GiB in a restic mega-repo then without cross-host
deduplication we could expect it to move a bit more towards 1.6 TiB, but not
all the way. Probably not all that far, actually. I shall report back.
It was running out of storage capacity that prompted all this in the first
place, but only because I wanted the new thing to be easier to manage. I don't
actually in principle have too much of an issue with 920 GiB inflating even
1.7x, though I would hope for less.