For systemd this all creates various challenges. One specifically is what this episode is about: there's a per-user service manager for each user, and it manages cgroups. When invoking a userns-based container, it makes sense to delegate a cgroup to it, so that the container has all it needs to boot a full-blown systemd inside. Delegation means assigning ownership of the cgroup to the UID range used for the userns. But this then means that the per-user service manager…
…will lack the privs to clean up the cgroup delegated to the container, since after all it just runs under the user's UID, and it has no knowledge of the userns or the UID mappings created by the container manager.
And this is a problem for robustness: it means that the container executor has to carefully clean up after itself, and never leave cgroups around, because unlike almost all other resources, the service manager managing that container executor is unable to clean up after it.
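To make the cleanup problem concrete, here is a minimal sketch of what "clean up after itself" boils down to for a cgroup subtree: remove the directories bottom-up and note which ones refuse to go away. The function name `remove_cgroup_tree` is made up for illustration, and this is not how systemd implements it. When the subtree is owned by a userns UID range, a caller running under the plain user UID would be expected to end up with a non-empty leftover list.

```python
import os


def remove_cgroup_tree(root: str) -> list[str]:
    """Try to remove a cgroup subtree bottom-up.

    Returns the paths that could not be removed. Hypothetical helper,
    for illustration only.
    """
    leftovers = []
    # Child cgroups have to go before their parents, hence bottom-up.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        try:
            # A cgroup is removed with rmdir(); the kernel-provided
            # attribute files inside it do not need to be deleted first.
            os.rmdir(dirpath)
        except OSError:
            # Typically EACCES (parent owned by another UID), EBUSY
            # (processes still inside) or ENOTEMPTY (a child survived).
            leftovers.append(dirpath)
    return leftovers
```

Anything that ends up in the returned list is exactly the kind of stale cgroup that the per-user service manager could not get rid of on its own.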
I ran into this problem quite frequently while hacking on nspawn and other userns-related code: when my unpriv code died due to some bug, I ended up with cgroups in the user's cgroup hierarchy that the per-user service manager couldn't clean up anymore, thus creating something of a DoS scenario.
With systemd v258 this changes a bit. The per-system service manager gained an IPC call that the per-user service manager can call, requesting it to clean up such cgroups for it. The per-system service…
…manager runs privileged after all, and thus can do this.
Of course, the per-system service manager carefully validates the caller's credentials, and verifies that it delegated the cgroup to the caller in the first place. If that checks out, it will remove any subgroup requested, regardless of which UID owns it.
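The shape of that check can be modeled in a few lines. This is a toy model only, not systemd's actual code: the `delegated` table, the function name `may_remove`, and the rule that only cgroups strictly below the delegated one qualify are all assumptions made for illustration. The point it tries to show is that the decision depends on who the cgroup was delegated to, and never on who currently owns the subgroup.

```python
from pathlib import PurePosixPath


def may_remove(delegated: dict[str, int], caller_uid: int,
               unit_cgroup: str, target: str) -> bool:
    """Toy policy check (hypothetical): may caller_uid ask for 'target'
    to be removed? 'delegated' maps a delegated cgroup path to the UID
    it was delegated to."""
    # The cgroup must have been delegated to this very caller.
    if delegated.get(unit_cgroup) != caller_uid:
        return False
    path = PurePosixPath(target)
    # Refuse any attempt to escape the subtree via "..".
    if ".." in path.parts:
        return False
    # Only cgroups strictly below the delegated one qualify. The owner
    # of 'target' is deliberately not consulted.
    return path != PurePosixPath(unit_cgroup) and path.is_relative_to(unit_cgroup)
```

For example, with a table that says `/user.slice/user-1000.slice/user@1000.service` was delegated to UID 1000, a request from UID 1000 for a path below that cgroup passes, while a request from another UID, or for a path elsewhere in the hierarchy, does not.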
All of this is mostly transparent to services btw: if your code delegates a cgroup to other UIDs, and your service dies, the cgroup will now be cleaned up no matter what.
That said, the D-Bus method call RemoveSubgroupFromUnit() that is behind this is actually available to clients too, which even allows just removing parts of a delegated subtree, instead of the whole thing.
Moreover, there's a related call KillUnitSubgroup() which allows killing processes in a delegated cgroup subtree, too, for similar reasons and use cases.
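For clients that want to try the RemoveSubgroupFromUnit() call mentioned above, a rough sketch of a `busctl` invocation follows. The bus name, object path and interface are the usual systemd manager ones. The argument list, however, is an assumption on my part (unit name, subgroup path, and a 64-bit flags field set to 0, i.e. signature `sst`), as is the exact format of the subgroup path, so please check both against the org.freedesktop.systemd1 man page of your version before relying on this. The helper name `remove_subgroup_argv` is made up.

```python
def remove_subgroup_argv(unit: str, subgroup: str, user: bool = True) -> list[str]:
    """Build a busctl command line for RemoveSubgroupFromUnit().

    Hypothetical helper. The "sst" signature (unit, subgroup, flags)
    is an assumption, not taken from the text above.
    """
    argv = ["busctl"]
    if user:
        # Talk to the per-user service manager instead of the system one.
        argv.append("--user")
    argv += [
        "call",
        "org.freedesktop.systemd1",
        "/org/freedesktop/systemd1",
        "org.freedesktop.systemd1.Manager",
        "RemoveSubgroupFromUnit",
        "sst",      # assumed signature: string, string, uint64
        unit,
        subgroup,
        "0",        # assumed: no flags set
    ]
    return argv
```

The returned list can be handed to `subprocess.run(argv, check=True)` as is. Building the argument vector separately keeps the sketch easy to inspect before anything is actually sent over the bus.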