Fix podman leaking conmon processes

When running in the background without a full-blown init system,
`podman system service` will leak `conmon` processes for every
gitlab-runner job that executes via the docker socket API.  These
`conmon` processes almost immediately become zombies and are never
cleaned up.  Eventually the zombies will consume all available PIDs.

Many attempts to fix this in various ways have all failed.  In all cases
the GitLab Runner process will start behaving strangely (or fail
completely) after an amount of time dependent on its usage executing
jobs.

Fix this by entirely reimplementing *pipglr* to utilize systemd and a
pair of lingering user-slices.  One for podman, another for the gitlab
runner.  Include a systemd timer service to periodically perform runner
cleanup.  Also update documentation and examples accordingly.

Signed-off-by: Chris Evich <chris_gitlab@icuc.me>
Chris Evich
2023-01-06 11:53:11 -05:00
parent f44e9891d1
commit 6cb20272e4
14 changed files with 361 additions and 329 deletions

README.md

@@ -22,112 +22,122 @@ configuration relative to their own security situation/environment.
### Operation
This image supports `podman container runlabel`, or if your version
lacks this feature, several labels are set on the image to support
easy registration and execution of a runner container using a special
bash command. See the examples below for more information.
This image leverages the podman `runlabel` feature heavily. Several
labels are set on the image to support easy registration and execution
of the runner container. While it's possible to use the container
with your own command-line, it's highly recommended to base it
on one of the labels. See the examples below for more information.
#### [Volume setup]
Since podman inside the container runs as user `podman`, the volumes
used by it need to be pre-created with ownership information. While
we're at it, we might as well add the performance-improving `noatime`
option.
***Note:*** Some older versions of podman don't support the
`container runlabel` sub-command. If this is the case, you may simulate
it with the following, substituting `<label>` with one of the predefined
values (i.e. `register`, `setupconfig`, etc.):
```bash
$ VOLOPTS="o=uid=1000,gid=1000,noatime"; \
for VOLUME in pipglr-podman-root pipglr-config pipglr-podman-cache; do \
podman volume create --opt $VOLOPTS $VOLUME || true ; \
VOLPTH=$(podman unshare podman volume mount $VOLUME)
podman unshare chown -c -R 1000:1000 $VOLPTH && \
podman unshare chmod -c 02770 $VOLPTH && \
podman unshare podman volume unmount $VOLUME ; \
done
$ IMAGE="registry.gitlab.com/qontainers/pipglr:latest"
$ eval $(podman inspect --format=json $IMAGE | jq -r .[].Labels.<label>)
```
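The `eval` one-liner above can be wrapped in a small function so that a missing label is reported instead of silently evaluating `null`. This is only a sketch: `runlabel_compat` is a hypothetical name, not something shipped with pipglr, and it assumes `jq` is installed and that the label text refers to the image as `$IMAGE`, as the simulation above implies.

```bash
# Hypothetical helper (not part of pipglr): emulate
# `podman container runlabel <label> <image>` on older podman versions.
# PODMAN may be overridden, e.g. to point at a different binary.
runlabel_compat() {
    local label="$1" image="$2" cmd
    if [ -z "$label" ] || [ -z "$image" ]; then
        echo "usage: runlabel_compat <label> <image>" >&2
        return 2
    fi
    # Pull the command string stored under the requested image label.
    cmd=$("${PODMAN:-podman}" inspect --format=json "$image" |
        jq -r --arg l "$label" '.[].Labels[$l]') || return 1
    if [ -z "$cmd" ] || [ "$cmd" = "null" ]; then
        echo "label '$label' not found on $image" >&2
        return 1
    fi
    # Assumption: the label text expects $IMAGE to be set when evaluated.
    local IMAGE="$image"
    eval "$cmd"
}
```

For example, `runlabel_compat register registry.gitlab.com/qontainers/pipglr:latest` would stand in for the `register` *runlabel*.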
If you get `podman system service` startup permission-denied errors, or
errors from gitlab-runner, unable to connect to the podman socket, this is
likely the cause. You can fix it after-the-fact using the same commands
above.
#### Runner registration (step 1)
#### Runner registration
All runners must be connected to a project or group runner configuration
on your gitlab instance (or `gitlab.com`). This is done using a special
registration *runlabel*. The command can (and probably should) be run
more than once (using the same `config.toml`) to configure and register
multiple runners. This is necessary for the *pipglr* container to execute
multiple jobs in parallel. For example, if you want to support running
four jobs at the same time, you would use the `register` *runlabel*
four times.
Each time the registration command is run, a new runner is added into
the configuration. If, however, you simply need to update/modify the
configuration, please edit the `config.toml` file directly after mounting
the (default) `pipglr-runner-config` (`/home/podman/.gitlab-runner/`) volume.
For modern versions of podman, registration can be performed with the
following commands:
Before using the `register` *runlabel*, you must set your unique
*registration* (a.k.a. *activation*) token as a podman *secret*. This
secret may be removed once the registration step is complete. The
**<actual registration token>** value (below) should be replaced with
the value obtained from the "runners" settings page of a gitlab
group or project's *CI/CD Settings*. Gitlab version 16 and later
refers to this value as an *activation* token, but the usage is the same.
```bash
$ IMAGE="registry.gitlab.com/qontainers/pipglr:latest"
$ echo '<actual registration token>' | podman secret create REGISTRATION_TOKEN -
$ touch ./config.toml # important: file must exist, even if empty.
$ podman container runlabel register $IMAGE
...repeat as desired...
$ podman secret rm REGISTRATION_TOKEN # if desired
```
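Since one registration is needed per parallel job, the "repeat as desired" step can be scripted. The following is a minimal sketch, assuming the `REGISTRATION_TOKEN` secret and `./config.toml` already exist as shown above; `register_runners` is a hypothetical helper name, and setting `PODMAN=echo` gives a dry-run that only prints the commands.

```bash
# Hypothetical helper: run the `register` runlabel COUNT times,
# stopping at the first failure.
register_runners() {
    local count="$1" image="$2" i=0
    while [ "$i" -lt "$count" ]; do
        "${PODMAN:-podman}" container runlabel register "$image" || return 1
        i=$((i + 1))
    done
}
```

To support four jobs at the same time, this would be invoked as `register_runners 4 "$IMAGE"`.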
Where `<actual registration token>` is the value obtained from the "runners"
settings page of a gitlab group or project. When you're finished registering
as many runners as you want, the secret is no longer needed and may be removed:
#### Runner Configuration (step 2)
During the registration process (above), a boiler-plate (default) `config.toml` file
will be created/updated for you. At this point you may edit the configuration
if desired before committing it as a *podman secret*. Please refer to the
[gitlab runner documentation](https://docs.gitlab.com/runner/configuration/)
for details.
```bash
$ podman secret rm REGISTRATION_TOKEN
$ $EDITOR ./config.toml # if desired
$ podman secret create config.toml ./config.toml
$ rm ./config.toml # if desired
```
##### Note
#### Volume setup (step 3)
Some versions of podman don't support the `container runlabel` sub-command.
If this is the case, you may simulate it with the following command (in addition
to the other example commands above):
Since several users are utilized inside the container, volumes must be
specifically configured to permit access. This is done using several
*runlabels* as follows:
```bash
$ eval $(podman inspect --format=json $IMAGE | jq -r .[].Labels.register)
$ IMAGE="registry.gitlab.com/qontainers/pipglr:latest"
$ podman container runlabel setupstorage $IMAGE
$ podman container runlabel setupcache $IMAGE
```
#### Runner Startup
Note: These volumes generally do not contain any critical operational data,
so they may be re-created at any time to quickly free up host disk-space if
it's running low. Simply remove them with the command
`podman volume rm pipglr-storage pipglr-cache`. Then reuse the `setupstorage`
and `setupcache` *runlabels* as in the above example.
With one or more runners successfully registered and configured, the GitLab
runner container may be launched with the following commands:
#### Runner Startup (step 4)
With the runner configuration saved as a Podman secret, and the runner volumes
created, the GitLab runner container may be launched with the following commands:
```bash
$ IMAGE="registry.gitlab.com/qontainers/pipglr:latest"
$ podman container runlabel run $IMAGE
```
##### Note
### Configuration Editing
As above, if you're missing the `container runlabel` sub-command, the following
may be used instead (assuming `$IMAGE` remains set):
The gitlab-runner configuration contains some sensitive values which
should be protected. The pipglr container assumes the entire configuration
will be passed in as a Podman secret. This makes editing it slightly
convoluted, so a handy *runlabel* `dumpconfig` is available.
Its intended use is as follows:
```bash
$ eval $(podman inspect --format=json $IMAGE | jq -r .[].Labels.run)
$ IMAGE="registry.gitlab.com/qontainers/pipglr:latest"
$ podman container runlabel dumpconfig $IMAGE > ./config.toml
$ $EDITOR ./config.toml
$ podman secret rm config.toml
$ podman secret create config.toml ./config.toml
$ rm ./config.toml # if desired
```
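The remove-and-recreate half of this cycle can be guarded so that an empty or missing file never replaces a working configuration. This is a sketch under assumptions: `replace_config_secret` is a hypothetical name, and refusing empty files is a design choice made here, not something pipglr itself enforces. Setting `PODMAN=echo` prints the commands instead of running them.

```bash
# Hypothetical helper: swap the `config.toml` secret for an edited file.
replace_config_secret() {
    local file="${1:-./config.toml}"
    # Assumption: an empty or missing file is never a useful configuration.
    if [ ! -s "$file" ]; then
        echo "refusing to store missing or empty file: $file" >&2
        return 1
    fi
    # Remove the old secret only if one is currently defined.
    if "${PODMAN:-podman}" secret inspect config.toml >/dev/null 2>&1; then
        "${PODMAN:-podman}" secret rm config.toml || return 1
    fi
    "${PODMAN:-podman}" secret create config.toml "$file"
}
```

After editing, `replace_config_secret ./config.toml` would take the place of the `secret rm` and `secret create` steps.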
#### Runner configuration
### Debugging
You may inspect/modify the gitlab-runner configuration as you see fit, just be
sure to use the `podman unshare` command-wrapper to enter the user namespace.
For example, to display the config:
```bash
$ podman unshare cat $(podman unshare podman volume mount pipglr-config)/config.toml
```
Edit the config with your favorite `$EDITOR`:
```bash
$ podman unshare $EDITOR $(podman unshare podman volume mount pipglr-config)/config.toml
```
#### Debugging
The first thing to check is the container output:
The first thing to check is the container output. This shows three things:
Systemd, Podman, and GitLab-Runner output. For example:
```bash
$ podman logs --since 0 pipglr
```
Next, try running pipglr after an `export PODMAN_RUNNER_DEBUG=debug` to enable
debugging on the inner-podman. If more runner detail is needed, you can instead/additionally
set `export LOG_LEVEL=debug` to debug the gitlab-runner itself.
Next, try running a pipglr image built with more verbose logging. Both
the `runner.service` and `podman.service` files have a `log-level` option.
Simply increase one or both to the "info", or "debug" level. Start the
debug container, and reproduce the problem.
## Building
@@ -140,69 +150,34 @@ $ podman build -t registry.gitlab.com/qontainers/pipglr:latest .
This will utilize the latest stable version of podman and the latest
stable version of the gitlab runner.
### Notes
* If you wish to use the `testing` or `upstream` flavors of the podman base image,
simply build with `--build-arg FLAVOR=testing` (or `upstream`).
* Additionally or alternatively, you may specify a specific podman base image tag
with `--build-arg BASE_TAG=<value>`, where `<value>` is either `latest` or the
podman image version (e.g. `v4`, `v4.2`, `v4.2.0`, etc.)
### Build-args
Several build arguments are available to control the output image:
* `FLAVOR` - Choose from 'stable', 'testing', or 'upstream'. These
select the podman base-image to utilize - which may affect the
podman version, features, and stability. For more information
see [the podmanimage README](https://github.com/containers/podman/blob/main/contrib/podmanimage/README.md).
* `BASE_TAG` - When `FLAVOR="stable"`, allows granular choice over the
exact podman version. Possible values include, `latest`, `vX`, `vX.Y`,
and `vX.Y.Z` (where, `X`, `Y`, and `Z` represent the podman semantic
version numbers). It's also possible to specify an image SHA.
* `CLEAN_INTERVAL` - A `sleep` (command) compatible time-argument that
determines how often to clean out podman storage of disused containers and
images. Defaults to 24-hours, but should be adjusted based on desired caching-effect
versus available storage space and rate of job execution.
* `EXCLUDE_PACKAGES` - A space-separated list of RPM packages to exclude
from the final image. This is intended as a security measure
to limit the attack-surface should a gitlab-runner process escape its
inner-container.
* `PRUNE_INTERVAL` - A systemd.timer compatible `OnCalendar` value that
determines how often to prune Podman's storage of disused containers and
images. Defaults to "daily", but should be adjusted based on desired
caching-effect balanced against available storage space and job
execution rate.
* `RUNNER_VERSION` - Allows specifying an exact gitlab runner version.
By default the `latest` is used, assuming the user is building a tagged
image anyway. Valid versions may be found on the [runner
release page](https://gitlab.com/gitlab-org/gitlab-runner/-/releases).
* `DNFCMD` - By default this is set to `dnf --setopt=tsflags=nodocs -y`.
However, if you'd like to volume-mount in `/var/cache/dnf` then you'll
need to use
`--build-arg DNFCMD="dnf --setopt=tsflags=nodocs -y --setopt keepcache=true"`
Note: Changing `DNFCMD` will cause build-time cache cleanup to be disabled.
* `TARGETARCH` - Supports inclusion of non-x86_64 gitlab runners. This
value is assumed to match the image's architecture. If using the
`--platform` build argument, it will be set automatically.
* `RUNNER_LISTEN_ADDRESS` - Disabled by default, setting this to the FQDN
and port supports various observability and debugging features of the
gitlab runner. For more information see the [gitlab runner advanced
configuration documentation](https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-global-section).
* `PRIVILEGED_RUNNER` - Defaults to 'true', may be set 'false' if you're brave.
However this may result in the gitlab-runner failing to launch inner-containers.
Setting it false will also prevent building container images using the runner.
* `RUNNER_TAGS` - Defaults to `podman_in_podman`, may be set to any comma-separated
list (with no spaces!) of tags. These show up in GitLab (not the runner
configuration), and determine where jobs are run.
* `RUNNER_UNTAGED` - Defaults to `true`, may be set to `false`. Allows
the runner to service jobs without any tags on them at all.
value is assumed to match the image's architecture. If using the
`--platform` build argument, it will be set automatically. Note:
as of this writing, only `amd64` and `arm64` builds of the gitlab-runner
are available.
* `NESTED_PRIVILEGED` - Defaults to 'true', may be set 'false' to prevent
nested containers running in `--privileged` mode. This will affect
the ability to build container images in CI jobs using tools like
podman or buildah.
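
Several build arguments are typically combined in one build. The sketch below only assembles and prints the command for review, and nothing is built. `build_cmd` is a hypothetical helper name, the argument values in the usage note are illustrative, and values containing spaces (such as `DNFCMD`) would need extra quoting that this sketch does not handle.

```bash
# Hypothetical helper: print a `podman build` command line from
# NAME=VALUE pairs. Values must not contain whitespace.
build_cmd() {
    local cmd="podman build" kv
    for kv in "$@"; do
        cmd="$cmd --build-arg $kv"
    done
    echo "$cmd -t registry.gitlab.com/qontainers/pipglr:latest ."
}
```

For instance, `build_cmd FLAVOR=stable PRUNE_INTERVAL=hourly NESTED_PRIVILEGED=false` prints a command that prunes storage hourly and disables nested `--privileged` containers.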
### Environment variables
Nearly every option to every gitlab-runner sub-command may be specified via
environment variable. Many important/required options are set in the
`Containerfile`. However it's entirely possible to pass them in via
either of the `podman container runlabel...` container commands. To
discover them, simply append `--help` to the end of the command.
For example:
```bash
podman container runlabel register $IMAGE --help
```
environment variable. Some of these are set in the `Containerfile` for
the `register` *runlabel*. If you need to set additional runtime
environment variables, please do so via additional `Environment` options in the
`runner.service` file. See the *systemd.exec* man page for important
value-format details.