forgejo
- Set up the runner user. Since I was using Podman, not Docker, I didn't have to add it to the docker group. As root:
- Allow that user to run commands via systemctl without logging in and launching them manually:
- Use machinectl instead of su to become the forgejo-runner user. Without this, most systemd commands will fail with the "Failed to connect to bus: No medium found" message. I'm certain there's a way to get su or sudo to play nicely with dbus, but I had more interesting problems to solve today than this.
- Run podman-system-service as the forgejo-runner user:
- Run the forgejo-runner program as the forgejo-runner user. I lightly modified the standard forgejo-runner.service file:
I read Yann Esposito's blog post, How I protect my forgejo instance from AI Web Crawlers, and thought it was a great idea. My main concern with the crawlers is that they're horribly written and behave poorly. My own Forgejo server was getting slammed with about 600,000 crawler requests per day. This little server is where I share tiny personal projects like my Advent of Code solutions. I wouldn't expect any project there to get more than a handful of queries per day, but suddenly I was serving 10 requests per second. That's not a lot compared to any popular website, but it's a lot for this service, on this tiny VPS, on my shoestring budget.
Worse, the traffic patterns were flat-out abusive. All the content on this site consists of nearly static Git repositories. The scrapers try things like:
- For every Git commit, fetch the version of every file in the repository at that commit.
- View git blame for every file at every commit.
- Attempt to download the archive of each repo at every commit.
- Run every possible pull request search filter combination.
- Run every possible issue search filter combination.
- Fetch each of those URLs at random from some residential IP in Brazil that had never accessed my server before.
My first huge success at cutting through the flurry of bad traffic came from deploying Anubis. You know those anime girl pictures you see before accessing lots of web pages now? Well, those are part of a highly effective bot blocker. There's a reason you're seeing more and more of them.
And this morning, I also adapted Yann's idea for my server, which runs behind Caddy instead of Nginx. I made a file named /etc/caddy/shibboleth like this (but with the cookie name suitably altered to a random local value):
@needs_cookie {
	not {
		header User-Agent *git/*
	}
	not {
		header User-Agent *git-lfs/*
	}
	not {
		header X-Runner-Uuid *
	}
	not {
		header Cookie *Yogsototh_opens_the_door=1*
	}
}
handle @needs_cookie {
	header Content-Type text/html
	respond 418 {
		body `<script>document.cookie = "Yogsototh_opens_the_door=1; Path=/;"; window.location.reload();</script>`
	}
}
Note the extra X-Runner-Uuid line that Yann didn't have. This allows my Forgejo Action Runners to connect without going through the cookie handshake.
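To make it easier to reason about which requests end up on the cookie page, here is a rough bash model of the @needs_cookie matcher. It is only an approximation for illustration, not how Caddy evaluates matchers: the case globs stand in for the snippet's * wildcards, an empty argument stands for a missing header, and needs_cookie is a made-up function name.

```shell
#!/usr/bin/env bash
# Rough model of the @needs_cookie matcher in the Caddy snippet above.
# needs_cookie is a hypothetical helper, not part of Caddy.
# Arguments: User-Agent, X-Runner-Uuid, Cookie ("" means the header is absent).
# Prints "challenge" if Caddy would send the 418 cookie page, else "pass".
needs_cookie() {
    local user_agent="$1" runner_uuid="$2" cookie="$3"

    # Git and Git LFS clients can't run the JavaScript, so they skip the check.
    case "$user_agent" in
        *git/* | *git-lfs/*) echo pass; return ;;
    esac

    # Forgejo runners send an X-Runner-Uuid header; any value is enough.
    if [[ -n "$runner_uuid" ]]; then
        echo pass
        return
    fi

    # Browsers that already ran the script send the cookie back.
    case "$cookie" in
        *Yogsototh_opens_the_door=1*) echo pass; return ;;
    esac

    # Everything else gets the cookie-setting page.
    echo challenge
}
```

For example, needs_cookie "git/2.39.5" "" "" prints pass, while a plain browser User-Agent with no cookie and no runner header prints challenge. Because the four not blocks are ANDed together, a request only gets challenged when none of the exemptions apply.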
Then I added a line to the configurations for services I wanted to protect, like:
myserver.example.com {
	root * /path/to/files
	...
	import shibboleth
}
This way I can easily reuse the snippet for any of those services.
Thanks for the great idea, Yann!
Forgejo Runner in rootless Podman on Debian
I wanted to experiment with Forgejo’s Actions as a DIY alternative to GitHub Actions, using a nearby Raspberry Pi as a build server. I also wanted to deviate slightly from their Runner installation process by executing the Runner and rootless Podman as a regular, non-privileged user and without using the system-level systemctl. It was pretty easy once I wrapped my head around it.
root# useradd --create-home forgejo-runner
This created user number 1001 on my system. Remember that number later when it’s time to configure systemd.
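Rather than remembering the UID, it can be looked up when needed. A minimal sketch, assuming the forgejo-runner account from above and the rootless Podman socket path used in the unit file later in this post; podman_socket_for is a made-up helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the DOCKER_HOST value for a user's rootless
# Podman socket, looking up the numeric UID instead of hard-coding it.
podman_socket_for() {
    local uid
    # id -u fails (and prints an error) if the user doesn't exist.
    uid=$(id -u "$1") || return 1
    printf 'unix:///run/user/%s/podman/podman.sock\n' "$uid"
}
```

On a system where forgejo-runner is UID 1001, podman_socket_for forgejo-runner prints unix:///run/user/1001/podman/podman.sock. Alternatively, systemd's %t specifier expands to the user's runtime directory (/run/user/UID) inside user units, so Environment="DOCKER_HOST=unix://%t/podman/podman.sock" should avoid hard-coding the number at all.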
root# loginctl enable-linger forgejo-runner
root# apt install systemd-container
root# machinectl shell forgejo-runner@
$ systemctl --user enable podman.socket
$ systemctl --user start podman.socket
$ cat > .config/systemd/user/forgejo-runner.service <<'EOHD'
[Unit]
Description=Forgejo Runner
Documentation=https://forgejo.org/docs/latest/admin/actions/
After=podman.socket
[Service]
ExecStart=/usr/local/bin/forgejo-runner daemon
ExecReload=/bin/kill -s HUP $MAINPID
# 1001 is the forgejo-runner user's UID
Environment="DOCKER_HOST=unix:///run/user/1001/podman/podman.sock"
# This user and working directory must already exist
WorkingDirectory=/home/forgejo-runner
Restart=on-failure
TimeoutSec=0
RestartSec=10
[Install]
WantedBy=default.target
EOHD
$ systemctl --user daemon-reload
$ systemctl --user enable forgejo-runner.service
$ systemctl --user start forgejo-runner.service
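Whether the heredoc delimiter is quoted matters for that unit file: with a bare EOHD, the interactive shell expands $MAINPID (normally unset there, so to an empty string) before systemd ever sees the file, while <<'EOHD' writes the text literally so systemd can substitute the main PID at reload time. A small self-contained demonstration; nothing here touches systemd.

```shell
#!/usr/bin/env bash
# Demonstrates heredoc delimiter quoting. MAINPID is deliberately unset,
# as it would be in an ordinary login shell.
unset MAINPID

# Unquoted delimiter: the shell expands $MAINPID while reading the heredoc.
unquoted=$(cat <<EOHD
ExecReload=/bin/kill -s HUP $MAINPID
EOHD
)

# Quoted delimiter: the text is passed through untouched.
quoted=$(cat <<'EOHD'
ExecReload=/bin/kill -s HUP $MAINPID
EOHD
)

printf 'unquoted: [%s]\n' "$unquoted"
printf 'quoted:   [%s]\n' "$quoted"
```

The unquoted version leaves ExecReload with nothing to signal, which would only show up later, when a reload fails.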
I rebooted my RPi to make sure it would start on its own and it did. Yay! Now I can run Forgejo Actions on my little server and everything works as documented.
Gitea vs Forgejo development activity
I was curious whether Gitea or its recent fork, Forgejo, has had more development activity. I cloned both repos (Gitea’s from GitHub; Forgejo’s from Codeberg, which runs on Forgejo) and ran this command:
$ git log --since="1 year ago" --format="%an" | sort | uniq -c | sort -n | wc -l
to get an overview of things. That showed 153 people (including a small handful of bots) contributing to Gitea, and 232 people (and a couple of bots) contributing to Forgejo. There are some dupes in each list, such as separate entries for "John Doe" and "johndoe", but there are only a few of them and roughly as many in one list as in the other, so I think they can be safely ignored.
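For a plain head count, the uniq -c | sort -n stage isn't strictly needed; sort -u | wc -l should give the same number. To get a rough feel for how many "John Doe"/"johndoe" style dupes there are, one option is to compare that raw count with a count taken after lowercasing names and dropping spaces. A sketch, where both function names are hypothetical and the normalization rule is just one guess at what counts as a duplicate:

```shell
#!/usr/bin/env bash
# Hypothetical helpers; both read one author name per line on stdin,
# e.g. from: git log --since="1 year ago" --format="%an"

# Distinct names exactly as Git recorded them.
count_authors() {
    sort -u | wc -l
}

# Distinct names after lowercasing and removing spaces, so that
# "John Doe" and "johndoe" collapse into a single entry.
count_authors_normalized() {
    tr '[:upper:]' '[:lower:]' | tr -d ' ' | sort -u | wc -l
}
```

Piping each clone's git log --format="%an" output into both functions and subtracting the second result from the first gives a rough estimate of how many entries are likely dupes.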
Some commenters have suggested that Gitea's development model squashes each pull request into a single commit on the main branch instead of applying its individual commits, and that Forgejo does the opposite. This would make it artificially look as though Forgejo has more commit activity.
However, it looks to me like Forgejo is using a similar process of combining lots of smaller PR commits into a single merge commit. The vast majority of its commits since June 2024 or so seem to be 1-commit-per-PR. Changing the above command to --since="2024-07-01" reduces the number of unique contributors to 136 for Gitea and 217 for Forgejo. It also shows 1,228 commits for Gitea and 3,039 for Forgejo, and I do think that's a legitimately apples-to-apples comparison.
If we brute force it and run
$ git log --since="1 year ago" | rg '\(\#\d{4,5}\)' | wc -l
to match lines that mention a PR (like “Simplify review UI (#31062)” or “Remove title from email heads (#3810)”), then I find 1,256 PR-like Gitea commits and 2,181 Forgejo commits.
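Two caveats for anyone repeating that pipeline: plain git log prints full commit messages, so a PR number mentioned in a message body is counted too, and \d is a ripgrep-ism that POSIX grep -E doesn't understand. A variant that only looks at subject lines and uses portable grep might look like the sketch below; count_pr_subjects is a hypothetical name, and it may well give somewhat different totals than the ones above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count subject lines that contain a PR reference
# such as "(#31062)". Reads one subject per line on stdin, e.g. from:
#   git log --since="1 year ago" --format=%s
count_pr_subjects() {
    # [0-9] instead of \d, since POSIX extended regexes have no \d.
    # grep -c prints the number of matching lines (0 if none).
    grep -cE '\(#[0-9]{4,5}\)'
}
```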
I also wondered how many committers were in both repos. I got this from:
$ git log --since="2024-07-01" --format="%an" | sort | uniq > /tmp/users.{whichever} # Run in each clone, saving users.gitea and users.forgejo
$ comm -12 /tmp/users.gitea /tmp/users.forgejo | wc -l # Users with commits in both
67
$ comm -13 /tmp/users.gitea /tmp/users.forgejo | wc -l # Users with Forgejo commits, but not Gitea
150
$ comm -23 /tmp/users.gitea /tmp/users.forgejo | wc -l # Users with Gitea commits, but not Forgejo
69
67 users committed to both. Without digging into it, the likeliest explanation to me is that this is mostly due to the projects pulling commits from each other, although nothing I know of keeps an author from sending patches to both. 69 contributors participated in Gitea and not Forgejo. Another 150 contributed to Forgejo but not Gitea.
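comm only gives correct answers when both inputs are sorted in the same collation order it uses for comparing, so running the sort and comm steps under different locales can quietly skew the counts. A sketch that re-sorts both lists under LC_ALL=C and prints all three numbers at once; author_overlap is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how two author lists (one name per line) overlap.
# Usage: author_overlap users.gitea users.forgejo
# Prints three numbers: in both, only in the second file, only in the first.
author_overlap() {
    local first="$1" second="$2"
    local -i both only_second only_first
    # Pin sort and comm to the C locale so they agree on ordering,
    # and de-duplicate each list on the way in.
    both=$(LC_ALL=C comm -12 <(LC_ALL=C sort -u "$first") <(LC_ALL=C sort -u "$second") | wc -l)
    only_second=$(LC_ALL=C comm -13 <(LC_ALL=C sort -u "$first") <(LC_ALL=C sort -u "$second") | wc -l)
    only_first=$(LC_ALL=C comm -23 <(LC_ALL=C sort -u "$first") <(LC_ALL=C sort -u "$second") | wc -l)
    echo "$both $only_second $only_first"
}
```

With the two lists from the commands above, author_overlap /tmp/users.gitea /tmp/users.forgejo would print the counts in the order both, Forgejo only, Gitea only.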
And finally, their respective activity pages (for Gitea and for Forgejo) tell a similar story.
I’m not an expert in methodology here, but from my initial poking around, it would seem to me that Forgejo has a lot more activity and a larger variety of contributors than Gitea does.
Updated rate limits for unauthenticated requests - GitHub Changelog
Summary: Microsoft is locking down access to FOSS source code unless viewers create accounts and log in. This is an excellent time to move projects you own to something you can actually control, like Forgejo.
I just noticed that Forgejo 11 LTS is out now. It’s time to upgrade my older v7 LTS setup!