forgejo

    I read Yann Esposito’s blog post, How I protect my forgejo instance from AI Web Crawlers, and think that’s a great idea. My main concern with the crawlers is that they’re horribly written and behave poorly. My own Forgejo server was getting slammed with about 600,000 crawler requests per day. This little server is where I share tiny personal projects like my Advent of Code solutions. I wouldn’t expect any project there to get more than a handful of queries per day, but suddenly I was serving 10 requests per second. That’s not a lot compared to any popular website, but that’s a lot for this service, on this tiny VPS, on my shoestring budget.

    Worse, the traffic patterns were flat-out abusive. All the content on this site comprises nearly static Git repositories. The scrapers try things like:

    • For every Git commit, fetch the version of every file in the repository at that commit.
    • See git blame for every file at every commit.
    • Attempt to download the archive of each repo at every commit.
    • Run every possible pull request search filter combination.
    • Run every possible issue search filter combination.
    • Fetch each of those URLs at random from some residential IP in Brazil that had never accessed my server before.

    My first huge success at cutting through the flurry of bad traffic came from deploying Anubis. You know those anime girl pictures you see before accessing lots of web pages now? Well, those are part of a highly effective bot blocker. There’s a reason you’re seeing more and more of them.

    And this morning, I also adapted Yann’s idea for my server, which runs behind Caddy instead of Nginx. I made a file named /etc/caddy/shibboleth like this (but with the cookie name suitably altered to a random local value):

    @needs_cookie {
        not {
            header User-Agent *git/*
        }
        not {
            header User-Agent *git-lfs/*
        }
        not {
            header X-Runner-Uuid *
        }
        not {
            header Cookie *Yogsototh_opens_the_door=1*
        }
    }
    
    handle @needs_cookie {
        header Content-Type text/html
        respond 418 {
            body `<script>document.cookie = "Yogsototh_opens_the_door=1; Path=/;"; window.location.reload();</script>`
        }
    }
    

    Note the extra X-Runner-Uuid line that Yann didn’t have. This allows my Forgejo Action Runners to connect without going through the cookie handshake.
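
    To see which requests end up with the 418 challenge, the matcher can be modeled in a few lines of shell. This is only a sketch of how the four `not` blocks appear to combine (all of them must hold), not Caddy's own implementation. The function names, the sample User-Agent strings, and the choice to treat an empty string as "header absent" are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical model of the @needs_cookie matcher above; this is not Caddy code.
# Arguments: User-Agent, X-Runner-Uuid, Cookie (empty string = header absent).
# Returns 0 when the request would be sent the 418 cookie challenge.
needs_cookie() {
    ua=$1 runner_uuid=$2 cookie=$3
    # "*git/*" and "*git-lfs/*" are substring matches on User-Agent.
    case $ua in
        *git/*|*git-lfs/*) return 1 ;;
    esac
    # "header X-Runner-Uuid *" is modeled as: header present with any value.
    if [ -n "$runner_uuid" ]; then
        return 1
    fi
    # The cookie only has to appear somewhere in the Cookie header.
    case $cookie in
        *Yogsototh_opens_the_door=1*) return 1 ;;
    esac
    return 0
}

# Print a label followed by the modeled outcome for one request.
describe() {
    if needs_cookie "$2" "$3" "$4"; then
        echo "$1: challenge"
    else
        echo "$1: pass"
    fi
}

describe "first browser visit" "Mozilla/5.0" "" ""
describe "browser after reload" "Mozilla/5.0" "" "Yogsototh_opens_the_door=1"
describe "git clone" "git/2.43.0" "" ""
describe "git lfs" "git-lfs/3.4.0" "" ""
describe "action runner" "example-runner-agent" "made-up-uuid" ""
```

    Only the first line should report a challenge. A crawler that never runs JavaScript never sets the cookie, so every one of its requests stays in that first case.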

    Then I added a line to the configurations for services I wanted to protect, like:

    myserver.example.com {
        root * /path/to/files
        ...
        import shibboleth
    }
    

    This way I can easily reuse the snippet for any of those services.

    Thanks for the great idea, Yann!

    Forgejo Runner in rootless Podman on Debian

    I wanted to experiment with Forgejo’s Actions as a DIY alternative to GitHub Actions, using a nearby Raspberry Pi as a build server. I also wanted to deviate slightly from their Runner installation process by executing the Runner and rootless Podman as a regular, non-privileged user and without using the system-level systemctl. It was pretty easy once I wrapped my head around it.

    1. Set up the runner user. Since I was using Podman, not Docker, I didn’t have to add it to the docker group. As root:
    root# useradd --create-home forgejo-runner
    

    This created user number 1001 on my system. Remember that number later when it’s time to configure systemd.

    2. Allow that user’s systemd services to start at boot, without anyone logging in and launching them manually:
    root# loginctl enable-linger forgejo-runner
    
    3. Use machinectl instead of su to become the forgejo-runner user. Without this, most systemd commands will fail with the Failed to connect to bus: No medium found message. I’m certain there’s a way to get su or sudo to play nicely with dbus, but I had more interesting problems to solve today than this.
    root# apt install systemd-container
    root# machinectl shell forgejo-runner@
    
    4. Run podman-system-service as the forgejo-runner user:
    $ systemctl --user enable podman.socket
    $ systemctl --user start podman.socket
    
    5. Run the forgejo-runner program as the forgejo-runner user. I lightly modified the standard forgejo-runner.service file:
    $ mkdir -p .config/systemd/user
    $ cat > .config/systemd/user/forgejo-runner.service <<'EOHD'
    [Unit]
    Description=Forgejo Runner
    Documentation=https://forgejo.org/docs/latest/admin/actions/
    After=podman.socket
    
    [Service]
    ExecStart=/usr/local/bin/forgejo-runner daemon
    ExecReload=/bin/kill -s HUP $MAINPID
    # 1001 is the forgejo-runner user's UID
    Environment="DOCKER_HOST=unix:///run/user/1001/podman/podman.sock"
    
    # This user and working directory must already exist
    WorkingDirectory=/home/forgejo-runner
    Restart=on-failure
    TimeoutSec=0
    RestartSec=10
    
    [Install]
    WantedBy=default.target
    EOHD
    $ systemctl --user daemon-reload
    $ systemctl --user enable forgejo-runner.service
    $ systemctl --user start forgejo-runner.service
    

    I rebooted my RPi to make sure it would start on its own and it did. Yay! Now I can run Forgejo Actions on my little server and everything works as documented.
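
    The hard-coded 1001 in the unit file is the easiest thing to get wrong on another machine. As a small sketch, a helper like the one below builds the DOCKER_HOST value from a UID, assuming the same /run/user/<uid>/podman/podman.sock layout as above. The function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: build the DOCKER_HOST value for a rootless Podman socket
# from a numeric UID, assuming the /run/user/<uid>/podman/podman.sock layout
# used in the unit file above.
podman_docker_host() {
    printf 'unix:///run/user/%s/podman/podman.sock\n' "$1"
}

# The runner account in this post had UID 1001:
podman_docker_host 1001

# For whichever account runs this script, look the UID up instead of guessing:
podman_docker_host "$(id -u)"
```

    On the runner host, `id -u forgejo-runner` prints the number to use. systemd's `%t` specifier, which expands to the runtime directory in user units, might be another way to avoid hard-coding the UID, but that variant is untested here.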

    Gitea vs Forgejo development activity

    I was curious whether Gitea or its recent fork, Forgejo, has had more development activity. I cloned both repos (Gitea’s from GitHub; Forgejo’s from Codeberg, which runs on Forgejo) and ran this command:

    $ git log --since="1 year ago" --format="%an" | sort | uniq -c | sort -n | wc -l
    

    to get an overview of things. That showed 153 people (including a small handful of bots) contributing to Gitea, and 232 people (and a couple of bots) contributing to Forgejo. There are some dupes in each list — showing separate accounts for “John Doe” and “johndoe”, that kind of thing — but the numbers look small and similar to me so I think they can be safely ignored.
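
    Here is a tiny illustration of what that pipeline counts, run on a made-up list of names instead of a real `git log`. The function names and sample data are assumptions. It also shows that `sort -u | wc -l` arrives at the same number, and that near-duplicate names are counted separately.

```shell
#!/bin/sh
# Count distinct author names read from stdin, one name per line.
# tr strips the padding that some wc implementations add.
count_authors() {
    sort -u | wc -l | tr -d ' '
}

# Made-up sample standing in for `git log --format="%an"` output.
sample_authors() {
    printf '%s\n' "John Doe" "johndoe" "Jane Roe" "John Doe" "renovate-bot"
}

# Short form.
sample_authors | count_authors

# The longer pipeline from the post gives the same count.
sample_authors | sort | uniq -c | sort -n | wc -l | tr -d ' '
```

    Both commands report 4 for this sample: the repeated "John Doe" collapses into one entry, while "johndoe" remains a separate contributor.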

    Some commenters have suggested that Gitea’s development model rebases pull requests onto the main branch instead of applying the individual commits, and that Forgejo does the opposite. This would make it artificially look as though Forgejo has more commit activity.

    However, it looks to me like Forgejo is using a similar process of combining lots of smaller PR commits into a single merge commit. The vast majority of its commits since June 2024 or so seem to be 1-commit-per-PR. Changing the above command to --since="2024-07-01" reduces the number of unique contributors to 136 for Gitea and 217 for Forgejo. It also shows 1,228 commits for Gitea and 3,039 for Forgejo, and I do think that’s a legitimately apples-to-apples comparison.
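
    The post doesn't show the command behind those commit totals. One way to get such a number, offered as a sketch and not necessarily the method used here, is `git rev-list --count`. The helper name and the clone paths in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: count commits reachable from HEAD in the repository at $1
# that are more recent than the date given in $2.
count_commits_since() {
    git -C "$1" rev-list --count --since="$2" HEAD
}

# Usage, assuming clones in ./gitea and ./forgejo (placeholder paths):
#   count_commits_since ./gitea 2024-07-01
#   count_commits_since ./forgejo 2024-07-01
```

    Merge commits are included in this count, so it may not line up exactly with a figure derived some other way.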

    If we brute force it and run

    $ git log --since="1 year ago" | rg '\(\#\d{4,5}\)' | wc -l
    

    to match lines that mention a PR (like “Simplify review UI (#31062)” or “Remove title from email heads (#3810)”), then I find 1,256 PR-like Gitea commits and 2,181 Forgejo commits.
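
    To check what that pattern does and doesn't catch, here is the same idea with `grep -E` swapped in for `rg`, so it runs without ripgrep installed. `[0-9]` replaces `\d`, which POSIX extended regular expressions lack. The function name and the last two sample lines are made up.

```shell
#!/bin/sh
# Count lines containing a PR-style reference such as "(#31062)".
# grep -E stands in for rg here; the pattern needs 4 or 5 digits inside "(#...)".
count_pr_lines() {
    grep -E -c '\(#[0-9]{4,5}\)'
}

printf '%s\n' \
    "Simplify review UI (#31062)" \
    "Remove title from email heads (#3810)" \
    "Fix typo (#12)" \
    "Merge branch 'main'" | count_pr_lines
```

    This prints 2. The "(#12)" line is skipped because the pattern needs at least four digits, so any commit referencing a lower-numbered PR would be missed by the same rule.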

    I also wondered how many committers were in both repos. I got this from:

    $ git log --since="2024-07-01" --format="%an" | sort | uniq > /tmp/users.{whichever}
    $ comm -12 /tmp/users.gitea /tmp/users.forgejo | wc -l # Users with commits in both
          67
    $ comm -13 /tmp/users.gitea /tmp/users.forgejo | wc -l # Users with Forgejo commits, but not Gitea
         150
    $ comm -23 /tmp/users.gitea /tmp/users.forgejo | wc -l # Users with Gitea commits, but not Forgejo
          69
    

    67 users committed to both. Without digging into it, the likeliest explanation to me is that this is mostly due to the projects pulling commits from each other, although nothing I know of keeps an author from sending patches to both. 69 contributors participated in Gitea and not Forgejo. Another 150 contributed to Forgejo but not Gitea.
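
    For anyone who hasn't used `comm` this way, here is a self-contained sketch of the three set operations on two tiny, made-up name lists. `comm` expects both inputs sorted the same way, so the sketch pins the locale to keep `sort` and `comm` in agreement.

```shell
#!/bin/sh
# Set operations on two sorted name lists with comm; all names are made up.
LC_ALL=C
export LC_ALL

workdir=$(mktemp -d)
printf '%s\n' alice bob carol | sort > "$workdir/users.gitea"
printf '%s\n' bob carol dave erin | sort > "$workdir/users.forgejo"

# -12 keeps only lines common to both files.
both=$(comm -12 "$workdir/users.gitea" "$workdir/users.forgejo" | wc -l | tr -d ' ')
# -13 keeps only lines unique to the second file.
forgejo_only=$(comm -13 "$workdir/users.gitea" "$workdir/users.forgejo" | wc -l | tr -d ' ')
# -23 keeps only lines unique to the first file.
gitea_only=$(comm -23 "$workdir/users.gitea" "$workdir/users.forgejo" | wc -l | tr -d ' ')

echo "both=$both forgejo_only=$forgejo_only gitea_only=$gitea_only"
rm -r "$workdir"
```

    The three numbers add up to the size of the combined list, which is a quick sanity check: 67 + 150 + 69 = 286 distinct names across both projects.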

    And finally, their respective activity pages (for Gitea and for Forgejo) show a similar story.

    I’m not an expert in methodology here, but from my initial poking around, it would seem to me that Forgejo has a lot more activity and a larger variety of contributors than Gitea does.

    Updated rate limits for unauthenticated requests - GitHub Changelog

    Summary: Microsoft is locking down access to FOSS source code unless viewers create accounts and log in. This is an excellent time to move projects you own to something you can actually control, like Forgejo.

    I just noticed that Forgejo 11 LTS is out now. It’s time to upgrade my older v7 LTS setup!