Well, we are now in the adapt-or-die phase of the internet. Crawlers and bots have always been problematic, but this novel scourge of AI is truly obscene. The writing was on the wall when search result quality plummeted a while back, but now comes the next nail in the coffin. As a simple Mensch with an internet-facing server, what else am I to do? Deploy countermeasures!
Putting up bot tarpits and zipbombs feels a lot like hiring more cops to stop crime: wasteful and pointless without addressing the root cause. While Cloudflare, Anthropic and OpenAI can continue to profit from the unfortunate state of the internet, I will have to make lemonade with the help of Anubis.
I’ll share my €0.02 about LLMs in another blog post, but for now let me show off this somewhat inelegant solution.
How Anubis works
The idea is that it acts as middleware for requests coming from your reverse proxy, adding a proof-of-work challenge designed to let normal browser agents through while bouncing scrapers. Once the challenge is solved you get a cookie and all is well. It builds on the premise of Hashcash.
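To make the Hashcash premise concrete, here is a minimal Python sketch of a proof-of-work round trip. It is not Anubis's actual code: the function names, the challenge string and the "leading zero hex digits of a SHA-256 digest" difficulty rule are assumptions chosen for illustration. The point is the asymmetry: finding a nonce takes many hashes, checking one takes a single hash.

```python
import hashlib
from itertools import count


def digest_for(challenge: str, nonce: int) -> str:
    """Hex SHA-256 of the challenge concatenated with the nonce."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force the smallest nonce whose digest
    starts with `difficulty` zero hex digits (the expensive part)."""
    prefix = "0" * difficulty
    for nonce in count():
        if digest_for(challenge, nonce).startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash confirms the work was done."""
    return digest_for(challenge, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    challenge = "example-challenge"  # hypothetical; a real one is issued per visitor
    nonce = solve(challenge, difficulty=3)
    print(nonce, verify(challenge, nonce, difficulty=3))
```

Each extra zero digit multiplies the expected work for the solver by 16, while the verifier's cost stays constant. That is cheap for one human loading one page and expensive for a scraper hammering thousands.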
How Anubis works with Traefik
That’s the neat part: it doesn’t. But once again, thanks to tireless heroes putting in the work to bake an image that actually does integrate with the Traefik middleware, I can rejoice. It will probably get merged eventually and the docs will be updated, but for now this will do.
Correctly configuring the container took a lot of trial and error. Wading through GitHub issues tipped me off about omitting the REDIRECT_DOMAINS env var. I don’t want to get too deep into the weeds as to why; after all, I am using an image some stranger created based off an unmerged pull request. We are truly cooking with gas here. Playbook to deploy the container:
~/projects/gwt/anubis.yml
---
- hosts: gwt
  vars:
    application: anubis
    podman_network: "{{ networks.luxurious_lair }}"
  tasks:
    - name: Create container
      ansible.builtin.include_role:
        name: podman_container
      vars:
        image: ttl.sh/techaro/pr-368/anubis:24h
        env:
          BIND: ":8989"
          TARGET: ' '
          # REDIRECT_DOMAINS: "rpavlov.com"
          PUBLIC_URL: "https://anubis.rpavlov.com"
          COOKIE_NAME: "rpavlov.com"
          COOKIE_DOMAIN: "rpavlov.com"
        generate_systemd:
          path: "/home/{{ common_user }}/.config/systemd/user"
For my special snowflake stack, where Anubis, Static Web Server and Traefik all run as containers, I just run another playbook to deploy a dynamic config that Traefik will pick up:
---
- hosts: gwt
  gather_facts: false
  vars:
    application: traefik
  tasks:
    - name: Create folder for dynamic configs
      ansible.builtin.file:
        path: "{{ config_directory }}/conf.d"
        state: directory
        owner: "{{ common_user }}"
        group: "{{ common_group }}"
        mode: "0770"
    - name: Template static traefik config
      ansible.builtin.template:
        src: "traefik/traefik.yaml.j2"
        dest: "{{ config_directory }}/traefik.yaml"
        mode: "0660"
    - name: Template dynamic traefik configs
      ansible.builtin.template:
        src: "traefik/conf.d/{{ item }}.j2"
        dest: "{{ config_directory }}/conf.d/{{ item }}"
        mode: "0660"
      with_items:
        - middlewares.yml
        - sws.yml
This site is just static content served by the sws container, which is configured as a service in Traefik:
~/projects/gwt/templates/traefik/conf.d/sws.yml.j2
http:
  services:
    sws:
      loadBalancer:
        servers:
          - url: "http://sws:8888"
  routers:
    sws-public-router:
      rule: 'Host(`rpavlov.com`)'
      service: sws@file
      entryPoints:
        - web-secure
      middlewares:
        - middlewares-anubis
        - chain-no-auth
      tls:
        certResolver: letsencrypt
        options: modern@file
        domains:
          - main: 'rpavlov.com'
            sans:
              - '*.rpavlov.com'
~/projects/gwt/templates/traefik/conf.d/middlewares.yml.j2
http:
  middlewares:
    middlewares-anubis:
      forwardAuth:
        address: "http://anubis:8989/.within.website/x/cmd/anubis/api/check"
...
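For anyone wondering what forwardAuth actually does with that address: Traefik hands each incoming request to the configured endpoint and only passes it on to the service when the answer is a 2xx; any other response goes straight back to the client. Below is a toy Python model of that decision. The class and function names are made up for illustration and are not part of Traefik or Anubis, and the 302 redirect to a challenge page is an assumed example of a non-2xx answer, not a documented Anubis response.

```python
from dataclasses import dataclass, field


@dataclass
class AuthResponse:
    """What the auth endpoint answered (hypothetical type for this sketch)."""
    status: int
    headers: dict = field(default_factory=dict)


def forward_auth(auth: AuthResponse) -> str:
    """Model of the middleware's decision for one request."""
    if 200 <= auth.status < 300:
        # Auth endpoint is happy: the original request reaches the service.
        return "forward to service"
    # Anything else is relayed to the client instead of the site content.
    return f"return {auth.status} to client"


if __name__ == "__main__":
    # Visitor with a valid cookie (assumed to yield a 200 from the check endpoint).
    print(forward_auth(AuthResponse(200)))
    # Visitor without one (assumed redirect to a challenge page).
    print(forward_auth(AuthResponse(302, {"Location": "https://anubis.example/"})))
```

Seen this way, the middleware order in the router matters: the Anubis check sits first in the list, so unverified requests are bounced before anything else runs.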
lgtm shipit.