FreedomBox Mediawiki Fail2ban filter

Created by Steven Baltakatei Sandoval on 2023-12-28T15:37-08 under a CC BY-SA 4.0 (🅭🅯🄎4.0) license and last updated on 2023-12-31T21:36-08.

Summary

I noticed a significant increase in CPU usage on a public FreedomBox webserver I run for publishing my notes via a MediaWiki instance at reboil.com/mediawiki. The high usage was caused by frequent, expensive requests for dynamically generated pages such as special pages, page histories, and edit forms. I implemented two solutions: modifying the server's robots.txt and creating a fail2ban filter.

Table of Contents

  1. FreedomBox Mediawiki Fail2ban filter
    1. Summary
    2. Background
    3. Analysis
      1. journalctl
      2. top
      3. dstat
      4. dool
    4. Methodology
      1. robots.txt
      2. fail2ban filter
      3. Relevant system information

Background

Note that this procedure may only work as written on FreedomBox instances, since FreedomBox itself makes automatic changes to configuration files. For example, fail2ban should already be installed and maintained by FreedomBox.

FreedomBox is a Debian package that converts the machine it is installed on into a personal cloud server. Specifically, it turns the machine into an Apache server with a web interface for installing apps such as WordPress (blog), MediaWiki (wiki), Bepasty (file sharing), Ejabberd (XMPP chat server), Postfix/Dovecot (email server), OpenVPN, and Radicale (calendar and address book), among others. See the Manual for details.

All files below were created or edited as root. To get a root shell, log in via ssh with a FreedomBox account that has administrator privileges and run:

$ sudo su -
#

This article assumes a basic knowledge of GNU/Linux such as logging into your FreedomBox via ssh, editing text files via the command line, viewing file contents with cat, running Bash scripts, and being aware of file ownership issues.

Analysis

I detected the high traffic with a journalctl command resembling the following:

journalctl

# journalctl --output=short-iso --follow

A more focused command is the following Bash script, run as the root user, which shows only journal lines containing apache-access:

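#!/bin/bash
# Follow the systemd journal with ISO 8601 timestamps, keep only lines
# containing "apache-access" (flushing each match immediately), and page
# the result with long lines chopped (-S) in follow mode (+F).
journalctl --output=short-iso --follow | \
  grep --line-buffered "apache-access" | \
  less -S +F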

Below are truncated excerpts of log lines for expensive index.php? requests.

/mediawiki/index.php?returnto=1770-03-07&returntoquery=redirect%3Dno&title=Speci
/mediawiki/index.php?target=1770-03-22&title=Special%3AWhatLinksHere HTTP/1.1" 4
/mediawiki/index.php?action=history&title=1784-03-15 HTTP/1.1" 200 5025 "-" "Moz
/mediawiki/index.php?returnto=1770-03-07&returntoquery=redirect%3Dno&title=Speci
/mediawiki/index.php?action=history&title=1784-03-15 HTTP/1.1" 200 6494 "-" "Moz
/mediawiki/index.php?action=edit&title=1770-03-08 HTTP/1.1" 200 5031 "-" "Mozill
/mediawiki/index.php?target=1862-04-02&title=Special%3AWhatLinksHere HTTP/1.1" 4
/mediawiki/index.php?action=edit&title=1770-03-08 HTTP/1.1" 200 4911 "-" "Mozill
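To see which clients send most of these requests, matching log lines can be tallied per client address. The following is a minimal sketch rather than part of my setup: the function name top_index_php_clients is hypothetical, and it assumes each line carries an Apache access-log entry in the common "combined" layout, where the client address is the field just before the first standalone "-" field (any journal prefix in front of it is skipped).

```shell
#!/bin/bash
# Hypothetical helper (not part of FreedomBox): count GET requests for
# /mediawiki/index.php? per client address, busiest clients first.
# Assumption: each input line holds an Apache "combined" access-log entry,
# possibly after a journal prefix, so the client address is the field just
# before the first standalone "-" field ('203.0.113.5 - - [date] "GET ...').
top_index_php_clients() {
  grep -E '"GET /mediawiki/index\.php\?' |
    awk '{
      # Print the field preceding the first lone "-" (the client address).
      for (i = 2; i <= NF; i++) {
        if ($i == "-") { print $(i - 1); next }
      }
    }' |
    sort | uniq -c | sort -rn
}
```

Assuming the journal entries follow that layout, the busiest clients since midnight could then be listed with:

# journalctl --output=short-iso --since today | top_index_php_clients | head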

top

High CPU usage showed up as multiple php-fpm7.4 processes in the output of top, a task manager available on most Unix-like operating systems, including Debian.
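A non-interactive way to check the same thing is to total the %CPU column of the php-fpm processes reported by ps. This is a sketch under assumptions: sum_php_fpm_cpu is a hypothetical name, and unlike top, ps reports each process's CPU time averaged over its whole lifetime rather than over the last refresh interval, so the total is only a rough indicator.

```shell
#!/bin/bash
# Hypothetical helper: summarize CPU usage of php-fpm processes.
# Input (stdin): lines of "PCPU COMMAND", as printed by `ps -eo pcpu=,comm=`.
# Note: ps %CPU is averaged over each process lifetime, not instantaneous.
sum_php_fpm_cpu() {
  awk '$2 ~ /^php-fpm/ { total += $1; n++ }
       END { printf "%d php-fpm processes, %.1f%% CPU total\n", n, total }'
}
```

On the server it would be fed from ps:

# ps -eo pcpu=,comm= | sum_php_fpm_cpu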

dstat

dstat was a system performance monitoring utility that I was fond of. The dstat available on FreedomBox is a Red Hat rewrite that took over the dstat name and lacks the handy --top-cpu option, but the following command should still show relevant CPU information, printing one line of data averaged over each 60-second interval:

# dstat --time --load --proc --cpu --mem --disk --io --net --sys --vm 60
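For context, an averaged CPU figure of this kind can be derived from two snapshots of the aggregate cpu line in /proc/stat taken one interval apart. The following is a simplified sketch of that calculation, not dstat's actual code; cpu_busy_percent is a hypothetical name, and it treats idle plus iowait as idle time while ignoring the guest columns, which the kernel already counts in user and nice.

```shell
#!/bin/bash
# Hypothetical helper: percentage of CPU time that was busy between two
# snapshots of the aggregate "cpu" line from /proc/stat. Fields after "cpu":
# user nice system idle iowait irq softirq steal guest guest_nice.
cpu_busy_percent() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      idle[NR] = $5 + $6               # idle + iowait
      total[NR] = 0
      for (i = 2; i <= 9; i++)         # user .. steal (guest already in user/nice)
        total[NR] += $i
    }
    END {
      dt = total[2] - total[1]
      di = idle[2] - idle[1]
      printf "%.1f\n", (dt > 0 ? 100 * (dt - di) / dt : 0)
    }'
}
```

Two snapshots taken 60 seconds apart, matching the interval above, could be compared with:

# a=$(grep '^cpu ' /proc/stat); sleep 60; b=$(grep '^cpu ' /proc/stat)
# cpu_busy_percent "$a" "$b"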

dool

dool is a Python 3 compatible fork of dstat. It isn't packaged by Debian, but it recreates the dstat behavior I'm used to, including the --top-cpu option. You can install it from its Git repository; when run as root, the install script places it under /usr:

# git clone https://github.com/scottchiefbaker/dool.git dool
# cd dool
# ./install.py 
You are root, doing a local install

Installing binaries to /usr/bin/
Installing plugins  to /usr/share/dool/
Installing manpages to /usr/share/man/man1/

Install complete. Dool installed to /usr/bin/dool

You can then run the command via:

# dool --time --load --proc --cpu --top-cpu --mem --disk --io --net --sys --vm 60

Installing for use by the

Methodology

robots.txt

According to the MediaWiki Manual for robots.txt, requests from webcrawlers to index.php may be disallowed by adding the following text to a server's robots.txt file. In my particular FreedomBox instance, the file is located at /var/www/html/robots.txt. I use the cat command merely to show the contents and location of the file for this explanation.

# cat /var/www/html/robots.txt
User-agent: *
Disallow: /mediawiki/index.php?

No restart of the apache2 service should be necessary.

index.php is the main access point for a MediaWiki site. In my FreedomBox installation, MediaWiki pages are served by default at URLs that omit index.php. For example, my article on the Moon is served by default at https://reboil.com/mediawiki/Moon. Notably, https://reboil.com/mediawiki/index.php?title=Moon also works, but each such request counts toward the fail2ban filter described below.
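Disallow rules are prefix matches on the requested path (including any query string), which is why the rule above blocks /mediawiki/index.php?title=Moon but not /mediawiki/Moon. The following sketch illustrates that check; robots_disallows is a hypothetical helper, and it is deliberately simplified: it ignores User-agent groups, Allow lines, and the * and $ wildcards that many crawlers support.

```shell
#!/bin/bash
# Hypothetical, simplified robots.txt check: exit status 0 if the URL path
# given as $1 starts with any non-empty Disallow value in robots.txt (stdin).
# Ignores User-agent groups, Allow lines, and * / $ wildcards.
robots_disallows() {
  awk -v path="$1" '
    tolower($1) == "disallow:" && $2 != "" && index(path, $2) == 1 { hit = 1 }
    END { exit (hit ? 0 : 1) }'
}
```

For example:

# robots_disallows '/mediawiki/index.php?title=Moon' < /var/www/html/robots.txt && echo blocked

Under plain prefix matching, a bare /mediawiki/index.php with no query string is not covered by this rule.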

This change to robots.txt is, on its own, only an indirect way to reduce web crawler requests for dynamically generated MediaWiki pages, since it relies on coöperation from webcrawlers themselves to honor the Disallow rule. The following fail2ban filter is an active response that bans IP addresses that make repeated requests.

fail2ban filter

Fail2ban is installed and used by default on a FreedomBox instance. A generic tutorial for configuring it is available from Linode.

For my instance, I set up the filter by creating two files and running a systemd command to restart the fail2ban service.

The first file to create sets up a filter for fail2ban that uses a regular expression to identify requests for index.php. It is a 3-line file named mediawiki.conf saved in /etc/fail2ban/filter.d/.

# cat /etc/fail2ban/filter.d/mediawiki.conf 
[Definition]
failregex = <HOST> -.*"GET /mediawiki/index\.php\?.*"
ignoreregex =
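The filter can be sanity-checked against sample log lines. fail2ban ships a fail2ban-regex tool for exactly this (for example, fail2ban-regex with a log line or log file followed by /etc/fail2ban/filter.d/mediawiki.conf). The sketch below only approximates the failregex with grep -E so it can run without fail2ban; mediawiki_filter_matches is a hypothetical name, and the <HOST> placeholder, which fail2ban expands into its own address pattern, is replaced here by a simple run of hex digits, dots, and colons.

```shell
#!/bin/bash
# Hypothetical approximation of the mediawiki.conf failregex using grep -E.
# fail2ban expands <HOST> into its own address pattern; here it is
# approximated by a run of hex digits, dots and colons (IPv4 or IPv6).
# Prints the input lines that would count as failures.
mediawiki_filter_matches() {
  grep -E '[0-9A-Fa-f.:]+ -.*"GET /mediawiki/index\.php\?.*"'
}
```

On the server itself, fail2ban-regex gives the authoritative answer, since it applies the filter file the same way fail2ban does.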

The second file is a jail.local which defines a jail named mediawiki (the [mediawiki] line), references the mediawiki.conf filter (the filter = mediawiki line), and specifies the trigger and reset conditions for an IP address ban.

# cat /etc/fail2ban/jail.local
[mediawiki]
enabled = true
port = http,https
filter = mediawiki
logpath  = %(apache_error_log)s
maxretry = 60
findtime = 600
bantime = 3600

The logpath line is specific to how FreedomBox configures its logs when applying fail2ban to other applications. maxretry is the number of matching requests within a window of findtime seconds that triggers a ban lasting bantime seconds.

In this particular example, a webcrawler that requests 60 or more pages via my MediaWiki's index.php access point within a 10-minute window (findtime = 600 seconds) gets a 1-hour ban (bantime = 3600 seconds) on its IP address. The dynamically generated pages a typical human visitor is most likely to request are a page's history or its edit form. I find it implausible for a human to make sixty such requests in ten minutes (one request every ten seconds), so I find these limits reasonable.
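The effect of maxretry = 60 and findtime = 600 can be checked with a small simulation. This is a simplified sliding-window model, not fail2ban's exact bookkeeping: first_ban_time is a hypothetical helper that reads one client's request times in seconds (ascending, one per line) and prints the time of the first request at which at least maxretry requests fall within the last findtime seconds.

```shell
#!/bin/bash
# Hypothetical sliding-window model of a fail2ban jail for one client.
# Usage: first_ban_time MAXRETRY FINDTIME < timestamps
# Reads ascending request times (seconds), prints the time at which the
# count of requests in the window (t - FINDTIME, t] first reaches MAXRETRY.
# Prints nothing if that never happens.
first_ban_time() {
  awk -v maxretry="$1" -v findtime="$2" '
    BEGIN { head = 1; maxretry += 0; findtime += 0 }
    {
      now = $1 + 0
      t[NR] = now
      # Drop requests that are findtime or more seconds old.
      while (t[head] <= now - findtime) head++
      if (NR - head + 1 >= maxretry) { print now; exit }
    }'
}
```

For example, a client requesting a page every ten seconds is banned on its 60th request:

# awk 'BEGIN { for (i = 0; i < 60; i++) print i * 10 }' | first_ban_time 60 600
590

With requests every 11 seconds, at most 55 fit in any 600-second window, so this model never bans the client.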

In my instance, I had to create the jail.local file myself. According to the Linode tutorial, jail.local lets a local administrator extend and override the jail settings established in the default jail.conf file.

To immediately apply the new fail2ban filter, the service must be restarted:

# systemctl restart fail2ban.service

Current bans can be viewed via a fail2ban-client command:

# fail2ban-client status mediawiki
Status for the jail: mediawiki
|- Filter
|  |- Currently failed: 1
|  |- Total failed: 5920
|  `- Journal matches:  
`- Actions
   |- Currently banned: 1
   |- Total banned: 51
   `- Banned IP list:   47.76.35.19
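For scripting, the banned addresses can be pulled out of that status output. A minimal sketch: banned_ips is a hypothetical helper that relies only on the "Banned IP list:" label shown above, with the addresses separated by whitespace after it.

```shell
#!/bin/bash
# Hypothetical helper: print each banned IP from `fail2ban-client status JAIL`
# output (stdin), one per line, using the "Banned IP list:" label.
banned_ips() {
  awk -F 'Banned IP list:' 'NF > 1 {
    n = split($2, ips, /[ \t]+/)
    for (i = 1; i <= n; i++)
      if (ips[i] != "") print ips[i]
  }'
}
```

On the server:

# fail2ban-client status mediawiki | banned_ips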

Relevant system information

  • FreedomBox version: 23.6.2
  • Operating system: Debian GNU/Linux 11 (bullseye)