FreedomBox Mediawiki Fail2ban filter
Created by Steven Baltakatei Sandoval on 2023-12-28T15:37-08 under a CC BY-SA 4.0 (🅭🅯🄎4.0) license and last updated on 2023-12-31T21:36-08.
Summary
I noticed a significant increase in CPU usage on a public FreedomBox
webserver I run for publishing my notes via a MediaWiki instance at
reboil.com/mediawiki. The high usage was caused by frequent, expensive
requests for dynamically generated special pages. I implemented two
solutions: modifying the server's robots.txt and creating a fail2ban
filter.
Background
Note that this procedure may only work for FreedomBox instances,
since FreedomBox itself makes automatic configuration file changes.
For example, fail2ban should already be installed and maintained by
FreedomBox.
FreedomBox is a Debian package that converts the machine it is installed on into a personal cloud server. Specifically, it converts the machine into an Apache server with a web UI for installing apps such as WordPress (blog), MediaWiki (wiki), Bepasty (file sharing), Ejabberd (XMPP chat server), Postfix/Dovecot (email server), OpenVPN, and Radicale (calendar and address book), among others. See the Manual for details.
All files are created or edited with root access by a FreedomBox
account with administrator privileges, by logging in via ssh and
running:
$ sudo su -
#
This article assumes a basic knowledge of GNU/Linux, such as logging
into your FreedomBox via ssh, editing text files via the command
line, viewing file contents with cat, running Bash scripts, and
being aware of file ownership issues.
Analysis
I detected the high traffic usage via journalctl output resembling the following:
journalctl
# journalctl --output=short-iso --follow
A more focused command is described in the following Bash script,
run as the root user.
#!/bin/bash
journalctl --output=short-iso --follow | \
grep --line-buffered "apache-access" | \
less -S +F
Below are portions of example lines of expensive index.php?
requests.
/mediawiki/index.php?returnto=1770-03-07&returntoquery=redirect%3Dno&title=Speci
/mediawiki/index.php?target=1770-03-22&title=Special%3AWhatLinksHere HTTP/1.1" 4
/mediawiki/index.php?action=history&title=1784-03-15 HTTP/1.1" 200 5025 "-" "Moz
/mediawiki/index.php?returnto=1770-03-07&returntoquery=redirect%3Dno&title=Speci
/mediawiki/index.php?action=history&title=1784-03-15 HTTP/1.1" 200 6494 "-" "Moz
/mediawiki/index.php?action=edit&title=1770-03-08 HTTP/1.1" 200 5031 "-" "Mozill
/mediawiki/index.php?target=1862-04-02&title=Special%3AWhatLinksHere HTTP/1.1" 4
/mediawiki/index.php?action=edit&title=1770-03-08 HTTP/1.1" 200 4911 "-" "Mozill
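To see which clients are generating these requests, the apache-access lines can be tallied per source IP. The sketch below runs on an embedded sample (the IPs and field layout are hypothetical); in practice you would pipe the journalctl output shown above into the same grep/awk/sort chain:

```shell
# Hypothetical sample of apache-access lines; in real use, replace the
# printf with: journalctl --output=short-iso | grep "apache-access"
sample='203.0.113.7 - - [28/Dec/2023] "GET /mediawiki/index.php?action=history&title=1784-03-15 HTTP/1.1" 200
203.0.113.7 - - [28/Dec/2023] "GET /mediawiki/index.php?action=edit&title=1770-03-08 HTTP/1.1" 200
198.51.100.2 - - [28/Dec/2023] "GET /mediawiki/Moon HTTP/1.1" 200'

# Count index.php? requests per client IP, busiest client first.
printf '%s\n' "$sample" \
  | grep 'GET /mediawiki/index\.php?' \
  | awk '{print $1}' \
  | sort | uniq -c | sort -rn
```

The assumption here is that the client IP is the first whitespace-separated field of each access line, which holds for the common Apache combined log format.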
top
High CPU usage was indicated by the appearance of multiple
php-fpm7.4 processes in the output of the top command, a task
manager available on most Unix-like operating systems such as
Debian.
dstat
dstat was a system performance monitoring utility that I was fond
of. FreedomBox now carries a Red Hat rewrite that took over the
dstat namespace and lacks the handy --top-cpu option, but the
following command should still work to show you relevant CPU
information, outputting averaged data in a line every 60 seconds:
# dstat --time --load --proc --cpu --mem --disk --io --net --sys --vm 60
dool
dool is the Python 3 compatible fork of dstat that isn't tracked by
Debian but which recreates the dstat behavior I'm used to, such as
the --top-cpu option. You can install it via:
# git clone https://github.com/scottchiefbaker/dool.git dool
# cd dool
# ./install.py
You are root, doing a local install
Installing binaries to /usr/bin/
Installing plugins to /usr/share/dool/
Installing manpages to /usr/share/man/man1/
Install complete. Dool installed to /usr/bin/dool
You can then run the command via:
# dool --time --load --proc --cpu --top-cpu --mem --disk --io --net --sys --vm 60
Methodology
robots.txt
According to the MediaWiki Manual for robots.txt, requests from
webcrawlers to index.php may be disallowed by adding the following
text to a server's robots.txt file. In my particular FreedomBox
instance, the file is located at /var/www/html/robots.txt. I use the
cat command merely to show the contents and location of the file for
this explanation.
# cat /var/www/html/robots.txt
User-agent: *
Disallow: /mediawiki/index.php?
No restart of the apache2 service should be necessary.
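Disallow rules in robots.txt are plain path prefixes, not regular expressions. The helper below is a hypothetical sketch (not part of any robots.txt tooling) of how a compliant crawler would apply the rule above:

```shell
# Hypothetical sketch: a compliant crawler treats a Disallow rule as a
# literal path prefix, so only URLs beginning with the rule's text are
# skipped.
is_disallowed() {
  case "$1" in
    "/mediawiki/index.php?"*) return 0 ;;  # matches the Disallow prefix
    *) return 1 ;;                         # everything else is crawlable
  esac
}

is_disallowed '/mediawiki/index.php?title=Moon' && echo "blocked"
is_disallowed '/mediawiki/Moon' || echo "allowed"
```

Note that default article URLs like /mediawiki/Moon do not share the prefix, so they remain crawlable under this rule.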
index.php is the main access point for a MediaWiki site. In my
FreedomBox installation, MediaWiki pages are served by default at
URLs omitting index.php. For example, my article on the Moon is
served by default at https://reboil.com/mediawiki/Moon. Notably,
https://reboil.com/mediawiki/index.php?title=Moon also works, but it
will trigger the fail2ban filter described below.
This modification of robots.txt alone is an indirect way to reduce
web crawler requests for dynamically generated MediaWiki pages, but
it relies on coöperation from the webcrawlers themselves to honor
the Disallow request. The following fail2ban filter is an active
response that bans IP addresses that make repeated requests.
fail2ban filter
Fail2ban is a program installed by default with a FreedomBox instance. A generic tutorial for configuring it is available from Linode.
For my instance, I set up the filter by creating two files and
running a systemd command to restart the fail2ban service.
The first file to create sets up a filter for fail2ban that uses a
regular expression to identify requests for index.php. It is a
3-line file named mediawiki.conf saved in /etc/fail2ban/filter.d/.
# cat /etc/fail2ban/filter.d/mediawiki.conf
[Definition]
failregex = <HOST> -.*"GET /mediawiki/index\.php\?.*"
ignoreregex =
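The authoritative way to test a filter is the fail2ban-regex(1) tool that ships with fail2ban, but the expression can be approximated with grep -E by substituting fail2ban's <HOST> placeholder with an IPv4 pattern. The sample log lines below are hypothetical, and the substitution is only a rough stand-in (fail2ban's <HOST> also matches hostnames and IPv6 addresses):

```shell
# Approximation of the filter's failregex, with <HOST> replaced by a
# simple IPv4 pattern for testing purposes.
pattern='^[0-9]+(\.[0-9]+){3} -.*"GET /mediawiki/index\.php\?.*"'

# A hypothetical access line that should match (index.php? request):
line='47.76.35.19 - - [28/Dec/2023:15:37:00 -0800] "GET /mediawiki/index.php?title=Moon HTTP/1.1" 200 5025'
echo "$line" | grep -Eq "$pattern" && echo "would match"
```

A request for a default article URL such as /mediawiki/Moon lacks the index.php? substring and would not match, so ordinary page views never count toward a ban.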
The second file is a jail.local which references the mediawiki.conf
file (via the [mediawiki] line) and specifies the trigger and reset
conditions for an IP address ban.
# cat /etc/fail2ban/jail.local
[mediawiki]
enabled = true
port = http,https
filter = mediawiki
logpath = %(apache_error_log)s
maxretry = 60
findtime = 600
bantime = 3600
The logpath line is specific to how FreedomBox configures its logs
when applying fail2ban to other applications. maxretry is the number
of requests within a window of findtime seconds that will trigger a
ban lasting bantime seconds.
In this particular example, a webcrawler requesting 60 or more pages
via my MediaWiki's index.php access point within a 10-minute window
will get a 1-hour ban on its IP address. The dynamically generated
pages a typical human user is most likely to request are a page's
history and the page edit form. I find it implausible for a human to
make sixty such requests in ten minutes (that's one request per ten
seconds), so I find these limits rational.
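The arithmetic behind that claim can be checked directly from the jail settings:

```shell
# Jail settings from jail.local above.
maxretry=60    # requests
findtime=600   # seconds (the 10-minute window)
bantime=3600   # seconds (the 1-hour ban)

# Evenly spread, the ban threshold is one request every 10 seconds.
echo "window: $(( findtime / 60 )) minutes"                      # → window: 10 minutes
echo "threshold: one request per $(( findtime / maxretry )) seconds"  # → threshold: one request per 10 seconds
echo "ban length: $(( bantime / 3600 )) hour"                    # → ban length: 1 hour
```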
In my instance, I had to create the jail.local file. According to
the Linode tutorial, jail.local is meant to permit a local
administrator to extend and override configurations established in
default .conf files such as fail2ban.conf.
To immediately apply the new fail2ban filter, the service must be
restarted:
# systemctl restart fail2ban.service
Current bans can be viewed via a fail2ban-client command:
# fail2ban-client status mediawiki
Status for the jail: mediawiki
|- Filter
| |- Currently failed: 1
| |- Total failed: 5920
| `- Journal matches:
`- Actions
|- Currently banned: 1
|- Total banned: 51
`- Banned IP list: 47.76.35.19
Relevant system information
- FreedomBox version: 23.6.2
- Operating system: Debian GNU/Linux 11 (bullseye)