Please note: This project is no longer active. The website is kept online for historic purposes only.

Fighting Trac Spam with mod-security

Please note

The ruleset presented here is obsolete and no longer receives updates.

The successor, ScallyWhack, has improved functionality and has been adjusted to the current mod_security generation 2.x. Learn more about ScallyWhack on its own project website.

Introduction

Ticket comment spam has become a major pain in the ass at least for those Trac-driven websites which are listed in the major search engines. madwifi.org, for example, is hit by around 400 spam comments per day, and the number is steadily increasing.

What can be done? Well, Trac 0.10dev users can install the SpamFilter plugin and use that to filter out unwanted comments. The only chance users of Trac 0.9.x have is to keep a close eye on the ticket tracker and make use of the TicketDelete plugin to remove the spam after it has been submitted. And projects that still run on Trac 0.8.x are totally hosed, since none of the aforementioned solutions work for them.

While the available options basically work, I don't like either of them - they eat too much. The manual action that is required by the TicketDelete plugin eats too much time from project members that weed out the spam comments. And the disadvantage of the SpamFilter plugin is that the spam still hits Trac, which consumes valuable CPU cycles to recognize the spam as such. Compared with the solution that is presented here, implementing the spam filter as part of Trac is more expensive, which might become a problem for sites with medium to high traffic.

mod_security, on the other hand, can be made a relatively lightweight tool that is able to block spam comments even before Trac has to spend time on them. This relieves the webserver and improves performance. As far as I can tell, mod_security is currently available only as a module for Apache 2.0.x, but its author has announced support for other webservers.

Examination

I started to monitor the site when the amount of spam comments showing up in our ticket tracker rose. That was shortly after I fixed an issue which caused Google to dislike our site - once that issue was resolved and the site was listed on Google and friends, the amount of spam increased... no real surprise here, I guess.

As of writing this article, we (and other Trac installations I've checked) seem to be hit by six different types of spam, which are listed below.

Type 1: #preview spam

It seems that there is at least one spam bot out there that tries to post to, for example, /ticket/123#preview rather than just /ticket/123 as browsers usually do. Many of these posts don't show up in Trac, and once logging is enabled it's easy to determine the reason:

2006-07-10 12:36:46,452 Trac[main] ERROR: Sorry, can not save your changes. This ticket has been modified by someone else since you started

I didn't dig deeper into the reason for this message, but it seems that the spambot either sends no timestamp at all or one that is too far back in the past.

However, some of the spam bots seem to get it right by now. Recently madwifi.org was hit by some new tickets that passed the applied filter rules and were even accepted by Trac, so they also had valid timestamps. Nevertheless it's possible to filter them, since they also belong to this spam type: the POST requests for these new tickets were sent to /newticket#preview...
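
To get a feeling for what this signature looks like, here is a minimal Python sketch (not part of the ruleset itself) that applies the same regular expression as the type 1 rule in the implementation section to a few request URIs. The function name is made up for illustration.

```python
import re

# Pattern copied from the type 1 rule of the ruleset; the function name
# is hypothetical and only exists in this sketch.
PREVIEW_RE = re.compile(r"^/(newticket|ticket/[0-9]+).*\#preview")

def is_preview_spam(method, uri):
    """Return True for POSTs whose request URI carries a literal #preview."""
    return method == "POST" and PREVIEW_RE.search(uri) is not None

print(is_preview_spam("POST", "/ticket/123#preview"))
print(is_preview_spam("POST", "/ticket/123"))
```

Regular browsers do not send the fragment part of a URL to the server, which is why a literal #preview in the request URI is such a useful giveaway.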

Type 2: no cookie spam

Google and friends are used by spammers to find Trac installations and existing tickets they can post their spamvertisements on. They send their POST requests directly to previously determined URLs without GETting them before, and (thus?) do not have a Trac cookie. A regular (human) site visitor, on the other hand, has either a trac_session (anonymous user) or a trac_auth (registered user who has logged in before) cookie.
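
The cookie check can be sketched in a few lines of Python. This is only an illustration of the idea, assuming the cookie names mentioned above; the function and constant names are hypothetical, and the patterns mirror the (commented-out) type 2 rules in the implementation section.

```python
import re

# Patterns mirror the type 2 rule pair; all names here are made up.
TRAC_URI_RE = re.compile(r"^/(wiki/|newticket|ticket/).*$")
TRAC_COOKIE_RE = re.compile(r"trac_auth|trac_session")

def lacks_trac_cookie(uri, cookie_header):
    """True if a POST to a Trac handler arrives without any Trac cookie."""
    if not TRAC_URI_RE.search(uri):
        return False  # not one of the handlers we care about
    return TRAC_COOKIE_RE.search(cookie_header or "") is None

print(lacks_trac_cookie("/ticket/5", ""))
print(lacks_trac_cookie("/ticket/5", "trac_session=abc"))
```

Keep in mind that a visitor who has disabled cookies looks exactly like such a bot to this check.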

Type 3: html processor spam

Some of the spammers are quite "clever". They make use of the HTML processor that comes with Trac (see here and here) to hide the spam from users - I think they hope that this increases the chance that their spam remains unnoticed by the administrators.

Here's an example of such a comment:

{{{
#!html
<div style="overflow:auto; height: 1px;">
<a href="http://spammers-website.tld/faked-handbags-suck.html">faked handbags</a>
<a href="http://spammers-website.tld/so-do-faked-sunglasses.html">faked sunglasses</a>
...
</div>
}}}

The HTML processor "hides" the content of the comment, so users think it's just an empty comment.
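
Spotting this type of comment is straightforward, since the processor has to be requested explicitly. A rough Python sketch of the check follows; the field names "description" and "comment" match the arguments used in the ruleset, while the function name is an assumption of this example.

```python
def uses_html_processor(form_fields):
    """True if the ticket description or comment requests the HTML processor."""
    return any(
        "#!html" in form_fields.get(name, "")
        for name in ("description", "comment")
    )

sample = {"comment": '{{{\n#!html\n<div style="overflow:auto; height: 1px;">\n</div>\n}}}'}
print(uses_html_processor(sample))
```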

Type 4: markup spam

I can only guess, but it seems that some of the spam bots out there are either all-rounders or dumb. We've seen spam comments in our ticket tracker which made use of various forms of "markup language", such as BBcode or HTML (without using the HTML processor). Some of them even contain the plain URLs without any markup, hoping that they will be converted to clickable links.

BBcode example:

... [url=http://spamvertised.tld/some-stupid-page.html]visit this site![/url] ...

HTML example:

... <a href="http://spamvertised.tld/some-stupid-page.html">visit this site!</a> ...

The markup does not render as the spammers intended. Nevertheless, these comments do their unwanted job in our case: Trac automatically turns plain-text URLs into fully working, clickable links that will be followed by search engine spiders. And that's all that counts for a spammer, isn't it?
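
Whatever markup is used, these comments usually carry more than one URL, and that is what the type 4 rule in the ruleset looks for. The following Python sketch applies the same pattern to a piece of text. The function name is made up, and the re.DOTALL flag is an assumption of this sketch so that URLs on different lines are found as well.

```python
import re

# Pattern from the type 4 rule. re.DOTALL is an assumption of this sketch,
# so that two URLs separated by line breaks still match.
MULTI_URL_RE = re.compile(r"http\:/.*http\:/", re.DOTALL)

def has_multiple_urls(text):
    """True if the text contains at least two http:/ occurrences."""
    return MULTI_URL_RE.search(text) is not None

print(has_multiple_urls("see http://a.tld and\nhttp://b.tld"))
print(has_multiple_urls("only http://a.tld here"))
```

A comment with a single URL passes this check, which is why a separate rule deals with URLs posted by anonymous users.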

Type 5: LED spam

If I remember correctly, this guy was the first spammer that hit our site, and he seems to be well-known in the Trac community (check this and this, for example, and see what Google finds). From what I can tell this guy is a moron sitting somewhere in China, submitting the same post over and over again manually via his browser. My knowledge of the Chinese language is quite bad, but it seems that he spamvertises a site that is in the LED business.

Type 6: attachment description spam

A relatively new type of spam that escaped my attention for a while. When attaching files to a wiki page, users are allowed to describe their attachment. This description may contain WikiFormatting, which is parsed when the description is displayed and can be misused to embed links to spamvertised pages - which makes it related to type 4. At least in Trac 0.9.x neither "recent changes" nor the timeline reports attachments, which makes it quite hard to spot this type of attack.

Commonalities

All these spam posts have a very typical signature, which makes them easy to block with the help of mod-security. I'm quite sure this will change in the future, but for now it works nicely as described below. Having said that, let's stop talking and head over to practice.

Implementation

Step 1: install and activate mod-security

The main requirement for this recipe is, of course, that you have mod-security installed. The latest stable release is 1.9.4. However, the rules below have been developed and tested with 1.8.7, so this version should be sufficient for our purposes.

If your server runs on Debian try:

apt-get install libapache2-mod-security
a2enmod   # on the prompt answer: mod-security
/etc/init.d/apache2 restart

Otherwise have a look at the installation instructions on the project website.

Due to license problems, the package is no longer in the Debian archive. You may find it at the maintainer's site or at the Debian-Unofficial site (http://debian-unofficial.org).

Step 2: apply the filter rules

Copy and paste the following lines into configuration file(s) for the vhost(s) you want to secure. If all of the vhosts your server is hosting should be protected, you should put the rules into the global configuration.

Don't forget to adjust the path to the debug log file and the name of the custom 402 error HTML page (if you make use of either of them).

<IfModule mod_security.c>
    #
    # anti trac-spam rules v8
    # http://madwifi.org/wiki/FightingTracSpam
    #

    SecFilterDebugLevel     0
    # uncomment the following line if you enable debugging:
    #SecFilterDebugLog /path/to/somewhere/trac-spam.log

    SecFilterEngine         On
    SecFilterScanPOST       On
    SecFilterCheckURLEncoding On
    SecFilterCheckCookieFormat On
    SecFilterCheckUnicodeEncoding Off

    # default rule: if a request matches, we want mod-security to
    # put a notice about it into the (v)hosts' error log and
    # deny the request with status 402 ("Payment required")
    SecFilterDefaultAction "deny,log,status:402"

    # have a look at POST requests only, since they are what is used
    # to submit the spam - this helps to reduce the load that is
    # caused by mod-security
    SecFilterSelective REQUEST_METHOD "!(^POST$)" "nolog,allow"

    # allow all POST requests that are not directed to one of the
    # handlers we take into account below
    SecFilterSelective REQUEST_URI "!(/(wiki|newticket|ticket).*$)" "nolog,allow"

    # block POSTs to /ticket/<number>#preview and /newticket#preview
    # this catches spam type 1
    SecFilterSelective REQUEST_URI "^/(newticket|ticket/[0-9]+).*\#preview"

    # block POSTs to /wiki, /ticket and /newticket from users who
    # don't have a trac cookie
    # this catches spam type 2
    # 
    # CAUTION: these rules likely cause false positives, as some users tend
    # to turn off cookie support in their browser. Don't activate them unless
    # you're sure that this won't offend your visitors, or at least warn
    # visitors.
    #SecFilterSelective REQUEST_URI "^/(wiki/|newticket|ticket/).*$" chain
    #SecFilterSelective HTTP_COOKIE "!(trac_auth|trac_session)"

    # don't accept usage of HTML processor in tickets / ticket comments
    # this catches spam type 3
    SecFilterSelective REQUEST_URI "^/(newticket|ticket/).*$" chain
    SecFilterSelective "ARG_description|ARG_comment" "#!html"

    # block new ticket and ticket comment POSTs if they contain more
    # than one URL 
    # this catches spam type 4
    SecFilterSelective "REQUEST_URI" "^/(newticket|ticket/).*$" chain
    SecFilterSelective "ARGS" "http\:/.*http\:/"

    # block LED spammer; his spam is not blocked by the previous
    # rule, since he includes only one URL to the spamvertised
    # website
    # last but not least, this catches spam type 5
    SecFilterSelective REQUEST_URI "^/(newticket|ticket/).*$" chain
    SecFilterSelective "ARG_description|ARG_comment" "www.tideled.com"

    # block tickets or comments with an http://-URL in them, if the user
    # is not properly authenticated; return a 403, which lets us present
    # users with a custom error page that explains what is going
    # on (see below)
    # this rule is used on tickets as well as attachment descriptions
    # and therefore also catches spam type 6
    SecFilterSelective "REQUEST_URI" "/(newticket|ticket/|attachment/).*$" chain
    SecFilterSelective HTTP_COOKIE "!trac_auth" chain
    SecFilterSelective HTTP_Authorization "!Basic" chain
    SecFilterSelective "ARGS" "(http|https):/" "deny,log,status:403"


    # Apache can present users with customized error pages,
    # and we can make use of that feature to let spammers know what
    # we think of 'em.
    # Tell Apache what file to use as error page for 402, and
    # let it know that requests to this file should not be handled
    # by Trac.
    # 
    # Uncomment the following lines if you want to make use of this
    # feature (see also step 3 of the recipe):
    #ErrorDocument   402     /error402.html
    #<Location /error402.html>
    #    SetHandler  None
    #</Location>
    #
    # Another use for customized error pages is, as mentioned above,
    # to let users know why they are not allowed to give URLs in
    # their tickets and what they can do to circumvent this
    # limitation.
    #ErrorDocument   403     /error403.html
    #<Location /error403.html>
    #    XBitHack On
    #    SetHandler  None
    #</Location>
</IfModule>
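
The last rule in the ruleset is a chain: the request is only denied if every link of the chain matches. To make that logic easier to follow, here is a small Python model of it. It is a simplified sketch, not a replacement for mod_security; the function and parameter names are hypothetical, and plain substring tests stand in for the "!trac_auth" and "!Basic" conditions.

```python
import re

# Patterns taken from the final chained rule; all names are made up.
URI_RE = re.compile(r"/(newticket|ticket/|attachment/).*$")
URL_RE = re.compile(r"(http|https):/")

def anonymous_url_status(uri, cookie, authorization, body):
    """Return 403 if the chained rule would fire, else None."""
    if not URI_RE.search(uri):
        return None              # not a ticket or attachment handler
    if "trac_auth" in cookie:
        return None              # user has logged in via Trac
    if "Basic" in authorization:
        return None              # user is authenticated via HTTP Basic auth
    if URL_RE.search(body):
        return 403               # anonymous post that contains a URL
    return None

print(anonymous_url_status("/ticket/1", "trac_session=x", "", "see http://example.tld"))
print(anonymous_url_status("/ticket/1", "trac_auth=x", "", "see http://example.tld"))
```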

Step 3: customize the 402 error page (optional)

As mentioned in the comments of the above mod-security rules, Apache can present customized error pages. This is a nice way to let spammers know what we think of them (if they read it at all). More importantly, it allows us to explain to legitimate users that something went wrong if they are sent to that page - nobody is perfect.

madwifi.org uses two different pages for that purpose: one for error 402 (explaining that advertisements have to be paid for) and one for error 403 (explaining why URLs are not allowed in tickets and what can be done to avoid them). Feel free to use them, too, but make sure you adjust the e-mail addresses that are mentioned in the first two paragraphs :)

Using UNIQUE_ID for troubleshooting

Tracking down what went wrong when a legitimate user caused a false positive can be problematic. In order to identify the "offending" request(s) in the logfiles, you will need to know things like the user's IP address, the approximate time when the request was blocked, and so on. On the other hand, users who have been suspected of being spammers most probably won't be in the mood to answer all your questions patiently.

To help both sides, mentioning a unique token in a prominent place is probably the best idea. Users will be given something they can easily refer to, and you can search the logs for this token. Luckily, Apache provides everything that is required to accomplish this: mod_unique_id and mod_include.

mod_unique_id, as you may have guessed, generates a unique token for each request and stores it in the environment variable UNIQUE_ID. If the variable is present, mod_security mentions it in its logs automatically. mod_include is then used to process server side includes, which lets us insert the UNIQUE_ID into the custom error page dynamically. That's all we need.

So, after making sure that Apache loads these two modules, you need to tell it that our custom error pages should be processed by mod_include. Personally, I think that XBitHack is the easiest way to do that. Add XBitHack On to the <Location /error402.html> and <Location /error403.html> sections in the above configuration example:

...
<Location /error402.html>
    XBitHack On
    SetHandler  None
</Location>
...
<Location /error403.html>
    XBitHack On
    SetHandler  None
</Location>
...

Then set the user executable bit on the html-file for the custom error page:

yourhost:~# chmod +x /path/to/www-root/error402.html
yourhost:~# chmod +x /path/to/www-root/error403.html

Last but not least, you need to insert the following into your custom error page (wherever you want the UNIQUE_ID to appear):

<!--#echo var="UNIQUE_ID" -->

That's it, now it should work.
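
Once a user reports the token shown on the error page, finding the matching log entries is a simple text search. A minimal Python sketch of such a search follows. The function name is made up, and the sample lines are placeholders, not real mod_security log output.

```python
def find_request(log_lines, unique_id):
    """Return all log lines that mention the given UNIQUE_ID token."""
    return [line.rstrip("\n") for line in log_lines if unique_id in line]

# Placeholder lines for illustration only; real log entries look different.
sample_log = [
    "entry one, token TOKEN-AAA\n",
    "entry two, token TOKEN-BBB\n",
    "entry three, token TOKEN-AAA\n",
]
for line in find_request(sample_log, "TOKEN-AAA"):
    print(line)
```

For a real logfile, pass an open file object instead of the list, for example the debug log configured with SecFilterDebugLog or the vhost's error log.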

Conclusion

I think the described solution is easy to implement, relatively light-weight and efficient. madwifi.org has been using this setup for a few days now, and it seems to work fine for us. However, your mileage may vary, so in the beginning you should keep an eye on what happens. You should also let your users know about potential problems (for example if they turned off cookies in their browser) and how they can work around them.

Last but not least, please let me know about your experiences and tell me if you found a way to improve the rules or something like that. You can contact me by mail or in our IRC support channel on Freenode as user otaku42.