Hot stock tips, male enhancement patches and pills, replica high-end wristwatches, diet potions, pirated software (but at a Discount!), forged messages from financial institutions I've never even heard of, offers from the nephew of the president of Nigeria to share his multi-million dollar bank account, and on and on and on: I've Had Enough!
The first thing I do every morning when I get into my office is check my e-mail. Well, that used to be what I did. Now, the first thing I do is spend 10 to 15 minutes deleting the junk e-mail that some anonymous miscreant has shoved into my Inbox. I have to filter out a hundred or so unwanted, unsolicited, forged and fraudulent messages before I can begin to read the real e-mail that I depend on for my job.
And my modified morning routine exists despite the installation of a (slightly) functional commercial content-based anti-spam program on NJIT's e-mail gateways.
Content-based junk e-mail filtering programs scan the content of each piece of e-mail that passes through them. Using a scoring system, they assign point values to features contained within the header, subject and body of the messages. Typical things that cause points to be scored are: HTML in e-mail, non-standard fonts, large font sizes, banned words or phrases and fictitious or empty To: addresses. If the points assigned to an e-mail exceed a certain threshold (usually 5.0), the e-mail is placed in a quarantine and the intended recipient is notified, daily, that there are quarantined e-mails that can be accepted for delivery or deleted. On top of the hundred or so junk e-mails I get, there are usually 20 to 30 e-mail messages caught in the quarantine for me each day. The university anti-spam system is catching about 1 in 4 pieces of spam and, given the huge volume of junk e-mail, that is a junk e-mail solution of very little value.
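The scoring approach described above can be sketched roughly as follows. This is only an illustration: the banned phrases and the point values are made up, and only the 5.0 threshold comes from the description above. A real content filter would apply many more rules.

```python
import re

# Hypothetical rules and point values, chosen only for illustration.
BANNED_PHRASES = ("hot stock tip", "replica watch", "diet potion")
QUARANTINE_THRESHOLD = 5.0  # the "usual" threshold mentioned in the text


def score_message(to_header: str, subject: str, body: str) -> float:
    """Add up points for each suspicious feature found in a message."""
    score = 0.0
    if not to_header.strip():
        score += 2.0  # empty To: address
    if re.search(r"<html|<font", body, re.IGNORECASE):
        score += 1.5  # HTML markup in the body
    text = (subject + " " + body).lower()
    for phrase in BANNED_PHRASES:
        if phrase in text:
            score += 2.5  # banned word or phrase
    return score


def is_quarantined(to_header: str, subject: str, body: str) -> bool:
    """Quarantine the message when its score exceeds the threshold."""
    return score_message(to_header, subject, body) > QUARANTINE_THRESHOLD
```

A message that spells its pitch as "h0t st0ck tip" earns no points from the phrase rule in this sketch, which is exactly the weakness of content-based filtering.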
The fundamental problem with filtering e-mail based on content is the dynamic nature of e-mail content. Like snowflakes, no two e-mail messages are ever exactly the same. The header, subject and body of the message can all be altered (or forged) by a skillful spammer who knows the types of content that content-based anti-spam programs check. Since e-mail messages can be changed a lot faster than e-mail filtering programs can be updated, the spammers are always several steps ahead. Because content-based e-mail filtering is so unsuccessful, effective e-mail filtering has to be based on the parts of the e-mail process that are static and never change regardless of the type or content of the messages that are sent. The network rules that govern the transport and delivery of electronic messages define and enforce those unchanging parts of the e-mail process. Those rules are described in the RFCs and they can be used to produce powerful anti-spam tactics because the spammers can't play by all the rules.
A few days ago, I installed two rules-based e-mail filtering programs on two different servers. One of those servers, dl1.njit.edu, handles about 75,000 e-mails per day; the other server, www.smsd.tv, handles about 750 e-mails per day. www.smsd.tv uses a milter (a mail filter) program called milter-greylist, and dl1.njit.edu uses a milter called spamilter.
Milter-greylist takes a very basic approach to e-mail filtering. It supports lists of users who can be predefined to whitelist (allow), blacklist (reject) or greylist (delay) their incoming e-mail. Blacklisted users can be entered manually or derived automatically from a number of Internet-based RBLs (realtime blackhole lists), which are dynamic lists of broken, compromised or mis-configured e-mail servers through which spammers often send their junk. Greylisting delays the delivery of an e-mail message by sending this special error code and message back to the originating server: reject=451 4.7.1 Greylisting in action, please come back later.
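RBL lookups are conventionally done through DNS: the octets of the connecting IPv4 address are reversed, the list's zone name is appended, and the resulting name is looked up. If the name resolves, the address is on the list. The sketch below shows that convention only; it is not milter-greylist's own code, and the zone name rbl.example.org in the usage note is a placeholder, not a real list.

```python
import ipaddress
import socket


def rbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets plus the list zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the list's DNS zone has an entry for this address."""
    try:
        socket.gethostbyname(rbl_query_name(ip, zone))
        return True
    except OSError:
        return False  # lookup failed: treat the address as not listed
```

For example, checking 192.0.2.99 against the placeholder zone means looking up the name 99.2.0.192.rbl.example.org.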
A properly configured e-mail server will try to resend a message for up to 72 hours, until the e-mail is delivered or it receives a permanent delivery failure. Most of the mass spam is sent through hacked and/or mis-configured e-mail servers that don't recognize the 451 delay code (it is part of the RFCs regulating e-mail traffic) and never resend the e-mail message. E-mail that is never resent is never delivered by the local e-mail server, and the message never darkens anyone's Inbox again.
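The greylisting logic can be sketched in a few lines. This is a simplified illustration and not milter-greylist's actual code: tracking each (client IP, sender, recipient) combination and using a five-minute delay are assumptions, and the class and parameter names are hypothetical.

```python
import time

GREYLIST_DELAY = 300  # seconds a new sender must wait; assumed value
TEMPFAIL = "451 4.7.1 Greylisting in action, please come back later"
ACCEPT = "250 OK"


class Greylist:
    """Temporarily reject unknown senders until they retry after a delay."""

    def __init__(self, delay=GREYLIST_DELAY, whitelist=(), clock=time.time):
        self.delay = delay
        self.whitelist = set(whitelist)
        self.clock = clock          # injectable so the logic can be tested
        self.first_seen = {}        # (ip, sender, recipient) -> first attempt time

    def check(self, client_ip, sender, recipient):
        if sender in self.whitelist:
            return ACCEPT           # whitelisted senders are never delayed
        key = (client_ip, sender, recipient)
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay:
            return ACCEPT           # the sender retried after the delay
        return TEMPFAIL             # first attempt, or retried too soon
```

A well-behaved server that retries after the delay gets its message through on the second attempt; a spam cannon that never retries is never accepted.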
Spamilter is a more complex rule-based e-mail filtering program. It checks the domain name of the sending server against a blacklist to see if it has been reported as compromised (broken). It verifies the e-mail sender by sending an e-mail back to the originating sender to see if it is a valid address. It can verify whether or not the hostname supplied in the transport of the e-mail (called the HELO) can be independently resolved to a valid host name. It can inject the IP address of a rejected host into the server's firewall and block any connection from that host (not just e-mail) for 48 hours. It can verify the IP address the sending server uses to connect, and it can even check the e-mail attachments for dangerous filename extensions. If any of the verification functions fail, the e-mail will be rejected or tagged by pre-pending "Valid Sender?" to the subject. A desktop e-mail program can then filter the tagged messages out of the Inbox and place them in a local quarantine folder. Those trapped messages can be used to document and report spamming attempts to the spammer's internet service provider.
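Two of the simpler steps, the attachment check and the subject tagging, might look something like the sketch below. It is not taken from Spamilter: the list of extensions and the function names are assumptions made for illustration, and only the "Valid Sender?" tag comes from the description above.

```python
import os

# Assumed list of risky extensions; not taken from Spamilter.
DANGEROUS_EXTENSIONS = {".exe", ".scr", ".pif", ".bat", ".vbs"}
TAG = "Valid Sender?"


def has_dangerous_attachment(filenames):
    """Return True if any attachment name ends in a risky extension."""
    for name in filenames:
        ext = os.path.splitext(name.lower())[1]
        if ext in DANGEROUS_EXTENSIONS:
            return True
    return False


def tag_subject(subject, checks_passed):
    """Prepend the tag when verification failed; never tag twice."""
    if checks_passed or subject.startswith(TAG):
        return subject
    return f"{TAG} {subject}"
```

A desktop mail rule that matches subjects beginning with the tag is then enough to move those messages into a quarantine folder.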
On www.smsd.tv, in one 24-hour period, 704 internet e-mails were logged. Of those e-mail messages, 353 were delivered. Unknown addresses were greylisted and delayed 56 times and, of those 56, 19 were delivered. The remaining 295 messages were either whitelisted on arrival and delivered, or were messages sent to e-mail addresses that didn't exist on www.smsd.tv. Milter-greylist blocked about two-thirds of the messages it saw as potential spam and delivered the other third. Of the 353 messages that were delivered, only one was actually spam. That e-mail message was an advertisement from a prior contact, and its only qualification as spam was that it was a bulk e-mailing that used a customer list. That e-mailing contained no "opt out" or unsubscribe information or links as required by the CAN-SPAM Act.
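The figures above can be checked with a little arithmetic. The sketch below assumes that the two-thirds figure refers to the 56 greylisted messages; the variable names are only labels for the numbers quoted in the paragraph.

```python
logged = 704                      # messages logged in 24 hours
delivered = 353                   # messages delivered
greylisted = 56                   # messages delayed by greylisting
greylisted_then_delivered = 19    # greylisted messages that were resent

remaining = logged - delivered - greylisted
greylist_blocked = greylisted - greylisted_then_delivered
blocked_share = greylist_blocked / greylisted
delivered_share = greylisted_then_delivered / greylisted

print(f"remaining: {remaining}")
print(f"greylisted and never resent: {greylist_blocked} ({blocked_share:.0%})")
print(f"greylisted and later delivered: {delivered_share:.0%}")
```

Read this way, 37 of the 56 greylisted messages were never resent, which is where the figure of roughly two-thirds comes from.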
On dl1.njit.edu, in one 24-hour period, 90,000 inbound e-mail messages were rejected by Spamilter. Only 142 inbound e-mail messages were delivered.
dl1.njit.edu supports many mailing lists and group e-mail accounts that have widely published addresses. Spammers have automated programs that continuously look for published e-mail addresses on the World Wide Web and in Usenet newsgroups. Those addresses are "harvested" and inserted into bulk mailing lists that are blasted non-stop with junk e-mail. The dl1.njit.edu test was done during Winter Break when there was almost no one on campus or in staff or faculty offices. NJIT was closed, and that accounts for the tiny number of e-mail messages actually delivered compared to the outrageous number of attempted-and-rejected messages sent to accounts on the server. Seventeen of the rejected messages were blocked by a lookup in a blackhole list of known compromised servers; the rest were blocked because of fictitious sender addresses.
www.smsd.tv maintains a list of e-mail addresses, networks and domains that have sent any spam to an smsd.tv address during the past 3 years. Those addresses are rejected by the e-mail server before they ever pass through milter-greylist. Blocked addresses include all of hotmail.com and many networks and domains in eastern Europe and Asia. When a piece of spam is delivered to an smsd.tv address, the originating IP address of the e-mail is immediately blocked along with the network that allowed the transport of the message. An e-mail is sent to the responsible party or parties listed in the Whois database registry informing them of their ban from www.smsd.tv's network and the reason(s) for the ban, with a source copy of the entire offending e-mail embedded in the body of the message. If the responsible parties decide to contact the administrator of www.smsd.tv, they must use the postal address listed in the Whois database registry.
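A block list of this kind can be sketched as a check against blocked networks and blocked sender domains. This is an illustration only, not the configuration used on www.smsd.tv: the function names are hypothetical, the addresses in the usage note are documentation placeholders, and blocking a /24 around the offending address is an assumption standing in for the real network boundaries, which would come from the Whois registry.

```python
import ipaddress


def is_blocked(client_ip, sender, networks, domains):
    """Return True if the client IP or the sender's domain is on the block list."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in networks):
        return True
    domain = sender.rsplit("@", 1)[-1].lower()
    return any(domain == d or domain.endswith("." + d) for d in domains)


def block_network_of(client_ip, networks, prefix_len=24):
    """Add the network around an offending address to the block list.

    The /24 default is an assumption; the real range would come from Whois.
    """
    net = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    if net not in networks:
        networks.append(net)
    return net
```

With networks set to [ipaddress.ip_network("203.0.113.0/24")] and domains set to {"hotmail.com"}, a connection from 203.0.113.7 or a sender at hotmail.com is refused before any greylisting takes place.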
The ban of an entire network or domain for one piece of junk e-mail is certainly a harsh and sweeping measure. It can be argued, though, that it is the only way to bring e-mail back under control and prevent it from becoming so overwhelmed by garbage that it can no longer function as a useful communication tool. Instead of spending our time and money on solutions that allow us to separate bona fide electronic mail messages from the mountains of garbage some criminal spammer is dumping, we should immediately sever the communication pathway those messages travel to invade our Inboxes. If we immediately unplug spammers and their supporting networks from our networks at the first offense, and make the network reconnection process long and difficult, we can make clear to system and network administrators that our goal is to put spammers out of business. Once they are disconnected, if the administrators don't fix their networks and banish their spammers, it will make little difference to us. Their access to our Inboxes will have already been banned and we will never hear from them again anyway. Their customers, unable to communicate with segments of the internet, will migrate to other service providers that do run compliant servers, don't harbor spammers, and have full network connectivity.
Contact your ISP or your system administrator to ask about implementing server-side rule-based anti-spam measures.