Spring Cleaning


"Serendipity will get you through times of no Internet better than Internet will get you through times of no Serendipity" --Perfesser Pedagogue

Heeding those words of wisdom: while Serendipity35's code sorcerers update the magic spells that guide us through the cyberspace cloud (and may also knock us offline intermittently over the next 24 hours), we direct you to the current list of spammers and general e-mail bad guys for your reading pleasure.

If your sock drawer is already arranged, if you've already washed and waxed the cat, and if you know that even Spring Training baseball on TV is still a week away: there is no better time to update your spam filters and keep your e-mail Spring Cleaning right on track.

You might notice hotmail.com (still) and yahoo.com (newly) on the banned e-mail spammers list. Hotmail.com has failed for years to keep spammers from using its hostname; yahoo.com has had recent problems with spammers flooding the Internet with real and forged yahoo.com e-mail relays. It is likely that yahoo.com will solve its current spam troubles far sooner than hotmail.com ever will.

And it is probable that the code warriors at Serendipity35 will have the blog back up and running by the time any of us get this whole spam-blocking thing figured out. It just might take a little (more) serendipity.

None Shall Spam

Hot stock tips, male enhancement patches and pills, replica high-end wristwatches, diet potions, pirated software (but at a Discount!), forged messages from financial institutions I've never even heard of, offers from the nephew of the president of Nigeria to share his multi-million dollar bank account, and on and on and on: I've Had Enough!

The first thing I do every morning when I get into my office is check my e-mail --well, that used to be what I did. Now, the first thing I do is spend 10 to 15 minutes deleting the junk e-mail that some anonymous miscreant has shoved into my Inbox. I have to filter out a hundred or so unwanted, unsolicited, forged and fraudulent messages before I can begin to read the real e-mail that I depend on for my job.

And my modified morning routine exists despite the installation of a (slightly) functional commercial content-based anti-spam program on NJIT's e-mail gateways.

Content-based junk e-mail filtering programs scan the content of each piece of e-mail that passes through them. Using a scoring system, they assign point values to features found in the header, subject and body of a message. Typical things that score points are HTML in the e-mail, non-standard fonts, large font sizes, banned words or phrases, and fictitious or empty To: addresses. If the points assigned to an e-mail exceed a certain threshold (usually 5.0), the e-mail is placed in a quarantine and the intended recipient is notified, daily, that there are quarantined e-mails that can be accepted for delivery or deleted. On top of the hundred or so junk e-mails I get, there are usually 20-30 e-mail messages caught in the quarantine for me each day. The university anti-spam system is catching about 1 in 4 pieces of spam and, given the huge volume of junk e-mail, that is a junk e-mail solution of very little value.
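
As a rough illustration of how that scoring works (and emphatically not the commercial product on NJIT's gateways), a content-based filter boils down to a tally like the sketch below; the patterns, point values and threshold are made-up assumptions:

    #!/bin/sh
    # A made-up content-scoring sketch.  The patterns, point values and the
    # threshold are illustrative assumptions, not any real product's rules.
    msg="$1"                                  # file holding one raw e-mail message
    score=0
    grep -qi  '<html'                "$msg" && score=$((score + 2))  # HTML in the body
    grep -Eqi 'font size="[5-9]"'    "$msg" && score=$((score + 1))  # oversized fonts
    grep -Eqi 'viagra|rolex|act now' "$msg" && score=$((score + 3))  # banned phrases
    grep -Eqi '^To:[[:space:]]*$'    "$msg" && score=$((score + 1))  # empty To: address
    # quarantine once the tally crosses the threshold (the "5.0" described above)
    if [ "$score" -ge 5 ]; then
        echo "quarantine"
    else
        echo "deliver"
    fi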

The fundamental problem with filtering e-mail based on content is the dynamic nature of e-mail content. Like snowflakes, no two e-mail messages are ever exactly the same. The header, subject and body of a message can all be altered (or forged) by a skillful spammer who knows the types of content that content-based anti-spam programs check. Since e-mail messages can be changed a lot faster than e-mail filtering programs can be updated, the spammers are always several steps ahead. Because content-based e-mail filtering is so unsuccessful, effective e-mail filtering has to be based on the parts of the e-mail process that are static and never change, regardless of the type or content of the messages being sent. The network rules that govern the transport and delivery of electronic messages define and enforce those unchanging parts of the e-mail process. Those rules are described in the RFCs, and they can be used to produce powerful anti-spam tactics because the spammers can't play by all the rules.

A few days ago, I installed two rules-based e-mail filtering programs on two different servers. One of those servers, dl1.njit.edu, handles about 75,000 e-mails per day; the other, www.smsd.tv, handles about 750 e-mails per day. www.smsd.tv uses a milter (a mail filter) program called milter-greylist; dl1.njit.edu uses a milter called spamilter.

Milter-greylist takes a very basic approach to e-mail filtering. It supports lists of senders that can be predefined to whitelist (allow), blacklist (reject) or greylist (delay) incoming e-mail. Blacklist entries can be added manually or derived automatically from a number of Internet-based RBLs (realtime blackhole lists), dynamic lists of broken, compromised or misconfigured e-mail servers through which spammers often send their junk. Greylisting delays delivery by sending this special error code and message back to the originating server: reject=451 4.7.1 Greylisting in action, please come back later.
A properly configured e-mail server will try to resend a message for up to 72 hours, until the e-mail is delivered or it receives a permanent delivery failure. Most mass spam is sent through hacked and/or misconfigured e-mail servers that don't honor the 451 delay code (it is part of the RFCs regulating e-mail traffic) and never resend the message. E-mail that is never resent is never delivered by the local e-mail server, and the message never darkens anyone's Inbox again.
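
The logic behind that 451 response is simple enough to sketch in a few lines of shell. This is only a toy model of what milter-greylist does internally; the flat tuple file and the fixed reply strings are assumptions made for illustration, not its real configuration:

    #!/bin/sh
    # Toy greylisting decision: remember the (client IP, sender, recipient)
    # tuple and temp-fail the first delivery attempt.  milter-greylist keeps
    # real state with expiry timers; this flat file is just for illustration.
    db=/var/tmp/greylist.tuples
    tuple="$1|$2|$3"            # client IP, envelope sender, envelope recipient
    touch "$db"
    if grep -qxF "$tuple" "$db"; then
        echo "250 tuple seen before -- accept the message for delivery"
    else
        echo "$tuple" >> "$db"
        echo "451 4.7.1 Greylisting in action, please come back later"
    fi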

Spamilter is a more complex rules-based e-mail filtering program. It checks the domain name of the sending server against a blacklist to see if it has been reported as compromised (broken). It verifies the e-mail sender by connecting back to the originating mail server to see if the sender's address is a valid one. It can verify whether the hostname supplied in the transport of the e-mail (the HELO) can be independently resolved to a valid host name; it can inject the IP address of a rejected host into the server's firewall and block any connection from that host (not just e-mail) for 48 hours; it can verify the IP address the sending server uses to connect; and it can even check e-mail attachments for dangerous filename extensions. If any of the verification functions fail, the e-mail is rejected or tagged by prepending "Valid Sender?" to the subject. A desktop e-mail program can filter the tagged messages out of the Inbox and place them in a local quarantine folder. Those trapped messages can be used to document and report spamming attempts to the spammer's Internet service provider.
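
Any mail client rule that matches the prepended tag will do that sorting. As a hedged illustration, the sketch below does the same thing server-side for a Maildir-style mailbox; the folder paths are assumptions, and a desktop program would use its own built-in filter rules instead:

    #!/bin/sh
    # Move any message spamilter tagged with "Valid Sender?" out of a Maildir
    # inbox and into a local quarantine folder.  Paths are placeholders.
    inbox=$HOME/Maildir/new
    quarantine=$HOME/Maildir/.Quarantine/new
    mkdir -p "$quarantine"
    for f in "$inbox"/*; do
        [ -f "$f" ] || continue
        if grep -qi '^Subject: Valid Sender?' "$f"; then
            mv "$f" "$quarantine/"
        fi
    done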

On www.smsd.tv, in one 24-hour period, 704 Internet e-mails were logged. Of those, 353 were delivered. Unknown addresses were greylisted and delayed 56 times, and of those 56, 19 were delivered. The remaining 295 messages were either whitelisted on arrival and delivered, or were sent to e-mail addresses that didn't exist on www.smsd.tv. Milter-greylist blocked about two-thirds of the messages it saw as potential spam and delivered the other third. Of the 353 messages that were delivered, only one was actually spam. That e-mail was an advertisement from a prior contact, and its only qualification as spam was that it was a bulk e-mailing that used a customer list. That e-mailing contained no "opt out" or unsubscribe information or links, as required by the CAN-SPAM Act.

On dl1.njit.edu, in one 24-hour period, 90,000 inbound e-mail messages were rejected by Spamilter. Only 142 inbound e-mail messages were delivered.
dl1.njit.edu supports many mailing lists and group e-mail accounts that have widely published addresses. Spammers have automated programs that continuously look for published e-mail addresses on the World Wide Web and in Usenet newsgroups. Those addresses are "harvested" and inserted into bulk mailing lists that are blasted non-stop with junk e-mail. The dl1.njit.edu test was done during Winter Break, when there was almost no one on campus or in staff or faculty offices. NJIT was closed, and that accounts for the tiny number of e-mail messages actually delivered compared to the outrageous number of attempted-and-rejected messages sent to accounts on the server. Seventeen of the rejected messages were blocked by a lookup in a blackhole list of known compromised servers; the rest were blocked because of fictitious sender addresses.

www.smsd.tv maintains a list of e-mail addresses, networks and domains that have sent any spam to an smsd.tv address during the past 3 years. Those addresses are rejected by the e-mail server before they ever pass through milter-greylist. Blocked addresses include all of hotmail.com and many networks and domains in Eastern Europe and Asia. When a piece of spam is delivered to an smsd.tv address, the originating IP address of the e-mail is immediately blocked, along with the network that allowed the transport of the message. An e-mail is sent to the responsible party or parties listed in the Whois registry informing them of their ban from www.smsd.tv's network, the reason(s) for the ban, and a source copy of the entire offending e-mail embedded in the body of the message. If the responsible parties decide to contact the administrator of www.smsd.tv, they must use the postal address listed in the Whois registry.
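
On FreeBSD, that block-and-notify routine looks roughly like the sketch below; the ipfw table number, the contact address and the file paths are assumptions standing in for www.smsd.tv's actual scripts, and the real contact comes from the Whois output:

    #!/bin/sh
    # Ban the originating host on first offense and notify the responsible party.
    # Assumes the firewall ruleset already denies traffic from table 1; the
    # abuse address is a placeholder -- the real one comes from the Whois lookup.
    ip="$1"                    # originating IP address of the offending e-mail
    spam="$2"                  # file containing the full source of that e-mail
    ipfw table 1 add "$ip"                        # block every connection, not just mail
    whois "$ip" > "/var/tmp/whois.$ip"            # record the registered contacts
    mail -s "Banned from www.smsd.tv for spamming" abuse@example.net < "$spam"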

The ban of an entire network or domain for one piece of junk e-mail is, certainly, a harsh and sweeping measure. It can be argued, though, that it is the only way to bring e-mail back under control and prevent it from becoming so overwhelmed by garbage that it can no longer function as a useful communication tool. Instead of spending our time and money on solutions that allow us to separate bona fide electronic mail messages from the mountains of garbage some criminal spammer is dumping, we should immediately sever the communication pathway those messages travel to invade our Inboxes. If we immediately unplug spammers and their supporting networks from our networks at the first offense, and make the network reconnection process long and difficult, we make clear to system and network administrators that our goal is to put them out of business. Once disconnected, if the administrators don't fix their networks and banish their spammers, it will make little difference to us; their access to our Inboxes will already have been banned and we will never hear from them again anyway. Their customers, unable to communicate with segments of the Internet, will migrate to other service providers that do run compliant servers, don't harbor spammers, and have full network connectivity.

Contact your ISP or your system administrator to ask about implementing server-side rule-based anti-spam measures.

The Upslope to Upgrade

When I first learned about Moodle I was excited about an LMS that was written by educators, for educators. When I found out that it was already ported to FreeBSD (my server operating system of choice), I immediately installed it from the FreeBSD Ports collection. For reasons I've described in another post, I configured it to use PostgreSQL as a database backend, and I was off and running, moving my Open Source Unix Certification curriculum from my own web-based delivery system to Moodle in just a few hours of code sorcery.

The Moodle folks are diligent about development, patches and bugfixes, and before I finished the first section of my course, Moodle 1.6 was released. The main feature from Moodle 1.6 I wanted to use (I had been using Moodle 1.5.x) was "Student View": the ability for me, as the teacher, to see course assignments exactly as the students would see them, without having to create a phantom student account and log in with it just to view the courses and assignments from the student's perspective.

Of course, when one is upgrading (or just changing) software in a production environment, the first thing anyone should do is back up everything that might get touched in the upgrade process, in case the unexpected happens and the upgrade breaks the software. A major change between the 1.5 and 1.6 versions of Moodle was the requirement that the database backend support Unicode (UTF-8) characters, so that any selected language can be generated from a standard set of characters without having to store every variant alphabet the world's languages contain. I backed up my PostgreSQL Moodle database, backed up my Moodle 1.5 directory and began the upgrade process.
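
On FreeBSD with PostgreSQL, that backup step comes down to a couple of commands like these; the database name, dump location and Moodle directory are placeholders rather than my actual paths:

    # dump the Moodle database and archive the Moodle web directory before upgrading
    pg_dump -U pgsql moodle > /backups/moodle-db-$(date +%Y%m%d).sql
    tar -czf /backups/moodle-www-$(date +%Y%m%d).tgz /usr/local/www/moodle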

The Moodle documentation included a script and instructions to import the old non-Unicode PostgreSQL database, and I began to climb the upslope to an upgrade.

The shell commands for migrating the database from the command line were (I think) Linux-specific and didn't work at all in the FreeBSD environment. While Linux is an open source player and buzzword, it is curious that the BSD platforms are so often ignored in many (not just Moodle) program configurations; FreeBSD is among the most popular (and often the most reliable) operating systems in the world. After an entire day of hand-editing SQL files (not fun even for the enthusiastic), I had a database that worked correctly with Moodle 1.6.1.
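
The manual route that finally worked amounts to dumping the old database, creating a fresh UTF-8 encoded one, and reloading the hand-edited dump; a bare-bones sketch, with placeholder database and user names:

    # dump the old (non-Unicode) Moodle database
    pg_dump -U pgsql moodle > moodle-latin1.sql
    # ... the day of hand editing happens here, fixing the SQL for UTF-8 ...
    # create a new UTF-8 encoded database and reload the edited dump
    createdb -U pgsql -E UTF8 moodle_utf8
    psql -U pgsql -d moodle_utf8 -f moodle-latin1.sql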

In late summer 2006, there were a couple of Moodle security advisories that encouraged administrators to upgrade from Moodle 1.6.x to 1.6.2, then 1.6.3. I downloaded the source archive for the Moodle 1.6.2 upgrade, dutifully backed up everything in sight and ran the upgrade process from within Moodle. The upgrade script ran fine until it had to alter my database tables that contained "Hot Potatoes" quiz information. The script just stopped at that segment, never erroring out and never continuing. The tables it did alter made the database unusable for the older Moodle 1.6.1 version and left the installation completely broken. A few hours (and buckets of sweat) later, I managed to roll everything back to Moodle 1.6.1, reinstall the older database and bring my older (but working) Moodle installation back online.

I swore off (and at) any more upgrades until I had both a break in my classes and some demonic possession that forced me back into the database abyss. But, on November 7th, 2006, Moodle released version 1.7, and on November 29th, FreeBSD published a compatible version in the FreeBSD Ports collection. I backed up (again) anything that wasn't nailed down and let FreeBSD's automated (and very reliable) portupgrade process transport me to Moodle 1.7.
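
For anyone following along on FreeBSD, that step reduces to refreshing the ports tree and letting portupgrade do the rest; the port name here is recalled from memory and worth checking against your own ports tree:

    # refresh the ports tree, then upgrade Moodle and everything it depends on
    portsnap fetch update
    portupgrade -R moodle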

Portupgrade reported no errors and it looked like Moodle 1.7 was mine to keep, but after I logged in as the Moodle administrator, every choice I clicked on returned an "incompatible configuration" error in an ominous red box, along with a suggestion (and a link) to check Moodle's online documentation. The page the link led to covered upgrading to Moodle 1.7 and wasn't too helpful. Staring at a completely broken Moodle 1.7, I deleted everything that had been installed or upgraded (except my backups) and did a fresh new install of Moodle 1.7.

Success!

Moodle 1.7 handled the upgrade of my 1.6.1 database without error, and I got my courses back with all the content they had contained pre-upgrade. My moment of joy was temporarily squashed when I checked my students' grades for the past courses and found there were no grades to report, but some hacking around inside Moodle's configuration modules led me to the new "roles" functions, where I had to re-assign my enrolled students their roles as students. Had I read the version 1.7 administrator's documentation and new-features notes ahead of time, I would have been saved that extra angst.

Moodle 1.7 looks terrific. Now that I've recovered from scrambling up the slippery grade, I'm enthusiastic again. But I am afraid I might be in a minority position. I know a lot of system administrators who don't host or teach their own courses, and I'm not at all certain I'd have gone through this upgrade process if I didn't have my own course materials living inside (and delivered by) Moodle. I haven't looked in much detail at how much faculty re-training might be required to adequately support Moodle 1.7's new features, but it looks like the people who train the trainers have a lot of independent work to do before they can offer support for this release.

I hope that running the Moodle Marathon to version 2.0 will be a bit more downhill.

Thinking Inside the Box

The Thanksgiving holiday and extended weekend have given me some time to spend on projects that have been tasking me. Primary among them has been figuring out how to make our dear Serendipity35 run a bit faster.

Back in the summer, I discovered that the bottleneck through which Serendipity35 tried to squeeze its content was our backend database server, MySQL. In July the load on the database server was so heavy that I had to move it to a faster, less busy machine to prevent endless waits for pages to be delivered and to prevent random corruption of our main display pages. While this off-loading of the database processes stopped the random file corruption we were experiencing, it only sped up the delivery of our pages a little.
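
The move itself was the usual dump, copy and reload dance; a sketch of it, with host names, database name and credentials as placeholders (the blog's own configuration then gets pointed at the new database host):

    # dump the blog database on the old, overloaded host
    mysqldump -u s9y -p serendipity > serendipity.sql
    # copy the dump to the faster machine and load it there
    # (assumes an empty "serendipity" database already exists on the new host)
    scp serendipity.sql dbhost.example.edu:/tmp/serendipity.sql
    ssh -t dbhost.example.edu "mysql -u s9y -p serendipity < /tmp/serendipity.sql"
    # finally, point the blog's database-host setting at the new server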


For about 5 years, when developing a new project, the database backend of choice for me has been PostgreSQL. At first I didn't have any compelling reason to use PostgreSQL over MySQL other than its simpler method of assigning database user and host permissions, but as time moved on, I discovered some things about the structure of PostgreSQL that made me feel more comfortable designing web sites and applications with it.


I'm not advocating PostgreSQL over MySQL in all database-driven applications, but here, in an egg basket, is why PostgreSQL can outperform MySQL in this blogging environment.


If we were sitting at the breakfast table on our Cyber Farm and we wanted to order an omelet, and MySQL was our server du jour, every time the cook wanted an egg MySQL would go and gather every egg on the farm into its basket and bring it to the cook, even if just one egg was requested. On a fairly large farm, the load of carrying all the eggs around all the time would considerably tire out (and slow down) our server.


If PostgreSQL were our server and we ordered a 3-egg omelet, it would have to make 3 trips from the chicken coop to the cook to deliver its 3 eggs, because it has a much smaller egg basket than MySQL. Having a small basket, though, and not having to gather every egg on the farm each time even one egg was requested, PostgreSQL would make much faster trips and not need as many supporting resources. Our virtual breakfast could be enjoyed without waiting for the server to catch its breath and bring us our finished omelet.


Daily repasts aside, I decided I wanted to switch our backend from MySQL to PostgreSQL and bask in the glow of its promised speed increase. But that database conversion process was slow, and after many starts, stops, and hand edits, I couldn't get the data stored in the MySQL database to play nice with PostgreSQL. I needed a fresh start and a different approach.


I built a new installation of Serendipity on my server at home and configured it to use PostgreSQL as its backend. After installing Serendipity35's users and categories by hand, I set up an RSS feed to capture all of the current content and place it on my new "at home" server. You can see for yourself the increase in speed and page delivery. Not everything was a smooth transition, though: articles had to be edited by hand to preserve the original author IDs, and phpPgAdmin saved my virtual bacon when it came to making those edits.
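
Those by-hand edits amount to pointing each imported entry back at the right author row. Something like the psql one-liner below is what phpPgAdmin was doing for me through its web forms; the table and column names are recalled from the Serendipity schema and should be verified first, and the author ID and name are placeholders:

    # re-attach imported entries to their original author (verify the table and
    # column names against your own schema before running anything like this)
    psql -U pgsql -d serendipity -c \
        "UPDATE serendipity_entries SET authorid = 2 WHERE author = 'some_author';"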


As of now, the imported web log looks pretty good when viewed with Firefox, Mozilla, or Konqueror, but it loses its sidebar in Internet Explorer. The improper display in Internet Explorer may be a function of the Serendipity blogging software version 1.0.3a I installed; Serendipity35 runs on version 1.0.2.


My next project for this rapidly evaporating weekend will be creating an SQL dump from my home server's database that I can import into a PostgreSQL server supporting Serendipity version 1.0.2 at NJIT, and soon after that, making the switch to PostgreSQL on this blog to improve its performance.
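
That transfer should be no more exotic than one more dump and reload; a sketch, with the host, database and user names as placeholders:

    # dump the blog database on the home server
    pg_dump -U pgsql serendipity > serendipity-home.sql
    # copy it to the NJIT server and load it into a fresh database there
    scp serendipity-home.sql newhost.example.edu:/tmp/serendipity-home.sql
    ssh newhost.example.edu "createdb -U pgsql serendipity && \
        psql -U pgsql -d serendipity -f /tmp/serendipity-home.sql"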


One project I'm NOT tackling this weekend is cleaning up my desk.