Laura Atkins: Marketers Can't Learn From Spam
“Why is my mail being blocked if I still get spam?”
It’s an almost inevitable question when handling delivery issues, and I understand why I get it so often. People look in their inbox and see mail that is clearly spam. Then they look at the mail they send, mail they know isn’t spam, and it ends up in the bulk folder. It’s logical to ask why legitimate marketers have to follow all these complicated and arbitrary rules to reach the inbox when spammers reach the inbox without following any of them.
But delivery experts know about delivery, right?
Even for delivery experts, there is no real answer to why one particular piece of mail is delivered to one place or another. There are lots of reasons for the question being unanswerable. A big reason is that deliverability is about scale. We’re discussing probabilities across a mail stream, not about a particular email. We simply cannot answer that question with a single email because we don’t have enough data. When we can look at the broader campaign we can see enough data to determine why mail was delivered where it was delivered.
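To make the point about scale concrete, here is a minimal sketch of how little a single email tells us compared with a whole campaign. It uses a standard Wilson score interval to estimate an inbox placement rate. The function name wilson_interval and the counts (1 email, and 9,000 inboxed out of 10,000 sent) are made up for illustration and are not real delivery data.

```python
import math


def wilson_interval(inboxed, sent, z=1.96):
    """Approximate 95% Wilson score interval for an inbox placement rate.

    Hypothetical helper for illustration: `inboxed` is how many messages
    reached the inbox, `sent` is how many were delivered in total.
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    p = inboxed / sent
    denom = 1 + z**2 / sent
    center = (p + z**2 / (2 * sent)) / denom
    half = z * math.sqrt(p * (1 - p) / sent + z**2 / (4 * sent**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# One email that happened to reach the inbox (invented example).
single = wilson_interval(1, 1)

# A campaign where 9,000 of 10,000 messages reached the inbox (invented example).
campaign = wilson_interval(9_000, 10_000)

print("single email:", single)
print("campaign:    ", campaign)
```

With one email the plausible range for the true inbox rate covers most of the scale, so it supports almost no conclusion. With the campaign the range is narrow enough to reason about.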
Another reason this question is so hard to answer is the complexity of spam filters. Generally we think of spam filters as a single set of rules that every email goes through. But that’s not the real model of filtering. There’s not a single spam filter. There are a lot of different filters that work together to make decisions about delivery of an email: outbound filtering at the ISP/ESP, SMTP level filtering, content filtering, mail client filtering and personal filters all act on email. Any one of these filters could be responsible for a mail being undeliverable or going to the bulk folder.
Deliverability folks tend to only talk about two of these filters: SMTP level filtering and content filtering. Outbound filters are typically used to protect the sending network and are non-negotiable. Mail client filters and personal filters are different for every user, and without specific feedback from the user we can’t speak to these filters.
We are most comfortable discussing SMTP level filtering and content filtering, and many recommendations from deliverability experts come from a deep understanding of how those filters work. For ease of discussion we can group these two types of filtering mechanisms into “incoming filters.”
OK, how do incoming filters work?
Incoming filters ask questions about an email. Does this mail contain a virus? Does it come from an IP with a history of delivering mail to us? What is that history? Does it contain a dangerous link? Does it contain an executable file? How many users is this going to? Is it authenticated? Does it look like mail we know is bad? Is it going to questionable email addresses?
Answering each question involves its own set of rules. Those rules are generated in several different ways.
Some rules are manual. One example of a manual rule is “if this email fails DMARC authentication and the sender publishes a p=reject, do not accept it.” Another example is “if the sending IP is on the SBL do not accept the mail.”
Some rules are manual but programmatic and conditional. If this email meets all these conditions, send a temporary failure message and wait for a retry. If this email meets these other conditions, put it in the bulk folder for users who have not directly whitelisted the sender.
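The manual and conditional rules described above can be sketched as a short chain of checks. This is only an illustration, not how any particular ISP implements filtering. The Message fields, the Verdict names, and the specific conditions that trigger a temporary failure or bulk foldering (an IP with no history, a "looks suspicious" flag) are assumptions invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    REJECT = "reject"
    TEMPFAIL = "tempfail"
    BULK = "bulk"
    INBOX = "inbox"


@dataclass
class Message:
    sending_ip: str
    sender: str
    dmarc_pass: bool
    dmarc_policy: str        # policy the sender publishes: "none", "quarantine" or "reject"
    ip_has_history: bool     # hypothetical signal: has this IP delivered mail to us before?
    looks_suspicious: bool   # hypothetical signal from other content checks


def apply_rules(msg, sbl_listed_ips, user_whitelist):
    # Manual rule: DMARC fails and the sender publishes p=reject, so do not accept.
    if not msg.dmarc_pass and msg.dmarc_policy == "reject":
        return Verdict.REJECT
    # Manual rule: the sending IP is on the SBL, so do not accept.
    if msg.sending_ip in sbl_listed_ips:
        return Verdict.REJECT
    # Conditional rule (condition is an assumption): temp fail and wait for a retry.
    if not msg.ip_has_history:
        return Verdict.TEMPFAIL
    # Conditional rule: bulk folder unless this user has whitelisted the sender.
    if msg.looks_suspicious and msg.sender not in user_whitelist:
        return Verdict.BULK
    return Verdict.INBOX


# Example run with documentation-range IPs and made-up addresses.
sbl = {"203.0.113.7"}
whitelist = {"news@example.com"}
msg = Message("198.51.100.1", "promo@example.net", True, "none", True, True)
print(apply_rules(msg, sbl, whitelist))
```

The order of the checks matters in this sketch: the hard rejections run first, so a message that fails them never reaches the softer conditional rules.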
Some rules are generated through a machine learning process. While they make up a small fraction of spam filters, they are often the most difficult to understand.
Machine learning is complicated, is done in many different ways and cannot really be discussed in general terms, but a simplified picture helps. Imagine two different folders of emails. One folder is spam, one folder is not spam. We show a program these two folders and tell it to figure out the distinguishing characteristics. Then we hand it a folder of email it has never seen before. We tell it to categorize each of those mails based on the characteristics it discovered in the known groups of mail. We check the machine’s work and tell it what it got wrong and what it got right. Then we tell it to go back and think about it some more and work out new rules. This process goes on and on with new stacks of mail and new sorting rules and new corrections. Eventually we are confident enough that the machine can make decisions about mail, so we turn it loose on real data.
Even when the machine learning is run against real data there is feedback as to how right or wrong the filters got things. For email filters, this feedback comes directly from the recipients. Specific actions the user takes tell the machine what it got right and what it got wrong and the feedback is incorporated into the rules.
The important thing to remember is the machine doesn’t have any real understanding of what spam is or is not. It has a folder of mail it’s been told is spam and a folder it’s been told is not. All the machine is doing is looking for combinations of distinctive things about an email.
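The two-folder picture can be sketched with a toy word-counting classifier. This is a deliberately tiny naive Bayes model, chosen here only because it is easy to read. Real filters use far more signals and different techniques, and the sample messages and function names below are invented for the example.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(spam_folder, ham_folder):
    """Count words in each folder. Assumes both folders are non-empty."""
    spam_counts = Counter(t for m in spam_folder for t in tokenize(m))
    ham_counts = Counter(t for m in ham_folder for t in tokenize(m))
    return {
        "spam": spam_counts,
        "ham": ham_counts,
        "vocab": set(spam_counts) | set(ham_counts),
        "n_spam": len(spam_folder),
        "n_ham": len(ham_folder),
    }


def spam_log_odds(model, message):
    """Positive means the words look more like the spam folder."""
    score = math.log(model["n_spam"] / model["n_ham"])
    v = len(model["vocab"])
    spam_total = sum(model["spam"].values())
    ham_total = sum(model["ham"].values())
    for tok in tokenize(message):
        if tok not in model["vocab"]:
            continue  # the model knows nothing about words it has never seen
        # Add-one smoothing so an unseen word/folder pair is not a zero probability.
        p_spam = (model["spam"][tok] + 1) / (spam_total + v)
        p_ham = (model["ham"][tok] + 1) / (ham_total + v)
        score += math.log(p_spam / p_ham)
    return score


def classify(model, message):
    return "spam" if spam_log_odds(model, message) > 0 else "not spam"


# Invented training folders.
spam_folder = ["win a free prize now", "free pills cheap pills", "claim your free prize"]
ham_folder = ["your order has shipped", "meeting notes for monday", "your invoice for march"]

model = train(spam_folder, ham_folder)
print(classify(model, "free prize inside"))
print(classify(model, "your order notes"))
```

Nothing in the model knows what spam means. It only knows which words showed up more often in which folder. Feedback from recipients would amount to moving a misfiled message into the correct folder and training again.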
What can we do?
It’s always frustrating to get spam in your inbox. It’s even more frustrating when it seems like spammers are getting a free ride while legitimate mail is going to the bulk folder. But discovering why a piece of spam made it into the inbox offers no actionable insight to the email marketer. Many times the answer is that the spammer is actively probing filters to see what gets through and then sending as much mail as possible until the filters catch up. This isn’t sustainable long term and wastes a lot of cycles that could be better used improving address collection, user engagement or any of the other things we know get emails into the inbox.