2023-06-02 05:38 PM
We are seeing a large number of phishing emails with fairly predictable content in the body of the message, and I was thinking about creating a parser to handle some of the more pervasive ones. My first cut would be messages where, for example, the email is sent to user@domain.com and the body of the message starts with "Dear user@domain.com,". These are notoriously phishing emails that carry a PDF attachment posing as an invoice. I'm not a Lua parser writer, but I could certainly learn to be. Does anyone have thoughts on a first cut of a parser for this type of email, or another suggestion on how to tag these messages?
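To make the heuristic concrete, here is a minimal Python sketch for testing it offline against saved message samples. It is not a NetWitness Lua parser and does not use any decoder API; the function name `greets_recipient_by_address` is hypothetical, and it assumes the raw message text (headers plus body) is available as a string.

```python
import re
from email import message_from_string
from email.utils import getaddresses

# Matches a body that opens with "Dear <something@something>,"
GREETING = re.compile(r"\s*Dear\s+([^\s,]+@[^\s,]+)\s*,", re.IGNORECASE)


def greets_recipient_by_address(raw_message: str) -> bool:
    """Return True if the plain-text body starts with 'Dear <addr>,'
    and <addr> is one of the addresses in the To header."""
    msg = message_from_string(raw_message)

    # Collect the To addresses, lower-cased for comparison.
    recipients = {
        addr.lower()
        for _, addr in getaddresses(msg.get_all("To", []))
        if addr
    }

    # Use the first text/plain part as the body.
    body = ""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload is not None:
                charset = part.get_content_charset() or "utf-8"
                try:
                    body = payload.decode(charset, errors="replace")
                except LookupError:
                    # Unknown charset label: fall back to UTF-8.
                    body = payload.decode("utf-8", errors="replace")
            break

    match = GREETING.match(body)
    return bool(match) and match.group(1).lower() in recipients
```

Running this over a folder of known phishing and known good samples would give a rough idea of how many false positives the greeting check produces before any effort goes into a parser.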
Thanks,
/Dion
2023-10-25 04:30 PM
Dion,
Let me start by saying I'm not a Lua parser expert either. However, my support experience has generally shown that parsing message body data can be very difficult to conceptualize. When writing Lua parsers you have to know roughly where in the session stream the data you are looking for is located. Our parsers work because TCP/UDP sessions have to be built in a very specific way to comply with the RFC specs. The same holds true for most universal content types such as email: the headers are laid out in very specific places and can be parsed easily. What you are describing, by contrast, requires searching within the message body. I'm not saying this isn't possible, only that it can be very time consuming for the decoder, which can lead to packet drops on busy systems, and you really have to know what you are doing. This is probably why even the Mail_LUA parser does not parse the email body into a meta key.
I would look closely at several parsed email examples that you know are phishing and see whether any other meta currently being parsed by the mail_lua parser can be used to build a profile for a phishing attempt. As you mentioned, there are several telltale items within the body, but you may find that the headers are generally manipulated in some way as well. If you can find some commonalities, you may be able to use that meta along with Feeds and Application Rules to create a new piece of meta that represents all emails you consider suspicious. You may not get 100% certainty that a message is phishing, but it may get you close. This will then allow you to build reports, alerts and dashlets to see these messages more easily.
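As a rough illustration of looking for commonalities, the Python sketch below counts which header values recur across a set of known phishing samples. It works offline on raw message text rather than on NetWitness meta, and the function name `common_header_values`, the `min_share` threshold, and whichever header names you pass in are assumptions chosen for the example.

```python
from collections import Counter
from email import message_from_string


def common_header_values(raw_messages, header_names, min_share=0.8):
    """Return (header, value) pairs that appear in at least `min_share`
    of the given raw messages. Names and values are lower-cased."""
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        for name in header_names:
            value = msg.get(name)  # first occurrence of the header, or None
            if value is not None:
                counts[(name.lower(), value.strip().lower())] += 1

    total = len(raw_messages)
    if total == 0:
        return []
    return sorted(pair for pair, n in counts.items() if n / total >= min_share)
```

Any header value that shows up in most of the samples is a candidate for an Application Rule or Feed match, provided it does not also show up in legitimate mail.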
I know this isn't exactly what you were looking for, but I hope it gives you some ideas and directions for your use case.