Score:2

Sieve rules to match raw header values


This worked in procmail, but it seems procmail was abandoned in Sept 2001. I had a rule that would sense when utf-8 was used in the 'To:' header to write my name using emoji or non-Latin characters. When I try the same in Dovecot's Sieve implementation "Pigeonhole", I am frustrated because it seems to discard some of the data.

ref. Sieve rules in RFC5228
ref. Dovecot Pigeonhole implementation

What I tried:

require ["fileinto"];
if header :contains ["to", "from"] "=?utf-8?B?" {   fileinto "Junk"; }
elsif address :contains :all ["to", "from"] "=?utf-8?B?" {   fileinto "Junk"; }

With this example data:

From: "=?utf-8?B?TWluaSBXdQ==?=" <[email protected]>
To: "=?utf-8?B?Q1VTVA==?=" <[email protected]>
Subject: =?utf-8?B?UmU6TWljcm9jaGlwIFRleGFzIE9mZmVy?=
Date: Mon, 20 Mar 2023 16:12:50 +0900

Hello potential customer! Please stop whatever you're
doing and pay attention to me!

What I get:

sieve-test -Tlevel=matching -t - /tmp/badmail.sieve /tmp/badmail.txt

      ## Started executing script 'badmail'
   2: header test
   2:   starting `:contains' match with `i;ascii-casemap' comparator:
   2:   extracting `to' headers from message
   2:   matching value `"CUST" <[email protected]>'
   2:     with key `=?utf-8?B?' => 0
   2:   extracting `from' headers from message
   2:   matching value `"Mini Wu" <[email protected]>'
   2:     with key `=?utf-8?B?' => 0
   2:   finishing match with result: not matched
   2: jump if result is false
   2:   jumping to line 3
   3: address test
   3:   starting `:contains' match with `i;ascii-casemap' comparator:
   3:   extracting `to' headers from message
   3:   parsing address header value `"=?utf-8?B?Q1VTVA==?=" <[email protected]>'
   3:   address value `[email protected]'
   3:   extracting `all' part from address <[email protected]>
   3:   matching value `[email protected]'
   3:     with key `=?utf-8?B?' => 0
   3:   extracting `from' headers from message
   3:   parsing address header value `"=?utf-8?B?TWluaSBXdQ==?=" <[email protected]>'
   3:   address value `[email protected]'
   3:   extracting `all' part from address <[email protected]>
   3:   matching value `[email protected]'
   3:     with key `=?utf-8?B?' => 0
   3:   finishing match with result: not matched
   3: jump if result is false
   3:   jumping to line 3
      ## Finished executing script 'badmail'

Implicit keep:  store message in folder: INBOX

It records the "=?utf-8?B?..." in the trace output, so I know it knows. But the 'header' test and the 'address' test both discard that data before executing. I also tried the :comparator "i;octet" instead of the default "i;ascii-casemap" with the same results.

How can I test the raw headers instead of these interpreted values?

anx
What do you need the raw encoded form for? Why not just apply a regex on the decoded value?
Moses Moore
Ooof, I forgot how exacting people are here. Yes I could regex for [\x7f-\xff] but that doesn't get me what I previously enjoyed with postfix. postfix could tell the difference between "=?utf-8?B?TW9zZXM=" and "Moses", but as far as I can tell dovecot's sieve implementation cannot. Filtering on this substring was a useful tool for fighting spam and I hoped I wouldn't have to do without it.
Score:1
anx

So.. you are not actually looking to distinguish on "emoji or non-Latin characters", but instead the specifics of how‡ characters are transmitted on the wire?

I cannot think of a way to make Sieve go back to the raw bytes. You could work around this by doing the matching in the mail server instead, e.g. by using the Postfix header_checks feature (which is RFC 2047-ignorant) to prepend a custom header:

# header_checks = pcre:/etc/postfix/maps/remember_header_encoding
#  pcre is case insensitive by default
/^To:.*=\?utf-8\?B\?/   PREPEND X-Preserve-For-Sieve: RFC2047 marker in header To:

And then check for the existence of such a marker header in Sieve.
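A minimal sketch of the Sieve side, assuming the header_checks rule above has prepended an X-Preserve-For-Sieve header before the message reaches Dovecot. The "Junk" folder is taken from the question; adjust both names to your setup:

```sieve
require ["fileinto"];

# Postfix saw the raw "=?utf-8?B?" marker in To: and prepended this header,
# so Sieve only needs to test that the header is present.
if exists "X-Preserve-For-Sieve" {
    fileinto "Junk";
}
```

Note that a sender could supply a header with this name themselves, so this is only as trustworthy as the marker name is obscure, unless incoming copies of the header are stripped first.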


Even if it works today, I doubt this will remain a reliable sorting criterion for the foreseeable future. A relaying SMTP server, up to and including the one handing the message to Sieve, might add encoding where there previously was none as part of message transformations. Some mail clients add encoding where none is needed; others fail to add it even though they should. Detecting a difference where none was intended is probably not going to consistently single out the same sorts of messages.


‡ a choice other than superfluous encoding is rare with regular mail - Dovecot does not yet guarantee 8-bit-clean transports such as SMTPUTF8
