Score:1

SpamAssassin unable to read Japanese when it is HTML-encoded


I would like to block emails that contain certain Japanese words, but SpamAssassin fails to detect those words when the text is encoded as HTML numeric character references, for example:

This is a multi-part message in MIME format.
--------------050206070005060005050706
Content-Type: text/plain; charset=ISO-2022-JP; format=flowed
Content-Transfer-Encoding: quoted-printable

こんにちは!残念な&=
#12364;ら凶報がございま&#=
12377;。数ヶ月前、あな...

--------------050206070005060005050706
Content-Type: text/html; charset="ISO-2022-JP"
Content-Transfer-Encoding: quoted-printable

<html>
  <head>

    <meta http-equiv=3D"content-type" content=3D"text/html; =
charset=3DISO-2022-JP">
  </head>
  <body bgcolor=3D"#FFFFFF" text=3D"#000000">
    &#12371;&#12435;&#12395;&#12385;&#12399;&#65281;</br>
</br>
&#27531;&#24565;&#12394;&#12364;&#12425;&#20982;&#22577;&#12364;&#12372;&=
#12374;&#12356;&#12414;&#12377;&#12290;</br>
...
  </body>
</html>
--------------050206070005060005050706--

Example SpamAssassin rule:

body     JAP_BAD_1  /残念ながら凶報がございます/
score    JAP_BAD_1  5.0

However, when I run the test:

spamassassin -D textcat -t spam.test

The rule does not show up as a match. What do I have to do?
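To illustrate what the rule is up against, here is a minimal Python sketch (not SpamAssassin's own decoding path, just an illustration using the standard library) that takes a fragment of the HTML part above, undoes the quoted-printable soft line break, and then resolves the numeric character references. The variable names are made up for the example. It suggests that the phrase only exists as literal characters after both steps have been applied.

```python
import html
import quopri

# Fragment of the quoted-printable HTML part from the sample message.
# The trailing "=" before the newline is a quoted-printable soft line break.
raw_part = (
    b"&#27531;&#24565;&#12394;&#12364;&#12425;&#20982;&#22577;&#12364;&#12372;&=\n"
    b"#12374;&#12356;&#12414;&#12377;&#12290;"
)

# Step 1: undo quoted-printable, which joins the two lines back together.
unfolded = quopri.decodestring(raw_part).decode("ascii")

# Step 2: replace numeric character references with the characters they name.
decoded = html.unescape(unfolded)

# The phrase used in the JAP_BAD_1 rule.
phrase = "残念ながら凶報がございます"

print(phrase in unfolded)  # entities still present, so no literal match
print(phrase in decoded)   # literal Japanese text after unescaping
```

In other words, a pattern written in literal Japanese can only match text that has already been through entity decoding; against the entity-encoded form it has nothing to match.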

anx
Any reason not to simply reject *all* mail with numeric HTML entities in supposedly `text/plain` parts?
lepe
@anx I'm concerned that doing that might reject authentic messages.
