Hi PsyAfter,
The classic MIME specifications require message headers to be US-ASCII, and only US-ASCII. To work within that restriction, RFC 2047 defines rules for encoding non-ASCII text so that it can be used within the headers of a message. If you look at the raw source of many of your international emails, you'll probably see things like this in your headers:
=?iso-8859-8-i?b?<base64 blob>?=
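To make the shape of such an encoded-word concrete (=?charset?encoding?payload?=), here is a minimal C# sketch that decodes a single one. The class and method names are made up for illustration; this is not code from ImapX or MimeKit. It assumes base64 ("b") encoding only, ignores the "q" encoding and any language suffix on the charset, and relies on the charset name being known to .NET (on .NET Core, legacy charsets such as iso-8859-8-i may additionally need the code pages encoding provider registered).

```csharp
using System;
using System.Text;

static class EncodedWordSketch
{
    // Hypothetical helper: decodes one RFC 2047 encoded-word of the form
    // =?charset?B?payload?= . Only the base64 ("B"/"b") encoding is handled.
    public static string DecodeEncodedWord(string word)
    {
        if (word.Length < 4 ||
            !word.StartsWith("=?", StringComparison.Ordinal) ||
            !word.EndsWith("?=", StringComparison.Ordinal))
            throw new FormatException("Not an encoded-word.");

        // Strip the "=?" and "?=" delimiters, then split into
        // charset, encoding and payload.
        string[] parts = word.Substring(2, word.Length - 4).Split('?');
        if (parts.Length != 3)
            throw new FormatException("Expected charset?encoding?payload.");

        if (!parts[1].Equals("B", StringComparison.OrdinalIgnoreCase))
            throw new NotSupportedException("Only base64 is handled in this sketch.");

        // The payload is the base64 form of the text's bytes in the named charset.
        byte[] bytes = Convert.FromBase64String(parts[2]);
        return Encoding.GetEncoding(parts[0]).GetString(bytes);
    }
}
```

For example, EncodedWordSketch.DecodeEncodedWord("=?utf-8?B?Y2Fmw6k=?=") returns "café". The point is that the charset travels with the text, so the receiver never has to guess.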
It appears that the message you are having issues with does not follow the specifications for this. The reason for disallowing arbitrary 8-bit text in headers is that there's no reliable way for the client (library, in this case) to figure out what the character encoding is.
A library I've written, MimeKit, deals with this kind of situation by first checking whether the 8-bit text in the headers is valid UTF-8 and, if so, converting it into a C# string using System.Text.Encoding.UTF8. If it is not valid UTF-8, then it falls back to a user-supplied charset (ParserOptions.CharsetEncoding). If the headers do not fit the user-supplied charset either, then it falls back to ISO-8859-1. Later, if the user so desires, he/she is able to locate the Header in the MimeMessage.Headers list and try to decode the header using a different System.Text.Encoding.
ImapX could probably use a similar approach if it doesn't already have a charset fallback option (I haven't looked at the code in ImapX in a while and don't recall if it already has such an option).
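The fallback order described above (UTF-8, then a user-supplied charset, then ISO-8859-1) can be sketched roughly as follows using only System.Text. This is an illustration of the idea, not MimeKit's actual implementation; the class name, method name and userCharset parameter are hypothetical.

```csharp
using System;
using System.Text;

static class HeaderCharsetFallbackSketch
{
    // Strict UTF-8: no BOM, and throw on invalid byte sequences instead of
    // silently substituting U+FFFD.
    static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Hypothetical helper: decodes the raw bytes of a header value.
    // userCharset may be null if the caller has no preferred charset.
    public static string DecodeRawHeaderValue(byte[] raw, Encoding userCharset)
    {
        // 1. Try UTF-8 first; random 8-bit text rarely validates as UTF-8.
        try { return StrictUtf8.GetString(raw); }
        catch (DecoderFallbackException) { }

        // 2. Try the user-supplied charset, made strict so that bytes it
        //    cannot map raise an exception rather than decoding to '?'.
        if (userCharset != null)
        {
            var strict = (Encoding) userCharset.Clone();
            strict.DecoderFallback = DecoderFallback.ExceptionFallback;
            try { return strict.GetString(raw); }
            catch (DecoderFallbackException) { }
        }

        // 3. Last resort: ISO-8859-1 maps every byte to a character,
        //    so this step cannot fail (though the text may be wrong).
        return Encoding.GetEncoding("iso-8859-1").GetString(raw);
    }
}
```

One caveat with any scheme like this: a short run of bytes in some other charset can occasionally happen to be valid UTF-8, and many single-byte charsets accept every byte value, so an earlier step may "succeed" with the wrong text. That is why keeping the raw bytes around for a later re-decode is useful.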
Hopefully my explanation is useful to both you and Pavel. If either of you has any questions, feel free to poke me and I'll do my best to answer them. My email address is listed on my GitHub page (I think Pavel already knows my email address as we've emailed back and forth a few times already).
-- Jeff
Note: A relatively new addition to the specifications makes it possible to send non-ASCII text in headers, but only if it is in UTF-8. As far as I'm aware, however, not many servers support this yet, so even if the headers are in UTF-8, it is unlikely (though possible) that the client constructed them validly.