Score:2

How do I (automatically) re-attempt to deliver mail from a different Postfix server?


My network spans two locations, each with its own Postfix server for outgoing mail. Mail is submitted to either one via DNS round-robin: smtp.mydomain.tld resolves to two A records. Users submit their mail there and these servers send it out. All good so far in the happy flow.

Now, sometimes the next-hop mail server for a message is unreachable from one of the two networks. Say, for example, that (VPS) host smtp1 is hosted in the EU with cloud provider A and smtp2 is hosted in the US with cloud provider B. Mail for a recipient on the example.com domain can sometimes only be delivered through smtp2, because of networking issues or ASN-level blocks that the receiving party applies to smtp1. This could be a TCP connection timeout, a temporary SMTP error with a message mentioning a ban for our network/location, etc. I've seen many different types of failures here.

[Diagram: delivery failing from one of the two locations]

In practice I have seen mail hosts blocking or throttling based on what appears to be geolocation, e.g. an American agency's mail server only allowing connections from the US.

What I have tried so far is 'transplanting' these mails manually out of the /var/spool/postfix directory to the other host. This works, but automating it seems ridiculously complicated.

I have also tried setting up transport overrides for known-bad recipient domains. Mail for those domains is then forwarded to the known-good outgoing SMTP server instead of being attempted directly from the local host. However, maintaining the list on individual hosts is cumbersome, and it depends on users reporting problems first and on confirming which path is known-good.
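For reference, a per-domain override of this kind could look roughly like the sketch below. The host name smtp2.mydomain.tld and the file path are assumptions for illustration, and the `hash:` map type is assumed to be available in the local Postfix build.

```
# /etc/postfix/transport on smtp1 (illustrative entry)
# Send all mail for example.com via the peer instead of directly.
# The [brackets] turn off MX lookups for the relay host.
example.com    smtp:[smtp2.mydomain.tld]:25

# /etc/postfix/main.cf on smtp1
transport_maps = hash:/etc/postfix/transport
```

After editing the transport file, the lookup table has to be rebuilt with `postmap /etc/postfix/transport` and Postfix reloaded. The peer must also be configured to accept relayed mail from this host.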

What I would like to accomplish is to automate a 'transplant' of these problematic queued messages to the other outbound SMTP server to retry delivery from there. However, this feels like a complicated patch for something that should be fixed in the design.

Ideally, I would like a shared outgoing mail queue (database?) for a potentially larger number of outgoing mail servers. The cluster would then attempt delivery from each server in turn until one succeeds. I just fail to find any non-local directory 'backend' for the Postfix message queue; it seems limited to a single host.

Such an ideal design would then also allow us to centralize the outgoing mail queue state to a much more reliable storage and consider the hosts running Postfix as more ephemeral servers that you can scale up and down on demand.

Any attempt to work around this by sharing the existing queue directory will simply fail, because (1):

The reason is that you cannot share Postfix queues among multiple running Postfix instances.

Anyway, to keep the scope here on the initial issue... how do you handle such deliverability issues with your mail cluster? Looking for a solution that uses Postfix, preferably.

anx
It's easy to see the appeal of adding a degree of centralization, but however you proceed with that: partition, don't distribute. Your mail flows are highly unlikely to be uniform, so if you appear to recipients to be randomly mixing them to make them so, you will appear to be working against some of the least intrusive and most universally applicable tools they have to fight spam, ultimately making the modified setup less reliable.
Score:2

Configure your Postfix servers to relay for each other, then add each server's counterpart to its smtp_fallback_relay list.

That should be a Postfix native way to ensure that the original SMTP message will be forwarded to your alternate mail-relay for a second delivery attempt after the first delivery attempt fails with a non-permanent error.

As noted, that might be slightly detrimental when (a lot of) your destinations use greylisting.
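A minimal sketch of that setup for smtp1 follows; smtp2 would mirror it with the names swapped. The address 203.0.113.20 is a placeholder for smtp2's IP, and relaying is assumed to be granted through `permit_mynetworks` in the default relay restrictions. Treat this as a starting point to test, not a finished configuration.

```
# /etc/postfix/main.cf on smtp1 (illustrative values)

# Accept relayed mail from the peer. Keep your existing entries and
# add the peer's address; 203.0.113.20 is a placeholder for smtp2.
mynetworks = 127.0.0.0/8 203.0.113.20/32

# Hand mail to the peer after a non-permanent delivery failure.
# The [brackets] turn off MX lookups for the relay host.
smtp_fallback_relay = [smtp2.mydomain.tld]:25
```

One thing to verify before relying on this: when each server lists the other as its fallback, a message that temporarily fails from both locations may be passed back and forth between them instead of waiting in one queue, so it is worth testing with a destination that defers from both sides.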

smtp_fallback_relay (default: $fallback_relay)
Optional list of relay destinations that will be used when an SMTP destination is not found, or when delivery fails due to a non-permanent error. With Postfix 2.2 and earlier this parameter is called fallback_relay.

By default, smtp_fallback_relay is empty, mail is returned to the sender when a destination is not found, and delivery is deferred after it fails due to a non-permanent error.

With bulk email deliveries, it can be beneficial to run the fallback relay MTA on the same host, so that it can reuse the sender IP address. This speeds up deliveries that are delayed by IP-based reputation systems (greylist, etc.).

The fallback relays must be SMTP destinations. Specify a domain, host, host:port, [host]:port, [address] or [address]:port; the form [host] turns off MX lookups. If you specify multiple SMTP destinations, Postfix will try them in the specified order.

To prevent mailer loops between MX hosts and fall-back hosts, Postfix version 2.2 and later will not use the fallback relays for destinations that it is MX host for (assuming DNS lookup is turned on).
