So finally, we have found out the cause and solution. I will try to keep it brief:
Epicor Support confirmed that Epicor 10 is hard-coded to retry an email operation after 30 seconds if the SMTP server hasn’t provided an SMTP acknowledgement within that timeframe.
The default timeout for Exchange servers to provide an acknowledgement (what’s called the MaxAcknowledgementDelay) when it’s still processing a request is also 30 seconds. What this means is that if an email does not fail or succeed within 30 seconds, the email server will still continue to try to send the message for a length of time, but it will at least send an acknowledgement back to the sender (Epicor) indicating that its email is still in process. Because both limits are 30 seconds, that acknowledgement can arrive too late to stop Epicor from retrying.
The customers having issues, as I mentioned before, have excessively long SMTP transaction times which, for what could be a variety of reasons, take longer than 30 seconds. This is why it only happens to some of our customers.
I have configured our email server to provide an acknowledgement within 10 seconds so that Epicor gets the response it’s waiting for well before the timeout. This prevents Epicor from sending additional email send requests every 30 seconds and then ultimately producing the ‘Operation Timed Out’ error despite all emails actually sending.
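To illustrate the timing, here is a rough Python sketch of the behaviour described above. It is only a toy model and not Epicor's or Exchange's actual code: the function name, the `max_attempts` value of 3, and the rule that an acknowledgement must arrive strictly before the 30-second limit are all my own assumptions for the sake of the example.

```python
def simulate_send(transaction_seconds, ack_delay_seconds,
                  client_timeout_seconds=30, max_attempts=3):
    """Toy model of the retry behaviour; returns (attempts, timed_out).

    Assumptions (not confirmed by Epicor or Microsoft):
    - the server responds when the SMTP transaction finishes or when the
      acknowledgement delay expires, whichever comes first;
    - the client only accepts a response that arrives strictly before
      its own timeout;
    - max_attempts is a hypothetical retry limit.
    """
    # Time until the client hears anything back from the server.
    time_to_response = min(transaction_seconds, ack_delay_seconds)

    if time_to_response < client_timeout_seconds:
        # The client got its answer in time, so it sends only once.
        return 1, False

    # Every retry hits the same delay, so the client uses up all of its
    # attempts and then reports a timeout, even though the server may
    # still deliver each copy of the message.
    return max_attempts, True


# Fast transaction: one attempt, no error.
print(simulate_send(transaction_seconds=5, ack_delay_seconds=30))
# Slow transaction with the default 30-second acknowledgement delay.
print(simulate_send(transaction_seconds=45, ack_delay_seconds=30))
# Same slow transaction with the acknowledgement delay lowered to 10 seconds.
print(simulate_send(transaction_seconds=45, ack_delay_seconds=10))
```

In this model, only the slow transaction combined with the 30-second acknowledgement delay triggers retries, which matches what we saw: most customers were fine, and lowering the delay to 10 seconds fixed the ones who weren't.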
Thank you everyone for your help and ideas. Hopefully no one else encounters this issue, but if they do they can find this information here.