Skip to content
This repository has been archived by the owner on Feb 6, 2023. It is now read-only.

Parsing SMTP Socket Errors #23

Open
pbatroff opened this issue Jun 25, 2018 · 4 comments
Open

Parsing SMTP Socket Errors #23

pbatroff opened this issue Jun 25, 2018 · 4 comments

Comments

@pbatroff
Copy link

pbatroff commented Jun 25, 2018

Apparently syntactically valid emails with invalid domains aren't processed properly, and if there are more than 5 such mails, the whole cron-job is interrupted, indefinitely. Probably even other jobs executed after the Mail job (though I haven't verified this yet).

In this case the whole mailing system is blocked, since old mailings wont be completed and the cronjob fails.

In the code it seems like after 5 smtp errors the process is interrupted.

I wonder if this is the correct behavior in this specific case! While I agree that interrupting the process in case of connection problem is correct, invalid domain should lead to setting the email address on hold for the contact.

Is it possible, or would it make sense to include some form of feedback parsing from the SMTP Server at this point?

@totten
Copy link
Member

totten commented Oct 8, 2018

@pbatroff, you raised a good point. First, a few random observations/reactions:

  • The scenario you describe sounds like an accurate interpretation of the code/logic. I haven't r-run it to reproduce the problem -- but (trying to read between the lines) would it be fair to guess that you filed this issue because you actually saw some mailings which got stuck in this way?
  • The SMTP error-classification is older than flexmailer (originated in core). But: Fixing it in flexmailer feels better than fixing in core (i.e. because there's less risk to iterating in an extension).
  • In thinking about how Civi handles SMTP errors, I've rationalized to myself that SMTP servers aren't particularly reliable parties (e.g. some sysadmins setup their own mid-tier smtpd for buffering, but they may misconfigure them).
  • The fear of hypothetical misbehavior in smtpd's shouldn't make us accept bad behavior in flexmailer.
  • Currently, the code categorizes all SMTP errors as either temporary/retriable or permanent/non-retriable (which leads to on-hold).

Agree that parsing the SMTP server response (e.g. in isTemporaryError()) could be good. If you have an example of a specific SMTP response, then 👍 to add another item to isTemporaryError().

Similarly, one could break-down isTemporaryError() and move the strings/patterns in a data-file. That might make it easier to maintain a more robust list of patterns.

The potential monkey-wrench comes if you don't think there is a clear/reliable pattern in SMTP responses -- and want the system to behave well anyway. In that case, one could validate the DNS ahead of time. One approach would be a general email cleanup (along the lines of https://civicrm.org/extensions/email-address-corrector). I expect it'd be a little heavy-handed to hard-code a DNS lookup into the main process (for all recipients / all blasts), but we could also look into some contract where extensions can tap into the delivery-engine and add some validation/veto-style logic.

That leaves the general idea that the current retry might be problematic/degenerate (ie if it always restarts from the earliest failed record, then there might be an irresolvable loop). Ideally, we'd keep some kind of retry count for each recipient -- and use this count to (a) prioritize retries and (b) eventually give up on retries.

I don't have a hard answer -- these things are all good and generally aren't mutually exclusive. Any of them could mitigate the scenario you describe. I'd say take whatever can be funded.

@pbatroff
Copy link
Author

Hey @totten, thanks for the elaborate respone! The problem indeed arose on 2 instances where syntactically correct emails with invalid domains were contained. The result was that the cronjob got interrupted, which is especially bad if other jobs are scheduled after send-scheduled-mailing.

If you are interested, the log file looks like this:

[info] SMTP Socket Error or failed to set sender error. Message: Failed to add recipient: [email protected] [SMTP: Invalid response code received from SMTP server while sending email. This is often caused by a misconfiguration in Outbound Email settings. Please verify the settings at Administer CiviCRM >> Global Settings >> Outbound Email (SMTP). (code: 450, response: 4.1.2 <info@INVALID_DOMAIN.TLD>: Recipient address rejected: Domain not found)], Code: 10005

One way could be to add code to isTemporaryError(), but once you start adding code here you end up with the civi solution for classifying bounce patterns in the extension (I wrote some stuff about that in an extension in gitlab). I don't think we should start this in the extension as well, because it only happens to direct smtp-errors (highly unrealiable as well).

My current solution is a small cronjob who checks DNS for emails periodically with a library, implemented in the mailingtools extension, currently in branch dev_7756_7394 if you are interested. While this is not perfect either, it at least cleans up the email data, and can do so continuously as a Cron Job if needed.

Ideally, we'd keep some kind of retry count for each recipient -- and use this count to (a) prioritize retries and (b) eventually give up on retries.

I would indeed propose to implement a counter to prevent the loop and set a retry limit, thus being able to finish cronjobs/mailings properly. Also if these emails could be marked, it would be fairly easy to get some statistics for them as well as deal with them properly (e.g. set on hold, remove, correct via email-address-corrector, ...).

@bugfolder
Copy link

We just encountered this exact problem with a big mailing. Apparently Civi keeps trying the bad email address forever as part of the mailing job and once 5 bad addresses have percolated to the front of the queue, the cron job terminates every time (but not without stuffing the log with error messages on every attempt).

The way I found to manually unwedge the cron job (without entirely killing the mailing) was to look up the contact IDs with the offending email addresses and then delete those entries from the civicrm_mailing_queue table. That is less than desirable, though, since it's manual and involves direct db-editing.

So it would be nice if this issue were fixed; a retry limit sounds like a good solution.

@Detsieber
Copy link

@totten The pattern I just encountered was:

[info] SMTP Socket Error or failed to set sender error. Message: Failed to add recipient: [email protected] [SMTP: Invalid response code received from SMTP server while sending email. This is often caused by a misconfiguration in Outbound Email settings. Please verify the settings at Administer CiviCRM >> Global Settings >> Outbound Email (SMTP). (code: 450, response: 4.1.2 : Recipient address rejected: Domain not found)], Code: 10005

It would be very helpful if someone could look after that...

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants