Section 8.2 of RFC 8555 explains how retries apply to the validation
process. However, much is left up to the implementer.
Add retries every 12 seconds for 2 minutes after a client requests a
validation. The challenge status remains "processing" indefinitely until
a distinct conclusion is reached. This allows a client to continually
re-request a validation by sending a post-get to the challenge resource
until the process fails or succeeds.
Challenges in the processing state include information about why a
validation did not complete in the error field. The server also includes
a Retry-After header to help clients and servers coordinate.
Retries are inherently stateful because they're part of the public API.
When running step-ca in a highly available setup with replicas, care
must be taken to maintain a persistent identifier for each instance
"slot". In kubernetes, this implies a *stateful set*.
Make sure we do not pass domains with asterisk (wildcard) in the middle,
like _acme-challenge.*.example.com to lookupTxt function, but preprocess
domain and remove leading wildcard so we lookup for
_acme-challenge.example.com.
Perform domain normalization for wildcard domains, so we do query
TXT records for _acme-challenge.example.domain instead of
_acme-challenge.*.example.domain when performing DNS-01 challenge. In
this way the behavior is consistent with letsencrypt and records queried
are in sync with the ones that are shown in certbot manual mode.