I have a thundering herd problem. I would like Puppet Server to mitigate it by spreading agents out when too many check in at once.
Version and installation information
Puppet version: 5.3.1 and later
OS: *nix
Installation type: All
Solution
Configure puppetserver to return 503 Service Unavailable responses with random Retry-After headers when too many agents check in at once. Each agent sleeps for the amount of time given in the Retry-After field and then checks in again, breaking up the herd.
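For illustration, an agent that checks in while the server is saturated might receive a response like the following. The delay value shown here is an arbitrary example; the server picks a random delay for each response:

HTTP/1.1 503 Service Unavailable
Retry-After: 217

The agent waits 217 seconds, then retries its run.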
Note: Do not use this solution if you have a significant number of agents on a version older than Puppet 5.3. Older agents treat a 503 response as a failure, ending their runs. This causes groups of older agents to schedule their next runs at the same time, creating a thundering herd.
Edit your puppetserver configuration file, located by default at /etc/puppetlabs/puppetserver/conf.d/puppetserver.conf. Set the values max-queued-requests and max-retry-delay in the jruby-puppet section of puppetserver.conf.
# configuration for the JRuby interpreters
jruby-puppet: {
    ...
    max-queued-requests: 48
    max-retry-delay: 600
}
The max-queued-requests setting limits the number of requests allowed to wait in the queue before puppetserver starts sending 503 responses to spread agents out. Scale this setting with the number of JRuby workers Puppet Server is running; a limit of 12 queued requests per JRuby instance is a good starting point. The example above is based on the default JRuby worker pool of 4 instances (12 × 4 = 48).
The max-retry-delay setting limits the maximum amount of time, in seconds, that puppetserver returns as a Retry-After header on 503 responses. The server multiplies this limit by a random factor to produce each header value, so each agent sleeps for a different amount of time, preventing a new thundering herd. The example above uses a limit of 600 seconds (10 minutes). You might want to manage these settings with Puppet, especially if you have more than one compiler. You can create a profile and role to do that, as in the sketch below.
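The following is a minimal sketch of such a profile. The class name profile::puppetserver::tuning is hypothetical, and the sketch assumes the puppetlabs-hocon module is installed to manage settings in the HOCON-format configuration file:

# Hypothetical profile class; assumes the puppetlabs-hocon module.
class profile::puppetserver::tuning (
  Integer $max_queued_requests = 48,
  Integer $max_retry_delay     = 600,
) {
  $conf = '/etc/puppetlabs/puppetserver/conf.d/puppetserver.conf'

  # Limit the request queue before puppetserver returns 503 responses.
  hocon_setting { 'jruby-puppet.max-queued-requests':
    ensure  => present,
    path    => $conf,
    setting => 'jruby-puppet.max-queued-requests',
    value   => $max_queued_requests,
    notify  => Service['puppetserver'],
  }

  # Cap the Retry-After delay (in seconds) returned on 503 responses.
  hocon_setting { 'jruby-puppet.max-retry-delay':
    ensure  => present,
    path    => $conf,
    setting => 'jruby-puppet.max-retry-delay',
    value   => $max_retry_delay,
    notify  => Service['puppetserver'],
  }

  # Restart puppetserver when either setting changes.
  service { 'puppetserver':
    ensure => running,
  }
}

A role class for your compilers could then include this profile.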