I want to learn how to spread out agent catalog requests to the master and avoid a thundering herd condition.
Version and installation information
Puppet version: 5.0, 6.0
OS: Any *nix
Installation type: Any
After you’ve added hundreds of nodes to your deployment, you might notice that your agents are running slowly or timing out. By default, an agent checks in to retrieve a catalog when the agent service is started in PE and every 30 minutes thereafter. As a result, if you restart Puppet server on all of your nodes, they might all try to check in at the same time. This can cause a thundering herd of processes, degrading CPU and memory performance.
Options for fixing a thundering herd condition:
Use tasks to delay agent check-in for each node from anywhere between one second up to the configured runinterval.
Note: Choose only one of these solutions.
If you’re using Puppet 5.3.1 or later, use
max-queued-requeststo prevent a thundering herd.
If you’re able to use Cron, spread out agent catalog requests by running Puppet out of Cron.
If you’re not able to use Cron, spread out agent catalog requests using