If Puppet is experiencing performance issues, it could be that too many nodes are checking in simultaneously. Here’s how to find out and what to do.
You must have PuppetDB installed and configured to store agent run reports.
To install and configure PuppetDB, follow the instructions in our documentation.
Ensure that you have reports = store,puppetdb included in the [master] section of your puppet.conf file.
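For reference, the relevant stanza might look like the following minimal sketch (your puppet.conf will contain other settings; on most installations it lives at /etc/puppetlabs/puppet/puppet.conf):

[master]
# Store reports locally and forward them to PuppetDB
reports = store,puppetdb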
After PuppetDB is properly configured, wait for at least two successful Puppet agent runs to complete on all of your nodes before trying to diagnose a thundering herd condition.
Version and installation information
Puppet version: Puppet 6.0
OS: Any *nix
Installation type: Any
After you've added hundreds of nodes to your deployment, you might notice that agent runs are slow or timing out. When hundreds of nodes check in simultaneously to request a catalog, the resulting thundering herd of processes can degrade CPU and memory performance. To verify that you have a thundering herd condition, run a query on the PuppetDB node (the primary server in a monolithic installation) to show how many nodes check in per minute.
1. Log into the PuppetDB node (the primary server in a monolithic installation) as a user that can use sudo. The command in the next step switches to the postgres user.
2. Open the PostgreSQL command line interface by running:

sudo su - postgres -s /bin/bash -c "/usr/bin/psql -d puppetdb"
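If you're running Puppet Enterprise instead of open source Puppet, the PostgreSQL user, psql path, and database name differ. Based on PE defaults (the pe-postgres user, psql under /opt/puppetlabs/server/bin, and the pe-puppetdb database), the equivalent command should look like this:

sudo su - pe-postgres -s /bin/bash -c "/opt/puppetlabs/server/bin/psql -d pe-puppetdb"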
3. Find out how many nodes checked in per minute over the past 7 days by running the following query:
SELECT date_part('month', start_time) AS month,
       date_part('day', start_time) AS day,
       date_part('hour', start_time) AS hour,
       date_part('minute', start_time) AS minute,
       count(*)
FROM reports
WHERE start_time BETWEEN now() - interval '7 days' AND now()
GROUP BY date_part('month', start_time), date_part('day', start_time),
         date_part('hour', start_time), date_part('minute', start_time)
ORDER BY date_part('month', start_time) DESC, date_part('day', start_time) DESC,
         date_part('hour', start_time) DESC, date_part('minute', start_time) DESC;
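An equivalent but more compact query, assuming your PostgreSQL version supports date_trunc (available in all modern releases), returns the same per-minute counts as a single timestamp column:

SELECT date_trunc('minute', start_time) AS minute, count(*)
FROM reports
WHERE start_time BETWEEN now() - interval '7 days' AND now()
GROUP BY 1
ORDER BY 1 DESC;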
4. Check the results to see whether you have a pattern of many nodes checking in during some minutes and few nodes checking in at other times. In the example output below, 858 nodes checked in during a single minute while the surrounding minutes saw far fewer, which is a classic thundering herd pattern.
 month | day | hour | minute | count
-------+-----+------+--------+-------
    10 |  11 |    8 |     11 |     2
    10 |  11 |    8 |     10 |     9
    10 |  11 |    8 |      9 |   115
    10 |  11 |    8 |      8 |   858
    10 |  11 |    8 |      7 |    33
    10 |  11 |    8 |      6 |    80
    10 |  11 |    8 |      5 |   182
    10 |  11 |    8 |      4 |   155
    10 |  11 |    8 |      3 |    92
    10 |  11 |    8 |      2 |    29
    10 |  11 |    8 |      1 |    24
    10 |  11 |    8 |      0 |    21
5. Exit the PostgreSQL command line by typing \q.
If you find that you have a thundering herd condition, distribute agent check-ins more evenly:
- Use the reidmv/puppet_run_scheduler module to automatically distribute Puppet runs evenly using cron (on *nix) or Scheduled Tasks (on Windows). A lighter-weight alternative is sketched after this list.
- Confirm that any changes you make are effective by re-running the query in step 3.
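If your agents run as daemons, another option is the agent's built-in splay settings in puppet.conf, which insert a random delay before each run. A minimal sketch (splay and splaylimit are standard agent settings; the 30-minute limit here is an assumed example value, and splaylimit defaults to the run interval if unset):

[agent]
# Sleep a random interval, up to splaylimit, before each run
splay = true
splaylimit = 30m

With splay enabled, check-ins spread across the splay window instead of clustering at the same minute.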