
This PoC demonstrates how to install and configure pg_stat_monitor in order to extract useful and actionable metrics from a PostgreSQL database and display them on a Grafana dashboard.
About the environment
- Grafana: version 10.0.0
- Grafana database backend: Prometheus version 2.15.2+d
- PostgreSQL version 13
- pgbench version 13
In order to investigate potential opportunities for implementing constructive and useful metrics derived from PostgreSQL into Grafana, it is necessary to generate load using pgbench.
Configuring Grafana
For our purposes, the Grafana datasource used in this PoC is also the Postgres data cluster that is generating the data to be monitored.
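Since Grafana only needs read access to the statistics views, it is worth provisioning a dedicated login for the datasource. Here is a minimal sketch; the role name and password are illustrative assumptions, not part of the original setup:

-- hypothetical read-only login for the Grafana datasource
CREATE ROLE grafana_ro WITH LOGIN PASSWORD 'changeme';
-- pg_monitor is a predefined role granting read access to monitoring views
GRANT pg_monitor TO grafana_ro;
GRANT CONNECT ON DATABASE pgbench TO grafana_ro;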
pg_stat_monitor
About
pg_stat_monitor is a Query Performance Monitoring tool for PostgreSQL. It collects various statistics data such as query statistics, query plan, SQL comments, and other performance insights. The collected data is aggregated and presented in a single view.
pg_stat_monitor takes its inspiration from pg_stat_statements. Unlike pg_stat_statements, which aggregates its metrics over the entire period since it was last zeroed, pg_stat_monitor can bucket its output into a configurable number of time-based intervals, sparing users the effort of doing that aggregation themselves.
pg_stat_monitor tracks the following operations:
- statements
- queries
- functions
- stored procedures and other non-utility statements
Features
- Time Interval Grouping: Instead of supplying one set of ever-increasing counts, pg_stat_monitor computes stats for a configured number of time intervals – time buckets. This allows for much better data accuracy, especially in the case of high-resolution or unreliable networks.
- Multi-Dimensional Grouping: While pg_stat_statements groups counters by userid, dbid, queryid, pg_stat_monitor uses a more detailed group for higher precision. This allows a user to drill down into the performance of queries.
- Capture Actual Parameters in the Queries: pg_stat_monitor allows you to choose whether to see queries with placeholders for parameters or with the actual parameter data. This simplifies debugging and analysis by enabling users to re-execute the exact same query.
- Query Plan: Each SQL statement is now accompanied by the actual plan that was constructed for its execution. That’s a huge advantage if you want to understand why a particular query is slower than expected.
- Tables Access Statistics for a Statement: This allows us to easily identify all queries that accessed a given table. This set is on par with the information provided by pg_stat_statements.
- Histogram: Visual representation is very helpful as it can help identify issues. With the help of the histogram function, one can now view a timing/calling data histogram in response to an SQL query. And yes, it even works in PostgreSQL.
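As a taste of that last feature, the view's resp_calls column (see the view definition below) exposes the per-bucket response-time distribution counters that feed the histogram; pg_stat_monitor also ships a histogram() function that renders the distribution graphically (consult the pg_stat_monitor reference for the exact signature in your version). A minimal sketch, where the pgsm_query_id value is illustrative, taken from the pgbench sample output later in this post:

-- inspect the response-time distribution counters for one query
select bucket, resp_calls
from pg_stat_monitor
where pgsm_query_id = -7455620703706695456;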
Installation (example: CentOS 8, pg14)
The simplest way to get pg_stat_monitor is to install it via Percona Distribution for PostgreSQL.
The following instructions demonstrate installing Percona Distribution for PostgreSQL and pg_stat_monitor on a CentOS 8 Linux distribution:
# Install The Percona Repository
dnf install -y https://repo.percona.com/yum/percona-release-latest.noarch.rpm
percona-release setup ppg14

# Install The postgres Community Repository
dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
dnf update -y
dnf install -y pg_stat_monitor_14

# perform standard initialization and systemd configurations
/usr/pgsql-14/bin/postgresql-14-setup initdb

# configure postgres to use pg_stat_monitor
echo "
shared_preload_libraries=pg_stat_monitor
" >> /var/lib/pgsql/14/data/postgresql.auto.conf

# complete postgres configuration
systemctl enable postgresql-14
systemctl start postgresql-14
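A quick sanity check from psql after the restart confirms that the library was preloaded and the extension is available (a sketch; output will vary by version):

-- verify the library is preloaded
SHOW shared_preload_libraries;

-- verify the extension is available for CREATE EXTENSION
SELECT name, default_version FROM pg_available_extensions WHERE name = 'pg_stat_monitor';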
Create extension
The pg_stat_monitor extension can be created in any database, but for the purposes of this PoC, it is placed in the database pgbench.
postgres=# create database pgbench;
postgres=# \c pgbench
pgbench=# create extension pg_stat_monitor;
pgbench=# \d
          List of relations
 Schema |      Name       | Type |  Owner
--------+-----------------+------+----------
 public | pg_stat_monitor | view | postgres
View "public.pg_stat_monitor" Column | Type | Collation | Nullable | Default ---------------------+--------------------------+-----------+----------+--------- bucket | bigint | | | bucket_start_time | timestamp with time zone | | | userid | oid | | | username | text | | | dbid | oid | | | datname | text | | | client_ip | inet | | | pgsm_query_id | bigint | | | queryid | bigint | | | toplevel | boolean | | | top_queryid | bigint | | | query | text | | | comments | text | | | planid | bigint | | | query_plan | text | | | top_query | text | | | application_name | text | | | relations | text[] | | | cmd_type | integer | | | cmd_type_text | text | | | elevel | integer | | | sqlcode | text | | | message | text | | | calls | bigint | | | total_exec_time | double precision | | | min_exec_time | double precision | | | max_exec_time | double precision | | | mean_exec_time | double precision | | | stddev_exec_time | double precision | | | rows | bigint | | | shared_blks_hit | bigint | | | shared_blks_read | bigint | | | shared_blks_dirtied | bigint | | | shared_blks_written | bigint | | | local_blks_hit | bigint | | | local_blks_read | bigint | | | local_blks_dirtied | bigint | | | local_blks_written | bigint | | | temp_blks_read | bigint | | | temp_blks_written | bigint | | | blk_read_time | double precision | | | blk_write_time | double precision | | | resp_calls | text[] | | | cpu_user_time | double precision | | | cpu_sys_time | double precision | | | wal_records | bigint | | | wal_fpi | bigint | | | wal_bytes | numeric | | | bucket_done | boolean | | | plans | bigint | | | total_plan_time | double precision | | | min_plan_time | double precision | | | max_plan_time | double precision | | | mean_plan_time | double precision | | | stddev_plan_time | double precision | | |
About pgbench
pgbench is a simple program for executing benchmark tests on PostgreSQL by running the same sequence of SQL commands over and over. pgbench is capable of executing multiple concurrent database sessions and can calculate the average transaction rate (TPS) at the end of a run. Although the default configuration simulates a workload loosely based on TPC-B, it is nevertheless easy to test other use cases by writing one’s own transaction script files.
Querying the data
While it is reasonable to create panels showing real-time load in order to better explore the types of queries that can be run against pg_stat_monitor, it is more practical to copy the data into tables and query it there after the benchmarking run has completed.
Table: pg_stat_monitor_archive
Save the data generated from a recently completed benchmark run into an archive table:
select *
into pg_stat_monitor_archive
from pg_stat_monitor
order by bucket_start_time asc;
Table "public.pg_stat_monitor_archive" Column | Type | Collation | Nullable | Default --------------------+--------------------------+-----------+----------+--------- bucket | bigint | | | bucket_start_time | timestamp with time zone | | | userid | oid | | | username | text | | | dbid | oid | | | datname | text | | | client_ip | inet | | | pgsm_query_id | bigint | | | queryid | bigint | | | toplevel | boolean | | | top_queryid | bigint | | | query | text | | | comments | text | | | planid | bigint | | | query_plan | text | | | top_query | text | | | application_name | text | | | relations | text[] | | | cmd_type | integer | | | cmd_type_text | text | | | elevel | integer | | | sqlcode | text | | | message | text | | | calls | bigint | | | total_exec_time | double precision | | | min_exec_time | double precision | | | max_exec_time | double precision | | | mean_exec_time | double precision | | | stddev_exec_time | double precision | | | rows | bigint | | | shared_blks_hit | bigint | | | shared_blks_read | bigint | | | shared_blks_dirtied | bigint | | | shared_blks_written | bigint | | | local_blks_hit | bigint | | | local_blks_read | bigint | | | local_blks_dirtied | bigint | | | local_blks_written | bigint | | | temp_blks_read | bigint | | | temp_blks_written | bigint | | | blk_read_time | double precision | | | blk_write_time | double precision | | | resp_calls | text[] | | | cpu_user_time | double precision | | | cpu_sys_time | double precision | | | wal_records | bigint | | | wal_fpi | bigint | | | wal_bytes | numeric | | | bucket_done | boolean | | | plans | bigint | | | total_plan_time | double precision | | | min_plan_time | double precision | | | max_plan_time | double precision | | | mean_plan_time | double precision | | | stddev_plan_time | double precision | | |
Table: pg_stat_monitor_qry
Extract this metric of interest, i.e., time vs. total execution time:
select bucket_start_time, pgsm_query_id, queryid, total_exec_time
into pg_stat_monitor_qry
from pg_stat_monitor_archive
order by bucket_start_time asc;
pgbench=# \d pg_stat_monitor_qry
              Table "public.pg_stat_monitor_qry"
      Column       |           Type           | Collation | Nullable | Default
-------------------+--------------------------+-----------+----------+---------
 bucket_start_time | timestamp with time zone |           |          |
 pgsm_query_id     | bigint                   |           |          |
 queryid           | bigint                   |           |          |
 total_exec_time   | double precision         |           |          |
Table: pg_stat_monitor_shared_blk_io
Extract this metric of interest, i.e., time vs. shared block IO:
select bucket_start_time, pgsm_query_id, queryid,
       shared_blks_hit, shared_blks_read, shared_blks_dirtied, shared_blks_written
into pg_stat_monitor_shared_blk_io
from pg_stat_monitor_archive
order by bucket_start_time asc;
pgbench=# \d pg_stat_monitor_shared_blk_io
         Table "public.pg_stat_monitor_shared_blk_io"
       Column        |           Type           | Collation | Nullable | Default
---------------------+--------------------------+-----------+----------+---------
 bucket_start_time   | timestamp with time zone |           |          |
 pgsm_query_id       | bigint                   |           |          |
 queryid             | bigint                   |           |          |
 shared_blks_hit     | bigint                   |           |          |
 shared_blks_read    | bigint                   |           |          |
 shared_blks_dirtied | bigint                   |           |          |
 shared_blks_written | bigint                   |           |          |
Table: pg_stat_monitor_blk_io
Note: this metric requires the runtime parameter track_io_timing to be enabled.
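track_io_timing can be enabled without a restart; a minimal sketch:

-- enable track_io_timing cluster-wide and reload the configuration
alter system set track_io_timing = on;
select pg_reload_conf();
show track_io_timing;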
Extract this metric of interest, i.e., time vs. blk io:
select bucket_start_time, pgsm_query_id, queryid, blk_read_time, blk_write_time
into pg_stat_monitor_blk_io
from pg_stat_monitor_archive
order by bucket_start_time asc;
Table: pg_stat_monitor_uniq_id
Save a copy of all unique query IDs so that queries extracted from the view later on can be identified.
Column pgsm_query_id identifies the query in such a manner that one can still identify the same query even when generated on other platforms under different loading conditions with different data:
with a as (
    select distinct on (pgsm_query_id) *
    from pg_stat_monitor_archive
    where application_name = 'pgbench'
)
select cmd_type, cmd_type_text, pgsm_query_id, queryid, query as example_query
into pg_stat_monitor_uniq_id
from a
order by cmd_type;
pgbench=# \d pg_stat_monitor_uniq_id
        Table "public.pg_stat_monitor_uniq_id"
    Column     |  Type   | Collation | Nullable | Default
---------------+---------+-----------+----------+---------
 cmd_type      | integer |           |          |
 cmd_type_text | text    |           |          |
 pgsm_query_id | bigint  |           |          |
 queryid       | bigint  |           |          |
 example_query | text    |           |          |
This is an example set of queries generated by pgbench. Note that the values in column pgsm_query_id remain the same irrespective of host or environment:
select cmd_type_text, pgsm_query_id, example_query from pg_stat_monitor_uniq_id where cmd_type > 0;
 cmd_type_text |    pgsm_query_id     |                        example_query
---------------+----------------------+------------------------------------------------------------------
 SELECT        | -7455620703706695456 | SELECT abalance FROM pgbench_accounts WHERE aid = 16416498
 UPDATE        |  -510321339504955469 | UPDATE pgbench_accounts SET abalance = abalance + 2063 WHERE aid = 1482568
 UPDATE        |  5276535447716615446 | UPDATE pgbench_branches SET bbalance = bbalance + 1384 WHERE bid = 7
 UPDATE        |  3629195281782908951 | UPDATE pgbench_tellers SET tbalance = tbalance + -2966 WHERE tid = 330
 INSERT        | -8751124061964589929 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (321, 56, 21104880, 4211, CURRENT_TIMESTAMP)
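Because pgsm_query_id is stable, it can be used directly to pull the full timing history of a single statement out of the archive; for example, the UPDATE against pgbench_accounts (illustrative):

-- per-bucket timing history for one statement, keyed by its stable ID
select bucket_start_time, calls, total_exec_time
from pg_stat_monitor_archive
where pgsm_query_id = -510321339504955469
order by bucket_start_time;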
Benchmarking
Two types of performance monitoring are profiled:
- Real-time loading performance
- Aggregate performance over a specific time period, i.e., a snapshot.
Although the results of the benchmarking can be viewed by querying the view pg_stat_monitor, you will note, as demonstrated by the bash script and SQL statements below, that the contents of the view are immediately copied and saved into a collection of tables. This is because the data disappears over time as pg_stat_monitor cycles through its allotted number of buckets.
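The retention window is simply the bucket size multiplied by the bucket count: with the settings used in the script below, 75 one-minute buckets yield roughly 75 minutes of history before the oldest bucket is overwritten. The active values can be confirmed at any time:

-- check the current bucketing configuration
show pg_stat_monitor.pgsm_bucket_time;
show pg_stat_monitor.pgsm_max_buckets;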
A script executing a benchmarking run:
#!/bin/bash
# REFERENCE
#   https://docs.percona.com/pg-stat-monitor/reference.html
#
set -e

export PGPASSWORD="MYPASSWORD" PGHOST='MYHOST' PGPORT=5434 PGUSER=postgres

#
# initialize benchmarking database
#
dropdb --if-exists pgbench
createdb pgbench
/usr/pgsql-12/bin/pgbench -i --foreign-keys -s 300 pgbench
psql pgbench -c 'create extension pg_stat_monitor'

#
# configure pg_stat_monitor, requires system restart
#
psql postgres <<_eof_
    -- set bucket time range, default is normally 60 seconds
    alter system set pg_stat_monitor.pgsm_bucket_time = '1min';

    -- set number of buckets, default is normally 10
    alter system set pg_stat_monitor.pgsm_max_buckets = 75;
_eof_

systemctl restart postgresql@13-main

psql pgbench <<_eof_
    -- zero pg_stat_monitor stats
    select * from pg_stat_monitor_reset();
_eof_

#
# begin benchmarking run
#
# 4500 seconds (75 minutes)
/usr/pgsql-12/bin/pgbench -U postgres -c 4 -j 2 -T 4500 -P 5 -b tpcb-like pgbench

#
# copy and save the benchmarking run into tables
#
psql pgbench <<_eof_
    drop table if exists pg_stat_monitor_archive, pg_stat_monitor_qry, pg_stat_monitor_uniq_id;

    select * into pg_stat_monitor_archive from pg_stat_monitor order by bucket_start_time;

    select bucket_start_time, pgsm_query_id, queryid, total_exec_time
    into pg_stat_monitor_qry
    from pg_stat_monitor_archive
    where application_name = 'pgbench';

    with a as (
        select distinct on (pgsm_query_id) *
        from pg_stat_monitor_archive
        where application_name = 'pgbench'
    )
    select cmd_type, cmd_type_text, pgsm_query_id, queryid, query as example_query
    into pg_stat_monitor_uniq_id
    from a
    order by cmd_type;
_eof_

echo "DONE"
progress: 4435.0 s, 341.2 tps, lat 11.718 ms stddev 3.951
progress: 4440.0 s, 361.2 tps, lat 11.075 ms stddev 3.519
progress: 4445.0 s, 348.0 tps, lat 11.483 ms stddev 5.246
progress: 4450.0 s, 383.8 tps, lat 10.418 ms stddev 4.514
progress: 4455.0 s, 363.6 tps, lat 10.988 ms stddev 4.326
progress: 4460.0 s, 344.0 tps, lat 11.621 ms stddev 3.981
progress: 4465.0 s, 360.4 tps, lat 11.093 ms stddev 4.457
progress: 4470.0 s, 383.8 tps, lat 10.423 ms stddev 5.615
progress: 4475.0 s, 369.6 tps, lat 10.811 ms stddev 3.784
progress: 4480.0 s, 355.6 tps, lat 11.227 ms stddev 3.954
progress: 4485.0 s, 378.8 tps, lat 10.580 ms stddev 2.890
progress: 4490.0 s, 370.8 tps, lat 10.770 ms stddev 2.879
progress: 4495.0 s, 365.2 tps, lat 10.947 ms stddev 4.997
progress: 4500.0 s, 379.2 tps, lat 10.549 ms stddev 2.832

transaction type: <builtin: TPC-B (sort of)>
scaling factor: 300
query mode: simple
number of clients: 4
number of threads: 2
duration: 4500 s
number of transactions actually processed: 1564704
latency average = 11.497 ms
latency stddev = 4.800 ms
tps = 347.711175 (including connections establishing)
tps = 347.711731 (excluding connections establishing)
Dashboard example 1: Querying saved data
Top panel (Query execution time vs. DML)
Five (5) SQL statements are used to create this panel:
-- SELECT
select bucket_start_time, total_exec_time as "SELECT"
from pg_stat_monitor_qry
join pg_stat_monitor_uniq_id using (pgsm_query_id)
where cmd_type_text = 'SELECT'
order by 1 asc;

-- INSERT
select bucket_start_time, total_exec_time as "INSERT"
from pg_stat_monitor_qry
join pg_stat_monitor_uniq_id using (pgsm_query_id)
where cmd_type_text = 'INSERT'
order by 1 asc;

-- UPDATE 1
select bucket_start_time, total_exec_time as "UPDATE 1"
from pg_stat_monitor_qry
join pg_stat_monitor_uniq_id using (pgsm_query_id)
where cmd_type_text = 'UPDATE'
  and pgsm_query_id = -510321339504955469
order by 1 asc;

-- UPDATE 2
select bucket_start_time, total_exec_time as "UPDATE 2"
from pg_stat_monitor_qry
join pg_stat_monitor_uniq_id using (pgsm_query_id)
where cmd_type_text = 'UPDATE'
  and pgsm_query_id = 5276535447716615446
order by 1 asc;

-- UPDATE 3
select bucket_start_time, total_exec_time as "UPDATE 3"
from pg_stat_monitor_qry
join pg_stat_monitor_uniq_id using (pgsm_query_id)
where cmd_type_text = 'UPDATE'
  and pgsm_query_id = 3629195281782908951
order by 1 asc;
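Alternatively, the five statements above can be collapsed into a single query that returns one labeled series per DML statement, leaving it to Grafana to split the series. This is a sketch, not how the original panel was built:

-- one series per statement; Grafana SQL datasources can use a text column
-- named "metric" as the series name
select q.bucket_start_time,
       u.cmd_type_text || ' ' || u.pgsm_query_id::text as metric,
       q.total_exec_time
from pg_stat_monitor_qry q
join pg_stat_monitor_uniq_id u using (pgsm_query_id)
order by 1 asc;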
Bottom panel (Query execution time vs. shared blocks)
-- INSERT (ins_[hit|read|dirty|write])
select bucket_start_time,
       shared_blks_hit     as ins_hit,
       shared_blks_read    as ins_read,
       shared_blks_dirtied as ins_dirt,
       shared_blks_written as ins_writ
from pg_stat_monitor_shared_blk_io
join pg_stat_monitor_uniq_id using (pgsm_query_id)
where cmd_type_text = 'INSERT'
order by 1 asc;

-- UPDATE 1 (update1_[hit|read|dirty|write])
select bucket_start_time,
       shared_blks_hit     as update1_hit,
       shared_blks_read    as update1_read,
       shared_blks_dirtied as update1_dirt,
       shared_blks_written as update1_writ
from pg_stat_monitor_shared_blk_io
join pg_stat_monitor_uniq_id using (pgsm_query_id)
where cmd_type_text = 'UPDATE'
  and pgsm_query_id = -510321339504955469
order by 1 asc;

-- UPDATE 2 (update2_[hit|read|dirty|write])
select bucket_start_time,
       shared_blks_hit     as update2_hit,
       shared_blks_read    as update2_read,
       shared_blks_dirtied as update2_dirt,
       shared_blks_written as update2_writ
from pg_stat_monitor_shared_blk_io
join pg_stat_monitor_uniq_id using (pgsm_query_id)
where cmd_type_text = 'UPDATE'
  and pgsm_query_id = 5276535447716615446
order by 1 asc;

-- UPDATE 3 (update3_[hit|read|dirty|write])
select bucket_start_time,
       shared_blks_hit     as update3_hit,
       shared_blks_read    as update3_read,
       shared_blks_dirtied as update3_dirt,
       shared_blks_written as update3_writ
from pg_stat_monitor_shared_blk_io
join pg_stat_monitor_uniq_id using (pgsm_query_id)
where cmd_type_text = 'UPDATE'
  and pgsm_query_id = 3629195281782908951
order by 1 asc;
Analysis
Here are some example patterns that can be discerned:
- The SELECT statements are the fastest DML operations (top panel).
- Although SQL statement UPDATE 1 (top panel) takes up the most time, it has relatively little presence in the shared buffer compared to the other UPDATE statements (bottom panel).
- INSERTs (top panel) are the second-slowest set of statements, yet they account for very little shared-buffer activity compared to the UPDATEs (bottom panel).
Dashboard example 2: Monitoring in real time
These two panels show read/write IO performance against persistent storage during a live benchmarking run.
Top panel (Execution time vs. DML)
-- SELECT
select bucket_start_time, total_exec_time as "SELECT"
from pg_stat_monitor
join pg_stat_monitor_uniq_id a using (pgsm_query_id)
where a.cmd_type_text = 'SELECT'
order by 1 asc;

-- INSERT
select bucket_start_time, total_exec_time as "INSERT"
from pg_stat_monitor
join pg_stat_monitor_uniq_id a using (pgsm_query_id)
where a.cmd_type_text = 'INSERT'
order by 1 asc;

-- UPDATE 1
select bucket_start_time, total_exec_time as "UPDATE 1"
from pg_stat_monitor
join pg_stat_monitor_uniq_id a using (pgsm_query_id)
where a.cmd_type_text = 'UPDATE'
  and pgsm_query_id = -510321339504955469
order by 1 asc;

-- UPDATE 2
select bucket_start_time, total_exec_time as "UPDATE 2"
from pg_stat_monitor
join pg_stat_monitor_uniq_id a using (pgsm_query_id)
where a.cmd_type_text = 'UPDATE'
  and pgsm_query_id = 5276535447716615446
order by 1 asc;

-- UPDATE 3
select bucket_start_time, total_exec_time as "UPDATE 3"
from pg_stat_monitor
join pg_stat_monitor_uniq_id a using (pgsm_query_id)
where a.cmd_type_text = 'UPDATE'
  and pgsm_query_id = 3629195281782908951
order by 1 asc;
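When querying live data from Grafana's PostgreSQL datasource, the dashboard's visible time range can also be pushed down into the query with the $__timeFilter macro; a sketch for the SELECT series:

-- restrict the series to the dashboard's current time range
select bucket_start_time, total_exec_time as "SELECT"
from pg_stat_monitor
join pg_stat_monitor_uniq_id a using (pgsm_query_id)
where a.cmd_type_text = 'SELECT'
  and $__timeFilter(bucket_start_time)
order by 1 asc;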
Bottom panel (Time vs. IO)
-- time vs read/write blocks (blk_read_time, blk_write_time)
-- track_io_timing is on
select bucket_start_time, blk_read_time, blk_write_time
from public.pg_stat_monitor;
Analysis
It’s quite easy to observe that SQL statement UPDATE 1 represents the bulk of the read operations.
-- example SQL statement
UPDATE pgbench_accounts SET abalance = abalance + 2063 WHERE aid = 1482568
Interestingly, writes are not as significant as reads.
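That impression can be sanity-checked against the archived data with a quick aggregate (a sketch):

-- total read vs. write time, in milliseconds, across the whole run
select round(sum(blk_read_time)::numeric, 2)  as total_read_ms,
       round(sum(blk_write_time)::numeric, 2) as total_write_ms
from pg_stat_monitor_archive;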
Conclusion
I’m excited about pg_stat_monitor. Not only can it be used in Grafana, but it’s easily implemented in any monitoring solution, including our own Percona Monitoring and Management. It’s also incorporated in our latest version of Percona Operator for PostgreSQL.
The pg_stat_monitor extension is an obvious, common-sense improvement over pg_stat_statements' greatest limitation, i.e., its inability to bucket metrics over time intervals. Frankly, I can see pg_stat_monitor eventually replacing pg_stat_statements as the de facto extension for monitoring Postgres when it comes to real-time analysis.
Happy monitoring!