SQL Tips & Tricks - PostgreSQL Queries - Part 2
This is part 2 of PostgreSQL Queries Tips & Tricks from performance and overall backend side of your database.
In PART 1 we have covered, duplicated, rarely used indexes and general amount of elements in your DB.
This time we will dive into the details of tables and indexes. Below you can find 2 queries to rule them all. Enjoy the read!
Get All Tables Details And Wasted Bytes
-- Get all tables and indexes size and bloat size
WITH get_schema_names AS (
SELECT schema_name
FROM information_schema.schemata
WHERE schema_owner <> 'postgres'
) SELECT DISTINCT
current_database(),
schemaname as schema_name,
tablename as table_name,
total_bytes as object_total_size_bytes,
pg_size_pretty(total_bytes) as object_total_size,
(total_bytes-index_bytes-COALESCE(toast_bytes,0)) AS table_size_bytes,
pg_size_pretty((total_bytes-index_bytes-COALESCE(toast_bytes,0))) AS table_size,
ROUND((CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages::FLOAT/otta END)::NUMERIC,1) AS tbloat,
CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::BIGINT END AS wastedbytes,
pg_size_pretty(CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::BIGINT END) as wasted_size
FROM (
SELECT schemaname,
tablename,
cc.reltuples,
cc.relpages,
bs,
pg_total_relation_size(cc.oid) AS total_bytes,
pg_indexes_size(cc.oid) AS index_bytes,
pg_total_relation_size(cc.reltoastrelid) AS toast_bytes,
CEIL((cc.reltuples*((datahdr+ma-(CASE
WHEN datahdr%ma=0 THEN ma
ELSE datahdr%ma
END))+nullhdr2+4))/(bs-20::FLOAT)) AS otta
FROM ( SELECT ma,
bs,
schemaname,
tablename,
(datawidth+(hdr+ma-(CASE WHEN hdr%ma=0 THEN ma ELSE hdr%ma END)))::NUMERIC AS datahdr,
(maxfracsum*(nullhdr+ma-(CASE WHEN nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
FROM (SELECT schemaname,
tablename,
hdr,
ma,
bs,
SUM((1-null_frac)*avg_width) AS datawidth,
MAX(null_frac) AS maxfracsum,
hdr+(SELECT 1+COUNT(*)/8
FROM pg_stats s2
WHERE null_frac<>0
AND s2.schemaname = s.schemaname
AND s2.tablename = s.tablename) AS nullhdr
FROM pg_stats s, (SELECT
(SELECT current_setting('block_size')::NUMERIC) AS bs,
CASE WHEN SUBSTRING(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
) AS constants
GROUP BY 1,2,3,4,5
) AS foo
) AS rs
JOIN pg_class cc ON cc.relname = rs.tablename
JOIN pg_namespace nn ON cc.relnamespace = nn.oid
AND nn.nspname = rs.schemaname
AND nn.nspname <> 'information_schema'
LEFT JOIN pg_index i ON indrelid = cc.oid
LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml
INNER JOIN get_schema_names gsn ON gsn.schema_name = sml.schemaname
WHERE CASE
WHEN relpages < otta THEN 0
ELSE bs*(sml.relpages-otta)::BIGINT
END > 0
ORDER BY wastedbytes DESC;
This query is a complex one. To better understand it, please go through pg_class table description as a starter: https://www.postgresql.org/docs/current/catalog-pg-class.html
Goal of this query is to show all table, (except ones in postgres schema) with size of wasted bytes in table greater than 0. For instance if you have removed rows from your table, and have done the vacuum then these deleted rows will still be there on the backend, treated as wasted bytes (oversimplification a bit but still to the point).
Query Results
Result will give you, currently selected database, schema name and table name, total size of object in bytes and in “pretty” form (kB, MB etc.) – together with indexes etc, table size in bytes and “pretty” form, table bloat and wasted bytes. You can find good description of bloat under this article https://dba.stackexchange.com/questions/126258/what-is-table-bloating-in-databases.
Get All Indexes Details And Wasted Bytes
-- Get Information About Indexes size and Bloat Size - how many "dead" rows are under index
WITH get_schema_names AS (
SELECT schema_name
FROM information_schema.schemata
WHERE schema_owner <> 'postgres'
) SELECT current_database(),
nspname AS schemaname,
tblname,
idxname,
bs*(relpages)::bigint as real_size_bytes,
pg_size_pretty(bs*(relpages)::bigint) AS real_size,
bs*(relpages-est_pages)::bigint AS extra_size,
100 * (relpages-est_pages)::float / relpages AS extra_ratio,
fillfactor,
bs*(relpages-est_pages_ff) AS bloat_size_bytes,
pg_size_pretty(bs*(relpages-est_pages_ff)::bigint) AS bloat_size,
100 * (relpages-est_pages_ff)::float / relpages AS bloat_ratio
FROM (
SELECT coalesce(1 +
ceil(reltuples/floor((bs-pageopqdata-pagehdr)/(4+nulldatahdrwidth)::float)), 0 -- ItemIdData size + computed avg size of a tuple (nulldatahdrwidth)
) AS est_pages,
coalesce(1 +
ceil(reltuples/floor((bs-pageopqdata-pagehdr)*fillfactor/(100*(4+nulldatahdrwidth)::float))), 0
) AS est_pages_ff,
bs, nspname, table_oid, tblname, idxname, relpages, fillfactor, is_na
FROM (
SELECT maxalign, bs, nspname, tblname, idxname, reltuples, relpages, relam, table_oid, fillfactor,
( index_tuple_hdr_bm +
maxalign - CASE -- Add padding to the index tuple header to align on MAXALIGN
WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign
ELSE index_tuple_hdr_bm%maxalign
END
+ nulldatawidth + maxalign - CASE -- Add padding to the data to align on MAXALIGN
WHEN nulldatawidth = 0 THEN 0
WHEN nulldatawidth::integer%maxalign = 0 THEN maxalign
ELSE nulldatawidth::integer%maxalign
END
)::numeric AS nulldatahdrwidth, pagehdr, pageopqdata, is_na
FROM (
SELECT
i.nspname, i.tblname, i.idxname, i.reltuples, i.relpages, i.relam, a.attrelid AS table_oid,
current_setting('block_size')::numeric AS bs, fillfactor,
CASE -- MAXALIGN: 4 on 32bits, 8 on 64bits (and mingw32 ?)
WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8
ELSE 4
END AS maxalign,
/* per page header, fixed size: 20 for 7.X, 24 for others */
24 AS pagehdr,
/* per page btree opaque data */
16 AS pageopqdata,
/* per tuple header: add IndexAttributeBitMapData if some cols are null-able */
CASE WHEN max(coalesce(s.null_frac,0)) = 0
THEN 2 -- IndexTupleData size
ELSE 2 + (( 32 + 8 - 1 ) / 8) -- IndexTupleData size + IndexAttributeBitMapData size ( max num filed per index + 8 - 1 /8)
END AS index_tuple_hdr_bm,
/* data len: we remove null values save space using it fractionnal part from stats */
sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 1024)) AS nulldatawidth,
max( CASE WHEN a.atttypid = 'pg_catalog.name'::regtype THEN 1 ELSE 0 END ) > 0 AS is_na
FROM pg_attribute AS a
JOIN (
SELECT nspname, tbl.relname AS tblname, idx.relname AS idxname, idx.reltuples, idx.relpages, idx.relam,
indrelid, indexrelid, indkey::smallint[] AS attnum,
coalesce(substring(
array_to_string(idx.reloptions, ' ')
from 'fillfactor=([0-9]+)')::smallint, 90) AS fillfactor
FROM pg_index
JOIN pg_class idx ON idx.oid=pg_index.indexrelid
JOIN pg_class tbl ON tbl.oid=pg_index.indrelid
JOIN pg_namespace ON pg_namespace.oid = idx.relnamespace
WHERE pg_index.indisvalid AND tbl.relkind = 'r' AND idx.relpages > 0
) AS i ON a.attrelid = i.indexrelid
JOIN pg_stats AS s ON s.schemaname = i.nspname
AND ((s.tablename = i.tblname AND s.attname = pg_catalog.pg_get_indexdef(a.attrelid, a.attnum, TRUE)) -- stats from tbl
OR (s.tablename = i.idxname AND s.attname = a.attname))-- stats from functionnal cols
JOIN pg_type AS t ON a.atttypid = t.oid
WHERE a.attnum > 0
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9
) AS s1
) AS s2
JOIN pg_am am ON s2.relam = am.oid WHERE am.amname = 'btree'
) AS sub
INNER JOIN get_schema_names gt ON gt.schema_name = sub.nspname
WHERE NOT is_na
AND bs*(relpages-est_pages_ff) > 0
ORDER BY 2,3,4;
Above SQL statement is mostly build based on pg_index table and informations from PostgreSQL wiki https://wiki.postgresql.org/wiki/Index_Maintenance#Index_Bloat.
More about pg_index: https://www.postgresql.org/docs/current/catalog-pg-index.html
What it will produce: current database, schema name, table name and index name. For Indexes its size in bytes and “pretty” form, extra size (extra pages are subtracted), index fill factor and the size of index bloat (same as with bloat for tables).
Query Results
Query is limited to all schema apart of postgres and to results where bloat_size is greater than 0.
Summary
Above two queries, won’t be used daily. However they should be for you to use them when you need it. I would recommend to add them to you regular maintenance scripts (activities). I can think of sending response to certain mail address for further investigation if needed etc.
More about performance, and useful queries you can find on official PostgreSQL wiki’s under following link: https://wiki.postgresql.org/wiki/Performance_Optimization
Want to be up to date with new posts?
Use below form to join Data Craze Weekly Newsletter!