SQL Tips & Tricks - PostgreSQL Queries - Part 2

This is part 2 of PostgreSQL Queries Tips & Tricks from performance and overall backend side of your database. In Part 1 we have covered, duplicated, rarely used indexes and general amount of elements in your DB.

This time we will dive into the details of tables and indexes. Below you can find 2 queries to rule them all. Enjoy the read!

Get All Tables Details And Wasted Bytes

-- Get all tables and indexes size and bloat size
WITH get_schema_names AS (
	SELECT schema_name
	  FROM information_schema.schemata 
	 WHERE schema_owner <> 'postgres'
) SELECT DISTINCT 
         current_database(), 
         schemaname  as schema_name, 
         tablename   as table_name, 
         total_bytes as object_total_size_bytes, 
         pg_size_pretty(total_bytes) as object_total_size,
         (total_bytes-index_bytes-COALESCE(toast_bytes,0)) AS table_size_bytes,
         pg_size_pretty((total_bytes-index_bytes-COALESCE(toast_bytes,0))) AS table_size,
  	     ROUND((CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages::FLOAT/otta END)::NUMERIC,1) AS tbloat,
  	     CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::BIGINT END AS wastedbytes,
  	     pg_size_pretty(CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::BIGINT END) as wasted_size
    FROM (
          SELECT schemaname, 
                 tablename, 
         		 cc.reltuples, 
         		 cc.relpages, 
         		 bs, 
    	   		 pg_total_relation_size(cc.oid) AS total_bytes,
         		 pg_indexes_size(cc.oid) AS index_bytes,
         		 pg_total_relation_size(cc.reltoastrelid) AS toast_bytes,
         		 CEIL((cc.reltuples*((datahdr+ma-(CASE 
         											WHEN datahdr%ma=0 THEN ma 
         											ELSE datahdr%ma
         										  END))+nullhdr2+4))/(bs-20::FLOAT)) AS otta
  		    FROM (  SELECT ma,
				    	   bs,
				    	   schemaname,
				    	   tablename,
				      	   (datawidth+(hdr+ma-(CASE WHEN hdr%ma=0 THEN ma ELSE hdr%ma END)))::NUMERIC AS datahdr,
				           (maxfracsum*(nullhdr+ma-(CASE WHEN nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
    				  FROM (SELECT schemaname, 
					      		   tablename, 
					      		   hdr, 
					      		   ma, 
					      		   bs,
					               SUM((1-null_frac)*avg_width) AS datawidth,
					               MAX(null_frac) AS maxfracsum,
					               hdr+(SELECT 1+COUNT(*)/8
					          		     FROM pg_stats s2
					          		    WHERE null_frac<>0 
					          		      AND s2.schemaname = s.schemaname 
					          		      AND s2.tablename = s.tablename) AS nullhdr
      					   FROM pg_stats s, (SELECT
									           (SELECT current_setting('block_size')::NUMERIC) AS bs,
									          		   CASE WHEN SUBSTRING(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
									          		   CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
									              FROM (SELECT version() AS v) AS foo
									      ) AS constants
						GROUP BY 1,2,3,4,5
					) AS foo
			) AS rs
  		JOIN pg_class cc ON cc.relname = rs.tablename
  		JOIN pg_namespace nn ON cc.relnamespace = nn.oid 
  							AND nn.nspname = rs.schemaname 
  							AND nn.nspname <> 'information_schema'
   LEFT JOIN pg_index i ON indrelid = cc.oid
   LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml 
INNER JOIN get_schema_names gsn ON gsn.schema_name = sml.schemaname
WHERE CASE 
		WHEN relpages < otta THEN 0 
		ELSE bs*(sml.relpages-otta)::BIGINT 
	  END > 0 
ORDER BY wastedbytes DESC;

This query is a complex one. To better understand it, please go through pg_class table description as a starter: https://www.postgresql.org/docs/current/catalog-pg-class.html

Goal of this query is to show all table, (except ones in postgres schema) with size of wasted bytes in table greater than 0. For instance if you have removed rows from your table, and have done the vacuum then these deleted rows will still be there on the backend, treated as wasted bytes (oversimplification a bit but still to the point).

Result will give you, currently selected database, schema name and table name, total size of object in bytes and in “pretty” form (kB, MB etc.)  – together with indexes etc, table size in bytes and “pretty” form, table bloat and wasted bytes. You can find good description of bloat under this article (https://dba.stackexchange.com/questions/126258/what-is-table-bloating-in-databases).

Get All Indexes Details And Wasted Bytes

-- Get Information About Indexes size and Bloat Size - how many "dead" rows are under index
WITH get_schema_names AS (
  SELECT schema_name
    FROM information_schema.schemata
   WHERE schema_owner <> 'postgres'
) SELECT current_database(),
       nspname AS schemaname,
       tblname,
       idxname,
       bs*(relpages)::bigint as real_size_bytes,
       pg_size_pretty(bs*(relpages)::bigint) AS real_size,
        bs*(relpages-est_pages)::bigint AS extra_size,
       100 * (relpages-est_pages)::float / relpages AS extra_ratio,
        fillfactor,
        bs*(relpages-est_pages_ff) AS bloat_size_bytes,
        pg_size_pretty(bs*(relpages-est_pages_ff)::bigint) AS bloat_size,
        100 * (relpages-est_pages_ff)::float / relpages AS bloat_ratio
  FROM (
    SELECT coalesce(1 +
         ceil(reltuples/floor((bs-pageopqdata-pagehdr)/(4+nulldatahdrwidth)::float)), 0 -- ItemIdData size + computed avg size of a tuple (nulldatahdrwidth)
      ) AS est_pages,
      coalesce(1 +
         ceil(reltuples/floor((bs-pageopqdata-pagehdr)*fillfactor/(100*(4+nulldatahdrwidth)::float))), 0
      ) AS est_pages_ff,
      bs, nspname, table_oid, tblname, idxname, relpages, fillfactor, is_na
   FROM (
      SELECT maxalign, bs, nspname, tblname, idxname, reltuples, relpages, relam, table_oid, fillfactor,
        ( index_tuple_hdr_bm +
            maxalign - CASE -- Add padding to the index tuple header to align on MAXALIGN
              WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign
              ELSE index_tuple_hdr_bm%maxalign
            END
          + nulldatawidth + maxalign - CASE -- Add padding to the data to align on MAXALIGN
              WHEN nulldatawidth = 0 THEN 0
              WHEN nulldatawidth::integer%maxalign = 0 THEN maxalign
              ELSE nulldatawidth::integer%maxalign
            END
        )::numeric AS nulldatahdrwidth, pagehdr, pageopqdata, is_na
       FROM (
        SELECT
          i.nspname, i.tblname, i.idxname, i.reltuples, i.relpages, i.relam, a.attrelid AS table_oid,
          current_setting('block_size')::numeric AS bs, fillfactor,
          CASE -- MAXALIGN: 4 on 32bits, 8 on 64bits (and mingw32 ?)
            WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8
            ELSE 4
          END AS maxalign,
          /* per page header, fixed size: 20 for 7.X, 24 for others */
          24 AS pagehdr,
          /* per page btree opaque data */
          16 AS pageopqdata,
          /* per tuple header: add IndexAttributeBitMapData if some cols are null-able */
          CASE WHEN max(coalesce(s.null_frac,0)) = 0
            THEN 2 -- IndexTupleData size
            ELSE 2 + (( 32 + 8 - 1 ) / 8) -- IndexTupleData size + IndexAttributeBitMapData size ( max num filed per index + 8 - 1 /8)
          END AS index_tuple_hdr_bm,
          /* data len: we remove null values save space using it fractionnal part from stats */
          sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 1024)) AS nulldatawidth,
          max( CASE WHEN a.atttypid = 'pg_catalog.name'::regtype THEN 1 ELSE 0 END ) > 0 AS is_na
        FROM pg_attribute AS a
          JOIN (
            SELECT nspname, tbl.relname AS tblname, idx.relname AS idxname, idx.reltuples, idx.relpages, idx.relam,
              indrelid, indexrelid, indkey::smallint[] AS attnum,
              coalesce(substring(
                array_to_string(idx.reloptions, ' ')
                 from 'fillfactor=([0-9]+)')::smallint, 90) AS fillfactor
            FROM pg_index
              JOIN pg_class idx ON idx.oid=pg_index.indexrelid
              JOIN pg_class tbl ON tbl.oid=pg_index.indrelid
              JOIN pg_namespace ON pg_namespace.oid = idx.relnamespace
            WHERE pg_index.indisvalid AND tbl.relkind = 'r' AND idx.relpages > 0
          ) AS i ON a.attrelid = i.indexrelid
          JOIN pg_stats AS s ON s.schemaname = i.nspname
            AND ((s.tablename = i.tblname AND s.attname = pg_catalog.pg_get_indexdef(a.attrelid, a.attnum, TRUE)) -- stats from tbl
            OR   (s.tablename = i.idxname AND s.attname = a.attname))-- stats from functionnal cols
          JOIN pg_type AS t ON a.atttypid = t.oid
        WHERE a.attnum > 0
        GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9
      ) AS s1
    ) AS s2
      JOIN pg_am am ON s2.relam = am.oid WHERE am.amname = 'btree'
  ) AS sub
INNER JOIN get_schema_names gt ON gt.schema_name = sub.nspname
  WHERE NOT is_na
    AND bs*(relpages-est_pages_ff) > 0
ORDER BY 2,3,4;

Above SQL statement is mostly build based on pg_index table and informations from PostgreSQL wiki https://wiki.postgresql.org/wiki/Index_Maintenance#Index_Bloat.

More about pg_index: https://www.postgresql.org/docs/current/catalog-pg-index.html

What it will produce: current database, schema name, table name and index name. For Indexes its size in bytes and “pretty” form, extra size (extra pages are subtracted), index fill factor and the size of index bloat (same as with bloat for tables).

Query is limited to all schema apart of postgres and to results where bloat_size is greater than 0.

Summary

Above two queries, won’t be used daily. However they should be for you to use them when you need it. I would recommend to add them to you regular maintenance scripts (activities). I can think of sending response to certain mail address for further investigation if needed etc.

 

More about performance, and useful queries you can find on official PostgreSQL wiki’s under following link: https://wiki.postgresql.org/wiki/Performance_Optimization

Thanks,

Krzysiek

 

Leave a Comment

Twój adres e-mail nie zostanie opublikowany. Wymagane pola są oznaczone *