Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wHb4e-007Kzz-19 for pgsql-performance@arkaria.postgresql.org; Tue, 28 Apr 2026 05:33:32 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wHb4c-00GEmQ-0G for pgsql-performance@arkaria.postgresql.org; Tue, 28 Apr 2026 05:33:30 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wHb4b-00GEmI-1y for pgsql-performance@lists.postgresql.org; Tue, 28 Apr 2026 05:33:29 +0000 Received: from mail-wm1-x334.google.com ([2a00:1450:4864:20::334]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wHb4Y-000000038lA-3bD6 for pgsql-performance@lists.postgresql.org; Tue, 28 Apr 2026 05:33:28 +0000 Received: by mail-wm1-x334.google.com with SMTP id 5b1f17b1804b1-488a8ca4aadso152016985e9.3 for ; Mon, 27 Apr 2026 22:33:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cybertec.at; s=google; t=1777354406; x=1777959206; darn=lists.postgresql.org; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:to:from:subject:message-id:from:to:cc:subject:date :message-id:reply-to; bh=fri30hjCnTQKNOEXPWQLgdx13E2/GnI1f8RkY4n7Hww=; b=jX9yJQGqANDk3jGV0DRoJtehF/vAZq5GW3rqvKZxSVtLuC11nIj+pb3qryZliG6wEG mY9su0uf7CwSf+c9MHIjuhcyDHbUtjODoR25wkAG7cHet8uXgvwSk9i4QYsX3jCFn5tJ MZaGmvYM9vYubUJcDb8+RPoYJrtB03C1x1tchV+VbzDo7d+K9E08Hx+7JMwOQqFzQS99 mCx3TINAjcIvrPHGauA9o1NkBweBnT6GjSvMAgvvyKZqRofivyD5h/v16vHlM7p2mvOY Vq8ZGxLbjwBIXfKZcsmzkrwRqP9kBngdHoxp1kOS2lzPIOvUsO8k34jpkYW7XK/0a1UW /LHQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1777354406; x=1777959206; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:to:from:subject:message-id:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=fri30hjCnTQKNOEXPWQLgdx13E2/GnI1f8RkY4n7Hww=; b=KO/Sw3PYGDb5JuENTK5axIoqOSyDX+fEUnW9q0GlNRkNP7yrJ8CUKKA5Pnl4onuRuk 7cSHtZqoL8z1qCwxumvzi69Vdfhf1XX0gGRVqwIivaWFC29xW/rZpHLEM6sZPGmEMaAL xdT+jUuEAY3/JBOCRb8IDfbEK7fpL34c44VraiQ753sflR+XbRj8InMxNzaXuh4AlNDC 5JIGxTgMZRNjOED9Zwa7GcRAvHlG/TpAiH/ptUCPs8iF2iFA0xdnmZt6TQ34Qf/hbg8r EV8v984jFyd6OSYtQY0TFLTRzQvOTzZ3WQJ74bustg+HNqD3N9NsbwdXtWyIm8NZpUPb 8Hug== X-Forwarded-Encrypted: i=1; AFNElJ/+KyVhlbJzZAe/9G1ITZJUr9b8vgMi2X7z/Cy7SGqeyyBMJw6DHMOUDqeFeJs4OEfdBOFMTWnLBcmTNQ6R9disfA==@lists.postgresql.org X-Gm-Message-State: AOJu0YwIiw5zGNsQSKzNLN5gCvn5KrYTazDkq+1+vQuDUU0qxsdtap1R 9BCEcPQRWlCA8qf/GZhc4eys0lK9tNGJYuVm0sq/nKXBE881dPfl1OZo5Z6D1OtA4BI= X-Gm-Gg: AeBDiesEggjt/bkwri7p7wSVOlC7poDMPL2yx7MMOswbTbPYqi2dVs6QvtnM3ANiEON im+UithSl826DE0w2J/FZz5aCGfeQBSKrTLoiMo8KgYf1f5a+0bFvynOcwRcvoru8QBtU+ZZdJi W4rTNFY90jYYZwAfurMdFJ72/Gsvhj0zqhEgsy38JgatKeCvaR4mqFtLBG85gst2r0jMfh0Mepu +bOmrcTjaCVPi5CzzQxkH6Uheh/HfC/kvkWjfffkdhQTkK2LXQZlZ7Hst0jDToS/+UxeviFlNQ8 qLG5rZ4z0X/u1o60JpeJXXrRByf7RGRCg7r7JJkSY6Xz4POlKaeJkhH6tG24VvgEw7i2K1YLrPF 8bKizBfQ2z5jxXzWL/hJ2LEmAlDtcHu2VlypLt9NqHG6EDjOxaTIqalYUMuUCU9OzmlcGsuxmyX A+1plnIrMVtEPlGZWSZbGfwybF+X933R75CoCiL7YSutSLdKOt3w+nE9q+rmWefLeJFg== X-Received: by 2002:a05:600c:3b15:b0:487:1108:48af with SMTP id 5b1f17b1804b1-48a77ae5412mr24428825e9.4.1777354405770; Mon, 27 Apr 2026 22:33:25 -0700 (PDT) Received: from laurenz.albe-K4N0CV00F97414D ([2001:871:270:4fd5:603f:45c2:9807:c104]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-48a77442889sm11178625e9.0.2026.04.27.22.33.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 27 Apr 2026 22:33:25 -0700 (PDT) Message-ID: Subject: Re: Query performance From: Laurenz Albe To: "Dirschel, Steve-CW" , "pgsql-performance@lists.postgresql.org" Date: Tue, 28 Apr 2026 07:33:24 +0200 In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.58.3 (3.58.3-1.fc43) MIME-Version: 1.0 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Mon, 2026-04-27 at 15:52 +0000, Dirschel, Steve-CW wrote: > Aurora Postgres version 17.4. Aurora works quite differently, as far as storage is concerned, so you may = be suffering from some peculiarity of that closed source software. > Table in question: >=20 >=20 > \d poslog_publisher_rms_stage > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Tabl= e "public.poslog_publisher_rms_stage" > =C2=A0 =C2=A0 =C2=A0 Column =C2=A0 =C2=A0 =C2=A0 | =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 Type =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | Collation | Nullabl= e | Default > -------------------+--------------------------+-----------+----------+---= ------ > =C2=A0stage_id =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| uuid =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 | not null | > =C2=A0status =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| character varying= (100) =C2=A0 | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0| > =C2=A0message_body =C2=A0 =C2=A0 =C2=A0| text =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 | not null | > =C2=A0error_code =C2=A0 =C2=A0 =C2=A0 =C2=A0| character varying(100) =C2= =A0 | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0| > =C2=A0error_category =C2=A0 =C2=A0| character varying(100) =C2=A0 | =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| > =C2=A0error_message =C2=A0 =C2=A0 | text =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| > =C2=A0error_retry_count | integer =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0| =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0| 0 > =C2=A0create_date =C2=A0 =C2=A0 =C2=A0 | timestamp with time zone | =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | not null | now() > =C2=A0modified_date =C2=A0 =C2=A0 | timestamp with time zone | =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 | not null | now() > Indexes: > =C2=A0 =C2=A0 "poslog_publisher_rms_stage_pkey" PRIMARY KEY, btree (stage= _id) > =C2=A0 =C2=A0 "idx_poslog_publisher_stage_create_date_col" btree (create_= date) > =C2=A0 =C2=A0 "idx_poslog_publisher_stage_status_error_retry_count_modi_d= ate_c" btree (status, error_retry_count, modified_date) > Referenced by: > =C2=A0 =C2=A0 TABLE "poslog_publisher_rms_detail" CONSTRAINT "fk_poslog_p= ublisher_detail_stage_id" FOREIGN KEY (stage_id) REFERENCES poslog_publishe= r_rms_stage(stage_id) > Publications: > =C2=A0 =C2=A0 =E2=80=9Csashpsrms_publication" >=20 > The table is constantly getting loaded into. =C2=A0Rows are inserted with= a status ready and > then there is a different process looking for that status and will update= to processed > after processing the row. =C2=A0We have multiple tables like this. >=20 > Then every 2 hours a different process runs this query looking for failed= or unprocessed rows: >=20 > select > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 ppse.stage_id as stageId, > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 ppse.status as status, > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 ppse.message_body as messageBody > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 fro= m > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 poslog_publisher_rms_stage ppse > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 whe= re > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 ppse.status in ('UNPROCESSED','FAILED') > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 and ppse.error_retry_count < 3 > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ord= er by > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 ppse.create_date > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 lim= it 100; >=20 > If I run that query with explain it is doing a ton of work to find 0 rows= . =C2=A0The index it > uses seems appropriate for the query. The index can be used, but it is far from perfect. The ideal index would b= e: CREATE INDEX ON poslog_publisher_rms_stage (create_date) WHERE status in ('UNPROCESSED','FAILED') AND error_retry_count < 3; > =C2=A0=C2=A0=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0QUERY PLAN > -------------------------------------------------------------------------= ---------------------------------------------------------------------------= --------------------------------------------------------------------- > =C2=A0Limit =C2=A0(cost=3D10.92..10.93 rows=3D1 width=3D1019) (actual tim= e=3D66566.823..66566.824 rows=3D0 loops=3D1) > =C2=A0 =C2=A0Buffers: shared hit=3D1509768 read=3D2011479 > =C2=A0 =C2=A0I/O Timings: shared read=3D79792.017 > =C2=A0 =C2=A0-> =C2=A0Sort =C2=A0(cost=3D10.92..10.93 rows=3D1 width=3D10= 19) (actual time=3D66566.821..66566.821 rows=3D0 loops=3D1) > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Sort Key: create_date > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Sort Method: quicksort =C2=A0Memory: 25= kB > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Buffers: shared hit=3D1509768 read=3D20= 11479 > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0I/O Timings: shared read=3D79792.017 > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0Index Scan using idx_poslog_pu= blisher_stage_status_error_retry_count_modi_date_c on poslog_publisher_rms_= stage ppse =C2=A0(cost=3D0.57..10.91 rows=3D1 width=3D1019) (actual time=3D= 66566.761..66566.761 rows=3D0 loops=3D1) > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Index Cond: (((sta= tus)::text =3D ANY ('{UNPROCESSED,FAILED}'::text[])) AND (error_retry_count= < 3)) > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Buffers: shared hi= t=3D1509765 read=3D2011479 > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0I/O Timings: share= d read=3D79792.017 > =C2=A0Planning: > =C2=A0 =C2=A0Buffers: shared hit=3D195 read=3D1 > =C2=A0 =C2=A0I/O Timings: shared read=3D1.038 > =C2=A0Planning Time: 2.909 ms > =C2=A0Execution Time: 66581.498 ms >=20 > The query did 3.5 million block reads when scanning the index of which 1.= 5 million were > in memory and 2 million were from disk. =C2=A05 seconds later I ran the e= xact same query again: >=20 > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 QUERY PLAN > -------------------------------------------------------------------------= ---------------------------------------------------------------------------= --------------------------------------------------------------- > =C2=A0Limit =C2=A0(cost=3D10.92..10.93 rows=3D1 width=3D1019) (actual tim= e=3D23.589..23.591 rows=3D0 loops=3D1) > =C2=A0 =C2=A0Buffers: shared hit=3D18736 > =C2=A0 =C2=A0-> =C2=A0Sort =C2=A0(cost=3D10.92..10.93 rows=3D1 width=3D10= 19) (actual time=3D23.588..23.589 rows=3D0 loops=3D1) > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Sort Key: create_date > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Sort Method: quicksort =C2=A0Memory: 25= kB > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Buffers: shared hit=3D18736 > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0Index Scan using idx_poslog_pu= blisher_stage_status_error_retry_count_modi_date_c on poslog_publisher_rms_= stage ppse =C2=A0(cost=3D0.57..10.91 rows=3D1 width=3D1019) (actual time=3D= 23.583..23.584 rows=3D0 loops=3D1) > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Index Cond: (((sta= tus)::text =3D ANY ('{UNPROCESSED,FAILED}'::text[])) AND (error_retry_count= < 3)) > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Buffers: shared hi= t=3D18736 > =C2=A0Planning Time: 0.118 ms > =C2=A0Execution Time: 23.628 ms >=20 > Now it only did 18k block reads all in memory. =C2=A0It used the same ind= ex, it also > returned 0 rows. =C2=A0Between those 2 runs I looked at pg_stat_user_tabl= es and could see the > n_tup_ins increased by 13, n_tup_del increased by 13, n_live_tup increase= d by 13, and > n_dead_tup increased by 13. =C2=A0n_live_tup was 96 million and n_dead_tu= p was 8.4 million. > 18k logical reads to find 0 rows is still high but I believe that is most= likely caused > by the 8.4 million n_dead_tups. >=20 > What is going on here where this query has to do 3+ million block reads t= o find 0 rows? > And how is it possible when I run the query 2 times in a row the logical = reads from the > 2nd run comes down significantly? =C2=A0Is this somehow related to determ= ining if rows are > visible or something like that? The only way I can imagine this happening in PostgreSQL is if the first exe= cution kills a lot of index tuples (https://www.cybertec-postgresql.com/en/killed-= index-tuples/). > When I waited 15 minutes between runs the inserted/updated rows only incr= eased by > 10.3k yet total block reads increased by 3 million. Now that seems to speak against the above theory, so you are suffering from= some Amazon- specific behavior. Yours, Laurenz Albe