Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tVuqB-00CbTP-Li for pgsql-performance@arkaria.postgresql.org; Thu, 09 Jan 2025 15:53:00 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tVuqB-003XwY-2x for pgsql-performance@arkaria.postgresql.org; Thu, 09 Jan 2025 15:52:58 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tVuqA-003Xw2-JD for pgsql-performance@lists.postgresql.org; Thu, 09 Jan 2025 15:52:58 +0000 Received: from mail-ot1-x336.google.com ([2607:f8b0:4864:20::336]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1tVuq8-000jG8-0B for pgsql-performance@postgresql.org; Thu, 09 Jan 2025 15:52:57 +0000 Received: by mail-ot1-x336.google.com with SMTP id 46e09a7af769-71e2dccdb81so665312a34.1 for ; Thu, 09 Jan 2025 07:52:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1736437976; x=1737042776; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=qeGI0IokFnWZtQ5pXX2leUn+dMvXCd6XNhq+kWXxdz8=; b=ACBbDwpbr0mvpFzDr5+M6JZNa9nu5uD27YDl/0AX2N2i7mQ8lWpBMvMAzSHmbiIkjw bTWv7KPXpwSqDAtQwxbVJtEH6I9biRckz+ltrf9W7FrHDg37V7cS8jRlGhUKCano47SA PYYThvYZY228nBTpzmi6/PJ6rrCmoK28wBMUbfxbjjyZHdsAJAqAYQEyCmwRugttc9qb XlFA/rfKt+bybp9tEwSbAUYDZAQJXP0jTuPghOZ45QFqG3/tRXLOqSOBQ/Ldk7tX3iPN ubu1Dw5s8o6AuHKI5fcrjcbv89Y/lFVdbOB4ZzpJCjaMqrqBct7/rj5VNk1bp79LmeCd VVKQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736437976; x=1737042776; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=qeGI0IokFnWZtQ5pXX2leUn+dMvXCd6XNhq+kWXxdz8=; b=f4cati8OpTcK+9V9MylyiZOhJotxBY2qUiREZgcN05sjTw+RlAmHMD4PidZRFwCd3P 4ltQm0v18R4fmrJiSAQmZBT5rC8cqhzQwG3+92C/6NkL7owBsEB6wb8pXkwxtsqHDVH9 4AkeMSvAPtjpUK5Ote63YG443V2ogYW8asvMHEsm/gk4jGQvE9HY8Tvv72ue9nVWP3pQ oI1JERDqZDaLrmptU6X650r5MKGZPqiGKA/UvP4AEzwOZlgQNMQLefqrxzEhpBzBM0UJ SU1O8/m84WbYfBCcs27XnhvIHtACweSD2qxHavbdmKlVW6d4eYMGudvfPc1+eKxXRIcF 27SA== X-Gm-Message-State: AOJu0Yxmv457aw2roXyl//nUikdjPBSVC6WOWZrrwSNeqoDNmUsyMsZ8 i/AxjxLG8zbQsKiOJtG5QqFv57xfz58aV7fiqTYE0LsolxNokHq4qkqUT+yyr80yeXWQ7SL4GxT LTAKmxXfdnslKlhHgdxpT1D+jjQY= X-Gm-Gg: ASbGncsw2X/eatxthvMHuCXKN2cAV+HOeLgX7udv7g8iVhh0owkE2+HsVQ83KJJPeWL hZ8AVAVRPE6KWw4vxkcmQENMKrM22vEZZnuRXvA== X-Google-Smtp-Source: AGHT+IG09p2bpHxGHVQj2niUO0WrdGmq81eT4o8tJgMyMDbMe2qsQDnT803NVnyPZQHcq/bpIiEVtePrC9zMNaLFaWo= X-Received: by 2002:a05:6830:718c:b0:71d:ed5f:f181 with SMTP id 46e09a7af769-721e2ecd789mr4916256a34.22.1736437975394; Thu, 09 Jan 2025 07:52:55 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: David Mullineux Date: Thu, 9 Jan 2025 15:52:45 +0000 X-Gm-Features: AbW1kvaOCLDLHZG8qSMX9g-9IHwdqTUqGMQ2oSbfa6ketJksBfwXMDUpi682Sh4 Message-ID: Subject: Re: Unexpected Seq Scan's when using MERGE WHEN NOT MATCHED BY SOURCE To: Feike Steenbergen Cc: pgsql-performance@postgresql.org Content-Type: multipart/alternative; boundary="000000000000b2b31d062b47f84d" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000b2b31d062b47f84d Content-Type: text/plain; charset="UTF-8" On Thu, 9 Jan 2025, 12:26 Feike Steenbergen, wrote: > I'm trying to change a few applications to fully use this, as PostgreSQL > 17 added this support. > > The application does something like this: > > - fetch information from a source system and store it in a temp table > - run a MERGE with a table (target) in this database, > updating, inserting and deleting in a single statement > - the target table holds information for multiple systems > > The temp table (source) doesn't contain the identifier for the system, as > we can inject that at runtime. > > This is the shape of the statement: > > MERGE INTO > merge_target AS t > USING > merge_source AS s ON (t.time = s.time AND t.device_id = $1) > WHEN MATCHED THEN > UPDATE set > value = s.value > WHEN NOT MATCHED THEN > INSERT (device_id, time, value) VALUES ($1, time, value) > WHEN NOT MATCHED BY SOURCE > AND t.device_id = $1 > THEN DELETE; > > If we run this however, there is always a Seq Scan against merge_target, > whereas the filter of `AND t.device_id = $1` uses a (Bitmap) Index scan > in other types of queries. > > Previously we would - in a CTE - do a delete *and then* the merge. > > Which performed much better as the DELETE would not do a Seq Scan. > > Attached a full reproducing test case. Some numbers: > > Previously, DELETE in CTE, then merge: > Planning Time: 6.700 ms > Execution Time: 7.473 ms > > Using the MERGE WHEN MATCHED ON SOURCE THEN DELETE: > Planning Time: 0.994 ms > Execution Time: 65.695 ms > > My questions are: > > - is the Seq Scan expected by others in this mailing list as well? > - Is it required to do the Seq Scan? > - is this something that could be optimized? > > Kind regards, > > Feike Steenbergen > An excellent post. I wish all posters provided nice contained example cases like this one. I note ,in the documentation, that a Warning box got added which says this... If both WHEN NOT MATCHED BY SOURCE and WHEN NOT MATCHED [BY TARGET] clauses are specified, the MERGE command will perform a FULL join between data_source and the target table. For this to work, at least one join_condition subexpression must use an operator that can support a hash join, or all of the subexpressions must use operators that can support a merge join. This could be a hint as to the reason maybe ? The NOT MATCHED BY SOURCE is new feature to 17. I'm looking forward to others replies here. > --000000000000b2b31d062b47f84d Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


On Thu, 9 Jan 2025, 12:26 Feike = Steenbergen, <feikesteenbe= rgen@gmail.com> wrote:
I'm trying to change a few applications to fully use this, = as PostgreSQL 17 added this support.

The application does something = like this:

- fetch information from a source system and store it in = a temp table
- run a MERGE with a table (target) in this database,
= =C2=A0 updating, inserting and deleting in a single statement
- the targ= et table holds information for multiple systems

The temp table (sour= ce) doesn't contain the identifier for the system, as
we can inject = that at runtime.

This is the shape of the statement:

=C2=A0 = =C2=A0 MERGE INTO
=C2=A0 =C2=A0 =C2=A0 =C2=A0 merge_target AS t
=C2= =A0 =C2=A0 USING
=C2=A0 =C2=A0 =C2=A0 =C2=A0 merge_source AS s ON (t.tim= e =3D s.time AND t.device_id =3D $1)
=C2=A0 =C2=A0 WHEN MATCHED THEN
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 UPDATE set
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 value =3D s.value
=C2=A0 =C2=A0 WHEN NOT MATCHED THEN
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 INSERT (device_id, time, value) VALUES ($1, time, = value)
=C2=A0 =C2=A0 WHEN NOT MATCHED BY SOURCE
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 AND t.device_id =3D $1
=C2=A0 =C2=A0 =C2=A0 =C2=A0 THEN DELETE;
If we run this however, there is always a Seq Scan against merge_targ= et,
whereas the filter of `AND t.device_id =3D $1` uses a (Bitmap) Index= scan
in other types of queries.

Previously we would - in a CTE -= do a delete *and then* the merge.

Which performed much better as th= e DELETE would not do a Seq Scan.

Attached a full reproducing test c= ase. Some numbers:

Previously, DELETE in CTE, then merge:
=C2=A0 = =C2=A0 Planning Time: 6.700 ms
=C2=A0 =C2=A0 Execution Time: 7.473 ms
Using the MERGE WHEN MATCHED ON SOURCE THEN DELETE:
=C2=A0 =C2=A0 P= lanning Time: 0.994 ms
=C2=A0 =C2=A0 Execution Time: 65.695 ms

My= questions are:

- is the Seq Scan expected by others in this mailing= list as well?
- Is it required to do the Seq Scan?
- is this somethi= ng that could be optimized?

Kind regards,

Feike Steenbergen

= An excellent post. I wish all posters provided nice contained example cases= like this one.

I note ,= in the documentation, that a Warning box got added which says this...
=

If both WHEN NOT MATCHED BY S= OURCE and WHEN NOT MATCHED [BY TARGET] clauses are specified, the MERGE com= mand will perform a FULL join between data_source and the target table. For= this to work, at least one join_condition subexpression must use an operat= or that can support a hash join, or all of the subexpressions must use oper= ators that can support a merge join.


This could be a hint as to the rea= son maybe ? The NOT MATCHED BY SOURCE is new feature to 17. I'm looking= forward to others replies here.
--000000000000b2b31d062b47f84d--