public inbox for pgsql-performance@postgresql.org
help / color / mirror / Atom feedFrom: David Mullineux <dmullx@gmail.com>
To: Feike Steenbergen <feikesteenbergen@gmail.com>
Cc: pgsql-performance@postgresql.org
Subject: Re: Unexpected Seq Scan's when using MERGE WHEN NOT MATCHED BY SOURCE
Date: Thu, 9 Jan 2025 15:52:45 +0000
Message-ID: <CAGsyd8UeMRtmgrQZ1Tg7gMztr=pvF4Dp8mp0kGHzc9ct04FQhw@mail.gmail.com> (raw)
In-Reply-To: <CAK_s-G0NrC_KH7kn85arfqkdzvs80GOCCKvz9YbU2=E94qfdPA@mail.gmail.com>
References: <CAK_s-G0NrC_KH7kn85arfqkdzvs80GOCCKvz9YbU2=E94qfdPA@mail.gmail.com>
On Thu, 9 Jan 2025, 12:26 Feike Steenbergen, <feikesteenbergen@gmail.com>
wrote:
> I'm trying to change a few applications to fully use this, as PostgreSQL
> 17 added this support.
>
> The application does something like this:
>
> - fetch information from a source system and store it in a temp table
> - run a MERGE with a table (target) in this database,
> updating, inserting and deleting in a single statement
> - the target table holds information for multiple systems
>
> The temp table (source) doesn't contain the identifier for the system, as
> we can inject that at runtime.
>
> This is the shape of the statement:
>
> MERGE INTO
> merge_target AS t
> USING
> merge_source AS s ON (t.time = s.time AND t.device_id = $1)
> WHEN MATCHED THEN
> UPDATE set
> value = s.value
> WHEN NOT MATCHED THEN
> INSERT (device_id, time, value) VALUES ($1, time, value)
> WHEN NOT MATCHED BY SOURCE
> AND t.device_id = $1
> THEN DELETE;
>
> If we run this however, there is always a Seq Scan against merge_target,
> whereas the filter of `AND t.device_id = $1` uses a (Bitmap) Index scan
> in other types of queries.
>
> Previously we would - in a CTE - do a delete *and then* the merge.
>
> Which performed much better as the DELETE would not do a Seq Scan.
>
> Attached a full reproducing test case. Some numbers:
>
> Previously, DELETE in CTE, then merge:
> Planning Time: 6.700 ms
> Execution Time: 7.473 ms
>
> Using the MERGE WHEN MATCHED ON SOURCE THEN DELETE:
> Planning Time: 0.994 ms
> Execution Time: 65.695 ms
>
> My questions are:
>
> - is the Seq Scan expected by others in this mailing list as well?
> - Is it required to do the Seq Scan?
> - is this something that could be optimized?
>
> Kind regards,
>
> Feike Steenbergen
>
An excellent post. I wish all posters provided nice contained example cases
like this one.
I note ,in the documentation, that a Warning box got added which says
this...
If both WHEN NOT MATCHED BY SOURCE and WHEN NOT MATCHED [BY TARGET] clauses
are specified, the MERGE command will perform a FULL join between
data_source and the target table. For this to work, at least one
join_condition subexpression must use an operator that can support a hash
join, or all of the subexpressions must use operators that can support a
merge join.
This could be a hint as to the reason maybe ? The NOT MATCHED BY SOURCE is
new feature to 17. I'm looking forward to others replies here.
>
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: pgsql-performance@postgresql.org
Cc: dmullx@gmail.com, feikesteenbergen@gmail.com
Subject: Re: Unexpected Seq Scan's when using MERGE WHEN NOT MATCHED BY SOURCE
In-Reply-To: <CAGsyd8UeMRtmgrQZ1Tg7gMztr=pvF4Dp8mp0kGHzc9ct04FQhw@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox