public inbox for pgsql-performance@postgresql.org  
help / color / mirror / Atom feed
From: Arne Roland <A.Roland@index.de>
To: Oliver Mattos <omattos@gmail.com>
To: pgsql-performance@postgresql.org <pgsql-performance@postgresql.org>
Subject: Re: Query planner gaining the ability to replanning after start of query execution.
Date: Mon, 13 Nov 2017 20:06:20 +0000
Message-ID: <252cab3895894337a3a88275f423fe0b@index.de> (raw)
In-Reply-To: <CAER07O05V+3Sut=--F0Cagve81Rj7dm4fDdsUoS7NiuS+sU4hw@mail.gmail.com>
References: <CAER07O05V+3Sut=--F0Cagve81Rj7dm4fDdsUoS7NiuS+sU4hw@mail.gmail.com>
List-Unsubscribe:  <mailto:majordomo@postgresql.org?body=unsub%20pgsql-performance>

Hello,

I'd love to have some sort of dynamic query feedback, yet it's very complicated to do it right. I am not convinced that changing the plan during a single execution is the right way to do it, not only because it sounds intrusive to do crazy things in the executor, but also because don't understand why the new plan should be any better than the old one. Can you be more elaborate how you'd want to go about it?

In your example (which presumably just has a single relation), we have no notion of whether the scan returns no rows because we were unlucky, because just the first few pages were empty of matching rows (which in my experience happens more often), or because the cardinality estimation is wrong. Even if the cardinality estimation is wrong, we have no notion of which predicate or predicate combination actually caused the misestimation. If the first few pages where empty, the same might happen with every order (so also with every available indexscan). Imagine a very simple seqscan plan of 
select * from mytab where a = 3 and b = 40 limit 1
Even if we know the cardinality is overestimated, we have no idea whether the cardinality of a = 3 or b = 40 is wrong or they just correlate, so there is no notion of which is actually the cheapest plan. Usual workaround for most of these queries is to add an order by (which has the nice addition of having a deterministic result) with an appropriate complex index, usually resulting in indexscans.

While we actually know more after the first execution of a nodes like materialize, sort or hash nodes, I rarely encounter materialize nodes in the wild. Consequently that is the place where the work is usually already done, which is especially true with the hash node. Even though it still might be more optimal to switch from a mergejoin to a hashjoin in some cases, I doubt that's worth any work (and even less the maintenance).

Best regards
Arne Roland

-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Oliver Mattos
Sent: Monday, November 13, 2017 5:45 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Query planner gaining the ability to replanning after start of query execution.

I am interested in giving the query planner the ability to replan (or re-rank plans) after query execution has begun, based on the progression of the query so far.

Example use case:

*  A LIMIT 1 query is planned using an expensive scan which the planner expects to return a large number of results, and to terminate
early.   The reality is the query actually produces no results, and
the scan must run to completion, potentially taking thousands of times longer than expected.

*  If this plans costs were adjusted mid-execution to reflect the fact that the scan is producing far fewer rows than expected, then another query plan might come out ahead, which would complete far faster.


Has this been done before?   Are there any pitfalls to beware of?


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Oliver Mattos
Sent: Monday, November 13, 2017 5:45 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Query planner gaining the ability to replanning after start of query execution.

I am interested in giving the query planner the ability to replan (or re-rank plans) after query execution has begun, based on the progression of the query so far.

Example use case:

*  A LIMIT 1 query is planned using an expensive scan which the planner expects to return a large number of results, and to terminate
early.   The reality is the query actually produces no results, and
the scan must run to completion, potentially taking thousands of times longer than expected.

*  If this plans costs were adjusted mid-execution to reflect the fact that the scan is producing far fewer rows than expected, then another query plan might come out ahead, which would complete far faster.


Has this been done before?   Are there any pitfalls to beware of?


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: pgsql-performance@postgresql.org
  Cc: A.Roland@index.de, omattos@gmail.com
  Subject: Re: Query planner gaining the ability to replanning after start of query execution.
  In-Reply-To: <252cab3895894337a3a88275f423fe0b@index.de>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox