Skip to content

Convert IN subquery into UNIQUE INNER JOIN #8531

@dyemanov

Description

@dyemanov

Currently we are able to convert IN into a semi-join:

select * from t1 where t1.fld1 in (select fld2 from t2);

is executed like:

select * from t1 semi join t2 on t1.fld1 = t2.fld2;

The problem here is that if table t1 is large and the IN predicate is selective (often it contains PK values), then the whole table t1 is read. Instead, it makes more sense to use the subquery as a pre-filter with a plan like this:

select * from t2 lateral join (select * from t1 where t1.fld1 = t2.fld2);

or simply INNER JOIN with the join order to be chosen by the optimizer.

If the subquery returns a non-unique field, a DISTINCT should be applied implicitly, but it's still likely to be cheaper (for a small t2) than a t1 fullscan.

The major problem here is to estimate the subquery cardinality before joining, currently we have no such ability. Without that it's impossible to decide whether DISTINCT + INNER JOIN is cheaper than SEMI JOIN.

Metadata

Metadata

Assignees

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions