Convert IN subquery into UNIQUE INNER JOIN

Currently we are able to convert IN into a semi-join:

```sql
select * from t1 where t1.fld1 in (select fld2 from t2);
```
is executed like:

```sql
select * from t1 semi join t2 on t1.fld1 = t2.fld2;
```

The problem here is that if table `t1` is large and the IN predicate is selective (often it contains PK values), then the whole table `t1` is read. Instead, it makes more sense to use the subquery as a pre-filter with a plan like this:

```sql
select * from t2 lateral join (select * from t1 where t1.fld1 = t2.fld2);
```

or simply `INNER JOIN` with the join order to be chosen by the optimizer.

If the subquery returns a non-unique field, a `DISTINCT` should be applied implicitly, but it's still likely to be cheaper (for a small `t2`) than a `t1` fullscan.

The major problem here is to estimate the subquery cardinality before joining, currently we have no such ability. Without that it's impossible to decide whether `DISTINCT + INNER JOIN` is cheaper than `SEMI JOIN`.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Convert IN subquery into UNIQUE INNER JOIN #8531

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Convert IN subquery into UNIQUE INNER JOIN #8531

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions