Description
I wrote a simple test to test the speed of MERGE JOIN.
RECREATE TABLE BIG_1 (
    ID_1 BIGINT NOT NULL,
    F_1 BIGINT,
    CONSTRAINT PK_BIG_1 PRIMARY KEY (ID_1)
);

RECREATE TABLE BIG_2 (
    ID_2 BIGINT NOT NULL,
    F_2 BIGINT,
    CONSTRAINT PK_BIG_2 PRIMARY KEY (ID_2)
);

SET TERM ^;
EXECUTE BLOCK
AS
    DECLARE I BIGINT;
    DECLARE A BIGINT;
BEGIN
    I = 0;
    WHILE (I < 2000000) DO
    BEGIN
        I = I + 1;
        A = NULL;
        IF (MOD(I, 19) = 0) THEN
            A = I;
        INSERT INTO BIG_1 (ID_1, F_1)
        VALUES (:I, :A);
    END

    I = 0;
    WHILE (I < 1500000) DO
    BEGIN
        I = I + 1;
        A = NULL;
        IF (MOD(I, 13) = 0) THEN
            A = I;
        INSERT INTO BIG_2 (ID_2, F_2)
        VALUES (:I, :A);
    END
END^
SET TERM ;^
COMMIT;
As you can see, most of the values in the columns the tables are joined on are NULL (only every 19th row of BIG_1 and every 13th row of BIG_2 has a non-NULL value).
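The data above can be sanity-checked with a little arithmetic, independently of the database: F_1 is non-NULL only for multiples of 19, F_2 only for multiples of 13, so the join can match only values divisible by 19 * 13 = 247 that fit into the smaller table:

```python
# Row counts used in the EXECUTE BLOCK above
N1, N2 = 2_000_000, 1_500_000

non_null_f1 = N1 // 19  # non-NULL F_1 values (~5% of BIG_1)
non_null_f2 = N2 // 13  # non-NULL F_2 values (~8% of BIG_2)

# A value appears in both join columns only if it is divisible by
# both 19 and 13 (i.e. by 247) and does not exceed the smaller
# table's row count.
matches = min(N1, N2) // (19 * 13)

print(non_null_f1, non_null_f2, matches)
```

The computed `matches` agrees with the COUNT of 6072 reported further below.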
Then I executed a simple query and gave up waiting for the result:
SELECT COUNT(*)
FROM BIG_1
JOIN BIG_2 ON BIG_2.F_2 = BIG_1.F_1
Now let's rewrite it in an equivalent form (for an inner join the extra predicate changes nothing):
SELECT COUNT(*)
FROM BIG_1
JOIN BIG_2 ON BIG_2.F_2 = BIG_1.F_1
WHERE BIG_2.F_2 IS NOT NULL
Select Expression
    -> Aggregate
        -> Filter
            -> Hash Join (inner)
                -> Table "BIG_1" Full Scan
                -> Record Buffer (record length: 33)
                    -> Filter
                        -> Table "BIG_2" Full Scan
COUNT
=====================
6072
Current memory = 1806305664
Delta memory = 0
Max memory = 1811848224
Elapsed time = 0.944 sec
Buffers = 204800
Reads = 0
Writes = 0
Fetches = 3596854
This is understandable: in the hash table, all of the records with a NULL key fall into a single bucket.
Why does the optimizer add records with NULL key values to the hash table at all? Why isn't an IS NOT NULL filter applied implicitly? For an inner join executed as a HASH/MERGE JOIN, it is clear a priori that the equality condition can never be true for NULL values.
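The effect can be sketched outside the database. Below is a toy model of an inner hash join (not Firebird's actual implementation) with an optional implicit IS NOT NULL filter: without the filter, every NULL build row lands in one bucket and every NULL probe row scans that entire bucket, even though NULL = NULL can never evaluate to TRUE.

```python
from collections import defaultdict

def hash_join_count(build_keys, probe_keys, implicit_not_null):
    """Inner equi-join; returns (match_count, bucket_comparisons)."""
    buckets = defaultdict(list)
    for k in build_keys:
        if implicit_not_null and k is None:
            continue  # the implicit filter the report asks for
        buckets[k].append(k)  # None is hashable, so NULLs get bucketed too
    matches = comparisons = 0
    for k in probe_keys:
        if implicit_not_null and k is None:
            continue
        for other in buckets.get(k, ()):
            comparisons += 1
            # SQL three-valued logic: NULL = NULL is not TRUE
            if k is not None and k == other:
                matches += 1
    return matches, comparisons

# 1,000 rows per side, mostly NULL, mimicking the tables above at small scale
build = [i if i % 19 == 0 else None for i in range(1, 1001)]
probe = [i if i % 13 == 0 else None for i in range(1, 1001)]

m1, c1 = hash_join_count(build, probe, implicit_not_null=False)
m2, c2 = hash_join_count(build, probe, implicit_not_null=True)
print(m1, c1)  # same match count, but hundreds of thousands of comparisons
print(m2, c2)  # identical result with only a handful of comparisons
```

At the 2,000,000-row scale of the original test, the wasted comparisons grow with the product of the NULL counts on both sides, which is consistent with the query never finishing.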
I tried it in both 4.0 and 5.0; no difference. Although I don't understand why MERGE JOIN is so slow in this case either.