Skip to content

New 'UNLIST' function #8418

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 24, 2025

Conversation

ChudaykinAlex
Copy link
Contributor

  1. Purpose

The function parses the input string using the specified delimiter (comma "," is implied by default) and returns the identified substrings as discrete records containing a single field. Additionally, the desired type of the returned field can be specified. If the specified data type conversion is impossible, an error is raised at runtime.

See also: #8005

  1. Syntax and rules
<table value function> ::= UNLIST(<input> [, <separator>] [, <data type conversion>]) [AS] <correlation name> [(<derived column name>)]
<input> ::= <value>
<separator> ::= <value>
<data type conversion> ::= RETURNING <data type>

)Arguments:

  • <input>: any value expression that returns a string/blob of characters (or may be converted to a string), including string literals, table columns, constants, variables, expressions, etc. This parameter is mandatory.
  • <separator>: optional value expression that returns a string which is used as a delimiter (i.e. it separates one value from another inside the input string). It may also be a BLOB TEXT value, limited to 32KB. If an empty string is specified, the output will be just one record containing the input string. If omitted, the comma character is used as a delimiter.
  • <data type>: target data type to convert the output values into. Alternatively, a domain can be specified as the returned type. If omitted, VARCHAR(32) is implied. Feel free to suggest any better alternative default.
  • <correlation name>: alias of the record set returned by the UNLIST function. It is a mandatory parameter (per SQL standard).
  • <derived column name>: optional alias of the column returned by the UNLIST function. If omitted, UNLIST is used as an alias.
  1. Changes in the code
  1. New rules are added to the parser (parse.y).
  2. New BLR code is added to support table-value functions with a sub-code for the UNLIST function.
  3. New class TableValueFunctionSourceNode is added, inherited from RecordSourceNode. This class is a generic implementation of table-value functions inside the engine (to be reused later by other table-value functions defined in the SQL specification). Its descendant classes are responsible for implementing particular functions (UNLIST is just the first one in this list, more to follow).
  4. Extend csb_repeat with jrd_tab_func* pointing to a new structure defining the compile-time info about the function (name, fields, format).
  5. New class TableValueFunctionScan is added, inherited from RecordStream. It generates records for the function stream. Class UnlistFunctionScan is inherited from TableValueFunctionScan to implement the record generator specific to the UNLIST function.
  1. Examples
SELECT * FROM UNLIST('1,2,3,4,5') AS U;

UNLIST                           
================================ 
1                                
2                                
3                                
4                                
5    


SELECT * FROM UNLIST('100:200:300:400:500', ':' RETURNING INT) AS U;

      UNLIST 
============ 
         100 
         200 
         300 
         400 
         500 


SELECT U.* FROM UNLIST('text1,text2,text3') AS U;

UNLIST                           
================================ 
text1                            
text2                            
text3     


SELECT C0 FROM UNLIST('text1,text2,text3') AS U(C0);

C0                               
================================ 
text1                            
text2                            
text3 


SELECT U.C0 FROM UNLIST('text1,text2,text3') AS U(C0);

C0                               
================================ 
text1                            
text2                            
text3 


SET TERM ^ ;
RECREATE PROCEDURE TEST_PROC RETURNS (PROC_RETURN_INT INT)
AS
DECLARE VARIABLE text VARCHAR(11);
BEGIN
	text = '123:123:123';
	FOR SELECT * FROM UNLIST( :text, ':' RETURNING INT) AS A INTO :PROC_RETURN_INT DO
	SUSPEND;
END^
SET TERM ; ^

SELECT * FROM TEST_PROC;
PROC_RETURN_INT 
=============== 
            123 
            123 
            123 


CREATE DOMAIN D1 AS INT;
SELECT TEST_DOMAIN FROM UNLIST('1,2,3,4' RETURNING D1) AS A(TEST_DOMAIN);

 TEST_DOMAIN 
============ 
           1 
           2 
           3 
           4 
           

CREATE VIEW TEST_VIEW AS SELECT * FROM UNLIST('1,2,3,4') AS A(B);
SELECT B FROM TEST_VIEW;

B                                
================================ 
1                                
2                                
3                                
4      


SELECT UNLIST FROM UNLIST('UNLIST,A,S,A') AS A;

Statement failed, SQLSTATE = 42S22
Dynamic SQL Error
-SQL error code = -206
-Column unknown
-UNLIST
-At line 2, column 9
At line 24 in file ...

@sim1984
Copy link

sim1984 commented Jan 29, 2025

What about interaction with the IN predicate? In the initial discussion, the idea was put forward that it would be nice to be able to perform queries like this

SELECT * FROM table WHERE id IN UNLIST('1,2,3' RETURNING INT)

That is, without additional wrapping of the UNLIST results in a derived table.

@ChudaykinAlex
Copy link
Contributor Author

I didn't do anything specifically for IN.
I may not understand the specific task at hand. Here is an example of how I understood your question.

recreate table test1 (s01 INT);
insert into test1(s01)values('1');
insert into test1(s01)values('2');
insert into test1(s01)values('3');
insert into test1(s01)values('4');
insert into test1(s01)values('5');
insert into test1(s01)values('6');
insert into test1(s01)values('7');
insert into test1(s01)values('8');

SELECT * FROM test1 WHERE s01 IN (SELECT * FROM UNLIST('1,8' RETURNING INT) AS U);

         S01 
============ 
           1 
           8 


@sim1984
Copy link

sim1984 commented Jan 29, 2025

In the discussion of #8005 it was suggested to use the shortened syntax for IN UNLIST for convenience, but it is not necessary.

One more question.

SELECT *
FROM UNLIST(?, ',' RETURNING INT)

What type of input parameter will be returned to the client after preparing the query?

BLOB SUB_TYPE TEXT or VARCHAR(N). If the second, what will be N?

For the output parameter, unless otherwise specified, it will be VARCHAR(32) judging by the description above.


G)
CREATE DOMAIN D1 AS INT;
SELECT TEST_DOMAIN FROM UNLIST('1,2,3,4' RETURNING D1) AS A(TEST_DOMAIN);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it TYPE OF [COLUMN] allowed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check. Granted. Added example in readme

@@ -6446,9 +6458,28 @@ dsql_fld* FieldNode::resolveContext(DsqlCompilerScratch* dsqlScratch, const Meta
return nullptr;
}

const TEXT* dsqlName = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NULL -> nullptr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad.

src/dsql/parse.y Outdated
table_value_function
: table_value_function_clause table_value_function_correlation_name table_value_function_columns_name
{
auto *node = nodeAs<TableValueFunctionSourceNode>($1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto *node = nodeAs<TableValueFunctionSourceNode>($1);
auto node = nodeAs<TableValueFunctionSourceNode>($1);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Сorrected.

src/dsql/parse.y Outdated


%type <metaNameArray> table_value_function_columns_name
table_value_function_columns_name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use derived_column_list instead of create duplicate specific rules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently I didn't notice this rule.

@@ -3651,6 +3657,331 @@ RseNode* SelectExprNode::dsqlPass(DsqlCompilerScratch* dsqlScratch)
return PASS1_derived_table(dsqlScratch, this, NULL);
}

TableValueFunctionSourceNode*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a Firebird code convention. Please start the method name in the same line as the return type and break later as necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the clang-format utility to do the formatting. That's what he decided. I'll be more careful

const auto valueDesc = EVL_expr(tdbb, request, valueItem);
if (valueDesc == nullptr)
{
rpb->rpb_number.setValid(true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why true?


const auto textType = toDesc->getTextType();

auto setStringToRecord = [&] (string str, USHORT length = 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whould not be a string reference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd even say it should be a const string reference.

if (length == 0)
continue;

string valueStr(buffer, length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not copy the buffer to another string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you telling me to read the data directly into a string?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to use something lightweight like string_view here, so as not to allocate memory and not do unnecessary copying.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to try an experiment with string_view

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworked for string_view. tests pass

auto end = AbstractString::npos;
do
{
auto size = end = valueStr.find(separatorStr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this code compatible with multi-character separator and the list string split in different blob segments?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's not. To be improved, I guess. While LIST would never split the separator, the input may be provided from the different source as well and we cannot predict the layout.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asfernandes, do we generally consider OK to read/process the blob as a whole? If so, this would be the easiest solution (and IIRC it worked this way in our original implementation) and we do the same in some built-in functions. Or should it be considered a bad practice to be revisited/fixed and thus the new code should be able to process blobs in chunks?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least, processing BLOBs in parts will save a lot on reading huge BLOBs if not all results are needed UNLIST

SELECT *
FROM UNLIST(?, ',')
ROWS 10

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It requires to move the processing from open() to getRecord(). Currently implemented differently, but it could be worth the effort.

Copy link
Member

@dyemanov dyemanov Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it depend on OPTIMIZE FOR {FIRST | ALL} ROWS? I.e. process/return records in chunks for ALL ROWS (not necessarily read/cache the whole blob, maybe just some sliding buffer) but process/return records one by one for FIRST ROWS?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it depend on OPTIMIZE FOR {FIRST | ALL} ROWS? I.e. process/return records in chunks for ALL ROWS (not necessarily read/cache the whole blob, maybe just some sliding buffer) but process/return records one by one for FIRST ROWS?

I do not see a clear relation of OPTIMIZE FOR and UNNEST. OPTIMIZE FOR is top level query feature, while UNNEST may be in inner queries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the mechanics, not the clause itself. For example, a subquery with a FIRST clause implies FOR FIRST ROWS when compiling/optimizing its own RSE.

Copy link

@sim1984 sim1984 Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation has two drawbacks:

  1. Increased memory consumption due to result buffering
  2. Slowing down queries with the FIRST ROWS strategy

In general, it is not very clear why TableValueFunction is a buffered data source. It seems to be related to primary, and all our primary sources are are pipelined. If you need to buffer the result, then the optimizer of the higher level does it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These issues are being addressed.

return;
}

if (valueDesc->isBlob())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Others Firebird functions compare strings using collation rules.
Here this is disregard. Should it be?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question to be discussed. Theoretically, one may ask to split the text into chunks by e.g. both 'A' and 'a' used as a case-insensitive separator 'a'. But honestly, I cannot imagine that being used in practice. And collation-aware compares would add some extra overhead. So maybe it's worth comparing byte-wise (converting both value and separator to the common data type, if necessary). Or maybe we should consider two code branches depending on whether byte-wise compare is possible or not.

@dyemanov
Copy link
Member

In the discussion of #8005 it was suggested to use the shortened syntax for IN UNLIST for convenience, but it is not necessary.

One more question.

SELECT *
FROM UNLIST(?, ',' RETURNING INT)

What type of input parameter will be returned to the client after preparing the query?

BLOB SUB_TYPE TEXT or VARCHAR(N). If the second, what will be N?

For the output parameter, unless otherwise specified, it will be VARCHAR(32) judging by the description above.

In the discussion of #8005 it was suggested to use the shortened syntax for IN UNLIST for convenience, but it is not necessary.

One more question.

SELECT *
FROM UNLIST(?, ',' RETURNING INT)

What type of input parameter will be returned to the client after preparing the query?

BLOB SUB_TYPE TEXT or VARCHAR(N). If the second, what will be N?

For the output parameter, unless otherwise specified, it will be VARCHAR(32) judging by the description above.

Good question to be resolved together. Something like VARCHAR(1024) looks OK at the first glance, but it would lead to a runtime error if overflowed by the user (coercion will fail). CLOB looks safer but has a noticeable overhead if the input string is actually a VARCHAR.

@aafemt
Copy link
Contributor

aafemt commented Jan 30, 2025

it would lead to a runtime error if overflowed by the user (coercion will fail).

In #8145 I already pass user-provided buffers to looper. One step further and functions can accept user input directly as descriptor avoiding coercion and thus overflow.

@AlexPeshkoff
Copy link
Member

AlexPeshkoff commented Jan 30, 2025 via email

@dyemanov dyemanov linked an issue Feb 22, 2025 that may be closed by this pull request
{
rpb->rpb_number.setValid(false);
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rpb->rpb_number.setValid(true);

The processing has been moved to the internalGetRecord method.
If “RecordBuffer” is empty, it is tried to be filled in the “nextBuffer” method. Filling will be done in portions.
A small refactoring.
continue;
}
return true;
} while (1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} while (1);
} while (true);


bool TableValueFunctionScan::nextBuffer(thread_db* /*tdbb*/) const
{
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not TableValueFunctionScan be an abstract class instead?

struct Impure : public RecordSource::Impure
{
RecordBuffer* m_recordBuffer;
blb* m_blob;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't you mixing specifics of UnlistFunctionScan's impure in the base TableValueFunctionScan?

@dyemanov dyemanov merged commit f00a24c into FirebirdSQL:master Mar 24, 2025
24 checks passed
@pavel-zotov
Copy link

  1. run cmd.exe, then do: chcp 65001
  2. run following sql:
set bail on;
set list on;
set echo on;
set names utf8; --------------- [ ! ]
shell del c:\temp\tmp4test.fdb 2>nul;
create database 'localhost:c:\temp\tmp4test.fdb' user 'sysdba' password 'masterkey';
commit;
SELECT * FROM UNLIST('ABCDEFGHIJKLMNOPQRSTUVWXYZ,ABCDEFGHIJKLMNOPQRSTUVWXYZ', ',') AS U;

Output will be:

ISQL Version: WI-T6.0.0.693 Firebird 6.0 Initial
Server version:
Firebird/Windows/AMD/Intel/x64 (access method), version "WI-T6.0.0.693 Firebird 6.0 Initial"
Firebird/Windows/AMD/Intel/x64 (remote server), version "WI-T6.0.0.693 Firebird 6.0 Initial/tcp (PZ)/P20:C"
Firebird/Windows/AMD/Intel/x64 (remote interface), version "WI-T6.0.0.693 Firebird 6.0 Initial/tcp (PZ)/P20:C"
on disk structure version 14.0
select * from unlist('abcdefghijklmnopqrstuvwxyz,abcdefghijklmnopqrstuvwxyz', ',') as u;

Statement failed, SQLSTATE = 22001
arithmetic exception, numeric overflow, or string truncation
-string right truncation
-expected length 8, actual 32

Expected output (will be also if we comment line marked as " [ ! ] "):

UNLIST                          abcdefghijklmnopqrstuvwxyz
UNLIST                          abcdefghijklmnopqrstuvwxyz

@pavel-zotov
Copy link

pavel-zotov commented Mar 26, 2025

PS.
Same for
select * from unlist('abcdefghi,abcdefghi', ',') as u;
(but not for 'abcdefgh' - for this case function works OK)

@sim1984
Copy link

sim1984 commented Mar 26, 2025

This is not surprising considering that the default return type is This is not surprising considering that the default return type is varchar(32) character set none. It is better to always specify returning type.

@pavel-zotov
Copy link

IMO, varchar(32) is too small. Why can't we use 8190 ?

@dyemanov
Copy link
Member

Resulting type should be VARCHAR(32) of the input character set, but it appears it's described as 32 bytes, not characters. To be fixed, AFAIU.

As for the length, maybe 32 is a short limit, but this function is to be commonly used with numbers and this limit is enough for that usage case. If you want to deal with strings, better specify the length yourself (nobody knows how many characters gonna fit).

@pavel-zotov
Copy link

QA note: deferred waiting for fix of resulting type (expect 32 characters rather than bytes)

@ChudaykinAlex
Copy link
Contributor Author

Good day. Thank you for your comments. I have looked into the previous example. I found an error in my code. I made a patch, now locally checking the build and tests. And will prepare for publishing.
I have executed your script from the example.
image

@pavel-zotov
Copy link

0xFF.
IMO, it is inconveniently that one need to repeat some parts of code (e.g. "unicode_char(0x2114)").
Why this function could not be used as other functions, like this:

with
d as (
    select
         blob_fld
        ,unicode_char(0x2114) as separator
    from t_longblob
)
, e as (
    select unlist(d.blob_fld, d.separator returning blob character set utf8) as x
    from d
)
select e.x
from e
where e.x containing d.separator
;

-- ?

@dyemanov
Copy link
Member

dyemanov commented Apr 9, 2025

We have no other table-valued functions yet, so your example is syntactically incorrect. It should look something like:

, e as (
    select * from d, unlist(d.blob_fld, d.separator returning blob character set utf8) as u (x)
)

@pavel-zotov
Copy link

It should look something like

Yes, it works in such way. Thanks!

@pavel-zotov
Copy link

It seems that ASCII_CHAR(0) can not be used as separator (at least currently):

echo set list on; set count on; select * from unlist('1', ascii_char(1)) as u(x); | isql /:employee
X                               1
Records affected: 1
(OK, expected)

echo set list on; set count on; select * from unlist('1', ascii_char(0)) as u(x); | isql /:employee
-- NO OUTPUT. FB hangs with 100% load of one CPU core.
-- Ctrl-Creak in ISQL does not help: FB process continues its activity

Checked on Windows.
This is URL to FB dump, stack trace, snapshot 6.0.0.725-a2b05f4-x64.7z and other info:
https://drive.google.com/drive/folders/1BgTSsRqvyD-cbmnvyE43xlG0vpxuOwk5?usp=sharing

@aafemt
Copy link
Contributor

aafemt commented Apr 10, 2025

ascii_char(0) must not be used anywhere outside of OCTETS. Perhaps this function should explicitly prohibit zero value.

But the infinite loop is definitely a bug.

@pavel-zotov
Copy link

One more example:

connect '/:employee';
set echo on;
set list on;
set count on;

select '1' || ascii_char(26) || '2' as chr_26_lst from rdb$database;

select /* check ascii_char(26) */ * from unlist( '1' || ascii_char(26) || '2', ascii_char(26)) as u(x);

select /* check ascii_char(26) */ * from unlist('1→2', '→') as u(x);

изображение

Output will be:

set list on;
set count on;
select '1' || ascii_char(26) || '2' as chr_26_lst from rdb$database;
CHR_26_LST                      1→2
Records affected: 1

select /* check ascii_char(26) */ * from unlist( '1' || ascii_char(26) || '2', ascii_char(26)) as u(x);
X                               1
X                               2
Records affected: 2

select /* check ascii_char(26) */ * from unlist('1
Expected end of statement, encountered EOF

изображение

@aafemt
Copy link
Contributor

aafemt commented Apr 10, 2025

This is a separate ISQL problem, not UNLIST one.

@pavel-zotov
Copy link

Perhaps this function should explicitly prohibit zero value.

maybe chr(0) must also be prohibited in LIST ():

echo select list('1', ascii_char(0)) from (select 1 x from rdb$types rows 5); | isql /:employee
LIST:
11111

echo select list(ascii_char(0), ':') from (select 1 x from rdb$types rows 5); | isql /:employee
LIST:
::::

@aafemt
Copy link
Contributor

aafemt commented Apr 10, 2025

"This function" I meant ascii_char().

@pavel-zotov
Copy link

"This function" I meant ascii_char().

Why to decline the whole function usage ? IMO, it is enough to restrict only some values that it returns, at least chr(0).

@aafemt
Copy link
Contributor

aafemt commented Apr 10, 2025

IMO, it is enough to restrict only some values that it returns, at least chr(0).

Yes, this is exactly what I wrote: ascii_char would better to refuse zero,

@pavel-zotov
Copy link

This is a separate ISQL problem, not UNLIST one.

#8512

@ChudaykinAlex
Copy link
Contributor Author

It seems that ASCII_CHAR(0) can not be used as separator (at least currently):

echo set list on; set count on; select * from unlist('1', ascii_char(1)) as u(x); | isql /:employee
X                               1
Records affected: 1
(OK, expected)

echo set list on; set count on; select * from unlist('1', ascii_char(0)) as u(x); | isql /:employee
-- NO OUTPUT. FB hangs with 100% load of one CPU core.
-- Ctrl-Creak in ISQL does not help: FB process continues its activity

Checked on Windows. This is URL to FB dump, stack trace, snapshot 6.0.0.725-a2b05f4-x64.7z and other info: https://drive.google.com/drive/folders/1BgTSsRqvyD-cbmnvyE43xlG0vpxuOwk5?usp=sharing

I found the problem with the infinite loop, I will fix it.
As for the ban ascii_char(0), I won't give you an answer yet.

@aafemt
Copy link
Contributor

aafemt commented Apr 10, 2025

As for the ban ascii_char(0), I won't give you an answer yet.

Perhaps you can answer an other question: is UNLIST supposed to work with (VAR)BINARY datatype (charset OCTETS) and binary BLOBs?

@pavel-zotov
Copy link

It seems that semicolon character ( ';' = ascii_char(59)) has some effect on time of PREPARE statement when source string has lot of such separators.
Consider attached .zip file, it contains four queries with different characters used as separator: ':', ';', '<' and '='.
In each case extremely long source string is used for parsing:
q'#t;y;;;<skipped>;;;#' -- where <skipped> is sequence of ~32K similar characters.

Queries look like this:

select /* check ascii_char(58) */ * from unlist(q'#t:y::: ... :::#')', ':') ;
select /* check ascii_char(59) */ * from unlist(q'#t;y;;; ... ;;;#')', ';') ;
select /* check ascii_char(60) */ * from unlist(q'#t<y<<< ... <<<#')', '<') ;
select /* check ascii_char(61) */ * from unlist(q'#t=y=== ... ===#')', '=') ;

Then run trace with config:

    log_connections = true
    log_transactions = true
    log_statement_prepare = true
    log_statement_start = true
    log_statement_finish = true
    log_statement_free = true
    print_perf = true

(its log also is in attached .zip)

For 1st, 4rd and 4th statements trace will show similar parts which executes almost instantly:

2025-04-10T17:26:13.2550 (232:0000000001FE24C0) START_TRANSACTION
...
2025-04-10T17:26:13.2550 (232:0000000001FE24C0) EXECUTE_STATEMENT_FINISH
...
SET TRANSACTION

2025-04-10T17:26:13.2550 (232:0000000001FE24C0) FREE_STATEMENT
...
2025-04-10T17:26:13.2550 (232:0000000001FE24C0) START_TRANSACTION
...
2025-04-10T17:26:13.2650 (232:0000000001FE24C0) PREPARE_STATEMENT
...
Statement 59:
-------------------------------------------------------------------------------
select /* check ascii_char(58) */ * from unlist(q'#t:y:::::::::::::::::::::::::::::::::
      3 ms

2025-04-10T17:26:13.2660 (232:0000000001FE24C0) EXECUTE_STATEMENT_START
...
select /* check ascii_char(58) */ * from unlist(q'#t:y:::::::::::::::::::::::::::::::::

2025-04-10T17:26:13.3730 (232:0000000001FE24C0) EXECUTE_STATEMENT_FINISH
...

But for SECOND statement (when ; is used as delimiter) trace will show that there is weird "gap" in timestamps after START_TRANSACTION and PREPARE_STATEMENT.
Size of this "gap" is about 8 seconds.

Appropriate lines are marked as [ 1 ] and [ 2 ], their stamps: 2025-04-10T17:26:23.0550 and 2025-04-10T17:26:31.1140:

2025-04-10T17:26:23.0550 (232:0000000001FE24C0) START_TRANSACTION
   employee (ATT_245, SYSDBA:NONE, NONE, TCPv6:fe80::8974:2b63:cd85:3f68%17/56873)
   C:\FB\60SS\isql.exe:10040
       (TRA_276, CONCURRENCY | WAIT | READ_WRITE)

2025-04-10T17:26:23.0550 (232:0000000001FE24C0) EXECUTE_STATEMENT_FINISH
   employee (ATT_245, SYSDBA:NONE, NONE, TCPv6:fe80::8974:2b63:cd85:3f68%17/56873)
   C:\FB\60SS\isql.exe:10040
       (TRA_276, CONCURRENCY | WAIT | READ_WRITE)

-------------------------------------------------------------------------------
SET TRANSACTION
0 records fetched
      0 ms, 1 write(s), 1 fetch(es), 1 mark(s)

2025-04-10T17:26:23.0550 (232:0000000001FE24C0) FREE_STATEMENT
   employee (ATT_245, SYSDBA:NONE, NONE, TCPv6:fe80::8974:2b63:cd85:3f68%17/56873)
   C:\FB\60SS\isql.exe:10040

-------------------------------------------------------------------------------
SET TRANSACTION

2025-04-10T17:26:23.0550 (232:0000000001FE24C0) START_TRANSACTION ------------------------ [  1  ]
   employee (ATT_245, SYSDBA:NONE, NONE, TCPv6:fe80::8974:2b63:cd85:3f68%17/56873)
   C:\FB\60SS\isql.exe:10040
       (TRA_277, READ_COMMITTED | NO_REC_VERSION | WAIT | READ_WRITE)

2025-04-10T17:26:31.1140 (232:0000000001FE24C0) PREPARE_STATEMENT ------------------------ [  2  ]
   employee (ATT_245, SYSDBA:NONE, NONE, TCPv6:fe80::8974:2b63:cd85:3f68%17/56873)
   C:\FB\60SS\isql.exe:10040
       (TRA_277, READ_COMMITTED | NO_REC_VERSION | WAIT | READ_WRITE)

Statement 59:
-------------------------------------------------------------------------------
select /* check ascii_char(59) */ * from unlist(q'#t;y;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
      2 ms

2025-04-10T17:26:31.1140 (232:0000000001FE24C0) EXECUTE_STATEMENT_START
   employee (ATT_245, SYSDBA:NONE, NONE, TCPv6:fe80::8974:2b63:cd85:3f68%17/56873)
   C:\FB\60SS\isql.exe:10040
       (TRA_276, CONCURRENCY | WAIT | READ_WRITE)

Statement 59:
-------------------------------------------------------------------------------
select /* check ascii_char(59) */ * from unlist(q'#t;y;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

2025-04-10T17:26:31.2350 (232:0000000001FE24C0) EXECUTE_STATEMENT_FINISH
   employee (ATT_245, SYSDBA:NONE, NONE, TCPv6:fe80::8974:2b63:cd85:3f68%17/56873)
   C:\FB\60SS\isql.exe:10040
       (TRA_276, CONCURRENCY | WAIT | READ_WRITE)

unlist-long-duplicated-separators-at-end.zip

изображение

@mrotteveel
Copy link
Member

That thing with ; is probably the new ISQL feature to automatically recognize when a statement is complete.

@dyemanov
Copy link
Member

As for the ban ascii_char(0), I won't give you an answer yet.

Perhaps you can answer an other question: is UNLIST supposed to work with (VAR)BINARY datatype (charset OCTETS) and binary BLOBs?

Yes, it's expected to work.

@pavel-zotov
Copy link

I have a Q about how string that looks like floating-point number is interpreted in UNLIST.
If we check these statements:

echo set heading off; select cast(2.225073858507202e-308 as double precision) from rdb$database; | isql /:employee
echo set heading off; select cast(2.225073858507201e-308 as double precision) from rdb$database; | isql /:employee

-- then output will be:

2.225073858507202e-308
0.000000000000000

So, minimal value that is distinguish from zero is first of above mentioned.
Now lets check this:

echo select * from unlist('2.225073858507202e-307' returning double precision) as a(unlist_double_01); | isql /:employee

Output:

Statement failed, SQLSTATE = 22003
arithmetic exception, numeric overflow, or string truncation
-numeric value is out of range

Why ?

PS.
Minimum value that does not raise this exception in UNLIST (when it must return double) is: 9.9e-307 (9.899999999999e-307 - raises such error).

@dyemanov
Copy link
Member

It's not about UNLIST at all, you're comparing different cases. See:

SQL> select cast('2.225073858507202e-308' as double precision) from rdb$database;

                   CAST 
======================= 
Statement failed, SQLSTATE = 22003
arithmetic exception, numeric overflow, or string truncation
-numeric value is out of range
SQL> 

@pavel-zotov
Copy link

0xFF. Is this a bug or no ? (i mean: cast('2.225073858507202e-308' as double precision) ==> numeric value is out of range)

@dyemanov
Copy link
Member

IMO: if the numeric literal fits the double precision range, CAST from string should work. But maybe the underlying conversion is trickier and may cause undesired side effects.

@pavel-zotov
Copy link

QA note: deferred because of
#8418 (comment)
#8418 (comment)
(waiting for fix / resolution)

dyemanov added a commit that referenced this pull request Apr 19, 2025
@dyemanov
Copy link
Member

Please retest with the new snapshot.

@pavel-zotov
Copy link

Checked on 6.0.0.744-e883a89: all fine now. New test for UNLIST will be added soon.

@pavel-zotov
Copy link

::: QA note :::
See functional/intfunc/unlist/ folder with several tests related to this function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UNLIST table-valued function
8 participants