Faster execution of indexed STARTING WITH with UNICODE collation #6872

Closed
asfernandes opened this issue Jun 25, 2021 · 5 comments

Comments

@asfernandes
Member

asfernandes commented Jun 25, 2021

Initial execution time of indexed STARTING WITH lookup with UNICODE collation is very slow.

Test cases with timings using the debug build:

-- UTF8 without collation

recreate table t1 (c1 varchar(10) character set utf8)!
create index t1_idx on t1 (c1)!

execute block
as
    declare n integer = 0;
    declare v type of column t1.c1;
begin
    while (n < 100000)
    do
    begin
        select 1 from t1 where c1 starting with 'x' into v;
        n = n + 1;
    end
end!

-- Elapsed time = 0.422 sec

-- WIN1252 collation WIN_PTBR

recreate table t1 (c1 varchar(10) character set win1252 collate win_ptbr)!
create index t1_idx on t1 (c1)!

execute block
as
    declare n integer = 0;
    declare v type of column t1.c1;
begin
    while (n < 100000)
    do
    begin
        select 1 from t1 where c1 starting with 'x' into v;
        n = n + 1;
    end
end!

-- Elapsed time = 0.440 sec

-- UTF8 collation UNICODE

recreate table t1 (c1 varchar(10) character set utf8 collate unicode)!
create index t1_idx on t1 (c1)!

execute block
as
    declare n integer = 0;
    declare v type of column t1.c1;
begin
    while (n < 100000)
    do
    begin
        select 1 from t1 where c1 starting with 'x' into v;
        n = n + 1;
    end
end!

-- Elapsed time = 6.498 sec

@asfernandes asfernandes self-assigned this Jun 25, 2021
@asfernandes asfernandes changed the title from "Indexed STARTING WITH execution very slow with UNICODE collation" to "Indexed STARTING WITH execution is very slow with UNICODE collation" Jun 25, 2021
asfernandes added a commit that referenced this issue Jun 25, 2021
asfernandes added a commit that referenced this issue Jun 29, 2021
@hvlad
Member

hvlad commented Jul 29, 2021

Adriano, could you look at https://groups.google.com/g/firebird-support/c/VCXnWp0IZVw ?
It looks like another incarnation of this issue.

@asfernandes
Member Author

> Adriano, could you look at https://groups.google.com/g/firebird-support/c/VCXnWp0IZVw ?
> It looks like another incarnation of this issue.

This does not happen only with UTF8/UNICODE.

The problem is that some characters are the start of contractions, which generate sort keys that order them in a different place.

So when the last character of a key is the start of a contraction, it must be excluded from the key; otherwise the lookup will not work.

This issue is about verifying those contractions faster than before.
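
The contraction rule above can be illustrated with a small sketch. This is not Firebird's code or ICU's data: the weights are made up, "ch" is assumed to be a contraction that sorts between "h" and "i" (as in Czech), and the names `sort_key`, `lookup_prefix`, `starting_with` and `naive_starting_with` are hypothetical. It only shows why a trailing character that may start a contraction has to be left out of the lookup key.

```python
# Toy collation with invented weights; "ch" is a contraction.
WEIGHTS = {"a": 1, "b": 2, "c": 3, "d": 4, "h": 8, "ch": 9, "i": 10, "x": 24}
CONTRACTIONS = [k for k in WEIGHTS if len(k) > 1]


def sort_key(text):
    """Build a sort key, matching the longest collation element first."""
    key, i = [], 0
    while i < len(text):
        for length in (2, 1):
            element = text[i:i + length]
            if len(element) == length and element in WEIGHTS:
                key.append(WEIGHTS[element])
                i += length
                break
        else:
            raise ValueError(f"no weight for {text[i]!r}")
    return tuple(key)


def lookup_prefix(prefix):
    """Drop the last character if it could be the start of a contraction."""
    if prefix and any(c.startswith(prefix[-1]) for c in CONTRACTIONS):
        return prefix[:-1]
    return prefix


def naive_starting_with(rows, prefix):
    """Range scan on the full prefix key: misses rows such as 'cha'."""
    bound = sort_key(prefix)
    index = sorted(rows, key=sort_key)  # stands in for the index
    return [r for r in index if sort_key(r)[:len(bound)] == bound]


def starting_with(rows, prefix):
    """Range scan on the safe (shortened) key, then recheck each row."""
    bound = sort_key(lookup_prefix(prefix))
    index = sorted(rows, key=sort_key)  # stands in for the index
    candidates = [r for r in index if sort_key(r)[:len(bound)] == bound]
    return [r for r in candidates if r.startswith(prefix)]


rows = ["cha", "ca", "ha", "ia", "xa"]
print(naive_starting_with(rows, "c"))  # ['ca']
print(starting_with(rows, "c"))        # ['ca', 'cha']
```

In this toy, `lookup_prefix` scans the contraction list on every call; how the real check is made faster is not described in this thread.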

@javihonza

Hi,
unfortunately this fix did not solve the speed problem
https://groups.google.com/g/firebird-support/c/VCXnWp0IZVw

@asfernandes
Member Author

> Hi,
> unfortunately this fix did not solve the speed problem
> https://groups.google.com/g/firebird-support/c/VCXnWp0IZVw

I'm looking into a way to improve this.

@asfernandes
Member Author

> Hi,
> unfortunately this fix did not solve the speed problem
> https://groups.google.com/g/firebird-support/c/VCXnWp0IZVw

Created #6915 to track this problem.

asfernandes added a commit that referenced this issue Feb 16, 2022
asfernandes added a commit that referenced this issue Mar 16, 2022
@dyemanov dyemanov changed the title from "Indexed STARTING WITH execution is very slow with UNICODE collation" to "Faster execution of indexed STARTING WITH with UNICODE collation" May 1, 2022