Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default collate ruins hana performance, seems like paging is broken #965

Open
ThePlenkov opened this issue Dec 23, 2024 · 9 comments
Open
Labels
bug Something isn't working

Comments

@ThePlenkov
Copy link

Context

  • We have a model where for simplicity we just expose classic BKPF table as is as the entity.
  • We want to swtich to a new @cap-js/hana driver

The problem

We expect that a query like GET /odata/bkpf should result in SQL like:

select .... from BKPF limit 10

however after switching to @cap-js/hana presented as a performance improver we've got a very serious issue, when table cannot even being read.

So SQL now turned out into something like:

SELECT *,
('$[' || lpad("$$RN$$", 6, '0')) as "_path_"
FROM (
        SELECT *,
            ROW_NUMBER() OVER (
                ORDER BY "MANDT" COLLATE ENGLISH ASC,
                    "BUKRS" COLLATE ENGLISH ASC,
                    "BELNR" COLLATE ENGLISH ASC,
                    "GJAHR" COLLATE ENGLISH ASC
            ) as "$$RN$$"
        FROM (
                SELECT "BKPF"."MANDT",
                    "BKPF"."BUKRS",
                    "BKPF"."BELNR",
                    "BKPF"."GJAHR",
                    "BKPF"."BLART",
                    "BKPF"."BLDAT",
                    "BKPF"."BUDAT",
                    "BKPF"."MONAT",
                    "BKPF"."CPUDT",
                    "BKPF"."CPUTM",
...
                FROM "api.odata.BKPF" as "BKPF"
                ORDER BY "BKPF"."MANDT" COLLATE ENGLISH ASC,
                    "BKPF"."BUKRS" COLLATE ENGLISH ASC,
                    "BKPF"."BELNR" COLLATE ENGLISH ASC,
                    "BKPF"."GJAHR" COLLATE ENGLISH ASC
                LIMIT 10
            ) as "BKPF"
    ) as "BKPF"

May be we need to activate something in the database but our HANA 2 SP7 just fails with OOM error now..

To my opinion this is simply not needed here. Therefore I'd like to bring here several points into discussion:

  • there must be a way to switch collate globally. This docs page mentions cds.sql.hana.ignoreLocale but if we look carefully - it's only for Java, there is no such a setting on NodeJS side neither in a new driver nor in a core driver.
  • To my opinion collate must be switched off by default even with enabled global feature and switched on only when is really needed, so instead of collate:false it must be collate: true only for fields when is needed. We don't have a large project but I guess we have already >10K columns in total. How do you imagine annotating every single string field with collate annotation?
  • Explicitly - i do not see any sense to collate key columns. When we order by keys - it's just supposed to be like this, so it only makes sense for non-key fields to my opinion.

What do you think about this?

Can we make this module more configurable please?

As of now at least it would be great if we could sync with the docs and support ignoreLocale in JS too.

Thanks!

@ThePlenkov ThePlenkov added the bug Something isn't working label Dec 23, 2024
@ThePlenkov
Copy link
Author

ThePlenkov commented Dec 23, 2024

I'm still trying to figure out how localized and cds.collate are connected. Is it same?

image

@ThePlenkov
Copy link
Author

ThePlenkov commented Dec 23, 2024

and here where we have every single SELECT localized:

image

@ThePlenkov
Copy link
Author

Let me please share some more evidence. Let's start with a simple SELECT:

image

Cost is extremely low.

Now let's add some order
image
We can see that cost is already higher, so if we select certain lines from db - I'd prefer that we control the order with a query and do not enforce ordering in SQL because it increases costs

Now let's move further and support collations for the order:
image

We can see that collation already makes cost 72 and it's explainable why - TOP K SORT operator works

So imagine how huge the impact on the system is - just enabling collations - may literally eat the available database memory without any logical sense ( in case of key columns for example ).

That's why I highly advice to reconsider usage of this option and leave it only for the cases when it's really needed by using one of CDS features:

  • localized directive
  • @cds.collate: true annotation

@danjoa Please correct me if I don't understand something - but i find this problem now as a blocking factor for switching to a new driver unfortunately.

Thanks!

@ThePlenkov
Copy link
Author

So I found the place which was causing the issue and enabling collation for every single string field.
This code:

    if (req.event === 'READ' && req.query?.SELECT) {
      req.query.SELECT.localized ??= true
    }

is located in a file node_modules/@sap/cds/libx/_runtime/common/generic/crud.js and it's making every single select to work as localized.

It seems to be a perfect place for a global feature flag check, however I'd like to suggest still to review the default logic:

  • there must be a way to enable/disable explicitly collations on a global level
  • there must be a way to enable/disable collations on the entity level ( like no collations for any fields )
  • there must be a way to enable/disable collations on the column level ( certain fields )

The most important part - is that collations enabled by default are causing performance drop unfortunately, so they have to be used only there where is really needed

@ThePlenkov
Copy link
Author

Here is my workaround:

const cds = require("@sap/cds");

module.exports = class odata extends cds.ApplicationService {
  async init() {

    //disable localizations ( to prevent collation performance drop)
    this.before("READ", (req) => {
      req.query.SELECT.localized ??= false;
    });

    await super.init();
  }
};

@BobdenOs
Copy link
Contributor

BobdenOs commented Jan 8, 2025

There are some product standards that have to be met by all SAP applications. One of them is the ability to provide internationalization. This includes localized sorting order of text fields. One of the most critical use cases that we have already tackled in @cap-js/hana is the collation of UUID columns as they are not human readable strings (by definition). As so far all String fields have been presumed to be human readable it wasn't possible for the new databases to switch the default to not collating strings and to only collate localized string columns (discussions are ongoing).

Additionally all ql queries that are fired are by default not localized so no collation will be applied and no internalization will be applied in the queries, but all incoming http requests are considered to be UI requests therefor the crud implementation that you referenced forces all queries to be flagged as localized.

Additionally all UI requests are susceptible to pagination. Therefor it is required to provide explicit stable sorting. Which is achieved by appending all key columns to the orderBy of the incoming query. If this is not provided the sorting order returned by HANA is only guaranteed to be functionally correct for each individual query. Which means that if you have 2 entries with the same value for a ordered column and these two fall exactly in between two pages the same row will always be returned as HANA will order the second row into the other page every time. Only when forced to sort by the unique key columns as well will HANA actually return the two rows in the separate paged requests.

There shouldn't be a need for a global collation disable flag as this would only be a work around for a more fundamental problem. As for disabling certain columns that currently already exists (@cds.collate). As for disabling collation on an entity level is most likely also just a work around for a more fundamental problem.

My current understanding of goal we have so far is to disable collation for String types by default and use the localized to inform the model that this column actually contains human readable texts that is susceptible for localization specific behaviors. And the already adopted never collated UUID columns. This should result in all default sorting applied for UI requests won't trigger collation (assuming you don't have localized key columns). And only once the User actually requests sorting by a localized column would it trigger the database to start applying collation in the sorting. But these discussions are still ongoing as it has wider implications as this change would require other apps to do the opposite of adding @cds.collate:true to all their currently String columns that should still be treated as localized fields.

@ThePlenkov
Copy link
Author

Hi @BobdenOs thanks, for reply. Let me please iterate on your comments.

Before this I'd like to highlight few things:

  • it's a proven fact that COLLATE is causing higher database costs than just ORDER BY
  • it results in higher system load, much more memory consumption and in the end increases costs of the system
  • current hana clients like hdb or hana-client as I understand do not support collations ( at least we never saw this before with previous versions )
  • it means we talk here not about the problem to change all existing apps
  • to my opinion we face here with a risk instead to force existing apps and reports with unexpected database load

Additionally I'd like to iterate on following topics:

There are some product standards that have to be met by all SAP applications. One of them is the ability to provide internationalization.

Indeed, that's true. But CAP apps written by clients, cannot be considered as SAP apps by default and internalization may be not required or even instead must be forbidden ( like in my company we only use English, regardless of users locale ).

As so far all String fields have been presumed to be human readable it wasn't possible for the new databases to switch the default to not collating strings and to only collate localized string columns (discussions are ongoing).

I can't agree with this statement. Of course some fields may contain uuids, but there are many other cases when String field is not supposed to be human-readable, or is not a subject for localization ( like external id, numeric ids ( BELNR ) ) and it doesn't even make sense to collate them. Also talking about uuids, if it's unmanaged entity - we may use other keys rather than uuid, right? key is a key, doesn't matter what is a type

all incoming http requests are considered to be UI

What about service-to-service communication? We're using CAP as REST API and it's not supposed to be called from UI directly. There is no locale in this case.

Additionally all UI requests are susceptible to pagination.

Sorting is OK, as what I mentioned before, main cost is caused by sort with COLLATE.

As for disabling certain columns that currently already exists (@cds.collate).

Correct! But what it means - if you we don't want to collate - then it means we have to annotate every single field. In large data models there may be thousands of columns - this is simply not possible just to annotate them that easily. What is your vision here? What if at some point we want to check things without collations - do we have to edit all the files and redeploy the app?

My current understanding of goal we have so far is to disable collation for String types by default and use the localized to inform the model that this column actually contains human readable texts that is susceptible for localization specific behaviors.

That is exactly what I'd like to achieve, yes. To have the ability to mention explicitly fields that they are localized. It would be nice also to support extended models, that's how we can mark existing columns as localized with extension views.

And the already adopted never collated UUID columns.

Right, but as I already mentioned before - we may use other columns to store different kinds of ids ( external ids, hashes, uuids in other formats, any other non-collatable strings which are technical )

And only once the User actually requests sorting by a localized column would it trigger the database to start applying collation in the sorting

Exactly! That would be much more transparent and predictable.

@David-Kunz
Copy link
Contributor

Hi,

Collation and localization are orthogonal features, and it wouldn't be correct to only treat localized strings as subjects for collation. Some non-localized strings are also valid candidates.

In my opinion, it should be easy to opt out of generic collation and only enable it for some cases, knowing what impact it has on performance.

Best regards,
David

@ThePlenkov
Copy link
Author

Thanks for confirming this @David-Kunz . That what I was also trying to figure out - it's a lot in documentation about localization but not much things mentioned about collations - so that's why I was not fully sure.

Looking forward for updates this on this topic. As I already mentioned - the workaround is available - so it's not very urgent - but of course would be nice to have it working out of the box somehow.

Cheers,
Petr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants