Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance twenty orm #6691

Merged
merged 5 commits into from
Aug 20, 2024
Merged

Conversation

charlesBochet
Copy link
Member

@charlesBochet charlesBochet commented Aug 19, 2024

Context

As we grow, the messaging scripts are experiencing performance issues forcing us to temporarily disable them on the cloud.
While investigating the performance, I have noticed that generating the entity schema (for twentyORM) in the repository is taking ~500ms locally on my Mac M2 so likely more on pods. Caching the entitySchema then!

I'm also clarifying naming around schemaVersion and cacheVersions ==> both are renamed workspaceMetadataVersion and migrated to the workspace table (the workspaceCacheVersion table is dropped).

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

This pull request focuses on improving the performance of the Twenty ORM and implementing caching mechanisms for workspace data sources. The changes primarily affect the WorkspaceDatasourceFactory and related services.

  • Implemented caching for EntitySchemas and EntitySchemaOptions in WorkspaceDatasourceFactory to reduce WorkspaceDataSource creation time
  • Added new methods setEntitySchemaOptions and getEntitySchemaOptions in WorkspaceCacheStorageService for efficient schema caching
  • Introduced performance monitoring in MessagingMessagesImportCronJob and MessagingMessageChannelSyncStatusMonitoringCronJob using console.time()
  • Improved error handling in FileController for better client feedback on stream errors
  • Simplified the query for fetching message channels in MessagingMessageListFetchCronJob, potentially affecting data transfer efficiency

6 file(s) reviewed, 1 comment(s)
Edit PR Review Bot Settings

workspaceId,
);

let cachedEntitySchemas: EntitySchema[];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Consider using const instead of let for cachedEntitySchemas

@@ -44,6 +44,10 @@ export class FileController {
workspaceId,
);

fileStream.on('error', () => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have noticed that local driver (at least) does not throw FileNotFound as the stream get piped. do we have to change the API here of our file drivers or is there a way to make it work like the s3 driver?
@Weiko any idea?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would update the read function in local driver yes. I thought it was done though, my bad! @charlesBochet

@@ -78,18 +78,43 @@ export class WorkspaceDatasourceFactory {
);

if (!dataSourceMetadata) {
throw new Error('Data source metadata not found');
throw new Error(
`Data source metadata not found for workspace ${workspaceId}`,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding a bit more logs to ease debugging

);
}

const cachedEntitySchemaOptions =
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are caching the entity schema options (this is the typeOrmNaming, not sure it's super clear)

@@ -69,5 +70,7 @@ export class MessagingMessagesImportCronJob {
}
}
}

console.timeEnd('MessagingMessagesImportCronJob time');
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding some temporary timers to troubleshoot performances

Copy link
Member

@Weiko Weiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this has an impact, I thought the datasource was cached already? Although this would confirm my suspicion when I saw the logs a few weeks back 🤔
LGTM!

@@ -44,6 +44,10 @@ export class FileController {
workspaceId,
);

fileStream.on('error', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would update the read function in local driver yes. I thought it was done though, my bad! @charlesBochet

@charlesBochet
Copy link
Member Author

charlesBochet commented Aug 20, 2024

as an impact, I thought the datasource was cached already? Although this would confirm my suspicion when I saw the logs a fe

@Weiko
The datasource is cached in memory but when we deploy / scale horizontally the number of worker/servers they have to rebuild it and this cause lagging.

I have found another issue in the way we invalidate the cache, I'll add it

Regarding the file driver, I would like both drivers to have the same behavior and I don't get why the s3 is throwing exceptions while the local is not ; I think the s3 driver is checking the existence of the file before streaming it, I wonder if we should actually do the same for local driver manually

@charlesBochet charlesBochet merged commit 17a1760 into main Aug 20, 2024
7 of 9 checks passed
@charlesBochet charlesBochet deleted the improve-performance-twenty-orm branch August 20, 2024 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants