Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update searchVector at label identifier update for custom fields #7588

Merged
merged 12 commits into from
Oct 15, 2024

Conversation

ijreilly
Copy link
Collaborator

By default, when custom fields are created, a searchVector field is created based on the "name" field, which is also the label identifier by default.
When this label identifier is updated, we want to update the searchVector field to use this field as searchable field instead, if it is of "searchable type" (today it is only possible to select a text or number field as label identifier, while number fields are not searchable).

);

// index needs to be recreated as typeorm deletes then recreates searchVector column at alter
await this.indexMetadataService.createIndex(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Weiko what happens in that typeOrm deletes then recreates searchVector column during update (cf PostGresQuerryRunner - changeColumn method - L751:

if (oldColumn.type !== newColumn.type ||
            oldColumn.length !== newColumn.length ||
            newColumn.isArray !== oldColumn.isArray ||
            (!oldColumn.generatedType &&
                newColumn.generatedType === "STORED") ||
            (oldColumn.asExpression !== newColumn.asExpression &&
                newColumn.generatedType === "STORED")) {
            // To avoid data conversion, we just recreate column
            await this.dropColumn(table, oldColumn);
            await this.addColumn(table, newColumn);
            // update cloned table
            clonedTable = table.clone();
        }
```)

So the index on the table is deleted as well, but the index metadata is untouched, creating a temporary inconsistancy. 
When I recreate the index right after, nothing new is saved on the index metadata table as the same information are to be saved, and the index is re-created on the table, so we are back to a consistent state.

Does that sound too fragile? This could also be a potential candidate for another patch

Copy link
Member

@Weiko Weiko Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, thanks for the details! LGTM.

@ijreilly ijreilly requested a review from Weiko October 11, 2024 10:43
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

This pull request enhances the search functionality for custom fields by updating the searchVector when the label identifier is modified.

  • Introduced new SearchModule and SearchService to handle search vector updates
  • Updated FieldMetadataService and ObjectMetadataService to integrate with the new search functionality
  • Modified WorkspaceMigrationRunnerService to support generated columns for searchVector
  • Added utility functions in is-searchable-field.util.ts to determine searchable field types
  • Improved getTsVectorColumnExpressionFromFields to filter and validate searchable fields

9 file(s) reviewed, 10 comment(s)
Edit PR Review Bot Settings | Greptile

{
action: WorkspaceMigrationColumnActionType.ALTER,
currentColumnDefinition: {
columnName: currentFieldMetadata.name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: consider using computeColumnName(currentFieldMetadata) for consistency with handleCreateAction

defaultValue: undefined,
},
alteredColumnDefinition: {
columnName: alteredFieldMetadata.name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: consider using computeColumnName(alteredFieldMetadata) for consistency with handleCreateAction

Comment on lines +25 to +27
const filteredFieldsUsedForSearch = fieldsUsedForSearch.filter((field) =>
isSearchableFieldType(field.type),
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: This filtering is redundant since the input type is already constrained to SearchableFieldType

Comment on lines 33 to 35
const columnExpressions = fieldsUsedForSearch.flatMap(
getColumnExpressionsFromField,
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Use filteredFieldsUsedForSearch instead of fieldsUsedForSearch here

Comment on lines +29 to +31
if (filteredFieldsUsedForSearch.length < 1) {
throw new Error('No searchable fields found');
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Consider using a more specific error type, such as WorkspaceMigrationException

Comment on lines +3 to +9
const SEARCHABLE_FIELD_TYPES = [
FieldMetadataType.TEXT,
FieldMetadataType.FULL_NAME,
FieldMetadataType.EMAILS,
FieldMetadataType.ADDRESS,
FieldMetadataType.LINKS,
] as const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Consider using an enum or object for SEARCHABLE_FIELD_TYPES to improve type safety and maintainability

this.logger.log(`Running command for workspace ${workspaceId}`);

try {
const searchVectorFields = await this.fieldMetadataRepository.findBy({
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Weiko I could add some very specific logic here to filter out fields whose entry in _typeorm_generated_columns_and_materialized_views has a value containing "CASE" (it would imply declaring/creating the repository etc for this entity that we kind of decided to set aside so far). This would make the command idempotent, while it is not completely the case here - if we run the command multiple times the output will be the same but we will have re-created the column and index.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine recreating everything as it will be executed only once. We can even go further and truncate the _typeorm_generated_columns_and_materialized_views table

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, as we just discussed I will do the truncate manually on the db so I can first test on a workspace independently.

import { WorkspaceMigrationRunnerModule } from 'src/engine/workspace-manager/workspace-migration-runner/workspace-migration-runner.module';
import { WorkspaceSyncMetadataCommandsModule } from 'src/engine/workspace-manager/workspace-sync-metadata/commands/workspace-sync-metadata-commands.module';

@Module({
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far I did not added this command to an upgrade-0.32 set of command as we would only want to run it twice and not to run it for self hosting

this.logger.log(`Running command for workspace ${workspaceId}`);

try {
const searchVectorFields = await this.fieldMetadataRepository.findBy({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine recreating everything as it will be executed only once. We can even go further and truncate the _typeorm_generated_columns_and_materialized_views table

workspaceId,
);

if (isSearchEnabled && isWorkspaceMigratedForSearch) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: remove feature flag in next PR, shouldn't be necessary anymore

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in this PR !

...(isDefined(indexType) ? { indexType: indexType } : {}),
isCustom: isCustom,
});
result = await this.indexMetadataRepository.upsert(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename the method upsertIndex? @ijreilly

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't - it still creates the index, the upsert only concerns the entry in metadata table. but a migration to create the index is added anyway

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right @ijreilly!

Copy link
Member

@Weiko Weiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good work!

@ijreilly ijreilly merged commit 1de7391 into main Oct 15, 2024
14 of 16 checks passed
@ijreilly ijreilly deleted the search-update-label-identifier branch October 15, 2024 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants