Skip to content

Conversation

@GowthamTG
Copy link

@GowthamTG GowthamTG commented Jan 16, 2025

This DatasetRegistry and ColumnCompatibilityAnalyzer functions should help us to power joins and show column suggestions based on schema

@GowthamTG GowthamTG marked this pull request as ready for review January 16, 2025 16:44
@GowthamTG GowthamTG changed the title feat: added dataset and columns registry feat(joins): added dataset and columns registry Jan 16, 2025
@GowthamTG GowthamTG force-pushed the feat/datasets_registry branch from ceec035 to 37d25ff Compare January 16, 2025 17:00
sourceType: ColumnType,
targetType: ColumnType
): number {
return this['getTypeCompatibilityScore'](sourceType, targetType);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason for the 'getTypeCompatibilityScore' syntax?

Copy link
Author

@GowthamTG GowthamTG Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes as it a private method I am not able to access it directly via this
This is only done to test the funtion

Comment on lines 54 to 56
if (compatibility.totalScore === 0) {
throw new Error('Columns are not compatible for joining');
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we computing the join path here? Should not just join based on whatever the user has selected?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we actually require this funtion because as soon as the user selects some column form the findCompatibleColumns function the consumer would have the join path

{
  sourceDatasetId: string;
  sourceColumnName: string;
  destinationDatasetId: string;
  destinationColumnName: string;
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the same will add in future

});
}

public doesJoinPathExist({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this return the joinPath if it exists, otherwise undefined? Would be more reusable that way

Comment on lines 115 to 118
typeScore,
nameScore,
schemaScore,
totalScore: typeScore + nameScore + schemaScore,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally if the values can be derived from the return type, then there is not need to precompute it. totalScore is already represented by the returned values of typeScore, nameScore, schemaScore,

package.json Outdated
"duckdb": "^0.10.2",
"express": "^4.19.2",
"fake-indexeddb": "^5.0.1",
"fast-deep-equal": "3.1.3",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we using this anywhere?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I am using this to check if the columns schema is same for any two columns

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we. use lodash instead?

@vpbs2
Copy link
Contributor

vpbs2 commented Jan 16, 2025

Can we also update the meerkat-core and meerkat-browser version

@@ -0,0 +1,126 @@
import * as deepEqual from 'fast-deep-equal';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is is possible to use lodash.deepEqual?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lodash deepEqal is way slow compared to this.
In web repo aswell we have been using the same

Screenshot 2025-01-28 at 3 02 15 PM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we already have the package, i dont want to add another one for just fast equals, we can change the package if it become a bottleneck in the future, for now lodash should be fine

export interface Column<T = object> {
name: string;
dataType: ColumnType;
schema?: T;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should schema have a defined type?

Copy link
Author

@GowthamTG GowthamTG Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Schema for us to would be DevRevSchema but for other consumer it might not be the same thats why we have defined it as configurable. Like they might follow their own filter format

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO defined type would be better here. Do we have functional support for handling generics?


export interface Column<T = object> {
name: string;
dataType: ColumnType;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

column shuold be just DimensionType right?

@GowthamTG GowthamTG force-pushed the feat/datasets_registry branch from 448649b to 09ed48c Compare January 28, 2025 11:45
@GowthamTG GowthamTG force-pushed the feat/datasets_registry branch from 1f3740b to 206ffbb Compare January 28, 2025 14:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants