feat(joins): added dataset and columns registry #127

GowthamTG · 2025-01-16T16:13:31Z

The DatasetRegistry and ColumnCompatibilityAnalyzer should help us power joins and show column suggestions based on schema.

zaidjan-devrev · 2025-01-16T17:03:20Z

meerkat-core/src/column-compatibility-analyzer/column-compatibility-analyzer.spec.ts

+    sourceType: ColumnType,
+    targetType: ColumnType
+  ): number {
+    return this['getTypeCompatibilityScore'](sourceType, targetType);


any reason for the 'getTypeCompatibilityScore' syntax?

Yes. Since it is a private method, I cannot access it directly via `this.getTypeCompatibilityScore`.
The bracket syntax is used only to test the function.
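For readers unfamiliar with the pattern, here is a minimal sketch of how bracket notation reaches a private method from a test-only subclass. The `ColumnType` values, the scoring rule, and the `TestableAnalyzer` name are assumptions for illustration, not the code in this PR.

```typescript
// Hypothetical stand-in; the real ColumnType lives in meerkat-core.
type ColumnType = 'string' | 'number' | 'boolean';

class ColumnCompatibilityAnalyzer {
  // Private: `analyzer.getTypeCompatibilityScore(...)` does not compile outside the class.
  private getTypeCompatibilityScore(
    sourceType: ColumnType,
    targetType: ColumnType
  ): number {
    // Assumed rule for illustration only: identical types score 1, otherwise 0.
    return sourceType === targetType ? 1 : 0;
  }
}

// Test-only subclass that exposes the private method through a public wrapper.
class TestableAnalyzer extends ColumnCompatibilityAnalyzer {
  public testGetTypeCompatibilityScore(
    sourceType: ColumnType,
    targetType: ColumnType
  ): number {
    // Element access with a string literal bypasses the compile-time `private` check.
    return this['getTypeCompatibilityScore'](sourceType, targetType);
  }
}

const analyzer = new TestableAnalyzer();
const sameTypeScore = analyzer.testGetTypeCompatibilityScore('string', 'string');
const mixedTypeScore = analyzer.testGetTypeCompatibilityScore('string', 'number');
```

The trade-off is that the string key is only checked against the class's members, so the wrapper should stay in the spec file rather than in production code.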

zaidjan-devrev · 2025-01-16T17:05:52Z

meerkat-core/src/column-compatibility-analyzer/column-compatibility-analyzer.ts

+    if (compatibility.totalScore === 0) {
+      throw new Error('Columns are not compatible for joining');
+    }


Are we computing the join path here? Should we not just join based on whatever the user has selected?

I am not sure we actually need this function: as soon as the user selects a column from the results of findCompatibleColumns, the consumer already has the join path:

{ sourceDatasetId: string; sourceColumnName: string; destinationDatasetId: string; destinationColumnName: string; }

I have removed it and will add it back in the future if needed.

zaidjan-devrev · 2025-01-16T17:06:26Z

meerkat-core/src/column-compatibility-analyzer/column-compatibility-analyzer.ts

+    });
+  }
+
+  public doesJoinPathExist({


should this return the joinPath if it exists, otherwise undefined? Would be more reusable that way
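A minimal sketch of what this suggestion could look like. The `JoinPath` shape is taken from the discussion above; the `findJoinPath` name, its parameters, and the sample data are hypothetical and not part of this PR.

```typescript
// Shape quoted in the review thread above.
interface JoinPath {
  sourceDatasetId: string;
  sourceColumnName: string;
  destinationDatasetId: string;
  destinationColumnName: string;
}

// Hypothetical lookup: returns the matching path, or undefined when none exists.
function findJoinPath(
  knownPaths: JoinPath[],
  sourceDatasetId: string,
  destinationDatasetId: string
): JoinPath | undefined {
  return knownPaths.find(
    (path) =>
      path.sourceDatasetId === sourceDatasetId &&
      path.destinationDatasetId === destinationDatasetId
  );
}

// Sample data, invented for illustration.
const knownPaths: JoinPath[] = [
  {
    sourceDatasetId: 'orders',
    sourceColumnName: 'customer_id',
    destinationDatasetId: 'customers',
    destinationColumnName: 'id',
  },
];

const found = findJoinPath(knownPaths, 'orders', 'customers');
const missing = findJoinPath(knownPaths, 'orders', 'products');

// Callers that only need a boolean can still derive it in one expression.
const joinPathExists = found !== undefined;
```

Returning the path keeps the existence check available while sparing callers a second lookup.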

zaidjan-devrev · 2025-01-16T17:08:15Z

meerkat-core/src/column-compatibility-analyzer/column-compatibility-analyzer.ts

+      typeScore,
+      nameScore,
+      schemaScore,
+      totalScore: typeScore + nameScore + schemaScore,


Generally, if a value can be derived from the other returned values, there is no need to precompute it. totalScore is already represented by the returned typeScore, nameScore, and schemaScore.
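One way to act on this, sketched under assumptions: return only the three component scores and derive the total on demand. The `CompatibilityScore` and `getTotalScore` names and the sample values are hypothetical.

```typescript
// Only the independent components are stored.
interface CompatibilityScore {
  typeScore: number;
  nameScore: number;
  schemaScore: number;
}

// The total is derived when needed, so it can never drift out of sync
// with the components it is computed from.
function getTotalScore(score: CompatibilityScore): number {
  return score.typeScore + score.nameScore + score.schemaScore;
}

// Sample values, invented for illustration.
const score: CompatibilityScore = { typeScore: 2, nameScore: 1, schemaScore: 0 };
const total = getTotalScore(score);
```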

zaidjan-devrev · 2025-01-16T17:10:09Z

package.json

    "duckdb": "^0.10.2",
    "express": "^4.19.2",
    "fake-indexeddb": "^5.0.1",
+    "fast-deep-equal": "3.1.3",


Are we using this anywhere?

Yes, I am using it to check whether the schemas of any two columns are the same.

Can we use lodash instead?

vpbs2 · 2025-01-16T18:30:26Z

Can we also update the meerkat-core and meerkat-browser versions?

zaidjan-devrev · 2025-01-28T08:35:41Z

meerkat-core/src/column-compatibility-analyzer/column-compatibility-analyzer.ts

@@ -0,0 +1,126 @@
+import * as deepEqual from 'fast-deep-equal';


Is it possible to use lodash's deep equality check (`isEqual`) instead?

lodash's deep equality check is much slower than this.
We have been using the same package in the web repo as well.

Since we already have lodash as a dependency, I don't want to add another package just for faster equality checks. We can switch packages if it becomes a bottleneck in the future; for now, lodash should be fine.

zaidjan-devrev · 2025-01-28T08:38:32Z

meerkat-core/src/dataset-registry/types.ts

+export interface Column<T = object> {
+  name: string;
+  dataType: ColumnType;
+  schema?: T;


Should schema have a defined type?

For us the schema would be DevRevSchema, but for other consumers it might be different; that is why we have made it configurable. For example, they might follow their own filter format.

IMO a defined type would be better here. Do we have functional support for handling generics?
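To make the trade-off concrete, here is a minimal sketch of how the generic default behaves. `Column` mirrors the diff above; `ColumnType`'s members, `ExampleSchema`, and `isFilterable` are hypothetical stand-ins, not DevRevSchema or any code in this PR.

```typescript
// Hypothetical stand-in; the real ColumnType lives in meerkat-core.
type ColumnType = 'string' | 'number' | 'boolean';

// Mirrors the interface in the diff above.
interface Column<T = object> {
  name: string;
  dataType: ColumnType;
  schema?: T;
}

// Hypothetical consumer-defined schema.
interface ExampleSchema {
  fieldType: string;
  isFilterable: boolean;
}

// With a type argument, `schema` is fully typed for that consumer.
const typedColumn: Column<ExampleSchema> = {
  name: 'owner',
  dataType: 'string',
  schema: { fieldType: 'id', isFilterable: true },
};

// Without one, `schema` falls back to `object`: any object is accepted,
// but its fields cannot be read without narrowing first.
const looseColumn: Column = {
  name: 'title',
  dataType: 'string',
  schema: { anything: 'goes' },
};

// Code that needs schema fields has to commit to a concrete schema type.
function isFilterable(column: Column<ExampleSchema>): boolean {
  return column.schema?.isFilterable ?? false;
}

const ownerIsFilterable = isFilterable(typedColumn);
```

The sketch shows both sides of the thread: the generic keeps the registry consumer-agnostic, while any logic that inspects the schema still needs a defined type at that call site.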

zaidjan-devrev · 2025-01-28T08:38:55Z

meerkat-core/src/dataset-registry/types.ts

+
+export interface Column<T = object> {
+  name: string;
+  dataType: ColumnType;


The column type should just be DimensionType, right?

GowthamTG marked this pull request as ready for review January 16, 2025 16:44

GowthamTG changed the title ~~feat: added dataset and columns registry~~ feat(joins): added dataset and columns registry Jan 16, 2025

GowthamTG force-pushed the feat/datasets_registry branch from ceec035 to 37d25ff Compare January 16, 2025 17:00

zaidjan-devrev reviewed Jan 16, 2025

zaidjan-devrev reviewed Jan 28, 2025

GowthamTG force-pushed the feat/datasets_registry branch from 448649b to 09ed48c Compare January 28, 2025 11:45

new changes

206ffbb

GowthamTG force-pushed the feat/datasets_registry branch from 1f3740b to 206ffbb Compare January 28, 2025 14:18

move to tree shaking version

ea043ba

4 participants
