Persistence Component: Schema Evolution Enhancements and Bug Fixes #1523

Merged
merged 102 commits into from
Mar 27, 2023

@prasar-ashutosh prasar-ashutosh commented Mar 15, 2023

What type of PR is this?

Choose one of the following labels:

  • Improvement
  • Bug Fix

What does this PR do / why is it needed ?

  • Add schema evolution support for Snowflake by adding its capability set and its implicit/explicit data type mappings.
  • Allow users to provide a custom capability set for schema evolution, and return the list of schema evolution SQL statements that were performed, for the user's reference.
  • Add the ability to derive the main dataset from the staging dataset.
  • Add the ability to derive the main dataset from the database using a JDBC connection.
  • Add unit tests and end-to-end tests for different scenarios.
  • Add case conversion support for datasets and ingest mode.
  • Fix bugs in executor mode.
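
To make the capability-set idea above concrete, here is a minimal, self-contained sketch. It is not the persistence component's API: the class `SchemaEvolutionSketch`, the `Column` type, the `evolve` method, and the two capability names are hypothetical, chosen only for illustration, and the generated SQL is generic rather than Snowflake-specific. It compares a staging schema with a main schema, checks each required change against the capabilities the caller allows, and returns the list of SQL statements it would run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from legend-engine.
class SchemaEvolutionSketch {
    // Assumed capability names, for illustration only.
    enum Capability { ADD_COLUMN, DATA_TYPE_SIZE_CHANGE }

    // Simplified column description: a type name plus a length.
    static final class Column {
        final String type;
        final int length;

        Column(String type, int length) {
            this.type = type;
            this.length = length;
        }

        String render() {
            return type + "(" + length + ")";
        }
    }

    // Returns the SQL needed to bring `main` in line with `staging`,
    // failing if a required change is outside the allowed capability set.
    static List<String> evolve(String table,
                               Map<String, Column> main,
                               Map<String, Column> staging,
                               Set<Capability> allowed) {
        List<String> sqls = new ArrayList<>();
        for (Map.Entry<String, Column> entry : staging.entrySet()) {
            String name = entry.getKey();
            Column wanted = entry.getValue();
            Column existing = main.get(name);
            if (existing == null) {
                // Column is missing from the main table.
                require(allowed, Capability.ADD_COLUMN, name);
                sqls.add("ALTER TABLE " + table + " ADD COLUMN " + name + " " + wanted.render());
            } else if (!existing.type.equals(wanted.type)) {
                // Type conversions are deliberately left out of this sketch.
                throw new IllegalStateException("Type conversion not covered by this sketch: " + name);
            } else if (wanted.length > existing.length) {
                // Same type, but the staging column is wider.
                require(allowed, Capability.DATA_TYPE_SIZE_CHANGE, name);
                sqls.add("ALTER TABLE " + table + " ALTER COLUMN " + name + " " + wanted.render());
            }
        }
        return sqls;
    }

    private static void require(Set<Capability> allowed, Capability needed, String column) {
        if (!allowed.contains(needed)) {
            throw new IllegalStateException(needed + " is not allowed, but is required for column " + column);
        }
    }
}
```

Returning the statements instead of only executing them mirrors the PR's stated goal of giving users the performed evolution SQL for reference. A caller that passes a narrower capability set gets an exception instead of a silent schema change.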

Which issue(s) this PR fixes:

Fixes #

Other notes for reviewers:

Does this PR introduce a user-facing change?

NO

rengam32 and others added 30 commits October 25, 2022 11:20
Support for memsql create statement with shards and column store specification
…to develop-rengam-bitemp-stats

Conflicts:
	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-logical-plan/src/main/java/org/finos/legend/engine/persistence/components/planner/BitemporalDeltaPlanner.java
…ts for bitemporal milestoning statistics collection
…to rengam-infinite-batchid-value

Conflicts:
	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-logical-plan/src/main/java/org/finos/legend/engine/persistence/components/planner/BitemporalDeltaPlanner.java
@prasar-ashutosh prasar-ashutosh changed the title from "Zhlizh construct dataset" to "Persistence Component: Schema Evolution Enhancements and Bug Fixes" Mar 15, 2023

github-actions bot commented Mar 15, 2023

Test Results

   406 files  ±0     406 suites  ±0   55m 20s ⏱️ -53s
 9 054 tests  ±0   8 729 ✔️ ±0   325 💤 ±0   0 ❌ ±0
10 745 runs   ±0  10 354 ✔️ ±0   391 💤 ±0   0 ❌ ±0

Results for commit c412d61. ± Comparison against base commit 362097d.

♻️ This comment has been updated with latest results.

@kumuwu kumuwu requested a review from a team as a code owner March 16, 2023 06:32

@rafaelbey rafaelbey left a comment

This is a big change. The only obvious thing I noticed is that the optimizer always ends up casting the result. I wonder whether the classes can be generalized?

@rafaelbey rafaelbey merged commit 4bcb8b7 into finos:master Mar 27, 2023
ghub-real pushed a commit to ghub-real/legend-engine that referenced this pull request Mar 28, 2023
…inos#1523)

* Support for memsql create statement with shards and column store specification

* Implementing statistics collection for Bitemporal Delta milestoning

* Implementing configurable infinite batch Id Value and adding ANSI tests for bitemporal milestoning statistics collection

* Refactoring assert statements for statistics + changes due to rebase with master

* Adding schema evolution compatibility and data mapping for snowflake + accepting user provided capabilities as a parameter

* Add one test

* Add more tests

* Add second passes

* Construct dataset object from database

* Update snowflake data type mapping to handle length and scale

* Fix validate primary keys

* Update implicit and explicit mappings

* Fix bugs for snowflake end-to-end ingest

* Fixing h2 end to end tests with schema evolution

* Fix checkstyle

* Auto Schema Derivation from Staging

* Fix the ingestor class to derive the staging schema

* Format the class RelationalIngestor

* Add test for Scenario with no fields in main table

* Fix all tests

* Modify tests for auto deriving schema for main table in H2

* Fix snowflake alter

* Clean up method definitions

* Clean up tests and TODOs

* Change snowflake explicit mapping

* Introduce DATA_TYPE_SCALE_CHANGE capability

* Disable snowflake explicit data type changes

* Code for Handling Case in ingest mode

* provide method for datasets case conversion

* provide method for datasets case conversion + unit tests

* Adding case optimization across datasets and ingestMode in SQL and ingestor mode

* Upper case handling in visitors

* Code refactor

* Adding some code for debug

* Fix for Upper case tests

---------

Co-authored-by: Mythreyi <[email protected]>
Co-authored-by: Mythreyi <[email protected]>
Co-authored-by: kumuwu <[email protected]>
Co-authored-by: Zhang Lizhi <[email protected]>
rafaelbey pushed a commit that referenced this pull request Mar 30, 2023
* added the MongoDbExecutionNode

* Execution plan processing

* mapping model & execution node protocol files.  And execution plugin setup (wip)

* Execute plan against database

* Fix dependency in maven parent pom

* Merge Theo/Hugo's work on execution plan & fix up execution plan module errors

* fixed type in extensions

* Grammar integration module & compiler module

* Grammar integration setup.  Compiler via Translator, not working yet - so commented out

* Merge MongoDB executor integration with Server

* Duplicate dependency warning fixed

* Fix diagram & move serializer to the right module

* Update version to match upstream master

* In Memory mongo setup

* fix checkstyle warnings

* Merge Theo & Hugo's branch for passing in executionPlan from pure to legend-engine.  Fixed json _type attribute with custom serializer instead of mixin

* Legend java binding setup

* replace the _type first character from uppercase to lowercase

* Consistent module names

* Add GraphExecution node classes

* Updated Node specifics interface

* Work around maven false positives

* Remove local debug setup

* Update version numbers to match master

* added dependency in mongodb-executionPlan

* Fix pom dependency issues & binding codegen

* wiring of the mongodb binding extensions in the coreExtensions

* checking in these test execution plans for reference

* graph fetch query can get data from local mongo

* Test for json externalize

* removed unnecessary dependencies

* Remove unused code

* Added changes to handle SystemPropertiesSecret credentials
Added test class to spin up mongoDB with some test data (Person collection)

checked in setup.pure and welcome.pure files

* Make firm entity optional and still be able to generate pojo based json input

* Json internalize test to adapt to mongo execution result

* Fix up dependency errors

* removing these

* legend-engine-xt-nonrelationalStore-mongodb-executionPlan/pom.xml

* Dependency error post Hugo's branch merge, and clean up credentials

* Checkstyle warnings

* Test setup for demo

* Log db responses, and remove json parsing code - as bson already has all the fields necessary

* Pass in credentialsgit add legend-engine-xt-nonrelationalStore-mongodb-executionPlan/src/main/java/org/finos/legend/engine/plan/execution/stores/mongodb/MongoDBExecutor.java legend-engine-xt-nonrelationalStore-mongodb-executionPlan/src/main/java/org/finos/legend/engine/plan/execution/stores/mongodb/auth/MongoDBStoreConnectionProvider.java, and remove unused code

* Make the function interface correct, to map to concepts

* get the mongo query project fields from graphFetch

* removed unnecessary pattern

* Fix dependency errors

* test data consistency

* Fix field names

* Move query execution code also to test setup

* fix: field names to project from the actual field instead of the domain class

* Fix testsetup to compile; the extensions parameter as array works only in the IDE, not in the compiler

* setup legend compile from pure.ide

* Binding setup for legend compile

* grammar integration wip

* Move grammar integration into grammar module

* WIP: can stream the graph fetch results back to the PURE IDE

* Parser/composer for legend engine

* Make server aware of mongodb parser

* Add missing pom

* empty resources folder

* Add antlr gen folder to .ignore

* fix inadvertent typo

* WIP: can filter on nested properties, working on nested project

* Can filter on one level nested property, can create correct project mongo query for nested properties but the PURE IDE can display only top level

* merged master and cleanup

* Add store compiler + test

* Fix name property & remove logging

* can get nested properties in PURE IDE, using jackson only for JSON

* Compile nested property structure

* objectType extends from BaseType, formatting in MongoDBExecutionNodeExecutor

* data-space: more analytics improvements (#1570)

* data-space: improve specs and analytics

* minor cleanups

* activate Reproducible Builds (#1337)

Signed-off-by: Hervé Boutemy <[email protected]>
Co-authored-by: Kevin Knight <[email protected]>

* data-space: minor adjustment to analytics (#1571)

* [maven-release-plugin] prepare release legend-engine-4.4.5

* [maven-release-plugin] prepare for next development iteration

* data-space: improve service executable info (#1573)

* [maven-release-plugin] prepare release legend-engine-4.4.6

* [maven-release-plugin] prepare for next development iteration

* Legend SQL - introduce pluggable source providers (#1560)

* Legend SQL - introduce pluggable source providers

---------

Co-authored-by: gs-jp1 <[email protected]>

* Legend SQL - further handling of aggregate expressions (#1579)

* Persistence Component: Schema Evolution Enhancements and Bug Fixes (#1523)

* update version number to match upstream

* remove firm field from testSetup

* added README and cleaned up the setup for the mongo execute

* Add connection grammar

* cleanup and updated README with VM options instructions

* executing the query through the new legend::execute method - updated the corresponding README

* Mapping parser

* Connection parser

* Update engine version

* Connection parser round trip

* remove compiler module, part of grammar-integration now

* Fix checkstyle errors

* Fix tests file as we are now excluding ID column by default

* Remove test files

* Compiler for connection

* Update version number

* missing dependency

* avoid creating credentialproviderprovider - test code should have been removed earlier

* Pull credential provider from execution state rather than store state

* Missing constructor

* Update version to 4.5.1

* Fix tests to add new code repositories to expected list

* Revert removing the dependency version

---------

Signed-off-by: Hervé Boutemy <[email protected]>
Co-authored-by: Theodosios Malatestas <[email protected]>
Co-authored-by: Hugo Goncalves <[email protected]>
Co-authored-by: An Phi <[email protected]>
Co-authored-by: Hervé Boutemy <[email protected]>
Co-authored-by: Kevin Knight <[email protected]>
Co-authored-by: FINOS Administrator <[email protected]>
Co-authored-by: Vignesh Manickavasagam <[email protected]>
Co-authored-by: gs-jp1 <[email protected]>
Co-authored-by: prasar-ashutosh <[email protected]>
Co-authored-by: Mythreyi <[email protected]>
Co-authored-by: Mythreyi <[email protected]>
Co-authored-by: kumuwu <[email protected]>
Co-authored-by: Zhang Lizhi <[email protected]>
@kumuwu kumuwu deleted the zhlizh-construct-dataset branch September 6, 2023 05:09