Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE]: Migrate spark.table("db.table") to spark.table("catalog.db.table") #1082

Closed
1 task done
nfx opened this issue Mar 21, 2024 · 8 comments
Closed
1 task done
Labels
enhancement New feature or request migrate/code Abstract Syntax Trees and other dark magic migrate/jobs Step 5 - Upgrading Jobs for External Tables

Comments

@nfx
Copy link
Collaborator

nfx commented Mar 21, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Problem statement

Every table in UC needs a catalog

Proposed Solution

Transform AST/CST with the migrated table index using the fixer framework declared in #1067

  • spark.table(...)
  • spark.read.table(...)
  • ...write.saveAsTable(...)
  • ...

Add another Linter/Fixer to https://github.com/databrickslabs/ucx/blob/main/src/databricks/labs/ucx/code/pyspark.py

Additional Context

No response

@nfx nfx added enhancement New feature or request migrate/jobs Step 5 - Upgrading Jobs for External Tables labels Mar 21, 2024
@nfx nfx added this to UCX Mar 21, 2024
@github-project-automation github-project-automation bot moved this to Triage in UCX Mar 21, 2024
@nfx nfx added the migrate/code Abstract Syntax Trees and other dark magic label Mar 21, 2024
@ericvergnaud
Copy link
Contributor

As the list of candidate function calls eligible to migration grows, our currently minimalistic approach (checking just the name and arguments of the function being called) might increase the risk of unwanted migrations.
Also, if the argument is not a string literal, the migration will be skipped.
Is that something we want to cater for, and if it is, should it be done as part of this ticket ?

@nfx
Copy link
Collaborator Author

nfx commented Apr 1, 2024

As the list of candidate function calls eligible to migration grows, our currently minimalistic approach (checking just the name and arguments of the function being called) might increase the risk of unwanted migrations.

that's correct.

Also, if the argument is not a string literal, the migration will be skipped.

created a separate issue to track this:

@nfx
Copy link
Collaborator Author

nfx commented Apr 12, 2024

@jimidle ^

@ericvergnaud
Copy link
Contributor

@jimidle you could leverage what I did for sys.path in PythonLinter once it's merged to main - fyi @nfx

@jimidle
Copy link
Contributor

jimidle commented Apr 14, 2024

@jimidle you could leverage what I did for sys.path in PythonLinter once it's merged to main - fyi @nfx

OK - I will take a look once you merge it. @nfx Do you have a preference for using the fixer framework over Eric's suggestion?

@ericvergnaud
Copy link
Contributor

ericvergnaud commented Apr 14, 2024

My suggestion is for improved linting i.e. #1205, not for fixing. It's now merged.

@nfx
Copy link
Collaborator Author

nfx commented Apr 14, 2024

@jimidle @ericvergnaud sys.path manipulation and this feature are orthogonal. We already have the fixer framework and the example is with sql queries.

The implementation that only looks at string constants is trivial and should not take more than few hours to implement and test.

@nfx
Copy link
Collaborator Author

nfx commented Apr 16, 2024

fixed in #1210

@nfx nfx closed this as completed Apr 16, 2024
@github-project-automation github-project-automation bot moved this from Month Backlog to Archive in UCX Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request migrate/code Abstract Syntax Trees and other dark magic migrate/jobs Step 5 - Upgrading Jobs for External Tables
Projects
Archived in project
Development

No branches or pull requests

3 participants