Skip to content
This repository has been archived by the owner on Apr 8, 2024. It is now read-only.

Python for Athena not working #844

Open
jmon-fluent opened this issue May 3, 2023 · 3 comments
Open

Python for Athena not working #844

jmon-fluent opened this issue May 3, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@jmon-fluent
Copy link

Here's a link to the post I made on the DBT help site:
https://discourse.getdbt.com/t/python-dbt-fal-with-athena-error-database-does-not-exist/8051/4

Describe the bug
I tried following the directions here(dbt with AWS Athena and Python models. 1), but I’m getting an error.

I have dbt-fal (v 1.4.7) and dbt-athena-community (v1.4.3) installed in my conda environment

Your environment

  • OS: WSL Ubuntu 22.04
  • Paste the following commands output:
fal --version
dbt --version
# Core:
#  - installed: 1.4.6
#  - latest:    1.5.0 - Update available!
# Plugins:
#  - postgres:         1.4.5 - Update available!
#  - fal:              1.4.7 - Update available!
#  - athena:           1.4.3 - Ahead of latest version!
#  - fal_experimental: 1.4.7 - Could not determine latest version

How to reproduce
Pretty much just run dbt run with your python model for Athena

Expected behavior
A new table in the specified location in Athena

Actual behavior
I get this error
sqlalchemy.exc.OperationalError: (pyathena.error.OperationalError) [ErrorCategory:USER_ERROR, ErrorCode:SYNTAX_ERROR], Detail:FAILED: SemanticException [Error 10072]: Database does not exist: awsdatacatalog

What I noticed
So I went to the athena.py script in the athena/fal_experimental/support path and started tinkering with the code. I was actually able to get it to "work" by changing some of the variables. I put work in quotes because I don't know if there are any other effects that my code has changed.

But pretty much needed to change references to "schema" to "table" in some places and then I needed to add a blackslash to the staging directory.

I don't mind doing this work and creating a PR, but I've never contributed to someone else's github before, so I'm a little apprehensive to do so.

@jmon-fluent jmon-fluent added the bug Something isn't working label May 3, 2023
@burkaygur
Copy link
Contributor

Hi @jmon-fluent ! Feel free to do a PR and we will help you merge it.

@jmon-fluent
Copy link
Author

Hi @jmon-fluent ! Feel free to do a PR and we will help you merge it.

Thanks! I actually just realized some things that maybe I should discuss first

  • The final table (after the temp table), seems to get placed in a the s3_stagng_directory instead of the full table path, like the sql tables
    • E.g. SQL table location= s3://staging_directory/tables/table_name vs Python table location = s3://staging_directory
    • Would it make sense to update the code so that it matches the SQL s3 location?
  • I realize that with python, the temp table is created by inserting the data frame values in 1 by 1. One alternative would be to have the data frame write directly to the s3 location as a csv or parquet file, and the temp Athena table just needs to point at this location. Are there any reasons the latter method would be undesirable?
    • One reason I ask is because "Drop Table" in Athena just removes the table reference, but the data still persists, so when we call drop_relation_if_it_exists(adapter, temp_relation) and then dataframe.to_sql(...), the temp table has all the old records and the new records
    • If we just overwrite the file, we'd make sure the temp table has only the new records
  • I haven't checked what incremental mode looks like, so I'll probably ask about that when I get to that
    Please let me know any thoughts or concerns any of you might have

@occulkot
Copy link

Just for update, dbt-fal-1.5.9 is still unable to work with dbt-athena-community. Currently problem appears on resolving database/schema/table in ref/source functions. For me it claims table does not exist and in compiled code it shows quoted reference:
refs = {"v_enrolled": "\"awsdatacatalog\".\"partner_dashboards\".\"v_enrolled\""}
or
sources = {"analytics.user_programs": "\"awsdatacatalog\".\"analytics\".\"user_programs\""}

Similar issue was found in here #781

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants