This project enhances the `amazon/aws-glue-libs` Docker image to support Single Sign-On (SSO) functionality. With this enhancement, you can run `aws2 sso login` to authenticate instead of manually providing credentials for your Glue jobs.
Currently tested image: `amazon/aws-glue-libs:glue_libs_4.0.0_image_01`
For more information about the `amazon/aws-glue-libs` Docker image, see:
- Docker Container Image for AWS Glue ETL | hub.docker.com
- Developing and testing AWS Glue job scripts locally, using a Docker image | docs.aws.amazon.com/glue
- Java Adapter Integration: Builds a Java adapter that enables AWS Java SDK v2 SSO credentials providers to work with the AWS Java SDK v1 credentials provider interfaces. (Credits to Millems on aws/aws-sdk-java#803 (comment).)
- Docker Image Enhancement: Adds the necessary SSO libraries to the Docker image and updates the Hadoop configuration file to use these libraries.
- Integration of SSO support in the AWS Glue Docker image.
- Compatibility with AWS Java SDK v2 and v1 credentials providers.
- Updated Hadoop configuration for new library usage.
As mentioned in the prerequisites section of Developing and testing AWS Glue job scripts locally, using a Docker image | docs.aws.amazon.com/glue:
Before you start, make sure that Docker is installed and the Docker daemon is running. For installation instructions, see the Docker documentation for Mac or Linux. The machine running the Docker hosts the AWS Glue container. Also make sure that you have at least 7 GB of disk space for the image on the host running the Docker.
For more information about restrictions when developing AWS Glue code locally, see Local development restrictions.
- Clone the Repository
  Use the following commands to clone the repository and enter its folder:
      git clone https://github.com/jerdoe/glue_libs_sso.git
      cd glue_libs_sso
- Optional (only if using Docker Desktop): Add the workspace folder to the file-sharing resources
  - Docker Desktop -> Settings -> Resources -> File sharing -> Virtual file shares -> Browse
  - Select `glue_libs_sso/binds/workspace`, then click the `+` symbol
- Run the container
      docker compose up -d
- Configure SSO
      # Do not miss the quotes after `bash -lc`
      docker compose exec aws-glue bash -lc 'aws2 configure sso'
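Once the profile is configured, later sessions presumably only need a fresh `aws2 sso login` when the cached token has expired. A minimal sketch, assuming the Compose service is named `aws-glue` as in the command above; the function name `glue_sso_login` is hypothetical and not part of this project:

```shell
# Hypothetical convenience wrapper (not shipped with this project).
# Runs `aws2 sso login` in a login shell inside the running aws-glue service.
glue_sso_login() {
  # The single quotes keep `aws2 sso login` together as one argument to `bash -lc`.
  docker compose exec aws-glue bash -lc 'aws2 sso login'
}
```

After defining the function in your host shell, calling `glue_sso_login` from the `glue_libs_sso` folder should start the login flow for the profile the container is configured to use.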
- Optional: Specify your custom AWS profile (only if not using the default AWS profile)
  - Open `compose.yaml` in an editor:
        # Open compose.yaml in the editor
        nano compose.yaml
  - Uncomment the following line and change its value:
        ...
        services:
          aws-glue:
            ...
            environment:
              ...
              #AWS_PROFILE: "my_aws_profile"
        ...
  - Save your changes
- Configure the Glue Endpoint Region
  - Open an interactive shell inside the container, then run the configuration script from that shell:
        docker compose exec aws-glue bash -l
        configure-glue-region.py <region>   # e.g. ap-south-1
  - Alternatively, you can run the script directly from the host:
        # DO NOT MISS THE QUOTES
        docker compose exec aws-glue bash -lc 'configure-glue-region.py <region>'
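The warning about quotes follows from how `bash -c` reads its arguments: only the first argument after `-c` is treated as the command string, and any further arguments become `$0`, `$1`, and so on. The sketch below illustrates this with a harmless `echo`; it runs on any machine with bash and does not need the container:

```shell
# Quoted: the whole string 'echo one two' is the command, so both words are printed.
quoted=$(bash -c 'echo one two')

# Unquoted: the command string is just `echo`; "one" becomes $0 and "two" becomes $1,
# so echo receives no arguments and prints an empty line.
unquoted=$(bash -c echo one two)

printf 'quoted=[%s] unquoted=[%s]\n' "$quoted" "$unquoted"
# prints: quoted=[one two] unquoted=[]
```

The same rule applies to `bash -lc`, so without the quotes the script would be started without its `<region>` argument.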
- Run your tasks
  You can run tasks in an interactive shell or via the `docker compose exec` command:
      # The file `sample.py` on the host would be located at `glue_libs_sso/binds/workspace/src/sample.py`
      # and automatically mapped to `/home/glue_user/workspace/src/sample.py` inside the container.
      # Since the container's default working directory is `/home/glue_user/workspace`,
      # you only need to reference files starting from the `src` directory.

      # DO NOT MISS THE QUOTES
      docker compose exec aws-glue bash -lc 'spark-submit src/sample.py'

      docker compose exec aws-glue bash -lc pyspark

      # DO NOT MISS THE QUOTES
      docker compose exec aws-glue bash -lc '~/jupyter/jupyter_start.sh'
- You might want to adjust the following settings by editing the `compose.yaml` file:
  - This Docker image lets you configure your SSO profile inside the container by running `aws2 configure sso` and log in using `aws2 sso login`. If you prefer to reuse your existing AWS configuration from the host, uncomment the following line. In that case, you may need to run `chmod g+rw` on the `~/.aws/sso/cache/xxxxx.json` file or on the `~/.aws/sso/cache` folder so that credentials refresh correctly. This is necessary because the host user ID may not match the Glue user ID (10000), which could prevent the container from updating the JSON file. Granting write access to the owning group can resolve this issue.
        ...
        services:
          aws-glue:
            ...
            volumes:
              ...
              #- vol_aws_custom:${GLUE_AWS}
        ...
  - To automatically start the Jupyter server when running `docker compose up`, uncomment the following line.
        ...
        services:
          aws-glue:
            ...
            #command: ["${GLUE_HOME}/jupyter/jupyter_start.sh"]
  - To specify a custom path for your workspace on the host (instead of the default `./binds/workspace/`), update the value of the `device` option.
        ...
        volumes:
          vol_workspace:
            ...
            driver_opts:
              ...
              device: "./binds/workspace/"
        ...
This project is licensed under the MIT License but depends on components licensed under the Apache 2.0 License. See the LICENSE-MIT, LICENSE-APACHE-2.0, and NOTICE files.