
DockerSwarmOperator does not work well with Python's tqdm when displaying logs #40571

bladerail opened this issue Jul 3, 2024 · 5 comments
bladerail commented Jul 3, 2024

Apache Airflow Provider(s)

docker

Versions of Apache Airflow Providers

apache-airflow-providers-docker==3.10.0

Apache Airflow version

2.9.1-python3.11

Operating System

CentOS 7

Deployment

Other Docker-based deployment

Deployment details

I am running a Docker Swarm environment consisting of 3 manager hosts and 10 worker hosts, all running CentOS 7 with Docker version 24.0.7.

What happened

We run our DAGs as Python scripts packaged inside Docker containers. When such a script uses the tqdm progress bar, the DAG is marked as failed because an exception is raised in DockerSwarmOperator's stream_new_logs method, which expects every log message to begin with a timestamp. The Docker container itself actually continues running to completion, but the DAG has already been marked failed because of the raised exception.

As a result, the container/service does not get cleaned up, the logs for a successful run are not displayed, and the DAG mistakenly shows up as a failed run.

An example error log has been attached.
error.txt
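
For illustration, here is a minimal sketch (not the provider's actual implementation) of why a parser that assumes every streamed chunk starts with a timestamp breaks on tqdm output: tqdm redraws its bar with carriage returns rather than newlines, so some chunks arrive without a leading timestamp.

from datetime import datetime


def split_timestamp(line: str) -> tuple[datetime, str]:
    # Hypothetical sketch, not the provider's code: assume every streamed
    # log line looks like "<ISO-8601 timestamp> <message>".
    ts, _, msg = line.partition(" ")
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z"), msg


# A normal docker log line parses fine:
split_timestamp("2024-07-03T08:15:30.123456+00:00 hello from the container")

# A tqdm fragment has no leading timestamp, so the same parse raises ValueError,
# which is the kind of failure that marks the task as failed:
try:
    split_timestamp(" 42%|####      | 4200/10000 [00:02<00:03, 1800.00it/s]")
except ValueError as exc:
    print(f"parse failed: {exc}")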

What you think should happen instead

Since the Python script inside the Docker container actually runs to completion successfully, Airflow should display this as a successful run instead of a failure.
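
One possible direction, shown only as a rough sketch and not as a claim about how the provider should implement it: tolerate chunks that do not begin with a timestamp (such as tqdm's carriage-return redraws) instead of raising. The helper below is hypothetical.

import re
from datetime import datetime

TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z?")


def split_log_chunk(chunk: str, last_seen: datetime) -> tuple[datetime, str]:
    # Hypothetical tolerant variant: reuse the previous timestamp for chunks
    # that lack one instead of failing the whole task.
    match = TIMESTAMP_RE.match(chunk)
    if match is None:
        return last_seen, chunk
    # Drop a trailing "Z" and truncate to microsecond precision so strptime
    # can handle docker's nanosecond timestamps.
    ts_str = match.group(0).rstrip("Z")[:26]
    ts = datetime.strptime(ts_str, "%Y-%m-%dT%H:%M:%S.%f")
    return ts, chunk[match.end():].lstrip()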

How to reproduce

  1. Create a Docker Swarm.
  2. Start Airflow with default settings.
  3. Create a Docker image that runs the following Python code.
from tqdm import tqdm


if __name__ == "__main__":
    for i in tqdm(range(10000), total=10000):
        if i % 5 == 0:
            print(i)
  4. Create a DAG to run the Docker image created in step 3 (a minimal DAG sketch is shown below, after step 5).
  5. Run the DAG created in step 4.
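
For step 4, here is a minimal DAG sketch. The image name is a placeholder for the image built in step 3, and the operator arguments reflect my understanding of apache-airflow-providers-docker 3.10.0, so they may need adjusting for your environment.

from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker_swarm import DockerSwarmOperator

with DAG(
    dag_id="tqdm_swarm_demo",
    start_date=datetime(2024, 7, 1),
    schedule=None,
    catchup=False,
) as dag:
    # "my-registry/tqdm-demo:latest" is a placeholder for the image from step 3.
    run_tqdm = DockerSwarmOperator(
        task_id="run_tqdm",
        image="my-registry/tqdm-demo:latest",
        enable_logging=True,  # streaming the service logs is what triggers the issue
    )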

Anything else

I am attaching several screenshots below. They show different runs of the same DAG and demonstrate that it fails at different points. This means that, for processes where the tqdm progress bar is short enough, the DAG can actually pass without being affected by the stream_new_logs issue. I created a Python script that runs tqdm from 0 to 10000, printing every 5th iteration. The screenshots below show that Airflow is generally able to display the logs and then fails at various points, as early as 775 and as late as 8170. This DAG has never succeeded, but I was able to verify via the dead containers that they all ran to completion (docker logs ${container_id} shows that tqdm ran all the way to 10000).

Sample Run 1 (screenshot: Demo1)
Sample Run 2 (screenshot: Demo2)
Sample Run 3 (screenshot: Demo3)
Sample Run 4 (screenshot: Demo4)

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

bladerail added the area:providers, kind:bug, and needs-triage labels on Jul 3, 2024

boring-cyborg bot commented Jul 3, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.

potiuk added the good first issue label and removed the needs-triage label on Jul 3, 2024
potiuk (Member) commented Jul 3, 2024

If you would like to take a stab at it @bladerail -> feel free to attempt a PR; otherwise, I have marked it as "good first issue" for someone to volunteer to do.

mr1holmes commented

Hey @potiuk,
I would like to work on this.
Can you please assign it to me?

mr1holmes commented

@potiuk
You assigned it to @bladerail.
I wanted to make sure it wasn’t a mix-up, as I’m still interested in working on it.
Could it please be reassigned to me if it’s not actively being worked on?

potiuk assigned mr1holmes and unassigned bladerail on Jul 7, 2024
potiuk (Member) commented Jul 7, 2024

Oups :)
