Using service containers #152

Closed
jenstroeger opened this issue Jul 7, 2022 · 11 comments · Fixed by #156

Comments

@jenstroeger

jenstroeger commented Jul 7, 2022

I tried out the harden-runner action (based on this repo) and

with:
  egress-policy: audit

and it worked, until I added a PostgreSQL service container to run a few tests. It looks like traffic to that container is blocked? I tried to add

allowed-endpoints: >
  localhost:5432  # `postgres:5432` doesn’t work either

but neither of these two worked. It’s a private organization and I don’t have the privileges to install the app to check the egress audit log.

I disabled the step, and all tests pass just fine. How do you recommend I proceed?

Much thanks!

@Fich0Gl

Fich0Gl commented Jul 7, 2022

Other variants of this are 127.0.0.1:5432 and 0.0.0.0:5432. I don't know whether they are worth trying; the best way to debug this would be to add the Harden Runner App and check whether the port is open.
I'll investigate this further.

@varunsh-coder
Member

Sorry to hear that the traffic to the service container is blocked. That is not expected. In both audit and block mode, localhost traffic is not supposed to be blocked.

I looked at the documentation and see that it has examples of using a service container both with a container element (e.g. container: node:10.18-jessie) and without one. Can you please confirm whether you are using the container element or running directly on the runner machine? harden-runner is not supported when used with the container element (though it should not have blocked any traffic in that case).

Also, when using it in a private repo, you will need to install the App. Otherwise it cannot download the build log and correlate outbound traffic with each step. The App only needs the actions: read permission.
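
To illustrate the two setups being asked about, here is a minimal sketch of a service container used with and without the container element. The job names (in-container, on-runner) and the echo steps are hypothetical placeholders; the image names come from this thread. In the first variant the service is reached by its label as hostname, and in the second through the port mapped onto the runner.

```yaml
jobs:
  # Variant 1: the job itself runs in a container (the "container element").
  # Per the comment above, harden-runner is not supported in this setup.
  in-container:
    runs-on: ubuntu-latest
    container: node:10.18-jessie
    services:
      postgres:
        image: postgres:14
        env:
          POSTGRES_PASSWORD: postgres
    steps:
      - run: echo "The database is reachable at postgres:5432"

  # Variant 2: the job runs directly on the runner machine.
  on-runner:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:14
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432   # map the container port to the runner
    steps:
      - run: echo "The database is reachable at localhost:5432"
```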

@varunsh-coder
Member

@h0x0er can you please try to repro this issue on a public repo? You can use the workflow from here: https://docs.github.com/en/actions/using-containerized-services/creating-postgresql-service-containers#running-jobs-directly-on-the-runner-machine

@jenstroeger
Author

@varunsh-coder the build job that fails looks something like this:

  build:
    name: Check Python ${{ matrix.python }} on ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest]  # Enable more later.
        python: ['3.9', '3.10']
    services:
      postgres:
        image: postgres:14
        env:
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: testdb
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
        - 5432:5432
    steps:
    # Disable this step so the tests succeed.
    - name: Harden Runner
      uses: step-security/harden-runner@248ae51c2e8cc9622ecf50685c8bf7150c6e8813
      with:
        egress-policy: audit
    #     allowed-endpoints: >
    #       postgres:5432 # PostgreSQL service container
    - name: Checkout
      uses: actions/checkout@d0651293c4a5a52e711f25b41b05b2212f385d28
    - name: Set up Python
      uses: actions/setup-python@d09bd5e6005b175076f227b13d9730d56e9dcfcb
      with:
        python-version: ${{ matrix.python }}
    - name: Install dependencies
      run: make setup
    - name: Run tests
      run: make test
      # The tests use SQLAlchemy as ORM, and connecting to the db fails.

The Action log shows the following error:

E       sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (::1), port 5432 failed: Connection refused
E       	Is the server running on that host and accepting TCP/IP connections?
E       connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
E       	Is the server running on that host and accepting TCP/IP connections?
E       
E       (Background on this error at: https://sqlalche.me/e/14/e3q8)

When I comment out the harden-runner step, all tests pass as expected.

@h0x0er
Member

h0x0er commented Jul 12, 2022

@varunsh-coder I have completed my investigation. The error does occur because the Docker daemon is restarted. To fix this issue, we just need to add the extra flag --restart always to the service options. Check here

Check out this workflow

@jenstroeger after applying the fix below, the workflow will run normally with harden-runner.

        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
          --restart always
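
For context, the service definition from the failing job might look as follows with the flag applied. This is only a sketch that combines the snippet above with the job posted earlier in the thread and assumes everything else stays unchanged. The options string is passed to Docker when the container is created, so --restart always asks the daemon to start the container again after the daemon itself restarts.

```yaml
services:
  postgres:
    image: postgres:14
    env:
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: testdb
    options: >-
      --health-cmd pg_isready
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5
      --restart always
    ports:
      - 5432:5432
```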

@jenstroeger
Author

Thank you, @h0x0er.

One question: why would the PostgreSQL container stop running, thus warranting the automatic restart? I wasn’t able to find details on the health options but perhaps they’re insufficient if subsequent jobs take too long? I mean 5 retries on a 5s timeout is 25 seconds, not sure where exactly the interval comes in here 🤔

@varunsh-coder
Member

> Thank you, @h0x0er.
>
> One question: why would the PostgreSQL container stop running, thus warranting the automatic restart? I wasn’t able to find details on the health options but perhaps they’re insufficient if subsequent jobs take too long? I mean 5 retries on a 5s timeout is 25 seconds, not sure where exactly the interval comes in here 🤔

@jenstroeger I can answer your question.

The harden-runner GitHub Action installs an agent to monitor the build process. That agent runs a DNS proxy and additional monitoring on the Ubuntu VM. As part of that, the agent needs to restart the Docker daemon. You can see the code below:

https://github.com/step-security/agent/blob/main/dnsconfig.go#L169

Normally all of this happens in the pre harden-runner step, before any containers have started and before any workflow steps have run. In this scenario, however, the PostgreSQL service container is started before the pre harden-runner step. So when the Docker daemon is restarted, the container stops running and doesn't restart on its own.

I hope this answers your question.

@h0x0er is trying to figure out whether, as part of restarting the Docker daemon, we can restart all containers that were already running. If we cannot figure that out, we will need to document adding the --restart always argument in this scenario.
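
One way to observe the behavior described above is to list the containers right after the Harden Runner step. The step below is a hypothetical debugging aid, not part of harden-runner; it assumes the service uses the postgres:14 image from the job posted earlier. If the container was stopped by the daemon restart, its status would be expected to read "Exited" rather than "Up".

```yaml
steps:
  - name: Harden Runner
    uses: step-security/harden-runner@248ae51c2e8cc9622ecf50685c8bf7150c6e8813
    with:
      egress-policy: audit
  # Hypothetical debugging step: show every container created from postgres:14,
  # including stopped ones, together with its current status.
  - name: Show service container status
    run: |
      docker ps --all --filter "ancestor=postgres:14" --format "table {{.Names}}\t{{.Image}}\t{{.Status}}"
```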

@varunsh-coder
Member

@h0x0er was able to figure out a way to restart the containers that were already running as part of the Docker daemon restart. I will test the changes and release them next week. After the new version is released, you will not need to add --restart always. It should just work as expected.

@jenstroeger
Author

@varunsh-coder thanks for the update! I’ll wait for the next release and then update on my end, and I’ll let you know whether it works.

@varunsh-coder
Member

This is fixed in the latest release v1.4.5 with tag dd2c410b088af7c0dc8046f3ac9a8f4148492a95.
You should not need any workaround for this to work.
We have also added an integration test for it. Here is an example workflow run: https://github.com/harden-runner-canary/postgres-testing/runs/7810960312?check_suite_focus=true#step:9:10 and the insights URL: https://app.stepsecurity.io/github/harden-runner-canary/postgres-testing/actions/runs/2848365031
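
As a sketch of what picking up the fix might look like, the step can be pinned to the commit named above. The egress-policy value is carried over from the job posted earlier in the thread and is an assumption about the reader's setup.

```yaml
steps:
  - name: Harden Runner
    # v1.4.5, pinned to the commit SHA quoted in the comment above
    uses: step-security/harden-runner@dd2c410b088af7c0dc8046f3ac9a8f4148492a95
    with:
      egress-policy: audit
```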

@jenstroeger
Author

> You should not need any workaround for this to work.

Confirming that the change works.

4 participants