Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error and info logging around v2 credentials requests #2705

Merged
merged 1 commit into from
Dec 10, 2020

Conversation

sparrc
Copy link
Contributor

@sparrc sparrc commented Nov 3, 2020

Summary

Just adding log details around v2 credential requests to help with debugging.

Testing

New tests cover the changes: no

Description for the changelog

Enhancement - Improve error and info logging around credentials requests

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@@ -85,7 +85,7 @@ func CredentialsHandlerImpl(w http.ResponseWriter, r *http.Request, auditLogger
func processCredentialsRequest(credentialsManager credentials.Manager, r *http.Request, credentialsID string, errPrefix string) ([]byte, string, string, *handlersutils.ErrorMessage, error) {
if credentialsID == "" {
errText := errPrefix + "No Credential ID in the request"
seelog.Infof("%s. Request IP Address: %s", errText, r.RemoteAddr)
seelog.Errorf("Error processing credential request, requestIPAddress=%s credentialsID=%s: %s", r.RemoteAddr, credentialsID, errText)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be better not logging the credentialsID from a security perspective.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have refactored it to not print the credentials ID, and instead when available it will print the task ARN

@sparrc sparrc force-pushed the credentials-logging branch from 253a003 to 863bd2b Compare November 5, 2020 21:27
also cleanup an unrelated warning message that could have previously
printed the full access key id of the container instance in plain text.

Signed-off-by: Cam <[email protected]>
@sparrc sparrc force-pushed the credentials-logging branch from 863bd2b to ad0d28f Compare November 6, 2020 17:25
@sparrc sparrc added the bot/test label Nov 6, 2020
@sparrc sparrc merged commit 4b0314d into aws:dev Dec 10, 2020
@sparrc sparrc deleted the credentials-logging branch December 10, 2020 01:40
@sparrc sparrc added this to the 1.49.0 milestone Jan 6, 2021
sparrc added a commit that referenced this pull request Feb 5, 2021
* Update go version to 1.15 in travis.yml, github actions, dockerfiles

* Retry to rename files on error as os.Rename calls are prone to failure on Windows

* adding fsxwindowsfileserver task resource

* Release 1.46.0

* Fix gocyclo import

* fsxwindowsfileserver: fixing task concurrency issues with drive assignment

* modify Makefile to use different Golang version while building Agent for Windows

* Update unit test coverage minimum

* update README.md to add new environment variable ECS_FSX_WINDOWS_FILE_SERVER_SUPPORTED

* Explicitly initialize seelog

The logger package's initialization logic is init() which will be
executed by loading the package.

However some of our programs are using the agent as a library and
haven't configured seelog beforehand. Due to that, loading the package
logs

```
1600397429848419666 [Error] node must have children
```
which breaks the programs.

I think it would be safer to do less on init() and explicitly
initialize the logger from the agent's main().

Signed-off-by: Kazuyoshi Kato <[email protected]>

* Release 1.47.0

* stop container timeout: increase buffer from 30s to 2m

* Fix fluentbit integ test container to version 2.7.0

* improve logging for eni attachment issues

* Switch back to latest firelens image in test and update the test.

* save RCI client token to agent state

* integ tests: cleanup and minimize dockerhub pulls (#2714)

* Remove namespace tests, already converted to functional

* remove taskmetadata-validator

* remove GPU integ tests

* remove parallel-pull-fts

* remove awscli, squid

* cleanup gremlin, netkitten, pause-container builds, and some unused dockerfiles

* change tianon/true to gcr distroless static

* remove ubuntu:20.04 and nginx:stable

* remove redundant registry auth integ test

* update v4 metadata to return network stats despite missing stats for single container (#2719)

* removes redundant log statement (#2724)

* Fix jumbled min & max for engine connection retry delays

* add p4d instance type to list of GPU instances

* Revert "removes redundant log statement (#2724)"

This reverts commit 38ac07d.

* Revert "update v4 metadata to return network stats despite missing stats for single container (#2719)"

This reverts commit 7a82ee7.

* Introduce an environment variable ECS_PULL_DEPENDENT_CONTAINERS_UPFRONT.
The environment variable default value is true. It allows dependent containers start
pulling images before dependsOn conditions in a task definition has been satisfied.

Containers have known status PULLED are defined as pulled containers. These containers
are stored in a pulled container map, and can be retrieved from Task Metadata Endpoint
version 4 (TMDEv4) path ${ECS_CONTAINER_METADATA_URI_V4}/task when the environment
variable is set to true.

* load pulled containers map from state

* Set ECS_PULL_DEPENDENT_CONTAINERS_UPFRONT=false as default

* scripts: add ci-ecr-pull and ci-ecr-push (#2720)

These scripts are for pushing the dockerhub images that we need for our
ci system (integ test and build) to ECR, and for pulling those images
down within the CI system.

ci-ecr-pull will tag the ECR images like 'golang:1.15.4' so that users
who do not run this script on their laptop or elsewhere can still pull
from dockerhub, only if this script is run will the builds reference the
ECR images.

The reason behind this is that dockerhub is now rate-limiting anonymous
pulls, so in order to avoid being rate-limited we should pull our images
from ECR as much as we can.

This should also speed up builds because we will be pulling our large
build images (golang, gcc) from in-region s3 rather than over the
internet.

One side-effect of this change is that we have to reference full version
numbers in our dockerfiles (1.15.4 instead of 1.15), in order to avoid
querying docker for the latest version, and to avoid dockerhub pulls
happening when an image is updated.

* remove explicit docker pull of test registry images (#2735)

Only build fluentd container on x86_64

* Update ECS_PULL_DEPENDENT_CONTAINERS_UPFRONT default value in README file

* Release 1.48.0

* Add structured logging on top of seelog

* Fix an edge case that causes container dependency deadlock.

Scenario: two containers container A and container B, container A is a non-essential container,  container B has a dependency on container A.

Workflow leading to deadlock:
1. We start to pull container A's mage;
2. We fail to pull container A's image;
3. We try to create container A anyway (this is intended as we want to try cached image if possible);
4. We fail to create the container A due to missing image;
5. We try to stop container A;
6. We can't stop container A because when container B has a dependency on A, we implicitly have a reversed dependency on shutdown, such that container A won't stop before container B stops;
7. However we can't move forward container B either, because it's waiting on container A. This ends up in a deadlock.

Fix: return an error when checking container B's dependency resolution when it sees that container A will never start.

* Revert "save RCI client token to agent state"

This reverts commit e21488b.

* Revert "Add structured logging on top of seelog"

This reverts commit e11ee0b.

* Release 1.48.1

* fixing agent path vulnerability on windows

* update v4 metadata to return network stats despite missing stats for single container (#2719)

* removes redundant log statement (#2724)

* Add unit test for task metadata endpoint with missing container network.

* update to latest aws-sdk-go (#2752)

* docker_client: Add missing error handling in getContainerStatsNotStreamed

* Fetch goversion for windows from a file

* Delete 'output' file (#2762)

* Improve error and info logging around v2 credentials requests (#2705)

also cleanup an unrelated warning message that could have previously
printed the full access key id of the container instance in plain text.

Signed-off-by: Cam <[email protected]>

* Remove last remaining references to travis (#2763)

also add static checks on 'push', so that we have test runs on the dev
branch which will enable us to use the build status badge.

* sync aws-sdk-go version (#2764)

* Update ECS_POLL_METRICS value to be default false (#2768)

* Make create container timeout configurable

Co-authored-by: Mythri Garaga Manjunatha <[email protected]>
Co-authored-by: Anuj Singh <[email protected]>
Co-authored-by: Meghna Srivastav <[email protected]>
Co-authored-by: amazon-ecs-bot <[email protected]>
Co-authored-by: Kazuyoshi Kato <[email protected]>
Co-authored-by: Cam <[email protected]>
Co-authored-by: Sharanya Devaraj <[email protected]>
Co-authored-by: Cameron Sparr <[email protected]>
Co-authored-by: Ray Allan <[email protected]>
Co-authored-by: Sai Prasad <[email protected]>
Co-authored-by: Shubham <[email protected]>
Co-authored-by: Chien-Han Lin <[email protected]>
Co-authored-by: Angel Velazquez <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants