Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supported file pattern for logdog log files #1509

Merged
merged 2 commits into from
Apr 23, 2021

Conversation

srgothi92
Copy link
Contributor

@srgothi92 srgothi92 commented Apr 21, 2021

Issue number:
#1481

Description of changes:
Ecs variant awsvpc mode feature's log file has dynamic names, so it was difficult to export such files with current logdog capabilities. This change extends logdog to support a new request type glob, which takes pattern for file matching. For this case, destination filename and path in tarball will be same as source file.

Testing done:

  1. Added unit tests to test different combination of patterns.
  2. Tested logdog in an image:
    • Build an ecs variant image
    • Launced ami in ECS cluster
    • Started eni trunking task to generate ecs-cni-eni-plugin.log, ecs-cni-bridge-plugin.log and vpc-branch-eni.log log files.
    • Waited an hour and re-ran task so that timestamp are attached to log files.
    • Followed steps here to export log files.
    • verified all the log files are present in expected directory.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

sources/logdog/src/log_request.rs Outdated Show resolved Hide resolved
sources/logdog/src/error.rs Outdated Show resolved Hide resolved
sources/logdog/src/log_request.rs Outdated Show resolved Hide resolved
@samuelkarp
Copy link
Contributor

Trying to build an image with conf that exports awsvpc mode logs and make sure it works.

Do you want to actually update the config for the aws-ecs-1 variant to use the new functionality here in a second commit on top of the one you already have? #1481 calls out one of the log files, the other two that should be adjusted are here.

@srgothi92
Copy link
Contributor Author

srgothi92 commented Apr 21, 2021

Do you want to actually update the config for the aws-ecs-1 variant to use the new functionality here in a second commit on top of the one you already have? #1481 calls out one of the log files, the other two that should be adjusted are here.

Yes, I am planning to do that. Thanks for the pointer.

I raised draft PR to get team's feedback on two things:

  1. If we would like to create a new request type that takes file pattern instead of single file. As of now conf pattern is mode filename instruction, which means it expects only one file per request. So i was not sure how I should handle the middle parameter filename. I decided on using file name provided if there is only one file matching the pattern, otherwise use source filename.
  2. If we would like to support directory pattern, as of now I am throwing error for it.

Edit: I did not refresh the PR, now I see tjk@ feedback exactly for for this two :)

@srgothi92
Copy link
Contributor Author

srgothi92 force-pushed the srgothi92:develop branch from 9654e34 to d0a24af 2 minutes ago

Addressed @tjk comments

@@ -205,12 +218,114 @@ where
Ok(())
}

/// Copies all file matching the glob pattern given by `request.instructions` to the tempdir with filename and path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar nit

Suggested change
/// Copies all file matching the glob pattern given by `request.instructions` to the tempdir with filename and path
/// Copies all files matching the glob pattern given by `request.instructions` to the tempdir with filename and path

if let Ok(path) = entry {
if path.is_dir() {
// iterate the directory and copy files in the directory, nested
// sub-directories files will be copied only if matched by glob pattern.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar nit, but I also kind of don't understand what it means.

Suggested change
// sub-directories files will be copied only if matched by glob pattern.
// sub-directory files will be copied only if matched by glob pattern.

// iterate the directory and copy files in the directory, nested
// sub-directories files will be copied only if matched by glob pattern.
for candidate in WalkDir::new(&path)
.max_depth(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand the behavior for directories.
If my glob is /etc/foo/*, would both of these files be picked up?

/etc/foo/file.txt
/etc/foo/dir1/dir2/dir3/file.txt

Maybe you can just copy directories that you encounter with something like fs_extra?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my glob is /etc/foo/*, would both of these files be picked up?

This is what I tried explaining by comment : "sub-directories files will be copied only if matched by glob pattern". :P. Basically glob and walkdir both can iterate sub-directories, if we let both of them iterate we will have conflict. So, I am only copying sub-directories files if it matches glob pattern. In this case, /etc/**/ would match for all directories and sub directories of /etc and /etc/foo/* only matches /etc/foo dir.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we need to care about that, or the max_depth. If we hit a directory, then the user specified that directory in their glob, so I think we should copy all of it with a plain walkdir, no max_depth. If there are duplicates, we could either copy again (this isn't performance critical) or make a hashset of files to copy and do it once at the end. (With the set, we could get rid of the file_copy function, which I think raises questions about why we're not using it for "file" mode...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...or just copy the directory itself rather than walking it, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying would always involve walking the structure, even if we farm the job out to another crate, but if we farm it out we run the risk of copying files-in-directories differently than we do files, and we already use walkdir and it's so simple...

{
// with glob pattern there are chances of multiple targets with same name, therefore
// we maintain source file path and name in destination directory.
// Eg. src file path "/a/b/file" will be converted to "dest_dir/a/b/file"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's too bad that I didn't do this with all the files we already collect, but this behavior makes sense.

Unfortunately it might be a little confusing when you unpack that tarball and some things are at the top and other things replicate their on-host paths. One possibility would be to put all of these paths into a dir at the top of the tarball like /glob, but I don't think that's any better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not perfect, but I think it works out OK, because the file/command entries are probably the most important ones, and you can give them particular names and have easier access to them, and the glob entries can grab any extras...

{
// with glob pattern there are chances of multiple targets with same name, therefore
// we maintain source file path and name in destination directory.
// Eg. src file path "/a/b/file" will be converted to "dest_dir/a/b/file"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not perfect, but I think it works out OK, because the file/command entries are probably the most important ones, and you can give them particular names and have easier access to them, and the glob entries can grab any extras...

// iterate the directory and copy files in the directory, nested
// sub-directories files will be copied only if matched by glob pattern.
for candidate in WalkDir::new(&path)
.max_depth(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we need to care about that, or the max_depth. If we hit a directory, then the user specified that directory in their glob, so I think we should copy all of it with a plain walkdir, no max_depth. If there are duplicates, we could either copy again (this isn't performance critical) or make a hashset of files to copy and do it once at the end. (With the set, we could get rid of the file_copy function, which I think raises questions about why we're not using it for "file" mode...)

sources/logdog/src/error.rs Outdated Show resolved Hide resolved
sources/logdog/src/log_request.rs Outdated Show resolved Hide resolved
sources/logdog/src/log_request.rs Outdated Show resolved Hide resolved
sources/logdog/src/log_request.rs Outdated Show resolved Hide resolved
sources/logdog/src/log_request.rs Outdated Show resolved Hide resolved
sources/logdog/src/log_request.rs Outdated Show resolved Hide resolved
sources/logdog/src/log_request.rs Outdated Show resolved Hide resolved
@srgothi92 srgothi92 force-pushed the develop branch 3 times, most recently from a81ec37 to ad0f158 Compare April 22, 2021 21:06
@srgothi92 srgothi92 marked this pull request as ready for review April 22, 2021 21:08
@srgothi92
Copy link
Contributor Author

addressed @tjk's comments.

sources/logdog/src/error.rs Show resolved Hide resolved
sources/logdog/src/log_request.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@tjkirch tjkirch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please confirm you've re-tested what's in the PR description?

sources/logdog/src/error.rs Outdated Show resolved Hide resolved
sources/logdog/src/log_request.rs Outdated Show resolved Hide resolved
Ecs variant awsvpc mode feature's log file has dynamic names, so it
was difficult to export such files with current logdog capabilities.
This change extends logdog to supporti a new request type `glob`, which
takes pattern for file matching. For this case, destination filename
and path in tarball will be same as source file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants