Skip to content

Conversation

JanKrivanek
Copy link
Member

@JanKrivanek JanKrivanek commented Oct 5, 2023

Fixes #9147, #9261 and contributes to #8400

Viewer complement PR: KirillOsenkov/MSBuildStructuredLog#732

Important

This contains all changes from

Context and changes are described in those individual PRs
Feel free to review everything together (this PR), or separately (above 2 PRs) - but not all 3 as you'd be viewing same changes twice

TBD:

  • Tests
    • Simulate the fwd compat scenarios by injecting/removing/altering bytes in memory stream underlying under binlog writer and reader.
    • Test the stream utility classes.
  • localize all strings Done
  • I'm contemplating adding support for recognizing over-reading and mismatched reading during events deserialization Done
  • [TBD for next PR] Usage documentation (when, why and how to handle errors during forward compatible reading)

@JanKrivanek JanKrivanek changed the title Binlogs Redacting support + Binlogs forward reading support Binlogs Redacting support + Binlogs forward-compatibility reading support Oct 5, 2023
Copy link
Member

@rokonec rokonec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request change is only for missing that "backward-compatible-with" field.
Otherwise it looks OK. But I have to admit that I have not time to carefully review full extent of this PR.

@danmoseley
Copy link
Member

Is there a design doc or some overview of what the redactor does - it sounds interesting?
Eg., does it attempt to discover security tokens and suchlike, as Github does? It would only be defense in depth as it can never be perfect.

@JanKrivanek
Copy link
Member Author

Is there a design doc or some overview of what the redactor does - it sounds interesting? Eg., does it attempt to discover security tokens and suchlike, as Github does? It would only be defense in depth as it can never be perfect.

Hi @danmoseley,

Design proposal of redactor is located here: https://github.com/JanKrivanek/MSBuildBinlogRedactor/blob/main/docs/DesignProposal.md (ultimate long term wishful plan is here https://github.com/dotnet/msbuild/blob/main/documentation/specs/proposed/security-metadata.md)
Autodetections are in plan - so far we have just some PoC samples - couple token types detector (https://github.com/JanKrivanek/MSBuildBinlogRedactor/blob/main/src/Microsoft.Build.SensitiveDataDetector/PatternsDetector.cs) and username detector (https://github.com/JanKrivanek/MSBuildBinlogRedactor/blob/main/src/Microsoft.Build.SensitiveDataDetector/UsernameDetector.cs). But @michaelcfanning wants to contribute some more comprehensive logic they have been using in 1ES. Plus I'm trying to get CredScan team to OSS their client side library for textual data classification

@JanKrivanek JanKrivanek requested a review from ladipro October 16, 2023 20:41
Copy link
Member

@ladipro ladipro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left comments inline, mostly nits.

Copy link
Member

@rokonec rokonec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

@KirillOsenkov
Copy link
Member

Very nice, I made a local change on top of this PR to change the file format version to 19 and added a junk field to the end of ProjectStartedEventArgs, then opened the resulting binlog in the viewer and it was still able to open the binlog and show the error message:

image

@KirillOsenkov
Copy link
Member

I published the viewer update here:
https://github.com/KirillOsenkov/MSBuildStructuredLog/releases/tag/v2.2.155

The NuGet package here:
https://www.nuget.org/packages/MSBuild.StructuredLogger/2.2.155

The binlogtool here:
https://www.nuget.org/packages/binlogtool/1.0.10

Let's proactively test these and maybe send some PRs to Arcade, Roslyn, Maui and a few other repos to get some testing. Then when we do bump the format to 18 and 18 binlogs start to come out we won't break too many people. Both nuget.org and github now can show projects/packages that depend on MSBuild.StructuredLogger NuGet.

@KirillOsenkov
Copy link
Member

binlogtool seems to be working too:

image

@KirillOsenkov
Copy link
Member

Interesting, I forgot to update the reader to read the junk byte I added, and I got a decently nice error message:
image

@KirillOsenkov
Copy link
Member

Is this the behavior we want/expect when we don't opt in MSBuild to be forward compatible by default?

@JanKrivanek
Copy link
Member Author

Is this the behavior we want/expect when we don't opt in MSBuild to be forward compatible by default?

I suppose the junk byte was added without increasing the version - correct? Then it's expected - we still crash, but more gracefully.

@KirillOsenkov
Copy link
Member

No, I increased the version

@JanKrivanek
Copy link
Member Author

No, I increased the version

Thanks @KirillOsenkov - good catch! Not expected indeed! - all the forward compatible flags (allow reading higher version, allow skipping unknown events, allow skiping unknown data in known events) should have same default setting - there was one (allow reading higher version) which was permitted by default - fixed now

@JanKrivanek
Copy link
Member Author

Let's not merge until the linked PRs are merged (to minimize breaking internal teams)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Binlog redacting - properly process embedded files

8 participants