Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent random identifier names #185

Merged
merged 10 commits into from
Oct 13, 2017

Conversation

shabbyrobe
Copy link
Contributor

@shabbyrobe shabbyrobe commented Jul 21, 2017

We choose to commit our generated msgpack files to our git repository, but this can yield some pollution in our diffs when certain changes are made to our structs.

Considering these structs:

package foo
//go:generate msgp
//msgp:tuple Bar
//msgp:tuple Foo
type Bar struct {
	Stuff string
}
type Foo struct {
	Stuff string
}

If you add a new member to Bar that causes the generator to call randIdent() (for example Pants []string), every subsequent ident in the generated file will be changed, even in the generated methods for Foo.

We are leaning very heavily on this generator in a part of our project which is under heavy development. Some of our generated files get quite large and our diffs are getting a little silly: adding a single field towards the top can have the unwanted side effect of changing hundreds of totally unrelated lines in the diff as random identifiers are bumped!

This PR has a stab at making the identifier names consistent based on the method and depth, i.e. every scope at a particular depth will have exactly the same sequence of identifiers. I'd be very grateful if you'd consider it for inclusion, happy to make any revisions necessary.


This change is Reviewable

@philhofer
Copy link
Member

Thanks for the PR.

I think this is a worthwhile change, but I wonder if it could be a little simpler. This is a tough change to test, and probably an easy one to break accidentally, so I'd like it to be "obviously correct" absent a good regression test.

What if randIdent took the nearest reference to a field name and hashed it? (IIRC, it is only used to generate temporary variables for field names, so there should be a unique field nearby that you can use as hash input.)

@shabbyrobe
Copy link
Contributor Author

So it's OK to slap an argument on randIdent? I only went with the solution I went with to be minimally disruptive, but if you're happy for me to change the randIdent API a little to get some of the possible hidden explosions out, I'm all for it!

@philhofer
Copy link
Member

Yeah, go for it. Just keep in mind that inlining means you'll have to use a fully-qualified field name (something like (*StructName).FieldName) rather than just the field itself.

@shabbyrobe
Copy link
Contributor Author

Had another attempt at this by seeding with nearby names. I left two debug lines in, commented out, in case you want to see what it uses for the seed. Here's the output on mine after running make travis. The slice printed at the end is the seed, do you think these will be adequate?:

5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice Blobs z]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice Blobs (*z)]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice Blobs z]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice Blobs (*z)]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice Blobs z]

5406775 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 227 [array z]
5408330 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 266 [map z.Mp]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice []MyEnum z.Enums]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice Files z.Some.Relevent]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice []Embedded z.Children]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice []*Embedded z.PtrChildren]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice Files z.Relevent]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice Files z]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice Files (*z)]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice Files z]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice Files (*z)]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice Files z]
5406775 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 227 [array z]
5408330 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 266 [map z[zpql]]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice []string z.Slice1]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice []string z.Slice2]
5408330 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 266 [map z.MapMap]
5408330 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 266 [map zbsm]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice []float64 z.B]
5408330 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 266 [map z.Els]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice []string z.Slice1]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice []string z.Slice2]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice []string *z.SlicePtr]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice []int32 z.Vals]
5406775 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 227 [array z.Arr]
5406775 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 227 [array z.Arr2]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice []Tree z.Children]
5406775 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 227 [array z.Values]
5406775 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 227 [array *z.ValuesPtr]
5406775 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 227 [array z.More]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice [][32]int32 z.Others]
5406775 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 227 [array z.Others[zlwi]]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice [][]int32 z.Matrix]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice []int32 z.Matrix[zuad]]
5409659 /home/bl/code/go/src/github.com/tinylib/msgp/gen/elem.go 303 [slice []Fixed z.ManyFixed]

I thought it might also be worth reducing the collision chance by increasing the length of the identifier or adding a set to the idgen that prevents duplicates. What do you think?

@glycerine
Copy link
Contributor

I've been experimenting with hashing each file's filename, and appending to that an incrementing counter for each gensym. Seems to work pretty well. The filename hash prevent collisions in the same directory.

https://github.com/glycerine/greenpack/blob/master/gen/cry.go#L29

@shabbyrobe
Copy link
Contributor Author

@glycerine, that doesn't look like it makes the identifiers stable for unmodified functions - this is what I was hoping to accomplish with this PR.

@glycerine
Copy link
Contributor

@shabbyrobe kindly define the term unmodified function for me. I don't know what you are talking about.

@shabbyrobe
Copy link
Contributor Author

Oops, should've said "unmodified structs". If I have a file with two structs in it and change a field in the first one, the random identifier names in the Decode/Marshal/etc functions for the second struct will change (provided it is lower in an alphabetical sort than the first). As I commit all my generated code, some of the diffs are getting a bit out of control just from random identifiers changing.

@glycerine
Copy link
Contributor

@shabbyrobe I see. Yes, files are processed top-to-bottom, so a sequential numbering of gensyms is stable from the top down until the change point. While this isn't the minimal change possible, it is quite stable and easy to observe correct. Anyway, its free to try.

@shabbyrobe
Copy link
Contributor Author

Sorry for the bump, @philhofer, but every day our diffs get more and more crazy. It's nudging 3,000 lines per generation just to add one field! Is there any chance we could get this up and running soon? I'm happy to do whatever is required to make it a reality.

gen/elem.go Outdated
idxChars = "abcdefghijlkmnopqrstuvwxyz"
idxLen = 3
idxChars = "abcdefghijlkmnopqrstuvwxyz"
idxCharsLen = int64(len(idxChars))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this constant added but never used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous version of the same idea. Didn't clean it up properly. Fixed now.

@glycerine
Copy link
Contributor

glycerine commented Aug 16, 2017

@shabbyrobe how well does the approach here work? Can you quantify?

I only see implementation for Array, Map, and Slice. What about Struct, Ptr, and BaseElem?

Additionally, It sounds like you aren't running your own fork in production... not exactly a vote of confidence in it. msgp is fairly conservative, on purpose and with good reason. Untried and unproven experiments should be thoroughly vetted on forks first. My zebrapack and greenpack are forks of msgp due to the desire to maintain msgp iron stable.

@shabbyrobe
Copy link
Contributor Author

shabbyrobe commented Aug 22, 2017

Had another stab at this. It's easier to follow what has been generated and why this way. I ran into trouble with the fact that the parse package calls SetVarname on the root elem, which calls out for random idents as it walks the tree. I really wanted to remove the package local aspect of id generation but I haven't found a way to do it without changing the exported API of the parse package or gen package, which I'm guessing is a no-no. This generates the few hundred structs we have in our app without a problem.

@philhofer
Copy link
Member

Awesome. I like this approach. This change now seems much more "obviously correct."

Want to add a test in the /gen directory? You'll have to write it like the issue94 test, where you generate code and run a shell script both through go generate. I think it'd look something like this:

  • generate code from a file
  • insert one line into the file to add a field to a struct
  • re-generate the code
  • confirm that diff $FILE0 $FILE1 | wc -l is below some threshold.

@shabbyrobe
Copy link
Contributor Author

Awesome thanks for the feedback. I've made a first pass at this, I went a slightly different route by pulling the declarations out of the AST. I could clean it up a bit and use the same building blocks to run a stack more tests against the identifier generation. Let me know if you're happy enough with the approach and I'll expand it to cover more cases, otherwise I'll have another stab if you'd rather I do it a different way.


// Ensure that consistent identifiers are generated on a per-method basis by msgp.
//
// Also eusnre that no duplicate identifiers appear in a method.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/eusnre/ensure/

Makefile Outdated
@@ -28,6 +28,7 @@ $(MGEN): ./msgp/defs_test.go

test: all
go test -v ./msgp
go test -v ./gen
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add this line to the travis rule as well (sorry; the Makefile is a mess, as it always is...)

@philhofer
Copy link
Member

This approach is fine with me too. It's about time we had better test harnesses.

@shabbyrobe
Copy link
Contributor Author

I fixed up the makefile, the spelling error and separated aspects of the test case and added another more complex generation template. Can you think of any other scenarios I can add?

@glycerine
Copy link
Contributor

glycerine commented Aug 24, 2017

Do you test for identifier collision between two or more files generated in the same directory? I had to add a per-file hash of the filename to avoid that. (But then, I generate global vars and I'm not sure if you are...)


// Loose sanity check to make sure the tests expectations aren't broken.
// If the prefix ever changes, this needs to change.
for i := 0; i < len(vars); i++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a range loop here

@philhofer
Copy link
Member

msgp doesn't generate global variables, so I'm not worried about conflicts across generated files.

Can we move the test to the root of the tree and have it call Run rather than calling it through go generate? I worry that having a unit test in gen that depends on msgp having been built and installed in GOPATH makes it too easy to accidentally run the test without actually rebuilding the tool.

@shabbyrobe
Copy link
Contributor Author

Seems much nicer to me with Run, but the one minor downside is that it spews a bunch of purple stuff into the terminal. Is it worth adding a boolean to suppress that to Run or are you OK with the output?

issue185_test.go Outdated
return err
}

var mode = gen.Encode | gen.Decode | gen.Size | gen.Marshal | gen.Unmarshal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mode := ...

issue185_test.go Outdated
}

tfile := filepath.Join(tempDir, "msg.go")
genFile := filepath.Join(tempDir, "msg_gen.go")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just use newFilename here, since it's already defined for you.

@philhofer
Copy link
Member

You can suppress the output by putting the following in the top of the test file:

func init() {
    os.Stdout = ioutil.Discard
}

@shabbyrobe
Copy link
Contributor Author

Unfortunately os.Stdout is not an io.Writer, it's an *os.File. A long-standing problem with the stdlib. You can drain it with a channel and an os.Pipe, but it's pretty gnarly.

@philhofer
Copy link
Member

D'oh. Yeah.

For now lets leave the output as-is and then entertain a refactoring of Run and friends in a second PR.

@@ -0,0 +1,308 @@
package gen
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this file just a duplicate of the one in the root of the file tree? Remove it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! Done.

Makefile Outdated
@@ -27,7 +27,9 @@ $(MGEN): ./msgp/defs_test.go
go generate ./msgp

test: all
# keep in sync with 'make travis'
go test -v ./msgp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the travis rule can probably just be go test ./... now. At one point long ago I think there was an ordering dependency that needed to be satisfied, but I can't see why that wouldn't be a valid way to run the tests now.

@shabbyrobe
Copy link
Contributor Author

Hello! Just a cheeky bump, is there anything else I need to do with this PR?

@philhofer philhofer merged commit 3b556c6 into tinylib:master Oct 13, 2017
@shabbyrobe
Copy link
Contributor Author

Awesome! Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants