Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added Sizer type assert at cachedsize method. The sizecache is not ca… #412

Merged
merged 10 commits into from
Jun 17, 2018

Conversation

jmarais
Copy link
Contributor

@jmarais jmarais commented Jun 1, 2018

…lculated if the struct conforms to the Sizer interface

@@ -213,6 +213,9 @@ func (u *marshalInfo) size(ptr pointer) int {
// cachedsize gets the size from cache. If there is no cache (i.e. message is not generated),
// fall back to compute the size.
func (u *marshalInfo) cachedsize(ptr pointer) int {
if s, ok := structPointer_Interface(ptr, u.typ).(Sizer); ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really clean solution :)

@awalterschulze
Copy link
Member

Let's merge this into swaptypeassert and then we review the whole swaptypeassert branch, what do you think?

@jmarais
Copy link
Contributor Author

jmarais commented Jun 1, 2018

That sounds like a plan. I am not sure about the performance impact though.
There might be other solutions as well, but that it seems to me like we would want a Sizer interface check somewhere, otherwise we will not be able to call the generated Size() methods of struct fields.

@awalterschulze
Copy link
Member

Is it possible to maybe rather do the size call in
func (u *marshalInfo) size(ptr pointer) int {

There is a check for a marshaler

if u.hasmarshaler {
		m := ptr.asPointerTo(u.typ).Interface().(Marshaler)
		b, _ := m.Marshal()
		return len(b)
	}

Maybe that is also a good place to check for sizer, possibly even store whether the message has a sizer in a hassizer field.
What do you think?

@jmarais
Copy link
Contributor Author

jmarais commented Jun 1, 2018

I think we will end up in the same situation.
We will miss the sizecache initialization since the message is 'n Sizer. (in the XXX_Size func we will just call the Size() method). Then, when we call marshal on it, we will be able to computeMarshalInfo() and when the makeMessageMarshaler's marshaler is called, we will still call the cachedsize func. And since the marshalinfo is initialized, it will just try and read the size from the sizecache field and return a invalid size.

@jmarais
Copy link
Contributor Author

jmarais commented Jun 1, 2018

If we want to add a hassizer check in the marshalInfo.size(pointer) method, we might have to replace the marshalInfo.cachedSize(pointer) calls in:
makeMessageMarshaler,
makeMessageSliceMarshaler

to just normal marshalInfo.size(pointer) calls. This is just of the top of my head, there might still be a better place to place the check.

@awalterschulze
Copy link
Member

It would be great if we could do a check, call Size and place the result in cachedSize, but maybe I am missing the whole bug?

@dsnet do you want to get in on this? We are trying to fix swapping the type assert that you suggested, before we merge dev into master for gogoprotobuf.

@jmarais
Copy link
Contributor Author

jmarais commented Jun 1, 2018

With the quick email we just had with walter I think that by only checking the Sizer interface in the cachedsized we might be missing something or might be doing unnecessary work. I might have to rethink this.

@awalterschulze
Copy link
Member

@jmarais this pull request really showed me the problem, but I have a suggestion to go another way.
Please tell me what you think?

I have a small reproducing example:

option (gogoproto.sizer_all) = true;

message FindMistake {
    optional bool Field1 = 1;
}

message NinOptStruct {
    optional FindMistake Field4 = 4;
}

It seems that Marshal assumes that xxx_messageInfo_NinOptStruct.Size(m) has already been called.
I am still confused about how this sizecache becomes valid :(
Generated code is so much easier to debug.

func (m *NinOptStruct) XXX_Size() int {
	if mm, ok := (interface{})(m).(proto.Sizer); ok {
		// xxx_messageInfo_NinOptStruct.Size(m) // uncomment this to make code work
		return mm.Size()
	}
	return xxx_messageInfo_NinOptStruct.Size(m)
}

I have modified the sizedcache method to get a stack trace:

func (u *marshalInfo) cachedsize(ptr pointer) int {
	if u.sizecache.IsValid() {
		panic("valid")
		return int(atomic.LoadInt32(ptr.offset(u.sizecache).toInt32()))
	}
	return u.size(ptr)
}

stack trace:

=== RUN   TestNinOptStructProto
--- FAIL: TestNinOptStructProto (0.00s)
panic: valid [recovered]
	panic: valid

goroutine 18 [running]:
testing.tRunner.func1(0xc4201040f0)
	/usr/local/go/src/testing/testing.go:742 +0x29d
panic(0x14575e0, 0x1519920)
	/usr/local/go/src/runtime/panic.go:502 +0x229
github.com/gogo/protobuf/proto.(*marshalInfo).cachedsize(0xc4200f0100, 0xc42008ee40, 0x4b)
	/Users/.../code/src/github.com/gogo/protobuf/proto/table_marshal.go:217 +0x6f
github.com/gogo/protobuf/proto.makeMessageMarshaler.func2(0xc4200b6050, 0x1, 0x4b, 0xc42008ee10, 0x22, 0x1463e00, 0xc420056c18, 0x100d2be, 0x14ae4e0, 0x3bde0, ...)
	/Users/.../code/src/github.com/gogo/protobuf/proto/table_marshal.go:2237 +0xb4
github.com/gogo/protobuf/proto.(*marshalInfo).marshal(0xc4200f0080, 0xc4200b6050, 0x0, 0x4b, 0xc42008ee10, 0xc420056c00, 0x100e363, 0x14744e0, 0x14ae4e0, 0xc4200b6001, ...)
	/Users/.../code/src/github.com/gogo/protobuf/proto/table_marshal.go:278 +0x168
github.com/gogo/protobuf/proto.(*InternalMessageInfo).Marshal(0x170f0c0, 0xc4200b6050, 0x0, 0x4b, 0x151d1a0, 0xc42008ee10, 0xc420056d00, 0x103e4d7, 0x50, 0x1457760, ...)
	/Users/.../code/src/github.com/gogo/protobuf/proto/table_marshal.go:142 +0xa3
github.com/gogo/protobuf/test.(*NinOptStruct).XXX_Marshal(0xc42008ee10, 0xc4200b6050, 0x0, 0x4b, 0x0, 0x4b, 0xc420056db8, 0x12e0dcb, 0x1520140, 0xc42008ede0)
	/Users/.../code/src/github.com/gogo/protobuf/test/thetest.pb.go:649 +0x105
github.com/gogo/protobuf/proto.Marshal(0x151d1a0, 0xc42008ee10, 0xc4200d6a00, 0xc42008ee10, 0xc4200d6a00, 0x1, 0x2)
	/Users/.../code/src/github.com/gogo/protobuf/proto/table_marshal.go:2714 +0x2b8
github.com/gogo/protobuf/test.TestNinOptStructProto(0xc4201040f0)
	/Users/.../code/src/github.com/gogo/protobuf/test/thetestpb_test.go:618 +0x1b1
testing.tRunner(0xc4201040f0, 0x14e6748)
	/usr/local/go/src/testing/testing.go:777 +0xd0
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:824 +0x2e0
exit status 2
FAIL	github.com/gogo/protobuf/test	3.332s

Here on line 2237 we find that u.cachedsize is being called and assumes that the size has already been cached.

// makeMessageMarshaler returns the sizer and marshaler for a message field.
// u is the marshal info of the message.
func makeMessageMarshaler(u *marshalInfo) (sizer, marshaler) {
	return func(ptr pointer, tagsize int) int {
			p := ptr.getPointer()
			if p.isNil() {
				return 0
			}
			siz := u.size(p)
			return siz + SizeVarint(uint64(siz)) + tagsize
		},
		func(b []byte, ptr pointer, wiretag uint64, deterministic bool) ([]byte, error) {
			p := ptr.getPointer()
			if p.isNil() {
				return b, nil
			}
			b = appendVarint(b, wiretag)
			siz := u.cachedsize(p)   // line 2237
			b = appendVarint(b, uint64(siz))
			return u.marshal(b, p, deterministic)
		}
}

But if we generated the Size method, so there is no reason that the size should already be cached.

My preferred solution here is to not call our generated Size method inside XXX_Size.
We can still use our generated size method with our generated Marshal method.

So the XXX_Size method will always look like this:

func (m *NinOptStruct) XXX_Size() int {
	return xxx_messageInfo_NinOptStruct.Size(m)
}

No reason to optimize the calculation of Size with generated code, if you aren't using the generated Marshalers.

What do you think?

If you agree, do you think you can make the change in generator.go ?

Jacques Marais added 3 commits June 14, 2018 05:25
…ither use MarshalTo or default marhsaling. hassizer property now checked in the marshalinfo size method if the type has a Size() method. removed the Sizer interface check in the generated XXX_Size methods.
"reflect"
)

var sizerType = reflect.TypeOf((*Sizer)(nil)).Elem()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we also have to remember ProtoSizer which has a ProtoSize method, instead of a Size method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we should then call either size or protosize if it is available?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if we call size at all, given the insight we had offline and in the next comment.

}

s := ptr.asPointerTo(u.typ).Interface().(Sizer)
n := s.Size()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, now we are now effectively calculating the size twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh. But the current size recursively calls size again if needed and that initialized struct field's marshalinfos, calculate the struct field size and store it at the cache location. If we dont call size on struct fields we don't cache their sizes. unfortunate the sizerers don't really need this recursive size call.
We could check if the thing is a marshaler and sizer it can just return the size, because we dont need the caching then. But other than that we do need to be able to cache the size for the default marshaler. Hmm. I might have another idea.

…e and cached size methods to just return the message size without looking at any other struct fields.
@@ -167,6 +169,17 @@ func (u *marshalInfo) size(ptr pointer) int {
u.computeMarshalInfo()
}

// Uses the message's Size method if available
if u.hassizer {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, like you suggested, the best we can do is move these checks into the if u.hasmarshaler, because then it is quicker to call size than marshaling, but otherwise let's have cachesize run its course.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we move these checks in u.hasmarshaler we wont be calling any other struct field's size methods and thus not init + caching their sizes. I think the best we could do then is just use the default sizing even if it is a sizer, and then only use the Size method if the message is a marshaler as well.

@@ -213,7 +227,15 @@ func (u *marshalInfo) size(ptr pointer) int {
// cachedsize gets the size from cache. If there is no cache (i.e. message is not generated),
// fall back to compute the size.
func (u *marshalInfo) cachedsize(ptr pointer) int {
if u.sizecache.IsValid() {
if u.hassizer {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These will now never be cached, I can see that it will work, but a cachedsize, should win in speed, I would think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeh. I agree.

@jmarais
Copy link
Contributor Author

jmarais commented Jun 17, 2018

benchcmp_marshaling_33637a95againstc36358ef.txt
Here is the marshaling bench compare with the previous commit (old) and this newest change (new)

messageset bool // uses message set wire format
hasmarshaler bool // has custom marshaler
hassizer bool // has custom sizer
hasprotosizer bool // has custom protosizer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can put hassize and hasprotosizer later in the struct with a newline in between (for gofmt) to minimize the diff here?

g.Out()
g.P("}")
g.P("return xxx_messageInfo_", ccTypeName, ".Size(m)")
if (gogoproto.IsMarshaler(g.file.FileDescriptorProto, message.DescriptorProto) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice :)

g.Out()
g.P("}")
g.P("return xxx_messageInfo_", ccTypeName, ".Marshal(b, m, deterministic)")
if gogoproto.IsMarshaler(g.file.FileDescriptorProto, message.DescriptorProto) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do a compile time check here, then we should do the same for the XXX_Unmarshal method. We should just try to be consistent.

@awalterschulze
Copy link
Member

Cool benchmarks. Maybe in a separate pull request, it might be time to update https://github.com/gogo/protobuf/blob/master/bench.md :)
It has been a while .... 2013

Ok but I think this is very close to merging. Thanks so much :D

@@ -1,73 +1,73 @@
goos: darwin
goos: linux
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we update this as part of another pull request, which includes the update of bench.md

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed this. My intention wasn't to update the benches.

@awalterschulze
Copy link
Member

Looks good to me :)

Great work!

@awalterschulze awalterschulze merged commit 3557939 into gogo:swaptypeassert Jun 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants