Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple test case is producing allocations, is this intended? #222

Open
Villenny opened this issue Dec 23, 2020 · 6 comments
Open

Simple test case is producing allocations, is this intended? #222

Villenny opened this issue Dec 23, 2020 · 6 comments

Comments

@Villenny
Copy link
Contributor

Villenny commented Dec 23, 2020

I'm not sure if I'm doing something wrong, but I was surprised this produced allocations:

BenchmarkParse
BenchmarkParse-8
  300330	      4085 ns/op	     272 B/op	       4 allocs/op
PASS
func BenchmarkEachKey(b *testing.B) {
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		var id string
		var bidId string
		var bidImpId string
		var bidPrice string
		var bidAdm string
		var cur string
		var foo string

		bodyBytes := stringHelper.StringAsBytes(BidResponse_Banner)

		jsonparser.EachKey(bodyBytes, func(idx int, value []byte, vt jsonparser.ValueType, err error) {
			switch idx {
			case 0:
				id = stringHelper.BytesAsString(value)
			case 1:
				bidId = stringHelper.BytesAsString(value)
			case 2:
				bidImpId = stringHelper.BytesAsString(value)
			case 3:
				bidPrice = stringHelper.BytesAsString(value)
			case 4:
				bidAdm = stringHelper.BytesAsString(value)
			case 5:
				cur = stringHelper.BytesAsString(value)
			case 6:
				foo = stringHelper.BytesAsString(value)
			}
		},
			[][]string{
				{"id"},
				{"seatbid", "[0]", "bid", "[0]", "id"},
				{"seatbid", "[0]", "bid", "[0]", "impid"},
				{"seatbid", "[0]", "bid", "[0]", "price"},
				{"seatbid", "[0]", "bid", "[0]", "adm"},
				{"cur"},
				{"foo"},
			}...,
		)

		if "9f3701ef-1d8a-4b67-a601-e9a1a2fbcb7c" != id ||
			"0" != bidId ||
			"1" != bidImpId ||
			"0.35258" != bidPrice ||
			`some \"html\" code\nnext line` != bidAdm ||
			"USD" != cur ||
			"" != foo {
			panic("ack")
		}
	}
}

const BidResponse_Banner = `
{
	"id":"9f3701ef-1d8a-4b67-a601-e9a1a2fbcb7c",
	"seatbid":[
	   {
		  "bid":[
			 {
				"id":"0",
				"adomain":[
				   "someurl.com",
				   "someotherurl.com"
				],
				"impid":"1",
				"price":0.35258,
				"adm":"some \"html\" code\nnext line",
				"w":300,
				"h":250
			 }
		  ]
	   }
	],
	"cur":"USD"
}
`

This is happening inside the jsonparser.EachKey function:

ROUTINE ======================== github.com/buger/jsonparser.EachKey in C:\Users\rhaksi\go\pkg\mod\github.com\buger\[email protected]\parser.go
      69MB       69MB (flat, cum) 97.85% of Total
         .          .    389:           if len(p) > maxPath {
         .          .    390:                   maxPath = len(p)
         .          .    391:           }
         .          .    392:   }
         .          .    393:
   19.50MB    19.50MB    394:   pathsBuf := make([]string, maxPath)
         .          .    395:
         .          .    396:   for i < ln {
         .          .    397:           switch data[i] {
         .          .    398:           case '"':
         .          .    399:                   i++
         .          .    400:                   keyBegin := i
         .          .    401:
         .          .    402:                   strEnd, keyEscaped := stringEnd(data[i:])
         .          .    403:                   if strEnd == -1 {
         .          .    404:                           return -1
         .          .    405:                   }
         .          .    406:                   i += strEnd
         .          .    407:
         .          .    408:                   keyEnd := i - 1
         .          .    409:
         .          .    410:                   valueOffset := nextToken(data[i:])
         .          .    411:                   if valueOffset == -1 {
         .          .    412:                           return -1
         .          .    413:                   }
         .          .    414:
         .          .    415:                   i += valueOffset
         .          .    416:
         .          .    417:                   // if string is a key, and key level match
         .          .    418:                   if data[i] == ':' {
         .          .    419:                           match := -1
         .          .    420:                           key := data[keyBegin:keyEnd]
         .          .    421:
         .          .    422:                           // for unescape: if there are no escape sequences, this is cheap; if there are, it is a
         .          .    423:                           // bit more expensive, but causes no allocations unless len(key) > unescapeStackBufSize
         .          .    424:                           var keyUnesc []byte
   49.50MB    49.50MB    425:                           var stackbuf [unescapeStackBufSize]byte
         .          .    426:                           if !keyEscaped {
         .          .    427:                                   keyUnesc = key
         .          .    428:                           } else if ku, err := Unescape(key, stackbuf[:]); err != nil {
         .          .    429:                                   return -1
         .          .    430:                           } else {

@Villenny
Copy link
Contributor Author

Villenny commented Dec 24, 2020

DOH: Just noticed, a merge just went in that fixes var stackbuf [unescapeStackBufSize]byte, I have updated and the new code doesnt have this allocation happening.

@buger
Copy link
Owner

buger commented Dec 24, 2020

Interesting 🤔

This change was part of the code to actually reduce allocations #208

@Villenny
Copy link
Contributor Author

Villenny commented Dec 24, 2020

Interesting 🤔

This change was part of the code to actually reduce allocations #208

Huh, no idea why that fixed things, but you're right, your last released update fixes one of my 4 allocations, the other 3 can be addressed by:

For this line of code:
19.50MB 19.50MB 394: pathsBuf := make([]string, maxPath)

The problem is that variable sized allocations can never live on the stack, the standard go pattern for this is:

pathsBuf := make([]string, 64)[:0]
// add an item with:
pathsBuf = append(pathsBuf, foo)

In your case you could do:

	pathsBuf := make([]string, 128)[:]
	if maxPath > cap(pathsBuf) {
		pathsBuf = make([]string, maxPath)[:]
	}
	pathsBuf = pathsBuf[0:maxPath]

Similarly for:
pathFlags := make([]bool, len(paths))
and
pIdxFlags := make([]bool, len(paths))

@Villenny
Copy link
Contributor Author

Villenny commented Jan 4, 2021

#223

PS: Ugh, I missed pIdxFlags, which when changed similarly via :

			pIdxFlags := make([]bool, stackArraySize)[:]
			if len(paths) > cap(pIdxFlags) {
				pIdxFlags = make([]bool, len(paths))[:]
			}
			pIdxFlags = pIdxFlags[0:len(paths)]

Brings the allocation count to 0 in my test, all in giving a 10% ish speed bump.

@buger
Copy link
Owner

buger commented Jan 8, 2021

@xsandr since you were the person who made the previous change, can you help with a review here? Thank you!

@Villenny
Copy link
Contributor Author

Villenny commented Jan 8, 2021

On a semi-related note, for ease of use I wrap the EachKeys function with a helper to do field bindings and costlessly marshal directly to a struct:

One side affect of the EachKey interface having a lambda/closure style is that the user provided function cant be inlined, I'd be curious how much difference it would make if rather than invoking a user lamba function per iteration it instead returned a single array of results and let the user iterate.

ie

results, err := jsonparser.EachKeys(bodyBytes, make([]jsonparser.EachKeysResult, 64)[:0], paths...)
for i:=0; i<len(results); i++) {
  switch(results[i].Idx)
  case 0:
     id := BytesAsString(results[i].Value)
  //etc
}

the current EachKey implementation could then call the above one so that external users can still use the lambda function style pattern.

I'm sad to report that I cant achieve nearly the same performance as I can with jsoniterator, although with the fixes EachKey having zero allocations is really nice. (ie jsoniterator is roughly 2X faster for my simple benchmark)

BenchmarkJsonIterator-8
 1695078	      2139 ns/op	     832 B/op	      18 allocs/op
BenchmarkEachKey-8
  924060	      3751 ns/op	      96 B/op	       3 allocs/op

See: https://github.com/Villenny/jsonparserHelper-go for my helper module. Note how parser generation is its own step prior to use, this means that you can create the parser once at program init time, so it doesnt matter if it has allocations, which means that you can build lookup maps in it to achieve O(1) mappings. I didnt go so far as to reflect the struct to populate the parser, but that would be fairly easy.

IMHO, although its more technically correct to use arrays of keys, storing paths as a single string would allow path lookups in a map trivially, and play nicer with go's escape analysis.

Simplest use case for my helper is:

type TestResult struct {
	us  string
	i64 int64
	i   int
	f   float64
}

parser := MakeParser(make([]Parser, 32)[:0]).
		Add(UnsafeStringValue(unsafe.Offsetof(TestResult{}.us)), "us").
		Add(Int64Value(unsafe.Offsetof(TestResult{}.i64)), "i64").
		Add(IntValue(unsafe.Offsetof(TestResult{}.i)), "i").
		Add(Float64Value(unsafe.Offsetof(TestResult{}.f)), "f")

var results TestResult
err := RunParser([]byte(`{
	"us": "some string",
	"i64": "32"
	"i": "64"
	"f": "3.14"
}`), parser, unsafe.Pointer(&results))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants