Skip to content

JoinPaths: don't add "/" before the query string#19278

Merged
jhendrixMSFT merged 3 commits intoAzure:mainfrom
drakkan:joinpaths
Oct 5, 2022
Merged

JoinPaths: don't add "/" before the query string#19278
jhendrixMSFT merged 3 commits intoAzure:mainfrom
drakkan:joinpaths

Conversation

@drakkan
Copy link
Contributor

@drakkan drakkan commented Oct 4, 2022

Fixes #19245

  • The purpose of this PR is explained in this or a referenced issue.
  • The PR does not update generated files.
  • Tests are included and/or updated for code changes.
  • Updates to CHANGELOG.md are included.
  • MIT license headers are included in each file.

@ghost ghost added the customer-reported Issues that are reported by GitHub users external to the Azure organization. label Oct 4, 2022
@ghost
Copy link

ghost commented Oct 4, 2022

Thank you for your contribution drakkan! We will review the pull request and get back to you soon.

@jhendrixMSFT
Copy link
Member

When you fix the broken test can you please also add an entry to the changelog?

@drakkan
Copy link
Contributor Author

drakkan commented Oct 4, 2022

When you fix the broken test can you please also add an entry to the changelog?

done. I think it is ok to omit the slash before a query string. The url is valid and this change fixes the linked issue

@drakkan
Copy link
Contributor Author

drakkan commented Oct 4, 2022

Maybe we should always remove the trailing slash, something like this:

func JoinPaths(root string, paths ...string) string {
	if len(paths) == 0 {
		return root
	}

	qps := ""
	if strings.Contains(root, "?") {
		splitPath := strings.Split(root, "?")
		root, qps = splitPath[0], splitPath[1]
	}

	joinedPaths := path.Join(paths...)
	root = strings.TrimRight(root, "/")
	joinedPaths = strings.TrimLeft(joinedPaths, "/")
	root += "/" + joinedPaths

	if qps != "" {
		return root + "?" + qps
	}
	return root
}

what do you think about? Thanks

@jhendrixMSFT
Copy link
Member

I was thinking about TrimRight earlier too. It seems like with your fix that TrimRight wouldn't need to do any work. However, having it in place couldn't hurt especially if it catches some edge-case I'm overlooking.

@drakkan
Copy link
Contributor Author

drakkan commented Oct 5, 2022

I was thinking about TrimRight earlier too. It seems like with your fix that TrimRight wouldn't need to do any work. However, having it in place couldn't hurt especially if it catches some edge-case I'm overlooking.

TrimRight is required. If you have:

  • root = http://test.contoso.com/
  • paths = []string{"/path/one", "path/two", "/path/three/", "path/four/"}

you get http://test.contoso.com//path/one/path/two/path/three/path/four if you remove TrimRight.

Refactored the to use path.Join and strings.Builder, added some more test cases

jhendrixMSFT
jhendrixMSFT previously approved these changes Oct 5, 2022
@jhendrixMSFT
Copy link
Member

I was discussing this with @chlowell and he brought up a good point that http://foo/?bar=baz is valid, although unlikely. So maybe it would be better to preserve a trailing / but don't add one by default. Thoughts?

@drakkan
Copy link
Contributor Author

drakkan commented Oct 5, 2022

I was discussing this with @chlowell and he brought up a good point that http://foo/?bar=baz is valid, although unlikely. So maybe it would be better to preserve a trailing / but don't add one by default. Thoughts?

http://foo/?bar=baz, http://foo?bar=baz, http://foo/baz/?bar=baz, http://foo/baz?bar=baz are all valid URIs as far as I know. The problem here is that an URI like http://foo/baz/?bar=baz (even if valid) creates issues for some azblob calls. I don't know azure URL requirements. If you prefer I have no problems to update the patch and preverse the trailing slash. Let me know, thanks

@jhendrixMSFT
Copy link
Member

That's what Charles and I were discussing. Probably best that azcore is unopinionated and preserves what's passed to it but doesn't add it.

@jhendrixMSFT
Copy link
Member

Hmm this might be problematic in some cases. E.g. when obtaining a SAS URL for a storage service account via the portal, there will be a trailing /.

https://jhendrixstorage1.blob.core.windows.net/?sv=2021-06-08&...

Preserving the / in this case while joining container/blob names will be problematic.

So, while I still think that having JoinPaths be unopinionated is correct, we'll also need to fix this in azblob.

@jhendrixMSFT jhendrixMSFT dismissed their stale review October 5, 2022 16:13

need to tweak it per conversation

@drakkan
Copy link
Contributor Author

drakkan commented Oct 5, 2022

Hmm this might be problematic in some cases. E.g. when obtaining a SAS URL for a storage service account via the portal, there will be a trailing /.

https://jhendrixstorage1.blob.core.windows.net/?sv=2021-06-08&...

Preserving the / in this case while joining container/blob names will be problematic.

So, while I still think that having JoinPaths be unopinionated is correct, we'll also need to fix this in azblob.

what do you think about something like this?

func JoinPaths(root string, paths ...string) string {
	if len(paths) == 0 {
		return root
	}

	qps := ""
	if strings.Contains(root, "?") {
		splitPath := strings.Split(root, "?")
		root, qps = splitPath[0], splitPath[1]
	}
	hasTrailingSlash := strings.HasSuffix(paths[len(paths)-1], "/")

	var sb strings.Builder

	sb.WriteString(strings.TrimRight(root, "/"))

	joinedPaths := path.Join(paths...)
	joinedPaths = strings.TrimLeft(joinedPaths, "/")

	sb.WriteString("/")
	sb.WriteString(joinedPaths)
	if hasTrailingSlash {
		sb.WriteString("/")
	}

	if qps != "" {
		sb.WriteString("?")
		sb.WriteString(qps)
	}
	return sb.String()
}

the trailing slash is preserved if provided in the last path element. This should be quite unopinionated.

Also note that I don't care about the authorship of the patch. I just want the original issue fixed. If you think you can fix the problem on your own faster then go for it and close this PR. Thanks

…y string

Preserve a trailing slash if the last path element has it

Fixes Azure#19245
@drakkan drakkan requested a review from zezha-msft as a code owner October 5, 2022 17:24
also fix container.GetSASURL
@jhendrixMSFT jhendrixMSFT merged commit 12e98f5 into Azure:main Oct 5, 2022
@drakkan
Copy link
Contributor Author

drakkan commented Oct 6, 2022

@jhendrixMSFT thanks! I did a quick test and I confirm the merged patch fix the reported issue. I also wrote a small benchmark comparing the string builder approach with the used string concatenation

func BenchmarkJoinPaths(b *testing.B) {
	url := "http://192.168.1.12:10000/devstoreaccount1?sv=2021-04-10&ss=b&srt=sco&st=2022-10-04T17%3A30%3A22Z&se=2022-10-05T17%3A30%3A22Z&sp=rwdl&sig=3Zuos3PufKoEBM5OB9KbM8LJFZvT0JZvHNtoCf0kkSg%3D"
	blobName := "test"
	expected := "http://192.168.1.12:10000/devstoreaccount1/test?sv=2021-04-10&ss=b&srt=sco&st=2022-10-04T17%3A30%3A22Z&se=2022-10-05T17%3A30%3A22Z&sp=rwdl&sig=3Zuos3PufKoEBM5OB9KbM8LJFZvT0JZvHNtoCf0kkSg%3D"
	for i := 0; i < b.N; i++ {
		if p := JoinPaths(url, blobName); p != expected {
			panic(p)
		}
	}
}

func BenchmarkJoinPathsSB(b *testing.B) {
	url := "http://192.168.1.12:10000/devstoreaccount1?sv=2021-04-10&ss=b&srt=sco&st=2022-10-04T17%3A30%3A22Z&se=2022-10-05T17%3A30%3A22Z&sp=rwdl&sig=3Zuos3PufKoEBM5OB9KbM8LJFZvT0JZvHNtoCf0kkSg%3D"
	blobName := "test"
	expected := "http://192.168.1.12:10000/devstoreaccount1/test?sv=2021-04-10&ss=b&srt=sco&st=2022-10-04T17%3A30%3A22Z&se=2022-10-05T17%3A30%3A22Z&sp=rwdl&sig=3Zuos3PufKoEBM5OB9KbM8LJFZvT0JZvHNtoCf0kkSg%3D"
	for i := 0; i < b.N; i++ {
		if p := JoinPathsSB(url, blobName); p != expected {
			panic(p)
		}
	}
}

here the results:

cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkJoinPaths-12    	 1623058	       616.4 ns/op	     552 B/op	       6 allocs/op

cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkJoinPathsSB-12    	 3126475	       432.1 ns/op	     280 B/op	       5 allocs/op

performance is OK even with the current approach, we can optimize the code in the future, thanks

@jhendrixMSFT
Copy link
Member

Thanks for the analysis. I more or less copied this from url.Join() in Go 1.19 thinking if it was good enough for the standard library, we can use it too.

Agreed there's future optimization work (and not just in this func).

@drakkan
Copy link
Contributor Author

drakkan commented Oct 6, 2022

Thanks for the analysis. I more or less copied this from url.Join() in Go 1.19 thinking if it was good enough for the standard library, we can use it too.

Agreed there's future optimization work (and not just in this func).

I agree, v0.5.0 is a big step forward in usability over previous versions and NewListBlobsHierarchyPager also works much better (no longer making unnecessary recursive requests). Performance can be improved in the future. Thank's for your work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

customer-reported Issues that are reported by GitHub users external to the Azure organization.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

azblob v0.5.0 and blockBlobClient path joining behaviour

2 participants