Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitPack: Don't use memory mapped streams when accessing pack files #626

Merged
merged 2 commits into from
Jul 27, 2021
Merged

GitPack: Don't use memory mapped streams when accessing pack files #626

merged 2 commits into from
Jul 27, 2021

Conversation

qmfrederik
Copy link
Contributor

@qmfrederik qmfrederik commented Jun 28, 2021

This commit preserves the usage of memory mapped streams when reading the pack index, but reverts to using standard streams when reading raw data in pack files on 32-bit operating systems.

Fixes #584

@qmfrederik
Copy link
Contributor Author

The perf tests seem to indicate that performance is decent on Linux and Windows, but there's a significant regression on macOS (relative to the libgit2 perf).

Pack files are accessed very frequently (because a single object can be deltafied on top of a base object, which in itself can be deltafied once more).

So it looks like we can either accept a performance regression on macOS, or need to identify a better strategy for caching these streams on macOS.

@AArnott
Copy link
Collaborator

AArnott commented Jun 28, 2021

I would accept a perf regression to correct for a functional regression, which is what we're looking at now with #584, right?
But might it be worth it and not undermine the reliability we're going for to use memory mapped files for "small" pack files? I guess we never know how much memory is available on a system, but for example if mapping 100MB pack files would suit most repos, I imagine we could "risk" consuming that much memory.
Or maybe it could be based on pointer size. If we're in a 64-bit process, we should be able to memory map everything, since RAM isn't really required for memory-mapped files as it's actually virtual memory address space that causes the failure in a 32-bit process.

@filipnavara
Copy link
Member

On macOS the processes are always 64-bit so having two strategies (mmap for 64-bit process, regular I/O otherwise) would likely cover all of the cases without significant performance regressions. The effective maximum size of the pack file is limited anyway so on 64-bit you are basically guaranteed that there's enough virtual memory.

@AArnott
Copy link
Collaborator

AArnott commented Jul 23, 2021

What do you think of that idea, @qmfrederik? Can we make using memory mapped pack files conditional on process pointer size?

@qmfrederik
Copy link
Contributor Author

Sure, that should work. The conditional logic can go into GetPackStream() so we should be able to keep it fairly clean.

Not sure whether the best approach would be to switch on process pointer size (32-bit vs 64-bit) or operating system (macOS vs. Linux & Windows).

@AArnott
Copy link
Collaborator

AArnott commented Jul 23, 2021

Not sure whether the best approach would be to switch on process pointer size (32-bit vs 64-bit) or operating system (macOS vs. Linux & Windows).

I'm thinking process pointer as that feels a bit less hacky or dependent on OS implementation details that we don't understand that might explain the perf difference between them. In a 64-bit process the memory impact seems to be not an issue, so mapping huge files should be fine.

@AArnott
Copy link
Collaborator

AArnott commented Jul 23, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@AArnott
Copy link
Collaborator

AArnott commented Jul 23, 2021

We're seeing file locking test failures. Do we have conditions under which file handles are not released?

Also, I suppose we should now modify the test pipeline to run tests in a 32-bit process, and again in a 64-bit process, so that both code paths are guaranteed to be tested. Do you want to take that on?

xUnit.net 00:00:34.55]     ManagedGit.GitPackTests.GetInvalidObject [FAIL]
[xUnit.net 00:00:34.57]     ManagedGit.GitPackTests.TryGetObjectTest [FAIL]
  Failed ManagedGit.GitPackTests.GetInvalidObject [13 ms]
  Error Message:
   System.IO.IOException : The process cannot access the file 'C:\Users\VssAdministrator\AppData\Local\Temp\tmp70D7.tmp' because it is being used by another process.
  Stack Trace:
     at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.File.InternalDelete(String path, Boolean checkHost)
   at System.IO.File.Delete(String path)
   at ManagedGit.GitPackTests.Dispose() in D:\a\1\s\src\NerdBank.GitVersioning.Tests\ManagedGit\GitPackTests.cs:line 44
   at ReflectionAbstractionExtensions.DisposeTestClass(ITest test, Object testClass, IMessageBus messageBus, ExecutionTimer timer, CancellationTokenSource cancellationTokenSource) in C:\Dev\xunit\xunit\src\xunit.execution\Extensions\ReflectionAbstractionExtensions.cs:line 79
  Failed ManagedGit.GitPackTests.TryGetObjectTest [13 ms]
  Error Message:
   System.AggregateException : One or more errors occurred.
---- Assert.IsType() Failure
Expected: Nerdbank.GitVersioning.ManagedGit.MemoryMappedStream
Actual:   System.IO.FileStream
---- System.IO.IOException : The process cannot access the file 'C:\Users\VssAdministrator\AppData\Local\Temp\tmp7129.tmp' because it is being used by another process.
  Stack Trace:
  
----- Inner Stack Trace #1 (Xunit.Sdk.IsTypeException) -----
   at ManagedGit.GitPackTests.TryGetObjectTest() in D:\a\1\s\src\NerdBank.GitVersioning.Tests\ManagedGit\GitPackTests.cs:line 138
----- Inner Stack Trace #2 (System.IO.IOException) -----
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.File.InternalDelete(String path, Boolean checkHost)
   at System.IO.File.Delete(String path)
   at ManagedGit.GitPackTests.Dispose() in D:\a\1\s\src\NerdBank.GitVersioning.Tests\ManagedGit\GitPackTests.cs:line 44
   at ReflectionAbstractionExtensions.DisposeTestClass(ITest test, Object testClass, IMessageBus messageBus, ExecutionTimer timer, CancellationTokenSource cancellationTokenSource) in C:\Dev\xunit\xunit\src\xunit.execution\Extensions\ReflectionAbstractionExtensions.cs:line 79
[xUnit.net 00:00:39.12]     GitExtensionsTests.TestBiggerRepo [SKIP]
  Skipped GitExtensionsTests.TestBiggerRepo [1 ms]
WARNING: Overwriting results file: D:\a\1\a/TestLogs/TestResults.trx
Results File: D:\a\1\a/TestLogs/TestResults.trx

@AArnott
Copy link
Collaborator

AArnott commented Jul 23, 2021

Looks good. You still have this PR as a draft. Are you ready for it to merge?

@AArnott
Copy link
Collaborator

AArnott commented Jul 23, 2021

Oh right, I still want to get the 32/64-bit test run in the pipeline. Did you want to work on that? If not, I'll try to find time by this weekend.

@qmfrederik qmfrederik marked this pull request as ready for review July 23, 2021 18:58
@qmfrederik
Copy link
Contributor Author

@AArnott This should be good to go, modulo the 32 bit tests. I won't have time for that until early next week, so feel free to have a go at it.

@AArnott
Copy link
Collaborator

AArnott commented Jul 27, 2021

I see you're working on 32-bit testing in #632. Thank you, @qmfrederik.

@qmfrederik qmfrederik changed the base branch from master to v3.4 July 27, 2021 14:29
@AArnott AArnott enabled auto-merge July 27, 2021 15:07
@AArnott AArnott merged commit f90c41c into dotnet:v3.4 Jul 27, 2021
@filipnavara
Copy link
Member

🎉 Thanks a lot!

@AArnott AArnott added this to the v3.4 milestone Jul 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IOException when building from mapping pack files into memory
3 participants