GitPack: Don't use memory mapped streams when accessing pack files #626

qmfrederik · 2021-06-28T12:20:58Z

This commit preserves the usage of memory mapped streams when reading the pack index, but reverts to using standard streams when reading raw data in pack files on 32-bit operating systems.

Fixes #584

qmfrederik · 2021-06-28T12:45:26Z

The perf tests seem to indicate that performance is decent on Linux and Windows, but there's a significant regression on macOS (relative to the libgit2 perf).

Pack files are accessed very frequently (because a single object can be deltafied on top of a base object, which in itself can be deltafied once more).

So it looks like we can either accept a performance regression on macOS, or need to identify a better strategy for caching these streams on macOS.

AArnott · 2021-06-28T12:56:17Z

I would accept a perf regression to correct for a functional regression, which is what we're looking at now with #584, right?
But might it be worth it and not undermine the reliability we're going for to use memory mapped files for "small" pack files? I guess we never know how much memory is available on a system, but for example if mapping 100MB pack files would suit most repos, I imagine we could "risk" consuming that much memory.
Or maybe it could be based on pointer size. If we're in a 64-bit process, we should be able to memory map everything, since RAM isn't really required for memory-mapped files as it's actually virtual memory address space that causes the failure in a 32-bit process.

filipnavara · 2021-07-23T13:10:32Z

On macOS the processes are always 64-bit so having two strategies (mmap for 64-bit process, regular I/O otherwise) would likely cover all of the cases without significant performance regressions. The effective maximum size of the pack file is limited anyway so on 64-bit you are basically guaranteed that there's enough virtual memory.

AArnott · 2021-07-23T13:13:32Z

What do you think of that idea, @qmfrederik? Can we make using memory mapped pack files conditional on process pointer size?

qmfrederik · 2021-07-23T13:20:06Z

Sure, that should work. The conditional logic can go into GetPackStream() so we should be able to keep it fairly clean.

Not sure whether the best approach would be to switch on process pointer size (32-bit vs 64-bit) or operating system (macOS vs. Linux & Windows).

AArnott · 2021-07-23T13:22:23Z

Not sure whether the best approach would be to switch on process pointer size (32-bit vs 64-bit) or operating system (macOS vs. Linux & Windows).

I'm thinking process pointer as that feels a bit less hacky or dependent on OS implementation details that we don't understand that might explain the perf difference between them. In a 64-bit process the memory impact seems to be not an issue, so mapping huge files should be fine.

AArnott · 2021-07-23T13:33:25Z

/azp run

azure-pipelines · 2021-07-23T13:33:36Z

Azure Pipelines successfully started running 1 pipeline(s).

AArnott · 2021-07-23T15:01:16Z

We're seeing file locking test failures. Do we have conditions under which file handles are not released?

Also, I suppose we should now modify the test pipeline to run tests in a 32-bit process, and again in a 64-bit process, so that both code paths are guaranteed to be tested. Do you want to take that on?

xUnit.net 00:00:34.55]     ManagedGit.GitPackTests.GetInvalidObject [FAIL]
[xUnit.net 00:00:34.57]     ManagedGit.GitPackTests.TryGetObjectTest [FAIL]
  Failed ManagedGit.GitPackTests.GetInvalidObject [13 ms]
  Error Message:
   System.IO.IOException : The process cannot access the file 'C:\Users\VssAdministrator\AppData\Local\Temp\tmp70D7.tmp' because it is being used by another process.
  Stack Trace:
     at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.File.InternalDelete(String path, Boolean checkHost)
   at System.IO.File.Delete(String path)
   at ManagedGit.GitPackTests.Dispose() in D:\a\1\s\src\NerdBank.GitVersioning.Tests\ManagedGit\GitPackTests.cs:line 44
   at ReflectionAbstractionExtensions.DisposeTestClass(ITest test, Object testClass, IMessageBus messageBus, ExecutionTimer timer, CancellationTokenSource cancellationTokenSource) in C:\Dev\xunit\xunit\src\xunit.execution\Extensions\ReflectionAbstractionExtensions.cs:line 79
  Failed ManagedGit.GitPackTests.TryGetObjectTest [13 ms]
  Error Message:
   System.AggregateException : One or more errors occurred.
---- Assert.IsType() Failure
Expected: Nerdbank.GitVersioning.ManagedGit.MemoryMappedStream
Actual:   System.IO.FileStream
---- System.IO.IOException : The process cannot access the file 'C:\Users\VssAdministrator\AppData\Local\Temp\tmp7129.tmp' because it is being used by another process.
  Stack Trace:
  
----- Inner Stack Trace #1 (Xunit.Sdk.IsTypeException) -----
   at ManagedGit.GitPackTests.TryGetObjectTest() in D:\a\1\s\src\NerdBank.GitVersioning.Tests\ManagedGit\GitPackTests.cs:line 138
----- Inner Stack Trace #2 (System.IO.IOException) -----
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.File.InternalDelete(String path, Boolean checkHost)
   at System.IO.File.Delete(String path)
   at ManagedGit.GitPackTests.Dispose() in D:\a\1\s\src\NerdBank.GitVersioning.Tests\ManagedGit\GitPackTests.cs:line 44
   at ReflectionAbstractionExtensions.DisposeTestClass(ITest test, Object testClass, IMessageBus messageBus, ExecutionTimer timer, CancellationTokenSource cancellationTokenSource) in C:\Dev\xunit\xunit\src\xunit.execution\Extensions\ReflectionAbstractionExtensions.cs:line 79
[xUnit.net 00:00:39.12]     GitExtensionsTests.TestBiggerRepo [SKIP]
  Skipped GitExtensionsTests.TestBiggerRepo [1 ms]
WARNING: Overwriting results file: D:\a\1\a/TestLogs/TestResults.trx
Results File: D:\a\1\a/TestLogs/TestResults.trx

AArnott · 2021-07-23T17:20:39Z

Looks good. You still have this PR as a draft. Are you ready for it to merge?

AArnott · 2021-07-23T17:21:13Z

Oh right, I still want to get the 32/64-bit test run in the pipeline. Did you want to work on that? If not, I'll try to find time by this weekend.

qmfrederik · 2021-07-23T19:02:12Z

@AArnott This should be good to go, modulo the 32 bit tests. I won't have time for that until early next week, so feel free to have a go at it.

AArnott · 2021-07-27T14:02:14Z

I see you're working on 32-bit testing in #632. Thank you, @qmfrederik.

filipnavara · 2021-07-27T15:24:15Z

🎉 Thanks a lot!

qmfrederik mentioned this pull request Jul 23, 2021

IOException when building from mapping pack files into memory #584

Closed

qmfrederik marked this pull request as ready for review July 23, 2021 18:58

qmfrederik changed the base branch from master to v3.4 July 27, 2021 14:29

qmfrederik added 2 commits July 27, 2021 17:01

GitPack: Don't use MemoryMappedFiles in 32-bit processes

edc1621

Make sure Streams are disposed of.

c955e1f

AArnott enabled auto-merge July 27, 2021 15:07

AArnott merged commit f90c41c into dotnet:v3.4 Jul 27, 2021

AArnott added this to the v3.4 milestone Jul 27, 2021

AArnott added the managed-git label Jul 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GitPack: Don't use memory mapped streams when accessing pack files #626

GitPack: Don't use memory mapped streams when accessing pack files #626

qmfrederik commented Jun 28, 2021 •

edited

Loading

qmfrederik commented Jun 28, 2021

AArnott commented Jun 28, 2021

filipnavara commented Jul 23, 2021

AArnott commented Jul 23, 2021

qmfrederik commented Jul 23, 2021

AArnott commented Jul 23, 2021

AArnott commented Jul 23, 2021

azure-pipelines bot commented Jul 23, 2021

AArnott commented Jul 23, 2021

AArnott commented Jul 23, 2021

AArnott commented Jul 23, 2021

qmfrederik commented Jul 23, 2021

AArnott commented Jul 27, 2021

filipnavara commented Jul 27, 2021

GitPack: Don't use memory mapped streams when accessing pack files #626

GitPack: Don't use memory mapped streams when accessing pack files #626

Conversation

qmfrederik commented Jun 28, 2021 • edited Loading

qmfrederik commented Jun 28, 2021

AArnott commented Jun 28, 2021

filipnavara commented Jul 23, 2021

AArnott commented Jul 23, 2021

qmfrederik commented Jul 23, 2021

AArnott commented Jul 23, 2021

AArnott commented Jul 23, 2021

azure-pipelines bot commented Jul 23, 2021

AArnott commented Jul 23, 2021

AArnott commented Jul 23, 2021

AArnott commented Jul 23, 2021

qmfrederik commented Jul 23, 2021

AArnott commented Jul 27, 2021

filipnavara commented Jul 27, 2021

qmfrederik commented Jun 28, 2021 •

edited

Loading