Begin `unsigned long`->`size_t` conversion to support large files on Windows #3533

PhilipOakley · 2021-11-13T16:27:02Z

This short series starts the code conversion to the more portable size_t type for object sizes.

A minimalist test is used as the initial test case.

The Windows LLP64 convention and Linux LP64 conventions differ in the sizeof(long),
meaning objects larger than 4GB (2GB in some case) are not handled correctly by the Linux LP64 code.

Git-LFS oversize object detection was resolved in Matt Cooper's series 596b5e7 (clean/smudge: allow clean filters to process extremely large files, 2021-11-02).

The strbuf's are already fully size_t in compatible.

Some RFC (request for comment) points:

Brian Carlson for checking the wider SHAxxx compatibility (e.g. the other sha libraries)
Dscho for the >big vs using a $() sub-shell for feeding the 5GB to stdin?
Should test be in t1007-hash-object.sh, or maybe in t1051-large-conversion.sh?
any other tidy-ups?

Future/next work: use -w write a loose object as a step-up in complexity:
This will need bigFileThreshold sorted, and zlib.

Strategically, avoid any pack work until loose object paths resolved.
Mark "/* TODO size_t <what> */" to avoid cascaded changes when call tree changes are deep and wide. The long/size_t name change has no effect on Linux, and on Windows it's already 'broken', so delaying the TODOs till later patches will reduce review loading.

2nd step work: the other hash-object options.

--
The PR can be passed (ggg) to the main list once dumb flaws are fixed.

Nevertheless, you can use GitGitGadget (https://gitgitgadget.github.io/)
to conveniently send your Pull Requests commits to our mailing list.

Philip

object-file.c

dscho

Thank you for working on this!

This PR is distinctly more pleasant to read and to review than #2179, it is really great to see.

There is a bit more work to be done to polish this to a point where we can hope for the patches to be accepted on the Git mailing list either without changes or with only minor adjustments. I am pretty optimistic at this stage that it won't take too much effort to get there so that we can upstream this soon after Git v2.34.0 is released.

dscho · 2021-11-15T07:09:35Z

t/t1007-hash-object.sh

@@ -49,6 +49,9 @@ test_expect_success 'setup' '

 	example sha1:ddd3f836d3e3fbb7ae289aa9ae83536f76956399
 	example sha256:b44fe1fe65589848253737db859bd490453510719d7424daab03daf0767b85ae
+


Commenting about the commit message: a verb is missing from the commit subject. It would read better as "hash-object: demonstrate failure with big files in an LLP64 data model".

dscho · 2021-11-15T08:07:31Z

object-file.c

@@ -2027,7 +2027,7 @@ int write_object_file_flags(const void *buf, unsigned long len,
 	return write_loose_object(oid, hdr, hdrlen, buf, len, 0, flags);
 }

-int hash_object_file_literally(const void *buf, unsigned long len,
+int hash_object_file_literally(const void *buf, size_t len,


Okay, this looks like a good change. Still, simply changing the signature leaves open a rather big question (which the commit message does not answer): shouldn't the callees of this function be updated, either? I would expect at least a call to cast_size_t_to_ulong(). At least when my eyes look at the signature of write_object_file_prepare(), it still uses unsigned long as the data type for len.

About the commit message:

object-file.c: use size_t for large object sizes

Please drop the .c. Or maybe even better: use hash_object_file_literally() as prefix, because we sure do not take care of all of object-file.c in this commit.

This is the starting point for size_t conversion for the --literally
option.

That --literally comes out of left field a bit. Did you mean to talk about git hash-object? The commit message did not talk about that, it talked about object-file.c, and that file has no --literally option.

The simplest of function signature changes.

I would strongly disagree that this is the simplest of function signature changes. Maybe avoid the judgement and instead lead with the motivation for this change (namely, spell it out that files >=4GB cannot currently be hashed in an LLP64 data model), and then explain that this change is a minimal step toward that goal.

There are no other uses of the function.

We have not even described one user. Does that mean that there are no users of this function?

The string buffer type is already size_t compatible.

Why is this relevant in this context? Remember, commit messages should clarify, preempt questions a reader might have about the patch.

Lower level function signature changes are in subsequent commits.

What are "lower level functions"? Do you mean callers of hash_object_file_literally()? Or callees?

Future commits will extend the code paths of the test cases to real usages.

That is too vague, especially given the fact that this commit does not even touch test cases.

Not sure what to do with this (far too much in the one comment).

'We' needed to start somewhere.

The first commit identified (and created) a failing test that has the tightest of focus. The changes then work from tat specific failure, working outward, until we have a passing test, then we can flip the test status. It only resolves that very narrowest of tests, leaving the rest still untested and failing (the future changes).

One factor this series did drive out (for me) is that the hash functions also needed updating, not just the zlib functions and the many internal Git length type usages.

Not sure what to do with this (far too much in the one comment).

You need to practice writing informative commit messages.

'We' needed to start somewhere.

Sure. And we now need to tell a story with the patch series. And like every good story, it needs to be edited. Nobody cares how we arrived at the end, as long as the story is fun to read.

The first commit identified (and created) a failing test that has the tightest of focus.

Yes, that's how you developed the patch series. And the last patch in the patch series changes something in the code that lets the test case pass. But it does not have to end there. The first commit can still stay in place, but the last commit should probably be put earlier in the patch series. Necessarily, the change from test_expect_failure to test_expect_success will have to be moved to the patch that will now be at the end.

Not sure what to do with this (far too much in the one comment).

You need to practice writing informative commit messages.

At my day job, this problem came up, and the best advice that came out of the session was: pretend that you are talking your future self, like, 6 months from now, when you don't recall any details. What would you tell your future self to catch them up?

I am usually loathe to suggest commit messages to native English speakers (because English is only a second language to me, or third, depending how you want to count). So I will need encouragement from your side if you want me to.

I've updated the commit messages, but retained the 'find, fix, find again' processing, and the half-stepping (splitting at signature, then do internals), though all false steps have been rebased away.

I have also added three additional tests at the end, because they passed (yay) which gives a bit of confidence there weren't other gotcha's in those other parallel paths.

I haven't updated the PR cover note yet.

object-file.c

dscho · 2021-11-15T09:43:16Z

object-file.c

 			    const char *type, struct object_id *oid,
 			    unsigned flags)
 {
 	char hdr[MAX_HEADER_LEN];
-	int hdrlen = sizeof(hdr);
+	size_t hdrlen = sizeof(hdr);


Unlike hash_object_file(), which only calls write_object_file_prepare() and therefore handles the increased data width of len and hdrlen correctly, the write_object_file_flags() function also calls write_loose_object(), which you have not prepared to accept the wider data type:

git/object-file.c

Lines 1919 to 1921 in 06d5a16

static int write_loose_object(const struct object_id *oid, char *hdr,

int hdrlen, const void *buf, unsigned long len,

time_t mtime, unsigned flags)

It would be better to change write_loose_object() in a preemptive patch before changing write_object_file_flags(): it is totally possible to call a function that accepts size_t parameters with unsigned long, as the latter will be always at most as wide as the former, at least in the data models we support in Git.

dscho · 2021-11-15T09:44:26Z

object-file.c

@@ -2032,7 +2032,8 @@ int hash_object_file_literally(const void *buf, size_t len,
 			       unsigned flags)
 {
 	char *header;
-	int hdrlen, status = 0;
+	size_t hdrlen;


Same comment as for write_object_file_flags(): the hash_object_file_literally() function also calls write_loose_object() (which should be prepared for size_t before changing the data type of hdrlen here).

sha1dc_git.c

On LLP64 systems, such as Windows, the size of `long`, `int`, etc. is only 32 bits (for backward compatibility). Git's use of `unsigned long` for file memory sizes in many places, rather than size_t, limits the handling of large files on LLP64 systems (commonly given as `>4GB`). Provide a minimum test for handling a >4GB file. The `hash-object` command, with the `--literally` and without `-w` option avoids writing the object, either loose or packed. This avoids the code paths hitting the `bigFileThreshold` config test code, the zlib code, and the pack code. Subsequent patches will walk the test's call chain, converting types to `size_t` (which is larger in LLP64 data models) where appropriate. Signed-off-by: Philip Oakley <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>

The previous commit adds a test that demonstrates a problem in the `hash-object --literally` command, manifesting in an unnecessary file size limit on systems using the LLP64 data model (which includes Windows). Walking the affected code path is `cmd_hash_object()` >> `hash_fd()` >> `hash_literally()` >> `hash_object_file_literally()`. The function `hash_object_file_literally()` is the first with a file length parameter (via a mem buffer). This commit changes the type of that parameter to the LLP64 compatible `size_t` type. There are no other uses of the function. The `strbuf` type is already `size_t` compatible. Note: The hash-object test does not yet pass. Subsequent commits will continue to walk the call tree's lower level functions to identify further fixes. Signed-off-by: Philip Oakley <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>

Continue walking the code path for the >4GB `hash-object --literally` test. The `hash_object_file_literally()` function internally uses both `hash_object_file()` and `write_object_file_prepare()`. Both function signatures use `unsigned long` rather than `size_t` for the mem buffer sizes. Use `size_t` instead, for LLP64 compatibility. While at it, convert those function's object's header buffer length to `size_t` for consistency. The value is already upcast to `uintmax_t` for print format compatibility. Note: The hash-object test still does not pass. A subsequent commit continues to walk the call tree's lower level hash functions to identify further fixes. Signed-off-by: Philip Oakley <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>

Continue walking the code path for the >4GB `hash-object --literally` test to the hash algorithm step for LLP64 systems. This patch lets the SHA1DC code use `size_t`, making it compatible with LLP64 data models (as used e.g. by Windows). The interested reader of this patch will note that we adjust the signature of the `git_SHA1DCUpdate()` function without updating _any_ call site. This certainly puzzled at least one reviewer already, so here is an explanation: This function is never called directly, but always via the macro `platform_SHA1_Update`, which is usually called via the macro `git_SHA1_Update`. However, we never call `git_SHA1_Update()` directly in `struct git_hash_algo`. Instead, we call `git_hash_sha1_update()`, which is defined thusly: static void git_hash_sha1_update(git_hash_ctx *ctx, const void *data, size_t len) { git_SHA1_Update(&ctx->sha1, data, len); } i.e. it contains an implicit downcast from `size_t` to `unsigned long` (before this here patch). With this patch, there is no downcast anymore. With this patch, finally, the t1007-hash-object.sh "files over 4GB hash literally" test case is fixed. Signed-off-by: Philip Oakley <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>

Just like the `hash-object --literally` code path, the `--stdin` code path also needs to use `size_t` instead of `unsigned long` to represent memory sizes, otherwise it would cause problems on platforms using the LLP64 data model (such as Windows). To limit the scope of the test case, the object is explicitly not written to the object store, nor are any filters applied. The `big` file from the previous test case is reused to save setup time; To avoid relying on that side effect, it is generated if it does not exist (e.g. when running via `sh t1007-*.sh --long --run=1,41`). Signed-off-by: Philip Oakley <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>

To complement the `--stdin` and `--literally` test cases that verify that we can hash files larger than 4GB on 64-bit platforms using the LLP64 data model, here is a test case that exercises `hash-object` _without_ any options. Just as before, we use the `big` file from the previous test case if it exists to save on setup time, otherwise generate it. Signed-off-by: Philip Oakley <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>

To verify that the `clean` side of the `clean`/`smudge` filter code is correct with regards to LLP64 (read: to ensure that `size_t` is used instead of `unsigned long`), here is a test case using a trivial filter, specifically _not_ writing anything to the object store to limit the scope of the test case. As in previous commits, the `big` file from previous test cases is reused if available, to save setup time, otherwise re-generated. Signed-off-by: Philip Oakley <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>

dscho · 2021-12-15T10:53:35Z

The macos failures are due to the occasional update of Perforce (they overwrite the file when upgrading, so that Homebrew always ends up with outdated checksums for a while).

Begin `unsigned long`->`size_t` conversion to support large files on Windows

PhilipOakley · 2021-12-18T11:23:57Z

Thank you for the updates and extra info in the commit messages.

I hadn't thought about that alternative (the test -f big ||) to the code comment.

I hadn't thought of the simple sha1 #defines as being macros, which I tend to think of as the functional form, but it's not important to the explanation. I'd agree that the renaming dance does get confusing. The second set of eyes does make for a better explanation.

Next, for me, is either the size_t bigFileThreshold configs (currently missing) leading to actually writing the big loose object, or maybe try the 'roll-up' of the size_t changes in Matt's git-lfs tree #3487 (#3487 (comment) bottom, up to do_oid_object_info_extended())

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho reviewed Nov 15, 2021

View reviewed changes

object-file.c Outdated Show resolved Hide resolved

dscho requested changes Nov 15, 2021

View reviewed changes

PhilipOakley force-pushed the hashliteral_t branch 2 times, most recently from daeb6c8 to 437ec2a Compare December 7, 2021 10:00

PhilipOakley added 7 commits December 15, 2021 11:15

dscho force-pushed the hashliteral_t branch from 437ec2a to a87498f Compare December 15, 2021 10:45

dscho approved these changes Dec 15, 2021

View reviewed changes

dscho changed the title ~~[RFC] begin Windows large object conversion~~ Begin unsigned long->size_t conversion to support large files on Windows Dec 15, 2021

dscho merged commit 784d65b into git-for-windows:main Dec 15, 2021

dscho added this to the Next release milestone Dec 15, 2021

git-for-windows-ci pushed a commit that referenced this pull request Dec 15, 2021

Merge pull request #3533 from PhilipOakley/hashliteral_t

209eaf9

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Dec 15, 2021

Merge pull request #3533 from PhilipOakley/hashliteral_t

094c8f4

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Dec 15, 2021

Merge pull request #3533 from PhilipOakley/hashliteral_t

1572922

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Dec 15, 2021

Merge pull request #3533 from PhilipOakley/hashliteral_t

4043eb2

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Dec 15, 2021

Merge pull request #3533 from PhilipOakley/hashliteral_t

bdd472c

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Dec 15, 2021

Merge pull request #3533 from PhilipOakley/hashliteral_t

437bd74

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho added a commit that referenced this pull request Dec 15, 2021

Merge pull request #3533 from PhilipOakley/hashliteral_t

5d3d7c3

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Dec 15, 2021

Merge pull request #3533 from PhilipOakley/hashliteral_t

45fb35d

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Dec 16, 2021

Merge pull request #3533 from PhilipOakley/hashliteral_t

c9e9863

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Dec 16, 2021

Merge pull request #3533 from PhilipOakley/hashliteral_t

fd327e1

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho added a commit that referenced this pull request Dec 17, 2021

Merge pull request #3533 from PhilipOakley/hashliteral_t

4c2a87f

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho added a commit that referenced this pull request Dec 21, 2021

Merge pull request #3533 from PhilipOakley/hashliteral_t

6619e98

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Jan 7, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

517f11a

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho mentioned this pull request Jan 7, 2025

Rebase to v2.48.0 rc2 #5348

Merged

dscho added a commit to dscho/git that referenced this pull request Jan 7, 2025

Merge pull request git-for-windows#3533 from PhilipOakley/hashliteral_t

711afaa

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho added a commit to dscho/git that referenced this pull request Jan 7, 2025

Merge pull request git-for-windows#3533 from PhilipOakley/hashliteral_t

4ec8907

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Jan 7, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

9f9f004

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Jan 7, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

8cc2862

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Jan 8, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

5261768

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Jan 8, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

91e2c96

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Jan 8, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

675aec9

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Jan 8, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

b995daa

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Jan 9, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

0705fa1

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Jan 9, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

7e6fd7d

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho added a commit that referenced this pull request Jan 17, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

b1441c0

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho added a commit that referenced this pull request Jan 27, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

d46a187

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho added a commit that referenced this pull request Jan 27, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

e67f356

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho added a commit that referenced this pull request Jan 27, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

4d09baa

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho added a commit that referenced this pull request Jan 27, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

1808e82

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Jan 27, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

aa7e7e5

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Jan 28, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

bee9afa

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Jan 28, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

2fa4126

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho added a commit that referenced this pull request Feb 3, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

5d578e3

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho added a commit that referenced this pull request Feb 5, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

8f57984

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho added a commit that referenced this pull request Feb 5, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

5ac6b05

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho added a commit that referenced this pull request Feb 5, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

ee09ee6

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Feb 5, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

b08f636

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho added a commit that referenced this pull request Feb 6, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

4fc9774

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho added a commit that referenced this pull request Feb 6, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

d195f12

Begin `unsigned long`->`size_t` conversion to support large files on Windows

git-for-windows-ci pushed a commit that referenced this pull request Feb 6, 2025

Merge pull request #3533 from PhilipOakley/hashliteral_t

a45dca0

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho added a commit to dscho/git that referenced this pull request Feb 6, 2025

Merge pull request git-for-windows#3533 from PhilipOakley/hashliteral_t

a1db5a6

Begin `unsigned long`->`size_t` conversion to support large files on Windows

dscho added a commit to dscho/git that referenced this pull request Feb 6, 2025

Merge pull request git-for-windows#3533 from PhilipOakley/hashliteral_t

b2ecb02

Begin `unsigned long`->`size_t` conversion to support large files on Windows

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Begin `unsigned long`->`size_t` conversion to support large files on Windows #3533

Begin `unsigned long`->`size_t` conversion to support large files on Windows #3533

PhilipOakley commented Nov 13, 2021

dscho left a comment

dscho Nov 15, 2021

dscho Nov 15, 2021

PhilipOakley Nov 16, 2021

dscho Nov 16, 2021

dscho Nov 18, 2021

PhilipOakley Dec 8, 2021

PhilipOakley Dec 8, 2021

dscho Nov 15, 2021

dscho Nov 15, 2021

dscho commented Dec 15, 2021

PhilipOakley commented Dec 18, 2021

		@@ -49,6 +49,9 @@ test_expect_success 'setup' '

		example sha1:ddd3f836d3e3fbb7ae289aa9ae83536f76956399
		example sha256:b44fe1fe65589848253737db859bd490453510719d7424daab03daf0767b85ae

	static int write_loose_object(const struct object_id oid, char hdr,
	int hdrlen, const void *buf, unsigned long len,
	time_t mtime, unsigned flags)

Begin unsigned long->size_t conversion to support large files on Windows #3533

Begin unsigned long->size_t conversion to support large files on Windows #3533

Conversation

PhilipOakley commented Nov 13, 2021

dscho left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dscho commented Dec 15, 2021

PhilipOakley commented Dec 18, 2021

Begin `unsigned long`->`size_t` conversion to support large files on Windows #3533

Begin `unsigned long`->`size_t` conversion to support large files on Windows #3533