Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sometimes file metadata is invalid #1102

Closed
Tom-Newton opened this issue Mar 31, 2023 · 18 comments · Fixed by #1107
Closed

Sometimes file metadata is invalid #1102

Tom-Newton opened this issue Mar 31, 2023 · 18 comments · Fixed by #1107
Assignees
Labels
Milestone

Comments

@Tom-Newton
Copy link

Tom-Newton commented Mar 31, 2023

Which version of blobfuse was used?

blobfuse2 version 2.0.2
with:
fusermount3 version: 3.9.0

Which OS distribution and version are you using?

Ubuntu 20.04

If relevant, please share your mount command.

config_template.txt
Note this is a template not the resolved config but hopefully it gives all the relevant information.

What was the issue encountered?

Sometimes timestamps in the file metadata are invalid. In my case this is very annoying because it causes rusts std::fs::metadata to panic. It fails when attempting to do a sanity check on some of the metadata timestamps. I'm trying to use a 3rd party library based on rust that is hitting this issue.

Have you found a mitigation/solution?

Use an older version of rust prior to rust-lang/rust@a913277

Steps to reproduce

  1. Upload a file to blob storage. I just used the Azure storage explorer app to upload a dummy file test1.txt

  2. Create a minimal rust script that uses std::fs::metadata. Update the path used to be the blobfuse path to the file you just uploaded.

fn main() -> std::io::Result<()> {
    use std::fs;

    let metadata = fs::metadata("<blobfuse path to test file>")?;

    println!("{:?}", metadata);
    Ok(())
}
  1. Compile and run the script using a new version of rust. I used rustc 1.68.2 (9eb3afe9e 2023-03-27)
  2. You should see thread 'main' panicked at 'assertion failed: tv_nsec >= 0 && tv_nsec < NSEC_PER_SEC as i64', library/std/src/sys/unix/time.rs:69:9 in the terminal.
@Tom-Newton Tom-Newton changed the title Returns invalid file timestamp metadata Sometimes file metadata is invalid Mar 31, 2023
@vibhansa-msft
Copy link
Member

From blobfuse end, last change time is always set to last modified time of the blob. Do you see both change and modified time as invalid ? Can you share the values that you are getting as results of your operation. Also, you can try running 'stat' command from linux shell and see if they also represent invalid timestamps or not.

@vibhansa-msft vibhansa-msft self-assigned this Apr 3, 2023
@vibhansa-msft vibhansa-msft added this to the V2-2.0.3 milestone Apr 3, 2023
@Tom-Newton
Copy link
Author

Thanks for getting back to me.

Running stat returns:

  Size: 28              Blocks: 0          IO Block: 4096   regular file
Device: 39h/57d Inode: 4           Links: 1
Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-03-31 17:44:01.999999999 +0100
Modify: 2023-03-31 17:44:01.1268836864 +0100
Change: 2023-03-31 17:44:01.1268836864 +0100
 Birth: -

I think the problem is in the nanoseconds portion of the the modify and change timestamps. They are actually greater than 1E9 nano seconds. So that makes it bigger than 1 second which I expect is why its considered invalid. This aligns with what I see in the rust source code https://github.com/rust-lang/rust/blob/a91327782906885ccd8b8bf25e1e7f9ea46f8428/library/std/src/sys/unix/time.rs#L6.

Looking in the Azure storage explorer it looks like the blob created and modify times are actually only stored to seconds precision so I'm not sure where these problematic nanosecond values have come from.
Screenshot from 2023-04-03 09-01-32

I'm not completely sure what you mean by the results of my operation. The 3rd party library I'm trying to use crashes after it panics in 1 thread then crashes the main thread when trying to join the broken thread. My minimal script panics and logs partial metadata (missing all the timestamps) info to stdout.

@vibhansa-msft
Copy link
Member

In my environment I do not see it going 1E9+
stat /usr/blob_mnt/myfile_8.tst
File: /usr/blob_mnt/myfile_8.tst
Size: 8388608 Blocks: 0 IO Block: 4096 regular file
Device: 33h/51d Inode: 2 Links: 1
Access: (0777/-rwxrwxrwx) Uid: ( 1000/ vikas) Gid: ( 1000/ vikas)
Access: 2023-04-03 11:31:21**.208906752** +0530
Modify: 2023-04-03 11:31:21.208906752 +0530
Change: 2023-04-03 11:31:21.208906752 +0530

@vibhansa-msft
Copy link
Member

Is is something that your system settings control ?

@Tom-Newton
Copy link
Author

Tom-Newton commented Apr 3, 2023

I can't think of any system setting that could effect this. I have personally tested on 2 computers and stat returned exactly the same erroneous value. I also had bug reports from colleagues where I'm pretty confident the root cause was the same invalid timestamps problem so I think that amounts to 4 different computers where we have had this issue.

I additionally tried running in docker using the ubuntu:20.04 from https://hub.docker.com/_/ubuntu only installing minimal dependencies as per https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-how-to-deploy#how-to-install-blobfuse2. The result was the same as running on my local machine so I don't think its system settings thing.

I've been adding some debug logging to blobfuse to try to understand what is going on. It looks like the timestamps are returned by blobfuse here

(*stbuf).st_mtim.tv_sec = C.long(attr.Mtime.Unix())
(*stbuf).st_mtim.tv_nsec = C.long(attr.Mtime.UnixNano())

This is returned as a unix timestamps since the epoch in nanoseconds and in seconds. So I'm not sure how a blobfuse bug could cause this issue but at the same time I don't have this issue with any other filesystems.

Looking at what blobfuse is trying to set I always see numbers like 1680281041000000000 so I'm pretty confused why we end up with non-zero s at less than second precision. My only guess is that stat is accounting for years not being exactly 365.25 days.

I'm quite suspicious of the nanosecond portion of the access time always being .999999999.

@vibhansa-msft
Copy link
Member

Yes your observation is correct. I looked at the code and depending upon what LMT we receive from backend, blobfuse just converts it to st_mtim using the above code. Last access time and change time are also set to same LMT value only. Not able to guess at this stage why the timestamp will show incorrectly on the shell as the conversion appears to be correct in terms of code.
I did some test around to just set tv_sec or tv_nsec here. If tv_nsec is not set then it will show '0000' in the nano-second part of the time shown in shell. if tv_sec is not set then times in stat output will be totally absurd. So ignoring any of the these will not work.

@vibhansa-msft
Copy link
Member

this might potentially be some issue in libfuse where the st_mtim is not interpreted correctly.

@vibhansa-msft
Copy link
Member

Can you not change the rust code to ignore if nanosecond part is bigger or invalid?

@Tom-Newton
Copy link
Author

Tom-Newton commented Apr 4, 2023

Unfortunately I don't really have control over the rust code. The assertion is in the rust standard library and I'm using a 3rd party library that quite reasonably uses the rust standard library.

I did find rust-lang/rust#108277 which sounds like it will change the way rust handles these. Possibly that will solve my issue.

@vibhansa-msft
Copy link
Member

Rust issue you have pointed above looks linked to what you are hitting. Is there a way you can take up the version which has the fix and retry?

@Tom-Newton
Copy link
Author

I don't think there is actually a fix implemented yet and I'm not blocked by this yet since I have the workaround of using an older version of rust.

Regardless of whether I can work around it I think there is a genuine issue. Maybe I can come up with a better way to reproduce.

@vibhansa-msft
Copy link
Member

As per above discussion and our understanding the actual fix will come in Rust in some future release. Do you feel there is a change required from blobfuse end here. If no, we can close this issue here and still continue the discussion on this thread.

@Tom-Newton
Copy link
Author

Tom-Newton commented Apr 6, 2023

I would say there is still an issue effecting blobfuse that ideally would be fixed, though as we discussed its possible the issue is not actually in blobfuse, but could be one of its dependencies.

However I think rust-lang/rust#108277 will be an effective workaround in my case, so from my point of view fixing is probably low priority.

@vibhansa-msft
Copy link
Member

from blobfuse end what kind of fix you are expecting. As we discussed above the time stamp that we get from azure storage is just converted to sec and nsec while submitting that info to kernel. Omitting either of them will present user with incorrect timestamps.

@Tom-Newton
Copy link
Author

Tom-Newton commented Apr 10, 2023

I've been doing a bit more testing. It looks like setting all the tv_nsec to 0 actually solves the problem. I don't really understand this stuff well enough to know if that is a viable solution.

Also it does seem to have some relation to the version of libfuse used. I tried testing with some different versions and 2.9.9 does actually appear to sort of work. I tend to get stuff like
blobfuse 2.0.2 libfuse 2.9.9 Ubuntu 18.04

  File: /mnt/azure/mlptestdev/publictest/test.txt
  Size: 600             Blocks: 0          IO Block: 4096   regular file
Device: 57h/87d Inode: 5           Links: 1
Access: (0777/-rwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-09-14 15:23:10.999999999 +0000
Modify: 2022-09-14 15:23:10.999999999 +0000
Change: 2022-09-14 15:23:10.999999999 +0000
 Birth: -

Which seems unlikely but they are all valid timestamps

blobfuse 2.0.2 libfuse 3.9.0 Ubuntu 20.04

  File: /mnt/azure/mlptestdev/publictest/test.txt
  Size: 600             Blocks: 0          IO Block: 4096   regular file
Device: 42h/66d Inode: 2           Links: 1
Access: (0777/-rwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-09-14 15:23:10.999999999 +0000
Modify: 2022-09-14 15:23:10.1919233024 +0000
Change: 2022-09-14 15:23:10.1919233024 +0000
 Birth: -

This is the situation I'm currently in. The timestamps are invalid as well as wrong.

blobfuse 2.0.2 libfuse 3.10.5 Ubuntu 22.04

  File: /mnt/azure/mlptestdev/publictest/test.txt
  Size: 600             Blocks: 0          IO Block: 4096   regular file
Device: 57h/87d Inode: 2           Links: 1
Access: (0777/-rwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-09-14 15:23:10.999999999 +0000
Modify: 2022-09-14 15:23:10.919233024 +0000
Change: 2022-09-14 15:23:10.919233024 +0000
 Birth: -

Again the Access time is wrong but they are all valid numbers.

All of these tests were done in docker to ensure they are controlled and repeatable.

@vibhansa-msft
Copy link
Member

As you see in your output ignoring tv_nsec is just giving some randome value in nsec part of the timestamp. This might be a workaround but not the correct fix and this might lead to issues for other customers where nsec is working fine (non rust clients).

@Tom-Newton
Copy link
Author

Sorry I wasn't clear. The output I showed is from the current blobfuse 2.0.2. I was doing some testing of your idea that the issue might be in libfuse. It looks like some different versions of libfuse do do enough to make it return valid timestamps but none are completely correct. I have updated my previous comment to make it a bit clearer.

When using a modified build of blobfuse with the tv_nsecs all set to 0 I get exactly what I would expect:

  File: /mnt/azure/mlptestdev/publictest/test.txt
  Size: 600             Blocks: 0          IO Block: 4096   regular file
Device: 39h/57d Inode: 2           Links: 1
Access: (0777/-rwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-09-14 16:23:10.000000000 +0100
Modify: 2022-09-14 16:23:10.000000000 +0100
Change: 2022-09-14 16:23:10.000000000 +0100
 Birth: -

might lead to issues for other customers where nsec is working fine

It looks to me like Azure blob storage only stores these timestamps to seconds precision so I don't think it can possibly be working today. I think the current behaviour is just to return a random number for the nsec part of the timestamp.

vibhansa-msft added a commit that referenced this issue Apr 11, 2023
@vibhansa-msft vibhansa-msft linked a pull request Apr 11, 2023 that will close this issue
@vibhansa-msft
Copy link
Member

vibhansa-msft commented Apr 11, 2023

Can you test the fix I have added to this and confirm this resolves the issue.

vibhansa-msft added a commit that referenced this issue Apr 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants