[fseek] No such file or directory when loading large input #2478

StanPlatinum · 2021-06-27T00:51:13Z

Description of the problem

I am running bwa (https://github.com/lh3/bwa), a bioinformatics application, on Graphene.

When I feed a large dataset (hg38.fa, about 3.1GB) to the bwa mem command, a [fseek] error appears at the very beginning. When I feed a smaller dataset (mref.fa, about 1.1GB), it works fine.

bwa can consume a very large amount of memory, so I set sgx.enclave_size = "32G". This is the largest value I can use, since the machine has only 64G of main memory; setting it to "64G" fails because there is not enough memory.

Steps to reproduce


Commit ID: 6ba10c6

I know that reproducing the issue might be hard, but FYI, I am posting my steps below.
I wrote a template for Graphene to make running applications more convenient.

git clone https://github.com/StanPlatinum/graphene-bwa

Then set the Graphene Dir at https://github.com/StanPlatinum/graphene-bwa/blob/main/Makefile#L15
Also, set the Application Dir at https://github.com/StanPlatinum/graphene-bwa/blob/main/Makefile#L12

cd graphene-bwa
make SGX=1 run

The manifest can be found at https://github.com/StanPlatinum/graphene-bwa/blob/main/bwa.manifest.template

You will need a machine with at least 32G of main memory, and you will need to download the human genome datasets.

I don't expect you to spend too much time reproducing it. What I can see is that the error appears right after the (long) enclave initialization finishes, so it seems to happen when the input data is loaded.
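One way to narrow this down without bwa is a small probe that only opens a file and seeks to its end. The following is a minimal sketch, not part of bwa or Graphene: the function name probe_seek_end is hypothetical, and it assumes a POSIX system where lseek and fstat are available. It compares the offset reported by lseek(fd, 0, SEEK_END) with the size reported by fstat.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if seeking to the end of `path` yields the
 * same offset as the file size from fstat, 1 if the two disagree, and -1 if
 * open, fstat, or lseek reports an error. */
int probe_seek_end(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    struct stat st;
    if (fstat(fd, &st) < 0) {
        perror("fstat");
        close(fd);
        return -1;
    }

    off_t end = lseek(fd, 0, SEEK_END);
    close(fd);
    if (end == (off_t)-1) {
        perror("lseek");
        return -1;
    }

    if (end != st.st_size) {
        fprintf(stderr, "mismatch: lseek=%lld, st_size=%lld\n",
                (long long)end, (long long)st.st_size);
        return 1;
    }
    return 0;
}
```

Calling probe_seek_end on both the 1.1GB and the 3.1GB file, natively and under graphene-sgx, should show whether seeking alone is enough to trigger the problem.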

I also heard that @ya0guang encountered a similar issue. I know it might be hard to fix, so I wonder whether there is a workaround for loading a huge input file?

Expected results

I think this must be a "huge data" issue, since running with a smaller dataset gives a correct result.

Actual results

graphene-sgx bwa mem data/hg38_reference.fa data/SRR062634.filt.fastq
error: Using insecure argv source. Graphene will continue application execution, but this configuration must not be used in production!
[fseek] No such file or directory
Makefile:70: recipe for target 'run' failed
make: *** [run] Error 1

Thanks!


pwmarcz · 2021-06-29T13:20:18Z

@StanPlatinum

Thank you for the report! It seems that we have a bug in handling large files (> 2 GB).

Could you check if PR #2485 fixes your problem? It's a one-line fix:

--- a/LibOS/shim/src/sys/shim_open.c
+++ b/LibOS/shim/src/sys/shim_open.c
@@ -218,7 +218,7 @@ long shim_do_lseek(int fd, off_t offset, int origin) {
     if (!hdl)
         return -EBADF;
 
-    int ret = 0;
+    off_t ret = 0;
     if (hdl->is_dir) {
         ret = do_lseek_dir(hdl, offset, origin);
         goto out;
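
To illustrate why the type of ret matters, here is a minimal standalone sketch. It is not Graphene code: the function names are hypothetical, and int64_t stands in for off_t as found on x86-64 Linux. It shows that an offset of about 3.1 GB does not fit in a 32-bit int and comes back negative (which a caller would read as an error), while an offset of about 1.1 GB survives unchanged. That is consistent with the two file sizes in the report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the pre-fix path: the 64-bit seek result is
 * stored in an int before being returned. Converting an out-of-range value
 * to int is implementation-defined; GCC and Clang wrap modulo 2^32. */
int64_t seek_ret_via_int(int64_t new_pos) {
    int ret = (int)new_pos;
    return ret;
}

/* Hypothetical stand-in for the post-fix path: the result stays 64-bit. */
int64_t seek_ret_via_int64(int64_t new_pos) {
    int64_t ret = new_pos;
    return ret;
}

/* Checks the two illustrative sizes (round numbers, not exact file sizes). */
void check_examples(void) {
    const int64_t small = 1100000000LL; /* ~1.1 GB, below 2^31 */
    const int64_t large = 3100000000LL; /* ~3.1 GB, between 2^31 and 2^32 */

    /* The small offset fits in an int, so both paths agree. */
    assert(seek_ret_via_int(small) == small);
    assert(seek_ret_via_int64(small) == small);

    /* The large offset turns negative when squeezed through an int. */
    assert(seek_ret_via_int(large) < 0);
    assert(seek_ret_via_int64(large) == large);
}
```

Calling check_examples() from a small main passes on a typical x86-64 Linux toolchain, where int is 32 bits wide.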

StanPlatinum · 2021-07-01T03:19:52Z

@pwmarcz Thanks!

Yes, the PR fixes it! Feel free to close the issue.

mkow assigned pwmarcz Jun 29, 2021

pwmarcz mentioned this issue Jun 29, 2021

[LibOS] Fix lseek with large offsets #2485

Merged

dimakuv closed this as completed Jul 1, 2021

pwmarcz mentioned this issue Sep 10, 2021

Filesystem refactoring gramineproject/gramine#24

Closed
