This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

[fseek] No such file or directory when loading large input #2478

Closed
StanPlatinum opened this issue Jun 27, 2021 · 2 comments

@StanPlatinum

Description of the problem

I am running bwa (https://github.com/lh3/bwa), a bioinformatics application, on Graphene.

When I feed a large dataset (hg38.fa, about 3.1GB) to the bwa mem command, an [fseek] error appears at the very beginning. With a smaller dataset (mref.fa, about 1.1GB), it works fine.

bwa can consume a very large amount of memory, so I set sgx.enclave_size = "32G". This is the largest value I can use, since the machine only has 64G of main memory; setting it to "64G" fails because there is not enough memory.

Steps to reproduce

Commit ID: 6ba10c6

I know that reproducing the issue might be hard, but I am posting my steps below anyway.
I wrote a template for Graphene to make running applications more convenient.

git clone https://github.com/StanPlatinum/graphene-bwa

Then set the Graphene directory at https://github.com/StanPlatinum/graphene-bwa/blob/main/Makefile#L15
Also set the application directory at https://github.com/StanPlatinum/graphene-bwa/blob/main/Makefile#L12

cd graphene-bwa
make SGX=1 run

The manifest can be found at https://github.com/StanPlatinum/graphene-bwa/blob/main/bwa.manifest.template

You will need a machine with at least 32G of main memory, and you will need to download the human genome datasets.

I don't expect you to spend much time reproducing it. What I can see is that the error appears early: right after the long enclave initialization. It seems to happen while the input data is being loaded.

I have also heard that @ya0guang encountered a similar issue. I know it might be hard to fix, so I wonder whether there is a workaround for loading very large inputs.

Expected results

I think this must be a "huge data" issue, since running with a smaller dataset gives a correct result.

Actual results

graphene-sgx bwa mem data/hg38_reference.fa data/SRR062634.filt.fastq
error: Using insecure argv source. Graphene will continue application execution, but this configuration must not be used in production!
[fseek] No such file or directory
Makefile:70: recipe for target 'run' failed
make: *** [run] Error 1

Thanks!

@pwmarcz

pwmarcz commented Jun 29, 2021

@StanPlatinum

Thank you for the report! It seems that we have a bug in handling large files (> 2 GB).

Could you check if PR #2485 fixes your problem? It's a one-line fix:

--- a/LibOS/shim/src/sys/shim_open.c
+++ b/LibOS/shim/src/sys/shim_open.c
@@ -218,7 +218,7 @@ long shim_do_lseek(int fd, off_t offset, int origin) {
     if (!hdl)
         return -EBADF;
 
-    int ret = 0;
+    off_t ret = 0;
     if (hdl->is_dir) {
         ret = do_lseek_dir(hdl, offset, origin);
         goto out;

@StanPlatinum

@pwmarcz Thanks!

Yes, the PR fixes it! Feel free to close the issue.
