Description of the problem
I am running a bwa (https://github.com/lh3/bwa) application, a bioinformatics algorithm, on Graphene.
When I try to feed a large dataset (hg38.fa, about 3.1 GB) to the bwa mem command, an [fseek] error comes out at the very beginning. When I feed a smaller dataset (mref.fa, about 1.1 GB), it works fine.
This bwa application may consume a very large amount of memory, so I set sgx.enclave_size = "32G". That is the most I can set, since I only have 64 GB of main memory; setting it to "64G" fails because, of course, there is not enough memory.
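For reference, the relevant manifest settings look roughly like the sketch below. This is abridged and partly illustrative rather than the verbatim template (the real bwa.manifest.template is linked under "Steps to reproduce"); the thread count and the allowed-files entry in particular are assumptions.

# Abridged manifest sketch (illustrative; see bwa.manifest.template for the real file)
loader.insecure__use_cmdline_argv = true   # source of the "insecure argv" warning in the output below
sgx.enclave_size = "32G"                   # must hold the reference data plus bwa's working memory
sgx.thread_num = 8                         # assumed value
sgx.allowed_files.data = "file:data/"      # assumed entry for the unencrypted input datasets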
Steps to reproduce
commit ID: 6ba10c6
I know that reproducing the issue might be hard, but FYI, I still post my steps below.
I wrote a template for Graphene, to run applications more conveniently.
Then set the Graphene Dir at https://github.com/StanPlatinum/graphene-bwa/blob/main/Makefile#L15
Also, set the Application Dir at https://github.com/StanPlatinum/graphene-bwa/blob/main/Makefile#L12
The manifest can be found at https://github.com/StanPlatinum/graphene-bwa/blob/main/bwa.manifest.template
You may need a machine with at least 32 GB of main memory, and you may need to download the human genome datasets.
I don't expect you to spend too much time reproducing it, but what I can see is that the error comes out very soon: after a long enclave initialization, it pops up. It seems the error happens when loading the input data.
I also heard that someone else (@ya0guang) encountered a similar issue. I know it might be hard to fix, so I wonder: is there a workaround when we want to load a huge data input?
Expected results
I think this must be a "huge data" issue, since the smaller dataset gives me a correct result.
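(Side note, not part of the original report: a minimal standalone check along the following lines, independent of bwa, would support the "huge data" hypothesis by seeking past the 2 GiB mark in a large file. The file is whatever large input is at hand.)

/* Hypothetical minimal reproducer, independent of bwa: open a file larger than
 * 2 GiB and seek past the 2 GiB mark. On an affected Graphene build the seek
 * is expected to fail; natively it succeeds. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>

int main(int argc, char** argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file larger than 2 GiB>\n", argv[0]);
        return 1;
    }
    FILE* f = fopen(argv[1], "rb");
    if (!f) {
        perror("fopen");
        return 1;
    }
    off_t target = (off_t)2 * 1024 * 1024 * 1024 + 1;  /* just past 2 GiB */
    if (fseeko(f, target, SEEK_SET) != 0) {
        perror("fseeko");  /* this is where the affected build is expected to fail */
        fclose(f);
        return 1;
    }
    printf("byte at offset %lld: %d\n", (long long)target, fgetc(f));
    fclose(f);
    return 0;
}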
Actual results
graphene-sgx bwa mem data/hg38_reference.fa data/SRR062634.filt.fastq
error: Using insecure argv source. Graphene will continue application execution, but this configuration must not be used in production!
[fseek] No such file or directory
Makefile:70: recipe for target 'run' failed
make: *** [run] Error 1
Thanks!
Thank you for the report! It seems that we have a bug in handling large files (> 2 GB).
Could you check if PR #2485 fixes your problem? It's a one-line fix:
--- a/LibOS/shim/src/sys/shim_open.c
+++ b/LibOS/shim/src/sys/shim_open.c
@@ -218,7 +218,7 @@ long shim_do_lseek(int fd, off_t offset, int origin) {
     if (!hdl)
         return -EBADF;
 
-    int ret = 0;
+    off_t ret = 0;
     if (hdl->is_dir) {
         ret = do_lseek_dir(hdl, offset, origin);
         goto out;
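To make the one-line change concrete (an illustration, not Graphene code): the resulting file offset was stored in a 32-bit int, so any offset past 2 GiB no longer fits and gets corrupted on the way back to the application, which is presumably what makes fseek fail on hg38.fa (~3.1 GB) while mref.fa (~1.1 GB) stays below the limit.

/* Illustration of the overflow (not Graphene code): a ~3 GiB offset fits in
 * a 64-bit off_t but wraps to a negative value when stored in a 32-bit int. */
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <sys/types.h>

int main(void) {
    off_t offset = (off_t)3 * 1024 * 1024 * 1024;  /* roughly the size of hg38.fa */

    int   as_int   = (int)offset;  /* what `int ret` did: the value no longer fits */
    off_t as_off_t = offset;       /* what `off_t ret` does: the value is preserved */

    printf("off_t: %lld\n", (long long)as_off_t);  /* 3221225472 */
    printf("int:   %d\n", as_int);                 /* negative: the value was corrupted */
    return 0;
}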