Read 2 (R2) is designed to start with GTGAGTGATGGTTGAGGTAGTGTGGAG at its 5' end, so I used --adapter_sequence_r2 GTGAGTGATGGTTGAGGTAGTGTGGAG to identify the valid reads. The command is shown below:
The result shows that only about 1/5 of the reads (~6.9 million, R1+R2) remained, and about 4/5 were filtered out for being too short.
Filtering result:
reads passed filter: 6881134
reads failed due to low quality: 60130
reads failed due to too many N: 0
reads failed due to too short: 30160650
reads failed due to low complexity: 1218
reads with adapter trimmed: 21527581
bases trimmed due to adapters: 2374478381
reads with polyX in 3' end: 472347
bases trimmed in polyX tail: 5270559
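The proportions quoted above can be checked directly against this report. A minimal sketch, assuming that the "passed" and "failed due to ..." categories together account for every input read (R1+R2):

```python
# Counts copied from the first fastp "Filtering result" report
# (run with --adapter_sequence_r2).
filtering = {
    "passed": 6881134,
    "low_quality": 60130,
    "too_many_n": 0,
    "too_short": 30160650,
    "low_complexity": 1218,
}

# Assumption: these categories partition all input reads (R1 + R2).
total = sum(filtering.values())
passed_fraction = filtering["passed"] / total
too_short_fraction = filtering["too_short"] / total

print(f"total={total:,} passed={passed_fraction:.1%} too_short={too_short_fraction:.1%}")
```

Under that assumption, about 18.5% of the reads pass and about 81.3% are rejected as too short.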
Then I used grep to count the R2 reads that begin with GTGAGTGATGGTTGAGGTAGTGTGGAG and found that 17606348 R2 reads start with the valid adapter. So I think something else must be causing the missing reads.
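The grep count can also be reproduced in a FASTQ-aware way, checking only the sequence line of each record. This is a sketch that assumes standard 4-line FASTQ records (header, sequence, "+", quality); the helper names are hypothetical and not part of fastp.

```python
import gzip
from typing import Iterable, TextIO, Tuple

ADAPTER_R2 = "GTGAGTGATGGTTGAGGTAGTGTGGAG"


def count_reads_with_prefix(lines: Iterable[str], prefix: str) -> Tuple[int, int]:
    """Return (total_reads, reads_whose_sequence_starts_with_prefix).

    Assumes the lines form 4-line FASTQ records:
    header, sequence, '+' separator, quality.
    """
    total = 0
    matched = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # only the sequence line of each record is checked
            total += 1
            if line.rstrip("\n").startswith(prefix):
                matched += 1
    return total, matched


def open_fastq(path: str) -> TextIO:
    """Open a FASTQ file as text, decompressing it if the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")
```

For example, with a hypothetical file name: with open_fastq("sample_R2.fastq.gz") as fh, calling count_reads_with_prefix(fh, ADAPTER_R2) returns the total number of R2 reads and how many of them start with the expected sequence.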
To find out which factor caused so few reads to pass, I removed --adapter_sequence_r2 GTGAGTGATGGTTGAGGTAGTGTGGAG and ran fastp again. This time, many more reads passed the filter.
Filtering result:
reads passed filter: 30428586
reads failed due to low quality: 221476
reads failed due to too many N: 0
reads failed due to too short: 608
reads failed due to low complexity: 1332
reads with adapter trimmed: 5489442
bases trimmed due to adapters: 109359880
reads with polyX in 3' end: 819659
bases trimmed in polyX tail: 9294422
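One way to compare the two runs is the average number of bases removed per adapter-trimmed read, computed from the "reads with adapter trimmed" and "bases trimmed due to adapters" lines of each report. A small sketch using only those reported counts:

```python
# Counts copied from the two fastp "Filtering result" reports.
runs = {
    "with --adapter_sequence_r2": {
        "passed": 6881134,
        "reads_trimmed": 21527581,
        "bases_trimmed": 2374478381,
    },
    "without --adapter_sequence_r2": {
        "passed": 30428586,
        "reads_trimmed": 5489442,
        "bases_trimmed": 109359880,
    },
}


def mean_bases_trimmed(run: dict) -> float:
    """Average number of bases removed per read that had an adapter trimmed."""
    return run["bases_trimmed"] / run["reads_trimmed"]


for name, run in runs.items():
    print(f"{name}: passed={run['passed']:,}, "
          f"mean bases trimmed per adapter-trimmed read={mean_bases_trimmed(run):.1f}")
```

With the adapter specified, each adapter-trimmed read loses about 110 bases on average, compared with about 20 bases without it.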
I want to know why so many valid reads are lost after adding --adapter_sequence_r2 GTGAGTGATGGTTGAGGTAGTGTGGAG.
Also, many reads in the output clean data still contain the adapter sequence.
For example, the target adapter 'GTGAGTGATGGTTGAGGTAGTGTGGAG' is located at the 5' end of read 2 of E200012434L1C001R00100009337. I expected the trimmed read 2 to be 'CGGGGTTATAGTGTGAGATTTTGTTTTAAGAATAAAAAAATTTTAAAATAAGATAATTTTATTTTTATATAAATTATTTTAGAGTATAATAAAAGGAAAATTTTTAAATTTATTATATAAAGT', but fastp trimmed away the whole of read 2 for E200012434L1C001R00100009337.
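A simplified model shows how an adapter at the very start of a read can remove the entire read. This sketch assumes that fastp treats the sequence given to --adapter_sequence_r2 as a 3' adapter, cutting at the match and discarding everything downstream, and it uses exact matching rather than fastp's mismatch-tolerant search. The raw read here is hypothetical: it is assumed to be exactly the 27-nt sequence followed by the expected insert quoted above.

```python
ADAPTER_R2 = "GTGAGTGATGGTTGAGGTAGTGTGGAG"
# Expected trimmed read 2 for E200012434L1C001R00100009337, as quoted in the issue.
EXPECTED_INSERT = (
    "CGGGGTTATAGTGTGAGATTTTGTTTTAAGAATAAAAAAATTTTAAAATAAGATAATTTTATTTTTATATAAATT"
    "ATTTTAGAGTATAATAAAAGGAAAATTTTTAAATTTATTATATAAAGT"
)


def trim_3prime_adapter(seq: str, adapter: str) -> str:
    """Simplified 3'-adapter trimming: cut at the first exact adapter match
    and drop the adapter plus everything after it."""
    pos = seq.find(adapter)
    return seq if pos == -1 else seq[:pos]


def clip_5prime_prefix(seq: str, prefix: str) -> str:
    """Remove a known fixed sequence from the 5' end and keep what follows."""
    return seq[len(prefix):] if seq.startswith(prefix) else seq


# Hypothetical raw read: assumed to be the adapter followed directly by the insert.
raw_read = ADAPTER_R2 + EXPECTED_INSERT

# 3'-style trimming finds the adapter at position 0, so nothing is left.
print(len(trim_3prime_adapter(raw_read, ADAPTER_R2)))
# Removing the fixed 5' prefix instead recovers the expected insert.
print(clip_5prime_prefix(raw_read, ADAPTER_R2) == EXPECTED_INSERT)
```

Under these assumptions, 3'-style trimming leaves an empty read, which would then fail a minimum-length filter, while removing the 27-nt prefix yields the expected read.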