Fix augur index I/O #900

Merged Apr 25, 2022 (2 commits)
7 changes: 3 additions & 4 deletions augur/index.py
@@ -177,7 +177,7 @@ def index_sequences(sequences_path, sequence_index_path):
     num_of_seqs = 0
 
     with open_file(sequence_index_path, 'wt') as out_file:
-        tsv_writer = csv.writer(out_file, delimiter = '\t')
+        tsv_writer = csv.writer(out_file, delimiter = '\t', lineterminator='\n')
 
         #write header i output file
         header = ['strain', 'length']+labels+['invalid_nucleotides']
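Note on the hunk above: Python's `csv.writer` terminates rows with `'\r\n'` unless `lineterminator` is set, which is why the one-argument change matters for the output file. The sketch below illustrates that behavior in isolation. The helper `write_rows` and the sample rows are hypothetical and are not part of augur.

```python
import csv
import io


def write_rows(rows, **writer_kwargs):
    """Write rows as TSV into an in-memory buffer and return the raw text."""
    # newline='' keeps StringIO from translating line endings, so the
    # returned text is exactly what csv.writer emitted.
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer, delimiter='\t', **writer_kwargs)
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative rows only; the real header has more columns.
rows = [['strain', 'length'], ['PRVABC59', '10675']]

default_output = write_rows(rows)                     # csv default terminator
unix_output = write_rows(rows, lineterminator='\n')   # terminator used in this PR

print(repr(default_output))  # 'strain\tlength\r\nPRVABC59\t10675\r\n'
print(repr(unix_output))     # 'strain\tlength\nPRVABC59\t10675\n'
```

With the default terminator, every row carries a carriage return, which tools such as `diff` report as a changed line even though the visible text is the same.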
@@ -206,9 +206,8 @@ def run(args):
             tot_length = None
         else:
             num_of_seqs, tot_length = index_sequences(args.sequences, args.output)
-    except ValueError as error:
-        print("ERROR: Problem reading in {}:".format(sequences_path), file=sys.stderr)
-        print(error, file=sys.stderr)
+    except FileNotFoundError:
+        print(f"ERROR: Could not open sequences file '{args.sequences}'.", file=sys.stderr)
         return 1
 
     if args.verbose:
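Note on the hunk above: the removed handler formats `sequences_path`, while the visible code in `run(args)` only uses `args.sequences`. If that name is indeed undefined there, the handler itself would fail with a `NameError` instead of printing the message. The new handler catches `FileNotFoundError` and reports the path from `args`. A minimal sketch of the same pattern follows. `index_sequences_stub` and `run_sketch` are hypothetical stand-ins, not augur functions.

```python
import sys


def index_sequences_stub(sequences_path):
    """Hypothetical stand-in for index_sequences: count FASTA header lines."""
    with open(sequences_path) as handle:
        return sum(1 for line in handle if line.startswith('>'))


def run_sketch(sequences_path):
    """Return 0 on success, or 1 if the sequences file cannot be opened."""
    try:
        num_of_seqs = index_sequences_stub(sequences_path)
    except FileNotFoundError:
        # Report the path the user actually passed in, then signal failure
        # through the return value so the caller can set the exit status.
        print(f"ERROR: Could not open sequences file '{sequences_path}'.", file=sys.stderr)
        return 1

    print(f"Indexed {num_of_seqs} sequences.")
    return 0
```

Calling `run_sketch("index/missing_sequences.fasta")` in a directory without that file prints the error to stderr and returns 1, the same exit status the functional test in this PR expects.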
26 changes: 13 additions & 13 deletions tests/builds/zika/results/sequence_index.tsv
@@ -1,13 +1,13 @@
-strain length A C G T N other_IUPAC - ? invalid_nucleotides
-PAN/CDC_259359_V1_V3/2015 10771 2952 2379 3142 2298 0 0 0 0 0
-COL/FLR_00024/2015 10659 2921 2344 3113 2281 0 0 0 0 0
-PRVABC59 10675 2923 2351 3115 2286 0 0 0 0 0
-COL/FLR_00008/2015 10659 2924 2344 3110 2281 0 0 0 0 0
-Colombia/2016/ZC204Se 10608 2907 2332 3093 2275 0 1 0 0 0
-ZKC2/2016 10807 2955 2389 3159 2304 0 0 0 0 0
-VEN/UF_1/2016 10808 2958 2383 3152 2315 0 0 0 0 0
-DOM/2016/BB_0059 10035 2563 2089 2741 2015 621 6 0 0 0
-BRA/2016/FC_6706 10366 2747 2203 2915 2165 329 7 0 0 0
-DOM/2016/BB_0183 10621 2910 2343 3099 2269 0 0 0 0 0
-EcEs062_16 10812 2960 2388 3158 2306 0 0 0 0 0
-HND/2016/HU_ME59 10365 2842 2271 3016 2233 0 3 0 0 0
+strain length A C G T N other_IUPAC - ? invalid_nucleotides
+PAN/CDC_259359_V1_V3/2015 10771 2952 2379 3142 2298 0 0 0 0 0
+COL/FLR_00024/2015 10659 2921 2344 3113 2281 0 0 0 0 0
+PRVABC59 10675 2923 2351 3115 2286 0 0 0 0 0
+COL/FLR_00008/2015 10659 2924 2344 3110 2281 0 0 0 0 0
+Colombia/2016/ZC204Se 10608 2907 2332 3093 2275 0 1 0 0 0
+ZKC2/2016 10807 2955 2389 3159 2304 0 0 0 0 0
+VEN/UF_1/2016 10808 2958 2383 3152 2315 0 0 0 0 0
+DOM/2016/BB_0059 10035 2563 2089 2741 2015 621 6 0 0 0
+BRA/2016/FC_6706 10366 2747 2203 2915 2165 329 7 0 0 0
+DOM/2016/BB_0183 10621 2910 2343 3099 2269 0 0 0 0 0
+EcEs062_16 10812 2960 2388 3158 2306 0 0 0 0 0
+HND/2016/HU_ME59 10365 2842 2271 3016 2233 0 3 0 0 0
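Note on the hunk above: the old and new rows read identically, which is consistent with a change in line terminators only. This assumes the previous fixture was written with `csv.writer`'s default `'\r\n'`. One way to check a file for such endings is to read it in binary mode, as in the sketch below. `count_crlf_lines` and the temporary file names are hypothetical, and the two-row contents are illustrative rather than the real fixture.

```python
import os
import tempfile


def count_crlf_lines(path):
    """Count lines ending in CRLF; binary mode avoids newline translation."""
    with open(path, 'rb') as handle:
        return sum(1 for line in handle if line.endswith(b'\r\n'))


with tempfile.TemporaryDirectory() as tmp_dir:
    old_path = os.path.join(tmp_dir, 'old_index.tsv')
    new_path = os.path.join(tmp_dir, 'new_index.tsv')

    # Same visible text, different terminators.
    with open(old_path, 'wb') as handle:
        handle.write(b'strain\tlength\r\nPRVABC59\t10675\r\n')
    with open(new_path, 'wb') as handle:
        handle.write(b'strain\tlength\nPRVABC59\t10675\n')

    old_crlf = count_crlf_lines(old_path)
    new_crlf = count_crlf_lines(new_path)

print(old_crlf, new_crlf)  # 2 0
```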
Binary file modified tests/builds/zika/results/sequence_index.tsv.gz
Binary file not shown.
24 changes: 24 additions & 0 deletions tests/functional/index.t
@@ -0,0 +1,24 @@
+Integration tests for augur index.
+
+  $ pushd "$TESTDIR" > /dev/null
+  $ export AUGUR="../../bin/augur"
+
+Index Zika sequences.
+
+  $ ${AUGUR} index \
+  > --sequences index/sequences.fasta \
+  > --output "$TMP/sequence_index.tsv"
+
+  $ diff -u "index/sequence_index.tsv" "$TMP/sequence_index.tsv"
+  $ rm -f "$TMP/sequence_index.tsv"
+
+Try indexing sequences that do not exist.
+This should fail.
+
+  $ ${AUGUR} index \
+  > --sequences index/missing_sequences.fasta \
+  > --output "$TMP/sequence_index.tsv"
+  ERROR: Could not open sequences file 'index/missing_sequences.fasta'.
+  [1]
+
+  $ popd > /dev/null
13 changes: 13 additions & 0 deletions tests/functional/index/sequence_index.tsv
@@ -0,0 +1,13 @@
+strain length A C G T N other_IUPAC - ? invalid_nucleotides
+PAN/CDC_259359_V1_V3/2015 10771 2952 2379 3142 2298 0 0 0 0 0
+COL/FLR_00024/2015 10659 2921 2344 3113 2281 0 0 0 0 0
+PRVABC59 10675 2923 2351 3115 2286 0 0 0 0 0
+COL/FLR_00008/2015 10659 2924 2344 3110 2281 0 0 0 0 0
+Colombia/2016/ZC204Se 10608 2907 2332 3093 2275 0 1 0 0 0
+ZKC2/2016 10807 2955 2389 3159 2304 0 0 0 0 0
+VEN/UF_1/2016 10808 2958 2383 3152 2315 0 0 0 0 0
+DOM/2016/BB_0059 10035 2563 2089 2741 2015 621 6 0 0 0
+BRA/2016/FC_6706 10366 2747 2203 2915 2165 329 7 0 0 0
+DOM/2016/BB_0183 10621 2910 2343 3099 2269 0 0 0 0 0
+EcEs062_16 10812 2960 2388 3158 2306 0 0 0 0 0
+SG_018 10659 2906 2367 3110 2257 19 0 0 0 0
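Note on the fixture above: in the rows shown, the per-character counts appear to add up to `length`. This is an observation from this data, not a documented guarantee of `augur index`. The sketch below checks it on four rows copied from the fixture. Tab separation is assumed from the `.tsv` extension, and `rows_with_mismatched_length` is a hypothetical helper.

```python
import csv
import io

# Four rows copied from the fixture; tab separators are an assumption.
INDEX_TEXT = (
    "strain\tlength\tA\tC\tG\tT\tN\tother_IUPAC\t-\t?\tinvalid_nucleotides\n"
    "PAN/CDC_259359_V1_V3/2015\t10771\t2952\t2379\t3142\t2298\t0\t0\t0\t0\t0\n"
    "Colombia/2016/ZC204Se\t10608\t2907\t2332\t3093\t2275\t0\t1\t0\t0\t0\n"
    "DOM/2016/BB_0059\t10035\t2563\t2089\t2741\t2015\t621\t6\t0\t0\t0\n"
    "SG_018\t10659\t2906\t2367\t3110\t2257\t19\t0\t0\t0\t0\n"
)


def rows_with_mismatched_length(index_text):
    """Return strain names whose count columns do not sum to 'length'."""
    reader = csv.DictReader(io.StringIO(index_text), delimiter='\t')
    mismatched = []
    for row in reader:
        # Every column except the name and the total is a per-character count.
        total = sum(
            int(value)
            for key, value in row.items()
            if key not in ('strain', 'length')
        )
        if total != int(row['length']):
            mismatched.append(row['strain'])
    return mismatched


mismatched = rows_with_mismatched_length(INDEX_TEXT)
print(mismatched)  # []
```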