Error when running angle_data_preparation_py.ipynb #46

Open
dsikar opened this issue Apr 8, 2021 · 1 comment

Comments

@dsikar

dsikar commented Apr 8, 2021

I am running the README.md steps on the Intel DevCloud. For good measure, I generated full_under_200.txt with both the Julia script get_proteins_under_200aa.jl and the notebook julia_get_proteins_under_200aa.ipynb. diff reports that the two files differ, but on inspection they look the same (tab-separated values).
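When two files look identical but diff disagrees, the difference is often invisible on screen (line endings, trailing whitespace, a final newline). A minimal sketch for locating the first differing line and showing it with `repr`, so hidden characters become visible. The helper names `read_raw_lines` and `first_difference` are hypothetical, not part of the repository, and the sample lines are made up for illustration:

```python
from itertools import zip_longest


def read_raw_lines(path):
    """Read lines without normalizing line endings (hypothetical helper)."""
    # newline="" keeps "\r\n" and "\n" distinct instead of translating them
    with open(path, newline="", encoding="utf-8") as handle:
        return handle.readlines()


def first_difference(lines_a, lines_b):
    """Return (line_number, line_a, line_b) for the first mismatch, or None."""
    for number, (a, b) in enumerate(zip_longest(lines_a, lines_b), start=1):
        if a != b:
            return number, a, b
    return None


# Illustrative in-memory data: same visible content, different line endings.
result = first_difference(["1.0\t2.0\n"], ["1.0\t2.0\r\n"])
if result is not None:
    number, a, b = result
    print(f"first difference at line {number}: {a!r} vs {b!r}")
```

To run it on the two generated files, pass `read_raw_lines(...)` for each path in place of the in-memory lists.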
In the DevCloud environment, when I run angle_data_preparation_py.ipynb, I get an error when extracting data from text:

# Scan first n proteins
names = []
seqs = []
psis = []
phis = []
pssms = []
(...)

ValueError: could not convert string to float: '0.0\ (...)

This can be suppressed by changing the function parse_lines(raw) to:

# Helper functions to extract numeric data from text
def parse_lines(raw):
    # added tab \t to suppress previous error
    return np.array([[float(x) for x in line.split("\t") if x != ""] for line in raw])
(...)
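Splitting on a single hard-coded separator hides which token actually failed. As a sketch only, here is a variant that splits on any whitespace and, on failure, names the offending token and line. The name `parse_lines_verbose` is hypothetical, and the assumption that each row holds whitespace-separated floats is mine, not confirmed by the notebook:

```python
def parse_lines_verbose(raw):
    """Parse whitespace-separated floats, naming the token that fails."""
    rows = []
    for line_no, line in enumerate(raw):
        row = []
        # split() with no argument splits on spaces, tabs, "\r" and "\n"
        for token in line.split():
            try:
                row.append(float(token))
            except ValueError:
                raise ValueError(
                    f"line {line_no}: cannot convert {token!r} "
                    f"(full line: {line!r})"
                ) from None
        rows.append(row)
    return rows


# Illustrative input only; real rows come from the data file.
print(parse_lines_verbose(["0.0\t1.5\r\n", "2 3"]))
```

The result is a plain list of lists, so ragged rows stay visible; it can be wrapped in `np.array(...)` once every row has the same length.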

That gets past the first error, but then another one is thrown further down:

(...)
---> 10             outputs.append([phis[i][j], psis[i][j]])
     11             # break
     12         # print(i, "Added: ", len(seqs[i])-34,"total for now:  ", long)

IndexError: list index out of range

I suspect this has something to do with one of the previous outputs, where the features' "n. prots" are not the same:

# Ensure all features have same n. prots

print("Names: ", len(names))
print("Seqs: ", len(seqs))
print("PSSMs: ", len(pssms))
print("Phis: ", len(phis))
print("Psis: ", len(psis))

Names:  601
Seqs:  600
PSSMs:  600
Phis:  0
Psis:  0
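With Phis and Psis at 0, any lookup such as `phis[i]` raises IndexError, which would be consistent with the traceback above. A small sketch that fails early, right after parsing, instead of later inside the loop. The helper `check_feature_counts` is hypothetical and the sample lists are made up:

```python
def check_feature_counts(**features):
    """Raise if the feature lists do not all hold the same number of proteins."""
    counts = {name: len(values) for name, values in features.items()}
    if len(set(counts.values())) > 1:
        raise ValueError(f"feature counts differ: {counts}")
    return counts


# Illustrative data mirroring the shape of the mismatch reported above.
try:
    check_feature_counts(names=["p1", "p2"], seqs=["ACD"], phis=[], psis=[])
except ValueError as exc:
    print(exc)
```

In the notebook this would be called as `check_feature_counts(names=names, seqs=seqs, pssms=pssms, phis=phis, psis=psis)` before building the outputs.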

Any suggestions on what could be wrong in parsing the full_under_200.txt file?

@hypnopump
Owner

Hi there,
I don't know exactly what might be causing the error. I suggest debugging it manually (try to find the conflicting protein, inspect the lengths of the phi and psi lists, ...)
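One possible way to do that inspection, as a rough sketch: walk over the proteins and report every index where the sequence, phi and psi entries are missing or differ in length. The function name `find_length_mismatches` is hypothetical, and it assumes the three per-protein entries are meant to have equal length, which may not hold for this dataset:

```python
def find_length_mismatches(names, seqs, phis, psis):
    """Return (index, name, lengths) for proteins whose entries disagree."""
    problems = []
    for i, name in enumerate(names):
        # None marks a feature list that has no entry at this index
        lengths = tuple(
            len(feature[i]) if i < len(feature) else None
            for feature in (seqs, phis, psis)
        )
        if None in lengths or len(set(lengths)) > 1:
            problems.append((i, name, lengths))
    return problems


# Illustrative data only: "b" has a short phi list, "c" has no entries at all.
names = ["a", "b", "c"]
seqs = ["ACD", "AC"]
phis = [[1.0, 2.0, 3.0], [1.0]]
psis = [[1.0, 2.0, 3.0], [1.0, 2.0]]
for index, name, lengths in find_length_mismatches(names, seqs, phis, psis):
    print(index, name, lengths)
```

The first reported index points at the protein to look up in full_under_200.txt.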
