Adding the ThermoML dataset #117

Open
ml-evs opened this issue Mar 16, 2023 · 2 comments · May be fixed by #118

Comments

@ml-evs
Contributor

ml-evs commented Mar 16, 2023

I'm happy to have a go at adding the ThermoML archives as a dataset, if this is useful.

This was already mentioned on Discord by @marcosfelt, along with a useful link to thermopyl and @marcosfelt's updated fork.

@ml-evs
Contributor Author

ml-evs commented Mar 16, 2023

In terms of licensing, it looks like the ThermoML archive has recently been "FAIRified" and can be downloaded in bulk at http://doi.org/10.18434/mds2-2422

@marcosfelt

marcosfelt commented Mar 16, 2023

If I remember correctly, I used the massive flat file from this link!

> In terms of licensing, it looks like the ThermoML archive has recently been "FAIRified" and can be downloaded in bulk at http://doi.org/10.18434/mds2-2422

Also, to save you some time, here's a rough sketch of the code you need to just get pandas dataframes:

import pandas as pd
from thermopyl import Parser

# Get the data for each of the journals
data_paths = [
    "ThermoML.v2020-09-30/10.1021/",
    "ThermoML.v2020-09-30/10.1016/",
    "ThermoML.v2020-09-30/10.1007/",
]

for data_path in data_paths:
    # Parse the ThermoML data under this path and load it into a dataframe
    parser = Parser(data_path)
    parsed_data = pd.DataFrame(parser.parse())
    # With the trailing slash, element [-2] is the DOI prefix, e.g. "10.1021"
    doi = data_path.split("/")[-2]
    parsed_data.to_parquet(f"{doi}.pq")
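
One small caveat on the `split("/")[-2]` step in the loop above: it only yields the DOI prefix when the path ends with a slash. As a rough sketch (the helper name `output_name` is hypothetical, not part of thermopyl or the original snippet), the output filename could instead be derived with `pathlib`, which ignores a trailing slash:

```python
from pathlib import PurePosixPath


def output_name(data_path: str, suffix: str = ".pq") -> str:
    """Build the output filename from the last directory in data_path.

    Hypothetical helper for illustration only. PurePosixPath drops any
    trailing slash, so ".../10.1021/" and ".../10.1021" behave the same.
    """
    prefix = PurePosixPath(data_path).name
    # Concatenate rather than use with_suffix(), which would treat
    # ".1021" as an existing file extension and replace it.
    return f"{prefix}{suffix}"


if __name__ == "__main__":
    for path in ("ThermoML.v2020-09-30/10.1021/", "ThermoML.v2020-09-30/10.1021"):
        print(output_name(path))  # prints "10.1021.pq" both times
```

In the loop, `parsed_data.to_parquet(output_name(data_path))` would then work whether or not the paths carry a trailing slash.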

ml-evs linked a pull request on Mar 20, 2023 that will close this issue