diff --git a/Open-Science-101/Module_1/Lesson_1/readme.md b/Open-Science-101/Module_1/Lesson_1/readme.md index 98752739..0e9cf037 100644 --- a/Open-Science-101/Module_1/Lesson_1/readme.md +++ b/Open-Science-101/Module_1/Lesson_1/readme.md @@ -31,7 +31,7 @@ This is the first lesson in the module on the Ethos of Open Science. Let's begin "Ethos is the distinguishing character, sentiment, moral nature, or guiding beliefs of a person, group," -**Merriam Webster** +**Merriam-Webster** --- @@ -53,18 +53,13 @@ NASA funds research in fields from Astrobiology to Physics, and basic science to The open science practices and principles that play a critical role supporting NASA mission success are equally relevant to other government agencies and institutions. Similar considerations, approaches, and behaviors are needed in a variety of scientific contexts. Tools for open science frameworks and workflows follow generally similar models. -Open science practices and principles can be applied to all stages of the research process. One early example of NASA’s efforts to involve more people in science is the [exoplanet citizen science projects](https://exoplanets.nasa.gov/citizen-science/). The [Exoplanet Explorers](https://www.zooniverse.org/projects/ianc2/exoplanet-explorers) project posed the questions: - -- Are small planets (like Venus) more common than big ones (like Saturn)? -- Are short-period planets (like Mercury) more common than those on long orbits (like Mars)? -- Do planets more commonly occur around stars like the Sun, or around the more numerous, cooler, smaller red dwarfs? +Open science practices and principles can be applied to all stages of the research process. One early example of NASA's efforts to involve more people in science is the [exoplanet citizen science projects](https://exoplanets.nasa.gov/citizen-science/), with the [Exoplanet Explorers](https://www.zooniverse.org/projects/ianc2/exoplanet-explorers) being a significant part of this effort. 
-"Stargazing Live", a live television program, took place across three consecutive nights in 2017. The hosts invited viewers to contribute to their research question by classifying solar systems from an open access dataset. Within 48 hours of the program's -debut, more than 10,000 people had participated in [Exoplanet Explorers](https://www.zooniverse.org/projects/ianc2/exoplanet-explorers) and classified over 2 million systems. +"Stargazing Live", a live television program, took place across three consecutive nights in 2017. The hosts invited viewers to identify exoplanets in an open access dataset. Within 48 hours of the program's debut, more than 10,000 people had participated in [Exoplanet Explorers](https://www.zooniverse.org/projects/ianc2/exoplanet-explorers) and classified over 2 million systems. -Following the first night of the program, the researchers watched the results roll in, as citizen scientists helped sift through the data. On the second night, enough people had participated that the researchers were able to share the demographics of the planet candidates that had already been flagged and were undergoing additional analysis: 44 Jupiter-size planets, 72 Neptune-size planets, 44 Earth-size planets and 53 sub-Neptunes (larger than Earth but smaller than Neptune). +Following the first night of the program, the researchers watched the results roll in, as citizen scientists helped sift through the data. On the second night, enough people had participated that the researchers were able to share that 44 Jupiter-size candidate planets, 72 Neptune-size candidate planets, 53 sub-Neptune size candidate planets (larger than Earth but smaller than Neptune), and 44 Earth-size candidate planets had already been found and were undergoing additional analysis. Communities, working together on a problem, can rapidly find new results! Open science enables this and more. 
@@ -91,15 +86,15 @@ Source: [https://www.pnas.org/doi/full/10.1073/pnas.1708290115](https://www.pnas - **No Yes** - **Open Results Enable Iteration and Improve Error-Detection** -In this section, we will look at an example of how closed science can restrict research impact by [following the outcome](https://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1318&context=usdeptcommercepub) of a highly cited journal article to understand how science functions to inform a field’s state of research, the decisions of policy makers, and the actions of society. +In this section, we will look at an example of how closed science can restrict research impact by [following the outcome](https://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1318&context=usdeptcommercepub) of a highly cited journal article to understand how science functions to inform a field’s state of research, the decisions of policymakers, and the actions of society. -A 1990 analysis of satellite data on climate temperature concluded that the upper atmosphere experienced no warming, a finding that contradicted early climate models predictions. Policymakers concluded from this result that researchers don't understand climate models enough to warrant changes in environmental policy. The processed data from this study were made open-access but, as was typical for the time, neither the original data nor the code used for processing and analyzing the data were shared by the original research team. Eight years after the article was published, other scientists noticed that the original authors didn't account for several important effects. This oversight introduced errors into the dataset and falsely produced artificial cooling to the temperature measurements. It took another five years and additional funding to reproduce the code and conduct a new analysis. Thirteen years after the original paper, it was confirmed that the upper atmosphere was warming and agreed with climate model predictions. 
+A 1990 analysis of satellite data on climate temperature concluded that the upper troposphere experienced no warming, a finding that contradicted early climate model predictions. Policymakers concluded from this result that researchers don't understand climate models enough to warrant changes in environmental policy. The processed data from this study were made open-access but, as was typical for the time, neither the original data nor the code used for processing and analyzing the data were shared by the original research team. Eight years after the article was published, other scientists noticed that the original authors didn't account for several important effects. This oversight introduced errors into the dataset that produced artificial cooling in the temperature measurements. It took another five years and additional funding to reproduce the code and conduct a new analysis. Thirteen years after the original paper, it was confirmed that the upper troposphere was warming, in agreement with climate model predictions. + +*Note: Learn about the layers of Earth's atmosphere [here](https://www.sciencefacts.net/layers-of-atmosphere.html).* The inability for the scientific community to access an article’s original data and code slows the pace of discovery, thirteen years in this case, and forces other research teams to repeat the work (code) instead of moving on to new projects. This isn't the pace that we want to advance science, with one step forward and two steps back to iterate and resolve problems. @@ -146,7 +141,7 @@ The Ethos of Open Science is a broad term that encompasses the moral and ethical Diverse practices, assumptions, and goals are just part of the complexity of open science. There are also divergent moral principles that guide open science communities. Such principles are captured in "codes of conduct". 
A code of conduct is a community governance mechanism that outlines the principles and practices expected of a given research community’s members, as well as the process for investigating and reprimanding those in violation of the code. In a sense, a code of conduct constitutes the moral backbone of a research community. However, as with the numerous schools of thought, there are similarly many codes of conduct. In other words, there is no one set of universal principles that all open science -practitioners abide by. For example, consider how [OLS](https://openlifesci.org/code-of-conduct), [INOSC](https://osf.io/6gsye), [allea](https://allea.org/portfolio-item/the-european-code-of-conduct-for-research-integrity-2/), [AGU](https://www.agu.org/Plan-for-a-Meeting/AGUMeetings/Meetings-Resources/Meetings-code-of-conduct) and [Ethical Source](https://ethicalsource.dev/community-code-of-conduct/) all have different codes of conducts and guiding principles. +practitioners abide by. For example, consider how [OLS](https://openlifesci.org/code-of-conduct), [INOSC](https://osf.io/6gsye), [allea](https://allea.org/portfolio-item/the-european-code-of-conduct-for-research-integrity-2/), [AGU](https://www.agu.org/Plan-for-a-Meeting/AGUMeetings/Meetings-Resources/Meetings-code-of-conduct) and [Ethical Source](https://ethicalsource.dev/) all have different codes of conduct and guiding principles. This great diversity responds to the growing proliferation of open science initiatives and the great use we can make of open science approaches to knowledge. @@ -297,7 +292,7 @@ As briefly discussed in previous lessons, open science doesn’t only involve re -Scientific research should benefit humanity. Although open science has many stakeholders, the advantageous interaction between science and society takes place among three core groups: scientific researchers, policymakers, and the public. 
Researchers do science and share their results with policy makers and the general public to inform their decisions and improve their lives. The public helps to fund research through taxes and can provide input to future areas of study. Policymakers help to implement measures that are informed by scientific results to improve the health, environment, and livability of society. +Scientific research should benefit humanity. Although open science has many stakeholders, the advantageous interaction between science and society takes place among three core groups: scientific researchers, policymakers, and the public. Researchers do science and share their results with policymakers and the general public to inform their decisions and improve their lives. The public helps to fund research through taxes and can provide input to future areas of study. Policymakers help to implement measures that are informed by scientific results to improve the health, environment, and livability of society. These three stakeholder groups remain central to the world of open science. However, the inclusive nature of open science demands participation from the broader public. Growth in public participation in science can occur by removing barriers to those historically excluded and by expanding the community of people who support the scientific research itself. @@ -334,7 +329,7 @@ Here we list some core groups who we envision as taking part in and/or benefitti -

Policymakers represent another key community in the science environment. Policy makers can reference scientific findings to inform their decisions for the betterment of society. Those who help in the understanding and dissemination of these policies (including educators and science journalists) are crucial to this process. Policy makers can also play important roles in ensuring and facilitating open science by setting data management processes, encouraging open access legislation, and developing ethical guidelines for experiments. Policy makers can benefit from open science by gaining better access to scientific output via the open sharing of research results.

+

Policymakers represent another key community in the science environment. Policymakers can reference scientific findings to inform their decisions for the betterment of society. Those who help in the understanding and dissemination of these policies (including educators and science journalists) are crucial to this process. Policymakers can also play important roles in ensuring and facilitating open science by setting data management processes, encouraging open access legislation, and developing ethical guidelines for experiments. Policymakers can benefit from open science by gaining better access to scientific output via the open sharing of research results.

@@ -406,8 +401,8 @@ Why is open science happening now? - Communicating research results is an integral aspect of science. - The internet and increasing availability of computers enables access by a much wider segment of society. -- Researchers are mandated by law to share all their findings and raw data immediately after conducting their experiments. -- Open Science is happening now because of a global shortage in academic professionals, necessitating widespread public involvement. +- Researchers are mandated by law to share all their findings and raw data immediately after conducting their experiments. +- Open Science is happening now because of a global shortage in academic professionals, necessitating widespread public involvement. *Question* @@ -430,4 +425,4 @@ What societal problems can open science help to address? Select all that apply. - The increase in paper usage due to the printing of scientific journals and articles. - The overpopulation of certain animal species used in laboratory testing. - The issue of decreasing interest in artistic and cultural studies due to the emphasis on scientific research. -- The rising costs of luxury consumer goods are influenced by technological advancements. \ No newline at end of file +- The rising costs of luxury consumer goods are influenced by technological advancements. diff --git a/Open-Science-101/Module_1/Lesson_2/readme.md b/Open-Science-101/Module_1/Lesson_2/readme.md index 5bd765b7..bae43fce 100644 --- a/Open-Science-101/Module_1/Lesson_2/readme.md +++ b/Open-Science-101/Module_1/Lesson_2/readme.md @@ -62,15 +62,14 @@ Well-documented research products also demonstrate the quality of your work, whi ### Give and Get Credit When Using Results of Others - - - -In addition to documenting your own research, the practice of giving credit to everyone who has contributed will strengthen your scientific community reputation and actualize the shared values of open science. 
As people gain confidence in the benefits of cooperative research, they will also start giving credit to more contributions that might previously have gone unacknowledged. Different work performed as part of a paper can be given in an author contribution statement like the example shared here. - The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 license. DOI: 10.5281/zenodo.3332807. +In addition to documenting your own research, the practice of giving credit to everyone who has contributed will strengthen your scientific community reputation and actualize the shared values of open science. As people gain confidence in the benefits of cooperative research, they will also start giving credit to more contributions that might previously have gone unacknowledged. Different work performed as part of a paper can be given in an author contribution statement, often required by journals, like the example below taken from [this paper](https://journals.physiology.org/doi/full/10.1152/jn.00636.2019). Additionally, it is important to also include an acknowledgement statement to give credit to those who have shared resources, equipment, or knowledge, without which the final product or paper may not have been possible. + + + --- ### More Visibility and Impact @@ -81,9 +80,9 @@ In addition to improved scientific accuracy, adhering to open science practices #### Emerging evidence that some aspects of open science can increase your citations. -Publishing open access increases citation count by 18%, according to a 2018 [study](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7176083/). +Publishing open access increases citation count by 18%, according to a 2018 [study](https://peerj.com/articles/4375/). -Articles that make their data openly accessible via a direct link to a repository see ~25% higher citation impact, according to a 2020 study. 
+Articles that make their data openly accessible via a direct link to a repository see ~25% higher citation impact, according to a 2020 [study](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7176083/). Publishing as open access may have prohibitive costs for some researchers depending on the venue. There are often other options that allow authors to share their work freely and openly. In Module 5 on Open Results, we discuss some of these other options including preprints and diamond open access. @@ -142,7 +141,7 @@ Here is an example of open science that was able to correct errors in a healthca ### Open Science Leads to More Discoveries -The Solar and Heliospheric Observatory (SOHO) has been sending home images of our dynamic sun, opening up a new era of solar observation. It was designed for heliophysics. However, planetary scientists found SOHO useful for its ability to spot comets that pass extremely close to the sun, known as sungrazers. To this day, SOHO is one of the best sources for views of the giant surface explosions regularly produced by the sun called coronal mass ejections, or CMEs, which can hurl a million tons of solar particles off into space. This field of view is large enough to see a sungrazing comet as it sling shots around the sun. +The Solar and Heliospheric Observatory (SOHO) has been sending home images of our dynamic sun, opening up a new era of solar observation. It was designed for heliophysics. However, planetary scientists found SOHO useful for its ability to spot comets that pass extremely close to the sun, known as sungrazers. To this day, SOHO is one of the best sources for views of the giant surface explosions regularly produced by the sun called coronal mass ejections, or CMEs, which can hurl a million tons of solar particles off into space. This field of view is large enough to see a sungrazing comet as it slingshots around the sun. 
SOHO's great success as a comet finder is, of course, dependent on the people who sift through SOHO's data – a task made open to the world through real-time publicly available data. @@ -154,7 +153,7 @@ In 2022 though, NASA decided to fund a challenge open to the public to develop n ### Quality and Diversity of Scholarly Communications -Furthermore, open science improves the state of scientific literature. Scientific journals have traditionally faced the severe issue of publication bias, where journal articles overwhelmingly feature novel and positive results, according to a 2018 [study](https://pubmed.ncbi.nlm.nih.gov/30523135/). This results in a state where scientific results in certain disciplines published scientific results may have a number of exaggerated effects, or even be “false positives” (wrongly claiming that an effect exists), making it difficult to evaluate the trustworthiness of published results, according to a 2011 and 2016 study. Open science practices, such as registered reports, mitigate publication bias and improve the trustworthiness of the scientific literature. Registered reports are journal publication formats that peer-review and accept articles before data collection is undertaken, eliminating the pressure to distort results, according to a 2022 [study](https://www.nature.com/articles/s41562-021-01193-7). Other open science practices, such as pre-registration, also allows a partial look into projects that for various reasons (such as lack of funding, logistical issues or shifts in organizational priorities) have not been completed or disseminated, according to a 2023 [study](https://pubmed.ncbi.nlm.nih.gov/34396837/), giving these projects a publicly available output that can help inform about the current state research. +Furthermore, open science improves the state of scientific literature. 
Scientific journals have traditionally faced the severe issue of publication bias, where journal articles overwhelmingly feature novel and positive results, according to a 2018 [study](https://pubmed.ncbi.nlm.nih.gov/30523135/). As a result, published results in certain disciplines may include exaggerated effects, or even be "false positives" (wrongly claiming that an effect exists), making it difficult to evaluate the trustworthiness of published results, according to studies from 2011 and 2016. Open science practices, such as registered reports, mitigate publication bias and improve the trustworthiness of the scientific literature. Registered reports are journal publication formats that peer-review and accept articles before data collection is undertaken, eliminating the pressure to distort results, according to a 2022 [study](https://www.nature.com/articles/s41562-021-01193-7). Other open science practices, such as pre-registration, also allow a partial look into projects that for various reasons (such as lack of funding, logistical issues or shifts in organizational priorities) have not been completed or disseminated, according to a 2023 [study](https://pubmed.ncbi.nlm.nih.gov/34396837/), giving these projects a publicly available output that can help inform others about the current state of research. @@ -171,7 +170,7 @@ By using openly available tools and making our scientific process and products m The mainstream adoption of open science began relatively recently. The potential benefits of open science extend beyond research through contributions to society and policy. -Collaboration, innovation, education, technology advancement, and science-based public policy are all improved by the open availability of research products. Sharing all research products (eg. data, code, results) makes the scientific process more transparent which may help increase public trust in science. 
Also, open science encourages IDEA (Inclusion, Diversity, Equity, Accessibility), and increases involvement of citizen-scientists and non-experts in the research process. The inclusion of diverse perspectives from an open community invites unique perspectives that contribute to a more robust and often more accurate scientific outcome. +Collaboration, innovation, education, technology advancement, and science-based public policy are all improved by the open availability of research products. Sharing all research products (e.g., data, code, results) makes the scientific process more transparent which may help increase public trust in science. Also, open science encourages IDEA (Inclusion, Diversity, Equity, Accessibility), and increases involvement of citizen-scientists and non-experts in the research process. The inclusion of diverse perspectives from an open community invites unique perspectives that contribute to a more robust and often more accurate scientific outcome. @@ -196,7 +195,7 @@ Open science reciprocates the benefits it provides to researchers onto the commu Through open science practices, research waste can be avoided, such as unintentional and costly repetition of previous studies, according to a 2020 European Commission [report](https://op.europa.eu/en/publication-detail/-/publication/6bc538ad-344f-11eb-b27b-01aa75ed71a1). In the human sciences, this also reduces participant fatigue in the long term. By maximizing what is learned from publicly available data, one does not need to test repeatedly, especially on already vulnerable communities. By “giving away” science, individuals, communities and organizations can more easily adopt research results to inform interventions for their own needs without the knowledge being gatekept by the original researchers and organizations involved. In this way, open science can strengthen the social and economic impacts of scientific results. 
-### Open Science Attracts a Diverse Set of Participant +### Open Science Attracts a Diverse Set of Participants @@ -206,7 +205,7 @@ Image credit: Andy Brunning/Compound Interest. **CC BY-NC-ND 4.0 DEED** The open sharing of scientific products and processes makes science accessible to everyone. This allows full participation from everyone, and also maximizes the number of people who can benefit from the work. -The best ways to include a diverse group of open science practitioners and stakeholders are to remove existing barriers and design for inclusion. Beyond this, it is important to learn how to communicate effectively with diverse collaborators, people at different skill levels, career levels, backgrounds, and areas of expertise. The ability to build diverse teams is a skill that everyone can learn. +The best ways to include a diverse group of open science practitioners and stakeholders are to remove existing barriers and design for inclusion. Beyond this, it is important to learn how to communicate effectively with diverse collaborators, people at different skill levels, career levels, backgrounds, and areas of expertise. The ability to build diverse teams is a skill that everyone can learn. To learn more about NASA's commitment to diversity and inclusion, click [here](https://www.nasa.gov/odeo/diversity-and-inclusion/). ### Key Takeaways: Benefits to Society diff --git a/Open-Science-101/Module_1/Lesson_3/readme.md b/Open-Science-101/Module_1/Lesson_3/readme.md index a5846edc..d0421b88 100644 --- a/Open-Science-101/Module_1/Lesson_3/readme.md +++ b/Open-Science-101/Module_1/Lesson_3/readme.md @@ -105,7 +105,7 @@ Many organizations have groups that will support the development and commerciali **Public Domain** -In some cases, intellectual property is not protected at all. Public domain is when a creative work has no intellectual property rights associated with it. Some types of intellectual property expires after a certain time scale. 
Some types of work, such as those created by civil servants in the United States, is not covered by copyright and can appear immediately in the public domain. For others, the creator donates the work to the public domain or intellectual property rights are not applicable. +In some cases, intellectual property is not protected at all. Public domain is when a creative work has no intellectual property rights associated with it. Some types of intellectual property expire after a certain period of time. Some types of work, such as those created by civil servants in the United States, are not covered by copyright and can appear immediately in the public domain. For others, the creator donates the work to the public domain or intellectual property rights are not applicable. ### Why Should You Care About Intellectual Property Policies? @@ -304,9 +304,7 @@ It is important to plan for the release of your data and results from the very b ### Sharing Controlled Research -As we've previously shown, different kinds of intellectual property are released using different formal structures. For example, text and media products are released under copyright and software is released under a license. - -It is important to check with specialist communities when preparing your research plan. Methods for sharing results may follow different standards of practice or may require a special data format for distribution or submission to common repositories. +As we've previously shown, different kinds of intellectual property are released using different formal structures. It is important to understand these structures and to check with specialist communities when preparing your research plan. Methods for sharing results may follow different standards of practice or may require a special data format for distribution or submission to common repositories. 
@@ -382,7 +380,7 @@ Remember, sometimes what they say may conflict, for example: - If your grant / funder says outputs should be open, usually your institute will permit you to share items even if they are normally more restrictive. - Different types of outputs may have different types of restrictions. (e.g. software or hardware might have one expectation, whilst data might have others). -Universities and other institutions may have OSPOs (Open Source Policy Office) or commercialisation offices. Most institutes will have intellectual property counsel to help answer questions. Librarians are another good resource to consult when looking for advice on sharing. Considering these policies earlier in your research can save you time and energy down the road, which is why... +Universities and other institutions may have OSPOs (Open Source Policy Office) or commercialization offices. Most institutes will have intellectual property counsel to help answer questions. Librarians are another good resource to consult when looking for advice on sharing. Considering these policies earlier in your research can save you time and energy down the road, which is why... 
### Early is Better diff --git a/Open-Science-101/Module_1/Lesson_4/readme.md b/Open-Science-101/Module_1/Lesson_4/readme.md index fb0e6a57..db18412a 100644 --- a/Open-Science-101/Module_1/Lesson_4/readme.md +++ b/Open-Science-101/Module_1/Lesson_4/readme.md @@ -23,7 +23,7 @@ After completing this lesson, you should be able to: ## Common Fears Around Openness -### Activity 4.1: Self Reflection on Open Science Concerns +### Activity 4.1: Self Reflection on Open Science Concerns diff --git a/Open-Science-101/Module_1/Lesson_5/readme.md b/Open-Science-101/Module_1/Lesson_5/readme.md index ce8e6414..c0bae0f8 100644 --- a/Open-Science-101/Module_1/Lesson_5/readme.md +++ b/Open-Science-101/Module_1/Lesson_5/readme.md @@ -39,7 +39,7 @@ Planning for outputs in advance includes: - Identifying journals (or other outlets) for publications - Highlighting these approaches in your grant and much more -In reality, there is an exploratory stage where sharing one’s product may not be part of the plan. During active research and data exploration, data, code, and ideas may be created and deleted even daily. It may not be efficient to spend time making these fully open (eg. creating DOIs, documentation) because you are just exploring! Still, one may choose to make their code public through this process (it should be in some version control repository anyway, no harm in making it public). Part of this planning is beginning to think about what would be valuable to science and figuring out how you might share it. +In reality, there is an exploratory stage where sharing one’s product may not be part of the plan. During active research and data exploration, data, code, and ideas may be created and deleted even daily. It may not be efficient to spend time making these fully open (e.g., creating DOIs, documentation) because you are just exploring! 
Still, one may choose to make their code public through this process (it should be in some version control repository anyway, no harm in making it public). Part of this planning is beginning to think about what would be valuable to science and figuring out how you might share it. It is important to discuss open science with your research team, lab, group or partners regularly. Much of responsible open science may seem to be related to outputs – such as data, software, and publications – but preparing and organizing work for these in advance is critical. It is much more difficult to follow leading practices for these at the end of research, in the 'afterthought' mode. Open science is both a mindset and culture that starts when you begin a project. @@ -132,7 +132,7 @@ In this section, we introduce the "Use, Make, Share" framework that can start to ### What Resources Will You Use? -There are already many open science resources for you to use! Open science already has a long history. For example, the act that created NASA mandated sharing of its discoveries with all of humanity and NASA has been sharing its data openly on the internet since the 1980’s. Now, there are already over 100 Petabytes of openly available NASA data for you to search, download, and use and examples of these services are provided in Module 3. Technology and practices have been developed around code that make it easy to collaborate on building complex solutions, and examples are given in Module 4. A range of services make it easy to share and discover open access publications and these are discussed in Module 5. +There are already many open science resources for you to use! Open science already has a long history. For example, the act that created NASA mandated sharing of its discoveries with all of humanity and NASA has been sharing its data openly on the internet since the 1980s. 
Now, there are already over 100 Petabytes of openly available NASA data for you to search, download, and use and examples of these services are provided in Module 3. Technology and practices have been developed around code that make it easy to collaborate on building complex solutions, and examples are given in Module 4. A range of services make it easy to share and discover open access publications and these are discussed in Module 5. In Module 2, we will introduce you to some of the tools that not only make open science possible, but also easy. @@ -152,7 +152,7 @@ Image Credit: Freepik.com Where you choose to share your research materials and results will have a large influence on its impact – how easy it is for others to find it, how long it is available, and how easy it is to reuse. -Will you share data in a file filled with columns of unlabelled numbers without any units or explanations or will it be in an open, standard format and following the [Findable, Accessible, Interoperable, Reusable (FAIR) principles](https://www.go-fair.org/fair-principles/)? Module 3 has more details to help you better understand how to share your data and explains ideas like FAIR and best practices in sharing data. This includes different considerations for where to share your data as well so that it is both accessible and preserved. +Will you share data in a file filled with columns of unlabeled numbers without any units or explanations, or will it be in an open, standard format and following the [Findable, Accessible, Interoperable, Reusable (FAIR) principles](https://www.go-fair.org/fair-principles/)? Module 3 has more details to help you better understand how to share your data and explains ideas like FAIR and best practices in sharing data. This includes different considerations for where to share your data as well so that it is both accessible and preserved. 
For software, since it is often updated and changed, many researchers first share it on a version control platform like GitHub or GitLab but then archive a version of it in a repository that has long-term preservation capabilities – more on this in Module 4! @@ -280,7 +280,7 @@ Like most data, JWST is complicated and it needs processing and data pipelines. During the implementation phase, the team collaborated on creating the data processing software together, so that everyone would benefit. Imagine the wasted effort if all 400 people had written the software themselves. The benefit and outcome was that by collaborating on this effort, the team decreased duplicative efforts, contributors got credit for their work, the software was more accurate, and this effort accelerated the data wrangling process. EUREKA!, the software created, was [created openly](https://github.com/kevin218/Eureka) with [documentation](https://eurekadocs.readthedocs.io/en/latest/) and published with [peer-review of the software package.](https://joss.theoj.org/papers/10.21105/joss.04503) -But they didn’t have to to start from scratch! The ERS-TRANSIT team was able to build on the work of others. The software built on the [JWST pipeline software](https://github.com/spacetelescope/jwst) developed openly by the JWST mission team. Furthermore, they were able to build on a much larger open source software ecosystem using python and Astropy. +But they didn’t have to start from scratch! The ERS-TRANSIT team was able to build on the work of others. The software built on the [JWST pipeline software](https://github.com/spacetelescope/jwst) developed openly by the JWST mission team. Furthermore, they were able to build on a much larger open source software ecosystem using Python and Astropy. 
### Open Access to Results @@ -403,7 +403,11 @@ These events provide an opportunity for you to take Open Science 101 with others In addition to the resources listed elsewhere in this training, the below community resources are excellent sources of information about Open Software. -**References and Guides** +#### Disclaimer + +Please note that we reference several papers throughout the course and, depending on the paper, it might be behind a paywall. If you would like a copy of a paper, please contact the author or search for it in an online preprint archive such as [bioRxiv.org](http://biorxiv.org/). + +#### References and Guides - [OpenSciency](https://opensciency.github.io/sprint-content/) - NASA SMD's [Open-Source Science Guidance for researchers](https://smd-cms.nasa.gov/wp-content/uploads/2023/08/smd-open-source-science-guidance-v2-20230407.pdf) @@ -470,7 +474,7 @@ Which item is NOT one of the four steps to open science that anyone can take? Select all that apply. -- Engage with underrepresented communities to ensure science encourages a more equitable, impactful, and +- Engage with underrepresented communities to ensure science encourages a more equitable, impactful, and positive future. - Ask colleagues about open science activities, and award credit for them in evaluations. - Think about all the different types of reviews you are involved with, and how to improve them with a goal of openness. @@ -501,7 +505,7 @@ Congratulations! Now you should be able to: [CLICK TO LEARN](http://doi.org/10.1007/978-3-319-00026-8_2) -**EC Working Group on Education and Skills under Open Science (2017). Providing researchers with the skills and competencies they need to practise Open Science** +**EC Working Group on Education and Skills under Open Science (2017). 
Providing researchers with the skills and competencies they need to practice Open Science** [CLICK TO LEARN](http://ec.europa.eu/) diff --git a/Open-Science-101/Module_1/images/media/OS101_CJ_Author_Contributions.png b/Open-Science-101/Module_1/images/media/OS101_CJ_Author_Contributions.png new file mode 100644 index 00000000..266e81de Binary files /dev/null and b/Open-Science-101/Module_1/images/media/OS101_CJ_Author_Contributions.png differ diff --git a/Open-Science-101/Module_1/images/media/image220.png b/Open-Science-101/Module_1/images/media/image220.png index 0ea0d354..252d82c9 100644 Binary files a/Open-Science-101/Module_1/images/media/image220.png and b/Open-Science-101/Module_1/images/media/image220.png differ diff --git a/Open-Science-101/Module_1/images/media/image242.png b/Open-Science-101/Module_1/images/media/image242.png index 63570711..0f888d41 100644 Binary files a/Open-Science-101/Module_1/images/media/image242.png and b/Open-Science-101/Module_1/images/media/image242.png differ diff --git a/Open-Science-101/Module_1/readme.md b/Open-Science-101/Module_1/readme.md index d7f0d41d..a07a36dd 100644 --- a/Open-Science-101/Module_1/readme.md +++ b/Open-Science-101/Module_1/readme.md @@ -35,7 +35,7 @@ Select the term to see the description. **Equitable** – Indicates the absence of unfair, avoidable or remediable differences among groups of people, whether those groups are defined socially, economically, demographically, or geographically or by other dimensions of inequality (e.g. sex, gender, ethnicity, disability, or sexual orientation). Learn more [here.](https://www.who.int/health-topics/health-equity) -**Citizen Science or Community Science** – The practice of public participation and collaboration in scientific research to increase scientific knowledge. 
Learn more [here.](https://education.nationalgeographic.org/resource/citizen-science/) +**Citizen Science or Community Science** – The practice of public participation and collaboration in scientific research to increase scientific knowledge, with or without guidance from experts/specialists. Learn more [here](https://science.nasa.gov/citizen-science/), [here](https://education.nationalgeographic.org/resource/citizen-science-collection/), and [here](https://calacademy.org/Community-science). In recent years, participatory science has been introduced as a new term, creating a more inclusive approach for all who participate in scientific discovery. Learn more [here](https://serc.si.edu/participatory-science). **Open Research** – How research is performed and how knowledge is shared based on the principle that research should be as open as possible. Learn more [here.](https://www.ukri.org/what-we-do/supporting-healthy-research-and-innovation-culture/open-research/) @@ -47,7 +47,7 @@ Select the term to see the description. **FAIR principles** – Principles to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets. The principles emphasize machine- actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with none or minimal human intervention) because humans increasingly rely on computational support to deal with data as a result of the increase in volume, complexity, and creation speed of data. -**Metrics (in context of scientific merit)** – Quantitative tools used to help assess the quality and impact of research outputs (eg. scientific articles, researchers, and more). 
Learn more [here](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8397294/) and [here.](https://editorresources.taylorandfrancis.com/understanding-research-metrics/) +**Metrics (in context of scientific merit)** – Quantitative tools used to help assess the quality and impact of research outputs (e.g., scientific articles, researchers, and more). Learn more [here](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8397294/) and [here.](https://editorresources.taylorandfrancis.com/understanding-research-metrics/) **Altmetrics** – Alternative tools to assess the impact of a scientific article that do not involve journal-level usage information like impact factors. Learn more [here.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3792863/) diff --git a/Open-Science-101/Module_2/Lesson_1/readme.md b/Open-Science-101/Module_2/Lesson_1/readme.md index ec589221..bfde624e 100644 --- a/Open-Science-101/Module_2/Lesson_1/readme.md +++ b/Open-Science-101/Module_2/Lesson_1/readme.md @@ -35,7 +35,9 @@ After completing this lesson, you should be able to: Scientific knowledge, or research products, take the form of: - + + +Within these research products are additional types of products, such as methodologies, algorithms, and physical artifacts. ### What is Data? @@ -58,9 +60,7 @@ Many scientists write source code to produce software to analyze data or model o **Operational and Infrastructure Software** – Software used by data centers and large information technology facilities to provide data services. -**Libraries** – No creative process is truly complete until it manifests a tangible reality. Whether your idea is an action or a physical creation, bringing it to life will likely involve the hard work of iteration, testing, and refinement. - -Just be wary of perfectionism. Push yourself to share your creations with others. By maintaining an open stance, you’ll be able to learn from their feedback. 
Consider their responses new material that you can draw from the next time you’re embarking on a creative endeavor. +**Libraries** – Generic tools that implement well-known algorithms, provide statistical analysis or visualization, etc., which are incorporated in other software categories. **Modeling and Simulation Software** – Software that either implements solutions to mathematical equations given input data and boundary conditions, or infers models from data. @@ -108,7 +108,7 @@ Data can be easily shared through many different services - the best way for sci ### Sharing Open Code -When sharing open code, it is often through an online version controlled platform that allows others to contribute to the software and provides a history of changes to the software. For example, many researchers choose to post code files on [GitHub](https://github.com/) with a BSD 3-Clause license. This permits others to contribute and reuse the software. Steps to preserve code and make it discoverable are discussed in Module 4 - Open Code. +When sharing open code, it is often through an online version controlled platform that allows others to contribute to the software and provides a history of changes to the software. For example, many researchers choose to post code files on [GitHub](https://github.com/) with a BSD 3-Clause license. This permits others to contribute and reuse the software. Steps to preserve code and make it discoverable are discussed in Module 4 - Open Code. 
@@ -132,9 +132,9 @@ Here is an example of how one group openly shared their data, results, software, **Data** - This version: [https://doi.org/10.5281/zenodo.3688691](https://doi.org/10.5281/zenodo.3688691) + This version: [https://doi.org/10.5281/zenodo.3688691](https://doi.org/10.5281/zenodo.3688691) -All versions: [https://doi.org/10.5281/zenodo.3688690](https://doi.org/10.5281/zenodo.3688690) +All versions: [https://doi.org/10.5281/zenodo.3688690](https://doi.org/10.5281/zenodo.3688690) @@ -146,9 +146,9 @@ All versions: [https://doi.org/10.5281/zenodo.3688690](https://doi.org/10.5281/ Software -This version: [https://github.com/c-h-david/rapid](https://github.com/c-h-david/rapid) +This version: [https://github.com/c-h-david/rapid](https://github.com/c-h-david/rapid) -All versions: [https://doi.org/10.5281/zenod](https://doi.org/10.5281/zenod) +All versions: [https://zenodo.org/records/10161527](https://zenodo.org/records/10161527) ## Lesson 1: Summary diff --git a/Open-Science-101/Module_2/Lesson_2/readme.md b/Open-Science-101/Module_2/Lesson_2/readme.md index b4c314e1..a9055671 100644 --- a/Open-Science-101/Module_2/Lesson_2/readme.md +++ b/Open-Science-101/Module_2/Lesson_2/readme.md @@ -40,7 +40,7 @@ In this lesson, we introduce you to some of the most general open science tools ## Persistent Identifiers -A digital persistent identifier (or "PID") is a “long-lasting reference to a digital resource” that is machine-readable and uniquely points to a digital entity, according to [ORCID](https://support.orcid.org/hc/en-us/articles/360006971013-What-are-persistent-identifiers-PIDs-) examples of persistent identifiers used in science are described below. +A digital persistent identifier (or "PID") is a "long-lasting reference to a digital resource" that is machine-readable and uniquely points to a digital entity, according to [ORCID](https://support.orcid.org/hc/en-us/articles/360006971013-What-are-persistent-identifiers-PIDs-). 
Examples of persistent identifiers used in science are described below. ### ORCID @@ -51,9 +51,9 @@ A digital persistent identifier (or "PID") is a “long-lasting reference to a d A free, nonproprietary numeric code that is: - Uniquely and persistently identifies authors and contributors of scholarly communication. -- Similar to tax ID numbers for tax purposes. +- Used similarly to how tax ID numbers are used for tax purposes. -ORCIDs are used to link Used to link researchers to their research and research-related outputs. It is a 16-digit number that uniquely identifies researchers and is integrated with certain organizations (like some publishers) that will add research products (such as a published paper) to an individual's ORCID profile. ORCIDs are meant to last throughout ones career, and helps to avoid confusion when information about a researcher changes over time (e.g. career change or name change). (cite: [https://orcid.org/](https://orcid.org/)) +ORCIDs are used to link researchers to their research and research-related outputs. An ORCID is a 16-digit number that uniquely identifies a researcher and is integrated with certain organizations (like some publishers) that will add research products (such as a published paper) to an individual's ORCID profile. ORCIDs are meant to last throughout one's career and help to avoid confusion when information about a researcher changes over time (e.g., career change or name change). (cite: [https://orcid.org/](https://orcid.org/)) Many publishers, academic institutes, and government bodies support ORCID. In 2023, ORCID reported over 1,300 member organizations and over 9 million yearly live accounts. You can connect it with your professional information (affiliations, grants, publications, peer review, and more). 
@@ -84,7 +84,7 @@ Data repositories will typically instruct you on the exact way to cite their dat In this activity, you will search for a DOI for a data set or piece of software that you use, and you will then use the DOI website to “resolve” the DOI name. By "resolving", this means that you will be taken to the information about the product designated by that particular DOI. 1. Find the DOI for a dataset or software you use often. - 1. This should be listed either in the citation file, or in the website where that data/software is published. + 1. This should be listed either in the citation file, or on the website where that data/software is published. 2. If you can’t find a DOI, you can instead locate the DOI listed on this page: https://asdc.larc.nasa.gov/project/CERES/CERES_EBAF-TOA_Edition4.1 2. Go to https://www.doi.org/ and scroll down to the bottom of the page to "TRY RESOLVING A DOI NAME". @@ -104,7 +104,7 @@ This is how easy it should be for your readers to find and use your citation inf ### Examples of PIDs in Action - +
@@ -182,7 +182,7 @@ Metadata can facilitate the assessment of dataset quality and data sharing by an Metadata enhances searchability and findability of the data by potentially allowing other machines to read and interpret datasets. -According to [The University of Pittsburgh](https://pitt.libguides.com/metadatadiscovery/metadata-standards), "A metadata standard is a high level document which establishes a common way of structuring and understanding data, and includes principles and implementation issues for utilizing the standard." +According to [The University of Pittsburgh](https://pitt.libguides.com/metadatadiscovery/metadata-standards), "A metadata standard is a high level document which establishes a common way of structuring and understanding data, and includes principles and implementation issues for utilizing the standard." Many standards exist for metadata fields and structures to describe general data information. It is a best practice to use a standard that is commonly used in your domain, when applicable, or that is requested by your data repository. Examples of metadata standards for different domains include: @@ -465,7 +465,7 @@ produced used to share materials - How? - The details of how to enable reuse of -materials (eg. licensing, documentation, +materials (e.g., licensing, documentation, metadata) - Who? - Roles and responsibilities of the team members @@ -497,7 +497,7 @@ General components of a software management plan: - Personnel roles and responsibilities. - Any community-specific information of note. -At a minimum, a software management plan for SMD-funded research should include: +At a minimum, a software management plan for research funded by NASA's Science Mission Directorate (SMD) should include: - Description of the software expected to be produced from the proposed activities, including types of software to be produced, how the software will be developed, and the addition of new features or updates to existing software. 
This can include the platforms used for development, project management, and community-based best practices to be included such as documentation, testing, dependencies, and versioning. - The repository(ies) that will be used to archive software arising from the activities and the schedule for making the software publicly available. - Description of software that are subject to relevant laws, regulations, or policies that exclude them from software sharing requirements. diff --git a/Open-Science-101/Module_2/Lesson_3/readme.md b/Open-Science-101/Module_2/Lesson_3/readme.md index 1846b8fa..778ddf4f 100644 --- a/Open-Science-101/Module_2/Lesson_3/readme.md +++ b/Open-Science-101/Module_2/Lesson_3/readme.md @@ -19,7 +19,7 @@ After completing this lesson, you should be able to: - Define the different types of scientific data. - Define what the acronym FAIR means and explain how it supports the sharing of open data. - Identify data management practices and tools to locate data in repositories. -- List and explain the purpose of the resources commonly used in making data including the data formats, inspecting data, and assessing 'FAIR'-ness of data. +- List and explain the resources commonly used in making data, including formatting, inspecting, and assessing the 'FAIR'-ness of data. ## Introduction to Open Data @@ -28,7 +28,7 @@ Data is a major part of scientific research, and why wouldn’t it be? It inform For instance, the open access [Copernicus Emergency Management Service](https://emergency.copernicus.eu/) implemented by the European Commission produces 24/7 open access data collected by ESA and NASA satellites to produce maps that inform disaster preparedness and response efforts across the globe. This is only one example among many others demonstrating the value of data, particularly open and public data, in our daily life and for public good. 
-Data shared openly in scientific research brings tremendous value to the scientific community and beyond, from indigenous communities to urban populations. Before understanding the broad based impact of data, let’s first look at what is data in the context of scientific research. Specifically, we will discuss the definition and characteristics of open data? +Data shared openly in scientific research brings tremendous value to the scientific community and beyond, from indigenous communities to urban populations. Before understanding the broad based impact of data, let’s first look at what is data in the context of scientific research. Specifically, we will discuss the definition and characteristics of open data. ### What is Data? @@ -49,7 +49,7 @@ The following sections discuss ways to ensure that data is fully utilized and ac Just like driving on a road, if everyone follows agreed upon rules, everything goes much smoother. The rules don’t need to be exactly the same for every region, but share common practices based on insights about safety and efficiency. -For example, maybe you drive on the left side of the road or the right side. Either is fine, those sort of details are for different communities to decide on. However, there are overarching guidelines shared by communities across the globe, such as the rule to drive on the road not the sidewalk, use a turn signal when appropriate, adhere to lights at intersections that direct traffic, and follow speed limits. Some communities may implement stricter rules than others, or practice them differently, but these guidelines help everyone move around safely through a common understanding of how to drive on roads. For scientific data, these guidelines are called the Findable, Accessible, Interoperable, Reusable or “FAIR” principles. They do to data what their title suggests. That is, these principles make it possible for others (and yourself) to find, get , understand, and use data correctly. 
+For example, maybe you drive on the left side of the road or the right side. Either is fine; those sorts of details are for different communities to decide. However, there are overarching guidelines shared by communities across the globe, such as the rule to drive on the road, not the sidewalk, use a turn signal when appropriate, adhere to lights at intersections that direct traffic, and follow speed limits. Some communities may implement stricter rules than others, or practice them differently, but these guidelines help everyone move around safely through a common understanding of how to drive on roads. For scientific data, these guidelines are called the Findable, Accessible, Interoperable, Reusable or “FAIR” principles. They do to data what their title suggests. That is, these principles make it possible for others (and yourself) to find, get, understand, and use data correctly. **Findable**: @@ -64,7 +64,7 @@ Current Enabling Tech: - [DataCite's Metadata Schema](https://schema.datacite.org/) - PIDs: Persistent IDentifiers (additional details in the following sections) - [Digital Object Identifier](https://www.doi.org/) (DOI): A top-level and a mandatory field in the metadata of each record - for data, code, publications. - - [Open Research and Contributor ID](https://orcid.org/) (ORCiD) - A code that uniquely identifies authors and contributors of research products and scholarly communication. + - [Open Researcher and Contributor ID](https://orcid.org/) (ORCID) - A code that uniquely identifies authors and contributors of research products and scholarly communication. **Accessible** @@ -73,14 +73,14 @@ To be [Accessible:](https://www.go-fair.org/fair-principles/metadata-retrievable - Data and results are retrievable by their identifiers using a standardized communication protocol. - The protocol is open, free, and universally implementable. - The protocol allows for an authentication and authorization procedure, where necessary. 
Data and results are publicly accessible and licensed under the public domain. - - Metadata are accessible, even when the data are no longer available Data and metadata will be retained for the lifetime of the repository. + - Metadata are accessible, even when the data are no longer available. Data and metadata will be retained for the lifetime of the repository. - Metadata are stored in high-availability database servers. Current Enabling Tech: - [File Transfer Protocol (FTP)](https://www.w3.org/Protocols/rfc959/), File Transfer Protocol Secure (FTPS) - [Hypertext Transfer Protocol (HTTP)](https://www.w3.org/Protocols/), Hypertext Transfer Protocol Secure (HTTPS) -Note that Microsoft Exchange Server and Skype are examples of proprietary protocols. +Note that Microsoft Exchange Server and Skype are examples of proprietary protocols. As always, it is necessary to balance accessibility with security concerns, which may impact the chosen protocol. **Interoperable** @@ -118,7 +118,7 @@ These are high-level guidelines, and much like open science, implementation is n ### Data Management Plan -The previous lesson describes the requirements of a data management plan (DMP). Below are two open science resources to get you started or creating a data management plan: +The previous lesson describes the requirements of a data management plan (DMP). Below are two open science resources to get you started on creating a data management plan: **DMPTool** @@ -140,6 +140,8 @@ A data repository is a digital space to house, curate, and share research output Open science tools such as data repositories should implement FAIR principles, especially in the case of attribution of persistent identifiers (e.g., DOI), metadata annotation, and machine-readability. 
+Additional examples of data repositories and other open science tools include but are not limited to: + **ZENODO** [Zenodo](https://zenodo.org/) is an example of a data repository that allows the upload of research data and creates DOIs. Its popularity among the research community is due to its simplified interface, support of community curation, and feature that enables researchers to deposit diverse types of research outputs; from datasets and reports to publications, software, multimedia content. @@ -296,7 +298,7 @@ Choose the FAIR Principles from the list below. Select all that apply. Which of the following can help make your data FAIR? Select all that apply. - Get a license for your data -- Make sure you develop your own metadata +- Make your metadata accessible only as long as your data is available - Obtain a PID for your data *Question* @@ -308,5 +310,4 @@ Which of the following are examples of repositories? Select all that apply. - Zenodo - Dataverse - Dryad -- Datacite - Google diff --git a/Open-Science-101/Module_2/Lesson_4/readme.md b/Open-Science-101/Module_2/Lesson_4/readme.md index 5ab7a153..41989ce6 100644 --- a/Open-Science-101/Module_2/Lesson_4/readme.md +++ b/Open-Science-101/Module_2/Lesson_4/readme.md @@ -50,7 +50,7 @@ The general way we use version control starts by initializing a folder on your c This may sound like a simple process, and in many ways it is! So why is it so important? Especially when it comes to coding, the ability to create a snapshot in time of a piece of code can be very helpful. For instance, you may have a piece of code that yields the intended result, but then you want to add a new function. You may choose to copy that code file so you don’t lose the current state, and then work in a new file. This can become cumbersome pretty quickly when you have multiple files that are different versions of the same piece of code. 
Or instead of creating a new file, you may write code for the new function directly in the original file, but now the code throws errors when you try to run it, and you can’t remember which lines you added since the last time the code ran without errors. By using version control, these problems are solved because we can revert back to the checkpoint when the code ran cleanly, and thereby avoid the need to create multiple copies to save the original piece of code. -There are many other features of version control systems, such as the concept of creating "branches" that allow you to work on new updates to a piece of code independently from and in parallel to the original piece of code. A branch is a deviation from the original code, but can be merged back into the original code when desired. All of these concepts are even more useful when collaborating with others using version control platforms, a collaborative practice that will be discussed later in this lesson. +There are many other features of version control systems, such as the concept of creating "branches" that allow you to work on new updates to a piece of code independently of and in parallel to the original piece of code. A branch is a deviation from the original code, but can be merged back into the original code when desired. All of these concepts are even more useful when collaborating with others using version control platforms, a collaborative practice that will be discussed later in this lesson. ### Types of Software Version Control @@ -207,7 +207,7 @@ If you were in a room with 10 developers and asked them each what their favorite **Source-Code Editing & Kernels – The Value of IDEs and Kernels** -IDEs can bring a lot of good tools to your efforts. It’s not just about editing code any more. Modern, robust IDEs can do most of the things listed here, if not more. One can use an IDE without executing in a kernel; one can use a kernel without having developed code in an IDE. 
However, they can work hand-in-hand. +IDEs can bring a lot of good tools to your efforts. It’s not just about editing code anymore. Modern, robust IDEs can do most of the things listed here, if not more. One can use an IDE without executing in a kernel; one can use a kernel without having developed code in an IDE. However, they can work hand-in-hand.
@@ -284,7 +284,7 @@ From VS Code you can: - Upload your changes directly to GitHub. - Download changes from other team members to your local system. - **IDE Example: Rstudio – IDE** + **IDE Example: RStudio – IDE** While Visual Studio Code is a more generic IDE where you can use plugins to specialize it, there are also IDEs, such as RStudio, that have specialized features for specific languages right out of the gate. @@ -292,11 +292,11 @@ Researchers conducting statistical analysis tend to use the coding languages of -Source: https://en.wikipedia.org/wiki/File:RStudio_IDE_screenshot.png +Source: https://en.wikipedia.org/wiki/File:RStudio_IDE_screenshot.png ### Plain Text Editors for Coding -Most laptop or desktop computers that run standard operating systems (Windows, MacOS, Linux) have multiple pre-installed plain-text editors that can be used for coding. It is beneficial to know how to use at least one, because it makes editing scripts and files a quick process. +Most laptop or desktop computers that run standard operating systems (Windows, macOS, Linux) have multiple pre-installed plain-text editors that can be used for coding. It is beneficial to know how to use at least one, because it makes editing scripts and files a quick process.
@@ -372,7 +372,7 @@ In this activity, you will run pre-written Python code in a Jupyter Notebook fro ![](../images/media/image39.jpeg) -Source: [https://climatedataguide.ucar.edu/climate-data/nino-sst-indices-nino-12-3-34-4-oni-and-tni](https://climatedataguide.ucar.edu/climate-data/nino-sst-indices-nino-12-3-34-4-oni-and-tni) +Source: [https://climatedataguide.ucar.edu/climate-data/nino-sst-indices-nino-12-3-34-4-oni-and-tni](https://climatedataguide.ucar.edu/climate-data/nino-sst-indices-nino-12-3-34-4-oni-and-tni) --- @@ -439,7 +439,7 @@ Cons: - Google Cloud - Microsoft Azure -Many data providers, especially of large datasets, are migrating their data to the Cloud to increase accessibility and to make use of the large storage capacity that the Cloud provides. For instance, NASA Earthdata (which houses all NASA Earth science data) is now using AWS to store the majority of its data. Many Cloud providers also have a number of publicly available datasets, including [Google Cloud](https://cloud.google.com/storage/docs/public-datasets/#%3A~%3Atext%3DAvailable%20public%20datasets%20on%20Cloud%20Storage%201%20ERA5%3A%2Cfrom%202015%20through%20the%20present.%20...%20More%20items) and [AWS](https://registry.opendata.aws/)[.](https://cloud.google.com/storage/docs/public-datasets/#%3A~%3Atext%3DAvailable%20public%20datasets%20on%20Cloud%20Storage%201%20ERA5%3A%2Cfrom%202015%20through%20the%20present.%20...%20More%20items) +Many data providers, especially of large datasets, are migrating their data to the Cloud to increase accessibility and to make use of the large storage capacity that the Cloud provides. For instance, NASA Earthdata (which houses all NASA Earth science data) is now using AWS to store the majority of its data. 
Many Cloud providers also have a number of publicly available datasets, including [Google Cloud](https://cloud.google.com/storage/docs/public-datasets/#%3A~%3Atext%3DAvailable%20public%20datasets%20on%20Cloud%20Storage%201%20ERA5%3A%2Cfrom%202015%20through%20the%20present.%20...%20More%20items) and [AWS](https://registry.opendata.aws/)[.](https://cloud.google.com/storage/docs/public-datasets/#%3A~%3Atext%3DAvailable%20public%20datasets%20on%20Cloud%20Storage%201%20ERA5%3A%2Cfrom%202015%20through%20the%20present.%20...%20More%20items) When choosing a computing platform, it is important to consider where your datasets are saved and how big the datasets are. For instance, when working with small datasets, it is often preferable to use a personal computer since data download will take minimal time and large computing resources likely aren’t needed. When working with large datasets, however, it is best to minimize the amount of downloading and uploading data that is needed, as this can take significant amounts of time and internet bandwidth. If your large datasets are stored on the Cloud already, it is typically best to use Cloud resources for the computation as well, and likewise for HPC use. diff --git a/Open-Science-101/Module_2/Lesson_5/readme.md b/Open-Science-101/Module_2/Lesson_5/readme.md index f379693b..0926a1e0 100644 --- a/Open-Science-101/Module_2/Lesson_5/readme.md +++ b/Open-Science-101/Module_2/Lesson_5/readme.md @@ -88,6 +88,10 @@ Tools to support reproducibility in research outputs: ## Additional Tools for Open Results +### Disclaimer + +Please note that we reference several papers throughout the course and depending on the paper, it might be blocked by a paywall. If you would like to get a copy of the paper, please contact the Author or search for it in an online preprint archive. For example, [bioRxiv.org](http://biorxiv.org/). 
+ ### Tools for Open Project Management Advancements over the past few decades to tools that manage research projects and laboratories have helped to meet the ever-increasing demand for speed, innovation, and transparency in science. Such tools are developed to support collaboration, ensure data integrity, automate processes, create workflows and increase productivity. @@ -100,7 +104,7 @@ Advancements over the past few decades to tools that manage research projects an In a broader sense, protocol comprises documented computational workflows, operational procedures with step-by-step instructions, or even safety checklists. - [ Protocols.io](https://www.protocols.io/) is an online and secure platform for scientists affiliated with academia, industry and non- profit organizations, and agencies. It allows users to create, manage, exchange, improve, and share research methods and protocols across different disciplines. This resource can improve collaboration and recordkeeping, leading to an increase in team productivity and facilitating teaching, especially in the life sciences. In its free version, protocols.io supports publicly shared protocols, while paid plans enable private sharing, e.g. for industry. + [ Protocols.io](https://www.protocols.io/) is an online and secure platform for scientists affiliated with academia, industry and non-profit organizations, and agencies. It allows users to create, manage, exchange, improve, and share research methods and protocols across different disciplines. This resource can improve collaboration and recordkeeping, leading to an increase in team productivity and facilitating teaching, especially in the life sciences. In its free version, protocols.io supports publicly shared protocols, while paid plans enable private sharing, e.g. for industry. Some of the tools are specifically designed for open science with an open-by-design concept from ideation on. 
These tools aim to support the research lifecycle at all stages and allow for integration with other open science tools. @@ -110,7 +114,7 @@ The OSF is designed to be a collaborative platform where users can share researc -"While there are many features built into the OSF, the platform also allows thirdparty add-ons or integrations that strengthen the functionality and collaborative nature of the OSF. These add-ons fall into two categories: citation management integrations and storage integrations. Mendeley and Zotero can be integrated to support citation management, while Amazon S3, Box, Dataverse, Dropbox, figshare, GitHub, and oneCloud can be integrated to support storage. The OSF provides unlimited storage for projects, but individual files are limited to 5 gigabytes (GB) each." +"While there are many features built into the OSF, the platform also allows third-party add-ons or integrations that strengthen the functionality and collaborative nature of the OSF. These add-ons fall into two categories: citation management integrations and storage integrations. Mendeley and Zotero can be integrated to support citation management, while Amazon S3, Box, Dataverse, Dropbox, Figshare, GitHub, and OneCloud can be integrated to support storage. The OSF provides unlimited storage for projects, but individual files are limited to 5 gigabytes (GB) each." **[Center for Open Science](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5370619/)** @@ -126,7 +130,7 @@ The OSF is designed to be a collaborative platform where users can share researc ### Managing Citations Using Reference Management Software -Keeping track of every paper you reference, every dataset you use, and every software library you build off of is critical. A single paper might cite dozens of references, and each new thing you produce only adds to that list. 
Reference Management Software can be employed to help you manage these references and automatically create a list of citations in whatever format you need (BibTeX, Word, Google docs, etc.). +Keeping track of every paper you reference, every dataset you use, and every software library you build off of is critical. A single paper might cite dozens of references, and each new thing you produce only adds to that list. Reference Management Software can be employed to help you manage these references and automatically create a list of citations in whatever format you need (BibTeX, Word, Google Docs, etc.). While you are writing up results, keeping track of references and creating a correctly formatted bibliography can be overwhelming. A management software can keep track of references and can be shared with colleagues who are also working in the document. diff --git a/Open-Science-101/Module_2/images/media/image22.jpeg b/Open-Science-101/Module_2/images/media/image22.jpeg deleted file mode 100644 index 24d16f83..00000000 Binary files a/Open-Science-101/Module_2/images/media/image22.jpeg and /dev/null differ diff --git a/Open-Science-101/Module_2/images/media/image22.png b/Open-Science-101/Module_2/images/media/image22.png new file mode 100644 index 00000000..f27a5041 Binary files /dev/null and b/Open-Science-101/Module_2/images/media/image22.png differ diff --git a/Open-Science-101/Module_2/images/media/image9.png b/Open-Science-101/Module_2/images/media/image9.png index 869fdd87..373e3f65 100644 Binary files a/Open-Science-101/Module_2/images/media/image9.png and b/Open-Science-101/Module_2/images/media/image9.png differ diff --git a/Open-Science-101/Module_2/readme.md b/Open-Science-101/Module_2/readme.md index 0b7e9049..fde43c0d 100644 --- a/Open-Science-101/Module_2/readme.md +++ b/Open-Science-101/Module_2/readme.md @@ -11,7 +11,7 @@ This module is designed to help you get started on your journey to practicing op - Define the foundational elements of open science, 
which includes research products, the "use, make, share" framework, and the role of an Open Science and Data Management Plan. - List and explain the purpose of resources used to discover and assess research products for reuse, including repositories, search portals, publications, documentation such as README files, metadata, and licensing. - Develop a high-level strategy for making and sharing data that employs the FAIR principles, incorporates a data management plan, tracks data and authors with persistent identifiers and citations, and utilizes the appropriate data formats and tools for making data and sharing results. -- Describe the software lifecycle and design a high-level strategy for making and sharing software that considers the the use of a software management plan, the tools needed for development including source code, kernels, programming languages, third-party software and version control, and the tools and documentation used for publishing and curating open software. +- Describe the software lifecycle and design a high-level strategy for making and sharing software that considers the use of a software management plan, the tools needed for development including source code, kernels, programming languages, third-party software and version control, and the tools and documentation used for publishing and curating open software. - List the resources for sharing research products including preprints, open access publications, reference management systems, and resources to support reproducibility. ## Key Terms @@ -26,7 +26,7 @@ These key terms are important topics for this module. Select the term to see the **Computing Environment** – A platform that provides necessary software dependencies, a development area, and connections to computational resources to facilitate running code. -**ORCiD** – A numeric code used to uniquely identify authors and contributors of scholarly communication. 
Researchers provide an ORCiD for publications and association memberships. ORCiD is also an international, interdisciplinary, open, non-proprietary, and not-for-profit organization created by the research community for the benefit of all stakeholders including ours and the organizations that support the research ecosystem. +**ORCID** – A numeric code used to uniquely identify authors and contributors of scholarly communication. Researchers provide an ORCID for publications and association memberships. ORCID is also an international, interdisciplinary, open, non-proprietary, and not-for-profit organization created by the research community for the benefit of all stakeholders including ours and the organizations that support the research ecosystem. **Persistent Identifiers (PIDs)** - A long-lasting digital reference to an entity. diff --git a/Open-Science-101/Module_3/Lesson_1/readme.md b/Open-Science-101/Module_3/Lesson_1/readme.md index 47f236a6..9ccc99d1 100644 --- a/Open-Science-101/Module_3/Lesson_1/readme.md +++ b/Open-Science-101/Module_3/Lesson_1/readme.md @@ -14,13 +14,13 @@ ## Overview -This lesson defines open data, its benefits, and the practices that enable data to be open. In addition, the lesson takes a closer look at how FAIR applies to open data as well as at the criticall role of metadata. It wraps up with a brief discussion on how to plan for open data in the scientific workflow and tasks guided by the use, make, share framework. +This lesson defines open data, its benefits, and the practices that enable data to be open. In addition, the lesson takes a closer look at how FAIR applies to open data as well as at the critical role of metadata. It wraps up with a brief discussion on how to plan for open data in the scientific workflow and tasks guided by the use, make, share framework. 
## Learning Objectives After completing this lesson, you should be able to: -- Define what open data is and how the FAIR and CARE principles are used to guide open data practices +- Define what open data is and how the FAIR principles are used to guide open data practices - List the benefits of open data - Explain how the use, make, share framework can be used to modify the scientific plan for open data @@ -48,7 +48,7 @@ Mitochondria are components within our cells that affect respiratory and energy -The Turing Way Community. This illustration is created by Scriberia with The Turing Way community, used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807 +The Turing Way Community. This illustration is created by Scriberia with The Turing Way community, used under a CC-BY 4.0 license. DOI: 10.5281/zenodo.3332807 --- @@ -63,7 +63,7 @@ Data includes: - Data acquired from physical samples and specimens form the base of many studies. - Data generated from models and simulations. -**Secondary & Processed data** – Secondary data typically refers to data that is used by someone different than who collected or generated the data. Often, this may include data that has been processed from its raw state to be more readily usable by others. +**Secondary & Processed data** – Secondary data typically refers to data that is used by someone different from who collected or generated the data. Often, this may include data that has been processed from its raw state to be more readily usable by others. **Published data** – Published data are the data shared to address a particular scientific study and/or for general use. While published data can overlap with primary and secondary data types, we have "published data" as its own category to emphasize that such datasets are ideally well-documented and easy to use. @@ -87,14 +87,14 @@ To quote from a [published paper about data reuse](https://www.ncbi.nlm.nih.gov/ - Have the appropriate license, copyright, and citation information. 
- Have appropriate access information. - Be findable in an accredited or trustworthy resource. -- Be accompanied with history of changes and versioning. +- Be accompanied by history of changes and versioning. - Include details of all processing steps. Not all data may be shared or shared with all this information. There are different reasons why it might not be possible. However, the more information shared about data helps increase the reliability and reusability of the information. -## Benefits of Open Data +## Benefits of Open Data -Data underpins almost all of science. Openly sharing data with others enables reproducibility, transparency, validation, reuse, and collaborations. Data plays a significant role in our day-to-day lives. Open data, in particular, plays a key role. Open data are only common in our society and you have likely already benefited from this form in some way. The impacts of open data include facilitating: +Data underpins almost all of science. Openly sharing data with others enables reproducibility, transparency, validation, reuse, and collaborations. The impacts of open data include facilitating: --- @@ -148,7 +148,7 @@ Open data that are purposefully inclusive and open to scrutiny, benefit scientif Open data allows non-traditional researchers to contribute to scientific development and bring their unique insights to the table. With these benefits in mind, we should always bear in mind that Open Data requires careful consideration of its potential downsides that results from failure to provide due credit and consultation with potentially vulnerable and/or marginalized communities. The next lesson “Using Open Data” discusses important considerations for the responsible management, collection, and use of open data by all stakeholders. -### Benefits to You +### Benefits to You Open data also benefits your research and career. For starters, you are your own future collaborator! 
@@ -214,13 +214,13 @@ Ultimately, you are free to deploy the open data principles and resources in you -Image by Patrick Hochstenbach, CC0 1.0; image illustrates the each FAIR principle +Image by Patrick Hochstenbach, CC0 1.0; image illustrates each FAIR principle --- ### FAIR: Findable, Accessible, Interoperable, Reusable -The vast majority of data today is shared online. FAIR principles help researchers make better use of, and engage with a broader audience with, their scientific data than outdated techniques would allow. FAIR data are more valuable for science because they are easier to use. Data can be FAIR regardless of whether it is openly shared or not. If data are openly shared, being FAIR helps with reuse and expands the scientific impact of the data. +The vast majority of data today is shared online. Although FAIR has been introduced in Module 2, Lesson 3, some additional details are provided below. FAIR principles help researchers make better use of, and engage with a broader audience with, their scientific data than outdated techniques would allow. FAIR data are more valuable for science because they are easier to use. Data can be FAIR regardless of whether it is openly shared or not. If data are openly shared, being FAIR helps with reuse and expands the scientific impact of the data. FAIR principles don’t encompass comprehensive implementation instructions for every type of data, but offer general insights to improve shareability and reusability. Sometimes it takes a group effort and/or a long production process to make data and results FAIR. The process starts in the planning stage of a research project. A well-coordinated open science and data management plan is often needed for full compliance with FAIR, depending on the size and type of project the data are used for. 
diff --git a/Open-Science-101/Module_3/Lesson_2/readme.md b/Open-Science-101/Module_3/Lesson_2/readme.md index f773384f..a2bd7057 100644 --- a/Open-Science-101/Module_3/Lesson_2/readme.md +++ b/Open-Science-101/Module_3/Lesson_2/readme.md @@ -26,7 +26,7 @@ After completing this lesson, you should be able to: Open data isn't always simple to use in your research. Sometimes there are multiple versions of the same dataset, so learning how to discover and assess and then use open data will help you save time. -As an example, look at the monthly average carbon dioxide data from Mauna Loa Observatory in Hawaii. This is a foundational dataset for climate change. Not only is it one of the first observational datasets that clearly showed anthropogenic impacts on the Earth's atmosphere, it constitutes the longest record of direct measurements of carbon dioxide in the atmosphere. These observations were started by C. David Keeling of the Scripps Institution of Oceanography in March of 1958 at a facility of the National Oceanic and Atmospheric Administration \[Keeling, 1976\]. +As an example, look at the monthly average carbon dioxide data from Mauna Loa Observatory in Hawaii. This is a foundational dataset for climate change. Not only is it one of the first observational datasets that clearly showed anthropogenic impacts on the Earth's atmosphere, it constitutes the longest record of direct measurements of carbon dioxide in the atmosphere. These observations were started by C. David Keeling of the Scripps Institution of Oceanography in March 1958 at a facility of the National Oceanic and Atmospheric Administration \[Keeling, 1976\]. @@ -52,8 +52,6 @@ There are multiple pathways to find research data, and you should be practiced i ### People You Know (Online or In-person!) -When we show up to the present moment with all of our senses, we invite the world to fill us with joy. The pains of the past are behind us. The future has yet to unfold. 
But the now is full of beauty simply waiting for our attention. - What is the first and best way to find research data? Ask your community, including your research advisor, colleagues, team members, and people online. Knowing where to find reliable, good data is as much a skill and art as any lab technique. You learn this skill set by working with professionals in your field. There is no one source, no one method. @@ -245,8 +243,9 @@ Note that some of our example search portals are also repositories, but not alwa

Data stored in these repositories are often produced by the government.

Examples include:

@@ -292,7 +291,7 @@ Using open data for your project is contingent on a number of factors including - Is the data available in a format appropriate for the content? - Is the data available from a consistent location? -- Is the data well-structured and machine readable? +- Is the data well-structured and machine-readable? - Are complex terms and acronyms in the data defined? - Does the data use a schema or data standard? - Is there an API available for accessing the data? @@ -324,7 +323,7 @@ Many datasets and repositories explain how they’d prefer to be cited. The cita - Authors and their institutions - Title -- ORCiD +- ORCID - DOI - Version - URL @@ -343,7 +342,7 @@ Most datasets require (at a minimum) that you list the data’s producers, name **Example from a NASA Distributed Active Archive Center (DAAC)** -Matthew Rodell and Hiroko Kato Beaudoing, NASA/GSFC/HSL (08.16.2007), GLDAS CLM Land Surface Model L4 3 Hourly 1.0 x 1.0 degree Subsetted,version 001, Greenbelt, Maryland, USA:Goddard Earth Sciences Data and Information Services Center (GES DISC), Accessed on July 12th, 2018 at doi:10.5067/83NO2QDLG6M0 +Matthew Rodell and Hiroko Kato Beaudoing, NASA/GSFC/HSL (08.16.2007), GLDAS CLM Land Surface Model L4 3 Hourly 1.0 x 1.0 degree Subsetted, version 001, Greenbelt, Maryland, USA:Goddard Earth Sciences Data and Information Services Center (GES DISC), Accessed on July 12th, 2018 at doi:10.5067/83NO2QDLG6M0 **Example from NASA Planetary Data System (PDS)** @@ -381,7 +380,7 @@ Which of the following methods can be used for data discovery? Which of the following is/are questions to consider when assessing if a dataset can be used? - Is the data well described? -- Is the data well-structured and machine readable? +- Is the data well-structured and machine-readable? - Is there an existing community of users of the data? - What tools or software are needed to use this data? - Will the data be updated regularly? 
@@ -396,7 +395,7 @@ What information is commonly found in a citation file? - Authors and their institutions - Title -- ORCiD +- ORCID - DOI - Version - URL diff --git a/Open-Science-101/Module_3/Lesson_3/readme.md b/Open-Science-101/Module_3/Lesson_3/readme.md index 39b1f073..69dc5f47 100644 --- a/Open-Science-101/Module_3/Lesson_3/readme.md +++ b/Open-Science-101/Module_3/Lesson_3/readme.md @@ -12,7 +12,7 @@ ## Overview -In this lesson, you learn the criteria and tasks needed to ensure that the datasets you make are open and reusable. The lesson starts with a discussion on creating a data management plan and then continues with topics on selecting open data formats and how to include metadata, readme files, and version control for your data. It wraps up with a discussion on open licenses for data. +In this lesson, you learn the criteria and tasks needed to ensure that the datasets you make are open and reusable. The lesson starts with a discussion on creating a data management plan and then continues with topics on selecting open data formats and how to include metadata, README files, and version control for your data. It wraps up with a discussion on open licenses for data. ## Learning Objectives @@ -58,7 +58,6 @@ Investigate if your funding agency, institutions, and/or data repository has add A non-open (unsupported and closed/proprietary) data format refers to a file format that is not freely accessible, standardized, or widely supported by different software applications. Here are some examples of closed/proprietary data formats: - **Adobe Photoshop (.psd):** The default proprietary file format for Adobe Photoshop, a popular image editing software. -- **Microsoft Word (.doc/.docx):** A proprietary file format used to store word processing data. - **AutoCAD Drawing (.dwg):** A proprietary data format used for computer-aided design (CAD). 
Software applications that can read but not create DOC, PSD, or DWG formatted data usually do not fully support all the features, layers, specifications, and inner workings of the original file. @@ -80,11 +79,12 @@ Some examples of open data formats include: | | | |---|---| | Comma Separated Values (CSV) | For simplicity, readability, compatibility, easy data exchange. | -| Hierarchical Data Format (HDF) | For efficient storing and retrieving data, compression, multi-dimensional support. | -| Network Common Data Form (NetCDF) | For self-describing and portability, efficient data subsetting (extract specific portions of large datasets), standardization and interoperability. | +| Hierarchical Data Format (HDF) | For efficient storing and retrieving data, compression, multi-dimensional support. | +| Network Common Data Form (NetCDF) | For self-describing and portability, efficient data subsetting (extract specific portions of large datasets), standardization and interoperability. | | Investigation-Study- Assay (ISA) model for life science studies | For structured data organization, data integration and interoperability among experiments, reproducibility and transparency. | -| Flexible Image Transport System (FITS) | As a standard for astronomical data, flexible and extensible metadata and image headers, efficient data compression and archiving of large datasets. | +| Flexible Image Transport System (FITS) | As a standard for astronomical data, flexible and extensible metadata and image headers, efficient data compression and archiving of large datasets. | | Common Data Format (CDF) | For self-describing format readable across multiple operating systems, programming languages, and software environments, multidimensional data, and metadata inclusion. | +| Microsoft Word (.doc/.docx) | A proprietary file format used to store word processing data. 
| By embracing open standards, authors can avoid unnecessary barriers and maximize their chances of making data useful to their communities. @@ -92,7 +92,7 @@ By embracing open standards, authors can avoid unnecessary barriers and maximize ### Adding Documentation and Metadata for Reusability -Metadata and data documentation describe data so that we and others can use and better understand data. While metadata and documentation are related, there is an important distinction. Metadata are structured, standardized, and machine readable. Documentation is unstructured and can be any format (often a text file that accompanies the data). +Metadata and data documentation describe data so that we and others can use and better understand data. While metadata and documentation are related, there is an important distinction. Metadata are structured, standardized, and machine-readable. Documentation is unstructured and can be any format (often a text file that accompanies the data). To better understand documentation and metadata, let’s take an example of an online recipe. Many online recipes start with a long description and history of the recipe, and perhaps cooking or baking tips for the dish, before listing ingredients and step-by-step cooking instructions. @@ -119,10 +119,10 @@ Metadata can facilitate the assessment of dataset quality and data sharing by an Metadata enhances searchability and findability of the data by potentially allowing both humans and machines to read and interpret datasets. Benefits to creating metadata about your data include: - Helps users understand what the data are and if/how they can use/cite it. -- Helps users find the data, particularly when metadata is machine- readable and standardized. +- Helps users find the data, particularly when metadata is machine-readable and standardized. - Can make analysis easier with software tools that interpret standardized metadata (e.g. Xarray). -To be machine readable, the metadata needs to be standardized. 
See an example of a community-accepted standard for labeling climate datasets with the [CF Conventions](http://cfconventions.org/). +To be machine-readable, the metadata needs to be standardized. See an example of a community-accepted standard for labeling climate datasets with the [CF Conventions](http://cfconventions.org/). There are also software packages that can read metadata and enhance the user experience significantly as a result. For instance, [Xarray](https://docs.xarray.dev/en/stable/index.html) is an open-source, community developed software package that is widely used in the climate and biomedical fields, among many others. According to their website, "Xarray makes working with labeled multi-dimensional arrays in Python simple, efficient, and fun!". It's the "labeled" part where standardized metadata comes in! Xarray can interpret variable and dimension names without user input, making the workflow easier and less prone to making mistakes (e.g. users don’t have to remember which axis is "time" - they just need to call the axis with the label "time"). @@ -199,13 +199,13 @@ Data is the intellectual property of the researcher(s), or possibly of their fun If you don't license your work, others can’t/shouldn’t re-use it - even if you want them to. As mentioned previously in this module, a license is a legal document that tells users how they can use the dataset. It is important to understand the licensing conditions of a dataset before data reuse to avoid any copyright infringement or other intellectual property issues. -A dataset without a license does not mean that the data is open; using a licenseless dataset is not ethical. Contacting the data creator and getting explicit permission, while suggesting they apply a license, is the best path forward. +A dataset without a license does not necessarily mean that the data is open. Using a license-less dataset may pose an ethical dilemma. 
Contacting the data creator and getting explicit permission, while suggesting they apply a license, is the best path forward. Understanding when and where the license applies is crucial. For example, data created using US Government public research funds is, by default, in the public domain. However, that only applies to the jurisdiction of the United States. In order for this to apply internationally, data creators need to select an open license. -There are several different types of licenses that build on each other. Creative Commons (CC) licenses are often used for datasets. CC0 (also known as "public domain") is the license that allows for the most reuse because it has the least restrictions on what users can do with it. Although the CC0 license does not explicitly require citation, you should still follow community best practices and cite the data source. CC-BY is another common license used for scientific data that requires citation. From there, you can add restrictions around commercial use, ability to adapt or modify the data, or requirements to share with the same license. These other flavors all reduce usability by adding restrictions, such that other scientists may be unable to use the data because of institutional or legal restrictions. Funding agencies may require use of a specific license. For public agencies, this is often CC-0 or CC-BY, to maximize their return on investment and ensure widest possible re-use. +There are several different types of licenses that build on each other. Creative Commons (CC) licenses are often used for datasets. CC0 (also known as "public domain") is the license that allows for the most reuse because it has the least restrictions on what users can do with it. Although the CC0 license does not explicitly require citation, you should still follow community best practices and cite the data source. CC-BY is another common license used for scientific data that requires citation. 
From there, you can add restrictions around commercial use, ability to adapt or modify the data, or requirements to share with the same license. These other flavors all reduce usability by adding restrictions, such that other scientists may be unable to use the data because of institutional or legal restrictions. Funding agencies may require use of a specific license. For public agencies, this is often CC0 or CC-BY, to maximize their return on investment and ensure the widest possible re-use. ### Example Data Licenses and Reuse diff --git a/Open-Science-101/Module_3/Lesson_4/readme.md b/Open-Science-101/Module_3/Lesson_4/readme.md index 543b8c4f..db912471 100644 --- a/Open-Science-101/Module_3/Lesson_4/readme.md +++ b/Open-Science-101/Module_3/Lesson_4/readme.md @@ -70,11 +70,10 @@ As discussed previously in this curriculum, there are many benefits to sharing a ### Should the Data be Shared? -Before datasets are shared, it’s important to consider any restrictions to your permission to share and ensure that your contributors – including sample and data donors – approve its release. +Before datasets are shared, it's important to consider any restrictions to your permission to share and ensure that your contributors – including sample and data donors – approve its release. -Data should be as open as possible and as closed as necessary. +Data should be as open as possible and as closed as necessary. Opening our data is a powerful way to enable discovery, transparency, and scientific progress. However, you may want to consider a couple of points before data is shared: -- Opening our data is a powerful way to enable discovery, transparency, and scientific progress. - Some data are subject to laws, regulations, and policies which limit the release of the data. - Your local institution may have additional policies and resources – investigate them early and often.
@@ -144,7 +143,7 @@ If you do not already have a data repository in mind, consider the following to - Do you think the tools offered by the repository for data discovery and distribution are suitable for your data and FAIR? - Does the repository require funding from your project, does it fit within your budget and does it require sustained support beyond the project life cycle? -Find and compare the services, benefits and limitations of the repositories you are considering. Each repository will have its own processes and requirements for accepting and hosting your data depending on their level of funding, purpose, and user base. +Find and compare the services, benefits and limitations of the repositories you are considering. Each repository will have its own processes and requirements for accepting and hosting your data depending on their level of funding, purpose, and user base. Similarly, each repository will provide a different set of functionality and services depending on their level of funding, purpose, and user base. @@ -256,7 +255,7 @@ The goal is to make it easy to cite your data. Best practices include: - Different repositories and journals have different standards for how to cite data. If your repository encourages it, include a .CFF file with your data that explains how to cite your data. - Clearly identify the data creators and/or their institution in your citation. - This allows users to follow up with the creators if they have questions or discover issues. - - Include ORCiD of data authors where possible in the citation. + - Include ORCID of data authors where possible in the citation. Now that your data are at a repository and have a citation statement and DOI, publicize it to your users and remind them to cite your data in their work! @@ -301,7 +300,7 @@ Sharing data should be respectful of the communities that may be involved. This - Privacy concerns and approval processes for release - is the data appropriately anonymized? 
- How to engage with communities that data may be about. - How data can be correctly interpreted. -- Are there any data restrictions that may be necessary to ensure the sharing is respectful of the community the data involves, eg. collective and individual rights to free, prior, and informed consent in the collection and use of such data, including the development of data policies and protocols for collection? +- Are there any data restrictions that may be necessary to ensure the sharing is respectful of the community the data involves, e.g., collective and individual rights to free, prior, and informed consent in the collection and use of such data, including the development of data policies and protocols for collection? ## Lesson 4: Summary diff --git a/Open-Science-101/Module_3/Lesson_5/readme.md b/Open-Science-101/Module_3/Lesson_5/readme.md index 30caf213..3f876a73 100644 --- a/Open-Science-101/Module_3/Lesson_5/readme.md +++ b/Open-Science-101/Module_3/Lesson_5/readme.md @@ -42,7 +42,7 @@ There are also public examples of data management plans at [https://dmponline.dcc.ac.uk/public_plans](https://dmponline.dcc.ac.uk/public_plans). -If you are applying for funding, it is almost guaranteed that there will be specific requirements detailed in the funding opportunity. For example, the funder may require a certain license or use of a specific repository. Make sure to cross reference your plan with these requirements! +If you are applying for funding, it is almost guaranteed that there will be specific requirements detailed in the funding opportunity. For example, the funder may require a certain license or use of a specific repository. Make sure to cross-reference your plan with these requirements! 
### Activity 5.1: Review a data management plan @@ -129,6 +129,10 @@ There are numerous ways to get involved with and support open data communities, ## Additional Resources +### Disclaimer + +Please note that this course references several papers, some of which may be behind a paywall. If you would like a copy of a paper, please contact the author or search for it in an online preprint archive such as [bioRxiv.org](http://biorxiv.org/). + ### Resources for More Information In addition to the resources listed elsewhere in this training, the below community resources are excellent sources of information about Open Data. @@ -231,5 +235,5 @@ Congratulations! Now that you have completed the module, you should be able to d - Explain what open data means, its benefits, and how FAIR principles are used. - Discover open data, assess the data for reuse by evaluating provided documentation, and cite the data as instructed. -- Create an open data management plan, select open data formats, add the needed documentation, including metadata, readme files and version control, to make the data reusable and findable. +- Create an open data management plan, select open data formats, add the needed documentation, including metadata, README files and version control, to make the data reusable and findable. - Evaluate whether your data should and can be shared, and use the data accessibility process, including adding a DOI and citation instructions to enable it to be findable and citable. \ No newline at end of file diff --git a/Open-Science-101/Module_3/readme.md b/Open-Science-101/Module_3/readme.md index bf82fddd..4fc6fb5a 100644 --- a/Open-Science-101/Module_3/readme.md +++ b/Open-Science-101/Module_3/readme.md @@ -22,7 +22,7 @@ These key terms are important topics for this module.
Select the term to see the **Copyright** – A type of intellectual property that protects original works of authorship as soon as an author fixes the work in a tangible form of expression. Many different types of works are covered by copyright law including data products and software. (As well as books, poems, paintings, photographs, illustrations, musical compositions, and many more.) -**Data** – Any type of information, recordable, or observable facts. Data are now most commonly stored electronically. +**Data** – Factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation. **Data License** – Data licenses give any data creator a way to grant the public permission to use their products under copyright law. Similarly, data licenses give data users clear guidelines regarding how they can reuse the material. diff --git a/Open-Science-101/Module_4/Lesson_1/readme.md b/Open-Science-101/Module_4/Lesson_1/readme.md index a1f5aaed..ab9d8686 100644 --- a/Open-Science-101/Module_4/Lesson_1/readme.md +++ b/Open-Science-101/Module_4/Lesson_1/readme.md @@ -117,11 +117,11 @@ Scientists use and produce a wide variety of different types of software during - Android operating system among others - You can look at the Android source code, but you can't modify it and install it on a device. And even if you could, you couldn't use any of the standard services (e.g. Google Store) with that. So it's "open" in the same sense that last night's lottery numbers are "open". -**Operational Software** – Operational software is used by data centers and large information technology facilities to provide data services. For example: +**Operational Software** – Software delivered to individuals as part of a program or product. Examples include automated workflows, data consolidation, and role-based interfacing and reporting. 
- [Fprime](https://nasa.github.io/fprime/) – Space mission flight software -**Infrastructure Software** – Infrastructure software is used by data centers and large information technology facilities to provide data services. Examples include: +**Infrastructure Software** – Forms the central framework of computer systems, also known as the computer's foundation. Examples include operating systems, database management systems, web servers, middleware, and virtualization software. - [Fprime](https://nasa.github.io/fprime/) – Space mission flight software - [PODAAC](https://github.com/podaac) – Distributed archiving and processing software - [UFS](https://github.com/ufs-community) – Operational weather forecasting model software @@ -161,7 +161,7 @@ Open software principles are derived from open-source software best practices. T | Collaboration | When we're free to participate, we can enhance each other's work in unanticipated ways. When we can modify what others have shared, we unlock new possibilities. By initiating new projects together, we can solve problems that no one can solve alone. And when we implement open standards, we enable others to contribute in the future. | | Share early and often | Rapid prototypes can lead to rapid discoveries. An iterative approach leads to better solutions faster. When you're free to experiment, you can look at problems in new ways and seek answers in new places. You can learn by doing. | | Inclusive | Good ideas can come from anywhere, and the best ideas should win. Only by including diverse perspectives in our conversations can we be certain we've identified the best ideas, and good decision-makers continually seek those perspectives. We may not operate by consensus, but successful work determines which projects gather support and effort from the community. | -| Community | Communities form when different people unite around a common purpose.
Shared values guide decision making, and community goals supersede individual interests and agendas. | +| Community | Communities form when different people unite around a common purpose. Shared values guide decision-making, and community goals supersede individual interests and agendas. | Credit: [The open source way \| Opensource.com](https://opensource.com/open-source-way) @@ -251,7 +251,7 @@ There are valid reasons that restrict a researcher’s ability to share their co The [collaborative data science handbook by The Turing Way](https://the-turing-way.netlify.app/reproducible-research/licensing) says of restrictions to open source sharing, "As with anything else in society, some of what you can and cannot do in software (or hardware) development is determined by the law. Licensing is therefore an important aspect of sharing/publishing open source projects as it provides clarity for anyone looking to reuse an open source project. Without licenses in place, anyone who wants to reuse it will be left with legal ambiguity as to the status of using your intellectual property." -To be considered open source, software requires a license that complies with the Open Source Definition. One criteria of this definition demands that open source licenses "[must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software](https://opensource.org/licenses/)." +To be considered open source, software requires a license that complies with the Open Source Definition. One criterion of this definition demands that open source licenses "[must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software](https://opensource.org/licenses/)." In the next lessons, licenses will be discussed in more detail. As you are working on a project, you may want to use code developed by others, develop your own code, and then share it. 
Licenses affect all aspects of this process and it is important to understand how different licenses may affect your ability to share your code at the time of publication. It is also important to consider any requirements from your funder or institution about how you license your software. @@ -344,7 +344,7 @@ Answer the following questions to test what you have learned so far. Read the statement below and decide whether it's true or false: -*Software is referred to as open source when it is publicly accessible; anyone can see, modify, and distribute the code as they see fit.* +*Software is referred to as open source when it is publicly accessible; anyone can see, modify, and distribute the code as they see fit within the constraints set by the software license.* - True - False diff --git a/Open-Science-101/Module_4/Lesson_2/readme.md b/Open-Science-101/Module_4/Lesson_2/readme.md index e0df3334..ccf904cf 100644 --- a/Open-Science-101/Module_4/Lesson_2/readme.md +++ b/Open-Science-101/Module_4/Lesson_2/readme.md @@ -179,15 +179,15 @@ Most research code should be open source software, which is stored in code repos These are a few links to NASA-specific repositories that may be of interest: - [NASA Open Source Software](https://code.nasa.gov/) - [NASA Open APIs](https://api.nasa.gov/) -- [Science Discovery Engine A strophysics Data System](https://sciencediscoveryengine.nasa.gov/app/nasa-sba-smd/) +- [Science Discovery Engine Astrophysics Data System](https://sciencediscoveryengine.nasa.gov/app/nasa-sba-smd/) - [Earthdata Developer Portal](https://www.earthdata.nasa.gov/engage/open-data-services-and-software/api) -[Exoplanet Modeling and Analysis Center](https://www.earthdata.nasa.gov/engage/open-data-services-and-software/api) +- [Exoplanet Modeling and Analysis Center](https://www.earthdata.nasa.gov/engage/open-data-services-and-software/api) ## Assessing Open Code and Software So, you've discovered some exciting open code that might help you solve your scientific 
problem. Can you trust this code you discovered on the web? Will it be useful? How much time will it take to learn it? Could the code contain malware? Could you get in legal trouble for using it? -**Examples:** You found the “General Ocean Turbulence Model (GOTM)” on the internet, and it looks promising. Or, you just found lots of code snippets and functions related to the Lomb-Scargle power spectrum. Now you would like to assess these pieces of code to help you decide if you should use them. This section discusses some best practices for assessing if the code will help you. +**Examples:** You found the “General Ocean Turbulence Model (GOTM)” on the internet, and it looks promising. Or, you just found lots of code snippets and functions related to the Lomb-Scargle power spectrum. Now, you would like to assess these pieces of code to help you decide if you should use them. This section discusses some best practices for assessing if the code will help you. ### Four General Considerations for Assessing Open Software @@ -198,16 +198,16 @@ Software assessment criteria are similar, for any level of openness: - **Security:** Is it safe? Would using the software create a security risk? - **Licenses/restrictions:** Can you use it? Is it legal to use the software in your project? -### Functionality: Assessing Scientific Utility +### Functionality: Assessing Scientific Utility -#### Does the software meet your scientific needs?** +#### Does the software meet your scientific needs? - Does it address your specific science question? - Do studies similar to yours use it? - What papers cite it and how do they use it? - Talk to your advisors or colleagues that might have experience with it. -#### Testing the scientific compatibility +#### Testing the scientific compatibility - Does the software contain scientific test cases? If so, reproduce a case that is applicable to your problem; make sure the results are as expected. 
- If you’ve done similar scientific analysis/modeling previously, reproduce your prior results with the new software. Are the results consistent? @@ -251,7 +251,7 @@ The risks are relatively low for small snippets of code that are easy for you to Open software is perceived to have more security risks. This is generally less of a problem for open source code than executables because the code can be audited for security vulnerabilities by the community. How can you assess security in this case? - Consult with your institutional open software policies and IT staff -- Use authoritative reputable sources to minimize security risks +- Use authoritative, reputable sources to minimize security risks - Set strict security rules and standards when using a dependency - Use security tools to check for vulnerabilities (e.g., [Open Worldwide Application Security Project®](https://owasp.org/)) - Avoid unsupported open-source software. Switch to actively developed components or develop it yourself @@ -276,7 +276,7 @@ Consider the following when selecting among multiple versions of open source sof | | | |---|---| -| Use the latest stable release when possible | Just like software updates to your phone or computer’s operating system or apps, it is important to use the latest stable release. Developers often release developmental versions that include new features or bug fixes that are not fully tested. For this reason, using a developmental release is generally not recommended. | +| Use the latest stable release when possible | Just like software updates to your phone or computer’s operating system or apps, it is important to use the latest stable release. Developers often release developmental versions that include new features or bug fixes that are not fully tested. For this reason, using a developmental release is generally not recommended. 
| | Determine the origin of the version you intend to use | Determine whether the version you intend to use comes from a modified open-source project or from its original source project. With this information, determine which source is more appropriate for your project. | | Check for issues and bugs | Check for any known issues or bugs with your selected version that could cause problems. Find current information on issues or bugs by checking release notes, issue trackers, and developer forums. | @@ -285,7 +285,7 @@ Consider the following when selecting among multiple versions of open source sof - Implement tests to verify that the software performs as expected in your application. - If you run into problems, revisit the release notes, issue tracker, and/or user/developer forums. - Don't be afraid to ask experienced colleagues for help. -- It is better to seek and obtain help in a public forum than in private (eg. email). Part of open science is working in the open. Often you may find through a search that other users have similar questions. Someone may have already offered a solution. If not, it is likely that others will benefit from your question being answered in public. +- It is better to seek and obtain help in a public forum than in private (e.g., email). Part of open science is working in the open. Often you may find through a search that other users have similar questions. Someone may have already offered a solution. If not, it is likely that others will benefit from your question being answered in public. ### Activity 2.1: Ways to Get Help Using Open Software In this activity, you are asked to select from a list of ways you can resolve some common problems that arise when using open software. @@ -341,7 +341,7 @@ Cite any code that you view as having contributed to your research: - Did the code play a critical part in your research? - Did the code provide something novel? 
-In most cases, a code snippet on Stack Overflow does not constitute a citable research contribution. However, an author can still decide to cite it if they chose. +In most cases, a code snippet on Stack Overflow does not constitute a citable research contribution. However, an author can still decide to cite it if they choose. Instances when shared code directly impacts the scientific results and requires a detailed description include: @@ -350,11 +350,11 @@ Instances when shared code directly impacts the scientific results and requires See the journal where you are publishing if they have any specific instructions on how to cite software (e.g., [AAS Software Citation Suggestions](https://journals.aas.org/news/software-citation-suggestions/)). -In some cases, a software’s licensing terms and conditions require acknowledgement or citation in the references or bibliography of any publications based on research that made use of the software. +In some cases, a software’s licensing terms and conditions require acknowledgment or citation in the references or bibliography of any publications based on research that made use of the software. ### How to cite? -Ideally, use and cite code that is archived in a long-term repository with a persistent DOI. Follow the guidance about the preferred citation format, which is provided in the long- term repository and may appear in a README or a CITATION file. +Ideally, use and cite code that is archived in a long-term repository with a persistent DOI. Follow the guidance about the preferred citation format, which is provided in the long-term repository and may appear in a README or a CITATION file. DOIs provide a persistent identifier/link for research outputs. Thus, it is preferable to cite code in long-term repositories linked to a DOI. URLs (e.g., Stack Overflow) and active repositories (e.g., on GitHub) are mutable but can be used if there is no alternative. 
@@ -384,7 +384,7 @@ Discovering open software successfully depends on which of the following: Select all that apply. -- Well defined requirements +- Well-defined requirements - Knowing where to search - FAIR open software exists to meet your needs - All of the above diff --git a/Open-Science-101/Module_4/Lesson_3/readme.md b/Open-Science-101/Module_4/Lesson_3/readme.md index 4113a0ae..3b01fa2f 100644 --- a/Open-Science-101/Module_4/Lesson_3/readme.md +++ b/Open-Science-101/Module_4/Lesson_3/readme.md @@ -99,13 +99,13 @@ Before someone else can use your code, they're going to ask some questions: - In what ways am I allowed to use your code? - Will you accept changes to your code? If I find a bug, what do I do? - How do I trust your code works? -- How do I know if the code will be supported long term? +- How do I know if the code will be supported long-term? ## Importance of Version Control Your code will change significantly over the lifetime of your project. Just as we appreciate the ability to track earlier versions of documents or versions created by different people, inevitably someone will want to be able to revert, compare, and synthesize changes in code. -The most popular tool for version control is git. Git is a system that tracks changes in computer files, similar to Google Docs or SharePoint but more applicable to code script. Git is usually used in conjunction with a version control platform such as GitHub, Gitlab, or Bitbucket. These tools were covered in Module 2.2. +The most popular tool for version control is git. Git is a system that tracks changes in computer files, similar to Google Docs or SharePoint but more applicable to code script. Git is usually used in conjunction with a version control platform such as GitHub, GitLab, or Bitbucket. These tools were covered in Module 2.2. 
Version control enables the following: @@ -134,9 +134,9 @@ At the minimum, a README should contain the name of the project and a very short | **Bad** README example | "This code recomputes the fundamental permutation factor of the downward flow (for J < 10, obviously)." | | **Good** README example | "LeapKitten. This Python software package takes any picture of a kitten (JPEG, PNG) and uses artificial intelligence to output what it would look like leaping into the air. In addition, the code takes leap years into account on the timestamp on the image." | -In addition, the following information is helpful to add to the README especially if they are not listed elsewhere: +In addition, the following information is helpful to add to the README, especially if they are not listed elsewhere: -- A list of any code dependencies the software has, e.g. "Numpy, kitten-rng, and human- readable must be installed to run this software." +- A list of any code dependencies the software has, e.g. "Numpy, kitten-rng, and human-readable must be installed to run this software." - How to install and a brief description of how to run the software. - Detailed description of the software, especially if there is no external documentation. - Examples of how to use the software. @@ -170,13 +170,13 @@ Your software should be documented within the source code. Each function should > > Without going into details of the data type, calling parameters, etc. this description immediately puts someone looking at the code into the context of what the function aims to accomplish; they can then explore the details. > -> While you should consider placing a description at the start of a function, use your discretion on where you put similar descriptions of code. At the start of a complex loop or analysis would be good ideas. 
Don’t go overboard - things like this aren’t useful: +> While you should consider placing a description at the start of a function, use your discretion on where you put similar descriptions of code. At the start of a complex loop or analysis would be a good idea. Don’t go overboard - things like this aren’t useful: > > \# set x to 17 > > x = 17 > -> Descriptive variable, class, and function names can make your code very readable. . Sometimes even great coders are working fast and will name variables 'a', 'temp', or other names that probably won't make a lot of sense in a week or two when they come back to something they were working on. Names like 'baking_time' or 'velocity' are more clear. Variable names should be easy to understand and clearly represent what they are. +> Descriptive variable, class, and function names can make your code very readable. Sometimes even great coders are working fast and will name variables 'a', 'temp', or other names that probably won't make a lot of sense in a week or two when they come back to something they were working on. Names like 'baking_time' or 'velocity' are more clear. Variable names should be easy to understand and clearly represent what they are. > > Ideally, someone who doesn't write in the software language of the code can read the comments in the file and have a rough idea of what is happening. > @@ -362,7 +362,7 @@ A software license states the rights of the developer and user for a piece of so **Statement 2:** -Without a license, software is assumed copyrighted and without permissions. +Without a license, software is assumed copyrighted and without permission. - True - False @@ -389,9 +389,9 @@ In this section, some best practices in development are provided including on co Code benefits from peer review in the same way as science. Having someone else read over your code and test it is one of the best ways to improve the quality of the code. 
-Many version control platforms have built in tools that enable developers to review, comment, and iterate on each other’s code. These can be done in the open and allow anyone to comment. +Many version control platforms have built-in tools that enable developers to review, comment, and iterate on each other’s code. These can be done in the open and allow anyone to comment. -Here is a great example of the discussion that can happen when the original creator of an algorithm [comments on a python implementation made by a first time contributor to the Astropy project](https://github.com/astropy/astropy/pull/4301). The open and constructive discussion led to a better implementation of the algorithm along with possible future improvements. +Here is a great example of the discussion that can happen when the original creator of an algorithm [comments on a python implementation made by a first-time contributor to the Astropy project](https://github.com/astropy/astropy/pull/4301). The open and constructive discussion led to a better implementation of the algorithm along with possible future improvements. Software packages can be reviewed as their own products as well. Many scientific publications now accept papers focused on software. There are entities like [PyOpenSci](https://www.pyopensci.org/) and the [Journal of Open Source Software](https://joss.theoj.org/) that provide open peer review of scientific packages. See more details about JOSS in the next lesson on sharing your code. 
@@ -405,7 +405,7 @@ The main objective of code testing is to evaluate if a code does what its author - + @@ -423,7 +423,7 @@ The main objective of code testing is to evaluate if a code does what its author - + @@ -441,7 +441,7 @@ The main objective of code testing is to evaluate if a code does what its author - + @@ -459,7 +459,7 @@ The main objective of code testing is to evaluate if a code does what its author - + @@ -467,7 +467,7 @@ The main objective of code testing is to evaluate if a code does what its author @@ -524,7 +524,7 @@ Whether using open source, closed source, or commercial software, it is importan - + @@ -542,7 +542,7 @@ Whether using open source, closed source, or commercial software, it is importan - + @@ -560,7 +560,7 @@ Whether using open source, closed source, or commercial software, it is importan - + @@ -578,7 +578,7 @@ Whether using open source, closed source, or commercial software, it is importan - + @@ -602,7 +602,7 @@ Here are some further suggestions on how to make your code more accessible, repr | **Operation Documentation** | Share details about how you are running the code. For example, document the version of a software library you are using, or the version of the compiler. These are often shared in an 'environment.yml' file. | | **Automation** | Consider the following scenario:<br><br>You are getting ready to publish your paper that includes 17 plots that all depend on a data set released by a mission. Right before you are about to submit, the mission releases an updated version of the data set.<br><br>How easy will it be to recreate those plots?<br><br>Software allows you to automate the running of scripts and alert programmers when written so that input files are not hardcoding. This allows programmers to easily re-run code if an initial parameter changes. | | **Using Standards** | Most languages have their own coding style adopted by their respective communities. Following those conventions makes it easier for others to contribute to your code and makes your project more inclusive. | -| **Portability** | Share details about how you are running the code, for example the version of a software library you are using, or the version of the compiler. These are often shared in an 'environment.yml' file. | +| **Portability** | Allows individuals the ability to transfer their personal data between platforms. | | **Naming** | Many historical terms used in software have negative connotations depending on the context. When considering different terms or naming, consider how different audiences may react to those terms. | ## Lesson 3: Summary @@ -658,7 +658,7 @@ Select two items that are good to include in a README file from the list below: **04/05** -Which of the following licenses allows users to reuse, but also require users to share their changes with the community using the same license? +Which of the following licenses allows users to reuse, but also requires users to share their changes with the community using the same license? - Public Domain - Lesser general domain @@ -676,4 +676,4 @@ Which of the following practices makes your project more inclusive? - Referencing historical events in the name of your project. - Following standards for the programming language being used. - Developing the project privately. -- Including a Guideline for Contributors. \ No newline at end of file +- Including a Guideline for Contributors.
diff --git a/Open-Science-101/Module_4/Lesson_4/readme.md b/Open-Science-101/Module_4/Lesson_4/readme.md index 8ea1ab4d..264c51f6 100644 --- a/Open-Science-101/Module_4/Lesson_4/readme.md +++ b/Open-Science-101/Module_4/Lesson_4/readme.md @@ -38,7 +38,7 @@ There are two major categories of sharing: sharing for development and providing ### Open Source Code Development -Writing scientific code is often a dynamic and collaborative process in which multiple people contribute and the code evolves over time. In such projects, it is beneficial to develop open code within a public repository hosting platform such as Github, Bitbucket, GitLab etc. from the beginning of a project. This ensures that all updates are shared openly on the web and can reach potentially interested collaborators and users in near real time. +Writing scientific code is often a dynamic and collaborative process in which multiple people contribute and the code evolves over time. In such projects, it is beneficial to develop open code within a public repository hosting platform such as GitHub, Bitbucket, GitLab etc. from the beginning of a project. This ensures that all updates are shared openly on the web and can reach potentially interested collaborators and users in near real time. ### Archiving Open Code @@ -168,10 +168,10 @@ First, consider your institutional or funding agency policies that may dictate w #### What are some good options and best practices for archiving your code? - Archive open code with an open access journal article. -- If the open code is in an active online development repository such as Github, then create a version and archive the code at a long-term repository with a DOI such as Zenodo, which can be integrated with Github (more details on this process later). 
+- If the open code is in an active online development repository such as GitHub, then create a version and archive the code at a long-term repository with a DOI such as Zenodo, which can be integrated with GitHub (more details on this process later). - Archive the code in other long-term public repositories, such as Software Heritage. -#### Is your code a substantial software package and of interest to a significant number of users from various disciplines? Where else can your open code be shared? +#### Is your code a substantial software package and of interest to a significant number of users from various disciplines? Where else can your open code be shared? @@ -181,7 +181,7 @@ First, consider your institutional or funding agency policies that may dictate w - Publish the software in a Journal dedicated to open software (ex. JOSS). - Get your software peer reviewed through communities like PyOpenSci. -#### To share my code, I can just add it to github, right? +#### To share my code, I can just add it to GitHub, right? Not necessarily. Sharing on a repository is encouraged, but a researcher’s funding organization may require a DOI from an archival repository, such as Zenodo, for long-term preservation of your code at the time of publication or version releases. @@ -229,7 +229,7 @@ Steps for this activity: 8. You will be automatically directed to your new repository webpage. 9. Now we will get a DOI from the Zenodo application. Note that we are going to use [https://sandbox.zenodo.org/](https://sandbox.zenodo.org/) to do this. This offers all the same capabilities as [https://zenodo.org](https://zenodo.org/) but is a testing site! Create a free account if you have not already. -**Part 2: Create an archived repository and affiliated DOI.** +**Part 2: Create an archived repository and affiliated DOI.** 1. Navigate to the [Zenodo GitHub page](https://sandbox.zenodo.org/account/settings/github/). 
Click on the button 'Connect' to allow Zenodo to access your GitHub repositories. @@ -272,7 +272,7 @@ CITATION files are a means to make citation information easily accessible in ope If you are hoping for community input on your software, it is a best practice to include CONTRIBUTING and CODE_OF_CONDUCT files in your repository that outline expectations for member interactions. -We won't go into these in detail here, but you can check out the [Xarray package's github repository](https://github.com/pydata/xarray/tree/main) for a good example. +We won't go into these in detail here, but you can check out the [Xarray package's GitHub repository](https://github.com/pydata/xarray/tree/main) for a good example. ## Who: Roles and Responsibilities of the Team Members in Implementing the SMP diff --git a/Open-Science-101/Module_4/Lesson_5/readme.md b/Open-Science-101/Module_4/Lesson_5/readme.md index 368ab2df..e7e69473 100644 --- a/Open-Science-101/Module_4/Lesson_5/readme.md +++ b/Open-Science-101/Module_4/Lesson_5/readme.md @@ -68,7 +68,7 @@ The following material assumes that you have met the threshold and are writing a ### Pen to Paper: Getting Started Writing a Plan -If you are applying for funding, it is almost guaranteed that there will be specific data management requirements detailed in the funding opportunity. For example, the funder may require a certain license or use of a specific repository. Make sure to cross reference your plan with these requirements. +If you are applying for funding, it is almost guaranteed that there will be specific data management requirements detailed in the funding opportunity. For example, the funder may require a certain license or use of a specific repository. Make sure to cross-reference your plan with these requirements. 
**Examples of Software Management Plans** @@ -168,7 +168,7 @@ Subscribe to and/or participate in forums (e.g., GitHub discussions, Stack Overf **Explore: The Turing Way** -**Hit the button to find out more information on building a community.** +**Hit the button to find out more information on building a community.** [CLICK TO LEARN](https://the-turing-way.netlify.app/collaboration/new-community.html) @@ -194,7 +194,7 @@ There are several types of contributing to open software. Not all of them requir |---|---| | **Add New Features** | The most obvious case for contributing to open software is enhancing its usability by adding new features. | | **Fix Bugs** | Alternatively, you can reply to an already opened issue by fixing it. | -| **Report Issues and Make Suggestions About Improving Code** | Reporting an issue is a valuable contribution even if you don’t know how to fix it. For example, you might be using a different browser in which the software has not been tested yet, have discovered a particularly uninformative error message, be colorblind or be otherwise able to feed a valuable user experience back to the developers that can help to improve the overall usability of the software. | +| **Report Issues and Make Suggestions About Improving Code** | Reporting an issue is a valuable contribution even if you don't know how to fix it. For example, you might be using a different browser in which the software has not been tested yet, have discovered a particularly uninformative error message, or be unable to feed a valuable user experience back to the developers that can help to improve the overall usability of the software. | | **Improving and Contributing to Documentation** | Contributing to documentation constitutes a great starting point to contributing to open source software and is often overlooked in its importance. Writing documentations allows you to familiarize yourself with the use of the software, while helping to teach others. 
| | **Create Tutorials, Use Cases, or Visuals** | Another way to contribute is to make your experience and use of the software publicly available. For example, you could create a tutorial based on your use of the software, summarize a use case or provide a summary of your use in a graphic. This part of contribution is particularly appealing as it does not create much extra work to just publish what you have used the software for. | | **Improve Layout, Automatization, Structure of Code** | Apart from creating new code, a good way to contribute to open source software can also be to improve, restructure or automatize existing code. This is called refactoring and helps to make the software project more effective and stable. | @@ -203,6 +203,10 @@ There are several types of contributing to open software. Not all of them requir ## Additional Resources +### Disclaimer + +Please note that we reference several papers throughout the course and depending on the paper, it might be blocked by a paywall. If you would like to get a copy of the paper, please contact the Author or search for it in an online preprint archive. For example, [bioRxiv.org](http://biorxiv.org/). + ### References and Guides In addition to the resources listed elsewhere in this training, the below community resources are excellent sources of information about Open Software. @@ -237,7 +241,7 @@ In this lesson, you learned: - When a SMP should be written and that your funding organization or institution may have rules around how you develop and share your code. - That joining software communities can be a great way to exchange knowledge and learn new skills around open code. -- That there are many ways to contribute to open code, and that not all of them require writing code." +- That there are many ways to contribute to open code, and that not all of them require writing code. 
## Lesson 5: Knowledge Check diff --git a/Open-Science-101/Module_5/Lesson_1/readme.md b/Open-Science-101/Module_5/Lesson_1/readme.md index 97608e7d..ae7c9711 100644 --- a/Open-Science-101/Module_5/Lesson_1/readme.md +++ b/Open-Science-101/Module_5/Lesson_1/readme.md @@ -1 +1,277 @@ -# Just a test \ No newline at end of file +# Lesson 1: Introduction to Open Results + +## Navigation + +* [What Research Objects are Created Throughout the Research Cycle?](#what-research-objects-are-created-throughout-the-research-cycle) +* [Examples of Open Results](#examples-of-open-results) +* [What is the Reproducibility Crisis?](#what-is-the-reproducibility-crisis) +* [Lesson 1: Summary](#lesson-1-summary) +* [Lesson 1: Knowledge Check](#lesson-1-knowledge-check) + +## Overview + +This lesson aims to broaden your perspective regarding what shareable research outputs are produced throughout the research lifecycle. We will first consider what constitutes an open result. To do so, we will read an example of a forward-thinking research project that utilizes open result best practices. The perspectives gained from this example will ultimately get us thinking about how we can work towards creating reproducible research. + +## Learning Objectives + +After completing this lesson, you should be able to: + +- Describe what constitutes open results and list the research objects that can be created throughout a research cycle. +- Describe how sharing open results can advance science and your career. +- Explain what the reproducibility crisis is and how open science can help combat it. + +## What Research Objects are Created Throughout the Research Cycle? + +### The Traditional Depiction of a "Scientific Result" Has Changed Over Time + +When we think of results, most people think of just the final publication. 
+ +**1665** + + + +This publication dates back to 1665 when the first scientific journal Philosophical Transactions was established to publish letters about scientific observations and experimentation. + +**1940s** + + + +Later in the 1940s, publishing became commercialized and took over as the mechanism for releasing journals, conference proceedings, and books. This new business model normalized publication paywalls. + +**21st century** + + + +Only by the 21st century did the scientific community expand the meaning of open results. The evolution of this definition was driven by technological advances, such as the internet, and advances in modes to share information. The open access movement was established by the [Budapest Open Access Initiative](https://www.budapestopenaccessinitiative.org/) in 2002 and the [Berlin Declaration on Open Access](https://openaccess.mpg.de/Berlin-Declaration) in 2003, both of which formalized the idea that, with regard to new knowledge, there should be "free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles" (Budapest Open Access Initiative). + +### But Results Have Always Been Far More Than Just the Publication + +You might be familiar with the research life cycle, but may not have considered what results could be shared openly throughout its process. This lesson adopts a definition of the research life cycle based on [The Turing Way](https://the-turing-way.netlify.app/index.html) and breaks it down into nine phases based, pictured in the figure below. + +Although the phases are presented in a linear fashion, we acknowledge that the research lifecycle is rarely ever linear! Products are created throughout the scientific process that are needed to enable others to reproduce the findings. The products of research include data, code, analysis pipelines, papers, and more! + +Following [Garcia-Silva et al. 
2019](https://www.sciencedirect.com/science/article/abs/pii/S0167739X18314638), we define a Research object (RO) as a method for the identification, aggregation and exchange of scholarly information on the Web. Research objects can be composed of both research data and digital research objects that are defined as follows by the Organization for Economic Co-Operation and Development ([OECD Legal Instruments](https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0347)). + + + +The term 'Open Results' comprehensively includes all these research products and more. + +Open results can include both data and code. Since data and code were covered in previous modules, in this lesson, we focus on sharing science outcomes as open results. Examples of open results can include: + +- Open access peer-reviewed articles +- Technical reports +- Computational notebooks +- Code of conduct, contributor guidelines, publication policies +- Blog posts +- Short form videos and podcasts +- Social media posts +- Conference abstracts and presentations +- Forum discussions + +Open access peer-reviewed articles are archived for long-term preservation and represent a more formal discussion of scientific ideas, interpretations, and conclusions. These discussions inform the method that researchers share results. In the following lesson section, we will discuss different types of sharing and methods to build and adapt them for use in your research. + +Scientists can share their incremental progress throughout the research process and invite community feedback. Sharing more parts of the research process creates more interactions between researchers and can improve the end result (which may be a peer-reviewed article). + +Throughout this module we will show you how to use, make, and share open results. + +### The Practice of 'Open' + +Specifically, the "Use, Make, Share" format has been naturally embedded throughout the curriculum and should be a familiar format by now. 
Lesson 2 will cover "Using". Lesson 3 will cover "Making". Lesson 4 will cover "Sharing". Throughout this module, we will pay particular attention to manuscripts and other research products as examples because the previous modules covered "Use, Make, Share" in the context of components with data and software. + + + +## Examples of Open Results + +Let's broaden our perspectives on the types of **research objects** that are produced throughout the research process. Let’s take a look at some examples from different projects. + +### Reaching New Audiences + + + +Qiusheng Wu is an associate professor at University of Tennessee. He has published 500+ video tutorials on [YouTube](https://www.youtube.com/%40giswqs), which have gained 25K+ subscribers, and 1.1M+ views (as of 8/2023). + +Professor Qiusheng Wu created a [YouTube channel](https://youtube.com/%40giswqs) in April 2020 for the purpose of sharing video tutorials on the [geemap Python package](https://geemap.org/) that he was developing. Since then, Wu has published over 500 video tutorials on open-source geospatial topics. The channel has gained over 25K subscribers, with more than 1 million views and 60K watch hours in total. On average, it receives 70 watch hours per day. + +The YouTube channel has allowed Wu to reach a much larger audience beyond the confines of a traditional classroom. It has made cutting-edge geospatial research more accessible to the general public and has led to collaborations with individuals from around the world. This has been particularly beneficial for Wu’s tenure promotion as it has resulted in increased funding opportunities, publications, and public engagement through the YouTube channel, social media, and GitHub. + +Overall, the YouTube channel serves as an important tool for Wu to disseminate research, inspire others, and contribute to the advancement of science. It has also played a significant role in advancing Wu’s professional career. 
+ +### New Media for Science Products + +"A new method reduced the compute time for this image from ~30 minutes to \<1 minute". In 2021, Lucas Sterzinger spent one summer of his PhD on an internship. During that summer, he wrote a blog post to explain and demonstrate a game-changing technology called Kerchunk – a software package that makes accessing scientific data in the cloud much faster. + + + +Source: [https://medium.com/pangeo/fake-it-until-you-make-it-reading-goes-netcdf4-data-on-aws-s3-as-zarr-for-rapid-data-access-61e33f8fe685](https://medium.com/pangeo/fake-it-until-you-make-it-reading-goes-netcdf4-data-on-aws-s3-as-zarr-for-rapid-data-access-61e33f8fe685) + +--- + +Alongside the blog post, he also created a tutorial as a Jupyter Notebook – both of these resources and associated code are freely accessible to the public, allowing for rapid adoption and iteration by other developers and scientists. He posted the blog on Medium and posted about it to Twitter. The blog got a lot of attention on a newly developed technology as it was being developed! This is starkly different from the slow and complicated world of academic publishing where this result would not have been shared for about a year (writing it up, the review process, publication process). He said, "Working on Kerchunk and sharing it widely using open science principles greatly expanded my professional connections and introduced me to the field of research software engineering. The connections I made from this led me directly to my current role as a Scientific Software Developer at NASA." + +### New Products for Increasing Impact + + + +Image credit: OpenStreetMap 2011, Ken Vermette. CC BY-SA 3.0 + +--- + +From "2003: let's map the UK to 2023: using the data to map the world with applications ranging from Uber to mapping UN Sustainable Development Goals." 
(>1.5M contributors, 100M+ edits) [OpenStreetMaps is being used for GIS analysis](https://welcome.openstreetmap.org/about-osm-community/consumers/), such as planning or logistics for humanitarian groups, utilities, governments and more. This was only possible because it was set up and shared openly and built by a community devoted to improving it. You never know where your personal project might go or who might be interested in collaborating! + +### New Visualizations to Share Results + +Matplotlib was developed around 2002 by post-doc John Hunter to visualize some neurobiology data he was working on. He wasn't a software developer, he was a neurobiologist! He could have just published the paper in a peer-reviewed journal, and maybe shared his code to create the figures, but instead he started an open project on GitHub and thought, 'well if this is useful to me, maybe it will be useful to others...'. + + + +Source: [https://medium.com/dataseries/mastering-matplotlib-part-1-a480109171e3](https://medium.com/dataseries/mastering-matplotlib-part-1-a480109171e3) + +--- + +Matplotlib is now the most widely used plotting library for the Python programming language and a core component of the scientific Python stack, along with NumPy, SciPy and IPython. Matplotlib was used for data visualization during the 2008 landing of the Phoenix spacecraft on Mars and for the creation of the first image of a black hole. + +### JWST Case Study: Reporting and Publication + +And last but not least, we have the example for the JWST Early Release Science team from Module 1 on how they reported their results. This came in various forms from publishing a peer review paper, preprints, blog posts, and social media. Their peer-reviewed publication was published open access in Nature along with a preprint through arXiv. + +Open communication platforms furthered the reach and audience of results. 
+ + + +Figure Credit: https://arxiv.org/abs/2208.11692 + +--- + +The public is interested in what you are doing, and reaching them can involve communication through traditional and new platforms. Publishing results on platforms such as Twitter/X, YouTube, TikTok, blogs, websites, and other social media platforms is becoming more common. Awareness through social media drastically increases the reach and audience of your work. There have been studies on how this impacts citation rates. For example, The Journal of Medical Internet Research (JMIR) conducted a three-year [study](https://www.jmir.org/) of the relative success of JMIR articles in both Twitter and academic worlds. They found that highly tweeted articles were 11 times more likely to be highly cited than less tweeted articles. + +Open communication platforms noticeably furthered the reach and audience of results. + + + +Twitter \#1: https://twitter.com/cornerof_thesky/status/1595086671275589632?s=20 + +Twitter \#2: https://twitter.com/V_Parmentier/status/1595127493199302656?s=20 + +TikTok: https://www.tiktok.com/@astrojaket/video/7168878696906886405 + +YouTube: https://www.youtube.com/watch?v=cI-kM_wPbbQ + +--- + +## What is the Reproducibility Crisis? + +A 2016 [Nature survey](https://www.nature.com/articles/533452a) on reproducibility found that of 1,576 researchers, "More than 70% of researchers have tried and failed to reproduce another scientist's experiments, and more than half have failed to reproduce their own experiments." The 'reproducibility crisis' in science is a growing concern over several reproducibility studies where previous positive results were not reproduced. + +We must consider the full research workflow if we are to solve the reproducibility crisis. The fact that 70% of researchers could not reproduce other scientists' results is shocking, especially considering that the reproducibility of science is the cornerstone of the scientific method. 
+ + + +There are many personal incentives to implement open science principles throughout all stages of the research process. By making results open throughout, you increase your ability to reproduce your own results. This also has implications for research beyond the ability to improve your research. + +### What is the Cause of This Reproducibility Crisis? + + + +The three main causes of the reproducibility crisis are: + +1. Intermediate methods of research are often described informally or not at all. +2. Intermediate data are often omitted entirely. +3. We often only think about results at the time of publication. + +We need to think of the entire research process as a result. As an example, scientific articles describe computational methods informally which demands significant effort from others to understand and to reuse. + +Articles often lack sufficient information needed for other researchers to reproduce results, even when data sets are published, according to two studies in [Nature Genetics](https://www.nature.com/articles/ng.295) and [Nature Methods](https://www.nature.com/articles/nmeth.1333). Raw and/or intermediate data products and relevant software are often not provided alongside the final manuscript, limiting the reader's ability to attempt replication. + +Without access to the source codes for the papers, reproducibility has been shown to be elusive, according to two other studies in [Briefings in Bioinformatics](https://academic.oup.com/bib/article/12/3/288/258098) and [Nature Physics](https://www.nature.com/articles/nphys3313). + +### Combating the Reproducibility Crisis + +If your research workflow uses principles of open results, as showcased in the example, this will help you to combat the reproducibility crisis. + +We can create reproducible workflows and combat this crisis by considering open results at each stage of the research lifecycle. 
An Open Science and Data Management Plan (OSDMP) helps researchers think and plan for all aspects of sharing by determining how they will make software and data available. This plan can be shared publicly early on through a practice called pre-registering, where researchers determine their analysis plan and data collection procedure before a study begins (discussed previously in Lesson 2 of Module 2). + +### Activity 1.1: What Could You Do? + + + +Let's rethink your research workflow. Identify the research objects that could be (or could have been) shared as open results of a project you are/were involved in. What are high priority items for combatting the reproducibility crisis in each area of the research workflow? + +- Ideation +- Planning +- Project Design +- Engagement & Training +- Data Collection +- Data Wrangling +- Data Exploration +- Preservation +- Reporting & Publication + +**There are many personal advantages of implementing open science principles across all stages of a research process** + + + +#### Key Takeaways: What Could You Do? + +The OpenSciency team created a large table that describes all the different kinds of shareable research objects that are possible to create throughout the research lifecycle. + +**A full table is available here** + +[CLICK TO LEARN](https://opensciency.github.io/sprint-content/open-results/lesson1-research-process-and-results.html#research-stages-and-open-result-table) + +Thinking about sharing everything all at once can be overwhelming when you are getting started. To move forward, just focus on how you might pick the most important item. Here we have pared down the list to only a couple items per category. Furthermore, you could think about shortening the list even further when you are getting started. For example, maybe it is the case that, for your work, sharing the code used to wrangle the data is the most critical element to reproducibility. 
Therefore, code-sharing would be a good place to start your open science journey. The small steps we make are what move us towards sustainable open science. + +- **Ideation:** Proposals can be shared on Zenodo and open grant platforms such as [ogrants.org](https://www.ogrants.org/). +- **Planning:** Projects can be pre-registered before they begin. +- **Project Design:** Contributor guidelines or a code of conduct can be posted on Zenodo, GitHub, or team Web Pages. +- **Engagement & Training:** Workflow computational notebooks can be shared with the team via GitHub and released on Zenodo. +- **Data Collection:** Raw data can be shared through data repositories. +- **Data Wrangling:** Code can be shared through software repositories. +- **Data Exploration:** Computational notebooks can be shared via GitHub and released on Zenodo. +- **Preservation:** Data management plans for archiving can be posted on Zenodo. +- **Reporting & Publication:** + - Open access peer-reviewed articles + - Computational notebooks + - Code of conduct, contributor guidelines, publication policies + - Blog posts + - Short form videos and podcasts + - Social media posts + - Conference abstracts, posters, and presentations (when made openly available) + - Forum discussions + +## Lesson 1: Summary + +In this lesson, you learned that: + +- The contemporary scientific workflow involves being open about processes and products. Research products (results) include far more than just the final manuscript, which is a drastic change from the historical notion of a scientific result. +- At every stage of the research lifecycle, there are research objects produced that we can consider results. +- We can combat the reproducibility crisis by sharing these research objects at each stage of our research workflow. +- There are amazing examples of research groups sharing different types of open results! + +Let's start thinking about what we can do immediately to work towards an open research workflow. 
+ +## Lesson 1: Knowledge Check + +Answer the following questions to test what you have learned so far. + +*Question* + +**01/02** + +Which of the following may fit the definition of a "research object"? + +- Raw data +- Blog +- Proposal +- Code of Conduct +- All of the above + +*Question* + +**02/02** + +What are some of the key causes of the reproducibility crisis? + +- Intermediate methods of research are often described informally or not at all. +- Intermediate data are often omitted entirely. +- We often only think about results at the time of publication. +- All of the above \ No newline at end of file diff --git a/Open-Science-101/Module_5/Lesson_2/readme.md b/Open-Science-101/Module_5/Lesson_2/readme.md index 97608e7d..00336346 100644 --- a/Open-Science-101/Module_5/Lesson_2/readme.md +++ b/Open-Science-101/Module_5/Lesson_2/readme.md @@ -1 +1,453 @@ -# Just a test \ No newline at end of file +# Lesson 2: Using Open Results + +## Navigation + +* [How to Discover Open Results](#how-to-discover-open-results) +* [How to Assess Open Results](#how-to-assess-open-results) +* [How to Use Open Results](#how-to-use-open-results) +* [How to Cite Open Results](#how-to-cite-open-results) +* [Lesson 2: Summary](#lesson-2-summary) +* [Lesson 2: Knowledge Check](#lesson-2-knowledge-check) + +## Overview + +By the end of this lesson, you will be familiar with resources for open results utilization, how and when to cite the sources of the open results that you use, how to provide feedback to open results providers, and how to determine when it is appropriate to invite authors of the open results materials to be formal collaborators versus simply citing those resources in your work. + +Published articles, blog posts, and forums can lead to new ideas for your own research. A technique learned from social media can be applied to a use-case that you are trying to solve. There are many different ways to discover results. 
+ +## Learning Objectives + +After completing this lesson, you should be able to: + +- Identify a variety of open results sources including both published science research and non-traditional sources. +- Evaluate the reliability and quality of open results sources based on key characteristics. +- List the responsibilities of an open results user, including providing feedback to open results developers. +- List the ways to cite open results into your own research process. + +## How to Discover Open Results + +How do I learn about the state of research for a particular field? How do you engage in the current conversation? Researchers often begin with a search of peer-reviewed articles. This review tells you how much research has been done in a field and what conclusions have recently been reached. In most fields, going through the peer-review process can take up to a year. The ability to find pre-prints can help reduce this delay because they offer the latest findings before a publication date. However, researchers who choose to share their results before publication typically do so in the ways listed as best practices above. As you start research on a topic, how do you find all these different types of results and engage in the most relevant research? + +### Example: Exoplanets + +The various stages of research, from conceptualization to dissemination of results, produce products that can be put into the public domain as "Open Results". Where these results are archived, and to what degree, depends on the discipline author. However, some general guidelines on where to start a search on open results include: + +1. Scholarly Search Portals +2. Web Searches + +**Scholarly Search Portals** + +Search engines like Google and Bing have radically changed how we look up information. For research results, specialized academic search engines and portals curate scientific results from researchers based on topic and field. 
These engines are useful for finding peer-reviewed articles. + +
SCIENTIFIC VALIDATION ☑ | REPRODUCIBILITY TESTING | BUILT IN TESTS | AUTOMATED TESTING
SCIENTIFIC VALIDATION | REPRODUCIBILITY TESTING ☑ | BUILT IN TESTS | AUTOMATED TESTING
SCIENTIFIC VALIDATION | REPRODUCIBILITY TESTING | BUILT IN TESTS ☑ | AUTOMATED TESTING
SCIENTIFIC VALIDATION | REPRODUCIBILITY TESTING | BUILT IN TESTS | AUTOMATED TESTING ☑
-

Built in tests can usually be run both manually and automatically. Most version control platforms offer services for running tests automatically. When run this way, code can be checked to see if changes raise any problems. This process of checking the code automatically as it is developed is called continuous development or continuous integration (CI/CD). If a small change made in one part of the code results in an unexpected change in another part, running the tests will uncover this immediately.

+

Built-in tests can usually be run both manually and automatically. Most version control platforms offer services for running tests automatically. When run this way, code can be checked to see if changes raise any problems. This process of automatically checking the code as it is developed is called continuous integration (the CI in CI/CD, where CD stands for continuous delivery or deployment). If a small change made in one part of the code results in an unexpected change in another part, running the tests will uncover this immediately.
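
As a minimal sketch of what a built-in test might look like, the Python snippet below pairs a small analysis function with checks on its expected behavior. The function `celsius_to_kelvin` and the test names are hypothetical, chosen only for illustration; they do not come from any particular project. The same checks could be run by hand or by an automated service each time the code changes.

```python
# Hypothetical example: a tiny analysis function and its built-in tests.

ABSOLUTE_ZERO_C = -273.15  # absolute zero in degrees Celsius


def celsius_to_kelvin(temp_c: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin."""
    if temp_c < ABSOLUTE_ZERO_C:
        raise ValueError("temperature is below absolute zero")
    return temp_c - ABSOLUTE_ZERO_C


def test_freezing_point() -> None:
    # 0 degrees Celsius should correspond to 273.15 K.
    assert abs(celsius_to_kelvin(0.0) - 273.15) < 1e-9


def test_rejects_impossible_temperature() -> None:
    # Values below absolute zero should raise an error.
    try:
        celsius_to_kelvin(-300.0)
    except ValueError:
        return
    raise AssertionError("expected a ValueError")


if __name__ == "__main__":
    # Manual run; a test runner such as pytest could also collect
    # the test_ functions when this code lives in a test file.
    test_freezing_point()
    test_rejects_impossible_temperature()
    print("all tests passed")
```

If a later change broke the conversion, one of these checks would fail the next time the tests ran, whether that run was started manually or triggered automatically.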

FINDABLE ☑ | ACCESSIBLE | INTEROPERABLE | REUSABLE
FINDABLE | ACCESSIBLE ☑ | INTEROPERABLE | REUSABLE
FINDABLE | ACCESSIBLE | INTEROPERABLE ☑ | REUSABLE
FINDABLE | ACCESSIBLE | INTEROPERABLE | REUSABLE ☑
+ + + + + + + + + + + +
GENERIC ☑ | DISCIPLINE-SPECIFIC
+ +
+ + + + + + + + + + + + + +
GENERIC | DISCIPLINE-SPECIFIC ☑
+ +
+ +Publications that provide some levels of open access are tracked in the [Directory of Open Access Journals (DOAJ).](https://doaj.org/) + +**Web Searches** + +Open results include much more than open-access peer-reviewed publications. How do you find these alternative types of research objects? + +Open communities and forums offer the best way to find research objects other than complete publications. How do you even find out whether these exist and where they are? + +Once you have found a few peer-reviewed articles that are highly relevant, to find additional research objects, you can follow the authors on social media for links to their posts, blogs, and activities. There are open communities in almost every area of research - find yours! Here are different platforms to locate these conversations and resources: + +- GitHub +- LinkedIn +- YouTube +- Google/Bing +- Conference websites +- X, formerly known as Twitter +- Facebook +- Medium +- Substack +- Stack Overflow +- Reddit +- Mastodon + +Various research objects, including datasets and software, are frequently attached to scholarly publications in the form of supplemental material. At other times, the source is referenced in the paper, which could be a GitHub repository, personal/institutional website, or other storage site. This can be another starting point, by engaging in discussions on the GitHub repository. + +**Kerchunk Example:** In lesson 1, a blog post about a software library 'kerchunk' was presented. Let's look at a [post](https://discourse.pangeo.io/t/trick-for-improving-kerchunk-performance-for-large-numbers-of-chunks-files/3090) on the [Pangeo Discourse Forum](https://discourse.pangeo.io/) of Kerchunk with a large number of views. The open science [Pangeo project](https://pangeo.io/) worked completely in the open. The [project website](https://pangeo.io/) (run off of GitHub) has links to blog posts, a discussion forum, and a calendar to all their meetings which anyone was welcome to join. 
This has resulted in an engaged and dynamic community. An example of this comes from the post linked to above, where one person asks for help, others reply, and the conversation is documented in the open. The post’s 636 views indicate that this question, or one similar, has occurred to others. Imagine if this had been done over private email? By working in the open, they are improving science and helping everyone become faster and more accurate. + +## How to Assess Open Results + +"Garbage in, garbage out" – your own research products are only as good as the data used in your investigation. + +If you use poor quality data or materials from unreliable and unvetted sources as critical components of your research, you run the risk of producing flawed, or low-quality science that may harm your reputation as a scientist. Therefore, it is critical to assess the quality and reliability of open-results sources before you include them in your own work. + +What are best practices for assessing the quality of alternative sources of data to research articles such as blog posts, YouTube videos, and other research objects? + +### Attributes of Reputable Material + +Let's take a look at the questions you might consider asking yourself when determining the reliability of any type of open results source. + +Here, we list questions under two categories: the open results material themselves, and the server they are downloaded from. The more questions here that can be answered in the affirmative, the lower the risk in utilizing the open results materials for your own research. + + + + + + + + + + + + + + +
THE MATERIAL ITSELF ☑ | THE ASSOCIATED WEBSITE / SERVER | SOURCE RELIABILITY INDICATORS
+
    +
  • Is the material associated with a peer-reviewed publication?
  • +
  • Are the primary data associated with the results also open-source?
  • +
  • Is code used to generate the Open Results materials also open-source?
  • +
  • Are all fields and parameters clearly defined?
  • +
  • Is the derivation of measurement uncertainties clearly described?
  • +
  • Were any data or results excluded, and if so, were criteria provided?
  • +
  • Are authoring teams also members of the field?
  • +
+
+ + + + + + + + + + + + + + +
THE MATERIAL ITSELF | THE ASSOCIATED WEBSITE / SERVER ☑ | SOURCE RELIABILITY INDICATORS
+
    +
  • Does the host website's URL end in .edu, .gov or (if managed by a non-profit organization) in .org?
  • +
  • Does the host website provide contact information of the author and/or organization?
  • +
  • Is the host website updated on a frequent basis?
  • +
  • Is the host website free of advertisements and/or sponsored content, the presence of which could indicate bias?
  • +
+
+ + + + + + + + + + + + + + +
THE MATERIAL ITSELF | THE ASSOCIATED WEBSITE / SERVER | SOURCE RELIABILITY INDICATORS ☑
+
    +
  • Is the result reproducible? Can you interact with the data and results? Have others reported being able to reproduce the results?
  • +
  • Is the author reliable? Have you seen them publish or share results in other forums?
  • +
  • Is the result from only a single author/voice, or does it include contributions from a broader community?
  • +
  • Does the post have a significant number of likes/views and public comments? The value of a blog post with no comments or responses can be difficult to assess. Conversely, a thorough GitHub discussion forum in which multiple views are shared indicates a robust post.
  • +
  • Is the result part of an active conversation? (Is the information still relevant and current?)
  • +
+
+ +Adapted from [https://www.scribbr.com/working-with-sources/credible-sources/](https://www.scribbr.com/working-with-sources/credible-sources/) + +Note that failure to meet one or many of the criteria does not automatically mean that the open results are of poor quality, but rather that more caution should be exercised if incorporated into your own research. It also means that you will have to invest more personal vetting of the material to ensure its quality is sufficient for your purposes. + +Reliable Example: Qiusheng Wu YouTube videos (as mentioned in the previous lesson). Professor Wu is an expert in his field. He presents results along with notebooks that demonstrate reproducibility. Comments on his YouTube tutorial videos represent meaningful interactions between users reproducing results and the author. + +## How to Use Open Results + +While open results benefit science and have already provided valuable societal benefits, the misuse and incautious sharing of open materials can have far-reaching harmful effects. The end-user of open results bears the responsibility to ensure that the data they reference are used in a responsible manner and that any relevant guidelines for the use of the data are followed. + +### How to Contribute and Provide Constructive Feedback + +Contributing to and providing constructive feedback are vital components for a healthy open access ecosystem, ensuring long-term sustainability of the open resources by providing continual improvements and capability expansions. + +In our current system, there are results creators and consumers. This scenario presents a one way street with no feedback loop, no sharing of data back to publishers, and no sharing between intermediaries. + +The practice of producing open results aims to foster a system where feedback loops exist between users and makers. Users share their cleaned, integrated, or improved work to the maker. 
This feedback creates a symbiotic and sustainable process where everyone benefits. + +### Your Responsibilities as an Open Results User + +- Users should familiarize themselves with contributor guidelines posted to open result repositories and follow the associated policies. What if there aren't contributor guidelines? Contact the creators! +- Always provide feedback in a respectful and supportive manner. +- If you discover an error in Open Results materials, the ethical action to take is to contact the author (or repository, depending on the nature of the issue) and give them the opportunity to correct the problem, rather than ignoring the issue or (worse!) taking advantage of a fixable issue to elevate your own research. + +### Different Ways to Provide Feedback + +#### Use GitHub Issues + + ++ + + + + + + + + + + + + +
+ + Pro: The feedback is open and other community members can see ongoing issues that are being addressed.
+ + Pro: Contribution is archived and logged on GitHub.
+ +**Working with GitHub Issues** + +See this blog for general issue etiquette + +[OPEN](https://www.w3.org/International/i18n-activity/guidelines/issues.html) + +#### Email authors + + ++ + + + + + + + + + + + + +
+ + Con: the feedback is closed. The information is generally not propagated back to the community unless the creator creates a new version.
+ + Con: No way of tracking credit.
+ +### Getting Credit for Providing Feedback + +If your feedback results in a substantial intellectual contribution to the work, it is reasonable for you to expect an opportunity for co-authorship in a future version of the open result. The associated contribution guidelines should address this possibility and manage expectations prior to your providing feedback. + +Sadly, many times contributor guidelines do not exist and it is not clear what is "substantial". + +### Open Results User Responsibilities + +- **Institutional Security Compliance:** Always download code from an authoritative source and be familiar with / follow your institution’s IT security policies. +- **Licensing Policies:** Understand and abide by the license(s) associated with the open results materials being used. +- **Attribution and Contribution:** Provide appropriate attribution for the open results used and contribute to the open results community. + +Additionally, give credit to repositories that provide open source materials in the acknowledgement section of your paper. If the repository provides an acknowledgments template in their “About” link, follow that suggestion. Otherwise, a generic "This research has made use of \." will be sufficient. + +### Avoid Plagiarism When Using Open Results + +Standard guidelines that you’ve been using in your research all along for providing appropriate attribution and citations of closed access publications also apply to open access published works. + +Examples of plagiarism include: + +- Word-for-word copying without permission and source acknowledgement. +- Copying components (tables, processes, equipment) without source attribution. +- Paraphrasing an idea without proper source referencing. +- Recycling one's own past work and presenting as a new paper. 
+ +#### FACTSHEET: Plagiarism + +**Here is a useful guide regarding the different forms of plagiarism** + +[CLICK TO LEARN](https://www.elsevier.com/editor/perk/plagiarism-complaints#0-introduction) + +## How to Cite Open Results + +Giving proper attribution to open results is an important and ethical responsibility for using open source materials. The process for citation is specific to the nature of the material. + +### Citation Guidelines for Published Versus Unpublished Results + +If a paper has been formally published in a journal, then your citation should point to the published version rather than to a preprint server. + +Take the time to locate the originating journal to provide an accurate citation. + + + +Preprint Server (Cite only if journal publication not available) + +--- + + + +Source Publication (Always cite) + +--- + +If a paper that you wish to cite is not yet accepted for publication, you should follow the guidelines of the journal to which you are submitting your paper. A preprint reference citation typically includes author name(s), date of the most recent version posted, paper title, name of the preprint server, object type ("preprint"), and the DOI. + + + +At the time of the Lesson preparation, the following paper did not yet appear as a journal publication. + +Jin, H., et al. 2023, "Optical color of Type Ib and Ic supernovae and implications for their progenitors," ApJ, preprint, arXiv:2304.10670. + +--- + + + + + + + + + + + + + + +
FOR MATERIAL THAT HAS A DOI ☑ | FOR MATERIAL THAT DOES NOT HAVE A DOI | FOR OTHER MATERIALS OR INTERACTIONS THAT WERE HELPFUL FOR YOUR RESEARCH
+

To cite all of the following, follow existing guidelines and community best practices:

+
    +
  • Cite publications
  • +
  • Cite data
  • +
  • Cite software
  • +
  • Cite any other object with a DOI. Since many journals will only allow authors to cite material that has a DOI, what do you do with other types of open results?
  • +
+
+ + + + + + + + + + + + + + +
FOR MATERIAL THAT HAS A DOI | FOR MATERIAL THAT DOES NOT HAVE A DOI ☑ | FOR OTHER MATERIALS OR INTERACTIONS THAT WERE HELPFUL FOR YOUR RESEARCH
+

Examples include blog posts, videos, and notebooks.

+
    +
  • You could also contact the author and ask them to obtain a DOI.
  • +
  • Leave a comment in the comments section or on the forum letting the author know about your publication.
  • +
+
+ + + + + + + + + + + + + + +
FOR MATERIAL THAT HAS A DOI | FOR MATERIAL THAT DOES NOT HAVE A DOI | FOR OTHER MATERIALS OR INTERACTIONS THAT WERE HELPFUL FOR YOUR RESEARCH ☑
+
    +
  • Acknowledge communities and forums that helped you advance your research in the Acknowledgements Section. Not only does this give them credit, but it helps others find those communities.
  • +
  • Citing open research results advances science by giving appropriate credit for all parts of the research process. This is essential for the cultural shift to open science; we must give credit for all types of contributions, and expect them in return. Participatory science allows more people, from more places, with different voices and experiences to participate in science.
  • +
  • Contributing and collaborating this way lowers the barriers (like conference fees) to participating in science and broadens who can participate.
  • +
+
+ +### Examples of Giving Credit + +In the Lesson 1 blog post [example](https://medium.com/pangeo/fake-it-until-you-make-it-reading-goes-netcdf4-data-on-aws-s3-as-zarr-for-rapid-data-access-61e33f8fe685), the author acknowledged the people who helped them, an article they found helpful, two different communities, and the computational environment they worked on. This is a great example of giving credit: "I would like to thank Rich Signell (USGS) and Martin Durant (Anaconda) for their help in learning this process. If you're interested in seeing more detail on how this works, I recommend Rich's article from 2020 on the topic. I would also like to recognize [Pangeo](https://pangeo.io/) and [Pangeo-forge](https://pangeo-forge.org/) who work hard to make working with big data in geoscience as easy as possible. Work on this project was done on the Pangeo AWS deployment." + +In Lesson 1, the JWST case study was presented. The peer-reviewed [publication](https://www.nature.com/articles/s41586-022-05269-w#Ack1) that reported the first discovery of CO2 on another planet has been accessed 18,000+ times. Notice that the authorship is attributed to the entire team. The Acknowledgements section duly explains the contributions of their collaborators and partners: "The results reported herein benefited during the design phase from collaborations and/or information exchange within NASA’s Nexus for Exoplanet System Science (NExSS) research coordination network sponsored by NASA's Science Mission Directorate." Also, "All the data and models presented in this publication can be found at [https://doi.org/10.5281/zenodo.6959427](https://doi.org/10.5281/zenodo.6959427)". And finally, they cite all the software! "The codes used in this publication to extract, reduce and analyze the data are as follows..." + +## Lesson 2: Summary + +In this lesson, you learned: + +- Open results can be found using both Scholarly Search Portals and Web searches. 
+- The reliability of a post can generally be evaluated by the trustworthiness of the website from which it originated, the engagement of community members, and the scientific rigor of its content. +- Users of open results, as inherent stewards of the open source community, informally carry some responsibility to contribute to the community’s sustainability. This participation includes providing feedback to open results providers and developers. +- Giving proper attribution to open results is an important and ethical responsibility for using open source materials. The process for citation is specific to the nature of the material. + +## Lesson 2: Knowledge Check + +Answer the following questions to test what you have learned so far. + +*Question* + +**01/02** + +Which of the following could be a source of open results? Select all that apply. + +- Web searches +- Papers accessed through a paid subscription +- Materials made public after a 1-year exclusive-use/paid-subscription-only period +- Repositories +- Open access papers + +*Question* + +**02/02** + +Which of the following characteristics suggest that a particular paper / data set is more likely to be a credible Open Result? Select all that apply. 
+ +- The website lists its source of funding +- The results are described in an associated peer-reviewed publication +- Detailed documentation accompanies the data, including defined fields and parameters +- Organization contact information is listed +- The website was last updated in 2015 +- The webpage advertises open job positions with the funder +- The author is an expert in the field +- The webpage's URL ends in '.com' +- The accompanying documentation states that "on inspection, data that were obvious outliers were excluded from the following analysis" \ No newline at end of file diff --git a/Open-Science-101/Module_5/Lesson_3/readme.md b/Open-Science-101/Module_5/Lesson_3/readme.md index 97608e7d..18a220db 100644 --- a/Open-Science-101/Module_5/Lesson_3/readme.md +++ b/Open-Science-101/Module_5/Lesson_3/readme.md @@ -1 +1,599 @@ -# Just a test \ No newline at end of file +# Lesson 3: Making Open Results + +## Navigation + +* [How to Make Open Results](#how-to-make-open-results) +* [Role of Contributors in Open Science](#role-of-contributors-in-open-science) +* [How to Give Open Recognition](#how-to-give-open-recognition) +* [Combining Open Results for Scientific Reporting and Publications](#combining-open-results-for-scientific-reporting-and-publications) +* [Lesson 3: Summary](#lesson-3-summary) +* [Lesson 3: Knowledge Check](#lesson-3-knowledge-check) + +## Overview + +In Lesson 2 you learned how to use others' results. In this lesson, we focus on making open results. We will start by discussing what it means to make reproducible results. Having earlier in the course discussed the computational reproducibility practices in open software, in this lesson, we specifically emphasize the importance of collaborations in making those results open and reproducible. This begins with acknowledging that the scientific results are not made by single individuals. 
We will then teach how to ensure equitable, fair, and successful collaborations when making your open results that acknowledge all contributions. Once you’ve planned the rules of engagement, we will provide you with ways to ensure that your reporting and publication abide by open results principles and combat the reproducibility crisis. + +## Learning Objectives + +After completing this lesson, you should be able to: + +- Identify approaches to make different types of open results. +- Recognize the importance of collaboration in making results. +- Develop contribution guidelines to enable recognition of contributors who make results. +- Combine different open results to create scientific reports and reproducible outputs. + +## How to Make Open Results + +### Capturing the Research Process Accurately in the Making of Results + +> I am aware of the reproducibility crisis and how open science can help combat it. What practical ways can I apply to my research outputs to make open results? How can I ensure that the results I share can be reproduced by others? How can I publish scientific publications that do not add to, but combat the reproducibility crisis? + +In the Ethos of Open Science, you learned about the ethics and principles underlying responsible open science practices. In Open Code, you explored and identified the right tools and methods that ensure the usability and reproducibility of your analysis. In Open Data, you developed a data management plan that can ensure the Findability, Accessibility, Interoperability and Reusability (FAIR) of your data throughout the research process, and not just at the end when the final report from the project is released. These open science approaches directly address the root causes of the reproducibility crisis, which are a lack of openness throughout the scientific process, lack of documentation, poor description of intermediate methods or missing data that were used at intermediate stages of the research process. 
In this lesson, you will learn to put all of these together to ensure that you are prepared to make your open results easy to reproduce by others. + +In Lesson 1, we identified different research components that can be considered open results at various stages of research. In this lesson, we want to specifically explain what processes are involved in making them. + +### Case Study: Open Results from Distributed Multi-Team Event Horizon Telescope Collaboration (EHTC) + +**Example:** Capturing results on activities ranging from collaboration to observations, image generation to interpretation. + +In 2017, the Event Horizon Telescope targeted supermassive black holes with the largest apparent event horizons: M87, and Sgr A\* in the Galactic Center on four separate days. This distributed collaboration led to the multi-petabyte yield of data that allowed astronomers to unveil the first image of a black hole providing the strongest visual evidence of their existence. The [EHTC website](https://eventhorizontelescope.org/) provides information about research projects, scientific methods, instruments, press and media resources (such as blog posts, news articles and YouTube videos), as well as events, data, proposals and publications. This project shows large-scale and high-impact work that applies open practices in making their results. Different kinds of outputs shared under this project can be mapped to different stages of the research process and the teams involved in creating them. + +### Making Results and Crediting Contributors Fairly at Different Stages of Research + +The case studies listed above highlight that results associated with a project are more than a publication. By understanding how open results are created in different projects, we can gain deep insights into the processes for making them. 
With that goal, the rest of this lesson describes the process of making results into three parts: 1) making all types of research outputs; 2) recognizing all contributors; and 3) combining outputs for scientific reporting and publications. + +### Making All Types of Research Outputs + +New ways of working with creative approaches for collaboration and communication in research have opened up opportunities to engage with the broader research communities by sharing scientific outcomes as they develop, rather than at the end through summary articles. A range of research components are created throughout the research lifecycle that can be shared openly. For example, resources created in a scientific project include, but are not limited to the following: + + + + + + + + + + + + + + + +
IDEATION AND PLANNING ☑ | DATA COLLECTION AND EXPLORATION | COMMUNITY ENGAGEMENT AND REPRODUCIBILITY | PRESERVATION AND PUBLICATION
+

Ideation and planning – perhaps before the research project is funded or started:

+
    +
  • Research proposals
  • +
  • People and organizations involved
  • +
  • Research ethics guidelines
  • +
  • Data management plan
  • +
+
+ + + + + + + + + + + + + + + +
IDEATION AND PLANNING | DATA COLLECTION AND EXPLORATION ☑ | COMMUNITY ENGAGEMENT AND REPRODUCIBILITY | PRESERVATION AND PUBLICATION
+

Data collection and exploration – research artifacts created during the active research process:

+
    +
  • Project repository
  • +
  • Project roadmap and milestones
  • +
  • Resource requirements
  • +
  • Project management resources (without sensitive information)
  • +
  • Collaboration processes like Code of Conduct and contributor guidelines
  • +
  • Virtual research environment
  • +
  • Data and metadata information
  • +
+
+ + + + + + + + + + + + + + + +
IDEATION AND PLANNING | DATA COLLECTION AND EXPLORATION | COMMUNITY ENGAGEMENT AND REPRODUCIBILITY ☑ | PRESERVATION AND PUBLICATION
+

Community engagement and reproducibility – most valuable during the project period:

+
    +
  • Training and education materials
  • +
  • Computational notebooks
  • +
  • Computational workflow
  • +
  • Code repository (version controlled)
  • +
  • Blog posts
  • +
  • Short form videos and podcasts
  • +
  • Social media posts
  • +
  • Forum discussions (for example when asking for feedback or troubleshooting)
  • +
+
+ + + + + + + + + + + + + + + +
IDEATION AND PLANNING | DATA COLLECTION AND EXPLORATION | COMMUNITY ENGAGEMENT AND REPRODUCIBILITY | PRESERVATION AND PUBLICATION ☑
+

Preservation and publication – expected to persist long-term:

+
    +
  • Publication and authorship guidelines
  • +
  • Open access peer-reviewed articles
  • +
  • Conference abstracts and presentations
  • +
  • End of project report
  • +
  • User manual or documentation
  • +
  • Public outreach and events
  • +
+
+ + + +Image credit: The Turing Way project illustration by Scriberia. Zenodo. + +--- + +You have already come across some of these in the previous lessons, and hopefully, you could already identify which of these or additional outputs you are generating in your work. To make them part of your open results, it's important that they are shared openly with appropriate licensing and documentation so that others can read, investigate and when possible, reuse or build upon them. + +### Making Open and Reproducible Results + +Open science ultimately informs our decisions as scientists and guides the selection of approaches that contribute to making our results open at different stages. One of the main purposes of open results is to ensure research reproducibility, often explained through definitions such as the following by [Stodden (2015)](https://www.annualreviews.org/doi/10.1146/annurev-statistics-010814-020127): + + + +"Reproducibility is a researcher's ability to obtain the same results in a published article using the raw data and code used in the original study." + +**Stodden (2015)** + +--- + +Using this definition, results that can be computationally reproduced by others would be called Reproducible Results. The EHTC case studies present open results as collections of research objects created at different stages of the research process. They also provide documentation and resources that allow reanalysis and reproduction of the original results. + +Ideally, anyone, anywhere, must be able to read a publication and understand the results, easily find methods applied, as well as properly follow procedures to achieve the same results as shared in that study. However, as already learned, the issue of reproducibility is prevalent across all scientific fields (refer to this Nature [report](https://www.nature.com/articles/533452a)). 
A well-intentioned scientist may share all research objects and describe all steps applied in their research, but failing to provide the research environment or other technical setup they used for analyzing their data can prohibit others from reproducing their results. This issue is further compounded by [human bias and errors](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4776714/). For example, individuals may not always be able to identify how their interests and experiences inform their decisions that impact their research conclusions. This makes the issue of combating the reproducibility crisis even bigger. + +Approaches for making open results should integrate reproducible tools and methods, such as version control, continuous integration, containerization, code review, code testing and documentation. Furthermore, to extend the reproducibility beyond computational aspects of research, reporting and documentation for different types of outputs and decisions should also be supplied transparently. + +### How to Make Different Types of Open Results + +Sharing different types of results as early as possible not only helps you find solutions faster, but also helps your science be more reproducible because that openness helps you understand how to communicate your methodologies and your findings more clearly to others. Here we provide some easy places to start creating your results openly. + + + + + + + + + + + + + + + +
WRITING A FORUM POST ☑ | WRITING A GOOD BLOG POST | MAKING A GOOD VIDEO | WRITING A SOCIAL MEDIA POST
+

Often, when first starting in research, public forums are a great place to begin understanding and collaborating with communities. Most discussion forums have a code of conduct and guidelines on best practices for participation. Some common ones that may be helpful are the guidelines from Stack Overflow and Xarray, but most forums have some specific guidance. On forums, you increase trust by interacting with the community, so the more you interact, the more people are likely to respond! Often, best practices include making sure you are posting to the right area, using tags (when available), and including examples that document the question or issue you are having. If you review the post on the Pangeo Discourse Forum with a large number of views, you can see that the author clearly states the problem they are trying to solve, references other posts on similar topics, links to a computational notebook that has an example of their code, and gives an example of the code they are trying to run.

+
+ + + + + + + + + + + + + + + +
WRITING A FORUM POST | WRITING A GOOD BLOG POST ☑ | MAKING A GOOD VIDEO | WRITING A SOCIAL MEDIA POST
+

Blogs are long-form articles that aren’t peer-reviewed. Blogs can be a great way to share your scientific process and findings before they are published, and also to provide a more accessible presentation of the material after publication. For example, you might write a highly technical scientific article on your research and then break it down in more accessible language in a blog post. Many scientists use blog posts to develop and test ideas and approaches because they are more interactive. There are science blogs all over the internet. Some popular ones are Medium, Science Bites, and Scientific American. One good way to get started is to find a blog post that you liked or found inspirational and use that as a guide for writing your own post.

+
+ + + + + + + + + + + + + + + +
WRITING A FORUM POST | WRITING A GOOD BLOG POST | MAKING A GOOD VIDEO ☑ | WRITING A SOCIAL MEDIA POST
+

Start small! Record a short video where you show how to do something that you struggled with, or demonstrate a new skill or tool that you learned how to use, and post it to YouTube or another popular video platform. Great videos often explain science concepts, ideas, or experiments to a target audience. Videos can inspire others to work in science, so talk about how you got into science, and show some of your research. There are a lot of online resources to help you out here as well!

+
+ + + + + + + + + + + + + + + +
WRITING A FORUM POST | WRITING A GOOD BLOG POST | MAKING A GOOD VIDEO | WRITING A SOCIAL MEDIA POST ☑
+

Social media is a good place both to ask questions when you are just starting on a research topic and to share all types of results. Providing a link to a video, blog post, or computational notebook and/or sharing an image of a scientific result is a great way to start interactions. You can draw attention to your post by using hashtags and tagging collaborators. There are many online guides on how to write social media posts, and it is always good to look at what others in your area are doing. Responding to comments and engaging with others can help you improve your research and learn about new tools or methods.

+
+ +All these different ways of sharing information will help make your published report or article better. And as you start working more in the open, with others, think about how collaborations will work and how you will give credit. All resources can be centralized through reports and documentation on a repository or website so anyone, including the 'future you' can find them in the future. + +More ways to communicate your work can be found in a [guide for communication](https://the-turing-way.netlify.app/communication/communication) in The Turing Way. + +### Maintaining Ethical Standards + +Open science, as learned in the Ethos of Open Science, should maintain the highest ethical standards. This can be enabled through the involvement of diverse contributors in the development of scientific outcomes. Participatory approaches allow multiple perspectives and expertise to be integrated into research from the start and ensure that peer review happens for all outputs in an iterative manner, not just for the articles at the end. + +In making and planning to share open results, you can apply the "as open as possible, as closed as necessary" principle. This means, protecting sensitive information, managing data protection practices where necessary and not carelessly sharing sensitive data or people's private information that can be misused. Online repositories, such as GitHub and GitLab, allow online interaction in addition to serving the technical purpose of version control and content hosting. For example, you can use [issues](https://docs.github.com/en/issues/tracking-your-work-with-issues/about-issues) and [a project board](https://docs.github.com/en/github-ae%40latest/issues/organizing-your-work-with-project-boards/managing-project-boards/creating-a-project-board) to communicate what is happening in a project at any given point. 
The use of [Pull Requests](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests) signals an invitation for peer review on the new development of code or other content. Thanks to a number of reusable templates you don't have to set up repositories from scratch. For example, you can directly use a [template for reproducible research projects](https://github.com/the-turing-way/reproducible-project-template). + +## Role of Contributors in Open Science + +Collaboration is central to all scientific research. The positive impact of collaboration is achieved when diverse contributors are supported to combine a range of skills, perspectives and resources together to work towards a shared goal. Projects that apply open and reproducible approaches, make it easier for diverse contributors to be involved and get recognized for their contributions while supporting the development of solutions that they can all benefit from. + +Involving and recognizing the roles of all contributors in making open results is an important part of open science, which we will discuss next. + +### EHTC Case Study: Recognizing All Contributors + + + +A map of the EHT. Stations active in 2017 and 2018 are shown with connecting lines and labeled in yellow, sites in commission are labeled in green, and legacy sites are labeled in red. From Paper II (Figure 1). IOPscience. https://iopscience.iop.org/journal/2041-8205/page/Focus_on_EHT + +--- + +The Event Horizon Telescope (EHT) team involved 200 members from 59 institutes in 20 countries, from undergraduates to senior members of the field. They used an array that included eight radio telescopes at six geographic locations across the USA, Latin America, Europe and the South Pole. 
All collaborators were located in different geographic locations, had access to different instruments, collected data generated from telescopes in different locations and applied skills from across different teams to create groundbreaking results. Each contributor was acknowledged across different communication channels and given authorships in publications. EHTC also supports the "critical, independent analysis and interpretation" of their published results to facilitate transparency, rigor, and reproducibility ([EHTC website](https://eventhorizontelescope.org/blog/imaging-reanalyses-eht-data)). + +### Making Open Results Starts with Contributors! + +Making different research components and preparing to share them as open results involve a range of activities. Behind these activities are the contributors who engage in various responsibilities that include, but are not limited to: + +- Conceptualizing the idea +- Designing the project +- Serving as advisor or mentor +- Conducting experiment as a student, researcher, or research assistant +- Creating tools essential for carrying out the research +- Providing data expertise +- Developing software +- Providing specialized expertise and support +- Managing community and project requirements +- Providing feedback to the results +- Designing experiments and interpreting results +- Manuscript writing and review +- And [more](https://the-turing-way.netlify.app/collaboration/shared-ownership/shared-ownership-projects.html)! + +Too often conversations about contribution and authorship take place towards the end of a project or when a scientific publication is drafted. However, as you learned in the previous lessons, research outputs are generated throughout the lifetime of a research project. Therefore, it is important to build an agreement at the beginning of the project for how contributorship in the project will be managed. 
+ +Developing contribution guidelines and contributor agreements requires collaboratively defining what is considered contributions in your project, who among the current contributors will get authorship, who will get acknowledged as a contributor, what is the significance of the order in which authors are listed in a scientific publication, and who makes these decisions. Ensuring that all collaborators understand and agree to these guidelines before beginning the project is also important. + +### Contributors and Authorship + +First and foremost, you must ensure that anyone who has contributed to the research project has their contributions recognized. With that shared understanding, in this lesson, you will explore what those recognitions as contributors or authors in your research project might look like. + +Let's first define contributor and author roles. + + + + + + + + + + + + + +
A "CONTRIBUTOR" ☑ | AN "AUTHOR"
+

A contributor is anyone who has done any activity that made it possible for the research to happen and results to be created, published or shared.

+
+ + + + + + + + + + + + + +
A "CONTRIBUTOR" | AN "AUTHOR" ☑
+

An author of an open result is a contributor who has made a substantial contribution to the conception or design of the work or the acquisition, analysis, or interpretation of the data for the published work.

+
+ +### Are All Authors Contributors and Vice Versa? + +An author is a contributor who actively carries out one or several of the tasks listed above ([National Institutes of Health - NIH](https://oir.nih.gov/sourcebook/ethical-conduct/authorship-guidelines-resources/authorship-resources) and [ICMJE](https://www.icmje.org/recommendations/browse/roles-and-responsibilities/defining-the-role-of-authors-and-contributors.html)). All authors are contributors, but not all contributors are authors; for example, someone serving as a mentor, trainer or infrastructure maintainer may contribute without being an author. Ideally, all contributors are given the opportunity to author research outputs. + +Given the importance traditionally placed on authorship in scientific publication and the fuzziness of the definitions (which often contain relative terms such as "substantial" or "extensive" that leave too much room for interpretation), it is not surprising that determining who among the contributors gets to be an author can lead to biased or unfair decisions, disputes between contributors, or at the very least leave someone resentful and feeling unappreciated. + +There is no single approach for recognizing contributors as authors, but here is what you should consider: + + + + + + + + + + + + + +
GROUP POWER DYNAMICS & EQUITY (E.G. SENIORITY, SYSTEMS OF OPPRESSION) ☑ | THE TYPE OF CONTRIBUTION
+

Consider this hypothetical scenario: You are a postdoctoral fellow and the lead author of a research project. A rotating student spends 4 months in the lab helping you set up and perfect the experimental protocol that you will then use to carry out the experiments needed to answer your research question. They may even help you collect some preliminary data, but then they leave and later decide to join another lab. Would you provide authorship for the student?

+

It would be unethical not to give authorship or credit to someone who has provided significant help and contributed to the success of a research project, even when they are no longer involved. A fair path in this scenario could be to contact the previous contributor and involve them in writing a relevant section of the manuscript.

+
+ + + + + + + + + + + + + +
GROUP POWER DYNAMICS & EQUITY (E.G. SENIORITY, SYSTEMS OF OPPRESSION) | THE TYPE OF CONTRIBUTION ☑
+

The NIH guidelines for authorship outline what type of contribution does or does not warrant authorship. Each contribution is represented on a sliding scale and has no rigid cutoffs. Some contributions are given more weight than others. For example, for "design and interpretation of results", nearly all types of "original ideas, planning, and input" result in authorship, whereas simply supervising the first author usually does not (unless the supervisor is also contributing to the paper, of course). This is just one example. You will need to think about what this looks like for your own work!

+
+ +Clear communication about roles and responsibilities early in the project, and guidelines for how credit will be determined, can help mitigate some of these issues. + +### Diverse Role of Contributors + +It is important to set a reference for each research team/project about different kinds of responsibilities and opportunities available for different contributors and how each of them are acknowledged. [CRediT Taxonomy](https://credit.niso.org/) represents roles typically played by contributors to research in creating scholarly output. Below, we provide a table with research roles that extends the CRediT taxonomy to include broader contributorship ([Sharan, 2022](https://zenodo.org/record/8403386)). Using this as a starting point, open dialogue and discussion among team members can be facilitated to set a shared understanding and agreement about diverse roles of contributors including authorship of publications. The distinction between contribution types can help set clear expectations about responsibilities and how they can be recognized in a project. + + ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Research Roles | Definition |
| --- | --- |
| Project Administration | Management and coordination responsibility for the research activity planning and execution |
| Funding Acquisition | Acquisition of the financial support for the project leading to the research and publications |
| Community Engagement | Connecting with project stakeholders, enabling collaboration, identifying resources, and managing contributor interactions |
| Equity, Diversity, Inclusion and Accessibility (EDIA) | Inclusive approaches to collaboration and research, involvement of diverse contributors, accessibility of resources, consideration of disability, neurodiversity and other considerations for equitable participation |
| Ethics Review | Ensuring that the research project undergoes an ethics review process where one is needed |
| Communications and Engagement | Communications about the project and engagements with the stakeholders beyond the project and institution |
| Engagement with Experts and Policymakers | Pre-publication review, external advisory board meetings, regular reporting, post-publication reporting, and reaching out to the relevant policymakers actively |
| Recognition and Credit | Assessing incentives, creating a fair value system, fair recognition of all contributors |
| Project Design | Technical planning, expert recommendations, supervision or guidance, developing project roadmaps and milestones, tooling and template development |
| Conceptualization | Ideas; formulation or evolution of overarching research goals and aims |
| Methodology | Development or design of methodology; creation of models |
| Software | Programming, software development; designing computer programs; implementation of the computer code and supporting algorithms; testing of existing code components |
| Validation | Verification, whether as a part of the activity or separate, of the overall replication/reproducibility of results/experiments and other research outputs - generalizable |
| Investigation | Conducting a research and investigation process, specifically performing the experiments, or data/evidence collection |
| Resources | Provision of study materials, reagents, materials, patients, laboratory samples, animals, instrumentation, computing resources, or other analysis tools |
| Data Curation | Management activities to annotate (produce metadata), scrub data and maintain research data (including software code, where it is necessary for interpreting the data itself) for initial use and later reuse (including licensing) |
| Writing - Original Draft | Preparation, creation and/or presentation of the published work, specifically writing the initial draft (including substantive translation) |
| Writing - Review & Editing | Preparation, creation and/or presentation of the published work by those from the research group, specifically critical review, commentary or revision, including pre- or post-publication stages |
| Visualization | Preparation, creation and/or presentation of the published work, specifically visualization/data presentation |
| Supervision | Oversight and leadership responsibility for the research activity planning and execution, including mentorship external to the core team |
+ +## How to Give Open Recognition + +To openly and fairly recognize all contributors, their names and the types of contributions they made should be listed in the project documentation. In manuscripts, it is common practice to mention contributors' roles in the 'acknowledgement' section, for example using CRediT or a similar taxonomy such as the one provided in the table above. All contributors should be encouraged to provide ORCIDs associated with their names to make them identifiable. + +Contribution statements in documentation and manuscripts can specify who did what in the official results. This is great for transparency. It is also a great way to guard against unfair power dynamics. Details about contribution type show explicitly who worked on which parts of the results, and make it easier to assign authorship fairly. For example: *"Pierro Asara: review and editing (equal). Kerys Jones: Conceptualization (lead); writing – original draft (lead); formal analysis (lead); writing – review and editing (equal). Elisha Roberto: Software (lead); writing – review and editing (equal). Hebei Wang: Methodology (lead); writing – review and editing (equal). Jinnie Wu: Conceptualization (supporting); Writing – original draft (supporting); Writing – review and editing (equal)."* + +If a GitHub repository and website exist, a dedicated page should be created to list and recognize all contributors. If someone made a minor contribution to the paper, code or data, you could add them as an author or contributor to the GitHub and Zenodo releases respectively. Engaged collaborators and contributors not already involved in making research outputs should be given the opportunity to contribute to open results such as presentations, posters, talks, blogs, podcasts, data and software, as well as articles. 
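A contribution statement like the example above can be generated from a simple record of who did what, which keeps the project documentation and the manuscript consistent. The following is only a minimal sketch under stated assumptions: the `CONTRIBUTIONS` list and the `contribution_statement` function are hypothetical names introduced here for illustration, and the record format (contributor, role, degree) is one possible layout rather than part of the CRediT taxonomy itself.

```python
# Hypothetical records of (contributor, role, degree). The names and roles
# mirror the example statement above and are for illustration only.
CONTRIBUTIONS = [
    ("Kerys Jones", "Conceptualization", "lead"),
    ("Kerys Jones", "Writing – original draft", "lead"),
    ("Elisha Roberto", "Software", "lead"),
    ("Hebei Wang", "Methodology", "lead"),
    ("Jinnie Wu", "Conceptualization", "supporting"),
]


def contribution_statement(records):
    """Group roles by contributor and render a single statement string."""
    roles_by_person = {}
    for name, role, degree in records:
        roles_by_person.setdefault(name, []).append(f"{role} ({degree})")
    # Dicts preserve insertion order, so contributors appear in first-seen order.
    return " ".join(
        f"{name}: {'; '.join(roles)}." for name, roles in roles_by_person.items()
    )


if __name__ == "__main__":
    print(contribution_statement(CONTRIBUTIONS))
```

Keeping the records in one place (for example, a file in the project repository) means the same source can feed a contributors page and the statement in a manuscript; how your team stores and orders these records is a choice to agree on in your contribution guidelines.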
+ +### Activity 3.1: Draft a Contribution Guideline + + + +A standalone contribution guideline should be created for each open project, even when that means reusing an existing draft that the research team has used in another project. + +Note that this is different from "contributing" guidelines that describe "how" to contribute (for example on code repositories). Contribution guidelines should describe contribution types and ways to acknowledge them as discussed above. + +Contribution guidelines are not set in stone. Rather, they: + +- Are discipline-dependent +- Can be adapted for your unique situation + +You can begin by reviewing guidelines from [NIH](https://oir.nih.gov/sourcebook/ethical-conduct/authorship-guidelines-resources/authorship-resources) and [ICMJE](https://www.icmje.org/recommendations/browse/roles-and-responsibilities/defining-the-role-of-authors-and-contributors.html) for authorship contributions. + +Notice that many categories and criteria for authorship, such as those represented in the NIH guidelines' sliding scale, may be decided differently from field to field. For example, in some fields providing financial resources for a research project always warrants authorship. In other fields this is not the case. + +Some projects may not produce traditional manuscripts as their outputs. For example, if software is a primary output from a project, there may be a need to define specific roles regarding code contributions. You can work with your research team to create a version of the CRediT Taxonomy for your project, such as the expanded version shared in the table above. + +When different kinds of contributorship have been identified, clarify how different contributors will be involved and acknowledged. This may include recommended communication and collaboration processes for the team members, as well as recognition and credit for different kinds of contributions they make. 
+ +**Additional Information** + +For additional tips on how to acknowledge different kinds of contributors to developing a resource including authorship, check out [Acknowledging Contributors The Turing Way](https://the-turing-way.netlify.app/community-handbook/acknowledgement.html). + +If working with online repositories such as GitHub, an app like '[all-contributors](https://allcontributors.org/)' bot is a great way to automate capturing all kinds of contributions, from fixing bugs to organizing events to improving accessibility in the project. + +More systematic work is being undertaken by [hidden REF](https://hidden-ref.org/) who constructed a broad set of [categories](https://hidden-ref.org/categories) that can be used for celebrating everyone who contributes to the research. + +There are several [infrastructure roles](https://the-turing-way.netlify.app/collaboration/research-infrastructure-roles.html) like community managers, data stewards, product managers, ethicists and science communicators, who are also being recognized as valued members in research projects with an intention to provide leadership paths for technical and subject matter experts, even when their contributions can’t always be assessed in tangible or traditional outputs \[[Mazumdar et al. 2015](https://journals.lww.com/academicmedicine/fulltext/2015/10000/evaluating_academic_scientists_collaborating_in.14.aspx), [Bennett et al., 2023](https://journal.trialanderror.org/pub/manifesto-rewarding-recognizing/release/1)\]. + +[The Declaration on Research Assessment](https://sfdora.org/) (DORA) is also a good resource to understand what researchers, institutions, funders and publishers can do to improve the ways in which researchers and the outputs of scholarly research are evaluated. + +## Combining Open Results for Scientific Reporting and Publications + +Scientific publications have traditionally remained one of the most popular modes of reporting and publication. 
Over the last decade, it has become a standard practice to submit pre-peer reviewed manuscripts on preprint servers (such as [arXiv](https://arxiv.org/)) to speed access to research before the peer-reviewed journal articles are published (discussed in Lesson 2). The publication system has also evolved massively. Journal articles are no longer about writing overview and summary of research, but can be used to share articles on software, data, education materials and more. + +### EHTC Case Study: Capturing Results on Activities Ranging From Collaboration to Observations, Image Generation to Interpretation + + + +The polarized image of the M87 black hole shadow as observed on 2017 April 11 by the EHT (left panel) and an image from the EHT Model Library with a MAD magnetic configuration (right pane), with a list of papers describing different sets of results. + +--- + +Across [several preprints](https://arxiv.org/search/astro-ph?searchtype=author&query=Event%2BHorizon%2BTelescope%2BCollaboration) and [eight peer-reviewed letters](https://iopscience.iop.org/journal/2041-8205/page/Focus_on_EHT), EHTC presented open results issued from different teams on instrumentation, observation, algorithm, software, modeling, and data management, providing the full scope of the project and the conclusions drawn to date. + +Open results such as reports, publications, code, white papers, press releases, blog posts, videos, TED talks and social media posts add to the comprehensive repertoire of open results supported by EHTC. Resources are centralized on the [EHTC website](https://eventhorizontelescope.org/), [GitHub organization](https://github.com/eventhorizontelescope) and [YouTube channel](https://www.youtube.com/%40ehtelescope) among others to provide easy access to all open results. + +It's important to highlight that their efforts have led to independent reanalysis and regeneration of black hole images. Specifically, [Patel et al. 
(2022)](https://arxiv.org/abs/2205.10267) not only reproduced the original finding, but also contributed additional documentation, code, and a computational environment as open-source containerized software package to ensure future testing. Some of the original authors reviewed this work and [made their comments also available online](https://quarxiv.authorea.com/users/557984/articles/607408-review-reproducibility-of-the-first-image-of-a-black-hole-in-the-galaxy-m87-from-the-event-horizon-telescope-eht-collaboration) (Authorea). + +### How Do I Connect Open Results to Make Reproducible Publications + +If not considered from the start, it can become challenging to ensure result reproducibility at the publication stage. Assuming that you have maintained open results considering their reproducibility, you can start assembling them to connect with the final reporting and publication with appropriate references to previous studies. + +- Before writing your manuscript, assess each output to make sure that appropriate license is attached for reuse, documentation has been provided and contributors are clearly listed. You can decide to create a version of the record and point to a permanent identifier such as via Zenodo so that the link never breaks when sharing them on a public repository (such as GitLab/GitHub) or manuscripts with a visible list of contributors. +- Your publications can be created individually (such as in EHTC case study) or by combining several outputs or pieces of information in manuscripts. These will include resource requirements, dependencies, software, data, repository where code is shared with documentation and contributor information, among other research artifacts. +- The manuscript itself will describe research questions, methods as well as individual figures and tables explaining the results. 
When writing a manuscript, you can begin with figures by packaging data, code and parameters used, ensuring that information represented can be reproduced. You can find a detailed checklist in the publication by [Gil et al.](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2015EA000136) (2016). + +As demonstrated in the EHTC case study, a final step towards making open results could be to create a meta article and/or simple website/git page that centralizes all your research outputs. Different parts of research (individual open results) can be accessed centrally with details including open recognition for all contributors. + +If you are looking for concrete actions you can take to make open results, pick one of these four items: + +- Improve how you define contributorship in your project and how authorship is assigned. +- Ensure the data or software in your paper is uploaded to Zenodo with license and documentation including metadata and that the DOI is posted to your scientific report and publication. +- Ensure that the process you use to collect data and perform its analysis, including all the dependencies and methods used in your data analysis pipeline, are clearly described to allow others to reproduce your results. +- Create a centralized repository or a simple git page to centralize all research outputs with contributors list. + +## Lesson 3: Summary + +The steps that we highlight to make open results are not intractable. In fact, the steps we have highlighted are things we can do on a regular basis to ensure that all research artifacts can be shared later as open and reproducible results. In this lesson we learned: + +- Approaches for making open results. +- The importance of collaboration in making results. +- How to recognize and credit all of the contributors who make results. +- How to combine different open results to create scientific reports and reproducible outputs. 
+ +## Lesson 3: Knowledge Check + +Answer the following questions to test what you have learned so far. + +*Question* + +**01/02** + +1. Which of the following roles would be most appropriately credited with contributorship? Select all that apply. + +- Original idea, planning, and input +- Supervision of the project +- Original experimental work +- Data analysis +- Drafting of manuscript + +*Question* + +**02/02** + +What is not an example of open research results? + +- Open access papers +- Conference presentation +- Internal team meeting notes +- Regular reports shared online +- Poster at a workshop +- Blog post +- Computational notebook on GitHub +- Figure with a DOI (e.g, Zenodo or Figshare) +- Pre-print of a paper \ No newline at end of file diff --git a/Open-Science-101/Module_5/Lesson_4/readme.md b/Open-Science-101/Module_5/Lesson_4/readme.md index 97608e7d..4f2191de 100644 --- a/Open-Science-101/Module_5/Lesson_4/readme.md +++ b/Open-Science-101/Module_5/Lesson_4/readme.md @@ -1 +1,397 @@ -# Just a test \ No newline at end of file +# Lesson 4: Sharing Open Results + +## Navigation + +* [When to Share](#when-to-share) +* [How to Share](#how-to-share) +* [Other Considerations When Sharing](#other-considerations-when-sharing) +* [Lesson 4: Summary](#lesson-4-summary) +* [Lesson 4: Knowledge Check](#lesson-4-knowledge-check) + +## Overview + +In lesson 3 you learned about how to make reproducible results. Now, we can finally think about how to best share those results. In this lesson we will place emphasis on publishing manuscripts as open access. You will learn what subtleties to consider when determining what journal to publish in, including how to make sense of a journal's policies on self-archiving. Finally, we discuss some commonly held concerns about sharing open access publications, and how to overcome them. Ultimately, we want to ensure that you have confidence in your decision to publish as open access. 
+ +## Learning Objectives + +After completing this lesson, you should be able to: + +- List ways that you can share open results to become a more collaborative, effective scientist. +- List different types of open access publications and considerations when sharing, such as licenses. +- List some of the concerns around open access publishing, including responsibilities for authors, the threat of predatory publishers, and the fear of being wrong. + +## When to Share + + + +Part of doing open science is enabling collaborative, interactive results. Sharing different types of research objects earlier in your research process helps increase the visibility of your work and accelerates your efforts by drawing from the collective knowledge of others. The internet has fundamentally changed the timing and manner in which scientists communicate results. + +Planning to share your intermediate results at the beginning of your project makes sharing final results easier. The figure above illustrates many of the different objects that can be shared before the 'final' report or publication. Sharing and talking about your research as you are doing it, as well as engaging with other scientists, will increase the robustness of your work. + +Ask questions. Share what you are working on. You will find that many involved in the scientific community want to help. The more you engage, the larger the audience and the more impact you will have when that 'final' publication is published. + +In past decades, scientists made new connections and sought collaborators through letters and at conferences. However, this way of doing science tended to restrict who could participate. Today, most of these discussions take place on the internet, which has enabled new avenues for participatory science, open to all. + +The platforms where you share research depend on what you want to share. Refer to the figure above and think about where you might share different types of information. 
How will this influence who you have an ability to engage with? + +Let's start with sharing in smaller groups (workshops and conferences) and move to larger audiences. There are distinct reasons for communicating results to different sizes of groups, as explored in the following sections. + +### At Workshops and Conferences + +Many of us attend scientific conferences, workshops, and other gatherings to discuss our science with peers. The costs associated with attendance and travel to these events may limit who has access to the material presented there. At these events, scientists often give talks or present posters that are not yet peer reviewed to invite feedback from the community and potentially recruit collaborators. These interactions are important for improving research projects, and are often done when a project is still ongoing so that researchers can gather feedback early in their scientific process. + +It is important to think about what audience you will be reaching at an event. Conferences have different policies about open access to materials presented at an event. Consider what you are sharing and who you want to share it with. For example, not all events provide long-term open access to workshop materials after the event. If you want to reach a larger audience or preserve the materials long-term, as a scientist, you have options to license and publish presented materials yourself (for example using Zenodo with a DOI) if an event doesn't do so. + +### Other Forms of Interactive Feedback + +Other forms of sharing can serve a similar purpose to share and document your results and/or software packages, and also allow for additional flexibility and openness! 
There are a number of additional resources that you can use: + +- Blog posts and online articles +- Short-form videos and podcasts +- Computational notebooks +- Social media posts +- Forum discussions + +These different pathways allow for the dissemination of null results, intermediate science updates, and/or software improvements. These alternative ways of sharing your work can benefit your research by facilitating extended dialogue between you and collaborators, and even the general public. Additionally, the public has easier access to these forms than they do to conferences. + +Here are some specific examples of engagement across contemporary platforms for scientific collaboration: + +- Blog posts such as the [Pangeo blog](https://medium.com/pangeo) - see examples of how to use different software tools for different science questions! +- Computational notebooks as a way to demo software techniques (e.g. the [Project Pythia Cookbook Gallery](https://cookbooks.projectpythia.org/) showcasing computational science workflows in the Earth sciences). +- Non-peer-reviewed publications, such as [Research Notes of the AAS](https://journals.aas.org/research-notes/).
+- Team and/or Mission Science Pages, such as the [LUVOIR team's page](https://asd.gsfc.nasa.gov/luvoir/) or the Juno [mission's page](https://www.missionjuno.swri.edu/). +- Conference proceedings, such as from the [Society of Photo-Optical Instrumentation Engineers](https://spie.org/publications/conference-proceedings). +- Social media posts: [https://twitter.com/MartianColonist/status/1706824699349488036](https://twitter.com/MartianColonist/status/1706824699349488036) + + **A 3-year study published in the Journal of Medical Internet Research found that highly tweeted articles were 11 times more likely to be highly cited than less tweeted articles.** + +### Publishing Reproducible Reports and Publications + +An open access report or paper can be reproducible when its data, software, and content are made available to readers following best practices. There is a growing list of resources documenting how to make open results reproducible (such as [The Turing Way](https://the-turing-way.netlify.app/reproducible-research/reproducible-research) and [FORRT](https://forrt.org/)). + +There are several examples (discussed in these lessons) that demonstrate how we can integrate technical and collaborative solutions to enable reproducibility. For example, executable notebooks allow interactivity and testing, training workshops invite feedback for improvement, and GitHub/GitLab enable community-based open review. + +**Scholarly Journals** + +Publishing work in a peer-reviewed journal is the traditional written basis of how we share our science, and is important for communicating scientific detail and rigor to colleagues. Academic journals also act as a long-term archive of scientific research papers. For many scientists, publishing in peer-reviewed journals and receiving citations are key factors in how they are evaluated for career advancement, position appointments, committee memberships, and honors.
+ +Traditionally, authors pay an Article Processing Charge (APC) that can range from \$200 to \$12,000 USD. Higher-profile journals often charge higher fees to authors. Accessing articles has traditionally been restricted by paywalls that require a subscription or charge per article. Journals have different options for making your published work accessible to various communities. + +**Who Has Access to Journal Subscriptions?** + +Paywalls limit who can access scientific research. This barrier acts to limit who can participate in science and erodes public trust in results. Part of open science is ensuring worldwide access to research. + +**Open Access Journals** + +Open access journals are peer-reviewed journals that are more accessible because they don’t require readers to have a subscription or pay to access the content. However, open access journals often require additional fees for the author. Open access peer-reviewed articles provide an archived, more formal discussion of scientific ideas, interpretations, and conclusions. They form the basis of how researchers share results. + +### Activity 4.1: Read the Open Access Policies of Publishers That You Use + + + +In this activity, you will learn how to access information about a journal’s data archive policies. The Directory of Open Access Journals (DOAJ) provides an extensive index of open access journals around the globe. The DOAJ can be used to look up information, including data archiving policies, for journals that publish research. Let’s open up this website and look up the policies specific to your most-used journals. + +1. First, navigate to the [DOAJ website](https://doaj.org/). +2. Type in the name of one of the following journals in the search box, and then click on the yellow "SEARCH" button. +- Atmospheric and Oceanic Science Letters +- Swiss Journal of Geosciences +- History of Geo- and Space Sciences

Note: You may input any journal desired but for this exercise use one of those listed to see the Sherpa/Romeo link that is listed in Step 5. +3. The search results may show more than one match. Select the desired journal within the search results by clicking on the journal name.

A dashboard appears, giving information regarding publication fees, waiver policies, the type of open license used, and other information on multiple displayed titles. +4. Click on the "archiving policy" link appearing in one of the displayed boxes as seen here. This will provide links to extensive information regarding the journal’s open access policies for the manuscript itself:

An extensive amount of information will be presented, including details on the publishing policies specific to the selected journal. +5. Alternatively, to get a more condensed view of the journal’s policies, return to the DOAJ dashboard on the About page with the multiple boxes displayed, and click on the "Sherpa/Romeo" link as shown here.
+6. On the Sherpa Romeo page, click on the journal name that is displayed in the list (the only journal displayed).
+7. When you view the page, you see that it consolidates and summarizes the open access policies for that journal and associated materials. The published version is likely to be the most relevant (see red box in figure).
+8. Review the page and determine which license the journal you selected has defined for reusability for manuscripts. + +#### Activity Key Takeaways: Read the Open Access Policies of Publishers That You Use + +This is an example of a site that you can use to determine if a journal’s policy is consistent with how you wish to publish your open access results. Journal policies should always be reviewed and considered during the early planning phase of your project and well before submitting your manuscript for publication. + +## How to Share + +Perhaps the single most important step to make your results open is to assign them a globally unique and persistent identifier. This will give you a single code, URL, or number that you can use to uniquely refer to a research object. Any derived research object can use this identifier to link to it and create a traceable and rich history of use and development. Crucially, this identifier can be used by others to cite and credit your work ([source](https://opensciency.github.io/sprint-content/open-results/lesson3-apply-open-results.html)). + +The identifier must also be persistent. This guarantees that the identifier points to the same research object for a long period of time. What counts as "persistent" is, of course, a matter of degree since even the most stable identifier probably won't survive the Sun engulfing the Earth in a few billion years. In this context, "persistent" implies that it is registered in a database managed by an organization or system that is committed to maintaining it as stable and backwards compatible for the foreseeable future. + +For example, URLs (for example, a personal website, GitHub repository, or cloud storage) are notoriously not persistent since they can change their contents frequently or become invalid without maintenance. On the other hand, Journal publications have a Digital Object Identifier (DOI) whose persistence is guaranteed by the International DOI Foundation. 
+ +As well as uniquely identifying each research object, it is important to be able to uniquely identify and cite all the authors and contributors. For this, it is recommended to get the permanent digital ID of each of the authors and contributors. [ORCID](https://orcid.org/) (Open Researcher and Contributor ID) is an online service where you can get a permanent digital identifier. + +There are examples of globally unique and persistent identifiers: + + + + + + + + + + + + + + +
DIGITAL OBJECT IDENTIFIER
10.1371/JOURNAL.PONE.0230416 ☑
ISBN-13: 978-0735619678THE INTERNET ARCHIVE
+

The Digital Object Identifier is provided by the International DOI Foundation, which ensures that each ID is unique and that a DOI link always resolves to the correct object.

+
+ + + + + + + + + + + + + + +
DIGITAL OBJECT IDENTIFIER
10.1371/JOURNAL.PONE.0230416
ISBN-13: 978-0735619678 ☑THE INTERNET ARCHIVE
+

This is an International Standard Book Number (ISBN), which publishers have to purchase from the International ISBN Agency.

+
+ + + + + + + + + + + + + + +
DIGITAL OBJECT IDENTIFIER
10.1371/JOURNAL.PONE.0230416
ISBN-13: 978-0735619678THE INTERNET ARCHIVE ☑
+

The Internet Archive captures snapshots of websites, and the links to those snapshots are very stable. Although not ideal, it is a handy tool for easily creating identifiers for websites.

+
+ +### Licenses + + + +By applying a license to your work, you make clear what others can do with the things you're sharing, as well as the conditions under which you're providing them (like the requirement to cite you). Another very important element to include with your research objects is clear rules for reuse (as is and for creating derivative work), which are often and most easily codified by the use of licenses. + +Without a license, all rights are with the author of the research result. That means nobody else can use, copy, distribute, or modify the work without consent. A license gives this consent. If you do not have a license for each of the research objects that constitute your research result, it is effectively unusable by the whole research community. + +Creative Commons licenses are usually used for written content (see Lesson 3 for a full description!). The benefit of a license, as opposed to the public domain, is that most require attribution to the original creators. The Creative Commons Attribution License, [CC-BY](https://wellcome.org/grant-funding/guidance/open-access-guidance/creative-commons-attribution-licence-cc), is the most common open access license for sharing publications as it requires attribution. There are other Creative Commons licenses used that may have different limitations on whether or not they can be commercially used, whether or not they can be modified and copied, and whether or not the licenses can be changed in further adaptations of code. + +Your institutions, funding agency, or research proposal may require use of a specific license depending on the type of material that you produce from your research. For public agencies, CC-0 or CC-BY are generally recommended (or required) to maximize their return on investment and ensure the widest possible re-use. Choosing a CC license that has additional restrictions (e.g., -ND, -SA, -NC) can result in less reuse of data. 
As you share results on different platforms, look carefully to see what license is being applied! + +### Routes for Open Access Publishing + + + +Routes to publishing openly. The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 license. Original version on Zenodo. [http://doi.org/10.5281/zenodo.5706310](http://doi.org/10.5281/zenodo.5706310) + +--- + +The most common types of open access publishing are Green, Gold, and Diamond. + + + + + + + + + + + + + + +
GOLD OPEN ACCESS PUBLISHING ☑GREEN OPEN ACCESS PUBLISHINGDIAMOND OPEN ACCESS PUBLISHING
+

In Gold Open Access publishing, authors pay an Article Processing Charge (APC) to a journal so that it publishes the final version of their article under an open access license, making it permanently and freely available online to anyone. Authors retain the copyright of their article, usually under a Creative Commons license of their choice, which dictates what others can do with the article. A common criticism of Gold Open Access publishing is the cost.

+

APCs are generally around 2000 USD, and in some cases more, which can be prohibitive for authors across the globe. Some publishers offer discounts or waivers to authors from countries classified by the World Bank as low-income economies, and APCs may be covered by your funder as part of your grant.

+
+ + + + + + + + + + + + + + +
GOLD OPEN ACCESS PUBLISHINGGREEN OPEN ACCESS PUBLISHING ☑DIAMOND OPEN ACCESS PUBLISHING
+

Green Open Access is the process of self-archiving. The self-archiving movement aims to provide tools and assistance to scholars to deposit and disseminate their refereed journal articles in open institutional or subject-based repositories. You may choose to self-archive your work to make it more discoverable and/or after you’ve published it in a subscription journal to ensure there is an open version of your paper.

+

The Registry of Open Access Repositories contains a list of repositories that are available for researchers to self-archive. At the beginning of 2019, there were more than 4000 repositories. It is important to find your self-archive community!

+
+ + + + + + + + + + + + + + +
GOLD OPEN ACCESS PUBLISHINGGREEN OPEN ACCESS PUBLISHINGDIAMOND OPEN ACCESS PUBLISHING ☑
+

Diamond Open Access publications charge neither for reading nor for publishing an article. Diamond Open Access journals either have very low costs due to building on existing infrastructure and volunteer efforts, or are supported directly by foundations or institutions. For authors, Diamond Open Access publications typically allow the author to retain copyright, and the final version of their article is published under an open access license.

+
+ +### Pros and Cons of Preprints + +When publishing in a peer-reviewed journal, you can decide to share a pre-print. A preprint is a version of a paper prior to its publication in a journal\*. This can be the author’s version of the accepted manuscript after peer review or a version prior to submission to a journal. + + **The accepted manuscript is the final, peer-reviewed version of the article that has been accepted for publication by a publisher. The accepted manuscript includes all changes made during the peer review process and contains the same content as the final published article, but it does not include the publisher’s copy editing, stylistic, or formatting edits that will appear in the final journal publication (i.e., the version of record).** + +**Source: https://science.nasa.gov/researchers/sara/faqs/osdmp.** + + +Many journals provide preprint services. If they don’t, there are many public preprint servers available. Often the funding agency will have a preferred public preprint server. + +Preprints come with many advantages as well as perceived or potential disadvantages. + + + + + + + + + + + + + +
ADVANTAGES TO PUBLISHING WORK AS A PRE-PRINT ☑POTENTIAL DISADVANTAGES
+
    +
  • Quickly disseminate findings to communities in a timely manner.
  • +
  • Many field-specific preprint servers (e.g. arxiv.org, biorxiv.org, essoar.org) are free to both upload and read.
  • +
  • Community feedback on your work as it's being done.
  • +
+
+ + + + + + + + + + + + + +
ADVANTAGES TO PUBLISHING WORK AS A PRE-PRINTPOTENTIAL DISADVANTAGES ☑
+
    +
  • Work may be shared with critical errors that may have been caught in peer review.
  • +
  • In some fields, there is a perception of lessened reliability or quality of research published as a preprint.
  • +
  • Some journals do not allow or accept articles if they have been submitted to a preprint server.
  • +
+
+ +### What to Consider When Making Preprints + +When deciding to preprint your work, you will need to check: + +1. The copyright policy of the journal with which you aim to publish. +2. The version of the paper that can be deposited. +3. When the paper is allowed to be made publicly available. + +#### Additional Reading: + +Read the [story](https://pubs.aip.org/physicstoday/Online/29310/Joanne-Cohn-and-the-email-list-that-led-to-arXiv) about how Joanne Cohn's email list for preprints led to Paul Ginsparg's development of [arXiv](https://arxiv.org/). + +## Other Considerations When Sharing + +### Who is Sharing? + +When writing an OSDMP, it’s important to include a plan for the roles and responsibilities needed to share your results. As discussed in Lesson 3, your community will consist of members in different roles – some actively engaged, some with only a passing interest. Having a clear plan for sharing open results and how credit will be given will help everyone understand their contributions and roles and minimize conflict. + +Lesson 3 describes in detail the different roles that people may play in sharing results. These roles should be clearly described in the OSDMP. + +### Predatory Publishers + +Predatory publishers are generally for-profit publishers that charge a publishing fee but provide few of the quality checks that would be expected of scholarly publications. They sometimes use the benefit of open access to entice authors to publish with them. If you are unsure if a publisher may be predatory, checking with your library staff is a good place to start. + +Requests from predatory publishers often show one or more of these red flags: + +- There is urgency and a request for an extremely quick turnaround. A very fast publication time might indicate a less rigorous peer-review process. +- Written English in correspondence is often of poor quality, with many grammatical errors.
(Though it’s important to remember that this alone does not indicate predatory behavior, as grammatical mistakes can be made for innocent reasons, such as being a non-native speaker.) +- The journal subject is nonspecific. +- The solicitation is inaccurate or generic. +- The email is often unsolicited, even if they claim that they're referring to a previous paper of yours. It might open with an inaccurate or generic salutation such as "professor". +- They emphasize ISSN indexing and/or impact factors, even though the journal in question does not have one. Consider Journal Citation Indicator (JCI) in addition to Journal Impact Factor (JIF). +- The publisher/journal sends multiple emails soliciting manuscripts, special issues, and editorial roles. +- They have a high number of special issues, such that the majority of the papers published appear in special issues. +- Their name resembles the name of a prestigious journal. +- They have a high self-citation rate, such as over 20%. +- They have a very high acceptance rate of submitted papers. +- They send frequent requests to submit/serve as editor. + +A final thought on deciding where to publish: as with many considerations you will encounter in academia, the best place to publish is sometimes determined by word-of-mouth conversations with peers. Read more on NOAA's [guidance on predatory publishing](https://libguides.library.noaa.gov/predatorypublishing). + +### Common Questions About Sharing Results + +Sharing in different ways, especially without peer review, can be intimidating. Maybe you have worried about the following questions: + +- **What if an open result is wrong?**
A tweet, post, or video is only a snapshot in time of a research result. It is understood by all working professional scientists that we are constantly learning and discovering new things. Making reproducible results will necessarily include different versions and revisions of an idea as it develops. + +- **I have already published my science as an open result, so do I need to respond to community feedback forever?**
As long as you have done everything you can to make your work reproducible, you don’t need to worry. Open science can’t be carried solely by a single person. Open science communities can continue to update, refine, and develop your open science result if your work has been shared and openly licensed.

If you are able to address a question or a concern about your prior research, that’s great. It is also an ethical response to acknowledge that this is research that you are no longer actively involved with, but allow others to continue the work that you began. + +- **What if I can't do everything? Am I a bad open scientist?**
The short answer is no! You have only a limited amount of time. Even with collaborators, you can’t possibly do everything.

+ +Sharing open results improves science: it makes science faster, more accessible, and more collaborative. In this lesson you have learned about all the different ways you can share open results. Think about how you might share something you are working on now! + +## Lesson 4: Summary + +In this lesson, you learned: + +- When to share open results and the different ways in which they can be shared. These include peer-reviewed publications, conference proceedings, blog posts, videos, notebooks, and social media. +- How to share open results, including considerations around the license for the publication, routes for open access publications (Green, Gold, Diamond), and preprints as part of the publication process. +- Considerations around sharing, including predatory publishers and common concerns about openly sharing results. + +## Lesson 4: Knowledge Check + +Answer the following questions to test what you have learned so far. + +*Question* + +**01/03** + +Which of the following Creative Commons licenses is most commonly used for open access publications? + +- CC BY-NC-SA +- Copyright +- CC-BY +- Apache 2.0 + +*Question* + +**02/03** + +Read the statement below and decide whether it's true or false. + +*With Diamond open access, scientific articles are free both to publish and to read.* + +- True +- False + +*Question* + +**03/03** + +Take a close look at the request for journal submission below. Does this request for journal submission seem reliable?
+ + + +- Yes +- No diff --git a/Open-Science-101/Module_5/Lesson_5/readme.md b/Open-Science-101/Module_5/Lesson_5/readme.md index 97608e7d..b52356e9 100644 --- a/Open-Science-101/Module_5/Lesson_5/readme.md +++ b/Open-Science-101/Module_5/Lesson_5/readme.md @@ -1 +1,327 @@ -# Just a test \ No newline at end of file +# Lesson 5: From Theory to Practice + +## Navigation + +* [Writing an OSDMP: What to Include in the OSDMP for Sharing Results Openly](#writing-an-osdmp-what-to-include-in-the-osdmp-for-sharing-results-openly) +* [Example Steps Toward More Open Results](#example-steps-toward-more-open-results) +* [How Emerging Technology Like AI is Changing How We Do Science](#how-emerging-technology-like-ai-is-changing-how-we-do-science) +* [Lesson 5: Summary](#lesson-5-summary) +* [Lesson 5: Knowledge Check](#lesson-5-knowledge-check) +* [Open Results Summary](#open-results-summary) +* [Open Science 101 Summary](#open-science-101-summary) + +## Overview + +In the previous lessons, we learned about various ways to share our science, and what steps we should think about when sharing. In this lesson, we tie the concepts from previous lessons together with some specific guidance for writing the Sharing Results section of an Open Science and Data Management Plan (OSDMP). We will also reflect on how our society and technology constantly evolve, as does the way we do science. A new technology with the potential to radically alter the way we do and share science is artificial intelligence (AI), particularly when it comes to large language models. These AI tools are already changing how we interact with written text. In this lesson, we discuss some of the ways that AI is affecting, and will continue to affect, how we do and share our science. + +## Learning Objectives + +After completing this lesson, you should be able to: + +- List what to include in an OSDMP for sharing results openly. +- List some concrete steps toward sharing results openly.
+- Describe how emerging technology like AI is currently impacting how we use, make, and share our science. + +## Writing an OSDMP: What to Include in the OSDMP for Sharing Results Openly + +The process within an Open Science and Data Management Plan (OSDMP) to share data and software is covered in other modules, so here we will discuss how to share other types of research outputs. Most proposals require that you include plans for publications such as peer-reviewed manuscripts, technical reports, books, and conference materials. + +Though not required, it can be a good idea to include plans for making your results publicly accessible in ways other than traditional publishing, e.g. online blog posts, tutorials, or other materials. After all, writing an OSDMP is often required for funding requests, and this can be a way to show proposal reviewers that you are thinking about how to best share your science. + +### Activity 5.1: Pen to Paper + + + +Write a sample results section of an OSDMP that details how you would plan to make your results open. Think about an example from your research and what details you would need to include to convince reviewers that you will share open access results. + +**Example 1:** This activity will result in 2 peer-reviewed publications that will be published green open-access. Pre-prints will be archived in PubSpace. + +**Example 2:** This activity will result in the creation of computational notebooks, 4 conference abstracts and posters, 2 peer-reviewed manuscripts, and 2 online plain-language articles summarizing our results. Peer-reviewed publications will be published green open-access and pre-prints will be archived in PubSpace or the journal's open-access preprint server. All other materials will be archived at Zenodo, assigned a DOI, and assigned a CC-BY license or permissive software license. + +For these examples, what other information or details could be added?
If you were planning to write a tutorial about your science, what would you include? + +## Example Steps Toward More Open Results + +[**NASA Announces Summer 2023 Hottest on Record**](https://www.nasa.gov/news-release/nasa-announces-summer-2023-hottest-on-record/) + + + +Image credit: NASA Earth Observatory/Lauren Dauphin. + +--- + +When results and research objects are published openly, anyone can reproduce the scientific result. For topics like climate change, the transparency of results helps reduce misinformation and increases public trust in results. + +Here is a GitHub [repository](https://github.com/jmunroe/OpenScienceExample_GISTEMPv4/tree/main) with an example of a [result](https://gist.github.com/jmunroe/74a1eda18d1473040ed91f2a1f02b1b5) made available as open access. This visualization is not perfect but provides a snapshot of a work in progress that can be shared with the community for feedback and refinement. This could be further refined, or perhaps serve as the start of a new effort that will extend the initial results. The results are more accessible, inclusive, and reproducible by being published openly. + +There are lots of ways that open science can extend the span or scope of projects. Here are some steps you can take to share your open results in a way that makes your work more usable, reproducible, and inclusive: + +- Add a Code of Conduct via the CODE_OF_CONDUCT file and link to other policies that apply to your work. +- Add contributor and authorship guidelines via a CONTRIBUTING file. +- Add your collaborators and team members' names with their permission. +- Add your proposal but remove any sensitive information. +- Create a preliminary roadmap that states the goals the project is trying to achieve. +- Create project management, code, and data folders where you can upload appropriate information as your project develops. +- Create a list of the resources that your project requires.
+- Provide links to training materials that your collaborators and contributors may benefit from. +- Use issues and project boards to communicate what is happening in the project. +- Use pull requests to invite reviews of new code and content. +- Add a user manual and executable notebooks to allow code testing. +- Create and share executable notebooks that document how data is processed and how the results are obtained. +- Create tutorials or short-form videos demonstrating how a step in your research workflow was accomplished. +- Write a blog post about your experience wrestling with a particular research challenge and how you solved it. +- Contribute to documentation to improve the open-source tools based on your own experience. +- Connect your repository to Binder to allow online testing of your code and executable notebooks. +- Link all the outputs that are generated outside this repository (such as blogs, videos, forum posts, and podcasts, as discussed above). +- Some advanced steps that should be applied as the project develops include continuous integration, containerization, a Citation File Format (CITATION.cff) file, and the creation of a simple web page to link all information. + +## How Emerging Technology Like AI is Changing How We Do Science + +Throughout these modules, the internet has been identified as a fundamental disruptive technology that changed how almost all of science is accomplished. Scientists rarely go to libraries to read the latest journal articles. Data is no longer mailed around the world on tape drives. Software isn’t shared via floppy disks. The internet helped create the modern scientific workflow and made science more interactive and accessible. Now AI tools are starting to disrupt science in a similar manner. AI is not only revolutionizing many aspects of our lives, it is also changing how we do science.
As companies race to create and integrate new generative AI tools into every aspect of our lives, many scientists, institutions, journal publishers, and agencies are looking to see how to use these tools effectively, understand their reliability, accuracy, biases, and how to also use these cutting edge tools ethically. An additional concern is how any information shared with AI tools may be used to intentionally or unintentionally disclose confidential data, leading to privacy concerns. + +AI can help us use and share research. It can act as an accelerant, taking care of tedious tasks while leaving scientists free for more creative thought. These tools are better than humans at processing vast amounts of data, but humans are better at creative and nuanced thought. This is important to consider when determining whether or not to use AI. As an example, many people already use AI tools to help with their inbox management and writing emails with AI generated suggested content. Within science, there are many potential tasks that could potentially be expedited using AI, according to three studies published in Nature: + +- [AI science search engines are exploding in number — are they any good?](https://www.nature.com/articles/d41586-023-01273-w) +- [How AI technology can tame the scientific literature](https://www.nature.com/articles/d41586-018-06617-5) +- [AI and science: what 1,600 researchers think](https://www.nature.com/articles/d41586-023-02980-0) + +### Using AI: + + + + + + + + + + + + + + +
LITERATURE REVIEWS ☑SEARCHING FOR RELEVANT DATASETS AND SOFTWARE TOOLSLANGUAGE BARRIERS
+

The ever-increasing volume of scientific literature has made it challenging for researchers to stay abreast of recent articles and find relevant older ones. AI tools can be used to create personalized recommendations for relevant articles as well as create summaries of them in various formats. Some examples of these tools include SciSummary, SummarizeBot, Scholarcy, Paper Digest, Lynx AI, TLDR This.

+

Possible drawbacks when using these tools include:

+
- Potential introduction of biases
- Insufficient contextual understanding or interpretation
- Possible inability to handle complex technical language
- Incorrect identification of key points
+
+ + + + + + + + + + + + + + +
LITERATURE REVIEWS | SEARCHING FOR RELEVANT DATASETS AND SOFTWARE TOOLS ☑ | LANGUAGE BARRIERS
+

AI tools can be used to discover different datasets that may be relevant to a scientific query and recommend relevant software libraries.

+
+ + + + + + + + + + + + + + +
LITERATURE REVIEWS | SEARCHING FOR RELEVANT DATASETS AND SOFTWARE TOOLS | LANGUAGE BARRIERS ☑
+

AI tools can be used to create automatic translations into different languages. Several of the tools above also offer translation.

+
+ +### Making with AI: + + + + + + + + + + + + + +
CODE ☑ | RESULTS
+

AI tools can be used to generate code to perform analysis tasks and to translate between programming languages. Some examples of these tools include GitHub Copilot, Codex, ChatGPT, and AlphaCode.

+

Usage tip: Popular large language models can generate code, but many users report that breaking a task into smaller steps and writing careful prompts produces better results.
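
The tip above can be made concrete with a minimal sketch. It assumes nothing about any particular AI tool: it only builds the prompt text, and the `PromptStep` class, the `build_prompts` function, and the example task are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class PromptStep:
    """One small, focused request to a code-generating model (hypothetical helper)."""
    goal: str         # what this step should produce
    constraints: str  # language, libraries, and input/output details


def build_prompts(task: str, steps: list[PromptStep]) -> list[str]:
    """Turn one large task into a numbered sequence of focused prompts."""
    total = len(steps)
    prompts = []
    for number, step in enumerate(steps, start=1):
        prompts.append(
            f"Overall task: {task}\n"
            f"Step {number} of {total}: {step.goal}\n"
            f"Constraints: {step.constraints}\n"
            "Return only the code for this step, with comments."
        )
    return prompts


# Illustrative task and steps; the file layout and column names are made up.
steps = [
    PromptStep("Load a CSV file of daily temperatures into a table.",
               "Python with pandas; columns are 'date' and 'temp_c'."),
    PromptStep("Compute the mean temperature for each month.",
               "Reuse the table from step 1; return a new table."),
    PromptStep("Plot the monthly means as a line chart.",
               "Use matplotlib; label both axes and add a title."),
]

prompts = build_prompts("Plot monthly mean temperature from a CSV file", steps)
for prompt in prompts:
    print(prompt, end="\n\n")
```

Each prompt would be sent separately, and the code returned for one step should be read and tested before moving on to the next, since authors remain responsible for the correctness of anything an AI tool generates.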

+
+ + + + + + + + + + + + + +
CODE | RESULTS ☑
+

AI tools can be used to generate text, summarize background materials, develop key points, create images and figures, and draft conclusions. Using these tools may help non-native speakers communicate science more clearly in different languages. They can also help draft plain-language summaries, blog posts, and social media posts.

+

Some possible drawbacks when using these tools:

+
- The drawbacks listed above for literature reviews.
- Factual and commonsense reasoning mistakes, because these tools do not (at this time) have the type of cognition or perception needed to understand language and its relationship to the external physical, biological, and social world (cite: https://www.tandfonline.com/doi/full/10.1080/08989621.2023.2168535).
+
+ + +### Sharing with AI: + +- Results - AI/ML models are increasingly being used in research. When sharing results, follow best practices as outlined in the [Ethical and Responsible Use of AI/ML in the Earth, Space, and Environmental Sciences](https://essopenarchive.org/users/536571/articles/635008-ethical-and-responsible-use-of-ai-ml-in-the-earth-space-and-environmental-sciences) article. +- Incremental prompting can help create an outline for your research article. An example can be found on [X](https://twitter.com/MushtaqBilalPhD/status/1640243808851075072?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1640243808851075072%7Ctwgr%5E86f4269a3a6f05f7927bdb57e4f45654f827dc44%7Ctwcon%5Es1_&ref_url=https%3A%2F%2Fwww.euronews.com%2Fnext%2F2023%2F08%2F07%2Fbest-ai-tools-academic-research-chatgpt-consensus-chatpdf-elicit-research-rabbit-scite). +- AI tools can help identify where to share results and help write social media or other short posts based on your article. + +### Cautions About Use of AI Tools + +Journals are increasingly implementing guidelines and requirements concerning the use of AI tools during the writing process. Many require that the use of AI tools for writing, image creation, or other elements be disclosed and the method of use identified. As is the case with all other material within an article, authors are fully responsible for ensuring that content is correct. Examples of this policy can be read in the AI guidelines of [Nature](https://www.nature.com/nature-portfolio/editorial-policies/ai) and [NCBI](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10318315/). + +Furthermore, there are numerous examples of generative AI (for both code and content) delivering plagiarized information in violation of licenses, as well as fabricating material, including citations. Using these AI tools may lead to findings of academic and research misconduct if AI-generated material contains fabrication, falsification, or plagiarism. 
So **be careful**. Learn more about possible issues with AI in [this Nature example](https://www.nature.com/nature-index/news/artificial-intelligence-writing-tools-promise-faster-manuscripts-for-researchers). + +At this time, and for these reasons, AI tools are generally not allowed in grant applications or in peer review or proposal review activities. + +The National Institutes of Health (NIH) has prohibited "scientific peer reviewers from using natural language processors, large language models, or other generative Artificial Intelligence (AI) technologies for analyzing and formulating peer review critiques for grant applications and R&D contract proposals." Utilizing AI in the peer review process is a breach of confidentiality because these tools "have no guarantee of where data is being sent, saved, viewed or used in the future." Using AI tools to help draft a critique, or to improve the grammar and syntax of a draft critique, is also considered a breach of confidentiality. Read [NIH's AI policy](https://grants.nih.gov/grants/guide/notice-files/NOT-OD-23-149.html). + +AI tools for science are developing rapidly. The science community's understanding of how to use AI ethically and safely is still developing, even as its use in research expands. The guidelines above offer a snapshot in time and will likely continue to evolve. If you choose to use these tools for scientific research, carefully consider how much to rely on them and how their biases may impact results, as cautioned in [this Nature article](https://www.nature.com/articles/d41586-023-02980-0). The internet has transformed the world, and AI tools are likely to do the same. As with any tool, it is important that they are used for the appropriate purpose and in an ethical manner. + +## Lesson 5: Summary + +The steps that we highlight to make your research more reproducible and open will advance science and the impact of your research. 
In fact, these are steps you can take immediately to make your results open and reproducible. + +In this lesson, you learned: + +- How to include open results in the OSDMP. +- An example of how results can be shared openly. +- That AI tools are being used in all parts of the scientific workflow, are changing rapidly, and still raise many open questions about how and when to use them. + +## Lesson 5: Knowledge Check + +Answer the following questions to test what you have learned so far. + +*Question* + +**01/03** + +Read the statement below and decide whether it's true or false. + +*It is a good idea to include plans in your OSDMP for making your results available in ways outside of traditional publishing, e.g. online blog posts or tutorials.* + +- True +- False + +*Question* + +**02/03** + +Which of the following aspects of AI are considered benefits? Select all that apply. + +- Personalized journal article recommendations based on your discipline and interests +- Recommendations for data and software relevant to your science project +- Potential introduction of bias +- Factual mistakes +- Translation between languages + +*Question* + +**03/03** + +Which of the following are steps you can take to share your open results online? Assume that, as in the activity, you are sharing an interactive visualization. + +- Host your project in a public GitHub repository +- Assign an open license +- Add a code of conduct to the GitHub repository +- Add a user manual +- Release your project on public repositories that assign DOIs +- All of the above + +## Open Results Summary + +### Moving Toward an Open, Collaborative, and Inclusive Scientific Future + +Science is meant to benefit society. Sharing our science helps ensure that it benefits society and informs the decisions of the public and policymakers, especially when funded by public agencies or governments. 
Recall the definition from the 'Ethos of Open Science' module: + + + +"Open Science is the principle and practice of making research products and +processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility, and equity" + +**https://open.science.gov/** + +--- + +Throughout this curriculum, we have focused on skills needed to make research products and processes available to all. The traditional practice of only sharing results limits insight into how science is done and may limit who can participate in science. By sharing your scientific process and working openly, you advance all of science in a more rapid and inclusive way. This curriculum will continue to evolve as science evolves, and we welcome your contributions! + +**Learn more about NASA's transformation to open science and join the conversation by following the link.** + +[CLICK TO LEARN](https://nasa.github.io/Transform-to-Open-Science/) + +## Open Science 101 Summary + +Congratulations! You have successfully completed Open Science 101! Thank you for taking the time to learn about open science - you are part of a broader movement to improve science and make our world better! + +Ready to learn more? Here are some great next steps: + +### Learn more about and engage with TOPS! + +**TOPS website** + +[CLICK TO LEARN](https://nasa.github.io/Transform-to-Open-Science/) + +**TOPS GitHub Discussion Forum** + +[CLICK TO LEARN](https://github.com/nasa/Transform-to-Open-Science/discussions) + +### Learn more through online courses: + +**OpenSciency** + +[CLICK TO LEARN](https://opensciency.github.io/sprint-content/) + +**Open Science MOOC** + +[CLICK TO LEARN](https://opensciencemooc.eu/) + +### Take your coding and data science skills to the next level! 
+ +**Carpentries** + +[CLICK TO LEARN](https://carpentries.org/) + +### Read online guides and learn about ongoing open science community initiatives: + +**The Turing Way** + +[CLICK TO LEARN](https://the-turing-way.netlify.app/index.html) + +**Center for Open Science** + +[CLICK TO LEARN](https://www.cos.io/) + +**Open Science NL** + +[CLICK TO LEARN](https://www.openscience.nl/en) + +This is just a start - there are many more fantastic open science resources online! Keep an eye out for discipline-specific learning content that is currently being developed by NASA ScienceCore grantees and that will be linked here once available! + +### Disclaimer + +Please note that we reference several papers throughout the course, and some of them may be behind a paywall. If you would like a copy of a paywalled paper, please contact the author or search for it in an online preprint archive such as [bioRxiv.org](http://biorxiv.org/). \ No newline at end of file