feat: change grep & redirection exercise to use metadata instead of fastq #371

ameynert · 2025-09-19T18:57:27Z

If this pull request addresses an open issue on the repository, please add 'Closes #NN' below, where NN is the issue number.

Closes #230
Closes #316
Closes #369

Please briefly summarise the changes made in the pull request, and the reason(s) for making these changes.

Changed the grep examples in Episode 4 for redirection to use the SRA metadata file instead of FASTQ. This gets around the issue of grep on different platforms behaving differently with the FASTQ examples as in #316, removes the issue of appending empty output as in #369 , and is a more realistic use of the tool grep and redirection as addressed in #230.

If any relevant discussions have taken place elsewhere, please provide links to these.

See the issues linked.

…astq

github-actions · 2025-09-19T18:57:37Z

Thank you!

Thank you for your pull request 😃

🤖 This automated message can help you check the rendered files in your submission for clarity. If you have any questions, please feel free to open an issue in {sandpaper}.

If you have files that automatically render output (e.g. R Markdown), then you should check for the following:

🎯 correct output
🖼️ correct figures
❓ new warnings
‼️ new errors

Rendered Changes

🔍 Inspect the changes: https://github.com/datacarpentry/shell-genomics/compare/md-outputs..md-outputs-PR-371

The following changes were observed in the rendered markdown documents:

 04-redirection.md     | 200 +++++++++++++-------------------------------------
 05-writing-scripts.md |   2 +
 md5sum.txt            |   4 +-
 3 files changed, 55 insertions(+), 151 deletions(-)

What does this mean?

If you have source files that require output and figures to be generated (e.g. R Markdown), then it is important to make sure the generated figures and output are reproducible.

This output provides a way for you to inspect the output in a diff-friendly manner so that it's easy to see the changes that occur due to new software versions or randomisation.

⏱️ Updated at 2025-09-19 18:58:28 +0000

ameynert · 2025-09-19T18:58:06Z

episodes/04-redirection.md

-also be a FASTQ file? The answer is, yes - it will be a FASTQ file and it would make sense to
-name it with a `.fastq` extension. However, using a `.fastq` extension will lead us to problems
-when we move to using wildcards later in this episode. We'll point out where this becomes
-important. For now, it's good that you're thinking about file extensions!


This is consolidated into the new callout on wildcards below.

ameynert · 2025-09-19T18:58:34Z

episodes/04-redirection.md

-```
-
-```output
-249


Introducting shell arithmetic is too advanced for this lesson.

ameynert · 2025-09-19T18:59:27Z

episodes/04-redirection.md

-four to get the number of sequences that match our search pattern. Since 802 / 4 = 200.5 and we
-are expecting an integer number of records, there is something added or missing in `bad_reads.txt`.
-If we explore `bad_reads.txt` using `less`, we might be able to notice what is causing the uneven
-number of lines. Luckily, this issue happens by the end of the file so we can also spot it with `tail`.


This whole piece of the exercise is extraneous information for beginners.

ameynert · 2025-09-19T19:00:09Z

episodes/04-redirection.md

 ```

-The `-v` option in the second `grep` search stands for `--invert-match` meaning `grep` will now only display the
-lines which do not match the searched pattern, in this case `'^--'`. The caret (`^`) is an **anchoring**


This intro to the caret usage and the single quote enclosing was moved to the relevant place in episode 5 below.

ameynert · 2025-09-19T19:05:26Z

@ggrimes Could you have the curriculum advisory committee take a look at this please? It's a fairly large change and I would like to get as many eyes as possible on it before we look to merge.

ggrimes · 2025-09-26T13:40:50Z

@tobyhodges how do i get the curriculum advisory committee to convene this process this

tobyhodges · 2025-09-26T14:25:48Z

@datacarpentry/curriculum-advisors-genomics please could one or more of you review this/discuss it?

ggrimes · 2025-10-03T13:51:50Z

@datacarpentry/curriculum-advisors-genomics please could one or more of you review this/discuss it?

I am happy to review it but on't seem to have the privileges

ameynert · 2025-10-03T15:29:53Z

@datacarpentry/curriculum-advisors-genomics please could one or more of you review this/discuss it?

I am happy to review it but on't seem to have the privileges

I'll ask on the maintainer's Slack.

sstevens2

Excited to see this PR! It seems to be a good start for changing the example in the redirection section. I left some comments and small language changes. Would be nice if we didn't have to switch back and forth between the two data folders, if possible.

I'm not sure it fully addresses the issues in that the grep example in the writing scripts is still using the bad reads example, which is somewhat artificial as an example and in contrast with the fastqc and trimmomatic best practice taught in the data wrangling lesson. Working with the metadata there is perhaps not a better example though. I'd have to think more about what might be a good example to swap out that example. Maybe something like having them pull out all their read names into a file with the name of the sample in it or as part of the filename?

sstevens2 · 2025-10-06T13:00:49Z

episodes/04-redirection.md

+Let's try out this command to look for particular samples in the SRA metadata file and copy the
+output to another file called `metadata.txt`.


Suggested change

Let's try out this command to look for particular samples in the SRA metadata file and copy the

output to another file called `metadata.txt`.

Let's try out this command to pull out particular samples from the SRA metadata file and save the metadata for those samples to another file called `metadata.txt`.

sstevens2 · 2025-10-06T15:30:40Z

episodes/04-redirection.md

+$ grep PAIRED SraRunTable.txt > metadata.txt
+$ grep SINGLE SraRunTable.txt >> metadata.txt


I can't quite find the SraRunTable.txt to check but doesn't this just remake the input again since they are all either paired or single? (except the header maybe?)
An alternative might be to grep the header and a sample name or header and a couple samples and >> them to the same output file.
In this example, I would probably rename the output to keep them as separate files, one for the PAIRED and one for SINGLE, especially since we already have them in the input file.

sstevens2 · 2025-10-06T15:35:25Z

episodes/04-redirection.md

+:::::::::::::::::::::::::::::::::::::::::  callout

-The output of our second call to `wc` shows that we have not overwritten our original data.
+## Using `grep` with wildcards


This switching back has me wondering, should this whole episode be on the metadata to avoid switching?
If you want to keep it as it is above, switching to metadata in the redirection section, this section could be removed so you don't have to switch back and forth. Not sure the example with multiple files is fully needed here.

sstevens2 · 2025-10-06T15:39:29Z

episodes/04-redirection.md

-The fifth and six lines in the output display "--" which is the default action for `grep` to separate groups of
-lines matching the pattern, and indicate groups of lines which did not match the pattern so are not displayed.
-To fix this issue, we can redirect the output of grep to a second instance of `grep` as follows.
+The `-v` option for `grep` search stands for `--invert-match` meaning `grep` will now only display the


Suggested change

The `-v` option for `grep` search stands for `--invert-match` meaning `grep` will now only display the

Sometimes we want to find all the lines that don't match a specific pattern, rather than trying to construct a complex pattern with many options..

The `-v` option for `grep` search stands for `--invert-match` meaning `grep` will now only display the

lines which do not match the searched pattern.

```bash

$ grep -v SINGLE SraRunTable.txt | less

I am in favour of using the SraRunTable.txt as the primary example for the grep exercise. This feels more like what you would would do in a reality. I would prefer not using the fastq files as examples for grep at all as it's a bit of tricky format for newbies.

naupaka · 2025-10-07T13:19:31Z

Thank you for the PR @ameynert! I'll take a look through this ASAP

feat: change grep & redirection exercise to use metadata instead of f…

04e8b45

…astq

ameynert self-assigned this Sep 19, 2025

ameynert commented Sep 19, 2025

View reviewed changes

github-actions bot pushed a commit that referenced this pull request Sep 19, 2025

differences for PR #371

beed2ae

ameynert commented Sep 19, 2025

View reviewed changes

sstevens2 reviewed Oct 6, 2025

View reviewed changes

		Let's try out this command to look for particular samples in the SRA metadata file and copy the
		output to another file called `metadata.txt`.

	Let's try out this command to look for particular samples in the SRA metadata file and copy the
	output to another file called `metadata.txt`.
	Let's try out this command to pull out particular samples from the SRA metadata file and save the metadata for those samples to another file called `metadata.txt`.

		$ grep PAIRED SraRunTable.txt > metadata.txt
		$ grep SINGLE SraRunTable.txt >> metadata.txt

-The `-v` option for `grep` search stands for `--invert-match` meaning `grep` will now only display the
+Sometimes we want to find all the lines that don't match a specific pattern, rather than trying to construct a complex pattern with many options..
+The `-v` option for `grep` search stands for `--invert-match` meaning `grep` will now only display the
+lines which do not match the searched pattern.
+```bash
+$ grep -v SINGLE SraRunTable.txt | less

Uh oh!

feat: change grep & redirection exercise to use metadata instead of fastq #371

Are you sure you want to change the base?

feat: change grep & redirection exercise to use metadata instead of fastq #371

Uh oh!

Conversation

ameynert commented Sep 19, 2025

Uh oh!

github-actions bot commented Sep 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Thank you!

Rendered Changes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ameynert Sep 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ameynert commented Sep 19, 2025

Uh oh!

ggrimes commented Sep 26, 2025

Uh oh!

tobyhodges commented Sep 26, 2025

Uh oh!

ggrimes commented Oct 3, 2025

Uh oh!

ameynert commented Oct 3, 2025

Uh oh!

sstevens2 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

naupaka commented Oct 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

github-actions bot commented Sep 19, 2025 •

edited

Loading

ameynert Sep 19, 2025 •

edited

Loading