Episode 03: Working with Files and Directories -- cat something other than a fastq file #152

Open
taylorreiter opened this issue Jun 27, 2018 · 7 comments
Labels
help wanted Looking for Contributors

Comments

@taylorreiter
Contributor

The lesson demonstrates cat by applying it to a FASTQ file. Can we cat a different file instead? When learners go back to working with full-size FASTQ files, printing a whole file with cat is generally a bad idea because FASTQ files contain a lot of text. Perhaps we could cat ~/dc_sample_data/sra_metadata/SraRunTable.txt?
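
A minimal sketch of the concern, assuming toy stand-in files created in a temporary directory (the file names and contents are made up for illustration and are not the lesson data). It counts lines with wc -l before deciding whether cat is a sensible way to view a file:

```shell
# Toy stand-ins for the lesson files; names and contents are hypothetical.
workdir=$(mktemp -d)

# A small tab-delimited metadata table: fine to print in full.
printf 'Run\tSample\nrun_1\tsample_a\nrun_2\tsample_b\n' > "$workdir/toy_metadata.txt"

# A fake FASTQ with 1000 four-line records: too long to print in full.
for i in $(seq 1 1000); do
  printf '@read_%s\nACGTACGT\n+\nIIIIIIII\n' "$i"
done > "$workdir/toy_reads.fastq"

# Count lines first to see how much text cat would send to the screen.
metadata_lines=$(wc -l < "$workdir/toy_metadata.txt")
fastq_lines=$(wc -l < "$workdir/toy_reads.fastq")
echo "metadata: $metadata_lines lines, fastq: $fastq_lines lines"

# The metadata table is only a few lines long, so cat is reasonable here.
cat "$workdir/toy_metadata.txt"
```

Even this toy FASTQ file is thousands of lines long, while the metadata table fits on one screen.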

@raynamharris
Contributor

+1

@aschuerch
Contributor

@taylorreiter I agree that applying cat to FASTQ files is not a best-practice example. On the other hand, doesn't it give the instructor a chance to explain why cat is not such a good idea here? I feel that using another file at this point in the lesson would break the flow of the narrative.

@raynamharris
Contributor

Thanks @aschuerch. This is a good perspective.

This part of the lesson comes very early in the workshop. According to instructor training, we should teach the most useful things first, so if the only utility of this line is to show learners an unhelpful way to look at files, I say we omit it altogether.

However, if teaching cat is important, I think cat ~/dc_sample_data/sra_metadata/SraRunTable.txt is a nice way to get learners used to looking at files in a terminal window as opposed to in an Excel spreadsheet. Then again, they may or may not find that useful, and some may find it scary or daunting.

@aschuerch
Contributor

aschuerch commented Jun 29, 2018

I would be fine with omitting cat here, and I am pretty sure we can do this safely within the shell lesson. However, it would be good to double-check the subsequent lessons, wrangling-genomics and cloud-genomics, to make sure this doesn't break anything there.

@ErinBecker added this to the June_2019_release milestone on Mar 18, 2019
@ErinBecker added the "help wanted" (Looking for Contributors) label on Mar 18, 2019
@ErinBecker
Contributor

Thanks for thinking of how this would affect later lessons @aschuerch. I've just done a search in those two lesson repos and found that cat is used multiple times in each.

https://github.com/datacarpentry/wrangling-genomics/search?q=cat&unscoped_q=cat
https://github.com/datacarpentry/cloud-genomics/search?q=cat&unscoped_q=cat

Based on this, I'm in favor of retaining an introduction to cat in this lesson, and I like @taylorreiter's suggestion to have the learners cat the metadata file instead of a FASTQ file. I'm happy to put in a PR for this if it would be useful. Please let me know @aschuerch

@aschuerch
Contributor

With the recent move of the 'file manipulation' part to Extra, we no longer touch the metadata within the regular shell lesson. We could use cat on the metadata to demonstrate its usefulness, but that would mean switching directories and introducing a new file, so I would advise against it. In the spirit of 'teach most useful first', how about we move head and tail and Details on FASTQ format before cat and less?
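
To illustrate why head and tail pair naturally with the FASTQ format details, here is a minimal sketch on a toy three-record file (the file name and read contents are made up, not taken from the lesson data). Each FASTQ record spans four lines, so -n 4 shows exactly one record:

```shell
# Toy FASTQ with three records; name and contents are hypothetical.
workdir=$(mktemp -d)
fastq="$workdir/toy_reads.fastq"
printf '@read_1\nACGT\n+\nIIII\n@read_2\nTTGA\n+\nIIII\n@read_3\nGGCC\n+\nIIII\n' > "$fastq"

# A FASTQ record is four lines: header, sequence, separator, qualities.
first_record=$(head -n 4 "$fastq")   # first record only
last_record=$(tail -n 4 "$fastq")    # last record only

echo "$first_record"
echo "$last_record"
```

Both commands print a fixed, small amount of text no matter how large the file is, which is what makes them safe on full-size data.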

@ValterAlmeida

I agree, @aschuerch. In addition, since the overview includes the question "How can I view and search file contents", I would also consider adding the grep command to teach learners another way to search the contents of files, e.g. "grep SAMtools rsmodules.sh".
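
A minimal sketch of that kind of search, assuming a hypothetical stand-in script (the contents of rsmodules.sh are not shown in this thread, so the file below is made up for illustration):

```shell
# Hypothetical stand-in for a script that loads modules; contents are made up.
workdir=$(mktemp -d)
printf 'module load FastQC\nmodule load SAMtools\nmodule load BCFtools\n' > "$workdir/toy_modules.sh"

# Print only the lines that contain the search term.
matches=$(grep SAMtools "$workdir/toy_modules.sh")
echo "$matches"

# -c counts the matching lines instead of printing them.
match_count=$(grep -c SAMtools "$workdir/toy_modules.sh")
echo "$match_count"
```

Unlike cat, grep prints only the matching lines, so the output stays short even for large files.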

5 participants