slides: fix a few typos and other minor changes (Closed)
Robin Engler requested to merge dev-r into master, May 24, 2022
exam/exam_questions.md (new file, mode 0 → 100644, +167 −0)
# Shell scripting course - exam questions

Please follow these instructions to submit your answers:

* Answer all questions in **a single text file**. The scripts you will be asked to write are
  short, so they can be included in the same file too.
* **Number your answers according to the questions**, e.g.:

  > Question 2
  >
  > Your answer here...

* **Name your file** using the pattern **`<LAST NAME>_<First_name>_exam`**. For instance
  `SMITH_Alice_exam.md` or `SPONGE_Bob_exam.txt`.
* Submit your answers by email to `robin.engler@sib.swiss` and `thomas.junier@sib.swiss`.

*Note:* all files needed for the exam questions are found in the **`exam/`** directory - the same
directory that also contains the present file.

<br>
## Question 1 - [1 point]

Consider the following variable declarations:

```bash
# Case 1
name="Dendroaspis"
# Case 2
name="Dendroaspis angusticeps"
```

**Questions:**

* In which (if any) of the above cases are the quotes really needed?
* What happens if the quotes are omitted?

<br>
## Question 2 - [1 point]

We would like to surround direct speech with double quotes, as in:

> I said, "This will never work."

However, when we type the following, it doesn't work (the quotes are missing from the output):

```bash
$ echo I said, "This will never work"
```

**Task:** give two ways to fix this problem.

<br>
## Question 3 [3 points]

The file `sequences_mammalia.fasta` contains genomic sequences for different mammal species,
amongst which are [*Vulpes vulpes*](https://en.wikipedia.org/wiki/Red_fox) - the red fox,
[*Vulpes lagopus*](https://en.wikipedia.org/wiki/Arctic_fox) - the Arctic fox, and
[*Vulpes pallida*](https://en.wikipedia.org/wiki/Pale_fox) - the pale fox.

**Reminder:** each sequence in a Fasta file begins with a **header line** whose first
character is **`>`**.
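
As a hedged illustration of the reminder above, the sketch below counts the header lines in a small, made-up Fasta file. The file contents and the names `demo_fasta` and `header_count` are assumptions for this example only; they are not part of the exam data.

```bash
# Hypothetical two-record Fasta file written to a temporary location.
demo_fasta=$(mktemp)
printf '%s\n' \
    '>seq_1 Example species' \
    'ACGTACGT' \
    'TTGA' \
    '>seq_2 Example species' \
    'GGCC' > "$demo_fasta"

# Every record has exactly one header line, so counting the lines that
# start with '>' gives the number of sequences, however many lines each
# sequence spans.
header_count=$(grep -c '^>' "$demo_fasta")
echo "Number of sequences: $header_count"

# Clean up the temporary file.
rm -f "$demo_fasta"
```

The pattern is anchored with `^` so that only a `>` in the first column of a line is counted.
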
**Tasks:**

* Write a Bash command that stores the number of sequences found in the file for the red fox
  in a variable named `red_fox_sq_count`.
* Write a **`for`** loop that prints the name of the species and the number of sequences found
  in the file for the red, Arctic and pale fox. Your output on the terminal should look
  something like this:
```
<species 1> sequence count: x
<species 2> sequence count: y
<species 3> sequence count: z
```
<br>
## Question 4 [4 points]

Using the same `sequences_mammalia.fasta` file as in the previous exercise, write a script that
**prints the number of sequences per species** to a tab-delimited text file named
`seq_count_per_species.txt`.

The output file should have a header line (column names), followed by data for each species
(name and sequence count), like so:
```
species count
Balaenoptera musculus 2
Hippopotamus amphibius 3
Vulpes lagopus 3
...
...
```
**Important:**

* The names of the species should be retrieved programmatically, not manually.
* The names of the input and output files should each appear only once in the script
  (i.e. DRY principle - don't repeat yourself), so they are easy to change.
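
One common way to follow the DRY principle for file names is to assign them to variables at the top of the script and use only the variables afterwards. The sketch below is a generic illustration, not a solution to this question: the file names, the `work_dir` directory and the `sort -u` step are all assumptions made up for the example.

```bash
#!/usr/bin/env bash
# Hypothetical file names, each defined exactly once at the top of the script.
work_dir=$(mktemp -d)
input_file="$work_dir/input_demo.txt"
output_file="$work_dir/output_demo.txt"

# Create a small made-up input so that the sketch runs on its own.
printf '%s\n' 'beta' 'alpha' 'beta' > "$input_file"

# Everything below refers to the variables, never to the literal file names.
sort -u "$input_file" > "$output_file"
echo "Unique lines written to $output_file:"
cat "$output_file"
```

Changing the input or output location then only requires editing one assignment.
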

**Hints:** here is a suggestion of how your script could proceed:

1. Extract the list of unique species names from the `sequences_mammalia.fasta` input file and
   store them in a temporary file named `tmp.txt`.
2. Loop over the species stored in `tmp.txt`, compute the sequence count for each of them and
   add it to the output file.
3. Delete the temporary `tmp.txt` file.

<br>
## Question 5 [4 points]

In Fasta sequences with nucleotides, the **`N`** character is used to indicate
**unidentified nucleotides** (i.e. nucleotides for which the reading from the sequencer was
of too poor quality to be assigned a specific nucleotide value).

**Task:**

* Write a program named `detect-Ns.sh` that takes the _output_ of `exam/fasta2tsv-exam.sh` as input
  (NOT an original Fasta file!) and **keeps only those sequences** in which the sequence field has
  **at least one `N` character**. This is essentially a detector of poor-quality sequences.
* Test your script on the file `exam/test_sequences.fasta`. It should produce the following
  result:

```bash
$ ./src/fasta2tsv-exam.sh < ./data/Q4_test.fasta | ./detect-Ns.sh
Test_sequence_1 (has 1 N);	TGGCCTTAGATGACGCGTTGGGTGNCGGCGCCTGAAAGTTCAGGTAAAACGACCGTGGCA
Test_sequence_3 (has 4 N);	AGGGGCGATTATGCNNATGGGTGACGCTGCCCTGAAAGTTCANGTAAAACGACCGTGGCN
Test_sequence_5 (has 5 N);	AGGGGCGATTATGCNNATGGGTGACGCTGCNNTGAAAGTTCAGGTAAAACGACCGTGGCN
```

**Notes and Hints:**

* The `exam/fasta2tsv-exam.sh` file is a script similar to what we developed together in the
  course: it takes a Fasta file as input and converts it to a tab-delimited file
  where each line corresponds to a Fasta sequence (header in the first field and sequence in
  the second field).
* To set the IFS to TAB, you can use ANSI-C quoting, e.g. `IFS=$'\t'`. See the
  code in `fasta2tsv-exam.sh` for an example.
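
To make the `IFS` hint concrete, here is a minimal sketch of reading tab-separated records field by field. The two records and the names `demo_tsv`, `headers` and `sequences` are made up for illustration; the actual code in `fasta2tsv-exam.sh` may look different.

```bash
# Two hypothetical tab-separated records: header, TAB, sequence.
demo_tsv=$'seq_1 (demo)\tACGT\nseq_2 (demo)\tGGNC'

headers=()
sequences=()

# IFS=$'\t' applies only to the read command. Fields are split on TAB,
# so the spaces inside the header stay part of the first field.
while IFS=$'\t' read -r header sequence; do
    headers+=("$header")
    sequences+=("$sequence")
    echo "header=[$header] sequence=[$sequence]"
done <<< "$demo_tsv"
```

With the default `IFS`, the space in `seq_1 (demo)` would also act as a field separator, and part of the header would end up in `sequence`.
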
<br>
## Question 6 [3 points]

Write a program that converts the TSV (tab-separated values) produced by `fasta2tsv-exam.sh`
back into Fasta format. Call it **`tsv2fasta.sh`**.

**Hint:**

* Fasta does not require sequences to be on multiple lines (it just allows it). It's therefore
  OK if the nucleotide sequence is on a single line (the header, on the other hand,
  **must be on a line of its own _and_ start with a `>`**).

<br>
## Question 7 [2 points]

Show how `fasta2tsv-exam.sh` can be combined with `detect-Ns.sh` and `tsv2fasta.sh` to filter
an input Fasta file and **keep only the poor-quality sequences** (those containing at least one
`N`).

**Tasks:**

* Write an expression/command that filters the file `sequences_mammalia.fasta` by keeping only
  the poor-quality sequences.
* Write a second expression/command that additionally filters the `sequences_mammalia.fasta` file
  to only keep the poor-quality sequences that belong to
  [*Vultur gryphus*](https://en.wikipedia.org/wiki/Andean_condor) - the Andean condor.

**Hint:**

* You don't need to write a full script for this exercise; an expression on the command
  line should be enough.

<br>