
Update readme and minor fixes to slides

3 files changed, with 184 additions and 66 deletions (this file: 65 additions, 60 deletions).
@@ -161,7 +161,7 @@ Ok, so what is automation good for?
* Saving time
* Preventing errors
* Ensuring reproducibility
* Avoiding boredom
@@ -222,7 +222,7 @@ $\rightarrow$ The [script we just made](#first-script) is an example of the "glu
When/How __not__ to use Shells?
---------------------------
Pure shell is __not__ optimal if you need:
* speed
* nontrivial data types (_e.g._ floating-point numbers, any kind of structure
@@ -668,7 +668,7 @@ the grammar is *recursive*.
:::
. . .
```bash
for f in *; do echo $f; done | wc -l
@@ -711,11 +711,13 @@ Brace Expansion - Examples
```bash
$ echo {1..100} # e.g. in loops (see below)
$ echo {a..j} # works on chars
$ echo {10..1} # works in reverse
# Create a project tree with nested expansions.
$ mkdir -p my_project/{src,doc/{mail,ref},data}
# Multiple expansions at the same level generate all possible combinations:
$ echo {A..D}{1..3}
```
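Prefixing a command with `echo` is a cheap way to preview what a brace expansion will produce before running the real command. A short sketch along those lines (`demo_tree` is an arbitrary name; nothing is created on disk):

```bash
echo {1..5}          # 1 2 3 4 5
echo {5..1}          # 5 4 3 2 1
# Two expansions side by side: every combination, left-most part varying slowest
echo {A..B}{1..3}    # A1 A2 A3 B1 B2 B3
# Nested expansion: show the arguments mkdir -p would receive
echo demo_tree/{src,doc/{mail,ref},data}
# -> demo_tree/src demo_tree/doc/mail demo_tree/doc/ref demo_tree/data
```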
@@ -736,14 +738,14 @@ them because they're mostly relevant for interactive use.
Parameter Expansion
-------------------
A `$` followed by a parameter name is replaced by the parameter's value:
```bash
$ place=Rovaniemi
$ echo $place
Rovaniemi
$ echo "I'm off to $place"
# -> I'm off to Rovaniemi
```
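When a parameter name is immediately followed by characters that could belong to a name, the shell cannot tell where the name ends. Wrapping the name in braces, `${...}`, makes the boundary explicit. A minimal sketch (it assumes no variable called `place_trip` exists):

```bash
place=Rovaniemi
echo "$place_trip"     # looks up a variable named place_trip: prints an empty line
echo "${place}_trip"   # Rovaniemi_trip
```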
There is **a lot** more to parameter expansion than this. We'll come back to it
@@ -757,11 +759,11 @@ expansion, `$((...))`.]
```bash
$ echo "Today is $(date -I)"
$ dirsize=$(du -s .)
$ nb_files=$(ls | wc -l)
```
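Command substitution captures the command's standard output with any trailing newlines removed, so the result can be stored or embedded in a string directly. A small sketch using `printf` so that the output is predictable:

```bash
# Three lines of output, counted by wc -l
# (BSD/macOS wc may pad the number with spaces)
nb_lines=$(printf 'a\nb\nc\n' | wc -l)
echo "Counted $nb_lines lines"
# Trailing newlines are stripped: the captured value is just "abc"
value=$(printf 'abc\n\n\n')
echo "[$value]"        # [abc]
```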
An older form for command substitution uses _backticks_:
```bash
$ echo "it is now `date`"
@@ -784,16 +786,17 @@ $ a=2; b=3; echo $a+$b
This does:
```bash
$ a=2; b=3; echo $((a+b)) # Note: no $ needed
```
The expression between `$((...))` is evaluated using [shell
arithmetic](#shell-arithmetic) - more about this later.
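A few quick checks of arithmetic expansion. Shell arithmetic works on integers only, so division truncates:

```bash
a=2; b=3
echo $a+$b          # no arithmetic, just text: 2+3
echo $((a + b))     # 5
echo $((a * b))     # 6
echo $((7 / 2))     # integer division: 3
echo $((7 % 2))     # remainder: 1
echo $((2 ** 10))   # exponentiation: 1024
```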
Process Substitution
--------------------
**Process Substitution `<(...)`** replaces a filename argument with the
output of a command.
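To see that `<(...)` really stands for a filename, it can be handed to any command that expects one. In this sketch, `paste` reads two "files" that are in fact the outputs of two `printf` commands (placeholder inputs), with no temporary file involved:

```bash
# cat receives a filename (something like /dev/fd/63) and reads from it
cat <(echo hello)
# Two process substitutions at once: paste joins them line by line, tab-separated
paste <(printf '1\n2\n') <(printf 'a\nb\n')
# -> 1<TAB>a
#    2<TAB>b
```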
Example: What items are common to two lists?
@@ -842,7 +845,7 @@ Why split on words?
```Zsh
# Try this in Zsh, which doesn't word-split
$ ls -ld $dirs
ls: cannot access 'bin doc new':
No such file or directory
```
@@ -864,7 +867,7 @@ IFS
Word splitting uses the characters in the _internal field separator_^[Hence the alternative (and better) term _field_ splitting.] (by
default, `<space><tab><newline>`)^[Contrary to splitting into tokens, which uses
whitespace and metacharacters.] as word delimiters.
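With the default `IFS`, runs of spaces, tabs and newlines count as a single delimiter, and leading or trailing whitespace produces no empty words. A sketch that makes the resulting words visible by printing each one between brackets (the string is arbitrary; `$'...'` is used only to write a tab as `\t`):

```bash
text=$'  one   two\tthree  '
printf '[%s]\n' $text      # unquoted: split into 3 words -> [one] [two] [three]
printf '[%s]\n' "$text"    # quoted: a single word, all whitespace kept
```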
The value of `IFS` can be changed:
@@ -946,7 +949,7 @@ operators and arguments removed from the expanded list of words:
```bash
# Output of ls goes into list.txt (destructive!)
# Use >> to append
$ ls > list.txt
```
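The difference between `>` and `>>` is easy to check on a throwaway file created with `mktemp`:

```bash
f=$(mktemp)               # create an empty scratch file and keep its name
echo first  > "$f"        # > truncates the file (creating it if needed)
echo second > "$f"        # truncated again: "first" is gone
echo third >> "$f"        # >> appends to the end
content=$(cat "$f")
echo "$content"           # second, then third on the next line
rm -- "$f"
```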
**Note**: Redirection can be done from _within_ the script:
@@ -961,13 +964,13 @@ Here Documents: `<<`
```bash
cat <<END
# Everything up to END goes to the input of cat.
# The end word can be any word, not necessarily END.
# Quoting prevents expansion.
END
```
Useful for generating multi-line output within a script - see `src/welcome.sh`.
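The effect of quoting the end word is easiest to see side by side. With a bare `END`, expansions in the body are performed; with `'END'`, the body is passed through literally:

```bash
name=World
# Unquoted end word: $name is expanded
cat <<END
Hello, $name
END
# -> Hello, World

# Quoted end word: no expansion in the body
cat <<'END'
Hello, $name
END
# -> Hello, $name
```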
Here Strings: `<<<`
------------
@@ -1053,7 +1056,7 @@ Interpretation
Project
=======
The trouble with Fasta
----------------------
Here are some operations that one might wish to perform on a Fasta file:
@@ -1076,9 +1079,11 @@ Why ?
------
* Unix shell tools (`sed`, `awk`, `grep`, etc.) are predominantly _line-oriented_.
* Some bioinformatics formats are line-oriented (_e.g._ [GFF](#gff), [VCF](#vcf)).
* Fasta is not (neither are GenBank, UniProt, ...).
* Converting Fasta to some line-oriented format (_e.g._ CSV) would solve the
problem.^[The _format_ problem, that is - the rest can be left to `grep` and
the like.]
. . .
@@ -1089,7 +1094,7 @@ WARNING
-------
\begin{alertblock}{Didactical Script!}
The script is meant to \emph{illustrate} concepts, \strong{not} to be efficient.
$\Rightarrow$ We'll write it in pure style. A real-world
script would be in "glue" style and very different (but mostly useless for
@@ -1115,7 +1120,7 @@ can run it.
#!/bin/bash
# Just to make sure our script, well, works.
echo "It works"!
```
Then do
@@ -1133,7 +1138,7 @@ It works!
::: nonincremental
* The `#!` ("shebang") specifies the interpreter (like for Python, etc.) -
without it, we have to call the script as
```bash
$ bash fasta2tsv-stage-01.sh
```
@@ -1226,7 +1231,7 @@ We can now change our script to:
```bash
#!/bin/bash
# pos_arg.sh
grep Spo0A "$1"
```
```bash
@@ -1339,7 +1344,7 @@ for <name> in <words> ; do <commands> ; done
Expands `<words>`, and executes `<commands>`, binding `name` to each of the
resulting values in turn.
`;` can be (and often is) replaced with newlines.
`for` - Example 1
-----------------
@@ -1359,7 +1364,7 @@ done
Another typical case is with a sequence:
```bash
# Compute the squares of numbers from 1 to 10.
$ for n in {1..10} ; do echo $((n**2)) ; done
```
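The same pattern can accumulate a result across iterations, for instance the sum of the squares from 1 to 10:

```bash
sum=0
for n in {1..10} ; do
    sum=$((sum + n**2))   # no $ needed inside $((...))
done
echo "$sum"               # 385
```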
@@ -1374,8 +1379,8 @@ for ((<start-cmd>; <condition>; <iteration-cmd>)); do
done
```
1. Evaluate `<start-cmd>`.
1. Evaluate `<condition>`; if true execute `<list>`, if not exit loop.
1. Evaluate `<iteration-cmd>` and go back to 2.
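As a concrete case of these three steps: `i = 0` is the start command, `i < 5` the condition, and `i++` the iteration command. As in arithmetic expansion, variables need no `$` inside `((...))`:

```bash
for ((i = 0; i < 5; i++)); do
    echo "$i"
done
# prints 0 1 2 3 4, one per line; i is 5 after the loop ends
```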
. . .
@@ -1529,11 +1534,11 @@ Can you guess what the rule is?
Unset and Null
--------------
* A _null_ variable has the empty string for a value.
* An _unset_ variable does not exist at all (and it's usually a mistake to try
to use its value^[Think of `NULL`, `null`, `nil`, `None`, or `Nothing` in your
favourite language]).
* A variable can be deleted with `unset`.
```bash
$ place=Seoul # non-empty
@@ -1578,7 +1583,7 @@ $ PI=4096
bash: PI: readonly variable
```
Read-only variables can't be `unset`.
Type
----
@@ -1628,12 +1633,12 @@ Arrays
------
* _indexed_ arrays (or just "arrays" for short) store lists of values referred
to by a non-negative integer. They work like the (1D) arrays of other
languages, e.g. `weights[7]` could be the 7th^[Or the 8th, depending on the
language] element in an array of weight values.
* _associative_ arrays store key-value pairs, like Python's dictionaries or
Ruby/Perl's hashes, e.g. `nb_reads['rec_A']`^[Again, the syntax is
language-dependent.] for the number of reads mapping to the _recA_ gene.
Indexed Arrays
--------------
@@ -1642,7 +1647,7 @@ Indexed Arrays
$ ary=(1 two 'Hey there') # create whole array
$ ary[3]=foo # set individual element
$ ary+=(bar) # append
$ unset 'ary[1]' # delete element (quoted: ary[1] is also a glob pattern)
$ unset ary # delete array
$ echo ${ary[0]} # 0-based
$ declare -p ary # inspect array
@@ -1663,10 +1668,10 @@ $ showa "${names[*]}" # 1 argument
$ IFS=','; echo "${names[*]}"; unset IFS
```
The `#` operator yields the number of elements:
```bash
$ echo ${#names[@]} # (or *)
```
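Applied to a single element instead of `[@]` or `[*]`, the same `#` operator gives the length of that element's string, not a count of elements:

```bash
names=(Frodo Lobelia Arwen)
echo "${#names[@]}"    # number of elements: 3
echo "${#names[1]}"    # length of the string "Lobelia": 7
```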
Iterating over an Array
@@ -1690,7 +1695,7 @@ Cf. `../src/pascal.sh`
Arrays and Word Splitting
-------------------------
Word splitting is _NOT disabled_ when creating arrays:
```bash
$ elements='A B "C D"'
@@ -1712,14 +1717,14 @@ Array Caveats
```bash
$ names=(Frodo Lobelia Arwen)
$ echo $names # = ${names[0]} -> Frodo
```
* Arrays can't be assigned as values:
```bash
# Try to make a copy of `names`
$ lotr_names=names # WRONG - string assignment
$ lotr_names=$names # WRONG - see above
$ lotr_names=("${names[@]}") # OK - quotes keep each element intact
```
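When copying an array, whether `${names[@]}` is quoted matters as soon as an element contains whitespace: unquoted, the copy is word-split into more elements than the original. A quick check, using a made-up element with a space in it:

```bash
names=(Frodo 'Samwise Gamgee' Arwen)
copy_unquoted=(${names[@]})     # word splitting: 4 elements
copy_quoted=("${names[@]}")     # one word per element: 3 elements
echo "${#copy_unquoted[@]} ${#copy_quoted[@]}"   # 4 3
declare -p copy_quoted          # shows each element with its index
```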
Associative Arrays
@@ -1728,7 +1733,7 @@ Associative Arrays
Associative arrays _must_ be `declare`d as such:
```bash
$ declare -A aar
$ aar[key1]=val1
$ declare -A aar=(K1 V1 K2 V2)
$ echo ${aar[key1]}
@@ -1749,12 +1754,12 @@ the order they were added to the array.
Example of Associative Array Usage
----------------------------------
### Ideas: (keep one for the course and the other as an exam question?)
(the example in the 2022-11 course was a bit too long).
* codon -> aa translator
* IUPAC -> grep pattern generator
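As an illustration of the first idea, a minimal codon-to-amino-acid sketch (Bash 4 or later). The table is deliberately partial, only a handful of codons from the standard genetic code, and the names `codon2aa` and `translate` are hypothetical. It also uses two forms of parameter expansion, `${seq:i:3}` (substring) and `${...:-X}` (default value):

```bash
declare -A codon2aa=(
    [ATG]=M [TGG]=W [TTT]=F [TTC]=F
    [TAA]='*' [TAG]='*' [TGA]='*'   # stop codons
)

# translate SEQ: print the one-letter translation of SEQ, codon by codon
translate() {
    local seq=$1 protein='' i codon
    for ((i = 0; i + 3 <= ${#seq}; i += 3)); do
        codon=${seq:i:3}
        protein+=${codon2aa[$codon]:-X}   # X for codons missing from the table
    done
    echo "$protein"
}

translate ATGTTTTGGTAA    # MFW*
```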
`fasta2tsv.sh`: Stage 5 {#stage-05}
-----------------------
@@ -1793,8 +1798,8 @@ enable the script to **choose what to do** between two or more possibilities.
The main conditional constructs are:
* `if` - yes-or-no decisions (possibly nested), based on a _test command_.
* `case` - multi-way decision, based on pattern matching.
`if`
----
@@ -1803,7 +1808,7 @@ The basic idea:
```bash
if <test-command> ; then
    <statements> # if test-command returns 0
fi
```
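Since the test command is just a command, any program's exit status can drive an `if`. For example, `grep -q` prints nothing and reports only through its exit status whether the pattern was found (the input string is made up):

```bash
if grep -q Spo0A <<< "protein Spo0A regulates sporulation" ; then
    echo "found"        # grep returned 0
else
    echo "not found"    # grep returned non-zero
fi
```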
@@ -1853,7 +1858,7 @@ Test Commands
Test commands are the main ingredient of `if...then`
[conditionals](#conditionals) and `while/until` [loops](#loops). They can be:
* a _list_ - the test succeeds iff the list itself succeeds (returns 0).
* a _conditional expression_ between `[[` and `]]` - the test succeeds if the
expression is true, the expression involves _strings_ (including
filenames);^[An older form for conditional expression used `[...]` or `test`.]
@@ -1867,8 +1872,8 @@ Why can 0 signal both success and failure?
:::: nonincremental
* Unix: (many) more ways to fail than to succeed $\rightarrow$ 0 for success,
$> 0$ for various kinds of errors
* Early shells: crude Boolean and arithmetic expressions (if at all)
* C language: Boolean algebra with 0 for false and nonzero for true
@@ -1926,7 +1931,7 @@ operator true if
`-x` file is executable
There are also a few file _comparison_ operators, such as `f1 -nt f2` which is
true if `f1` is newer than `f2`. Obviously, they expect _two_ arguments.
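A small check of `-nt` and `-x` on two scratch files whose modification times are set explicitly with `touch -t` (the file names and dates are arbitrary):

```bash
dir=$(mktemp -d)
touch -t 202001010000 "$dir/old.txt"    # 2020-01-01 00:00
touch -t 202101010000 "$dir/new.txt"    # 2021-01-01 00:00
if [[ $dir/new.txt -nt $dir/old.txt ]]; then
    echo "new.txt is newer"
fi
chmod +x "$dir/new.txt"
[[ -x $dir/new.txt ]] && echo "new.txt is executable"
```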
------
@@ -2248,9 +2253,9 @@ itself start a new process.
We write functions in order to:
* Re-use code (DRY principle - Don't Repeat Yourself).
* Improve the clarity of the code.
* Avoid creating new processes.
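A minimal sketch of the last point: a function runs in the current shell, so it can update the caller's variables and no new process is started. `BASHPID` (Bash 4 or later) holds the PID of the current Bash process, which makes this checkable; `count_call` is a hypothetical name:

```bash
calls=0
count_call() {
    calls=$((calls + 1))   # updates the caller's variable directly
    func_pid=$BASHPID      # PID of the process running the function body
}
count_call
count_call
echo "$calls"                                        # 2
[[ $func_pid == "$BASHPID" ]] && echo "same process as the caller"
# A subshell, by contrast, works on a copy: this increment is lost
( count_call )
echo "$calls"                                        # still 2
```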
Definition
@@ -2673,4 +2678,4 @@ References
* The [Bash website](https://www.gnu.org/software/bash/) and especially the [Bash Manual](http://www.gnu.org/software/bash/manual)
* [Advanced Bash Scripting Guide](ftp://ftp.wayne.edu/ldp/en/abs-guide/abs-guide.pdf)
* The [Bash Cheat Sheet](https://devhints.io/bash)
* The [Bash Programming Reference](https://gitlab.isb-sib.ch/tjunier/bash-prog-cheat) is another cheat-sheet, more specialized towards programming.