-> split+glob

Commit d3bdc50a authored May 2, 2021 by Thomas Junier
--- a/slides/shell-scripting.md
+++ b/slides/shell-scripting.md
@@ -204,8 +204,10 @@ $ ls | wc -l
 There can be more than two commands:

 ```bash
-$ grep -o 's:.*' ../sample_00.dna | sort | uniq -c \
-  | sort -rn | head
+$ grep -o '{[^}]\+}' < Spo0A.msa    \
+  | tr -d '{}'                      \
+  | awk '$2 != "sp."{print $1, $2}' \
+  | sort
 ```

 ------
@@ -246,11 +248,11 @@ Task
 Find the maximum number of sequences per species in `sample_00.dna` (e.g. to
 check for overrepresentation).

-$\rightarrow$ Let's Try this together, using your knowledge of *interactive*
+$\rightarrow$ Let's try this together, using your knowledge of *interactive*
 shell use only (no scripting yet)

 A possible Solution
-------------
+-------------------

 * `grep` extracts lines from a file if they match some pattern
 * `uniq` squishes blocks of identical lines
@@ -263,8 +265,8 @@ $ grep -o 's:.*;' sample_00.dna \
 ```
 ------

-Suppose we want to perform the same tasks on all the other `sample_??.dna` files. I could type these
-commands again, once per file, but 
+Suppose we want to perform the same task on all the other `sample_??.dna` files.
+I could type these commands again, once per file, but 

 * I am likely to get bored and/or tired
 * I am more likely to make mistakes
@@ -309,11 +311,11 @@ WARNING: Lots of Theory Ahead!
  the shell works...
 * ... and we need a deeper understanding for scripting than for interactive use.
 * Arguably most hair-pulling bugs^[E.g. when you yell at the computer and
-  threaten it with defenestration... that sort of thing.] come from our^[I
+  threaten it with defenestration...] come from our^[I
  include myself here, of course] not-so-complete understanding of what the
  shell actually does with our input.
+* Don't try to learn all this material - rather, be aware of it.

-With that out of the way...

 What the Shell does for Us
 --------------------------
@@ -368,10 +370,10 @@ Broadly speaking, the shell does the following:

 See [the manual](https://www.gnu.org/software/bash/manual/html_node/Shell-Operation.html#Shell-Operation) for details.

-Splitting 
+Tokenizing 
 ----------

-Basically:
+Analogous to recognizing _words_ in natural language.

 * Tokens are separated according to _metacharacters_: whitespace (`space`, `tab`,
  `newline`) or any of `|`, `&`, `;`, `(`, `)`, `<`, or `>`. 
@@ -380,7 +382,7 @@ Basically:
  non-metacharacters (_words_) 
 * Metacharacters can be made literal by _quoting_ (more on that in a few slides).

-Splitting - Examples
+Tokenizing - Examples
 --------------------

 ```bash
@@ -468,7 +470,7 @@ $ echo 'my name is $name'
 Parsing Commands
 --------

-After splitting the input into tokens, the shell parses _commands_:
+Analogous to recognizing _sentences_ in natural language.

 * Simple commands (already known)
 * Pipelines (`cmd | cmd`, etc. - already known)
@@ -529,7 +531,7 @@ expressions with values, in this order:
 Brace Expansion
 ---------------

-Used to generate sets of names based on:
+Used to generate sets of strings based on:

 * comma-separated strings: `file_{A,B,C}.txt` $\rightarrow$ `file_A.txt` `file_B.txt` `file_C.txt` 
 * a sequence: `sample_{1..9}` $\rightarrow$ `sample_1` ... `sample_9`
@@ -542,7 +544,7 @@ Brace Expansion - Examples
 --------------------------

 ```bash
-$ for i in {1..100}; do ... # see loops below
+$ echo {1..100} # e.g. in loops (see below)
 $ echo {a..j} # works on chars
 $ echo {10..1} # works in reverse
 # Create a project tree (note nesting)
@@ -583,12 +585,14 @@ Parameter Expansion
 A `$` followed by a parameter name is replaced by the parameter's value

 ```bash
-place=Rovaniemi
-echo "I'm off to $place"
+$ place=Rovaniemi
+$ echo $place
+Rovaniemi
+$ echo "I'm off to $place"
 I'm off to Rovaniemi
 ```

-There is **a lot** more to parameter expansion than this. We'll come back to this 
+There is **a lot** more to parameter expansion than this. We'll come back to it 
 later on.

 Command Substitution
@@ -705,6 +709,8 @@ There are predefined classes, e.g. for letters, digits, punctuation, etc.
 Note that __globbing only occurs on the results of word splitting__:

 ```bash
+$ cd data
+$ unset IFS # default splitting; an empty IFS would disable it
 $ glob='*.msa *.pep' 
 $ echo '$glob'     # literal
 $ echo "$glob"     # "" -> NO splitting -> NO glob

--- a/slides/shell-scripting_beamer.pdf
+++ b/slides/shell-scripting_beamer.pdf
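
Note on the `IFS` line in the last `shell-scripting.md` hunk: in Bash, an empty `IFS` turns word splitting off, whereas an unset `IFS` gives the default behaviour (splitting on space, tab and newline). The sketch below illustrates the difference. The temporary directory and the file names `a.msa` and `b.pep` are invented for the demonstration and are not part of the slides.

```bash
#!/usr/bin/env bash
# Sketch: how IFS affects splitting (and hence globbing) of an unquoted variable.
# Hypothetical setup: a scratch directory holding one .msa and one .pep file.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.msa" "$demo_dir/b.pep"

glob='*.msa *.pep'

# Unset IFS = default splitting: $glob becomes two words, each globbed separately.
default_count=$(cd "$demo_dir" && unset IFS && set -- $glob && echo "$#")

# Empty IFS = no splitting: the whole string is a single pattern, which matches
# no file here and is therefore left as one literal word.
empty_count=$(cd "$demo_dir" && IFS= && set -- $glob && echo "$#")

echo "unset IFS: $default_count word(s); empty IFS: $empty_count word(s)"

rm -rf "$demo_dir"
```

Run as is, this should print `unset IFS: 2 word(s); empty IFS: 1 word(s)`, which suggests that `unset IFS` (or an explicit `IFS=$' \t\n'`) is the safer way to get predictable splitting before the globbing example.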