Changes to be committed:

modified: README.txt modified: examples/README.html modified: examples/README.txt new file: examples/ctcf_real.mat new file: examples/stat1_jaspar.mat new file: examples/stat1_lpm.mat new file: examples/stat1_pfm.mat new file: examples/statx_transfac.mat modified: perl_tools/lpmconvert.pl modified: perl_tools/pwmconvert.pl modified: pwm_bowtie_wrapper new file: pwm_convert modified: pwm_scan

Changes to be committed:
71e277e1 · Giovanna Ambrosini · 1e948e4e · 71e277e1 · 71e277e1 · 71e277e1
Commit 71e277e1 authored Nov 2, 2017 by Giovanna Ambrosini
--- a/README.txt
+++ b/README.txt
@@ -132,6 +132,10 @@ We also provide three (bash) shell wrapper scripts that embed the entire analysi
  - pwm_bowtie_wrapper    Scan a genome with a PWM and a p-value using Bowtie
  - pwm_mscan_wrapper     Scan a genome with a PWM and a p-value using matrix_scan

+For matrix conversion into integer log-odds, we also provide the following bash script:
+
+  - pwm_convert           Convert JASPAR, TRANSFAC, PFM, LPM and real PWM formats to integer log-odds
+

 PWMScan pipeline and application example
 ============================================================================

--- a/examples/README.html
+++ b/examples/README.html
@@ -217,7 +217,7 @@
            </li>
            
            <p class="code">
-                mba -c 1128 -l 19 chen10_ctcf.mat | sort -k2,2 -nr &gt chen10_ctcf_co1128.dat<br><br>
+                mba -c 1128 -l 19 chen10_ctcf.mat &gt chen10_ctcf_co1128.dat<br>
                <font color=firebrick># Convert the tag list into FASTA format (for bowtie):</font><br>
                awk '{print "&gt"$2"\n"$1}' chen10_ctcf_co1128.dat &gt chen10_ctcf_co1128_taglist.fa<br>
            </p>
@@ -244,12 +244,12 @@
                    <p>
                        The command to map the list of PWM tags to hg19 is the follwing:
                    <p class="code">
-		            <font color=firebrick># It takes of the order of 20 seconds for a full scan of the hg19 genome assembly (with the exclusion of the mitochondrial chromosome). The total number of hits is 143597.</font><br><br> 
-                            bowtie -l 19 -n0 -a BOWTIE_DIR/h_sapiens_hg19 -f chen10_ctcf_co1128_taglist.fa --un unmapped.dat &gt chen10_ctcf_co1128_bowtie.out<br>
-                            <br>                        
-                            bowtie -l 19 -n0 -a /home/local/db/bowtie/h_sapiens_hg19 -f chen10_ctcf_co1128_taglist.fa --un unmapped.dat &gt chen10_ctcf_co1128_bowtie.out<br>
+		            <font color=firebrick># It takes of the order of 12 seconds for a full scan of the hg19 genome assembly (with the exclusion of the mitochondrial chromosome). The total number of hits is 143597.</font><br>
+                            bowtie --threads 4 -l 19 -n0 -a BOWTIE_DIR/h_sapiens_hg19 -f chen10_ctcf_co1128_taglist.fa --un unmapped.dat &gt chen10_ctcf_co1128_bowtie.out<br>
+                            <font color=firebrick># Example:</font>
                            <br>                        
-                            <font color=firebrick># Convert the bowtie output to <b><span style='font-size:13px'>BEDdetail</span></b> via the) following command:</font><br>
+                            bowtie --threads 4 -l 19 -n0 -a /home/local/db/bowtie/h_sapiens_hg19 -f chen10_ctcf_co1128_taglist.fa --un unmapped.dat &gt chen10_ctcf_co1128_bowtie.out<br>
+                            <font color=firebrick># Convert the bowtie output to <b><span style='font-size:13px'>BEDdetail</span></b> via the following command:</font><br>
                            sort -s -k3,3 -k4,4n chen10_ctcf_co1128_bowtie.out | bowtie2bed -s hg19 -l 19 -i chr_NC_PATH | awk 'BEGIN { while((getline line < "chen10_ctcf_co1128_scoretab.txt") &gt 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close("chen10_ctcf_co1128_scoretab.txt")} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""chen10_ctcf""\t""P-value="pvalue[$5]}' &gt chen10_ctcf_co1128_bowtie.bed<br>

                            <font color=firebrick># Example (use default chr_NC_PATH):</font><br>
@@ -261,18 +261,13 @@
                            # Also note that the bowtie ouput reports the match score in the first field so that it can be easily retrieved.<br>
                            <br>
                            # To run the bowtie-based pipeline, type:</font><br>
-                            bowtie -l 19 -n0 -a BOWTIE_DIR/h_sapiens_hg19 -f chen10_ctcf_co1128_taglist.fa --un unmapped.dat | sort -s -k3,3 -k4,4n | 
+                            bowtie --threads 4 -l 19 -n0 -a BOWTIE_DIR/h_sapiens_hg19 -f chen10_ctcf_co1128_taglist.fa --un unmapped.dat | sort -s -k3,3 -k4,4n | 
                            bowtie2bed -s hg19 -l 19 -i chr_NC_PATH | awk 'BEGIN { while((getline line < "chen10_ctcf_co1128_scoretab.txt") &gt 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close("chen10_ctcf_co1128_scoretab.txt")} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""chen10_ctcf""\t""P-value="pvalue[$5]}' &gt chen10_ctcf_co1128_bowtie.bed<br>
-                            <br>
                            <font color=firebrick># Bowtie can read input files from stdin. You should specify "-" for stdin.<br>
                            # In such case, you can run the entire pipeline as follows:</font><br>
-                            mba -c 1128 -l 19 chen10_ctcf.mat | sort -k2,2 -nr | awk '{print ">"$2"\n"$1}' | bowtie -l 19 -n0 -a BOWTIE_DIR/h_sapiens_hg19 -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s hg19 -l 19 -i chr_NC_PATH | awk 'BEGIN { while((getline line < "chen10_ctcf_co1128_scoretab.txt") &gt 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close("chen10_ctcf_co1128_scoretab.txt")} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""chen10_ctcf""\t""P-value="pvalue[$5]}' &gt chen10_ctcf_co1128_bowtie.bed<br>
-                            <br>
+                            mba -c 1128 -l 19 chen10_ctcf.mat | awk '{print ">"$2"\n"$1}' | bowtie --threads 4 -l 19 -n0 -a BOWTIE_DIR/h_sapiens_hg19 -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s hg19 -l 19 -i chr_NC_PATH | awk 'BEGIN { while((getline line < "chen10_ctcf_co1128_scoretab.txt") &gt 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close("chen10_ctcf_co1128_scoretab.txt")} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""chen10_ctcf""\t""P-value="pvalue[$5]}' &gt chen10_ctcf_co1128_bowtie.bed<br>
                            <font color=firebrick># Example:</font><br>
-                            bowtie -l 19 -n0 -a /home/local/db/bowtie/h_sapiens_hg19 -f chen10_ctcf_co1128_taglist.fa --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s hg19 -l 19 | awk 'BEGIN { while((getline line < "chen10_ctcf_co1128_scoretab.txt") &gt 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close("chen10_ctcf_co1128_scoretab.txt")} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""chen10_ctcf""\t""P-value="pvalue[$5]}' &gt chen10_ctcf_co1128_bowtie.bed<br>
-                            <br>
-                            <font color=firebrick># or</font><br>
-                             mba -c 1128 -l 19 chen10_ctcf.mat | sort -k2,2 -nr | awk '{print "&gt"$2"\n"$1}' | bowtie -l 19 -n0 -a /home/local/db/bowtie/h_sapiens_hg19 -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s hg19 -l 19 | awk 'BEGIN { while((getline line < "chen10_ctcf_co1128_scoretab.txt") &gt 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close("chen10_ctcf_co1128_scoretab.txt")} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""chen10_ctcf""\t""P-value="pvalue[$5]}' &gt chen10_ctcf_co1128_bowtie.bed<br>                           
+                             mba -c 1128 -l 19 chen10_ctcf.mat | awk '{print "&gt"$2"\n"$1}' | bowtie --threads 4 -l 19 -n0 -a /home/local/db/bowtie/h_sapiens_hg19 -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s hg19 -l 19 | awk 'BEGIN { while((getline line < "chen10_ctcf_co1128_scoretab.txt") &gt 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close("chen10_ctcf_co1128_scoretab.txt")} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""chen10_ctcf""\t""P-value="pvalue[$5]}' &gt chen10_ctcf_co1128_bowtie.bed<br>                           
                        </p>
                </li>
        <p>
@@ -298,7 +293,7 @@
        chr|NC_000004|NC_000004.11 
        </pre></font>
        <p>
-           <font color=firebrick># Parallelization of matrix_scan<br></font>
+           <font color=firebrick># Parallelization of matrix_scan</font><br>
           <font color=black>To improve performance, we can run matrix_scan in parallel (on each chromosome file) using a simple python script called <b>matrix_scan_parallel.py</b> that dispatches the jobs to available CPU-cores. In this way, matrix_scan competes with the bowtie-based approach.</font>
        <p>
           <p class="code">
@@ -307,20 +302,20 @@
        <br>
           <font color=firebrick># To use matrix_scan_parallel.py, type the following command:<br> </font>
           <i>python /local/bin/matrix_scan_parallel.py -m chen10_ctcf.mat -f "$(ls /home/local/db/genome/hg19/chrom*.seq|grep -v chromMt)" -c 1128 -p 15 | sort -s -k1,1 -k2,2n -k6,6 &gt chen10_ctcf_co1128_matrix_scan.out</i>
-        <br>
-           <font color=firebrick># The -p option is used to set the maximum number of parallel processes for execution of matrix_scan. The above example takes only about 20 seconds to complete the task in agreement with Bowtie.<br>
+        <br><br>
+           <font color=firebrick># The -p option is used to set the maximum number of parallel processes for execution of matrix_scan. The above example takes only about 20 seconds to complete the task in agreement with Bowtie.<br><br>
           # Convert the matrix scan output to <b><span style='font-size:13px'>BEDdetail</span></b> by issuing the following command:</font>
        <br>
           <i>mscan2bed -s hg19 -i GENOME_DIR chen10_ctcf_co1128_matrix_scan.out | awk 'BEGIN { while((getline line < "chen10_ctcf_co1128_scoretab.txt") &gt 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close("chen10_ctcf_co1128_scoretab.txt")} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""chen10_ctcf""\t""P-value="pvalue[$5]}' &gt chen10_ctcf_co1128_matrix_scan.bed</i>
        <br>
           <font color=firebrick># Example:<br></font>
           <i>mscan2bed -s hg19 chen10_ctcf_co1128_matrix_scan.out | awk 'BEGIN { while((getline line < "chen10_ctcf_co1128_scoretab.txt") &gt 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close("chen10_ctcf_co1128_scoretab.txt")} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""chen10_ctcf""\t""P-value="pvalue[$5]}' &gt chen10_ctcf_co1128_matrix_scan.bed</i>
-        <br>
+        <br><br>
           <font color=firebrick>
           # Note : to convert matrix_scan output to BED format, we use the chr_NC_gi file for assembly hg19.<br>
           # By default, the chr_NC_gi is located at chr_NC_PATH/hg19 = /home/local/db/genome/hg19.<br> 
             &nbsp;&nbsp;chr_NC_PATH can be changed by using the &lt-i&gt option. 
-        <br>   
+        <br><br>   
           # To run the entire pipeline:
        <br></font>
           <i>cat GENOME_DIR/hg19/chrom*.seq | matrix_scan -m chen10_ctcf.mat -c 1128 | sort -s -k1,1 -k2,2n -k6,6 | mscan2bed -s hg19 -i GENOME_DIR | awk 'BEGIN { while((getline line < "chen10_ctcf_co1128_scoretab.txt") &gt 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close("chen10_ctcf_co1128_scoretab.txt")} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""chen10_ctcf""\t""P-value="pvalue[$5]}' &gt chen10_ctcf_co1128_matrix_scan.bed</i>
@@ -372,7 +367,7 @@
         <ul style='list-style-type: none;'>
         <li><font color=firebrick><b>1) Bowtie (pwm_bowtie_wrapper)</b> script</font>
        <p class="code">
-           # Usage: `basename $0` &ltmatrix-file&gt &ltp-value&gt &ltbowtie-dir&gt &ltidx-file&gt &ltassembly[hg19|mm9|..]&gt [&ltnon-overlapping matches [1/0]&gt]
+           # Usage: pwm_bowtie_wrapper &ltmatrix-file&gt &ltp-value&gt &ltbowtie-dir&gt &ltidx-file&gt &ltassembly[hg19|mm9|..]&gt [&ltnon-overlapping matches [1/0]&gt]
        <br>
           <font color=firebrick># Example:</font>
        <br>
@@ -384,7 +379,7 @@

        </li><li><font color=firebrick><b>2) Matrix Scan (pwm_mscan_wrapper)</b> script</font>
        <p class="code">
-           # Usage: `basename $0` &ltmatrix-file&gt &ltp-value&gt &ltgenome-dir&gt &ltassembly[hg19|mm9|..]&gt [&ltnon-overlapping matches [1/0]&gt] [&ltparallel [1/0]&gt]
+           # Usage: pwm_mscan_wrapper &ltmatrix-file&gt &ltp-value&gt &ltgenome-dir&gt &ltassembly[hg19|mm9|..]&gt [&ltnon-overlapping matches [1/0]&gt] [&ltparallel [1/0]&gt]
        <br>
           <font color=firebrick># Example:</font>
        <br>
@@ -397,7 +392,7 @@

        </li><li><font color=firebrick><b>3) PWM Scan (pwm_scan)</b> script</font>
        <p class="code">
-           # Usage: `basename $0` &ltmatrix-file&gt &ltp-value&gt &ltassembly[hg19|mm9|..]&gt &ltgenome-root-dir [ex:/db]&gt [&ltnon-overlapping matches [1/0]&gt] [&ltparallel [1/0]&gt]
+           # Usage: pwm_scan &ltmatrix-file&gt &ltp-value&gt &ltassembly[hg19|mm9|..]&gt &ltgenome-root-dir [ex:/db]&gt [&ltnon-overlapping matches [1/0]&gt] [&ltparallel [1/0]&gt]
        <br>
           <font color=firebrick># Examples:</font>
        <br>
@@ -425,13 +420,66 @@
           Of course, this only works on a multi-core processor machine.
        <br>
          </li></ul>
+
        <p>
-        <h3> How to build Bowtie index files</h3>
+        <h3> Shell (bash) Wrapper for matrix conversion</h3>
         <p class="tab">
+         The pwm_convert bash script is used to convert JASPAR, TRANSFAC, PFM, LPM and real PWM formats to integer log-odds format.
         <ul style='list-style-type: none;'>
+        <p class="code">
+           # Usage: pwm_convert &ltmatrix-file&gt -f=&ltformat [jaspar|transfac|lpm|pfm|real]&gt
+             <br>
+ &nbsp;&nbsp; OPTIONS:
+             <br>
+&nbsp;&nbsp;&nbsp;&nbsp;  -h                  show help message 
+             <br>
+&nbsp;&nbsp;&nbsp;&nbsp;  -b=<bg-freq-file>   bg nucleotide composition file 
+             <br>
+&nbsp;&nbsp;&nbsp;&nbsp;  -c=<pseudo-cnt>     pseudo-count fraction [def=0] 
+             <br>
+&nbsp;&nbsp;&nbsp;&nbsp;  -m=<minscore>       minimal score value [def=-10000] 
+             <br>
+&nbsp;&nbsp;&nbsp;&nbsp;  -n=<log-scaling>    log scaling factor [def=100] 
+             <br>
+&nbsp;&nbsp;&nbsp;&nbsp;  -s=<mult>           matrix multiplication factor (for real PWMs) [def=100] 
+             <br>
+&nbsp;&nbsp;&nbsp;&nbsp;  Convert JASPAR, TRANSFAC, PFM, LPM and real PWM formats to integer log likelihoods (log-odds) format.
         <p>
-          Let's define BOWTIE_DIR as as the directory containing the bowtie indices.
+         <font color=black>By default, the prior background nucleotide frequencies are set to 0.25.</font>
+        <br>
+        <p class="code">
+        <font color=firebrick># Examples</font>
+        <br>
+        <font color=firebrick># Convert JASPAR to log-odds:</font>
+        <br>
+             <i>pwm_convert stat1_jaspar.mat -f=jaspar 2&gt/dev/null</i><br>
+
+        <font color=firebrick># If we want to use different backgroung nucleotide frequencies:</font>
+        <br>
+             <i>pwm_convert stat1_jaspar.mat -f=jaspar -b=bg_probs_2.txt 2&gt/dev/null</i><br>
+
+        <font color=black># Where the bg_probs_2.txt file includes the prior bg frequencies, defined as follows:</font>
+        <br>
+         &nbsp;&nbsp;A  0.29 <br>
+         &nbsp;&nbsp;C  0.21 <br>
+         &nbsp;&nbsp;G  0.21 <br>
+         &nbsp;&nbsp;T  0.29 <br>
+
+        <font color=firebrick># Convert TRANSFAC to log-odds:</font>
+        <br>
+             <i>pwm_convert statx_transfac.mat -f=transfac -c=0.0000001 2&gt/dev/null</i><br>
+
+        <font color=firebrick># Convert PFM (Position Frequency Matrix) to log-odds:</font>
+        <br>
+             <i>pwm_convert pwm_convert stat1_pfm.mat -f=pfm 2&gt/dev/null</i><br>
+        <br>
+        </ul>
+        <p>
+        <h3> How to build Bowtie index files</h3>
        <p class="tab">
+         <ul style='list-style-type: none;'>
+        <p>
+          Let's define BOWTIE_DIR as as the directory containing the bowtie indices.<br>
          Go to the BOWTIE_DIR directory, and concatenate all chromosome files into a sequence file that includes the entire hg19 genome assembly:
        <p class="code">
          ls -1 -v DB_DIR/chrom*.seq | xargs cat &gt h_sapiens_hg19.fa

--- a/examples/README.txt
+++ b/examples/README.txt
@@ -153,7 +153,7 @@ matrix_prob --bg "0.29,0.21,0.21,0.29" chen10_ctcf.mat > chen10_ctcf_co1128_scor

 This step is only necessary if fast string-matchig tools (such as Bowtie) are used to scan the genome.

-mba -c 1128 -l 19 chen10_ctcf.mat | sort -k2,2 -nr > chen10_ctcf_co1128.dat
+mba -c 1128 -l 19 chen10_ctcf.mat > chen10_ctcf_co1128.dat

 Remove the score column:

@@ -182,11 +182,11 @@ The reason for generating a match list in FASTA format is that we want to carry

    bowtie -l 19 -n0 -a BOWTIE_DIR/h_sapiens_hg19 -f chen10_ctcf_co1128_taglist.fa --un unmapped.dat > chen10_ctcf_co1128_bowtie.out

-    # It takes of the order of 20 seconds for a full scan of the hg19 genome assembly (with the exclusion of the mitochondrial chromosome). 
+    # It takes of the order of 12-15 seconds for a full scan of the hg19 genome assembly (with the exclusion of the mitochondrial chromosome). 
      The total number of hits is 143597.

    # Example:
-    bowtie -l 19 -n0 -a /home/local/db/bowtie/h_sapiens_hg19 -f chen10_ctcf_co1128_taglist.fa --un unmapped.dat > chen10_ctcf_co1128_bowtie.out
+    bowtie --threads 4 -l 19 -n0 -a /home/local/db/bowtie/h_sapiens_hg19 -f chen10_ctcf_co1128_taglist.fa --un unmapped.dat > chen10_ctcf_co1128_bowtie.out

    # Convert the bowtie output to BEDdetail via the following command:

@@ -202,20 +202,16 @@ The reason for generating a match list in FASTA format is that we want to carry

    # To run the bowtie-based pipeline, type:

-    bowtie -l 19 -n0 -a BOWTIE_DIR/h_sapiens_hg19 -f chen10_ctcf_co1128_taglist.fa --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s hg19 -l 19 -i chr_NC_PATH | awk 'BEGIN { while((getline line < "chen10_ctcf_co1128_scoretab.txt") > 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close("chen10_ctcf_co1128_scoretab.txt")} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""chen10_ctcf""\t""P-value="pvalue[$5]}' > chen10_ctcf_co1128_bowtie.bed
+    bowtie --threads 4 -l 19 -n0 -a BOWTIE_DIR/h_sapiens_hg19 -f chen10_ctcf_co1128_taglist.fa --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s hg19 -l 19 -i chr_NC_PATH | awk 'BEGIN { while((getline line < "chen10_ctcf_co1128_scoretab.txt") > 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close("chen10_ctcf_co1128_scoretab.txt")} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""chen10_ctcf""\t""P-value="pvalue[$5]}' > chen10_ctcf_co1128_bowtie.bed

    # Bowtie can read input files from stdin. You should specify "-" for stdin.
    In such case, you can run the entire pipeline as follows:

-    mba -c 1128 -l 19 chen10_ctcf.mat | sort -k2,2 -nr | awk '{print ">"$2"\n"$1}' | bowtie -l 19 -n0 -a BOWTIE_DIR/h_sapiens_hg19 -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s hg19 -l 19 -i chr_NC_PATH | awk 'BEGIN { while((getline line < "chen10_ctcf_co1128_scoretab.txt") > 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close("chen10_ctcf_co1128_scoretab.txt")} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""chen10_ctcf""\t""P-value="pvalue[$5]}' > chen10_ctcf_co1128_bowtie.bed
+    mba -c 1128 -l 19 chen10_ctcf.mat | awk '{print ">"$2"\n"$1}' | bowtie --threads 4 -l 19 -n0 -a BOWTIE_DIR/h_sapiens_hg19 -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s hg19 -l 19 -i chr_NC_PATH | awk 'BEGIN { while((getline line < "chen10_ctcf_co1128_scoretab.txt") > 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close("chen10_ctcf_co1128_scoretab.txt")} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""chen10_ctcf""\t""P-value="pvalue[$5]}' > chen10_ctcf_co1128_bowtie.bed

    # Example:

-    bowtie -l 19 -n0 -a /home/local/db/bowtie/h_sapiens_hg19 -f chen10_ctcf_co1128_taglist.fa --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s hg19 -l 19 | awk 'BEGIN { while((getline line < "chen10_ctcf_co1128_scoretab.txt") > 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close("chen10_ctcf_co1128_scoretab.txt")} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""chen10_ctcf""\t""P-value="pvalue[$5]}' > chen10_ctcf_co1128_bowtie.bed
-
-    # Or:
-
-    mba -c 1128 -l 19 chen10_ctcf.mat | sort -k2,2 -nr | awk '{print ">"$2"\n"$1}' | bowtie -l 19 -n0 -a /home/local/db/bowtie/h_sapiens_hg19 -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s hg19 -l 19 | awk 'BEGIN { while((getline line < "chen10_ctcf_co1128_scoretab.txt") > 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close("chen10_ctcf_co1128_scoretab.txt")} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""chen10_ctcf""\t""P-value="pvalue[$5]}' > chen10_ctcf_co1128_bowtie.bed
+    mba -c 1128 -l 19 chen10_ctcf.mat | awk '{print ">"$2"\n"$1}' | bowtie --threads 4 -l 19 -n0 -a /home/local/db/bowtie/h_sapiens_hg19 -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s hg19 -l 19 | awk 'BEGIN { while((getline line < "chen10_ctcf_co1128_scoretab.txt") > 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close("chen10_ctcf_co1128_scoretab.txt")} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""chen10_ctcf""\t""P-value="pvalue[$5]}' > chen10_ctcf_co1128_bowtie.bed


 5) Using matrix_scan
@@ -249,7 +245,7 @@ The reason for generating a match list in FASTA format is that we want to carry

    python /local/bin/matrix_scan_parallel.py -m chen10_ctcf.mat -f "$(ls /home/local/db/genome/hg19/chrom*.seq|grep -v chromMt)" -c 1128 -p 15 | sort -s -k1,1 -k2,2n -k6,6 > chen10_ctcf_co1128_matrix_scan.out  

-    The -p option is used to set the maximum number of parallel processes for execution of matrix_scan. The above example takes only about 20 seconds to complete the task in agreement with Bowtie. 
+    The -p option is used to set the maximum number of parallel processes for execution of matrix_scan. The above example takes only about 15 seconds to complete the task in agreement with Bowtie. 

    # Convert the matrix scan output to BEDdetail by issuing the following command:

@@ -287,7 +283,7 @@ Shell (bash) Wrappers to execute the analysis pipeline

 The output file is a match list in BEDdetail format.

-------------------------------------------------------------------------------
+------------------------------------------------------------------------------

 A few remarks on the general pwm_scan script.

@@ -303,33 +299,33 @@ Shell (bash) Wrappers to execute the analysis pipeline
    # The chrNC_dir variable (used for locating the the chr_NC_gi/chr_hdr files) is set to chrNC_dir=<genome-root-dir>/genome.
    # The bin_dir varaible (used for locating the matrix_scan_parallel.py script) is set to /home/local/bin.
 
-------------------------------------------------------------------------------
+------------------------------------------------------------------------------

    1) Bowtie wrapper script

-    # Usage: `basename $0` <matrix-file> <p-value> <bowtie-dir> <genome-idx-file> <assembly[hg19|mm9|..]> [<non-overlapping matches [1/0]>]
+    # Usage: pwm_bowtie_wrapper <matrix-file> <p-value> <bowtie-dir> <genome-idx-file> <assembly[hg19|mm9|..]> [<non-overlapping matches [1/0]>]

    # Example:

    pwm_bowtie_wrapper chen10_ctcf.mat 0.00001 /home/local/db/bowtie h_sapiens_hg19 hg19 0

    # Output file (BED) : chen10_ctcf_co1128_bowtie.bed
-------------------------------------------------------------------------------
+------------------------------------------------------------------------------

    2) Matrix Scan (matrix_scan) wrapper script

-    # Usage: `basename $0` <matrix-file> <p-value> <genome-dir> <assembly[hg19|mm9|..]> [<non-overlapping matches [1/0]>] [<parallel [1/0]>] 
+    # Usage: pwm_mscan_wrapper <matrix-file> <p-value> <genome-dir> <assembly[hg19|mm9|..]> [<non-overlapping matches [1/0]>] [<parallel [1/0]>] 

    # Example:

    pwm_mscan_wrapper chen10_ctcf.mat 0.00001 /home/local/db/genome hg19 0 1

    # Output file (BED) : chen10_ctcf_co1128_matrix_scan.bed
-------------------------------------------------------------------------------
+------------------------------------------------------------------------------

    3) PWM Scan (pwm_scan) wrapper script

-    # Usage: `basename $0` <matrix-file> <p-value> <assembly[hg19|mm9|..]> <genome-root-dir [ex:/db]> [<non-overlapping matches [1/0]>] [<parallel [1/0>]>]
+    # Usage: pwm_scan <matrix-file> <p-value> <assembly[hg19|mm9|..]> <genome-root-dir [ex:/db]> [<non-overlapping matches [1/0]>] [<parallel [1/0>]>]

    # Examples:

@@ -354,6 +350,52 @@ Shell (bash) Wrappers to execute the analysis pipeline
    # Of course it only works on a multi-core processor machine.


+Shell (bash) Wrapper for matrix conversion
+==============================================================================
+
+    The pwm_convert bash script is used to convert JASPAR, TRANSFAC, PFM, LPM and real PWM formats to integer log-odds format.
+
+    # Usage: pwm_convert <matrix-file> -f=<format [jaspar|transfac|lpm|pfm|real]>
+
+      OPTIONS:
+              -h                  show help message
+              -b=<bg-freq-file>   bg nucleotide composition file
+              -c=<pseudo-cnt>     pseudo-count fraction [def=0]
+              -m=<minscore>       minimal score value [def=-10000]
+              -n=<log-scaling>    log scaling factor [def=100]
+              -s=<mult>           matrix multiplication factor (for real PWMs) [def=100]
+
+              Convert JASPAR, TRANSFAC, PFM, LPM and real PWM formats to integer log likelihoods (log-odds) format.
+
+     By default, the prior background nucleotide frequencies are set to 0.25.
+
+     # Examples
+
+     # Convert JASPAR to log-odds:
+    
+     pwm_convert stat1_jaspar.mat -f=jaspar  2>/dev/null
+
+     # If we want to use different backgroung nucleotide frequencies:
+
+     pwm_convert stat1_jaspar.mat -f=jaspar -b=bg_probs_2.txt 2>/dev/null
+   
+     # Where the bg_probs_2.txt file includes the prior bg frequencies, defined as follows:
+
+       A  0.29
+       C  0.21
+       G  0.21
+       T  0.29
+
+     # Convert TRANSFAC to log-odds:
+
+     pwm_convert statx_transfac.mat -f=transfac -c=0.0000001 2>/dev/null
+
+     # Convert PFM (Position Frequency Matrix) to to log-odds:
+
+     pwm_convert stat1_pfm.mat -f=pfm 2>/dev/null 
+------------------------------------------------------------------------------
+
+
 How to build Bowtie index files
 ==============================================================================


--- a/examples/ctcf_real.mat
+++ b/examples/ctcf_real.mat
+>CTCF-motif
+ -0.96   -0.89    0.00   -0.72
+ -0.59   -1.83    0.00   -1.22
+ -2.64    0.00   -3.50   -3.10
+ -4.48    0.00   -4.70   -4.71
+  0.00   -2.54   -2.29   -1.84
+ -2.12   -0.12    0.00   -3.09
+ -0.31    0.00   -2.13   -0.32
+  0.00   -3.88   -2.90   -2.74
+ -4.15   -4.69    0.00   -5.28
+  0.00   -4.05   -0.05   -3.56
+ -3.20   -4.33    0.00   -0.89
+ -4.02   -5.28    0.00   -4.95
+ -3.36   -2.68    0.00   -2.74
+ -1.83    0.00   -2.99   -2.21
+  0.00   -2.37   -0.04   -2.77
+ -1.73    0.00   -0.53   -1.17
+ -0.96   -0.08   -1.95    0.00
+  0.00   -0.74   -0.54   -1.63
--- a/examples/stat1_jaspar.mat
+++ b/examples/stat1_jaspar.mat
+>MA0137.2 STAT1
+A  [ 208  859  251   10    8  106   23  528  696   53 1900 2030  954  336  417 ]
+C  [1076  496  574   22   14 1921 1900  762   31   30  124   17  263  760  804 ]
+G  [ 415  279  144   11   38   14    7  115 1292 1700   29   23  552  270  425 ]
+T  [ 378  446 1112 2038 2023   44  155  680   66  302   32   15  315  714  431 ]
--- a/examples/stat1_lpm.mat
+++ b/examples/stat1_lpm.mat
+> letter-probability matrix V_STAT1_01: alength= 4 w= 21 nsites= 100.1 E= 0
+  0.080000        0.180000        0.419000        0.321000
+  0.232232        0.097097        0.372372        0.298298
+  0.138723        0.312375        0.197605        0.351297
+  0.175175        0.379379        0.381381        0.064064
+  0.595405        0.134865        0.150849        0.118881
+  0.311688        0.262737        0.093906        0.331668
+  0.000000        0.000000        0.000000        1.000000
+  0.000000        0.000000        0.000000        1.000000
+  0.039000        0.909000        0.020000        0.032000
+  0.000000        1.000000        0.000000        0.000000
+  0.000000        0.325000        0.675000        0.000000
+  0.000000        0.000000        1.000000        0.000000
+  0.038038        0.000000        0.949950        0.012012
+  1.000000        0.000000        0.000000        0.000000
+  1.000000        0.000000        0.000000        0.000000
+  0.362637        0.087912        0.294705        0.254745
+  0.111000        0.111000        0.111000        0.667000
+  0.198198        0.105105        0.696697        0.000000
+  0.248000        0.127000        0.466000        0.159000
+  0.125125        0.367367        0.492492        0.015015
+  0.317682        0.125874        0.396603        0.159840
--- a/examples/stat1_pfm.mat
+++ b/examples/stat1_pfm.mat
+> M00224 STAT1
+8.0    18.0   41.9   32.1
+23.2    9.7   37.2   29.8
+13.9   31.3   19.8   35.2
+17.5   37.9   38.1    6.4
+59.6   13.5   15.1   11.9
+31.2   26.3    9.4   33.2
+0         0      0    100
+0         0      0    100
+3.9    90.9      2    3.2
+0       100      0      0
+0      32.5   67.5      0
+0         0    100      0
+3.8       0   94.9    1.2
+100       0      0      0
+100       0      0      0
+36.3    8.8   29.5   25.5
+11.1   11.1   11.1   66.7
+19.8   10.5   69.6      0
+24.8   12.7   46.6   15.9
+12.5   36.7   49.2    1.5
+31.8   12.5   39.7     16
--- a/examples/statx_transfac.mat
+++ b/examples/statx_transfac.mat
+AC   M00223
+XX
+ID   V$STAT_01
+XX
+DT   29.11.1995 (created); ewi.
+DT   11.03.2003 (updated); dtc.
+CO   Copyright (C), Biobase GmbH.
+XX
+NA   STATx
+XX
+DE   signal transducers and activators of transcription
+XX
+BF   T01575 STAT1alpha; Species: mouse, Mus musculus.
+BF   T01492 STAT1alpha; Species: human, Homo sapiens.
+BF   T01573 STAT1beta; Species: human, Homo sapiens.
+XX
+PO      A      C      G      T
+01      0      0      0     14      T
+02      0      0      0     14      T
+03      2     12      0      0      C
+04      0     12      0      2      C
+05      0      9      2      3      C
+06      4      2      8      0      G
+07      0      0      8      6      K
+08     13      1      0      0      A
+09     14      0      0      0      A
+XX
+BA   14 genomic sequences of 14 different genes
+XX
+CC   compiled sequences
+XX
+RN   [1]; RE0003481.
+RX   PUBMED: 7774815.
+RA   Horvath C. M., Wen Z., Darnell jr J. E.
+RT   A STAT protein domain that determines DNA sequence recognition suggests a novel DNA-binding domain
+RL   Genes Dev. 9:984-994 (1995).
+XX
--- a/perl_tools/lpmconvert.pl
+++ b/perl_tools/lpmconvert.pl
@@ -207,18 +207,26 @@ while ($line = <MF>) {
    for ($i_motif = 0; $i_motif < $len; $i_motif++) {
      for ($i_base = 0; $i_base < $num_bases; $i_base++) {
        if ($ofile) {
-          if ($motif{$i_base, $i_motif} > 0) {
+          if ($c > 0) {
            printf(OF "%7d ", round((log( ($motif{$i_base, $i_motif} + $bg{$bases[$i_base]} * $c) / ($bg{$bases[$i_base]} * (1 + $c)) )/log(2.0))*$logscl) );
+          } else {
+            if ($motif{$i_base, $i_motif} > 0) {
+              printf(OF "%7d ", round((log( $motif{$i_base, $i_motif} / $bg{$bases[$i_base]} )/log(2.0))*$logscl) );
            } else {
              printf(OF "%7d ", $minscore);
            }
          }
-        if ($motif{$i_base, $i_motif} > 0) {
+        }
+        if ($c > 0) {
            printf("%7d ", round((log( ($motif{$i_base, $i_motif} + $bg{$bases[$i_base]} * $c) / ($bg{$bases[$i_base]} * (1 + $c)) )/log(2.0))*$logscl) );
+        } else {
+          if ($motif{$i_base, $i_motif} > 0) {
+            printf("%7d ", round((log( $motif{$i_base, $i_motif} / $bg{$bases[$i_base]} )/log(2.0))*$logscl) );
          } else {
            printf("%7d ", $minscore);
          }
        }
+      }
      if ($ofile) {
        print(OF "\n");
      }

--- a/perl_tools/pwmconvert.pl
+++ b/perl_tools/pwmconvert.pl
@@ -26,7 +26,6 @@ $bg{"T"} = 0.25;
 my $usage = "USAGE: pwmconvert [options] <matrix file>

  Options: 
-	   -bg <background file>	set of f_a
           -c                           check format : check whether PWM scores are Real or Integer 
                                        and print out the corresponding Real/Integer flag
           -n <scaling factor>          set scaling factor (int) for real format only
@@ -58,9 +57,7 @@ my $id_list = "";
 my $header = 1;
 while (scalar(@ARGV) > 1) {
  $next_arg = shift(@ARGV);
-  if ($next_arg eq "-bg") {
-    $bg_file = shift(@ARGV);
-  } elsif ($next_arg eq "-c") {
+  if ($next_arg eq "-c") {
    $check_flag = 1;
  } elsif ($next_arg eq "-n") {
    $scale = shift(@ARGV);
@@ -83,34 +80,6 @@ while (scalar(@ARGV) > 1) {
 }
 ($matrix_file) = @ARGV;

-# read the background file
-if (defined($bg_file)) {
-  open($bg_file, "<$bg_file") || die("Can't open $bg_file.\n");
-  $total_bg = 0;
-  while (<$bg_file>) {
-    next if (/^#/);			# skip comments
-    ($a, $f) = split;
-    if ($a eq "A" || $a eq "a") {
-      $bg{"A"} = $f; 
-      $total_bg += $f;
-    } elsif ($a eq "C" || $a eq "c") {
-      $bg{"C"} = $f; 
-      $total_bg += $f;
-    } elsif ($a eq "G" || $a eq "g") {
-      $bg{"G"} = $f; 
-      $total_bg += $f;
-    } elsif ($a eq "T" || $a eq "t") {
-      $bg{"T"} = $f; 
-      $total_bg += $f;
-    }
-  }
-  # make sure they sum to 1
-  foreach $key (keys %bg) {
-    $bg{$key} /= $total_bg;
-    #printf STDERR "$key $bg{$key}\n";
-  }
-}  # background file
-
 # Open the matrix file for reading.
 open(MF, "<$matrix_file") || die("Can't open $matrix_file.\n");
 # Open the output file

--- a/pwm_bowtie_wrapper
+++ b/pwm_bowtie_wrapper
@@ -126,13 +126,13 @@ pwmout_bed=${matrix_name}_co${matrix_score}_bowtie.bed
 echo "========               Bowtie-based pipeline               ========"
 if [ $non_overlapping == 0 ]
 then
-  echo "mba -c $matrix_score -l $matrix_len $matrix_file | sort -k2,2 -nr | awk '{print \">\"\$2\"\n\"\$1}' | bowtie --threads 4 -l $matrix_len -n0 -a $bowtie_dir/$genome_idx_file -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s $assembly -l $matrix_len -i $chrNC_dir | awk 'BEGIN { while((getline line < \"$pwmScore_tab\") > 0 ) {split(line,f,\" \"); pvalue[f[1]]=f[2]} close(\"$pwmScore_tab\")} {print \$1\"\t\"\$2\"\t\"\$3\"\t\"\$4\"\t\"\$5\"\t\"\$6\"\t\"\"$matrix_name\"\"\t\"\"P-value=\"pvalue[\$5]}' > $pwmout_bed" 
+  echo "mba -c $matrix_score -l $matrix_len $matrix_file | awk '{print \">\"\$2\"\n\"\$1}' | bowtie --threads 4 -l $matrix_len -n0 -a $bowtie_dir/$genome_idx_file -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s $assembly -l $matrix_len -i $chrNC_dir | awk 'BEGIN { while((getline line < \"$pwmScore_tab\") > 0 ) {split(line,f,\" \"); pvalue[f[1]]=f[2]} close(\"$pwmScore_tab\")} {print \$1\"\t\"\$2\"\t\"\$3\"\t\"\$4\"\t\"\$5\"\t\"\$6\"\t\"\"$matrix_name\"\"\t\"\"P-value=\"pvalue[\$5]}' > $pwmout_bed" 
  echo "..."
-  mba -c $matrix_score -l $matrix_len $matrix_file | sort -k2,2 -nr | awk '{print ">"$2"\n"$1}' | bowtie --threads 4 -l $matrix_len -n0 -a $bowtie_dir/$genome_idx_file -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s $assembly -l $matrix_len -i $chrNC_dir | awk -v scoretab="$pwmScore_tab" -v pwmname="$matrix_name" 'BEGIN { while((getline line < scoretab) > 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close(scoretab)} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"pwmname"\t""P-value="pvalue[$5]}' > $pwmout_bed
+  mba -c $matrix_score -l $matrix_len $matrix_file | awk '{print ">"$2"\n"$1}' | bowtie --threads 4 -l $matrix_len -n0 -a $bowtie_dir/$genome_idx_file -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s $assembly -l $matrix_len -i $chrNC_dir | awk -v scoretab="$pwmScore_tab" -v pwmname="$matrix_name" 'BEGIN { while((getline line < scoretab) > 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close(scoretab)} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"pwmname"\t""P-value="pvalue[$5]}' > $pwmout_bed
 else
-  echo "mba -c $matrix_score -l $matrix_len $matrix_file | sort -k2,2 -nr | awk '{print \">\"\$2\"\n\"\$1}' | bowtie --threads 4 -l $matrix_len -n0 -a $bowtie_dir/$genome_idx_file -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s $assembly -l $matrix_len -i $chrNC_dir | filterOverlaps -l$matrix_len  | awk 'BEGIN { while((getline line < \"$pwmScore_tab\") > 0 ) {split(line,f,\" \"); pvalue[f[1]]=f[2]} close(\"$pwmScore_tab\")} {print \$1\"\t\"\$2\"\t\"\$3\"\t\"\$4\"\t\"\$5\"\t\"\$6\"\t\"\"$matrix_name\"\"\t\"\"P-value=\"pvalue[\$5]}' > $pwmout_bed" 
+  echo "mba -c $matrix_score -l $matrix_len $matrix_file | awk '{print \">\"\$2\"\n\"\$1}' | bowtie --threads 4 -l $matrix_len -n0 -a $bowtie_dir/$genome_idx_file -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s $assembly -l $matrix_len -i $chrNC_dir | filterOverlaps -l$matrix_len  | awk 'BEGIN { while((getline line < \"$pwmScore_tab\") > 0 ) {split(line,f,\" \"); pvalue[f[1]]=f[2]} close(\"$pwmScore_tab\")} {print \$1\"\t\"\$2\"\t\"\$3\"\t\"\$4\"\t\"\$5\"\t\"\$6\"\t\"\"$matrix_name\"\"\t\"\"P-value=\"pvalue[\$5]}' > $pwmout_bed" 
  echo "..."
-  mba -c $matrix_score -l $matrix_len $matrix_file | sort -k2,2 -nr | awk '{print ">"$2"\n"$1}' | bowtie --threads 4 -l $matrix_len -n0 -a $bowtie_dir/$genome_idx_file -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s $assembly -l $matrix_len -i $chrNC_dir | filterOverlaps -l$matrix_len | awk -v scoretab="$pwmScore_tab" -v pwmname="$matrix_name" 'BEGIN { while((getline line < scoretab) > 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close(scoretab)} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"pwmname"\t""P-value="pvalue[$5]}' > $pwmout_bed
+  mba -c $matrix_score -l $matrix_len $matrix_file | awk '{print ">"$2"\n"$1}' | bowtie --threads 4 -l $matrix_len -n0 -a $bowtie_dir/$genome_idx_file -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s $assembly -l $matrix_len -i $chrNC_dir | filterOverlaps -l$matrix_len | awk -v scoretab="$pwmScore_tab" -v pwmname="$matrix_name" 'BEGIN { while((getline line < scoretab) > 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close(scoretab)} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"pwmname"\t""P-value="pvalue[$5]}' > $pwmout_bed
 fi

 echo "List of matches (BED format) : $pwmout_bed"

--- a/pwm_convert
+++ b/pwm_convert
+#!/bin/bash
+
+# pwm_convert: Matrix conversion utility
+#
+#  Convert JASPAR, TRANSFAC, PFM, LPM, and real PWM formats
+#  to integer log likelihoods (log-odds) format.
+#
+#  arguments:
+#              matrix-file
+#              matrix format
+#              bg base composition
+#              pseudo-count fraction
+#              minimal log score
+#              log scaling factor
+#
+#  02.11.2107  Giovanna Ambrosini
+
+ARGS=2         # Script takes 6 arguments, but at least 2 arguments are mandatory.
+E_BADARGS=85   # Wrong number of arguments passed to script.
+
+bg_freq=""
+pseudocnt=0
+minscore=-10000
+logscl=100
+mult=100
+format=""
+
+function usage()
+{
+  echo "Usage: `basename $0` <matrix-file> -f=<format [jaspar|transfac|lpm|pfm|real]>"
+  echo ""
+  echo "OPTIONS:"
+  echo -e "\t-h                  show help message"
+  echo -e "\t-b=<bg-freq-file>   bg nucleotide composition file"
+  echo -e "\t-c=<pseudo-cnt>     pseudo-count fraction [def=$pseudocnt]"
+  echo -e "\t-m=<minscore>       minimal score value [def=$minscore]"
+  echo -e "\t-n=<log-scaling>    log scaling factor [def=$logscl]" 
+  echo -e "\t-s=<mult>           matrix multiplication factor (for real PWMs) [def=$mult]" 
+  echo -e ""
+  echo -e "\tConvert JASPAR, TRANSFAC, PFM, LPM and real PWM formats to integer log likelihoods (log-odds) format."
+  echo ""
+}
+
+if [ $# -lt "$ARGS" ]
+then
+  usage
+  exit $E_BADARGS
+fi
+
+if [ -f "$1" ]
+then
+    matrix_file=$1
+else
+    echo "File \"$1\" does not exist."
+    exit $E_BADARGS
+fi
+
+bg_freq="bg_probs.txt"
+echo -e "A 0.25\nC 0.25\nG 0.25\nT 0.25" > $bg_freq
+
+if [ "$2" != "" ]
+then
+  format=`echo $2 | awk -F= '{print $2}'`
+fi
+
+while [ "$3" != "" ]; do
+  PARAM=`echo $3 | awk -F= '{print $1}'`
+  VALUE=`echo $3 | awk -F= '{print $2}'`
+  case $PARAM in
+      -h)
+          usage
+          exit
+          ;;
+      -b)
+          bg_freq=$VALUE
+          ;;
+      -c)
+          pseudocnt=$VALUE
+          ;;
+      -m)
+          minscore=$VALUE
+          ;;
+      -n)
+          logscl=$VALUE
+          ;;
+      -s)
+          mult=$VALUE
+          ;;
+      *)
+          echo "ERROR: unknown parameter \"$PARAM\""
+          usage
+          exit 1
+          ;;
+  esac
+  shift
+done
+
+echo "BG base composition file: $bg_freq" >&2;
+cat $bg_freq >&2;
+echo "Pseudo-count fraction: $pseudocnt" >&2;
+echo "Monimal score value:   $minscore" >&2;
+echo "Log scaling factor:    $logscl" >&2;
+
+# Extract basename from matrix file (without path)
+matrix_name=$(basename "$matrix_file")
+extension="${matrix_name##*.}"
+matrix_name="${matrix_name%.*}"
+
+echo "PWM name: $matrix_name" >&2
+echo "PWM format: $format" >&2
+
+echo "========               Computing PWM length                ========" >&2
+matrix_len=$(cat $matrix_file | perl -ane 'next if (/^#/ or /^>/); print;' | wc -l)
+
+file_len=$(cat $matrix_file | wc -l)
+
+if [ $file_len -gt $matrix_len ]
+then
+    sed 1d $matrix_file > $matrix_file".tmp"
+    matrix_file=$matrix_file".tmp"
+fi
+
+echo "PWM file: $matrix_file" >&2
+echo "PWM length: $matrix_len" >&2
+echo "PWM file length : $file_len" >&2
+
+matrix_file_tmp=$matrix_file".tmp"
+
+echo "========               PWM conversion                     ========" >&2
+if [ $format == 'jaspar' ]
+then
+  if [ $(echo "$pseudocnt > 0" | bc) -eq 1 ]
+  then
+    echo "jasparconvert.pl -c $pseudocnt -m $minscore -n $logscl -bg $bg_freq -w l -noheader -o $matrix_file_tmp $matrix_file 1>/dev/null 2>/dev/null" >&2
+    jasparconvert.pl -c $pseudocnt -m $minscore -n $logscl -bg $bg_freq -w l -noheader -o $matrix_file_tmp $matrix_file 1>/dev/null 2>/dev/null; cat $matrix_file_tmp    
+  else
+    echo "jasparconvert.pl -m $minscore -n $logscl -bg $bg_freq -w l -noheader -o $matrix_file_tmp $matrix_file 1>/dev/null 2>/dev/null" >&2
+    jasparconvert.pl -m $minscore -n $logscl -bg $bg_freq -w l -noheader -o $matrix_file_tmp $matrix_file 1>/dev/null 2>/dev/null; cat $matrix_file_tmp    
+  fi
+fi
+
+if [ $format == 'transfac' ]
+then
+  if [ $(echo "$pseudocnt > 0" | bc) -eq 1 ]
+  then
+    echo "transfaconvert.pl -c $pseudocnt -m $minscore -n $logscl -bg $bg_freq -w l -noheader -o $matrix_file_tmp $matrix_file 1>/dev/null 2>/dev/null" >&2
+    transfaconvert.pl -c $pseudocnt -m $minscore -n $logscl -bg $bg_freq -w l -noheader -o $matrix_file_tmp $matrix_file 1>/dev/null 2>/dev/null; cat $matrix_file_tmp    
+  else
+    echo "transfaconvert.pl -m $minscore -n $logscl -bg $bg_freq -w l -noheader -o $matrix_file_tmp $matrix_file 1>/dev/null 2>/dev/null" >&2
+    transfaconvert.pl -m $minscore -n $logscl -bg $bg_freq -w l -noheader -o $matrix_file_tmp $matrix_file 1>/dev/null 2>/dev/null; cat $matrix_file_tmp    
+  fi
+fi
+
+if [ $format == 'pfm' ]
+then
+  if [ $(echo "$pseudocnt > 0" | bc) -eq 1 ]
+  then
+    echo "pfmconvert.pl -c $pseudocnt -m $minscore -n $logscl -bg $bg_freq -noheader -o $matrix_file_tmp $matrix_file 1>/dev/null 2>/dev/null" >&2
+    pfmconvert.pl -c $pseudocnt -m $minscore -n $logscl -bg $bg_freq -noheader -o $matrix_file_tmp $matrix_file 1>/dev/null 2>/dev/null; cat $matrix_file_tmp    
+  else
+    echo "pfmconvert.pl -m $minscore -n $logscl -bg $bg_freq -noheader -o $matrix_file_tmp $matrix_file 1>/dev/null 2>/dev/null" >&2
+    pfmconvert.pl -m $minscore -n $logscl -bg $bg_freq -noheader -o $matrix_file_tmp $matrix_file 1>/dev/null 2>/dev/null; cat $matrix_file_tmp    
+  fi
+fi
+
+if [ $format == 'lpm' ]
+then
+  if [ $(echo "$pseudocnt > 0" | bc) -eq 1 ]
+  then
+    echo "lpmconvert.pl -c $pseudocnt -m $minscore -n $logscl -bg $bg_freq -noheader -o $matrix_file_tmp $matrix_file 1>/dev/null 2>/dev/null" >&2
+    lpmconvert.pl -c $pseudocnt -m $minscore -n $logscl -bg $bg_freq -noheader -o $matrix_file_tmp $matrix_file 1>/dev/null 2>/dev/null; cat $matrix_file_tmp    
+  else
+    echo "lpmconvert.pl -m $minscore -n $logscl -bg $bg_freq -noheader -o $matrix_file_tmp $matrix_file 1>/dev/null 2>/dev/null" >&2
+    lpmconvert.pl -m $minscore -n $logscl -bg $bg_freq -noheader -o $matrix_file_tmp $matrix_file 1>/dev/null 2>/dev/null; cat $matrix_file_tmp    
+  fi
+fi
+
+if [ $format == 'real' ]
+then
+  echo "pwmconvert.pl -n $mult -noheader -o $matrix_file_tmp $matrix_file 1>/dev/null 2>/dev/null" >&2
+  pwmconvert.pl -n $mult -noheader -o $matrix_file_tmp $matrix_file 1>/dev/null 2>/dev/null; cat $matrix_file_tmp
+fi
+
+# Remove temporary files
+if [ $file_len -gt $matrix_len ]
+then
+    rm $matrix_file
+fi
+rm $matrix_file_tmp $bg_freq
--- a/pwm_scan
+++ b/pwm_scan
@@ -220,13 +220,13 @@ then
   echo "========               Bowtie-based pipeline               ========"
   if [ $non_overlapping == 0 ]
   then
-      echo "mba -c $matrix_score -l $matrix_len $matrix_file | sort -k2,2 -nr | awk '{print \">\"\$2\"\n\"\$1}' | bowtie --threads 4 -l $matrix_len -n0 -a $bowtie_dir/$genome_idx_file -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s $assembly -l $matrix_len $chrNC_dir | awk 'BEGIN { while((getline line < \"$pwmScore_tab\") > 0 ) {split(line,f,\" \"); pvalue[f[1]]=f[2]} close(\"$pwmScore_tab\")} {print \$1\"\t\"\$2\"\t\"\$3\"\t\"\$4\"\t\"\$5\"\t\"\$6\"\t\"\"$matrix_name\"\"\t\"\"P-value=\"pvalue[\$5]}' > $pwmout_bed" 
+      echo "mba -c $matrix_score -l $matrix_len $matrix_file | awk '{print \">\"\$2\"\n\"\$1}' | bowtie --threads 4 -l $matrix_len -n0 -a $bowtie_dir/$genome_idx_file -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s $assembly -l $matrix_len $chrNC_dir | awk 'BEGIN { while((getline line < \"$pwmScore_tab\") > 0 ) {split(line,f,\" \"); pvalue[f[1]]=f[2]} close(\"$pwmScore_tab\")} {print \$1\"\t\"\$2\"\t\"\$3\"\t\"\$4\"\t\"\$5\"\t\"\$6\"\t\"\"$matrix_name\"\"\t\"\"P-value=\"pvalue[\$5]}' > $pwmout_bed" 
      echo "..."
-      mba -c $matrix_score -l $matrix_len $matrix_file | sort -k2,2 -nr | awk '{print ">"$2"\n"$1}' | bowtie --threads 4 -l $matrix_len -n0 -a $bowtie_dir/$genome_idx_file -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s $assembly -l $matrix_len -i $chrNC_dir | awk -v scoretab="$pwmScore_tab" -v pwmname="$matrix_name" 'BEGIN { while((getline line < scoretab) > 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close(scoretab)} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"pwmname"\t""P-value="pvalue[$5]}' > $pwmout_bed
+      mba -c $matrix_score -l $matrix_len $matrix_file | awk '{print ">"$2"\n"$1}' | bowtie --threads 4 -l $matrix_len -n0 -a $bowtie_dir/$genome_idx_file -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s $assembly -l $matrix_len -i $chrNC_dir | awk -v scoretab="$pwmScore_tab" -v pwmname="$matrix_name" 'BEGIN { while((getline line < scoretab) > 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close(scoretab)} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"pwmname"\t""P-value="pvalue[$5]}' > $pwmout_bed
   else
-      echo "mba -c $matrix_score -l $matrix_len $matrix_file | sort -k2,2 -nr | awk '{print \">\"\$2\"\n\"\$1}' | bowtie --threads 4 -l $matrix_len -n0 -a $bowtie_dir/$genome_idx_file -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s $assembly -l $matrix_len $chrNC_dir | filterOverlaps -l$matrix_len | awk 'BEGIN { while((getline line < \"$pwmScore_tab\") > 0 ) {split(line,f,\" \"); pvalue[f[1]]=f[2]} close(\"$pwmScore_tab\")} {print \$1\"\t\"\$2\"\t\"\$3\"\t\"\$4\"\t\"\$5\"\t\"\$6\"\t\"\"$matrix_name\"\"\t\"\"P-value=\"pvalue[\$5]}' > $pwmout_bed" 
+      echo "mba -c $matrix_score -l $matrix_len $matrix_file | awk '{print \">\"\$2\"\n\"\$1}' | bowtie --threads 4 -l $matrix_len -n0 -a $bowtie_dir/$genome_idx_file -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s $assembly -l $matrix_len $chrNC_dir | filterOverlaps -l$matrix_len | awk 'BEGIN { while((getline line < \"$pwmScore_tab\") > 0 ) {split(line,f,\" \"); pvalue[f[1]]=f[2]} close(\"$pwmScore_tab\")} {print \$1\"\t\"\$2\"\t\"\$3\"\t\"\$4\"\t\"\$5\"\t\"\$6\"\t\"\"$matrix_name\"\"\t\"\"P-value=\"pvalue[\$5]}' > $pwmout_bed" 
      echo "..."
-      mba -c $matrix_score -l $matrix_len $matrix_file | sort -k2,2 -nr | awk '{print ">"$2"\n"$1}' | bowtie --threads 4 -l $matrix_len -n0 -a $bowtie_dir/$genome_idx_file -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s $assembly -l $matrix_len -i $chrNC_dir | filterOverlaps -l$matrix_len | awk -v scoretab="$pwmScore_tab" -v pwmname="$matrix_name" 'BEGIN { while((getline line < scoretab) > 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close(scoretab)} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"pwmname"\t""P-value="pvalue[$5]}' > $pwmout_bed
+      mba -c $matrix_score -l $matrix_len $matrix_file | awk '{print ">"$2"\n"$1}' | bowtie --threads 4 -l $matrix_len -n0 -a $bowtie_dir/$genome_idx_file -f - --un unmapped.dat | sort -s -k3,3 -k4,4n | bowtie2bed -s $assembly -l $matrix_len -i $chrNC_dir | filterOverlaps -l$matrix_len | awk -v scoretab="$pwmScore_tab" -v pwmname="$matrix_name" 'BEGIN { while((getline line < scoretab) > 0 ) {split(line,f," "); pvalue[f[1]]=f[2]} close(scoretab)} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"pwmname"\t""P-value="pvalue[$5]}' > $pwmout_bed
   fi

   echo "List of matches (BED format) : $pwmout_bed"