... | ... | @@ -5,7 +5,8 @@ For this tutorial we will need the [HapMap3](ftp://ftp.ncbi.nlm.nih.gov/hapmap/p |
A run of FastEpistasis has several stages, each with its own executable (the compute stage has several, to distinguish MPI from SMP architectures).
## Preparing data - ***preFastEpistasis***
The first stage gathers the data into a single binary file that holds, in a compact format, all the information required later on. Most issues arise here, as the user may run into an unrecognized file format. Like Plink, preFastEpistasis has several mandatory arguments that point to the files holding the data:

* the individual relations: Plink .fam file
* the overall genotype data: Plink .ped or .bed file
* the SNP information: Plink .map or .bim file
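
Since most problems show up at this import stage, it can help to confirm that the Plink files exist before launching the conversion. The sketch below is a hypothetical helper, not part of FastEpistasis: the function name `check_plink_fileset` is an assumption, and it only covers the binary fileset (.bed, .bim, .fam) that shares one prefix.

```bash
# Hypothetical helper (not shipped with FastEpistasis): verify that a Plink
# binary fileset is complete before passing its prefix to --bfile.
check_plink_fileset() {
    local prefix=$1 ext status=0
    for ext in bed bim fam; do
        if [ ! -r "${prefix}.${ext}" ]; then
            # Report every missing file, not just the first one.
            echo "missing or unreadable: ${prefix}.${ext}" >&2
            status=1
        fi
    done
    return "$status"
}
```

With the files of this tutorial it would be called as `check_plink_fileset FastEpistasis.data.MKK`; a non-zero exit status means at least one of the three files is absent.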
... | ... | @@ -77,4 +78,138 @@ rs4738868 |
rs7001997
rs1367975
END
```

Finally, generating the binary file FastEpistasis.data.MKK.bin for stage 2 is a matter of running preFastEpistasis with all of the files prepared above:

```bash
preFastEpistasis --bfile FastEpistasis.data.MKK \
    --pheno hapmap3_r1_b36_fwd.MKK.qc.poly.recode.fakenormphenotype.txt \
    --set pure_itr_19999.subset
```

which should output
```
╒════════════════════════════════════════════════════════════════════╕
│ Pre FastEpistasis version 2.01 build on Nov 11 2011                │
╞════════════════════════════════════════════════════════════════════╡
│ © 2010-2011 Thierry Schuepbach, Vital-IT                           │
│ Swiss Institute of Bioinformatics (SIB)                            │
╞════════════════════════════════════════════════════════════════════╡
│ For documentation & bug-report instructions:                       │
│ http://www.vital-it.ch/software/FastEpistasis                      │
│ thierry.schuepbach@sib.swiss                                       │
╞════════════════════════════════════════════════════════════════════╡
│ System informations                                                │
├────────────────────────────────────────────────────────────────────┤
│ Linux kernel  : 2.6.18-194.el5                                     │
│ Architecture  : x86_64                                             │
│ CPU vendor    : GenuineIntel                                       │
│ CPU extensions: MMXEXT SSE SSE2 SSE3 SSSE3 SSE41                   │
│ Host name     : devfrt01.vital-it.ch                               │
│ User name     : Static linking prevent username query              │
╞════════════════════════════════════════════════════════════════════╡
│ Plink data importation                                             │
├────────────────────────────────────────────────────────────────────┤
│ Family : FastEpistasis.data.MKK.fam                                │
│ Map    : FastEpistasis.data.MKK.bim                                │
│ SNPs   : FastEpistasis.data.MKK.bed                                │
├───────┬───────┬────────┬───────┬───────────┬───────┬───────┬───────┤
│ Male  │    88 │ Female │    83 │ Ambiguous │     0 │ TOTAL │   171 │
├───────┼───────┴────────┴───────┼───────────┴───────┴───────┼───────┤
│ SNPs  │                1525239 │ Memory 16 bytes alignment │   176 │
╞═══════╧════════════════════════╧═══════════════════════════╧═══════╡
│ Plink phenotype import                                             │
├────────────────────────────────────────────────────────────────────┤
│ Missing phenotype string set to -9                                 │
├────────────────────────────────────────────────────────────────────┤
│ File :      ..._r1_b36_fwd.MKK.qc.poly.recode.fakenormphenotype.txt│
│ contains 1 phenotype(s) named                                      │
│ MyTestPhenotype                                                    │
╞════════════════════════════════════════════════════════════════════╡
│ Plink set importation                                              │
├────────────────────────────────────────────────────────────────────┤
│ File : pure_itr_19999.subset                                       │
├────────┬────────┬───────┬────────┬─────┬────────┬─────────┬────────┤
│ Set A  │  19999 │ Set B │  19999 │ A&B │  19999 │ Useless │1505240 │
╞════════╧════════╧═══════╧════════╧═════╧════════╧═════════╧════════╡
│ Prune haploid and sexual chromosomes                               │
├────────┬────────┬───────┬────────┬─────┬────────┬─────────┬────────┤
│ Set A  │      0 │ Set B │      0 │ A&B │  19999 │ Useless │1505240 │
╞════════╧════════╧═══════╧════════╧═════╧════════╧═════════╧════════╡
│ Output binary file for EPISTASIS processing application            │
├────────────────────────────────────────────────────────────────────┤
│ File : FastEpistasis.data.MKK.bin                                  │
│ created Mon Dec 5 13:46:46 2011                                    │
├────────┬────────┬───────┬────────┬─────┬────────┬─────────┬────────┤
│ Set A  │      0 │ Set B │      0 │ A&B │  19999 │ Useless │1505240 │
├────────┼────────┼───────┼────────┼─────┼────┬───┴─┬────┬──┴──┬─────┤
│ EPI 1  │1.00E-04│ EPI 2 │1.00E-02│ mBB │ 2.0│ mAB │ 1.0│ mAA │ 0.0 │
╘════════╧════════╧═══════╧════════╧═════╧════╧═════╧════╧═════╧═════╛
```

Note the last line, where default values were used. The EPI 1 default is 0.0001, so p-values below this threshold are both counted and stored, whereas EPI 2 (default 0.01) sets the threshold below which p-values are only counted. mBB, mAB and mAA are the continuous values assigned to the genotypes in the fit procedure.

While mBB, mAB and mAA can only be set at this stage, the EPI thresholds can also be changed in stage 2 using the appropriate options (use --help to list them). This feature was added in version 1.07.
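
To make the role of the two thresholds concrete, here is a minimal sketch of how a single p-value would be classified under the rule described above. It is an illustration only, not FastEpistasis code: the function name `classify_pvalue` is hypothetical, and the strict "less than" comparison at the boundaries is an assumption.

```bash
# Hypothetical illustration of the EPI 1 / EPI 2 rule; not FastEpistasis code.
# Usage: classify_pvalue P [EPI1] [EPI2]   (defaults: 1e-4 and 1e-2)
classify_pvalue() {
    awk -v p="$1" -v epi1="${2:-1e-4}" -v epi2="${3:-1e-2}" 'BEGIN {
        # Adding 0 forces a numeric comparison.
        if (p + 0 < epi1 + 0)      print "counted+stored"
        else if (p + 0 < epi2 + 0) print "counted"
        else                       print "ignored"
    }'
}
```

Under these assumptions, `classify_pvalue 0.00005 0.0` falls in the "counted" class: with EPI 1 set to 0, no p-value can be below it, so nothing is stored.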
## Computing ***[smp,mpi]FastEpistasis***

Once the data has been compacted into a binary file by preFastEpistasis, running the search for the best pairs is as simple as choosing the executable that matches the architecture: SMP for shared-memory multiprocessor machines, or MPI for the Message Passing Interface implementation. For the SMP version the command is

```bash
smpFastEpistasis FastEpistasis.data.MKK.bin --method 4 --epi1 0.0
```

which sets the EPI1 threshold to 0, so that no pairs are stored, and uses method 4 (SSE3 switch QR) to compute the fit. It outputs

```
.
```

as well as, in this case, 8 files named FastEpistasis.data.MKK.epi.qt.lm_XXX.bin, where XXX runs from 0 to 7 (empty here, as EPI1 is 0), an index file FastEpistasis.data.MKK.epi.qt.lm.idx and a summary file FastEpistasis.data.MKK.epi.qt.lm.summary. Only the summary file is text; the others are binary and hold the data required for post-processing (see below).

```
< BEST >
CHR SNP         N_SIG N_TOT    PROP    CHISQ CHR SNP
-----------------------------------------------------------------------------------------------
  8 rs12544008     97 19466 0.00498 12.33298   8 rs7831336
  8 rs10113823    136 19701 0.00690 15.95015   8 rs10108613
  8 rs16926871    586 18811 0.03115 17.47989   8 rs10086120
  8 rs3864668     379 19720 0.01922 17.16080   8 rs4507760
  8 rs4738868     168 19668 0.00854 14.05168   8 rs1479904
  8 rs1367975     248 19512 0.01271 17.22512   8 rs3134474
  8 rs7001997     142 19698 0.00721 19.02015   8 rs7004336
  8 rs1835758     211 19227 0.01097 15.51307   8 rs12543741
  8 rs4033372     535 18923 0.02827 16.00133   8 rs6985242
  8 rs16926901      0     0 0.00000  0.00000   0 None
  8 rs10104116    262 19617 0.01336 16.34661   8 rs2029820
  8 rs9298048     241 19712 0.01223 14.70801   8 rs2436854
  8 rs10112795    433 18795 0.02304 24.66882   8 rs1481847
  8 rs3864667     300 19403 0.01546 20.26660   8 rs450738
  8 rs4738870     159 19546 0.00813 18.73731   8 rs2046370
  8 rs4738869     285 19680 0.01448 21.64085   8 rs10098671
  8 rs7824081     433 18795 0.02304 24.66882   8 rs1481847
  8 rs10103997    238 19698 0.01208 16.57859   8 rs10098671
  8 rs16926906    246 19142 0.01285 15.97859   8 rs12543741
  8 rs7830295      65 19667 0.00331 11.08873   8 rs3102545
  8 rs6997436     235 19691 0.01193 16.72613   8 rs3133759
  8 rs16926934    145 19571 0.00741 15.35286   8 rs1492649
  8 rs956969      208 19435 0.01070 19.16633   8 rs1384769
  8 rs16926940    421 19468 0.02163 16.12952   8 rs6985810
  8 rs2003204     333 19652 0.01694 18.83094   8 rs1473541
  8 rs17831865    326 19629 0.01661 19.63534   8 rs1473541
  8 rs12545580    421 19468 0.02163 16.12952   8 rs6985810
  8 rs2017819     326 19629 0.01661 19.63534   8 rs1473541
  8 rs3852340     203 19496 0.01041 16.24189   8 rs10106503
  8 rs7008918     241 19696 0.01224 15.83022   8 rs3133759
  8 rs6997461     213 19680 0.01082 15.13975   8 rs3133759
  8 rs4144413     221 19277 0.01146 19.04957   8 rs1384769
  8 rs16926976    291 19572 0.01487 15.16305   8 rs4386964
  8 rs7830371     249 19565 0.01273 14.04788   8 rs514589
  8 rs7010431     279 19534 0.01428 21.92911   8 rs4507760
  8 rs2931309     247 19685 0.01255 16.73396   8 rs11995526
  8 rs2931308     535 18923 0.02827 16.00133   8 rs6985242
  8 rs4738873     175 19698 0.00888 12.29373   8 rs11784029
  8 rs10112069    414 19682 0.02103 17.17968   8 rs6989096
  8 rs3852341     243 19166 0.01268 15.50652   8 rs4335099
  8 rs2978508     583 19603 0.02974 21.63617   8 rs723508
  8 rs11785638    131 19639 0.00667 14.80778   8 rs6986442
  8 rs6471937     197 19690 0.01001 14.44272   8 rs7015958
...
```

Unless one wants to perform statistical analysis on the overall data, the summary file holds all the information about the best pairs.
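
Because the summary is plain text, the strongest pairs can be pulled out with standard tools. The sketch below assumes the summary file has the whitespace-separated layout of the listing above, with CHISQ in the sixth column; the function name `top_pairs` is hypothetical and not part of FastEpistasis.

```bash
# Hypothetical helper: print the n rows with the highest CHISQ value.
# Usage: top_pairs SUMMARY_FILE [N]   (N defaults to 10)
top_pairs() {
    local summary=$1 n=${2:-10}
    # Keep only 8-field rows whose first field is a chromosome number; this
    # drops the "< BEST >" banner, the column header and the dashed rule.
    awk 'NF == 8 && $1 ~ /^[0-9]+$/' "$summary" |
        LC_ALL=C sort -k6,6nr |
        head -n "$n"
}
```

For the run above this would be invoked as `top_pairs FastEpistasis.data.MKK.epi.qt.lm.summary 5`. Rows without a partner (reported as `None`) carry a CHISQ of 0 and therefore sort last.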

## Post processing ***postFastEpistasis***