<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Visualisation</title>
<style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
pre:not([class]) {
background-color: white;
}
</style>
<style type="text/css">
h1 {
font-size: 34px;
}
h1.title {
font-size: 38px;
}
h2 {
font-size: 30px;
}
h3 {
font-size: 24px;
}
h4 {
font-size: 18px;
}
h5 {
font-size: 16px;
}
h6 {
font-size: 12px;
}
.table th:not([align]) {
text-align: left;
}
</style>
<style type="text/css">
.main-container {
max-width: 940px;
margin-left: auto;
margin-right: auto;
}
code {
color: inherit;
background-color: rgba(0, 0, 0, 0.04);
}
img {
max-width:100%;
height: auto;
}
.tabbed-pane {
padding-top: 12px;
}
.html-widget {
margin-bottom: 20px;
}
button.code-folding-btn:focus {
outline: none;
}
summary {
display: list-item;
}
</style>
<body>
<div class="container-fluid main-container">
<!-- setup 3col/9col grid for toc_float and main content -->
<h1 class="title toc-ignore">Visualisation</h1>
<h4 class="author">Yann Christinat</h4>
<address class="author_afil">Geneva University Hospitals (HUG)</address>
<h4 class="author">Tutorial adapted from Whalid Gharib (Swiss Institute of Bioinformatics) who adapted it from:<br>
1-Adapted from Griffith lab at the McDonnell Genome Institute, Washington University School of Medicine, St.&nbsp;Louis.<br>
2-Adapted from MRC Clinical Sciences Centre Bioinformatics Team at Imperial College London, Hammersmith Hospital and Mark Dunning of CRUK, Cambridge Institute</h4>
<h4 class="author"><strong><em>This work is shared under Creative Commons Attribution ShareAlike 3.0</em></strong></h4>
<div id="variants-visualisation-in-igv" class="section level1">
<div name="variants_visualisation_in_igv" data-unique="variants_visualisation_in_igv"></div>
<h1>Variants visualisation in IGV</h1>
<p>In this section we will be looking at how IGV can be used for visualizing mutations in sequence data. Here we consider the scenario in which genome sequencing
has been performed on a DNA sample, sequence reads have been aligned to the reference genome and a variant caller such as GATK HaplotypeCaller or MuTect2 has been run.</p>
<p>We will inspect some regions of the genome where there are possible variants in a breast cancer cell line to determine whether these are real events or artifacts.
These will include single nucleotide variants (SNVs), small insertions and deletions (indels) and larger structural rearrangements.</p>
<hr>
<h1>IGV setup</h1>
<p>Download and install IGV from the Broad Institute website (<a target="_blank" href="http://software.broadinstitute.org/software/igv/download">http://software.broadinstitute.org/software/igv/download</a>).</p>
<div id="hcc1143-data-set" class="section level3">
<div name="hcc1143_data_set" data-unique="hcc1143_data_set"></div>
<h3>HCC1143 data set</h3>
<p>We will be using publicly available Illumina sequence data generated for the HCC1143 cell line. The HCC1143 cell line was generated from a 52-year-old Caucasian
woman with breast cancer. Additional information on this cell line can be found here: <a target="_blank" href="http://www.atcc.org/products/all/CRL-2321.aspx">HCC1143</a> (tumor,
TNM stage IIA, grade 3, primary ductal carcinoma) and <a target="_blank" href="http://www.atcc.org/products/all/CRL-2362.aspx">HCC1143/BL</a> (matched normal EBV transformed
lymphoblast cell line).</p>
<p>Sequence reads were aligned to version GRCh37 of the human reference genome. We will be working with subsets of aligned reads in the region: chromosome 21: 19,000,000 - 20,000,000.</p>
<p>The BAM files containing these reads for the cancer cell line and the matched normal are:</p>
<ul>
<li><code>HCC1143.tumour.21.19M-20M.bam</code></li>
<li><code>HCC1143.normal.21.19M-20M.bam</code></li>
</ul>
<p>To download the BAM files, please click <a href="https://drive.switch.ch/index.php/s/TMxxxEkfgK4jKCh/download">here</a>.</p>
<p>These need to be indexed to be read into IGV. The index files have the .bai suffix and allow IGV to speedily access and display the reads aligning to a specified genomic location.</p>
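<p>Before loading a BAM file it can be worth checking that its index sits next to it. The sketch below is a minimal example, assuming the two common naming conventions (<code>sample.bam.bai</code> and <code>sample.bai</code>); the helper name <code>find_bam_index</code> is hypothetical and not part of IGV.</p>

```python
from pathlib import Path


def find_bam_index(bam_path):
    """Return the index file found next to a BAM file, or None if there is none.

    Assumption: the index is named either sample.bam.bai or sample.bai.
    """
    bam = Path(bam_path)
    candidates = [
        bam.with_name(bam.name + ".bai"),  # sample.bam.bai
        bam.with_suffix(".bai"),           # sample.bai
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None


# Check the two tutorial files in the current directory.
for name in ["HCC1143.tumour.21.19M-20M.bam", "HCC1143.normal.21.19M-20M.bam"]:
    index = find_bam_index(name)
    print(name, "->", index if index else "no index found")
```

<p>If an index is missing, it can usually be created with <code>samtools index</code> on a coordinate-sorted BAM file.</p>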
<p>The reads are from paired end sequencing. DNA fragments of approximately 350 base pairs have been sequenced from each end. The read lengths are 101bp.</p>
<hr>
</div>
<div id="load-aligned-sequence-data" class="section level3">
<div name="load_aligned_sequence_data" data-unique="load_aligned_sequence_data"></div><h3>Load aligned sequence data</h3>
<p>First we need to ensure that IGV is using the same reference genome as that to which the sequence data were aligned, GRCh37, also known as hg19.</p>
<ul>
<li>Select <code>Human hg19</code> from the drop-down list in the top left of the IGV window.</li>
</ul>
<p>Now we’re ready to load the sequence data.</p>
<ul>
<li>Select <code>File &gt; Load from File...</code> from the main menu and select the BAM file <code>HCC1143.normal.21.19M-20M.bam</code> using the file browser.</li>
</ul>
<div class="figure">
<img src="./Tutorial_IGV-images/load_sequence_data.png">
</div>
<p>This BAM file only contains data for a 1 megabase region of chromosome 21. Let’s navigate there to see what genes this region covers. To do so, either:</p>
<ul>
<li>Click on the Home button on the toolbar to jump to the whole genome view, select chromosome 21 in the drop-down list or in the ‘genome ruler’ in the top pane, then click and drag from 19 Mb to 20 Mb</li>
</ul>
<p>or</p>
<ul>
<li>Enter <code>chr21:19,000,000-20,000,000</code> in the genome position box just to the left of the Home button</li>
</ul>
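<p>The locus notation used in the position box is <code>chromosome:start-end</code>, with optional thousands separators. As a rough illustration of how such a string breaks down, here is a small parser; the function name <code>parse_locus</code> is hypothetical and the sketch only handles this one form of locus.</p>

```python
import re

# chrom:start-end, where start and end may contain thousands separators
LOCUS_PATTERN = re.compile(
    r"^(?P<chrom>[^:\s]+):(?P<start>\d[\d,]*)-(?P<end>\d[\d,]*)$"
)


def parse_locus(locus):
    """Split a locus such as 'chr21:19,000,000-20,000,000' into its parts."""
    match = LOCUS_PATTERN.match(locus.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end locus: {locus!r}")
    start = int(match.group("start").replace(",", ""))
    end = int(match.group("end").replace(",", ""))
    if start > end:
        raise ValueError(f"start is after end in {locus!r}")
    return match.group("chrom"), start, end


chrom, start, end = parse_locus("chr21:19,000,000-20,000,000")
# Both coordinates are inclusive, so the width is end - start + 1.
print(chrom, start, end, end - start + 1)
```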
<p><strong>Note:</strong> by default overlapping genes, e.g.&nbsp;on different strands, or different isoforms of a gene are collapsed to a single line; these can be expanded by right-clicking the gene track and selecting <code>Expanded</code> from the menu.</p>
<div class="figure">
<img src="./Tutorial_IGV-images/expand_gene_track.png">
</div>
<p>The 1 Mb region contains too many reads to be displayed by IGV. We need to zoom in to see the read alignments. There are several ways to do this:</p>
<ul>
<li><p>Double-click within the main panel (the one that currently says ‘Zoom in to see alignments’); do this a few times until aligned reads are loaded and displayed. Note that the view after zooming is centred on the position that was clicked.</p></li>
<li><p>Click and drag to select the desired region within the genome ruler</p></li>
<li><p>Click on the <code>+</code> button on the slider at the top right of the IGV window, drag the slider bar toward the <code>+</code> button, or click on one of the bars within the slider to go to the desired zoom level.</p></li>
</ul>
<div class="figure">
<img src="./Tutorial_IGV-images/zooming_in_on_alignments.png">
</div>
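<p>The manual steps in this section (choose the genome, load a BAM file, navigate to a region) can also be written as an IGV batch script and run from <code>Tools &gt; Run Batch Script...</code>. The sketch below only generates the script text. It assumes the batch commands <code>new</code>, <code>genome</code>, <code>load</code>, <code>goto</code> and <code>snapshot</code> behave as described in the IGV batch documentation for your version; the function name <code>igv_batch_script</code> is hypothetical.</p>

```python
def igv_batch_script(bam_path, locus, genome="hg19", snapshot_name=None):
    """Build the text of a small IGV batch script.

    Assumption: the IGV version in use accepts the batch commands
    new, genome, load, goto and snapshot.
    """
    lines = [
        "new",                                # start a fresh session
        f"genome {genome}",                   # same reference as the alignments
        f"load {bam_path}",                   # BAM file; its index must sit beside it
        f"goto {locus.replace(',', '')}",     # commas removed to keep the locus plain
    ]
    if snapshot_name:
        lines.append(f"snapshot {snapshot_name}")
    return "\n".join(lines) + "\n"


print(igv_batch_script("HCC1143.normal.21.19M-20M.bam",
                       "chr21:19,479,200-19,479,800"))
```

<p>Saving the printed text to a file gives a repeatable way to return to the same view.</p>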
<hr>
</div>
<div id="snvs-in-the-coverage-track" class="section level3">
<div name="snvs_in_the_coverage_track" data-unique="snvs_in_the_coverage_track"></div><h3>SNVs in the coverage track</h3>
<p>Possible variants are highlighted in the coverage track where the allele fraction is above a configurable threshold. These are the coloured stacked bars within what is a mostly grey coverage plot, where the coloured portion of each bar represents the fraction of reads with different alleles at that position.</p>
<p>Zoom in on one of these coloured bars and hover the cursor over it to show a tooltip that summarizes the number of reads aligned at the position for each of the different alleles.</p>
<div class="figure">
<img src="./Tutorial_IGV-images/variant_in_coverage_track.png">
</div>
<p>The example shown above looks like a heterozygous single nucleotide polymorphism (SNP) with an allele fraction of approximately 0.5. We can load an annotation track for the dbSNP database of common polymorphisms to see if this is a known SNP.</p>
<ul>
<li>Select <code>File &gt; Load from Server...</code> from the main menu and then select <code>Available Datasets &gt; Annotations &gt; Variation and Repeats &gt; dbSNP</code></li>
</ul>
<p><strong>Note:</strong> there are very many SNPs so it may take a few seconds to load the dbSNP track.</p>
<div class="figure">
<img src="./Tutorial_IGV-images/load_dbsnp_annotations.png">
</div>
<p>The dbSNP track is at the bottom of the IGV window. Black bars represent known SNPs.</p>
<div class="figure">
<img src="./Tutorial_IGV-images/dbsnp_annotation.png">
</div>
<p><strong>Note:</strong> hovering over a SNP will display a tooltip containing more details about the SNP including population allele frequencies; clicking on a SNP will open the dbSNP entry in your web browser.</p>
<p>We can adjust the allele fraction threshold above which the bar in the coverage track will be coloured by allele read count using the <code>View &gt; Preferences</code> dialog.</p>
<ul>
<li><p>Select <code>View &gt; Preferences...</code> from the main menu</p></li>
<li><p>Select the <code>Alignments</code> tab from the preferences dialog</p></li>
<li><p>Change the Coverage allele-fraction threshold to 0.01 and click the <code>OK</code> button.</p></li>
</ul>
<div class="figure">
<img src="./Tutorial_IGV-images/view_preferences.png">
</div>
<p>Decreasing the threshold shows more possible variants; increasing it shows fewer. In this dataset, with an average depth of around 60 reads at each position, lowering the threshold to 0.01 results in several additional coloured bars, many of which have a single read supporting an alternative allele to the reference base. We’ll use a threshold of 0.05 for the rest of this tutorial.</p>
<ul>
<li>Reset the Coverage allele-fraction threshold to 0.05</li>
</ul>
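<p>To see why a threshold of 0.01 lets single-read mismatches through at a depth of about 60, it helps to work out the fractions. The sketch below is a simplification: it uses raw read counts and a strict "greater than" comparison, whereas IGV may weight reads by base quality. The function names are hypothetical.</p>

```python
def allele_fractions(counts):
    """Convert per-base read counts at one position into fractions of the depth."""
    depth = sum(counts.values())
    if depth == 0:
        return {}
    return {base: n / depth for base, n in counts.items()}


def is_highlighted(counts, ref_base, threshold=0.05):
    """True if the non-reference fraction exceeds the threshold.

    Assumption: plain read counts, no base-quality weighting.
    """
    depth = sum(counts.values())
    if depth == 0:
        return False
    mismatches = depth - counts.get(ref_base, 0)
    return mismatches / depth > threshold


# One mismatching read out of 60 is a fraction of about 0.017.
single_read = {"C": 59, "T": 1}
print(is_highlighted(single_read, "C", threshold=0.01))  # above 0.01
print(is_highlighted(single_read, "C", threshold=0.05))  # below 0.05

# A heterozygous site has roughly half the reads on each allele.
print(allele_fractions({"C": 30, "T": 30}))
```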
<p>Zoom out again and observe the uneven coverage across the region. In some parts of the region the coverage drops to zero. It will be much more difficult to reliably identify variants in a low coverage region.</p>
<p>We can load another annotation track for GC content to help understand why the coverage is uneven.</p>
<ul>
<li>Select <code>File &gt; Load from Server...</code> from the main menu and then select <code>Available Datasets &gt; Annotations &gt; Sequence and Regulation &gt; GC Percentage</code></li>
</ul>
<p>The coverage appears to correlate with GC content. Next-generation sequencing technologies tend to lose coverage in regions with low GC content.</p>
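<p>A GC percentage track is simply the proportion of G and C bases in successive windows along the reference. The sketch below illustrates the calculation on a short made-up sequence; the window size and function names are assumptions for illustration and do not reflect how the IGV track was built.</p>

```python
def gc_percent(sequence):
    """Percentage of G and C among the called (A, C, G, T) bases, or None."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None  # e.g. a window made only of N
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / called


def windowed_gc(sequence, window=5):
    """GC percentage for consecutive, non-overlapping windows."""
    return [
        (start, gc_percent(sequence[start:start + window]))
        for start in range(0, len(sequence), window)
    ]


# Hypothetical toy sequence: a GC-rich stretch followed by an AT-rich one.
for start, gc in windowed_gc("GCGCGATATATTTAA", window=5):
    print(start, gc)
```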
<p>You can also use a collapsed view of the alignments which for this depth of sequencing will allow all the reads aligning in this region to be visible without the need for scrolling.</p>
<ul>
<li>Navigate to <code>chr21:19,611,000-19,631,000</code></li>
<li>Right click in the main alignment track and select <code>Collapsed</code> from the menu</li>
</ul>
<div class="figure">
<img src="./Tutorial_IGV-images/gc_coverage.png">
</div>
<p>The read pileup mirrors the coverage track.</p>
<p>We’ll now remove the GC Percentage track to allow more screen real estate for the read alignments and other tracks used in the next part of the tutorial.</p>
<ul>
<li>Right click on the GC Percentage track and select <code>Remove Track</code> from the menu, then click <code>Yes</code> to confirm</li>
</ul>
<div class="figure">
<img src="./Tutorial_IGV-images/remove_gc_track.png">
</div>
<ul>
<li>Click and drag the divider between what was the GC Percentage track and the alignment track to shrink the now empty upper track.</li>
</ul>
<hr>
</div>
<h1>Exercises</h1>
<div id="examining-read-alignments" class="section level3">
<div name="examining_read_alignments" data-unique="examining_read_alignments"></div>
<h3>Examining read alignments</h3>
<p>We’re now going to examine read alignments at several genomic loci where there are possible variants.</p>
<div id="high-confidence-snvssnps" class="section level4">
<h4>High confidence SNVs/SNPs</h4>
<ul>
<li><p>Navigate to <code>chr21:19,479,200-19,479,800</code></p></li>
<li><p>Right click in the alignment track and select <code>Color alignments by &gt; insert size and pair orientation</code></p></li>
</ul>
<div class="figure">
<img src="./Tutorial_IGV-images/two_neighbouring_snvs.png">
</div>
<p>There are two heterozygous variants.</p>
<p><strong>Q1</strong> <em>Which of these corresponds to a known SNP?</em></p>
<p><strong>Q2</strong> <em>What is the population allele frequency of the alternate (non-reference) allele?</em></p>
<p><strong>Q3</strong> <em>Why are some read alignments represented as coloured bars?</em> (hint: bring up the tooltip for these reads and look at insert size, mate start and pair orientation, compare with normal read alignments displayed as grey bars)</p>
<p>Let’s take a closer look at one of these two SNVs.</p>
<ul>
<li><p>Zoom in and centre on the C/T SNV on the left (chr21:19,479,321)</p></li>
<li><p>Right click in the alignment track and select <code>Expanded</code></p></li>
<li><p>Right click in the alignment track at this exact position and select <code>Sort alignments by &gt; base</code></p></li>
<li><p>Right click in the alignment track and select <code>Color alignments by &gt; read strand</code></p></li>
<li><p>Right click again in the alignment track and ensure that <code>Shade mismatch bases by quality</code> is selected</p></li>
<li><p>Hover the cursor over the red T bases in reads that support the SNV to display a tooltip providing useful details such as the quality value for the T base or the read mapping quality</p></li>
</ul>
<div class="figure">
<img src="./Tutorial_IGV-images/high_confidence_snv.png">
</div>
<p><strong>Notes</strong></p>
<ul>
<li>High base qualities in all reads except one where the alternate allele is the last base in the read</li>
<li>Good mapping quality of reads</li>
<li>No strand bias</li>
<li>Allele fraction consistent with heterozygous mutation</li>
</ul>
<p><strong>Q4</strong> <em>Why is ‘Shade base by quality’ helpful for scrutinizing potential SNVs?</em></p>
<p><strong>Q5</strong> <em>How does ‘Color by read strand’ help?</em></p>
<p>Strand bias is where reads supporting a variant align to one strand, i.e.&nbsp;in the forward or the reverse direction, and not the other. It is associated with false positive variant calls.</p>
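<p>One common way to put a number on strand bias is a Fisher's exact test on the 2x2 table of reference/alternate reads by forward/reverse strand. The sketch below is a plain-Python version for illustration, with made-up counts; it is not the calculation performed by IGV or by any particular variant caller.</p>

```python
from math import comb


def strand_bias_p_value(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Two-sided Fisher's exact test p-value for a 2x2 strand table.

    A small p-value means the alternate allele is distributed across
    strands differently from the reference allele.
    """
    alt_total = alt_fwd + alt_rev
    fwd_total = ref_fwd + alt_fwd
    total = ref_fwd + ref_rev + alt_total
    rev_total = total - fwd_total

    def prob(k):
        # Probability that k of the alt reads are forward, margins fixed.
        return comb(fwd_total, k) * comb(rev_total, alt_total - k) / comb(total, alt_total)

    observed = prob(alt_fwd)
    lowest = max(0, alt_total - rev_total)
    highest = min(alt_total, fwd_total)
    # Sum every table that is no more likely than the observed one.
    p_value = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts: alt reads balanced across strands, then all on one strand.
print(strand_bias_p_value(10, 10, 10, 10))
print(strand_bias_p_value(10, 10, 8, 0))
```

<p>The first table shows no evidence of bias; in the second, all eight alternate reads are on the forward strand and the p-value drops below 0.05.</p>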
<hr>
</div>
<div id="homozygous-deletion" class="section level4">
<h4>Homozygous deletion</h4>
<ul>
<li><p>Navigate to region <code>chr21:19,324,500-19,331,500</code></p></li>
<li><p>Right click in the main alignment track and select:</p>
<ul>
<li><code>Expanded</code> view</li>
<li><code>View as pairs</code></li>
<li><code>Color alignments by &gt; insert size and pair orientation</code></li>
<li><code>Sort alignments by &gt; insert size</code></li>
</ul></li>
<li><p>Hover over one of the red read pairs to display information about the alignments for both ends</p></li>
</ul>
<div class="figure">
<img src="./Tutorial_IGV-images/homozygous_deletion.png">
</div>
<p><strong>Notes</strong></p>
<ul>
<li>The average insert size of a read pair for this sample/library is 350bp</li>
<li>Insert size of red read pairs is 2875bp</li>
<li>This corresponds to a homozygous deletion of 2.5kb</li>
</ul>
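<p>The deletion size in the notes above follows from a simple subtraction: read pairs that span a deletion appear further apart on the reference than the library's fragment size. As a worked example, using the two values quoted in the notes (the function name is hypothetical):</p>

```python
def estimate_deletion_size(observed_insert_size, expected_insert_size):
    """Estimate deletion length from the excess insert size of spanning pairs."""
    return observed_insert_size - expected_insert_size


# Values from the notes: 2875 bp observed, about 350 bp expected.
size = estimate_deletion_size(2875, 350)
print(size, "bp, i.e. about", round(size / 1000, 1), "kb")
```

<p>This is only an estimate, since fragment sizes vary around the average; the clipped reads at the breakpoints give the exact boundaries.</p>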
<p>Reads that span a rearrangement often have clipped alignments and these can be viewed in IGV.</p>
<ul>
<li><p>Turn off <code>View as pairs</code> and zoom in to the left hand end of the deletion.</p></li>
<li><p>Open the view preferences dialog and select <code>Show soft-clipped bases</code> in the Alignments tab.</p></li>
</ul>
<p><strong>Q6</strong> <em>What do you notice about the clipped sequence from the junction-spanning reads?</em></p>
<p>Repeat for the other end of the deletion.</p>
<hr>
</div>
<div id="homopolymer-region-with-indel" class="section level4">
<h4>Homopolymer region with indel</h4>
<ul>
<li><p>Navigate to region <code>chr21:19,375,400-19,375,500</code></p></li>
<li><p>Right click in the alignment track and turn off ‘Shade base by quality’</p></li>
</ul>
<div class="figure">
<img src="./Tutorial_IGV-images/homopolymer_region_with_indel.png">
</div>
<p><strong>Q7</strong> <em>What do the purple</em> <code>I</code> <em>symbols represent?</em></p>
<p><strong>Q8</strong> <em>Several read alignments have mismatches. Can you see how these could have been aligned differently to be more consistent with other reads?</em></p>
<p><strong>Q9</strong> <em>How would you summarize the differences between HCC1143/BL and the reference sequence?</em></p>
<p><strong>Notes</strong></p>
<ul>
<li><p>Aligners often penalize opening a gap more heavily than allowing 2 or 3 mismatches towards the ends of reads; this can be a source of false positive variant calls</p></li>
<li><p>Common variants from dbSNP include some cases that are actually common misalignments caused by repeats</p></li>
</ul>
<hr>
</div>
</div>
<div id="comparing-alignments-for-different-samples" class="section level3">
<div name="comparing_alignments_for_different_samples" data-unique="comparing_alignments_for_different_samples"></div>
<h3>Comparing alignments for different samples</h3>
<p>Multiple alignment tracks can be viewed alongside each other. This can be helpful when comparing the variants between related samples, e.g.&nbsp;comparing a cancer genome with the matched normal to detect somatic variants.</p>
<p>Another scenario where this would be useful is in looking for possible de novo mutations or autosomal recessive mutations within a parent-child trio. In this case, we would display the genomic read alignments for the mother and father alongside those for the child.</p>
<div id="somatic-snv" class="section level4">
<h4>Somatic SNV</h4>
<p>We’ll load the alignments for the HCC1143 cell line alongside those for the matched normal that we’ve been looking at so far.</p>
<ul>
<li><p>Right click in the main alignment track and turn off <code>View as pairs</code></p></li>
<li><p>Select <code>File &gt; Load from File...</code> from the main menu and select the BAM file <code>HCC1143.tumour.21.19M-20M.bam</code> using the file browser.</p></li>
<li><p>Navigate to <code>chr21:19,544,728-19,544,828</code></p></li>
<li><p>Select the <code>Collapsed</code> view for both alignment tracks, tumour and normal.</p></li>
</ul>
<div class="figure">
<img src="./Tutorial_IGV-images/somatic_snv.png">
</div>
<p>Support for an A&gt;T mutation at <code>chr21:19,544,778</code> is only evident in the tumour. This is a somatic SNV.</p>
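<p>The reasoning behind "only evident in the tumour" can be sketched as a comparison of allele fractions in the two samples. The example below is a deliberately naive illustration with made-up read counts and a hypothetical function name; real somatic callers such as MuTect2 use statistical models rather than a fixed cut-off.</p>

```python
def classify_variant(tumour_alt, tumour_depth, normal_alt, normal_depth,
                     min_fraction=0.05):
    """Label a variant by where the alternate allele is supported.

    Assumption: a sample "supports" the variant when its alternate
    allele fraction reaches min_fraction.
    """
    tumour_af = tumour_alt / tumour_depth if tumour_depth else 0.0
    normal_af = normal_alt / normal_depth if normal_depth else 0.0
    in_tumour = tumour_af >= min_fraction
    in_normal = normal_af >= min_fraction
    if in_tumour and in_normal:
        return "germline"
    if in_tumour:
        return "somatic"
    if in_normal:
        return "normal_only"
    return "no_support"


# Hypothetical counts, not taken from the HCC1143 data.
print(classify_variant(tumour_alt=20, tumour_depth=60, normal_alt=0, normal_depth=55))
print(classify_variant(tumour_alt=28, tumour_depth=60, normal_alt=30, normal_depth=58))
```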
<p><strong>Q10</strong> <em>Is there any reason to doubt that this mutation is real?</em></p>
<hr>
</div>
<div id="loss-of-heterozygosity" class="section level4">
<h4>Loss of heterozygosity</h4>
<ul>
<li>Zoom out to see a 20kb region surrounding the somatic SNV we just examined at <code>chr21:19,544,778</code></li>
</ul>
<div class="figure">
<img src="./Tutorial_IGV-images/loss_of_heterozygosity.png">
</div>
<p><strong>Q11</strong> <em>What do you notice when comparing the variants that are visible in the coverage tracks in the tumour and normal?</em></p>
<hr>
</div>
<div id="germline-indel" class="section level4">
<h4>Germline indel</h4>
<ul>
<li><p>Navigate to and inspect the 8bp deletion at <code>chr21:19,956,710</code></p></li>
<li><p>View each of the tumour and normal tracks using the <code>Expanded</code> view</p></li>
</ul>
<div class="figure">
<img src="./Tutorial_IGV-images/germline_deletion.png">
</div>
<p><strong>Note:</strong> dbSNP contains common small insertions and deletions including this deletion.</p>
<p><strong>Q12</strong> <em>What fraction of the individuals sequenced as part of the 1000 Genomes Project also have this deletion?</em></p>
<p><strong>Q13</strong> <em>There is some ambiguity about which 8 bases have been deleted (i.e. the deletion is not present in all reads). Can you see why and what other sequences of 8 bases could have been used to
represent the deletion?</em></p>
<p>Reads containing indels have been left-aligned to standardize the representation when multiple valid representations are possible, i.e.&nbsp;when the same indel can be placed at multiple positions. The standard convention is to place an indel at the left-most position possible.</p>
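<p>Left-alignment can be illustrated with a deletion inside a short repeat. The sketch below shifts a deletion to its left-most equivalent position and lists every start position that yields the same deleted sequence. The reference string is a made-up toy example, not the sequence at <code>chr21:19,956,710</code>, coordinates are 0-based, and the function names are hypothetical.</p>

```python
def left_align_deletion(reference, start, length):
    """Return the left-most start that gives the same sequence after deletion.

    Shifting one base left is allowed when the base entering the deleted
    span equals the base leaving it.
    """
    while start > 0 and reference[start - 1] == reference[start + length - 1]:
        start -= 1
    return start


def equivalent_deletion_starts(reference, start, length):
    """All start positions whose deletion produces the identical sequence."""
    target = reference[:start] + reference[start + length:]
    return [
        s
        for s in range(len(reference) - length + 1)
        if reference[:s] + reference[s + length:] == target
    ]


# Toy reference with an AC repeat; delete 2 bases starting at index 6.
reference = "GGACACACTT"
print(left_align_deletion(reference, 6, 2))
print(equivalent_deletion_starts(reference, 6, 2))
```

<p>Every start position in the printed list removes a different pair of bases yet leaves the same sequence, which is why a convention is needed to report such deletions consistently.</p>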
<hr>
</div>
</div>
</div>
</div>