WGS Variant Calling: Variant calling with GATK - Part 1 | Detailed NGS Analysis Workflow

Published: 10 October 2022
Channel: Bioinformagician

This is a detailed tutorial on calling variants (SNPs and indels) from whole genome sequencing (WGS) data. In this video, I follow the GATK best practice workflow, walk through setting up a pipeline in bash (Linux), and perform the steps to pre-process and align reads and ultimately produce a VCF file:
• Quality control (FastQC)
• Alignment (BWA-MEM)
• Marking duplicate reads and base quality score recalibration (BQSR)
• Variant calling using HaplotypeCaller
In addition, I provide the intuition behind each step, along with run times and memory requirements. I hope you find this video helpful! Leave your thoughts in the comment section below!
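
The steps listed above can be sketched as a short bash script. This is a minimal sketch, not the code from the video or the linked repository: the directory layout, file names, thread count, reference name (hg38.fa) and the `run` dry-run helper are all assumptions, and the known-sites VCF name is hypothetical. It also assumes the reference has already been indexed with `bwa index`. With DRY_RUN=1 (the default) it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of the workflow steps; every path and file name here is an assumption.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"                            # 1 = print commands only, 0 = execute
ref="${ref:-ref/hg38.fa}"                          # assumed reference FASTA (bwa-indexed, with .fai and .dict)
known_sites="${known_sites:-ref/known_sites.vcf}"  # hypothetical known-sites VCF for BQSR
reads="${reads:-reads}"                            # assumed directory holding the FASTQ files
out="${out:-results}"                              # assumed output directory
sample="${sample:-SRR062634}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

pipeline() {
  local r1="$reads/${sample}_1.filt.fastq.gz"
  local r2="$reads/${sample}_2.filt.fastq.gz"
  # Read group with literal "\t" separators, which bwa mem converts to tabs.
  local rg="@RG\\tID:${sample}\\tPL:ILLUMINA\\tSM:${sample}"

  run mkdir -p "$out"

  # Quality control
  run fastqc "$r1" "$r2" -o "$out"

  # Alignment
  run bwa mem -t 4 -R "$rg" -o "$out/$sample.sam" "$ref" "$r1" "$r2"

  # Mark duplicate reads (also sorts the output)
  run gatk MarkDuplicatesSpark -I "$out/$sample.sam" -O "$out/$sample.dedup.bam"

  # Base quality score recalibration: build the model, then apply it
  run gatk BaseRecalibrator -I "$out/$sample.dedup.bam" -R "$ref" \
    --known-sites "$known_sites" -O "$out/recal_data.table"
  run gatk ApplyBQSR -I "$out/$sample.dedup.bam" -R "$ref" \
    --bqsr-recal-file "$out/recal_data.table" -O "$out/$sample.bqsr.bam"

  # Variant calling
  run gatk HaplotypeCaller -R "$ref" -I "$out/$sample.bqsr.bam" -O "$out/raw_variants.vcf"
}

pipeline
```

Running the script as-is prints the planned commands so they can be reviewed first. Setting DRY_RUN=0 executes them, which requires FastQC, BWA and GATK on the PATH.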

▸ Code:
https://github.com/kpatel427/YouTubeT...

▸ Data:
ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase3/data/HG00096/sequence_read/SRR062634_1.filt.fastq.gz
ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase3/data/HG00096/sequence_read/SRR062634_2.filt.fastq.gz

▸ GATK best practice workflow:
https://gatk.broadinstitute.org/hc/en...

▸ SAM file format:
   • Understanding Bioinformatics File For...  

▸ SAM flags:
   • SAM flags explained | Understanding S...  

▸ VCF file format:
   • Understanding File Formats in Bioinfo...  


Chapters:
0:00 Intro
0:48 Aim & Intuition behind variant calling
1:51 What is GATK?
3:07 Somatic vs Germline variants
5:07 GATK best practice workflow steps
6:13 Data pre-processing steps - alignment
7:37 A note on Read Groups
9:24 Data pre-processing steps - mark duplicate reads
10:39 Data pre-processing steps - Base Quality Score Recalibration
13:08 Variant discovery
14:07 Data used for demonstration
15:09 System requirements
16:11 Setting up directories
16:45 Download data
17:55 Download reference fasta, known sites and create supporting files (.fai, .dict)
22:40 Setting directory paths
24:30 Step 1: Perform QC - FastQC
29:02 Step 2: Align reads - BWA-MEM
33:19 Step 3: Mark Duplicate Reads - GATK MarkDuplicatesSpark
35:27 Step 4: Base Quality Score Recalibration - GATK BaseRecalibrator + ApplyBQSR
38:15 Step 5: Post Alignment QC - GATK CollectAlignmentSummaryMetrics and CollectInsertSizeMetrics
40:55 Create multiQC report of post alignment metrics
42:57 Step 6: Call variants - GATK HaplotypeCaller
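
Some chapters cover setup and QC steps beyond the core pipeline: creating the reference supporting files (.fai, .dict), read groups, and post-alignment metrics. A rough sketch of what those commands could look like follows. The function names, file names and paths are hypothetical and not taken from the video, and the `run` helper only prints commands unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch of setup and post-alignment QC steps; names and paths are assumptions.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build a read-group string for `bwa mem -R`.
# Arguments: read-group ID, sample name, platform (defaults to ILLUMINA).
# The separators are literal "\t" sequences, which bwa mem converts to tabs.
read_group() {
  local id="$1" sm="$2" pl="${3:-ILLUMINA}"
  printf '@RG\\tID:%s\\tPL:%s\\tSM:%s\n' "$id" "$pl" "$sm"
}

# Create the supporting files GATK expects next to the reference FASTA.
prepare_reference() {
  local ref="$1"
  run samtools faidx "$ref"                                          # writes <ref>.fai
  run gatk CreateSequenceDictionary -R "$ref" -O "${ref%.fa}.dict"   # writes the .dict
}

# Collect alignment and insert-size metrics, then summarise them with MultiQC.
post_alignment_qc() {
  local ref="$1" bam="$2" outdir="$3"
  run gatk CollectAlignmentSummaryMetrics -R "$ref" -I "$bam" \
    -O "$outdir/alignment_metrics.txt"
  run gatk CollectInsertSizeMetrics -I "$bam" \
    -O "$outdir/insert_size_metrics.txt" -H "$outdir/insert_size_histogram.pdf"
  run multiqc "$outdir" -o "$outdir"
}

# Example dry run with assumed file names.
read_group SRR062634 SRR062634
prepare_reference ref/hg38.fa
post_alignment_qc ref/hg38.fa results/SRR062634.bqsr.bam results
```

The read group is built in its own function because downstream GATK tools rely on the ID and SM fields being present, so keeping the string in one place makes it easier to stay consistent across samples.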

You can show your support and encouragement by buying me a coffee:
https://www.buymeacoffee.com/bioinfor...

To get in touch:
Website: https://bioinformagician.org/
Github: https://github.com/kpatel427
Email: [email protected]


#bioinformagician #bioinformatics #variantcalling #variants #gatk #vcf #gvcf #haplotype #alleles #geneticvariants #mutations #gff3 #gff #gtf #sam #bam #phred #fasta #fastq #singlecell #10X #ensembl #biomart #annotationdbi #annotables #affymetrix #microarray #affy #ncbi #genomics #beginners #tutorial #howto #omics #research #biology #GEO #rnaseq #ngs