Skip to content

Re seq

将测序结果比对至已知reference,可探测个体的突变信息

Pipeline

对每个样本:Mapping 后 drop PCR duplicates + 低质量的Mapping (MAPQ<10),call variants:

Variants Detection Annotation
SNP/INDEL GATK ANNOVAR: UTR/exon/upstream/splicing/... + CDS: synonymous/frameshift/stopgain/...
SV BreakDancer dbVar
CNV CNVnator --

此外还有常用突变频率数据库(G1000/ExAC/ESP6500/dbSNP),功能预测数据库(dbNSPF),突变治病信息与表型数据库(HGMD/dbGap/ClinVar)

随后,针对不同的实验设计,可进行群体分析/GWAS/Binmap/QTL/... TBA

Notations

NM_000500 .7 (CYP21A2): c. 293-13 C>G

NM_000500 .7  转录本/染色体名称+版本号
CYP21A2       基因名称
c./n./g.      coding/non-coding/genome
C>G           mutation
  • 纯合突变hom/杂合突变het
hom  所有等位基因都发生突变,对于二倍体:    aa->AA
het    等位基因之一发生突变,对于二倍体:    aa->Aa
  • RIL群体:F2开始连续自交或同胞交配直至趋于纯合,过程中经历多次减数分裂,重组程度高于F2,比F2更能解析F1代配子的分离。

  • SNP

Ref     TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

Reads                   TTTTTTTTTTTCTTTTTTTT    SNP: T>C
                      TTTTTTTTTTTTTCTTTTTT  
                  TTTTTTTTTTTTTTTTTGTT          Sequencing Error
                          TTTTTTTTTCTTTTTTTTTT

考虑到+/-链的问题:正链(T>C)==负链(A>G),故统称 T:A>C:G
情形共有: C:G>A:T  |  C:G>G:C  |  C:G>T:A  |  T:A>A:T  |  T:A>C:G  |  T:A>G:C 
  • INDEL
Reads              TTTTTTTTTTT                Deletion: GGG deleted in query seq
                       / \
Ref     TTTTTTTTTTTTTTTGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTT
                                / \
Reads                   TTTTTTTTAAATTTTT      Insertion: AAA inserted to query seq
  • SV (Structural variation),下文 CNV 是特殊的 SV
RefRegion1     TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
RefRegion2     GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

Reads              TTTTTTTTGGGGGG           SV: >1kb region, can be cross-chromosomal
  • CNV (Copy Number Variation)检测方法:根据深度/断裂点/突变基因型
对于二倍体:

Ref          ===========GeneA==========
缺失         ==========================
重复         ===========GeneAGeneA======

或发生在两条姐妹染色单体之间:
             -----------GeneAGeneA------
             ---------------------------