
Mummer

Global alignment

Install

sudo apt-get install g++
sudo apt-get install csh
sudo apt-get install gnuplot
sudo apt-get install gnuplot-x11



conda create -n mummer4
conda activate mummer4
conda install -c bioconda mummer4
conda deactivate
export PATH=$PATH:/mnt/d/WSL_dir/home/miniconda3/envs/mummer4/bin/
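A quick sanity check after the install can save time later. The sketch below lists whichever of the commands used in this post are not on PATH. The function name missing_tools is made up for illustration, and the tool list is simply the set of commands that appear in this post.

```shell
# Print the name of every given command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools mummer nucmer promer mummerplot delta-filter show-coords gnuplot)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All MUMmer tools found"
fi
```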

Note: gnuplot may fail inside the conda environment, so run the plotting commands outside it.

  • Error 1: Perl version too new
## Can't use 'defined(%hash)' (Maybe you should just omit the defined()?) at /mnt/d/WSL_dir/home/MUMmer3.23/mummerplot line 884.
which mummerplot | while read dd ; do perl -i -pe 's/defined \(%/\(%/' $dd ; done
  • Error 2: gnuplot fails. In that case, start Xming locally, then run:
export DISPLAY=localhost:0
which mummerplot | while read dd ; do sed -i "s/GNUPLOT_EXE = 'false'/GNUPLOT_EXE = 'gnuplot' /g" $dd ; done

Data

Download several Salmonella enterica assemblies as test data. Fasta_Header_Rename.py can be found by searching this site.

wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/558/355/GCF_001558355.2_ASM155835v2/GCF_001558355.2_ASM155835v2_genomic.fna.gz -O S1.fa.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/016/028/495/GCF_016028495.1_ASM1602849v1/GCF_016028495.1_ASM1602849v1_genomic.fna.gz -O S2.fa.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/006/945/GCF_000006945.2_ASM694v2/GCF_000006945.2_ASM694v2_genomic.fna.gz -O S3.fa.gz

gunzip S1.fa.gz S2.fa.gz S3.fa.gz

## Rename header as: S1_1, S1_2 ...
for dd in S1 S2 S3; do python3 Fasta_Header_Rename.py ${dd}.fa ${dd}.fa ${dd} ; done
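Fasta_Header_Rename.py itself is not reproduced here. As a rough stand-in, assuming all it needs to do is replace each header with <prefix>_<record number>, the same renaming can be sketched in awk. The function name rename_fasta_headers is hypothetical, and this is not the author's script.

```shell
# Read FASTA on stdin, replace each header with ">PREFIX_N", write to stdout.
# $1 = prefix (for example S1); N counts records starting from 1.
rename_fasta_headers() {
    awk -v p="$1" '/^>/ { n++; print ">" p "_" n; next } { print }'
}

# Demo on two made-up records.
printf '>NZ_example_1 chromosome\nACGT\n>NZ_example_2 plasmid\nGG\n' | rename_fasta_headers S1
```

On real data it is safer to write to a new file than over the input, for example: rename_fasta_headers S1 < S1.fa > S1.renamed.fa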

Alignment

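  • mummer: uses a suffix tree (MUMmer3; MUMmer4 uses a suffix array instead) to locate maximal unique matches between two sequences and outputs a list of exact matches; accepts up to 32 query files at once

  • nucmer: first finds maximal exact matches of a given minimum length, then clusters and extends them into larger, inexact alignment regions. Suited to locating and displaying highly conserved regions of DNA sequences. Pay attention to the anchor setting (default: --mumreference)

  • promer: same strategy as NUCmer, but translates both sequences in all six amino acid reading frames, so it is more sensitive than NUCmer; suited to identifying regions of conserved protein sequence that may not be conserved at the DNA level

  • run-mummer1: same strategy as NUCmer, but can also handle non-nucleotide sequences; good at aligning very similar DNA sequences and identifying their differences, which makes it well suited to SNP and error detection; intended for one vs. one comparisons without rearrangements

  • run-mummer3: same strategy as NUCmer, but can also handle non-nucleotide sequences; good at aligning very similar DNA sequences and identifying their differences, which makes it well suited to SNP and error detection; intended for one vs. many comparisons that may involve rearrangements

Note: the conda mummer4 package does not include mapview or run-mummer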

Usage:

mummer [options] <reference file> <query file1> . . . [query file32]
nucmer [options] <reference file> <query file>  
promer [options] <reference file> <query file>
run-mummer1 <fasta reference> <fasta query> <prefix> [-r]  
run-mummer3 <fasta reference> <multi-fasta query> <prefix>

Example:

## Generates mummerO.mums
## mummer S1.fa  S2.fa  S3.fa  > mummerO.mums
mummer S1.fa  S2.fa > mummerO.mums
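The .mums file is plain text: a line starting with > names each query sequence (with Reverse appended for reverse-strand matches), followed by one row per match. The sketch below takes the last column of each row as the match length, on the assumption that the rows are either three columns (reference start, query start, length) or four columns (reference name first, for a multi-sequence reference). summarize_mums is a hypothetical helper name.

```shell
# Summarize mummer match output read from stdin or from the files given.
# Assumes the match length is the last column of every non-header row.
summarize_mums() {
    awk '/^>/ { next }
         NF >= 3 { n++; total += $NF; if ($NF > max) max = $NF }
         END { printf "matches=%d total_len=%d longest=%d\n", n, total, max }' "$@"
}

# Demo on made-up rows: prints matches=3 total_len=106 longest=50
printf '> S2_1\n  100  250  35\n  900  1200  21\n> S2_1 Reverse\n  400  3000  50\n' | summarize_mums
```

On the real output this would be: summarize_mums mummerO.mums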


## Generates nucmerO.delta
nucmer -p nucmerO   S1.fa  S2.fa


## Generates promerO.delta
promer -p promerO    S1.fa  S2.fa

## Generates <prefix>.out, <prefix>.gaps, <prefix>.errorsgaps and <prefix>.align
## Deprecated in conda-mummer4?
run-mummer1 S1.fa S2.fa run1_reverse -r
run-mummer3 S1.fa S2.fa run3

Plot

mummerplot -h

mummerplot --png --prefix=mummerO   mummerO.mums
mummerplot --png --prefix=nucmerO   nucmerO.delta
mummerplot --png --prefix=promerO   promerO.delta

mummerplot --png --prefix=mummerO_Cov_SNP  --SNP  -c mummerO.mums
mummerplot --png --prefix=nucmerO_Cov_SNP  --SNP  -c nucmerO.delta

[Figures: mummerO, nucmerO, promerO, mummerO_Cov_SNP, nucmerO_Cov_SNP]

Process

Note: the "Use cases and walk-throughs" section on the MUMmer3 website is worth reading.

## Filter: identity > 89%, alignment length > 1000, keep only 1-to-1 alignments (-1)
delta-filter -i 89 -l 1000 -1 nucmerO.delta > nucmerO.delta.filter
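To see how much a filter removed, it helps to count the alignment records before and after. A delta file starts with a line holding the two input paths and a line holding NUCMER or PROMER; each sequence pair then gets a header starting with >, and each alignment begins with a row of seven numbers followed by one delta value per line. The sketch below counts those seven-number rows. count_delta_alignments is a hypothetical helper name, and the demo coordinates are made up.

```shell
# Count alignment records in a delta file (file argument or stdin).
# An alignment record is a 7-field row that is not a ">" header.
count_delta_alignments() {
    awk 'NR > 2 && !/^>/ && NF == 7 { n++ } END { print n + 0 }' "$@"
}

# Tiny hand-written delta file with two alignments: prints 2
count_delta_alignments <<'EOF'
/path/to/S1.fa /path/to/S2.fa
NUCMER
>S1_1 S2_1 5000 4800
1 100 1 100 0 0 0
0
200 400 210 409 3 3 0
-50
0
EOF
```

Running it on nucmerO.delta and then on nucmerO.delta.filter gives the two counts to compare.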

## Generate: xx.1coords  xx.1delta  xx.mcoords  xx.mdelta  xx.qdiff  xx.rdiff  xx.report  xx.snps
dnadiff -p nucmerO_diff  -d nucmerO.delta

## human readable table
show-coords -r -c -l -L 100 -I 50  nucmerO.delta |  less
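For scripting, the tab-delimited form of the table is easier to handle than the human-readable one. show-coords prints it with -T, and -H drops the header. The sketch below filters such rows by identity and length in awk. It assumes the column order S1, E1, S2, E2, LEN1, LEN2, %IDY, LENR, LENQ, COVR, COVQ, reference ID, query ID for a nucmer delta with -c and -l; check this against the header of your own output before relying on it. filter_coords is a hypothetical helper name, and the demo rows are made up.

```shell
# Keep rows of tab-delimited show-coords output that pass both thresholds.
# $1 = minimum percent identity (column 7, assumed)
# $2 = minimum alignment length on the reference (column 5, assumed)
filter_coords() {
    awk -F '\t' -v idy="$1" -v len="$2" '$7 >= idy && $5 >= len'
}

# Demo: only the first row passes identity >= 89 and length >= 1000.
printf '1\t5000\t10\t5010\t5000\t5001\t99.50\t9000\t9500\t55.5\t52.7\tS1_1\tS2_1\n6000\t6400\t7000\t7400\t401\t401\t85.00\t9000\t9500\t4.4\t4.2\tS1_1\tS2_1\n' | filter_coords 89 1000
```

With real data this would read: show-coords -T -H -r -c -l nucmerO.delta | filter_coords 89 1000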

## Show the alignment between two sequences
show-aligns nucmerO.delta S1_1 S2_1 | less
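show-aligns needs the reference and query sequence IDs exactly as they appear in the FASTA headers (S1_1 and S2_1 above come from the renaming step). A small helper, hypothetically named fasta_ids, can list the IDs in a file so the right pair can be picked.

```shell
# Print the ID (first word after ">") of every record in a FASTA stream.
fasta_ids() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$@"
}

# Demo on two made-up records.
printf '>S1_1\nACGT\n>S1_2 plasmid\nGG\n' | fasta_ids
```

For example: fasta_ids S1.fa, then fasta_ids S2.fa.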


## Not suitable for "many vs. many" assembly comparisons
## Map the query contigs back onto the reference to find the best mapped location of each; each contig may only be tiled once, so repetitive regions may cause difficulty
show-tiling promerO.delta  >  promerO.delta.tiling
mummerplot --png --prefix=promerO_tiling  promerO.delta.tiling

[Figures: show-coords, show-aligns]

References

mummer3: https://mummer.sourceforge.net/
mummer4: https://github.com/mummer4/mummer
mummer4: https://mummer4.github.io/install/install.html
Chinese 1: https://www.jianshu.com/p/c12f2a117892
Chinese 2: https://www.jianshu.com/p/2e184e5c15b7