MaSuRCA

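A genome assembler that can use third-generation (long-read) data to assist the assembly of second-generation (short-read) data.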

Install

First install numactl and Boost, then download a release tarball and build it. (The official documentation apparently says conda cannot be used?)

sudo apt-get install numactl

wget https://boostorg.jfrog.io/artifactory/main/release/1.83.0/source/boost_1_83_0.tar.gz
tar -xzf boost_1_83_0.tar.gz
cd boost_1_83_0
./bootstrap.sh
./b2 install
cd ..


wget https://github.com/alekseyzimin/masurca/releases/download/v4.1.0/MaSuRCA-4.1.0.tar.gz
tar -xzf MaSuRCA-4.1.0.tar.gz
cd MaSuRCA-4.1.0
## Not working: BOOST_ROOT=install ./install.sh
BOOST_ROOT=$PWD/../boost_1_83_0/ ./install.sh
cd ..

export PATH=$PWD/MaSuRCA-4.1.0/bin/:$PATH
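A quick sanity check after the export can catch a PATH that was not set as intended. The sketch below is an assumption, not part of the MaSuRCA distribution: `check_tool` is a hypothetical helper that only tests whether a command name resolves in the current shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MaSuRCA): report whether a command
# can be resolved in the current shell, typically via PATH.
check_tool() {
    local tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found: $tool -> $(command -v "$tool")"
    else
        echo "missing: $tool (is its bin/ directory on PATH?)" >&2
        return 1
    fi
}

# Usage after the export above:
#   check_tool masurca
#   check_tool numactl
```

Since the export only affects the current shell session, a check like this may need to be repeated in each new terminal unless the export is added to a shell startup file.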

Usage

masurca -t 16 -i R1.fq.gz,R2.fq.gz
masurca -t 16 -i R1.fq.gz,R2.fq.gz  -r nanopore.fq.gz  ## use the short reads to correct the long reads

When many settings are needed, run masurca config.txt to generate assemble.sh.
Example config.txt (based on /install_path/sr_config_example.txt):

## If mean/stdev are unknown use 500 and 50 (safe values work for most runs)
## <two-character prefix> <fragment mean> <fragment stdev> <forward_reads> <reverse_reads(opt)>
## Use FULL_PATH !!! 
DATA
PE= pe 500 50  E_R1.fq.gz  E_R2.fq.gz
PACBIO=E_pacbio.fa
END

PARAMETERS
EXTEND_JUMP_READS=0
GRAPH_KMER_SIZE = auto
USE_LINKING_MATES = 0
USE_GRID=0
GRID_ENGINE=SGE
GRID_QUEUE=all.q
GRID_BATCH_SIZE=500000000
LHE_COVERAGE=25
LIMIT_JUMP_COVERAGE = 300
CA_PARAMETERS =  cgwErrorRate=0.15
CLOSE_GAPS=1
NUM_THREADS = 32
JF_SIZE = 200000000
SOAP_ASSEMBLY=0
FLYE_ASSEMBLY=1
END
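Because the config asks for full paths, it can help to generate the file from the read file names instead of typing the paths by hand. The following is a minimal sketch under stated assumptions: `make_masurca_config` is a hypothetical helper, `realpath` is assumed to be available (GNU coreutils), and the parameter values are copied from the example above rather than tuned for any data set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a MaSuRCA config with absolute read paths.
# Arguments: <forward_reads> <reverse_reads> <long_reads>
# 500/50 are the fallback fragment mean/stdev named in the example comments.
make_masurca_config() {
    local r1 r2 lr
    r1=$(realpath "$1") || return 1
    r2=$(realpath "$2") || return 1
    lr=$(realpath "$3") || return 1
    cat <<EOF
DATA
PE= pe 500 50  $r1  $r2
PACBIO=$lr
END

PARAMETERS
EXTEND_JUMP_READS=0
GRAPH_KMER_SIZE = auto
USE_LINKING_MATES = 0
USE_GRID=0
GRID_ENGINE=SGE
GRID_QUEUE=all.q
GRID_BATCH_SIZE=500000000
LHE_COVERAGE=25
LIMIT_JUMP_COVERAGE = 300
CA_PARAMETERS =  cgwErrorRate=0.15
CLOSE_GAPS=1
NUM_THREADS = 32
JF_SIZE = 200000000
SOAP_ASSEMBLY=0
FLYE_ASSEMBLY=1
END
EOF
}

# Usage:
#   make_masurca_config E_R1.fq.gz E_R2.fq.gz E_pacbio.fa > config.txt
```

The helper writes to standard output so the result can be inspected before it is redirected into config.txt.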

Run bash ./assemble.sh to execute the pipeline.
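The two steps (generate assemble.sh from the config, then run it) can be wrapped so that a missing config or a missing masurca binary is reported before anything starts. This is only a sketch: `run_masurca_pipeline` and the log file name `assemble.log` are hypothetical, and it assumes assemble.sh is written to the current directory, as described above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: generate assemble.sh from a config, then run it.
# Return codes: 1 = config missing, 2 = masurca not on PATH,
#               3 = config step failed; otherwise the status of assemble.sh.
run_masurca_pipeline() {
    local config="$1"
    if [ ! -f "$config" ]; then
        echo "config not found: $config" >&2
        return 1
    fi
    if ! command -v masurca >/dev/null 2>&1; then
        echo "masurca is not on PATH" >&2
        return 2
    fi
    # Step 1: masurca reads the config and writes assemble.sh.
    masurca "$config" || return 3
    # Step 2: run the generated script, keeping all output in a log file.
    bash ./assemble.sh > assemble.log 2>&1
}

# Usage:
#   run_masurca_pipeline config.txt
```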

References

Releases: https://github.com/alekseyzimin/masurca/releases