GTDB
Fairly accurate, but memory-hungry; commonly used to assign taxonomy to MAGs.
Download & Install
Make sure the reference database release matches the software version (GTDB-Tk 2.3.0 uses the R214 reference data).
conda create -n gtdbtk -c bioconda gtdbtk=2.3.0 -y
cd /mnt/d/WSL_dir/home/miniconda3/envs/gtdbtk/share/gtdbtk-2.3.0
wget -c https://data.gtdb.ecogenomic.org/releases/release214/214.0/auxillary_files/gtdbtk_r214_data.tar.gz
tar -xzf gtdbtk_r214_data.tar.gz
rm -r db; mv release214 db
## GTDBTK_DATA_PATH is set in /mnt/d/WSL_dir/home/miniconda3/envs/gtdbtk/etc/conda/activate.d/gtdbtk.sh
## and points to /mnt/d/WSL_dir/home/miniconda3/envs/gtdbtk/share/gtdbtk-2.3.0/db/
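The unpack-and-rename steps above can be wrapped in a small function so the top-level directory name (release214 for R214) does not have to be typed by hand. This is a minimal sketch: `install_gtdb_db` is a hypothetical helper, and it assumes the archive has a single top-level directory listed as its first entry.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: unpack a GTDB-Tk reference tarball into target_dir,
# replace any existing target_dir/db with the unpacked release directory,
# and print the path GTDBTK_DATA_PATH should point to.
install_gtdb_db() {
    local tarball=$1 target_dir=$2
    local top
    # Assumption: the first archive entry is the top-level directory
    # (e.g. "release214/"). awk reads the whole listing, avoiding SIGPIPE.
    top=$(tar -tzf "$tarball" | awk -F/ 'NR == 1 { print $1 }')
    if [ -z "$top" ] || [ "$top" = "." ]; then
        echo "cannot determine top-level directory of $tarball" >&2
        return 1
    fi
    tar -xzf "$tarball" -C "$target_dir"
    rm -rf "$target_dir/db"                  # drop the old/placeholder db
    mv "$target_dir/$top" "$target_dir/db"
    printf '%s/db/\n' "$target_dir"
}
```

For example, `install_gtdb_db gtdbtk_r214_data.tar.gz "$CONDA_PREFIX/share/gtdbtk-2.3.0"` run inside the activated env should print the same path as the comment above; `gtdbtk check_install` can then be used to confirm that GTDB-Tk finds the data.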
ncbi_taxdump (note which version you downloaded)
wget https://data.gtdb.ecogenomic.org/releases/release214/214.0/auxillary_files/ncbi_taxdump_20220917.tar.gz
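Since the taxdump version matters, one option is to derive it from the file name and log it when unpacking. A minimal sketch: `unpack_taxdump` and the log file `db_versions.txt` are hypothetical names, and the version is assumed to be the date stamp after the last underscore (20220917 here). The archive is extracted into its own directory whatever its internal layout.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: extract an NCBI taxdump tarball into a directory named
# after the archive and append its date-stamp version to a log file.
unpack_taxdump() {
    local tarball=$1 log=${2:-db_versions.txt}
    local name version
    name=$(basename "$tarball" .tar.gz)   # e.g. ncbi_taxdump_20220917
    version=${name##*_}                   # text after the last "_": 20220917
    mkdir -p "$name"
    tar -xzf "$tarball" -C "$name"
    # Log: database name, version, date the archive was unpacked.
    printf 'ncbi_taxdump\t%s\t%s\n' "$version" "$(date +%F)" >> "$log"
    echo "$version"
}
```

Keeping one line per database in a single log makes it easy to report the exact releases used in a methods section.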
Usage
- classify_wf: assigns taxonomy by placing genomes in the GTDB reference tree using single-copy marker genes; identify-->align-->classify
- de_novo_wf: identify-->align-->infer-->root-->decorate; rarely needed in practice
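gtdbtk classify_wf -x fa --genome_dir MAGs_genomes/ --out_dir ./ --prefix MAGs --min_perc_aa 30 --cpus 32 --skip_ani_screen
## GTDB-Tk 2.3.0 requires either --skip_ani_screen or --mash_db <path> for classify_wf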
## MAGs_genomes/ contains genome_1.fa, genome_2.fa, genome_3.fa
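`-x fa` tells classify_wf which file extension to pick up in `--genome_dir`, so it is worth checking before a long run that every MAG actually ends in that extension. A minimal pre-flight sketch; `check_genome_dir` is a hypothetical helper, not part of GTDB-Tk.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical pre-flight check: print how many files in dir end in .ext
# and report (on stderr) any file that would not match the -x extension.
check_genome_dir() {
    local dir=$1 ext=$2
    local n_match=0 f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue            # skip subdirectories / empty glob
        if [[ $f == *."$ext" ]]; then
            n_match=$((n_match + 1))
        else
            echo "skipped (extension not .$ext): $(basename "$f")" >&2
        fi
    done
    echo "$n_match"
}
```

For the example directory, `check_genome_dir MAGs_genomes fa` should print 3; any file reported on stderr (e.g. a `.fasta` MAG) needs renaming or a different `-x`.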
Outputs (archaeal summary MAGs.ar53.summary.tsv in --out_dir; tab-separated, one row per genome)
user_genome classification fastani_reference fastani_reference_radius fastani_taxonomy fastani_ani fastani_af closest_placement_reference closest_placement_taxonomy closest_placement_ani closest_placement_af pplacer_taxonomy classification_method note other_related_references(genome_id,species_name,radius,ANI,AF) aa_percent translation_table red_value warnings
genome_2 d__Archaea;p__Thermoplasmatota;c__Thermoplasmata;o__Methanomassiliicoccales;f__Methanomethylophilaceae;g__VadinCA11;s__VadinCA11 sp002498365 GCA_002498365.1 95.0 d__Archaea;p__Thermoplasmatota;c__Thermoplasmata;o__Methanomassiliicoccales;f__Methanomethylophilaceae;g__VadinCA11;s__VadinCA11 sp002498365 99.16 0.94 GCA_002498365.1 d__Archaea;p__Thermoplasmatota;c__Thermoplasmata;o__Methanomassiliicoccales;f__Methanomethylophilaceae;g__VadinCA11;s__VadinCA11 sp002498365 99.16 0.94 d__Archaea;p__Thermoplasmatota;c__Thermoplasmata;o__Methanomassiliicoccales;f__Methanomethylophilaceae;g__VadinCA11;s__ ANI/Placement topological placement and ANI have congruent species assignments GCA_002505345.1, s__VadinCA11 sp002505345, 95.0, 89.92, 0.89; GCA_002509405.1, s__VadinCA11 sp002509405, 95.0, 88.13, 0.89 87.1 11 N/A N/A
genome_3 d__Archaea;p__Thermoplasmatota;c__Thermoplasmata;o__Methanomassiliicoccales;f__Methanomethylophilaceae;g__VadinCA11;s__VadinCA11 sp002498365 GCA_002498365.1 95.0 d__Archaea;p__Thermoplasmatota;c__Thermoplasmata;o__Methanomassiliicoccales;f__Methanomethylophilaceae;g__VadinCA11;s__VadinCA11 sp002498365 95.33 0.87 GCA_002498365.1 d__Archaea;p__Thermoplasmatota;c__Thermoplasmata;o__Methanomassiliicoccales;f__Methanomethylophilaceae;g__VadinCA11;s__VadinCA11 sp002498365 95.33 0.87 d__Archaea;p__Thermoplasmatota;c__Thermoplasmata;o__Methanomassiliicoccales;f__Methanomethylophilaceae;g__VadinCA11;s__ ANI/Placement topological placement and ANI have congruent species assignments GCA_002505345.1, s__VadinCA11 sp002505345, 95.0, 94.26, 0.87; GCA_002509405.1, s__VadinCA11 sp002509405, 95.0, 90.74, 0.77 73.07 11 N/A N/A
genome_1 d__Archaea;p__Euryarchaeota;c__Methanobacteria;o__Methanobacteriales;f__Methanobacteriaceae;g__Methanobrevibacter;s__Methanobrevibacter ruminantium GCF_000024185.1 95.0 d__Archaea;p__Euryarchaeota;c__Methanobacteria;o__Methanobacteriales;f__Methanobacteriaceae;g__Methanobrevibacter;s__Methanobrevibacter ruminantium 100.0 1.0 GCF_000024185.1 d__Archaea;p__Euryarchaeota;c__Methanobacteria;o__Methanobacteriales;f__Methanobacteriaceae;g__Methanobrevibacter;s__Methanobrevibacter ruminantium 100.0 1.0 d__Archaea;p__Euryarchaeota;c__Methanobacteria;o__Methanobacteriales;f__Methanobacteriaceae;g__Methanobrevibacter;s__ ANI/Placement topological placement and ANI have congruent species assignments GCA_900321995.1, s__Methanobrevibacter sp900321995, 95.0, 80.9, 0.7; GCF_900114585.1, s__Methanobrevibacter olleyae, 95.0, 79.96, 0.55; GCA_900314635.1, s__Methanobrevibacter sp900314635, 95.0, 78.45, 0.3 97.09 11 N/A N/A
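The classification column is easier to scan when reduced to the deepest rank that actually has a name (an empty "s__" means no species was assigned). A minimal awk sketch, assuming a tab-separated summary file with the header row shown above; `summarise_gtdbtk` is a hypothetical name, and columns are located by header name rather than position.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical summariser for a GTDB-Tk summary TSV: for each genome print
# the deepest named rank, that rank's name, and the classification method.
summarise_gtdbtk() {
    awk -F'\t' '
        BEGIN {
            rank["d"] = "domain"; rank["p"] = "phylum"; rank["c"] = "class"
            rank["o"] = "order";  rank["f"] = "family"; rank["g"] = "genus"
            rank["s"] = "species"
        }
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i   # header name -> column
            print "user_genome\tdeepest_rank\tname\tmethod"
            next
        }
        {
            n = split($col["classification"], ranks, ";")
            deepest = "unclassified"; name = "NA"
            for (i = n; i >= 1; i--) {
                # A rank prefix with nothing after it (e.g. "s__") is unassigned.
                if (ranks[i] ~ /^[dpcofgs]__./) {
                    deepest = rank[substr(ranks[i], 1, 1)]
                    name = substr(ranks[i], 4)
                    break
                }
            }
            print $col["user_genome"] "\t" deepest "\t" name "\t" $col["classification_method"]
        }
    ' "$1"
}
```

Run as `summarise_gtdbtk MAGs.ar53.summary.tsv`; for the rows above every genome resolves to species level.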
References
GTDB official website: https://gtdb.ecogenomic.org/