OrthoFinder

Concepts

  • LCA: Last Common Ancestor
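  • (The terms below describe the relationship between a pair of genes.)
  • Homologs: similar sequences derived from a common ancestor
    • Ortholog: genes that diverged through speciation (i.e. a pair of genes descended from a single gene in the LCA of two species)
    • Paralogs: genes that arose through duplication, typically within the same species; they may evolve functions related to the original one
    • Xenologs: genes acquired by horizontal gene transfer (e.g. through symbiosis or viral infection), which lets a gene jump between distantly related species
  • COG: Clusters of Orthologous Groups of proteins, NCBI
    • Prokaryotes: COG database
    • Eukaryotes: KOG database
  • (Species-specific) Orthogroup: the set of genes descended from a single gene in the LCA of a group of three or more species
  • Single-copy gene: a gene present as only one copy per chromosome set of an organism (contrast: multi-copy gene)
  • Single-copy Orthologs (SC-OGs): orthologs that are present as a single copy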

Install

conda:

conda create -n orthofinder2 -c bioconda orthofinder -y
conda activate orthofinder2
conda deactivate

If conda fails, use a release tarball instead:

wget https://github.com/davidemms/OrthoFinder/releases/download/2.5.5/OrthoFinder.tar.gz
tar xzvf OrthoFinder.tar.gz
export PATH=$PWD/OrthoFinder/:$PATH

If you hit a GLIBC version error, search for that keyword in Tricks.

Usage

FAA_folder/ holds the proteome FASTA files, one file per species. Start with the bundled ExampleData/:

FAA_folder=OrthoFinder/ExampleData/

orthofinder -f $FAA_folder
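Before a run, it can help to confirm that the input folder really holds one protein FASTA per species. The sketch below builds a toy folder (the species names and sequences are made up for illustration) and counts sequences per file by counting header lines. The list of extensions is an assumption, so adjust it to match your own files.

```shell
# Toy input folder; species names and sequences are hypothetical.
FAA_demo=$(mktemp -d)
printf '>geneA1\nMKV\n>geneA2\nMLT\n' > "$FAA_demo/SpeciesA.faa"
printf '>geneB1\nMKI\n' > "$FAA_demo/SpeciesB.faa"

# Print "<file name><TAB><number of sequences>" for each FASTA in a folder.
count_proteins() {
    for f in "$1"/*.fa "$1"/*.faa "$1"/*.fasta; do
        [ -e "$f" ] || continue   # skip glob patterns that matched nothing
        printf '%s\t%s\n' "$(basename "$f")" "$(grep -c '^>' "$f")"
    done
}

input_summary=$(count_proteins "$FAA_demo")
echo "$input_summary"
```

A file with an unexpectedly low count, or a species missing from the list, is worth checking before starting a long run.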

Results

Output is written to FAA_folder/OrthoFinder/Results_*/. For the official walkthrough see exploring-orthofinders-results; a Chinese-language walkthrough is also recommended. Example results folder: Results_Jun30.zip

The files worth noting are listed below; the trees can be viewed online with the ETE Toolkit tree viewer:

Comparative_Genomics_Statistics/
    - Statistics_PerSpecies.tsv          ## QC
    - Duplications_per*.tsv
Species_Tree/
    - SpeciesTree_rooted.txt             ## built by STAG, rooted by STRIDE
    - SpeciesTree_rooted_node_labels.txt ## tree nodes labeled 'N0'... (for checking Gene_Duplication_Events)
Orthologues/
    - Orthologues_A_B/*.tsv              ## find SpeA_GeneX's orthologue in SpeB (& belong to which Orthogroup)
    - *.tsv                              ## find SpeA_GeneX's orthologue in other Spes (& belong to which Orthogroup)
Orthogroups/                             
    - Orthogroups.tsv                    ## which genes from which species each orthogroup contains
Gene_Trees/
    - OG*******_tree.txt                 ## Gene tree for each Orthogroups
Gene_Duplication_Events/
    - SpeciesTree_Gene_Duplications_0.5_Support.txt    ## tree nodes labeled 'N0_35'...; 35 = well-supported duplication events
    - Duplications.tsv                   ## Check where Duplication happened
Resolved_Gene_Trees/
    - OG*******_tree.txt                 ## gene tree nodes labeled 'N0'... (for checking Gene_Duplication_Events)
Orthogroup_Sequences/
    - OG*******.fa                       ## sequences for the genes in orthogroup



Single_Copy_Orthologue_Sequences/        ## sequences of the single-copy orthogroups

Phylogenetic_Hierarchical_Orthogroups/
Phylogenetically_Misplaced_Genes/
Putative_Xenologs/
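To make the Orthogroups.tsv layout concrete, here is a minimal sketch that lists the orthogroups with exactly one gene in every species. It assumes the file is tab-separated, has a header row with one column per species, and separates the genes within a cell by ", ". The toy table and gene names are made up, and `single_copy_ogs` is a hypothetical helper, not part of OrthoFinder.

```shell
# Toy Orthogroups.tsv; orthogroup contents are hypothetical.
og_demo=$(mktemp)
{
    printf 'Orthogroup\tSpeciesA\tSpeciesB\n'
    printf 'OG0000000\tA1, A2\tB1\n'   # two copies in SpeciesA
    printf 'OG0000001\tA3\tB2\n'       # one copy in each species
    printf 'OG0000002\tA4\t\n'         # missing from SpeciesB
} > "$og_demo"

# Print the IDs of orthogroups that have exactly one gene per species column.
single_copy_ogs() {
    awk -F'\t' '
        NR == 1 { ncol = NF; next }    # remember the number of columns
        {
            ok = 1
            for (i = 2; i <= ncol; i++) {
                n = ($i == "") ? 0 : split($i, genes, ", ")
                if (n != 1) ok = 0
            }
            if (ok) print $1
        }' "$1"
}

single_copy=$(single_copy_ogs "$og_demo")
echo "$single_copy"
```

On real results the output can be compared against Orthogroups/Orthogroups_SingleCopyOrthologues.txt, which OrthoFinder writes itself.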

Notes

  • STAG (the default) is stricter than -M msa
  • An incorrect SpeciesTree_rooted does not affect orthogroup inference, but it can affect orthologue inference (when gene duplication events are present). In that case, correct the tree by hand and rerun on the corrected species tree with the -ft and -s options:
    • -s <file>: User-specified rooted species tree
    • -ft <dir>: Start OrthoFinder from pre-computed gene trees in <dir>
  • Gene-duplication events are considered 'well-supported' if at least 50% of the descendant species have retained both copies of the duplicated gene
  • The trees in Resolved_Gene_Trees/ are more parsimonious because they come from a Duplication-Loss-Coalescence analysis, so they may differ slightly from those in Gene_Trees/
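The 50% rule above can be applied directly to Gene_Duplication_Events/Duplications.tsv. The sketch below is a hedged example on toy data: it assumes the file is tab-separated with a header row that contains a column named "Support", and it locates that column by name rather than by position. The rows and the `well_supported` helper are hypothetical.

```shell
# Toy Duplications.tsv; column layout and values are assumptions for illustration.
dup_demo=$(mktemp)
{
    printf 'Orthogroup\tSpecies Tree Node\tGene Tree Node\tSupport\tType\tGenes 1\tGenes 2\n'
    printf 'OG0000000\tN1\tn3\t1.0\tNon-Terminal\tA1\tA2\n'
    printf 'OG0000001\tN0\tn1\t0.25\tNon-Terminal\tB1\tB2\n'
    printf 'OG0000002\tN1\tn5\t0.5\tTerminal\tA7\tA8\n'
} > "$dup_demo"

# Print orthogroup, species tree node and support for events at or above a
# threshold (default 0.5, matching the 'well-supported' definition).
well_supported() {
    awk -F'\t' -v min="${2:-0.5}" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Support") col = i; next }
        col && $col + 0 >= min + 0 { print $1 "\t" $2 "\t" $col }
    ' "$1"
}

dup_hits=$(well_supported "$dup_demo")
echo "$dup_hits"
```

Passing a second argument, e.g. `well_supported "$dup_demo" 0.9`, tightens the threshold.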

References

Tutorial: https://davidemms.github.io/
Beginner-friendly explanation: https://blog.csdn.net/sinat_41621566/article/details/112320002
Newick format: https://www.jianshu.com/p/80f0b8ebf2a5
Interpreting the results: https://www.jianshu.com/p/82d4cf6c3eda