Skip to content

KEGG

关联:NR SWissport GO KEGG 已实现相互注释;enrichment 可参考 GO

举例 -> sbml 格式

KEGG 通路 ko00020 包含多个 ORTHOLOGY 字段 (KO,KEGG Orthology),任选 K00024->EC:1.1.1.37 包含找到多条 REACTION 字段 RN:R00342,其中包含参与的 Compounds C00149 ...

ENTRY       R00342                      Reaction
NAME        (S)-malate:NAD+ oxidoreductase
DEFINITION  (S)-Malate + NAD+ <=> Oxaloacetate + NADH + H+
EQUATION    C00149 + C00003 <=> C00036 + C00004 + C00080
RCLASS      RC00001  C00003_C00004
            RC00031  C00036_C00149
ENZYME      1.1.1.37        1.1.1.299

将 KEGG 注释转换为 SBML 时,推荐使用 EC 编号关联 REACTION Gene → KO → EC → Reaction

EC编号是酶的ID编码,可与外部数据库交叉引用:KEGG,UniProt,BRENDA,MetaCyc/BioCyc(非冗余、实验阐明的代谢途径和酶的数据库)

EC:a.b.c.d

a. 反应类型
b. 反应物性质
c. 受体性质
d. 序列号 - 具体酶

参考KEGG REST API的提示,pathway prefix 主要包括:

map: default
ko - reference pathway map linked to KO entries (K numbers)
rn - reference pathway map linked to REACTION entries (R numbers)
ec - reference pathway map linked to ENZYME entries (EC numbers)
org (three- or four-letter organism code) - organism-specific pathway map linked to GENES entries (gene IDs)

KEGG数据库:常用link/list

https://rest.kegg.jp/list/organism  得知人类对应prefix是hsa
http://rest.kegg.jp/list/hsa

    ftp://ftp.genome.jp/pub/db/
    http://rest.kegg.jp/link/hsa/ko     大K和人hsa的对应关系
    http://rest.kegg.jp/list/ko           大K的功能描述和酶信息
    http://rest.kegg.jp/list/pathway         map/ko 功能描述
    http://rest.kegg.jp/list/pathway/hsa     hsa-ko功能描述
    http://rest.kegg.jp/link/pathway/ko         大K和ko的对应关系
    http://rest.kegg.jp/link/pathway/compound   化合物C编号和ko的对应关系
    http://rest.kegg.jp/link/ko/module          大K和M的对应关系

    https://www.kegg.jp/brite/br08902       kegg层级文件
    https://www.kegg.jp/kegg-bin/show_brite?br08001.keg   化合物功能分类

另外,常用:ko01200

Carbon metabolism - ko01200
Methane metabolism - ko00680
Nitrogen metabolism - ko00910
Sulfur metabolism - ko00920

Setup DB

  1. 下载 K---(enzyme), map/ko---(pathway) 及其对应关系: 双方并非一一对应关系;K可以参与多个ko,ko也可以包含多个K
  2. 使用 Collect_pathway_info.py 收集 pathway 中的 layer info(CLASS)
  3. 制作 K---(enzyme), map/ko---(pathway) 关系的映射文件,单行对应为一个K/ko对应的所有、以','分割
## relation
wget -c http://rest.kegg.jp/link/ko/pathway -O K_path

## map/ko--- pathway
wget -c http://rest.kegg.jp/list/pathway                                  ## 570
grep map K_path |cut -f1 | cut -d ':' -f2 |sort|uniq |wc -l               ## 486

## K--- enzyme/KO 
wget -c http://rest.kegg.jp/list/ko               ## 26696
cut -d ':' -f3 K_path |sort|uniq |wc -l           ## 14019


mkdir map_folder
cd map_folder
cut -f1 ../pathway |cut -f2 -d ':'  |while read dd ; do wget -c http://rest.kegg.jp/get/$dd ;sleep 1; done
cd ..

python3 Collect_pathway_info.py  pathway  map_folder  pathway_info.xls


rm path_allK
less pathway| cut -f1|sort | uniq| while read dd ; do echo -e "$dd\t\c" >> path_allK; grep $dd K_path  | sed 's/ko://g' |  awk '{printf $2 ","}' >> path_allK; echo '' >> path_allK; done


rm K_allpath
less ko| cut -f1|sort | uniq| while read dd ; do echo -e "$dd\t\c" >> K_allpath; grep $dd K_path| grep 'map' | sed 's/path://g' |  awk '{printf $1 ","}' >> K_allpath; echo '' >> K_allpath; done


## awk -F '\t' '{if ($3!="-" && $4!="-") print ($1,$3,$4,$5)}'  pathway_info.xls |  less
## 使用map---而不是ko---,只是因为官方提供的是map---
## Layer示例:ko01100无CLASS,ko05416有CLASS

Anno

kofam_scan

Install:

wget -c ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz
wget -c ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz
gunzip ko_list.gz
tar -xzf profiles.tar.gz
KO_LIST=$PWD/ko_list
PROFILE_DIR=$PWD/profiles/

conda create -n kofam -c bioconda kofamscan     ##  hmmer parallel ruby
conda activate kofam

Use:

exec_annotation uniqGeneSet.faa -o kegg.txt -p $PROFILE_DIR -k $KO_LIST

Output: kegg.txt;多个hits,选E-value最优一个

# gene name    KO     thrshld  score   E-value        KO definition
#-------------------- ------ ------- ------ --------- ---------------------
  GeneA        K00121  619.83  560.5  7.2e-172 S-(hydroxymethyl)glutathione dehydrogenase / alcohol dehydrogenase [EC:1.1.1.284 1.1.1.1]
  GeneA        K00001  345.97  308.1   2.5e-95 alcohol dehydrogenase [EC:1.1.1.1]
  GeneA        K00055  389.00  305.5   8.8e-95 aryl-alcohol dehydrogenase [EC:1.1.1.90]
  GeneA        K00153  383.77  276.3   7.8e-86 S-(hydroxymethyl)mycothiol dehydrogenase [EC:1.1.1.306]

KEGG Graph

对于一些关注基因(e.g.差异表达),Color tool可在线标注颜色,或者使用url控制颜色,本地可使用使用pathview R包,或者也可下载KGML后用cytoscape-KEGGscape绘制(参考:KEGGscape/py4cytoscape,Dash Cytoscape)。

KGML

理解KGMLKGML颜色说明

cd /mnt/d/WSL_dir/home/miniconda3/envs/r-base/lib/R/library/pathview/extdata/

for dd in ko00010 ko00600 ko04110; do 
wget https://rest.kegg.jp/get/$dd/kgml -O $dd.kgml;
wget https://rest.kegg.jp/get/$dd/image -O $dd.png;
done

其它使用:parser KGML文件,然后用 graphviz 画箭头图,示例:kegg_graphviz.py --TBA

pathview

Install:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("pathview")

browseVignettes("pathview")

Usage:

library("pathview")

KGML_PATH = "/mnt/d/WSL_dir/workdir/KEGG/tt/ko04110.kgml"
IMG_FOLDER = "."
ko_ID = "ko04110"                ## ko_ID.kgml must in IMG_FOLDER
OUT_SUFFIX = "myRed"

node.data=node.info(KGML_PATH)
plot.data.gene = node.map(mol.data=NULL, node.data, node.types="ortholog")   ## ortholog/gene/...
COLOR_LIST = rep('red',length(plot.data.gene$x))

keggview.native(
     plot.data.gene = plot.data.gene, 
     cols.ts.gene = COLOR_LIST,   ## Color List by the order of plot.data.gene
     node.data, 
     pathway.name = ko_ID,
     out.suffix = OUT_SUFFIX, 
     kegg.dir = IMG_FOLDER)

参考

KEGG: https://www.kegg.jp/

(kofam_scan)[https://academic.oup.com/bioinformatics/article/36/7/2251/5631907]: ftp://ftp.genome.jp/pub/tools/kofam_scan/INSTALL

Downloads: ftp://ftp.genome.jp/pub/

一文快速读懂 KEGG 数据库与通路图: https://zhuanlan.zhihu.com/p/96008506

KGML: https://www.genome.jp/kegg/xml/docs/

KGML下载: https://www.kegg.jp/kegg/rest/keggapi.html

理解KGML:https://cloud.tencent.com/developer/article/1626035

https://www.genome.jp/kegg/pathway.html