如何安装JCVI

使用conda安装

1
conda create -y -c bioconda -n jcvi jcvi 

使用时需要启动环境

1
conda activate jcvi

在此基础上可以更新到最新版的JCVI (Github版本会不断的增加新功能并修订bug)

1
pip install git+git://github.com/tanghaibao/jcvi.git

安装额外的依赖环境

  • Kent tools

  • BEDTOOLS

  • EMBOSS

  • LAST

  • LaTex

依赖环境中,BEDTOOLS, EMBOSS和LAST可以用conda安装

1
2
3
# BEDTOOLS 和 EMBOSS
conda install -y -n jcvi -c bioconda bedtools emboss last

Kent tools需要编译, 并且依赖MySQL,暂时不需要安装。

1
2
3
4
5
# Kent tools, 134M
wget http://hgdownload.cse.ucsc.edu/admin/jksrc.zip
unzip jksrc.zip
cd kent/src/lib
make

对于Latex,conda的安装版本 ( texlive-core )在我后续的测试中各种出问题, 我最后根据根据官方的quickinstall.html 进行了安装,具体的安装过程见 LaTex

建议通过root进行安装(或者手工安装)。

1
2
3
4
# ubuntu
sudo apt-get install -y texlive texlive-latex-extra texlive-latex-recommended
# centos
sudo yum install -y texlive texlive-latex texlive-xetex texlive-collection-latexrecommended

最终通过Python版的MCscan对安装结果进行测试 (需要用conda安装seqkit)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
mkdir -p test && cd test
#Athaliana
wget ftp://ftp.ensemblgenomes.org/pub/plants/release-44/fasta/arabidopsis_thaliana/cdna/Arabidopsis_thaliana.TAIR10.cdna.all.fa.gz
wget ftp://ftp.ensemblgenomes.org/pub/plants/release-44/fasta/arabidopsis_thaliana/pep/Arabidopsis_thaliana.TAIR10.pep.all.fa.gz
wget ftp://ftp.ensemblgenomes.org/pub/plants/release-44/gff3/arabidopsis_thaliana/Arabidopsis_thaliana.TAIR10.44.gff3.gz
#Osativa
wget ftp://ftp.ensemblgenomes.org/pub/plants/release-44/fasta/oryza_sativa/cdna/Oryza_sativa.IRGSP-1.0.cdna.all.fa.gz
wget ftp://ftp.ensemblgenomes.org/pub/plants/release-44/fasta/oryza_sativa/pep/Oryza_sativa.IRGSP-1.0.pep.all.fa.gz
wget ftp://ftp.ensemblgenomes.org/pub/plants/release-44/gff3/oryza_sativa/Oryza_sativa.IRGSP-1.0.44.gff3.gz
# convert gff to bed
python -m jcvi.formats.gff bed --type=mRNA --key=transcript_id Arabidopsis_thaliana.TAIR10.44.gff3.gz > ath.bed
python -m jcvi.formats.gff bed --type=mRNA --key=transcript_id Oryza_sativa.IRGSP-1.0.44.gff3.gz > osa.bed
# deduplication
python -m jcvi.formats.bed uniq ath.bed
python -m jcvi.formats.bed uniq osa.bed
# Athaliana
seqkit grep -f <(cut -f 4 ath.uniq.bed ) Arabidopsis_thaliana.TAIR10.cdna.all.fa.gz | seqkit seq -i > ath.cds
seqkit grep -f <(cut -f 4 ath.uniq.bed ) Arabidopsis_thaliana.TAIR10.pep.all.fa.gz | seqkit seq -i > ath.pep
# Osativa
seqkit grep -f <(cut -f 4 osa.uniq.bed ) Oryza_sativa.IRGSP-1.0.cdna.all.fa.gz | seqkit seq -i > osa.cds
seqkit grep -f <(cut -f 4 osa.uniq.bed ) Oryza_sativa.IRGSP-1.0.pep.all.fa.gz | seqkit seq -i > osa.pep
# create a new directory
mkdir -p cds && cd cds
ln -s ../ath.cds ath.cds
ln -s ../ath.uniq.bed ath.bed
ln -s ../osa.cds osa.cds
ln -s ../osa.uniq.bed osa.bed
# compara
python -m jcvi.compara.catalog ortholog --no_strip_names ath osa

如果顺利,当前目录下会有一个 ath.osa.pdf 文件

我测试的时候遇到了两个不顺利,都是和LaTex有关的报错。

第一个是latex没有安装导致的报错

1
2
3
FileNotFoundError: [Errno 2] No such file or directory: 'latex': 'latex'
RuntimeError: Failed to process string with tex because latex could not be found

第二个是用conda安装texlive-core后,出现的第二个问题,还是得手工安装LaTex才解决。

1
2
kpathsea: Running mktexfmt latex.fmt
Can't locate mktexlsr.pl in @INC