R语言microeco:一个用于微生物群落生态学数据挖掘的R包（构建microeco对象。_Python

我以前写过临床微生物组的文章，其中数据分析用过microeco包，在这里，将我学到的资源分享给大家。

r语言microeco:一个用于微生物群落生态学数据挖掘的r包。

主要功能r6类;分类群丰度图，维恩图，alpha多样性，beta多样性，差异丰度分析，环境数据分析，零模型分析，网络分析，功能分析。

install.packages("microeco")
library(microeco)
library(magrittr)
library(ggplot2)
set.seed(123)
theme_set(theme_bw())

> data(sample_info_16s) #样本信息表
> data(otu_table_16s) #otu表
> data(taxonomy_table_16s) #分类表
> data(phylo_tree_16s) #
> data(env_data_16s) #环境因子

otu_table_16s[1:5, 1:5]
         s1 s2 s3 s4  s5
otu_4272  1  0  1  1   0
otu_236   1  4  0  2  35
otu_399   9  2  2  4   4
otu_1556  5 18  7  3   2
otu_32   83  9 19  8 102

> taxonomy_table_16s[1:5, 1:3]
             kingdom                                phylum                 class
otu_4272 k__bacteria                         p__firmicutes            c__bacilli
otu_236  k__bacteria                        p__chloroflexi                   c__
otu_399  k__bacteria                     p__proteobacteria c__betaproteobacteria
otu_1556 k__bacteria                      p__acidobacteria      c__acidobacteria
otu_32    k__archaea p__miscellaneous crenarchaeotic group                   c__

有时，分类学表可能会有一些混乱的信息，如na、unidentified和unknown。这些信息可以影响接下来的分类丰度计算。因此，有必要使用以下代码清理该文件。该操作的另一个重要部分是统一分类学前缀，例如将门级的d_1__转换为p__。

> taxonomy_table_16s %<>% tidy_taxonomy

> sample_info_16s[1:5, ]
   sampleid group type          saline
s1       s1    iw   ne non-saline soil
s2       s2    iw   ne non-saline soil
s3       s3    iw   ne non-saline soil
s4       s4    iw   ne non-saline soil
s5       s5    iw   ne non-saline soil

> head(env_data_16s)
   latitude longitude altitude temperature precipitation   toc      nh4  no3   ph conductivity   tn
s1 52.96482  122.5635      432        -4.2           445 10.02 18.95966 1.93 5.10        108.7 0.28
s2 52.94877  122.6109      445        -4.3           449  6.94 21.05730 1.46 6.53         31.2 0.22
s3 52.94932  122.6099      430        -4.3           449  8.38 14.31937 2.38 5.13         68.2 0.26
s4 52.94900  122.6104      430        -4.3           449  8.17 20.89645 1.38 5.81         64.4 0.25
s5 52.95292  122.6057      429        -4.3           449  8.41  9.65465 0.77 5.50         33.5 0.24
s6 52.95300  122.6123      429        -4.3           449  8.13 15.94577 1.31 5.37         33.3 0.26

> class(phylo_tree_16s)
[1] "phylo"

然后，我们创建一个microtable类的对象。该操作与package phyloseq非常相似，但microeco更简短、更简单。microtable类中的otu_table必须是物种-样本格式:行名必须是otu名称，colname必须是样本名称。在sample_table的rownames和otu_table的colnames中，所需的样例名称必须相同。

dataset <- microtable$new(sample_table = sample_info_16s, otu_table = otu_table_16s, tax_table = taxonomy_table_16s, phylo_tree = phylo_tree_16s)

> class(dataset)
[1] "microtable" "r6"

> print(dataset)
microtable-class object:
sample_table have 90 rows and 4 columns
otu_table have 13628 rows and 90 columns
tax_table have 13628 rows and 7 columns
phylo_tree have 14096 tips

为了使物种和样本信息在数据集对象的不同文件中保持一致，我们可以使用函数tidy_dataset()来修剪数据集。

>dataset$tidy_dataset()
> print(dataset)
microtable-class object:
sample_table have 90 rows and 4 columns
otu_table have 13628 rows and 90 columns
tax_table have 13628 rows and 7 columns
phylo_tree have 13628 tips

然后，我们去除在“k__archaea”或“k__bacteria”中未分配的otu。

> dataset$tax_table %<>% base::subset(kingdom == "k__archaea" | kingdom == "k__bacteria")
> print(dataset)
microtable-class object:
sample_table have 90 rows and 4 columns
otu_table have 13628 rows and 90 columns
tax_table have 13330 rows and 7 columns
phylo_tree have 13628 tips

我们还去除分类上带有“线粒体”或“叶绿体”的otu。

> dataset$filter_pollution(taxa = c("mitochondria", "chloroplast"))
total 34 features are removed from tax_table ...
> print(dataset)
microtable-class object:
sample_table have 90 rows and 4 columns
otu_table have 13628 rows and 90 columns
tax_table have 13296 rows and 7 columns
phylo_tree have 13628 tips

然后，为了使otu_table、tax_table和phylo_tree中的otu相同，我们再次使用tidy_dataset()。

> dataset$tidy_dataset()
> print(dataset)
microtable-class object:
sample_table have 90 rows and 4 columns
otu_table have 13296 rows and 90 columns
tax_table have 13296 rows and 7 columns
phylo_tree have 13296 tips

然后使用sample_sum()检查每个样本中的序列数。

> dataset$sample_sums() %>% range
[1] 10316 37087

有时，为了减少物种数量对多样性测量的影响，我们需要进行重采样，使每个样本的序列数相等。函数rarefy_samples可以在细化之前和之后自动调用函数tidy_dataset。

# as an example, we use 10000 sequences in each sample
> dataset$rarefy_samples(sample.size = 10000)
530 features are removed because they are no longer present in any sample after random subsampling ...
530 taxa with 0 abundance are removed from the otu_table ...
> dataset$sample_sums() %>% range
[1] 10000 10000

然后，我们使用cal_abund()计算每个分类等级上的分类群丰度。这个函数返回一个名为taxa_abund的列表，其中包含每个分类等级的丰度信息的几个数据帧。该列表自动存储在microtable对象中。

> dataset$cal_abund()
# return dataset$taxa_abund
> class(dataset$taxa_abund)

如果希望将分类库丰度文件保存到本地，请使用save_abund()。

> dir.create("taxa_abund")
> dataset$save_abund(dirpath = "taxa_abund")

然后，我们计算alpha分集。结果也自动存储在对象微表中。例如，我们不计算系统发育多样性。

> dataset$cal_alphadiv(pd = false)
# return dataset$alpha_diversity
> class(dataset$alpha_diversity)

> dir.create("alpha_diversity")
> dataset$save_alphadiv(dirpath = "alpha_diversity")

我们还使用函数cal_betadiv()计算beta分集的距离矩阵。我们提供了四种最常用的指数:bray-curtis、jaccard、加权unifrac和未加权unifrac。

> dir.create("alpha_diversity")
> dataset$save_alphadiv(dirpath = "alpha_diversity")
> dataset$cal_betadiv(unifrac = true)
accumulate the abundance along the tree branches...
compute pairwise distances ...
completed!
the result is stored in object$beta_diversity ...
# return dataset$beta_diversity
> class(dataset$beta_diversity)
[1] "list"
> dir.create("beta_diversity")
> dataset$save_betadiv(dirpath = "beta_diversity")

R语言microeco:一个用于微生物群落生态学数据挖掘的R包（构建microeco对象。

2024年08月01日 • Python •我要评论

相关文章:

PyTorch LSTM谷歌股价预测（完整代码与训练过程）

【深度学习基础】用PyTorch从零开始搭建DNN深度神经网络

用Pytorch构建第一个神经网络模型（附案例实战）

发表评论


验证码：