python – 如何给出一个outgroup,为一组物种生成所有可能的Newic
如果给出一个外群,我如何为一组物种生成所有可能的Newick树排列?
对于那些不知道Newick树格式的人,可以在以下位置找到一个很好的描述: 我想在给出一个外群的情况下为一组物种创建所有可能的Newick树排列.我期望处理的叶节点的数量很可能是4,5或6个叶节点. 允许“软”和“硬”多面体. 下面显示的是理想输出,“E”设置为outgroup 理想输出: ((("A","B","C"),("D"),("E")); ((("A","D"),("C"),"C",("B"),("E")); ((("B",("A"),"B")("C","C")("B","C")("A",("E")); (("A",("E")); (((("A","B"),("E")); 但是,我使用itertools带来的任何可能的解决方案,特别是itertools.permutations,都遇到了等效输出的问题.我想出的最后一个想法涉及下面显示的等效输出. 等效输出: ((("C","A"),("E")); ((("C","A",("E")); 这是我的解决方案的开始.但是,除了itertools之外,我现在还不确定该怎么回事. import itertools def Newick_Permutation_Generator(list_of_species,name_of_outgroup) permutations_list =(list(itertools.permutations(["A","D","E"]))) for given_permutation in permutations_list: process(given_permutation) Newick_Permutation_Generator(["A","E"],"E") 解决方法
树作为递归套叶
让我们暂时搁置newick表示,并考虑问题的可能python表示. 根树可以被视为(((……组)叶集的递归层次结构.集合是无序的,非常适合描述树中的分支:{{{“A”,“B”},{“C”,“D”}},“E”}应该与{{{ {“C”,“D”},{“B”,“A”}},“E”}. 如果我们考虑叶子的初始集合{“A”,“B”,“C”,“D”,“E”},以“E”作为外群的树是{tree,’形式的集合集. E“}树木取自树木的一组树木,可以从树叶集合{”A“,”B“,”C“,”D“}建造.我们可以尝试编写一个递归树函数来生成这组树,我们的总树集将表示如下: {{tree,"E"} for tree in trees({"A","D"})} (这里,我使用set comprehension表示法.) 实际上,python不允许集合集,因为集合的元素必须是“可散列的”(也就是说,python必须能够计算对象的一些“散列”值才能检查它们是否属于集).碰巧python集没有这个属性.幸运的是,我们可以使用名为 all_trees = frozenset( {frozenset({tree,"E"}) for tree in trees({"A","D"})}) 实现树功能 现在让我们关注树木功能. 对于叶子集的每个可能的分区(分解成一组不相交的子集,包括所有元素),我们需要为分区的每个部分找到所有可能的树(通过递归调用).对于给定的分区,我们将为每个可能的子树组合制作一个树. 例如,如果分区是{“A”,我们将考虑可以从“A”部分制作的所有可能的树(实际上,只是叶子“A” “本身”,以及所有可能由部分{“B”,“D”}(即树木({“B”,“D”}))制成的树木.然后,通过采用所有可能的对来获得该分区的可能树,其中一个元素来自“A”,而另一个元素来自树({“B”,“D”}). 这可以推广到具有两个以上部分的分区,而itertools的产品功能似乎在这里很有用. 因此,我们需要一种方法来生成一组叶子的可能分区. 生成集合的分区 这里我创建了一个改编自this solution的partitions_of_set函数: # According to https://stackoverflow.com/a/30134039/1878788: # The problem is solved recursively: # If you already have a partition of n-1 elements,how do you use it to partition n elements? # Either place the n'th element in one of the existing subsets,or add it as a new,singleton subset. def partitions_of_set(s): if len(s) == 1: yield frozenset(s) return # Extract one element from the set # https://stackoverflow.com/a/43804050/1878788 elem,*_ = s rest = frozenset(s - {elem}) for partition in partitions_of_set(rest): for subset in partition: # Insert the element in the subset try: augmented_subset = frozenset(subset | frozenset({elem})) except TypeError: # subset is actually an atomic element augmented_subset = frozenset({subset} | frozenset({elem})) yield frozenset({augmented_subset}) | (partition - {subset}) # Case with the element in its own extra subset yield frozenset({elem}) | partition 为了检查获得的分区,我们创建了一个函数,使它们更容易显示(这对于稍后对树进行newick表示也很有用): def print_set(f): if type(f) not in (set,frozenset): return str(f) return "(" + ",".join(sorted(map(print_set,f))) + ")" 我们测试分区是否有效: for partition in partitions_of_set({"A","D"}): print(len(partition),print_set(partition)) 输出: 1 ((A,B,C,D)) 2 ((A,D),C) 2 ((A,C),(B,D)) 2 ((B,A) 3 ((B,A,D) 2 ((A,B),(C,D)) 3 ((A,C)) 2 ((A,B) 3 ((A,C) 3 ((A,D) 3 ((B,D) 3 ((C,B) 4 (A,D) 树的实际代码功能 现在我们可以编写树函数: from itertools import product def trees(leaves): if type(leaves) not in (set,frozenset): # It actually is a single leaf yield leaves # Don't try to yield any more trees return # Otherwise,we will have to consider all the possible # partitions of the set of leaves,and for each partition,# construct the possible trees for each part for partition in partitions_of_set(leaves): # We need to skip the case where the partition # has only one subset (the initial set itself),# otherwise we will try to build an infinite # succession of nodes with just one subtree if len(partition) == 1: part,*_ = partition # Just to be sure the assumption is correct assert part == leaves continue # We recursively apply *tree* to each part # and obtain the possible trees by making # the product of the sets of possible subtrees. for subtree in product(*map(trees,partition)): # Using a frozenset guarantees # that there will be no duplicates yield frozenset(subtree) 测试它: all_trees = frozenset( {frozenset({tree,"D"})}) for tree in all_trees: print(print_set(tree) + ";") 输出: (((B,E); ((((A,E); ((((B,A),E); ((((C,E); (((A,E); ((A,D)),E); (((B,E); (((C,C)),E); 我希望结果是正确的. 这种做法有点难以实现.我花了一些时间来弄清楚如何避免无限递归(当分区为{{“A”,“D”}}时会发生这种情况). (编辑:李大同) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |