python - How do I create paragraphs from Markov chain output?

Published: 2020-12-16 21:58:37 | Category: Python | Source: compiled from the web


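I want to modify the script below so that it builds paragraphs out of a random number of the sentences it generates. In other words, it should join a random number of sentences (say 1-5) before adding a line break.

The script works fine, but its output is short sentences separated by newlines. I would like to collect some of those sentences into paragraphs.

Any ideas on best practice? Thanks.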

"""
    from:  http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
"""

import random
import sys

stopword = "\n" # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?") # Cause a "new sentence" if found at the end of a word
sentencesep  = "\n" # String used to separate sentences


# GENERATE TABLE
w1 = stopword
w2 = stopword
table = {}

for line in sys.stdin:
    for word in line.split():
        if word[-1] in stopsentence:
            table.setdefault( (w1, w2), [] ).append(word[0:-1])
            w1, w2 = w2, word[0:-1]
            word = word[-1]
        table.setdefault( (w1, w2), [] ).append(word)
        w1, w2 = w2, word
# Mark the end of the file
table.setdefault( (w1, w2), [] ).append(stopword)

# GENERATE SENTENCE OUTPUT
maxsentences  = 20

w1 = stopword
w2 = stopword
sentencecount = 0
sentence = []

while sentencecount < maxsentences:
    newword = random.choice(table[(w1,w2)])
    if newword == stopword: sys.exit()
    if newword in stopsentence:
        print ("%s%s%s" % (" ".join(sentence),newword,sentencesep))
        sentence = []
        sentencecount += 1
    else:
        sentence.append(newword)
    w1, w2 = w2, newword
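
One possible way to get the grouping described in the question, sketched independently of the script above: collect the finished sentences in a list first, then join random-sized runs of them into paragraphs. The names `group_into_paragraphs`, `min_len`, `max_len` and `rng` are made up for this illustration and are not part of the original recipe.

```python
import random


def group_into_paragraphs(sentences, min_len=1, max_len=5, rng=random):
    """Join consecutive sentences into paragraphs of random length.

    All names here are illustrative assumptions, not part of the recipe.
    """
    paragraphs = []
    i = 0
    while i < len(sentences):
        # randint includes both endpoints, so size is min_len..max_len
        size = rng.randint(min_len, max_len)
        paragraphs.append(" ".join(sentences[i:i + size]))
        i += size
    return paragraphs


demo = ["Sentence %d." % n for n in range(1, 13)]
# Paragraphs are separated by a blank line; the split varies per run.
print("\n\n".join(group_into_paragraphs(demo)))
```

Note that `random.randint(1, 5)` can return 5, whereas `random.randrange(1, 5)` stops at 4 because its upper bound is exclusive.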

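Edit 01:

OK, I cobbled together a simple "paragraph wrapper" that does a good job of collecting sentences into paragraphs, but it interferes with the output of the sentence generator: I am getting excessive repetition of the first words, for example, among other issues.

The premise is sound, though; I just need to figure out why the behavior of the sentence loop is affected by adding the paragraph loop. If you can see the problem, please let me know: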

###
#    usage: $python markov_sentences.py < input.txt > output.txt
#    from:  http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
###

import random
import sys

stopword = "\n" # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?") # Cause a "new sentence" if found at the end of a word
paragraphsep  = "\n\n" # String used to separate paragraphs


# GENERATE TABLE
w1 = stopword
w2 = stopword
table = {}

for line in sys.stdin:
    for word in line.split():
        if word[-1] in stopsentence:
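            table.setdefault( (w1, w2), [] ).append(word[0:-1])
            w1, w2 = w2, word[0:-1]
            word = word[-1]
        table.setdefault( (w1, w2), [] ).append(word)
        w1, w2 = w2, word
# Mark the end of the file
table.setdefault( (w1, w2), [] ).append(stopword)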

# GENERATE PARAGRAPH OUTPUT
maxparagraphs = 10
paragraphs = 0 # reset the outer 'while' loop counter to zero

while paragraphs < maxparagraphs: # start outer loop,until maxparagraphs is reached
    w1 = stopword
    w2 = stopword
    stopsentence = (".", "!", "?")
    sentence = []
    sentencecount = 0 # reset the inner 'while' loop counter to zero
    maxsentences = random.randrange(1,5) # 1 to 4 sentences per paragraph (upper bound is exclusive)

    while sentencecount < maxsentences: # start inner loop,until maxsentences is reached
        newword = random.choice(table[(w1,w2)]) # random word from word table
        if newword == stopword: sys.exit()
        elif newword in stopsentence:
            print ("%s%s" % (" ".join(sentence),newword),end=" ")
            sentencecount += 1 # increment the sentence counter
        else:
            sentence.append(newword)
        w1, w2 = w2, newword
    print (paragraphsep) # newline space
    paragraphs = paragraphs + 1 # increment the paragraph counter


# EOF

Edit 02:

Added sentence = [] back into the elif statement. To wit:

        elif newword in stopsentence:
            print ("%s%s" % (" ".join(sentence), newword), end=" ")
            sentence = [] # I have to be here to make the new sentence start as an empty list!!!
            sentencecount += 1 # increment the sentence counter
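
To see why the reset matters, here is a small sketch that feeds a fixed token stream through the same kind of loop, once with and once without clearing the list. The function `emit_sentences`, its `reset` flag and the sample `stream` are hypothetical and exist only for this demonstration.

```python
def emit_sentences(words, stops=(".", "!", "?"), reset=True):
    """Return the sentences produced from a flat stream of tokens.

    With reset=False the word list is never cleared, which mimics
    the loop as it stood before sentence = [] was added back.
    """
    out = []
    sentence = []
    for token in words:
        if token in stops:
            out.append(" ".join(sentence) + token)
            if reset:
                sentence = []  # next sentence starts from an empty list
        else:
            sentence.append(token)
    return out


stream = ["the", "cat", "sat", ".", "dogs", "bark", "!"]
print(emit_sentences(stream, reset=True))   # ['the cat sat.', 'dogs bark!']
print(emit_sentences(stream, reset=False))  # ['the cat sat.', 'the cat sat dogs bark!']
```

Without the reset, every later sentence is printed with all earlier words in front of it, which is consistent with the repeated opening words described in Edit 01.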

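Edit 03:

This is the final iteration of this script. Thanks to grieve for helping sort it out. I hope others can have some fun with it; I know I will.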
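
The final script itself is not reproduced in this copy. The following is only a sketch of how the pieces discussed above could fit together, not the author's final version. The function names `build_table` and `generate_paragraphs`, the parameters `max_paragraphs`, `max_sentences` and `rng`, and the choice to return a list instead of printing are all assumptions made for illustration.

```python
import random

STOPWORD = "\n"                 # cannot occur as a word after split()
STOPSENTENCE = (".", "!", "?")  # sentence-ending punctuation


def build_table(lines):
    """Map each pair of consecutive words to the words seen after it."""
    w1 = w2 = STOPWORD
    table = {}
    for line in lines:
        for word in line.split():
            if word[-1] in STOPSENTENCE:
                # Store the punctuation mark as its own token.
                table.setdefault((w1, w2), []).append(word[:-1])
                w1, w2 = w2, word[:-1]
                word = word[-1]
            table.setdefault((w1, w2), []).append(word)
            w1, w2 = w2, word
    table.setdefault((w1, w2), []).append(STOPWORD)  # end-of-input marker
    return table


def generate_paragraphs(table, max_paragraphs=10, max_sentences=5, rng=random):
    """Return a list of paragraphs of 1..max_sentences sentences each."""
    paragraphs = []
    w1 = w2 = STOPWORD  # assumption: chain state carries across paragraphs
    for _ in range(max_paragraphs):
        sentences, sentence = [], []
        target = rng.randint(1, max_sentences)
        while len(sentences) < target:
            newword = rng.choice(table[(w1, w2)])
            if newword == STOPWORD:
                # End of the chain: keep what was collected and stop.
                if sentences:
                    paragraphs.append(" ".join(sentences))
                return paragraphs
            if newword in STOPSENTENCE:
                sentences.append(" ".join(sentence) + newword)
                sentence = []  # the fix from Edit 02
            else:
                sentence.append(newword)
            w1, w2 = w2, newword
        paragraphs.append(" ".join(sentences))
    return paragraphs
```

To mirror the original command-line usage, one could call `build_table(sys.stdin)` and print `"\n\n".join(generate_paragraphs(table))`. One deliberate difference from the Edit 01 script: `w1` and `w2` are not reset to the stop word for each paragraph. The key `(stopword, stopword)` only ever receives the first word of the input, so resetting would make every paragraph open with the same words.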

(Editor: 李大同)
