TensorFlow教程——Bi-LSTM+CRF进行序列标注（代码浅析）

发布时间：2020-12-14 03:48:48 所属栏目：大数据来源：网络整理

导读：https://blog.csdn.net/guolindonggld/article/details/79044574 Bi-LSTM 使用TensorFlow构建Bi-LSTM时经常是下面的代码： cell_fw = tf.contrib.rnn.LSTMCell(num_units= 100) cell_bw = tf.contrib.rnn.LSTMCell(num_units= 100) (outputs,output_states)

https://blog.csdn.net/guolindonggld/article/details/79044574

Bi-LSTM

使用TensorFlow构建Bi-LSTM时经常是下面的代码：

cell_fw = tf.contrib.rnn.LSTMCell(num_units=100) cell_bw = tf.contrib.rnn.LSTMCell(num_units=100) (outputs,output_states) = tf.nn.bidirectional_dynamic_rnn(cell_fw,cell_bw,inputs,sequence_length=300)

首先下面是我画的Bi-LSTM示意图：

其实LSTM使用起来很简单，就是输入一排的向量，然后输出一排的向量。构建时只要设定两个超参数：num_units和sequence_length。

LSTMCell

tf.contrib.rnn.LSTMCell(
    num_units,use_peepholes=False,cell_clip=None,initializer=None,num_proj=None,proj_clip=None,num_unit_shards=None,num_proj_shards=None,forget_bias=1.0,state_is_tuple=True,activation=None,reuse=None )

上面的LSTM Cell只有一个超参数需要设定，num_units，即输出向量的维度。

bidirectional_dynamic_rnn()

(outputs,output_states) = tf.nn.bidirectional_dynamic_rnn(
    cell_fw,sequence_length=None,initial_state_fw=None,initial_state_bw=None,dtype=None,parallel_iterations=None,swap_memory=False,time_major=False,scope=None )

这个函数唯一需要设定的超参数就是序列长度sequence_length。

输入：?
inputs的shape通常是[batch_size,sequence_length,dim_embedding]。

输出：?
outputs是一个(output_fw,output_bw)元组，output_fw和output_bw的shape都是[batch_size,num_units]

output_states是一个(output_state_fw,output_state_bw) 元组，分别是前向和后向最后一个Cell的Output，output_state_fw和output_state_bw的类型都是LSTMStateTuple，这个类有两个属性c和h，分别表示Memory Cell和Hidden State，如下图：?

CRF

对于序列标注问题，通常会在LSTM的输出后接一个CRF层：将LSTM的输出通过线性变换得到维度为[batch_size,max_seq_len,num_tags]的张量，这个张量再作为一元势函数（Unary Potentials）输入到CRF层。

# 将两个LSTM的输出合并 output_fw,output_bw = outputs output = tf.concat([output_fw,output_bw],axis=-1) # 变换矩阵，可训练参数 W = tf.get_variable("W",[2 * num_units,num_tags]) # 线性变换 matricized_output = tf.reshape(output,[-1,2 * num_units]) matricized_unary_scores = tf.matmul(matricized_output,W) unary_scores = tf.reshape(matricized_unary_scores,[batch_size,num_tags])

损失函数

# Loss函数 log_likelihood,transition_params = tf.contrib.crf.crf_log_likelihood(unary_scores,tags,sequence_lengths) loss = tf.reduce_mean(-log_likelihood)

其中?
tags：维度为[batch_size,max_seq_len]的矩阵，也就是Golden标签，注意这里的标签都是以索引方式表示的。?
sequence_lengths：维度为[batch_size]的向量，记录了每个序列的长度。

log_likelihood：维度为[batch_size]的向量，每个元素代表每个给定序列的Log-Likelihood。?
transition_params ：维度为[num_tags,num_tags]的转移矩阵。注意这里的转移矩阵不像传统的HMM概率转移矩阵那样要求每个元素非负且每一行的和为1，这里的每个元素取值范围是实数（正负都可以）。

解码

decode_tags,best_score = tf.contrib.crf.crf_decode(unary_scores,transition_params,sequence_lengths)

其中?decode_tags：维度为[batch_size,max_seq_len]的矩阵，包含最高分的标签序列。?best_score ：维度为[batch_size]的向量，包含最高分数。

（编辑：李大同）

【声明】本站内容均来自网络，其相关言论仅代表作者个人观点，不代表本站立场。若无意侵犯到您的权利，请及时与联系站长删除相关内容!