
【NLP】Conditional Language Models

A language model estimates the probability that a sequence of words forms a sentence a human might say. By training it, we also obtain embeddings for the whole vocabulary.

An unconditional language model just assigns probabilities to sequences of words. That is to say, given the first n-1 words, it predicts the probability of the next word (it learns the probability distribution of the next word).

Because of the chain rule of probability, we only need to train the per-step conditionals:

p(w_1, w_2, ..., w_T) = ∏_{t=1}^{T} p(w_t | w_1, ..., w_{t-1})
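In code, this factorisation just means summing per-step log-probabilities. A minimal sketch, assuming a hypothetical next_word_log_prob(history, word) function that returns log p(word | history):

    import math

    def sentence_log_prob(words, next_word_log_prob):
        # log p(w_1,...,w_T) = sum_t log p(w_t | w_1,...,w_{t-1})
        total = 0.0
        for t, w in enumerate(words):
            total += next_word_log_prob(words[:t], w)
        return total

    # Toy usage with a uniform model over a 10-word vocabulary:
    uniform = lambda history, word: math.log(1.0 / 10)
    print(sentence_log_prob(["i", "like", "tea"], uniform))   # 3 * log(0.1)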

Conditional LMs

A conditional language model assigns probabilities to sequences of words W = (w_1, w_2, ..., w_T), given some conditioning context x:

p(W | x) = ∏_{t=1}^{T} p(w_t | x, w_1, ..., w_{t-1})

For example, in a translation task we are given the original sentence and its translation. The original sentence is the conditioning context, and using it we predict the target sentence.

Data for training conditional LMs:

  To train conditional language models, we need paired samples: a conditioning context x together with its target word sequence W, e.g. a source sentence and its translation.

Tasks of this kind include translation, summarisation, caption generation, and speech recognition.


How do we evaluate conditional LMs?

  • Traditional methods: cross-entropy or perplexity (hard to interpret, easy to implement).
  • Task-specific evaluation: compare the model's most likely output to a human-generated reference output, using metrics such as 【BLEU】, METEOR, ROUGE, … (okay to interpret, easy to implement; a quick BLEU sketch follows this list).
  • Human evaluation: hard to implement.
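As one way to compute such a score, NLTK provides sentence-level BLEU. A minimal sketch with toy tokenised sentences (the reference and hypothesis here are made up):

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    reference = [["the", "cat", "sits", "on", "the", "mat"]]   # list of reference token lists
    hypothesis = ["the", "cat", "is", "on", "the", "mat"]      # model output tokens
    score = sentence_bleu(reference, hypothesis,
                          smoothing_function=SmoothingFunction().method1)
    print(score)   # a value in [0, 1]; higher means closer to the reference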


Algorithmic challenges:

Given the conditioning context x, we want to find the output word sequence with the highest probability. We cannot simply use greedy search (always taking the single most likely next word), because it may fail to produce a good sentence.

Instead we use 【Beam Search】.
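A minimal beam-search sketch, assuming a hypothetical step_log_probs(x, prefix) function that returns a dict mapping each candidate next word to log p(word | x, prefix), and special tokens "<s>" / "</s>":

    import heapq

    def beam_search(x, step_log_probs, beam_size=2, max_len=20):
        beams = [(0.0, ["<s>"])]                      # (cumulative log-prob, prefix)
        finished = []
        for _ in range(max_len):
            candidates = []
            for score, prefix in beams:
                if prefix[-1] == "</s>":              # hypothesis already ended
                    finished.append((score, prefix))
                    continue
                for word, lp in step_log_probs(x, prefix).items():
                    candidates.append((score + lp, prefix + [word]))
            if not candidates:
                break
            # Keep only the beam_size best partial hypotheses.
            beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
        finished.extend(beams)
        return max(finished, key=lambda c: c[0])      # best-scoring hypothesis

Greedy search is the special case beam_size = 1.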


We draw attention to "encoder-decoder" models that learn a function mapping x into a fixed-size vector, and then use a language model to "decode" that vector into a sequence of words.

Model: K&B2013 (Kalchbrenner & Blunsom, 2013)

A simple example of an encoder – just a cumulative sum (cumsum) of the source word embeddings (very easy).

Another simple encoder – the CSM encoder: use a CNN to encode the source sentence.

The decoder – an RNN decoder.

(Figure: the computation graph of this model.)
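A minimal sketch of the cumsum-encoder idea above, assuming a toy vocabulary and a hypothetical embedding matrix E; the resulting fixed-size vector would then condition the RNN decoder at every step:

    import numpy as np

    d = 64
    vocab = {"i": 0, "like": 1, "tea": 2}
    E = np.random.randn(len(vocab), d) * 0.1          # source word embeddings

    def cumsum_encoder(source_words):
        # Represent the whole source sentence as the sum of its word embeddings.
        ids = [vocab[w] for w in source_words]
        return E[ids].sum(axis=0)                      # fixed-size vector, shape (d,)

    c = cumsum_encoder(["i", "like", "tea"])           # conditioning vector for the decoder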

Sutskever et al. Model (2014):

- Important. A classic model (sequence-to-sequence learning: an LSTM encoder and an LSTM decoder).

(Figure: the computation graph of the sequence-to-sequence model.)
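A minimal PyTorch sketch of a Sutskever-style encoder-decoder: the encoder reads the source into a fixed-size LSTM state, and the decoder generates the target conditioned on that state. The vocabulary sizes, layer sizes, and teacher-forcing setup here are illustrative assumptions, not the paper's exact configuration:

    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        def __init__(self, src_vocab, tgt_vocab, emb=128, hid=256):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, emb)
            self.tgt_emb = nn.Embedding(tgt_vocab, emb)
            self.encoder = nn.LSTM(emb, hid, batch_first=True)
            self.decoder = nn.LSTM(emb, hid, batch_first=True)
            self.out = nn.Linear(hid, tgt_vocab)

        def forward(self, src_ids, tgt_in_ids):
            # Encode the whole source sentence into the final (h, c) state ...
            _, state = self.encoder(self.src_emb(src_ids))
            # ... and decode the target conditioned only on that fixed-size state.
            dec_out, _ = self.decoder(self.tgt_emb(tgt_in_ids), state)
            return self.out(dec_out)                   # per-step logits over the target vocab

    model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
    src = torch.randint(0, 1000, (2, 7))               # batch of 2 source sentences
    tgt_in = torch.randint(0, 1000, (2, 5))            # shifted targets (teacher forcing)
    logits = model(src, tgt_in)                        # shape (2, 5, 1000)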

Some tricks for the Sutskever et al. model:

  • Read the input sequence 'backwards': +4 BLEU

  • Use an ensemble of m 【independently trained】 models at decoding time, averaging their per-step predictions (see the sketch after this list):
  1. Ensemble of 2 models: +3 BLEU
  2. Ensemble of 5 models: +4.5 BLEU

  • We want to find the most probable (MAP) output given the input, i.e.

      W* = argmax_W p(W | x)

    Exact search over all output sequences is intractable, so we use beam search: +1 BLEU. For example, with a beam size of 2, the decoder keeps only the 2 best partial hypotheses at each step.
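A minimal sketch of ensemble decoding, assuming each model exposes a hypothetical step_log_probs(x, prefix) that returns a NumPy vector of log-probabilities over the target vocabulary; the decoder (greedy or beam search) then uses the averaged distribution at every step:

    import numpy as np

    def ensemble_step_log_probs(models, x, prefix):
        # Average the models' next-word probability distributions, then return to log space.
        probs = np.mean([np.exp(m.step_log_probs(x, prefix)) for m in models], axis=0)
        return np.log(probs + 1e-12)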

An example application: image caption generation.

Encoder: a CNN.

Decoder: an RNN, or a conditional n-gram LM (different from the RNN, but also useful).
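A minimal PyTorch sketch of this CNN-encoder / RNN-decoder setup, assuming pre-extracted image feature vectors (e.g. 2048-dimensional, from some pretrained CNN) and a toy caption vocabulary:

    import torch
    import torch.nn as nn

    class Captioner(nn.Module):
        def __init__(self, feat=2048, vocab=5000, emb=256, hid=512):
            super().__init__()
            self.init_h = nn.Linear(feat, hid)     # image features -> initial hidden state
            self.emb = nn.Embedding(vocab, emb)
            self.rnn = nn.GRU(emb, hid, batch_first=True)
            self.out = nn.Linear(hid, vocab)

        def forward(self, img_feat, cap_in):
            h0 = torch.tanh(self.init_h(img_feat)).unsqueeze(0)   # (1, batch, hid)
            out, _ = self.rnn(self.emb(cap_in), h0)
            return self.out(out)                   # per-step logits over caption words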


Training such a model requires existing paired datasets (images with human-written captions).

The model of Kiros et al. has done this.

