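Question: Using TensorFlow's Connectionist Temporal Classification (CTC) implementation

I'm trying to use TensorFlow's CTC implementation from the contrib package (tf.contrib.ctc.ctc_loss), without success.

First of all, does anyone know where I can read a good step-by-step tutorial? TensorFlow's documentation on this topic is very poor.
Do I have to provide ctc_loss with labels that have the blank label interleaved, or not?
I cannot get my network to overfit, even after more than 200 epochs on my training set. :(
How can I calculate the label error rate using tf.edit_distance?

Here is my code: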

with graph.as_default():

  max_length = X_train.shape[1]
  frame_size = X_train.shape[2]
  max_target_length = y_train.shape[1]

  # Batch size x time steps x data width
  data = tf.placeholder(tf.float32, [None, max_length, frame_size])
  data_length = tf.placeholder(tf.int32, [None])

  #  Batch size x max_target_length
  target_dense = tf.placeholder(tf.int32, [None, max_target_length])
  target_length = tf.placeholder(tf.int32, [None])

  #  Generating sparse tensor representation of target
  target = ctc_label_dense_to_sparse(target_dense, target_length)

  # Applying LSTM, returning output for each timestep (y_rnn1, 
  # [batch_size, max_time, cell.output_size]) and the final state of shape
  # [batch_size, cell.state_size]
  y_rnn1, h_rnn1 = tf.nn.dynamic_rnn(
    tf.nn.rnn_cell.LSTMCell(num_hidden, state_is_tuple=True, num_proj=num_classes), #  num_proj=num_classes
    data,
    dtype=tf.float32,
    sequence_length=data_length,
  )

  #  For sequence labelling, we want a prediction for each timestamp. 
  #  However, we share the weights for the softmax layer across all timesteps. 
  #  How do we do that? By flattening the first two dimensions of the output tensor. 
  #  This way time steps look the same as examples in the batch to the weight matrix. 
  #  Afterwards, we reshape back to the desired shape


  # Reshaping
  logits = tf.transpose(y_rnn1, perm=(1, 0, 2))

  #  Get the loss by calculating ctc_loss, which also calculates the gradient.
  #  ctc_loss performs the softmax operation for you, so inputs should be
  #  e.g. linear projections of outputs by an LSTM.
  loss = tf.reduce_mean(tf.contrib.ctc.ctc_loss(logits, target, data_length))

  #  Define our optimizer with learning rate
  optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(loss)

  #  Decoding using beam search
  decoded, log_probabilities = tf.contrib.ctc.ctc_beam_search_decoder(logits, data_length, beam_width=10, top_paths=1)

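Thanks!

Update (June 29, 2016)

Thank you, @jihyeon-seo! So, the input to our RNN has shape [num_batch, max_time_step, num_features]. We use dynamic_rnn to perform the recurrent computation on that input, which outputs a tensor of shape [num_batch, max_time_step, num_hidden]. After that, we need an affine projection at each timestep with shared weights, so we reshape to [num_batch * max_time_step, num_hidden], multiply by a weight matrix of shape [num_hidden, num_classes], add a bias, undo the reshape, and transpose (so that we have [max_time_steps, num_batch, num_classes] for the CTC loss input). This result is the input to the ctc_loss function. Did I do everything correctly?

This is the code: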

    cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)

    h_rnn1, self.last_state = tf.nn.dynamic_rnn(cell, self.input_data, self.sequence_length, dtype=tf.float32)

    #  Reshaping to share weights across timesteps
    x_fc1 = tf.reshape(h_rnn1, [-1, num_hidden])

    self._logits = tf.matmul(x_fc1, self._W_fc1) + self._b_fc1

    #  Reshaping
    self._logits = tf.reshape(self._logits, [max_length, -1, num_classes])

    #  Calculating loss
    loss = tf.contrib.ctc.ctc_loss(self._logits, self._targets, self.sequence_length)

    self.cost = tf.reduce_mean(loss)

Update (07/11/2016)

Thanks, @Xiv. Here is the code after the bug fix:

    cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)

    h_rnn1, self.last_state = tf.nn.dynamic_rnn(cell, self.input_data, self.sequence_length, dtype=tf.float32)

    #  Reshaping to share weights across timesteps
    x_fc1 = tf.reshape(h_rnn1, [-1, num_hidden])

    self._logits = tf.matmul(x_fc1, self._W_fc1) + self._b_fc1

    #  Reshaping
    self._logits = tf.reshape(self._logits, [-1, max_length, num_classes])
    self._logits = tf.transpose(self._logits, (1,0,2))

    #  Calculating loss
    loss = tf.contrib.ctc.ctc_loss(self._logits, self._targets, self.sequence_length)

    self.cost = tf.reduce_mean(loss)

Update (07/25/16)

I published my code on GitHub, working with a single utterance. Feel free to use it! :)

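I'm trying to do the same thing. Here is what I found that may interest you.

It was hard to find a tutorial for CTC, but this example (https://github.com/tensorflow/tensorflow/blob/679f95e9d8d538c3c02c0da45606bab22a71420e/tensorflow/python/kernel_tests/ctc_loss_op_test.py) was helpful.

For the blank label, the CTC layer assumes that the blank index is num_classes - 1, so you need to provide an additional class for the blank label. (https://github.com/tensorflow/tensorflow/blob/d42facc3cc9611f0c9722c81551a7404a0bd3f6b/tensorflow/core/kernels/ctc_loss_op.cc, line 146)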
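As a minimal sketch of what this convention implies (plain Python, not the TensorFlow API): with num_classes = num_labels + 1, the last index is reserved for blank, and a best-path decoder merges repeated labels and then drops blanks. The helper name greedy_ctc_decode and the toy scores are hypothetical, chosen only for illustration.

```python
def greedy_ctc_decode(frame_scores, blank_index):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]
    decoded = []
    previous = None
    for label in best_path:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank_index:
            decoded.append(label)
        previous = label
    return decoded


num_labels = 3                 # real labels 0, 1, 2 (toy assumption)
num_classes = num_labels + 1   # one extra class for the blank label
blank_index = num_classes - 1  # blank is the last class, as described above

# Six frames of per-class scores (toy values); columns are classes 0..3.
frame_scores = [
    [0.9, 0.0, 0.0, 0.1],  # label 0
    [0.8, 0.1, 0.0, 0.1],  # label 0 again (merged with the previous frame)
    [0.1, 0.0, 0.0, 0.9],  # blank
    [0.7, 0.1, 0.1, 0.1],  # label 0 (new emission, separated by a blank)
    [0.0, 0.1, 0.8, 0.1],  # label 2
    [0.0, 0.0, 0.1, 0.9],  # blank
]

decoded = greedy_ctc_decode(frame_scores, blank_index)
print(decoded)
```

The blank between the two runs of label 0 is what lets the decoder output the same label twice in a row.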

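The CTC loss also applies the softmax itself. In your code, the RNN layer is connected directly to the CTC loss layer. The output of the RNN layer has already passed through its internal activation, so you need to add one more layer (it can be the output layer) without an activation function, and then add the CTC loss layer.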
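One way to picture that extra layer is the NumPy sketch below. It is not TensorFlow code: the sizes, the random stand-in for the RNN output, and the names W and b are all assumptions made for illustration. The point is that the projection is purely linear and shares one weight matrix across every timestep.

```python
import numpy as np

batch_size, max_time, num_hidden, num_classes = 2, 5, 4, 3  # toy sizes

rng = np.random.default_rng(0)
# Stand-in for the batch-major RNN output [batch, time, hidden].
rnn_outputs = rng.standard_normal((batch_size, max_time, num_hidden))
W = rng.standard_normal((num_hidden, num_classes))  # shared across timesteps
b = np.zeros(num_classes)

# Flatten batch and time so that every timestep is multiplied by the same W.
flat = rnn_outputs.reshape(-1, num_hidden)           # [batch * time, hidden]
flat_logits = flat @ W + b                           # linear only, no activation

# Restore the batch-major layout, then switch to time-major.
logits_batch_major = flat_logits.reshape(batch_size, max_time, num_classes)
logits_time_major = logits_batch_major.transpose(1, 0, 2)  # [time, batch, classes]

print(logits_time_major.shape)
```

The resulting scores are unnormalized, which matches the expectation that the loss applies the softmax itself.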

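See here for an example that uses a bidirectional LSTM, CTC, and an edit distance implementation to train a phoneme recognition model on the TIMIT corpus. If you train on that corpus's training set, you should be able to bring the phoneme error rate down to 20-25% after 120 epochs or so.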
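For reference, the error rate mentioned here is usually computed as an edit distance normalized by the reference length and averaged over utterances. The following is a plain-Python sketch of that computation, not the tf.edit_distance API; the function names and toy label sequences are hypothetical.

```python
def edit_distance(hypothesis, truth):
    """Levenshtein distance between two label sequences."""
    previous_row = list(range(len(truth) + 1))
    for i, hyp_label in enumerate(hypothesis, start=1):
        current_row = [i]
        for j, true_label in enumerate(truth, start=1):
            substitution = previous_row[j - 1] + (hyp_label != true_label)
            insertion = current_row[j - 1] + 1
            deletion = previous_row[j] + 1
            current_row.append(min(substitution, insertion, deletion))
        previous_row = current_row
    return previous_row[-1]


def label_error_rate(hypotheses, truths):
    """Mean edit distance, each normalized by the length of its reference."""
    rates = [edit_distance(h, t) / len(t) for h, t in zip(hypotheses, truths)]
    return sum(rates) / len(rates)


# Toy decoded outputs and reference label sequences (assumed values).
hypotheses = [[1, 2, 3], [4, 5]]
truths = [[1, 2, 4, 3], [4, 5]]

ler = label_error_rate(hypotheses, truths)
print(ler)
```

This sketch assumes every reference sequence is non-empty, since each distance is divided by the reference length.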

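There is a bug in your code in the reshape after the RNN. If the matrix is time-major, then your reshape is correct, but the RNN needs to have time_major=True passed in. If the matrix is batch-major, then you need tf.transpose(tf.reshape(self._logits, [-1, max_length, num_classes]), [1, 0, 2]) - Xiv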
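The difference between the two reshapes is easy to see on a toy array. The NumPy sketch below (an illustration with assumed sizes, not the poster's TensorFlow code) tags every row of the flattened batch-major logits with 10 * batch + time, then compares the direct reshape with reshape followed by transpose.

```python
import numpy as np

batch_size, max_length, num_classes = 2, 3, 4  # toy sizes

# Tag each (batch, time) position with the value 10 * batch + time.
batch_idx, time_idx = np.meshgrid(
    np.arange(batch_size), np.arange(max_length), indexing="ij")
tags = 10 * batch_idx + time_idx                      # [batch, time]

# Flattened batch-major logits, as produced by the shared projection.
flat_logits = np.repeat(tags.reshape(-1, 1), num_classes, axis=1)

# Direct reshape to [time, batch, classes]: right shape, scrambled contents.
wrong = flat_logits.reshape(max_length, -1, num_classes)

# Reshape back to batch-major first, then transpose to time-major.
right = flat_logits.reshape(-1, max_length, num_classes).transpose(1, 0, 2)

print(wrong[:, :, 0])  # rows mix timesteps of the same utterance
print(right[:, :, 0])  # row t holds timestep t of every utterance
```

Both arrays have the same shape, so nothing fails at graph construction time; only the contents reveal that the direct reshape mixes timesteps and batch entries.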

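Thanks, @jihyeon-seo. Did you have any problems training the network with the CTC loss? It is really hard to overfit this network, yet in many papers the authors say that LSTM networks overfit easily. I could not overfit my network with 1 LSTM layer of 320 memory cells using just 1 utterance (TIMIT corpus, with filter bank features), even after 2000 epochs. :( - Igor Macedo Quintanilha

I got an overfitted LSTM model for one sentence after only 100 epochs. - Jihyeon Seo

I think you could check the input and output tensors between the LSTM layer and the CTC loss layer. Have you checked whether the CTC layer returns a loss that is updated at every epoch? - Jihyeon Seo

Yes, I checked. When using the filter bank outputs as my features, I could not train the network. However, when I switched to MFCC features, everything went well. :) - Igor Macedo Quintanilha

@JihyeonSeo could you explain in more detail what "rnn layer is internally activated" means? I'm trying to understand why an extra affine transformation layer (without an activation function) is needed. - Helin Wang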
