While trying to learn a bit about TensorFlow, I have been building a Variational Autoencoder, which works; however, I noticed that, after training, I was getting different results from two decoders that share the same variables.
I created two decoders because the first is trained against my dataset, while I eventually want to feed the second a new Z encoding in order to produce new values.
My check is that I should be able to send the Z values generated by the encoding process to both decoders and get equal results.
I have two decoders (D and D_new). D_new shares the variable scope of D.
Before training, I can send values into the encoder (E) to generate output values as well as the Z values it produced (Z_gen).
If I use Z_gen as input to D_new before training, then its output is identical to the output of D, which is expected.
After a few iterations of training, however, the outputs of D and D_new begin to diverge (although they remain quite similar).
I have pared this down to a simpler version of my code that still reproduces the error. I'm wondering whether others have run into this and where I might be able to correct for it.
The code below can be run in a Jupyter notebook. I'm using TensorFlow r0.11 and Python 3.5.0.
import numpy as np
import tensorflow as tf
import matplotlib
import matplotlib.pyplot as plt
import os
import pylab as pl
mgc = get_ipython().magic
mgc(u'matplotlib inline')
pl.rcParams['figure.figsize'] = (8.0, 5.0)
##-- Helper function Just for visualizing the data
def plot_values(values, file=None):
    t = np.linspace(1.0, len(values[0]), len(values[0]))
    for i in range(len(values)):
        plt.plot(t, values[i])
    if file is None:
        plt.show()
    else:
        plt.savefig(file)
    plt.close()
def encoder(input, n_hidden, n_z):
    # Maps the input batch to Z values through a single tanh hidden layer.
    with tf.variable_scope("ENCODER"):
        with tf.name_scope("Hidden"):
            n_layer_inputs = input.get_shape()[1].value
            n_layer_outputs = n_hidden
            with tf.name_scope("Weights"):
                w = tf.get_variable(name="E_Hidden", shape=[n_layer_inputs, n_layer_outputs], dtype=tf.float32)
            with tf.name_scope("Activation"):
                a = tf.tanh(tf.matmul(input, w))
            prevLayer = a
        with tf.name_scope("Z"):
            n_layer_inputs = prevLayer.get_shape()[1].value
            n_layer_outputs = n_z
            with tf.name_scope("Weights"):
                w = tf.get_variable(name="E_Z", shape=[n_layer_inputs, n_layer_outputs], dtype=tf.float32)
            with tf.name_scope("Activation"):
                Z_gen = tf.matmul(prevLayer, w)
    return Z_gen
def decoder(input, n_hidden, n_outputs, reuse=False):
    # Maps Z values back to outputs; reuse=True shares the DECODER variables.
    with tf.variable_scope("DECODER", reuse=reuse):
        with tf.name_scope("Hidden"):
            n_layer_inputs = input.get_shape()[1].value
            n_layer_outputs = n_hidden
            with tf.name_scope("Weights"):
                w = tf.get_variable(name="D_Hidden", shape=[n_layer_inputs, n_layer_outputs], dtype=tf.float32)
            with tf.name_scope("Activation"):
                a = tf.tanh(tf.matmul(input, w))
            prevLayer = a
        with tf.name_scope("OUTPUT"):
            n_layer_inputs = prevLayer.get_shape()[1].value
            n_layer_outputs = n_outputs
            with tf.name_scope("Weights"):
                w = tf.get_variable(name="D_Output", shape=[n_layer_inputs, n_layer_outputs], dtype=tf.float32)
            with tf.name_scope("Activation"):
                out = tf.sigmoid(tf.matmul(prevLayer, w))
    return out
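As a side note, the sharing here hinges on tf.get_variable: inside a scope opened with reuse=True, it returns the existing variable rather than creating a new one. A minimal sketch (not from the original code; the "demo"/"w" names are illustrative) of that behavior:

with tf.variable_scope("demo"):
    v1 = tf.get_variable("w", shape=[2, 2], dtype=tf.float32)
with tf.variable_scope("demo", reuse=True):
    v2 = tf.get_variable("w", shape=[2, 2], dtype=tf.float32)
assert v1 is v2  # same underlying Variable object, so D and D_new share weights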
Here is where the TensorFlow graph is set up:
batch_size = 3
n_inputs = 100
n_hidden_nodes = 12
n_z = 2

with tf.variable_scope("INPUT_VARS"):
    with tf.name_scope("X"):
        X = tf.placeholder(tf.float32, shape=(None, n_inputs))
    with tf.name_scope("Z"):
        Z = tf.placeholder(tf.float32, shape=(None, n_z))

Z_gen = encoder(X, n_hidden_nodes, n_z)
D = decoder(Z_gen, n_hidden_nodes, n_inputs)
D_new = decoder(Z, n_hidden_nodes, n_inputs, reuse=True)

with tf.name_scope("COST"):
    loss = -tf.reduce_mean(X * tf.log(1e-6 + D) + (1 - X) * tf.log(1e-6 + 1 - D))
    train_step = tf.train.AdamOptimizer(0.001, beta1=0.5).minimize(loss)
I generate a training set of 3 samples of normal-distribution noise with 100 data points each, then sort it to make it easier to visualize:
train_data = (np.random.normal(0,1,(batch_size,n_inputs)) + 3) / 6.0
train_data.sort()
plot_values(train_data)
Start up the session:
sess = tf.InteractiveSession()
sess.run(tf.group(tf.initialize_all_variables(), tf.initialize_local_variables()))
Let's just look at what the network initially generates before training...
resultA, Z_vals = sess.run([D, Z_gen], feed_dict={X:train_data})
plot_values(resultA)
Pulling out the generated Z values and feeding them to D_new, which reuses the variables from D:
resultB = sess.run(D_new, feed_dict={Z:Z_vals})
plot_values(resultB)
Just for sanity, I'll plot the difference between the two to be sure they're the same...
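The snippet for that check isn't in the post as captured; presumably it was the same difference plot used again later:

resultA - resultB
plot_values(resultA - resultB)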
Now run 1000 training epochs and plot the result...
for i in range(1000):
    _, resultA, Z_vals = sess.run([train_step, D, Z_gen], feed_dict={X: train_data})
plot_values(resultA)
Now let's feed those same Z values to D_new and plot those results...
resultB = sess.run(D_new, feed_dict={Z:Z_vals})
plot_values(resultB)
They look pretty similar. But (I think) they should be exactly the same. Let's look at the difference...
plot_values(resultA - resultB)
You can see there is some variation now. This becomes much more dramatic with a larger network on more complex data, but it still shows up in this simple example. Any clues as to what's going on?
There are some methods (I don't know which ones specifically) that can be supplied with a seed value. Beyond those, I'm not even sure the training process is completely deterministic, especially when a GPU is involved, simply by the nature of parallelization.
See this question.
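If seeding is the concern, a minimal sketch (my assumption, not from the question) of pinning down the graph-level and NumPy seeds would be:

import numpy as np
import tensorflow as tf

np.random.seed(0)        # fixes the NumPy-generated training data
tf.set_random_seed(0)    # graph-level seed; call before building the graph

Even with both seeds set, GPU reductions can still vary in summation order, so bit-identical runs aren't guaranteed.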
While I don't have a full explanation for the reason why, I was able to resolve my issue by changing:
for i in range(1000):
    _, resultA, Z_vals = sess.run([train_step, D, Z_gen], feed_dict={X: train_data})
plot_values(resultA)

resultB = sess.run(D_new, feed_dict={Z: Z_vals})
plot_values(resultB)

plot_values(resultA - resultB)
to...
for i in range(1000):
    _, resultA, Z_vals = sess.run([train_step, D, Z_gen], feed_dict={X: train_data})

resultA, Z_vals = sess.run([D, Z_gen], feed_dict={X: train_data})
plot_values(resultA)

resultB = sess.run(D_new, feed_dict={Z: Z_vals})
plot_values(resultB)

plot_values(resultA - resultB)
Note that I simply ran and extracted the result and Z_vals one last time, without the train_step. (Presumably the fetches of D and Z_gen in the same sess.run call as train_step are computed from the weights as they stood before that step's update, while the later D_new evaluation sees the updated weights.)
The reason I was still seeing problems in my more complex setup was that I had bias variables (even though they were set to 0.0) that were being generated with...
b = tf.Variable(tf.constant(self.bias_k, shape=[n_layer_outputs], dtype=tf.float32))
And that is somehow not considered when using reuse with a tf.variable_scope. So there were variables that were technically not being reused. Why they presented such a problem when set to 0.0, I'm not sure.
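A plausible explanation: only D's copies of those tf.Variable biases sat in the trained graph, so training moved them away from 0.0 while D_new's separate, un-reused copies stayed at 0.0. A hedged sketch of a fix, assuming the bias is created inside the decoder's scope (the "D_Hidden_b" name is illustrative), is to create it with tf.get_variable so that reuse applies:

# Create the bias via tf.get_variable so variable_scope reuse covers it too.
b = tf.get_variable(name="D_Hidden_b",
                    shape=[n_layer_outputs],
                    dtype=tf.float32,
                    initializer=tf.constant_initializer(0.0))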