Getting Started

Installing TensorFlow

pip install tensorflow

Or run it inside Docker:

# CPU only
docker run -it -p 8888:8888 tensorflow/tensorflow
# GPU version
nvidia-docker run -it -p 8888:8888 tensorflow/tensorflow:latest-gpu

Getting Started with TensorFlow

from __future__ import print_function, division
import tensorflow as tf

print('Loaded TF version', tf.__version__)

A Simple Example

import tensorflow as tf

a = tf.constant(5, name="input_a")
b = tf.constant(3, name="input_b")
c = tf.mul(a, b, name="mul_c")
d = tf.add(a, b, name="add_d")
e = tf.add(c, d, name="add_e")

with tf.Session() as sess:
    print(sess.run(e)) # output => 23
    writer = tf.summary.FileWriter("./hello_graph", sess.graph)

Next, start TensorBoard to inspect this graph (inside a Jupyter notebook you can run !tensorboard --logdir="hello_graph"):

tensorboard --logdir="hello_graph"

Open http://localhost:6006 in a browser and switch to the GRAPHS tab to see the generated graph:

A Simple Tensor Example

Next, an example whose input is a vector:

import tensorflow as tf

a = tf.constant([5,3], name="input_a")
b = tf.reduce_prod(a, name="prod_b")
c = tf.reduce_sum(a, name="sum_c")
d = tf.add(c, b, name="add_d")

with tf.Session() as sess:
    print(sess.run(d)) # => 23

Basic Types

All data in TensorFlow is represented as a Tensor, which can be a single value, a vector, or a multi-dimensional array. Tensors come in a few basic kinds: constants, variables, and placeholders.

Constants:

# Constant

a = tf.constant(2)
b = tf.constant(3)

with tf.Session() as sess:
    print(sess.run(a+b))  # => 5

Variables are mutable during computation and are updated (optimized) automatically during training. If you want to update a variable only by hand, outside of TensorFlow's training machinery, declare it as non-trainable, e.g. not_trainable = tf.Variable(0, trainable=False)

# Variable
# Variables maintain state across executions of the
# graph. The following example shows a variable serving
# as a simple counter.

v1 = tf.Variable(10)
v2 = tf.Variable(5)

with tf.Session() as sess:
    # variables must be initialized first.
    tf.global_variables_initializer().run(session=sess)
    print(sess.run(v1+v2)) # => 15
# Placeholder and feed
# Placeholder is used as Graph input when running session
# A feed temporarily replaces the output of an operation
# with a tensor value. You supply feed data as an argument
# to a run() call. The feed is only used for the run call
# to which it is passed. The most common use case involves
# designating specific operations to be "feed" operations
# by using tf.placeholder() to create them

a = tf.placeholder(tf.int16)
b = tf.placeholder(tf.int16)

# Define some operations
add = tf.add(a, b)
mul = tf.multiply(a, b)

with tf.Session() as sess:
    print(sess.run(add, feed_dict={a: 2, b: 3}))  # ==> 5
    print(sess.run(mul, feed_dict={a: 2, b: 3}))  # ==> 6
# Matrix

# Create a Constant op that produces a 1x2 matrix.  The op is
# added as a node to the default graph.
#
# The value returned by the constructor represents the output
# of the Constant op.
matrix1 = tf.constant([[3., 3.]])

# Create another Constant that produces a 2x1 matrix.
matrix2 = tf.constant([[2.],[2.]])

# Create a Matmul op that takes 'matrix1' and 'matrix2' as inputs.
# The returned value, 'product', represents the result of the matrix
# multiplication.
product = tf.matmul(matrix1, matrix2)
with tf.Session() as sess:
    print(product.eval()) # => [[ 12.]]

Graphs and Sessions

TensorFlow represents a computation as a graph (Graph) whose nodes are called ops (operations). Each op takes zero or more Tensors as input and produces zero or more Tensors as output. To compute anything, the graph must be launched in a session (Session): the session distributes the graph's ops to devices such as CPUs and GPUs, executes them, and returns the resulting Tensors.

If you do not specify a graph, TensorFlow creates one automatically; it can be retrieved with tf.get_default_graph().
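For instance, a quick check (for illustration) that ops created without an explicit graph land in the default graph:

import tensorflow as tf

a = tf.constant(1)
# Ops created outside of any graph context are added to the default graph.
print(a.graph is tf.get_default_graph())  # => True

To use a separate graph, create one explicitly and make it the default while constructing ops: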

graph = tf.Graph()
with graph.as_default():
    value1 = tf.constant([1., 2.])
    value2 = tf.Variable([3., 4.])
    result = value1*value2

To launch a graph, create a session and run the graph inside it:

# 使用自定义图
with tf.Session(graph=graph) as sess:
    tf.global_variables_initializer().run()
    print(sess.run(result)) # => [ 3.  8.]
    print(result.eval())

Using the default graph:

sess = tf.Session()

# Run the graph in the session
print(sess.run(product))

# Close the session when done
sess.close()

A session must be closed once you are done with it, either by calling sess.close() or by using a with block:

with tf.Session() as sess:
    print(sess.run(product))

When launching a graph, the session automatically detects devices and will use a GPU if one is available. With multiple GPUs, however, TensorFlow only uses the first one by default; to run on another device, assign it explicitly:

with tf.Session() as sess:
    with tf.device("/gpu:1"):
        matrix1 = tf.constant([[3., 3.]])
        matrix2 = tf.constant([[2.], [2.]])
        product = tf.matmul(matrix1, matrix2)
        print(sess.run(product))

In interactive environments such as IPython, tf.InteractiveSession can be used in place of tf.Session. It installs itself as the default session, so Tensor.eval() and Operation.run() can be called directly, which is very convenient.
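A minimal sketch:

import tensorflow as tf

sess = tf.InteractiveSession()

x = tf.constant([1., 2.])
v = tf.Variable([3., 4.])

# With an InteractiveSession installed as the default session,
# run() and eval() need no explicit session argument.
v.initializer.run()
print((x * v).eval())  # => [ 3.  8.]

sess.close()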

Tensor

All data in TensorFlow is represented as a Tensor, which can be a single value, a vector, or a multi-dimensional array. A Tensor has several important attributes: its rank (number of dimensions), its shape, and its data type (dtype).

The relationship between rank and shape is shown in the table below:

Rank  Shape               Dimension number  Example
0     []                  0-D               A 0-D tensor. A scalar.
1     [D0]                1-D               A 1-D tensor with shape [5].
2     [D0, D1]            2-D               A 2-D tensor with shape [3, 4].
3     [D0, D1, D2]        3-D               A 3-D tensor with shape [1, 4, 3].
n     [D0, D1, … Dn-1]    n-D               A tensor with shape [D0, D1, … Dn-1].
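For illustration, rank and shape can be inspected at runtime with tf.rank() and tf.shape(), or statically with Tensor.get_shape():

import tensorflow as tf

t = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])  # rank 2, shape [2, 3]

with tf.Session() as sess:
    print(sess.run(tf.rank(t)))   # => 2
    print(sess.run(tf.shape(t)))  # => [2 3]

print(t.get_shape())  # static shape => (2, 3)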

Constant

A constant's value cannot change during computation, e.g.:

a = tf.constant(2)
b = tf.constant(3)

with tf.Session() as sess:
    print(sess.run(a+b))  # Output => 5

Variable

Variables are mutable during computation and are updated (optimized) automatically during training; they typically hold model parameters. An initial value must be given when a variable is defined.

If you want to update a variable only by hand, outside of TensorFlow's training machinery, declare it as non-trainable, e.g. not_trainable = tf.Variable(0, trainable=False)

v1 = tf.Variable(10)
v2 = tf.Variable(5)

with tf.Session() as sess:
    # variables must be initialized first.
    tf.global_variables_initializer().run(session=sess)
    print(sess.run(v1+v2)) # Output => 15
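Variables can also be updated manually through their assign ops; a small sketch:

import tensorflow as tf

counter = tf.Variable(0, trainable=False, name="counter")
# assign_add returns an op that increments the variable each time it runs
increment = counter.assign_add(1)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        print(sess.run(increment))  # => 1, 2, 3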

Placeholder

Placeholders provide inputs to the graph and are typically used to pass in training samples. Their values are bound with a feed when Session.run() is called.

a = tf.placeholder(tf.int16)
b = tf.placeholder(tf.int16)

# Define some operations
add = tf.add(a, b)
mul = tf.multiply(a, b)

with tf.Session() as sess:
    print(sess.run(add, feed_dict={a: 2, b: 3}))  # ==> 5
    print(sess.run(mul, feed_dict={a: 2, b: 3}))  # ==> 6
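A placeholder can also constrain the dtype and shape of its input; in this sketch the first dimension is left open with None so batches of any size can be fed:

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 3], name="x")
row_sums = tf.reduce_sum(x, axis=1)

with tf.Session() as sess:
    data = np.array([[1., 2., 3.],
                     [4., 5., 6.]], dtype=np.float32)
    print(sess.run(row_sums, feed_dict={x: data}))  # ==> [  6.  15.]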

Data Types

TensorFlow has a rich set of data types, such as tf.int32 and tf.float64, which correspond directly to NumPy's dtypes.

import tensorflow as tf
import numpy as np

a = np.array([2, 3], dtype=np.int32)
b = np.array([4, 5], dtype=np.int32)
# Use `tf.add()` to initialize an "add" Operation
c = tf.add(a, b)

with tf.Session() as sess:
    print(sess.run(c)) # ==> [6 8]

tf.convert_to_tensor(value, dtype=tf.float32) is a very useful conversion function, typically used when building new Operations. It accepts native Python types, NumPy arrays, and Tensors alike.
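For illustration:

import numpy as np
import tensorflow as tf

# All three calls produce equivalent float32 Tensors.
t1 = tf.convert_to_tensor([1., 2., 3.], dtype=tf.float32)            # Python list
t2 = tf.convert_to_tensor(np.array([1., 2., 3.], dtype=np.float32))  # NumPy array
t3 = tf.convert_to_tensor(t1)                                        # existing Tensor

with tf.Session() as sess:
    print(sess.run([t1, t2, t3]))  # ==> three copies of [ 1.  2.  3.]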

Mathematical Operations

TensorFlow has many built-in mathematical operations, including common numerical functions, matrix operations, and optimization algorithms.

import tensorflow as tf
# Use an interactive session, which is convenient for printing results
sess = tf.InteractiveSession()

x = tf.constant([[2, 5, 3, -5],
                 [0, 3,-2,  5],
                 [4, 3, 5,  3],
                 [6, 1, 4,  0]])
y = tf.constant([[4, -7, 4, -3, 4],
                 [6, 4,-7,  4, 7],
                 [2, 3, 2,  1, 4],
                 [1, 5, 5,  5, 2]])

floatx = tf.constant([[2., 5., 3., -5.],
                      [0., 3.,-2.,  5.],
                      [4., 3., 5.,  3.],
                      [6., 1., 4.,  0.]])

print(tf.transpose(x).eval())
print(tf.matmul(x, y).eval())
print(tf.matrix_determinant(tf.to_float(x)).eval())
print(tf.matrix_inverse(tf.to_float(x)).eval())
print(tf.matrix_solve(tf.to_float(x), [[1],[1],[1],[1]]).eval())

Reduction

Reduction operations aggregate the input along the given dimensions and return the reduced result:

import tensorflow as tf
sess = tf.InteractiveSession()

x = tf.constant([[1,  2, 3],
                 [3,  2, 1],
                 [-1,-2,-3]])

boolean_tensor = tf.constant([[True,  False, True],
                 [False, False, True],
                 [True, False, False]])

print(tf.reduce_prod(x).eval()) # => -216
print(tf.reduce_prod(x, reduction_indices=1).eval()) # => [ 6  6 -6]
print(tf.reduce_min(x, reduction_indices=1).eval()) # => [ 1  1 -3]
print(tf.reduce_max(x, reduction_indices=1).eval()) # => [ 3  3 -1]
print(tf.reduce_mean(x, reduction_indices=1).eval()) # => [ 2  2 -2]

# Computes the "logical and" of elements
print(tf.reduce_all(boolean_tensor, reduction_indices=1).eval()) # => [False False False]

# Computes the "logical or" of elements
print(tf.reduce_any(boolean_tensor, reduction_indices=1).eval()) # => [ True  True  True]
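Note that reduction_indices is the older name of this argument; current TF 1.x code spells it axis, with identical behavior:

print(tf.reduce_sum(x, axis=0).eval())  # column sums => [3 2 1]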

Segmentation

Segmentation operations aggregate the input per segment, according to the given segment_ids, and return the reduced result:

import tensorflow as tf
sess = tf.InteractiveSession()

seg_ids = tf.constant([0, 1, 1, 2, 2]) # Group indexes: 0 | 1,2 | 3,4
x = tf.constant([[2, 5, 3, -5],
                 [0, 3,-2,  5],
                 [4, 3, 5,  3],
                 [6, 1, 4,  0],
                 [6, 1, 4,  0]])

print(tf.segment_sum(x, seg_ids).eval())
print(tf.segment_prod(x, seg_ids).eval())
print(tf.segment_min(x, seg_ids).eval())
print(tf.segment_max(x, seg_ids).eval())
print(tf.segment_mean(x, seg_ids).eval())
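With seg_ids = [0, 1, 1, 2, 2], row 0 forms its own segment while rows 1-2 and rows 3-4 are combined; tf.segment_sum therefore returns [[2 5 3 -5] [4 6 3 8] [12 2 8 0]].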

Sequence

Sequence comparison and index extraction operations:

import tensorflow as tf
sess = tf.InteractiveSession()

x = tf.constant([[2, 5, 3, -5],
                 [0, 3,-2,  5],
                 [4, 3, 5,  3],
                 [6, 1, 4,  0]])
listx = tf.constant([1,2,5,3,4,5,6,7,8,3,2])
boolx = tf.constant([[True,False], [False,True]])

# Index of the minimum value in each column
print(tf.argmin(x, 0).eval()) # ==>  [1 3 1 0]

# Index of the maximum value in each row
print(tf.argmax(x, 1).eval()) # ==> [1 3 2 0]

# Positions where the tensor is True
# ==> [[0 0]
#      [1 1]]
print(tf.where(boolx).eval())

# Unique values, in order of first occurrence
print(tf.unique(listx)[0].eval()) # ==> [1 2 5 3 4 6 7 8]

Name Scope

Name scopes break complex operations into smaller named blocks, which helps organize large graphs and makes them easier to browse in TensorBoard.

import tensorflow as tf

with tf.name_scope("Scope_A"):
    a = tf.add(1, 2, name="A_add")
    b = tf.multiply(a, 3, name="A_mul")

with tf.name_scope("Scope_B"):
    c = tf.add(4, 5, name="B_add")
    d = tf.multiply(c, 6, name="B_mul")

e = tf.add(b, d, name="output")
writer = tf.summary.FileWriter('./name_scope', graph=tf.get_default_graph())
writer.close()
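As before, the result can be inspected in TensorBoard, where each name scope is drawn as a single collapsible node:

tensorboard --logdir="name_scope"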

A Complete Example

import tensorflow as tf

# Define a new Graph
graph = tf.Graph()

with graph.as_default():

    with tf.name_scope("variables"):
        # Variable to keep track of how many times the graph has been run
        global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name="global_step")

        # Variable that keeps track of the sum of all output values over time:
        total_output = tf.Variable(0.0, dtype=tf.float32, trainable=False, name="total_output")

    # Primary transformation Operations
    with tf.name_scope("transformation"):
        # Separate input layer
        with tf.name_scope("input"):
            # Create input placeholder- takes in a Vector
            a = tf.placeholder(tf.float32, shape=[None], name="input_placeholder_a")

        # Separate middle layer
        with tf.name_scope("intermediate_layer"):
            b = tf.reduce_prod(a, name="product_b")
            c = tf.reduce_sum(a, name="sum_c")

        # Separate output layer
        with tf.name_scope("output"):
            output = tf.add(b, c, name="output")

    with tf.name_scope("update"):
        # Increments the total_output Variable by the latest output
        update_total = total_output.assign_add(output)

        # Increments the above `global_step` Variable, should be run whenever the graph is run
        increment_step = global_step.assign_add(1)

    # Summary Operations
    with tf.name_scope("summaries"):
        avg = tf.div(update_total, tf.cast(increment_step, tf.float32), name="average")

        # Creates summaries for output node
        tf.summary.scalar('output', output)
        tf.summary.scalar('sum_of_outputs_over_time', update_total)
        tf.summary.scalar('average_of_outputs_over_time', avg)

    # Global Variables and Operations
    with tf.name_scope("global_ops"):
        # Initialization Op
        init = tf.global_variables_initializer()
        # Merge all summaries into one Operation
        merged_summaries = tf.summary.merge_all()

# Start a Session, using the explicitly created Graph
sess = tf.Session(graph=graph)

# Open a SummaryWriter to save summaries
writer = tf.summary.FileWriter('./improved_graph', graph)

# Initialize Variables
sess.run(init)


def run_graph(input_tensor):
    """
    Helper function; runs the graph with given input tensor and saves summaries
    """
    feed_dict = {a: input_tensor}
    _, step, summary = sess.run([output, increment_step, merged_summaries],
                                  feed_dict=feed_dict)
    writer.add_summary(summary, global_step=step)

# run graph with some inputs
run_graph([2,8])
run_graph([3,1,3,3])
run_graph([8])
run_graph([1,2,3])
run_graph([11,4])
run_graph([4,1])
run_graph([7,3,1])
run_graph([6,3])
run_graph([0,2])
run_graph([4,5,6])

# flush summaries to disk
writer.flush()

# close writer and session
writer.close()
sess.close()
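The saved graph and summaries can then be viewed in TensorBoard:

tensorboard --logdir="improved_graph"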

TensorBoard graph:

TensorBoard events:

A General Framework
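Most TensorFlow training programs share the skeleton below; the function bodies are stubs to be filled in for a concrete model: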

import tensorflow as tf

# initialize variables/model parameters

# define the training loop operations

def inference(X):
    # compute inference model over data X and return the result
    pass

def loss(X, Y):
    # compute loss over training data X and expected outputs Y
    pass

def inputs():
    # read/generate input training data X and expected outputs Y
    pass

def train(total_loss):
    # train / adjust model parameters according to computed total loss
    pass

def evaluate(sess, X, Y):
    # evaluate the resulting trained model
    pass

# Create a saver.
saver = tf.train.Saver()

# Launch the graph in a session, setup boilerplate
with tf.Session() as sess:
    tf.global_variables_initializer().run()

    X, Y = inputs()

    total_loss = loss(X, Y)
    train_op = train(total_loss)

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    # actual training loop
    training_steps = 1000
    for step in range(training_steps):
        sess.run([train_op])
        # for debugging and learning purposes, see how the loss gets decremented
        # through training steps
        if step % 10 == 0:
            print("loss:", sess.run([total_loss]))
        # save training checkpoints in case we lose them
        if step % 1000 == 0:
            saver.save(sess, 'my-model', global_step=step)

    evaluate(sess, X, Y)

    coord.request_stop()
    coord.join(threads)
    saver.save(sess, 'my-model', global_step=training_steps)

If training is interrupted, it can be resumed from the last saved checkpoint:

import os

with tf.Session() as sess:
    # model setup....

    initial_step = 0

    # verify if we don't have a checkpoint saved already
    ckpt = tf.train.get_checkpoint_state(os.path.dirname(__file__))
    if ckpt and ckpt.model_checkpoint_path:
        # Restores from checkpoint
        saver.restore(sess, ckpt.model_checkpoint_path)
        initial_step = int(ckpt.model_checkpoint_path.rsplit('-', 1)[1])

    # actual training loop
    for step in range(initial_step, training_steps):
        # run each train step
        sess.run([train_op])