Coursera_DeepLearning: Logistic Regression as a Neural Network

Reposted from maintain_, 2017/11/10

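Logistic Regression as a Neural Network

We will build a logistic regression classifier that recognizes cats.

In building this classifier we will learn to:

Build the general architecture of a learning algorithm, including:
- initializing the parameters
- computing the cost function and its gradient
- using an optimization algorithm (gradient descent)
Gather the three functions above into a main model function, in the right order.

Packages to import:

numpy: the fundamental package for scientific computing in Python
h5py: a package for reading datasets stored in HDF5 (.h5) files
matplotlib: a plotting library for Python
PIL: the de facto standard image-processing library for Python
scipy: a collection of toolboxes for common scientific computing problems; used together with PIL at the end to test the model on your own picture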

import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset

%matplotlib inline

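Problem overview

Problem statement: we are given a dataset ("data.h5") containing:

A training set of m_train images, each labeled "cat" (y=1) or "non-cat" (y=0)
A test set of m_test images, labeled the same way
Each image has shape (num_px, num_px, 3), where 3 is the number of color channels (RGB). Each image is therefore square, with height num_px and width num_px.

We will build a simple image classifier that can correctly tell whether a picture shows a cat.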

# Load the data (cat/non-cat)
train_set_x_orig,train_set_y,test_set_x_orig,test_set_y,classes = load_dataset()

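The x arrays carry the suffix "_orig" because we are going to preprocess them; the processed versions will be called train_set_x and test_set_x. (The label arrays train_set_y and test_set_y need no preprocessing.)

Each entry of train_set_x_orig and test_set_x_orig is an array representing one image. You can view an example by running the following code: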

# Show one of the images
index = 25
plt.imshow(train_set_x_orig[index])
print("y="+str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") +  "' picture.")

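Many bugs in deep learning come from matrix or vector dimensions that do not match. Keeping the dimensions straight eliminates a large share of these errors.

Exercise: find the values of:

m_train (number of training examples)
m_test (number of test examples)
num_px (height and width of a training image)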

m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]

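For convenience, we reshape each image of shape (num_px, num_px, 3) into an array of shape (num_px*num_px*3, 1). After this, the training (test) set is a NumPy array in which each column is one flattened image, with m_train (m_test) columns.

One way to flatten a matrix X of shape (a,b,c,d) into a matrix X_flatten of shape (b * c * d, a):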

X_flatten = X.reshape(X.shape[0],-1).T

train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0],-1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0],-1).T
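To see what this reshape does, here is a small sketch on a made-up array. The shapes (4 images of 2x2 pixels) are illustrative assumptions, not the real dataset.

```python
import numpy as np

# Hypothetical toy batch: 4 "images" of 2x2 pixels with 3 colour channels.
X = np.arange(4 * 2 * 2 * 3).reshape(4, 2, 2, 3)

# Flatten each image into one column: (4, 2, 2, 3) -> (12, 4).
X_flatten = X.reshape(X.shape[0], -1).T

print(X_flatten.shape)   # (12, 4)
# Column i holds the pixels of image i, in their original order.
print(np.array_equal(X_flatten[:, 1], X[1].ravel()))   # True
```

Note that X.reshape(-1, X.shape[0]) would produce the same shape but would mix pixels from different images in each column, which is why the reshape is followed by a transpose.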

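In a color image every pixel has red, green and blue components, so a pixel value is really a vector of three numbers between 0 and 255.

Standardize the data by dividing every value by 255, the maximum value of a pixel channel: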

train_set_x = train_set_x_flatten/255
test_set_x = test_set_x_flatten/255
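A quick check of what the division does, on a made-up 8-bit array (the values are assumptions chosen for illustration):

```python
import numpy as np

# Hypothetical 8-bit pixel values, as stored in a typical RGB image.
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Dividing by 255 promotes to float and maps the range [0, 255] to [0, 1].
scaled = pixels / 255

print(scaled.dtype)                 # float64
print(scaled.min(), scaled.max())   # 0.0 1.0
```

Keeping all features in [0, 1] tends to make gradient descent better behaved than working with raw values up to 255.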

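What to remember from the above:

The common preprocessing steps for a new dataset are:

Work out the dimensions and shapes of the problem (m_train, m_test, num_px)

Reshape each example into a vector of shape (num_px * num_px * 3, 1)

Standardize the data

General structure of the learning algorithm

It is time to design an algorithm that tells cat pictures from non-cat pictures.

Logistic regression is in fact a very simple neural network: a single unit that computes a weighted sum of the input pixels plus a bias and passes the result through a sigmoid. (The original post illustrated this with a figure.)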

Mathematical expression of the algorithm:

For one example $x^{(i)}$:

$z^{(i)} = w^T x^{(i)} + b \tag{1}$

$\hat{y}^{(i)} = a^{(i)} = sigmoid(z^{(i)})\tag{2}$

$\mathcal{L}(a^{(i)}, y^{(i)}) = - y^{(i)} \log(a^{(i)}) - (1-y^{(i)} ) \log(1-a^{(i)})\tag{3}$

The cost is the average of the losses over all m training examples:

$J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)})$
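The equations above can be traced step by step on a tiny example. This is a sketch with made-up numbers (2 features, 3 examples), not data from the cat dataset.

```python
import numpy as np

# Made-up values: 2 features, 3 examples stored as columns.
w = np.array([[0.5], [-0.25]])
b = 0.1
X = np.array([[1.0, 2.0, -1.0],
              [3.0, 0.0, 2.0]])
Y = np.array([[1, 0, 1]])

Z = w.T @ X + b                                   # equation (1), shape (1, 3)
A = 1.0 / (1.0 + np.exp(-Z))                      # equation (2)
L = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))    # equation (3), one loss per example
J = L.mean()                                      # average loss over the m examples

print(Z.shape, A.shape, L.shape)
print(float(J))
```

Every intermediate result has shape (1, m), one entry per example, and only J is a scalar.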

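Key steps:

Initialize the model parameters

Learn the parameters by minimizing the cost

Use the learned parameters to make predictions (on the test set)

Analyze the results and draw conclusions

Building the parts of the algorithm:

The main steps in building a neural network are:

Define the model structure (such as the number of input features)
Initialize the model's parameters
Loop:
- compute the current loss (forward propagation)
- compute the current gradient (backward propagation)
- update the parameters (gradient descent)

We will build these three steps separately and then combine them in a single function called model().

Helper function: as the model description above shows, we need a sigmoid() function.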

$sigmoid( w^T x + b) = \frac{1}{1 + e^{-(w^T x + b)}}$

def sigmoid(z):
    s = 1./(1+np.exp(-z))
    return s
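For very negative z, np.exp(-z) in the function above overflows and NumPy emits a RuntimeWarning, even though the result still rounds to 0. The following is a sketch of a variant that avoids the overflow; the name stable_sigmoid is our own and does not appear in the original post.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that never exponentiates a positive number (hypothetical helper)."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))            # always in [0, 1], so no overflow
    # For z >= 0: 1 / (1 + e^-z); for z < 0: e^z / (1 + e^z).
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

# Extreme inputs give 0.0, 0.5 and 1.0 without warnings.
print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))
```

For the pixel data in this post, which is scaled to [0, 1] and starts from zero weights, the plain version is normally sufficient.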

Initializing the parameters: w is initialized as a vector of zeros, and b is initialized to 0.

def initialize_with_zeros(dim):
    w = np.zeros((dim,1))
    b = 0
    return w,b

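Forward and backward propagation

Implement a function propagate() that computes the cost function and its gradient:

Forward propagation:

Get X
Compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, ..., a^{(m)})$
Compute the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})\right]$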

Backward propagation uses these two formulas:

$\frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T$

$\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})$
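Both gradient formulas follow from the chain rule. A short derivation for a single example, dropping the index $(i)$, writing $a = \sigma(z)$ and using the standard identity $\sigma'(z) = a(1-a)$:

```latex
\begin{aligned}
\frac{\partial \mathcal{L}}{\partial a} &= -\frac{y}{a} + \frac{1-y}{1-a}, \\
\frac{\partial \mathcal{L}}{\partial z} &= \frac{\partial \mathcal{L}}{\partial a}\, a(1-a)
  = -y(1-a) + (1-y)\,a = a - y, \\
\frac{\partial \mathcal{L}}{\partial w} &= (a - y)\, x, \qquad
\frac{\partial \mathcal{L}}{\partial b} = a - y.
\end{aligned}
```

Averaging these per-example gradients over the m examples, with the examples stacked as columns of X, gives the two matrix formulas for $\partial J / \partial w$ and $\partial J / \partial b$.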

def propagate(w,b,X,Y):
    m = X.shape[1]                     # number of examples
    # Forward propagation: activations and cost
    A = sigmoid(w.T.dot(X)+b)
    cost = -(np.sum(np.log(A)*Y + np.log(1-A)*(1-Y)))/m

    # Backward propagation: dw = X (A-Y)^T / m, so transpose (A-Y), not the product
    dw = X.dot((A-Y).T)/m
    db = np.sum(A-Y)/m

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw":dw,
             "db":db}
    return grads,cost
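A common way to catch mistakes in gradient code is to compare the analytic gradients with finite differences. The sketch below is self-contained and uses made-up data; cost_and_grads is a hypothetical stand-in that follows the same formulas as propagate().

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grads(w, b, X, Y):
    """Cost J and its gradients, following the formulas above."""
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = X @ (A - Y).T / m
    db = np.sum(A - Y) / m
    return cost, dw, db

# Made-up data: 2 features, 3 examples.
w = np.array([[1.0], [2.0]])
b = 2.0
X = np.array([[1.0, 2.0, -1.0], [3.0, 4.0, -3.2]])
Y = np.array([[1, 0, 1]])

cost, dw, db = cost_and_grads(w, b, X, Y)

# Central finite differences: (J(theta + eps) - J(theta - eps)) / (2 * eps).
eps = 1e-6
dw_num = np.zeros_like(w)
for j in range(w.shape[0]):
    step = np.zeros_like(w)
    step[j, 0] = eps
    dw_num[j, 0] = (cost_and_grads(w + step, b, X, Y)[0]
                    - cost_and_grads(w - step, b, X, Y)[0]) / (2 * eps)
db_num = (cost_and_grads(w, b + eps, X, Y)[0]
          - cost_and_grads(w, b - eps, X, Y)[0]) / (2 * eps)

print(np.allclose(dw, dw_num, atol=1e-6), np.isclose(db, db_num, atol=1e-6))
```

If the transpose in dw is misplaced, the matrix product fails with a shape error before the comparison is even reached, which makes this kind of check a cheap safeguard.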

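Optimizing the parameters:

We have initialized the parameters
We can compute the cost function and its gradient
Now we use the gradient to update the parameters with gradient descent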
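The update rule itself is easiest to see on a one-dimensional problem. This sketch minimizes a toy quadratic; the function, starting point and learning rate are illustrative choices, unrelated to the cat classifier.

```python
# Minimise the toy function f(theta) = (theta - 3)^2 with gradient descent.
def grad(theta):
    return 2.0 * (theta - 3.0)            # derivative of (theta - 3)^2

theta = 0.0
alpha = 0.1                               # learning rate
for _ in range(100):
    theta = theta - alpha * grad(theta)   # theta = theta - alpha * dtheta

print(round(theta, 4))                    # 3.0
```

Each step moves theta against the gradient, so it approaches the minimizer at 3. The logistic regression code applies the same rule to w and b.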

def optimize(w,b,X,Y,num_iterations,learning_rate,print_cost = False):
    """
    Learn w and b by minimizing the cost function J with gradient descent.
    For a parameter θ the update rule is θ = θ - α dθ, where α is the learning rate.

    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1,number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps
    """
    costs = []
    for i in range(num_iterations):
        # Compute the cost and gradients (forward and backward propagation)
        grads,cost = propagate(w,b,X,Y)

        # Retrieve the derivatives from grads
        dw = grads["dw"]
        db = grads["db"]

        # Update rule
        w = w - learning_rate*dw
        b = b - learning_rate*db

        # Record the cost every 100 iterations
        if i % 100 == 0:
            costs.append(cost)

        # Print the cost every 100 iterations
        if print_cost and i %100 == 0:
            print("Cost after iteration %i: %f" % (i,cost))

    params = {"w":w,
              "b":b}
    grads = {"dw":dw,
             "db":db}

    return params,grads,costs

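The function above outputs the learned w and b, which we can use to predict the labels of a dataset X. Implementing predict() takes two steps:

Compute $\hat{Y} = A = \sigma(w^T X + b)$
Convert each activation to 0 if it is <= 0.5 and to 1 if it is > 0.5, and store the results in Y_prediction. This can be done with an if/else statement inside a for loop, although the step can also be vectorized.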

def predict(w,b,X):
    m = X.shape[1]
    Y_prediction = np.zeros((1,m))
    w = w.reshape(X.shape[0],1)

    A = sigmoid(w.T.dot(X)+b)

    for i in range(A.shape[1]):
        if(A[0,i] <= 0.5):
            Y_prediction[0,i] = 0
        else:
            Y_prediction[0,i] = 1

    assert(Y_prediction.shape == (1,m))

    return Y_prediction
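The thresholding loop can also be written without a Python loop. A minimal sketch, assuming the same shapes as predict(); the name predict_vectorized and the sample numbers are ours.

```python
import numpy as np

def predict_vectorized(w, b, X):
    """Same rule as predict(): 1 where the activation exceeds 0.5, else 0."""
    w = w.reshape(X.shape[0], 1)
    A = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))   # activations, shape (1, m)
    return (A > 0.5).astype(float)             # boolean mask -> 0.0 / 1.0

# Made-up parameters and inputs for a quick check.
w = np.array([[0.1124579], [0.23106775]])
b = -0.3
X = np.array([[1.0, -1.1, -3.2], [1.2, 2.0, 0.1]])
print(predict_vectorized(w, b, X))   # [[1. 1. 0.]]
```

The comparison A > 0.5 is evaluated for all m examples at once, which matters once m is in the thousands.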

Merge all the functions above into a model

def model(X_train,Y_train,X_test,Y_test,num_iterations=2000,learning_rate=0.5,print_cost=False):
    # Initialize the parameters with zeros
    w,b = initialize_with_zeros(X_train.shape[0])
    # Gradient descent (pass print_cost through instead of hard-coding False)
    parameters,grads,costs = optimize(w,b,X_train,Y_train,num_iterations,learning_rate,print_cost=print_cost)
    # Retrieve w and b from the dictionary parameters
    w = parameters["w"]
    b = parameters["b"]
    # Predict on the test and training examples
    Y_prediction_test = predict(w,b,X_test)
    Y_prediction_train = predict(w,b,X_train)
    # Print the accuracies
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : w, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
    return d

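After running the model, training accuracy is close to 100%, which shows the model has enough capacity to fit the training data. Test accuracy is about 68%, which is not bad for such a simple model (it is only logistic regression, trained on a small dataset).

The gap between the two also shows that the model is overfitting the training data. Next time we will use regularization to reduce this overfitting.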
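The accuracy figures come from the expression 100 - mean(|prediction - label|) * 100 used in model(). A small sketch with made-up predictions shows how it works:

```python
import numpy as np

# Made-up predictions and labels for 4 examples, both of shape (1, m).
Y_prediction = np.array([[1.0, 0.0, 1.0, 1.0]])
Y_true = np.array([[1, 0, 0, 1]])

# |prediction - label| is 1 for each mistake and 0 otherwise,
# so its mean is the error rate.
error_rate = np.mean(np.abs(Y_prediction - Y_true))
accuracy = 100 - error_rate * 100

print(accuracy)   # 75.0
```

Computing this number separately on the training set and the test set is what exposes the gap between the two.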

With the dictionary d returned by model(), we can look at the prediction for an example from the test set:

index = 1
plt.imshow(test_set_x[:,index].reshape((num_px,num_px,3)))
print("y = " + str(test_set_y[0,index]) + ", you predicted that it is a \"" + classes[int(d["Y_prediction_test"][0,index])].decode("utf-8") +  "\" picture.")

Plot the learning curve (the cost over the iterations)

costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate = "+str(d["learning_rate"]))
plt.show()

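The cost decreases, which shows that the parameters are being learned. We could, however, fit the training set even more closely: try increasing the number of iterations, and you may find that training accuracy keeps rising while test accuracy falls. This is called overfitting.

Further analysis

Choosing the learning rate α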

learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1500, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations (hundreds)')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()
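Why the learning rate matters can be seen on a toy problem. The sketch below runs gradient descent on f(theta) = theta^2 with three hypothetical learning rates; these values are chosen for illustration and are not the ones compared above.

```python
# Gradient descent on f(theta) = theta^2 with three hypothetical learning rates.
def run(alpha, steps=20, theta=1.0):
    for _ in range(steps):
        theta = theta - alpha * 2.0 * theta    # gradient of theta^2 is 2*theta
    return theta

for alpha in (0.01, 0.4, 1.1):
    # Distance from the minimum at 0 after 20 steps
    print(alpha, abs(run(alpha)))
```

With the smallest rate the iterate has barely moved toward 0 after 20 steps, with the middle one it has essentially converged, and with the largest one every step overshoots and the distance grows. The cost curves for the cat classifier can be read the same way.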

Predicting with your own image

my_image = "my_image.jpg"   # change this to the name of your image file 

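# Preprocess the image to fit the algorithm.
# scipy.ndimage.imread and scipy.misc.imresize were removed from recent SciPy releases, so PIL is used instead.
fname = "images/" + my_image
image = np.array(Image.open(fname).convert("RGB"))
# Resize, flatten to a column, and scale to [0, 1] like the training data
my_image = np.array(Image.fromarray(image).resize((num_px,num_px))).reshape((1, num_px*num_px*3)).T / 255.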
my_predicted_image = predict(d["w"], d["b"], my_image)

plt.imshow(image)
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \"" + classes[int(np.squeeze(my_predicted_image)),].decode("utf-8") +  "\" picture.")

Output : y = 0.0, your algorithm predicts a “non-cat” picture.
