[Hands-On AI Large Model Application Development] Backpropagation Step by Step


Backpropagation

If you understand gradient descent for finding the minimum of a cost function, then backpropagation is going to be a cakewalk. Backpropagation is just gradient descent, but we have to rely on the chain rule of derivatives because of the way layers are arranged in a neural network. That's it. We will also walk through a Python example of training a network to recognize digits, to get a feel for the whole thing.

A simple example

Let's start with a simple example: a network with no hidden layers, just an input layer and an output layer. This is meant to show how the errors of the cost function get propagated back from one layer to the previous layer. Then we will code an actual network with a hidden layer in between.

So suppose we have a set of MNIST images of digits. These are 28x28 pixel black-and-white images of all 10 digits, and we want to classify them accordingly. Of course we also have the labels for those images. The first question is how we are going to use these images as input to a network. The obvious way is to treat each image as an array of pixels rather than as a matrix, so an image of 28x28 pixels will be treated as an array of 784 pixel values instead. This will be our input layer. The output layer will be of size 10, since we have 10 possible labels from 0 to 9. When the network is trained and we input, say, an image of the digit zero, we would like the first neuron's value to be as close as possible to 1 and the rest to be as close as possible to 0. The image below shows this. Of course we are using only 10 boxes to represent the 784 neurons in the input layer and only 5 boxes to represent the 10 neurons in the output layer; I did that to keep the image manageable :)

Figure 1: A one-layer neural net
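As a minimal preprocessing sketch, assuming NumPy and using a random array as a stand-in for a real MNIST image (the names `image`, `label`, `x` and `y` are illustrative, not from the original article):

```python
import numpy as np

# Stand-in for one MNIST sample: a 28x28 array of pixel
# intensities in [0, 255] plus its digit label (here: 0).
image = np.random.randint(0, 256, size=(28, 28))
label = 0

# Flatten the 28x28 matrix into a 784-element input vector,
# scaling pixel values from [0, 255] down to [0, 1].
x = image.reshape(784) / 255.0

# One-hot encode the label: the output neuron for the correct
# digit should be 1, all the others 0.
y = np.zeros(10)
y[label] = 1.0

print(x.shape)  # (784,)
print(y)        # [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
```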

Now let's train this network.

We will initialize the weights with random values in the [-1, 1] interval.
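In code, that initialization could look like the following sketch (assuming NumPy; the layout of one row of 784 weights per output neuron is a choice made here, and the biases are added for completeness):

```python
import numpy as np

# One weight per (output neuron, input pixel) pair, drawn
# uniformly from [-1, 1]: 10 rows of 784 weights each.
W = np.random.uniform(-1, 1, size=(10, 784))

# One bias per output neuron, initialized the same way.
b = np.random.uniform(-1, 1, size=10)
```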

We will use MSE as a cost function. Cross entropy actually works better for classification tasks, as we've seen in the logistic regression section, but MSE is fine as well.

We know from the linear regression section that the MSE function looks like this:

Figure 2: Mean Squared Error cost function
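The equation image itself did not survive, but a standard way of writing it, consistent with the notation used below, is (the 1/2 factor is a common convention that cancels when differentiating; the original figure may differ in constants):

```latex
C(w, b) = \frac{1}{2n} \sum_{x} \big\lVert\, y(x) - y'(x) \,\big\rVert^{2}
```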

Where y is the desired output (the label) and y′ is the actual output of the network. We sum over all the data points in the training set, and that gives us an error number for the current values of the weights. We would like to minimize this error using gradient descent, i.e. to somehow change the weight values such that the error gets smaller and smaller over time. In our toy network we have just one set of weights, the ones between the input and the output layer, and these are the ones we would like to change. Now let's input an actual image of a zero while the weights are still randomly initialized. Of course we do not expect much, and the output values won't mean anything. What we'll do is compute the cost function just for this example: we will subtract the desired value of the output neurons from the predicted value. Image below:

Figure 3: Subtract the label value from the predicted value of each output neuron
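A sketch of that per-example computation (only the first activation, 0.3, comes from the text below; the other outputs of the randomly initialized network are made up for illustration):

```python
import numpy as np

# Hypothetical outputs of the randomly initialized network for an
# image of the digit zero; only the 0.3 is taken from the text.
y_pred = np.array([0.3, 0.8, 0.1, 0.5, 0.9, 0.2, 0.7, 0.4, 0.6, 0.1])

# Desired output for the digit zero: first neuron 1, the rest 0.
y = np.zeros(10)
y[0] = 1.0

# Per-neuron error (predicted minus desired) and the squared-error
# cost for this single example.
error = y_pred - y
cost = 0.5 * np.sum(error ** 2)
print(error[0])  # -0.7: the first neuron outputs 0.3 but should be 1
```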

Let's look at the first neuron in the output layer. We want it to be 1, and in this case it is 0.3. The value of this neuron is given by a formula we've seen in the logistic regression section:

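The original image is gone; reconstructing the formula from the logistic regression section it points to (the indexing notation here is an assumption), the neuron's value is the sigmoid of the weighted sum of its inputs:

```latex
y'_j = \sigma\Big( \sum_{i=1}^{784} w_{ji}\, x_i + b_j \Big),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```

Here x_i are the 784 input pixel values, w_{ji} is the weight from input i to output neuron j, and b_j is that neuron's bias.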
