Feedforward Neural Networks
This page explains various ways of implementing single-layer and multi-layer neural networks as supplementary material for this lecture. The implementations are presented in order from explicit to abstract, so that one can understand the internal processing that deep learning frameworks usually hide as a black box.
In order to focus on the internals, this page uses a simple and classic example: threshold logic units. Interpreting $x=0$ as false and $x=1$ as true, single-layer neural networks can realize logic units such as AND ($\wedge$), OR ($\vee$), NOT ($\lnot$), and NAND ($|$). Multi-layer neural networks can realize compound logical functions such as XOR, which is not linearly separable.
$x_1$ | $x_2$ | AND | OR | NAND | XOR |
---|---|---|---|---|---|
0 | 0 | 0 | 0 | 1 | 0 |
0 | 1 | 0 | 1 | 1 | 1 |
1 | 0 | 0 | 1 | 1 | 1 |
1 | 1 | 1 | 1 | 0 | 0 |
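Each of AND, OR, and NAND is linearly separable, but XOR is not; a single-layer network therefore cannot realize it. However, XOR can be written as a composition of linearly separable gates, which is exactly what a hidden layer provides,
\[x_1 \oplus x_2 = (x_1 \vee x_2) \wedge (x_1 \,|\, x_2)\]For example, $x_1 = x_2 = 1$ yields $(1 \vee 1) \wedge (1 \,|\, 1) = 1 \wedge 0 = 0$.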
Single-layer perceptron
We consider a single-layer perceptron that predicts a binary label $\hat{y} \in \{0, 1\}$ for a given input vector $\boldsymbol{x} \in \mathbb{R}^d$ ($d$ denotes the number of input dimensions) by using the following formula,
\[\hat{y} = g(\boldsymbol{w} \cdot \boldsymbol{x} + b) = g(w_1 x_1 + w_2 x_2 + ... + w_d x_d + b)\]Here, $\boldsymbol{w} \in \mathbb{R}^d$ is a weight vector; $b \in \mathbb{R}$ is a bias weight; and $g(\cdot)$ denotes the Heaviside step function (we assume $g(0) = 0$), which yields $1$ for positive arguments and $0$ otherwise.
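To make the formula concrete, here is a minimal sketch (not part of the lecture code) that evaluates a hand-picked solution $\boldsymbol{w} = (-1, -1)$, $b = 1.5$, which realizes the NAND gate:
import numpy as np
# All four inputs of a two-input gate.
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
w = np.array([-1.0, -1.0])  # Hand-picked weights.
b = 1.5                     # Hand-picked bias weight.
print(np.heaviside(x @ w + b, 0))  # [1. 1. 1. 0.], i.e., NAND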
Let’s train a NAND gate with two inputs ($d = 2$). More specifically, we want to find a weight vector $\boldsymbol{w}$ and a bias weight $b$ of a single-layer perceptron that realizes the truth table of the NAND gate: $\{0,1\}^2 \to \{0,1\}$.
We convert the truth table into a training set consisting of all four input-output pairs of the NAND gate,
\[\boldsymbol{x}_1 = (0, 0), y_1 = 1 \\ \boldsymbol{x}_2 = (0, 1), y_2 = 1 \\ \boldsymbol{x}_3 = (1, 0), y_3 = 1 \\ \boldsymbol{x}_4 = (1, 1), y_4 = 0 \\\]In order to train the weight vector and the bias weight with a single piece of code, we fold the bias term into the inputs as an additional dimension. More concretely, we append a constant $1$ to every input,
\[\boldsymbol{x}'_1 = (0, 0, 1), y_1 = 1 \\ \boldsymbol{x}'_2 = (0, 1, 1), y_2 = 1 \\ \boldsymbol{x}'_3 = (1, 0, 1), y_3 = 1 \\ \boldsymbol{x}'_4 = (1, 1, 1), y_4 = 0 \\\]Then, the formula of the single-layer perceptron becomes,
\[\hat{y} = g((w_1, w_2, w_3) \cdot \boldsymbol{x}') = g(w_1 x_1 + w_2 x_2 + w_3)\]In other words, $w_1$ and $w_2$ represent the weights for $x_1$ and $x_2$, respectively, and $w_3$ serves as the bias weight.
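In numpy, this conversion is a one-liner; a small sketch (the variable names are hypothetical):
import numpy as np
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
x_prime = np.hstack([x, np.ones((len(x), 1))])  # Append a constant 1 to each input.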
The code below implements Rosenblatt’s perceptron algorithm, making a fixed number of passes (100) over the training set with a constant learning rate $\eta = 0.5$ for simplicity.
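For each training instance $(\boldsymbol{x}'_i, y_i)$, the algorithm predicts $\hat{y}_i = g(\boldsymbol{w} \cdot \boldsymbol{x}'_i)$ and applies the update,
\[\boldsymbol{w} \leftarrow \boldsymbol{w} + \eta (y_i - \hat{y}_i) \boldsymbol{x}'_i\]In other words, the weight vector changes only when the prediction is incorrect.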
import numpy as np
# Training data for NAND.
x = np.array([
[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]
])
y = np.array([1, 1, 1, 0])
w = np.array([0.0, 0.0, 0.0])
eta = 0.5
for t in range(100):
for i in range(len(y)):
y_pred = np.heaviside(np.dot(x[i], w), 0)
w += (y[i] - y_pred) * eta * x[i]
w
array([-1. , -0.5,  1.5])
np.heaviside(np.dot(x, w), 0)
array([1., 1., 1., 0.])
Single-layer perceptron with mini-batch
It is better to reduce the amount of computation executed directly by the Python interpreter, which is relatively slow. The common technique for speeding up machine-learning code written in Python is to execute computations within a matrix library (e.g., numpy).
The single-layer perceptron makes predictions for four inputs,
\[\hat{y}_1 = g(\boldsymbol{x}_1 \cdot \boldsymbol{w}) \\ \hat{y}_2 = g(\boldsymbol{x}_2 \cdot \boldsymbol{w}) \\ \hat{y}_3 = g(\boldsymbol{x}_3 \cdot \boldsymbol{w}) \\ \hat{y}_4 = g(\boldsymbol{x}_4 \cdot \boldsymbol{w}) \\\]Here, we define $\hat{Y} \in \mathbb{R}^{4 \times 1}$ and $X \in \mathbb{R}^{4 \times d}$ as,
\[\hat{Y} = \begin{pmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \hat{y}_3 \\ \hat{y}_4 \\ \end{pmatrix}, X = \begin{pmatrix} \boldsymbol{x}_1 \\ \boldsymbol{x}_2 \\ \boldsymbol{x}_3 \\ \boldsymbol{x}_4 \\ \end{pmatrix}\]Then, we can write the four predictions in one dot-product computation, \(\hat{Y} = X \cdot \boldsymbol{w}\)
The code below implements this idea. The function np.heaviside() applies the step function to every element of its argument, yielding a vector that contains the four predictions.
This technique is frequently used in mini-batch training, where the gradients for a small number (e.g., 4 to 128) of instances are computed at once.
import numpy as np
# Training data for NAND.
x = np.array([
[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]
])
y = np.array([1, 1, 1, 0])
w = np.array([0.0, 0.0, 0.0])
eta = 0.5
for t in range(100):
y_pred = np.heaviside(np.dot(x, w), 0)
    w += eta * np.dot((y - y_pred), x)
w
array([-0.5, -0.5,  1. ])
np.heaviside(np.dot(x, w), 0)
array([1., 1., 1., 0.])
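The example above processes the whole training set as a single batch. To illustrate mini-batches in the narrow sense, the sketch below (not part of the original code; the batch size of 2 is an arbitrary choice) slices the training set and updates the weights once per slice:
import numpy as np
# Training data for NAND.
x = np.array([
    [0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]
])
y = np.array([1, 1, 1, 0])
w = np.array([0.0, 0.0, 0.0])
eta = 0.5
batch_size = 2
for t in range(100):
    for i in range(0, len(y), batch_size):
        xb, yb = x[i:i+batch_size], y[i:i+batch_size]  # Extract one mini-batch.
        y_pred = np.heaviside(np.dot(xb, w), 0)  # Predictions for the mini-batch.
        w += eta * np.dot(yb - y_pred, xb)  # One update per mini-batch.
np.heaviside(np.dot(x, w), 0)  # -> array([1., 1., 1., 0.])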
Stochastic gradient descent (SGD) with mini-batch
Next, we consider a single-layer feedforward neural network with a sigmoid activation function. In essence, we replace the Heaviside step function with the sigmoid function $\sigma(\cdot)$ when predicting $\hat{Y}$, and use the gradient of the loss function for (stochastic) gradient descent when updating $\boldsymbol{w}$.
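Here the loss function is the negative log-likelihood of the training data,
\[L(\boldsymbol{w}) = -\sum_{i} \left( y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right), \quad \hat{y}_i = \sigma(\boldsymbol{x}'_i \cdot \boldsymbol{w})\]Because $\sigma'(z) = \sigma(z) (1 - \sigma(z))$, its gradient simplifies to
\[\frac{\partial L}{\partial \boldsymbol{w}} = \sum_{i} (\hat{y}_i - y_i) \, \boldsymbol{x}'_i\]which is exactly the update applied in the code below (i.e., gradient descent with a learning rate of $1$).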
import numpy as np
def sigmoid(v):
return 1.0 / (1 + np.exp(-v))
# Training data for NAND.
x = np.array([
[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]
])
y = np.array([1, 1, 1, 0])
w = np.array([0.0, 0.0, 0.0])
for t in range(100):
y_pred = sigmoid(np.dot(x, w))
    w -= np.dot((y_pred - y), x)  # Gradient of the negative log-likelihood.
w
array([-5.59504346, -5.59504346, 8.57206068])
sigmoid(np.dot(x, w))
array([0.99981071, 0.95152498, 0.95152498, 0.06798725])
Automatic differentiation
Deep learning frameworks compute the gradients of a loss value with respect to parameters automatically by recording the operations that produced the value. To compare the frameworks, each example below computes the single-instance loss $l = -\log \sigma(\boldsymbol{x} \cdot \boldsymbol{w})$ and its gradient $\partial l / \partial \boldsymbol{w}$ at $\boldsymbol{x} = (1, 1, 1)$ and $\boldsymbol{w} = (1.0, 1.0, -1.5)$.
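The result is easy to verify by hand: with $\boldsymbol{x} \cdot \boldsymbol{w} = 0.5$,
\[\frac{\partial l}{\partial \boldsymbol{w}} = \left( \sigma(\boldsymbol{x} \cdot \boldsymbol{w}) - 1 \right) \boldsymbol{x} = \left( \sigma(0.5) - 1 \right) \boldsymbol{x} \approx -0.3775 \, (1, 1, 1)\]which matches the gradients reported by every framework below.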
autograd
import autograd
import autograd.numpy as np
def loss(w, x):
return -np.log(1.0 / (1 + np.exp(-np.dot(x, w))))
x = np.array([1, 1, 1])
w = np.array([1.0, 1.0, -1.5])
grad_loss = autograd.grad(loss)
print(loss(w, x))
print(grad_loss(w, x))
0.47407698418010663
[-0.37754067 -0.37754067 -0.37754067]
PyTorch
import torch
dtype = torch.float
x = torch.tensor([1, 1, 1], dtype=dtype)
w = torch.tensor([1.0, 1.0, -1.5], dtype=dtype, requires_grad=True)
loss = -torch.dot(x, w).sigmoid().log()
loss.backward()
print(loss.item())
print(w.grad)
0.4740769565105438
tensor([-0.3775, -0.3775, -0.3775])
Chainer
import numpy as np
from chainer import Variable
import chainer.functions as F
dtype = np.float32
x = np.array([1, 1, 1], dtype=dtype)
w = Variable(np.array([1.0, 1.0, -1.5], dtype=dtype), requires_grad=True)
loss = -F.log(F.sigmoid(np.dot(x, w)))
loss.backward()
print(loss.data)
print(w.grad)
0.47407696
[-0.37754062 -0.37754062 -0.37754062]
TensorFlow
import tensorflow as tf
x = tf.constant([1., 1., 1.])
w = tf.Variable([1.0, 1.0, -1.5])
loss = -tf.log(tf.sigmoid(tf.tensordot(x, w, axes=1)))
grad = tf.gradients(loss, w)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
loss_value = sess.run(loss)
grad_value = sess.run(grad)
print(loss_value)
print(grad_value)
0.47407696
[array([-0.37754062, -0.37754062, -0.37754062], dtype=float32)]
MXNet
import mxnet as mx
from mxnet import nd, autograd, gluon
x = nd.array([1., 1., 1.])
w = nd.array([1.0, 1.0, -1.5])
w.attach_grad()
with autograd.record():
loss = -nd.dot(x, w).sigmoid().log()
loss.backward()
print(loss)
print(w.grad)
[0.47407696]
<NDArray 1 @cpu(0)>
[-0.37754065 -0.37754065 -0.37754065]
<NDArray 3 @cpu(0)>
Single-layer neural network using automatic differentiation
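All implementations below express the per-instance likelihood as the variable ll $= y \hat{y} + (1 - y)(1 - \hat{y})$ and minimize the negative log-likelihood of the training data,
\[L = -\sum_{i} \log \left( y_i \hat{y}_i + (1 - y_i)(1 - \hat{y}_i) \right)\]which coincides with the binary cross-entropy loss because $y_i \in \{0, 1\}$. As before, the inputs carry an appended constant $1$ so that the bias weight is included in the weight vector.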
PyTorch
import torch
dtype = torch.float
# Training data for NAND.
x = torch.tensor([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=dtype)
y = torch.tensor([[1], [1], [1], [0]], dtype=dtype)
w = torch.randn(3, 1, dtype=dtype, requires_grad=True)
eta = 0.5
for t in range(100):
# y_pred = \sigma(x \cdot w)
y_pred = x.mm(w).sigmoid()
ll = y * y_pred + (1 - y) * (1 - y_pred)
loss = -ll.log().sum() # The loss value.
#print(t, loss.item())
loss.backward() # Compute the gradients of the loss.
with torch.no_grad():
w -= eta * w.grad # Update weights using SGD.
w.grad.zero_() # Clear the gradients for the next iteration.
w
tensor([[-4.2327],
[-4.2320],
[ 6.5405]])
x.mm(w).sigmoid()
tensor([[ 0.9986],
[ 0.9096],
[ 0.9095],
[ 0.1274]])
Chainer
import numpy as np
import chainer
from chainer import Variable
import chainer.functions as F
dtype = np.float32
# Training data for NAND
x = Variable(np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=dtype))
y = Variable(np.array([[1], [1], [1], [0]], dtype=dtype))
w = Variable(np.random.rand(3, 1).astype(dtype=dtype), requires_grad=True)
eta = 0.5
for t in range(100):
# y_pred = \sigma(x \cdot w)
y_pred = F.sigmoid(F.matmul(x, w))
ll = y * y_pred + (1 - y) * (1 - y_pred)
loss = -F.sum(F.log(ll)) # The loss value.
#print(t, loss)
loss.backward() # Compute the gradients of the loss.
with chainer.no_backprop_mode():
w -= eta * w.grad # Update weights using SGD.
w.cleargrad() # Clear the gradients for the next iteration.
w
variable([[-4.238654 ],
[-4.2391624],
[ 6.550305 ]])
F.sigmoid(F.matmul(x, w))
variable([[0.99857235],
[0.90979564],
[0.90983737],
[0.12702626]])
TensorFlow
import tensorflow as tf
# Training data for NAND
x_data = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
y_data = [[1], [1], [1], [0]]
x = tf.placeholder(tf.float32, [4, 3])
y = tf.placeholder(tf.float32, [4, 1])
w = tf.Variable(tf.random_normal([3,1]))
# y_pred = \sigma(x \cdot w)
y_pred = tf.sigmoid(tf.matmul(x, w))
ll = y * y_pred + (1 - y) * (1 - y_pred)
loss = -tf.reduce_sum(tf.log(ll))
grad = tf.gradients(loss, w)
eta = 0.5
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for t in range(100):
grads = sess.run(grad, feed_dict={x: x_data, y: y_data})
sess.run(w.assign_sub(eta * grads[0]))
print(sess.run(w))
print(sess.run(y_pred, feed_dict={x: x_data, y: y_data}))
[[-4.203982 ]
[-4.2044005]
[ 6.4987044]]
[[0.9984969 ]
[0.90840423]
[0.90843904]
[0.12901704]]
MXNet
import mxnet as mx
from mxnet import nd, autograd
# Training data for NAND.
x = nd.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = nd.array([[1], [1], [1], [0]])
w = nd.random.normal(0, 1, shape=(3, 1))
w.attach_grad()
eta = 0.5
for t in range(100):
with autograd.record():
# y_pred = \sigma(x \cdot w).
y_pred = nd.dot(x, w).sigmoid()
ll = y * y_pred + (1 - y) * (1 - y_pred)
loss = -ll.log().sum() # The loss value.
#print(t, loss)
loss.backward() # Compute the gradients of the loss.
w -= eta * w.grad # Update weights using SGD.
w
[[-4.2020216]
[-4.20314 ]
[ 6.4963117]]
<NDArray 3x1 @cpu(0)>
nd.dot(x, w).sigmoid()
[[0.9984933 ]
[0.90831 ]
[0.90840304]
[0.12911019]]
<NDArray 4x1 @cpu(0)>
Multi-layer neural network using automatic differentiation
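The following implementations train a network with one hidden layer,
\[\hat{Y} = \sigma\left( \sigma(X W_1) \, W_2 + b_2 \right)\]with $W_1 \in \mathbb{R}^{3 \times 2}$, $W_2 \in \mathbb{R}^{2 \times 1}$, and a scalar bias $b_2$, using the same negative log-likelihood loss as before. Note that this loss is no longer convex in the parameters, so training may converge to a poor local optimum depending on the random initialization; several of the runs below show this, with predictions stuck near 0.5 for some inputs.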
PyTorch
import torch
dtype = torch.float
# Training data for XOR.
x = torch.tensor([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=dtype)
y = torch.tensor([[0], [1], [1], [0]], dtype=dtype)
w1 = torch.randn(3, 2, dtype=dtype, requires_grad=True)
w2 = torch.randn(2, 1, dtype=dtype, requires_grad=True)
b2 = torch.randn(1, 1, dtype=dtype, requires_grad=True)
eta = 0.5
for t in range(1000):
# y_pred = \sigma(w_2 \cdot \sigma(x \cdot w_1) + b_2)
y_pred = x.mm(w1).sigmoid().mm(w2).add(b2).sigmoid()
ll = y * y_pred + (1 - y) * (1 - y_pred)
loss = -ll.log().sum()
#print(t, loss.item())
loss.backward()
with torch.no_grad():
# Update weights using SGD.
w1 -= eta * w1.grad
w2 -= eta * w2.grad
b2 -= eta * b2.grad
# Clear the gradients for the next iteration.
w1.grad.zero_()
w2.grad.zero_()
b2.grad.zero_()
print(w1)
print(w2)
print(b2)
tensor([[ 5.1994, 7.0126],
[ 5.2027, 7.0301],
[-7.9606, -3.1728]])
tensor([[-12.1454],
[ 11.3713]])
tensor([[-5.2753]])
x.mm(w1).sigmoid().mm(w2).add(b2).sigmoid()
tensor([[ 0.0080],
[ 0.9942],
[ 0.9941],
[ 0.0062]])
Chainer
import numpy as np
import chainer
from chainer import Variable
import chainer.functions as F
dtype = np.float32
# Training data for XOR.
x = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=dtype)
y = np.array([[0], [1], [1], [0]], dtype=dtype)
w1 = Variable(np.random.randn(3, 2).astype(dtype), requires_grad=True)
w2 = Variable(np.random.randn(2, 1).astype(dtype), requires_grad=True)
b2 = Variable(np.random.randn(1).astype(dtype), requires_grad=True)
eta = 0.5
for t in range(1000):
# y_pred = \sigma(w_2 \cdot \sigma(x \cdot w_1) + b_2)
y_pred = F.sigmoid(F.bias(F.matmul(F.sigmoid(F.matmul(x, w1)), w2), b2))
ll = y * y_pred + (1 - y) * (1 - y_pred)
loss = -F.sum(F.log(ll))
#print(t, loss.data)
loss.backward()
with chainer.no_backprop_mode():
# Update weights using SGD.
w1 -= eta * w1.grad
w2 -= eta * w2.grad
b2 -= eta * b2.grad
# Clear the gradients for the next iteration.
w1.cleargrad()
w2.cleargrad()
b2.cleargrad()
print(w1)
print(w2)
print(b2)
variable([[-6.898038 -6.5185432]
[ 7.0828695 6.2416263]
[ 3.471656 -3.3139799]])
variable([[-11.242552]
[ 11.863684]])
variable([5.270227])
F.sigmoid(F.bias(F.matmul(F.sigmoid(F.matmul(x, w1)), w2), b2))
variable([[0.00539303],
[0.9949782 ],
[0.9927317 ],
[0.00462809]])
TensorFlow
import tensorflow as tf
# Training data for XOR.
x_data = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
y_data = [[0], [1], [1], [0]]
x = tf.placeholder(tf.float32, [4, 3])
y = tf.placeholder(tf.float32, [4, 1])
w1 = tf.Variable(tf.random_normal([3, 2]))
w2 = tf.Variable(tf.random_normal([2, 1]))
b2 = tf.Variable(tf.random_normal([1, 1]))
y_pred = tf.sigmoid(tf.add(tf.matmul(tf.sigmoid(tf.matmul(x, w1)), w2), b2))
ll = y * y_pred + (1 - y) * (1 - y_pred)
log = tf.log(ll)
loss = -tf.reduce_sum(log)
grad = tf.gradients(loss, [w1, w2, b2])
eta = 0.5
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for t in range(1000):
w1_grad, w2_grad, b2_grad = sess.run(grad, feed_dict={x: x_data, y: y_data})
sess.run(tf.assign_sub(w1, eta * w1_grad))
sess.run(tf.assign_sub(w2, eta * w2_grad))
sess.run(tf.assign_sub(b2, eta * b2_grad))
print(sess.run(w1))
print(sess.run(w2))
print(sess.run(b2))
print(sess.run(y_pred, feed_dict={x: x_data, y: y_data}))
[[ 8.545845 8.527924 ]
[ 4.8393583 -4.2591324]
[-1.4335978 2.7701557]]
[[ 7.452827 ]
[-7.1250057]]
[[-0.32773858]]
[[0.00369266]
[0.9962198 ]
[0.49852544]
[0.5015694 ]]
MXNet
import mxnet as mx
from mxnet import nd, autograd
# Training data for XOR.
x = nd.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = nd.array([[0], [1], [1], [0]])
w1 = nd.random.normal(0, 1, shape=(3, 2))
w2 = nd.random.normal(0, 1, shape=(2, 1))
b2 = nd.random.normal(0, 1, shape=(1, 1))
w1.attach_grad()
w2.attach_grad()
b2.attach_grad()
eta = 0.5
for t in range(1000):
with autograd.record():
# y_pred = \sigma(w_2 \cdot \sigma(x \cdot w_1) + b_2)
y_pred = (nd.dot(nd.dot(x, w1).sigmoid(), w2) + b2).sigmoid()
ll = y * y_pred + (1 - y) * (1 - y_pred)
loss = -ll.log().sum()
loss.backward()
# Update weights using SGD.
w1 -= eta * w1.grad
w2 -= eta * w2.grad
b2 -= eta * b2.grad
print(w1)
print(w2)
print(b2)
[[ 4.5155373 4.041809 ]
[ -7.8481655 7.4811954]
[ 5.9294176 -10.316905 ]]
<NDArray 3x2 @cpu(0)>
[[-5.775163 ]
[-7.2728887]]
<NDArray 2x1 @cpu(0)>
[[5.775924]]
<NDArray 1x1 @cpu(0)>
(nd.dot(nd.dot(x, w1).sigmoid(), w2) + b2).sigmoid()
[[0.50396055]
[0.99037385]
[0.4968157 ]
[0.00550799]]
<NDArray 4x1 @cpu(0)>
Single-layer neural network with high-level NN modules
From here on, we replace the explicit weight matrices with high-level modules for linear layers, together with a built-in loss function that combines the sigmoid function and the binary cross-entropy loss. Because the modules manage their own bias terms, the inputs are now the raw two-dimensional vectors without the appended constant $1$.
PyTorch
import torch
dtype = torch.float
# Training data for NAND.
x = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=dtype)
y = torch.tensor([[1], [1], [1], [0]], dtype=dtype)
# Define a neural network using high-level modules.
model = torch.nn.Sequential(
torch.nn.Linear(2, 1, bias=True), # 2 dims (with bias) -> 1 dim
)
# Binary cross-entropy loss after sigmoid function.
loss_fn = torch.nn.BCEWithLogitsLoss(size_average=False)
eta = 0.5
for t in range(100):
y_pred = model(x) # Make predictions.
loss = loss_fn(y_pred, y) # Compute the loss.
#print(t, loss.item())
model.zero_grad() # Zero-clear the gradients.
loss.backward() # Compute the gradients.
with torch.no_grad():
for param in model.parameters():
param -= eta * param.grad # Update the parameters using SGD.
model.state_dict()
OrderedDict([('0.weight', tensor([[-4.3067, -4.3060]])),
('0.bias', tensor([ 6.6506]))])
model(x).sigmoid()
tensor([[ 0.9987],
[ 0.9125],
[ 0.9124],
[ 0.1232]])
Chainer
import chainer
import numpy as np
from chainer import Variable, Function
import chainer.functions as F
import chainer.links as L
dtype = np.float32
# Training data for NAND
x = Variable(np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=dtype))
y = Variable(np.array([[1], [1], [1], [0]], dtype=np.int32))
# Define a neural network using high-level modules.
model = chainer.Sequential(
L.Linear(2, 1, nobias=False) # 2 dims (with bias) -> 1 dim
)
# Binary cross-entropy loss after sigmoid function.
loss_fn = F.sigmoid_cross_entropy
eta = 0.5
for t in range(100):
y_pred = model(x) # Make predictions.
loss = loss_fn(y_pred, y, normalize=False)
# print(t, loss.data)
model.cleargrads() # Zero-clear the gradients.
loss.backward() # Compute the gradients.
with chainer.no_backprop_mode():
for para in model.params():
para.data -= eta * para.grad # Update the parameters using SGD.
for para in model.params():
print(para)
variable b([3.2144895])
variable W([[-1.9686245 -1.9545643]])
F.sigmoid(model(x))
variable([[0.96137595],
[0.7790133 ],
[0.77658325],
[0.32988632]])
Multi-layer neural network with high-level NN modules
PyTorch
import torch
dtype = torch.float
# Training data for XOR.
x = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=dtype)
y = torch.tensor([[0], [1], [1], [0]], dtype=dtype)
# Define a neural network using high-level modules.
model = torch.nn.Sequential(
torch.nn.Linear(2, 2, bias=True), # 2 dims (with bias) -> 2 dims
torch.nn.Sigmoid(), # Sigmoid function
torch.nn.Linear(2, 1, bias=True), # 2 dims (with bias) -> 1 dim
)
# Binary cross-entropy loss after sigmoid function.
loss_fn = torch.nn.BCEWithLogitsLoss(size_average=False)
eta = 0.5
for t in range(1000):
y_pred = model(x) # Make predictions.
loss = loss_fn(y_pred, y) # Compute the loss.
#print(t, loss.item())
model.zero_grad() # Zero-clear the gradients.
loss.backward() # Compute the gradients.
with torch.no_grad():
for param in model.parameters():
param -= eta * param.grad # Update the parameters using SGD.
model.state_dict()
OrderedDict([('0.weight', tensor([[ 7.0281, 7.0367],
[ 5.1955, 5.1971]])),
('0.bias', tensor([-3.1767, -7.9526])),
('2.weight', tensor([[ 11.4025, -12.1782]])),
('2.bias', tensor([-5.2898]))])
model(x).sigmoid()
tensor([[ 0.0079],
[ 0.9942],
[ 0.9942],
[ 0.0061]])
Chainer
import chainer
import numpy as np
from chainer import Variable, Function
import chainer.functions as F
import chainer.links as L
dtype = np.float32
# Training data for XOR.
x = chainer.Variable(np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=dtype))
y = chainer.Variable(np.array([[0], [1], [1], [0]], dtype=np.int32))
# Define a neural network using high-level modules.
init = chainer.initializers.HeNormal()
model = chainer.Sequential(
L.Linear(2, 2, nobias=False, initialW=init), # 2 dims (with bias) -> 2 dims
F.sigmoid, # Sigmoid function
L.Linear(2, 1, nobias=False, initialW=init), # 2 dims (with bias) -> 1 dim
)
# Binary cross-entropy loss after sigmoid function.
loss_fn = F.sigmoid_cross_entropy
eta = 0.5
for t in range(1000):
y_pred = model(x) # Make predictions.
loss = loss_fn(y_pred, y, normalize=False)
# print(t, loss.data)
model.cleargrads() # Zero-clear the gradients.
loss.backward() # Compute the gradients.
with chainer.no_backprop_mode():
for para in model.params():
para.data -= eta * para.grad # Update the parameters using SGD.
for para in model.params():
print(para)
variable W([[-5.0357113 4.694917 ]
[ 5.884984 -6.006705 ]])
variable b([-2.5778637 -3.3951974])
variable W([[7.5983677 7.613726 ]])
variable b([-3.705806])
F.sigmoid(model(x))
variable([[0.05105227],
[0.95592314],
[0.9653981 ],
[0.04323387]])
Single-layer neural network with an optimizer
Instead of updating the parameters manually, the implementations below delegate the update rule to an optimizer object provided by each framework.
PyTorch
import torch
dtype = torch.float
# Training data for NAND.
x = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=dtype)
y = torch.tensor([[1], [1], [1], [0]], dtype=dtype)
# Define a neural network using high-level modules.
model = torch.nn.Sequential(
torch.nn.Linear(2, 1, bias=True), # 2 dims (with bias) -> 1 dim
)
# Binary cross-entropy loss after sigmoid function.
loss_fn = torch.nn.BCEWithLogitsLoss(size_average=False)
# Optimizer based on SGD (change "SGD" to "Adam" to use Adam)
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
for t in range(100):
y_pred = model(x) # Make predictions.
loss = loss_fn(y_pred, y) # Compute the loss.
#print(t, loss.item())
optimizer.zero_grad() # Zero-clear gradients.
loss.backward() # Compute the gradients.
optimizer.step() # Update the parameters using the gradients.
model.state_dict()
OrderedDict([('0.weight', tensor([[-4.2726, -4.2721]])),
('0.bias', tensor([ 6.6000]))])
model(x).sigmoid()
tensor([[ 0.9986],
[ 0.9112],
[ 0.9111],
[ 0.1251]])
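As the comment in the code suggests, trying a different optimization algorithm only requires changing the construction of the optimizer object; for example (the learning rate of 0.01 is an arbitrary choice, not from the lecture):
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)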
Chainer
import chainer
import numpy as np
from chainer import functions as F
from chainer import links as L
chainer.config.train = True
dtype = np.float32
# Training data for NAND.
x = chainer.Variable(np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=dtype))
y = chainer.Variable(np.array([[1], [1], [1], [0]], dtype=np.int32))
# Define a neural network using high-level modules.
model = chainer.Sequential(
L.Linear(2, 1, nobias=False), # 2 dims (with bias) -> 1 dim
)
# Binary cross-entropy loss after sigmoid function.
loss_fn = F.sigmoid_cross_entropy
# Optimizer based on SGD (change "SGD" to "Adam" to use Adam)
optimizer = chainer.optimizers.SGD(lr=0.5)
optimizer.setup(model)
for t in range(100):
y_pred = model(x) # Make predictions.
loss = loss_fn(y_pred, y, normalize=False) # Compute the loss.
#print(t, loss.data)
model.cleargrads() # Zero-clear gradients.
loss.backward() # Compute the gradients.
optimizer.update() # Update the parameters using the gradients.
for para in model.params():
print(para)
variable b([3.4636898])
variable W([[-2.134361 -2.1379907]])
F.sigmoid(model(x))
variable([[0.9696368 ],
[0.79012835],
[0.7907295 ],
[0.30817568]])
TensorFlow
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Activation
from tensorflow.keras import optimizers
# Training data for NAND.
x_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_data = np.array([[1], [1], [1], [0]])
# Define a neural network using high-level modules.
model = Sequential([
Flatten(),
Dense(1, activation='sigmoid')
])
model.compile(
optimizer=optimizers.SGD(lr=0.5),
loss='binary_crossentropy',
metrics=['accuracy']
)
model.fit(x_data, y_data, epochs=100)
Epoch 1/100
4/4 [==============================] - 0s 68ms/step - loss: 0.8059 - acc: 0.2500
Epoch 2/100
4/4 [==============================] - 0s 471us/step - loss: 0.7065 - acc: 0.5000
Epoch 3/100
4/4 [==============================] - 0s 3ms/step - loss: 0.6347 - acc: 0.5000
Epoch 4/100
4/4 [==============================] - 0s 718us/step - loss: 0.5838 - acc: 0.7500
Epoch 5/100
4/4 [==============================] - 0s 805us/step - loss: 0.5477 - acc: 0.7500
Epoch 6/100
4/4 [==============================] - 0s 1ms/step - loss: 0.5217 - acc: 0.7500
Epoch 7/100
4/4 [==============================] - 0s 3ms/step - loss: 0.5024 - acc: 1.0000
Epoch 8/100
4/4 [==============================] - 0s 426us/step - loss: 0.4875 - acc: 1.0000
Epoch 9/100
4/4 [==============================] - 0s 437us/step - loss: 0.4756 - acc: 1.0000
Epoch 10/100
4/4 [==============================] - 0s 1ms/step - loss: 0.4657 - acc: 1.0000
Epoch 11/100
4/4 [==============================] - 0s 499us/step - loss: 0.4572 - acc: 1.0000
Epoch 12/100
4/4 [==============================] - 0s 2ms/step - loss: 0.4496 - acc: 1.0000
Epoch 13/100
4/4 [==============================] - 0s 1ms/step - loss: 0.4426 - acc: 1.0000
Epoch 14/100
4/4 [==============================] - 0s 2ms/step - loss: 0.4362 - acc: 0.7500
Epoch 15/100
4/4 [==============================] - 0s 648us/step - loss: 0.4301 - acc: 0.7500
Epoch 16/100
4/4 [==============================] - 0s 960us/step - loss: 0.4244 - acc: 0.7500
Epoch 17/100
4/4 [==============================] - 0s 422us/step - loss: 0.4188 - acc: 0.7500
Epoch 18/100
4/4 [==============================] - 0s 538us/step - loss: 0.4135 - acc: 0.7500
Epoch 19/100
4/4 [==============================] - 0s 560us/step - loss: 0.4084 - acc: 1.0000
Epoch 20/100
4/4 [==============================] - 0s 524us/step - loss: 0.4034 - acc: 1.0000
Epoch 21/100
4/4 [==============================] - 0s 1ms/step - loss: 0.3986 - acc: 1.0000
Epoch 22/100
4/4 [==============================] - 0s 713us/step - loss: 0.3939 - acc: 1.0000
Epoch 23/100
4/4 [==============================] - 0s 630us/step - loss: 0.3894 - acc: 1.0000
Epoch 24/100
4/4 [==============================] - 0s 519us/step - loss: 0.3849 - acc: 1.0000
Epoch 25/100
4/4 [==============================] - 0s 523us/step - loss: 0.3806 - acc: 1.0000
Epoch 26/100
4/4 [==============================] - 0s 779us/step - loss: 0.3764 - acc: 1.0000
Epoch 27/100
4/4 [==============================] - 0s 661us/step - loss: 0.3723 - acc: 1.0000
Epoch 28/100
4/4 [==============================] - 0s 481us/step - loss: 0.3683 - acc: 1.0000
Epoch 29/100
4/4 [==============================] - 0s 2ms/step - loss: 0.3644 - acc: 1.0000
Epoch 30/100
4/4 [==============================] - 0s 1ms/step - loss: 0.3606 - acc: 1.0000
Epoch 31/100
4/4 [==============================] - 0s 684us/step - loss: 0.3569 - acc: 1.0000
Epoch 32/100
4/4 [==============================] - 0s 660us/step - loss: 0.3533 - acc: 1.0000
Epoch 33/100
4/4 [==============================] - 0s 419us/step - loss: 0.3497 - acc: 1.0000
Epoch 34/100
4/4 [==============================] - 0s 519us/step - loss: 0.3463 - acc: 1.0000
Epoch 35/100
4/4 [==============================] - 0s 2ms/step - loss: 0.3429 - acc: 1.0000
Epoch 36/100
4/4 [==============================] - 0s 1ms/step - loss: 0.3396 - acc: 1.0000
Epoch 37/100
4/4 [==============================] - 0s 584us/step - loss: 0.3363 - acc: 1.0000
Epoch 38/100
4/4 [==============================] - 0s 503us/step - loss: 0.3331 - acc: 1.0000
Epoch 39/100
4/4 [==============================] - 0s 382us/step - loss: 0.3300 - acc: 1.0000
Epoch 40/100
4/4 [==============================] - 0s 2ms/step - loss: 0.3270 - acc: 1.0000
Epoch 41/100
4/4 [==============================] - 0s 1ms/step - loss: 0.3240 - acc: 1.0000
Epoch 42/100
4/4 [==============================] - 0s 6ms/step - loss: 0.3211 - acc: 1.0000
Epoch 43/100
4/4 [==============================] - 0s 799us/step - loss: 0.3182 - acc: 1.0000
Epoch 44/100
4/4 [==============================] - 0s 2ms/step - loss: 0.3154 - acc: 1.0000
Epoch 45/100
4/4 [==============================] - 0s 1ms/step - loss: 0.3127 - acc: 1.0000
Epoch 46/100
4/4 [==============================] - 0s 814us/step - loss: 0.3100 - acc: 1.0000
Epoch 47/100
4/4 [==============================] - 0s 547us/step - loss: 0.3073 - acc: 1.0000
Epoch 48/100
4/4 [==============================] - 0s 663us/step - loss: 0.3047 - acc: 1.0000
Epoch 49/100
4/4 [==============================] - 0s 578us/step - loss: 0.3022 - acc: 1.0000
Epoch 50/100
4/4 [==============================] - 0s 508us/step - loss: 0.2997 - acc: 1.0000
Epoch 51/100
4/4 [==============================] - 0s 373us/step - loss: 0.2972 - acc: 1.0000
Epoch 52/100
4/4 [==============================] - 0s 1ms/step - loss: 0.2948 - acc: 1.0000
Epoch 53/100
4/4 [==============================] - 0s 812us/step - loss: 0.2925 - acc: 1.0000
Epoch 54/100
4/4 [==============================] - 0s 530us/step - loss: 0.2902 - acc: 1.0000
Epoch 55/100
4/4 [==============================] - 0s 504us/step - loss: 0.2879 - acc: 1.0000
Epoch 56/100
4/4 [==============================] - 0s 521us/step - loss: 0.2856 - acc: 1.0000
Epoch 57/100
4/4 [==============================] - 0s 680us/step - loss: 0.2834 - acc: 1.0000
Epoch 58/100
4/4 [==============================] - 0s 745us/step - loss: 0.2813 - acc: 1.0000
Epoch 59/100
4/4 [==============================] - 0s 583us/step - loss: 0.2791 - acc: 1.0000
Epoch 60/100
4/4 [==============================] - 0s 2ms/step - loss: 0.2770 - acc: 1.0000
Epoch 61/100
4/4 [==============================] - 0s 954us/step - loss: 0.2750 - acc: 1.0000
Epoch 62/100
4/4 [==============================] - 0s 536us/step - loss: 0.2729 - acc: 1.0000
Epoch 63/100
4/4 [==============================] - 0s 352us/step - loss: 0.2709 - acc: 1.0000
Epoch 64/100
4/4 [==============================] - 0s 461us/step - loss: 0.2690 - acc: 1.0000
Epoch 65/100
4/4 [==============================] - 0s 699us/step - loss: 0.2670 - acc: 1.0000
Epoch 66/100
4/4 [==============================] - 0s 581us/step - loss: 0.2651 - acc: 1.0000
Epoch 67/100
4/4 [==============================] - 0s 409us/step - loss: 0.2633 - acc: 1.0000
Epoch 68/100
4/4 [==============================] - 0s 676us/step - loss: 0.2614 - acc: 1.0000
Epoch 69/100
4/4 [==============================] - 0s 639us/step - loss: 0.2596 - acc: 1.0000
Epoch 70/100
4/4 [==============================] - 0s 840us/step - loss: 0.2578 - acc: 1.0000
Epoch 71/100
4/4 [==============================] - 0s 970us/step - loss: 0.2561 - acc: 1.0000
Epoch 72/100
4/4 [==============================] - 0s 476us/step - loss: 0.2543 - acc: 1.0000
Epoch 73/100
4/4 [==============================] - 0s 576us/step - loss: 0.2526 - acc: 1.0000
Epoch 74/100
4/4 [==============================] - 0s 726us/step - loss: 0.2509 - acc: 1.0000
Epoch 75/100
4/4 [==============================] - 0s 496us/step - loss: 0.2493 - acc: 1.0000
Epoch 76/100
4/4 [==============================] - 0s 437us/step - loss: 0.2476 - acc: 1.0000
Epoch 77/100
4/4 [==============================] - 0s 418us/step - loss: 0.2460 - acc: 1.0000
Epoch 78/100
4/4 [==============================] - 0s 482us/step - loss: 0.2444 - acc: 1.0000
Epoch 79/100
4/4 [==============================] - 0s 568us/step - loss: 0.2428 - acc: 1.0000
Epoch 80/100
4/4 [==============================] - 0s 542us/step - loss: 0.2413 - acc: 1.0000
Epoch 81/100
4/4 [==============================] - 0s 436us/step - loss: 0.2397 - acc: 1.0000
Epoch 82/100
4/4 [==============================] - 0s 434us/step - loss: 0.2382 - acc: 1.0000
Epoch 83/100
4/4 [==============================] - 0s 2ms/step - loss: 0.2367 - acc: 1.0000
Epoch 84/100
4/4 [==============================] - 0s 1ms/step - loss: 0.2353 - acc: 1.0000
Epoch 85/100
4/4 [==============================] - 0s 789us/step - loss: 0.2338 - acc: 1.0000
Epoch 86/100
4/4 [==============================] - 0s 1ms/step - loss: 0.2324 - acc: 1.0000
Epoch 87/100
4/4 [==============================] - 0s 473us/step - loss: 0.2310 - acc: 1.0000
Epoch 88/100
4/4 [==============================] - 0s 463us/step - loss: 0.2296 - acc: 1.0000
Epoch 89/100
4/4 [==============================] - 0s 489us/step - loss: 0.2282 - acc: 1.0000
Epoch 90/100
4/4 [==============================] - 0s 1ms/step - loss: 0.2269 - acc: 1.0000
Epoch 91/100
4/4 [==============================] - 0s 627us/step - loss: 0.2255 - acc: 1.0000
Epoch 92/100
4/4 [==============================] - 0s 2ms/step - loss: 0.2242 - acc: 1.0000
Epoch 93/100
4/4 [==============================] - 0s 906us/step - loss: 0.2229 - acc: 1.0000
Epoch 94/100
4/4 [==============================] - 0s 628us/step - loss: 0.2216 - acc: 1.0000
Epoch 95/100
4/4 [==============================] - 0s 373us/step - loss: 0.2203 - acc: 1.0000
Epoch 96/100
4/4 [==============================] - 0s 585us/step - loss: 0.2190 - acc: 1.0000
Epoch 97/100
4/4 [==============================] - 0s 490us/step - loss: 0.2178 - acc: 1.0000
Epoch 98/100
4/4 [==============================] - 0s 508us/step - loss: 0.2166 - acc: 1.0000
Epoch 99/100
4/4 [==============================] - 0s 588us/step - loss: 0.2153 - acc: 1.0000
Epoch 100/100
4/4 [==============================] - 0s 718us/step - loss: 0.2141 - acc: 1.0000
<tensorflow.python.keras.callbacks.History at 0x7f9f80286da0>
model.get_weights()
[array([[-2.202066],
[-2.160888]], dtype=float32), array([3.5287492], dtype=float32)]
model.predict(x_data)
array([[0.9714948],
[0.7970343],
[0.7902915],
[0.3027567]], dtype=float32)
MXNet
import mxnet as mx
from mxnet import nd, autograd, gluon
# Training data for NAND.
x = nd.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = nd.array([[1], [1], [1], [0]])
# Define a neural network using high-level modules.
net = gluon.nn.Sequential()
with net.name_scope():
net.add(gluon.nn.Dense(1))
net.collect_params().initialize(mx.init.Normal(sigma=1.))
# Binary cross-entropy loss after sigmoid function.
loss_fn = gluon.loss.SigmoidBinaryCrossEntropyLoss()
# Optimizer based on SGD
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.5})
for t in range(100):
with autograd.record():
# Make predictions.
y_pred = net(x)
# Compute the loss.
loss = loss_fn(y_pred, y)
# Compute the gradients of the loss.
loss.backward()
# Update weights using SGD.
# the batch_size is set to one to be consistent with the slide.
trainer.step(batch_size=1)
for v in net.collect_params().values():
print(v, v.data())
Parameter sequential0_dense0_weight (shape=(1, 2), dtype=float32)
[[-4.182336 -4.1832795]]
<NDArray 1x2 @cpu(0)>
Parameter sequential0_dense0_bias (shape=(1,), dtype=float32)
[6.466928]
<NDArray 1 @cpu(0)>
net(x).sigmoid()
[[0.9984484 ]
[0.90751374]
[0.9075929 ]
[0.13025706]]
<NDArray 4x1 @cpu(0)>
Multi-layer neural network with an optimizer
PyTorch
import torch
dtype = torch.float
# Training data for XOR.
x = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=dtype)
y = torch.tensor([[0], [1], [1], [0]], dtype=dtype)
# Define a neural network using high-level modules.
model = torch.nn.Sequential(
torch.nn.Linear(2, 2, bias=True), # 2 dims (with bias) -> 2 dims
torch.nn.Sigmoid(), # Sigmoid function
torch.nn.Linear(2, 1, bias=True), # 2 dims (with bias) -> 1 dim
)
# Binary cross-entropy loss after sigmoid function.
loss_fn = torch.nn.BCEWithLogitsLoss(size_average=False)
# Optimizer based on SGD (change "SGD" to "Adam" to use Adam)
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
for t in range(1000):
y_pred = model(x) # Make predictions.
loss = loss_fn(y_pred, y) # Compute the loss.
#print(t, loss.item())
optimizer.zero_grad() # Zero-clear gradients.
loss.backward() # Compute the gradients.
optimizer.step() # Update the parameters using the gradients.
model.state_dict()
OrderedDict([('0.weight', tensor([[ 7.3702, -7.1611],
[-6.6066, 6.9133]])),
('0.bias', tensor([ 3.6234, 3.3088])),
('2.weight', tensor([[-9.1519, -9.2072]])),
('2.bias', tensor([ 13.4994]))])
model(x).sigmoid()
tensor([[ 0.0134],
[ 0.9826],
[ 0.9824],
[ 0.0118]])
Chainer
import chainer
import numpy as np
from chainer import functions as F
from chainer import links as L
dtype = np.float32
# Training data for XOR.
x = chainer.Variable(np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=dtype))
y = chainer.Variable(np.array([[0], [1], [1], [0]], dtype=np.int32))
# Define a neural network using high-level modules.
model = chainer.Sequential(
L.Linear(2, 2, nobias=False),
F.sigmoid,
L.Linear(2, 1, nobias=False),
)
# Binary cross-entropy loss after sigmoid function.
loss_fn = F.sigmoid_cross_entropy
# Optimizer based on SGD (change "SGD" to "Adam" to use Adam)
optimizer = chainer.optimizers.SGD(lr=0.5)
optimizer.setup(model)
for t in range(1000):
y_pred = model(x) # Make predictions.
loss = loss_fn(y_pred, y, normalize=False) # Compute the loss.
#print(t, loss.data)
model.cleargrads() # Zero-clear gradients.
loss.backward() # Compute the gradients.
optimizer.update() # Update the parameters using the gradients.
for param in model.params():
print(param)
variable b([-2.2869124 -5.1613417])
variable W([[5.922025 5.878332 ]
[3.4194984 3.4142656]])
variable b([-3.3487024])
variable W([[ 7.281869 -7.459597]])
F.sigmoid(model(x))
variable([[0.06181788],
[0.93281394],
[0.93301374],
[0.08725712]])
TensorFlow
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Activation
from tensorflow.keras import optimizers
# Training data for XOR.
x_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_data = np.array([[0], [1], [1], [0]])
# Define a neural network using high-level modules.
model = Sequential([
Flatten(),
Dense(2, activation='sigmoid'), # 2 dims (with bias) -> 2 dims
    Dense(1, activation='sigmoid') # 2 dims (with bias) -> 1 dim
])
model.compile(
optimizer=optimizers.SGD(lr=0.5),
loss='binary_crossentropy',
metrics=['accuracy']
)
model.fit(x_data, y_data, epochs=1000, verbose=0)
<tensorflow.python.keras.callbacks.History at 0x7fd00b4d00f0>
model.get_weights()
[array([[2.0502675, 5.6302657],
[1.0301563, 4.942775 ]], dtype=float32),
array([-1.615101 , -1.5490855], dtype=float32),
array([[-4.0722337],
[ 5.53848 ]], dtype=float32),
array([-2.3617785], dtype=float32)]
model.predict(x_data)
array([[0.11236145],
[0.8234226 ],
[0.6484979 ],
[0.4670412 ]], dtype=float32)
MXNet
import mxnet as mx
from mxnet import nd, autograd, gluon
# Training data for XOR. Note that this example keeps the appended
# constant 1 in the inputs, so the first layer receives three dimensions.
x = nd.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = nd.array([[0], [1], [1], [0]])
# Define a neural network using high-level modules.
net = gluon.nn.Sequential()
with net.name_scope():
net.add(gluon.nn.Dense(2))
net.add(gluon.nn.Activation('sigmoid'))
net.add(gluon.nn.Dense(1))
net.collect_params().initialize(mx.init.Normal(sigma=1.))
# Binary cross-entropy loss after sigmoid function.
loss_fn = gluon.loss.SigmoidBinaryCrossEntropyLoss()
# Optimizer based on SGD
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.5})
for t in range(1000):
with autograd.record():
# Make predictions.
y_pred = net(x)
# Compute the loss.
loss = loss_fn(y_pred, y)
# Compute the gradients of the loss.
loss.backward()
# Update weights using SGD.
# the batch_size is set to one to be consistent with the slide.
trainer.step(batch_size=1)
for v in net.collect_params().values():
print(v, v.data())
Parameter sequential0_dense0_weight (shape=(2, 3), dtype=float32)
[[ 8.537011 -4.4268546 1.9961522]
[ 8.598794 4.8988695 -1.3610638]]
<NDArray 2x3 @cpu(0)>
Parameter sequential0_dense0_bias (shape=(2,), dtype=float32)
[ 0.9527137 -0.12632234]
<NDArray 2 @cpu(0)>
Parameter sequential0_dense1_weight (shape=(1, 2), dtype=float32)
[[-7.1571183 7.3222814]]
<NDArray 1x2 @cpu(0)>
Parameter sequential0_dense1_bias (shape=(1,), dtype=float32)
[-0.16509056]
<NDArray 1 @cpu(0)>
net(x).sigmoid()
[[0.00362506]
[0.9962937 ]
[0.49854442]
[0.5015438 ]]
<NDArray 4x1 @cpu(0)>
Single-layer neural network with a customizable NN class
For full flexibility, we define the network as a class with a forward (or __call__) method, which is the common way to express arbitrary architectures; the high-level modules register their parameters automatically.
PyTorch
import torch
dtype = torch.float
# Training data for NAND.
x = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=dtype)
y = torch.tensor([[1], [1], [1], [0]], dtype=dtype)
# Define a neural network model.
class SingleLayerNN(torch.nn.Module):
def __init__(self, d_in, d_out):
super(SingleLayerNN, self).__init__()
self.linear1 = torch.nn.Linear(d_in, d_out, bias=True)
def forward(self, x):
return self.linear1(x)
model = SingleLayerNN(2, 1)
# Binary cross-entropy loss after sigmoid function.
loss_fn = torch.nn.BCEWithLogitsLoss(size_average=False)
# Optimizer based on SGD (change "SGD" to "Adam" to use Adam)
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
for t in range(100):
y_pred = model(x) # Make predictions.
loss = loss_fn(y_pred, y) # Compute the loss.
#print(t, loss.item())
optimizer.zero_grad() # Zero-clear gradients.
loss.backward() # Compute the gradients.
optimizer.step() # Update the parameters using the gradients.
model.state_dict()
OrderedDict([('linear1.weight', tensor([[-4.2693, -4.2689]])),
('linear1.bias', tensor([ 6.5951]))])
model(x).sigmoid()
tensor([[ 0.9986],
[ 0.9110],
[ 0.9110],
[ 0.1253]])
Chainer
import chainer
import numpy as np
from chainer import Variable, optimizers
import chainer.functions as F
import chainer.links as L
x = Variable(np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32))
y = Variable(np.array([[1], [1], [1], [0]], dtype=np.int32))
class Linear(chainer.Chain):
def __init__(self):
super().__init__()
with self.init_scope():
            self.l1 = L.Linear(2, 1)
    def __call__(self, x):
return self.l1(x)
model = Linear()
optimizer = optimizers.SGD(lr=0.5).setup(model)
for t in range(1000):
y_pred = model(x)
    loss = F.sigmoid_cross_entropy(y_pred, y)
    #print(t, loss.data)
model.cleargrads()
loss.backward()
optimizer.update()
F.sigmoid(model(x))
variable([[0.99989665],
[0.95996976],
[0.95997 ],
[0.05611408]])
Multi-layer neural network with a customizable NN class
PyTorch
import torch
dtype = torch.float
# Training data for XOR.
x = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=dtype)
y = torch.tensor([[0], [1], [1], [0]], dtype=dtype)
# Define a neural network model.
class ThreeLayerNN(torch.nn.Module):
def __init__(self, d_in, d_hidden, d_out):
super(ThreeLayerNN, self).__init__()
self.linear1 = torch.nn.Linear(d_in, d_hidden, bias=True)
self.linear2 = torch.nn.Linear(d_hidden, d_out, bias=True)
def forward(self, x):
return self.linear2(self.linear1(x).sigmoid())
model = ThreeLayerNN(2, 2, 1)
# Binary cross-entropy loss after sigmoid function.
loss_fn = torch.nn.BCEWithLogitsLoss(size_average=False)
# Optimizer based on SGD (change "SGD" to "Adam" to use Adam)
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
for t in range(1000):
y_pred = model(x) # Make predictions.
loss = loss_fn(y_pred, y) # Compute the loss.
#print(t, loss.item())
optimizer.zero_grad() # Zero-clear gradients.
loss.backward() # Compute the gradients.
optimizer.step() # Update the parameters using the gradients.
model.state_dict()
OrderedDict([('linear1.weight', tensor([[ 6.6212, -6.8110],
[ 6.7129, -6.4369]])),
('linear1.bias', tensor([-3.5404, 3.2040])),
('linear2.weight', tensor([[ 11.6606, -11.1694]])),
('linear2.bias', tensor([ 5.2589]))])
model(x).sigmoid()
tensor([[ 0.0058],
[ 0.9921],
[ 0.9947],
[ 0.0049]])
Chainer
import chainer
import numpy as np
from chainer import Variable
from chainer import functions as F
from chainer import links as L
from chainer import optimizers
x = Variable(np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32))
y = Variable(np.array([[0], [1], [1], [0]], dtype=np.int32))
class Linear(chainer.Chain):
def __init__(self):
super().__init__()
with self.init_scope():
            self.l1 = L.Linear(2, 2)
            self.l2 = L.Linear(2, 1)
    def __call__(self, x):
h = F.sigmoid(self.l1(x))
o = self.l2(h)
return o
model = Linear()
optimizer = optimizers.SGD(lr=0.5).setup(model)
for t in range(1000):
y_pred = model(x)
    loss = F.sigmoid_cross_entropy(y_pred, y)
    #print(t, loss.data)
model.cleargrads()
loss.backward()
optimizer.update()
F.sigmoid(model(x))
variable([[0.33073208],
[0.5335392 ],
[0.8444923 ],
[0.26925102]])
As noted earlier, the XOR loss is non-convex and convergence depends on the random initialization. In this run all four inputs already fall on the correct side of 0.5, but the outputs are not yet saturated after 1000 updates; training longer, or rerunning with a different initialization, typically yields predictions close to 0 and 1.