Chapter 7: The Chain Rule of Derivatives
In this chapter, we will explore the chain rule of derivatives, an essential concept in differential calculus that plays a crucial role in deep learning. The chain rule enables us to find the derivative of a composition of functions, allowing us to efficiently compute gradients in neural networks.
The chain rule is a fundamental principle in calculus used to find the derivative of a composite function. It states that if we have a function $f(g(x))$, the derivative of this composite function is given by:
$$
\frac{d}{dx} \left[ f(g(x)) \right] = f'(g(x)) \cdot g'(x)
$$
Where:

- $f'(g(x))$ is the derivative of the outer function $f$, evaluated at the inner function $g(x)$.
- $g'(x)$ is the derivative of the inner function $g$ with respect to $x$.
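Equivalently, writing $u = g(x)$ and $y = f(u)$, the rule takes its familiar Leibniz form, in which the intermediate derivative factors appear to "cancel":

$$
\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}
$$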
Let’s illustrate the chain rule with a simple example. Consider the following functions:
$$
f(u) = u^2 \quad \text{and} \quad g(x) = 3x + 1
$$
We want to find $\frac{d}{dx} \left[ f(g(x)) \right]$.

First, differentiate each piece: $f'(u) = 2u$ and $g'(x) = 3$. Substituting into the chain rule formula:

$$
\frac{d}{dx} \left[ f(g(x)) \right] = f'(g(x)) \cdot g'(x) = 2(3x + 1) \cdot 3 = 18x + 6
$$

So, $\frac{d}{dx} \left[ f(g(x)) \right] = 18x + 6$.
You can check this result using Python libraries such as SymPy, which applies the chain rule automatically during symbolic differentiation:

```python
import sympy as sp

# Define the variable and build the composite function f(g(x)) = (3x + 1)^2
x = sp.symbols('x')
u = 3*x + 1      # inner function g(x)
f = u**2         # outer function f(u) = u^2, composed with g(x)

# Differentiate; SymPy applies the chain rule internally.
# expand() rewrites 6*(3*x + 1) as 18*x + 6 to match the hand computation.
derivative_fg = sp.expand(sp.diff(f, x))
print("The derivative of f(g(x)) is:", derivative_fg)
```

Running this prints `18*x + 6`, matching the result we derived by hand.

In deep learning, the chain rule is used extensively for calculating gradients during the training of neural networks. Specifically, when a network stacks multiple layers, activation functions, and a loss function, the chain rule lets us compute gradients efficiently through backpropagation. It enables us to update the network's weights and biases, allowing the model to learn from data and make better predictions.
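To make this concrete, here is a minimal sketch of backpropagation through a single sigmoid neuron with a squared-error loss. The setup is illustrative, not taken from any particular framework: the data point `(x, y)`, the parameters `w` and `b`, and the learning rate `lr` are all assumed values chosen for the example.

```python
import math

# A single neuron: prediction y_hat = sigmoid(w*x + b),
# loss L = (y_hat - y)^2. Backpropagation applies the chain
# rule link by link: dL/dw = dL/dy_hat * dy_hat/dz * dz/dw.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative data point and initial parameters (assumed values)
x, y = 2.0, 1.0
w, b, lr = 0.5, 0.0, 0.1

for step in range(3):
    # Forward pass: compute each intermediate value
    z = w * x + b          # inner function of w and b
    y_hat = sigmoid(z)     # outer function applied to z
    loss = (y_hat - y) ** 2

    # Backward pass: multiply the local derivatives (chain rule)
    dloss_dyhat = 2 * (y_hat - y)      # dL/dy_hat
    dyhat_dz = y_hat * (1 - y_hat)     # sigmoid'(z)
    dz_dw, dz_db = x, 1.0              # dz/dw and dz/db

    dloss_dw = dloss_dyhat * dyhat_dz * dz_dw
    dloss_db = dloss_dyhat * dyhat_dz * dz_db

    # Gradient-descent update on the parameters
    w -= lr * dloss_dw
    b -= lr * dloss_db
    print(f"step {step}: loss = {loss:.4f}")
```

Each gradient here is a product of local derivatives, which is exactly the structure of the chain rule formula above; deep learning frameworks automate this same bookkeeping across millions of parameters.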
Understanding the chain rule is crucial for building and training deep learning models effectively. It forms the foundation for gradient-based optimization algorithms like gradient descent, which are at the core of many deep learning applications.