Differential Calculus - The Chain Rule

The chain rule gives us a formula for differentiating a function of a function. In other words, it enables us to differentiate a composite function, in which one function is applied to the output of another. Suppose we have two functions, ƒ(x) = cos(x) and g(x) = x². Now consider the following expression:

$$y = \cos(x^2)$$

We are applying the trigonometric function ƒ(x) = cos(x) to the function g(x) = x². First of all we are squaring x, and then we are taking the cosine of the result (x²). We can express this relationship formally as follows:

y  =  ƒ(g(x))
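
The nesting can also be seen in code. The following is a minimal Python sketch, assuming the two functions defined above; the helper name `compose` is an illustrative choice and does not come from the original text.

```python
import math


def g(x):
    """Inner function: square the input."""
    return x ** 2


def f(x):
    """Outer function: take the cosine of the input."""
    return math.cos(x)


def compose(outer, inner):
    """Return the composite function x -> outer(inner(x)) (hypothetical helper)."""
    return lambda x: outer(inner(x))


y = compose(f, g)  # y(x) = cos(x^2): g is applied first, then f

print(y(0.0))  # 1.0, because cos(0^2) = cos(0) = 1
```

The inner function is always evaluated first, which is the same ordering used later to decide which function is "inner" and which is "outer".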

It might be helpful here to think of these functions as being like Russian dolls. In this case, there are only two dolls. Function g(x) is the inner doll, and function ƒ(x) is the outer doll. We will see in due course how the chain rule can be applied to a composite function consisting of more than two functions, but for now we will concentrate on composites that involve just two.


Composite functions are nested at different levels, like Russian dolls



Although it is possible in theory to find the derivative of a composite function without using the chain rule, this is usually very difficult to achieve in practice. Let's suppose that we want to find the derivative of the function ƒ(x) = (2x - 3)⁴. You might assume that we can simply multiply out the brackets and then apply the basic rules of differentiation in the normal way. That is certainly one possibility. Here's what it looks like:

$$y = (2x - 3)^4 = (2x - 3)(2x - 3)(2x - 3)(2x - 3)$$

We start by multiplying together the two pairs of binomials:

$$(2x - 3)(2x - 3)(2x - 3)(2x - 3) = (4x^2 - 12x + 9)(4x^2 - 12x + 9)$$

Now we multiply together the resulting trinomials:

$$(4x^2 - 12x + 9)(4x^2 - 12x + 9) = 16x^4 - 96x^3 + 216x^2 - 216x + 81$$

That doesn't look too bad. We did tidy things up a bit though, rather than show every step in the process. Bear in mind also that this is a relatively trivial example. You can probably imagine how easy it is to make an error with this kind of calculation. Anyway, having multiplied out the brackets, and assuming we haven't made any mistakes, we can now apply the basic rules of differentiation to the result to find the derivative:

$$\frac{d}{dx}\left(16x^4 - 96x^3 + 216x^2 - 216x + 81\right) = 64x^3 - 288x^2 + 432x - 216$$

Let's factorise this result. We can see that all of the terms can be divided by eight (8):

$$64x^3 - 288x^2 + 432x - 216 = 8(8x^3 - 36x^2 + 54x - 27)$$

It is also possible to factorise the polynomial expression inside the brackets:

$$8x^3 - 36x^2 + 54x - 27 = (2x - 3)^3$$

So:

$$\frac{d}{dx}\left((2x - 3)^4\right) = 8(2x - 3)^3$$

For this last bit of the factorisation (i.e. factorising the polynomial expression 8x³ - 36x² + 54x - 27) we would typically use a technique such as the rational root test, because the degree of the polynomial is greater than two (2). If you are not familiar with the techniques used for factorising polynomials, the page entitled "Polynomials" in the Algebra section might be of interest. Suffice it to say that it is not a trivial exercise. By now, you have probably realised that trying to find the derivative of a composite function in the way that we have demonstrated above requires significant time and effort. We can achieve the same result much more efficiently using the chain rule.
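
Since it is easy to slip up when expanding by hand, the algebra above can be checked mechanically. The short Python sketch below evaluates each form at a range of integer points; the function names are illustrative only. Two polynomials of degree four (or three) that agree at eleven distinct points must be identical, so agreement here confirms both the expansion and the factorisation.

```python
def original(x):
    """The composite function (2x - 3)^4."""
    return (2 * x - 3) ** 4


def expanded(x):
    """The result of multiplying out the brackets."""
    return 16 * x**4 - 96 * x**3 + 216 * x**2 - 216 * x + 81


def expanded_derivative(x):
    """Term-by-term derivative of the expanded polynomial."""
    return 64 * x**3 - 288 * x**2 + 432 * x - 216


def factorised_derivative(x):
    """The factorised form of the derivative, 8(2x - 3)^3."""
    return 8 * (2 * x - 3) ** 3


points = range(-5, 6)  # eleven integer sample points, exact integer arithmetic
expansion_ok = all(original(x) == expanded(x) for x in points)
factorisation_ok = all(
    expanded_derivative(x) == factorised_derivative(x) for x in points
)

print(expansion_ok, factorisation_ok)  # True True
```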

The chain rule works on the principle of substitution. Let's go back again to the concept of the functions being nested, like Russian dolls. It would make life much easier if we could simply differentiate the "outer" function, and worry about what's inside it later. In fact, that's essentially how the chain rule works. The outer function in this case will be whatever function we would normally evaluate last. We'll use the composite function we have already differentiated (the hard way) to show how this works. For the function ƒ(x) = (2x - 3)⁴, we have:

$$y = (2x - 3)^4$$

We're going to substitute the variable u for the expression 2x - 3, so that we get:

$$y = u^4$$

That gives us a much simpler expression to deal with and we can now apply the chain rule - once we actually know the rule, that is! To understand what's going on, we need to backtrack a little. Remember that the function ƒ(x) = (2x - 3)⁴ is actually the composite of two functions. The outer function is ƒ(x) = x⁴. For argument's sake, we'll identify the inner function as g(x) = 2x - 3. The value of x passed to function ƒ will obviously not be the same value of x passed to function g. It will in fact be the output of function g, which we have, for the sake of convenience, labelled u. So, for function g we have:

$$\frac{du}{dx} = \frac{d}{dx}\,g(x)$$

And for function ƒ we have:

$$\frac{dy}{du} = \frac{d}{du}\,f(u)$$

Which is all very well, but what we actually want is:

$$\frac{dy}{dx} = \frac{d}{dx}\,f(g(x))$$

The chain rule comes to the rescue here. Putting it into words, the chain rule tells us that, in order to find the derivative of the composite of two functions, we need to multiply the derivative of the outer function by the derivative of the inner function. Expressing this algebraically, we have:

$$\frac{dy}{dx} = \frac{dy}{du} \times \frac{du}{dx}$$

Let's apply this formula to the function ƒ(x) = (2x - 3)⁴. The first thing to establish is which function is the outer function and which is the inner function. As we said before, the outer function is the function we would normally evaluate last. Since terms enclosed within brackets must always be evaluated first, we can see here that 2x - 3 is the inner function. In other words, remembering that we substitute u for the inner function, we have u = 2x - 3, and y = u⁴. This gives us:

$$\frac{du}{dx} = \frac{d}{dx}(2x - 3) = 2$$

and

$$\frac{dy}{du} = \frac{d}{du}\left(u^4\right) = 4u^3$$

Applying the chain rule, we get:

$$\frac{dy}{dx} = \frac{dy}{du} \times \frac{du}{dx} = 2(4u^3) = 8u^3$$

The last part of the exercise is simply to replace u with the original function, 2x - 3. We now have:

$$\frac{d}{dx}\left((2x - 3)^4\right) = 8(2x - 3)^3$$
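
The substitution steps above translate almost line for line into code. The Python sketch below is illustrative only: the helper `central_difference` and its step size `h` are assumptions introduced here, and the numerical estimate is an approximation used purely as a sanity check on the chain rule result.

```python
def central_difference(func, x, h=1e-6):
    """Estimate func'(x) numerically (hypothetical helper, approximate only)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def y(x):
    """The composite function y = (2x - 3)^4."""
    return (2 * x - 3) ** 4


def dy_dx(x):
    """Derivative of y via the chain rule: dy/dx = dy/du * du/dx."""
    u = 2 * x - 3        # inner function, u = 2x - 3
    dy_du = 4 * u ** 3   # derivative of the outer function u^4
    du_dx = 2            # derivative of the inner function 2x - 3
    return dy_du * du_dx


# Print the chain rule value next to the numerical estimate at a few points.
for x in (0.0, 1.0, 2.5):
    print(x, dy_dx(x), central_difference(y, x))
```

At each sample point the two values should agree to several decimal places, which is what we would expect if 8(2x - 3)³ is the correct derivative.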

You may have noticed by now that the derivative doesn't actually look all that different from the original function. Indeed, once you have grasped the idea of how the chain rule actually works, you can often write the derivative of a composite function without going through any intermediate stages. For example, suppose we want to differentiate the composite function ƒ(x) = (3x - 7)¹⁰. If you have a good understanding of the chain rule, you should be able to see that the derivative will be ten multiplied by three multiplied by the inner function to the power of nine. Expressing this formally, we have:

$$\frac{d}{dx}\left((3x - 7)^{10}\right) = 30(3x - 7)^9$$
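
As a check on this shortcut, we can let a program do the expansion for us. The Python sketch below represents a polynomial as a list of integer coefficients (lowest degree first), multiplies out (3x - 7)¹⁰, differentiates it term by term, and compares the result with 30(3x - 7)⁹. The helper names `poly_mul`, `poly_pow` and `poly_diff` are hypothetical and exist only for this illustration.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result


def poly_pow(p, n):
    """Raise a polynomial to a non-negative integer power by repeated multiplication."""
    result = [1]
    for _ in range(n):
        result = poly_mul(result, p)
    return result


def poly_diff(p):
    """Differentiate term by term: the derivative of c * x^k is k * c * x^(k - 1)."""
    return [k * c for k, c in enumerate(p)][1:]


inner = [-7, 3]  # the inner function 3x - 7

hard_way = poly_diff(poly_pow(inner, 10))          # expand, then differentiate
chain_rule = [30 * c for c in poly_pow(inner, 9)]  # 10 * 3 * (3x - 7)^9

print(hard_way == chain_rule)  # True
```

Because the coefficients are Python integers, the comparison is exact rather than approximate.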

You can probably imagine just how messy things would get if we tried to find the derivative of an expression like (3x - 7)¹⁰ by multiplying out the brackets and applying the basic rules of differentiation to the result! The chain rule makes life a lot easier. Let's look at a slightly more difficult example. Suppose we want to differentiate the following expression:

$$y = \sqrt{8x^2 - 3x + 6}$$

If we were evaluating this expression, we would first evaluate the expression under the radical (i.e. the expression for which we want to find the square root), and then take the square root of that result. So, the outer function is the square root function, and the inner function is 8x² - 3x + 6. We therefore have:

$$\frac{du}{dx} = \frac{d}{dx}\left(8x^2 - 3x + 6\right) = 16x - 3$$

and

$$\frac{dy}{du} = \frac{d}{du}\left(u^{1/2}\right) = \frac{1}{2}u^{-1/2}$$

Applying the chain rule, we get:

$$\frac{dy}{dx} = \frac{dy}{du} \times \frac{du}{dx} = (16x - 3)\left(\frac{1}{2}\right)\left(8x^2 - 3x + 6\right)^{-1/2} = \frac{16x - 3}{2\sqrt{8x^2 - 3x + 6}}$$
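
The same two-step pattern can be sketched in Python for the square root example. As before, `central_difference` and its step size are assumptions made for this illustration, and the numerical value is only an approximate cross-check. The expression under the radical is positive for every real x (its discriminant is negative), so the square root is always defined.

```python
import math


def central_difference(func, x, h=1e-6):
    """Estimate func'(x) numerically (hypothetical helper, approximate only)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def y(x):
    """The composite function y = sqrt(8x^2 - 3x + 6)."""
    return math.sqrt(8 * x ** 2 - 3 * x + 6)


def dy_dx(x):
    """Chain rule: differentiate the outer square root, then multiply by du/dx."""
    u = 8 * x ** 2 - 3 * x + 6   # inner function
    du_dx = 16 * x - 3           # derivative of the inner function
    dy_du = 0.5 * u ** -0.5      # derivative of the outer function u^(1/2)
    return dy_du * du_dx


# Compare the chain rule value with the numerical estimate at a few points.
for x in (-2.0, 0.0, 1.0):
    print(x, dy_dx(x), central_difference(y, x))
```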

Sometimes we encounter composite functions that consist of more than two functions. Suppose we have three functions, ƒ, g and h, that are related as follows:

y  =  ƒ(g(h(x)))

How do we differentiate a composite function like this? We can still use the chain rule, but we need to apply it more than once. Suppose we have the following composite function:

$$y = \left(\sin(x^2)\right)^3$$

To evaluate the function, we would first evaluate the expression inside the inner brackets. Then, we would evaluate the contents of the outer brackets. Finally, we would raise everything inside the outer brackets to the power of three. This last operation is our outer function. We will start by substituting the variable u for the expression sin(x²). We can now write:

$$y = u^3$$

Differentiating gives us:

$$\frac{dy}{du} = \frac{d}{du}\left(u^3\right) = 3u^2$$

We now need the derivative of u, but this is itself a composite function, so we'll have to make a further substitution. This time, we will substitute the variable v for the expression x². We can now write:

$$u = \sin(v)$$

Differentiating gives us:

$$\frac{du}{dv} = \frac{d}{dv}\left(\sin(v)\right) = \cos(v)$$

Note that for any variable n, the derivative of sin(n) with respect to n is always cos(n) (how we differentiate trigonometric functions, including how we arrive at this result, will be dealt with in the relevant page in this section). The next step is to differentiate the innermost function:

$$\frac{dv}{dx} = \frac{d}{dx}\left(x^2\right) = 2x$$

To differentiate a composite in which the inner function is itself a composite function, we must first apply the chain rule to this inner composite function to find its derivative. We then apply it once more to get the derivative of the outer function. This means that the derivative of our composite function is the product of the derivatives of each of the three functions from which it is formed, with each derivative evaluated at the output of the function nested inside it. We can express this algebraically as follows:

$$\frac{dy}{dx} = \frac{dy}{du} \times \frac{du}{dv} \times \frac{dv}{dx}$$

So we have:

$$\frac{dy}{dx} = 3\left(\sin(x^2)\right)^2 \cdot \cos(x^2) \cdot 2x = 6x \cdot \cos(x^2) \cdot \left(\sin(x^2)\right)^2$$
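
Repeated application of the chain rule follows the same pattern at every level, so it lends itself to a simple loop. The Python sketch below is one possible way to express this, not a standard routine: the function `chain_derivative` and the `layers` list are hypothetical names, and each layer is supplied as a pair consisting of a function and its hand-written derivative, ordered from the innermost function outwards.

```python
import math


def chain_derivative(layers, x):
    """Evaluate a nested composite and its derivative at x.

    layers is a list of (func, dfunc) pairs ordered innermost first
    (a hypothetical structure used only for this illustration).
    """
    value = x
    derivative = 1.0
    for func, dfunc in layers:
        derivative *= dfunc(value)  # this layer's derivative at its own input
        value = func(value)         # pass the output on to the next layer
    return value, derivative


layers = [
    (lambda x: x ** 2, lambda x: 2 * x),       # v = x^2,    dv/dx = 2x
    (math.sin, math.cos),                      # u = sin(v), du/dv = cos(v)
    (lambda u: u ** 3, lambda u: 3 * u ** 2),  # y = u^3,    dy/du = 3u^2
]

x = 1.5
value, derivative = chain_derivative(layers, x)

# The closed-form result derived above: 6x * cos(x^2) * (sin(x^2))^2
closed_form = 6 * x * math.cos(x ** 2) * math.sin(x ** 2) ** 2

print(derivative, closed_form)
```

The loop multiplies one factor per level of nesting, which mirrors the product dy/du × du/dv × dv/dx, and the two printed values should match to within floating-point rounding.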

So far, we have seen how the chain rule is used together with the basic rules of differentiation to obtain the derivative of a composite function. There will also be occasions when the chain rule must be used together with the product or quotient rules. Consider the following expression:

$$y = \frac{(x^3 + 7)^5}{(1 - 2x^2)^3}$$

To differentiate this function, we will need to use the chain rule together with the quotient rule. The quotient rule states that the derivative of the quotient of two functions is equal to the product of the denominator and the derivative of the numerator, minus the product of the numerator and the derivative of the denominator, all over the denominator squared. However, since both the numerator and the denominator in our example are composite functions, we will also need to use the chain rule. First we'll use the chain rule to find the derivative of the numerator:

$$\frac{d}{dx}\left((x^3 + 7)^5\right) = (5)(x^3 + 7)^4(3x^2)$$

Now we'll use the chain rule to find the derivative of the denominator:

$$\frac{d}{dx}\left((1 - 2x^2)^3\right) = (3)(1 - 2x^2)^2(-4x)$$

Now we can use the quotient rule to differentiate the entire function:

$$\frac{dy}{dx} = \frac{(5)(x^3 + 7)^4(3x^2)(1 - 2x^2)^3 - (3)(1 - 2x^2)^2(-4x)(x^3 + 7)^5}{\left((1 - 2x^2)^3\right)^2}$$

We can of course simplify this somewhat:

$$\frac{dy}{dx} = \frac{15x^2(x^3 + 7)^4(1 - 2x^2)^3}{(1 - 2x^2)^6} + \frac{12x(1 - 2x^2)^2(x^3 + 7)^5}{(1 - 2x^2)^6}$$

$$\frac{dy}{dx} = \frac{15x^2(x^3 + 7)^4}{(1 - 2x^2)^3} + \frac{12x(x^3 + 7)^5}{(1 - 2x^2)^4}$$
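
With this many brackets it is worth checking that the simplification has not introduced a sign or exponent error. The Python sketch below uses exact rational arithmetic (the standard library's `Fraction` type) to compare the unsimplified quotient rule expression with the simplified one at a few sample points, and then compares the simplified form with a difference quotient. The function names, sample points and step size are arbitrary choices made for this illustration.

```python
from fractions import Fraction


def y(x):
    """The original function y = (x^3 + 7)^5 / (1 - 2x^2)^3."""
    return (x ** 3 + 7) ** 5 / (1 - 2 * x ** 2) ** 3


def quotient_rule_form(x):
    """The derivative exactly as produced by the quotient rule, unsimplified."""
    num = x ** 3 + 7
    den = 1 - 2 * x ** 2
    d_num = 5 * num ** 4 * (3 * x ** 2)   # chain rule on the numerator
    d_den = 3 * den ** 2 * (-4 * x)       # chain rule on the denominator
    return (d_num * den ** 3 - d_den * num ** 5) / (den ** 3) ** 2


def simplified_form(x):
    """The simplified derivative, written as a sum of two fractions."""
    num = x ** 3 + 7
    den = 1 - 2 * x ** 2
    return 15 * x ** 2 * num ** 4 / den ** 3 + 12 * x * num ** 5 / den ** 4


# Rational sample points: the denominator 1 - 2x^2 is never zero for rational x.
sample_points = [Fraction(-3, 2), Fraction(1, 4), Fraction(2), Fraction(5, 3)]
forms_agree = all(
    quotient_rule_form(x) == simplified_form(x) for x in sample_points
)

# Symmetric difference quotient as an approximate check against y itself.
h = Fraction(1, 10 ** 6)
x0 = Fraction(1, 4)
estimate = (y(x0 + h) - y(x0 - h)) / (2 * h)
relative_error = abs(estimate - simplified_form(x0)) / abs(simplified_form(x0))

print(forms_agree)            # True
print(float(relative_error))  # a very small number, not exactly zero
```

The first check is exact, because the two forms are algebraically identical. The second is only approximate, since a difference quotient with a finite step never equals the derivative exactly.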

The chain rule gives us a straightforward method of differentiating a composite function, that is, a function that takes the output of a second function as its input. This second function may itself be a composite function. In fact, there can be any number of functions in a composite function, each nested at a different level. We must apply the chain rule at each level, only stopping when we reach the innermost function. The derivative of a composite function is the product of the derivatives of the functions from which it is formed. While finding the derivative of an "outer" function, we can represent its "inner" function using a simple placeholder variable, which is replaced with the expression it represents once we are ready to write out our answer in full. Finally, as we have seen, the chain rule can be used in tandem with other rules of differentiation, including the product and quotient rules.