Matrices

Overview
The origins and uses of matrices
Characteristics of matrices
Matrix addition and subtraction
Matrix multiplication
The inverse of a matrix

The determinant
Finding the inverse of a three-by-three matrix
Eigenvalues and eigenvectors
Row-echelon form
Solving systems of linear equations

Overview

A matrix is a rectangular grid of mathematical terms, usually (though not always) numbers. The grid is enclosed on the left and right-hand sides by square brackets (or sometimes large parentheses). The terms are called elements, and they are arranged in rows and columns as shown below.

6	11	2
3	1	0
6	13	7

The size of the matrix is specified using the number of rows and columns (in that order) as its dimensions. A particular instance of a matrix is described as an m-by-n (m × n) matrix, where m represents the number of rows and n represents the number of columns. The matrix shown above is thus a three-by-three (3 × 3) matrix. The individual rows or columns of a matrix are called row vectors and column vectors respectively. Matrices are usually identified using an upper-case character in bold type:

A =	7	16	4
	5	-3	2
	1	9	18

The individual elements of a matrix are identified using either the character used to identify the matrix itself, or (more commonly) its lower-case equivalent, plus two subscripted numbers that identify the row and column (strictly in that order) in which the element resides. Thus, in the above example, the element in the bottom right-hand corner of the matrix (18) would be identified as either A_3,3 or a_3,3. Reference can be made to an entire row or column of a matrix by using the asterisk symbol ("*") to represent the relevant series of subscripts. The second row of our matrix could thus be identified as A_2,* or a_2,*, while the first column could be identified as A_*,1 or a_*,1. We can express a general three-by-three matrix, A, in terms of its element identifiers as follows:

A =	a_1,1	a_1,2	a_1,3
	a_2,1	a_2,2	a_2,3
	a_3,1	a_3,2	a_3,3

The origins and uses of matrices

Matrices are essentially a compact way of expressing data of various kinds. Although the notation we use for matrices today was not developed until the early part of the twentieth century, much of the theory associated with matrices began to emerge as early as the late seventeenth century. Indeed, some of the ideas upon which matrix theory is based were evident in the work of Chinese scholars of the second and third centuries BCE, who were working on problems related to the solution of linear systems. Matrices are in fact still used for finding solutions to systems of linear equations, which is why we are covering the topic under the general heading of Algebra. Their use is by no means restricted to linear algebra, however. They can be applied to areas as diverse as game theory, forestry, statistics, electrical networking systems, cryptography, computer graphics, numerical analysis and the generation of economic models, to mention just a few.

Problems involving two or more unknown quantities were being solved by the Babylonians over three thousand years ago, but they were invariably expressed rhetorically (i.e. the problem was written down in words, in a rather long-winded fashion) rather than in the kind of concise algebraic notation used today. A number of statements were made, each of which provided some information about the unknown quantities. The solution to the problem lay in finding values for the unknown quantities that made all of the statements true at the same time. Today, we would write these statements as linear equations, and the collection of statements would together constitute a system of linear equations.

Evidence of the use of a more concise way of expressing such problems appears in a Chinese book of mathematical problems, the title of which (Jiuzhang Suanshu) translates as "Nine Chapters of the Mathematical Art". The book is believed to have been written during the first century BCE. Chapter eight of the book contains a number of problems that are initially expressed rhetorically, the first of which can be summarised as follows:

there are three types of corn
three bundles of the first type, two of the second type and one of the third type make thirty-nine measures
two bundles of the first type, three of the second and one of the third type make thirty-four measures
one bundle of the first type, two of the second type and three of the third type make twenty-six measures

The obvious question here is how many measures of corn a single bundle of each type contains. The problem as written appears to have been contrived in order to demonstrate a particular method of solving a general class of problems. Using modern algebraic methods, we would express it as a system of three linear equations in which the value of each type of corn (in terms of how many measures a single bundle contains) is represented by a distinct variable. We could call types one, two and three x, y and z respectively. The three linear equations would thus be:

3x + 2y + z = 39

2x + 3y + z = 34

x + 2y + 3z = 26

Of course, Chinese mathematicians of the time would not have expressed the problem in terms of a system of linear equations. They do, however, appear to have gone beyond simply expressing the problem in rhetorical terms, and developed something remarkably close to the matrix notation used in modern mathematics. What they did in fact was to arrange the numbers in columns so that the right-most column contained (reading from top to bottom) the number of bundles of each type of corn making up the thirty-nine measures as described in the first statement. The number thirty-nine is placed at the bottom of the column. Immediately to the left of this column was a second column that contained (again reading from top to bottom) the number of bundles of each type of corn making up the thirty-four measures as described in the second statement. The third (left-most) column contained the number of bundles of each type of corn making up the twenty-six measures as described in the last statement. The resulting matrix would have looked something like this:

1	2	3
2	3	2
3	1	1
26	34	39

In ancient China, the arrangement shown above would have been created using rods to represent the numbers, placed within squares on a counting board. We can re-arrange the matrix to reflect how we would write it today as follows:

3	2	1	39
2	3	1	34
1	2	3	26

We will continue to follow the procedure used by the ancient Chinese mathematicians, but using modern matrix notation. The next step in the solution is to multiply all of the numbers in the second row by the first number in the first row (which is three), and then repeatedly subtract the first row from the second row until the first number in the second row is zero. Here are the matrices representing each of the stages involved (only the second row changes from one stage to the next):

3	2	1	39
2	3	1	34
1	2	3	26

3	2	1	39
6	9	3	102
1	2	3	26

3	2	1	39
3	7	2	63
1	2	3	26

3	2	1	39
0	5	1	24
1	2	3	26

The object of the exercise was to get a zero in column one of the second row, i.e. to eliminate the first type of corn from that row. Suppose we now repeat this process with the first and third rows. We will multiply the third row by three, and then subtract the first row from it so that the first number in the third row is zero. Here are the matrices representing each stage:

3	2	1	39
0	5	1	24
1	2	3	26

3	2	1	39
0	5	1	24
3	6	9	78

3	2	1	39
0	5	1	24
0	4	8	39

Having now eliminated the first type of corn from the second and third rows, we can turn our attention to eliminating the second type of corn from the third row. We do this by multiplying the third row by five, and then subtracting the second row from the third row until the value of column two in the third row is also zero. Each stage in the process is shown below.

3	2	1	39
0	5	1	24
0	4	8	39

3	2	1	39
0	5	1	24
0	20	40	195

3	2	1	39
0	5	1	24
0	15	39	171

3	2	1	39
0	5	1	24
0	10	38	147

3	2	1	39
0	5	1	24
0	5	37	123

3	2	1	39
0	5	1	24
0	0	36	99

The third row now has only one type of corn (the third type), and we can see from the numbers that thirty-six bundles of the third type of corn contain ninety-nine measures of corn. The number of measures of corn contained in a single bundle of the third type of corn can therefore be calculated as:

99 ÷ 36 = 11/4 measures

We have left the result as an improper fraction as a matter of convenience, because it also tells us that four bundles of the third type of corn are equal to eleven measures. Having used a process of elimination to find the value (in measures) of one bundle of the third type of corn, we can simplify the third row (dividing it through by nine) as follows:

3	2	1	39
0	5	1	24
0	0	4	11

We will now eliminate the third type of corn in the second row by multiplying the second row by four, and subtracting the third row from it so that column three in the second row is equal to zero. Here are the stages involved:

3	2	1	39
0	5	1	24
0	0	4	11

3	2	1	39
0	20	4	96
0	0	4	11

3	2	1	39
0	20	0	85
0	0	4	11

The second row now also has only one type of corn (the second type), and we can see from the numbers that twenty bundles of the second type of corn contain eighty-five measures of corn. The number of measures of corn contained in a single bundle of the second type of corn can therefore be calculated as:

85 ÷ 20 = 17/4 measures

We can now simplify the second row:

3	2	1	39
0	4	0	17
0	0	4	11

Now we must eliminate the second and third types of corn in the first row. We start by multiplying the first row by four and subtracting the third row from it so that column three in the first row becomes zero:

3	2	1	39
0	4	0	17
0	0	4	11

12	8	4	156
0	4	0	17
0	0	4	11

12	8	0	145
0	4	0	17
0	0	4	11

Now we subtract the second row from the first until column two in the first row becomes zero:

12	8	0	145
0	4	0	17
0	0	4	11

12	4	0	128
0	4	0	17
0	0	4	11

12	0	0	111
0	4	0	17
0	0	4	11

The first row now has only one type of corn (the first type), and we can see from the numbers that twelve bundles of the first type of corn contain one-hundred and eleven measures of corn. The number of measures of corn contained in a single bundle of the first type of corn can therefore be calculated as:

111 ÷ 12 = 37/4 measures

We can now simplify the first row:

4	0	0	37
0	4	0	17
0	0	4	11

The number of measures of corn in a single bundle of each type is therefore nine-and-one-quarter (9¹/₄) for the first type, four-and-one-quarter (4¹/₄) for the second type, and two-and-three-quarters (2³/₄) for the third type. To confirm these results, we can substitute the values for the three types of corn into our three linear equations:

3(9.25) + 2(4.25) + 2.75 = 39  ⇒  27.75 + 8.5 + 2.75 = 39

2(9.25) + 3(4.25) + 2.75 = 34  ⇒  18.5 + 12.75 + 2.75 = 34

9.25 + 2(4.25) + 3(2.75) = 26  ⇒  9.25 + 8.5 + 8.25 = 26

All of the above may seem somewhat unwieldy, considering that the Chinese mathematicians had to use counting boards and rods. One suspects, however, that once familiar with these methods, they could probably perform such calculations quite quickly. Even negative numbers did not create a problem, since according to records of the time different coloured rods were used to represent positive and negative numbers. The method used above to find the value of several unknown quantities is based on eliminating all but one of the variables from each row of the matrix by reducing their coefficients to zero, allowing the value of the unknown quantity associated with the remaining coefficient to be calculated. This method is today called Gaussian elimination, although the German mathematician Carl Friedrich Gauss, for whom it is named, did not actually invent the technique in its modern form. Its re-emergence in Europe in the seventeenth century was largely down to Sir Isaac Newton.
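The elimination procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the historical counting-board method: it assumes the system has exactly one solution, uses the standard-library `fractions` module to keep the arithmetic exact, and scales each pivot row so that its leading coefficient is one instead of cross-multiplying rows, so the intermediate rows differ from those shown above. The function name `solve_by_elimination` is hypothetical.

```python
from fractions import Fraction


def solve_by_elimination(coeffs, rhs):
    """Solve a square linear system by eliminating all but one unknown per row.

    coeffs: list of n rows, each a list of n numbers.
    rhs:    list of n right-hand-side values.
    Returns the n unknowns as exact Fractions.
    """
    n = len(coeffs)
    # Build the augmented matrix [coeffs | rhs] using exact fractions.
    m = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(coeffs, rhs)]

    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("system has no unique solution")
        m[col], m[pivot] = m[pivot], m[col]

        # Scale the pivot row so that its leading coefficient is one.
        p = m[col][col]
        m[col] = [x / p for x in m[col]]

        # Reduce this unknown's coefficient to zero in every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]

    # Each row now reads: 1 * unknown = value.
    return [row[-1] for row in m]


# The corn problem: 3x + 2y + z = 39, 2x + 3y + z = 34, x + 2y + 3z = 26.
solution = solve_by_elimination([[3, 2, 1], [2, 3, 1], [1, 2, 3]], [39, 34, 26])
print(solution)                      # [Fraction(37, 4), Fraction(17, 4), Fraction(11, 4)]
print([float(v) for v in solution])  # [9.25, 4.25, 2.75]
```

Because each unknown is removed from every row except its own, this variant is usually called Gauss-Jordan elimination; it reaches the same values of 9¹/₄, 4¹/₄ and 2³/₄ measures.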

Characteristics of matrices

The most obvious way in which one matrix may differ from another is in the number of rows and columns it contains. This can be an important factor in whether or not a particular arithmetic operation can be applied to a pair of matrices. Two matrices cannot be added together, for example, unless they both have the same number of rows and columns. Thus, a three-by-two matrix and a three-by-four matrix cannot be added together. A matrix which has the same number of rows and columns is called a square matrix. A square matrix in which every element that does not lie on the main diagonal (running from the top left-hand corner to the bottom right-hand corner) is zero is called a diagonal matrix. Examples of a square matrix and a diagonal matrix are shown below.

A square matrix:	3	2	5
	6	3	1
	1	4	3

A diagonal matrix:	9	0	0
	0	5	0
	0	0	7

A diagonal matrix in which all of the elements in the diagonal are one is called an identity matrix, and is usually denoted by the upper-case character I. We will examine the significance of this type of matrix in due course. A matrix may also consist of a single row or column. A matrix that consists of a single row is called a row matrix, while a matrix that consists of a single column is called a column matrix. Examples of a row matrix and a column matrix are shown below.

A row matrix:

-5

A column matrix:		3
		8
		4
		6

For two matrices to be equal to one another, they must have the same number of rows and columns, and each element of the first matrix must be identical to the corresponding element of the second matrix. The two matrices shown below are equal to one another:

3	2	5	=	3	2	5
6	3	1		6	3	1
1	4	3		1	4	3

A matrix is transposed by turning its rows into columns: row one becomes column one, row two becomes column two, and so on, so that an m-by-n matrix becomes an n-by-m matrix, as demonstrated by the example below. The transpose of a matrix is by convention given the same identifier as the original matrix, but with the superscripted upper-case character "T" appended to it.

A =	7	1	16	4
	5	5	-3	2
	1	12	9	18

A^T =	7	5	1
	1	5	12
	16	-3	9
	4	2	18
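A short Python sketch of the transpose, assuming (here and in the later sketches) that a matrix is stored as a list of rows, each row being a list of numbers. The function name `transpose` is our own choice; the example reproduces A^T above.

```python
def transpose(matrix):
    """Return the transpose: row i of the input becomes column i of the output."""
    # zip(*matrix) groups the j-th element of every row together, i.e. column j.
    return [list(column) for column in zip(*matrix)]


A = [[7, 1, 16, 4],
     [5, 5, -3, 2],
     [1, 12, 9, 18]]

print(transpose(A))  # [[7, 5, 1], [1, 5, 12], [16, -3, 9], [4, 2, 18]]
```

Transposing twice returns the original matrix, which is an easy sanity check for the sketch.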

Another special kind of matrix is one in which all of the elements are zero. Such a matrix is called the zero matrix. The zero matrix is usually denoted by the bold-type character 0 (zero), together with subscripts that give the dimensions of the matrix in rows and columns. The zero matrix 0_m,n has m rows and n columns. In terms of matrix arithmetic, the following will hold true for any matrix M, provided that each zero matrix has dimensions compatible with the operation in which it appears:

0M = M0 = 0 and 0 + M = M

Here is the three-by-three zero matrix 0_3,3:

0_3,3 =		0	0	0
		0	0	0
		0	0	0

Matrix addition and subtraction

Two matrices can be added together if (and only if) they both have the same number of rows and columns. The same is true if we want to subtract one matrix from another. The primary difference between matrix addition and matrix subtraction is that the order in which the matrices appear is not important for addition, but it is important for subtraction, as the examples below will demonstrate. Here is an example of adding two matrices, A and B, together to produce a third matrix, C:

A =	1	9	-3
	12	7	1
	-4	15	2

B =	6	5	7
	-3	18	9
	4	-1	17

A + B = C ⇒	1	9	-3	+	6	5	7	=	7	14	4
	12	7	1		-3	18	9		9	25	10
	-4	15	2		4	-1	17		0	14	19

Matrix C is the sum of matrix A and matrix B. Note that the corresponding elements in matrix A and matrix B have been added together to create the elements of matrix C. The element c_1,1 is thus the sum of element a_1,1 and element b_1,1. Note that matrix addition is both commutative (the order in which the matrices appear in the addition does not affect the result - A + B is the same as B + A) and associative (when more than two matrices are added together, the order in which the addition operations occur does not affect the result - (A + B) + M is the same as A + (B + M)). Subtracting one matrix from another is equally straightforward. If instead of adding matrix A and matrix B, we want to subtract matrix B from matrix A, we simply subtract each element in matrix B from the corresponding element in matrix A. If matrix C is equal to matrix A minus matrix B, element c_1,1 will equal element a_1,1 minus element b_1,1:

A - B = C ⇒	1	9	-3	-	6	5	7	=	-5	4	-10
	12	7	1		-3	18	9		15	-11	-8
	-4	15	2		4	-1	17		-8	16	-15

Subtraction with matrices (as with real numbers or integers, for example) is neither commutative nor associative. The order in which the matrices appear will affect the result, so A - B is not in general the same as B - A. When more than two matrices are involved in an expression involving subtraction, the order in which the subtraction operations occur also affects the result, so (A - B) - M is not in general the same as A - (B - M).
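The element-by-element rules for addition and subtraction can be sketched in Python as follows, assuming a list-of-rows representation; the helper names `same_shape`, `add` and `subtract` are hypothetical. The sketch computes A + B and A - B for the matrices A and B used above and checks the commutativity claims.

```python
def same_shape(a, b):
    """True if a and b have the same number of rows and columns."""
    return len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))


def add(a, b):
    """Element-wise sum: c[i][j] = a[i][j] + b[i][j]."""
    if not same_shape(a, b):
        raise ValueError("matrices must have the same dimensions")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def subtract(a, b):
    """Element-wise difference: c[i][j] = a[i][j] - b[i][j]."""
    if not same_shape(a, b):
        raise ValueError("matrices must have the same dimensions")
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


A = [[1, 9, -3], [12, 7, 1], [-4, 15, 2]]
B = [[6, 5, 7], [-3, 18, 9], [4, -1, 17]]

print(add(A, B))       # [[7, 14, 4], [9, 25, 10], [0, 14, 19]]
print(subtract(A, B))  # [[-5, 4, -10], [15, -11, -8], [-8, 16, -15]]
print(add(A, B) == add(B, A))            # True: addition is commutative
print(subtract(A, B) == subtract(B, A))  # False: subtraction is not
```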

Matrix multiplication

A matrix can be multiplied by a number (a process called scalar multiplication) in a straightforward manner. Quite simply, every entry in the matrix is multiplied by the number in question. The following example illustrates the process:

A × 3 =	-5	4	-10	× 3 =	-15	12	-30
	15	-11	10		45	-33	30
	-8	16	-2		-24	48	-6
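Scalar multiplication is equally simple to sketch in the same list-of-rows representation (the function name `scale` is hypothetical); the example repeats the calculation above.

```python
def scale(matrix, k):
    """Scalar multiplication: multiply every element of the matrix by k."""
    return [[k * x for x in row] for row in matrix]


A = [[-5, 4, -10], [15, -11, 10], [-8, 16, -2]]
print(scale(A, 3))  # [[-15, 12, -30], [45, -33, 30], [-24, 48, -6]]
```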

Multiplying two matrices together is a little trickier. As with addition, there are restrictions on what kinds of matrices can be multiplied together. The rule is that in order for matrix A to be multiplied by matrix B, where matrix A is the first multiplicand and matrix B is the second, the number of columns in A must be equal to the number of rows in B. If A and B are square matrices with the same dimensions, they can obviously be multiplied together in either order. The results will, however, usually differ: matrix multiplication is not commutative, so the product AB is in general not the same as the product BA (although some pairs of matrices, such as a square matrix and the identity matrix of the same size, do give the same product in either order). Consider the following matrices:

A =		1	2	-3
		3	4	0

B =	6	7
	2	1
	4	5

We can multiply matrix A by matrix B because the number of columns in matrix A is the same as the number of rows in matrix B. Let's call the matrix resulting from this multiplication C. To find the first element of the first row in matrix C, we multiply each of the numbers in the first row of matrix A with the corresponding numbers in the first column of matrix B, and add the results together. This gives us:

(1)(6) + (2)(2) + (-3)(4) = 6 + 4 + (-12) = -2

We can see from the above that element c_1,1 will be minus two (-2). As well as telling us that this is the first element in the first row of matrix C, the subscripts also tell us that the element is the product of row one of matrix A and column one of matrix B. We will now repeat the procedure for row one of matrix A and column two of matrix B to find the value of element c_1,2:

(1)(7) + (2)(1) + (-3)(5) = 7 + 2 + (-15) = -6

We now multiply row two of matrix A and column one of matrix B to find the value of element c_2,1:

(3)(6) + (4)(2) + (0)(4) = 18 + 8 + 0 = 26

Finally, we multiply row two of matrix A and column two of matrix B to find the value of element c_2,2:

(3)(7) + (4)(1) + (0)(5) = 21 + 4 + 0 = 25

Here is the completed calculation:

AB = C ⇒	1	2	-3	×	6	7	=	-2	-6
	3	4	0		2	1		26	25
					4	5

For any pair of matrices that can be multiplied together, element (i,j) in the resulting matrix will be the result of multiplying the elements of row i in the first matrix by the corresponding elements of column j in the second matrix, and then adding the products together. Note that the shape of the matrix that results from multiplying two matrices together is often different from that of either of the multiplicands. It will in fact have the same number of rows as the first multiplicand and the same number of columns as the second. If we were to multiply matrix B by matrix A such that the product would be BA, matrix C would be very different:

BA = C ⇒	6	7	×	1	2	-3	=	27	40	-18
	2	1		3	4	0		5	8	-6
	4	5						19	28	-12

Clearly then, A × B does not produce the same result as B × A. Note also that given a two-by-three matrix and a three-by-two matrix, it is possible to multiply them together either way round and get a valid result. The same cannot be said of (say) a two-by-three matrix and a three-by-four matrix, since if we put the three-by-four matrix as the first multiplicand and the two-by-three matrix as the second, the number of columns in the first multiplicand will not match the number of rows in the second. Putting the two-by-three matrix as the first multiplicand would work, because then there would be three columns in the first multiplicand and three rows in the second.
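The row-by-column rule can be sketched in Python, again assuming a list-of-rows representation (the function name `multiply` is hypothetical). It reproduces AB and BA from the example above and rejects a pair whose dimensions do not match.

```python
def multiply(a, b):
    """Matrix product AB: element (i, j) is row i of a dotted with column j of b."""
    if len(a[0]) != len(b):
        raise ValueError("columns of the first matrix must equal rows of the second")
    # zip(*b) yields the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


A = [[1, 2, -3], [3, 4, 0]]   # 2 x 3
B = [[6, 7], [2, 1], [4, 5]]  # 3 x 2

print(multiply(A, B))  # [[-2, -6], [26, 25]]                          (2 x 2)
print(multiply(B, A))  # [[27, 40, -18], [5, 8, -6], [19, 28, -12]]    (3 x 3)

# A 3 x 4 matrix cannot be the first multiplicand when the second is 2 x 3.
try:
    multiply([[1, 2, 3, 4]] * 3, [[1, 2, 3]] * 2)
except ValueError as err:
    print(err)  # columns of the first matrix must equal rows of the second
```

The result always has as many rows as the first multiplicand and as many columns as the second, which is why AB and BA have different shapes here.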

A special case of matrix multiplication exists in which a square matrix M is multiplied by an identity matrix I of the same dimensions. If we call the resulting matrix C, the multiplication looks like this:

MI₃ = C ⇒	6	2	7	×	1	0	0	=	6	2	7
	2	4	1		0	1	0		2	4	1
	4	1	5		0	0	1		4	1	5

Note the subscripted number that appears after the letter I. This tells us the dimensions of the identity matrix (in this case, three-by-three). Multiplying an n-by-n square matrix M by its corresponding identity matrix I_n (in either order) always produces a matrix C that is identical to matrix M. In essence, multiplying a square matrix by its identity matrix is the same as multiplying a number by one.

The inverse of a matrix

Dividing one matrix by another is not defined as such. We can however effectively divide one matrix by another by multiplying it by the inverse of the second matrix. In order for a matrix to be invertible it must be a square matrix (i.e. it must have the same number of rows and columns). Not all square matrices are invertible, however. A square n-by-n matrix A is invertible if (and only if) there exists an n-by-n matrix B such that:

AB = BA = I_n

where I_n is the n-by-n identity matrix. If the above statement is true, then matrix B is the inverse of matrix A, which is denoted by A^-1. Any square matrix that is not invertible is referred to as a singular matrix. A square matrix is singular if (and only if) its determinant is zero (determinants will be explained below). Singular matrices are fairly rare, and most square matrices are invertible. The process of finding the inverse of a square matrix becomes more involved as the size of the matrix increases, but for a two-by-two matrix the process is relatively straightforward and can be illustrated as shown below:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

In the case of a two-by-two matrix in the general form shown here, the determinant is defined as ad - bc. Obviously from this definition the inverse matrix will be undefined if ad - bc (i.e. the determinant) evaluates to zero, and our two-by-two matrix will be singular. Assuming this is not the case, then for any n-by-n square matrix A, the following statement will hold true:

AA^-1 = A^-1A = I_n

This is essentially the same statement we saw earlier (AB = BA = I_n) but since by definition matrix B is the inverse of matrix A, we are now identifying it as such (you will recall that I_n is the n-by-n identity matrix). Consider the following two-by-two matrix:

A =		2	1
		2	3

We can find A^-1 as follows:

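$$\begin{bmatrix} 2 & 1 \\ 2 & 3 \end{bmatrix}^{-1} = \frac{1}{(2)(3) - (2)(1)}\begin{bmatrix} 3 & -1 \\ -2 & 2 \end{bmatrix} = \frac{1}{4}\begin{bmatrix} 3 & -1 \\ -2 & 2 \end{bmatrix} = \begin{bmatrix} \tfrac{3}{4} & -\tfrac{1}{4} \\ -\tfrac{1}{2} & \tfrac{1}{2} \end{bmatrix}$$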

To demonstrate that AA^-1 = A^-1A = I₂, we can carry out the matrix multiplications:

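$$AA^{-1} = \begin{bmatrix} 2 & 1 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} \tfrac{3}{4} & -\tfrac{1}{4} \\ -\tfrac{1}{2} & \tfrac{1}{2} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$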

and:

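$$A^{-1}A = \begin{bmatrix} \tfrac{3}{4} & -\tfrac{1}{4} \\ -\tfrac{1}{2} & \tfrac{1}{2} \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$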

In both cases we get the matrix I₂ (the two-by-two identity matrix) as expected. There are a couple of additional things to note here. The first is that if we invert the matrix A^-1, we get back to matrix A:

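$$\begin{aligned}
(A^{-1})^{-1} = \begin{bmatrix} \tfrac{3}{4} & -\tfrac{1}{4} \\ -\tfrac{1}{2} & \tfrac{1}{2} \end{bmatrix}^{-1}
&= \frac{1}{(\tfrac{3}{4})(\tfrac{1}{2}) - (-\tfrac{1}{2})(-\tfrac{1}{4})}\begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{4} \\ \tfrac{1}{2} & \tfrac{3}{4} \end{bmatrix} \\
&= \frac{1}{\tfrac{3}{8} - \tfrac{1}{8}}\begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{4} \\ \tfrac{1}{2} & \tfrac{3}{4} \end{bmatrix}
= \frac{1}{\tfrac{1}{4}}\begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{4} \\ \tfrac{1}{2} & \tfrac{3}{4} \end{bmatrix} \\
&= 4\begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{4} \\ \tfrac{1}{2} & \tfrac{3}{4} \end{bmatrix}
= \begin{bmatrix} 2 & 1 \\ 2 & 3 \end{bmatrix} = A
\end{aligned}$$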

The second thing to note is that if two square matrices of the same size, A and B, are both invertible, then their product AB is also invertible. In fact, it can be shown that the inverse of AB is the product of B^-1 and A^-1 (note the reversed order):

(AB)^-1 = B^-1A^-1
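The two-by-two inverse formula and the facts just listed can be checked with a short Python sketch. It assumes the list-of-rows representation and exact arithmetic via the standard-library `fractions` module; the helper names `det2`, `inverse2` and `multiply` are hypothetical, and the second matrix B is an arbitrary invertible example of our own.

```python
from fractions import Fraction


def det2(m):
    """Determinant of a 2 x 2 matrix [[a, b], [c, d]]: ad - bc."""
    (a, b), (c, d) = m
    return a * d - b * c


def inverse2(m):
    """Inverse of a 2 x 2 matrix: (1 / (ad - bc)) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = m
    det = det2(m)
    if det == 0:
        raise ValueError("matrix is singular (determinant is zero)")
    k = Fraction(1) / det
    return [[k * d, -k * b], [-k * c, k * a]]


def multiply(x, y):
    """Matrix product: element (i, j) is row i of x dotted with column j of y."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


A = [[2, 1], [2, 3]]
A_inv = inverse2(A)
print(A_inv)  # [[Fraction(3, 4), Fraction(-1, 4)], [Fraction(-1, 2), Fraction(1, 2)]]
print(multiply(A, A_inv) == [[1, 0], [0, 1]])  # True
print(inverse2(A_inv) == A)                    # True: inverting twice returns A

B = [[1, 2], [3, 5]]  # an arbitrary invertible matrix chosen for illustration
print(inverse2(multiply(A, B)) == multiply(inverse2(B), inverse2(A)))  # True
```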

Finding the inverse of a three-by-three matrix is a more complex problem, and we will defer dealing with it until after we have looked at determinants.

The determinant

The determinant of a matrix is usually denoted by enclosing the matrix identifier between two vertical bars. Thus, the determinant of a matrix A would be written as |A|. The value of the determinant can tell us something about the matrix. If the value of the determinant is zero, for example, the matrix cannot be inverted (i.e. it is a singular matrix). If the value of the determinant is non-zero, then the matrix can be inverted. Determinants also have a role to play in solving systems of linear equations, as we shall see later. Consider the following generic two-by-two matrix:

A =		a	b
		c	d

The determinant of the two-by-two matrix A is given as:

|A| = ad - cb

The determinant is also sometimes written as follows:

$$|A| = \begin{vmatrix} a & b \\ c & d \end{vmatrix}$$

The only difference here is that instead of enclosing the matrix between square brackets, we use vertical bars to indicate that we are referring to the determinant of the matrix rather than to the matrix itself. We could of course also express the matrix A and its determinant in terms of its element identifiers as follows:

A =		a_1,1	a_1,2
		a_2,1	a_2,2

|A| = a_1,1a_2,2 - a_2,1a_1,2

It is interesting to note that the determinant of the product of two square matrices, A and B, is equal to the product of their individual determinants:

|AB| = |A| · |B|

It is also interesting to note that for a given square matrix, adding a multiple of one row to another row, or a multiple of one column to another column, does not change the determinant. Interchanging two rows or two columns, on the other hand, has the effect of negating the determinant (i.e. multiplying the determinant by minus one).
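The properties just described can be spot-checked numerically. The sketch below is an illustration on two arbitrarily chosen two-by-two matrices, not a proof; `det2` and `multiply` are hypothetical helper names, and the matrix B is our own example.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix [[a, b], [c, d]]: ad - bc."""
    (a, b), (c, d) = m
    return a * d - b * c


def multiply(x, y):
    """Matrix product: element (i, j) is row i of x dotted with column j of y."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


A = [[2, 1], [2, 3]]
B = [[1, 2], [3, 5]]  # arbitrary example matrix chosen for illustration

# |AB| = |A| * |B|
print(det2(multiply(A, B)), det2(A) * det2(B))  # -4 -4

# Adding 5 times row 1 to row 2 leaves the determinant unchanged.
A_added = [A[0], [A[1][0] + 5 * A[0][0], A[1][1] + 5 * A[0][1]]]
print(det2(A), det2(A_added))  # 4 4

# Interchanging the two rows negates the determinant.
print(det2([A[1], A[0]]))  # -4
```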

We will now look at a method for finding the determinant of a three-by-three square matrix, which is somewhat more difficult than for a two-by-two matrix. First of all, we will consider a generic three-by-three matrix, A. The determinant of A is obtained as follows:

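$$|A| = \begin{vmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \end{vmatrix} = a_{1,1}\begin{vmatrix} a_{2,2} & a_{2,3} \\ a_{3,2} & a_{3,3} \end{vmatrix} - a_{1,2}\begin{vmatrix} a_{2,1} & a_{2,3} \\ a_{3,1} & a_{3,3} \end{vmatrix} + a_{1,3}\begin{vmatrix} a_{2,1} & a_{2,2} \\ a_{3,1} & a_{3,2} \end{vmatrix}$$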

The method used here requires us to choose a row or column to work with (any row or column can be used). In the example above, we have chosen the first row. Taking the first element in the row or column we have chosen, we ignore the row and column in which it appears. This leaves us with a two-by-two sub-matrix. Because we have chosen the first row to work with, the first element is a_1,1. We therefore ignore the first row and column, which leaves us with the following sub-matrix:

	a_2,2	a_2,3
	a_3,2	a_3,3

If we repeat this procedure with element a_1,2 we get the second sub-matrix:

	a_2,1	a_2,3
	a_3,1	a_3,3

And for element a_1,3 we get the third sub-matrix:

	a_2,1	a_2,2
	a_3,1	a_3,2

The next step is to find the determinants for each sub-matrix. The determinant of a submatrix is referred to as a minor. We have already seen how to find the determinant of a two-by-two matrix, but we will show here the calculation required to find the determinant for the first sub-matrix anyway:

$$\begin{vmatrix} a_{2,2} & a_{2,3} \\ a_{3,2} & a_{3,3} \end{vmatrix} = a_{2,2}a_{3,3} - a_{3,2}a_{2,3}$$

Once we have found the three determinants, we need to change their sign according to the sign matrix shown below. To use the sign matrix, we choose the same row or column in the sign matrix as we originally chose from matrix A (in this case, row one). The sign of each determinant will change if the corresponding sign in the chosen row or column of the sign matrix is negative, but stay the same if the sign is positive. The three determinants, with their revised sign allocations, are called co-factors. The sign matrix itself is fairly self-explanatory, since for any m-by-m sign matrix, the signs alternate between positive and negative across each row and down each column, starting with a positive sign in the top left-hand corner. The rule is that for each position in the matrix, if the row and column numbers added together produce an even number, then the element occupying that position is the positive sign. If the row and column numbers added together produce an odd number, the element will be the negative sign.

+	-	+
-	+	-
+	-	+
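The parity rule for the sign matrix can be sketched in a few lines of Python (the function name `sign_matrix` is hypothetical; rows and columns are numbered from one, as in the text).

```python
def sign_matrix(m):
    """Return the m-by-m sign matrix: '+' where row + column is even, '-' where odd."""
    return [["+" if (i + j) % 2 == 0 else "-" for j in range(1, m + 1)]
            for i in range(1, m + 1)]


for row in sign_matrix(4):
    print(" ".join(row))
# + - + -
# - + - +
# + - + -
# - + - +
```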

Because we must choose row one in the sign matrix (+ - +), the first and last co-factors will retain their existing sign, but the second co-factor will change its sign. The final step is to multiply each co-factor by the corresponding element in the row or column we originally selected in matrix A, and then add the results together to get the determinant of our three-by-three matrix. Since we chose the first row, we will multiply the first co-factor by a_1,1, the second co-factor by a_1,2, and the third co-factor by a_1,3. Note that the co-factor corresponding to a particular element in the original matrix is sometimes denoted using the capitalised version of the matrix element's ID. Thus, the co-factor corresponding to element a_1,1 can be identified as A_1,1. We could therefore express the determinant of our generic three-by-three matrix A as:

|A| = a_1,1A_1,1 + a_1,2A_1,2 + a_1,3A_1,3

There is a good possibility that you are by now thoroughly confused, but a concrete example should help to clarify how this all works.

Consider the following three-by-three matrix:

A =	5	7	2
	1	9	4
	2	6	3

Here is the calculation to find the determinant of A (note that we will again use the first row to determine the minors):

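$$|A| = \begin{vmatrix} 5 & 7 & 2 \\ 1 & 9 & 4 \\ 2 & 6 & 3 \end{vmatrix} = 5\begin{vmatrix} 9 & 4 \\ 6 & 3 \end{vmatrix} - 7\begin{vmatrix} 1 & 4 \\ 2 & 3 \end{vmatrix} + 2\begin{vmatrix} 1 & 9 \\ 2 & 6 \end{vmatrix}$$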

|A| = 5((9)(3)-(6)(4)) - 7((1)(3)-(2)(4)) + 2((1)(6)-(2)(9))

|A| = 5(27-24) - 7(3-8) + 2(6-18)

|A| = 5(3) - 7(-5) + 2(-12)

|A| = 15 + 35 - 24

|A| = 26

Just to illustrate the point that we can indeed use any row or column for this calculation and still get the same result, let's repeat the exercise with column two (the only slightly tricky thing here is to remember to select the correct sign for each co-factor using the sign matrix):

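$$|A| = \begin{vmatrix} 5 & 7 & 2 \\ 1 & 9 & 4 \\ 2 & 6 & 3 \end{vmatrix} = -7\begin{vmatrix} 1 & 4 \\ 2 & 3 \end{vmatrix} + 9\begin{vmatrix} 5 & 2 \\ 2 & 3 \end{vmatrix} - 6\begin{vmatrix} 5 & 2 \\ 1 & 4 \end{vmatrix}$$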

|A| = -7((1)(3)-(2)(4)) + 9((5)(3)-(2)(2)) - 6((5)(4)-(1)(2))

|A| = -7(3-8) + 9(15-4) - 6(20-2)

|A| = -7(-5) + 9(11) - 6(18)

|A| = 35 + 99 - 108

|A| = 26
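The three-by-three procedure (delete a row and column to get a minor, apply the sign-matrix sign to get a co-factor, then sum element times co-factor) can be sketched in Python. The helper names are hypothetical and indices are 0-based, which does not change the sign rule because shifting both the row and column numbers by one leaves the parity of their sum unchanged.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix: ad - bc."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def minor3(a, i, j):
    """2 x 2 sub-matrix left after deleting row i and column j of a 3 x 3 matrix."""
    return [[a[r][c] for c in range(3) if c != j] for r in range(3) if r != i]


def cofactor3(a, i, j):
    """Co-factor A_i,j: the minor's determinant with the sign-matrix sign applied."""
    sign = 1 if (i + j) % 2 == 0 else -1
    return sign * det2(minor3(a, i, j))


def det3_by_row(a, i):
    """Expand along row i: sum of a[i][j] * A_i,j over the columns j."""
    return sum(a[i][j] * cofactor3(a, i, j) for j in range(3))


def det3_by_column(a, j):
    """Expand along column j: sum of a[i][j] * A_i,j over the rows i."""
    return sum(a[i][j] * cofactor3(a, i, j) for i in range(3))


A = [[5, 7, 2],
     [1, 9, 4],
     [2, 6, 3]]

print(det3_by_row(A, 0))     # 26  (first row)
print(det3_by_column(A, 1))  # 26  (second column)
```

Trying the other rows and columns gives 26 every time, in line with the claim that any row or column may be chosen.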

Finding the determinants of larger matrices

Although in an exam you will probably not be asked to find the determinants for matrices larger than three-by-three, there are formulae that can be used for this purpose. For any n-by-n matrix A, there is a general formula that can be used to describe the calculation for the determinant. In fact, there are two versions of the formula, depending on whether we start the proceedings by choosing a row or a column. Here is the formula which will apply if we should choose a row, i:

$$|A| = \sum_{j=1}^{n} a_{i,j}A_{i,j}$$

and here is the formula which will apply should we choose a column, j:

$$|A| = \sum_{i=1}^{n} a_{i,j}A_{i,j}$$

Let's see what this means for a four-by-four matrix using a concrete example. Consider the following four-by-four matrix:

A =	3	0	2	-1
	1	2	0	-2
	4	0	6	-3
	5	0	2	0

Here is the sign matrix we must use for a four-by-four matrix:

+	-	+	-
-	+	-	+
+	-	+	-
-	+	-	+

Let's assume that we will use the first row of our four-by-four matrix to derive the determinant. Here is the first stage in the calculation:

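$$\begin{aligned}
|A| &= \begin{vmatrix} 3 & 0 & 2 & -1 \\ 1 & 2 & 0 & -2 \\ 4 & 0 & 6 & -3 \\ 5 & 0 & 2 & 0 \end{vmatrix} \\
&= 3\begin{vmatrix} 2 & 0 & -2 \\ 0 & 6 & -3 \\ 0 & 2 & 0 \end{vmatrix} - 0\begin{vmatrix} 1 & 0 & -2 \\ 4 & 6 & -3 \\ 5 & 2 & 0 \end{vmatrix} + 2\begin{vmatrix} 1 & 2 & -2 \\ 4 & 0 & -3 \\ 5 & 0 & 0 \end{vmatrix} - (-1)\begin{vmatrix} 1 & 2 & 0 \\ 4 & 0 & 6 \\ 5 & 0 & 2 \end{vmatrix}
\end{aligned}$$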

We now have four three-by-three minors and their corresponding co-factors. Essentially, finding the determinant of any higher-order matrix is a process of recursively working our way down to successively smaller minors until we reach two-by-two minors, whose determinants we can calculate directly. Note that the second three-by-three minor is multiplied by zero and will therefore contribute nothing to the result. We can therefore ignore this minor for the next step of our calculation, which is shown below.

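$$\begin{aligned} |A| = {} & 3\left(2\begin{vmatrix} 6 & -3 \\ 2 & 0 \end{vmatrix} - 0\begin{vmatrix} 0 & -3 \\ 0 & 0 \end{vmatrix} - 2\begin{vmatrix} 0 & 6 \\ 0 & 2 \end{vmatrix}\right) \\ & + 2\left(1\begin{vmatrix} 0 & -3 \\ 0 & 0 \end{vmatrix} - 2\begin{vmatrix} 4 & -3 \\ 5 & 0 \end{vmatrix} - 2\begin{vmatrix} 4 & 0 \\ 5 & 0 \end{vmatrix}\right) \\ & + 1\left(1\begin{vmatrix} 0 & 6 \\ 0 & 2 \end{vmatrix} - 2\begin{vmatrix} 4 & 6 \\ 5 & 2 \end{vmatrix} + 0\begin{vmatrix} 4 & 0 \\ 5 & 0 \end{vmatrix}\right) \end{aligned}$$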

Several of the two-by-two minors in this stage of the calculation are either multiplied by zero, or contain a row or column consisting entirely of zeros. In either of these circumstances, the term will always evaluate to zero, and can be ignored for the purposes of our final calculation. (It is not enough for a two-by-two minor merely to contain two zeros: if the zeros lie on the same diagonal, as in a minor whose first row is 0, 1 and whose second row is 1, 0, the determinant is minus one.) The calculation now becomes relatively straightforward, since we are dealing with two-by-two minors. The final steps in our calculation are shown below.

|A| = 3(2((6)(0)-(2)(-3))) + 2(-2((4)(0)-(5)(-3))) + 1(-2((4)(2)-(5)(6)))

|A| = 3(2(6)) + 2(-2(15)) + 1(-2(-22))

|A| = 36 - 60 + 44

|A| = 20

Looking at what we did above, and bearing in mind that any minor that is multiplied by zero will not be a part of our calculation, we could have made life a lot easier for ourselves. Suppose we do the calculation again, but instead of choosing row one to select the minors of our four-by-four matrix, we use column two, which contains three zeros. Here is the first stage in the calculation:

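$$\begin{aligned} |A| = \begin{vmatrix} 3 & 0 & 2 & -1 \\ 1 & 2 & 0 & -2 \\ 4 & 0 & 6 & -3 \\ 5 & 0 & 2 & 0 \end{vmatrix} = {} & -0\begin{vmatrix} 1 & 0 & -2 \\ 4 & 6 & -3 \\ 5 & 2 & 0 \end{vmatrix} + 2\begin{vmatrix} 3 & 2 & -1 \\ 4 & 6 & -3 \\ 5 & 2 & 0 \end{vmatrix} \\ & - 0\begin{vmatrix} 3 & 2 & -1 \\ 1 & 0 & -2 \\ 5 & 2 & 0 \end{vmatrix} + 0\begin{vmatrix} 3 & 2 & -1 \\ 1 & 0 & -2 \\ 4 & 6 & -3 \end{vmatrix} \end{aligned}$$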

Although we again have four three-by-three minors, this time only the second three-by-three minor is multiplied by a non-zero value. We can therefore ignore the other minors for the next step of our calculation, which is shown below. For this stage of the calculation, we will again make life a bit easier by choosing row three (which contains a zero) to determine our two-by-two minors:

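$$|A| = 2\left(5\begin{vmatrix} 2 & -1 \\ 6 & -3 \end{vmatrix} - 2\begin{vmatrix} 3 & -1 \\ 4 & -3 \end{vmatrix} + 0\begin{vmatrix} 3 & 2 \\ 4 & 6 \end{vmatrix}\right)$$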

The final calculation is again relatively straightforward, and is shown below.

|A| = 2(5((2)(-3)-(6)(-1)) - 2((3)(-3)-(4)(-1)))

|A| = 2(5(-6-(-6)) - 2(-9-(-4)))

|A| = 2(5(0) - 2(-5))

|A| = 2(10)

|A| = 20

We get the same answer as before, but the calculation was significantly simplified because we chose column two, which contained only one non-zero value. Finding the determinant of a high-order matrix can often be greatly simplified by a judicious choice of row or column to start the proceedings. Even if there are no suitable candidate rows or columns, it is always possible to use basic row operations to get all zeros in the matrix below the diagonal (or, using column operations, above it). You do, however, need to keep track of how each operation affects the determinant: adding a multiple of one row to another row leaves the determinant unchanged, interchanging two rows changes its sign, and multiplying a row by a constant multiplies the determinant by that constant. Once the matrix is in this triangular form, you can always expand along a row or column with only one non-zero element, and the determinant turns out to be simply the product of the elements on the diagonal, which makes finding the determinant of a high-order matrix considerably less labour-intensive.
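The triangular-form approach can be sketched in Python as follows. This is one illustration under stated assumptions, not the only way to do it: the function name `det_by_elimination` is hypothetical, it uses Python's standard `fractions` module so that no rounding errors creep in, and it deliberately never multiplies a row by a constant, so the only bookkeeping needed is a sign change for each row interchange.

```python
from fractions import Fraction


def det_by_elimination(m):
    """Determinant via reduction to upper-triangular form."""
    a = [[Fraction(x) for x in row] for row in m]  # exact arithmetic, no rounding
    n = len(a)
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero element in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no non-zero pivot available: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign  # interchanging two rows changes the sign of the determinant
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            # Adding a multiple of one row to another leaves the determinant unchanged.
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    # Upper-triangular: the determinant is the product of the diagonal elements.
    result = Fraction(sign)
    for i in range(n):
        result *= a[i][i]
    return result


A4 = [[3, 0, 2, -1],
      [1, 2, 0, -2],
      [4, 0, 6, -3],
      [5, 0, 2, 0]]
print(det_by_elimination(A4))  # 20
```

For large matrices this needs on the order of n³ arithmetic operations, compared with the roughly n! products that full co-factor expansion can require, which is why elimination is the usual choice in practice.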

Finding the inverse of a three-by-three matrix


We have already seen how to find the inverse of a two-by-two matrix. An investigation of how we would go about finding the inverse of a three-by-three matrix has been intentionally postponed until now, because in order to understand how this works you need to understand how to find the determinant of a three-by-three matrix. There are a couple of things that the determinant can tell you straight away. First, if the determinant is zero, then the matrix is singular (i.e. it has no inverse). Second, if the determinant is one, then the inverse will simply be the adjoint of the original matrix (more about what that means shortly). Provided that the determinant is non-zero, the inverse of any n-by-n matrix M is given by the following formula (which, when the determinant is one, reduces to the adjoint itself):

$$M^{-1} = \frac{1}{|M|}\,\operatorname{adj} M$$

The formula states that the inverse of the n-by-n matrix M is the reciprocal of the determinant of M multiplied by the adjoint of M. We have seen how to calculate the determinant of a three-by-three matrix, but we have not yet looked at the adjoint of a matrix. Simply put, the adjoint of an n-by-n matrix M is formed by taking the co-factor of each element of M, putting these co-factors into another n-by-n matrix, and then transposing the resulting matrix. Actually, it doesn't really sound all that simple, but we have done all of these things before. An example should help to clarify matters. Consider the following matrix:

A =	1	2	3
	0	4	5
	1	0	6

First of all, let's find the co-factor matrix C. You might recall that the co-factor of a matrix element is the signed determinant of the matrix that remains when we remove all elements in the same row and column as the element for which we seek the co-factor. Thus for element a_1,1, the corresponding co-factor matrix element c_1,1 is given as follows:

$$C_{1,1} = +\begin{vmatrix} a_{2,2} & a_{2,3} \\ a_{3,2} & a_{3,3} \end{vmatrix} = +(a_{2,2}a_{3,3} - a_{3,2}a_{2,3})$$

Substituting actual values, we get:

$$C_{1,1} = +\begin{vmatrix} 4 & 5 \\ 0 & 6 \end{vmatrix} = +((4)(6) - (0)(5)) = +(24 - 0) = 24$$

So the first element in our co-factor matrix C is twenty-four:

C =		24

Remember that the sign used to modify the value of a determinant to get the co-factor depends on the row number (i) and column number (j) of the matrix element from which it is derived. Essentially, the co-factor will be the determinant multiplied by (-1)^(i+j), i.e. by plus one if i + j is even, or by minus one if i + j is odd. If in doubt, use the sign matrix we saw earlier as a reference. Let's find the remaining co-factors:

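$$C_{1,2} = -\begin{vmatrix} 0 & 5 \\ 1 & 6 \end{vmatrix} = -((0)(6) - (1)(5)) = -(0 - 5) = 5$$

$$C_{1,3} = +\begin{vmatrix} 0 & 4 \\ 1 & 0 \end{vmatrix} = +((0)(0) - (1)(4)) = +(0 - 4) = -4$$

$$C_{2,1} = -\begin{vmatrix} 2 & 3 \\ 0 & 6 \end{vmatrix} = -((2)(6) - (0)(3)) = -(12 - 0) = -12$$

$$C_{2,2} = +\begin{vmatrix} 1 & 3 \\ 1 & 6 \end{vmatrix} = +((1)(6) - (1)(3)) = +(6 - 3) = 3$$

$$C_{2,3} = -\begin{vmatrix} 1 & 2 \\ 1 & 0 \end{vmatrix} = -((1)(0) - (1)(2)) = -(0 - 2) = 2$$

$$C_{3,1} = +\begin{vmatrix} 2 & 3 \\ 4 & 5 \end{vmatrix} = +((2)(5) - (4)(3)) = +(10 - 12) = -2$$

$$C_{3,2} = -\begin{vmatrix} 1 & 3 \\ 0 & 5 \end{vmatrix} = -((1)(5) - (0)(3)) = -(5 - 0) = -5$$

$$C_{3,3} = +\begin{vmatrix} 1 & 2 \\ 0 & 4 \end{vmatrix} = +((1)(4) - (0)(2)) = +(4 - 0) = 4$$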

Here is the complete co-factor matrix:

C =	24	5	-4
	-12	3	2
	-2	-5	4

We now transpose the co-factor matrix to get the adjoint of A:

adj A = C^T =	24	-12	-2
	5	3	-5
	-4	2	4

We still need to find the determinant of A. If we use the first row to determine the minors, the calculation will be as follows:

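$$|A| = \begin{vmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 1 & 0 & 6 \end{vmatrix} = 1\begin{vmatrix} 4 & 5 \\ 0 & 6 \end{vmatrix} - 2\begin{vmatrix} 0 & 5 \\ 1 & 6 \end{vmatrix} + 3\begin{vmatrix} 0 & 4 \\ 1 & 0 \end{vmatrix}$$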

|A| = 1((4)(6)-(0)(5)) - 2((0)(6)-(1)(5)) + 3((0)(0)-(1)(4))

|A| = 1(24-0) - 2(0-5) + 3(0-4)

|A| = 1(24) - 2(-5) + 3(-4)

|A| = 24 + 10 - 12

|A| = 22

The inverse of matrix A is therefore given by:

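$$A^{-1} = \frac{\operatorname{adj} A}{|A|} = \frac{1}{22}\begin{bmatrix} 24 & -12 & -2 \\ 5 & 3 & -5 \\ -4 & 2 & 4 \end{bmatrix} = \begin{bmatrix} \frac{12}{11} & -\frac{6}{11} & -\frac{1}{11} \\ \frac{5}{22} & \frac{3}{22} & -\frac{5}{22} \\ -\frac{2}{11} & \frac{1}{11} & \frac{2}{11} \end{bmatrix}$$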
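The adjoint method can be checked mechanically. The following Python sketch (the function names are our own hypothetical choices, and it assumes a square matrix of size two-by-two or larger) builds the co-factor matrix, transposes it to get the adjoint, and divides by the determinant using exact fractions from Python's standard `fractions` module.

```python
from fractions import Fraction


def minor(m, row, col):
    """Matrix left after deleting the given row and column (zero-based)."""
    return [r[:col] + r[col + 1:] for i, r in enumerate(m) if i != row]


def det(m):
    """Determinant by co-factor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def inverse_adjugate(m):
    """Inverse as (1/|M|) adj M; assumes a square matrix of size 2 or more."""
    d = det(m)
    if d == 0:
        raise ValueError("matrix is singular: its determinant is zero")
    n = len(m)
    # Co-factor matrix: C[i][j] = (-1)^(i+j) times the determinant of the minor.
    cof = [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)] for i in range(n)]
    # The adjoint is the transpose of the co-factor matrix.
    adj = [[cof[j][i] for j in range(n)] for i in range(n)]
    return [[Fraction(x, d) for x in row] for row in adj]


A = [[1, 2, 3],
     [0, 4, 5],
     [1, 0, 6]]
for row in inverse_adjugate(A):
    print([str(x) for x in row])
# ['12/11', '-6/11', '-1/11']
# ['5/22', '3/22', '-5/22']
# ['-2/11', '1/11', '2/11']
```

Multiplying A by the result gives the three-by-three identity matrix, which is a useful final check on any hand calculation.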

Eigenvalues and eigenvectors


An eigenvalue of a square matrix is a scalar value (i.e. a real or a complex number) that is usually represented by the Greek letter lambda (λ). An eigenvector of a square matrix is a column vector which, when multiplied by the matrix, produces a scalar multiple of itself. An eigenvector must be non-zero (i.e. it cannot be the zero vector). An eigenvector is usually represented by the lower case character v. For any square matrix M, an eigenvector and its associated eigenvalue must satisfy the equation:

Mv = λv

For any n-by-n square matrix, there will be n eigenvalues, provided that complex eigenvalues are included and that repeated eigenvalues are counted separately (bear in mind that some of the eigenvalues may have the same value). Unlike an eigenvector, an eigenvalue is not required to be non-zero, but it can be shown that if one or more of the eigenvalues of a square matrix is zero, then the matrix is singular (i.e. it is not invertible). The coupling of an eigenvector and its associated eigenvalue is known as an eigenpair. Note that while an eigenvector may only be associated with a single eigenvalue, an eigenvalue may be associated with any number of eigenvectors, since any (non-zero) scalar multiple of an eigenvector of M (i.e. an eigenvector of M multiplied by a non-zero number) is also an eigenvector of M. Indeed, this is what makes eigenvectors so useful: along the direction of an eigenvector, the linear transformation represented by M acts simply as a scaling by the factor λ.

The eigenvector that directly corresponds to the largest eigenvalue of a matrix (in terms of its absolute value) is called the dominant eigenvector of the matrix, and the eigenpair consisting of the dominant eigenvector and its associated eigenvalue is called the dominant eigenpair. Finding the eigenvalues and eigenvectors of a system is an important process in many areas of science, engineering and electronics. Engineers, for example, use them to analyse the structure of a bridge or a building to ensure that it will remain stable as conditions (e.g. wind speed, water level, or seismic activity) vary. The process of deriving a set of eigenvalues and eigenvectors for a square matrix is known as eigendecomposition.

We can find the eigenvalues of an n-by-n matrix M by solving the general equation:

det(M - λI_n) = 0

You may recall that I_n is the n-by-n identity matrix. This equation is called the characteristic equation of M (its left-hand side, det(M - λI_n), is called the characteristic polynomial of M). It is a polynomial equation of degree n, which means it can have at most n distinct real solutions. Consider the following two-by-two matrix:

A =		3	6
		1	4

Here is the corresponding two-by-two identity matrix, I₂:

I₂ =		1	0
		0	1

If we multiply I₂ by lambda (λ) we get:

λI₂ =		λ	0
		0	λ

Subtracting λI₂ from A gives us:

A - λI₂ =		3 - λ	6
		1	4 - λ

And the determinant of A - λI₂ is given by:

$$\begin{vmatrix} 3 - \lambda & 6 \\ 1 & 4 - \lambda \end{vmatrix} = (3 - \lambda)(4 - \lambda) - (1)(6) = \lambda^2 - 7\lambda + 6$$

The result is a second degree characteristic polynomial (in other words, a quadratic expression) which will factorise quite easily as (λ - 1)(λ - 6). Remembering that the general equation to find the eigenvalues of an n-by-n matrix is:

det(M - λI_n) = 0

we can substitute our factorised quadratic expression to get the following equation:

det(A - λI₂) = (λ - 1)(λ - 6) = 0

Thus the eigenvalues of matrix A are one and six (λ=1 or λ=6). Let's have a look at another example. Consider this two-by-two matrix:

A =		1	-2
		-2	0

We have already seen that if we multiply the identity matrix I₂ by lambda (λ) we get:

λI₂ =		λ	0
		0	λ

Subtracting λI₂ from A gives us:

A - λI₂ =		1 - λ	-2
		-2	0 - λ

And the determinant of A - λI₂ is given by:

$$\begin{vmatrix} 1 - \lambda & -2 \\ -2 & 0 - \lambda \end{vmatrix} = (1 - \lambda)(0 - \lambda) - (-2)(-2) = \lambda^2 - \lambda - 4$$

The result is again a quadratic expression, but this time it will not factorise, so we need to call upon the quadratic formula:

$$\lambda = \frac{1 \pm \sqrt{1 + 16}}{2}$$

$$\lambda = \frac{1 \pm \sqrt{17}}{2}$$

So the eigenvalues of A are λ ≈ -1.56 and λ ≈ 2.56. Note that the value of the discriminant (the expression under the radical, or square root, symbol) will tell us whether the roots of the associated quadratic equation, i.e. the eigenvalues, are real or complex. If the discriminant is positive, there are two distinct real eigenvalues. If the discriminant is zero, there is a single real eigenvalue (or if you like, two identical real eigenvalues). A negative discriminant indicates that the eigenvalues are complex numbers.

For any square matrix, the product of the eigenvalues of the matrix is equal to the determinant of the matrix. This can be used as a quick check to see whether the eigenvalues have been calculated correctly. Here is the matrix for which we have just found the eigenvalues λ ≈ -1.56 and λ ≈ 2.56:

A =		1	-2
		-2	0

The determinant is given by:

(1)(0) - (-2)(-2) = -4

The product of the eigenvalues is given by:

-1.56 × 2.56 = -3.99

Bearing in mind that we rounded off the results of the calculations that we used to find the eigenvalues, it is fairly safe to say that the (approximate) values given for λ are correct.
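For a two-by-two matrix whose top row is a, b and whose bottom row is c, d, expanding det(M - λI₂) always gives λ² - (a + d)λ + (ad - bc), so the eigenvalues follow directly from the quadratic formula. The Python sketch below (the function name `eigenvalues_2x2` is a hypothetical choice) does exactly that, switching to the standard `cmath` module when the discriminant is negative, and repeats the product-of-eigenvalues check for both example matrices without any intermediate rounding.

```python
import cmath
import math


def eigenvalues_2x2(m):
    """Roots of lambda^2 - (a + d)*lambda + (a*d - b*c) for m = [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    trace = a + d
    determinant = a * d - b * c
    discriminant = trace * trace - 4 * determinant
    if discriminant >= 0:
        root = math.sqrt(discriminant)   # real eigenvalues (equal if discriminant is 0)
    else:
        root = cmath.sqrt(discriminant)  # complex conjugate pair
    return (trace - root) / 2, (trace + root) / 2


for m in ([[3, 6], [1, 4]], [[1, -2], [-2, 0]]):
    l1, l2 = eigenvalues_2x2(m)
    det_m = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    print(m, "eigenvalues:", l1, l2, "product:", l1 * l2, "determinant:", det_m)
```

Working from the exact value √17 rather than the rounded eigenvalues, the product comes out as -4 up to floating-point error, confirming the check above.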

Since the degree of the characteristic polynomial increases with the size of the matrix, the use of the characteristic equation to find eigenvalues becomes impractical once the size of the matrix becomes too large. We will in any case not be discussing matrices larger than three-by-three in the context of eigenvalues and eigenvectors on this page. We now need to address the issue of finding the eigenvectors for a given matrix, which of course involves first finding the eigenvalues. Let's look again at the following two-by-two matrix:

A =		3	6
		1	4

We have already found the eigenvalues for this matrix (λ = 1 and λ = 6). We will start by finding an eigenvector corresponding to λ = 6. We can rewrite the equation Av = λv as:

(A - λI₂)v = 0

Note that the use of the identity matrix is required here because we cannot subtract a scalar value from a matrix directly. We can represent this equation using matrices as follows:

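$$\left(\begin{bmatrix} 3 & 6 \\ 1 & 4 \end{bmatrix} - \begin{bmatrix} 6 & 0 \\ 0 & 6 \end{bmatrix}\right) v = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad\Rightarrow\quad \begin{bmatrix} -3 & 6 \\ 1 & -2 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$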

We will now create an augmented matrix (a matrix created by appending the columns of two separate matrices) by combining the matrix we got for A - λI₂ with the two-by-one column matrix (which is the zero matrix, 0_2,1) on the right-hand side of the equation. In this case, we are effectively using an augmented matrix to represent both the coefficient matrix and the right-hand side vector of a linear system for which we are trying to find the variables v₁ and v₂ (note that the convention is to separate the two using a vertical line, as shown below). Once we have the augmented matrix, we can manipulate it using basic row operations (this is effectively a shortcut for manipulating the corresponding linear equations). Here is our augmented matrix:

	-3	6	|	0
	1	-2	|	0

We can simplify the first row of our augmented matrix by multiplying it by one third (¹/₃):

	-1	2	|	0
	1	-2	|	0

Now we add the first row to the second row:

	-1	2	|	0
	0	0	|	0

The last step is to multiply the first row by minus one:

	1	-2	|	0
	0	0	|	0

We used a process of Gaussian elimination to get rid of the second row. You may be wondering why we said that we were multiplying the first row by one third, as opposed to dividing it by three. The short answer is that division is not one of the defined row operations. We are however being somewhat pedantic, since the result of multiplying something by the reciprocal of a scalar value is the same as dividing it by the scalar value itself. Dividing a row by a (non-zero) scalar is therefore a valid row operation. Similarly, subtracting one row from another row is not one of the defined row operations, but multiplying a row by minus one and then adding it to another row has exactly the same outcome as subtracting it from the other row! The linear system represented by this matrix can now be represented using the following linear equation:

v₁ - 2v₂ = 0

Obviously there are infinitely many pairs of values of v₁ and v₂ that would satisfy this equation. However, if we let v₂ equal c, then v₁ will equal 2c. All eigenvectors of A that correspond to the eigenvalue λ = 6 can thus be written as:

c		2
		1

where c is some arbitrary non-zero constant value (remember that an eigenvector cannot be the zero vector). We will now repeat the process to find an eigenvector corresponding to λ = 1:

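$$\left(\begin{bmatrix} 3 & 6 \\ 1 & 4 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\right) v = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad\Rightarrow\quad \begin{bmatrix} 2 & 6 \\ 1 & 3 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$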

Here is our augmented matrix:

	2	6	|	0
	1	3	|	0

We add the second row of our augmented matrix, multiplied by minus one, to the first row:

	1	3	|	0
	1	3	|	0

Now we add the result of multiplying the first row by minus one to the second row:

	1	3	|	0
	0	0	|	0

The linear system represented by this matrix can be represented using the linear equation:

v₁ + 3v₂ = 0

If we let v₂ equal c, then v₁ will equal -3c. All eigenvectors of A that correspond to the eigenvalue λ = 1 can thus be written as:

c		-3
		1

where c is some arbitrary non-zero constant value. Let's look now at a more complex example. Consider the following three-by-three matrix:

A =	1	2	1
	6	-1	0
	-1	-2	-1

We can find the eigenvalues of A by solving the characteristic equation:

det(A - λI₃) = 0

This expands to:

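$$\begin{aligned} \det(A - \lambda I_3) = \begin{vmatrix} 1 - \lambda & 2 & 1 \\ 6 & -1 - \lambda & 0 \\ -1 & -2 & -1 - \lambda \end{vmatrix} = {} & (1 - \lambda)\begin{vmatrix} -1 - \lambda & 0 \\ -2 & -1 - \lambda \end{vmatrix} - 2\begin{vmatrix} 6 & 0 \\ -1 & -1 - \lambda \end{vmatrix} \\ & + 1\begin{vmatrix} 6 & -1 - \lambda \\ -1 & -2 \end{vmatrix} \end{aligned}$$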

The calculation that results is:

det(A - λI₃) = (1 - λ)((-1 - λ)(-1 - λ) - (-2)(0)) - 2((6)(-1 - λ) - (-1)(0)) + ((6)(-2) - (-1)(-1 - λ))

det(A - λI₃) = - λ³ - λ² + λ + 1 + 12 + 12λ -12 - 1 - λ

det(A - λI₃) = - λ³ - λ² + 12λ

As we might expect for a three-by-three matrix, the result is a third degree polynomial. Fortunately for us, the polynomial expression will factor quite nicely as:

- λ(λ + 4)(λ - 3)
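Expanding a cubic by hand is error-prone, so it is worth checking the result. This short Python sketch (the function name `char_poly_value` is our own) evaluates det(A - λI₃) directly from the first-row co-factor expansion for a range of integer values of λ, and confirms that it agrees with both the expanded and the factorised forms of the polynomial. Agreement at more than three points is sufficient, because two different polynomials of degree three can agree at no more than three values of λ.

```python
def char_poly_value(lam):
    """det(A - lam*I) for A = [[1, 2, 1], [6, -1, 0], [-1, -2, -1]], expanded along row one."""
    a = [[1 - lam, 2, 1],
         [6, -1 - lam, 0],
         [-1, -2, -1 - lam]]
    return (a[0][0] * (a[1][1] * a[2][2] - a[2][1] * a[1][2])
            - a[0][1] * (a[1][0] * a[2][2] - a[2][0] * a[1][2])
            + a[0][2] * (a[1][0] * a[2][1] - a[2][0] * a[1][1]))


for lam in range(-5, 6):
    expanded = -lam**3 - lam**2 + 12 * lam
    factorised = -lam * (lam + 4) * (lam - 3)
    assert char_poly_value(lam) == expanded == factorised

# The integer values of lambda at which the determinant vanishes are the eigenvalues.
print([lam for lam in range(-5, 6) if char_poly_value(lam) == 0])  # [-4, 0, 3]
```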

Again remembering that the general equation to find the eigenvalues of an n-by-n matrix is:

det(M - λI_n) = 0

we can substitute our factorised cubic expression to get the following equation:

det(A - λI₃) = - λ(λ + 4)(λ - 3) = 0

Thus the eigenvalues of matrix A are zero, minus four and three (λ = 0, λ = -4 or λ = 3). Having found the eigenvalues, we can now look for eigenvectors that correspond to them. We will start by finding an eigenvector corresponding to λ = 0. Remember that we can rewrite the equation Av = λv as:

(A - λI₃)v = 0

If we expand this equation, we get the following:

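$$\left(\begin{bmatrix} 1 & 2 & 1 \\ 6 & -1 & 0 \\ -1 & -2 & -1 \end{bmatrix} - \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}\right) v = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \quad\Rightarrow\quad \begin{bmatrix} 1 & 2 & 1 \\ 6 & -1 & 0 \\ -1 & -2 & -1 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$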

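Note that subtracting λI₃ from A when λ = 0 leaves A unchanged. We will create the augmented matrix using the three-by-one column matrix on the right-hand side of the equation. Because that column consists entirely of zeros, and no elementary row operation can change this, we will leave it out of the matrices that follow and show only the coefficients: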

1	2	1
6	-1	0
-1	-2	-1

If we add the first row to the last row, we get:

1	2	1
6	-1	0
0	0	0

If we now add the second row, multiplied by two, to the first row (which is the same as subtracting the second row multiplied by minus two), we get:

13	0	1
6	-1	0
0	0	0

The linear system represented by this matrix can now be represented using the following linear equations:

13v₁ + v₃ = 0

6v₁ - v₂ = 0

From these equations, we can see that v₂ is equal to 6v₁, and that v₃ is equal to -13v₁. If we let v₁ equal c, all eigenvectors of A that correspond to the eigenvalue λ = 0 can be written as:

c		1
		6
		-13

Let's repeat the process to find an eigenvector corresponding to λ = -4:

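$$\left(\begin{bmatrix} 1 & 2 & 1 \\ 6 & -1 & 0 \\ -1 & -2 & -1 \end{bmatrix} - \begin{bmatrix} -4 & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 0 & -4 \end{bmatrix}\right) v = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \quad\Rightarrow\quad \begin{bmatrix} 5 & 2 & 1 \\ 6 & 3 & 0 \\ -1 & -2 & 3 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$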

Here is the augmented matrix:

5	2	1
6	3	0
-1	-2	3

Add the third row multiplied by five to the first row:

0	-8	16
6	3	0
-1	-2	3

Add the third row multiplied by six to the second row:

0	-8	16
0	-9	18
-1	-2	3

Now multiply the first row by one-eighth (¹/₈) and the second row by one-ninth (¹/₉):

0	-1	2
0	-1	2
-1	-2	3

Switch the first and third rows:

-1	-2	3
0	-1	2
0	-1	2

Add the second row multiplied by minus one to the third row:

-1	-2	3
0	-1	2
0	0	0

Switching rows here demonstrates one of the requirements for getting the matrix into something called row-echelon form - see below for an explanation of what this means. Add the second row multiplied by minus two to the first row:

-1	0	-1
0	-1	2
0	0	0

Finally, multiply rows one and two by minus one:

1	0	1
0	1	-2
0	0	0

We now have the matrix in what is called reduced row-echelon form - again, see below for an explanation of this term. The linear system represented by this matrix can now be represented using the following linear equations:

v₁ + v₃ = 0

v₂ - 2v₃ = 0

From these equations, we can see that v₃ is equal to -v₁, and that v₂ is equal to -2v₁. If we let v₁ equal c, all eigenvectors of A that correspond to the eigenvalue λ = -4 can be written as:

c		1
		-2
		-1

Let's find an eigenvector corresponding to λ = 3:

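$$\left(\begin{bmatrix} 1 & 2 & 1 \\ 6 & -1 & 0 \\ -1 & -2 & -1 \end{bmatrix} - \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}\right) v = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \quad\Rightarrow\quad \begin{bmatrix} -2 & 2 & 1 \\ 6 & -4 & 0 \\ -1 & -2 & -4 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$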

Here is the augmented matrix:

-2	2	1
6	-4	0
-1	-2	-4

Add the third row to the first row:

-3	0	-3
6	-4	0
-1	-2	-4

Multiply the first row by minus one third (-¹/₃):

1	0	1
6	-4	0
-1	-2	-4

Multiply the second row by one half (¹/₂):

1	0	1
3	-2	0
-1	-2	-4

Add the second row multiplied by minus one to the third row:

1	0	1
3	-2	0
-4	0	-4

Add the first row multiplied by four to the third row:

1	0	1
3	-2	0
0	0	0

The linear system represented by this matrix can now be represented using the following linear equations:

v₁ + v₃ = 0

3v₁ - 2v₂ = 0

From these equations, we can see that v₃ is equal to -v₁, and that 2v₂ is equal to 3v₁. To avoid fractions, we let v₁ equal 2c, so that v₂ equals 3c and v₃ equals -2c. All eigenvectors of A that correspond to the eigenvalue λ = 3 can thus be written as:

c		2
		3
		-2
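A quick way to confirm any of the eigenpairs found above is to multiply the matrix by the eigenvector and compare the result with λ times the eigenvector. The Python sketch below (the helper names `mat_vec` and `is_eigenpair` are hypothetical) performs that check with exact integer arithmetic for every eigenpair found so far, taking c = 1 in each case.

```python
def mat_vec(m, v):
    """Multiply matrix m by column vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def is_eigenpair(m, lam, v):
    """True if v is non-zero and m v equals lam v exactly."""
    return any(x != 0 for x in v) and mat_vec(m, v) == [lam * x for x in v]


A2 = [[3, 6], [1, 4]]
A3 = [[1, 2, 1], [6, -1, 0], [-1, -2, -1]]
checks = [
    (A2, 6, [2, 1]),
    (A2, 1, [-3, 1]),
    (A3, 0, [1, 6, -13]),
    (A3, -4, [1, -2, -1]),
    (A3, 3, [2, 3, -2]),
]
for m, lam, v in checks:
    print("lambda =", lam, "v =", v, "eigenpair:", is_eigenpair(m, lam, v))
```

Any non-zero multiple of each vector (for example [4, 6, -4] for λ = 3) passes the same check, as the discussion of eigenpairs leads us to expect.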

Row-echelon form


We employ elementary row operations to manipulate the rows of a matrix in various ways. In the case of an augmented matrix of the kind seen above, which essentially represents a set of linear equations, the row-equivalent matrix produced by a series of elementary row operations will have the same solution set as the original matrix. To get down to specifics, there are three operations that can be applied to an augmented matrix representing a linear system without changing the solution set:

interchanging (swapping) two rows
multiplying a row by a (non-zero) constant value
adding a multiple of one row to another row

We have already seen that for a system of linear equations, each equation in the system can be represented by one row of an augmented matrix. The row-echelon form of a matrix represents a row-equivalent form of the original matrix, and can be produced using the elementary row operations described above in a process of Gaussian elimination. In order for us to be able to state that a matrix is in row-echelon form, the following conditions must be met:

any rows that consist entirely of zeros must be at the bottom of the matrix
the first non-zero element of any row that is not entirely zeros must be a one (this is called the leading one)
the leading one in any row must be to the right of the leading one in the previous row (but not necessarily to the immediate right)

There is also a form called reduced row-echelon form in which all of the above conditions apply, together with the requirement that every other element in a column containing a leading one (whether above or below it) must be zero. The only significant differences between the row-echelon form and the reduced row-echelon form are that the latter form is unique for a particular matrix, and provides solutions to the system without requiring any further substitutions. A distinction is made between Gaussian elimination (which is used to reduce a matrix to basic row-echelon form) and Gauss-Jordan elimination (which extends Gaussian elimination by also clearing the elements above each leading one, in order to reduce a matrix to reduced row-echelon form). Reducing a matrix to row-echelon or reduced row-echelon form sometimes involves the use of fractions, as we will see below.
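The conditions above can be turned into a procedure. The following Python sketch of Gauss-Jordan elimination (the function name `rref` is our own, and exact fractions from the standard `fractions` module are assumed to avoid rounding) uses only the three elementary row operations: it interchanges rows to bring a non-zero element into position, multiplies that row by the reciprocal of the element to create a leading one, and adds multiples of it to every other row to clear the rest of the column.

```python
from fractions import Fraction


def rref(m):
    """Reduce a matrix to reduced row-echelon form using elementary row operations."""
    a = [[Fraction(x) for x in row] for row in m]
    rows, cols = len(a), len(a[0])
    pivot_row = 0
    for col in range(cols):
        # Find a row at or below pivot_row with a non-zero element in this column.
        r = next((i for i in range(pivot_row, rows) if a[i][col] != 0), None)
        if r is None:
            continue  # no leading one can be placed in this column
        a[pivot_row], a[r] = a[r], a[pivot_row]           # interchange two rows
        lead = a[pivot_row][col]
        a[pivot_row] = [x / lead for x in a[pivot_row]]   # scale to get a leading one
        for i in range(rows):
            if i != pivot_row and a[i][col] != 0:
                factor = a[i][col]
                # Add (-factor) times the pivot row to row i, clearing this column.
                a[i] = [x - factor * y for x, y in zip(a[i], a[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return a


for row in rref([[5, 2, 1], [6, 3, 0], [-1, -2, 3]]):
    print([str(x) for x in row])
# ['1', '0', '1']
# ['0', '1', '-2']
# ['0', '0', '0']
```

Because the reduced row-echelon form of a matrix is unique, the output for the λ = -4 matrix matches the hand calculation earlier on the page, even though the sequence of row operations is different.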

In the solution we obtained above when finding the eigenvectors for our three-by-three matrix, we cheated a little bit to save time. The augmented matrices we used to find the eigenvectors for the eigenvalues zero and three were not reduced to row echelon form. We present below the sequence of row operations required to get each of these matrices into strict reduced row echelon form.

We will start with the row operation sequence for the augmented matrix representing the eigenvalue λ = 0:

1	2	1
6	-1	0
-1	-2	-1

Add the first row multiplied by minus six to the second row:

1	2	1
0	-13	-6
-1	-2	-1

Add the first row to the last row:

1	2	1
0	-13	-6
0	0	0

Multiply the second row by minus one thirteenth (-¹/₁₃):

$$\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & \frac{6}{13} \\ 0 & 0 & 0 \end{bmatrix}$$

Add the second row multiplied by minus two to the first row:

$$\begin{bmatrix} 1 & 0 & \frac{1}{13} \\ 0 & 1 & \frac{6}{13} \\ 0 & 0 & 0 \end{bmatrix}$$

The augmented matrix now satisfies all of the requirements for both the row echelon form and the reduced row echelon form. Here is the row operation sequence for the augmented matrix representing the eigenvalue λ = 3:

-2	2	1
6	-4	0
-1	-2	-4

Multiply the first row by minus one half (-¹/₂):

$$\begin{bmatrix} 1 & -1 & -\frac{1}{2} \\ 6 & -4 & 0 \\ -1 & -2 & -4 \end{bmatrix}$$

Add the first row multiplied by minus six to the second row:

$$\begin{bmatrix} 1 & -1 & -\frac{1}{2} \\ 0 & 2 & 3 \\ -1 & -2 & -4 \end{bmatrix}$$

Add the first row to the third row:

$$\begin{bmatrix} 1 & -1 & -\frac{1}{2} \\ 0 & 2 & 3 \\ 0 & -3 & -\frac{9}{2} \end{bmatrix}$$

Multiply the second row by one half (¹/₂):

$$\begin{bmatrix} 1 & -1 & -\frac{1}{2} \\ 0 & 1 & \frac{3}{2} \\ 0 & -3 & -\frac{9}{2} \end{bmatrix}$$

Add the second row multiplied by three to the third row:

$$\begin{bmatrix} 1 & -1 & -\frac{1}{2} \\ 0 & 1 & \frac{3}{2} \\ 0 & 0 & 0 \end{bmatrix}$$

Add the second row to the first row:

$$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & \frac{3}{2} \\ 0 & 0 & 0 \end{bmatrix}$$

Solving systems of linear equations


We have already seen how matrices can be used to find eigenvectors for a system of linear equations. By definition, however, an eigenvector does not provide a unique solution for any one equation. Solving systems of linear equations whose constant terms are known can also be handled using matrices, as we have seen from the examples provided in the Chinese work Jiuzhang Suanshu. As with eigenvectors, each equation in the system becomes one row of the matrix. Each column in the coefficient matrix represents a single variable. A separate solution vector consisting of a single column holds the known (constant) values from the right-hand side of each equation, and another column matrix contains the names of the variables (i.e. the unknown values that we are trying to find). An example will help to clarify. Consider the following system of linear equations:

x - y + 3z = 11

2x - y + 4z = 24

2x + 2y + z = 39

This linear system could be represented using matrices as follows:

1	-1	3	x	=	11
2	-1	4	y		24
2	2	1	z		39

If we call the coefficient matrix M, the variable matrix X, and the solution vector matrix B, then we can express this relationship as:

MX = B

Rearranging this equation, we get:

$$X = \frac{B}{M}$$

Therein lies a problem, because we cannot directly divide one matrix by another. We can however multiply matrix B by the inverse of matrix M, which effectively achieves the same result. Here is the revised equation:

X = M^-1B

We can expand this as:

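$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 & -1 & 3 \\ 2 & -1 & 4 \\ 2 & 2 & 1 \end{bmatrix}^{-1}\begin{bmatrix} 11 \\ 24 \\ 39 \end{bmatrix}$$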

You may recall that the inverse of a matrix M is given by the following formula:

$$M^{-1} = \frac{1}{|M|}\,\operatorname{adj} M$$

We therefore need to find the determinant of M and the adjoint of M. The necessary calculations for finding the determinant and the adjoint of a three by three matrix have been covered above, and we will not revisit them here. If you carry out these calculations, however, you should get the following results:

$$|M| = 3, \qquad \operatorname{adj} M = \begin{bmatrix} -9 & 7 & -1 \\ 6 & -5 & 2 \\ 6 & -4 & 1 \end{bmatrix}$$

We therefore have:

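$$M^{-1} = \frac{1}{3}\begin{bmatrix} -9 & 7 & -1 \\ 6 & -5 & 2 \\ 6 & -4 & 1 \end{bmatrix} = \begin{bmatrix} -3 & \frac{7}{3} & -\frac{1}{3} \\ 2 & -\frac{5}{3} & \frac{2}{3} \\ 2 & -\frac{4}{3} & \frac{1}{3} \end{bmatrix}$$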

We can now solve for X:

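$$X = \begin{bmatrix} -3 & \frac{7}{3} & -\frac{1}{3} \\ 2 & -\frac{5}{3} & \frac{2}{3} \\ 2 & -\frac{4}{3} & \frac{1}{3} \end{bmatrix}\begin{bmatrix} 11 \\ 24 \\ 39 \end{bmatrix} = \begin{bmatrix} 10 \\ 8 \\ 3 \end{bmatrix}$$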

The values of the variables in our linear system are therefore x = 10, y = 8 and z = 3. In cases where the determinant of matrix M turns out to be zero, the matrix is singular (i.e. it cannot be inverted). In such a scenario, there is no unique solution to the system of linear equations. This means that there may be infinitely many solutions, but it could also mean that there is no solution at all. The system is said to be consistent if it has at least one solution (whether a unique solution or infinitely many), and inconsistent if it has no solutions.
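The multiplication X = M⁻¹B can be checked with a few lines of Python. In this sketch the inverse is entered by hand exactly as derived above, as fractions from the standard `fractions` module, and `mat_vec` is a hypothetical helper name for a plain matrix-by-vector product. Substituting the result back into MX confirms that it reproduces B.

```python
from fractions import Fraction

# The inverse of M as derived above: one third of adj M.
M_inv = [[Fraction(-3), Fraction(7, 3), Fraction(-1, 3)],
         [Fraction(2), Fraction(-5, 3), Fraction(2, 3)],
         [Fraction(2), Fraction(-4, 3), Fraction(1, 3)]]
M = [[1, -1, 3],
     [2, -1, 4],
     [2, 2, 1]]
B = [11, 24, 39]


def mat_vec(m, v):
    """Multiply matrix m by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


X = mat_vec(M_inv, B)
print([str(x) for x in X])  # ['10', '8', '3']
print(mat_vec(M, X) == B)   # True: substituting back reproduces B
```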

Cramer's rule

Cramer's rule (named after Swiss mathematician Gabriel Cramer) describes how to find the solution to a linear system using determinants. One of the useful things about Cramer's rule is that it allows us to find the value of a single variable in a linear system without having to solve the whole system. The downside is that Cramer's rule only works if the coefficient matrix (i.e. the matrix that represents the coefficients of the linear system) is invertible. In order for this to be the case, the determinant of the coefficient matrix must be non-zero. Consider again the equation that describes the relationship between the coefficient matrix M, the variable matrix X, and the solution vector matrix B:

MX = B

Assuming that M is invertible, Cramer's rule states that:

$$x_n = \frac{|M_n|}{|M|}$$

where x_n represents the nth element of matrix X, and matrix M_n is obtained by replacing the nth column of matrix M (i.e. the column representing the coefficients of the nth variable) with the elements of column matrix B. Let's put this to the test for a linear system in two variables. Consider the following linear system:

5x - 4y = 2

6x - 5y = 1

Here are the coefficient, variable, and solution vector matrices:

M =		5	-4
		6	-5

X =		x
		y

B =		2
		1

Here is the complete linear system expressed in matrix form:

	5	-4			x		=		2
	6	-5			y				1

First we will find the three determinants |M|, |M_x| and |M_y|:

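$$|M| = \begin{vmatrix} 5 & -4 \\ 6 & -5 \end{vmatrix} = (5)(-5) - (6)(-4) = -25 + 24 = -1$$

$$|M_x| = \begin{vmatrix} 2 & -4 \\ 1 & -5 \end{vmatrix} = (2)(-5) - (1)(-4) = -10 + 4 = -6$$

$$|M_y| = \begin{vmatrix} 5 & 2 \\ 6 & 1 \end{vmatrix} = (5)(1) - (6)(2) = 5 - 12 = -7$$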

Having found our determinants, we can now apply Cramer's rule:

$$x = \frac{|M_x|}{|M|} = \frac{-6}{-1} = 6$$

$$y = \frac{|M_y|}{|M|} = \frac{-7}{-1} = 7$$

Let's try solving a linear system in three variables. Consider the following linear system:

4x - y + z = -5

2x + 2y + 3z = 10

5x - 2y + 6z = 1

Here are the coefficient, variable, and solution vector matrices:

M =	4	-1	1
	2	2	3
	5	-2	6

X =		x
		y
		z

B =		-5
		10
		1

Here is the complete linear system expressed in matrix form:

4	-1	1	x	=	-5
2	2	3	y		10
5	-2	6	z		1

First we will find the four determinants |M|, |M_x|, |M_y| and |M_z|:

$$|M| = \begin{vmatrix} 4 & -1 & 1 \\ 2 & 2 & 3 \\ 5 & -2 & 6 \end{vmatrix} = 4((2)(6) - (-2)(3)) + ((2)(6) - (5)(3)) + ((2)(-2) - (5)(2))$$

= 4(12 + 6) + (12 - 15) + (-4 - 10)

= 72 - 3 - 14 = 55

$$|M_x| = \begin{vmatrix} -5 & -1 & 1 \\ 10 & 2 & 3 \\ 1 & -2 & 6 \end{vmatrix} = -5((2)(6) - (-2)(3)) + ((10)(6) - (1)(3)) + ((10)(-2) - (1)(2))$$

= -5(12 + 6) + (60 - 3) + (-20 - 2)

= -90 + 57 - 22 = -55

$$|M_y| = \begin{vmatrix} 4 & -5 & 1 \\ 2 & 10 & 3 \\ 5 & 1 & 6 \end{vmatrix} = 4((10)(6) - (1)(3)) + 5((2)(6) - (5)(3)) + ((2)(1) - (5)(10))$$

= 4(60 - 3) + 5(12 - 15) + (2 - 50)

= 228 - 15 - 48 = 165

$$|M_z| = \begin{vmatrix} 4 & -1 & -5 \\ 2 & 2 & 10 \\ 5 & -2 & 1 \end{vmatrix} = 4((2)(1) - (-2)(10)) + ((2)(1) - (5)(10)) - 5((2)(-2) - (5)(2))$$

= 4(2 + 20) + (2 - 50) - 5(-4 - 10)

= 88 - 48 + 70 = 110

Having found our determinants, we can now apply Cramer's rule:

$$x = \frac{|M_x|}{|M|} = \frac{-55}{55} = -1$$

$$y = \frac{|M_y|}{|M|} = \frac{165}{55} = 3$$

$$z = \frac{|M_z|}{|M|} = \frac{110}{55} = 2$$
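Cramer's rule is easy to automate, because each variable needs only one extra determinant. The following Python sketch uses hypothetical function names and computes each determinant by first-row co-factor expansion, which is fine for small systems like these but far too slow for large ones. It builds each matrix M_n by replacing column n of M with the elements of B, and returns exact fractions.

```python
from fractions import Fraction


def det(m):
    """Determinant by co-factor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]  # minor of the first-row element j
        total += (-1) ** j * a * det(sub)
    return total


def cramer(m, b):
    """Solve m x = b by Cramer's rule; m must be square with a non-zero determinant."""
    d = det(m)
    if d == 0:
        raise ValueError("coefficient matrix is singular; Cramer's rule does not apply")
    solution = []
    for k in range(len(m)):
        # M_k: replace column k of m with the elements of b.
        m_k = [row[:k] + [b_i] + row[k + 1:] for row, b_i in zip(m, b)]
        solution.append(Fraction(det(m_k), d))
    return solution


print([str(v) for v in cramer([[5, -4], [6, -5]], [2, 1])])  # ['6', '7']
print([str(v) for v in cramer([[4, -1, 1], [2, 2, 3], [5, -2, 6]], [-5, 10, 1])])  # ['-1', '3', '2']
```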

Gaussian elimination

We have already looked briefly at Gaussian elimination in the context of solving linear systems at the beginning of this page, and more extensively in the context of finding eigenvectors. The process essentially involves writing the equations in our linear system as an augmented matrix comprising the coefficient matrix and the solution vector, and then carrying out elementary row operations until we have put the augmented matrix into row-echelon form or reduced row-echelon form. Once the augmented matrix is in row-echelon form, we can re-write the matrix as a series of linear equations and use back substitution where necessary to obtain values for any unresolved variables. If the augmented matrix is in reduced row-echelon form, we have already found the values for all of the variables. Consider the linear system in two variables that we saw earlier:
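The forward elimination and back substitution just described can be sketched in Python as follows. This is an illustrative version under simple assumptions: a square system with a unique solution, exact fractions from the standard `fractions` module, and a hypothetical function name `solve_gaussian`. It stops at row-echelon form (leading ones, with zeros below them) and then works upwards from the last equation.

```python
from fractions import Fraction


def solve_gaussian(aug):
    """Solve a square linear system given as an augmented matrix [M | B]."""
    a = [[Fraction(x) for x in row] for row in aug]
    n = len(a)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            raise ValueError("no unique solution: the coefficient matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]    # interchange rows if needed
        lead = a[col][col]
        a[col] = [x / lead for x in a[col]]    # make the leading element one
        for r in range(col + 1, n):
            factor = a[r][col]
            # Add (-factor) times the pivot row to row r, giving a zero below the leading one.
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    # Back substitution: each x_i equals its right-hand side minus the known terms.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        x[i] = a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))
    return x


print([str(v) for v in solve_gaussian([[5, -4, 2], [6, -5, 1]])])  # ['6', '7']
print([str(v) for v in solve_gaussian([[4, -1, 1, -5], [2, 2, 3, 10], [5, -2, 6, 1]])])  # ['-1', '3', '2']
```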

5x - 4y = 2

6x - 5y = 1

Writing this as an augmented matrix we get:

	5	-4	2
	6	-5	1

We present below the row operation sequence required to get the augmented matrix into reduced row-echelon form.

Multiply the first row by one fifth (¹/₅):

$$\left[\begin{array}{cc|c} 1 & -\frac{4}{5} & \frac{2}{5} \\ 6 & -5 & 1 \end{array}\right]$$

Add the first row multiplied by minus six (-6) to the second row:

$$\left[\begin{array}{cc|c} 1 & -\frac{4}{5} & \frac{2}{5} \\ 0 & -\frac{1}{5} & -\frac{7}{5} \end{array}\right]$$

Multiply the second row by minus five (-5):

$$\left[\begin{array}{cc|c} 1 & -\frac{4}{5} & \frac{2}{5} \\ 0 & 1 & 7 \end{array}\right]$$

Add the second row multiplied by four fifths (⁴/₅) to the first row:

	1	0	6
	0	1	7

Thus the solution to our linear system is x = 6, y = 7. This is the same result we got using Cramer's rule, as you would expect. Let's use Gaussian elimination to find the solution for the linear system in three variables that we also saw earlier:

4x - y + z = -5

2x + 2y + 3z = 10

5x - 2y + 6z = 1

Writing this as an augmented matrix we get:

4	-1	1	-5
2	2	3	10
5	-2	6	1

We present below the row operation sequence required to get the augmented matrix into reduced row-echelon form.

Multiply the first row by one quarter (¹/₄):

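$$\left[\begin{array}{ccc|c} 1 & -\frac{1}{4} & \frac{1}{4} & -\frac{5}{4} \\ 2 & 2 & 3 & 10 \\ 5 & -2 & 6 & 1 \end{array}\right]$$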

Add the first row multiplied by minus two (-2) to the second row:

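$$\left[\begin{array}{ccc|c} 1 & -\frac{1}{4} & \frac{1}{4} & -\frac{5}{4} \\ 0 & \frac{5}{2} & \frac{5}{2} & \frac{25}{2} \\ 5 & -2 & 6 & 1 \end{array}\right]$$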

Add the first row multiplied by minus five (-5) to the third row:

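$$\left[\begin{array}{ccc|c} 1 & -\frac{1}{4} & \frac{1}{4} & -\frac{5}{4} \\ 0 & \frac{5}{2} & \frac{5}{2} & \frac{25}{2} \\ 0 & -\frac{3}{4} & \frac{19}{4} & \frac{29}{4} \end{array}\right]$$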

Multiply the second row by two fifths (²/₅):

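$$\left[\begin{array}{ccc|c} 1 & -\frac{1}{4} & \frac{1}{4} & -\frac{5}{4} \\ 0 & 1 & 1 & 5 \\ 0 & -\frac{3}{4} & \frac{19}{4} & \frac{29}{4} \end{array}\right]$$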

Add the second row multiplied by three quarters (³/₄) to the third row:

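$$\left[\begin{array}{ccc|c} 1 & -\frac{1}{4} & \frac{1}{4} & -\frac{5}{4} \\ 0 & 1 & 1 & 5 \\ 0 & 0 & \frac{11}{2} & 11 \end{array}\right]$$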

Multiply the third row by two elevenths (²/₁₁):

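$$\left[\begin{array}{ccc|c} 1 & -\frac{1}{4} & \frac{1}{4} & -\frac{5}{4} \\ 0 & 1 & 1 & 5 \\ 0 & 0 & 1 & 2 \end{array}\right]$$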

Add the third row multiplied by minus one (-1) to the second row:

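$$\left[\begin{array}{ccc|c} 1 & -\frac{1}{4} & \frac{1}{4} & -\frac{5}{4} \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & 2 \end{array}\right]$$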

Add the third row multiplied by minus one quarter (-¹/₄) to the first row:

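$$\left[\begin{array}{ccc|c} 1 & -\frac{1}{4} & 0 & -\frac{7}{4} \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & 2 \end{array}\right]$$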

Add the second row multiplied by one quarter (¹/₄) to the first row:

1	0	0	-1
0	1	0	3
0	0	1	2

Thus the solution to our linear system is x = -1, y = 3, and z = 2. This is again the same result we got using Cramer's rule, so no surprises there. We have now looked at various methods for solving linear systems using matrices. I leave it up to the reader to decide which of these methods they prefer.
