Number Systems
Integers
Integers, denoted \(\mathbb{Z}\). We use a denary/decimal (base-10) system with 10 digits, but there is also binary, which uses 2 digits.
Subscripts are used to denote base; \(10011_{\textrm{two}} = 19_{\textrm{ten}}\) or \(10011_{2} = 19_{10}\).
Decimal to Binary Conversion Algorithm
Divide decimal number repeatedly by 2 to get remainders \(r_0, r_1, r_2,...r_n\). The binary representation is \(r_{n}r_{n-1}...r_{1}r_0\) (note the switched order).
Problems
- Convert \(244_{10}\) to binary.
- The hexadecimal system is base 16, with digits 0123456789ABCDEF. Convert \(21BAD_{16}\) to decimal.
Generic Base Conversion Algorithm
Let us have an integer \(b\). To convert a base 10 integer to a base \(b\) integer, divide repeatedly by \(b\) to get remainders \(r_0, r_1,...,r_n\), thus the base \(b\) representation would be \(r_{n},r_{n-1},...,r_{1},r_0\) (note the switched order).
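A minimal Python sketch of this procedure (the function name to_base and the digit alphabet are my own choices, not from the notes):

DIGITS = "0123456789ABCDEF"  # enough digit symbols for bases up to 16

def to_base(n, b):
    # Convert a non-negative integer n to a base-b string by collecting
    # remainders of repeated division (least significant digit first).
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        remainders.append(DIGITS[n % b])
        n //= b
    return "".join(reversed(remainders))  # r_n r_{n-1} ... r_1 r_0

print(to_base(19, 2))     # 10011
print(to_base(244, 2))    # first problem above
print(int("21BAD", 16))   # second problem: hexadecimal back to decimal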
Division Algorithm
If \(a, b \in \mathbb{Z}\) with \(b \neq 0\), then there exist unique \(q, r \in \mathbb{Z}\) with
\[\begin{align} & a = qb + r, & 0 \leq r < \|b\| \end{align}\]Where \(q\) is the quotient and \(r\) the remainder.
If \(0 < b < a\) then an algorithm to compute \(q, r\) would be to iteratively compute \(a - b, a - 2b, ..., a-nb, a - (n+1)b\), where \(a - (n+1)b < 0\) is the first strictly negative number. Then \(q = n, r = a-nb\).
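A minimal Python sketch of this repeated-subtraction idea (assuming \(0 < b < a\); the function name is my own, not from the notes):

def divide(a, b):
    # Quotient and remainder of a by b via repeated subtraction (0 < b < a).
    q = 0
    while a - (q + 1) * b >= 0:   # stop just before the first strictly negative value
        q += 1
    return q, a - q * b           # a = q*b + r with 0 <= r < b

print(divide(30031, 16579))       # (1, 13452)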
If \(a = qb\) (all integers) then \(b\) divides \(a\). The greatest common divisor is denoted \(\gcd(a, b)\).
\[\gcd(0, n) = n \; \forall n \in \mathbb{Z}^+\]
Euclidean algorithm
Let \(r_1, r_0\) be integers s.t. \(0 < r_1 < r_0\).
- For each \(i\), define \(r_{i+1}\) as the remainder of \(\frac{r_{i-1}}{r_i}\).
The last non-zero remainder \(r_N = \gcd(r_1, r_0)\).
- The relations \(r_{i-1} = q_i r_i + r_{i+1} \; (1 \leq i \leq N)\) can be used to write this last nonzero remainder in terms of \(r_0\) and \(r_1\), thus \(\gcd(r_1, r_0) = xr_1 + yr_0 \textrm{ for some integers } x, y.\)
A Python implementation of this algorithm is as follows:
#!/usr/bin/python3
# Euclidean algorithm, printing each division step as a table row.

def printTableRow(a, b, r):
    print("| {} | {} | {} |".format(
        str(a).ljust(strPad),
        str(b).ljust(strPad),
        str(r).ljust(strPad)
    ))

a = int(input("Enter the first number: "))
b = int(input("Enter the second number: "))
strPad = len(str(max(a, b)))  # column width for padding
if a < b:
    a, b = b, a
r = -1
while r != 0:
    r = a % b
    printTableRow(a, b, r)
    a, b = b, r
print("GCD = {}".format(a))
Problems
- Find the greatest common divisor of 16579 and 30031, and determine integers \(x\) and \(y\) such that \(\gcd(16579, 30031) = x16579 + y30031\).
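For the \(x, y\) part, a possible Python sketch of the back-substitution idea is the iterative extended Euclidean algorithm (the function name is my own, and this is just one way to do it):

def extended_gcd(a, b):
    # Return (g, x, y) with g = gcd(a, b) and g = x*a + y*b.
    # (old_r, r) follow the remainder sequence; (old_x, x) and (old_y, y)
    # track how each remainder is written as a combination of a and b.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

g, x, y = extended_gcd(16579, 30031)
print(g, x, y, x * 16579 + y * 30031)   # the last value should equal g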
Modular Arithmetic
Two integers \(a, b\) are congruent modulo \(n\) (another integer) if \(a-b = kn\) for some \(k \in \mathbb{Z}\), i.e. \(a = b + kn\). Written \(a \equiv b \pmod n\).
Two congruences with the same modulus \(n\) can be added, subtracted and multiplied just like normal equations.
Two’s complement
In a computer, negative integers are represented using two's complement.
We are also working with a system modulo \(2^N\) where \(N\) is the number of bits. Thus the 32 bit range of \([-2^{31}, 2^{31} - 1]\), which is \(2^{32}\) integers.
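A small Python illustration of working modulo \(2^{32}\) (Python integers are unbounded, so the masking below only simulates a 32-bit machine; the helper name is my own):

def to_int32(n):
    # Interpret n modulo 2**32 as a signed 32-bit two's-complement value.
    n &= 0xFFFFFFFF              # reduce modulo 2**32
    return n - 2**32 if n >= 2**31 else n

print(to_int32(2**31 - 1))       # 2147483647, the largest 32-bit integer
print(to_int32(2**31))           # wraps around to -2147483648
print(to_int32(-1))              # -1 is stored as 0xFFFFFFFF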
Reals
Reals are denoted \(\mathbb{R}\), and have a subset \(\mathbb{Q}\) which are numbers definable as \(\frac{m}{n}, \; m, n \in \mathbb{Z}, \; n \neq 0\). We can always pick rationals such that \(n \geq 1\) and \(\gcd(m, n) = 1\). Every nonzero rational has an inverse.
Reals which are solutions of polynomial equations with integer coefficients are called algebraic; an example would be \(\sqrt{2}\) (being a solution of \(x^2 = 2\)).
Those which are not are called transcendental, such as \(\pi, e\).
A real number can be thought of as a sequence of rational numbers, which converges to said real. e.g. \(\pi\) is the limit of \(3, 3.1, 3.14, 3.141, 3.1415...\)
Properties of real numbers
All properties of real numbers come from 13 axioms, of which 1-8 are algebraic properties, and 9-12 are order properties.
For all \(x, y, z \in \mathbb{R}\):
- Commutativity: \(x + y = y + x, \; xy = yx\)
- Associativity: \(x + (y + z) = (x + y) + z\) (same with multiplication)
- Multiply distributes over add: \(x(y + z) = xy + xz\)
- Additive Identity: \(\exists 0 \in \mathbb{R }: x + 0 = x\)
- Multiplicative identity: \(\exists 1 \in \mathbb{R} : x \cdot 1 = x\)
- Multiplicative and additive identities are distinct: \(1 \neq 0\)
- Every element has an additive inverse: \(\exists (-x) \in \mathbb{R}: x + (-x) = 0\)
- Every \(\neq 0\) element has a mul. inverse: \(x \neq 0 \implies \exists x^{-1} \in \mathbb{R} : x \cdot x^{-1} = 1\)
- Transitivity of ordering: \(x < y \land y < z \implies x < z\)
- Trichotomy Law: only one of \(x < y,\; x > y,\; x = y\)
- Order preserved under add: \(x < y \implies x + z < y + z\)
- Order preserved under mul: \(0 < z \land x < y \implies xz < yz\)
- Completeness: Every non-empty subset of \(\mathbb{R}\) that is bounded above has a least upper bound
Exercises
1. Show that \(0 < 1\).
We first need a lemma.
Lemma 1. \(\forall x ,\; x^2 \geq 0\).
Proof of Lemma 1. Consider the cases where \(x < 0\) and \(x \geq 0\).
When \(x \geq 0\),
\[\begin{align} x \geq 0 & \implies x^2 \geq 0 \cdot 0 \textrm{ by ax. 12}\\ & \implies x^2 \geq 0. \end{align}\]When \(x < 0\), \(-x \geq 0\), so:
\[\begin{align} x^2 = x \cdot x & = (-x)(-x) \\ & \geq 0 \cdot 0 \textrm{ by prev case}\\ & \geq 0. & \triangleright \end{align}\]Then, we can prove. Proof.
\[\begin{align} 1^2 & \geq 0 \textrm{ by lemma}\\ \implies 1 & \geq 0\\ 1 & \neq 0 \textrm{ by ax. 6 } \therefore 1 > 0. & \triangleright \end{align}\]2. Show that \(a > 0 \implies \frac{1}{a} > 0\).
Proof. Let \(a > 0\)
\[\begin{align} \textrm{If } \frac{1}{a} & = 0 \\ \implies 1 & = a(\frac{1}{a}) = a \cdot 0 = 0 \\ & \textrm{which is a contradiction to ax. 6.} \end{align}\] \[\begin{align} \textrm{If } \frac{1}{a} & < 0 \\ \implies a (\frac{1}{a}) & < a \cdot 0 \textrm{ by ax. 12} \\ \implies 1 & = 0 \textrm{ which is a contradiction.} \end{align}\]By axiom 10 we get that \(\frac{1}{a} > 0. \quad \triangleright\)
Problems
- Show that if \(a, b > 0\) then \(a < b \Longleftrightarrow a^2 < b^2\). Note: you have to prove both ways.
- Show that if \(a < b \land c < 0\) then \(ac > bc\).
Intervals
Intervals are ranges represented by brackets. \((a, b)\) ranges are called open, \([a, b), \; (a, b]\) are called semi-open or semi-closed and \([a, b]\) ranges are called closed. \(\infty\) is not a real number and cannot appear in closed ranges.
Nth roots
Let \(n \in \mathbb{Z}^+\). For any \(a \in \mathbb{R}_{\geq 0}, \; \exists! x \geq 0\) with \(x^n = a\). This \(x\) is the \(n^{th}\) root of \(a\), \(a^{\frac{1}{n}}\). For any positive real \(a, b\) and \(n \in \mathbb{Z}^+\) we have \(a < b \Longleftrightarrow a^{\frac{1}{n}} < b^{\frac{1}{n}}\) There's also of course \(a^\frac{1}{2} \equiv \sqrt{a}\) and all that comes with it.
Modulus or absolute function
The Modulus or Absolute value of \(x\), \(\|x\|\), is basically \(x\) but without negatives. \(\|x\| = \sqrt{x^2}, \quad \forall x \in \mathbb{R}\).
There are 4 properties of modulus:
- \[-\|x\| \leq x \leq \|x\|\]
- \[\|xy\| = \|x\|\|y\|\]
- \[\|x + y\| \leq \|x\|+\|y\|\]
- \[\|\|x\| - \|y\|\| \leq \|x - y\|\]
Bounds
For a set of real numbers \(S\)
- \(u\) is an upper bound of S if \(u \geq x \; \forall x \in S\)
- \(U\) is the least upper bound (supremum) of S if \(U\) is an upper bound of S and \(U \leq u\) for every upper bound \(u\) of S
- \(l\) is a lower bound of S if \(l \leq x \; \forall x \in S\)
- \(L\) is the greatest lower bound (infimum) of S if \(L\) is a lower bound of S and \(L \geq l\) for every lower bound \(l\) of S
Completeness axiom
The completeness axiom also gives us that every non-empty set which has a lower bound has a greatest lower bound.
Important consequences of the completeness axiom:
- The ARCHIMEDEAN PROPERTY of \(\mathbb{R}\): if \(\epsilon \in \mathbb{R},\; \epsilon > 0\), then \(\exists n \in \mathbb{Z}^+ : n\epsilon > 1\).
- Between any two real numbers there are both rational and irrational numbers.
- Every real number can be represented by a (possibly infinite) decimal expansion.
Complex Numbers
Complex numbers form the set \(\mathbb{C}\) and have the form \(a + ib\), where \(a, b \in \mathbb{R}\) and \(i^2 = -1\). Often denoted \(z\).
Representable by an ordered pair \(z = (a, b)\) where \(Re_z = a\) and \(Im_z = b\). Also representable as a point on a 2D plane, the complex plane or Argand diagram.
For \(a, b \in \mathbb{R}\) the complex conjugate of \(z = a + ib\) is \(\overline{z} = a - ib\) (overline). Effectively a reflection in the real axis.
Complex conjugates
Properties of complex conjugates (\(\forall z, w \in \mathbb{C}\)):
- \[\overline{z + w} = \overline{z} + \overline{w}\]
- \[\overline{zw} = \overline{z}\overline{w}\]
- \[\overline{(\frac{z}{w})} = \frac{\overline{z}}{\overline{w}}\]
- \[\overline{\overline{z}} = z\]
- \[z \in \mathbb{R} \Longleftrightarrow \overline{z} = z\]
- \[Re_z = \frac{z + \overline{z}}{2}\]
- \[Im_z = \frac{z - \overline{z}}{2i}\]
Polar coordinates
If \(x, y \in \mathbb{R}\) and \(x + iy \neq 0\) we can express \(x\) and \(y\) in polar coordinates,
\[\begin{align} & x = r \cos(\theta) & y = r \sin(\theta) \end{align}\] \[\therefore x + iy = r(\cos(\theta) + i\sin(\theta)).\] \[\begin{align} & r = \sqrt{x^2 + y^2} & \theta \textrm{ satisfies } \tan(\theta) = \frac{y}{x}. \end{align}\]\(\theta\) is the argument, with the principal argument being the \(\theta \in (-\pi, \pi]\).
You can multiply two complex numbers in polar form, \(r_{1} (\cos(\theta) + i\sin(\theta)) \cdot r_{2} (\cos(\phi) + i\sin(\phi)) = r_{1}r_{2}(\cos(\theta + \phi) + i\sin(\theta + \phi))\).
Complex modulus
\(r\) is the modulus, often denoted \(\|x + iy\|\).
Properties of modulus, for all \(z, w \in \mathbb{C}\):
- \[\|z\| = \|\overline{z}\|\]
- \[\|z\| = \sqrt{z\overline{z}}\]
- \[z\overline{z} = \|z\|^2\]
- \[\|zw\| = \|z\|\|w\|\]
- \(\|z + w\| \leq \|z\| + \|w\|\), (triangle inequality)
- \[\|\|z\| - \|w\|\| \leq \|z - w\|\]
Proof of the triangle inequality is omitted.
De Moivre's Theorem
\(\forall n \in \mathbb{Z}\): \((r(\cos{\theta} + i\sin \theta))^n = r^{n}(\cos{n\theta} + i\sin{n\theta}).\)
Q. Find all complex numbers \(z : z^3 = 1\).
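A quick numerical check with Python's cmath (by De Moivre, \(z^3 = 1\) forces \(r = 1\) and \(3\theta = 2k\pi\)); this is a sanity check of the answer, not a proof:

import cmath

# The three cube roots of 1 have modulus 1 and arguments 2*k*pi/3, k = 0, 1, 2.
roots = [cmath.rect(1, 2 * k * cmath.pi / 3) for k in range(3)]
for z in roots:
    print(z, z**3)   # each z**3 should print as 1 (up to floating-point error)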
Fundamental Theorem of Algebra
(Gauss) states that a degree-\(n\) polynomial (with complex coefficients, \(n \geq 1\)) has exactly \(n\) roots in \(\mathbb{C}\), counted with multiplicity.
Other Notation
The conjugate \(\overline{z}\) can also be written \(z^*\).
Sometimes (often in engineering) \(i\) is instead written as \(j\), as \(i\) often refers to current.
Vectors
Basics
Vectors in 2 and 3D are members of the sets \(\mathbb{R}^2\) and \(\mathbb{R}^3\) respectively. (Note, thus ordered pairs/triples)
Can be treated as coordinate points.
Denoted either with arrow (\(\vec{x}\)), underline (\(\underline{x}\)), or bold (\(\mathbf{x}\)). I'll be using the arrow.
Addition and scalar multiplication
Addition and scalar multiplication are done element-wise. For \(\vec{x} = (x_1, x_2, ..., x_n)\) and \(\vec{y} = (y_1, y_2, ..., y_n)\)
- \[\lambda \vec{x} = (\lambda x_1, \lambda x_2, ..., \lambda x_n)\]
- \[\vec{x}+\vec{y} = (x_1 + y_1, x_2 + y_2, ..., x_n + y_n)\]
\(\vec{x} - \vec{y}\) and \(-\vec{x}\) are also defined accordingly from these.
In \(\mathbb{R}^2\) if \(\vec{p} = (p_1, p_2)\) then this is the directed line segment \(\overrightarrow{OP}\) starting at origin \(O\) and ending at point P \((p_1, p_2)\). \(\vec{p}\) is then the position vector of P.
Position, unit and zero vectors, and vector length
Two line segments are equivalent if they have the same length and direction.
For points \(A, B\) with position vectors \(\vec{a}, \vec{b}\), \(\overrightarrow{AB} = \vec{b} - \vec{a}\).
For \(\vec{a} = (a_1, a_2) \in \mathbb{R}^2\), the length of \(\vec{a}\) is written as \(\|\vec{a}\| = \sqrt{a_{1}^{2} + a_{2}^{2}}\); similarly in 3D.
A unit vector has length 1. The distance between \(\vec{a}, \vec{b} = \|\vec{b} - \vec{a}\|\).
The zero vector, \(\vec{0}\) has all zeros in it.
The scalar/dot product
The scalar (dot) product, \(\vec{a} \cdot \vec{b}\) is the real number \(a_1 b_1 + a_2 b_2 + ... + a_n b_n\).
The angle between two position vectors, \(\theta\) between vectors \(\vec{a}, \vec{b}\) is given by \(\cos \theta = \frac{\vec{a} \cdot \vec{b} }{|\vec{a}||\vec{b}|}.\) Two vectors are orthogonal (perpendicular) if dot product is 0.
All definitions (if not already) can be extended to \(n\) dimensions.
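A minimal Python sketch of these definitions (plain tuples as vectors; the function names are my own):

import math

def dot(a, b):
    # Scalar (dot) product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def length(a):
    # Euclidean length of a vector.
    return math.sqrt(dot(a, a))

def angle(a, b):
    # Angle between two nonzero vectors, in radians.
    return math.acos(dot(a, b) / (length(a) * length(b)))

a, b = (1, 0, 0), (1, 1, 0)
print(dot(a, b), angle(a, b))        # 1 and pi/4
print(dot((1, 2), (-2, 1)) == 0)     # orthogonal vectors have dot product 0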
Linear Combinations and Subspaces
Linear Combinations
If \(\vec{u}_1, \vec{u}_2, ..., \vec{u}_m \in \mathbb{R}^n\) and \(a_1, a_2, ..., a_m \in \mathbb{R}\), then any vector of the form \(a_1 \vec{u}_1 + a_2 \vec{u}_2 + ... + a_m \vec{u}_m\) is a linear combination of \(\vec{u}_1, \vec{u}_2, ..., \vec{u}_m\).
A linear combination of a single vector is defined as a multiple of that vector.
In \(\mathbb{R}^3\) if \(\vec{u}, \vec{v}\) are not parallel, then \(\alpha \vec{u} + \beta \vec{v}\) represents the vertex of a parallelogram having \(\alpha \vec{u}, \beta \vec{v}\) as sides - a vector in the plane containing \(\vec{u}, \vec{v}, \vec{0}\).
Spans
If \(U = \{\vec{u}_1, \vec{u}_2, ..., \vec{u}_m\}\) is a finite set of vectors in \(\mathbb{R}^n\), then the span of U is the set of all linear combinations of vectors in U and is denoted \(\textrm{span } U\); \(\textrm{span } U = \{a_1 \vec{u}_1 + a_2 \vec{u}_2 + ... + a_m \vec{u}_m : a_1, a_2, ..., a_m \in \mathbb{R}\}.\)
- If \(U = \{\vec{u}\}\) then the span is the set of all multiples of \(\vec{u}\).
- Note that for basis spans, 1 vector is a line, 2 vectors is a plane, and onwards to hyperplanes. (Basis is covered in next section)
- Elementary spans of \(\mathbb{R}^2, \mathbb{R}^3\) are \(\{(1, 0), (0, 1)\}\) and \(\{(1, 0, 0), (0,1,0), (0,0,1)\}\) respectively.
Subspaces
A subspace of \(\mathbb{R}^n\) is a non-empty subset \(S \subseteq \mathbb{R}^n\) such that:
\[\begin{align} (1) & \vec{u}, \vec{v} \in S \implies \vec{u} + \vec{v} \in S;\\ (2) & \vec{u} \in S, \lambda \in \mathbb{R} \implies \lambda \vec{u} \in S. \end{align}\]i.e. closed under addition and scalar multiplication.
This means that if a set of vectors lies in a subspace, any linear combination of those vectors is also in the subspace.
Two elementary subspaces of \(\mathbb{R}^n\) are \(\{\vec{0}\}\) (just the zero vector) and \(\mathbb{R}^n\) itself.
Properties of Subspaces
- Every subspace contains \(\vec{0}\).
- If \(U\) is a nonempty finite subset of \(\mathbb{R}^n\) then \(\textrm{span } U\) is a subspace, the subspace spanned or generated by U.
Exercise
Determine if \(S\) is a subspace of \(\mathbb{R}^n\):
- \[S = \{(x, y, 0) : x, y \in \mathbb{R}\} \subseteq \mathbb{R}^3\]
- \[S = \{(1, 1)\} \subseteq \mathbb{R}^2\]
- \[S = \{(x, y) : x^2 + y^2 \leq 1\} \subseteq \mathbb{R}^2\]
Solutions.
1. We need to show closure on addition and scaling. Let \(\vec{u}, \vec{v} \in S : \vec{u} = (a, b, 0), \vec{v} = (c, d, 0)\) for some \(a, b, c, d \in \mathbb{R}\). \(\vec{u} + \vec{v} = (a, b, 0) + (c, d, 0) = (a+c, b+d, 0) \in S.\) For any \(\lambda \in \mathbb{R}\) \(\lambda \vec{u} = \lambda (a, b, 0) = (\lambda a, \lambda b, 0) \in S.\)
2. Nope, since \(2(1,1) \not \in S\), so no scaling closure.
3. Nope. Let \(\vec{u} = (1, 0), \vec{v} = (0, 1)\), both of which \(\in S\), however \(\vec{u} + \vec{v} = (1, 1)\). \(1^2 + 1^2 = 2 \not \leq 1\), so not closed under addition.
Linear Independence
A set of vectors \(\{\vec{u}_1, \vec{u}_2, ..., \vec{u}_m\} \subseteq \mathbb{R}^n\) is linearly dependent IF there are real numbers \(a_1, a_2, ..., a_m\) which are NOT ALL ZERO such that \(a_1 \vec{u}_1 + a_2 \vec{u}_2 + ... + a_m \vec{u}_m = \vec{0}.\)
Thus a linearly independent set is one where IF \(a_1 \vec{u}_1 + a_2 \vec{u}_2 + ... + a_m \vec{u}_m = \vec{0}\) THEN ALL \(a_i\) are 0.
i.e. if you can't find a nonzero linear combination that makes zero vector, then the set is linearly independent.
- If a set consists of a single nonzero vector, it is linearly independent
- If a set contains the zero vector, it is linearly dependent
For a set of three vectors, you can fairly easily solve the three simultaneous equations that set the linear combination to zero. Then you either find that the three coefficients \(\alpha, \beta, \gamma\) all have to be 0 (independent), or that there is some non-zero relationship between at least two of them (dependent).
Theorem. A set \(\{\vec{u}_1, \vec{u}_2, ..., \vec{u}_m\}\) of nonzero vectors is linearly dependent iff some vector \(\vec{u}_r\) is a linear combination of its predecessors \(\vec{u}_1, \vec{u}_2, ..., \vec{u}_{r-1}\).
(proof omitted)
Basis and Dimension
Basis
Let \(S\) be a subspace of \(\mathbb{R}^n\). A set of vectors is a basis of S if it is a linearly independent set which spans S.
e.g. The set \(\{(1,0,0), (0,1,0), (0,0,1)\}\) is a basis for \(\mathbb{R}^3\). In fact, it is the standard basis.
Theorem. Let \(S\) be a subspace of \(\mathbb{R}^n\). If a set \(\vec{v}_1, \vec{v}_2, ..., \vec{v}_m\) spans S, then any linearly independent subset of S has at most \(m\) vectors.
(proof omitted)
This leads to the fact that any two bases for a subspace S have the same number of elements.
Dimension
The dimension of a subspace of \(\mathbb{R}^n\) is the number of vectors in the basis.
Exercises
1. Show that the set \(S = \{(x, y, z) : x + 2y - z = 0\}\) is a subspace of \(\mathbb{R}^3\), and find a basis and dimension of \(S\).
Solution. We can rewrite S as:
\[\begin{align} S & = \{(x, y, x+2y) : x, y \in \mathbb{R}\} \\ & = \{x(1, 0, 1) + y(0, 1, 2) : x, y \in \mathbb{R}\} \\ & = \textrm{ span } \{(1, 0, 1), (0, 1, 2)\}. \end{align}\]Which shows that S is a subspace, since the span of any nonempty finite set is a subspace (known property).
By inspection we can see that the two vectors in the set are independent, thus it is a basis. Thus the dimension of S is 2.
(Supposedly) we can use the theorem from linear independence to construct a basis from a spanning set.
Let \(\{\vec{v}_1, \vec{v}_2, ..., \vec{v}_m\}\) be a basis of a subspace S of \(\mathbb{R}^n\). Then removing each \(\vec{v}_i\) which is a linear combination of its "predecessors" will leave a basis for S.
2. Find a basis for and dimension of a subspace S (of \(\mathbb{R}^4\)) spanned by \(\{(2,1,0,-3), (-1,0,-1,2), (1,2,-3,0), (0,0,0,1), (0,1,-2,0)\}.\)
Solution. Let's look at (1, 2, -3, 0) and see if it is a linear comb. of predecessors. \((1, 2, -3, 0) = \alpha(2,1,0,-3) + \beta(-1,0,-1,2)\)
\[\begin{align} & \implies \begin{cases} 2\alpha - \beta & = 1 \\ \alpha & = 2 \\ -\beta & = -3 \\ -3\alpha + 2\beta & = 0 \end{cases} & \implies \begin{cases} \alpha & = 2 \\ \beta & = 3. \end{cases} \end{align}\]We can see it is a linear combination, so we remove, giving \(\{(2,1,0,-3), (-1,0,-1,2), (0,0,0,1), (0,1,-2,0)\}.\)
Next we check \((0,0,0,1)\): \((0,0,0,1) = \alpha(2,1,0,-3) + \beta(-1,0,-1,2)\)
\[\begin{align} & \implies \begin{cases} 2\alpha - \beta & = 0 \\ \alpha & = 0 \\ -\beta & = 0 \\ -3\alpha + 2\beta & = 1 \end{cases} & \implies \begin{cases} \alpha & = 0 \\ \beta & = 0 \\ -3\alpha + 2\beta & = 1 \end{cases} \end{align}\]Which has no solution for \(\alpha, \beta\), and thus (0, 0, 0, 1) is not a linear combination of its predecessors. Thus \(\{(0,0,0,1),(2,1,0,-3),(-1,0,-1,2)\}\) is a linearly independent set.
Check the last one against the remaining predecessors: \((0,1,-2,0) = \alpha(2,1,0,-3) + \beta(-1,0,-1,2) + \gamma(0,0,0,1)\)
\[\begin{align} & \implies \begin{cases} 2\alpha - \beta & = 0 \\ \alpha & = 1 \\ \beta & = 2 \\ -3\alpha + 2\beta + \gamma & = 0 \end{cases} & \implies \begin{cases} \alpha & = 1 \\ \beta & = 2 \\ \gamma & = -1. \end{cases} \end{align}\]So we remove that, thus finally the remaining set \(\{(0,0,0,1),(2,1,0,-3),(-1,0,-1,2)\}\) is the final basis of \(S\), which gives \(S\) a dimension of 3.
Properties of bases
Let S be an \(m\)-dimensional subspace of \(\mathbb{R}^n\) then
- Any subset of S with more than \(m\) vectors is linearly dependent;
- A subset of S is a basis if and only if it is a linearly independent set containing \(m\) vectors.
It then follows that any subset of \(\mathbb{R}^n\) containing more than \(n\) vectors is linearly dependent, and that a subset of \(\mathbb{R}^n\) is a basis iff it is a linearly independent set containing \(n\) vectors.
\(\mathbb{R}^2, \mathbb{R}^3\) properties...
Subspaces of \(\mathbb{R}^2\):
- There is one 0 dim subspace \(\{\vec{0}\}\)
- A 1D subspace is spanned by a single non-zero vector: straight lines through origin.
- The only 2D subspace is \(\mathbb{R}^2\)
Subspaces of \(\mathbb{R}^3\):
- There is one 0 dim subspace \(\{\vec{0}\}\)
- A 1D subspace is spanned by a single non-zero vector: straight lines through origin.
- A 2D subspace is spanned by 2 linearly independent vectors: planes containing the origin
- The only 3D subspace is \(\mathbb{R}^3\)
Change of basis
Exercise
What are the co-ordinates of the vector \(\vec{w} = (8,-9,6)\) in relation to the basis \(V = \{(1,-1,3),(-3,4,9),(2,-2,4)\}\)
Solution #1 \(\begin{align} \begin{cases} \alpha - 3\beta + 2\gamma & = 8 \\ -\alpha + 4\beta -2\gamma & = -9 \\ 3\alpha + 9\beta + 4\gamma & = 6 \end{cases} & \implies \begin{cases} \alpha & = 5 \\ \beta & = -1 \\ \gamma & = 0. \end{cases} \end{align}\) Hence, the co-ordinates in the new basis are \(\vec{w} = (5,-1,0)\)
Solution #2 \(\alpha \vec{v}_1 + \beta \vec{v}_2 + \gamma \vec{v}_3 = \begin{bmatrix}8 \\ -9 \\ 6\end{bmatrix}\)
\[x\begin{bmatrix}1 \\ -1 \\ 3\end{bmatrix} + y\begin{bmatrix}-3 \\ 4 \\ 9\end{bmatrix} + z\begin{bmatrix}2 \\ -2 \\ 4\end{bmatrix} = \begin{bmatrix}8 \\ -9 \\ 6\end{bmatrix}\] \[\implies x = 5, y = -1, z = 0\]
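As yet another route, the same system can be handed to a linear-algebra library; a sketch assuming numpy is available (not part of the original notes):

import numpy as np

# Columns of V are the basis vectors; solve V @ c = w for the coordinates c.
V = np.array([[1, -3, 2],
              [-1, 4, -2],
              [3, 9, 4]], dtype=float)
w = np.array([8, -9, 6], dtype=float)
c = np.linalg.solve(V, w)
print(c)   # approximately [5, -1, 0]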
Matrices
Matrix Algebra
A matrix is a rectangular array of elements. Its order is its number of rows \(\times\) number of columns. Note that an \(m \times 1\) matrix is called a column matrix/vector, and \(1 \times n\) matrices are similarly named as rows. Elements are referred to with subscripts row then column: \(a_{ij}\).
Addition and scalar multiplication
Sum of two matrices is only defined if they have the same order, and is elementwise addition.
Scalar multiplication is also done elementwise.
Properties of addition and scalar multiplication
\(\forall m \times n\) matrices \(A, B, C\), \(\forall \lambda, \mu \in \mathbb{R}\):
- \(A + (B+C) = (A+B)+C\) (associativity of addition)
- \[A + O = A = O + A\]
- \[A + (-A) = O = (-A) + A\]
- \(A+B=B+A\) (commutativity of addition)
- \[(\lambda + \mu)A = \lambda A + \mu A\]
- \[\lambda (A+B) = \lambda A + \lambda B\]
- \[\lambda(\mu A) = (\lambda \mu ) A\]
Matrix multiplication
Matrix multiplication can only happen between an \(A_{m \times n}\) and a \(B_{n \times p}\) (note the matching inner dimension \(n\)) and will produce a matrix \(C_{m \times p}\).
Matrix multiplication is hard to explain in text, so see this video by blackpenredpen if you're not sure
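For the record, a bare-bones Python version of the row-times-column rule (matrices as lists of rows; this is just a sketch):

def matmul(A, B):
    # Multiply an m x n matrix A by an n x p matrix B (both lists of rows).
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "inner dimensions must match"
    # entry c_ij is the dot product of row i of A with column j of B
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))   # [[19, 22], [43, 50]]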
Properties of matrix multiplication
Whenever the products exist, matrix multiplication has the properties:
- \((AB)C = A(BC)\) (associativity)
- \(A(B+C) = AB + AC\), \((A+B)C = AC + BC\)
- \[IA = A = AI\]
- \[OA = O = AO\]
- \(A^p A^q = A^{p+q} = A^q A^p\), \((A^p)^q = A^{pq}\)
Note that matrix multiplication is not commutative: \(AB \neq BA\) (for all but specific circumstances).
Matrix transposition
The transpose \(A^T\) of a matrix is obtained by swapping rows and columns (i.e. reflecting on leading diagonal).
Properties of transposition
- \((A^T)^T=A\) holds for any matrix \(A\)
- \((A+B)^T = A^T + B^T\) if \(A+B\) exists
- \((\lambda A)^T = \lambda A^T\) for any \(\lambda \in \mathbb{R}\)
- \((AB)^T = B^T A^T\) if \(AB\) exists.
For same order square matrices \(A, B\), \(B\) is the inverse of \(A\) if and only if \(AB = I = BA\). The inverse (should it exist) is unique and denoted \(A^{-1}\).
Types of matrices
A square matrix \(A_{n \times n}\) is said to be of order \(n\)
The zero matrix, denoted \(O_{m \times n}\) is a matrix of all zeros.
Diagonal matrices only have nonzero elements on the leading diagonal, i.e. the entries \(a_{ii}\) for \(i \in \{1, \dots, n\}\).
The identity matrix \(I\) (or \(I_n\)) is the \(n \times n\) diagonal matrix whose diagonal elements are all 1.
For a square matrix \(A\), \(A, AA, AAA, ...\) are defined as \(A, A^2, A^3,...\) respectively. \(A^0 = I\). Functions \(\exp(A), \cos(A), \sin(A)\) can also be defined (hint: taylor series).
Determinant of a 2x2 matrix
The determinant of a \(2 \times 2\) matrix \(A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\) is \(ad-bc\) and denoted \(|A|, \det(A)\).
If a \(2 \times 2\) matrix is invertible, then \(\det(A)\det(A^{-1}) = \det(AA^{-1}) = \det(I) = 1\). Thus \(\det(A) \neq 0\) and in that case,
The inverse of \(A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\) is \(A^{-1} = \frac{1}{\det A} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}\) (a particular case of a general result)
Matrix Inverse, Linear Equations
A system of linear equations can be written in matrix form.
\[\begin{align} ax_1 + bx_2 & = y_1 \\ cx_1 + dx_2 & = y_2 \end{align}\]\(\equiv \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}.\) Which can be extrapolated to general form.
Elementary row operations
The following operations can be performed to solve a system (Gaussian Elimination):
- Swap two rows (equations)
- Multiply a row (both sides of an equation) by a nonzero number
- Add a multiple of one row (equation) to another
Augmented matrices
Which can be done over the augmented matrix, which is obtained by combining \(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\) and \(\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}\) (the coefficients and the result): \(\left[\begin{array}{cc|c} a & b & y_1 \\ c & d & y_2 \end{array}\right]\)
Two matrices \(A, B\) are row equivalent if we can use row operations to get from A to B. Denoted \(A \sim B\).
A matrix is in row echelon form if the first nonzero entry in each row is strictly further to the right than the first nonzero entry in the previous row. By reducing to row echelon form we can solve a system of linear equations.
See the pdf for sample problems. Note: there also exists reduced row echelon form, where each leading entry is a 1, and each column with a 1 in has 0s for all other entries.
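A compact Python sketch of this process on an augmented matrix: reduce to row echelon form by elementary row operations, then back-substitute (Fraction avoids rounding; a unique solution is assumed):

from fractions import Fraction

def solve(aug):
    # Solve a square system from its augmented matrix [A | y] by Gaussian
    # elimination and back substitution (unique solution assumed).
    M = [[Fraction(x) for x in row] for row in aug]
    n = len(M)
    for i in range(n):
        # swap in a row with a nonzero pivot if necessary
        pivot = next(r for r in range(i, n) if M[r][i] != 0)
        M[i], M[pivot] = M[pivot], M[i]
        # add multiples of the pivot row to clear the entries below it
        for r in range(i + 1, n):
            factor = M[r][i] / M[i][i]
            M[r] = [a - factor * b for a, b in zip(M[r], M[i])]
    # back substitution on the echelon form
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

# e.g. x + y = 3, x - y = 1  gives  x = 2, y = 1
print(solve([[1, 1, 3], [1, -1, 1]]))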
Elementary row operations can also be done by multiplying by so-called elementary matrices. These \(n \times n\) matrices are defined as:
- \(E_{ij}\) obtained from \(I\) by exchanging rows \(i, j\)
- For \(\lambda \neq 0, \; E_i(\lambda)\) obtained from \(I\) by multiplying row \(i\) by \(\lambda\)
- \(E_{ij}(\mu)\) obtained from \(I\) by adding \(\mu \cdot\) row \(j\) to row \(i\)
Every elementary matrix is invertible.
If a sequence of row operations transforms a square matrix \(A\) into \(I\), then \(A^{-1}\) exists and the same sequence transforms \(I\) into \(A^{-1}\).
This is best done with an augmented matrix, like \(\left[\begin{array}{cc|cc} a & b & 1 & 0 \\ c & d & 0 & 1 \end{array}\right]\)
Matrix Inverse, Determinants
Link to first PDF. Link to second PDF.
Determinant of a 3x3 matrix and the cofactor matrix
The determinant of a \(3 \times 3\) matrix is denoted the same way, and is defined
\[\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33}\end{vmatrix} - a_{12} \begin{vmatrix}a_{21} & a_{23} \\ a_{31} & a_{33}\end{vmatrix} + a_{13} \begin{vmatrix}a_{21} & a_{22} \\ a_{31} & a_{32}\end{vmatrix}\]i.e. each element of the first row multiplied (respecting the +/- grid) with the determinant of its minor matrix (giving the cofactor) - the matrix obtained by deleting the row and column containing said element.
\[\begin{bmatrix}+ & - & + \\ - & + & - \\ + & - & +\end{bmatrix}\]This can be done with any row or column.
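A recursive Python sketch of this cofactor expansion along the first row (fine for small matrices, though exponential in general):

def det(M):
    # Determinant of a square matrix (list of rows), expanding along
    # the first row with alternating signs.
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]  # delete row 0, column j
        total += (-1) ** j * M[0][j] * det(minor)
    return total

print(det([[1, 2], [3, 4]]))                    # -2
print(det([[2, 0, 1], [1, 3, 0], [0, 1, 4]]))   # 25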
Elementary row operations on determinants
On elementary row operations and determinants (\(B\) obtained from \(A\)):
- Multiplying a row in A by a \(\lambda\): \(\|B\| = \lambda\|A\|\)
- Swapping 2 rows of A: \(\|B\| = -\|A\|\)
- Adding a multiple of one row to another: \(\|B\| = \|A\|\)
Cramer’s rule to invert matrices
A square matrix is invertible iff its determinant is not 0. If \(A\) is invertible, \(A^{-1} = \frac{1}{|A|} \textrm{adj}(A)\) where \(\textrm{adj}(A)\) is the transposed matrix of cofactors.
You can also use matrix inverses to calculate equations:
\[\begin{align} \textrm{if } & A\vec{x} = \vec{y}\\ \textrm{then } & A^{-1} A \vec{x} = A^{-1} \vec{y} \implies \vec{x} = A^{-1}\vec{y} \end{align}\]Where the column vector \(\vec{x}\) contains the variables, and the column vector \(\vec{y}\) the values of the equations.
Linear independence via determinants
A set of \(n\) vectors in \(\mathbb{R}^n\) is linearly independent if and only if it is the set of column vectors of a matrix with nonzero determinant.
Basically, bang \(n\) \(\mathbb{R}^n\) vectors into a square matrix, compute the determinant, and if it is 0, then those vectors are linearly dependent.
Linear Transformations
A function \(T : \mathbb{R}^m \longrightarrow \mathbb{R}^n\) is a linear transformation if, \(\forall \vec{u}, \vec{v} \in \mathbb{R}^m, \lambda \in \mathbb{R}\), we have: \(T(\vec{u} + \vec{v}) = T(\vec{u}) + T(\vec{v}),\) \(T(\lambda \vec{u}) = \lambda T(\vec{u}).\) Which are preservation of addition and scaling respectively. Also,
\[T(\vec{0}) = \vec{0}.\]For simple problems, verifying that the transformation fits the two rules of addition and scaling is sufficient.
Example. Let \(\vec{u}\) be a nonzero 2D vector. If \(\vec{x} \in \mathbb{R}^2\) then we define the projection of \(\vec{x}\) onto \(\vec{u}\) to be a vector \(P_{\vec{u}}(\vec{x})\) such that
- \(P_{\vec{u}}(\vec{x})\) is a multiple of \(\vec{u}\)
- \(\vec{x} - P_{\vec{u}}(\vec{x})\) is perpendicular to \(\vec{u}\).
We have by (1) that \(P_{\vec{u}}(\vec{x}) = \alpha \vec{u}\) for some \(\alpha \in \mathbb{R}\), so by (2) \(0 = (\vec{x} - P_{\vec{u}}(\vec{x})) \cdot \vec{u} = (\vec{x} - \alpha \vec{u}) \cdot \vec{u} = \vec{x} \cdot \vec{u} - \alpha |\vec{u}|^2,\) \(\implies \alpha = \frac{\vec{x}\cdot\vec{u}}{|\vec{u}|^2}.\)
The projection can then be regarded as a function \(P_\vec{u} : \mathbb{R}^2 \longrightarrow \mathbb{R}^2\) defined \(\forall \vec{x} \in \mathbb{R}^2\): \(P_{\vec{u}}(\vec{x}) = (\frac{\vec{x}\cdot\vec{u}}{|\vec{u}|^2})\vec{u}.\) This function can be verified to be a linear transformation.
Example. For \(\theta \in [0, 2\pi)\) define \(R_\theta : \mathbb{R}^2 \longrightarrow \mathbb{R}^2\) to be a function describing rotation by angle \(\theta\) about the origin. After a bit of derivation, we get \(R_\theta (x, y) = (x\cos\theta - y\sin\theta, x\sin\theta + y\cos\theta).\) Or alternatively in matrix form (let \((x', y')\) be \(R_\theta (x, y)\)) as \(\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}.\)
Linear Transformations and Matrices
Referring back to the last example in the last section, it is further true that every \(M_{m \times n}\) matrix can act as a linear transformation (\(T(\vec{x}) = M\vec{x}\)). Vectors are column vectors.
For a basis \(V = \{\vec{v}_1, \vec{v}_2, ... \vec{v}_n\}\) of \(\mathbb{R}^n\), every \(\vec{x} \in \mathbb{R}^n\) has a linear expansion \(\vec{x} = a_1 \vec{v}_1 + a_2 \vec{v}_2 + ... + a_n \vec{v}_n\).
These coefficients \(a_1 ... a_n\) are the coordinates of \(x\) with respect to basis \(V\).
Let \(T : \mathbb{R}^m \longrightarrow \mathbb{R}^n\) be a linear transformation, V be a basis in \(\mathbb{R}^m\) and W a basis in \(\mathbb{R}^n\).
For each vector \(\vec{v}\) in V, \(T(\vec{v})\) has an expansion in W. The matrix of a linear transformation T with respect to V and W is the \(n \times m\) matrix where each column \(i\) contains the coefficients of the expansion of \(T(\vec{v}_i)\) in W, for each \(\vec{v}_i \in V\).
When \(m = n, \; W = V\) then it is referred to as the matrix of T with respect to basis V.
Matrix of a linear transformation. For a linear transformation T (as above), M the matrix of T with respect to bases V, W (as above); the columns of M contain the coordinates of the images of the basis vectors in V w/ respect to W.
If \(\vec{x} \in \mathbb{R}^m\) has coordinates \([x_1,...,x_m]\) with respect to V, then the coordinates of \(T(\vec{x})\) with respect to W, \([y_1, ... , y_n]\), are \(\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = M \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix}.\)
A matrix that changes between two different bases in \(\mathbb{R}^n\) is called a transition matrix.
The definition on the notes is uh, just do the same thing as above but the matrix will be square.
Eigenvalues and Eigenvectors
Relating to matrices multiplying vectors, and especially where the vectors don't change direction.
For a matrix A and a nonzero vector \(\vec{r}\), if \(A\vec{r} = \lambda \vec{r}\) for some \(\lambda \in \mathbb{R}\), then \(\vec{r}\) is an eigenvector and \(\lambda\) the corresponding eigenvalue.
A number \(\lambda\) is an eigenvalue of A if and only if it satisfies the characteristic equation \(|A - \lambda I| = 0.\)
For an order \(n\) matrix there are \(n\) (not necessarily unique) eigenvalues. Eigenvalues can also be \(\in \mathbb{C}\).
Recall that diagonal matrices are written \(\textrm{diag}[a_{11}, a_{22}, ..., a_{nn}]\).
Diagonalisation of Matrices
For an order \(n\) matrix A with \(n\) linearly independent eigenvectors: \(A = UDU^{-1}\), where \(D = \textrm{diag}[\lambda_1, \lambda_2, ..., \lambda_n]\) (the eigenvalues) and the columns of \(U = [\vec{v}_1, \vec{v}_2, \dots, \vec{v}_n]\) are the corresponding eigenvectors of those eigenvalues.
Note that if you have repeated eigenvalues, you have to find multiple distinct (linear independence) eigenvectors for that eigenvalue.
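A quick numerical sketch, assuming numpy is available: np.linalg.eig returns the eigenvalues and a matrix whose columns are eigenvectors, which play the roles of \(D\) and \(U\) above for a diagonalisable matrix.

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigenvalues, U = np.linalg.eig(A)   # columns of U are eigenvectors
D = np.diag(eigenvalues)

print(eigenvalues)                                 # 5 and 2 for this A
print(np.allclose(A, U @ D @ np.linalg.inv(U)))    # True: A = U D U^-1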
Sequences and Series
Sequences
A sequence \((a_n)\) is an infinite list of numbers \((a_0, a_1,...)\). We can define sequences through nth term rules or recursively.
Convergent Sequences
A sequence \(a_n\) is said to converge to a limit \(l \in \mathbb{R}\) if for every small \(\epsilon > 0\) there is a related integer \(N\) such that \(|a_n - l| < \epsilon \; \forall n > N\). This is denoted as
\[\begin{align} & \lim_{n \rightarrow \infty} a_n = l; & a_n \rightarrow l. \end{align}\]In layman's terms, if a sequence converges, then for every (increasingly small) positive number, the difference between the sequence's terms and the limit will be eventually smaller.
\(N\) here is like a threshold indicator, where every term of the sequence after \(N\) will be within the \(\epsilon\) difference.
We generally work with \(n \in \mathbb{N}\).
Example. The sequence \(a_n = \frac{1}{n}\) converges to 0. We can prove this by going to the definition - let us have a small \(\epsilon > 0\).
Then \(\exists N : a_N - 0 < \epsilon\), or \(\frac{1}{N} < \epsilon\). We can choose any integer \(N > \frac{1}{\epsilon}\), to demonstrate that this is possible for any \(\epsilon\). For any \(n > N\): \(|\frac{1}{n} - 0| = \frac{1}{n} < \frac{1}{N} < \epsilon,\) to write it out as an equation.
If possible, try break a large sequence down into simpler components. They can be combined using the following:
Combination Rules for Convergent Sequences
For convergent sequences \(a_n \rightarrow \alpha, b_n \rightarrow \beta, c_n \rightarrow \gamma\):
- Sum rule: \(a_n + b_n \rightarrow \alpha + \beta\)
- Scalar multiple rule: \(\lambda a_n \rightarrow \lambda\alpha, \; \lambda \in \mathbb{R}\)
- Product rule: \(a_n b_n \rightarrow \alpha\beta\)
- Reciprocal rule: \(\frac{1}{a_n} \rightarrow \frac{1}{\alpha}\) (provided \(\alpha \neq 0\), likewise for the next two rules)
- Quotient rule: \(\frac{b_n}{a_n} \rightarrow \frac{\beta}{\alpha}\)
- Hybrid rule: \(\frac{b_n c_n}{a_n} \rightarrow \frac{\beta\gamma}{\alpha}\)
Exercise
Show that the following sequence converges, find its limit: \(a_n = \frac{(n+2)(2n-1)}{3n^2 + 1}\)
Solution. The components do not converge themselves, a technique here is dividing by the fastest increasing term. Dividing by \(n^2\), \(a_n = \frac{\frac{(n+2)}{n}\frac{(2n-1)}{n}}{\frac{3n^2 +1}{n^2}} = \frac{(1 + \frac{2}{n})(2 - \frac{1}{n})}{3 + \frac{1}{n^2}}.\)
Now the components \(\frac{1}{n}\) and \(\frac{1}{n^2}\) converge to 0, so applying combination rules we can get \(a_n \rightarrow \frac{(1+0)(2-0)}{3+0} = \frac{2}{3}.\)
Bounded sequences
A sequence \(a_n\) is bounded above if there is a number \(U : a_n \leq U \forall n\). \(a_n\) is bounded below if there is a number \(L : L \leq a_n \forall n\). A sequence is bounded if it is bounded both above and below.
Subsequences
A subsequence of a sequence is obtained from the original sequence by deleting some terms. We can say \(a_{2n}, a_{2n+1}\) are the even and odd subsequences respectively of \(a_n\).
A sequence \(a_n\) is an increasing sequence if \(a_{n+1} \geq a_n \;\forall n\). It is a decreasing sequence if \(a_{n+1} \leq a_n \;\forall n\).
Basic Properties of Convergent Sequences
- A convergent sequence has a unique limit.
- If \(a_n \rightarrow l\) then every subsequence of \(a_n\) also converges to \(l\).
- If \(a_n \rightarrow l\) then \(\|a_n\| \rightarrow \|l\|\).
- The squeeze rule. If \(a_n \rightarrow l, b_n \rightarrow l; a_n < c_n < b_n \; \forall n\) then \(c_n \rightarrow l\).
- A convergent sequence is always bounded.
- An increasing sequence which is bounded above converges. A decreasing sequence which is bounded below converges.
The sequence \(c_n\) is "squeezed" between two sequences which both converge to the same limit, so naturally it will too. (Bounded here means \(\exists B > 0 : - B \leq a_n \leq B \; \forall n\).)
You can demonstrate an alternating sequence (e.g. \((-1)^n\)) doesn't converge by looking at subsequences.
Exercise
Show that for \(x \geq 0\) and \(n > 0\): \((1+x)^{\frac{1}{n}} \leq 1 + \frac{x}{n}\).
Hence deduce/show that if \(c > 0\) then \(c^\frac{1}{n} \rightarrow 1\).
Reminder: the binomial theorem is \((1 + x)^n = 1 + nx + \frac{n(n-1)}{2!}x^2 + ... + \frac{n!}{k!(n-k)!}x^k\)
Solution. Firstly, we can rearrange and use the binomial theorem.
\[\begin{align} 1 + x & \leq (1 + \frac{x}{n})^n. \\ (1 + \frac{x}{n})^n & = 1 + n\frac{x}{n} + \textrm{ (other positive terms) via binomial exp, so} \\ 1 + x & \leq 1 + x + \textrm{ (other positive terms)} \\ \end{align}\]Thus if we take \(n^\textrm{th}\) roots of each side we will demonstrate that \((1+x)^\frac{1}{n} \leq 1 + \frac{x}{n}.\)
For the second part, we need to consider separately the cases \(c \geq 1\) and \(c < 1\).
If \(c \geq 1\) then \(c^\frac{1}{n} \geq 1\). If we use the inequality we demonstrated in the first part, letting \(x = c-1 \geq 0\) we get \(1 \leq c^\frac{1}{n} \leq 1 + \frac{c-1}{n}.\)
By the squeeze rule, we can (in an ideal world) see that \(c^\frac{1}{n} \rightarrow 1\).
Finally, if \(c < 1\) then \(\frac{1}{c} > 1\), so \((\frac{1}{c})^\frac{1}{n} \rightarrow 1\) by the previous case, and by the reciprocal rule \(c^\frac{1}{n} \rightarrow 1\).
Did you get that? Me neither.
Divergent Sequences
A sequence \(a_n\) is said to diverge to infinity if \(\forall K \in \mathbb{R}\; \exists N : n > N \implies a_n > K\).
In plain English, there is a point in \(a_n\) where the terms are greater than any real number one picks.
We denote this as \(a_n \rightarrow \infty\). \(a_n\) diverges to \(-\infty\) if \(-a_n \rightarrow \infty\).
A divergent sequence that doesn't go off to infinity is said to oscillate.
Basic convergent sequences
\[\begin{align} & \lim_{n \rightarrow \infty} \frac{1}{n^p} = 0 & \forall p > 0 \\ & \lim_{n \rightarrow \infty} c^n = 0 & \forall c : \|c\| < 1 \\ & \lim_{n \rightarrow \infty} c^\frac{1}{n} = 1 & \forall c > 0 \\ & \lim_{n \rightarrow \infty} n^p c^n = 0 & \forall p > 0 \land \|c\| < 1 \\ & \lim_{n \rightarrow \infty} \frac{c^n}{n!} = 0 & \forall c \in \mathbb{R} \\ & \lim_{n \rightarrow \infty} (1 + \frac{c}{n})^n = e^c & \forall c \in \mathbb{R} \end{align}\]
Series
A series \(\sum a_n\) is a pair of sequences:
- A sequence \(a_n\) called the sequence of terms
- A sequence \(s_n\) called the sequence of partial sums defined as \(s_n = \sum_{k=0}^{n} a_k.\)
If the sequence of partial sums converges to \(s\), then the series converges to the sum \(s\); \(\sum_{n=0}^{\infty} a_n = s.\) Divergent if not.
A standard one is the geometric series, \(\sum r^n\) which will always converge to \(\frac{1}{1-r}\) when \(|r| < 1\). You can prove this by working out \(rs_n - s_n\).
\[\begin{align} s_n - rs_n = 1 - r^{n+1} & \Longleftrightarrow s_n(1-r) = 1 - r^{n+1}\\ & \Longleftrightarrow s_n = \frac{1 - r^{n+1}}{1-r}. \end{align}\]And \(|r| < 1\) means that \(r\)-power will converge to 0.
A standard divergent series is the harmonic series, \(\sum \frac{1}{n}\). You can prove this by estimating partial sums \(s_2, s_4, s_8, s_{16}, ...\) and see that it will forever build to \(1 + 0.5 + 0.5 + 0.5 + ...\). \(s_{2^n} > 1 + \frac{n}{2}.\) Though any series \(\sum \frac{1}{n^k}\) where \(k > 1\) will always converge.
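A tiny numerical illustration of the contrast (partial sums only suggest the behaviour, they do not prove it):

# Partial sums of the geometric series (r = 1/2) settle near 1/(1 - r) = 2,
# while partial sums of the harmonic series keep creeping upwards.
for N in (10, 100, 1000, 10000):
    geometric = sum(0.5**n for n in range(N + 1))
    harmonic = sum(1 / n for n in range(1, N + 1))
    print(N, round(geometric, 6), round(harmonic, 3))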
Basic Properties of Convergent Series
- Sum rule, \(\sum a_n\) converges to \(s,\; \sum b_n\) converges to \(t \implies \sum (a_n + b_n)\) converges to \(s + t\).
- Multiple rule, \(\sum a_n\) converges to \(s, \; \lambda \in \mathbb{R} \implies \sum \lambda a_n\) converges to \(\lambda s\).
- \(\sum a_n\) converges \(\implies a_n \rightarrow 0\).
- \(\sum \|a_n\|\) converges \(\implies \sum a_n\) also converges.
No. 4 is only useful really when you have an "oscillating" series, like \(\sum \frac{(-1)^n}{2^n}\).
The Comparison Test
Suppose that \(0 \leq a_n \leq b_n\) for all \(n\),
- If \(\sum b_n\) converges then so does \(\sum a_n\)
- If \(\sum a_n\) diverges then so does \(\sum b_n\).
Usually with comparison test questions you just pull out equations from thin air, but (especially with rational functions) there's usually a technique for doing so based on the largest power of \(n\), and slowly chipping away a piece at a time until we get a very simple expression.
Exercises
Determine whether the following series converges or diverges:
- \[\sum \frac{n+2}{n^3 - n^2 + 1}\]
- \[\sum \frac{n^2 + 4}{2n^3 - n + 1}\]
Solution 1. Take a look at the fraction \(\frac{n+2}{n^3 - n^2 + 1}\). If we want to find a smaller fraction, we want a smaller numerator and a larger denominator than our current fraction.
\[\begin{align} & n+2 \geq n & n^3 - n^2 + 1 \leq n^3 \end{align}\]So we get the smaller fraction \(\frac{n}{n^3}\) or \(\frac{1}{n^2}\), which we know will converge as a series. No help there.
For a larger fraction, we want a bigger numerator and a smaller denominator;
\[\begin{align} n+2 & \leq n + 2n \textrm{ or } 3n \\ n^3 - n^2 + 1 \geq n^3 - n^2 \textrm{ or } n^2 (n -1) & \geq \frac{n^3}{2} \textrm{ (for } n \geq 2). \end{align}\]Admittedly, at the end we were pulling expressions out of thin air a bit, but hopefully you can see that it's all valid. Thus \(\frac{n+2}{n^3 - n^2 + 1} \leq \frac{6}{n^2}\) (for \(n \geq 2\)), and since we know \(\sum \frac{6}{n^2}\) converges (\(6 \sum \frac{1}{n^2}\)) the original one must too.
Solution 2. We're gonna be cheat-y and say that this sequence does diverge. To show this, let's find a smaller fraction which diverges.
\[\begin{align} n^2 + 4 & \geq n^2 \\ 2n^3 -n + 1 & \leq 2n^3 \end{align}\](the \(-n\) more than cancels out the \(+1\)) and we can see that this gives us \(\frac{1}{2n} \leq \frac{n^2 + 4}{2n^3 -n+1}\) And oh no, \(\sum \frac{1}{2n}\) diverges (multiple rule) so the original one must too.
The Ratio Test
If \(\left| \frac{a_{n+1}}{a_n} \right| \rightarrow L\) then
- \(0 \leq L < 1 \implies \sum a_n\) converges.
- \(L > 1\) or \(L \textrm{ is } \infty \implies \sum a_n\) diverges.
- \(L = 1 \implies\) the test is inconclusive.
Useful in dealing with factorials.
Basic Convergent Series
- \(\sum_{n=0}^{\infty} r^n = \frac{1}{1-r}\) for all \(r : \|r\| < 1\)
- \(\sum \frac{1}{n^k}\) converges for all \(k > 1\)
- \(\sum n^k r^n\) converges for \(k > 0 \land \|r\| < 1\)
- \(\sum_{n=0}^{\infty} \frac{c^n}{n!} = e^c\) for all \(c \in \mathbb{R}\)
Basic Divergent Series
- \(\sum \frac{1}{n^k}\) diverges for all \(k \leq 1\).
Power series
A power series is one of the form \(\sum a_n x^n\). Usually we start at \(n = 0\), such that the sequence goes \(a_0, a_1x, a_2x^2, ...\)
Basic Properties of Power Series
Let
\[\begin{align} f(x) & = \sum_{n=0}^{\infty} a_nx^n & x \in (-R_1, R_1) \\ g(x) & = \sum_{n=0}^{\infty} b_nx^n & x \in (-R_2, R_2) \end{align}\]For positive \(R_1, R_2\), and let \(R = \min(R_1, R_2)\). Then for all \(x \in (-R , R)\),
- If \(f(x) = g(x)\) then \(a_n = b_n\) for all \(n\)
- Sum rule, \(f(x) + g(x) = \sum_{n=0}^{\infty} (a_n + b_n) x^n\)
- Multiple rule, \(\lambda f(x) = \sum_{n=0}^{\infty} \lambda a_nx^n \; \forall \lambda \in \mathbb{R}\)
- Product rule, \(f(x)g(x) = \sum_{n=0}^{\infty} (a_0b_n + a_1b_{n-1} + ... + a_{n-1}b_1 + a_nb_0)x^n\).
This also gives the general binomial theorem, which for any \(q \in \mathbb{Q}, x \in (-1, 1)\)
\[\begin{align} (1+x)^q & = \sum_{n=0}^{\infty} {q \choose n} x^n \\ {q \choose n} & = \frac{q(q-1)\cdots(q-n+1)}{n!} \end{align}\]
Radius of convergence
Lemma. If \(\sum a_n R^n\) converges for some \(R \geq 0\), then \(\sum a_n x^n\) converges \(\forall x : \|x\| < R\). (proof in notes)
\(R \geq 0\) is the radius of convergence of a power series \(\sum a_n x^n\) if it converges according to the above, and diverges if \(|x| > R\). If a series converges \(\forall x\) then the radius is infinity.
If the series \(\sum a_n x^n\) has a conv. rad \(R\) then it defines a function
\[\begin{align} & f(x) = \sum_{n=0}^{\infty} a_nx^n & \forall x \in (-R, R) \end{align}\]You can find the radius of convergence using ratio test, essentially evaluate \(\lim_{n \rightarrow \infty} \|\frac{a_{n+1}}{a_n}\| < 1.\) and you will get an \(\|x\| < ...\) where that something is your radius.
Addendum: Partial Fractions
Partial fractions can be important in series.
Where a rational function is decomposed into a sum of simpler fractions, such as \(\frac{cx+d}{(x-a)(x-b)} = \frac{A}{x-a} + \frac{B}{x-b}.\)
When there is no repeated factor, simply substituting \(x = a, x=b\) can eliminate a term and make it easy to find the unknowns \(A, B\). When there is however, we need to just pick values of \(x\) and solve from there.
If there is a repeated factor \((x-a)^n\), the decomposition will have fractions with denominators \((x-a), (x-a)^2,...\) up to \((x-a)^n\)
If the degree of the numerator is higher than the denominator, then we have to divide out the numerator (polynomial long division) to get a valid separable fraction, \(\frac{x^3 + 3x}{(x+1)(x-3)} = x+2 + \frac{10x+6}{(x+1)(x-3)}.\)
Recurrences
Like the differential equations of sequences.
A recurrence is a sequence defined, funnily enough, recursively.
For example, the Fibonacci sequence \(F_n\) is defined as \(F_n = F_{n-1} + F_{n-2}\) \(F_0 = 0, F_1 = 1.\)
The number of steps needed in the Towers of Hanoi game satisfies the recurrence \(T_n = 2T_{n-1} + 1.\) For \(T_0 = 0\) this gives \(0, 1, 3, 7, 15, ...\) which we can see is \(T_n = 2^n - 1\). This latter form is the closed form of \(T_n\) and allows for immediate computation.
We are looking at solving linear recurrences with constant coefficients, of the form \(x_n + a_1x_{n-1} + ... + a_kx_{n-k} = f(n)\) where \(f\) is a given function. (If the first \(k\) terms are given, then this describes a unique sequence.)
Homogenous recurrences
More importantly, we will look at homogenous (\(f(n) = 0\)) recurrences with \(k = 2\). Thus, we want the general solution of \(x_n + ax_{n-1} + bx_{n-2} =0.\)
In the general case, we want to look for solutions of the form \(A\lambda^n\) where \(\lambda, A\) are constants. A bit of deriving later, we can get the auxiliary equation in lambda.
The auxiliary equation of the recurrence \(x_n + ax_{n-1} + bx_{n-2} = 0\) is \(\lambda^2 + a\lambda + b = 0.\)
General Solution. of the recurrence \(x_n + ax_{n-1} + bx_{n-2} = 0\),
Let \(\lambda_1, \lambda_2\) be the roots of the auxiliary equation.
- \[\lambda_1 \neq \lambda_2 \implies x_n = A\lambda_1^n + B\lambda_2^n\]
- \[\lambda_1 = \lambda_2 \implies x_n = A\lambda_1^n + Bn\lambda_2^n\]
For constants \(A, B\), which can be found if the first two terms of the sequence are known.
Exercise
Find a closed form for the Fibonacci sequence.
Solution. We have \(F_n = F_{n-1} + F_{n-2}\) so \(F_n - F_{n-1} - F_{n-2} = 0\). The auxiliary equation is \(\lambda^2 - \lambda - 1 = 0\) Which has roots \(\lambda = \frac{1 \pm \sqrt{5}}{2}\) Or \(\phi\) and \(-(\frac{1}{\phi})\) (the golden ratio phi). Hence \(F_n = A(\frac{1 + \sqrt{5}}{2})^n + B(\frac{1 - \sqrt{5}}{2})^n.\) We know \(F_0 = 0, F_1 = 1\) so if we substitute these values in, we get that \(A = \frac{1}{\sqrt{5}}, B = -\frac{1}{\sqrt{5}}\).
\[\begin{align} F_n & = \frac{1}{\sqrt{5}}\left(\frac{1 + \sqrt{5}}{2}\right)^n -\frac{1}{\sqrt{5}}\left(\frac{1 - \sqrt{5}}{2}\right)^n \\ & = \frac{1}{\sqrt{5}}(\phi^n - (-\phi)^{-n}). \end{align}\]Actually, since the second term always has magnitude less than \(\frac{1}{2}\), by some magic we can further simplify down to rounding to the nearest integer: \(F_n = \left \lfloor{\frac{\phi^n}{\sqrt{5}} + \frac{1}{2}}\right \rfloor\)
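A quick Python check of the recurrence against this rounded closed form (just a numerical sanity check):

from math import sqrt

phi = (1 + sqrt(5)) / 2

def fib_closed(n):
    # Closed form rounded to the nearest integer.
    return round(phi**n / sqrt(5))

# Compare with the recurrence F_n = F_{n-1} + F_{n-2}, F_0 = 0, F_1 = 1.
a, b = 0, 1
for n in range(20):
    assert a == fib_closed(n), n
    a, b = b, a + b
print("closed form matches the recurrence for n < 20")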
Non-homogenous recurrences have the \(f(n)\) bit not 0.
Non-homogenous recurrences
Finding solutions to non-homogenous recurrences of the form \(x_n + ax_{n-1} + bx_{n-2} = f(n)\)
- Find the general solution \(x_n = h_n\) of the homogenous recurrence: \(x_n + ax_{n-1} + bx_{n-2} = 0.\) (will have 2 arbitrary constants)
- Find any particular solution \(x_n = p_n\) of the original recurrence.
- The general solution will be \(x_n = h_n + p_n\).
Finding the particular solution \(p_n\) is not straightforward, but the technique is to try solutions that are similar in form to \(f(n)\). Substitute these into the original recurrence (as they are meant to be solutions!) and try to solve.
If \(f(n)\) is a constant, try find a constant \(k\).
If \(f(n)\) is a polynomial, try find a same degree polynomial. (substitute in a general polynomial)
Etc. just like differential equations.
Decimal Representation of Reals
Rational numbers can be represented by decimal expansions; in particular, repeating decimals represent rationals.
Repeating decimals can be expressed as a fraction by summing an appropriate geometric series.
For example, \(0.4\overline{9}\) can be expanded as follows;
\[\begin{align} 0.4\overline{9} & = \frac{4}{10} + \frac{9}{10^2} + \frac{9}{10^3} + ... \\ & = \frac{4}{10} + \frac{9}{100}(1 + \frac{1}{10} + \frac{1}{10^2} + ...) \\ & = \frac{4}{10} + \frac{9}{100}(\frac{1}{1-0.1}) = \frac{4}{10} + \frac{9}{90} = \frac{4}{10} + \frac{1}{10} = \frac{1}{2}. \end{align}\]The decimal expansion of any arbitrary real can be computed by squeezing between two converging sequences.
The details, if you really want to see them:
If that real \(x\) is not an integer, it is between one integer \(a_0\) and the next consecutive one. Now divide that unit gap into 10 equal parts - if \(x\) is not exactly on a division point then it is between two consecutive division points, \(a_0 + \frac{a_1}{10} < x < a_0 + \frac{a_1 + 1}{10}\) where \(a_1 \in \{0..9\}\). We can then repeat to get \(a_0 + \frac{a_1}{10} + \frac{a_2}{10^2} < x < a_0 + \frac{a_1}{10} + \frac{a_2 + 1}{10^2}\) etc etc.
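As a quick exact check of the \(0.4\overline{9}\) computation, using Python's Fraction and the \(\frac{1}{1-r}\) formula for the geometric series:

from fractions import Fraction

r = Fraction(1, 10)
# 0.4999... = 4/10 + (9/100)(1 + r + r^2 + ...) = 4/10 + (9/100) / (1 - r)
value = Fraction(4, 10) + Fraction(9, 100) / (1 - r)
print(value)   # 1/2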
Every rational has a terminating or repeating decimal, there's some formal rigorous waffle and then you compute it via long division really.
Not sure why this is here save the fact that yes decimals are related to series.
Calculus
Limits and Continuity
(Definitions) let a function \(f: I \longrightarrow \mathbb{R}\) be defined on an interval \(I\). We're interested in a point \(a\) in that interval, where \(f\) may or may not be defined.
When \(f(x)\) tends to a value \(l\) as \(x\) tends towards \(a\) from the left (negative side), we write the limit \(\lim_{x \rightarrow a-} f(x) = l\). Similarly, if \(f(x)\) tends to \(l\) as \(x\) tends to \(a\) from the other side, we write \(\lim_{x \rightarrow a+} f(x) = l\). If \(\lim_{x \rightarrow a-} f(x) = l\) and \(\lim_{x \rightarrow a+} f(x) = l\) then \(\lim_{x \rightarrow a} f(x) = l.\) What this says is that for every sequence \(x_n\) in \(I\) which converges to \(a\) with \(x_n \neq a\) for all \(n\), the sequence \(f(x_n)\) converges to \(l\).
Floor and Ceiling functions
The \(\left \lfloor{\textrm{floor}}\right \rfloor\) and \(\left \lceil{\textrm{ceiling}}\right \rceil\) functions always round down and up respectively to the nearest integer.
\(\forall k \in \mathbb{Z}\):
\[\begin{align} & \lim_{x \rightarrow k-} \left \lfloor{x}\right \rfloor = k-1, & \lim_{x \rightarrow k+} \left \lfloor{x}\right \rfloor = k; \end{align}\] \[\begin{align} & \lim_{x \rightarrow k-} \left \lceil{x}\right \rceil = k, & \lim_{x \rightarrow k+} \left \lceil{x}\right \rceil = k+1. \end{align}\]
Exercise
Prove that \(\lim_{x \rightarrow 0} \frac{1}{x}\) does not exist.
Proof. We need to find a sequence \(x_n \rightarrow 0\) where \(f(x_n)\) does not converge. We can magic the sequence \(x_n = \frac{1}{n}\) which does converge to 0, but \(f(x_n) = n\) is an unbounded sequence which does not converge. Hence, the limit does not exist.
Combination rules for limits
If \(\lim_{x \rightarrow a} f(x) = l\)
and \(\lim_{x \rightarrow a} g(x) = m\) then
- Sum rule: \(\lim_{x \rightarrow a} (f(x) + g(x)) = l + m\)
- Multiple rule: \(\lim_{x \rightarrow a} \lambda f(x) = \lambda l\), \(\lambda \in \mathbb{R}\)
- Product rule: \(\lim_{x \rightarrow a} f(x)g(x) = lm\)
- Quotient rule: \(\lim_{x \rightarrow a} \frac{f(x)}{g(x)} = \frac{l}{m}\) provided \(m \neq 0\)
Squeeze rule for limits
If \(f(x) \leq g(x) \leq h(x)\) for all \(x \neq a\),
\[\lim_{x \rightarrow a} f(x) = l \land \lim_{x \rightarrow a} h(x) = l \implies \lim_{x \rightarrow a} g(x) = l.\]Continuous functions
A continuous function is one with no "jumps" -
A function \(f : D \longrightarrow \mathbb{R}\) (\(D \subseteq \mathbb{R}\)) is continuous at a point \(a\) if \(\lim_{x \rightarrow a} f(x)\) exists and equals \(f(a)\). \(f\) is continuous if it is continuous at every point of its domain.
Combination rules for continuous functions
If \(f, g\) are continuous at \(a\), then so are
- The sum \(f+g\)
- The scalar multiple \(\lambda f\) (\(\lambda \in \mathbb{R}\))
- The product \(fg\)
- The quotient \(\frac{f}{g}\) provided \(g(a) \neq 0\)
If \(f\) is continuous at \(a\) and \(g\) is continuous at \(f(a)\), then the composite \(g \circ f\) is also continuous at \(a\).
\(g \circ f \equiv g(f(x))\).
Basic continuous functions
- polynomials and rational functions*
- The modulus/absolute function
- The square root function, and nth root function where \(n \in \mathbb{Z}^+\)
- Trig functions
- Exponential functions
- Functions defined by power series
*The domain of a rational function excludes the points where its denominator is zero, and on that domain it is continuous.
Intermediate Value Theorem
If \(f : [a, b] \longrightarrow \mathbb{R}\) is continuous and \(f(a), f(b)\) have opposite signs, then there is a point \(c\) between \(a\) and \(b\) with \(f(c) = 0\).
Exercise
Show that there is a number \(x : x^{179} + \frac{163}{1 + x^2} = 119\).
Proof. Rearrange, let \(f(x) = x^{179} + \frac{163}{1 + x^2} - 119\).
\(f(0) > 0,\; f(1) < 0\), thus by the intermediate value theorem \(\exists x \in [0, 1] : f(x) = 0\).
(You do have to conjure numbers to do this question)
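The same idea underpins a practical root-finding method; a minimal bisection sketch applied to the function above (a sketch, not a rigorous solver):

def bisect(f, a, b, steps=60):
    # Approximate a root of f in [a, b], assuming f(a) and f(b) differ in sign.
    for _ in range(steps):
        m = (a + b) / 2
        if f(a) * f(m) <= 0:   # keep the half-interval where f still changes sign
            b = m
        else:
            a = m
    return (a + b) / 2

f = lambda x: x**179 + 163 / (1 + x**2) - 119
root = bisect(f, 0, 1)
print(root, f(root))   # f(root) should be very close to 0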
Extreme Value Theorem
If \(f : [a, b] \longrightarrow \mathbb{R}\) is continuous, then \(\exists m, M \in [a, b]:\)
\[\begin{align} & f(m) \leq f(x) \leq f(M) & \forall x \in [a, b] \end{align}\]Basically, a continuous function attains a maximum and a minimum value on a closed bounded interval.
Differentiation
If the limit \(\lim_{h \rightarrow 0} \frac{f(a + h) - f(a)}{h}\) of a real function \(f(x)\) at a point \(a\) exists, then \(f(x)\) is differentiable at \(a\), and we call the limit \(f'(a)\).
The \(\frac{d}{dx}\) thing is called Leibniz notation.
Theorem. If \(f\) is differentiable at \(a\), \(f\) is also continuous at \(a\). However, \(f\) may be continuous but not differentiable.
For example, the function \(f(x) = |x|\) is continuous at 0 but not differentiable, because at 0 the relevant limit does not exist (there's a crinkle).
Also, though the notes do everything from first principles ("rigour"), \(\frac{d(x^n)}{dx} = nx^{n-1}\) - in case you didn't know.
Combination Rules for Derivatives
If \(f, g\) are differentiable, the following are also differentiable and follow these rules:
- Sum rule (\(f + g\)): \((f +g)' = f' + g'\)
- Multiple rule (\(\lambda f\), \(\lambda \in \mathbb{R}\)): \((\lambda f)' = \lambda f'\)
- Product rule (\(fg\)): \((fg)' = fg' + gf'\)
- Quotient rule (\(f/g\)): \((\frac{f}{g})' = \frac{gf' - fg'}{g^2}\)
Trig derivatives
\[\begin{array}{ccc} & \frac{d(\sin x)}{dx} = \cos x & \frac{d(\cos x)}{dx} = -\sin x & \frac{d(\tan x)}{dx} = \frac{1}{\cos ^2 x} = \sec ^2 x. \end{array}\]
The Chain Rule
Providing the functions \(y = f(z), \; z = g(x)\) are differentiable
\[\begin{align} \frac{dy}{dx} & = \frac{dy}{dz} \times \frac{dz}{dx} \\ \textrm{or } (f \circ g)'(x) & = g'(x) f'(g(x)) \end{align}\]
Differentiation of functions defined by power series
If \(\sum a_nx^n\) is a power series with radius of convergence R, and \(f\) is defined \(f(x) = \sum_{n=0}^\infty a_nx^n\) for \(x \in (-R, R)\), then it is differentiable and \(f'(x) = \sum_{n=1}^\infty na_nx^{n-1}\)
Pay attention: \(n=0\) changes to \(n=1\)
The exponential function \(\exp(x)\) or \(e^x\) can be defined as a power series (its Maclaurin series) and thus differentiable to get... \(e^x\).
Exercise
Find \(\sum_{n=1}^\infty \frac{n}{2^n}\).
Solution. You've got to bear with me on this one, because the given solution is clearly written by someone who knew the answer already and have left out a lot of the logic leaps.
We start with a standard definition, for \(|x| = 1\) \(\sum_{n=0}^\infty x^n = \frac{1}{1-x}\) We differentiate the left to get \(\sum_{n=1}^\infty nx^{n-1} = \frac{d}{dx} (\frac{1}{1-x}).\) And the right resolves into \((1-x)^{-2}\).
This differentiated series is very similar to the sum we want to find, so much so that if we take \(x = \frac{1}{2}\), \(\sum_{n=1}^\infty n(\frac{1}{2})^{n-1} = \sum_{n=1}^\infty \frac{n}{2^{n-1}}\) Which resolves down to \(4\), since we can just substitute \(x = \frac{1}{2}\) into the other side, \((1-x)^{-2}\). Note that to get from \(\frac{n}{2^{n-1}}\) to \(\frac{n}{2^n}\) we have to multiply by \(\frac{1}{2}\), so we multiply both sides by a half to get \(\sum_{n=1}^\infty \frac{n}{2^n} = 2.\)
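A quick numerical sanity check of that answer (a throwaway Python sketch of my own):

```python
# Partial sums of n / 2^n should creep up towards 2.
print(sum(n / 2**n for n in range(1, 60)))  # ≈ 2.0
```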
Properties of Differentiable Functions
(Things to be aware of.) Turning points, which are local maxima and minima.
Turning Point Theorem
If a differentiable function \(f\) has a turning point at \(a\), \(f'(a) = 0\).
All points \(a\) where \(f'(a) = 0\) are called stationary points, which aren't necessarily turning points. A stationary point which is neither a local maximum nor a local minimum is a point of inflection.
To locate maxima and minima of a continuous function in a range \([a, b]\), we need only consider values at
- The stationary points of \(f\) in \([a, b]\)
- The end points \(a, b\)
Rolle's Theorem
If \(f : [a, b] \longrightarrow \mathbb{R}\) is continuous, is differentiable on \((a, b)\), and \(f(a) = f(b)\), then there is a point \(c \in (a, b) : f'(c) = 0\)
You have two end points with the same \(y\) value, so in the middle there must be some value which is a stationary point. (on horizontal lines, all the points work.)
Mean Value Theorem
If \(f\) is continuous on \([a, b]\) and differentiable on \((a, b)\) then \(\exists c \in (a,b) :\) \(f'(c) = \frac{f(b) - f(a)}{b-a}.\)
Which is just Rolle's theorem but on an angled graph.
Consequences of the MVT
Suppose \(f\) is continuous on \([a,b]\) and differentiable on \((a,b)\),
- \(f'(x) = 0 \forall x \in (a, b) \implies f\) is constant on \([a, b]\)
- \(f'(x) > 0 \forall x \in (a, b) \implies f\) is strictly increasing
- \(f'(x) < 0 \forall x \in (a, b) \implies f\) is strictly decreasing
- Second Derivative Test. Suppose \(f'(c) = 0\). \(f''(c) > 0 \implies f\) has local minimum; \(f''(c) < 0 \implies f\) has local maximum.
For curve sketching:
- Find stationary points, their co-ordinates, and their nature (max/min/inflection)
- Find where \(f(x) = 0\), i.e. intersects \(x\) axis
- Find \(f\) where \(x = 0\), i.e. y-intercept
- Determine what happens to \(f(x)\) as \(x \rightarrow \pm\infty\) (asymptotes?)
- Investigate the behaviour near any points where \(f(x) \rightarrow \pm\infty\), e.g. where a divide by zero occurs, or near other asymptotes.
L'Hopital's Rule and Implicit Differentiation
L'Hopital's Rule
Suppose that \(f(a) = 0, g(a) = 0\). If
\(\lim_{x \rightarrow a} \frac{f'(x)}{g'(x)}\) exists, then \(\lim_{x \rightarrow a} \frac{f(x)}{g(x)}\) also exists, and \(\lim_{x\rightarrow a} \frac{f(x)}{g(x)} = \lim_{x \rightarrow a} \frac{f'(x)}{g'(x)}.\)
You can repeat this and keep on differentiating if you keep getting \(\frac{0}{0}\), however there is no guarantee that you will get anywhere.
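For instance, \(\lim_{x \rightarrow 0} \frac{1 - \cos x}{x^2}\) needs the rule twice, since both first derivatives still vanish at 0. A sketch with sympy (assuming it is installed), comparing the direct limit with the ratio of second derivatives:

```python
import sympy as sp

x = sp.symbols('x')
f, g = 1 - sp.cos(x), x**2

print(sp.limit(f / g, x, 0))                                 # 1/2, computed directly
print(sp.limit(sp.diff(f, x, 2) / sp.diff(g, x, 2), x, 0))   # 1/2, after two rounds of L'Hopital
```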
Implicit differentiation
Implicit differentiation deals with functions not of the form \(y = f(x)\), such as \(x^2 + y^2 = 1\).
Two methods (that work the same):
- Either consider each \(y\) term as a function of \(x\): differentiate it with respect to \(y\) and, because of the chain rule, append a \(\frac{dy}{dx}\); differentiate \(x\) terms as normal.
- Or, consider both as functions and append a \(dx\) or \(dy\).
Exercise
Find \(\frac{dy}{dx}\) of \(y \sin x + \tan^{-1} y = 0\) in terms of \(y, x\).
Solution. First, differentiate \(y \sin x\). By product rule,
\(d(y \sin x) = \sin(x)dy + y\cos(x)dx\) Then \(\tan^{-1} y\) is a standard derivative, \(d(\tan^{-1} y) = \frac{1}{1+y^2} dy\) Thus we get \(\sin(x)dy + y\cos(x)dx + \frac{1}{1+y^2}dy = 0\) Collecting terms, \((\sin(x) + \frac{1}{1+y^2})dy = (-y\cos(x))dx\) Dividing through, \(\frac{dy}{dx} = \frac{-y \cos x}{\sin x + \frac{1}{1+ y^2}}\) which one can simplify if desired.
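A symbolic check of that answer, assuming sympy is available. It uses the general fact that for a relation \(F(x, y) = 0\), implicit differentiation gives \(\frac{dy}{dx} = -\frac{\partial F / \partial x}{\partial F / \partial y}\):

```python
import sympy as sp

x, y = sp.symbols('x y')
F = y * sp.sin(x) + sp.atan(y)          # the relation F(x, y) = 0 from the exercise

dydx = -sp.diff(F, x) / sp.diff(F, y)   # -F_x / F_y
print(sp.simplify(dydx))                # equivalent to -y*cos(x)/(sin(x) + 1/(1 + y**2))
```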
Differentiating Inverse Functions
Recall from 130 Injection, Bijection, and Surjection. Recall also the range of a function \(f : A \longrightarrow B\) is \(\{y \in B : f(x) = y, x \in A \}\).
Let \(f : [a, b] \longrightarrow \mathbb{R}\) be a continuous function. If \(f\) differentiable on \((a, b)\) and \(f'(x) > 0 \lor f'(x) < 0 \; \forall x \in (a, b)\), then \(f\) has an inverse function \(f^{-1}\) which is differentiable. Let \(y = f(x);\)
\[\begin{align} & (f^{-1})'(y) = \frac{1}{f'(x)} \equiv & \frac{dx}{dy} = \frac{1}{\frac{dy}{dx}} \end{align}\]Example
To find the derivative of \(f\) given \(f(x) = x^\frac{1}{n}\) for some fixed \(n > 0\): the inverse function is obtained by writing \(y = x^\frac{1}{n}\), so that \(x = y^n\), thus \(f^{-1}(y) = y^n\). So
\[\begin{align} & \frac{dx}{dy} = ny^{n-1} & \frac{dy}{dx} = \frac{1}{ny^{n-1}}. \end{align}\]\(y = x^\frac{1}{n}\) so \(y^{n-1} = x^\frac{n-1}{n}\), thus
\[\begin{align} f'(x) & = \frac{dy}{dx} = \frac{1}{nx^\frac{n-1}{n}} \\ & = \frac{1}{n} x^{\frac{1}{n}-1}. \end{align}\]We can differentiate any rational power of \(x\) by the following: \(\frac{d}{dx}x^\frac{p}{q} = ... = \frac{p}{q} x^{\frac{p}{q}-1},\) the same rule as for regular (integer-power) differentiation.
Trigonometric Inverse Derivatives
- \(\mathbf{sin}^{-1}\). The inverse for the function \(\sin : [\frac{-\pi}{2}, \frac{\pi}{2}] \longrightarrow [-1, 1]\);
- \(\mathbf{cos}^{-1}\). The inverse for the function \(\cos : [0, \pi] \longrightarrow [-1, 1]\);
- \(\mathbf{tan}^{-1}\). The inverse for the function \(\tan : (\frac{-\pi}{2}, \frac{\pi}{2}) \longrightarrow \mathbb{R}\).
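The notes stop at the domains; for reference (these are standard results, obtainable from the inverse-function rule above, and are not listed in the original notes), the derivatives are
\[\begin{align} & \frac{d(\sin^{-1} x)}{dx} = \frac{1}{\sqrt{1-x^2}}, & \frac{d(\cos^{-1} x)}{dx} = \frac{-1}{\sqrt{1-x^2}}, & \frac{d(\tan^{-1} x)}{dx} = \frac{1}{1+x^2}. \end{align}\]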
Integration
Definition of integration
Let \(f:[a, b] \longrightarrow\mathbb{R}\) be a bounded function. A partition of \([a,b]\) is a set \(P = \{x_0, x_1, x_2,...,x_n\} :\) \(a = x_0 < x_1 < x_2 < ... < x_{n-1} < x_n = b.\) For such a partition P let
- \(m_r\) be the greatest lower bound of \(\{f(x): x_{r-1} \leq x \leq x_r\}\)
- \(M_r\) be the least upper bound of \(\{f(x): x_{r-1} \leq x \leq x_r\}\)
Also define the lower sum \(L(f, P)\) and the upper sum \(U(f, P)\) as
\[\begin{align} L(f, P) & = \sum_{r=1}^{n} (x_r - x_{r-1})m_r, \\ U(f, P) & = \sum_{r=1}^{n} (x_r - x_{r-1})M_r. \end{align}\]These are the lower and upper estimates of the area under the graph of \(f\). For more points in the partition, we get a better estimate of an area.
If there is a unique number \(A\) : \(L(f,P) \leq A \leq U(f,P)\) for every partition \(P\) of \([a, b]\) then we say that \(f\) is integrable over \([a, b]\). \(A\) is the definite integral of \(f\) between \(a, b\). \(\vartriangleright\)
\[\begin{align} & \int_{a}^{a} f(x)dx = 0 & \int_{a}^{b} f(x)dx = - \int_{b}^{a} f(x)dx. \end{align}\]Some basic integrable functions (referring to the range \([a, b]\)):
- Any continuous function.
- Any function which is increasing.
- Any function which is decreasing.
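To make the lower/upper sum definition concrete, here is a rough Python sketch (my own illustration) that estimates \(L(f, P)\) and \(U(f, P)\) for \(f(x) = x^2\) on \([0, 1]\) with an equally spaced partition; as \(n\) grows, the two squeeze together towards the integral:

```python
def lower_upper_sums(f, a, b, n=1000, samples=50):
    """Crude L(f, P) and U(f, P) for an equally spaced partition of [a, b].

    m_r and M_r are estimated by sampling f inside each subinterval,
    which is good enough for well-behaved functions."""
    width = (b - a) / n
    L = U = 0.0
    for r in range(n):
        x0, x1 = a + r * width, a + (r + 1) * width
        ys = [f(x0 + k * (x1 - x0) / samples) for k in range(samples + 1)]
        L += width * min(ys)   # (x_r - x_{r-1}) * m_r
        U += width * max(ys)   # (x_r - x_{r-1}) * M_r
    return L, U

print(lower_upper_sums(lambda x: x**2, 0, 1))  # both close to 1/3
```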
After this section are basic rules for integrals. However, I'm going to put collapsible sections in here for more integration techniques one can use.
Integration by Substitution
Like reverse-engineering a chain rule, this method relies on the integrand being expressible in the following form: \(\int f(g(x))g'(x)dx\) Where the function \(g(x)\) is contained within a bigger function, and its derivative sits outside as a separate factor.
We replace \(g(x)\) with a new variable, such as \(u\) and \(g'(x)dx\) with \(du\), and integrate over just the function of \(u\), \(\int f(u) du\), replacing in what \(u\) is meant to be afterwards.
Why it works...
Let \(u = g(x)\). If we differentiate \(g(x)\) we get \(\frac{du}{dx} = g'(x)\) Now, look at the integral, \(\int f(g(x))g'(x)dx\). Replace every \(g\) instance with the equivalent \(u\) form to get
\[\begin{align} \int f(g(x))g'(x)dx & = \int f(u) \frac{du}{dx} dx \\ & = \int f(u) du. \end{align}\]Try working out \(\int 2x\cos(x^2) dx\) this way, and \(\int \frac{9x^2 + 2}{3x^3+2x+7} dx\).
Answers...
- \[\sin(x^2) + c\]
- \[\ln(3x^3 + 2x + 7) + c\]
Tangentially related is the reverse chain rule itself. This shortcut only works cleanly when the inner function is linear, so that the factor produced by the chain rule is a constant. For example, \(\sin(3x + 2)\) differentiates into \(3\cos(3x+2)\) by the chain rule, so integrating back we divide by that constant factor to get \(\int \cos(3x+2) dx = \frac{\sin(3x+2)}{3} + c\). For something like \(\int \cos(3x^2+2x) dx\) you cannot simply divide by the non-constant factor \(6x+2\).
In certain cases (e.g. \(\int \frac{1}{1 + x^2} dx\)) we would have to substitute trig functions - in this case \(x = \tan \theta\). (\(dx = \sec^2 (\theta)d\theta\).) Then when we substitute, using said example we get \(\int \frac{1}{\sec^2 \theta} \sec^2 \theta \: d\theta\). This is just \(\int 1 d\theta\) and is \(\theta + c\), or \(\tan^{-1} x + c\).
Integration by Parts
Integration by parts is just a formula:
\[\begin{align} & \int f(x)g(x)dx = f(x) \int g(x)dx - \int f'(x) [\int g(x) dx] dx, \\ \textrm{or } & \int uv\: dx = u \int v\: dx - \int u' [\int v \: dx]\: dx. \end{align}\]It can alternatively be written \(\int u \: dv = uv - \int v \: du\).
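A quick check of the formula with sympy (a sketch, assuming sympy is installed), taking \(u = x\) and \(v = e^x\) so as not to spoil the exercise below:

```python
import sympy as sp

x = sp.symbols('x')
u, v = x, sp.exp(x)

# Right-hand side of the parts formula: u * (integral of v) - integral of u' * (integral of v)
parts = u * sp.integrate(v, x) - sp.integrate(sp.diff(u, x) * sp.integrate(v, x), x)
print(sp.simplify(parts - sp.integrate(u * v, x)))  # 0, so the two antiderivatives agree
```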
Try working out \(\int x \cos (x) dx\).
Answer...
\[\begin{align} \int x \cos (x) dx & = x \sin(x) - \int \sin(x) dx \\ & = x \sin x + \cos x + c. \end{align}\]Properties of definite integrals
- Sum rule: \(f, g\) are integrable then so is \(f+g\), and \(\int_{a}^{b} (f(x) + g(x))dx = \int_{a}^{b} f(x)dx + \int_{a}^{b} g(x)dx.\)
- Multiple (scalar) rule: \(f\) integrable, \(\lambda \in \mathbb{R}\), then \(\lambda f\) integrable, and \(\int_a^b \lambda f(x)dx = \lambda \int_a^b f(x)dx.\)
- \(f\) integrable over \([a,c]\) and \([c,b]\), then \(f\) also integrable over \([a,b]\), and \(\int_a^b f(x)dx = \int_a^c f(x)dx + \int_c^b f(x)dx.\)
- If \(f(x) \leq g(x) \; \forall x \in [a, b]\) then provided both integrals exist, \(\int_a^b f(x)dx \leq \int_a^b g(x)dx.\)
Fundamental theorems of calculus
First Fundamental Theorem of Calculus
Let \(f : [a,b] \longrightarrow \mathbb{R}\) be integrable, and define \(F : [a,b] \longrightarrow \mathbb{R}\) as \(F(x) = \int_a^x f(t)dt.\) If \(f\) continuous at \(c \in (a, b)\) then \(F\) is differentiable at \(c\), and \(F'(c) = f(c).\)
Second Fundamental Theorem of Calculus
Let \(f : [a,b] \longrightarrow \mathbb{R}\) be continuous and suppose \(F\) is a differentiable function : \(F' = f\), then \(\int_a^b f(x)dx = [F(x)]^b_a\) Where \([F(x)]^b_a\) denotes \(F(b) - F(a)\)
Alternatively, \(\frac{d}{dx}F(x) = f(x) \; \forall x\), and \(f\) continuous, then \(\int f(x)dx = F(x) + c\) for some c.
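A numerical illustration of the first fundamental theorem (a rough sketch of my own): build \(F(x) = \int_0^x f(t)dt\) with the trapezium rule and check that its difference quotient recovers \(f\).

```python
import math

def F(x, f, n=10000):
    """Approximate the integral of f from 0 to x with the trapezium rule."""
    h = x / n
    return h * (f(0) / 2 + sum(f(k * h) for k in range(1, n)) + f(x) / 2)

f = math.cos
c, h = 1.0, 1e-5
fd = (F(c + h, f) - F(c - h, f)) / (2 * h)   # difference quotient of F at c
print(fd, f(c))                              # both approximately cos(1), i.e. F'(c) = f(c)
```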
Logs and Exponentials
In the notes \(\log x\) means the natural log of \(x\), but here I will use \(\ln x\). It is always useful to specify.
For each \(x > 0\); \(\ln x = \int_1^x \frac{1}{t}dt.\)
Properties of Logarithms
- \[\ln(1) = 0.\]
- log is a strictly increasing function.
- log is a differentiable function, with \(\frac{d(\ln x)}{dx} = \frac{1}{x}\)
- \[\ln(xy) = \ln x + \ln y\]
- \[\ln(\frac{x}{y}) = \ln x - \ln y\]
- \(\ln : (0, \infty) \longrightarrow \mathbb{R}\) is bijective
Since \(\ln : (0, \infty) \longrightarrow \mathbb{R}\) is bijective, it has an inverse function \(\exp : \mathbb{R} \longrightarrow (0, \infty)\). \(e = \exp(1); e^x = \exp(x)\).
Properties of Exponentials
For any \(x, y \in \mathbb{R}\),
- \(\exp(x+y) = \exp(x)\exp(y)\), or equivalently \(e^{x+y} = e^xe^y\)
- \(\frac{d(\exp(x))}{dx} = \exp(x)\), \(\exp(0) = 1\)
- \[\exp(x) = \lim_{n \rightarrow \infty} (1 + \frac{x}{n})^n\]
- \[\exp(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!}\]
For \(a > 0, x \in \mathbb{R}\), any exponential can be defined by \(a^x = e^{x \ln a}\) Then for \(a, b > 0; x, y \in \mathbb{R}\) the following properties hold:
\[\begin{align} & \ln (a^x) = x \ln a & (ab)^x = a^xb^x \\ & a^xa^y = a^{x+y} & (a^x)^y = a^{xy} = (a^y)^x. \end{align}\]We can change bases of logarithms (provided \(b \neq 1\)); \(\log_b x = \frac{\ln x}{\ln b}\)
Taylor's Theorem
Taylor's Theorem
Let \(f\) be an \(n+1\) times differentiable function on an open interval containing some points \(a, x\). Then \(f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + ... + \frac{f^{(n)}(a)}{n!}(x-a)^n + R_n(x)\) Where \(R_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!}(x-a)^{n+1}\) For some number \(c\) between \(a, x\).
The polynomial obtained by stopping after the \(\frac{f^{(n)}(a)}{n!}(x-a)^n\) term is called the Taylor polynomial of degree \(n\) of \(f\) at \(a\). This is an approximation, and the final term \(R_n(x)\) is the error in the approximation.
Taylor series
If we can show \(R_n (x) \rightarrow 0\) as \(n \rightarrow \infty\) then we get a sequence of better and better approximations until we get the Taylor series, \(f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(x-a)^n\) This will only converge for values of \(x\) within the radius of convergence of the power series. They can be used to approximate functions.
When \(n = 0\), Taylor's reduces down to the mean value theorem. (Proof of Taylor's is omitted)
Nth derivative test for the nature of stationary points
Suppose a function \(f\) has a stationary point at \(a\) with \(f'(a) = f''(a) = ... = f^{(n-1)}(a) = 0\), while \(f^{(n)}(a) \neq 0\). If \(f^{(n)}\) is continuous (a small symbolic check follows this list),
- \(n\) even, \(f^{(n)}(a) > 0\); \(f\) has a local minimum
- \(n\) even, \(f^{(n)}(a) < 0\); \(f\) has a local maximum
- \(n\) odd; \(f\) has a point of inflection at \(a\)
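For example, \(f(x) = x^4\) has \(f'(0) = f''(0) = f'''(0) = 0\) and \(f^{(4)}(0) = 24 > 0\), so with \(n = 4\) (even, positive) the origin is a local minimum. A sympy sketch of that check, assuming sympy is available:

```python
import sympy as sp

x = sp.symbols('x')
f = x**4

# Differentiate until something nonzero appears at the stationary point x = 0
n, value = 1, sp.diff(f, x).subs(x, 0)
while value == 0:
    n += 1
    value = sp.diff(f, x, n).subs(x, 0)

print(n, value)  # 4, 24 -> n even and f^(n)(0) > 0, so a local minimum
```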
Maclaurin Series
When \(a = 0\), we get the Maclaurin series, which is a special case of the Taylor series: \(f(x) = \sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}x^n\)
Important Maclaurin Series
\[\begin{align} e^x & = 1 + x + \frac{x^2}{2!} + ... + \frac{x^n}{n!} & (x \in \mathbb{R}) \\ \sin x & = x - \frac{x^3}{3!} + \frac{x^5}{5!} - ... + \frac{(-1)^nx^{2n+1}}{(2n+1)!} & (x \in \mathbb{R}) \\ \cos x & = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - ... + \frac{(-1)^nx^{2n}}{(2n)!} & (x \in \mathbb{R}) \\ (1+x)^a & = 1 + ax + \frac{a(a-1)x^2}{2!} + ... + {a \choose n}x^n & (\|x\| < 1) \\ \ln (1 + x) & = x - \frac{x^2}{2} + \frac{x^3}{3}- ... + \frac{(-1)^{n+1}x^n}{n} & (\|x\| < 1) \\ -\ln (1-x) & = x + \frac{x^2}{2} + \frac{x^3}{3} + ... + \frac{x^n}{n} & (\|x\| < 1) \end{align}\]These are all expressible as sums using the nth term, though bear in mind the last two start from \(n=1\).
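Truncating these series gives usable approximations. A rough Python sketch (my own) comparing a partial sum of the \(\sin\) series with math.sin:

```python
import math

def sin_maclaurin(x, terms=10):
    """Partial sum of the Maclaurin series for sin: sum of (-1)^n x^(2n+1) / (2n+1)!."""
    return sum((-1)**n * x**(2*n + 1) / math.factorial(2*n + 1) for n in range(terms))

for x in (0.5, 1.0, 2.0):
    print(x, sin_maclaurin(x), math.sin(x))   # the two columns agree closely
```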
First Order ODEs
ODEs, Ordinary Differential Equations, are equations including derivatives, such as \(y' = 4x\) or \(y'' + 4y = x\), etc.
The order of a DE represents the highest derivative contained within it. This section looks at 1st orders.
First order equations of the form \(y' = f(x)\) can be easily solved via integration. This will give us a general solution with the \(+ c\), representing a family of curves. If we are given constraints, we can get a particular solution.
Separable Equations
A (1st order) ODE is called a **separable equation** if it can be written in the form \(\frac{dy}{dx} = f(x)g(y)\) This can be rewritten as \(\frac{1}{g(y)} \frac{dy}{dx} = f(x) \textrm{ or } \frac{1}{g(y)}dy = f(x)dx\) Which can be directly integrated, \(\int \frac{1}{g(y)} dy = \int f(x)dx.\)
An example of a separable equation is \(\frac{dy}{dx} + xy = 0\).
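Separating that example gives \(\int \frac{1}{y} dy = -\int x\, dx\), so \(\ln |y| = -\frac{x^2}{2} + c\) and \(y = Ae^{-x^2/2}\). A sympy check (a sketch, assuming sympy is installed):

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

sol = sp.dsolve(sp.Eq(y(x).diff(x) + x * y(x), 0), y(x))
print(sol)   # Eq(y(x), C1*exp(-x**2/2))
```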
Homogeneous Equations
An ODE is called homogeneous if it can be written in the form \(\frac{dy}{dx} = f(\frac{y}{x})\) We let some variable \(v = \frac{y}{x} : y = xv\), thus \(\frac{d(xv)}{dx} = f(v) \Longleftrightarrow x\frac{dv}{dx} + v = f(v) \Longleftrightarrow \frac{dv}{dx} = \frac{f(v) -v}{x}\) This brings it into separable form:
\[\begin{align} \frac{1}{f(v)-v}dv = \frac{1}{x}dx & \implies \int \frac{1}{x} dx = \int \frac{1}{f(v)-v} dv \\ & \implies \ln x = \int \frac{1}{f(v)-v} dv. \end{align}\]And replace \(v\) by \(\frac{y}{x}\) to get the original general equation.
The following exercise may help.
Exercise
Solve \(2xy \frac{dy}{dx} = y^2 - x^2.\)
Solution. We can rearrange; \(\implies \frac{dy}{dx} = \frac{y^2-x^2}{2xy} = \frac{1}{2}(\frac{y^2}{xy} - \frac{x^2}{xy}) = \frac{1}{2}(\frac{y}{x}-\frac{x}{y}).\) This is homogeneous, so we apply the above principle to this equation;
\[\begin{align} \implies \frac{d(vx)}{dx} = \frac{1}{2}(v - \frac{1}{v}) & \implies x\frac{dv}{dx} + v = \frac{1}{2}(v - \frac{1}{v}) \\ & \implies \frac{dv}{dx} = -\frac{1}{x}(\frac{1+v^2}{2v}) \end{align}\]Which is now in separable form. Now, skipping straight to \(\ln x = \dots\) form,
\[\begin{align} \implies \ln x = \int \frac{-2v}{1 + v^2}dv & \implies \ln x = -\ln (1 + v^2) + c \\ & \implies x(1+v^2) = e^c. \end{align}\]Let the constant \(k = e^c\) and replace \(v\) by \(\frac{y}{x}\). Thus \(x(1 + \frac{y^2}{x^2}) = k \Longleftrightarrow x^2 + y^2 = kx.\)
Linear Equations
An ODE is called linear if it has the form \(\frac{dy}{dx} + P(x)y = Q(x).\)
Related to this is the integrating factor, \(I(x) = e^{\int P(x) dx}\), chosen so that the left-hand side becomes an exact derivative (it can be derived from the simpler case \(Q(x) = 0\)).
We multiply both sides of the equation by \(I(x)\), to get \(\frac{dy}{dx}I(x) + yI(x)P(x) = Q(x)I(x)\) Note the LHS is exactly the derivative of \(yI(x)\) by the product rule, since \(I'(x) = I(x)P(x)\); thus
\[\begin{align} \frac{d}{dx}(yI(x)) & = Q(x)I(x) \\ yI(x) & = \int Q(x)I(x) dx. \end{align}\]Exercise
Find the general solution to \((1+x^2)\frac{dy}{dx} + xy = x\sqrt{1+x^2},\) and the particular solution which satisfies \(y = \frac{1}{2}, x=0\).
Solution. We can rearrange the equation to \(\frac{dy}{dx} + \frac{xy}{1+x^2} = \frac{x}{\sqrt{1+x^2}}\) The integrating factor is thus \(e^{\int \frac{x}{1+x^2}dx} = e^{\frac{1}{2} \ln (1+x^2)} = (1 + x^2)^{\frac{1}{2}}.\) Multiplying by this integrating factor,
\[\begin{align} \sqrt{1+x^2} \frac{dy}{dx} + \frac{xy}{\sqrt{1+x^2}} & = x \\ \frac{d}{dx}(\sqrt{1+x^2} y) & = x \end{align}\]Hence the general solution is \(\sqrt{1+x^2}y = x^2 / 2 + c.\) The particular solution comes from substituting the given values of \(y, x\) into the general solution to find \(c\), which gives \(c = \frac{1}{2}\). That gives you the final specific curve.
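The same answer can be checked with sympy's dsolve, passing the initial condition via ics (a sketch, assuming a reasonably recent sympy):

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

eq = sp.Eq((1 + x**2) * y(x).diff(x) + x * y(x), x * sp.sqrt(1 + x**2))
sol = sp.dsolve(eq, y(x), ics={y(0): sp.Rational(1, 2)})
print(sp.simplify(sol.rhs))   # should simplify to sqrt(1 + x**2)/2, matching c = 1/2 above
```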
Second Order ODEs
These are where there is a \(y''\) term, and are dealt with very similarly to recurrences. They take the form \(ay'' + by' + cy = f(x)\)
If \(f(x) = 0\), then the equation is homogeneous. To find the general solution of the full equation, find the general solution of \(ay'' + by' + cy = 0\), find a particular solution of \(ay'' + by' + cy = f(x)\), and add them together.
Solving homogeneous equations
For the homogeneous equation \(ay'' + by' + cy = 0\)
There's a lot of waffle, but we have an auxiliary equation that is relevant, which is \(a\lambda^2 + b\lambda + c = 0\)
The roots of this equation determine the form of the general solution.
- Roots are real and distinct. The general solution is \(y = Ae^{\lambda_1 x} + Be^{\lambda_2 x}\).
- Roots are equal. The general solution is \(y = Ae^{\lambda x} + Bxe^{\lambda x} = (A + Bx) e^{\lambda x}\).
- Complex Roots. Let \(\lambda_1 = \alpha + i\beta\) and \(\lambda_2 = \alpha - i\beta\). Then, the general solution is \(y = e^{\alpha x} (A \cos \beta x + B \sin \beta x)\).
Finding particular solutions
For the equation \(ay'' + by' + cy = f(x)\) Like in recurrences, we try a particular solution of the same form as \(f(x)\) ("like functions"); the table below gives the standard trial forms, and a worked check follows it.
Form of \(f(x)\) | Form for a particular solution
---|---
\(e^{\alpha x}\) | \(y = Ae^{\alpha x}\) if \(\alpha\) is not a root of the auxiliary equation; \(y = Axe^{\alpha x}\) if \(\alpha\) is a non-repeated root; \(y = Ax^2 e^{\alpha x}\) if \(\alpha\) is a repeated root
Polynomial of degree \(n\) | Polynomial of degree \(n\) if 0 is not a root; polynomial of degree \(n+1\) if 0 is a non-repeated root; polynomial of degree \(n+2\) if 0 is a repeated root
\(A \cos \alpha x + B \sin \alpha x\) | \(y = C \cos \alpha x + D \sin \alpha x\) if \(\alpha i\) is not a root of the auxiliary equation; \(y = x(C \cos \alpha x + D \sin \alpha x)\) otherwise
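As a worked instance of the table (my own example, not from the notes): for \(y'' - 3y' + 2y = e^{3x}\) the auxiliary equation \(\lambda^2 - 3\lambda + 2 = 0\) has roots 1 and 2, and since 3 is not a root we try \(y = Ae^{3x}\), giving \(9A - 9A + 2A = 1\), so \(A = \frac{1}{2}\). The general solution is therefore \(y = Ce^x + De^{2x} + \frac{1}{2}e^{3x}\). A sympy check, assuming sympy is installed:

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

eq = sp.Eq(y(x).diff(x, 2) - 3 * y(x).diff(x) + 2 * y(x), sp.exp(3 * x))
print(sp.dsolve(eq, y(x)))
# expect y(x) = C1*exp(x) + C2*exp(2*x) + exp(3*x)/2 (constants may be named differently)
```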