What I found out

All my math courses emphasize the same thing.

Engineers use math to solve problems for people.

This means that the engineer will often:

  1. Use engineering units to encode physical information.
  2. Use functions to describe the relationships of a system.
  3. Use systems of analytic equations to form a model.
  4. Solve the problem efficiently using computation.

This document ties together what I've learned during undergrad, so I can make connections between my different courses. It may be helpful as it reviews:

  1. Modelling of systems using linear algebra.
  2. The time-varying relationship, the basis of most models, with notes on the analysis of time-series data.
  3. Numerical analysis techniques.

Mathematics is an experimental science, and definitions do not come first, but later on. They make themselves, when the nature of the subject has developed itself.

  • Oliver Heaviside

Topics to learn:

Modelling systems using linear algebra

A matrix $\pmb{A} \in M_{m\times n}(\mathbb{R})$ is a rectangular array of numbers arranged in $m$ rows and $n$ columns. A two-by-two matrix, for example, looks like:

$$\pmb{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

Notice that elements are indexed as $a_{ij}$, "the $i$th row and the $j$th column."

Definition 1: Matrix multiplication

Let $\pmb{A} \in M_{m\times n}, \pmb{B} \in M_{n\times r}$. Then the matrix product $\pmb{C}=\pmb{AB}\in M_{m\times r}$ is defined element-wise as $\pmb{C} := [c_{ij}]$,

$$c_{ij} := \sum^n_{k=1} a_{ik} b_{kj}$$

That is, $c_{ij}$ is the result of adding the products of each element on the $i$th row of $\pmb{A}$ with the corresponding element of the $j$th column of $\pmb{B}$.
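As a sanity check of Definition 1, here is a minimal Python sketch (my own illustration, not from any course) that implements the triple sum directly and compares the result against NumPy:

```python
import numpy as np

def matmul(A, B):
    """Naive product C = AB using c_ij = sum_k a_ik * b_kj."""
    m, n = len(A), len(A[0])
    n2, r = len(B), len(B[0])
    assert n == n2, "inner dimensions must agree"
    C = [[0.0] * r for _ in range(m)]
    for i in range(m):          # ith row of A
        for j in range(r):      # jth column of B
            for k in range(n):  # sum over the shared index
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
assert np.allclose(matmul(A, B), np.array(A) @ np.array(B))
```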

Vectors

A vector is a matrix of the form $\pmb{v} \in M_{m\times 1}(\mathbb{R})$.

The time-varying relationship

A common model is the time-varying relationship. We relate one or many physical quantities in a system to the passing of time.

For example: the amount of water in a tank given that I am letting it out would be a time-varying relationship: the amount of water depends on the time passed from the start.

Calculus and the concept of a function gives us the tools to deal with such relationships.

Definition 2: Function

A function, $f: A\mapsto B$, establishes a relationship between two sets of objects, $A$ and $B$.

Given an item $a$ that is contained in $A$, the function tells us either what the corresponding item is in $B$ or if there is no corresponding item.

This is the most useful definition of a function. For the time-varying relationship, our functions look like this:

Definition 3: Time-varying function

A time-varying function, $f(t, x, y, \ldots)$, establishes a relationship between time, other varying parameters, and a set of outputs.

$$f: \begin{bmatrix} t \\ x \\ \ldots \end{bmatrix}\mapsto B$$

Definition 4: Piecewise continuous function

A function $f$ is piecewise continuous on an interval $I = [\alpha, \beta]$ if $I$ can be partitioned by a finite number of points such that

  1. $f$ is continuous on each open subinterval
  2. $f$ approaches a finite limit as the endpoints of each subinterval are approached from within the subinterval.

That is, $f$ is piecewise continuous if it is continuous everywhere on $I$ except at a finite number of jump discontinuities. [1]

Heaviside step function

A step function is a piecewise constant function with finitely many pieces. Graphically, it looks like many horizontal lines covering different intervals.

Definition 5: Heaviside step function

The Heaviside step function or unit step function is the piecewise function

$$u_c(t) := \left\{ \begin{matrix} 0,\quad t < c \\ 1,\quad t \ge c \end{matrix} \right .$$

Note that any piecewise constant function (with finitely many pieces) may be expressed as a sum of scaled Heaviside step functions, and a general piecewise continuous function may be written with each piece switched on and off by shifted steps. This makes the step function very useful for modelling time-varying relationships that are not continuous everywhere. Oliver Heaviside developed the function to represent a signal that switches on at a certain time $t=c$ and stays on indefinitely.

Example 1: Piecewise function as the sum of Heaviside functions

Take $f(t) = 2$ for $0 \le t < 1$, $f(t) = 5$ for $1 \le t < 3$, and $f(t) = 0$ for $t \ge 3$. Each jump is a scaled, shifted step:

$$f(t) = 2u_0(t) + 3u_1(t) - 5u_3(t)$$
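To check the example numerically, a quick sketch using NumPy's np.heaviside (its second argument fixes the value at the jump; 1 matches $u_c$ as defined above):

```python
import numpy as np

def u(t, c):
    """Heaviside step u_c(t): 0 for t < c, 1 for t >= c."""
    return np.heaviside(t - c, 1.0)

t = np.linspace(-1, 4, 11)
f = 2 * u(t, 0) + 3 * u(t, 1) - 5 * u(t, 3)  # 2 on [0,1), 5 on [1,3), 0 elsewhere
print(np.column_stack([t, f]))
```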

The Heaviside step function is used in integration to integrate piecewise continuous functions. The Laplace transform allows the modelling of piecewise continuous functions using the Heaviside step function in the $s$-domain.

The integral transform

Definition 6: Integral transform

A relation of the form

$$F(s) = \int_\alpha^\beta K(s, t)\, f(t)\, dt$$

where $K(s, t)$ is the kernel of the transformation, and the limits of integration $\alpha$ and $\beta$ may be infinite but must satisfy $\alpha < \beta$.

An integral transform takes a function of time and makes it a function of another variable.

$$f(t): [0, \infty) \mapsto g(\tau): [a, \infty)$$

An integral transform maps an equation from its original function space ("domain") to a different space.

This is useful when the new function is more easily manipulated in terms of $\tau$ than the original is in terms of $t$. The solution can then be mapped back to the time domain using the transform's inverse.

A list of useful transforms:

Every integral transform is a linear operator, because integration is linear. If the kernel is allowed to be a generalized function, then all linear operators are integral transforms (Schwartz kernel theorem).

To put the method of Laplace transforms into proper perspective, we consider the following hypothetical situation. Suppose that we want to multiply the numbers 3.163 and 16.38 together, but that we have forgotten completely how to multiply. We only remember how to add. Being good mathematicians, we ask ourselves the following question.

Question: Is it possible to reduce the problem of multiplying the two numbers 3.163 and 16.38 together to the simpler problem of adding two numbers together?

The answer to this question, of course, is yes, and is obtained as follows. First, we consult our logarithm tables and find that $\ln{3.163} = 1.15152094$, and $\ln{16.38} = 2.79606108$. Then, we add these two numbers together to yield $3.94758202$. Finally, we consult our anti-logarithm tables and find that $3.94758202=\ln{51.80994}$. Hence, we conclude that $3.163\times16.38 = 51.80994$ ...

The key point in this analysis is that the operation of multiplication is replaced by the simpler operation of addition when we work with the logarithms of numbers, rather than the numbers themselves. [1]

Laplace transform

Definition 7: Exponential order

A function $f: [0, \infty)\mapsto\mathbb{R}$ is of exponential order if there are constants $k, M, a\in\mathbb{R}$, $k,M > 0$, such that

$$\lvert f(t)\rvert < ke^{at} \quad \forall\, t \ge M$$

Equivalently, $f$ is of exponential order if its growth can be bounded by some exponential function:

$$f(t) = \mathcal{O}(e^{at})$$

Definition 8: Laplace transform

Let $f(t): [0, \infty) \mapsto\mathbb{R}$. If:

  1. $f$ is piecewise continuous on the interval $0\le t\le A$ for any positive $A$
  2. $f$ is of exponential order

Then the Laplace transform of $f$, denoted $\mathcal{L}\{f(t)\}$, is given by

$$\mathcal{L}\{f(t)\}(s) := F(s) = \int_0^\infty e^{-st} f(t)\, dt$$

and exists for $\text{Re}(s) > a$.

$\text{Re}(s)$ is the "real part" of $s$; the condition $\text{Re}(s) > a$ provides the exponential "bound" that keeps the Laplace integral from "blowing up".

In effect, the Laplace transform maps functions in the $t$-domain to functions in the $s$-domain. This is useful because of the additional property that

$$\mathcal{L}\{f'(t)\}(s) = sF(s) - f(0)$$

The operation of differentiation with respect to $t$ becomes multiplication by $s$ (minus an initial-condition term) in the $s$-domain. Since the solutions of linear differential equations with constant coefficients are built from exponential functions, the Laplace transform is particularly useful for such equations.

The steps are:

  1. Transform an IVP for $f$ in the $t$-domain to the equivalent in the $s$-domain.
  2. Solve this problem algebraically to find $\mathcal{L}\{f\}$.
  3. Recover the desired function $f$ from its transform by "inverting the transform".

Theorem 1: Linearity of the Laplace transform

Let $f_1, f_2$ be functions of $t$ and let $F_1, F_2$ be their Laplace transforms. Then, for all $s$ where both $F_1$ and $F_2$ are defined:

$$\mathcal{L}\{c_1f_1 + c_2f_2\}(s) = c_1F_1(s) + c_2F_2(s)$$

for any two constants $c_1, c_2\in\mathbb{C}$.

Example 2: Deriving the Laplace transform for sine

We start with Euler's identity

$$e^{iat} = \cos{at} + i\sin{at}$$

Note that we already know the Laplace transform of an exponential, $\mathcal{L}\{e^{at}\} = \frac{1}{s-a}$; applying it with $a \to ia$:

$$\begin{align*}\mathcal{L}\{e^{iat}\}(s) &= \frac{1}{s-ia} \\ &= \mathcal{L}\{\cos{at}\} + i\mathcal{L}\{\sin{at}\} & \text{by linearity of } \mathcal{L} \end{align*}$$

Multiply the numerator and denominator of $\frac{1}{s-ia}$ by the complex conjugate $s+ia$:

$$\begin{align*}\frac{1}{s-ia} &= \frac{s + ia}{(s-ia)(s+ia)} \\ &= \frac{s}{s^2+a^2} + i\frac{a}{s^2 + a^2}\end{align*}$$

By comparison of real and imaginary parts

$$\mathcal{L}\{\cos{at}\} = \frac{s}{s^2+a^2}$$

$$\mathcal{L}\{\sin{at}\} = \frac{a}{s^2+a^2}$$
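The same result can be checked symbolically; a small SymPy sketch, assuming $a$, $s$, $t$ positive so the region-of-convergence conditions simplify (noconds=True drops them from the output):

```python
import sympy as sp

t, s = sp.symbols("t s", positive=True)
a = sp.symbols("a", positive=True)

# noconds=True returns only the transform, without convergence conditions
print(sp.laplace_transform(sp.sin(a * t), t, s, noconds=True))  # a/(a**2 + s**2)
print(sp.laplace_transform(sp.cos(a * t), t, s, noconds=True))  # s/(a**2 + s**2)
```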

How are $\mathcal{L}\{tf(t)\}$ and $\mathcal{L}\{f(t)\}$ related?

$$\begin{align*}\mathcal{L}\{tf(t)\} &= \int_0^\infty e^{-st} tf(t)\, dt\\ &= \int_0^\infty -\frac{d}{ds}\left [ e^{-st} f(t) \right ]\, dt\\ &= -\frac{d}{ds} \int_0^\infty e^{-st} f(t)\, dt \\ &= -\frac{d}{ds} F(s)\end{align*}$$

More generally:

$$\mathcal{L}\{t^n f(t)\} = (-1)^n \frac{d^n}{ds^n} F(s)$$

Such that $$F^{(n)}(s) = \mathcal{L}\{(-1)^n t^n f(t)\}$$

Theorem 2: Nth derivative in the Laplace domain

Given a function of time $f: [0, A]\mapsto\mathbb{R}$ and its derivatives $f^{(1)}, f^{(2)}, \ldots, f^{(n-1)}$ with well-defined Laplace transforms.

Then $\mathcal{L}\{f^{(n)}(t)\}$ exists for $s > a$ and is given by

$$\begin{align*}\mathcal{L}\{f^{(n)}\}(s) = s^n\mathcal{L}\{f(t)\} &- s^{n-1} f(0)\\ &- \ldots - sf^{(n-2)}(0) \\ &- f^{(n-1)}(0)\end{align*}$$

Multiple variables

Definition 9: Jacobian

Let $\pmb{f}:\mathbb{R}^n\mapsto\mathbb{R}^m$ be an analytic function. Then the Jacobian matrix of $\pmb{f}$ is defined to be

$$\pmb{J} \in M_{m\times n}(\mathbb{R}) := \left[e_{ij} = {\partial\pmb{f}_i \over\partial x_j}\right] = \begin{bmatrix} \nabla^T f_1 \\ \vdots \\ \nabla^T f_m\end{bmatrix}$$
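Definition 9 can be approximated numerically with forward differences; a minimal sketch where the test function $\vec f(x, y) = (xy,\ x + y^2)$ and the step h are illustrative choices, not from the notes:

```python
import numpy as np

def jacobian(f, x, h=1e-6):
    """Approximate J_ij = df_i/dx_j at x with forward differences."""
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(f(x))
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        xh = x.copy()
        xh[j] += h           # perturb the jth input only
        J[:, j] = (np.asarray(f(xh)) - f0) / h
    return J

f = lambda v: np.array([v[0] * v[1], v[0] + v[1] ** 2])
print(jacobian(f, [2.0, 3.0]))  # approximately [[3, 2], [1, 6]]
```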

Impulse and the Dirac delta function

In many applications we deal with phenomena of an impulsive nature: for example, voltages or forces that act over a very short time with very large magnitude. It is often sufficient to approximate such forcing terms using the Dirac delta function.

These problems are often represented by the second-order linear differential equation

$$ay^{(2)} + by^{(1)} + cy = g(t)$$

where the forcing function $g$ is large during a short interval of time and is otherwise zero.

The integral

$$I(\tau) = \int_{t_0-\tau}^{t_0+\tau} g(t)\,dt$$

or equivalently, since $g = 0$ outside the interval,

$$I(\tau) = \int_{-\infty}^\infty g(t)\,dt$$

is a measure of the strength of the forcing function. For mechanical systems this quantity is known as the impulse.

Limits may be used to define the idealized unit impulse function $\delta$, which imparts an impulse of one at $t=0$ but is zero for all values of $t$ other than zero.

$$\left \{ \begin{matrix}\delta(t) = 0,\quad t\ne 0 \\ \\ \int_{-\infty}^\infty \delta(t)\, dt = 1\end{matrix} \right .$$

Definition 10: Dirac delta function

The Dirac delta function, unit impulse, or $\delta$ distribution, is a generalized function on the real numbers whose value is zero everywhere except zero and whose integral over the entire real line is equal to one.

The function is often manipulated as a weak limit of a sequence of functions, each with a tall spike at the origin.

Ordinary differential equations

Definition 11: Ordinary differential equation (ODE)

Any equation expressing a relationship between time $t$, an unknown function $f(t)$, and any of $f$'s derivatives $f^{(n)}(t)$.

Definition 12: Order of an ODE

The highest derivative of the function $f$ present in an ODE.

Systems of linear equations

A system of ODEs consists of $n$ equations in $t$, $n$ unknown functions $y_1(t), \ldots, y_n(t)$, and at least one derivative of each $y_k$. Any higher-order equation may always be transformed into a system of first-order equations.

To transform the arbitrary $n$th-order equation $y^{(n)} = F(t, y, y^{(1)}, \ldots, y^{(n-1)})$, introduce the variables $x_1 = y, x_2 = y^{(1)}, \ldots, x_n = y^{(n-1)}$. Then $x_1' = x_2$, $x_2' = x_3$, and so on, with $x_n' = F(t, x_1, \ldots, x_n)$. In general, we may solve any first-order system of the form $$\begin{bmatrix} x_1' \\ x_2' \\ \vdots \\ x_n'\end{bmatrix} = \begin{bmatrix}F_1(t,x_1,x_2,\ldots,x_n) \\ F_2(t,x_1,x_2,\ldots,x_n)\\ \vdots \\ F_n(t,x_1,x_2,\ldots,x_n)\end{bmatrix}$$


A solution of this system is therefore given by

$$\vec\phi = \begin{bmatrix} \phi_1(t) \\ \phi_2(t) \\ \vdots \\ \phi_n(t)\end{bmatrix}$$

where $x_1=\phi_1(t), x_2=\phi_2(t), \ldots$ satisfy the system of equations at every point of some interval $I$.

Signal analysis

Keywords: Analog signal processing, audio processing

Definition 13: Signal

A signal is a real-valued, continuous function of time that is either a system input or system output.

$$f: \mathbb{R}\mapsto\mathbb{R}$$

Every signal can be described as the sum of sine functions of different frequency.

This is the main idea of all of signal analysis. The sine functions (they are the frequency components of the signal) encode many important physical quantities, and each looks like:

$$f_n = A\sin{(\omega t + \phi)}$$

Figure 1: A sine wave with amplitude 3.3, angular frequency $0.2\pi$ rad/s, and a phase shift of 10.

Example 3: Describing this sine wave

There are a few important quantities here:

  • $A$ is the amplitude of $f_n$ and describes its fluctuation range.

The range of $f_n$ is $[-A, A]$.

  • $\omega$ is the angular frequency of $f_n$.

$f_n$ is periodic. Imagine a straight "arm" holding a pen attached to the origin of our graph. Further imagine that this arm rotated at a constant angular speed $\omega\, \text{rad/s}$, and could grow and shrink to trace out $f$ exactly. Then to draw out $f_n$ as it appears, this particular arm would have to rotate at a speed of $0.2\pi \left [ \frac{\text{rad}}{\text{unit time}} \right ]$. This is the angular frequency.

The arm has to rotate $2\pi\,\text{rad}$ to rotate a full cycle, so the frequency $f$ (cycles per unit time) is equal to ${0.2\pi \over 2\pi} = 0.1\text{Hz}$, or 0.1 cycles per unit time. This suggests that the pattern will repeat every 10 units of time, which is what we see.

  • $\phi$ is the phase shift of $f_n$.

This is the time offset of $f_n$: the entire waveform appears shifted backwards in time by the amount $\phi\over\omega$.

Why is this useful? Many time-varying systems can be modelled completely in terms of such frequency components.

It so happens that many physical quantities vary periodically. For example, simple harmonic motion is a periodic function of time. Sound intensity and AC voltage are two other physical quantities that vary periodically in magnitude as time goes on.

When the system you need to model has inputs and outputs that are usually periodic, it makes sense to break down those signals into components of different frequency. This is called the frequency response of the system, and comes from the breaking down of one signal function into many functions of different frequency.

The frequency-domain

{% theorem(ref="Euler's identity") %}

{% end %}

A system's behaviour can be mathematically modelled; it is represented in the time domain by the impulse response $h(t)$ and in the frequency domain by the transfer function $H(s)$, $s\in\mathbb{C}$.

Input signals are usually called $x(t)$, $X(t)$ and output signals are $y(t)$, $Y(t)$.

We think of filters in terms of their effect on the spectrum of a signal.

Definition 14: Fourier series

Let $f: [-L, L]\mapsto\mathbb{R}$ be a signal on the time interval $I = [-L, L]$.

We may encode all of $f$'s properties with the discrete data set given by the Fourier coefficients

$$c_k := \frac{1}{2L}\int_{-L}^L f(t) \exp{\left(-\frac{jk\pi t}{L} \right )}\, dt,\quad k\in\mathbb{Z}$$

Then the Fourier series $F(t)$ is defined as the sum of weighted Fourier coefficients reconstructing $f$:

$$f(t) = F(t) := \sum_{k\in\mathbb{Z}} c_k \exp{\left( \frac{jk\pi t}{L} \right )}$$
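As a numerical sanity check of Definition 14, the sketch below approximates a few Fourier coefficients of a square wave on $[-L, L]$ with a plain Riemann sum (the square wave and the number of sample points are arbitrary choices):

```python
import numpy as np

L = 1.0
N = 4000
t = np.linspace(-L, L, N, endpoint=False)
dt = 2 * L / N
f = np.sign(np.sin(np.pi * t / L))  # odd square wave on [-L, L]

def c(k):
    """Fourier coefficient c_k approximated by a Riemann sum."""
    return np.sum(f * np.exp(-1j * k * np.pi * t / L)) * dt / (2 * L)

# For this square wave: c_k ~ -2j/(k*pi) for odd k, ~ 0 for even k
for k in range(-3, 4):
    print(k, np.round(c(k), 4))
```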

Definition 15: Fourier transform

Let $f:\mathbb{R}\mapsto\mathbb{R}$ be integrable over the real line. Then the Fourier transform of $f$ is defined as

$$\hat{f}:\mathbb{R}\mapsto\mathbb{C},\quad\hat{f}(\omega) := \int_{-\infty}^\infty f(t)\exp{\left( -j\omega t\right)}\, dt$$

Think of the Fourier transform as the continuous analogue of the Fourier coefficients over the real line: it encodes an analog signal as a continuous sum of weighted frequency components of $f$.

The Laplace transform is a generalized Fourier transform: it transforms a signal onto the whole complex $s$-plane, rather than only the $j\omega$ axis as the Fourier transform does.

Proof by induction

Mathematical induction proves that we can climb as high as we like on a ladder, by proving that we can climb onto the bottom rung (the basis) and that from each rung we can climb up to the next one (the step)

  • Concrete Mathematics, quoted from Wikipedia

Formal power series

Definition 16: Formal power series

An infinite sum considered independently of any notion of convergence, which can be manipulated with the usual algebraic operations on series:

  • addition
  • subtraction
  • multiplication
  • division
  • partial sums

And has terms of the form:

$$a_n x^n$$

Formal power series can be viewed as a generalization of polynomials such that the number of terms can be infinite and with no requirement of convergence.

Definition 17: Generating function

A representation of an infinite sequence of numbers as the coefficients of a formal power series.

A generating function is a device somewhat similar to a bag. Instead of carrying many little objects detachedly, which could be embarrassing, we put them all in a bag, and then we have only one object to carry, the bag.

  • George Pólya [11]

Definition 18: Improper integral

A limit of a definite integral, which is written as

$$\int_a^\infty f(t)\, dt$$

And which is equivalent to the evaluation of

$$\lim_{A\to\infty} \int_a^A f(t)\, dt$$

given a positive real number $A$.

If the integral from $a$ to $A$ exists for each $A > a$, and if the limit as $A\to\infty$ exists, then the improper integral is said to converge to the limiting value. Otherwise, the integral is said to diverge, or fail to exist. [2]

Example 4: Convergence of the improper integral

Let $f(t) = e^{ct}, t\ge 0$, where $c$ is a real non-zero constant. Then

$$\begin{align*} \int_0^\infty e^{ct}\, dt &= \lim_{A\to\infty} \int_0^A e^{ct}\, dt = \lim_{A\to\infty} \left.\frac{e^{ct}}{c}\right\rvert_0^A \\ &= \lim_{A\to\infty} \frac{1}{c}(e^{cA} - 1) \end{align*}$$

This integral converges to the value $\frac{-1}c$ if $c < 0$, and diverges if $c > 0$. If $c=0$, the integrand $f(t)$ is the constant function with value 1. In this case

$$\lim_{A\to\infty} \int_0^A 1\, dt = \lim_{A\to\infty} (A - 0) = \infty$$

The integral of a piecewise continuous function as defined is equal to the sum of the integrals on the continuous subintervals.

Numerical analysis

The study of algorithms that approximate solutions to equations within defined error bounds. Contrast with symbolic analysis.

Seven tool model:

Approximate solutions to four categories of problems:

Gauss-Seidel method

The Gauss-Seidel method is an iterative method of solving a system of linear equations.

Definition 19: Gauss-Seidel method

Let $A\vec x = \vec b$ be a square system of $n$ linear equations with known $A$, $\vec b$ and unknown $\vec x$.

Let $x_k$ be a guess for the value of $x$. Then the next guess is given by the relation

$$L_* \vec x_{k+1} = \vec b - U\vec x_k$$

such that

$$x_{k+1} = L_*^{-1}(b - Ux_k)$$

where $A = L_* + U$, $L_*$ is lower triangular, and $U$ is strictly upper triangular (its diagonal entries are zero).

Theorem 3: Convergence of the Gauss-Seidel method

The Gauss-Seidel method converges if either:

  • $A$ is symmetric positive-definite
  • $A$ is strictly or irreducibly diagonally-dominant
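A minimal sketch of the iteration in Definition 19, written row by row instead of forming $L_*^{-1}$ explicitly; the test system is a made-up diagonally dominant example, so Theorem 3 guarantees convergence:

```python
import numpy as np

def gauss_seidel(A, b, x0=None, iters=50):
    """Iterate x_{k+1} = L*^{-1} (b - U x_k), solving one row at a time."""
    A, b = np.asarray(A, float), np.asarray(b, float)
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, float)
    for _ in range(iters):
        for i in range(n):
            # x[:i] already holds updated values (the L* part of the sweep)
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
    return x

A = [[4.0, 1.0, 0.0], [1.0, 5.0, 2.0], [0.0, 2.0, 6.0]]  # diagonally dominant
b = [1.0, 2.0, 3.0]
x = gauss_seidel(A, b)
print(x, np.allclose(np.asarray(A) @ x, b))
```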

Analytic functions

Definition 20: Analytic function

Analytic functions are functions of the form

$$\vec f: \mathbb{R}^n \mapsto \mathbb{R}^m$$

that are continuous and infinitely differentiable.

Formally, we state that $\vec f$ is real analytic on the set of points $D\subseteq\mathbb{R}^n$ if the Taylor series at any point $\vec x_0\in D$

$$\vec T(\vec x) = \sum^\infty_{n=0} \frac{f^{(n)}(\vec x_0)}{n!}(\vec x - \vec x_0)^n$$

converges pointwise to $\vec f$. (This guarantees the above informal properties)

These properties are great for numerical methods of solving problems, many of which depend on the existence of the arbitrary $n$th derivative.

For physical problems, engineers often solve systems of differential equations. The numerical techniques here almost exclusively target differential equations, so integral and integro-differential equations are first converted into pure differential equations by taking derivatives.

We approximate a continuous and differentiable function $y$ by evaluating an approximation at specific points and interpolating values in between. Every one of our techniques will use a step size $h$ for the density of evaluated points.


The book Geometry for Programmers has a different perspective based on graphics:

The smoothness of a function is a number saying how many of its derivatives are still continuous. A $C^1$ function has a continuous first derivative, a $C^2$ function a continuous second derivative, and so on.

The $C^1$ smoothness is closely related to the smoothness of a parametric curve; the $C^2$ smoothness is associated with its curvature.

Further, you may build smooth, continuous piecewise curves by making sure that the values and derivatives of every two adjacent pieces coincide at the joint points.

As we will see later on, this is the intuition behind natural cubic spline interpolation.

Definition 21: Implicit function theorem

Any equation of the form

$$F(x_0, x_1, \ldots, x_n) = 0$$

can usually be written locally in the form $x_0 = f(x_1, \ldots, x_n)$.

The solution to an $n$th-order differential equation requires $n$ constraints; each anti-derivative introduces one more unknown constant.

One approach is to describe the state of the system completely at one moment in time. This is called the initial condition, as it gives the state of the system at time $t_0$.

Another approach is to describe the value of the function at different positions. These are called boundary conditions.

They give the state of the system at the boundary of $[a, b]$.

We first consider approximate solutions to initial-value problems.

Initial-value problems

Given any first-order ODE, we may transform it into the form

$$y^{(1)}(t) = f(t, y(t))$$

That is, we may write the first derivative as a function of both time and the unknown function itself.

Then, select a step size $h$ and iterate over $k$ steps, producing the approximations

$$t_0 + hk \mapsto y_k, \quad\text{where } y_k\approx y(t_0+hk)$$

Definition 22: Runge-Kutta method

Fourth-order Runge-Kutta method:

$$s_0 \leftarrow f(t_k, y_k)$$ $$s_1 \leftarrow f(t_k + 0.5h, y_k + 0.5hs_0)$$ $$s_2 \leftarrow f(t_k + 0.5h, y_k + 0.5hs_1)$$ $$s_3 \leftarrow f(t_k + h, y_k + hs_2)$$

$$y_{k+1} \leftarrow y_k + \frac{1}6h(s_0 +2s_1 + 2s_2 + s_3)$$
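A direct transcription of Definition 22 into Python; the test problem $y' = -2y$, $y(0) = 1$ is an illustrative choice with exact solution $e^{-2t}$:

```python
import numpy as np

def rk4(f, t0, y0, h, steps):
    """Classical fourth-order Runge-Kutta for y' = f(t, y)."""
    t, y = t0, y0
    for _ in range(steps):
        s0 = f(t, y)
        s1 = f(t + 0.5 * h, y + 0.5 * h * s0)
        s2 = f(t + 0.5 * h, y + 0.5 * h * s1)
        s3 = f(t + h, y + h * s2)
        y = y + (h / 6.0) * (s0 + 2 * s1 + 2 * s2 + s3)
        t = t + h
    return t, y

f = lambda t, y: -2.0 * y
t_end, y_end = rk4(f, 0.0, 1.0, h=0.1, steps=10)
print(y_end, np.exp(-2.0 * t_end))  # RK4 estimate vs exact e^{-2t} at t = 1
```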

Heun's method

Splines

Definition 23: Spline

A function defined piecewise using polynomials.

In interpolating problems, spline interpolation is often preferred. It yields results with acceptable error bounds even with low degree polynomials, and avoids Runge's phenomenon for higher degrees.

A natural cubic spline $S$ on the interval $[a, b]$ additionally satisfies

$$S^{(2)}(a) = S^{(2)}(b) = 0$$

On each subinterval $[x_j, x_{j+1}]$, the cubic spline takes the form

$$S_j(x) = a_j + b_j(x - x_j) + c_j(x-x_j)^2 + d_j(x - x_j)^3$$
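In practice a library routine does the bookkeeping. A sketch with SciPy's CubicSpline, where bc_type="natural" imposes the $S^{(2)}(a) = S^{(2)}(b) = 0$ condition above (the sample points are arbitrary):

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.sin(x)

S = CubicSpline(x, y, bc_type="natural")  # natural spline: S''(a) = S''(b) = 0

xs = np.linspace(0, 4, 9)
print(np.column_stack([xs, S(xs), np.sin(xs)]))  # spline vs true values
print(S(0.0, 2), S(4.0, 2))  # second derivatives at the endpoints are ~0
```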

Polynomial approximation

Evaluating a polynomial requires only additions and multiplications (and multiplication can itself be reduced to addition). One of the most fundamental techniques of numerical analysis is reducing complex functions to polynomial functions with tolerable error.

Vandermonde interpolation

Given $n$ points of the form $(x_k, y_k)$ with all $x_k$ distinct, there is a unique polynomial of degree $n-1$ that passes through all of them.

Its coefficients can be found by solving a system of linear equations. The coefficient matrix of this system is known as the Vandermonde matrix; it raises the given $x$ coordinates to the powers $0$ through $n-1$.

$$V(x_1, x_2, \ldots, x_n) = \begin{bmatrix} x_1^{n-1} & x_1^{n-2} & \ldots & x_1 & 1 \\ x_2^{n-1} & x_2^{n-2} & \ldots & x_2 & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{n-1}^{n-1} & x_{n-1}^{n-2} & \ldots & x_{n-1} & 1 \\ x_n^{n-1} & x_n^{n-2} & \ldots & x_n & 1\end{bmatrix}$$

The coefficients are found by solving the system:

$$V \begin{bmatrix}a_{n-1} \\ a_{n-2} \\ \vdots \\ a_1 \\ a_0 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}$$

This makes sense because, in the $n=3$ (three-point) case for example, the interpolating polynomial must pass through each of the three points, giving three equations in the three unknown coefficients.
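A sketch of the procedure with NumPy: np.vander builds exactly the decreasing-power matrix above, and the coefficients come from solving $V\vec a = \vec y$ (the three sample points are made up):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 11.0])

V = np.vander(x)            # columns: x^2, x^1, x^0
a = np.linalg.solve(V, y)   # coefficients [a_2, a_1, a_0]
print(a)                    # [ 3. -1.  1.]  ->  p(x) = 3x^2 - x + 1
print(np.polyval(a, x))     # reproduces y at the sample points
```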

Theorem 4: Existence and uniqueness of interpolation polynomials

Given $n+1$ points $(x_i, y_i)^n_{i=0}$ with distinct $x$ values. Then there is one and only one polynomial $p_n(x)\in\mathbb{P}_n$ satisfying the condition [1]

$$p_n(x_i) = y_i,\quad i\in\{0,\ldots,n\}$$

When solving this system by elimination, partial pivoting avoids adding a large multiple of one equation onto another; doing so would swamp, and thus lose, the information in the row being modified.

Drawbacks of this approach: a polynomial of degree $M$ may have up to $M$ roots. Every time a derivative of a function has a root, the function itself reaches an extremum point. Therefore, an $N$th-order interpolating polynomial has up to $N-2$ bumps. [3] As we start to add more points, the function begins to wave more, especially towards the ends of the interpolation interval, in a phenomenon known as Runge's phenomenon.

Theorem 5: Runge's phenomenon

A problem of oscillation at the edges of an interval that occurs when using polynomial (Vandermonde) interpolation with polynomials of high degree.

This restricts the applicability of the Vandermonde method in real-world applications.

{% definition(ref="Horner's method) %} Horner's method is a numerically stable method of evaluating a polynomial. In short, it transforms a polynomial from

$$P(x) = a_{n} x^n + a_{n-1} x^{n-1} + \ldots + a_1 x + a_0$$

to the form

$$P(x) = a_0 + x(a_1 + x(a_2 + x(a_3)))$$

(the above is for a cubic)

The method reduces the number of multiplications. {% end %}
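A minimal Python version of the nested form, with coefficients supplied from lowest to highest degree to match the $a_0 + x(a_1 + \ldots)$ layout above:

```python
def horner(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_n x^n with only n multiplications."""
    result = 0.0
    for a in reversed(coeffs):   # start from the highest-degree coefficient
        result = result * x + a
    return result

# P(x) = 1 + 2x + 3x^2 + 4x^3 evaluated at x = 2  ->  49
print(horner([1.0, 2.0, 3.0, 4.0], 2.0))
```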

Numerical stability

Example 5: Subtractive cancellation

Let us compare the functions $f$ and $g$, which are algebraically equivalent.

$$f(x) = x(\sqrt{x+1} - \sqrt{x})$$

$$g(x) = \frac{x}{\sqrt{x+1} + \sqrt{x}}$$

If all our calculations carry only two decimal places, then:

$$f(500) = 500(\sqrt{501}-\sqrt{500}) = 500(22.38 - 22.36) = 500(0.02) = 10$$

$$g(500) = \frac{500}{\sqrt{501}+\sqrt{500}} = \frac{500}{22.38+22.36} = 11.17$$

The desired value was $f(500)=g(500)=11.174755\ldots$.

We see that subtractive cancellation means that $f$ is ill-conditioned while $g$ is well-conditioned.
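The effect can be reproduced by rounding every intermediate result to two decimal places, mimicking the limited-precision arithmetic above (the rounding helper is just for illustration):

```python
import math

def r(x):
    """Round an intermediate result to two decimal places."""
    return round(x, 2)

x = 500.0
# f(x) = x(sqrt(x+1) - sqrt(x)) with rounded intermediates
f = r(x * r(r(math.sqrt(x + 1)) - r(math.sqrt(x))))
# g(x) = x / (sqrt(x+1) + sqrt(x)) with rounded intermediates
g = r(x / r(r(math.sqrt(x + 1)) + r(math.sqrt(x))))

exact = x * (math.sqrt(x + 1) - math.sqrt(x))
print(f, g, exact)  # 10.0 vs 11.18 vs 11.1747...
```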

Time and memory complexity

The tools for analyzing numeric and algorithmic complexity are the same.

Big-O notation

Big-O notation is a symbolism used in complexity theory, computer science, and mathematics. It describes how fast a function grows or declines. Formally, it describes the asymptotic behaviour of functions.

Big-O notation is also called Landau's symbol, after the German mathematician Edmund Landau. The letter $\mathcal{O}$ stands for "order," meaning the rate of growth of a function.

For example, an algorithm for a problem of size $n$ might take $T(n)$ steps to complete, say $T(n) = 4n^2 - 2n + 2$. As $n\to\infty$, the behaviour of $T$ is dominated by the $4n^2$ term, and even the coefficient 4 does not change the order of growth. Therefore we say that $T(n)$ grows on the order of $n^2$, or

$$T(n) = \mathcal{O}(n^2)$$

Appendix A: Engineering units

Definition 24: Radian

A unit of angle measure, defined such that one radian is the angle subtended at the centre of a circle by an arc equal in length to the radius.

Dimensional analysis is the analysis of relationships between different physical quantities by identifying their base quantities and the units of measurement, and tracking these dimensions as calculations are performed.

Appendix B: Complex numbers

Glossary

References

[4] Atkinson, K., and Han, W. Theoretical Numerical Analysis. [Print]

[5] Cheney, E. Ward, and Kincaid, David R. Numerical Mathematics and Computing, 7th ed. [Print]

[2] Boyce, William E., and DiPrima, Richard C. Elementary Differential Equations and their Applications, 10th ed., 2012. [Print]

[1] Braun, Martin. Differential Equations and Their Applications, 4th ed., 1993. [Online]

[6] Burden and Faires. Numerical Analysis.

[7] Hamming, Richard W. Numerical Methods for Scientists and Engineers, 1962. [Print]

[8] Isaacson and Keller. Analysis of Numerical Methods.

[3] Kaleniuk, Oleksandr. Geometry for Programmers, 2023. [Online]

[9] Kværnø, Anne. Lecture notes for TMA4320 Introduction to Scientific Computation, "Polynomial interpolation", 2022-02-11. [PDF]

[10] Lundqvist, Kristina. 16.070 Introduction to Computers & Programming, 2003. [Online]

[11] Pólya, George. Mathematics and Plausible Reasoning, 1954. [Print]

[12] Smith, Julius O., III. Introduction to Digital Filters, accessed 2024-04-03. [Online]

[13] Sternberg, Shlomo. Lecture 1: Newton's method, accessed 2024-02-13. [Online]