Introduction to Pontryagin Maximum Principle

Introduction

The Pontryagin Maximum Principle (PMP) is a fundamental theorem in optimal control theory that provides necessary conditions for a control to be optimal. It states that for an optimal control, there must exist a set of auxiliary variables (co-states) such that the optimal control maximizes the Hamiltonian at every point in time.

If the above definition doesn’t make sense yet, don’t worry; I am going to assume no prior knowledge of control theory. I will start from the basics and tee you up with all the concepts you need to understand PMP.

Let’s start with the basics. Any control problem has the following fundamental components:

Dynamics of the System

Assuming we are working in a continuous-time system, the dynamics are given by:

\[\dot{x}(t) = f(x(t),\alpha(t))\] \[x(0) = x^{0}\]

Here $x^{0}$ is the initial state and $x(t)$ is the state of the system at time $t$, with $x : [0, ∞) → R^{n}$.

Given that $A$ is the set of control parameters, the control function is $α : [0, ∞) → A$.

The dynamics of the system are controlled by:

  • The initial state $x^{0}$
  • The sequence of control actions $\alpha(t)$
  • The laws governing the state transition

All of the above is captured by the function $f(x(t),\alpha(t))$, where $f : R^{n} ×A → R^{n}$.
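
If you like to think in code, here is a minimal sketch of what "rolling out" such a system under a control signal looks like numerically, using simple forward-Euler steps. The function names and the toy scalar system are my own, purely for illustration:

```python
import numpy as np

def simulate(f, x0, alpha, T=1.0, dt=1e-3):
    """Forward-Euler rollout of x'(t) = f(x(t), alpha(t)) starting from x(0) = x0."""
    x = np.asarray(x0, dtype=float)
    trajectory = [x.copy()]
    for k in range(int(T / dt)):
        x = x + dt * f(x, alpha(k * dt))   # Euler step: x_{k+1} = x_k + dt * f(x_k, a_k)
        trajectory.append(x.copy())
    return np.array(trajectory)

# Toy example: scalar system x' = -x + a under the constant control a(t) = 0.5
traj = simulate(lambda x, a: -x + a, x0=[1.0], alpha=lambda t: 0.5, T=5.0)
print(traj[-1])   # approaches the equilibrium x = 0.5
```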

Payoff

A control $\alpha^{*}(\cdot)$ is said to be optimal when $P[α^{∗}(·)] ≥ P[α(·)]$ for every admissible control $α(·)$, where the payoff functional $P[α(·)]$ is given by:

\[P[\alpha(\cdot)] := \int_{0}^{T} r(\mathbf{x}(t), \alpha(t)) \, dt + g(\mathbf{x}(T))\]

Here $r(\mathbf{x}(t), \alpha(t))$ represents the running payoff and $g(\mathbf{x}(T))$ represents the terminal payoff.
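
To make the payoff concrete, here is a small sketch that approximates $P[\alpha(\cdot)]$ with a Riemann sum plus the terminal term. The toy trajectory and the payoff functions are made up for illustration:

```python
import numpy as np

def payoff(xs, alphas, dt, r, g):
    """Approximate P[alpha] = integral of r(x(t), a(t)) dt + g(x(T)) with a Riemann sum."""
    running = sum(r(x, a) for x, a in zip(xs[:-1], alphas)) * dt
    return running + g(xs[-1])

# Toy trajectory for x' = a, x(0) = 0, with running payoff -a^2 and terminal payoff -x^2
dt = 1e-3
ts = np.arange(0.0, 1.0, dt)
alphas = np.sin(ts)                                   # some candidate control sequence
xs = np.concatenate([[0.0], np.cumsum(alphas) * dt])  # resulting (toy) state trajectory
print(payoff(xs, alphas, dt, r=lambda x, a: -a**2, g=lambda x: -x**2))
```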

While the payoff above is set up to run for a fixed finite time $T$, there are also problems that run until a certain criterion is met; examples include fixed-endpoint and variable (free) endpoint problems.

Controllability

We say a system is controllable when there exists a trajectory from the initial state $x^{0}$ to the target, where the trajectory is determined by the state dynamics and a particular sequence of control inputs.

Below we work out the condition for controllability of linear systems.

For the sections below we take $M ∈ \mathcal{M}^{n×n}$ and $N ∈ \mathcal{M}^{n×m}$, where $\mathcal{M}^{n×m}$ denotes the set of all n × m matrices.

Unique Solution

Let $S \subseteq R^{n}$ denote the target set (the desirable state or states). We define the reachable set $C$ to consist of the initial points $x^{0}$ for which there exists a sequence of controls that steers the system to the target $S$ within some finite time $T$. The reachable set $C$ is symmetric and convex.

For a linear nonhomogeneous system of ordinary differential equations (ODE), the unique solution is given by:

\[x(t) = \mathbf{X}(t) {x}^0 + \mathbf{X}(t) \int_{0}^{t} \mathbf{X}^{-1}(s) \mathbf{f}(s) \, ds\]

The ODEs are as follows:

\[\left\{ \begin{array}{l} \dot{\mathbf{x}}(t) = M \mathbf{x}(t) + \mathbf{f}(t) \\ \mathbf{x}(0) = \mathbf{x}^0 \end{array} \right.\]

In our case, for our optimal control problem (where $\mathbf{f}(t) = N\alpha(t)$), the unique solution is given by:

\[x(t) = \mathbf{X}(t) {x}^0 + \mathbf{X}(t) \int_{0}^{t} \mathbf{X}^{-1}(s) N \alpha(s) \, ds \tag{1}\]

Here $\mathbf{X}(t)$ is the fundamental solution; for a constant matrix $M$ it is given by the matrix exponential:

\[\mathbf{X}(t) = e^{tM} := \sum_{k=0}^{\infty} \frac{t^k M^k}{k!} \tag{2}\]
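
As a sanity check, here is a sketch that evaluates Equation 1 using the matrix exponential from Equation 2 and compares it against a direct numerical solve of the ODE. The matrices and the constant control are illustrative choices:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

M = np.array([[0.0, 1.0], [0.0, 0.0]])     # example system matrix
N = np.array([[0.0], [1.0]])               # example input matrix
x0 = np.array([1.0, 0.0])
alpha = lambda t: np.array([0.5])          # a constant control, for simplicity

def x_formula(t, n_quad=4000):
    """Evaluate Equation 1: x(t) = X(t) x0 + X(t) * integral of X(s)^{-1} N a(s) ds."""
    ds = t / n_quad
    s_grid = np.arange(n_quad) * ds
    integral = sum(expm(-s * M) @ N @ alpha(s) for s in s_grid) * ds   # X(s)^{-1} = e^{-sM}
    return expm(t * M) @ (x0 + integral)

# Cross-check against a direct numerical solve of x' = Mx + N a
sol = solve_ivp(lambda t, x: M @ x + N @ alpha(t), (0.0, 2.0), x0, rtol=1e-9)
print(x_formula(2.0), sol.y[:, -1])        # the two should agree closely
```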

Controllability Matrix

For any given optimal control problem we define controllability matrix $G$ as:

\[G = G(M,N) := [N,MN,M^{2}N,...,M^{n−1}N]\]

The operator $[\,\cdot\,]$ denotes horizontal concatenation of the matrices inside it.

Theorem: A system is controllable when rank of $G = n$ and $\mathrm{Re}\,\lambda ≤ 0$ for each eigenvalue $\lambda$ of $M$.
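
In code, checking this condition takes only a few lines. The matrices below are the rocket-railroad-car ones that appear later in the post:

```python
import numpy as np

def controllability_matrix(M, N):
    """G(M, N) = [N, MN, M^2 N, ..., M^(n-1) N], built by horizontal concatenation."""
    n = M.shape[0]
    return np.hstack([np.linalg.matrix_power(M, k) @ N for k in range(n)])

# The rocket-railroad-car matrices from the example later in the post
M = np.array([[0.0, 1.0], [0.0, 0.0]])
N = np.array([[0.0], [1.0]])

G = controllability_matrix(M, N)
rank_ok = np.linalg.matrix_rank(G) == M.shape[0]       # rank G = n ?
eig_ok = np.all(np.linalg.eigvals(M).real <= 0)        # Re(lambda) <= 0 for every eigenvalue ?
print(G)                       # [[0. 1.], [1. 0.]]
print(rank_ok and eig_ok)      # True -> the system is controllable
```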

Observability

Given a linear system of ODEs, where $M \in \mathcal{M}^{n×n}$ and $N \in \mathcal{M}^{m×n}$,

\[\left\{ \begin{array}{l} \dot{\mathbf{x}}(t) = M \mathbf{x}(t) + \mathbf{f}(t) \\ \mathbf{x}(0) = \mathbf{x}^0 \end{array} \right.\]

we can observe the output \[y(t) := Nx(t) \tag{O}\]

Typically $m \ll n$, and we interpret $y(·)$ as low-dimensional “observations” or “measurements” of the high-dimensional dynamics $x(·)$.

Here, the pair (ODE), (O) is called observable if knowledge of $y(·)$ on any time interval $[0,t]$ allows us to compute $x^{0}$.

For linear systems, observability and controllability are dual concepts: the pair $(M, N)$ is observable exactly when the dual system $\dot{z}(t) = M^{T}z(t) + N^{T}\alpha(t)$ is controllable.
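
This duality gives a practical recipe: to test observability of $(M, N)$ you can run the controllability rank test on $(M^T, N^T)$. A small sketch, where the measurement matrix is an illustrative choice:

```python
import numpy as np

def is_controllable(M, N):
    """Rank test: rank [N, MN, ..., M^(n-1) N] == n."""
    n = M.shape[0]
    G = np.hstack([np.linalg.matrix_power(M, k) @ N for k in range(n)])
    return np.linalg.matrix_rank(G) == n

def is_observable(M, N_obs):
    """Duality: (M, N_obs) is observable iff (M^T, N_obs^T) is controllable."""
    return is_controllable(M.T, N_obs.T)

M = np.array([[0.0, 1.0], [0.0, 0.0]])
N_obs = np.array([[1.0, 0.0]])          # we only measure position, y = x_1
print(is_observable(M, N_obs))          # True: the velocity can be reconstructed from y(.)
```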

Bang-Bang Control

Suppose $A$ is the cube $[-1,1]^{m}$ in $R^{m}$. A control is said to be bang-bang if for each time $t \geq 0$ and each index $i = 1, 2, \dots, m$ we have $|\alpha^{i}(t)| = 1$.

Theorem: Let $t > 0$ and suppose $x^0 ∈ C$. Then for the system

\[\dot{\mathbf{x}}(t) = M \mathbf{x}(t) + N\alpha(t) \tag{3}\]

there exists a bang-bang control $α(·)$ which steers $x^0$ to $S$.

This means that the graph of the control input with respect to time is piecewise constant, jumping between the extreme values; it looks like a sequence of steps (a square wave) rather than a smooth curve.

Optimal Control

Once we know that a system is controllable, we then find the optimal control sequence as follows:

Linear Time Optimal Control

For a linear system such as the ones stated above, where we want to reach the target in minimum time, the following holds.

Theorem: For the system in Equation 1 above, if $α^∗(·)$ is a time-optimal control then there exists a non-zero vector $h$ such that:

\[h^T X^{−1}(t)Nα^∗(t) = max_{a \in A}\{h^T X^{−1}(t)Na\} \tag{M}\]

The above theorem implies that if we know the vector $h$, then the maximization principle (M) provides us with a formula for computing the optimal control $α^∗(·)$.

All of the examples below have been taken from the resources listed in the references; I strongly recommend checking them out if you find this topic interesting.

Example: ROCKET RAILROAD CAR

Here, we want to figure out how to fire the rockets, so as to arrive at the origin 0 with zero velocity in a minimum amount of time.

Given that:

\[q(t) = \text{position at time t}\] \[v(t) = \dot{q}(t) = \text{velocity at time t}\] \[\alpha(t) = \text{thrust from rockets}\]

where $−1 ≤ α(t) ≤ 1$; depending upon the sign, the left or the right engine is used.

Figure 1: Rocket Railroad Car Problem

Assuming the car has mass $m = 1$, the law of motion is $\ddot{q}(t) = α(t)$. Taking $x(t) = (q(t), v(t))^T$, the dynamics of the system are:

\[\begin{cases} \dot{\mathbf{x}}(t) = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \mathbf{x}(t) + \begin{pmatrix} 0 \\ 1 \end{pmatrix} \alpha(t) \\ \mathbf{x}(0) = x^0 = (q_0, v_0)^T \end{cases}\]

Since our goal is to steer to the origin (0, 0) in minimum time, the payoff is defined as:

\[P[\alpha(\cdot)] = -\int_{0}^{\tau} 1 \, dt = -\tau,\]

where $\tau$ is the first time that $q(\tau) = v(\tau) = 0$.

From Equation 3 and the theorem above, there exists a non-zero vector $h$ such that:

\[h^T X^{-1}(t) N α^∗(t) = \max_{a \in A}\{h^T X^{-1}(t) N a\}\]

From Equations 2 and 3 we calculate $\mathbf{X}^{-1}(t) = \begin{pmatrix} 1 & -t \\ 0 & 1 \end{pmatrix}$ and $N = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$.

This implies: \(h^T X^{-1}(t) N = (h_1, h_2)\begin{pmatrix} -t \\ 1 \end{pmatrix} = -t h_1 + h_2.\)

According to the maximization principle (M):

\[(-t h_1 + h_2)\,α^∗(t) = \max_{a \in A}\{(-t h_1 + h_2)\,a\}\]

Therefore the optimal control $α^∗$ for the problem is given as:

\[α^∗(t) = \textbf{sgn}(-t h_1 + h_2)\]

Here $\textbf{sgn}(\cdot)$ denotes the sign function.
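
To see the bang-bang structure in action, here is a rough rollout of the rocket car under $α^∗(t) = \textbf{sgn}(-t h_1 + h_2)$. The vector $h = (1, 1)$ is an illustrative choice (not derived here) that happens to steer $x^0 = (-1, 0)$ to the origin with a single switch at $t = 1$:

```python
import numpy as np

# Roll out the rocket-railroad car under alpha*(t) = sgn(-t*h1 + h2).
h1, h2 = 1.0, 1.0                          # illustrative costate vector
q, v = -1.0, 0.0                           # start at q = -1 with zero velocity
dt = 1e-3
for k in range(2000):                      # integrate up to t = 2
    t = k * dt
    a = np.sign(-t * h1 + h2)              # bang-bang: +1 before the switch at t = 1, -1 after
    q += dt * v
    v += dt * a
print(round(q, 3), round(v, 3))            # both close to 0: arrival at the origin at rest
```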

Lagrangian and Hamiltonian Dynamics

Before we get to the PMP theorem for a more general system, where the dynamics need not be linear, there are a few fundamental ideas to cover.

Unlike Newtonian mechanics, which views physics through the lens of forces and vectors, Hamiltonian and Lagrangian mechanics describe a system in terms of position, velocity, momentum, and energy.

Lagrangian Mechanics

In Lagrangian mechanics, we are given a smooth function $L : R^n ×R^n → R$, written \(L = L(x,v),\) called the Lagrangian.

The variable $x$ denotes position and the variable $v$ denotes velocity.

Hamiltonian Equations

Optimal control problems are naturally phrased in terms of Hamiltonian mechanics, where we work with the position and momentum of particles. Here $p$ denotes the generalised momentum, given by:

\[p = ∇_vL(x,v)\]

For any given dynamical system we define the Hamiltonian H as $H : R^n × R^n → R$ where:

\[H(x, p) = p · v(x, p) − L(x, v(x, p)) \tag{4}\]

where $v(x, p)$ is the velocity obtained by solving $p = ∇_vL(x,v)$ for $v$.

For a physical system with Lagrangian $L(x, v) = \frac{m|v|^{2}}{2} − V(x)$ (kinetic minus potential energy), carrying out this computation shows that the Hamiltonian is the sum of the kinetic and potential energies:

\[H(x, p) = \frac{|p|^2}{2m} + V(x)\]
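
If you want to verify this yourself, here is a small symbolic sketch that carries out the transform in Equation 4 for the one-dimensional classical Lagrangian $L = \frac{m v^{2}}{2} - V(x)$:

```python
import sympy as sp

m = sp.symbols('m', positive=True)
x, v, p = sp.symbols('x v p', real=True)
V = sp.Function('V')                           # potential energy

L = m * v**2 / 2 - V(x)                        # Lagrangian: kinetic minus potential energy
p_of_v = sp.diff(L, v)                         # generalised momentum p = dL/dv = m*v
v_of_p = sp.solve(sp.Eq(p, p_of_v), v)[0]      # invert to get v(x, p) = p/m
H = sp.simplify(p * v_of_p - L.subs(v, v_of_p))
print(H)                                       # -> p**2/(2*m) + V(x)
```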

You can look into these concepts in greater detail here: Lagrangian and Hamiltonian Dynamics, and here: Principle of Least Action.

We will see that if $α^∗(·)$ is an optimal control, then there exists a function $p^∗(·)$, called the costate, that satisfies a certain maximization principle.

You can interpret the costate $p^∗(·)$ as a type of Lagrange multiplier: it enforces the dynamics as a constraint, linking the evolution of the system state to the optimization problem.

Generalised PMP Theorem

For a standard optimal control problem where the evolution of system and payoff are given as:

\[\left\{ \begin{array}{l} \dot{\mathbf{x}}(t) = f(x(t),\alpha(t)) \\ \mathbf{x}(0) = \mathbf{x}^0 \end{array} \right. \tag{ODE}\] \[P[\alpha(\cdot)] := \int_{0}^{T} r(\mathbf{x}(t), \alpha(t)) \, dt + g(\mathbf{x}(T)) \tag{P}\]

where the terminal time $T > 0$, the running payoff $r : R^n × A → R$, and the terminal payoff $g : R^n → R$ are given.

In control theory the Hamiltonian is defined as: \(H(x, p, a) := f(x, a) · p + r(x, a) \tag{5}\)

Theorem: Assume $α^∗(·)$ is optimal for (ODE), (P) and $x^∗(·)$ is the corresponding trajectory. Then there exists a function $p^* : [0, T] → R^n$ such that:

\[\dot{x}^{∗}(t) = \nabla_p H(x^{∗}(t), p^{∗}(t), α^{∗}(t)) \tag{ODE}\] \[\dot{p}^{∗}(t) = -\nabla_x H(x^{∗}(t), p^{∗}(t), α^{∗}(t)) \tag{ADJ}\] \[H(x^{∗}(t), p^{∗}(t), α^{∗}(t)) = \max_{a \in A} \{H(x^{∗}(t), p^{∗}(t), a)\} \tag{M}\] \[p^{∗}(T) = \nabla g(x^{∗}(T)) \tag{T}\]

Example

As mentioned before, the costate $p^∗(·)$ can be interpreted as a sort of Lagrange multiplier. In practice, this means we add the equations (ADJ) and (M) to (ODE) and then try to solve for $α^∗(·)$, $x^∗(·)$, and $p^∗(·)$.

The example problem worked out below should give you an idea of how this is done.

Example: MOON LANDER PROBLEM

In this problem we want to bring a spacecraft to a soft landing on the lunar surface, using the least amount of fuel possible, where:

\[h(t) = \text{height at time t}\] \[v(t) = \text{velocity} = \dot{h}(t)\] \[m(t) = \text{mass of spacecraft (changing as fuel is burned)}\] \[α(t) = \text{thrust at time t}\]

Figure 2: Moon Lander Problem

The thrust is constrained so that $0 ≤ α(t) ≤ 1$; that is, $A = [0, 1]$.

There are also the constraints that the height and mass be nonnegative: $h(t) ≥ 0$ and $m(t) ≥ 0$

The dynamics and initial conditions for the system are given as:

Dynamics \(\begin{cases} \dot{h}(t) = v(t) \\ \dot{v}(t) = -g + \frac{\alpha(t)}{m(t)} \\ \dot{m}(t) = -k \alpha(t) \end{cases}\)

Initial Conditions \(\begin{cases} h(0) = h_0 > 0 \\ v(0) = v_0 \\ m(0) = m_0 > 0 \end{cases}\)

The goal is to land on the moon safely, maximizing the remaining fuel $m(τ)$, where $τ = τ[α(·)]$ is the first time $h(τ) = v(τ) = 0$.

Since $\alpha = \frac{-\dot{m}}{k}$, our intention is equivalent to minimizing the total applied thrust before landing, so the payoff is given as:

\[P[\alpha(\cdot)] = -\int_{0}^{\tau} \alpha(t) \, dt.\]

In vector form, the state and the dynamics of the system are:

\[\mathbf{x}(t) = \begin{pmatrix} h(t) \\ v(t) \\ m(t) \end{pmatrix}, \quad \mathbf{f} = \begin{pmatrix} v \\ -g + \frac{a}{m} \\ -ka \end{pmatrix}.\]

From the definition of the Hamiltonian in Equation 5, we know that:

\[H(x, p, a) = \mathbf{f} \cdot p + r\]

For our problem, with running payoff $r = -a$, this gives: \(H(x, p, a) = (v, -g + \frac{a}{m}, -ka) \cdot (p_1, p_2, p_3) - a\)

The above equation can be simplified as:

\[H(x, p, a) = -a + p_1 v + p_2 \left(-g + \frac{a}{m}\right) + p_3 (-ka)\]

To find (ADJ) for our problem, we compute the partial derivatives of $H$ with respect to the state variables:

\[H_{x_1} = H_h = 0, \quad H_{x_2} = H_v = p_1, \quad H_{x_3} = H_m = -\frac{p_{2}a}{m^2}\]

Therefore

\[\begin{cases} \dot{p}^1(t) = 0 \\ \dot{p}^2(t) = -p^1(t) \\ \dot{p}^3(t) = \frac{p^2(t) \alpha(t)}{m(t)^2} \end{cases} \tag{ADJ}\]
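
These partial derivatives are easy to double-check symbolically. Here is a short sketch that differentiates the Hamiltonian above and recovers the right-hand sides of (ADJ):

```python
import sympy as sp

h, v, m, a, g, k = sp.symbols('h v m a g k', real=True)
p1, p2, p3 = sp.symbols('p1 p2 p3', real=True)

# The moon-lander Hamiltonian written out above
H = -a + p1 * v + p2 * (-g + a / m) + p3 * (-k * a)

# (ADJ) says p_dot = -grad_x H, with the state x = (h, v, m)
print(-sp.diff(H, h))    # 0           -> p1_dot = 0
print(-sp.diff(H, v))    # -p1         -> p2_dot = -p1
print(-sp.diff(H, m))    # a*p2/m**2   -> p3_dot = p2*a/m^2
```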

The maximization condition (M) reads as:

\[H(\mathbf{x}(t), \mathbf{p}(t), \alpha(t)) = \max_{0 \leq a \leq 1} H(\mathbf{x}(t), \mathbf{p}(t), a) \tag{M}\]

For our problem this reads:

\[= \max_{0 \leq a \leq 1} \left\{ -a + p^1(t)v(t) + p^2(t) \left[-g + \frac{a}{m(t)}\right] + p^3(t)(-ka) \right\}\] \[= p^1(t)v(t) - p^2(t)g + \max_{0 \leq a \leq 1} \left\{ a \left( -1 + \frac{p^2(t)}{m(t)} - kp^3(t) \right) \right\}\]

Thus the optimal control law is given by the rule:

\[\alpha(t) = \begin{cases} 1 & \text{if } 1 - \frac{p^2(t)}{m(t)} + kp^3(t) < 0 \\ 0 & \text{if } 1 - \frac{p^2(t)}{m(t)} + kp^3(t) > 0 \end{cases}\]

Let us now find the costate $p(·)$ so that $α(·)$ and $x(·)$ described above satisfy (ODE), (ADJ), (M).

To do this, we will have to figure out appropriate initial conditions:

\[p^1(0) = \lambda_1, \ p^2(0) = \lambda_2, \ p^3(0) = \lambda_3\]

We solve (ADJ) with $α(·)$ as above, and find:

\[\begin{cases} p^1(t) \equiv \lambda_1 & (0 \leq t \leq \tau) \\ p^2(t) = \lambda_2 - \lambda_1 t & (0 \leq t \leq \tau) \\ p^3(t) = \begin{cases} \lambda_3 & (0 \leq t \leq t^*) \\ \lambda_3 + \int_{t^*}^{t} \frac{\lambda_2 - \lambda_1 s}{(m_0 + k(t^* - s))^2} \, ds & (t^* \leq t \leq \tau) \end{cases} \end{cases}\]

Here we define:

\[r(t) := 1 - \frac{p^2(t)}{m(t)} + p^3(t)k\]

This implies:

\[\dot{r} = -\frac{\dot{p}^2}{m} + \frac{p^2 \dot{m}}{m^2} + \dot{p}^3 k = \frac{\lambda_1}{m} + \frac{p^2}{m^2}(-k\alpha) + \left(\frac{p^2 \alpha}{m^2}\right)k = \frac{\lambda_1}{m(t)}.\]

We choose $λ_1 < 0$, so that $r$ is decreasing. We calculate:

\[r(t^∗) = 1 − \frac{(λ_2 − λ_1t^∗)}{m_0} + λ_3k\]

Therefore, according to the maximization principle (M), the optimal control is given as:

\[\alpha(t) = \begin{cases} 1 & \text{if } r(t) < 0 \\ 0 & \text{if } r(t) > 0 \end{cases}\]
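
Since $r$ starts positive and is decreasing, the resulting profile is "free fall, then full thrust": $α = 0$ while $r(t) > 0$ and $α = 1$ once $r(t)$ turns negative. Here is a rough numerical sketch that rolls out this bang-bang profile and grid-searches for the switch time giving the softest touchdown. All parameter values ($g$, $k$, $h_0$, $v_0$, $m_0$) and the search grid are made up for illustration:

```python
import numpy as np

# Illustrative moon-lander rollout: free fall (alpha = 0) until a switch time t_s,
# then full thrust (alpha = 1) until touchdown.
g, k = 1.0, 0.1
h0, v0, m0 = 1.0, 0.0, 0.5
dt = 1e-3

def land(t_s, t_max=3.0):
    """Integrate the dynamics; return (h, v) when h first reaches 0 (or when a guard triggers)."""
    h, v, m, t = h0, v0, m0, 0.0
    while h > 0.0 and m > 0.0 and t < t_max:
        a = 0.0 if t < t_s else 1.0        # bang-bang thrust profile
        h += dt * v
        v += dt * (-g + a / m)
        m += dt * (-k * a)
        t += dt
    return h, v

# Grid-search the switch time whose touchdown velocity is closest to zero
candidates = np.linspace(0.0, 1.2, 121)
best = min(candidates, key=lambda t_s: abs(land(t_s)[1]))
print(best, land(best))                    # switch time and the resulting (h, v) at touchdown
```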

Conclusion

There is much more to the PMP and to optimal control theory. To understand the theorem and optimal control in greater detail, you can go through the references attached below. This blog was my way of improving my understanding of the topic, and the examples above are heavily inspired by the attached references.

References

An Introduction to Mathematical Optimal Control Theory

Optimal Control Theory