- @for Computer vision people
- @author Kai Ruhl
- @since 2021-02

Mathematical optimization is good for a lot of things, among them finding the best variables for a model, e.g. a rotation **R** and a translation **t** for two point clouds. The way there is a bit long, so we start with the 1D case, then go to 2D and finally 3D.

We illustrate everything with plots from Python, more specifically Jupyter QtConsole, which you can install with `sudo apt-get install python3-qtconsole jupyter-core` and then start with `jupyter qtconsole`, where we configure SVG output with `%config InlineBackend.figure_format = 'svg'`. You might also need to install numpy and matplotlib [doc] if not yet present.

A function like **x²** is 1D, and "optimization" means going to the minimum, but without knowing the function! We only know the value of the function at any place we ask.

`import numpy as np; import matplotlib.pyplot as plt
f = lambda x : x ** 2. # some "unknown" function
x = 1.5 # some place on that function
X = np.linspace(-2, +2) # a lot of places
plt.plot(x, f(x), 'ro', X, f(X), 'b-')`

So the blue function is only known very locally, e.g. at the red dot. Now we can try to walk to the minimum at (0, 0), which is luckily possible because the function is convex (in convex optimization, hammering a function until it becomes convex is a major goal). Our walk is called gradient descent, where we take an initial solution (above: `x = 1.5`) and go "against the slope", aka the 1st derivative.

`dx = 0.1 # tiny variation in x
Px = np.array([x - dx / 2, x + dx / 2]) # x-coords around x
Pfx = f(Px) # y-coords around f(x)
gx = (Pfx[1] - Pfx[0]) / (Px[1] - Px[0]) # gradient around (x, f(x))`

At this step, Python tells you that the slope `gx` is
around +3, so `x` going right means going up, and going
left means going down. We visualize the gradient/slope:

`def gradient(f, x, dx): return (f(x + dx / 2) - f(x - dx / 2)) / dx
def slope(f, x, gx, r=.5): sR = np.linspace(-r, +r); sX = sR + x; sY = sR * gx + f(x); return sX, sY
plt.plot(x, f(x), 'ro', X, f(X), 'b-', *slope(f, x, gx), 'r-')`

Ok, so time to update our "best solution" `x` with
something better, by subtracting some portion of the slope.

`s = 0.3 # choose a "good" step size
x = x - gx * s # go against the slope
plt.plot(x, f(x), 'ro', X, f(X), 'b-', *slope(f, x, gradient(f, x, dx)), 'r-')`

So what did we just do? We assumed that `f` can be approximated by a linear function (the slope), then just followed that slope in the reverse direction (down). This is called linearization. Finding a good step size is crucial and the subject of Levenberg-Marquardt and other algorithms. And formally, what we just did is compute **argmin_x f(x)**, meaning "find the argument `x` that minimizes `f(x)`".
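
Repeated, this single update is already the whole 1D gradient descent algorithm. A minimal sketch (not from the original listings), reusing the `gradient` helper, the step size `s` and the variation `dx` from above:

`x = 1.5 # restart from the initial solution
for i in range(20): # small fixed iteration budget
    gx = gradient(f, x, dx) # numerical slope at x
    if abs(gx) < 1e-6: # flat enough, we are (numerically) at the minimum
        break
    x = x - s * gx # step against the slope
print(x, f(x)) # both close to 0, i.e. near the minimum at (0, 0)`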

More: Calculus intuition (video series), Newton 2nd order optimization (12min video), explanatory Levenberg-Marquardt intuition.

Now we enter the world of linear algebra (3Blue1Brown video series).

- Vectors can be positions in space, but also seen as axes.
- A matrix can contain axes as column vectors; it then transforms `special2world`, i.e. a point in that space goes to "normal" space; also called "change of basis".
- Matrix multiplication is changing spaces twice, combined in one.
- Determinant of a matrix is a measure for expansion. Zero determinant means a dimension is collapsed.
- Eigenvectors are the rare vectors that stay on their line when transformed. Usually non-existent for our **R** and **t** (see the small numpy sketch after this list).
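
For intuition, here is a small numpy sketch (not from the original text) touching the last three points: a change-of-basis matrix built from two axes, its determinant, and its eigenvectors:

`A = np.array([[2, 1],
              [0, 1]]) # columns = the axes of a "special" space
p_special = np.array([1, 1]) # a point expressed in that space
p_world = A.dot(p_special) # special2world: [3, 1]
det = np.linalg.det(A) # 2.0; areas double, 0 would mean a collapsed dimension
evals, evecs = np.linalg.eig(A) # eigenvalues 2 and 1, eigenvectors as columns`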

So much for background, now we start with a 2D function **f(x, y) = x² + 5y²** and a point:

`import numpy as np; import matplotlib.pyplot as plt
f = lambda x,y: (x**2) + 5*(y**2)
x, y = +2.0, -1.5`
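
The figure setup and the `contour` helper used in the following listings are not shown; a minimal sketch of what they could look like (my assumption, matching the later calls `contour(ax, X, Y, f)` and the names `fig`, `ax`, `X`, `Y`):

`fig = plt.figure(); ax = fig.add_subplot(111)
X, Y = np.meshgrid(np.linspace(-3, +3, 100), np.linspace(-3, +3, 100)) # plot range, assumed
def contour(ax, X, Y, f): # assumed helper for the level-line plots below
    cs = ax.contour(X, Y, f(X, Y), levels=10)
    ax.clabel(cs, inline=True, fontsize=8)
contour(ax, X, Y, f); ax.plot(x, y, 'ro')`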

... and if for some reason the figure won't show, just type `fig` to redraw it.

This time, we can only determine partial derivatives:

`def partial_gradient(f, x, y, dx, dy): # dx OR dy = 0
    return (f(x+dx/2, y+dy/2) - f(x-dx/2, y-dy/2)) / (dx+dy)
d = 0.1 # tiny variation in x or y
gx = partial_gradient(f, x, y, d, 0) # vary x
gy = partial_gradient(f, x, y, 0, d) # vary y`

Python tells us that `gx` is +4 and thus right means higher; and `gy` is -15 and thus down means higher. Again, for minimization we should go in the opposite direction, so we show `-gx`, `-gy` and then the vector **[-gx, -gy]^T**, which indicates the direction to go.

`def arrow(x, y, dx, dy, text, color, zoom=.1):
    zx, zy = dx*zoom, dy*zoom # arrow end point offset
    if not (dx == 0 and dy == 0):
        ax.arrow(x, y, zx, zy, color=color, head_width=.1, \
                 length_includes_head=True)
    if text is not None:
        sx, sy = np.sign(dx), np.sign(dy) # [-1, 0, +1]
        tx, ty = sx * 5, sy * 5 # text offset
        ha = ['right', 'center', 'left'][int(sx+1)]
        va = ['top', 'center', 'bottom'][int(sy+1)]
        ax.annotate(text, (x+zx, y+zy), (tx, ty), ha=ha, va=va, \
                    textcoords='offset points', color=color, \
                    bbox=dict(fc='w', ec='w', alpha=0.9, pad=1.))
contour(ax, X, Y, f); ax.plot(x, y, 'ro')
arrow(x, y, -gx, 0, '-gx', 'green')
arrow(x, y, 0, -gy, '-gy', 'blue')
arrow(x, y, -gx, -gy, '-g', 'violet')`

Note that **-g** does not point directly at the minimum because the dimensions are not uniformly scaled (i.e. ellipse, not circle); this happens a lot with more natural parameters too. But still, **-g** points in the right direction, and again we go down with a "good" step size, this time in vector notation:

`step_size = .08 # advanced magic
x_ = np.array([x, y]) # vector x at t=0
g_ = np.array([gx, gy]) # gradient vector
x_ = x_ - step_size * g_ # vector x at t=1`

And compute the next step from here.

`def partial_gradients(f, x_, d):
    n = len(x_) # here: 2
    I = np.identity(n) * d # indexer
    return np.array([partial_gradient(f, *x_, *I[r]) for r in range(n)])
g_ = partial_gradients(f, x_, d)
contour(ax, X, Y, f); ax.plot(*x_, 'ro')
arrow(*x_, *-g_, '-g', 'violet')`

... and again, we get a gradient direction that is sort-of-right in solving **argmin_xy f(x, y)** for us, read as "find arguments `x` and `y` that minimize `f(x, y)`"; or in the vector version, **argmin_x_ f(x_)**.

So far, we had only one objective, i.e. functions **f: ℝⁿ → ℝ¹** where the target energy to minimize was exactly one value. In reality, we often have many data points that we wish to minimize against some model. Here, consider the contour plot from the last section lowered by `4`, so that we can try to bring the vertexes of some triangle to the `0` level.

`f = lambda x,y: (x**2) + 5*(y**2) - 4 # cuts z=0
P = np.array([[0, 0], [1, 1.2], [2, 0]]) # 3 triangle vertexes (red)
def point(x, y, text, color, marker='o'):
    ax.plot(x, y, marker=marker, color=color)
    if text is not None:
        ax.annotate(text, (x, y), (0, 5), ha='center', va='bottom', \
                    textcoords='offset points', color=color, \
                    bbox=dict(fc='w', ec='w', alpha=0.9, pad=1.))
def tri(P):
    ax.clear(); ax.axis('equal')
    contour(ax, X, Y, f)
    [point(*p_, "%.1f" % f(*p_), 'red') for p_ in P]
    ax.fill(*P.T, ec='r', ls='--', alpha=.5, fill=False)
tri(P)`

One way to go is a model **t = [tx, ty]**, i.e. a translation that shifts the triangle such that all three red vertexes (points) have a `z` value of `0` or near it. So our objective is **argmin_t e(t, P, f)** for a translation `t` on triangle points `P` evaluated with a distance function `f`. The energy to be minimized is **e(t, P, f) = ∑_{p∈P} f(p+t)²**, i.e. we go through all points in `P`, shift each by `t`, and sum up the squared distances:

`t_ = np.array([0, 0]) # initial solution
r = lambda t_, P, f: f(*(P + t_).T) # residual
e = lambda t_, P, f: np.sum(r(t_, P, f)**2) # energy
r0 = r(t_, P, f)
e0 = e(t_, P, f)`

Initially `e0` is roughly `33.6`, which is plausible: the red vertexes have left-to-right `z` values of `-4`, `+4.2` and `0` (see `r0`). So where to send `t` now? If the points `p` on the contour `f` themselves were the objective, we would compute the Jacobian matrix [ros] `J` (all partial derivatives) similar to before:

`J = np.array([partial_gradients(f, p_, d) for p_ in P]) # df/dp; wrong objective`

But we are on `t` (not on `p` anymore), minimizing the residual **r**, so it is not **∂f/∂p** (above) but rather **∂r/∂t** (below). This means we vary `tx` and `ty`, and see which effect this has on `r`, following Gauss-Newton with `n=2` (number of translation parameters) and `m=3` (number of observations = triangle vertexes); we expect `J` to be a 3x2 (`m×n`) matrix:

    ⌈ ∂r(p0)/∂tx   ∂r(p0)/∂ty ⌉
J = | ∂r(p1)/∂tx   ∂r(p1)/∂ty |
    ⌊ ∂r(p2)/∂tx   ∂r(p2)/∂ty ⌋

`gr = lambda t_, d_, P, r, f : (r(t_ + d_/2, P, f) - r(t_ - d_/2, P, f)) / np.sum(d_, axis=-1)
d = 0.1 # tiny variation in x or y
gr_tx = gr(t_, np.array([d, 0]), P, r, f) # derive tx
gr_ty = gr(t_, np.array([0, d]), P, r, f) # derive ty`

At this point, `gr_tx` (gradient of `r` in direction of `tx`) is [`0`, `2`, `4`], which is plausible because the three points **P0**, **P1**, **P2** sit at `x` = `0`, `1`, `2` and the slope of **f** in x-direction grows as **2x**.

`def grs(t_, d, P, r, f):
    n, m = len(t_), len(P) # n=2 params, m=3 points
    T2 = np.tile(t_, [n*m, 1]) # 6x2
    I2 = np.tile((np.identity(n) * d), [m, 1]) # 6x2
    P2 = P.repeat(n, axis=0) # 6x2
    return gr(T2, I2, P2, r, f).reshape(m, n) # Jacobian
J = grs(t_, d, P, r, f)`

    ⌈ 0  0 ⌉
J = | 2 12 |
    ⌊ 4  0 ⌋

With gradient descent (*GD*), we would now go:

`step_size = .15 # advanced magic
g_ = np.average(J, axis=0) # gradient vector
t_ = t_ - step_size * g_ # go down
e1 = e(t_, P, f) # improved energy (GD)
Pa = np.average(P, axis=0) # triangle center
tri(P + t_)
arrow(*Pa, *t_, 'gd', 'gray', 1.)`

... where `e1` is `7.8`, which is a lot better than the previous `33.6` energy. We averaged `J` over the points **P**, so each point had the same say in it; numerically equivalent to **J^T·[⅓ ⅓ ⅓]^T**. Alternatively, we can use the residual itself to weight the gradient:

`step_size = .01 # advanced magic
t_ = np.array([0, 0]) # reset
g_ = J.T.dot(r(t_, P, f)) # weighted gradient
t_ = t_ - step_size * g_ # go down
e1 = e(t_, P, f) # improved energy (GD²)
tri(P + t_)
arrow(*Pa, *t_, 'gd2', 'gray', 1.)`

... and `e1` is `8.8`, which is also much better than the previous `33.6` energy. The effective gradient `g_` seems to prioritize `ty` more, which is somewhat plausible because of the steeper rise in y-direction. There seems to be no official name for this method (here: *GD²*), though with the **r** factor coming from deriving **½[f(x)]²** [chain rule] instead of **f(x)** alone, we can at minimum say that least squares (instead of total variation) is involved in some capacity.
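
That least-squares reading is easy to check numerically (a quick sketch, not in the original): **J^T·r** should be half the gradient of the energy **e = ∑ r²**, obtained by varying `tx` and `ty` directly:

`t_ = np.array([0, 0]) # reset
e_txty = lambda tx, ty: e(np.array([tx, ty]), P, f) # energy as a function of tx, ty
ge_ = partial_gradients(e_txty, t_, 1e-6) # numerical gradient of the energy
print(J.T.dot(r(t_, P, f)), ge_ / 2) # both approximately [8.4, 50.4]`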

Gauss-Newton (*GN*) proposes another alternative step, namely **(J^T J)^-1 J^T·r(t_, P, f)**, where **(J^T J)^-1 J^T** is the pseudo-inverse of **J**:

`step_size = 1.6 # advanced magic
t_ = np.array([0, 0]) # reset
JI = np.linalg.inv(J.T.dot(J)).dot(J.T) # pseudo-inverse of J
g_ = JI.dot(r(t_, P, f)) # curved gradient
t_ = t_ - step_size * g_ # go down
e1 = e(t_, P, f) # improved energy (GN)
tri(P + t_)
arrow(*Pa, *t_, 'gn', 'gray', 1.)`

Under Gauss-Newton, `e1` is `9.3`, which is again a lot better than the previous `33.6` energy, and `ty` seems to be favored even more. Like gradient descent, the crux is of course the step size, but regardless of optimization method there is no way around it. The full algorithm can look like this:

`def gauss_newton(t_, P, e, r, f, iter=[]):
    step_sizes = 2 ** np.arange(10, 0, -1) / 100 # [10.24 .. 0.02]
    t0 = np.copy(t_); iter.append(t0)
    e0 = e(t0, P, f)
    improvement = 999. # can we improve t?
    while improvement > 0.1:
        J = grs(t0, d, P, r, f)
        JI = np.linalg.inv(J.T.dot(J)).dot(J.T)
        g_ = JI.dot(r(t0, P, f))
        # best step size?
        t1 = t0 - step_sizes[0] * g_
        e1 = e(t1, P, f)
        for step in step_sizes[1:]:
            t2 = t0 - step * g_
            e2 = e(t2, P, f)
            if e2 > e1: # getting worse
                break
            else:
                t1, e1 = t2, e2
        # anyone better?
        improvement = e0 - e1
        if improvement > 0:
            t0, e0 = t1, e1
            iter.append(t0)
    return t0
iter0 = []
t_ = np.array([0, 0]) # reset
t_ = gauss_newton(t_, P, e, r, f, iter0)
e1 = e(t_, P, f) # improved energy (GN)
tri(P + t_)
[arrow(*(Pa+iter0[i-1]), *(iter0[i]-iter0[i-1]), None, 'gray', 1.) for i in range(1, len(iter0))]`

With `e1` at `7.6`, the resulting energy is better than ever before, even better than what I would have expected manually:

`t3 = np.array([-1, -.6])
e3 = e(t3, P, f)
tri(P+t3)
arrow(*Pa, *t3, 'manual', 'gray', 1.)`

Here, `e3` is `7.7` which is indeed slightly
worse than the Gauss-Newton energy.

One last thing: We evaluated `r` numerically for partial
derivatives, which is the right thing for most problems; but
sometimes explicit derivation is also possible:

∂r/∂tx = ∂f(P+t)/∂tx
       = ∂[(Px+tx)² + 5(Py+ty)² - 4]/∂tx
       = ∂(Px² + 2·Px·tx + tx²)/∂tx
       = 2·Px + 2·tx

∂r/∂ty = ∂f(P+t)/∂ty
       = ∂[(Px+tx)² + 5(Py+ty)² - 4]/∂ty
       = ∂(5·Py² + 10·Py·ty + 5·ty²)/∂ty
       = 10·Py + 10·ty

... so in these friendly cases, evaluating `r` can at
least be simplified.
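
With these expressions the Jacobian can also be written in closed form and checked against the numerical one (a small sketch based on the derivation above):

`t_ = np.array([0, 0]) # reset
J_analytic = np.stack([2*(P[:, 0] + t_[0]), 10*(P[:, 1] + t_[1])], axis=1)
print(J_analytic) # [[0, 0], [2, 12], [4, 0]], matching the numerical J`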

More: Further improved variant Levenberg-Marquardt (*LM*), with **(J^T J + λI)^-1 J^T·r(t_, P, f)**, where the damping factor **λ** blends between Gauss-Newton (small **λ**) and plain gradient descent (large **λ**).
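
A minimal LM-style variant of the `gauss_newton` loop above could look like this (my sketch, using **λI** damping and a simple accept/reject rule; the names are not from the original text):

`def levenberg_marquardt(t_, P, e, r, f, lam=1e-2, max_iter=50):
    t0 = np.copy(t_); e0 = e(t0, P, f)
    for i in range(max_iter):
        J = grs(t0, d, P, r, f) # numerical Jacobian, as before
        A = J.T.dot(J) + lam * np.identity(len(t0)) # damped normal equations
        t1 = t0 - np.linalg.inv(A).dot(J.T).dot(r(t0, P, f))
        e1 = e(t1, P, f)
        if e1 < e0: # better: accept, trust the quadratic model more
            t0, e0, lam = t1, e1, lam / 2
        else: # worse: reject, damp harder (more like gradient descent)
            lam = lam * 10
        if lam > 1e6: # no acceptable step found anymore
            break
    return t0
t_lm = levenberg_marquardt(np.array([0., 0.]), P, e, r, f)
e_lm = e(t_lm, P, f) # should be comparable to the Gauss-Newton energy`

The damping keeps the normal equations well conditioned even when **J^T J** is (nearly) singular, which is the main selling point over plain Gauss-Newton.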

We now move away from continuous functions and concentrate fully on multiple objectives, namely two pyramids that we want to overlay in 3D; one is fixed and the other one is first rotated by a 3x3 matrix **R** and then translated by a 3D vector **t**. We use mplot3d (a submodule of matplotlib) for visualization, starting with the first pyramid:

`import numpy as np; import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d # for 3d plots
V0 = np.array([[-1, -1, 0], [+1, -1, 0], [+1, +1, 0],
               [-1, +1, 0], [0, 0, 2], [0, 0, 1]]) # vertexes
Ti = np.array([[0, 1, 3], [3, 1, 2], [0, 1, 4],
               [1, 2, 4], [2, 3, 4], [3, 0, 4]]) # triangles
fig = plt.figure(); fig.clear()
ax = fig.add_subplot(111, projection='3d')
def ax3lim(x, y, z, r):
    ax.set_xlim3d(x-r, x+r); ax.set_xlabel('x')
    ax.set_ylim3d(y-r, y+r); ax.set_ylabel('y')
    ax.set_zlim3d(z-r, z+r); ax.set_zlabel('z')
def wire(V, Ti, color, line='-', a=.5):
    Ti2 = Ti if 2 == Ti.ndim else np.expand_dims(Ti, axis=0)
    Li1 = np.roll(Ti2.repeat(2, axis=1), -1, axis=1) # vertex pairs per triangle
    Li2 = np.sort(Li1.reshape((-1, 2)), axis=1) # edges, with sorted indices
    Li3 = np.unique(Li2, axis=0) # unique edges
    [ax.plot(*V[li].T, c=color, alpha=a, ls=line) for li in Li3]
ax3lim(0, 0, 1, 1.5)
wire(V0, Ti, 'blue')`

... and create a second pyramid by rotating and translating the first; our optimization will then strive to recover the inverse of that **R** and **t** (or more formally, 6 parameters [**a**, **b**, **c**, **tx**, **ty**, **tz**], three Euler angles plus a translation):
    ⌈ cos a cos b   cos a sin b sin c - sin a cos c   cos a sin b cos c + sin a sin c ⌉
R = | sin a cos b   sin a sin b sin c + cos a cos c   sin a sin b cos c - cos a sin c |
    ⌊   -sin b                cos b sin c                       cos b cos c           ⌋

`def rotmat(a, b, c): # rz, ry, rx
    sa, sb, sc = np.sin([a, b, c])
    ca, cb, cc = np.cos([a, b, c])
    row0 = [ca*cb, ca*sb*sc - sa*cc, ca*sb*cc + sa*sc]
    row1 = [sa*cb, sa*sb*sc + ca*cc, sa*sb*cc - ca*sc]
    row2 = [-sb, cb*sc, cb*cc]
    return np.array([row0, row1, row2])
angles0 = np.array([5, 10, 15]) # rz, ry, rx
R0 = rotmat(*np.radians(angles0))
t0_ = np.array([0.6, 0.4, 0.2])
V1 = R0.dot(V0.T).T + t0_
wire(V1, Ti, 'red')`

When the two pyramids are not too far apart, this is the classical Iterative Closest Point (*ICP*) case in the field of point cloud registration. The 2020 variants mostly use "point-to-plane", i.e. some vertexes on the red pyramid are pulled to the closest triangle of the blue pyramid. Since the red vertexes are probably not the best choice to be pulled (they are all on corners, since we have no tessellation or subdivision whatsoever), we replace them with vertexes centered on the triangles.

`P1 = np.average(V1[Ti], axis=1)
ax.clear()
wire(V0, Ti, 'blue')
wire(V1, Ti, 'red')
ax.plot(*P1.T, 'rx')`

The red crosses need to be tested for distance to the corresponding triangle, using point-to-plane distance [wolfram]. Finding the correct or best "partner triangle" is a major goal in surface or point registration; here we cheat because we already know the point-to-triangle correspondence.

`def triangle_distance(p_, V, ti_):
    tri = V[ti_]; assert tri.shape == (3, 3) # 3 vertexes
    n = np.cross(tri[1] - tri[0], tri[2] - tri[0]) # plane normal
    n = n / np.linalg.norm(n, axis=0) # normalize
    plane_dist = n.dot(p_ - tri[0])
    pp_ = p_ - plane_dist * n
    return np.abs(plane_dist), pp_
triangle_distance(P1[0], V0, Ti[0]) # testing
def any_triangle_distance(p_, V, Ti): # unused
    dist_list = [triangle_distance(p_, V, Ti[i]) + (i,) for i in range(len(Ti))]
    min_i = np.argmin([a[0] for a in dist_list])
    return dist_list[min_i]
any_triangle_distance(P1[3], V0, Ti) # testing
def partner_triangle_distance(P1, P1_i, V, Ti, pti_=None):
    p_ = P1 if 1 == P1.ndim else P1[P1_i]
    ti = P1_i if pti_ is None else pti_[P1_i]
    return triangle_distance(p_, V, Ti[ti]) + (ti,)
dist, pp_, ti = partner_triangle_distance(P1, 0, V0, Ti)`

So point `P1[0]` = (`0.3`, `0.05`, `0.2`), in the middle of the red left floor triangle, is projected onto the nearest triangle `ti` = `0`, the blue left floor triangle. The intersection point `pp_` = (`0.3`, `0.05`, `0.0`) has point-to-plane distance `dist` = `0.17`.

`def arrow3(p0_, p1_, color='gray'):
    P0 = p0_ if 2 == p0_.ndim else np.expand_dims(p0_, 0)
    P1 = p1_ if 2 == p1_.ndim else np.expand_dims(p1_, 0)
    Q = P1 - P0
    ax.quiver(*P0.T, *Q.T, color=color, alpha=0.8)
    return P0, Q
def wirepyr(V0, V1, Ti):
    ax.clear()
    wire(V0, Ti, 'blue', ':')
    wire(V1, Ti, 'red', ':')
def wiretri(P1, P1_i, pp_, V0, V1, Ti, V0_i):
    wirepyr(V0, V1, Ti)
    ax.plot(*P1.T, 'rx')
    ax.plot(*P1[P1_i:P1_i+1].T, 'gx')
    arrow3(P1[P1_i], pp_)
    wire(V1, Ti[P1_i], 'green', (0, (5, 5)), a=0.8)
    wire(V0, Ti[V0_i], 'green', (0, (5, 5)), a=0.8)
wiretri(P1, 0, pp_, V0, V1, Ti, ti)`

We can see on the ground plate that the minuscule distance (down the z-axis) is correct, but it is not the most interesting triangle: that would be the right-most one, with point and triangle index `3`.

`dist, pp_, ti = partner_triangle_distance(P1, 3, V0, Ti)
wiretri(P1, 3, pp_, V0, V1, Ti, ti)`

With the point-to-triangle distances, we now write the (somewhat complex) residual **r** for one point, as well as the energy function **e** which sums up over all points.

`rt1 = np.zeros(6) # desired a, b, c, tx, ty, tz
def r(rt_, P, i, V, Ti, pti_=None):
    R, t_ = rotmat(*rt_[:3]), rt_[3:]
    q_ = R.dot(P[i]) + t_ # near blue
    dist_pp_ti = partner_triangle_distance(q_, i, V, Ti, pti_)
    return dist_pp_ti[0]
def e(rt_, P, V, Ti, pti_=None):
    residuals = np.array([r(rt_, P, i, V, Ti, pti_) for i in range(len(P))])
    return np.sum(residuals ** 2)
r1 = r(rt1, P1, 0, V0, Ti)
e1 = e(rt1, P1, V0, Ti)`

For the first point in `P1`, the distance to the triangle plane `r1` is `0.17` and the least squares energy `e1` (over all points in `P1`) is `0.98`, so we seem to be not too far away. Now comes the hardest part, determining the Jacobian `J`.

    ⌈ ∂r(p0)/∂a  ∂r(p0)/∂b  ∂r(p0)/∂c  ∂r(p0)/∂tx  ∂r(p0)/∂ty  ∂r(p0)/∂tz ⌉
J = | ∂r(p1)/∂a  ∂r(p1)/∂b  ∂r(p1)/∂c  ∂r(p1)/∂tx  ∂r(p1)/∂ty  ∂r(p1)/∂tz |
    ⌊    ...        ...        ...         ...         ...         ...    ⌋

As in 2D, we vary tiny distances `d` for all in [`a`,
`b`, `c`, `tx`, `ty`, `tz`].

`def gr3(rt_, d_, r, P, i, V, Ti, pti_=None):
    assert 2 == P.ndim
    assert 3 == P.shape[-1]
    assert rt_.shape == d_.shape
    rt0, rt1 = rt_ + d_/2, rt_ - d_/2
    dr0 = r(rt0, P, i, V, Ti, pti_)
    dr1 = r(rt1, P, i, V, Ti, pti_)
    return (dr0 - dr1) / np.sum(d_)
gr3(rt1, np.array([0, 0, 0, 1e-8, 0, 0]), r, P1, 3, V0, Ti)`

... and with a gradient of `+0.89` on `tx`, it seems that the vertex `P1[3]` on the rightmost triangle can benefit from going left (against the gradient, as usual). Now on to all partial derivatives:

`def grs3(rt_, d, r, P, i, V, Ti, pti_=None):
    I = np.identity(len(rt_)) * d
    L = [gr3(rt_, d_, r, P, i, V, Ti, pti_) for d_ in I]
    return np.array(L)
np.set_printoptions(precision=2, linewidth=100)
grs3(rt1, 1e-8, r, P1, 3, V0, Ti)`

... and with a gradient of (`-0.26`, `+0.03`, `+0.13`) for rotation and (`+0.89`, `0`, `+0.45`) for translation, it seems that for `P1[3]` rotations around the z- and x-axis are more meaningful than around the y-axis; and that shifts in x- and z-direction make more sense than shifting in y-direction. Both kinda make sense if you look at the most recent image above.

`def jac3(rt_, r, P, V, Ti, pti_=None):
    L = [grs3(rt_, 1e-8, r, P, i, V, Ti, pti_) for i in range(len(P))]
    return np.array(L)
J1 = jac3(rt1, r, P1, V0, Ti)`

        a      b      c      tx     ty     tz
    ⌈ +0.00  -0.29  +0.05  +0.00  +0.00  +1.00 ⌉
J = | +0.00  -0.91  +0.75  +0.00  +0.00  +1.00 |
    ⌊  ...    ...    ...    ...    ...    ...  ⌋

For the first two lines (representing the two bottom triangles), `a` being `0` means that both triangles are indifferent to rotations around the z-axis; `tx` and `ty` being `0` indicates a similar indifference to shifts in the xy-plane. So let us try Gauss-Newton again:

`def gn3(rt_, r, e, P, V, Ti, pti_=None, iter=[], d=1e-8):
    step_sizes = 2 ** np.arange(10, 0, -1) / 100 # [10.24 .. 0.02]
    t0 = np.copy(rt_); iter.append(t0)
    e0 = e(t0, P, V, Ti, pti_)
    improvement = 999. # can we improve rt?
    while improvement > 0.1:
        J = jac3(t0, r, P, V, Ti, pti_)
        JI = np.linalg.inv(J.T.dot(J)).dot(J.T)
        r_ = np.array([r(t0, P, i, V, Ti, pti_) for i in range(len(P))])
        g_ = JI.dot(r_)
        # best step size?
        t1 = t0 - step_sizes[0] * g_
        e1 = e(t1, P, V, Ti, pti_)
        for step in step_sizes[1:]:
            t2 = t0 - step * g_
            e2 = e(t2, P, V, Ti, pti_)
            if e2 > e1: # getting worse
                break
            else:
                t1, e1 = t2, e2
        # anyone better?
        improvement = e0 - e1
        if improvement > 0:
            t0, e0 = t1, e1
            iter.append(t0)
    return t0
pti = np.arange(0, len(P1)) # point-triangle indexer, 1:1
iter3 = [] # step-wise rt_
rt1 = np.zeros(6) # reset
rt2 = gn3(rt1, r, e, P1, V0, Ti, pti, iter3)
e2 = e(rt2, P1, V0, Ti) # improved energy (GN)`

After optimizing, `e2` is `0.003`, which is a lot better than the `0.98` from the beginning. Now let us see where the optimized rotation/translation parameters `rt2`, at [`0.01`, `-0.06`, `-0.13`, `-0.6`, `-0.37`, `-0.17`] with a big shift in **x**, took the pyramid:

`def proj3(rt_, V):
    R, t_ = rotmat(*rt_[:3]), rt_[3:]
    return R.dot(V.T).T + t_
def arrowproj3(iter, p_):
    P = np.array([proj3(rt_, p_) for rt_ in iter])
    Q = np.roll(P, -1, axis=0) - P
    ax.quiver(*P[:-1].T, *Q[:-1].T, color=['gray', 'green'],
              arrow_length_ratio=.1, alpha=.5)
def wireproj3(rt_, V0, V1, Ti, iter):
    V2 = proj3(rt_, V1)
    wirepyr(V0, V2, Ti)
    if iter is not None:
        arrowproj3(iter, V1[-1])
wireproj3(rt2, V0, V1, Ti, iter3)`

While somewhat close, it also shows that our sampling was obviously not optimal. A closer look:

`P2 = proj3(rt2, P1)
dist_pp_ti2 = [partner_triangle_distance(P2, i, V0, Ti) for i in range(len(P2))]
PP2 = np.array([a[1] for a in dist_pp_ti2])
ax.plot(*P2.T, 'rx')
[arrow3(P2[i], PP2[i]) for i in range(len(P2))]`

... and the point-to-plane arrows are barely visible, confirming that Gauss-Newton did it right (verification); but we did not do the right thing (validation). Maybe adding more samples can help: one on the ground plate to fix `z`, and two on the sides to fix `xy`.

`p_floor_ = V1[1] + 0.2 * (V1[3] - V1[1])
p_side1_ = V1[1] + 0.2 * ((V1[0] + V1[4])/2 - V1[1])
p_side2_ = V1[1] + 0.2 * ((V1[2] + V1[4])/2 - V1[1])
P3 = np.vstack([P1, p_floor_, p_side1_, p_side2_])
pti3 = np.hstack([pti, 0, 2, 3])`
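
The listing for the re-optimization itself is not shown; a minimal sketch of what it could look like, reusing `gn3` unchanged (the names `rt3` and `iter4` are mine):

`iter4 = [] # step-wise rt_
rt3 = gn3(np.zeros(6), r, e, P3, V0, Ti, pti3, iter4)
e3 = e(rt3, P3, V0, Ti, pti3) # improved energy with the extra samples
wireproj3(rt3, V0, V1, Ti, iter4)`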

And they are finally matched. Even though we did not normalize the energy (i.e. divide by the number of points), `e3` at `0.0007` is much better than the previous `0.0030`. And so we learned that choosing samples has quite an influence!

And that concludes our introduction to mathematical optimization. Though a bit lengthy, I hope you were able to get some intuition for the field, something that I found to be quite a challenge. Thanks for leaving at least two brain cells here, and have a fun day!

EOF (Mar:2021)