- @for Computer vision people
- @author Kai Ruhl
- @since 2021-02

Mathematical optimization is good for a lot of things, among them finding the best variables for a model, e.g. a rotation **R** and a translation **t** for two point clouds. The way there is a bit long, so we start with the 1D case, then go to 2D and finally 3D.

We illustrate everything with plots from Python, more specifically Jupyter QtConsole, which you can install with `sudo apt-get install python3-qtconsole jupyter-core` and then start with `jupyter qtconsole`, where we configure SVG output with `%config InlineBackend.figure_format = 'svg'`. You might also need to install numpy and matplotlib [doc] if not yet present.

A function like **x²** is 1D, and "optimization" means going to the minimum, but without knowing the function! We only know the value of the function at any place we ask.

`import numpy as np; import matplotlib.pyplot as plt
f = lambda x : x ** 2. # some "unknown" function
x = 1.5 # some place on that function
X = np.linspace(-2, +2) # a lot of places
plt.plot(x, f(x), 'ro', X, f(X), 'b-')`

So the blue function is only known very locally, e.g. at the red dot. Now we can try to walk to the minimum at (0, 0), which is luckily possible because the function is convex (in convex optimization, hammering a function until it becomes convex is a major goal). Our walk is called gradient descent, where we take an initial solution (above: `x = 1.5`) and go "against the slope", aka the 1st derivative.

`dx = 0.1 # tiny variation in x
Px = np.array([x - dx / 2, x + dx / 2]) # x-coords around x
Pfx = f(Px) # y-coords around f(x)
gx = (Pfx[1] - Pfx[0]) / (Px[1] - Px[0]) # gradient around (x, f(x))`

At this step, Python tells you that the slope `gx` is
around +3, so `x` going right means going up, and going
left means going down. We visualize the gradient/slope:

`def gradient(f, x, dx): return (f(x + dx / 2) - f(x - dx / 2)) / dx
def slope(f, x, gx, r=.5): sR = np.linspace(-r, +r); sX = sR + x; sY = sR * gx + f(x); return sX, sY
plt.plot(x, f(x), 'ro', X, f(X), 'b-', *slope(f, x, gx), 'r-')`

Ok, so time to update our "best solution" `x` with
something better, by subtracting some portion of the slope.

`s = 0.3 # choose a "good" step size
x = x - gx * s # go against the slope
plt.plot(x, f(x), 'ro', X, f(X), 'b-', *slope(f, x, gradient(f, x, dx)), 'r-')`

So what did we just do? We assumed that `f` can be approximated by a linear function (the slope), then just followed that slope in the reverse direction (down). This is called linearization. Finding a good step size is crucial and the subject of Levenberg-Marquardt and other algorithms. And formally, what we just did is compute **argmin_x f(x)**, meaning "find the argument `x` that minimizes `f(x)`".
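
Repeated, this single update is already the whole 1D gradient descent algorithm. A minimal sketch (not from the original listings), reusing the `gradient` helper, the step size `s` and the variation `dx` from above:

`x = 1.5 # restart from the initial solution
for i in range(20): # small fixed iteration budget
    gx = gradient(f, x, dx) # numerical slope at x
    if abs(gx) < 1e-6: # flat enough, we are (numerically) at the minimum
        break
    x = x - s * gx # step against the slope
print(x, f(x)) # both close to 0, i.e. near the minimum at (0, 0)`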

More: Calculus intuition (video series), Newton 2nd order optimization (12min video), explanatory Levenberg-Marquardt intuition.

Now we enter the world of linear algebra (3Blue1Brown video series).

- Vectors can be positions in space, but also seen as axes.
- A matrix can contain axes as column vectors; it then transforms `special2world`, i.e. a point in that space goes to "normal" space; also called "change of basis".
- Matrix multiplication is changing spaces twice, combined in one.
- Determinant of a matrix is a measure for expansion. Zero determinant means a dimension is collapsed.
- Eigenvectors are the rare vectors that stay on their line when transformed. Usually non-existent for our **R** and **t** (see the small numpy sketch after this list).
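
For intuition, here is a small numpy sketch (not from the original text) touching the last three points: a change-of-basis matrix built from two axes, its determinant, and its eigenvectors:

`A = np.array([[2, 1],
              [0, 1]]) # columns = the axes of a "special" space
p_special = np.array([1, 1]) # a point expressed in that space
p_world = A.dot(p_special) # special2world: [3, 1]
det = np.linalg.det(A) # 2.0; areas double, 0 would mean a collapsed dimension
evals, evecs = np.linalg.eig(A) # eigenvalues 2 and 1, eigenvectors as columns`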

So much for background, now we start with a 2D function **f(x, y) = x² + 5y²** and a point:

`import numpy as np; import matplotlib.pyplot as plt
f = lambda x,y: (x**2) + 5*(y**2)
x, y = +2.0, -1.5`
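
The figure setup and the `contour` helper used in the following listings are not shown; a minimal sketch of what they could look like (my assumption, matching the later calls `contour(ax, X, Y, f)` and the names `fig`, `ax`, `X`, `Y`):

`fig = plt.figure(); ax = fig.add_subplot(111)
X, Y = np.meshgrid(np.linspace(-3, +3, 100), np.linspace(-3, +3, 100)) # plot range, assumed
def contour(ax, X, Y, f): # assumed helper for the level-line plots below
    cs = ax.contour(X, Y, f(X, Y), levels=10)
    ax.clabel(cs, inline=True, fontsize=8)
contour(ax, X, Y, f); ax.plot(x, y, 'ro')`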

... and if for some reason the figure won't show, just type `fig` to redraw it.

This time, we can only determine partial derivatives:

`def partial_gradient(f, x, y, dx, dy): # dx OR dy = 0
    return (f(x+dx/2, y+dy/2) - f(x-dx/2, y-dy/2)) / (dx+dy)
d = 0.1 # tiny variation in x or y
gx = partial_gradient(f, x, y, d, 0) # vary x
gy = partial_gradient(f, x, y, 0, d) # vary y`

Python tells us that `gx` is +4 and thus right means higher; and `gy` is -15 and thus down means higher. Again, for minimization we should go in the opposite direction, so we show `-gx`, `-gy` and then the vector **[-gx, -gy]^T**, which indicates the direction to go.

`def arrow(x, y, dx, dy, text, color, zoom=.1):
    zx, zy = dx*zoom, dy*zoom # arrow end point offset
    if not (dx == 0 and dy == 0):
        ax.arrow(x, y, zx, zy, color=color, head_width=.1, \
                 length_includes_head=True)
    if text is not None:
        sx, sy = np.sign(dx), np.sign(dy) # [-1, 0, +1]
        tx, ty = sx * 5, sy * 5 # text offset
        ha = ['right', 'center', 'left'][int(sx+1)]
        va = ['top', 'center', 'bottom'][int(sy+1)]
        ax.annotate(text, (x+zx, y+zy), (tx, ty), ha=ha, va=va, \
                    textcoords='offset points', color=color, \
                    bbox=dict(fc='w', ec='w', alpha=0.9, pad=1.))
contour(ax, X, Y, f); ax.plot(x, y, 'ro')
arrow(x, y, -gx, 0, '-gx', 'green')
arrow(x, y, 0, -gy, '-gy', 'blue')
arrow(x, y, -gx, -gy, '-g', 'violet')`

Note that **-g** does not point directly at the minimum because the dimensions are not uniformly scaled (i.e. ellipse, not circle); this happens a lot with more natural parameters too. But still, **-g** points in the right direction, and again we go down with a "good" step size, this time in vector notation:

`step_size = .08 # advanced magic
x_ = np.array([x, y]) # vector x at t=0
g_ = np.array([gx, gy]) # gradient vector
x_ = x_ - step_size * g_ # vector x at t=1`

And compute the next step from here.

`def partial_gradients(f, x_, d):
    n = len(x_) # here: 2
    I = np.identity(n) * d # indexer
    return np.array([partial_gradient(f, *x_, *I[r]) for r in range(n)])
g_ = partial_gradients(f, x_, d)
contour(ax, X, Y, f); ax.plot(*x_, 'ro')
arrow(*x_, *-g_, '-g', 'violet')`

... and again, we get a gradient direction that is sort-of-right in solving **argmin_xy f(x, y)** for us, read as "find arguments `x` and `y` that minimize `f(x, y)`"; or in the vector version, **argmin_x_ f(x_)**.

So far, we had only one objective, i.e. functions **f: ℝⁿ → ℝ¹** where the target energy to minimize was exactly one value. In reality, we often have many data points that we wish to minimize against some model. Here, consider the contour plot from the last section lowered by `4`, so that we can try to bring the vertexes of some triangle to the `0` level.

`f = lambda x,y: (x**2) + 5*(y**2) - 4 # cuts z=0
P = np.array([[0, 0], [1, 1.2], [2, 0]]) # 3 triangle vertexes (red)
def point(x, y, text, color, marker='o'):
    ax.plot(x, y, marker=marker, color=color)
    if text is not None:
        ax.annotate(text, (x, y), (0, 5), ha='center', va='bottom', \
                    textcoords='offset points', color=color, \
                    bbox=dict(fc='w', ec='w', alpha=0.9, pad=1.))
def tri(P):
    ax.clear(); ax.axis('equal')
    contour(ax, X, Y, f)
    [point(*p_, "%.1f" % f(*p_), 'red') for p_ in P]
    ax.fill(*P.T, ec='r', ls='--', alpha=.5, fill=False)
tri(P)`

One way to go is a model **t = [tx, ty]**, i.e. a translation that shifts the triangle such that all three red vertexes (points) have a `z` value of `0` or near it. So our objective is **argmin_t e(t, P, f)** for a translation `t` on triangle points `P` evaluated with a distance function `f`. The energy to be minimized is **e(t, P, f) = ∑_{p∈P} f(p+t)²**, i.e. we go through all points in `P`, shift each by `t`, and sum up the squared distances:

`t_ = np.array([0, 0]) # initial solution
r = lambda t_, P, f: f(*(P + t_).T) # residual
e = lambda t_, P, f: np.sum(r(t_, P, f)**2) # energy
r0 = r(t_, P, f)
e0 = e(t_, P, f)`

Initially `e0` is roughly `33.6`, which is plausible: the red vertexes have left-to-right `z` values of `-4`, `+4.2` and `0` (see `r0`). So where to send `t` now? If the points `p` on the contour `f` themselves were the objective, we would compute the Jacobian matrix [ros] `J` (all partial derivatives) similar to before:

`J = np.array([partial_gradients(f, p_, d) for p_ in P]) # df/dp; wrong objective`

But we are on `t` (not on `p` anymore), minimizing the residual **r**, so it is not **∂f/∂p** (above) but rather **∂r/∂t** (below). This means we vary `tx` and `ty`, and see which effect this has on `r`, following Gauss-Newton with `n=2` (number of translation parameters) and `m=3` (number of observations = triangle vertexes); we expect `J` to be a 3x2 (`m×n`) matrix:

    ⌈ ∂r(p0)/∂tx   ∂r(p0)/∂ty ⌉
J = | ∂r(p1)/∂tx   ∂r(p1)/∂ty |
    ⌊ ∂r(p2)/∂tx   ∂r(p2)/∂ty ⌋

`gr = lambda t_, d_, P, r, f : (r(t_ + d_/2, P, f) - r(t_ - d_/2, P, f)) / np.sum(d_, axis=-1)
d = 0.1 # tiny variation in x or y
gr_tx = gr(t_, np.array([d, 0]), P, r, f) # derive tx
gr_ty = gr(t_, np.array([0, d]), P, r, f) # derive ty`

At this point, `gr_tx` (gradient of `r` in direction of `tx`) is [`0`, `2`, `4`], which is plausible because the three points **P0**, **P1**, **P2** sit at `x` = `0`, `1`, `2` and the slope of **f** in x-direction grows as **2x**.

`def grs(t_, d, P, r, f):
    n, m = len(t_), len(P) # n=2 params, m=3 points
    T2 = np.tile(t_, [n*m, 1]) # 6x2
    I2 = np.tile((np.identity(n) * d), [m, 1]) # 6x2
    P2 = P.repeat(n, axis=0) # 6x2
    return gr(T2, I2, P2, r, f).reshape(m, n) # Jacobian
J = grs(t_, d, P, r, f)`

    ⌈ 0  0 ⌉
J = | 2 12 |
    ⌊ 4  0 ⌋

With gradient descent (*GD*), we would now go:

`step_size = .15 # advanced magic
g_ = np.average(J, axis=0) # gradient vector
t_ = t_ - step_size * g_ # go down
e1 = e(t_, P, f) # improved energy (GD)
Pa = np.average(P, axis=0) # triangle center
tri(P + t_)
arrow(*Pa, *t_, 'gd', 'gray', 1.)`

... where `e1` is `7.8`, which is a lot better than the previous `33.6` energy. We averaged `J` over the points **P**, so each point had the same say in it; numerically equivalent to **J^T·[⅓ ⅓ ⅓]^T**. Alternatively, we can use the residual itself to weight the gradient:

`step_size = .01 # advanced magic
t_ = np.array([0, 0]) # reset
g_ = J.T.dot(r(t_, P, f)) # weighted gradient
t_ = t_ - step_size * g_ # go down
e1 = e(t_, P, f) # improved energy (GD²)
tri(P + t_)
arrow(*Pa, *t_, 'gd2', 'gray', 1.)`

... and `e1` is `8.8`, which is also much better than the previous `33.6` energy. The effective gradient `g_` seems to prioritize `ty` more, which is somewhat plausible because of the steeper rise in y-direction. There seems to be no official name for this method (here: *GD²*), though with the **r** factor coming from deriving **½[f(x)]²** [chain rule] instead of **f(x)** alone, we can at minimum say that least squares (instead of total variation) is involved in some capacity.
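
That least-squares reading is easy to check numerically (a quick sketch, not in the original): **J^T·r** should be half the gradient of the energy **e = ∑ r²**, obtained by varying `tx` and `ty` directly:

`t_ = np.array([0, 0]) # reset
e_txty = lambda tx, ty: e(np.array([tx, ty]), P, f) # energy as a function of tx, ty
ge_ = partial_gradients(e_txty, t_, 1e-6) # numerical gradient of the energy
print(J.T.dot(r(t_, P, f)), ge_ / 2) # both approximately [8.4, 50.4]`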

Gauss-Newton (*GN*) proposes another alternative step, namely **(J^T J)^-1 J^T·r(t_, P, f)**, where **(J^T J)^-1 J^T** is the pseudo-inverse of **J**:

`step_size = 1.6 # advanced magic
t_ = np.array([0, 0]) # reset
JI = np.linalg.inv(J.T.dot(J)).dot(J.T) # pseudo-inverse of J
g_ = JI.dot(r(t_, P, f)) # curved gradient
t_ = t_ - step_size * g_ # go down
e1 = e(t_, P, f) # improved energy (GN)
tri(P + t_)
arrow(*Pa, *t_, 'gn', 'gray', 1.)`

Under Gauss-Newton, `e1` is `9.3`, which is again a lot better than the previous `33.6` energy, and `ty` seems to be favored even more. Like gradient descent, the crux is of course the step size, but regardless of optimization method there is no way around it. The full algorithm can look like this:

`def gauss_newton(t_, P, e, r, f, iter=[]):
    step_sizes = 2 ** np.arange(10, 0, -1) / 100 # [10.24 .. 0.02]
    t0 = np.copy(t_); iter.append(t0)
    e0 = e(t0, P, f)
    improvement = 999. # can we improve t?
    while improvement > 0.1:
        J = grs(t0, d, P, r, f)
        JI = np.linalg.inv(J.T.dot(J)).dot(J.T)
        g_ = JI.dot(r(t0, P, f))
        # best step size?
        t1 = t0 - step_sizes[0] * g_
        e1 = e(t1, P, f)
        for step in step_sizes[1:]:
            t2 = t0 - step * g_
            e2 = e(t2, P, f)
            if e2 > e1: # getting worse
                break
            else:
                t1, e1 = t2, e2
        # anyone better?
        improvement = e0 - e1
        if improvement > 0:
            t0, e0 = t1, e1
            iter.append(t0)
    return t0
iter0 = []
t_ = np.array([0, 0]) # reset
t_ = gauss_newton(t_, P, e, r, f, iter0)
e1 = e(t_, P, f) # improved energy (GN)
tri(P + t_)
[arrow(*(Pa+iter0[i-1]), *(iter0[i]-iter0[i-1]), None, 'gray', 1.) for i in range(1, len(iter0))]`

With `e1` at `7.6`, the resulting energy is better than ever before, even better than what I would have expected manually:

`t3 = np.array([-1, -.6])
e3 = e(t3, P, f)
tri(P+t3)
arrow(*Pa, *t3, 'manual', 'gray', 1.)`

Here, `e3` is `7.7` which is indeed slightly
worse than the Gauss-Newton energy.

One last thing: We evaluated `r` numerically for partial
derivatives, which is the right thing for most problems; but
sometimes explicit derivation is also possible:

∂r/∂tx = ∂f(P+t)/∂tx
       = ∂[(Px+tx)² + 5(Py+ty)² - 4]/∂tx
       = ∂(Px² + 2·Px·tx + tx²)/∂tx
       = 2·Px + 2·tx

∂r/∂ty = ∂f(P+t)/∂ty
       = ∂[(Px+tx)² + 5(Py+ty)² - 4]/∂ty
       = ∂(5·Py² + 10·Py·ty + 5·ty²)/∂ty
       = 10·Py + 10·ty

... so in these friendly cases, evaluating `r` can at
least be simplified.
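
With these expressions the Jacobian can also be written in closed form and checked against the numerical one (a small sketch based on the derivation above):

`t_ = np.array([0, 0]) # reset
J_analytic = np.stack([2*(P[:, 0] + t_[0]), 10*(P[:, 1] + t_[1])], axis=1)
print(J_analytic) # [[0, 0], [2, 12], [4, 0]], matching the numerical J`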

More: Further improved variant Levenberg-Marquardt (*LM*), with **(J^T J + λI)^-1 J^T·r(t_, P, f)**, where the damping factor **λ** blends between Gauss-Newton (small **λ**) and plain gradient descent (large **λ**).
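
A minimal LM-style variant of the `gauss_newton` loop above could look like this (my sketch, using **λI** damping and a simple accept/reject rule; the names are not from the original text):

`def levenberg_marquardt(t_, P, e, r, f, lam=1e-2, max_iter=50):
    t0 = np.copy(t_); e0 = e(t0, P, f)
    for i in range(max_iter):
        J = grs(t0, d, P, r, f) # numerical Jacobian, as before
        A = J.T.dot(J) + lam * np.identity(len(t0)) # damped normal equations
        t1 = t0 - np.linalg.inv(A).dot(J.T).dot(r(t0, P, f))
        e1 = e(t1, P, f)
        if e1 < e0: # better: accept, trust the quadratic model more
            t0, e0, lam = t1, e1, lam / 2
        else: # worse: reject, damp harder (more like gradient descent)
            lam = lam * 10
        if lam > 1e6: # no acceptable step found anymore
            break
    return t0
t_lm = levenberg_marquardt(np.array([0., 0.]), P, e, r, f)
e_lm = e(t_lm, P, f) # should be comparable to the Gauss-Newton energy`

The damping keeps the normal equations well conditioned even when **J^T J** is (nearly) singular, which is the main selling point over plain Gauss-Newton.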

We now move away from continuous functions and concentrate fully on multiple objectives, namely two pyramids that we want to overlay in 3D; one is fixed and the other one is first rotated by a 3x3 matrix **R** and then translated by a 3D vector **t**. We use mplot3d (a submodule of matplotlib) for visualization, starting with the first pyramid:

`import numpy as np; import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d # for 3d plots
V0 = np.array([[-1, -1, 0], [+1, -1, 0], [+1, +1, 0],
               [-1, +1, 0], [0, 0, 2], [0, 0, 1]]) # vertexes
Ti = np.array([[0, 1, 3], [3, 1, 2], [0, 1, 4],
               [1, 2, 4], [2, 3, 4], [3, 0, 4]]) # triangles
fig = plt.figure(); fig.clear()
ax = fig.add_subplot(111, projection='3d')
def ax3lim(x, y, z, r):
    ax.set_xlim3d(x-r, x+r); ax.set_xlabel('x')
    ax.set_ylim3d(y-r, y+r); ax.set_ylabel('y')
    ax.set_zlim3d(z-r, z+r); ax.set_zlabel('z')
def wire(V, Ti, color, line='-', a=.5):
    Ti2 = Ti if 2 == Ti.ndim else np.expand_dims(Ti, axis=0)
    Li1 = np.roll(Ti2.repeat(2, axis=1), -1, axis=1) # vertex pairs per triangle
    Li2 = np.sort(Li1.reshape((-1, 2)), axis=1) # edges, with sorted indices
    Li3 = np.unique(Li2, axis=0) # unique edges
    [ax.plot(*V[li].T, c=color, alpha=a, ls=line) for li in Li3]
ax3lim(0, 0, 1, 1.5)
wire(V0, Ti, 'blue')`

... and create a second pyramid by rotating and translating the first; our optimization will then strive to recover the inverse of that **R** and **t** (or more formally, 6 parameters [**a**, **b**, **c**, **tx**, **ty**, **tz**], three Euler angles plus a translation):
    ⌈ cos a cos b   cos a sin b sin c - sin a cos c   cos a sin b cos c + sin a sin c ⌉
R = | sin a cos b   sin a sin b sin c + cos a cos c   sin a sin b cos c - cos a sin c |
    ⌊   -sin b                cos b sin c                       cos b cos c           ⌋

`def rotmat(a, b, c): # rz, ry, rx
    sa, sb, sc = np.sin([a, b, c])
    ca, cb, cc = np.cos([a, b, c])
    row0 = [ca*cb, ca*sb*sc - sa*cc, ca*sb*cc + sa*sc]
    row1 = [sa*cb, sa*sb*sc + ca*cc, sa*sb*cc - ca*sc]
    row2 = [-sb, cb*sc, cb*cc]
    return np.array([row0, row1, row2])
angles0 = np.array([5, 10, 15]) # rz, ry, rx
R0 = rotmat(*np.radians(angles0))
t0_ = np.array([0.6, 0.4, 0.2])
V1 = R0.dot(V0.T).T + t0_
wire(V1, Ti, 'red')`

When the two pyramids are not too far apart, this is the classical Iterative Closest Point (*ICP*) case in the field of point cloud registration. The 2020 variants mostly use "point-to-plane", i.e. some vertexes on the red pyramid are pulled to the closest triangle of the blue pyramid. Since the red vertexes are probably not the best choice to be pulled (they are all on corners, since we have no tessellation or subdivision whatsoever), we replace them with vertexes centered on the triangles.

`P1 = np.average(V1[Ti], axis=1)
ax.clear()
wire(V0, Ti, 'blue')
wire(V1, Ti, 'red')
ax.plot(*P1.T, 'rx')`

The red crosses need to be tested for distance to the corresponding triangle, using point-to-plane distance [wolfram]. Finding the correct or best "partner triangle" is a major goal in surface or point registration; here we cheat because we already know the point-to-triangle correspondence.

`def triangle_distance(p_, V, ti_):
    tri = V[ti_]; assert tri.shape == (3, 3) # 3 vertexes
    n = np.cross(tri[1] - tri[0], tri[2] - tri[0]) # plane normal
    n = n / np.linalg.norm(n, axis=0) # normalize
    plane_dist = n.dot(p_ - tri[0])
    pp_ = p_ - plane_dist * n
    return np.abs(plane_dist), pp_
triangle_distance(P1[0], V0, Ti[0]) # testing
def any_triangle_distance(p_, V, Ti): # unused
    dist_list = [triangle_distance(p_, V, Ti[i]) + (i,) for i in range(len(Ti))]
    min_i = np.argmin([a[0] for a in dist_list])
    return dist_list[min_i]
any_triangle_distance(P1[3], V0, Ti) # testing
def partner_triangle_distance(P1, P1_i, V, Ti, pti_=None):
    p_ = P1 if 1 == P1.ndim else P1[P1_i]
    ti = P1_i if pti_ is None else pti_[P1_i]
    return triangle_distance(p_, V, Ti[ti]) + (ti,)
dist, pp_, ti = partner_triangle_distance(P1, 0, V0, Ti)`

So point `P1[0]` = (`0.3`, `0.05`, `0.2`), in the middle of the red left floor triangle, is projected onto the nearest triangle `ti` = `0`, the blue left floor triangle. The intersection point `pp_` = (`0.3`, `0.05`, `0.0`) has point-to-plane distance `dist` = `0.17`.

`def arrow3(p0_, p1_, color='gray'):
    P0 = p0_ if 2 == p0_.ndim else np.expand_dims(p0_, 0)
    P1 = p1_ if 2 == p1_.ndim else np.expand_dims(p1_, 0)
    Q = P1 - P0
    ax.quiver(*P0.T, *Q.T, color=color, alpha=0.8)
    return P0, Q
def wirepyr(V0, V1, Ti):
    ax.clear()
    wire(V0, Ti, 'blue', ':')
    wire(V1, Ti, 'red', ':')
def wiretri(P1, P1_i, pp_, V0, V1, Ti, V0_i):
    wirepyr(V0, V1, Ti)
    ax.plot(*P1.T, 'rx')
    ax.plot(*P1[P1_i:P1_i+1].T, 'gx')
    arrow3(P1[P1_i], pp_)
    wire(V1, Ti[P1_i], 'green', (0, (5, 5)), a=0.8)
    wire(V0, Ti[V0_i], 'green', (0, (5, 5)), a=0.8)
wiretri(P1, 0, pp_, V0, V1, Ti, ti)`

We can see on the ground plate that the minuscule distance (down the z-axis) is correct, but it is not the most interesting triangle: that would be the right-most one, with point and triangle index `3`.

`dist, pp_, ti = partner_triangle_distance(P1, 3, V0, Ti)
wiretri(P1, 3, pp_, V0, V1, Ti, ti)`

With the point-to-triangle distances, we now write the (somewhat complex) residual **r** for one point, as well as the energy function **e** which sums up over all points.

`rt1 = np.zeros(6) # desired a, b, c, tx, ty, tz
def r(rt_, P, i, V, Ti, pti_=None):
    R, t_ = rotmat(*rt_[:3]), rt_[3:]
    q_ = R.dot(P[i]) + t_ # near blue
    dist_pp_ti = partner_triangle_distance(q_, i, V, Ti, pti_)
    return dist_pp_ti[0]
def e(rt_, P, V, Ti, pti_=None):
    residuals = np.array([r(rt_, P, i, V, Ti, pti_) for i in range(len(P))])
    return np.sum(residuals ** 2)
r1 = r(rt1, P1, 0, V0, Ti)
e1 = e(rt1, P1, V0, Ti)`

For the first point in `P1`, the distance to the triangle plane `r1` is `0.17` and the least squares energy `e1` (over all points in `P1`) is `0.98`, so we seem to be not too far away. Now comes the hardest part, determining the Jacobian `J`.

    ⌈ ∂r(p0)/∂a  ∂r(p0)/∂b  ∂r(p0)/∂c  ∂r(p0)/∂tx  ∂r(p0)/∂ty  ∂r(p0)/∂tz ⌉
J = | ∂r(p1)/∂a  ∂r(p1)/∂b  ∂r(p1)/∂c  ∂r(p1)/∂tx  ∂r(p1)/∂ty  ∂r(p1)/∂tz |
    ⌊    ...        ...        ...         ...         ...         ...    ⌋

As in 2D, we vary tiny distances `d` for all in [`a`,
`b`, `c`, `tx`, `ty`, `tz`].

`def gr3(rt_, d_, r, P, i, V, Ti, pti_=None):
    assert 2 == P.ndim
    assert 3 == P.shape[-1]
    assert rt_.shape == d_.shape
    rt0, rt1 = rt_ + d_/2, rt_ - d_/2
    dr0 = r(rt0, P, i, V, Ti, pti_)
    dr1 = r(rt1, P, i, V, Ti, pti_)
    return (dr0 - dr1) / np.sum(d_)
gr3(rt1, np.array([0, 0, 0, 1e-8, 0, 0]), r, P1, 3, V0, Ti)`

... and with a gradient of `+0.89` on `tx`, it seems that the vertex `P1[3]` on the rightmost triangle can benefit from going left (against the gradient, as usual). Now on to all partial derivatives:

`def grs3(rt_, d, r, P, i, V, Ti, pti_=None):
    I = np.identity(len(rt_)) * d
    L = [gr3(rt_, d_, r, P, i, V, Ti, pti_) for d_ in I]
    return np.array(L)
np.set_printoptions(precision=2, linewidth=100)
grs3(rt1, 1e-8, r, P1, 3, V0, Ti)`

... and with a gradient of (`-0.26`, `+0.03`, `+0.13`) for rotation and (`+0.89`, `0`, `+0.45`) for translation, it seems that for `P1[3]` rotations around the z- and x-axis are more meaningful than around the y-axis; and that shifts in x- and z-direction make more sense than shifting in y-direction. Both kinda make sense if you look at the most recent image above.

`def jac3(rt_, r, P, V, Ti, pti_=None):
    L = [grs3(rt_, 1e-8, r, P, i, V, Ti, pti_) for i in range(len(P))]
    return np.array(L)
J1 = jac3(rt1, r, P1, V0, Ti)`

        a      b      c      tx     ty     tz
    ⌈ +0.00  -0.29  +0.05  +0.00  +0.00  +1.00 ⌉
J = | +0.00  -0.91  +0.75  +0.00  +0.00  +1.00 |
    ⌊  ...    ...    ...    ...    ...    ...  ⌋

For the first two lines (representing the two bottom triangles), `a` being `0` means that both triangles are indifferent to rotations around the z-axis; `tx` and `ty` being `0` indicates a similar indifference to shifts in the xy-plane. So let us try Gauss-Newton again:

`def gn3(rt_, r, e, P, V, Ti, pti_=None, iter=[], d=1e-8):
    step_sizes = 2 ** np.arange(10, 0, -1) / 100 # [10.24 .. 0.02]
    t0 = np.copy(rt_); iter.append(t0)
    e0 = e(t0, P, V, Ti, pti_)
    improvement = 999. # can we improve rt?
    while improvement > 0.1:
        J = jac3(t0, r, P, V, Ti, pti_)
        JI = np.linalg.inv(J.T.dot(J)).dot(J.T)
        r_ = np.array([r(t0, P, i, V, Ti, pti_) for i in range(len(P))])
        g_ = JI.dot(r_)
        # best step size?
        t1 = t0 - step_sizes[0] * g_
        e1 = e(t1, P, V, Ti, pti_)
        for step in step_sizes[1:]:
            t2 = t0 - step * g_
            e2 = e(t2, P, V, Ti, pti_)
            if e2 > e1: # getting worse
                break
            else:
                t1, e1 = t2, e2
        # anyone better?
        improvement = e0 - e1
        if improvement > 0:
            t0, e0 = t1, e1
            iter.append(t0)
    return t0
pti = np.arange(0, len(P1)) # point-triangle indexer, 1:1
iter3 = [] # step-wise rt_
rt1 = np.zeros(6) # reset
rt2 = gn3(rt1, r, e, P1, V0, Ti, pti, iter3)
e2 = e(rt2, P1, V0, Ti) # improved energy (GN)`

After optimizing, `e2` is `0.003`, which is a lot better than the `0.98` from the beginning. Now let us see where the optimized rotation/translation parameters `rt2`, at [`0.01`, `-0.06`, `-0.13`, `-0.6`, `-0.37`, `-0.17`] with a big shift in **x**, took the pyramid:

`def proj3(rt_, V):
    R, t_ = rotmat(*rt_[:3]), rt_[3:]
    return R.dot(V.T).T + t_
def arrowproj3(iter, p_):
    P = np.array([proj3(rt_, p_) for rt_ in iter])
    Q = np.roll(P, -1, axis=0) - P
    ax.quiver(*P[:-1].T, *Q[:-1].T, color=['gray', 'green'],
              arrow_length_ratio=.1, alpha=.5)
def wireproj3(rt_, V0, V1, Ti, iter):
    V2 = proj3(rt_, V1)
    wirepyr(V0, V2, Ti)
    if iter is not None:
        arrowproj3(iter, V1[-1])
wireproj3(rt2, V0, V1, Ti, iter3)`

While somewhat close, it also shows that our sampling was obviously not optimal. A closer look:

`P2 = proj3(rt2, P1)
dist_pp_ti2 = [partner_triangle_distance(P2, i, V0, Ti) for i in range(len(P2))]
PP2 = np.array([a[1] for a in dist_pp_ti2])
ax.plot(*P2.T, 'rx')
[arrow3(P2[i], PP2[i]) for i in range(len(P2))]`

... and the point-to-plane arrows are barely visible, confirming that Gauss-Newton did it right (verification); but we did not do the right thing (validation). Maybe adding more samples can help: one on the ground plate to fix `z`, and two on the sides to fix `xy`.

`p_floor_ = V1[1] + 0.2 * (V1[3] - V1[1])
p_side1_ = V1[1] + 0.2 * ((V1[0] + V1[4])/2 - V1[1])
p_side2_ = V1[1] + 0.2 * ((V1[2] + V1[4])/2 - V1[1])
P3 = np.vstack([P1, p_floor_, p_side1_, p_side2_])
pti3 = np.hstack([pti, 0, 2, 3])`
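
The listing for the re-optimization itself is not shown; a minimal sketch of what it could look like, reusing `gn3` unchanged (the names `rt3` and `iter4` are mine):

`iter4 = [] # step-wise rt_
rt3 = gn3(np.zeros(6), r, e, P3, V0, Ti, pti3, iter4)
e3 = e(rt3, P3, V0, Ti, pti3) # improved energy with the extra samples
wireproj3(rt3, V0, V1, Ti, iter4)`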

And they are finally matched. Even though we did not normalize the energy (i.e. divide by the number of points), `e3` at `0.0007` is much better than the previous `0.0030`. And so we learned that choosing samples has quite an influence!

And that concludes our introduction to mathematical optimization. Though a bit lengthy, I hope you were able to get some intuition for the field, something that I found to be quite a challenge. Thanks for leaving at least two brain cells here, and have a fun day!

EOF (Mar:2021)