CartPole Maths

Our approach to the Cartpole is not black box. A Cartpole is a pretty simple system all things considered.

The first thing to do is derive the equations of motion. Originally I was using the Lagrangian for the system and deriving the equations of motion that way, which includes, for example, the back reaction of the pole on the acceleration of the cart.

But the motor complicates things. I have a tough time in general modeling motors. What physical thing does a command to our motor driver correspond to? Velocity? Power? Torque? I have guesses. The easiest thing to do is just build and measure. It turns out for us that the commands are basically velocity control.

So in our case the back reaction is basically irrelevant. We have direct control over the cart velocity. In that case, one can use some easy enough hand waving to get the equations of motion.

Let’s set the angle \theta = 0 at pole down with positive angle going counterclockwise. We can assume that gravity acts at the center of mass of the pole, which is at the midpoint. This gives a torque \tau = -mg \frac{L}{2} \sin(\theta). One way to see the negative sign is that a slight positive angle should give a negative torque returning it back to the down position. The moment of inertia of the pole is mL^2/3. You can look this up as a rod rotating about one of its ends or derive it from I =\int dm\, r^2. The \frac{1}{3} comes from the integration of the r^2. Putting these together we get mL^2/3 \ddot{\theta} = -mg  \frac{L}{2} \sin(\theta) .

Now we need to actually put in the cart stuff. The cart puts the pole in an accelerating frame, where basically you have a new component of gravity that points horizontally. This adds a torque -ma  \frac{L}{2} \cos(\theta) . As far as all of the signs go, honestly we just fiddled with them until it worked.

Now that we have all that in hand, we can talk about the Linear Quadratic Regulator (LQR) control.

https://en.wikipedia.org/wiki/Linear%E2%80%93quadratic_regulator

The model that LQR uses is that the equations of motion are linear and the cost function that you want to minimize is quadratic in the controls u and state x. This is plausibly tractable because quadratic and linear stuff is usually ok. I’m serious.

These letter choices for the various bits are pretty standard.

cost = \int (x^TQx + u^TRu) \, dt

\dot{x}=Ax+Bu

If you just look at the Wikipedia page, you can plug and chug your way to the solution.

u = -Kx

K= R^{-1} B^T P

A^TP + PA - PBR^{-1}B^TP + Q = 0 

Jesus that equation looks like shit.

which is QUITE CONVENIENTLY solved by the following scipy function https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.linalg.solve_continuous_are.html

HOW SUSPICIOUS.
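To make that concrete, here is the plug and chug on the smallest example I can think of, a double integrator \ddot{x}=u (not the cartpole yet; these matrices are the textbook ones):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)  # the ugly Riccati equation above
K = np.linalg.inv(R) @ B.T @ P        # K = R^{-1} B^T P
print(K)                              # feedback law: u = -K x
```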

 

Now given this shit, we need to approximate our nonlinear equations of motion with linear equations in order to use the LQR framework.

mL^2/3 \ddot{\theta} = -m \frac{L}{2}( g\sin(\theta) + a \cos(\theta))

Near the top, where we are linearizing, let

\delta\theta = \theta - \pi

\sin(\theta)\approx-\delta\theta

and

\cos(\theta)\approx-1

mL^2/3 \ddot{\delta\theta} \approx m \frac{L}{2}(g\delta\theta+a)

We can move some of those constants to the other side to get \ddot{\delta\theta} by itself.

\ddot{\delta\theta} \approx \frac{3}{2L}( g\delta\theta + a)

Another thing you have to do is massage these second-order equations into first-order form. You do this by introducing a new state variable \omega:

\dot{\delta\theta} = \omega

\dot{\omega} \approx \frac{3}{2L}( g\delta\theta + a)

In matrix form this is

\begin{bmatrix}\dot{\delta\theta} \\ \dot{\omega}  \end{bmatrix} =    \begin{bmatrix} 0 & 1 \\    \frac{3}{2L} g & 0    \end{bmatrix}    \begin{bmatrix}\delta\theta \\ \omega \end{bmatrix}    +    \begin{bmatrix} 0 \\ \frac{3}{2L} \end{bmatrix} \begin{bmatrix} a \end{bmatrix}  

In addition to this, it is nice to add the cart dynamics, even though they are pretty trivial. This is because we can then add some weighting terms to discourage the cart from trying to go off the track or go faster than the motors support. There are ways to make it so that the cartpole never tries to go out of bounds, but they are a bit more complicated. I’ve got some blog posts about them.

\begin{bmatrix}\dot{\delta\theta} \\ \dot{\omega} \\ \dot{x} \\ \dot{v}  \end{bmatrix} =    \begin{bmatrix} 0 & 1 & 0 & 0 \\    \frac{3}{2L} g & 0 & 0 & 0 \\    0 & 0 & 0 & 1\\    0 & 0 & 0 & 0 \\    \end{bmatrix}    \begin{bmatrix}\delta\theta \\ \omega \\ x \\ v \end{bmatrix}    +    \begin{bmatrix} 0 \\ \frac{3}{2L} \\ 0 \\ 1 \end{bmatrix} \begin{bmatrix} a \end{bmatrix}  

So we can read off our needed matrices A and B from here.

A =    \begin{bmatrix} 0 & 1 & 0 & 0 \\    \frac{3}{2L} g & 0 & 0 & 0 \\    0 & 0 & 0 & 1\\    0 & 0 & 0 & 0 \\    \end{bmatrix}

B = \begin{bmatrix} 0 \\ \frac{3}{2L} \\ 0 \\ 1 \end{bmatrix}

Now, in regards to the weighting matrices Q and R, it is a bit tougher to say what we want. We sort of want all the state variables to be small, but the relative importance isn’t a priori clear to me. So we picked diagonal matrices and tried out some different values. One thing to note though is that the state variables can have very different scales, since their units are different. The characteristic time of the system is T=\sqrt{\frac{L}{g}}, the characteristic length is the size of our track, 1000mm, and the rough angle scale is \pi/8-ish.

Now that we have our matrices we can plug it all into scipy and use it!

One thing to be careful about is that the pole can have swung around multiple times leading to the angle being some multiple of 2\pi. Our hack around this is to just take the \sin of the angle.
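Putting it all together, here’s roughly what the scipy side looks like. Take the numbers with a grain of salt: the pole length and the Q, R weights below are placeholder guesses, not our tuned values.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

g = 9.8
L = 0.5  # pole length in meters (placeholder guess)

A = np.array([[0.0,               1.0, 0.0, 0.0],
              [3.0 * g / (2 * L), 0.0, 0.0, 0.0],
              [0.0,               0.0, 0.0, 1.0],
              [0.0,               0.0, 0.0, 0.0]])
B = np.array([[0.0], [3.0 / (2 * L)], [0.0], [1.0]])

# Diagonal weights, roughly normalized by the characteristic scales above
# (angle ~ pi/8, track ~ 1 m). The actual values came from trying stuff out.
Q = np.diag([1.0 / (np.pi / 8) ** 2, 1.0, 1.0 / 0.5 ** 2, 1.0])
R = np.array([[0.1]])

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.inv(R) @ B.T @ P  # K = R^{-1} B^T P

def acceleration_command(theta, omega, x, v):
    dtheta = np.sin(theta - np.pi)  # the sin() hack wraps multiple swings
    return float(-K @ np.array([dtheta, omega, x, v]))
```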


Analytic Center in Python using Scipy and Numpy

The analytic center for a set of inequalities \phi_i(x)<0 is the position minimizing \sum_i -\ln (-\phi_i(x)). In particular it is often used with linear inequalities. For convex constraint functions it gives a reasonable and easily computable center of the region. The Hessian at that point can give you a reasonable ellipse that approximates the region too (both interior and exterior approximations).

I wrote a program for linear inequalities. It is not particularly robust. First I get a feasible point using the LP solver in scipy. Then I give the appropriate gradients and Hessians to a Newton conjugate gradient solver in scipy. It does return a reasonable center, but I had to fiddle with some epsilons to avoid logarithms exploding and to avoid the Hessian being so big it overwhelms the gradient. Possibly a couple of burn-in steps of gradient descent might help, or getting a feasible point that isn’t an LP optimum, since optimal points sit on the boundary, which is a huge problem for the logarithms. If the Newton solver comes back after only 1 or 2 iterations, it probably failed.
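Here’s a self-contained sketch of that recipe on a random instance. It’s a variant of what I described: I use a max-slack LP so the starting point is strictly interior instead of sitting on the boundary, and the epsilon in the log is one of those fiddle factors.

```python
import numpy as np
from scipy.optimize import linprog, minimize

np.random.seed(1)
m, n = 20, 2
A = np.random.randn(m, n)
b = np.random.rand(m) + 1.0  # region contains the origin

# Step 1: strictly feasible point via LP: maximize t s.t. A x + t <= b
c = np.zeros(n + 1)
c[-1] = -1.0
res = linprog(c, A_ub=np.hstack([A, np.ones((m, 1))]), b_ub=b,
              bounds=[(None, None)] * (n + 1))
x0 = res.x[:n]

# Step 2: Newton-CG on the log barrier f(x) = -sum log(b - A x)
def f(x):
    s = b - A @ x
    return -np.sum(np.log(np.maximum(s, 1e-12)))  # epsilon to tame the log

def grad(x):
    s = b - A @ x
    return A.T @ (1.0 / s)

def hess(x):
    s = b - A @ x
    return (A / s[:, None] ** 2).T @ A  # A^T diag(1/s^2) A

center = minimize(f, x0, jac=grad, hess=hess, method='Newton-CG')
print(center.x, center.nit)  # suspiciously few iterations means it failed
```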

 

Noise and The Fluctuation Dissipation Theorem

I was looking at some slides the other day and they quoted noise power in units of \frac{W}{\sqrt{Hz}}. Being the ignoramus I am, I was wondering why it was scaled that way.

First off, when a Watt is quoted in an electrical measurement, usually you’re measuring voltage with an instrument that has a known input impedance Z. That’s how you convert your fluctuating voltage measurement to Watts.

Second, the sqrt-frequency thing? Nowadays, your measurement apparatus is probably a digital sampler that performs an FFT, giving you a spectrum. The width of your FFT is roughly the sampling frequency. Does it make sense that when you increase the width of your taken spectrum, the height of the noise signal changes too? It does, but only because implicitly most sampling circuits take an average of the signal over the same period as the sampling time. These two times are not necessarily intrinsically linked. One could have a system that takes a very fast snapshot but can only save data or send it over a link at a much slower speed. The noise power is set by this snapshot time, not the data saving time. The data saving time would set the bandwidth of the FFT.

These two are engineered to be the same to avoid distortion of the frequency signal via aliasing.

But there is an even simpler way to see this. Suppose you have two measurements V1 and V2 that are averages over a time T with standard deviation \sigma. Then the average of these two, V3, is an average over a time 2T. However, by the standard kind of manipulations (for Gaussian variables the variance of a sum is the sum of the variances, \sigma^2_{\sum x_i}=\sum \sigma^2_{x_i}), the standard deviation of the new signal is \sigma/\sqrt{2}, which means it scales as one over the square root of the time window. Hence multiplying your actual measured fluctuations by the square root of your time window gives you a time-window-invariant quantity.
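You can check this numerically in a couple of lines with white Gaussian noise:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)  # unit-variance white noise

for T in [1, 2, 4, 8, 16]:
    # average in non-overlapping blocks of length T (the time window)
    blocks = x[: len(x) // T * T].reshape(-1, T).mean(axis=1)
    print(T, blocks.std() * np.sqrt(T))  # roughly 1.0 for every T
```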

 

While I was thinking about that in the car I realized that the fluctuation dissipation theorem is a mean field theory kind of thing. The fluctuation dissipation theorem feels weird and spooky, but I guess it is ultimately simple (or not).

Mean field theory tries to summarize all the complicated interactions with neighbors with a simple summary. For interacting spins, it tries to summarize as an effective B field from the surrounding spins. Then you have a 1-particle model which you can solve and try to find a self-consistent value of B. Here is a sketch in equations.

H= -J\sum S_i\cdot S_j - B_{ext}\cdot \sum S_i \rightarrow -\sum B_{eff}\cdot S_i

Z=\sum_s e^{-\beta H}

M = <S> = \partial_{\beta B} \ln(Z)

B = \alpha M

You can do something similar to find an effective susceptibility \chi = \partial_B M due to your surrounding neighbors.
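For spin \pm 1 the one-site sums can actually be done: Z = 2\cosh(\beta B_{eff}) and M = \tanh(\beta B_{eff}), so the self-consistency condition B_{eff} = B_{ext} + \alpha M can be solved by fixed-point iteration. A tiny sketch with made-up numbers:

```python
import numpy as np

beta, alpha, B_ext = 1.5, 1.0, 0.0  # beta*alpha > 1: ordered phase
M = 0.1  # small symmetry-breaking seed
for _ in range(200):
    M = np.tanh(beta * (B_ext + alpha * M))  # M = <S> at the effective field
print(M)  # nonzero spontaneous magnetization
```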

The fluctuating force due to your neighbors is like B, a constant forcing term.

The damping is like the susceptibility. One may want to consider a system that starts with an intrinsic damping. That is one difference between the magnetic case and the fluctuation case: free space has a natural permeability but not a natural damping (I suppose there is always some damping, due to radiation and whatnot, but we have a tendency to totally neglect such things). One could imagine ball bearings being shaken in a cup of molasses or something. You might take the fluctuation to be due to being hit by other ball bearings, but consider the damping from the molasses to be the dominating damping term (and the thermal fluctuations from the molasses to be ignorable).

Another difference is that I think you really are going to need to work explicitly with time. Just the thermal average isn’t going to cut it, I think (at least not conceptually; there might be some dirty tricks you can play, but a typical Hamiltonian can’t have damping terms. As I write this I am doubting its truth).

\ddot{x} = -\nu \dot{x}+ f

calculate some averages … Then use the self-consistency

B = \alpha M \rightarrow f = f(\hat{x})

The dissipation will be related to your correlation with your neighbors. When you are moving faster, they have to tend to move in such a way as to make you slow down on average.

To Be Continued

Reverse Mode Auto Differentiation is Kind of Like a Lens

Warning: I’m using sketchy uncompiled Haskell pseudocode.

Auto-differentiation is writing a function that also computes the derivative alongside calculating its value. Function composition is done alongside applying the chain rule to the derivative part.

One way to do this is to use a “dual number”. Functions now take a tuple of values and derivatives.

The Jacobian of a function from R^n \rightarrow R^m is an m by n matrix. The chain rule basically says that you need to compose the matrices via multiplication when you compose the value functions. This is the composition of the linear maps.

Conceptually, you initialize the process with an NxN identity matrix corresponding to the fact that \partial x_i/\partial x_j=\delta_{ij}.

Vectorized versions of scalar functions (maps) have diagonal Jacobians, so they will often use diag.

A couple points:

  1.  Since the Jacobian j is always going to be multiplied in composition, it makes sense to factor this out into a Monad structure (Applicative maybe? Not sure we need full Monad power).
  2. There is an alternative to using explicit Matrix data types for linear maps. We could instead represent the Jacobians using functions of type (Vector Double) -> Vector Double. The downside of this is that you can’t inspect elements. You need explicit matrices, as far as I know, to do Gaussian elimination and QR decomposition. You can sample the function to reconstitute the matrix if need be, but this is somewhat roundabout. On the other hand, if your only objective is to multiply matrices, one can use very efficient versions. Instead of an explicit dense NxN identity matrix, you can use the function id :: a -> a, which only does some minimal pointer manipulation or is optimized away. I think that since we are largely multiplying Jacobians, this is fine.

 

What we’ve shown so far is Forward Mode.

When you multiply matrices you are free to associate them in any direction you like. (D(C(BA))) is the association we’re using right now. But you are free to left-associate them: (((DC)B)A). You can write this in right-associated form using the transpose: (((DC)B)A)^T = A^T(B^T(C^TD^T)).

This form is reverse mode auto differentiation. Its advantage is in the number of computations you have to do and the intermediate values you have to hold. If one is going from many variables to a small result, reverse mode is preferable.

It is actually exactly the same in implementation except you reverse the order of composition of the derivatives. We forward compose value functions and reverse compose derivative functions (matrices).


We have CPSed our derivative matrices.
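Here’s the same shape in Python, just to have something concrete and runnable (the names are mine): a reverse-mode function maps x to a pair of the value and a function that pulls output perturbations back to input perturbations.

```python
import numpy as np

def rd_sin(x):
    return np.sin(x), lambda dy: dy * np.cos(x)

def rd_square(x):
    return x * x, lambda dy: dy * 2 * x

def compose(f, g):
    # g after f: values compose forwards, derivatives compose in reverse
    def h(x):
        y, df = f(x)
        z, dg = g(y)
        return z, lambda dz: df(dg(dz))
    return h

h = compose(rd_sin, rd_square)  # h(x) = sin(x)^2
value, dh = h(1.0)
print(value, dh(1.0))  # dh(1.0) == 2*sin(1)*cos(1)
```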

Really, a better-typed version would not unify all the objects into a single type a. While we’ve chosen to use Vector Double as our type, if we could tell the difference between R^n and R^m at the type level, the following would make more sense.

However, this will no longer be a Monad. Instead you’ll have to specify a Category instance. The way I got down to this stuff is via reading Conal Elliott’s new automatic differentiation paper, which heavily uses the category interface. I was trying to remove the need to use constrained categories (it is possible, but I was bogged down in type errors) and make it mesh nicely with hmatrix. Let me also mention that the Arrow-style operators *** and dup and &&& and fst, and the clever currying that he mentions, also seem quite nice here. The tuple structure is nice for expressing direct sum spaces in matrices: (Vector a, Vector b) is the direct sum of those vector spaces.

Anyway, the arrows for RD pair the forward value computation with the reverse derivative, something of the shape a -> (b, b -> a).

This is a form I’ve seen before though. It is a lens. Lens’ have a getter (a -> b) that extracts b from a and a setter (a -> b -> a) that given an a and a new b returns the replaced a.

Is an automatic derivative function in some sense extracting an implicit calculable value from the original vector and returning, in a sense, how changes in that value map back to changes in the original vector? It is unclear whether one should take the lens analogy far or not.

The type of Lens’  (forall f. Functor f => (b -> f b) -> a -> f a) means that it is isomorphic to a type like DFun’. The type itself does not imply the lens laws of setters and getters, and these functions are definitely not proper lawful lenses. It is just curious that conceptually they are not that far off.

The lens trick of replacing this function with a quantified type (forall f.) or the quantified (forall p.) profunctor trick seems applicable here. We can then compose reverse-mode functions using the ordinary (.) operator and abuse convenience functions from the lens library.

Neat if true.


CartPole WORKIN’ BOYEEE

We had been fighting a problem for weeks: the serial port was just not reliable, with sporadic failures. The cause ended up being a surprising thing. We were using threading to receive the messages and check for limit switches. It is not entirely clear why, but this was totally screwing up the serial port updates in an unpredictable manner. Yikes. What a disaster.

After that though smoooooooth sailing.

With a slight adaptation of the previous OpenAI Gym LQR cartpole code and a little fiddling with parameters, we have a VERY stable balancer. We removed the back reaction of the pole dynamics on the cart itself for simplicity. This should be accurate when the pole is vastly lighter than the cart.

We did find that the motor is basically exact velocity control in steady state, with a linear response. There is a zero-point offset (you need to ask for 100 out of 2046 before you get any movement at all).
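So the command map we use is something like the following sketch. The dead-zone numbers are the ones just quoted; the scale factor is a pure placeholder.

```python
import numpy as np

OFFSET, MAX_CMD = 100, 2046   # quoted above
SCALE = 500.0                 # placeholder gain, command units per (m/s)

def motor_command(v_desired):
    if v_desired == 0:
        return 0
    cmd = OFFSET + SCALE * abs(v_desired)  # jump over the dead zone
    return int(np.sign(v_desired) * min(cmd, MAX_CMD))
```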

We’ll see where we can get with the Lyapunov control next time.

 

https://github.com/philzook58/cart_pole/commit/ca4aabb86eca54c5a9c7884ad25fafdec4d545ce

https://github.com/philzook58/cart_pole/blob/master/lqr_controller.py


Some random links on uses of Convex relaxations, in particular Semidefinite programming

 

Problems involving outer products of vector variables can be relaxed into semidefinite programs. That’s a general trick: replace x x^T with a PSD matrix variable X. The low-rank part from an SVD of the solution X then gives an approximate solution for the vector.

 

https://en.wikipedia.org/wiki/Matrix_completion

 

convex relaxation for distributed optimal control

http://ieeexplore.ieee.org/document/7464306/


graph matching in relation to Image correspondence

Permutation matrices have the constraint that each row and column must sum to 1; relaxing to such doubly stochastic matrices is one relaxation.

QuickMatch. Actually not convex programming, but it was the root of the chain of references I’m digging through.

http://openaccess.thecvf.com/content_ICCV_2017/papers/Tron_Fast_Multi-Image_Matching_ICCV_2017_paper.pdf

https://dl.acm.org/citation.cfm?id=2600314

matchALS

www.cis.upenn.edu/~kostas/mypub.dir/xiaowei15iccv.pdf

https://arxiv.org/abs/1402.1473

https://vision.in.tum.de/research/convex_relaxation_methods

 

Finding a MaxCut approximation of a graph is a classic one.
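Here’s the Goemans-Williamson style relaxation in cvxpy on a random weighted graph, following the outer product trick above: X stands in for x x^T of the \pm 1 cut variables, and a random hyperplane rounds the low-rank factor back to a cut.

```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
n = 8
W = np.random.rand(n, n)
W = (W + W.T) / 2
np.fill_diagonal(W, 0)

X = cp.Variable((n, n), symmetric=True)  # relaxes x x^T
constraints = [X >> 0, cp.diag(X) == 1]
cut_value = cp.sum(cp.multiply(W, 1 - X)) / 4
prob = cp.Problem(cp.Maximize(cut_value), constraints)
prob.solve()

# round: factor X = V V^T, then cut with a random hyperplane
V = np.linalg.cholesky(X.value + 1e-6 * np.eye(n))
x = np.sign(V @ np.random.randn(n))
print(prob.value, x)
```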


Quantum Semidefinite programming course

Density matrices have a semidefinite constraint (nonnegative probabilities)

https://cs.uwaterloo.ca/~watrous/CS867.Winter2017/


Sum of Squares is a semidefinite program that can guarantee that Lyapunov functions actually work


Cart Pole Trajectory optimization using Cvxpy

I’ve been watching Russ Tedrake’s Underactuated Robotics and trying some stuff out. The Drake package probably does this stuff better. Also, IPOPT and SNOPT are the libraries that get mentioned when people want to do this stuff for real.

It’s too slow to be practical. That is mostly the fault of the cvxpy overhead. The ECOS solver itself reports that it solves each iteration in about 0.02 seconds on my MacBook.

The idea is to use linearized dynamics as constraints, then iteratively ask cvxpy to solve for us until, hopefully, it converges. This is Sequential Convex Programming: https://web.stanford.edu/class/ee364b/lectures/seq_slides.pdf

I discretized the equations of motion using the method described here: https://epubs.siam.org/doi/pdf/10.1137/16M1062569

It is possible I did it right.

Hermite collocation for the trajectory and trapezoidal quadrature for the control.
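A stripped-down version of the loop, with plain Euler integration instead of the Hermite/trapezoidal scheme and placeholder parameters, looks like this:

```python
import cvxpy as cp
import numpy as np

N, dt, L, g, a_max = 60, 0.05, 0.5, 9.8, 10.0
prev = np.linspace(0, np.pi, N)  # initial guess for the angle trajectory

for it in range(15):
    th, om, a = cp.Variable(N), cp.Variable(N), cp.Variable(N - 1)
    cons = [th[0] == 0, om[0] == 0, cp.abs(a) <= a_max]
    for k in range(N - 1):
        t0 = prev[k]
        # linearize thetadd = -(3/(2L))(g sin(th) + a cos(th)) about t0
        omdot = -(3 / (2 * L)) * (
            g * (np.sin(t0) + np.cos(t0) * (th[k] - t0)) + a[k] * np.cos(t0))
        cons += [th[k + 1] == th[k] + dt * om[k],
                 om[k + 1] == om[k] + dt * omdot]
    obj = cp.Minimize(cp.sum_squares(a)
                      + 300 * (cp.square(th[-1] - np.pi) + cp.square(om[-1]))
                      + 30 * cp.sum_squares(th - prev))  # soft trust region
    cp.Problem(obj, cons).solve()
    prev = th.value  # re-linearize about the new trajectory

print(prev[-1])  # hopefully close to pi
```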

If we used just an absolute value cost, this all would be a linear program. Something to consider. Also CVXOPT would probably be faster.

It hurts me that the constraint and cost matrices are banded, so they could be solved really quickly, but the next level up in programming complexity seems to take a lot more work.

 

This is the result

Green is cart acceleration. You can see it slamming into the maximum acceleration constraint.

Blue is pole angle and orange is angular velocity. So it does a pretty good job. For some settings it is just awful though.


Making a Van de Graaff Generator

Will has been juicin’ to make a Van de Graaff generator. As with the Stirling engines, we followed YouTube instructions to make it out of trash (soda bottles and soda cans and tape) and failed miserably.

We were mostly following this video https://www.youtube.com/watch?v=vfvjGeOGoBU

Yesterday, we took the gloves off and went to Home Depot. We used a trash fan motor, which is awesome, and 3D printed some pieces for mounting the motor and for mounting the roller on the shaft with a press fit.

https://cad.onshape.com/documents/5016468595aba2e0189feaa4/w/5325496b7d68ecef269a238a/e/3cc572cbcee673b79b85caba 

Big ole 4″ PVC is the stalk, with a flange to mount it to the base, which is a nice circle of wood from the nice wood section. We used two IKEA bowls as the globe. I think it is important to have the upper roller somewhat deeply inside the globe?

We’re using pantyhose as a belt, sort of the waist area stretched out to be a loop. It is just rubbing on the fixed top PVC roller. Maybe wrapping that in teflon tape would make it even better? The pantyhose is pure nylon. Nylon and PVC are towards opposite ends of the triboelectric series, which is important for the thing to work, but teflon is even farther down.

We used some brushes from Home Depot as electrical brushes. Might be overkill.

We wrapped the lower roller in more pantyhose to make a bulge. Counterintuitively, this bulge is supposed to keep the belt centered?

We were getting pretty wimpy sparks until we installed the lower brush and properly grounded it. I guess that is really important. Now they are pretty beefy, maybe a couple centimeters long. You can definitely see them and they sting a bit.

All in all a success!


 

LQR with CVXPY

Holy crap this was easy.

And I was able to easily add a constraint on the available force. Hot damn. It is a ridiculously tiny problem I guess, but still pretty damn cool. 0.002 second runtime.
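The shape of the thing, as a sketch with a made-up discrete-time double integrator rather than our cartpole matrices (the cp.abs line is the force constraint):

```python
import cvxpy as cp
import numpy as np

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])  # discretized double integrator
B = np.array([[0.0],
              [0.1]])
N = 50

x = cp.Variable((2, N + 1))
u = cp.Variable((1, N))
cost = 0
cons = [x[:, 0] == np.array([1.0, 0.0])]
for k in range(N):
    cost += cp.sum_squares(x[:, k]) + 0.1 * cp.sum_squares(u[:, k])
    cons += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
             cp.abs(u[:, k]) <= 2.0]  # bounded available force

prob = cp.Problem(cp.Minimize(cost), cons)
prob.solve()
print(prob.value)
```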

 

STM32F411 Discovery Board Getting started

Bought one of these discovery boards for $15 from Digi-Key. I like the built-in stuff, like the 9-axis MEMS device. I don’t particularly enjoy wiring up little boards, and doing so makes every project so ephemeral.

http://www.st.com/content/st_com/en/products/evaluation-tools/product-evaluation-tools/mcu-eval-tools/stm32-mcu-eval-tools/stm32-mcu-discovery-kits/32f411ediscovery.html

I am concerned that I should have gotten the older board.

The user manual has the pin connections:

http://www.st.com/content/ccc/resource/technical/document/user_manual/e9/d2/00/5e/15/46/44/0e/DM00148985.pdf/files/DM00148985.pdf/jcr:content/translations/en.DM00148985.pdf

It’s in platform.io as

disco_f411ve

Huh, it only supports STM32Cube? I kind of wanted to try libopencm3.

start a project with

platformio init --board disco_f411ve

platformio run

The examples are invaluable

https://github.com/platformio/platform-ststm32/tree/develop/examples

OK. There is a blink example for the Hardware Abstraction Layer (HAL) and the Low Level (LL) API.

Hmm. Neither blink example works off the bat: SYS_CLOCK is undefined for the Low Level one, and the board is not supported in the HAL one.


Alright let’s try libopencm3

https://github.com/libopencm3/libopencm3-examples

make in top directory first

follow directions to init submodule

Alright, I guess I need to download the ARM toolchain:

https://developer.arm.com/open-source/gnu-toolchain/gnu-rm/downloads

I put them on my path

export PATH=~/Downloads/gcc-arm-none-eabi-7-2017-q4-major/bin:$PATH

Then ran make

also need to apt install gawk

edited the Makefile of miniblink

https://github.com/libopencm3/libopencm3/blob/master/ld/devices.data

stm32f411re stm32f4 ROM=512K RAM=128K

that’s the right config even though the ‘re’ is wrong (our chip is the ‘ve’ variant)

Okay, I was getting a weird error.

sudo make flash V=1

the V=1 gives verbose

I may have had to apt install openocd?

Need to run under sudo. Go fig.

Alright got some blinking! WOOO

Ah I see. PD12 is PortD12. Makes sense.