Proving some Inductive Facts about Lists using Z3 python

Z3 is an SMT solver which has a good rep. Here are some excellent tutorials.

SMT stands for satisfiability modulo theories. The exact nature of power of these kinds of solvers has been and is still hazy to me. I have known for a long time that they can slam sudoku or picross or other puzzles, but what about more infinite or logic looking things? I think I may always be hazy, as one can learn and discover more and more encoding tricks to get problems and proofs that you previously thought weren’t solvable into the system. It’s very similar to learning how to encode to linear programming solvers in that respect.

SMT solvers combine a general smart search procedure with a ton of specialized solvers for particular domains, like linear algebra, polynomials, linear inequalities and more.

The search procedure goes by the name DPLL(T). It is an adjustment of the procedure of SAT solvers, which are very powerful and fast. SAT solvers find an assignment of boolean variables in order to make a complicated boolean expression true, or to eventually find that it can never be made true. SAT solvers work on the principle of guessing and deducing. If a OR b needs to be true and we already know a is false, we can deduce b must be true. When the deduction process stalls, we just guess and then backtrack if it doesn’t work out. This is the same process you use manually in solving Sudoku.

The modern era of SAT solvers came when a couple new tricks vastly extended their power. In particular Conflict Driven Clause Learning (CDCL), where when the solver finds itself in a dead end, it figures out the choices it made that put it in the dead end and adds a clause to its formula so that it will never make those choices again.

SMT works by now having the boolean variables of the SAT solver contain inner structure, like the boolean p actually represents the fact x + y < 5. During the search process it can take the pile of booleans that have been set to true and ask a solver (maybe a linear programming solver in this case) whether those facts can all be made true in the underlying theory. This is an extra spice on top of the SAT search.

Something that wasn’t apparent to me at first is how important the theory of uninterpreted formulas is to SMT solvers. It really is their bread and butter. This theory is basically solved by unification, which is the fairly intuitive process of finding assignments to variables to make a set of equations true. If I ask how to make fred(x,4) = fred(7,y), obviously the answer is y=4, x=7. That is unification. Unification is a completely syntax driven way to deduce facts. It starts to give you something quite similar to first order logic.

I was also under the impression that quantifiers \forall, \exists were available but heavily frowned upon. I don’t think this is quite true. I think they are sort of a part of the entire point of the SMT solver now, although yes, they are rather flaky. There are a number of methods to deal with the quantifier, but one common one is to look for a pattern or parts of the subformula, and instantiate a new set of free variables for all of the quantified ones and add the theorem every time the patterns match. This is called E-matching.

Here are a couple tutorials on proving inductive facts in SMT solvers. You kind of have to hold their hand a bit.

SMT solvers queries usually have the flavor of finding something, in this case a counterexample. The idea is that you try to ask for the first counterexample where induction failed. Assuming that proposition P was true for (n-1), find n such that P is not true. If you can’t find it, then the induction has gone through.

And here is some code where I believe I’m showing that some simple list functions like reverse, append, and length have some simple properties like \forall t. rev (rev(t)) == t .

A Short Skinny on Relations & the Algebra of Programming

I’ve been reading about the Algebra of Programming lately and lovin’ it. See J.N. Oliveira’s draft text in particular and the links in the references. I’ve started exploring the stuff from this post and more over here:

Why and What?

Relations can expand the power of functional programming for the purpose of specification.

The point of a specification is to be able to write down in a very compact and clear way your intent for a program, more clearly and compactly than a full implementation could be written. It therefore makes sense to add to your specification language constructs that are not necessarily executable or efficient for the sake of compactness and clarity. When one needs executability or efficiency, one writes an implementation whose behavior you can connect to the spec via a formal or informal process.

Functional programming, with it’s focus on the abstraction of the mathematical function, is a good trade-off between executability, efficiency, and expressibility. It lies in a reasonable location between the ideas amenable to reasoning by a human mind and the command-driven requirements of the machines.

Functions are a specialization of relations. Relations extend the mathematical notion of functions with constructs like nondeterministic choice, failure and converse. These constructs are not always obviously executable or efficient. However, they greatly extend the abilities of reasoning and the clarity of expression of a specification.

The point-free style of reasoning about functions extends to a point-free style reasoning about relations, which is known as relation algebra. There are rich analogies with databases, category theory, linear algebra, and other topics.

Plus, I think it is very neato for some reason. If anyone ever thinks something is really neato, isn’t it worth giving it a listen?

A Simple Representation of Relations in Haskell

The simplest description of relations is as a set of tuples. So first let’s talk a bit about the options for sets in Haskell.

Sets in Haskell

There are a couple different reasonable ways to represent sets in Haskell.

  • [a] or Vector a
  • a -> Bool
  • Set a — a tree based Set from the containers package.

These have different performance characteristics and different power. The list [a] is very simple and has specialized pleasant syntax available. The indicator function a -> Bool gives you no ability to produce values of type a, but can easily denote very sophisticated spaces. Set a is a good general purpose data structure with fast lookup. You might also choose to mix and match combinations of these. Interconversion is often possible, but expensive. This is not a complete list of possibilities for sets, for example you may want a representation with a stronger possibility for search.

Relations in Haskell

We can directly use the definition of relations as a set of tuples with the above

But we also have the option to “curry” our relation representations, sort of mixing and matching properties of these representations.

You might also choose to package up multiples of these representations, choosing the most appropriate as the situation requires, see for example the relation package, whose type holds both Map a (Set b) and Map b (Set a).

Despite fiendishly poor performance, for simplicity and list comprehension syntax we are going to be using type Rel a b = [(a,b)] for the remainder of the post.

I’m also taking the restriction that we’re working in bounded enumerable spaces for ease. I assume such a requirement can be lifted for many purposes, but finite spaces like these are especially well tamed. The following typeclass and definition is very useful in this case.

Functions and Relations

Functions can be thought of as relations with the special property that for each left part of the tuple, there is exactly one right side and every possible left side appears. The relation corresponding to a function f looks like F = \{(x,y) | x \in X, y \in Y, y = f (x)\}.

There is a natural and slightly clever lifting of function composition to relations. We now check whether there exists a value that is in the right side of one and the left side of the other.

Because of these two operations (and their properties of associativity and absorption), FinRel is a category. We do however need the Eq b restriction to make Rel an instance of the category typeclass, so it does not quite fit the definition of category in base. It is a constrained category.

We can lift the common arrow/categorical combinators up to relations for example.

With these combinators, you have access to many functions on basic non-recursive algebraic data types. By combining them in a point free style, you can build some other useful combinators.

An Aside: Relations, Linear Algebra, Databases

The composition operation described above is not so unfamiliar as it may first appear.

Relation algebra has a great similarity to linear algebra. This connection can be made more clear by considering sparsity patterns of matrices and tensors. Sparsity patterns are a useful abstraction of linear algebraic operations. Instead of considering matrices of numbers, instead the entries are “zero” and “possibly nonzero” or, if you prefer, a matrix of boolean values corresponding to those same questions.

The ordinary row times column matrix multiplication corresponds to relation composition. Replace * with AND and + with OR. If any of the numbers is zero, then multiplying them will result in zero. In summing two numbers, if either is possibly nonzero, then the result is possibly nonzero.

Another interesting way of looking at it is that we are replacing the summation binding form \sum_i with the logical quantifier \exists_i. Both introduce a scoped “dummy variable” i and have a lot of syntactic similarity. Other related forms include \lambda i, \forall i, \int di, \max_i .

There is also an analog of the point free relation algebra in linear algebra. Linear algebra has the most widely used point free notation in the world, matrix notation. Consider the expressions Ax=b and X = ABC as compared to \sum_j A_{ij} x_j = b_i and X_{il} = \sum_{jk} A_{ij} B_{jk} C_{kl} . Matrix notation is SO much better for certain calculations. Other pieces of the matrix notation include transpose, inverse, Kronecker product, the Khatri-Rao product, and Hadamard product. Their properties are more clear in the index free form in my opinion. I believe even massive tensor expressions can be written index free using these operators. There are also analogies to be drawn between the graphical notations in these different fields.

Databases can be thought of very similarly to sparse matrices. In principle, you could enumerate all the possible values for a column of a database. So you could think of a database as a giant matrix with a 1 if the item is in the database and 0 if not. Databases are very very sparse from this perspective, and you would never store them this way. The join operation is a relative of relational composition, however join usually operates via looking at the column names, whereas our join is position based.

Query optimization in databases has interesting analogs in sparse linear algebra. For example, the Taco compiler is doing something very akin to a query optimizer.

Inverting Relations

Unlike functions, Relations are always “invertible”. We call this the converse of a relation. When a function is invertible, it corresponds to the converse. In terms of the tuples underlying our representation, it just swaps them. Relations also possess operations trans and untrans that may be thought of as a kind of currying or as a partial inverse on a single parameter.

Orderings can also be lifted to relations (\leq) = \{(a,b) | a \leq b\}. The composition of relations also respects the usual composition of ordering.

Nondeterministic choice is sometimes represented in Haskell using Set returning functions a -> [b]. You may recall this from the context of the List monad. In fact in this case, we have an isomorphism as evidenced by tabulateSearch and searchRel.

Similarly partial functions can be reflected into relations

A useful trick is to lift sets/subsets to relations as a diagonal relation. \{(a,a) | a \in S \}. Projection onto the set can be achieve by composing with this relation. The identity results if you are talking about the entire set S.

Comparing Relations

We can compare sets by asking if one is a subset of the other A\subseteq B . Relations can also be compared by this operation, which we call relational inclusion.

A subservient notion to this is relational equality.

Relational algebra is chockful of inequality style reasoning, which is richer and slightly more complicated than equality style reasoning. This is one of the benefits of moving from functional descriptions to a relational description.

Relations also form a lattice with respect to these comparisons. What the hell are lattices? In the context of finite relations, lattices may be over powered mathematical machinery, but it really is useful down the line. They give you binary operators that play nice with some kind of ordering, in our case relational inclusion. These two operations are the meet and the join, which find the greatest lower bound and least upper bound of the operands respectively. For our relations, these correspond to the more familiar notion of set intersection and union. The intersection of two sets is the biggest set that is in both of them. The union is the smallest set for which both sets are a subset of it.

Using meet/join vs intersection/union becomes more interesting when the domain is fancier than relations over finite domains. Issues of infinity can make this interesting, or when using a representation that can’t explicitly represent arbitrary unions or intersections, but that instead has to approximate them. My favorite example is polyhedra. Polyhedra are not closed under unions. So in this case the join and union do not coincide. You need to take a convex hull of the union instead, which is the best approximation. Concretely, polyhedra can be represented as a list of their vertices, which generate the polyhedron. There is no way to express a union in this representation. Concatenating the lists represents taking the convex hull of the union.

An additional property that a lattice may possess is a largest and small element, called top (\top ) and bottom (\bot). Our finite domain relations do have these.

Relational Division

And now finally we get to one of the most interesting, powerful, and confusing operations: relational division. Relational division is a kind of pseudo inverse to relational composition. In linear algebra, the pseudo inverse is a matrix that does the best job it can to invert another matrix in a least squares sense. If the matrix is actually invertible, it equals the inverse. Relational division does the best job it can to invert a relational composition. Instead of taking least squares as a criteria, it ensures that the result doesn’t over approximate. If you had the inequality X \cdot Y \subseteq Z and you want to solve for X, relational division is the thing that does that. The right division Q = Z/Y is the largest relation such that Q \cdot Y \subseteq Z.

A helpful example is the similar operation of division in database tables.

And here is an implementation that I think is correct. I’ve goofed it up a couple times, it is a rather confusing construct.

There also exists a very similar operation of ldiv.

Relational division encapsulates many notions of searching or optimizing. I invite you to read more about it in J.N. Oliveira’s text or the Bird & de Moor text.

Properties and QuickCheck

Oh. Mah. Glob. You guys. So many properties. (Artwork courtesy of David)

Relation algebra is so chock full of properties. This is a perfect opportunity for some QuickCheck , a randomized property testing framework. There are so many more to test. I need to dig through to collect up all the identities.

Bits and Bobbles

  • Relations over continuous spaces. Vector subspaces (Linear Relations), Polyhedra (Linear inequality relations).
  • Non Bool-valued Relations. Replace \exists_x with \max_x. The weighted edgelist of a graph is a natural relation. By using composition we can ask about paths. We still have a comparison operator \subseteq which now respects the ordering of weights
  • Galois connections are cool.
  • Relations combined with recursion schemes. Recursion schemes are the point free way of describing recursion.
  • Moving into infinite spaces. How do we cope?
  • Faster search. Some relations are best specified by functions, Maps, others, mixes and matching.
  • If you “singletonize” relations a la the Agda project, you get very interesting interconnections with profunctors, which people say are a categorical generalization of relations.
  • Point-free DSLs are interesting and pleasant. Many worries about alpha renaming are gone, at the expense of point-free pain. A DSL like this may be necessary to choose good relational query plans


Edit: An interesting comment and related library from /u/stevana

Some Notes on Drake: A Robotic Control ToolBox

Dumping these notes out there as is. Some material is out of date. Hope they’re useful.

What is Drake?

From the Drake Website (

“Drake (“dragon” in Middle English) is a C++ toolbox started by the Robot Locomotion Group at the MIT Computer Science and Artificial Intelligence Lab (CSAIL). The development team has now grown significantly, with core development led by the Toyota Research Institute. It is a collection of tools for analyzing the dynamics of our robots and building control systems for them, with a heavy emphasis on optimization-based design/analysis.

While there are an increasing number of simulation tools available for robotics, most of them function like a black box: commands go in, sensors come out. Drake aims to simulate even very complex dynamics of robots (e.g. including friction, contact, aerodynamics, …), but always with an emphasis on exposing the structure in the governing equations (sparsity, analytical gradients, polynomial structure, uncertainty quantification, …) and making this information available for advanced planning, control, and analysis algorithms. Drake provides interfaces to high-level languages (MATLAB, Python, …) to enable rapid-prototyping of new algorithms, and also aims to provide solid open-source implementations for many state-of-the-art algorithms. Finally, we hope Drake provides many compelling examples that can help people get started and provide much needed benchmarks. We are excited to accept user contributions to improve the coverage.”

Drake is a powerful toolkit for the control of dynamical systems, and I hope I lower some of the barrier to entry with this post. Be forewarned, Drake changes quickly with time, and some of the following may be out of date (especially the rigid body trees) or ill advised. Use your judgement.

Getting Drake

Drake may be installed from binaries or source. Both may be gotten here:

Using Bazel

Drake uses Bazel as a build tool. Bazel is an open-source build and test tool similar to Make, Maven, and Gradle.

There are three commands that you need to know:

  • bazel build
  • bazel run
  • bazel query

Query is very useful for investigating available binaries within the examples folder and elsewhere.

The notation // is used to refer to the build’s main directory. This corresponds to the drake folder.
... is the notation for everything in the subdirectory


  • bazel build //... will build everything in the project.
  • To query all the binaries available for tools run bazel query //tools/...

Using Pydrake

Drake can be run natively in C++, or by using its MATLAB, python, or Julia bindings. This manual will be focusing on using Drake in python. By default pydrake will not be in the path. You can put pydrake into the path after building by running the following line or adding it to your .bashrc

The following line will import everything in Drake into the python namespace.

Drake is only compatible with python 2.7. If your default system install is python3, you may need to explicitly run the command python2.
The following message may be thrown if you inadvertently use python3.

Pydrake itself has generated documentation available here:

This is an extremely useful resource.

A very useful tool for exploring and confirming the Drake functionality and syntax is the python REPL. From the python REPL or within your code, it is useful to inspect an object using either help(obj) or print(dir(obj)) which will print a list of all the properties available on your object.

Drake resources


Besides the source code itself, the most accurate and up to date information is available in searchable form on the Doxygen page. This is a useful reference, but can be overwhelming.

Underactuated Robotics textbook and examples

Drake has a textbook being built alongside it by Russ Tedrake.

In particular the python source used to generate some of the examples in the book is worth examining.

The full course is available online both on Edx and more recent versions on Youtube.

Examples directory.

A useful source of use cases for drake can be found in the examples directory within the Drake project. As of October 2018 the contents include:

  • Cartpole – The cartpole is a classic control system consisting of a pendulum attached to a linearly actuated cart. In this directory you can find examples
  • Pendulum
  • Double Pendulum
  • Acrobot – The Acrobot is a double pendulum system actuated at the shoulder
  • Kinova Jaco Arm – A commercially available robotic arm
  • Kuka Iiwa Arm – A commerically available robotic arm
  • Particles
  • Bouncing Ball
  • Contact Model – Contains bowling pins, a gripper, and sliding bricks demonstrating Drakes ability to simulate contact dynamics
  • Rimless Wheel – A very simple model of a walking robot
  • Compass Gait – A slightly less simple model of a walking robot
  • Atlas – Examples concerning the Atlas humanoid robot
  • zmp
  • quadcopter
  • and others

Additional usage examples for pydrake can be found in the drake/bindings/pydrake/test folder.

Periscope Tutorials

There is a set of Jupyter notebook based tutorials for some basic Drake functionality in python.

Drake Concepts


For dynamic simulation, Drake exposes a Lego block interface for building complex systems out of smaller pieces, a paradigm similar to Simulink and other modeling software.
Objects possess input and output ports. One wires up input ports to output ports to build composite systems.

To build a simple forward simulation, construct a builder object. Then add all subsystems to the builder object. Explicitly connected input and output ports together. One possible cause of crashes may be leaving ports unconnected.

Once the entire system has been built, a Simulator object can be constructed from it. You may select an integration scheme and set initial conditions by getting a context from the Simulator object. The context holds state information.

There following excerpt from shows how to define a simple system and simulate it.


Rigid Body Trees

Edit : I think the Rigid Body Trees interface is deprecated. Use Multi Body Trees.

There are two complementary perspectives to take of the degrees of freedom of a robot, intrinsic coordinates and extrinsic coordinates. The intrinsic coordinates have one variable per degree of freedom of the robot. A common example is a set of joint angles. The dynamics are simply expressed in the intrinsic coordinates and can derived using Lagrangian mechanics. The extrinsic coordinates specify the spatial locations and orientation of frames attached to the robot. These are called frames. These spatial coordinates may be constrained in a relationship by a rigid mounting or gearing, so there may be more extrinsic frames available than intrinsic coordinates. Extrinsic coordinates are particularly pertinent for discussions of geometry, contact, and external forces.

Drake uses the URDF (Universal Robot Description Format) format. This is a common robot format originating in the ROS community for which you can find models of many commercial robots online. It is an XML based format with visualization, inertial, and collision tags.

This is an example URDF for a pendulum cart system.

Example: Pendulum URDF


The RigidBodyTree is a Drake class that describes both the intrinsic and extrinsic properties of a linkage. RigidBodyTrees may be built from a URDF file, some of which are packaged inside of Drake. For example, the Jaco arm URDF is available packaged with Drake.

The full properties of the RigidBodyTree class are available at , but it will be useful to highlight some of the most commonly used functionality.

You can probe the RigidBodyTree for useful properties about the linkage, for example the number of intrinsic coordinates, or the number of bodies in the tree.

For many operations, one needs to perform the forward dynamics to build a kinematic cache that needs to be supplied later.

Drake describes the dynamics in terms of the intrinsic coordinates. The robot manipulator equations have the common form (See )

M(q)\ddot{q} + C(q,\dot{q})\dot{q} = \tau_g(q) + Bu

where q is the state vector, M is the inertia matrix, C captures Coriolis forces, and \tau_g is the gravity vector. The matrix B maps control inputs u into generalized forces.

External forces on the tree are described as wrenches. Wrenches are an object that combines forces and torques. In Drake, they are specified by 6 dimensional vectors. From the Drake docs:
“A column vector consisting of one wrench (spatial force) = [r X f; f], where f is a force (translational force) applied at a point P and r is the position vector from a point O (called the “moment center”) to point P.”

Drake will also compute the quantities in the manipulator equation. For example, to compute the term C(q,v,f_{ext}) in the manipulator equations with no externally applied wrenches.

Drake can be asked to compute the other terms in the manipulator equation as well. Drake can compute the center of mass of the entire tree.

Drake provides a couple of useful mappings between the intrinsic and extrinsic coordinates.

First, given a set of internal coordinates, Drake can transform these into frames, which may be expressed in yet another coordinate system.

The extrinsic frames can be expressed as a function of the intrinsic coordinates

X_i = f_i(x_j)

Returns a 4\times 4 transformation matrix between body 0 and 9 of the tree. The upper 3\times 3 block corresponds to a rotation matrix, and the last column a translation vector of the frame.

The Jacobian of this mapping,

J_{ij} = \frac{\partial f_i}{\partial x_j}

is useful for translating externally described small displacements, velocities, forces, and torques, into the equivalent terms for the intrinsic coordinates.

The time derivative or differential of a frame will possess a linear and angular velocity. These come packed in a 6 dimensional vector called the Twist.

The geometric Jacobian function returns the Jacobian in a sparse format. It returns a tuple of a m\times 6 matrix and a 1\times m vector of indices to which coordinates these correspond.
The function takes the id of three different frames. In this example, it computes the differential of the transformation from frame 0 to frame 9 expressed in frame 3.

The dense matrix can be constructed by


A Mathematical Introduction to Robotic Manipulation –


The Drake visualizer may be found in the tools folder. The Drake visualizer needs to be running before any applications that needs to communicate with it are started. To get the drake visualizer running run the following command from the drake directory

May need to run commands here to make LCM work and visualizer not crash immediately

Kinova ROS package

The Kinova Jaco arm is easily communicated to through the Kinova ROS package

The package as of October 2018 supports Ubuntu 14.04 and ROS Indigo. The package includes bindings to the Jaco SDK for reading sensor data and giving motion commands. The package supports a variety of control modes, including torque control.

To update functionality to ubuntu 16.04 and ROS Kinetic you may wish to pull from this external branch

or try the beta branch

The Gazebo simulator seems to have a great deal of trouble.
One suggestion is that you have to add a small nonzero size parameter to your URDF files.

The topics made available by the ROS Package are

ROS Intercommunication

The communication stack used by Drake is Lightweight Communications and Marshalling (LCM). For interconnection to the rest of the ROS ecosystem, this adds a layer of friction.

One option is to use the lcm_to_ros project

This is a generator for building republishers of messages going between ROS and LCM

Another option is to build custom subscribers and publishers as in the following example.

Example: Connecting Drake Visualizer to External Jaco Arm

In another terminal turn on the Jaco driver

Also get the drake visualizer running from the drake directory before running the script

The Drake Visualizer

Example: Crumpling Jaco

A simple example of the simulation capabilities is the simulation of an unactuated Jaco arm.

Optimization Solvers

Drake provides a common interface to many optimization solvers. Because of the tight integration, Drake has the capability to directly build the equations of motion for a system into a form the solver can comprehend.

The Mathematical Program class provides a high level interface to the different solvers. This class can be constructed as an object on it’s own or as returned by other helper classes such as trajectory optimization builders.

Drake provides a symbolic expression modeling language in which to describe constraints and costs.

There are a large variety of solvers out there for problems of different structure.

A Mathematical Program is generally of the form

\min f(x) \   s.t.
g(x) \ge 0
h(x) = 0

One of the oldest and most studied class of these programs are known as Linear Programs which has linear cost and constraints. This mathematical program has very efficient solvers available for it.

A large class of optimization problems that are tractable are known as Convex Optimization problems. The cost functions must be bowl shaped (convex), the inequality constraints must define convex regions and the equality constraints must be affine (linear + offset). For this class, gradient descent roughly works and refinements of gradient descent like Newton’s method work even better. Convexity implies that greedy local moves are also acceptable global moves and there are no local minima or tendril regions to get caught in.

The common reference for convex optimization is the textbook by Boyd and Vandenberghe freely available at There is also an accompanying video course available online.

Subclasses of Convex programming may have solvers tuned to them. Important subclasses include:

  • Linear Programming – Linear objective, linear equality, and linear inequality constraints.
  • Quadratic Programming – Linear Programming + quadratic objective
  • Second Order Cone Programming
  • Semidefinite Programming – Optimization allowing for the constraint of a SemiDefinite matrix. This means the matrix is constrained to have all nonnegative eigenvalues or equivalently the quadratic form it defines q^T Xq is non-negative for all possible vectors q.
  • Sum of Squares Programming – Optimization over polynomials with the constraint that the polynomial can be written as a sum of squares, a manifestly positive form.

Many problems cannot be put into this form. If the inherent nature of the problem at hand requires it, you may choose to use a nonlinear programming solver or Mixed Integer Programming Solver.

The solvers Ipopt and Snopt are nonlinear programming solvers. Ipopt is an open source and Snopt is proprietary. These solvers use local convex approximations to the problem to heuristically drive the solution to a local minimum.

Mixed Integer Linear Programming is Linear Programming with additional ability to require variables to take on integer values. This additional constraint type takes the problem from polynomial time solvable to NP-complete. A surprisingly large number of discrete and continuous optimization problems can be encoded into this framework. Internally, these solvers use linear programming solvers to guide the discrete search.

Part of the art and fun of the subject comes from manipulating your problem into a form amenable to powerful available solvers and theory.

Hans Mittelmann at the University of Arizona maintains benchmarks for a variety of optimization tools. A rule of thumb is that commercial solvers perform better than open source solvers.

Available Solvers in Drake

  • Mosek – Mosek is a proprietary optimization solver package offering solvers for
    • Linear.
    • Conic quadratic.
    • Semi-definite (Positive semi-definite matrix variables).
    • Quadratic and quadratically constrained.
    • General convex nonlinear.
    • Mixed integer linear, conic and quadratic.
  • Gurobi – Gurobi is a proprietary optimization solver package offering solvers for
    • linear programming solver (LP)
    • mixed-integer linear programming solver (MILP)
    • mixed-integer quadratic programming solver (MIQP)
    • quadratic programming solver (QP)
    • quadratically constrained programming solver (QCP)
    • mixed-integer quadratically constrained programming solver (MIQCP)
  • Snopt – Snopt is a nonlinear optimization solver using sequential convex optimization (SQP)
  • Ipopt – Ipopt is a open source nonlinear optimization solver using sequential convex optimization (SQP)
  • Operator Splitting Quadratic Program (OSQP) – Open source quadratic programming package
  • ik
  • LCP
  • dReal – An SMT solver for reals

Automatic Differentiation

Automatic Differentiation is the capability to have derivatives computed alongside code that computes the values. It is largely based upon application of the chain rule. There are two modes, forward and reverse mode.

Forward mode is the simplest to describe. Functions can be overloaded so that they take a dual number, a value combined in a tuple with it’s derivative information. As the value of a function is computed, the Jacobian of that function is matrix multiplied on the derivative concurrently.

Drake exposes automatic differentiation capability for manual use

More importantly Drake uses automatic differentiation internally to marshal symbolic problems into forms acceptable to external solvers and to calculate the various Jacobians we’ve already seen.

Trajectory Optimization

Trajectory optimization is a framework in which one uses Mathematical programming to solve optimal trajectory problems. The input to the system
is considered to be a decision variable in a Mathematical programming problem.

The combination of dynamical system modeling, automatic differentiation, and bindings to mathematical programming solvers makes Drake an excellent platform for trajectory optimization.

In Direct Collocation, both the path and the input variables are discretized along time. The trajectories are described by cubics and force curves are described by piecewise linear. One way of performing direct collocation is to take the path position at a discrete number of time points and make a decision variable for each. The discretized equations of motion become constraints that neighboring time points have to obey. One may then inject any other desired requirements (staying inside some safe region for example) as additional constraints.

The Drake docs state:

DirectCollocation implements the approach to trajectory optimization as described in C. R. Hargraves and S. W. Paris. Direct trajectory optimization using nonlinear programming and collocation. J Guidance, 10(4):338-342, July-August 1987. It assumes a first-order hold on the input trajectory and a cubic spline representation of the state trajectory, and adds dynamic constraints (and running costs) to the midpoints as well as the knot points in order to achieve a 3rd order integration accuracy.

Drake provides a useful interface for talking about trajectories. For both describing the cost function and the constraint functions, you want them to apply at all time steps. You can ask drake for variables representing the position or forces at all time steps. Drake will then build the appropriate mathematical program applying the cost of constraint at all time steps.

You may want to select the initial and final state specifically to specify goals and initial conditions


Example: Trajectory Optimization of a Pendulum

This example comes from the Underactuated Robotics textbook source

Example Application: Estimating end-effector force from Jaco motor torques

End effector forces become part of the equations of motion.

The geometric Jacobian transforms the extrinsic force into intrinsic coordinates. It is in general a rectangular matrix, since the number of extrinsic coordinates does not need to match the number of intrinsic coordinates.

A force f applied to the end effector appears linearly in the manipulator equations as J^Tf. This will be canceled by the torques of the motors during static or quasi-static movement. Hence, we can can determinethe external force from the motor torques if we assume it is the only external force at play.

The pseudo-inverse is the best possible solution to an overdetermined set of linear equations, in the least squares sense. We use this operation due to the non square nature of the Jacobian.

\min (Jx - X)^2 \rightarrow J^T Jx = J^T X

The following script prints both the end effector force as supplied by the Jaco SDK and the force as computed by Drake from the internal motor torques.

Why I (as of June 22 2019) think Haskell is the best general purpose language (as of June 22 2019)

Me and my close friends have been interested in starting a project together and I suggested we use Haskell. I do not think the suggestion was received well or perhaps in seriousness. I have a tendency to joke about almost everything and have put forward that we use many interesting but not practical languages in the same tone that I suggest Haskell. This was a tactical mistake. I find myself in despair at the idea I can’t convince my personal friends, who are curious and intellectual people, to use Haskell on a fresh start web project we have complete control over. What hope do I have in the world at large? This brain dump post is meant for them. My suggestion to use Haskell is not just me being an asshole, although that does make it more fun for me. I will now try to explain in all seriousness and in all the honesty that I can muster what my opinions on languages are and why I have them.

Pragmatically can you start a new project using a language you don’t know? This is a problem. A project always has some intrinsic difficulty. Not all projects will survive an extra layer of unnecessary speedbump. But when and how are you supposed to learn new languages? Never is one answer. I disagree with this answer. In this case we have the leg up that I do know Haskell. Perhaps this is a downside in that it will be extra frustrating? It is also easy for me to ask, as using Haskell is not a burden for me, I have already sunk the cost, but a massive learning burden for others.

Is Haskell actually practical for a web application? Short answer: yes. Expect pain though. If your web application is so simple you could rip it out in 100 lines of python, this is such a simple project that it is a good opportunity to learn something new. If it will become large and complex, then I believe Haskell does shine, keeping complexity under control. I base this upon the professional experience making a web application using a Haskell-Purescript stack. For honesty, it wasn’t all good. I recall ripping my hair out threading shit through monad stacks to where it need to go. Yet on the whole, it kept the project sane. I also believe this based on the word of mouth that I believe but could be just cultish ramblings.

I believe that truly dominating properties of Haskell appear in large complex projects. This is difficult to prove in any other way except empirically and the experiments will be so wildly uncontrolled in terms of the project and people involved that no conclusions can truly be drawn. And yet I have faith, and think that personal experience validates this opinion to myself at least. We have to live this life even though truth does not exist. Choices and opinions must be made.

For programs that are going to be a single manageable file and written in one night, it doesn’t matter much what you use in terms of being choked on your own code. At this scale, I still think Haskell is enjoyable and interesting though. Haskell was my doorway into the world of computer science as I now understand it. I hope there are more doorways.

Things about me that may be different from you. Decide for yourself if these aspects of me make our opinions fundamentally incompatible.

  • I do have a talent and a like for practically oriented mathematical topics (computational methods, linear algebra, formal methods, calculus, geometric algebra, projective geometry, optimization, etc.). I actually have very little taste at all for mathematical topics that I see no purpose to.
  • I do have some desire and taste for esoterica for its own purpose. I cannot exactly characterize what makes some topics acceptable to me and others not.
  • A hard learning curve is not necessarily a downside for me. I enjoy the challenge if it is overcomable and worth it on the other side.
  • I like weird and different. That is a positive for me, but a negative for many. I might just be a millennial hipster idiot.
  • I would LOVE to find a language I think is better than Haskell. I would LOVE to abandon Haskell. Perhaps this already makes me odd. Perhaps I think many people don’t consider the differences between languages to be worth making the switch and the wasted knowledge. The people with this opinion may or may not have tried enough languages.
  • I have “drank the koolaid”. I do read what comes out of the Haskell and functional programming community and have a tendency to believe things without strong empirical backing.
  • I have been more deeply entwined with Haskell than any other language. Perhaps if I had reached a level of familiarity in another language I’d be as fervent about that one? I don’t believe this is the case.
  • While I desire performance, I consciously curb this desire. I am strongly in favor of cleanly written, clear, principled code. This is of course a weighted judgement. I will probably use a 100x performance gain for a 2x decrease in clarity or reusability. This is a result of the problem domains and scale that have interested me in the past and that I anticipate in the future. I STRONGLY believe the world at large over values performance at the expense of other good qualities of code. Or at least over optimizes early.

A Biased List of Pros and Cons

What do I find bad about C++.

  • The feature set is huge and difficult to understand
  • Extreme amounts of legacy features that will be even recommended if you read legacy documentation. Kitchen sink.
  • The language appears to be almost entirely built out of footguns.
  • The syntax is too verbose.
  • I have a distaste for mutation.
  • I have a distaste for object oriented programming

What do I find good about C++

  • It is commonly used. Large community.
  • It is possible for it to be very fast
  • Kitchen Sink. You can find almost any feature you want here.
  • The high level goals of the mind-leaders sound good.
  • Very interesting projects are written in C++. HPC things. Scientific computation.
  • Template metaprogramming seems very powerful, but arcane.

What I find bad about Java

  • The syntax puts me off as being incredibly verbose
  • Extreme object oriented focus
  • Corporate/Enterprise feel. I am an iconoclast and see myself as a reasonable but slightly rebellious character. Java in my mind brings images of cubicles and white walls. Perhaps this is not fair.

What do I find good about Java

  • ?

I’m joking. Sorry Java. But I also kind of mean it. Yes, there are positive aspects to Java.

What I like about python

  • Very commonly used and understood. Perhaps the lingua franca
  • Incredible library ecosystem
  • Numpy and scipy in particular are marvels of the modern age.
  • Syntax is basically imperative pseudo-code
  • I am personally very familiar with it
  • python is the easiest to use language I know.

What I dislike about python

  • Python has no natural tendency for correctness due to the free wheeling dynamically typed character. This is patched up with testing, opt-in type systems.
  • I don’t know how to grow as a pythonista. The skill curve flattens out. For some this may be a positive.
  • The main way of building new data types is the class system. I think this is ungainly, overly verbose, and not always a good conceptual fit.
  • Despite being among the most succinct of common imperative languages, I still think it ends up being too verbose.
  • It is slow. This is a negative, although not high on my priorities.

What is bad about Haskell

  • Very difficult learning curve. Let’s get real. Haskell is a very confusing programming language to get started in for common programming tasks.
  • functional programming is weirder than imperative programming.
  • The monad paradigm, even once learned is ungainly. Tracking multiple effects is a pain that does not exist in most languages.
  • The pain is up front. It is easy to get a sketch of what you want ripped out in python faster than in Haskell (for me). If you want a web server, command line tool, optimization problem, curve fitter, I can rip all of these out faster in python than I can in Haskell. As a psychological thing, this feels awful. For a small scale project, unless toying with Haskell itself or one of its domain expertises like implementing DSLs, python is the easier and correct choice. Python is a scripting language. I’d make the switch at two screens worth of code.
  • I think laziness is confusing and easy to shoot yourself with.
  • Haskell is not the fastest language, although faster than python.
  • Concern for there not being jobs and interestingly on the converse side, no people to hire. There is a catch-22 there. There is a set of people that would KILL for a Haskell job.
  • There are vocal smug assholes who use Haskell and push it. I am smug, I hope only mildly an asshole.

What I find good about Haskell

  • The number one reason is that there is something ephemeral that I just like. I am not naturally inclined to analyze such things. When I like a movie or don’t like it, it just happens, and if forced to explain why, I’ll bullshit.
  • Errors and bugs are to a shocking degree caught at compile time. More than any other language I have experience with does Haskell code run correctly without hidden bugs if it compiles. I am not claiming this is absolute but it is all the more incredible the degree to which it is true since it didn’t literally have to be this way. I have done major rewiring of a data structure in a project. The compiler guided my hand to exactly the positions that needed to be adjusted. Could C++ do this? Yes, to some degree. I believe that the tendencies of C++ programming make it less satisfactory on this point.
  • Types are an incredible design tool. I find designing the types of my program to be an extremely enjoyable and helpful activity. Much more so than box and wire class and interface diagrams. A good function name and a type signature basically entirely constrains the behavior of a function. Types can be quickly and completely be given to the compiler and machine enforced.
  • The pain that Monads cause are pains you should be feeling. They are the pain of explicitness which I 70% choose over the pain of not knowing what the fuck a function might do, and not enabling the compiler to enforce that. If something is capable of mutating state, it should say so in goddamn huge purple letters.
  • Haskell is more than fast enough. It isn’t that even people don’t care. The Haskell community at large cares a lot more for performance than I do, and I reap the dividends. The people in charge of the compiler and the main libraries are goddamn wizards who’s work I get to benefit from. This is true of all languages perhaps.
  • Laziness is very cool. At the beginning I thought it was INCREDIBLY awesome and inconceivable that I could manipulate an infinite list.
  • The way of specifying new data types is so succinct and so incredible. Pattern matching is SO GODDAMN good for some tasks.
  • Haskell has a paradigm of small combinators. It is common to build a sequence of very small natural functions for the domain and then build larger and larger things out of them. This is good for reusability and conceptual clearness.
  • Extreme preference for Immutability. As part of keeping what you must keep in your head or remember small while programming, immutability is an insane win. You think you know what you want now. You know you could just tweak this variable here, make a special variable over here. You can reason about how to make this all work and be correct now. You will not in a month. Your coworkers will mess it all up too.
  • Haskell code is generic by default. This allows the same code to be reused in many situations
  • The standard typeclass hierarchy is extremely well thought out and powerful. To some degree it is unnecessary in other languages. The difference between Functor, Applicative, Monad, and Traversable makes little sense in languages with unconstrained mutations and effects.
  • Haskell paradigms are inspired by mathematics, and I have great faith in mathematics. The concepts behind Haskell feel closer to discoveries rather than inventions. Imperative programming speaks in a language formed for the accidental nature of the machines we have. Functional programming is a language closer to mathematics, which I believe is closer to how the human mind works, and closer to what the problem at hand actually is.
  • Complexity scales. It is my belief, perhaps unverified, that as a project grows larger, the insane miserable churn and mental overhead grows slower in a Haskell project than in other languages. I do not have strong empirical evidence to this assertion. Word of mouth (of Haskellers).
  • The ceiling on Haskell is extremely high. You can continue to learn and get better, gaining more and more power. I do not currently see the end of this.
  • When I do reach for some new library, I am very often impressed by how thoughtfully built it is. Haskell itself is INCREDIBLY thoughtfully built.
  • The haskell community is very excited and they have many things to teach. There are many Haskellers out there who are very welcoming, kind, and intelligent.
  • Haskell does have some cache. I am not immune to wanting to seem smart. If the world thought that only idiots use Haskell, that would offput me some. That the world things that only impractical ivory tower smug weenies use Haskell does offput me, although I perhaps embrace it belligerently.
  • The Haskell library ecosystem is strong. Less strong than python, but much better than some of the other languages that intrigue my eye. There is functionality somewhere for most common tasks.
  • Haskell is used industrially. I live in an echo chamber, but by and large the industrial users of Haskell sing its praises.

For context, this is my programming journey:

I first learned BASIC in middle school. I wrote a computer game in Visual Basic. I toyed with the TI-83. I went to a summer camp in high school where I learned the extreme rudiments of C++. Did a small web business with some friends. Did really bad work with big dreams. I took a fairly poor Java course in high school. I learned MATLAB in college. I doinked around with Arduino level C projects. I learned python in grad school I think. I think I got far more proficient at python than any other language before. At this point, I was on board for object oriented programming, albeit not at a deep level (design patterns or ilk), just as light organizational principle. Did some Android projects, really didn’t like Java there. Did a small business with my friend and got deep in Javascript. As our web application got bigger, this really start to hurt. Errors all over the place. Unknown junk in objects. Who has access to what? We tried our darndest to read and follow best practices as we could find them, but it felt like we sinking in quicksand. More and more effort for more and more bugs and less and less progress. It was around this era that I first even heard of Haskell. I was intrigued immediately for some reason, maybe for just how weird it was. It took probably 2 years of forgetting about it and going back to tutorials every 6 months. I didn’t necessarily KNOW that this was a thing that I wanted, I just found it interesting. Currently I am fairly proficient at some things in Haskell (unfortunately I am more proficient at esoterica than more practical things). I have had a professional job writing Haskell, and it seems like my like of Haskell is working out very well for me professionally. Passion in all forms is powerful.

My history may not be as convincing as someone who spent 20 years as a professional C++ dev and then switched, but I have at least experienced different paradigms and languages and found Haskell the most to my liking.

Random Thoughts

I am now trying to push myself into comfort in Julia, Rust, Agda, Coq, OCaml, all languages I feel show promise for different reasons. To my knowledge Haskell is a better choice than these as a general purpose tool for pragmatic reasons. Haskell’s library ecosystem is strong and performance is good. These are points against agda and coq. Julia has a focus on scientific programming.

Rust might be a good compromise. I consider it a trojan horse for useful programming language features that I associate with functional languages. It claims to performance and being an acceptable systems level language, which appeals to some. The syntax does not scare anyone off. The library ecosystem seems good and the community strong. I did find myself missing Haskell features in Rust though, am personally much less familiar with it. I think the design of Rust weights more to performance than is warranted in most applications and has less descriptive and abstraction power than Haskell, qualities that I prioritize. This opinion is not strongly held.

What makes Haskells types any better or worse than C? At the beginning many of the features of Haskell seem like magic. But as time has worn on, I can see how the features can be emulated with just some slight syntactic and conceptual overhead in other languages. This slight overhead is enough though. A language is more than just it’s syntax. It is also its idioms. It is also they way it makes people think. Language is almost a prerequisite for thought. You cannot even conceive of the ways there are to express yourself before learning.

What exactly makes a language “good”? This is a question poorly phrased enough to have no answer. Excel can be an excellent language for many tasks. Easy to learn. Very powerful. Yet, it is not considered a good general purpose programming language. Library ecosystem is extremely important. Specialized languages are often the best choice for special problem domains, at the expense of learning them, or eventually finding incompatibility of what you want from what they designed for.

What makes abstractions “good”. Why do I have queasiness about object oriented-programmming. Java, I think basically. I, overeagerly have gone down the road of trying to design deep subclass hierarchies, which is not OO at it’s best. Zebra is a Quadruped is an Animal is Alive kind of stuff. I believe object oriented in an interesting principle. I hear about SmallTalk and Common Lisp doing object oriented right and I am duly intrigued. There has been some recent work in Haskell about how to do objects in a way aesthetically compatible with Haskell. I think object oriented has been over used and abused. I think it is a poor conceptual fit for many situations. I think it tends to make non reusable code. I think the form that it takes in C++ and Java development is arcane horseshit.

I deserve almost no opinion about Java or C++, having not done sufficient that much in them. Yet, I must state my opinions, take them as you will, for I do in fact hold them strongly. I have worked on a network simulator and a robotics framework in C++, but begrudgingly. I have done a very small amount of Java development for a personal project and some Processing sketches. My coworker was a 10 year professional Java dev before switching to Scala then Haskell. He despises Java now. Highly anecdotal, and he is a similar iconoclastic character like me. Nevertheless, this also informs my opinion. I have been reading Bjarne Stroustrup’s book (his stated goals and what he claims C++ achieves are admirable and one can’t argue he hasn’t changed the world) and actually find C++ rather interesting, especially in the sense that many projects that I find interesting are written in C++, I just don’t want to myself work in the language.

Haskell love:

Haskell Criticism (perhaps warranted) Ah Hacker News. Always a sunny worldview.

Hacker news discussion of this post:

A Basic Branch and Bound Solver in Python using Cvxpy

Branch and bound is a useful problem solving technique. The idea is, if you have a minimization problem you want to solve, maybe there is a way to relax the constraints to an easier problem. If so, the solution of the easier problem is a lower bound on the possible solution of the hard problem. If the solution of the easier problem just so happens to also obey the more constrained hard problem, then it must also be the solution to the hard problem. You can also use the lower bound coming from a relaxed problem to prune your search tree for the hard problem. If even the relaxed problem doesn’t beat the current best found, don’t bother going down that branch.

A standard place this paradigm occurs is in mixed integer programming. The relaxation of a binary constraint (either 0 or 1) can be all the values in between (any number between 0 and 1). If this relaxed problem can be expressed in a form amenable to a solver like a linear programming solver, you can use that to power the branch and bound search, also using returned solutions for possible heuristics.

I built a basic version of this that uses cvxpy as the relaxed problem solver. Cvxpy already has much much faster mixed integer solvers baked in (which is useful to make sure mine is returning correct results), but it was an interesting exercise. The real reason I’m toying around is I kind of want the ability to add custom branching heuristics or inspect and maintain the branch and bound search tree, which you’d need to get into the more complicated guts of the solvers bound to cvxpy to get at. Julia might be a better choice.

A somewhat similar (and better) project is which doesn’t use cvxpy explicitly, but does have the branch and bound control in the python layer of the solver. There are also other projects that can use fairly arbitrary solvers like Bonmin

As a toy problem I’m using a knapsack problem where we have objects of different sizes and different values. We want to maximize the value while keeping the total size under the capacity of the bag. This can be phrased linearly like so: \max v \cdot x s.t. \sum_i s_i x_i<= capacity , x \in {0,1}. The basic heuristic I’m using is to branch on variables that are either 0 or 1 in even the relaxed solution. The alternative branch hopefully gets pruned fast.

This is at least solving the problem fairly quickly. It needs better heuristics and to be sped up, which is possible in lots of ways. I was not trying to avoid all performance optimizations. It takes maybe 5 seconds, whereas the cvxpy solver is almost instantaneous.

Edit : I should investigate the Parameter functionality of cvxpy. That would make a make faster version than the one above based on deepcopy. If you made the upper and lower vectors on the binary variables parameters, you could restrict the interval to 0/1 without rebuilding the problem or copying everything.

Mixed Integer Programming & Quantization Error

I though of another fun use case of mixed integer programming the other day. The quantization part of a digital to analog converter is difficult to analyze by the techniques taught in a standard signals course (linear analysis, spectral techniques, convolution that sort of thing). The way it is usually done is via assuming the quantization error is a kind of randomized additive noise.

Mixed Integer programming does have the ability to directly encode some questions about this quantization though. We can directly encode the integer rounding relations by putting the constraint that the quantized signal is exactly +-1/2 a quantization interval away from the original signal. Then we can run further analysis on the signals and compare them. For example, I wrote down a quick cosine transform. Then I ask for the worst case signal that leads to the most error on the quantized transform versus the transform of the unquantized signal. My measure of worst case performance was the sum of the difference of the two transforms. I chose this because it is tractable as a mixed integer linear program. Not all reasonable metrics one might want will be easily encodable in a mixed integer framework it seems. Some of them are maximizing over a convex function, which is naughty. (for example trying to maximize the L2 error \sum|x-y|^2 )

In a variant of this, it is also possible to directly encode the digital signal process in terms of logic gate construction and compare that to the intended analog transform, although this will be a great deal more computational expensive.

This is interesting as a relatively straightforward technique for the analysis of quantization errors.

This also might be an interesting place to use the techniques of Vanderbei . He does a neato trick where he partially embeds the FFT algorithm into an optimization problem by adding auxiliary variables. Despite the expense of adding these variables, it greatly increases the sparsity of the constraint matrices, which will probably be a win. I wonder if one might do something similar with a Fast Multipole Method , Barnes Hut, or Wavelet transform? Seems likely. Would be neat, although I’m not sure what for. I was thinking of simulating the coulomb gas. That seems like a natural choice. Oooh. I should do that.

Solving the XY Model using Mixed Integer Optimization in Python

There are many problems in physics that take the form of minimizing the energy. Often this energy is taken to be quadratic in the field. The canonical example is electrostatics. The derivative of the potential \phi gives the electric field E. The energy is given as \int (|\nabla \phi|^2 + \phi \rho) d^3 x . We can encode a finite difference version of this (with boundary conditions!) directly into a convex optimization modelling language like so.

The resulting logarithm potential

It is noted rarely in physics, but often in the convex optimization world that the barrier between easy and hard problems is not linear vs. nonlinear, it is actually more like convex vs. nonconvex. Convex problems are those that are bowl shaped, on round domains. When your problem is convex, you can’t get caught in valleys or on corners, hence greedy local methods like gradient descent and smarter methods work to find the global minimum. When you differentiate the energy above, it results in the linear Laplace equations \nabla^2 \phi = \rho. However, from the perspective of solvability, there is not much difference if we replace the quadratic energy with a convex alternative.

Materials do actually have non-linear permittivity and permeability, this may be useful in modelling that. It is also possible to consider the convex relaxation of truly hard nonlinear problems and hope you get the echoes of the phenomenology that occurs there.

Another approach is to go mixed integer. Mixed Integer programming allows you to force that some variables take on integer values. There is then a natural relaxation problem where you forget the integer variables have to be integers. Mixed integer programming combines a discrete flavor with the continuous flavor of convex programming. I’ve previously shown how you can use mixed integer programming to find the lowest energy states of the Ising model but today let’s see how to use it for a problem of a more continuous flavor.

As I’ve described previously, in the context of robotics, the non-convex constraint that variables lie on the surface of a circle can be approximated using mixed integer programming. We can mix this fairly trivially with the above to make a global solver for the minimum energy state of the XY model. The XY model is a 2d field theory where the value of the field is constrained to lie on a circle. It is a model of a number of physical systems, such as superconductivity, and is the playground for a number of interesting phenomenon, like the Kosterlitz-Thouless phase transition. Our encoding is very similar to the above except we make two copies of the field phi and we then force them to lie on a circle. I’m trying to factor out the circle thing into my library cvxpy-helpers, which is definitely a work in progress.

Now, this isn’t really an unmitigated success as is. I switched to an absolute value potential because GLPK_MI needs it to be linear. ECOS_BB works with a quadratic potential, but it was not doing a great job. The commercial solvers (Gurobi, CPlex, Mosek) are supposed to be a great deal better. Perhaps switching to Julia, with it’s richer ecosystem might be a good idea too. I don’t really like how the solution of the absolute value potential looks. Also, even at such a small grid size it still takes around a minute to solve. When you think about it, it is exploring a ridiculously massive space and still doing ok. There are hundreds of binary variables in this example. But there is a lot of room for tweaking and I think the approach is intriguing.


  • Can one do steepest descent style analysis for low energy statistical mechanics or quantum field theory?
  • Is the trace of the mixed integer program search tree useful for perturbative analysis? It seems intuitively reasonable that it visits low lying states
  • The Coulomb gas is a very obvious candidate for mixed integer programming. Let the charge variables on each grid point = integers. Then use the coulomb potential as a quadratic energy. The coulomb gas is dual to the XY model. Does this exhibit itself in the mixed integer formalism?
  • Coulomb Blockade?
  • Nothing special about the circle. It is not unreasonable to make piecewise linear approximations or other convex approximations of the sphere or of Lie groups (circle is U(1) ). This is already discussed in particular about SO(3) which is useful in robotic kinematics and other engineering topics.

Edit: /u/mofo69extreme writes:

By absolute value potential, I mean using |del phi| as compared to a more ordinary quadratic |del phi|2.

This is where I’m getting confused. As you say later, you are actually using two fields, phi_x and phi_y. So I’m guessing your potential is the “L1 norm”

|del phi| = |del phi_x| + |del phi_y|

? This is the only linear thing I can think of.

I don’t feel that the exact specifics of the XY model actually matter all the much.

You should be careful here though. A key point in the XY model is the O(2) symmetry of the potential: you can multiply the vector (phi_x,phi_y) by a 2D rotation matrix and the Hamiltonian is unchanged. You have explicitly broken this symmetry down to Z_4 if your potential is as I have written above. In this case, the results of the famous JKKN paper and this followup by Kadanoff suggest that you’ll actually get a phase transition of the so-called “Ashkin-Teller” universality class. These are actually closely related to the Kosterlitz-Thouless transitions of the XY model; the full set of Ashkin-Teller phase transitions actually continuously link the XY transition with that of two decoupled Ising models.

You should still get an interesting phase transition in any case! Just wanted to give some background, as the physics here is extremely rich. The critical exponents you see will be different from the XY model, and you will actually get an ordered Z_4 phase at low temperatures rather than the quasi-long range order seen in the low temperature phase of the XY model. (You should be in the positive h_4 region of the bottom phase diagram of Figure 1 of the linked JKKN paper.)”

These are some interesting points and references.

2D Robot Arm Inverse Kinematics using Mixed Integer Programming in Cvxpy

Mixed Integer programming is crazy powerful. You can with ingenuity encode many problems into it. The following is a simplification of the ideas appearing in . They do 3d robot arms, I do 2d. I also stick to completely linear approximations.

The surface of a circle is not a convex shape. If you include the interior of a circle it is. You can build a good approximation to the circle as polygons. A polygon is the union of it’s sides, each of which is a line segment. Line sgements are convex set. Unions of convex sets are encodable using mixed integer programming. What I do is sample N regular positions on the surface of a circle. These are the vertices of my polygon. Then I build boolean indicator variables for which segment we are on. Only one of them is be nonzero \sum s_i == 1. If we are on a segment, we are allowed to make positions x that interpolate between the endpoints x_i of that segment x = \lambda_1 x_1 + \lambda_2 x_2, where \lambda_i >= 0 and \sum \lambda=1. These \lambda are only allowed to be nonzero if we are on the segment, so we suppress them with the indicator variables \lambda_i <= s_i + s_{i+1}. That’s the gist of it.

image link

Given a point on the circle (basically sines and cosines of an angle) we can build a 2d rotation matrix R from it. Then we can write down the equations connecting subsequent links on the arm. p_{i+1}=p_{i} +Rl. By using global rotations with respect to the world frame, these equations stay linear. That is a subtle point. p and R are variables, whereas l is a constant describing the geometry of the robot arm. If we instead used rotation matrices connecting frame i to i+1 these R matrices would compound nonlinearly.

All in all, pretty cool!

The Beauty of the Cone: How Convex Cones Simplify Convex Programming

I watched the Stephen Boyd course to get me started in convex programming. At the beginning, he spends some time talking about convex sets rather than launching in convex optimization. I did not appreciate this sufficiently on the first pass. Convex sets are a very geometric topic and I think that for the most part, convex functions are best thought as a special case of them. The epigraph of a scalar valued convex function on R^d , the filled in area above a graph, is a d+1 dimensional convex set. Convex constraints on the domain can be thought of as further cutting this shape. Finding the minimum of the shape can be thought of as a geometrical problem of finding the furthest point in the -y direction.

There is another mathematical topic that I did not appreciate for how powerful and clean it is. If you check out this textbook by Fenchel, he starts with the topic of convex cones rather than sets, I now realize for good reason.

I was sketching out a programmatic representation of convex sets and was annoyed at how ugly things were turning out. First off, infinity is a huge problem. Many convex problems have infinite answers.

The simplest problem is \max_x c^T x with no constraints. The answer plunges off to infinity vaguely in the direction of c. The next simplest problem is \max_x c^T x , a^T x \geq 0. This either goes off to infinity away from the constraint plane, hits the constraint plane and goes off to infinity, or if c and a are parallel, it is an arbitrary location on the constraint plane.

In short, the very most simple convex problems have infinite answers. You actually need to have a fairly complex problem with many conditions before you can guarantee a finite answer. Once we have a bounded LP, or a positive definite quadratic problem do we start to guarantee boundedness.

In order to work with these problems, it is helpful (necessary?) to compactify your space. There are a couple options here. One is to arbitrarily make a box cutoff. If we limit ourselves to an arbitrary box of length 1e30, then every answer that came back as infinite before is now finite, albeit huge. This makes me queasy though. It is ad hoc, actually kind of annoying to program all the corner cases, and very likely to have numerical issues. Another possibility is to extend your space with rays. Rays are thought of as points at infinity. Now any optimization problem that has an infinite answer returns the ray in the direction the thing goes of to infinity at. It is also annoying to make every function work with either rays or points though.

Another slightly less bothersome aesthetic problem is that the natural representation of half spaces is a normal ray and offset a^T x \geq b The principles of duality make one want to make this object as similar to our representation of points as possible, but it has 1-extra dimension and 1 arbitrary degree of freedom (scalar multiplying a and b by the same positive constant does not change the geometrical half space described). This is ugly.

In the field of projective geometry, there is a very beautiful thing that arises. In projective geometry, all scalar multiples of a ray are considered the same thing. This ray is considered a “point”. The reason this makes sense is that projective geometry is a model of perspective and cameras. Two points are completely equivalent from the perspective of a pinhole camera if they lie on the same ray connecting to the pinhole. This ray continues inside the camera and hits the photographic screen. Hence points on the 2D screen correspond to rays in 3D space. It makes elegant sense to consider the pinhole to be the origin or your space, and hence you come to the previous abstract definition. Points at infinity in 3D (like stars effectively) are not a problem since they lie on finitely describable rays. Points at infinite edge of the 2D screen are not really a problem either. Perfectly reasonable points in 3D can map to the infinite edge of the screen in principle. Someone standing perfectly to the side of the pinhole in 3d space has a ray that goes perfectly horizontally into the camera, and in a sense would only hit a hypothetical infinite screen at infinity.

A great many wonderful (and practical!) things fall out of the projective homogenous coordinates. They are ubiquitous in computer graphics, computer vision, and robotics. The mathematical language describing translations and rotations is unified. Both can be described using a single matrix. This is not the intention, but it is a pleasant surprise. Other geometrical questions become simple questions of linear or vector algebra. It is very cool.

Can we use this method for describing the space we want to find convex sets in? I think not. Unfortunately, the topology of projective space is goofy. At the very least in 2D projective space, which can be thought of as a sphere with opposite points identified, do not necessarily have an inside and outside (I’m questioning this idea now)? So convex sets and talking about maximal half planes and such seems questionable.

But I think we can fix it. Cones are good. In a slight twist on the projective geometry idea, what if you only non negative multiples of rays \lambda \geq 0 as the same “point”. You can take as a canonical plane x_0 =1 similar to the pinhole camera. This plane can be thought of as your more ordinary affine space. Now half spaces touching the origin (cones) correspond to affine half spaces. We have a reasonable way of describing points at infinity on this plane, which correspond to rays. Arbitrary convex sets on this plane correspond to cones of rays.

Cones in this context are sets closed under arbitrary non-negative sums of points within them. Hence a cone always includes the origin. Cones are basically convex sets of rays.

By adding in an arbtrary-ish degree of freedom to points, we bring points and half spaces much closer in alignment. Now evaluating whether a point in a half space looks like a^T x \geq 0 with no ugly extra b.

So in closing, as convex sets are kind of a cleaner version of convex functions, so are convex cones a cleaner version of convex sets. This is actually useful, since when you’re programming, you’ll have to deal with way less corner infinite cases. The theory is also more symmetrical and beautiful

Lens as a Divisibility Relation: Goofin’ Off With the Algebra of Types

Types have an algebra very analogous to the algebra of ordinary numbers (video). This is the basic table of correspondences. Code with all the extensions available here.

One way to see that this makes sense is by counting the cardinality of types built out of these combinators. Unit is the type with 1 inhabitant. Void has 0 inhabitants. If a has n and b has m possible values, then Either a b has n + m inhabitants, (a,b) has n*m and there are n^m possible tabulations of a->b. We’re gonna stick to just polynomials for the rest of this, ignoring a->b.

Another way of looking at this is if two finitely inhabited types have the same number of inhabitants, then the types can be put into an isomorphism with each other. In other words, types modulo isomorphisms can be thought as representing the natural numbers. Because of this, we can build a curious proof system for the natural numbers using ordinary type manipulation.

In addition, we also get a natural way of expressing and manipulating polynomials.Polymorphic types can be seen as being very similar to polynomial expressions with natural coefficients N[x]. The polymorphic type variables have the ability to be instantiated to any value, corresponding to evaluating the polynomial for some number.

The Lens ecosystem gives some interesting combinators for manipulating this algebra. The type Iso' a b contains isomorphisms. Since we’re only considering types up to isomorphism, this Iso' represents equality. We can give identity isomorphisms, compose isomorphisms and reverse isomorphisms.

We can already form a very simple proof.

Now we’ll add some more combinators, basically the axioms that the types mod isos are a commutative semiring. Semirings have an addition and multiplication operator that distribute over each other. It is interesting to note that I believe all of these Iso' actually are guaranteed to be isomorphisms ( to . from = id and from . to = id ) because of parametricity. from and to are unique ignoring any issues with bottoms because the polymorphic type signature is so constraining. This is not usually guaranteed to be true in Haskell just from saying it is an Iso'. If I give you an Iso' Bool Bool it might actually be the iso (const True) (const True) for example, which is not an isomorphism.

There are also combinators for lifting isomorphisms into bifunctors: firsting, seconding, and bimapping. These are important for indexing into subexpressions of our types in a point-free style.

Here is a slightly more complicated proof now available to us.

We can attempt a more interesting and difficult proof. I developed this iteratively using . _ hole expressions so that GHC would tell me what I had manipulated my type to at that point in my proof.

Artwork Courtesy of David. Sorry for any motion sickness.

The proof here is actually pretty trivial and can be completely automated away. We’ll get to that later.

If Iso' is equality, what about the other members of the Lens family? Justin Le says that Lens&nbsp;s&nbsp;a are witness to the isomorphism of a type s to the tuple of something and a. Prism witness a similar thing for sums. Once we are only considering types mod isos, if you think about it, these are expressions of two familiar relations on the natural numbers: the inequality relation and the divisibility relation

Mathematically, these relations can be composed with equalities, just like in the lens hierarchy Lens and Prism can be composed with Iso. Both form a category, since they both have id and (.).

Here are a couple identities that we can’t derive from these basic combinators. There are probably others. Woah-ho, my bad. These are totally derivable using id_mul, id_plus, mul_zero, _1, _2, _Left, _Right.

Pretty neat! Random thoughts and questions before we get into the slog of automation:

  • Traversal is the “is polynomial in” relation, which seems a rather weak statement on it’s own.
  • Implementing automatic polynomial division is totally possible and interesting
  • What is the deal with infinite types like [a]? Fix. I suppose this is a theory of formal power series / combinatorial species. Fun combinatorics, generatingfunctionology. Brent Yorgey did his dissertation on this stuff. Wow. I’ve never really read this. It is way more relevant than I realized.
  • Multivariate polynomial algorithms would also be interesting to explore (Grobner basis, multivariate division)
  • Derivatives of types and zippers – Conor McBride
  • Negative Numbers and Fractions?
  • Lifting to rank-1 types. Define Negative and Fractions via Galois connection?

Edit: /u/carette (wonder who that is 😉 ) writes:

“You should dig into
J Carette, A Sabry Computing with semirings and weak rig groupoids, in Proceedings of ESOP 2016, p. 123-148. Agda code in A lot of the algebra you develop is there too.

If you hunt around in my repos, you’ll also find things about lenses, exploring some of the same things you mention here.”

Similar ideas taken further and with more sophistication. Very interesting. Check it out.


Our factor example above was quite painful, yet the theorem was exceedingly obvious by expansion of the left and right sides. Can we automate that? Yes we can!

Here’s the battle plan:

  • Distribute out all expressions like a*(b+c) so that all multiplication nodes appear at the bottom of the tree.
  • Reduce the expression by absorbing all stupid a*1, a*0, a+0 terms.
  • Reassociate everything to the right, giving a list like format
  • Sort the multiplicative terms by power of the variable

Once we have these operations, we’ll combine them into a canonical operation. From there, most equality proofs can be performed using the rewrite operation, which merely puts both sides into canonical form

Once we have those, the painful theorem above and harder ones becomes trivial.

Now we’ll build the Typeclasses necessary to achieve each of these aims in order. The Typeclass system is perfect for what we want to do, as it builds terms by inspecting types. It isn’t perfect in the sense that typeclass pattern matching needs to be tricked into doing what we need. I have traded in cleverness and elegance with verbosity.

In order to make our lives easier, we’ll need to tag every variable name with a newtype wrapper. Otherwise we won’t know when we’ve hit a leaf node that is a variable. I’ve used this trick before here in an early version of my faking Compiling to Categories series. These wrappers are easily automatically stripped.

A common pattern I exploit is to use a type family to drive complicated recursion. Closed type families allow more overlap and default patterns which is very useful for programming. However, type families do not carry values, so we need to flip flop between the typeclass and type family system to achieve our ends.

Here is the implementation of the distributor Dist. We make RDist and LDist typeclasses that make a sweep of the entire tree, using ldist and rdist as makes sense. It was convenient to separate these into two classes for my mental sanity. I am not convinced even now that I have every case. Then the master control class Dist runs these classes until any node that has a (*) in it has no nodes with (+) underneath, as checked by the HasPlus type family.

Next is the Absorb type class. It is arranged somewhat similarly to the above. Greedily absorb, and keep doing it until no absorptions are left. I think that works.