Checking out the OpenAI Baselines

These are my notes on trying to edit the OpenAI baselines codebase to balance a cartpole from the down position. They are pretty scattered.

First I just run the built in examples to get a feel and try out deepq networks.

The PPO algorithm at the bottom is still the recommended one, I think. I got the pole kind of upright and it would balance for a couple seconds maybe. Work in progress. The ppo example has some funky batching going on that you need to reshape your observations for.

Some of these are ready to roll on anything you throw at them

They use python3.

pip3 install tensorflow-gpu

pip3 install cloudpickle

running a deepq cartpole example

checkout source

what is all this shit.

Has quickly suppressed exploration

has a clear upward trend. But the increase dies off at episode 300 or so. Learning rate decay?

took around 300 episodes to reach reward 200

Pretty good looking

The pong trainer is now called

sudo apt install python3-opencv

Hmm. Defaults to playing breakout actually. Highly exploratory at first. Still basically totally random after a couple mins

The buffer size is actually smaller than I’d expected. 10000? is where the learn function is defined.

Hmmm. The exploration fraction is the fraction of the total training time over which random play is annealed away.
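To make the exploration fraction concrete, here is a minimal sketch of a linear epsilon schedule. This is not the baselines code; the function name `linear_epsilon` and the default values are assumptions picked for illustration.

```python
def linear_epsilon(step, total_steps, exploration_fraction=0.1,
                   initial_eps=1.0, final_eps=0.02):
    """Probability of taking a random action at a given training step.

    Epsilon falls linearly from initial_eps to final_eps over the first
    exploration_fraction * total_steps steps, then stays at final_eps.
    All default values here are illustrative assumptions.
    """
    anneal_steps = exploration_fraction * total_steps
    progress = min(step / anneal_steps, 1.0)
    return initial_eps + progress * (final_eps - initial_eps)
```

With a larger exploration fraction the agent stays mostly random for longer, which would be consistent with a run still sitting at 90% random after a few minutes.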

Is it getting better? Maybe. Still at 90% random. reward ~0.2

After about 2 hours 13,000 episodes still at 32% exploration. reward ~ 2

How much is from reduction of randomness, vs improvement of q-network? I suppose I could make a graph of reward vs exploration % using current network


trying ppo2 now (this is apparently OpenAI's go-to method at the moment for a first attempt)

I don’t know what to attribute this to necessarily but it is very quickly outperforming the q learning

like in 30 seconds

clipfrac is probably how often the clipping bound gets hit

approxkl is the approximate Kullback-Leibler divergence
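As a rough illustration of those two diagnostics, here is a NumPy sketch. It assumes both are computed from the negative log-probabilities of the sampled actions under the old and the new policy. The function name `ppo_diagnostics` and the default `clip_range=0.2` are my own choices for the example, not identifiers taken from baselines.

```python
import numpy as np

def ppo_diagnostics(old_neglogp, new_neglogp, clip_range=0.2):
    """Return (approxkl, clipfrac) for a batch of sampled actions."""
    old_neglogp = np.asarray(old_neglogp, dtype=float)
    new_neglogp = np.asarray(new_neglogp, dtype=float)
    # Probability ratio pi_new(a|s) / pi_old(a|s).
    ratio = np.exp(old_neglogp - new_neglogp)
    # Second-order estimate of the KL divergence between old and new policy.
    approxkl = 0.5 * np.mean((new_neglogp - old_neglogp) ** 2)
    # Fraction of samples whose ratio falls outside [1 - clip, 1 + clip].
    clipfrac = np.mean(np.abs(ratio - 1.0) > clip_range)
    return approxkl, clipfrac
```

A clipfrac near zero means the policy barely moved during the update, while a large one means many samples hit the clipping bound.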

looking inside the networks,

pi is the move probability distribution layer

vf is the value function

The MlpPolicy class will be useful for lower dimensional tasks. That is the multilayer perceptron, using a bunch of fully connected layers

On a non-simulated system, lower the process count to 1 with the -np option on mpirun.

Or just follow the mujoco example which is using only 1 cpu

Ah. This whole time it was saving in the /tmp folder in a time stamped folder



without threshold stopping, the thing went absolutely insane. You need to box in the pole


trying to make the reward the square of the height, so that it more strongly emphasizes the total height achieved? Maybe only give reward for a new record height? + any height nearly all the way up

beefing up the force magnitude. I think it is a little too wimpy to get it over vertical
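A sketch of what that reward idea could look like, not the reward I actually ended up with. The class name `RecordHeightReward`, the `near_top` threshold, and the convention that theta = 0 means upright are all assumptions made for the example.

```python
import math

class RecordHeightReward:
    """Pay out only for new record heights, plus a bonus near the top."""

    def __init__(self, near_top=0.95):
        self.near_top = near_top  # hypothetical "nearly all the way up" cutoff
        self.best = 0.0

    def reset(self):
        # Call at the start of every episode.
        self.best = 0.0

    def __call__(self, theta):
        # Normalized tip height: 0 hanging straight down, 1 fully upright.
        h = 0.5 * (1.0 + math.cos(theta))
        reward = 0.0
        if h > self.best:
            # Squared height, so gains near the top are worth more.
            reward += h ** 2 - self.best ** 2
            self.best = h
        if h >= self.near_top:
            # Keep rewarding any time spent close to vertical.
            reward += h
        return reward
```

Swinging back down through heights already reached pays nothing, so the only way to keep collecting reward is to go higher or to stay near the top.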

Maybe lower episode end so it spends more time trying to get higher than trying to just stay in bounds

Wait, should I not add those vector wrapper guys?


Hey! Sort of kind of success… It gets it up there sometimes, but then it can’t really keep it up there… That’s odd.

Wow. This is not a good integrator. Under no force the pendulum is obviously gaining energy. Should that matter much? I switched around the ordering to get a leapfrog. Much better.
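A minimal sketch of the effect on a bare pendulum, not the gym cartpole code. It compares explicit Euler with the reordered update, which is semi-implicit (symplectic) Euler, a close cousin of leapfrog. The constants and function names are assumptions chosen for the example.

```python
import math

G, L, DT = 9.8, 1.0, 0.02  # gravity, pole length, time step (assumed values)

def accel(theta):
    # theta is measured from hanging straight down.
    return -(G / L) * math.sin(theta)

def explicit_euler(theta, omega):
    # Position and velocity are both advanced using the old state.
    return theta + DT * omega, omega + DT * accel(theta)

def semi_implicit_euler(theta, omega):
    # Velocity is updated first, then position uses the new velocity.
    omega = omega + DT * accel(theta)
    return theta + DT * omega, omega

def energy(theta, omega):
    # Energy per unit mass: kinetic plus potential.
    return 0.5 * (L * omega) ** 2 - G * L * math.cos(theta)

def energy_drift(step, n_steps=200, theta0=0.5):
    # Release from rest at theta0 with no applied force and track the energy.
    theta, omega = theta0, 0.0
    e0 = energy(theta, omega)
    for _ in range(n_steps):
        theta, omega = step(theta, omega)
    return energy(theta, omega) - e0
```

Comparing `energy_drift(explicit_euler)` with `energy_drift(semi_implicit_euler)` shows the first one steadily pumping energy into the pendulum while the second stays close to the starting value.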

A custom cartpole instance to work from the down position, edited to work with ppo2. To more closely match the mujoco example, I switched from discrete actions to a continuous choice. I was trying to shape the reward to keep it in bounds. We’re getting there maybe. Picking the starting position happens in reset. Most everything else is a straight copy from the gym env


A script to view resulting network






How I Ruined Today Reading Haskell Posts

Lot of gold here (comments included). Graphs are important and I haven’t yet grokked the FGL thing or other methods of handling them

Functional programming with graphs from haskell

A new methodology for Arduino Haskell bindings

Sharing in Haskell using Data.Reify.

Resources on String Diagrams, and Adjunctions, and Kan Extensions

I’ve been trying to figure out Kan Extensions

Ralf Hinze on Kan Extensions


But while doing that I went down a rabbit hole on String Diagrams

This post is the first one on String Diagrams that made sense to me.

I had seen this stuff before, but I hadn’t appreciated it until I saw what Haskell expressions it showed equivalence between. They are not obvious equivalences

This seems like a very useful video on this topic.

In summary, it is an excellent notation for talking about transformations of a long sequence of composed Functors F G H … into some other long sequence of Functors. The conversion of functors runs from top to bottom. The composition of functors proceeds left to right. F eta is the fmap of eta, and eta F is eta with the forall’ed type unified to be of the form F a.

Adjunctions L -| R are asymmetric between cups and caps. L is on the left in cups and on the right in caps. That’s what makes squiggles pull straightable

I think I have an interesting idea for a linear algebra library based on this stuff


John Baez and Mike Stay’s Rosetta Stone (A touch stone I keep returning to)

Dan Piponi gave a talk which is another touch stone of mine that I come back to again and again. There is a set of corresponding blog posts.

Other resources:

NCatLab article

John Baez hosted seminars



Dan Marsden’s Article

Marsden and Hinze have been collaborating

Stephen Diehl on Adjunctions


A Section From an old Oregon Programming Language Summer School (a rich set of resources)




Mike Stay is doing a very interesting series on Category Theory in Javascript. He uses contracts in place of types. That defeats one of the big points of types (static analysis), but it is still pretty cool



I think that about covers everything I know about.

Oh yeah, there is the whole Coecke and Abramsky categorical quantum mechanics stuff too.

Using the Purescript Servant Bridge

Alright, here is some garbage code. Hope it helps you, sorry if it confuses you.

Checkout the Counter example

that is where I pulled everything from. I ripped out just about everything fancy just so you and I can see the truly bare bones example.

It’s mostly boiler plate on the generator side.

He does a couple fancier things that you may want, like rearranging names and changing folders. Here is a more basic example

This goes in app/ if you’re using a stack template

Mostly things can just be the defaults. Look at the Counter Example for some more config stuff you can do.

You do need to separate out the user JSON API by itself. If you hand writeAPIModuleWithSettings an API that has the Raw endpoint serving index.html, it freaks out. Maybe there is a way to handle that, but that probably isn’t what you want anyhow.

The myTypes Sum Type you want to add to for every type that you want to export over to purescript. frontEndRoot is where the generated files will go.

The Proxy business is a bunch of typelevel programming boilerplate. So is the empty MyBridge type.

There is basically no content to this code.

You also need to add this app/PSGenerator.hs file to your cabal file.


Every time you want to run the generator, you need to run

stack exec psGenerator


This then will put an API file and a Data type file into your purescript source in frontend/src

Using the API is a touch annoying but correct. If you look at the generated signature

There are a lot of constraints you need to satisfy in the monad m in order to call this thing. You need a monad that is Reader-like for getting the SPSettings_, that can handle a possible AjaxError, and that is Aff-like. Woof.

It makes sense that you’d want to do all of this, but it is a burdensome mental overhead to get started.

Here’s some very basic source that shows how to at least get to the stage where you can log the resulting request. I’m just dumping the error like a bad boy.

Note that you have to install purescript-servant-support as it tells you when you run psGenerator. I’ve been using psc-package. It is often helpful to go in to the psc-package.json file and update to the latest package-set. Just a little tip.

You see that the ExceptT handles the AjaxError and the ReaderT supplies the settings, which uses the defaults + a baseURL to point the request to

The whole thing is run inside an Aff monad.


Here’s the basic servant code


Again, I started using the stack servant template, whose directory structure I’m complying with.


Edit: Some more comments: Purescript bridge is a separate project from servant-purescript. Purescript bridge will translate your Haskell types. Servant purescript writes your API calls. The two lines in the main of PSGenerator.hs do these separate tasks: the writeAPI line writes the API calls and writePSTypes writes the types.

If you want to transport a parametrized data type like (Maybe a) in the myTypes things, hand the type a special type from here

works like a charm



Downloading and Collecting Coursera videos

I like to watch and listen to my coursera videos on my commute. The app has download functionality but the quizzes and crap require your intervention. I need just a block of stuff so I can be hands free.

coursera-dl is a command line tool to download coursera content

basic usage is like so

coursera-dl -h is a help menu

you can pass in your username and password with -u and -p or setup a ~/.netrc file as described in the README

coursera-dl --list-courses -n

I think it should list courses by default honestly.

This downloads all mp4 videos

coursera-dl cloud-computing -n -f "mp4"


I then made a dirty script that will go through each week and concatenate the videos of that week hopefully in order into a single mp4 file. It is not a clean script. It will throw some errors and build some weird extra files, but it gets the job done. Run it in the course directory
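The original script is not reproduced here, so this is only a sketch of the same idea. It assumes that every immediate subdirectory of the course directory is one week, that sorting the mp4 paths by name gives the right order, and that ffmpeg is installed. The function names are made up for the example.

```python
import subprocess
from pathlib import Path

def week_videos(week_dir):
    """All mp4 files anywhere under one week directory, sorted by path."""
    return sorted(Path(week_dir).rglob("*.mp4"))

def concat_list_text(videos):
    """Contents of a list file for ffmpeg's concat demuxer.

    Assumes no file name contains a single quote.
    """
    return "".join(f"file '{Path(v).resolve()}'\n" for v in videos)

def concat_week(week_dir, out_file):
    """Join one week's videos into out_file without re-encoding."""
    videos = week_videos(week_dir)
    if not videos:
        return False
    list_file = Path(out_file).with_suffix(".txt")
    list_file.write_text(concat_list_text(videos))
    subprocess.run(
        ["ffmpeg", "-f", "concat", "-safe", "0",
         "-i", str(list_file), "-c", "copy", str(out_file)],
        check=True,
    )
    return True

def concat_course(course_dir="."):
    """Write one <week name>.mp4 per week into the course directory."""
    course = Path(course_dir)
    for week in sorted(p for p in course.iterdir() if p.is_dir()):
        concat_week(week, course / f"{week.name}.mp4")
```

Calling `concat_course()` from inside the course directory would leave one mp4 and one list file per week next to the week folders. Stream copying only works cleanly when all the lectures share the same codec settings.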





Deep Learning Coursera Notes


cross-entropy – negative expectation value of log(p) for the predicted probabilities p, averaged over the true labels.
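A small NumPy sketch of that definition for a single example. The function name and the `eps` clipping value are assumptions added to avoid log(0).

```python
import numpy as np

def cross_entropy(p_true, q_pred, eps=1e-12):
    """Cross-entropy of predicted distribution q_pred against p_true."""
    p = np.asarray(p_true, dtype=float)
    # Clip so that a predicted probability of exactly 0 does not give -inf.
    q = np.clip(np.asarray(q_pred, dtype=float), eps, 1.0)
    return -np.sum(p * np.log(q))
```

For a one-hot label this reduces to minus the log of the probability assigned to the correct class.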

initialization – randn for weights. Scale by sqrt(2 / input size) if using relu. See He. Avoids blow up
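A minimal sketch of He initialization, which scales standard normal samples by sqrt(2 / input size). The function name, the weight layout of (outputs, inputs), and the fixed default seed are assumptions for the example.

```python
import numpy as np

def he_init(n_in, n_out, rng=None):
    """Weight matrix of shape (n_out, n_in) for a relu layer."""
    rng = np.random.default_rng(0) if rng is None else rng
    # The variance of 2 / n_in keeps activations from growing or shrinking
    # as they pass through many relu layers.
    return rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in)
```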

epoch – one run through all data

mini-batch – break up data into chunks of about 1 GPU's worth. Worth trying different sizes to see

momentum – smooths gradients that are oscillating and lets consistent gradients build up

Adam – combined momentum and RMS prop. Works better often? 0.9 for beta1 and 0.999 for beta2 are common parameters.
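A sketch of a single Adam update, to show how the momentum and RMSprop pieces combine. The function name and the `lr` and `eps` defaults are assumptions; the beta values are the common ones noted above.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. t is the 1-based step count."""
    # Momentum part: running average of the gradient.
    m = beta1 * m + (1 - beta1) * grad
    # RMSprop part: running average of the squared gradient.
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction, which matters most during the first few steps.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

The caller keeps `m`, `v` and `t` between steps, starting from `m = v = 0` and `t = 1`.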


Hyperparameter search – random points use log scales for some things.
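A sketch of sampling on a log scale, for something like a learning rate. The function name, the fixed default seed, and the example range are assumptions.

```python
import numpy as np

def sample_log_uniform(low, high, size, rng=None):
    """Random values spread evenly across orders of magnitude."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Sample the exponent uniformly, then exponentiate.
    exponents = rng.uniform(np.log10(low), np.log10(high), size)
    return 10.0 ** exponents
```

With `sample_log_uniform(1e-4, 1e-1, 20)`, about a third of the samples land in each decade, instead of nearly all of them landing above 1e-2 as they would with a plain uniform draw.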

reevaluate your hyperparameters occasionally

batch normalization – adds a normalization and mean subtraction at every hidden layer. makes later neurons less susceptible to earlier changes
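A sketch of the training-time forward pass for batch normalization over one mini-batch. The function name and the `eps` default are assumptions. `gamma` and `beta` are the learned scale and shift applied after normalizing.

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-5):
    """Normalize pre-activations z of shape (batch, units)."""
    # Per-unit statistics over the mini-batch.
    mu = z.mean(axis=0)
    var = z.var(axis=0)
    # Mean subtraction and normalization.
    z_hat = (z - mu) / np.sqrt(var + eps)
    # Learned scale and shift.
    return gamma * z_hat + beta
```

At test time the batch statistics are usually replaced by running averages collected during training, which this sketch leaves out.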

tensorflow – variables – placeholder, make sessions, run a trainer

strategy – fit training set, dev set, test set, real world

use a better optimizer or a bigger network if not fitting the training set

use more data or regularize if not fitting the dev set

satisficing metric

add weight to really important examples

bias – performance on training set – human level is a good benchmark

error analysis – ceiling on performance. Find out how many of some kind of problem are happening to figure out what is worthwhile. Do it manually

reasonably robust to random errors in training set

build first fast, iterate fast

if you need to use a different distro from training set, use the real stuff mostly in your dev and test

Break up into train, dev, and train-dev sets, so that you can know if the problem is due to mismatch or due to overfitting

manually try to make training set more like dev set on problem cases. Maybe add noise or find more examples of the error prone thing

Transfer learning

Multi-task learning

end to end – use subproblems if you have data for subproblems


And… I can’t access the two last ones yet. Poo.



Fixing up some jekyll problems for jupyter

the jupyter jekyll plugin supposedly won’t work on github pages

jupyter nbconvert --to markdown jekyll_test.ipynb

To get latex (including the $ tags) to work on the minima layout I added

into an _includes/head.html



I added a pynb  directory and added the following into my _config file? Not sure this was necessary.


replace all




in the markdown file.

could also add syntax highlighting but maybe this is good enough.

Maker Faire NYC

My first Maker Faire. Saw some neat stuff. Good energy. Declan made a write up.

Things to investigate:

CircuitStudio / circuitmaker – circuit design software a la Fusion 360 business model from altium. Windows only? Don’t like that

OctoPart – electronic components footprints and BOM

Mesh – Little IoT buttons and sensors

Sam Zeloof – Smart ass kid making his own IC

Tindie – Can I buy useful boards from here? – Maybe some photogrammetry tips? Hmm. Not anything there yet another really smart kid doing machine learning

Makerlogic – A guy pushing FPGA education. Max10 based board. Not much there yet – 4G enabled Raspberry Pi shields – small form factor intel computers and accessories. Keep an eye out for update in realsense in October. – an interesting combo scratch and openscad javascript kind of thing. Also has physics engines?



Cellular Automata in Haskell

Ben recently tried this and I wanted to see if I could do it my way

I’ve seen this done (Bartosz?) before but I tried to do it without looking anything up.

The comonad is an interesting pattern to use. It automates the translation invariant nature of the cellular automata. This would also be useful for translationally invariant PDEs like the simple wave equation or others.

I used the laziness of Haskell to start with an infinite plane of zeros. Of course if you ever want to look at it, you need to pick a finite slice at the end using dtake