Hyperparameter Behavior in Reinforcement Learning

How do hyperparameters affect traditional RL algorithms?

This project is currently on hold until funding, or another way of running these models, becomes available.

Essentially we look at the benefits of using different optimizers in RL.


I started this project as a spinoff of my previous post. Since I was already writing so much TeX in MathJax, I decided to write this paper (which I was eventually going to do anyway) in LaTeX, then easily reuse the VPG explanation in my previous blog post (which is why it isn't finished yet).

I worked on the project heavily in February and have finished all the testing needed for the k3s Kubernetes cluster program to start running on Google Cloud Compute.

Why I can't finish the paper


Based on the initial testing, the project will take 5-8 weeks to complete with 6 asynchronous simulation and rollout servers. The estimated cost comes to around $8,000, which, as a HS student, I don't have in any capacity. I've applied to Google's Research Credits program and am hoping to gain this funding to start the cluster.


