
kendama : Machine Learning Controls

CIS519 Final Project at UPenn

Fall 2016

Since we were all interested in robotics, my team decided that the focus of our final project for CIS519: Introduction to Machine Learning would be using reinforcement learning (RL) to teach a simulated robotic system simple control policies. We applied a series of algorithms to learn to swing up and balance an inverted pendulum on a cart in the OpenAI Gym, and then extended those algorithms to the more complex game of kendama (a cup-and-ball game).

Traditional controllers require a model of the system dynamics and expertise in controller tuning, which becomes impractical for systems with complex dynamics. RL, by contrast, can learn a model-free controller for a simple system and then be extended to a more complex one with little additional effort.
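To illustrate what "model-free" means in practice, here is a minimal interaction loop for the OpenAI Gym CartPole environment (a sketch using the classic Gym API of the time): the agent sees only observations and rewards, never the equations of motion. The random action is a placeholder for a learned policy, not our controller.

```python
import gym

# Minimal model-free interaction loop (classic pre-0.26 Gym API).
# The agent observes states and rewards only; no dynamics model is used.
env = gym.make('CartPole-v0')

for episode in range(5):
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()  # placeholder for a learned policy
        obs, reward, done, info = env.step(action)
        total_reward += reward
    print('episode', episode, 'return', total_reward)

env.close()
```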

[Figure: OpenAI Gym CartPole simulation]

[Figure: our kendama modification to the CartPole simulation]

The main categories of RL techniques are value iteration, policy iteration, and actor-critic methods. Value iteration repeatedly updates an estimate of the value of each (state, action) pair and derives the policy from those estimates. We investigated and compared multiple algorithms: finite-difference policy gradient, Q-learning, random search, policy iteration, and actor-critic.
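To make the value-based branch concrete, below is a minimal tabular Q-learning sketch on a discretized CartPole state (classic Gym API). The bin edges, learning rate, discount factor, and epsilon are illustrative assumptions, not the settings we used in the project.

```python
import numpy as np
import gym

# Tabular Q-learning on CartPole with a discretized state space.
# Bin counts and hyperparameters below are illustrative placeholders.
env = gym.make('CartPole-v0')
bins = [np.linspace(-2.4, 2.4, 9),    # cart position
        np.linspace(-3.0, 3.0, 9),    # cart velocity
        np.linspace(-0.21, 0.21, 9),  # pole angle
        np.linspace(-3.5, 3.5, 9)]    # pole angular velocity

def discretize(obs):
    return tuple(int(np.digitize(x, b)) for x, b in zip(obs, bins))

Q = np.zeros([len(b) + 1 for b in bins] + [env.action_space.n])
alpha, gamma, epsilon = 0.1, 0.99, 0.1

for episode in range(2000):
    s = discretize(env.reset())
    done = False
    while not done:
        # epsilon-greedy action selection
        a = env.action_space.sample() if np.random.rand() < epsilon else int(np.argmax(Q[s]))
        obs, r, done, _ = env.step(a)
        s2 = discretize(obs)
        # one-step Q-learning update toward the bootstrapped target
        target = r + (0.0 if done else gamma * np.max(Q[s2]))
        Q[s + (a,)] += alpha * (target - Q[s + (a,)])
        s = s2
```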

[Figure: policy gradient learning curve for kendama swing-up]

[Figure: policy gradient learning curve for kendama catch]

We found that our finite-difference policy gradient algorithm yielded both the fastest learning and the most robust policy. For the inverted-pendulum swing-up and balance problem, it succeeded in 86/100 trials using 1200 training iterations for swing-up and 300 iterations for balancing. We adapted the algorithm to kendama's expanded state space and observed success in 89/100 trials using 500 training iterations each for swing-up and catch.
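For reference, here is a hedged sketch of a finite-difference policy gradient update on CartPole: the parameters of a simple linear policy are perturbed in random directions, the resulting change in episode return is measured, and the parameters are nudged along the estimated gradient. The policy form, perturbation scale, step size, and rollout counts are illustrative assumptions, not the values from our project.

```python
import numpy as np
import gym

# Finite-difference policy gradient sketch on CartPole (classic Gym API).
env = gym.make('CartPole-v0')

def rollout(theta, max_steps=500):
    """Total reward of one episode under a linear threshold policy."""
    obs, total = env.reset(), 0.0
    for _ in range(max_steps):
        action = 1 if np.dot(theta, obs) > 0 else 0
        obs, reward, done, _ = env.step(action)
        total += reward
        if done:
            break
    return total

theta = np.zeros(4)
sigma, lr, n_perturb = 0.1, 0.02, 10  # illustrative hyperparameters

for iteration in range(200):
    # Estimate dJ/dtheta with symmetric finite differences along random directions.
    grad = np.zeros_like(theta)
    for _ in range(n_perturb):
        delta = sigma * np.random.randn(4)
        dJ = rollout(theta + delta) - rollout(theta - delta)
        grad += dJ * delta / (2 * sigma ** 2)
    theta += lr * grad / n_perturb
```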

[Figure: plots of x, r, and theta for a successful kendama catch, alongside a blurred image of that catch]

The policy gradient method solved both the cartpole and kendama problems with ease, without explicit compensation for the difference in dynamics. We were excited to see how readily a well-chosen RL algorithm generalizes to different problems.

