The goal of this project is to use reinforcement learning to train a neural network that reads physical state data from a simulated environment and controls the four motors of a quadcopter.
I implemented Proximal Policy Optimization (PPO) with a policy that produces four continuous outputs, each representing the power level of one motor. For a more detailed explanation of my process, please refer to the GitHub link above.
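A common way to get four continuous outputs from a PPO policy is a diagonal Gaussian: the network predicts one mean per motor, actions are sampled around those means, and the update maximizes PPO's clipped surrogate objective. The NumPy sketch below illustrates that idea under stated assumptions. It is not the code from the repository. The sigmoid squashing to a [0, 1] power range, the clip value of 0.2, and every function name here are hypothetical choices made for illustration.

```python
import numpy as np

NUM_MOTORS = 4  # one continuous output per quadcopter motor

def gaussian_log_prob(actions, mean, log_std):
    """Log-density of a diagonal Gaussian, summed over the motor dimension."""
    std = np.exp(log_std)
    per_dim = -0.5 * ((actions - mean) / std) ** 2 - log_std - 0.5 * np.log(2.0 * np.pi)
    return per_dim.sum(axis=-1)

def sample_motor_powers(mean, log_std, rng):
    """Sample raw Gaussian actions, then squash them into a (0, 1) power range.

    The sigmoid mapping is an assumption; the log-probability is computed on the
    raw action, and the squashing Jacobian cancels in the PPO probability ratio.
    """
    raw = mean + np.exp(log_std) * rng.standard_normal(mean.shape)
    power = 1.0 / (1.0 + np.exp(-raw))
    return raw, power

def ppo_clip_loss(new_log_prob, old_log_prob, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective (a quantity to minimize)."""
    ratio = np.exp(new_log_prob - old_log_prob)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))

# Usage: a batch of 8 states with placeholder policy outputs (mean 0, fixed log-std).
rng = np.random.default_rng(0)
batch = 8
mean = np.zeros((batch, NUM_MOTORS))
log_std = np.full(NUM_MOTORS, -0.5)
raw, power = sample_motor_powers(mean, log_std, rng)
old_lp = gaussian_log_prob(raw, mean, log_std)
adv = rng.standard_normal(batch)
# Before any update the new and old policies match, so the ratio is 1 everywhere.
loss = ppo_clip_loss(old_lp, old_lp, adv)
```

The clipping keeps the probability ratio between the new and old policy near 1, which limits how far a single batch of experience can move the policy. In a full implementation, `mean` and `log_std` would come from a trainable network and the loss would be differentiated with an autodiff library rather than computed in plain NumPy.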
My implementation did not solve the environment. Despite modifications including reward shaping, minibatch updates, and learning rate annealing, the network's performance did not improve over thousands of epochs. Even so, the project was an excellent learning experience, and I look forward to filling the gaps in my reinforcement learning knowledge before returning to it.