Phaniteja S*¹, Parijat Dewangan*¹, Abhishek Sarkar¹, K. Madhava Krishna¹
General inverse kinematics (IK) solvers may not guarantee real-time control of the end effectors in external coordinates while also maintaining stability. This work addresses the problem by using reinforcement learning (RL) to learn an inverse kinematics solver for reachability tasks that ensures stability and self-collision avoidance while solving for the end effectors. We propose an actor-critic-based algorithm that operates over continuous action spaces and learns joint-space trajectories of stable configurations for solving inverse kinematics. Our approach is based on the idea of exploring the entire workspace and learning the best possible configurations. The proposed strategy was evaluated on the highly articulated upper body of a 27-degree-of-freedom (DoF) humanoid, learning multi-goal reachability tasks for both hands while maintaining stability in the double support phase. We show that the trained model was able to solve inverse kinematics for both hands, with the articulated torso contributing to both tasks.