Trajectory optimization for mobile robots using model predictive control

Received Dec 18, 2018
The goal of this article is trajectory generation for biped robots based on Model Predictive Control (MPC) and the receding-horizon principle. Specifically, we want to minimize the error between the desired and actual trajectories of the center of mass (CoM) and zero moment point (ZMP), and to cancel the shock gradient of the CoM and ZMP movements. MPC is a finite-horizon optimal control scheme that uses a prediction model to predict the response and future states of the system, thus minimizing the current error and optimizing the future trajectory within the prediction horizon. The proposed algorithm provides a trajectory of control inputs that optimizes the system states using a quadratic cost function similar to standard linear quadratic tracking. As is specific to finite-horizon control, the cost is summed over the finite prediction horizon rather than over an infinite time horizon. Many techniques have been proposed, developed, and applied to solve this constrained optimization problem for mobile robots. With our approach we investigate how the MPC framework can be applied to trajectory generation for point-to-point problems with a fixed final time, and we seek a set of assumptions and methods that allow real-time solutions.
Keywords: Model predictive control; Real time robot control; Mobile robot control; Constrained optimization problem; Linear quadratic optimal control


Introduction
Walking mobile robots, unlike other types of robots such as those with wheels or tracks, move over terrain using devices similar to human or animal feet. Several researchers have used switching techniques between the control laws needed at certain times during the motion of the robot. Ouyang et al. (2006) proposed an adaptive method to switch between different gain values used in trajectory tracking control for serial manipulators. Compliant movement control, which is essentially position-based force control, was suggested by Lawrence and Stoughton (1987) and by Kazerooni, Waibel, and Kim (1990). Salisbury (1980) presented a method for active control of the apparent stiffness of the robot end-effector in Cartesian space. In this method the reference position is used to control the contact force, and no force reference points are used. Khalil's method (Khalil et al., 1983) stands out among them for the advantages it offers.
The first one allows inverse kinematics problems to be solved regardless of the values of the robot's geometric characteristics, for robots with six degrees of mobility that have three rotational kinematic couplings on concurrent axes or three translational kinematic couplings. Because of its flexibility, and because it admits a solution to the inverse kinematics problem, this "decoupled" structure with three rotational couplings on concurrent axes is found in most robot models on the market. The position of the intersection point of the three axes is uniquely determined by the variables q1, q2, q3 alone. Another advantage of the decoupled structure is that positioning and orientation can be split and handled separately. Paul's method, as well as that of Lee and Elgazaar, treats each case separately without generalization.
In many robotic problems, it is desired to achieve a certain state of the robot at a given time. When the desired state is specified as a function of time over the entire motion, this leads to a reference tracking problem (Pop & Vladareanu et al., 2018). If the desired state is discontinuous in time, we need to make a prediction between points, that is, a continuous transfer of the robot from an initial state to the next state, during which the robot movement must satisfy certain imposed restrictions. This behavior is specified by an objective function.
Model predictive control is proposed to address this problem. It uses a model to predict the behavior of the system and minimizes the predicted tracking errors and the control effort, subject to restrictions on the control inputs and state variables, over a finite time horizon. At each sampling instant, an optimal control input sequence is generated by solving the minimization problem. The first element of this sequence is applied to the system. The problem is then solved again at the next sampling time with updated measurements and a shifted horizon. In this MPC formulation, a cost function is minimized to determine the optimal sequence of commands $u_k$ over the prediction horizon.
A trajectory generator must be able to determine the action $u(x,t)$ that satisfies a set of state constraints $x_C(t)$, or minimizes a cost function $F(x)$, or both, subject to the initial state constraints $x(t_0)$ and the predictive motion model $\dot{x} = f(x,u,t)$. This requires an objective function, a system model (the differential equation of motion), state restrictions, and a time horizon. Our goal is to use MPC for point-to-point trajectory generation and for tracking the reference signal (the desired trajectory).

Mathematical Formulation of the MPC
Starting from information (measurements) at time t, the controller predicts the dynamic behavior of the system over a prediction horizon $T_p = N_p \times T_c$, where $N_p$ is the number of pre-computed optimal inputs $U = \{u_1, \ldots, u_{N_p}\}$ and $T_c$ is the control sampling time. Only the first input $u_1$ is applied to the system, and at the next sampling period the measurements are repeated. If there are disturbances, then the whole process is repeated (this case is often encountered). Consider the following system of nonlinear differential equations describing the dynamic system:
$$\dot{x}(t) = f(x(t), u(t)), \qquad x(0) = x_0,$$
subject to $u(t) \in \mathcal{U}$ and $x(t) \in \mathcal{X}$, where $x(t)$ and $u(t)$ denote the vectors of states and inputs, respectively, $\mathcal{U} = \{u : u_{\min} \le u \le u_{\max}\}$ is the set of input constraints, and $\mathcal{X} = \{x : x_{\min} \le x \le x_{\max}\}$ is the set of state vector restrictions, with constant vectors $u_{\min}, u_{\max}$ and $x_{\min}, x_{\max}$.
The control input applied to the system is obtained by solving the following optimal control problem over a finite time horizon at each sampling instant: find
$$\min_{\hat{u}(\cdot)} J(x(t), \hat{u}(\cdot)).$$
For the sequential case, let $\hat{u}_k : [t_0, t_0 + T_p] \to \mathcal{U}$ be the control sequence from the current time $t_0$ to $t_0 + T_p$.
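The receding-horizon procedure described in this section can be sketched in a few lines. The example is only an illustration under strong assumptions that are not taken from the article: a hypothetical scalar linear plant x_{k+1} = a x_k + b u_k, illustrative weights q, r and p_final, and no input or state constraints, so that each finite-horizon problem can be solved by a backward Riccati sweep. The function names are ours.

```python
def first_optimal_input(x0, a, b, q, r, p_final, horizon):
    """Solve the N-step scalar LQ problem by a backward Riccati sweep
    and return only the first input of the optimal sequence."""
    p = p_final
    gain = 0.0
    for _ in range(horizon):
        gain = a * b * p / (r + b * b * p)    # feedback gain at this stage
        p = q + a * a * p - a * b * p * gain  # cost-to-go one stage earlier
    return -gain * x0                         # after the loop, gain belongs to stage 0


def run_receding_horizon(x_init, steps, a=1.2, b=1.0, q=1.0, r=0.1,
                         p_final=1.0, horizon=10, disturbance=None):
    """Closed loop: measure, re-solve over a shifted horizon, apply the first input."""
    x = x_init
    history = [x]
    for k in range(steps):
        u = first_optimal_input(x, a, b, q, r, p_final, horizon)
        d = disturbance(k) if disturbance else 0.0
        x = a * x + b * u + d   # assumed plant, with an optional unmodelled disturbance
        history.append(x)
    return history


# An open-loop unstable plant (a > 1) hit by a disturbance at step 3.
states = run_receding_horizon(5.0, 30, disturbance=lambda k: 0.5 if k == 3 else 0.0)
print(states[:6], states[-1])
```

Because the problem is re-solved from the measured state at every step, the disturbance injected at step 3 is rejected without any change to the controller. With constraints on inputs or states, the Riccati sweep would have to be replaced by a constrained quadratic programming solver.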

Dynamic model of walking robot approximated by 3D LIPM
The complete dynamic model of a humanoid robot is nonlinear and complex. To generate the movement of the biped robot, the nonlinear dynamic model is approximated by the 3D linear inverted pendulum model (3D-LIPM); this approximation is considered good enough, and its computational cost is low and convenient for online trajectory tracking. Assuming that the robot's CoM (center of mass) is restricted to move on a horizontal plane at a constant height h, a set of decoupled equations governing the CoM and the ZMP (zero moment point) results:
$$\ddot{x} = \frac{g}{h}\,x + \frac{1}{mh}\,\tau_y, \qquad \ddot{y} = \frac{g}{h}\,y - \frac{1}{mh}\,\tau_x, \tag{3.1}$$
where g is the acceleration due to gravity, m is the mass of the pendulum, and $\tau_x$ and $\tau_y$ are the moments around the x and y axes, respectively. Considering the 3D-LIPM with constant height h, the ZMP equations become
$$z_x = -\frac{\tau_y}{mg}, \qquad z_y = \frac{\tau_x}{mg}, \tag{3.2}$$
where $(z_x, z_y)$ are the coordinates of the ZMP on the flat floor. Substituting equations (3.2) into (3.1), we get the decoupled movement equations of the ZMP:
$$z_x = x - \frac{h}{g}\,\ddot{x}, \qquad z_y = y - \frac{h}{g}\,\ddot{y}. \tag{3.3}$$
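As a small numerical illustration of the ZMP relation for the 3D-LIPM, the sketch below assumes the standard form z = x - (h/g) x'' along one horizontal axis (the same expression is used for y). The function name, the CoM height of 0.8 m and the sample values are illustrative assumptions, not values from the article.

```python
G = 9.81  # gravitational acceleration [m/s^2]


def zmp_from_com(com_pos, com_acc, height, g=G):
    """ZMP coordinate along one horizontal axis for the LIPM,
    assuming z = x - (h / g) * x_ddot with constant CoM height h."""
    return com_pos - (height / g) * com_acc


# A CoM that is not accelerating lies directly above the ZMP.
print(zmp_from_com(0.10, 0.0, height=0.8))
# Accelerating the CoM forward shifts the ZMP backward.
print(zmp_from_com(0.10, 1.0, height=0.8))
```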

Discrete-time dynamic system of the walking robot: movement of the CoM and ZMP in the x direction (similarly for the y direction) for the 3D-LIPM
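We take the shock (jerk, that is, the third time derivative of position) in the x and y directions, $\dddot{x}$ and $\dddot{y}$, as the control input. The trajectories of the CoM and the ZMP are discretized as piecewise cubic polynomials determined by the shocks $\dddot{x}$ and $\dddot{y}$, which are assumed constant over each sample of length T:
$$X_{k+1} = A X_k + B U_k, \qquad z_{x,k} = C X_k, \tag{3.4}$$
where
$$X_k = \begin{bmatrix} x(kT) \\ \dot{x}(kT) \\ \ddot{x}(kT) \end{bmatrix}, \quad U_k = \dddot{x}(kT), \quad A = \begin{bmatrix} 1 & T & T^2/2 \\ 0 & 1 & T \\ 0 & 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} T^3/6 \\ T^2/2 \\ T \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 0 & -h/g \end{bmatrix},$$
and T is the length of the time sample. The y direction is treated similarly.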
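The discretization can be checked numerically. The sketch below assumes the usual constant-jerk model for one horizontal axis: the state is [position, velocity, acceleration] of the CoM, the input is the jerk held constant over a sample of length T, and the ZMP is read out as x - (h/g) x''. The function names, T = 0.1 s and h = 0.8 m are illustrative assumptions. The code is deliberately dependency-free; a linear algebra library would normally be used.

```python
G = 9.81  # gravitational acceleration [m/s^2]


def lipm_matrices(T, h, g=G):
    """Matrices of the assumed model X_{k+1} = A X_k + B u_k, z_k = C X_k."""
    A = [[1.0, T, T * T / 2.0],
         [0.0, 1.0, T],
         [0.0, 0.0, 1.0]]
    B = [T ** 3 / 6.0, T * T / 2.0, T]
    C = [1.0, 0.0, -h / g]
    return A, B, C


def step(state, jerk, A, B):
    """Advance the CoM state by one sample under a constant jerk."""
    return [sum(a_ij * s_j for a_ij, s_j in zip(row, state)) + b_i * jerk
            for row, b_i in zip(A, B)]


def zmp(state, C):
    """ZMP coordinate corresponding to a CoM state."""
    return sum(c_i * s_i for c_i, s_i in zip(C, state))


A, B, C = lipm_matrices(T=0.1, h=0.8)
state = [0.0, 0.0, 0.0]
for _ in range(10):              # 10 samples of 0.1 s, i.e. 1 s in total
    state = step(state, 6.0, A, B)
print(state, zmp(state, C))
```

Since the jerk is constant, the discrete steps reproduce the continuous cubic trajectory exactly at the sampling instants, which is why the model is convenient for online use.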

Linear Quadratic optimal control for Model Predictive Control
We define a quadratic cost function over a finite horizon of N steps:
$$J_0(X(0), U_0) = (X_N - X_N^d)^T P\,(X_N - X_N^d) + \sum_{k=0}^{N-1} \left[ (X_k - X_k^d)^T Q\,(X_k - X_k^d) + U_k^T R\, U_k \right],$$
where $X_k$ is the state vector at time k, obtained by starting from the state $X_0 = X(0)$ and applying to the system (3.4), written in matrix form as
$$X_{k+1} = A X_k + B U_k,$$
the input sequence $U_0 = [U_0^T, \ldots, U_{N-1}^T]^T$. We consider the optimal control problem
$$J_0^*(X(0)) = \min_{U_0} J_0(X(0), U_0) \quad \text{subject to } X_{k+1} = A X_k + B U_k,\ k = 0, \ldots, N-1,\ X_0 = X(0). \tag{3.8}$$
Here we assume that the matrices weighting the state variables are positive semidefinite, $Q = Q^T \ge 0$, $P = P^T \ge 0$, and that the matrix weighting the inputs is positive definite, $R = R^T > 0$. Solving the optimal control problem (3.8) minimizes the error between the desired and actual trajectories of the CoM and the ZMP. The penalty matrix Q penalizes the errors between the actual trajectory and the desired trajectory $\{X_k^d\}$. The penalty matrix P rebalances the robot at the final step N, and the penalty matrix R is used to cancel the gradient of the shock in the x direction. Problem (3.8) can be solved globally or recursively using dynamic programming.
Although our problem is linear, there are two reasons for choosing the second option. First, in the global approach the working matrices are large and the computational effort is greater. Second, in the trajectory tracking problem the system may be subject to unpredictable perturbations that are not captured by the model; the feedback law is then computed more accurately, because at each time step the observed state $X(k)$, rather than the state $X_k$ predicted at time t = 0, is used to determine the control action.
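The recursive (dynamic programming) solution can be sketched as a backward Riccati sweep followed by a forward pass that uses the observed state at each step. This is a minimal illustration, not the article's implementation: it assumes a single-input system (B is a column, R a scalar), a zero reference (the state is read as the tracking error), the constant-jerk matrices with T = 0.1 s, and illustrative weights. All names are ours, and the code avoids external libraries only to stay self-contained.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def quad(x, M):
    """Quadratic form x' M x for a vector x given as a flat list."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))


def riccati_sweep(A, B, Q, R, P_final, N):
    """Backward recursion P_k = Q + A'PA - A'PB (R + B'PB)^-1 B'PA.
    Returns the gains [K_0, ..., K_{N-1}] (flat lists) and P_0."""
    n = len(A)
    At, Bt = transpose(A), transpose(B)
    P, gains = P_final, []
    for _ in range(N):
        BtP = matmul(Bt, P)                      # 1 x n
        s = R + matmul(BtP, B)[0][0]             # scalar R + B'PB
        K = [v / s for v in matmul(BtP, A)[0]]   # gain of the current stage
        AtPA = matmul(At, matmul(P, A))
        AtPB = matmul(At, matmul(P, B))          # n x 1
        P = [[Q[i][j] + AtPA[i][j] - AtPB[i][0] * K[j] for j in range(n)]
             for i in range(n)]
        gains.append(K)
    gains.reverse()                              # computed from stage N-1 down to 0
    return gains, P


def closed_loop_cost(A, B, Q, R, P_final, gains, x0):
    """Apply U_k = -K_k X(k) to the observed state and accumulate the cost."""
    x, cost = list(x0), 0.0
    for K in gains:
        u = -sum(k_i * x_i for k_i, x_i in zip(K, x))
        cost += quad(x, Q) + R * u * u
        x = [sum(a * s for a, s in zip(row, x)) + b[0] * u for row, b in zip(A, B)]
    return cost + quad(x, P_final)


T = 0.1
A = [[1.0, T, T * T / 2], [0.0, 1.0, T], [0.0, 0.0, 1.0]]
B = [[T ** 3 / 6], [T * T / 2], [T]]
Q = [[1.0, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.01]]
P_final = [[10.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.1]]
R = 1e-4
x0 = [0.05, 0.0, 0.0]

gains, P0 = riccati_sweep(A, B, Q, R, P_final, N=20)
predicted = quad(x0, P0)                                       # optimal cost x0' P_0 x0
achieved = closed_loop_cost(A, B, Q, R, P_final, gains, x0)    # cost actually incurred
print(predicted, achieved)
```

The sweep only manipulates matrices of the size of the state, whereas the global approach stacks all N steps into one large system. In the disturbance-free case the two printed costs agree, which is a quick consistency check on the recursion.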

The difference between MPC and LQR
The predictive control problem, formulated as a problem of shaping the future trajectory, is solved similarly for dynamic systems with continuous or discrete time. Both types of problems are related to the classic linear quadratic regulator (LQR) when a sufficiently long prediction horizon is used. The basic difference between predictive control and LQR is that in predictive control the optimization problem is solved over a moving time horizon window, while in LQR the same problem is solved over a fixed time window. The advantage of a moving time window is the ability to perform real-time optimization with constraints on certain system variables. One well-known issue in classic predictive control is numerical instability when the prediction horizon is large, because the model used contains an integrator. To overcome this shortcoming, the design model must be asymptotically stable.
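The relation to LQR for long horizons can be observed on a toy problem. The sketch below assumes a hypothetical scalar plant x_{k+1} = a x_k + b u_k with illustrative weights and no constraints; it compares the first-stage gain of the N-step problem with the infinite-horizon LQR gain obtained from the scalar algebraic Riccati equation. The names and numbers are assumptions made for illustration.

```python
import math


def finite_horizon_gain(a, b, q, r, p_final, horizon):
    """First-stage feedback gain of the N-step scalar LQ problem."""
    p, gain = p_final, 0.0
    for _ in range(horizon):
        gain = a * b * p / (r + b * b * p)
        p = q + a * a * p - a * b * p * gain
    return gain


def lqr_gain(a, b, q, r):
    """Infinite-horizon gain from the positive root of the scalar Riccati
    equation b^2 P^2 + (r - q b^2 - a^2 r) P - q r = 0."""
    c1 = r - q * b * b - a * a * r
    p = (-c1 + math.sqrt(c1 * c1 + 4.0 * b * b * q * r)) / (2.0 * b * b)
    return a * b * p / (r + b * b * p)


a, b, q, r = 1.2, 1.0, 1.0, 0.1
k_inf = lqr_gain(a, b, q, r)
for n in (1, 2, 5, 20):
    k_n = finite_horizon_gain(a, b, q, r, p_final=0.0, horizon=n)
    print(n, k_n, abs(k_n - k_inf))   # horizon, gain, distance to the LQR gain
```

In this unconstrained setting the receding-horizon controller approaches the LQR law as the horizon grows; the two differ in practice once constraints are imposed, which LQR cannot handle directly.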

Conclusions
The development of control technology and software offers great possibilities for implementing advanced control algorithms. MPC is one of the most popular advanced techniques for industrial process applications; it can handle a wide variety of control constraints and can be used at different levels of the process. In this article we used MPC to minimize the error between the desired and actual CoM and ZMP trajectories and to cancel the shock of the CoM and ZMP movements.
Moustris and Tzafestas (Moustris & Tzafestas, 2010) used a fuzzy switch, outside the control loop, to switch from one reference trajectory to another for the motion control of a mobile robot. Nicolás and Sagüés (Nicolás & Sagüés et al., 2008) presented a switching control based on epipolar geometry, which switches between different images captured by a mobile robot in order to compute its trajectory to the target. These switching techniques and many others were used mainly to switch between reference values (Vladareanu V. et al., 2013, 2014) or between constant values for a certain control law (Wang H.B. et al., 2015; Sandru O.I. et al., 2013).
Figures 1.1 and 1.2 depict the basic principle of MPC.