diff --git a/doc/images/gradientDescent1D01.png b/doc/images/gradientDescent1D01.png new file mode 100755 index 0000000000000000000000000000000000000000..d7061d1225b481c9387727ee41eba3ee5e5fe226 Binary files /dev/null and b/doc/images/gradientDescent1D01.png differ diff --git a/doc/images/gradientDescent_vs_NewtonMethod01.jpg b/doc/images/gradientDescent_vs_NewtonMethod01.jpg new file mode 100644 index 0000000000000000000000000000000000000000..8b8bd582068516c0db73405c2852cab9a00a3a20 Binary files /dev/null and b/doc/images/gradientDescent_vs_NewtonMethod01.jpg differ diff --git a/doc/images/iLQR_objectBoundaries01.jpg b/doc/images/iLQR_objectBoundaries01.jpg new file mode 100644 index 0000000000000000000000000000000000000000..f8b031f7ac6ed1ba32e400470684ef7ee89ba3fe Binary files /dev/null and b/doc/images/iLQR_objectBoundaries01.jpg differ diff --git a/doc/images/iLQR_objectBoundaries01.png b/doc/images/iLQR_objectBoundaries01.png deleted file mode 100755 index 8b855c268999f7291ccc7a204b30235ace41b1fa..0000000000000000000000000000000000000000 Binary files a/doc/images/iLQR_objectBoundaries01.png and /dev/null differ diff --git a/doc/images/transformations01.jpg b/doc/images/transformations01.jpg new file mode 100644 index 0000000000000000000000000000000000000000..787ee1b128f376838fe23408e489eca0c1491109 Binary files /dev/null and b/doc/images/transformations01.jpg differ diff --git a/doc/images/transformations01.png b/doc/images/transformations01.png deleted file mode 100755 index f408379b082d37ff57e41febfcaa284d31448ccb..0000000000000000000000000000000000000000 Binary files a/doc/images/transformations01.png and /dev/null differ diff --git a/doc/rcfs.pdf b/doc/rcfs.pdf index 3e34e0ee14b7f55711941b31da3520780957987e..94dba56647bd69e46037b70fc7f2f7656d0c5f03 100644 Binary files a/doc/rcfs.pdf and b/doc/rcfs.pdf differ diff --git a/doc/rcfs.tex b/doc/rcfs.tex index 8ad68ae82b7e22201b6fa7bfc858f3a4df1a3a53..f3005c5ab0b180f4adad7b03608b93ecd02acb20 100644 --- a/doc/rcfs.tex +++ b/doc/rcfs.tex @@ -207,7 +207,7 @@ Figure \ref{fig:PoG} shows an illustration for 2 Gaussians in a 2-dimensional sp \newpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\section{Newton's method for minimization}\label{sec:Newton} +\section{Cost function minimization problems}\label{sec:Newton} \begin{wrapfigure}{r}{.28\textwidth} %\vspace{-20pt} @@ -220,9 +220,77 @@ Problem formulation. \vspace{20pt} \end{wrapfigure} -We would like the find the value of a decision variable $x$ that would give us a cost $c(x)$ that is a small as possible, see Figure \ref{fig:NewtonProblem}. Imagine that we start from an initial guess $x_1$ and that can observe the behavior of this cost function within a very small region around our initial guess. Now let's assume that we can make several consecutive guesses that will each time provide us with similar local information about the behavior of the cost around the points that we guessed. From this information, what point would you select as second guess (see question marks in the figure), based on the information that you obtained from the first guess? There were two relevant information in the small portion of curve that we can observe in the figure to make a smart choice. First, the trend of the curve indicates that the cost seems to decrease if we move on the left side, and increase if we move on the right side. Namely, the slope $c'(x_1)$ of the function $c(x)$ at point $x_1$ is positive. 
Second, we can observe that the portion of the curve has some curvature that can be also be informative about the way the trend of the curve $c(x)$ will change by moving to the left or to the right. Namely, how much the slope $c'(x_1)$ will change, corresponding to an acceleration $c''(x_1)$ around our first guess $x_1$. This is informative to estimate how much we should move to the left of the first guess to wisely select a second guess. \newline
+We would like to find the value of a decision variable $x$ that would give us a cost $c(x)$ that is as small as possible, see Figure \ref{fig:NewtonProblem}. Imagine that we start from an initial guess $x_1$ and that we can observe the behavior of this cost function within a very small region around our initial guess. Now let's assume that we can make several consecutive guesses that will each time provide us with similar local information about the behavior of the cost around the points that we guessed. From this information, what point would you select as a second guess (see question marks in the figure), based on the information that you obtained from the first guess?
-\begin{wrapfigure}{r}{.48\textwidth}
+There are two relevant pieces of information in the small portion of the curve visible in the figure that help us make a smart choice. First, the trend of the curve indicates that the cost seems to decrease if we move to the left, and to increase if we move to the right. Namely, the slope $c'(x_1)$ of the function $c(x)$ at point $x_1$ is positive. Second, we can observe that this portion of the curve has some curvature, which is also informative about the way the trend of the curve $c(x)$ will change by moving to the left or to the right. Namely, it indicates how much the slope $c'(x_1)$ will change, corresponding to an acceleration $c''(x_1)$ around our first guess $x_1$. This helps us estimate how far to the left of the first guess we should move to wisely select a second guess. \newline
+
+Now that we have this intuition, we can move to a more formal problem formulation. Iterative optimization methods attempt to solve $\min_x c(x)$ or $\max_x c(x)$ from an initial guess $x_1$ by using a sequence of \textbf{first-order Taylor approximations} of $c(x)$ around the iterates (gradient descent) or \textbf{second-order Taylor approximations} (Newton's method).
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\subsection{Gradient descent}
+
+\begin{algorithm}
+\caption{Backtracking line search method with parameter $\alpha_{\min}$ (presented here for decision variable $\bm{x}$)}
+\label{alg:linesearch}
+$\alpha \gets 1$ \\
+%\While{$c(\bm{\hat{u}}+\alpha\;\Delta\bm{\hat{u}}) > c(\bm{\hat{u}}) \;\textbf{and}\;\; \alpha > \alpha_{\min}$}{
+\While{$c(\bm{x}+\alpha\;\Delta\bm{x}) > c(\bm{x}) \;\textbf{and}\;\; \alpha > \alpha_{\min}$}{
+ $\alpha \gets \frac{\alpha}{2}$
+}
+\end{algorithm}
+
+Figure \ref{fig:gradientDescent} shows how consecutive \textbf{first-order Taylor approximations} of $c(x)$ around the iterates allow us to solve a minimization problem.
+
+The first-order Taylor expansion around the point $x_k$ can be expressed as
+\begin{equation*}
+ c(x_k\!+\!\Delta x_k) \approx c(x_k) + c'(x_k) \; \Delta x_k,
+\end{equation*}
+where $c'(x_k)$ is the derivative of $c(x)$ at point $x_k$.
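+
+As a quick numerical illustration, the following minimal Python sketch (with an arbitrary example cost chosen here for illustration, not taken from the accompanying source codes) checks how this local approximation behaves as $\Delta x_k$ shrinks:
+\begin{verbatim}
+# Arbitrary example cost and its analytic derivative (illustration only)
+c  = lambda x: x**2 + x**4
+dc = lambda x: 2*x + 4*x**3
+
+xk = 1.0
+for dx in (0.5, 0.1, 0.01):
+    taylor = c(xk) + dc(xk) * dx   # first-order approximation c(xk) + c'(xk) dx
+    print(dx, c(xk + dx), taylor)  # the gap shrinks as dx becomes smaller
+\end{verbatim}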
+
+\begin{wrapfigure}{r}{.42\textwidth}
+\centering
+\includegraphics[width=.36\textwidth]{images/gradientDescent1D01.png}
+\caption{\footnotesize
+Gradient descent for minimization, starting from an initial estimate $x_1$ and converging to a local minimum (red point) after 8 iterations.
+}
+\label{fig:gradientDescent}
+\end{wrapfigure}
+
+By starting from a point $x_k$ at each step $k$, we are interested in applying a correction $\Delta x_k$ that would decrease the cost.
+If the cost function follows a linear trend at point $x_k$, with a slope defined by its derivative $c'(x_k)$, one direction would increase the cost while the other would decrease it. Thus, by applying a correction $\Delta x_k = -\alpha c'(x_k)$, where $\alpha$ is a positive scaling factor, we go down the slope estimated at $x_k$.
+
+The scaling factor $\alpha$ can be either constant or variable. If $\alpha$ is too large, there is a risk that our local linear approximation is no longer valid when we move far away from $x_k$. If it is too small, the iterative algorithm will require many iterations to converge to a local minimum of the cost function.
+
+In practice, a simple backtracking line search procedure can be used, as shown in Algorithm \ref{alg:linesearch}, with a small value for $\alpha_{\min}$. For more elaborate methods, see Ch.~3 of \cite{Nocedal06}. A code sketch combining this line search with both the gradient descent update and the Newton update is given at the end of the next subsection.
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\subsubsection*{Multidimensional case}
+
+For functions that depend on multiple variables stored as multidimensional vectors $\bm{x}$, the cost function $c(\bm{x})$ can similarly be approximated by a first-order Taylor expansion around the point $\bm{x}_k$ with
+%\begin{equation*}
+% c(\bm{x}_k\!+\!\Delta\bm{x}_k) \approx c(\bm{x}_k) + \Delta\bm{x}_k^\trsp \, \frac{\partial c}{\partial\bm{x}}\Big|_{\bm{x}_k},
+%\end{equation*}
+%which can also be rewritten in vector form as
+\begin{equation*}
+ c(\bm{x}_k\!+\!\Delta\bm{x}_k) \approx c(\bm{x}_k) + {\bm{g}(\bm{x}_k)}^\trsp \Delta\bm{x}_k,
+\end{equation*}
+with gradient vector
+\begin{equation*}
+ \bm{g}(\bm{x}_k) = \frac{\partial c}{\partial\bm{x}}\Big|_{\bm{x}_k}.
+\end{equation*}
+
+By starting from an initial estimate $\bm{x}_1$ and by recursively refining the current estimate by following the gradient, we obtain at each iteration $k$ the recursion
+\begin{equation*}
+ \bm{x}_{k+1} = \bm{x}_k - \alpha \bm{g}(\bm{x}_k).
+\end{equation*}
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\subsection{Newton's method}
+
+\begin{wrapfigure}{r}{.46\textwidth}
%\vspace{-20pt}
\centering
\includegraphics[width=.42\textwidth]{images/NewtonMethod1D01.png}
@@ -232,7 +300,7 @@ Newton's method for minimization, starting from an initial estimate $x_1$ and co
\label{fig:Newton}
\end{wrapfigure}
-Now that we have this intuition, we can move to a more formal problem formulation. Newton's method attempts to solve $\min_x c(x)$ or $\max_x c(x)$ from an initial guess $x_1$ by using a sequence of \textbf{second-order Taylor approximations} of $c(x)$ around the iterates, see Fig.~\ref{fig:Newton}.
+Figure \ref{fig:Newton} shows how consecutive \textbf{second-order Taylor approximations} of $c(x)$ around the iterates allow us to solve a minimization problem.
The second-order Taylor expansion around the point $x_k$ can be expressed as
\begin{equation}
@@ -243,20 +311,9 @@ where $c'(x_k)$ and $c''(x_k)$ are the first and second derivatives at point $x_
We are interested in solving minimization problems with this approximation.
If the second derivative $c''(x_k)$ is positive, the quadratic approximation is a convex function of $\Delta x_k$, and its minimum can be found by setting the derivative to zero.
-\newpage
-
-\begin{wrapfigure}{r}{.28\textwidth}
-\vspace{-20pt}
-\includegraphics[width=.26\textwidth]{images/optim_principle01.png}
-\caption{\footnotesize
-Finding local optima by localizing the points whose derivatives are zero (horizontal slopes).
-}
-\label{fig:optimPrinciple}
-\end{wrapfigure}
-
-To find the local optima of a function, we can localize the points whose derivatives are zero, see Figure \ref{fig:optimPrinciple} for an illustration.
+Indeed, to find the local optima of a function, we can localize the points whose derivatives are zero (horizontal slopes), see Figure \ref{fig:optimPrinciple} for an illustration.
-Thus, by differentiating \eqref{eq:Taylor_1D} w.r.t.\ $\Delta x_k$ and equating to zero, we obtain
+By differentiating \eqref{eq:Taylor_1D} w.r.t.\ $\Delta x_k$ and equating to zero, we then obtain
\begin{equation*}
c'(x_k) + c''(x_k) \, \Delta x_k = 0,
\end{equation*}
@@ -266,29 +323,40 @@ meaning that the minimum is given by
\end{equation*}
which corresponds to the offset to apply to $x_k$ to minimize the second-order polynomial approximation of the cost at this point.
-\begin{wrapfigure}{r}{.28\textwidth}
+\begin{wrapfigure}{r}{.32\textwidth}
%\vspace{-20pt}
\centering
-\includegraphics[width=.26\textwidth]{images/NewtonMethod_negativeHessian01.png}
+\includegraphics[width=.24\textwidth]{images/optim_principle01.png}
\caption{\footnotesize
-Newton update that would be achieved when the second derivative is negative.
+Finding local optima.
}
-\label{fig:NewtonNegativeHessian}
-%\vspace{-20pt}
+\label{fig:optimPrinciple}
\end{wrapfigure}
-It is important that $c''(x_k)$ is positive if we want to find local minima, see Figure \ref{fig:NewtonNegativeHessian} for an illustration.
-
-
By starting from an initial estimate $x_1$ and by recursively refining the current estimate by computing the offset that would minimize the polynomial approximation of the cost at the current estimate, we obtain at each iteration $k$ the recursion
\begin{equation}
x_{k+1} = x_k - \frac{c'(x_k)}{c''(x_k)}.
\label{eq:Taylor_1D_update}
\end{equation}
+\begin{wrapfigure}{r}{.42\textwidth}
+%\vspace{-20pt}
+\centering
+\includegraphics[width=.26\textwidth]{images/NewtonMethod_negativeHessian01.png}
+\caption{\footnotesize
+Newton update that would be achieved when the second derivative is negative.
+}
+\label{fig:NewtonNegativeHessian}
+%\vspace{-20pt}
+\end{wrapfigure}
+
+It is important that $c''(x_k)$ is positive at each iteration, as we want to find local minima, see Figure \ref{fig:NewtonNegativeHessian} for an illustration. If $c''(x_k)$ is negative or too close to 0, it is in practice replaced by a small positive value in \eqref{eq:Taylor_1D_update}.
+
The geometric interpretation of Newton's method is that at each iteration, it amounts to the fitting of a paraboloid to the surface of $c(x)$ at $x_k$, having the same slopes and curvature as the surface at that point, and then proceeding to the maximum or minimum of that paraboloid. Note that if $c(x)$ is a quadratic function, then the exact extremum is found in one step, which corresponds to the resolution of a least-squares problem.
%Note also that Newton's method is often modified to include a step size (e.g., estimated with line search).
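+
+To make the two recursions concrete, the following minimal Python sketch (with an arbitrary convex example cost, assumed here for illustration only) compares gradient descent and Newton updates, both combined with the backtracking line search of Algorithm \ref{alg:linesearch}:
+\begin{verbatim}
+c   = lambda x: x**2 + x**4    # arbitrary convex example cost
+dc  = lambda x: 2*x + 4*x**3   # c'(x)
+ddc = lambda x: 2 + 12*x**2    # c''(x), positive everywhere for this cost
+
+def line_search(x, dx, alpha_min=1e-4):
+    alpha = 1.0                # halve alpha until the cost decreases
+    while c(x + alpha * dx) > c(x) and alpha > alpha_min:
+        alpha *= 0.5
+    return alpha
+
+x_gd = x_nt = 1.0              # same initial guess for both methods
+for _ in range(10):
+    dx = -dc(x_gd)                          # gradient descent direction
+    x_gd += line_search(x_gd, dx) * dx
+    dx = -dc(x_nt) / max(ddc(x_nt), 1e-8)   # Newton step, curvature kept positive
+    x_nt += line_search(x_nt, dx) * dx
+print(x_gd, x_nt)              # both approach the minimum at x = 0
+\end{verbatim}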
+As for gradient descent, a simple backtracking line search procedure (Algorithm \ref{alg:linesearch}) can also be used in practice, as illustrated in the sketch above.
+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection*{Multidimensional case}
@@ -350,6 +418,19 @@ By starting from an initial estimate $\bm{x}_1$ and by recursively refining the
\label{eq:Taylor_nD_update}
\end{equation}
+In practice, we need to verify that the Hessian matrix $\bm{H}(\bm{x}_k)$ is positive definite at each iteration.
+
+\begin{figure}
+\centering
+\includegraphics[width=.6\columnwidth]{images/gradientDescent_vs_NewtonMethod01.jpg}
+\caption{\footnotesize
+Convergence of gradient descent and Newton's method to solve minimization problems.
+}
+\label{fig:gradientDescent_vs_NewtonMethod}
+\end{figure}
+
+Figure \ref{fig:gradientDescent_vs_NewtonMethod} shows the iteration steps taken by gradient descent and Newton's method to solve two minimization problems. The problem in the first row has a quadratic cost function, for which Newton's method converges in a single iteration, while gradient descent requires multiple oscillatory steps to reach the minimum.
+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Gauss--Newton algorithm}\label{sec:GaussNewton}
@@ -372,12 +453,21 @@ The update rule in \eqref{eq:Taylor_nD_update} then becomes
\end{align}
where $\bm{J}^\psin$ denotes the pseudoinverse of $\bm{J}$.
-The Gauss--Newton algorithm is the workhorse of many robotics problems, including inverse kinematics and optimal control, as we will see in the next sections.
+For comparison, the corresponding gradient descent update would be computed as
+\begin{equation*}
+ \bm{x}_{k+1} = \bm{x}_k - \alpha \, \bm{J}^\trsp(\bm{x}_k) \, \bm{f}(\bm{x}_k).
+\end{equation*}
+
+Thus, for costs that can be expressed as $c(\bm{x})=\bm{f}(\bm{x})^\trsp\bm{f}(\bm{x})$, the difference between first-order and second-order optimization is that the latter transforms the gradient of the former by the inverse of the Hessian matrix approximated as $\bm{H}(\bm{x}_k)=\bm{J}^\trsp(\bm{x}_k) \bm{J}(\bm{x}_k)$. Another way to describe this difference is to view the former as an approximation of the latter with a Hessian matrix set to identity (i.e., $\bm{H}=\bm{I}$).
+
+The Gauss--Newton algorithm is the workhorse of many robotics problems, including inverse kinematics and optimal control, as we will see later in the document.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Least squares}\label{sec:LS}
+We show in this section the direct link between the Gauss--Newton algorithm and least squares. Extensions of least squares are also presented, which can be used directly in optimization problems.
+
When the cost $c(\bm{x})$ is a quadratic function w.r.t.\ $\bm{x}$, the optimization problem can be solved directly, without requiring iterative steps. Indeed, for any matrix $\bm{A}$ and vector $\bm{b}$, we can see that if
\begin{equation}
c(\bm{x}) = \|\bm{A}\bm{x}-\bm{b}\|^2 = (\bm{A}\bm{x}-\bm{b})^\trsp (\bm{A}\bm{x}-\bm{b}),
@@ -400,21 +490,21 @@ In contrast to the Gauss--Newton algorithm presented in Section \ref{sec:GaussNe
\end{align*}
-which converges in a single iteration, independently of the initial guess $\bm{x}_0$. Indeed, a cost that takes a quadratic form with respect to the decision variable can be soled in batch form (least squares solution), which will be a useful property that we will exploit later in the context of linear quadratic controllers.
+which converges in a single iteration, independently of the initial guess $\bm{x}_0$. Indeed, a cost that takes a quadratic form with respect to the decision variable can be solved in batch form (least squares solution), which will be a useful property that we will exploit later in the context of linear quadratic controllers.
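+
+The closed-form solutions above can be illustrated with a few lines of Python. The sketch below (with random example data, assumed here purely for illustration) computes the plain least squares estimate together with the regularized and weighted variants derived in the following paragraphs:
+\begin{verbatim}
+import numpy as np
+
+np.random.seed(0)
+A = np.random.randn(10, 3)       # arbitrary example data (illustration only)
+b = np.random.randn(10)
+W = np.diag(np.random.rand(10))  # example weight matrix, weighted variant
+alpha = 1e-2                     # small regularization coefficient
+
+x_ls    = np.linalg.solve(A.T @ A, A.T @ b)                      # least squares
+x_ridge = np.linalg.solve(A.T @ A + alpha * np.eye(3), A.T @ b)  # ridge regression
+x_wls   = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)              # weighted least squares
+print(np.allclose(x_ls, np.linalg.pinv(A) @ b))                  # pinv: same solution
+\end{verbatim}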
-A ridge regression problem can similarly be defined as
+A \textbf{ridge regression} problem can similarly be defined as
\begin{align}
\bm{\hat{x}} &= \arg\min_{\bm{x}} \; \|\bm{A}\bm{x}-\bm{b}\|^2 + \alpha \|\bm{x}\|^2 \nonumber\\
&= (\bm{A}^\trsp \bm{A} + \alpha\bm{I})^{-1} \bm{A}^\trsp \bm{b},
\label{eq:regLS}
\end{align}
-which is also called penalized least squares, robust regression, damped least squares or Tikhonov regularization. In \eqref{eq:regLS}, $\bm{I}$ denotes an identity matrix (diagonal matrix with 1 as elements in the diagonal). $\alpha$ is typically a small scalar value acting as a regularization term when inverting the matrix $\bm{A}^\trsp \bm{A}$.
+which is also called \textbf{penalized least squares}, \textbf{robust regression}, \textbf{damped least squares} or \textbf{Tikhonov regularization}. In \eqref{eq:regLS}, $\bm{I}$ denotes an identity matrix (a diagonal matrix with ones on the diagonal). $\alpha$ is typically a small scalar value acting as a regularization term when inverting the matrix $\bm{A}^\trsp \bm{A}$.
-The cost can also be weighted by a matrix $\bm{W}$, providing the weighted least squares solution
+The cost can also be weighted by a matrix $\bm{W}$, providing the \textbf{weighted least squares} solution
\begin{align}
\bm{\hat{x}} &= \arg\min_{\bm{x}} \; \|\bm{A}\bm{x}-\bm{b}\|^2_{\bm{W}} \nonumber\\
&= \arg\min_{\bm{x}} \; (\bm{A}\bm{x}-\bm{b})^\trsp \bm{W} (\bm{A}\bm{x}-\bm{b}) \nonumber\\
&= (\bm{A}^\trsp \bm{W} \bm{A})^{-1} \bm{A}^\trsp \bm{W} \bm{b}.
\label{eq:WLS}
\end{align}
-By combining \eqref{eq:regLS} and \eqref{eq:WLS}, a weighted ridge regression problem can be defined as
+By combining \eqref{eq:regLS} and \eqref{eq:WLS}, a \textbf{weighted ridge regression} problem can be defined as
\begin{align}
\bm{\hat{x}} &= \arg\min_{\bm{x}} \; \|\bm{A}\bm{x}-\bm{b}\|^2_{\bm{W}} + \alpha \|\bm{x}\|^2 \nonumber\\
&= (\bm{A}^\trsp \bm{W} \bm{A} + \alpha\bm{I})^{-1} \bm{A}^\trsp \bm{W} \bm{b}.
\label{eq:regWLS}
\end{align}
@@ -573,7 +663,7 @@ f = np.array([L @ np.diag(l) @ np.cos(L @ x), L @ np.diag(l) @ np.sin(L @ x)]) #
\begin{figure}
\centering
-\includegraphics[width=.5\columnwidth]{images/transformations01.png}
+\includegraphics[width=.5\columnwidth]{images/transformations01.jpg}
\caption{\footnotesize
Typical transformations involved in a manipulation task involving a robot, a vision system, a visual marker on the object, and a desired grasping location on the object.
}
@@ -2232,15 +2322,6 @@ from $t=2$ to $t=T-1$.
\filename{iLQR\_manipulator.*}
\end{flushright}
-\begin{algorithm}
-\caption{Backtracking line search method with parameter $\alpha_{\min}$}
-\label{alg:linesearch}
-$\alpha \gets 1$ \\
-\While{$c(\bm{\hat{u}}+\alpha\;\Delta\bm{\hat{u}}) > c(\bm{\hat{u}}) \;\textbf{and}\;\; \alpha > \alpha_{\min}$}{
- $\alpha \gets \frac{\alpha}{2}$
-}
-\end{algorithm}
-
\begin{algorithm}
\caption{Batch formulation of iLQR}
\label{alg:iLQRbatch}
@@ -2287,7 +2368,7 @@ Initialize all $\bm{\hat{u}}_t$ \\
Use \eqref{eq:iLQRrecursiveController} and $\bm{d}(\cdot)$ for reproduction \\
\end{algorithm}
-To be more efficient, iLQR most often requires at each iteration to estimate a step size $\alpha$ to scale the control command updates.
+iLQR typically requires a step size $\alpha$ to be estimated at each iteration to scale the control command updates.
For the batch formulation in Section \ref{sec:iLQRbatch}, this can be achieved by setting the update \eqref{eq:du_general} as
\begin{equation}
\bm{\hat{u}} \leftarrow \bm{\hat{u}} + \alpha \; \Delta\bm{\hat{u}},
@@ -2301,14 +2382,14 @@ For the recursive formulation in Section \ref{sec:iLQRrecursive}, this can be ac
\label{eq:alpha2}
\end{equation}
-In practice, a simple backtracking line search procedure can be considered with Algorithm \ref{alg:linesearch}, by considering a small value for $\alpha_{\min}$. For more elaborated methods, see Ch.~3 of \cite{Nocedal06}.
+In practice, a simple backtracking line search procedure can be used, as shown in Algorithm \ref{alg:linesearch}, with a small value for $\alpha_{\min}$.
The complete iLQR procedures are described in Algorithms \ref{alg:iLQRbatch} and \ref{alg:iLQRrecursive} for the batch and recursive formulations, respectively.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\bm in titles and hyperref are incompatible: https://tex.stackexchange.com/questions/174840/bmp-in-the-section-title-or-subsection-title-produces-error
-\subsection{iLQR with quadratic cost on {\boldmath$f(x_t)$}}
+\section{iLQR with quadratic cost on f(x)} %{\boldmath$f(x_t)$}
\label{sec:iLQR_quadraticCost}
\begin{flushright}
\filename{iLQR\_manipulator.*}
\end{flushright}
@@ -2380,7 +2461,7 @@ In the next sections, we show examples of functions $\bm{f}(\bm{x})$ that can re
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsubsection{Robot manipulator}
+\subsection{Robot manipulator}
\begin{flushright}
\filename{iLQR\_manipulator.*}
\end{flushright}
@@ -2405,7 +2486,7 @@ For a robot manipulator, $\bm{f}(\bm{x}_t)$ in \eqref{eq:du} typically represent
\end{align*}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%\subsubsection{Bounded joint space}
+%\subsection{Bounded joint space}
%\begin{figure}
%\centering
@@ -2427,7 +2508,7 @@ For a robot manipulator, $\bm{f}(\bm{x}_t)$ in \eqref{eq:du} typically represent
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsubsection{Bounded joint space}
+\subsection{Bounded joint space}
%\begin{figure}
%\centering
@@ -2470,7 +2551,7 @@ We can see with \eqref{eq:dcdx} that for $\bm{Q}=\frac{1}{2}\bm{I}$, if $\bm{x}$
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsubsection{Bounded task space}
+\subsection{Bounded task space}
The iLQR solution in \eqref{eq:du} can be used to keep the endeffector within a boundary (e.g., endeffector position limits).
Based on the above definitions, $\bm{f}(\bm{x})$ and $\bm{J}(\bm{x})$ are in this case defined as
\begin{SCfigure}
\centering
-\includegraphics[width=.6\textwidth]{images/iLQR_objectBoundaries01.png}
+\includegraphics[width=.6\textwidth]{images/iLQR_objectBoundaries01.jpg}
\caption{\footnotesize
Example of a viapoints task in which a planar robot with 3 joints needs to sequentially reach 2 objects, with object boundaries defining the allowed reaching points on the objects surfaces. \emph{Left:} Reaching task with two viapoints at $t=25$ and $t=50$. \emph{Right:} Corresponding values of the cost function for the endeffector space at $t=25$ and $t=50$.
}
@@ -2509,7 +2590,7 @@ see also Fig.~\ref{fig:iLQR_manipulator}.
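+
+A plausible way to implement such bound costs (a hypothetical deadzone-style residual, given here only for illustration; the exact form used in the accompanying source codes may differ) is a residual that is zero inside the bounds and grows linearly outside, together with its Jacobian:
+\begin{verbatim}
+import numpy as np
+
+def f_bound(x, x_min, x_max):
+    # Residual: zero inside [x_min, x_max], linear outside the bounds
+    f = np.maximum(x - x_max, 0.0) + np.minimum(x - x_min, 0.0)
+    # Jacobian: identity entries where a bound is violated, zero elsewhere
+    J = np.diag(((x > x_max) | (x < x_min)).astype(float))
+    return f, J
+
+x = np.array([0.2, 1.5, -2.0])   # example state with two bounds violated
+f, J = f_bound(x, -np.ones(3), np.ones(3))
+print(f)                         # [ 0.   0.5 -1. ]
+\end{verbatim}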
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\subsubsection{Initial state optimization}\label{sec:iLQR_initStateOptim} +\subsection{Initial state optimization}\label{sec:iLQR_initStateOptim} \begin{flushright} \filename{iLQR\_manipulator\_initStateOptim.*} \end{flushright} @@ -2546,9 +2627,10 @@ where $\bm{\mu}$ is a vector containing the initial elements in $\bm{x}_1$ that The above approach can for example be used to estimate the optimal placement of a robot manipulator. For a planar robot, this can for example be implemented by defining the kinematic chain starting with two prismatic joints along the two axes directions, providing a way to represent the location of a free floating platform as two additional prismatic joints in the kinematic chain. If we would like the base of the robot to remain static, we can request that these first two prismatic joints will stay still during the motion. By using the above formulation, this can be done by setting arbitrary large control weights for these corresponding joints, see Figure \ref{fig:iLQR_initStateOptim} and the corresponding source code example. +\newpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\subsubsection{Center of mass} +\subsection{Center of mass} \begin{flushright} \filename{iLQR\_manipulator\_CoM.*} \end{flushright} @@ -2604,7 +2686,7 @@ The forward kinematics function $\bm{f}^\tp{CoM}$ can be used in tracking tasks %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\subsubsection{Bimanual robot} +\subsection{Bimanual robot} \begin{flushright} \filename{iLQR\_bimanual.*} \end{flushright} @@ -2693,7 +2775,7 @@ with the corresponding Jacobian matrix $\bm{J}^\tp{CoM}\in\mathbb{R}^{2\times 5} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\subsubsection{Obstacle avoidance with ellipsoid shapes} %(1-e'*e)'*(1-e'*e) version, where 1-e'*e is a scalar +\subsection{Obstacle avoidance with ellipsoid shapes} %(1-e'*e)'*(1-e'*e) version, where 1-e'*e is a scalar \begin{flushright} \filename{iLQR\_obstacle.*} \end{flushright} @@ -2742,7 +2824,7 @@ A similar principle can be applied to robot manipulators by composing forward ki %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\subsubsection{Distance to a target} %(1-e'*e)'*(1-e'*e) version, where 1-e'*e is a scalar +\subsection{Distance to a target} %(1-e'*e)'*(1-e'*e) version, where 1-e'*e is a scalar \begin{flushright} \filename{iLQR\_distMaintenance.*} \end{flushright} @@ -2778,7 +2860,7 @@ Figure \ref{fig:distMaintain} presents examples with a point mass. 
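+
+One plausible implementation of such a distance-maintenance residual and its Jacobian (a hypothetical sketch for a point mass; the exact form used in iLQR\_distMaintenance.* may differ) is:
+\begin{verbatim}
+import numpy as np
+
+def f_dist(x, mu, r):
+    # Residual that vanishes when x lies at distance r from the target mu
+    e = x - mu
+    f = 1.0 - e @ e / r**2
+    J = -2.0 * e[None, :] / r**2   # 1 x n Jacobian of the residual
+    return f, J
+
+f, J = f_dist(np.array([2.0, 0.0]), np.zeros(2), r=1.0)
+print(f)   # -3.0: the point is farther than the desired distance r
+\end{verbatim}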
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\subsubsection{Manipulability tracking} +\subsection{Manipulability tracking} \begin{flushright} \filename{iLQR\_bimanual\_manipulability.*} \end{flushright} @@ -2821,7 +2903,7 @@ The approach can also be extended to other forms of symmetric positive definite %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\subsubsection{Decoupling of control commands} +\subsection{Decoupling of control commands} \begin{wrapfigure}{r}{.22\textwidth} %\vspace{-20pt} @@ -2859,7 +2941,7 @@ Figure \ref{fig:iLQR_decoupling} presents a simple example within a 2D reaching %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\subsubsection{Curvature} +\subsection{Curvature} \begin{wrapfigure}{r}{.38\textwidth} %\vspace{-20pt} @@ -3195,11 +3277,12 @@ which can then be converted to a discrete form with \eqref{eq:AdAc}. % \label{eq:du} %\end{equation} +\newpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Ergodic control}\label{sec:ergodicControl} -Ergodic theory studies the connection between the time-averaged and space-averaged behaviors of a dynamical system. By using it in a search and exploration context, it enables optimal exploration of an information distribution. Exploration problems can be formulated in various ways, see Figures \ref{fig:ergodic_intro} and \ref{fig:ergodic_vs_patterned}. +Ergodic theory studies the connection between the time-averaged and space-averaged behaviors of a dynamical system. By using it in a search and exploration context, it enables optimal exploration of an information distribution. Exploration problems can be formulated in various ways, see Figures \ref{fig:ergodic_intro}, \ref{fig:ergodic_vs_patterned} and \ref{fig:ergodic_fromCrudetoPrecise}. Research in ergodic control addresses fundamental challenges linking machine learning, optimal control, signal processing and information theory. A conventional tracking problem in robotics is characterized by a target to reach, requiring a controller to be computed to reach this target. In \emph{ergodic control}, instead of providing a single target point, a probability distribution is given to the robot, which must cover the distribution in an efficient way. Ergodic control thus consists of moving within a spatial distribution by spending time in each part of the distribution in proportion to its density. %(namely, ``tracking a distribution'' instead of ``tracking a point''). @@ -3217,28 +3300,14 @@ The underlying objective of \emph{spectral multiscale coverage} (SMC) takes a si It requires the spatial distribution to be decomposed as Fourier series, with a cost function comparing the spectral decomposition of the robot path with the spectral decomposition of the distribution, in the form of a weighted distance function with a decreasing weight from low frequency to high frequency components. The resulting controller allows the robot to explore the given spatial distribution in a natural manner, \textbf{by starting from a crude exploration and by refining the search progressively} (i.e., matching the Fourier coefficients with an increasing importance from low to high frequency components). 
-
\begin{figure}
-\centering{\includegraphics[width=.6\columnwidth]{images/ergodicVsPatternedSearch01.jpg}}
+\centering{\includegraphics[width=.7\columnwidth]{images/ergodicVsPatternedSearch01.jpg}}
\caption{\footnotesize
-Search and exploration problems can be formulated in various ways. Ergodic control can be used in a search problem in which we do not know the duration of the exploration (e.g., we would like to find a hidden target as soon as possible). In contrast to stochastic sampling of a distribution, ergodic control takes into account the cost to move the agent from one point to another. If the search optimization problem is formulated as a coverage problem with fixed duration, the optimal path typically reveals some regular coverage patterns (here, illustrated as spirals, but also called lawnmower strategy). Ergodic control instead generates a more natural search behavior to localize the object to search by starting with a crude coverage of the workspace and by then generating progressively more detailed paths to cover the distribution. This for example resembles the way we would search for keys in a house by starting with a distribution about the locations in the rooms in which we believe the keys could be, and by covering these distribution coarsely as a first pass, followed by progressive refinement about the search with more detailed coverage until the keys are found. Both stochastic samples or patterned search would be very inefficient here!
+Search and exploration problems can be formulated in various ways. Ergodic control can be used in a search problem in which we do not know the duration of the exploration (e.g., we would like to find a hidden target as soon as possible). In contrast to stochastic sampling of a distribution, ergodic control takes into account the cost to move the agent from one point to another. If the search optimization problem is formulated as a coverage problem with fixed duration, the optimal path typically reveals some regular coverage patterns (illustrated here as spirals; this is also called a lawnmower strategy). Ergodic control instead generates a more natural search behavior to localize the object being searched for, by starting with a crude coverage of the workspace and by then generating progressively more detailed paths to cover the distribution. This for example resembles the way we would search for keys in a house by starting with a distribution over the locations in the rooms in which we believe the keys could be, and by covering these distributions coarsely in a first pass, followed by progressive refinement of the search with more detailed coverage until the keys are found. Both stochastic sampling and patterned search would be very inefficient here!
\label{fig:ergodic_intro}
}
\end{figure}
-
-%ergodicControl-fromCrudetoPrecise01.jpg
-
-%\begin{figure}
-%\centering{
-
-%}
-%\caption{
-%...
-%\label{fig:ergodic_vs_patterned}
-%}
-%\end{figure}
-
\begin{SCfigure}[50]
\centering
\begin{tabular}{cc}
Ergodic search & Patterned search\\
\includegraphics[width=.24\columnwidth]{images/SMC-room001c.jpg}
\end{tabular}
\caption{\footnotesize
-Example illustrating the search of a car key in a living room, where regions of interest are specified in the form of a distribution over the spatial domain. These regions of interest correspond here to the different tables in the room where it is more likely to find the key. The left and right columns respectively depict an ergodic search and patterned search processes. For a coverage problem in which the duration is given in advance, a patterned search can typically be better suited than an ergodic control search, because the agent will optimize the path to reduce the number of transitions from one area to the other. In contrast, an ergodic control search will typically result in multiple transitions between the different regions of interest. When searching for a key, since we are interested in finding the key as early as possible, we have a problem in which the total search duration cannot be specified in advance. In this problem setting, an ergodic search will be better suited than a patterned search, as it generates a natural way of exploring the environment by first proceeding with a crude search in the whole space and by then progressively fine-tuning the search with additional details until the key is finally found. The bottom row shows an example in which the key location is unknown, with the search process stopping when the key is found.
Here, a patterned search would provide an inefficient and unnatural way of scanning the whole environment regularly by looking at all details regions by regions, instead of treating the problem at different scales (i.e., at different spatial frequencies), which is what ergodic exploration provides.
+Example illustrating the search of a key in a living room, where regions of interest are specified in the form of a distribution over the spatial domain. These regions of interest correspond here to the different tables in the room where it is more likely to find the key. The left and right columns respectively depict an ergodic search process and a patterned search process. For a coverage problem in which the duration is given in advance, a patterned search can typically be better suited than an ergodic control search, because the agent will optimize the path to reduce the number of transitions from one area to the other. In contrast, an ergodic control search will typically result in multiple transitions between the different regions of interest. When searching for a key, since we are interested in finding the key as early as possible, we have a problem in which the total search duration cannot be specified in advance. In this problem setting, an ergodic search will be better suited than a patterned search, as it generates a natural way of exploring the environment by first proceeding with a crude search in the whole space and by then progressively fine-tuning the search with additional details until the key is finally found. The bottom row shows an example in which the key location is unknown, with the search process stopping when the key is found. Here, a patterned search would provide an inefficient and unnatural way of scanning the whole environment regularly by looking at all details region by region, instead of treating the problem at different scales (i.e., at different spatial frequencies), which is what ergodic exploration provides.
}
\label{fig:ergodic_vs_patterned}
\end{SCfigure}
-\begin{figure}
-\centering{\includegraphics[width=.7\columnwidth]{images/ergodic01.png}}
+\begin{SCfigure}[25]
+\centering{\includegraphics[width=.75\columnwidth]{images/ergodic01.png}}
\caption{\footnotesize
Insertion tasks can be achieved robustly with ergodic control, by not only relying on accurate sensors, but instead using a control strategy to cope with limited or inaccurate sensing information, see \cite{Shetty21TRO} for details.
\label{fig:ergodic_insertion}
}
-\end{figure}
-
+\end{SCfigure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Unidimensional SMC}\label{sec:SMC1D}
@@ -3347,6 +3415,17 @@ Well-known applications of Fourier basis functions in the context of time series
In ergodic control, the aim is instead to find a series of control commands $u(t)$ so that the retrieved trajectory $x(t)$ covers a bounded space $\mathcal{X}$ in proportion of a desired spatial distribution $\hat{g}(x)$. %, see Fig.~\ref{fig:ergodic_SMC}-\emph{(a)}.
This can be achieved by defining a metric in the spectral domain, by decomposing in Fourier series coefficients both the desired spatial distribution $\hat{g}(x)$ and the (partially) retrieved trajectory $x(t)$, which can exploit the Fourier series properties presented in Table \ref{tab:Fourier}, which can also be extended to more dimensions.
+
+\begin{wrapfigure}{r}{.34\textwidth}
+\centering
+\includegraphics[width=.32\textwidth]{images/ergodicControl-fromCrudetoPrecise01.jpg}
+\caption{\footnotesize
+Although ergodic control with SMC is specified in the spectral domain, it can also be interpreted in the spatial domain as a coverage problem in which both the path of the agent and the targeted distributions are blurred with filters of varying strength. With very strong blurring, the agent can match the desired distribution very quickly (first row). The weaker the filter, the more time the agent will need to cover the path to match the details of the distribution (second and third rows). As an optimization problem, the weights $\Lambda_k$ are set to promote a good match of the distributions that are strongly blurred (i.e., low frequency components). When this part of the cost goes to zero, the agent can then progressively refine the path so that less blurred distributions can also be matched (i.e., higher frequency components).
+}
+\label{fig:ergodic_fromCrudetoPrecise}
+%\vspace{-20pt}
+\end{wrapfigure}
+
The goal of ergodic control is to minimize
\begin{equation}
\epsilon = \frac{1}{2} \sum_{k=0}^{K-1} \Lambda_k \Big( w_k - \hat{w}_k \Big)^{\!2}
\label{eq:ergodicerror1D}
\end{equation}
@@ -3393,6 +3472,16 @@ It is important to emphasize that the implementation of this controller does not
\filename{ergodic\_control\_SMC\_1D.*}
\end{flushright}
\noindent The superposition of basis functions in \eqref{eq:gx} can be rewritten as
\begin{equation}
g(x) = \bm{w}^\trsp \bm{\phi}(x),
\end{equation}
@@ -3421,7 +3510,9 @@ where $\bm{\nabla}_{\!\!x}\bm{\phi}(x)$ is a vector formed by the $K$ elements $
\begin{equation}
u(t) = - {\bm{\nabla}_{\!\!x}\bm{\phi}(x)}^\trsp \; \bm{\Lambda} \, \big( \bm{w} - \bm{\hat{w}} \big).
\end{equation}
-\\[2mm]
+%\\[2mm]
+
+Figure \ref{fig:ergodic_fromCrudetoPrecise} shows how this controller can be interpreted in the spatial domain.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
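+
+To give a sense of the full procedure, a minimal closed-loop Python sketch of this 1D controller is given below (with assumed parameters and a numerically estimated target distribution, chosen here purely for illustration; see ergodic\_control\_SMC\_1D.* for the complete implementation):
+\begin{verbatim}
+import numpy as np
+
+L, K, dt, u_max = 1.0, 20, 1e-2, 1.0
+k = np.arange(K) * np.pi / L              # basis frequencies k*pi/L
+Lambda = 1.0 / (1 + np.arange(K)**2)      # weights decreasing with frequency
+
+# Fourier coefficients w_hat of a desired distribution (two bumps), estimated
+# numerically here; closed-form coefficients could be used instead
+xg = np.linspace(0.0, L, 1000)
+g = np.exp(-(xg - 0.3)**2 / 0.01) + np.exp(-(xg - 0.7)**2 / 0.005)
+g /= np.sum(g) * (xg[1] - xg[0])          # normalize to a density on [0, L]
+w_hat = np.cos(np.outer(k, xg)) @ g * (xg[1] - xg[0])
+
+x, w = 0.1, np.zeros(K)                   # agent position, running coefficients
+for t in range(1, 5001):
+    w += np.cos(k * x) * dt               # accumulate trajectory coefficients
+    grad_phi = -k * np.sin(k * x)         # gradient of the basis functions at x
+    u = -grad_phi @ (Lambda * (w / (t * dt) - w_hat))  # control law above
+    x = np.clip(x + np.clip(u, -u_max, u_max) * dt, 0.0, L)
+\end{verbatim}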