<!DOCTYPE html>
<html>
<head>
<title>Ch. 15 -
Output Feedback (aka Pixels-to-Torques)</title>
<meta name="Ch. 15 -
Output Feedback (aka Pixels-to-Torques)" content="text/html; charset=utf-8;" />
<link rel="canonical" href="http://underactuated.mit.edu/output_feedback.html" />
<script src="https://hypothes.is/embed.js" async></script>
<script type="text/javascript" src="chapters.js"></script>
<script type="text/javascript" src="htmlbook/book.js"></script>
<script src="htmlbook/mathjax-config.js" defer></script>
<script type="text/javascript" id="MathJax-script" defer
src="htmlbook/MathJax/es5/tex-chtml.js">
</script>
<script>window.MathJax || document.write('<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js" defer><\/script>')</script>
<link rel="stylesheet" href="htmlbook/highlight/styles/default.css">
<script src="htmlbook/highlight/highlight.pack.js"></script> <!-- http://highlightjs.readthedocs.io/en/latest/css-classes-reference.html#language-names-and-aliases -->
<script>hljs.initHighlightingOnLoad();</script>
<link rel="stylesheet" type="text/css" href="htmlbook/book.css" />
</head>
<body onload="loadChapter('underactuated');">
<div data-type="titlepage">
<header>
<h1><a href="index.html" style="text-decoration:none;">Underactuated Robotics</a></h1>
<p data-type="subtitle">Algorithms for Walking, Running, Swimming, Flying, and Manipulation</p>
<p style="font-size: 18px;"><a href="http://people.csail.mit.edu/russt/">Russ Tedrake</a></p>
<p style="font-size: 14px; text-align: right;">
© Russ Tedrake, 2023<br/>
Last modified <span id="last_modified"></span>.<br/>
<script>
var d = new Date(document.lastModified);
document.getElementById("last_modified").innerHTML = d.getFullYear() + "-" + (d.getMonth()+1) + "-" + d.getDate();</script>
<a href="misc.html">How to cite these notes, use annotations, and give feedback.</a><br/>
</p>
</header>
</div>
<p><b>Note:</b> These are working notes used for <a
href="https://underactuated.csail.mit.edu/Spring2023/">a course being taught
at MIT</a>. They will be updated throughout the Spring 2023 semester. <a
href="https://www.youtube.com/channel/UChfUOAhz7ynELF-s_1LPpWg">Lecture videos are available on YouTube</a>.</p>
<table style="width:100%;"><tr style="width:100%">
<td style="width:33%;text-align:left;"><a class="previous_chapter" href=robust.html>Previous Chapter</a></td>
<td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
<td style="width:33%;text-align:right;"><a class="next_chapter" href=limit_cycles.html>Next Chapter</a></td>
</tr></table>
<script type="text/javascript">document.write(notebook_header('output_feedback'))
</script>
<!-- EVERYTHING ABOVE THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->
<chapter style="counter-reset: chapter 14"><h1>
Output Feedback (aka Pixels-to-Torques)</h1>
<p>In this chapter we will finally start considering systems of the form:
\begin{gather*} \bx[n+1] = {\bf f}(\bx[n], \bu[n], \bw[n]) \\ \by[n] =
{\bf g}(\bx[n], \bu[n], \bv[n]),\end{gather*} where most of these symbols
have been <a href="robust.html">described before</a>, but we have now added
$\by[n]$ as the output of the system, and $\bv[n]$, which represents
"measurement noise" and is typically the output of a random process. In
other words, we'll finally start addressing the fact that we have to make
decisions based on sensor measurements -- most of our discussions until now
have tacitly assumed that we have access to the true state of the system for
use in our feedback controllers (and that's already been a hard problem).
</p>
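<p>As a concrete instance of this form, here is a minimal sketch (the
pendulum model and noise scales are illustrative assumptions, not an example
from the text) of simulating such a system in Python; an output-feedback
controller would have to act on the noisy output $\by$ alone:</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(0)
h = 0.01  # integration time step

def f(x, u, w):
    """State update x[n+1] = f(x[n], u[n], w[n]) for a damped pendulum;
    w is additive process noise."""
    theta, thetadot = x
    xnext = np.array([theta + h * thetadot,
                      thetadot + h * (-np.sin(theta) - 0.1 * thetadot + u)])
    return xnext + w

def g(x, u, v):
    """Observation y[n] = g(x[n], u[n], v[n]): a noisy angle measurement.
    The controller never sees the velocity, nor the true angle."""
    return x[0] + v

x = np.array([1.0, 0.0])
for n in range(1000):
    y = g(x, 0.0, 0.01 * rng.standard_normal())  # all the controller gets
    u = 0.0  # an output-feedback policy must compute u from y (and memory)
    x = f(x, u, 0.001 * rng.standard_normal(2))
</code></pre>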
<p>In some cases, we will see that the assumption of "full-state feedback" is
not so bad -- we do have good tools for state estimation from raw sensor data.
But even our best state estimation algorithms do add some dynamics to the
system in order to filter out noisy measurements; if the time constants of
these filters are near the time constant of our dynamics, then it becomes
important that we include the dynamics of the estimator in our analysis of the
closed-loop system.</p>
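<p>To make the point about filter dynamics concrete, here is a small sketch
(the signal, gain, and noise level are assumptions for illustration) of a
first-order low-pass filter; the filtered estimate is itself a dynamical
system, and it lags the true signal by roughly its own time constant:</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(0)
h = 0.01       # sample period
alpha = 0.05   # filter gain; effective time constant is about h / alpha

y_filt = 0.0
for n in range(500):
    y_true = np.sin(0.5 * n * h)                   # slowly varying signal
    y_meas = y_true + 0.1 * rng.standard_normal()  # noisy measurement
    # First-order low-pass update: smooths the noise, but adds lag that
    # must be included in any closed-loop analysis.
    y_filt = y_filt + alpha * (y_meas - y_filt)
</code></pre>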
<p>In other cases, it's entirely too optimistic to design a controller
assuming that we will have an estimate of the full state of the system. Some
state variables might be completely unobservable; others might require
specific "information-gathering" actions on the part of the controller.</p>
<p>For me, the problem of robot manipulation is one important application
domain where more flexible approaches to output feedback become critically
important. Imagine you are trying to design a controller for a robot that
needs to button the buttons on your shirt. Our current tools would require
us to first estimate the state of the shirt (how many degrees of freedom does
my shirt have?); but certainly the full state of my shirt should not be
required to button a single button. Or if you want to program a robot to
make a salad -- what's the state of the salad? Do I really need to know the
positions and velocities of every piece of lettuce in order to be successful?
These questions are (finally) getting a lot of attention in the research
community these days, under the umbrella of "learning state representations".
But what does it mean to be a good state representation? There are a number
of simple lessons from output feedback in control that can shed light on this
fundamental question.
</p>
<section><h1>Background</h1>
<subsection><h1>The classical perspective</h1>
<p>To some extent, this idea of calling out "output feedback" as an
advanced topic is relatively new. Before state-space and
optimization-based approaches to control ushered in "modern control", we
had "classical control". Classical control focused predominantly (though
not exclusively) on linear time-invariant (LTI) systems, and made very
heavy use of frequency-domain analysis (e.g. via the Fourier
Transform/Laplace Transform). There are many excellent books on the
subject; <elib>Hespanha09+Astrom10</elib> are nice examples of modern
treatments that start with state-space representations but also treat the
frequency-domain perspective. "Pole placement" and "loop shaping" are some
of the tools of this trade.</p>
<p>What's important for us to acknowledge here is that in classical
control, basically everything was built around the idea of output
feedback. The fundamental concept is the transfer function of a system,
which is an input-to-output map (in the frequency domain) that can completely
characterize an LTI system. Core concepts like pole placement and loop
shaping were fundamentally addressing the challenge of output feedback
that we are discussing here. Sometimes I worry that, despite all of the
things we've gained with modern, optimization-based control,
we've lost something in terms of considering rich characterizations of
closed-loop performance (rise time, dwell time, overshoot, ...) and
perhaps even in practical robustness of our systems to unmodeled
errors.</p>
<todo>Add a few examples here that capture it.</todo>
</subsection>
<subsection><h1>From pixels to torques</h1>
<p>Just like some of our oldest approaches to control were fundamentally
solving an output feedback problem, some of our newest approaches to
control are doing it, too. Deep learning has revolutionized computer
vision, and "deep imitation learning" and "deep reinforcement learning"
have been a recent source of many impressive demonstrations of control
systems that can operate directly from pixels (e.g. consuming the output
of a deep perception system), without explicitly representing nor
estimating the full state of the system. Unfortunately, the success or
failure of these methods is not yet well understood, and they often
require a great deal of artisanal tuning and an embarrassing (sometimes
prohibitive) amount of computation.</p>
<p>The synthesis of ideas between machine learning (both theoretical and
applied) and control theory is one of the most exciting and productive
frontiers for research today. I am highly optimistic that we will be
able to uncover the underlying principles and help transition this
budding field into a technology. I hope that summarizing some of the key
lessons from control here can help.</p>
</subsection>
</section>
<section><h1>Static Output Feedback</h1>
<p>One of the extremely important, but almost unstated, lessons from dynamic
programming with additive costs and the Bellman equation is that the
optimal policy can <i>always</i> be represented as a function $\bu^*
= \pi^*(\bx).$ So far in these notes, we've assumed that the controller has
direct access to the true state, $\bx$. In this chapter, we are finally
removing that assumption. Now the controller only has direct access to the
potentially noisy observations $\by$.</p>
<p>So the natural first question to ask might be, what happens if we write
our policies now as a function, $\bu = \pi(\by)?$ This is known as
"static" output feedback, in contrast to "dynamic" output feedback where
the controller is not a static function, but is itself another input-output
dynamical system. Unfortunately, in the general case it is <i>not</i> the
case that optimal policies can be perfectly represented with static output
feedback. But one can still try to solve an optimal control problem where
we restrict our search to static policies; our goal will be to find the
best controller in this class to minimize the cost.</p>
<subsection><h1>A hardness result</h1>
<p>We've already seen <a
href="policy_search.html#static_output_feedback">an example</a> of a very
simple linear control problem where the set of stabilizing feedback gains
formed a disconnected set -- which is suggestive that it could be a
difficult problem for optimization. For some other problems in control,
we've been able to find a convex reparametrization.</p>
<p>Unfortunately, <elib>Blondel97</elib> showed that the question of
whether a stabilizing static output feedback $\bu = -\bK \by$ even exists
for a given system of the form $$\dot\bx = \bA\bx + \bB\bu,\quad \by =
\bC \bx,$$ is NP-hard. Many of the strongest results from $H_2$ and
$H_\infty$ design, for instance, are limited to dynamic controllers that
can effectively reconstruct the entire state.</p>
<p>Just because this problem is NP-hard doesn't mean we can't find good
controllers in practice. Some of the recent results from reinforcement
learning have reminded us of this. We should not expect an efficient
globally optimal algorithm that works for every problem instance; but we
should absolutely keep working on the problem. Perhaps the class of
problems that our robots will actually encounter in the real world is
easier than this general case (the standard examples of bad cases in
linear systems, e.g. with interleaved poles and zeros, do feel a bit
contrived and unlikely to occur in practice).</p>
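<p>To illustrate the flavor of "keep working on the problem" despite the
hardness result, here is a minimal brute-force sketch (the matrices are made
up for this example, including a mixed position/velocity measurement chosen
so that a stabilizing static gain does exist): simply sample gains and test
the closed-loop eigenvalues.</p>
<pre><code class="language-python">import numpy as np

# Illustrative (assumed) system: open-loop unstable, with a single
# measurement that mixes position and velocity.
A = np.array([[0.0, 1.0], [2.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.5]])

def is_stabilizing(K):
    """True if u = -K y renders dx/dt = (A - B K C) x stable, i.e. every
    closed-loop eigenvalue has negative real part."""
    return np.max(np.linalg.eigvals(A - B @ K @ C).real) &lt; 0

rng = np.random.default_rng(0)
samples = [rng.uniform(-10.0, 10.0, size=(1, 1)) for _ in range(1000)]
stabilizing = [K for K in samples if is_stabilizing(K)]
print(f"{len(stabilizing)} of 1000 random gains stabilize the system")
</code></pre>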
</subsection>
<subsection><h1>Via policy search</h1>
<p>Searching for the best controller within a parametric class of
policies is generally referred to as <a href="policy_search.html">policy
search</a>. If we do policy search on a class of static output feedback
policies, how well does it perform? Of course, the answer depends on the
particular governing equations (for instance, $\by = \bx$ is a perfectly
reasonable output, and in this case the policy can be optimal).</p>
<todo>Jack had another nice example in his policy search lecture:
https://youtu.be/JhjROrZxBhM?t=1099 and "what goes wrong in output
feedback?" https://youtu.be/JhjROrZxBhM?t=5066</todo>
<p>Bilinear alternations with SOS, policy search with SGD, ...</p>
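<p>As a minimal sketch of policy search with SGD on a static output-feedback
gain (the dynamics are assumed, the measurement is chosen so that a
stabilizing static gain exists, the cost is a finite-horizon quadratic
evaluated on a fixed batch of initial conditions, and a simple
finite-difference gradient stands in for backpropagation):</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # discrete-time double integrator
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.5]])               # mixed measurement (assumed)
x0s = rng.standard_normal((10, 2))       # fixed batch of initial states

def cost(K, horizon=100):
    """Finite-horizon quadratic cost of the static policy u = -K y."""
    total = 0.0
    for x0 in x0s:
        x = x0.copy()
        for _ in range(horizon):
            u = -K @ (C @ x)
            total += x @ x + u @ u
            x = A @ x + B @ u
    return total / len(x0s)

K = np.zeros((1, 1))
eps, lr = 1e-3, 1e-5
for step in range(500):
    # K is a single scalar gain here, so one finite difference suffices.
    grad = (cost(K + eps) - cost(K - eps)) / (2 * eps)
    K = K - lr * grad
print("learned gain:", K.ravel(), "cost:", cost(K))
</code></pre>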
</subsection>
</section>
<section><h1>Observer-based Feedback</h1>
<subsection><h1>Luenberger Observer</h1>
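<p>To preview the basic construction: the observer simulates a copy of the
plant model and corrects it with the measurement residual,
$\hat\bx[n+1] = \bA\hat\bx[n] + \bB\bu[n] + {\bf L}(\by[n] - \bC\hat\bx[n]),$
with ${\bf L}$ chosen so that the error dynamics $\bA - {\bf L}\bC$ are
stable. A minimal sketch, with matrices and an observer gain assumed for
illustration:</p>
<pre><code class="language-python">import numpy as np

# Assumed discrete-time plant: x[n+1] = A x + B u + w,  y[n] = C x + v.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.5], [0.5]])   # chosen so eig(A - L C) lies in the unit disk

def observer_step(xhat, u, y):
    """Luenberger observer: propagate the model, then correct with the
    measurement residual y - C xhat."""
    return A @ xhat + B @ u + L @ (y - C @ xhat)

rng = np.random.default_rng(0)
x, xhat = np.array([1.0, 0.0]), np.zeros(2)
for n in range(200):
    u = np.array([0.0])
    y = C @ x + 0.01 * rng.standard_normal(1)
    xhat = observer_step(xhat, u, y)       # estimate converges toward x
    x = A @ x + B @ u + 0.001 * rng.standard_normal(2)
</code></pre>
</subsection>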
<subsection><h1>Linear Quadratic Regulator w/ Gaussian Noise
(LQG)</h1></subsection>
<subsection><h1>Partially-observable Markov Decision Processes</h1>
<todo>Defer most of the discussion to the state estimation
chapter</todo>
</subsection>
<subsection><h1>Trajectory optimization with Iterative LQG</h1></subsection>
</section>
<todo>
Russ: so we have three broad approaches to searching for linear dynamical
controllers (for linear dynamical plants): 1) gradient descent in the
original parameters: you have asymptotic convergence, but not a guaranteed
rate. 2) convex reparameterizations (e.g. from Scherer): here there is a rank
condition that is trivially satisfied, and therefore disappears, when
dim(xc) >= dim(x). 3) Youla/SLS, which in the time domain is a finite-time
synthesis but can be used to recover the LDC to arbitrary precision (e.g. via
Ho-Kalman). is that a reasonable summary?
Jack: I think this is reasonable. I would probably add the Riccati solutions
of the famous DGKF paper
https://authors.library.caltech.edu/3087/1/DOYieeetac89.pdf (for
completeness):
1) gradient descent in original parameters; no suboptimal stationary points
for minimal controllers.
2) finite-dimensional convex reparametrizations from Scherer, as you wrote.
3) DGKF solution: solve two Riccati equations.
4) Disturbance feedback: Youla/Q/SLS, etc. The synthesis problem is convex,
but the decision variables are infinite-dimensional (stable) transfer
functions. Various approximations can be used (most common these days, for
academic examples and learning theorists, is FIR).
I would be a bit cautious about recovering the LDC with Ho-Kalman. Nik Matni
and James Anderson had a paper
https://ieeexplore.ieee.org/abstract/document/8262844 about recovering
state-space controllers from FIRs, in the SLS setting. According to James,
they didn't mean it as a serious paper (i.e. something you would actually
implement), just an investigation. I should re-read that paper, but there was
no dimensionality reduction going on when recovering a state-space
representation from the FIR.
</todo>
<section><h1>Disturbance-based feedback</h1>
<todo>State-space models. ARX Models.</todo>
<p>Here is a simple extension of the <a href="lqr.html#sls">LQR with
least-squares derivation</a>... (it's a work in progress!)</p>
<p>Given the state-space equations \begin{gather*} \bx[n+1] = \bA\bx[n] +
\bB\bu[n] + \bw[n],\end{gather*} consider parametrizing an output feedback
policy of the form $$\bu[n] = \bK_0[n] \bx_0 +
\sum_{i=1}^{n-1}\bK_i[n]\bw[n-i].$$ Then the closed-loop state is linear in
the control parameters, $\bK$: \begin{align*}\bx[n] =& \left( {\bf A}^n +
\sum_{i=0}^{n-1}{\bf A}^{n-i-1}{\bf B}{\bf K}_0[i] \right) \bx_0 +
\sum_{i=0}^{n-1}{\bf A}^{n-i-1}\left(\bw[i] +
\sum_{j=1}^{i-1}{\bf B}{\bf K}_{j}[i]\bw[i-j]\right),\end{align*} and
therefore objectives that are convex in $\bx$ and $\bu$ (like LQR) are also
convex in $\bK$. Moreover, we can calculate $\bw[n]$ by the time that it is
needed, given our observations of $\bx[n+1]$ and $\bx[n]$ and knowledge of
$\bu[n].$</p>
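<p>Here is a minimal sketch of executing such a policy online (the matrices
and gains below are placeholders; in practice the gains would come from the
convex synthesis above). The key point is the last line of the loop: each
$\bw[n]$ is recovered exactly from consecutive state observations before the
policy needs it.</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
N = 50

# Placeholder gains (a real synthesis would optimize these; the cost is
# convex in them because x[n] is linear in the K's).
K0 = [0.01 * rng.standard_normal((1, 2)) for _ in range(N)]
Kw = [[0.01 * rng.standard_normal((1, 2)) for _ in range(n)] for n in range(N)]

x0 = np.array([1.0, 0.0])
x, w_hist = x0.copy(), []
for n in range(N):
    # u[n] = K0[n] x0 + a sum over past disturbances; every w used here
    # was already reconstructed from an observed transition.
    u = K0[n] @ x0 + sum(Kw[n][i] @ w_hist[n - 1 - i] for i in range(n))
    w = 0.01 * rng.standard_normal(2)       # true process noise
    x_next = A @ x + B @ u + w
    w_hist.append(x_next - A @ x - B @ u)   # w[n] = x[n+1] - A x[n] - B u[n]
    x = x_next
</code></pre>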
<p>We can extend this to output disturbance-based feedback: \begin{gather*}
\bx[n+1] = \bA\bx[n] + \bB\bu[n] + \bw[n],\\ \by[n] = \bC\bx[n] + \bv[n],
\end{gather*} by parametrizing an output feedback policy of the form
$$\bu[n] = \bK_0[n] \by[0] + \sum_{i=1}^{n-1}\bK_i[n]{\bf e}[n-i],$$ where ${\bf e}[n] = ...$ <elib>Sadraddini20</elib>.</p>
<subsection><h1>System-Level Synthesis</h1></subsection>
</section>
<todo>Task-relevant variables / learning state representations</todo>
<section><h1>Feedback from Pixels</h1></section>
<todo>Should I even mention teacher-student (as used by Marco Hutter/Pulkit)?
Put it in context?</todo>
</chapter>
<!-- EVERYTHING BELOW THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->
<div id="references"><section><h1>References</h1>
<ol>
<li id=Hespanha09>
<span class="author">Joao P. Hespanha</span>,
<span class="title">"Linear Systems Theory"</span>, Princeton Press
, <span class="year">2009</span>.
</li><br>
<li id=Astrom10>
<span class="author">Karl Johan {\AA}str{\"o}m and Richard M Murray</span>,
<span class="title">"Feedback systems: an introduction for scientists and engineers"</span>, Princeton university press
, <span class="year">2010</span>.
</li><br>
<li id=Blondel97>
<span class="author">Vincent Blondel and John N Tsitsiklis</span>,
<span class="title">"{NP}-hardness of some linear control design problems"</span>,
<span class="publisher">SIAM journal on control and optimization</span>, vol. 35, no. 6, pp. 2118--2127, <span class="year">1997</span>.
</li><br>
<li id=Sadraddini20>
<span class="author">Sadra Sadraddini and Russ Tedrake</span>,
<span class="title">"Robust Output Feedback Control with Guaranteed Constraint Satisfaction"</span>,
<span class="publisher">In the Proceedings of 23rd ACM International Conference on Hybrid Systems: Computation and Control</span> , pp. 12, April, <span class="year">2020</span>.
[ <a href="http://groups.csail.mit.edu/robotics-center/public_papers/Sadraddini20.pdf">link</a> ]
</li><br>
</ol>
</section><p/>
</div>
<table style="width:100%;"><tr style="width:100%">
<td style="width:33%;text-align:left;"><a class="previous_chapter" href=robust.html>Previous Chapter</a></td>
<td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
<td style="width:33%;text-align:right;"><a class="next_chapter" href=limit_cycles.html>Next Chapter</a></td>
</tr></table>
<div id="footer">
<hr>
<table style="width:100%;">
<tr><td><a href="https://accessibility.mit.edu/">Accessibility</a></td><td style="text-align:right">© Russ
Tedrake, 2023</td></tr>
</table>
</div>
</body>
</html>