I recently sparked an animated little discussion during a research meeting by questioning the usefulness of the averaging protocol for the consensus problem. It turns out that this is a fairly sensitive issue in the control community these days, and pretty much everybody has an opinion on this protocol. To be clear, I was not arguing that consensus (or agreement) problems are unimportant in distributed systems. They are discussed at length in books on real-time systems and distributed algorithms. They occupy several chapters of Nancy Lynch’s book [1] for example, where they are defined (p. 81) as problems where processes in a network begin with arbitrary initial values of a particular type and are supposed to eventually output the same value (they must agree). This output value must be of the same type as the input values and might be subject to additional conditions, such as being equal to the initial value of at least one process. Control theorists, for their part, have more often been interested in agreeing on the average of the initial values. The field of distributed algorithms in computer science considers consensus problems for various notions of time in networks (synchronous or asynchronous), and is particularly concerned with consensus when some of the processes are subject to failure (in particular, Byzantine fault tolerance).

The consensus or agreement problem has generated a huge literature in control theory over the past decade, in particular since the now famous paper of Ali Jadbabaie et al. [2]. However, in contrast to the computer science literature, the problem is often not well defined when studied by control theorists. Fundamentally, the issue probably comes from the fact that in control papers the processes are typically allowed to communicate real numbers without noise, at least in the basic version of the problem, which is a framework far too powerful from a theoretical computer science point of view (you could encode an infinite amount of information in just one such transmission). So what is this control literature about? Well, in the control community, consensus has almost lost its original meaning and has become synonymous with a particular dynamical scheme in which processes organized in a network start with arbitrary initial values and continuously average their current value with the values of their neighbors. Following this update law, the value of each process indeed converges to the average of all initial values in the network, at least in the simplest situations, assuming the local averages are performed adequately.

It is important to remember that this local averaging scheme is just one possible algorithm to reach consensus in a network using local communications, and therefore I will keep using the term averaging algorithm instead of consensus when talking about this particular scheme. In this post I give more background and explanations on the consensus problem and averaging protocol. In a follow-up post that I will publish soon, I discuss how the dynamics of this averaging algorithm lead very quickly to intriguing and still poorly understood behaviors. These behaviors have in fact attracted the interest of physicists, biologists, and economists, and from a scientific point of view it is certainly useful to study them. But I’m also planning to discuss the drawbacks and issues that I see impairing the use of the averaging algorithm in real-world engineering applications (there seem to be very few of those actually; the vast majority of papers only report simulations, where everything is of course simpler ;-)).

A Tutorial on the Averaging Algorithm.

I follow here a recent paper by Alex Olshevsky and John Tsitsiklis [3] (what I call the averaging algorithm here is called the “agreement algorithm” in that paper). Alex Olshevsky works on the consensus problem as part of his PhD research, and John Tsitsiklis was one of the first researchers to study the averaging algorithm [4], in particular in asynchronous and time-varying networks.

We consider a set of nodes {\mathcal N=\{1,2, \ldots,n \}}. Each node {i} starts with a scalar value {x_i(0)}, and the vector of values for all nodes at (discrete) time {t} is denoted {x(t)=(x_1(t),\ldots,x_n(t))}. The averaging algorithm updates {x(t)} at each period according to the dynamics {x(t+1) = A(t) x(t)}, where {A(t)} is a matrix with nonnegative entries {a_{ij}(t)}. Moreover, we assume that the row sums of {A(t)} are equal to {1}, so that {A(t)} is a stochastic matrix and {x_i(t+1)} is a weighted average of the previous values {x_j(t)}. Intuitively, {a_{ij}(t)>0} means that node {j} communicates its current value to node {i} at time {t}. We let {\mathcal E(t)} be the set of edges {(j,i)} at time {t} for which {a_{ij}(t)>0}, i.e., {\mathcal E(t)} represents the communication graph at time {t}.

Consider the following assumptions:

  1. There exists a positive constant {\alpha} such that {a_{ii}(t) \geq \alpha, \forall i,t}, and {a_{ij}(t) \in \{0\} \cup [\alpha,1], \forall i,j,t}.
  2. {\sum_{j=1}^n a_{ij}(t) = 1, \forall i,t}.
  3. (bounded interconnectivity times) There is some {B} such that for all {k}, the graph {(\mathcal N, \mathcal E(kB) \cup \mathcal E(kB+1) \cup \cdots \cup \mathcal E((k+1)B-1))} is strongly connected.

Then under these assumptions, the averaging algorithm guarantees asymptotic consensus, that is, there is a constant {c} (depending on {x(0)} and the sequence of communication graphs) such that {\lim_{t \rightarrow \infty} x_i(t) = c}. See [3] for the references of this result. In the simplest case, the communication graph is fixed ({\mathcal E(t)=\mathcal E}) and the {A(t)} matrix is assumed constant {A(t)=A}. If {A} is a doubly stochastic matrix (i.e., each column of {A} also sums to {1}), then we know that the value {c} is equal to the average of the initial values {c=(\sum_{i=1}^n x_i(0)) / n}.
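To make the convergence claim concrete, here is a minimal simulation of the fixed-graph case, assuming a doubly stochastic {A}; the 4-node ring and its weights are my own illustrative choices, not taken from [3].

```python
import numpy as np

# Minimal sketch of the averaging algorithm x(t+1) = A x(t) on a fixed graph.
# A is doubly stochastic (rows and columns sum to 1), built here on a 4-node
# ring with self-loops; these weights are illustrative, not from any paper.
A = np.array([
    [0.5,  0.25, 0.0,  0.25],
    [0.25, 0.5,  0.25, 0.0 ],
    [0.0,  0.25, 0.5,  0.25],
    [0.25, 0.0,  0.25, 0.5 ],
])

x = np.array([1.0, 5.0, -2.0, 8.0])  # arbitrary initial values x(0)

for _ in range(200):                 # iterate x(t+1) = A x(t)
    x = A @ x

print(x)  # every entry is close to 3.0, the average of the initial values
```

If {A} is only row stochastic (not doubly stochastic) but still satisfies the assumptions above, the iterates still reach consensus, just on the weighted average {\pi^T x(0)} rather than the plain average.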

Note that the averaging algorithm requires a priori that a central designer choose the matrix {A} and communicate the weights {a_{ij}} to the proper nodes, in order that {A} be stochastic or doubly stochastic. We can improve on this point using ideas from random walks on graphs. Node {i} can set {a_{ij}=1/d_i} for {j} a neighbor of {i}, where {d_i} is the number of neighbors of {i} in the (constant) communication graph, including {i} itself, and set {a_{ij}=0} if {j} is not a neighbor of {i}. Then {A} is a stochastic matrix which is irreducible and aperiodic under the previous assumptions, with steady-state probabilities {\pi_i = d_i / (\sum_{j=1}^n d_j)}. This is called the equal-neighbor model. The paper [3] describes a variation of the averaging algorithm which converges to the average of the initial values and does not require any central coordination. Each node {i} sets {y_i(0)=1/d_i}, {z_i(0)=x_i(0)/d_i}, and runs in parallel the iterations {y(t+1) = A y(t)} and {z(t+1) = A z(t)}. Then each node sets {x_i(t)=z_i(t)/y_i(t)}, and we have {\lim_{t \rightarrow \infty} x_i(t) = (\sum_{i=1}^n x_i(0)) / n, \forall i}. The main issue with this algorithm is that it takes {\Theta(n^3 \log (n/\epsilon))} steps for {x(t)} to be {\epsilon}-close to its limit value. The {n^3} term in particular is quite bad, since in practice the algorithm is supposed to run on a large number of nodes.
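A quick numerical check of this coordination-free variant; the path graph and initial values below are my own toy choices.

```python
import numpy as np

# Equal-neighbor weights on a 4-node path graph; d_i counts the neighbors of
# node i including i itself, which is all each node needs to know locally.
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}
n = 4
d = np.array([len(neighbors[i]) for i in range(n)], dtype=float)

A = np.zeros((n, n))                 # a_ij = 1/d_i for j a neighbor of i
for i in range(n):
    A[i, neighbors[i]] = 1.0 / d[i]

x0 = np.array([1.0, 5.0, -2.0, 8.0])
y = 1.0 / d                          # y_i(0) = 1/d_i
z = x0 / d                           # z_i(0) = x_i(0)/d_i

for _ in range(500):                 # run both linear iterations in parallel
    y = A @ y
    z = A @ z

print(z / y)  # each ratio z_i(t)/y_i(t) approaches the average of x0, here 3.0
```

Both iterations converge to multiples of the same limit vector, so the ratio cancels the steady-state weights {\pi_i} and recovers the plain average.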

If we allow a central designer to choose the matrix {A}, then we can improve the convergence speed of the algorithm significantly. In general, any matrix {A} (not even necessarily nonnegative) such that the iterates {x(t+1)=A x(t)} converge to a vector {c \mathbf 1} can be used to obtain consensus, with additional conditions needed in order for {c} to be the actual average of the initial values. In particular, we can simply consider matrices {A} such that {1} is an eigenvalue of {A} with multiplicity {1} and right eigenvector {\mathbf 1}, the corresponding left eigenvector {\pi} has nonzero entries, and all other eigenvalues have magnitude less than {1}. The paper [3] explains that we can easily construct {A} in such a way that a consensus value is approached at an arbitrarily fast rate. However, this scheme seems numerically impractical and moreover forces convergence to the initial value of one of the nodes, not the average value. Taking these points into account, Olshevsky and Tsitsiklis describe an averaging algorithm which converges to the average value in time {O(n^2 \log (n/\epsilon))}, essentially by executing the algorithm on a spanning tree (deleting any additional arcs that the initial network might have). By considering line graphs, they show that the bound for their algorithm is tight, and also that more sophisticated methods for choosing {A}, such as [5], require a similar {\Omega(n^2 \log(1/\epsilon))} number of steps to converge on these graphs (for both algorithms, the convergence might be better for other types of graphs).

Finally, for time-varying graphs with bounded interconnectivity times as above, the authors show that the convergence time of the averaging algorithm for the equal-neighbor model is not bounded by any polynomial in {n} and {B}. However, they describe a different algorithm, a variation of an old load-balancing algorithm, which converges to an {\epsilon}-neighborhood of the average of the initial values in time {O(B n^3 \log (1/\epsilon))}. This algorithm is not an averaging protocol like the ones described previously; in particular, it is a nonlinear scheme.

That’s it for today. I’ll post more on this topic soon.

References

[1] Nancy A. Lynch. Distributed Algorithms. Morgan Kaufmann, April 1997.

[2] A. Jadbabaie, J. Lin, and A.S. Morse. “Coordination of Groups of Mobile Autonomous Agents Using Nearest Neighbor Rules”. IEEE Transactions on Automatic Control, Vol. 48, No. 6, June 2003.

[3] Alex Olshevsky and John Tsitsiklis, “Convergence Speed in Distributed Consensus and Averaging”, SIAM Journal on Control and Optimization, 48(1), p. 33-55, 2009.

[4] J.N. Tsitsiklis, “Problems in Decentralized Decision Making and Computation”, Ph.D. Thesis, Department of EECS, MIT, 1984.

[5] L. Xiao and S. Boyd, “Fast linear iterations for distributed averaging”, Systems and Control Letters, vol 53, pp.65-78, 2004.


Last week the Year 1 program review meeting for the autonomy center of the MAST project took place here at Penn. This project, for which you can find more information on the ARL and MAST websites, is concerned with the development of teams of small mobile robots (air and ground vehicles) performing various missions for military applications, essentially increasing situational awareness. MAST is a very large project, and is divided into 4 main research areas (or centers) representing different aspects that need to be developed: platform mechanics, microelectronics, processing for autonomous operations, and integration. The goal is to combine the developments of the different centers into one integrated demo by the end of the project.

GRASP is the lead institution for the autonomy center, which also includes participants from the ARL, Georgia Tech, UC Berkeley, UC Merced, University of Mexico, University of Delaware, University of Maryland, MIT, and the University of Sydney in Australia. The researchers span the fields of robotics, control, vision and sensing, and communication systems. A very interesting aspect of the project is that it is bringing these usually separate communities together to work on common problems and combine their knowledge. To take an example, Yasamin Mostofi is cooperating with Bert Tanner to develop motion planning strategies for groups of robots that use realistic models of the wireless communication links between the robots. The wireless communication systems community usually does not consider controlled mobility of the nodes in their models, and typically models the motion as random. The researchers working on motion planning for robotic networks, on the other hand, have proposed models taking into account communication constraints, but usually these communication models are oversimplified. The popular disc model, for example, assumes that the robots can communicate with their neighbors if and only if they are within a prespecified radius, but this does not properly model the complicated wireless interference aspects, especially those arising in indoor environments where multipath fading effects are important. They also want to integrate the motion planning component with the sensing task, for which Songhwai Oh is using environmental models based on Gaussian processes.

Werner Vogels (Amazon.com’s CTO) has an interesting article on his blog about the necessary trade-offs involved in building large reliable distributed databases. It seems that some of these ideas could be useful when thinking about building large sensing networks, for example groups of mobile robots (e.g. UAVs) collecting data on their environments (this is the main focus of the SWARMS and HUNT projects here at Penn). As mobile robotic networks grow in size, some robots will probably have to make decisions without having time to collect all the currently available useful information from all other robots. Moreover, just as with web services, a robotic network is expected to be highly available. Currently, if communication is lost with a UAV for only a few minutes, the UAV is programmed to return to its base automatically. Hence temporary communication loss can mean the end of a critical mission. With a network of robots, we can think of implementing some degree of fault tolerance, so that the service can still perform as expected even if some node or particular communication link fails. It might be useful in these scenarios to think about the fundamental limits to sharing consistent information between decision/sensing nodes.

One of the trade-offs in distributed databases (DBs) is between high availability and data consistency. Database replication techniques aim at achieving consistency across different nodes, but as a system grows in size, availability becomes an issue. In 2000, Eric Brewer conjectured that in a shared-data system, only two of the following three properties can be achieved at the same time:

  • data consistency,
  • system availability,
  • and tolerance to network partitions [1].

Seth Gilbert and Nancy Lynch formalized this conjecture in a 2002 paper [2]. The system responding to client requests is a distributed shared memory. (Atomic) Consistency requires that the requests act as if they were executing on a single node, one at a time. Availability means that every request received by a non-failing node must result in a response (there is no bound on the response time here). To model partition tolerance, the network is allowed to lose arbitrarily many messages sent from one node to another. The first network model used in that paper is the asynchronous network model [3]. That is, there is no clock, and nodes must make decisions based only on the messages received and local computations. Under such difficult operating conditions, the impossibility result is clear. For example, consider a network which is available and tolerant to network partitions. Assume now that it is partitioned into two components G_1 and G_2, i.e., that all messages between the two components are lost. Consider a client that tries to write data in component G_1, and once the data is written, reads it from G_2. Both operations will succeed by our availability assumption. Yet the read cannot return the updated value since no message managed to cross between G_1 and G_2. This violates the atomic consistency property. Moreover, in the asynchronous network model, replacing lost messages by arbitrarily delayed messages does not change the result.
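The read-after-write argument can be made concrete with a toy sketch (entirely mine, not from the paper): two replicas that always respond, while the replication message between them is lost.

```python
# Two always-available replicas; the "network partition" is modeled by simply
# never delivering the replication message from G_1 to G_2.
class Replica:
    def __init__(self):
        self.value = None

    def write(self, v):
        self.value = v      # responds immediately: the node is available
        return "ok"

    def read(self):
        return self.value   # responds immediately, possibly with stale data

g1, g2 = Replica(), Replica()

assert g1.write(42) == "ok"  # the write succeeds in component G_1
stale = g2.read()            # the read in component G_2 also succeeds...
print(stale)                 # ...but returns None, not 42: consistency is lost
```

Both nodes stayed available and tolerated the partition, so atomic consistency had to give.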

Consider now a partially synchronous model, in which each node in the network has a clock. These clocks all increase at the same rate, but are not synchronized (they may give different values at the same real time). They can be used as local timers to measure the time elapsed since a given event. Also assume that messages not delivered within a known time t_{msg} are lost, and that every node processes a received message within a known time t_{local}. Even with this more powerful system, the impossibility result still holds when arbitrary messages may be lost, by an argument similar to the one for the asynchronous model.

Guaranteeing two of the three properties can be achieved by trivial algorithms. The more interesting problem is to guarantee two of the properties while guaranteeing a weaker version of the third. Web caches are one example of available, partition-tolerant, and weakly consistent networks. More generally, for partially synchronous networks, we can design a (centralized) algorithm which is available, partition tolerant, and guarantees a return to consistency within some time limit if no message is lost for a certain time [2] (see also the related notion of eventual consistency discussed by Vogels). In large databases used for web services, network partitions are a fact, and therefore complete consistency and system availability cannot be achieved at the same time in general. If a system emphasizes consistency, it may not always be available to take a write. If it emphasizes availability, a read might not always return the most recently completed write. Depending on the application, one might decide to give priority to one property over the other.

References

  1. Brewer, E. A. 2000. Towards robust distributed systems (abstract). In Proceedings of the 19th Annual ACM Symposium on Principles of Distributed Computing (July 16-19, Portland, Oregon): 7.
  2. Gilbert, S., Lynch, N. 2002. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant Web services. ACM SIGACT News 33(2).
  3. Lynch, N. 1996. Distributed Algorithms. Morgan Kaufmann.

update 04/11/08: An article describing the HUNT workshop at ASU was published by insciences.org

February has been a hectic month and unfortunately I haven’t been able to update this blog as I hoped. But I have several posts in mind and I might be able to finish some of them in the coming weeks.

I’m just returning from a two-day workshop at ASU for the HUNT project (Heterogeneous Unmanned Networked Teams) that I’m involved with. The link will take you to the workshop program. This project is about the control of teams of unmanned vehicles, with a strong focus on designing bio-inspired group behaviors. Accordingly, the first day of the program was reserved for talks from biologists studying lions, monkeys, fish, and ants, among other species.

I have to say first that I’m impressed by the results that have been made possible in robotics thanks to biomimicry. A good example at GRASP is the RHex robot built by the group of Dan Koditschek. But so far, most successes seem related to robot mechanics and hardware technology. For robot AI, although behavior-based robotics is largely inspired by biology (incidentally, Ron Arkin is a PI in HUNT), I haven’t yet found a good example demonstrating that studying animal groups will help us design autonomous robot teams performing complex missions. Hopefully we’ll see one come out of this project. You might ask: what about swarming, flocking, and the related topics that have drawn so much attention in the past decade? Well, I don’t think this addresses the high-level mission planning problems that UAVs will need to solve, such as data collection tasks. But since this might be a bit controversial, I’m planning to write more on this subject. For now, Magnus Egerstedt pointed us to this video, which shows what we can currently do with a good amount of cheating ;-)

Chichen Itza

A good resolution for 2009 would be to start posting content on this blog. It’s likely that I won’t be able to keep that resolution, but let’s give it a try. In December the annual Conference on Decision and Control took place in Cancun, Mexico. This certainly was a nice break, especially since I took a week of vacation after the conference for scuba diving in Cozumel and visiting the area around Chichen Itza.

At the conference, I first attended the semi-plenary talk of Anders Rantzer on distributed control. This has been a topic of interest since at least the 1970s (this year is the 30th anniversary of the special issue of the Transactions on Automatic Control devoted to the subject), with applications to all sorts of networks (power networks, transportation networks, communication networks…) and complex large-scale systems. The slides of the lecture can be found here. He mentioned a distributed linear quadratic control design problem where the gain matrix of the controller is diagonally dominant and its coefficients decrease as the influence of the control input decreases with distance. This reminded me of the recent work of Nader Motee and Ali Jadbabaie here at GRASP, which Rantzer didn’t mention however. The tools for distributed control mentioned in the talk, such as game theory (in particular potential games) and dual decomposition methods, were not surprising, but I found the talk pretty well organized. There was a reference to a book by Arrow, Hurwicz, and Uzawa (“Studies in Linear and Nonlinear Programming”, Stanford University Press, 1958) which sounded interesting and worth checking out. I guess this was aimed at reminding us that these problems have been with us for a long time now. Someone (I believe it was Albert Benveniste) also mentioned that the techniques presented were pretty close to the work of Guy Cohen in the 1970s.

This brings me to the paper presented by my former labmate Parikshit Shah and his advisor Pablo Parrilo from LIDS, entitled “A Partial Order Approach to Decentralized Control”. I found this paper pretty original, and I learned along the way some interesting facts on the stability of sparsity patterns of matrices under various operations, which are standard for combinatorialists but probably not well known among control theorists. Here we consider a system composed of several interacting subsystems, which allows us to partition the global transfer function into several local ones. Communication constraints among subsystems correspond to sparsity patterns in the system and controller transfer functions.

Recall that a partially ordered set (or poset) is a set together with a binary relation which is reflexive, transitive, and antisymmetric. If \mathcal{P} is a poset and R a ring, we define the incidence algebra I_\mathcal{P}(R) of \mathcal{P} over R to be the set of all functions f: \mathcal{P} \times \mathcal{P} \to R with the property that f(x,y)=0 if x \npreceq y. When the poset \mathcal{P} is finite, the functions in the incidence algebra may be thought of as matrices with a specific sparsity pattern given by the order relations of the poset. Accordingly, we define the following operations on I_\mathcal{P}(R): sum and scalar multiplication are defined pointwise as expected, and multiplication follows the matrix multiplication rule f \cdot g \; (x,y) = \sum_{z \in \mathcal{P}} f(x,z) g(z,y). It is easy to see from the properties of the partial order that these operations indeed make I_\mathcal{P}(R) an associative algebra. In particular, if a matrix A \in I_{\mathcal{P}} is invertible (under the correspondence between matrices and functions explained above), then A^{-1} \in I_{\mathcal{P}}.
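A small numerical sanity check of this closure property; the chain poset and the matrices below are my own toy example.

```python
import numpy as np

# For the chain 1 < 2 < 3, the incidence algebra is exactly the set of upper
# triangular 3x3 matrices: f(x, y) = 0 whenever x is not <= y.
def in_algebra(M, tol=1e-10):
    return all(abs(M[i, j]) < tol for i in range(3) for j in range(3) if i > j)

F = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 2.0]])
G = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 4.0],
              [0.0, 0.0, 1.0]])

assert in_algebra(F @ G)             # products stay in the incidence algebra
assert in_algebra(np.linalg.inv(F))  # so do inverses of invertible elements
print("closure holds on this example")
```

The same check works for any finite poset once you fix a linear extension; the chain just makes the sparsity pattern (triangularity) easy to see.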

Now back to control systems. Consider the linear discrete time system

x[k+1]=A x[k] + B_1 w[k] + B_2 u[k]

z[k]=C_1 x[k]+D_{11} w[k] + D_{12} u[k]

y[k]=C_2 x[k] + D_{21} w[k] + D_{22} u[k],

where x is the state, u is the control input, y is the measured output, w is the exogenous disturbance, and z is the error signal. We write the transfer function of the system as

P(z) = \left[\begin{array}{cc} P_{zw} & P_{zu} \\ P_{yw} & P_{yu} \end{array}\right],

and given a linear controller K mapping y to u, the closed-loop system has transfer function

f(P,K) = P_{zw} + P_{zu} K (I - P_{yu} K)^{-1} P_{yw}.   (1)

A general problem we might be interested in is then to minimize \| f(P,K) \|, for an appropriate norm, over the controller transfer functions K (rational proper) such that K stabilizes P and K belongs to some subspace S of the space of all controllers. This last constraint (which makes things hard) is added to capture additional requirements such as designing a decentralized controller. In such generality, it is not known how to solve this problem efficiently. But Rotkowitz and Lall have shown that if the plant P and the subspace S verify a property called quadratic invariance, then we can find K by solving a convex problem (in the Youla parameter). Quadratic invariance is still a pretty abstract notion however. Decentralization requirements translate into S defined by requiring certain sparsity patterns for the controller matrix K. Shah and Parrilo show that if these sparsity patterns are related to a partial order structure, then the problem is quadratically invariant, with some additional simplifications with respect to the general procedure of Rotkowitz and Lall. The partial order structure has a natural interpretation in terms of hierarchical control.

To model communication constraints using posets, we say that for two subsystems i and j we have i \prec j iff j transmits some information to i. More precisely, we impose that if i \npreceq j then we must have the block (P_{yu})_{ij}=0. So if i \prec j there is a directed communication link such that subsystem j communicates its information to subsystem i, but not vice versa. We would like to design a controller that obeys the same communication structure. That is, the local controller at i has access to the output of system j in addition to the output of system i, but the local controller at j does not have access to the output of system i. You can interpret this in terms of the organization of a decision structure by saying that i is a supervisor of j: system j reports to its manager i, but the manager does not inform j of what he is doing…

Now if the communication constraints define a partial order \mathcal P (in general, we start with the partial order to represent the communication constraints), then this means exactly that P_{yu} belongs to the incidence algebra I_\mathcal{P} (the ring R here is the ring of rational proper transfer functions). If we let \mathcal H_{stab} be the set of stable closed-loop transfer matrices achieved by some stabilizing controller, there is a well-known trick that tells us to rewrite (1) in terms of a new parameter, the transfer matrix R=K(I-P_{yu}K)^{-1}. We can then recover K as K=(I+RP_{yu})^{-1}R. Since P_{yu} \in I_\mathcal{P}, we see that R \in I_\mathcal{P} if and only if K \in I_\mathcal{P}, because I_\mathcal{P} is an algebra. We can then rewrite the optimization problem in terms of R, which transforms the problem into a convex problem.
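For intuition, here is a static-gain sanity check of this change of variables, with toy constant matrices of my own choosing standing in for the transfer functions, and the chain 1 < 2 < 3 (hence lower triangular sparsity) as the poset.

```python
import numpy as np

# If P_yu and R both lie in the incidence algebra (here: lower triangular),
# then K = (I + R P_yu)^{-1} R lies in it too, and R = K (I - P_yu K)^{-1}.
def in_algebra(M, tol=1e-10):
    return all(abs(M[i, j]) < tol for i in range(3) for j in range(3) if i < j)

I = np.eye(3)
P_yu = np.array([[0.2, 0.0, 0.0],
                 [0.5, 0.1, 0.0],
                 [0.3, 0.4, 0.2]])
R = np.array([[1.0, 0.0, 0.0],
              [0.7, 0.5, 0.0],
              [0.2, 0.6, 1.5]])

K = np.linalg.inv(I + R @ P_yu) @ R
assert in_algebra(K)                                    # K inherits the sparsity
assert np.allclose(R, K @ np.linalg.inv(I - P_yu @ K))  # the map inverts
print("K respects the communication structure")
```

The closure argument is exactly the one from the incidence algebra section: sums, products, and inverses of lower triangular matrices are lower triangular, so the nonlinear map between K and R preserves the constraint set.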

A bit more detail. First, Shah and Parrilo explain that for the task of designing a stabilizing controller to be even possible, we need the necessary condition that (P_{yu})_{ij} be stable for i \neq j and (P_{yu})_{ii} be stabilizable. Under this condition, we can first reduce the optimization problem to the case where P_{yu} is a triangular matrix (by extending the partial order to a total order) and then construct explicitly a stabilizing controller for the system satisfying the communication constraint. Adding this controller in the feedback loop, we reduce the problem to the case where P_{yu} is stable. In this case, it is known that

\mathcal H_{stab} = \{P_{zw}+P_{zu} R P_{yw} \mid R \text{ stable proper}\}.

Hence we have reduced the decentralized controller design problem to

minimize \| P_{zw}+P_{zu} R P_{yw} \|

subject to R \in I_{\mathcal P}, \text{and } R \text{ stable proper},

which is a convex problem. The paper also mentions how to handle problems where the communication constraints of the controller and plant are different, using Galois connections.

I’ve been thinking that there is a need for more blogs on control theory out there. In fact, I haven’t found one yet (!), in contrast to the many scientific blogs talking about advanced mathematics, physics, and engineering. I’m not sure why this is the case.

I’m just starting as a postdoc at the GRASP Laboratory at Penn. Here is a link to my webpage, still under construction. Actually, I am potentially open to creating a collaborative blog, so you can get in touch with me if you are interested.