I recently sparked an animated little discussion during a research meeting by questioning the usefulness of the averaging protocol for the consensus problem. It turns out that this is a fairly sensitive issue in the control community these days, and pretty much everybody has an opinion on this protocol. First, let me be clear that I was not arguing that consensus (or agreement) problems are unimportant in distributed systems. They are discussed at length in books on real-time systems and distributed algorithms. They occupy several chapters of Nancy Lynch's book [1] for example, where they are defined (p. 81) as problems where processes in a network begin with arbitrary initial values of a particular type, and are supposed to eventually output the same value (they must agree). This output value must be of the same type as the input values and might be subject to additional conditions, such as being equal to the initial value of at least one process. More often, control theorists have been interested in agreeing on the average of the initial values. The field of distributed algorithms in computer science considers consensus problems for various notions of time in networks (synchronous or asynchronous networks), and is particularly concerned with consensus when some of the processes are subject to failure (in particular, Byzantine fault tolerance).

The consensus or agreement problem has generated a huge literature in control theory over the past decade, in particular since the now famous paper of Ali Jadbabaie et al. [2]. However, in contrast to the computer science literature, the problem is often not well defined when studied by control theorists. Fundamentally, the issue probably comes from the fact that in control papers, the processes are typically allowed to communicate real numbers without noise, at least in the basic version of the problem, which is a framework far too powerful from a theoretical computer science point of view (you could code an infinite amount of information in just one such transmission). So what is this control literature about? Well, in the control community, consensus has almost lost its actual meaning and has become synonymous with a particular dynamical scheme where processes organized in a network start with arbitrary initial values and continuously average their current value with the values of their neighbors. Following this update law, the value of each process indeed converges to the average of all initial values in the network, at least in the simplest situations, assuming the local averages are performed adequately.

It is important to remember that this local averaging scheme is just one possible algorithm to reach consensus in a network using local communications, and therefore I will keep using the term *averaging algorithm* instead of consensus when talking about this particular scheme. In this post I give more background and explanations on the consensus problem and averaging protocol. In a follow-up post that I will publish soon, I discuss how the dynamics of this averaging algorithm quickly lead to intriguing and still poorly understood behaviors. These behaviors have in fact attracted the interest of physicists, biologists and economists for example, and from a scientific point of view it is certainly useful to study them. But I'm also planning to discuss the drawbacks and issues that I see impairing the use of the averaging algorithm in real-world engineering applications (there seem to be very few of those actually; the vast majority of papers only report simulation results, where everything is of course simpler ;-)).

**A Tutorial on the Averaging Algorithm.**

I follow here a recent paper by Alex Olshevsky and John Tsitsiklis [3] (what I call the averaging algorithm here is called the "agreement algorithm" in that paper). Alex Olshevsky works on the consensus problem as part of his PhD work, and John Tsitsiklis was one of the first researchers to study the averaging algorithm [4], in particular in asynchronous and time-varying networks.

We consider a set of nodes $N = \{1, \ldots, n\}$. The nodes start with a scalar value $x_i(0)$, and the vector of values for all nodes at (discrete) time $t$ is denoted $x(t)$. The averaging algorithm updates $x(t)$ at each period according to the dynamics $x(t+1) = A(t)\,x(t)$, where $A(t)$ is a matrix with nonnegative entries $a_{ij}(t)$. Moreover, we assume that the row sums of $A(t)$ are equal to $1$, so that $A(t)$ is a stochastic matrix and $x_i(t+1) = \sum_j a_{ij}(t)\,x_j(t)$ is a weighted average of the previous values $x_j(t)$. Intuitively, $a_{ij}(t) > 0$ means that node $j$ communicates its current value to node $i$ at time $t$. We let $E(t)$ be the set of edges $(j, i)$ at time $t$ for which $a_{ij}(t) > 0$, i.e., $E(t)$ represents the communication graph at time $t$.
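As a concrete illustration, here is a minimal sketch of one update step in Python (the three-node line graph and its weights are made up for the example):

```python
import numpy as np

def averaging_step(A, x):
    """One step of the averaging algorithm: x(t+1) = A(t) x(t).

    A must be row stochastic (nonnegative entries, rows summing to 1),
    so each new value is a weighted average of neighboring values.
    """
    A = np.asarray(A, dtype=float)
    assert np.all(A >= 0) and np.allclose(A.sum(axis=1), 1.0)
    return A @ x

# Three nodes on a line graph 0-1-2; weights chosen arbitrarily for the demo.
A = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
x = np.array([0.0, 3.0, 9.0])
print(averaging_step(A, x))  # [1.5, 3.75, 6.0]
```

Each node only needs the current values of its neighbors, which is what makes the scheme local.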

Consider the following assumptions:

- There exists a positive constant $\alpha$ such that $a_{ii}(t) \geq \alpha$ for all $i, t$, and $a_{ij}(t) \in \{0\} \cup [\alpha, 1]$ for all $i, j, t$.
- $\sum_{j=1}^n a_{ij}(t) = 1$ for all $i, t$.
- (bounded interconnectivity times) There is some $B$ such that for all $t$, the graph $\big(N, E(t) \cup E(t+1) \cup \cdots \cup E(t+B-1)\big)$ is strongly connected.

Then under these assumptions, the averaging algorithm guarantees asymptotic consensus, that is, there is a constant $c$ (depending on $x(0)$ and the sequence of communication graphs) such that $\lim_{t \to \infty} x_i(t) = c$ for all $i$. See [3] for the references of this result. In the simplest case, the communication graph is fixed ($E(t) = E$ for all $t$) and the matrix $A(t)$ is assumed constant ($A(t) = A$ for all $t$). If $A$ is a doubly stochastic matrix (i.e., each column of $A$ also sums to $1$), then we know that the value $c$ is equal to the average of the initial values, $c = \frac{1}{n} \sum_{i=1}^n x_i(0)$.
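The fixed, doubly stochastic case is easy to check numerically. A small sketch (the four-node ring and its weights are hypothetical choices for illustration):

```python
import numpy as np

# A doubly stochastic matrix on a 4-node ring with self-loops:
# rows AND columns sum to 1, so iterating converges to the average.
A = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])

x = np.array([1.0, 5.0, 2.0, 8.0])  # initial values; their average is 4.0

for _ in range(200):
    x = A @ x  # x(t+1) = A x(t)

print(x)  # every entry is now (numerically) the average 4.0
```

The self-loop weights on the diagonal make the chain aperiodic, which is what the constant $\alpha$ in the first assumption enforces.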

Note that the averaging algorithm requires a priori that a central designer chooses the matrix $A$ and communicates the weights $a_{ij}$ to the proper nodes, in order that $A$ be stochastic or doubly stochastic. We can improve on this point using ideas from random walks on graphs. Node $i$ can set $a_{ij} = 1/d_i$ for $j$ a neighbor of $i$, where $d_i$ is the number of neighbors of $i$ in the (constant) communication graph, including $i$ itself. It sets $a_{ij} = 0$ if $j$ is not a neighbor of $i$. Then $A$ is a stochastic matrix which is irreducible and aperiodic under the previous assumptions, with steady-state probabilities $\pi_i = d_i / \sum_j d_j$. This is called *the equal-neighbor model*. The paper [3] describes a variation of the averaging algorithm which converges to the average of the initial values and does not require any central coordination. Each node $i$ sets $y_i(0) = 1/d_i$, $z_i(0) = x_i(0)/d_i$, and runs in parallel the algorithms $y(t+1) = A\,y(t)$ and $z(t+1) = A\,z(t)$. Then each node sets $x_i(t) = z_i(t)/y_i(t)$, and we have $\lim_{t \to \infty} x_i(t) = \frac{1}{n} \sum_{j=1}^n x_j(0)$. The main issue with this algorithm is that it takes $O(n^3 \log(n/\epsilon))$ steps for $x_i(t)$ to be $\epsilon$-close to its limit value. The $n^3$ term in particular is quite bad since in practice the algorithm is supposed to run on a large number of nodes.
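The two-parallel-runs trick can be sketched as follows (the four-node line graph is a made-up example, and `equal_neighbor_matrix` is a helper name I introduce for illustration):

```python
import numpy as np

def equal_neighbor_matrix(adj):
    """Equal-neighbor weights: a_ij = 1/d_i for each neighbor j of i
    (counting i itself), and 0 otherwise."""
    n = len(adj)
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = [i] + list(adj[i])      # include the node itself
        for j in nbrs:
            A[i, j] = 1.0 / len(nbrs)
    return A

# Undirected line graph 0-1-2-3, given as adjacency lists.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
A = equal_neighbor_matrix(adj)

x0 = np.array([2.0, 4.0, 6.0, 12.0])  # initial values; average is 6.0
d = np.array([len(adj[i]) + 1 for i in range(4)], dtype=float)

# Run the two averaging algorithms in parallel, then take their ratio.
y = 1.0 / d    # y_i(0) = 1/d_i
z = x0 / d     # z_i(0) = x_i(0)/d_i
for _ in range(500):
    y, z = A @ y, A @ z

print(z / y)   # each entry approaches the average of x0
```

The point of the scaling is that both $y$ and $z$ converge to multiples of the steady-state probabilities, and the degree bias $\pi_i \propto d_i$ cancels in the ratio.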

If we allow a central designer to choose the matrix $A$, then we can improve the convergence speed of the algorithm significantly. In general, any matrix $A$ (not even necessarily nonnegative) such that the iterations $x(t+1) = A\,x(t)$ converge to a vector of the form $c\,\mathbf{1}$ can be used to obtain consensus, with additional conditions needed in order for $c$ to be the actual average of the initial values. In particular, we can simply consider matrices $A$ such that $1$ is an eigenvalue of $A$ with multiplicity $1$, the corresponding left eigenvector has nonzero entries, and all other eigenvalues have magnitude less than $1$. The paper [3] explains that we can easily construct $A$ in such a way that a consensus value is approached at an arbitrarily fast rate. However, this scheme seems numerically impractical and moreover forces convergence to the initial value of one of the nodes, not the average value. Taking these points into account, Olshevsky and Tsitsiklis describe an averaging algorithm which converges to the average value in time $O(n^2 \log(n/\epsilon))$, essentially by executing the algorithm on a spanning tree (deleting any additional arcs that the initial network might have). By considering line graphs, they show that the bound for their algorithm is tight, and also that more sophisticated methods for choosing $A$, such as [5], take a similar number of steps to converge on these graphs (for both algorithms, the convergence might be better for other types of graphs).
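One way to see why line graphs are a worst case: the asymptotic convergence rate of $x(t+1) = A\,x(t)$ is governed by the second-largest eigenvalue modulus of $A$, and for the equal-neighbor matrix of a line graph the spectral gap closes as $n$ grows (roughly like $1/n^2$), so the number of steps needed grows accordingly. A small numerical sketch (the helper name is mine):

```python
import numpy as np

def equal_neighbor_line(n):
    """Equal-neighbor matrix of the line graph on n nodes (self-loops included)."""
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = [j for j in (i - 1, i, i + 1) if 0 <= j < n]
        A[i, nbrs] = 1.0 / len(nbrs)
    return A

for n in (10, 20, 40):
    moduli = np.sort(np.abs(np.linalg.eigvals(equal_neighbor_line(n))))
    gap = 1.0 - moduli[-2]  # spectral gap: 1 minus second-largest modulus
    print(n, gap)            # the gap shrinks as n grows
```

A smaller gap means more iterations are needed to reach a given accuracy, which is the mechanism behind the quadratic lower bound on line graphs.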

Finally, for time-varying graphs with bounded interconnectivity times as above, the authors show that the convergence time of the averaging algorithm for the equal-neighbor model is not bounded by a polynomial in $n$ and $B$. However, they describe a different algorithm, a variation of an old load-balancing algorithm, which converges to an $\epsilon$-neighborhood of the average of the initial values in time $O(n^2 B \log(1/\epsilon))$. This algorithm is not an averaging protocol like the ones described previously; in particular, it is a nonlinear scheme.

That’s it for today. I’ll post more on this topic soon.

**References**

[1] Nancy A. Lynch. *Distributed Algorithms.* Morgan Kaufmann, April 1997.

[2] A. Jadbabaie, J. Lin, and A.S. Morse. “Coordination of Groups of Mobile Autonomous Agents Using Nearest Neighbor Rules”. *IEEE Transactions on Automatic Control*, Vol. 48, No. 6, June 2003.

[3] Alex Olshevsky and John Tsitsiklis, "Convergence Speed in Distributed Consensus and Averaging", *SIAM Journal on Control and Optimization*, 48(1), pp. 33-55, 2009.

[4] J.N. Tsitsiklis, “Problems in Decentralized Decision Making and Computation”, Ph.D. Thesis, Department of EECS, MIT, 1984.

[5] L. Xiao and S. Boyd, “Fast linear iterations for distributed averaging”, *Systems and Control Letters*, vol 53, pp.65-78, 2004.