Robust Exponential Memory in Hopfield Networks

The Hopfield recurrent neural network is a classical auto-associative model of memory, in which collections of symmetrically coupled McCulloch–Pitts binary neurons interact to perform emergent computation. Although previous researchers have explored the potential of this network to solve combinatorial optimization problems or store recurring activity patterns as attractors of its deterministic dynamics, a basic open problem is to design a family of Hopfield networks with a number of noise-tolerant memories that grows exponentially with neural population size. Here, we discover such networks by minimizing probability flow, a recently proposed objective for estimating parameters in discrete maximum entropy models. By descending the gradient of the convex probability flow, our networks adapt synaptic weights to achieve robust exponential storage, even when presented with vanishingly small numbers of training patterns. In addition to providing a new set of low-density error-correcting codes that achieve Shannon’s noisy channel bound, these networks also efficiently solve a variant of the hidden clique problem in computer science, opening new avenues for real-world applications of computational models originating from biology.


Background
The underlying probabilistic model of data in the Hopfield network is the nonferromagnetic Lenz-Ising model [16] from statistical physics, more generally called a Markov random field in the literature, and the model distribution in a fully observable Boltzmann machine [17] from artificial intelligence. The states of this discrete distribution are length-$n$ binary column vectors $x = (x_1, \ldots, x_n) \in \{0,1\}^n$, each having probability $p_x := \frac{1}{Z}\exp(-E_x)$, in which $E_x := -\frac{1}{2}x^\top W x + \theta^\top x$ is the energy of a state, $W$ is an $n$-by-$n$ real symmetric matrix with zero diagonal (the weight matrix), the vector $\theta \in \mathbb{R}^n$ is a threshold term, and $Z := \sum_x \exp(-E_x)$ is the partition function, the normalizing factor ensuring that $p_x$ represents a probability distribution. In theoretical neuroscience, rows $W_e$ of the matrix $W$ are interpreted as abstract "synaptic" weights $W_{ef}$ connecting neuron $e$ to other neurons $f$.
Fig. 1 Energy landscape and discrete dynamics in a Hopfield network having robust storage of all 4-cliques in graphs on 8 vertices. The deterministic network dynamics sends three corrupted cliques to graphs with smaller energy, converging on the underlying 4-clique attractors.
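To make these definitions concrete, the following brute-force sketch (assuming NumPy; the helper names `energy` and `boltzmann` are ours, not the paper's) evaluates $E_x$ and $p_x$ for a random network on $n = 4$ neurons. The explicit sum defining $Z$ runs over all $2^n$ states, which is exactly the cost that becomes infeasible for large $n$.

```python
import itertools

import numpy as np


def energy(x, W, theta):
    """Energy E_x = -1/2 x^T W x + theta^T x of a binary state x."""
    return -0.5 * x @ W @ x + theta @ x


def boltzmann(W, theta):
    """Enumerate all 2^n states; return them with probabilities p_x = exp(-E_x) / Z."""
    n = len(theta)
    states = np.array(list(itertools.product([0.0, 1.0], repeat=n)))
    weights = np.exp(-np.array([energy(x, W, theta) for x in states]))
    Z = weights.sum()  # partition function: a sum over all 2^n states
    return states, weights / Z


rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))
W = (A + A.T) / 2          # real symmetric weight matrix ...
np.fill_diagonal(W, 0.0)   # ... with zero diagonal
theta = rng.normal(size=n)

states, probs = boltzmann(W, theta)
print(len(states), probs.sum())  # 16 states; probabilities sum to 1 up to rounding
```

The most probable state is the one of lowest energy, which is why attractors of the energy-decreasing dynamics are natural candidates for memories.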
The pair $(W, \theta)$ determines an asynchronous deterministic ("zero-temperature") dynamics on states $x$ by replacing each $x_e$ in $x$ with the value
$$x_e = \begin{cases} 1 & \text{if } \sum_{f \neq e} W_{ef}\, x_f > \theta_e, \\ 0 & \text{otherwise,} \end{cases} \qquad (2)$$
in a fixed order (usually initialized randomly) through all neurons $e = 1, \ldots, n$. The quantity $I_e := \langle W_e, x \rangle$ in (2) is often called the feedforward input to neuron $e$ and may be computed by linearly combining input signals from neurons with connections to $e$.
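A minimal sketch of the asynchronous update (2), assuming NumPy and a random symmetric, zero-diagonal network (the function name `async_dynamics` and the sizes are illustrative). It records the energy after every single-neuron update, so one can check numerically that the dynamics never increases energy and stops at a fixed point.

```python
import numpy as np


def energy(x, W, theta):
    return -0.5 * x @ W @ x + theta @ x


def async_dynamics(x, W, theta, order, max_sweeps=100):
    """Apply update (2) neuron by neuron in the given fixed order until a full sweep
    changes nothing. Returns the final state and the energy after every update."""
    x = x.copy()
    trace = [energy(x, W, theta)]
    for _ in range(max_sweeps):
        changed = False
        for e in order:
            I_e = W[e] @ x                       # feedforward input; W[e, e] = 0
            new = 1.0 if I_e > theta[e] else 0.0
            changed = changed or new != x[e]
            x[e] = new
            trace.append(energy(x, W, theta))
        if not changed:
            break
    return x, np.array(trace)


rng = np.random.default_rng(1)
n = 30
A = rng.normal(size=(n, n))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)
theta = rng.normal(size=n)
order = rng.permutation(n)                       # fixed order, initialized randomly
x0 = (rng.random(n) < 0.5).astype(float)

x_star, trace = async_dynamics(x0, W, theta, order)
print("energy never increased:", bool(np.all(np.diff(trace) <= 1e-12)))
```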
Let $\Delta E_e$ (resp. $\Delta x_e = \pm 1, 0$) be the energy (resp. bit) change when applying (2) at neuron $e$. The relationship
$$\Delta E_e = -\Delta x_e\,(I_e - \theta_e) \qquad (3)$$
guarantees that network dynamics does not increase energy. Thus, each initial state $x$ will converge in a finite number of steps to its attractor $x^*$ (also called in the literature fixed-point, memory, or metastable state); e.g., see Fig. 1. The biological plausibility and potential computational power [18] of the dynamics update (2) inspired both early computer [19] and neural network architectures [4,20]. We next formalize the notion of robust fixed-point attractor storage for families of Hopfield networks. For $p \in [0, \frac{1}{2}]$, the $p$-corruption of $x$ is the random pattern $x_p$ obtained by replacing each $x_e$ by $1 - x_e$ with probability $p$, independently. The $p$-corruption of a state differs from the original by $pn$ bit flips on average, so that for larger $p$ it is more difficult to recover the original binary pattern; in particular, $x_{1/2}$ is the uniform distribution on $\{0,1\}^n$ (and thus independent of $x$). Given a Hopfield network, the attractor $x^*$ has $(1-\varepsilon)$-tolerance for a $p$-corruption if the dynamics can recover $x^*$ from $(x^*)_p$ with probability at least $1 - \varepsilon$. The $\alpha$-robustness $\alpha(X, \varepsilon)$ for a set of states $X$ is the largest $p$ such that every state in $X$ has $(1-\varepsilon)$-tolerance for a $p$-corruption.
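The $p$-corruption and the tolerance notion can be estimated by simulation. The sketch below is illustrative in its details: it assumes NumPy and uses the sparse network $(x, y, z) = (0.3, 0, 1)$ on $v = 8$ vertices (a 4-clique edge then has input $4 \cdot 0.3 > 1$, while an edge meeting the clique in one vertex has input $3 \cdot 0.3 < 1$, so every 4-clique is a fixed point), and estimates the recovery probability of one 4-clique by Monte Carlo. Such finite-$n$ estimates illustrate $(1-\varepsilon)$-tolerance; they are not the limiting robustness index.

```python
import itertools

import numpy as np


def p_corrupt(x, p, rng):
    """The p-corruption x_p: flip each bit of x independently with probability p."""
    return np.where(rng.random(x.shape) < p, 1.0 - x, x)


def converge(x, W, theta, max_sweeps=100):
    """Run asynchronous dynamics (2) in the order e = 1, ..., n until nothing changes."""
    x = x.copy()
    for _ in range(max_sweeps):
        before = x.copy()
        for e in range(len(x)):
            x[e] = 1.0 if W[e] @ x > theta[e] else 0.0
        if np.array_equal(x, before):
            break
    return x


def recovery_rate(x_star, W, theta, p, trials, rng):
    """Monte Carlo estimate of the probability of recovering x_star from (x_star)_p."""
    hits = sum(np.array_equal(converge(p_corrupt(x_star, p, rng), W, theta), x_star)
               for _ in range(trials))
    return hits / trials


# Illustrative (x, y, z) = (0.3, 0, 1) network on v = 8 vertices (n = 28 neurons).
v = 8
edges = list(itertools.combinations(range(v), 2))
W = np.array([[0.3 if len(set(e) & set(f)) == 1 else 0.0 for f in edges] for e in edges])
theta = np.ones(len(edges))
clique = np.array([1.0 if set(e) <= {0, 1, 2, 3} else 0.0 for e in edges])

rng = np.random.default_rng(0)
for p in (0.0, 0.02, 0.05, 0.1):
    print(p, recovery_rate(clique, W, theta, p, trials=200, rng=rng))
```

At this tiny size a handful of flips can already push the state into another basin, so the estimated rates should be expected to fall quickly as $p$ grows.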
At last, we say that a sequence of Hopfield networks $H_n$ robustly stores states $X_n$ with robustness index $\alpha > 0$ if the following limit exists and equals the number $\alpha$:
$$\lim_{\varepsilon \to 0^+} \liminf_{n \to \infty} \alpha(X_n, \varepsilon) = \alpha. \qquad (4)$$
If $\alpha$ is the robustness index of a family of networks, then the chance that dynamics does not recover an $\alpha$-corrupted memory can be made as small as desired by devoting more neurons. (Note that by definition, we always have $\alpha \le 1/2$.) To determine parameters $(W, \theta)$ in our networks from a set of training patterns $X \subseteq \{0,1\}^n$, we minimize the following probability flow objective function [14,15]:
$$J(W, \theta) := \frac{1}{|X|} \sum_{x \in X} \sum_{x' \in \mathcal{N}(x)} \exp\left(\frac{E_x - E_{x'}}{2}\right), \qquad (5)$$
in which $\mathcal{N}(x)$ are those neighboring states $x'$ differing from $x$ by a single flipped bit. It is elementary that a Hopfield network has attractors $X$ if and only if the probability flow (5) can be arbitrarily close to zero, motivating the application of minimizing (5) to find such networks [15]. Importantly, the probability flow is a convex function of the parameters, consists of a number of terms linear in $n$ and the size of $X$, and avoids the exponentially large partition function $Z$. We remark that the factor of $\frac{1}{2}$ inside of the exponential in (5) will turn out to be unimportant for our analysis; however, we keep it to be consistent with the previous literature on interpreting (5) as a probability density estimation objective.
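As a sketch of objective (5) (assuming NumPy; the names are ours), the function below evaluates it using relation (3): flipping neuron $e$ changes the energy by $-(1-2x_e)(I_e - \theta_e)$, so no partition function is needed. Scaling the illustrative network $(x, y, z) = (0.3, 0, 1)$, which has all 4-cliques on 8 vertices as fixed points, by a factor $c$ drives the flow toward zero, in line with the "if" direction of the statement above.

```python
import itertools

import numpy as np


def probability_flow(X, W, theta):
    """Objective (5): average over x in X of sum over one-bit flips x' of exp((E_x - E_x')/2)."""
    D = 1.0 - 2.0 * X                          # bit change when neuron e is flipped
    # By (3), E_x - E_x' = D_e * (I_e - theta_e) for the flip of neuron e.
    return np.exp(0.5 * D * (X @ W - theta)).sum(axis=1).mean()


# Illustrative network (x, y, z) = (0.3, 0, 1) on v = 8 vertices, n = 28 neurons.
v, k = 8, 4
edges = list(itertools.combinations(range(v), 2))
W = np.array([[0.3 if len(set(e) & set(f)) == 1 else 0.0 for f in edges] for e in edges])
theta = np.ones(len(edges))
X = np.array([[1.0 if set(e) <= set(S) else 0.0 for e in edges]
              for S in itertools.combinations(range(v), k)])   # all 70 4-cliques

flows = [probability_flow(X, c * W, c * theta) for c in (1, 10, 100, 1000)]
print(flows)
```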
Let $v$ be a positive integer and set $n = \frac{v(v-1)}{2}$. A state $x$ in a Hopfield network on $n$ nodes represents a simple undirected graph $G$ on $v$ vertices by interpreting a binary entry $x_e$ in $x$ as indicating whether edge $e$ is in $G$ ($x_e = 1$) or not ($x_e = 0$). A $k$-clique is a graph consisting of $k$ fully connected nodes and $v - k$ other isolated nodes. Below, in Sect. 3, we will design Hopfield networks that have all $k$-cliques on $2k$ (or $2k - 2$) vertices as robustly stored memories. By Stirling's approximation, the number of $k$-cliques on $2k$ vertices satisfies
$$\binom{2k}{k} \sim \frac{4^k}{\sqrt{\pi k}}. \qquad (1)$$
Since $n = k(2k-1)$ when $v = 2k$, we have $2k = \sqrt{2n} + O(1)$, so for large $n$ this count is $2^{\sqrt{2n} - O(\log n)}$, exponential in $\sqrt{n}$. Figure 1 depicts a network with $n = 28$ neurons storing 4-cliques in graphs on $v = 8$ vertices.
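The counts behind this encoding are easy to check. This standard-library sketch (function names ours) computes the number of neurons $n = v(v-1)/2$, reproduces the Fig. 1 sizes, and compares $\binom{2k}{k}$ with the Stirling approximation $4^k/\sqrt{\pi k}$.

```python
import math


def num_neurons(v):
    """One neuron per possible edge of a graph on v vertices."""
    return v * (v - 1) // 2


def stirling(k):
    """Leading-order approximation of binom(2k, k) from Stirling's formula."""
    return 4.0**k / math.sqrt(math.pi * k)


print(num_neurons(8), math.comb(8, 4))           # Fig. 1: 28 neurons, 70 4-cliques
for k in (4, 16, 64):
    exact = math.comb(2 * k, k)
    print(k, num_neurons(2 * k), exact, exact / stirling(k))
```

The ratio in the last column approaches 1 as $k$ grows (the relative error is roughly $1/(8k)$).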

Results
Our first result is that numerical minimization of probability flow over a vanishingly small critical number of training cliques determines linear threshold networks with exponential attractor memory. We fit all-to-all connected networks on $n = 3160, 2016, 1128$ neurons ($v = 80, 64, 48$; $k = 40, 32, 24$) with increasing numbers of randomly generated $k$-cliques as training data $X$ by minimizing (5) with the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm [21] (as implemented in the Python package SciPy). In Fig. 2, we plot the percentage of 1000 random new $k$-cliques that are fixed-points in these networks after training as a function of the ratio of training set size to total number of $k$-cliques. Each triangle in the figure represents the average of this fraction over 50 networks, each given the same number of randomly generated (but different) training data. The finding is that a critical number of training samples allows for storage of all $k$-cliques. Moreover, this count is much smaller than the total number of patterns to be learned.
Fig. 3 […] (b) Histograms of weight and threshold parameters for networks in (a) (histogram of thresholds θ in inset).
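To illustrate the procedure at toy scale, here is a sketch of fitting (5) with SciPy's `minimize` using `method="L-BFGS-B"` (with no bounds given, this is the limited-memory BFGS method). It is not the paper's code: $v = 8$, $k = 4$, the 20 random training cliques, and the helper names are our own choices, and $W$ is parametrized by its upper triangle so it stays symmetric with zero diagonal. The gradient follows from (3): flipping neuron $e$ gives $E_x - E_{x'} = (1 - 2x_e)(I_e - \theta_e)$, and a shared weight $W_{ef} = W_{fe}$ enters both $I_e$ and $I_f$.

```python
import itertools

import numpy as np
from scipy.optimize import minimize


def clique_patterns(v, k):
    """All k-cliques on v vertices as 0/1 vectors over the v(v-1)/2 possible edges."""
    edges = list(itertools.combinations(range(v), 2))
    index = {e: a for a, e in enumerate(edges)}
    X = []
    for S in itertools.combinations(range(v), k):
        row = np.zeros(len(edges))
        row[[index[e] for e in itertools.combinations(S, 2)]] = 1.0
        X.append(row)
    return np.array(X)


def unpack(params, n):
    """Flat parameters -> (symmetric zero-diagonal W, thresholds theta)."""
    iu = np.triu_indices(n, 1)
    W = np.zeros((n, n))
    W[iu] = params[: len(iu[0])]
    return W + W.T, params[len(iu[0]):]


def flow_and_grad(params, X):
    """Probability flow (5) and its gradient in the flat parametrization."""
    m, n = X.shape
    W, theta = unpack(params, n)
    D = 1.0 - 2.0 * X                           # bit change under each single flip
    F = np.exp(0.5 * D * (X @ W - theta))       # exp((E_x - E_x')/2)
    G = 0.5 * D * F / m                         # dJ / d(I_e - theta_e)
    H = X.T @ G                                 # H[f, e] = sum over patterns of x_f * G_e
    grad_W = (H + H.T)[np.triu_indices(n, 1)]   # W_ef = W_fe feeds both I_e and I_f
    return F.sum() / m, np.concatenate([grad_W, -G.sum(axis=0)])


def is_fixed(X, W, theta):
    """Which rows of X are fixed points of dynamics (2)."""
    return np.all((X @ W - theta > 0) == (X > 0.5), axis=1)


X_all = clique_patterns(8, 4)                   # 70 cliques, n = 28 neurons
n = X_all.shape[1]
rng = np.random.default_rng(0)
X_train = X_all[rng.choice(len(X_all), size=20, replace=False)]

res = minimize(flow_and_grad, np.zeros(n * (n - 1) // 2 + n), args=(X_train,),
               jac=True, method="L-BFGS-B")
W, theta = unpack(res.x, n)
print("probability flow:", res.fun)
print("training cliques stored:", int(is_fixed(X_train, W, theta).sum()), "of", len(X_train))
print("all 4-cliques stored:", int(is_fixed(X_all, W, theta).sum()), "of", len(X_all))
```

How many of the 70 cliques end up as fixed points depends on the random training draw; the experiments described above use far larger networks.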
In Fig. 3(a), we display a portion of the weight matrix with minimum probability flow representing a $v = 80$ network (4,994,380 weight and threshold parameters) given 100 (a fraction ≈1e−21 of all 40-cliques), 1000 (≈1e−20), or 10,000 (≈1e−19) randomly generated 40-cliques as training data; these are the three starred points in Fig. 2. In Fig. 3(b), we also plot histograms of learned parameters from networks trained on data with these three sample sizes. The finding is that weights and thresholds become highly peaked and symmetric about three limiting quantities as sample size increases.
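A quick check of the arithmetic in this paragraph (standard-library Python; the variable names are ours): the parameter count of a symmetric zero-diagonal $W$ plus thresholds for $v = 80$, and the share of all $\binom{80}{40}$ cliques represented by each training-set size, expressed as a fraction.

```python
import math

v, k = 80, 40
n = v * (v - 1) // 2                  # neurons = possible edges
num_params = n * (n - 1) // 2 + n     # upper triangle of symmetric W (zero diagonal) + thresholds
total = math.comb(v, k)               # number of 40-cliques on 80 vertices

print(n, num_params)                  # 3160 4994380
for m in (100, 1000, 10_000):
    print(m, m / total)               # fraction of all 40-cliques used for training
```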
We next analytically minimize probability flow to determine explicit networks achieving robust exponential storage. To simplify matters, we first observe by a symmetrizing argument (see Sect. 5) that there is a network storing all $k$-cliques if and only if there is one with constant threshold $\theta = (z, \ldots, z) \in \mathbb{R}^n$ and satisfying, for each pair $e \neq f$, either $W_{ef} = x$ (whenever $e$ and $f$ share one vertex) or $W_{ef} = y$ (when $e$ and $f$ are disjoint). Weight matrices approximating this symmetry can be seen in Fig. 3(a). (Note that this symmetry structure on the weights is independent of clique size $k$.) In this case, the energy of a graph $G$ with $\#E(G)$ edges is the following linear function of $(x, y, z) \in \mathbb{R}^3$:
$$E_G(x, y, z) = -x\,S_1(G) - y\,S_0(G) + z\,\#E(G), \qquad (6)$$
in which $S_1(G)$ and $S_0(G)$ are the number of edge pairs in the graph $G$ with exactly one or zero shared vertices, respectively. Consider the minimization of (5) over a training set $X$ consisting of all $\binom{v}{k}$ $k$-cliques on $v = 2k - 2$ vertices (this simplifies the mathematics), restricting networks to our 3-parameter family $(x, y, z)$. When $y = 0$, these networks are sparsely connected, having a vanishing number of connections between neurons relative to total population size. Using single variable calculus and Eq. (6), one can check that, for any fixed positive threshold $z$, the minimum value of (5) is achieved uniquely at the parameter setting $(x, 0, z)$, where
$$x = \frac{2z}{3k - 5}. \qquad (7)$$
This elementary calculation gives our first main theoretical contribution.
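A sketch checking both the energy formula (6) and the optimum (7), assuming NumPy and SciPy's `minimize_scalar`; the helper names, $v = 7$ for the energy check, and $k = 10$, $z = 1$ for the minimization are arbitrary illustrative choices. For the second part, every clique contributes the same terms to (5) by symmetry, so it suffices to sum over the single-bit flips of one clique: removing one of its $\binom{k}{2}$ edges (each with $2(k-2)$ neighbors), adding one of the $k(v-k)$ edges that meet it in one vertex (each with $k-1$ neighbors), or adding one of the $\binom{v-k}{2}$ disjoint edges.

```python
import itertools
import math

import numpy as np
from scipy.optimize import minimize_scalar


def three_param_weights(edges, x, y):
    """W_ef = x if edges e, f share one vertex, y if they are disjoint (zero diagonal)."""
    W = np.zeros((len(edges), len(edges)))
    for a, b in itertools.combinations(range(len(edges)), 2):
        shared = len(set(edges[a]) & set(edges[b]))
        W[a, b] = W[b, a] = x if shared == 1 else y
    return W


# 1) Check the linear energy formula (6) on a random graph.
v, (x, y, z) = 7, (0.3, -0.1, 1.0)
edges = list(itertools.combinations(range(v), 2))
W, theta = three_param_weights(edges, x, y), np.full(len(edges), z)
g = (np.random.default_rng(1).random(len(edges)) < 0.5).astype(float)
present = [e for e, bit in zip(edges, g) if bit]
pairs = list(itertools.combinations(present, 2))
S1 = sum(len(set(e) & set(f)) == 1 for e, f in pairs)
S0 = sum(len(set(e) & set(f)) == 0 for e, f in pairs)
E_direct = -0.5 * g @ W @ g + theta @ g
E_formula = -x * S1 - y * S0 + z * len(present)
print(E_direct, E_formula)


# 2) Objective (5) on all k-cliques with v = 2k - 2, restricted to (x, 0, z).
def flow(xx, k, z):
    v = 2 * k - 2
    drop = math.comb(k, 2) * math.exp((z - 2 * (k - 2) * xx) / 2)  # remove a clique edge
    touch = k * (v - k) * math.exp(((k - 1) * xx - z) / 2)        # add edge meeting the clique once
    far = math.comb(v - k, 2) * math.exp(-z / 2)                  # add edge disjoint from the clique
    return drop + touch + far


k = 10
res = minimize_scalar(flow, bounds=(0.0, z), args=(k, z), method="bounded")
print(res.x, 2 * z / (3 * k - 5))   # numerical minimizer vs. Eq. (7)
```

With $v = 2k - 2$ the first two terms carry the same coefficient $k(k-1)(k-2)/2$, so setting the derivative to zero reduces to $z - 2(k-2)x = (k-1)x - z$, which is (7).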

Theorem 1 McCulloch-Pitts attractor networks minimizing probability flow can achieve robust exponential pattern storage.
We prove Theorem 1 using the following large deviation theory argument; this approach also allows us to design networks achieving optimal robustness index α = 1/2 (Theorem 2). Fix v = 2k (or v = 2k − 2) and consider a p-corrupted clique. Using Bernstein's concentration inequality for sums of Bernoulli binary random variables [22] ("coin flips"), it can be shown that with high probability (i.e., approaching 1 as v → ∞) an edge in the clique has, on average, at least 2k neighboring edges (see Corollary 1).
This gives the fixed-point requirement from (2):
$$2k\,x + o(k)\,x > z.$$
On the other hand, a non-clique edge sharing a vertex with the clique has at most $k(1 + 2p)$ neighbors, on average. Therefore, for a $k$-clique to be a robust fixed-point, this forces again from (2):
$$k(1 + 2p)\,x + o(k)\,x < z,$$
and any other edges will disappear when this holds. ($o(\cdot)$ is "little-o" notation.) It follows that the optimal setting (7) for $x$ minimizing probability flow gives robust storage (with a single parallel dynamics update) of all $k$-cliques for $p < 1/4$: since (7) gives $kx \to 2z/3$, the first requirement becomes $4z/3 > z$, while the second becomes $2(1 + 2p)z/3 < z$, that is, $p < 1/4$. This proves Theorem 1 (see Sect. 5 for the full mathematical details).
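The single-update argument can be probed numerically. This sketch rests on assumptions of ours (NumPy, $k = 64$ and $v = 2k$ as in Fig. 4, one synchronous update, and our own trial counts) and exploits $y = 0$: the input to edge $ij$ is $x$ times $\deg(i) + \deg(j) - 2A_{ij}$, the number of present edges sharing exactly one endpoint with $ij$, so the $n \times n$ weight matrix never has to be formed. Because the $o(k)$ terms are still sizable at $k = 64$, the one-step recovery rate should be expected to drop well before $p = 1/4$.

```python
import numpy as np


def corrupted_clique(v, k, p, rng):
    """Adjacency matrix of the k-clique on vertices 0..k-1, with each of the
    v(v-1)/2 edge bits flipped independently with probability p."""
    clean = np.zeros((v, v), dtype=int)
    clean[:k, :k] = 1
    np.fill_diagonal(clean, 0)
    flips = np.triu(rng.random((v, v)) < p, 1)
    flips = (flips | flips.T).astype(int)
    return clean ^ flips, clean


def parallel_update(A, x, z):
    """One synchronous step of (2) for the (x, 0, z) network: with y = 0 the input to
    edge ij is x times the number of present edges sharing exactly one vertex with ij."""
    deg = A.sum(axis=1)
    neighbors = deg[:, None] + deg[None, :] - 2 * A
    B = (x * neighbors > z).astype(int)
    np.fill_diagonal(B, 0)
    return B


k, z = 64, 1.0
v = 2 * k
x = 2 * z / (3 * k - 5)                   # MPF optimum (7)
rng = np.random.default_rng(0)
rates = {}
for p in (0.05, 0.1, 0.2):
    trials, hits = 20, 0
    for _ in range(trials):
        noisy, clean = corrupted_clique(v, k, p, rng)
        hits += np.array_equal(parallel_update(noisy, x, z), clean)
    rates[p] = hits / trials
print(rates)
```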
It is possible to do better than robustness index $\alpha = 1/4$ by setting
$$x = \frac{1}{2}\left[\frac{z}{2k} + \frac{z}{k(1+2p)}\right] = \frac{(3 + 2p)\,z}{4k(1 + 2p)},$$
which satisfies the above fixed-point requirements with probability approaching 1 for any fixed $p < 1/2$ and increasing $k$. We have thus also demonstrated:
Theorem 2 There is a family of Hopfield networks on $n = \binom{2k}{2}$ nodes that robustly store the $\binom{2k}{k}$ $k$-cliques with robustness index $\alpha = 1/2$.
In Fig. 4, we show robust storage of the ($\approx 10^{37}$) 64-cliques in graphs on 128 vertices using three $(x, y, z)$ parameter specializations designed here.
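For orientation, the two sparse settings can be compared directly. In this small sketch (plain Python, function names ours), `x_large_deviation` is the midpoint formula above and `x_mpf` is Eq. (7); both are checked against the leading-order requirements $2kx > z$ and $k(1+2p)x < z$ with the $o(k)$ terms dropped. At $k = 64$, $z = 1$, $p = 1/4$ they give the values 0.0091 and 0.0107 quoted for Fig. 7.

```python
def x_mpf(k, z=1.0):
    """Eq. (7): MPF optimum for the sparse (x, 0, z) family."""
    return 2 * z / (3 * k - 5)


def x_large_deviation(k, p, z=1.0):
    """Midpoint of z/(2k) and z/(k(1 + 2p))."""
    return (3 + 2 * p) * z / (4 * k * (1 + 2 * p))


def leading_order_ok(x, k, p, z=1.0):
    """Both fixed-point requirements with the o(k) terms dropped."""
    return 2 * k * x > z and k * (1 + 2 * p) * x < z


k = 64
print(round(x_large_deviation(k, 0.25), 4), round(x_mpf(k), 4))  # 0.0091 0.0107
for p in (0.1, 0.25, 0.4, 0.49):
    print(p, leading_order_ok(x_large_deviation(k, p), k, p), leading_order_ok(x_mpf(k), k, p))
```

The MPF setting passes only for small $p$ (its leading-order limit tends to $1/4$ as $k$ grows), whereas the large-deviation setting passes for every $p < 1/2$ by construction.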
A natural question is whether we can store a range of cliques using the same architecture. In fact, we show here that there is a network storing nearly all cliques.

Theorem 3 For large $v$, there is a Hopfield network on $n = \binom{v}{2}$ nodes […] for constants $C \approx 0.43$, $D \approx 13.93$. Moreover, this is the largest possible range of $k$ for any such Hopfield network.
Our next result demonstrates that even robustness to vanishingly small amounts of noise is nontrivial (see Sect. 5.5 for the proof).

Theorem 4 Hopfield-Platt networks storing all permutations will not robustly store derangements (permutations without fixed-points).
Fig. 4 […] large deviation setting $x = \frac{3+2p}{4k(1+2p)}$ with $p = 1/4$ (gray), or a sparsely connected MPF theoretical optimum (7) (black). Over 10 trials, 100 64-cliques chosen uniformly at random were $p$-corrupted for different $p$, and network dynamics initialized at the noisy cliques were run to convergence. The plot shows the fraction of cliques completely recovered vs. pattern corruption $p$ (standard deviation error bars). Dotted lines are average number of bits in a pattern retrieved correctly after converging network dynamics.
As a final application to biologically plausible learning theory, we derive a synaptic update rule for adapting weights and thresholds in these networks. Given a training pattern $x$, the minimum probability flow (MPF) learning rule moves weights and thresholds in the direction of steepest descent of the probability flow objective function (5) evaluated at $X = \{x\}$. Specifically, for $e \neq f$ the rule takes the form:
$$\Delta W_{ef} \propto -x_f\,\Delta x_e \exp\left(-\frac{\Delta E_e}{2}\right), \qquad \Delta\theta_e \propto \Delta x_e \exp\left(-\frac{\Delta E_e}{2}\right). \qquad (8)$$
After learning, the weights between neurons $e$ and $f$ are symmetrized to $\frac{1}{2}(W_{ef} + W_{fe})$, which preserves the energy function and guarantees that dynamics terminates in fixed-point attractors. As update directions (8) descend the gradient of an infinitely differentiable convex function, learning rules based on them have good convergence rates [23].
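A minimal sketch of the steepest-descent update described above, assuming NumPy. It computes the exact gradient of (5) at $X = \{x\}$ with respect to each $W_{ef}$ and $\theta_e$ (taking $1 - 2x_e$ as the change of bit $e$ under a single flip), applies one step with a hypothetical learning rate `lr`, and then symmetrizes $W$. The sizes and initialization are illustrative.

```python
import numpy as np


def flow_single(x, W, theta):
    """Objective (5) with X = {x}: sum over one-bit flips of exp((E_x - E_x')/2)."""
    D = 1.0 - 2.0 * x                       # bit change when flipping neuron e
    return np.exp(0.5 * D * (W @ x - theta)).sum()


def mpf_step(x, W, theta, lr=0.01):
    """One steepest-descent step on flow_single, then W is symmetrized."""
    D = 1.0 - 2.0 * x
    F = np.exp(0.5 * D * (W @ x - theta))    # flow through each single-bit flip
    g = 0.5 * D * F                          # derivative w.r.t. I_e - theta_e
    dW = -lr * np.outer(g, x)                # Delta W_ef proportional to -x_f * D_e * F_e
    np.fill_diagonal(dW, 0.0)
    W_new = W + dW
    W_new = 0.5 * (W_new + W_new.T)          # symmetrize to (W_ef + W_fe) / 2
    theta_new = theta + lr * g               # Delta theta_e proportional to D_e * F_e
    return W_new, theta_new


rng = np.random.default_rng(0)
n = 20
A = 0.1 * rng.normal(size=(n, n))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)
theta = 0.1 * rng.normal(size=n)
x = (rng.random(n) < 0.5).astype(float)      # a single training pattern

history = [flow_single(x, W, theta)]
for _ in range(500):
    W, theta = mpf_step(x, W, theta)
    history.append(flow_single(x, W, theta))
is_fixed = np.array_equal((W @ x - theta > 0).astype(float), x)
print(history[0], history[-1], is_fixed)
```

Every step raises the margin of every neuron at once (active pairs gain weight, mixed pairs lose weight, thresholds move toward the target bit), which is the Hebbian behavior analyzed in the next paragraph.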
Let us examine the (symmetrized) learning rule (8) more closely. Suppose first that $x_e = 0$ so that $\Delta x_e = 0$ or $1$ (depending on the sign of $I_e - \theta_e$). When $\Delta x_e = 0$, weight $W_{ef}$ does not change; on the other hand, when $\Delta x_e = 1$, the weight decreases if $x_f = 1$ (and stays the same, otherwise). If instead $x_e = 1$, then $W_{ef}$ changes only if $\Delta x_e = -1$ or $\Delta x_f = -1$, in which case the update is positive when at least one of $x_e, x_f$ is 1 (and zero, otherwise). In particular, either (i) weights do not change (when the pattern is memorized or there is no neural activity) or (ii) when neurons $e$ and $f$ are both active in (8), weights increase, while when they are different, they decrease, consistent with Hebb's postulate [9], a basic hypothesis about neural synaptic plasticity. In fact, approximating the exponential function with unity in (8) gives a variant of classical outer-product rule (OPR) learning. Note also that adaptation (8) is local in that updating weights between two neurons only requires their current state/threshold and feedforward input from nearby active neurons.

Discussion
The biologically inspired networks introduced in this work constitute a new nonlinear error-correcting scheme that is simple to implement, parallelizable, and achieves the largest asymptotic error tolerance possible [24] for low-density codes over a binary symmetric channel (α = 1/2 in definition (4)). There have been several other approaches to optimal error-correcting codes derived from a statistical physics perspective; for a comprehensive account, we refer the reader to [25]. See also [26][27][28][29] for related work on neural architectures with large memory. Additionally, for a recent review of memory principles in computational neuroscience theory more broadly, we refer the reader to the extensive high level summary [30].
Although we have focused on minimizing probability flow to learn parameters in our discrete neural networks, several other strategies exist. For instance, one could maximize the (Bayesian) likelihood of cliques given network parameters, though any strategy involving a partition function over graphs might run into challenging algorithmic complexity issues [31]. Contrastive divergence [17] is another popular method to estimate parameters in discrete maximum entropy models. While this approach avoids the partition function, it requires a nontrivial sampling procedure that precludes exact determination of optimal parameters.
Early work in the theory of neural computation put forward a framework for neurally plausible computation of (combinatorial) optimization tasks [32]. Here, we add another task to this list by interpreting error-correction by a recurrent neural network in the language of computational graph theory. A basic challenge in this field is to design efficient algorithms that recover structures imperfectly hidden inside of others; in the case of finding fully connected subgraphs, this is called the "Hidden clique problem" [33]. The essential goal of this task is to find a single clique that has been planted in a graph by adding (or removing) edges at random.
Phrased in this language, we have discovered discrete recurrent neural networks that learn to use their cooperative McCulloch-Pitts dynamics to solve hidden clique problems efficiently. For example, in Fig. 5 we show the adjacency matrices of three corrupted 64-cliques on v = 128 vertices returning to their original configuration by one iteration of the network dynamics through all neurons. As a practical matter, it is possible to use networks robustly storing k-cliques for detecting highly connected subgraphs with about k neighbors in large graphs. In this case, error-correction serves as a synchrony finder with free parameter k, similar to how "K-means" is a standard unsupervised approach to decompose data into K clusters.
In the direction of applications to basic neuroscience, we comment that it has been proposed that co-activation of groups of neurons (that is, synchronizing them) is a design principle in the brain (see, e.g., [34][35][36]). If this were true, then perhaps the networks designed here can help discover this phenomenon from spike data. Moreover, our networks also then provide an abstract model for how such coordination might be implemented, sustained, and error-corrected in nervous tissue. As a final technical remark about our networks, note that our synapses are actually discrete since the probability flow is minimized at a synaptic ratio equaling a rational number. Thus, our work adds to the literature on the capacity of neural networks with discrete synapses (see, e.g., [26,[37][38][39][40]), all of which build upon early classical work with associative memory systems (see, e.g., [20,41]).

Mathematical Details
We provide the remaining details for the proofs of mathematical statements appearing earlier in the text.

Symmetric 3-Parameter (x, y, z) Networks
The first step of our construction is to exploit symmetry in the following set of linear inequalities:
$$E_c - E_{c'} < 0, \qquad (9)$$
where $c$ runs over $k$-cliques and $c'$ over vectors differing from $c$ by a single bit flip. The space of solutions to (9) is the convex polyhedral cone of networks having each clique as a strict local minimum of the energy function, and thus a fixed-point of the dynamics.
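The permutations $P \in \mathcal{P}_V$ of the vertices $V$ act on a network by permuting the rows/columns of the weight matrix ($W \mapsto PWP^\top$) and thresholds ($\theta \mapsto P\theta$), and this action on a network satisfying property (9) preserves that property. Consider the average $(\bar{W}, \bar{\theta})$ of a network over the group $\mathcal{P}_V$:
$$\bar{W} := \frac{1}{v!}\sum_{P \in \mathcal{P}_V} PWP^\top, \qquad \bar{\theta} := \frac{1}{v!}\sum_{P \in \mathcal{P}_V} P\theta,$$
and note that if $(W, \theta)$ satisfies (9) then so does the highly symmetric object $(\bar{W}, \bar{\theta})$, since the solution set of (9) is a convex cone containing every $(PWP^\top, P\theta)$. To characterize $(\bar{W}, \bar{\theta})$, observe that $P\bar{W}P^\top = \bar{W}$ and $P\bar{\theta} = \bar{\theta}$ for all $P \in \mathcal{P}_V$. These strong symmetries imply there are $x, y, z$ such that $\bar{\theta} = (z, \ldots, z) \in \mathbb{R}^n$ and, for each pair $e \neq f$ of all possible edges,
$$\bar{W}_{ef} = \begin{cases} x & \text{if } |e \cap f| = 1, \\ y & \text{if } |e \cap f| = 0, \end{cases}$$
where $|e \cap f|$ is the number of vertices that $e$ and $f$ share. Our next demonstration is an exact setting for weights in these Hopfield networks.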
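The averaging argument can be checked on a small case. This sketch (assuming NumPy; $v = 5$ and the random network are illustrative) builds, for each of the $v! = 120$ vertex permutations, the induced permutation matrix on the $n = 10$ edges, averages $PWP^\top$ and $P\theta$, and confirms that the result has a constant threshold and only two distinct off-diagonal weights, indexed by $|e \cap f|$.

```python
import itertools

import numpy as np

v = 5
edges = list(itertools.combinations(range(v), 2))
index = {e: a for a, e in enumerate(edges)}
n = len(edges)                                   # 10 neurons

rng = np.random.default_rng(0)
A = rng.normal(size=(n, n))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)
theta = rng.normal(size=n)

perms = list(itertools.permutations(range(v)))   # the group P_V, v! = 120 elements
W_bar, theta_bar = np.zeros((n, n)), np.zeros(n)
for perm in perms:
    P = np.zeros((n, n))                         # edge permutation induced by relabeling vertices
    for a, (i, j) in enumerate(edges):
        P[index[tuple(sorted((perm[i], perm[j])))], a] = 1.0
    W_bar += P @ W @ P.T
    theta_bar += P @ theta
W_bar /= len(perms)
theta_bar /= len(perms)

overlap = np.array([[len(set(e) & set(f)) for f in edges] for e in edges])
print("z =", theta_bar[0])
print("x =", W_bar[overlap == 1][0], "y =", W_bar[overlap == 0][0])
```

Because the vertex permutations act transitively on edges, $z$ comes out as the mean of the original thresholds.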

Exponential Storage
For an integer $r \ge 0$, we say that state $x^*$ is $r$-stable if it is an attractor for all states with Hamming distance at most $r$ from $x^*$. Thus, if a state $x^*$ is $r$-stably stored, the network is guaranteed to converge to $x^*$ when exposed to any corrupted version not more than $r$ bit flips away. For positive integers $k$ and $r$, is there a Hopfield network on $n = \binom{2k}{2}$ nodes storing all $k$-cliques $r$-stably? We necessarily have $r \le \lfloor k/2 \rfloor$, since $2(\lfloor k/2 \rfloor + 1)$ is greater than or equal to the Hamming distance between two $k$-cliques that share a $(k-1)$-subclique. In fact, for any $k > 3$, this upper bound is achievable by a sparsely connected three-parameter network.

Lemma 1
There exists a family of three-parameter Hopfield networks with $z = 1$, $y = 0$ storing all $k$-cliques as $\lfloor k/2 \rfloor$-stable states.
The proof relies on the following lemma, which gives the precise condition for the three-parameter Hopfield network to store k-cliques as r-stable states for fixed r.

Furthermore, a pattern within Hamming distance r of a k-clique converges after one iteration of the dynamics.
Proof For fixed $r$ and $k$-clique $x$, there are $\sum_{i=0}^{r}\binom{n}{i}$ possible patterns within Hamming distance $r$ of $x$. Each of these patterns defines a pair of linear inequalities on the parameters $x, y, z$. However, only the inequalities from the following two extreme cases are active constraints. All the other inequalities are convex combinations of these.
1. $r$ edges in the clique with a common node $i$ are removed.
2. $r$ edges are added to a node $i$ not in the clique.
In the first case, there are two types of edges at risk of being mislabeled. The first are those of the form $ij$ for all nodes $j$ in the clique. Such an edge has $2(k-2) - r$ neighbors and $\binom{k-2}{2}$ non-neighbors. Thus, each such edge will correctly be labeled 1 after one network update if and only if $x$, $y$, and $z$ satisfy
$$(2(k-2) - r)\,x + \binom{k-2}{2}\,y > z. \qquad (10)$$
The other type are those of the form $\bar{\imath}j$ for all nodes $\bar{\imath} \neq i$ in the clique, and $j$ not in the clique. Assuming $r < k - 1$, such an edge has at most $k - 1$ neighbors and $\binom{k-1}{2} - r$ non-neighbors. Thus, each such edge will be correctly labeled 0 if and only if
$$(k-1)\,x + \left(\binom{k-1}{2} - r\right) y < z. \qquad (11)$$
Rearranging Eqs. (10) and (11) yields the first two rows of the matrix in the lemma. A similar argument applies for the second case, giving the last two inequalities. From the derivation, it follows that if a pattern is within Hamming distance $r$ of a $k$-clique, then all spurious edges are immediately deleted by case 1, all missing edges are immediately added by case 2, and thus the clique is recovered in precisely one iteration of the network dynamics.
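The neighbor counts used in (10) and (11) can be verified by brute force. In this sketch (standard-library Python; $k = 8$, $r = 3$, and the choice of removed edges are illustrative), a graph is a set of sorted vertex pairs.

```python
import itertools
import math


def edge_counts(graph, e):
    """(#neighbors, #non-neighbors) of edge e: present edges sharing exactly one
    vertex with e, and present edges disjoint from e."""
    neighbors = sum(1 for f in graph if len(set(e) & set(f)) == 1)
    non_neighbors = sum(1 for f in graph if not set(e) & set(f))
    return neighbors, non_neighbors


k, r, i = 8, 3, 0                                # clique on 0..k-1, common node i = 0
G = set(itertools.combinations(range(k), 2))
G -= {(i, j) for j in range(1, r + 1)}           # case 1: remove r clique edges at node i

# First type: edge ij with j in the clique (here an edge ij that was not removed).
print(edge_counts(G, (i, k - 1)), (2 * (k - 2) - r, math.comb(k - 2, 2)))   # (9, 15) (9, 15)
# Second type: edge (i_bar, j), i_bar != i in the clique with i_bar i kept, j outside.
print(edge_counts(G, (k - 1, k)), (k - 1, math.comb(k - 1, 2) - r))        # (7, 18) (7, 18)
```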
Proof of Lemma 1 The matrix inequalities in Lemma 2 define a cone in $\mathbb{R}^3$, and the cases $z = 1$ or $z = 0$ correspond to two separate components of this cone. For the proof of Theorem 1 in the main article, we use the cone with $z = 1$. We further assume $y = 0$ to achieve a sparsely connected matrix $W$. In this case, the second and fourth constraints are dominated by the first and third. Thus, we need $x$ that solves There exists such a solution if and only if The above equation is feasible if and only if $r \le \lfloor k/2 \rfloor$.

Proofs of Theorems 1, 2
Fix y = 0 and z = 1. We now tune x such that asymptotically the α-robustness of our set of Hopfield networks storing k-cliques tends to 1/2 as n → ∞. By symmetry, it is sufficient to prove robustness for one fixed k-clique x; for instance, the one with vertices {1, . . . , k}. For 0 < p < 1 2 , let x p be the p-corruption of x. For each node i ∈ {1, . . . , 2k}, let i in , i out denote the number of edges from i to other clique and nonclique nodes, respectively. With an abuse of notation, we write i ∈ x to mean a vertex i in the clique; that is, i ∈ {1, . . . , k}. We need the following inequality originally due to Bernstein from 1924.
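Proposition 1 (Bernstein's inequality [22]) Let $S_i$ be independent Bernoulli random variables taking values $+1$ and $-1$, each with probability $1/2$. For any $\varepsilon > 0$, the following holds:
$$P\left(\frac{1}{n}\sum_{i=1}^{n} S_i > \varepsilon\right) \le \exp\left(-\frac{n\varepsilon^2}{2 + \frac{2\varepsilon}{3}}\right).$$
The following fact is a fairly direct consequence of Proposition 1.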
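As a sanity check on this form of Bernstein's inequality, the following sketch (standard-library Python; the sample values of $n$ and $\varepsilon$ are arbitrary) compares the exact tail, computed from $\sum_i S_i = 2B - n$ with $B \sim \mathrm{Binom}(n, 1/2)$, against the bound.

```python
import math


def rademacher_tail(n, eps):
    """Exact P((1/n) * sum S_i > eps) for n independent +/-1 coin flips."""
    # sum S_i = 2B - n with B ~ Binomial(n, 1/2)
    hits = sum(math.comb(n, b) for b in range(n + 1) if 2 * b - n > n * eps)
    return hits / 2**n


def bernstein_bound(n, eps):
    return math.exp(-n * eps**2 / (2 + 2 * eps / 3))


results = [(n, eps, rademacher_tail(n, eps), bernstein_bound(n, eps))
           for n in (50, 200, 1000) for eps in (0.05, 0.1, 0.3)]
for row in results:
    print(row)
```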

Lemma 3 Let $Y$ be an $n \times n$ symmetric matrix with zero diagonal and entries $Y_{ij} \sim \mathrm{Bernoulli}(p)$, independent for $i < j$. For each $i = 1, \ldots, n$, let $Y_i = \sum_j Y_{ij}$ be the $i$th row sum. Let $M_n = \max_{1 \le i \le n} Y_i$ and $m_n = \min_{1 \le i \le n} Y_i$. Then, for any constant $c > 0$, as $n \to \infty$, we have $P(|m_n - np| > c\sqrt{n}\ln n) \to 0$ and $P(|M_n - np| > c\sqrt{n}\ln n) \to 0$.
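A quick simulation of Lemma 3 (assuming NumPy; $n = 2000$, $p = 0.3$, and $c = 1$ are illustrative). Row-sum fluctuations are of order $\sqrt{n \ln n}$, so the window $c\sqrt{n}\ln n$ is comfortably wide at this size.

```python
import math

import numpy as np

rng = np.random.default_rng(0)
n, p, c = 2000, 0.3, 1.0
U = np.triu(rng.random((n, n)) < p, 1)     # independent Bernoulli(p) entries above the diagonal
Y = (U | U.T).astype(int)                  # symmetric, zero diagonal
rows = Y.sum(axis=1)
M_n, m_n = rows.max(), rows.min()
window = c * math.sqrt(n) * math.log(n)
print(M_n - n * p, m_n - n * p, window)
```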
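Proof Fix $c > 0$. As a direct corollary of Bernstein's inequality, for each $i$ and for any $\varepsilon > 0$, the probability that the row sum $Y_i$ deviates from its mean by more than $n\varepsilon$ is exponentially small in $n\varepsilon^2$ (for small $\varepsilon$). Taking $\varepsilon = \frac{c\ln n}{\sqrt{n}}$, so that $n\varepsilon = c\sqrt{n}\ln n$, and applying a union bound over the $n$ rows, we bound $P(|M_n - np| > c\sqrt{n}\ln n)$ by $n$ times a quantity of order $\exp(-\Theta(c^2 \ln^2 n))$. Since this last bound converges to 0 with $n \to \infty$, we have proved the claim for $M_n$. Since Bernstein's inequality also controls the lower tail, a similar inequality holds for $m_n$. To guarantee that all edges $e$ in the clique are labeled 1 after one dynamics update, we need $x > \frac{1}{N(e)}$, where $N(e)$ is the number of neighbors of $e$ in $x_p$; by Lemma 3 applied to the counts $i_{\mathrm{in}}, i_{\mathrm{out}}$, with probability tending to 1 every clique edge has $N(e) \ge 2k - O(\sqrt{k}\ln k)$, so it suffices that
$$x > \frac{1}{2k - O(\sqrt{k}\ln k)}. \qquad (13)$$
If $f$ is an edge with exactly one clique vertex, then with probability tending to 1 we have
$$N(f) \le k(1 + 2p) + O(\sqrt{k}\ln k).$$
To guarantee that $x_f = 0$ for all such edges $f$ after one iteration of the dynamics, we need $x < \frac{1}{N(f)}$; that is,
$$x < \frac{1}{k(1 + 2p) + O(\sqrt{k}\ln k)}. \qquad (14)$$
In particular, if $p = p(k) \sim \frac{1}{2} - k^{\delta - 1/2}$ for some small $\delta \in (0, 1/2)$, then taking $x = x(k) = \frac{1}{2}\left[\frac{1}{2k} + \frac{1}{k(1+2p)}\right]$ would guarantee that for large $k$ the two inequalities (13) and (14) are simultaneously satisfied. In this case, $\lim_{k \to \infty} p(k) = 1/2$, and thus the family of two-parameter Hopfield networks with $x(k)$, $y = 0$, $z = 1$ has robustness index $\alpha = 1/2$.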

Clique Range Storage
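In this section, we give precise conditions for the existence of a Hopfield network on $n = \binom{v}{2}$ nodes that stores all $k$-cliques over a range of clique sizes $k$, proving Theorem 3.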
Proof Fix $z = 1/2$ and $r = 0$ in Lemma 2. (We do not impose the constraint $y = 0$.) Then the cone defined by the inequalities in Lemma 2 is in bijection with a polyhedron $I_k \subseteq \mathbb{R}^2$ cut out by the corresponding inequalities. When the polytope $\bigcap_{k=m}^{M} I_k$ is nonempty, its vertices can be written down explicitly. One can now check that $\beta = 1/D$ gives the best value, producing the range in the statement of the theorem. Next, note that $\binom{v}{k}2^{-v}$ is the fraction of $k$-cliques in all cliques on $v$ vertices, which is also the probability of a $\mathrm{Binom}(v, 1/2)$ variable equaling $k$. For large $v$, approximating this variable with a normal distribution and then using Mills' ratio to bound its tail c.d.f. $\Phi$, we see that the proportion of cliques storable tends to 1 exponentially fast in $v$, at a rate governed by some constant $C \approx \frac{(D-1)^2}{2D^2} \approx 0.43$.

Hopfield-Platt Networks
We prove the claim in the main text that Hopfield-Platt networks [13] storing all permutations on {1, . . . , k} will not robustly store derangements (permutations without fixed-points). For large k, the fraction of permutations that are derangements is known to be e^{−1} ≈ 0.37.

Proof of Theorem 4
Fix a derangement $\sigma$ on $\{1, \ldots, k\}$, represented as a binary vector $x$ in $\{0,1\}^n$ for $n = k(k-1)$. For each ordered pair $(i, j)$ with $i \neq j$ and $j \neq \sigma(i)$, we construct a pattern $y_{ij}$ that differs from $x$ by exactly two bit flips:
1. Add the edge $ij$.
2. Remove the edge $i\sigma(i)$.
There are $k(k-2)$ such pairs $(i, j)$, and thus $k(k-2)$ different patterns $y_{ij}$. For each such pattern, we flip two more bits to obtain a new permutation $x_{ij}$ as follows:
1. Remove the edge $\sigma^{-1}(j)j$.
2. Add the edge $\sigma^{-1}(j)\sigma(i)$.
It is easy to see that x ij is a permutation on k letters with exactly two cycles determined by (i, j ). Call the set of edges modified the critical edges of the pair (i, j ).
Note that x ij are all distinct and have disjoint critical edges. Each y ij is exactly two bit flips away from x and x ij , both permutations on k letters. Starting from y ij , there is no binary Hopfield network storing all permutations that always correctly recovers the original state. In other words, for a binary Hopfield network, y ij is an indistinguishable realization of a corrupted version of x and x ij .
We now prove that, for each derangement x, with probability at least 1 − (1 − 4p 2 ) n/2 , its p-corruption x p is indistinguishable from the p-corruption of some other permutation. This implies the statement in the theorem.
For each pair (i, j ) as above, recall that x p and x ij p are two random variables in {0, 1} n obtained by flipping each edge of x (resp. x ij ) independently with probability p. We construct a coupling between them as follows. Define the random variable x p via: • For each non-critical edge, flip this edge on x p and x ij with the same Bernoulli(p). In other words, given a realization of x ij p , with probability 4(1 − p) 2 p 2 , this is equal to a realization from the distribution of x p , and therefore no binary Hopfield network storing both x ij and x can correctly recover the original state from such an input. An indistinguishable realization occurs when two of the four critical edges are flipped in a certain combination. For fixed x, there are k(k − 2) such x ij where the critical edges are disjoint. Thus, the probability of x p being an indistinguishable realization from a realization of one of the x ij is at least completing the proof of Theorem 4.

Examples of Clique Storage
To illustrate the effect of two different noise levels on hidden clique finding performance of the networks from Fig. 4, we present examples in Fig. 7 of multiple networks acting with their dynamics on the same two noisy inputs. Notice that nonclique fixed-points appear, and it is natural to ask whether a complete characterization of the fixed-point landscape is possible. Intuitively, our network performs a local, weighted degree count at each edge of the underlying graph and attempts to remove edges with too few neighbors, while adding in edges that connect nodes with high degrees. Thus, resulting fixed-points (of the dynamics) end up being graphs such as cliques and stars. Beyond this intuition, however, we do not have a way to characterize all fixed-points of our network in general.
Fig. 7 […] Images show the result of dynamics applied to these noisy patterns using networks with all-to-all MPF parameters after L-BFGS training on 50,000 64-cliques (≈2e−31% of all 64-cliques), large deviation parameters $(x, y, z) = (0.0091, 0, 1)$, or MPF theory parameters $(x, y, z) = (0.0107, 0, 1)$ from Eq. (7) in the main text.
In fact, this is a very difficult problem in discrete geometry, and except for toy networks, we believe that this has never been done. Geometrically, the set of all states of a binary Hopfield network with $n$ neurons is the $n$-hypercube $\{0,1\}^n$. Being a fixed point can be characterized by the energy function becoming larger when any single bit is flipped. As the energy function is quadratic, for each of the $n$ bits flipped, this creates a quadratic inequality. Thus, the set of all fixed-point attractors in a binary Hopfield network is the $n$-hypercube intersected with $n$ quadratic inequalities in $n$ variables. In theory, one could enumerate such sets for small $n$; however, characterizing them all is challenging, even for the highly symmetric family of weight matrices that we propose here.
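For very small networks, the enumeration mentioned above is easy to carry out by brute force. This sketch assumes NumPy and uses the illustrative sparse network $(x, y, z) = (0.3, 0, 1)$ on $v = 6$ vertices ($n = 15$ neurons, $2^{15}$ states). It stores every 4-clique, since a clique edge has 4 neighbors ($0.3 \cdot 4 > 1$) while an edge touching the clique at one vertex has 3 ($0.3 \cdot 3 < 1$). The script lists how many fixed points exist and how many edges they have.

```python
import itertools

import numpy as np

v, x, z = 6, 0.3, 1.0                            # illustrative sparse network, y = 0
edges = list(itertools.combinations(range(v), 2))
n = len(edges)                                   # 15 neurons, 2^15 = 32768 states
ends = np.array(edges)                           # endpoints of each edge, shape (n, 2)

states = (np.arange(2**n)[:, None] >> np.arange(n)) & 1   # row s holds the bits of integer s
deg = np.zeros((2**n, v), dtype=int)
for u in range(v):
    deg[:, u] = states[:, (ends == u).any(axis=1)].sum(axis=1)
# With y = 0, the input to edge ij is x times the number of present edges sharing one endpoint.
nbrs = deg[:, ends[:, 0]] + deg[:, ends[:, 1]] - 2 * states
fixed = np.all((x * nbrs > z) == (states == 1), axis=1)

print("fixed points:", int(fixed.sum()))
print("fixed points by edge count:", np.bincount(states[fixed].sum(axis=1), minlength=n + 1))
```

Besides the cliques, the empty graph and the complete graph also turn out to be fixed points of this toy network, a small instance of the nonclique fixed points discussed above.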