Neural Coding as a Statistical Testing Problem

We take a testing perspective to understand the minimal discrimination time between two stimuli for different types of rate coding neurons. Our main goal is to describe the testing abilities of two different encoding systems: place cells and grid cells. In particular, we show, through the notion of adaptation, that a fixed place cell system can have a minimal discrimination time that decreases when the stimuli are further apart. This could be a considerable advantage for the place cell system, which could complement the grid cell system, the latter being able to discriminate stimuli that are much closer together.


Introduction
Grid cells are particular neurons in the medial entorhinal cortex [16] that exhibit a periodic spatial firing pattern. Whereas place cells in the hippocampus fire at a given location and appear to be associated with an allocentric representation, grid cells fire at each node of a hexagonal lattice and appear to be involved in a self-localization representation [17]. The grid cell system not only encodes spatial position but also direction and velocity [20] or sounds [1], and could even be used by mammals to encode episodic memories [5]. Organized in modules (one module being dedicated to one lattice scale), the grid cells in each module have a uniform distribution, while the progression of scales between modules appears to be geometric [24].
Since their discovery in 2005 [11], which earned O'Keefe and the Mosers the Nobel Prize, grid cells have been intensively studied from a theoretical point of view in relation to place cells [19]. Some authors (see for example [21]) are interested in how neural networks can generate such patterns, while others try to explain the hexagonal lattice [6] or the exact geometric progression of scales [27,23].
In particular, one of the main focuses has been the encoding capacity of the system. More specifically, authors have focused on a statistical capacity measure, namely the Fisher information, because of its link with estimation. Indeed, in statistics, the Cramér–Rao bound states that the L^2-error (i.e., E(|ŝ_n − s|^2)^{1/2}) of an unbiased estimator ŝ_n (i.e., such that E(ŝ_n) = s) of a given quantity s is lower bounded by I_n(s)^{-1/2}, where I_n(s) is the Fisher information. Moreover, this lower bound is generally achieved by maximum likelihood estimators (MLE), at least asymptotically in a context of n i.i.d. observations [7]. Informally, the Cramér–Rao bound is interpreted as follows: there is a "best" estimator (which would be the MLE) achieving the smallest error I_n(s)^{-1/2}. So if a code refers to a model that describes the precise influence of s on the spike trains emitted by the neural cells, the Cramér–Rao bound paves the way to finding the best code, as the code that maximizes the Fisher information.
Initially, [4] worked on the relationship between mutual information and Fisher information for place cells and other types of neurons with receptive fields. They showed in particular that if s represents a position, then I_n(s) grows linearly with n, for a particular code where the n place cells responses to position s are i.i.d. This leads to an error of order n^{-1/2}. Then Fiete and her co-authors [9,22] showed that, for a given number n of neurons, grid cells can encode many more positions, and that I_n(s) grows exponentially with n. In particular, this means that the position estimation that can be done with n grid cells is much more accurate than the one that can be done with n place cells (see also the related works [26,15,18]).
In the present work, we take a different statistical viewpoint from that of unbiased estimation using Fisher information: we take a testing approach with a minimax viewpoint. Our first argument is that estimation is in fact very complex. For instance, when using the Cramér–Rao bound, one has to be aware that it applies only to unbiased estimators, and the Stein phenomenon shows that biased estimators might sometimes be faster than I_n(s)^{-1/2} [25]. Minimax theory [25] can help to shed more light on the right order of magnitude of the error by computing the minimal value of max_s E(|ŝ_n − s|^2). Our second argument is that pointwise testing (e.g., testing a point hypothesis like "s = 0") is easier than estimation: imagine an estimator ŝ_n of s whose variance depends on s itself. When testing s = 0, we know the variance of ŝ_n under this hypothesis and we can therefore use it to describe a rejection region for the test. Going further, the fact that tests are easier to build than estimators can lead to surprising theoretical effects: depending on the regularity of the s to estimate, it has been shown that one typically makes a minimax estimation error of, say, ∆_n, with n the number of observations, whereas there exists some test that can detect that s ≠ 0 as soon as the distance between s and 0 is larger than ρ_n, with ρ_n negligible with respect to ∆_n [12,2]. The difference is more than a mere multiplicative constant: testing rates can be faster than estimation rates. Therefore we adopt this testing point of view to see if it can improve our understanding of the place cells/grid cells code.
From this testing point of view, we think that a good encoding system should be able to discriminate quickly between two stimuli or positions s_1 and s_2, as soon as they are sufficiently far apart. In particular, the testing procedure can take into account the distance between s_1 and s_2 (information that cannot be taken into account, at least as explicitly, in an estimation procedure), and one can have a discrimination time between two points that depends on this distance. Therefore the purpose of the present work is to study the following three theoretical problems. To make things more concrete, imagine a rat in a maze that should learn a certain behavior in position s_1 and another one in position s_2. For each problem, we give the "rat" interpretation with respect to this situation.

Definition 1.1. Given n neurons obeying a certain stochastic model parametrized by a code f(s) in response to a stimulus/position s ∈ S presented to the system for a duration T, we define the minimal discrimination time between two locations s_1 and s_2 for the code f, with precision α (denoted T_min(f, s_1, s_2, α)), as the minimal time the output of the n neurons must be observed in order to distinguish s_1 from s_2 with a probability of error less than α ∈ (0, 1).
Problem 1 consists in understanding the behavior of the minimal discrimination time T_min(f, s_1, s_2, α). If this minimal time is infinite, it expresses in particular the fact that the coding system cannot distinguish s_1 from s_2. More precisely, this quantity also expresses how the minimal discrimination time decreases when the distance between s_1 and s_2 increases. Note in particular that the test which achieves this minimal time may depend on the precise knowledge of s_1 and s_2, because we define this as a test, and this will be the case in our solution. From the "rat" perspective, the discrimination time between the two positions is a good lower bound for the reaction time of the rat on this given task, because it is only after realising that it is in position s_1 or s_2 that the rat can proceed to the learned associated behavior.

Definition 1.2. We equip S with a certain metric d. We define the minimax discrimination time of a family of codes F at distance ρ by

T(F, ρ, α) = inf_{f ∈ F} sup_{s_1, s_2 ∈ S : d(s_1, s_2) ≥ ρ} T_min(f, s_1, s_2, α).   (1.1)

This quantity can be seen as the rate at which the best code f in a certain family F (for instance place cells or grid cells) can discriminate all stimuli at distance ρ or more.
Problem 2 consists in computing upper and lower bounds for this minimax discrimination time. In particular, this rate is a function of n and ρ, and it is not clear whether the best code f depends on ρ or not. From the "rat" perspective, the important parameter for the distinction between s_1 and s_2 is the distance: if the brain "uses" the best code in the family F at distance at least ρ, then as soon as d(s_1, s_2) ≥ ρ, one can guarantee a reaction time of order T(F, ρ, α). From a modeling point of view, this raises a good question: why would the brain "use the best code in the family F at distance at least ρ"? The important part of the previous sentence is: why would the brain focus on one particular ρ? This leads to the third problem: adaptation.

Definition 1.3. A code f ∈ F is said to be adaptive if it achieves the rate defined in (1.1), up to multiplicative constants, in a given range of values for ρ, that is, for all ρ in that range,

sup_{s_1, s_2 ∈ S : d(s_1, s_2) ≥ ρ} T_min(f, s_1, s_2, α) ≤ C · T(F, ρ, α),

for some constant C > 0.

Problem 3 consists in finding an adaptive code and the corresponding range of ρ for different families of codes. Here the word adaptation is meant in the sense of theoretical statistics/minimax theory [12,2,25]. From the "rat" perspective, adaptation (in the previous statistical sense) is fundamental. We can indeed pinpoint two scenarios for the learning. In Scenario 1, before any learning, the system (place cells or grid cells) can achieve for many ρ the best discrimination time (adaptive code). The only thing that the rat has to learn is the specific response of the cells to positions s_1 and s_2, in order to perform the "best" discrimination test and decide what the correct behavior is. This scenario has the advantage of minimal training time: if a new couple (s_1, s_2) is presented, the learning should be very quick. In Scenario 2, the system is not adaptive, and each time a new couple (s_1, s_2) at a different distance is presented, the rat is either stuck with a suboptimal code, leading to a suboptimal discrimination time, or has to learn a new representation/new code at the same time as the new couple in order to react faster. Scenario 2 is of course less "adaptive" because the rat would take a longer time to react/learn.

Solving these three theoretical problems should help us to distinguish between several types of codes, and in particular between place cells and grid cells. The minimal discrimination time computed in Problem 1 is an idealized reference for a certain reaction time, which depends on the coding system. It is already slightly more complex than simple estimation accuracy, since it encompasses the idea that if the stimuli are very different, the reaction time should be faster. Second, the minimax time in Problem 2 should tell us which system is more competitive than the other. Finally, the adaptation viewpoint in Problem 3 should give a more specific viewpoint on the practical use of each system, to see if it is possible to discriminate stimuli at different scales with the same code.
To push the mathematics as far as possible, we use a very simple stochastic model: the spike trains of the n neurons are homogeneous Poisson processes (i.e., constant firing rates in time, which only depend on the position/stimulus s), with coding performed only via their respective firing rates. We also idealize the rate code f into simple step functions taking only two values, and we use the circle as the stimulus/position set (which is consistent with, for example, a 1D circular maze or with the direction of movement [10]).
We have been able to completely answer all three problems for the place cells code, for which adaptation is possible. All three problems are also solved for grid cells codes when the number of cells per module and the scales are fixed, but the minimax rate and adaptation for general grid cells codes, where even the number of modules is left free, remain an open problem. In particular, we have shown that the minimal distance ρ that can be detected by n place cells is of order 1/n, a distance much smaller than the rate n^{-1/2} obtained via the classical use of Fisher information (see [4]). It also appears that grid cells have much better resolution than place cells, down to 2^{-n}, which is consistent with the bounds given by [22]. Grid cells may also be faster than place cells for discrimination, for a fixed ρ, if the system is well calibrated. However, it does not seem clear that there is an adaptive code for general grid cells, and in this sense, place cells might have an advantage in terms of reaction time for sufficiently distant stimuli/positions.
In Section 2, we give the stochastic model and the main notation, and compute the minimal discrimination time (Problem 1) in a very general setting, as well as a lower bound of 2^{-n} on the smallest distance ρ that can be discriminated, whatever the code. In Section 3, we study more deeply the place cells code, compute minimax rates (Problem 2) of order ⌊nρ⌋^{-1}, and prove that even random codes are adaptive in this setting (Problem 3). In Section 4, we study grid cells codes and prove that they can reach the resolution 2^{-n} and that another grid code can also achieve the rate ⌊n/log_2(1/ρ)⌋^{-1}. Numerical illustrations are provided in Section 5. Conclusion, discussion and perspectives are given in Section 6. Auxiliary results are postponed to Section 7.

Model and notation
Since we are interested in grid cells that have a periodic feature, it is simpler to represent stimuli as a circle than as an interval.
We consider stimuli/positions s that belong to S^1 = [0, 1), considered in a periodic way, i.e., 0 ≡ 1. Equivalently, we can represent this set of stimuli/positions as a circle.
The stimulus is encoded by n neurons, which emit spikes as independent homogeneous Poisson processes. By homogeneous Poisson process, we mean that if the stimulus s remains constant through time, then the firing rate of the Poisson process is also constant through time. More specifically, for a given stimulus s in S^1, each neuron i has a firing rate f_i(s), which only depends on s, as long as the stimulus s is presented.
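As a minimal illustration of this model, one can simulate such spike trains via the standard construction (Poisson number of spikes, then uniform spike times); the numerical values of µ and T below are ours, not taken from the paper:

```python
import numpy as np

def simulate_spike_train(rate, T, rng):
    """Homogeneous Poisson process on [0, T]: draw a Poisson(rate*T) number
    of spikes, then place them uniformly and sort."""
    n_spikes = rng.poisson(rate * T)
    return np.sort(rng.uniform(0.0, T, size=n_spikes))

# A neuron activated by the stimulus fires at rate mu; a non-activated one at rate 1.
mu, T = 20.0, 5.0  # illustrative values
rng = np.random.default_rng(0)
activated_train = simulate_spike_train(mu, T, rng)
baseline_train = simulate_spike_train(1.0, T, rng)
```

The expected number of spikes in the activated train is µT, versus T for the baseline train, which is the only information the binary rate code carries.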
In the literature, to model cells with receptor fields, these functions f_i(s) are most of the time centered around a certain favorite stimulus for which the value is very large, whereas the rate returns to some very small value when s is far from this favorite stimulus.

[Figure 1: Visual representation of the main notions. In A, S_u, the circle of radius u with a point a and its corresponding argument θ_a ∈ [0, u). In B, the visual representation of the intervals; in red and green, the function s → g(s) = µ1_⟨a,b⟩(s) + 1_⟨a,b⟩^c(s), with the value µ in red and 1 in green. In C and D, visual representation of the action of mod: in these pictures, α = a mod 1/4 = a_1 mod 1/4 = a_2 mod 1/4 = a_3 mod 1/4. Also in D, the representation of the function s → g_{1/4}(s) = µ1_⟨α,β⟩(s) + 1_⟨α,β⟩^c(s) with the same color code as in B. In C, the representation of the periodic function s → g(s) = g_{1/4}(s mod 1/4) = µ1_⟨α,β⟩(s mod 1/4) + 1_⟨α,β⟩^c(s mod 1/4).]
To simplify the mathematical computations, we decide to use piecewise constant functions to model f_i. More precisely, these functions only take two values, µ and 1, with µ much larger than 1. We say that neuron i responds to stimulus s, or is activated by s, when f_i(s) = µ. We denote by f = (f_1, ..., f_n) the code, i.e., the vector of the firing rates, and by

I^f_s = {i ∈ {1, ..., n} : f_i(s) = µ}

the set of neurons responding to s. To define more precisely the piecewise constant functions f_i, especially for the grid cells code, we need to introduce further mathematical notation. Let S_u be the circle of radius u, which can be put in one-to-one correspondence with the segment [0, u) through the argument map a → θ_a, as seen in Figure 1.A. More mathematically, for a given a ∈ S_u, θ_a is the only value in [0, u) such that a corresponds to the point u(cos(2πθ_a/u), sin(2πθ_a/u)). The distance we use on S_u is the geodesic distance on the circle divided by 2π. This can also be written

d_u(a, b) = min(|θ_a − θ_b|, u − |θ_a − θ_b|).

At most, it is u/2. When u = 1, we write d instead of d_1 for short. Observe that, in S^1, the largest distance one can have is 1/2. The interval ⟨a, b⟩ is defined as the set of all points s such that θ_s ∈ [θ_a, θ_b) if θ_a < θ_b, and as the set of all points s such that θ_s ∉ [θ_b, θ_a) if θ_b < θ_a, with the convention that ⟨a, a⟩ = ∅ is the empty set. Note that the complement of ⟨a, b⟩ satisfies ⟨a, b⟩^c = ⟨b, a⟩. See also Figure 1.B.
The first code that we are interested in corresponds (via this piecewise constant simplification) to classical neurons having a certain receptor field, or to place cells, as mentioned in the introduction.

Place cells code P.
A code f is a place cells code if and only if for all i in [n] = {1, ..., n}, there exist a_i, b_i in S^1 such that for all s in S^1,

f_i(s) = µ1_⟨a_i,b_i⟩(s) + 1_⟨a_i,b_i⟩^c(s).   (2.1)

With this definition, we can identify the receptor field of neuron i with the interval ⟨a_i, b_i⟩. A typical representation of this place cells code can be seen in Figure 1.B.

Grid cells have a periodic structure. Therefore, we need to properly define periodic functions as well. To do so, we use the modulus operation. It is defined for x ∈ R and u ∈ R_{>0} by x mod u = x − ⌊x/u⌋u ∈ [0, u), the remainder of the Euclidean division of x by u, where ⌊x/u⌋ is the largest integer less than or equal to x/u. Informally, a function g of period u on S^1 is a function that only depends on the value of the stimulus s mod u, i.e., g(s) = g(s mod u) for all s ∈ S^1. Let us define the modulus on circles more formally: for radii u, v ∈ R_{>0} and any s ∈ S_v, we denote by t = s mod u the point in S_u such that θ_t = θ_s mod u.
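In code, such a place cell reduces to a circular-interval membership test. Below is a sketch; the convention ⟨a, a⟩ = ∅ follows the text, and the value µ = 20 is an arbitrary illustrative choice:

```python
MU = 20.0  # illustrative value of the high rate mu >> 1

def in_interval(s, a, b):
    """Membership of s in the circular interval <a, b) of S^1 = [0, 1)."""
    if a == b:
        return False              # <a, a> is the empty set, as in the text
    if a < b:
        return a <= s < b
    return not (b <= s < a)       # interval wrapping around the point 0 = 1

def place_cell_rate(s, a, b, mu=MU):
    """Place cell of (2.1): firing rate mu on <a, b), rate 1 elsewhere."""
    return mu if in_interval(s % 1.0, a, b) else 1.0
```

Note that the wrap-around branch is what makes the receptive field a genuine arc of the circle rather than a sub-interval of [0, 1).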
To give an intuition of what the modulus operation does, we give an example in Figure 1.C and D. In this sense, we can formally define periodic functions s → g(s) on S^1 with period u by saying that g(s) = g_u(s mod u) for some function g_u on S_u. However, if we cannot divide S^1 into an integer number of intervals of length u, such periodic functions are not completely well defined. This remark leads to the restriction on the λ_i's in the following definition of the grid cells code.
Grid cells code G((n_i, λ_i)_{i=1,...,m}). [24] showed experimentally that grid cells are grouped by modules. A given module is dedicated to a certain scale (or spatial periodicity of the firing pattern). Once the scale is fixed, the exact localisation of the centers of the grid does not seem to show any particular structure, but the progression of the scales seems to be done in a quantized manner. We model this as follows. A code f is a grid cells code with m modules of cardinalities (n_i)_{i=1,...,m} and scales (λ_i)_{i=1,...,m} if and only if the n neurons are grouped into m modules M_1, ..., M_m, with n_i := |M_i| cells in module i and n_1 + ... + n_m = n, such that all the n_i cells of module i have a periodic code of period λ_i. More precisely, for a given neuron j ∈ M_i, there exist a_{i,j} and b_{i,j} in S_{λ_i}, the circle of radius λ_i, such that for all s ∈ S^1,

f_{i,j}(s) = µ1_⟨a_{i,j},b_{i,j}⟩(s mod λ_i) + 1_⟨a_{i,j},b_{i,j}⟩^c(s mod λ_i).

To ensure coherence of the respective periods in each module, we assume that 1 := λ_1 > λ_2 > ... > λ_m are positive real numbers satisfying the following relations: where the last union is taken only over the λ_i's such that Note that grid cells codes with only one module are place cells codes; this also justifies why it was easier to consider stimuli on the circle in the first place.
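A grid cell of module i is thus the same two-valued response applied to s mod λ_i. A sketch (rate and scale values are illustrative, not from the paper):

```python
def grid_cell_rate(s, a, b, lam, mu=20.0):
    """Periodic binary code of period lam: the receptive field <a, b) lives on
    the circle of radius lam, and the response depends only on s mod lam."""
    t = s % lam                    # project s onto the circle S_lam
    if a == b:
        return 1.0                 # empty receptive field
    inside = (a <= t < b) if a < b else not (b <= t < a)
    return mu if inside else 1.0
```

For instance, with λ = 1/4 the neuron responds identically at s, s + 1/4, s + 1/2 and s + 3/4, which is the periodicity illustrated in Figure 1.C and D.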
Some of the results we are going to prove also hold for more general binary codes.

General binary code. A code f is a general binary code if and only if for all i in [n] = {1, ..., n}, s → f_i(s) is a piecewise constant function on S^1 taking only two possible values, µ and 1. Place cells codes and grid cells codes are just particular cases of general binary codes. In a general binary code, the set of stimuli to which neuron i responds forms a Borel subset of S^1. Notice that the choice of 1 as the smallest of the two possible rates is made to simplify computations, but one can always transform the data to reduce to this case. Indeed, the time-change theorem [3] allows us to change time in order to fix the smallest rate at 1.

The statistical testing problem
A stimulus s is applied for a time T, resulting in n spike trains for the n different neurons, i.e., N_1, ..., N_n, which are n independent Poisson processes on [0, T].
We consider that the individual (or the agent, or the brain) has only these spike trains as a source of information on s and that it tries to use the best statistical tool available based on N_1, ..., N_n. This philosophy gives us an ideal bound on the performance that the brain can achieve with one or another encoding system. It is typically used by [4] to say that the inverse of the Fisher information gives the smallest variance of an estimator of the stimulus (Cramér–Rao bound) and that maximizing the Fisher information gives the best code. We adopt basically the same point of view, but for testing instead of estimating.
In the testing problem, given two possible values s_1 and s_2 for s, the individual has to guess whether s = s_1 or s = s_2 based solely on N_1, ..., N_n. Mathematically, this means that the guess is a test Φ = Φ(N_1, ..., N_n) that can only take the two values s_1 or s_2. The individual can make two mistakes: P_{s_1}(Φ = s_2), the probability that the guess is s_2 whereas the applied stimulus is s_1, and reciprocally P_{s_2}(Φ = s_1), the probability that the guess is s_1 whereas the applied stimulus is s_2. There is a variety of possible tests and we are interested only in the ones which have (up to multiplicative constants) the smallest possible error, that is, we want to find Φ achieving

p_e(s_1, s_2) = min_Φ max(P_{s_1}(Φ = s_2), P_{s_2}(Φ = s_1)).

Proposition 2.1 (Order of magnitude of p_e). For s_1, s_2, and C_µ = (µ − 1) log(µ).
Proof. We denote by I^f_{s_1} (resp. I^f_{s_2}) the set of neurons activated by s_1 (resp. s_2). We also denote by P_1 (resp. P_2) the distribution of N_1, ..., N_n and by E_1 (resp. E_2) the corresponding expectation, when the applied stimulus is s_1 (resp. s_2). First notice that the Kullback–Leibler divergence between both distributions is K where dP_2/dP_1 is the Radon–Nikodym derivative of P_2 with respect to P_1. One can check that As a consequence, it follows that The lower bound on p_e follows from Theorem 2.2 of [25].
For the upper bound, suppose that ∆^f_{s_1,s_2} = |I^f_{s_1} \ I^f_{s_2}| (if this is not the case, we exchange s_1 and s_2). It is now sufficient to consider the test Ψ defined by (2.4). Indeed, by applying inequality (7.1) of Lemma 7, which concludes the proof.
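The test Ψ can be thought of as a spike-count threshold test on the neurons of I_{s_1} \ I_{s_2}: those neurons fire at rate µ under s_1 and at rate 1 under s_2, so their total spike count separates the two hypotheses exponentially fast in T. The sketch below uses a geometric-mean threshold of our own choosing and illustrative parameters; the exact test (2.4) and its constants are those of the paper:

```python
import numpy as np

def count_test(n_diff, mu, T, true_stimulus, rng):
    """Guess s1 or s2 from the total spike count of the n_diff neurons that
    respond to s1 but not to s2 (rate mu under s1, rate 1 under s2)."""
    rate = mu if true_stimulus == "s1" else 1.0
    count = rng.poisson(rate * T * n_diff)
    threshold = np.sqrt(mu) * T * n_diff   # sits between the two expected counts
    return "s1" if count >= threshold else "s2"

# Monte Carlo estimate of the error probability (illustrative parameters).
rng = np.random.default_rng(0)
truths = ["s1", "s2"] * 1000
errors = sum(count_test(3, 20.0, 2.0, t, rng) != t for t in truths)
```

With µ = 20, T = 2 and three separating neurons, the expected counts are 120 under s_1 and 6 under s_2, so the empirical error rate is essentially zero, consistent with the exponential bound of Proposition 2.1.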

Minimal discrimination time
From Proposition 2.1, for a given admissible error level α ∈ (0, 1), the minimal discrimination time stated in Problem 1, T_min(f, s_1, s_2, α), is of the order of 1/∆^f_{s_1,s_2}, up to positive multiplicative constants depending on α and µ (see also Section 5 for numerical verification). This turns the other statistical problems into combinatorial problems. In particular, discrimination is not possible if and only if ∆^f_{s_1,s_2} = 0, that is, I^f_{s_1} = I^f_{s_2}. The behavior of the quantities in α and µ is quite intuitive: if the level α tends to 0, T_min tends to infinity; if µ tends to infinity, T_min tends to 0. Considering that µ is some biological parameter that is fixed, as well as the admissible level α of reliability of the system, we now focus on the behavior of 1/∆^f_{s_1,s_2} and, from now on, with a slight abuse of language, we denote

T_min(f, s_1, s_2) = 1/∆^f_{s_1,s_2}.   (2.5)

Let us now introduce the minimal time for which one can distinguish any pair of stimuli that are at least ρ apart, with ρ ∈ (0, 1/2]:

T(f, ρ) = sup_{s_1, s_2 ∈ S^1 : d(s_1, s_2) ≥ ρ} T_min(f, s_1, s_2).

Clearly, the function ρ → T(f, ρ) is non-increasing (the larger the distance between two stimuli, the smaller the time one needs to observe the activity of the network to distinguish one from the other). Before speaking of minimax codes (Problem 2), let us derive an absolute lower bound on the range that can be detected by a general binary code f.

Proposition 2.2. Let f be a general binary code. Then, for any ρ < 2^{-n}, T(f, ρ) = ∞.

This means that, whatever the code, there always exist two stimuli at distance less than 2^{-n} that cannot be discriminated, whatever the observation time.
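The combinatorial quantity ∆^f_{s_1,s_2} and the resulting 1/∆ scaling of (2.5) can be computed directly. A sketch, with codes represented as lists of activation predicates and the constants in α and µ dropped as in (2.5):

```python
def make_place_cell(a, b):
    """Receptive field <a, b) on S^1 = [0, 1); returns s -> activated?"""
    def active(s):
        s = s % 1.0
        if a == b:
            return False
        return (a <= s < b) if a < b else not (b <= s < a)
    return active

def delta(code, s1, s2):
    """Delta^f_{s1,s2}: the larger of the two one-sided differences of I_s1, I_s2."""
    i1 = {i for i, cell in enumerate(code) if cell(s1)}
    i2 = {i for i, cell in enumerate(code) if cell(s2)}
    return max(len(i1 - i2), len(i2 - i1))

def t_min(code, s1, s2):
    """Minimal discrimination time up to constants: 1/Delta, infinite if Delta = 0."""
    d = delta(code, s1, s2)
    return float("inf") if d == 0 else 1.0 / d
```

For instance, for a code of 4 disjoint arcs, two stimuli falling in the same receptive field give ∆ = 0 and hence T_min = ∞, which is exactly the "discrimination is not possible" case above.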
Proof. For A ⊆ [n], let S_A be the set of all s ∈ S^1 such that I^f_s = A. Since {S_A}_{A⊆[n]} forms a partition of S^1, there exists A ⊆ [n] such that Leb*(S_A) ≥ 2π/2^n. Suppose that diam(S_A) := sup_{s_1,s_2 ∈ S_A} d(s_1, s_2) ≤ ρ. If this were true, then the set S_A would be contained in an interval of Leb*-measure 2πρ, so that 2π/2^n ≤ Leb*(S_A) ≤ 2πρ, implying that ρ ≥ 1/2^n, a contradiction.
Therefore, if ρ < 1/2^n, then we must have diam(S_A) > ρ, implying that we can find s_1, s_2 ∈ S_A such that d(s_1, s_2) > ρ. Since I^f_{s_1} = I^f_{s_2} = A, these s_1 and s_2 cannot be distinguished, and T(f, ρ) = ∞.

Results for place cells code
In this section, we focus our analysis on the class of place cells codes P defined in (2.1). Our first result is a lower bound for T(P, ρ), which is (up to multiplicative constants in α and µ) the minimax discrimination time, as defined in (1.1), for the class of place cells codes.
Hence, it follows that and Therefore, it follows that ρ is an integer, we must have Since f ∈ P is arbitrary, the above inequality ensures that , so that the first part of the proof follows from inequality above.
To conclude the proof, observe that if ρ ≤ (2(n + 1))^{-1}.

To better understand the lower bound provided by Proposition 3.1, let us adopt an asymptotic point of view in which the number of observed neurons n → ∞ and ρ = ρ_n is a function of n. For example, when ρ_n is constant in n, the lower bound decreases as 1/n, whereas in the regime ρ_n → 0 with nρ_n → ∞, the lower bound behaves as (2nρ_n)^{-1}. Moreover, as long as ρ_n ≤ (2(n + 1))^{-1} ≈ (2n)^{-1}, whatever the code f ∈ P, there always exist two stimuli at distance ρ_n that cannot be distinguished, whatever the observation time.

Upper bounds and minimax codes
Next we obtain an upper bound for T (P, ρ) matching the lower bound above, up to multiplicative constants.To that end, we study the behavior of T (f, ρ) for some examples of place cell codes and use the fact that T (P, ρ) ≤ T (f, ρ) for any f ∈ P.

Example 3.2 (1-Uniform code).
For each i ∈ {0, 1, ..., n}, let p_i be the point in S^1 associated with θ_i = i/n. The 1-uniform code is defined as f^{n,n} = (h_1, ..., h_n) (the superscript n, n stands for n neurons divided into n groups of size 1), where each h_i is given by (2.1) with a_i = p_{i−1} and b_i = p_i. To check that, suppose first that ρ < 1/n. In this case, take θ_ρ such that ρ ≤ θ_ρ < 1/n, denote by p_ρ the point in S^1 associated to θ_ρ, and observe that p_ρ ∈ ⟨p_0, p_1⟩, and the result follows.

Example 3.3 (d-Uniform code).
For each k ∈ {0, 1, ..., d}, let p_k denote the point in S^1 defined by θ_k = k/d and let h_k be given by (2.1) with a_k = p_{k−1} and b_k = p_k. Also, write L = ⌊n/d⌋. The d-uniform code is defined as f^{n,d} = (f_1, ..., f_n) (the superscript n, d stands for n neurons divided into d groups), where 1/⌊n/d⌋, otherwise.
To verify that, consider first the case ρ < 1/d. In this case, take θ_ρ such that ρ ≤ θ_ρ < 1/d, denote by p_ρ the point in S^1 given by θ_ρ, and observe that p_ρ ∈ ⟨p_0, p_1⟩. Since the lower bound L is attained for some pair s_1, s_2 ∈ S^1 (for example, take s_1 and s_2 given by θ_{s_1} = 0 and θ_{s_2} = 1/2 respectively), the result follows.
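A sketch of the d-uniform construction; here the n neurons are dealt to the d groups round-robin, so group sizes are ⌊n/d⌋ or ⌈n/d⌉ (this bookkeeping detail is ours). Two stimuli lying in different arcs of length 1/d are then separated by a whole group of about n/d neurons, which is the source of the ⌊n/d⌋^{-1} discrimination time:

```python
def d_uniform_code(n, d):
    """n neurons in d groups; all neurons of group k share the field [k/d, (k+1)/d)."""
    return [((i % d) / d, ((i % d) + 1) / d) for i in range(n)]

def activated(code, s):
    """Indices of neurons whose receptive field contains s."""
    s = s % 1.0
    return {i for i, (a, b) in enumerate(code) if a <= s < b}

def delta(code, s1, s2):
    i1, i2 = activated(code, s1), activated(code, s2)
    return max(len(i1 - i2), len(i2 - i1))
```

With n = 12 and d = 4, stimuli in different arcs are separated by the n/d = 3 neurons of one group, while stimuli in the same arc cannot be distinguished at all.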
These codes allow us to prove the following result.
and is therefore minimax up to a multiplicative constant.

A particular adaptive code
Now that we have found, for a given ρ, a minimax code up to a multiplicative constant, one may wonder whether there exists a single place cells code which, for every ρ in a given range of values, achieves the minimax rate up to a constant. We refer to this problem as Problem 3, adaptation to the distance. The following code is a particular example of such a code.

Example 3.5 (An adaptive place cells code). Let g = (g_1, ..., g_n) be the code in P with g_i defined by (2.1), where the points a_i, b_i ∈ S^1 are associated to θ_i = i/(2n) and θ_i + 1/2 respectively. A visualisation is given in Figure 2. For the code g, one can show the following.
In particular, the code g cannot discriminate at distance ρ < 1/(2n), and it is adaptive to the distance in the class P, up to some absolute multiplicative constant, in the range ρ ∈ [1/(2n), 1/2].
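The adaptivity of g can be checked numerically: the half-circle receptive fields have left endpoints spread with spacing 1/(2n), so two stimuli at distance ρ are separated by roughly 2nρ of them, giving T ≈ 1/(2nρ) at every scale simultaneously. A sketch of the construction of Example 3.5 (the indexing convention below is ours):

```python
def adaptive_code(n):
    """Code g: neuron i responds on the half-circle starting at theta_i = i/(2n)."""
    return [(i / (2 * n), (i / (2 * n) + 0.5) % 1.0) for i in range(n)]

def in_field(s, a, b):
    """Membership in the circular interval <a, b) of S^1."""
    if a == b:
        return False
    return (a <= s < b) if a < b else not (b <= s < a)

def delta(code, s1, s2):
    i1 = {i for i, (a, b) in enumerate(code) if in_field(s1 % 1.0, a, b)}
    i2 = {i for i, (a, b) in enumerate(code) if in_field(s2 % 1.0, a, b)}
    return max(len(i1 - i2), len(i2 - i1))
```

With n = 10, one finds ∆ = 2nρ at several distances at once, e.g. ∆ = 1 at ρ = 0.05 and ∆ = 4 at ρ = 0.2, without changing the code.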
To conclude the proof, observe that min where s_0 is the point of S^1 defined by θ_0 = 0, and the result follows.

Random codes are adaptive as well
In the next example, we show that even random codes are adaptive (in the range ρ ≥ n −1/2 , up to multiplicative constants) with large probability.
Example 3.7 (Random code). Let A_1, ..., A_n and B_1, ..., B_n be independent and uniformly distributed points on S^1. The random code f^r is defined as f^r = (f^r_1, ..., f^r_n), where for each i ∈ [n], f^r_i is of the form (2.1) with a_i = A_i and b_i = B_i. For the random code f^r, we can prove the following result.

Proposition 3.8. There exist constants K_1, K_2 > 0 such that for each n ≥ K_1 the following property holds: for each x ∈ R_{>0} and 0 ≤ ρ ≤ 1/2, with probability at least 1 − exp(−x). In particular, for any δ ∈ (0, 1/4) and each n with probability at most 1 − exp(−x).
Proof. First observe that, to prove the inequality in (3.4), it suffices to show that min_{s_1,s_2 ∈ S^1 : d(s_1,s_2) ≥ ρ} with probability at least 1 − exp(−x). To that end, first observe that With this notation, we deduce from the previous inequality that min_{s_1,s_2 ∈ S^1 : d(s_1,s_2) ≥ ρ} Now, by Lemma 7.5, Hence, it follows that for s_1, s_2 ∈ S^1 such that d(s_1, s_2) ≥ ρ, we have that where in the last inequality we have used that 0 ≤ ρ ≤ 1/2. From the above inequality, we deduce that max_{s_1,s_2 ∈ S^1 : d(s_1,s_2) ≥ ρ} where W_C is the random variable defined as Combining the above inequalities, we deduce that min_{s_1,s_2 ∈ S^1 : d(s_1,s_2) ≥ ρ} By applying the Bousquet inequality (e.g., see inequality (5.49) on page 170 of [14]), we deduce that for any x ∈ R_{>0}, From now on, we denote by K a positive constant which can change from line to line. Let us denote by V(C) the VC-dimension of the class C. By Lemma 6.4 of [14], there exists an absolute constant K such that, for all n for which 1 where in the last inequality we have used that 0 ≤ ρ ≤ 1/2. Combining the above inequality with (3.8), and using again that 0 ≤ ρ ≤ 1/2, it follows that for all n for which n ≥ 2K^2 V(C)(1 + log(2)/2) and any x ∈ R_{>0}, with probability at least 1 − exp(−x). Let us assume that the VC-dimension V(C) is bounded by some absolute constant. In this case, it follows from the above inequality and (3.7) that for all n ≥ K_2 and any x ∈ R_{>0}, min_{s_1,s_2 ∈ S^1 : d(s_1,s_2) ≥ ρ} with probability at least 1 − exp(−x), proving (3.6).

For the VC-dimension V(C), there are various ways to see that it is finite. One way is to say that ξ = (A, B) ∈ C_{s_1,s_2} is equivalent to saying that (θ_A, θ_B) belongs to the union of
• [0, min(θ_{s_1}, θ_{s_2})] × (max(θ_{s_1}, θ_{s_2}), 1],
• (max(θ_{s_1}, θ_{s_2}), 1] × [0, min(θ_{s_1}, θ_{s_2})],
• [min(θ_{s_1}, θ_{s_2}), max(θ_{s_1}, θ_{s_2})) × [min(θ_{s_1}, θ_{s_2}), max(θ_{s_1}, θ_{s_2})),
• [0, min(θ_{s_1}, θ_{s_2})] × [0, min(θ_{s_1}, θ_{s_2})].
Hence it is included in the union of 5 rectangles. It is well known that the VC-dimension of the family of rectangles is 4. The class C is included in the 5-fold unions of rectangles, whose VC-dimension is bounded by 4 · 5 log(5) up to multiplicative constants, thanks to [8]. Therefore the class C is of finite VC-dimension, which concludes the proof. Note that, with a closer look at all possibilities, one can show that the class C cannot shatter samples of ξ_i's of size 5, and that its VC-dimension is in fact 4. However, the proof considering all possibilities is much longer than using the bound given by [8].
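The behavior of random codes can be illustrated by a small Monte Carlo experiment: for a random field ⟨A, B⟩ with i.i.d. uniform endpoints, the probability that it contains exactly one of two stimuli at distance ρ on each given side is ρ(1 − ρ), so ∆ concentrates around nρ(1 − ρ). The grid of test pairs and all numerical values below are ours:

```python
import random

def random_code(n, rng):
    """n receptive fields <A_i, B_i) with i.i.d. uniform endpoints on S^1."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def in_field(s, a, b):
    if a == b:
        return False
    return (a <= s < b) if a < b else not (b <= s < a)

def delta(code, s1, s2):
    i1 = {i for i, (a, b) in enumerate(code) if in_field(s1, a, b)}
    i2 = {i for i, (a, b) in enumerate(code) if in_field(s2, a, b)}
    return max(len(i1 - i2), len(i2 - i1))

rng = random.Random(0)
n, rho = 200, 0.1
code = random_code(n, rng)
# Sample pairs at distance rho around the circle: delta stays positive and
# concentrates around n * rho * (1 - rho) = 18.
deltas = [delta(code, k / 400, (k / 400 + rho) % 1.0) for k in range(400)]
```

This matches the message of Proposition 3.8: with high probability, a single random draw of the code discriminates every sampled pair at distance ρ, without any tuning to ρ.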

Summary of the results on place cells codes

Results for grid cells code
In this section, we discuss some results for the class of grid cells codes defined in Section 2, where I^{f,i}_s = {j ∈ M_i : f_{i,j}(s) = µ} is the set of cells activated by stimulus s ∈ S^1 in the i-th module. The first result establishes a useful link between grid cells codes and place cells codes.

Proposition 4.1. For any f ∈ Ḡ and s_1, s_2 ∈ S^1, the following inequality holds

Proof. Suppose that ∆^f_{s_1,s_2} = |I^f_{s_1} \ I^f_{s_2}| (if not, exchange s_1 and s_2). On one hand, observe that, since the modules are disjoint, we have that On the other hand, using that and the result follows.
4.1 Lower bound on the minimax discrimination time for G((n_i, λ_i)_{i=1,...,m})

The following result provides a lower bound on the minimax discrimination time (up to constants depending on α and µ) defined in (1.1), when restricted to the class of grid cells codes: , with the convention that λ_{m+1} = 0. In particular, if we define j_ρ = max{1 ≤ k ≤ m : λ_k ≥ ρ}, then T(G((n_i, λ_i)_{i=1,...,m}), ρ) = ∞ whenever one of the following conditions holds: • either there exists 2 ≤ k ≤ j_ρ such that

Proof. We need to compute an upper bound for min_{s_1,s_2∈S^1: d(s_1,s_2)≥ρ} ∆^f_{s_1,s_2} uniformly over f ∈ F_2. By the definition of j_ρ, if j_ρ ≥ 2 then we have that Let us consider first the case j_ρ ≥ 2. In this case, let 2 ≤ k ≤ j_ρ and take s_1, s_2 ∈ S^1 such that d(s_1, s_2) = λ_k ≥ ρ. By Lemma 7.3 it follows that s_1 mod λ_i = s_2 mod λ_i for all k ≤ i ≤ m, so that for all j ∈ M_i, Hence, by Proposition 4.1 we have that For each 1 ≤ i ≤ k − 1, let ρ^i_ℓ in S_{λ_i} be defined by θ_{ρ^i_ℓ} = ℓλ_k with 0 ≤ ℓ ≤ D_{i+1:k}, where D_{i+1:k} := λ_i/λ_k is an integer larger than 2 for all i < k. For later use, let us also define D_{2:1} = 1. By arguing as in the proof of Proposition 3.1, one can show that

Proof of the Claim. Write ℓ = rD_{i+1:k} + ℓ mod D_{i+1:k} for some positive integer r. Since λ_i = D_{i+1:k} λ_k, we deduce that This concludes the proof of the Claim.
By periodicity, we deploy these points on module 1; there are D_{2:j_ρ} L of them. Taking their modulo, they correspond on module i = 1, ..., m to D_{i+1:j_ρ} L pairs at distance ρ. Each of these pairs is at distance a multiple of λ_k for k > j_ρ, and therefore cannot be detected by the modules with k > j_ρ. Therefore a similar argument as above leads us to Also, as before, for all k ≥ j_ρ. Finally, since ρ ≥ ρ, we get that as long as j_ρ ≥ 2, (4.3) holds.
Hence it can only be detected by the first module, and min_{s_1,s_2∈S^1: d(s_1,s_2)≥ρ} for all k ≥ 1, and we conclude as in the other cases.
To conclude, let us just remark that if there exists 2 ≤ k ≤ j_ρ such that , then the integer part in the lower bound is zero. The same phenomenon appears at k = j_ρ if j_ρ ρ < min_{i≤j_ρ} λ_i/(6n_i).
To better understand the bound given by Theorem 4.2, note that 1/⌊2n_i ρ/λ_i⌋ is in fact the rate of the place cells code in module i. So, roughly up to multiplicative constants, to distinguish at distance ρ, the modules of the grid cells code have to be coherent (i.e., the period kλ_{k+1} needs to be detected by at least one of the preceding modules, for all k such that λ_{k+1} ≥ ρ), and j_ρ ρ also needs to be in the detection range of at least one of the modules i ≤ j_ρ. It is not clear whether the factor k should be present or not. Also, for the rate itself, note that Σ_{i=1}^k n_i λ_{k+1}/λ_i is not necessarily monotone in k. Hence we could be in a situation where, even if we are interested in small ρ that cannot be detected by the first module, the rate of detection of λ_2 by the first module acts as a threshold that impacts the rates at ρ. For the code g_gc, one can prove the following result. If
• for all k ≤ j_ρ, there exists j < k such that λ_k ≥ λ_j/n_j,
• and there exists j ≤ j_ρ with ρ ≥ λ_j/n_j,
we have that , meaning that g_gc is adaptive in this range of ρ up to a factor log_2(1/ρ).
Proof. Take a pair s_1, s_2 ∈ S^1 such that d(s_1, s_2) ≥ ρ and define By using Lemma 7.2, we then conclude that for 1 ≤ j ≤ j_{s_1,s_2}, Hence, proceeding as in the proof of Proposition 3.6, we have that Combining the above inequality and Proposition 4.1, we deduce that min_{s_1,s_2∈S^1: d(s_1,s_2)≥ρ} To conclude the proof, we need to show that To that end, first observe that and also that for each k 1) it follows that 2 log_2(ρ^{−1}) ≥ log(ρ^{−1}) + 1 ≥ j_ρ ≥ k, so that from the above inequality we obtain that By similar arguments, one can also show that Combining the last two inequalities above with (4.5), we then deduce that proving (4.4), and the result follows.
denote the set of neurons in module i. Let (A_{ij})_{j∈M_i} and (B_{ij})_{j∈M_i} be independent and uniformly distributed points on S_{λ_i}. The random code f^r_gc for the grid cells is defined as f^r_gc = (f^r_{gc,11}, ..., f^r_{gc,ij}, ..., f^r_{gc,Mn}), where f^r_{gc,ij} is given by (2.2) with a_{ij} = A_{ij} and b_{ij} = B_{ij} for each i ∈ [m] and j ∈ M_i.

Proposition 4.6. There exist constants K_1, K_2 > 0 such that if min_{1≤i≤m} n_i ≥ K_1 the following property holds: for x ∈ R_{>0} and 0 ≤ ρ ≤ 1/2, T(f^r_gc, ρ) is upper bounded by the inverse of the ceiling function of with probability 1 − e^{−x}, with the convention λ_{m+1} = 0. In particular, denoting with probability 1 − e^{−x}, that is, f^r_gc is adaptive in the range of ρ where the following extra conditions hold: • and also that

Proof. Fix 0 ≤ ρ ≤ 1/2, take s_1, s_2 ∈ S^1 with d(s_1, s_2) ≥ ρ and define, as in the proof of Proposition 4.4, Hence, by Proposition 4.1 we have that Next, proceeding as in the proof of Proposition 3.8, one can show that min_{u_1,u_2∈S_{λ_i}: d(u_1,u_2)≥ρ} where W_{C_i} is the random variable given by Arguing for each W_{C_i} as was done for W_C in the proof of Proposition 3.8, one deduces that if with probability at least 1 − Σ_{i=1}^m exp(−x_i). Therefore, combining inequalities (4.7), (4.8) and (4.9), it follows that if min_{1≤i≤m} n_i ≥ K_2 then for all 1 ≤ k ≤ m and all x_1, ..., By the Cauchy–Schwarz inequality, we know that where in the second inequality we have used that λ_i ≤ 2^{−(i−1)} for each 1 ≤ i ≤ m. As a consequence, we deduce that Similarly, one can show that if the second extra condition holds then we also have that From the last two inequalities above it is easy to deduce (4.6), concluding the proof of the result.

What can we say about the general class Ḡ?
The rate

Proof. First, we will show that T(f_d, ρ, α) ≤ 1. To that end, we need to show that for all s_1, s_2 ∈ [0, 1) such that d(s_1, s_2) ≥ ρ, we have max Assuming that the claim is true, we can then use Lemma 7.4 again to conclude that We now prove the Claim. We argue by contradiction. Suppose that (θ_{s_1})_j = (θ_{s_2})_j for all j ∈ [n]. Suppose also that θ_{s_1} ≤ θ_{s_2} ≤ θ_{s_1} + 1/2 (if this is not the case, we exchange s_1 and s_2). In this case,

Proof. Let us first compute the corresponding lower bound for T(G_{b,m}) provided by Theorem 4.2. Notice that for all 2 ≤ k ≤ j_ρ we have that Moreover, one can check that Σ_{i=1}^{j_ρ} Hence Theorem 4.2 implies that T(G_{b,m}, ρ) ≥ 1/(3⌊n/m⌋). Since ⌊n/m⌋/2 ≥ 1, we have that y_1 ≥ ⌊n/m⌋/4 (here we use that ⌊x⌋ ≥ x/2 when x ≥ 1). Moreover, Σ_{j=0}^{j_ρ−1} ⌊⌊n/m⌋ρ2^j⌋ > (⌊n/m⌋/2) Σ_{0≤j≤j_ρ−1: ⌊n/m⌋ρ2^j ≥ 1} ρ2^j, so that if we denote j_min = min{0 ≤ j ≤ j_ρ − 1 : ⌊n/m⌋ρ2^j ≥ 1}, then y_{j_ρ} > (⌊n/m⌋/2) Σ_{j=j_min}^{j_ρ−1} ρ2^j = (⌊n/m⌋/2) 2^{j_min} ρ(2^{j_ρ−j_min} − 1).

This result about adaptivity is very interesting. Indeed, it tells us that whatever balanced adaptive code with m modules we use, the minimax discrimination time does not vary as a function of ρ, and in particular does not decrease when the distance between two stimuli increases. It stays constant at m/n up to multiplicative constants.
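The last step of the computation above uses only the closed form of a geometric sum, Σ_{j=j_min}^{j_ρ−1} ρ2^j = 2^{j_min} ρ (2^{j_ρ−j_min} − 1), which can be sanity-checked numerically. A minimal sketch (the helper name is ours, not from the paper):

```python
def geometric_tail(rho, j_min, j_rho):
    """Both sides of the geometric-sum identity used above:
    sum_{j=j_min}^{j_rho-1} rho*2**j == 2**j_min * rho * (2**(j_rho - j_min) - 1)."""
    lhs = sum(rho * 2 ** j for j in range(j_min, j_rho))
    rhs = 2 ** j_min * rho * (2 ** (j_rho - j_min) - 1)
    return lhs, rhs
```

For instance, with ρ = 0.25, j_min = 2 and j_ρ = 6, both sides equal 0.25·(4+8+16+32) = 15.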
In the extreme case with m = n modules, we recover the extreme dyadic code, whose minimax discrimination time is 1 whatever the distance. On the other hand, and as a corollary of Proposition 4.10, we have the following result.
Corollary 4.11. For all 2^{−n/2} ≤ ρ ≤ 1/2, let m = ⌊log_2(ρ^{−1})⌋. Then the code g^{b,m}_gc satisfies The proof of Corollary 4.11 is straightforward. Note that this result shows that if we know in advance at which distance ρ one needs to detect, one can use a balanced adaptive code with m = ⌊log_2(ρ^{−1})⌋ modules to reach a rate much faster than the place cells codes. Indeed, log_2(ρ^{−1}) ≪ ρ^{−1} when ρ is small. On the other hand, once m is fixed, Proposition 4.10 tells us that the minimax discrimination time is then constant, of order log_2(ρ^{−1})/n, and cannot decrease when ρ increases. In this sense, balanced grid cells codes cannot be adaptive. The question remains open for unbalanced grid cells codes.
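The gap between the two rates is easy to illustrate numerically. The following sketch compares the place-cell time scale 1/(nρ) with the balanced grid-cell scale ⌊log_2(ρ^{−1})⌋/n of Corollary 4.11 (function names are ours; constants are dropped):

```python
import math

def place_cells_time(n, rho):
    """Discrimination-time scale ~ 1/(n*rho) for an adaptive place cells code."""
    return 1.0 / (n * rho)

def balanced_grid_time(n, rho):
    """Discrimination-time scale ~ m/n for the balanced grid cells code
    with m = floor(log2(1/rho)) modules, as in Corollary 4.11."""
    m = math.floor(math.log2(1.0 / rho))
    return m / n
```

For n = 100 and ρ = 2^{−10}, the place-cell scale is 1024/100 = 10.24 while the balanced grid-cell scale is 10/100 = 0.1, reflecting log_2(ρ^{−1}) ≪ ρ^{−1} for small ρ.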

Summary of the results on grid cells codes
1. One can compute the minimax discrimination rate on G((n_i, λ_i)_{i=1,...,m}).
2. In general, grid cells codes obtained with the adaptive code of Example 3 on each module are able to reach this minimax discrimination rate, for every ρ up to a log_2(ρ^{−1}) factor, but on a restricted range of ρ.
3. The random codes do not lose this extra log_2(ρ^{−1}) factor, but they have a different range of ρ.
4. Extreme dyadic codes can reach the best possible precision, 2^{−n}. However, their discrimination rate cannot be faster for larger ρ.
5. Balanced grid cells codes obtained with the adaptive code of Example 3 on each module, with number of modules m = log_2(ρ^{−1}), are able to reach the rate log_2(ρ^{−1})/n, which is faster than the corresponding minimax rate for place cells codes. However, balanced grid cells codes cannot go faster when the distance between the stimuli increases.

6. In particular, we do not know the minimax discrimination rate over the general class Ḡ, and we do not know if adaptivity is possible there. However, we do know that neither the extreme dyadic code, which achieves the precision 2^{−n}, nor the balanced grid cells codes can be adaptive.

Numerical illustration
We have simulated what happens for n = 100 cells with firing rate µ = 30. We used 5 different configurations, as indicated below.
• Place cells - adaptive. That is, the code f is given by Example 3.
• Place cells - random. The code f is picked at random as in Example 4.
• Grid cells - adaptive balanced. It consists of 20 modules with λ_i = 2^{−(i−1)}. All 20 modules have the same n_i = 5. Inside each module, the code is taken as in Example 3.
• Grid cells - random balanced. It consists of 20 modules with λ_i = 2^{−(i−1)}. All 20 modules have the same n_i = 5. Inside each module, the code is taken at random as in Example 4.
We picked s = 1/3, which is not in any of the periods of the different modules. We also picked s′ = s + ρ with ρ on a grid between 2^{−21} and 0.5.
For various possible T on a grid from 0.001 to 20, we simulated 5000 times the test Ψ given by (2.4), where we choose s_1 = s and s_2 = s′, and the reverse if this is not the case. In each case, we found T_min(f, s, s′, α), the smallest time on the grid that gives an error less than α (the error is evaluated by a Monte Carlo method on the 5000 simulations). In Figure 3 on the left, we see T_min(f, s, s′, α) as a function of 1/∆^f_{s,s′}. As derived in (2.5), we also see in the simulations that the same constant of proportionality holds whatever the code (this constant depending only on µ and α), and that it was legitimate to study directly 1/∆^f_{s,s′}.
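The quantity ∆^f_{s,s′} and the scaling T_min ∝ 1/∆^f_{s,s′} can be sketched in a few lines. The following is a minimal sketch under our reading of Example 3 (cell j of the adaptive place cells code is active on the half-circle arc starting at j/(2n)); the time formula is not the exact test (2.4), only the heuristic scaling T ≈ log(1/α)/(µ∆) obtained by waiting for a distinguishing cell to spike:

```python
import math

def delta_adaptive_place(n, s1, s2):
    """Number of cells whose activation differs between stimuli s1 and s2
    for the adaptive place cells code: cell j is active on the arc
    [j/(2n), j/(2n) + 1/2) (mod 1).  For d(s1, s2) < 1/2 this counts the
    arc boundaries falling between s1 and s2, i.e. roughly 2*n*d(s1, s2)."""
    def active(j, s):
        return ((s - j / (2 * n)) % 1.0) < 0.5
    return sum(active(j, s1) != active(j, s2) for j in range(n))

def discrimination_time(delta, mu=30.0, alpha=0.05):
    """Heuristic time scale log(1/alpha)/(mu*delta): the time needed for at
    least one of the delta distinguishing cells (each firing at rate mu)
    to spike with probability 1 - alpha."""
    return math.log(1.0 / alpha) / (mu * delta)
```

For n = 100 and s = 1/3, a pair at distance ρ = 0.1 is separated by ∆ = 20 cells, while ρ = 0.01 gives ∆ = 2, so the far pair is discriminated ten times faster, in line with the 1/(nρ) rate.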
It was not possible to compute T(f, ρ) as defined by (2.6), so we used the following quantity as a proxy: T(f, s, ρ, α) = max_{ρ′≥ρ in the grid} T_min(f, s, s + ρ′, α), for s = 1/3. In Figure 3 on the right, we plotted T(f, s, ρ, α) as a function of ρ. We see that place cells - adaptive indeed follow a curve in 1/ρ as expected, and that place cells - random have a similar behavior. In particular, with n = 100 they cannot detect a ρ of order 1/(2n) = 0.005. On the other hand, all grid cells codes can reach this range, because they can reach a much smaller range, at least 2^{−20}. Besides, grid cells - (random or adaptive) balanced cannot have a time of detection which decreases when ρ increases.
Hence there is a point where the place cells system is quicker in discrimination time than the grid cells system. We tried a non-balanced version of the grid cells (grid cells - adaptive decreasing) to make the discrimination time decreasing in ρ. However, as one can see in Figure 3 on the right, even if the discrimination time becomes decreasing in ρ, it reaches a plateau-like behavior (up to logarithmic behavior that cannot be seen on the curve). Therefore, even this non-balanced version of the grid cells is still slower than the place cells system.

Conclusion, discussion and Perspectives
We have adopted a new point of view on rate coding, the testing point of view, and used it to highlight differences between a place cell system and a grid cell system with the same number of neurons.
On the testing point of view. The testing point of view is complementary to the estimation/information-theory point of view developed originally by [4] for place cells or other cells with a simple receptive field. Indeed, for place cells, Brunel and Nadal proved, in a framework similar to the random code of the present work, that the Fisher information is proportional to n, the number of cells. The main difference is that they used triangular codes f instead of step functions with two parameters. If the Fisher information is proportional to n, the standard deviation of the best estimator of the stimulus s is of order n^{−1/2}. We have proved that a similar random place cell system can discriminate between two positions as long as they are at least at distance of order n^{−1/2}. In this sense, both frameworks seem consistent. However, we have also proved (see Example 3) that another place cells code achieves a much smaller precision, of order 1/n. Moreover, the testing set-up allows us to capture an interesting phenomenon, which should take place in practice and might be tested via experiments: when two stimuli are very different, the discrimination time between them should be smaller.
On the place cell system. We have shown that adaptivity in terms of ρ (the distance between two stimuli) is possible. This means not only that certain codes have the ability to discriminate faster when the stimuli are further away, but also that the rate 1/(nρ) is, up to constants, not improvable by a place cell system. This rate is reached not only by very particular codes such as the one of Example 3, but also by random codes, for ρ ≥ c/√n, for some positive constant c.
On the grid cell system. As already shown by the study of the Fisher information [9,22], we can prove that grid cells can reach a precision that is much smaller than that of place cells and which is exponentially decreasing in n. We have also shown that the rate 2^{−n} for the discrimination time is in fact an absolute lower bound for all kinds of codes with only two values, and that the grid cells in this sense are the ones able to achieve the smallest precision, not only with respect to place cells but with respect to any code. We have also been able to derive upper and lower bounds on the minimal discrimination time for grid cells with a given number of cells per module and a given period for each module. In particular, the distribution of the cells inside a module is derived from their place cell equivalent, and a random uniform distribution inside a module leads to the best rate up to constants as soon as ρ is large enough. This is consistent with the experiments of [24], which found no structure in the distribution of the centers of the receptive fields inside a given module. Finally, for a fixed ρ, particular balanced grid cells codes are able to achieve the discrimination rate log(1/ρ)/n, which is much faster than place cells, but we have not been able to find a fixed grid cells code whose discrimination rate would decrease when ρ increases in an interesting range (that is, past the detection range of the first module). Informally, once ρ < 1/(2n_1), it seems that all codes are limited by what happens in the first module, that is, all discrimination rates seem to be equal to 1/(2n_1). However, we have not been able to rigorously prove this fact.
Place cells versus grid cells. As a consequence, and contrary to place cells, we have not been able to find a general adaptive grid cell code achieving the minimax discrimination rate, which, thanks to our results, can only be smaller than or equal to log(1/ρ)/n.
Our tentative non-balanced codes, as shown in the simulations, were not satisfactory. We do not know if adaptation is possible for general grid cells. If it is not possible, then there is definitely an advantage to having both systems (grid cells and place cells) at the same time. The simulations show a compromise that can be made by using both systems at once (see Figure 3 on the right). Indeed, grid cells can reach a much smaller precision than place cells, but are much slower than place cells when ρ increases. In this sense, having a combination of both systems would allow a fast reaction time when stimuli are far away and a good reaction time even if the stimuli are very close.
Limitations and Open problems. We restricted ourselves to one-dimensional, periodic stimuli (or a circular maze). We do not know at the moment how to generalize to higher dimensions, especially if we want grid cells to have a hexagonal pattern. Moreover, [24] have proved that the periods progress in a geometric manner (see also [27] for similar geometric progressions in 1d), but that the ratio is not an integer, and so far our method relies too strongly on periodicity to allow that. To go beyond this, we would need to consider stimuli with boundary effects, and while the adaptation of grid cells to boundaries has been described [13], modeling the boundary effect precisely from a mathematical point of view is not straightforward. Also, one might want to add more realistic rate functions f(s) than step functions, but we do not think this would massively change the rates we found, as long as the shape of f(s) is not allowed to vary much. Finally, even in the 1d circular case, the problem of adaptivity, and even the computation of the minimax discrimination rate of the general grid cells codes, remains open. This means that we do not know what the best choice is, in terms of discrimination rate, for the number of cells per module, the scales, or even the number of modules.

Figure 2: Visual representation of Example 3. For easier visualisation, the different red half-circles have different radii. Nevertheless, each of them corresponds to a certain cell i of the code, and more precisely to the locations s such that f_i(s) = µ. Two pairs are considered: (s_1, s_2) (standing for two positions that are very close) and (s_1, s′_2) (standing for positions that are very far). We see that ∆_{s_1,s′_2} = n − 1 ≫ ∆_{s_1,s_2} = 1.

4.2 Upper bounds and a particular adaptive code for G((n_i, λ_i)_{i=1,...,m})

Example 4.3 (An adaptive grid cell code). This code is made of the adaptive place cells code of Example 3 on each of the modules. More precisely, module i is the set of neurons M_i = {n_{i−1} + 1, ..., n_{i−1} + n_i}, and f_{n_i+j,i} is associated with the points a_{ij}, b_{ij} ∈ S_{λ_i} corresponding to the angles θ_{a_{ij}} = jλ_i/(2n_i) and θ_{b_{ij}} = (j + n_i)λ_i/(2n_i). Let us denote this code by g_gc.

4.3 Random codes are also adaptive in G((n_i, λ_i)_{i=1,...,m})

Example 4.5 (Random code for grid cells). Again we use random codes on each of the modules. More precisely, for each 1

Minimal range of detection

to understand. What would be the best choice of the n_i's and λ_i's?

Example 4.7 (Extreme dyadic code). In the extreme dyadic code f_d, we have m = n modules M_1, ..., M_n, where the i-th module M_i = {i} has period λ_i = (1/2)^{i−1} and whose points a_{ii} := a_i, b_{ii} := b_i ∈ S_{λ_i} are such that θ_{a_i} = 0 and θ_{b_i} = (1/2)^i. In the sequel, let us denote f_{d,ii} = f_{d,i}, so that f_d = (f_{d,1}, ..., f_{d,n}).

Proposition 4.8. Let f_d be the extreme dyadic code. For any ρ ≥ 1/2^n, we have T(f_d, ρ) = 1.
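To see why f_d attains the precision 2^{−n}, note that cell i is active exactly when s mod λ_i < λ_i/2, i.e. when the i-th binary digit of θ_s is 0, so the activation pattern reads off the binary expansion of the stimulus. A minimal sketch (function names are ours):

```python
def dyadic_active(i, s):
    """Cell i (1-indexed) of the extreme dyadic code: period 2**-(i-1),
    active on the half-period [0, 2**-i) -- i.e. exactly when the i-th
    binary digit of s is 0."""
    lam = 2.0 ** -(i - 1)
    return (s % lam) < lam / 2

def dyadic_pattern(n, s):
    """Activation pattern of the n cells at stimulus s."""
    return tuple(dyadic_active(i, s) for i in range(1, n + 1))
```

For instance, s = 0.8125 = 0.1101 in binary gives the pattern (False, False, True, False) for n = 4; and any two stimuli at distance at least 2^{−n} must differ in one of their first n binary digits (otherwise they would lie in a common dyadic interval of length 2^{−n}), so at least one cell distinguishes them.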

Example 4.9 (Grid cells - adaptive balanced). For a given 1 ≤ m ≤ n, let λ_i = 2^{−(i−1)} and n_i = ⌊n/m⌋ for 1 ≤ i ≤ m − 1, and λ_m = 2^{−(m−1)} and n_m = ⌊n/m⌋ + n mod m. We call balanced grid cells code the class G_{b,m} := G((n_i, λ_i)_{i=1,...,m}) corresponding to these choices of λ_i's and n_i's. Note that each code in this class has the same number of neurons per module (except the last one). In what follows, let us denote by g^{b,m}_gc the code g_gc defined in Example 4.3 belonging to the class of balanced grid cells codes G_{b,m}.

Proposition 4.10. Let 1 ≤ m ≤ n be such that n ≥ 2m. For any ρ such that 1/2^m ≤ ρ ≤ 1/2,

1/(3⌊n/m⌋) ≤ T(G_{b,m}, ρ) ≤ T_min(g^{b,m}_gc, ρ) ≤ 16/⌊n/m⌋,

that is, the code g^{b,m}_gc is adaptive in the range ρ ≥ 1/2^m in the class G_{b,m}.
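The module layout of Example 4.9 is simple to write down explicitly. The following sketch (the helper name is ours) returns the (n_i, λ_i) pairs and can be used to check that the neurons are split as evenly as possible, with the last module absorbing the remainder:

```python
def balanced_modules(n, m):
    """Sizes and periods of the balanced grid cells class G_{b,m}:
    lambda_i = 2**-(i-1) for all i, n_i = floor(n/m) for i < m, and the
    last module absorbs the remainder n mod m."""
    sizes = [n // m] * m
    sizes[-1] += n % m
    periods = [2.0 ** -i for i in range(m)]  # 2**-(i-1) with i 1-indexed
    return list(zip(sizes, periods))
```

With n = 100 and m = 20 (the configuration of the numerical illustration below), every module gets exactly 5 cells.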

Figure 3: Discrimination time as a function of ∆^f_{s_1,s_2} and ρ. On the left, T_min(f, s_1, s_2, α) as a function of 1/∆^f_{s,s′} for the 5 different codes. On the right, T_min(f, ρ, α) as a function of ρ for the 5 different codes.
µ}, with [n] the short notation for {1, ..., n}. In what follows, for any subset B ⊆ [n], we denote by |B| the cardinality of the set B.
2 for all modules i. A typical representation of this grid cells code can be seen in Figure 1.C.