# Signal processing in the cochlea: the structure equations

- Hans Martin Reimann

**1**:5

https://doi.org/10.1186/2190-8567-1-5

© Reimann; licensee Springer 2011

**Received:** 15 November 2010

**Accepted:** 6 June 2011

**Published:** 6 June 2011

## Abstract

### Background

Physical and physiological invariance laws, in particular time invariance and local symmetry, are at the outset of an abstract model. Harmonic analysis and Lie theory are the mathematical prerequisites for its deduction.

### Results

The main result is a linear system of partial differential equations (referred to as the structure equations) that describe the result of signal processing in the cochlea. It is formulated for phase and for the logarithm of the amplitude. The changes of these quantities are the essential physiological observables in the description of signal processing in the auditory pathway.

### Conclusions

The structure equations display in a quantitative way the subtle balance for processing information on the basis of phase versus amplitude. From a mathematical point of view, the linear system of equations is classified as an inhomogeneous $\overline{\partial}$-equation. In suitable variables the solutions can be represented as the superposition of a particular solution (determined by the system) and a holomorphic function (determined by the incoming signal). In this way, a global picture of signal processing in the cochlea emerges.


## 1 Background

At the outset of this work is the quest to understand signal processing in the cochlea.

### 1.1 Linearity and scaling

It has been known since 1992 that cochlear signal processing can be described by a wavelet transform (Daubechies 1992 [1], Yang, Wang and Shamma, 1992 [2]). There are two basic principles that lie at the core of this description: Linearity and scaling.

In the cochlea, an incoming acoustical signal $f(t)$ in the form of a pressure fluctuation (*t* is the time variable) induces a movement $u(x,t)$ of the basilar membrane at position *x* along the cochlea. At a fixed level of sound intensity, the relation between incoming signal and movement of the basilar membrane is surprisingly linear. However, as a whole this process is highly compressive with respect to the level of sound, and thus cannot be linear.

In the present setting this is taken care of by a ‘quasilinear model’. This is a model that depends on parameters, for example, in the present situation the level of sound intensity. For fixed parameters the model is linear. It is interpreted as a linear approximation to the process at these fixed parameter values. Wavelets give rise to linear transformations. The description of signal processing in the cochlea by wavelet transformations, where the wavelets depend on parameters, is compatible with this approach.

Scaling has its origin in the approximate local scaling symmetry (Zweig 1976 [3], Siebert 1968 [4]) that was revealed in the first experiments (Békésy 1947 [5], Rhode 1971 [6]).

To a pure-tone input $f(t)={e}^{i\omega t}$ of circular frequency *ω* there corresponds an output $u(x,t)$ at the position *x* along the cochlea that on the basis of linearity has to be of the form

$$u(x,t)=\hat{g}(x,\omega ){e}^{i\omega t}.$$

Here $\hat{g}$ is the basilar membrane transfer function, defined for positions *x* and $\omega >0$. Its modulus $|\hat{g}(x,\omega )|$ is a measure of amplification and its argument is the phase shift between input and output signals. The experiments of von Békésy [5] showed that the graphs of $|\hat{g}(x,\omega )|$ and $|\hat{g}(x,c\omega )|$ as functions of the variable *x* are translated against each other by a constant multiple of $\log c$. By choosing an appropriate scale on the *x*-axis, the multiple can be taken to be 1. The scaling law is then expressed as

$$|\hat{g}(x-\log c,c\omega )|=|\hat{g}(x,\omega )|.$$

The scaling law will be extended - with some modifications - to include the argument of $\hat{g}$.

Intimately connected to scaling is the concept of a tonotopic order. It is a central feature in the structure of the auditory pathway. Frequencies of the acoustic signal are associated with places, at first in the cochlea and in the following stages in the various neuronal nuclei. The assignment is monotone: it preserves the order of the frequencies. In the cochlea, to each position *x* along the cochlear duct a circular frequency $\sigma =\xi (x)$ is assigned. The function *ξ* is the position-frequency map. Its inverse is called the tonotopic axis. At the stage of von Békésy’s results, the frequency associated with a position *x* along the cochlea is simply the best frequency (BF), that is, the frequency *σ* at which $|\hat{g}(x,\omega )|$ attains its maximum. The refined concept takes care of the fact that the transfer function, and with it the BF, changes with the level of sound intensity at which $\hat{g}$ is determined. The characteristic frequency (CF) is then the low level limit of the best frequency. The position-frequency map *ξ* assigns to the position *x* its CF.

*K* is determined by inserting a special value for *x*. The scaling law tells us that the function $|\hat{g}(x,\omega )|$ is actually a function of the ‘scaling variable’

*et al.* 1981 [10], Greenwood 1990 [11]), the position-frequency map is now known precisely for many species. Shera 2007 [12] gives the formula

*l* and the ‘transition frequency’ ${\mathit{CF}}_{1}$ vary from species to species. The scaling variable that goes with it is

*x* is the normalized variable (*x* instead of $x/l$) and the precise position-frequency map is expressed in the form

*ξ* denotes circular frequency and $K=\xi (0)+S$. The constant *S* is referred to as the shift.
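As an illustration, the position-frequency map and its inverse can be sketched numerically. The snippet assumes the shifted exponential form $\xi (x)=K{e}^{-x}-S$ in the normalized variable *x*, which is compatible with the relation $K=\xi (0)+S$ stated above. The function names and the parameter values are hypothetical and serve only as an example.

```python
import math

def position_frequency(x, xi0, shift):
    """Circular frequency assigned to the normalized position x.

    Assumed form: xi(x) = K * exp(-x) - S with K = xi(0) + S.
    """
    K = xi0 + shift
    return K * math.exp(-x) - shift

def position_of(freq, xi0, shift):
    """Inverse map (tonotopic axis): normalized position whose frequency is freq."""
    K = xi0 + shift
    return math.log(K / (freq + shift))

# Hypothetical values, chosen only for illustration (circular frequencies in rad/s).
xi0 = 2 * math.pi * 20000.0
shift = 2 * math.pi * 150.0
for x in (0.0, 1.0, 2.0, 3.0):
    print(f"x = {x:.1f}  ->  xi(x) = {position_frequency(x, xi0, shift):.1f} rad/s")
```

For $S=0$ this reduces to the pure exponential law, for which the scaling symmetry holds exactly. A nonzero shift mainly changes the map where $\xi (x)$ is comparable to *S*, that is, at low frequencies.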

In the abstract model as it will be developed, much will depend on the definition of the function *σ* that specifies the frequency location. In the present treatment the frequency localization of a function will be defined as an expectation value in the frequency domain.

### 1.2 Wavelets

*f* satisfies $\hat{f}(\omega )=\overline{\hat{f}(-\omega )}$. If the definition of $\hat{g}$ is extended to negative values of *ω* by $\hat{g}(x,-\omega )=\overline{\hat{g}(x,\omega )}$ then $u(x,t)$ can be written as

*h* in the scaling variable:

*f* can then be expressed as

*Wf* with wavelet

*ψ* is defined by

The fact that the cochlea, in a first approximation, performs a wavelet transform appeared in the literature in 1992, both in [1] and in [2].

### 1.3 Uncertainty principle

The natural symmetry group for signal processing in the cochlea is built on the affine group Γ. It derives from the scaling symmetry in combination with time-invariance. In addition, there is the circle group *S* that is related to phase shifts. Its action commutes with the action of the affine group. The full symmetry group for hearing is thus $\Gamma \times S$. For this group, the uncertainty principle can be formulated. The functions for which equality holds in the uncertainty inequalities are called the extremal functions. They play a special role, similar to that of the coherent states in quantum physics (the extremals for the Heisenberg uncertainty principle). The starting point in the present work is the tenet that these functions provide an approximation for the cochlear transfer function.

That the extremal functions should play a special role is not a new idea. In signal processing the extremal functions first appeared in Gabor’s work (1946) [13] in connection with the Heisenberg uncertainty principle and then in Cohen’s paper (1993) [14] in the context of the affine group. In a paper by Irino 1995 [15] the idea is taken up in connection with signal processing in the cochlea. It is further developed by Irino and Patterson [16] in 1997. The presentation in this paper is based on previous work (Reimann, 2009 [17]). The concept pursued is to determine the extremals in the space of real valued signals and to use a setup in the frequency domain, not in the time domain. Different representations of the affine group give different families ${E}_{c}$ of extremal functions. The parameter *c* is used to adjust to the sound level and hence to provide linear approximations at different levels to the non-linear behavior of cochlear signal processing.

## 2 Results and discussion

### 2.1 Uncertainty principle

This section starts with the specification of the symmetry group $\Gamma \times S$ that underlies the hearing process. The basic uncertainty inequalities for this group are then explicitly derived. The analysis builds on previous results (Reimann [17]). A modification is necessary because the treatment of the phase in [17] was not satisfactory. An improvement can be achieved with the inclusion of the term $\alpha \hat{H}$ in the uncertainty inequality. This term comes in naturally and it will influence the argument - but not the modulus - of the extremal functions associated to the uncertainty inequalities. It is claimed that the extremal functions derived in this section are a first approximation to the basilar membrane transfer function $\hat{g}$. The extremal functions for the basic uncertainty principle are interpreted as the transfer function at high levels of sound. This situation corresponds to the parameter value $c=1$. With increasing parameter values the extremal functions for the general uncertainty inequality are then taken as approximations to the cochlear response at decreasing levels of sound.

#### 2.1.1 The symmetry group

**R**. It is generated by the translation group ${\tau}_{b}(t)=t+b$ ($b\in \mathbf{R}$) and the dilation group ${\delta}_{a}(t)=at$ ($a\in \mathbf{R}$, $a\ne 0$). Under the Fourier transform, the action of the dilation group on ${L}^{2}(\mathbf{R},\mathbf{C})$ is intertwined with the action of the inverse dilation group $\hat{\delta}$. This group also acts directly in frequency space:

(With this convention, the group action and the induced action are denoted with the same symbol.) Clearly, the invariance property of the basilar membrane transfer function directly reflects this group action.

of the circle group *S*. This is the third distinguished group action.

*t* and position *x* along the cochlea. Clearly $\hat{B}$ is related to time, whereas $\hat{A}$ - as will be shown presently - is related to the position. In our approach the tonotopic axis is given by the exponential law

$$\xi (x)=K{e}^{-x}$$

*a*) in *x*:

The uncertainty principle that goes with the group $\Gamma \times S$ can thus be seen as an uncertainty for the determination of time and position.

#### 2.1.2 The basic uncertainty inequality

*α*, *β* and *ν*. The expression $\parallel (\hat{H}\hat{B}+\nu )h\parallel $ is minimal for

This *ν* is the decisive parameter. It has the interpretation of an expectation value for the frequency. Later it will be associated with the place along the cochlea.
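The role of *ν* as a frequency expectation value can be made concrete with a small numerical sketch. It assumes that $\hat{H}$ acts as multiplication by $-i\,\mathrm{sgn}(\omega )$ and $\hat{B}$ as multiplication by $-i\omega $, so that $\hat{H}\hat{B}$ is multiplication by $-|\omega |$ and the norm above is minimal for $\nu =\int |\omega ||h{|}^{2}\,d\omega /\int |h{|}^{2}\,d\omega $. The test function $h(\omega )=|\omega |{e}^{-|\omega |}$ is a hypothetical example, not an extremal function of the text.

```python
import math

def frequency_expectation(h, omega_max=40.0, n=40000):
    """Approximate nu = (int |w| |h|^2 dw) / (int |h|^2 dw) by the midpoint rule.

    Assumes |h| is even in w, so it suffices to integrate over w > 0.
    """
    dw = omega_max / n
    num = 0.0
    den = 0.0
    for k in range(n):
        w = (k + 0.5) * dw
        p = abs(h(w)) ** 2
        num += w * p * dw
        den += p * dw
    return num / den

def example_h(w):
    """Hypothetical test function h(w) = |w| exp(-|w|)."""
    return abs(w) * math.exp(-abs(w))

nu = frequency_expectation(example_h)
print(nu)
```

For this test function the two integrals can be evaluated in closed form, giving $\nu =3/2$, which the numerical value reproduces up to quadrature error.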

*α* and *β* in the expression

has been used.

The integrals ${\int}_{-\infty}^{\infty}{|h|}^{2}|\omega |\frac{d}{d\omega}\arg h\,d\omega $ and ${\int}_{-\infty}^{\infty}t{|\frac{df}{dt}(t)|}^{2}\,dt$ can be interpreted as expectation values of ${|h|}^{2}\frac{d}{d\omega}\arg h$ in the frequency space and for ${|\frac{df}{dt}(t)|}^{2}$ in the time space. Roughly, in combination with $\hat{H}$ the operator $\hat{A}$ controls $\frac{d}{d\omega}\arg h$ and $\hat{B}$ the time derivative.

We will assume that the parameters *α*, *β* and *ν* are always chosen such that the right hand side in the uncertainty inequality is minimal, that is, the inequality is formulated in its sharpest form.

*ν* for the modulus of the frequency is

does not have such a simple interpretation except in the special case $\alpha =0$. This is treated in [17].

A function *h* is called extremal, if equality holds for it in the uncertainty relation. The extremal functions are expected to play a special role in the signal processing of the cochlea. In the context of the classical Heisenberg uncertainty relation, the extremal functions are translates of the Gaussian function ${e}^{-{x}^{2}}$ under the action of the Heisenberg group. They are called ‘coherent states’. Their significance in signal processing is well established since the appearance of Gabor’s work in 1946 [13]. At the outset of the present discussion is however the fact that the cochlea performs a wavelet transform - and not a Fourier transform. The invariance group is $\Gamma \times S$ and not the Heisenberg group. It should therefore be expected that the extremal functions as discussed below play the crucial role in the hearing process.

*h* (in frequency space) satisfy the equation

with real constants *k*, *ε*, *α*, *β*, *κ* and *ν*. Square integrability implies $\kappa >0$ and *ν* is the positive frequency expectation value.

From the explicit form it is clear that the space of solutions is invariant under the action of $\Gamma \times S$. The tenet is now:

### 2.2 The basilar membrane transfer function is given by extremal functions

*h*, normalized by the condition $\nu (h)=1$, such that $h(\frac{\omega}{\xi})$ adequately describes the basilar membrane transfer function $\hat{g}$:

*x* is thus $\xi (x)$. The question then arises whether the experiments confirm the tenet. To arrive at a preliminary conclusion, graphs of the modulus and of the real part of the function *h* are displayed in Figure 1. The parameters are $\alpha =-\pi $, $\beta =2\pi $ and $\kappa =4$.

The classical results by von Békésy (1947) [5] seem to be in favor of such a statement. However the situation is of course not so simple. The basic problem is the non-linearity of the process that associates the movement $u(x,t)$ of the basilar membrane to the input signal $f(t)$. This process is highly compressive and therefore its description by a transfer function can at best be looked at as an approximation.

Von Békésy’s results stem from experiments on dead animals. The outcome can be compared to the experimental results obtained with live animals, but only at high intensities of sound pressure. The above description of the basilar membrane transfer function is therefore taken to be a linear approximation at high levels of sound pressure. In the following section the approach will be modified with the aim of obtaining linear approximations at all levels of sound pressure.

#### 2.2.1 General uncertainty inequality for $\Gamma \times S$

There are various ways that the abstract group $\Gamma \times S$ can act on the space ${L}_{\mathit{sym}}^{2}$. Apart from the natural representation that associates to the corresponding basis elements in the Lie algebra of Γ the operators $\hat{A}$ and $\hat{B}$, the general representation considered below is built on the operators $\frac{1}{c}\hat{A}$ and ${\hat{B}}_{c}=-i{|\omega |}^{c}\operatorname{sgn}(\omega )$. The representation of Γ induced by this algebra representation retains the crucial scaling behavior known from the experimental results. It seems to be suitable in the present context, despite the fact that the operator ${\hat{B}}_{c}$ does not stand for the time derivative any more.

The proportionality factor is $\kappa =-\frac{c\mu}{\lambda}$. Its choice is arbitrary. All the constants *α*, *β*, and *κ* can in fact be chosen as functions of the parameter *c*. This gives a possibility for fine adjustment of the extremal function ${h}_{c}$ that describes the linear approximation at level *c* of the basilar membrane filter.

with real constants *k*, *ε*, *α*, *β*, *κ* and *ν*. These solutions are in ${L}^{2}$ if both $\kappa >0$ and $\nu >0$.

*h* is

*ω* is

*ξ*:

The parameter *c* specifies the level of sound intensity at which the linearization is taken. Parameters $c\sim 1$ indicate high levels and parameters $c\gg 1$ small levels of intensity.

*x*, the extremal functions $|{\hat{g}}_{c}(x,\omega )|$ attain their maxima at

With increasing values of *c* this approaches the frequency localization $\xi (x)$.

With the present setup the argument of the basilar membrane filter is independent of *c*. The experimental results by Rhode and Recio (2000) [19] show minor changes of phase depending on the intensity level. With increasing intensity there is a small phase lag below the characteristic frequency and an equally small phase lead for frequencies above the characteristic frequency. Studies of the impulse response also confirm that the phase is almost invariant under changes in sound level (Recio and Rhode (2000) [20], Shera (2001) [21]). In order to obtain a fine adjustment of the phase data, the parameters *α* and *β* would have to be chosen as functions of *c*.

*ω* and as a function of the place *x*. The phase of the extremal function does not satisfy this requirement because of the logarithmic term. Yet still, the phase of the extremal function serves as an approximation of the physiological phase function on the interval in which the absolute value is relevant. At places at which the absolute value is close to zero, the argument is of no significance. In Figure 2 the phase function is pictured for the fixed circular frequency $\omega =1{,}000$ as a function of the distance *d* to the stapes, on the interval that is of physiological relevance. In Figure 3 the phase is pictured as a function of frequency (in Hz). In this figure, the characteristic frequency is 7,000 Hz. The part above about 3,000 Hz is of physiological relevance. The approximation holds in this range. It should be compared with the experimental results by Rhode and Recio (2000 [19], Figure 2E). The part below 3,000 Hz is the mathematical expression for the phase function. It is physiologically not correct, but this is of no significance.

*β*. Dividing the extremal functions by this factor, one is left with the extremal functions as they would appear if *β* had been set equal to zero in the general uncertainty inequality. They are the extremal functions for the uncertainty inequality

### 2.3 The structure equations

Extremals for the uncertainty principle satisfy differential equations. Since the membrane transfer function is described by an extremal function and its transforms under the symmetry group and since the extremal functions are preserved under this action, it is possible to derive differential equations for the output of the signal. The resulting equations are called the structure equations.

The derivation starts with the simple case $c=1$. In this situation differentiation of the wavelet transform $Wf(a,t)$ with respect to the parameters of the symmetry group directly leads to a differential equation. In the case $c>1$ however, the resulting equation is actually a pseudo-differential equation. A linearization process for the kernel brings it back to a differential equation that is then satisfied approximately.

The quantities in the equation at first are derivatives of the output function $Wf(a,t)$ and its Hilbert transform. A further calculation then shows that the result can be formulated as an inhomogeneous system of linear partial differential equations for the phase and for the logarithm of the amplitude of the output signal. This is particularly satisfying because these are exactly the physiologically relevant quantities.

First the case $c=1$ is treated. It leads to exact results whereas in the case $c>1$ an approximation procedure will be applied.

*a* gives

*h* satisfies the differential equation

*H* is mapped into $\hat{H}$. This gives the basic equation

*x*) brings in the Hilbert transform *H*. This transform is a unitary operator on ${L}^{2}(\mathbf{R},\mathbf{C})$. Its square is the negative of the identity operator: ${H}^{2}=-I$. It extends to a bigger class of functions (to all temperate distributions). On the basic trigonometric functions it operates very simply:

holds for the functions $f=\cos (\omega t)$. The linearity assumption then implies that this holds for arbitrary input signals *f*.
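The two properties of the Hilbert transform quoted above, ${H}^{2}=-I$ and its simple action on trigonometric functions, can be checked on sampled periodic signals. The sketch below is a discrete analogue only. It assumes the common sign convention in which *H* multiplies the spectrum by $-i\,\mathrm{sgn}(\omega )$, so that $H\cos =\sin $ and $H\sin =-\cos $; with the opposite convention the signs are reversed. The helper names `dft` and `hilbert` are hypothetical.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), adequate for short signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[j] * cmath.exp(sign * 2j * math.pi * j * k / n) for j in range(n))
        out.append(s / n if inverse else s)
    return out

def hilbert(x):
    """Discrete Hilbert transform: multiply the spectrum by -i * sgn(frequency)."""
    n = len(x)
    spec = dft(x)
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            spec[k] = 0            # zero frequency and Nyquist bin: sgn = 0
        elif k < n / 2:
            spec[k] *= -1j         # positive frequencies
        else:
            spec[k] *= 1j          # negative frequencies
    return [v.real for v in dft(spec, inverse=True)]

n, m = 64, 5                        # 64 samples covering 5 full periods
t = [2 * math.pi * j / n for j in range(n)]
cos_m = [math.cos(m * tj) for tj in t]
sin_m = [math.sin(m * tj) for tj in t]

h_cos = hilbert(cos_m)
h_h_cos = hilbert(h_cos)
err_sin = max(abs(a - b) for a, b in zip(h_cos, sin_m))    # H cos - sin
err_sq = max(abs(a + b) for a, b in zip(h_h_cos, cos_m))   # H(H cos) + cos
print(err_sin, err_sq)
```

Both errors are at the level of floating-point rounding. The identity ${H}^{2}=-I$ holds here only for signals without a constant component, since the discrete multiplier vanishes at zero frequency.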

*Zf* then satisfies

Notice the shift by $\frac{1}{2}$ that has its origin in the factor $\frac{1}{\sqrt{a}}$.

*Zf* it follows immediately that

*c* is an odd natural number). It is however possible to use a linear approximation for ${\hat{B}}_{c}$ near the frequency expectation value of ${h}_{c}$, that is, at the point $\omega =1$:

The calculation for the analytic wavelet transform $Zf=u+iHu$ then proceeds as above. Only the constants are slightly different. In the sequel the notation $\gamma =c\kappa $ will be used. Recall that in prospective refined adjustments the parameters *α*, *β* and *γ* may vary with *c*.

Equality holds if $c=1$.
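The quality of a linear approximation of ${\hat{B}}_{c}$ near $\omega =1$ can be gauged numerically. The sketch below assumes the first-order Taylor form ${\omega}^{c}\approx 1+c(\omega -1)$ for the real factor of ${\hat{B}}_{c}=-i{|\omega |}^{c}\,\mathrm{sgn}(\omega )$. This is one natural choice of linearization and is not necessarily the exact form used in the derivation; the function names are hypothetical.

```python
def power_multiplier(omega, c):
    """|omega|^c * sgn(omega), the real factor of B_c = -i |omega|^c sgn(omega)."""
    sgn = (omega > 0) - (omega < 0)
    return sgn * abs(omega) ** c

def linearized(omega, c):
    """First-order Taylor expansion of omega^c about omega = 1 (assumed form)."""
    return 1.0 + c * (omega - 1.0)

for c in (1, 2, 3):
    for omega in (0.9, 1.0, 1.1):
        exact = power_multiplier(omega, c)
        approx = linearized(omega, c)
        print(f"c={c} omega={omega:.1f} exact={exact:.4f} linear={approx:.4f}")
```

For $c=1$ the two expressions agree for $\omega >0$, in line with the remark that equality holds in this case. For larger *c* the error grows like $\frac{c(c-1)}{2}{(\omega -1)}^{2}$, so the approximation is useful only close to the frequency expectation value.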

### 2.4 Consequences of the structure equations

Signal processing in the cochlea is non-linear. The main - but certainly not the only - source of non-linearity is the compressive nature inherent in the hearing process. In the abstract model pursued here this is taken care of with a single parameter that represents the level of sound intensity. The model then describes the linear approximations at these levels. The structure equations are at the core of this abstract model, in fact they comprise all the essential features. First of all, they are linear (as would be expected from a linear approximation). From a mathematical point of view, the equations therefore are very simple. On top, the system is quite special. With respect to suitable variables it represents an inhomogeneous $\overline{\partial}$-equation. Its solutions can be realized in complex form as products of two factors, the first of which is entirely determined by the system and the second is a holomorphic function that can be calculated from the signal. At every level *c* it is thus possible to associate to an input signal in a unique way a holomorphic function that describes the output signal in terms of the physiological parameters.

The phase and the logarithm of the amplitude are used in the description of the experiments and they are omnipresent in all the representations of the auditory pathway. In themselves they are of limited significance, because they are not coded as such. What really is essential in any cochlear or in any neural model are the changes of these quantities, both with respect to time and with respect to the place. The structure equations precisely relate the local and temporal derivatives of phase and (logarithm of) amplitude. The geometry of the cochlea implicitly is inherent in the extremality property of the basilar membrane filter. But in the structure equations this only shows in terms of the constants. The implicit appearance of the tonotopic axis is an expression of the basic invariance principle that stands at the outset of all considerations.

this then determines $\frac{\partial}{\partial a}\log r$. Conversely, the complete knowledge of amplitude information determines the phase information. From an abstract point of view, phase information and amplitude information each individually contain the full information of the signal. In the auditory pathway both phase and amplitude information are processed. It is commonly assumed that phase information dominates in the low frequency range and amplitude information in the regions that process high frequencies. The equations tell us that phase processing and amplitude processing are equally significant.

shows that there is also a twofold way of data processing with respect to time and with respect to the place. Complete information on derivatives with respect to the position gives complete information on time derivatives - and *vice versa*.

The general solution of an inhomogeneous linear differential equation can be presented as the linear combination of a particular solution (any chosen solution of the equation) and the general solution of the associated homogeneous differential equation.

*Y* is of the form

*X* satisfying the homogeneous equation

*Zf*. Writing

*X* also holds at the zeros of *X*.) With the variable change

with *G* a holomorphic function in the variable $z=t-a\beta +ia\gamma $. Since $a>0$ (and $\gamma >0$) it is defined in the upper half plane $\{z\in \mathbf{C}:\operatorname{Im}z>0\}$. The function $G(z)$ is uniquely defined up to a constant.
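The statement that *G* is holomorphic in $z=t-a\beta +ia\gamma $ has a simple consequence in the original variables: by the chain rule, $\frac{\partial G}{\partial a}=(-\beta +i\gamma )\frac{\partial G}{\partial t}$. The sketch below checks this relation by finite differences. The choice $G(z)={e}^{i\nu z}$ and the numerical values of *β*, *γ* and *ν* are hypothetical and serve only as an example.

```python
import cmath

BETA, GAMMA = 2.0, 4.0      # hypothetical constants; GAMMA > 0 is what matters

def z_of(a, t):
    """Complex variable z = t - a*beta + i*a*gamma."""
    return complex(t - a * BETA, a * GAMMA)

def G(z, nu=1.5):
    """Sample holomorphic function on the upper half plane (assumed example)."""
    return cmath.exp(1j * nu * z)

def partials(a, t, eps=1e-6):
    """Central finite differences of G(z(a, t)) with respect to a and t."""
    d_a = (G(z_of(a + eps, t)) - G(z_of(a - eps, t))) / (2 * eps)
    d_t = (G(z_of(a, t + eps)) - G(z_of(a, t - eps))) / (2 * eps)
    return d_a, d_t

a, t = 0.7, 0.3
d_a, d_t = partials(a, t)
residual = abs(d_a - complex(-BETA, GAMMA) * d_t)
print(z_of(a, t).imag > 0, residual)
```

The residual is at the level of the finite-difference error. Because $\operatorname{Im}z=a\gamma >0$, the sample function decays like ${e}^{-\nu a\gamma}$, which illustrates why the upper half plane is the natural domain.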

*Zf* approximately satisfy the complex structure equation. The solutions

are then expected to provide approximations for *Zf* (with equality for $c=1$).

The functions *G* are holomorphic and depend on the parameter *γ*. They can in fact be determined directly from the Fourier transform of the incoming signal $f(t)$. Since the system is linear, the superposition principle holds:

*f* is

*f* is

### 2.5 Examples

#### 2.5.1 Pure sounds

The second structure equation is satisfied as an equality.

as the approximate value of $\log Zf$.

*G* associated to the input signal

The constants *k* and *ε* are of little importance and do not show in the structure equations. In the following calculations we set $k{e}^{i\epsilon}=1$.

#### 2.5.2 Amplitude modulation

*ν* is dominant as long as

*A* and $\mu \ll \nu $ it includes the entire range along the cochlea that is involved in the processing of the amplitude modulated signal. The function *F* describing this signal in the relevant range can then be estimated by using the approximation $\log (1+x)\cong x$ for small $|x|$:

The result exhibits the basic frequency *ν* as the carrier frequency. Note, however, that the approximation is valid only in the frequency interval specified above.
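How small $|x|$ has to be for the approximation $\log (1+x)\cong x$ to be useful can be seen from a short numerical check. The sketch is restricted to real $x\ge 0$, for which the error is bounded by ${x}^{2}/2$; the modulation term in the text may be complex, which this example does not cover. The function name is hypothetical.

```python
import math

def log1p_error(x):
    """Absolute error of the approximation log(1 + x) ~ x for real x > -1."""
    return abs(math.log(1.0 + x) - x)

for x in (0.3, 0.1, 0.01):
    print(f"x={x}: error={log1p_error(x):.2e}, bound x^2/2={x * x / 2:.2e}")
```

The error decreases quadratically with *x*, so a modulation term of relative size 0.1 is reproduced to within about half a percent.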

It can clearly be seen that there is the constant contribution from the carrier frequency and - as the interesting part - a slow oscillation of angular frequency *μ* that stems from $\sin (\mu z+\mathit{const})$. Both the amplitude and the phase derivatives show this oscillation.

#### 2.5.3 The sound of a violin

*a* ${e}^{\prime}$ and ${d}^{\prime \prime}$. The program ‘Prisma-Realtime’ by Bachmann *et al.* (2007) [22] uses a windowed Fourier transform for this spectrogram. The amplitudes are determined at short intervals and marked with a point. The intensity of these points fades with time.

*m*. In the dB-scale the decrease is roughly linear with slope −2:

(with the approximation $10\cong {e}^{2.3}$).
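The conversion between a linear decrease in dB and an exponential decay of the amplitudes can be checked with a few lines. The sketch assumes the amplitude convention $\mathrm{dB}=20{\log}_{10}(\text{amplitude})$; under a power convention the factor 20 would be replaced by 10. The function name and its `db_factor` parameter are hypothetical.

```python
import math

def amplitude_ratio(slope_db, m, db_factor=20.0):
    """Amplitude of the m-th harmonic relative to m = 0 for a linear dB decrease."""
    return 10.0 ** (slope_db * m / db_factor)

print(math.exp(2.3))                 # close to 10
print(math.log(10.0))                # close to 2.3
print(amplitude_ratio(-2.0, 10))     # ten harmonics at -2 dB each, i.e. -20 dB
```

With slope −2 and the amplitude convention, the amplitudes decay like ${10}^{-m/10}\cong {e}^{-0.23m}$, which is where the approximation $10\cong {e}^{2.3}$ enters.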

*a*). The amplitudes

*nν*, that is, for $\xi (x)=\frac{1}{a}=n\nu $, the coefficient ${d}_{n}$ is thus dominant. In a neighborhood of this ${n}^{\mathrm{th}}$ harmonic (near $a=\frac{1}{n\nu}$) the function *F* can be described locally. The calculation exhibits apart from the contribution by the carrier frequency a substantial oscillatory part of angular frequency *ν*. In a first calculation only the influence of the two closest harmonics is taken into account. The partial signal

near the ${n}^{\mathrm{th}}$ harmonic. The carrier frequency accounts for the part $in\nu z+P(n\nu )+\log A+i\theta $. In the structure equation it only participates with time independent terms. But the significant part is the contribution that varies in time with angular frequency *ν* (recall that $z=t-a\beta +ia\gamma $).

with a well defined remainder term that is $\frac{2\pi}{\nu}$-periodic in time. The term $in\nu z+P(n\nu )+\log {c}_{n}$ shows the presence of the carrier frequency. The relevant information about the violin sound is however contained in the fact that the $\frac{2\pi}{\nu}$-periodicity extends over an interval along the cochlea that comprises more than three octaves in the tonal range. The nature of this contribution is similar all along the interval covered by the frequency spectrum of the violin sound. The exceptions are the low harmonics (essentially the first and second) at which the influence of the neighboring harmonics is very small. Furthermore, the amplitude spectrum of a violin sound very often fails to display monotonicity for the first few harmonics.

With regard to the violins, it should be mentioned that there are considerable differences between different (good quality) instruments. It is believed that the distribution in the first few harmonics very much contributes to the individuality of the violin.

### 2.6 The impulse response

Therefore $au(\log K+\log a,t)={\check{h}}_{c}(\frac{t}{a})$ is a function of the single variable $s=\frac{t}{a}$. This could also be expressed by saying that $tu(x,t)$ only depends on the single variable *s* - the usual way to formulate the invariance statement.

*c*. The impulse response must have its support on the positive half of the time axis. The membrane cannot show a reaction before the impulse arrives. The numerical calculations show that this is almost satisfied. At this point attention should be drawn to a deficiency of the approach. The basic difficulty lies in the concept of using the uncertainty principle. The appropriate thing would be to restrict the class of functions in the uncertainty principle to functions that in the time domain are supported on the positive half axis. However, in the restricted class there are no extremal functions. This can be seen from the fact that the class of extremal functions is translation invariant in the time domain. With the present setting it is not strictly true that the impulse response has its support on the positive half axis. The extremal functions have to be interpreted as a first approximation. They have to be modified slightly such that they really vanish for negative values of *t*. The numerical calculations show that only small modifications are necessary.

The invariance property of the impulse response mentioned above in combination with the structure equations allows for an explicit approximate calculation of ${\check{h}}_{c}$. In the case $c=1$ this procedure gives the precise value up to a multiplicative constant.

*f* and *g* are then defined starting from $au(k+\log a,t)$ as

*s*.

*K*. The inverse Fourier transform ${\check{h}}_{c}$ of ${h}_{c}$ is then approximated as

This is an exact result if $c=1$.

*z* and *s* are related by $z=t-a\beta +ia\gamma =a(s-\beta +i\gamma )$, it follows that

and this function is indeed holomorphic in the upper half plane.

*f* and *g*. The above system is solved for ${f}^{\prime}$ and ${g}^{\prime}$:

*r* at the position given by the angular frequency $\frac{1}{a}$ can be obtained. The peak is determined by ${g}^{\prime}(s)=0$:

This equation tells us how the peak arising from a click travels along the cochlea.

*f* is calculated:

*r* the second derivative is small and it changes sign between $\beta +\alpha $ and *β* (recall that $\alpha <0$):

*s* as $\phi (a,t)=f(s)$. This gives

for the instantaneous frequency and hence ${f}^{\prime}(s)=\frac{1}{\xi (x)}\frac{\partial \phi}{\partial t}(a,t)$ for the normalized instantaneous frequency. In [23], p. 2025, Shera denotes it by ${\beta}_{in}(\tau )$ and pictures its graph in Figure 2b. The above calculations show that ${f}^{\prime}(s)$ is increasing for values below $\alpha +\beta $. This can be interpreted as a frequency glide. Note that ${f}^{\prime}(s)$ starts to decrease near *β*. Hence Figure 2b in [23] would roughly confirm the present calculations provided that $\alpha +\beta \approx 12\pi $. There is however a difference in that the present calculation exhibits a dependence of ${f}^{\prime}$ on the sound level (as represented by *c*) with maximal values for ${f}^{\prime}$ that are slightly bigger than one.

### 2.7 General invariance groups

Scale invariance as considered in the previous sections is based on the dilation group ${\delta}_{a}$ and on the assumption that the tonotopic axis is given by the exponential law $\xi (x)=K{e}^{-x}$. From the experimental data it should however rather be concluded that the symmetry hypotheses are satisfied only locally and in a first approximation. The question therefore arises whether the results subsist qualitatively when these basic assumptions are modified. To answer this question the setup of an abstract model is being presented.

The basic hypothesis is still that the symmetry in cochlear mechanics is given by a one parameter transformation group ${\stackrel{\u02c6}{\lambda}}_{a}$ acting in phase space. This action can be taken in a quite general form. Ideally it should be possible to adapt it individually to each species. The specific form of the tonotopic axis is to a certain extent independent of the action of the one parameter group. It will be discussed as a separate issue and for the time being the exponential law will be retained.

The transformation group ${\hat{\lambda}}_{a}$ thus stands at the outset of a general framework for an abstract description of cochlear mechanics. The one parameter group will be enlarged to a larger group. At first this will be done on the infinitesimal level by defining a multiplier operator $\hat{M}$ that plays the role of $\hat{B}=-i\omega $ in the previous sections. Together with the infinitesimal generator $\hat{L}$ of the one parameter group ${\hat{\lambda}}_{a}$ and with $\hat{H}$, this operator completely determines the action of the Lie algebra of the abstract symmetry group. As an abstract group, the symmetry group is still $\Gamma \times S$, yet the action in phase space will have changed. It will in fact be conjugate to the standard action. The conjugation mapping typically maps a bounded symmetric range in frequency space onto the whole frequency axis. In the application the bounded range will be the interval $(-R,R)$. The upper bound *R* appears as an absolute frequency bound. In such a model, the inner ear is completely indifferent to signals whose frequency content lies beyond this limit.

First, the wavelet transform for general one parameter groups is discussed. Next, the action of the one parameter group is determined as a function of the parameter *c*, which relates to the overall sound level of the signal. This action will then be extended on the infinitesimal level to all of $\Gamma \times S$. Along with this, the conjugation mapping will be defined.

#### 2.7.1 The wavelet transform for general one parameter groups

*v* extends to a continuous odd function on **R** and that the solutions ${\tau}_{t}(\omega )$ to the differential equation

*ω*. Then ${\tau}_{t}$ is a one parameter transformation group of ${\mathbf{R}}^{+}$. It extends in an antisymmetric way to all of **R**. At the point $x=0$ the vector field vanishes and $x=0$ is a stationary solution of the differential equation. Let us transform the time parameter *t* and set

*v* has a finite number of zeros $\pm {x}_{i}$, labeled in ascending order

*v*). If

*t* gives

The formula expresses that $\frac{|d\omega |}{|v(\omega )|}$ is the invariant measure for the group action.
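The invariance of $\frac{|d\omega |}{|v(\omega )|}$ can be checked numerically on a concrete example. The following is a minimal sketch, not taken from the paper: it assumes that the flow is defined by $\frac{d}{dt}{\tau}_{t}=v({\tau}_{t})$, and it uses a hypothetical vector field $v(\omega )=\omega (1-\omega /R)$ with $R=1$, chosen only because its flow has a closed form. The names `v`, `tau` and `measure` are illustrative.

```python
import math

R = 1.0  # hypothetical upper zero of the vector field (assumption)

def v(w):
    """Hypothetical vector field on (0, R); it vanishes at 0 and at R."""
    return w * (1.0 - w / R)

def tau(t, w):
    """Closed-form flow of dw/dt = v(w) for the field above."""
    e = math.exp(t)
    return R * w * e / (R + w * (e - 1.0))

def measure(a, b, n=20000):
    """Midpoint-rule approximation of the integral of dw / |v(w)| over [a, b]."""
    h = (b - a) / n
    return sum(h / abs(v(a + (k + 0.5) * h)) for k in range(n))

# Group law: flowing for 0.5 and then for 0.7 equals flowing for 1.2.
w0 = 0.3
group_error = abs(tau(0.7, tau(0.5, w0)) - tau(1.2, w0))

# Invariance: an interval and its image under the flow have the same measure.
a, b, t = 0.2, 0.6, 0.9
m_before = measure(a, b)
m_after = measure(tau(t, a), tau(t, b))

print(group_error, m_before, m_after)
```

For this particular field the measure of $[a,b]$ has the antiderivative $\ln \frac{\omega}{R-\omega}$, so both values should agree with $\ln 6$ up to quadrature error.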

*λ* on ${L}^{2}(\mathbf{R},\mathbf{C})$:

(The same notation ${\lambda}_{a}$ is used for both the group and its unitary representation. To be consistent with the previous notation the group should actually be denoted by ${\hat{\lambda}}_{a}$, since it will be taken as a group that acts in phase space.)

is a cocycle for the transformation group $\{{\lambda}_{a}\}$.

*ψ* and transformation group $\{{\lambda}_{a}\}$ is

*ω*:

*ψ* is a real valued wavelet. Then

*f* is contained in the wavelet transform, provided the integrals are finite and the constants ${C}_{i}$ different from zero for all indices. A formal reconstruction can be obtained in terms of the wavelets

(Notice the negative sign in the exponent!)

For the description of cochlear mechanics the reconstruction of the signal from its output at the cochlear level (that is, from its wavelet transform) is not an issue. No reconstruction is taking place in the auditory pathway. However, from the point of view of information processing, it is relevant to know whether the wavelet transform contains the full information of the original signal.

In the application the wavelet transform will be described by a wavelet with frequency support in $[-R,R]$. The above reconstruction process would then give the projection of the signal onto the subspace of band limited signals.
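To illustrate what a projection onto band limited signals does, here is a minimal discrete sketch. It is not the wavelet reconstruction of the paper: it assumes a finite sampled signal and replaces the frequency interval $[-R,R]$ by a cutoff `r` on the signed DFT index. The names `dft`, `idft` and `band_limit` and the test signal are illustrative assumptions.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a sequence x."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
            for j in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[j] * cmath.exp(2j * math.pi * j * k / n) for j in range(n)) / n
            for k in range(n)]

def band_limit(x, r):
    """Project a real signal onto signals with DFT support in |index| <= r."""
    n = len(x)
    X = dft(x)
    for j in range(n):
        freq = j if j <= n // 2 else j - n  # signed frequency index
        if abs(freq) > r:
            X[j] = 0.0
    # The retained band is symmetric, so the result is real up to rounding.
    return [z.real for z in idft(X)]

n = 64
low = [math.cos(2 * math.pi * 3 * k / n) for k in range(n)]    # inside the band
high = [math.cos(2 * math.pi * 20 * k / n) for k in range(n)]  # outside the band
signal = [p + q for p, q in zip(low, high)]

projected = band_limit(signal, r=8)
twice = band_limit(projected, r=8)

print(max(abs(p - q) for p, q in zip(projected, low)))
```

The component outside the band is removed and the component inside is left unchanged. Applying the map twice gives the same result as applying it once, as expected of a projection.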

#### 2.7.2 Extension of the group action

*v*, the infinitesimal generator for the group action is

*M* such that

*M* can be taken in the form

*s* is determined up to a multiplicative factor by

*v* is a smooth vector field with zeros at $\pm {x}_{i}$ then $s(\omega )$ will map any interval ${I}_{i}=({x}_{i-1},{x}_{i})$ onto the positive real half axis ${\mathbf{R}}^{+}$. Furthermore, it will conjugate the action of the transformation group ${\lambda}_{a}$ (restricted to ${I}_{i}$) with the action of the dilation group ${\hat{\delta}}_{a}$:

*s* induces an isometry ${s}^{\ast}$ between ${L}^{2}$ and the subspace ${L}_{I}^{2}$ of ${L}^{2}$-functions restricted to $-{I}_{i}\cup {I}_{i}$:

In the application, the interval on which ${\lambda}_{a}$ acts will be $I=(0,R)$ and the vector field will take negative values. The function $\operatorname{sgn}(\omega )s(\omega )$ then maps $(-R,R)$ onto **R** and conjugates ${\lambda}_{a}$ to ${\hat{\delta}}_{a}$.
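The conjugation can be made concrete on an example. The sketch below is an illustration under stated assumptions, not the construction of the paper: it takes the hypothetical field $v(\omega )=-\omega (1-\omega /R)$ with $R=1$ (negative on $(0,R)$), assumes that ${\lambda}_{a}^{-1}$ is the flow of $-v$ for time $\ln a$, and uses $s(\omega )=\omega /(R-\omega )$, which for this field satisfies $-v{s}^{\prime}=s$. It then checks the conjugation property in the form $s({\lambda}_{a}^{-1}(\omega ))=as(\omega )$.

```python
import math

R = 1.0  # hypothetical absolute frequency bound (assumption)

def v(w):
    """Hypothetical vector field: negative on (0, R), zero at 0 and at R."""
    return -w * (1.0 - w / R)

def lam_inv(a, w):
    """Assumed inverse group action on (0, R): flow of -v for time ln(a)."""
    return R * w * a / (R + w * (a - 1.0))

def s(w):
    """Conjugation map on (0, R); for the field above, -v(w) * s'(w) = s(w)."""
    return w / (R - w)

def s_odd(w):
    """Odd extension sgn(w) * s(|w|), defined on (-R, R)."""
    return math.copysign(s(abs(w)), w)

# Conjugation to dilations: s(lam_inv(a, w)) equals a * s(w).
a, w = 2.5, 0.3
conjugation_error = abs(s(lam_inv(a, w)) - a * s(w))

# Central-difference check of the relation -v * s' = s at one point.
h = 1e-6
ds = (s(w + h) - s(w - h)) / (2.0 * h)
ode_error = abs(-v(w) * ds - s(w))

print(conjugation_error, ode_error, s_odd(-0.5), s_odd(0.999))
```

Since `s_odd` is odd, increasing, and unbounded as $\omega $ approaches $\pm R$, it maps $(-R,R)$ onto the whole real line in this example.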

The operators *L* and *M* satisfy the commutator relation of the Lie algebra of the affine group Γ. The action of the group ${\lambda}_{a}$ is thus extended on the infinitesimal level to a Lie algebra action of the affine group.

*L*, whereas

#### 2.7.3 The uncertainty inequality

*Lh* and ${M}_{c}h$ in ${L}^{2}$. Under ${\lambda}_{a}$ they transform as

*h* by

*s* conjugates ${\lambda}_{a}$ to ${\hat{\delta}}_{a}$, it follows that

These solutions can either be calculated directly from the differential equation for the extremal solutions or they can be determined from ${h}_{c}$ by applying the conjugation mapping ${s}^{\ast}$.

*s* conjugates ${\hat{\delta}}_{a}$ to ${\lambda}_{a}$, one has $s({\lambda}_{a}^{-1}(\omega ))=as(\omega )$ and therefore

Apart from the factor ${|v(\omega )|}^{-\frac{1}{2}}$ this is a function of the single variable $as(\omega )$.

#### 2.7.4 Structure equations in the general setting

*v* is a vector field with zeros at $\pm {x}_{i}$ that generates a one parameter group ${\lambda}_{a}$ of transformations on $-I\cup I=(-{x}_{i},-{x}_{i-1})\cup ({x}_{i-1},{x}_{i})$. Signals $f\in {L}_{I}^{2}$ with frequency content in $-I\cup I$ can then be analyzed with the group wavelet transform

*L*, ${M}_{c}$ and $\hat{H}$. If

*h* is an extremal function with coefficients *α*, *β* and *ν*, then $\overline{h}$ is also an extremal function. Its coefficients are −*α*, −*β* and *ν*. It is thus possible to take $\overline{\hat{\psi}}={h}_{c}$: