
Signal processing in the cochlea: the structure equations

Abstract

Background

Physical and physiological invariance laws, in particular time invariance and local symmetry, are at the outset of an abstract model. Harmonic analysis and Lie theory are the mathematical prerequisites for its deduction.

Results

The main result is a linear system of partial differential equations (referred to as the structure equations) that describe the result of signal processing in the cochlea. It is formulated for phase and for the logarithm of the amplitude. The changes of these quantities are the essential physiological observables in the description of signal processing in the auditory pathway.

Conclusions

The structure equations display in a quantitative way the subtle balance for processing information on the basis of phase versus amplitude. From a mathematical point of view, the linear system of equations is classified as an inhomogeneous $\bar\partial$-equation. In suitable variables the solutions can be represented as the superposition of a particular solution (determined by the system) and a holomorphic function (determined by the incoming signal). In this way, a global picture of signal processing in the cochlea emerges.

1 Background

At the outset of this work is the quest to understand signal processing in the cochlea.

1.1 Linearity and scaling

It has been known since 1992 that cochlear signal processing can be described by a wavelet transform (Daubechies 1992 [1], Yang, Wang and Shamma, 1992 [2]). There are two basic principles that lie at the core of this description: Linearity and scaling.

In the cochlea, an incoming acoustical signal f(t) in the form of a pressure fluctuation (t is the time variable) induces a movement u(x,t) of the basilar membrane at position x along the cochlea. At a fixed level of sound intensity, the relation between incoming signal and movement of the basilar membrane is surprisingly linear. As a whole, however, this process is highly compressive with respect to sound level and thus cannot be linear.

In the present setting this is taken care of by a ‘quasilinear model’. This is a model that depends on parameters, for example, in the present situation the level of sound intensity. For fixed parameters the model is linear. It is interpreted as a linear approximation to the process at these fixed parameter values. Wavelets give rise to linear transformations. The description of signal processing in the cochlea by wavelet transformations, where the wavelets depend on parameters, is compatible with this approach.

Scaling has its origin in the approximate local scaling symmetry (Zweig 1976 [3], Siebert 1968 [4]) that was revealed in the first experiments (Békésy 1947 [5], Rhode 1971 [6]).

The scaling law can best be formulated with the basilar membrane transfer function $\hat g(x,\omega)$. This is the transfer function that is defined from the response of the linear system to pure sounds. To an input signal

$$\cos(\omega t)=\operatorname{Re}\,e^{i\omega t},\qquad \omega>0, \tag{1}$$

that is, to a pure sound of circular frequency ω there corresponds an output u(x,t) at the position x along the cochlea that on the basis of linearity has to be of the form

$$u(x,t)=\operatorname{Re}\left\{\hat g(x,\omega)\,e^{i\omega t}\right\}. \tag{2}$$

The basilar membrane transfer function is thus a complex valued function of x and ω>0. Its modulus $|\hat g(x,\omega)|$ is a measure of amplification and its argument is the phase shift between input and output signals. The experiments of von Békésy [5] showed that the graphs of $|\hat g(x,\omega)|$ and $|\hat g(x,c\omega)|$ as functions of the variable x are translated against each other by a constant multiple of $\log c$. By choosing an appropriate scale on the x-axis, the multiple can be taken to be 1. The scaling law is then expressed as

$$|\hat g(x-\log c,\,c\omega)|=|\hat g(x,\omega)|. \tag{3}$$

The scaling law will be extended, with some modifications, to include the argument of $\hat g$.
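The scaling law (3) can be illustrated numerically. The following is a minimal sketch, assuming a toy modulus that depends on x and ω only through the combination ω e^x / K; the constant `K` and the shape of `modulus` are hypothetical choices made for illustration and are not fitted to any data.

```python
import math

K = 1.0  # hypothetical normalization constant (assumption)

def modulus(x, omega):
    """Toy |g_hat(x, omega)| depending only on s = omega * exp(x) / K."""
    s = omega * math.exp(x) / K
    return s ** 3.5 * math.exp(-4.0 * s)  # illustrative shape only

def scaling_defect(x, omega, c):
    """Absolute difference between the two sides of the scaling law (3)."""
    return abs(modulus(x - math.log(c), c * omega) - modulus(x, omega))

# Evaluate the defect on a small grid of positions, frequencies and factors c.
max_defect = max(
    scaling_defect(x, w, c)
    for x in (-1.0, 0.0, 0.7)
    for w in (0.5, 1.0, 3.0)
    for c in (0.5, 2.0, 10.0)
)
```

Any function of the single variable ω e^x / K satisfies (3) up to rounding, which is why `max_defect` stays at the level of floating-point noise.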

Intimately connected to scaling is the concept of a tonotopic order. It is a central feature in the structure of the auditory pathway. Frequencies of the acoustic signal are associated to places, at first in the cochlea and in the following stages in the various neuronal nuclei. The assignment is monotone: it preserves the order of the frequencies. In the cochlea, to each position x along the cochlear duct a circular frequency σ=ξ(x) is assigned. The function ξ is the position-frequency map. Its inverse is called the tonotopic axis. In the setting of von Békésy’s results, the frequency associated to a position x along the cochlea is simply the best frequency (BF), that is, the frequency σ at which $|\hat g(x,\omega)|$, as a function of ω, attains its maximum. The refined concept takes into account that the transfer function, and with it the BF, changes with the level of sound intensity at which $\hat g$ is determined. The characteristic frequency (CF) is then the low level limit of the best frequency. The position-frequency map ξ assigns to the position x its CF.

Scaling according to von Békésy’s results implies the exponential law

$$\xi(x)=K e^{-x} \tag{4}$$

for the position-frequency map. The constant K is determined by inserting a special value for x. The scaling law tells us that the function $|\hat g(x,\omega)|$ is actually a function of the ‘scaling variable’

$$\frac{1}{K}\,\omega e^{x}=\frac{\omega}{\xi(x)}. \tag{5}$$

At the outset of the present investigation it will be assumed that the transfer function $\hat g$ is a function of the scaling variable $\frac{\omega}{\xi(x)}$. This is not strictly true, but it simplifies the exposition. In subsequent sections a general theory will be developed that incorporates quite general scaling behavior. With the availability of advanced experimental data (Rhode 1971 [6], Kiang and Moxon 1974 [7], Liberman 1978 [8], 1982 [9], Eldredge et al. 1981 [10], Greenwood 1990 [11]), the position-frequency map is now known precisely for many species. Shera 2007 [12] gives the formula

$$\mathrm{CF}(x)=[\mathrm{CF}(0)+\mathrm{CF}_1]\,e^{-x/l}-\mathrm{CF}_1. \tag{6}$$

The constant l and the ‘transition frequency’ $\mathrm{CF}_1$ vary from species to species. The scaling variable that goes with it is

$$\nu(x,f)=\frac{f+\mathrm{CF}_1}{\mathrm{CF}(x)+\mathrm{CF}_1}. \tag{7}$$

In the present setting, x is the normalized variable (x instead of x/l) and the precise position-frequency map is expressed in the form

$$\xi(x)=K e^{-x}-S. \tag{8}$$

ξ denotes circular frequency and K=ξ(0)+S. The constant S is referred to as the shift.
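The shifted exponential map (8) and its inverse are easy to evaluate. The sketch below assumes the normalized position variable x of the text; the numerical values of `xi0` and `S` are hypothetical and are not taken from any species data.

```python
import math

def xi(x, K, S):
    """Position-frequency map (8): xi(x) = K * exp(-x) - S, with x normalized."""
    return K * math.exp(-x) - S

def xi_inverse(freq, K, S):
    """Normalized position x at which xi(x) equals freq (requires freq > -S)."""
    return -math.log((freq + S) / K)

# Hypothetical illustrative values (assumptions, not measured data).
xi0, S = 20000.0, 150.0
K = xi0 + S  # K = xi(0) + S, as stated in the text

x = 1.3
roundtrip = xi_inverse(xi(x, K, S), K, S)  # should reproduce x
```

The relation K=ξ(0)+S guarantees that `xi(0.0, K, S)` returns the prescribed value at the origin.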

In the abstract model as it will be developed, much will depend on the definition of the function σ that specifies the frequency location. In the present treatment the frequency localization of a function will be defined as an expectation value in the frequency domain.

1.2 Wavelets

The response to a general signal f(t) with Fourier representation

$$\hat f(\omega)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(t)\,e^{-i\omega t}\,dt \tag{9}$$

is given as

$$u(x,t)=\sqrt{\frac{2}{\pi}}\,\operatorname{Re}\int_{0}^{\infty}\hat f(\omega)\,\hat g(x,\omega)\,e^{i\omega t}\,d\omega. \tag{10}$$

Note that the Fourier transform of the real valued signal f satisfies $\hat f(\omega)=\overline{\hat f(-\omega)}$. If the definition of $\hat g$ is extended to negative values of ω by $\hat g(x,-\omega)=\overline{\hat g(x,\omega)}$ then u(x,t) can be written as

$$u(x,t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\hat f(\omega)\,\hat g(x,\omega)\,e^{i\omega t}\,d\omega. \tag{11}$$

The transfer function will be described by a function h in the scaling variable:

$$\hat g(x,\omega)=h\!\left(\frac{\omega}{\xi(x)}\right).$$

The response of the cochlea to a general signal f can then be expressed as

$$u(x,t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\hat f(\omega)\,h\!\left(\frac{\omega}{\xi(x)}\right)e^{i\omega t}\,d\omega.$$

Setting $a=\frac{1}{\xi(x)}=\frac{1}{K}e^{x}$ and thus $x=k+\log a$ with $k=\log K$, the scaling function is simply h(aω). This leads to the equivalent formulation

$$u(k+\log a,t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\hat f(\omega)\,h(a\omega)\,e^{i\omega t}\,d\omega. \tag{12}$$

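This is recognized as a wavelet transform. Indeed, with the standard $L^2$-normalization a wavelet transform Wf with wavelet ψ is defined by

$$Wf(a,t)=\int_{-\infty}^{\infty}f(s)\,\frac{1}{\sqrt a}\,\overline{\psi\!\left(\frac{s-t}{a}\right)}\,ds=\int_{-\infty}^{\infty}\hat f(\omega)\,\sqrt a\,\overline{\hat\psi(a\omega)}\,e^{i\omega t}\,d\omega.$$

If $\frac{1}{\sqrt{2\pi}}h(\omega)$ is identified with $\overline{\hat\psi(\omega)}$ then

$$u(x,t)=u(k+\log a,t)=\frac{1}{\sqrt a}\,Wf(a,t). \tag{13}$$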

The fact that the cochlea, in a first approximation, performs a wavelet transform appears in the literature in 1992, both in [1] and in [2].

1.3 Uncertainty principle

The natural symmetry group for signal processing in the cochlea is built on the affine group Γ. It derives from the scaling symmetry in combination with time-invariance. In addition, there is the circle group S that is related to phase shifts. Its action commutes with the action of the affine group. The full symmetry group for hearing is thus Γ×S. For this group, the uncertainty principle can be formulated. The functions for which equality holds in the uncertainty inequalities are called the extremal functions. They play a special role, similar to that of the coherent states in quantum physics (the extremals for the Heisenberg uncertainty principle). The starting point in the present work is the tenet that these functions provide an approximation for the cochlear transfer function.

That the extremal functions should play a special role is not a new idea. In signal processing the extremal functions first appeared in Gabor’s work (1946) [13] in connection with the Heisenberg uncertainty principle and then in Cohen’s paper (1993) [14] in the context of the affine group. In a paper by Irino 1995 [15] the idea is taken up in connection with signal processing in the cochlea. It is further developed by Irino and Patterson [16] in 1997. The presentation in this paper is based on previous work (Reimann, 2009 [17]). The concept pursued is to determine the extremals in the space of real valued signals and to use a setup in the frequency domain, not in the time domain. Different representations of the affine group give different families $E_c$ of extremal functions. The parameter c is used to adjust to the sound level and hence to provide linear approximations at different levels to the non-linear behavior of cochlear signal processing.

2 Results and discussion

2.1 Uncertainty principle

This section starts with the specification of the symmetry group Γ×S that underlies the hearing process. The basic uncertainty inequalities for this group are then explicitly derived. The analysis builds on previous results (Reimann [17]). A modification is necessary because the treatment of the phase in [17] was not satisfactory. An improvement can be achieved with the inclusion of the term $\alpha\hat H$ in the uncertainty inequality. This term comes in naturally and it will influence the argument, but not the modulus, of the extremal functions associated to the uncertainty inequalities. It is claimed that the extremal functions derived in this section are a first approximation to the basilar membrane transfer function $\hat g$. The extremal functions for the basic uncertainty principle are interpreted as the transfer function at high levels of sound. This situation corresponds to the parameter value c=1. With increasing parameter values the extremal functions for the general uncertainty inequality are then taken as approximations to the cochlear response at decreasing levels of sound.

2.1.1 The symmetry group

The affine group Γ is the group of affine transformations of the real line R. It is generated by the translation group $\tau_b(t)=t+b$ (b∈R) and the dilation group $\delta_a(t)=at$ (a∈R, a≠0). Under the Fourier transform, the action of the dilation group on $L^2(\mathbb R,\mathbb C)$ is intertwined with the action of the inverse dilation group $\hat\delta$. This group also acts directly in frequency space:

$$\hat\delta_a(\omega)=\frac{\omega}{a}. \tag{14}$$

The induced unitary action on $L^2(\mathbb R,\mathbb C)$ is

$$\hat\delta_a h(\omega)=\sqrt a\,h\!\left(\hat\delta_a^{-1}(\omega)\right)=\sqrt a\,h(a\omega). \tag{15}$$

(With this convention, the group action and the induced action are denoted with the same symbol.) Clearly, the invariance property of the basilar membrane transfer function directly reflects this group action.

The action

$$(\tau_b f)(t)=f(t-b)$$

of the translation group intertwines under the Fourier transform to the unitary action

$$\hat\tau_b h(\omega)=e^{-ib\omega}\,h(\omega). \tag{16}$$

Of relevance to our considerations is the space $L^2(\mathbb R,\mathbb R)$ of real valued signals of finite energy. Under the Fourier transform it is mapped onto

$$L^2_{\mathrm{sym}}(\mathbb R,\mathbb C)=\left\{h\in L^2(\mathbb R,\mathbb C):\ h(\omega)=\overline{h(-\omega)}\right\}. \tag{17}$$

Both $\hat\delta$ and $\hat\tau_b$ act on $L^2_{\mathrm{sym}}$. The only action that commutes with both of them is the action

$$\hat\varepsilon_\varphi h(\omega)=e^{-i\varphi\operatorname{sgn}(\omega)}\,h(\omega) \tag{18}$$

of the circle group S. This is the third distinguished group action.

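The infinitesimal operators associated with the unitary actions of $\hat\delta$, $\hat\tau_b$ and $\hat\varepsilon$ are the skew hermitian operators

$$\hat A\hat f(\omega)=\frac{d}{ds}\Big|_{s=0}\hat\delta(e^{s})\hat f(\omega)=\frac{d}{ds}\Big|_{s=0}e^{s/2}\hat f(e^{s}\omega) \tag{19}$$

$$\phantom{\hat A\hat f(\omega)}=\frac12\hat f(\omega)+\omega\frac{d\hat f}{d\omega}(\omega), \tag{20}$$

$$\hat B\hat f(\omega)=\frac{d}{db}\Big|_{b=0}e^{-ib\omega}\hat f(\omega)=-i\omega\hat f(\omega), \tag{21}$$

$$\hat H\hat f(\omega)=\frac{d}{d\varphi}\Big|_{\varphi=0}e^{-i\varphi\operatorname{sgn}(\omega)}\hat f(\omega)=-i\operatorname{sgn}(\omega)\hat f(\omega). \tag{22}$$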

The commutator relations are

$$[\hat A,\hat B]=\hat B, \tag{23}$$

$$[\hat A,\hat H]=[\hat B,\hat H]=0. \tag{24}$$
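The commutator relations (23) and (24) can be spot-checked numerically. The following sketch applies the operators of (20) to (22) to a test function, with the derivative replaced by a central difference; the test function `f`, the step `EPS` and the sample points are arbitrary illustrative choices.

```python
import cmath

EPS = 1e-5  # finite-difference step (assumption; trades truncation vs rounding)

def A(f):
    """(A f)(w) = f(w)/2 + w f'(w), derivative by central difference."""
    return lambda w: 0.5 * f(w) + w * (f(w + EPS) - f(w - EPS)) / (2 * EPS)

def B(f):
    """(B f)(w) = -i w f(w)."""
    return lambda w: -1j * w * f(w)

def H(f):
    """(H f)(w) = -i sgn(w) f(w)."""
    sgn = lambda w: (w > 0) - (w < 0)
    return lambda w: -1j * sgn(w) * f(w)

def commutator(X, Y):
    """Return the operator f -> X(Y f) - Y(X f)."""
    return lambda f: (lambda w: X(Y(f))(w) - Y(X(f))(w))

f = lambda w: (1 + 0.5j * w) * cmath.exp(-w * w)  # arbitrary smooth test function
points = (0.3, 1.0, 2.2)                          # sample points away from w = 0

err_AB = max(abs(commutator(A, B)(f)(w) - B(f)(w)) for w in points)
err_AH = max(abs(commutator(A, H)(f)(w)) for w in points)
err_BH = max(abs(commutator(B, H)(f)(w)) for w in points)
```

The sample points are kept away from ω=0 because sgn(ω) is discontinuous there and the difference quotient would straddle the jump.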

The operators $\hat A$, $\hat B$ and $\hat H$ span the Lie algebra of the ‘hearing group’ Γ×S, with Γ the affine group and S the circle group. The basic variables in cochlear signal processing are time t and position x along the cochlea. Clearly $\hat B$ is related to time, whereas $\hat A$, as will be shown presently, is related to the position. In our approach the tonotopic axis is given by the exponential law

$$\xi(x)=K e^{-x}.$$

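Under the tonotopic axis, dilations $\hat\delta_a$ are conjugated to translations (by $\log a$) in x:

$$\tau_{\log a}=\xi^{-1}\circ\hat\delta_a\circ\xi.$$

Here, $\xi^{-1}(\omega)=-\log\frac{\omega}{K}$ is the inverse function with respect to the composition law. The intertwining action

$$\xi h(\omega)=\frac{1}{\sqrt\omega}\,h\!\left(\xi^{-1}\omega\right),\qquad \omega>0,$$

is an isometry in the sense that for all $h\in L^2(\mathbb R,\mathbb C)$

$$\int_{0}^{\infty}|\xi h(\omega)|^2\,d\omega=\int_{0}^{\infty}\left|h\!\left(\xi^{-1}\omega\right)\right|^2\frac{d\omega}{\omega}=\int_{-\infty}^{\infty}|h(x)|^2\,dx.$$

It intertwines $\hat A$ with $-\frac{d}{dx}$:

$$\hat A\,\xi=-\xi\,\frac{d}{dx} \tag{25}$$

as the following calculation shows:

$$\begin{aligned}
\hat A\,\xi h(\omega)&=\frac12\,\xi h(\omega)+\omega\frac{d}{d\omega}\left(\frac{1}{\sqrt\omega}\,h\!\left(\xi^{-1}\omega\right)\right)\\
&=\frac12\,\xi h(\omega)-\frac12\,\xi h(\omega)+\frac{\omega}{\sqrt\omega}\,\frac{dh}{dx}\!\left(\xi^{-1}\omega\right)\frac{d\xi^{-1}}{d\omega}\\
&=-\frac{1}{\sqrt\omega}\,\frac{dh}{dx}\!\left(\xi^{-1}\omega\right)=-\xi\!\left(\frac{dh}{dx}\right)(\omega).
\end{aligned}$$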

The uncertainty principle that goes with the group Γ×S can thus be seen as an uncertainty for the determination of time and position.
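The conjugation of dilations into translations under the exponential law ξ(x)=K e^{-x}, used earlier in this subsection, can be checked directly. In this sketch the value of `K` is a hypothetical constant and `dilate_hat` implements the frequency-space action (14).

```python
import math

K = 2.5  # hypothetical constant in xi(x) = K * exp(-x) (assumption)

def xi(x):
    """Exponential position-frequency map."""
    return K * math.exp(-x)

def xi_inv(omega):
    """Inverse map: xi_inv(omega) = -log(omega / K)."""
    return -math.log(omega / K)

def dilate_hat(a, omega):
    """Frequency-space dilation (14): omega -> omega / a."""
    return omega / a

def conjugated(a, x):
    """xi^{-1} o dilate_hat_a o xi, applied to the position x."""
    return xi_inv(dilate_hat(a, xi(x)))

a, x = 3.0, 0.4
shift = conjugated(a, x) - x  # expected to equal log(a)
```

The shift does not depend on x or on K, which is the content of the conjugation formula.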

2.1.2 The basic uncertainty inequality

The commutator relation

$$[\hat A,\hat B]=\hat B \tag{26}$$

is at the basis of the uncertainty principle for the affine group. From the inequality

$$\begin{aligned}
0\le\|\hat A h+\kappa\hat H\hat B h\|^2&=\|\hat A h\|^2+\kappa(\hat A h,\hat H\hat B h)+\kappa(\hat H\hat B h,\hat A h)+\kappa^2\|\hat H\hat B h\|^2\\
&=\|\hat A h\|^2+\kappa\left(h,[\hat H\hat B,\hat A]h\right)+\kappa^2\|\hat H\hat B h\|^2,
\end{aligned}$$

that has to hold for all κ∈R, it follows that

$$|(h,\hat H\hat B h)|\le 2\,\|\hat A h\|\,\|\hat H\hat B h\|.$$

In this calculation, the operators $\hat A$ and $\hat B$ can be replaced by $\hat A-\alpha\hat H-\beta\hat B$ and $\hat B-\nu\hat H$ respectively. This leads to the new inequality

$$|(h,\hat H\hat B h)|\le 2\,\|(\hat A-\alpha\hat H-\beta\hat B)h\|\,\|(\hat H\hat B+\nu)h\|. \tag{27}$$

This inequality is of the same nature as the previous inequality. It can be considered as a more precise inequality, because it holds for all parameter values of α, β and ν. The expression $\|(\hat H\hat B+\nu)h\|$ is minimal for

$$\nu=\nu(h)=-\frac{(h,\hat H\hat B h)}{\|h\|^2}=\frac{1}{\|h\|^2}\int_{-\infty}^{\infty}|\omega|\,|h(\omega)|^2\,d\omega. \tag{28}$$

This ν is the decisive parameter. It has the interpretation of an expectation value for the frequency. Later it will be associated with the place along the cochlea.

The uncertainty inequality can thus be stated as

$$\nu\,\|h\|^2\le 2\,\|(\hat A-\alpha\hat H-\beta\hat B)h\|\,\|(\hat H\hat B+\nu)h\|. \tag{29}$$

The minimality condition for the parameters α and β in the expression

$$\|(\hat A-\alpha\hat H)h-\beta\hat B h\|$$

is given by the linear system

$$\alpha\,\|h\|^2-\beta\,(h,\hat H\hat B h)+(h,\hat H\hat A h)=0, \tag{30}$$

$$-\alpha\,(h,\hat H\hat B h)+\beta\,\|\hat B h\|^2-\operatorname{Re}(\hat A h,\hat B h)=0. \tag{31}$$

The coefficients are

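$$\begin{aligned}
(h,\hat H\hat B h)&=-\nu\,\|h\|^2,\\
(h,\hat H\hat A h)&=-(\hat A h,\hat H h)=-\int_{-\infty}^{\infty}\left(\frac h2+\omega\frac{dh}{d\omega}\right)i\operatorname{sgn}(\omega)\,\bar h\,d\omega\\
&=-\operatorname{Re}\int_{-\infty}^{\infty}i\,|\omega|\,h'\,\bar h\,d\omega=\frac12\int_{-\infty}^{\infty}i\,|\omega|\left(h\,\overline{h'}-\bar h\,h'\right)d\omega\\
&=\int_{-\infty}^{\infty}|h|^2\,|\omega|\,\frac{d}{d\omega}\arg h\,d\omega.
\end{aligned}$$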

In this calculation, the fact that

$$\frac{d}{d\omega}\arg h=\frac{1}{2i}\frac{d}{d\omega}\left(\log h-\log\bar h\right)=\frac{1}{2i}\left(\frac{h'}{h}-\frac{\overline{h'}}{\bar h}\right)$$

has been used.

The remaining coefficient is

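$$\operatorname{Re}(\hat A h,\hat B h)=\operatorname{Re}\int_{-\infty}^{\infty}\left(\frac h2+\omega h'\right)i\omega\,\bar h\,d\omega=\int_{-\infty}^{\infty}|h|^2\,|\omega|^2\,\frac i2\left(\frac{h'}{h}-\frac{\overline{h'}}{\bar h}\right)d\omega=-\int_{-\infty}^{\infty}|h|^2\,|\omega|^2\,\frac{d}{d\omega}\arg h\,d\omega.$$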

With $h=\hat f$ a different meaning can be given to it:

$$\operatorname{Re}(\hat A\hat f,\hat B\hat f)=\operatorname{Re}\int_{-\infty}^{\infty}\left(-\frac{f(t)}{2}-t\frac{df}{dt}(t)\right)\left(-\frac{d\bar f}{dt}(t)\right)dt=\int_{-\infty}^{\infty}t\left|\frac{df}{dt}(t)\right|^2dt.$$

The integrals $\int_{-\infty}^{\infty}|h|^2|\omega|\frac{d}{d\omega}\arg h\,d\omega$ and $\int_{-\infty}^{\infty}t\left|\frac{df}{dt}(t)\right|^2dt$ can be interpreted as expectation values of $|h|^2\frac{d}{d\omega}\arg h$ in the frequency space and for $\left|\frac{df}{dt}(t)\right|^2$ in the time space. Roughly, in combination with $\hat H$ the operator $\hat A$ controls $\frac{d}{d\omega}\arg h$ and $\hat B$ the time derivative.

We will assume that the parameters α, β and ν are always chosen such that the right hand side in the uncertainty inequality is minimal, that is, the inequality is formulated in its sharpest form.

The mean deviation from the expectation value ν for the modulus of the frequency is

$$\tau^2=\frac{\|(\hat H\hat B+\nu I)\hat f\|^2}{\|\hat f\|^2}=\frac{1}{\|f\|^2}\int_{-\infty}^{\infty}\left(|\omega|-\nu\right)^2|\hat f(\omega)|^2\,d\omega. \tag{32}$$

The factor

$$\|(\hat A-\alpha\hat H-\beta\hat B)h\|$$

does not have such a simple interpretation except in the special case α=0. This is treated in [17].

A function h is called extremal if equality holds for it in the uncertainty relation. The extremal functions are expected to play a special role in the signal processing of the cochlea. In the context of the classical Heisenberg uncertainty relation, the extremal functions are translates of the Gaussian function $e^{-x^2}$ under the action of the Heisenberg group. They are called ‘coherent states’. Their significance in signal processing is well established since the appearance of Gabor’s work in 1946 [13]. The starting point of the present discussion, however, is the fact that the cochlea performs a wavelet transform and not a Fourier transform. The invariance group is Γ×S and not the Heisenberg group. It should therefore be expected that the extremal functions as discussed below play the crucial role in the hearing process.

The extremal functions h (in frequency space) satisfy the equation

$$(\hat A-\alpha\hat H-\beta\hat B)h-\kappa(\hat H\hat B+\nu)h=0. \tag{33}$$

This is in fact a differential equation:

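$$\left(\frac12+\omega\frac{d}{d\omega}\right)h=\left(-i\alpha\operatorname{sgn}(\omega)-i\beta\omega-\kappa|\omega|+\kappa\nu\right)h. \tag{34}$$

The solutions are

$$h(\omega)=k\,e^{\,i\varepsilon\operatorname{sgn}(\omega)-i\alpha\operatorname{sgn}(\omega)\log|\omega|-i\beta\omega}\,e^{-\kappa|\omega|}\,|\omega|^{\kappa\nu-\frac12}, \tag{35}$$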

with real constants k, ε, α, β, κ and ν. Square integrability implies κ>0 and ν is the positive frequency expectation value.
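That the functions (35) solve the differential equation (34) can be confirmed numerically. The sketch below is an illustrative check only: the default parameter values are the ones quoted for Figure 1 together with the assumptions k=1, ε=0 and ν=1, and the derivative is approximated by a central difference with step `d`.

```python
import cmath
import math

def extremal(w, k=1.0, eps=0.0, alpha=-math.pi, beta=2 * math.pi,
             kappa=4.0, nu=1.0):
    """Extremal function (35), evaluated at a nonzero frequency w."""
    s = 1.0 if w > 0 else -1.0
    phase = eps * s - alpha * s * math.log(abs(w)) - beta * w
    envelope = math.exp(-kappa * abs(w)) * abs(w) ** (kappa * nu - 0.5)
    return k * cmath.exp(1j * phase) * envelope

def ode_residual(w, alpha=-math.pi, beta=2 * math.pi, kappa=4.0, nu=1.0,
                 d=1e-6):
    """Relative residual of equation (34) at w, using a central difference."""
    h = lambda v: extremal(v, alpha=alpha, beta=beta, kappa=kappa, nu=nu)
    s = 1.0 if w > 0 else -1.0
    lhs = 0.5 * h(w) + w * (h(w + d) - h(w - d)) / (2 * d)
    rhs = (-1j * alpha * s - 1j * beta * w - kappa * abs(w) + kappa * nu) * h(w)
    return abs(lhs - rhs) / abs(h(w))

res_pos = ode_residual(0.8)    # positive-frequency sample point
res_neg = ode_residual(-1.3)   # negative-frequency sample point
```

Both residuals are limited only by the finite-difference error, for positive as well as negative frequencies.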

From the explicit form it is clear that the space of solutions is invariant under the action of Γ×S. The tenet is now:

2.2 The basilar membrane transfer function is given by extremal functions

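To be more precise, there exists an extremal function h, normalized by the condition ν(h)=1, such that $h\!\left(\frac{\omega}{\xi}\right)$ adequately describes the basilar membrane transfer function $\hat g$:

$$\hat g(x,\omega)=h\!\left(\frac{\omega}{\xi}\right)=k\,e^{\,i\varepsilon\operatorname{sgn}(\omega)-i\alpha\operatorname{sgn}(\omega)\log\left|\frac{\omega}{\xi}\right|-i\beta\frac{\omega}{\xi}}\,e^{-\kappa\left|\frac{\omega}{\xi}\right|}\left|\frac{\omega}{\xi}\right|^{\kappa-\frac12}. \tag{36}$$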

In this formula, ξ=ξ(x) is the position-frequency map. Note further that

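$$h\!\left(\frac{\omega}{\xi}\right)=h\circ\hat\delta_\xi(\omega)=\sqrt{\xi}\;\hat\delta_\xi^{-1}h(\omega)$$

such that

$$\nu(h\circ\hat\delta_\xi)=\nu\!\left(\hat\delta_\xi^{-1}h\right)=\hat\delta_\xi^{-1}\!\left(\nu(h)\right)=\xi. \tag{37}$$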

The frequency expectation of $\hat g(x,\omega)$ at x is thus ξ(x). The question then arises whether the experiments confirm the tenet. To arrive at a preliminary conclusion, graphs of the modulus and of the real part of the function h are displayed in Figure 1. The parameters are α=−π, β=2π and κ=4.

Fig. 1

The extremal function on a relative scale. The real part and the modulus of the extremal function h to the basic uncertainty principle (c=1). In the drawings, the parameters are distance d in mm from the stapes ($x=\frac{d}{l}=\frac{d}{6.6}$) and frequency $f=\frac{\omega}{2\pi}$ in Hz. The extremal function is shown for a fixed frequency f as a function of the distance d (in mm) to the stapes. The parameters are: ε=2π, α=−π, β=2π, κ=4.

The classical results by von Békésy (1947) [5] seem to be in favor of such a statement. However the situation is of course not so simple. The basic problem is the non-linearity of the process that associates the movement u(x,t) of the basilar membrane to the input signal f(t). This process is highly compressive and therefore its description by a transfer function can at best be looked at as an approximation.

Von Békésy’s results stem from experiments on dead animals. The outcome can be compared to the experimental results obtained with live animals, yet at high intensities of sound pressure. The above description of the basilar membrane transfer function is therefore taken to be a linear approximation at high levels of sound pressure. In the following section the approach will be modified with the aim of obtaining linear approximations at all levels of sound pressure.

2.2.1 General uncertainty inequality for Γ×S

There are various ways that the abstract group Γ×S can act on the space $L^2_{\mathrm{sym}}$. Apart from the natural representation that associates to the corresponding basis elements in the Lie algebra of Γ the operators $\hat A$ and $\hat B$, the general representation considered below is built on the operators $\frac1c\hat A$ and $\hat B_c=-i|\omega|^c\operatorname{sgn}(\omega)$. The representation of Γ induced by this algebra representation retains the crucial scaling behavior known from the experimental results. It seems to be suitable in the present context, despite the fact that the operator $\hat B_c$ does not stand for the time derivative any more.

The general representation of Γ×S on the space $L^2_{\mathrm{sym}}$ that will be considered is determined by the representation of its Lie algebra and as such by the operators $\frac1c\hat A$, $\hat B_c=-i|\omega|^c\operatorname{sgn}(\omega)$ and $\hat H$. There is a single nontrivial commutator relation:

$$\left[\frac1c\hat A,\hat B_c\right]=\hat B_c. \tag{38}$$

The uncertainty inequality that goes with it is

$$|(h,\hat H\hat B_c h)|\le\frac2c\,\|(\hat A-\alpha\hat H-\beta\hat B)h\|\,\|(\hat H\hat B_c+\nu)h\|. \tag{39}$$

At this point it is not clear whether the term $\frac1c(\hat A-\alpha\hat H-\beta\hat B)$ should rather be $\frac1c(\hat A-\alpha\hat H-\beta\hat B_c)$. In fact the inequality is true for both variants and in both cases, families of inequalities depending on the parameters are obtained. The question is how the extremal functions that are associated to these inequalities vary with the parameters. Yet by choosing the present version one finds that the set of extremal functions is invariant under the action of Γ×S. The expression $\alpha\hat H+\beta\hat B$ should be seen as the linear approximation of the skew hermitian operator $\hat A$. Using both $\hat B_c$ and $\hat B$, the first as an operator and the latter as an approximation term, has the effect that the argument of the extremal function appears in a slightly different context than the modulus. The extremal functions are obtained from the relation

$$\lambda\,\frac1c(\hat A-\alpha\hat H-\beta\hat B)h+\mu(\hat H\hat B_c+\nu)h=0.$$

They satisfy the differential equation

$$\left(\frac12+\omega\frac{d}{d\omega}\right)h=\left(-i\alpha\operatorname{sgn}(\omega)-i\beta\omega-\kappa|\omega|^c+\kappa\nu\right)h. \tag{40}$$

The proportionality factor is $\kappa=-\frac{c\mu}{\lambda}$. Its choice is arbitrary. All the constants α, β, and κ can in fact be chosen in dependence of the parameter c. This gives a possibility for fine adjustment of the extremal function $h_c$ that describes the linear approximation at level c of the basilar membrane filter.

The solutions of the equation are

$$h_c(\omega)=k\,e^{\,i\varepsilon\operatorname{sgn}(\omega)-i\alpha\operatorname{sgn}(\omega)\log|\omega|-i\beta\omega}\,e^{-\frac{\kappa}{c}|\omega|^c}\,|\omega|^{\kappa\nu-\frac12}, \tag{41}$$

with real constants k, ε, α, β, κ and ν. These solutions are in $L^2$ if both κ>0 and ν>0.

The parameter in the uncertainty inequality is

$$\nu=\nu_c(h)=-\frac{(h,\hat H\hat B_c h)}{\|h\|^2}=\frac{1}{\|h\|^2}\int_{-\infty}^{\infty}|\omega|^c\,|h(\omega)|^2\,d\omega. \tag{42}$$

This time, the frequency localization of the function h is

$$\nu^{\frac1c}(h)=\left(\frac{1}{\|h\|^2}\int_{-\infty}^{\infty}|\omega|^c\,|h(\omega)|^2\,d\omega\right)^{\frac1c} \tag{43}$$

and

$$\nu^{\frac1c}(h\circ\hat\delta_a)=a\,\nu^{\frac1c}(h). \tag{44}$$

In accordance with our tenet, the basilar membrane filter is described as

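$$\hat g(x,\omega)=h_c\circ\hat\delta_\xi(\omega), \tag{45}$$

with an extremal function $h_c$, normalized by the condition $\nu(h_c)=1$.

$$\hat g(x,\omega)=h_c\!\left(\frac{\omega}{\xi}\right)=k\,e^{\,i\varepsilon\operatorname{sgn}(\omega)-i\alpha\operatorname{sgn}(\omega)\log\left|\frac{\omega}{\xi}\right|-i\beta\frac{\omega}{\xi}}\,e^{-\frac{\kappa}{c}\left|\frac{\omega}{\xi}\right|^c}\left|\frac{\omega}{\xi}\right|^{\kappa-\frac12}. \tag{46}$$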

As before, ξ=ξ(x) is the tonotopic axis. The frequency localization of $\hat g(x,\omega)$ as a function of ω is ξ:

$$\nu^{\frac1c}\!\left(\hat g(x,\cdot)\right)=\xi(x). \tag{47}$$
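The localization property (47) can be checked by numerical quadrature. The sketch below assumes k=1 and the normalization ν=1, uses the midpoint rule on a truncated interval (the cutoff `upper` and the number of steps `n` are illustrative choices), and exploits the symmetry of |h_c| to integrate over positive frequencies only.

```python
import math

def power_hc(w, kappa, c, nu=1.0):
    """|h_c(w)|^2 for w > 0 with k = 1, read off from (41)."""
    return math.exp(-(2.0 * kappa / c) * w ** c) * w ** (2.0 * kappa * nu - 1.0)

def localization(xi, kappa, c, upper=12.0, n=24000):
    """nu^(1/c) of w -> h_c(w / xi), midpoint rule on (0, upper * xi)."""
    step = upper * xi / n
    num = den = 0.0
    for j in range(n):
        w = (j + 0.5) * step
        p = power_hc(w / xi, kappa, c)
        num += w ** c * p   # integrand of the numerator in (43)
        den += p            # integrand of the squared norm
    return (num / den) ** (1.0 / c)

loc_c1 = localization(5.0, 4.0, 1.0)  # localization for c = 1 at xi = 5
loc_c3 = localization(5.0, 4.0, 3.0)  # localization for c = 3 at xi = 5
```

For every tested value of c the computed localization reproduces the dilation parameter ξ, in line with (44) and (47).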

The parameter c specifies the level of sound intensity at which the linearization is taken. Values c∼1 indicate high levels and values c≫1 low levels of intensity.

Experimental results on the basilar membrane transfer function are reviewed in Robles and Ruggero, 2001 [18]. As already pointed out in the previous paper (2009 [17]), the shape of the modulus of the transfer function determined at various intensities as given by Rhode and Recio (2000 [19], Figure 1C) is approximately described by the modulus of the extremal functions at the corresponding parameter values. In particular, at a fixed position, the modulus of the transfer function has its peak below the position of the frequency localization and with decreasing intensity of sound level it approaches this position. Similarly, for fixed x, the extremal functions $|\hat g_c(x,\omega)|$ attain their maxima at

$$\omega=\xi(x)\left(1-\frac{1}{2\kappa}\right)^{\frac1c}. \tag{48}$$

With increasing values of c this approaches the frequency localization ξ(x).
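The peak position (48) can be compared with a direct grid search on the modulus of the extremal function. This is a minimal sketch in units of ξ(x), assuming k=1 and ν=1; the search interval and grid size are illustrative choices.

```python
import math

def modulus_hc(w, kappa, c):
    """|h_c(w)| for w > 0 with k = 1 and nu = 1, read off from (41)."""
    return math.exp(-(kappa / c) * w ** c) * w ** (kappa - 0.5)

def peak_formula(kappa, c):
    """Peak position in units of xi(x), formula (48)."""
    return (1.0 - 1.0 / (2.0 * kappa)) ** (1.0 / c)

def peak_numeric(kappa, c, lo=0.05, hi=3.0, n=59000):
    """Grid search for the maximizer of |h_c| on [lo, hi]."""
    step = (hi - lo) / n
    return max((lo + j * step for j in range(n + 1)),
               key=lambda w: modulus_hc(w, kappa, c))

# Peak positions for kappa = 4 at increasing values of c.
peaks = [peak_formula(4.0, c) for c in (1.0, 2.0, 4.0, 8.0)]
```

The list `peaks` increases towards 1, which mirrors the statement that the peak approaches the frequency localization as c grows.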

With the present setup the argument of the basilar membrane filter is independent of c. The experimental results by Rhode and Recio (2000) show minor changes of phase in dependence of the intensity level. With increasing intensity there is a small phase lag below the characteristic frequency and an equally small phase lead for frequencies above the characteristic frequency. Studies of the impulse response also confirm that the phase is almost invariant under changes in sound level (Recio and Rhode (2000) [20], Shera (2001) [21]). In order to obtain a fine adjustment of the phase data, the parameters α and β would have to be chosen in dependence of c.

In any case, the phase function has to be a decreasing function both when considered as a function of the frequency ω and as a function of the place x. The phase of the extremal function does not satisfy this requirement because of the logarithmic term. Yet still, the phase of the extremal function serves as an approximation of the physiological phase function on the interval in which the absolute value is relevant. At places at which the absolute value is close to zero, the argument is of no significance. In Figure 2 the phase function is pictured for the fixed circular frequency ω=1,000 as a function of the distance d to the stapes, on the interval that is of physiological relevance. In Figure 3 the phase is pictured as a function of frequency (in Hz). In this figure, the characteristic frequency is 7,000 Hz. The part above about 3,000 Hz is of physiological relevance. The approximation holds in this range. It should be compared with the experimental results by Rhode and Recio (2000 [19], Figure 2E). The part below 3,000 Hz is the mathematical expression for the phase function. It is physiologically not correct, but this is of no significance.

Fig. 2

Phase as a function of d. The phase (in cycles) of the extremal function h as a function of the distance d (in mm) to the stapes. The frequency is 500 Hz. Under the tonotopic axis this value corresponds to d=24.4 mm. The parameters are: ε=16π, α=−7.8π, β=24π.

Fig. 3

Phase as a function of f. The phase (in cycles) of the extremal function h as a function of frequency $f=\frac{\omega}{2\pi}$ at the distance d=10.6 mm to the stapes. The solid line is the phase as determined by the extremal function. The dashed line is the physiologically correct substitute at low frequencies. Note that the phase values in this region are practically irrelevant for signal processing, because the amplitude values in this region are negligible. Under the tonotopic axis d corresponds to the frequency f=4,000 Hz. The parameters are the same as in Figure 2.

The factor $e^{-i\beta\omega}$ in the extremal functions $h_c$ stands for a pure time delay by β. Dividing the extremal functions by this factor, one is left with the extremal functions as they would appear if β had been set equal to zero in the general uncertainty inequality. They are the extremal functions for the uncertainty inequality

$$\nu_c\,\|h\|^2\le\frac2c\,\|(\hat A-\alpha\hat H)h\|\,\|(\hat H\hat B_c+\nu_c)h\|. \tag{49}$$

2.3 The structure equations

Extremals for the uncertainty principle satisfy differential equations. Since the membrane transfer function is described by an extremal function and its transforms under the symmetry group and since the extremal functions are preserved under this action, it is possible to derive differential equations for the output of the signal. The resulting equations are called the structure equations.

The derivation starts with the simple case c=1. In this situation differentiation of the wavelet transform Wf(a,t) with respect to the parameters of the symmetry group directly leads to a differential equation. In the case c>1, however, the resulting equation is actually a pseudo differential equation. A linearization process for the kernel brings it back to a differential equation that is then satisfied approximately.

The quantities in the equation at first are derivatives of the output function Wf(a,t) and its Hilbert transform. A further calculation then shows that the result can be formulated as an inhomogeneous system of linear partial differential equations for the phase and for the logarithm of the amplitude of the output signal. This is particularly satisfying because these are exactly the physiologically relevant quantities.

The point of departure is the tenet that the basilar membrane transfer function is given as

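$$\hat g(x,\omega)=h_c\circ\hat\delta_\xi(\omega)\quad\left(=h_c\!\left(\frac{\omega}{\xi}\right)\right),$$

with an extremal function $h_c$ (normalized by the condition $\nu(h_c)=1$).

$$\hat g(x,\omega)=h_c\!\left(\frac{\omega}{\xi}\right)=k\,e^{\,i\varepsilon\operatorname{sgn}(\omega)-i\alpha\operatorname{sgn}(\omega)\log\left|\frac{\omega}{\xi}\right|-i\beta\frac{\omega}{\xi}}\,e^{-\frac{\kappa}{c}\left|\frac{\omega}{\xi}\right|^c}\left|\frac{\omega}{\xi}\right|^{\kappa-\frac12}.$$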

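As mentioned in the Background section, the response to a general signal is interpreted as a wavelet transform:

$$u(x,t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\hat f(\omega)\,\hat g(x,\omega)\,e^{i\omega t}\,d\omega=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\hat f(\omega)\,h_c\!\left(\frac{\omega}{\xi(x)}\right)e^{i\omega t}\,d\omega,$$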

with

$$\xi(x)=K e^{-x}.$$

The parameter in the wavelet transform can be normalized such that $a=\frac{1}{\xi(x)}$. The wavelet transform is then

$$Wf(a,t)=\int_{-\infty}^{\infty}\hat f(\omega)\,\sqrt a\,h_c(a\omega)\,e^{i\omega t}\,d\omega. \tag{50}$$

The considerations in this section start from the formula

$$u(x,t)=\frac{1}{\sqrt a}\,Wf(a,t). \tag{51}$$

First the case c=1 is treated. It leads to exact results whereas in the case c>1 an approximation procedure will be applied.

Differentiation with respect to the variable a gives

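$$a\frac{\partial}{\partial a}Wf(a,t)=\int_{-\infty}^{\infty}\hat f(\omega)\,a\frac{d}{da}\hat\delta_a h(\omega)\,e^{i\omega t}\,d\omega=\int_{-\infty}^{\infty}\hat f(\omega)\,\hat\delta_a\hat A h(\omega)\,e^{i\omega t}\,d\omega.$$

Note that $\hat A$ commutes with $\hat\delta_a$. The normalized extremal function h satisfies the differential equation

$$\hat A h=(\alpha\hat H+\beta\hat B+\kappa\hat H\hat B+\kappa)h.$$

It follows that

$$\begin{aligned}
a\frac{\partial}{\partial a}Wf(a,t)&=\int_{-\infty}^{\infty}\hat f(\omega)\,\hat\delta_a(\alpha\hat H+\beta\hat B+\kappa\hat H\hat B+\kappa)h(\omega)\,e^{i\omega t}\,d\omega\\
&=\int_{-\infty}^{\infty}\hat f(\omega)\,(\alpha\hat H+a\beta\hat B+a\kappa\hat H\hat B+\kappa)\,\hat\delta_a h(\omega)\,e^{i\omega t}\,d\omega.
\end{aligned}$$

Under the Fourier transform, $-\frac{d}{dt}$ is mapped into $\hat B$ and the Hilbert transform H is mapped into $\hat H$. This gives the basic equation

$$a\frac{\partial}{\partial a}Wf(a,t)=\kappa\,Wf(a,t)+\alpha\,HWf(a,t)-a\beta\frac{\partial}{\partial t}Wf(a,t)-a\kappa\frac{\partial}{\partial t}HWf(a,t). \tag{52}$$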

It is quite remarkable that the differentiation $a\frac{\partial}{\partial a}$ (this essentially is differentiation with respect to x) brings in the Hilbert transform H. This transform is a unitary operator on $L^2(\mathbb R,\mathbb C)$. Its square is the negative of the identity operator: $H^2=-I$. It extends to a bigger class of functions (to all temperate distributions). On the basic trigonometric functions it operates very simply:

$$H\cos(\omega t)=\sin(\omega t).$$

Experimentally, the basilar membrane function is - at least in many studies - determined in terms of the input signals cos(ωt). It immediately follows that

$$WHf(a,t)=HWf(a,t) \tag{53}$$

holds for the functions f=cos(ωt). The linearity assumption then implies that this holds for arbitrary input signals f.
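The action of the Hilbert transform on trigonometric signals, as used above, can be made concrete. The sketch below represents a real signal as a finite sum of cosines with positive circular frequencies (a hypothetical two-tone input); on such sums the multiplier −i sgn(ω) amounts to a quarter-cycle phase lag of every component.

```python
import math

def evaluate(components, t):
    """Value at time t of sum_k A_k * cos(w_k * t + theta_k), with w_k > 0."""
    return sum(A * math.cos(w * t + th) for A, w, th in components)

def hilbert(components):
    """Hilbert transform of a cosine sum: each component lags by pi/2."""
    return [(A, w, th - math.pi / 2) for A, w, th in components]

signal = [(1.0, 2.0, 0.0), (0.5, 7.0, 0.3)]  # hypothetical two-tone input
t = 0.37

single = evaluate(hilbert([(1.0, 2.0, 0.0)]), t)  # H cos(2t) at time t
twice = evaluate(hilbert(hilbert(signal)), t)     # H applied twice
```

Applying `hilbert` twice shifts every phase by π, so `twice` equals the negative of the original signal value, in agreement with H²=−I.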

Since $\hat H$ commutes with both $\hat A$ and $\hat B$, the basic equation tells us that

$$a\frac{\partial}{\partial a}HWf(a,t)=\kappa\,HWf(a,t)-\alpha\,Wf(a,t)-a\beta\frac{\partial}{\partial t}HWf(a,t)+a\kappa\frac{\partial}{\partial t}Wf(a,t). \tag{54}$$

The Hilbert transform thus appears naturally in this setting. This is the justification to study the ‘analytic wavelet transform’. Taking the factor a into account this is

$$Zf(a,t) = \frac{1}{\sqrt a}Wf(a,t) + iH\frac{1}{\sqrt a}Wf(a,t) = u(x,t) + iHu(x,t).$$
(55)

The complex valued function Zf then satisfies

$$a\frac{\partial}{\partial a}Zf(a,t) = \left(\kappa - \tfrac12\right)Zf(a,t) + \alpha\,HZf(a,t) - a\beta\frac{\partial}{\partial t}Zf(a,t) - a\kappa\frac{\partial}{\partial t}HZf(a,t).$$
(56)

Notice the shift by $\tfrac12$ that has its origin in the factor $1/\sqrt a$.

The basic equation can now be reformulated as a system of equations in the polar coordinates

$$r(a,t) = |Zf(a,t)| = \left(\frac1a|Wf(a,t)|^2 + \frac1a|HWf(a,t)|^2\right)^{1/2}, \qquad \varphi(a,t) = \arg Zf(a,t) = \arctan\frac{HWf(a,t)}{Wf(a,t)}.$$

The calculation in real terms starts with the observation that on one hand

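$$\operatorname{Re}\left(a\frac{\partial}{\partial a}Zf\;\overline{Zf}\right) = \frac12\,a\frac{\partial}{\partial a}r^2,$$

whereas on the other hand

$$\begin{aligned} a\operatorname{Re}\left(a\frac{\partial}{\partial a}Zf\;\overline{Zf}\right) &= Wf\left(\alpha HWf + \left(\kappa-\tfrac12\right)Wf - a\beta\frac{\partial}{\partial t}Wf - a\kappa\frac{\partial}{\partial t}HWf\right) \\ &\quad + HWf\left(-\alpha Wf + \left(\kappa-\tfrac12\right)HWf - a\beta\frac{\partial}{\partial t}HWf + a\kappa\frac{\partial}{\partial t}Wf\right) \\ &= a\left(\kappa-\tfrac12\right)r^2 - a\beta\left(Wf\frac{\partial}{\partial t}Wf + HWf\frac{\partial}{\partial t}HWf\right) - a\kappa\left(Wf\frac{\partial}{\partial t}HWf - HWf\frac{\partial}{\partial t}Wf\right) \\ &= a\left(\kappa-\tfrac12\right)r^2 - \frac{a^2\beta}{2}\frac{\partial}{\partial t}r^2 - a^2\kappa\,r^2\frac{\partial\varphi}{\partial t}. \end{aligned}$$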

This gives the first equation

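$$a\frac{\partial}{\partial a}\log r = \left(\kappa-\tfrac12\right) - a\beta\frac{\partial}{\partial t}\log r - a\kappa\frac{\partial\varphi}{\partial t}.$$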
(57)

Similarly, on one hand

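$$\operatorname{Im}\left(a\frac{\partial}{\partial a}Zf\;\overline{Zf}\right) = \operatorname{Re}Z\;a\frac{\partial}{\partial a}\operatorname{Im}Z - \operatorname{Im}Z\;a\frac{\partial}{\partial a}\operatorname{Re}Z = r^2\,a\frac{\partial}{\partial a}\arctan\frac{\operatorname{Im}Z}{\operatorname{Re}Z} = r^2\,a\frac{\partial\varphi}{\partial a}.$$

On the other hand

$$\begin{aligned} a\operatorname{Im}\left(a\frac{\partial}{\partial a}Zf\;\overline{Zf}\right) &= Wf\left(-\alpha Wf + \left(\kappa-\tfrac12\right)HWf - a\beta\frac{\partial}{\partial t}HWf + a\kappa\frac{\partial}{\partial t}Wf\right) \\ &\quad - HWf\left(\alpha HWf + \left(\kappa-\tfrac12\right)Wf - a\beta\frac{\partial}{\partial t}Wf - a\kappa\frac{\partial}{\partial t}HWf\right) \\ &= -\alpha\,a\,r^2 - a\beta\left(Wf\frac{\partial}{\partial t}HWf - HWf\frac{\partial}{\partial t}Wf\right) + a\kappa\left(Wf\frac{\partial}{\partial t}Wf + HWf\frac{\partial}{\partial t}HWf\right) \\ &= -\alpha\,a\,r^2 - a^2\beta\,r^2\frac{\partial\varphi}{\partial t} + \frac{a^2\kappa}{2}\frac{\partial}{\partial t}r^2. \end{aligned}$$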

This gives the second equation

$$a\frac{\partial\varphi}{\partial a} = -\alpha + a\kappa\frac{\partial}{\partial t}\log r - a\beta\frac{\partial\varphi}{\partial t}.$$
(58)

The calculation in complex notation makes use of the fact that

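$$HZf = Hu(x,t) + iH^2u(x,t) = -i\left(u(x,t) + iHu(x,t)\right) = -iZf.$$

The basic equation is then

$$a\frac{\partial}{\partial a}Zf = \left(\kappa-\tfrac12\right)Zf + \alpha HZf - a\beta\frac{\partial}{\partial t}Zf - a\kappa\frac{\partial}{\partial t}HZf$$
(59)
$$= \left(\kappa-\tfrac12-i\alpha\right)Zf - a(\beta-i\kappa)\frac{\partial}{\partial t}Zf.$$
(60)

Dividing by Zf it follows immediately that

$$a\frac{\partial}{\partial a}\log Zf = \kappa-\tfrac12-i\alpha - a(\beta-i\kappa)\frac{\partial}{\partial t}\log Zf.$$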
(61)

The case c≠1 is now treated in a similar spirit. Recall that the normalized extremal functions (ν=1) satisfy the differential equation

$$\hat A h_c = (\alpha\hat H + \beta\hat B + \kappa\hat H\hat B_c + \kappa)\,h_c$$

and that the basic equation derives from

$$a\frac{\partial}{\partial a}Wf(a,t) = \int_{-\infty}^{\infty}\hat f(\omega)\,\hat\delta_a\hat A h_c(\omega)\,e^{i\omega t}\,d\omega = \int_{-\infty}^{\infty}\hat f(\omega)\,\hat\delta_a(\alpha\hat H + \beta\hat B + \kappa\hat H\hat B_c + \kappa)h_c(\omega)\,e^{i\omega t}\,d\omega.$$

The case c≠1 will not directly lead to a differential operator, because the operator $\hat B_c$ is not the Fourier transform of a differential operator (unless c is an odd natural number). It is however possible to use a linear approximation for $\hat B_c$ near the frequency expectation value of $h_c$, that is, at the point ω=1:

$$-i\,\mathrm{sgn}(\omega)\,|\omega|^c = -i\,\mathrm{sgn}(\omega) - ic\left(\omega-\mathrm{sgn}(\omega)\right) + O\!\left((\omega-\mathrm{sgn}(\omega))^2\right),$$
(62)
$$\hat B_c \cong \hat H + c(\hat B - \hat H).$$
(63)

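The above equation is then approximated by

$$\begin{aligned} a\frac{\partial}{\partial a}Wf(a,t) &\cong \int_{-\infty}^{\infty}\hat f(\omega)\,\hat\delta_a\left(\alpha\hat H + \beta\hat B + \kappa(1-c)\hat H\hat H + c\kappa\hat H\hat B + \kappa\right)h_c(\omega)\,e^{i\omega t}\,d\omega \\ &= \int_{-\infty}^{\infty}\hat f(\omega)\left(\alpha\hat H + a\beta\hat B + ac\kappa\hat H\hat B + c\kappa\right)\hat\delta_a h_c(\omega)\,e^{i\omega t}\,d\omega, \end{aligned}$$

where $\hat H\hat H = -1$ has been used in the second step. Consequently,

$$a\frac{\partial}{\partial a}Wf \cong c\kappa\,Wf + \alpha\,HWf - a\beta\frac{\partial}{\partial t}Wf - ac\kappa\frac{\partial}{\partial t}HWf.$$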
(64)

The calculation for the analytic wavelet transform Zf=u+iHu then proceeds as above. Only the constants are slightly different. In the sequel the notation γ=cκ will be used. Recall that in prospective refined adjustments the parameters α, β and γ may vary with c.

The structure equations are

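$$a\frac{\partial}{\partial a}\log r \cong \left(\gamma-\tfrac12\right) - a\beta\frac{\partial}{\partial t}\log r - a\gamma\frac{\partial\varphi}{\partial t},$$
(65)
$$a\frac{\partial\varphi}{\partial a} \cong -\alpha + a\gamma\frac{\partial}{\partial t}\log r - a\beta\frac{\partial\varphi}{\partial t}.$$
(66)

They combine to the complex equation

$$a\frac{\partial}{\partial a}\log Zf \cong \gamma-\tfrac12-i\alpha - a(\beta-i\gamma)\frac{\partial}{\partial t}\log Zf.$$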
(67)

Equality holds if c=1.

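With the tonotopic axis $\xi(x) = Ke^{-x}$ and $x = k + \log a$, the derivative $a\frac{\partial}{\partial a}$ transforms into $\frac{\partial}{\partial x}$:

$$\frac{\partial}{\partial x}\log(u+iHu)(x,t) = a\frac{\partial}{\partial a}\log Zf(a,t).$$
(68)

The structure equations can be written in x,t-coordinates:

$$\frac{\partial}{\partial x}\log(u+iHu)(x,t) \cong \gamma-\tfrac12-i\alpha - \frac{1}{\xi(x)}(\beta-i\gamma)\frac{\partial}{\partial t}\log(u+iHu)(x,t).$$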
(69)

2.4 Consequences of the structure equations

Signal processing in the cochlea is non-linear. The main - but certainly not the only - source of non-linearity is the compressive nature inherent in the hearing process. In the abstract model pursued here this is taken care of with a single parameter that represents the level of sound intensity. The model then describes the linear approximations at these levels. The structure equations are at the core of this abstract model; in fact they comprise all the essential features. First of all, they are linear (as would be expected from a linear approximation). From a mathematical point of view, the equations are therefore very simple. Moreover, the system is quite special. With respect to suitable variables it represents an inhomogeneous $\bar\partial$-equation. Its solutions can be realized in complex form as products of two factors, the first of which is entirely determined by the system while the second is a holomorphic function that can be calculated from the signal. At every level c it is thus possible to associate to an input signal in a unique way a holomorphic function that describes the output signal in terms of the physiological parameters.

The phase and the logarithm of the amplitude are used in the description of the experiments and they are omnipresent in all the representations of the auditory pathway. In themselves they are of limited significance, because they are not coded as such. What really is essential in any cochlear or in any neural model are the changes of these quantities, both with respect to time and with respect to the place. The structure equations precisely relate the local and temporal derivatives of phase and (logarithm of) amplitude. The geometry of the cochlea implicitly is inherent in the extremality property of the basilar membrane filter. But in the structure equations this only shows in terms of the constants. The implicit appearance of the tonotopic axis is an expression of the basic invariance principle that stands at the outset of all considerations.

The structure equations clearly exhibit the dichotomy in cochlear signal processing. The signals can be analyzed either in terms of their phase or in terms of their amplitudes. Assume that there is complete information on phase changes, that is, the quantities $\frac{\partial\varphi}{\partial t}$ and $\frac{\partial\varphi}{\partial a}$ are known. Then the second equation

$$a\frac{\partial\varphi}{\partial a} \cong -\alpha + a\gamma\frac{\partial}{\partial t}\log r - a\beta\frac{\partial\varphi}{\partial t}$$

can be solved for $\frac{\partial}{\partial t}\log r$. Inserted in the first equation

$$a\frac{\partial}{\partial a}\log r \cong \left(\gamma-\tfrac12\right) - a\beta\frac{\partial}{\partial t}\log r - a\gamma\frac{\partial\varphi}{\partial t}$$

this then determines $\frac{\partial}{\partial a}\log r$. Conversely, complete knowledge of the amplitude information determines the phase information. From an abstract point of view, phase information and amplitude information each individually contain the full information of the signal. In the auditory pathway both phase and amplitude information are processed. It is commonly assumed that phase information dominates in the low frequency range and amplitude information in the regions that process high frequencies. The equations tell us that phase processing and amplitude processing are equally significant.
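The elimination step described above can be written out in a few lines. The sketch below is only an illustration under the assumption γ ≠ 0; the function name `amplitude_from_phase` and the numerical values are hypothetical. As a consistency check it uses the phase derivatives of a pure tone in the case c=1, for which the amplitude does not vary in time.

```python
def amplitude_from_phase(a, phi_t, a_phi_a, alpha, beta, gamma):
    """Return (d/dt log r, a d/da log r) from the phase derivatives.

    phi_t   : d(phi)/dt
    a_phi_a : a * d(phi)/da
    """
    # Second structure equation, solved for d/dt log r (requires gamma != 0).
    logr_t = (a_phi_a + alpha + a * beta * phi_t) / (a * gamma)
    # First structure equation then yields a * d/da log r.
    a_logr_a = (gamma - 0.5) - a * beta * logr_t - a * gamma * phi_t
    return logr_t, a_logr_a

# Illustrative values only (not fitted to any data).
alpha, beta, gamma = -1.0, 2.0, 3.0
a, nu = 0.5, 4.0

# Pure tone with c = 1: d(phi)/dt = nu and a d(phi)/da = -alpha - beta*a*nu.
logr_t, a_logr_a = amplitude_from_phase(a, nu, -alpha - beta * a * nu,
                                        alpha, beta, gamma)
print(logr_t, a_logr_a)
```

The converse direction (phase derivatives from amplitude derivatives) works the same way, starting from the first equation.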

The complex equation

$$\frac{\partial}{\partial x}\log(u+iHu)(x,t) \cong \gamma-\tfrac12-i\alpha - \frac{1}{\xi(x)}(\beta-i\gamma)\frac{\partial}{\partial t}\log(u+iHu)(x,t)$$

shows that there is also a twofold way of data processing with respect to time and with respect to the place. Complete information on derivatives with respect to the position gives complete information on time derivatives - and vice versa.

The structure equations are so simple that they can be solved in explicit mathematical terms. In its complex form the structure equation is the linear inhomogeneous equation

$$a\frac{\partial}{\partial a}\log Y(a,t) = \gamma-\tfrac12-i\alpha - a(\beta-i\gamma)\frac{\partial}{\partial t}\log Y(a,t).$$
(70)

The general solution of an inhomogeneous linear differential equation can be written as the sum of a particular solution (any chosen solution of the equation) and the general solution of the associated homogeneous differential equation.

A particular solution $\log Y_p$ of the above complex equation is the function

$$\log Y_p = \left(\gamma-\tfrac12-i\alpha\right)\log a := P_\gamma(a).$$
(71)

Its distinguished feature is the time independence. It follows that the general solution logY is of the form

$$\log Y(a,t) = P_\gamma(a) + \log X(a,t)$$
(72)

for some function X satisfying the homogeneous equation

$$\frac{\partial}{\partial a}X(a,t) = -(\beta-i\gamma)\frac{\partial}{\partial t}X(a,t).$$
(73)

This leads to the product representation

$$Y(a,t) = e^{P_\gamma(a)}X(a,t).$$
(74)

(As a side remark, observe that the complex structure equation is obtained from the basic equation

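$$a\frac{\partial}{\partial a}Zf = \left(\gamma-\tfrac12-i\alpha\right)Zf - a(\beta-i\gamma)\frac{\partial}{\partial t}Zf$$
(75)

after division by Zf. Writing

$$Zf(a,t) = e^{P_\gamma(a)}X(a,t),$$

it is then clear that the homogeneous differential equation for X also holds at the zeros of X.) With the variable change

$$z = t - a\beta + ia\gamma, \qquad \bar z = t - a\beta - ia\gamma$$

the homogeneous equation turns into a $\bar\partial$-equation for the transformed function

$$G(z,\bar z) = X(a,t).$$
(76)

Indeed,

$$0 = \frac{\partial}{\partial a}X(a,t) + (\beta-i\gamma)\frac{\partial}{\partial t}X(a,t) = \frac{\partial G}{\partial z}(-\beta+i\gamma) + \frac{\partial G}{\partial\bar z}(-\beta-i\gamma) + (\beta-i\gamma)\left(\frac{\partial G}{\partial z} + \frac{\partial G}{\partial\bar z}\right) = -2i\gamma\frac{\partial G}{\partial\bar z}.$$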

This then shows that the solutions of the linear inhomogeneous equation have the representation

$$Y(a,t) = a^{\gamma-\frac12-i\alpha}\,G(z),$$
(77)

with G a holomorphic function in the variable $z = t - a\beta + ia\gamma$. Since a>0 (and γ>0) it is defined in the upper half plane $\{z\in\mathbb{C} : \operatorname{Im} z > 0\}$. The function G(z) is uniquely defined up to a constant.
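The product representation can be verified numerically for a sample holomorphic function. The sketch below is a plausibility check under stated assumptions: the parameter values and the particular choice of G are arbitrary illustrations, and the derivatives are approximated by central finite differences.

```python
import numpy as np

alpha, beta, gamma = -0.7, 1.3, 2.0   # illustrative values only
nu = 1.5

def G(z):
    # Sample function, holomorphic (and zero free) in the upper half plane.
    return np.exp(1j * nu * z) + 0.5 * np.exp(2j * nu * z)

def Y(a, t):
    """Y(a,t) = a^(gamma - 1/2 - i alpha) * G(t - a beta + i a gamma)."""
    z = t - a * beta + 1j * a * gamma
    return a ** (gamma - 0.5 - 1j * alpha) * G(z)

def residual(a, t, h=1e-6):
    """Absolute difference between both sides of the complex structure equation."""
    dY_da = (Y(a + h, t) - Y(a - h, t)) / (2 * h)
    dY_dt = (Y(a, t + h) - Y(a, t - h)) / (2 * h)
    # a d/da log Y = a Y_a / Y, and d/dt log Y = Y_t / Y (avoids branch cuts of log).
    lhs = a * dY_da / Y(a, t)
    rhs = gamma - 0.5 - 1j * alpha - a * (beta - 1j * gamma) * dY_dt / Y(a, t)
    return abs(lhs - rhs)

print(residual(0.8, 0.3), residual(2.0, -1.0))
```

The residuals are limited only by the finite difference step, which is what one expects if the representation solves the equation exactly.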

The situation can now be summarized as follows: An incoming signal f(t) gives rise to a family of analytic wavelet transforms

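$$Zf = \frac{1}{\sqrt a}(Wf + iHWf)$$
(78)

depending on the parameter γ=cκ. The functions Zf approximately satisfy the complex structure equation. The solutions

$$Y(a,t) = a^{\gamma-\frac12-i\alpha}\,G(t - a\beta + ia\gamma)$$
(79)

of the equation

$$a\frac{\partial}{\partial a}\log Y = \gamma-\tfrac12-i\alpha - a(\beta-i\gamma)\frac{\partial}{\partial t}\log Y$$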
(80)

are then expected to provide approximations for Zf (with equality for c=1).

The functions G are holomorphic and depend on the parameter γ. They can in fact be determined directly from the Fourier transform of the incoming signal f(t). Since the system is linear, the superposition principle holds:

If $f = f_1 + f_2$ is the superposition of two incoming signals $f_1$ and $f_2$ to which the holomorphic functions $G_1(z)$ and $G_2(z)$ are associated, then the holomorphic function for f is

$$G(z) = G_1(z) + G_2(z).$$
(81)

All that has to be done is to calculate the holomorphic functions that correspond to the basic functions $f(t) = A\cos(\nu t + \vartheta)$. In the following section these are identified as the functions

$$G(z) = k e^{i\varepsilon}\,A e^{i\vartheta}\,\nu^{\gamma-\frac12-i\alpha}\,e^{i\nu z}.$$
(82)

The Fourier representation

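$$f(t) = \sqrt{\frac{2}{\pi}}\,\operatorname{Re}\int_0^\infty \hat f(\omega)\,e^{i\omega t}\,d\omega$$
(83)

then tells us that the holomorphic function associated to f is

$$G(z) = k e^{i\varepsilon}\sqrt{\frac{2}{\pi}}\int_0^\infty \hat f(\omega)\,\omega^{\gamma-\frac12-i\alpha}\,e^{i\omega z}\,d\omega.$$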
(84)

The conclusion is that the holomorphic functions G(z) with z=t−aβ+iaγ provide approximate solutions to the structure equation

$$Zf(a,t) \cong e^{P_\gamma(a)}\,G(z)$$
(85)
$$= a^{\gamma-\frac12-i\alpha}\,G(t - a\beta + ia\gamma).$$
(86)

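The relevant expressions in the structure equations can then be calculated from the derivative of $F(z) := \log G(z)$:

$$\frac{\partial}{\partial t}\log r = \frac{\partial}{\partial t}\operatorname{Re}F(z) = \operatorname{Re}\frac{\partial}{\partial t}F(z) = \operatorname{Re}\left\{F'(z)\frac{\partial z}{\partial t}\right\} = \operatorname{Re}F'(z),$$
(87)
$$\frac{\partial}{\partial t}\varphi = \operatorname{Im}F'(z),$$
(88)
$$a\frac{\partial}{\partial a}\log r = \gamma-\tfrac12 + \operatorname{Re}\left\{F'(z)(-a\beta+ia\gamma)\right\},$$
(89)
$$a\frac{\partial}{\partial a}\varphi = -\alpha + \operatorname{Im}\left\{F'(z)(-a\beta+ia\gamma)\right\}.$$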
(90)

2.5 Examples

2.5.1 Pure sounds

For the input signal

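$$f(t) = \operatorname{Re}e^{i\nu t} = \cos\nu t, \qquad \nu>0,$$

the quantities $\log r(a,t)$ and $\varphi(a,t)$ can be calculated explicitly from the formula

$$\frac{1}{\sqrt a}Wf(a,t) = \operatorname{Re}\left\{\hat g(k+\log a,\nu)\,e^{i\nu t}\right\} = \operatorname{Re}\left\{k\,e^{i\varepsilon\,\mathrm{sgn}(\nu) - i\alpha\log|a\nu| - i\beta a\nu + i\nu t}\,e^{-\frac{\kappa}{c}|a\nu|^c}\,|a\nu|^{\kappa-\frac12}\right\}.$$

This is

$$\log r(a,t) = \log k - \frac{\kappa}{c}|a\nu|^c + \left(\kappa-\tfrac12\right)\log a\nu, \qquad \varphi(a,t) = \varepsilon - \alpha\log a\nu - \beta a\nu + \nu t,$$

and for the derivatives

$$a\frac{\partial}{\partial a}\log r = -\kappa|a\nu|^c + \kappa - \tfrac12, \qquad \frac{\partial}{\partial t}\log r = 0, \qquad a\frac{\partial}{\partial a}\varphi = -\alpha - \beta a\nu, \qquad \frac{\partial}{\partial t}\varphi = \nu.$$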

The first structure equation gives

$$-\kappa|a\nu|^c + \kappa - \tfrac12 \cong \left(c\kappa-\tfrac12\right) - ac\kappa\nu.$$

This is in fact the correct linear approximation. It is equivalent to the linear approximation of $|a\nu|^c$ at aν=1:

$$|a\nu|^c \cong 1 + c(a\nu-1).$$

The second structure equation is satisfied as an equality.
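The statements about the pure tone can be checked numerically. The sketch below assumes the normalization $k e^{i\varepsilon} = 1$ and arbitrary illustrative parameter values; it evaluates the residuals (left side minus right side) of both structure equations using central finite differences.

```python
import numpy as np

alpha, beta, kappa = -0.7, 1.3, 2.0   # illustrative values only
nu = 1.5

def log_r(a, t, c):
    # log r for the pure tone, with log k = 0; it does not depend on t.
    return -(kappa / c) * (a * nu) ** c + (kappa - 0.5) * np.log(a * nu)

def phi(a, t):
    # Phase for the pure tone, with epsilon = 0.
    return -alpha * np.log(a * nu) - beta * a * nu + nu * t

def residuals(a, t, c, h=1e-5):
    """Residuals of the first and second structure equation with gamma = c*kappa."""
    gamma = c * kappa
    a_logr_a = a * (log_r(a + h, t, c) - log_r(a - h, t, c)) / (2 * h)
    logr_t = (log_r(a, t + h, c) - log_r(a, t - h, c)) / (2 * h)
    a_phi_a = a * (phi(a + h, t) - phi(a - h, t)) / (2 * h)
    phi_t = (phi(a, t + h) - phi(a, t - h)) / (2 * h)
    first = a_logr_a - ((gamma - 0.5) - a * beta * logr_t - a * gamma * phi_t)
    second = a_phi_a - (-alpha + a * gamma * logr_t - a * beta * phi_t)
    return first, second

print(residuals(0.8, 0.3, 1.0))   # c = 1: both residuals vanish
print(residuals(0.8, 0.3, 2.0))   # c = 2, a*nu = 1.2: only the first is nonzero
```

For c=2 and aν=1.2 the first residual equals $-\kappa\,(|a\nu|^c - 1 - c(a\nu-1)) = -0.04\,\kappa$, which is exactly the error of the linear approximation of $|a\nu|^c$.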

From the complex structure equation the holomorphic function associated to f(t)=cosνt can be determined:

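$$\log Zf = \log k + i\varepsilon - i\alpha\log|a\nu| - i\beta a\nu + i\nu t - \frac{\kappa}{c}|a\nu|^c + \left(\kappa-\tfrac12\right)\log|a\nu|, \qquad a\frac{\partial}{\partial a}\log Zf = -\kappa|a\nu|^c + \left(\kappa-\tfrac12\right) - i(\alpha+\beta a\nu).$$

With the above approximation this is

$$a\frac{\partial}{\partial a}\log Zf \cong -c\kappa a\nu + \left(c\kappa-\tfrac12\right) - i(\alpha+\beta a\nu) = -\gamma a\nu + \left(\gamma-\tfrac12\right) - i(\alpha+\beta a\nu)$$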

(with the abbreviation cκ=γ). Observe that for c>1

$$|a\nu|^c > 1 + c(a\nu-1),$$

unless aν=1. The approximate value for log|Zf| is therefore an overestimate. Together with

$$\frac{\partial}{\partial t}\log Zf = i\nu$$

the result leads to

$$\log k + i\varepsilon + \left(\gamma-\tfrac12-i\alpha\right)\log|a\nu| - \gamma a\nu - ia\beta\nu + i\nu t = \log k + i\varepsilon + P_\gamma(a) + P_\gamma(\nu) + i\nu z$$

as the approximate value of logZf.

The associated holomorphic function is thus

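$$G(z) = k\,e^{i\nu z + P_\gamma(\nu) + i\varepsilon},$$

with

$$z = t - a\beta + ia\gamma, \qquad e^{P_\gamma(\nu)} = \nu^{\gamma-\frac12-i\alpha}.$$

Note that the holomorphic function G associated to the input signal

$$f(t) = \operatorname{Re}A e^{i\nu t + i\vartheta} = A\cos(\nu t+\vartheta), \qquad \nu>0,\ A>0,\ \vartheta\in\mathbb{R},$$

is

$$G(z) = k e^{i\varepsilon}\,A e^{i\vartheta}\,\nu^{\gamma-\frac12-i\alpha}\,e^{i\nu z}.$$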
(91)

The constants k and ε are of little importance and do not show in the structure equations. In the following calculations we set k e i ε =1.

2.5.2 Amplitude modulation

The amplitude modulated signal

$$f(t) = (1 + A\cos\mu t)\cos\nu t = \operatorname{Re}\left\{e^{i\nu t} + \frac{A}{2}e^{i(\nu+\mu)t} + \frac{A}{2}e^{i(\nu-\mu)t}\right\} \qquad (0<A<1)$$

with 0<μ≪ν is described by the holomorphic function

$$\begin{aligned} G(z) &= e^{P(\nu)+i\nu z} + \frac{A}{2}e^{P(\nu+\mu)+i\nu z+i\mu z} + \frac{A}{2}e^{P(\nu-\mu)+i\nu z-i\mu z} \\ &= e^{P(\nu)+i\nu z}\left(1 + \frac{A}{2}e^{P(\nu+\mu)-P(\nu)+i\mu z} + \frac{A}{2}e^{P(\nu-\mu)-P(\nu)-i\mu z}\right). \end{aligned}$$

The outcome depends on the ratio between the amplitudes of the coefficients. The frequency ν is dominant as long as

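$$\left|\frac{A}{2}e^{P(\nu+\mu)-P(\nu)+i\mu z}\right| < \frac12, \qquad \left|\frac{A}{2}e^{P(\nu-\mu)-P(\nu)-i\mu z}\right| < \frac12.$$

With

$$\operatorname{Re}\{P(\nu\pm\mu)-P(\nu)\} = \left(\gamma-\tfrac12\right)\log\frac{\nu\pm\mu}{\nu} \cong \pm\frac{\mu}{\nu}\left(\gamma-\tfrac12\right), \qquad \operatorname{Re}\,i\mu z = -\mu a\gamma$$

this gives the estimates

$$a > \frac{1}{\nu}\left(1 - \frac{1}{2\gamma} + \frac{\nu}{\mu\gamma}\log A\right), \qquad a < \frac{1}{\nu}\left(1 - \frac{1}{2\gamma} - \frac{\nu}{\mu\gamma}\log A\right).$$

The frequency interval covered is

$$\left[\frac{\nu}{1-\frac{1}{2\gamma}+k},\ \frac{\nu}{1-\frac{1}{2\gamma}-k}\right]$$

with

$$k = -\frac{\nu}{\mu\gamma}\log A.$$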

For sufficiently small values of A and μ≪ν it includes the entire range along the cochlea that is involved in the processing of the amplitude modulated signal. The function F describing this signal in the relevant range can then be estimated by using the approximation log(1+x)≅x for small |x|:

$$P(\nu\pm\mu)-P(\nu) = \left(\gamma-\tfrac12-i\alpha\right)\log\left(1\pm\frac{\mu}{\nu}\right) \cong \pm\left(\gamma-\tfrac12-i\alpha\right)\frac{\mu}{\nu}$$

and

$$F(z) = P(\nu) + i\nu z + \log\left\{1 + \frac{A}{2}\left(e^{i\mu z + \frac{\mu}{\nu}(\gamma-\frac12-i\alpha)} + e^{-i\mu z - \frac{\mu}{\nu}(\gamma-\frac12-i\alpha)}\right)\right\}$$
(92)
$$\cong P(\nu) + i\nu z + A\cos\left(\mu z - \frac{\mu}{\nu}\alpha - i\frac{\mu}{\nu}\left(\gamma-\tfrac12\right)\right).$$
(93)

The result exhibits the basic frequency ν as the carrier frequency. Note, however, that the approximation is valid only in the frequency interval specified above.

The relevant expressions in the structure equation can be calculated from $F'(z)$:

$$F'(z) \cong i\nu - \mu A\sin\left(\mu z - \frac{\mu}{\nu}\alpha - i\frac{\mu}{\nu}\left(\gamma-\tfrac12\right)\right).$$

It can clearly be seen that there is the constant contribution from the carrier frequency and - as the interesting part - a slow oscillation of angular frequency μ that stems from sin(μz+const). Both the amplitude and the phase derivatives show this oscillation.
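To get a feeling for the size of the frequency interval in which the carrier dominates, the bounds derived above can be evaluated directly. The following sketch is illustrative only: the function name `carrier_dominance_interval` and the sample values for ν, μ and A are assumptions, and the estimates rest on the approximation log(1±μ/ν)≅±μ/ν used in the text.

```python
import math

def carrier_dominance_interval(nu, mu, gamma, A):
    """Return (low, high), the frequency interval in which the carrier nu dominates.

    Uses k = -(nu / (mu * gamma)) * log(A) and the bounds
    nu / (1 - 1/(2 gamma) + k)  and  nu / (1 - 1/(2 gamma) - k).
    """
    k = -nu * math.log(A) / (mu * gamma)      # positive because 0 < A < 1
    centre = 1.0 - 1.0 / (2.0 * gamma)
    low = nu / (centre + k)
    if centre - k <= 0:
        # The lower bound on a is then void, so there is no finite upper frequency.
        return low, math.inf
    return low, nu / (centre - k)

# Illustrative values: carrier 1000, modulation 10, gamma = 32*pi, depth A = 0.5.
nu, mu, gamma, A = 1000.0, 10.0, 32 * math.pi, 0.5
low, high = carrier_dominance_interval(nu, mu, gamma, A)
print(low, high)
```

With these values the interval extends well below and above the carrier frequency, in line with the remark that for small A and μ≪ν it covers the whole range involved in processing the signal.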

2.5.3 The sound of a violin

No doubt, the distinguishing feature of a violin sound is the extraordinarily large number of harmonics in the frequency spectrum. It is not uncommon to observe around twenty harmonics at an intensity level at which the sounds can still be detected. Except possibly for the first few, the harmonics show a gradual decrease in amplitude with some oscillation. It is conjectured that these properties are in fact characteristic of violins of good quality. Figure 4 shows the amplitudes of the individual harmonics of the violin sounds a, e′ and d″. The program ‘Prisma-Realtime’ by Bachmann et al. (2007) [22] uses a windowed Fourier transform for this spectrogram. The amplitudes are determined at short intervals and marked with a point. The intensity of these points fades with time.

Fig. 4

Spectrograms of three sounds on a new violin. The three sounds are a, e′ and d″. The amplitudes of the higher harmonics are determined at short intervals and marked with a point. The intensity of these points fades with time. Amplitudes of the harmonics are shown on a relative scale (in dB). More than 20 higher harmonics can be identified. In the first example the level differences of the first 20 harmonics are within a limit of 20 dB.

The violin sound has the representation

$$f(t) = \operatorname{Re}\left\{\sum_{m=1}^{\infty}c_m e^{im\nu t}\right\}.$$

Figure 4 indicates that $|c_m|$ decreases exponentially in m. In the dB-scale the decrease is roughly linear with slope −2:

$$20\log_{10}|c_m| \cong \text{const.} - 2m, \qquad |c_m| \cong \text{const.}\;e^{-0.23\,m}$$

(with the approximation $10 \cong e^{2.3}$).

The associated holomorphic function is

$$G(z) = \sum_{m=1}^{\infty}c_m e^{im\nu z + P(m\nu)}.$$

It is a Fourier series $\sum_{m=1}^{\infty}d_m e^{im\nu t}$ with coefficients

$$d_m = c_m\,e^{-a\gamma m\nu - ia\beta m\nu + P(m\nu)}$$

depending on the position (represented by the variable a). The amplitudes

$$|d_m| = |c_m|\,e^{-a\gamma m\nu}(m\nu)^{\gamma-\frac12} \cong e^{-0.23\,m - a\gamma m\nu}(m\nu)^{\gamma-\frac12}$$

are maximal for

$$m = \frac{\gamma-\frac12}{a\gamma\nu + 0.23} \cong \frac{1}{a\nu}.$$
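The location of the maximum can be checked by evaluating the amplitudes directly. The sketch below assumes the idealized decay $|c_m| = e^{-0.23\,m}$ and uses hypothetical values (a fundamental of 220 Hz, γ=32π, and a position tuned to the fifth harmonic); the function names are illustrative.

```python
import numpy as np

def harmonic_amplitudes(a, nu, gamma, m_max=40, decay=0.23):
    """Relative |d_m| for m = 1..m_max, assuming |c_m| = exp(-decay * m)."""
    m = np.arange(1, m_max + 1)
    log_amp = -decay * m - a * gamma * m * nu + (gamma - 0.5) * np.log(m * nu)
    # Normalise by the largest value so that the exponential cannot overflow.
    return m, np.exp(log_amp - log_amp.max())

def predicted_maximum(a, nu, gamma, decay=0.23):
    """Continuous maximiser m = (gamma - 1/2) / (a gamma nu + decay)."""
    return (gamma - 0.5) / (a * gamma * nu + decay)

nu = 2 * np.pi * 220.0          # angular frequency of the fundamental (assumed)
gamma = 32 * np.pi
a = 1.0 / (5 * nu)              # position coding for the fifth harmonic

m, amp = harmonic_amplitudes(a, nu, gamma)
dominant = int(m[np.argmax(amp)])
m_star = predicted_maximum(a, nu, gamma)
print(dominant, m_star)
```

For large γ the correction by the decay constant is small, so the dominant harmonic at the position a is the one with mν close to 1/a.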

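At the place along the cochlea that codes for the angular frequency nν, that is, for $\xi(x) = 1/a = n\nu$, the coefficient $d_n$ is thus dominant. In a neighborhood of this n-th harmonic (near $a = 1/(n\nu)$) the function F can be described locally. Apart from the contribution of the carrier frequency, the calculation exhibits a substantial oscillatory part of angular frequency ν. In a first calculation only the influence of the two closest harmonics is taken into account. The partial signal

$$\begin{aligned} f(t) &= \operatorname{Re}\left\{c_n e^{in\nu t} + c_{n+1}e^{i(n+1)\nu t} + c_{n-1}e^{i(n-1)\nu t}\right\} \\ &:= \operatorname{Re}\left\{A e^{in\nu t+i\vartheta} + A_+e^{i(n+1)\nu t+i\vartheta_+} + A_-e^{i(n-1)\nu t+i\vartheta_-}\right\} \\ &= \operatorname{Re}\left\{A e^{in\nu t+i\vartheta}\left(1 + \frac{A_+}{A}e^{i(\vartheta_+-\vartheta)}e^{i\nu t} + \frac{A_-}{A}e^{i(\vartheta_--\vartheta)}e^{-i\nu t}\right)\right\} \end{aligned}$$

corresponds to the function

$$F(z) = in\nu z + P(n\nu) + \log A + i\vartheta + \log\left\{1 + \frac{A_+}{A}e^{i(\vartheta_+-\vartheta)}e^{i\nu z + P\left(\frac{n+1}{n}\right)} + \frac{A_-}{A}e^{i(\vartheta_--\vartheta)}e^{-i\nu z + P\left(\frac{n-1}{n}\right)}\right\}.$$

As before, the approximation

$$P\left(\frac{n\pm1}{n}\right) \cong \pm\frac1n\left(\gamma-\tfrac12-i\alpha\right)$$

is used. The coefficients

$$c_\pm = \frac{A_\pm}{A}\,e^{i(\vartheta_\pm-\vartheta) \pm \frac1n\left(\gamma-\frac12-i\alpha\right)}$$

are decomposed as

$$c_+ = \frac{s+d}{2}, \qquad c_- = \frac{s-d}{2},$$

with $s = c_+ + c_-$ and $d = c_+ - c_-$. Together with log(1+x)≅x this gives the approximation

$$F(z) \cong in\nu z + P(n\nu) + \log A + i\vartheta + s\cos(\nu z) + i\,d\sin(\nu z)$$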

near the n th harmonic. The carrier frequency accounts for the part inνz+P(nν)+logA+iθ. In the structure equation it only participates with time independent terms. But the significant part is the contribution that varies in time with angular frequency ν (recall that z=t−aβ+iaγ).

The qualitative picture remains unchanged if several neighboring harmonics are considered. Near the n-th harmonic the dominant term in the function F(z) is expected to be $in\nu z + P(n\nu) + \log c_n$:

$$F(z) = \log\left\{\sum_{m=1}^{\infty}c_m e^{im\nu z+P(m\nu)}\right\} = in\nu z + P(n\nu) + \log c_n + \log\left\{1 + \sum_{m\ne n}\frac{c_m}{c_n}e^{i(m-n)\nu z + P\left(\frac{m}{n}\right)}\right\}.$$

This shows the presence of the carrier frequency. The remainder term is approximated by

$$R_n(z) = \sum_{m\ne n}\frac{c_m}{c_n}e^{i(m-n)\nu z + P\left(\frac{m}{n}\right)} = e^{i\nu z}\sum_{m>n}\frac{c_m}{c_n}e^{i(m-n-1)\nu z + P\left(\frac{m}{n}\right)} + e^{-i\nu z}\sum_{m<n}\frac{c_m}{c_n}e^{i(m-n+1)\nu z + P\left(\frac{m}{n}\right)}.$$

This function is $\frac{2\pi}{\nu}$-periodic in time, with leading term

$$e^{i\nu z}\,\frac{c_{n+1}}{c_n}\,e^{P\left(\frac{n+1}{n}\right)} + e^{-i\nu z}\,\frac{c_{n-1}}{c_n}\,e^{P\left(\frac{n-1}{n}\right)}.$$

In conclusion it can be said that locally around $a = 1/(n\nu)$ the function F(z) is of the form

$$F(z) = in\nu z + P(n\nu) + \log c_n + R_n(z),$$
(94)

with a well defined remainder term that is $\frac{2\pi}{\nu}$-periodic in time. The term $in\nu z + P(n\nu) + \log c_n$ shows the presence of the carrier frequency. The relevant information about the violin sound is however contained in the fact that the $\frac{2\pi}{\nu}$-periodicity extends over an interval along the cochlea that comprises more than three octaves in the tonal range. The nature of this contribution is similar all along the interval covered by the frequency spectrum of the violin sound. The exceptions are the low harmonics (essentially the first and second) at which the influence of the neighboring harmonics is very small. Furthermore, the amplitude spectrum of a violin sound very often fails to display monotonicity for the first few harmonics.

With regard to the violins, it should be mentioned that there are considerable differences between different (good quality) instruments. It is believed that the distribution in the first few harmonics very much contributes to the individuality of the violin.

2.6 The impulse response

The response to the impulse function (the Dirac function at the origin) is, up to rescaling, the inverse Fourier transform $\check h_c$ of the extremal function

$$h_c(\omega) = k\,e^{i\varepsilon\,\mathrm{sgn}(\omega) - i\alpha\,\mathrm{sgn}(\omega)\log|\omega| - i\beta\omega}\,e^{-\frac{\kappa}{c}|\omega|^c}\,|\omega|^{\kappa-\frac12}.$$

For simplicity it is assumed that $k\,e^{i\varepsilon\,\mathrm{sgn}(\omega)} = 1$. Since the Fourier transform of the Dirac function is the constant function $\frac{1}{\sqrt{2\pi}}$, the impulse response is

$$u(x,t) = u(\log K + \log a, t) = \frac{1}{\sqrt a}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\,\hat\delta_a h_c(\omega)\,e^{i\omega t}\,d\omega$$
(95)
$$= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}h_c(a\omega)\,e^{i\omega t}\,d\omega$$
(96)
$$= \frac1a\,\check h_c\!\left(\frac{t}{a}\right).$$
(97)

Therefore $a\,u(\log K + \log a, t) = \check h_c\!\left(\frac{t}{a}\right)$ is a function of the single variable $s = \frac{t}{a}$. This could also be expressed by saying that $t\,u(x,t)$ only depends on the single variable s - the usual way to formulate the invariance statement.

Figure 5 shows the impulse response for different values of c. The impulse response must have its support on the positive half of the time axis. The membrane cannot show a reaction before the impulse arrives. The numerical calculations show that this is almost satisfied. At this point attention should be drawn to a deficiency of the approach. The basic difficulty lies in the concept of using the uncertainty principle. The appropriate thing would be to restrict the class of functions in the uncertainty principle to functions that in the time domain are supported on the positive half axis. However, in the restricted class there are no extremal functions. This can be seen from the fact that the class of extremal functions is translation invariant in the time domain. With the present setting it is not strictly true that the impulse response has its support on the positive half axis. The extremal functions have to be interpreted as a first approximation. They have to be modified slightly such that they really vanish for negative values of t. The numerical calculations show that only small modifications are necessary.

Fig. 5

Impulse response. The impulse response on a relative scale as a function of time (in seconds), for κ=12 and for various values of c. The position along the cochlea corresponds to the frequency 200 Hz.

The invariance property of the impulse response mentioned above, in combination with the structure equations, allows for an explicit approximate calculation of $\check h_c$. In the case c=1 this procedure gives the precise value up to a multiplicative constant.

Recall that

$$Zf(a,t) = u(k+\log a,t) + iHu(k+\log a,t) = r(a,t)\,e^{i\varphi(a,t)}$$

was the analytic response. The functions f and g are then defined starting from $a\,u(k+\log a,t)$ as

$$g(s) = g\!\left(\frac ta\right) = \log a + \log r(a,t), \qquad f(s) = f\!\left(\frac ta\right) = \varphi(a,t).$$

The derivatives appearing in the structure equations can be expressed in terms of $f'$ and $g'$:

$$\begin{aligned} a\frac{\partial}{\partial a}\log r(a,t) &= a\frac{\partial}{\partial a}\left(-\log a + g\!\left(\frac ta\right)\right) = -\frac ta\,g'\!\left(\frac ta\right) - 1, & \frac{\partial}{\partial t}\log r(a,t) &= \frac{\partial}{\partial t}g\!\left(\frac ta\right) = \frac1a\,g'\!\left(\frac ta\right), \\ a\frac{\partial}{\partial a}\varphi(a,t) &= -\frac ta\,f'\!\left(\frac ta\right), & \frac{\partial}{\partial t}\varphi(a,t) &= \frac1a\,f'\!\left(\frac ta\right). \end{aligned}$$

They are inserted in the structure equations. The result is a system of differential equations for the functions $f'$ and $g'$ of the variable s:

$$-s\,g'(s) - 1 \cong \left(\gamma-\tfrac12\right) - \beta\,g'(s) - \gamma\,f'(s), \qquad -s\,f'(s) \cong -\alpha + \gamma\,g'(s) - \beta\,f'(s)$$

or equivalently

$$(s-\beta)\,g'(s) - \gamma\,f'(s) \cong -\left(\gamma+\tfrac12\right), \qquad \gamma\,g'(s) + (s-\beta)\,f'(s) \cong \alpha.$$

The complex calculation is quickly done:

$$\begin{aligned} (s-\beta+i\gamma)\left(g'(s)+if'(s)\right) &\cong -\left(\gamma+\tfrac12\right) + i\alpha, \\ g(s)+if(s) &\cong \left(-\gamma-\tfrac12+i\alpha\right)\log(s-\beta+i\gamma) + \text{const.}, \\ e^{g(s)+if(s)} &\cong K\,(s-\beta+i\gamma)^{-\gamma-\frac12+i\alpha}. \end{aligned}$$

All that remains is to determine the integration constant K. The inverse Fourier transform $\check h_c$ of $h_c$ is then approximated as

$$\check h_c(s) \cong \operatorname{Re}\left\{K\,(s-\beta+i\gamma)^{-\gamma-\frac12+i\alpha}\right\}.$$

This is an exact result if c=1.

The function $\log(s-\beta+i\gamma)$ is well defined and holomorphic for $\operatorname{Im}s > -\gamma$. In this range

$$\vartheta = \arg(s-\beta+i\gamma)$$

can be chosen in the interval (0,π). The modulus of the holomorphic function

$$(s-\beta+i\gamma)^{-\gamma-\frac12+i\alpha} = e^{\left(-\gamma-\frac12+i\alpha\right)\log(s-\beta+i\gamma)}$$

is then bounded in the upper half plane. Therefore the function F(z) associated to the impulse response is holomorphic and can be calculated from it. Since z and s are related by $z = t - a\beta + ia\gamma = a(s-\beta+i\gamma)$, it follows that

$$\begin{aligned} \log\left(r(a,t)\,e^{i\varphi(a,t)}\right) &= \log K - \log a + \left(-\gamma-\tfrac12+i\alpha\right)\log(s-\beta+i\gamma) \\ &= \log K + \left(\gamma-\tfrac12-i\alpha\right)\log a - \left(\gamma+\tfrac12-i\alpha\right)\log z \\ &= \log K + P_\gamma(a) - \left(\gamma+\tfrac12-i\alpha\right)\log z. \end{aligned}$$

Therefore

$$F(z) = \log K - \left(\gamma+\tfrac12-i\alpha\right)\log z,$$
(98)

and this function is indeed holomorphic in the upper half plane.

The determination of the integration constant is left open. There is however some information that can be obtained directly from the differential equations for f and g. The above system is solved for f ′ and g ′ :

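$$g'(s) \cong \frac{\alpha\gamma - (s-\beta)\left(\gamma+\frac12\right)}{(s-\beta)^2+\gamma^2}, \qquad f'(s) \cong \frac{\gamma^2 + \frac{\gamma}{2} + \alpha(s-\beta)}{(s-\beta)^2+\gamma^2} := \frac{N}{D}.$$

From the equation $\frac{\partial}{\partial t}g\!\left(\frac ta\right) = \frac1a\,g'\!\left(\frac ta\right)$ an estimate for the peak of log r at the position given by the angular frequency $\frac1a$ can be obtained. The peak is determined by $g'(s)=0$:

$$0 = \alpha\gamma - (s-\beta)\left(\gamma+\tfrac12\right), \qquad t = a\left(\beta + \alpha - \frac{\alpha}{2\gamma+1}\right).$$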

This equation tells us how the peak arising from a click travels along the cochlea.
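The closed forms for $g'$ and $f'$ and the position of the peak are easy to evaluate. The sketch below uses the parameter values α=−5.3π, β=16π and γ=32π quoted for Figure 6; the function names are illustrative, and the checks only confirm internal consistency of the formulas, not agreement with measurements.

```python
import math

def impulse_derivatives(s, alpha, beta, gamma):
    """Return (g'(s), f'(s)) from the approximate closed-form solution."""
    D = (s - beta) ** 2 + gamma ** 2
    g_prime = (alpha * gamma - (s - beta) * (gamma + 0.5)) / D
    f_prime = (gamma ** 2 + gamma / 2 + alpha * (s - beta)) / D
    return g_prime, f_prime

def peak_position(alpha, beta, gamma):
    """Value of s = t/a at which g'(s) = 0, i.e. the peak of log r."""
    return beta + alpha - alpha / (2 * gamma + 1)

alpha, beta, gamma = -5.3 * math.pi, 16 * math.pi, 32 * math.pi

s_peak = peak_position(alpha, beta, gamma)
g_peak, f_peak = impulse_derivatives(s_peak, alpha, beta, gamma)
print(s_peak, g_peak, f_peak)   # g' vanishes at the peak, f' is close to 1
```

Since the peak occurs at a fixed value of s=t/a, the peak time t at a given place is proportional to a, that is, inversely proportional to the frequency coded at that place.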

Information on the derivative of the phase is obtained from $\frac{\partial}{\partial t}\varphi(a,t) = \frac1a\,f'\!\left(\frac ta\right)$. For this the second derivative of f is calculated:

$$f''(s) \cong \frac{\alpha D - 2(s-\beta)N}{D^2}.$$

Near the peak of log r the second derivative is small and it changes sign between β+α and β (recall that α<0):

$$f''(\beta+\alpha) \cong -\alpha\,\frac{\gamma^2+\gamma+\alpha^2}{(\gamma^2+\alpha^2)^2}, \qquad f''(\beta) \cong \frac{\alpha}{\gamma^2}.$$

Therefore $f'$ is a slightly increasing function at s=α+β and it is slightly decreasing at s=β. The function $f'$ itself takes values close to 1:

$$f'(\beta+\alpha) \cong \frac{\gamma^2+\frac{\gamma}{2}+\alpha^2}{\gamma^2+\alpha^2} = 1 + \frac{\gamma}{2(\gamma^2+\alpha^2)}, \qquad f'(\beta) \cong 1 + \frac{1}{2\gamma}.$$

The graph of $f'(s)$ is shown in Figure 6. The parameter values are α=−5.3π, β=16π and γ=32π.

Fig. 6

Frequency glides. The ratio of instantaneous frequency to CF as a function of time, measured in periods of CF. The parameters are α=−5.3π, β=16π and γ=32π.

According to Shera [23] frequency glides in click responses of the basilar membrane have their origin in the dispersion properties of the slow traveling wave. The variable $s = \frac ta = t\,\xi(x)$ is the scale invariant variable denoted by $2\pi\tau = 2\pi\,t\,f_{CF}(x)$ in [23], p. 2025. Above, the phase is expressed in the variable s as φ(a,t)=f(s). This gives

$$\frac{1}{2\pi}\frac{\partial\varphi}{\partial t}(a,t) = \frac{1}{2\pi a}\,f'(s)$$

for the instantaneous frequency and hence $f'(s) = \frac{1}{\xi(x)}\frac{\partial\varphi}{\partial t}(a,t)$ for the normalized instantaneous frequency. In [23], p. 2025, Shera denotes it by β i n (τ) and pictures its graph in Figure 2b. The above calculations show that $f'(s)$ is increasing for values below α+β. This can be interpreted as a frequency glide. Note that $f'(s)$ starts to decrease near β. Hence Figure 2b in [23] would roughly confirm the present calculations provided that α+β≈12π. There is however a difference in that the present calculation exhibits a dependence of $f'$ on the sound level (as represented by c) with maximal values for $f'$ that are slightly bigger than one.

2.7 General invariance groups

Scale invariance as considered in the previous sections is based on the dilation group $\delta_a$ and on the assumption that the tonotopic axis is given by the exponential law $\xi(x) = Ke^{-x}$. From the experimental data it should however rather be concluded that the symmetry hypotheses are satisfied only locally and in a first approximation. The question therefore arises whether the results persist qualitatively when these basic assumptions are modified. To answer this question the setup of an abstract model is presented.

The basic hypothesis is still that the symmetry in cochlear mechanics is given by a one parameter transformation group λ ˆ a acting in phase space. This action can be taken in a quite general form. Ideally it should be possible to adapt it individually to each species. The specific form of the tonotopic axis is to a certain extent independent of the action of the one parameter group. It will be discussed as a separate issue and for the time being the exponential law will be retained.

The transformation group λ ˆ a thus stands at the outset of a general framework for an abstract description of cochlear mechanics. The one parameter group will be enlarged to a bigger group. At first this will be done on the infinitesimal level by defining a multiplier operator M ˆ that plays the role of B ˆ =−iω in the previous sections. Together with the infinitesimal generator L ˆ for the one parameter group λ ˆ a and together with H ˆ , the action of the Lie algebra of the abstract symmetry group is then completely determined. As an abstract group, the symmetry group is still Γ×S, yet the action in phase space will have changed. It will in fact be conjugate to the standard action. The conjugation mapping typically maps a bounded symmetric range in frequency space onto the whole frequency axis. In the application the bounded range will be the interval (−R,R). The upper bound R appears as an absolute frequency bound. In such a model, the inner ear is completely indifferent to signals whose frequency content lies beyond this limit.

As a first issue the wavelet transform for general one parameter groups is discussed. Next the action of the one parameter group is determined in dependence on the parameter c that relates to the overall sound level of the signal. This action will then be extended on the infinitesimal level to all of Γ×S. Along with this, the conjugation mapping will be defined.

2.7.1 The wavelet transform for general one parameter groups

One parameter transformation groups on $\mathbb{R}_+ = (0,\infty)$ are generated by vector fields $v: \mathbb{R}_+ \to \mathbb{R}$. Assume that v extends to a continuous odd function on $\mathbb{R}$ and that the solutions $\tau_t(\omega)$ to the differential equation

$$\frac{dx}{dt} = v(x)$$
(99)

with initial condition $\tau_0(\omega) = \omega$ are uniquely determined and depend smoothly on the initial condition ω. Then $\tau_t$ is a one parameter transformation group of $\mathbb{R}_+$. It extends in an antisymmetric way to all of $\mathbb{R}$. At the point x=0 the vector field vanishes and x=0 is a stationary solution of the differential equation. Let us transform the time parameter t and set

$$a = e^t, \qquad \lambda_a = \lambda_{e^t} = \tau_t.$$

The one parameter group $\{\lambda_a\}$ is then written with a multiplicative parameter $a\in\mathbb{R}_+$ (the multiplicative group of positive real numbers). It then follows that

$$\frac{d\lambda_a}{da}(\omega) = \frac{d\tau_t}{da}(\omega) = \frac{d\tau_t}{dt}(\omega)\,\frac1a = v(\tau_t\omega)\,\frac1a = \frac1a\,v(\lambda_a\omega).$$

Example

$$v(x) = x\log\frac{|x|}{R}.$$

The differential equation has the solutions

$$\tau_t(\omega) = \mathrm{sgn}(\omega)\,R\left|\frac{\omega}{R}\right|^{e^t}, \qquad \lambda_a(\omega) = \mathrm{sgn}(\omega)\,R\left|\frac{\omega}{R}\right|^{a}.$$
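The example can be checked numerically: the map $\lambda_a$ should satisfy $\frac{d\lambda_a}{da}(\omega) = \frac1a v(\lambda_a\omega)$ and the group law $\lambda_a\circ\lambda_b = \lambda_{ab}$. In the sketch below the value R=20000 is a hypothetical frequency bound chosen purely for illustration.

```python
import math

R = 20000.0   # hypothetical upper frequency bound, for illustration only

def v(x):
    """Vector field v(x) = x log(|x|/R), extended by v(0) = 0."""
    return 0.0 if x == 0 else x * math.log(abs(x) / R)

def lam(a, omega):
    """lambda_a(omega) = sgn(omega) * R * |omega/R|^a for a > 0."""
    return math.copysign(R * abs(omega / R) ** a, omega)

omega, a, h = 5000.0, 1.7, 1e-6

# Central finite difference for d(lambda_a)/da compared with v(lambda_a omega)/a.
finite_diff = (lam(a + h, omega) - lam(a - h, omega)) / (2 * h)
flow_error = abs(finite_diff - v(lam(a, omega)) / a)

# Group law: lambda_2 applied after lambda_1.5 equals lambda_3.
group_error = abs(lam(2.0, lam(1.5, omega)) - lam(3.0, omega))
print(flow_error, group_error)
```

Points with |ω|<R move towards 0 as a grows and points with |ω|>R move away from R, so the interval (−R,R) is mapped onto itself, in line with the role of R as an absolute frequency bound.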

Assume that the vector field $v$ has a finite number of zeros $\pm x_i$, labeled in ascending order

$$x_0=0<x_1<x_2<\cdots<x_k=\infty$$

(the point $x_k=\infty$ is included though it is not necessarily a zero of $v$). If

$$\omega\in(x_{i-1},x_i)=:I_i,$$

then $\lambda_a\omega$ stays in $I_i$ for all $a\in\mathbb{R}_+$. If in addition $v>0$ in $I_i$, then

$$\lim_{a\to\infty}\lambda_a\omega=x_i$$

and

$$\lim_{a\to 0}\lambda_a\omega=x_{i-1}.$$

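The variable transform $\zeta=\lambda_a\omega$,

$$\frac{d\zeta}{da}=\frac{d}{da}(\lambda_a\omega)=\frac{1}{a}\,v(\lambda_a\omega)=\frac{1}{a}\,v(\zeta)$$
(100)

gives

$$\int_0^\infty g(\lambda_a\omega)\,\frac{da}{a}=\int_{x_{i-1}}^{x_i} g(\zeta)\,\frac{d\zeta}{v(\zeta)}$$

whenever the integrals exist. If however $v<0$ in $I_i$ then the sign changes. Hence in both cases

$$\int_0^\infty g(\lambda_a\omega)\,\frac{da}{a}=\int_{I_i} g(\zeta)\,\frac{d\zeta}{|v(\zeta)|}.$$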

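If $\tau_t(\omega)$ is the solution of the differential equation $\frac{dx}{dt}=v(x)$ with initial condition $\tau_0(\omega)=\omega$ then the variational equation is

$$\frac{d}{dt}\frac{d\tau_t}{d\omega}=v'(\tau_t\omega)\,\frac{d\tau_t}{d\omega}.$$

Integrating with respect to $t$ gives

$$\ln\left|\frac{d\tau_t}{d\omega}\right|-\ln\left|\frac{d\tau_0}{d\omega}\right|=\int_0^t\frac{d}{dt}\ln\left|\frac{d\tau_t}{d\omega}\right|dt=\int_0^t v'(\tau_t\omega)\,dt=\int_\omega^{\tau_t\omega}\frac{v'(x)}{v(x)}\,dx=\ln\left|\frac{v(\tau_t\omega)}{v(\omega)}\right|,$$

$$\left|\frac{d\tau_t}{d\omega}\right|=\frac{|v(\tau_t\omega)|}{|v(\omega)|}=\frac{v(\tau_t\omega)}{v(\omega)}.$$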

This can be written as

$$\frac{|d\omega|}{|v(\omega)|}=\frac{|d\lambda_a|}{|v(\lambda_a\omega)|}.$$
(101)

The formula expresses that $\frac{|d\omega|}{|v(\omega)|}$ is the invariant measure for the group action.

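The transformation group $\{\lambda_a\}$ induces a unitary representation $\lambda$ on $L^2(\mathbb{R},\mathbb{C})$:

$$\lambda_a h(\omega)=\left|\frac{d\lambda_a^{-1}}{d\omega}(\omega)\right|^{1/2}h\big(\lambda_a^{-1}(\omega)\big),\qquad \|\lambda_a h\|^2=\int_{\mathbb{R}}\left|\frac{d\lambda_a^{-1}}{d\omega}(\omega)\right|\,\big|h\big(\lambda_a^{-1}(\omega)\big)\big|^2\,d\omega=\|h\|^2.$$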
(102)

(The same notation λ a is used for both the group and its unitary representation. To be consistent with the previous notation the group should actually be denoted by λ ˆ a , since it will be taken as a group that acts in phase space.)

Note that

$$\left|\frac{d\lambda_a}{d\omega}(\omega)\right|=\frac{|v(\lambda_a\omega)|}{|v(\omega)|}$$

is a cocycle for the transformation group $\{\lambda_a\}$.
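As an illustration, the cocycle formula can be tested on the example flow $\lambda_a(\omega)=R(\omega/R)^a$ with $v(\omega)=\omega\log(\omega/R)$ on $0<\omega<R$. The sketch below compares the quotient of vector field values with a finite-difference derivative and checks the cocycle identity $J(ab,\omega)=J(a,\lambda_b\omega)\,J(b,\omega)$. The numerical values are assumptions made for the illustration.

```python
import math

R = 20000.0  # hypothetical frequency bound, for illustration only


def v(w):
    """Example vector field v(w) = w log(w/R) on 0 < w < R."""
    return w * math.log(w / R)


def lam(a, w):
    """Example flow lam_a(w) = R (w/R)^a on 0 < w < R."""
    return R * (w / R) ** a


def jacobian(a, w):
    """|d lam_a / dw| expressed through the vector field."""
    return abs(v(lam(a, w))) / abs(v(w))


def jacobian_fd(a, w, eps=1e-3):
    """Central finite-difference approximation of d lam_a / dw."""
    return (lam(a, w + eps) - lam(a, w - eps)) / (2 * eps)


a, b, w = 1.7, 0.6, 1000.0  # hypothetical test values

# The two ways of computing the Jacobian should agree.
print(jacobian(a, w), jacobian_fd(a, w))

# Cocycle identity: J(ab, w) = J(a, lam_b w) * J(b, w).
print(jacobian(a * b, w), jacobian(a, lam(b, w)) * jacobian(b, w))
```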

Given the transformation group { λ a } in phase space and a function ψ∈ L 2 (R,C), define the one parameter family of wavelets by

ψ(a,t)= 1 2 π ∫ − ∞ ∞ λ a ψ ˆ (ω) e i t ω dω.
(103)

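The wavelet transform with wavelet $\psi$ and transformation group $\{\lambda_a\}$ is

$$W_\psi f(a,b)=\int_{-\infty}^{\infty}f(t)\,\bar\psi(a,t-b)\,dt$$
(104)
$$=\int_{-\infty}^{\infty}\hat f(\omega)\,\overline{\lambda_a\hat\psi(\omega)}\,e^{ib\omega}\,d\omega.$$
(105)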

The isometry properties of the wavelet transform have their origin in the isometry property of the Fourier transform. Applied to W ψ they give

$$\frac{1}{2\pi}\int_{-\infty}^{\infty}|W_\psi f(a,b)|^2\,db=\int_{-\infty}^{\infty}|\hat f(\omega)|^2\,|\lambda_a\hat\psi(\omega)|^2\,d\omega.$$

This equality is integrated with respect to the measure $\frac{da}{a}$ and the order of integration is interchanged:

$$\frac{1}{2\pi}\int_0^\infty\!\!\int_{-\infty}^{\infty}|W_\psi f(a,b)|^2\,\frac{da\,db}{a}=\int_{-\infty}^{\infty}|\hat f(\omega)|^2\left(\int_0^\infty\frac{da}{a}\,|\lambda_a\hat\psi(\omega)|^2\right)d\omega$$

$$=\int_{-\infty}^{\infty}|\hat f(\omega)|^2\left(\int_0^\infty\frac{da}{a}\left|\frac{v(\lambda_a^{-1}\omega)}{v(\omega)}\right|\,|\hat\psi(\lambda_a^{-1}\omega)|^2\right)d\omega.$$

With the change of variables $\omega=\lambda_a(\sigma)$ the integration with respect to $\frac{da}{a}$ can be transformed into an integration over the interval $I_i=(x_{i-1},x_i)$ that contains $\omega$:

$$\int_0^\infty\frac{da}{a}\,|v(\lambda_a^{-1}\omega)|\,|\hat\psi(\lambda_a^{-1}\omega)|^2=\int_{I_i}|\hat\psi(\sigma)|^2\,d\sigma=:C_i.$$

Assume now that $\psi$ is a real valued wavelet. Then

$$\frac{1}{2\pi}\int_0^\infty\!\!\int_{-\infty}^{\infty}|W_\psi f(a,b)|^2\,\frac{da\,db}{a}=\sum_i C_i\int_{-I_i\cup I_i}|\hat f(\omega)|^2\,\frac{d\omega}{|v(\omega)|}.$$
(106)

From the formula it can be concluded that the full information on f is contained in the wavelet transform, provided the integrals are finite and the constants C i different from zero for all indices. A formal reconstruction can be obtained in terms of the wavelets

χ(a,t):= 1 C i 2 π ∫ − ∞ ∞ | d λ a − 1 ω d ω | − 1 2 ψ ˆ ( λ a − 1 ω ) e i t ω dω.

(Notice the negative sign in the exponent!)

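The corresponding wavelet transform is

$$V_\chi g(a,b)=\int_{-\infty}^{\infty}g(t)\,\bar\chi(a,t-b)\,dt.$$

A similar calculation as before then gives

$$\frac{1}{2\pi}\int_0^\infty\!\!\int_{-\infty}^{\infty}W_\psi f(a,b)\,\overline{V_\chi g(a,b)}\,\frac{da\,db}{a}=\int_{-\infty}^{\infty}\hat f(\omega)\,\bar{\hat g}(\omega)\,d\omega=\int_{-\infty}^{\infty}f(t)\,\overline{g(t)}\,dt.$$

The function is thus recovered weakly in the sense of $L^2$-duality by

$$\frac{1}{2\pi}\int_0^\infty\!\!\int_{-\infty}^{\infty}W_\psi f(a,b)\,\chi(a,t-b)\,\frac{da\,db}{a}=f(t).$$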
(107)

For the description of cochlear mechanics the reconstruction of the signal from its output at the cochlear level (that is, from its wavelet transform) is not an issue. No reconstruction takes place in the auditory pathway. However, from the point of view of information processing it is relevant to know whether the wavelet transform contains the full information of the original signal.

In the application the wavelet transform will be described by a wavelet with frequency support in [−R,R]. The above reconstruction process would then give the projection of the signal onto the subspace of band limited signals.

2.7.2 Extension of the group action

In the case of the dilation group, when $\lambda_a$ is $\hat\delta_a$ and the generating vector field is $v(x)=-x$, the operator $\hat B=-i\omega$ together with the infinitesimal generator $\hat A=\frac12+\omega\frac{d}{d\omega}$ for the action of the dilation group satisfy the commutator relation

$$[\hat A,\hat B]=\hat B$$

of the Lie algebra of the affine group $\Gamma$. For general vector fields $v$, the infinitesimal generator for the group action is

$$Lh(\omega)=\frac{d}{da}\Big|_{a=1}\lambda_a h(\omega)$$
(108)
$$=-\frac12\,v'(\omega)\,h(\omega)-v(\omega)\,\frac{d}{d\omega}h(\omega).$$
(109)

Together with B ˆ it will in general fail to span a finite dimensional Lie algebra of operators. However there exists a skew hermitian multiplier operator M such that

[L,M]=M.
(110)

The multiplier of $M$ can be taken in the form

$$M=-i\,\operatorname{sgn}(\omega)\,s(\omega).$$
(111)

The symmetric real valued function $s(\omega)$ is then determined by the differential equation that results from applying the commutator relation:

$$-v\,\frac{d}{d\omega}s=s.$$
(112)

In any interval void of zeros of the vector field, the function $s$ is determined up to a multiplicative factor by

$$s(\omega)=e^{-\int\frac{d\omega}{v(\omega)}}.$$

If $v$ is a smooth vector field with zeros at $\pm x_i$ then $s(\omega)$ will map any interval $I_i=(x_{i-1},x_i)$ onto the positive real half axis $\mathbb{R}_+$. Furthermore, it will conjugate the action of the transformation group $\lambda_a$ (restricted to $I_i$) with the action of the dilation group $\hat\delta_a$:

$$\lambda_a=s^{-1}\circ\hat\delta_a\circ s.$$
(113)

This can be verified by calculating the infinitesimal generator of the conjugate group:

$$\frac{d}{da}\Big|_{a=1}s^{-1}\circ\hat\delta_a\circ s(\omega)=-(s^{-1})'\big(s(\omega)\big)\,s(\omega)=-\frac{s(\omega)}{s'(\omega)}=v(\omega).$$

The conjugation map $s$ induces an isometry $s^*$ between $L^2$ and the subspace $L^2_I$ of $L^2$-functions restricted to $-I_i\cup I_i$:

$$s^*h(\omega):=\left|\frac{ds}{d\omega}\right|^{1/2}h\big(s(\omega)\big)=\left|\frac{s(\omega)}{v(\omega)}\right|^{1/2}h\big(s(\omega)\big).$$
(114)

In the application, the interval on which λ a acts will be I=(0,R) and the vector field will take negative values. The function sgn(ω)s(ω) then maps (−R,R) onto R and conjugates λ a to δ ˆ a .

The operators L and M satisfy the commutator relation of the Lie algebra of the affine group Γ. The action of the group λ a is thus extended on the infinitesimal level to a Lie algebra action of the affine group.

It can be lifted to $\Gamma=\{(a,t):a>0,\ t\in\mathbb{R}\}$ by setting

$$\mu_t h(\omega)=e^{-it\,\operatorname{sgn}(\omega)\,s(\omega)}\,h(\omega).$$

The circle group action $\hat\varepsilon_\varphi$ with infinitesimal generator $\hat H$ remains unchanged and commutes with the $\Gamma$-action:

$$[L,\hat H]=[M,\hat H]=0.$$
(115)

The operators corresponding to $\hat B_c$ are

$$M_c=-i\,\operatorname{sgn}(\omega)\,s^{c}(\omega),\qquad c>0.$$

The only non-trivial commutator relation is

$$\left[\tfrac{1}{c}L,\,M_c\right]=M_c.$$
(116)

For later use note that $\lambda_a$ commutes with $L$, whereas

$$\lambda_a M=a\,M\lambda_a,\qquad \lambda_a M_c=a^{c}\,M_c\lambda_a.$$

2.7.3 The uncertainty inequality

For a given vector field generating the transformation group $\lambda_a$, an interval $I_i=(x_{i-1},x_i)$ is fixed. The signal space $L^2(\mathbb{R},\mathbb{R})$ is then restricted to the subspace $L^2_I$ of band limited signals with frequencies in $-I_i\cup I_i$. In this space, the general uncertainty inequality for $\Gamma\times S$ can be stated as

$$\nu\,\|h\|^2\le\frac{2}{c}\,\big\|(L-\alpha\hat H-\beta M)h\big\|\,\big\|(\hat HM_c+\nu)h\big\|,$$
(117)

with

$$\nu=\nu_c=-\frac{(h,\hat HM_ch)}{\|h\|^2}=\frac{1}{\|h\|^2}\int_{-\infty}^{\infty}|s(\omega)|^{c}\,|h(\omega)|^2\,d\omega$$
(118)

and

$$\alpha\,\|h\|^2-\beta\,(h,\hat HMh)+(h,\hat HLh)=0,\qquad -\alpha\,(h,\hat HMh)+\beta\,\|Mh\|^2-\operatorname{Re}(Lh,Mh)=0.$$

The quantities $\nu(h)$, $\alpha(h)$ and $\beta(h)$ are defined for all functions in $L^2_I$ with $Lh$ and $M_ch$ in $L^2$. Under $\lambda_a$ they transform as

$$\nu(\lambda_ah)=-\frac{(\lambda_ah,\hat HM_c\lambda_ah)}{\|h\|^2}=-a^{-c}\,\frac{(h,\hat HM_ch)}{\|h\|^2}=a^{-c}\,\nu(h),$$
(119)
$$\alpha(\lambda_ah)=\alpha(h),$$
(120)
$$\beta(\lambda_ah)=a\,\beta(h).$$
(121)

For every parameter $c>0$ define the frequency localization of $h$ by

$$\sigma_c(h)=s^{-1}\big(|\nu_c(h)|^{1/c}\big).$$
(122)

Since $s$ conjugates $\lambda_a$ to $\hat\delta_a$, it follows that

$$\sigma_c(\lambda_ah)=\lambda_a\big(\sigma_c(h)\big).$$
(123)

Similarly, the phase expectation can be defined by

$$\varrho(h)=s^{-1}\big(\beta(h)\big).$$

It satisfies

$$\varrho(\lambda_ah)=\lambda_a^{-1}\big(\varrho(h)\big).$$

The equation for the extremal functions is

$$(L-\alpha\hat H-\beta M)h=\kappa(\hat HM_c+\nu)h.$$
(124)

It has explicit solutions in terms of $s=s(\omega)$:

$$h_c(\omega)=|v(\omega)|^{-1/2}\,k\,e^{i\varepsilon\operatorname{sgn}(\omega)-i\alpha\operatorname{sgn}(\omega)\log|s(\omega)|-i\beta s(\omega)}\,e^{-\frac{\kappa}{c}|s(\omega)|^{c}}\,|s(\omega)|^{\kappa\nu}.$$
(125)

These solutions can either be calculated directly from the differential equation for the extremal solutions or they can be determined from $h_c$ by applying the conjugation mapping $s^*$.

For later purposes note the explicit formula for $\lambda_ah_c(\omega)$. Since $s$ conjugates $\hat\delta_a$ to $\lambda_a$, one has $s(\lambda_a^{-1}(\omega))=a\,s(\omega)$ and therefore

$$\lambda_ah_c(\omega)=\left|\frac{d\lambda_a^{-1}}{d\omega}(\omega)\right|^{1/2}h_c\big(\lambda_a^{-1}(\omega)\big)$$
(126)
$$=|v(\omega)|^{-1/2}\,k\,e^{i\varepsilon\operatorname{sgn}(\omega)-i\alpha\operatorname{sgn}(\omega)\log|as(\omega)|-i\beta as(\omega)}\,e^{-\frac{\kappa}{c}|as(\omega)|^{c}}\,|as(\omega)|^{\kappa\nu}.$$
(127)

Apart from the factor $|v(\omega)|^{-1/2}$ this is a function of the single variable $as(\omega)$.

2.7.4 Structure equations in the general setting

As in the previous section, assume that $v$ is a vector field with zeros at $\pm x_i$ that generates a one-parameter group $\lambda_a$ of transformations on $-I\cup I=(-x_i,-x_{i-1})\cup(x_{i-1},x_i)$. Signals $f\in L^2_I$ with frequency content in $-I\cup I$ can then be analyzed with the group wavelet transform

$$W_\psi f(a,t)=\int_{-\infty}^{\infty}\hat f(\omega)\,\overline{\lambda_a\hat\psi(\omega)}\,e^{it\omega}\,d\omega.$$

The Fourier transform $\hat\psi$ of the wavelet with frequency support in $-I\cup I$ is taken to be a normalized (with $\nu=1$) extremal function for the uncertainty inequality for the group $\Gamma\times S$. The Lie algebra action is determined by the operators $L$, $M_c$ and $\hat H$. If $h$ is an extremal function with coefficients $\alpha$, $\beta$ and $\nu$, then $\bar h$ is also an extremal function. Its coefficients are $-\alpha$, $-\beta$ and $\nu$. It is thus possible to take $\bar{\hat\psi}=h_c$:

$$Wf(a,t)=\int_{-\infty}^{\infty}\hat f(\omega)\,\lambda_ah_c(\omega)\,e^{it\omega}\,d\omega.$$
(128)

The structure equations have their origin in the differential equations

$$(L-\alpha\hat H-\beta M)h=\kappa(\hat HM_c+\nu)h$$
(129)

for the extremal functions. In combination with the differentiated form of the wavelet transform

$$a\frac{d}{da}Wf(a,b)=\int_{-\infty}^{\infty}\hat f(\omega)\,L\lambda_ah_c(\omega)\,e^{ib\omega}\,d\omega$$

this gives the basic formula

$$a\frac{d}{da}Wf(a,t)=\int_{-\infty}^{\infty}\hat f(\omega)\,\lambda_a\big(\alpha\hat H+\beta M+\kappa\hat HM_c+\kappa\big)h_c(\omega)\,e^{it\omega}\,d\omega$$
(130)
$$=\int_{-\infty}^{\infty}\hat f(\omega)\,\big(\alpha\hat H+a\beta M+a^{c}\kappa\hat HM_c+\kappa\big)\lambda_ah_c(\omega)\,e^{it\omega}\,d\omega.$$
(131)

In this situation, a linear approximation for the multiplier operator $M_c$ is inserted. The frequency localization of $\lambda_ah_c$ is at $\sigma=\sigma_c(\lambda_ah_c)$. At this position $s^{c}$ is approximated by

$$s^{c}(\omega)\cong s^{c}(\sigma)+c\,s^{c-1}(\sigma)\,s'(\sigma)\,(\omega-\sigma)$$
(132)
$$=s^{c}(\sigma)-c\,\frac{s^{c}(\sigma)}{v(\sigma)}\,(\omega-\sigma).$$
(133)

Note that

$$s^{c}(\sigma)=s^{c}\big(\sigma_c(\lambda_ah_c)\big)=\nu_c(\lambda_ah_c)=a^{-c}\,\nu_c(h_c)=a^{-c}$$
(134)

with the chosen normalization. Therefore the multiplier operators $M$ and $M_c$, when applied to $\lambda_ah_c$, are approximated by

$$M_c\cong a^{-c}\left(1+\frac{c\,\sigma}{v(\sigma)}\right)\hat H-a^{-c}\,\frac{c}{v(\sigma)}\,\hat B,$$
(135)
$$M\cong a^{-1}\left(1+\frac{\sigma}{v(\sigma)}\right)\hat H-a^{-1}\,\frac{1}{v(\sigma)}\,\hat B.$$
(136)

Altogether, when applied to $\lambda_ah_c$, the expression in brackets in the basic formula above is replaced by

$$\left(\alpha+\beta\left(1+\frac{\sigma}{v(\sigma)}\right)\right)\hat H-\frac{\beta}{v(\sigma)}\,\hat B-\frac{c\kappa\,\sigma}{v(\sigma)}-\frac{c\kappa}{v(\sigma)}\,\hat H\hat B.$$

The derivative of the wavelet transform is then expressed in terms of $Wf$, $\frac{d}{dt}Wf$ and their Hilbert transforms

$$a\frac{d}{da}Wf\cong-\frac{\gamma\,\sigma}{v(\sigma)}\,Wf+\left(\alpha+\beta\left(1+\frac{\sigma}{v(\sigma)}\right)\right)\hat HWf+\frac{\beta}{v(\sigma)}\,\frac{d}{dt}Wf+\frac{\gamma}{v(\sigma)}\,\frac{d}{dt}W\hat Hf$$
(137)

(again, the notation cκ=γ has been used).

In order to identify the Fourier transform of the wavelet with the cochlear filter, it must be suitably normalized. In the case of the dilation group, the normalization $\frac{1}{\sqrt a}\,\hat\delta_ah(\omega)$ has been used. The function $\hat\delta_ah$ is localized at $\xi(x)=\frac1a$. The appropriate normalization for $\lambda_ah$ is thus $\sqrt{\sigma(\lambda_ah)}\,\lambda_ah$. Note that $a\frac{d}{da}\sigma(\lambda_ah)=a\frac{d}{da}\lambda_a(\sigma(h))=v(\sigma(\lambda_ah))$. As in the section on the structure equation, the analytic wavelet transform of $f$ is then defined by

$$Zf(a,t)=\sqrt{\sigma}\,Wf(a,t)+i\sqrt{\sigma}\,HWf(a,t),$$
(138)

with $\sigma=\sigma(\lambda_ah)$. The polar coordinates of $Zf$ are

$$r(a,t)=|Zf(a,t)|=\big(\sigma\,|Wf(a,t)|^2+\sigma\,|HWf(a,t)|^2\big)^{1/2},\qquad \varphi(a,t)=\arg Zf(a,t)=\arctan\frac{HWf(a,t)}{Wf(a,t)}.$$

The calculation of the general structure equations now proceeds as before. Taking into account the shift $\frac{v(\sigma)}{2\sigma}$ that is caused by the normalizing factor $\sqrt{\sigma}$, the coefficients in the above equation are abbreviated by

$$A=-\frac{\gamma\,\sigma}{v(\sigma)}+\frac{v(\sigma)}{2\sigma},$$
(139)
$$B=\alpha+\beta\left(1+\frac{\sigma}{v(\sigma)}\right),$$
(140)
$$C=\frac{\beta}{v(\sigma)},$$
(141)
$$D=\frac{\gamma}{v(\sigma)}.$$
(142)

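The general structure equations are

$$a\frac{\partial}{\partial a}\log r\cong A+C\,\frac{\partial}{\partial t}\log r+D\,\frac{\partial\varphi}{\partial t},$$
(143)
$$a\frac{\partial\varphi}{\partial a}\cong-B-D\,\frac{\partial}{\partial t}\log r+C\,\frac{\partial\varphi}{\partial t}$$
(144)

and in their complex form

$$a\frac{\partial}{\partial a}\log Zf\cong A-iB+(C-iD)\,\frac{\partial}{\partial t}\log Zf.$$
(145)

In the special case that $v(x)=-x$ the coefficients reduce to $A=\gamma-\frac12$, $B=\alpha$, $C=-a\beta$ and $D=-a\gamma$.

The previous structure equations are thus recovered.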
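The reduction to the dilation case can be verified with a few lines of Python. The sketch below evaluates the coefficients (139) to (142) for given values of $\sigma$ and $v(\sigma)$ and compares them with the special case $v(x)=-x$, $\sigma=\frac1a$. The function name and the numerical values of $\alpha$, $\beta$, $\gamma$ and $a$ are hypothetical and serve only as an illustration.

```python
import math


def structure_coefficients(sigma, v_sigma, alpha, beta, gamma):
    """Coefficients A, B, C, D of the general structure equations,
    evaluated at the frequency localization sigma with v_sigma = v(sigma)."""
    ratio = sigma / v_sigma
    A = -gamma * ratio + v_sigma / (2.0 * sigma)
    B = alpha + beta * (1.0 + ratio)
    C = beta / v_sigma
    D = gamma / v_sigma
    return A, B, C, D


# Hypothetical parameter values, chosen only for the check.
alpha, beta, gamma = 0.3, 0.7, 1.2
a = 2.5

# Dilation group: v(x) = -x and sigma = 1/a.
sigma = 1.0 / a
A, B, C, D = structure_coefficients(sigma, -sigma, alpha, beta, gamma)

# Expected special case: A = gamma - 1/2, B = alpha, C = -a*beta, D = -a*gamma.
print((A, B, C, D))
print((gamma - 0.5, alpha, -a * beta, -a * gamma))
```

The two printed tuples should agree up to rounding.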

The solutions of the complex equation can again be determined. The $t$-independent particular solution $P$, giving rise to the multiplicative factor $e^{P}$, is obtained by solving the equation

$$a\frac{\partial}{\partial a}P=A-iB.$$
(146)

If the variable is changed:

$$\sigma=\sigma(\lambda_ah)=\lambda_a(\sigma(h))=:\lambda_a(\sigma_1),\qquad \frac{d\sigma}{da}=\frac{1}{a}\,v(\sigma),$$

then

$$\tilde P(\sigma):=P(a)=\int(A-iB)\,\frac{da}{a}=\int(A-iB)\,\frac{d\sigma}{v(\sigma)}$$
(147)

gives the $t$-independent factor as a function of the variable $\sigma$.

Similarly, the homogeneous equation

$$a\frac{\partial}{\partial a}X(a,t)=(C-iD)\,\frac{\partial}{\partial t}X(a,t)$$
(148)

can be formulated in the variables $\sigma$ and $t$. The result is the $\bar\partial$-equation

$$v(\sigma)\,\frac{\partial}{\partial\sigma}\tilde X(\sigma,t)=(C-iD)\,\frac{\partial}{\partial t}\tilde X(\sigma,t)$$
(149)

for the complex variable

$$z=t+\int(C-iD)\,\frac{d\sigma}{v(\sigma)}.$$
(150)

Expressed in the variables $\sigma$ and $t$, the solutions of the complex structure equation are thus of the form $\tilde P+\log G$ with $G(z)=\tilde X(\sigma,t)$ a holomorphic function in the variable $z$.

2.7.5 The abstract models

An abstract model for the cochlea is a model that is based on a one-parameter group $\lambda_a$. Its cochlear filter $g(x,\omega)$ is described by the translates $\lambda_ah$ of a normalized function $h$. This function is an extremal for the uncertainty relation that goes with the action of the symmetry group $\Gamma\times S$ determined by the one-parameter group $\lambda_a$. The frequency localization $\sigma(g(x,\cdot))$ will be used in place of the CF. It is independent of the parameter $c$ that stands for the level of sound intensity and it satisfies

$$\sigma(\lambda_ah)=\lambda_a\big(\sigma(h)\big).$$
(151)

Assume that the extremal function $\lambda_ah$ with frequency location $\sigma(\lambda_ah)$ represents the cochlear filter at $x$:

$$\sigma(\lambda_ah)=\xi(x).$$
(152)

Then the point $x'$ at which $\lambda_{a'}h$ represents the cochlear filter is given by

$$x'=\xi^{-1}\big(\sigma(\lambda_{a'}h)\big)=\xi^{-1}\big(\lambda_{a'}\sigma(h)\big)=\xi^{-1}\circ\lambda_{a'}\circ\xi(x).$$
(153)

The position-frequency map $\xi$ conjugates the group $\lambda_a$ to the transformation group $\tilde\lambda_a=\xi^{-1}\circ\lambda_a\circ\xi$ along the $x$-axis. Observe that the generating vector fields $v$ and $\tilde v$ for the group $\lambda_a$ and its conjugate $\tilde\lambda_a$ are related by

$$\tilde v(x)=\left(\frac{d\xi}{dx}(x)\right)^{-1}v\big(\xi(x)\big).$$
(154)

At the outset of the present studies the group action was given by the dilation group δ ˆ a and the tonotopic axis was defined by

x=logK−logω,ω>0.
(155)

As a result, the group parameter was related to the position by

x=logK+loga
(156)

and the group conjugate to δ ˆ a under the tonotopic axis was the translation group, generated by the vector field v ˜ =1.

Basically there are now two different ways in which the tonotopic axis can be built into the abstract model:

  1. The starting point is the translation group along the x-axis (generated by the constant vector field). Under the tonotopic axis, this group is conjugated to the group $\lambda_a$ in phase space. As the prime example, take the experimentally determined tonotopic axis

    $$x=\log K-\log(\omega+S)$$
    (157)

    with ‘shift’ S (the inverse to the position-frequency map, see the section ‘Background’).

  2. The preassigned group $\lambda_a$ in phase space is the point of departure. The position-frequency map $\xi$ then conjugates this group to a group $\tilde\lambda_a=\xi^{-1}\circ\lambda_a\circ\xi$ along the x-axis. In general, $\tilde\lambda_a$ will not be the translation group.

Examples for the two variants will be given in the next subsection. Here is an illustration of how the choice of the one-parameter group $\lambda_a$ can be motivated. There is a gradation of the physiological and geometrical properties along the cochlear duct. The cochlea is arranged in a spiral and geometric quantities such as the width of the cochlear duct change gradually. The gradation of the physiological data manifests itself, for example, in the change of the elasticity properties of the basilar membrane and in the increase in length of the hair cells and their cilia. The proposition is to take this into account by replacing the translation group along the x-axis (generated by the constant vector field $\tilde v=1$) by the group $\tilde\lambda_a$ generated by an affine vector field $\tilde v=1+kx$ ($k$ constant). This group is then conjugated under the ‘rough’ position-frequency map $\omega=\xi(x)=Ke^{-x}$ to the scaling group $\lambda_a$. The generating vector field for this group is then

$$v(\omega)=\frac{d\xi}{dx}\big(\xi^{-1}\omega\big)\,\tilde v\big(\xi^{-1}(\omega)\big)=-\omega\left(1+k\log\frac{K}{\omega}\right)=k\,\omega\log\frac{\omega}{R},\qquad \omega>0,$$
(158)

with

$$R=Ke^{1/k}.$$
(159)
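The algebra behind (158) and (159) can be checked numerically. The following sketch builds the vector field by conjugating the affine field $\tilde v=1+kx$ with the rough map $\xi(x)=Ke^{-x}$ and compares it with the closed form $k\,\omega\log(\omega/R)$. The values of K and k are assumptions made for the illustration only.

```python
import math

K = 20000.0  # hypothetical constant of the rough map xi(x) = K exp(-x)
k = 0.1      # hypothetical slope of the affine vector field 1 + k x

R = K * math.exp(1.0 / k)  # bound R = K e^{1/k} from (159)


def xi_inv(w):
    """Inverse of the rough position-frequency map: x = log(K / w)."""
    return math.log(K / w)


def dxi_dx(x):
    """Derivative of xi(x) = K exp(-x)."""
    return -K * math.exp(-x)


def v_tilde(x):
    """Affine vector field along the x-axis."""
    return 1.0 + k * x


def v_conjugated(w):
    """Vector field in frequency space obtained by conjugation with xi."""
    x = xi_inv(w)
    return dxi_dx(x) * v_tilde(x)


def v_closed_form(w):
    """Closed form k w log(w / R)."""
    return k * w * math.log(w / R)


for w in (100.0, 1000.0, 10000.0):
    print(w, v_conjugated(w), v_closed_form(w))
```

For each frequency the two values should coincide up to rounding.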

2.7.6 Two examples

In this section the procedure of setting up a model according to invariance principles is summarized by explicitly calculating two specific models from the data derived in the previous sections. At the core is the one parameter transformation group λ a acting in phase space. This action is extended to a group action of the affine group Γ, compatible with the natural action of the circle group S in phase space. The general uncertainty principle for Γ×S leads to families E c of extremal functions that are invariant under the action of λ a .

In the cochlea, incoming signals are transformed into neuronal impulses. This process is non linear - in particular with respect to changes in the level of sound pressure of the incoming signal. The parameter c stands for an unspecified average level of sound pressure. The action of the cochlea can then be described in its linear approximation at the sound level captured by the parameter c. Linear actions are completely determined by the cochlear transfer function g c (x,ω).

The models thus obtained will be called abstract models, since they are based on general principles. There are very few parameters in such a model, and these will have to be estimated on the basis of experimental results. Some flexibility lies in the choice of the one parameter group action that stands at the outset of all considerations. Yet the experimental results restrict the choice to group actions that are close to the action of the dilation group. Abstract models vary smoothly in dependence on the one parameter group λ a . It can be said that the qualitative behavior of abstract models - in particular with respect to the structure equations - is affected very little by the choice of the group λ a .

In the first example, the underlying one-parameter group is the transformation group in phase space generated by the vector field

$$v(\omega)=\omega\log\frac{|\omega|}{R}$$
(160)

with flow

$$\lambda_a(\omega)=\operatorname{sgn}(\omega)\,R\left|\frac{\omega}{R}\right|^{a}$$
(161)

that has already been discussed as an example in the section on general invariance groups. This vector field is conjugate via the tonotopic axis to an affine vector field along the x-axis. The heuristic motivation for this choice is the gradation of all physical quantities along the cochlea. The affine vector field should take care of this aspect. The infinitesimal generator for the $L^2$-action is

$$Lh(\omega)=-\frac12\,v'(\omega)\,h(\omega)-v(\omega)\,\frac{d}{d\omega}h(\omega)=-\frac12\left(1+\log\frac{|\omega|}{R}\right)h(\omega)-\omega\log\frac{|\omega|}{R}\,\frac{d}{d\omega}h(\omega)$$

and the multiplier operator $M$ that satisfies $[L,M]=M$ is given by

$$M=-i\,\operatorname{sgn}(\omega)\,s(\omega).$$
(162)

The symmetric real valued function $s(\omega)$ is determined up to a multiplicative constant by

$$s(\omega)=\frac{-1}{\log\frac{|\omega|}{R}}.$$
(163)

The function $\operatorname{sgn}(\omega)s(\omega)$ maps the interval $(-R,R)$ onto the real axis. (The zeros of the vector field are at 0 and at $\pm R$.) The value $R$ appears as an absolute upper bound for the frequency range that is relevant in the hearing process. The extremal functions $h_c\in E_c$ for the uncertainty relation have been calculated above (formula (125)). For the normalized function ($\nu=1$)

$$h_c(\omega)=|v(\omega)|^{-1/2}\,k\,e^{i\varepsilon\operatorname{sgn}(\omega)-i\alpha\operatorname{sgn}(\omega)\log|s(\omega)|-i\beta s(\omega)}\,e^{-\frac{\kappa}{c}|s(\omega)|^{c}}\,|s(\omega)|^{\kappa}$$
(164)

the frequency localization is given by

$$\sigma_c(h_c)=s^{-1}\big(|\nu_c(h_c)|^{1/c}\big)=s^{-1}(1)=Re^{-1},$$
(165)

since the inverse function to $\tau=s(\omega)$ is

$$\omega=s^{-1}(\tau)=Re^{-1/\tau}.$$
(166)

The frequency localization of $\lambda_ah_c$ is then

$$\sigma(a):=\sigma_c(\lambda_ah_c)=\lambda_a\big(\sigma_c(h_c)\big)=\lambda_a\big(Re^{-1}\big)=Re^{-a}.$$
(167)

Under the ‘rough’ position-frequency map $\xi(x)=Ke^{-x}$ the parameter $a$ is related to the position $x$ by the equation

$$Re^{-a}=\sigma_c(\lambda_ah_c)=\xi(x)=Ke^{-x}$$
(168)

and hence

$$a=x+\log\frac{R}{K}.$$
(169)
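The relations of the first example can be traced in a short Python sketch. It implements $s$ and $s^{-1}$ on $0<\omega<R$, checks that $s$ conjugates the flow $\lambda_a$ to the dilation $\tau\mapsto\tau/a$ (the form of the dilation that corresponds to the generating field $v(x)=-x$), and confirms that the localization $\sigma(a)=Re^{-a}$ matches $\xi(x)=Ke^{-x}$ when $a=x+\log\frac{R}{K}$. The constants R and K and all test values are assumptions chosen for illustration.

```python
import math

R = 20000.0  # hypothetical absolute frequency bound
K = 15000.0  # hypothetical constant of the rough map xi(x) = K exp(-x)


def s(w):
    """Conjugation map s(w) = -1 / log(w/R) on 0 < w < R."""
    return -1.0 / math.log(w / R)


def s_inv(tau):
    """Inverse map: w = R exp(-1/tau) for tau > 0."""
    return R * math.exp(-1.0 / tau)


def lam(a, w):
    """Flow of the example on 0 < w < R: R (w/R)^a."""
    return R * (w / R) ** a


def lam_conjugated(a, w):
    """s^{-1} o delta_a o s, with the dilation acting as tau -> tau / a."""
    return s_inv(s(w) / a)


def sigma(a):
    """Frequency localization of lam_a h_c, starting from s^{-1}(1) = R/e."""
    return lam(a, s_inv(1.0))


def a_of_x(x):
    """Group parameter that corresponds to the position x."""
    return x + math.log(R / K)


w, a, x = 1000.0, 1.7, 2.0  # hypothetical test values
print(lam(a, w), lam_conjugated(a, w))        # conjugation relation
print(sigma(a), R * math.exp(-a))             # sigma(a) = R e^{-a}
print(sigma(a_of_x(x)), K * math.exp(-x))     # matches xi(x) = K e^{-x}
```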

A parametric view of the modulus of the basilar membrane filter is shown in Figures 7 and 8. The first figure gives it as a function of the distance to the stapes and the second as a function of frequency (on a relative scale). The shape of the amplitude curves changes with CF from wide shallow tunings to sharp tunings at high CF. This is consistent with neuronal tuning curves (Kiang et al. 1965 [24]) and with basilar membrane data (Robles and Ruggero 2001 [18]). Nonlinear analysis techniques that build on Wiener kernels (Temchin et al. 2005 [25], Recio-Spinoso et al. 2005 [26]) or on ‘zwuis-analysis’ (van der Heijden and Joris 2003 [27], 2006 [28]) make it possible to recover the basilar membrane motion from measurements in the auditory nerve. The resulting panoramic graphs (for example, [28], Figure 5 or [25], Figures 1 and 2) should be compared with Figures 7 and 8. The difficulty of measuring the basilar membrane movement in the apical region of the cochlea with experimental techniques is discussed by Temchin and Ruggero 2010 [29].

Fig. 7. The cochlear filter for an abstract model. The amplitude (on a relative scale) of the cochlear filter calculated for the invariance group $\lambda$ generated by the vector field $v(\omega)=\omega\log\frac{\omega}{R}$. This is a panoramic view, showing the amplitude on a relative scale as a function of the distance d to the stapes for the frequencies f = 200, 400, 800, 1,600, 3,200 and 6,400 Hz.

Fig. 8. The cochlear filter as a function of frequency. The amplitude (on a relative scale) of the same cochlear filter as in the previous figure, but as a function of frequency on a logarithmic scale. The places are taken at distances d = 5, 10, 15, 20, 25, 30 and 35 mm from the stapes.

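The wavelet transform in the cochlea is given by

$$Wf(a,b)=\int_{-\infty}^{\infty}\hat f(\omega)\,\lambda_ah_c(\omega)\,e^{ib\omega}\,d\omega.$$
(170)

The function $\lambda_ah_c$ is given by formula (127). The expressions for $v$ and $s$ have to be inserted.

The structure equations are approximate differential equations, which are satisfied by

$$r(a,t)\,e^{i\varphi(a,t)}=Zf(a,t)=\sqrt{\sigma(a)}\,Wf(a,t)+iH\sqrt{\sigma(a)}\,Wf(a,t).$$
(171)

The equations are

$$a\frac{\partial}{\partial a}\log r\cong A+C\,\frac{\partial}{\partial t}\log r+D\,\frac{\partial\varphi}{\partial t},\qquad a\frac{\partial\varphi}{\partial a}\cong-B-D\,\frac{\partial}{\partial t}\log r+C\,\frac{\partial\varphi}{\partial t}.$$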

The functions $A$, $B$, $C$ and $D$ of the parameter $a$ are:

$$A=\frac{\gamma}{a}-\frac{a}{2},\qquad B=\alpha+\beta\left(1-\frac1a\right),\qquad C=-\frac{\beta}{aR}\,e^{a},\qquad D=-\frac{\gamma}{aR}\,e^{a}.$$

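The function $\log Zf(a,t)=\log r(a,t)+i\varphi(a,t)$ is of the form $P(a)+\log G(z)$. Integrating $(A-iB)\frac{da}{a}$ with the coefficients above gives

$$\log Zf(a,t)\cong P(a)+\log G(z)=-\frac{\gamma}{a}-\frac{a}{2}-\frac{i\beta}{a}-i(\alpha+\beta)\log a+\log G(z),$$
(172)

with $G(z)$ a holomorphic function of the variable

$$z=t+\int(C-iD)\,\frac{da}{a}=t+\frac{-\beta+i\gamma}{R}\int e^{a}\,\frac{da}{a^{2}}.$$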
(173)

The second example is based on the ‘physiological’ tonotopic axis

$$x=\log K-\log(\omega+S)$$
(174)

(see section ‘Background’) with inverse

$$\xi(x)=Ke^{-x}-S.$$
(175)

The one-parameter group $\lambda_a$ is taken to be conjugate to $\hat\delta_a$ under the shift mapping

$$s(\omega)=\omega+S$$
(176)

with shift $S$. Unfortunately, this mapping does not map $\mathbb{R}_+$ to itself. Hence it has to be modified near $\omega=0$ such that the modified mapping $\tilde s$ is monotone, maps $\mathbb{R}_+$ onto itself and agrees with $s$ for values $\omega\ge\varepsilon$. The positive constant $\varepsilon$ can be chosen to be arbitrarily small. This difficulty will be suppressed and the notation $s$ instead of $\tilde s$ will be used for the modified mapping. The vector field for the conjugate one-parameter group $\lambda_a=s^{-1}\circ\hat\delta_a\circ s$ is

$$v(\omega)=-\omega-S.$$
(177)

The frequency localization of the extremal function $\lambda_ah$ is

$$\sigma(\lambda_ah)=\lambda_a\big(\sigma(h)\big)=s^{-1}\left(\frac1a\right)=\frac1a-S.$$
(178)

The parameter $a$ is related to the position by

$$\frac1a=\sigma(\lambda_ah)+S=\xi(x)+S.$$
(179)

The normalized function

$$\frac{1}{\sqrt a}\,\lambda_ah_c=\sqrt{\frac{\xi(x)+S}{\omega+S}}\;k\,e^{i\varepsilon\operatorname{sgn}(\omega)-i\alpha\operatorname{sgn}(\omega)\log|as(\omega)|-i\beta as(\omega)}\times e^{-\frac{\kappa}{c}|as(\omega)|^{c}}\,|as(\omega)|^{\kappa\nu}$$
(180)

is thus a function of the ‘scaling variable’

$$as(\omega)=\frac{\omega+S}{\xi(x)+S}.$$
(181)

This is in accordance with the generalized scaling variable defined by Shera (2007, [12], formula 5, p. 2740).

Note that the normalization slightly differs from the normalization $\sqrt{\xi(x)}\,\lambda_ah_c$ that was used above for the general setting.

In Figure 9 the effect of the physiological tonotopic axis is illustrated with a panoramic view of the modulus of the basilar membrane transfer function.

Fig. 9. Adjusting for the position-frequency map. The amplitude (on a relative scale) of the cochlear filter shown for the physiological position-frequency map $f(d)=Ke^{-d/l}-S$ with K = 20,000, S = 200, l = 6.6 and parameter c = 4. The amplitude is drawn as a function of the distance to the stapes for the frequencies f = 250, 500, 1,000, 2,000, 4,000, 8,000 and 16,000 Hz.
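The position-frequency map quoted in the caption of Figure 9 is easy to evaluate. The sketch below uses the constants K = 20,000, S = 200 and l = 6.6 from the caption and computes, for each plotted frequency, the place along the cochlea, together with the scaling variable $\frac{\omega+S}{\xi(x)+S}$ of (181). The function names are hypothetical, and the units (frequency in Hz, distance in the unit of l) are an assumption inferred from the caption.

```python
import math

K = 20000.0    # constant of the map, from the caption of Figure 9
S = 200.0      # shift, from the caption of Figure 9
L_SCALE = 6.6  # length scale l, from the caption of Figure 9


def freq_at(d):
    """Position-frequency map f(d) = K exp(-d / l) - S."""
    return K * math.exp(-d / L_SCALE) - S


def place_of(f):
    """Inverse map: distance d at which the map takes the value f."""
    return L_SCALE * math.log(K / (f + S))


def scaling_variable(omega, d):
    """Scaling variable (omega + S) / (xi(x) + S) at the place d."""
    return (omega + S) / (freq_at(d) + S)


for f in (250.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0, 16000.0):
    d = place_of(f)
    # At the place that belongs to f the scaling variable is close to 1.
    print(f, d, scaling_variable(f, d))
```

Higher frequencies are mapped to smaller distances from the stapes, in line with the basal location of high-frequency filters.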

3 Conclusions

Signal processing in the cochlea is investigated from an abstract point of view. The time variable t and the place variable x along the cochlea provide the basic operators d d t and d d x . These are translated into frequency space. For the place variable the tonotopic axis is used and for the time variable it is the Fourier transform that accomplishes the transfer. The corresponding operators A ˆ and B ˆ in frequency space do not commute. They satisfy the commutator relation [A,B]=B that displays the relation for the Lie algebra of the affine group Γ. The image under the Fourier transform of the space of acoustic signals is identified as the subspace L sym of the complex Hilbert space L 2 (R,C). The only one-parameter group of transformations S on L sym that commutes with the action of the affine group is generated by the (Fourier transform H ˆ of the) Hilbert transform H. The invariance group for the hearing process in the cochlea is thus the product Γ×S of the affine group Γ with the circle group S.

Signal processing in the cochlea is highly compressive and thus non-linear. The approach pursued here is to fix a parameter c for the general level of sound intensity and to specify a linear approximation at this level.

From a mathematical point of view this is done by choosing a different representation of Γ×S for every parameter c. The non-commutativity of the associated infinitesimal operators gives rise to a family of uncertainty inequalities depending on the parameter c. The extremal functions - the coherent states - for these uncertainty relations are expected to play a special role. It is known from the work of Daubechies (1992) and Yang, Wang and Shamma (1992) that the cochlea performs a wavelet transform. The wavelet is defined by the cochlear filter. In the present approach, the linear approximation and with it the cochlear filter depend on the parameter c. Comparison with experimental results now suggests that the wavelet that determines the cochlear filter is an extremal function for the uncertainty relation with parameter c.

The abstract model as it is derived here from general mathematical concepts has very few parameters and thus gives a very concise picture of signal processing in the cochlea. On the other hand, with just a few parameters at one's disposal, it will be impossible to capture all the fine structure that has been established experimentally.

With the model at hand, signal processing in the cochlea can be understood in a global way. At the core of the analysis is a system of differential equations - called the structure equations - that hold for the processed signals, that is, for the wavelet transforms of the acoustic signals. The equations are formulated for physiological observables. In this context these are the derivatives of amplitude and phase. The equations provide us with qualitative and quantitative information on the structure of signal processing in the cochlea. Specifically they give insight into the delicate balance of phase versus amplitude. A global picture emerges since it is possible to present the solutions to the structure equations in terms of holomorphic functions.

As examples, pure sounds, amplitude modulated sounds, clicks and sounds of the violin are subject to special scrutiny. The click response exhibits the wavelet in the time variable (Figure 5). A deficiency of the mathematical model becomes apparent, since the impulse response should have its support in the positive time axis. This would imply that the Fourier transform of the wavelet has a holomorphic continuation to the half space - and this is not the case. In order to remedy this deficiency, the extremals for the uncertainty inequality should be determined within the class of these functions. Yet this class is not compact and such extremals do not exist. However, looking at the impulse response, it is clear that the values for t<0 are quite close to zero. The impulse responses at various intensities are thus close to functions with support in the positive time axis.

The analysis of violin sounds tells us that the pitch frequency is present in the movement of the cochlea on a range that covers several octaves. The symmetries observed experimentally are the most important constituents of the abstract model.

On a closer look, these symmetries are only local, that is they hold approximately on bounded time and frequency intervals. The question then arises whether the theory can be adapted to a wider concept of symmetry. In the last section such a theory is developed and it is argued that the principal features of the model can be preserved in this enlarged framework. In particular, the structure equations still remain approximately satisfied. It thus appears that there is some inherent stability in this model.

References

  1. Daubechies I: Ten Lectures on Wavelets. SIAM, Philadelphia; 1992.

    Book  Google Scholar 

  2. Yang X, Wang K, Shamma S: Auditory representation of acoustic signals. IEEE Trans. Inf. Theory 1992, 38: 824–839.

    Article  Google Scholar 

  3. Zweig G: Basilar membrane motion. Cold Spring Harbor Symp. Quant. Biol. 1976, 40: 619–633.

    Article  Google Scholar 

  4. Siebert WM: Stimulus transformations in the peripheral auditory system. In Recognizing Patterns. Edited by: Kolers P.A., Eden M.. MIT, Cambridge; 1968:104–133.

    Google Scholar 

  5. von Békésy G: The variation of phase along the basilar membrane with sinusoidal vibrations. J. Acoust. Soc. Am. 1947, 19: 452–460.

    Article  Google Scholar 

  6. Rhode WS: Observations of the vibration of the basilar membrane in squirrel monkeys using the Mössbauer technique. J. Acoust. Soc. Am. 1971, 49: 1218–1231.

    Article  Google Scholar 

  7. Kiang NYS, Moxon EC: Tails of tuning curves of auditory-nerve fibers. J. Acoust. Soc. Am. 1974, 55: 620–630.

    Article  Google Scholar 

  8. Liberman MC: Auditory-nerve response from cats raised in a low-noise chamber. J. Acoust. Soc. Am. 1978, 63: 442–455.

    Article  Google Scholar 

  9. Liberman MC: The cochlear frequency map for the cat: labeling auditory nerve-fibers of known characteristic frequency. J. Acoust. Soc. Am. 1982, 72: 1441–1449.

    Article  Google Scholar 

  10. Eldredge DH, Miller JD, Bohne BA: A frequency-position map for the chinchilla cochlea. J. Acoust. Soc. Am. 1981, 69: 1091–1095.

    Article  Google Scholar 

  11. Greenwood DD: A cochlear frequency-position function for several species - 29 years later. J. Acoust. Soc. Am. 1990, 87: 2592–2605.

  12. Shera CA: Laser amplification with a twist: traveling-wave propagation and gain functions from throughout the cochlea. J. Acoust. Soc. Am. 2007, 122: 2738–2758.

  13. Gabor D: Theory of communication. J. IEE 1946, 93: 429–457.

  14. Cohen L: The scale representation. IEEE Trans. Signal Process. 1993, 41: 3275–3292.

  15. Irino T: An optimal auditory filter. In Proc. IEEE Signal Processing Society Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY; 1995.

  16. Irino T, Patterson RD: A time-domain, level-dependent auditory filter: the gammachirp. J. Acoust. Soc. Am. 1997, 101: 412–419.

  17. Reimann HM: Uncertainty principles for the affine group. Funct. Approx. Comment. Math. 2009, 40: 45–67.

  18. Robles L, Ruggero MA: Mechanics of the mammalian cochlea. Physiol. Rev. 2001, 81: 1305–1352.

  19. Rhode WS, Recio A: Study of mechanical motions in the basal region of the chinchilla cochlea. J. Acoust. Soc. Am. 2000, 107: 3317–3332.

  20. Recio A, Rhode WS: Basilar membrane response to broadband stimuli. J. Acoust. Soc. Am. 2000, 108: 2281–2298.

  21. Shera CA: Intensity-invariance of fine time structure in basilar-membrane click responses: implications for cochlear mechanics. J. Acoust. Soc. Am. 2001, 110: 332–348.

  22. Prisma-Realtime. http://www.prisma-music.ch

  23. Shera CA: Frequency glides in click responses of the basilar membrane and auditory nerve: their scaling behavior and origin in travelling-wave dispersion. J. Acoust. Soc. Am. 2001, 109: 2023–2034.

  24. Kiang NYS, Watanabe T, Thomas EC, Clark LF: Discharge Patterns of Single Fibers in the Cat’s Auditory Nerve. MIT, Cambridge; 1965.

  25. Temchin A, Recio-Spinoso A, van Dijk P, Ruggero M: Wiener kernels of chinchilla auditory-nerve fibers: verification using responses to tones, clicks, and noise and comparison with basilar-membrane vibrations. J. Neurophysiol. 2005, 93: 3635–3648.

  26. Recio-Spinoso A, Temchin A, van Dijk P, Fan YH, Ruggero M: Wiener-kernel analysis of responses to noise of chinchilla auditory-nerve fibers. J. Neurophysiol. 2005, 93: 3615–3634.

  27. van der Heijden M, Joris PX: Cochlear phase and amplitude retrieved from auditory nerve at arbitrary frequencies. J. Neurosci. 2003, 23(27): 9194–9198.

  28. van der Heijden M, Joris PX: Panoramic measurements of the apex of the cochlea. J. Neurosci. 2006, 26(44): 11462–11473.

  29. Temchin A, Ruggero M: Phase-locked responses to tones of chinchilla auditory nerve fibers: implications for apical cochlear mechanics. JARO 2010, 11: 297–318.

Author information

Correspondence to Hans Martin Reimann.

Additional information

Competing interests

The author declares that he has no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Reimann, H.M. Signal processing in the cochlea: the structure equations. J. Math. Neurosc. 1, 5 (2011). https://doi.org/10.1186/2190-8567-1-5

  • DOI: https://doi.org/10.1186/2190-8567-1-5

Keywords