Monday, September 7, 2009

Sensors for capturing speech signal

As my research focused on use of Physiological Microphone - PMIC for stress and speaker verification, I thought it would be a good idea to identify different sensors that have been used for speech signal analysis/recording.

Any sensor which has an ability to measure displacement would be thought as a good option to record speech as then it can record jaw movement, cheeck movement, tongue movement so on and so forth, recording movement of an articulator, or any vibration that is due to speech production (bone vibrations) at different locations other than the standard mic-near-the-mouth standard approach.

So, the study does focus on understanding details of displacement sensors - and the principle behind measuring displacements. - The sensors to measure displacement are strain gauges, piezoelectric crystals, and accelerometers.

Microphones - the acoustic ones usually record the pressure variations - caused due to sound travel in air - acoustic vibrations. Simplest form is the diaphragm based which measure the difference in pressure on either side - pressure gradient microphones. There can also be a pressure microphone with diaphragm open to atmospheric pressure and any pressure variation on either side can be recorded as a displacement, hence can be treated as omni-directional.

Examples of a few regular microphones.

1. Crystal Microphone - piezocrystal on one side of diaphragm, frequency response 80Hz to 6000Hz, needs amplifier as the signal is weak. p+ve points - dynamic range, sensitivity, and lack of distortion is good. good stability if used outside of humid conditions.
2. Dynamic Microphone - moving coil wound on a permanent magnet. Sensitivity is less as compared to crystal one because of the coil and magnet weight and also may not perform faithfully in presence of other electromagnetic sources.

3. Condenser Microphone - diaphragm is one of the capacitor plate and back plate (another parallel) combined together act as a capacitor. Requires a power supply (capacitor polarization voltage). Flat response for speech / music range, expensive and high impedance hence requires a pre-amplifier for impedance matching and signal transfer.

4. Electret Microphone - condensor microphone with permanent charge on the plates. Similar flat response over a frequency range, can be made smaller in size, but require battery power.

5. Probe Tube Microphone - useful to measure sound pressure levels within a closed or nearly closed cavity. Consists of usual transducer, but the diaphragm is coupled to a length of tubing of very small diameter. The tubing is inserted into the cavity to be measured. The length of the tubing defines the resonance characteristics hence the probe tube microphone is calibrated to appropriately yield a flat frequency response.

Monday, April 6, 2009

Fractal Dimension - PMIC

Analyzing PMIC acquired signal using nonlinear dynamical framework

Tuesday, March 17, 2009

Fractal Dimensions - a review and study on sound captured through skin vibrations

Chapter 5. Invariants of the motion [book - analysis of observed chaotic data - Abarbanel H.D.I.]


Chapter 2/pg 21 - Fractal Dimension: A Quantitative Measure of Self_Similarity and Scaling
[book - Fractal Physiology - James Bassingthwaighte, Larry Liebovitch, Bruce West]

Self-Similarity Dimension-
describes how many new pieces geometrically similar to the whole object are observed as the resolution is made finer. If we cahnge the scale by a factor F, and we find that there are N pieices similar to the original, then the self-similarity dimension D(self-similarity) is given by
N = F ^ D(self-similarity)
This can be a fractional dimension - fractals (fragments) which usually takes in non-integer values and which therefore lies between Euclidean Dimensions.

Hausdorff-Besicovitch Dimension and Capacity Dimension-
Necessary to find fractional dimensions of irregular shaped objects. these two dimensions are technically similar.
This is explained in terms of number of balls required to cover all the points within a space. In 1-D (ball is a line), 2-D (circle), 3-D (sphere), and so on. Consider the radius of the ball to be r/ let N(r) be minimum number of balls of size r needed to cover the object. The capacity dimension Dcap tells us how the number of balls needed to cover the object changes as the size of the balls is decreased.
Dcap = lim (r -> 0) log N(r) / log (1/r)

If the capacity dimension is determined by using balls that are contiguous non overlapping boxes of a rectangular coordinate grid, then the procedure is called as "BOX COUNTING".

Thursday, March 12, 2009

nonlinear time series analysis (Kantz, Schreiber)

Comments on "Nonlinear Time Series Analysis," Holger Kantz and Thomas Schreiber, 2nd edn, Cambridge

Chapter 3 Phase Space Methods
Consider a dynamical system whose trajectory can be studied in finite dimensional vector space R^m, thus the state is specified by some vector x belonging to R^m. The said dynamical system can be described as xn+1 = F(xn), for discrete variable, and d/dt x(t) = f(x(t)) for a continuous case. Based on the initial condition for x0, the sequence of points for xn will portray a trajectory of the dynamica lsystem, which either will run to infinite or stay within some bounds.

the set of initial conditions leading to the same asymptotic behaviour of the trajectory is called the basin of attraction for this particular motion.
Since the dynamical equations (or the equations of motion) are defined in phase space, it is also most natural to yuse a phase space description for approximations.

3.2 delay reconstruction pg 35
sn = s(x(n5t)) - sequence of scale measurements of some scalar quantity taken at mutiples of fixed sampling time
a delay reconstruction in m dimensions is then formed by the vectors sn,
sn = (sn-(m-1)tao, sn-(m-2)tao, ...sn)
The attractor formed by above reconstruction is equivalent to the attractor in the unknown space in which the original system is living if the dimension m if the delay coordinate space is sufficiently large. precisely, this is guaranteed if m is larger than twice the box counting dimension Df of the attractor.

pg 36
for many practical purposes, the most important embedding parameter is the product mtao of the delay time, and the embedding dimension, rather than the embedding dimension m or the delay time tao alone.

Wednesday, March 11, 2009

Analysis of observed chaotic data (Abarbanel, H. D.I.)

Comments on "Analysis of observed chaotic data," Abarbanel H.D.I. Institute of Nonlinear Science Springer Study Edition 1995


pg 4 1.1.2 Correlations among data points
The dynamics takes palce in a space of vectors y(t) of larger dimension, and we view it projected down on the axis of observed variables. we can identify a space formally equivalent to the original space of variables using coordinates made out of the observed variables and its time delayed copies. - Phase space reconstruction.


The time delay in the embedding is divided in terms on nonlinear correlation function called average mutual information has its first minimum.


pg 5 1.1.3 number of coordinates

the number of coordinates to use is determined by asking when the projection of the geometrical structure has been completely unfolded. implies that points lying close to one another in the space of the y(t) vectors do so because of the dynamics and not because of the projection. this can be decided on the basis of False Nearest Neighbors Algorithm. [false neighbors are connected with unpredictability as phase space points will be near to each other for non dynamical reasons, namely just because of projection. it may happen that the percentage of false nearest neighbors drops to ZERO is the necessary global dimension to be used to examine the observed data]. pg 7

reconstructed series replace the scalar data measurements with data vectors in Euclidean distance d-dimensional space in which the invariant aspects of the sequence of points s(n) are captured with no loss of information about the properties of the original system. the new space is related to the original space of the s(n) by smooth, differentiable transformations.

what is the time lag Ts to use and what dimension d to use are the central issues of this reconstruction ----
the theorem notes that if the motion lies on a set of dimension Da, which could be fractional, then choosing the integer dimension d of the unfolding space so d > 2Da is sufficient to undo all overlaps and make the orbit unambiguous.

pg25
Time delays
it must be some multiple of the sampling time tau_s, since we only have data at those times, an interpolation scheme to get more data is just as uncertain as estimating the time derivatives of s(t).
if the time delay is too short, the coordinates s(n) and s(n+T) which we wish to use in our reconstructed data vector y(n) will not be independent enough.
finally, since chaotic systems are intrinsically unstable, if T is too large, any connection between the measurements is numerically tantamount to being random with respect to reach other.

Average mutual information -
the actual prescription suggesed is to take the T where the first minimum of the average mutual information I(T) occurs as that valye to use in time delay reconstruction of phase space. Average mutual information is invariant under smooth changes of coordinate system. Thus, the quantity I(T) evaluated in time delay coordinates and in the original, but unknown, coordinates takes on the same values. Also, I(T) will be robust against noise than many other quantities.

choosing the dimension of reconstructed phase space
De is a global dimension and may well be different from the local dimension of the underlying dynamics.
using global false nearest neighbours --
since the attractors for real physical systems is quite compact in phase space, each phase space point will have numerous neighbours as the number of data becomes large enough to populate state space well.

Teh basic geometric idea in the embedding theorem is that we have achieved an acceptable unfolding of the attractor from its values as seen projected on the observation axis when the orbits conposing the attractor are no longer crossing one another in the reconstructed phase space.

filtering out spatial variations on small scales, perhaps even associated with high frequencies as well, ca nlead to a practical system of lower dynamical dimensions.



Friday, March 6, 2009

The Dripping Faucet as a Model Chaotic System

Comments on "The dripping faucet as a model chaotic system," Shaw, Robert, Publisher - Science Frontier Express Series (Aerial Press, Inc.) 1984.

This is a technical story of how a dripping faucet can make you lose your sleep :-)

The major question to be addressed is: How do we construct a model from a stream of experimental data which we have not seen before? How do we use the model to make predictions? what are the limits of our predictive ability?

Chaotic dynamics - short-term predictability but long-term unpredictability. The system state at one instant of time is causally disconnected with its state far enough into the future.

The author wants to find answers to:
1. the amount of information a system is capable of storing, or transmitting from one instant of time to the next,
2. the rate of loss of this information.

If we consider any system to be a black box which is producing a stream of numbers - the above questions along with the "predictability" can be posed.

One method is to construct "return map" or Poincaire' cross-section by using a embedding dimension on time-delayed phase space reconstruction.

Most important attribute of the "characteristic of a system" is that quantities - "information stored in a system" - must be properties of the system, and not type of measurement. This implies an invariance under coordinate transformations - a property which appropriately defined measures of information possess.

The dripping faucet has different patterns of behavior - periodic or chaotic - depending on initial conditions, indicating multiple "basics of attraction", changes in behavior are sudden and hysteretic.

A behavior which examplifies a mixture of periodic and chaotic aspects can be called as noisy periodicity (May) or semiperdicity (Lorenz).

then there is a possibility of "mixed dimensionality" - in some regions of the state space - the state of the system can be optimally specified by the value of a single coordinate - but in othe rregions, more coordinates are required.

Interesting conclusions (page 18-19) made by the author are:
1. a model of only a few dimensions can sometimes adequately describe the chaotic behavior of a continuum system, the high dimensionality of the system is not required.
2. there exist fundamental geometrical structures which reappear in many different nonlinear systems.

Pg 30 -
Poincare realized that even in celestial mechanics determinism did not imply unlimited predictability. For purely deterministic dynamical systems, predictability may be extremely limited.
The degree of unpredictability --> entropy of the given system.

pg 32 ->
Kolmogorov, Sinai and others were able to show that, if one considers the set of all possible pratitions, and selects the one which gives the largest numerical value for the entropy, the number one obtains is a topological invariant of the system, independent of the coordinate system used to describe the original continuous variables v.

34:
the entropy describes the rate at which information flows from the microscopic variables up to the macropscopic. But in presence of noise, partition definition of entropy does not help,as the partition becomes finer than length scale by the noise.
In mechanics the transmission of information requires the possibility of change.

The channel capacity is defined as maximum rate one can find by varying the statistics of the input message over all possible input ensembles and using the optimum.
For dynamical systems, a particular input ensemble is selected by the properties of the system itself. Thus, information (Stored in the system) is quantifies as the average increase in out ability to predict the future of a system when we learn its past. Thus it is the difference in the expected randomness of a system with and without knowledge of its past. [pg 40]

[48-49]
our information about a system cannot increase in the absence of observation.
sharper distributions representing greater stored information will spread faster than broad distributions.
The maximum entropy or loss of predictability will thus occur when the stored information is at the maximum. under this definition both the stored information and entropy of a purely stochastic map will be zero.

pg 55 the entropy in the limit of "pure determinism"
In deterministic case (for zero noise), the information storage capacity of the map diverges. But, it becomes finite with the addition of noise, and information stored by higher iterates of the map has the concave behavior.

one task of an experimenter studying a new system is to characterize the determinism of the system, which can be described as the information stored through time in the dynamical variables. if the system has a positive entropy, some or all of this causal connection will be lost in the passage of time.

pg69
in a real chaotic system with noise present, initial data has predictive value for only a limited time, events which are too far apart are causally disconnected. correlations extend over only a finite length of the string of symbols, old experience ceases to have predictive value.

[pg 93]
from a classical perspective, it is clear that an experimenter's claim that he is "in the noise", and that there is no point in attempting to increase resolution, is an assumption. if the observation rate is less than the entropy of the system at some resolution level, the system will appear stochastic, even if it were "completely deterministic".

pg 103
The dimension of an object should describe how its volume scales with increasing linear size.

pg 109:
the usefulness of a "dimension" number as a scaling exponent depends on the number of orders of magnitude between the smallest and largest length scales. Robert Shaw comments that "the more low-dimensional structure there is in the reconstructed attractor, the less well-defined is a single "dimension" number. Thus, there is little reason at this point to prefer one dimension algorithm over another as "more fundamental".

In short:
Comments on the book -
Book has been vividly written with less of mathematical probing (or the probe is kept to minimal and necessary, whenever required).

Technically - attacks fundamental problem of what information and flow of information is related to "determinism" factor of a chaotic system.

Tuesday, February 17, 2009

Nonlinear Model of Speech Production

[Ref: Banbrook PhD thesis, Breen's article]

- Not paraphrased. - is as is from Banbrook's thesis..


The mechanism by which we produce speech is a complex ionteraction of several processes. the basic mechaniusm for producing any sound is to expel air from the lungs, using muscular action. the expelled air is forced through the larynx (Which is capable of producing two main categories of speech: voiced nad unvoiced.)


Voiced speech, or phonation, is produced by oscillating the

Phoneme - Structure





































Tuesday, February 3, 2009

Continuous Speech Recognition


The left side of this blog represents a nice example of why continuous speech recognition is so difficult. [Ref. Rodman R.D., "Computer Speech Technology," Artech House, Norwood, 1999].