DSP related applied math ramblings.

This is a collection of misc notes about applied math, algorithms and
circuits, somewhat related to programs I'm writing and projects I'm
working on, but deserving a place of their own.

For more http://zwizwa.be/ramblings

Entry: Killing Chipmunks with Parametric Dynamic Wavetable Synthesis
Date: Mon Aug 4 11:55:00 CEST 2003
Type: tex


\begin{abstract} In this paper we discuss some sound synthesis methods
in the dynamic wavetable framework.  We will take a closer look at
approaches that combat problems regarding aliasing and pitch dependent
formant locations, two of the major drawbacks of dynamic wavetable
synthesis (DWS).  DWS is a hybrid form of wavetable synthesis and
additive/granular synthesis. We will limit ourselves to ``holistic''
DWS, which uses (semi) parametric methods to cancel pitch dependent
formant location, as an alternative to basic additive or granular
synthesis for signals with a clear single pitch.  \end{abstract}


\section{Introduction}

Wavetable synthesis is one of the first digital sound synthesis
methods developed [refs].  The method stores a period of a sound wave
in a table and plays it back at a variable pitch.  The data in the
table determines the timbre of the resulting waveform. A drawback of
this form of synthesis is the fact that formant frequencies of the
resulting waveform are proportional to the playback frequency. This is
frequently called the ``chipmunk effect'' and is usually undesirable.
In commercial synthesizers this artifact is reduced by using several
wavetables for some pitch ranges and interpolating between them. This
is referred to as multisampling.

Another drawback of wavetable synthesis is aliasing in the case of
polynomial interpolation.  This is solved in practice by using a
decimating filter. This is also called bandlimited interpolation.
This usually requires the storage of a large set of interpolation
coefficients with different cutoff frequencies, since it is hard to
impossible to do this in a parametric way.

Dynamic wavetable synthesis is a derivative of wavetable synthesis,
with the addition that the wavetable is synthesised using some
(parametric) method, often updated at a haptic rate of around
20Hz. This enables the construction of dynamic timbres. However, being
a table playback method, the chipmunk effect is also present in this
method and aliasing needs to be taken into account when the table is
read out at a higher rate than the spacing between table samples.

In this paper we will illustrate some applications of DWS, with
special attention on the chipmunk effect and aliasing. The first
method deals with explicit additive synthesis using the fast fourier
transform. In short we synthesize the spectrum of the periodic
waveform directly, and introduce pitch by resampling the waveform on
playback.  When the pitch of the playback is known (i.e. by lookahead)
the wavetable can be made explicitly bandlimited. If dilatation and
contraction of the spectrum are parametric, we can use the playback
pitch information to perform a formant correction before the waveform
is transformed to the time domain. This approach has some advantages
over synthesizing sinusoidal contributions with standard, direct FFT
approaches such as $\text{FFT}^{-1}$ [2], especially for near
periodicity.


The second topic in this paper is an illustration of the general
framework of scanned synthesis using dynamic wavetables. The idea is
to derive a waveform, or timbre, from a (nonlinear) dynamic
system. The holy grail here is to find some way to change the formant
spectrum of a wavetable, depending on the desired playback pitch. We
will pursue this using circulant neural networks, as a generalization
of a linear circular time invariant system.


\section{DWS algorithm}

\subsection{DWS}

The algorithm is very simple. The wave table is a vector function
$x[k] \in \mathbb{R}^N$, with a sampling rate $f_c$ which is a
fraction of the system sample rate $f_s$. Sound is synthesized by
interpolating the $x[k]$ at a desired pitch. To prevent
discontinuities when the table is updated, the readout can be made to
interpolate between $x[k-1]$ and $x[k]$.  The choice of $N$ depends on
the desired frequency resolution. When $N=2048$ and $f_s = 1024 f_c$
and $f_s$ on the order of $40$ kHz the entire audible spectrum can be
represented, together with the interesting haptic frequencies. A
choice of sampling subdivision and table length of $1024$ is usually
more than enough for practical purposes.
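
A minimal sketch of the readout as formulas. The symbols $\phi$,
$f_0$, $\alpha$ and $\tilde{x}$ are notation introduced here for
illustration and are not taken from an existing implementation. With
$f_0$ the desired pitch, the table phase advances by $N f_0 / f_s$
table samples per output sample, and the output crossfades between the
two most recent tables:
$$\phi[n+1] = \left(\phi[n] + N {f_0 \over f_s}\right) \bmod N, \qquad
y[n] = (1 - \alpha[n]) \, \tilde{x}[k-1](\phi[n]) + \alpha[n] \, \tilde{x}[k](\phi[n]).$$
Here $\tilde{x}[k](\phi)$ is table $k$ interpolated at the fractional
position $\phi$, and $\alpha[n]$ is assumed to ramp from $0$ to $1$
over one table period $1/f_c$.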


The computational cost in its basic form is relatively low. The
playback resampling can be done using simple polynomial
interpolation. Using more expensive methods like bandlimited
interpolation is also possible, however for some uses (FDDWS) playback
aliasing can be cancelled explicitly, so an expensive decimating
filter is not necessary. Usually the cost of synthesizing the
wavetable (using convolution methods) can be significantly reduced by
using an FFT.

\subsection{FDDWS}

Since working in the frequency domain has a more intuitive feel, we
will give the method where the table is transformed using an IFFT
before resampling playback a different name: frequency domain dynamic
wavetable synthesis (FDDWS).

One of the advantages over direct additive synthesis methods is the
elegance of the scheme. We don't have to care about things like window
length and spacing, and window transfer function, which are serious
drawbacks of standard, ad-hoc frequency domain processing methods.
Since elegance is not necessarily an argument in engineering, we claim
the real advantage is simplicity of the parametric synthesis phase,
which will ease implementation and save computational cost when the
expense of the spectrum synthesis far outweighs the cost of the
resampling involved.

When employing frequency domain DWS, anti-aliasing is a trivial
multiplication before performing an IFFT.  The only real problem that
needs to be taken into account is to make the spectrum
``stretchable''.  When killing the chipmunks is not necessary, since
the effect might be desired in some cases, it can be simply left
out. In short, by introducing separate pitch and formant controls, a
lot is possible. Combatting chipmunks with FDDWS is thus creative use
of time/frequency contraction and dilatation.
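
As a sketch, the anti-aliasing multiplication can be written as a
brick-wall mask on the table spectrum. The names $X_i$ (bin $i$ of the
table spectrum), $m_i$ (the mask) and $f_0$ (the playback pitch) are
introduced here for illustration:
$$\hat{X}_i = m_i X_i, \qquad m_i = 1 \mbox{ if } |i| f_0 < f_s / 2,
\quad m_i = 0 \mbox{ otherwise}.$$
For example, with $f_s = 40$ kHz and $f_0 = 1$ kHz only the harmonics
with $|i| \leq 19$ survive. A smoother roll-off than this hard mask is
of course possible.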

Requiring parametric spectrum dilatation or contraction might seem a
serious disadvantage.  We can also look at it as a trade-off for the
added simplicity of the frequency domain processing.  Since the
transform lengths are fixed we are freed of window juggling.  We hope
the illustrations in the next sections will clarify this simplicity.

\subsection{Time-Frequency Duality}

All in all, DWS enables us to do pitch relative processing using the
entire dsp toolbox in a simplified form because the data size remains
constant. The slight increase in cost and the straightforward
trade-offs involving aliasing and formant locations give us a very
flexible and intuitive framework as return of investment.

When working in the complex field, we have a nice dual representation
that allows us to switch between time and frequency representation
without too much hassle, and interchange the dual operations of
circular convolution and ring modulation.


\section{Parametric Formant Preserving Wavetable Synthesis}

We will illustrate this principle by explicitly synthesizing
``constant-Q'' formants.


\section{Dynamical Systems}

This section deals with the synthesis of tables using a parametric
(nonlinear) dynamical system.  This is in fact equivalent to Scanned
Synthesis, pioneered by Max Mathews e.a. [ref].  In most cases we are
interested in a periodic table without discontinuities.  We could
define the table on the complex unit circle, even in continuous form
to enable some analytic manipulation using i.e. orthogonal polynomials
or fractional linear maps, but we will leave that route for
later. There are other ways to get to periodicity.  We take the
wavetable to be $x \in \mathbb{R}^N$.

The system is defined as an iterated map

\begin{equation}
x[n+1] = f(x[n]).
\label{ifs}
\end{equation}

We will also extend this equation to include an input, in the form of
$x[n+1] = f(x[n] + u[n])$ or the more general $x[n+1] = f(x[n],
u[n])$. For simplicity the output (the wavetable) is taken to be the
state vector of the system.


\subsection{Linear Systems}

When we limit ourselves to linear maps $f = A \in \mathbb{R}^{N \times
N}$ we get a ``standard'' linear state space system % dynwav paper
[1].  A way to introduce circular symmetry in $f$ is to make $A$
circulant. Application of $A$ can then be computed in the frequency
domain since it amounts to circular convolution. Working in the
frequency domain will also enable us to interpret the working of the
network to a large extent, since the FFT (denoted further as the
matrix operator $F$) of the wavetable $Fx$ has a very intuitive
interpretation.  The operator $A' = FAF^{-1}$ is diagonal, containing the
spectrum of this filter.  Requiring circular symmetry thus brings us
to FDDWS.

For the linear case (\ref{ifs}) reduces to

\begin{equation}
x[n+1] = A x[n].
\label{lifs}
\end{equation}

With $A$ circulant, $x' = Fx$ and the diagonal decomposition $C =
FAF^{-1}$ we have a set of decoupled linear systems

\begin{equation}
x'_i[n+1] = c_i x'_i[n].
\label{dlifs}
\end{equation}

With $c_i$ an eigenvalue of $A$ corresponding to one of the
eigenvectors of $A$ (an FFT base function).  Because we defined $A$ to
be real valued, each $c_i$ has a complex conjugate $\bar c_i$. We now
have the choice to take $A \in \mathbb{C}^N$ or to replace the
eigenvalue decomposition $C = FAF^{-1}$ by its real blockdiagonal
decomposition.

The effect of the system is to produce a wavetable composed of a set
of harmonic sinusoids, whose phase and amplitude are modulated damped
complex exponentials. Scanning $x$ will approximately produce a set of
near harmonically spaced damped sinusoids. The damping is affected by
the magnitude of the $c_i$ and the frequency shift away from
harmonicity is determined by the phase of the $c_i$.
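
Written out, as a short derivation from (\ref{dlifs}) with $r_i$ and
$\theta_i$ introduced here as the magnitude and phase of $c_i$:
$$c_i = r_i e^{j\theta_i} \quad \Rightarrow \quad
x'_i[n] = c_i^n x'_i[0] = r_i^n e^{jn\theta_i} x'_i[0].$$
Bin $i$ is scaled by $r_i$ and rotated by $\theta_i$ on every table
update. Assuming updates at rate $f_c$ and scanning at pitch $f_0$,
and ignoring the interpolation between successive tables, harmonic $i$
is then heard near $i f_0 + \theta_i f_c / 2\pi$ instead of at $i f_0$.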

What we are interested in is to make the magnitude of the frequency
spectrum of $x$ (the magnitude of $x'$) parametrizable, so we can
cancel out the formant shift introduced by table readout. This would
turn the DWS method into a direct, fully separated pitch and timbre
synthesis method.

This can be done by changing the time scale of the $x_i$ and $c_i$
whenever the playback speed varies. Doing so will move the aliasing
problem to this scaling step. If the $c_i$ are fully parametric, this
does not pose a real problem. Stretching the state vector $x_i'$ will
pose no problem either, except for the loss of
information. Compressing $x_i'$ does pose an aliasing problem. A way
to solve this is to stretch the time domain state vector $x_i$, since
frequency domain compression is equivalent to time domain stretching.

All this means the entire algorithm will be very much based on
resampling, both for the time and the frequency domain representation
of the state vector as for the final playback resampling. This of
course turns us back to square one in some sense as to how far it is
worth pursuing this approach instead of keeping the pitch and timbre
method separated in a source/filter approach, i.e. using bandlimited
impulse synthesis together with an IIR filter bank or overlap add
frequency domain filtering. On the other hand, when pursuing this
route, we do have a separate pitch and formant control, together with
a nice framework processing waveforms and spectra in a combined
time/frequency approach.


\subsection{Nonlinear Systems}

In this section we will use a recurrent neural network with
connectivity matrix $A \in \mathbb{R}^{N \times N}$ and squashing
function $\sigma(t) \in \mathbb{R}$. We define $f$ as a composition of
$x' = Ax$ and $f_i(x) = \sigma(x'_i)$.

As in the previous linear approach, we take $A$ to be circulant. This
network is thus equivalent to a filter followed by a waveshaper, in
music-dsp speak.


\subsection{Examples}

\subsubsection{Analog waveform modeling}

It is of course debatable whether it is worth using an IFFT for
waveform synthesis. However, since the transform is running at a
haptic rate, the cost is proportional to $\log N$. Compared to filter
based methods for synthesizing analog waveforms, this is not so
expensive\footnote{ The filter order required to get a decent
bandwidth vs. aliasing trade-off for virtual analog is significant,
and at least comparable to the complexity of the IFFT. On the other
hand, implementing a virtual analog synth is a lot easier to do with
just an increase in sample rate. Increasing the sample rate has a lot
more advantages. It simplifies building blocks, allows for a smaller
wordlength, allows for low order filters to be used to tame nonlinear
elements, allows for sigma-delta modulation to perform bit depth
reduction when increased precision is necessary, i.e. for high Q
recursive filters, etc...}.

Since we're not too much concerned with killing chipmunks, the only
thing we need is an anti-aliased IFFT. Most analog style waveforms
have a simple fourier transform. Again this can lead to new schemes
for synthesizing new ``analog sounding'' waveforms and is in fact no
different from general parametric spectral synthesis.


\subsubsection{Piano modeling}

Modeling the piano is not a simple task.  However, some of the
features of the piano, like the inharmonicity and triple strings per
note fit well into our scheme.

We will use the simplest form of a circulant linear state space system
(FDDWS), with nothing but a spectral envelope and a phase shifter
(multiplication with unit norm complex numbers) in the feedback
path. Reading a table three times with frequencies corresponding to
$1, 2+\Delta_2, 3+\Delta_3$ to simulate the effect of triple strings
already produces a very piano-like sound.

We can introduce inharmonicity by modulating the FFT bin corresponding
to basis function $e^{\frac{j2\pi n}{N}}$ by $e^{j(an + bn^2)}$. This
will produce a frequency dependent shift, which will shift ($a$) or
bend ($b$) the harmonic series upward ($a > 0$ or $b>0$) or downward
($a<0$ or $b<0$) from harmonicity. This nonlinearity can even be made
signal dependent, i.e. dependent on the power of the system.

Modeling the attack is almost as important as the strings.  We use a
second FDDWS system with a gaussian envelope in the feedback path,
reduced by an overall gain. The wavetable is initialized with a random
spectral content when a new note is struck.


\subsubsection{Waveform feedback}

Since we know the pitch of the scanning playback, we can scan a
waveform, do some processing (i.e. a reverb) and scan the output back
into the table. This allows for some more coupling between the real
world and our DWS model.

Some nice feedback effects can be obtained when both scanning
frequencies are not inverses, or a multiple read-out scan is combined
with a single read-in scan. Generating a new waveform by stretching or
compressing the previous one and feeding it back into the loop can
generate a lot of interesting sounds, even natural sounding drum
sounds. By adding a static limiter and a lowpass filter in the
feedback loop, stability is ensured and some spectral control is
possible.  The lowpass filter also smooths the edges.


The update equation is $x[k+1] = L \circ D \circ R \{x[k] \}$ where
$R$ is a resampling stretch or compress operator, $D$ is a waveform
limiter (i.e. a hard clipper) and $L$ is a lowpass filter. By
windowing $x[k]$ with a cosine window before scanning it for final
output, most of the waveform discontinuities are removed and the
frequencies are shifted upward which gives extra punch.

Driving the same system with a constant period input, instead of an
impulsive excitation, giving the update equation $x[k+1] = L \circ D
\circ R \{x[k] + u \}$, with $u$ the constant input, can generate
interesting fractal like waveforms, reminiscent of synced sawtooths.

When the limiter is replaced by a power normalizer and the feedback
loop contains both a lowpass and a highpass filter, $x[k+1] = N \circ
L \circ D \circ R \{x[k]\}$, with $N$ the power normalization
operator, a stable oscillator is constructed. This oscillator can
produce a wide variety of chaotic sounds.  Due to the normalization,
the system is nonlinear but energy conserving, and due to the filters,
the frequency content can be localized.

Some more variants are possible. Changing the update function to
$x[k+1] = N \circ L \circ D \circ (1 + aR)\{x[k]\}$, gives an extra
control $a$ that is a tuning parameter for the amount of folding
applied.


\subsubsection{Random remarks}

In waveguide string modeling, there are usually two polarizations that
are combined, both with different properties. Using the positive and
negative part of the spectrum to express something similar in DWS,
enables us to use complex FFTs. This is a bit of a moot point, but it
enables easier processing in PD, since the data will be in a more
natural form, plus we sort of get two signals for the price of one.
In other words, if the positive and negative spectrum will receive a
different phase shift (so both left and right partials differ slightly
in frequency after scanning) we get amplitude modulation for free.


\section{Conclusion}

Despite the chipmunk effect which sometimes can be a big drawback,
dynamic wavetable synthesis is an interesting approach because it
enables us to use fast periodic transforms to build systems that can
be easily understood as to how they behave from a time or frequency
perspective. When we can counter the chipmunk effect by adjusting
system parameters, it is a very useful technique because it eliminates
some of the ad-hoc methods used to compensate for the changing
periodicity in transform based techniques. In short, if the formant
distribution of a wavetable can be shifted in a straightforward
parametric way, we can cancel the chipmunk effect completely. Also,
when working in the frequency domain, anti-aliasing is ``free'', in a
sense that knowing the readout frequency enables us to explicitly
cancel out the frequencies that are bound to fold.


Entry: Entropy
Date: Sun Oct 16 15:25:56 CEST 2005
Type: tex

\def\gauss{e^{t^2 \over 2\sigma^2}}

Entropy is the (logarithm of the) number of states of a system for
which some \emph{macroscopic} measure is left invariant. The logarithm
is there just so we can add entropies, instead of multiplying the
number of possible states. 
% weighted sum?

For example take 100 coins. If we describe the \emph{macrostate} by
the number of heads, states with high entropy are those around 50, for
which there are a huge number of possible \emph{microstates} leading
to the same measure. States around 0 and 100 are low entropy, since
there are only few possible such arrangements.
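
To put numbers on the coin example (a worked check using natural
logarithms, with $W(k)$ introduced here as the number of microstates
with $k$ heads):
$$W(k) = \binom{100}{k}, \qquad \log W(0) = 0, \qquad
\log W(1) = \log 100 \approx 4.6, \qquad \log W(50) \approx 66.8.$$
For comparison, all $2^{100}$ microstates together give $100 \log 2
\approx 69.3$, so the single macrostate with 50 heads already accounts
for most of the logarithm.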

Note that the definition of entropy depends on which measure(s) we
are using to describe a macrostate.

Quick, some airplane notes.  Maximum entropy distribution for the
interval $[0,1]$. Suppose the solution is $p(x) = 1$. The entropy is
then given as $H_p = - \int_0^1 1 \log 1 \, dx = 0$. We take another
distribution $q(x) = 1 + \epsilon q'(x)$ with $\int_0^1 q'(x) \, dx = 0$.
Using the linear approximation $\log(1 + x) \approx x$, the
approximation of the entropy is $H_q \approx - \int_0^1 (1 + \epsilon
q'(x)) \epsilon q'(x) \, dx = - \int_0^1 \epsilon^2 q'(x)^2 \, dx$,
which is never greater than $0$, so $p(x)$ has maximal entropy.
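
As a check on the sign, here is a sketch of the same perturbation
argument with one more term of the logarithm. This step is an addition
to the airplane notes, and $u = \epsilon q'(x)$ is shorthand
introduced here. Since
$$(1+u)\log(1+u) = u + {u^2 \over 2} + O(u^3)$$
and $\int_0^1 u \, dx = 0$, we get
$$H_q = - \int_0^1 (1+u)\log(1+u) \, dx \approx
- {\epsilon^2 \over 2} \int_0^1 q'(x)^2 \, dx \leq 0.$$
The constant differs from the first order estimate, but the conclusion
that the uniform distribution maximizes the entropy is the same.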


Entry: The number e
Date: Mon Oct 17 23:40:33 CEST 2005
Type: tex

% WWOOOOOW!  (hmm.. apparently this was once an orgastic insight ;)

%% About $e$. I wonder if it is at all possible to define $e$ without the
%% concept of limit or differentiation. I don't think so.

The number $e$ is about the prototypical way of combining a huge
number of very small updates. Starting from the power series of $e^x$
as definition, we can prove
$$e^x = \lim_{n \to \infty} \left(1 + {x \over n} \right)^n,$$ 
using the binomial expansion
$$e^x = \lim_{n \to \infty} \sum_{k=0}^n {x^k \over k!} {n! \over (n-k)!n^k}.$$

This gives a term by term correspondence since
$$\lim_{n \to \infty} {n! \over (n-k)!n^k} = 1,$$ because $\lim_{n \to
\infty} {n-m \over n} = 1$ for any $m$. It is interesting to note
that in words this says that the number of ways to pick $k$ natural
numbers is the same as the number of ways to pick $k$ \emph{different}
natural numbers, in the sense that the ratio of both numbers tends to
one as the set grows towards the size of the set of natural numbers.
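
Spelled out, the ratio is a product of $k$ factors that each tend to
one (a short expansion added as illustration):
$${n! \over (n-k)! \, n^k} = \prod_{m=0}^{k-1} {n-m \over n}
= \prod_{m=0}^{k-1} \left(1 - {m \over n}\right).$$
For $k = 2$ this is simply $1 - 1/n$. The numerator $n!/(n-k)!$ counts
the ordered picks of $k$ different elements out of $n$, and $n^k$
counts the ordered picks with repetition allowed.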

Very important is the fact that we can't arrive at $e$ without using
infinities. If the converse were possible, $e$ would be algebraic,
which it is not. $e$ serves as the gatekeeper of the infinitesimal.

%% Then onto commutators and $x$ and $p$. Suppose position and momentum
%% operators are matrices $X$ and $P = FXF^H$, with $F$ denoting the
%% Fourier transform. The commutator is
%% $$[X,P] = XP - PX = XFXF^H - FXF^HX.$$
%% Does this say anything useful? Probably not.


Entry: Dither
Date: Fri May  5 01:00:51 CEST 2006
Type: tex

Reading the ``Dither'' page on Wikipedia, and landed on Triangular
PDF, which is equivalent to the roll of two dice. Of course, since the
roll of one die is rectangular, and the PDF of the sum of two is the
convolution of the PDFs.
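
For two fair six-sided dice this can be written down directly (a
worked instance added here, with $P(s)$ the probability of the sum
$s$). Convolving the rectangular distribution $p(k) = 1/6$, $k = 1,
\ldots, 6$, with itself gives
$$P(s) = \sum_k p(k) p(s-k) = {6 - |s - 7| \over 36}, \qquad s = 2, \ldots, 12,$$
which is the triangle peaking at $P(7) = 6/36$ and falling to $1/36$
at both ends.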

Reminds me about Knuth's Musing ``Hooray for probability theory'',
which deals with polynomials with positive coefficients and only real
zeros. They can be seen as the cumulant of $N$ coin tosses, and always
have a gaussian shape.

%[1] http://www-cs-faculty.stanford.edu/~knuth/musings.html


Entry: Finite differences
Date: Sat May 20 17:16:02 EDT 2006
Type: tex

Reading ``Finite Difference Equations'' by Levy and
Lessman. Interesting book. Opens my eyes about a lot of things related
to the continuous and discrete cases. The one that struck me today is
$$2^x = (1 + 1)^x = \sum_n \binom{x}{n} = \sum_n {x^{\nf}
\over n!}.$$ Here $x^{\nf} = x(x-1)\ldots(x-n+1)$, the
falling powers of $x$. This is a notation due to Knuth; the book uses
$(x)_n$ which I find less clear. The expression akin to the Taylor
expansion is
$$f(x_0 + xh) = \sum_n {x^{\nf} \over n!} \Delta^n f(x_0),$$ where
$\Delta^n f(x_0)$ is the $n$ times iterated difference operation
$\Delta f(x) = f(x+h) - f(x)$. The sum is finite when $f$ is a
polynomial, and is otherwise defined if $f$ is analytic.  The
relationships between the difference operator $\Delta$, the shift
operator $E=1+\Delta$, and the differentiation $D$ are expressed as
$$E=e^{hD}$$
and
$$\log(1+\Delta) = \log E = hD,$$ where the exponential and logarithm
signify the usual power series of the operators.
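
Two quick checks of the expansion, with $h = 1$ and $x_0 = 0$ (worked
here as illustration, with the falling powers written out
explicitly). For $f(x) = 2^x$ we have $\Delta f(x) = 2^{x+1} - 2^x =
2^x$, so $\Delta^n f(0) = 1$ for all $n$ and the expansion reduces to
the binomial sum for $2^x$. For $f(x) = x^2$ we have $f(0) = 0$,
$\Delta f(0) = 1$, $\Delta^2 f(0) = 2$ and $\Delta^n f(0) = 0$ for $n
> 2$, so
$$x^2 = 0 + x \cdot 1 + {x(x-1) \over 2!} \cdot 2 = x + x^2 - x.$$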


Entry: Subspaces
Date: Mon May 22 15:09:42 EDT 2006
Type: tex

Start with Dedekind's law stating that if $R \leq S$ we have
$$S \cap (R + T) = R + (S \cap T),$$ where $R \leq S$ denotes that $R$
is a subspace of $S$. In order to see this, it is interesting to
transform this law into a less general statement. Using just the
dimensions of the subspaces does not work, since a statement like $R
\leq S$ is not equivalent to the ordinary meaning when the symbols
represent integers, because the relation $\leq$ for subspaces is not a
total order. Reducing the subspaces to those spanned by a finite subset of
a basis $\{e_k\}$ allows us to talk about the bit vector
$(b_N\ldots b_1)$, which indicates which of the basis vectors are
combined to form a subspace, or in other words, it allows to write any
subspace as $B = \sum_k b_k E_k$, where $E_k$ is the one dimensional
space spanned by $e_k$.


%[1] http://en.wikipedia.org/wiki/Modular_lattice

Using this representation, the more specific version of Dedekind's law
in terms of the bit vectors $s$, $r$ and $t$ becomes
$$s \bitwiseand (r \bitwiseor t) = r \bitwiseor (s \bitwiseand t),$$
where $r \leq s$ means $r \bitwiseand s = r$ and $r \bitwiseor s =
s$. This reads as $t$ augmented with $r$, filtered by $s$ is equal to
$t$ filtered by $s$, augmented with $r$. Obviously this only works if
$r$ does not introduce bits that are not in $s$, which is the same as
saying that $r$ is ``contained'' in $s$.
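
As a small worked check with four-bit vectors (the specific values are
chosen here for illustration): take $r = 0100$, $s = 0110$ and $t =
0011$, so that $r$ is contained in $s$. Then
$$s \bitwiseand (r \bitwiseor t) = 0110 \bitwiseand 0111 = 0110, \qquad
r \bitwiseor (s \bitwiseand t) = 0100 \bitwiseor 0010 = 0110.$$
With $r = 1000$ instead, which is not contained in $s$, the two sides
become $0010$ and $1010$ and the law fails.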


Entry: Poles with multiplicity > 1
Date: Sat Jun 17 18:46:16 EDT 2006

In practice (from real-world data) multiple poles don't happen since
their probability is zero.  If they are there, they are structural
properties (i.e. system is constructed to exhibit poles with
multiplicity.)

However, for discrete finite structures they are important since their
probability is no longer zero due to the finite number of elements.
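
One hedged way to make this concrete. The choice of monic quadratics
over GF(p) is an illustration picked here, not something this note
specifies, and the function name is hypothetical. A double pole
corresponds to a denominator of the form (x - r)^2, so we can simply
count how many of the p^2 monic quadratics have that form:

```python
def count_double_root_quadratics(p):
    """Count monic quadratics x^2 + b*x + c over GF(p), p prime,
    that are a perfect square (x - r)^2.

    Returns (number with a double root, total number of monic quadratics).
    """
    # (x - r)^2 = x^2 - 2*r*x + r^2, so collect the distinct (b, c) pairs.
    squares = {((-2 * r) % p, (r * r) % p) for r in range(p)}
    return len(squares), p * p


if __name__ == "__main__":
    for p in (2, 3, 5, 7):
        doubles, total = count_double_root_quadratics(p)
        print(p, doubles, total, doubles / total)
```

Under this model the fraction is 1/p, which is small for large fields
but never zero.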

[1] entry://20090617-200427


Entry: Combining Splitting Fields
Date: Tue Jun 20 12:41:48 EDT 2006
Type: tex

The splitting field of a polynomial over $\GF(2)$ has to be a field
extension of $\GF(2)$. Now, does this field extension have to be
$\GF(2^m)$? Well, let's see if there's a counterexample.

Let's factor $(x^2+x+1)(x^3+x+1)$. The first factor splits over the
field $\{p, p+1, 1, 0\}$ and the second factor splits over the
field $\{q,q^2,q+1,q^2+q,q^2+q+1,q^2+1,1,0\}$. How to combine both of
them? 

Starting from the multiplicative groups $\{1,p,p^2\}$ ($\ZZ_3$) and
$\{1, q, \ldots, q^6\}$ ($\ZZ_7$) we get a group $\ZZ_{3} \times
\ZZ_{7}$.  However, this group is not the multiplicative group of the
field we're looking for, since it is not closed under addition.  For
example $pq^2+1$ can not be factored as a product $p^nq^m$, but can be
obtained from linear combination of these products.  The additive
closure of the group gives the extension field we're looking for.

In general, a field extension of a finite field $\F$ is always a
vector space spanned by polynomials over $\F$, ensuring this additive
closure.  If $\F$ already is a polynomial field, the field is then the
additive closure of the products of polynomials in different formal
variables, each modulo an irreducible polynomial.

In our example this gives a proper extension of $(2^2)^3 = 2^6 = 64$
elements $a_0 + a_1 q + a_2 q^2$ with $a_i \in \{1,p,p+1,0\}$.  It has
a multiplicative group $\ZZ_9 \times \ZZ_7$ which has $\ZZ_3 \times
\ZZ_7$ as a subgroup.

Question. What is the exact isomorphism between successive extensions
like that, and the normal representation of $\GF(2^m)$ as polynomials
over $\GF(2)$ modulo an irreducible polynomial?  In other words, how to
map these polynomials in two variables to a representation in one
variable?


Entry: Dither, PWM, Sigma-Delta, Bresenham
Date: Mon Oct  9 14:31:06 CEST 2006

What is dither[3] used for? It makes a digital system behave more like
an analog system, in that it can be used to make it exhibit _graceful
degradation_ instead of sudden failure.  Dither can increase
resolution on average, but adds noise to the immediate values. For
phenomena where the average is more important than the immediate
value, this can be an interesting trick.

A simple example of this is a first order difference equation in
finite precision.  Suppose I'm using 16-bit values to represent
oscillator periods.  In order to make logarithmic increments, I could
use a difference equation 

    y[k+1] = (1+a)y[k]

where a is a small constant. However, if ay is below hex 0.00008000
the value of y remains constant. In this case, average behaviour is
important, so using 0.0000rrrr as rounding term, where rrrr is a
16-bit uniformly distributed random number, gives an approximation of
the correct behaviour.
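
A minimal sketch of this rounding trick, assuming a is stored as a
16-bit fraction and y as a plain integer. The function names and the
test values are made up for illustration, and Python's random module
stands in for the RNG:

```python
import random

FRAC_BITS = 16  # a is stored as a 16-bit fraction: a = a_q16 / 2**16


def step_rounded(y, a_q16):
    """y <- y + round(a*y): ordinary rounding (add one half, truncate)."""
    return y + ((y * a_q16 + 0x8000) >> FRAC_BITS)


def step_dithered(y, a_q16, rng):
    """y <- y + floor(a*y + r), r uniform in [0, 1) at 16-bit resolution."""
    return y + ((y * a_q16 + rng.getrandbits(FRAC_BITS)) >> FRAC_BITS)


def run(y0, a_q16, steps, rng=None):
    """Iterate the update; use dithered rounding when an rng is given."""
    y = y0
    for _ in range(steps):
        if rng is None:
            y = step_rounded(y, a_q16)
        else:
            y = step_dithered(y, a_q16, rng)
    return y


if __name__ == "__main__":
    y0, a_q16, steps = 1000, 16, 2000  # a*y0 is about 0.24, below one half
    print("rounded :", run(y0, a_q16, steps))
    print("dithered:", run(y0, a_q16, steps, random.Random(1)))
    print("ideal   :", y0 * (1 + a_q16 / 2**16) ** steps)
```

With ordinary rounding y never leaves its start value, while the
dithered version follows the ideal exponential on average, at the cost
of noise on the individual values.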

Random number generation (RNG) for this application is most
efficiently done using a finite field primitive polynomial.  However,
care needs to be taken to check the statistical properties in case the
same RNG is used to drive several dithering filters, although I can't
see any pathological cases at first glance.

If clear patterns in the output are not really a problem, a simpler
solution is to just use a triangle wave as dithering function, which
gives effectively a PWM output. What I'm really interested in is a
dithering scheme where the ripple is smaller after integration of the
output. This requires a higher frequency quantization step.

For example, instead of using a 01 ramp, a 81 ramp can be used. What
exactly is the relation to S-D modulation and the Bresenham[2] line
drawing algorithm? Let's make a list.


* uniform dither: Dither is from a uniform noise source.

* PWM: Dither from a square wave. Same average as uniform dither, but
  clear pattern. Large ripple after integration.

* nPWM: Like PWM, but using a triangle wave modulated to Nyquist
  frequency. Smaller ripple.

* S-D: Minimizes ripple after integration.

* Bresenham: Uses higher state resolution to generate a proper
  average. Ordinary rounding quantization.
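
A sketch comparing two entries of the list above for a constant
input. It assumes a simplified model in which first-order S-D is an
accumulator that emits a 1 on overflow, which is the same structure as
the error term in Bresenham's algorithm. All names are made up for
illustration:

```python
def pwm_bits(level, period, n):
    """PWM: output 1 during the first `level` ticks of every `period` ticks."""
    return [1 if (k % period) < level else 0 for k in range(n)]


def sigma_delta_bits(level, period, n):
    """First-order accumulator: add `level` each tick, emit 1 on overflow."""
    acc, out = 0, []
    for _ in range(n):
        acc += level
        if acc >= period:
            acc -= period
            out.append(1)
        else:
            out.append(0)
    return out


def ripple(bits, level, period):
    """Peak-to-peak of the running integral of (bit - level/period)."""
    err = lo = hi = 0
    for b in bits:
        err += b * period - level  # scaled by `period` to stay in integers
        lo, hi = min(lo, err), max(hi, err)
    return (hi - lo) / period


if __name__ == "__main__":
    level, period, n = 8, 16, 64
    for name, gen in (("PWM", pwm_bits), ("S-D", sigma_delta_bits)):
        bits = gen(level, period, n)
        print(name, "ones:", sum(bits), "ripple:", ripple(bits, level, period))
```

Both bit streams have the same average, but the accumulator spreads
the ones out in time, so the integrated error stays much smaller.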


The reason PWM is used over S-D in motor control (as I guess from PWM
modulators being part of uC hardware, but S-D not) could be switching
efficiency.  Larger ripple is tolerated if higher switching efficiency
(lower switch frequency) is attained. See Don Lancaster's magic
sinewaves[1] for explicit minimization of switching events for sine
wave generation as an alternative to PWM. When low ripple amplitude is
more important than efficiency, for example in audio applications, S-D
modulation can be used.

Question: can PWM hardware on uC be used to generate S-D signals?

[1] http://www.tinaja.com/glib/msinprop.pdf
[2] http://en.wikipedia.org/wiki/Bresenham's_line_algorithm
[3] http://en.wikipedia.org/wiki/Dither


Entry: LED lights on a circle
Date: Thu Jan 11 16:47:43 CET 2007


I ran into an interesting geometrical or topological problem.  Working
with N-simplexes to drive LED lights using a minimal amount of
electrical nodes.  The problem I have is how to associate an N-simplex
with a circle in a natural way.  For a tetrahedron this is trivial,
but what with higher dimensional things?

It's an interesting problem, because it seems to make absolutely no
sense to do so.  There should be no reason why a higher dimensional
polytope should be naturally projectable on a 2-sphere.  But i can't
refute it just like that.

To rephrase the problem in a more down-to-earth wording: I want to
create a 2-sphere covered with lights in a way that the lights are
normally distributed, and i have a certain pattern to access them.  At
the same time, i'd like to do this with as few connections as
possible.

Note that simplexes are easier to work with as they follow the
binomial expansion for vertices, edges, triangle faces, tetrahedral
volumes, etc...

Trying the simple thing of projecting a 5-plex on a sphere. Compared
to the 4-plex (tetrahedron), there is one pair of crossing lines.


Entry: Splines
Date: Sat May  5 21:56:32 CEST 2007
Type: tex

Need to figure out polynomials and splines. I'm limiting myself to
uniform splines atm. Let's start with the Bernstein polynomials, which
come from the binomial expansion of
$$
1 
= [x + (1 - x)]^n 
= \sum_{k=0}^n \binom{n}{k}x^k(1-x)^{n-k}
= \sum_{k=0}^n b_{k,n}(x)
.
$$
A polynomial written in terms of the basis $b_{k,n}$ is a Bezier
curve, and its coefficients are called control points.
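
As a worked instance for $n = 2$ (added for illustration, with $c_k$
introduced here as the control points):
$$b_{0,2}(x) = (1-x)^2, \qquad b_{1,2}(x) = 2x(1-x), \qquad b_{2,2}(x) = x^2,$$
which indeed sum to $[x + (1-x)]^2 = 1$. The quadratic curve
$$p(x) = c_0 (1-x)^2 + 2 c_1 x(1-x) + c_2 x^2$$
starts at $p(0) = c_0$ and ends at $p(1) = c_2$, while $c_1$ only
pulls the curve towards itself.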


Entry: Chaotic Oscillators
Date: Wed Nov 21 01:08:37 CET 2007

making chaos with a mixer usually involves an EQ, feedback and
gain. the nonlinear element is a saturation. i've never seen this
particular circuit realized as a special purpose chaotic
oscillator. lots of comparator based stuff, but no saturation..

the simplest i can find is this one:
http://www.ecse.rpi.edu/~khaled/papers/iscas00.pdf

which uses 3 integrators, a comparator and a summing amp. i'd like to
get the parts count down to 4 amps, so i can use a quad. this means
the summing amp needs to be incorporated somewhere else.

i'd like to try the following. on the plane i made a (faulty) circuit
with a svf with positive feedback (timeconstant t = 1 for simplicity)

   x'' =  -x + a x'

if a > 0 this circuit is unstable. for a < 0 this is a standard
biquad.
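
a quick numerical check of this stability claim. this is only a
sketch: the function name, step size and test values are made up here,
and plain forward euler is used, which itself adds a very slight
growth at this step size.

```python
import math


def final_amplitude(a, t_end=20.0, dt=1e-3):
    """Integrate x'' = -x + a*x' with forward Euler and return the
    phase-plane radius sqrt(x^2 + x'^2) at t_end."""
    x, v = 1.0, 0.0  # start on the unit circle of the phase plane
    for _ in range(int(round(t_end / dt))):
        acc = -x + a * v  # x'' = -x + a x'
        x, v = x + dt * v, v + dt * acc
    return math.hypot(x, v)


if __name__ == "__main__":
    for a in (-0.2, 0.0, 0.2):
        print(a, final_amplitude(a))
```

the radius shrinks for a < 0, stays close to 1 for a = 0 and grows
for a > 0, roughly like exp(a t / 2).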

now, if the integrators can be biased to some voltage, this bias can
be derived from a schmitt trigger acting on one of the state variables x
or x', switching between two unstable points. the st is stateful, so
chaos is possible on two planes: R^2 x {1,-1}

( what i did wrong is to apply the bias to the (+) input, which can't
be correct since the capacitor voltages are relative to this point, so
changing bias would also change the state variables. )

so.. what about using the saturation of the opamp? by measuring the
voltage at the noninverting inputs, saturation can be detected. in the
phase plane of an svf, there are 4 points where saturation can
occur.

using the general timer principle: 

  * detect a certain condition (one of the state variables saturated)
  * discharge one of the state variables

see:
http://www.scholarpedia.org/article/Chaotic_Spiking_Oscillators

switched 2nd order: a classic unstable biquad with state variables x
and y, where x is decremented with the output of a schmitt trigger
before it's fed into the y integrator and the integrator chain input
summer.

see:
http://www.cs.rmit.edu.au/~jiankun/Sample_Publication/IECON.pdf

which basically contains the circuit i just (re)invented..


EDIT:
some refs from johan suykens:

M.E. Yalcin, J.A.K. Suykens, J.P.L. Vandewalle, Cellular Neural 
Networks, Multi-Scroll Chaos and Synchronization, World Scientific 
Series on Nonlinear Science, Series A - Vol. 50, Singapore, 2005


Entry: Chaos and Clipping
Date: Sun Dec 16 18:03:49 CET 2007

It's not so hard to understand the mechanisms of chaos in 3D switched
systems. In fact, they are quite simple to construct:

  * In one (of two) regimes, there is one (2D) mode performing an
    unstable oscillation in a plane together with a (1D) mode that
    decays toward that plane.

  * Put the switching barrier (plane) such that whenever a point
    crosses that plane (due to the unstable 2D oscillation), it is
    attracted fast towards the 2nd regime unstable plane by the
    decaying mode, and will end up close to the unstable fixed point
    of the unstable oscillatory mode.

Maybe linear distorted (clipped) systems are easier to understand as
switched systems too?


Entry: The Hough Transform
Date: Thu Jun 26 15:18:38 CEST 2008

Scan an image and collect histogram information for the parameters
that define the lines in an image.

The parametric equation for a line is the set of points 

  {p0 + a p0' | a in |R} 

where p0' is the vector p_0 rotated by +90 degree. or with p0 = r0(cos
t0, sin t0) the set

   { (r0 cos t0 - a sin t0, r0 sin t0 + a cos t0) | a in |R}

eliminating a in these 2 scalar equations gives the equation of points
(x,y) in terms of the parameters (r0,t0) 

   LINE = { (x,y) | x cos t0 + y sin t0 = r0 }

Turning this around, given a point (x0,y0) all the lines that pass
through it are parameterized by the equation

   BUNDLE = { (r,t) | x0 cos t + y0 sin t = r }

Given this, how to find lines? We let the points vote for bundles,
then find intersections (hot spots) in bundle parameter space.

For each point in an image, record its brightness, and use that as a
vote for all the lines through that point. This can be done by
dividing the (r,t) bundle parameter space in a rectangular grid, and
let each point (x0,y0) vote for the points that are in the bundle it
defines.

If the number of angle bins is in the same order as the number of
input point bins, this algorithm casts N votes per point (x0,y0)
giving cubic complexity O(N^3)


Simplifications:

* Since this is a voting algorithm, knowing where NOT to look can
  reduce the search space.

* Thresholding: don't let dark points vote: If a point doesn't lie on
  an edge, it doesn't need to update the array either. If possible,
  reduce the input to a binary image.


Subdivision:

As long as the number of angle bins is kept constant, the algorithm
has constant complexity whenever it is subdivided. The quality of the
estimates probably goes down, as there are fewer discriminating votes.
In our case (radial lens distortion) subdivision is probably
necessary.


Algorithm: 

For all points: if white then update accus r(t) = x0 cos t + y0 sin t
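
A minimal sketch of this accumulator update, assuming a binary image
given as a list of white points and a rectangular (t,r) grid.  The bin
layout (t in [0,pi), r in [-r_max,r_max]) and the function name hough
are assumptions for illustration, not the implementation used here.

```python
import math

def hough(points, n_theta, n_r, r_max):
    """Each point votes for every (t, r) bin on the bundle it defines."""
    acc = [[0] * n_r for _ in range(n_theta)]
    for (x0, y0) in points:
        for ti in range(n_theta):
            t = math.pi * ti / n_theta                  # t in [0, pi)
            r = x0 * math.cos(t) + y0 * math.sin(t)     # bundle equation
            ri = int(round((r + r_max) / (2 * r_max) * (n_r - 1)))
            if 0 <= ri < n_r:
                acc[ti][ri] += 1
    return acc

# Ten points on the vertical line x = 5, i.e. (r0, t0) = (5, 0).
points = [(5, y) for y in range(10)]
acc = hough(points, n_theta=8, n_r=41, r_max=20)

# Hot spot: the bin with the most votes, as (votes, t index, r index).
best = max((acc[ti][ri], ti, ri) for ti in range(8) for ri in range(41))
```

With this grid the r bins are one unit wide, so the line x = 5 ends up
in t index 0 and r index 25.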


Implementation:

Loop order: working per scanline y allows the expression (y sin t)
to be constant, it thus needs to be drawn only once, after the
accumulated intensity for that scanline is known. Since the same trick
is not possible for the x, this only halves the number of updates, so
is probably not worth it: it's easier to transform the (x,y) into
amplitude + phase, which also halves the update time.

Memory access: the input can be streamed, but the state update goes
all over the place. Since the input, if binary, has very little
memory, it might be more interesting to perform random access on the
input image, and stream the accumulators. This requires for each point
in the 2D histogram to assemble a list of points that contribute to
it. The equation for this list of points is not a line, but a small
line bundle. (otherwise bresenham algo might have been used). This
does however raise the interesting question: can an approximate
algorithm be used that does scan lines like that? Can the image be
preprocessed so lines get a bit 'fatter' ?

Summarized:

  * Staying with the naive implementation, the iteration goes column
    wise over the input points, and for each point a sinusoid is added
    to the histogram bins.

  * Going the experimental route: algorithm might be reversed so that
    each (r,t) parameter can be directly computed using a Bresenham
    based function evaluation kernel. This makes sense if the image
    data fits in cache, or can be made to use prefetching (lines are
    quite predictable access patterns). Probably numerical gradient
    can be used to do this iteratively: this would allow for
    non-histogram based incremental approximation. ( However, the
    space around the optimal points in the picture here:
    http://en.wikipedia.org/wiki/Hough_transform are not really
    smooth. In fact, their shape is quite odd. Is that an artifact of
    chopping things in bins? )

Note that the approach in the last item is called the Radon transform
http://en.wikipedia.org/wiki/Radon_transform


Entry: Discretized integral transforms: domain / codomain loop order
Date: Thu Jun 26 18:32:28 CEST 2008

Moving from the Hough transform -> Radon transform is an example of a
more general pattern of re-arranging control around data dependencies
in discretized integral transforms, basically changing the order of
loops.

The central idea is this: for finite versions of an integral
transform, there exists a related finite histogram based version with
inverted control.  Instead of computing the integral for each sampled
codomain point from all the domain points, one computes the
contribution of each domain point to all the codomain points. Basically,
in the discrete version, the domain and codomain loops are transposed.

For example, a generic integral transform
             _  _
 (y0,y1) = _/ _/  F(x0,x1,y0,y1) dx0 dx1
           x0 x1

can be rewritten, after discretization as 4 nested loops over
y0,y1,x0,x1 (outer->inner): for each y0,y1 the result is computed
directly. However, it can also be computed as an update of an y0,y1
grid by re-ordering the loops x0,x1,y0,y1.

For the Hough transform there is only one integral, so there are 3
loops to reorder to get to a discretized Radon transform.
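
The loop transposition can be sketched as follows, reduced to one
domain index x and one codomain index y to keep it short.  The kernel F
and the function names are placeholders, not taken from any of the
transforms above.

```python
def direct(F, nx, ny):
    """Codomain-outer: finish each output sample before moving to the next."""
    out = [0] * ny
    for y in range(ny):
        for x in range(nx):
            out[y] += F(x, y)
    return out

def histogram(F, nx, ny):
    """Domain-outer: each input sample updates all output bins."""
    out = [0] * ny
    for x in range(nx):
        for y in range(ny):
            out[y] += F(x, y)
    return out

F = lambda x, y: (x + 1) * (y + 2)   # arbitrary integer kernel
a = direct(F, 5, 4)
b = histogram(F, 5, 4)
```

Both orders compute the same sums; what changes is the memory access
pattern: the first streams the output and revisits the input, the
second streams the input and revisits the output.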

Is this useful? The only scenario i can think about is the case where
the histogram version is faster to compute than the direct version,
but you need some estimate before you use the direct version in an
iterative algorithm. Maybe this only makes sense for line integrals,
not for full 2D->2D maps.


Entry: Snowcrash encoding
Date: Thu Jun 26 19:22:43 CEST 2008

The basic algorithm uses a single LFSR (linear feedback shift
register) to construct a 2D grid by:

  * partitioning X and Y state space into 2 separate segments so the
    same difference equation of degree D can be used for both.

  * shifting each column by a constant number N, and performing an XOR
    between X and Y

Let's call the observations G[i], and the hidden sequences X[i] and
Y[i]. These use addressing A(x,y) = A[x+Ny]. Alternatively, primed
array names represent transposed addressing A(x,y) = A'[xN + y].

The equations from LFSR state dependency and output mixing:

X[D+i]  = f( X[i] ... X[D-1+i] )     0 <= i < (N^2 - D)
Y'[D+i] = f( Y'[i] ... Y'[D-1+i] )   0 <= i < (N^2 - D)
G[i]    = X[i] + Y[i]                0 <= i < (N^2)
                                              ---------
                                              3N^2 - 2D

The number of unknowns (X[i] and Y[i]) is 2N^2
( The input data consists of N^2 points. )

With independent equations, this means it is soluble if 
2N^2 <= 3N^2 - 2D or

                  D <= N^2 / 2

                  N >= sqrt(2D)

For D=15 -> N=6
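
The counting argument as a small sketch, assuming the equations are
independent as stated above.  The function name min_grid_size is made
up for this note.

```python
import math

def min_grid_size(D):
    """Smallest N with 2N^2 <= 3N^2 - 2D, i.e. N^2 >= 2D."""
    m = 2 * D
    n = math.isqrt(m)
    return n if n * n >= m else n + 1

N = min_grid_size(15)
equations = 3 * N * N - 2 * 15
unknowns = 2 * N * N
```

For D=15 this gives N=6, with 78 equations against 72 unknowns.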

I don't see why this should be 7 as in the current implementation.
Maybe it's used to reduce ambiguity? Something 'feels wrong' about
using a shift of 6 and a detection of 7x7 : the sequences don't simply
wrap around any more, which destroys some symmetry.

 Q: is it possible to solve the equations in least-squares form, by
    adding an error term to each?

 Q: is there a way to test orientation properties by using some
    property of the relationship of the LFSR to its inverse?

If the parameter space is known, it can be proven by enumeration that
ambiguity can always be resolved. Maybe that is really a better
approach, because I don't see a way to do it otherwise.

Entry: reverse LFSR
Date: Fri Jun 27 13:31:59 CEST 2008

What happens if you reverse an LFSR sequence? This is most easily seen
by expressing the LFSR as f(x[i] ...) = 0. The only thing that happens
is the reversal of the coefficients. So 100101 -> 101001
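
A small numerical check of the coefficient reversal, as a sketch.  It
assumes the relation is written as sum_j c[j] x[i+j] = 0 (mod 2) with
the coefficient string read as c[0]..c[D]; the helper names are
hypothetical.

```python
def lfsr(coefs, state, length):
    """Bits x with sum_j coefs[j]*x[i+j] = 0 (mod 2); coefs[-1] must be 1."""
    D = len(coefs) - 1
    x = list(state)
    while len(x) < length:
        # Solve the relation for the newest bit x[i+D].
        x.append(sum(c * b for c, b in zip(coefs[:D], x[-D:])) % 2)
    return x

def satisfies(coefs, x):
    """Check the relation f(x[i] ...) = 0 at every position of x."""
    D = len(coefs) - 1
    return all(sum(c * x[i + j] for j, c in enumerate(coefs)) % 2 == 0
               for i in range(len(x) - D))

coefs = [1, 0, 0, 1, 0, 1]               # 100101
x = lfsr(coefs, [1, 0, 0, 0, 0], 40)

forward_ok = satisfies(coefs, x)                 # original sequence, original coefficients
reverse_ok = satisfies(coefs[::-1], x[::-1])     # reversed sequence, 101001
```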

Now, the question is, are there LFSR equations that are symmetric in
this sense? This would also mean the sequence itself is symmetric
around every point, which doesn't look like it's possible.

Next step: is it possible to relate an LFSR to its inverse? After all,
all Galois fields of order 2^N are isomorphic.

It might be interesting to cast the equation into exponential form to
work with multiplicative generators g^i. Are all LFSR sequences of a
certain order related by mere re-arranging, like taking every 3rd bit?
Hmm... getting confused.

Let's view this in a transformed domain. The LFSR sequence is the
coefficient of the sequence of polynomials generated by
multiplications by x, the 'shift' operator. (This turns the modulo
operation into a simple XOR.)

In matrix/vector terms, given an LFSR polynomial a of order N, pick an
initial point in the state space e = (e1,...,eN). Take the update
equation of the LFSR, and call its matrix g. This is the
multiplicative generator for the sequence.


 Q: Another thing to think about: to compute eigenvalue decomposition
    in |C one needs complex eigenvalues (the splitting
    field). However, there exists a 'real form' of this which has
    blocks of 2D rotations instead of complex diagonal forms. Is this
    possible for LFSR?

 Q: how many different sequences are there for a given Galois field?
    (number of automorphisms)


Entry: LFSR resolving linear orientation ambiguity
Date: Fri Jun 27 15:28:37 CEST 2008

 Q: What happens if the LFSR in one direction is XORed with its
    reverse? does that still give a proper sequence? That way it would
    be possible to have contribution of 2 directions, which then can
    be used to resolve ambiguity.

Equations: 
( with A'[i] = A[N-i-1] )

L[D+i]  = f( L[i] ... L[D-1+i] )     0 <= i < (N - D)
R'[D+i] = f( R'[i] ... R'[D-1+i] )   0 <= i < (N - D)
X[i]    = L[i] + R[i]                0 <= i < N
                                              ---------
                                              3N  - 2D
Number of unknowns: 2N

          D <= N/2

The only problem then is to figure out whether the systems are
independent. Does this sum of sequence + reverse somehow reduce to
something of lower degree?

For one thing, it is reversal-symmetric around the initial state,
which can be taken not to lie in the surface of interest. Is there
anything else destroyed?

In terms of the unit element e, and the generators g and g^-1 the
sequence becomes a cosine instead of an exponential:

         e g^i + e g^{-i} = e (g^i + g^{-i})

This can no longer be written as an LFSR, because it has hidden state
(LFSR has a naked state). It is a more general form of linear system.

I don't see any reason why this can't be generalized to the 2D case.

 Q: Are there points where this XOR operation itself leads to
    ambiguities? In other words, are there XORed segments that have
    multiple parents? If so, then it is probably possible to find a
    small upper bound of the number of observations to take to
    eliminate this ambiguity. ( Since a large upper bound is probably easy to
    find: take one that reduces the search space for collisions to a
    small amount, i.e. one case. ) Note that if there are such
    ambiguities then a simple exhaustive search for soluble sets of
    equations in the non-symmetric 2D case might be a better approach.


Entry: symmetric 2D LFSR
Date: Sat Jun 28 12:52:36 CEST 2008

For a 2D LFSR grid, the problem to solve is rotation
ambiguity. Instead of using an LFSR generator for each direction, use
2 like in the 1D case, but tile them such that the structure is
rotation-symmetric. I.e. for N=3 the 4 LFSR are wrapped as:

X+     Y+     X-     Y-
1 2 3  3 6 9  9 8 7  7 4 1
4 5 6  2 5 8  6 5 4  8 5 2
7 8 9  1 4 7  3 2 1  9 6 3

Analogous to before, this gives 5N^2-4D equations and 4N^2 unknowns,
leading to the solvability condition:

                    N^2 >= 4D 

This looks like a nice idea, but using a 6x6 estimator for D=15 gives
underdetermined equations.. Apparently, there is some information lost
in the XOR. 

Intuition: do something like this in |R and you might reduce
condition, but it is hard to get to linear dependence. In a finite
space it is a lot easier to end up with linear dependence because
there is less room.


Entry: Lens correction
Date: Sat Jun 28 13:15:43 CEST 2008

Grid estimation with lens correction, roadmap:

- Lens parameter estimation: cubic distortion + optic center -> try
  this with manual approach first.

- general flow: sobel edge -> histogram for direction -> projection on
  2 orthogonal directions -> grid phase + frequency.

- Optionally: search the LFSR 2D space for sets of equations that are
  not soluble.


Entry: 2D direction estimation (confused)
Date: Mon Jun 30 14:20:24 CEST 2008


I first thought this would be Hough transform in r,t with the radial
direction integrated out. However, that doesn't work because of the
way accumulation works: each t is voted for in the same way, but with
different r.

I can't see how a bundle-accumulator would work for estimating
direction. How can a point vote for a direction? Doesn't make sense.

What about hierarchical Hough transform? We know there are only 2
directions, and they are orthogonal, so basically, there is only
one. This looks like a successive binary estimation problem. First
choice between t=0,pi/4

But, doing it this way: how to pick one of the two angles? The result
will be 2 accumulators for an r spectrum. We're not interested in r,
but it will look like a spectrum with multiple (blunt) peaks. The
decision rule could be something like high-pass content (local
variation).


Entry: simplified HT for direction estimation
Date: Tue Jul  1 11:12:56 CEST 2008

Essentially, a Hough transform H(r,t) of an image I(x,y) can be seen
as a likelihood function for the parameters r and t.

Now observe a typical 'sinogram', like the one here:
http://en.wikipedia.org/wiki/Hough_transform

Given the Hough transform of a single line, take an r-spectrum for a
certain angle: it will have less variance when the angle is close
to the correct one. (The integral of the r-spectrum sums to the
number of line-points that voted.)

It would be interesting to be able to ignore the fact that there are
multiple lines, and have them all contribute to a single parameter. In
fact, the first harmonic in the DFT of the r-spectrum will give an
estimate for the phase and offset.

Let's try this.


Entry: Generalized Pitch Detection
Date: Thu Jul  3 16:26:11 CEST 2008

I'm not sure if I can capture my intuition about this, but let's try
anyway: There exists a class of problems which exhibit somewhat
periodic cost functions: let's call them Generalized Pitch Detection
problems (GPD). The problem with such problems is to add bias to the
estimator so it will go for the desired pitch, and not one of its
(sub-)harmonics.

The grid estimation problem in Snowcrash is a GPD problem.


Entry: LFSR: roundup
Date: Sat Jul  5 14:45:13 CEST 2008

After a couple of days of tinkering in the Scheme lab, thinking about
algorithm complexity and being distracted by HOP for prototyping
(which works nice btw..) it's time to get something done. Tasks:

  (A) lens distortion estimation from running example.

  (B) grid parameter estimation

Approach: do both based on Hough transform. First one: local tx,
second one: global tx with transformed coordinates. I do need to move
from Scheme to C code because I'm running into trouble with execution
time. HOFs are nice, but parametricity is slow..


Entry: Hough over 0->pi/2
Date: Fri Jul 11 17:28:55 CEST 2008

( exploration log )

Got the Sobel + HT working in PF; drawing some nice pictures. Thinking
about letting theta + pi/2 vote for theta. What should r be?
Hmm.. This gives two mixed r spectra: for a certain theta, grid might
shift, so this doesn't seem like a good idea.

The pictures help a lot to build some intuition though.. Looking at
it, it doesn't seem too difficult to estimate scale, phase and angle,
right from the image, using the right biasing (filtering).

Try some multires?

With r,t going over their full range, there are 4 angles, pi/2
apart. How to change the transform such that a symmetric diagram over
0->pi/2 is possible? 

The full diagram is redundant, since for each (r,t) there is an
(-r,t+pi). The first reduction is to either halve r or halve
t. Let's halve t.

I'm using only one quadrant for x,y. Maybe change that too? Yes. Gives
a prettier picture: the whole (r,t) plane is used.

What about this one: Compute the sum of squares over r. Since we know
the integral is constant, the integral of squares will give some
distinction between highly concentrated regions and spread-out
regions. Is this entropy?

No, entropy is  -\sum p_i ln p_i

p_i don't need to be normalized -> it offsets the entropy by 
ln \sum p_i

Both entropy and energy are superlinear nonlinearities, so will
emphasize outliers. Which one is the more natural measure?
Interpretation of the r spectrum as a PDF is quite straightforward,
but interpreting it as an energy, not so.. Maybe the potential energy
of a heap of sand? 

Potential energy is proportional to height, integral from 0->top is
quadratic in top.

Plotting the energy seems to give quite a clear distinction. However,
this still requires the construction of a histogram.
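
A sketch comparing the two measures on two made-up r-spectra with the
same total vote count, one concentrated and one spread out.  The
histograms and function names are illustrative only.

```python
import math

def energy(h):
    """Sum of squares of the histogram bins."""
    return sum(v * v for v in h)

def entropy(h):
    """Entropy of the histogram after normalizing it to a PDF."""
    total = sum(h)
    return -sum(v / total * math.log(v / total) for v in h if v > 0)

peaked = [0, 0, 16, 0, 0, 0, 0, 0]   # all votes in one bin
spread = [2] * 8                      # same 16 votes, evenly spread

e_peaked, e_spread = energy(peaked), energy(spread)
h_peaked, h_spread = entropy(peaked), entropy(spread)
```

Energy is high for the concentrated spectrum and low for the spread
one; entropy goes the other way (zero for a single bin, ln 8 for the
uniform case).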

Instead of using energy or entropy, another way to introduce bias is
to assume the r period R is known.  It can then be used to construct a
filter: simply summing (cos 2 pi r / R, sin 2 pi r / R)

So, instead of voting for the bundle

            (r,t) | r = x cos t + y sin t,  t \in [0,pi]

we vote for the phasor:

            (e^{i 2 pi r/R} , t)

The advantage is that the r dimension can be reduced by immediately
accumulating the phasors (non-periodic contributions will interfere
destructively). This basically demodulates the grid carrier.


My intuition says it's possible to get away with fixing one parameter
to get a simpler/faster kernel function that can then be used to
perform incremental refinement of estimates of that parameter.

This also might benefit from representing the image as binary data,
reducing memory accesses. (Done: I don't see dramatic speedup, maybe
because at that time it was still compute bound due to sin/cos
evaluation?)

( Also, it might help to switch to the Radon transform, to compute the
  line integral directly. Hmm.. that's not the same: Radon transform
  would help if we're looking for lines, but here there is some
  benefit to computing a whole r-spectrum, which can then be reduced
  to a single phasor. )


Entry: Toroidal Hough Transform
Date: Sat Jul 12 13:49:32 CEST 2008

For grid estimation, we're not really interested in individual maxima
on the r-spectrum. Reducing this spectrum to a couple of values can be
done by adding bias toward a period R. Then, instead of using a voting
algorithm for bundles:

            (r,t) | r = x cos t + y sin t,  t \in [0,pi]

we vote for the phasor:

            (e^{i 2 pi r/R} , t)

The advantage is that these votes can be accumulated immediately, so
memory access becomes simpler, and a histogram interval discretization
step is not necessary.

The core computation is the evaluation of cos/sin, the rest seems
memory-bound.

Given the grid period R and the angle t, compute the complex phasor p: 

    p(R,t) = \sum_n e^{i 2 pi r_n(t) / R}
    r_n(t) = x_n cos t + y_n sin t     (sum over set pixels (x_n,y_n) only)

The inner loop is the sum over the exponentials of r_n(t) for linearly
increasing x with y fixed, so this can be updated.
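
A minimal sketch of the phasor accumulation, assuming the image is
given as a list of set pixels, r = x cos t + y sin t as in the LINE
equation of the Hough entry, and a phase of 2 pi r / R.  The synthetic
grid and the name phasor are made up for the example.

```python
import cmath
import math

def phasor(points, R, t):
    """Sum of unit phasors e^{i 2 pi r / R}, r = x cos t + y sin t, over set pixels."""
    c, s = math.cos(t), math.sin(t)
    return sum(cmath.exp(2j * math.pi * (x * c + y * s) / R) for x, y in points)

# Synthetic grid: vertical lines every 8 pixels in a 64x64 image.
points = [(x, y) for x in range(0, 64, 8) for y in range(64)]

on = abs(phasor(points, R=8, t=0.0))    # matching period and angle
off = abs(phasor(points, R=8, t=0.3))   # same period, wrong angle
```

At the matching (R,t) all 512 phasors line up; away from it they
largely cancel, which is the destructive interference that replaces
the histogram.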

However, the image will be mostly sparse, so this will probably not
bring much speedup (points are not equidistant, so multiple rotation
update matrices are necessary).

( Tests indicate the 2 approaches: straight eval + update are about the
same speed. Maybe it's memory bound? )


Entry: Inner loop over angle/tangent
Date: Sat Jul 12 14:21:05 CEST 2008

( I'm not sure if this is useful: only works when the angle/tangent
loop is the inner one. )

Since we're not using any resolution-reducing histogram, and
floats/doubles have a large dynamic range, can the parameter space be
turned into something more convenient so the kernel method for
evaluating p(R,t) becomes simpler?

Goal: make a kernel method that evaluates p(R,t) in a different
parameter space. 

Let's try to replace it by (o,m) : offset and tangent of angle, and
use the computation of p(R,m) to compute p(R,t).

The bundle equation for (o,m) becomes:

      (o,m) | o = y - m x

Voting for e^{i o/O(m)} = e^{ i (y - m x) / O(m) } becomes fairly
trivial: the exponential can be accumulated since it's linear in both
m and (x,y). The polar coord version is only linear in (x,y), so works
only for a fixed t.


Entry: Accelerating the inner accumulator loop
Date: Sat Jul 12 16:47:21 CEST 2008

Bring the conditional out of the loop.

RLE will transform the sparse vector into an array of offsets that can
be used to perform the incremental accumulation without a conditional
in the inner loop. This representation needs to be computed only once,
and can be reused for every p(R,t) evaluation.

             e^{i (x cos t + y sin t) 2 pi / R}

( Moving one step further, instead of storing offsets that will have
  to be transformed to phase increments once per inner loop, what
  about storing the phase increments directly? This doesn't make sense
  for t=constant however, since each increment is used only once. It
  does make sense when multiple R are evaluated per t. )

The inner loop with y=cte looks like:

          \sum_X e^{i (a x + b)} = e^{i b} \sum_X e^{i a x}

Where:

      X = {x | image(x,y) = 1}  (range of sum)
      a = 2 pi cos t / R
      b = 2 pi y sin t / R


This makes the inner loop accumulation the accumulation of sin/cos of
scaled versions of the RLE x values.
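
The factoring of the row sum can be checked numerically.  This is a
sketch: xs stands for the x positions of set pixels in one scanline
(what the RLE would yield), and the function names are hypothetical.

```python
import cmath
import math

def row_sum_direct(xs, y, R, t):
    """Sum e^{i (x cos t + y sin t) 2 pi / R} over the set pixels of row y."""
    return sum(cmath.exp(2j * math.pi * (x * math.cos(t) + y * math.sin(t)) / R)
               for x in xs)

def row_sum_factored(xs, y, R, t):
    """Same sum with the row-constant factor e^{i b} pulled out of the loop."""
    a = 2 * math.pi * math.cos(t) / R
    b = 2 * math.pi * y * math.sin(t) / R
    return cmath.exp(1j * b) * sum(cmath.exp(1j * a * x) for x in xs)

xs = [3, 4, 10, 17, 18, 40]     # set pixels in one scanline
d = row_sum_direct(xs, y=12, R=8.0, t=0.4)
f = row_sum_factored(xs, y=12, R=8.0, t=0.4)
```

The factored form evaluates one exponential per set pixel plus one per
row, and the inner loop depends only on a, not on y.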

Here a is the 'speed' at which the points in X rotate around. This
depends on t: the speed is a projection of the expected speed R under
angle t. Doesn't look like there is a shortcut there, because of the
absence of structure in the set of points X.

( With RLE, the running time is proportional to the number of white
  points in the input image. Maybe some preprocessing step is desired
  to eliminate blurry gradients? )

If sincosf (math.h) can be used, this might go a bit faster.


Entry: implementing RLE
Date: Sun Jul 13 10:37:17 CEST 2008

Each row incrementally adds a multiplier e^{i 2 pi sin t / R}.
Instead of performing a conditional on the input RLE data stream,
this might be encoded as a single bit.


Entry: Toroid plots
Date: Sun Jul 13 18:02:10 CEST 2008

Got some reasonably fast kernel routine and used it to generate a
theta/period plot. Gives results as expected: main period peak is
quite clear, and with an initial estimate it can probably be computed
iteratively.


Entry: AR - ARMA
Date: Wed Oct  1 17:10:57 CEST 2008

How to fit an ARMA model?  It's been a while.

http://en.wikipedia.org/wiki/Box-Jenkins

There are three primary stages in building a Box-Jenkins time series
model. 

1. Model identification

* Detecting stationarity: can be done by inspecting the
  autocorrelation.  Slow decay (a flat power spectrum without
  isolated peaks) can indicate non-stationarity.

* Detecting seasonality: if there is significant periodicity this can
  be removed (modeled separately), or included in the model order
  estimation.  (The idea being that periodicity comes from external
  inputs, something which the ARMA model doesn't accommodate.)

* MA order (q) selection from autocorrelation plot.  For an exact
  model, this becomes zero after lag = q.

* AR order selection (p) from partial autocorrelation plot. (CHECK THIS).

This could be automated using information-based criteria such as FPE
(Final Prediction Error) and AIC (Akaike Information Criterion).

2. Model estimation

Once a suitable model order is found, use a NL-LS or ML method to
estimate the model parameters.

3. Model validation

The error term is assumed to follow the assumptions for a stationary
univariate process.  


For ARMAX the approach is similar?

So, what's the difference between using NL-LS or ML methods?  Linear
least squares corresponds to maximum likelihood if the errors have a
normal distribution.  It looks like this is no longer the case for
NL-LS.


Entry: motor control
Date: Mon Oct  6 01:18:51 CEST 2008

Something I forgot about modeling motors is that for AC induction
motors, the torque-speed curve is quite nonlinear.

First, an asynchronous AC motor is basically a transformer where the
load of the secondary is the rotary dynamic attached to the axis.
This load can be modeled as a torque -> angular velocity transducer.

                 T -> LOAD -> w_L

The controller that closes this loop maps w_L to T through the
motor's torque-speed T/w characteristic, which has two inputs: voltage
V and synchronous velocity w_s.

                       V,w_s
                        |
                        V
                 w_L -> M -> T

The nonlinearity between w_L/w_s and T is motor-dependent.

This was helpful: 
http://www.sea.siemens.com/step/templates/lesson.mason?ac_drives:2:1:1

http://en.wikipedia.org/wiki/Direct_Torque_Control

Torque can be estimated from the motor current+voltage.  This allows
to build a controller with simple measurement sensors, but doesn't
allow zero-speed control.

( Explain the flux-linkage estimation. )

With accurate speed/position measurement better control is possible:

http://en.wikipedia.org/wiki/Vector_control_(motor)

http://en.wikipedia.org/wiki/Variable-frequency_drive

  "AC motor characteristics require the applied voltage to be
  proportionally adjusted whenever the frequency is changed in order
  to deliver the rated torque. For example, if a motor is designed to
  operate at 460 volts at 60 Hz, the applied voltage must be reduced
  to 230 volts when the frequency is reduced to 30 Hz. Thus the ratio
  of volts per hertz must be regulated to a constant value (460/60 =
  7.67 V/Hz in this case)."

(The idea behind this being that sending DC through a coil is a bad
idea: when the frequency is reduced, the reactive part of the
impedance goes down, which makes the resistive part more significant,
resulting in more heat dissipation.)


Entry: FFT window design for sinusoidal synthesis
Date: Mon Oct  6 13:30:26 CEST 2008

(Mostly note to self.  FIXME: explain this a bit better.)

In 2000 I tried to design windowing functions for spectral synthesis:
synthesizing a sum of sinusoids by constructing a DFT spectrum through
updates of a limited number of frequency components per sinusoid.  I
approached this as a FIR filter design problem.

However, I came to realize that approaching this as a filter design
problem, focussing only on stationary signals is not sufficient.  The
real problem is about amplitude modulation effects that happen when
mixing subsequent stationary signals, as a result of phase
discontinuities.  The only way to tackle this properly seems to be to
increase the spectral update rate.

It would be interesting to see whether this can be solved in the
DYNWAV model: it already has a variable frequency mechanism (wavetable
playback) which could be used to capture "average" frequency change
without amplitude modulation effects.  However, synthesizing multiple
pitches still gives the same problem.


Entry: I/O delays in Digital Control
Date: Wed Nov 19 16:40:31 CET 2008

I got a bit confused by conversations about latencies of digital
control systems, but it is quite straightforward:

For a synchronously sampled control system with input and output
updated at t:0,1,... the input data sampled at t=0 can only influence
actuation at t=1 (when process delay Z=1) and as such the effect of
this actuation can only be measured at t=2.


Entry: Dual Numbers and Automatic Differentiation
Date: Sun Feb 22 15:08:07 CET 2009

Extending the real numbers with e satisfying e^2 = 0 produces the
dual numbers, equipped with extended elementary operations.  Besides
being nilpotent, e behaves as a real number.

A dual number has two components a and b, and is denoted as

   a + b e

Given an analytic function f(x) over the real numbers, the extended
function over the dual numbers has the following remarkable
property:

   f(x + e) = f(x) + f'(x) e

This can be derived starting from the power series expansion of f(x)
at x = 0, and lifting all operations to those of the dual numbers.
The entity e effectively selects the first derivative from the power
series.

Practical significance: given an algorithm for computing f(x), an
algorithm for computing both f(x) and f'(x) can be automatically
derived by extending all elementary operations to their equivalent on
the dual numbers.

The "magic" is in that in general, single-expression function
derivatives _look_ much more complicated than the original function
expression.  This complication is artificial: it can be alleviated by
introducing proper decomposition.  This is exactly what AD does:
complexity doesn't really rise, but the amount of data that is tracked
in the computation is increased: there are more local dependencies in
the data flow, but global size doesn't explode.

Instead of using dual numbers, the same mechanism can equivalently
be explained as explicitly tracking both function value and
derivative, and updating them using the chain rule.
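
A minimal sketch of forward-mode AD with dual numbers, supporting only
addition, multiplication and sin.  The class name Dual and the helper
dsin are made up for this note.

```python
import math

class Dual:
    """Number a + b e with e^2 = 0; b carries the derivative."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a1 + b1 e)(a2 + b2 e) = a1 a2 + (a1 b2 + b1 a2) e, since e^2 = 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)
    __rmul__ = __mul__

def dsin(x):
    """sin lifted to the dual numbers: sin(a + b e) = sin a + b cos a e."""
    return Dual(math.sin(x.a), math.cos(x.a) * x.b)

def f(x):
    return x * x * x + 2 * x + dsin(x)

# Evaluate at x + e: the result holds f(x) in .a and f'(x) in .b.
y = f(Dual(1.5, 1.0))
```

The same f is written once; feeding it x + e instead of x yields the
derivative alongside the value, here f'(x) = 3x^2 + 2 + cos x.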

[1] http://en.wikipedia.org/wiki/Automatic_differentiation


Entry: Graceful degradation of integer values
Date: Wed May  6 13:52:21 CEST 2009

Transmitting binary digit-encoded values is problematic in the sense
that not all digits have the same noise sensitivity in terms of the
encoded integer value.

I.e. binary 1100 = decimal 12
     binary 1101 = decimal 13  (diff = 1)
     binary 0100 = decimal  4  (diff = 8)

Is there a way to encode a number such that each flipped bit will
cause the same (or similar) error in the integer value, and not too
many extra bits are used?

One way to do this with lots of redundancy is to use a PWM
representation.  I.e. for 4 bit values, this gives 16 bits
representation and a lot of redundancy.

However, can this principle be used in a way that isn't too costly?

0 000
1 100 010 001
2 011 101 110
3 111

Using bigger numbers the redundancy explodes quite fast.  Is there a
way to "flatten the bulge" in the redundancy curve?

This is related quite a bit to the concept of entropy.  Patterns where
the numbers of ones and zeros are roughly equal are more common
because there are more variants.

What about using 100 010 001 as representations of 1,2,3?  That way
bit flips will only jump between adjacent "bands" and no longer all
over the place.

0 {000}           {0}
1 {100 010 001}   {1,2,3}
2 {011 101 110}   {4,5,6}
3 {111}           {7}

This produces a lattice structure.

Now, is there a way to do this recursively?  Convert each number to
transitions.

100 10
010 11
001 01

The same structure is found in the other section since it's simply
the inverse.

Hm.. I don't see it.

What I see is that numbers near the middle bands are all equal in some
way.  It's as if this says that only black and white are really
important (they are outliers), and the rest is not so special.

It's the binomial expansion of (1 + x)^n

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
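
A small Python check of the two claims so far, as a sketch: the band
sizes follow the rows of this triangle, and a single bit flip always
moves a word to an adjacent band.  The helper names are mine:

```python
def popcount(v):
    """Number of 1 bits in the non-negative integer v."""
    return bin(v).count("1")


def bands(n):
    """Group all n-bit words by their number of 1 bits."""
    result = [[] for _ in range(n + 1)]
    for v in range(1 << n):
        result[popcount(v)].append(v)
    return result


def band_jumps(n):
    """Set of band distances seen over all single-bit flips of all words."""
    return {abs(popcount(v ^ (1 << k)) - popcount(v))
            for v in range(1 << n)
            for k in range(n)}


print([len(b) for b in bands(4)])   # [1, 4, 6, 4, 1]
print(band_jumps(4))                # {1}
```

Note that this only bounds the error in the band index.  The position
inside a band is arbitrary, so the integer assigned to a word can still
jump by more than one.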

Can this be used in control systems?  If the errors are small, we
don't really care much about their exact value, just their sign and
approximate size.  If the errors are large, we really want to know
them well.

Suppose we have n wires carrying a single signal.  If they're all 1
or all 0 that's significant.  If they are equally distributed that's
not significant.

So, this is quite a natural representation of "normalness".  Flipping
one bit won't do much, but the ensemble has a meaning that can be
interpreted as a number.

Now, can we count with these numbers? Or at least treat them as values
to which we can apply functions?


Let's define this better.

    - A node is a binary function of time.

    - A number is a set of nodes.

    - The sum of two numbers is the union of their sets.

    - The weight of a number is the cardinality of its set.

    - A "reduction" translates a number of high weight m to one of low
      weight n, m > n.

Computation seems to be hidden in the reduction operation.  Let's
define one for cardinality 2->1

This should be a transition of classes, probably random to eliminate
arbitrary choices.  The "outer" classes will remain while the inner
class picks a random choice.

2 ----> 1

00    | 0
01 10 | 0 1
11    | 1


Similar for the next in line.

3 ----------> 2

000         | 00
100 010 001 | 01 10
011 101 110 | 01 10
111         | 11

Now, can reduction be implemented with an electronic device that has
some kind of balanced internal noise source?

Now..  3->1 can be implemented as a deterministic majority gate.  But
it seems that the big idea in the proposed computer is its
indeterminate nature.

So, with completely fair n -> (n-1) transitions, what is the
probability of an n->1 transition?  

Wait.  The simplest gate is really to just drop one bit randomly.  (A
computer full of Kill Gates?)

Maybe random killing isn't the trick, maybe random permuting is?  Then
killing is simply not using an output, and the primitive is the random
transposition gate.

Enough insanity..  There might be something hidden here, but it is
really dependent on cheap connections.  It probably needs a 3D
structure to be implemented.  Random transpositions don't seem so
difficult..  It's basically a beam splitter.


Entry: Counting votes.
Date: Mon May 11 09:54:21 CEST 2009

This is a rehash of the previous post.

For bit vectors b of dimension n, the function 

  bits : 2^n -> [0..n]

introduces a partition[1] of 2^n.  The equivalence classes have an
order relation, determined by the bits function.

The central idea in using this partition to represent numbers in terms
of bit vectors is the distribution of bits(v) when v is drawn from a
uniform distribution.  It is a binomial distribution[2].

Such a distribution naturally encodes "agreement" (most bits zero or
one) as an exceptionally ordered state, and "disagreement" as the
natural disordered state.  

This non-linearity is quite a good match to the non-linear behaviour
of neural networks when modeled as an inner product followed by a
limiting nonlinearity, without ever using any explicit computations.
Simply concatenating a number of such vectors gives a resulting vector
that represents anything on the scale of no->disagreement->yes.  The
size of an individual bit vector determines its weight in the vote:

  v_out = [ v_1 | ... | v_n ]

Let's call a single bit vector v \in {0,1}^n a "vote".  Each vote has
a weight n which is its dimensionality.

Now, to make this manageable, the only real "computation" that has to
happen is to reduce the weight of votes so they can be combined with
other votes to form new ones.  The essential observation is that the
distribution of a vote with weight n is qualitatively similar to the
distribution of a vote with weight n-1.  In other words, simply
discarding one element of the bit vector on average has no qualitative
effect.  Let's call this "weight loss".

The same reasoning goes for randomly flipping a bit.  On average this
has little effect, giving rise to good noise immunity.

This gives some form of computation (it can do anything feedforward
neural networks do) in a statistical way.  

The next step is to make weight loss programmable.  A vote should be
able to inhibit another vote.  The problem here is that it is a
physical operation.  Inhibition means to cut wires.  Is it possible to
do a non-physical inhibition?

Yes.  An AND/OR gate introduces asymmetry between yes and no.  One
vote could be combined with another vote (gate it).

So, what then is a nerve cell?  It's where multiple wires come
together and a random part of them is discarded.  Forgetting is the
essence of computation. ;)

So, given a (huge) set of binary nodes, there are only two operations:
    - reduction: taking a subset of signals to create a node
    - introduction of asymmetry (inhibition through AND|OR) = gating


[1] http://en.wikipedia.org/wiki/Partition_of_a_set
[2] http://en.wikipedia.org/wiki/Binomial_distribution


Entry: Computer Modern is Too Thin
Date: Mon May 11 18:11:07 CEST 2009

Not really so math-y but I have no other place for it..

Something that has been bugging me ever since I used LaTeX for the
first time (1998?) is that Computer Modern is a very thin font.
Looking around the web, I've found several people complaining about
this.

On paper this problem isn't as pronounced because resolution is high
enough, and black is black.  But on screen, thin lines turn grey, and
that is really annoying.  It's easier to read black blobs than grey
ones.  On blurry low-quality paper, ink doesn't turn to grey either.

So, how to fix this?

The simplest way to fatten up a font is to use erosion (replace pixel
with minimal pixel value in a 3x3 square).  Alternatively, blur
followed by increased contrast will also work.

So, why doesn't ink on paper turn grey when it spills out a bit?  The
reason is probably that what is diffused is pigment (amount of
absorption or negative color) and not light intensity.  Even with
pigment diluted, black is still very black.


Code:


#include <stdlib.h>
#include <stdio.h>
#include <string.h>

static int min(int a, int b) {
    return (a < b) ? a : b;
}

/* Read a stream of binary PGM (P5, 8 bit) images from stdin, apply a
   3x3 erosion (minimum filter) to each and write them to stdout. */
int main(void) {
    int w = 0, h = 0, d = 0;

    /* No trailing whitespace in the format: it would also swallow
       leading pixel bytes that happen to be whitespace characters. */
    while (3 == fscanf(stdin, "P5 %d %d %d", &w, &h, &d)) {
        size_t size = (size_t)w * h;
        unsigned char *buf, *obuf;
        int l, c, dl, dc;

        fgetc(stdin);  /* single separator byte before the pixel data */
        buf = malloc(size);
        obuf = malloc(size);
        if (!buf || !obuf) exit(1);
        if (size != fread(buf, 1, size, stdin)) exit(1);

        /* Border pixels are not filtered: pass them through. */
        memcpy(obuf, buf, size);

        for (l = 1; l < h-1; l++) {
            for (c = 1; c < w-1; c++) {
                /* Minimum over the 3x3 neighbourhood. */
                int m = buf[l*w + c];
                for (dl = -1; dl <= 1; dl++)
                    for (dc = -1; dc <= 1; dc++)
                        m = min(m, buf[(l+dl)*w + (c+dc)]);
                obuf[l*w + c] = m;
            }
        }

        printf("P5\n%d %d\n%d\n", w, h, d);
        fwrite(obuf, 1, size, stdout);
        fflush(stdout);
        free(buf);
        free(obuf);
    }
    return 0;
}


Let this operate on images generated by "pdftoppm -gray -r 300" before
passing it to a DJVU converter.  Don't do this on lower resolutions.


Entry: Throwing away the right information
Date: Tue Jun  2 14:10:46 CEST 2009

While the computation mechanism explained in the previous posts might
be interesting as a mathematical toy, I'm not sure whether building
something like that is practical.  It requires a huge amount of
connections.

  A human brain contains about 10^11 cells with on average 7x10^3
  connections.  In log10 that's 11 and 3.85 with a ratio of 2.9

The thing is though, the architecture has failure built-in.  So
manufacturing failure should probably not throw off things too much.
It can be built with low yield, which might drastically reduce
fabrication costs, to a point where it might be feasible in a home
lab with something other than silicon.


Entry: Analog vs. Digital
Date: Tue Jun  2 14:24:28 CEST 2009

My original idea started with bringing back uniform error sensitivity
to digital systems: in a digital system, it is possible to get
error-free data transfer with an arbitrarily high probability.
However, when an error does occur, it is usually fatal.

This is in stark contrast with analog communication: errors will
degrade the signal, but are far from fatal.  The information is of a
different kind: there is always some error but "we can live with it"
and a little more error makes a little more annoyance, but no
fatalities are introduced abruptly.

Now, is there something in between these things?  Is it possible to use
the signal re-generation property of digital systems with graceful
degradation observed in an analog system?  In other words: contain
errors locally, but make sure that noise that gets promoted to signal
does not have global effect.

This seems to be different from error detecting/correcting codes:
these work well up to a certain noise level where they completely
fail.  This is more about representation of data.  About what a
_number_ really is.

Voting has this property, but it also is extremely wasteful for
representing "don't care".

TODO: make this a bit more formal and formulate the computation
properties as continuous statistics of a discrete (limit->inf) system.


Entry: Counting votes.
Date: Sun Jun  7 15:06:40 CEST 2009

Representing a non-negative integer as a number of 1 bits in a bit
vector allows the use of concatenation as addition.

Let's investigate the effect of dropping 1 bit at random from a length
n bit vector.  More specifically, given the expected value of the
number of 1 bits in a vector, calculate this value after dropping one
bit at random.
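
A sketch of that calculation in Python, using exact fractions.  If the
vector has k ones, the dropped bit is a 1 with probability k/n, so the
expected count becomes k - k/n = k(n-1)/n and the density k/n is
unchanged on average.  By linearity the same factor (n-1)/n applies to
the expected value.  The function names are mine:

```python
from fractions import Fraction


def drop_one_bit(n, k):
    """Distribution of the number of 1 bits after deleting one uniformly
    chosen position from an n-bit vector that contains k ones."""
    dist = {}
    if k > 0:
        dist[k - 1] = Fraction(k, n)      # a 1 was dropped
    if k < n:
        dist[k] = Fraction(n - k, n)      # a 0 was dropped
    return dist


def expected_ones(dist):
    """Expected number of 1 bits under the given distribution."""
    return sum(ones * p for ones, p in dist.items())


n, k = 8, 3
e = expected_ones(drop_one_bit(n, k))
print(e)             # 21/8, i.e. k * (n - 1) / n
print(e / (n - 1))   # 3/8, the same density as k / n
```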


Entry: Sigma-Delta analog synth
Date: Tue Jun  9 11:13:03 CEST 2009

I think Sigma-Delta[1] modulation is a great idea.  The idea behind
this method is what triggered the previous posts.  It has been part of
my lingering background noise for a long time, influencing a lot of
the ideas that went into the Sheep synth.  This modulation scheme has
an aesthetic appeal I can't really describe.

So, to get to know the modulation form a bit better, I'd like to build
a modular analog synth using this representation for signals, by
patching up a couple of PIC and discrete logic chips.  However, this
is probably more suited to FPGA implementation.


Let's see what operations are supported:

  * switch/mix:     interleave two streams
  * distorted amp:  perform AND/OR on an unbiased signal
  * inversion:      XOR with a stream of known average
  * differential:   pulse->edge: amplitude to frequency conversion.

Some real-world hacks on top of this:

  * A->D:           PIC's 12-bit AD


Modulation

Instead of using higher order filters as in an SD tailored to
accommodate a low-noise band of arbitrary shape, we use an n-bit
wrap-around accumulator as a discrete integrator.

To convert an n-bit signal to this representation, simply accumulate
it.  This yields a "digital phasor" which rotates around 2^n states
over time.  Observing the carry bit of the accumulation as a binary
signal yields a signal with the same average as the original signal.
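
The accumulator can be sketched in a few lines of Python.  This is an
illustration under my own naming (sd_modulate, nbits), with inputs
assumed to be integers in 0 .. 2^n - 1:

```python
def sd_modulate(samples, nbits=8):
    """First-order modulation: accumulate the input in an nbits-wide
    wrap-around register and emit the carry bit of each addition."""
    mask = (1 << nbits) - 1
    acc = 0
    bits = []
    for s in samples:
        acc += s
        bits.append(acc >> nbits)   # carry out of the n-bit adder
        acc &= mask                 # wrap around
    return bits


# A constant input of 64 out of 256 gives one 1 bit in every four.
bits = sd_modulate([64] * 256)
print(sum(bits))                    # 64
print(sd_modulate([128] * 6))       # [0, 1, 0, 1, 0, 1]
```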

The reason this encoding can be used as a representation of audio is
that the error is small for low frequencies (where it matters).
Almost all encoding errors are present in the higher frequency bands.
Using higher order integrating filters will give better results if the
objective is good quality audio.  See [1] for more information.

However, for us the simplicity of the representation is more
important, since we want to use it as a computation substrate.  Its
connection to frequency modulation is of key importance.


Demodulation

Converting from a bit stream to a signal stream can be done using
accumulation and subsampling.  Some information is lost in the
modulated form (high frequency spectral content) and when converting
back to a stream of numbers, we'll lose some more.
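
A sketch of accumulate-and-subsample in Python, assuming the simplest
possible choice: non-overlapping blocks of a fixed length (a boxcar
filter).  The name sd_demodulate and the block parameter are mine:

```python
def sd_demodulate(bits, block):
    """Count the 1 bits in consecutive blocks of the given length.
    Each count is one output sample in the range 0 .. block; a trailing
    incomplete block is discarded."""
    usable = len(bits) - len(bits) % block
    return [sum(bits[i:i + block]) for i in range(0, usable, block)]


print(sd_demodulate([0, 1, 0, 1, 1, 1, 1, 1], 4))   # [2, 4]
```

Longer blocks give more amplitude resolution per sample but a lower
output rate, which is the bandwidth that gets thrown away.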


Averages: 0, 1 and 1/2

There are 3 special signals to distinguish:

  signal   average
  0000...   0
  0101...  1/2
  1111...   1

The all-zero signal corresponds to a phasor that doesn't change.  The
alternating 0/1 signal corresponds to a phasor in a binary
oscillation, which corresponds to an input signal of 2^(n-1).  The
all-one signal however is a limit case and cannot be generated by the
phasor mechanism.  The maximum is a signal with a single 0 bit in
every 2^n bits, reflecting the maximal input signal amplitude of
2^n-1.

To simplify interfacing to the analog world, let's call 0101.. the
reference signal (zero).  This allows 0000.. to be the clip min and
1111.. to be the clip max amplitude.


Statistics

In order to calculate with SD modulated signals and prove some of
their properties we need to clearly define how such binary sequences
relate to sequences of real numbers.  It is remarkable that magnitude
is encoded in a single event stream.  How to fish out the magnitude
info?

In first approximation we forget about time altogether and interpret a
binary sequence as draws from a random variable.  This allows us to
work with the expected value defined as:

    E(a) = \sum p(a=x) x

With x ranging over the values {0,1} this simply reduces to

    E(a) = p(a=1)

This mathematical model connects to the messy physical world by
observing that a finite average of a sequence of a_i samples
approximates E(a).  Anything we say about the statistical property
E(a) in some sense carries over to what the estimator produces.  In
practice this means that the sequences can represent time-varying low
frequency signals by cutting off the integration at a particular low
frequency.

    
Attenuation

The logical AND of two signals behaves as attenuation.  A 0 in one
signal can cancel the transmission of a 1 in the other.  As long as
this doesn't lead to cases where a disproportionate number of 1s is
canceled due to correlation of the two bitstreams, the effect on the
expected value is to be attenuated by an amount proportional to the
expected value of the other signal.  

Representing AND as real number multiplication on the set {0,1} allows
the expression of the expected value of the product

    E(ab) = \sum p(a_x, b_y) a_x b_y
 
for events a_x and b_y ranging over {0,1}.  Of these 4 terms only
p(a=1,b=1) produces a nonzero contribution.  This can be simplified to
E(a) E(b) if the random variables are independent[3] meaning

    p(a_x, b_y) = p(a_x) p(b_y).
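
A quick numerical illustration in Python, as a sketch.  It uses
pseudo-random bit streams instead of real SD streams, which is an
assumption made to get independence for free:

```python
import random


def bernoulli_stream(p, n, rng):
    """n independent bits, each 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def mean(bits):
    return sum(bits) / len(bits)


rng = random.Random(0)          # fixed seed, so runs are repeatable
a = bernoulli_stream(0.5, 100000, rng)
b = bernoulli_stream(0.3, 100000, rng)

# Independent streams: AND behaves like multiplication of averages.
independent = mean([x & y for x, y in zip(a, b)])

# Fully correlated streams: AND of a with itself is just a again,
# so the average is E(a) and not E(a)^2.
correlated = mean([x & y for x, y in zip(a, a)])

print(independent)              # close to 0.5 * 0.3 = 0.15
print(correlated == mean(a))    # True
```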

Note that "time slots" erased in a signal can be filled up with those
from another one, which might be created through an AND operation in a
similar manner.  This then provides a means for implementing addition
without the need for recoding.  This naturally leads to consider the
interpolation of two signals a and b, by a third signal x

    OR(AND (x,a), AND(~x,b))

This fills up all available time slots by multiplexing the signals a
and b proportionally to E(x).  A special case of this happens when b =
~a, leading to the XNOR (inverted XOR) of two signals

    XNOR(x,a) = OR(AND(x,a), AND(~x,~a))

This interpolates between a signal and its polarity reversal.
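
The multiplexer can be checked exactly in Python by summing over all
8 input combinations, assuming the three bits are independent.  The
function names and the probabilities used are only for illustration:

```python
from fractions import Fraction
from itertools import product


def mux(x, a, b):
    """OR(AND(x,a), AND(~x,b)) on single bits."""
    return (x & a) | ((1 - x) & b)


def expected_mux(px, pa, pb):
    """Exact E(mux) for independent bits with P(x=1)=px, P(a=1)=pa,
    P(b=1)=pb."""
    total = Fraction(0)
    for x, a, b in product((0, 1), repeat=3):
        weight = ((px if x else 1 - px) *
                  (pa if a else 1 - pa) *
                  (pb if b else 1 - pb))
        total += weight * mux(x, a, b)
    return total


px, pa, pb = Fraction(1, 4), Fraction(1, 2), Fraction(1, 3)
print(expected_mux(px, pa, pb) == px * pa + (1 - px) * pb)   # True

# With b = ~a the multiplexer output is the complement of XOR(x,a).
print(all(mux(x, a, 1 - a) == 1 - (x ^ a)
          for x, a in product((0, 1), repeat=2)))            # True
```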

Note that multiplication with factors >1 is not possible locally.  The
signal would need to be demodulated into a form where extra energy can
be added, which would then result in this being spread out over time.
However, it might be possible to construct some kind of buffered
mechanism that merges two streams, ensuring conservation of energy.


Decorrelation

Signals coming from an SD modulator are very repetitive, which means
that they can become correlated, breaking down multiplication.  In
order for this to work some form of decorrelation must be added.
Every modulator could be equipped with a dither module, which would
also improve the SNR.


An analogy

(Thu Oct 19 22:21:22 CEST 2006) Think of an SD-modulator as a bus stop
where on each timestep (say 10 minutes) a bus either leaves or not. A
bus only leaves if it's full, and the number of people that arrive in
a single time step is never greater than the capacity of the bus. This
way, the bus stop never needs to accommodate more than one bus worth
of people. Once I got this picture in my head I couldn't get rid of it
any more. It clearly illustrates the conservation properties of a bus
stop, which implies the SDM preserves averages and low frequency
signals.


References

I've read some papers by Philips people about the SACD..  It included
some operations that can be performed on an SACD stream without
decoding it.  What I remember is the use of interleaving to mix two
streams.  For other operations they always used (de / re) modulation
around an ordinary n-bit representation.  I found this[2] on Chuck
Moore's site, which is related but talks about analog domain VCOs.

[1] http://en.wikipedia.org/wiki/Delta-sigma_modulation
[2] http://colorforth.com/immunity.htm
[3] http://en.wikipedia.org/wiki/Statistical_independence


Entry: Symmetries of boolean functions
Date: Tue Jun  9 14:55:19 CEST 2009

Inspired by the idea from the Sheep synth that only XOR and AND are
interesting ways of combining oscillators, here is some investigation
of the why of that.

There are 2^4 boolean functions that map 2 inputs to one.  There are
only a couple of interesting ones after you remove 0/1 morphism.  In
the following, the squares indicate their Karnaugh maps[1].

    |  A
    |  0 1 
----+--------
B 0 |  . .
  1 |  . .

Simply looking at the Karnaugh maps already suggests a number of
equivalence classes:

Select class: 1/3

NAND  OR    =>    <=   
1 1   0 1   1 0   1 1  
1 0   1 1   1 1   0 1  

AND   NOR   N=>   N<=
0 0   1 0   0 1   0 0
0 1   0 0   0 0   1 0

Transform class: 2/2 diag

XOR   <=>
0 1   1 0
1 0   0 1

Drop class: 2/2 non-diag

A     B     !A    !B
0 1   0 0   1 0   1 1
0 1   1 1   1 0   0 0

Ignore class: 4/0

0     1
0 0   1 1
0 0   1 1

Each class is a set of boolean functions which is closed under
inversion of one of the 2 inputs, or the output of the function.  The
classes can be constructed from a single representative f.  With i_o,
i_a, and i_b inversion or identity, the functions 

  g(a,b) = i_o(f(i_a(a), i_b(b)))

make up the class.  The maximum number of elements is 2^(n+1), which
occurs when all combinations of inversion and identity yield
a distinct function.
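
The construction can be run mechanically.  Below is a Python sketch
that builds the class of every 2->1 function using inversions only
(function names are mine).  Without input permutation, A/!A and B/!B
come out as two separate classes of 2 elements each; swapping the
inputs is what merges them into the single Drop class above:

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))   # (a, b) in a fixed order


def inversion_class(f):
    """All truth tables g(a,b) = i_o(f(i_a(a), i_b(b))), where each of
    i_o, i_a, i_b is either identity (0) or inversion (1)."""
    tables = set()
    for ia, ib, io in product((0, 1), repeat=3):
        tables.add(tuple(io ^ f(a ^ ia, b ^ ib) for a, b in INPUTS))
    return frozenset(tables)


def all_classes():
    """Partition of the 16 functions into inversion classes."""
    classes = set()
    for table in product((0, 1), repeat=4):
        # Bind the table through a default argument.
        f = lambda a, b, t=table: t[2 * a + b]
        classes.add(inversion_class(f))
    return classes


print(sorted(len(c) for c in all_classes()))   # [2, 2, 2, 2, 8]
print(len(inversion_class(lambda a, b: a & b)))   # 8 = 2^(n+1) for n = 2
```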

Let's define "interesting" functions to be the ones that belong to
large equivalence classes.  In this sense only AND and its symmetries
are interesting.  Note that this class includes => (implication).
Putting it as "only implication is interesting" makes it sound a bit
more profound.

Instead of looking at inversions, it's also possible to look at input
permutations.  For the 2->1 case this seems to be a closed operation
in all the classes (mirror around the diagonal).  In fact, in the 3->1
case we need to include permutation symmetry.


So what about 3->1 functions?  Is there a similar way to classify them
as interesting/boring using the same classification (inversion +
permutation)?

There are 2^(2^3) = 256 such boolean functions.  They can be
visualized as 2x2x2 Karnaugh cubes or folded open to one side as a
flat map:

    |  AB    :
    | 00 10  : 11 01 
----+--------:------
C 0 |  .  .  :  .  .
  1 |  .  .  :  .  .
             :
            fold

For 3->1 functions one just needs to look at the morphisms of the
2x2x2 binary cube.  With what we find from the 2->1 case, there are
some things we can expect:

1. single select (3AND, 3OR, ...)
2. embeddings of the 2->1 functions
3. irreducible structures

Here 3. is expected to be an analogue of non-separable convolution
kernels: increasing dimensionality introduces new structures that
can't be written simply in terms of lower-dimensional ones.

Looking at the 2x2x2 binary cube and trying to fill it with a pattern
that can be oriented in many different ways makes me think of an
L-shape like this:

x x . .
x . . .

It can be placed in one of 8 corners, with each taking 3 possible
orientations, leading to 24 symmetries.  However, this one is "flat".
It is independent of one of the inputs.  Rotating one plane of the
2x2x2 cube 90 degrees with respect to another one (like you would do
with a Rubik's cube) gives this configuration:

. x x .
x . . .

Which is no longer flat.  It is like an L that maximally occupies
space in the cube.  This turns out to be the function (class) that's
used in Wolfram's rule 110[2].

These two functions are in distinct classes (their difference is
_essential_ as the former doesn't use one of the inputs, while the
latter does).  However, they have the same number of symmetries:

       8 x 3 x 2 = 48

This follows from 8 different positions (the corners of the cube), 3
different orientations to fix one of the other 3 points at, and the
inversion of the pattern.

This might not be a coincidence.  Are they maybe lines and planes of a
projective plane, thereby being dual?

For 3-input, binary, 1D CAs, Wolfram identifies 88[3] fundamentally
different rules.  Why are there so many?  For CAs the order and
polarity matters a great deal.  Due to recursive application, there is
little more than left-right symmetry (256->128) + some symmetry in
degenerate cases.

Anyways.  How to look for such "interesting" functions automatically?
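
One brute-force way, sketched in Python: enumerate all 256 truth
tables, apply every combination of input permutation, input inversion
and output inversion, and collect the resulting orbits.  Truth tables
are stored as 8-bit integers, so the integer 110 is rule 110.  The
helper names are mine:

```python
from itertools import permutations

N = 3   # number of inputs


def transform(table, perm, in_mask, out_inv):
    """Truth table of g(x) = out_inv ^ f(permute(x) ^ in_mask).
    Bit i of a table is the function value at input index i."""
    result = 0
    for i in range(1 << N):
        j = 0
        for k in range(N):
            if (i >> k) & 1:
                j |= 1 << perm[k]       # move input bit k to perm[k]
        j ^= in_mask                    # invert the selected inputs
        bit = ((table >> j) & 1) ^ out_inv
        result |= bit << i
    return result


def orbit(table):
    """All functions reachable from table by the symmetries."""
    return frozenset(transform(table, p, m, o)
                     for p in permutations(range(N))
                     for m in range(1 << N)
                     for o in (0, 1))


def classes():
    """Partition of all 3->1 functions into equivalence classes."""
    seen, result = set(), []
    for t in range(1 << (1 << N)):
        if t not in seen:
            o = orbit(t)
            seen |= o
            result.append(o)
    return result


cs = classes()
print(len(cs))                   # number of classes
print(max(len(c) for c in cs))   # size of the largest class
print(len(orbit(110)))           # size of the class of rule 110
```

The "interesting" functions in the sense used above are then simply
the members of the largest classes in this list.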

Let's look at the interpretation of functions in this class.  After
all, this is simple logic, so there is natural-language intuition to
connect to.

       _ __ _
_  AB AB AB AB
C   .  x  x  . 
C   x  .  .  .

(A & B & C) | (~B & ~C)
  = (A & B & C) | ~(B | C)

Hmm not much there..


What practical predictions to make?  Not more than hunches really.  I
suspect that a representative of this function will lead to an
interesting modulation scheme for a 3-oscillator synth.  This can
probably be generalized to higher dimensional functions.

Does giving up permutation give special status to the individual
functions?  This might also be usable in sound synthesis, where one of
the modulators has a different function than the other sounds.


[1] http://en.wikipedia.org/wiki/Karnaugh_map
[2] http://mathworld.wolfram.com/Rule110.html
[3] http://www.wolframscience.com/nksonline/page-57-text?firstview=1


Entry: Music and almost integers
Date: Tue Jun  9 20:38:33 CEST 2009

An almost integer[1] is a number that is very close to an integer, but
isn't an integer.  These can be found in great number by applying
trigonometric functions to integers, ending up close to integers.  

(cos 22) -> -0.9999608263946371 != -1

This particular one is related to the approximation of PI as a ratio
22/7.  Similar things can happen with exponentials of ratios ending up
close to other simpler ratios.  This phenomenon is very much related
to tonal music, which is built on rational approximations to
irrational numbers.

Simultaneously sounding pure tones produce harmony, which can be
described as regular beat patterns.  This happens when frequency
ratios are ratios of small integers.  This is then tied into melody by
employing the ratios that make sense for simultaneously sounding tones
to space out frequencies of tones intended to be played sequentially,
creating scales.  Melody is evolution of tone frequency over time,
where repetition and proximity play an important role.  This makes
melody and harmony combine nicely yielding tonal music.

However, this picture only works approximately.  Somehow the human
brain likes this kind of non-exactness.  Some intervals just aren't
as pure as others, and this can be used as an expressive device.

I.e. a perfect fifth[2] has a ratio of 3/2.  In an equal tempered
scale this is approximated by 2^(7/12).  Applying this ratio 12 times
produces the circle of fifths[3] and ends up exactly 7 octaves higher.

However, in just intonation this isn't quite so:

(define (pow x n) (if (zero? n) 1 (* x (pow x (sub1 n)))))

(pow 3/2 12) 
-> 531441/4096 = 129.746337890625 
!= 2^7 = 128
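
The size of the mismatch (the Pythagorean comma) can be computed
exactly.  A Python sketch using rational arithmetic; the variable
names are mine:

```python
from fractions import Fraction

just_fifth = Fraction(3, 2)
tempered_fifth = 2 ** (7 / 12)          # equal tempered approximation

twelve_fifths = just_fifth ** 12        # 531441/4096
seven_octaves = Fraction(2) ** 7        # 128

# Ratio by which 12 just fifths overshoot 7 octaves.
comma = twelve_fifths / seven_octaves

print(twelve_fifths)                    # 531441/4096
print(comma)                            # 531441/524288
print(float(comma))                     # about 1.0136
print(tempered_fifth)                   # slightly below 1.5
```

Spreading this comma evenly over the 12 fifths is what equal
temperament does: each fifth is flattened by the same small factor.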

This is solved by making some of the fifths that construct the scale
less exact, favouring the fourth[4] (4/3), major third[5] (5/4) and
minor third[6] (6/5) to build harmonically sounding chords.


[1] http://mathworld.wolfram.com/AlmostInteger.html
[2] http://en.wikipedia.org/wiki/Perfect_fifth
[3] http://en.wikipedia.org/wiki/Circle_of_fifths
[4] http://en.wikipedia.org/wiki/Perfect_fourth
[5] http://en.wikipedia.org/wiki/Major_third
[6] http://en.wikipedia.org/wiki/Minor_third


Entry: Filters in SD space.
Date: Wed Jun 10 12:46:49 CEST 2009

In [1] it was shown that Sigma-Delta modulated signals can be
attenuated and mixed using local operations as long as the energy
never increases, and as long as signals remain independent.  This post
elaborates on the subject bringing back the concept of time.


Filters

To implement filters I see two possibilities: continuous time filters
(integrators) and discrete time filters (based on time delay).
Time-delay filters are possible with the current representation as
long as independence of time-shifted signals is maintained.  However,
this might not be so straightforward to accomplish.  Following my
hunch I think it's better to go for the idea of directly modeling
integration as an operation on the SD bitstream.

Is it possible to re-introduce the concept of time?  Note that up
until now we have talked only about expected values of random
variables, and ignored time completely, using some hand-waving that
"short time averaging" turns the sequence back into an analog signal.

With a concept of time it is probably possible to introduce the
concept of integration, or at least that of a bandlimited integration
(1-pole low pass filtering == an opamp), operating directly on an SD
bitstream.


Integration

This operation is about re-introducing time in the binary sequence.
Integration can no longer be expressed as the expected value of a
stream, as that is a single number.  Some clever way of turning the
bit sequence back into a sequence of numbers (or a continuous
function!) should lead to a way to express the integration operation.

It seems to me that leaving the discrete domain as soon as possible is
the way to approach the problem.  The signal s(t) is hidden in the
bitstream as follows:

    a(t) = \sum a_i p(t-i) = s(t) + n(t)

Where p(t) is a square pulse with height one, duration 1.  What is the
effect of integrating s(t) on a_i?

It might be better to approach this problem from the other side.  What
can we do operating on the bitstream a_i to produce something that
behaves like integration in the low band?  Is there a way to do this
without recoding?

The key problem is that in order to be able to generate a bitstream, a
local representation of the accumulated state is necessary, to be able
to use this number to space out pulses.  At first sight it does look
like there is no way around a demodulation/process/modulation scheme.
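
For reference, a sketch in Python of what such a
demodulation/process/modulation scheme could look like for the
bandlimited integrator.  Everything here is an assumption made for
illustration: the name sd_lowpass, the 8-bit state, and the shift
based 1-pole filter:

```python
def sd_lowpass(bits, nbits=8, shift=4):
    """Bitstream in, bitstream out.  Each input bit is expanded to a
    full-scale sample, passed through a 1-pole low pass filter, and
    re-modulated with a wrap-around accumulator."""
    full_scale = (1 << nbits) - 1
    y = 0      # filter state, stays within 0 .. full_scale
    acc = 0    # modulator phasor
    out = []
    for b in bits:
        x = full_scale if b else 0      # demodulate: bit -> sample
        y += (x - y) >> shift           # process: y += (x - y) / 2^shift
        acc += y                        # modulate: accumulate ...
        out.append(acc >> nbits)        # ... and emit the carry
        acc &= full_scale
    return out


# A step from all zeros to all ones: the output density rises slowly.
out = sd_lowpass([0] * 64 + [1] * 1024)
print(sum(out[:64]))       # 0
print(sum(out[64:128]))    # density during the rise
print(sum(out[-64:]))      # density after settling
```

The truncating shift stops the state a little short of full scale, so
the settled output is not quite all ones.  The local state y is
exactly the "accumulated state" that the bitstream itself cannot
carry.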


Additional Questions

* It looks like noise is going to be the essential element to
  translate between the discrete (bitstream) and the continuous
  (statistics of the bitstream).

* Is a square wave, edge triggered representation actually useful?
  Instead of sending out the carry bit as a binary signal, sending the
  carry bit as a state change generates a square wave
  frequency-encoded signal.  This signal represents a full-scale
  square wave in the SD domain.

* Can we represent complex multiplication as a means to introduce sine
  and cosine?  The answer seems to be yes as long as magnitude is <1.
  Quantify this.

* Since decorrelation is so important, can't we just use
  (incompressible) information transmitted alongside the signal as an
  extra digital channel?

* Since we're digital and thus error-less, can we maybe construct
  differential equations based on differentiation instead of
  integration?  Does this make sense at all?

* Maybe the only question to ask is: Is it possible to implement
  differential equations using a S-D integrator and negative feedback?
  Is such a thing stable?  Both in the digital domain (non-linear
  limit cycles) and the statistical domain.

[1] entry://20090609-111303


Entry: SD and edge-triggered representation
Date: Wed Jun 10 15:35:37 CEST 2009

What is nice about the SD form is the ease with which it can be
integrated into mixed analog/digital electronics.  In fact, it can be
attached directly to a set of speakers, or any analog circuit with
low-pass characteristic.

However, I wonder if moving to an edge-only representation will make
things simpler.  This will give a pure FM representation of a signal:
value directly proportional to square wave frequency.  It eliminates
an element from the data representation: the width of a pulse.  This
is essentially an arbitrary component and can be easily reconstructed
later.

The important question is: does it make things really simpler?

Let's re-interpret the elementary operations explored in [1] in terms
of square wave signal.

FM signals are trivially generated from an accumulator as the
accumulator's MSB.  Converting an FM signal to analog can be done in
many ways [2], but the simplest ways seem to convert the edge signal
back to a pulse signal either unclocked using a one-shot, or clocked
using a bit delay.

Since a signal's value is directly proportional to the density of
transitions, addition is represented by the XOR operation.  The
advantage here is that addition is "dumb".  There is no gating
required as in the pulsed case.
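
A Python sketch of this, with made-up edge patterns.  The XOR of two
square waves toggles wherever exactly one of the inputs toggles, so
the transition counts add as long as edges don't coincide.  Coincident
edges cancel in pairs:

```python
def edges_to_square(edge_bits, start=0):
    """Turn a pulse stream into a square wave: toggle on every 1."""
    level, out = start, []
    for e in edge_bits:
        level ^= e
        out.append(level)
    return out


def count_transitions(levels, start=0):
    """Number of level changes, starting from the given initial level."""
    prev, n = start, 0
    for v in levels:
        if v != prev:
            n += 1
        prev = v
    return n


a_edges = [1, 0, 0, 1, 0, 0, 0, 1]    # 3 edges
b_edges = [0, 1, 0, 0, 0, 1, 0, 0]    # 2 edges, none shared with a
c_edges = [1, 0, 1, 0, 0, 0, 0, 0]    # 2 edges, one shared with a

a, b, c = (edges_to_square(e) for e in (a_edges, b_edges, c_edges))

print(count_transitions([x ^ y for x, y in zip(a, b)]))   # 5 = 3 + 2
print(count_transitions([x ^ y for x, y in zip(a, c)]))   # 3 = 3 + 2 - 2
```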

This seems simpler, but just as the pulse representation, this might
lead to loss of information due to aliasing.  However, the form of
aliasing is different and consists of glitches.  Proper "de-glitching"
needs some clocked logic, and can only be done exactly if there is
room in the output signal (if signals don't change during de-glitching
pulse width).  This brings us back to the complexity of dealing with
bit slots as in the SD case.

Attenuation could be implemented statistically as "selectively
forgetting" to toggle the switch.  There isn't much difference here.

So, it looks like most operations can again be implemented using
simple logic gates, conditionally on the uncorrelated nature of the
input signal.  I'm not really convinced about this since it seems to
be harder to qualify: the extra work that needs to be done in a
clocked/pulsed representation seems to bring at least some order that
is harder to reconstruct in the FM case.  The FM case however is
probably simpler to interface to special purpose (non-clocked) analog
circuits.

Conclusion: FM representation is in some respects simpler, but harder
to control using statistical time-multiplexing tricks as described in
[1].  Since conversion between the two formats is rather
straightforward in clocked logic, it seems that sticking with SD is
best.  Then FM can be used when it is convenient, i.e. when
interfacing to an unclocked or asynchronously clocked circuit.

[1] entry://20090609-111303
[2] http://en.wikipedia.org/wiki/Demodulation#FM_radio


Entry: Using finite fields for music
Date: Wed Jun 10 18:57:46 CEST 2009

One of those ideas that disappeared into the background..  Let's try
again.  A finite field[1] or Galois field is a field[2] with p^n
elements, where p is a prime number.  There is only one GF(p^n) up to
isomorphism.  The field GF(p^n) is the field of polynomials over GF(p),
modulo an irreducible polynomial of degree n.

For music applications, it is probably possible to look at Galois
fields in the same way as one would use the complex number field to
represent sum-of-sines.  Multiplicative cycles can be used as
different rhythmic/harmonic components, and each element of the field
could be associated to an instrument, or a combination of instruments.

For music we need the numbers 2, 3, 5 and 7 in copious quantities.
Larger prime numbers are less interesting from a harmony/rhythm point
of view.  Given that we're looking for a certain combination of small
prime numbers in the factorization of the multiplicative group, we can
start typing things into a calculator and see if we end up at prime
powers.

  2^a 3^b 5^c 7^d = p^n - 1

A more systematic way is probably in order, i.e. generating[5] all
prime powers below a certain number and investigating the factors
of the multiplicative group order.  However, to illustrate the idea
let's just pick a few starting from the factorization.  The first
candidate is GF(211) which fits nicely in 8 bits.

  2 . 3 . 5 . 7 = 211 - 1

Another one that fits in 8 bits is GF(241).

  2^4 . 3 . 5 = 241 - 1
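
The more systematic search could look something like the sketch below.
The limit of 255 (one byte) and all function names are my own choices
for illustration.

```python
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def prime_powers(limit):
    """All p^n <= limit, as (q, p, n) tuples."""
    out = []
    for p in range(2, limit + 1):
        if is_prime(p):
            q, n = p, 1
            while q <= limit:
                out.append((q, p, n))
                q, n = q * p, n + 1
    return sorted(out)

def smooth_exponents(m, primes=(2, 3, 5, 7)):
    """Exponents (a, b, c, d) if m = 2^a 3^b 5^c 7^d, else None."""
    exps = []
    for p in primes:
        e = 0
        while m % p == 0:
            m, e = m // p, e + 1
        exps.append(e)
    return tuple(exps) if m == 1 else None

for q, p, n in prime_powers(255):
    exps = smooth_exponents(q - 1)
    if exps is not None:
        print(f"GF({p}^{n}) = GF({q}): q - 1 has exponents {exps}")
```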


The question is: how to make this interesting?  We can generate
periodic sequences of numbers using formulas like:

   x_n = \sum a_i g_i^n

where g_i are generators of cyclic subgroups, and a_i are arbitrary
field elements.  The first question is: given a particular
representation of a GF, how many such sequences exist?  Suppose we
associate a certain configuration of drum sounds to each element in
the field.  How many patterns can be generated?  Are they at all
interesting?  What about maps from the elements to a smaller number of
drums?  

Maybe combinations of drums can be associated to terms in a
polynomial.  Is there an interesting way to combine a non-prime field
with cycles this way?  

Maybe the interesting stuff lies in the symmetries of a field[7]?
I.e. start with a simple rhythm.  Find a (small) field to embed it in,
then find variations of this by looking at the field's automorphisms.

How can a GF be visualized?  The structure of the additive group is
quite straightforward: for GF(p^n) it is a product of n cycles of
length p.  How can the multiplicative group be shown on top of this?
There's a visualization of GF(3^2) here[3].  Visualizing the group as
actions on a set seems like a good idea.  Looks like a nice
application for graph visualization[4].

Intuitively it seems that fields with small p and n of the same order
of magnitude will contain a more interesting structure.  Instead of a
single flat space (p^1) the structure is a vector space where each
dimension is a separate entity.  It allows the representation of
instruments/notes (p) that can sound together in chords (n).

I.e. 5 tones, 3-note chords gives 5^3 = 125 elements with a
multiplicative group of 124 = 2^2 . 31 elements.  More concisely:

order   factorization
          add   mul
----------------------------
125       5^3   2^2 . 31
343       7^3   2 . 3^2 . 19

As a side note, maybe Galois Theory[7] is interesting to study in its
own right, as it is connected to polynomials which are next to
matrices the bread and butter of numerical math.


[1] http://en.wikipedia.org/wiki/Finite_field
[2] http://en.wikipedia.org/wiki/Field_(mathematics)
[3] http://finitegeometry.org/sc/9/3x3.html
[4] http://en.wikipedia.org/wiki/Graph_drawing
[5] http://www.research.att.com/~njas/sequences/A000961
[6] http://en.wikipedia.org/wiki/Cyclic_group
[7] http://en.wikipedia.org/wiki/Galois_theory


Entry: Binary Sound Synthesis
Date: Wed Jun 17 19:22:43 CEST 2009
Type: tex

This is an account of a journey into ``Binary Sound Synthesis'' (BSS),
which started early 2005 as a test application\footnote{The original
  synthesizer ran early 2005 on a PIC12F628, an 8 bit microcontroller
  with 6 i/o pins which I had configured to run at 1 MIPS. The current
  one runs since mid 2006 on a PIC18F1220, an 8 bit micro with 18 pins
  and a slightly more powerful ISA.  The original synth is a
  synchronous one running at $8$kHz. At each sampling point the output
  state of $3$ oscillators is determined. The control updates run at
  $200$Hz, while the note updates run at $8$Hz, which is a $1/4$ beat
  (a $1/16$ note) at $120$bpm. The current one uses 3 asynchronous
  hardware timers, which allows a higher maximal rate of $2$MHz.  In a
  nutshell, the synth contains 3 main algorithms: cross modulation
  (XOR mixer), a formant synth (AND/OR mixer), and an LFSR
  pseudo-random sequence generator. It uses 3
  oscillators that can be synchronized.  Most of the interesting parts
  are in the controller that reconfigures the oscillators.} for a
Forth compiler for the 8--bit Microchip PIC architectures.  By BSS
I mean the process of generating audible sound from digital
square--wave signals using an absolute minimum of logic or code.

This is old stuff.  The digital approach dates from the era of early
8--bit game machines, and as such a lot of the algorithms are no
longer used.  Currently there are few reasons to not use floating or
fixed point math with PCM outputs.

The point of this paper is partly to archive and document old
techniques, and to shine an idiosyncratic light on the matter.  I find
this a fascinating subject, probably because it is so different from
the standard real and complex number based signal processing.

In BSS the main focus is on the time component, since it is the timing
between switching a speaker on and off which becomes the only means of
controlling the sound.

\section{Cycles}

The simplest sounds we can produce are based on periodic square
waveforms.  An oscillator can be implemented using a frequency
divider or pulse counter, which generates one output event for each
$k$ input events.  Multiple such oscillators can then be connected to
a hardware timer interrupt.

Mathematically, this can be related to addition in the ring of
integers modulo $k$.  We'll denote this algebraic structure as
$\ZZ_k$, the cyclic group of order $k$.  This section talks about such
groups, as they will appear as multiplicative groups of Galois Fields.


\subsection{Counters and Cyclic Groups}

Cyclic groups are groups, necessarily abelian, that can be generated
by a single element. In our canonical representation, this element
will be $1$ in $\ZZ_k$. If $k+1$ is a power of a prime, $\ZZ_k$ is the
multiplicative group of the Galois Field $\GF(k+1)$, which we will
encounter later. All finite abelian groups are direct products of
cyclic groups of prime power order, but not all abelian groups are
cyclic, the prototypical example being the Klein $4$--group
$\ZZ_2 \times \ZZ_2$.

The exact relation between a group and a counter is as follows. We use
additive notation because all the groups are abelian.  To each
counter we associate a group $\G$, a generator $g\in\G$, and a state
$s\in\G$. On each input event, the state $s$ will be replaced with
$s+g$. Whenever the state reaches $s=0$, the unit element of the
group, an output event is generated. The division factor is the order
of $g$, which is the smallest integer $o$ for which
$\underbrace{g+g+\ldots+g}_{o \text{ times}} = 0$. 

For example in $\ZZ_{6}$ where $\{0,6,12,\ldots\}$ denote the same
element, the generator $1$ has order $6$, the generator $2$ has order
$3$ and the generator $3$ has order $2$.  
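
The counter described above can be sketched in a few lines of Python.
The names \verb|order| and \verb|counter_events| are made up for this
illustration; the state update is just $s \leftarrow s+g$ in $\ZZ_k$.

```python
def order(g, k):
    """Order of g in Z_k: number of steps until the state returns to 0."""
    s, steps = g % k, 1
    while s != 0:
        s, steps = (s + g) % k, steps + 1
    return steps

def counter_events(g, k, n):
    """Output of the divider over n input events: 1 whenever s reaches 0."""
    s, out = 0, []
    for _ in range(n):
        s = (s + g) % k
        out.append(1 if s == 0 else 0)
    return out

print([order(g, 6) for g in (1, 2, 3)])
print(counter_events(2, 6, 9))
```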

\subsection{Composite Numbers}

Groups with prime order are not so interesting to us, since they generate
only one kind of cycle. What interests us most is the combination of
timers. Let's have a look at the example of highly degenerate groups,
which have a number of elements equal to a composite number like the
number of seconds in a week $604800 = 2^7 3^3 5^2 7$.

Note that from the size of an arbitrary group we can't necessarily
deduce its structure. However, requiring that the group is cyclic is
enough, since there's only one of a given size. This makes cyclic
groups look a lot like positive integers. A cyclic group can be
constructed explicitly as a product of cyclic groups of prime power
order, ensuring the component groups have orders that are mutually
coprime. This is again analogous to how positive integers behave. A
point of difference, however, is that cyclic groups of prime power
order are not a product of smaller cycles, but they do contain the
cyclic groups of every smaller power of the same prime as subgroups.

The number above gives the group
$$\ZZ_{2^7} \times \ZZ_{3^3} \times \ZZ_{5^2} \times \ZZ_{7},$$ which
consists of $4$--tuples with elements from the respective groups.
This group is essentially the same as $\ZZ_{2^7 3^3 5^2 7}$. 
I'll denote the group order as
$$\#\G=\prod_{n=1}^{N} p_n ^ {m_n},$$ and $\G$ as the abstract group,
where $\ZZ_{\#\G}$ and $\times_{n=1}^{N} \ZZ_{p_n^{m_n}}$ are two
isomorphic representations.
In our representation of $\G$, the cyclic subgroup of order $p_n^{m_n}$ in
$\G$ can be generated from the element
$$g_n = {\#\G \over p_n^{m_n}},$$ giving an injective homomorphism from
$\ZZ_{p_n^{m_n}}$ to $\G$. Combining $N$ such homomorphisms allows the
construction of an isomorphism
$$f : \times_n \ZZ_{p_n^{m_n}} \to \ZZ_{\#\G} : (x_1, \ldots, x_N) \mapsto
\sum_n x_n g_n.$$ This leads us to the more concrete observation that
an oscillator based on $\G$ can be implemented either as a single
counter $\ZZ_{\#\G}$, using a single generator, or a ``wired or'' of $N$
counters $\ZZ_{p_n^{m_n}}$, each with its own generator, where the
whole acts as a single frequency divider which only generates an event
if all counters have $s_n=0$.
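
To make this concrete, here is a small sketch for $\#\G = 60 = 2^2
\cdot 3 \cdot 5$ (a size I picked to keep the loops short). It checks
that the map $f$ hits every element of $\ZZ_{60}$ exactly once, and
that three small counters stepping in parallel are all zero again only
after $60$ ticks. The function names are hypothetical.

```python
from math import prod

def embed(xs, moduli):
    """Map (x_1, ..., x_N) to sum x_n g_n in Z_{#G}, with g_n = #G / p_n^{m_n}."""
    size = prod(moduli)
    return sum(x * (size // m) for x, m in zip(xs, moduli)) % size

def first_event(moduli):
    """Ticks until N unit-step counters are all back at s_n = 0."""
    states, t = [0] * len(moduli), 0
    while True:
        states = [(s + 1) % m for s, m in zip(states, moduli)]
        t += 1
        if not any(states):
            return t

moduli = (4, 3, 5)  # small example: #G = 2^2 * 3 * 5 = 60
images = {embed((a, b, c), moduli)
          for a in range(4) for b in range(3) for c in range(5)}
print(len(images), first_event(moduli))
```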

\subsection{Musical Scale}

This makes one wonder if it's not possible to construct a group
that is very degenerate, in a way that it
produces a relatively well spread out ``frequency spectrum'' which
maps well to the (logarithmic) frequency resolution of human hearing.
Keeping the number of large prime factors small, it might even be
possible to create some kind of musical scale with inherent
``smoothness''. Probably, if there are enough small primes, large
prime numbers are not really necessary since frequency resolution for
large numbers is less of a problem.

An example is $d = 604800 = 2^7 3^3 5^2 7$, the number of
seconds in a week. Moreover, $d+1$ is prime, so it has an associated
Galois field, but this is no longer a polynomial field.

In order to investigate the usefulness of this, we could plot
out the possible frequencies that can be generated using this scheme
on a logarithmic scale. On the other hand, using fields might be
overkill. Using just cyclic groups (counters) might be better here.


Given a degenerate cyclic group $\G$, how many different (cyclic)
subgroups does it have? All subgroups of cyclic groups are cyclic, and
there is exactly one subgroup for each divisor of $\#\G$, so the
subgroups correspond one to one to the combinations of prime
factors of $\#\G$. This leads us to the number of possible subgroups
$$\prod_{n=1}^{N} (m_n+1),$$ since the subgroups of $\ZZ_{p^m}$ are $\{\ZZ_{1},
\ZZ_{p}, \ldots, \ZZ_{p^m}\}$, which totals $m+1$.

The subgroups, or numbers representing the orders, can be arranged in a
$N$ dimensional cuboid, addressed with coordinates $(x_1,\ldots,x_N)$
where each coordinate is limited to $0 \leq x_n \leq m_n$. The order
can be computed as
$$\#\G(X) = \#\G(x_1, \ldots, x_N) = \prod_{n=1}^{N} p_n^{x_n},$$ which
corresponds to the subgroups $\G(X) = \times_{n=1}^{N} \ZZ_{p_n^{x_n}}$ of $\G$.

The thing we are interested in is the distribution of $\#\G(X)^{-1}$,
since it represents the frequencies we can generate by
dividing a master clock, which would be the CPU clock for example. All
frequencies that can be generated in this way have a fairly simple
harmonic relationship, making it possible to generate smooth scales
and chords. 
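
A quick way to look at this distribution is to enumerate the cuboid
directly. The sketch below does that for $604800$ and converts the
division ratios to octaves; the dictionary layout and the function
name are just assumptions for the example.

```python
from itertools import product
from math import log2, prod

factors = {2: 7, 3: 3, 5: 2, 7: 1}  # 604800 = 2^7 3^3 5^2 7

def subgroup_orders(factors):
    """Orders of all cyclic subgroups, one per point X of the cuboid."""
    primes = list(factors)
    ranges = [range(m + 1) for m in factors.values()]
    return sorted(prod(p ** x for p, x in zip(primes, X))
                  for X in product(*ranges))

orders = subgroup_orders(factors)
# Division ratios on a log2 (octave) scale, relative to the master clock.
octaves = [log2(o) for o in orders]
print(len(orders), orders[:12])
```

Plotting \verb|octaves| would show how evenly the available divisions
cover the logarithmic frequency axis.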

\subsection{Playing with Subgroups of $\ZZ_{p^m}$}

Given a divider $\ZZ_{2^m}$, obtaining one for $\ZZ_{2^{m'}}$ with
$m'<m$ is really straightforward since the event $s=0$ can be computed
modulo $2^{m'}$. This can be implemented as checking for the zero
condition after performing an AND operation with the mask $2^{m'}-1$.

What about the case $p \neq 2$ where the modulo operation is no longer
trivial? What happens if we just apply the AND operation anyway? For
example, the $7$ element cycle can give the following sequences modulo $2^n$.
\begin{align*}
2^3 &\to 0,1,2,3,4,5,6 \\
2^2 &\to 0,1,2,3,0,1,2 \\
2^1 &\to 0,1,0,1,0,1,0 \\
2^0 &\to 0,0,0,0,0,0,0 \\
\end{align*}
which gives frequencies $1/7$, $2/7$, $4/7$ and $7/7$. Instead of
taking modulo and comparing to zero to generate events, it is possible
to access state bits directly and obtain a waveform which changes at
the moment of the events described above.
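
The table is easy to reproduce. In this sketch ``modulo $2^n$'' is read
as masking the low $n$ bits with a bitwise AND, and the frequency is
the number of zero events per $7$ step cycle; the function name is
made up.

```python
def masked_cycle(period, bits):
    """States of a `period`-step counter seen through an AND mask of `bits` bits."""
    mask = (1 << bits) - 1
    return [s & mask for s in range(period)]

for bits in (3, 2, 1, 0):
    seq = masked_cycle(7, bits)
    print(bits, seq, f"{seq.count(0)}/7")
```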

The hack I'm after here is to see whether it makes sense to have a
couple of static oscillators running and to make noises by directly
accessing some of their state bits. The configurable parts are then
just the generators used to update the counters. For example, one
counter with period $256=2^8$, one with $255=17^1 5^1 3^1$, one with
$225=3^2 5^2$, one with $245=5^1 7^2$, $250=2^1 5^3$ etc\ldots.

%[1] http://en.wikipedia.org/wiki/Cyclic_group

\section{Binary Signals}

This section deals with the analysis tools available to talk about
binary signals, and some algorithms to actually generate certain
classes of sounds.

\subsection{Generating Functions}


There's not much freedom to combine the output of binary oscillators.
The most well--behaved logic operation is XOR, since it preserves all
timing information (all transitions).  AND and OR act as conditional
off/on gates respectively. It is ok to leave out OR, since it is
equivalent to AND for inverted signals. The resulting operations are
addition (XOR) and multiplication (AND) in the field of integers
modulo $2$, also known as $\GF(2)$. Using this as the base field for
analysis of bit sequences $s_k$ allows the use of generating functions
$$s(x) = \sum_k s_k x^{k}.$$

It's usually more convenient to use single sided sequences, meaning
the sum runs over $k \geq 0$, or $s_k = 0$ for $k<0$.
Let's have a look at some oscillators. A pulse train with
period $P$ is given by $$s_P(x) = \sum_{k \geq 0} x^{Pk} = { 1 \over
1 + x^{P}}.$$
Any finite signal represented by the polynomial $s_0(x)$ of degree
$<P$ gives rise to a periodic signal by convolving it with the pulse
train above or
$$s(x) = { s_0(x) \over 1 + x^{P}}.$$ The polynomial $s_0(x)$ here
acts as a finite response filter for the signal $s_P(x)$.  A finite
pulse of \emph{degree} $D$, with which I mean $D+1$ consecutive ones,
is represented by
$$s_D(x) = 1 + x + \ldots + x^{D} = {1 + x^{D+1} \over 1 + x}.$$
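
These identities can be checked mechanically by storing a polynomial
over $\GF(2)$ as an integer bit mask (bit $k$ is the coefficient of
$x^k$). This is only a sketch; the helper names and the values $P=5$,
$D=3$ are arbitrary.

```python
def gf2_mul(a, b):
    """Carry-less product of two GF(2) polynomials stored as bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    return r

def gf2_series_div(num, den, n):
    """First n coefficients of num(x)/den(x) as a power series; den(0) must be 1."""
    out = []
    for k in range(n):
        bit = (num >> k) & 1
        out.append(bit)
        if bit:
            num ^= den << k
    return out

P, D = 5, 3
pulse = (1 << (D + 1)) - 1                      # 1 + x + ... + x^D
print(gf2_series_div(1, 1 | (1 << P), 12))      # pulse train of period P
print(gf2_series_div(pulse, 1 | (1 << P), 12))  # D+1 ones, repeated every P
```

Multiplying \verb|pulse| by $1+x$ with \verb|gf2_mul| gives
$1+x^{D+1}$, which is the closed form of the finite pulse.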

\subsection{Difference equations}

It is possible to use exactly the same tools for dealing with binary
sequences as is customary to do with sequences of real or complex
numbers, which means using addition and multiplication of polynomials
and power series to represent addition and convolution of
sequences. More specifically, solutions to linear difference equations
can be expressed as rational functions over $\GF(2)$.

For example, the difference equation, with $D$ the unit delay operator,
$$D^2 y = D y + y + u,$$
expressed in terms of the generating functions becomes
$$(x^2 + x + 1) y(x) = u(x),$$ which expresses $y(x)$ as the signal
$u(x)$ filtered by 
$${1 \over x^2+x+1} = {x + 1 \over x^3 + 1} = (x+1) \sum_n x^{3n},$$
which is $\sum_n h_n x^n$, with $h_n$ periodic with period $3$. To
find the sequence corresponding to any rational function, compute the
partial fraction expansion in terms of the irreducible factors
$p_i(x)$ of the denominator, and perform the computation above, which
is to find the minimal degree polynomial $s_i(x)$ such that
$p_i(x)^{-1} = s_i(x) (x^P + 1)^{-1}$. Then $P$ will be the period of
the sequence generated by $p_i(x)^{-1}$. Above $x^2+x+1$ is already
irreducible.

So, what we see is that the solution of a linear difference equation
in $\GF(2)$ is determined by an impulse response (filter), which can
be written as a sum of periodic sequences, where each such sequence
corresponds to an irreducible polynomial $p(x)$. The period $P$ of
such a sequence is the order of the element $x$ in the field of
polynomials modulo $p(x)$. These sequences play the role of
\emph{damped sinusoids} related to difference equations in
$\mathbb{R}$.
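
As a sanity check, the impulse response of $1/p(x)$ and the order of
$x$ modulo $p(x)$ can both be computed with a few shifts and XORs. A
sketch, with polynomials stored as bit masks and hypothetical function
names:

```python
def impulse_response(p, n):
    """First n coefficients of 1/p(x) over GF(2); p is a bit mask with p(0) = 1."""
    num, out = 1, []
    for k in range(n):
        bit = (num >> k) & 1
        out.append(bit)
        if bit:
            num ^= p << k
    return out

def order_of_x(p):
    """Smallest P > 0 with x^P = 1 modulo p(x); needs p(0) = 1 and degree >= 1."""
    deg = p.bit_length() - 1
    r, P = 1, 0
    while True:
        r <<= 1                # multiply by x
        if (r >> deg) & 1:     # reduce modulo p(x)
            r ^= p
        P += 1
        if r == 1:
            return P

print(impulse_response(0b111, 9), order_of_x(0b111))
```

For $x^2+x+1$ (mask \verb|0b111|) both views agree on period $3$.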

\subsection{Exponentials}

% To get an analogue of the notion of \emph{complex damped exponential}
% as solutions to linear difference equations, we need to work in a ring
% extension of $\GF(2)$. It can't be a field extension in general, since
% all field extensions of $\GF(2)$ have $2^n$ elements. This makes their
% multiplicative group carry $2^n-1$ elements, so $2$ does not divide
% the largest cycle. Hence we need to move to a different structure to
% represent even period cycles.

To get an analogue of the notion of \emph{complex damped exponential}
let's have a look at the splitting field of a single irreducible
polynomial
\begin{align*}
{x^2+x+1}
&= {(x + p)(x + q)} \\
&= {x^2 + (p + q) x + pq}.
\end{align*}
The extra field elements $p$ and $q$ satisfy $p + q = 1$ and $p q =
1$.  Eliminating $p$ from these equations we obtain $(1+q)q = 1$ or
$q^2 + q + 1 = 0$, which means the terms $0,1,q,q+1$ themselves are
polynomials in $q$ modulo $q^2+q+1$.  These elements form the
splitting field of the polynomial $x^2+x+1$, the quotient of the
polynomial ring $\GF(2)[x]$ by the maximal ideal generated by
$x^2+x+1$.  To see that a periodic signal is indeed a sum of
``complex'' exponentials, we compute a partial fraction decomposition
as an example.
\begin{align*}
{1 \over x^2+x+1} 
&= {1 \over (x+q)(x+q^{-1})} \\
&= {1 \over (1+q^{-1}x)(1+qx)} \\
&= {a \over 1+q^{-1}x} + {b \over 1 + qx} \\
&= \sum_k aq^{-k} x^k + \sum_k bq^{k} x^k
\end{align*}
which is indeed a sum of two exponential sequences $a_k=aq^{-k}$ and
$b_k=bq^{k}$ with values in the splitting field.  The values $q^{-1}$
and $q$ are the ``signal poles'' and the field elements $a$ and $b$ are
the analogue of amplitudes or phases.
The multiplicative group of the field has order $3$ which means both
exponentials are of this period, as is their sum.  This periodicity
can be seen directly by
\begin{align*}
{{1} \over {x^2+x+1}}
&= {{1 + x} \over {1 + x^3}} \\
&= (1 + x) \sum_k {x^{3k}}
\end{align*}
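
The decomposition can be checked numerically in $\GF(4)$. In the sketch
below the four field elements are stored as the bit masks $0$, $1$,
$q=2$, $q+1=3$. The amplitudes $a=q^{-1}$ and $b=q$ are what I get from
solving $a+b=1$ and $aq+bq^{-1}=0$ for this example; the function names
are made up.

```python
Q = 2  # q as a bit mask: elements of GF(4) are 0, 1, q = 2, q + 1 = 3

def gf4_mul(a, b):
    """Multiply in GF(4) = GF(2)[q] / (q^2 + q + 1); addition is XOR."""
    r = 0
    if b & 1:
        r ^= a
    if b & 2:
        r ^= a << 1
    if r & 4:          # reduce q^2 -> q + 1
        r ^= 0b111
    return r

def gf4_pow(a, k):
    r = 1
    for _ in range(k % 3):   # the nonzero elements form a cycle of length 3
        r = gf4_mul(r, a)
    return r

Q_INV = gf4_pow(Q, 2)        # q^{-1} = q^2 = q + 1
a, b = Q_INV, Q              # amplitudes solving a + b = 1, a q + b q^{-1} = 0
h = [gf4_mul(a, gf4_pow(Q_INV, k)) ^ gf4_mul(b, gf4_pow(Q, k))
     for k in range(9)]
print(h)
```

The sum of the two exponentials only takes the values $0$ and $1$, and
repeats the coefficients of $1+x$ with period $3$.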

% Constructing the splitting field of an arbitrary polynomial can be
% done recursively, so in the end we'll end up with a field $\GF(2^m)$
% over which the polynomial splits into a product of linear factors.

\subsection{Degeneracy}

What happens if we have periodicities that cannot be represented as
subcycles of the multiplicative group of any extension of $\GF(2)$?
All multiplicative groups of the extensions of $\GF(2)$ have odd order,
so they can never have a period $2$ subcycle.

An example of such a sequence would be a period $2$ sequence
associated with the generating function $b(x)(1+x^2)^{-1} = b(x)\sum_k
x^{2k}$. The factor $b(x)$ doesn't matter much for our point, so we'll
choose it such that the partial fraction expansion takes a simple
form. The denominator can be factored as $(x+1)^2$, which
gives the expansion
$${1 \over x+1} + {1 \over (x+1)^2}$$ if we choose $b(x)=x$.
The first term is simply the period $1$ sequence
$\sum_k x^k$. What about the second one? It is the product of two such
sequences, which can be worked out as
$${1 \over (1+x)^2} = {1 \over 1+x^2} = 1+x^2+x^4+\ldots = \sum_k
x^{2k},$$ which is a period $2$ sequence. It can't be an exponential of
an extension field element, so what is it? It turns out it can be
expressed as
$$\sum_k (k+1)x^k.$$ This case is of course analogous to $\CC$
in the case of a double pole. One tends to forget about these, since
in practical signal processing applications they do not occur. This is
because the chance a random sequence drawn from a continuous
probability distribution over $\CC$ has a multiple pole is
zero. However, $\GF(2)$ is finite, so poles with multiplicities
greater than one are quite common. There is less room to avoid them!

Any period $2$ function has $2$ degrees of freedom, so these two terms
form a basis for those signals. In general, a period $N$ signal can be
reduced to sum of $N$ base terms in the splitting field of $x^N+1$.
We could think of this as a Fourier basis.  

A bit more about symbols like $(k+1)x^k$. Here the coefficient $k+1$ is
not a field element; the whole expression is a function of $x$ and $k$
which is defined as
$$(k+1)x^k = \underbrace{x^k + x^k + x^k \ldots + x^k}_{k+1 \text{ terms }}.$$
In a similar way we can define the expression 
$${(k+1)(k+2) \over 2}x^k = \underbrace{1x^k + 2x^k + \ldots + (k+1)x^k}_{k+1
\text{ terms }}.$$ The last one can also be understood as
$$\underbrace{x^k + 0 + x^k + 0 + x^k + \ldots}_{k+1 \text{ terms,
including $0$ terms }}$$ This clearly shows how periods $2^m$ can be
constructed.

More generally, the terms in the partial fraction expansion for a
multiple pole $p$ in the splitting field are most conveniently
expressed as
$$(1 + px)^{-n} = \sum_k \binom{-n}{k} (px)^k$$ 

If the total period $N$ is a multiple of a power of $2$, or
$N=2^m M$, all poles will have a multiplicity which is a multiple of
$2^m$, since $$x^N+1 = x^{M 2^m}+1 = (x^M+1)^{2^m}.$$

For a period $2^n$ signal the only pole is $p=1$.  In this case the
bit patterns of the base functions take a particularly simple form,
which consists of a rotated, periodic Sierpinski triangle. 
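
To see the triangle, note that dividing by $1+x$ over $\GF(2)$ is a
running XOR, so the coefficients of $(1+x)^{-n}$ are $n$ running XOR
sums of a unit impulse. A minimal sketch (function name and print
layout are my own):

```python
from math import comb

def inv_power_series(n, length):
    """Coefficients of (1 + x)^(-n) over GF(2): n running XOR sums of an impulse."""
    seq = [1] + [0] * (length - 1)
    for _ in range(n):
        acc, out = 0, []
        for bit in seq:
            acc ^= bit
            out.append(acc)
        seq = out
    return seq

for n in range(1, 9):
    row = inv_power_series(n, 16)
    print(n, "".join(".#"[bit] for bit in row))
```

Each row agrees with the parity of the binomial coefficient
$\binom{n+k-1}{k}$, which is the absolute value of $\binom{-n}{k}$.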


\subsection{Roots of Unity}

% misschien moeten de coefficienten niet uit een veld komen?

There's another way to tell the story --- one which does enable any
periodic signal to be expressed as a sum of exponentials. However, we
need to focus on a different algebraic structure. Using the approach
above, the space in which to express the exponentials is a ring
constructed as a product of splitting fields of irreducible
components, combined with ``doubling up'' due to multiple poles. This
is rather convoluted. It might be easier to directly focus on the
structure generated by the roots of unity.


Suppose we can introduce the number $\omega$ so it's possible to
factor $$1+x^N = \prod_k (1+\omega^kx).$$ Here $\omega$ is a generator
of the group $\ZZ_N$, which we have encountered before. The group
contains all the cycles with period $p_n^k$ with $k \leq m_n$, since
it can be seen as a product of the cyclic groups
$\ZZ_{p_n^{m_n}}$. Here we assume the prime factorization
$$N = \prod_n p_n^{m_n} = 2^{m_1} 3^{m_2} 5^{m_3} \ldots$$

Given any signal with period $N$, represented by the degree $N-1$ polynomial
$s(x)$, the partial fraction expansion in terms of roots of unity is
\begin{align*}
{ \sum_k s_k x^k  \over 1+x^N}
&= \sum_n {S_n \over 1 + \omega^{n} x} \\
&= { \sum_n S_n \sum_k (\omega^{nk})x^k \over 1+x^N} 
\end{align*}
which, after identifying coefficients of $x^k$ gives the relation
$$s_k = { \sum_n S_n \omega^{nk} },$$ clearly identified as the
\emph{discrete Fourier transform} or DFT, in abstract form.  To have a
look at this transform, we move away from the field of rational
functions over $\GF(2)$ which we used to express difference equations,
and consider only the ring $\GF(2)[x] / (1+x^N)$ of polynomials modulo
$1+x^N$ as a vector space.

The relation above is an automorphism of the vector space $\GF(2)^N$,
relating its two representations we shall call $x$--space and
$\omega$--space.  It is also a ring isomorphism since it dually maps
the operations of multiplication and convolution, to which I'll return
later.

But how can we see $\omega$ more concretely? In what space do $\omega$
and the $S_n$ live? Clearly they do not live in $\GF(2)$, or in any
extension field of $\GF(2)$ in the more general case where $N$ is
even.  Let's have a look at the example $N=6=2^1 3^1$. In this case it
is possible to represent the group generated by $\omega$ by an element
of $\GL(4,\GF(2))$, the general linear group of the $4$--dimensional
vector space over the field $\GF(2)$. If we choose
$$\omega = \matrix{ 1 & 1 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 1 \\ 0
& 0 & 1 & 0 }$$ it serves the purpose as a primitive root of unity,
since $\omega^6$ is the identity, but lesser powers are all different.
The top $2 \times 2$ diagonal block of $\omega$ is a generator of
$\ZZ_2$, and the bottom one is a generator of $\ZZ_3$, making this a
generator of the cyclic group $\ZZ_6$ written as a direct product of
the two. Note however that the $6$ elements $\omega^n$ are not
linearly independent in this representation: the minimal polynomial
of $\omega$ is $(x+1)^2(x^2+x+1)$, which has degree $4$, so the linear
combinations of the $\omega^n$ give only $2^4=16$ elements. Equipped
with the matrix product they form a ring, and the map
$x \mapsto \omega$ is a homomorphism from the ring of polynomials
mentioned above onto this ring, but not an isomorphism. A faithful
representation needs $N$ dimensions, for example the circular shift
operator on $\GF(2)^6$. In general, it is always possible to construct a
matrix representation like this.
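
The properties of this particular $\omega$ are easy to check by brute
force. The sketch below computes the powers of the matrix over
$\GF(2)$ and the rank of the space they span; in my check the six
powers are distinct but span only $4$ dimensions. All helper names are
made up for the example.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] & b[k][j] for k in range(n)) & 1 for j in range(n)]
            for i in range(n)]

def gf2_rank(vectors):
    """Rank over GF(2) of bit vectors stored as integers."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)   # clear the leading bit of b from v if set
        if v:
            basis.append(v)
    return len(basis)

def flatten(m):
    """Pack a 0/1 matrix into one integer so it can be treated as a bit vector."""
    return int("".join(str(x) for row in m for x in row), 2)

omega = [[1, 1, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 1],
         [0, 0, 1, 0]]
identity = [[int(i == j) for j in range(4)] for i in range(4)]

powers = [identity]
for _ in range(6):
    powers.append(matmul(powers[-1], omega))

print(powers[6] == identity, gf2_rank([flatten(p) for p in powers[:6]]))
```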

Now, what is the relation between this way of analyzing periodic
signals, and the one explained above?


%% The structure to examine periodicity is really the ring $\GF(2)[x] /
%% (1+x^N)$.

%% % cyclotomic fields?

\subsection{$\GF(2)[x] / (1+x^N)$}

What about this ring of polynomials? The extension fields mentioned
before can be reconstructed by dividing out some maximal ideals.

\subsection{Duality}

The automorphism of $\GF(2)[x] / (1+x^N)$ we called DFT above is a
particular one. We represented this map as a relation between
$x$--space and $\omega$--space. It is easy to see that these spaces
are dual with respect to the operations of \emph{componentwise
multiplication} and \emph{convolution}.

The exponential representation is exactly what's necessary to analyze
componentwise multiplication of sequences, which can be seen here as
masking or gating one signal using another, implemented by bitwise
AND.

Given two periodic sequences $a_k$ and $b_k$ with periods dividing $N$,
the componentwise multiplication is expressed as
\begin{align*}
c_k 
&= a_k b_k  \\
&= (\sum_i A_i \omega^{ki}) (\sum_j B_j \omega^{kj}) \\
&= \sum_l \omega^{kl} \sum_{i+j = l} (A_i B_j) \\
&= \sum_l C_l \omega^{kl}
\end{align*}
where the sum $i+j=l$ is taken modulo $N$. It is clear that the
coefficients $C_l$ are obtained as a circular convolution of $A_i$ and
$B_j$.
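
For odd $N$ the duality can be tested inside an extension field. The
sketch below takes $N=3$ with $\omega=q$ in $\GF(4)$ (elements stored
as the bit masks $0,1,2,3$), so that the inverse transform needs no
$1/N$ factor. This choice of $N$ and all function names are
assumptions for the example.

```python
def gf4_mul(a, b):
    """Multiply in GF(4); elements are the bit masks 0, 1, q = 2, q + 1 = 3."""
    r = (a if b & 1 else 0) ^ ((a << 1) if b & 2 else 0)
    return r ^ 0b111 if r & 4 else r

N, W = 3, 2                      # period and primitive cube root of unity (w = q)
W_POW = [1, W, gf4_mul(W, W)]    # w^0, w^1, w^2

def xor_sum(values):
    total = 0
    for v in values:
        total ^= v
    return total

def synthesize(S):
    """s_k = sum_n S_n w^(nk)."""
    return [xor_sum(gf4_mul(S[n], W_POW[n * k % N]) for n in range(N))
            for k in range(N)]

def analyze(s):
    """S_n = sum_k s_k w^(-nk); the factor 1/N is 1 here since N is odd."""
    return [xor_sum(gf4_mul(s[k], W_POW[-n * k % N]) for k in range(N))
            for n in range(N)]

def circular_convolve(A, B):
    return [xor_sum(gf4_mul(A[i], B[(l - i) % N]) for i in range(N))
            for l in range(N)]

a, b = [1, 1, 0], [0, 1, 1]
c = [x & y for x, y in zip(a, b)]          # gating a with b
print(analyze(c), circular_convolve(analyze(a), analyze(b)))
```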

\subsection{Frequency analysis}

It's nice to be able to use rational functions over $\GF(2)$ to reason
about binary signals, but they don't say much about what we actually
hear. We definitely don't hear bit sequences. Much of our hearing is
based on the cochlear resonance, and a lot of the subtleties of the
binary sequences we can produce will get lost if the machinery in our
inner ear can't make sense of it. That's exactly what we use to our
advantage to generate ``noise'' using periodic bit sequences, which
seem to be to $\GF(2)$ what sinusoids are to $\mathbb{R}$.

Some good old Fourier series in $\mathbb{C}$ might help us on the road
to some interesting patterns. Let's compute the Fourier series of a
periodic bit sequence $b_k \in \{0,1\}$ with period $N$, here
represented as a function $b(t) = \sum_{k=0}^{N-1} b_k(t)$, with
$b_k(t) = b_k e(t-k)$, the unit step impulse $e(t) = u(t) - u(t-1)$,
where $u(t)$ is the Heaviside step function. This gives the Fourier
coefficients
\begin{align*}
h_n &= \int_0^N b(t) e^{-{i 2\pi n t  \over N}}dt \\
    &= \sum_{k=0}^{N-1} b_k \int_k^{k+1} e^{-{i 2\pi n t  \over N}}dt \\
    &= { iN (e^{-{i 2\pi n \over N}} - 1) \over 2 \pi n} 
       \sum_{k=0}^{N-1} b_k e^{-{i 2\pi n k  \over N}} \\
    &= { iN (\omega^n - 1) \over 2 \pi n} \sum_{k=0}^{N-1} b_k \omega^{nk}
\end{align*}
where we take $\omega = e^{-{i 2\pi \over N}}$, and of course $h_0 =
\sum_k b_k$. This is not the DFT of $b_k$, because the base functions
$b_k(t)$ are not Dirac impulses, and $n$ ranges over all
integers.  However, we can compute $N$ terms of this infinite sequence
from the DFT if we multiply it by the weighting function or spectral
envelope $w_n={ iN (\omega^n - 1) \over 2 \pi n}$. Taking the real and
imaginary parts gives scaling factors proportional to ${1 \over n}\sin{2 \pi
n \over N}$ and ${1 \over n}\left(\cos{2 \pi n \over N} - 1\right)$.
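
The closed form can be compared against a brute force numerical
integral. This is only a sketch using the standard library; the
function names, the midpoint rule and its step count are my own
choices.

```python
import cmath

def fourier_coeff(bits, n):
    """h_n of the piecewise-constant signal, via envelope times DFT sum."""
    N = len(bits)
    if n == 0:
        return complex(sum(bits))
    w = cmath.exp(-2j * cmath.pi / N)
    envelope = 1j * N * (w ** n - 1) / (2 * cmath.pi * n)
    return envelope * sum(b * w ** (n * k) for k, b in enumerate(bits))

def fourier_coeff_numeric(bits, n, steps=2000):
    """The same integral by a midpoint rule, as an independent check."""
    N = len(bits)
    dt = 1.0 / steps
    total = 0j
    for k, b in enumerate(bits):
        if b:
            for j in range(steps):
                t = k + (j + 0.5) * dt
                total += cmath.exp(-2j * cmath.pi * n * t / N) * dt
    return total

bits = [1, 1, 0, 1, 0, 0, 0]
for n in (0, 1, 2, 5):
    print(n, abs(fourier_coeff(bits, n)))
```

Note that the envelope vanishes whenever $n$ is a nonzero multiple of
$N$.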

% maybe we DO have such a binary parser?

Now, the question is, how to exploit possible means for generating bit
sequences $b_k$ to get to a desired or otherwise interesting $h_n$?


\subsection{Linear Transforms}

We already encountered linear transforms of vector spaces over
$\GF(2)$ to express the $N$th roots of unity, yielding a
representation of the ring of polynomials modulo $1+x^N$. Some more
about representations.

An example. The algebra of linear transforms $\GF(2)^{2 \times 2}$
contains $16$ elements. Any element $A$ in this algebra can be related
to its characteristic polynomial
$$a(x) = \determinant{a_{11} - x & a_{12} \\ a_{21} & a_{22} - x},$$
which has the general form
$$a(x) = a_2 x^2 + a_1 x + a_0.$$ Any element $A$ satisfies its own
characteristic polynomial or $a(A)=0$. The result of this is that the
ring $\GF(2)[x] / (a(x))$ can be embedded in the matrix algebra.

For $\GF(2)^{2 \times 2}$ there are two interesting subrings, apart
from the trivial ring and $\GF(2)$. The first one comes from
$x^2+x+1$, which is the characteristic polynomial of
$$A=\matrix{1 & 1 \\ 1 & 0},$$ giving the field $\GF(2^2)$, with
elements represented as the matrices $\{0,I,A,A^2=A+I\}$. This field
has the cyclic group $\ZZ_3$ as a multiplicative group, consisting of
$\{I,A,A^2\}$. The second one comes from $x^2+1$, which is associated
to the matrix
$$B=\matrix{1 & 1 \\ 0 & 1},$$ giving the ring $\GF(2)[x]/(x^2+1)$ with
elements $\{0,B,B^2=I,B+I\}$. This ring has the cyclic group $\ZZ_2$ as
a group of units, consisting of $\{I,B\}$, and a zero divisor $(B+I)^2 =
0$. Its multiplicative structure is that of the integers modulo $4$
under the map $\{(0,0),(I,1),(B,3),(B+I,2)\}$, but the additive
structure differs since $I+I=0$.
The matrix
$$\matrix{0 & 1 \\ 1 & 0}$$
gives the same structure, since it has the same characteristic polynomial.
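
These small claims are easy to verify by hand, or with a few lines of
code. In the sketch below matrices over $\GF(2)$ are tuples of rows;
the helper names are hypothetical.

```python
def mul(a, b):
    """Product of 2x2 matrices over GF(2), stored as tuples of rows."""
    return tuple(tuple(sum(a[i][k] & b[k][j] for k in range(2)) & 1
                       for j in range(2)) for i in range(2))

def add(a, b):
    return tuple(tuple(x ^ y for x, y in zip(ra, rb)) for ra, rb in zip(a, b))

I = ((1, 0), (0, 1))
Z = ((0, 0), (0, 0))
A = ((1, 1), (1, 0))   # characteristic polynomial x^2 + x + 1
B = ((1, 1), (0, 1))   # characteristic polynomial x^2 + 1

A2 = mul(A, A)
print(add(add(A2, A), I) == Z)            # A satisfies x^2 + x + 1
print(mul(A2, A) == I, mul(B, B) == I)    # orders 3 and 2
print(mul(add(B, I), add(B, I)) == Z)     # B + I squares to zero
```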


%% Practically, matrix operations over $\GF(2)$ amount to the computation
%% of parity bits, combined with AND operations, so it might be
%% interesting to have a look at how to do that properly.

\subsection{Sierpinski Triangle}

Let's have a look at some recursive combinations of the matrices
above. For example, call $B_0 = \left[ 1 \right]$ and
$$B_{n+1} = \matrix{ B_n & B_n \\ 0 & B_n}.$$
Squaring this gives
$$B_{n+1}^2 = \matrix{ B_n^2 & 0 \\ 0 & B_n^2} = I_{n+1}$$ which
doesn't bring us very far apart from saying that the upper triangular
arrangement of the Sierpinski triangle gives involutions. We could have
gotten here too by recursively constructing the antidiagonal from the
other matrix which generates $\ZZ_2$.

What about
$$A_{n+1} = \matrix{ A_n & A_n \\ A_n & 0},$$ where $A_1 = A$? This is
the symmetric arrangement of the triangle. 
Squaring this gives
$$A_{n+1}^2 = \matrix{ 0 & A_n^2 \\ A_n^2 & A_n^2},$$ and the third power gives
$$A_{n+1}^3 = \matrix{ A_n^3 & 0 \\ 0 & A_n^3} = I_{n+1},$$ which gives a
similar relation to the one for the iterated tensor product of $B$.
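
Both recursions can be checked for a few sizes with a direct
implementation. A sketch over $\GF(2)$ with nested lists; the names
\verb|upper| and \verb|symmetric| are mine.

```python
def block(tl, tr, bl, br):
    """Assemble a 2x2 block matrix from four equally sized square blocks."""
    return [a + b for a, b in zip(tl, tr)] + [a + b for a, b in zip(bl, br)]

def zeros(n):
    return [[0] * n for _ in range(n)]

def eye(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] & b[k][j] for k in range(n)) & 1 for j in range(n)]
            for i in range(n)]

def upper(n):
    """B_n: upper triangular Sierpinski arrangement, B_0 = [1]."""
    m = [[1]]
    for _ in range(n):
        m = block(m, m, zeros(len(m)), m)
    return m

def symmetric(n):
    """A_n: symmetric arrangement, A_1 = [[1, 1], [1, 0]]."""
    m = [[1, 1], [1, 0]]
    for _ in range(n - 1):
        m = block(m, m, m, zeros(len(m)))
    return m

for row in upper(3):
    print("".join(".#"[x] for x in row))
```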


\subsection{Walsh Functions}

Over $\RR$, Hadamard matrices are defined as the successive tensor
products of
$$H_2 = \matrix{ 1 & 1 \\ 1 & -1 },$$
where $H_1=\matrix{1}$ and 
$$H_{2^{n+1}} = H_2 \otimes H_{2^n} = \matrix {H_{2^n} & H_{2^n} \\
H_{2^n} & -H_{2^n}}.$$ Walsh matrices are obtained from these by
arranging the rows so that the number of sign changes is in increasing
order. This is called \emph{sequency} ordering. This permutation is
equal to bit reversal permutation, followed by Gray coding.

These matrices have the property $H_n H_n^T = nI_n$. This relation
also holds for matrices with integer coefficients. However, mapping
this directly to $\GF(2)$ renders the relation trivial, since all
matrix elements become $1$, which makes the matrices nilpotent. How can
these be related to $2^n$th roots of unity represented as matrices over $\GF(2)$?
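
For reference, the real valued construction in a few lines of Python.
Sorting rows by their number of sign changes is used here as a
shortcut for the sequency ordering; the function names are assumptions
for this sketch.

```python
def hadamard(n):
    """H_{2^n} by the recursion H_{2m} = [[H_m, H_m], [H_m, -H_m]]."""
    h = [[1]]
    for _ in range(n):
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def sign_changes(row):
    return sum(a != b for a, b in zip(row, row[1:]))

def gram(h):
    """H H^T over the integers."""
    return [[sum(x * y for x, y in zip(r, s)) for s in h] for r in h]

H = hadamard(3)
walsh = sorted(H, key=sign_changes)     # sequency ordering
print([sign_changes(r) for r in H])
```

Reducing the entries modulo $2$ turns every entry into $1$, which is
the degenerate case mentioned above.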

One obvious embedding of roots of unity is using the circular shift
operator. Are there any with a more interesting pattern, but still
with characteristic polynomial $x^N+1$ ?


\subsection{Oddities of $\GF(2)$}

In $\GF(2)[x]$, every irreducible polynomial other than $x$ itself has
a nonzero coefficient of $x^0$, so $p(x)+1 = xq(x)$. Then $q(x)=x^{-1} \mod
p(x)$. This gives easy access to the operations (approximating) left
and right shift of the bit vector of coefficients, which are
represented as multiplication by $x$ and $x^{-1}$ respectively.

The powers of the polynomial $x+1$ generate the binomial triangle. In
$\GF(2)[x]$ this is the highly symmetrical Sierpinski triangle. As a
consequence we have relations like $(x^a+x^b)^2 = x^{2a} + x^{2b}$.
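
Both remarks are easy to try out with polynomials stored as integer bit
vectors. This is only a sketch under that assumed representation (bit
$k$ holds the coefficient of $x^k$); the helper names are hypothetical.

\begin{verbatim}
```python
def times_x_plus_1(bits):
    """Multiply a GF(2)[x] polynomial by x + 1: shift and XOR."""
    return bits ^ (bits << 1)

def sierpinski_rows(count):
    """Coefficient bit vectors of (x + 1)^n for n = 0 .. count - 1."""
    rows, row = [], 1
    for _ in range(count):
        rows.append(row)
        row = times_x_plus_1(row)
    return rows

def clmul(a, b):
    """Carry-less product of two GF(2)[x] polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

if __name__ == "__main__":
    for r in sierpinski_rows(16):
        print(format(r, "b").replace("0", " ").replace("1", "*"))
```
\end{verbatim}

Squaring with \verb+clmul+ only spreads the bits out, because all
cross terms appear twice and cancel.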

% check wikipedia page on binomial triangle -> exponential
% what about matrices over GF(2)? what's to find there?
% most theorems in linear algebra / projective geometry
% exclude GF(2) as a base field, because it behaves so differently.


%% \subsection{Matrix Representation}

%% Over $\GF(2)$ polynomials can be represented by (infinte) matrices,
%% while polynomial fields can be represented by (finite) matrices.


\subsection{Application : Linear Noise Generators}

The cheapest way to generate noise for the application at hand is to
use exactly the impulse response of linear systems described by
difference equations as explained above. If the period of the bit
sequence is large enough, the human auditory pattern detection will
not be able to recover the redundancy.

Following the same algebraic trick as exposed above, suppose the
difference equation is expressed by $p(x)y(x) = u(x)$, and $u(x) = 1$,
corresponding to the impulse sequence $(1000\ldots)$, what we need to
do is to find out which period is related to $p(x)$. In other words,
we have to find the $s_p(x)$ with the lowest possible degree such that
$${1 \over p(x)} = {s_p(x) \over x^P + 1} = s_p(x) \sum_n x^{Pn} ,$$
where $P$ is the period of the bit sequence corresponding to
$p(x)^{-1}$.  This is equivalent to the congruence relation
$$x^P = 1 \mod p(x).$$ To make the smallest $P$ satisfying this
equation as large as possible, it is necessary to choose $p(x)$ to be
a \emph{primitive} polynomial, which means it is irreducible over
$\GF(2)$, and the powers $x^k\mod p(x)$ generate all nonzero elements
in the field modulo $p(x)$, also denoted as $\GF(2^n)$ with $n$ the
degree of $p$. The period is then $P = 2^n-1$.

%% In the quotient field $\GF(2)[x] / (p(x))$, the
%% element $x + (p(x))$ generates the entire multiplicative group. Here
%% $(p(x))$ denotes the ideal in $\GF(2)[x]$ generated by $p(x)$. The
%% multiplication is commutative, so it is always possible to construct
%% such a generator. Whether this generator is mapped to $x$ depends on
%% the polynomial $p(x)$.

The direct implementation of the difference equation is called a
\emph{linear feedback shift register} or LFSR. The next output is
computed as the sum of a certain number of taps.  I originally used a
$16$--bit LFSR with taps $(15,14,12,3)$, but the parallel variant is
simpler to implement provided a parallel XOR operation is available.

This parallel variant is related more directly to the Galois field
$\GF(2^n)$. More specifically, the output of the generator is taken to
be any coefficient of the polynomial sequence generated by $x$, or
$\sum_k s_{k,n}x^k = s(x)_n = x^n \mod p(x)$, where the output
sequence could be $s_{0,n}$.

To illustrate the difference between an irreducible and a primitive
polynomial, take the polynomials $x^8+x^4+x^3+x^2+1$ and
$x^8+x^4+x^3+x+1$. Both are irreducible, but only the former is
primitive. In the field of polynomials modulo the latter one,
the element $x+1$ is a generator, but $x$ is not. Note that these
fields are isomorphic, or essentially equal, but their representations
are different.

Implementing the parallel scheme for the former polynomial on a
digital computer is straightforward. Multiplication of the state
polynomial $s(x)$ by $x$ corresponds to a left shift of the register
containing the coefficient bits. The bit that's shifted off (the
coefficient of $x^8$) is replaced by adding the other terms
$x^4+x^3+x^2+1$ to the current state. In other words, when a bit
carries over, \verb+1Dh+ $=$ \verb+00011101b+ is XORed with the state.
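
The shift and conditional XOR step described above can be sketched as
follows. This is an illustration in Python rather than the code that
runs on the target; the function names and the choice of start state
$1$ are assumptions.

\begin{verbatim}
```python
def lfsr_step(state, poly_low=0x1D, nbits=8):
    """Multiply the state polynomial by x modulo x^nbits + poly_low."""
    state <<= 1
    if state >> nbits:                      # coefficient of x^nbits carried out
        state = (state ^ poly_low) & ((1 << nbits) - 1)
    return state

def period(poly_low, nbits=8, start=1):
    """Number of steps before the state returns to the start value."""
    state, count = lfsr_step(start, poly_low, nbits), 1
    while state != start:
        state = lfsr_step(state, poly_low, nbits)
        count += 1
    return count

def noise_bits(count, poly_low=0x1D, nbits=8, state=1):
    """Output stream taken from the coefficient of x^0."""
    out = []
    for _ in range(count):
        out.append(state & 1)
        state = lfsr_step(state, poly_low, nbits)
    return out

if __name__ == "__main__":
    print(period(0x1D), period(0x1B))
```
\end{verbatim}

Comparing \verb+period(0x1D)+ with \verb+period(0x1B)+ shows the
difference between the primitive and the merely irreducible polynomial:
only the first one visits all $255$ nonzero states.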


%% \subsection{Using a multiplier}

%% When a multiplier is present, a lot more intersting things can be
%% done.  One of them is the use of the linear congruential method for
%% random number generation.

%% \section{Resurrection}

%% Digging on the web.

\section{Other Discrete Structures}

%\subsection{Sampling and Formants}
% \subsection{Diadic Numbers}

\subsection{Phase and Frequency Modulation}

What happens when we modulate the generators of an additive subgroup,
or period of counters? Let's call the first \emph{phase} modulation
and the second \emph{frequency} modulation.

Using an oscillator with period $M$ and generator $1$ to generate the
generator of a second oscillator $F$ gives, as long as the first
counter has not wrapped around ($n < M$), the function
$$F_n = \sum_{i=0}^n i = {n(n+1) \over 2}\mod F.$$ 


\subsection{Cellular Automata}

Another interesting way to generate signals is by using cellular
automata. Using the ideas above, we could represent a finite row of
nonzero cells as a polynomial, and the update function as another
polynomial, giving rise to an update function in the ring $\GF(2)[x]$
or any finite field $\GF(2)[x] / (p(x))$.


\subsection{The NTT and the Fields $\GF(2^{2^n}+1)$}


For $0 \leq k \leq 4$ the Fermat numbers $F_k = 2^{2^k}+1$ are prime,
which means there exists a prime field $\F_k = \GF(F_k)$. This
field is isomorphic to $\mathbb{Z} / F_k \mathbb{Z}$, the integers
modulo $F_k$. 

Implementing this representation requires the implementation of a
modulo operation.  This can be quite expensive, especially on cheap
hardware lacking a fast hardware divider. However, it is possible to
embed a representation of $\F_k$ in the ring $\R_{k+1}$, where we take
$\R_k$ to be the ring represented by $\mathbb{Z} / M_{k} \mathbb{Z}$,
the integers modulo $M_{k} = 2^{2^k}-1$, where the $\{M_k : k \in
\mathbb{N}\}$ form a subset of the set of Mersenne numbers $\{2^n-1 :
n \in \mathbb{N}\}$.

The implementation of this ring relies on the reduction modulo $M_k$,
which is a cheap operation if $K=2^k$ is (a multiple of) the machine
wordsize. Suppose we want to reduce a number $[c:a]$ of $2K$ bits
modulo $M_k$, represented by the concatenation of two $K$ bit words
$c$ and $a$. This gives
\begin{align*}
[c:a]  &= 2^Kc+a \\ 
       &= (2^K-1)c+c+a \\
       &= c+a
\end{align*}
with operations modulo $M_k$. The consequence of this is that both
addition and multiplication in $\R_{k}$ can be performed by the
default finite wordlength unsigned integer operations, followed by
this very simple reduction step. This reduction is just addition due
to the availability of $c$ as a separately addressable entity, namely
the carry bit for unsigned addition and the high word for unsigned
multiplication.
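
The folding step $[c:a] \to c+a$ can be sketched as below. Python
integers are unbounded, so the word size \verb+K+ is only simulated
here; the value $16$ and the function names are assumptions made for
the example.

\begin{verbatim}
```python
K = 16                       # simulated word size in bits (assumed example value)
MASK = (1 << K) - 1          # low word mask, also the modulus 2^K - 1

def reduce_mersenne(value):
    """Reduce a value of up to 2K bits modulo 2^K - 1 by folding."""
    while value > MASK:
        value = (value >> K) + (value & MASK)    # [c:a] -> c + a
    return 0 if value == MASK else value         # all ones is the same as zero

def add(a, b):
    """Addition in the ring of integers modulo 2^K - 1."""
    return reduce_mersenne(a + b)

def mul(a, b):
    """Multiplication in the ring of integers modulo 2^K - 1."""
    return reduce_mersenne(a * b)
```
\end{verbatim}

The loop runs at most twice for a product of two reduced operands: the
first fold can produce a carry, the second fold absorbs it.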

So how to embed $\F_k$ in $\R_{k+1}$? The map
$$f_k : \R_{k+1} \to (2^{2^k} - 1)\R_{k+1} : x \mapsto x 2^{2^k - 1}
(2^{2^k}-1) = x 1_k$$ will do just that. Here 
$$(2^{2^k} - 1)\R_{k+1} = \{(2^{2^k} - 1)m : m \in \R_{k+1}\},$$
which we will call $I_k$, is the ideal generated by $2^{2^k} - 1$ in the ring
$\R_{k+1}$. The element
$$1_k = 2^{2^k - 1}(2^{2^k}-1)$$ is the representative in $\R_{k+1}$
of the multiplicative unit of $\F_k$. It can be used to generate the
additive group, which gives us a clear relation between integers and
the representation in $\R_{k+1}$ of their associated elements in $\F_k$.

So how can we see this is a representation of $\F_k$? It is easy to
see that $f_k$ is a group homomorphism for the additive group of
$\R_{k+1}$ since $f_k(a+b) = (a+b)1_k = a1_k + b1_k = f_k(a) +
f_k(b)$. To see that $f_k$ is a ring homomorphism, it remains to show
that $f_k$ also preserves multiplication. We
need $f_k(a) f_k(b) = a b (1_k)^2 = a b 1_k = f_k(ab)$, which is valid
if $1_k^2 = 1_k$. That this is so can easily be verified.
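
One possible way to carry out this verification, sketched here using
only the factorization $M_{k+1} = M_k F_k$ and the fact that $M_k$ and
$F_k$ are coprime (both are odd and they differ by $2$): by
construction $1_k$ is a multiple of $M_k$, while modulo $F_k$ we have
$2^{2^k} = -1$, so
\begin{align*}
1_k &= 2^{2^k - 1}(2^{2^k}-1) \\
    &= 2^{2^k - 1}(-2) \\
    &= -2^{2^k} \\
    &= 1 \mod F_k.
\end{align*}
Hence $1_k(1_k - 1)$ is divisible by both $M_k$ and $F_k$, and
therefore by $M_{k+1}$, which is exactly $1_k^2 = 1_k$ in $\R_{k+1}$.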


%% This is so since
%% \begin{align*}
%% 1_k^2
%% &=  (2^{2^k - 1} (2^{2^k}-1))^2 \\
%% &=  (2^{2^{k+1} - 2}) (2^{2^{k+1}} - 2^{2^k + 1} + 1) \\
%% &=  (2^{2^{k+1} - 2}) (- 2^{2^k + 1} + 2) \\
%% &=  2^{2^k} (2^{2^{k} - 2}) 2 (-2^{2^k} + 1) \\
%% &=  (2^{2^{k} - 1}) (-2^{2^{k+1}} + 2^{2^k}) \\
%% &=  (2^{2^{k} - 1}) (-1 + 2^{2^k}) \\
%% &=  1_k
%% \end{align*}
%% with all operations in $\R_{k+1}$. Note there are no divisions here,
%% only multiplication and modulo reduction using $2^{2^{k+1}} = 1$. 

The kernel $\{x : f_k(x)=0\}$ of this ring homomorphism is $\{xF_k :
x\in\R_{k+1}\}$. This is a maximal ideal of $\R_{k+1}$, which means
that $\R_{k+1} / \{xF_k\}$ is a field. This structure is carried into
the ideal $M_k\R_{k+1}$ by $f_k$.

To find this representation by construction see \verb+mersenne.tex+,
which uses arguments to identify $1_k$ in the additive subgroup
generated by $M_k$, by requiring the property $1_k^j = 1_k$, for $0
\leq j \leq F_k$.


\subsection{Shape of $1_k$}
Returning to the embeddings of $\F_{k}$ in $\R_{k+1}$, the interesting
thing is what $1_k$ exactly looks like.
\begin{align*}
1_0 &= \0\1 \\
1_1 &= \0\1\1\0 \\
1_2 &= \0\1\1\1.\1\0\0\0 \\
1_3 &= \0\1\1\1.\1\1\1\1.\1\0\0\0.\0\0\0\0 \\
1_4 &= \0\1\1\1.\1\1\1\1.\1\1\1\1.\1\1\1\1.\1\0\0\0.\0\0\0\0.\0\0\0\0.\0\0\0\0
\end{align*}
Here the dots just separate nibbles. This looks like a nice square
wave!  Another interesting thing to see is that in $\R_{k+1}$, there
is only one silence, which is all zero, since it is the same element
as all one.
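
The bit patterns listed above can be reproduced, and the idempotent
property checked, with a few lines of Python. This is a sketch; the
function names are hypothetical.

\begin{verbatim}
```python
def one_k(k):
    """Representative 2^(2^k - 1) * (2^(2^k) - 1) of the unit of F_k."""
    half = 1 << k                         # 2^k, half the word length
    return (1 << (half - 1)) * ((1 << half) - 1)

def bits(value, k):
    """Bit string of a value in the ring with 2^(k+1) bit words."""
    return format(value, "0{}b".format(2 << k))

def is_idempotent(k):
    """Check 1_k * 1_k = 1_k modulo 2^(2^(k+1)) - 1."""
    modulus = (1 << (2 << k)) - 1
    e = one_k(k)
    return (e * e) % modulus == e

if __name__ == "__main__":
    for k in range(5):
        print(k, bits(one_k(k), k), is_idempotent(k))
```
\end{verbatim}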

What do the other numbers in the ideal $M_k \R_{k+1}$ look like?  The
redundancy is $M_k : 1$, which gets large quite fast. Are the
waveforms special in some way so we can use them as sounds?

\subsection{Resonant Filter}


Using two state output, there is really no straightforward way to add
two signals, performing additive synthesis. Composing a signal
consisting of different frequencies by adding them together modulo 2
behaves more like cross modulation than addition. The audible spectrum
will contain not only the sum, but also the modulation components.

However, it is possible to approximate $2$ tone polyphony by
synthesizing waveforms that have two clearly distinct harmonic
components. The approach is based on the following observation. The
sequence
$$\1\0\1\0\1\0\0\0\0\0\0\0\1\0\1\0\1\0\0\0\0\0\0\0 \ldots$$ resembles
the response of a resonant filter with period $2$ and time constant
$6$, to an impulse signal with period $12$, while the sequence
$$\1\0\1\0\1\0\1\1\1\1\1\1\0\1\0\1\0\1\0\0\0\0\0\0 \ldots$$ resembles
a response of the same filter to a square wave with period $24$. In
fact, this particular scenario can be expressed in terms of
generating functions, when the resonant filter is represented by a
(finite) polynomial, stretching the analogy even further.

This system has $3$ degrees of freedom: the periodicity, the resonance
frequency and the time constant. This can be implemented as the
product (AND) of a pulse width modulated low frequency signal
oscillator, and a high frequency oscillator which has its phase reset
for each $0 \to 1$ transition of the low frequency signal.
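
A software model of this gated oscillator is sketched below. The
parameter names are made up for the illustration, and the sample
counting stands in for the hardware timers.

\begin{verbatim}
```python
def resonant_gate(length, lf_period, lf_width, hf_period):
    """AND of a PWM low frequency gate and a phase-reset square wave."""
    out = []
    for n in range(length):
        lf_phase = n % lf_period
        gate = 1 if lf_phase < lf_width else 0       # pulse width modulated gate
        hf_phase = lf_phase % hf_period              # phase restarts at each 0 -> 1 edge
        hf = 1 if hf_phase < hf_period // 2 else 0   # high frequency square wave
        out.append(gate & hf)
    return out

if __name__ == "__main__":
    print("".join(str(b) for b in resonant_gate(24, 12, 6, 2)))
```
\end{verbatim}

With periodicity $12$, pulse width $6$ and resonance period $2$ this
reproduces the first sequence shown above.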

It is probably possible to combine two of those to get two formants
which is enough to encode very basic vowel spectra. This requires 3
timers, which fits perfectly in the available hardware.
% distorted walky talky ??


\subsection{Chaotic Oscillator}

An emulation of the qualitative behaviour of a chaotic oscillator can
be obtained by driving the Resonant Filter system described above with
a base period that is irregular.


\subsection{Pulse Width Modulation}

Now, this is sort of cheating, since the purpose of PWM is really
efficient switched digital to analog conversion. Using it, we give up
the ``true'' binary output, and add a layer that will allow us to vary
the voltage (on average) between more than 2 levels.

The \verb+18f1220+ chips contain a hardware PWM module with $8$ bit
resolution ($10$ bit for the enhanced version). A fixed period
oscillator is used as the timing source to drive a pin high and low
with a programmable duty cycle. 

I do wonder whether it's possible or desirable to do some computations
in $\GF(257)$. I don't see much use for it, however, other than
convolution using NTT.

\def\sd{\Sigma\Delta}
\subsection{$\sd$ Modulation}

Note that if switching efficiency isn't all important, $\sd$
modulation gives much better performance for 1--bit audio digital to
analog conversion. It generates higher frequency waveforms, but gives
better low frequency resolution.

It can also be used as a cheap way to do synthesis, since it behaves
approximately as a reverse frequency to voltage converter: output
pulse frequency is on average proportional to the input value.

With multibit digital input, it is very easy to implement: the binary
$\sd$ signal is the carry bit of a simple input accumulator. When the
accumulator overflows, one ``packet'' of energy is transferred to the
output.
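
The accumulator scheme can be sketched as follows, assuming an $8$ bit
accumulator and an input value that fits in it; the function name is
hypothetical.

\begin{verbatim}
```python
def sigma_delta(value, steps, nbits=8):
    """Bit stream taken from the carry out of an nbits wide accumulator."""
    acc, out = 0, []
    for _ in range(steps):
        acc += value
        out.append(acc >> nbits)          # carry bit: one packet of energy
        acc &= (1 << nbits) - 1           # keep the low nbits
    return out

if __name__ == "__main__":
    print(sigma_delta(64, 16))
```
\end{verbatim}

Over one full cycle of $2^8$ steps the number of output pulses equals
the input value, which is the frequency to value proportionality
mentioned above.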


\subsection{Gray Code}

Gray codes are (cyclic) sequences of $N$--bit binary numbers, where each
number in the sequence differs in only one bit from the previous. They
can be understood as \emph{Hamiltonian paths} on an $N$--dimensional
hypercube, where one moves along edges of the cube. 

However, the most commonly used Gray code is one with a particular
kind of symmetry. It is called the \emph{binary reflected} Gray code
or BRGC, which is a recursive scheme that traverses both $N-1$
dimensional halves of the hypercube in opposite directions. This
rule is applied recursively and terminates at the trivial $1$
dimensional case.

If $\bar{s}$ is the reversal of the sequence $s$, $s : t$ is the
concatenation of sequences $s$ and $t$, and $b(s)$ prepends each bit
vector in the sequence $s$ with bit $b$, the BRGC sequence recursion
relation is $$s_{n+1} = 0(s_n) : 1(\bar{s}_n),$$ where $s_0$ is the
length $1$ sequence containing the empty vector.
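
The recursion translates almost literally into code. The sketch below
uses tuples for bit vectors (an arbitrary choice for the example) and
compares the result with the well known closed form
$i \oplus \lfloor i/2 \rfloor$.

\begin{verbatim}
```python
def brgc(n):
    """Binary reflected Gray code as a list of n-bit tuples."""
    seq = [()]                            # s_0: only the empty bit vector
    for _ in range(n):
        # s_{n+1} = 0(s_n) : 1(reversed s_n)
        seq = [(0,) + v for v in seq] + [(1,) + v for v in reversed(seq)]
    return seq

def to_int(vector):
    """Interpret a bit tuple as an integer, first bit most significant."""
    value = 0
    for b in vector:
        value = (value << 1) | b
    return value

if __name__ == "__main__":
    print([to_int(v) for v in brgc(3)])
```
\end{verbatim}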

% the wikipedia page about this is unclear, since a 4D, 16 cycle coil is a BRGC.. maybe it's the ``constrained'' part?
%% Gray codes are related to the \emph{snake in a box} problem, ore more
%% specifically, the \emph{coil in a box} problem.

% interesting: single track codes


\section{Control and Note Sequencing}

This is a whole other ballpark. By \emph{control} is meant the
evolution of the configuration of the synth engine at a slower
rate. This is called \emph{haptic} rate and is related to our ability
to experience mechanical vibrations. The next rate down is \emph{note}
rate, and roughly corresponds to the frequency of events that can be
experienced visually. This cascade of time scales is called the
\emph{frequency hierarchy}.

The lower frequency parts are much less limited by the speed and
simplicity of the hardware, since more time can be used to do actual
computations. This makes it rather pointless to try to exhaust the
possibilities. However, here are some general ideas.

\subsection{Transient Stack}

In a binary synth there is no direct way to mix sounds.  One way of
bringing discrete events to the foreground is by temporarily
interrupting stable background sounds.

The typical example would be to synthesize a sequence with
\emph{bass}, \emph{bassdrum}, \emph{snare} and \emph{hihat}, which are
ordered here left to right with time extent decreasing. Mixing the 4
will boil down to playing the shortest active sound until it is done.


\subsection{Wavetable}

Since there's a wavetable player, it is possible to do some control
rate timbre synthesis by generating or updating wavetables. There are
a lot of possibilities here. Maybe the most interesting parts are
transform domain methods like the \emph{Walsh--Hadamard} transform.

\subsection{Chirps}

A simple form of frequency modulation using a linear ramp or a
sawtooth signal is easily implemented.  This can be used to generate
drum-like sounds.


\section{Design and Implementation}

\subsection{Frequency Hierarchy}

Human perception already seems to have a built in time hierarchy
roughly ordered from high to low as \emph{hearing}, \emph{feeling},
\emph{seeing}, \emph{moving}, \emph{remembering}.  This is just
a vague description, since a lot of these time scales overlap. However,
this idea of time hierarchy can easily be reflected in the way we keep time
in the synth, using the concepts of \emph{sample}, \emph{control} and
\emph{note} frequency. 

This is fairly standard practice. We're going a step further in that
each of these levels has an associated language, which metaprograms
the previous one. This is possible since efficiency can be traded for
expressive power as we move up the time scale ladder.

\begin{itemize}
\item[N)]   Note state updates switch control engines.
\item[C)]   Control state updates switch synth engines, 
            synth parameters and signal routing.
\item[S)]   Synth updates produce waveform samples.
\end{itemize}

It's probably easiest to have the low levels pre--empt the higher
ones, so computations can be spread out without explicit cooperative
multitasking. There's no reason to not have cooperative multitasking
on the high abstraction layer though.

%% The first layer's language is plain data: all synth engines are fixed,
%% only the current routine and engine, and it's parameters can be
%% changes. The second layer's language is threaded forth. Current
%% control word is any forth word, but actual code is still in flash
%% memory. The third layer's language is a music language.

\subsection{Synth Architectures}

\begin{itemize}

\item An asynchronous design using the $3$ high resolution timers to
generate events, and the low resolution timer to drive the noise
generator and the other time bases.

\item A synchronous design with a software synth engine running at a
fixed sample rate, from which all other time scales can be derived.

\item A similar synchronous design, but using $10$ bit hardware PWM as
output.

\end{itemize}

\subsection{Synth Patch}

The idea is to make the synth as configurable as possible, but at the
same time keeping the configurability contained in a small number of
bits so the transient stack can be implemented efficiently. The way to
do this is to eliminate all redundancy (symmetry) in the configuration
data.

Also the decoding of this information should be very straightforward,
since it has to be computed at each synchronous or asynchronous engine
event. I'll call the total configuration state, not including the
engine state, the \emph{synth patch}.

The synth engine consists of 4 \emph{generators}. One noise generator
\verb+[noise]+ and 3 square wave oscillators \verb+[osc0]+,
\verb+[osc1]+ and \verb+[osc2]+. The oscillators and the noise
generator each have an associated update period, which is part of the
patch data. Their state is not.

These oscillators can be synced to each other as single master sync $0
\to 1,2$ or cascaded sync $0 \to 1 \to 2$. This is encoded as one bit
for \verb+[osc1]+ and two for \verb+[osc2]+, leaving room for one
external sync event.

I'd like to parametrize the output combination of the 3 square wave
oscillators and the noise generator using a limited set of boolean
functions. For $4$ inputs, there are $2^{2^4}=2^{16}$ such
functions. To see this, draw the Karnaugh map of a $4$ input function,
and observe it has $16$ squares in total. Some things that could be
divided out are: the polarity of the output, the polarity of the noise
signal and the polarity of each of the input square waves.

Working it from the other side, the necessary combinations to get the
absolute minimum functionality are
\begin{itemize}
\item \verb+[xmod]+ Add (XOR) the outputs of the 3 square wave
oscillators. This can be used to generate PWM.
\item \verb+[reso]+ Multiply (AND) \verb+[osc1]+ and \verb+[osc2]+, and
add (XOR) with \verb+[osc0]+.
\item \verb+[gate]+ Multiply (AND) the outputs of the 3 square wave
oscillators.
\end{itemize}
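
As a sketch, the three combinations can be written as boolean functions
of the oscillator outputs, reading ``add'' as XOR and ``multiply'' as
AND. The \verb+square+ helper and its parameters are assumptions made
for the example, which also shows how XOR of two phase shifted square
waves gives a pulse train with adjustable width.

\begin{verbatim}
```python
def square(n, period, phase=0):
    """Square wave sample: 1 during the first half of each period."""
    return 1 if (n + phase) % period < period // 2 else 0

def xmod(o0, o1, o2):
    """Cross modulation: sum of the three oscillators over GF(2)."""
    return o0 ^ o1 ^ o2

def reso(o0, o1, o2):
    """Gated oscillator product added to a third oscillator."""
    return o0 ^ (o1 & o2)

def gate(o0, o1, o2):
    """Product of the three oscillators."""
    return o0 & o1 & o2

def pwm_duty(period, shift):
    """Number of high samples per period of two XORed shifted squares."""
    return sum(xmod(square(n, period), square(n, period, shift), 0)
               for n in range(period))

if __name__ == "__main__":
    print([pwm_duty(8, d) for d in range(5)])
```
\end{verbatim}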

% is da wel waar?


\subsection{Synth Algorithms}

Currently there are only $3$: square waves, pseudorandom noise and
sample playback. Together with the intermodulation schemes described
above, this already gives quite a rich palette.


\subsection{Misc Hacks}

To evaluate the $AND$ of a number of bits in a byte, mask out all the bits
that are not necessary, and use the $ADD$ operation. This can also be
used to evaluate expressions of the form $f = a + bc$, which occur in
the \verb+[reso]+ mixer. Here I combine 3 waveforms as $f = a +
bc$. Using ordinary adder logic this gives $(a,b,c) + (0,0,1) =
(f,x,x)$. Adding more bits allows the result to propagate into the
carry flag.
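
The adder trick can be checked exhaustively. The sketch below is one
possible reading of it: \verb+and_by_add+ assumes that ``masking out''
means forcing the unused bits to $1$ so the carry ripples through them;
both function names are hypothetical.

\begin{verbatim}
```python
def reso_by_add(a, b, c):
    """Top bit of (a,b,c) + (0,0,1): the carry chain computes a XOR (b AND c)."""
    word = (a << 2) | (b << 1) | c
    return ((word + 1) >> 2) & 1

def and_by_add(byte, mask):
    """Carry out of an 8 bit increment after forcing unused bits to 1."""
    forced = byte | (~mask & 0xFF)
    return (forced + 1) >> 8              # 1 only if all selected bits were 1

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                print(a, b, c, reso_by_add(a, b, c))
```
\end{verbatim}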


Rotating oscillator period's bit rep.


\section{Electronics}

I think a lot can be done by combining analog and digital electronics.
One of the prototypical examples is of course the analog time varying
reso filter. Some other interesting things could be done by using
comparator based chaotic oscillators (switched unstable systems) and
time constant capturing.


Entry: LFSRs
Date: Thu Jun 18 11:27:53 CEST 2009
Type: tex

Looking at how linear feedback shift registers (LFSRs) are
implemented, one can notice that the tap coefficients feeding into the
XOR have to be from an irreducible polynomial in the polynomial ring
$\GF(2)[x]$. Why is this?

Suppose we have polynomials modulo $x^2+x+1$.  These polynomials form
the Galois field $\GF(2^2)$.  An LFSR shifts, which means it takes a
polynomial $p(x)=p_0+p_1x$ representing the current $2$--bit state
vector, and multiplies it by $x$.  After shifting, the highest order
term $p_1x^2$ needs to be folded back into the representation by
eliminating $x^2$ using the equation $x^2 + x + 1= 0$, which follows
from modulo arithmetic.  Hence, an LFSR step is a shift followed by a
conditional XOR, which means addition of $x+1$ if $p_1 = 1$.

So, the shift represents multiplication by the generator polynomial
$x$ which, for a primitive modulus, produces a sequence of all
elements of the field's cyclic multiplicative group.  The XOR computes
the modulo operation after multiplication, and is there to keep the
representation contained in the minimal amount of bits.
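
As a small worked example for the modulus $x^2+x+1$ used above,
starting from the state $p(x) = 1$ and writing the state vector as
$(p_1,p_0)$, repeated multiplication by $x$ gives
\begin{align*}
1       &\to x               & (0,1) &\to (1,0) \\
x       &\to x^2 = x + 1     & (1,0) &\to (1,1) \\
x + 1   &\to x^2 + x = 1     & (1,1) &\to (0,1)
\end{align*}
so all $3$ nonzero elements of $\GF(2^2)$ are visited before the
sequence repeats.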

%[1] http://en.wikipedia.org/wiki/LFSR


Entry: Topics in Algebra
Date: Thu Jun 18 15:57:54 CEST 2009
Type: tex

This is a cheat sheet summary of ``Topics in Algebra'', by
I. N. Herstein.

\section{Groups}

\begin{itemize}
\item {[Order]} The order $o(G)$ of a (finite) group $G$ is the number
of elements it contains. The order $o(a)$ of an element $a$ is the
least positive integer $m$ such that $a^m = 1$.

\item{[Right Coset]} If $H$ is a subgroup of $G$, we call the distinct
sets $Hg$ with $g \in G$ the right cosets of $H$ in $G$. The subgroup
$H$ together with its right cosets $Hg$ is a partition of $G$.

\item{[Lagrange's Theorem]} For $H$ a subgroup of $G$, $i = o(G)/o(H) \in
\mathbb{N}$. This is called the \emph{index}, or the number of right
cosets.

\item{[Normal Subgroup]} $gNg^{-1} = N$, $g \in G$. The right cosets
of $N$ are also left cosets, since $gN = Ng$.

\item{[Quotient group]} is the group formed by all the right cosets of
a normal subgroup $N$ in $G$, denoted by $G/N$.

\item{[Homomorphism]} The kernel of a homomorphism is a normal
subgroup.  Likewise, a normal subgroup defines a homomorphism. For a
homomorphism of $G$ onto $\bar{G}$ with kernel $K$ we have $G/K
\approx \bar{G}$.

\item{[Cayley's Theorem]} Every group is isomorphic to a subgroup of
$A(S)$, the group of one-to-one mappings of $S$ onto itself, for some
set $S$.

\end{itemize}

The key feature of normal subgroups, $Ng=gN$, enables the introduction
of a group operation for the partition of cosets, because
$NaNb = NNab = Nab$.


% commutatieve groupen: cycli

\section{Rings}

\begin{itemize}
\item {[Ideal]} An additive subgroup invariant under multiplication by
any element of the ring.
\item {[Quotient ring]} The partition of additive cosets $R/U$ of an
ideal $U$ has the structure of a ring, and is a homomorphic image of
$R$.
\item{[Maximal ideal]} is an ideal $M$ yielding a field as quotient
ring $R/M$ for a commutative ring $R$.
\end{itemize}

The additive group of a ring is always commutative, so every additive
subgroup is trivially normal. The quotient for groups involves two
groups, but for rings it involves a ring and an ideal. If the only
ideals of a commutative ring $R$ with unit element are $R$ and
$\{0\}$, then $R$ can no longer be simplified by applying
homomorphisms and has the structure of a field.

\section{Fields}

\begin{itemize}

\item{[Irreducible polynomial]} An irreducible polynomial $p(x) \in
F[x]$ of degree $n$ has a root in an extension $E$ of $F$ of degree
$[E:F] = n$. As a corollary, the maximum degree of a splitting field
of any polynomial of degree $n$ is $n!$, where the worst case is
reached if each extension step splits off only one root, so the
remaining factor is again irreducible.

\item{[Unique splitting field]} The basis of Galois theory lies in the
theorem that relates the splitting fields $E$ and $E'$ of the
isomorphic polynomials $f(x)$ and $f'(t)$, from two isomorphic
polynomial rings $F[x]$ and $F[t]$, where the isomorphism leaves the
field $F$ invariant. The splitting fields $E$ and $E'$ are isomorphic
with an isomorphism leaving $F$ invariant. Applying this to different
splitting fields of the same polynomial, we find the splitting field's
structure is unique.

\end{itemize}


Entry: Bandlimited Discontinuities with Infinite Response
Date: Thu Jun 18 16:01:18 CEST 2009
Type: tex

This post contains a paper about bandlimited synthesis, coming from an
idea that sprouted somewhere in the summer of 2002, when I was
re-iterating over basic DSP math involving IIR and FIR filters and the
Z-transform.  The remaining problems are the evaluation of exponential
functions (complexity and precision), and the error analysis which is
important in this case as the algorithm relies on cancelling, a
technique which is generally avoided as it is noise-sensitive.
However, since the framework is quite controlled, there might be some
shortcuts to derive bounds.

\begin{abstract}
In this paper I discuss the use of a bank of damped sinusoidal
oscillators to synthesize the audible band of wideband signals
obtained as higher order integrals of impulse functions, i.e. step and
ramp functions. The usual approach is to use FIR polyphase
filtering. I show in this paper that using the corresponding IIR
filtering scheme is possible due to the high amount of redundancy of
the input data, which allows the use of analytical methods to
construct a specialized algorithm.
\end{abstract}

\section{Introduction}

Why are Infinite Impulse Response (IIR) filters never used in
sample rate converters? The answer is fairly simple. Finite Impulse
Response (FIR) filters depend only on the input, while IIR filters
depend on input and previous output. Using an FIR filter to smooth out
a signal before downsampling is an efficient operation because the
samples that are dropped by the downsampling operation simply never
need to be computed. The approach is called \emph{polyphase
  filtering}. This does not work for IIR filters because their output
values are computed \emph{recursively}, so discarding intermediate
output values does not enable one to discard the computation, because
the next computation depends on its result.
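
The FIR half of this argument can be illustrated with a small sketch.
It is not a full polyphase decomposition, only the observation it
rests on: the kept outputs can be computed directly, without ever
computing the dropped ones. The function names are hypothetical.

\begin{verbatim}
```python
def fir(x, h):
    """Plain FIR filter: every output sample is computed."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def fir_decimate(x, h, factor):
    """Decimating FIR filter: only every factor-th output is computed."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(0, len(x), factor)]

if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    h = [1, 2, 1]
    print(fir(x, h)[::3], fir_decimate(x, h, 3))
```
\end{verbatim}

For a recursive filter no such shortcut exists in general, since each
output is needed to compute the next one.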

Conclusion: while IIR filters are more efficient than FIR filters if
no linear phase relation needs to be preserved, for sample rate
conversion they are more expensive than their FIR counterparts because
the polyphase simplification can not be made. End of story. Or not?


\section{Analog Waveforms}

In the early days of electronic sound synthesis, analog relaxation
oscillator circuits were used to produce interesting broad--spectrum
sounds like square waves, sawtooth waves, etc\ldots. These could then
be used as inputs to time--varying filters to produce interesting
sounds.

Performing synthesis of these waveforms in the digital domain however
is much less straightforward than one might assume at first.  Naively
approximating a waveform by shifting its transition points to the
nearest sample points introduces jitter in the signal. This is quite
audible at higher frequencies due to its stable modulation
pattern. For square waves, for instance, the result is simplest to
describe as a pulse width modulated square wave.

Closer inspection of the analog circuit quickly indicates that the
step of analog to digital conversion complicates the matter: the
linear distortion caused by the Anti--Aliasing (AA) filter present in
a physical system is an effect that cannot be ignored, and thus has to
be synthesized. Traditionally, the effect of the AA filter is
implemented as a polyphase FIR filter bank, which could be called the
\emph{oversampled naive} approach.  In this paper I will describe a
solution for this problem using an IIR filter implemented as a
parallel bank of complex one--pole filters.

The only assumption made is that the simulated analog waveforms are
(integrals of) time shifted Dirac impulse signals. This will lead to
an analytically tractable effect on the state of an IIR filter. This
paper shows how to compute the state updates necessary to synthesize
the impulse response at each sample point.


\section{State Updates}

Define the Dirac impulse at time $T$ as the generalized function
$\delta_T(t)$ which exhibits the properties $\delta_T(t) = 0$ for $t
\neq T$ and
$$\int_{-\infty}^{+\infty} \delta_T(x) f(t-x) dx = f(T).$$

Define a 1--pole continuous time low--pass filter as the differential
equation defining the function $y(t)$ in terms of the function $x(t)$
and the pole $s$ as
$${d \over {d t}} y = s (x - y).$$

To see this is a low--pass filter, assume the input is the constant
function $x(t) = x_0$. This gives the steady state solution $y(t) =
x_0$, meaning DC is passed.

Assume we are given a model for the anti--aliasing lowpass filter
mentioned in the introduction (obtained from your filter design method
of choice). Using the partial fraction expansion of the transfer
function of this filter, it can always be written as the sum of
complex first order linear sections. This means the entire problem can
be understood in terms of the simple first order case. This is what
will lead to a tractable formula for the state update.


\subsection{Impulse Input}


What we are interested in is the function $y(t)$ evaluated at integer
points, assuming for simplicity the sample rate is $1$. Denote the
resulting sequence as $y[n]$. Compute the sequence by integrating the
differential equation from $n$ to $n + 1$.

$$y[n+1] = y[n] + s\int_n^{n+1} (x(t) - y(t)) dt. $$

What happens when we apply the filter to $x_0\delta_T(t)$ with $0 < T <
1$? There are 3 parts in the input: the impulse at time $T$, and the
free--running before and after. The solution of the differential
equation of the free system from $t_0$, obtained by setting the input
$x(t)=0$, is
$$ y(t) = y(t_0) e^{-s(t - t_0)}. $$

To simplify the following, denote $z = e^{-s}$. Integrating the
differential equation across $t = T$ shows that the impulse just adds
the value $s x_0$ to the state, so we have the complete state update
$$y[1] = s x_0 z^{1-T} + y[0] z,$$
which is a first order discrete IIR filter with a correction term
depending on the location of the impulse.


\subsection{Step Input}

Signals composed of $\delta_T(t)$ have a very rich spectrum and
thus need a strong anti--aliasing filter. Therefore it is interesting
to look at piecewise constant inputs. These are closer to the
waveforms that can be constructed using analog relaxation oscillator
circuits. Working with these directly helps to relax requirements for
the anti--aliasing filter, since these signals have less power in the
higher frequencies.

Here a similar approach is possible. A system with constant input
$x(t) \equiv x_0$ can be obtained from the previous zero--input system
by a change of variables. It has the solution
$$ y(t) = x_0 + z^{t-t_0}(-x_0 + y(t_0)).$$

The state now no longer jumps instantaneously at transition
points. Instead, the $x$ in the update equation changes when the input
jumps to a new value. I.e. a transition from $x_0$ to $x_1$ at $0 < T
< 1$ gives the state update
$$y[1] = x_1 + (-x_1 + x_0)z^{1-T} + (- x_0 + y[0])z,$$ which is again
a first order discrete IIR filter with a correction term.
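
The update can be checked by applying the constant--input solution
twice, once on each side of the transition (a sketch using only the
formulas above):
$$y(T) = x_0 + z^{T}(-x_0 + y[0]),$$
$$y[1] = x_1 + z^{1-T}(-x_1 + y(T)) = x_1 + (-x_1 + x_0)z^{1-T} + (-x_0 + y[0])z.$$
As a sanity check, $x_1 = x_0$ removes the correction term and leaves
$y[1] = x_0 + (-x_0 + y[0])z$, the ordinary update for a constant
input.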

It's probably enough to stop here, and approximate the integrals of
steps in the discrete domain. It won't make much difference for the
specification of the anti--alias filter.

\section{Exponentials}

The core routine of the algorithm is the computation of a complex
exponential $e^{Tc}$ where $c$ is a filter pole and $0 < T <1$.  The
total cost is directly proportional to the frequency of the
discontinuities, since there is one update per discontinuity.  Note
that it's probably best to separate the computation of the real and
imaginary parts, so they can both have their own approximations.

% At first glance this seems to be problematic, because the algorithm
% resembles some random number generator. Really?

The AA filter has a cutoff around the sampling frequency, so the poles
$s$ are all over the unit disk. This means there is no general
simplification possible for $Ts \approx 1$. This makes sense: if there
were such an approximation, the correction terms in the algorithm
would be practically insignificant.

Since the sample period and the filter poles are fixed, it's probably
easiest to use the $N$ bit binary approximation
$$z^T = e^{-Ts} = \prod_{n=1}^N z^{T_n 2^{-n}},$$ where the $T_n$ are the
binary digits of $T$.  An alternative representation of this is
$$\prod_{n=1}^N z^{(2T_n - 1) 2^{-n-1}} \prod_{n=1}^N z^{2^{-n-1}},$$ or using the
constant $a = \prod_{n=1}^N z^{2^{-n-1}}$ and the relation $(2T_n-1) =
(-1)^{1-T_n}$ this can be written more clearly as
$$a\prod_{n=1}^N z^{(-1)^{1-T_n}2^{-n-1}}.$$
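
A small worked case, with values chosen only for illustration: take $N
= 3$ and $T = 0.101_2 = 5/8$, so $T_1 = 1$, $T_2 = 0$, $T_3 = 1$. The
first product gives
$$z^{T} = z^{1/2} \, z^{0} \, z^{1/8} = z^{5/8}.$$
For the alternative form, $a = z^{1/4 + 1/8 + 1/16} = z^{7/16}$ and the
signs $2T_n - 1$ are $+1, -1, +1$, so
$$a \, z^{1/4} z^{-1/8} z^{1/16} = z^{7/16 + 3/16} = z^{5/8},$$
which agrees. In either form the only stored data is a short table of
fixed powers of $z$, indexed by bit position.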

Using $N$ bits from $T$ reduces the perfect analytical solution to the
oversampled naive approach mentioned before, with oversampling factor
$2^N$. This seems to be the main precision trade--off point. 


Is it possible to update the computation? Suppose a periodic input
has discontinuities spaced a constant period $P'$ apart. The only thing we
need is the fractional part $0 \leq P < 1$. Updating is indeed
possible by using the constants $z^{P}$ and $z^{P-1}$, which are the
\emph{forward} and \emph{backward} phase updates. The latter one is
used when the phasor would wrap around. Note that these updates are
not numerically stable, so they have to be reset from time to time
using direct computations.
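
Written out, with $T_m$ the fractional position of the $m$--th
discontinuity (notation introduced here only for this sketch), the
phase advances as $T_{m+1} = T_m + P$ when this stays below $1$ and as
$T_{m+1} = T_m + P - 1$ otherwise, so that
$$z^{T_{m+1}} = z^{T_m} z^{P} \quad \mbox{or} \quad z^{T_{m+1}} = z^{T_m} z^{P-1}.$$
Updating the exponential then costs one complex multiplication per
pole instead of a fresh evaluation.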

\section{Resampling}

This looks like a general purpose resampler for arbitrary
time--varying input frequencies, which can be derived directly from it
by replacing the analog input signal by a PCM waveform.
My main question is, if it can be used as such, why didn't I hear
about it before?

Suppose the deal is to build an $N$ times downconverter. I do this again by
stimulating a bank of parallel complex one--pole filters, computing
the state update for each one--pole. In the case where the input signal
is an oversampled pulse amplitude modulated signal

$$x(t) = \sum_n a_n \delta(t - n/N),$$

the state update during one sample period is

$$y(t+1) =  a_{N-1} + p(a_{N-2} + \ldots p(a_0 + y(t)) \ldots ),$$

with $p = z^{1/N} = \exp(-s/N)$. This is a polynomial in $p$.
How is this different from just running a discrete filter at the $N$
times higher sample rate, and discarding the intermediate samples?  It
is in fact exactly the same, since each of the parenthesized expressions
of the form $p(a_i + y)$ is exactly the update equation of a discrete
one--pole IIR filter.

This shows that in fact, this \emph{is} indeed a general purpose
sample rate converter, but it doesn't really buy anything new when the
$a_i$ are not structured in a way that enables the fast evaluation of
the state update.


\section{Implementation Hacks}

The algorithm described above is only a small part of the entire
problem. This section collects some remarks about how one might go
about implementing a virtual analog waveform synthesizer.

\subsection{Filter design}

The way we presented the problem requires us to design a continuous
time (analog) filter with desired anti--aliasing characteristics.  The
simplest approach is to use a maximally flat filter. Rumor has it that
this is best for musical applications due to the way it leaves
low--frequency signals relatively untouched. If you don't like rumors,
anything goes: filter design is a black art, and for the application
at hand, it is a relatively independent part, and can be done
off--line, with virtually unlimited computing resources.

Something else that might be interesting to incorporate in the filter
requirements is the spectrum of the class of input signals, which
generally falls off as $1/f$ (e.g. square and sawtooth waves). It
should even be possible to add additional
constraints to the values (quantization) of the filter poles so the
complex exponentials are easier to evaluate.


\subsection{Dithering}

The problem with aliasing is not the noise power per se, but the
(periodic) patterns present in the aliasing noise, and the difference
in noise level depending on the frequency. For the naive approach the
aliasing can go from nothing for subharmonics of the sampling
frequency, to quite noticeable ``beating'' for frequencies close to
such subharmonics. In addition, the noise power increases with
frequency.

A classic dithering trick might be applied to lessen the
requirements for the anti--aliasing filter, by randomizing the
quantization noise. In this scheme it would amount to pulse position
modulation: shifting the time location of discontinuities by a random
value taken from a rectangular or triangular PDF.

The point where dithering could be introduced is in the evaluation of
the complex exponentials: if the algorithm used is an approximation,
we could try to ensure its error is random. This only needs to be done
when the input is periodic, in which case the algorithm is an updating
one.


\subsection{Filter Error Analysis}

Parallel IIR realization is not so good in the stopband due to
round--off errors: the algorithm relies heavily on cancellation. The
irony is that this is exactly what we are forced to use! So high
precision calculations are required. This makes one wonder if it is
possible to transform the problem to something embedded in exact
integer arithmetic.


\subsection{High Frequency Approximation}

So where does the real trade--off lie? Are higher frequencies just more
expensive?

First, correct treatment of low frequencies remains necessary. It is
true that for low frequency square and
sawtooth waves the noise caused by naive synthesis can be inaudible,
but often these waveforms are used in a subtractive synthesis
algorithm, where the lower harmonics are filtered out, rendering the
error clearly audible: it is important to synthesize a correct
harmonic spectrum for all frequencies.

Second, the cost of updates is directly proportional to the frequency
of discontinuities. At some point it becomes cheaper to use a discrete
frequency domain approach: i.e. to synthesize the harmonics directly.

What's needed here is to determine the point where the proposed
algorithm becomes more expensive than an oscillator approach, and a
way to smoothly transition between the algorithm and a bank of
oscillators.


%% \subsection{Polar coordinates}

%% Something i've been wondering about for so many years is why it's not
%% possible to represent filters in polar coordinates. The obvious answer
%% is of course the computation of the logs and exponents, and the ubiquity
%% of additions that make this impractical. But is it always like that?


\section{Conclusions}

\begin{itemize}
\item Analytically compute state updates for simple discontinuities.
\item Simplify the numerical computation of such updates in the case of constant frequency inputs.
\item Error analysis of parallel filter bank.
\item Quantization error diffusion for pulse position.
\item Use as all--purpose resampler.
\end{itemize}


Entry: Dynamic Wavetable Synthesis
Date: Thu Jun 18 16:54:50 CEST 2009
Type: tex

\def\diff{ d }
\def\difft{ \diff \over \diff t }
\def\pdiff{ \partial }
\def\pdifft{ \pdiff \over \pdiff t }

This DRAFT paper is an attempt to pick up some research which shifted
to the background in 2005 because of problems integrating it into a
PhD program.  I am currently at the point of isolating the problems
and formulating them more explicitly.  A first version of the model was
implemented in Pure Data around 2000-2001 as the \verb+dynwav~+
object in the Creb extension library.

\subsection{Time--Phase Model}
Dynamic Wavetable Synthesis (DWS) approaches the sound synthesis
problem by separating it into two parts.
\begin{itemize}
\item Synthesis of sequences of instantaneous wave shapes in a
  discrete $2$--dimensional representation.
\item Interpolation of this representation to produce $1$--dimensional
  sound signals.
\end{itemize}
The novelty of the technique lies in the way \emph{pitch} and
\emph{timbre} data are represented more abstractly, which simplifies
the expression of algorithms that relate these two elements.  This
abstract representation is decoupled from output signal sampling
details through \emph{interpolation}.  Separating these two concerns
makes them easier to manage.

The DWS representation greatly simplifies the use of the Discrete
Fourier Transform (DFT) as a link between the time--domain waveform
and its harmonic spectrum, because the model domain consists of truly
periodic functions.  More generally this property makes it easier to
link the representation with other linear or non--linear transforms.

A unified $2$--dimensional interpolation approach replaces the need
for an ad--hoc short--time windowing mechanism as is used in many
contemporary techniques.  Such an approach is used to bridge the world
of stationary models and non--stationary sounds.  The use of explicit
interpolation in the DWS model also makes it easier to keep frequency
aliasing and phase cancelling under control, more specifically because
the interpolation step can tie into the frequency domain
representation of the waveform directly.

% In addition to describing synthesis, this paper also hints at possible
% analysis strategies.  Analysis however is not trivial in the presence
% of multiple dominant pitches.  In general it seems best to stick to
% well-proven techniques and use the DWS model only as a sound signal
% sythesis engine.

% To keep the exposition short I will assume familiarity with the matrix
% computation and interpolation techniques used.

% This paper needs to be reworked to restate the 2 problems clearly, and
% relate them to the underlying model properly. It also should be stated
% clearly that the subject is basicly sound synthesis, and analysis is
% added as an afterthought. Some other things that are missing: i
% currently see no way for a clean solution of the two main problems,
% being source/filter separation and pitch detection, other than trial
% and error heuristics.

\subsection{Continuous Model}
The basis of the DWS representation is a function $p(\phi, t)$ defined
on an infinitely long torus, with $\phi$ the periodic dimension and
$t$ the linear dimension. The period is most conveniently taken to be
$2\pi$. This model is then related to a signal $s(t)$ through a phase
trajectory $\phi(t)$ as
$$ s(t) = p(\phi(t), t). $$
The instantaneous frequency or \emph{pitch}\footnote{Note that a more
  accurate term for pitch would be `inverse of period', since the
  perceptual quality pitch is not purely related to periodicity.  In
  this document we treat both terms to mean the same thing.} is
defined as $\omega(t) = {\difft} \phi(t)$.  This model is an
alternative to time--frequency modeling, and allows us to represent
pitch and timbre independently.  One could call it ``time--phase
modeling''.  We define \emph{timbre} as the information associated to
a single time instance $t_0$.  This is associated to a function on the
circle as $P_{t_0}(\phi) = p(\phi, t_0)$.
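
As a simple illustration (an example chosen here, not taken from a
real instrument), let $p(\phi, t) = e^{-t}\sin(\phi)$ and $\phi(t) =
\omega_0 t$. Then
$$s(t) = p(\phi(t), t) = e^{-t} \sin(\omega_0 t),$$
a decaying sinusoid with constant pitch $\omega(t) = \omega_0$, whose
timbre at time $t_0$ is the scaled sine $P_{t_0}(\phi) =
e^{-t_0}\sin(\phi)$. Changing the trajectory to $\phi(t) = \omega_0 t +
\omega_1 t^2 / 2$ produces a glide with $\omega(t) = \omega_0 +
\omega_1 t$ without touching $p$.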

\subsection{Discrete Model}
A discrete version of the model can be constructed by choosing $N$ to
indicate the number of points to sample on the circle, and performing
the sampling along the $t$ axis at multiples of $H$ for the timbre and
multiples of $T$ for phase trajectory and associated signal.  This
gives the sequences
$$
\begin{array}{rcl}
p_{kl}   &=& p(2\pi l / N ,kH) \\
\phi_n   &=& \phi(nT) \\
s_n      &=& s(nT). \\
\end{array}
$$
We define $P_{k}$ as the vector with $N$ components obtained from
$p_{kl}$ with $l$ ranging from $0$ to $N-1$.  Note that $H$, the
``haptic'' period that fixes the update step of the timbre
information, can be much larger than $T$, the signal and phase update
step.

An important note is that there is no \emph{direct} relationship
between the timbre sequence $p_{kl}$ and pitch trajectory $\phi_n$ on
the one hand, and the signal $s_n$ on the other hand.  They are
\emph{only} related through the continuous functions $p(\phi, t)$ and
$\phi(t)$.  This means that \emph{interpolation} is an essential part
in reconstructing $s_n$ from $p_{kl}$ and $\phi_n$.  We will come back
to the technical details and inaccuracies of interpolation later, but
will assume for now that it is possible to switch between the discrete
and continuous model without loss of information.  This will allow a
slight abuse of notation in that $P_k$ (the concrete representation)
implicitly refers to a region in the function $p(\phi, t)$.

A modeling technique based on interpolation seems to be a drawback.
However, it is the main goal of this paper to convince you that
including this indirection and as a result being able to make the
sampling for $p(\phi, t)$ fixed is a workable trade--off, since it
greatly simplifies manipulation of $P_k$, which we are most interested
in.

\subsection{Frequency Domain}
The frequency domain representation of $p$, defined by computing the
Fourier Transform (FT) of $p(\phi,t)$ over the phase dimension $\phi$,
allows the DWS model to be related to a sum--of--sines model.  For
each $l \in \ZZ$ this gives a function which represents the phase and
amplitude of harmonic $l$ at time $t$.
$$c_l(t) = {1 \over {2\pi}}\int_0^{2\pi} \exp(- j l \phi) p(\phi, t) \diff \phi.$$
The inverse is given by
$$p(\phi, t) = \sum_l c_l(t) \exp(j l \phi).$$
Interpolating the latter using the phase trajectory $\phi(t)$ gives a
representation for $s(t)$ decomposed as a sum of harmonically spaced,
phase and amplitude modulated complex exponentials
$$s(t) = \sum_l c_l(t) \exp(j l \phi(t)).$$
Writing $c_l(t) = a_l(t) \exp(j\phi_l'(t))$ with $a_l(t) \in \RR$ and
setting $\phi_l(t) = l\phi(t) + \phi_l'(t)$ gives the alternative
formulation
$$s(t) = \sum_l a_l(t) \exp(j \phi_l(t)),$$
which models $s(t)$ as a sum of amplitude and phase modulated complex
exponentials without the harmonicity relationship.  The term ${\difft}
\phi_l'(t)$, which is the rotation speed of the complex spectral
components over time, can then be viewed as frequency offset of each
exponential to the ideal harmonic frequency $l {\difft} \phi(t)$.
This representation (harmonic + per component frequency offset) is
useful later to introduce slight inharmonicities in mostly harmonic
models, related to frequencies of beating patterns.
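
For example, assuming a constant pitch $\phi(t) = \omega_0 t$ and a
single coefficient that rotates at a constant rate, $c_l(t) = a_l
\exp(j \delta_l t)$ with $a_l$ and $\delta_l$ real constants introduced
for this illustration, the corresponding term of $s(t)$ is
$$c_l(t) \exp(j l \phi(t)) = a_l \exp(j (l \omega_0 + \delta_l) t),$$
a partial at frequency $l\omega_0 + \delta_l$, detuned by $\delta_l$
from the ideal harmonic.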


\section{Relation to Existing Techniques}

The DWS model is related to several other sound synthesis techniques.
\begin{itemize}
\item By eliminating time variation of the timbre ${\pdifft} p = 0$ we
  get ordinary wavetable playback.
\item The DWS model is a variant of the Scanned Synthesis model, where
  $P_k$ is produced as the output of a discrete dynamical system.
\item The DWS model can be related to the harmonic and inharmonic
  sum--of--sines model, by moving inharmonicities to the timbre
  sequence $P_k$.
\end{itemize}

The relation to ordinary wavetable synthesis isn't so interesting as
it is fully contained within the DWS model.  Relation to
sum--of--sines was already explained, and is an essential element of
DWS.  It has to be stressed though that DWS does not work well for
sounds which are not largely harmonic (harmonic + small phase
differences) due to phase cancellation effects.  This will be
explained further in the section about interpolation.  DWS is most
related to Scanned Synthesis, and it could be argued that both
techniques essentially talk about the same subject, be it from a
different perspective.

\subsection{Scanned synthesis}

If we take the vector $P_k$ to be the output of a dynamical system
sampled at $k$ rate (the haptic period $H$), we get the method of
scanned synthesis. The most general form for such a dynamical system
is
$$\matrix{X_{k+1} \\ P_{k}} = \psi(\matrix{X_k \\ U_k}),$$
where $X_k$ is the state vector, $U_k$ is the system input and $P_k$
is the system output.  When the state update function $\psi$ is
linear this is usually represented by
$$\matrix{X_{k+1} \\ P_{k}} = \matrix{A & B \\ C & D} \matrix{X_k \\ U_k},$$
where the poles of the system are determined by the eigenvalues of
matrix $A$.  This representation is not unique: we can transform $X_k$
to another base $X_k' = Q^{-1}X_k$ using an invertible matrix $Q$ with
columns representing the new base for the state vector. The new system
matrices then become $A' = Q^{-1}AQ$, $B' = Q^{-1}B$, $C'=CQ$ and $D' =
D$.  If the eigenvalue decomposition of $A$ exists, the equations can
be transformed using $Q=S$, with $S$ the matrix containing the
eigenvectors of $A$. This makes $A'$ diagonal, containing the
eigenvalues of $A$.  
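
The transformed matrices follow by substituting $X_k = Q X_k'$ in the
state equations (a short derivation added for completeness):
$$Q X_{k+1}' = A Q X_k' + B U_k \quad \Rightarrow \quad X_{k+1}' = Q^{-1} A Q X_k' + Q^{-1} B U_k,$$
$$P_k = C Q X_k' + D U_k.$$
The input--output behaviour is unchanged; only the internal
coordinates differ.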

Let's look at the diagonal form for a system without inputs, which
means originally $B=0$ and $D=0$.  With $S$ the matrix composed of
eigenvectors as columns and $E = S^{-1} A S$ a diagonal matrix
containing the corresponding eigenvalues, the system
$$\matrix{X_{k+1} \\ P_{k}} = \matrix{E \\ S} \matrix{X_k}$$
can then be interpreted as a bank of oscillators which is a mix of the
``eigen waveforms'' contained in $S$.  The mixing amplitude and phase
is directed by independently decaying envelope functions, one per eigenvalue.


A special case arises when $A$ is circulant, meaning $a_{ij} =
a_{i'j'}$, with $i' = (i-1) \mod N$ and $j' = (j-1) \mod N$.  It
ensures that each element of $X_k$ has the same relationship to its
neighbouring elements, with all elements arranged on a circle.  If $A$
is circulant it can be diagonalized by the discrete Fourier transform,
with $Q = F$, a matrix with columns made up of DFT base functions.
This means the elements of the state vector $X_k$ represent the
instantaneous amplitudes and phases of the DFT base functions.  When
all eigenvalues of $A$ have a magnitude strictly smaller than $1$, the
signal associated to this model after interpolation is a sum of
harmonically spaced damped exponentials.
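
To see the diagonalization concretely, write the circulant matrix as
$a_{rc} = h_{(r-c) \bmod N}$, with $h$ its first column and $r$, $c$
the row and column indices (names chosen here to avoid a clash with
the imaginary unit $j$). Applying $A$ to the DFT base function with
components $\exp(2\pi j m c / N)$ gives, for row $r$,
$$\sum_{c=0}^{N-1} h_{(r-c) \bmod N} \exp(2\pi j m c / N) = \lambda_m \exp(2\pi j m r / N),$$
$$\lambda_m = \sum_{n=0}^{N-1} h_n \exp(-2\pi j m n / N),$$
so the eigenvalues are the DFT of the first column, and each harmonic
$m$ evolves independently with its own factor $\lambda_m$ per update.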


\section{Interpolation}

Reconstructing the function $p(\phi,t)$ from a sequence of $P_k$ and
obtaining a discrete signal $s_n$ from the phase trajectory $\phi_n$
is hindered by two phenomena.
\begin{itemize}
\item Frequency aliasing of $s(t)$.  Suppose it is possible to
  perfectly reconstruct $p(\phi,t)$ from $P_k$ and $\phi(t)$ from
  $\phi_n$.  The resulting signal $s(t) = p(\phi(t),t)$ might contain
  frequency components that cannot be represented in the discrete
  domain and will create aliasing distortion.
\item Reconstruction of $p(\phi, t)$ from $P_k$, in the case where
  $P_k \in \CC^N$ might be hindered by phase cancellation phenomena in
  the case where naive linear interpolation is used.  This happens
  when sounds are inharmonic.  This is related to undesirable
  modulation problems in windowed sinusoidal modeling.
\end{itemize}


The challenge is to devise an interpolation strategy that minimises
the error due to these two effects.  In general the solution is to
filter the $P_k$ before interpolation and to minimise phase
cancellation by interpolating the phase trajectories instead of the
amplitudes.  The former is relatively straightforward.  The latter is
quite difficult as it is expensive to perform.

Both problems are straightforward to solve in the frequency domain
(DFT of $P_k$).  The challenge lies in determining a proper sampling
point \emph{in between} successive $P_k$.  This largely depends on the
contents of the $P_k$ and the sampling period determined by $\phi_n$.

Another question is: for inharmonic sounds, does it \emph{really}
matter so much to get phases wrong here and there?  There is already a
lot of cancelling and amplitude modulation going on as a consequence
of the inharmonicities.  Not getting this exactly right might not be
so problematic after all.


\section{Formant Structure}

An important drawback of wavetable based synthesis is the coupling
between pitch and \emph{formant} structure, which refers to the
spectral envelope of a signal, which can be properly defined as the
result of low order \emph{linear prediction}.

In a pure synthesis framework this relation can be made explicit.
Simply make sure that the $P_k$ are postprocessed with a spectral
envelope which cancels the phase trajectory frequency, keeping the
resulting spectral envelope of $s_n$ under control.  Since this is
rather trivial it will no longer be discussed.  However, can this
process be automated by performing operations on $P_k$ in a
formant--neutral way?  It is straightforward to \emph{stretch} a
formant spectrum by moving the poles around.

This raises the question about placing standard linear prediction
modeling techniques in a \emph{periodic} framework.  There are
probably a lot of shortcuts that can be exploited by involving the
DFT.

\subsection{Periodic Linear Prediction}


Entry: Implementing Fermat Prime order Galois Fields
Date: Thu Jun 18 22:48:20 CEST 2009
Type: tex

\def\F{\mathcal{F}}
\def\R{\mathcal{R}}

\section{Introduction}

Some words on the ring of integers modulo $2^{2^{k+1}}-1$, which is
easily implemented using ordinary hardware operations.  Such a ring
can be used to implement the finite fields with $2^{2^{k}}+1$
elements, for $0 \leq k \leq 4$.


\section{Embedding $\F_a$}


Let $\F_a$ denote the field of integers modulo the prime $a$, and
$\R_a$ the unital commutative ring of integers modulo $a$, not
necessarily a prime. We want to find out how to embed a field $\F_a$
in a ring $\R_{ab}$, where $b$ is not necessarily prime.

Consider the map
$$f : \R_{ab} \to b\R_{ab} : x \mapsto x b e = x 1_a,$$ which is a map
from $\R_{ab}$ to its ideal $b\R_{ab} = \{ x b : x \in \R_{ab} \}$. If
we can find an element $1_a = be$ in $b\R_{ab}$ such that $1_a ^ 2 = 1_a$,
this map is a ring homomorphism. If $1_a \neq 0$, this map is nontrivial,
and it gives a field structure to the elements of the ideal $b\R_{ab}$.

The fact that $f$ is a ring homomorphism follows from $f(x)+f(y) = x
1_a + y 1_a = (x+y) 1_a = f(x+y)$ and $f(x)f(y) = x 1_a y 1_a = x y
1_a^2 = x y 1_a = f(xy)$. If we require $0 < e < a$ we have $1_a \neq
0$, and all the $a$ elements in $b\R_{ab}$ will be reached by $f$.

The kernel $K_f$ of $f$, determined by $\{x : x1_a = 0\}$, has $b$
elements, and is identified as $a\R_{ab}$, the ideal generated by $a$. This
ideal is maximal since $a$ is prime, so the quotient $\R_{ab} /
a\R_{ab}$ is a field containing $a$ elements, which is essentially the
same as $\F_a$. 

Seen as an endomorphism of $\R_{ab}$, the map $f$ has the canonical
decomposition
$$f = e \circ i \circ h,$$
where 
$$h : \R_{ab} \to \R_{ab} / a\R_{ab} : x \mapsto x + a\R_{ab}$$
is the homomorphism onto the quotient field which introduces the field structure,
$$i : \R_{ab} / a\R_{ab} \to b\R_{ab} : x + a\R_{ab} \mapsto x1_a$$ 
is the isomorphism from this quotient field to the ideal generated by $b$, and 
$$e : b\R_{ab} \to \R_{ab} : x1_a \mapsto x1_a $$ is the embedding of
this ideal in the ring. This is the embedding we're after, it carries
the field structure introduced by $h$ and preserved by $i$ over to a
subset of $\R_{ab}$.
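
A small example, with numbers chosen for illustration: take $a = 5$
and $b = 3$, so the ring is $\R_{15}$ and the ideal is $3\R_{15} = \{0,
3, 6, 9, 12\}$. The element $1_5 = 6 = 3 \cdot 2$ satisfies $6^2 = 36 =
6 \bmod 15$, so $e = 2$ and $f(x) = 6x$. This maps $0, 1, 2, 3, 4$ to
$0, 6, 12, 3, 9$, and products are preserved: for instance $f(2)f(3) =
12 \cdot 3 = 36 = 6 = f(1) \bmod 15$, matching $2 \cdot 3 = 1$ in
$\F_5$.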


\section{Mersenne factors}

We call the number $M_{k} = 2^{2^k}-1$ a Mersenne number and $F_{k} =
2^{2^k}+1$ a Fermat number. For $k \geq 0$ we have 
$$M_{k+1} = F_{k} M_{k}.$$ For $0 \leq k \leq 4$ the $F_{k}$ are
prime, which makes $M_5 = 2^{32}-1$ rather special, as the product of
the first 5 Fermat primes.
\begin{align*}
2^{32}-1 &= (2^{16}+1)(2^{16}-1) \\
         &= (2^{16}+1)(2^{8}+1)(2^{8}-1) \\
         &= (2^{16}+1)(2^{8}+1)(2^{4}+1)(2^{4}-1) \\
         &= (2^{16}+1)(2^{8}+1)(2^{4}+1)(2^{2}+1)(2^{2}-1) \\
         &= (2^{16}+1)(2^{8}+1)(2^{4}+1)(2^{2}+1)(2^{1}+1) \\
\end{align*}
Note that this is the end of the known chain for prime Fermat
numbers. As of this time, for $n\geq5$ all \emph{tested} Fermat
numbers $2^{2^n}+1$ are composite. Whether there are more Fermat primes
is not known, and not really of interest to us now.

\section{Circular Addition}

Why are these Mersenne numbers interesting? Addition modulo $M_{k}$ can be
seen as \emph{circular binary addition}. Looking at the structure of a
binary adder, each bit is connected to the last using a \emph{carry
bit}. By linking the carry output of the most significant bit's adder
to the carry input of the least significant bit, one obtains addition
modulo $M_k$, with $2^k$ the number of linked adders.

In case $N=2^k$ is the wordlength of a machine, or a multiple thereof, the
operation of reducing a number of $2N=2^{k+1}$ bits to one modulo $M_{k}$ is
simply
\begin{align*}
c 2^N + x 
&= c(2^N - 1) + c + x\\
&\equiv c + x \pmod{M_k},
\end{align*}
which is very cheap to do if $N$ is a multiple of the machine
wordlength. For addition of two numbers modulo $M_k$, the result
has $N+1$ bits, with $c$ the carry flag. For multiplication, the
result has $2N$ bits. This results in the instruction sequences
``ADD,ADDC'' and ``UMUL,ADD,ADDC''.
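
For example, with $N = 4$ and $M_2 = 15$ (a toy wordlength used only
for illustration): $13 + 9 = 22 = 1 \cdot 2^4 + 6$, which folds to $1 +
6 = 7 = 22 \bmod 15$. For the product, $13 \cdot 9 = 117 = 7 \cdot 2^4
+ 5$, which folds to $7 + 5 = 12 = 117 \bmod 15$. If the fold itself
overflows, the new carry is added back in the same way, which is
presumably the role of the trailing ADDC in the sequences above.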

So, implementing the ring $\R_{M_k}$ is efficient. This is not so
interesting in itself, since arithmetic modulo $2^{2^k}$ is a ring we
get for free. However, this does enable us to implement the field
$\F_{F_{k-1}}$ in an efficient manner.

[Note] This does not reduce $111\ldots1$ to $000\ldots0$, so we have
two elements representing $0$.


\section{Algebraic structure of $\F_{F_k}$}

\def\R{\mathcal{R}}

The multiplicative group of the fields $\F_{F_k}$ is the cyclic group
of order $N=2^{2^k}=F_k-1$ and is thus highly composite. This has the
advantage that a fast radix $2$ FFT-like algorithm exists to compute
the number theoretic transform
$$X_m = \sum_{n=0}^{N-1} x_n \omega^{-nm},$$ with $\omega$ a primitive
root of unity. Here $\omega$ is a generator of the multiplicative
group, meaning the smallest positive integer $m$ satisfying $\omega^m
= 1$ is $N=2^{2^k}$. The inverse transform is given by
$$x_n = N^{N-1}\sum_{m=0}^{N-1} X_m \omega^{nm}.$$
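
As a small check in $\F_5$ (so $N = 4$, chosen for illustration):
$\omega = 2$ has powers $2, 4, 3, 1$, so it generates the
multiplicative group. The scale factor is $N^{N-1} = 4^3 = 64 = 4 \bmod
5$, and indeed $4 \cdot 4 = 16 = 1 \bmod 5$, so $N^{N-1}$ acts as
$N^{-1}$. Since $N = -1$ modulo the Fermat prime, the scaling in the
inverse transform is just a negation.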

\section{Finding $1_{F_k}$}

Following our general notes, it is possible to represent $\F_{F_k}$
using the ideal $M_k \R_{M_{k+1}}$ of the ring $\R_{M_{k+1}}$ if we
can find the unit element
$$1_{F_k} = M_k e_k.$$ 
In order for this unit to yield the homomorphism $f_k$ which will lead
us to a representation of the field $\F$, we need
$$1_{F_k}^2 = 1_{F_k} \mod M_{k+1}.$$ 
All terms, including the modulo divisor, are a multiple of $M_{k}$, so
we can safely divide out this number yielding the equation
$$M_k e_k^2 = e_k \mod F_k.$$
Since $F_k$ is prime, we can take inverses, so we can multiply both sides
by $(M_ke_k)^{-1}$, which gives the solution
$$e_k = (M_k)^{-1} \mod F_k.$$
This gives us $1_{F_k}$.

We can do better and find an explicit expression by observing that
\begin{align*}
e_{k} 
&= (F_k - 2)^{-1}  \\
&= (-1) 2^{-1}     \\
&= 2^{2^k - 1}
\end{align*}
using $(-1)^{-1} = -1 = 2^{2^k}$, all modulo $F_k$. This gives a nice
expression for the unit
$$1_{F_k} = 2^{2^k - 1} (2^{2^k} - 1).$$
In binary this gives
\begin{align*}
1_0 &= 01 \\
1_1 &= 0110 \\
1_2 &= 0111.1000 \\
1_3 &= 0111.1111.1000.0000 \\
1_4 &= 0111.1111.1111.1111.1000.0000.0000.0000
\end{align*}
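
Checking the case $k = 2$ by hand: $F_2 = 17$, $M_2 = 15$ and $M_3 =
255$. The formula gives $e_2 = 2^{3} = 8$, and $15 \cdot 8 = 120 = 7
\cdot 17 + 1$, so $e_2$ is indeed the inverse of $M_2$ modulo $F_2$.
The unit is $1_{F_2} = 120$, binary $0111.1000$, and $120^2 = 14400 =
56 \cdot 255 + 120$, so it is idempotent modulo $M_3$.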

% is this actually unique? seems so to me, but I'm not entirely sure


Entry: Matrix Decompositions
Date: Fri Jun 19 10:57:12 CEST 2009
Type: tex

\subsection{Comparing Matrices}

There are different ways in which matrices can be \emph{alike}.  The
following talks about 3 relations called \emph{equivalent},
\emph{congruent} and \emph{similar}.  Two matrices $A$ and $C$ are
equivalent if there exist non--singular matrices $R$ and $Q$ such
that $$C = RAQ.$$ Two matrices are equivalent iff they have the same
rank. This is related to Gaussian elimination for solving linear
systems and computing a matrix inverse.  If $R=Q^{-1}$ the matrices
are \emph{similar}.  They represent the same linear transform, but in
another basis. All square matrices are similar to their Jordan
form. Normal matrices are similar to a diagonal matrix.  A consequence
is that similar matrices have the same eigenvalues.  If $A$ and $C$
are hermitian, and $R=Q^T$, the matrices are \emph{congruent}.  They
are both congruent to the same signature matrix, so have the same
signature, and represent essentially the same quadratic form, but in
another basis.  If two matrices are \emph{both} similar and congruent,
they are called orthogonally similar and $R=Q^T=Q^{-1}$.  Hermitian
matrices are orthogonally similar to a real diagonal matrix.

\subsection{Matrix Semantics}
When studying matrix algorithms, it is important to keep in mind what
they represent.
\begin{itemize}
\item Both the normal form for the eigenvalue decomposition and the
  more general Jordan form study the \emph{linear transform} related
  to a matrix, by performing a change of basis.  Here the matrix
  represents a linear function $f(x) = Ax$.
\item The normal form for the congruence transform studies the
  \emph{quadratic form} associated to a symmetric or hermitian matrix,
  by performing a change of basis.  More generally, a symmetric
  matrix $A$ can represent a \emph{bilinear form} $f(x,y) = x^TAy$.
  Likewise a \emph{symplectic form} is represented by an antisymmetric
  matrix.

\end{itemize}


%[1] http://mathworld.wolfram.com/EquivalentMatrix.html


Entry: Matrix Multiplication
Date: Fri Jun 19 10:58:12 CEST 2009
Type: tex


% GOLUB & VAN LOAN:
% 3 ways to look at matrix multiplication
% * inner product A^T B
% * outer product A B^T
% * linear combination of columns of A / rows of B


% so: Ax = b
% MATRIX = operator, acting on (row) vector x on its left
% -> rows are linear functionals, mapping vector x to a coordinate of b (number)
% -> columns are base vectors, coordinates of x map these columns to vector b

% exchange all these left--right row--column: x^T A^T = b^T
% MATRIX = dual operator, acting on (column) vector = linear functional x^T on its right
% -> columns are linear vectors, to which the linear functional x is applied to give a coordinate of b^T
% -> rows are linear functionals, coordinates of x combine these to functional b^T

% in plain language
% vector = arrow
% linear functional = coordinate extractor

% what causes all the problems:
% -> linear functional = also a vector
% -> matrix, interpretable as a set of functionals, or a set of vectors, is also a vector!

% matrices (linear transformations) are vectors that you can not only add, but also multiply!

% aint this fun :)

% strange, isn't it, how something as absurd as matrix multiplication
% can become so interwoven with habit that you no longer really know
% what you are doing. and when you go and look at what you are
% doing, it turns out to have several interpretations. or how
% mathematics can be a kind of middle road, summarizing different
% things that clearly seem to be different things into 1 other abstract
% thing with the different things as properties.. so this doesn't
% really belong here, but it is a nice exercise to try to capture in
% words the intuition I built up over the years.
% quite hard.


The algebra of finite dimensional linear operators is associative.
This means that the objects that \emph{operate} and the objects that
are being \emph{operated on} are somehow interchangeable, because of
the law of associativity, as shown in $$ABx = A(Bx) = (AB)x.$$
Semantically, this has quite some implications.  If $A$ and $B$ denote
linear transforms operating on the object on their right, and $x$
denotes a vector, the above means the transform of a transformed
vector is equal to a transformed vector, with the latter transform the
combination of two other transforms.


% Maybe it's just because I've been out of the loop for a while, and I
% am re--discovering the beauty of math in general and the beauty of
% linear algebra in particular, but in this particular case, the
% frivolous interplay of the \emph{concrete} and the \emph{abstract}
% is pretty amazing. Reality really changes by finding a good way to
% describe it\ldots


Instead of talking about vectors and transforms in an abstract way, it
is possible to make matters more \emph{concrete} by fixing a basis and
working only with coordinates taken from the field from which the
vector space is constructed.  For linear transforms these coordinates
are expressed as matrices.  Probably the reason why engineers like
matrices is that in some sense it is simpler to talk about structured
collections of numbers and rules for addition and multiplication, than
about abstract entities.  However, we should never forget that it
sometimes makes sense to talk about the objects they represent more
abstractly\footnote{ To me, and to most people I know,
  \emph{understanding} basic linear algebra, next to the opaque rule
  of matrix multiplication has always been about putting it in terms of
  simple geometry.  To see what they \emph{do}.  Points, line
  segments, cubes and spheres transformed under rotations,
  translations, reflections, scalings and projections, and
  combinations thereof. A common pitfall in thinking about rotations
  is to see (elementary) rotations happen around an axis.  Rotations
  really happen in a plane. This is clearest to see in $2$ dimensions,
  since there is not much left to rotate around, except for a single
  point. Only in $3$ dimensions there is always a $1$--dimensional
  subspace invariant under the rotation, also called the axis. When we
  go to $4$ dimensions, there is a $2$--dimensional subspace invariant
  under (elementary) rotations. It is even possible to combine two
  independent elementary rotations. Independent in the sense that they
  commute, which is not true for rotations in general. This is
  equivalent to saying a $4$--th order linear system can have $2$
  distinct eigenfrequencies. An interesting thing to visualize is the
  nilpotent transform, which can be seen as a rotation/reflection
  followed by a projection, where the rotation is such that the space
  onto which is projected does not contain a subspace which is
  invariant under the rotation/reflection. In other words, the
  projection and rotation work as ``squeeze'' and ``turn'' such that
  when both operations are iterated a couple of times, the result
  eventually shrinks to a point. In two dimensions this can be
  visualized as collecting bread crumbs spread out over the surface of
  a table into a pile in the following way.  Rotate the table $90$
  degrees, wipe all the crumbs to the left side using a ruler. Then
  do this again.  Of course, nilpotent transforms are much easier
  visualized as shifts of coordinates of vectors, where on each shift,
  one or more things disappear into the void.} because the definition
of matrix multiplication
$$c_{ij} = \sum_k a_{ik} b_{kj}$$
is not very intuitive\footnote{There is unfortunately just one thing
  to do with this: figure out what it means when you see two of these
  grids next to each other, and train yourself to do the manipulation
  without thinking about how to do it. For me it is a very visual
  thing: I move the columns of $B$, rotate them $90$ degrees
  counterclockwise so they align with the rows of $A$. Then, in close
  proximity, they \emph{zip up} into a single number.}.  Here $k$
ranges from $1$ to the \emph{width} of $A$, with elements $a_{ij}$ on
row $i$ and column $j$. This range is equal to the \emph{height} of
$B$.  


In the following I will use the ``engineering shortcut'' of talking
about vectors, linear forms and linear transforms as if they were the
same as matrices of dimension $N \times 1$, $1 \times M$ or $N \times
M$ respectively.  It is important to see that mathematically, they are
\emph{distinct} objects.  The algebra of matrices \emph{behaves as}
the algebra of abstract linear operators because both are related by
an isomorphism tied to a particular basis in the abstract vector
space.  In matrix algebra the \emph{transpose} operation represents
the (basis dependent) isomorphism between a finite dimensional vector
space and its dual, which relates vectors and linear forms.  The ease
with which the matrix multiplication formula can be re-interpreted is
exactly because of the existence of these morphisms.


% oppassen met algemeenheden: tis niet noodzakelijk allemaal
% orthogonaal!  maar dat zeg ik ook niet.. similar tot jordan vorm ist
% wel ok


An interesting thing to observe about the matrix multiplication rule
is that it is recursive. This means you can chop the two matrices you
want to multiply into rectangular pieces, give the pieces a name, and
apply the multiplication rule to the pieces.  The result is the same,
only less or more \emph{unrolled}. There are only two things that need
to be taken into account. One is that multiplication of the pieces is
not commutative, and two is that the columns of the left matrix are
cut in the same places as the rows of the right matrix, so that every
product of pieces is defined. So, we can interpret the $a$'s, $b$'s
and $c$'s in the matrix multiplication formula not only as scalars,
but as matrices or vectors, taking into account that we lose
commutativity doing so.
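
As a sketch of this recursion, take a hypothetical split of $A$ and
$B$ into four blocks each (the block names $A_{ij}$ and $B_{ij}$ are
introduced here only for illustration), where the columns of $A$ and
the rows of $B$ are cut in the same place.  The scalar formula then
applies verbatim to the blocks:
$$\matrix{ A_{11} & A_{12} \\ A_{21} & A_{22} }
  \matrix{ B_{11} & B_{12} \\ B_{21} & B_{22} } =
  \matrix{ A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\
           A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} }.$$
Every term keeps its left factor on the left, since $A_{ik}B_{kj}$ and
$B_{kj}A_{ik}$ are in general different, or not even both defined.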

% This is an illustration of the fact that matrices can be defined
% over rings.

This observation allows us to cast some light on the different
operations a matrix multiplication can represent. Let's start our
journey with $$Ax = b.$$ This is usually associated with a set of $M$
linear equations in $N$ unknowns, but only if we really don't know
$x$. The matrix $A$ has dimensions $M \times N$, and both $x$ and $b$
are column vectors (matrices with width 1) of dimension $N$ and $M$
respectively.  There are several ways to look at this. The most
obvious one is the more abstract one: the matrix $A$ is the
representation of an operator, operating on the vector $x$ on its
right. The result of this operation is another vector $b$.  Now,
taking a closer peek at the internals of $A$, we recognize it is built
out of $N$ columns of numbers. Each of the columns of $A$ can be seen
as a vector so we give column $i$ the name $a_i$.  This enables us to
write the previous formula using the notation of submatrices as
$$\matrix{ a_1 & a_2 & \ldots & a_N } \matrix{ x_1 \\ x_2 \\ \vdots \\ x_N } = b.$$
Here the $a_i$ and $b$ are $M$--dimensional vectors, and the $x_i$ are
scalars. Working out the matrix product using the usual rules gives us
$b = \sum_i a_i x_i$. Moreover, because $x_i$ are scalars we have also
$b = \sum_i x_i a_i$.  So $b$, a column vector, is a linear
combination of the vectors $a_i$, which are in our case the columns of
$A$.  This expresses $b$ (vector) in terms of coordinates $x_i$
(numbers) with respect to the vectors $a_i$, which form a basis of the
space they span whenever they are linearly independent.  The space
spanned by all possible combinations is called the column space.
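
As a small worked instance of this column view (the numbers are
arbitrary and only meant as an illustration), take
$$A = \matrix{ 1 & 2 \\ 3 & 4 }, \quad x = \matrix{ 5 \\ 6 }.$$
Reading the product as a combination of columns gives
$$b = 5 \matrix{ 1 \\ 3 } + 6 \matrix{ 2 \\ 4 } = \matrix{ 17 \\ 39 },$$
so the elements of $x$ act as the weights with which the columns of
$A$ are mixed.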

% \footnote{ Note
%   that this looks like an interior product. When $M = 1$ the $a_i$ are
%   scalars, and we have an interior product between two
%   $N$--dimensional vectors.  Can we interpret the case $M > 1$ as an
%   interior product?  Depends on what we mean by interior product. If
%   we interpret interior product as a sort of \emph{average} combined
%   with a \emph{scaling}, then indeed, the analogy holds.}.

% Note, this gives a nice interpretation of what orthogonality
% means. If you take two strings of numbers, $a$ and $b$, and you take
% the average of the first string according to a weighing defined by
% the second string, both are said to be orthogonal if the weighing
% yields an average of zero: $a^T b = 0$.

% This has the nice interpretation of distance and mass. Suppose $a$
% represents the masses of a collection of fruits, $b$ represents a
% possible arrangement of the fruits on a scale, indicating distance
% (positive or negative) to the suspension point. The scale is
% balenced if and only if the vectors $a$ and $b$ are orthogonal.

Another way to chop up $A$ is like $$\matrix{ a_1^T \\ a_2^T \\ \vdots
  \\ a_M^T } x = \matrix{ b_1 \\ b_2 \\ \vdots \\ b_M }.$$ Here
$a_i^T$ are the rows of $A$, written in terms of the column vectors
$a_i$, which are now the columns\footnote{I've found it beneficial in most cases to
  represent row vectors as the transpose of column vectors, not to
  confuse my visual system too much when recognizing formula
  patterns.}  of $A^T$ and no longer those of $A$.  This arrangement
expresses the scalar
elements of $b$ as inner products between $x$ and rows of $A$, or
$b_i = a_i^T x$.  Written like this, $x$ takes a weighted sum of the
elements on each row of $A$ independently, yielding $M$ scalars.
Another way to put this is that the rows of $A$ represent linear
functions (functionals) mapping vectors to numbers, and each row
applied to $x$ gives one element of $b$.
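
For a concrete check of this row view, take the illustrative values
$$A = \matrix{ 1 & 2 \\ 3 & 4 }, \quad x = \matrix{ 5 \\ 6 }.$$
Each row acts as a functional on $x$:
$$b_1 = \matrix{ 1 & 2 } x = 1 \cdot 5 + 2 \cdot 6 = 17, \quad
  b_2 = \matrix{ 3 & 4 } x = 3 \cdot 5 + 4 \cdot 6 = 39.$$
This is the same $b$ one obtains by mixing the columns of $A$ with
weights $5$ and $6$.  Only the order in which the products are grouped
differs.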

Once you can interpret the formula of matrix-vector multiplication, it
easily generalizes to matrix-matrix multiplication $AX = B$, with $X$
and $B$ matrices containing columns $x_i$ and $b_i$
respectively, which gives
$$\begin{array}{lcl}
  AX & = & A \matrix{x_1 & \ldots & x_K}\\
  & = & \matrix{Ax_1 & \ldots & Ax_K}\\
  & = & \matrix{b_1 & \ldots & b_K}\\
  & = & B.
 \end{array}
$$
Matrix multiplication on the left can be seen as an independent
transform of all the \emph{columns} of the matrix it operates on. This
can be seen as a function which maps a matrix $X$ to a matrix
$B$. Note that the only thing we are doing is still to chop matrices
up into parts, and express the relationships of the parts by looking
at the matrix multiplication formula from different angles.
Taking the transpose of the expression above gives the dual
interpretation: a matrix operating on the right independently
transforms the \emph{rows} of a matrix\footnote{For me it makes sense
  to only think in terms of ``operator is on the left, all columns of
  the operand are operated on independently'', instead of mixing the
  left--right and row--column stuff.  Because of non--commutativity,
  left and right \emph{does} matter a lot, but luckily, they are
  related by the transpose (or Hermitian conjugate).}.
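
The dual statement can be written out in the same style.  As a sketch,
transposing $AX = B$ and using $(AX)^T = X^T A^T$ turns the columns
$x_i$ and $b_i$ into rows:
$$\begin{array}{lcl}
  X^T A^T & = & \matrix{x_1^T \\ \vdots \\ x_K^T} A^T\\
  & = & \matrix{x_1^T A^T \\ \vdots \\ x_K^T A^T}\\
  & = & \matrix{b_1^T \\ \vdots \\ b_K^T}\\
  & = & B^T.
 \end{array}
$$
Here the operator $A^T$ sits on the right and each row $x_i^T$ is
transformed independently into $b_i^T = x_i^T A^T$.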

% \footnote{ Note, that the dual of a vector refers to a vector in the
%   space of linear functionals, the space of all linear transforms from
%   the vector space to the base field.  Since we work in the less
%   abstract settings of $\RR^N$, vectors are column vectors, while their
%   duals, the linear functionals, are row vectors, transposes of column
%   vectors.  Application of a functional is nothing but the
%   multiplication of $a^T$ with a vector $x$, or $a^Tx$, which is the
%   inner product of vectors $a$ and $x$.}.

% Strange how the brain does not automatically incorporate this kind of
% symmetry. Well, you can learn it though.

% Note that operator on the left/right relates to computer languages
% like LISP or FORTH.


Let's interpret a multiplication differently, as inner and outer
products. Take vectors $a$ and $b$, both unit norm, so we can
concentrate on the effect of the angle between these vectors. Their
lengths have no real effect on products, other than a change of scale,
so we can safely leave them out of the picture.  We form the inner
product as $b^Ta$. This is a scalar, equal to the cosine of the angle
between both vectors. If this product is zero, $a$ and $b$ are said to
be orthogonal. The inner product of a vector and itself is its norm
squared. In our case the norms are $1$, so $b^Ta = 1$ means the angle
is $0$, meaning both vectors are the same.

The outer product $b a^T$ is quite different. It is a linear operator
(a matrix) which represents a projection onto the space spanned by $a$,
followed by a rotation into the direction of $b$. We can also call this
a reflection, since there's no handedness to preserve in a
1--dimensional space: restricted to a line, rotations (even number of
reflections) and reflections (odd number of reflections) have
essentially the same effect. The outer product of a vector and itself,
$a a^T$, is a projection operator onto the space spanned by $a$.
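
The action of the outer product can be read off by using
associativity.  As a sketch, for any vector $x$
$$(b a^T) x = b (a^T x),$$
i.e. the scalar $a^T x$, the coordinate of $x$ along $a$, scales the
vector $b$.  For the projector, $a^T a = 1$ gives
$$(a a^T)(a a^T) = a (a^T a) a^T = a a^T,$$
so projecting twice is the same as projecting once.  With the
illustrative choice $a = \matrix{ 1 \\ 0 }$ and $b = \matrix{ 0 \\ 1 }$
we get
$$b a^T = \matrix{ 0 & 0 \\ 1 & 0 }, \quad
  (b a^T) \matrix{ x_1 \\ x_2 } = \matrix{ 0 \\ x_1 }.$$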

% Now, we generalize this to multiple dimensions.


% So, things to remember. A matrix multiplication is: a collective
% transform of a (bunch of) vector(s), or a recombination of a bunch
% of vectors or scalars.

% A matrix multiplying from the left operates on columns
% independently. A matrix multiplying from the right operates on rows
% independently.

Let's have a look at inverses. With $B^T = A^{-1}$ we have $B^T A = I
= AB^T$. So the columns of $B$ and $A$ form mutually orthogonal
(biorthogonal) basis sets. For one column $b_j$ of $B$ this means it is
orthogonal to all the columns $a_i$ of $A$ with $j \neq i$.  And we
have $b_i^T a_i = 1$. What does this mean?  Suppose we write
$A A^{-1} = A B^T$ as
$$
\matrix{a_1 & \ldots & a_N}
\matrix{b_1 & \ldots & b_N}^T
$$
This gives us $I = \sum_k a_k b_k^T$, a complicated expression for
doing nothing!  Digging into this we can, given any set of $N$
independent vectors $a_i$, write any vector $x$ in terms of it, by
projecting it on a dual basis $b_i$. In other words, associated with
any basis $a_i$, there is a set of linear functionals $b_i^T$, which
when applied to a vector as $b_i^Tx$, give the coordinates of $x$ in
the new basis $a_i$, yielding the expression $x = \sum_k a_k (b_k^T x)$.
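
A small numerical sketch may help here (the matrix is an arbitrary
non--orthogonal example, chosen only for illustration).  Take
$$A = \matrix{ 1 & 1 \\ 0 & 1 }, \quad
  A^{-1} = B^T = \matrix{ 1 & -1 \\ 0 & 1 },$$
so that $a_1 = \matrix{ 1 \\ 0 }$, $a_2 = \matrix{ 1 \\ 1 }$ and the
dual basis, read from the rows of $A^{-1}$, is
$b_1 = \matrix{ 1 \\ -1 }$, $b_2 = \matrix{ 0 \\ 1 }$.  One checks
$b_1^T a_1 = b_2^T a_2 = 1$ and $b_1^T a_2 = b_2^T a_1 = 0$.  For
$x = \matrix{ 3 \\ 2 }$ the coordinates are $b_1^T x = 1$ and
$b_2^T x = 2$, and indeed
$$1 \cdot a_1 + 2 \cdot a_2 = \matrix{ 1 \\ 0 } + \matrix{ 2 \\ 2 } =
  \matrix{ 3 \\ 2 } = x.$$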


What can we learn from this? What is this duality all about? In fact,
it always gives us two ways of looking at the same thing. For example,
we saw above that $A^{-1}$ is just a disguised form of some other
matrix $B^T$.  An inverse can always be interpreted as the dual of
some other matrix, where this other matrix contains the dual basis for
the columns of the original matrix.  Stretching this a bit, a matrix
can either be interpreted column--wise, as a collection of base vectors,
where the vector it operates on is a set of coordinates, or it can be
interpreted row--wise, as a collection of linear functionals producing
a set of coordinates in some basis. Operating \emph{on} a matrix from
the left changes its columns independently, from the right changes its
rows independently. Conjugation (transpose, Hermitian conjugate)
exchanges the dual views.

% Next to that, we can chop it up into blocks to aid our understanding
% of certain properties and operations.

% Of course, in the abstract setting, coordinates are always relative to
% some basis.  Matrices are defined exactly in this matter, as grids of
% coordinates, and \emph{true} vectors are coordinate free
% entities. This doesn't remove the fact that a matrix, a block of
% numbers, \emph{behaves} as a vector, so we can freely interpret it as
% we like, either as a vector being acted on, or as a set of coordinates
% explaining one vector in terms of other vectors. It's this footloose
% property that makes linear algebra so powerful for formulating
% real-world phenomena, while remaining very earth--bound, because it's
% just a pack of numbers in the end.

So, how to read matrix multiplications?  When you see the same matrix
on both sides of another one, there is both a row \emph{and} a column
transform going on. In other words, there is a transform to another
basis and back again. There are basically two types: a congruence
transform, when the expression is about the signature of a quadratic
form, and a similarity transform, when the relative scaling
(eigenvalue decomposition) is studied.  Take for example the similarity
transform $X' = Q X Q^{-1}$, with $Q$ regular.  The columns of $Q$
represent a basis. So, building upon our investigation of what an
inverse actually is, the first effect of $X'$ is to yield the
coordinates of the vector it operates on in terms of the columns of
$Q$.  Then, in this new coordinate system, the operation $X$ is
performed on the coordinates, after which the coordinates are used to
build a vector using as basis the columns of $Q$.  When the eigenvalue
decomposition exists, $X$ can be made diagonal, so the effect of $X'$
can be decoupled into a set of independent actions on the coordinates
in the basis represented by the columns of $Q$, which in this case are
the eigenvectors.  If you see $Q X Q^T$ then you can safely bet that
$X$ is symmetric, and we are mostly interested in its signature,
i.e. the kind of quadratic form the matrix $X$ produces.
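
The three steps of a similarity transform can be followed on a small
example (the numbers are an arbitrary illustration).  Take
$$X' = \matrix{ 2 & 1 \\ 1 & 2 }, \quad
  Q = \matrix{ 1 & 1 \\ 1 & -1 }, \quad
  Q^{-1} = \frac{1}{2} \matrix{ 1 & 1 \\ 1 & -1 }, \quad
  X = \matrix{ 3 & 0 \\ 0 & 1 },$$
for which $X' = Q X Q^{-1}$.  Applied to $v = \matrix{ 1 \\ 0 }$, the
factor $Q^{-1}$ first yields the coordinates
$\matrix{ 1/2 \\ 1/2 }$ of $v$ in terms of the columns of $Q$, the
diagonal $X$ then scales them independently to
$\matrix{ 3/2 \\ 1/2 }$, and $Q$ finally rebuilds the vector
$$\frac{3}{2} \matrix{ 1 \\ 1 } + \frac{1}{2} \matrix{ 1 \\ -1 } =
  \matrix{ 2 \\ 1 },$$
which is the first column of $X'$, as it should be.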


Entry: Math and Music: Harmony vs. Melody
Date: Fri Jun 19 16:04:53 CEST 2009

In a talk with Axel yesterday the harmony / melody topic came up.
Axel's point was that it might be better to study harmony or a
harmonious instrument (one that can play chords like guitar, piano)
before studying melody.

My remark was that it's strange to see melody from a mathematical
point, because if you look at frequency ratios, what you see first is
harmony.  Then in the background of this one could develop tone scales
using harmonic intervals, but it's never possible to really decouple
melodic structures from the harmonic background.

Axel remarked that in the history of (western?) music there was this
move from harmony -> melody -> re-interpreting harmony in terms of
melody.  That in classical music, harmony is somewhat implicitly
understood with melody brought to the front.  


Entry: Complex Number Wave Shaping
Date: Sat Jun 20 15:25:27 CEST 2009
Type: tex

The idea presented in this paper is related to the operations of
\emph{modulation}, \emph{wave shaping} and \emph{sample playback}.
These can be represented respectively by an injection (or a curve)
$\RR \to \MM$ where $\MM$ is a manifold, a transformation $\MM \to
\MM$ and a projection $\MM \to \RR$.  Combining these operations
allows the characterization of sound signals as $\RR \to \RR$ maps.

\section{Complex Numbers}

Before going to exotic structures, let's have a look at $\MM=\CC$.  A
sinusoidal oscillator can be modeled using the curve $f(t) =
e^{i\omega t}.$ The trivial projection onto $\RR$ is to take the real
or imaginary part.  Using these we can concentrate on
transformations of $\CC$.  Sticking to transformations that can be
implemented directly using arithmetic functions we get addition,
multiplication and function composition which lead us to polynomials,
and division which leads us to infinite power series.

% This paper is mainly intended as an illustration of the elegance of
% working with complex signals, as opposed to the more cumbersome real
% signal approach.

\section{Polynomials}

Polynomials over $\CC$ are closed under addition, multiplication and
function composition, so it is interesting to look at what we can do
with them.  Performing $f(x) = x^2$ on $e^{i\omega t}$ yields
$e^{2i\omega t}$.  Squaring doubles\footnote{In $\RR$ on the interval
  $[-1,1]$ the same behaviour can be produced by the Chebyshev
  polynomials $T_n$.  They satisfy the relation $T_n(\cos\theta) =
  \cos n \theta$.}  the frequency.  In general an order $N$
polynomial $$p(x) = \sum_{n=0}^N p_n x^n$$ produces a mixed spectrum
according to
$$p(e^{i\omega t})=\sum_{n=0}^N p_ne^{ni\omega t}.$$ This means we can
talk in terms of the polynomials instead of the terms $e^{i\omega t}$,
with $p_n$ corresponding to the amplitudes of the harmonics.  The
following talks about the polynomials $p(x) = \sum_{n=0}^N p_n x^n$ and
$q(x) = \sum_{m=0}^M q_mx^m$.  To simplify notation we take $p_n$ and $q_n$
to be $0$ when $n$ falls outside of the range $0,\ldots,N$
and $0,\ldots,M$ respectively.  Adding two polynomials produces
another polynomial, with the coefficients added per term
$$p(x) + q(x) = \sum_{n=0}^{\max(N,M)} (p_n + q_n)x^n.$$  Multiplying polynomials
\emph{convolves} the spectra, which means that one spectrum is spread
out with the pattern of the other spectrum
$$p(x) q(x) = \sum_{n=0}^{N+M} \Big(\sum_{k+l=n}p_k q_l\Big)x^n.$$
Composing two polynomials yields
$$p(q(x)) = \sum_{n=0}^N p_n \Big(\sum_{m=0}^M q_m x^m\Big)^n$$ which has a less
trivial closed form.  Apart from special cases, like the Chebyshev
polynomials which obey the nesting property $T_n(T_m(x)) = T_{n
  m}(x)$, composition and more specifically iterated composition is
the realm of \emph{dynamical systems}.
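
A short worked instance of the three operations, using the arbitrary
illustrative choice $p(x) = 1 + x$ and $q(x) = 1 + x^2$:
$$\begin{array}{lcl}
  p(x) + q(x) & = & 2 + x + x^2 \\
  p(x)\,q(x) & = & 1 + x + x^2 + x^3 \\
  p(q(x)) & = & 2 + x^2.
 \end{array}
$$
In the product, the coefficient pattern $(1,1)$ of $p$ is copied to
every nonzero harmonic of $q$, whose pattern is $(1,0,1)$, which gives
$(1,1,1,1)$.  The nesting property of the Chebyshev polynomials can be
checked in the same way for $T_2(x) = 2x^2 - 1$:
$$T_2(T_2(x)) = 2(2x^2-1)^2 - 1 = 8x^4 - 8x^2 + 1 = T_4(x).$$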


% \section{Special Polynomials}


\section{Analytic Functions}

Next to polynomials, \emph{power series}
$$f(z) = \sum_{k=0}^{\infty} a_k z^k$$ are an interesting tool.  They
are a natural generalization, as polynomials of infinite order.  Within
their radius of convergence, power series represent analytic
functions.  Just as with polynomials, the
coefficients $a_k$ represent the harmonic content of the resulting
signal.
 

% CAUTION: filters are ``Fourier multipliers'', not waveshapers!!!

% This is the ``mathematical'' Z--transform\footnote{ Note the duality
%   between time and frequency analysis, looking at the correspondence
%   between filtering and waveshaping.  In discrete filter design and
%   analysis, the spectrum is continuous and periodic and lives on the
%   complex unit circle. Filters are causal, infinite length, discrete
%   sequences (discrete impulse responses).  Here we turn the picture
%   around: signals live on the complex unit circle and are periodic
%   and continuous. Spectra are discrete (harmonic) infinite
%   series. I.e. rational waveshaping functions are the equivalent of
%   finite order discrete IIR linear filters, which are ``spectrum
%   waveshapers''. Finite waveshaping polynomials are in the same
%   sense equivalent to discrete FIR filters.}.


% The function $\frac{z}{|z|}$, which maps the complex plane onto the
% unit circle, can be used as a general purpose (nonlinear) normalizing
% function. Since it is not differentiable, it is not an analytic
% function and does not have a power series expansion, so it is not
% very useful for analysis and should be used with care.

\subsection{Fractional Linear Transforms}


The function $$f_{a,\theta}(z) = e^{i\theta}\frac{z - a}{1 -
  \overline{a} z}$$ with $z,a \in \mathbb{C}$, $\theta \in \mathbb{R}$
and $|a| \neq 1$ is an invertible map from the unit circle onto
itself.  It is called a Fractional Linear Transform (FLT).  If $|z| =
1$ then $|f_{a,\theta}(z)| = 1$.  If $|a| < 1$ this function is
analytic on the open unit disc.  The first thing to remark is that for
$|a| < 1$ the
$f_{a,\theta}$ form a group with composition as the group operation.
Solving $w = f_{a,\theta}(z)$ for $z$ gives $z = e^{-i\theta}(w +
ae^{i\theta})/(1 + \overline{a}e^{-i\theta}w)$, so the
inverse element of $f_{a,\theta}$ is $f_{-ae^{i\theta},-\theta}$, which
reduces to $f_{-a,0}$ when $\theta = 0$.  What this
means is that performing this operation multiple times can always be
related to a single such operation, so it isn't terribly interesting.
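
The claim that the unit circle is mapped onto itself can be verified
in one line.  As a sketch: for $|z| = 1$ we have $\overline{z} = 1/z$,
so
$$1 - \overline{a} z = z (\overline{z} - \overline{a}) =
  z\,\overline{(z - a)},$$
and therefore $|1 - \overline{a} z| = |z|\,|z - a| = |z - a|$.
Numerator and denominator of the fraction have the same modulus, and
the factor $e^{i\theta}$ does not change that.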

% In the dual representation, this is of course the z transform of a
% discrete all-pass filter.  The more general fractional linear
% transform can be used as well.  The dual being a complex one
% pole/zero filter.


However, combining several $f$ using \emph{multiplication} is
meaningful.  It gives rise to Schur functions, more specifically
Blaschke products $f(z)
= e^{i\theta} \prod_k \frac{z-a_k}{1-\bar{a}_kz}$.  The Schur
algorithm gives an alternative representation for this
$$f_k(z) = \frac{1}{z}\frac{f_{k-1}(z) - \rho_k}{1 - \bar{\rho}_k f_{k-1}(z)},$$
with $\rho_k = f_{k-1}(0)$.  This means that when we \emph{multiply}
the outputs of several first order waveshapers we get a higher order
waveshaper with a richer behaviour.

\subsection{The Discrete Summation Formula}

What happens when we use the $a$ parameter as an input?  We set
$\theta=0$ in the following because it will only amount to a constant
phase shift.  Let $a = r z'$ with $r$ real and $|z'| = 1$.  Suppose $z$ and $z'$
are two unit norm complex oscillators defined by $z(t)=e^{i\alpha t}$
and $z'(t) = e^{i\beta t}$.  Our modulator driven by the two
oscillators has the form $$\frac{z-rz'}{1-r \overline{z'}z}.$$ The
signal generated is
$$s(t) = \frac{e^{i\alpha t} - r e^{i\beta t}} { 1 - r e^{i(\alpha-\beta)t}}$$
and is periodic when $\frac{\alpha}{\beta}$ is rational.  The constant
$r$ can be seen as a modulation index.  This can be written in a more
useful way as
$$s(t) = e^{i\alpha t}\frac{1 - re^{-i(\alpha-\beta) t}} { 1 - r e^{i(\alpha-\beta)t}}.$$
The factor $e^{i\alpha t}$ is just a frequency shift and the second
factor only depends on $\delta = \alpha-\beta$, so we have
$$s(t) = e^{i\alpha t} s'(t) (1 - re^{-i\delta t}),$$
with
$$
s'(t) = \frac{1}{ 1 - r e^{i \delta t}} = \sum_{k=0}^\infty r^k e^{i k
  \delta t},$$ where the series converges for $|r| < 1$.  This
expansion is the limit case $N \to \infty$ of the discrete
summation formula (DSF)
$$
\frac{1 - r^N e^{i N \delta t}}{ 1 - r e^{i \delta t}} =
\sum_{k=0}^{N-1} r^k e^{i k \delta t}.$$ This establishes the link
between what we can call FLT modulation and harmonic spectra with a
geometric envelope $r^k$.  The spectrum of $s(t)$ is thus harmonic with spacing
$\delta$, except for the frequency shift $\alpha$.  Multiplying the
series for $s'(t)$ by $(1 - re^{-i\delta t})$ and collecting equal
powers of $e^{i\delta t}$ gives
$$
\begin{array}{lcl}
s(t) & = & e^{i\alpha t} s_0(t) \\
s_0(t) & = & -re^{-i\delta t} + (1 - r^2) \sum_{k=0}^\infty r^k
e^{ik\delta t}.
\end{array}
$$
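
The finite sum in the DSF is the usual geometric series, which can be
checked by telescoping.  As a sketch, write $u = r e^{i\delta t}$ (the
symbol $u$ is introduced here only as shorthand).  Then
$$(1 - u) \sum_{k=0}^{N-1} u^k = \sum_{k=0}^{N-1} u^k -
  \sum_{k=1}^{N} u^k = 1 - u^N,$$
and dividing by $1 - u$ gives the formula.  For $|r| < 1$ the term
$u^N$ vanishes as $N \to \infty$, which recovers the infinite series.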

% .. which is the dual of truncated IIR filters.

% The more general case of fractional linear transforms still has a
% discrete spectrum.


\subsection{Formants}

Whenever we get a power series in $m(t) = e^{i\alpha t}$ we can shift
it up and down the frequency spectrum by multiplying the signal with a
polynomial in $m(t)$.  The result will still be a power series in
$m(t)$.  Alternatively, the series can be combined with a polynomial
through function composition, again yielding a harmonic power series.

Such modulation cannot shift the \emph{spectral envelope} in steps
smaller than the spacing of the harmonics, which is what we would want
to do to simulate independent control of pitch and tone colour.
However, it is possible to employ an interpolation trick by
cross--fading\footnote{Combined with LFS this is essentially the Phase
  Aligned Formant (PAF) method.}  between successive harmonics.  The
function that creates such a modulator from a base modulator $m$ and a
fractional shift $x$ is given by
$$c(m, x) = (1-(x-[x])) m^{[x]} + (x-[x]) m^{[x]+1}.$$ Here $[x]$ is
the integral part of $x$, which makes $x_f = x-[x]$ the fractional
part, so that $c(m,x)$ equals $m^{[x]}$ for $x_f = 0$ and fades
towards $m^{[x]+1}$ as $x_f \to 1$.  For example, $x = 2.25$ gives
$c = 0.75\,m^2 + 0.25\,m^3$.  Multiplying this with a power series in
$m$ gives
$$c(m, x) \sum_i a_im^{i} = \sum_i \big[ (1-x_f) a_i + x_f a_{i-1} \big]
m^{i + [x]},$$ with $a_{-1} = 0$, which is another power series.  When
the two factors
are synchronous, i.e. when $m(t)=e^{i\alpha t}$, this indeed
produces a harmonic spectrum with the $a_i$ fractionally
shifted\footnote{This effect is analogous to fractional delay using
  linear interpolation. A straightforward observation is to extend the
  linear interpolation to a higher order polynomial interpolator.  A
  polynomial fractional delay filter that preserves DC needs to have a
  zero at Nyquist to introduce the $180$ degree phase discontinuity
  when the fractional delay is $0.5$. Translating this to our problem,
  higher order
  polynomial interpolation of nearby harmonics is the same as
  amplitude modulation of an oscillator with frequency $x$ which has a
  phase discontinuity that is disguised by a vanishing waveshaper. The
  flatness of this waveshaper can be increased by increasing the order
  of the interpolation, which widens the bandwidth, and decreases
  amplitude modulation of the carrier wave.}.  The function
$c(e^{i\alpha t}, x)$ is periodic with a very narrow bandwidth, and so
seems to behave like a single, localized oscillation in between
harmonics $[x]$ and $[x]+1$.

The method poses a problem when a time varying oscillator is
implemented, i.e. if $m(t) = e^{i\phi(t)}$ where $\phi(t)$ is
continuous but has a variable slope. Due to the locking, a change in
frequency will cause a phase jump in at least one of the locked
oscillators. This can be solved by smoothing out the discontinuous
frequency change, but not without additional frequency modulation.
With locked oscillators, the frequency can only be changed without a
phase discontinuity if it is done when the phase of the basic harmonic
is $0$.


\subsection{Power Series}

From the previous it can be concluded that it is the division
operation that introduces interesting harmonics.  Let's continue with
$(1 - re^{i\alpha t})^{-1}$ some more.  An odd harmonic series can be
synthesized using
$$\frac{e^{i \alpha t}}{ 1 - r e^{i 2 \alpha t}} = \sum_{k=0}^\infty
r^k e^{i (2 k + 1) \alpha t}.$$
Squaring the power series is equivalent to convolving the spectral
envelope with itself.  This gives a more localized formant structure.
Taking the real part of the signal turns the harmonic spectrum into a
symmetric one that can be shifted up (or down) in frequency to build a
formant like symmetric wave packet.  Squaring the real signal before
multiplication gives a smoother formant packet.
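
To make the effect of squaring concrete, consider as an illustration
the square of the basic series, with $z = e^{i\alpha t}$ and
$0 < r < 1$.  Convolving the envelope $r^k$ with itself gives
$\sum_{j=0}^{k} r^j r^{k-j} = (k+1) r^k$, so
$$\Big(\sum_{k=0}^\infty r^k z^k\Big)^2 = \frac{1}{(1 - rz)^2} =
  \sum_{k=0}^\infty (k+1)\, r^k z^k.$$
The envelope $r^k$ is largest at $k = 0$, while the ratio of
successive terms of the new envelope is $\frac{k+2}{k+1}\,r$, which
exceeds $1$ as long as $k < \frac{2r-1}{1-r}$.  For $r > 1/2$ the
envelope $(k+1) r^k$ therefore first rises and then decays, which
places its maximum away from the fundamental.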

If $r$ is complex, we can introduce a phase shift in the
harmonics. Combining both $r$ and $\overline{r}$ we can construct
damped sinusoidal spectral envelopes.  This enables us to use standard
sinusoidal methods to model a spectral envelope.  If $f_{r,\alpha}(t)$
represents the power series or the DSF we see that
$f_{\overline{r},\alpha}(t) = \overline{f_{r,-\alpha}(t)}$.

[ TODO: Cleanup from here. ] Normalization can sometimes be desirable
to get a constant power output over the modulation range\footnote{When
  using a ``squashed'' normalized DSF and allowing the modulation
  index to move beyond unit magnitude, we get a very nice sounding
  broad spectrum transient when $r$ passes through $|r|=1$.
  Especially so when the change in $r$ is not made instantaneous, but
  e.g. processed by a lowpass filter (dezipped). During a fast change
  in $r$, the DSF normalization (which limits the number of harmonics
  to $N$) is not valid. This gives a large signal which then will be
  squashed by the normalizing function.}.  If we define the inner
product between two periodic signals $s(t)$ and $s'(t)$ with common
period $T$ to be $\left<s,s'\right>=\frac{1}{T}\int_0^T
\overline{s'(t)}s(t)dt$ and the power to be $P = \left<s,s\right>$, and
note that the components of the DSF are orthogonal, we can see that
the power is equal to

\begin{equation}
P = \sum_{k=0}^{N-1} |r|^{2k} = \frac{1-|r|^{2N}}{1-|r|^2}.
\end{equation}
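
One possible way to use this result, given here as an assumption about
how the normalization could be done rather than as the method actually
used, is to scale the DSF output by a gain $g_N(r) = 1/\sqrt{P}$ (the
name $g_N$ is introduced only for this sketch):
$$g_N(r) = \sqrt{\frac{1-|r|^2}{1-|r|^{2N}}}, \quad |r| \neq 1.$$
For $|r| \to 1$ all $N$ terms of the sum become $1$, so $P \to N$ and
$g_N \to 1/\sqrt{N}$.  For the infinite series with $|r| < 1$ the gain
reduces to $\sqrt{1-|r|^2}$.  The orthogonality used above is
$\left<e^{ik\delta t}, e^{il\delta t}\right> = 1$ for $k = l$ and $0$
otherwise, with $T = 2\pi/\delta$.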


% I have to be very careful not to mix up the operations!  Namely:
% addition, multiplication and composition. E.g. the fractional
% linear transforms are a group under composition, but not under
% multiplication. Waveshaping is composition of waveshaper after
% oscillator.  Filtering is a Fourier multiplier, not a composition.
% Spectral waveshaping is a compressor!

% The general rationale is this: spherical spaces give oscillation,
% while hyperbolic spaces generate events (fly-by). Now to translate
% that vague nonsense into clear terms...

% This can be seen from the matrix representations.

% Unit norm complex numbers give oscillation on the unit circle,
% while unit norm hypercomplex numbers trace a path on the unit
% hyperbola.

% This is oscillation versus transients. If I am right, Clifford
% algebras let you form direct sums of such spaces.  The result is
% that, after normalization, the hyperbolic explosion gives the
% oscillations a transient course.


\subsection{Normalization}

The normalization function
$$\sigma(z) =  z/|z|$$
and its frequency--doubling square
$$\sigma(z)^2 = z/\bar{z}$$ both map $\CC \backslash \{0\}$ to the unit circle.

\subsection{Normalized Addition}

The operation $s(a,b) = \sigma(a+b)$ is interesting as it is closed on
the unit circle and defined for all $a$, $b$ except $a = -b$.  Is it
possible to relate this somehow to the spectrum it produces?  From
ad--hoc experiments I can testify this yields very interesting sounds.
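
For two unit norm inputs the question can at least be made more
explicit.  As a sketch, write $a = e^{i\phi}$ and $b = e^{i\psi}$ (the
phase names $\phi$ and $\psi$ are introduced here for illustration).
Then
$$a + b = 2\cos\Big(\frac{\phi-\psi}{2}\Big)\, e^{i\frac{\phi+\psi}{2}},
\quad\mbox{so}\quad
s(a,b) = \mathrm{sign}\Big(\cos\frac{\phi-\psi}{2}\Big)\,
e^{i\frac{\phi+\psi}{2}}.$$
The output runs at the mean of the two phases and flips sign each time
the phase difference passes through an odd multiple of $\pi$, which is
exactly where $a = -b$.  For two oscillators $\phi = \alpha t$ and
$\psi = \beta t$ with $\delta = \alpha - \beta$, inserting the Fourier
series of the square wave $\mathrm{sign}(\cos\theta) = \frac{4}{\pi}
\sum_{k=0}^\infty \frac{(-1)^k}{2k+1} \cos (2k+1)\theta$ gives
$$s(t) = \frac{2}{\pi} \sum_{k=0}^\infty \frac{(-1)^k}{2k+1}
\Big[ e^{i(\alpha + k\delta)t} + e^{i(\beta - k\delta)t} \Big],$$
i.e. two sets of sidebands spaced $\delta$ apart, one extending from
$\alpha$ away from $\beta$ and one extending from $\beta$ away from
$\alpha$, with amplitudes falling off as $1/(2k+1)$.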

\subsection{Conclusion}

The basic idea is the same as in $\RR$.  Polynomials of $e^{i\alpha
  t}$ produce harmonic spectra in a predictable way, just like
polynomials of $\sin(\alpha t)$ and $\cos(\alpha t)$ do in $\RR$.
However in $\CC$ polynomials and power series are a bit easier to
handle.  As long as one starts from coherent modulation, the story is
quite straightforward.  Polynomials allow the introduction of integer
multiples of the base frequency, while power series (obtained by
division or other non-polynomial analytic functions) represent broad
spectra.  This suggests an alternative organization: 1. investigate
addition, multiplication, function composition driven by simple
modulators, and 2. look at power series of some easy to compute
functions.  Basically: what can polynomials do for you + what if you add
division?


\section{Other Manifolds}
Investigate Clifford Algebras and certain matrix algebras.
(i.e. quaternions, biquaternions, \ldots).

Figure out how torsion and other non--geodesic motion lead to
interesting paths through compact and connected manifolds.

A lot can be done with division to generate interesting power series.
Is it possible to change this to normalization (which has division),
which is a more intuitive operation?

Also, it seems that a lot of neat tricks are possible by using matrix
embeddings of (compact) manifolds with a group action (Lie groups).
Defining an additional operation as addition in the embedding space
followed by projection onto the manifold subset in the embedding
space seems to yield interesting ways of combining two elements in
addition to the group action.


Entry: Video Codecs
Date: Tue Jul  7 09:23:02 CEST 2009

Time to get an overview of different techniques used in video codecs,
to see how they relate to resource use.  In general, MPEG-1 is mostly
for video on CD, MPEG-2 is directed at TV (broadcast + DVD) with more
emphasis on robustness, delay, different modes, ... and has an
improved audio codec.  MPEG-4 adds improved video coding and a more
general-purpose decoder, but is a hodge-podge of optionally
implemented features. Part 2 (divx) and 10 (avc) are important.

MPEG-1 video [1]: 

  - Group-Of-Pictures with Intra-frame (keyframe) encoded as +- JPEG,
    Predicted-frame difference to previous frame incorporating motion
    vectors on macroblocks (16x16 = 4 luma 8x8 + 2 chroma 8x8, one
    Cb and one Cr).  Bidirectional frame using forward and backward
    frame as reference.  DC frames serve as "thumbnails" for
    fast-forward.

  - Motion estimation works on a fixed diamond region using half
    pixels.  MVs are differentially encoded from neighbouring
    macroblocks.

  - The DC part of the DCT coefficients is encoded differentially.
    The AC is coded in a zig-zag pattern (most energy is in the upper
    left corner around DC) which is then RLE encoded.  Quantization
    uses 5 bits (0-31).  Thresholding is adaptive (or user
    selectable).

  - The whole bitstream is Huffman encoded.
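
The zig-zag scan and run-length step above can be sketched in a few
lines of Haskell.  This is only an illustration of the idea, not the
MPEG-1 bitstream format: the names zigzag, scan and runLevel are made
up here, the block is a plain list of rows, and trailing zeros are
simply dropped where a real coder would emit an end-of-block code.  In
a real coder the DC coefficient is handled separately, so the run/level
step would apply to the AC part only.

```haskell
import Data.List (sortOn)

-- Zig-zag visiting order for an n x n block, as (row, column) pairs.
-- Positions are ordered by anti-diagonal; the direction alternates
-- per diagonal, starting (0,0), (0,1), (1,0), (2,0), ...
zigzag :: Int -> [(Int, Int)]
zigzag n = sortOn key [ (r, c) | r <- [0 .. n - 1], c <- [0 .. n - 1] ]
  where
    key (r, c) = let d = r + c in (d, if even d then c else r)

-- Read a square block (list of rows) in zig-zag order.
scan :: [[Int]] -> [Int]
scan block = [ block !! r !! c | (r, c) <- zigzag (length block) ]

-- (run, level) pairs: the number of zeros preceding each nonzero
-- coefficient.  Trailing zeros are dropped, standing in for an
-- end-of-block marker.
runLevel :: [Int] -> [(Int, Int)]
runLevel = go 0
  where
    go _ []       = []
    go z (0 : xs) = go (z + 1) xs
    go z (x : xs) = (z, x) : go 0 xs
```

Since quantization zeroes most high-frequency coefficients, the scan
tends to end in a long run of zeros, which is what makes the
run/level representation compact.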
  

MPEG-2

  - Systems section: Transport Stream (lossy media like broadcast) and
    Program Stream (reliable media like DVD).

  - Video similar to MPEG-1, optimized for higher bitrates and more
    different formats (e.g. interlaced).  Audio part contains AAC,
    which is more efficient, flexible and robust.  (Part 2 = H.262)


MPEG-4

  - More advanced video coding + object oriented design.  Decoder
    behaves more like a rendering engine.  Variable block size motion
    compensation.

  - Part 2: DIVX

  - Part 10: H.264 / AVC (HD-DVD, Blu-ray) Many additions[9].  Highly
    nontrivial decoding/encoding.


Other formats:

  - Theora (in Ogg container).  Open but less well performing codec.

  - H.261 Low bit rate (ISDN) video conferencing.

  - H.263 Low bit rate video conferencing.  A variant (Sorenson H.263)
    is used in Apple Quicktime and Adobe Flash Video.  Original base
    for Real Video.  Part of 3GPP (MMS).


[1] http://en.wikipedia.org/wiki/MPEG-1#Part_2:_Video
[2] http://en.wikipedia.org/wiki/Advanced_Audio_Coding
[3] http://en.wikipedia.org/wiki/MPEG
[4] http://en.wikipedia.org/wiki/Flv
[5] http://en.wikipedia.org/wiki/Theora
[6] http://en.wikipedia.org/wiki/H.261
[7] http://en.wikipedia.org/wiki/H.263
[8] http://en.wikipedia.org/wiki/Video_codec
[9] http://en.wikipedia.org/wiki/H.264/MPEG-4_AVC


Entry: Binary Coding
Date: Tue Jul  7 10:02:48 CEST 2009

[1] http://en.wikipedia.org/wiki/Huffman_coding
[2] http://en.wikipedia.org/wiki/Entropy_coding
[3] http://en.wikipedia.org/wiki/Arithmetic_coding
[4] http://en.wikipedia.org/wiki/Context-adaptive_binary_arithmetic_coding


Entry: MDCT
Date: Tue Jul  7 10:54:32 CEST 2009

"Lapped" transform.  Used in coding applications where overlap can be
used to reduce artifacts.

Note that H.264 uses several "exact-match" DCT variants.

[1] http://en.wikipedia.org/wiki/Modified_discrete_cosine_transform


Entry: Navier-Stokes
Date: Mon Sep 28 14:47:26 CEST 2009

Time to look at the Navier-Stokes equation[1] and fluid dynamics.
There is a description in the Princeton Companion to Mathematics[2]
section III.23 p196.  Another starting point is Feynman's Lectures on
Physics, part II chapters 40-41.

The Navier-Stokes (and Euler) equations are non-linear partial
differential equations in terms of a velocity vector field u(x) and a
pressure distribution p(x).  The Euler equation is N-S with viscosity
v=0.  Apparently, even for small v these behave quite differently.

Additionally one can pose a constraint that expresses that the fluid
is not compressible.

E and N-S are Newton's law applied to an infinitesimal portion of the
fluid.

The nonlinearity (which causes turbulence) is due to convective
acceleration, which is an acceleration associated with the change in
velocity over position.

In practice, the interaction between large and fine scale behaviour
requires such a fine mesh that the problem is often approximated,
e.g. using the Reynolds-averaged Navier-Stokes equation[3].


Following Feynman in chapter II.40 (The Flow of Dry Water).  A liquid
moves under shear stress.  Hydrostatics, fluid in rest, amounts to the
absence of shear stress.  The important result here is that the force
per unit volume is -grad(p).  The important conclusion is that for a
force defined by a potential energy, there is no general equilibrium
solution if the density is not constant. (An exception arises when
density depends only on pressure).

For dynamic fluids, the clue seems to be to translate a velocity
vector field description to a particle path description, and formulate
Newton's law of conservation of momentum for that path.  I.e. with dx
denoting infinitesimals and a 2D equation.  This yields an equation
nonlinear in the velocity components.

The force component of the N-S equation is where most of the variation
lies.


[1] http://en.wikipedia.org/wiki/Navier-Stokes_equations
[2] isbn://0691118809
[3] http://en.wikipedia.org/wiki/Reynolds-averaged_Navier-Stokes_equations


Entry: Material Derivative
Date: Mon Sep 28 18:36:22 CEST 2009
Type: tex

\def\duxx{\frac{\partial u_x}{\partial x}}
\def\duyy{\frac{\partial u_y}{\partial y}}
\def\duxy{\frac{\partial u_x}{\partial y}}
\def\duyx{\frac{\partial u_y}{\partial x}}

The nonlinearity of the Navier-Stokes equation comes from the
expression of the derivative of a velocity vector along a particle
path, starting from a velocity field.  

Let $n$ be the spatial dimension.  Given a velocity field $u(x,t) =
(u_1, \ldots, u_n)$ expressed in terms of $n$ spatial coordinates $x =
(x_1,\ldots,x_n)$ and a time coordinate $t$, we can use the spatial
and temporal partial derivatives $\frac{\partial u_i}{\partial x_j}$ and
$\frac{\partial u_i}{\partial t}$ to make the following linear
approximations in terms of time and space offsets.
(1) The velocity vector at a space offset $\Delta x = (\Delta x_1, \ldots,
\Delta x_n)$ and a time offset $\Delta t = t^{+} - t$ from $(x,t)$ is
$(u_1^{+},\ldots,u_n^{+})$ with
$$u_i^{+} = u_i + \frac{\partial u_i}{\partial t} \Delta t + \sum_{j=1}^n
\frac{\partial u_i}{\partial x_j} \Delta x_j.$$ (2) A particle starting at
a spatial point $x$ and time $t$, flowing according to the velocity
vector field $u$ will after a time $\Delta t$ end up at the point
$x^{+} = (x_1^{+},\ldots,x_n^{+})$ with
$$x_i^{+} = x_i + u_i \Delta t.$$ Substituting $\Delta x_j = x_j^{+} -
x_j = u_j \Delta t$ in (1) and dividing by $\Delta t$ then gives an
approximation for the change of the velocity vector $\Delta u_i =
u_i^{+} - u_i$ over a time interval $\Delta t$, along a flow path of
the vector field $u$ through the point $(x,t)$.
$$\frac{\Delta u_i}{\Delta t} = \frac{\partial u_i}{\partial t} +
\sum_{j=1}^n \frac{\partial u_i}{\partial x_j} u_j.$$ The expression on
the right hand side remains the same for the limit $\Delta t \to 0$
when the linear approximations are replaced by exact expressions, and
is called the material derivative of $u$.

For a generic (scalar-valued) multivariate function $f(x,t)$ and a
velocity field $u(x,t)$ the fact that this is a simple application of
the chain rule becomes clearer when we write the total derivative
of $F(t) = f(x(t),t)$ as
$$\frac{dF}{dt} = \frac{\partial f}{\partial t} + \sum_{j=1}^n
\frac{\partial f}{\partial x_j} \frac{d x_j}{dt},$$ where $\frac{d
  x_j}{dt} = u_j$ along a flow path.  Applying this for $f$
ranging over each component $u_i$ of $u$ then gives the previous
expression.
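
As a small worked example (the field is chosen here purely for
illustration), take $n = 1$ and the steady field $u(x,t) = x$.  The
field does not change in time, so its time partial derivative is zero,
yet the material derivative is
$$\frac{\partial u}{\partial t} + \frac{\partial u}{\partial x} u =
0 + 1 \cdot x = x.$$
A particle following the flow moves along $x(t) = x_0 e^{t}$ and does
accelerate: $\frac{d^2 x}{dt^2} = x_0 e^{t} = x$, in agreement with the
formula.  The convective term is a product of $u$ with its own spatial
derivative, which is where the nonlinearity in $u$ enters.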


%[1] http://en.wikipedia.org/wiki/Substantive_derivative


Entry: Informal proofs are difficult
Date: Sat Oct 17 16:40:31 CEST 2009

This has bothered me for quite a while.  Formal proofs can be quite
tedious to read, but at least they will not put you in a position
where you're reading a proof and you just can't see why a certain step
is made, because the author assumed it was obvious.

Granted, the ``obvious'' steps usually are obvious, that is from the
perspective of somebody who understands the proof, or has a background
in the matter (i.e. the justification that's left implicit is used a
lot in other proofs).


Entry: HSVD
Date: Mon Nov 16 14:34:17 CET 2009

A summary of [1].

CHAPTER 2

Explains basic principles of Linear Prediction (LP) and State Space
(SS) based methods for parameter estimation.

The flow of HSVD is like this:

1. start with a signal of length N
2. create a Hankel matrix (N-q x q)             -> choose q
3. compute SVD
4. truncate: take first n singular vectors      -> choose n
5. LS solve shift equation
6. compute eigenvalues (the signal poles)
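
Steps 2 and 5 rest on the shift structure of the Hankel matrix, which
can be shown without any SVD.  The following is a minimal sketch, not
code from [1]: the names hankel, exponential and shiftInvariant are
made up, the row count convention (N-q+1 rows) is an assumption, and
an integer pole is used so the comparison is exact.

```haskell
-- Hankel matrix with q columns: row i holds samples i .. i+q-1.
-- For N samples this gives N-q+1 rows (empty if q > N).
hankel :: Int -> [a] -> [[a]]
hankel q xs = [ take q (drop i xs) | i <- [0 .. length xs - q] ]

-- n samples of a single exponential x_k = z^k.
exponential :: Integer -> Int -> [Integer]
exponential z n = take n (iterate (* z) 1)

-- Shift structure: the matrix without its first row equals the
-- matrix without its last row, scaled by the pole z.
-- Expects a matrix with at least one row.
shiftInvariant :: Integer -> [[Integer]] -> Bool
shiftInvariant z h = map (map (* z)) (init h) == tail h
```

For a sum of n exponentials the scalar z becomes an n x n matrix
acting on a basis of the signal subspace, and its eigenvalues are the
poles.  That matrix is what the least squares step solves for.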

The two values to be chosen are n (signal space dimension) and q
(signal + noise space dimension).  The n parameter follows from what
one is looking for, but how to determine q?

Translating LP to an AR problem shifts the statistics but makes it
easier to estimate using LS.

The essence seems to be this: truncation by SVD is essentially noise
reduction.  This then makes the LS or TLS-based estimation for LP and
SS approaches more precise.  However, the statistically optimal (ML)
estimates require nonlinear optimization.

In general SS methods are better for larger number of poles since they
obtain poles from an eigenvalue decomposition instead of polynomial
rooting, which is ill-conditioned for large order.


[1] ftp://ftp.esat.kuleuven.ac.be/pub/SISTA/vanhamme/phd.ps


Entry: WSOLA - Waveform Similarity OverLap Add.
Date: Tue Jan 26 18:15:18 CET 2010

Gist: An extension of the OLA time stretching algorithm where source
frames are taken from a neighbourhood around the point a normal OLA
would pick the frame.  The offset is taken such that the overlapped
part has maximal correlation.
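
The offset search in the gist can be sketched as follows.  This is a
minimal illustration and not the algorithm from [1]: the names
bestOffset, nominal and radius are made up here, frames are plain
lists, and similarity is an unnormalized cross-correlation (an
assumption; a normalized measure would avoid favouring loud segments).

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Unnormalized cross-correlation of two equally long segments.
correlation :: [Double] -> [Double] -> Double
correlation xs ys = sum (zipWith (*) xs ys)

-- Search the offsets nominal-radius .. nominal+radius in the source
-- for the segment that correlates best with the template (the part
-- of the output the new frame will overlap with).
-- Fails if no candidate offset fits inside the source.
bestOffset :: Int -> Int -> [Double] -> [Double] -> Int
bestOffset nominal radius template source =
  maximumBy (comparing score) candidates
  where
    n          = length template
    candidates = [ o | o <- [nominal - radius .. nominal + radius]
                     , o >= 0, o + n <= length source ]
    score o    = correlation template (take n (drop o source))
```

A plain OLA would always pick the nominal offset; here the frame start
is allowed to slide within the search radius.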

[1] http://etros1.vub.ac.be/Research/DSSP/Publications/int_conf/ICASSP-1993.pdf
[2] http://music.columbia.edu/pipermail/music-dsp/2001-December/046763.html
[3] http://mffmtimescale.sourceforge.net/
[4] http://diuf.unifr.ch/pai/people/juillera/PitchTech/Preview.html


Entry: Never invert a matrix!
Date: Wed Feb  3 17:22:40 CET 2010

Actually, there is a reason why one would want to invert it anyway: if
the same matrix inverse is applied to a large number of vectors,
inversion might pay off because it can be parallelized.

Solving a set of equations still has a longer data dependency chain.

[1] http://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/


Entry: Binary Streams as Stochastic Variables (PDFs)
Date: Thu Feb 18 10:44:38 CET 2010


(EDIT: Mon Dec 27 11:20:35 EST 2010: check [2]: Need to make sure that
multiplication doesn't cause any undesired spectral content to
modulate into the signal band.)


Goal: find a way to look at boolean functions on binary bitstreams
coming from sigma-delta modulators.  I.e. where the bitstream
represents an analog signal that is the result of applying a
reconstruction filter R after mapping the binary signal into the
analog domain.

1. Assumption: we're only interested in the long-term average, so in
   first approximation we deal with stochastic variables, not
   stochastic processes.

2. The ``physical'' variable we're interested in is modeled as the
   expected value of the stochastic variable, which approximates the
   real implementation which is a running average / low-pass filter.

3. As another simplification we assume variables are not correlated.


Using these simplifications, we can work with Probability Density
Functions; a stochastic variable is completely specified by its PDF.

Let's denote PDFs as this:
      a(v), b(v), c(v)      PDF of single variable
      ab(v1,v2), ac(v1,v2)  joint PDF of 2 variables
      abc(v1,v2,v3)         joint PDF of 3 ...

Since we use independent variables, joint PDFs can be written as
products: ab() = a()b().

Additionally PDFs of binary stochastic variables can be represented by
a single number \in [0,1] : the probability the variable is "1" (or
"0").  We will abuse this by taking a to mean a(1), i.e. the number a
\in [0,1] is the probability that variable a == 1.

Using this notation, computing expected values of binary boolean
functions is straightforward.  I.e. for the XOR function:

  <+> | 0 1
  ----+----
  0   | 0 1 
  1   | 1 0


  E (a <+> b) = 1 ab(0,1) + 1 ab(1,0) + 0 ab(1,1) + 0 ab(0,0)
              = ab(0,1) + ab(1,0)
              = a(0)b(1) + a(1)b(0)    [independent]
              = (1 - a) b + a (1 - b)
              = a + b - 2ab

To make sense of this result it's simpler to have the binary values
represent {-1,1}.  The transformation F that maps [-1,1] into [0,1]
is:

  F x = (1 + x) / 2

This then yields:

  a <+> b = a + b - 2 a b   implies   (F a) <+> (F b) = F (- (a b))

In other words: the XOR operation is _inverted multiplication_ of
expected values of variables on {-1,1}.


The function <=>, the negation of <+> gives a similar result.

  <=> | 0 1
  ----+----
  0   | 1 0
  1   | 0 1


  E (a <=> b) = ab(0,0) + ab(1,1)
              = a(0)b(0) + a(1)b(1)  [ independent ]
              = (1 - a) (1 - b) + ab
              = 1 - a - b + 2ab


  a <=> b = 1 - a - b + 2ab   implies  (F a) <=> (F b) = F (a b)

The EQV operation is multiplication of EVs of vars on {-1,1}.

It seems it's only interesting to look at equivalence classes under
inversion of inputs and outputs.  See also [1].  I.e. the only other
interesting operation to look at is AND.  Similar functions
(OR,=>,NAND,...) will yield the same result qualitatively by inverting
inputs or outputs.

  . | 0 1
  --+----
  0 | 0 0
  1 | 0 1

  E (a . b) = ab(1,1)
            = a(1)b(1) = a b

So AND,OR,=>,... are multiplications; just in a different domain [0,1].


Conclusion: these binary logic operations give bilinear functions.  On
the [0,1] domain they are binary linear combinations of the 4 terms:
ab, a(1-b), (1-a)b, (1-a)(1-b), where a,b are the
probabilities/expected values of the random variables a and b.
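
The derivations above all follow the same recipe, which can be written
once for any boolean function by summing the joint probabilities of
the input pairs that map to 1.  A minimal sketch, assuming independent
inputs as in the text; the names prob, expect and bipolar are made up
for this illustration.

```haskell
-- Probability that a binary variable with P(1) = p takes the value x.
prob :: Double -> Bool -> Double
prob p True  = p
prob p False = 1 - p

-- Expected value of (f a b) for independent binary variables with
-- P(a = 1) = pa and P(b = 1) = pb: sum the joint probabilities of
-- all input pairs for which f yields 1.
expect :: (Bool -> Bool -> Bool) -> Double -> Double -> Double
expect f pa pb =
  sum [ prob pa x * prob pb y
      | x <- [False, True], y <- [False, True], f x y ]

-- Inverse of the map F: from [0,1] back to [-1,1].
bipolar :: Double -> Double
bipolar p = 2 * p - 1
```

With this, expect (/=) is the XOR formula, expect (==) the EQV formula
and expect (&&) the plain product, so bipolar (expect (==) pa pb)
should equal bipolar pa * bipolar pb up to rounding.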


--

Further work:

 * What happens when variables are not independent?  Can the
   correlation itself do anything useful?

For signals close to 1/2 of the range, i.e. represented by an
alternating 01 sequence, it is difficult to be practically
independent.  However, two coin toss processes are independent, so
this should be built-in at the core somehow.  The question then
becomes: how to implement a modulator?  I.e. how to introduce bias
from a uniform random source, given the desired expected value.

I.e. is it possible to take a structured signal and decorrelate it
with a noise process?  Doesn't seem to be straightforward, i.e. XOR or
EQV with a uniform process are equivalent to multiplication with 0.

It seems that one of the asymmetric operators is needed to create
non-uniform distributions from uniform ones.  I.e. the AND of two 1/2
noise streams gives a 1/4 noise stream.


 * Can we use the apparent dependency on randomness (i.e. from noise
   injection) to transmit other information?

 * What about looking at these functions as ``modulation maps''
   instead of binary endomaps of [0,1] or [-1,1], i.e. as:

                     [0,1] x [-1,1] -> [-1,1] 

   I.e. one of the inputs represents an asymmetric signal, while the
   other one is symmetric.


Some literal Haskell code:

> a `xor` b = a + b - 2 * a * b
> a `eqv` b = (1 - a) * (1 - b) + a * b
>
> f x = (1 + x)/2
>
> a === b = (abs (a - b)) < 1e-14
>
> e_xor a b = ((f a) `xor` (f b)) === (f (- (a * b)))
> e_eqv a b = ((f a) `eqv` (f b)) === (f (a * b))


[1] entry://20090609-145519
[2] entry://20101227-110515


Entry: Rhythm & Dance
Date: Thu Apr  8 13:19:24 EDT 2010

Instead of looking at rhythm as a series of events to be interpreted by
a listener, it might be simpler to look at it as a series of motions,
i.e. each up needs a down.

The point being: "performability" might be a key element in rhythm.
Performability meaning: what you hear is not a sequence of sounds, but
a representation of the drummer's hand movements.  I.e. rhythm _needs_
to be danceable.

From this point it might make sense to approach rhythm from the
physical system way (choreography) as opposed to the auditory
perception way.  I.e. biological motion.

A good base might be to start from acceleration (muscle tension) and
connected body dynamics (joints produce circular behaviour).


Entry: Euler-Lagrange equations
Date: Fri Apr  9 13:53:14 EDT 2010
Type: tex

From Arnold[1] chapter 3, with slightly different Haskell-inspired
notation.  Suppose a functional $\Phi :: (\R \to \R) \to \R$ can be
written as $$\Phi(x) = \int_{t_0}^{t_1} \! L(x,x',t) \, dt$$ where $L
:: (\R,\R,\R) \to \R$ is a function of 3 variables and $x' = {dx \over
  dt}$ denotes the derivative of the curve $x :: \R \to \R$.  The
function $x :: \R \to \R$ is an \emph{extremal} of the functional
$\Phi$ if and only if $$\frac{d}{dt}\left(\frac{\partial L}{\partial
  x'}\right) - \frac{\partial L}{\partial x} = 0.$$

The term $\frac{\partial L}{\partial x'}$ means the partial derivative
of the function $L(x,x',t)$ of 3 variables with respect to its second
variable.  Substituting variables, this denotes the same function as
$\frac{\partial}{\partial v_2}L(v_1,v_2,v_3).$ The result of the
partial derivation operation is a new expression in terms of
$(x,x',t)$.  In this expression, one substitutes the unary function
$x(t)$ and its derivative $x'(t)$, yielding a unary function of $t$,
of which the derivative with respect to $t$ can be computed.
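
As a worked illustration (the Lagrangian is chosen here as an example
and the constants $m, k > 0$ are assumptions), let $L(x,x',t) =
\frac{1}{2} m x'^2 - \frac{1}{2} k x^2$.  Treating $x$ and $x'$ as
independent variables gives $\frac{\partial L}{\partial x'} = m x'$
and $\frac{\partial L}{\partial x} = -k x$.  Substituting the curve
$x(t)$ and its derivative, and then differentiating in $t$, yields
$$\frac{d}{dt}\left(m x'(t)\right) + k x(t) = m x''(t) + k x(t) =
0,$$ the equation of the harmonic oscillator.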

Note that this is 300 year old sloppy notation.  See next post and
SICM[2], where Spivak's unambiguous notation[3] is used.

This theorem is readily generalized to a multi--variate functional
$\Phi :: (\R \to \R^n) \to \R$ operating on $n$--dimensional curves
$:: \R \to \R^n$ and a function $L :: (\R^n, \R^n, \R) \to \R$ of
$2n+1$ variables.

Summarized, if a functional only depends on \emph{positions},
\emph{velocities} and \emph{time}, the Euler--Lagrange equation can be
used to find concrete equations for an \emph{extremal} curve of the
functional.

%[1] isbn://0387968903
%[2] http://mitpress.mit.edu/SICM/
%[3] isbn://0805390219

Entry: Sussman about crappy notation
Date: Fri Apr  9 20:03:34 EDT 2010

In the video[1] Arnold's book[3] is mentioned two times (and that he's
a SOB).  Sussman doesn't like the ambiguous notation.  

  Arnold[3] p246: (not breaking with tradition and explaining what the
  Lagrange equations mean) and p258: ``It is necessary to use the
  apparatus of partial derivatives, in which even the notation is
  ambiguous.''

Sussman uses the notation based on Spivak's Calculus on Manifolds[5]
which handles it properly: partial differentials expressed in terms of
position of parameter instead of names that can have different
meanings (parameters or functions).  See p.44 at the end of the
chapter about differentiation[6].

His point: programming forces one to be precise and formal without
being excessively rigorous.


[1] http://video.google.com/videoplay?docid=-2726904509434151616&hl=en#
[2] http://mitpress.mit.edu/sicm/
[3] isbn://0387968903
[4] http://www.docin.com/p-708956.html
[5] http://en.wikipedia.org/wiki/Calculus_on_Manifolds_(book)
[6] http://books.google.com/books?id=POIJJJcCyUkC&pg=PA44#v=onepage&q&f=false


Entry: Legendre Transform
Date: Sat Apr 10 09:47:32 EDT 2010
Type: tex

\newcommand{\ip}[2]{\langle {#1} , {#2} \rangle}

The Legendre Transform (LT) shows up in classical mechanics as the
link between the Lagrange equations and Hamilton equations.  Some
notes about what this transform actually does.

First, how to compute?  The LT of $f(x)$ is defined as the function
$g(p) = \max_x F(x,p)$ where $F(x,p) = px - f(x)$.  For a convex
function the maximum occurs where $\nabla_x F(x,p) = 0$ or $p =
\nabla_x f(x)$.  Solving the latter equation yields an expression for
$x(p)$ which can be substituted in $F$ to yield $g(p) = F(x(p),p)$.

\begin{itemize}

\item The LT transforms a convex function on a vector space to a
convex function on the dual vector space.  The coordinates $p =
\nabla_x f(x)$ can be seen as a representation of this dual space.
(TODO: less handwaving please).

\item Two functions make up an LT transform pair if their first
derivatives are inverses of each other.  Note that the derivative of a
convex function is a monotone function and therefore invertible.

\item For a quadratic form $f(x) = x^T F x$ with $F$ symmetric and
positive definite, we have $p(x) = 2 F x$.
After substituting $x(p) = \frac{1}{2} F^{-1} p$ in $g(p) =
\ip{p}{x(p)} - f(x(p))$ we get $g(p) = \frac{1}{4}p^TF^{-1}p$.  Note
that the respective derivatives $x \to 2Fx$ and $p \to
\frac{1}{2}F^{-1}p$ are inverses.

\item For a QF we have additionally that $g(p) = f(x)$ or that $f(x) =
\ip{x}{p(x)} - f(x)$ or $f(x) = \frac{1}{2}\ip{x}{\nabla f}$. This is
not true in general.

\item The Lagrangian of a physical system $L = T - U$ is (assumed to
be) convex w.r.t. the (generalized) velocities.  This seems to make
general sense: the kinetic energy grows for velocities that go to
infinity, \emph{independent of direction}.  In addition, $T$ will
likely be a quadratic form.  Note that even if the generalized
coordinates live in a non-flat manifold, the velocities will still be
vectors in the tangent space, and kinetic energy is expressed in terms
of the weighted magnitudes of these vectors (a positive definite QF).


\end{itemize}
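
A small non--quadratic check of the recipe above (the function is
picked here for illustration): for $f(x) = e^x$ the condition $p =
\nabla_x f(x)$ reads $p = e^x$, so $x(p) = \ln p$ for $p > 0$ and
$$g(p) = p\,x(p) - f(x(p)) = p \ln p - p.$$
The derivatives $x \to e^x$ and $p \to \ln p$ are inverses of each
other, as expected for an LT pair.  On the other hand $g(p(x)) = e^x
(x - 1)$, which differs from $f(x)$ except at $x = 2$, so the identity
$g(p) = f(x)$ of the quadratic case indeed does not carry over.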

%[1] http://arxiv.org/pdf/0806.1147v2
%[2] isbn://0387968903
%[3] http://en.wikipedia.org/wiki/Legendre_transformation


Entry: Computer Algebra System (CAS)
Date: Sat Apr 10 16:21:09 EDT 2010

From what I gather, a generic "simplify" method is a collection of
hacks, because there is no generic notion of what "simple" actually
means.

Side-stepping that issue, particular transformations are usually
well-defined.  From my own limited experience with static analysis &
abstract interpretation, it seems best to map expressions between
different domains.  From an engineering point of view it is almost
always simpler to split any transformation step in at least two steps,
and have formal intermediate languages.  For any transformation step:

   * Convert equivalent expressions (L1) to a unique normal form (L2).

   * Express complicated transformations as directed equalities
     (pattern matching functions) on normal forms (L2) to some target
     domain (L3).

   * Optionally, re-embed the target domain (L3) into the original
     (L1) or some other domain.
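
A toy instance of these three steps, as a hedged sketch only: L1 is a
tiny expression language in one variable, L2 is a coefficient list
used as unique normal form, the transformation is differentiation, and
the result is re-embedded in L1.  All type and function names are made
up for this example.

```haskell
-- L1: expression trees in one variable X.
data Expr = X | Const Integer | Add Expr Expr | Mul Expr Expr
  deriving (Eq, Show)

-- L2: normal form, coefficients [c0, c1, ...] of c0 + c1 X + ...,
-- without trailing zeros (the zero polynomial is []).
type Poly = [Integer]

trim :: Poly -> Poly
trim = reverse . dropWhile (== 0) . reverse

addP :: Poly -> Poly -> Poly
addP (a : as) (b : bs) = (a + b) : addP as bs
addP as []             = as
addP [] bs             = bs

mulP :: Poly -> Poly -> Poly
mulP [] _        = []
mulP (a : as) bs = addP (map (a *) bs) (0 : mulP as bs)

-- Step 1: equivalent L1 expressions map to the same L2 value.
normalize :: Expr -> Poly
normalize X         = [0, 1]
normalize (Const c) = trim [c]
normalize (Add a b) = trim (addP (normalize a) (normalize b))
normalize (Mul a b) = trim (mulP (normalize a) (normalize b))

-- Step 2: the actual transformation, by pattern matching on L2.
derivative :: Poly -> Poly
derivative []       = []
derivative (_ : cs) = trim (zipWith (*) [1 ..] cs)

-- Step 3: re-embed L2 into L1 (Horner form).
embed :: Poly -> Expr
embed = foldr (\c e -> Add (Const c) (Mul X e)) (Const 0)
```

The point of the split is that derivative never has to consider the
many ways the same polynomial can be written in L1.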


Entry: What is a differential?
Date: Mon Apr 12 12:52:09 EDT 2010
Type: tex

A differential of a function on a manifold $f :: M \to \R$ is a linear
functional $df_p(t) :: T_p \to \R$ defined on the tangent space $T_p$
at $p \in M$.


Entry: Conservative Systems and Chaos
Date: Mon Apr 12 21:08:26 EDT 2010
Type: tex

Apparently it is possible to have chaotic behaviour in a conservative
(hamiltonian) system.  A HS conserves phase-space volume, but not
necessarily shape.  Think of kneading dough: volume is conserved, but
an initial ball of raisins spreads out.

How to make an example?  It needs to be asymmetric to avoid
integrability.  Take a potential like $$U(x,y) = x^2 + y^2 +
\frac{a}{1 + (x-b)^2 + y^2}$$ which looks like a harmonic oscillator
potential well $x^2 + y^2$ at large scale, but has a bump at $(b,0)$.
Can it produce chaotic behaviour?
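
To make the question concrete, here is a sketch of the equations of
motion, assuming a unit mass and the Hamiltonian $H =
\frac{1}{2}(p_x^2 + p_y^2) + U(x,y)$ (both are assumptions, not stated
above).  With $D = 1 + (x-b)^2 + y^2$,
$$x'' = -\frac{\partial U}{\partial x} = -2x + \frac{2a(x-b)}{D^2},
\qquad y'' = -\frac{\partial U}{\partial y} = -2y +
\frac{2ay}{D^2}.$$
For $a = 0$ this reduces to two uncoupled harmonic oscillators.  For
$a \neq 0$ the bump term couples $x$ and $y$, and for $b \neq 0$ it
also breaks the rotational symmetry, so angular momentum is no longer
conserved.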

For simple systems, can the Lyapunov coefficient (LC) be computed
locally?  Sure it can, but what matters more is the \emph{average} LC,
no?  Need more study here\ldots


Entry: d'Alembert Principle
Date: Sun Apr 18 13:24:28 EDT 2010

Basic idea: the ``constraint force'' that keeps trajectories on a
configuration manifold embedded in Euclidean space can never perform
any work, i.e. change the energy of the system.  This is the same as
saying that the constraint force is always orthogonal to the tangent
space of the configuration manifold.


Entry: Spinors and Symplectic Geometry (Hamiltonian Mechanics)
Date: Wed Apr 28 22:41:41 EDT 2010

Ever since an ex-colleague of mine made the remark that he heard some
physicist friend of his say that ``I wonder why spinors aren't used
more in signal processing'', I've been intrigued by the idea but
didn't really see the connection.

Lately I've been looking into classical mechanics a bit more, and I'm
about to enter the Hamiltonian Mechanics part.  

I take an interest in this as I'd like to see the links with sound
synthesis.

[1] http://en.wikipedia.org/wiki/Spinor
[2] http://en.wikipedia.org/wiki/Symplectic_geometry
[3] http://en.wikipedia.org/wiki/Linear_symplectic_space


Entry: Differential Music
Date: Fri Apr 30 16:17:58 EDT 2010
Type: tex

In an attempt to create music in the form of differentiable functions
on tori, I ran into the problem of simulating events anyway.  The
approach that seems to work well is to apply a function that would
return a smooth pulse wave when applied to $x(\theta) = \cos \theta$.
This led to $$x \to \frac{1}{1 + k^2(1 + x)}.$$ Another one that
might work is $$x \to \frac{1}{1 + (kx)^2}.$$
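
A quick check of the first map (a sketch, using only the half--angle
identity): since $1 + \cos\theta = 2\cos^2(\theta/2)$,
$$\frac{1}{1 + k^2(1 + \cos\theta)} = \frac{1}{1 +
2k^2\cos^2(\theta/2)},$$ which peaks at $1$ for $\theta = \pi$ and
drops to $1/(1+2k^2)$ at $\theta = 0$.  Near the peak, with $\theta =
\pi + \epsilon$, we have $1 + \cos\theta \approx \epsilon^2/2$, so the
pulse is approximately the Lorentzian $1/(1 + k^2\epsilon^2/2)$ with
half--width $\sqrt{2}/k$ at half maximum.  Larger $k$ gives a narrower
pulse, occurring once per period.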

The reason for using differentiable functions is that they open the
door for local analysis of parameter spaces of sound generating
networks, and other analytic (computer algebra) techniques.

The next question is: how to bring rhythm into the game?


Entry: Cleaning up PhD research papers
Date: Sun May 30 13:33:15 CEST 2010

I'm keeping general introductory papers, and audio synth + FX papers.
For sinusoidal modeling, Petre Stoica seems to be a good starting
point for generic approach.  The other direction was Sabine VanHuffel
for the fast algorithms.  For wavelets it's Daubechies and Sweldens,
I'm keeping some introductory papers.

I'm throwing away the paper forms of specific ad-hoc papers about:

  - Blind source separation (statistics based)
  - CAS (computational source separation: perception model based)
  - Computational Scene Analysis
  - Matching Pursuit (iterative filtering)
  - Audio Coding
  - Sinusoidal + complex exponential modeling
  - Wavelets + Applications to approximate LU


I was not able to integrate most of that knowledge during my PhD years
because of the many ways to characterize errors (what mathematical
framework to use to express the modeling problem) and the
non-linearity of the resulting optimization problems, which makes
practical comparison quite difficult.  Possible variations: amplitude
or amp+freq estimation, polynomial phase, optimality (matching
vs. linear models + error), noise coloring, etc.

I'm not keeping Matching Pursuit papers: this method is too ad-hoc,
and forgive my arrogance but I think I can re-invent most of this
technology if I'm ever in need of it.

I'm not keeping the sinusoidal modeling papers (peak picking, etc..).
Same story as with MP: too ad-hoc and re-inventable.

I'm limiting myself to more mathematically meaningful structures.
I've vowed to not set a foot in the theatre of perception ever again!
(I.e. speech recognition: most of this technology needs extra
information about how the brain works; as it is that what we want to
mimic in the first place.)


Entry: Wavelets : local trigonometric basis
Date: Sun May 30 14:09:19 CEST 2010

I'm keeping some wavelet and transform papers.  Might be interesting
to look into.


Entry: Exponential sinusoidal modeling
Date: Wed Jun  2 15:05:45 CEST 2010

I'm keeping the exponential modeling papers.  Most sinusoidal modeling
problems also suffer from ambiguities (optimization problems with many
similar local minima), which require disambiguating constraints and
assumptions to become useful.  However, these can usually stay a bit
analytically tractable.  I'd like to eventually work out some kind of
survey/indexing into this kind of research.


Entry: Exponential modeling
Date: Sat Jun 19 13:38:34 CEST 2010

I worked on sinusoidal modeling for a while, and if there is one thing
that I can remember, it is that there is a lot of wiggle room in how
to tackle the problem on the technical side, and that what we hear
isn't so clearly related to what we can measure: our perception fills
in a lot of gaps.

Sinusoidal modeling highlights:

  * Fourier transform and DFT (FFT).  The FT is interesting for
    reducing the complexity of convolutions, performing a whitening
    transform for adaptive filters (as a filterbank), and as an ad-hoc
    method for analysing harmonic sounds.

  * Autoregressive models + ladder filters (orthogonal polynomials and
    Schur's algorithm).  The theory behind this is quite pretty.
    For structured matrices, modified Schur and rank-revealing
    decompositions can be used to yield fast algorithms.

  * Non-linear phase signals.  Moving from linear phase (exponential
    sinusoidal) to chirp and higher order polynomial phase adds more
    complexity.  There doesn't seem to be much structure to explore
    here.

  * Nonlinear optimization.  Writing sinusoidal modeling as a generic
    optimization problem can use the fact that derivatives of
    exponentials in general do not look horrible.  However, these
    functions are usually multi-modal, and give rise to ambiguities,
    suggesting that there are "many ways to see a signal" as a sum of
    sine waves.

  * Subspace based techniques.  I see two distinct classes: one that
    operates directly on the signal/noise subspaces obtained through
    the SVD of signal matrices, and one that uses signal
    spaces to further derive approximate linear system parameters
    (sinusoidal parameters).


Entry: Subspace Identification
Date: Sat Jun 19 15:02:00 CEST 2010

Subspace Identification[1] refers to system identification of linear
MIMO dynamical models using linear algebra techniques (QR + SVD).  The
basic idea revolves around a two-step procedure:

  1. Given input/output data, construct a _Kalman state sequence_ by
     projection onto a limited subspace.

  2. Obtain state matrices from this using linear least squares.

The fact that 1. is at all possible provides the main leverage.

( The following is generalized from the HSVD method.  I'm not sure if
it completely applies to the stochastic version in [1], but I have the
impression it does.  I need a more intuitive grasp of the Kalman
filter first. )

Such SVD-based methods are good in _practice_ but the characterization
of the approximation is theoretically very far removed from what can
be obtained by maximum likelihood methods using a more direct
approach.

Subspace methods have been used to categorize the "unreasonable
effectiveness" of mathematics.  I'm definitely no expert, but it seems
to me that they fall more into the class of convenient, accidental
hacks.  (The link between the estimation error and an ML approach is
not clear: probably there is none due to radical re-interpretation of
the problem?)


[1] ftp://ftp.esat.kuleuven.ac.be/pub/SISTA/nackaerts/other/alln.ps.gz


Entry: Maximum Likelihood estimation and the Kalman Filter
Date: Sat Jun 19 19:25:07 CEST 2010
Type: tex

( See [6] for a good introduction to ML and Bayesian estimation. )

Maximum likelihood estimation (MLE) is a way to estimate model
parameters based on parameterized probability density functions.
\begin{enumerate}

\item Construct a conditional model $P(x|\theta)$ which gives the
probability density function (PDF) of observables $x$ in terms of
model parameters $\theta$.

\item Interpret the PDF as a function $L(\theta) = P(x_0 | \theta)$,
setting the observables to a particular observed outcome $x_0$, and
find the $\theta_0$ that maximizes this function.

\end{enumerate} This gives a $\theta_0$ that \emph{best explains} the
data, as the probability of observing $x_0$ is highest for the
parameter vector $\theta_0$.

A simple example of MLE is linear least squares estimation under the
assumption of independent Gaussian noise with uniform variance.
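
To make the link explicit (a sketch; the symbols $A$, $y$, $e$ and
$\sigma$ are introduced here for illustration and are not taken from
[6]): assume a linear model $y = A\theta + e$ where the noise samples
are independent, zero-mean Gaussian with the same variance $\sigma^2$.
Then $$\log L(\theta) = -\frac{1}{2\sigma^2} \|y - A\theta\|^2 +
\text{const},$$ so maximizing the likelihood is the same as minimizing
the squared error, which for $A$ of full column rank gives $$\theta_0
= (A^T A)^{-1} A^T y.$$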

From [6] p. 91, the Kalman filter arises in a Bayesian framework from
calculating updates of conditional probabilities, taking into account
the next observation step.  In the case where PDFs of parameter priors
and noise sources are Gaussian, there is an efficient update mechanism
to compute the parametric representation of these PDFs (mean and
covariance matrix).  In general the KF gives the best possible linear
estimator in an MSE sense.

In [6], p. 71-77 the differences between ML, MMSE and MAP estimators
are explained using the concept of risk[7] which combines cost with
probability.  An estimator minimises risk.  Different cost functions
give rise to different estimators.

In [8] a brief explanation is given about how one arrives at the
Kalman filter relations from a recursive Bayesian setting.


% [1] http://www.tina-vision.net/docs/memos/1996-002.pdf 
% [2] ftp://ftp.esat.kuleuven.ac.be/pub/SISTA/nackaerts/other/alln.ps.gz
% [3] http://en.wikipedia.org/wiki/Maximum_likelihood
% [4] http://en.wikipedia.org/wiki/Likelihood
% [5] http://en.wikipedia.org/wiki/Ordinary_least_squares
% [6] http://www-sigproc.eng.cam.ac.uk/~sjg/book/digital_audio_restoration.zip
% [7] http://en.wikipedia.org/wiki/Risk_%28statistics%29
% [8] http://en.wikipedia.org/wiki/Kalman_filter#Relationship_to_recursive_Bayesian_estimation

Entry: Carette's Implicit Model Specialization
Date: Tue Jun 22 12:22:06 CEST 2010

Remarks from reading [1].

  * Model transformation is straightforward, but the selection of the
    techniques still requires human insight.  (I.e. too many degrees
    of freedom to automate).

  * Not only computation, also deduction is necessary (i.e. to prove
    that a particular computation can be eliminated).

  * Look into active libraries[2] and telescoping[4].


[1] http://www.cas.mcmaster.ca/~carette/newtongen/
[2] http://mozart-dev.sourceforge.net/activelib.html
[3] http://awurl.com/SvpJqXPwZ
[4] http://telescoping.rice.edu/


Entry: Digital Audio Restoration
Date: Tue Jun 22 15:14:45 CEST 2010

Digital Audio Restoration - A Statistical Model-Based Approach by
Simon J. Godsill and Peter J. W. Rayner[1].

It contains an interesting introduction to Bayesian and ML estimation.


[1] http://www-sigproc.eng.cam.ac.uk/~sjg/book/digital_audio_restoration.zip


Entry: Annoyed by Computer Modern
Date: Tue Jun 22 15:18:31 CEST 2010

I have this book [1] in print.  The printed version is non-glossy
paper, and it has a good looking "spill".  Computer Modern is too thin
on the screen and on a laser printer.  Was it designed with this spill
in mind?  TAOCP has less spill, non glossy but finer paper, but looks
better than laser print.

[1] http://www-sigproc.eng.cam.ac.uk/~sjg/book/digital_audio_restoration.zip


Entry: Exponential sinusoidal modeling
Date: Thu Jun 24 17:00:53 CEST 2010
Type: tex

There are many ways to formulate the exponential sinusoidal modeling
problem.  It is important to distinguish between the \emph{stochastic}
model, and the \emph{estimator}.  A basic element of all approaches is
the linear prediction (LP) or autoregressive (AR) relation $$y_n =
\sum_{i=1}^N a_i y_{n-i}.$$ 

Placing this in a statistical framework can yield many variants.
E.g. the stochastic AR model $$y_n = \sum_{i=1}^N a_i y_{n-i} + e_n,$$
and a noisy sum-of-sinusoids model $$x_n = \sum_{i=1}^N a_i x_{n-i}
\text{ and } y_n = x_n + e_n.$$ The difference between these is that
in the former, the noise signal can be seen as driving the input of a
dynamical system, while in the latter there is only measurement noise.

For each stochastic model, several estimators can be created.  An
estimator transforms an observation into a guess for the corresponding
parameters, depending on known or estimated statistics.

In the case of the AR model, when the signal covariance matrix is
known, the coefficients $a_i$ can be solved for exactly (the
Yule--Walker equations).  It is also possible to solve the model in
the LS sense, which is equivalent to building an estimate for the
signal covariance.
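
As a sketch of what this means (with $r_k$, $Y$ and $M$ introduced
here for illustration, and assuming $e_n$ is white and uncorrelated
with past samples): multiplying the stochastic AR relation by
$y_{n-k}$ and taking expectations gives $$r_k = \sum_{i=1}^N a_i
r_{k-i}, \quad k = 1,\ldots,N, \quad \text{with } r_k = E[y_n
y_{n-k}],$$ a Toeplitz linear system in the $a_i$.  The LS approach
minimizes $$\sum_n \Big(y_n - \sum_{i=1}^N a_i y_{n-i}\Big)^2,$$ whose
normal equations $(Y^T Y) a = Y^T y$ have the same form, where $Y$ has
$M$ rows of delayed samples $Y_{n,i} = y_{n-i}$ and $Y^T Y / M$ acts
as a sample estimate of the covariance matrix.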


Entry: Superfast Toeplitz algorithms
Date: Fri Jun 25 10:28:42 CEST 2010

The structure in linear algebra problems involving Toeplitz matrices
can be exploited using generalizations of the Levinson[2] and Schur
algorithms[4].  Further exploitation is possible using superfast
algorithms based on the FFT or DCT.

[1] http://people.cs.kuleuven.be/~marc.vanbarel/software/
[2] http://en.wikipedia.org/wiki/Levinson_algorithm
[3] http://www.math.niu.edu/~ammar/cortona/cortona.html
[4] http://nalag.cs.kuleuven.be/papers/ade/BWG93/


Entry: Displacement Rank & Generalized Schur
Date: Fri Jun 25 14:58:42 CEST 2010
Type: tex

Reading Mastronardi's PhD diss[1].  One of the things that has put me
off before is the horrible notation.  Many of the formulas talk about
symmetric matrices, which yields a lot of $AJA^T$-style terms.  Isn't
there a more readable way to write this down?

Anyway, the algorithms are quite elegant: instead of operating on a
matrix $R$, one performs elimination operations on generators only,
reducing the complexity order by one.

What is interesting though is that one leaves the domain of signal
algorithms to be able to use matrix algorithms, but one re-enters the
domain of signal algorithms in the implementation, recovering some of
the matrix structure.

% [1] http://users.ba.cnr.it/~irmanm21/Welcome.html


Entry: Predict and Update
Date: Thu Jul  8 10:11:26 CEST 2010

If you look at the equations for the Kalman filter[1], it is
remarkable that they are factored in two steps:

  1. predict output using current input, previous state estimate and
     system equations.

  2. update state estimate and its error covariance by observing the
     real (noisy) system output.
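
The two steps can be sketched for a scalar system.  This is a minimal
illustration, not taken from [1]: the model x' = a x + b u + w,
y = c x + v, the function name kalman_step and all default parameter
values are assumptions made for the example.

```python
def kalman_step(x, P, u, y, a=1.0, b=0.0, c=1.0, q=1e-4, r=0.1):
    """One predict/update cycle of a scalar Kalman filter.

    x, P : previous state estimate and its error variance
    u, y : current input and observed (noisy) output
    q, r : process and measurement noise variances (assumed known)
    """
    # 1. predict: propagate estimate and variance through the model
    x_pred = a * x + b * u
    P_pred = a * P * a + q
    y_pred = c * x_pred

    # 2. update: correct the prediction using the observed output
    S = c * P_pred * c + r          # variance of the prediction error
    K = P_pred * c / S              # Kalman gain
    x_new = x_pred + K * (y - y_pred)
    P_new = (1.0 - K * c) * P_pred
    return x_new, P_new


if __name__ == "__main__":
    x, P = 0.0, 1.0
    for _ in range(50):
        x, P = kalman_step(x, P, u=0.0, y=5.0)
    print(x, P)
```

Feeding a constant observation pulls the estimate towards that
constant while the variance P shrinks.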


[1] http://en.wikipedia.org/wiki/Kalman_filter


Entry: Binary clocked modulation & averages
Date: Thu Jul 22 07:26:44 CEST 2010

About this[1]..

The problem seems to be that there isn't really much "space" when
signals are binary and clocked; i.e. there is a finite amount.  Maybe
instead of trying to use infinite techniques (statistics: limits) it's
better to look at properties of discrete structures.

[1] entry://20100218-104438


Entry: CRC algo
Date: Fri Jul 30 12:03:25 CEST 2010

What's with the table of coefficients in this CRC32 implementation?

( code: COPYRIGHT (C) 1986 Gary S. Brown. )
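
A likely answer, sketched under the assumption that the linked file
uses the standard reflected CRC-32 polynomial 0xEDB88320 (not verified
against [1]): entry i of the table is the CRC register obtained by
pushing the single byte i through the bit-serial algorithm, so the
main loop can consume 8 bits per table lookup.  The function names
below are made up for illustration.

```python
import zlib


def make_crc32_table(poly=0xEDB88320):
    """Precompute the effect of 8 bit-serial CRC steps for each byte value."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            # reflected algorithm: shift right, XOR polynomial on carry-out
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


def crc32_table_driven(data, table):
    """Byte-at-a-time CRC-32 using the precomputed table."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = table[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    table = make_crc32_table()
    data = b"123456789"
    print(hex(crc32_table_driven(data, table)), hex(zlib.crc32(data)))
```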

[1] https://aachen.uni-dsl.de/svn/unidsl_firmware/backfire/trunk/backfire/tools/firmware-utils/src/cyg_crc32.c


Entry: Minimal erase counter
Date: Tue Aug  3 09:06:50 CEST 2010

How to make a counter that has a minimal number of erases (0->1
transitions).

i.e.

1111 erase
1110
1100
1000
1101 erase
1001
0001
...

It boils down to finding a sequence of paths through the bit-flip
hypercube from 111... to 000...

This is probably not unique, but how to find a representative with a
simple structure?  I.e. how to turn the directed (1->0) hypercube
graph into a tree?  (And step 2, how to order the branches of that
tree?)

The idea is to give each node only a single parent.  I.e. given a bit
vector, there is an algorithm to compute its parent.

10010011 ->
11010011

101001 ->
111001
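
Both examples are consistent with the rule "set the leftmost 0 bit to
1".  That rule is a guess from the two examples, not something stated
above; the sketch below (function names are made up) implements it and
walks from a node up to the all-ones root.

```python
def parent(v, width):
    """Parent of bit vector v: set the most significant 0 bit to 1.

    Returns None for the root (all ones).
    """
    for bit in range(width - 1, -1, -1):
        if not (v >> bit) & 1:
            return v | (1 << bit)
    return None


def path_to_root(v, width):
    """List of nodes from v up to the all-ones root."""
    path = [v]
    while (p := parent(path[-1], width)) is not None:
        path.append(p)
    return path


if __name__ == "__main__":
    print([format(n, "04b") for n in path_to_root(0b0001, 4)])
```

Reading a path backwards gives a run of counter states that needs no
erase, since each step only clears bits.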


Entry: Quadratic Residues
Date: Sat Aug  7 20:25:32 CEST 2010

( From conversation with Antti; looking up some background
information.. )

Exactly half of the integers in range [1..(p-1)] are quadratic
residues (QR), and the other half are non-residues of prime p.

       q is QR if there exists an x : x^2 = q (mod p)

In other words, x is a square root of q in the field Z/pZ.

I.e. for p = 5.
                  
     1^2 = 1
     2^2 = 4
     3^2 = 4 (9)
     4^2 = 1 (16)

Meaning 1 and 4 are QRs and 2 and 3 are not.

Why is this?  Each nonzero number can be written as g^n, where g is a
generator of the cyclic multiplicative group of the field Z/pZ.

If n is even, g^n has a square root g^(n/2).  If n is odd it has none:
(g^m)^2 = g^n requires 2m = n (mod p-1), which has no solution when
p-1 is even and n is odd.

With the exception of p = 2, exactly half of the nonzero elements of
the field Z/pZ have a square root, as the multiplicative group, which
has order p-1, always has an even number of elements.
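
A quick numerical check, as a sketch: the function names are made up,
and the second function uses Euler's criterion (q is a QR mod an odd
prime p exactly when q^((p-1)/2) = 1 mod p), which is not discussed
above but follows from the same g^n argument.

```python
def quadratic_residues(p):
    """Nonzero squares mod p, found by brute force."""
    return sorted({(x * x) % p for x in range(1, p)})


def is_qr(q, p):
    """Euler's criterion for an odd prime p and q not divisible by p."""
    return pow(q, (p - 1) // 2, p) == 1


if __name__ == "__main__":
    for p in (5, 7, 13):
        print(p, quadratic_residues(p))
```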

[1] http://en.wikipedia.org/wiki/Quadratic_residue


Entry: Peak compression
Date: Sun Aug  8 06:57:58 CEST 2010

For many applications that have constraints on bandwidth and dynamic
range at the same time, it is useful to be able to control the
maximal signal amplitude.

The simplest way to produce wide-band constant-amplitude pulses is to
use chirps.

Another intriguing way, when the spectral content is known, is to use
Schroeder phases[4][5].  ( Using n(n-1) phase offsets -- compare to
n^2 for Newman phases. )


Now the interesting part is that there is a _digital_ variant of this.
Here one starts from a signal with minimal dynamic range: a (PSK
modulated) binary signal.

Binary can only represent 1 and -1. When modulated as a PSK signal it
is an analog signal with a small dynamic range.  (Smallest known?  See
BER[3]).

A Barker code[2] then is a sequence that is near-orthogonal to its own
shifts, meaning that its autocorrelation has a distinct peak at
shift=0, but has magnitude <= 1 for other shifts.  (Apparently,
orthogonality is too strict a constraint).
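
As an illustration, the sketch below computes the aperiodic
autocorrelation of the length-13 Barker code.  The code sequence is
the standard one; the function name is made up.

```python
BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def autocorrelation(seq):
    """Aperiodic autocorrelation for shifts 0 .. len(seq)-1."""
    n = len(seq)
    return [sum(seq[i] * seq[i + k] for i in range(n - k)) for k in range(n)]


if __name__ == "__main__":
    print(autocorrelation(BARKER13))
```

The value at shift 0 is the sequence length, which is the processing
gain one gets over the sidelobes.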

Local refs:
Optimal Binary Sequences for Spread Spectrum Multiplexing[6].
Synthesis of Low-Peak-Factor Signals and Binary Sequences With Low Autocorrelation[7].


[1] http://en.wikipedia.org/wiki/Pulse_compression
[2] http://en.wikipedia.org/wiki/Barker_code
[3] http://en.wikipedia.org/wiki/Bit_error_rate
[4] http://books.google.be/books?id=hQ6bl3RG04sC&pg=PA290&lpg=PA290&dq=schroeder+phases&source=bl&ots=7IT7kYGcyu&sig=45_66hBv0Xxo1yw9YBlyXVU0usY&hl=nl&ei=oT5eTJ-IJIX80wSc-8XHBw&sa=X&oi=book_result&ct=result&resnum=6&ved=0CEIQ6AEwBQ#v=onepage&q=schroeder%20phases&f=false
[5] http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1054411
[6] md5://c35e337e61dfbcd980e3728768f71af4
[7] md5://9b2913693c005de2515e3efeb8548e47

Entry: Schroeder phases and quadratic residue diffusors.
Date: Sun Aug  8 07:30:28 CEST 2010

What is the idea behind a quadratic residue diffusor (QRD)?  What is
the link with Schroeder and Newman phases?

( Is there any link with the "maximal irrationality" of the golden
ratio?  Max. irr. here means the continued fraction expansion is all
ones, which means that rational approximation converges as slowly as
possible. )

In [2] it is mentioned that surprisingly, the DFT of the exponentiated
sequence (amplitude 1 and n^2 mod p used as the _phase_) has a
constant magnitude.

Why is this?
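
One way to see it, for an odd prime p: the DFT coefficient is
X_k = sum_n exp(2 pi i (n^2 - k n) / p).  Since 2 is invertible mod p
we can complete the square, n^2 - k n = (n - k/2)^2 - k^2/4 (mod p),
and shifting n by k/2 only reorders the sum.  So |X_k| equals the
magnitude of the quadratic Gauss sum, which is sqrt(p) for every k.
The sketch below checks this numerically; function names are made up.

```python
import cmath
import math


def qr_phase_sequence(p):
    """Unit-amplitude sequence with phase 2*pi*(n^2 mod p)/p."""
    return [cmath.exp(2j * cmath.pi * ((n * n) % p) / p) for n in range(p)]


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


if __name__ == "__main__":
    p = 17
    mags = [abs(X) for X in dft(qr_phase_sequence(p))]
    print(min(mags), max(mags), math.sqrt(p))
```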

[1] http://www.dpi.physik.uni-goettingen.de/~mrs/Vortraege/Remembering-the-Good-Days-at-Bell-Laboratories-2004/index.html
[2] https://ccrma.stanford.edu/~kglee/pubs/2dmesh_QRD_rev1/node2.html


Entry: Finite Field Automorphisms
Date: Sun Aug  8 08:44:55 CEST 2010

A binary LFSR sequence ultimately comes from a cycle in the
multiplicative group of a finite field GF(2^n).  This is a unique
mathematical structure with a bunch of representatives that are all
isomorphic.  So, what does the automorphism group of a finite field
look like?

One place to look for some intuition is in Gold Codes[1].  This is the
only practical application I know where two sequences with different
generator polynomials are combined to form a new sequence with useful
properties.  Interesting properties of Gold codes:

  * Low autocorrelation (this is a "meaningless" property, as they are selected

  * The XOR (+) operation is closed

    PROOF:  Say s1(n) and s2(n) are LFSR sequences of the same length.
            A Gold code is constructed as g(n) = s1(0) + s2(n).
            We then have 
              g(a) + g(b) 
            = s1(0) + s2(a) + s1(0) + s2(b)
            = s2(a) + s2(b)
            = ...  ???  (hmm... it seemed obvious - not awake yet)
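
The missing step is probably the shift-and-add property of maximal
length LFSR sequences: the XOR of such a sequence with a different
cyclic shift of itself is again a cyclic shift of the same sequence.
A sketch that checks this for the period-7 sequence of x^3 + x + 1
(the choice of polynomial, seed and function names are assumptions
made for the example):

```python
def lfsr_sequence(length=7):
    """One period of the recurrence s[n+3] = s[n+1] XOR s[n]."""
    s = [1, 0, 0]
    while len(s) < length:
        s.append(s[-2] ^ s[-3])
    return s


def shift(s, k):
    """Cyclic left shift by k positions."""
    k %= len(s)
    return s[k:] + s[:k]


def shift_and_add(s, a, b):
    """Return c such that shift(s,a) XOR shift(s,b) == shift(s,c), else None."""
    x = [u ^ v for u, v in zip(shift(s, a), shift(s, b))]
    for c in range(len(s)):
        if shift(s, c) == x:
            return c
    return None


if __name__ == "__main__":
    s = lfsr_sequence()
    print(s, shift_and_add(s, 0, 1))
```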


A link I found about finite field automorphisms[3].  Time to finally
get into Galois Theory[4].

But what do I know?  It has to do with the structure of the
multiplicative group.  I would think that the symmetry group gets
larger if the multiplicative group is very composite.

So, which GF(2^n) have prime order multiplicative groups?  Are there
primes that look like 2^n-1?  Mersenne primes[7].

But is it really primes we're looking for, not maximally composite
numbers?


[1] http://en.wikipedia.org/wiki/Gold_code
[2] http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1054048
[3] http://everything2.com/title/automorphisms+of+finite+fields
[4] http://en.wikipedia.org/wiki/Galois_theory
[5] http://en.wikipedia.org/wiki/Safe_prime
[6] http://en.wikipedia.org/wiki/Strong_prime
[7] http://en.wikipedia.org/wiki/Mersenne_prime


Entry: More PRN sequences: 1/x as fractional or p-adic number.
Date: Sun Aug  8 10:13:06 CEST 2010

What I'm looking for are binary sequences that are naturally periodic
and somehow "obvious".

An interesting place to look is the fractional digits of 1/x, where x
is some special number.  At some point the digits start repeating.

Apparently pseudo random noise generation is an application of Sophie
Germain primes[2].

A prime number p is a Sophie Germain prime if 2p + 1 is also prime.

( Instead of the fractional expansion, can we also use the p-adic
expansion? )

Constructing the bit sequence is done using long division.  The period
can be obtained from observing that in base b:

1 / (b^n - 1)  = 0.0...10...10...
                   <--->
                  n digits

For a number 1/q we find the smallest n such that

    q * x = b^n - 1

Here x is the repeated pattern in the fractional expansion of 1/q, one
period of the pseudo random number sequence we are interested in.  How
can we make sure that n is as large as possible?

In other words, find a q that maximises n, where n is the smallest
number such that q divides b^n - 1.

Rewriting it makes it a bit more clear.

          b^n = 1 + q * x   ->   b^n = 1 (mod q)

How to make n as large as possible?  Let's start with q a prime
number.  Then this becomes a statement about the multiplicative group
of the field Z/qZ, which has order q-1.  

The multiplicative group of a finite field is cyclic and commutative,
so completely determined by its order.

Now, when q is a safe prime of the form 2p + 1 (with p a Sophie
Germain prime) the multiplicative group order is 2p.  Apart from the
trivial period 1 there are 3 possible cycles in this group, with
periods 2, p, and 2p.  What needs to be verified to make sure we have
the maximal cycle length 2p is that b is a generator of this group (b
is a primitive element), meaning

        b^2 !=  1 mod q
        b^p !=  1 mod q

For b = 2 the first one is true when q > 3.  The second one does not
hold in general (for q = 7, p = 3 we get 2^3 = 1 mod 7), so it has to
be verified for the given p and b.
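
The period and the repeating digit block can be computed directly, as
in the sketch below.  Function names are made up, and both functions
assume gcd(b, q) = 1 and q > 1 (otherwise the first loop would not
terminate).

```python
def period(q, b=2):
    """Smallest n with b^n = 1 (mod q); assumes gcd(b, q) == 1, q > 1."""
    n, r = 1, b % q
    while r != 1:
        r = (r * b) % q
        n += 1
    return n


def repeating_digits(q, b=2):
    """One period of the base-b expansion of 1/q, by long division."""
    out, r = [], 1
    for _ in range(period(q, b)):
        r *= b
        out.append(r // q)
        r %= q
    return out


if __name__ == "__main__":
    for q in (7, 11, 23):
        print(q, period(q), repeating_digits(q))
```

For the safe primes 7, 11 and 23 this gives binary periods 3, 10 and
11, so only q = 11 reaches the maximal length q - 1.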


[1] http://en.wikipedia.org/wiki/Sophie_Germain_prime
[2] http://mathworld.wolfram.com/DecimalExpansion.html


Entry: LFSR beacon detection
Date: Sun Aug  8 10:35:36 CEST 2010

Actually, this might be interesting to use together with the krikit
idea.  Or how to make a variant of GPS using LFSR audio-beacons.

Problem: we have plenty of bandwidth (say 3 - 10 kHz) but not dynamic
range (environment noise, and "noisiness" of the transmitter).  We
want to transmit a low bitrate signal.

Solution: using spread-spectrum techniques we can trade bandwidth for
dynamic range.

A PSK baseband signal after heterodyne filtering by the carrier wave
is a complex phase signal.  It might have some small residual
modulation that is easy to track, so let's ignore it.

The basic idea is to use the LFSR to perform a local prediction for
the I and Q signals separately, keeping the operation itself linear.
(I.e. if more than one bit of information is available, it can be
used.)

We integrate the output of this predictor to yield a slow-varying
information carrier.  A Schmitt trigger on the output of the
integrator should be enough.


Entry: Chirps from corrugated galvanised iron (golfplaat)
Date: Sun Aug  8 11:58:43 CEST 2010

An open barn/shed with 3 walls made of corrugated galvanised iron[1],
with lines arranged vertically, produces a chirp impulse response at an
angle of 45 degrees with the walls.  Why?

[1] http://en.wikipedia.org/wiki/Corrugated_galvanised_iron


Entry: Size of GF(2^n) symmetry groups
Date: Wed Aug 11 21:38:04 CEST 2010

I think I'm stuck with a false fact in my head..  Are the number of
distinct irreducible polynomials and the size of the symmetry group of
GF(2^n) actually the same?

[1] http://oeis.org/wiki/A011260
[2] http://oeis.org/wiki/A000031


Entry: Binary Tree Permutations
Date: Sat Aug 14 14:25:38 CEST 2010

( Of relevance to understanding PF's memory model [1]. )

I'm trying to understand the group of binary tree permutations.  Let's
start with these two, intuitively basic operations:

   * swap:        a b  S  =  b a
   * rotate:  a (b c)  L  =  (a b) c
              (a b) c  R  =  a (b c)

Notation: Lower case letters or digits indicate trees, upper case
letters are tree transformations.  Group action of transformation T on
tree t is denoted as tT.  Binary trees have only one non-associative
operator: (t1 t2) is the tree made from splicing together the subtrees
t1 and t2.  The relation t1 -T-> t2 is shorthand for t1 T = t2.

Questions: 1) is this set minimal? and 2) does it generate all
permutations of binary trees?

For convenience, let's stick to permutations of infinite trees, but
have them go only finitely deep.  This means the group of
transformations is infinite in size but more "regular" than a
transformation group of finite-size trees as there are no special
border conditions.

The answer to 1) is no.  It is possible to write L = SRSRS.  Proof:

    a (b c)  -S->  (b c) a
             -R->  b (c a)
             -S->  (c a) b
             -R->  c (a b)
             -S->  (a b) c

We also can't write S in terms of L and R because both L and R leave
invariant the ordering of the leaf nodes, while S does not.

To understand more about 2) let's see what the relation is to lists.

Suppose that a binary tree can be transformed to a (left or right
leaning) list by a succession of L and R operations.  The inverse is
then also true: any list can be transformed back into an arbitrary
binary tree with the same leaf-node order using a particular
combination of L and R transformations.

So is the premise true?  Or do we need another operation?

From inspection, it seems that rotating a tree to a (right leaning)
list can be done by R operations on _subtrees_.  The question is then:
how to express R operations on subtrees?

I.e. how to express:  a ((b c) d) -> a (b (c d))

More generally, how to take a tree transformation T and re-root it at
a different base node?

Intuitively this seems to be a new operation as you "sweep something
under the carpet".

Let's try: if the basic operations of S,L,R can be constructed to
operate only on the right (or left) node of the tree, expressed in
terms of root-node S,L,R operations, then we can recursively transform
any operation.

    r (a b) -S1-> r (b a)

But I don't immediately see how to construct S1.  My first attempt brought
me to some other primitives: 3-element right-leaning rotations LS and
SR, which are each other's inverse:

    1 (2 3) -L-> (1 2) 3
            -S-> 3 (1 2)

    3 (1 2) -S-> (1 2) 3
            -R-> 1 (2 3)

The left-leaning variants are RS and SL.

    (1 2) 3 -R-> 1 (2 3)
            -S-> (2 3) 1

    (2 3) 1 -S-> 1 (2 3)
            -L-> (1 2) 3

Because they are embeddings of 3-element list rotations we have:

    (ST)^3 = (TS)^3 = I where T is L,R.
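
The identities so far can be checked mechanically.  A minimal sketch,
representing trees as nested Python tuples and applying words left to
right as in the tT notation; the helper names apply and leaves are
made up.

```python
def S(t):
    a, b = t
    return (b, a)


def L(t):
    a, (b, c) = t
    return ((a, b), c)


def R(t):
    (a, b), c = t
    return (a, (b, c))


def apply(t, word):
    """Apply a sequence of transformations left to right: t T1 T2 ..."""
    for op in word:
        t = op(t)
    return t


def leaves(t):
    """Leaf nodes in left-to-right order."""
    if isinstance(t, tuple):
        return leaves(t[0]) + leaves(t[1])
    return [t]


if __name__ == "__main__":
    t = (1, (2, 3))
    print(apply(t, [S, R, S, R, S]), L(t))
    print(apply(t, [L, S] * 3))
```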

From these basic operations I can't see how to implement the re-rooted
swap operation:

    1 (2 3) -> 1 (3 2),

which seems to be a genuine primitive transformation.

This reminds me of the hint at necessity of '>R' and 'R>' or 'dip'
when writing Forth or Joy code: you need a second stack to hide
intermediate results.  (Additionally you need a "free" stack to
implement creation and destruction of cells in a conservative way.)

Then the question might be: if we replace S, R, L by S', R', L' all
rooted at the right subtree, what can we generate?

   * swap':        r (a b)  S'  =  r (b a)
   * rotate':  r (a (b c))  L'  =  r ((a b) c)
               r ((a b) c)  R'  =  r (a (b c))
   
For sure, the group these generate is isomorphic to the other one, but
it's not a complete set as it leaves the whole left subtree invariant.

Here 'r' really resembles the Forth return stack.  The '>R' and 'R>'
operations could be

    r (a b)  -L->  (r a) b
    (r a) b  -R->  r (a b)
    
( However, in the s-expression encoding of Forth I would use the
version which builds stacks as right-leaning trees. )

Again the questions: 1) minimal set?  2) all permutations?

Approach: My hunch is that it's not necessary to provide an infinite
set of primitives: I think you can build level 2 from the connection
between 0 and 1.  Then when that works, do we have all?

...

Antti[4] points out I'm looking for Thompson's group[2].  Follow
through on [3].

[1] entry://../libprim/20100814-114746
[2] http://en.wikipedia.org/wiki/Thompson_groups
[3] http://www.math.binghamton.edu/matt/thompson/cfp.pdf
[4] http://ndirty.cute.fi/~karttu/matikka/stebrota.htm


Entry: Compensating small room low-frequency acoustic modes
Date: Mon Nov 22 11:34:52 EST 2010

Problem: I have a ringing room mode at 132Hz, right where I'm sitting.

Ethan Winer[1] mentions on Gearslutz[2] that it doesn't make much
sense to compensate for narrow (high-Q, long ringing) room modes with
EQ.

First: trying to EQ out room modes works only for a small sweet spot.
I can live with that.  But as he mentions, it doesn't remove the
ringing and it can't compensate for zeros.

[1] http://www.ethanwiner.com/acoustics.html
[2] http://www.gearslutz.com


Entry: Pink noise
Date: Tue Nov 23 18:04:42 EST 2010

Where does pink noise come from?  The wikipedia page[1] says ``There
are no simple mathematical models to create pink noise.''

In general power laws[2] indicate scale invariance.  However the 1/f
noise is in between white noise and its integral (Brownian noise).

[1] http://en.wikipedia.org/wiki/Pink_noise
[2] http://en.wikipedia.org/wiki/Scaling_law


Entry: Impedance and Composite Circuits (What is a parallel impedance geometrically?)
Date: Tue Nov 23 19:28:39 EST 2010
Type: tex

I'm trying to revive some intuition for reading analog circuits and
what strikes me now in a very clear way is that the eye to use for
electronics is not the eye for dimensionless signals but the eye for
impedances.

Dimensionless signals are nice to work with mathematically, and in
most DSP work these numbers are all you ever see.  The basic building
block is the unit--delay, or the integrator in the continuous case.
One input and one output, all very neat.

However, the low--level basic building blocks of electronics are
\emph{impedances}, relations between the voltage $V$ across and the
current $I$ through an edge representing a primitive circuit element
or a composite circuit.  For any element, the impedance is defined as
the ratio between voltage and current: $Z = V / I$.  The admittance
(conductance for a pure resistor) is the inverse $S = I / V$.


\begin {itemize}
\item There are only 3 linear circuit elements
\begin {itemize}
\item resistor   $V = R I$
\item capacitor  $C\,dV = I\,dt$
\item inductor   $L\,dI = V\,dt$
\end {itemize}
\item There are only 2 ways to compose them:
\begin {itemize}
\item parallel, summing $I$, $S$
\item series, summing $V$, $Z$
\end{itemize}
\end{itemize}


When it is understood that the currents and voltages in a circuit have
a sinusoidal form, impedances can be represented by complex numbers,
which are 2D vectors with added multiplication and division (inverse).
This is because the primitive impedances behave as integrators or
differentiators and thus leave the sinusoidal shape intact, changing
only phase and amplitude.

The interesting question is then: while investigating a circuit, is it
simpler to switch between variables $V$ and $I$, or should one build
an intuition for the transformed summing relation
(invert--sum--invert)

    $$A \| B = (A^{-1} + B^{-1})^{-1}$$

One way to try to go the second route is to see how this works
geometrically.  It's simplest to take one of the two values as a
reference point, working with $C = B/A$.


    $$1 \| C = (1 + C^{-1})^{-1}$$

The inverse of a complex number $C$ is expressed by
$\frac{\bar{C}}{\|C\|^2}$.  I've tried these on paper: conjugate (flip
around the real axis), invert (magnitude), add, conjugate, invert.

This doesn't give me much understanding as the addition is still done
in the other domain, so I move from impedance to conductance or vice
versa.  However the conjugates are not necessary as they cancel out so
simple magnitude inversion is enough.

In first approximation the resultant resembles the smaller of the
two.  In second approximation its phase lies in between the two and,
when the two are less than 90 degrees apart in phase, its magnitude is
smaller than both.
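
A small worked case (a sketch, writing $j$ for the imaginary unit,
which the text above does not name): since $1 \| C = (1 +
C^{-1})^{-1} = \frac{C}{1+C}$, taking $C = j$ gives $$1 \| j =
\frac{j}{1+j} = \frac{1+j}{2},$$ with magnitude $1/\sqrt{2} \approx
0.71$ and phase $45^\circ$.  The magnitude is below that of both $1$
and $j$, and the phase sits halfway between $0^\circ$ and $90^\circ$.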


Entry: Quantization and number theory
Date: Mon Dec 27 11:05:15 EST 2010

This subject keeps coming back really.  Context: Digital control of
analog synthesizers.

This pops up in an electronics project I'm working on.  The idea is to
build a multi-channel DAC for control signals (say up to 200Hz), using
a cheap microcontroller.

The main design idea is that approximation error is OK, as long as it
averages to 0 and it doesn't correlate with the signal or with itself.
This means ordinary PWM doesn't really cut it as it has very strong
periodic content.  Dithered Sigma-Delta seems to be the proper tool.
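
A minimal sketch of a first-order sigma-delta modulator with threshold
dither, to make the "error averages to 0" idea concrete.  The function
name, the dither distribution and the seed are assumptions for
illustration, not a description of the actual firmware.

```python
import random


def sigma_delta(x, n, dither=0.5, seed=1):
    """Return n output bits whose running average tracks x in [0, 1].

    The accumulator holds the integrated error between input and
    output; uniform dither on the comparator threshold breaks up
    periodic bit patterns.
    """
    rng = random.Random(seed)
    acc = 0.0
    bits = []
    for _ in range(n):
        acc += x                              # integrate the input
        d = rng.uniform(-dither, dither)      # threshold dither
        bit = 1 if acc + d >= 0.5 else 0      # 1-bit quantizer
        acc -= bit                            # feed back the output
        bits.append(bit)
    return bits


if __name__ == "__main__":
    bits = sigma_delta(0.3, 1000)
    print(sum(bits) / len(bits))
```

Because the accumulator stays bounded, the average of the bit stream
differs from the input by at most about 1/n.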


Next to digital control, I'd also like to explore pure digital
combination of binary modulated signals using logic gates.  See
previous entries, i.e. [5].  The key idea here is to make sure there
is no correlation.  However some remark in [4], section "Multiplying 2
1-bit signals and getting a noise-shaped result" makes me think I'm
missing some key point when multiplying: high frequency modulation
noise being modulated down.

Recently I found a link to a seemingly interesting book [1] about
Analytic Number Theory here [2], and a 2005 workshop[3][4][6] dealing
with the subject and some more from Robert Adams[7].  The last one
probably deserves a separate post[8].

[1] http://148.202.11.158/ebooks/mathbooks/Number%20theory/Analytic%20Number%20Theory%20-%20Newman%20D.J..pdf
[2] http://rjlipton.wordpress.com/2010/12/26/unexpected-connections-in-mathematics/
[3] http://www.cscamm.umd.edu/programs/ocq05/
[4] http://www.cscamm.umd.edu/programs/ocq05/adams/adams_ocq05.htm
[5] entry://20100218-104438
[6] http://www.cscamm.umd.edu/programs/ocq05/wolfe_ocq05.htm
[7] http://www.netsoc.tcd.ie/~fastnet/cd_paper/ICASSP/ICASSP_2005/pdfs/0400077.pdf


Entry: A Signal-processing interpretation of the Riemann Zeta Function
Date: Mon Dec 27 11:33:06 EST 2010

Just a link[1].  Interesting stuff.  Basic idea: a log delay network
(LDN) is a system with delays at logarithm of the natural numbers.
These networks have closed composition (LDN . LDN \in LDN).

This is then used to construct an infinite series network which has
supposedly provable stability, from which we can conclude that the
nontrivial zeros of the eta/zeta function have to be on the critical
line.  So where is the flaw that prevents this from being used as a
proof of the Riemann Hypothesis?

[1] http://www.netsoc.tcd.ie/~fastnet/cd_paper/ICASSP/ICASSP_2005/pdfs/0400077.pdf
[2] http://www.cscamm.umd.edu/programs/ocq05/adams/adams_ocq05.htm
[3] md5://31aab89af3f503238b57126a5d268639


Entry: Music: scales and intervals: almost equal composite numbers
Date: Sun Jan  2 13:05:26 EST 2011

The fact that there are different tuning systems for the chromatic
scale indicates that there is something quite wrong here.  The
chromatic scale is a happy accident because our brain glosses over the
differences between intervals that are close but quite different.

There are many examples.  Take the minor 7th[1] for example.  It
corresponds to 

   16 / 9  = 1.777...  
    9 / 5  = 1.800...

Both harmonic intervals have nothing to do with each other, but
melodically they are very close, about 1%.  (A semitone is about
6%).

It seems that music relies on this kind of gloss-over substitution
when tones take on different relations in chords and melodies.  I've
always found this to be one of the weirdest things about the theory
behind music.  I've never seen it being mentioned so explicitly
either, only in the context of tuning systems where it is mentioned as
a nuisance.  I think Bach even called it a Divine Joke or something
(TODO: find quote).

In short, it seems as if we can "teleport" between ratios that are
related by near-1 fractions.

I.e. for the minor 7th above we have a difference of

  16 * 5 / 9^2 =  80 / 81

In general this phenomenon is recognized as a comma[3].  The ratio
above is called the syntonic comma[2].
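
The arithmetic can be checked with exact fractions.  The sketch below
also does a crude search for small 5-limit commas by octave-reducing
3^b * 5^c; the search bounds and function names are arbitrary choices
made for illustration.

```python
from fractions import Fraction
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 per octave)."""
    return 1200 * math.log2(ratio)


def octave_reduce(r):
    """Bring a ratio into the range [1, 2) by multiplying with powers of 2."""
    while r >= 2:
        r /= 2
    while r < 1:
        r *= 2
    return r


def small_commas(max_cents=30, max_exp=4):
    """Ratios 2^a 3^b 5^c slightly above 1, for |b|, |c| <= max_exp."""
    found = set()
    for b in range(-max_exp, max_exp + 1):
        for c in range(-max_exp, max_exp + 1):
            r = octave_reduce(Fraction(3) ** b * Fraction(5) ** c)
            if r != 1 and cents(r) < max_cents:
                found.add(r)
    return sorted(found)


if __name__ == "__main__":
    print(Fraction(16, 9) / Fraction(9, 5))
    print([(str(r), round(cents(r), 1)) for r in small_commas()])
```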

So first, how to enumerate the commas? Since the intervals in music
all use limited prime ratios (i.e. up to 7), and a limited number of
octaves, there is a limit to the amount of mistune one can get.

And second, I wonder if this can be turned around.  Given a chord
written in "human" chromatic notation glossing over commas, is it
possible to "distill" its different harmonic meanings by finding
intervals that match in certain ways?

I.e.:
    16 / 9 = (4 / 3)^2   : two perfect fourths
    9 / 5  = ...


[1] http://en.wikipedia.org/wiki/Minor_seventh
[2] http://en.wikipedia.org/wiki/Syntonic_comma
[3] http://en.wikipedia.org/wiki/Comma_%28music%29


Entry: Zeta function
Date: Sun Jan  2 18:10:24 EST 2011
Type: tex

I've always found the multiplicative form of the zeta function
$\zeta(s)$ quite intriguing, even if the explanation is
straightforward\footnote{For the real proof: I forgot what the
conditions are for rearranging terms in infinite sums. I believe it's
allowed for absolutely convergent series, which is the case here since
all terms are positive.  Note that the region of convergence is
$\text{Re}(s) > 1$, so any statements made about this expansion and
its multiplicative form are only valid for that particular region!}.
With $N$ the naturals and $P$ the primes we have $$\zeta(s) = 1 +
2^{-s} + 3^{-s} + \ldots = \sum_{n \in N} n^{-s} = \prod_{p \in P}
\frac{1}{1 - p^{-s}}.$$ This is because $$\frac{1}{1 - p^{-s}} = 1 +
p^{-s} + p^{-2s} + \dots$$ and if you multiply them out, each natural
number will be due to exactly one combination of the terms of the
different factors of the product.  This is an explicit way of saying
that each natural number can be written in exactly one way as a
product of prime powers.
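
As a small worked instance, multiplying out only the factors for $p =
2$ and $p = 3$ gives exactly the terms whose index has no other prime
factors: $$\frac{1}{1 - 2^{-s}} \cdot \frac{1}{1 - 3^{-s}} =
\sum_{a,b \geq 0} (2^a 3^b)^{-s} = 1 + 2^{-s} + 3^{-s} + 4^{-s} +
6^{-s} + 8^{-s} + 9^{-s} + 12^{-s} + \ldots$$ Each further prime
factor fills in more of the missing natural numbers.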

The interesting trick to note here is that operations on infinite sums
are used to represent \emph{quantification} over a domain, i.e. for
all integers.

%[1] http://en.wikipedia.org/wiki/Riemann_zeta_function


Entry: Encoding pairs of numbers
Date: Sun Jan  2 18:32:00 EST 2011
Type: tex

To encode a pair of two integers, construct a discrete space filling
curve.  Let's take a look at the single quadrant triangle constructed
from successive antidiagonals.
$$
\begin{array}{cccc}
    0 & 1 & 3 & 6 \\
    2 & 4 & 7 \\
    5 & 8 & \\
    9 & \\
\end{array}
$$

The top row contains the partial sums of the natural numbers:
$n(n+1)/2$.  If $(x,y)$ is the pair to be encoded, you first find
the diagonal it is associated with by picking its top row element
$(x+y,0)$, and then move down to the desired spot, so the encoding is
$$n = y + \frac{1}{2}(x+y)(x+y+1).$$

Given $n$, the inverse can be computed by solving a quadratic equation
$$\frac{1}{2}s(s+1) = n$$ for $s$ and rounding down to the nearest
integer $s_r$.  This is the first coordinate of the top row element of
the diagonal the element is on.  We then just need to move down the
diagonal by $y = n - \frac{1}{2}s_r(s_r+1)$, and so $x = s_r - y$.
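
As a worked example (a sketch, with rows and columns counted from $0$;
the closed form for $s_r$ is just the positive root of the quadratic,
rounded down), take $(x,y) = (2,1)$:
$$n = 1 + \frac{1}{2} \cdot 3 \cdot 4 = 7,$$
which matches the table entry in row $1$, column $2$.  Going back from
$n = 7$:
$$s_r = \left\lfloor \frac{\sqrt{8n+1} - 1}{2} \right\rfloor =
\left\lfloor \frac{\sqrt{57} - 1}{2} \right\rfloor = 3.$$
The top row value of that diagonal is $\frac{1}{2} \cdot 3 \cdot 4 =
6$, so $y = 7 - 6 = 1$ and $x = 3 - 1 = 2$.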


Entry: Encoding finite sequences of numbers
Date: Sun Jan  2 18:58:23 EST 2011
Type: tex

Primes can be used to represent a finite sequence of integers of
arbitrary length as a single number.  For a sequence $x_1,\ldots,x_n$,
construct the number $\prod_{k=1}^n p_k^{x_k}$ where $p_k$ is the
$k$th prime number.  The operation can be reversed by factoring the
number.  Is there a simpler way?
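
For example (a sketch, assuming non-negative entries), the sequence
$(2,0,1)$ becomes
$$2^2 \cdot 3^0 \cdot 5^1 = 20,$$
and factoring $20 = 2^2 \cdot 5$ gives back the exponents $2, 0, 1$.
Note that trailing zeros are lost this way: $(2,0,1)$ and $(2,0,1,0)$
both map to $20$, so the length has to be known or encoded separately.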


Entry: The 7 divisions of the string.
Date: Sun Jan  2 20:48:09 EST 2011

From Antti[1].

This is a simple algorithm for constructing musical interval ratios,
starting with small numbers.  We stop when the denominator corresponds
to a regular polygon not constructable by ruler and compass.

Now that sounds quite magical, but what does it mean?  I think the
non-constructable polygons (by ruler and compass) are the same as the
n-th order polynomials factorizable by radicals.  This problem is
really about symmetry / discrete groups.

The question is then: is this a coincidence, i.e. simply a question of
where the brain decides ``well, this is enough beating to be called
stable sound'' or does it really have something to do with the
symmetry behind the non-constructable polygons.

[1] http://ndirty.cute.fi/~karttu/Kepler/a086592.htm
[2] http://ndirty.cute.fi/~karttu/Kepler/HarMun.txt


Entry: Tim Stilson PhD Introduction
Date: Tue Jan  4 04:55:22 EST 2011

Looks like he finished his PhD in 2006[1].

His work is mostly not about oversampling and non-linearities, but
about non-oversampled techniques.  ( Though I doubt BLIT and BLEP can
be considered non-oversampled, as they are essentially oversampled
lowpass. )

Anyways, in the section about discretization of continuous filters he
mentions the Delta Operator on page 33, which also appeared in the
book "Finite Difference Equations"[2], the first part of which I read
recently.  The idea is to use an integrator/differentiator analogy to
describe discrete systems instead of the unit delay, as this is more
elegant and more closely resembles the analog case.  This then gives
the difference calculus, a set of rules to manipulate formulas with
deltas, as laid out in [2].

The advantage of this encoding is that it isn't plagued by numerical
issues for highly oversampled systems, i.e. Sigma-Delta modulators.

There should also be a link to automatic differentiation somewhere...
I.e. using memoized / butterfly-style networks instead of
multiplied-out direct formula resembling "convoluted" bell/binomial
shaped coefficients as Tim mentions.

I need to explore other references too..  Goldmine.

Dana Massie: EQ, Harvey Thornburg: NL moog.  Antti Huovilainen: moog
circuit model.


[1] https://ccrma.stanford.edu/~stilti/papers/Welcome.html
[2] isbn://0486672603


Entry: Difference Calculus
Date: Tue Jan  4 11:09:54 EST 2011

Following up on the references from Tim Stilson's PhD thesis[2].
Already have the book "Finite Difference Equations"[1] from 1961 by
Levy and Lessman.  ( I found the original in a used book store without
ISBN, [1] is the Dover reissue. )

The references Tim mentions are [3][4][5][6][7][8] (books and
paywalled papers).  I found one accessible paper on the subject[9].

For x[k] a real valued sequence |N -> |R, we define the operator z:

      z x[k] = x[k+1]

Then we can define d = z - 1 or

      d x[k] = x[k+1] - x[k]

These are normalized equations with time step equal to 1.  Naively
relating d to the s-plane requires s/f_s normalization.

Before getting into cargo cult DSP math, one needs to keep in mind that:

  * The d and z transforms of a discrete filter contain the same
    information: they are merely described in terms of different
    primitives: the unit integrator (accumulator) vs. the unit delay.

  * In the limit case of infinite sampling rate filter coefficients of
    d-plane and s-plane coincide.  However, naively converting between
    the discrete and continuous case for finite sample rates is still
    wrong!

    In fact, naive conversion is the forward difference transform which
    maps s -> z - 1, and which has known stability issues for
    coefficients away from z = 1.

More generally, all the caveats concerning traditional filter
discretizations still apply.  One can derive reformulated versions of
the well known transforms: forward/backward difference, bilinear,
impulse response invariance (pole mapping) and pole-zero mapping.
These then look like naive (f_s scaled) direct s -> d mappings + some
correction terms O(s / f_s).

The main use seems to be one of implementation: coefficient
sensitivity will be less in d-plane formulation for low-frequency
poles and zeros.

Of course one can start by simply substituting z = d + 1 in a fully
expanded z-plane formulation, but that is not very insightful.
Instead we do the following:

  * Factor the z-plane formulation first as a product of first order
    terms (z-z_k), including complementary complex ones.  Performing
    the substitution in this form yields terms (d + 1 - z_k).  We name
    d_k = 1 - z_k.  For low-frequency phenomena it happens that the
    z_k are all close to one meaning the d_k are small in magnitude.

    So essentially, a formulation in d-plane form eliminates a
    subtraction that is the cause of loss of precision in the z-plane
    formulation.

  * 1st order IIR sections in the form of b/(d-a) can be implemented
    directly using leaky integrators[11].  A state variable 2nd order
    section can be used to implement the complex conjugate 1st order
    sections.
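
The precision argument in the first point can be illustrated with a
small sketch.  The value 1e-9 and the helper to_f32 are made up for
the illustration; single precision is emulated with the struct module.
Storing z_k close to one loses the small offset entirely, while
storing d_k directly keeps it to full relative precision.

```python
import struct

def to_f32(x):
    """Round a Python float to the nearest IEEE-754 single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

d_true = 1e-9                      # hypothetical low-frequency offset d_k = 1 - z_k

# z-plane: store z_k in single precision, recover the offset by subtraction.
z_stored = to_f32(1.0 - d_true)
d_from_z = 1.0 - z_stored

# d-plane: store d_k itself in single precision.
d_stored = to_f32(d_true)
rel_err = abs(d_stored - d_true) / d_true

print(d_from_z)   # 0.0, z_k was rounded to exactly 1
print(rel_err)    # small, on the order of single precision rounding
```

The spacing of single precision numbers just below 1 is about 6e-8,
so any offset much smaller than that vanishes in the z_k
representation.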


For IIR filter design (when everything is analytic) there is little
practical difference between z-plane and d-plane approach: simply
transform (analytically) when you're done.

For FIR filter design which is mostly numerical in nature, the d-plane
approach seems also useful for avoiding numerical instabilities.
I.e. to formulate the optimization problem in such a way that the FIR
filter appears in factored form, instead of multiplied out.

( From my own experience I've found that FIR filter design problems
  involving low frequencies are seriously plagued by convergence
  issues due to insufficient numerical precision.  I recall that using
  a spectral factorization method to design an FFT window filter I
  would obtain results that were clearly wrong after visual
  inspection, i.e. one coefficient value "sticking out" while there
  should be overall symmetry.  It would surprise me if this hasn't
  been realized long before.  Look into ladder FIR filters.  )


[1] isbn://0486672603
[2] https://ccrma.stanford.edu/~stilti/papers/Welcome.html
[3] http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1104162
[4] http://md1.csa.com/partners/viewrecord.php?requester=gs&collection=TRD&recid=2270973CI&q=&uid=790228084&setcookie=yes
[5] isbn://9780817639341
[6] http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=0123294
[7] http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=73564
[8] http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=219661
[9] http://sites.google.com/site/jamesgibsonhomepage/projects/DeltaOperatorCaseStudy.pdf
[11] entry://20110104-124734

Entry: Practical d-plane formulations
Date: Tue Jan  4 12:47:34 EST 2011

As a continuation of [1].

Given a d-plane formulation, how to implement it?

It's simplest to start from the leaky discrete integrator and see what
its d-plane formulation looks like.  The time domain formulation

   y[k+1] - y[k]  =  a y[k] + b x[k]
   y[k+1]        +=  ...

expresses the update of y in terms of the weighted magnitude of y and
x.  It corresponds to the frequency domain formulation

   d y  =  a y  + b x

   (d - a) y = b x

   y / x = b / (d - a)

The non-leaky integrator is then b / d.
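
A minimal sketch of this update in Python.  The function name and the
coefficient values are only illustrative.  It shows that the step
response settles at b / (0 - a), the transfer function evaluated at
d = 0, and that a = 0 gives a plain accumulator.

```python
def leaky_integrator(xs, a, b, y0=0.0):
    """Run y[k+1] = y[k] + a*y[k] + b*x[k] and return [y[0], ..., y[len(xs)]]."""
    ys = [y0]
    y = y0
    for x in xs:
        y += a * y + b * x      # the "d y = a y + b x" update
        ys.append(y)
    return ys

# Step response with hypothetical coefficients a = -0.01, b = 0.01.
step = leaky_integrator([1.0] * 5000, a=-0.01, b=0.01)
dc_gain = step[-1]              # approaches b / (0 - a) = 1

# Non-leaky case a = 0: a running sum scaled by b.
ramp = leaky_integrator([1.0] * 3, a=0.0, b=1.0)
```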

Complex poles are handled in exactly the same way: we take the complex
form of one of the complementary pairs, feed it a real input and
take only the real part of the integrator as output.
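
A sketch of that idea, with made-up values for the complex pole a and
gain b.  It runs the same update with complex state and checks
numerically that the real part of the output obeys the real 2nd order
recurrence belonging to the conjugate pole pair p = 1 + a and its
conjugate.  This is only a sanity check of the claim, not a tuned
implementation.

```python
def complex_one_pole(xs, a, b):
    """Complex leaky integrator y += a*y + b*x; returns Re(y) after each update."""
    y = 0j
    out = []
    for x in xs:
        y += a * y + b * x
        out.append(y.real)
    return out

a = complex(-0.02, 0.1)    # hypothetical d-plane pole
b = complex(1.0, 0.5)      # hypothetical complex gain

# Impulse response of the real-part output.
h = complex_one_pole([1.0] + [0.0] * 49, a, b)

# Real recurrence for the pole pair p, conj(p): h[k+2] = c1*h[k+1] + c2*h[k].
p = 1 + a
c1 = 2 * p.real
c2 = -abs(p) ** 2
residual = max(abs(h[k + 2] - (c1 * h[k + 1] + c2 * h[k])) for k in range(48))
```

Taking the real part amounts to averaging the section with its complex
conjugate, which is why a real 2nd order section falls out.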

TODO:

  - Make the 2nd order implementation explicit.  Handwaving might be
    wrong.

  - What about d-zeros?


[1] entry://20110104-110954

Entry: Lattice and Ladder filters
Date: Tue Jan  4 14:02:24 EST 2011

Let's get some terminology ironed out.  Mostly in relation to [2][3].

Structurally, a d-plane formulation can be associated to a
ladder/lattice topology, and their good numeric properties are well
known.  They also have interesting mathematical properties (see linear
prediction and orthogonal functions on the unit circle [10], and
further work in generalized Schur and displacement rank algorithms for
Toeplitz-structured matrix problems.).


[1] http://en.wikipedia.org/wiki/Lattice_filter
[2] entry://20110104-110954
[3] entry://20110104-124734
[10] http://www.emis.de/journals/BBMS/Bulletin/bul941/BULTHEEL.PDF


Entry: Nonlinear digital Moog VCF + bandlimited sawtooth.
Date: Tue Jan  4 15:31:08 EST 2011

Also mentioned in Stilson PhD[2] is Antti Huovilainen: Nonlinear
digital implementation of the Moog ladder filter[1][3] ([5][4]).
Interesting derivation, but not such a surprising result.

However, the piecewise parabola is a nice trick!  So I wonder if this
can be taken further one order by representing the remaining "bounce"
discontinuity by a 2nd order "brake" discontinuity and differentiating
twice, i.e. by using piecewise 3rd order polynomials:

   (x-1) x (x+1)

Differentiating once gives the piecewise parabola, differentiating
again gives the sawtooth.  The 3rd order polynomial's discontinuity is
smoother so should roll off faster (18dB/octave).

It shouldn't be too hard to implement in Pd either..
Indeed:  differentiated parabolic   (DPW)  [6]
         twice differentiated cubic (2DCW) [7]

Looks like the obvious extensions are already published[8].

  V. Välimäki, J. Nam, J. O. Smith, and J. S. Abel, “Alias-suppressed
  oscillators based on differentiated polynomial waveforms,” IEEE
  Trans. Audio, Speech, Language Processing, vol. 18, no. 4,
  pp. 786–798, May 2010.
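
A minimal sketch of the differentiated parabola idea in Python (not
the Pd abstractions referenced above; the function name and parameter
values are made up).  The trivial sawtooth is squared, differenced,
and rescaled by f_s / (4 f_0), which undoes the gain of the first
difference.

```python
def dpw_saw(f0, fs, n):
    """Differentiated parabolic wave: returns (trivial_saw, dpw_output), n samples each."""
    scale = fs / (4.0 * f0)                      # compensates the first difference gain
    saw, out = [], []
    prev_sq = None
    for k in range(n):
        s = 2.0 * ((k * f0 / fs) % 1.0) - 1.0    # trivial (aliasing) sawtooth in [-1, 1)
        sq = s * s                               # piecewise parabola
        out.append(0.0 if prev_sq is None else scale * (sq - prev_sq))
        saw.append(s)
        prev_sq = sq
    return saw, out

saw, out = dpw_saw(f0=441.0, fs=44100.0, n=1000)
half_step = 441.0 / 44100.0                      # half the per-sample saw increment
```

Away from the wrap-around points the output equals the trivial
sawtooth shifted down by half a sample step, i.e. the same ramp delayed
by half a sample.  Only the samples at the discontinuity differ, and
that is where the alias reduction comes from.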

[1] http://dafx04.na.infn.it/WebProc/Proc/P_061.pdf
[2] https://ccrma.stanford.edu/~stilti/papers/Welcome.html
[3] http://www.mitpressjournals.org/doi/abs/10.1162/comj.2006.30.2.19
[4] md5://81ef26b98b858bfc7ea351850b7f8872
[5] md5://ec26bdee832237793c6de78875942a60
[6] http://zwizwa.be/darcs/pd/abstractions/saw2~.pd
[7] http://zwizwa.be/darcs/pd/abstractions/saw3~.pd
[8] http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F10376%2F5446581%2F05153306.pdf%3Farnumber%3D5153306&authDecision=-203

Entry: 2nd order 4-multiply ladder EQ
Date: Wed Jan  5 07:06:37 EST 2011

Also mentioned in Stilson PhD[2] is Dana Massie's work on 2nd order
ladder EQ with separate frequency and Q controls[1].

[1] http://www.aes.org/e-lib/browse.cfm?elib=6994
[2] https://ccrma.stanford.edu/~stilti/papers/Welcome.html


Entry: Model based control
Date: Thu Jan  6 13:06:47 EST 2011

I think I just re-invented MBC:

  - Given a (non-linear) fixed model of the control->actuation
    transfer, perform feedforward or feedback control updates at high
    frequency.

  - Update model parameters at lower frequencies.


Entry: Delta Operator
Date: Thu Jan  6 14:28:24 EST 2011

Called divided difference as it's defined as:

       d x[k]   =   (x[k+1] - x[k]) / T

I.e. the differences are taken relative to the sampling period T.

In most cases where T is fixed this can be normalized to T=1 by moving
the scaling to the coefficient end, i.e. coefficients will become
small when T approaches 0.

However, in the exposition of [1] explicit mention of T is desired to
be able to take the limiting case of T->0 to make the bridge to the
continuous case, and the cases with different T.
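
A small numeric sketch of that limit, using a made-up test signal
x(t) = exp(lam t) sampled with period T.  The divided difference
applied to the samples gives (exp(lam T) - 1) / T times the signal,
which tends to the derivative factor lam as T goes to 0.

```python
import math

def delta(x, k, T):
    """Divided difference (x[k+1] - x[k]) / T for a sequence given as a function of k."""
    return (x(k + 1) - x(k)) / T

lam = -3.0                          # hypothetical continuous-time pole

def delta_ratio(T):
    """Return (d x)[0] / x[0] for x[k] = exp(lam * k * T)."""
    x = lambda k: math.exp(lam * k * T)
    return delta(x, 0, T) / x(0)

for T in (0.1, 0.01, 0.001, 0.0001):
    print(T, delta_ratio(T))        # approaches lam = -3 as T shrinks
```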

See also fixed step integration[2], Time Scale Calculus[3].

[1] md5://3e41deebd80a5b988de09b645c036e57
[2] http://en.wikipedia.org/wiki/Eulers_method
[3] http://en.wikipedia.org/wiki/Time_scale_calculus


Entry: Matrix multiplication is quadratic?
Date: Mon Jan 10 00:10:48 EST 2011

Basic idea: Matrix multiplication can be related to group
multiplication, which can then be implemented by the Fourier
transform.

This is freaky stuff.  I think this paper[1] applies to finite field
matrices though.  Does this also work for the reals, or in
approximation for floating point numbers?

Does it mean that in general, _all_ matrix multiplication problems can
be expressed as convolution problems?  I.e. the generalized Levinson /
Schur style algorithms already exploit this in some sense..  Are all
matrices "structured" ?

[1] http://arxiv.org/abs/math.GR/0511460


Entry: Dithering and "FAT" saw stacks
Date: Mon Jan 10 11:43:47 EST 2011

The idea: a stack of sawtooth oscillators sounds "FAT" when it's from
an analog synth.  I don't have one on hand here, but is this due to
subtle frequency changes?

Using the parabolic/cubic sawtooth generator in Pd I get a rich sound,
but the very repetitive intermodulation is a bit disturbing.

( Problem however, I do not know if I'm just paying attention to
something I wouldn't normally notice.  Unfortunately I do not have an
analog sawtooth oscillator handy. )

However, the idea in general is quite interesting.

This is related to:
  - dithering in sigma-delta
  - peak limiting in spread-spectrum pulses and ADSL[1]
  - eliminating room resonance modes in acoustics (flattening)
  - same for artificial reverberators (i.e. FDN)

Ultimately this is about integers, i.e. "almost equal" numbers.

But maybe that's not what I'm looking for.  Could this be an issue of
coupling and chaos?  I.e. if multiple saws go through a discontinuity
close together, could there be some secondary cause for them to spread
apart in frequency?  I.e. modulations in a string do just that: peaks
increase the tension and thus temporarily disturb the linear order.

[1] md5://9b2913693c005de2515e3efeb8548e47


Entry: Z-transform and generating functions
Date: Tue Jan 11 15:58:12 EST 2011

Who chose that damn negative exponent in the Z-transform[1]?

For expressing difference and update equations in state space form it
is simpler to have "z" mean _index increment_ or feedforward in time,
and "d" defined as d = z - 1.  This makes update equations and
difference equations directly expressible:

     x_{k+1} = a x_k         =>     z x = a x

     x_{k+1} - x_k = b x_k   =>     d x = b x

In state space form, the relation between the z and d formulation is
also pretty clear.  For systems where the dynamics is slow compared to
the sampling rate, the state space formulation in difference form also
makes more sense: it has better numerical stability and corresponds to
fixed-step[3] integration of a continuous differential equation.

Flipping the exponent around makes the Z-transform equal to the
relation that expresses a generating function[2].  The inverse
Z-transform then corresponds to the line integral expression for power
series coefficients in the expansion of an analytical function around
z=0.

So, picking a non-(engineering-)standard notation means we need to
take care when just copying expressions out of books and papers.
However from context it is usually quite clear which is which.  The
most important differences for the positive-exponent Z-transform are:

  * Stability: convergence of a power series of an analytical function
    (representing a signal or transfer function) works for |z| < R
    where R is the location of the pole closest to 0.

  * Stable transfer functions need to have all poles _outside_ the
    unit circle.

This is just the reverse of the situation for the standard
negative-exponent Z-transform.

Of course, a similar point could be made for the Laplace transform
(LT).  Why does it have a negative exponent?  Same for the Fourier
transform (FT).  For the FT it seems to make sense to me, as the
reconstruction is in terms of positive exponents, which somehow has
a more intuitive appeal.  Anyways, it really doesn't matter that much
-- just convention and something to nag about...

[1] http://en.wikipedia.org/wiki/Z_transform
[2] http://en.wikipedia.org/wiki/Generating_function
[3] http://en.wikipedia.org/wiki/Euler_method


Entry: Factored FIR
Date: Fri Jan 14 15:44:14 EST 2011

Q: Are there any FIR filter design algorithms that explicitly rely on
   the equation to be in factored-out form, i.e. as a product of 2nd
   order sections?

I suppose this introduces the problem of having to specify whether FIR
zeros are complex or not.  I seem to remember them mostly being
complex, except for odd orders where there would be one real zero.


Q: Is there a reason why FIR filters have "mostly complex" zeros?  

Is this equivalent to asking why their inverses the IIR all-pole
filters have "mostly complex" poles?  (Because that gives more
locality.)


Entry: Initial Algebra / Final Algebra
Date: Fri Jan 21 12:01:08 EST 2011

This terminology is used in [1].  The concrete interpretation for me
is that "final" means modeling in terms of functions while "initial"
means modeling in terms of abstract data types.  Other than that the
terminology is obscure to me.

Googling brings me to this[2].  So it's in the field of semantics.


[1] http://www.cs.rutgers.edu/~ccshan/tagless/jfp.pdf
[2] http://homepages.feis.herts.ac.uk/~comqejb/algspec/node12.html


Entry: Squirrels & robot control
Date: Sun Feb 20 12:38:52 EST 2011

Squirrels use very fast balancing corrections.  Why is this?  Some
possibly relevant issues:

  - Fast corrections (high acceleration) are relatively independent of
    gravity.  I.e. this is a "pulse compensation" as compared to a
    continuous control process.

  - They have little mass m, which means that they don't need much
    force F to get to high acceleration a = F/m.

  - Fast pulse compensation is also present in eye movements.
    Apparently for the eye, tracking is more difficult than focusing
    on the same point for a longer period of time.


Entry: Fuzzy class D multi channel amp
Date: Wed Mar  2 12:33:07 EST 2011

Problem 1: I want a multi-output class D amp for synth control
signals.  BW = 200Hz is a must, anything higher is welcome.

Problem 2: I want to use a cheap, off-the-shelf part (a PIC uC).

In general, you'd want to solve the problem in hardware (i.e. PLC or
FPGA) for one channel independently, and then duplicate it n times.

However, the rates involved in my problem are quite low, so it's
probably possible to use a cheaper uC-based solution.  How to go about
making that optimal?

To do it on a PIC requires some cleverness, as flipping multiple bits
at the exact calculated time will not work well because the PIC needs
to do the necessary computations serially.  

Taking a strict time multiplexing approach is probably not optimal, as
in most cases no action needs to be taken, i.e. no toggle performed,
so we might as well not have done the computation in the first place.

So I wonder how to go about trying not to perform unnecessary work and
using a more optimal multiplexing scheme.  I.e. something like:

  - find a fast way to guess the next output that needs a toggle

  - verify the guess by performing a computation

  - perform the toggle (or not!)

  - record the actual time of the toggle so it can be incorporated in
    the next control computation


The hard problem seems to be the first one: how to build a system that
knows which current error is the largest, so requires immediate toggle
action?  If the guesses are accurate, there won't be many computations
that are for nothing, i.e. that decide to not toggle because it's too
soon.

I'd say this is a data structure problem.  Find a way to keep a sorted
list of current errors.  Then simply pick the top one to update.
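
A rough sketch of such a structure in Python, using a binary heap
instead of a fully sorted list.  The class and method names are
hypothetical, and it glosses over the fact that the errors of channels
that are not serviced also keep drifting, so the stored keys go stale
between updates.

```python
import heapq

class ErrorQueue:
    """Max-priority queue of (|error|, channel) pairs built on heapq (a min-heap)."""

    def __init__(self, errors):
        # Negate the magnitude so the largest error sits at the top of the min-heap.
        self.heap = [(-abs(e), ch) for ch, e in enumerate(errors)]
        heapq.heapify(self.heap)

    def pop_worst(self):
        """Remove and return (channel, |error|) for the largest current error."""
        neg, ch = heapq.heappop(self.heap)
        return ch, -neg

    def push(self, ch, error):
        """Re-insert a channel with its updated error after servicing it."""
        heapq.heappush(self.heap, (-abs(error), ch))

q = ErrorQueue([0.1, -0.7, 0.3, 0.05])   # hypothetical per-channel errors
ch, err = q.pop_worst()                  # channel 1 has the largest magnitude
q.push(ch, 0.02)                         # after the toggle its error is small again
next_ch, next_err = q.pop_worst()        # channel 2 is next in line
```

Both operations cost O(log n) in the number of channels.  On a PIC with
a handful of channels a linear scan might be just as good.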


Entry: Sound impedance
Date: Tue Mar 15 23:49:18 EDT 2011

If I get this right: speakers are cone shaped.  

At the narrow end, velocity is high but pressure is low.  At the wide
end it's reversed.  Widening enables the same amount of mass to move
with less velocity.  WRONG!

According to [1] it's just the other way around!

I think it's time for me to pick up Feynman's lectures again..

[1] http://en.wikipedia.org/wiki/Horn_loudspeaker


Entry: I don't know sound:  1. Pressure
Date: Wed Mar 16 00:04:01 EDT 2011

I think I don't understand pressure.  It's a statistical measure that
represents force per unit area in a collection of moving point-masses.

Note that it is really the force that is averaged:

Take a very small piece of area such that it gets hit infrequently.
The force necessary to keep it in place has a pulse-like nature due to
collisions: force is high when we're "pushing back" on an incoming
molecule, and zero when there is no contact.

Therefore it is simpler to go up and down the integral stairs once:
don't look at force but at impulse transfer per collision (integrate)
and then look at the number of collisions per time unit
(differentiate).

From this perspective, why is it that a speaker driver (i.e. a flat
surface moving in a direction perpendicular to its surface plane) has
high impedance, meaning it produces mostly pressure and no
displacement?


Entry: Power spectrum as power of derivative of image
Date: Thu May  5 08:36:15 EDT 2011

See [1] -> Listening to your webcam.

I was wondering, since there is such a "naturally pleasing" map
between the derivative of an intensity map and an audio spectrum, are
there more such parallels to make?

[1] http://homepages.kcbbs.gen.nz/tonyg/projects/tangents.html


Entry: Queued Class-D DAC
Date: Sun May  8 08:57:30 EDT 2011
Type: tex

\def\us{\mu\text{s}}

The objective is to use a cheap microcontroller to implement an
$N$-channel pulse--modulated DAC, as opposed to using parallel
programmable hardware, duplicating a $1$-channel DAC $N$ times.  The
following is a list of remarks about how to implement this
functionality on a cheap microcontroller.

\begin{itemize}

\item The software switching performance of a typical uC like a PIC18F
is about $10$ instructions or $1\us$, where each time slot reads and
updates a state register and an output pin.

\item The DAC is going to be used to modulate amplitude and frequency
of audio signals.  This places an extra constraint on the conversion
error: its decorrelation is favoured over its inaudibility.

\item A straightforward approach is to multiplex such time slots on a
rigid grid, obtaining a switching period of $NT$ where $T$ is the
duration of one time slot and $N$ is the number of channels.

\item Depending on the modulation algorithm used, it might be possible
to optimize this scheme.  Instead of computing each DAC's state update
at each tick period, it might be possible to just compute the time to
next switch, and then queue these operations.  If the scheduling
problem can be solved exactly, this can bring the effective resolution
from $NT$ in the dumb multiplexed case to $T$ in the queued case.

\item If the scheduling problem can not be solved exactly, it might be
possible to measure the switching error and feed it back into the
state update.  Some global strategy is probably necessary to make sure
these errors are distributed over several DACs, and sufficiently
decorrelated in the case of signals that modulate audio.

\item It might be possible to favour the $50\%$ duty cycle, and only
encode differences with respect to this default.  I.e. the signal
$101011010$ would be represented by $000001111$ after demodulation by
the default $2T$ square wave $101010101$.  It would be interesting to
find out if such an approach is beneficial for eliminating correlated
noise ripples, or if it just shifts the problem.


\end{itemize}


Entry: Code generation and quasi-particles
Date: Sun Jun 12 00:04:03 CEST 2011

(mumbo-jumbo)

I've had it with this borking real-world stuff.

Time for some seriously wacky ideas.

What I'm trying to do in libprim is to make a compiler and VM at the
same time.  In practice, glossing over this means to take the simplest
representation of that structure (an interpreter), i.e. represented as
the composition A B C, and inserting "virtual particles" (which are
operations and their inverses) at the boundaries of the VM and
compiler.  I.e. A B x x' C, where A B x is the VM and x' C is the
compiler.

That's the high level mumbo jumbo.  Now, can this be done in practice?
How to express code specialization in an explicit algebraic way,
i.e. as symmetry transforms?

From afar the theory seems sound: you take an element of a group which
represents a function, and you find a representation of that element
such that it can be split in two: a compile time and a run time element.

Then use your understanding of the symmetry of that group to make
automatic search a little bit feasible..


Entry: Programs and specifications
Date: Sun Jun 12 15:22:28 CEST 2011

(mumbo-jumbo)

The trouble is that representation needs to be as abstract as
possible.  This seems to be the hard part as very often it remains
implicit.  I.e. programmer thinks "Ah, let's use this bit for that to
quickly hack around doing these things separately".  That's exactly
the implementation information that one wants to capture, and has an
incredibly large amount of detail if made explicit.

This then also indicates why this process is not factored in the
following particular way:

   specification S . specializer (S->I) = implementation (I)

but by writing the implementation I separately, and providing a proof
that I embeds S.


Q: What is more difficult, to provide an indirect proof that S is
   isomorphic to a subpart of I, or to construct a direct
   (parameterizable?) embedding function S->I?

Putting it like that it seems that the former is simpler, as it allows
the use of non-constructive proof techniques.  However, making the
S->I embedder parameterizable might give a whole family of
implementations that do not need separate proofs.


Entry: A Primer of Infinitesimal Analysis
Date: Thu Jul  7 23:28:09 CEST 2011

A nice one to stack next to dual numbers[3] and automatic
differentiation.  Link from [2], which is quite an interesting post in
itself.  ``Smooth space-time, say `R^4`, allows only smooth motion and
smooth distribution of mass. If we place non-smooth mass in it, the
space will change to a _subset_ of `R^4` which carries additional
information about the anomalies contained in it.''

I started reading [1][5] and I must say it's quite interesting to reason
with non-non-existing entities ;)  In the book they are likened to
virtual particles: something that's not really present in end results,
but useful as a tool in calculations.


[1] isbn://0521624010
[2] http://math.andrej.com/2008/08/13/intuitionistic-mathematics-for-physics/
[3] http://en.wikipedia.org/wiki/Dual_numbers
[4] http://en.wikipedia.org/wiki/Smooth_infinitesimal_analysis
[5] md5://3f3da3bbc8da4d430677acc78cb9c7c3


Entry: Getting that book feel on an e-reader
Date: Tue Jul 12 22:00:30 CEST 2011

If the font is too thin, e-readers or screens don't give a very good
feel.  I.e. math books and papers in computer modern are really
horrible.

I'd like to emulate the feel of blurry ink on paper.  This seems to
work well with a 600dpi PDF with radius 6 gaussian blur (gimp's units)
and an exponential looking J-shaped intensity curve.

Before, I tried this with a pixel-growing algorithm too, which gives
similar results[1].


[1] entry://20090511-181107


Entry: Human motion
Date: Sun Oct 30 17:44:29 EDT 2011

Plot derivative of a moving laser dot to see the small "pulse
corrections" and make a model for that.


Entry: Time varying filters
Date: Mon Oct 31 00:16:32 EDT 2011

What makes musical filters musical is their time-variant nature, their
speech or phrasing.  Filters by themselves are quite cheap, though
controlling them, making them time variant to make an interesting
texture is not at all a simple problem.

It might be interesting to go into that problem a bit now that I
(almost) have my code generation machinery running..


Entry: Logic
Date: Sun Dec 18 16:42:08 EST 2011

For my EE-brain, "logic" is the logic of electronic circuits, which
corresponds to propositional logic.

The "logic" of type systems is at least first order logic.
Quantification (things with holes) is necessary to express structure.


Entry: Inference
Date: Fri Dec 23 12:53:47 EST 2011

Is inference always generate-and-test?

More specifically, I'm thinking of the task "design something that has
these properties.".  In general, there is no algorithm that goes from
a description of the properties to the description of something that
implements those properties (a logic and a model?).

I'm inclined to think (in a vague and unrooted way) that any form of
"algorithmic inference" is really just fixing parameters of an already
existing / well-defined family of models, not *ever* the construction
of a model out of the blue.

Why is this important?  Compilation of specifications.  Is it so that
a model can be seen as a "constructive proof" of a set of properties?
Is this kind of thinking formalized in some way?


[1] http://en.wikipedia.org/wiki/Model_theory


Entry: Distributivity vs. Commutation
Date: Thu Dec 29 08:50:12 EST 2011

I'm still in the process of finding a good terminology for "type
commutation" in Haskell, i.e. correspondences like this:

               a (b t) <-> b (a t)
 
       a (b t1) (b t2) <-> b (a t1 t2)

and multi-arity variants of this, which seem to be more arbitrary due
to the different ways the t_x can be permuted.


Entry: Density of discrete entities
Date: Sun Jan  1 17:04:12 EST 2012

There's something funny going on with the concept of density for a
finite amount of particles.  How is the "blurring" regularization for
this to work usually handled?

I guess that moving to a statistical interpretation solves this
problem.  The density is not a property of the particles, but one of
the probability distribution that is used as a model of the particles.

I.e. model 10 particles by thinking of them as a (random) sampling
from a PDF, then talk about the PDF instead.


Entry: Accounting
Date: Mon Feb 13 13:27:37 EST 2012

How to use gnucash.

Problem: I currently don't care about "proper" accounting (i.e. the
accounting equation [1]).  I just want a way to map order reports
(i.e. Amazon, Ebay, ...) that can be obtained electronically to
categories of expenses (Office, Equipment, Transport, ...).

I'm not sure if that makes any sense in a standard accounting
approach.  It's probably good to first learn how to do this properly.

[1] http://en.wikipedia.org/wiki/Accounting_equation


Entry: Mixers and integrators
Date: Tue Jul  3 08:41:27 EDT 2012

Maybe it's time to try the Sigma/Delta synth again, using two building
blocks: 

 - the integrator = counter + S/D converter

 - the mixer = "random" bit stream multiplexer

Where "random" means the switch waveform is not correlated with the S/D
"waveform".  It would probably be enough to make it into a cheap LFSR.

Before doing this on chip, it's probably a good idea to simulate it
first in Haskell.  After that it might be interesting to put it in an
FPGA, or a low-frequency version in a PIC.

Theoretically, it would be interesting to find a measure of how
inefficient this actually is.  The main tradeoff is to use a
spectrum that has a 1/f characteristic, and to simplify the "mixer" to
a multiplexer (assuming RNG is free).


Why is an S/D dac/adc beneficial?  It seems that it trades off speed
vs. size, but not in a very efficient way.  However, "speed is free".
HW runs in MHz/GHz range, about 1000x more than ordinary audio
sampling frequencies.  Apart from time-multiplexing (i.e. sequential
computer programs) there is no way to easily use these extra cycles.

One thing I don't really understand is how to look at "information".
It's said that the higher bands contain noise, but that's not really
true.. it's because of the "noise" in the higher bands that the
information in the lower bands can fit in the 1-bit dynamic range.  It
doesn't seem that these higher bands can actually contain information
alongside the low band signal.  However, the encoding is very robust
against random bit errors: they definitely introduce unwanted signals,
but they do it in a gracefully degrading way.

It would be interesting to compute the theoretical capacity[4] of a
channel with the same noise envelope as a s/d and see how it relates
to the digital bit rate.

Not to forget, for signals close to the 0/1 voltage, a S/D acts as a
voltage <-> frequency converter.  Closer to 1/2 voltage this is the
same, only modulated near to 1/2 the output frequency.  In this mode
it is clear that DC accuracy is actually infinite (infinite
integration time) so to talk about "dynamic range" is always dependent
on frequency.
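
The voltage <-> frequency behaviour can be seen in a small Python
sketch of an accumulator-style first-order S/D, where the carry out of
the accumulator is the output bit.  The 8-bit width and the 256-step
run length are assumptions chosen for illustration.

```python
def carry_sd(value, bits=8, steps=256):
    """First-order S/D as an accumulator: the carry out is the output bit."""
    acc, out = 0, []
    mod = 1 << bits
    for _ in range(steps):
        acc += value
        out.append(acc >> bits)      # carry flag: 1 when the accumulator overflows
        acc &= mod - 1               # keep only the low bits
    return out

for v in (1, 4, 128, 252):
    stream = carry_sd(v)
    ones = sum(stream)
    print(v, ones)                   # ones equals v: pulse count tracks the input

# Near zero the pulses are evenly spaced; at half scale the output toggles
# on every sample, i.e. it sits at half the clock frequency.
print([i for i, b in enumerate(carry_sd(4)) if b])
print(carry_sd(128)[:8])
```

Over one full accumulator period the number of 1 bits equals the input
value exactly, which is the "infinite DC accuracy given enough
integration time" mentioned above.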


[1] entry://20090609-145519
[2] entry://20101227-110515
[3] entry://20100218-104438
[4] http://en.wikipedia.org/wiki/Channel_capacity


Entry: Sigma/Delta retake
Date: Wed Jul  4 13:44:39 EDT 2012

Let's start from scratch.  Basically, there are two systems I'm
interested in: the first order analog lowpass or integrator, and the
digital integrator (i.e. the PIC carry flag hack).

Some basic assumptions:

- Reconstruction should be a simple (maximally flat) lowpass filter,
  meaning that the digital signal has a direct analog meaning.

- Feedback and reconstruction filter are the same.  I.e. the error
  signal is the effective error.

- First iteration: work with 1st order LP to simplify math, this
  leaves one design parameter: filter cutoff.

- Quantisation noise and signal are not correlated (which is wrong in
  practice!).

- Negative feedback minimizes error with white spectral distribution
  (if signal and noise are not correlated).


Let's get at the basic schematics.  We're interested in the error of
the reconstruction which is present after the negative feedback
summation [i].  First iteration I was thinking to minimise the
reconstruction error, which is e = (s - o) = (s - Fr).
                
    s-->--[-]-e-->-[Q]--->--r
           \--o--<-[F]-/

                    q
                    |
    s-->--[-]-e-->-[+]--->--r
           \--o--<-[F]-/

where F is a lowpass filter, s is the input signal, r is the binary
representation, e is the feedback error signal to be minimized, and o
is the reconstructed output.  The quantizer Q is modeled as a noise
source q.

However, this is not the same as the circuit I find here[1].

    s-->--[-]-e-[F]->-[Q]--->--r
           \--------<----/

which is equivalent to 

    s->-[F]--[-]-e->-[Q]--->--r
              \----<-[F]-/ 

This means that we're minimizing the error between the reconstruction
(filtered representation) Fr and the filtered input Fs.  This makes
perfect sense: we can't reconstruct the high band of s, as that's
where we're going to move the noise.

In practice however, if s is already bandlimited such that s = Fs, the
first schematic will also work.

To see the shape of the noise component e in r, divide out F in the
error equation:

             Fs - Fr = e
              s -  r = e/F = e'

since e is white, e' has the shape of 1/F.
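
As a numerical check of the 1/F noise shape, here is a Python sketch
for the special case where F is a discrete-time integrator, so that
1/F is a first difference.  Assumptions: a one-sample delay in the
feedback path (needed to make the loop computable), a +/-1 quantizer,
and a sine test input.  With q the quantizer error, the output should
satisfy r[n] - s[n] = q[n] - q[n-1].

```python
import math

def sd1(signal):
    """First-order loop: integrate (input - previous output), quantize to +/-1."""
    v, r_prev = 0.0, 0.0
    r_out, q_out = [], []
    for s in signal:
        v += s - r_prev                 # integrator F acting on the error
        r = 1.0 if v >= 0 else -1.0     # 1-bit quantizer Q
        r_out.append(r)
        q_out.append(r - v)             # quantization error q = r - v
        r_prev = r
    return r_out, q_out

n = 2000
s = [0.5 * math.sin(2 * math.pi * k / 500) for k in range(n)]
r, q = sd1(s)

# r[k] - s[k] should equal the first difference q[k] - q[k-1] (with q[-1] = 0).
worst = max(abs((r[k] - s[k]) - (q[k] - (q[k - 1] if k else 0.0)))
            for k in range(n))
print(worst)   # only floating point rounding remains
```

The first difference is a highpass operation, so whatever spectrum q
has, its contribution to r is pushed toward the high band.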


EDIT: The first schematic is a delta modulator, and [1] uses
integrators instead of lowpass filters.  Explain that.

[1] http://en.wikipedia.org/wiki/Delta-sigma_modulation


Entry: Pulse counters as Gray code?
Date: Wed Jul  4 15:11:21 EDT 2012

Can pulse counters for implementing integrators be implemented as Gray
code counters?  Does it even make sense at gate level?


Entry: Usable spectrum in a Sigma/Delta signal
Date: Wed Jul  4 15:42:53 EDT 2012

How does this[1] translate to the SNR of s?  We have the shape of the
noise, but not the maximal amplitude.  I'm interested in SNR to
compute the channel capacity, i.e. how much information can be encoded
in the Fs signal as opposed to the raw bitstream r.  Is there any
capacity lost by this encoding?

It seems that there is no easy answer here, as it involves assumptions
to make the nonlinearity go away.

What about this:

- in r = s + e', the signal component s is negligible.

- e' is highpass, which means it has no DC component, so the
  instantaneous energy is known, and constant: half of the bits are 1,
  half are 0.  Setting 1/2 in the middle, the RMS is 0.5, power is
  0.5^2.

- given the shape of e' and the total energy, the
  frequency-dependent energy can be computed.

So it seems that as long as s doesn't have a DC component, it is
straightforward to compute the absolute noise envelope, which is
constant as long as s remains small wrt. e'

If s does have a DC component, the linearization doesn't work.  The
more DC there is, the less "room" there is for the AC component.

A better assumption would be to say that s is highpass (we still need
that assumption to compute the total energy) but that its cutoff is
much lower than that of e'.  Here the total energy (still 50% duty
cycle) is distributed over s and e'.  As long as the cutoff of e' is
much lower than Nyquist, it seems that the first assumption (setting r
~ e') is sound.

The 2nd assumption can still be used to make sure that s doesn't
exceed the highest point of e'.

What is surprising here is that the presence of a DC component shifts
the dynamic range.  Actually, the same happens in other amplitude-
limited channels.

So.. Given the assumptions above, the high pass signal (instantaneous
power 0.5^2) would approximate a white signal, meaning that the
maximum of the PSD is 0.5^2/fs.

This seems to be enough to approximate the signal capacity, using the
part of e' that's below the maximum, keeping in mind that the DC part
is not usable because the linearized channel properties depend on the
DC component, but negligible for computing channel capacity.

( It would be interesting to express all those approximations
exactly..  The exact formula is probably fairly complex. )

So, to top off all the approximations, let's say that the bandwidth is
reduced by x, the oversampling factor, which leads to an increase of x
in amplitude dynamic range.  Plugging this into Shannon's formula[2]
directly gives the asymptotic behaviour in terms of x

         C = B log (1 + P / N) ~ B log (P / N)      (for P >> N)

           ~ B_0/x log ( P_0 * x^2 / N)

which clearly shows that this is a fairly expensive technique when
looking just at the information content of the channel.  The reduction
in capacity is

               log x
               -----
                 x

(where we ignore the constant factor 2 coming from the power x^2)
which corresponds to the intuition that we're using bits to represent
individual events E_i, and not "sums of events" which only need bits
in the order of log (sum E_i).

This high redundancy makes it plausible to believe that the effect of
bit errors is minor.  It seems to indicate that the "shape" of the
signal is largely irrelevant, so we can probably use that to our
advantage (decorrelation to allow computation with such signals).

So... say x = 100000 which is 5 decades, which is 100dB dynamic range.
The information cost of this is about a factor 6000, meaning that only
1/6000 of the information is actually useful.
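
The factor 6000 can be reproduced with a few lines of Python.  The
base-2 logarithm is an assumption (it is the base that makes the
number come out at about 6000), and the `order` parameter is a
hypothetical name for the filter order o.

```python
import math

def information_cost(x, order=1):
    """Raw bit rate over useful capacity, ~ x / (order * log2(x)), constants ignored."""
    return x / (order * math.log2(x))

x = 100_000
print(20 * math.log10(x))        # dynamic range in dB for an amplitude factor x
print(information_cost(x))       # roughly 6000
print(information_cost(x, 2))    # a second order filter halves the cost
```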

That's a lot of room to put some extra stuff!

Think of this: to change a bit from 0<->1 adds/subtracts energy that
can be seen in the base band (impulse response of the reconstruction
filter) but to switch the position of 2 adjacent complementary bits
has almost no effect in the base band as this blip has no DC
component.  As a result it is probably possible to add random
permutations to the output bits without this being noticed.

Anyways, this also makes it clear why it's probably best to use some
steeper filters as they bring the noise floor down.  With o the order
of the filter this becomes:

              o log x
              -------
                 x

NEXT: Revisit the logic operations on (non-correlated) S/D signals +
check HF noise modulated into base-band.


[1] entry://20120704-134439 
[2] http://en.wikipedia.org/wiki/Channel_capacity


Entry: Haskell & simulation
Date: Wed Jul  4 20:28:38 EDT 2012

It would be a good opportunity to put the S/D modeling in Haskell
first.  The digital part is straightforward.  The missing component
are integrators.  Differentials can be computed using autodiff.  Once
this works, deriving a generator for an implementation should be
straightforward.


Entry: Basic Hypothesis Testing
Date: Sun Jul  8 15:39:34 EDT 2012

1. Formulate a hypothesis
2. Test if it agrees with the data

Questions: What "form" does a hypothesis have?  A function?  A
relation?  How to test?  What form does a test have?  Probability?


Entry: Understanding Density
Date: Tue Jul 10 12:34:06 EDT 2012

Some things have popped up lately that are all related to the problem
of density: a bunch of discrete things interact in an intractable way,
but when this exact representation is integrated or averaged it can
represent a very nice model.  Usually when 1. the granularity is fine
enough and 2. the entropy is high enough, cross terms drop out and
there is a (information-reducing) "morphism" between operations in the
discrete domain and operations in the density domain.

I wonder, how to qualify "enough" for the properties of granularity
and entropy.

Some concrete examples:

- The use of money in the form of discrete transactions in time.  To
  map this to a density model (money stream) makes it easier to think
  about.

- Sigma/Delta modulators: all operations are discrete both in time and
  space: fixed sampling frequency and 0/1 bit levels, but the part of
  the signal we're interested in is a forgetful time integral = low
  pass filter.

- Sound is a phenomenon that can be modeled as fluctuations of
  densities of particle positions and velocities.

All density representations seem to be based on some form of smoothing
= convolution of a discrete distribution with a smooth, local kernel
like, e.g. a bell curve.


Entry: Density of monetary transactions
Date: Tue Jul 10 12:46:01 EDT 2012

See last post.  Compared to Sigma/Delta and sound waves, monetary
transactions seem to be different.  Why is this?

The problem is that simple smoothing doesn't seem to do the trick.  It
appears that some transactions have an intrinsic "bulkiness".  I.e. a
$1000 transaction that happens once a month (e.g. rent payment) is
different in kind than a $10 transaction that happens every day
(e.g. a meal at a fast food restaurant).

Modeling money streams using a density model seems to need to take
this into account: how much time does a transaction cover?  I.e. we
want to recover the "processing" that is done to the transaction.

Conclusion: monetary transactios are not "natural" in the sense that
they miss information, i.e. the intrinsic periodicity of a
transaction.  Is there a good way to attribute a time scale to
transactions that is meaningful?

The goal for (home) finance analysis is to distinguish recurring costs
(taxes, rent, food) from one-time costs.


Entry: Analog Synth using Sigma/Delta signals
Date: Tue Jul 10 15:40:09 EDT 2012

So the question is: do the calculations on expected values in [1]
actually make sense?  Due to modulation, part of the spectrum will
probably end up in the base band, but maybe this doesn't show up in
the expected value because the average is invisible?

It's probably best to code up a couple of examples and compute
spectra.  However, it still seems that if the noise is not correlated,
the result should be 0 at DC.  With C = A x B, the spectra are
              _
     C(f) = _/ A (f-x) B(x) dx

It really seems that the trick is non-correlated noise. In the
following
              _
     C(f) = _/ [ A(f-x) + A_n(f-x) ] * [ B(x) + B_n(x) ] dx

the convolution of the noise terms should be zero.
              _
        0 = _/  A_n(f-x) B_n(x) dx

If the bands of A(f) and B(f) are small, the noise band in C(f) will
only grow a little, but it will grow: any non-DC signal A(x) | B(x)
smears out the noise B_n(x) | A_n(x) a little.

So it seems that [1] is sound, given that there is no correlation.
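
A quick Python experiment on the expected value of a product.  This is
only a sketch under strong assumptions: the streams are independent
random +/-1 sequences with a fixed mean (white noise, no S/D noise
shaping), so it says nothing about where the noise ends up in the
spectrum.

```python
import random

def bipolar_stream(mean, n, rng):
    """Random +/-1 stream with the given expected value (no noise shaping)."""
    p = (mean + 1) / 2
    return [1 if rng.random() < p else -1 for _ in range(n)]

def avg(xs):
    return sum(xs) / len(xs)

rng = random.Random(2012)
n = 200_000
a = bipolar_stream(0.6, n, rng)
b = bipolar_stream(-0.5, n, rng)
c = [x * y for x, y in zip(a, b)]      # sample-wise product of the two streams

print(avg(c), avg(a) * avg(b))         # both close to 0.6 * -0.5 = -0.3
print(avg([x * x for x in a]))         # fully correlated case: 1, not 0.6^2
```

The last line shows why decorrelation matters: multiplying a stream
with itself gives 1 on every sample, not the square of its mean.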

The real question is then: how to decorrelate S/D streams?

[1] entry://20090609-111303


Entry: Bitswapper
Date: Tue Jul 10 16:26:28 EDT 2012

Bitstream i in, bitstream o out, such that i _|_ o but /i = /o.

Something like this: circular shift register, read from a random
position n, then send out that bit and read into position n.  Repeat.
This guarantees there is no average energy lost, but the result is
probably not very correlated with the input.

Implementation: barrel shifter with log(n) random input bits, or an
addressable memory.
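
The addressable memory variant can be sketched in Python as follows.
Assumptions: the store starts out filled with zeros, and the random
position comes from a seeded software RNG instead of hardware.

```python
import random

def bitswap(bits, size, rng, fill=0):
    """Output the bit stored at a random position, then store the input bit there."""
    store = [fill] * size
    out = []
    for b in bits:
        n = rng.randrange(size)
        out.append(store[n])
        store[n] = b
    return out, store

rng = random.Random(0)
i = [1 if rng.random() < 0.3 else 0 for _ in range(10_000)]
o, left = bitswap(i, size=8, rng=rng)

# Nothing is lost: every input 1 is either sent out or still in the store.
print(sum(i), sum(o) + sum(left))
```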

There has to be a relation between the size of the store and the
quality of decorrelation.  How does this work?

1-bit: no decorrelation
2-bit: either pass or bump

Note that while this might reduce correlation (sum of element-wise
product) it does not mean streams are statistically independent,
because they are related through the property of average.

Even if this doesn't work for anything hi-fi, there must be some
pretty interesting effects hidden here.  Convert a PCM signal to S/D,
then apply this bit algo.  Such a thing is not so hard to build, so
let's try a Pd external.

What are the statistical properties of such a decorrelator?  It's
probably simplest to start with a couple of concrete examples and work
from there.

2 bit
-----
50% D=1 input
50% state
    50% D=2 input
    50% state
    ...

The output bit is:
1/2 x[1]
1/4 x[2]
1/8 x[3]
...

This looks suspiciously much like a lowpass filter.  On average, the
effect is a filtering with h[t] = 2^-t for t>0 and h[t] = 0 otherwise.

3 bit
-----
1/3 D=1
2/3 state
    1/3 D=2
    2/3 state
        1/3 D=3
        2/3 state

The output bit is:
1/3    x[1]
2/9    x[2]
4/27   x[3]

or

weight of x[n] = 1/2 (2/3)^n

Looks like this generalizes..

4 bit
-----
1/4 D=1
3/4 state
    1/4 D=2
    3/4 state

1/4
3/4^2
3^2/4^3
...

for p>1 bits, the shape is:

  1/(p-1) ((p-1) / p)^n

meaning that if p grows, the tail gets longer.
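
The general formula can be checked against the 2, 3 and 4 bit cases
with exact rational arithmetic.  A small Python sketch; `h` is a
hypothetical name for the average weight given to the input from n
steps ago.

```python
from fractions import Fraction

def h(p, n):
    """Average weight of the input from n steps ago for a p-bit store (n >= 1)."""
    return Fraction(1, p - 1) * Fraction(p - 1, p) ** n

print([h(2, n) for n in (1, 2, 3)])   # values 1/2, 1/4, 1/8
print([h(3, n) for n in (1, 2, 3)])   # values 1/3, 2/9, 4/27
print([h(4, n) for n in (1, 2, 3)])   # values 1/4, 3/16, 9/64

# The weights form a geometric series that sums to 1 (unity DC gain),
# with ratio (p-1)/p moving toward 1 as p grows.
print(float(sum(h(8, n) for n in range(1, 200))))
```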

It seems that the "interesting effects" are just one-pole low-pass
filtering.  But hey, that's a nice primitive!

Still, it's quite an inefficient way of storing the data.  It seems
that this can be implemented in a black box way.

Actually, it doesn't look like this is correct.  The state shrinks in
the different cases:

1/4 D=1
3/4 state
    1/3 D=2
    2/3 state
        1/2 D=3
        1/2 state

actually that can't be true.  This is only for a non-recursive
variant, where we shift out, then write in a random place.  The first
swapper algo is definitely infinite tail.

The question is though: what kind of baseband noise does this
introduce?


Entry: Inverse of 7 mod 2^32
Date: Tue Jul 10 20:27:57 EDT 2012

7 * x = 1 mod 2^32

How to compute?  Only exhaustive?
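
One non-exhaustive option is a Newton-style iteration that doubles the
number of correct low bits on each step.  A Python sketch, relying on
the fact that a*a = 1 mod 8 for any odd a, so a itself is a starting
value correct to 3 bits; `inv_mod_pow2` is a hypothetical helper name.

```python
def inv_mod_pow2(a, bits=32):
    """Inverse of odd a modulo 2^bits by Newton iteration x <- x * (2 - a * x)."""
    if a % 2 == 0:
        raise ValueError("only odd numbers are invertible modulo a power of two")
    mask = (1 << bits) - 1
    x = a                     # a * a = 1 (mod 8), so x is correct to 3 bits
    correct = 3
    while correct < bits:
        x = (x * (2 - a * x)) & mask   # each step doubles the correct bits
        correct *= 2
    return x

x = inv_mod_pow2(7)
print(hex(x), (7 * x) % 2**32)
```

The extended Euclidean algorithm is the other standard route; in
Python 3.8 and later the built-in pow(7, -1, 2**32) computes the same
inverse.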


Entry: Counter bitswapper
Date: Wed Jul 11 13:33:25 EDT 2012

It seems that picking a random bit from a set of bits can be
represented much more efficiently using a counter.  Keep track only of
the number of bits that are in the bucket, then randomly generate a 0
or 1 based on this level and update the counter.

If this behaves the same, then the "impulse response" should be the
same also.  Though I don't see a way of how to compute it.
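
A Python sketch of the counter variant, assuming the bucket starts
empty and that "randomly generate a 0 or 1 based on this level" means
emitting a 1 with probability level/size, which is the chance of
hitting a 1 when drawing a stored bit at random.

```python
import random

def counter_swap(bits, size, rng):
    """Track only the number of 1s in a bucket of `size` bits."""
    ones, out = 0, []
    for b in bits:
        o = 1 if rng.random() < ones / size else 0   # draw a stored bit at random
        out.append(o)
        ones += b - o                                # drawn bit leaves, input enters
    return out, ones

rng = random.Random(3)
i = [1 if rng.random() < 0.3 else 0 for _ in range(10_000)]
o, ones = counter_swap(i, 8, rng)
print(sum(i), sum(o) + ones)    # conservation: inputs = outputs + bucket level
```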


Entry: FFT / DFT
Date: Sat Jul 14 16:27:00 EDT 2012
Type: tex

I'm trying some DSP stuff in Haskell and I need to find a nice way to
compute some spectrum diagrams from what are essentially infinite
streams.  Starting from the Cooley-Tukey algorithm, how to best
express it in terms of infinite streams such that precision can be on
demand?

The radix 2 FFT is most easily explained in terms of the DFT of $N =
2^n$ size signals, with $x_N[0 \ldots N-1]$ the signal, $X_N[0 \ldots
N-1]$ the spectrum and $w_N = e^{- i 2 \pi / N}$
$$X_N[f] = \sum_{k=0}^{N-1} w_N ^ {kf} x_N[k].$$
The divide-and-conquer step is
based on taking the even and odd components of the input stream
$$X_N[f] = \sum_{k'=0}^{N/2-1} w_{N}^{2k'f} x_N[2k'] +
\sum_{k'=0}^{N/2-1} w_{N}^{(2k'+1)f} x_N[2k'+1].$$
Using $w_{N}^2 =
w_{N/2}$ this is $$X_N[f] = \sum_{k'=0}^{N/2-1} w_{N/2}^{k'f} x_N[2k']
+ w_N^f \sum_{k'=0}^{N/2-1} w_{N/2}^{k'f} x_N[2k'+1].$$ Defining the
decimated signals $x_{N/2}[k] = x_N[2k]$ and $x'_{N/2}[k] = x_N[2k+1]$ this gives
$$X_N[f] = X_{N/2}[f] + w_N^f X'_{N/2}[f],$$ where we use the
convention that capitalization relates signal and DFT.  This
recurrence relation produces a tree structure in the computational
dependencies.

A second important set of observations is that $X_{N/2}[f+N/2] =
X_{N/2}[f]$ due to periodicity, and $w_N^{N/2} = -1$, which allows the reuse of
intermediate results in the expression for $X_N[f]$ to also compute
$$X_N[f+N/2] = X_{N/2}[f] - w_N^f X'_{N/2}[f].$$ This relation produces
the butterfly structure.

Is it possible to compute the FFT of $x_4[0,1,2,3]$ by combining FFTs of
$x_2[0,1]$ and $x'_2[2,3]$?  This would give a simpler conversion
between time and frequency domains, incrementally converting a hopped
STFT to one with different decimation.  Let's derive this special case
to see how it would work.  It's clear that $X_4[0] = X_2[0] + X'_2[0]$
and $X_4[2] = X_2[1] + X'_2[1]$ as they use the same weighting
waveforms: all $(1,\ldots)$ and $(1,-1,\ldots)$.  Can the other 2
components be computed?  Doesn't seem so, as the waveforms are
$(1,i,-1,-i)$ and $(1,-i,-1,i)$, and there's no way to compose that
from $(1,1)$ and $(1,-1)$.  So it seems it is important to note that
the sub-FFTs are FFTs of \emph{decimated} signals.  It's the
decimation that allows composition like:
$$(1,i,-1,-i) = (1,0,-1,0) + i(0,1,0,-1)$$
$$(1,-i,-1,i) = (1,0,-1,0) - i(0,1,0,-1)$$ Another way of explaining
it is that the sub-FFTs need to contain low-frequency information.


Entry: FFT: recursive construction of analysis functions
Date: Sat Jul 14 22:31:22 EDT 2012

Some simplified visualization of composition of analysis functions in
the FFT.  A bin in an FFT/DFT is the input signal modulated by an
oscillatory analysis function, then summed (averaged).

The oscillatory functions have the property that time shift is the
same as multiplication by a phase rotation factor.

The idea is that for N, the basic oscillation has N phase
states, here represented as numbers from 0 to N-1.

N=2

0 0
0 1

N=4

0 0 0 0
0 1 2 3
0 2 0 2
0 3 2 1

N=8

0 0 0 0 0 0 0 0
0 1 2 3 4 5 6 7
0 2 4 6 0 2 4 6
0 3 6 1 4 7 2 5
0 4 0 4 0 4 0 4
0 5 2 7 4 1 6 3
0 6 4 2 0 6 4 2
0 7 6 5 4 3 2 1
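
The tables above are the products f*k taken modulo N.  A short Python
sketch that regenerates them (`phase_table` is a hypothetical name):

```python
def phase_table(n):
    """Row f, column k holds the phase state (f * k) mod n of analysis function f."""
    return [[(f * k) % n for k in range(n)] for f in range(n)]

for row in phase_table(8):
    print(" ".join(str(p) for p in row))
```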

Going from N=2 -> N=4 we have composition of 2 decimated signals with
the following base functions.  Here '.' means amplitude zero (we use 0
to indicate phase angle 0, amplitude 1).

( Expressed as angle = x \in [0,1] )

0 . 0 .
0 . 1 .
. 0 . 0
. 0 . 1

( Expressed as angle = x \in [0,3] )

0 . 0 .
0 . 2 .
. 0 . 0
. 0 . 2

To create the 4 base functions for N=4 it is clear that the rotated
odd sampled base functions need to be added to the even sampled ones
to create all 4 base functions.  Rotation of base functions commutes
with weighted summation, hence in the FFT the rotation happens on the
composition of FFT bins from the 2 shifted contributions, i.e. the
combination of 2 weighted sums.


even/odd contr     rotation of odd base function
------------------------------------------------

N=4

0 . 0 .
. 0 . 0            0

0 . 2 .
. 1 . 3            1

0 . 0 .
. 2 . 2            2  (= phase 0, ampl: -1)

0 . 2 .
. 3 . 1            3  (= phase 1, ampl: -1)


N=8

0 . 0 . 0 . 0 .  
. 0 . 0 . 0 . 0    0

0 . 2 . 4 . 6 .
. 1 . 3 . 5 . 7    1

0 . 4 . 0 . 4 .
. 2 . 6 . 2 . 6    2

0 . 6 . 4 . 2 .
. 3 . 1 . 7 . 5    3

...


Note that a phase rotation by N/2 is a multiplication by -1, which
allows for some optimizations.

So what does an FFT do in plain language?  

1) It uses 2 phase-shifted coarse analysis functions for frequency f
(expressed wrt. decimation factor N) to construct a new, finer
detailed analysis function for frequency f.  This is done by applying
a phase shift to the second analysis function such that the decimation
gaps line up.

2) Using this finer analysis function f it constructs an analysis
function f+N/2 by modulating the first one with 1,-1,1,...  This
doubles the frequency resolution on each step.
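
The two steps can be written down as a recursive radix-2 FFT.  This is
a plain textbook sketch in Python, checked against a direct DFT, and
not the stream-based formulation asked about earlier; it assumes the
input length is a power of two.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT with w_N = exp(-2 pi i / N), used as a reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * f / n) for k in range(n))
            for f in range(n)]

def fft(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])      # spectra of the decimated signals
    out = [0j] * n
    for f in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * f / n) * odd[f]   # rotate the odd contribution
        out[f] = even[f] + t                    # step 1: finer analysis function f
        out[f + n // 2] = even[f] - t           # step 2: f + N/2, rotation by -1
    return out

x = [1, 2, 0, -1, 3, 0, 0, 1]
print(max(abs(a - b) for a, b in zip(fft(x), dft(x))))   # rounding error only
```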


Entry: Smoothing financial data
Date: Sun Jul 22 08:52:46 EDT 2012

Using the heuristic that large amounts span larger time frames, it
seems best to parameterize the data based on span ~ amount in such a
way that the integral is equal to the size of the transaction.
I.e. "shape-invariant" smoothing.

How to implement?  This needs an evaluation function that sums over
all data points, evaluating the interpolating function, maybe a
spline.
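
A minimal Python sketch of such an evaluation function.  Assumptions
for illustration only: a triangular kernel instead of a spline, a
hypothetical scale `days_per_unit` linking amount to span, a minimum
half-width of one day, and made-up transaction data.

```python
def triangle(t, center, half_width):
    """Unit-area triangular kernel centered at `center`."""
    d = abs(t - center)
    return 0.0 if d >= half_width else (1 - d / half_width) / half_width

def smooth(transactions, t, days_per_unit=0.03):
    """Money stream at time t: each (day, amount) is spread over a span ~ |amount|."""
    total = 0.0
    for day, amount in transactions:
        half_width = max(1.0, abs(amount) * days_per_unit)
        total += amount * triangle(t, day, half_width)
    return total

tx = [(10, 1000.0), (12, 10.0), (13, 10.0), (40, 1000.0)]   # hypothetical data
step = 0.1
grid = [k * step for k in range(-1000, 2000)]
area = sum(smooth(tx, t) for t in grid) * step
print(area, sum(a for _, a in tx))     # the integral matches the total amount
```

Because every kernel has unit area, the integral of the smoothed
stream equals the sum of the transactions regardless of the span
chosen for each one.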

To determine a proper scale it's probably best to sync to income,
i.e. a quadratic spline spanning 2 months corresponds to the average
monthly payment.

Maybe negative and positive amounts should also be covered
differently:  income lags but expense trails.

What do I need
- CSV -> Haskell
- Haskell -> Gnuplot


Entry: Mean/variance estimator
Date: Mon Jul 23 11:33:11 EDT 2012

For a large number of samples this is just:
m = 1/N \sum x_i
v = 1/N \sum (x_i - m)^2

Samples are 10 bit, so in a 32-bit accumulator there is room for
32 - (10+10) = 12 bits, i.e. 2^12 = 4096 samples, though this of
course depends on the variance.

What I want is some kind of sliding window.  The simplest approach is
to reset the accumulators.  Doing it incrementally requires a delay
line and memory to store the samples.

So it seems the tradeoff here is bit depth: we get more effective bits
if we represent the variance in terms of the current mean, and perform
an adjust on every sample.  This makes the updates more costly so
probably not worth it.
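
The two options can be compared in a Python sketch.  The incremental
version below is Welford's algorithm, swapped in here as one known way
of keeping the variance relative to the current mean; the sample
values are made up, and floating point stands in for the fixed-point
accumulators.

```python
def running_mean_var(samples):
    """Welford-style update: variance kept relative to the current mean."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in samples:
        n += 1
        delta = x - mean
        mean += delta / n             # adjust the mean on every sample
        m2 += delta * (x - mean)      # uses the old and the updated mean
    return mean, m2 / n

def sums_mean_var(samples):
    """Plain accumulators: sum and sum of squares, one add and one multiply each."""
    n = len(samples)
    s1 = sum(samples)
    s2 = sum(x * x for x in samples)
    mean = s1 / n
    return mean, s2 / n - mean * mean

data = [512, 515, 509, 530, 498, 511, 520, 505]   # made-up 10-bit samples
print(running_mean_var(data))
print(sums_mean_var(data))
```

The incremental version needs a division per sample, which matches the
remark that the updates become more costly.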


Entry: S/D notes
Date: Sat Jul 28 20:38:19 EDT 2012

- From [1], not all possible code words are generated in the
  constant-input case.

  Can this be turned around and translated to an optimization problem
  where a signal waveform is generated such that noise is maximally
  decorrelated from signal, possibly storing these in tables?

- Could subsampling/supersampling work, i.e. pitching?  What's the
  effect on noise?  If the samples are not short-time correlated, this
  could possibly work.  ( It seems that a lot of applications arise
  when the correlations can be controlled.. )

- Can an incremental decorrelator work?  I.e. starting from a linear
  approximation, is it possible to "push" parts of the noise to higher
  frequencies?

- Does it make sense to use 2-dimensional representations,
  i.e. complex signals?  This gives more "room" to push the noise to?

- Decorrelation: "more" decorrelation corresponds to low-pass
  filtering the signal, and probably also moving the noise into the
  lower bands.  Investigate that latter part.

- Can the decorrelator be used with fractional samples?  Actually, it
  is a sigma-delta converter with non-deterministic outputs.


Paper notes from trip:

- Even when there is no correlation between 2 noise signals and their
  respective signals, XOR (multiplication) always introduces noise
  spectrum shifts based on the highest frequencies.  However, in
  practice this might be controlled (i.e. reso lowpass), and not even
  a problem (human hearing drops to 0, doesn't roll off gently).

- Sound synthesis, why the trouble?  There should be no aliasing due
  to nonlinearities + AND gives a "free" nonlinearity.

[1] isbn://0792393090


Entry: The non-deterministic S/D modulator
Date: Sat Jul 28 21:53:32 EDT 2012

The decorrelator mentioned before can be generalized by taking a
fractional bit input, and producing a random output bit based on the
current accumulator.  This is the basic building block for the whole
S/D synth approach.

How to make this well-defined?  I.e. for a deterministic encoder a bit
is generated if a threshold is reached.  This needs to be reformulated.

E.g. if state is 1.00 and input is 1.00, the probability of a 1 bit
should be 100%, which then leaves the state at 50%.

If state is 0 and input is 0, the probability of a 1 bit should be 0%

If state is seeded by 0, and input is 0.50 on every step, by symmetry
the probability of the output should always be 50%.

What about this:
- If state > 1  -> deterministic 1
- If state < 0  -> deterministic 0
- Else -> value = uniform probability for 1

This seems ad-hoc, however the hard limits are really caused by the
hard limits of saturated 0/1 signals.
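
A simulation sketch of the rule above in Python.  Assumptions: the
input is added to the state before the decision, the emitted bit is
subtracted afterwards, and inputs stay within [0, 1].

```python
import random

def random_sd(inputs, rng, state=0.0):
    """State accumulates input minus output; inside [0, 1] it acts as P(output = 1)."""
    out = []
    for x in inputs:
        state += x
        if state > 1:
            bit = 1                                  # deterministic 1
        elif state < 0:
            bit = 0                                  # deterministic 0
        else:
            bit = 1 if rng.random() < state else 0   # state is the probability of a 1
        state -= bit
        out.append(bit)
    return out, state

rng = random.Random(7)
n = 10_000
bits, final = random_sd([0.3] * n, rng)
print(sum(bits) / n, final)    # density close to 0.3, state stays within [-1, 1]
```

Since the state only ever holds the difference between what came in
and what went out, the output density tracks the input to within one
bit over any run length.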

It seems that in the non-deterministic case, the state needs to be able
to "buffer" an "unlikely 1".  Is there a way to make this buffer
larger to spread it out more?  I would guess there is going to be some
tradeoff in the probability distribution.

E.g. state is 0.01 and input is constant 0.01 which means there is a
1/50 chance the next pulse is a 1.  If it is, then the next 99 pulses
are going to be guaranteed 0.  This seems too rigid.  It is
"correlation due to saturation".  Is it possible to spread out pulses
over a longer period of time?

What is the difference with the decorrelator?  It has an n-bit bucket
(represented as a log(n) bit counter) and generates bits based on this
counter.

Big difference is that a separate decorrelator has the
non-deterministic output outside of the S/D feedback loop...

It's probably time to let this sit a bit and/or do some simulations.


Entry: Combine S/D mod + decorrelator
Date: Sun Jul 29 17:28:29 EDT 2012

Issue: can SDM and decorrelator be combined in one or is it necessary
to place them in series?  What's the difference?

Combined, there seems to be an arbitrary element about the "spread" of
the energy mismatch, i.e. the tradeoff between decorrelation and
exactness.  Also, how does the mismatch affect noise shape?

Integrating first-order S/D:

  s'  = s  + in
  s'' = s' - out
  out = f s'

Here 'f' is a random generator using its stored energy as a
probability to generate a "1" bit.  Probability should approach 0 when
bucket approaches 1 from above, and falls off based on the depth of
the bucket.

Main question seems to be: is the bucket bounded?  I.e. does the
chance for 0 or 1 approach 100% at some point.

Seems best to make it symmetric: for generating a "1", s=0 is the
p=50% point, s=-1 is p=0% and +1 is p=100%.  Then spreading the bucket
is to spread this out more.


Entry: Switchcap
Date: Mon Jul 30 19:53:05 EDT 2012

Paraphrasing Dan Piponi, sometimes you can learn more from trying to
decipher a single sentence in an advanced paper that's way over your
head, than from going through introductory or course material that is
intended to teach you things gradually.  I've seen this happen so many
times: it's wrong and confusing until suddenly it's right and clear.
Bing!

For S/D I was reading this book[1] and it made me realize that a
switchcap filter is discrete in time but continuous in amplitude, while
any other digital signal is both discrete in time and amplitude.

However, the Z-transform used in digital signal processing is actually
defined in terms of real number amplitudes which makes a switchcap
filter a better representative of a discrete filter than an ordinary
finite-precision digital filter!

One of those things that's obvious and fairly basic in retrospect but
hiding in a corner until it was made explicit..

[1] isbn://0792393090
[2] http://en.wikipedia.org/wiki/Switched_capacitor


Entry: Random book?
Date: Mon Jul 30 20:10:54 EDT 2012

Bookshelf diving.  Right rack, starting at top from top to bottom.

Book 6, page 76, Knuth TAOCP Vol1, Basic Concepts, Harmonic numbers,
the Riemann zeta function.

Book 13, page 105, Abelson&Sussman SICP, Example: Symbolic
Differentiation,

Book 18, page 95, Conway, A Course in Functional Analysis, The
principle of uniform boundedness.


Entry: Amplitude vs. frequency distribution
Date: Tue Aug  7 19:26:11 EDT 2012

Probability distribution of amplitude has little to do with frequency
distribution, which is related to conditional probabilities of
different moments in time.

( Obvious, but apparently my intuition got off again.. )


Entry: Smoothing threshold detections
Date: Wed Aug  8 10:48:22 EDT 2012

Input: threshold detections of maximum signal excursion, i.e. for a
measurement sequence x[n], the input is the following reduction:

       in = max_{n=1:N} |x[0] - x[n]| > thresh


Entry: Expected value of maximum
Date: Thu Aug  9 17:58:09 EDT 2012

How to compute the expected value of the maximum of a normally
distributed random variable, given the mean, variance and the number
of samples taken?
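
One way to get a number is direct numerical integration.  A Python
sketch, assuming the samples are independent: the maximum of n iid
samples has density n * pdf(z) * cdf(z)^(n-1), and its mean is
integrated with a midpoint rule over +/-10 standard deviations (range
and step count are arbitrary choices).

```python
import math

def expected_max(mean, std, n, lo=-10.0, hi=10.0, steps=20_000):
    """E[max of n iid normal samples] by midpoint integration over z."""
    dz = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        z = lo + (k + 0.5) * dz
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
        total += z * n * pdf * cdf ** (n - 1)    # density of the maximum times z
    return mean + std * total * dz

print(expected_max(0.0, 1.0, 1))     # about 0: a single sample
print(expected_max(0.0, 1.0, 2))     # about 0.5642 = 1/sqrt(pi)
print(expected_max(0.0, 1.0, 100))   # grows slowly with n
```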


Entry: New method for solving linear equations over finite fields F_p, p prime
Date: Fri Aug 10 09:50:40 EDT 2012

Found this[1] on HN this morning.  Maybe look into that a bit more, as
I've recently found that part of my Sigma/Delta synth is going to be
an algorithm that needs noise.  I believe Knuth has some topics on
random algorithms in TAOCP.

[1] http://rjlipton.wordpress.com/2012/08/09/a-new-way-to-solve-linear-equations


Entry: Phases of 60 seconds
Date: Tue Aug 14 09:45:18 EDT 2012

What's the simplest way to divide 60 seconds into a number of phases
without performing division, i.e. only shifts?

To do it exactly it's only possible to go up to a period of 4 seconds
(15 phases per minute), by masking out everything except the 2 lowest
bits, and comparing these to 0.

Next up is 8 which will do: 0, 8, 16, 24, 32, 40, 48, 56, 60, ... so
the last interval is 4 seconds instead of 8.  Might not be so
desirable.
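
The two cases can be enumerated with a mask test.  A Python sketch;
`phase_starts` and `intervals` are hypothetical helper names.

```python
def phase_starts(shift):
    """Seconds in 0..59 whose low `shift` bits are all zero (no division needed)."""
    mask = (1 << shift) - 1
    return [s for s in range(60) if (s & mask) == 0]

def intervals(starts):
    """Gaps between successive phase starts, wrapping around at 60 seconds."""
    return [(b - a) % 60 or 60 for a, b in zip(starts, starts[1:] + starts[:1])]

print(intervals(phase_starts(2)))   # fifteen gaps of 4 s
print(intervals(phase_starts(3)))   # seven gaps of 8 s, then one of 4 s
```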


Entry: Meaning
Date: Sat Aug 25 08:52:06 EDT 2012

Reply to Antti's article.


- Ad-hoc meaning (Forth / Machine language / Lisp).

  The appeal here is definitely conceptual simplicity and expressive
  power.

  The big idea I think is feedback: should interpretation be strictly
  layered (ML / Haskell style, or even to some extent the Racket
  Scheme staged macros), or should it feed back onto itself: no
  distinction between data and code.

  This feedback loop seems to be essential to keeping things
  conceptually simple at the base level.


- Types.  While typed languages are in some sense less elegant due to
  their ad-hoc structure, these systems can be useful for generating a
  different kind of expressive power.

  E.g. dispatching based on function return type for Haskell style
  type classes.  This kind of "precognition" is of a different nature
  than purely local meaning available in feedback systems.

  However, heterogeneous type/meta systems impose a lot of
  problems. Most importantly maybe: simplicity of the core system is
  lost (e.g. again Haskell: type system is very complex compared to
  the lambda calculus it "protects").

  Basically, compared to the conceptual beauty of a feedback system
  (same language is base and meta), a layered type system has a
  different kind of freedom: meta level and language level do not need
  to correspond, and thus can vary widely.

  ( Note that in Haskell there are essentially 3 levels: 1. Base
  lambda calculus, 2. the type system visible by the programmer, and
  3. the implementation of all the different type system extensions
  which map abstract logic stuff to more simple constructs.  To be
  fair, 3. needs to be counted; it seems to be necessary to keep
  piling extension onto extension to work around the limitations of
  the type system.  This has to be done as part of the compiler, not
  as part of a program.  It is not very modular!  And this work cannot
  be done by mortals either... )


- On the practical side, after using both approaches fairly
  intensively I'm still not sure which one is better, but it's obvious
  they are different and lead to a different way of thinking.

  One advantage of strong type systems and impossibility of "language
  feedback" is that they seem to be easier to read.  

  The ad-hoc nature of the "implicit meaning" that is often present
  only at the time the programmer writes an ad-hoc feedback structure
  is very hard to reconstruct from the code, and is often a "short
  term memory" organization in the programmer's head.  ( Of course
  here "programmer" is me in my own experience. )

  From my experience, writing very "semantically stacked" code in a
  dynamic/feedback language is a sure way of pissing off people that
  need to maintain your code after you're done with it!


  As for writing in Haskell vs. writing in Scheme.  In Haskell I get a
  good feeling of "wow, finally" being able to express the structure
  of what I want to do.  This often leads to a better understanding,
  but being required to do so can be a pain in the ass.  In Scheme
  things go a lot faster but often I lose track because of some
  hidden inconsistencies or differences between what I think it does
  and what it is actually doing.  The latter almost never happens in
  Haskell because its meaning is more rigid.


Entry: The Joy of Cats
Date: Sun Sep  2 17:42:06 CEST 2012

[1] http://katmat.math.uni-bremen.de/acc/acc.pdf


Entry: Stochastic vs. Deterministic
Date: Sun Sep  2 17:42:47 CEST 2012

Goal:

Find a good mathematical model for representing real-valued, discrete
time "low pass" signal s[n] encoded in fixed sample rate binary signal
b[n].

This consists mainly of finding a morphism between operations on b[n]
and real-valued continuous time signals, i.e. map the signals + a
collection of meaningful operations.


Status:

The main heuristic is that "proper decorrelation" is a good idea to
build an interesting morphism.

  b[n] = s[n] + e[n]

b[n] and s[n] are related through some kind of linear smoothing /
low-pass filtering operation.

An important property is that s[n] is recoverable from b[n] such that

  R (b[n]) = s[n] + e'[n]  with |e'| << |e|

meaning that a recovery function R takes the binary sequence b and
produces an approximation to s that is much better than b itself.
Ideally |e'| = 0.

Currently it is assumed that R is a linear low pass filter, which
means that e is a high pass signal: all the noise is in the higher
frequency bands.


Representing b[n] \in {0,1} allows the use of the real multiplication
operator to represent the boolean function AND.  To make
multiplication of two binary signals b1[n] and b2[n] meaningful, the
correlation of their noise components should be meaningful:

     b1 = s1 + e1
     b2 = s2 + e2
     
     b1 b2 = s1 s2  +  s1 e2 + s2 e1 + e1 e2
     b12   = s12    +  e12

In some sense, the resulting signal should have the same properties,
such that R(b1 b2) ~= s1 s2

Sticking to the linear filter reconstruction property, we need e12 to
be high pass so it can be filtered out by R.  This gives 3 conditions:

     R(s1 e2) ~= 0
     R(s2 e1) ~= 0
     R(e1 e2) ~= 0

of course still satisfying the reconstruction properties for the
signals themselves, next to the reconstruction of the product above:

     R(e1)    ~= 0
     R(e2)    ~= 0

If R is linear, these equations are weighted infinite sums.

The question is now: given arbitrary s1 and s2, is it possible to
construct b1 = s1 + e1 and b2 = s2 + e2 such that the constraints are
met?

My guess is that it is not possible to do this exactly and thus this
needs an error measure to quantify the approximation "~=".


Entry: Where is the low frequency noise in S/D approximation?
Date: Tue Sep  4 14:07:56 CEST 2012

( Context: understanding properties of the binary sequence produced by
           a Sigma/Delta modulator. )

Supposedly (following from linear approximation and white quantisation
noise assumptions) the frequency envelope of the S/D approximation
noise is F^{-1}, where F is the loop filter, e.g. lowpass one pole or
integrator.

Then I seem to have a paradox.  Suppose we're approximating a constant
signal s[n] = 2/3 with the periodic sequence b = 1,1,0,...  This gives
a periodic approximation error e = 1/3, 1/3, -2/3, ...  (following b =
s + e)

This b is the output of a S/D modulator with F = 1/(z-1), a pure
(summing) integrator with pole at DC: w_0 = 0, e^{j w_0} = z_0 = 1.

Where is the low frequency component of e?  The signal is periodic
with period 3, meaning all frequency components are multiples of fs/3
which is high, meaning there are no low frequency components in e.

Is this because linearized analysis is too crude an approximation?

Is this a proper S/D signal?  It is: as noted above, it is the output
of a S/D modulator whose loop filter is the pure (summing) integrator
F = 1/(z-1).


Is it possible to construct an error signal that does have a
low-frequency component?  I'm guessing that this would be the case for
irrational constant signals.

Just as in the analysis above, if s[n] is constant, the frequency
content of b and e is the same apart from DC, so it is enough to limit
the analysis to b.

Picking F = (z-1)^-1 allows a similar simple accumulation model to
generate the binary sequence.  As irrational number, pick 1/sqrt(2).

What is the sequence?

This can be represented by a graph of y = D int(x/sqrt(2)).  Each
impulse is a 1.

This sequence will be aperiodic, which means that it will have
non-vanishing arbitrarily low frequency content: for any period p, no
matter how large, we can never have

   p/sqrt(2) - \sum_{k=1}^p b[k] = 0

since the sum is an integer and p/sqrt(2) is irrational.
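
The sequence asked for above can be sketched directly from the
y = D int(x/sqrt(2)) description.  This is a minimal sketch, assuming
the accumulator starts at 0; the names beatty and runningMean are
hypothetical and only serve this illustration.

```haskell
-- Output of a first-order accumulator modulator for a constant input c
-- in (0,1): b[n] = floor(n*c) - floor((n-1)*c) for n = 1, 2, ...
-- The accumulated total after n steps is n*c; a 1 is emitted each time
-- its integer part increases.
beatty :: Double -> [Int]
beatty c = zipWith (-) (tail ks) ks
  where ks = [floor (fromIntegral n * c) | n <- [0 :: Int ..]]

-- Mean of the first p output bits, to compare against c.
runningMean :: Double -> Int -> Double
runningMean c p = fromIntegral (sum (take p (beatty c))) / fromIntegral p
```

For c = 1/sqrt(2), runningMean c p approaches c as p grows but is a
rational k/p for every p, so it never equals c exactly.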


To construct an approximation to a constant signal that has some
low frequency content in the error signal, we simply need to pick the
period large enough.  If s[n] = q/p with q<p, then the output will be
periodic with period p.

So can it be seen that at least the amplitude of low frequency content
is proportional to 1/p, following the conclusion of the linear
analysis?

An example:
 3/11  0  3  6  9  1  4  7 10  2  5  8  0
-8        0  0  0  1  0  0  0  1  0  0  1

The DC component obviously corresponds to 3/11.  The lowest harmonic,
at frequency fs/11, is (with time indices counted from 0):

          e^{3x} + e^{7x} + e^{10x}   with x = j 2pi/11
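
The 3/11 example can be checked numerically.  The following is a
sketch, not a general spectrum tool: sdPeriod and harmonic are
hypothetical names, the accumulator is assumed to start at 0, and
time indices are counted from 0.

```haskell
import Data.Complex (Complex, cis, magnitude)

-- One period of the accumulator output for the constant q/p.
sdPeriod :: Int -> Int -> [Int]
sdPeriod q p = go 0 p
  where
    go _ 0 = []
    go s n = let s' = s + q
                 b  = if s' >= p then 1 else 0
             in b : go (s' - b * p) (n - 1)

-- k-th DFT coefficient of one period (positive exponent convention,
-- as in the expression above).
harmonic :: Int -> [Int] -> Complex Double
harmonic k bs =
  sum [ fromIntegral b * cis (2 * pi * fromIntegral (k * n) / fromIntegral p)
      | (n, b) <- zip [0 ..] bs ]
  where p = length bs
```

With bs = sdPeriod 3 11, magnitude (harmonic 0 bs) is the number of
ones in the period, and magnitude (harmonic 1 bs) is small compared
to that but not zero.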


Conclusion: for constant approximation there is going to be
low-frequency content apart from some exceptions because:

- (real)  most real numbers are irrational
- (n-bit) most integers are "fairly" coprime with 2^n

( "fairly" meaning they have a factor 2^m with m << n, the number of bits in the word )


Entry: Ergodic theory
Date: Tue Sep  4 14:48:42 CEST 2012

Line of thought in previous post was probably primed by some recent
mentioning in correspondence with Antti e.a. of Ergodic Theory[1].

[1] http://en.wikipedia.org/wiki/Ergodic_theory


Entry: Approximation as low-denominator rationals
Date: Tue Sep  4 14:55:56 CEST 2012

I wonder if it makes sense to represent signals as low-denominator
rational values to avoid making low-frequency correlations?

Probably not, since necessary time-variance probably introduces such
correlations.  Might make nice pictures though..

EDIT: Maybe it is useful simply because there is a closed form
expression of the bit sequence for piecewise approximation of
low-denominator rational numbers.


Entry: Decorrelation through addition of state noise
Date: Tue Sep  4 15:19:13 CEST 2012

Shifting the sequence forward/backward can be done by
adding/subtracting to the integrator state.  Maybe this is enough for
decorrelation?


Entry: Diagrams of x/11
Date: Tue Sep  4 17:51:46 CEST 2012

   0 1 2 3 4 5 6 7 8 9 T 0 1 2 3 4 5 6 7 8 9 T
 0 . . . . . . . . . . . . . . . . . . . . . .
 1 . . . . . . . . . . X . . . . . . . . . . X
 2 . . . . . X . . . . X . . . . . X . . . . X
 3 . . . X . . . X . . X . . . X . . . X . . X
 4 . . X . . X . . X . X . . X . . X . . X . X
 5 . . X . X . X . X . X . . X . X . X . X . X
 6 . X . X . X . X . X X . X . X . X . X . X X
 7 . X . X X . X X . X X . X . X X . X X . X X
 8 . X X . X X X . X X X . X X . X X X . X X X
 9 . X X X X . X X X X X . X X X X . X X X X X 
10 . X X X X X X X X X X . X X X X X X X X X X
11 X X X X X X X X X X X X X X X X X X X X X X


 4/11 0  4  8  1  5  9  2  6 10  3  7  0 
-7       0  0  1  0  0  1  0  0  1  0  1

 5/11 0  5 10  4  9  3  8  2  7  1  6  0
-6       0  0  1  0  1  0  1  0  1  0  1


Two symmetries, one obvious: the 0-1 duality (top/bottom) and one
"almost" symmetry around 5: mirrors except 0/T, which I can't explain.
It has to do with rounding, i.e. the last column T has all the
"fractional bits".

This might be an artifact of 11 = 12 - 1 with 12 being highly
composite: 2*2*3

A Haskell program for visualization:

-- Output sequences of Sigma/Delta modulator (1st order non-leaky
-- accumulator) approximating constant rational.

bit False = 0    
bit True  = 1

-- Bit sequence
sd q p = f 0 where
  f s = b : f s'' where
    s' = s + q
    b  = bit $ s' >= p
    s'' = s' - b * p
    

-- N periods
sdn n q p = take (fromInteger $ n * p) $ sd q p    

-- Print
ps = concat . (map f) where
  f 0 = "."
  f _ = "X"
  
-- Build a spectrum: wave forms for values from 0/p to p/p  
spec n p = concat $ map (++ "\n") lines where
  lines = map (\q -> ps $ sdn n q p) [0..p]
s n p = putStr $ spec n p


Example: 2 periods of all x/37 waveforms
*Main> s 2 37
..........................................................................
....................................X....................................X
..................X.................X..................X.................X
............X...........X...........X............X...........X...........X
.........X........X........X........X.........X........X........X........X
.......X......X.......X......X......X.......X......X.......X......X......X
......X.....X.....X.....X.....X.....X......X.....X.....X.....X.....X.....X
.....X....X....X.....X....X....X....X.....X....X....X.....X....X....X....X
....X....X...X....X....X...X....X...X....X....X...X....X....X...X....X...X
....X...X...X...X...X...X...X...X...X....X...X...X...X...X...X...X...X...X
...X...X...X..X...X...X..X...X...X..X...X...X...X..X...X...X..X...X...X..X
...X..X...X..X..X...X..X..X...X..X..X...X..X...X..X..X...X..X..X...X..X..X
...X..X..X..X..X..X..X..X..X..X..X..X...X..X..X..X..X..X..X..X..X..X..X..X
..X..X..X..X..X..X.X..X..X..X..X..X.X..X..X..X..X..X..X.X..X..X..X..X..X.X
..X..X.X..X..X.X..X..X.X..X..X.X..X.X..X..X.X..X..X.X..X..X.X..X..X.X..X.X
..X.X..X.X..X.X..X.X..X.X..X.X..X.X.X..X.X..X.X..X.X..X.X..X.X..X.X..X.X.X
..X.X.X..X.X.X..X.X.X..X.X.X..X.X.X.X..X.X.X..X.X.X..X.X.X..X.X.X..X.X.X.X
..X.X.X.X.X..X.X.X.X.X.X..X.X.X.X.X.X..X.X.X.X.X..X.X.X.X.X.X..X.X.X.X.X.X
..X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X..X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X
.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.XX.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.X.XX
.X.X.X.X.X.XX.X.X.X.X.X.XX.X.X.X.X.XX.X.X.X.X.X.XX.X.X.X.X.X.XX.X.X.X.X.XX
.X.X.X.XX.X.X.XX.X.X.XX.X.X.XX.X.X.XX.X.X.X.XX.X.X.XX.X.X.XX.X.X.XX.X.X.XX
.X.X.XX.X.XX.X.XX.X.XX.X.XX.X.XX.X.XX.X.X.XX.X.XX.X.XX.X.XX.X.XX.X.XX.X.XX
.X.XX.X.XX.XX.X.XX.XX.X.XX.XX.X.XX.XX.X.XX.X.XX.XX.X.XX.XX.X.XX.XX.X.XX.XX
.X.XX.XX.XX.XX.XX.X.XX.XX.XX.XX.XX.XX.X.XX.XX.XX.XX.XX.X.XX.XX.XX.XX.XX.XX
.XX.XX.XX.XX.XX.XX.XX.XX.XX.XX.XX.XXX.XX.XX.XX.XX.XX.XX.XX.XX.XX.XX.XX.XXX
.XX.XX.XXX.XX.XX.XXX.XX.XX.XXX.XX.XXX.XX.XX.XXX.XX.XX.XXX.XX.XX.XXX.XX.XXX
.XX.XXX.XXX.XX.XXX.XXX.XX.XXX.XXX.XXX.XX.XXX.XXX.XX.XXX.XXX.XX.XXX.XXX.XXX
.XXX.XXX.XXX.XXX.XXX.XXX.XXX.XXX.XXXX.XXX.XXX.XXX.XXX.XXX.XXX.XXX.XXX.XXXX
.XXX.XXXX.XXX.XXXX.XXXX.XXX.XXXX.XXXX.XXX.XXXX.XXX.XXXX.XXXX.XXX.XXXX.XXXX
.XXXX.XXXX.XXXX.XXXXX.XXXX.XXXX.XXXXX.XXXX.XXXX.XXXX.XXXXX.XXXX.XXXX.XXXXX
.XXXXX.XXXXX.XXXXX.XXXXX.XXXXX.XXXXXX.XXXXX.XXXXX.XXXXX.XXXXX.XXXXX.XXXXXX
.XXXXXX.XXXXXX.XXXXXXX.XXXXXX.XXXXXXX.XXXXXX.XXXXXX.XXXXXXX.XXXXXX.XXXXXXX
.XXXXXXXX.XXXXXXXX.XXXXXXXX.XXXXXXXXX.XXXXXXXX.XXXXXXXX.XXXXXXXX.XXXXXXXXX
.XXXXXXXXXXX.XXXXXXXXXXX.XXXXXXXXXXXX.XXXXXXXXXXX.XXXXXXXXXXX.XXXXXXXXXXXX
.XXXXXXXXXXXXXXXXX.XXXXXXXXXXXXXXXXXX.XXXXXXXXXXXXXXXXX.XXXXXXXXXXXXXXXXXX
.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX


Entry: S/D conclusions
Date: Tue Sep  4 22:13:26 CEST 2012

* From wave form pictures: nothing looks really special apart from
  low-denominator rational numbers.

* From analysis: it's clear why for low-denominator rational numbers
  there is no low-frequency content, and why there is some LF content
  for high-denominator rational numbers, but not (yet) clear that this
  follows the frequency slope from linear analysis (== inverse of
  integrator == high pass noise).

* The non-leaky 1st order accumulator is really a value -> frequency
  converter.  It has non-vanishing autocorrelation.  This suggests
  another approach to decorrelation: integrator state
  increments/decrements with average 0.


Entry: Plotting the approximation error
Date: Tue Sep  4 22:55:11 CEST 2012

What to plot?
Start with bit depth, e.g. d=16 bits.
This gives p=2^d

For each wave form associated to q:1->2^d-1, plot the FFT on a log/log plot.

This gives a density plot that gives an idea of the distribution of
the error assuming uniform input amplitude distribution.

In first approximation maybe a density plot isn't necessary.  Just the
average over all wave forms might be good enough?  Though it would be
nice to also have variance and shape of distribution.


Entry: How to make a good density plot?
Date: Tue Sep  4 23:05:00 CEST 2012

I know of only two ways:

- Histogram: tuning is the interval resolution.  Trade off: more
  space resolution gives less density resolution.

- Kernel method: Each discrete event is spread out in space.  Basic
  "tuning" parameter is the size of the kernel.  Trade off: more space
  resolution (smaller kernel size) gives a more bumpy density curve.

I can think of some mods:

- Adaptive kernel?  (E.g. for financial data?)  Decrease kernel size
  in high-density regions.

It would be good to figure out a good way to do this in Haskell +
Gnuplot.  The preprocessing (events -> curve) probably needs to happen
outside of Gnuplot.


Entry: Closed form expression for A1 modulator output + Fourier spectrum
Date: Wed Sep  5 14:06:19 CEST 2012

An indication of the "noise" component in a S/D signal from a
constant, q/p rational-valued signal, using a non-leaky 1st order
integrator / accumulator (an "A1 modulator"), can be computed as the
lowest harmonic of the Fourier expansion of the p-periodic binary
output signal b[k].

  \sum_{k=1}^p b[k] w^k  with w=e^{i 2 \pi / p}

In a single period b[1] to b[p] we have q nonzero values.  This
follows from the average q/p and b[i] \in {0,1}.  The indices of these
non-zero sequence elements are given by

  I = { ceil( k p / q ) | 1 <= k <= q }

The state of an A1 modulator can be represented as the rotation angle
of a regular polygon with p sides, where the input (the constant
number q/p) represents the constant rotation to be performed at each
time step.  The output of the modulator is 1 at each time step that a
new full revolution has been completed.

To find the indices at which these revolutions complete, first
approximate them fractionally, solving for n in

            q/p * n_f = k   or   n_f = k * p/q,

reasoning that a full revolution is completed for each integer k.

However, n cannot be fractional so if n_f is not an integer, the full
revolution will be completed at the nearest later time index obtained
by rounding n_f to the nearest larger integer n = ceil(n_f). This
corresponds to the definition of I.

( Note that because q <= p making p/q >= 1, this equation never yields
  the same index more than once. )

In an example, q/p = 3/11 we have   n = k*11/3

k                   1        2        3
n_f = k*11/3     3.67     7.33       11
n = ceil(n_f)       4        8       11

The output bit sequence is: 

1  2  3  4  5  6  7  8  9 10 11
0  0  0  1  0  0  0  1  0  0  1

and the amplitude of the lowest harmonic in the Fourier analysis is

   w^4 + w^8 + w^11 != 0   w = e^{i 2 \pi / 11}

which is almost zero (easy to see when drawing the 3 vectors on paper)
but not quite.  It would be zero if fractional indices were allowed,
making the 3 vectors the sides of a regular triangle.

   w^{1 * 11/3} + w^{2 * 11/3} + w^{3 * 11/3} = 0


Conclusion: The existence of a closed form expression for the binary
sequence (I) avoids having to perform a simulation to generate the bit
sequence.


-- Closed form expression for indices of non-zero sequence elements.
nz' q p = map n [1..q] where
  n k = ceiling $ k * p / q

-- Type conversion
nz q p = nz' (i q) (i p) where i = fromIntegral
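
As a sanity check, the closed form can be compared against a direct
simulation of the accumulator.  This is a sketch under the assumption
that the accumulator starts at 0 and indices are 1-based; simIndices,
closedIndices and agreeUpTo are hypothetical names, and integer
arithmetic replaces the floating point ceiling used above.

```haskell
-- Indices (1-based) of the ones in one period, found by simulation.
simIndices :: Integer -> Integer -> [Integer]
simIndices q p = [ n | (n, 1) <- zip [1 .. p] (go 0) ]
  where
    go s = let s' = s + q
               b  = if s' >= p then 1 else 0
           in b : go (s' - b * p)

-- The same indices from the closed form ceil(k p / q), using
-- ceil(a / b) = (a + b - 1) `div` b for positive integers.
closedIndices :: Integer -> Integer -> [Integer]
closedIndices q p = [ (k * p + q - 1) `div` q | k <- [1 .. q] ]

-- True when both descriptions agree for every 1 <= q <= p.
agreeUpTo :: Integer -> Bool
agreeUpTo p = and [ simIndices q p == closedIndices q p | q <- [1 .. p] ]
```

For example, closedIndices 3 11 gives [4,8,11], matching the table in
this entry.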


Entry: Approximation of error analysis
Date: Thu Sep  6 13:17:31 CEST 2012

\sum_k [ exp(i 2 \pi ceil(k p /q) / p) - exp(i 2 \pi k / q) ]

The second term is zero when summed over k:1->q so the above gives an
exact form for the first harmonic of the approximation.

If p is large, the two angles are similar and a linear approximation
can be used:

   exp(i a) - exp (i b) = exp(i b) * (exp (i (a - b)) - 1)
                       ~= exp(i b) * i (a - b)

This then gives:

\sum_k exp(i 2 \pi k / q) * (i 2 \pi) * (1/p) * fract(k p / q)

where fract(x) = ceil(x)-x

Note this is another DFT 1st harmonic expression for period q.
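
To get a feeling for how good the linearization is, both sums can be
evaluated numerically.  A minimal sketch, assuming q >= 2 so that the
exp(i 2 pi k / q) terms sum to zero; exactH, approxH and fractKP are
hypothetical names.

```haskell
import Data.Complex (Complex ((:+)), cis, magnitude)

-- Exact first harmonic: sum over k of exp(i 2 pi ceil(k p / q) / p).
exactH :: Int -> Int -> Complex Double
exactH q p =
  sum [ cis (2 * pi * fromIntegral (ceilDiv (k * p) q) / fromIntegral p)
      | k <- [1 .. q] ]
  where ceilDiv a b = (a + b - 1) `div` b

-- fract(k p / q) = ceil(k p / q) - k p / q, computed from integers.
fractKP :: Int -> Int -> Int -> Double
fractKP q p k = fromIntegral (negate (k * p) `mod` q) / fromIntegral q

-- Linearized form: sum of exp(i 2 pi k / q) * (i 2 pi / p) * fract(k p / q).
approxH :: Int -> Int -> Complex Double
approxH q p =
  sum [ cis (2 * pi * fromIntegral k / fromIntegral q)
          * (0 :+ (2 * pi / fromIntegral p * fractKP q p k))
      | k <- [1 .. q] ]
```

Comparing magnitude (exactH q p - approxH q p) with
magnitude (approxH q p) for fixed q and growing p shows the
difference shrinking faster than the harmonic itself.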


It's not easy to see how this can be approximated by a constant.  The
1/p factor already gives the 1/p characteristic we're looking for, so
the rest should somehow be approximately constant.

The fract (k p / q) signal is also fairly regular.

It has to be periodic with period q, so there should be a
straightforward way to compute this.


With r = p mod q, this is the same as fract (k r / q).  E.g. q = 3, r
= 2 gives the sequence:  1/3, 2/3, 0.


So it seems that depending on the input constant, all cases will be
covered.  Next: closed form (approximate) formula?


Entry: Dynwav
Date: Tue Oct  2 11:09:29 EDT 2012

The good thing about dynwav (generating wave tables for wave table
playback) is that it decouples instantaneous phase/time from
instantaneous frequency.  Something which isn't possible on the
analysis side, meaning information is lost in the process.

The bad thing is that it is too flexible, so the problem to solve
really is to make generation of spectral shapes simpler, and to tie it
into the current playback pitch.

One of the effects I was thinking about is a smooth morph from
unimodal to bi-modal spectra, to make it more voice-like.

The most important technical problem is T/F allocation, i.e. size of
the DFTs based on the pitch: there is no point using a large wave
table for high-pitched sounds, as most of the higher harmonics need to
be zeroed out to prevent aliasing.

Bringing this to market, it might be good to keep the engine internal
and not make the frontend too complicated.  Just add plugins with
synthesis algos that perform the spectrum update, so the focus is the
engine.

pitch -> CHUNKER -> out

Based on pitch, the chunker will use different chunking sizes.  The
idea is to keep the table size similar to the time update.


- chorus / flanger: almost for free as multiple readouts

- chords: meaning same-spectrum chords.  these can be performed by
  sorting the pitches in ascending order, generating them in order, and
  pre-filtering the spectrum to avoid aliasing.


Entry: 2 forms of LFSRs
Date: Wed Nov  7 19:58:48 EST 2012

Fibonacci: shifts IN a bit computed as the XOR of a fixed pattern of
state bits (the taps) [1].

Galois: a direct implementation of multiplication by x to produce a
sequence 1,x,x^2, ... modulo p(x).  Take the bit that shifts out and
use it to conditionally XOR the lower coefficients of the polynomial
into the state vector (used in Staapl).

Both are easy in hardware, but the Galois form is also easy in
software due to the availability of a "conditional parallel XOR".
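
A small sketch of the Galois form, for illustration only.  The 4-bit
polynomial p(x) = x^4 + x + 1 and the names galoisStep and orbit are
choices made for this example, not taken from Staapl.

```haskell
import Data.Bits (shiftL, testBit, xor, (.&.))
import Data.Word (Word8)

-- One step of a 4-bit Galois LFSR for p(x) = x^4 + x + 1.
-- The state holds the coefficients of a polynomial of degree < 4;
-- a step multiplies it by x and reduces modulo p(x).
galoisStep :: Word8 -> Word8
galoisStep s
  | testBit s 3 = shifted `xor` 0x3   -- x^4 = x + 1 (mod p): fold the carry back in
  | otherwise   = shifted
  where shifted = (s `shiftL` 1) .&. 0xF

-- States visited from a start state until that state recurs.
orbit :: Word8 -> [Word8]
orbit s0 = s0 : takeWhile (/= s0) (tail (iterate galoisStep s0))
```

Since this p(x) is primitive, orbit 1 visits all 15 non-zero states
before returning to 1.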


[1] http://en.wikipedia.org/wiki/Linear_feedback_shift_register

Entry: Oversampled signals
Date: Sun Nov 11 20:38:34 EST 2012

A while ago I read something about the 'd'-transform which is
essentially the 'z' transform where the unit operator is not a delay
but a difference.  

                        d = z - 1

This encoding avoids the necessity for high precision coefficients,
i.e. the sensitivity of the transfer function to coefficient errors is
reduced.
Obviously, some kind of high precision is necessary to make room for
the "DC component" such that the basic difference op is a naked
subtract.

For low-frequency signals, most of the action is around z=1, which
means around d=0.  Around the origin, the d domain looks like the
analog s domain (where s is differentiation).


( I'd like to use this oversampled "virtual analog" simulation, using
the MMX instruction set.  It has only a 16 x 16 -> 32 bit
multiplication, but this would fit nicely.  The reason for using fixed
point is to be able to use cheap saturation for emulation of
power-supply clipping and simple wrap operations for implementing
phasors.  It should also make porting to fixed point DSP simpler. )

So, basically, this would work mostly as an analog computer.  Let's
rebuild some of that intuition.

1. If 'd' is a difference / 's' is a derivative operator, why do
   analog circuits have integrators?

In engineering school they told me this is because differentiation is
too noisy.  While this might be true, it's not a good thing to tell a
novice.  The real reason is that the underlying model is a
differential equation, i.e.

            d/dt (x) = A(x) + B(u)

Here x is a state vector, u is an input vector, d/dt is the derivative
operator and A(.) and B(.) are vector->vector functions of the
appropriate arity.

If you integrate the whole thing you get an explicit expression of the
x in terms of integrals of functions of x.  For an analog circuit,
that equation directly describes the voltages (or currents) at a
specific point.


2. Is it OK to pretend that d == s ?

Yes, keeping in mind that the stability requirements are slightly
different.  For the human ear however, as long as the poles are close
enough to d = 0, the difference in transfer function between the
(oversampled) digital d operator and the analog s operator is very
small.

In other words, the d plane is just a shifted z plane, but looks very
similar to the s plane for small d.
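
How small "small d" needs to be can be estimated numerically.  The
sketch below assumes the exact pole mapping z = exp(s T) with T = 1/fs,
so that d = exp(s T) - 1 is compared against s T; dPole and relError
are hypothetical names.

```haskell
import Data.Complex (Complex ((:+)), magnitude)

-- Position in the d plane of an analog pole s0 (in rad/s) at sample
-- rate fs (in Hz): d = exp(s0 T) - 1 with T = 1/fs.
dPole :: Double -> Complex Double -> Complex Double
dPole fs s0 = exp (s0 / (fs :+ 0)) - 1

-- Relative difference between d and its first-order approximation s0 T.
relError :: Double -> Complex Double -> Double
relError fs s0 = magnitude (dPole fs s0 - sT) / magnitude sT
  where sT = s0 / (fs :+ 0)
```

For a lightly damped pole near 1 kHz, e.g. s0 = (-100) :+ (2*pi*1000),
relError at 16 x 44100 Hz is roughly a factor 16 smaller than at
44100 Hz, since the leading error term is |s0 T| / 2.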


Entry: Using bilinear texture sampling for anti-aliasing
Date: Thu Jan  3 17:29:42 EST 2013

This would work for anti-aliasing lines, but not for complex shapes;
they need mipmapping.  EDIT: Indeed, works well.

The thing to keep in mind here is to keep the aspect ratio.
Mipmapping uses squares, so when you drop resolution on one axis, the
same happens for the other one causing visible gradients/blur.

Maybe for what I want to do, a relational model will be a lot easier
to manage.  Rendering is essentially off-line so there is no time
constraint.


Entry: Relational anti-aliased rendering
Date: Fri Jan  4 11:39:46 EST 2013

The "stupid slow" solution to anti-aliased rendering of basic
geometric shapes.

The idea is to render (unions / intersections) of coordinate
membership functions.  It seems simplest to not assume anything about the
primitive inequalities (borders) used for finding pixel values, but to
simply do crude oversampling.

To get to 256 greyscale levels one needs 16x16 oversampling (8x8 gives
64 samples per pixel, so only 65 levels).


Entry: Synchronizing "simultaneous" distributed updates.
Date: Fri Jan 11 17:13:19 EST 2013

Suppose 2 data stores D and D' are linked by a non-instantaneous link
which is used to keep the state in both stores the same D == D'.

However, it might happen that a data item d @ D gets updated at the
same time as its twin d' @ D', both to different values.  The finite
time delay between the updates makes this problem a practical one.

How does this usually get solved?  Supposedly there is some kind of
arbitration order imposed on the updates, either by site or in real
time.

EDIT: I found a solution I like because of its simple symmetric
implementation:

- store prev and cur values in both A and B sides.
- if (cur != prev) { send(cur); prev = cur; }

This works well for non-simultaneous updates.  E.g. if a value in A
changes, a message A->B is sent once.  After B receives, it will send
this value back to A.

This has the problem of concurrent updates causing an oscillation if
queue read and write are interleaved as follows:

A: B->set(1)
B: A->set(2)

A: receive set(2)
A: B->set(2)

B: receive set(1)
B: A->set(1)

...

To break this oscillation some kind of order needs to be introduced:
set one of the two as master. E.g. A will always propagate on receive,
but B will never.

Note that one of the two needs to bounce the value, otherwise
simultaneous value changes will not be resolved, and both sides will
end up with different states.
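
The oscillation and the master/slave fix can be reproduced in a toy
model.  This is a sketch only: each side is reduced to a current value
plus a FIFO inbox, messages are delivered one per side per round, and
all names (Side, Policy, deliver, roundAB, run) are hypothetical.

```haskell
-- Each side has a current value and a queue of pending set messages.
data Side = Side { cur :: Int, inbox :: [Int] } deriving (Eq, Show)

-- A policy decides whether a received value is sent back, given the
-- old value and the received one.
type Policy = Int -> Int -> Bool

-- Deliver one pending message to `me`; a bounced value is appended to
-- the inbox of `other`.  Returns (me', other').
deliver :: Policy -> Side -> Side -> (Side, Side)
deliver _ me@(Side _ []) other = (me, other)
deliver bounce (Side old (v : rest)) other =
  ( Side v rest
  , if bounce old v then other { inbox = inbox other ++ [v] } else other )

-- One round: A handles a message, then B handles one.
roundAB :: Policy -> Policy -> (Side, Side) -> (Side, Side)
roundAB pa pb (a, b) =
  let (a', b')   = deliver pa a b
      (b'', a'') = deliver pb b' a'
  in (a'', b'')

-- n rounds after a concurrent update A=1, B=2, with each side's set
-- message already in flight to the other.
run :: Policy -> Policy -> Int -> (Side, Side)
run pa pb n = iterate (roundAB pa pb) (Side 1 [2], Side 2 [1]) !! n

symmetric, always, never :: Policy
symmetric old new = old /= new   -- "if (cur != prev) send"
always _ _ = True                -- master: always propagate on receive
never  _ _ = False               -- slave: never propagate on receive
```

In this model run symmetric symmetric returns to its initial state
every 2 rounds, while run always never settles with both sides
holding the same value and empty inboxes.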


Entry: Declarative line rendering
Date: Sun Jan 13 09:51:52 EST 2013

In OpenGL it is quite straightforward to use a 2-step rendering
process.  As opposed to using (straight-line) OpenGL primitives, this
creates a bit more freedom.

1. Create a texture bitmap
2. Render the texture bitmap to screen

Creating texture bitmaps can be an offline process, so is not so much
resource constrained.  The approach I'm using is to create
anti-aliased bitmaps directly from a specification function 

             shape :: coord -> bool

by evaluating it over a supersampled grid and performing an
anti-aliasing averaging step.  In C such a specification function
would be:

    bool circle(float x, float y) { return x*x + y*y < 1.0; }

This works well, especially since it is not limited to straight lines
as is the direct style OpenGL drawing approach.

However, just like in the direct style, it can be a bit of a pain
because it requires specification of both "edges" of lines.  I want to
just have one equation for a line, and parameterize that with a
thickness.  How to do this?

- Abstract shapes in a function that just parameterizes thickness directly.

- Use some kind of derivative trick to find the curve's normal.
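
Both the supersampling step and the second option can be sketched in
a few lines.  This is an illustration under assumptions, not the
renderer described here: Shape is curried over two Doubles, the
subsample grid is regular, the gradient is taken by central
differences, and |f| / |grad f| is only a first-order estimate of the
distance to the curve f(x,y) = 0.  The names coverage and stroke are
hypothetical.

```haskell
-- A shape is a membership predicate over coordinates.
type Shape = Double -> Double -> Bool

circle :: Shape
circle x y = x * x + y * y < 1

-- Fraction of n x n subsample points inside the shape, for the pixel
-- with lower-left corner (x0, y0) and side length h.
coverage :: Int -> Shape -> Double -> Double -> Double -> Double
coverage n shape h x0 y0 = fromIntegral hits / fromIntegral (n * n)
  where
    offs = [ (fromIntegral i + 0.5) * h / fromIntegral n | i <- [0 .. n - 1] ]
    hits = length [ () | dx <- offs, dy <- offs, shape (x0 + dx) (y0 + dy) ]

-- Thick curve from a single implicit equation f(x,y) = 0: a point is
-- part of the stroke when |f| / |grad f| is below half the thickness.
stroke :: Double -> (Double -> Double -> Double) -> Shape
stroke thickness f x y = abs (f x y) < 0.5 * thickness * gradNorm
  where
    eps      = 1e-4
    fx       = (f (x + eps) y - f (x - eps) y) / (2 * eps)
    fy       = (f x (y + eps) - f x (y - eps)) / (2 * eps)
    gradNorm = sqrt (fx * fx + fy * fy)
```

For example, coverage 16 (stroke 0.2 (\x y -> x*x + y*y - 1)) h x0 y0
gives a grey value for a ring of approximate width 0.2 around the
unit circle, from one equation plus a thickness.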


Entry: Integrated wavetable
Date: Wed Jan 23 03:17:40 CET 2013

Following the anti-aliased sawtooth approach, this technique of course
generalizes to pre-integration / post differentiation [1][2].

I wonder if it's possible to solve the resolution problem by using
some kind of multi-scale representation, where the difference
components are stored "amplified", and differentiation is performed on
this representation.

Important conclusions: higher order is better, but sensitivity to
polynomial interpolation order increases, and sensitivity to numerical
precision increases at high frequencies.

[1] http://dafx12.york.ac.uk/papers/dafx12_submission_69.pdf
[2] md5://b5f03aed795cd98807b6af3648593c57
[3] entry://20110104-153108


Entry: Hoeldrich method for phase generation
Date: Sun Jan 20 23:40:13 CET 2013

As found in the Pd code, d_osc.c phasor~ [1][2].
The trick is to reset part of the word to a known value after update.

#define UNITBIT32 1572864.  /* 3*2^19; bit 32 has place value 1 */
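
Why 3*2^19 works can be shown with a small experiment.  This is a
sketch of the underlying bit layout only, not a port of the Pd code:
it assumes IEEE 754 doubles and uses castDoubleToWord64 from GHC.Float
instead of the union used in C; phase32 is a hypothetical name.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word32)
import GHC.Float (castDoubleToWord64)

-- 3 * 2^19: for a double of this magnitude, bit 32 of the mantissa has
-- place value 1, so the low 32 bits hold the fractional part.
unitBit32 :: Double
unitBit32 = 1572864

-- Fractional part of x as a 32-bit fixed point phase.  Valid while
-- |x| < 2^19, so that adding x does not change the exponent.
phase32 :: Double -> Word32
phase32 x = fromIntegral (castDoubleToWord64 (x + unitBit32) .&. 0xFFFFFFFF)
```

The low word wraps by itself, e.g. phase32 1.5 and phase32 0.5 give
the same value; resetting the high word after each update keeps the
sum in the range where this layout holds.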

[1] https://github.com/libpd/libpd/blob/master/pure-data/src/d_osc.c
[2] http://music.columbia.edu/pipermail/music-dsp/2004-November/061991.html


Entry: Faust, or point-free DSP
Date: Mon Jan 21 14:12:49 CET 2013

While I very much like the idea, my current opinion is that lambda is
king, and if one wants a combinator approach, it should be written on
top of lambda.  Both named/unnamed data approaches are valid, but
writing unnamed on top of named (lambda) is easier than the other way
around.

I also no longer see the point of writing a DSL that is not embedded
in an "old" tried-and-true functional language (Haskell or Racket).
Writing languages and compilers for DSL is hard enough without having
to also re-implement the low level details like symbol management and
input parsing.

That said, it might be an interesting application to implement Faust
on top of the stream language I'm writing here[2].

[1] http://en.wikipedia.org/wiki/FAUST_(programming_language)
[2] http://zwizwa.be/darcs/meta/rai


Entry: Quantum Information
Date: Sun Feb 24 23:53:24 CET 2013

An interesting talk[1] about interpretation of QM.  Slides here[6].

Merits some investigation, some strong claims made here :)
Seems that most of it comes from [7] and [12].
  
  "We" are not made of atoms, we are made of (classical) bits.
  "Correlations without correlata" -- David Mermin[2].

That last phrase is in [11] and in a 1998 publication by Mermin[12].

Then he goes a bit on a roll:

  The classical world is not real.  We are just information.  We are
  our thoughts.  We are a simulation running on a quantum computer.

Some quotes skipped in the presentation:

  ... the particle-like behavior of quantum systems is an illusion
  created by the incomplete observation of a quantum (entangled)
  system with a macroscopic number of degrees of freedom.

  ... randomness is not an essential cornerstone of quantum
  measurement but rather an illusion created by it

      -- Nicholas Cerf and Chris Adami

He also mentions Mermin's book[3]: Boojums All the Way through.
Some Arxiv papers[4] on QI.  Some of these mention QBism[5].

I could trace the first quote down to Cerf and Adami paper[7].  I did
not find many things that cite it, so not sure if this is a fringe
thing...  One is here[8].  That paper is not in Cerf's publication
list (through [9]) but it is probably similar to a bunch from that
period that are, like [10] (paywalled).


[1] http://www.youtube.com/watch?v=dEaecUuEqfc
[2] http://en.wikipedia.org/wiki/David_Mermin
[3] http://www.cambridge.org/gb/knowledge/isbn/item1139045/?site_locale=en_GB
[4] http://arxiv.org/find/quant-ph/1/au:+Mermin_N/0/1/0/all/0/1
[5] http://en.wikipedia.org/wiki/Quantum_Bayesianism
[6] http://www.slideshare.net/UnitB166ER/the-quantum-conspiracy-what-popularizers-of-quantum-mechanics-dont-want-you-to-know-by-ron-garret
[7] http://arxiv.org/abs/quant-ph/9605002
[8] http://www.sfu.ca/~pbastani/cmpt881.pdf
[9] http://quic.ulb.ac.be/members/ncer
[10] http://dl.acm.org/citation.cfm?id=300811
[11] http://chaos.swarthmore.edu/courses/phys134/papers/mohr1.pdf
[12] http://arxiv.org/abs/quant-ph/9801057


Entry: Time-variant filters
Date: Thu Feb 28 20:47:36 CET 2013

What are the effects of time-variance of parameters on linear filters?
Recent pubs are mostly about non-linearities in the signal path, but
time variance also creates effects that depend on the circuit
topology.


Entry: Exponentials
Date: Thu Mar  7 09:29:07 EST 2013

I'm running into some interesting problems with the audio synth stuff
I'm working on.  At some points in the program, (complex) exponentials
are necessary.  Some observations:

- exp / sin / cos from math.h, libm.so seem to be a bit slow.

- straight line curves in the form of complex exponentials, e.g. for
  parameter interpolation, can be generated incrementally using just
  multiply add.

- for "algorithm" parameter transformation such as initializing
  interpolating exponential curves, the accuracy is key.

- for "human" parameter transformation, i.e. things like mapping a
  control value to a frequency value, the accuracy of exp probably
  doesn't need to be so high, and approximation could be used.

- Some interesting comments on stackexchange[1].

- Maybe this can be split into 2 parts: linear evolution based on
  previous coefficient + computation of updated coefficient using some
  iteration step, based on some setpoint.


Conclusion seems to be that yes, exp and sin are expensive.


[1] http://math.stackexchange.com/questions/18445/fastest-way-to-calculate-ex-upto-arbitrary-number-of-decimals


Entry: Filter transforms
Date: Thu Mar  7 21:54:11 EST 2013

- Forward/backward diff
- Bilinear[1]
- Impulse invariance (exponential pole mapping) [2]
- Matched-Z (exponential pole/zero mapping) [3]

Forward/backward diff is simplest, but has stability issues for very
resonant poles.  Only works in the limit when poles/zeros go to s=0

Bilinear maps zeros at infinity to zeros at -1 while exponential
essentially samples the impulse response.

[1] http://en.wikipedia.org/wiki/Bilinear_transform
[2] http://en.wikipedia.org/wiki/Impulse_invariance
[3] http://en.wikipedia.org/wiki/Matched_Z-transform_method

Entry: Conservative nonlinearities - hamiltonian systems?
Date: Thu Mar  7 22:00:33 EST 2013

Nonlinearities are cool but
- they can produce aliasing
- non-conservative systems are hard to manage


Entry: Exponentials: exp(spline(t))
Date: Fri Mar  8 11:12:36 EST 2013

Goal: compute the pole path of a 2nd order IIR filter as an
exponential of a complex spline curve, approximating a parameter curve
which is itself exponential or sinusoidal.

Simplification: exact curve tracking is not so important (say 1-2% is
ok), but instabilities should not accumulate.


Two subproblems:

- Kernel routine: exp ( a0 + a1 t + a2 t^2 ) for ai complex and t
  typically 0->63.

- Spline parameter update at block boundaries, compensating for any
  inaccuracies of the kernel routine.

a1 and a2 are probably << 1, so some approximation might be used.

Questions:

* amplitude accuracy is probably more important than phase.  Use
  followed by normalization?

* use linear or multiplicative path.  problem with linear path == more
  attenuation.

* are there any useful bounds on the delta over 64 samples?  e.g. more
  than pi/2 in 64 samples is probably exceptional.  it seems delta
  is going to be small enough to allow for approximation of
  exponentials occurring in endpoint/increment operations.  -> what is
  the most natural thing to do here?  ignoring numerical errors, it's
  probably possible to get exact figures too.


Practically: limit to linear spline

This makes the update equations for the complex filter coefficient
curve something like this (for magnitude 1, c=cos(dp), s=sin(dp)):

   x+ = cx + sy
   y+ = cy - sx

For numerical stability, best ordered as increments:

   dx = (c-1)x + sy
   dy = (c-1)y - sx

here, (c + i s) comes from  (c' + i s')^{1/64}
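
A sketch of the increment form, assuming the sign convention
x+ = cx + sy, y+ = cy - sx and a block of 64 samples.  The function
name rotate_block is hypothetical, and c-1 is computed as
-2 sin^2(dp/2) to avoid cancellation, which is a choice made here, not
something taken from the notes.

```python
import math

def rotate_block(x, y, dphase, n=64):
    """Rotate (x, y) by dphase in n equal steps using increments only."""
    dp = dphase / n                      # per-sample angle: the 1/n root
    c1 = -2.0 * math.sin(dp / 2) ** 2    # c - 1, without computing 1 - cos
    s = math.sin(dp)
    for _ in range(n):
        dx = c1 * x + s * y
        dy = c1 * y - s * x
        x, y = x + dx, y + dy
    return x, y

# With this sign convention (1, 0) moves toward (cos, -sin)
x, y = rotate_block(1.0, 0.0, math.pi / 2)
print(x, y)
```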

questions:

- when c and s are taken from approximations, what does (c + i s)^64
  look like, ideally and numerically?

- can we make sure that errors do not introduce instability,
  i.e. poles should stay inside the unit circle, or, filters should be
  amplitude-limited.

- should phasor and amplitude be computed separately?

- subproblem: given current phase/amplitude, compute linear curve to
  setpoint.


Subproblem: linear curve to setpoint

- In: phasor p(0), desired phase, log amplitude

- Out: p(n) = p(0)q^n

Given the polynomial used to compute sin/cos-1/exp, this is just a
nonlinear equation, which can probably be solved using newton-raphson.

Path:
- construct equation
- construct derivative expression (AI)
- construct NR expression

Find q   s.t.   p q ^n = exp (P') = p'

Here p' is computed using some exp() approximation that is good enough
for "humanly" accurate parameter mapping.  Trouble is the presence of
q, otherwise the ^n could be absorbed by the exp.

What about expressing this differently:

q = (exp(P') / p) ^ {1/n}

q is very close to 1, so it might be better to express

q'(P',p) = (exp(P') / p) ^ {1/n} - 1

To simplify, forget about the correction that takes into account
current p vs desired p (previous setpoint): apply it later.  The focus
should really be on the evaluation of the exponential.

      q01(P1) = exp((P1-P0) / n)

Given q01, we have p1 = p0 q01^n, which can be compared to the desired
value exp(P1).  If we don't make any correction here, eventually we'll
get off track: any systematic error in q_n,n+1 is going to be
accumulative.

Correction: approximate (exp(P1)/p1)^{1/n}, and use it to scale q01.
Up to approximation, this will cancel *all* the errors accumulated.

The approximation could use the log of p1, i.e. exp((P1-log(p1))/n).
The weak point in all of this is log(p1), since p1 has the largest
deviation wrt. the base of any expansion = 1.

What about incrementally computing the estimate of exp(P1) by
successive squaring?  It could even be spread over time.


Conclusion:

- Core routine is exp((P_k+1-P_k) / n), whose argument is a small
  number -> gives fast convergence.

- There is systematic drift.  Some absolute value feedback is
  necessary to compensate.  Combine setpoint exp(P_k) with computed
  value p_k.  The exp(P_k) has larger error, but might be accurate
  enough to perform compensation.

- Stability is probably most important: approximation exp'(x) such
  that |exp'(x)| < |exp(x)|
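
The drift and its compensation can be tried out numerically.  The
sketch below is an illustration under assumptions: a 2nd order Taylor
series stands in for the cheap exp(), the set point exp(P_k) is
evaluated exactly with cmath (in the notes it would itself be an
approximation), and the correction (exp(P0)/p)^{1/n} is taken to first
order.  All names are hypothetical.

```python
import cmath

def exp_taylor(x, order=2):
    """Truncated Taylor series for exp(x); stands in for a cheap approximation."""
    term, total = 1 + 0j, 1 + 0j
    for k in range(1, order + 1):
        term *= x / k
        total += term
    return total

def run(setpoints, n=64, compensate=False):
    """Track exp(P_k) over blocks of n samples; return the final absolute error."""
    p = cmath.exp(setpoints[0])
    for P0, P1 in zip(setpoints, setpoints[1:]):
        q = exp_taylor((P1 - P0) / n)
        if compensate:
            e = cmath.exp(P0) / p - 1   # relative error accumulated so far
            q *= 1 + e / n              # first-order (1 + e)^(1/n)
        for _ in range(n):
            p *= q
    return abs(p - cmath.exp(setpoints[-1]))

setpoints = [0.5j * k for k in range(200)]   # steadily advancing phase
drift = run(setpoints)
fixed = run(setpoints, compensate=True)
print(drift, fixed)
```

Without feedback the per-block error adds up over all blocks; with the
feedback term only roughly one block's worth of error remains.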


Entry: Exponentials: approximation error should not introduce instability
Date: Fri Mar  8 13:24:40 EST 2013

approximation exp'(x) ~= exp(x)  s.t.  |exp'(x)| < |exp(x)|

One question that bugs me is that exp(x) = exp(x/2) * exp(x/2).

The smaller the argument of x gets, the faster the series converges.
At what point does this successive squaring get imprecise?
I.e. exp(x/2^n) might be more precise, but the squaring loses
information again.


Entry: Exponentials: reset at border
Date: Sat Mar  9 10:45:18 EST 2013

Compute q_k = exp((P_k+1 - P_k) / n) such that a complex exponential
can be generated as p_k+1 = p_k * q_k^n

Assuming p_k = exp(P_k) and if the approximation of exp() is good,
p_k+1 will be pretty close to exp(P_k+1).  Maybe it is good enough to
just set p_k+1 to this value?

This gives a simple algorithm where noise (click) is only dependent on
the accuracy of exp(), assuming that the numerical problem due to
successive multiplication by q is small enough.

Simple enough to test.


Entry: Exponentials: Reset + Nth root
Date: Sat Mar  9 17:18:06 EST 2013

Problem with previous reset algo is that ^exp(P1) and ^exp((P1-P0)/n)
can have strong difference in precision.

Alternative method:

Compute exp(P1) using a "human precise" approximation, i.e. something
that is good enough as real end point, then interpolate from exp(P0)
through the Nth root of ^exp(P1) / ^exp(P0).  This takes care of
accumulating errors, but requires more work.

However, this might be solved by using a Newton-Raphson step from an
initial value of ^exp((P1-P0)/N).


Entry: Exponentials: push drift compensation upward
Date: Sat Mar  9 17:26:12 EST 2013

Some experiments show that for the values involved, an order 5-10
Taylor series is going to be quite a good approximation.  The
remaining problem then is drift.

These functions are to be used in LFOs and envelopes in a sound synth,
so there might be some trade-offs to be exploited, i.e. for an
envelope, the end state is 0 increment, which should remain stable.
Maybe it's even possible to compute where we'll end up after one
cycle, to compensate for that externally?

So, given ^exp(), approximation of exp(), a set-point curve at block
boundaries (N), and an initial state, it is possible to compute the
expected deviation after one "cycle" of the control wave.  This can
probably be used externally to compensate the control wave form on a
larger time scale.


Entry: Exponentials: soft reset (lowpass)
Date: Sat Mar  9 17:48:47 EST 2013

Alternative method:

Instead of resetting, use a "soft reset", i.e. a lowpass filter that
pulls the state into the right direction.  This should be enough to
prevent drift, without introducing too much dependence on the
accuracy difference of the ^exp(P1) vs ^exp((P1-P0)/N) terms.


Entry: Exponentials: compute exp(x) and exp(iy) separately?
Date: Sat Mar  9 18:12:44 EST 2013

Maybe not such a bad idea, since both might need separate precision,
and it would free up the dependency chain.

This essentially computes exp,sin,cos.

Using just taylor expansion, stability requirements are easy to
satisfy, as for real exp the approx is strictly smaller, while for
sin/cos it is satisfied if we stop at a negative term and stay in the
first quadrant.
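
A quick numeric check of this claim, as a sketch: the sin/cos series
are cut at the x^3 and x^2 terms (both negative), the real exp series
is only checked for x >= 0, and the grid covers the first quadrant.
The helper names are made up.

```python
import math

def cos_taylor(x):
    """Truncated cosine series ending at the negative x^2 term."""
    return 1 - x * x / 2

def sin_taylor(x):
    """Truncated sine series ending at the negative x^3 term."""
    return x - x ** 3 / 6

def exp_taylor(x, order=3):
    """Truncated Taylor series for real exp(x)."""
    term = total = 1.0
    for k in range(1, order + 1):
        term *= x / k
        total += term
    return total

grid = [k * (math.pi / 2) / 1000 for k in range(1001)]   # first quadrant
# Squared norm of the approximate phasor: should never exceed 1
worst_norm = max(cos_taylor(x) ** 2 + sin_taylor(x) ** 2 for x in grid)
# For x >= 0 every dropped term is positive, so the series stays below exp
exp_below = all(exp_taylor(x) <= math.exp(x) for x in grid)
print(worst_norm, exp_below)
```

For these two truncations the squared norm works out to
1 - x^4/12 + x^6/36, which is at most 1 as long as x^2 < 3.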


Entry: Exponentials: exp(x) vs exp(x/2)^2
Date: Sat Mar  9 18:23:54 EST 2013

For approximations of exp(x) as truncated Taylor series, this might be
an interesting problem.

The main question is: halving followed by squaring is two
multiplications, which is about the same complexity as adding a term
to the series (multiply + add).  What is the difference in precision?
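
One way to get a feeling for the answer is to measure it for a
particular x and order.  This is only a sketch: x = 0.5 and the orders
3 and 4 are arbitrary choices, and the outcome depends on both.

```python
import math

def exp_taylor(x, order):
    """Truncated Taylor series for exp(x)."""
    term = total = 1.0
    for k in range(1, order + 1):
        term *= x / k
        total += term
    return total

x = 0.5
# Option 1: spend the extra work on one more series term (order 4)
direct = abs(exp_taylor(x, 4) - math.exp(x))
# Option 2: keep order 3, evaluate at x/2 and square the result
halved = abs(exp_taylor(x / 2, 3) ** 2 - math.exp(x))
print(direct, halved)
```

A rough estimate from the leading error terms suggests the
halve-and-square variant only starts to win once x exceeds about
2(k+1)/2^k for a series of order k, so for small arguments the extra
term is the better buy.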


Entry: Synth sampling
Date: Sat Mar  9 18:25:53 EST 2013

It seems that sticking with at least 2x oversampling (wrt usable audio
rate) is going to be a good thing, as it makes a lot of the math
simpler.


Entry: Normalization
Date: Sat Mar  9 18:27:23 EST 2013

Given a,b s.t. a^2+b^2 ~= 1, is there a fast way to push this closer
to the unit circle?  Direction doesn't matter so much.

( The a and b are sin and cos from a truncated taylor expansion. It is
probably possible to find out which of the two is systematically more
precise. )

Exact normalization would be scaling by r = 1 / sqrt(a^2 + b^2).

This needs to somehow be approximated.  Since r will be close to one,
the following taylor series could be used:

  1 / (1 + x)^1/2  ~= 1 - 1/2 x

 
;; Approximate normalization for a^2 + b^2 close to 1
;; Exact coef: r = 1 / (a^2 + b^2)^1/2, a -> r a,  b -> r b
;; Since this is close to 1 we can use 1st order Taylor:
;; r = 1 / (1 - x)^1/2  ~=  1 + 1/2 x   with  x = 1 - (a^2 + b^2)

;; Maybe using a -> a + (r1 * a) with r1 = 1/2 x is more stable?
;; Don't think so, since x does not have many significant bits, it's
;; probably OK to hide it in the LSBs of r.

(define normalize-2D
  (lambda (a b)
    (let* ((a2 (* a a))
           (b2 (* b b))
           (x (- 1 (+ a2 b2)))
           (r (+ 1 (* 0.5 x))))
      (values (* r a)
              (* r b)))))
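
A Python transcription of the same routine, added here only to check
the numbers (the test point with a 1% norm error is an arbitrary
choice):

```python
import math

def normalize_2d(a, b):
    """One first-order step pushing (a, b) toward the unit circle."""
    x = 1 - (a * a + b * b)
    r = 1 + 0.5 * x          # ~ 1 / sqrt(a^2 + b^2) when the norm is near 1
    return r * a, r * b

a, b = 1.01 * math.cos(0.3), 1.01 * math.sin(0.3)   # 1% too long
a1, b1 = normalize_2d(a, b)
print(math.hypot(a, b) - 1, math.hypot(a1, b1) - 1)
```

The norm error shrinks from about 1e-2 to about 1.5e-4, i.e. roughly
quadratically, and the direction is unchanged since both components
are scaled by the same r.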
         
            
Entry: Exponentials: pole interpolation: linear vs. exp(linear) ?
Date: Sat Mar  9 19:22:54 EST 2013

Since all values are going to be quite small, it might be good enough
to just use linear interpolation for the pole positions instead of
multiplicative updates.

The thing is: pole Q is not going to change much.  All control effort
will be for pole frequency which is most musically interesting.

When using linear interpolation for cos/sin also, the curvature will
start to play a role when frequency difference is large.  This
introduces modulation in the Q, and reduces the average Q.

A trade-off is probably possible:
- linear interpolation for pole Qs
- circular interpolation for frequencies + normalization at border

r += dr

p_x += (c-1) p_x + s p_y
p_y += (c-1) p_y - s p_x
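
The Q reduction caused by linear interpolation of cos/sin is easy to
quantify: the straight line between two unit phasors is a chord, so
the radius dips to cos(dtheta/2) at the midpoint.  A sketch, with a
hypothetical helper name:

```python
import cmath
import math

def min_radius_linear(theta0, theta1, n=64):
    """Smallest pole radius seen when interpolating linearly between
    exp(i theta0) and exp(i theta1) over n steps."""
    p0, p1 = cmath.exp(1j * theta0), cmath.exp(1j * theta1)
    return min(abs(p0 + (p1 - p0) * k / n) for k in range(n + 1))

dip = min_radius_linear(0.0, 0.5)
print(dip, math.cos(0.25))
```

For a frequency jump of 0.5 rad per block the radius drops by about
3%, which is a large change for a high-Q pole near the unit circle.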


Entry: Exponentials: update
Date: Tue Mar 12 17:29:26 EDT 2013

Given a good estimate of exp(x0), how to compute exp(x0+dx) ?
This is easy: exp(x0) * exp(dx).

Doing this many times gives the same geometric series as before:
exp(x0) * exp(dx)^n


Entry: Exponentials: stable update
Date: Wed Mar 13 09:27:29 EDT 2013

I'm still not convinced that it's not possible to compute a stable
update for frequency increments.

Summary of before: 

* ^exp(( P1 - P0 ) / N) = q01 is more precise than ^exp(P0) and
  ^exp(P1), such that ^exp(P0) * q01^N and ^exp(P1) do not line up.

* using only incremental updates q will cause drift.

A "directed" method would solve this, something that is targeted at a
set point, but does not have a growing approximation error.  This
would boil down to computing:

      ( exp(P1) / p0 ) ^ 1/N

This can be simplified to P1 purely imaginary and p0 unit norm.

The question seems to be: how to construct a series for this?  Using
autodiff[1] it's probably possible to automate it.  But is a straight
series good enough?  It might have non-elementary operations.

Let's try to massage it a bit first.  

     exp (P1 / N)                         = \sum_n (P1/N)^n / n!

     1 / p0^1/N = 1 / (1 - (1 - p0^ 1/N)) = \sum_n (1 - p0^1/N)^n

The resulting series is a convolution of these two.  However, the term
p0^1/N is not elementary.  This seems like a detour..

Let's try a different way, by writing the error explicitly.

It is known that p0 approximates exp(P0) or p0 = e + exp(P0).  The e
is not known, but could be computed.  Can this be used?

The error is completely contained in the difference between p0 and
exp(P0).  This brings us back to one of the previous attempts:
compensate the error accumulated up to a point by spreading out the
compensation over the next block, i.e. multiply the update
coefficient by:

          (exp(P0) / p0) ^ 1/N

This has the same form as the original problem.  It doesn't look like
it is going to be of much use..


Let's ignore the 1/N as it is probably possible to introduce it later,
and go for the expansion of log() :

         q(P1,p0)

         = exp(P1 - log (p0))

         = exp(P1 - log (1 - (1 - p0)))

         = sum_n,0 1/n! (P1 + sum_m,1 1/m (1 - p0) ^ m) ^ n


I'm thinking that it might be time to start implementing convolution
of sequences, and work numerically.

EDIT: The result of the log is imaginary, so all real parts could be
eliminated.

EDIT: It doesn't look like this will work since the inner series has a
constant term.

Conclusion: bad intuition.  The connection to "small values" is lost
because P1 and p0 appear naked in the expressions.


[1] http://zwizwa.be/darcs/meta/rai/ai-autodiff.rkt


Entry: Composition of series expansion
Date: Wed Mar 13 11:52:32 EDT 2013

Is it actually possible to write this in a finite way?  E.g.

    exp(exp(x)) = sum_n 1/n! (sum_m 1/m! x^m) ^n

what is the coefficient of 1, x, x^2, ... ?

Hmm... I'm missing an important point here.  The trouble is that this
is doubly infinite, so when doing this numerically, it is necessary to
add a termination criterion.

Only in the first coefficient though, meaning f(0), as it will show up
as an infinite sum.  All other sums will be finite.

Finding the contribution for x^n depends on all possible splits of the
factorization of n, but is finite.

Another problem: the infinite sum might not even converge!
Actually, this is mentioned here[1] on composition g(f(X))

  A point here is that this operation is only valid when f(X) has no
  constant term, so that the series for g(f(X)) converges in the
  topology of R[[X]]. In other words, each cn depends on only a finite
  number of coefficients of f(X) and g(X).
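
When the inner series has no constant term the composition is finite
per coefficient and can be computed on truncated coefficient lists.  A
sketch using exact rationals; the function names are made up, and the
test case exp(exp(x) - 1) is chosen because its coefficients times n!
are the Bell numbers.

```python
from fractions import Fraction
from math import factorial

def mul(a, b, n):
    """Product of two power series, truncated to n coefficients."""
    c = [Fraction(0)] * n
    for i, ai in enumerate(a[:n]):
        for j, bj in enumerate(b[:n - i]):
            c[i + j] += ai * bj
    return c

def compose(g, f, n):
    """Coefficients of g(f(x)) up to x^(n-1); requires f[0] == 0."""
    assert f[0] == 0, "inner series must have no constant term"
    result = [Fraction(0)] * n
    for gk in reversed(g[:n]):       # Horner: result = result * f + g_k
        result = mul(result, f, n)
        result[0] += gk
    return result

n = 6
exp_series = [Fraction(1, factorial(k)) for k in range(n)]
f = [Fraction(0)] + exp_series[1:]            # exp(x) - 1
h = compose(exp_series, f, n)                 # exp(exp(x) - 1)
bell = [c * factorial(k) for k, c in enumerate(h)]
print(bell)
```

Truncating g at n terms is exact here: since f starts at x, the term
g_k f^k only contributes to x^k and higher.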


[1] http://en.wikipedia.org/wiki/Formal_power_series#Composition_of_series


Entry: Direct computation of 2D series
Date: Thu Mar 14 18:47:03 EDT 2013

These are all interesting problems, but not sure if it is useful to
solve them.  There might be a simpler way to compute the 2D Taylor
series directly through differentials (autodiff).

( I'm thinking that diff is more work since there are more terms to
compute -- the tensors with partial derivatives -- but maybe the same
explosion happens for the function composition of sequences.  A
serious case of bad intuition here.. )

But... for 2 variables this isn't really such an explosion, is it?

0 f
1 f_x f_y
2 f_xx f_xy f_yy
3 f_xxx f_xxy f_xyy f_yyy

Still not convinced this will work for exp((P1 - log(p0)) / N).
Somehow not explicitly using the fact that P1 - log(p0) is small seems
to be wrong, i.e. nothing special.  Somewhere, some cancelling is
going on eventually, but in what way?


Entry: Exponentials: final word
Date: Fri Mar 15 10:23:26 EDT 2013

The update data dependencies are:

  (P0, P1, p0, N) -> (q01, p1)

The approximation error is:

  exp(P1) - p1 = e

which could also be expressed relative to p1 if it makes sense for
compensation expressions:

             e = e' p1

Because q01 is fairly precise, it should be possible to do one of these:
- ignore the drift, solve it at a higher level
- hard-reset the drift once by computing exp(P1) using more terms
- adding 1-st order compensation for q01 drift in q12 in terms of e.

Any other approach explored in the previous entries seems to lose
precision due to the inability to express that e is small.  I.e. we
don't get a series expansion in e, but in P1, p1, which is not helpful.

So, time to stop analysis paralysis: it's really a non-issue.  Either
find an approach that gives an expansion in e, or shut up and see what
happens in practice.


Entry: Exponentials: computing the log of |p| ~= 1
Date: Fri Mar 15 13:09:45 EDT 2013

Hmm... still something is missing.

Given that P ~= log(p) and p=c+si with |p| = 1 so re(P)=0, what
happens if this is plugged into the Taylor expansion?

log(1 + x) = \sum_n=1 (-1)^(n+1) 1/n x^n


      P ~= (c-1 + s i) 
         - 1/2 (c-1 + s i) ^2  
         + 1/3 (c-1 + s i) ^3

Taking only the imaginary coefficients:

        = s i
        - (c-1) s i
        + (c-1)^2 s i - 1/3 s^3 i

From this it seems that computing log(1 + x) for x small is not such a
problem.  How to use this?

It could be used to compute log (p1/p0), and use that to update the
estimate of P1 = log(p1).
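
A small numeric check of the three-term imaginary part, as a sketch
(the function name and the test angle 0.1 are arbitrary):

```python
import math

def phase_from_phasor(c, s):
    """Imaginary part of log(c + i s), from the series log(1 + x)
    truncated after the cubic term, with x = (c - 1) + i s."""
    c1 = c - 1
    return s - c1 * s + c1 * c1 * s - s ** 3 / 3

theta = 0.1
est = phase_from_phasor(math.cos(theta), math.sin(theta))
print(est - theta)
```

For an angle of 0.1 the error is of the order of a few 1e-6, so for
the small per-block increments considered here three terms already go
a long way.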

I'm still missing a way to make all the relations concrete:
- Using P1 directly avoids drift, but is imprecise
- Using (P1-P0)/N has drift, but is more precise.

These are two estimators.  How to combine two estimators into one in
an attempt to avoid drift?


The thing is, there is an estimate already.  To use this information,
an update method should be used.  Using N-R, log(p) is a solution of
f(x) = 0 with:

   f(x) = exp(x) - p

The N-R update step for this is  

        u(x) = x - f(x)/f'(x)

             = x - ( exp(x) - p ) / exp(x)

             = x - 1 + p exp(-x)

Maybe it pays to combine these, i.e. take exp(x) and x to be the
estimates we have...

To be continued..
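
As a sketch of how such an update behaves: the standard N-R step
x - f(x)/f'(x) for f(x) = exp(x) - p is x - 1 + p exp(-x).  Below it
refines a slightly wrong phase estimate.  cmath.exp is used for
clarity; in the setting of these notes it would be replaced by the
approximations under discussion, and the name log_refine is made up.

```python
import cmath

def log_refine(x, p, steps=1):
    """Refine an estimate x of log(p) with Newton-Raphson steps."""
    for _ in range(steps):
        x = x - 1 + p * cmath.exp(-x)
    return x

p = cmath.exp(0.3j)                   # unit phasor with phase 0.3
x = log_refine(0.25j, p, steps=4)     # start from a stale estimate
print(abs(x - 0.3j))
```

The error roughly squares on each step, so with a good starting
estimate one or two steps may already be enough.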


Entry: Normalization: a-symmetric
Date: Fri Mar 15 14:30:11 EDT 2013

See also [1].  I'm looking for a simpler, a-symmetric approach where
only one of the coefficients is updated, e.g. c.  Solving for f(c) = 0 in

   f(c) = (c^2 + s^2) - 1

the N-R update is:

   u(c) = c - f(c) / f'(c)
        = c - (c^2 + s^2 - 1) / 2c

        = 1/2 (c + (1-s^2)/c)

This is probably much more precise than the other method, but it uses
a division operator.  Also, it's symmetric in c and s, so if c->0 we
can switch to the s form.

The direct form is probably better.
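
For comparison, a sketch of the one-sided update with arbitrary test
values (c slightly too large, s left untouched):

```python
def fix_c(c, s):
    """One N-R step on c so that c^2 + s^2 moves toward 1; s is unchanged."""
    return 0.5 * (c + (1 - s * s) / c)

c, s = 0.81, 0.6                 # exact pair would be (0.8, 0.6)
c1 = fix_c(c, s)
before = abs(c * c + s * s - 1)
after = abs(c1 * c1 + s * s - 1)
print(before, after)
```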


[1] entry://20130309-182723


Entry: Normalization: Fast inverse square root
Date: Fri Mar 15 20:56:46 EDT 2013

Some nice bit twiddling tricks[1].

Made me wonder if I should use N-R for the normalization problem..

To compute 1 / r where r^2 = c^2 + s^2 ~= 1, it might be good
enough to just use the N-R update for f(x) = 1/x^2 - r^2
 
   u(x) = 1/2 x * (3 - r^2 x^2)

Since x will be close to one, this gives the series of estimates:

x0 = 1
x1 = (3 - r^2) / 2    

Here x1 is actually the same as the 1st order Taylor approx of 
1 / (1 + x)^1/2  with  1 + x = r^2 from [2].


[1] http://en.wikipedia.org/wiki/Fast_inverse_square_root
[2] entry://20130309-182723


Entry: Standard filters from pole positions.
Date: Tue Mar 19 12:46:18 EDT 2013

How to derive the LP/HP/BP/BS filters from a 2-pole discrete state
variable filter?  There seem to be a couple of degrees of freedom here
that require a bit of extra information.

I'm picking the "orthogonal" SVF because it seems easiest to control.
Practically, I'm looking for B,C,D in

           s_k+1 = A s_k + B i_k
           o_k   = C s_k + D i_k

where A is [  p_x  p_y ] with (p_x, p_y) the pole location.
           [ -p_y  p_x ]

with i,o 1-dimensional, this boils down to 2+2+1 parameters.


Transfer function is:

        o = (C (zI - A)^-1 B + D) i


Maybe it's best to start from the relation HP = 1 - LP, and focus on
LP, then find out how to derive BP and BS.  The latter 2 are not as
important as the LP, HP pair.

For LP, we need unit gain at z=1 or

         C (I - A)^-1 B + D = 1


Another way to go is to look at the 2D equation as if it were a 1D
complex equation.


It's probably simpler to think in terms of complex numbers here.
Basically, a real 2-pole is the real part of a complex 1-pole.
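
This equivalence can be checked directly.  The sketch below assumes
B = [1, 0]^T and reads the first state variable as output; both are
choices made here for illustration, as are the function names.

```python
import cmath

def two_pole_state_space(px, py, xs):
    """Real 2-state filter with A = [[px, py], [-py, px]] and B = [1, 0]^T."""
    s1 = s2 = 0.0
    out = []
    for x in xs:
        s1, s2 = px * s1 + py * s2 + x, -py * s1 + px * s2
        out.append(s1)
    return out

def complex_one_pole(p, xs):
    """Complex 1-pole y <- p y + x; returns the real part of the state."""
    y = 0j
    out = []
    for x in xs:
        y = p * y + x
        out.append(y.real)
    return out

p = 0.9 * cmath.exp(0.3j)            # hypothetical pole
impulse = [1.0] + [0.0] * 31
a = two_pole_state_space(p.real, p.imag, impulse)
b = complex_one_pole(p, impulse)
print(max(abs(u - v) for u, v in zip(a, b)))
```

With this A matrix the second state is minus the imaginary part of the
complex state, which does not matter for a real input.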


Entry: Complex 1-poles
Date: Tue Mar 19 13:24:42 EDT 2013

Something that has puzzled me for a while:

* The ideas of "low pass" (LP) and "high pass" (HP) filter are not
  meaningful for a discrete, complex filter.


The ideas of LP,HP,BP,BS stem from real, continuous filters.  They
refer to the behavior of the transfer function at zero, infinity and a
reference frequency f.

      | 0  f  inf
  ----+----------
  LP  | 1  ?  0
  HP  | 0  ?  1
  BP  | 0  1  0
  BS  | 1  0  1


This has to do with the symmetry of the transfer functions.

For continuous filters, transfer functions are defined on the real
line, which have 2 special points apart from a current reference
point: zero and infinity.

For a complex discrete filter, no special points exist, other than
"not the point of interest".  This is due to the symmetry of the
transfer functions, which is apparent from them being defined on the
complex unit circle (z-transform).

Looking at the symmetry, there are only two "kinds" of discrete
filters: "band pass" (BP) and "band stop" (BS).

    f=f0  f!=f0
----------------
BS   0     >0
BP   1     <1


* The ideas of LP/HP can be introduced into the discrete world by
  - Sticking to real signals.  This creates 2 special points: DC, NY
  - Defining a mapping from continuous to discrete.

Sticking to real signals creates two special points where the real
axis intersects the unit circle: DC (the direct current) and NY
(Nyquist frequency).

By itself, DC and NY are 2 points that are not essentially different.
However, they can be distinguished by relating discrete and continuous
transfer functions.

Impulse invariance:
  - DC  0 (but also 2NY, 4NY, ...)
  - NY  not in any way special.  infinity is not mapped

Bilinear transform:  
  - DC  0
  - NY  infinity

Once this mapping is in place, the ideas of HP and LP can be made to
carry over.

So it is important when talking about LP and HP filters in a discrete
sense to also take into account how these ideas are defined,
i.e. through impulse invariance or the bilinear transform.

Using the impulse invariance transform which is useful for musical
applications, it might be useful to still define properties in terms
of behaviour at infinity, even if that doesn't make sense in
actuality, as filter transfer functions will alias.


Entry: Impulse-invariant high-pass filter: what does it mean?
Date: Tue Mar 19 14:04:48 EDT 2013

An interesting conceptual problem: it is possible to define the
behavior of a low pass filter based on the impulse-invariant transform
(pole mapping).

The same doesn't work for a high-pass filter, since the high pass band
will alias down into the stopband, killing the high-pass property.

So, how should it be implemented?  Maybe it only makes sense using a
bilinear transform?


Entry: CT proto 2nd order filters.
Date: Tue Mar 19 15:17:14 EDT 2013

Thinking about filters, re-building intuition..  These are all 2nd
order filters with p a complex pole and p' its complex conjugate.

* Band-pass filter (BPF).  Gain is zero at DC and infinity.  To have
  zero DC gain a zero at 0 is necessary.  To have zero gain at
  infinity this should be the only zero.

          s
  ----------------
  (s - p) (s - p')


* High pass filter (HPF).  Derivative of BPF.  Limits to 1 when s->inf.

         s^2         
  ----------------
  (s - p) (s - p')

  High pass filters are closed under series composition (product).


* Low pass filter (LPF).  Integral of BPF.  Limits to 1 when s->0.

         pp'                   1
  ---------------- =  -------------------- 
  (s - p) (s - p')    (1 - s/p) (1 - s/p')

  A LPF is also an all-pole filter, meaning it has no (finite) zeros.
  Low pass filters are closed under series composition (product).


* All pass filter (APF).  Gain=1, only phase shift

  (s + p) (s + p')
  ----------------
  (s - p) (s - p')

  All pass filters are closed under series composition (product).


* Sum of one-pole filter (S1F).  Partial fraction expansion has no
  zeros.  Not sure if this is really so useful, but it does tend to
  show up from time to time, as they are the only ones closed under
  parallel composition (sum).

      s - Re(p)         1/2     1/2
  -----------------  = ----- + -----
  (s - p) (s - p')     s - p   s - p'
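
The limiting gains listed above can be verified numerically.  A sketch
with an arbitrary example pole (the function names are made up):

```python
p = complex(-0.1, 1.0)      # hypothetical pole; pc is its conjugate
pc = p.conjugate()

def den(s):
    return (s - p) * (s - pc)

def bpf(s):
    return s / den(s)

def hpf(s):
    return s * s / den(s)

def lpf(s):
    return p * pc / den(s)

def apf(s):
    return (s + p) * (s + pc) / den(s)

far = 1e6j                  # stands in for s -> infinity on the jw axis
print(abs(lpf(0)), abs(bpf(0)), abs(hpf(far)), abs(apf(0.7j)))
```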


Entry: Partial fraction forms
Date: Tue Mar 19 16:49:07 EDT 2013

http://www.wolframalpha.com/widgets/view.jsp?id=ec4a062bb304f88c2ba0b631d7acabbc

 LP(s)

 = pp' / (s - p) (s - p') 

      a          b
 = -------  + -------
    s - p      s - p'


  =>   as  + bs  = 0   => a = -b
     - ap' - bp  = pp' => a (p - p') = pp'

  or a = pp' / (p - p') = |p|^2 / (2i Im(p))


The wave form in the response is thus 90 degree shifted from a normal
exponential, of which the real part is a cosine and the imaginary part
a sine.  I.e. the response is a sine wave, gradually rising to follow
the pulse.  It doesn't jump immediately, as would a high-pass filter.

 HP(s)

 = s^2 / (s - p) (s - p')

 = 1 + p^2 / (2i Im(p) (s-p))  - p'^2 / (2i Im(p) (s-p'))

This has a direct path + a phase-shifted sinusoid.
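
A numeric sanity check of both expansions, using the residues
pp'/(p - p') for LP and p^2/(p - p'), p'^2/(p' - p) for HP.  The pole
and the evaluation point are arbitrary choices for this sketch.

```python
p = complex(-0.1, 1.0)          # hypothetical pole
pc = p.conjugate()
s = complex(0.3, 0.7)           # arbitrary evaluation point

a = p * pc / (p - pc)           # LP residue at p; the one at p' is -a
lp_direct = p * pc / ((s - p) * (s - pc))
lp_pf = a / (s - p) - a / (s - pc)

hp_direct = s * s / ((s - p) * (s - pc))
hp_pf = 1 + p * p / ((p - pc) * (s - p)) - pc * pc / ((p - pc) * (s - pc))

print(abs(lp_direct - lp_pf), abs(hp_direct - hp_pf), a)
```

The LP residue a comes out purely imaginary, which is the 90 degree
shift mentioned above.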

 
Entry: CT -> DT
Date: Tue Mar 19 17:03:47 EDT 2013

Here[1] is mentioned that impulse invariance works for LP and
matched-Z[2] works for BP,BS,HP.

Bilinear works for anything, as it is 1->1.
  
[1] http://dsp.stackexchange.com/questions/109/are-there-alternatives-to-the-bilinear-transform
[2] http://en.wikipedia.org/wiki/Matched_Z-transform_method


Entry: Bilinear vs. impulse invariance
Date: Tue Mar 19 17:16:35 EDT 2013

What would be the best approach for time-variant filters?  Some things
that are important are: the ease of mapping frequency and Q parameters
to discrete, and the interpolation between different set points.


Entry: What does an LP/HP/BP response look like?
Date: Tue Mar 19 17:37:00 EDT 2013

Is there any kind of visual quality that can distinguish impulse
responses of LP/HP/BP?  They are all sums of damped exponentials, so
where is the difference?

Qualitatively, one of the key points seems to be how phase and damping
interact.

 - The DC response is determined by the average, so when most of the
   response is above zero, the response will be low-pass.

 - If the response doesn't jump from 0 immediately, this shows
   inertia, a property of low-pass behavior.

Probably step responses are a lot easier to interpret since they are a
bit more intuitive to grasp.


Entry: Cheaper filter?
Date: Tue Mar 19 22:43:19 EDT 2013

( EDITED 2013-5-7 - Original was wrong.  It completely missed the
forward/backward thing and used a forward difference transformation. )

According to [1], the direct digital approximation of the SVF is
stable for low frequencies, even at q=0 (Q=inf).

In Stilson's PhD [3] this is called the Chamberlin filter.  Note that
this is _not_ a trivial transformation of the CT equations.  It has
one forward and one backward difference to replace the 2 integrators.
See also [2].  In this notation, the update equations are
_sequential_, where the update for s2 is used in the update for s1.
This in contrast to the usual formulation as simultaneous updates,
i.e. state space form.

   s2' = s2 + f s1                 : LP

   s1' = s1 - f (q s1 + s2' + i)   : HP

Interpreting these equations as simultaneous, i.e. using s2 instead of
s2' in the second equation, gives a forward difference approximation
and makes the filter unstable.

The parameters f and q are related to their continuous time
counterparts as


f = 2 sin ( pi F_c / F_s )
q = 1 / Q

Even in the presence of oversampling, this approach is a lot simpler
than the direct pole computation and interpolation approach from the
last couple of posts.
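
The difference between the sequential and the simultaneous reading
shows up clearly at q=0 with no input.  A sketch using the update
equations exactly as written above; the function names, f = 0.1 and
the step count are arbitrary.

```python
def chamberlin_step(s1, s2, i, f, q):
    """Sequential update: the new s2 is used when updating s1."""
    s2 = s2 + f * s1
    s1 = s1 - f * (q * s1 + s2 + i)
    return s1, s2

def forward_step(s1, s2, i, f, q):
    """Simultaneous update (forward difference): both use the old state."""
    return s1 - f * (q * s1 + s2 + i), s2 + f * s1

def peak_energy(step, n=2000, f=0.1, q=0.0):
    """Largest s1^2 + s2^2 seen while the undriven filter rings."""
    s1, s2 = 1.0, 0.0
    peak = 0.0
    for _ in range(n):
        s1, s2 = step(s1, s2, 0.0, f, q)
        peak = max(peak, s1 * s1 + s2 * s2)
    return peak

print(peak_energy(chamberlin_step), peak_energy(forward_step))
```

The sequential form keeps s1^2 + s2^2 + f s1 s2 constant at q=0, so
the state stays bounded, while the simultaneous form multiplies the
energy by 1 + f^2 on every step.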

[1] http://www.earlevel.com/main/2003/03/02/the-digital-state-variable-filter/
[2] entry://20130421-101744
[3] entry://20130401-012657
[4] entry://20130408-005953


Entry: Newton-Raphson method
Date: Wed Mar 20 13:36:49 EDT 2013
Type: tex

Once, to never forget, the derivation.  Given a point $x_i$ called the
\emph{initial estimate}, and a function $f(x)$, we can make an
approximation $x_u$ for $x_0$ s.t. $f(x_0)=0$ by constructing the
$1$-st order approximation to $f$ at $x_i$.  The equation of this
line, translated to $t = x - x_i$ is $$l(t) = f(x_i) + t f'(x_i).$$
Solving for $t_u$ s.t. $l(t_u) = 0$ gives $t_u = - f(x_i)/f'(x_i)$, or
$$x_u = u(x_i) = x_i - {f(x_i) \over f'(x_i)}.$$ The function $u$ can
now be iteratively applied to refine the estimate.
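
As a small worked example (the choice of $f$ here is only an
illustration), take $f(x) = x^2 - a$, whose positive zero is
$\sqrt{a}$.  Then $f'(x) = 2x$ and $$u(x) = x - \frac{x^2 - a}{2x} =
\frac{1}{2}\left(x + \frac{a}{x}\right).$$ For $a = 2$ and $x_i = 1$
the first iterates are $3/2$, $17/12 \approx 1.41667$ and $577/408
\approx 1.414216$, to be compared with $\sqrt{2} \approx 1.414214$.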


Entry: Applying the N-R method for function evaluation
Date: Wed Mar 20 13:50:58 EDT 2013
Type: tex

To use N-R for evaluating a function $f(x)$ requires an extra step,
which is to construct a function $F$ such that $F(f(x)) = 0$.  Is there
a systematic way to do this?

As an example, take $f(x) = 1/\sqrt{x}$.  Construct a function $F(q)$
such that $F(1/\sqrt{x}) = 0$.  The straightforward approach would
give something like $$F_1(q) = q - 1/\sqrt{x}.$$  Computing the N-R
update from $F_1$ is rather useless $$u_1(q) = q - \frac{q -
{1/\sqrt{x}}}{1} = 1/\sqrt{x}.$$

The trick seems to be to take this trivially obtained but useless
$F_1$ and change it in a way that turns the dependency on $x$ into
elementary functions, without removing the zero of $F_1$ we are
interested in.  The following two steps seem appropriate: remove the
square root $$F_2(q) = (q - 1/\sqrt{x})(q +1/\sqrt{x}) = q^2 - 1/x$$
and remove the division operation $$F_3(q) = xq^2 - 1$$

Which of the two is best?  There is no way to decide until computing
the N-R update steps.  It turns out that both lead to the same function
$$u_2(q) = u_3(q) = \frac{1}{2}(q + \frac{1}{xq}).$$

Now $u_2(q)$ still contains a division, which is not optimal.  There
are other ways. In [1] the function $F_4(q) = 1/q^2 - x$ is used,
which leads to $$u_4(q) = \frac{1}{2}(3q - xq^3).$$ The heuristic to
distill from this is that if there is a division involved, it might be
best to express $F(q)$ in terms of $1/q$.  The derivative will have a
decreasing negative power.  The negative power is then cancelled by
$F'(q)$ appearing in a denominator in the update formula.

That trick should then also work for evaluating $f(x) = 1/x$. Indeed,
with $F_5(q) = 1/q - x$ we get $u_5(q) = q(2 - xq)$.
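
A quick way to see the quadratic convergence of these division--free
updates (a worked check; the residual $e$ is notation introduced here
and not used elsewhere in these notes).  For $u_5$ define $e = 1 -
xq$.  Then $$1 - x u_5(q) = 1 - xq(2 - xq) = (1 - xq)^2 = e^2.$$ For
example, with $x = 3$ and $q = 0.3$ the residual goes $0.1 \to 0.01
\to 0.0001$, giving $q = 0.33, 0.3333, \ldots$  Similarly for $u_4$
with $e = 1 - xq^2$, writing $u_4(q) = q(1 + e/2)$ gives $$1 - x
u_4(q)^2 = 1 - (1-e)(1+e/2)^2 = \frac{3}{4}e^2 + \frac{1}{4}e^3.$$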

%[1] http://en.wikipedia.org/wiki/Fast_inverse_square_root


Entry: Saturation update
Date: Wed Mar 20 14:43:54 EDT 2013
Type: tex

A function that's interesting for computing smooth saturation is
$$f(x) = \frac{x}{\sqrt{1 + x^2}}.$$ Using the $1/q$ trick from the
previous post, a N-R method can be used to evaluate this from $$F(q) =
\frac{1}{q^2} - \frac{1+x^2}{x^2}$$ yielding $$u(q) = \frac{q}{2}(3
- \frac{q^2 (1+x^2)}{x^2}).$$ From this, a disadvantage seems to be
that the $1/q$ method doesn't work for $x \to 0$.  Hmm\ldots No free
lunch.
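
A worked derivation of the N-R step for this $F$ (the shorthand $a$ is
introduced here for illustration).  With $a = (1+x^2)/x^2$ we have
$F(q) = q^{-2} - a$ and $F'(q) = -2q^{-3}$, so $$u(q) = q -
\frac{F(q)}{F'(q)} = q + \frac{q^3}{2}(q^{-2} - a) = \frac{q}{2}(3 - a
q^2).$$ Its positive fixed point is $q = a^{-1/2} = x/\sqrt{1+x^2}$
for $x > 0$.  The iteration converges to it from any start value $0 <
q_0 < \sqrt{3/a} = x\sqrt{3/(1+x^2)}$, an interval that shrinks to
nothing as $x \to 0$ because $a \to \infty$ there.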


Entry: The Harmonic Oscillator and the Laplace Transform
Date: Wed Mar 20 20:02:59 EDT 2013
Type: tex

The state space form of the harmonic oscillator consists of the
following system of differential equations
$$
\begin{array}{rcl}
\dot{q_1}(t) &=& - \omega q_2(t) \\
\dot{q_2}(t) &=& \omega q_1(t) \\
\end{array}
$$
This can be solved by Laplace-transforming the differential equations
to a set of equations
$$
\begin{array}{rcl}
s Q_1(s) - q_1(0) &=& - \omega Q_2(s) \\
s Q_2(s) - q_2(0) &=& \omega Q_1(s) \\
\end{array}
$$
Solving this for $Q_1(s), Q_2(s)$ gives a solution in the Laplace
domain, which can be converted back to the time domain.  The solutions
look like $$Q_i(s) = \frac{n_i(s)}{s^2 + \omega^2}$$ where the numerators
$n_i(s)$ are first order polynomials in $s$ with coefficients linear
in the initial
conditions $q_i(0)$.  The interesting part is the denominator $s^2 +
\omega^2 = (s-j\omega)(s+j\omega)$ which determines the \emph{shape} of the result.
Splitting the $Q_i(s)$ in partial fractions then gives the sum of
first order rational functions proportional to $(s-j\omega)^{-1}$ and
$(s+j\omega)^{-1}$, which are Laplace transforms of $e^{j\omega t}$ and
$e^{-j\omega t}$ respectively.

The denominator can also be obtained as $\det(Is-A)$ where $A$ is the
system matrix $$A = \matrix{0 & -\omega \\ \omega & 0}.$$
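
Carrying out the elimination explicitly (a worked version of the
solving step): substituting $Q_2(s) = (q_2(0) + \omega Q_1(s))/s$ from
the second transformed equation into the first gives $$Q_1(s) =
\frac{s q_1(0) - \omega q_2(0)}{s^2 + \omega^2}, \qquad Q_2(s) =
\frac{s q_2(0) + \omega q_1(0)}{s^2 + \omega^2}.$$ Using the
transforms of $\cos\omega t$ and $\sin\omega t$, namely
$s/(s^2+\omega^2)$ and $\omega/(s^2+\omega^2)$, this is a rotation of
the initial state:
$$
\begin{array}{rcl}
q_1(t) &=& q_1(0)\cos\omega t - q_2(0)\sin\omega t \\
q_2(t) &=& q_2(0)\cos\omega t + q_1(0)\sin\omega t \\
\end{array}
$$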


Entry: The State Variable Filter
Date: Wed Mar 20 21:58:08 EDT 2013
Type: tex

The state space form of the SVF [1][2][3] consists of the
following system of differential equations
$$
\begin{array}{rcl}
\dot{s_1}(t) &=& - \omega [ 2 \zeta s_1(t) + s_2(t) + i(t) ] \\
\dot{s_2}(t) &=& \omega s_1(t) \\
\end{array}
$$
corresponding to a system matrix
$$A = \matrix{ - 2 \zeta \omega  & -\omega \\ \omega & 0}.$$

Here $2\zeta=1/Q$ with $Q$ the usual pole--based definition of
\emph{quality factor}, and $\omega = 2\pi f$ the angular frequency.
Personally, I prefer to use the parameter $\zeta$ over $Q$ as it makes
the expression of the poles a bit easier to read.  The pole equation
$$p(s) = \det(A-Is) = s^2 + 2 \zeta \omega s + \omega^2 = 0$$
has two solutions
$$s_\pm = \omega (- \zeta \pm \sqrt{\zeta^2 - 1}).$$

If $\zeta < 1$ the poles are complex conjugate with complex angle only
dependent on $\zeta$, i.e. unit norm phase component $e^{\pm j\theta} =
-\zeta \pm j\sqrt{1-\zeta^2}$ where $\cos\theta = -\zeta$.
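
As a quick numerical check of the pole formula, here is a minimal
Python sketch.  The function names are made up for illustration; the
only thing it relies on is the pole equation above.  It shows that the
pole angle does not move when $\omega$ changes.

```python
import cmath

def svf_poles(omega, zeta):
    """Roots of s^2 + 2*zeta*omega*s + omega^2 = 0."""
    # For zeta < 1 the square root is purely imaginary.
    root = cmath.sqrt(zeta * zeta - 1.0)
    return omega * (-zeta + root), omega * (-zeta - root)

def pole_angle(omega, zeta):
    """Angle of the upper-half-plane pole (hypothetical helper)."""
    s_plus, _ = svf_poles(omega, zeta)
    return cmath.phase(s_plus)
```

For $\omega > 0$ and $\zeta < 1$ the angle returned is
$\arccos(-\zeta)$, whatever the value of $\omega$.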


% [1] http://www.earlevel.com/main/2003/03/02/the-digital-state-variable-filter/
% [2] entry://20130319-224319
% [3] http://en.wikipedia.org/wiki/State_variable_filter


Entry: SVF discretization + compensation
Date: Wed Mar 20 19:18:04 EDT 2013
Type: tex

EDIT 2013-5-7: I'm leaving this here for reference, but make sure to
note that using the forward difference transform on the CT SVF filter
is not a useful approach.  See [8]. The Chamberlin filter used a
forward and a backward difference, which has different properties than
what I've been using in the next couple of SVF posts, which use only
forward differences.

The CT system described in [3] can be approximated naively [1][2] in
the discrete domain using the substitution $s=z-1$, which corresponds
to the forward difference transform\footnote{This is not the whole
story. Other discretizations are possible. See [7].}.  This leads to
the update equation $$ \begin{array}{rcl} s_1[k+1] &=& s_1[k] - \omega
(2 \zeta s_1[k] + s_2[k] + i[k]) \\ s_2[k+1] &=& s_2[k] + \omega
s_1[k] \\ \end{array} $$ corresponding to a discrete feedback matrix
$$A = \matrix{ 1 - 2 \zeta \omega& -\omega \\ \omega & 1}.$$


The poles determined by $\det(A-Iz)=0$ can be derived directly from
the poles of the continuous system through the substitution $z=s+1$ or
$$z_\pm = 1 + s_\pm = 1 + \omega (- \zeta \pm \sqrt{\zeta^2 - 1})$$


This is no longer a constant--$Q$ filter.  For any $\zeta$, when
$\omega$ gets large, the pole path will eventually cross the unit
circle and cause instability.
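
To make the crossing concrete: for $\zeta < 1$ the pole magnitude works
out to $|z_\pm|^2 = 1 - 2\zeta\omega + \omega^2$, so the poles leave the
unit circle at $\omega = 2\zeta$ (with $\omega$ in radians per sample).
The following Python sketch, with illustrative function names, checks
this numerically.

```python
import cmath

def forward_euler_poles(omega, zeta):
    """Discrete poles z = 1 + s of the forward-difference SVF."""
    root = cmath.sqrt(zeta * zeta - 1.0)
    return (1.0 + omega * (-zeta + root),
            1.0 + omega * (-zeta - root))

def is_stable(omega, zeta):
    """True when both poles are strictly inside the unit circle."""
    return all(abs(z) < 1.0 for z in forward_euler_poles(omega, zeta))
```

With $\zeta = 0.5$, `is_stable(0.9, 0.5)` holds while
`is_stable(1.1, 0.5)` does not, in line with the bound $\omega < 2\zeta$.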

The question is then, how to compensate this?  Several options are
possible.  Running the filter multiple times as suggested in [1]
will reduce approximation errors.  Is it possible to use a squaring
approach as opposed to mere linear oversampling such as used in the
scale reduction technique in [5]?  Yes, but due to damping it requires
full $2 \times 2$ matrix multiplication as opposed to simplified
complex phasor multiplication.  Instead of $A(\omega,\zeta)$ from
above, compute the system matrix $$A(\omega',\zeta')^{2^n}$$ using
successive squaring where $\omega' = \omega / 2^n$ and $\zeta$ is
adjusted accordingly.  There might be some interesting ways to reduce
operations by preserving the symmetry of the intermediate squaring
results.

Another approach is to construct polynomial approximations that
compensate $\omega$ and $\zeta$ to lessen the nonlinearities and maybe
more importantly, keep the filter stable if $\omega$ is varied,
i.e. approximate the constant--$Q$ behaviour of the original analog
filter.  The question is how to express this property in the discrete
domain, as it only makes sense in reference to the poles of the
original system, where it refers to the property that the pole angle
does not depend on $f$.

We can use the impulse invariant transform[4] to map the poles $z_\pm$
of the discrete system to the continuous system poles $s_i$ through
the relation $z_i = \exp(s_i)$.

The property we're interested in can then be expressed as
$$\frac{d}{df} \arg(\log(z_\pm(f,q(f)))) = 0,$$ where $\arg(.)$
computes the angle of a complex number.  This is a first order
nonlinear differential equation in $q(f)$.  It is possible to
construct different equations that express the same constraint such as
$$\frac{d}{df}\frac{\arg(z_\pm)}{\log(|z_\pm|)} = 0.$$

What would be a good way to approximate this?  To solve the equation
numerically and use it to fit an approximation, or to fill in a
parameterized curve and turn it into an error minimization problem.

Anyway it's pretty sure that anything that is workable will be a
polynomial in $f$.  It's probably also a good guess to set the linear
coefficient to zero, since there is no compensation necessary near
$z=1$.

Let's start with computing the magnitude of the pole.  It seems there
is enough to gain by compensating that first. (TODO)


% [1] http://www.earlevel.com/main/2003/03/02/the-digital-state-variable-filter/
% [2] entry://20130319-224319
% [3] entry://20130320-215808
% [4] http://en.wikipedia.org/wiki/Impulse_invariance
% [5] entry://20130326-173542
% [6] entry://20130325-094100
% [7] https://ccrma.stanford.edu/~stilti/papers/Welcome.html
% [8] entry://20130421-101744

Entry: Polynomial bumps and saturation functions
Date: Thu Mar 21 16:00:05 EDT 2013
Type: tex

A saturation function $f(x)$ defined on an interval $I=[-1,1]$ should
have the properties $f'(x) \geq 0$ (monotonic), $f(x) = -f(-x)$ or
$f'(x) = f'(-x)$ (symmetric) and $f'(0) = 1$ (identity for small signals).
Such a function reaches its maximum and minimum $+m,-m$ at the interval
end points $+1,-1$.  Let's call this $m$ the \emph{squash}, the
maximum gain reduction possible under the constraints above.  Let's call
the derivative $f'(x)$ a \emph{bump}, following its shape.

For the application, a function is more useful as $m$ gets smaller, or
equivalently when scaled as $F(x) = f(m x)/m$, the useful domain gets
larger.

The simplest polynomial that satisfies these constraints can be
constructed from its derivative specification, using just two zeros
$$f_1'(x) = (1-x) (1+x) = 1 - x^2$$ which leads to $$f_1(x) = x -
\frac{x^3}{3}.$$ Here $m_1=2/3$.  This can be generalized to higher
order bumps by just adding more zeros at $\pm 1$.  The family of bumps
is given by $$f_n'(x) = (1 - x^2)^n.$$ At first glance, this seems
like a useful family, but it is not entirely clear how to implement it
efficiently.  Integrating the binomial expansion of $f_n'(x)$ is
straightforward, but not very elegant.
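
For reference, the term-by-term integral of the binomial expansion is
$f_n(x) = \sum_k \binom{n}{k} (-1)^k x^{2k+1} / (2k+1)$.  A direct
(and indeed not very elegant) Python sketch, with an illustrative
function name:

```python
from math import comb

def bump_integral(n, x):
    """f_n(x): integral from 0 to x of (1 - t^2)^n, via the binomial expansion."""
    return sum(comb(n, k) * (-1) ** k * x ** (2 * k + 1) / (2 * k + 1)
               for k in range(n + 1))
```

Evaluating at $x=1$ gives the squash $m_n = f_n(1)$, e.g. $2/3$ for
$n=1$.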

A second way to generate bumps is to make the derivative maximally
flat at $0$, using multiple zeros at $0$ for the second derivative,
which makes the saturation function maximally straight at $0$.  This
constraint is satisfied by the functions $$g'_n(x) = 1 - x^{2^n}$$
which have simple zeros at $\pm 1$ besides the factors with complex
zeros $1+x^2, \ldots, 1+x^{2^{n-1}}$.  While the integral $g_n$ 
has a simple closed form, it is not useful, as the squash actually
increases as the order increases.

A third way is by simply squashing multiple times using a simple
squash function.  It is simpler to express by starting with a scaled
$f_1$ such that $-1,0,+1$ are fixed points.  A sequence of functions
can then be constructed as $$h_1(x) = \frac{1}{2}(3x-x^3) \text{ and }
h_{n+1}(x) = h_1(h_n(x)).$$ Due to the scaling, the gain at $0$
increases with $n$ as $(3/2)^n$.  To obtain a gain $1$ function, this
scaling factor can just be divided out again at the end of the
iteration chain.
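
A minimal Python sketch of this iteration scheme, including the final
division by $(3/2)^n$.  The function names are only illustrative.

```python
def h1(x):
    """Scaled f_1 with fixed points at -1, 0 and +1."""
    return 0.5 * (3.0 * x - x ** 3)

def squash(x, n):
    """n-fold iterate of h1 with the small-signal gain (3/2)^n divided out."""
    y = x
    for _ in range(n):
        y = h1(y)
    return y / 1.5 ** n
```

Since $\pm 1$ are fixed points of $h_1$, the squash of the normalized
function is $(2/3)^n$: each extra iteration costs $3$ more operations
and reduces $m$ by another factor $2/3$.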

After some doodling and plotting, it looks like the $h_i$ are very
similar to the $f_i$, as $h_i$ gives rise to polynomials that are also
increasingly flat at $\pm 1$.  This can be seen as follows.  $h_1$
around the fixed point $1$ looks like $1-\frac{3}{2}(x-1)^2$, which is a double
zero after translation to $(1,1)$.  Iterating this turns the double
zero into a quadruple zero.  At each iteration, the multiplicity of
the zero doubles.

However, $h_n$ and $f_n$ are not the same sequence. The order of $h_n$
is $ 3^n$, so that of $h'_n$ is $3^n-1$, which grows in a different
way than the order of $f'_n$ which is $2n$.  Outside of our range of
interest we have $h_1(2) = -1$, so some flattening will occur at
$x=\pm2$, as its surrounding is mapped to a stable fixed point.

To avoid these other fixed points with the aim of getting more useful
zeros, we could start the iteration scheme with $k_2(x) = c f_2(x)$,
which is monotonic as $f'_2(x) = (1-x^2)^2\geq 0$.  After
integrating, $$f_2(x) = x - \frac{2}{3} x^3 + \frac{1}{5}x^5.$$ This
gives $c = f_2(1)^{-1} = 15/8$.  The flatness order of the fixed
points at $\pm 1$ is $3^n$, while the polynomial order grows as $5^n$.

A fourth way is using a truncated Taylor expansion of an elementary
function.  Often, when a saturating analytic function is necessary,
$\arctan(x)$ is used, which corresponds to the integral of the bump
$(1+x^2)^{-1}$.  This doesn't seem to be so useful here as every term
needs to be evaluated, i.e. there is no fast $O(\log(n))$ evaluation,
and the function squashes the real line instead of a compact interval.


Entry: Constant-Q pole curves
Date: Mon Mar 25 09:41:00 EDT 2013
Type: tex

The constant-$Q$ constraint of a $2$nd order CT filter is equivalent to
a constant phase constraint on the signal poles.  The parameter $Q$ is
usually defined from a $2$nd order transfer function, which has a pole
polynomial $$s^2 + \frac{\omega}{Q}s + \omega^2 = (s-s_+)
(s-s_-).$$ For $Q>\frac{1}{2}$, and using the more convenient
parameter $\zeta = 1/2Q$ with $\zeta < 1$, this corresponds to the poles $$s_\pm =
\omega \left[ \frac{-1}{2Q} \pm j\sqrt{1 - \frac{1}{4Q^2}} \right] =
\omega (-\zeta \pm j \sqrt{1-\zeta^2}) = \omega \phi.$$ This has
$\| \phi \| = 1$, so we can write $\phi = \exp(\pm j\theta)$, where
$$\theta = \pi - \arctan \sqrt{4Q^2-1}$$ lies in the second quadrant
because the real part $-\zeta$ is negative.

Mapping the idea of constant-$Q$ to the discrete domain can be done
using the impulse-invariant transform $z_k=\exp(s_k T)$, where we set
$T=1$ for simplicity.  Given $\phi$, curves of constant $Q$ are
parameterized by $\omega(t)$ which corresponds to the discrete poles
$$z_\pm = \exp[\omega(t)\phi],$$ with $\phi$ taking its two conjugate
values.  Let's drop the $\pm$ subscript,
since it is clear we're dealing with complex conjugate poles.

For practical purposes in a digital synth, where control signals
usually have a lower bandwidth, it is useful to approximate pole curves
in a piecewise linear fashion, i.e. to approximate $\omega(t)$ as
linear curves over discrete points or $\omega[k] = \omega_0 + \omega_1
k$, which creates the pole curve $$z[k] = \exp [\phi(\omega_0 +
\omega_1 k)] = a_0 a_1^k$$ with $a_0 = \exp(\phi \omega_0)$ and $a_1 =
\exp(\phi \omega_1)$.  The good news here is that a constant-$Q$ path
in the discrete domain corresponds to a pole path that can be updated
in an efficient way, using successive multiplication by $a_1$.  What
remains is to find a filter topology with explicit mention of the
signal poles.
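
The multiplicative update can be sketched in a few lines of Python.
This is only an illustration under the assumptions above ($T=1$,
$Q > 1/2$, upper-half-plane pole); the function name and argument
order are made up.

```python
import cmath
import math

def pole_curve(q, omega0, omega1, n):
    """First n poles z[k] = a_0 * a_1^k of a constant-Q pole path."""
    zeta = 1.0 / (2.0 * q)
    # Unit-norm direction phi of the continuous pole (requires q > 1/2).
    phi = complex(-zeta, math.sqrt(1.0 - zeta * zeta))
    z = cmath.exp(phi * omega0)    # a_0
    a1 = cmath.exp(phi * omega1)   # per-sample multiplier
    path = []
    for _ in range(n):
        path.append(z)
        z *= a1                    # one complex multiply per sample
    return path
```

Only two exponentials are evaluated per linear segment; everything
else is one complex multiplication per sample.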


Entry: Constant-Q filter topologies
Date: Mon Mar 25 10:54:52 EDT 2013

The question remains, which is more convenient?
- A naive SVF + correction for non-linearity
- a "correct" pole-position using constant-Q linear interpolation

The answer is probably "both" as it depends on what the purpose is,
i.e. in what control structure they are embedded.

EDIT: 2013-5-7.  It's probably best to go for the SVF.


Entry: Construct a low-pass filter from a discrete pole pair.
Date: Mon Mar 25 11:01:58 EDT 2013
Type: tex

Given a set of complex conjugate signal poles $z_\pm$, how to
construct a low pass filter from it?  Two elements are important: how
to derive a low--pass from a real $1$-pole filter and how to construct a
real $2$-pole filter from a complex $1$-pole.

A real low-pass filter with input $i_k$ and state/output $x_k$, built
from a single real pole $z_0$, has the update equation $$ x_{k+1} =
z_0 x_k + (1 - z_0) i_k.$$ This can be interpreted as the filter
gradually \emph{mixing in} the input signal $i_k$ by \emph{forgetting}
part of the state $x_k$ in the update to the next state $x_{k+1}$.
When $i_k$ is constant, $x_k$ will start to approach it in the limit
$k \to \infty.$ The $z$--transform of the update equation is $zX = z_0
X + (1 - z_0) I$ or $$\frac{X}{I} = \frac{1-z_0}{z-z_0},$$ which
clearly shows that the response is $1$ for $z=1$.

This can be directly generalized to a filter with a single complex
pole, by replacing $z_0$ with one of the complex poles $z_\pm$,
e.g. $z_+ = c + j s$.  For $i_k$ constant, $x_k$ will eventually
approach it. The update equation $x_{k+1} + j y_{k+1} = z_+ (x_k + j
y_k) + (1 - z_+) i_k$ written in matrix form for the real and
imaginary components is $$ \begin{array}{rcl} \matrix{x_{k+1} \\
y_{k+1}} = \matrix{c & -s \\ s & c} \matrix{x_k \\ y_k} + \matrix{1-c
\\ -s} i_k \end{array}.$$ Note that calling this a low--pass filter is
stretching the definition a bit.  The only thing we constrain is the
response at DC, which should be $1$.

The next step is to look only at the real part of the state/output
sequence $x_k + j y_k$.  This does not change the response for DC. The
effect however is to sum the output with that of a second filter based
on the complex conjugate signal pole $z_-$.  A real filter with
response $1$ at DC is by definition a low-pass filter.  It is as if
the matrix equation can be interpreted in two ways: as a complex
one--pole with real input and gain 1 at DC, or as a real two--pole
low--pass when considering only $x_k$.

Note that for most poles of interest, $c$ will be close to $1$, so it
might be numerically more interesting to parameterize the
implementation in terms of $1-c$.
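
The matrix update can be written out as a short Python sketch (names
are illustrative, the state starts at zero by assumption).  Only the
real part $x_k$ is used as output.

```python
def lowpass_step(state, inp, c, s):
    """One update of the complex one-pole z_+ = c + j s with real input."""
    x, y = state
    x_next = c * x - s * y + (1.0 - c) * inp
    y_next = s * x + c * y - s * inp
    return x_next, y_next

def run(inputs, c, s):
    """Filter a sequence, returning the real part x_k of the state."""
    state = (0.0, 0.0)
    out = []
    for i in inputs:
        state = lowpass_step(state, i, c, s)
        out.append(state[0])
    return out
```

For a constant input the state settles at $(x,y) = (i,0)$, which is
the unit DC gain mentioned above, provided $|z_+| < 1$.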


Entry: Exponential curve interpolation
Date: Mon Mar 25 13:08:31 EDT 2013

To come back on the exponential curve interpolation, the question is
really how much difference is there in computing a_0, a_1 and a_N =
a_0*a_1^N from [1].

I.e. it doesn't matter that a_0 and a_N computed from direct
exponentials are somewhat off, what is important is that the direct
computation of the approximation of a_N is fairly close to a_0*a_1^N,
where a_0 and a_1 are also approximations.

[1] entry://20130325-094100


Entry: Broad-range, weak approximation to 2^x
Date: Tue Mar 26 09:34:01 EDT 2013
Type: tex

I need an exponential mapping $2^x$ that is accurate on the large
scale but can have some local deviations as long as they are
relatively smooth.  A candidate is to make it exact for integer powers
and then require continuity of the derivative.

\section{Piecewise}

Computing a weak approximation to $2^x$ can be split in two parts:
floating point modulo, which splits $x = n + \epsilon$ with $n$ an
integer and $\epsilon \in [0,1]$, and computing a polynomial
approximation for the range of $\epsilon$.  Let's focus on the latter
first.

A good guess is probably to make sure that the value and the first
derivative of the polynomial $f(x)$ at $0,1$ agree with those of
$2^x$, i.e. with $2^x$ and $\log(2) 2^x$.  It's probably possible to drop one of those constraints and
just require that the derivatives match at the stitch points, or $2
f'(0) = f'(1)$.  This gives respectively $4$ and $3$ constraints for a
$3$rd and $2$nd order polynomial $f_3$ and $f_2$.  Solving the
constraints for the latter gives $$f_2(x) = 1 + \frac{2}{3}x +
\frac{1}{3}x^2.$$ It has $f'(0) = \frac{2}{3}.$ From the plot this
seems to be accurate enough for human interfacing, i.e. log pots.

The straightforward way to solve the floating modulo problem is to
use a $\tt{float}$ to $\tt{int}$ conversion.  Once $n$ is obtained,
$2^n$ can be computed in a loop in $O(\log(n))$ time.
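
A minimal Python sketch of the piecewise scheme.  Note that it swaps
the $O(\log(n))$ loop for the standard library's `math.ldexp`, which
scales by $2^n$ directly; the function names are illustrative.

```python
import math

def f2(eps):
    """Quadratic stitch polynomial for 2^eps on [0, 1]."""
    return 1.0 + (2.0 / 3.0) * eps + (1.0 / 3.0) * eps * eps

def exp2_weak(x):
    """Weak approximation of 2^x: exact at integers, smooth in between."""
    n = math.floor(x)      # integer part (float to int conversion)
    eps = x - n            # fractional part in [0, 1)
    return math.ldexp(f2(eps), n)   # f2(eps) * 2^n
```

The approximation is exact for integer $x$ and has a continuous
derivative at the stitch points by construction of $f_2$.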

\section{Limited Range}

The loop in the previous approach can be problematic, e.g. for SIMD
evaluation.  A different way to construct an approximation is to keep
in mind that function iteration can sometimes help to construct higher
order polynomials efficiently.

Based on the relation $$2^x =
(2^{\frac{x}{2^n}})^{2^n},$$ we can take the inner exponentiation to
be the approximation $f_2(x)$ from above and use an iteration function
$u(x)=x^2$ to get the estimate $$f_n(x) = u^n[f_2(\frac{x}{2^n})],$$
which works for $x \in [0,2^n]$.

For the application, $5$ orders of magnitude will suffice, which
roughly corresponds to the \emph{audible} frequency range
$20$Hz-$20$kHz together with the \emph{haptic} range $0.2$Hz-$20$Hz.
It's a nice coincidence that in this range we find $2^{2^4} = 65536$,
approximately $10^{4.8}$.  The ratio $f_2(x/2^4)^{2^4} / 2^x$ over the
range $[0,2^4]$ ranges between $0.97$ and $1.05$, which is pretty
good.
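
The limited range variant is loop-free apart from the fixed number of
squarings.  A Python sketch, with illustrative names and $n=4$ as the
default:

```python
def f2(eps):
    """Quadratic approximation of 2^eps on [0, 1]."""
    return 1.0 + (2.0 / 3.0) * eps + (1.0 / 3.0) * eps * eps

def exp2_squared(x, n=4):
    """Approximate 2^x on [0, 2^n] as f2(x / 2^n) squared n times."""
    y = f2(x / 2 ** n)
    for _ in range(n):
        y = y * y
    return y
```

The relative error of $f_2$ is raised to the power $2^n$ along with
the value, which is why the ratio to $2^x$ drifts a few percent from
$1$ over the range.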

Entry: Successive squaring for complex exponentials
Date: Tue Mar 26 17:35:42 EDT 2013
Type: tex

\section{Buildup}

The previous post[1] makes one wonder if the same trick works for
complex exponentials.  Also hinted at in [2].  Note that the idea
doesn't change if we change the equation to $$e^x =
(e^{\frac{x}{2^n}})^{2^n}.$$

What we're interested in is a pure imaginary exponent $e^{j\omega}$,
since the real exponent is already solved in [1], so the unit--norm
constraint could be used to perform some more approximations.  For the
application, the range of interest is $\omega \in [0,\pi]$.

Computing the iteration is straightforward.  The problem is in the
initial estimate.  Over the whole range, we'd like to have accuracy
concentrated in the lower frequency ranges.  This can be established
by making the approximation error maximally flat at $0$, which corresponds to
a truncated Taylor series.  Also we'd like to err to one side, keeping
the magnitude $| f(x) | \leq 1$.

For example, with $4$ iterations as before, the range for the inner
approximation would be $[0,\frac{\pi}{2^4}]$ which gives the first
catch: the reduction in argument range is not double exponential as it
is for the real exponential function. I.e. multiplication of complex
numbers has an \emph{additive} phase effect, as opposed to a
multiplicative magnitude effect.

But\ldots maybe that is not a problem?  Just a factor $16$ might be
enough to bring most of the useful frequency range into a precise
enough region.  Convergence is very fast nearing $0$.

\section{Algo}

After some plotting, I'm getting very good results for $[-\pi,\pi]$
with second order approximations $$s(x)=x \text{ and }
c(x)=1-\frac{x^2}{2}$$ followed by a single N-R normalization step[3],
which is a multiplication of $c(x)$ and $s(x)$ by $n(x) = (3 - (c(x)^2
+ s(x)^2)) / 2$, followed by two complex squarings compensated with
the appropriate $2^2=4$ times input range reduction.  This is quite
remarkable.  The resulting polynomial is of order $24$.
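
Spelled out as a Python sketch (the function name is made up, and the
order of operations is my reading of the description above):

```python
def cexp_approx(omega):
    """Approximate exp(j*omega) on [-pi, pi] with a degree 24 polynomial."""
    x = omega / 4.0                  # 4x range reduction for 2 squarings
    c = 1.0 - 0.5 * x * x            # second order cosine
    s = x                            # first order sine
    norm = 0.5 * (3.0 - (c * c + s * s))   # one N-R normalization step
    z = complex(c * norm, s * norm)
    z = z * z                        # first complex squaring
    z = z * z                        # second complex squaring
    return z
```

The normalization step can only undershoot, so the magnitude stays at
or below $1$ over the whole range.  The phase error grows roughly as
$\omega^3$, so accuracy is concentrated at the low frequencies.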

% [1] entry://20130326-093401
% [2] entry://20130309-182354
% [3] entry://20130315-205646


Entry: Imaginary logarithm
Date: Tue Mar 26 18:17:11 EDT 2013
Type: tex

Using halving, followed by re-normalization, it is possible to
construct a $2^n$th root of a unit norm complex number, which when
taken far enough allows the computation of a logarithm.

Using a single N-R step normalization, this seems to work fairly well
for $\theta < \pi/4$.  
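
One possible reading of ``halving'' is to take the bisector between
$1$ and $z$, i.e. $(1+z)/2$, which has half the angle of $z$ but a
norm below $1$, and then renormalize.  The Python sketch below follows
that assumption; the function names are illustrative.

```python
def nr_normalize(z):
    """Single Newton-Raphson step pulling |z| towards 1."""
    m = z.real * z.real + z.imag * z.imag
    return z * (0.5 * (3.0 - m))

def angle_by_halving(z, n=4):
    """Estimate the angle of a unit-norm z from its 2^n-th root."""
    for _ in range(n):
        z = nr_normalize(0.5 * (1.0 + z))   # halve the angle, renormalize
    return z.imag * 2 ** n                  # small angle: sin(a) ~ a
```

The estimate uses $\sin(a) \approx a$ for the final small angle, so it
slightly underestimates the true angle.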


Entry: Exponentials: drift compensation.
Date: Wed Mar 27 00:36:36 EDT 2013
Type: tex

What I missed last time is that the angle between $p_0 = e^{j\phi_0}$
and $p_1 = e^{j\phi_1}$ can be computed from $p_{01} = p_1 \bar{p_0}$.
The division is just multiplication with the complex conjugate since
the norm is $1$, and since the value is very close to $1$, the angle
can just be read as the imaginary part of $p_{01}$.

This cheap logarithm can be used for the computation of phasor paths,
together with the successive approximation algorithm from last post.
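
In Python, with an illustrative function name, the cheap logarithm is
just:

```python
def angle_between(p0, p1):
    """Approximate angle from p0 to p1, both unit norm and close together."""
    p01 = p1 * p0.conjugate()   # division by p0, since |p0| = 1
    return p01.imag             # sin(angle) ~ angle for small angles
```

The value returned is really the sine of the angle difference, which
is why this only works when the two phasors are close.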


Entry: Phasor path
Date: Wed Mar 27 00:44:59 EDT 2013

Init: p_0
Desired at end of period: f_1
p_1  = ^exp(-j f_1)  # approx, but make sure unit norm
p_01 = p_1 * p_0'    # complex conjugate, to get angle difference

p_01^(1/(2^n))       # the angle of p_01 is expected to be small, so this should be fairly accurate

Computing the log can be split into 2 parts: first part with
normalization, second part as division.

Eventually, the algorithm will converge, as long as there is no
positive feedback.  I.e. we should *underestimate* the magnitude of
p_01^(1/(2^n)).

The approximation of the complex exponential can be a simple second order
approximation, together with normalization and successive squaring.

If accurate frequencies are necessary, they could be computed using
another feedback loop, i.e. a PLL.

A second level is possible here too: since the normalization step is
part of the computation, it might be added to the compensation.
I.e. given an approximation of exp(-jw) that produces unit norm
estimates, construct a map of w'->w to compensate for the phase
non-linearity.


Entry: Rational squash functions
Date: Wed Mar 27 09:05:46 EDT 2013
Type: tex

Using the reasonable approximation of $2^x$ from [1], and making the
range symmetric around $0$, e.g. by starting the iteration using
$g_2(x) = f_2(x+1/2)$, it is possible to construct a squash function
defined on a limited range [2] from the function $$s(x) = \frac{2^x -
1}{2^x + 1},$$ with $s'(0) = \frac{1}{2}\log(2)$.  However, $s(x)$
reaches $s'(x)=0$ only at $\pm \infty$, so not much can be said about
the derivative using the approximation of $2^x$.

I got this function from [3], noticing that the plot of $R_1$ in
yellow looked odd, as a rational function cannot have a different
limit to $+\infty$ and $-\infty$.  Then I realized the $x$ axis is
logarithmic, with the left side of the graph approaching $0$ from
above.  This corresponds to the function $s(x)$.

EDIT 2013-5-7: Actually, this is the Hyperbolic Tangent function,
$s(x) = \tanh(\frac{\log 2}{2} x)$.

% Another approach is to use the N-R approach of the normalization function?

% [1] entry://20130326-093401
% [2] entry://20130321-160005
% [3] http://en.wikipedia.org/wiki/Chebyshev_rational_function


Entry: ...
Date: Thu Mar 28 00:42:35 EDT 2013
Type: tex

Direct synthesis of waveform integrals followed by differentiation, as
can be done for a sawtooth wave by synthesizing piecewise parabolas,
seems to work well for limiting the aliasing.
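
A minimal Python sketch of the sawtooth case, under my own choice of
scaling: the parabola is the square of the naive saw in $[-1,1)$, and
the first difference is scaled by $f_s / 4f$ to recover unit
amplitude.  Names and signature are illustrative.

```python
def saw_from_parabola(freq, sample_rate, n):
    """Sawtooth obtained by differencing a piecewise parabola."""
    inc = freq / sample_rate          # phase increment per sample
    scale = 1.0 / (4.0 * inc)         # undoes the slope of the parabola
    phase = 0.0
    prev = (2.0 * phase - 1.0) ** 2   # parabola = (naive saw)^2
    out = []
    for _ in range(n):
        phase += inc
        if phase >= 1.0:
            phase -= 1.0              # wrap the phase
        cur = (2.0 * phase - 1.0) ** 2
        out.append((cur - prev) * scale)
        prev = cur
    return out
```

Away from the wrap, each output equals the average of two adjacent
naive saw samples; at the wrap the difference produces an intermediate
value instead of a full-size jump.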

Some questions.  Does this work for a square wave, by differentiating
a triangle?

And, is it possible to modify the technique to implement anti--aliased
static nonlinear saturation, which for high gain factors suffers from
similar problems.

I don't see how to directly synthesize the time integral of such a
nonlinearity.  For a non-linearity $\sigma(x)$, operating on a
sequence $y_n = \sigma(x_n)$, this would boil down to generating the
integral directly, or $$z_{n+1} = y_n + z_n = \sigma(x_n) +
\sigma(x_{n-1}) + \ldots$$
which doesn't seem to make much sense.


Entry: Static nonlinearities for discrete systems.
Date: Thu Mar 28 01:04:34 EDT 2013
Type: tex

The effect of a static nonlinearity $\sigma$ on a signal $x_n$ should
be seen as the anti--aliased sampling of $\sigma(x(t))$ where $x(t)$
is the reconstruction of $x_n$.  Typically this is done using
oversampling.  A naive example with a linear interpolating
oversampling step, and a $\frac{1}{4}[1,2,1]$ subsampling mask is
$$\frac{1}{4} [\sigma(x_{n-1}) + 2 \sigma(\frac{x_{n-1} + x_n}{2}) +
\sigma(x_n)].$$
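
The naive scheme in Python, as a sketch.  The initial previous sample
`x_init` and the hard clipper used as an example are assumptions made
for illustration.

```python
def saturate_2x(sigma, xs, x_init=0.0):
    """Apply sigma with 2x linear oversampling and a [1,2,1]/4 mask."""
    out = []
    prev = x_init
    for x in xs:
        mid = 0.5 * (prev + x)   # linearly interpolated midpoint
        out.append(0.25 * (sigma(prev) + 2.0 * sigma(mid) + sigma(x)))
        prev = x
    return out

def hard_clip(x):
    """Example hard-corner nonlinearity."""
    return max(-1.0, min(1.0, x))
```

With `sigma` the identity this reduces to a two-point average, which
shows the low-pass cost of the subsampling mask on its own.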

Isn't it better to design this directly as a \emph{dynamic}
nonlinearity, or as a static nonlinearity that takes $2$ inputs but is
black box such that some internal sharing might be exposed?

From listening, I have a vague suspicion that this is only important
for hard--corner nonlinearities, for which the integral might actually
be computable in a piecewise fashion using curve intersection.

Intuitively, what happens is that when the signal makes big,
square--wave jumps, the end points are actually very important, as
this is where phase differences create amplitude-modulated pulse-like
distortions, which are quite annoying.


Entry: Iteration and polynomial interpolation
Date: Sun Mar 31 16:18:59 EDT 2013

Thinking about recent ad--hoc experiments with polynomial iteration.
However, there is a classical poly approx method that does successive
refinement, adding one approximation point at a time.  This would be
Newton interpolation.  It's not really what I had in mind though..
Overview from "Inleiding tot de numerieke wiskunde" - Adhemar
Bultheel.

- Linear equation - Vandermonde matrix

- Lagrange: sum of base polynomials, each f_n at x=x_n and 0 at x=x_i, i!=n

- Newton: successive approximation, one data point per step +
  specialization for equidistant, ...

- Hermite: also derivatives
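
For the Newton case, the textbook divided-difference scheme can be
sketched in Python as follows (function names are mine, not from the
book):

```python
def newton_coefficients(xs, ys):
    """Divided-difference coefficients of the Newton form."""
    coef = list(ys)
    for level in range(1, len(xs)):
        # Update in place from the back so lower levels stay available.
        for i in range(len(xs) - 1, level - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - level])
    return coef

def newton_eval(xs, coef, x):
    """Evaluate the Newton form with a Horner-like nested product."""
    result = coef[-1]
    for i in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result
```

Adding a data point only appends one coefficient; the earlier ones do
not change, which is the successive refinement property.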


Entry: Stilson's PhD meta
Date: Mon Apr  1 01:26:57 EDT 2013

I started reading Stilson's PhD thesis again a couple of days ago.  An
interesting book, with a good general smell.  I first noticed the
finished PhD here[1], a little over two years ago.  Some references
are the 4-multiply equalizer[3] and the Sawtooth hack[4].

An idea that seems to be ripe for me to explore again are several ways
of looking at different CT/DT relations.  Also mentioned here[2].

The question I have now is: what is the point?  Why bother using the
Time-scale calculus[5]?

Some other things:
- Weighted least squares for polynomial-parameterized FIR filters


[1] entry://20110104-045522
[2] entry://20110104-110954
[3] entry://20110105-070637
[4] entry://20110104-153108
[5] http://en.wikipedia.org/wiki/Time_scale_calculus


Entry: Root locus
Date: Mon Apr  1 23:53:39 EDT 2013

Just a random thought: computing root locus paths directly is
difficult since it requires the factorization of a polynomial at each
point.  However, when a single root is known, working the other way
around might be a lot more trivial: compute how coefficients move when
we nudge the poles, then integrate this according to the desired
direction of the coefficients.

This could probably be used in some kind of parameter optimization
procedure!  Compute function (functional) of pole path, then solve or
minimize.

[1] http://en.wikipedia.org/wiki/Inverse_functions_and_differentiation


Entry: Autodiff partial differentials
Date: Tue Apr  2 00:09:36 EDT 2013

Following Dan's Haskell implementation, it might be better to define
the autodiff in RAI in a similar way, by allowing a list of
infinitesimals instead of just one, to allow evaluation of partial
derivatives.
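
To illustrate the idea (this is a Python sketch of my own, not the
referenced Haskell code and not the RAI implementation): a dual number
that carries one derivative slot per independent infinitesimal.

```python
class Dual:
    """Value plus a list of partial derivatives, one per infinitesimal."""
    def __init__(self, value, partials):
        self.value = value
        self.partials = list(partials)

    def __add__(self, other):
        other = lift(other, len(self.partials))
        return Dual(self.value + other.value,
                    [a + b for a, b in zip(self.partials, other.partials)])
    __radd__ = __add__

    def __mul__(self, other):
        other = lift(other, len(self.partials))
        # Product rule applied per infinitesimal.
        return Dual(self.value * other.value,
                    [self.value * b + other.value * a
                     for a, b in zip(self.partials, other.partials)])
    __rmul__ = __mul__

def lift(x, n):
    """Promote a plain number to a Dual with n zero partials."""
    return x if isinstance(x, Dual) else Dual(x, [0.0] * n)

def variables(*values):
    """Independent variables, each with its own infinitesimal."""
    n = len(values)
    return [Dual(v, [1.0 if i == k else 0.0 for i in range(n)])
            for k, v in enumerate(values)]
```

Usage: `x, y = variables(3.0, 4.0)` followed by `f = x * y + 2 * x`
leaves the partial derivatives with respect to `x` and `y` in
`f.partials`.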

[1] http://blog.sigfpe.com/2006/09/practical-synthetic-differential.html


Entry: Natural gravitation points
Date: Tue Apr  2 00:29:10 EDT 2013

- autodiff, smooth infinitesimal analysis & synthetic differential geometry
- geometric algebra
- conservative, chaotic dynamics
- difference calculus


% [1] entry://20130320-191804


Entry: Backwards vs. Forward difference
Date: Mon Apr  8 00:59:53 EDT 2013
Type: tex

Two naive approximations for discretizing the differential equation
$$ \dot{y} = x$$ corresponding to the Laplace domain equation $s y =
x$, representing the integrator $y=s^{-1}x$, are the \emph{backward
difference} approximation $$y[k] - y[k-1] = x[k]$$ corresponding to
the Z--transform representation $(1-z^{-1})y = x$ and the
\emph{forward difference} approximation $$y[k+1] - y[k] = x[k]$$
corresponding to $(z-1)y = x$.  From the sign in $k\pm 1$ it is clear
where the naming scheme comes from.

Respectively, these discrete approximations to differential equations
correspond to the backward and forward transforms $$s = T_b(z) =
1-z^{-1}$$ and $$s = T_f(z) = z-1,$$ which relate the $s$ and $z$
domains in two different ways.

Following directly from the transform mappings and normalization to
causal form [3], the discrete integrators corresponding to the
continuous integrator $i(s) = s^{-1}$ for the backward and forward
transforms are $$i_b(z) = \frac{1}{1-z^{-1}}$$ and $$i_f(z) =
\frac{z^{-1}}{1-z^{-1}} = z^{-1} i_b(z).$$
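
The two integrators as Python update loops, assuming zero initial
state (names are illustrative).  The only difference is whether the
output is read before or after the state update.

```python
def backward_integrator(xs):
    """y[k] = y[k-1] + x[k]: output includes the current input."""
    y = 0.0
    out = []
    for x in xs:
        y = y + x
        out.append(y)
    return out

def forward_integrator(xs):
    """y[k+1] = y[k] + x[k]: output lags the input by one sample."""
    y = 0.0
    out = []
    for x in xs:
        out.append(y)
        y = y + x
    return out
```

The forward output is the backward output delayed by one sample, which
is the $z^{-1}$ factor relating $i_f(z)$ and $i_b(z)$.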


The backward difference has a delay--free path so can not always be
used in \emph{structural} transformation, i.e. the direct replacement
of integrators in analog filter topologies.  The forward transform has
a decoupling delay.

The forward transform is the basis of the calculus of finite
differences [2][4].


% [1] https://ccrma.stanford.edu/~stilti/papers/Welcome.html
% [2] entry://20110104-110954
% [3] entry://20130415-172439
% [4] http://en.wikipedia.org/wiki/Finite_difference#Calculus_of_finite_differences


Entry: Circle filter
Date: Mon Apr 15 11:10:56 EDT 2013
Type: tex

From Stilson's PhD, I'm a bit surprised to find a constant bandwidth
filter with circles as root loci for the frequency
component\footnote{See [3].}.  However it does not have straight lines
for the bandwidth parameters.  This seems odd, let's have a look at
it.

The pole equation of the forward $(z-r)^{-1}$ and backward
$z(z-r)^{-1}$ leaky integrators in series, fed back through a $k$ gain
is $$(z-r)^2 + zk = 0.$$ The coefficient of $z^2$ is $1$, which means
the constant term $r^2$ is the product of the roots.  This product is
independent of $k$, so the root locus for $r$ constant is indeed a
circle.
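
A quick numerical check in Python (illustrative function name).
Expanding the pole equation gives $z^2 + (k - 2r) z + r^2 = 0$, which
has complex roots for $0 < k < 4r$.

```python
import cmath

def circle_filter_poles(r, k):
    """Roots of (z - r)^2 + k z = z^2 + (k - 2r) z + r^2."""
    b = k - 2.0 * r
    disc = cmath.sqrt(b * b - 4.0 * r * r)
    return (-b + disc) / 2.0, (-b - disc) / 2.0
```

In the complex range both roots have magnitude $r$, so sweeping $k$
moves the poles along a circle of radius $r$.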

This analysis makes it clear why we have both a forward \emph{and}
backward integrator.  The presence of the extra delay keeps the $k$
coefficient away from the quadratic and constant terms, which
determine the product of the poles and thus their magnitude in case
they are complex conjugate.


% [1] https://ccrma.stanford.edu/~stilti/papers/Welcome.html
% [2] md5://86f4ad7cfe947186e03a5d0c20ad3e65
% [3] entry://20130421-101744

Entry: Realizable transfer functions
Date: Mon Apr 15 17:24:39 EDT 2013
Type: tex


Instead of writing a filter transfer function as a quotient of two
polynomials in $z$ like $$\frac{1}{z-z_0}$$ in most DSP literature one
writes $$\frac{z^{-1}}{1-z_0z^{-1}}$$ to make it more clear what the
\emph{realization} of the filter looks like in terms of implementable
delays $z^{-1}$, as opposed to non--implementable predictors $z$.  The
form $F(z) = F_i(z^{-1}) / [1 - z^{-1} F_o(z^{-1})]$ with $F_i$, $F_o$
polynomials, is a representation of a \emph{direct form} realization
of a causal IIR filter[1].
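
For example, reading $z^{-1}/(1-z_0z^{-1})$ as a recipe gives the
update $y[k] = z_0 y[k-1] + x[k-1]$.  A Python sketch computing its
impulse response (function name is illustrative):

```python
def one_pole_impulse_response(z0, n):
    """Impulse response of z^-1 / (1 - z0 z^-1) using delays only."""
    y_prev = 0.0
    x_prev = 0.0
    out = []
    for k in range(n):
        x = 1.0 if k == 0 else 0.0      # unit impulse at k = 0
        y = z0 * y_prev + x_prev        # y[k] = z0*y[k-1] + x[k-1]
        out.append(y)
        y_prev, x_prev = y, x
    return out
```

The response starts one sample late because of the $z^{-1}$ in the
numerator, and then decays as powers of $z_0$.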

The negative exponents ultimately come from the definition of the
Z--transform\footnote{ When working with discrete transfer functions,
I find the use of $z$ as forward delay a little annoying.  I've seen
literature where the sign of the exponent in the $z$--transform is
reversed, to yield easier to read formulas.  Basically, we're dealing
with \emph{causal} filters $99 \%$ of the time, so why not take the
convention that causal sequences correspond to positive powers of $z$,
so we can get rid of all those $.^{-1}$?  Even though I've been bugged
by this for a while[2], for the sake of clarity I'll stick to the
convention found in most of the DSP literature as this seems to be
quite firmly established.  The point of this post is to lift my
confusion about the conventional notation. } of a sequence $a_n$ as
$$A(z) = \sum_{n = -\infty}^{\infty} a_n z^{-n},$$ which for a causal
impulse response has only non-positive powers in $z$.


% [1] http://en.wikipedia.org/wiki/Digital_filter#Direct_Form_II
% [2] entry://20110111-155812


Entry: Bultheel - Linear Prediction
Date: Mon Apr 15 20:08:20 EDT 2013
Type: tex

This[1][2] is one of my old-time favorites.  It has a very wide
perspective on the problem of linear prediction.

% [1] ftp://129.132.148.131/hg/EMIS/journals/BMMSS/Bulletin/bul941/BULTHEEL.PDF
% [2] md5://dc1a061d4aaef16a7b53bf2753f10905


Entry: Ladder vs. Lattice?
Date: Tue Apr 16 02:02:46 EDT 2013

EDIT: See later this week.

Lattice has to do with AR modeling.  What is a (normalized) ladder
filter?  It shows up here[1].

Just staring at some pictures, something tells me the lattice vs. ladder
distinction has to do with hyperbolic vs. orthogonal rotations...

EDIT: Yes +-.  A ladder filter can be derived directly from a
waveguide, while a lattice filter is derived from the Levinson-Durbin
algorithm for linear prediction.  Both arrive at +- the same point,
with the former being energy-normalized over the sections and the
other one not.  They have the same transfer function for the allpass.

Moonen's course notes might have some more info[2].  From Bultheel's
linear prediction article[4] it is clear that the hyperbolic form
comes from moving from an energy equation for traveling waves moving
into opposite directions specified at different points in space, to a
form that is parameterized in space only.  Basically, these are
equations that can be interpreted in two directions.

Ladder is phase shifter: Given an allpass filter $h_n(z)$, adding an
orthogonal ladder junction (without delay) parameterized by angle
$\alpha$ will give a new allpass filter

$$h_{n+1}(z) = z^{-1} \frac{s + h_n(z)}{1 + s h_n(z)},$$ with $s = \sin \alpha.$

I.e. given a phase shaper, it will shape it some more.  Without the
delay this doesn't do much: such operations form a monoid (actually a
group).  However, adding the delay will add a full phase wrap on each
section.
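
A quick numerical check of the two claims above, as a minimal sketch.
It assumes a unit delay as the allpass termination (the default `h0`);
the function names are made up for this note.

```python
import cmath
import math

def ladder_allpass(z, angles, h0=lambda z: 1 / z):
    """Evaluate the allpass built by h_{n+1} = z^-1 (s + h_n) / (1 + s h_n).

    h0 is the (assumed) allpass termination; a unit delay by default.
    """
    h = h0(z)
    for alpha in angles:
        s = math.sin(alpha)
        h = (s + h) / (1 + s * h) / z
    return h

def phase_wraps(angles, points=4096):
    """Net number of 2*pi phase wraps over one trip around the unit circle."""
    total = 0.0
    prev = cmath.phase(ladder_allpass(1 + 0j, angles))
    for k in range(1, points + 1):
        z = cmath.exp(2j * math.pi * k / points)
        cur = cmath.phase(ladder_allpass(z, angles))
        step = cur - prev
        # Remove the jump of the principal value at +-pi.
        step -= 2 * math.pi * round(step / (2 * math.pi))
        total += step
        prev = cur
    return total / (2 * math.pi)
```

On the unit circle `abs(ladder_allpass(z, angles))` should stay at 1,
and `phase_wraps([0.3, -0.5, 0.8])` should come out at -4: one wrap for
the termination delay plus one per section.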

?? Now the relation between ladder and lattice is to observe that the
recursion relation above can also be implemented more directly ??


[1] http://thesounddesign.com/MIO/EQ-Coefficients.pdf
[2] http://homes.esat.kuleuven.be/~moonen/asp_course.html
[3] https://ccrma.stanford.edu/~jos/pasp/Conventional_Ladder_Filters.html
[4] ftp://129.132.148.131/hg/EMIS/journals/BMMSS/Bulletin/bul941/BULTHEEL.PDF


Entry: Equalizers
Date: Tue Apr 16 02:23:43 EDT 2013

Interesting read[1].  It mentions the Massie paper also referenced in
Stilson's PhD, which implements an EQ section using an allpass filter
that is mixed with the original signal for boost or cut.

[1] http://thesounddesign.com/MIO/EQ-Coefficients.pdf


Entry: Waveguide, Ladder, Lattice
Date: Wed Apr 17 16:22:20 EDT 2013
Type: tex

This derivation tries to make sense of discrete waveguide, ladder and
lattice filters, starting from the ladder filter as an intuitive
physical model.  Since the terminology is not completely clear to me,
I'll add ``FIR'' and ``IIR'' to disambiguate the direction of the
signal flow.

The waveguide is a direct model of left and right traveling waves
combined with scattering junctions.  We will show that a waveguide
model with $z^{-\frac{1}{2}}$ delay sections has the same transfer
function as a ladder filter, that a ladder filter is an allpass
filter, that a ladder section can be related to a lattice section by
inverting one of the input/output relations, and that the input/output
relation of a feedforward lattice filter can be computed based on this
relation.

An important observation is to separate the concepts of filter
realization (implementation block diagram), from abstract filter
transform domain equations.  Essentially, an equation does not have a
direction, while a filter realization is a directional update equation
directly related to a dataflow network.  As a notational device,
transform domain equations or transfer functions in a specific form
are used to hint at a specific realization.

The main trick in the derivation below is to perform algebraic
manipulations on equations in the $z$ domain, and then relate them to
directional update equations in the discrete time $k$ domain.

A \emph{wave junction} is defined as an equation $$(r_o, l_o) = J
(r_i, l_i),$$ relating four entities: incoming right and left
traveling waves $r_i$, $l_i$, and outgoing right and left traveling
waves $r_o$ and $l_o$. For a lossless junction, the linear
2--dimensional operator $J$ is required to be energy--preserving
(orthogonal).  $J$ can be represented by the matrix $$J_\theta =
\matrix{\cos\theta & -\sin\theta \\ \sin\theta & \cos\theta}.$$ We'll
use the abbreviation $c_n=\cos\theta_n$ and $s_n=\sin\theta_n$.

In a waveguide model, this 4--port relation is applied multiple
times, connecting together different sections that are assumed to be
spatially separated such that left and right wave travel time between
sections corresponds to half the unit delay.

We define junction $n$ to connect sections $n$ and $n+1$, where the
rightmost section $0$ is a termination 2--port.  The junction equation
at $n$ is then $$\matrix{r_n \\ l_{n+1}} = z^{-\frac{1}{2}} J_{\theta_n}
\matrix{r_{n+1} \\ l_n}.$$

A change of time reference for the right side of each junction,
expressed by the substitution $r'_{n+1} = z^{-\frac{1}{2}} r_{n+1}$
and $l'_{n+1} = z^{-\frac{1}{2}} l_{n+1}$ allows the symmetric
half--unit delay to be absorbed into a full delay, yielding a ladder
filter topology.  This is equivalent and possibly easier to understand
by drawing the diagrams and pushing each rightward half--delay through
the right side network until it shows up to combine with the leftward
half--delay.  The transformed equation is then $$\matrix{r_n \\
l'_{n+1}} = J_{\theta_n} \matrix{r'_{n+1} \\ z^{-1} l_n},$$ which can
be performed at each junction, shifting the time base of each section
by $1/2$.  Below we work only in this time--shifted coordinate system,
dropping the primes in the notation.

To summarize, a symmetric waveguide junction is equivalent
(isomorphic) to an asymmetric ladder filter junction.  The difference
is one of time coordinates of nodes $r_n$ and $l_n$.  This yields two
advantages.  The ladder filter nodes $r_n$ are not state variables,
but instantaneous computation nodes, so they do not require memory
when a filter is realized.  These computation nodes can be reversed
without introducing un--implementable inverse delays.  This allows the
lattice filter to be related to a pair of FIR filters.

So a ladder filter topology is more useful in practice, but a
waveguide filter might be simpler to relate to a more intuitive
physical structure.  From this point on our equations talk only about
the ladder filter \emph{structure}.  However, we will be able to use
the waveguide \emph{analogy} due to the isomorphism.

A ladder filter needs to be terminated at one end, effectively turning
a collection of 4--port relations (the energy--preserving junctions +
delays) into a single 2--port relation, i.e. a filter.

We'll terminate it at junction $0$ with an abstract transfer function
$h_0(z)$ or $l_0(z) = h_0(z) r_0(z)$, meaning the right--traveling wave
$r_0$ gets shaped by $h_0$ to become the left--traveling wave $l_0$.
This allows us to write a recursion relation for the transfer function
of each section by successively applying $J_{\theta_n}$ to yield a
nesting of transfer functions $$\begin{array}{ccc} h_{n+1}(z) =
f_n(z^{-1} h_n(z)), & f_n(x) = \frac{s_n + x}{1 + s_n x}, & s_n = \sin
\theta_n.  \end{array}$$ Note that the transfer function does not
depend on $\cos\theta_n$, which is important later when we relate it
to lattice filters.
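
For reference, a sketch of the intermediate steps.  Write $x = z^{-1}
h_n$ and use $l_n = h_n r_n$ in the time--shifted junction equation,
$$\begin{array}{rcl}
r_n     &=& c_n r_{n+1} - s_n z^{-1} l_n \\
l_{n+1} &=& s_n r_{n+1} + c_n z^{-1} l_n.
\end{array}$$
The first line gives $r_n (1 + s_n x) = c_n r_{n+1}$.  Substituting
this into the second line yields $$l_{n+1} = \left( s_n + \frac{c_n^2
x}{1 + s_n x} \right) r_{n+1} = \frac{s_n + (s_n^2 + c_n^2) x}{1 + s_n
x} r_{n+1} = \frac{s_n + x}{1 + s_n x} r_{n+1},$$ where the last step
uses $s_n^2 + c_n^2 = 1$, which is where the cosine drops out.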


Because $f_n$ maps the unit circle to itself, if $h_n$ is an allpass
filter, $h_{n+1}$ will also be allpass.  Note there is some freedom in
where to place the $z^{-1}$ unit delay.  The alternative is to perform
$f_n$ first and multiply by $z^{-1}$ afterwards\footnote{If there is
no delay, the result of the recursion is a composition of Moebius
transforms, which form a group.}.

To relate this recursion relation to FIR lattice filters, the
equations need to be re--arranged to $$(r_{n+1}, l_{n+1}) =
\frac{1}{c_n} J'_{\theta_n} (r_n, z^{-1} l_n),$$ with $$J'_{\theta_n} =
\matrix{1 & s_n \\ s_n & 1}.$$

This is an algebraic transformation.  Adding a \emph{directional}
interpretation to this relates the ladder structure to the FIR lattice
structure.  We set the input of the FIR lattice filter to be $l_0 =
r_0$.  This gives two output nodes $l_N$ and $r_N$ for $N$ sections.
Since the lattice structure is feedforward, both outputs $f_l = l_N /
l_0$ and $f_r = r_N / r_0$ need to be FIR filters, i.e. polynomials.

An interesting point here is that the allpass transfer characteristic
$$\frac{r_N(z)}{l_N(z)}$$ of the ladder filter can be used to obtain
the transfer function of both FIR outputs.  It is just a relation
between nodes, and we are still working with the same equations!  The
only thing that changed is the interpretation of the equations as a
different realization.  What this means is that $f_r(z) / f_l(z)$ is
allpass.  From that we have $f_r(z) = f_l(z^{-1})$, up to a delay of
$N$ samples.  These are sometimes called the forward and backward
predictors in the context of \emph{linear prediction}.


\section{Conclusions}

The allpass--terminated digital waveguide with energy--preserving
junctions and half--unit section delay implements the same allpass
transfer function as a ladder filter.  The forward lattice filter is a
re--interpretation of the ladder filter equations, with one of the
signal directions reversed.


Entry: Inverse Lattice
Date: Wed Apr 17 20:58:19 EDT 2013

So, what is the inverse lattice in [1]?  It's related, but not the
same as the normalized ladder.  It appears it is not normalized, but
cosines do show up in one direction due to the multiple paths.

EDIT: Cosines show up because the junctions are still
energy-preserving (unit norm eigenvalues) but no longer orthogonal.  A
non--normalized lattice allpass implements the same transfer function
as a normalized ladder, but will have undergone a coordinate
transformation for the state variables, i.e. the signals traveling
through the sections and delays.

It seems that the inverse lattice is really just the lattice filter
with the delay--free chain reversed.  There is no 1/c normalization,
but this only has an effect on the allpole output, not the allpass, as
in the forward lattice, both forward and backward predictors are
scaled in the same way, and they serve as numerator and denominator of
the allpass.

The important thing to realize is that the equations harbor several
filter realizations: forward/backward FIR, reconstructing allpole,
allpass, both with and without the 1/c normalization.

% [1] ftp://129.132.148.131/hg/EMIS/journals/BMMSS/Bulletin/bul941/BULTHEEL.PDF


Entry: Lattice, Ladder and Waveguide
Date: Thu Apr 18 13:36:56 EDT 2013

Since all these concepts are related, it is hard to get subjective and
pick out the most important concept.  We might choose the central idea
to be the construction of a family of allpass filters from a single
recursion relation that corresponds directly to the structural
operation of adding a left/right signal flow section to an existing
allpole filter.

The normalized ladder filter can be interpreted as a time--shifted
version of a lossless digital waveguide.  This analogy can be used to
add a physical interpretation to the resulting structure.

Once an allpass filter is obtained from the iterative construction
process, the allpole filter corresponding to the denominator of the
allpass is accessible directly in the filter structure, and the two
FIR filters that make up the numerator and denominator of the allpass
are accessible in the time--reversed ladder filter.

From the time--reversed normalized ladder filter, the lattice filter
can be constructed by dropping the normalizing multiplication between
different sections.  This structure is the same as the one that can be
derived from the Levinson-Durbin algorithm for solving the Yule-Walker
equations occurring in the problems of autoregressive modeling and
linear prediction.

To realize an arbitrary minimum phase FIR, allpole or stable allpass,
any of the structures can be used.

As a remaining question, why would one use a normalized ladder filter
instead of an IIR lattice filter (non-normalized ladder) in the
construction of an allpass filter?


Entry: Levinson-Durbin algorithm
Date: Fri Apr 19 13:14:35 EDT 2013

The important part is the definition of the partial correlation
coefficients as the residual--normalized prediction error of the next
autocorrelation coefficient.
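
To make that definition concrete, here is a minimal sketch of the
recursion in plain Python.  The sign convention for the partial
correlation coefficients varies between texts; the one used here (and
the function name) is an assumption.

```python
def levinson_durbin(r):
    """Return (a, k, err) for autocorrelation values r[0..N].

    a   : prediction error filter coefficients, a[0] == 1
    k   : partial correlation (reflection) coefficients
    err : final prediction error energy
    """
    a = [1.0]
    k = []
    err = r[0]
    for m in range(1, len(r)):
        # Error of the order m-1 predictor on the next autocorrelation value.
        acc = sum(a[i] * r[m - i] for i in range(m))
        # Partial correlation: that error normalized by the residual energy.
        km = -acc / err
        k.append(km)
        # Order update: a_m = a_{m-1} + km * reversed(a_{m-1}).
        a_ext = a + [0.0]
        a = [a_ext[i] + km * a_ext[m - i] for i in range(m + 1)]
        # The residual energy shrinks by (1 - km^2) at every step.
        err *= 1.0 - km * km
    return a, k, err
```

For a first order autoregressive autocorrelation such as
`[1.0, 0.5, 0.25]`, only the first coefficient is nonzero and the
recursion stops improving after one step.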


Entry: Transposition of realizations
Date: Fri Apr 19 13:15:58 EDT 2013

To compute the transposition of a filter realization:

- Reverse arrows
- Exchange summation points "+" with contact points "o"
- Swap input and output

( Note that a summation point with fanout is actually 2 nodes: a
summation node connected to a contact point node. )

What is the algebraic counterpart to this?  (see below)

A way to prove this is probably to first devise a way to construct
arbitrary networks from primitives, prove it for the primitives and
prove that the construction rules leave the property intact.

The property is satisfied by these primitives:
- unit delay
- multiplicative scaling

The property is preserved by these compositions:
- serial composition (A B)
- parallel composition (o A/B +)
- feedback composition (+ A/B' o)

( There is probably a link to the Haskell ArrowLoop rules! )

Creating loops is like parallel composition, only one of the two
networks is reversed, the input is a summation node and the output is
a contact point.

From trying some examples, it seems that this is a complete set of
operations to construct an arbitrary network from primitives.

Given an arbitrary connected graph, the following algorithm can be
used to construct it in terms of the primitive operations.

1. Pick a primitive edge in the graph to start the construction

2. For all primitive edges between the 2 nodes, apply parallel or
   feedback composition to construct a subgraph.

3. If all nodes are covered, we're done.  If not, continue.

4. Pick a primitive edge that connects the current subgraph to a node
   outside of the subgraph, using serial composition.

5. Apply parallel or feedback composition for all nodes in the
   subgraph that connect to the newly added node.  Goto 3.


While it follows from construction, still, this transposition business
is rather remarkable.  It's not visible in "flat" algebraic form of a
transfer function, since there is no sharing.  I wonder if it is
visible in state-space form.

Yes. It follows directly from it!

The transfer function of a standard form state space model is

  C (Iz - A)^{-1} B + D.

For a SISO system this is a scalar value, so taking the
transpose of it doesn't change its value.  This yields

  B^T (Iz - A^T)^{-1} C^T + D^T,

which corresponds to the transpose of the system matrix.  Transposing
the system matrix corresponds to performing the network transposition
operation.
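
A small numerical check of this identity, as a sketch limited to a
2-state SISO system stored in plain nested lists.  The helper names
are made up for this note.

```python
def transfer(A, B, C, D, z):
    """C (zI - A)^-1 B + D for a 2-state SISO system."""
    (a, b), (c, d) = A
    det = (z - a) * (z - d) - b * c
    # Inverse of zI - A via the 2x2 adjugate.
    inv = [[(z - d) / det, b / det],
           [c / det, (z - a) / det]]
    x = [inv[0][0] * B[0] + inv[0][1] * B[1],
         inv[1][0] * B[0] + inv[1][1] * B[1]]
    return C[0] * x[0] + C[1] * x[1] + D

def transpose_system(A, B, C, D):
    """Transpose A and swap the roles of B and C; D is a scalar for SISO."""
    At = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]
    return At, C, B, D
```

Evaluating `transfer(A, B, C, D, z)` and
`transfer(*transpose_system(A, B, C, D), z)` at the same `z` should
give the same complex number, up to rounding.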


Entry: All-pole filter designer GUI
Date: Fri Apr 19 14:17:26 EDT 2013

Given: some graphical representation of a frequency response,
e.g. displayed as a log-log plot in a GUI.

In order for the user to be able to incrementally edit the graph, two
parameters should be settable: a gain increment at a specific
frequency (e.g. mouse drag point) and the bandwidth of the gain
increment.  Suppose that the "smoothing" operation can somehow be
defined.

Given a power spectrum, it can be mapped to an all-pole model using
the Schur algorithm after converting the power spectrum to an
autocorrelation sequence.  When the resulting AR model is plotted
again as a power spectrum, this by itself is a smoothing operation.

What about the following: allow the user to pick the filter order.
This will set the "smoothing amount".  Then tugging at different
places is just changing 1 frequency bin.  The effect of this on the
autocorrelation can be computed directly by adding a sinusoidal
component, so an IFFT step is actually not necessary.
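
The shortcut can be sketched as follows, assuming the power spectrum
is sampled on N uniform bins and kept symmetric, so that bins m and
N-m are always edited together.  The function names are made up for
this note, and the inverse DFT is written out directly for clarity
rather than speed.

```python
import math

def autocorr_from_power(P):
    """Inverse DFT of a real, symmetric power spectrum P[0..N-1] (O(N^2))."""
    N = len(P)
    return [sum(P[m] * math.cos(2 * math.pi * m * k / N) for m in range(N)) / N
            for k in range(N)]

def bump_autocorr(r, m, gain):
    """Add `gain` to power bins m and N-m, working directly on r."""
    N = len(r)
    return [r[k] + 2 * gain / N * math.cos(2 * math.pi * m * k / N)
            for k in range(N)]
```

Editing the spectrum and transforming again should give the same
sequence as calling `bump_autocorr` on the old autocorrelation.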

Instead of adding a pure sine, which might be too "sharp", it might be
possible to update the autocorrelation directly with the
autocorrelation of a damped exponential.

Given an N (even) order model, the first N/2 damped exponentials added
will be matched exactly.  Impulse at DC is overall gain, not a
parametric EQ.  However, a notch filter can be designed as the inverse
of a peak filter.

Additionally, it might be interesting to look at interpolation between
different transfer functions by setting linear interpolation of
reflection coefficients, or different kinds of non-linear
interpolation where some reflection coefficients are interpolated at a
different rate.


Entry: IRT - impulse response truncation & Gibbs
Date: Fri Apr 19 20:16:15 EDT 2013

There the Dirichlet function shows up (not the sinc function!), which
is the in-phase sum of N complex exponentials. A.k.a. discrete
summation formula.

The integral of the Dirichlet function comes back in the explanation
of the Gibbs phenomenon for IRT filters.  Due to this, IRT filters can
never exceed a certain pass/stop-band spec.
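
A rough numerical illustration, assuming an ideal lowpass with cutoff
`wc` truncated to 2M+1 taps.  The names and the grid search are
illustrative only.

```python
import math

def irt_lowpass(M, wc):
    """Ideal lowpass impulse response truncated to n = -M..M."""
    return [wc / math.pi if n == 0 else math.sin(wc * n) / (math.pi * n)
            for n in range(-M, M + 1)]

def amplitude_response(h, M, w):
    """Zero-phase response sum_n h[n] cos(w n) of a symmetric filter."""
    return sum(h[n + M] * math.cos(w * n) for n in range(-M, M + 1))

def peak_passband_gain(M, wc=math.pi / 2, grid=2000):
    """Largest passband value found on a uniform grid over [0, wc]."""
    h = irt_lowpass(M, wc)
    return max(amplitude_response(h, M, wc * i / grid)
               for i in range(grid + 1))
```

The peak should stay near 1.09 when M is increased (the familiar
overshoot of about 9 percent), which is the sense in which truncation
alone cannot meet an arbitrarily tight passband spec.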


Entry: White minimal amplitude, periodic
Date: Fri Apr 19 20:24:12 EDT 2013

What does a periodic signal with unit magnitude spectrum but random
phase look like?  How to make this signal minimal amplitude, i.e. how
to pump energy into a system while keeping the amplitude minimal and
the spectrum white.

( I found some links w.r.t radar pulses.  These look like chirps. )


Entry: Efficiency tradeoff for block-based FFT FIR implementation
Date: Fri Apr 19 21:03:49 EDT 2013

What is the optimal overlap / size for FIR implementation?
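
One way to start exploring this, as a rough sketch: assume overlap-save
with FFT size N and filter length L, where each block costs two FFTs
of about N log2 N operations plus N spectral products, and yields
N - L + 1 new output samples.  The cost model and the function names
are assumptions, not measurements.

```python
import math

def cost_per_sample(N, L):
    """Assumed cost model: two FFTs (N log2 N each) plus N spectral products."""
    return (2 * N * math.log2(N) + N) / (N - L + 1)

def best_fft_size(L, max_log2=20):
    """Power-of-two FFT size with the lowest modeled cost per output sample."""
    sizes = [2 ** p for p in range(1, max_log2 + 1) if 2 ** p >= L]
    return min(sizes, key=lambda N: cost_per_sample(N, L))
```

Under this model a 64 tap filter comes out cheapest at N = 512, at
roughly 22 operations per sample against 64 for the direct form.  A
real answer would need measured FFT costs and latency constraints.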


Entry: Stilson's Root Locus Adventure
Date: Sat Apr 20 12:48:59 EDT 2013

I can see how it gives some insight.  Building linear and circular by
doctoring root locus equations does look like an interesting approach.
However, the subject itself seems rather unstructured, i.e. a lot of
ad-hoc stuff, and additionally, the idea of "constant-Q" controls seems
quite unnatural for discrete time filters.


Entry: Logarithmic oversampling
Date: Sat Apr 20 13:04:54 EDT 2013
Type: tex

One way of approximating analog filters is to use \emph{structural}
transformations that keep the analog domain's \emph{separation of
controls} property intact around $z=1$, but will degrade quickly when
the poles move to higher frequencies.

One can work against this by simply using oversampling.  However,
oversampling is linear in complexity, meaning that for oversampling
rate $N$, a feedback matrix $A$ would be multiplied $N$ times.

It might be possible to compute $A^N$ more efficiently using
successive squaring when $N=2^n$.  However, this works only for low
system orders or for structured matrices, since dense matrix
multiplication complexity is $O(n^3)$.  E.g. it works beautifully for
orthogonal $2$D matrices since there are only two parameters to keep
track of, but that might be the limit.  Alternatively, we can compute
$r(c+is)$ for each pole pair approximately from the low frequency
parameter equation.
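
As a sketch of the orthogonal $2$D case: a scaled rotation is fully
described by the complex number $r(c+is)$, and squaring the matrix
squares that number, $$\matrix{c & -s \\ s & c}^2 = \matrix{c^2 - s^2
& -2cs \\ 2cs & c^2 - s^2},$$ so one doubling step is $(c, s) \to (c^2
- s^2, 2cs)$ together with $r \to r^2$.  After $n$ such steps the pair
represents the $2^n$--fold update at a cost that grows with $n$ instead
of $2^n$.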


Entry: SVF and Symplectic Update
Date: Sun Apr 21 10:17:44 EDT 2013
Type: tex

Something I read a while ago, about constructing an approximation to a
conservative system by performing the difference equation update in a
\emph{ping--pong} fashion, i.e. using speed to update position and
then the new position to update speed, instead of performing the
updates in parallel.  Supposedly, this preserves the \emph{symplectic
structure}.  I lost the reference, so this is an attempt to
reconstruct it.

This might be related to approximations to 2nd order CT filters
mentioned in Stilson's PhD thesis[1], where two integrators are
replaced with a backward and forward difference each.  A similar thing
happens here: the update equations are no longer parallel, since
$x[k+1]$ (first integrator, BP output) is used in the computation of
$y[k+1]$ (second integrator, LP output).

It's probably best to first define what \emph{symplectic structure}
means, and then to see how it can be preserved in the analog to
digital conversion.  What I remember is that it is important to be
able to factor the update into two successive triangular updates.
This is exactly what happens in the mixed FW/BW SVF discretization.
At full resonance $k=0$ (we're talking about lossless systems only) in
the update

$$ \begin{array}{ccl}
z x_1 &=& x_1 - a (k x_1 + x_2 - i) \\
z x_2 &=& x_2 + a z x_1 \\ \end{array},$$
we have two consecutive
operations which hide behind the presence of $z x_1$ in the equation
(as opposed to $x_1$).  This succession of updates can be represented
by two triangular matrices.

$$
\begin{array}{ccc} A = L U, & L = \matrix{1 & 0 \\ a & 1}, & U =
\matrix{1 &-a \\ 0 & 1} \end{array}.$$

Note that this matrix is not orthogonal, i.e. $A^TA \neq I$.  However, it
is symplectic.  A 2D discrete system with feedback matrix $A$ is
symplectic if $A^T J A = J$ with $$J=\matrix{0 & 1 \\ -1 & 0}.$$ This
property is satisfied by the conservative forward/backward SVF.  Note
that $J^2 = -I$.
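
As a sketch of the check, each triangular factor is symplectic by
itself: $$L^T J L = \matrix{1 & a \\ 0 & 1} \matrix{0 & 1 \\ -1 & 0}
\matrix{1 & 0 \\ a & 1} = \matrix{1 & a \\ 0 & 1} \matrix{a & 1 \\ -1
& 0} = J,$$ and similarly $U^T J U = J$, so that $$A^T J A = U^T (L^T
J L) U = U^T J U = J.$$ More generally, any real $2 \times 2$ matrix
$M$ satisfies $M^T J M = (\det M) J$, so in 2D being symplectic is the
same as having $\det M = 1$.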

Because $\det A = 1$ and $A \in R^{2 \times 2}$, if the poles are
complex conjugate they have to be magnitude $1$ so the system is still
stable.

So, is a symplectic system always stable?  In general, $A \in R^{2N
\times 2N}$ is symplectic if $A^T J_N A = J_N$ where $$J_N = \matrix{0 &
I_N \\ -I_N & 0}$$ and $I_N$ is the $N \times N$ identity matrix.  From
this we have $\det A = 1$.  So it is conservative in that state energy
is preserved, but that doesn't mean the individual 2D subsystems
should be energy--preserving.  In [2], p282 it is explained how for
each eigenvalue $\lambda$, the inverse $\lambda^{-1}$ is also an
eigenvalue.

An example of a 4D symplectic matrix can be constructed from a 2D
symplectic matrix $A_1^T J_1 A_1 = J_1$ by constructing the matrix
$$A'_2 = \matrix{ r A_1 & 0 \\ 0 &  r^{-1} A_1 }.$$ Straightforward
computation shows that ${A'_2}^T J'_2 A'_2 = J'_2$ where $$J'_2 =
\matrix{0 & J_1 \\ J_1 & 0},$$ which is antisymmetric and a permutation
of $J_2$, meaning $J'_2 = P^T J_2 P$.  From this follows that $A_2 = P
A'_2 P^T$ is symplectic.  Since $P$ is orthogonal, $A_2$ and $A'_2$
have the same eigenvalues.  The eigenvalues of $A_1$ are unit norm
complex conjugate which makes the eigenvalues of $A_2$ complex
conjugate pairs of magnitude $r$ and $r^{-1}$ respectively.


% [1] https://ccrma.stanford.edu/~stilti/papers/TimStilsonPhDThesis2006.pdf
% [2] http://mitpress.mit.edu/sites/default/files/titles/content/sicm/book.html

Entry: Determinant 1
Date: Sun Apr 21 11:46:27 EDT 2013
Type: tex

I've been running into the matrix $$A=\matrix{1 & -a \\ a & 1-a^2} =
\matrix{1 & -\sin\theta \\ \sin\theta & \cos^2\theta}$$ in several
places.  It has a determinant of $1$\footnote{As an aside, I've been
making the error of setting $\| A x \|$ equal to $\| x \| \det A$. Norm
is only preserved for orthogonal matrices.  Given the similarity
transform $A = S^{-1} \Lambda S$, it does follow that if $\Lambda$ is
orthogonal then it preserves norm.  Feeding an arbitrary value to
$A^n$ will produce some norm distortion, but it will not "blow up" if
$n$ becomes large.}.


For a real matrix $A$, if $\det A = 1$ it doesn't necessarily mean
that $A$ is orthogonal, i.e. $A^T A = I$.  However, if $A \in R^{2
\times 2}$ with complex eigenvalues, there aren't enough degrees of
freedom for the eigenvalues to be anything else than complex conjugate with
magnitude 1.  This means the system with feedback matrix $A$ is
stable.  With $c = \cos\theta$, the poles are $$c' \pm i
\sqrt{1-c'^2}, \text{ with } c' = \frac{1 + c^2}{2}.$$ It might be
interesting to look at the eigenvectors too.


Entry: SVF decay
Date: Sun Apr 21 14:01:42 EDT 2013

How to separate out the SVF decay parameter?  The symplectic form of
the SVF discretization has the nice property that the stability is
preserved, however this only says something about conservative
systems.  How to separate out the non-conservative part?  ( Does this
actually make sense? )


Entry: Symplectic update.  What to learn?
Date: Sun Apr 21 14:09:14 EDT 2013
Type: tex

The important idea is that the ping-pong update can transform a CT
conservative system into a DT conservative system.  For the 2D system,
this gives us a simple way to synthesize a stable oscillation using
only two multiplications.  For higher degrees, it doesn't seem all
that useful, since such a system is just a product of independent 2D
systems.
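
As a sketch, the two--multiplication oscillator is the update
$$\begin{array}{rcl}
x_1 &\leftarrow& x_1 - a x_2 \\
x_2 &\leftarrow& x_2 + a x_1
\end{array}$$
where the second line uses the already updated $x_1$.  The
corresponding feedback matrix has determinant $1$ and trace $2 - a^2$,
so for $|a| < 2$ the poles are $e^{\pm i\omega}$ with $2\cos\omega = 2
- a^2$, or $$a = 2 \sin \frac{\omega}{2}.$$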

It seems that the usefulness of this approach would shine more for
\emph{nonlinear} conservative systems, which can be unstable in local
linear approximation, but always conservative such that an unstable
pole with magnitude $r$ is always compensated by a stable pole with
magnitude $r^{-1}$.

Also, for systems that do not have a direct correspondence to physical
position/velocity models, such as the Moog 4--pole, the connection
probably isn't there.


Entry: Symplectic integrator
Date: Wed Apr 24 16:33:00 EDT 2013

Once you find the right Google term...

[1] http://www.av8n.com/physics/symplectic-integrator.htm
[2] http://en.wikipedia.org/wiki/Symplectic_integrator
[3] http://en.wikipedia.org/wiki/Symplectic_Euler_method


Entry: Equalizers
Date: Wed Apr 24 17:07:54 EDT 2013

Peak/notch: 2nd order allpass  0 -> 180 -> 360
Shelving:   1st order allpass? 0 -> 180


Entry: All-pole
Date: Mon Apr 29 23:10:22 EDT 2013

There is something I do not understand about allpole filters.  Somehow
it seems that the most important property of an all-pole is that it is
the inverse of an all-zero.

An all-pole response is still a sum of complex exponentials, but
somehow they do not cancel out.  How exactly is this manifested?


Entry: Stretched exponentials: scaling attack & decay times
Date: Fri May  3 14:44:43 EDT 2013
Type: tex

Basic idea is to provide a scale that is mostly log--like, but can
express the extremities $0$ and $\infty$.

How to map this time scale $T = [0,\infty]$ to a control parameter
range $C=[0,1]$ in a meaningful way? A mapping that satisfies the
interval boundaries is $$c(t) = \frac{t }{a + t}.$$ This maps $c(0)=0$
and $c(\infty)=1$.  The inverse is $$t(c) = \frac{a c}{1-c}.$$

The parameter $a$ can be determined by constraining the middle of the
scale $t(1/2) = a$.  Another important parameter is how fast the dial
will move to $0$ and $\infty$.  Both are extremes that are part of
$T$, but for say $50\%$ of the $T$ range we'd like to have decay rates
that change mostly exponentially in $c$.

To expose the symmetry in $t(c)$, let's introduce a change of variable
$d = 2c -1.$ This gives $$t(d) = a\frac{1+d}{1-d}.$$ This scale is
logarithmically symmetric around $a$ or $t(d)/a = (t(-d)/a)^{-1}$,
meaning that in the middle range it behaves mostly exponential, while
tending to $0$ and $\infty$ in the two extremes\footnote{ This hints
at the input--scaled variant $\frac{1+ax}{1-ax}$ being a good
approximation for $e^x$, which is the case when $a=\frac{1}{2}$.}.

The slope of $t(d)$, relative to $a$ is fixed.  Around $d=0$, the function
approximates $a\exp(2d)$.  For mapping meaningful parameters, this
might be a bit too flat.  Successive squaring of $t_0(d) = (1+d)/(1-d)$ can
solve this.  For $n$ squarings we have $\exp(2^{n+1}d)$.  The curve then becomes
$$t_n(d) = a t_0(d)^{2^n}.$$

In practice it seems that a single squaring works well.  This gives a
reasonably flat log response for $3$ decades, about $70\%$ of the
scale, leaving the rest for the extreme range.  If necessary, the
extremities can be avoided by prescaling $d$.  With one squaring,
using $0.99$ limits the output range to about $9$ decades.
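
These numbers can be checked directly.  With one squaring, $t(d)/a =
[(1+d)/(1-d)]^2$, and $$\begin{array}{ll}
d = 0.7:  & (1.7/0.3)^2 \approx 32 \approx 10^{1.5}, \\
d = 0.99: & (1.99/0.01)^2 \approx 4.0 \times 10^4 \approx 10^{4.6}.
\end{array}$$
By the symmetry $t(-d)/a = (t(d)/a)^{-1}$, the range $|d| \leq 0.7$
spans about $3$ decades and $|d| \leq 0.99$ about $9.2$ decades.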

Mapping this to pole radius requires an extra step.  Let's use the
natural $1/e$ decay to relate decay time $t$ (measured in samples) to
pole $p$ as $p^t = 1/e$ or $$p = \exp(- \frac{1}{t}).$$

This approximation needs to be accurate for $t \gg 1$, and extend
correctly to $p=0$ at $t=0$.  The first degree Taylor expansion is $1
- 1/t$.  Modifying this slightly to give $$p' = 1 - \frac{1}{t+1}$$
yields the wanted behavior at $t=0$ without changing the large $t$
behavior too much.  For numerical reasons the update equations will
use the positive quantity $$q' = \frac{1}{t+1},$$ where $p' = 1 -
q'$. Composing the two mappings gives $$q_n(d) =
\frac{1}{1+a(\frac{1+d}{1-d})^n},$$ where $a$ gives the mid--scale
time constant in samples and $n$ is here the exponent itself, so that
one squaring corresponds to $n=2$.

I'm using $n=2$, but some knob twiddling makes me think that maybe
$n=1$ is better.  What is important is to get the mid--scale value
correct.  I.e. what is a prototypical note's attack and decay rate?

To find the warping


% [1] entry://20130321-160005


Entry: Derivative of log
Date: Mon May  6 00:55:38 EDT 2013
Type: tex

One of the things I never really understood properly, is
$$D \log(x) = \frac{1}{x}.$$

This essentially comes from two properties: derivatives of inverses
are related through multiplicative inverses, and the exponential
function is an eigenfunction of the derivative operator.

The expression of the derivative of the inverse $Df^{-1}$ can be
obtained from the chain rule as $$D (f \circ f^{-1}) = [(D f) \circ
f^{-1}] D f^{-1}.$$ Since this also expresses the derivative of the
identity function, we have $$D f^{-1} = \frac{1}{(D f)\circ f^{-1}}.$$
This relation is obvious for a linear function, e.g. $y=ax \Rightarrow
x=a^{-1}y.$ Also, following the composition of the 3 operations
$f^{-1}$, $Df$ and $x \to 1/x$ on a plot makes it fairly obvious that
what is going on is just obtaining the tangent line and expressing it
in the proper coordinate system.

The expression for $D\log$ follows straight from the rule of the
derivative of the inverse of $\exp$.
$$D \log(x) = \frac{1}{\exp [ \log (x)] } = \frac{1}{x}.$$


Entry: Gain knob
Date: Tue May  7 13:40:14 EDT 2013
Type: tex

The stretched exponential can also be used for Gain knobs, where you
would want near-exponential behaviour for the high gains, but linear
behavior for the low gains, to be able to reach $-\infty$dB at the
left.  The prototype can be derived from $[(1+x)/(1-x)]^n$, taking
only $x \in [-1, a]$, where $a \in [0,1].$ A typical dB control knob
has $(-\infty, 0, 15)$.  With $g$ the right--half gain
($15\text{dB}\approx 5.6$) the relation between $g$ and $a$ is $$g
\frac{1+\frac{a-1}{2}}{1-\frac{a-1}{2}} = \frac{1+a}{1-a}$$ which can
be simplified to $$g(a) = \frac{3-a}{1-a}$$ and $$a(g) =
\frac{3-g}{1-g}.$$ What's left is normalization, which is
straightforward to do.  Map desired input range to $[0,a]$ and use
$(1+a)/(1-a)$ to normalize the output.

It is remarkable that $g(a)$ is a self--inverse or \emph{involution}.
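
As a worked example for the $15$dB knob, $g \approx 5.62$ gives $$a =
\frac{3 - 5.62}{1 - 5.62} = \frac{2.62}{4.62} \approx 0.567,$$ and
substituting back, $$g(0.567) = \frac{2.433}{0.433} \approx 5.62.$$
So the prototype is evaluated on $x \in [-1, 0.567]$, with the $0$dB
point at the middle of that interval, $x = (a-1)/2 \approx -0.22$.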


Entry: Curious involution
Date: Tue May  7 16:18:04 EDT 2013
Type: tex

The function $$f(x) = \frac{x - a}{x - 1}$$ is an involution for $a
\neq 1$, meaning $f(f(x)) = x$.  Can this be put to practical use?
Where does it come from?

In general, functions $$\frac{ax - b}{cx - d}$$ with $ad - bc \neq 0$
are a closed set of bijections of the extended complex plane called
\emph{fractional linear transformations} or \emph{Moebius
transforms}[1].

The function $f$ maps $1\to \infty$, $\infty \to 1$, $0 \to a$ and $a
\to 0$.  This gives $4$ point relations for $f^2$: $1 \to 1$,
$\infty\to\infty$, $0\to0$ and $a\to a$.  Knowing that $f^2$ is also an
FLT, which is determined by its values at three points, leaves
$f^2(x) = x$ as the only possible solution.
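
The same result follows from a direct computation, sketched here for
$a \neq 1$: $$f(f(x)) = \frac{\frac{x-a}{x-1} - a}{\frac{x-a}{x-1} -
1} = \frac{(x - a) - a(x - 1)}{(x - a) - (x - 1)} = \frac{(1-a)x}{1-a}
= x.$$ For $a = 1$ the function degenerates to the constant $f(x) =
1$, which is why that case is excluded.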

% [1] http://en.wikipedia.org/wiki/M%C3%B6bius_transformation


Entry: Spinors
Date: Tue May  7 16:57:06 EDT 2013

Maybe it's time to tackle the question in [2]:
Why aren't Spinors used in signal processing?

[1] http://en.wikipedia.org/wiki/Spinor
[2] entry://20100428-224141


Entry: SVF revisited
Date: Tue May  7 19:11:15 EDT 2013

Some interesting notes[1].  One is the introduction of a band-limited
nonlinearity by a power detector.

[1] http://www.cs.washington.edu/education/courses/cse490s/11au/Readings/Digital_Sound_Generation_2.pdf


Entry: FDN Reverberation
Date: Sun May 12 18:41:58 EDT 2013

[1] http://www.music.miami.edu/programs/mue/mue2003/research/jfrenette/toc.html


Entry: Environmental synth
Date: Fri May 17 01:26:27 EDT 2013


[1] http://www.charlesverron.com/content/papers/2010_ieee_verron_TASLP_draft.pdf


Entry: Simulating reverb tail
Date: Wed May 22 10:41:16 EDT 2013

How to simulate the effect of a reverb tail in an oscillator?


Entry: Distance in high-dimensional vector spaces
Date: Thu Jun  6 14:46:17 EDT 2013

From [2]:

    ... under a broad set of conditions (much broader than independent
    and identically distributed dimensions), as dimensionality
    increases, the distance to the nearest data point approaches the
    distance to the farthest data point.

Is this related to the idea that the volume of a high dimensional
sphere[3] is small compared to its bounding cube?  I don't see the
connection..

The basic problem seems to be that high-dimensional
discrete-dimensional spaces are "very connected" in a way that is not
immediately clear in an intuitive sense.  Our usual 1,2,3-dimensional
view is very particular, and doesn't generalize to larger numbers.
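
A small experiment to get a feel for the quoted statement.  This is
only a sketch: it assumes points drawn uniformly from the unit cube
and Euclidean distance, which is just one of the settings covered by
[2], and the function name is made up for this note.

```python
import math
import random

def nearest_farthest_ratio(dim, points=200, seed=0):
    """Ratio of nearest to farthest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(points):
        p = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, p))
    return min(dists) / max(dists)
```

Comparing `nearest_farthest_ratio(2)` with
`nearest_farthest_ratio(1000)` should show the ratio moving from near
0 towards 1 as the dimension grows.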


[1] http://spin.atomicobject.com/2013/05/06/k-nearest-neighbor-racket/
[2] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.1422
[3] http://en.wikipedia.org/wiki/N-sphere#n-ball


Entry: Potential wells / nondeterministic chaos
Date: Fri Sep 27 18:57:24 EDT 2013
Type: tex

In a 2D potential well (e.g. a ``real'' well), does a system with
irregular bumps cause chaotic behavior?  I would think so, because
there can be directional divergence, which leads to amplification of
difference in initial conditions.  How to make this explicit?

Though I would expect there to be symmetry in solutions if there is
symmetry in the potential well.  Looks like this would be about
breaking of such symmetry.


Entry: Making things with maths - Steven Wittens
Date: Wed Nov  6 14:08:11 EST 2013

- CSS 3D
- js1k demo[2]
- navier stokes parallelized on GPU
- how does he do the animated presentations? (Bret Victor's + mathbox.js[5])
- glsl shader curves from projective numbers & rational functions
- nice visual demo of quadratic/cubic bezier curve
- folding space / hybrid fractals
- zoidberg curve[3]
- stacked value noise (spatial 1/f^n noise)
- verlet integration (preservation of the symplectic form on phase space)
- mandelbulber[4]
- GLSL sandbox


[1] http://www.youtube.com/watch?v=Zkx1aKv2z8o
[2] http://js1k.com/
[3] http://www.wolframalpha.com/input/?i=zoidberg+curve
[4] http://mandelbulber.com
[5] http://acko.net/blog/making-mathbox/


Entry: Power-efficient variable-rate signal processing
Date: Fri Dec 13 14:46:42 EST 2013

Say with a chip like the GA144.  Does it make sense to write DSP
algorithms that gracefully degrade with "silence"?


Entry: Image Processing Primer
Date: Mon Dec 16 18:21:43 EST 2013

[1] http://engineeringnotes.net/Notes/EE/ImageProcessing.pdf


Entry: The absurdity of categories (histograms)
Date: Mon Dec 16 18:49:15 EST 2013

Example: household expense categories.

Too coarse: no information
Too fine:   no information

There seems to be no intrinsic way to decide what level of detail is
good enough.  However, there is definitely some level of detail that
makes "most sense" in some handwavy way.

Where exactly does the sweet spot come from?

Is understanding nothing more than applying appropriate coarseness to
minimize some kind of connective / organizational property?


Entry: Transistor model
Date: Fri Dec 20 18:44:41 EST 2013

I want a good enough transistor model to run a circuit simulation in
real time on a recent PC.  What would be a good way to do this?  Just
use fixed step, high oversampling?


Entry: Impedance
Date: Fri Dec 20 22:12:03 EST 2013

Talked about this before: there's an interesting difference between
"unitless" difference equations and those that operate on two
variables like voltage and current.

There is a name for this.  I keep looking for it.  Was mentioned in my
2nd year course on systems theory.


Entry: I am a Strange Loop
Date: Thu Jan  9 22:32:48 EST 2014

Reading Douglas Hofstadter's sequel to Goedel, Escher, Bach.

Notes:

- Video feedback: where do the large-scale (1 second) stable patterns
  come from?  Similar to audio mixer feedback: 1/2 second patterns.


Entry: Where > and >= are the same.
Date: Tue Jan 28 10:17:19 CET 2014

Everyone tells you, you can't compare floats for equality.

This is an interesting problem in fact.  Equality doesn't really make
sense when your only functions are smooth functions, which is often
the case.  Non-smooth functions are often implementation artifacts,
such as mapping (wrapping) the real line onto a circle.  True
"thresholding" often needs smooth squashing functions without
discontinuities.

( TODO: Find out how this relates to smooth infinitesimal analysis[1]. )

[1] http://en.wikipedia.org/wiki/Smooth_infinitesimal_analysis


From rai/prim.h:

/* Note that this is based on float to int truncation, so is not the
   same as the floor() function from math.h

   For negative integers, the map is n -> n - 1.

   In practical use in numerical algorithms this difference is OK.
   Think of it this way: in the presence of noise, there is no real
   difference between > and >=.
*/

#define _ float
#define i_ int

INLINE i_ p_ifloor(_ a) {
    i_ i_truncate = (i_)a;
    i_ i_negative = a < 0;
    i_ i_floor = i_truncate - i_negative;
    return i_floor;
}

INLINE _ p_floor(_ a) {
    _ f_floor = (_)p_ifloor(a);
    return f_floor;
}

from rai/stream-lib.rkt:

;; Map phase value to the [0,1] representation interval.
;;
;; Note that floor(n) for n integer and n < 0 maps n -> n - 1.
;; This means that wrap01 for n integer and n < 0 maps n -> 1.
;;
;; This is OK for "phase" values, i.e. those that are at some point
;; inserted into cos(2*pi*phase) or sin(2*pi*phase), since all
;; integers map to a single point.

(define (wrap01 x) (- x (floor x))) 
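
A quick Python port of the two snippets above, just to make the
behaviour at negative integers visible.  This is a sketch for
illustration, not code from RAI; the names mirror the originals.

```python
def ifloor(a: float) -> int:
    """Truncation-based floor, mirroring p_ifloor: int() truncates toward zero."""
    truncated = int(a)
    negative = 1 if a < 0 else 0
    return truncated - negative

def wrap01(x: float) -> float:
    """Map a phase value to the [0,1] representation interval."""
    return x - ifloor(x)

# -2.0 maps to -3 here, where math.floor would give -2.
print(ifloor(1.5), ifloor(-1.5), ifloor(-2.0))
# Negative integers end up at 1.0 instead of 0.0.
print(wrap01(0.25), wrap01(-0.25), wrap01(-2.0))
```

For phase values both 0.0 and 1.0 land on the same point of the
circle, which is why the difference does not matter there.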


Entry: Integers don't exist
Date: Tue Jan 28 10:38:19 CET 2014

Time for some wacko intuitive math speculation..

I've always found the tension between integers and smooth functions
quite interesting.  Integers are "meta", in that they appear in the
theory _about_ smooth functions, but not in their domain.

The real world is a smooth domain, while the _study_ of the real world
is a discrete domain.  Integers have no reality!  <cough>

An interesting rule of thumb: whenever you run into integers while
doing signal processing, you're making a jump to "interpretation",
i.e. talking about discrete patterns in the behaviour of smooth maps.
In DSP or AI, domains that abstract measurements to interpretation,
integers and discrete structures are sometimes called "features".

Floating point math is best interpreted as an (approximate)
representation of smooth functions over a smooth domain.  The smoothness
maps well to the idea of rounding errors in floating point: errors
will not amplify as much, and equality of numbers has no precise
meaning, while equality or equivalence of smooth maps is
well-defined.

When thinking about signal processing, integers should always be
understood in relation to topology.  I wonder if this idea of a link
between floats and smooth domains can be made precise.


Entry: Exponential IDAC (EIDAC)
Date: Sat May 10 19:07:20 EDT 2014
Type: tex

A current sink can be constructed using an NPN BJT, which can pull
the voltage all the way down to 0.1V above ground.

The collector current $I_c = I_s \exp(V_{BE} / V_T)$ with $V_T = kT/q$
the thermal voltage, approximately 26mV at room temperature.  This
relation can be used to make an \emph{exponential} converter, which is
useful for controlling analog audio synthesis and processing circuits.

A typical range for audio frequencies is 4 decades which corresponds
to a $V_{BE}$ difference of $V_T \ln(10000)$, about 240mV.  For a
2N3904 NPN at 1mA, I measure $V_{BE}=630mV$.

It is possible to remove $T$ from the equation by using two thermally
coupled transistors to generate the high and low reference voltage for
a DAC.  These transistors are biased at respectively the maximum
current $I_+$ and the minimum current $I_-$.  The linear DAC will then
result in a geometrically interpolated current $$I / I_- = (I_+ /
I_-)^x \text{ with } x \in [0,1]$$

Temperature compensation is probably a good idea.  Sensitivity is
$$dI = -I(V_{BE} / V_T^2) dV_T = -I \frac{V_{BE}}{V_T} \frac{dT}{T}.$$
which is better written in relative form
$$dI/I = -\frac{V_{BE}}{V_T}dT/T.$$

Looking at ballpark figures $V_T=26mV$ at $T=300K$, $V_{BE}=630mV$ at
$I=1mA$ we have $V_{BE}/V_T \approx 24$ or a relative current error
temperature sensitivity of about $0.08/K$.  This is a thermometer!


If $I_- = I_s \exp(V_-/V_T)$ is derived from $I_+ = I_s \exp(V_+/V_T)$
through setting $V_- = r V_+$, then is $I_-$ still
temperature-dependent?  We have $$I_+ / I_- = \exp((1-r)V_+/V_T) =
\exp(V_+/V_T)^{1-r},$$ so if $I_+$ is regulated constant, the ratio
$V_+/V_T$ is regulated constant so $I_+ / I_-$ is constant as well.

This allows sending $V_+$ as a reference voltage to a DAC referenced
to ground.  Then using a non-inverting summing amp to average the DAC
output weighted $1-r$ with $V_+$ weighted $r$ gives $rV_+$.  For a
2N3904 at $1mA$, $r=2/3$ gives an $I_+/I_-$ scaling of
$\exp(630/(3\times26)) \approx 3000$.


Entry: Add a little frequency or a little period?
Date: Sun Jul  6 18:50:31 EDT 2014
Type: tex

Detuning by adding a little frequency is the same as subtracting a
little period.  With $p = \frac{1}{f}$ we have $$\frac{df}{f} =
\frac{d\frac{1}{p}}{\frac{1}{p}} = -\frac{dp}{p}.$$


Entry: Composable or Separable?
Date: Sat Sep  6 20:10:31 CEST 2014

When you say Composable, you really mean Separable.  I.e. we think of
whole solutions first (top down) then wish for a way to chop the
problem into non- or little-interacting parts.  A composable construct
is then the building material.


Entry: Settable and Non-Interfering Signal Functions for FRP - Daniel Winograd-Cort 
Date: Sat Sep  6 17:13:09 CEST 2014

Take an arbitrary signal function and turn it into a resettable signal
function by replacing all delays with resettable delays.

[1] https://www.youtube.com/watch?v=zgNRM8tZguY


Entry: Manifold music
Date: Sun Oct 19 19:14:45 EDT 2014

So.

1. one box with rotary encoders.  
2. one non-commutative manifold

use the knobs to navigate the manifold using local rotations
map the manifold coordinates to a musical generator


Entry: Circuit simulation
Date: Mon Oct 20 07:03:27 EDT 2014

Let's take a closer look at what is involved.

Some initial concerns:

  - How to turn network into procedural code? I.e. not all components
    are "directional".  Most are relational.

  - Opamp: probably won't work for set-point analysis (relaxation) if
    not reactive.
 

Start with some simple components:

 - Ideal opamp:

   4 terminals -> 8 degrees of freedom
   3 equations:
     I- = 0
     I+ = 0
     Vo - Vg = A (V+ - V-)

This is a scalar relation.  Lift it to an operator equation to also
support reactive components.

  - Ideal capacitor:

    2 terminals -> 4 degrees of freedom
    2 equations:
      I+ = - I-
      I+ = C d(V+ - V-)/dt

  - Ideal resistor:

    2 terminals -> 4 d.o.f.
    2 equations:
      I+ = - I-
      R I+ = V+ - V-


To solve: 

  - Collect nodes and equations.

  - If #eq = #unk, build diff matrix

  - eliminate algebraic parts?


Question: how to handle algebraic parts, i.e. eliminate variables to
prepare for solving the diff equation.  Is this necessary?  Can it all
be done in one (stupid) move?

They can be done in ping-pong.

  1. Pick initial conditions for all diff state variables.

  2. Solve algebraic equation for current setpoint

  3. Update diff state variables, goto 2.
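
A minimal sketch of the ping-pong scheme for the simplest case I can
think of: a constant voltage source driving a series R-C.  The circuit,
the component values and the forward Euler update are assumptions
picked for illustration.

```python
def simulate_rc(vs: float, r: float, c: float, dt: float, steps: int) -> float:
    """Series R-C driven by a constant source vs; returns the final capacitor voltage."""
    vc = 0.0                  # 1. initial condition for the diff state variable
    for _ in range(steps):
        i = (vs - vc) / r     # 2. algebraic step: static relations at the current set point
        vc += dt * i / c      # 3. time step: forward Euler update of the state
    return vc

# RC = 1 ms, simulated for 10 ms: the capacitor ends up close to vs.
print(simulate_rc(vs=1.0, r=1e3, c=1e-6, dt=1e-5, steps=1000))
```

Here the algebraic step is a single explicit expression.  In a real
network it would be the solution of a linear system.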


More questions:

- What to do for non-linear relations?  How to solve a set of
  nonlinear algebraic equations?

- What should the front end for this look like?  


The non-linear equations probably need to be solved iteratively.  I'm
guessing that the iteration step for the time direction and that for
the "space" direction can somehow be combined.

Is it possible to transform the nonlinearities themselves into just
time nonlinearities?  Just to make the whole equation more uniform.

Let's try for a diode.

  - Ideal diode:

    I+ = - I-
    I+ = Io e^[(V+ - V-)/VT]

Maybe introducing an extra variable here would work?

Another observation: currently, the only nonlinearity I'm interested
in is the exponential.  This has a special update property.  But
that's maybe not relevant.


Explore this first: transform an algebraic nonlinearity in an
approximate temporal one.

Essentially: an exponential can be computed as the static state of a
system.


Hmm... I'm thinking this might not work due to symmetry issues.  The
idea is to "open up" a nonlinear relation by exposing the
non-linearity.  Let's try again.

  y = f(x0) + f'(x0) (x - x0) + f''(x0) O[(x-x0)^2]

The idea is then to drop the O(.^2) term to compute an update for x
and y using the old values of f(x0) and f'(x0), and then recompute
these coefficients.
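
A sketch of that update loop for a static problem: a diode in series
with a resistor and a voltage source, solved by repeatedly
re-linearizing the exponential (this is Newton's method).  The circuit
and all component values are assumptions for illustration.

```python
import math

def solve_diode_resistor(vs=5.0, r=1e3, i_s=1e-12, vt=0.026, v0=0.7, iters=50):
    """Solve (vs - v)/r = i_s*exp(v/vt) for the diode voltage v by repeated linearization."""
    v = v0
    for _ in range(iters):
        f = i_s * math.exp(v / vt)    # f(x0): diode current at the old set point
        df = f / vt                   # f'(x0): slope of the exponential
        # Linearized relation f + df*(v_new - v) = (vs - v_new)/r, solved for v_new.
        v_new = (vs / r - f + df * v) / (df + 1.0 / r)
        if abs(v_new - v) < 1e-12:
            v = v_new
            break
        v = v_new
    return v

print(solve_diode_resistor())
```

The start value is deliberately chosen above the expected solution.
Starting far below it makes the first linearized step overshoot and
the exponential blow up.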


Though I'm still thinking about modeling a diode as a diode + series
coil and parallel capacitor.  If the values are in the correct range,
this would give the same low frequency behavior as an ideal diode, but
allow the nonlinearity to be isolated from the terminals.

           /---||---\
           |        |
 ---OOOO---o--|>|---o---

Starting this as an open circuit: L current = 0, C voltage = 0.


( Maybe a cause of problems when approximating circuits with finite
things is that when not band-limited, circuits can contain an infinite
amount of information? )


Entry: Circuit sim, a-symmetry of static NL transformation
Date: Tue Oct 21 09:35:13 EDT 2014

Basic idea: add inertia to a static nonlinearity to make simulation
simpler.


                 Ic>
             /---||---\
             |    C   |
       L     |        |
V1 ---OOOO---o--|>|---o--- V0
    I>       Vx   Id>

I = Id + Ic
L DI = V1 - Vx
Ic = C D(Vx - V0)
Id = f(Vx-V0)

6 variables, 4 equations -> 2 deg freedom

Compare to 3, 1 -> 2  (V1,V0,I) when L,C absent


So this transformation shields the diode from voltage transients on
V1,V0.

BUT

This introduces a-symmetry: the resulting 2-terminal can no longer be
driven by a current source.

Is that a problem?
And why is this a-symmetry choice so natural?


( There's a deeper philosophical question: why this a-symmetry in
thinking about impedance?  V and I are dual.  Why do we prefer voltage
sources? )


When making one side of the V/I dual, one needs to ask which is the
other.  What is the V/I dual circuit of this? Is it relevant?

How does this relate to I=f(V) V=f^-1(I) duality for the nonlinear
element?


Start with the linear approximation of Id = Is e(Vx/VT)

   Id/Is = e(Vx0/VT) (1 + [Vx-Vx0]/VT)

Here Vx0 is the value from the previous time step.  This approximation
is based on the assumption that Vx0 ~ Vx over the range of the update,
which for a diode is the right thing.

This also gives a retrofitted reason for a-symmetry:
at small time scales, a diode behaves as a voltage source.


PROBLEM SETTING:

Solving a circuit differential equation can be separated in "space"
and "time" steps.

- Space step satisfies static relations at a specific time, based on
  the current state of energy storage devices (modeled as current or
  voltage sources).

- The time step computes an approximate time evolution of the energy
  storage state based on the static update calculated in the previous
  step.

In this approach, the time update step is just function evaluation,
i.e. it is "local", while the spatial step involves the solution of a
system of equations, i.e. it is "global".

Nonlinearities that occur in the time (local) step are easy to
handle, but those that occur in the spatial (global) step require a
separate solution strategy: from linear to non-linear sets of
equations.

The idea then is to transform non-linearities from the spatial to the
temporal domain by surrounding them by energy storage devices,
essentially "low-pass filtering" the approximation errors.

This is done in such a way that the functioning of the system is not
compromised, i.e. the time constants added would be present in the
circuit anyway in parasitic amounts.


CONCLUSION:

( intuitive.. quantify this )

- a nonlinearity can be "low-pass shielded", but this introduces
  a-symmetry: is inertia added to voltage or current?

- express the static non-linearity in terms of the "slowed down"
  variable.  ( in the case above it would be Vx )


Entry: V/I Duality
Date: Tue Oct 21 09:52:18 EDT 2014

Is there a way to represent voltage and current in a single number?

Problem then is the idea of node.  A node can have a voltage, but not
a current.  Is there a way to capture the serial/parallel duality as
well?

I.e. what is a circuit modulo the V/I duality?[1]

[1] http://en.wikipedia.org/wiki/Quotient_space
 

Entry: A slowed-down diode
Date: Tue Oct 21 10:29:33 EDT 2014

Starting from the analog circuit:

                 Ic>
             /---||---\
             |    C   |
       L     |        |
V1 ---OOOO---o--|>|---o--- V0
    I>       Vx   Id>

  I = Id + Ic
  L DI = V1 - Vx
  Ic = C DVx - C DV0
  Id = f(Vx-V0)


The transformation from analog to digital is completely captured in
the equation:

   Id/Is = e(Vx0/VT) (1 + [Vx-Vx0]/VT)

Now, what about eliminating L and C, and trusting that the global
circuit has enough inertia to make this approximation work when
applied naively?


Is this just a very roundabout way of approximating every static
nonlinearity with a linearized version around the previous set point?

YES.

The key element is to equate the iteration process necessary for
solving a non-linear equation with the temporal one necessary to solve
a differential equation.

So the point is: the previous derivation with L/C filter localizes the
effect.  In practice, this is probably not necessary if the circuit is
slow enough.
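
A sketch of what "linearize around the previous set point" looks like
as a time stepper.  The test circuit is an assumption: source vs
through R into node Vx, with a capacitor and a diode from Vx to ground
(so the capacitor supplies the inertia and there is no L).  Each step
is backward Euler with the diode replaced by its linearization, which
leaves one linear equation per step and no inner iteration.

```python
import math

def step(vx0, vs=5.0, r=1e3, c=1e-6, dt=1e-6, i_s=1e-12, vt=0.026):
    """One time step; the diode is linearized around the previous value vx0:

        Id ~= I0 * (1 + (vx - vx0)/vt),   I0 = i_s*exp(vx0/vt)

    Backward Euler on  C dVx/dt = (vs - Vx)/R - Id  is then linear in vx.
    """
    i0 = i_s * math.exp(vx0 / vt)
    g = i0 / vt                                  # small-signal diode conductance
    num = c / dt * vx0 + vs / r - i0 + g * vx0
    den = c / dt + 1.0 / r + g
    return num / den

vx = 0.0
for _ in range(20000):
    vx = step(vx)
print(vx)   # settles where the resistor current equals the diode current
```

At the fixed point vx equals vx0, so the linearization error vanishes
and the static equation is satisfied exactly.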


Entry: Circuits: is connection essential?
Date: Tue Oct 21 10:46:34 EDT 2014

Trying to get a good intuition about the bridge between circuits
(impedance) and feed-forward difference equations.

Suppose the space->time approximation to linearize the connection
matrix is appropriate, is the "network quality" essential?

Basically, at each time step there is the solution of a system of
linear equations.  Is the "global nature" of this equation essential
to the behavior?

In other words: can such a system be factored to localize (pipeline!)
dependencies?

The idea of "pushing things into the past" is quite central when
thinking about distributed computation.


In practice: the connection is a matrix where coefficients might
depend on past behaviour (making them essentially constants).
What about factoring this matrix?


I.e. for linear systems it's possible to decouple.
For nonlinear ones there might be an essential "global effect".

Nonlinearity globalizes.


Entry: Spinor Fourier Transform
Date: Tue Oct 21 14:40:06 EDT 2014

  Our approach relies on the three following considerations: 

  - mathematically speaking, defining a Fourier transform requires to
    deal with group actions;

  - vectors of the acquisition space can be considered as generalized
    numbers when embedded in a Clifford algebra;

  - the tangent space of the image surface appears to be a natural
    parameter of the transform we define by means of so-called spin
    characters.

[1] http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6507537&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel7%2F4200690%2F6558795%2F06507537.pdf%3Farnumber%3D6507537


Entry: About static nonlinearities and aliasing
Date: Sun Oct 26 23:12:28 EDT 2014

Johannes' blep clipper can possibly be generalized by polynomial
modeling of signals combined with polynomial distortion.  Evaluating
those polynomials at say 8 points and then combining them using a
low-pass filter.


Entry: Conjugate gradient
Date: Tue Nov  4 22:02:38 EST 2014

Basic idea:

- 1-D minimum is easy

- compute 1-D minimum across n orthogonal directions

- here orthogonal is actually conjugate = orthogonal in metric
  (ellipsoid) defined by convex optimization


Formulated as convex optimization problem -> this is why it only works
with symmetric, positive definite matrices.
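
A textbook-style sketch of the iteration in plain Python (lists
instead of a matrix library), assuming A is symmetric positive
definite.  The 2x2 example system is made up.

```python
def conjugate_gradient(A, b, tol=1e-12):
    """Solve A x = b for symmetric positive definite A (given as a list of rows)."""
    n = len(b)
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    matvec = lambda M, v: [dot(row, v) for row in M]

    x = [0.0] * n
    r = list(b)              # residual b - A x for the start value x = 0
    p = list(r)              # first search direction
    rs = dot(r, r)
    for _ in range(n):       # at most n conjugate directions in exact arithmetic
        if rs < tol:
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)                        # exact 1-D minimum along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # Next direction: new residual made conjugate (A-orthogonal) to p.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```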


Entry: Coming together
Date: Fri Nov  7 04:35:42 EST 2014

Some weird synchronicity

Looking into circuit theory lately.  From a restless night of
meandering ended up at John Baez' talks about network theory.

Forth and dup/swap/drop in John's diagrams[1] at around 14:00

Got a little jolt after seeing the word 'Fourth' on the slide.

Then it goes on explaining the "xor trick" for swapping two variables,
only using negation instead of xor.

Then it gets weird: bends as relations?
How would that work in Forth?

And how do classical mechanics and nonlinear circuit theory relate?


[1] http://math.ucr.edu/home/baez/networks_oxford/#I


Entry: Symplectic Integration
Date: Sun Dec 14 08:57:31 EST 2014
Type: tex

THIS NEEDS EDIT

So what does it all mean?  These are notes from/for an email
discussion with Kragen[1].

Before going into my notes again: the point of SI as I remember is to
preserve some symmetry / invariants when translating a continuous
system to a discrete one.


The way I understand it is that symplectic structure is a
generalization of the idea of complex numbers.  See $J^2 = -I$ in [3].
Complex numbers are intimately related to 2D rotations / circles.  A
linear conservative system is a product of circles (torus).

Glossing over details: The time evolution of a system can be expressed
as the exponential expressed in terms of the (generalized) complex
unit, a hamiltonian and time. Essentially this is a generalization of
the 1-D case: $y(t) = y_0 e^{-j w t}$, where $w$ is related to the
square root of the energy of the system (= hamiltonian).

Now if you move from linear to nonlinear, the symplectic structure
will be only local - i.e. a property of derivatives.  This is where
symplectic manifolds come in.

So symplectic integration is a way to preserve that local symplectic
structure in the hope to keep closer to the real solution when you do
discretely integrate a nonlinear equation through successive
linearization.


I'd like to find a better way to relate all these things.  I find it
very intriguing but don't have a full picture in my head..


EDIT:  To check
- Poisson bracket
- Canonical transformations


% [1] http://twitter.com/kragen
% [3] http://zwizwa.be/-/math/20130421-101744


Entry: LU, Cholesky and Hamiltonians
Date: Tue Dec 16 09:48:00 EST 2014
Type: tex

Since the LU decomposition of a simple example symplectic euler
system[1] has a lot of structure, can this be translated to structure
of matrices?

The structure is $(I+E^T)(I-E)$, where $E$ is nilpotent $E^2=0$.


% [1] entry://20130421-101744


Entry: Hamiltonian matrix
Date: Tue Dec 16 10:26:44 EST 2014

[1] http://en.wikipedia.org/wiki/Hamiltonian_matrix


Entry: Simplest symplectic manifolds
Date: Tue Dec 16 10:33:40 EST 2014
Type: tex

Let's take a constructive approach to get a better idea of how the
spaces (manifolds) and Hamiltonians $H$ relate.

The simplest space would be a 2D plane $(q,p)$, where $q$ denotes
position and $p$ denotes momentum.  Ignoring constants, this is the
phase space of a free point particle traveling on a line $H(q,p)=p^2$,
and the harmonic oscillator $H(q,p)= q^2 + p^2$.

In these simple contexts, what does $dp \wedge dq = 0$ mean?

A more interesting question might be: why is the symplectic Euler
discretization stable?  And why is it slightly skewed?


Entry: computation is topology
Date: Mon Dec 29 22:02:27 EST 2014

Not sure how to make that explicit, but if you look even at just the
practicals: most effort in electronics and programming is about wiring
things together.  Somehow coming from a DSP background I got misled
by the whole "FLOPS" approach: how many arithmetic operations can the
thing do per second.


Entry: PLL stability
Date: Sat Mar  7 19:13:29 EST 2015

Trying to wrap my head around stability issues in a PLL design.  The
question that comes to mind is: how can it ever synchronize if the
initial phase variation is so large that the filter can't "catch"
it?  I.e. I understand that a PLL is stable once it is locked, as it
is close to its linear approximation, but how can it lock in the first
place?

How good does the initial guess need to be?

It seems that the simplest PLL has 2 design variables: low pass cutoff
and loop gain.  My case has another one: frequency offset.  It seems
that offset can be a problem if it gets too far away from the locking
range.

From [1] it seems all filters are integrators.


[1] http://cp.literature.agilent.com/litweb/pdf/ads2008/dgpll/ads2008/PLL_DesignGuide_Reference.html
[2] https://electronics.stackexchange.com/questions/86059/what-is-phase-lock-looppll-lock-range-capture-range


Entry: DPW
Date: Mon May 25 11:51:47 CEST 2015

Google for "Differentiated Parabolic Waveform": 1/p * diff((p-1/2)^2)
There is an implementation in RAI: look for saw-d2 in
https://github.com/zwizwa/rai/blob/master/synth-lib.rkt
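
A minimal Python sketch of the idea, not a port of saw-d2.  It assumes
the phase runs over [0,1) and reads the scale factor in the formula
above as one over the phase increment per sample; both are my
assumptions.

```python
def dpw_saw(freq: float, sample_rate: float, n: int) -> list:
    """Sawtooth via a differentiated parabolic waveform, output within [-1, 1]."""
    inc = freq / sample_rate          # phase increment per sample
    phase = 0.0
    prev = (phase - 0.5) ** 2
    out = []
    for _ in range(n):
        phase += inc
        phase -= int(phase)           # wrap to [0, 1); phase is never negative here
        parabola = (phase - 0.5) ** 2
        out.append((parabola - prev) / inc)   # first difference, scaled by 1/inc
        prev = parabola
    return out

samples = dpw_saw(440.0, 48000.0, 1000)
print(min(samples), max(samples))
```

The parabola is continuous across the phase wrap, so the difference
stays bounded there.  That is where the reduction in aliasing compared
to a naive ramp comes from.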


Entry: deep learning
Date: Sun Jul  5 22:57:36 EDT 2015

[1] https://charlesmartin14.wordpress.com/2015/03/25/why-does-deep-learning-work/
[2] https://charlesmartin14.wordpress.com/2015/04/01/why-deep-learning-works-ii-the-renormalization-group/e-t
[3] https://www.quantamagazine.org/20141204-a-common-logic-to-seeing-cats-and-cosmos/


Entry: Commutation
Date: Sun Aug  2 18:56:32 EDT 2015

See in Haskell, meta/siso: e.g. commutation between representation and list:

A list of representations of e

   :: l (r e)

is not the same as a representation of a list of e

   :: r (l e).


Entry: category theory / algebraic topology
Date: Wed Sep 16 20:51:26 EDT 2015

https://lobste.rs/s/6gfcsd/category_theory_a_gentle_introduction


Entry: Complex derivative hack
Date: Sun Oct  4 16:57:50 EDT 2015

https://codewords.recurse.com/issues/four/hack-the-derivative


Entry: Bisecting intervals with gaps
Date: Fri Nov 20 12:09:18 EST 2015

Find maximum value on a strictly increasing sequence on a circle (last
element in a circular buffer).

This works by successive bisection refinement, picking an initial
interval by arbitrarily picking a point:
(M > L) ? {M,R} : {L,M}


3 4 5 6 7 0 1 2 3
L       M       R
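
A sketch of the gap-free case in Python, assuming the buffer holds
each element once (so the wrapped-around R in the picture is not
stored as a duplicate of L) and all values are distinct.  The function
name is made up.

```python
def index_of_max(buf):
    """Index of the largest value in a strictly increasing sequence stored circularly."""
    lo, hi = 0, len(buf) - 1
    if buf[lo] <= buf[hi]:
        return hi                 # no wrap inside the interval: last element is the maximum
    # Invariant: buf[lo] > buf[hi], so the maximum lies in [lo, hi).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if buf[mid] > buf[lo]:
            lo = mid              # (M > L) ? {M,R}
        else:
            hi = mid              #         : {L,M}
    return lo

print(index_of_max([3, 4, 5, 6, 7, 0, 1, 2]))
```

Each iteration reads one element, so the number of reads is
logarithmic in the buffer size.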


How to adjust this to be able to bisect ranges with gaps in them

0 1 2 3 4 5 6 7 8 9  index
3 4 5 . 6 7 0 1 2 3  value
L                 R

The locations of the gaps are not known beforehand.  Reading is
expensive.

What to do when index 3 is read?

Guess:
- switch to decision rule:
   M >= L ? {M,R} : {L,M}
- copy value from next block


Entry: CT and declarative programming
Date: Mon Jan 11 16:00:33 CET 2016

http://bartoszmilewski.com/2015/04/15/category-theory-and-declarative-programming/


Entry: Hamiltonian signals
Date: Sat Jan 23 20:27:19 EST 2016

- Chaotic
- Symplectic update through first order ODEs derived from Hamiltonian
- Also use Hamiltonian for re-normalization using constant energy constraint
- For oscillators: tune simulation parameters based on frequency estimator / regulator.
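
A minimal sketch of the symplectic update for a separable Hamiltonian
H(q,p) = T(p) + V(q), using symplectic Euler.  The harmonic oscillator
is only a stand-in test system; function and parameter names are made
up.

```python
def symplectic_euler(q, p, dt, steps, dH_dq, dH_dp):
    """Integrate dq/dt = dH/dp, dp/dt = -dH/dq with the symplectic Euler scheme."""
    for _ in range(steps):
        p = p - dt * dH_dq(q)     # update momentum from the old position
        q = q + dt * dH_dp(p)     # update position from the NEW momentum
    return q, p

# Harmonic oscillator H = (q^2 + p^2)/2.
H = lambda q, p: 0.5 * (q * q + p * p)
q_end, p_end = symplectic_euler(1.0, 0.0, dt=0.1, steps=10000,
                                dH_dq=lambda q: q, dH_dp=lambda p: p)
print(H(1.0, 0.0), H(q_end, p_end))
```

The energy is not exactly constant but stays in a narrow band around
its initial value, which is what makes a re-normalization step cheap.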

How to generate interesting systems?

Swinging Atwood's machine:
https://www.youtube.com/watch?v=3ajr5Fb2i1g
https://en.wikipedia.org/wiki/Swinging_Atwood's_machine

Pendulum's reactive centrifugal force counteracts the counterweight's
weight.  The force tends to get quite high corresponding to the high
angular velocity (slingshot), and a smooth "bounce" on the
counterweight.

The ODEs have singularities for points passing through r=0.  

I wonder if it is possible to modify the geometry to remove those
singularities.

E.g. instead of the 1/r^2 in the Hamiltonian, to use a 1/(x+r^2)

Phase-space is 4D, but due to energy preservation, the points are
constrained on a 3D manifold.


For the 3-body problem, as e.g. in these two fixed masses:
https://www.youtube.com/watch?v=uN3IwAD1hdQ

It is possible to guarantee boundedness based on total energy.
Because the singularities are points, the chance that a randomly
chosen initial condition passes through a singularity is zero.
However, due to finite approximation it is not.  Regularization is
probably a good idea.  ( Regularize as a bounce?  Or smooth out the
hamiltonian. )

For integrated simulation, the singularity actually doesn't matter as
the infinite force only happens for an instant.  What actually
happens when a point passes through the singularity?


Large-scale n-body:
https://www.youtube.com/watch?v=byI9yhITDsM


Entry: Billiard ball chaos
Date: Sat Jan 23 21:40:56 EST 2016

See [1].  Chaos is possible if the triangle (polygon?) does not tile
the plane (or tiles the plane chaotically?).

The geometry generated by a non-tessellating polygon is "piece-wise
curved".  What's a proper term for that?

Some other related [2][3].

[1] https://www.youtube.com/watch?v=3ajr5Fb2i1g
[2] https://plus.maths.org/content/chaos-billiard-table
[3] https://en.wikipedia.org/wiki/Ergodicity


Entry: Colliding point masses.
Date: Sat Jan 23 22:58:12 EST 2016

What happens in the equations of motion if two point masses collide?
They will move past each other in a straight line, and will have only
an infinitely small moment of infinite force.  How to represent that
without the singularity?

A way to look at this is to parameterize the phase space trajectory in
terms of a regularization of the potential energy:

1 / (a + r^2)

And then take the limit of that trajectory as a->0.


( Can tricks be used as in infinitesimal analysis? )

Related?
http://mathoverflow.net/questions/97391/how-to-deal-with-the-singular-reduction-of-the-hamiltonian-n-body-problem


Wait..


Entry: base64
Date: Thu Feb 25 23:39:09 EST 2016

6 bits per character

2*3 vs 2*2*2

*4     *3

so 4 characters encode 3 bytes.
how is padding handled?

https://en.wikipedia.org/wiki/Base64#Padding
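
A quick check with Python's standard base64 module: 3 bytes give 4
characters, and a final group of 1 or 2 bytes is padded with '=' up to
a multiple of 4 characters.

```python
import base64

for data in (b"abc", b"ab", b"a"):
    encoded = base64.b64encode(data)
    # 3 bytes = 24 bits = 4 characters of 6 bits each; short final groups get '='.
    print(len(data), encoded, base64.b64decode(encoded) == data)
```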


Entry: Cascaded integrator-comb filter
Date: Wed May 18 18:41:08 EDT 2016

Replaces the standard interpolation/reconstruction filter with
integrator/differentiator (comb) pairs.

Effectively creating a moving average (rectangular) FIR filter.

1-stage: rectangular
2-stage: triangular
3-stage: 2nd order bump
...
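
A small sketch of the stage count versus impulse response shape,
without any rate change.  A real CIC puts the integrators and combs on
opposite sides of the rate changer; pairing them per stage as done here
is an assumption that is equivalent only because everything is linear
and runs at one rate.

```python
def cic(x, length, stages=1):
    """Cascade of `stages` integrator/comb pairs, each with comb delay `length`."""
    for _ in range(stages):
        acc = 0
        integrated = []
        for v in x:                       # integrator: running sum
            acc += v
            integrated.append(acc)
        # comb: y[n] = s[n] - s[n-length]
        x = [integrated[n] - (integrated[n - length] if n >= length else 0)
             for n in range(len(integrated))]
    return x

impulse = [1] + [0] * 9
print(cic(impulse, 4, stages=1))   # rectangular impulse response
print(cic(impulse, 4, stages=2))   # triangular impulse response
```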


[1] https://en.wikipedia.org/wiki/Cascaded_integrator%E2%80%93comb_filter


Entry: Linear Logic - Session Types
Date: Sat Aug  6 22:16:09 EDT 2016

http://homepages.inf.ed.ac.uk/wadler/topics/linear-logic.html
https://www.youtube.com/watch?v=IOiZatlZtGU (Q&A)
https://www.infoq.com/presentations/category-theory-propositions-principle

CCC : we can treat functions as data


Entry: What is it like to understand advanced mathematics?
Date: Sun Aug 14 20:17:50 EDT 2016

https://www.quora.com/What-is-it-like-to-understand-advanced-mathematics/answers/873950?srid=p6KQ&share=1


Entry: Inplace matrix transposition
Date: Mon Aug 29 12:49:44 EDT 2016

https://en.wikipedia.org/wiki/In-place_matrix_transposition


Entry: Packetized TDM as coordinate transformations
Date: Mon Aug 29 23:46:13 EDT 2016

For lack of better name, packetized TDM meaning: to transmit a number
of channels in block form, e.g. N samples from channel 0, followed by
N from channel 1.

Just a practical format I ran into, but this is to hint at the general
idea of chunking which tends to happen in nested form and can get
confusing.  Therefore it makes sense to look at TDM as coordinate
transformations.


An important thing to realize is that this form of TDM can be seen as
a 2-step process: the introduction of an extra dimension followed by
the removal of another:

1) packetize: splitting t into a 2-dimensional (p,i) coordinate
2) multiplex: flattening (c,p) into a 1-dimensional p' coordinate


(c,t)    // channel(finite), time(infinite)

 <->

(c,p,i)  // channel(finite), packet(infinite), index(finite)

 <->

(p',i)   // flattened packet+channel(infinite), index(finite)


With N samples per packet and C channels:

p  = t div N
i  = t mod N
p' = p * C + c


c = p' mod C
p = p' div C
t = p * N + i
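
A round-trip sketch of the two coordinate maps in Python.  It assumes
N samples per packet, C channels, and the flattening p' = p * C + c
(packets of the same time block are sent channel after channel); the
function names are made up.

```python
def to_stream(c, t, N, C):
    """(channel, time) -> (flattened packet index p', index i within the packet)."""
    p, i = divmod(t, N)          # packetize: t -> (p, i)
    return p * C + c, i          # multiplex: (c, p) -> p'

def from_stream(pp, i, N, C):
    """Inverse map: (p', i) -> (channel, time)."""
    p, c = divmod(pp, C)
    return c, p * N + i

N, C = 4, 3                      # example values
assert all(from_stream(*to_stream(c, t, N, C), N, C) == (c, t)
           for c in range(C) for t in range(5 * N))
```

The position of a sample in the serial stream is then p' * N + i.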


Entry: Category Theory lectures - Bartosz Milewski
Date: Fri Sep  2 19:12:51 EDT 2016

1.1 https://www.youtube.com/watch?v=I8LbkfSSR58

Compares Kmett's way of generating libraries from CT to C++ TMP
deriving from functional programming ideas: implementation in one
language (concrete) derived from ideas in the higher level
mathematical structure.

1.2 https://www.youtube.com/watch?v=p54Hd7AmVFU

Everything about a category is encoded in the composition table of the
morphisms.


2.1 https://www.youtube.com/watch?v=O2lZkr-aAqk

2.2 https://www.youtube.com/watch?v=NcT7CGPICzo


Entry: Running mean/variance
Date: Mon Jan  2 18:02:56 EST 2017

2s^2 = (a - m)^2 + (b - m)^2 = a^2 + b^2 - 2m^2  with 2m = a + b

general case:

s^2 = 1/n sum x_i^2 - (1/n sum x_i)^2
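
The general formula as a running computation, in a short Python
sketch (class and method names are made up).  It keeps only the count,
the sum and the sum of squares.

```python
class RunningStats:
    """Running mean and (population) variance from sum and sum of squares."""
    def __init__(self):
        self.n = 0
        self.sum = 0.0
        self.sum_sq = 0.0

    def add(self, x: float) -> None:
        self.n += 1
        self.sum += x
        self.sum_sq += x * x

    def mean(self) -> float:
        return self.sum / self.n

    def variance(self) -> float:
        m = self.mean()
        return self.sum_sq / self.n - m * m   # s^2 = 1/n sum x_i^2 - (1/n sum x_i)^2

stats = RunningStats()
for x in (1.0, 2.0, 3.0, 4.0):
    stats.add(x)
print(stats.mean(), stats.variance())
```

Note that the subtraction cancels badly when the mean is large
compared to the spread.  Welford's update is the usual alternative in
that case.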


Entry: Merging
Date: Tue Apr 11 21:35:39 EDT 2017

Problem: 

- Given two time series, match them up even though they are not exactly
  aligned.


Entry: woven balls
Date: Fri May 19 00:31:16 CEST 2017

Cover sphere with random path.  What's the pattern behind a woven ball
and why does it look random but also spherical?

Is it a variant of a dodecahedron?

( Context: parameter exploration in high-dimensional spaces )

http://mathtourist.blogspot.be/2013/06/pentagons-and-game-balls.html
http://www.instructables.com/id/10-strip-woven-ball/

Seems these are multiple strips, not a single one.


Entry: Derivatives
Date: Fri Sep  8 22:21:28 EDT 2017

The derivative is an operator that maps a function to a linear function.

I'm looking at applying the chain rule to functions over trees.

What does it mean for a function on a tree to be linear?

Linearity in algebraic terms is:  f( a x ) = a f(x)
What does multiplication mean on trees?

Are these zippers the only interpretation that works, or is there a
different way that works with differences of trees.


Entry: On intuition
Date: Fri Sep  8 22:24:11 EDT 2017

I've got good intuition (my non-symbolic mind sees hints of things),
and I can do the mechanical symbolic manipulation if I have a starting
point.  However, making the bridge is sometimes really difficult,
because the "language" spoken by the intuition is really out there and
is _very_ tolerant to missing details.

Maybe that is what intelligence is: seeing patterns without seeing the
entire mechanics?


Entry: duality
Date: Sat Nov 18 01:50:35 EST 2017

De Morgan, and categorical duality.

And/or vs products/sums.

https://en.wikipedia.org/wiki/Coproduct


Entry: De Morgan vs. product/coproduct
Date: Thu Dec 28 18:12:55 EST 2017

De Morgan's duality is straightforward.  When expressed in ^,v,~

  ~(~A v ~B) = A ^ B

  ~(~A ^ ~B) = A v B

The duality operation is to flip all polarities, and exchange v and ^.

Why is product/sum duality so convoluted in functional programming
notation?

A good starting point for looking at this is category theory's
products and coproducts.

E.g from here
https://en.wikipedia.org/wiki/Coproduct

Why are products and coproducts "dramatically different"?
In general, why are categorical duals so different?


Entry: duality: product and coproduct
Date: Fri Dec 29 11:49:41 EST 2017

Duality "same-ness".  Since duality observations are used a lot, and
products and coproducts are the perfect example of this due to their
relevance in algebraic types, it might be a good starting point to
gain some intuition.

Start with sets:
p: Cartesian product
c: disjoint union

Those definitions are clear.  What is not clear is the sameness.

Simplified to boolean logic (De Morgan's laws) this becomes clear, but
even for algebraic types, the difference between sums and products is
stark.

So why is this same-ness at all?

https://en.wikipedia.org/wiki/Product_(category_theory)
https://en.wikipedia.org/wiki/Coproduct

Going from the wikipedia pages.

W.r.t. duality: an important distinction is about arrows arriving and
leaving at objects.  That is likely where the "dramatic difference"
comes from.  Let's juxtapose:

Given f1:Y->X1, f2:Y->X2, there exists f: Y->(X1,X2)
      f1:X1->Y, f2:X2->Y               f: (X1+X2)->Y

Here, the tupling is completely obvious, but the sum is not.

Wait a minute: is this about the existence of f1 or f2 in the case of
the coproduct?  No.  Both f1 and f2 exist.

Also, X1 and X2 live in the same category.  Otherwise diagrams would
be meaningless.  So the coproduct is closed.


That last one is important.  In the category of sets, both Cartesian
product and disjoint union are closed binary operations connecting two
(possibly distinct) objects to a third one.

The existence of products and sums is a property of the category.

Given the "constructors" for product and sum, the existence of
projections,injections and composite arrows is assumed.


So really, it doesn't say much.  It only talks about how the
components relate to the composite.

Taking a->b to mean logical implication (a' v b), this duality is the
same as that introduced by negation in boolean logic:

        y->x1, y->x2      => y -> (x1 ^ x2)

        x1'->y', x2'->y'  =>  (x1' v x2') -> y'


Write an example using sets, starting with the objects
X1={1,2} X2={2,3}

Product = {(1,2),(1,3),(2,2),(2,3)}
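Disjoint union = {(1,a),(2,a),(2,b),(3,b)}  (a,b tag the origin X1,X2)
Plain union    = {1,2,3} merges the shared 2, so it is not the coproduct.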

These are really different animals.  An illustration of how much
concrete details are thrown out in CT.


Entry: Agda, coproducts
Date: Fri Dec 29 13:48:44 EST 2017

Coming back to proofs in agda.  Why does it feel so asymmetric to
prove things using case analysis (disjunction) as opposed to tupling
(conjunction)?

I still don't have a good answer as to why disjunction and conjunction
feel so different, while somehow "being made of the same stuff".


Entry: sparsity
Date: Fri Jan  5 02:42:34 EST 2018

https://lists.samba.org/archive/rsync/2014-June/029495.html
intersection/union triggered an association:
this is S-D encoding

Basic idea: 2% of neurons are on.  This is a normalization.

That's exactly what reminds me of the SD stuff: logical operations
might have some other algebraic meaning if they've got enough
decorrelation and are far enough from the saturation (full on or off).

spectra, colors, distributions

sparse distributed representations (SDR)
https://www.youtube.com/watch?v=6ufPpZDmPKA

overlap "means something", because random overlaps are very rare.

density is low, such that the chance of overlap becomes low, which is
a key property (overlap chance is approx s^2 for small s).

conclusions to be made in case s is small:

- existence of overlap is not likely due to chance
- it's ok to match a subset to test for a pattern
- noise can be mitigated
 
Btw matches in overlap sets are bloom filters.


Entry: The important part is concepts
Date: Mon Feb 12 23:51:25 CET 2018

Without those, there are no connections.

Put differently, finding a good way to look at something is most of
the work. Once established, connections often become obvious.


Entry: Sums and Products
Date: Thu Mar 15 17:14:33 EDT 2018

Back to the duality thing.

While products feel natural (tuples), sums are harder to represent.
I.e. they need tags.

Where does the asymmetry come from?  Is this related to code
vs. data?  Products are different locations in data memory, and sums
are different locations in code memory (i.e. branch based on tag).
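
A small C sketch of that contrast, with made-up type names: the
product is a struct whose fields all exist at once, the sum is a
tagged union where consuming a value means branching on the tag.

```c
#include <assert.h>

/* Product: both components are present (two data locations). */
struct pair {
    int first;
    double second;
};

/* Sum: exactly one alternative is present, the tag says which. */
enum either_tag { LEFT, RIGHT };
struct either {
    enum either_tag tag;
    union {
        int left;
        double right;
    } u;
};

/* Consuming a product: project out the components, no branching. */
double pair_total(struct pair p)
{
    return p.first + p.second;
}

/* Consuming a sum: one code path per alternative. */
double either_value(struct either e)
{
    switch (e.tag) {
    case LEFT:
        return e.u.left;
    case RIGHT:
        return e.u.right;
    }
    assert(0 && "invalid tag");
    return 0.0;
}
```

Constructing goes the other way around: a pair needs both values
supplied, while an either needs only one of them plus the choice of tag.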


Entry: Sums and Products..
Date: Sun May 27 15:26:58 EDT 2018

Maybe this is just syntax?

if' will do case analysis that looks like it is a product.

The thing is maybe that different cases have different matching, and
that you would use sums for analysis and products for constructions?
I.e. they are dual that way?

I'm not really sure where this is going or if it even makes any sense..


Entry: Brouwer
Date: Fri Jun 15 02:23:15 EDT 2018

Homotopy Type Theory introduction lecture -- Robert Harper
https://www.youtube.com/watch?v=ISq1xi-mGk8&list=PLo61mJSqK5cXHsnYUAXNea31BCj1u-zln

Brouwer: Proofs are mathematical objects.

Different idea than just formal proofs (Goedel's theorem is about formal
proof being strictly limiting, compared to the proof "outside").

Difference between HoTT way of thinking about proofs and just proofs
in set theory encoded in Coq.

Synthetic (abstract) vs. Analytic (nuts & bolts, Bourbaki tedious).


Lecture 2: Entailment (Logical consequence)

Mapping (entailment) becomes internalized (implication).

Entailment and implication are not the same thing:
https://philosophy.stackexchange.com/questions/12816/difference-between-implication-conditional-and-logical-entailment


Entry: Intuitionistic logic
Date: Sat Jun 16 22:22:51 EDT 2018

Before getting into HoTT, it's probably best to do a little more
constructive logic and category theory first.

https://en.wikipedia.org/wiki/Intuitionistic_logic


Entry: correctness proofs
Date: Thu Jun 21 11:54:16 EDT 2018

It's all about case analysis, e.g. sums, which cause multiple code
paths.  Products are somewhat automatically handled by data
structures.

(or free?  I don't dare to use that word any more).

This ties into some itching intuition I have that while products and
sums are essentially the same (duals), they are very different in a
practical sense when it comes down to proof and program structure.

Why is this?
Is there some insight to gain here, or is this fever dream stuff?


Entry: Constructive analysis
Date: Tue Jun 26 15:42:16 EDT 2018

https://en.wikipedia.org/wiki/Constructive_analysis

IVT: rephrase as approximation algorithm, essentially.

This does seem like a more appropriate way to talk about things when
computers are involved.
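
A sketch of the IVT read as an approximation algorithm: plain
bisection that narrows an interval with a sign change down to width
eps.  It uses floating point, where sign tests are decidable, which is
exactly what is not available for constructive reals, so treat it as
an illustration only.  bisect_root and example_f are hypothetical
names.

```c
#include <assert.h>

/* Return x with a root of f within eps/2 of it, given that
   f(lo) and f(hi) are on opposite sides of zero. */
double bisect_root(double (*f)(double), double lo, double hi, double eps)
{
    double flo = f(lo);
    assert(eps > 0 && lo < hi);
    assert((flo <= 0) != (f(hi) <= 0));   /* sign change required */
    while (hi - lo > eps) {
        double mid = lo + (hi - lo) / 2;
        double fmid = f(mid);
        if ((fmid <= 0) == (flo <= 0)) {
            lo = mid;                      /* sign change is in upper half */
            flo = fmid;
        } else {
            hi = mid;                      /* sign change is in lower half */
        }
    }
    return lo + (hi - lo) / 2;
}

/* Example function: root at sqrt(2). */
double example_f(double x)
{
    return x * x - 2.0;
}
```

The result is not "the" root but an approximation with a stated error
bound, which is the shape the constructive version of the theorem has.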


Entry: intuitionistic type theory
Date: Fri Jul  5 18:26:55 EDT 2018

Going over the wikipedia page 
https://en.wikipedia.org/wiki/Intuitionistic_type_theory

What I miss in that page is a bit more of a direction.  Clearly I'm
expecting something else than what is exposed...

I get the 3 primitive types: Bottom, Unit and bool.

But the type constructors are a little strange to me.  Especially why
the Sigma is defined in terms of products.

Also, it says that dependent types model predicate logic.  How?


Entry: Kalman filter
Date: Fri Sep 14 21:39:33 EDT 2018

Objective: reconstruct state sequence given measurement and system
parameters.

https://en.wikipedia.org/wiki/Kalman_filter


Entry: Gravity peak detection
Date: Tue Sep 18 12:05:48 EDT 2018

I was thinking about how not to add arbitrary segmentation algorithms
for quick and dirty peak detection.  What about using a form of
"gravity", where peaks will first coalesce, and then a peak detection
can be used?

The problem is that peaks will move towards each other, so maybe not
what I'm looking for.

I guess that this is just always going to be a little ad-hoc, where
there is a combination of numeric clustering, followed by a discrete
peak selection phase.


Entry: clustering is not well defined
Date: Mon Nov 26 14:36:03 EST 2018

https://twitter.com/MarkKriegsman/status/1067125849315524618
https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68
https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set

Links into intuition to keep things continuous, or at least "density-encoded".


Entry: Dobble Beach
Date: Sat Jan 12 16:44:06 CET 2019

6 images per card
31 (30?) cards 

for any 2 cards, exactly 1 image is the same

how is this possible?

edit: discrete projective geometry
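
A sketch of the usual projective plane construction for a prime order
n, assuming this deck corresponds to n = 5 (n+1 = 6 images per card,
n^2+n+1 = 31 images and 31 possible cards).  Function names are made
up.  It builds all 31 cards and checks that every pair of cards shares
exactly one image.

```c
#include <assert.h>

#define DOBBLE_N     5                                    /* prime order */
#define DOBBLE_PER   (DOBBLE_N + 1)                       /* images per card */
#define DOBBLE_CARDS (DOBBLE_N * DOBBLE_N + DOBBLE_N + 1) /* 31 */

/* Images 0..n are "points at infinity", the rest form an n x n grid. */
void dobble_build(int cards[DOBBLE_CARDS][DOBBLE_PER])
{
    int n = DOBBLE_N, c = 0;
    /* One card holding all points at infinity. */
    for (int m = 0; m <= n; m++)
        cards[c][m] = m;
    c++;
    /* n cards through image 0: one per grid column k. */
    for (int k = 0; k < n; k++, c++) {
        cards[c][0] = 0;
        for (int m = 0; m < n; m++)
            cards[c][m + 1] = n + 1 + n * k + m;
    }
    /* n*n cards: slope i and offset j, through image i+1. */
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++, c++) {
            cards[c][0] = i + 1;
            for (int k = 0; k < n; k++)
                cards[c][k + 1] = n + 1 + n * k + (i * k + j) % n;
        }
    }
    assert(c == DOBBLE_CARDS);
}

/* Number of images two cards have in common. */
int dobble_shared(const int *a, const int *b)
{
    int shared = 0;
    for (int x = 0; x < DOBBLE_PER; x++)
        for (int y = 0; y < DOBBLE_PER; y++)
            shared += (a[x] == b[y]);
    return shared;
}

/* Returns 1 if every pair of cards shares exactly one image. */
int dobble_check(void)
{
    int cards[DOBBLE_CARDS][DOBBLE_PER];
    dobble_build(cards);
    for (int a = 0; a < DOBBLE_CARDS; a++)
        for (int b = a + 1; b < DOBBLE_CARDS; b++)
            if (dobble_shared(cards[a], cards[b]) != 1)
                return 0;
    return 1;
}
```

Dropping cards from the full set keeps the property, so a deck with
fewer than 31 cards is still consistent with this structure.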


Entry: Universal coalgebra: a theory of systems
Date: Fri Feb 22 12:13:10 EST 2019

https://www.sciencedirect.com/science/article/pii/S0304397500000566

https://en.wikipedia.org/wiki/Coinduction
https://en.wikipedia.org/wiki/Corecursion


Entry: Transaction matching
Date: Thu Feb 28 20:12:17 EST 2019

Link transactions between accounts, but allow for a little slack in date.

How to express this problem precisely?


Say M is the main account, and S is a sub account that needs to be
fully explained.  There will be a set of solutions to this.

- If the set is empty, something is wrong with the data.
- If there are multiple such sets, pick one based on:
  - Metadata distance?  This might be hard
  - Date distance
  - Random association
  - Combination


Expressed as a join, how can the uniqueness characteristic be
expressed?  It is a global property, so I don't really see how.


A sequential greedy algorithm:

- for each transaction in S, find a transaction in M that has the same
  amount by sequentially searching pre/post time and removing the
  corresponding transaction from M


This seems to be easy enough to implement.  Imperative in-place update
is easiest.  The main ledger should be ordered to make searching easy.
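
A minimal sketch of the greedy pass.  Assumptions that are mine: dates
are integer day numbers, amounts are in cents, "removing from M" is
done with a used flag instead of an actual in-place delete, and the
search is a plain scan rather than an outward search in an ordered
ledger.  All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct txn {
    int day;      /* day number */
    long cents;   /* signed amount */
    int used;     /* set once claimed (only meaningful for M) */
};

/* For each transaction in s, claim the unused transaction in m with
   the same amount and the nearest date within max_slack days.
   match[i] receives the index into m, or -1.  Returns the number of
   matched transactions. */
int match_greedy(struct txn *m, int n_m,
                 const struct txn *s, int n_s,
                 int max_slack, int *match)
{
    int matched = 0;
    assert(max_slack >= 0);
    for (int i = 0; i < n_s; i++) {
        int best = -1;
        for (int j = 0; j < n_m; j++) {
            int dist = abs(m[j].day - s[i].day);
            if (m[j].used || m[j].cents != s[i].cents || dist > max_slack)
                continue;
            if (best < 0 || dist < abs(m[best].day - s[i].day))
                best = j;
        }
        match[i] = best;
        if (best >= 0) {
            m[best].used = 1;
            matched++;
        }
    }
    return matched;
}
```

Being greedy, this can pick a locally nearest match that blocks a
better global assignment, which is where the general matching
algorithms come in.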


More general problems:

https://brilliant.org/wiki/matching-algorithms/


Entry: Bezier Curves
Date: Tue Mar 26 22:40:55 EDT 2019

Simpler to understand as iterated linear interpolation:

https://en.wikipedia.org/wiki/De_Casteljau%27s_algorithm
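
A sketch of De Casteljau's algorithm as repeated linear interpolation
between neighbouring control points.  The vec2 type, the function name
and the fixed upper bound on the number of control points are
assumptions made to keep the example short.

```c
#include <assert.h>

#define BEZIER_MAX_POINTS 8

struct vec2 {
    double x, y;
};

/* Evaluate the Bezier curve with n control points at parameter t. */
struct vec2 bezier_eval(const struct vec2 *ctrl, int n, double t)
{
    struct vec2 tmp[BEZIER_MAX_POINTS];
    assert(n >= 1 && n <= BEZIER_MAX_POINTS);
    for (int i = 0; i < n; i++)
        tmp[i] = ctrl[i];
    /* Each level replaces neighbours by their interpolation,
       leaving one point fewer, until a single point remains. */
    for (int level = n - 1; level > 0; level--) {
        for (int i = 0; i < level; i++) {
            tmp[i].x = (1.0 - t) * tmp[i].x + t * tmp[i + 1].x;
            tmp[i].y = (1.0 - t) * tmp[i].y + t * tmp[i + 1].y;
        }
    }
    return tmp[0];
}
```

For the quadratic with control points (0,0), (1,2), (2,0) the value at
t = 0.5 is (1,1): first (0.5,1) and (1.5,1), then their midpoint.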


Entry: Embedding manifolds
Date: Thu Apr  4 14:44:58 EDT 2019

Application domain: I have a system where the central measurement is
phase difference between two oscillators, and I need to continuously
track this difference, keeping track of "wraparound" based on a
well-known initial point.

The main insight is to keep the representation of phase "embedded" in
the representation of a quadrature signal, as opposed to flattening it
out into a single number that cannot represent the idea of rollover in
a continuous way.
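
One possible sketch, not necessarily how the actual system does it:
keep the (I,Q) pair as the phase representation and only derive an
integer wrap count from quadrant transitions, the way a quadrature
decoder does.  This assumes the phase moves less than half a turn
between samples.  All names are hypothetical.

```c
#include <assert.h>

struct phase_tracker {
    int quadrant;         /* quadrant of the previous sample, 0..3 */
    long quarter_turns;   /* signed count of quadrant crossings */
};

/* Quadrant index 0..3, counter-clockwise starting at the +I axis. */
int iq_quadrant(double i, double q)
{
    if (i > 0 && q >= 0) return 0;
    if (i <= 0 && q > 0) return 1;
    if (i < 0 && q <= 0) return 2;
    if (i >= 0 && q < 0) return 3;
    return 0;   /* origin carries no phase information */
}

/* Start tracking at a well-known initial point. */
void phase_init(struct phase_tracker *t, double i, double q)
{
    assert(t);
    t->quadrant = iq_quadrant(i, q);
    t->quarter_turns = 0;
}

/* Feed the next sample.  Returns -1 if the jump was two quadrants,
   in which case the direction is ambiguous and the count is kept. */
int phase_update(struct phase_tracker *t, double i, double q)
{
    int now = iq_quadrant(i, q);
    int step = (now - t->quadrant + 4) % 4;
    t->quadrant = now;
    if (step == 1)
        t->quarter_turns++;
    else if (step == 3)
        t->quarter_turns--;
    else if (step == 2)
        return -1;
    return 0;
}
```

The fine phase within a quadrant still comes from the (I,Q) pair
itself, so nothing is flattened until a number is actually needed.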


I think there is a generic field here: e.g. representation of Lie
groups as sets of matrices.

Can this be generalized?  I.e. there are more cases where
discontinuities and ambiguities are a consequence of the
representation, and not intrinsic to the problem.

Riemann surfaces come to mind as well.
http://mathworld.wolfram.com/RiemannSurface.html
https://en.wikipedia.org/wiki/Riemann_surface


Entry: autodiff and recursive eval
Date: Thu Apr 11 22:33:55 EDT 2019

Exponentials can be evaluated by update equations.  Does an autodiff
form of such a program look different from a computed form?


Entry: Mapping fractions to integers
Date: Mon May  6 21:27:37 EDT 2019

A straightforward way would be to use the prime factorization, and
"interleave" them.

If p[n] is the sequence of primes, split this in odd/even and use them
to build a new number.
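
A sketch of one reading of this idea, as an assumption on my part:
with primes indexed from p[0]=2, the exponent of p[k] in the numerator
moves to p[2k] and the exponent of p[k] in the denominator moves to
p[2k+1].  The prime table is deliberately tiny, so only small inputs
work, and overflow is not checked.

```c
#include <assert.h>
#include <stdint.h>

#define N_PRIMES 12
static const unsigned PRIMES[N_PRIMES] =
    {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37};

/* Map num/den (positive, in lowest terms) to a single integer by
   interleaving the prime exponents of numerator and denominator. */
uint64_t fraction_to_integer(uint64_t num, uint64_t den)
{
    uint64_t out = 1;
    assert(num > 0 && den > 0);
    for (int k = 0; 2 * k + 1 < N_PRIMES; k++) {
        while (num % PRIMES[k] == 0) {
            num /= PRIMES[k];
            out *= PRIMES[2 * k];        /* even slot: numerator */
        }
        while (den % PRIMES[k] == 0) {
            den /= PRIMES[k];
            out *= PRIMES[2 * k + 1];    /* odd slot: denominator */
        }
    }
    assert(num == 1 && den == 1);        /* table too small otherwise */
    return out;
}
```

E.g. 3/4 maps to 5^1 * 3^2 = 45.  On reduced fractions the map is
injective but not surjective: 6 = 2 * 3 would need p[0] in both
numerator and denominator.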


Entry: The importance of reframing
Date: Sun May 12 17:59:51 EDT 2019

I wish somebody had explained this to me earlier in my life: 

Easy problems are those where it is always clear what the next step
would be, and where each individual step leads to more clarity.  The
sequential nature of discovery is something very hard to appreciate if
you've not experienced it.

Hard problems are those where it is not clear what the next step would
be.  It often requires meta-insight: why is there no insight coming?
How can you think of yourself as a tool that is not used properly?


Entry: Different sync encoding
Date: Mon Aug 26 23:16:58 EDT 2019

Pick one special byte to act as packet boundary.  This leaves 255 values.

An n in m encoding would satisfy 

255^m >= 256^n

basically, encode in base 255 instead of base 256.

The midi 7 in 8 encoding is a special case of that, leaving 128
control characters (e.g. with high bit set).
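
A sketch of a 7 in 8 packing in C.  The bit layout (all high bits
collected in the first output byte) is one possible convention and an
assumption here; real devices differ in the ordering.  Function names
are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 7 arbitrary bytes into 8 bytes that all have the high bit
   clear.  out[0] collects the high bits, bit i belongs to in[i]. */
void pack_7in8(const uint8_t *in, uint8_t *out)
{
    assert(in && out);
    out[0] = 0;
    for (int i = 0; i < 7; i++) {
        out[0] |= (uint8_t)((in[i] >> 7) << i);
        out[i + 1] = in[i] & 0x7F;
    }
}

/* Inverse of pack_7in8. */
void unpack_7in8(const uint8_t *in, uint8_t *out)
{
    assert(in && out);
    for (int i = 0; i < 7; i++) {
        uint8_t high = (uint8_t)(((in[0] >> i) & 1) << 7);
        out[i] = (uint8_t)((in[i + 1] & 0x7F) | high);
    }
}
```

Every byte with the high bit set is then free to be used as a control
or sync character.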

Is there a natural chunk size to do that?

Say that we only reserve 64 codes, leaving 192 values.  That is about
0.4 bits per byte (8 - log2 192) that are missing.  Can this be 15 in 16?

Or can it be 255 in 256?

Too tired and wacky brained to do this intuitive shit..

At the bit level, bit stuffing can be used.  This can also be done in
uart bytes.  If the last bit is 0, this was a 0xFF byte, and the
entire stream should be shifted by one.

That would be more efficient, but requires a more complex decoder.


Entry: CRC
Date: Wed Oct 23 09:20:49 EDT 2019

EDIT: Principle is right, but implementation details require some
additional assumptions.  See end.


CRC is modulo reduction by a generator polynomial over GF(2), which
need not be irreducible.  I.e. long division.  However, how does a bit
stream actually map to the input polynomial?  And why does accumulator
chaining work?

Let's try to perform a division and fill in the gaps.

x^3 + x + 1  is  1011

Dividing 10101010 by 1011


10101010 | 1011
10110000 |-------
-------- | 10011
   11010
   10110
   -----
    1100
    1011
    ----
     111

Only the remainder is useful.

How to turn this into something that takes a single bit at a time?
Note that only the first 4 bits of the dividend are necessary.  That
is the key to turning this into a state machine.

To visualize, spell out the zero subtractions explicitly.

10101010 | 1011
10110000 |-------
-------- | 10011
 0011010
 0000000
 -------
  011010
  000000
  ------
   11010
   10110
   -----
    1100
    1011
    ----
     111

Then mask out the bits that do not need to be known at each step,
feeding in a bit when it is necessary.

1010.... | 1011
10110000 |-------
-------- | 10011
 0011...
 0000000
 -------
  0110..
  000000
  ------
   1101.
   10110
   -----
    1100
    1011
    ----
     111

The trick here is that there is no carry!  This allows us to carry on
a step without knowing the downstream bits!

This would not work so smoothly with an addition operation that
carries over into lower significant bits.

To restore symmetry, add zero bits.

........
00000000
--------
1.......
00000000
--------
10......
00000000
--------
101.....
00000000
--------
1010.... | 1011
10110000 |-------
-------- | 000010011
 0011...
 0000000
 -------
  0110..
  000000
  ------
   1101.
   10110
   -----
    1100
    1011
    ----
     111


This property allows the algorithm to be rolled up into a loop.

So the "accumulator" is essentially the first 4 digits of the
intermediate, which will end up as the full modulo at the end.

This all seems plausible.  So let's implement it and see if it holds
up.

There are essentially only two operations:
- division reduction step
- shift in a new bit

The polynomial is typically represented as bits without the top bit
which is always 1.

In our case, the poly residue is 011.

Suppose the accumulator is
1010

The algorithm step is then:
- store the top bit
- shift in the new bit
- xor with residue if top bit was 1


Now I just need to know what the bit order convention is.  It seems
that shifting in MSB first makes most sense.
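
A sketch of the bit-at-a-time division described above, assuming MSB
first and a register of deg bits (3 bits for 1011, so one bit narrower
than the 4-digit window used in the diagrams).  gf2_remainder is a
hypothetical name, and this is plain polynomial remainder, not any
particular standardized CRC.

```c
#include <assert.h>
#include <stdint.h>

/* Remainder of msg (nbits wide, fed MSB first) divided by a
   polynomial of degree deg over GF(2).  poly_low is the polynomial
   without its top bit, e.g. 0x3 (011) for x^3 + x + 1. */
uint32_t gf2_remainder(uint32_t msg, unsigned nbits,
                       uint32_t poly_low, unsigned deg)
{
    uint32_t mask, acc = 0;
    assert(deg >= 1 && deg < 32 && nbits <= 32);
    mask = (1u << deg) - 1u;
    for (unsigned i = nbits; i-- > 0; ) {
        uint32_t top = (acc >> (deg - 1)) & 1u;   /* store the top bit */
        uint32_t bit = (msg >> i) & 1u;
        acc = ((acc << 1) | bit) & mask;          /* shift in the new bit */
        if (top)
            acc ^= poly_low;                      /* reduction step */
    }
    return acc;
}
```

For the example, gf2_remainder(0xAA, 8, 0x3, 3) gives binary 111.
Feeding three extra zero bits after the message gives 010, and the
message followed by 010 then divides with remainder 0, which is the
property a receiver can check.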

Let's check this with a crc8 plucked from the internet.


EDIT: This mentions padding at the end.
https://en.wikipedia.org/wiki/Cyclic_redundancy_check

Still not quite clear.  I'm doing something else.

https://rosettacode.org/wiki/CRC-32

I think I have the definition wrong.

From the wikipedia page:

  This is first padded with zeros corresponding to the bit length n of
  the CRC. This is done so that the resulting code word is in
  systematic form.

https://en.wikipedia.org/wiki/Systematic_code

  A systematic code is any error-correcting code in which the input
  data is embedded in the encoded output.

I don't understand this.
But what I am doing is padding with zeros at the start?


So I give up.  Interpret and generalize a working implementation instead:
https://stackoverflow.com/questions/21001659/crc32-algorithm-implementation-in-c-without-a-look-up-table-and-with-a-public-li

So CRC32 and CRC32B are not the same:

https://stackoverflow.com/questions/15861058/what-is-the-difference-between-crc32-and-crc32b


Entry: Financial transactions and density
Date: Sat Nov 16 21:40:10 EST 2019

Some posts in the past around 20120910.

The main idea:
- predict future based on past
- set targets, given a margin

I'm still not quite sure how to get the trend out of the noise, but a
key element seems to be to divide into categories (which is what gives
an idea where the money is going in rough terms), and apply smoothing
by filtering each category using a method that distributes reserves
while maintaining thresholds.

E.g. it is possible to take one paycheck and split it into one that is
paid at the original time, and one that is paid 1/2 of the period
later.  Or in the limit, spread the amount over the entire "future"
payment.  Same with expenses, but in the other direction: it is always
allowed to pay a debt early.

This is a reflection of the main property of money:

    You can always not spend money that you have available, but you
    can't spend money that you do not have available.

That is the primary "shape" of the space we're dealing with.

The management of money is about managing those hard deadlines.  Now in
practice, deadlines are not completely hard, but debt does tend to
spiral.  So it is best to leave a safety margin.


Ideally, any partition of expenses and income can be smoothed by
performing time-distribution, in such a way that the resulting "curve"
is as smooth as possible, and can be used as a trend to make
decisions.

So what this does is to give you smoothness, but it consumes "buffer
space", i.e. there needs to be an amount of cash that can handle these
fluctuations without running out.

Essentially, we want to extract the following information:

- Trend: plot smoothed income vs. expenses.  This helps with planning
  future income, trading study vs. work.

- Buffer: what is the size of the variability, determining the need
  for reserves.  Note that smoothing might influence this.

- Feedback: if trends for categories can be clearly identified,
  optimization effort to reduce expenses can be directed.


Compared to previous insight related to "density" ideas: it is now
clear that apart from the need for smoothing to be directional, it
seems fairly straightforward to implement.  E.g. a piecewise linear
curve per year, quarter, or month is probably already enough.
Multiple scales could be used.  Other base functions could be used.

What is clear is that the main problem is introducing "Goldilocks"
categories.  Too many or too few and no information can be extracted.


Entry: protocol analyzer
Date: Sun Feb 16 19:10:37 EST 2020

Starting from 8 bit, how to find the logic levels?  Assume the levels
will not change, so start with constructing a histogram.  This will be
strongly bimodal.  How to find those modes?

The simplest approach would be to find the local maximum starting from
above and below.  Then pick the mid-point, and use a Schmitt trigger
(ST) to convert it to digital.

Now, what about 3-level?  E.g. I have a RS485 line that has an idle
state at say 2/3 between high and low.


Alternative:
- logarithmic histogram
- average (center of gravity)
- split based on center of gravity
- linear center of gravity for each
- midpoint


Different approach: given midpoint, compute variance of the 2
sections.  Minimize variance.  This can be done exhaustively.

This likely also works for 3-modal.
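
A sketch of the exhaustive variant for the 2-level case: try every
split point of a 256-bin histogram, compute the summed squared
deviation from the mean on each side, and keep the split where the
total is smallest.  This is essentially what is known as Otsu's
method.  Returning the midpoint between the two class means is my own
choice, and bimodal_midpoint is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the midpoint between the two class means for the best
   split, or -1 if no split has samples on both sides. */
int bimodal_midpoint(const uint32_t *hist)
{
    double best_cost = 0.0;
    int best_mid = -1;
    assert(hist);
    for (int t = 1; t < 256; t++) {
        /* Index 0: values below t, index 1: values at or above t. */
        double n[2] = {0, 0}, s[2] = {0, 0}, ss[2] = {0, 0};
        for (int v = 0; v < 256; v++) {
            int c = (v >= t);
            double h = hist[v];
            n[c] += h;
            s[c] += h * v;
            ss[c] += h * v * v;
        }
        if (n[0] == 0 || n[1] == 0)
            continue;
        /* Per class: sum (v - mean)^2 = ss - s^2 / n */
        double cost = (ss[0] - s[0] * s[0] / n[0])
                    + (ss[1] - s[1] * s[1] / n[1]);
        if (best_mid < 0 || cost < best_cost) {
            best_cost = cost;
            best_mid = (int)((s[0] / n[0] + s[1] / n[1]) / 2.0);
        }
    }
    return best_mid;
}
```

At 256 x 256 operations the exhaustive search is cheap enough for a
one-time level calibration.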

Another: hill-climbing from all initial conditions.

Another: 3-modal: find first 2 modes using left/right local maxima,
then bisect hillclimb until distinct 3rd pops up.

The hill-climbing approaches require smoothing.

Another: fit a single sine wave to a smoothed histogram.


https://en.wikipedia.org/wiki/Cluster_analysis

https://en.wikipedia.org/wiki/K-means_clustering
https://en.wikipedia.org/wiki/Voronoi_diagram


Forgot: for differential line, it is actually completely trivial.  By
analogy, setting the clusters manually also solves the problem.


Entry: The D-transform
Date: Sat May  9 15:05:26 EDT 2020

A while ago I ran into something I believe was called the D-Transform.
It is a reformulation of the Z-transform based on the transformation
d=z-1.

Recently this idea has come up in private conversation related to the
field of audio processing and synthesis and analog modeling.  The
basic setting is to oversample a signal to remove digital sampling
artefacts.  That part is quite straightforward, but doing so exposes
numerical instability when signal operations are expressed in terms of
differences between subsequent sample values, as is usually the case.

Instead of keeping track of delays, it is possible to reformulate
filter topologies by keeping track of differences, all the way
assuming that signals do not change much from one to the other.  In
essence it is a way to use the available precision where it is best
used.

Some references:

1. 

Finite Difference Equations - H. Levy and F. Lessman
isbn://9780486672601

At least I think that's quite similar to the '61 first edition I found
in a used book store.

2.

TODO: Some previous log entries.


Entry: SVF vs. biquad
Date: Mon May 18 08:15:54 EDT 2020

It has been a while since I implemented filters.  I think this time I
want to do it with a proper test setup.  Basically I want to create
some C code, and have it compute some analysis.


Entry: Gyrator, Symplectic
Date: Sun May 24 23:47:33 EDT 2020

https://en.wikipedia.org/wiki/Gyrator


Entry: Computing logarithm
Date: Sun Jun  7 00:40:22 EDT 2020

Implemented this in fixed point:

https://en.wikipedia.org/wiki/Logarithm#Feynman's_algorithm

But polynomial is going to be much faster:

https://math.stackexchange.com/questions/61279/calculate-logarithms-by-hand


Entry: Low frequency non-linear model updates
Date: Sat Jun 13 21:11:23 EDT 2020

Running into this interesting problem: frequency calibration of a
synth.  First I attempted a fairly standard integrating regulator,
which really didn't work out, mostly because of the delay on the
frequency measurement.  Then I realized the parameter I want to model
(temperature) is much slower moving, so the idea was to split it up
into two parts:

- a feedforward control based on a model

- low frequency update of the model (e.g. 1/sec) after initial
  explicit calibration.

The model will likely be a polynomial compensating for the
nonlinearity in the system.  This however is temperature-dependent.


Entry: The 'd' transform
Date: Thu Jun 18 22:40:28 EDT 2020

Ok I remember now.

I've been looking at the d=z-1 transformation to implement analog
style filters more directly at high sample rates.  The problem is that
there is a direct path in the d operator, so it can't be realized
with feedback.


So that is a dead end.  But it is possible to find different
topologies that have good numeric stability by not subtracting numbers
after multiplication.

I made a derivation in this spirit starting from the svf, but split
up in three update stages: a loss stage and something resembling a
ping-pong lossless symplectic integrator step.  E.g. a product of 3
triangular matrices:

p' = p  - b p
p+ = p' + a q
q+ = q  - a p+

Which can be implemented sequentially using code below.  GCC compiles
the following C representation to the proper high-word MAC opcodes on
ARM Cortex M.

#define I64(x) ((int64_t)(int32_t)(x))

p -= (I64(b) * I64(p)) >> 32;
p += (I64(a) * I64(q)) >> 32;
q -= (I64(a) * I64(p)) >> 32;
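
A self-contained sketch wrapping those three lines in a function, with
a helper that runs it and records the peak amplitude.  The struct and
function names are made up.  It relies on >> of a negative int64_t
being an arithmetic shift, which holds for GCC and Clang but is
implementation-defined in ISO C.

```c
#include <assert.h>
#include <stdint.h>

#define I64(x) ((int64_t)(int32_t)(x))

struct svf_state {
    int32_t p, q;
};

/* One loss stage followed by the two ping-pong stages.
   Coefficient values are a / 2^32 and b / 2^32. */
void svf_step(struct svf_state *s, int32_t a, int32_t b)
{
    assert(s);
    s->p -= (int32_t)((I64(b) * I64(s->p)) >> 32);   /* p' = p  - b p  */
    s->p += (int32_t)((I64(a) * I64(s->q)) >> 32);   /* p+ = p' + a q  */
    s->q -= (int32_t)((I64(a) * I64(s->p)) >> 32);   /* q+ = q  - a p+ */
}

/* Run a number of steps, return the largest |p| or |q| seen. */
int32_t svf_peak(struct svf_state *s, int32_t a, int32_t b, int steps)
{
    int32_t peak = 0;
    for (int i = 0; i < steps; i++) {
        int32_t ap, aq;
        svf_step(s, a, b);
        ap = s->p < 0 ? -s->p : s->p;
        aq = s->q < 0 ? -s->q : s->q;
        if (ap > peak) peak = ap;
        if (aq > peak) peak = aq;
    }
    return peak;
}
```

With b = 0 the two ping-pong stages conserve p^2 + q^2 + a p q (up to
rounding), so the amplitude stays bounded; with b > 0 it decays.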


Now to make an arbitrary 2nd order section, add input, bypass and
output coefficients.  Which probably can be done sparse as well.


Entry: Higher order sigma-delta
Date: Sat Jul  4 15:07:55 EDT 2020

The framework to analyse this is filtered error feedback: the output
is regulated to the input, the error is computed and fed through a
loop filter that computes the next output.

The quantizer can be approximated by a linear segment (just a
passthrough) and a noise source.

From what I read, instability issues only pop up in the 1-bit case,
where the quantization noise is relatively large.  I am in a situation
where I have about 8-12 bits of output PWM precision.

It is just a filtering problem, as the error is shaped by the inverse
of the loop filter.

This mentions that dithering is probably necessary to decorrelate:
https://en.wikipedia.org/wiki/Noise_shaping


Entry: Third order
Date: Sun Jul  5 23:12:36 EDT 2020

Why does this work, i.e. with each integrator being fed an error
signal, and not just chaining 3 integrators and feeding it in the
front?

struct pdm3 {
    uint32_t s1;
    uint32_t s2;
    uint32_t s3;
};

static inline uint32_t pdm3_update(struct pdm3 *p, uint32_t input, uint32_t out_shift) {
    uint32_t out_q = p->s3 >> out_shift;
    uint32_t out_a = out_q << out_shift;

    p->s1 += input - out_a;
    p->s2 += p->s1 - out_a;
    p->s3 += p->s2 - out_a;

    return out_q;
}

The explanation is that an order n modulator is made by taking an
order 1 modulator, and replacing the quantizer with an order n-1
modulator.

But why is this so different?


Entry: smoothing transaction graph
Date: Thu Jul 23 21:05:36 EDT 2020

I've been looking for a while to find a way to smooth a discontinuous
transaction graph, e.g. a bank account.  Maybe treating it as heat and
then running the heat equation?

EDIT: So that is just gaussian blur...

https://en.wikipedia.org/wiki/Weierstrass_transform
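
A sketch of one explicit finite-difference step of the 1D heat
equation on an array of samples.  Repeating the step approaches a
Gaussian blur with growing width.  The function name, the reflecting
boundary and the choice of k are assumptions; k must stay at or below
0.5 for this explicit scheme to be stable.

```c
#include <assert.h>

/* u[i] += k * (u[i-1] - 2 u[i] + u[i+1]), in place.
   Reflecting boundaries (u[-1] = u[0], u[n] = u[n-1]) keep the sum
   of all samples constant. */
void heat_step(double *u, int n, double k)
{
    double prev, cur, next;
    assert(u && n >= 1 && k >= 0.0 && k <= 0.5);
    prev = u[0];
    for (int i = 0; i < n; i++) {
        cur = u[i];
        next = (i + 1 < n) ? u[i + 1] : cur;
        u[i] = cur + k * (prev - 2.0 * cur + next);
        prev = cur;   /* old value of u[i], needed for u[i+1] */
    }
}
```

E.g. a single spike {0, 0, 16, 0, 0} with k = 0.25 becomes
{0, 4, 8, 4, 0} after one step: the total is preserved while the
discontinuity is spread out.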