DSP related applied math ramblings. This is a collection of misc notes about applied math, algorithms and circuits, somewhat related to programs I'm writing and projects I'm working on, but deserving a place of their own. For more http://zwizwa.be/ramblings Entry: Killing Chipmunks with Parametric Dynamic Wavetable Synthesis Date: Mon Aug 4 11:55:00 CEST 2003 Type: tex \begin{abstract} In this paper we discuss some sound synthesis methods in the dynamic wavetable framework. We will take a closer look at approaches that combat problems regardig aliasing and pitch dependent formant locations, two of the major drawbacks of dynamic wavetable synthesis (DWS). DWS is a hibrid form of wavetable synthesis and additive/granular synthesis. We will limit ourselves to ``holistic'' DWS, which uses (semi) parametric methods to cancel pitch dependent formant location, as an alternative to basic additive or granular synthesis for signals with a clear single pitch. \end{abstract} \section{Introduction} Wavetable synthesis is one of the first digital sound synthesis methods developped [refs]. The method stores a period of a sound wave in a table and plays it back at a variable pitch. The data in the table determines the timbre of the resulting waveform. A drawback of this form of synthesis is the fact that formant frequencies of the resulting waveform are proportional to the playback frequency. This is frequently called the ``chipmunk effect'' and is usually undesirable. In commercial synthesizers this artifact is reduced by using several wavetables for some pitch ranges and interpolating between them. This is referred to as multitimbral synthesis. Another drawback of wavetable synthesis is aliasing in the case of polynomial interpolation. This is solved in practice by using a decimating filter. This is also called bandlimited interpolation. This usually requires the storage of a large set of interpolation coefficients with different cutoff frequencies, since it is hard to impossible to do this in a parametric way. Dynamic wavetable synthesis is a derivative of wavetable synthesis, with the addition that the wavetable is synthesised using some (parametric) method, often updated at a haptic rate of around 20Hz. This enables the construction of dynamic timbres. However, being a table playback method, the chipmunk effect is also present in this method and aliasing needs to be taken into account when the table is read out at a higher rate than the spacing between table samples. In this paper we will illustrate some applications of DWS, with special attention on the chipmunk effect and aliasing. The first method deals with explicit additive synthesis using the fast fourier transform. In short we synthesize the spectrum of the periodic waveform directly, and introduce pitch by resampling the waveform on playback. When the pitch of the playback is known (i.e. by lookahead) the wavetable can be made explicitly bandlimited. If dilatation and contraction of the spectrum are parametric, we can use the playback pitch information to perform a formant correction before the waveform is transformd to the time domain. This approach has some advantages over synthesizing sinusoidal contributions with standard, direct FFT approaches such as $\text{FFT}^{-1}$ [2], especially for near periodicity. The second topic in this paper is an illustration of the general framework of scanned synthesis using dynamic wavetables. The idea is to derive a waveform, or timbre, from a (nonlinear) dynamic system. The holy grail here is to find some way to change the formant spectrum of a wavetable, depending on the desired playback pitch. We will persue this using circulant neural networks, as a generalization of a linear circular time invariant system. \section{DWS algorithm} \subsection{DWS} The algorithm is very simple. The wave table is a vector function $x[k] \in \mathbb{R}^N$, with a sampling rate $f_c$ which is a fraction of the system sample rate $f_s$. Sound is synthesized by interpolating the $x[k]$ at a desired pitch. To prevent discontinuities when the table is updated, the readout can be made to interpolate between $x[k-1]$ and $x[k]$. The choice of $N$ depends on the desired frequency resolution. When $N=2048$ and $f_s = 1024 f_k$ and $f_s$ in the order of $40$ kHz the entire audible spectrum can be represented, together with the interesting haptic frequencies. A choice of sampling subdivision and table length of $1024$ is usually more than enough for practical purposes. The computational cost in its basic form is relatively low. The playback resampling can be done using simple polynomial interpolation. Using more expensive methods like bandlimited interpolation is also possible, however for some uses (FDDWS) playback aliasing can be cancelled explicitly, so an expensive decimating filter is not necessary. Usually the cost of synthesizing the wavetable (using convolution methods) can be significantly reduced by using an FFT. \subsection{FDDWS} Since working in the frequency domain has a more intuitive feel, we will give the method where de table is transformed using an IFFT before resampling playback a different name: frequency domain dynamic wavetable synthesis (FDDWS). One of the advantages over direct additive synthesis methods is the elegance of the scheme. We don't have to care about things like window length and spacing, and window transfer function, which are serious drawbacks of standard, ad-hoc frequency domain processing methods. Since elegance is not necessarily an argument in engineering, we claim the real advantage is simplicity of the parametric synthesis phase, which will ease implementation and save computational cost when the expense of the spectrum synthesis far outweighs the cost of the resampling involved. When employing frequency domain DWS, anti-aliasing is a trivial multiplication before performing an IFFT. The only real problem that needs to be taken into account is to make the spectrum ``stretchable''. When killing the chipmunks is not necessary, since the effect might be desired in some cases, it can be simply left out. In short, by introducing separate pitch and formant controls, a lot is possible. Combatting chipminks with FDDWS is thus creative use of time/frequency contraction and dilatation. Requiring parametric spectrum dilatation or contraction might seem a serious disadvantage. We can also look at it as a trade-off for the added simplicity of te frequency domain processing. Since the transform lengths are fixed we are freed of window juggling. We hope the illustrations in the next sections wil clarify this simplicity. \subsection{Time-Frequency Duality} All in all, DWS enables us to do pitch relative processing using the entire dsp toolbox in a simplified form because the data size remains constant. The slight increase in cost and the straightforward trade-offs involving aliasing and formant locations give us a very flexible and intuitive framework as return of investment. When working in the complex field, we have a nice dual representation that allows us to switch between time and frequency representation without too much hassle, and interchange the dual operations of circular convolution and ring modulation. \section{Parametric Formant Preserving Wavetable Synthesis} We will illustrate this principle by explicitly synthesizing ``constant-Q'' formants. \section{Dynamical Systems} This section deals with the synthesis of tables using a parametric (nonlinear) dynamical system. This is in fact equivalent to Scanned Synthesis, pioneered by Max Mathews e.a. [ref]. In most cases we are interested in a periodic table without discontinuities. We could define the table on the complex unit circle, even in continuous form to enable some analytic manipulation using i.e. orthogonal polynomials or fractional linear maps, but we will leave that route for later. There are other ways to get to periodicity. We take the wavetable to be $x \in \mathbb{R}^N$. The system is defined as an iterated map \begin{equation} x[n+1] = f(x[n]). \label{ifs} \end{equation} We will also extend this equaiton to include an input, in the form of $x[n+1] = f(x[n] + u[n])$ or the more general $x[n+1] = f(x[n], u[n])$. For simplicity the output (the wavetable) is taken to be the state vector of the system. \subsection{Linear Systems} When we limit ourselves to linear maps $f = A \in \mathbb{R^{N \times N}}$ we get a ``standard'' linear state space system % dynwav paper [1]. A way to introduce circular symmetry in $f$ is to make $A$ circulant. Application of $A$ can then be computed in the frequency domain since it amounts to circular convolution. Working in the frequency domain will also enable us to interpret the working of the network to a large extent, since the FFT (denoted further as the matrix operator $F$) of the wavetable $Fx$ has a very intuitive interpretation. The operator $A' = FAF^-1$ is diagonal containing the spectrum of this filter. Requiring circular symmetry thus brings us to FDDWS. For the linear case (\ref{ifs}) reduces to \begin{equation} x[n+1] = A x[n]. \label{lifs} \end{equation} With $A$ circulant, $x' = Fx$ and the diagonal decomposition $C = FAF^-1$ we have a set of decoupled linear systems \begin{equation} x'_i[n+1] = c_i x'_i[n]. \label{dlifs} \end{equation} With $c_i$ an eigenvalue of $A$ corresponding to one of the eigenvectors of $A$ (an FFT base function). Because we defined $A$ to be real valued, each $c_i$ has a complex conjugate $\bar c_i$. We now have the choice to take $A \in \mathbb{C}^N$ or to replace the eigenvalue decomposition $C = FAF^-1$ by its real blockdiagonal decomposition. The effect of the system is to produce a wavetable composed of a set of harmonic sinusoids, whose phase and amplitude are modulated damped complex exponentials. Scanning $x$ will approximately produce a set of near harmonicly spaced damped sinusoids. The damping is affected by the magnitude of the $c_i$ and the frequency shift away from harmonicity is determined by the phase of the $c_i$. What we are interested in is to make the magnitude of the frequency spectrum of $x$ (the magnitude of $x'$) parametrizable, so we can cancel out the formant shift introduced by table readout. This would turn the DWS method into a direct, fully separated pitch and timbre synthesis method. This can be done by changing the time scale of the $x_i$ and $c_i$ whenever the playback speed varies. Doing so will move the aliasing problem to this scaling step. If the $c_i$ are fully parametric, this does not pose a real problem. Stretching the state vector $x_i'$ will pose no problem either, except for the loss of information. Compressing $x_i'$ does pose an aliasing problem. A way to solve this is to stretch the time domain state vector $x_i$, since frequency domain compression is equivalent to time domain stretching. All this means the entire algorithm will be very much based on resampling, both for the time and the frequency domain representation of the state vector as for the final playback resampling. This of course turns us back to square one in some sense as to how far it is worth persuing this approach instead of keeping the pitch and timbre method separated in a source/filter approach, i.e. using bandlimited impulse synthesis together with an IIR filter bank or overlap add frequency domain filtering. On the other hand, when persuing this route, we do have a separate pitch and formant control, together with a nice framework processing waveforms and spectra in a combined time/frequency approach. \subsection{Nonlinear Systems} In this section we will use a recurrent neural network with connectivity matrix $A \in \mathbb{R^{N \times N}}$ and squashing function $\sigma(t) \in \mathbb{R}$. We define $f$ as a composition of $x' = Ax$ and $f_i(x) = \sigma(x'_i)$. Like in the previous linear approach, we take $A$ to be circulant. This network is thus equivalent to a filter followed by a waveshaper, in music-dsp speak. \subsection{Examples} \subsubsection{Analog waveform modeling} It is of course debatable wether it is worth using an IFFT for waveform synthesis. However, since the transform is running at a haptic rate, the cost is proportional to $\log N$. Compared to filter based methods for synthesizing analog waveforms, this is not so expensive\footnote{ The filter order required to get a decent bandwidth vs. aliasing trade-off for virtual analog is significant, and at least comparable to the complexity of the IFFT. On the other hand, implementing a virtual analog synth is a lot easier to do with just an increase in sample rate. Increasing the sample rate has a lot more advantages. It simplifies building blocks, allows for a smaller wordlength, allows for low order filters to be used to tame nonlinear elements, allows for sigma-delta modulation to perform bit depth reduction when increased precision is necessary, i.e. for high Q recursive filters, etc...}. Since we're not too much concerned with killing chipmunks, the only thing we need is an anti-aliased IFFT. Most analog style waveforms have a simple fourrier transform. Again this can lead to new schemes for synthesizing new ``analog sounding'' waveforms and is in fact no different from general parametric spectral synthesis. \subsubsection{Piano modeling} Modeling the piano is not a simple task. However, some of the features of the piano, like the inharmonicity and triple strings per note fit wel into our scheme. We will use the simplest form of a circulant linear state space system (FDDWS), with nothing but a spectral envelope and a phase shifter (multiplication with unit norm complex numbers) in the feedback path. Reading a table three times with frequencies corresponding to $1, 2+\Delta_2, 3+\Delta_3$ to simulate the effect of triple strings already produces a very piano-like sound. We can introduce inharmonicity by modulating the FFT bin corresponding to basis function $e^{\frac{j2\pi n}{N}}$ by $e^{j(an + bn^2)}$. This will produce a frequency dependent shift, which will shift ($a$) or bend ($b$) the harmonic series upward ($a > 0$ or $b>0$) or downward ($a<0$ or $b<0$) from harmonicity. This nonlinearity can even be made signal dependent, i.e. dependent on the power of the system. Modeling the attack is almost as important as the strings. We use a second FDDWS system with a gaussian envelope in the feedback path, reduced by an overall gain. The wavetable is initialized with a random spectral content when a new note is struck. \subsubsection{Waveform feedback} Since we know the pitch of the scanning playback, we can scan a waveform, do some processing (i.e. a reverb) and scan the output back into the table. This allows for some more coupling between the real world and our DWS model. Some nice feedback effects can be obtained when both scanning frequencies are not inverses, or a multiple read-out scan is combined with a single read-in scan. Generating a new waveform by stretching or compressing the previous one and feeding it back into the loop can generate a lot of interesting sounds, even natural sounding drum sounds. By adding a static limiter and a lowpass filter in the feedback loop, stability is ensured and some spectral control is possible. The lowpass filter also smooths the edges. The update equation is $x[k+1] = L \circ D \circ R \{x[k] \}$ where $R$ is a resampling stretch or compress operator, $D$ is a waveform limiter (i.e. a hard clipper) and $L$ is a lowpass filter. By windowing $x[k]$ with a cosine window before scanning it for final output, most of the waveform discontinuities re removed and the frequencies are shifted upward which gives extra punch. Driving the same system with a constant period input, instead of an impulsive exicitation, giving the update equation $x[k+1] = L \circ D \circ R \{x[k] + u \}$ , with $u$ the constant input, can generate interesting fractal like waveforms, reminicent of synced sawtooths. When the limiter is replaced by a power normalizer and the feedback loop contains both a lowpass and a highpass filter, $x[k+1] = N \circ L \circ D \circ R \{x[k]\}$, with $N$ the power normalization operator, a stable oscillator is constructed. This oscillator can produce a wide variety of chaotic sounds. Due to the normalization, the system is nonlinear but energy conserving, and due to the filters, the frequency content can be localized. Some more variants are possible. Changing the update function to $x[k+1] = N \circ L \circ D \circ (1 + aR)\{x[k]\}$, gives an extra control $a$ that is a tuning parameter for the amount of folding applied. \subsubsection{Random remarks} In waveguide string modeling, there are usually two polarizations that are combined, both with different properties. Using the positive and negative part of the spectrum to express something similar in DWS, enbles us to use complex FFT's. This is a bit of a moot point, but it enables easier processing in PD, since the data will be in a more natural form, plus we sort of get two signals for the price of one. In other words, if the positive and negative spectrum will receive a different phase shift (so both left and right partials differ slightly in frequency after scanning) we get amplitude modulation for free. \section{Conclusion} Despite the chipmunk effect which sometimes can be a big drawback, dynamic wavetable syntesis is an interesting approach because it enables us to use fast periodic transforms to build systems that can be easily understood as to how they behave from a time or frequency perspective. When we can counter the chipmunk effect by adjusting system parameters, it is a very useful technique because it eliminates some of the ad-hoc methods used to compensate for the changing periodicity in transform based techniques. In short, if the formant distribution of a wavetable can be shifted in a straightforward parametric way, we can cancel the chipmunk effect completely. Also, when working in the frequency domain, anti-aliasing is ``free'', in a sense that knowing the readout frequency enables us to explicitly cancel out the frequencies that are bound to fold. Entry: Entropy Date: Sun Oct 16 15:25:56 CEST 2005 Type: tex \def\gauss{e^{t^2 \over 2\sigma^2}} Entropy is the (logarithm of the) number of states of a system for which some \emph{macroscopic} measure is left invariant. The logarithm is there just so we can add entropies, instead of multiplying the number of possible states. % weighted sum? For example take 100 coins. If we describe the \emph{macrostate} by the number of heads, states with high entropy are those around 50, for which there are a huge number of possible \emph{microstates} leading to the same measure. States around 0 and 100 are low entropy, since there are only few possible such arrangements. Note that the definition of entropy depends on which measure(s) we are using to describe a macrostate. Quick, some airplane notes. Maximum entropy distribution for the interval $[0,1]$. Suppose the solution is $p(x) = 1$. The entropy is then given as $H_p = \int_0^1 1 \log 1 = 0$. We take another distribution $q(x) = 1 + \epsilon q'(x)$ with $\int_0^1 q'(x) = 0$. Using the linear approximation $\log(x) \approx 1 + x$, the linear approximation of the entropy is $H_q \approx - \int_0^1 (1 + \epsilon q'(x)) \epsilon q'(x) = - \int_0^1 \epsilon^2 q'(x)^2$, which is always less than $0$, so $p(x)$ has maximal entropy. Entry: The number e Date: Mon Oct 17 23:40:33 CEST 2005 Type: tex % WWOOOOOW! (hmm.. apparently this was once an orgastic insight ;) %% About $e$. I wonder if it is at all possible to define $e$ without the %% concept of limit or differentiation. I don't think so. The number $e$ is about the prototypical way of combining a huge number of very small updates. Starting from the power series of $e^x$ as definition, we can prove $$e^x = \lim_{n \to \infty} \left(1 + {x \over n} \right)^n,$$ using the binomial expansion $$e^x = \lim_{n \to \infty} \sum_{k=0}^n {x^k \over k!} {n! \over (n-k)!n^k}.$$ This gives a term by term correspondence since $$\lim_{n \to \infty} {n! \over (n-k)!n^k} = 1,$$ because $\lim_{n \to \infty} {n-m \over n} = 1$ for any $m$. It is interesting to note that in words this says that the number of ways to pick $k$ natural numbers is the same as the number of ways to pick $k$ \emph{different} natural numbers, in the sense that the ratio of both numbers tends to one as the set grows towards the size of the set of natural numbers. Very important is the fact that we can't arrive at $e$ without using infinities. If the converse were possible, $e$ would be algebraic, which it is not. $e$ serves as the gategeeper of the infinitesimal. %% Then onto commutators and $x$ and $p$. Suppose position and momentum %% operators are matrices $X$ and $P = FXF^H$, with $F$ denoting the %% Fourier transform. The commutator is %% $$[X,P] = XP - PX = XFXF^H - FXF^HX.$$ %% Does this say anything useful? Probably not. Entry: Dither Date: Fri May 5 01:00:51 CEST 2006 Type: tex Reading the ``Dither'' page on WikiPedia, and landed on Triangular PDF, which is equivalent to the roll of two dice. Of course, since the roll of one dice is rectangular, and the sum of two is the convolution of the PDFs. Reminds me about Knudth's Musing ``Hooray for probability theory'', which deals with polynomials with positive coefficients and only real zeros. They can be seen as the cumulant of $N$ coin tosses, and always have a gaussian shape. %[1] http://www-cs-faculty.stanford.edu/~knuth/musings.html Entry: Finite differences Date: Sat May 20 17:16:02 EDT 2006 Type: tex Reading ``Finite Difference Equations'' by Levy and Lessman. Interesting book. Opens my eyes about a lot of things related to the continuous and discrete cases. The one that struck me today is $$2^x = (1 + 1)^x = \sum_n \binom{x}{n} = \sum_n {x^{\nf} \over n!}.$$ Here $x^{\nf} = x(x-1)\ldots(x-n+1)$, the falling powers of $x$. This is a notation due to Knuth; the book uses $(x)_n$ which I find less clear. The expression akin to the Taylor expansion is $$f(x_0 + xh) = \sum_n {x^{\nf} \over n!} \Delta^n f(x_0),$$ where $\Delta^n f(x_0)$ is the $n$ times iterated difference operation $\Delta f(x) = f(x+h) - f(x)$. The sum is finite when $f$ is a polynomial, and is otherwize defined if $f$ is analytic. The relationships between the difference operator $\Delta$, the shift operator $E=1+\Delta$, and the differentiation $D$ are expressed as $$E=e^{hD}$$ and $$\log(1+\Delta) = \log E = hD,$$ where the exponential and logarithm signify the usual power series of the operators. Entry: Subspaces Date: Mon May 22 15:09:42 EDT 2006 Type: tex Start with Dedekind's law stating that if $S \leq R$ we have $$S \cap (R + T) = R + (S \cap T),$$ where $S \leq R$ denotes that $S$ is a subspace of $R$. In order to see this, it is interesting to transform this law into a less general statement. Using just the dimensions of the subspaces does not work, since a statement like $S \leq R$ is not equivalent to the ordinary meaning when the symbols represent integers, because the relation $\leq$ for subspaces is not a total order. Reducing the subspaces to those spanned by a finite subset of an basis $\{e_k\}$ allows us to talk about the bit vector $(b_N\ldots b_1)$, which indicates which of the basis vectors are combined to form a subspace, or in other words, it allows to write any subspace as $B = \sum_k b_k E_k$, where $E_k$ is the one dimensional space spanned by $e_k$. %[1] http://en.wikipedia.org/wiki/Modular_lattice Using this representation, the more specific version of Dedekind's law in terms of the bit vectors $s$, $r$ and $t$ becomes $$s \bitwiseand (r \bitwiseor t) = r \bitwiseor (s \bitwiseand t),$$ where $s \leq r$ means $s \bitwiseand r = s$ and $s \bitwiseor r = r$. This reads as $t$ augmented with $r$, filtered by $s$ is equal to $t$ filtered by $s$, augmented with $r$. Obviously this only works if $r$ does not introduce bits that are not in $s$, which is the same as saying that $r$ is ``contained'' in $s$. Entry: Poles with multiplicity > 1 Date: Sat Jun 17 18:46:16 EDT 2006 In practice (from real-world data) multiple poles don't happen since their probability is zero. If they are there, they are structural properties (i.e. system is constructed to exhibit poles with multiplicity.) However, for discrete finite structures they are important since their probability is no longer zero due to the finite number of elements. [1] entry://20090617-200427 Entry: Combining Splitting Fields Date: Tue Jun 20 12:41:48 EDT 2006 Type: tex The splitting Field of a polynomial in $\GF(2)$ has to be a field extension of $\GF(2)$. Now, does this field extension have to be $\GF(2^m)$? Well, let's see if there's a counterexample. Let's factor $(x^2+x+1)(x^3+x+1)$. The first factor splits over the field $\{p, p+1, 1, 0\}$ and the second factor splits over the field field $\{q,q^2,q+1,q^2+q,q^2+q+1,q^2+1,1,0\}$. How to combine both of them? Starting from the multiplicative groups $\{1,p,p^2\}$ ($\ZZ_3$) and $\{1, q, \ldots, q^7}$ ($\ZZ_7$) we get a group $\ZZ_{3} \times \ZZ_{7}$. However, this group is not the multiplicative group of the field we're looking for, since it is not closed under addition. For example $pq^2+1$ can not be factored as a product $p^nq^m$, but can be obtained from linear combination of these products. The additive closure of the group gives the extension field we're looking for. In general, a field extension of a finite field $\F$ is always a vector space spanned by polynomials over $\F$, ensuring this additive closure. If $\F$ already is a polynomial field, the field is then the additive closure of the products of polynomials in different formal variables, each modulo an irreducible polynomial. In our example this gives a proper extension of ${2^2}^{3} = 2^6 = 64$ elements $a_0 + a_1 q + a_2 q^2$ with $a_i \in \{1,p,p+1,0\}$. It has a multiplicative subgroup $\ZZ_9 \times \ZZ_7$ which has $\ZZ_3 \times \ZZ_7$ as a subgroup. Question. What is the exact isomorphism between successive extensions like that, and the normal representation of $\GF(2)$ as polynomials modulo an irreducible polynomial? In other words, how to map these polynomials in two variables to a representation in one variable? Entry: Dither, PWM, Sigma-Delta, Bresenham Date: Mon Oct 9 14:31:06 CEST 2006 What is dither[3] used for? It makes a digital system behave more like an analog system, in that it can be used to make it exhibit _graceful degradation_ instead of suddon failure. Dither can increase resolution on average, but adds noise to the immediate values. For phenomena where the average is more important than the immediate value, this can be an interesting trick. A simple example of this is a first order difference equation in finite precision. Suppose I'm using 16-bit values to represent oscillator periods. In order to make logarithmic increments, I could use a difference equation y[k+1] = (1+a)y[k] where a is a small constant. However, if ay is below hex 0.00008000 the value of y remains constant. In this case, average behaviour is important, so using 0.0000rrrr as rounding term, where rrrr is a 16-bit uniformly distributed random number, gives an approximation of the correct behaviour. Random number generation (RNG) for this application is most efficiently done using a finite field primitive polynomial. However, care needs to be taken to check the statistical properties in case the same RNG is used to drive several dithering filters, although I can't see any pathological cases at first glance. If clear patterns in the output are not really a problem, a simpler solution is to just use a triangle wave as dithering function, which gives effectively a PWM output. What I'm really intereseted in is a dithering scheme where the ripple is smaller after integration of the output. This requires a higher frequency quantization step. For example, instead of using a 01 ramp, a 81 ramp can be used. What exactly is the relation to S-D modulation and the Bresenham[2] line drawing algorithm? Let's make a list. * uniform dither: Dither is from a uniform noise source. * PWM: Dither from a square wave. Same average as uniform dither, but clear pattern. Large ripple after integration. * nPWM: Like pwm, but using a triangle wave modulated to Nyquist frequency. Smaller ripple. * S-D: Minimizes ripple after integration. * Bresenham: Uses higher state resolution to generate a proper average. Ordinary rounding quantization. The reason PWM is used over S-D in motor control (as I guess from PWM modulators being part of uC hardware, but S-D not) could be switching efficiency. Larger ripple is tolerated if higher switching efficiency (lower switch frequency) is attained. See Don Lancaster's magic sinewaves[1] for explicit minimization of switching events for sine wave generation as an alternative to PWM. When low ripple amplitude is more important than efficiency, for example in audio applications, S-D modulation can be used. Question: can PWM hardware on uC be used to generate S-D signals? [1] http://www.tinaja.com/glib/msinprop.pdf [2] http://en.wikipedia.org/wiki/Bresenham's_line_algorithm [3] http://en.wikipedia.org/wiki/Dither Entry: LED lights on a circle Date: Thu Jan 11 16:47:43 CET 2007 I ran into an interesting geometrical or topological problem. Working with N-simplexes to drive LED lights using a minimal amount of electrical nodes. The problem I have is how to associate an N-symplex with a circle in a natural way. For a tetrahedron this is trivial, but what with higher dimensional things? It's an interesting problem, because it seems to make absolutely no sense to do so. There should be no reason why a higher dimensional polytope should be naturally projectable on a 2-sphere. But i can't refute it just like that. To rephrase the problem in a more down-to-earth wording: I want to create a 2-sphere covered with lights in a way that the lights are normally distributed, and i have a certain pattern to access them. At the same time, i'd like to do this with as little connections as possible. Note that simplexes are easier to work with as they follow the binomial expansion for vertices, edges, triangle faces, tetrahedral volumes, etc... Trying the simple thing of projecting a 5-plex on a sphere. Compared to the 4-plex (tetrahedron), there is one pair of crossing lines. Entry: Splines Date: Sat May 5 21:56:32 CEST 2007 Type: tex Need to figure out polynomials and splines. I'm limiting myself to uniform splines atm. Let's start with the Bernstein polynomials, which come from the binomial expansion of $$ 1 = [x + (1 - x)]^n = \sum_{k=0}^n \binom{n}{k}x^n(1-x)^{n-k} = \sum_{k=0}^n b_{k,n}(x) . $$ A polynomial written in terms of the basis $b_{k,n}$ is a Bezier curve, and its coefficients are called control points. Entry: Chaotic Oscillators Date: Wed Nov 21 01:08:37 CET 2007 making chaos with a mixer usually involves an EQ, feedback and gain. the nonlinear element is a saturation. i've never seen this particular cicuit realized as a special purpose chaotic oscillator. lots of comparator based stuff, but no saturation.. the simplest i can find is this one: http://www.ecse.rpi.edu/~khaled/papers/iscas00.pdf which uses 3 integrators, a comparator and a summing amp. i'd like to get the parts count down to 4 amps, so i can use a quad. this means the summing amp needs to be incorporated somewhere else. i'd like to try the following. on the plane i made a (faulty) circuit with a svf with positive feedback (timeconstant t = 1 for simplicity) x'' = -x + a x' if a > 0 this circuit is unstable. for a < 0 this is a standard biquad. now, if the integrators can be biased to some voltage, this bias can be derived from a smith trigger acting on one of the state variables x or x', switching between two unstable points. the st is stateful, so chaos is possible on two planes: R^2 x {1,-1} ( what i did wrong is to apply the bias to the (+) input, which can't be correct since the capacitor voltages are relative to this point, so changing bias would also change the state variables. ) so.. what about using the saturation of the opamp? by measuring the voltage at the noninverting inputs, saturation can be detected. in the phase plane of an svf, there are 4 points where saturation can occur. using the general timer principle: * detect a certain condition (one of the state variables saturated) * discharge one of the state variables see: http://www.scholarpedia.org/article/Chaotic_Spiking_Oscillators switched 2nd order: a classic unstable biquad with state variables x and y, where x is decremented with the output of a schmitt trigger before it's fed into the y integrator and the integrator chain input summer. see: http://www.cs.rmit.edu.au/~jiankun/Sample_Publication/IECON.pdf which basicly contains the circuit i just (re)invented.. EDIT: some refs from johan suykens: M.E. Yalcin, J.A.K. Suykens, J.P.L. Vandewalle, Cellular Neural Networks, Multi-Scroll Chaos and Synchronization, World Scientific Series on Nonlinear Science, Series A - Vol. 50, Singapore, 2005 Entry: Chaos and Clipping Date: Sun Dec 16 18:03:49 CET 2007 It's not so hard to understand the mechanisms of chaos in 3D switched systems. In fact, they are quite simple to construct: * In one (of two) regimes, there is one (2D) mode performing an unstable oscillation in a plane together with a (1D) mode that decays toward that plane. * Put the switching barrier (plane) such that whenever a point crosses that plane (due to the unstable 2D oscillation), it is attracted fast towards the 2nd regime unstable plane by the decaying mode, and will end up close to the unstable fixed point of the unstable oscillatory mode. Maybe linear distorted (clipped) systems are easier to understand as switched systems too? Entry: The Hough Transform Date: Thu Jun 26 15:18:38 CEST 2008 Scan an image and collect histogram information for the parameters that define the lines in an image. The parametric equation for a line is the set of points {p0 + a p0' | a in |R} where p0' is the vector p_0 rotated by +90 degree. or with p0 = r0(cos t0, sin t0) the set { (r0 cos t0 - a sin t0, r0 sin t0 + a cos t0) | a in |R} eliminating a in these 2 scalar equiations gives the equation of ponts (x,y) in terms of the parameters (r0,t0) LINE = { (x,y) | x cos t0 + y sin t0 = r0 } Turning this around, given a point (x0,y0) all the lines that pass through it are parameterized by the equation BUNDLE = { (r,t) | x0 cos t + y0 sin t = r } Given this, how to find lines? We let the points vote for bundles, then find intersections (hot spots) in bundle parameter space. For each point in an image, record its brightness, and use that as a vote for all the lines through that point. This can be done by dividing the (r,t) bundle parameter space in a rectangular grid, and let each point (x0,y0) vote for the points that are in the bundle it defines. If the number of angle bins is in the same order as the number of input point bins, this algorithm casts N votes per point (x0,y0) giving cubic complexity O(N^3) Simplifications: * Since this is a voting algorithm, knowing where to NOT to look can reduce the search space. * Thresholding: don't let dark points vote: If a point doesn't lie on an edge, it doesn't need to update the array either. If possible, reduce the input to a binary image. Subdivision: As long as the number of angle bins is kept constant, the algorithm has constant complexity whenever it is subdivided. The quality of the estimates probably goes down, as there are less discriminiating votes. In our case (radial lens distortion) subdivision is probably necessary. Algorithm: For all points: if white then update accus r(t) = x0 cos t + y0 sin t Implementation: Loop order: working per scanline y allows the expression (y sin t) to be constant, it thus needs to be drawn only once, after the accumulated intensity for that scanline is known. Since the same trick is not possible for the x, this only halves the number of updates, so is probably not worth it: its easier to transform the (x,y) into amplitude + phase, which also halves the update time. Memory access: the input can be streamed, but the state update goes all over the place. Since the input, if binary, has very little memory, it might be more interesting to perfrom random access on the input image, and stream the accumulators. This requires for each point in the 2D histogram to assemble a list of points that contribute to it. The equation for this list of points is not a line, but a small line bundle. (otherwise bresenham algo might have been used). This does however raise the interesting question: can an approximate algorithm be used that does scan lines like that? Can the image be preprocessed so lines get a bit 'fatter' ? Summarized: * Staying with the naive implementation, the iteration goes column wise over the input points, and for each point a sinusoid is added to the histogram bins. * Going the experimental route: algoritm might be reversed so that each (r,t) parameter can be directly computed using a Bresenhem based function evaluation kernel. This makes sense if the image data fits in cache, or can be made to use prefetching (lines are quite predictable access patterns). Probably numerical gradient can be used to do this iteratively: this would allow for non-histogram based incremental approximation. ( However, the space around the optimal points in the picture here: http://en.wikipedia.org/wiki/Hough_transform are not really smooth. In fact, their shape is quite odd. Is that an artifact of chopping things in bins? ) Note that the last comment is called the Radon transform http://en.wikipedia.org/wiki/Radon_transform Entry: Discretized integral transforms: domain / codomain loop order Date: Thu Jun 26 18:32:28 CEST 2008 Moving from the Hough transform -> Radon transform is an example of a more general pattern of re-arranging control around data dependencies in discretized integral transforms, basicly chainging the order of loops. The central idea is this: for finite versions of an integral transform, there exists a related finite histogram based version with inverted control. Instead of computing the integral for each sampled codomain point from all the domain points, one computes the contribution of each domain point to all the codomain points. Basicly, in the discrete version, the domain and codomain loops are transposed. For example, a generic integral transform _ _ (y0,y1) = _/ _/ F(x0,x1,y0,y1) dx0 dx1 x0 x1 can be rewritten, after discretization as 4 nested loops over y0,y1,x0,x1 (outer->inner): for each y0,y1 the result is computed directly. However, it can also be computed as an update of an y0,y1 grid by re-ordering the loops x0,x1,y0,y1. For the Hough transform there is only one integral, so there are 3 loops to reorder to get to a discretized Radon transform. Is this useful? The only scenario i can think about is the case where the histogram version is faster to compute than the direct version, but you need some estimate before you use the direct version in an iterative algorithm. Maybe this only makes sense for line integrals, not for full 2D->2D maps. Entry: Snowcrash encoding Date: Thu Jun 26 19:22:43 CEST 2008 The basic algorithm uses a single LFSR (linear feedback shift register) to construct a 2D grid by: * partitioning X and Y state space into 2 separate segments so the same difference equation of degree D can be used for both. * shifting each column by a constant number N, and performing an XOR between X and Y Let's call the observations G[i], and the hidden sequences X[i] and Y[i]. These use addressing A(x,y) = A[x+Ny]. Alternatively, primed array names represent transposed addressing A(x,y) = A'[xN + y]. The equations from LFSR state dependency and output mixing: X[D+i] = f( X[i] ... X[D-1+i] ) 0 <= i < (N^2 - D) Y'[D+i] = f( Y'[i] ... Y'[D-1+i] ) 0 <= i < (N^2 - D) G[i] = X[i] + Y[i] 0 <= i < (N^2) --------- 3N^2 - 2D The number of unknowns (X[i] and Y[i]) is 2N^2 ( The input data consists of N^2 points. ) With independent equations, this means it is soluble if 2N^2 <= 3N^2 - 2D or D <= N^2 / 2 N >= sqrt(2D) For D=15 -> N=6 I don't see why this should be 7 as in the current implementation. Maybe it's used to reduce ambiguity? Something 'feels wrong' about using a shift of 6 and a detection of 7x7 : the sequences don't simply wrap around any more, which distroys some symmetry. Q: is it possible to solve the quations in least-squares form, by adding an error term to each? Q: is there a way to test orientation properties by using some property of the relationship of the LFSR to its inverse? If the parameter space is known, it can be proven by enumeration that ambiguity can always be resolved. Maybe that is really a better approach, because I don't see a way to do it otherwise. Entry: reverse LFSR Date: Fri Jun 27 13:31:59 CEST 2008 What happens if you reverse an LFSR sequence? This is most easily seen by expressing the LFSR as f(x[i] ...) = 0. The only thing that happens is the reversal of the coefficients. So 100101 -> 101001 Now, the question is, are there LFSR equations that are symmetric in this sense? This would also mean the sequence itself is symmetric around every point, which doesn't look like it's possible. Next step: is it possible to relate an LFSR to its inverse? After all, all galois fields of order 2^N are isomorphic. It might be interesting to cast the equation into exponential form to work with multiplicative generators g^i. Are all LFSR sequences of a certain order related by mere re-arranging, like taking every 3rd bit? Hmm... getting confused. Let's view this in a transformed domain. The LFSR sequence is the coefficient of the sequence of polynomials generated by multiplications by x, the 'shift' operator. (This turns the modulo operation into a simple XOR.) In matrix/vector terms, given an LFSR polynomial a of order N, pick an initial point in the state space e = {e1,...,eN). Take the update equation of the LFSR, and call its matrix g. This is the multiplicative generator for the sequence. Q: Another thing to think about: to compute eigenvalue decomposition in |C one needs complex eigenvalues (the splitting field). However, there exists a 'real form' of this which has blocks of 2D rotations instead of complex diagonal forms. Is this possible for LFSR? Q: how many different sequences are there for a given Galois field? (number of automorphisms) Entry: LFSR resolving linear orientation ambiguity Date: Fri Jun 27 15:28:37 CEST 2008 Q: What happens if the LFSR in one direction is XORed with its reverse? does that still give a proper sequence? That way it would be possible to have contribution of 2 directions, which then can be used to resolve ambiguity. Equations: ( with A'[i] = A[N-i-1] ) L[D+i] = f( L[i] ... L[D-1+i] ) 0 <= i < (N - D) R'[D+i] = f( R'[i] ... R'[D-1+i] ) 0 <= i < (N - D) X[i] = L[i] + R[i] 0 <= i < N --------- 3N - 2D Number of unknowns: 2N D <= N/2 The only problem then is to figure out whether the systems are independent. Does this sum of sequence + reverse somehow reduce to something of lower degree? For one thing, it is reversal-symmetric around the initial state, which can be taken not to ly in the surface of interest. Is there anything else destroyed? In terms of the unit element e, and the generators g and g^-1 the sequence becomes a cosine instead of an exponential: e g^i + e g^{-i} = e (g^i + g^{-i}) This can no longer be written as an LFSR, because it has hidden state (LFSR has a naked state). It is a more general form of linear system. I don't se any reason why this can't be generalized to the 2D case. Q: Are there points where this XOR operation itself leads to ambiguities? In other words, are there XORed segments that have multiple parents? If so, then it is probably possible to find a small upper bound of the number of observations to take to eliminate this ambiguity. ( Since an large ub is probably easy to find: take one that reduces the search space for collisions to a small amount, i.e. one case. ) Note that if there are such ambiguities then a simple exhaustive search for soluble sets of equations in the non-symmetric 2D case might be a better approach. Entry: symmetric 2D LFSR Date: Sat Jun 28 12:52:36 CEST 2008 For a 2D LFSR grid, the problem to solve is rotation ambiguity. Instead of using an LFSR generator for each direction, use 2 like in the 1D case, but tile them such that the structure is rotation-symmetric. I.e. for N=3 the 4 LFSR are wrapped as: X+ Y+ X- Y- 1 2 3 3 6 9 9 8 7 7 4 1 4 5 6 2 5 8 6 5 4 8 5 2 7 8 9 1 4 7 3 2 1 9 6 3 Analogous to before, this gives 5N^2-4D equations and 4N^2 unknowns, leading to the solvability condition: N^2 >= 4D This looks like a nice idea, but using a 6x6 estimator for D=15 gives underdetermined equations.. Apparently, there is some information lost in the XOR. Intuition: do something like this in |R and you might reduce condition, but it is hard to get to linear dependence. In a finite space it is a lot easier to end up with linear dependence because there is less room. Entry: Lens correction Date: Sat Jun 28 13:15:43 CEST 2008 Grid estimation with lens correction, roadmap: - Lens parameter estimation: cubic distortion + optic center -> try this with manual approach first. - general flow: sobel edge -> histogram for direction -> projection on 2 orthogonal directions -> grid phase + frequency. - Optionally: search the LFSR 2D space for sets of equations that are not soluble. Entry: 2D direction estimation (confused) Date: Mon Jun 30 14:20:24 CEST 2008 I first thought this would be Hough transform in r,t with the radial direction integrated out. However, that doesn't work because of the way accumulation works: each t is voted for in the same way, but with different r. I can't see how a bundle-accumulator would work for estimating direction. How can a point vote for a direction? Doesn't make sense. What about hierarchical Hough transform? We know there are only 2 directions, and they are orthogonal, so basically, there is only one. This looks like a successive binary estimation problem. First choice between t=0,pi/4 But, doing it this way: how to pick one of the two angles? The result will be 2 accumulators for an r spectrum. We're not interested in r, but it will look like a spectrum with multiple (blunt) peaks. The decision rule could be something like high-pass content (local variation). Entry: simplified HT for direction estimation Date: Tue Jul 1 11:12:56 CEST 2008 Essentially, a Hough transform H(r,t) of an image I(x,y) can be seen as a likelihood function for the parameters r and t. Now observe a typical 'sinogram', like the one here: http://en.wikipedia.org/wiki/Hough_transform Given the Hough transform of a single line, take an r-spectrum for a certain angle: it will will have less variance when the angle is close to the correct one. (The integral of the r-spectrum sums to the number of line-points that voted.) It would be interesting to be able to ignore the fact that there are multiple lines, and have them all contribute to a single parameter. In fact, the first harmonic in the DFT of the r-spectrum will give an estimate for the phase and offset. Let's try this. Entry: Generalized Pitch Detection Date: Thu Jul 3 16:26:11 CEST 2008 I'm not sure if I can capture my intuition about this, but let's try anyway: There exists a class of problems which exhibit somewhat periodic cost functions: let's call them Generalized Pitch Detection problems (GPD). The problem with such problems is to add bias to the estimator so it will go for the desired pitch, and not one of its (sub-)harmonics. The grid estimation problem in Snowcrash is a GPD problem. Entry: LFSR: roundup Date: Sat Jul 5 14:45:13 CEST 2008 After a couple of days of tinkering in the Scheme lab, thinking about algorithm complexity and being distracted by HOP for prototyping (which works nice btw..) it's time to get something done. Tasks: (A) lens distortion estimation from running example. (B) grid parameter estimation Approach: do both based on Hough transform. First one: local tx, second one: global tx with transformed coordinates. I do need move from Scheme to C code because I'm running into trouble with execution time. HOFs are nice, but parametricity is slow.. Entry: Hough over 0->pi/2 Date: Fri Jul 11 17:28:55 CEST 2008 ( exploration log ) Got the Sobel + HT working in PF; drawing some nice pictures. Thinking about letting theta + pi/2 vote for theta. What should r be? Hmm.. This gives two mixed r spectra: for a certain theta, grid might shift, so this doesn't seem like a good idea. The pictures help a lot to build some intuition though.. Looking at it, it doesn't seem too difficult to estimate scale, phase and angle, right from the image, using the right biasing (filtering). Try some multires? With r,t going over their full range, there are 4 angles, pi/2 apart. How to change the transform such that a symmetric diagram over 0->pi/2 is possible? The full diagram is redundant, since for each (r,t) there is an (-r,t+pi). The first reduction is to either halve r or halve t. Let's halve t. I'm using only one quadrant for x,y. Maybe change that too? Yes. Gives a prettier picture: the whole (r,t) plane is used. What about this one: Compute the sum of squares over r. Since we know the integral is constant, the integral of squares will give some distinction between highly concentrated regions and spread-out regions. Is this entropy? No, entropy is \sum p_i ln p_i p_i don't need to be normalized -> it offsets the entropy by ln \sum p_i Both entropy and energy are superlinear nonlinearities, so will emphasize outliers. Which one is the more natural measure? Interpretation of the r spectrum as a PDF is quite straightforward, but interpreting it as an energy, not so.. Maybe the potential energy of a heap of sand? Potential energy is proportional to height, integral from 0->top is quadratic in top. Plotting the energy seems to give quite a clear distinction. However, this still requires the construction of a histogram. Instead of using energy or entropy, another way to introduce bias is to assume the r period R is assumed to be known, it can be used to construct a filter: simply summing (cos 2 pi r / R, sin 2 pi r / R) So, instead of voting for the bundle (r,t) | r = x cos t + y cos t, t \in [0,pi] we vote for the phasor: (e^{i r/R} , t) The advantage is that the r dimension can be reduced by immediately accumulating the phasors (non-periodic contributions will interfere destructively). This basically demodulates the grid carrier. My intuition says it's possible to get away with fixing one parameter to get a simpler/faster kernel function that can then be used to perform incremental refinement of estimates of that parameter. This also might benefit from representing the image as binary data, reducing memory accesses. (Done: I don't see dramatic speedup, maybe because at that time it was still compute bound due to sin/cos evaluation?) ( Also, it might help to switch to the Radon transform, to compute the line integral directly. Hmm.. that's not the same: Radon transform would help if we're looking for lines, but here there is some benefit to computing a whole r-spectrum, which can then be reduced to a single phasor. ) Entry: Toroidal Hough Transform Date: Sat Jul 12 13:49:32 CEST 2008 For grid estimation, we're not really interested in individual maxima on the r-spectrum. Reducing this spectrum to a couple of values can be done by adding bias toward a period R. Then, instead of using a voting algorithm for bundles: (r,t) | r = x cos t + y cos t, t \in [0,pi] we vote for the phasor: (e^{i r/R} , t) The advantage is that these votes can be accumulated immediately, so memory access becomes simpler, and a histogram interval discretization step is not necesesary. The core computation is the evaluation of cos/sin, the rest seems memory-bound. Given R and t, the grid period, compute the complex phasor p: p(R,t) = \sum_n e^{i r_n(t) / R} r_n(t) = (x,y) ? x cos t + y cos t : 0 The inner loop is the sum over the exponentials of r_n(t) for linearly increasing x with y fixed, so this can be updated. However, the image will be mostly sparse, so this will probably not bring much speedup (points are not equidistant, so multiple rotation update matrices are necessary). ( Tests indicate the 2 approachs: straight eval + update are about the same speed. Maybe it's memory bound? ) Entry: Inner loop over angle/tangent Date: Sat Jul 12 14:21:05 CEST 2008 ( I'm not sure if this is useful: only works when the angle/tangent loop is the inner one. ) Since we're not using any resolution-reducing histogram, and floats/doubles have a large dynamic range, can the parameter space be turned into something more convenient so the kernel method for evaluatiing p(R,t) becomes simpler? Goal: make a kernel method that evaluates p(R,t) in a different parameter space. Let's try to replace it by (o,m) : offset and tangent of angle, and use the computation of p(R,m) to compute p(R,t). The bundle equation for (x,m) becomes: (o,m) | o = y - m x Voting for e^{i o/O(m)} = e^{ i (y - m x) / O(m) } becomes fairly trivial: the exponential can be accumulated since it's linear in both m and (x,y). The polar coord version is only linear in (x,y), so works only for a fixed t. Entry: Accellerating the inner accumulator loop Date: Sat Jul 12 16:47:21 CEST 2008 Bring the conditional out of the loop. RLE will transform the sparse vector into an array of offsets that can be used to perform the incremental accumulation without a conditional in the inner loop. This representation needs to be computed only once, and can be reused for every p(R,t) evaluation. e^{i (x cos t + y sin t) 2 pi / R} ( Moving one step further, instead of storing offsets that will have to be transformed to phase increments once per inner loop, what about storing the phase increments directly? This doesn't make sense for t=constant however, since each increment is used only once. It does make sense when multiple R are evaluated per t. ) The inner loop with y=cte looks like: \sum_X e^{i (a x + b)} = e^{i b} \sum_X e^{i a x} Where: X = {x | image(x,y) = 1} (range of sum) a = 2 pi cos t / R b = 2 pi y sin t / R This makes the inner loop accumulation the accumulation of sin/cos of scaled versions of the RLE x values. Here a is the 'speed' at which the points in X rotate around. This depends on t: the speed is a projection of the expected speed R under angle t. Doesn't look there is a shortcut there, because of the absence of structure in the set of points X. ( With RLE, the running time is proportional to the number of white points in the input image. Maybe some preprocessing step is desired to eliminate blurry gradients? ) If sincosf (math.h) can be used, this might go a bit faster. Entry: implementing RLE Date: Sun Jul 13 10:37:17 CEST 2008 Each row incrementally adds a multiplier e ^ { 2 pi sin t / R }. Instead of performing a conditional on the input RLE data stream, this might be encoded as a single bit. Entry: Toroid plots Date: Sun Jul 13 18:02:10 CEST 2008 Got some reasonably fast kernel routine and used it to generate a theta/period plot. Gives results as expected: main period peak is quite clear, and with an initial estimate it can probably be computed iteratively. Entry: AR - ARMA Date: Wed Oct 1 17:10:57 CEST 2008 How to fit an ARMA model? It's been a while. http://en.wikipedia.org/wiki/Box-Jenkins There are three primary stages in building a Box-Jenkins time series model. 1. Model identification * Detecting stationarity: can be done by inspecting the autocorrelation. Slow decay (a flat spower spectrum without isolated peaks) can indicate non-stationarity. * Detecting seasonality: if there is significant periodicity this can be removed (modeled separately), or included in the model order estimation. (The idea being that periodicity comes from external inputs, something which the ARMA model doesn't accomodate.) * MA order (q) selection from autocorrelation plot. For an exact model, this becomes zero after lag = q. * AR order selection (p) from partial autocorrelation plot. (CHECK THIS). This could be automated using information-based criteria such as FPE (Final Prediction Error) and AIC (Aikake Information Criterion). 2. Model estimation Once a suitable model order is found, use a NL-LS or ML method to estimate the model parameters. 3. Model validation The error term is assumed to follow the assumptions for a stationary univariate process. For ARMAX the approach is similar? So, what's the difference between using NL-LS or ML methods? Linear least squares corresponds to maximum likelihood if the errors have a normal distribution. It looks like this is no longer the case for NL-LS. Entry: motor control Date: Mon Oct 6 01:18:51 CEST 2008 Something I forgot about modeling motors is that for AC induction motors, the torque-speed curve is quite nonlinear. First, an asynchronous AC motor is basicly a transformer where the load of the secondary is the rotary dynamic attached to the axis. This load can be modeled as a torque -> angular velocity transducer. T -> LOAD -> w_L The controller that closes this loop maps w_L to T through the the motor's torque-speed T/w characteristic, which has two inputs: voltage V and synchronous velocity w_s. V,w_s | V w_L -> M -> T The nonlinearity between w_L/w_s and T is motor-dependent. This was helpful: http://www.sea.siemens.com/step/templates/lesson.mason?ac_drives:2:1:1 http://en.wikipedia.org/wiki/Direct_Torque_Control Torque can be estimated from the motor current+voltage. This allows to build a controller with simple measurement sensors, but doesn't allow zero-speed control. ( Explain the flux-linkage estimation. ) With accurate speed/position measurement better control is possible: http://en.wikipedia.org/wiki/Vector_control_(motor) http://en.wikipedia.org/wiki/Variable-frequency_drive "AC motor characteristics require the applied voltage to be proportionally adjusted whenever the frequency is changed in order to deliver the rated torque. For example, if a motor is designed to operate at 460 volts at 60 Hz, the applied voltage must be reduced to 230 volts when the frequency is reduced to 30 Hz. Thus the ratio of volts per hertz must be regulated to a constant value (460/60 = 7.67 V/Hz in this case)." (The idea behind this being that sending DC through a coil is a bad idea: when the frequency is reduced, the reactive part of the impedance goes down, which makes the resistive part more significant, resulting in more heat dissipation.) Entry: FFT window design for sinusoidal synthesis Date: Mon Oct 6 13:30:26 CEST 2008 (Mostly note to self. FIXME: explain this a bit better.) In 2000 I tried to design windowing functions for spectral synthesis: synthesizing a sum of sinusoids by constructing a DFT spectrum through updates of a limited number of frequency components per sinusoid. I apporached this as a FIR filter design problem. However, I came to realize that approaching this as a filter design problem, focussing only on stationary signals is not sufficient. The real problem is about amplitude modulation effects that happen when mixing subsequent stationary signals, as a result of phase discontinuities. The only way to tackle this properly seems to be to increase the spectral update rate. It would be interesting to see whether this can be solved in the DYNWAV model: it already has a variable frequency mechanism (wavetable playback) which could be used to capture "average" frequency change without amplitude modulation effects. However, synthesizing multiple pitches still gives the same problem. Entry: I/O delays in Digital Control Date: Wed Nov 19 16:40:31 CET 2008 I got a bit confused by conversations about latencies of digital control systems, but it is quite straightforward: For a synchronously sampled control system with input and output updated at t:0,1,... the input data sampled at t=0 can only influence actuation at t=1 (when process delay Z=1) and as such the effect of this actuation can only be measured at t=2. Entry: Normal Numbers and Automatic Differentiation Date: Sun Feb 22 15:08:07 CET 2009 Extending the real numbers with e satisfying e^2 = 0 produces the normal numbers, equiped with extended elementary operations. Besides being nilpotent, e behaves as a real number. A normal mumber has two components a and b, and is denoted as a + b e Given an analytic function f(x) over the real numbers, the extended function over the normal numbers has the following remarkable property: f(x + e) = f(x) + f'(x) e This can be derived starting from the power series expansion of f(x) at x = 0, and lifting all operations to those of the normal numbers. The entity e effectively selects the first derivative from the power series. Practical significance: given an algorithm for computing f(x), an algorithm for computing both f(x) and f'(x) can be automatically derived by extending all elementary operations to their equivalent on the normal numbers. The "magic" is in that in general, single-expression function derivatives _look_ much more complicated than the original function expression. This complication is artificial: it can be alleviated by introducing proper decomposition. This is exactly what AD does: complexity doesn't really rise, but the amount of data that is tracked in the computation is increased: there are more local dependencies in the data flow, but global size doesn't explode. Instead of using normal numbers, the same mechanism can equivalently be explained as explicity tracking both function value and deriviative, and updating them using the chain rule. [1] http://en.wikipedia.org/wiki/Automatic_differentiation Entry: Graceful degradation of integer values Date: Wed May 6 13:52:21 CEST 2009 Transmitting binary digit-encoded values is problematic in the sense that not all digits have the same noise sensitivity in terms of the encoded integer value. I.e. binary 1100 = decimal 12 binary 1101 = decimal 13 (diff = 1) binary 0100 = decimal 4 (diff = 8) Is there a way to encode a number such that each flipped bit will cause the same (or similar) error in the integer value, and not too many extra bits are used. One way to do this with lots of redundancy is to use a PWM representation. I.e. for 4 bit values, this gives 16 bits representation and a lot of redundancy. However, can this principle be used in a way that isn't too costly? 0 000 1 100 010 001 2 011 101 110 3 111 Using bigger numbers the redundancy explodes quite fast. Is there a way to "flatten the bulge" in the redundancy curve? This is related quite a bit to the concept of entropy. Patterns where the numers of ones and zeros are equally distributed are more common because there are more variants. What about using 100 010 001 as representations of 1,2,3. That way bit flips will only jump between "bands" and no longer all over the place. 0 {000} {0} 1 {100 010 001} {1,2,3} 2 {011 101 110} {4,5,6} 3 {111} {7} This produces a lattice structure. Now, is there a way to do this recursively? Convert each number to transitions. 100 10 010 11 001 01 The same structure is found the the other section since it's simply the inverse. Hm.. I don't see it. What I see is that numbers near the middle bands are all equal in some way. It's as if this says that only black and white are really important (they are outliers), and the rest is not so special. It's the binomial expansion of (1 + x)^n 1 1 1 1 2 1 1 3 3 1 1 4 6 4 1 Can this be used in control systems? If the errors are small, we don't really care much about their exact value, just their sign and approximate size. If the errors are large, we really want to know them well. Suppose a we have n wires carrying a single signal. If they're all 1 or all 0 that's significant. If they are equally distributed that's not significant. So, this is quite a natural representation of "normalness". Flipping one bit won't do much, but the ensemble has a meaning that can be interpreted as a number. Now, can we count with these numbers? Or at least treat them as values to which we can apply functions? Let's define this better. - A node is a binary function of time. - A number is a set of nodes. - The sum of two numbers is the union of their sets. - The weight of a number is the cardinality of its set. - A "reduction" translates a number of high weight m to one of low weight n, m > n. Computation seems is hidden in the reduction operation. Let's define one for cardinality 2->1 This should be a transition of classes, probably random to eliminate arbitrary choices. The "outer" classes will remain while the inner clas picks a random choice. 2 ----> 1 00 | 0 01 10 | 0 1 11 | 1 Similar for the next in line. 3 ----------> 2 000 | 00 100 010 001 | 01 10 011 101 110 | 01 10 111 | 11 Now, can reduction be implemented with an electronic device that has some kind of balanced internal noise source? Now.. 3->1 can be implemented as a deterministic majority gate. But it seems that the big idea in the proposed computer is its indeterminate nature. So, with completely fair n -> (n-1) transitions, what is the probability of an n->1 transition? Wait. The simplest gate is really to just drop one bit randomly. (A computer full of Kill Gates?) Maybe random killing isn't the trick, maybe random permuting is? Then killing is simply not using an output. With a primitive in the random transposition gate. Enough insanity.. There might be something hidden here, but it is really dependent on cheap connections. It probably needs a 3D structure to be implemented. Random transpositions don't seem so difficult.. It's basically a beam splitter. Entry: Counting votes. Date: Mon May 11 09:54:21 CEST 2009 This is a rehash of the previous post. For bit vectors b of dimension n, the function bits : 2^n -> [0..n] introduces a partition[1] of 2^n. The equivalence classes have an order relation, determined by the bits function. The central idea in using this partition to represent numbers in terms of bit vectors is the distribution of bits(v) when v is drawn from a uniform distribution. It is a binomial distribution[2]. Such a distribution naturally encodes "agreement" (most bits zero or one) as an exceptionally ordered state, and "disagreement" as the natural disordered state. This non-linearity is quite a good match to the non-linear behaviour of neural networks when modeled as an inner product followed by a limiting nonlinearity, without ever using any explicit computations. Simply concatenating a number of such vectors gives a resulting vector that represents anything on the scale of no->disagrement->yes. The size of an individual bit vectors dermines its weight in the vote: v_out = [ v_1 | ... | v_n ] Let's call a single bit vector v \in {0,1}^n a "vote". Each vote has a weight n which is its dimensionality. Now, to make this managable, the only real "computation" that has to happen is to reduce the weigth of votes so they can be combined with other votes to form new ones. The essential observation is that the distribution of a vote with weight n is qualitatively similar to the distribution of a vote with weigth n-1. In other words, simply discarding one element of the bit vector on average has no qualitative effect. Let's call this "weight loss". The same reasoning goes for randomly flipping a bit. On average this has little effect, giving rise to good noise immunity. This gives some form of computation (it can do anything feedforward neural networks do) in a statistical way. The next step is to make weight loss programmable. A vote should be able to inhibit another vote. The problem here is that it is a physical operation. Inhibition means to cut wires. Is it possible to do a non-phsyical inhibition? Yes. The and/or gate to introduce assymetry between yes and no. One vote could be combined with another vote (gate it). So, what then is a nerve cell? It's where multiple wires come together and a random part of them is discarded. Forgetting is the essence of computation. ;) So, given a (huge) set of binary nodes, there are only two operations: - reduction: taking a subset of signals to create a node - introduction of assymetry (inhibition through AND|OR) = gating [1] http://en.wikipedia.org/wiki/Partition_of_a_set [2] http://en.wikipedia.org/wiki/Binomial_distribution Entry: Computer Modern is Too Thin Date: Mon May 11 18:11:07 CEST 2009 Not really so math-y but I have no other place for it.. Something that has been bugging me ever since I used LateX for the first time (1998?) is that Computer Modern is a very thin font. Looking around the web, I've found several people complaining about this. On paper this problem isn't as pronounced because resolution is high enough, and black is black. But on screen, thin lines turns grey, and that is really annoying. It's easier to read black blobs than grey ones. On blurry low-quality paper, ink doesn't turn to grey either. So, how to fix this? The simplest way to fatten up a font is to use erosion (replace pixel with minimal pixel value in a 3x3 square). Alternatively, blur followed by increased contrast will also work. So, why doesn't ink on paper turn grey when it spills out a bit? The reason is probably that what is diffused is pigment (amount of absorption or negative color) and not light intensity. Even with pigment diluted, black is still very black. Code: #include #include int min (int a, int b) { return (a < b) ? a : b; } int main(int argc, char **argv) { int w=0,h=0,d=0; size_t size = w*h; unsigned char *buf; unsigned char *obuf; int l,c; again: if (3 != fscanf(stdin, "P5\n%d %d\n%d\n", &w, &h, &d)) exit(0); printf("P5\n%d %d\n%d\n", w, h, d); size = w*h; buf = malloc(size); obuf = malloc(size); fread(buf, 1, size, stdin); for (l = 1; l < h-1; l++) { int line = l * w; for (c = 1; c < w-1; c++) { int l1 = buf[(line-w) + (c-1)]; int l2 = buf[(line-w) + (c)]; int l3 = buf[(line-w) + (c+1)]; int l4 = buf[(line) + (c-1)]; int l5 = buf[(line) + (c)]; int l6 = buf[(line) + (c+1)]; int l7 = buf[(line+w) + (c-1)]; int l8 = buf[(line+w) + (c)]; int l9 = buf[(line+w) + (c+1)]; obuf[line + c] = min(min(min(min(l1,l2), min(l3,l4)), min(min(l5,l6), min(l7,l8))), l9); } } fwrite(obuf, 1, size, stdout); fflush(stdout); free(buf); free(obuf); goto again; return 0; } Let this operate on images generated by "pdftoppm -gray -r 300" before passing it to a DJVU converter. Don't do this on lower rez. Entry: Throwing away the right information Date: Tue Jun 2 14:10:46 CEST 2009 While the computation mechanism explained in the previous posts might be interesting as a mathematical toy, I'm not sure whether building something like that is practical. It requires a huge amount of connections. A human brain contains about 10^11 cells with on average 7x10^3 connections. In log10 that's 11 and 3.84 with a ratio of 2.6 The thing is though, the architecture has failure built-in. So manufacturing failure should probably not throw off things too much. It can be built with low yield, might might drasticly reduce fabrication costs, to a point where it might be feasible in a home lab with something other than sillicon. Entry: Analog vs. Digital Date: Tue Jun 2 14:24:28 CEST 2009 My original idea started with bringing back uniform error sensitivity to digital systems: in a digital system, it is possible to get error-free data transfer with an arbitrarily high probability. However, when an error does occur, it is usually fatal. This is in stark contrast with analog communication: errors will degrade the signal, but are far from fatal. The information is of a different kind: there is always some error but "we can live with it" and a little more error makes a little more annoyance, but no fatalities are introduced abruptly. Now, is there something inbetween these things? Is it possible to use the signal re-generation property of digital systems with graceful degradation observed in an analog system? In other words: contain errors locally, but make sure that noise that gets promoted to signal does not have global effect. This seems to be different from error detecting/correcting codes: these work well up to a certain noise level where they completely fail. This is more about representation of data. About what a _number_ really is. Voting has this property, but it also is extremely wasteful for representing "don't care". TODO: make this a bit more formal and formulate the computation properties as continuous statistics of a discrete (limit->inf) system. Entry: Counting votes. Date: Sun Jun 7 15:06:40 CEST 2009 Representing a non-negative integer as a number of 1 bits in a bit vector allows the use of concatenation as addition. Let's investigate the effect of dropping 1 bit at random from a length n bit vector. More specificly, given the expected value of the number of 1 bits in a vector, calculate this value after dropping one bit at random. Entry: Sigma-Delta analog synth Date: Tue Jun 9 11:13:03 CEST 2009 I think Sigma-Delta[1] modulation is a great idea. The idea behind this method is what triggered the previous posts. It has been part of my lingering background noise for a long time, influencing a lot of the ideas that went into the Sheep synth. This modulation scheme has an aestetic appeal I can't really describe. So, to get to know the modulation form a bit better, I'd like to build a modular analog synth using this representation for signals, by patching up a couple of PIC and discrete logic chips. However, this is probably more suited to FPGA implementation. Let's see what operations are supported: * switch/mix: interleave two streams * distorted amp: perform AND/OR on an unbiased signal * inversion: XOR with a stream of known average * differential: pulse->edge: amplitude to frequency conversion. Some real-world hacks on top of this: * A->D: PIC's 12-bit AD Modulation Instead of using higher order filters as in an SD tailored to accomodate a low-noise band of arbitrary shape, we use an n-bit wrap-around accumulator as a discrete integrator. To convert an n-bit signal to this representation, simply accumulate it. This yields a "digitial phasor" which rotates around 2^n states over time. Observing the carry bit of the accumulation as a binary signal yields a signal with the same average as the original signal. The reason this encoding can be used as a representation of audio is that the error is small for low frequencies (where it matters). Almost all encoding errors are present in the higher frequency bands. Using higher order integrating filters will give better results if the objective is good quality audio. See [1] for more information. However, for us the simplicity of the representation is more important, since we want to use it as a computation substrate. Its connection to frequency modulation is of key importance. Demodulation Converting from a bit stream to a signal stream can be done using accumulation and subsampling. Some information is lost in the modulated form (high frequency spectral content) and when converting back to a stream of numbers, we'll loose some more. Averages: 0, 1 and 1/2 There are 3 special signals to distinguish: signal average 0000... 0 0101... 1/2 1111... 1 The all-zero signal corresponds to a phasor that doesn't change. The alternating 0/1 signal corresponds to a phasor in a binary oscillation, which corresponds to an input signal of 2^(n-1). The latter however is a limit case and cannot be generated by the phasor mechanism. The maximum is a signal with a 0 bit once every 2^n 1 bits, reflecting the maximal input signal amplitude of 2^n-1. To simplify interfacing to the analog world, let's call 0101.. the reference signal (zero). This allows 0000.. to be the clip min and 1111.. to be the clip max amplitude. Statistics In order to calculate with SD modulated signals and prove some of their properties we need to clearly define how such binary sequences relate to sequences of real numbers. It is remarkable that magnitude is encoded in an single event stream. How to fish out the magnitude info? In first approximation we forget about time altogether and interpret a binary sequence as draws from a random variable. This allows us to work with the expected value defined as: E(a) = \sum p(a=x) x With x ranging over the values {0,1} this simply reduces to E(a) = p(a=1) This mathematical model connects to the messy physical world by observing that a finite average of a sequence of a_i samples approximates E(a). Anything we say about the statistical property E(a) in some sense carries over to what the estimator produces. In practice this means that the sequences can represent time-varying low frequency signals by cutting off the integration at a particular low frequency. Attenuation The logical AND of two signals behaves as attenuation. A 0 in one signal can cancel the transmission of a 1 in the other. As long as this doesn't lead to cases where a disproportionate number of 1s is canceled due to correlation of the two bitstreams, the effect on the expected value is to be attenuated by an amount proportional to the expected value of the other signal. Representing AND as real number multiplication on the set {0,1} allows the expression of the expected value of the product E(ab) = \sum p(a_x, b_y) a_x b_y for events a_x and b_y ranging over {0,1}. Of these 4 terms only p(a=1,b=1) produces a nonzero contribution. This can be simplified to E(a) E(b) if the random variables are independent[3] meaning p(a_x, b_y) = p(a_x) p(b_y). Note that "time slots" erased in a signal can be filled up with those from another one, which might be created through an AND operation in a similar manner. This then provides a means for implementing addition without the need for recoding. This naturally leads to consider the interpolation of two signals a and b, by a third signal x OR(AND (x,a), AND(~x,b)) This fills up all available time slots by multiplexing the signals x and y proportionally to E(x). A special case of this happens when b = ~a, leading to the XOR of two signals XOR(a,b) = OR((AND(x,a), ~AND(~x,~a))) This interpolates between a signal and its polarity reversal. Note that multiplication with factors >1 is not possible locally. The signal would need to be demodulated into a form where extra energy can be added, which would then result in this being spread out over time. However, it might be possible to construct some kind of buffered mechanism that merges two streams, ensuring conservation of energy. Decorrelation Signals coming from an SD modulator are very repetetive, which means that they can become correlated, breaking down multiplication. In order for this to work some form of decorrelation must be added. Every modulator could be equipped with a dither module, which would also improve the SNR. An analogy (Thu Oct 19 22:21:22 CEST 2006) Think of an SD-modulator as a bus stop where on each timestep (say 10 minutes) a bus either leaves or not. A bus only leaves if it's full, and the number of people that arrive in a single time step is never greater than the capacity of the bus. This way, the bus stop never needs to accommodate more than one bus worth of people. Once i got this picture in my head i couldn't get rid of it any more. It clearly illustrates the conservation properties of a bus stop, which implies the SDM preserves averages and low frequency signals. References I've read some papers by Philips people about the SACD.. It included some operations that can be performed on an SACD stream without decoding it. What I remember is the use of interleaving to mix two streams. For other operations they always used (de / re) modulation around an ordinary n-bits representation. I found this[2] in Chuck Moore's site, which is related but talks about analog domain VCOs. [1] http://en.wikipedia.org/wiki/Delta-sigma_modulation [2] http://colorforth.com/immunity.htm [3] http://en.wikipedia.org/wiki/Statistical_independence Entry: Symmetries of boolean functions Date: Tue Jun 9 14:55:19 CEST 2009 Inspired by the idea from the Sheep synth that only XOR and AND are interesting ways of combining oscillators, here is some investigation of the why of that. There are 2^4 boolean functions that map 2 inputs to one. There are only a couple of interesting ones after you remove 0/1 morphism. In the following, the squares indicate their Karnaugh maps[1]. | A | 0 1 ----+-------- B 0 | . . 1 | . . Simply looking at the Karnaugh maps already suggests a number of equivalence classes: Select class: 1/3 NAND OR => <= 1 1 0 1 1 0 1 1 1 0 1 1 1 1 0 1 AND NOR N=> N<= 0 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 Transform class: 2/2 diag XOR <=> 1 0 0 1 0 1 1 0 Drop class: 2/2 non-diag A B !A !B 0 1 0 0 1 0 1 1 0 1 1 1 1 0 0 0 Ignore class: 4/0 0 1 0 0 1 1 0 0 1 1 Each class is a set of boolean function which is closed under inversion of one of the 2 inputs, or the output of the function. The classes can be constructed from a single representative f. With i_o, i_a, and i_b inversion or identitiy, the functions g(a,b) = i_o(f(i_a(a), i_b(b))) make up the class. The maximum number of elements is 2^(n+1), which occurs when all combinations of inversion and identitiy yield a distinct function. Let's define an "interesting" function to be the ones that belong to large equivalence classes. In this sense only AND and its symmetries are interesting. Note that this is => (implication). Putting it as "only implication is interesting" makes it sound a bit more profound. Instead of looking at inversions, it's also possible to look at input permutations. For the 2->1 case this seems to be a closed operation in all the classes (mirror around the diagonal). In fact, in the 3->1 case we need to include permutation symmetry. So what about 3->1 functions. Is there a similar way to classify them as interesting/boring using the same classification (inversion + permutation). There are 2^(2^3) = 256 such boolean functions. They can be visualized as 2x2x2 Karnaugh cubes or folded open to one side as a flat map: | AB : | 00 10 : 11 10 ----+--------:------ C 0 | . . : . . 1 | . . : . . : fold For 3->1 functions one just needs to look at the morphisms of the 2x2x2 binary cube. With what we find from the 2->1 case, there are some things we can expect: 1. single select (3AND, 3OR, ...) 2. embeddings of the 2->1 functions 3. irreducible structures Here 3. is expected to be an analogue of non-separable convolution kernels: increasing dimensionality introduces new structures that can't be written simply in terms of lower-dimensional ones. Looking at the 2x2x2 binary cube and trying to fill it with a pattern that can be oriented in many different ways makes me think of an L-shape like this: x x . . x . . . It can be placed in one of 8 corners, with each taking 3 possible orientations, leading to 24 symmetries. However, this one is "flat". It is independent of one of the inputs. Rotating one plane of the the 2x2x2 cube 90 degrees wrt to another one (like you would do with a rubik's cube) gives this configuration: . x x . x . . . Which is no longer flat. It is like an L that maximally occupies space in the cube. This turns out to be the function (class) that's used in Wolfram's rule 110[2]. These two functions are in distinct classes (they difference is _essential_ as the former doesn't use one of the inputs, while the latter does). However, they have the same number of symmetries: 8 x 3 x 2 = 48 This follows from 8 different positions (the corners of the cube), 3 different orientations to fix one of the other 3 points at, and the inversion of the pattern. This might not be a coincidence. Are they maybe lines and planes of a projective plane, thereby being dual? For 3-input, binary, 1D CAs, Wolfram identifies 88[3] fundamentally different rules. Why are there so many? For CAs the order and polarity matters a great deal. Due to recursive application, there is little more than left-right symmetry (256->128) + some symmetry in degenerate cases. Anyways. How to look for such "interesting" functions automatically? Let's look at the interpretation of functions in this class. After all, this is simple logic, so there is natural-language intuition to connect to. _ __ _ _ AB AB AB AB C . x x . C x . . . (A & B & C) | ((~B & ~C)) | ~(B | C) Hmm not much there.. What practical predictions to make? Not more than hunches really. I suspect that a representative of this function will lead to an interesting modulation scheme for a 3-oscillator synth. This can probably be generalized to higer dimensional functions. Does giving up permutation give special status to the individual functions? This might also be usable in sound synthesis, where one of the modulators has a different function than the other sounds. [1] http://en.wikipedia.org/wiki/Karnaugh_map [2] http://mathworld.wolfram.com/Rule110.html [3] http://www.wolframscience.com/nksonline/page-57-text?firstview=1 Entry: Music and almost integers Date: Tue Jun 9 20:38:33 CEST 2009 An almost integer[1] is a number that is very close to an integer, but isn't an integer. These can be found in great number by applying trigoniometric functions to integers, ending up close to integers. (cos 22) -> -0.9999608263946371 != -1 This particular one is related to the approximation of PI as a ratio 22/7. Similar things can happen with exponentials of ratios ending up close to other simpler ratios. This phenomenon is very much related to tonal music, which is built on rational approximations to irrational numbers. Simultaneously sounding pure tones produce harmony, wich can be described as regular beat patterns. This happens when frequency ratios are ratios of small integers. This is then tied into melody by employing the ratios that make sense for simultaneously sounding tones to space out frequencies of tones intended to be played sequentially, creating scales. Melody is evolution of tone frequency over time, where repetition and proximity play an important role. This makes melody and harmony combine nicely yielding tonal music. However, this picture only works approximately. Somehow the human brain likes the kind of non-exactness. Some intervals just aren't that pure as others, and this can be used as an expressive device. I.e. a perfect fifth[2] has a ratio of 3/2. In an equal tempered scale this is approximated by 2^(7/12). Applying this ratio 12 times produces the circle of fifths[3] and ends up exactly 7 octaves higher. However, in just intonation this isn't quite so: (define (pow n x) (if (zero? n) 1 (* x (pow (sub1 n) x)))) (pow 3/2 12) -> 531441/4096 = 129.746337890625 != 2^7 = 128 This is solved by making some of the fifths that construct the scale less exact, favouring the fourth[4] (4/3) major third[5] (5/3) and minor third[6] (6/5) to build harmonicly sounding chords. [1] http://mathworld.wolfram.com/AlmostInteger.html [2] http://en.wikipedia.org/wiki/Perfect_fifth [3] http://en.wikipedia.org/wiki/Circle_of_fifths [4] http://en.wikipedia.org/wiki/Perfect_fourth [5] http://en.wikipedia.org/wiki/Major_third [6] http://en.wikipedia.org/wiki/Minor_third Entry: Filters in SD space. Date: Wed Jun 10 12:46:49 CEST 2009 In [1] it was shown that Sigma-Delta modulated signals can be attenuated and mixed using local operations as long as the energy never increases, and as long as signals remain independent. This post elaborates on the subject bringing back the concept of time. Filters To implement filters I see two possibilities: continuous time filters (integrators) and discrete time filters (based on time delay). Time-delay filters are possible with the current representation as long as independence of time-shifted signals is maintained. However, this might not be so straightforward to accomplish. Following my hunch I think it's better to go for the idea of directly modeling integration as an operation on the SD bitstream. Is it possible to re-introduce the concept of time? Note that uptil now we talked only about expected values of random variables, and ignored time completely, using some hand-waving that "short time averaging" turns the sequence back into an analog signal. With a concept of time it is probably possible to introduce the concept of integration, or at least that of a bandlimted integration (1-pole low pass filtering == an opamp), operating directly on an SD bitstream. Integration This operation is about re-introducing time in the binary sequence. Integration can no longer be expressed as the expected value of a stream, as that is a single number. Some clever way of turning the bit sequence back into a sequence of numbers (or a continuous function!) should lead to a way to express the integration operation. It seems to me that leaving the discrete domain as soon as possible is the way to approach the problem. The signal s(t) is hidden in the bitstream as follows: a(t) = \sum a_i p(t-i) = s(t) + n(t) Where p(t) is a square pulse with hight one, duration 1. What is the effect of integrating s(t) on a_i? It might be better to approach this problem from the other side. What can we do operating on the bitstream a_i to produce something that behaves like integration in the low band? Is there a way to do this without recoding? The key problem is that in order to be able to generate a bitstream, a local representation of the accumulated state is necessary, to be able to use this number to space out pulses. At first sight it does look like there is no way around a demodulation/process/modulation scheme. Additional Questions * It looks like noise is going to be the essential element to translate between the discrete (bitstream) and the continuous (statistics of the bitstream). * Is a square wave, edge triggered representation actually useful? Instead of sending out the carry bit as a binary signal, sending the carry bit as a state change generates a square wave frequency-encoded signal. This signal represents a full-scale square wave in the SD domain. * Can we represent complex multiplication as a means to introduce sine and cosine? The answer seems to be yes as long as magnitude is <1. Qantify this. * Since decorrelation is so important, can't we just use (uncompressable) information transmitted alongside the signal as an extra digital channel? * Since we're digital and thus error-less, can we maybe construct differential equations based on differentiation instead of integration? Does this make sense at all? * Maybe the only question to ask is: Is it possible to implement differential equations using a S-D integrator and negative feedback? Is such a thing stable? Both in the digital domain (non-linear limit cycles) and the statistical domain. [1] entry://20090609-111303 Entry: SD and edge-triggered representation Date: Wed Jun 10 15:35:37 CEST 2009 What is nice about the SD form is the ease at which it can be integrated into mixed analog/digital electronics. In fact, it can be attached directly to a set of speakers, or any analog circuit with low-pass characteristic. However, I wonder if moving to an edge-only representation will make things simpler. This will give a pure FM representation of a signal: value directly proportional to square wave frequency. It eliminates an element from the data representation: the width of a pulse. This is essentially an arbitrary component and can be easily reconstructed later. The important question is: does it make things really simpler? Let's re-interpret the elementary operations explored in [1] in terms of square wave signal. FM signals are trivially generated from an accumulator as the accumulator's MSB. Converting an FM signal to analog can be done in many ways [2], but the simplest ways seem to convert the edge signal back to a pulse signal either unclocked using a one-shot, or clocked using a bit delay. Since a signal's value is directly proportional to the density of transitions, addition is represented by the XOR operation. The advantage here is that addition is "dumb". There is no gating required as in the pulsed case. This seems simpler, but just as the pulse representation, this might lead to loss of information due to aliasing. However, the form of aliasing is different and consists of glitches. Proper "de-glitching" needs some clocked logic, and can only be done exactly if there is room in the output signal (if signals don't change during de-glitching pulse width). This brings us back to the complexity of dealing with bit slots as in the SD case. Attenuation could be implemented statistically as "selectively forgetting" to toggle the switch. There isn't much difference here. So, it looks like most operations can again be implemented using simple logic gates, conditionally on the uncorrelated nature of the input signal. I'm not really convinced about this since it seems to be harder to qualify: the extra work that needs to be done in a clocked/pulsed representation seems to bring at least some order that is harder to reconstruct in the FM case. The FM case however is probably simpler to interface to special puropose (non-clocked) analog circuits. Conclusion: FM representation is in some respects simpler, but harder to control using statistical time-multiplexing tricks as described in [1]. Since conversion between the two formats is rather straightforward in clocked logic, it seems that sticking with SD is best. Then FM can be used when it is convenient, i.e. when interfacing to an unclocked or asynchronously clocked circuit. [1] entry://20090609-111303 [2] http://en.wikipedia.org/wiki/Demodulation#FM_radio Entry: Using finite fields for music Date: Wed Jun 10 18:57:46 CEST 2009 One of those ideas that disappeared into the background.. Let's try again. A finite field[1] or Galois field is a field[2] with p^n elements, where p is a prime number. There is only one GF(p^n) upto isomorphism. The field p^n is the field of polynomials over p, modulo a reducable polynomial of order n. For music applications, it is probably possible to look at Galois fields in the same way as one would use the complex number field to represent sum-of-sines. Multiplicative cycles can be used as different rhytmic/harmonic components, and each element of the field could be associated to an instrument, or a combination of. For music we need the numbers 2, 3, 5 and 7 in copious quantities. Larger prime numbers are less interesting from a harmony/rhythm point of view. Given that we're looking for a certain combination of small prime numbers in the factorization of the multiplicative group, we can start typing things into a calculator as see if we end up at prime powers. 2^a 3^b 5^c 7^d = p^n - 1 A more systematic way is probably in order, i.e. generating[5] all power of primes below a certain number and investigating the factors of the multiplicative group order. However, to illustrate the idea let's just pick a few starting from the factorization. The first candidate is GF(211) which fits nicely in 8 bits. 2^2 . 3 . 5 . 7 = 211 - 1 Another one that fits in 8 bits is GF(241). 2^4 . 3 . 5 = 241 - 1 The question is: how to make this interesting? We can generate periodic sequences of numbers using formulas like: x_n = \sum a_i g_i^n where g_i are generators of cyclic subgroups, and a_i are arbitrary field elements. The first question is: given a particular representation of a GF, how many such sequences exist? Suppose we associate a certain configuration of drum sounds to each element in the field. How many patterns can be generated? Are they at all interesting? What about maps from the elements to a smaller number of drums? Maybe combinations of drums can be associated to terms in a polynomial. Is there an interesting way to combine a non-prime field with cycles this way? Maybe the interesting stuff lies in the symmetries of a field[7]? I.e. start with a simple rhythm. Find a (small) field to embed it in, then find variations of this by looking at the field's automorphisms. How can a GF be visualized? The structure of the additive group is quite straightforward: for GF(p^n) it is n times p cycles. How can the multiplicative group be shown on top of this? There's a visualization of GF(3^2) here[3]. Visualizing the the group as actions on a set seems like a good idea. Looks like a nice application for graph visualization[4]. Intuitively it seems that groups with small p and n in the same order of magnitude, will contain a more interesting structure. Instead of a single flat space (p^1) the structure is a vector space where each dimension has a separate entitiy. It allows the representation of instruments/notes (q) that can sound together in chords(n). I.e. 5 tones, 3 note-chords gives 5^3=15 with a multiplicative group of 14 = 2 . 7 elements. More concisely: order factorization add mul ---------------------------- 15 5^3 2 . 7 21 7^3 2 . 2 . 5 As a side note, maybe Galois Theory[7] is interesting to study in its own right, as it is connected to polynomials which are next to matrices the bread and butter of numerical math. [1] http://en.wikipedia.org/wiki/Finite_field [2] http://en.wikipedia.org/wiki/Field_(mathematics) [3] http://finitegeometry.org/sc/9/3x3.html [4] http://en.wikipedia.org/wiki/Graph_drawing [5] http://www.research.att.com/~njas/sequences/A000961 [6] http://en.wikipedia.org/wiki/Cyclic_group [7] http://en.wikipedia.org/wiki/Galois_theory Entry: Binary Sound Synthesis Date: Wed Jun 17 19:22:43 CEST 2009 Type: tex This is an account of a journey into ``Binary Sound Synthesis'' (BSS), which started early 2005 as a test application\footnote{The original synthesizer ran early 2005 on a PIC12F628, an 8 bit microcontroller with 6 i/o pins which I had configured to run at 1 MIPS. The current one runs since mid 2006 on a PIC18F1220, an 8 bit micro with 18 pins and a slightly more powerful ISA. The original synth is a synchronous one running at $8$kHz. At each sampling point the output state of $3$ oscillators is determined. The control updates run at $200$Hz, while the note updates run at $8$Hz, which is $1/4$ note at $120$bpm. The current one uses 3 asynchronous hardware timers, thus a higher maximal rate of $2$MHz. In a nutshell, the synth contains 2 main algorithms: cross modulation (XOR mixer), a formant synth (AND/OR mixer), LFSR pseudo-random sequence generator. It uses 3 oscillators that can be synchronized. Most of the interesting parts are in the controller that reconfigures the oscillators.} for a Forth compiler for the 8--bit Microchip PIC architectures. With BSS is meant the process of generating audible sound from digital square--wave signals using an absolute minimum of logic or code. This is old stuff. The digital approach dates from the era of early 8--bit game machines, and as such a lot of the algorithms are no longer used. Currently there are few reasons to not use floating or fixed point math with PCM outputs. The point of this paper is partly to archive and document old techniques, and to shine an idiosyncratic light on the matter. I find this a facinating subject. Probably because it is so different from the standard real and complex number based signal processing. In BSS the main focus is on the time component, since it is the timing between switching a speaker on and off which becomes the only means of controlling the sound. \section{Cycles} The simplest sounds we can produce are based on periodic square waveforms. An oscillator can be can be implemented using a frequency divider or pulse counter, which generates one output event for each $k$ input events. Multiple such oscillators can then be connected to a hardware timer interrupt. Mathematically, this can be related to addition in the ring of integers modulo $k$. We'll denote this algebraic structure as $\ZZ_k$, the cyclic group of order $k$. This section talks about such groups, as they will appear as multiplicative groups of Galois Fields. \subsection{Counters and Cyclic Groups} Cyclic groups are groups, necessarily abelian, that can be generated by a single element. In our canonical representation, this element will be $1$ in $\ZZ_k$. If $k+1$ is a power of a prime, it is the multiplicative group of the Galois Field $\GF(k+1)$, which we will encounter later. All abelian groups are composed of direct products of prime order cyclic groups, but not all abelian groups are cyclic. The prototypical example being the Klein $4$--group $\ZZ_2 \times \ZZ_2$. The exact relation between a group and a counter is as follows. We use additive notiation because all the groups are abelian. To each counter we associate a group $\G$, a generator $g\in\G$, and a state $s\in\G$. On each input event, the state $s$ will be replaced with $s+g$. Whenever the state reaches $s=0$, the unit element of the group, an output event is generated. The division factor is the order of $g$, which is the smallest integer $o$ for which $\underbrace{g+g+\ldots+g}_{o \text{ times}} = 0$. For example in $\ZZ_{6}$ where $\{0,6,12,\ldots\}$ denote the same element, the generator $1$ has order $6$, the generator $2$ has order $3$ and the generator $3$ has order $2$. \subsection{Composite Numbers} Groups with prime order are not so interesting to us, they generate only one kind of cycle. What interests us most is the combination of timers. Let's have a look at the example of highly degenerate groups, which have a number of elements equal to a composite number like the number of seconds in a week $604800 = 2^7 3^3 5^2 7$. Note that from the size of an arbitrary group we can't necessarily deduce its structure. However, requiring that the group is cyclic is enough, since there's only one of a given size. This makes cyclic groups look a lot like positive integers. A cyclic group can be constructed explicitly as a product of cyclic groups of prime power order, ensuring the component groups have orders that are mutually coprime. This is again analogous to how positive integers behave. A point of difference, however, is that groups of prime power order are not a product of smaller cycles, but they do contain all smaller prime power cyclic groups as subgroups. The number above gives the group $$\ZZ_{2^7} \times \ZZ_{3^3} \times \ZZ_{5^2} \times \ZZ_{7},$$ which consists of $4$--tuples with elements from the respective groups. This group is essentially the same as $\ZZ_{2^7 3^3 5^2 7}$. I'll denote the group order as $$\#\G=\prod_{n=1}^{N} p_n ^ {m_n},$$ and $\G$ as the abstract group, where $\ZZ_{\#\G}$ and $\times_{n=1}^{N} \ZZ_{p_n^{m_n}}$ are two isomorphic representations. In our representation of $\G$, the cyclic subgroup of order $q_n$ in $\G$ can be generated from the element $$g_n = {\#\G \over p_n^{m_n}},$$ giving a homomorphism from $\G$ to $\ZZ_{p_n^{m_n}}$. Combining $N$ such homomorphisms allows the construction of an isomorphism $$f : \times_n Z_{p_n^{m_n}} \to Z_{\#G} : (x_1, \ldots, x_N) \mapsto \sum_n x_n g_n.$$ This leads us to the more concrete observation that an oscillator based on $\G$ can be implemented either as a single counter $\ZZ_{\#G}$, using a single generator, or a ``wired or'' of $N$ counters $\ZZ_{p_n^{m_n}}$, each with its own generator, where the whole acts as a single frequency divider which only generates an event if all counters have $s_n=0$. \subsection{Musical Scale} This makes one wonder if it's not possible to construct a group that is very degenerate, in a way that it produces a relatively well spread out ``frequency spectrum'' which maps well to the (logarithmic) frequency resolution of human hearing. Keeping the number of large prime factors small, it might even be possible to create some kind of musical scale with inherent ``smoothness''. Probably, if there are enough small primes, large prime numbers are not really necessary since frequency resolution for large numbers is less of a problem. For example $d = 604800 = 2^7 3^3 5^2 7$, which is the number of seconds in a week. Moreover, $d+1$ is prime, so it has an associated Galois field, but this is no longer a polynomial field, In order to investigate the usefulness of this, we could plot out the possible frequencies that can be generated using this scheme on a logarithmic scale. On the other hand, using fields might be overkill. Using just cyclic groups (counters) might be better here. Given a a degenerate cyclic group $\G$, how many different (cyclic) subgroups does it have? All subgroups of cyclic groups are cyclic, so there is a one to one correspondence of the order of all possible combinations of subgroups of $\G$ and the combinations of prime factors of $\#\G$. Which leads us to the number of possible groups $$\prod_N (m_n+1),$$ since the subgroups of $\ZZ_{p^m}$ are $\{\ZZ_{1}, \ZZ_{p}, \ldots, \ZZ_{p^m}\}$, which totals $m+1$. The subgroups, or numbers representing the orders, can be arranged in a $N$ dimensional cuboid, addressed with coordinates $(x_1,\ldots,x_N)$ where each coordinate is limited to $0 \leq x_n \leq m_n$. The order can be computed as $$\#\G(X) = \#\G(x_1, \ldots, x_N) = \prod^N p_n^{x_n},$$ which corresponds to the the subgroups $\G(X) = \times^N \ZZ_{p_n^{x_n}}$ of $\G$ The things we are interested in is the distribution of $\G(X)^{-1}$, since it represents the number of frequences we can generate by dividing a master clock, which would be the CPU clock for example. All frequencies that can be generated in this way have a fairly simple harmonic relationship, making it possible to generate smooth scales and chords. \subsection{Playing with Subgroups of $\ZZ_{p^m}$} Given a divider $\ZZ_{2^m}$, obtaining one for $\ZZ_{2^{m'}}$ with $m' exponential % what about matrices over GF(2)? what's to find there? % most theorems in linear algebra / projective geometry % exclude GF(2) as a base field, because it behaves so differently. %% \subsection{Matrix Representation} %% Over $\GF(2)$ polynomials can be represented by (infinte) matrices, %% while polynomial fields can be represented by (finite) matrices. \subsection{Application : Linear Noise Generators} The cheapest way to generate noise for the application at hand is to use exactly the impulse response of linear systems described by difference equations as explained above. If the period of the bit sequence is large enough, the human auditory pattern detection will not be able to recover the redundancy. Following the same algebraic trick as exposed above, suppose the difference equation is expressed by $p(x)y(x) = u(x)$, and $u(x) = 1$, corresponding to the impulse sequence $(1000\ldots)$, what we need to do is to find out which period is related to $p(x)$. In other words, we have to find the $s_p(x)$ with the lowest possible degree such that $${1 \over p(x)} = {s_p(x) \over x^P + 1} = s_p(x) \sum_n x^{Pn} ,$$ where $P$ is the period of the bit sequence corresponding to $p(x)^{-1}$. This is equivalent to the congruence relation $$x^P = 1 \mod p(x).$$ To get the smallest $P$ satisfying this equation as large as possible, it is necessary to choose $p(x)$ to be a \emph{primitive} polynomial, which means it is irreducible over $\GF(2)[x]$, and $x^n\mod p(x)$ generates all nonzero elements in the field modulo $p(x)$, also denoted as $\GF(2^n)$ with $n$ the degree of $p$. %% In the quotient field $\GF(2)[x] / (p(x))$, the %% element $x + (p(x))$ generates the entire multiplicative group. Here %% $(p(x))$ denotes the ideal in $\GF(2)[x]$ generated by $p(x)$. The %% multiplication is commutative, so it is always possible to construct %% such a generator. Whether this generator is mapped to $x$ depends on %% the polynomial $p(x)$. The direct implementation of the difference equation is called a \emph{Linear feedback shift register} or LFSR. The next output is computed as the sum of a certain number of taps. I originally used a $16$bit LFSR with taps $(15,14,12,3)$, but the parallel variant is simpler to implement given a parallel XOR operation is available. This parallel variant is related more directly to the Galois field $\GF(2^n)$. More specificly, the output of the generator is taken to be any coefficient of the polynomial sequence generated by $x$, or $\sum_k s_{k,n}x^k = s(x)_n = x^n \mod p(x)$, where the output sequence could be $s_{0,n}$. To illustrate the difference between an irreducible and a primitive polynomial, take the irreducible polynomials $x^8+x^4+x^3+x^2+1$ and $x^8+x^4+x^3+x+1$. Both are irreducible, but only the former produces a primitive field. In the field of polynomials modulo the latter one, the $x+1$ is a generator, but not $x$. Note that these fields are isomorphic, or essentially equal, but their representations are different. Implementing the parallel scheme for the former polynomial on a digital computer is straightforward. Multiplication of the state polynomial $s(x)$ by $x$ corresponds to a left shift of the register containing the coefficient bits. The bit that's shifted off (the coefficient of $x^8$) is replaced by adding the other terms $x^4+x^3+x^2+1$ to the current state. In other words, when a bit carries over, \verb+1Dh+ $=$ \verb+00011101b+ is XORed with the state. %% \subsection{Using a multiplier} %% When a multiplier is present, a lot more intersting things can be %% done. One of them is the use of the linear congruential method for %% random number generation. %% \section{Resurrection} %% Digging on the web. \section{Other Discrete Structures} %\subsection{Sampling and Formants} % \subsection{Diadic Numbers} \subsection{Phase and Frequency Modulation} What happens when we modulate the generators of an additive subgroup, or period of counters? Let's call the first \emph{phase} modulation and the second \emph{frequency} modulation. Using an oscillator with period $M$ and generator $1$ to generate the generator of a second oscillator $F$ gives the following function $$F_n = \sum_{i=0}^n i = {n(n-1) \over 2}\mod F.$$ \subsection{Cellular Automata} Another interesting way to generate signals is by using cellular automata. Using the ideas above, we could represent a finite row of nonzero cells as a polynomial, and the update function as another polynomial. Giving rise to an update function in the ring $\GF(2)[x]$ or any finite field $\GF(2)[x] / (p(x))$. \subsection{The NNT and the Fields $\GF(2^{2^n}+1)$} For $0 \leq k \leq 4$ the Fermat numbers $F_k = 2^{2^k}+1$ are prime, which means there exists a primitive field $\F_k = \GF(F_k)$. This field is isomorphic to $\mathbb{Z} / F_k \mathbb{Z}$ the integers modulo $F_k$. Implementing this representation requires the implementation of a modulo operation. This can be quite expensive, especially on cheap hardware lacking a fast hardware divider. However, it is possible to embed a representation of $\F_k$ in the ring $\R_{k+1}$, where we take $\R_k$ to be the ring represented by $\mathbb{Z} M_{k} / \mathbb{Z}$, the integers modulo $M_{k} = 2^{2^k}-1$, where the $\{M_k : k \in \mathbb{N}\}$ form a subset of the set of Mersenne numbers $\{2^n-1 : n \in \mathbb{N}\}$. The implementation of this ring relies on the reduction modulo $M_k$, which is a cheap operation if $K=2^k$ is (a multiple of) the machine wordsize. Suppose we want to reduce a number $[c:a]$ of $2K$ bits modulo $M_k$, represented by the concatenation of two $K$ bit words $c$ and $a$. This gives \begin{align*} [c:a] &= 2^Kc+a \\ &= (2^K-1)c+c+a \\ &= c+a \end{align*} with operations modulo $M_k$. The consequence of this is that both addition and multiplication in $\R_{k}$ can be performed by the default finite wordlenght unsigned integer operations, followed by this very simple reduction step. This reduction is just addition due to the availability of $c$ as a separately addressable entity, namely carry bit for unsigned addition and high word for unsigned multiplication. So how to embed $\F_k$ in $\R_{k+1}$? The map $$f_k : \R_{k+1} \to (2^{2^k} - 1)\R_{k+1} : x \mapsto x 2^{2^k - 1} (2^{2^k}-1) = x 1_k$$ will do just that. Here $$(2^{2^k} - 1)\R_{k+1} = \{(2^{2^k} - 1)m : m \in \R_{k+1}\},$$ which we will call $I_k$ is the ideal generated by $2^{2^k} - 1$ in the ring $\R_{k+1}$. The element $$1_k = 2^{2^k - 1}(2^{2^k}-1)$$ is the representative in $\R_{k+1}$ of the multiplicative unit of $\F_k$. It can be used to generate the additive group, which gives us a clear relation between integers and the representation in $\R_{k+1}$ of their associated elements in $\F_k$. So how can we see this is a representation of $\F_k$? It is easy to see that $f_k$ is a group homomorphism for the additive group of $\R_{k+1}$ since $f_k(a+b) = (a+b)1_k = a1_k + b1_k = f_k(a) + f_k(b)$. To see that $f_k$ is a ring homomorphism, rests us to show that $f_k$ is a group homomorphism for the multiplicative group. We need $f_k(a) f_k(b) = a b (1_k)^2 = a b 1_k = f_k(ab)$ which is valid if $1_k^2 = 1_k$. That this is so can easily be verified. %% This is so since %% \begin{align*} %% 1_k^2 %% &= (2^{2^k - 1} (2^{2^k}-1))^2 \\ %% &= (2^{2^{k+1} - 2}) (2^{2^{k+1}} - 2^{2^k + 1} + 1) \\ %% &= (2^{2^{k+1} - 2}) (- 2^{2^k + 1} + 2) \\ %% &= 2^{2^k} (2^{2^{k} - 2}) 2 (-2^{2^k} + 1) \\ %% &= (2^{2^{k} - 1}) (-2^{2^{k+1}} + 2^{2^k}) \\ %% &= (2^{2^{k} - 1}) (-1 + 2^{2^k}) \\ %% &= 1_k %% \end{align*} %% with all operations in $\R_{k+1}$. Note there are no divisions here, %% only multiplication and modulo reduction using $2^{2^{k+1}} = 1$. The kernel $\{x : f_k(x)=0\}$ of this ring homomorphism is $\{xF_k : x\in\R_{k+1}\}$, which is a maximal ideal of $\R_{k+1}$, which means that $\R_{k+1} / \{xF_k\}$ is a field. This structure is carried into the ideal $M_k\R_{k+1}$ by $f$. To find this representation by construction see \verb+mersenne.tex+, which uses arguments to identify $1_k$ in the additive subgroup generated by $M_k$, by requiring the property $1_k^j = 1_k$, for $0 \leq j \leq F_k$, \subsection{Shape of $1_k$} Returning to the embeddings of $\F_{k}$ in $\R_{k+1}$, the interesting thing what $1_k$ exactly looks like. \begin{align*} 1_0 &= \0\1 \\ 1_1 &= \0\1\1\0 \\ 1_2 &= \0\1\1\1.\1\0\0\0 \\ 1_3 &= \0\1\1\1.\1\1\1\1.\1\0\0\0.\0\0\0\0 \\ 1_4 &= \0\1\1\1.\1\1\1\1.\1\1\1\1.\1\1\1\1.\1\0\0\0.\0\0\0\0.\0\0\0\0.\0\0\0\0 \end{align*} Here the dots just separates nibbles. This looks like a nice square wave! Another interesting thing to see is that in $\R_{k+1}$, there is only one silence, which is all zero, since it is the same element as all one. What do the other numbers in the ideal $M_k \R_{k+1}$ look like? The redundancy is $M_k : 1$, which gets large quite fast. Are the waveforms special in some way so we can use them as sounds? \subsection{Resonant Filter} Using two state output, there is really no straightforward way to add two signals, performing additive synthesis. Composing a signal consisting of different frequencies by adding them together modulo 2 behaves more like cross modulation than addition. The audible spectrum will contain not only the sum, but also the modulation components. However, it is possible to approximate $2$ tone polyphony by synthesizing waveforms that have two clearly distinct harmonic components. The approach is based on the following observation. The sequence $$\1\0\1\0\1\0\0\0\0\0\0\0\1\0\1\0\1\0\0\0\0\0\0\0 \ldots$$ resembles the response of a resonant filter with period $2$ and time constant $6$, to an impulse signal with period $12$, while the sequence $$\1\0\1\0\1\0\1\1\1\1\1\1\0\1\0\1\0\1\0\0\0\0\0\0 \ldots$$ resembles a response of the same filter to a square wave with period $24$. It is even so that this particular scenario can be expressed in terms of generating functions, when the resonant filter is represented by a polynomial (finite), stretching the analogy even further. This system has $3$ degrees of freedom: the periodicity, the resonance frequency and the time constant. This can be implemented as the product (AND) of a pulse width modulated low frequency signal oscillator, and a high frequency oscillator which has its phase reset for each $0 \to 1$ transition of the low frequency signal. It is probably possible to combine two of those to get two formants which is enough to encode very basic vowel spectra. This requires 3 timers, which fits perfectly in the available hardware. % distorted walky talky ?? \subsection{Chaotic Oscillator} An emulation of the qualitative behaviour of a chaotic oscillator can be obtained by driving the Resonant Filter system described above with a base period that is irregular. \subsection{Pulse Width Modulation} Now, this is sort of cheating, since the purpose of PWM is really efficient switched digital to analog conversion. Using it, we give up the ``true'' binary output, and add a layer that will allow us to vary the voltage (on average) between more than 2 levels. The \verb+18f1220+ chips contain a hardware PWM module with $8$ bit resolution ($10$ bit for the enhanced version). A fixed period oscillator is used as the timing source to drive a pin high and low with a programmable duty cycle. I do wonder wether it's possible or desirable to do some computations in $\GF(257)$. I don't see much however, other than convolution using NTT. \def\sd{\Sigma\Delta} \subsection{$\sd$ Modulation} Note that if switching efficiency isn't all important, $\sd$ modulation gives much better performance for 1--bit audio digital to analog conversion. It generates higher frequency waveforms, but better low frequency resolution. It can also be used as a cheap way to do synthesis, since it behaves approximately as a reverse frequency to voltage converter: output pulse frequency is on average proportional to the input value. With multibit digital input, it is very easy to implement: the binary $\sd$ signal is the carry bit of a simple input accumulator. When the accumulator overflows, one ``packet'' of energy is transferred to the output. \subsection{Gray Code} Gray codes are (cyclic) sequences of $N$ binary digits, where each number in the sequence differs only by one bit from the previous. They can be understood as \emph{Hamiltonian paths} on a $N$--dimensional hypercube, where one moves along edges of the cube. However, the most commonly used gray code is one with a particular kind of symmetry. It is called the \emph{binary reflected} grey code or BRGC, which is a recursive scheme that traverses both $N-1$ dimensional halves of the hypercube in the opposite direction. This rule is applied recursively and terminates at the trivial $1$ dimensional case. If $\bar{s}$ is the reversal of the sequence $s$, $s : t$ is the concatenation of sequences $s$ and $t$, and $b(s)$ prepends each bit vector in the sequence $s$ with bit $b$, the BRGC sequence recursion relation is $$s_{n+1} = 0(s_n) : 1(\bar{s}_n),$$ where $s_0$ is the length $1$ sequence containing the empty vector. % the wikipedia page about this is unclear, since a 4D, 16 cycle coil is a BRGC.. maybe it's the ``constrained'' part? %% Gray codes are related to the \emph{snake in a box} problem, ore more %% specifically, the \emph{coil in a box} problem. % interesting: single track codes \section{Control and Note Sequencing} This is a whole other ballpark. With \emph{control} is meant the evolution of the configuration of the synth engine at a slower rate. This is called \emph{haptic} rate and is related to our ability to experience mechanic vibrations. The next rate down is \emph{note} rate, and roughly corresponds to the frequency of events that can be experienced visually. This cascade of time scales is called the \emph{frequency hierarchy}. The lower frequency parts are much less limited by the speed and simplicity of the hardware, since more time can be used to do actual computations. This makes it rather pointless to try to exhaust the possibilities. However, here are some general ideas. \subsection{Transient Stack} In a binary synth there is no direct way to mix sounds. One way of bringing discrete events to the forground is by temporarily interrupting stable background sounds. The typical example would be to synthesize a sequence with \emph{bass}, \emph{bassdrum}, \emph{snare} and \emph{hihat}, which are ordered here left to right with time extent decreasing. Mixing the 4 will boil down to playing the shortest sound until it is done. \subsection{Wavetable} Since there's a wavetable player, it is possible to do some control rate timbre synthesis by generating or updating wavetables. There are a lot of possibilities here. Maybe the most interesting parts are transform domain methods like the \emph{Walsh--Hadamard} transform. \subsection{Chirps} A simple form of frequency modulation using a linear ramp or a sawtooth signal is easily implemented. This can be used to generate drum-like sounds. \section{Design and Implementation} \subsection{Frequency Hierarchy} Human perception already seems to have a built in time hierarchy roughly ordered from high to low as \emph{hearing}, \emph{feeling}, \emph{seeing}, \emph{moving}, \emph{remembering}. This is just a vague description, since a lot of these time scales overlap. However, this idea of time hierarchy can easily be reflected in the way we keep time in the synth, using the concepts of \emph{sample}, \emph{control} and \emph{note} frequency. This is fairly standard practice. We're going a step further in that each of these levels has an associated language, which metaprograms the previous one. This is possible since efficiency can be traded for expressive power as we move up the time scale ladder. \begin{itemize} \item[N)] Note state updates switch control engines. \item[C)] Control state updates switch synth engines, synth parameters and signal routing. \item[S)] Synth updates produce waveform samples. \end{itemize} It's probably easiest to have the low levels pre--empt the higher ones, so computations can be spread out without explicit cooperative multitasking. There's no reason to not have cooperative multitasking on the high abstraction layer though. %% The first layer's language is plain data: all synth engines are fixed, %% only the current routine and engine, and it's parameters can be %% changes. The second layer's language is threaded forth. Current %% control word is any forth word, but actual code is still in flash %% memory. The third layer's language is a music language. \subsection{Synth Architectures} \begin{itemize} \item An asynchronous design using the $3$ high resolution timers to generate events, and the low resolution timer to drive the noise generator and the other time bases. \item A synchronous design with a software synth engine running at a fixed sample rate, from which all other time scales can be derived. \item A similar synchronous design, but using $10$ bit hardware PWM as output. \end{itemize} \subsection{Synth Patch} The idea is to make the synth as configurable as possible, but at the same time keeping the configurability contained in a small number of bits so the transient stack can be implemented efficiently. The way to do this is to eliminate all redundancy (symmetry) in the configuration data. Also the decoding of this information should be very straightforward, since it has to be computed at each synchronous or asynchronous engine event. I'll call the total configuration state, not including the engine state, as the \emph{synth patch}. The synth engine consists of 4 \emph{generators}. One noise generator \verb+[noise]+ and 3 square wave oscillators \verb+[osc0]+, \verb+[osc1]+ and \verb+[osc2]+. The oscillators and the noise generator each have an associated update period, which is part of the patch data. There state is not. These oscillators can be synced to each other as single master sync $0 \to 1,2$ or cascaded sync $0 \to 1 \to 2$. This is encoded as one bit for \verb+[osc1]+ and two for \verb+[osc2]+, leaving room for one external sync event. I'd like to parametrize the output combination of the 3 square wave oscillators and the noise generator using a limited set of boolean functions. For $4$ inputs, there are $2^{2^4}=2^{16}$ such functions. To see this, draw the Karnaugh map of a $4$ input function, and observe it has $16$ squares in total. Some things that could be divided out are: the polarity of the output, the polarity of the noise signal and the polarity of each of the input square waves. Working it from the other side, the necessary combinations to get the absolute minumum functionality are \begin{itemize} \item \verb+[xmod]+ Add (XOR) the output of 3 square wave outputs. This can be used to generate PWM. \item \verb+[reso]+ Multiply (AND) \verb+[osc1]+ and \verb+[osc2]+, and add (AND) with \verb+osc0+. \item \verb+[gate]+ Multiply (AND) the output of 3 square wave outputs. \end{itemize} % is da wel waar? \subsection{Synth Algorithms} Currently, there's only $3$. Square waves, pseudorandom noise and sample playback. Together with intermodulation schemes described above, this gives already quite a rich palette. \subsection{Misc Hacks} So evaluate $AND$ of a number of bits in a byte, mask out all the bits that are not necessary, and use the $ADD$ operation. This can also be used to evaluate expressions of the form $f = a + bc$, which occur in the \verb+[reso]+ mixer. Here I combine 3 waveforms as $f = a + bc$. Using ordinary adder logic this gives $(a,b,c) + (0,0,1) = (f,x,x)$. Adding more bits allows to propagate the result into the carry flag. Rotating oscillator period's bit rep. \section{Electronics} I think a lot can be done by combining analog and digital electronics. One of the prototypical examples is of course the analog time varying reso filter. Some other interesting things could be done by using comparator based chaotic oscillators (switched unstable systems) and time constant capturing. Entry: LFSRs Date: Thu Jun 18 11:27:53 CEST 2009 Type: tex Looking at how linear feedback shift registers (LFSRs) are implemented, one can notice that the tap coefficients feeding into the XOR have to be from an irreducible polynomial in the polynomial ring $\GF(2)[x]$. Why is this? Suppose we have polynomials modulo $x^2+x+1$. These polynomials form the Galois field $\GF(2^2)$. An LFSR shifts, which means it takes a polynomial $p(x)=p_0+p_1x$ representing the current $2$--bit state vector, and multiplies it by $x$. After shifting, the highest order term $p_1x^2$ needs to be folded back into the representation by eliminating $x^2$ using the equation $x^2 + x + 1= 0$, which follows from modulo arithmetic. Hence, LFSR is shift followed by conditional XOR, which means addition of $x+1$ if $p_1 = 1$. So, the shift represents multiplication by the generator polynomial $x$ which produces a sequence of all elements of the field's cyclic multiplicative group. The XOR computes the modulo operation after multiplication, and is there to keep the representation contained in the minimal amount of bits. %[1] http://en.wikipedia.org/wiki/LFSR Entry: Topics in Algebra Date: Thu Jun 18 15:57:54 CEST 2009 Type: tex This is a cheat sheet summary of ``Topics in Algebra'', by I. N. Herstein. \section{Groups} \begin{itemize} \item {[Order]} The order $o(G)$ of a (finite) group $G$ is the number of elements it contains. The order $o(a)$ of an element $a$ is the least positive integer $m$ such that $a^m = 1$. \item{[Right Coset]} If $H$ is a subgroup of $G$, we call the distinct sets $Hg$ with $g \in G$ the right cosets of $H$ in $G$. The subgroup $H$ together with its right cosets $Hg$ is a partition of $G$. \item{[Lagrange's Theorem]} For $H$ a subgroup of $G$, $i = o(G)/o(H) \in \mathbb{N}$. This is called the \emph{index}, or the number of right cosets. \item{[Normal Subgroup]} $gNg^{-1} = N$, $g \in G$. The right cosets of $N$ are also left cosets, since $gN = Ng$. \item{[Quotient group]} is the group formed by all the right cosets of a normal subgroup $N$ in $G$, denoted by $G/N$. \item{[Homomorphism]} The kernel of a homomorphism is a normal subgroup. Likewize, a normal subgroup defines a homomorphism. For a homomorphism of $G$ onto $\bar{G}$ with kernel $K$ we have $G/K \approx \bar{G}$. \item{[Cayley's Theorem]} Every group is isomorphic to a subgroup of the automorphism group $A(S)$, for some set $S$. \end{itemize} The key feature of normal subgroups, $Ng=gN$, enables the introduction of a group operation for the partition of cosets, because $NaNb = NNab = Nab$. % commutatieve groupen: cycli \section{Rings} \begin{itemize} \item {[Ideal]} An additive subgroup invariant under multiplication. \item {[Quotient ring]} The partition of additive cosets $R/U$ of an ideal $U$ has the structure of a ring, and is a homomorphic image of $R$. \item{[Maximal ideal]} is an ideal $M$ yielding a field as quotient ring $R/M$ for a commutative ring $R$. \end{itemize} The additive subgroup is always commutative, so the additive subgroup is trivially normal. The quotient for groups involves two groups, but for rings it involves a ring and an ideal. If the only ideals are $R$ and $\{0\}$, there the ring $R$ can no longer be simplified by applying homomorphisms and has the structure of a field. \section{Fields} \begin{itemize} \item{[Irreducible polynomial]} The splitting field $E$ of an irreducible polynomial $p(x) \in F[x]$ of degree $n$ has degree $n$ or $[E:F] = n$. As a corollary, the maximum degree of a splitting field of any polynomial of degree $n$ is $n!$, where the worst case is reached if after each extension step the polynomial has only one root, so the resulting polynomial is again irreducible. \item{[Unique splitting field]} The basis of galois theory lies in the theorem that relates the splitting fields $E$ and $E'$ of the isomorphic polynomials $f(x)$ and $f'(t)$, from two isomorphic polynomial rings $F[x]$ and $F[t]$, where the isomophism leaves the field $F$ invariant. The splitting fields $E$ and $E'$ are isomorphic with an isomorphism leaving $F$ invariant. Applying this to different splitting fields of the same polynomial, we find the splitting field's structure is unique. \end{itemize} Entry: Bandlimited Discontinuities with Infinite Response Date: Thu Jun 18 16:01:18 CEST 2009 Type: tex This post contains a paper about bandlimited synthesis, coming from an idea that sprouted somewhere in the summer of 2002, when I was re-iterating over basic DSP math involving IIR and FIR filters and the Z-transform. The remaining problems are the evaluation of exponential functions (complexity and precision), and the error analysis which is important in this case as the algorithm relies on cancelling, a technique which is generally avoided as it is noise-sensitive. However, since the framework is quite controlled, there might be some shortcuts to derive bounds. \begin{abstract} In this paper I discuss the use of a bank of damped sinusoidal oscillators to synthesize the audible band of wideband signals obtained as higher order integrals of impulse functions, i.e. step and ramp functions. The usual approach is to use FIR polyphase filtering. I show in this paper that using the corresponding IIR filtering scheme is possible due to the high amount of redundancy of the input data, which allows the use of analytical methods to construct a specialized algorithm. \end{abstract} \section{Introduction} Why are Infinite Impulse Response (IIR) filters are never used in sample rate converters? The answer is fairly simple. Finite Impulse Response (FIR) filters depend only on the input, while IIR filters depend on input and previous output. Using an FIR filter to smooth out a signal before downsampling is an efficient operation because the samples that are dropped by the downsampling operation simply never need to be computed. The approach is called \emph{polyphase filtering}. This does not work for IIR filters because their output values are computed \emph{recursively}, so discarding intermediate output values does not enable one to discard the computation, because the next computation depends on its result. Conclusion: while IIR filters are more efficient than FIR filters if no linear phase relation needs to be preserved, for sample rate conversion they are more expensive than their FIR counterparts because the polyphase simplification can not be made. End of story. Or not? \section{Analog Waveforms} In the early days of electronic sound synthesis, analog relaxation oscillator circuits were used to produce interesting broad--spectrum sounds like square waves, sawtooth waves, etc\ldots. These could then be used as inputs to time--varying filters to produce interesting sounds. Performing synthesis of these waveforms in the digital domain however is much less straightforward than one might assume at first. Naively approximating a waveform by shifting its transition points to the nearest sample points introduces jitter in the signal. This is quite audible at higher frequencies due to it's stable modulation pattern. I.e. for square waves the result is simplest to describe as a pulse width modulated square wave. Closer inspection of the analog circuit quickly indicates that the step of analog to digital conversion complicates the matter: the linear distortion caused by the Anti--Aliasing (AA) filter present in a physical system is an effect that cannot be ignored, and thus has to be synthesized. Traditionally, the effect of the AA filter is implemented as a polyphase FIR filter bank, which could be called the \emph{oversampled naive} approach. In this paper I will describe a solution for this problem using an IIR filter implemented as a parrallel bank of complex one--pole filters. The only assumption made is that the simulated analog waveforms are (integrals of) time shifted Dirac impulse signals. This will lead to an analytically tractable effect on the state of an IIR filter. This paper shows how to compute the state updates necessary to synthesize the impulse response at each sample point. \section{State Updates} Define the Dirac impulse at time $T$ as the generalized function $\delta_T(t)$ which exhibits the properties $\delta_T(t) = 0$ for $t \neq T$ and $$\int_{-\infty}^{+\infty} \delta_T(x) f(t-x) dx = f(T).$$ Define a 1--pole continuous time low--pass filter as the differential equation defining the function $y(t)$ in terms of the function $x(t)$ and the pole $s$ as $${d \over {d t}} y = s (x - y).$$ To see this is a low--pass filter, assume is the constant function $x(t) = x_0$. This gives the solution $y(t) = x_0$, meaning DC is passed. Assume we are given a model for the anti--aliasing lowpass filter mentioned in the introduction (obtained from your filter design method of choice). Using the partial fraction expansion of the transfer function of this filter, it can always be written as the sum of complex first order linear sections. This means the entire problem can be understood in terms of the simple first order case. This is what will lead to a tractable formula for the state update. \subsection{Impulse Input} What we are interested in is the function $y(t)$ evaluated at integer points, assuming for simplicity the sample rate is $1$. Denote the resulting sequence as $y[n]$. Compute the sequence by integrating the differential equation from $t$ to $t + 1$. $$y[n+1] = y[n] + s\int_n^{n+1} (x(t) - y(t)) dt. $$ What happens when we apply the filter to $x_0\delta_T(t)$ with $0 < T < 1$? There are 3 parts in the input: the impulse at time $T$, and the free--running before and after. The solution of the differential equation of the free system from $t_0$, obtained by setting the input $x(t)=0$, is $$ y(t) = y(t_0) e^{-s(t - t_0)}. $$ To simplify the following, denote $z = e^{-s}$. The impulse will just add the value $x_0$ to the state, so we have the complete state update $$y[1] = x_0 z^{1-T} + y[0] z,$$ which is a first order discrete IIR filter with a correction term depending on the location of the impulse. \subsection{Step Input} Signals composed of $\delta_T(t)$ have a very rich spectrum and thus need a strong anti--aliasing filter. Therefore it is interesting to look at piecewize constant inputs. These are closer to the waveforms that can be constructed using analog relaxation oscillator circuits. Working with these directly helps to relax requirements for the anti--aliasing filter, since these signals have less power in the higher frequencies. Here a similar approach is possible. A system with constant input $x(t) \equiv x_0$ can be obtained from the previous zero--input system by a change of variables. It has the solution $$ y(t) = x_0 + z^{t-t_0}(-x_0 + y(t_0)).$$ The state now no longer jumps instantaneously at transition points. Instead, the $x$ in the update equation changes when the input jumps to a new value. I.e. a transition from $x_0$ to $x_1$ at $0 < T < 1$ gives the state update $$y[1] = x_1 + (-x_1 + x_0)z^{1-T} + (- x_0 + y[0])z,$$ which is again a first order discrete IIR filter with a correction term. It's probably enough to stop here, and approximate the integrals of steps in the discrete domain. It won't make much difference for the specification of the anti--alias filter. \section{Exponentials} The core routine of the algorithm is the computation of a complex exponential $e^{tc}$ where $c$ is a filter pole and $0 < T <1$. The total cost is directly proportional to the frequency of the discontinuities, since there is one update per discontinuity. Note that it's probably best to separate the computation of the real and imaginary parts, so they can both have their own approximations. % At first glance this seems to be problematic, because the algorithm % resembles some random number generator. Really? The AA filter has a cutoff around the sampling frequency, so the poles $s$ are all over the unit disk. This means there is no general simplification possible for $Ts \approx 1$. This makes sense: if there would be such an approximation, the correction terms in the algorithm would be practically insignificant. Since the sample period and the filter poles are fixed, it's probably easiest to use the $N$ bit binary approximation $$e^{Ts} = z^T = \prod_{n=1}^N z^{T_n 2^{-n}},$$ where the $T_n$ are the binary digits of $T$. An alternative representation of this is $$\prod_{n=1}^N z^{(2T_n - 1) 2^{-n-1}} \prod_{n=1}^N z^{2^{-n-1}},$$ or using the constant $a = \prod_{n=1}^N z^{2^{-n-1}}$ and the relation $(2T_n-1) = (-1)^{T_n}$ this can be written more clearly as $$a\prod_{n=1}^N z^{(-1)^{T_n}2^{-n-1}}.$$ Using $N$ bits from $T$ reduces the perfect analytical solution to the oversampled naive approach mentioned before, with oversampling factor $2^N$. This seems to be the main precision trade--off point. Is it possible to update the computation? Suppose a constant input period has discontinuities spaced period $P'$ apart. The only thing we need is the fractional part $0 \leq P < 1$. Updating is indeed possible by using the constants $z^{P}$ and $z^{P-1}$, which are the \emph{forward} and \emph{backward} phase updates. The latter one is used when the phasor would wrap around. Note that these updates are not numerically stable, so they have to be reset from time to time using direct computations. \section{Resampling} This looks like a general purpose resampler for arbitrary time--varying input frequencies, which can be derived directly from it by replacing the analog input signal by a PCM waveform. My main question is, if it can be used as such, why didn't I hear about it before? Suppose the deal is to build an $N$ times downconverter. I do this again by stimulating a bank of parallel complex one--pole filters, computing the state update for each one--pole. In the case where the input signal is an oversampled pulse amplitude modulated signal $$x(t) = \sum_n a_n \delta(t - n/N),$$ the state update during one sample period is $$y(t+1) = a_{N-1} + p(a_{N-2} + \ldots p(a_0 + y(t)) \ldots ),$$ with $p = z^{-N} = \exp(-cN)$. This is a polynomial in $p$. How is this different from just running a discrete filter at a the $N$ times higher sample rate, and discarding the intermediate samples? It is in fact exactly the same, since each of the parenthized expressions of the form $p(a_i + y)$ is exactly the update equation of a discrete one--pole IIR filter. This shows that in fact, this \emph{is} indeed a general purpose sample rate converter, but it doesn't really buy anything new when the $a_i$ are not structured in a way that enables the fast evaluation of the state update. \section{Implementation Hacks} The algorithm described above is only a small part of the entire problem. This section collects some remarks about how one might go about implementing a virtual analog waveform synthesizer. \subsection{Filter design} The way we presented the problem requires us to design a continuous time (analog) filter with desired anti--aliasing characteristics. The simplest approach is to use a maximally flat filter. Rumor has it that this is best for musical applications due to the way it leaves low--frequency signals relatively untouched. If you don't like rumors, anything goes: filter design is a black art, and for the application at hand, it is a relatively independent part, and can be done off--line, with virtually unlimited computing resources. An additional constraint that might be interesting to incorporate in the filter requirements are the characteristics of the class of input signals, which generally have a $1/f$ spectrum (i.e. square and triangle waves). It should even be possible to add additional constraints to the values (quantization) of the filter poles so the complex exponentials are easier to evaluate. \subsection{Dithering} The problem with aliasing is not the noise power per se, but the (periodic) patterns present in the aliasing noise, and the difference in noise level depending on the frequency. For the naive approach the aliasing can go from nothing for subharmonics of the sampling frequency, to quite noticable ``beating'' for frequencies close to such subharmonics. In addition, the noise power increases with frequency. A classic dithering trick might be applied to lessen the requirements for the anti--aliasing filter, by randomizing the quantification noise. In this scheme it would amount to pulse position modulation: shifting the time location of discontinuities by a random value take from a rectangular or triangular PDF. The point where dithering could be introduced is in the evaluation of the complex exponentials: if the algorithm used is an approximation, we could try to ensure its error is random. This only needs to be done when the input is periodic, in which case the algorithm is an updating one. \subsection{Filter Error Analysis} Parallel IIR realization is not so good in the stopband due to round--off errors: the algorithm relies heavily on cancellation. The irony is that this is exactly what we are forced to use! So high precision calculations are required. This makes one wonder if it is possible to transform the problem to something embedded in exact integer arithmetic. \subsection{High Frequency Approximation} So where lie the real trade--off? Are higher frequencies just more expensive? First, it is not the case that correct treatment of low frequencies is not necessary. Indeed it is the case that for low frequency square and sawtooth waves the noise caused by naive synthesis can be inaudible, but often these waveforms are used in a subtractive synthesis algorithm, where the lower harmonics are filtered out, rendering the error clearly audible: it is important to synthesize a correct harmonic spectrum for all frequencies. Second, the cost of updates is directly proportional to the frequency of discontinuities. At some point it becomes cheaper to use a discrete frequency domain approach: i.e. to synthesize the harmonics directly. What's needed here is to determine the point where the proposed algorithm becomes more expensive than an oscillator approach, and a way to smoothly transition between the algorithm and a bank of oscillators. %% \subsection{Polar coordinates} %% Something i've been wondering about for so many years is why it's not %% possible to represent filters in polar coordinates. The obvious answer %% is of course the computation of the logs and exponents, and the ubiquity %% of additions that make this impractical. But is it always like that? \section{Conclusions} \begin{itemize} \item Analytically compute state updates for simple discontinuities. \item Simplify the numerical computation of such updates in the case of constant frequency inputs. \item Error analysis of parallel filter bank. \item Quantization error diffusion for pulse position. \item Use as all--purpose resampler. \end{itemize} Entry: Dynamic Wavetable Synthesis Date: Thu Jun 18 16:54:50 CEST 2009 Type: tex \def\diff{ d } \def\difft{ \diff \over \diff t } \def\pdiff{ \partial } \def\pdifft{ \pdiff \over \pdiff t } This DRAFT paper is an attempt to pick up some research which shifted to the background in 2005 because of problems integrating it into a PhD program. I am currently at the point of isolating the problems and formulating them more explicitly. A first version of model has been implemented in Pure Data around 2000-2001 as the \verb+dynwav~+ object in the Creb extension library. \subsection{Time--Phase Model} Dynamic Wavetable Synthesis (DWS) approaches the sound synthesis problem by separating it in two parts. \begin{itemize} \item Synthesis of sequences of instantaneous wave shapes in a discrete $2$--dimensional representation. \item Interpolation of this representation to produce $1$--dimensional sound signals. \end{itemize} The novelty of the technique lies in the way \emph{pitch} and \emph{timbre} data are represented more abstractly, which simplifies the expression of algorithms that relate these two elements. This abstract representation is decoupled from output signal sampling details through \emph{interpolation}. Separating these two concerns makes them easier to manage. The DWS representation greatly simplifies the use of the Discrete Fourier Transform (DFT) as a link between the time--domain waveform and its harmonic spectrum, because the model domain consists of truly periodic functions. More generally this property makes it easier to link the representation with other linear or non--linear transforms. A unified $2$--dimensional interpolation approach replaces the need for an ad--hoc short--time windowing mechanism as is used in many contemporary techniques. Such an approach is used to bridge the world of stationary models and non--stationary sounds. The use of explicit interpolation in the DWS model also makes it easier to keep frequency aliasing and phase cancelling under control, more specifically because the interpolation step can tie into the frequency domain representation of the waveform directly. % In addition to describing synthesis, this paper also hints at possible % analysis strategies. Analysis however is not trivial in the presence % of multiple dominant pitches. In general it seems best to stick to % well-proven techniques and use the DWS model only as a sound signal % sythesis engine. % To keep the exposition short I will assume familiarity with the matrix % computation and interpolation techniques used. % This paper needs to be reworked to restate the 2 problems clearly, and % relate them to the underlying model properly. It also should be stated % clearly that the subject is basicly sound synthesis, and analysis is % added as an afterthought. Some other things that are missing: i % currently see no way for a clean solution of the two main problems, % being source/filter separation and pitch detection, other than trial % and error heuristics. \subsection{Continuous Model} The basis of the DWS representation is a function $p(\phi, t)$ defined on an infinitely long torus, with $\phi$ the periodic dimension and $t$ the linear dimension. The period is most conveniently taken to be $2\pi$. This model is then related to a signal $s(t)$ through a phase trajectory $\phi(t)$ as $$ s(t) = p(\phi(t), t). $$ The instantanious frequency or \emph{pitch}\footnote{Note that a more accurate term for pitch would be `inverse of period', since the perceptual quality pitch is not purely related to periodicity. In this document we treat both terms to mean the same thing.} is defined as $\omega(t) = {\difft} \phi(t)}$. This model is an alternative to time--frequency modeling, and allows us to represent pitch and timbre independently. One could call it ``time--phase modeling''. We define \emph{timbre} as the information associated to a single time instance $t_0$. This is associated to a function on the circle as $P_{t_0}(\phi) = p(\phi, t_0)$. \subsection{Discrete Model} A discrete version of the model can be constructed by choosing $N$ to indicate the number of points to sample on the circle, and performing the sampling along the $t$ axis at multiples of $H$ for the timbre and multiples of $T$ for phase trajectory and associated signal. This gives the sequences $$ \begin{array}{rcl} p_{kl} &=& p(2\pi l / N ,kH) \\ \phi_n &=& \phi(nT) \\ s_n &=& s(nT). \\ \end{array} $$ We define $P_{k}$ as the vector with $N$ components obtained from $p_{kl}$ with $l$ ranging from $0$ to $N-1$. Note that $H$, the ``haptic'' period that fixes the update step of the timbre information, can be much larger than $T$, the signal and phase update step. An important note is that there is no \emph{direct} relationship between the timbre sequence $p_{kl}$ and pitch trajectory $\phi_n$ on the one hand, and the signal $s_n$ on the other hand. They are \emph{only} related through the continuous functions $p(\phi, t)$ and $\phi(t)$. This means that \emph{interpolation} is an essential part in reconstructing $s_n$ from $p_{kl}$ and $\phi_n$. We will come back to the techincal details and inaccuracies of interpolation later, but will assume for now that it is possible to switch between the discrete and continuous model without loss of information. This will allow a slight abuse of notation in that $P_k$ (the concrete representation) implicitly refers to a region in the function $p(\phi, t)$. A modeling technique based on interpolation seems to be a drawback. However, it is the main goal of this paper to convince you that including this indirection and as a result being able to make the sampling for $p(\phi, t)$ fixed is a workable trade--off, since it greatly simplifies manipulation of $P_k$, which we are most interested in. \subsection{Frequency Domain} The frequency domain representation of $p$, defined by computing the Fourier Transform (FT) of $p(\phi,t)$ over the phase dimension $\phi$, allows the DWS model to be related to a sum--of--sines model. For each $l \in \ZZ$ this gives a fuction which represents the phase and amplitude of harmonic $l$ at time $t$. $$c_l(t) = {1 \over {2\pi}}}\int_0^{2\pi} \exp(- j l \phi) p(\phi, t) \diff \phi.$$ The inverse is given by $$p(\phi, t) = \sum_l c_l(t) \exp(j l \phi).$$ Interpolating the latter using the phase trajectory $\phi(t)$ gives a representation for $s(t)$ decomposed as a sum of harmonicly spaced, phase and amplitude modulated complex exponentials $$s(t) = \sum_l c_l(t) \exp(j l \phi(t))\\.$$ Writing $c_l(t) = a_l(t) \exp(j\phi_l'(t))$ with $a_l(t) \in \RR$ and setting $\phi_l(t) = \phi(t) + \phi_l'(t)$ gives the alternative formulation $$s(t) = \sum_l a_l(t) \exp(j l \phi_l(t)),$$ which models $s(t)$ as a sum of amplitude and phase modulated complex exponentials without the harmonicity relationship. The term ${\difft} \phi_l'(t)$, which is the rotation speed of the complex spectral components over time, can then be viewed as frequency offset of each exponential to the ideal harmonic frequency $n {\difft} \phi(t)$. This representation (harmonic + per component frequency offset) is useful later to introduce slight inharmonicities in mostly harmonic models, related to frequencies of beating patterns. \section{Relation to Existing Techniques} The DWS model is related to several other sound synthesis techniques. \begin{itemize} \item By eliminating time variation of the timbre ${\pdifft} p = 0$ we get ordinary wavetable playback. \item The DWS model is a variant of the Scanned Synthesis model, where $P_k$ is produced as the output of a discrete dynamical system. \item The DWS model can be related to the harmonic and inharmonic sum--of--sines model, by moving inharmonicities to the timbre sequence $P_k$. \end{itemize} The relation to ordinary wavetable synthesis isn't so interesting as it is fully contained within the DWS model. Relation to sum--of--sines was already explained, and is an essential element of DWS. It has to be stressed though that DWS does not work well for sounds which are not largely harmonic (harmonic + small phase differences) due to phase cancellation effects. This will be explained further in the section about interpolation. DWS is most related to Scanned Synthesis, and it could be argued that both techniques essentially talk about the same subject, be it from a different perspective. \subsection{Scanned synthesis} If we take the vector $P_k$ to be the output of a dynamical system sampled at $k$ rate (the haptic period $H$), we get the method of scanned synthesis. The most general form for a such a dynamical system is $$\matrix{X_{k+1} \\ P_{k}} = \psi(\matrix{X_k \\ U_k}),$$ where $X_k$ is the state vector, $U_k$ is the system input and $P_k$ is the system output. When the state update function $\psi$ is linear this is ususally represented by $$\matrix{X_{k+1} \\ P_{k}} = \matrix{A & B \\ C & D} \matrix{X_k \\ U_k},$$ where the poles of the system are determined by the eigenvalues of matrix $A$. This representation is not unique: we can transform $X_k$ to another base $X_k' = Q^{-1}X_k$ using an invertable matrix $Q$ with columns representing the new base for the state vector. The new system matrices then become $A' = Q^{-1}AQ$, $B = Q^{-1}B$, $C'=CQ$ and $D' = D$. If the eigenvalue decomposition of $A$ exists, the equations can be transformed using $Q=S$, with $S$ the matrix containing the eigenvectors of $A$. This makes $A'$ diagonal, containing the eigenvalues of $A$. Let's look at the diagonal form for a system without inputs, which means originally $B=0$ and $D=0$. With $S$ the matrix composed of eigenvectors as columns and $E = S A S^{-1}$ a diagonal matrix containing the corresponding eigenvalues, the system $$\matrix{X_{k+1} \\ P_{k}} = \matrix{E \\ S} \matrix{X_k}$$ can then be interpreted as a bank of oscillators which is a mix of the ``eigen waveforms'' contained in $S$. The mixing amplitude and phase is directed by $k$ independently decaying envelope functions. A special case arises when $A$ is circulant, meaning $a_{ij} = a_{i'j'}$, with $i' = (i-1) \mod N$ and $j' = (j-1) \mod N$. It ensures that each element of $X_k$ has the same relationship to its neighbouring elements, with all elements arranged on a circle. If $A$ is circulant it can be diagonalized by the discrete Fourier transform, with $Q = F$, a matrix with columns made up of DFT base functions. This means the elements of the state vector $X_k$ represent the instantanious amplitudes and phases of the DFT base functions. When all eigenvalues of $A$ have a magnitude strictly smaller than $1$, the signal associated to this model after interpolation is a sum of harmonicly spaced damped exponentials. \section{Interpolation} Reconstructing the function $p(\phi,t)$ from a sequence of $P_k$ and obtaining a discrete signal $s_n$ from the phase trajectory $\phi_n$ is hindered by two phenomena. \begin{itemize} \item Frequency aliasing of $s(t)$. Suppose it is possible to perfectly reconstruct $p(\phi,t)$ from $P_k$ and $\phi(t)$ from $\phi_n$. The resulting signal $s(t) = p(\phi(t),t)$ might contain frequency components that cannot be represented in the discrete domain and will create aliasing distortion. \item Reconstruction of $p(\phi, t)$ from $P_k$, in the case where $P_k \in \CC^N$ might be hindered by phase cancellation phenomena in the case where naive linear interpolation is used. This happens when sounds are inharmonic. This is related to undesirable modulation problems in windowed sinusoidal modeling. \end{itemize} The challenge is to divise an interpolation strategy that minimises the error due to these two effects. In general the solution is to filter the $P_k$ before interpolation and to minimise phase cancellation by interpolating the phase trajectories instead of the amplitudes. The former is relatively straightforward. The latter is quite difficult as it is expensive to perform. Both problems are straightforward to solve in the frequency domain (DFT of $P_k$). The challenge lies in determining a proper sampling point \emph{inbetween} successive $P_k$. This largely depends on the contents of the $P_k$ and the sampling period determined by $\phi_n$. Another question is: for inharmonic sounds, does it \emph{really} matter so much to get phases wrong here and there? There is already a lot of cancelling and amplitude modulation going on as a consequence of the inharmonicities. Not getting this exactly right might not be so problematic after all. \section{Formant Structure} An important drawback of wavetable based synthesis is the coupling between pitch and \emph{formant} structure, which refers to the spectral envelope of a signal, which can be properly defined as the result of low order \emph{linear prediction}. In a pure synthesis framework this relation can be made explicit. Simply make sure that the $P_k$ are postprocessed with a spectral envelope which cancels the phase trajectory frequency, keeping the resulting spectral envelope of $s_n$ under control. Since this is rather trivial it will no longer be discussed. However, can this process be automated by performing operations on $P_k$ in a formant--neutral way? It is straightforward to \emph{stretch} a formant spectrum by moving the poles around. This raises the question about placing standard linear prediciton modeling techniques in a \emph{periodic} framework. There are probably a lot of shortcuts that can be exploited by involving the DFT. \subsection{Periodic Linear Prediction} Entry: Implementing Fermat Prime order Galois Fields Date: The Jun 18 22:48:20 CEST 2009 Type: tex \def\F{\mathcal{F}} \def\R{\mathcal{R}} \section{Introduction} Some words on the ring of integers modulo $2^{2^{k+1}}-1$, which is easily implemented using ordinary hardware operations. Such a ring can be used to implement the finite fields with $2^{2^{k}}+1$ elements, for $0 \leq k \leq 4$. \section{Embedding $\F_a$} Let $\F_a$ denote the field of integers modulo the prime $a$, and $\R_a$ the unital commutative ring of integers modulo $a$, not necessarily a prime. We want to find out how to embed a field $\F_a$ in a ring $\R_{ab}$, where $b$ is not necessarily prime. Consider the map $$f : \R_{ab} \to b\R_{ab} : x \mapsto x b e = x 1_a,$$ which is a map from $\R_{ab}$ to its ideal $b\R_{ab} = \{ x b : x \in \R_{ab} \}$. If we can find an element $1_a = be$ in $b\R_{ab}$ such that $1_a ^ 2 = 1_a$, this map is a ring homomorphism. If $1_k \neq 0$, this map is nontrivial, and it gives a field structure to the elements of the ideal $b\R_{ab}$. The fact that $f$ is a ring homomorphism follows from $f(x)+f(y) = x 1_a + y 1_a = (x+y) 1_a = f(x+y)$ and $f(x)f(y) = x 1_a y 1_a = x y 1_a^2 = x y 1_a = f(xy)$. If we require $0 < e < a$ we have $1_k \neq 0$, and all the $a$ elements in $b\R_{ab}$ will be reached by $f$. The kernel $K_f$ of $f$, determined by $\{x : x1_a = 0\}$, has $b$ elements, and is identified as $a\R_{ab}$, the ideal generated by $a$. This ideal is maximal since $a$ is prime, so the quotient $\R_{ab} / a\R_{ab}$ is a field containing $a$ elements, which is essentially the same as $\F_a$. The canonical decomposition of $f$ seen as an endomorphism of $\R_{ab}$ can be decomposed as $$f = e \circ i \circ h,$$ where $$h : \R_{ab} \to \R_{ab} / a\R_{ab} : x \mapsto x + a\R_{ab}$$ is the homomorphism onto the quotient field which introduces the field structure, $$i : \R_{ab} / a\R_{ab} \to b\R_{ab} : x + a\R_{ab} \mapsto x1_a$$ is the isomorphism from this quotient field to the ideal generated by $b$, and $$e : b\R_{ab} \to \R_{ab} : x1_a \mapsto x1_a $$ is the embedding of this ideal in the ring. This is the embedding we're after, it carries the field structure introduced by $h$ and preserved by $i$ over to a subset of $\R_{ab}$. \section{Mersenne factors} We call the number $M_{k} = 2^{2^k}-1$ a Mersenne number and $F_{k} = 2^{2^k}+1$ a Fermat number. For $k \geq 0$ we have $$M_{k+1} = F_{k} M_{k}.$$ For $0 \leq k \leq 4$ the $F_{k}$ are prime, this makes $M_5 = 2^{32}-1$ rather special, as the product of the first 5 Fermat primes. \begin{align*} 2^{32}-1 &= (2^{16}+1)(2^{16}-1) \\ &= (2^{16}+1)(2^{8}+1)(2^{8}-1) \\ &= (2^{16}+1)(2^{8}+1)(2^{4}+1)(2^{4}-1) \\ &= (2^{16}+1)(2^{8}+1)(2^{4}+1)(2^{2}+1)(2^{2}-1) \\ &= (2^{16}+1)(2^{8}+1)(2^{4}+1)(2^{2}+1)(2^{1}+1) \\ \end{align*} Note that this is the end of the known chain for prime Fermat numbers. As of this time, for $n\geq5$ all \emph{tested} Fermat numbers $2^{2^n}$ are composite. Wether there are more Fermat primes is not known, and not really of interest to us now. \section{Circular Addition} Why are these Mersenne numbers interesting? Addition modulo $M_{k}$ can be seen as \emph{circular binary addition}. Looking at the structure of a binary adder, each bit is connected to the last using a \emph{carry bit}. By linking the carry output of the most significant bit's adder to the carry input of the least significant bit, one obtains addition modulo $M_k$, with $2^k$ the number of linked adders. In case $N=2^k$ is the wordlength of a machine, or a multiple thereof, the operation of reducing a number of $2N=2^{k+1}$ bits to one modulo $M_{k}$ is simply \begin{align*} c 2^N + x &= c(2^N - 1) + c + x\\ &= c + x \end{align*} which is very cheap to do if $N$ is a multiple of the machine wordlength. For addition of two numbers modulo $\mod M_k$, the result has $N+1$ bits, with $c$ the carry flag. For multiplication, the result has $2N$ bits. This results in the instruction sequences ``ADD,ADDC'' and ``UMUL,ADD,ADDC''. So, implementing the ring $\R_{M_k}$ is efficient. This is not so interesting in itself, since addition modulo $2^{2^n}$ is a ring we get for free. However, this does enable us to implement the field $\F_{F_{k-1}}$ in an efficient manner. [Note] This does not reduce $111\ldots1$ to $000\ldots0$, so we have two elements representing $0$. \section{Algebraic structure of $\F_{F_k}$} \def\R{\mathcal{R}} The multiplicative group of the fields $\F_{F_k}$ is the cyclic group of order $N=2^{2^k}=F_k-1$ and is thus highly composite. This has the advantage that a fast radix $2$ FFT-like algorithm exists to compute the number theoretic transform $$X_n = \sum_{k=0}^{N-1} x_n \omega^{-nk},$$ with $\omega$ a primitive root of unity. Here $\omega$ is a generator of the multiplicative group, meaning the smallest positive integer $k$ satisfying $\omega^k = \omega$ is $N=2^{2^n}$. The inverse transform is given by $$x_n = N^{N-1}\sum_{k=0}^{N-1} X_n \omega^{nk}.$$ \section{Finding $1_{F_k}$} Following our general notes, it is possible to represent $\F_{F_k}$ using the ideal $M_k \R_{M_{k+1}}$ of the ring $\R_{M_{k+1}}$ if we can find the unit element $$1_{F_k} = M_k e_k.$$ In order for this unit to yield the homomorphism $f_k$ which will lead us to a representation of the field $\F$, we need $$1_{F_k}^2 = 1_{F_k} \mod M_{k+1}.$$ All terms, including the modulo divisor, are a multiple of $M_{k}$, so we can safely divide out this number yielding the equation $$M_k e_k^2 = e_k \mod F_k.$$ Since $F_k$ is prime, we can take inverses, so we can multiply both sides by $(M_ke_k)^{-1}$, which gives the solution $$e_k = (M_k)^{-1} \mod F_k.$$ This gives us $1_{F_k}$. We can do better and find an explicit expression by observing that \begin{align*} e_{k} &= (F_k - 2)^{-1} \\ &= (-1) 2^{-1} \\ &= 2^{2^k - 1} \end{align*} using $(-1)^{-1} = -1 = 2^{2^k}$, all modulo $F_k$. This gives a nice expression for the unit $$1_{F_k} = 2^{2^k - 1} (2^{2^k} - 1).$$ In binary this gives \begin{align*} 1_0 &= 01 \\ 1_1 &= 0110 \\ 1_2 &= 0111.1000 \\ 1_3 &= 0111.1111.1000.0000 \\ 1_4 &= 0111.1111.1111.1111.1000.0000.0000.0000 \end{align*} % is dit eigenlijk wel uniek? lijkt me van wel, maar ben niet helemaal zeker Entry: Matrix Decompositions Date: Fri Jun 19 10:57:12 CEST 2009 Type: tex \subsection{Comparing Matrices} There are different ways in which matrices can be \emph{alike}. The following talks about 3 relations called \emph{equivalent}, \emph{congruent} and \emph{similar}. Two matrices $A$ and $C$ are equivalent if there exists non--singular matrices $R$ and $Q$ such that $$C = RAQ.$$ Two matrices are equivalent iff they have the same rank. This is related to Gaussian elimination for solving linear systems and computing a matrix inverse. If $R=Q^{-1}$ the matrices are \emph{similar}. They represent the same linear transform, but in another basis. All square matrices are similar to their Jordan form. Normal matrices are similar to a diagonal matrix. A consequence is that similar matrices have the same eigenvalues. If $A$ and $C$ are hermitian, and $R=Q^T$, the matices are \emph{congruent}. They are both congruent to the same signature matrix, so have the same signature, and represent essentially the same quadratic form, but in another basis. If two matrices are \emph{both} similar and congruent, they are called orthogonally similar and $R=Q^T=Q^{-1}$. Hermitian matrices are orthogonally similar to a real diagonal matrix. \subsection{Matrics Semantics} When studying matrix algorithms, it is important to keep in mind what they represent. \begin{itemize} \item Both the normal form for the eigenvalue decomposition and the more general Jordan form study the \emph{linear transform} related to a matrix, by performing a change of basis. Here a the matrix represents a linear function $f(x) = Ax$. \item The normal form for the congruence transform studies the \emph{quadratic form} associated to a symmetric or hermitian matrix, by transforming a change of basis. More generally, a symmetric matrix $A$ can represent a \emph{bilinear form} $f(x,y) = x^TAy$. Likewise a \emph{symplectic form} is represented by antisymmetric matrix. \end{itemize} %[1] http://mathworld.wolfram.com/EquivalentMatrix.html Entry: Matrix Multiplication Date: Fri Jun 19 10:58:12 CEST 2009 Type: tex % GOLUB & VAN LOAN: % 3 manieren om matrix vermenigvuldiging te zien % * inner product A^T B % * outer product A B^T % * lineair combination of columns of A / rows of B % dus: Ax = b % MATRIX = operator, acting on (row) vector x on its left % -> rows are linear functionals, mapping vector x to a coordinate of b (number) % -> columns are base vectors, coordintes of x map these columns to vector b % exchange all these left--right row--column: x^T A^T = b^T % MATRIX = dual operator, acting on (column) vector = linear functional x^T on its right % -> columns are linear vectors, to which the linear functional x is applied to give a coordinate of b^T % -> rows are linear functionals, coordinates of x combine these to functional b^T % in mensentaal % vector = pijl % linear functional = coordinaatextractor % hetgeen wat alle problemen veroorzaakt: % -> linear functional = ook een vector % -> matrix, interpreteerbaar als een set van functionals, of een set van vectoren, is ook een vector! % matrices (lineare transformaties) zijn vectoren die je niet alleen kan optellen, maar ook kan vermenigvuldigen! % aint this fun :) % toch vreemd he, hoe zo'n absurd ding als matrixvermenigvuldiging zo % verwoven kan geraken met automatisme zodat ge eigenlijk niet meer % weet wat ge aan het doen zijn. en als ge gaat kijken wat ge aan het % doen zijt blijkt het verschillende interpretaties te hebben. of hoe % wiskunde een soort middenweg kan zijn, die verschillende dingen die % duidelijk verschillende dingen lijken samenvat in 1 ander abstrakt % ding met de verschillende dingen als eigenschappen.. dus dit hoort % hier eigenlijk niet echt thuis, maar het is een mooie oefening om % mijn intuitie door de jaren opgebouwd eens proberen te vatten in % woorden. vrij moeilijk. The algebra of finite dimensional linear operators is associative. This means that the objects that \emph{operate} and the objects that are being \emph{operated on} are somehow interchangable, because of the law of associativity, as shown in $$ABx = A(Bx) = (AB)x.$$ Semantically, this has quite some implications. If $A$ and $B$ denote linear transforms operating on the object on their right, and $x$ denotes a vector, the above means the transform of a transformed vector is equal to a transformed vector, with the latter transform the combination of two other transforms. % Maybe it's just because I've been out of the loop for a while, and I % am re--discovering the beauty of math in general and the beauty of % linear algebra in particilar, but in this particular case, the % frivolous interplay of the \emph{concrete} and the \emph{abstract} % is pretty amazing. Reality really changes by finding a good way to % describe it\ldots Instead of talking about vectors and transforms in an abstract way, it is possible to make matters more \emph{concrete} by fixing a basis and working only with coordinates taken from the field from which the vector space is constructed. For linear transforms these coordinates are expressed as matices. Probably the reason why engineers like matrices is that in some sense it is simpler to talk about structured collection of numbers and rules for addition and multiplication, than about abstract entities. However, we should never forget that it sometimes makes sense to talk about the objects they represent more abstractly\footnote{ To me, and to most people I know, \emph{understanding} basic linear algebra, next to the opaque rule of matrix multiplaction has always been about putting it in terms of simple geometry. To see what they \emph{do}. Points, line segments, cubes and spheres transformed under rotations, translations, reflections, scalings and projections, and combinations thereof. A common pitfall in thinking about rotations is to see (elementary) rotations happen around an axis. Rotations really happen in a plane. This is clearest to see in $2$ dimensions, since there is not much left to rotate around, except for a single point. Only in $3$ dimensions there is always a $1$--dimensional subspace invariant under the rotation, also called the axis. When we go to $4$ dimensions, there is a $2$--dimensional subspace invariant under (elementary) rotations. It is even possible to combine two independent elementary rotations. Independent in the sense that they commute, which is not true for rotations in general. This is equivalent to saying a $4$--th order linear system can have $2$ distinct eigenfrequencies. An interesting thing to visualize is the nilpotent transform, which can be seen as a rotation/reflection followed by a projection, where the rotation is such that the space onto which is projected does not contain a subspace which is invariant under the rotation/reflection. In other words, the projection and rotation work as ``squeeze'' and ``turn'' such that when both operations are iterated a couple of times, the result eventually shrinks to a point. In two dimensions this can be visualized as collecting bread crums spread out over the surface of a table into a pile in the following way. Rotate the table $90$ degrees, wipe all the crums to the left side using the ruler. Then do this again. Of course, nilpotent transforms are much easier visualized as shifts of coordinates of vectors, where on each shift, one or more things disappear into the void.} because the definition of matrix multiplication $$c_{ij} = \sum_k a_{ik} b_{kj}$$ is not very intuitive\footnote{There is unfortunately just one thing to do with this: figure out what it means when you see two of these grids next to each other, and train yourself to do the manipulation without thinking about how to do it. For me it is a very visual thing: I move the columns of $B$, rotate them $90$ degrees counterclockwize so they align with the rows of $A$. Then, in close proximity, they \emph{zip up} into a single number.}. Here $k$ ranges from $1$ to the \emph{width} of $A$, with elements $a_{ij}$ on row $i$ and column $j$. This range is equal to the \emph{hight} of $B$. In the following I will use the ``engineering shortcut'' of talking about vectors, linear forms and linear transforms as if they were the same as matrices of dimension $N \times 1$, $1 \times M$ or $N \times M$ respectively. It is important to see that mathematically, they are \emph{distinct} objects. The algebra of matrices \emph{behaves as} the algebra of abstract linear operators because both are related by an isomorphism tied to a particular basis in the abstract vector space. In matrix algebra the \emph{transpose} operation represents the endomorphism of a (finite) abstract vector space which relates vectors and linear forms. The ease at which the matrix multiplication formula can be re-interpreted is exactly because of the existence of these morphisms. % oppassen met algemeenheden: tis niet noodzakelijk allemaal % orthogonaal! maar dat zeg ik ook niet.. similar tot jordan vorm ist % wel ok An interesting thing to observe about the matrix multiplication rule is that it is recursive. This means you can chop 2 matrices you want to multiply in rectangular pieces, give the pieces a name, and apply the multiplication rules to the pieces. The result is the same, only less or more \emph{unrolled}. There are only 2 things that need to be taken into account. One is that multiplication is not commutative, and two is that the number of horizontal pieces on the left and the number of vertical pieces on the right are the same. So, we can interpret the $a$'s, $b$'s and $c$'s in the matrix multiplication formula not only as scalars, but as matrices or vectors, taking into account that we loose commutativity doing so. % This is an illustration of the fact that matrices can be defined % over rings. This observation allows us to cast some light on the different operations a matrix multiplication can represent. Let's start our journey with $$Ax = b.$$ This is usually associated with a set of $M$ linear equations in $N$ unknowns, but only if we really don't know $x$. The matrix $A$ has dimensions $M \times N$, and both $x$ and $b$ are row vectors (matrices with width 1) of dimension $N$ and $M$ respectively. There are several ways to look at this. The most obvious one is the more abstract one: the matrix $A$ is the representation of an operator, operating on the vector $x$ on its right. The result of this operation is another vector $b$. Now, taking a closer peek at the internals of $A$, we recognize it is built out of $N$ columns of numbers. Each of the columns of $A$ can be seen as a vector so we give column $i$ the name $a_i$. This enables us to write the previous formula using the notation of submatrices as $$\matrix{ a_1 & a_2 & \ldots & a_N } \matrix{ x_1 \\ x_2 \\ \vdots \\ x_N } = b.$$ Here the $a_i$ and $b$ are $M$--dimensional vectors, and the $x_i$ are scalars. Working out the matrix product using the usual rules gives us $b = \sum_i a_i x_i$. Moreover, because $x_i$ are scalars we have also $b = \sum_i x_i a_i$. So $b$, a column vector, is a linear combination of the vectors $a_i$, which are in our case the columns of $A$. This expresses $b$ (vector) in terms of coordinates $x_i$ (number) with respect to the basis $a_i$ (vectors). The space spanned by all possible combinations is called the column space. % \footnote{ Note % that this looks like an interior product. When $M = 1$ the $a_i$ are % scalars, and we have an interior product between two % $N$--dimensional vectors. Can we interpret the case $M > 1$ as an % interior product? Depends on what we mean by interior product. If % we interpret interior product as a sort of \emph{average} combined % with a \emph{scaling}, then indeed, the analogy holds.}. % Note, this gives a nice interpretation of what orthogonality % means. If you take two strings of numbers, $a$ and $b$, and you take % the average of the first string according to a weighing defined by % the second string, both are said to be orthogonal if the weighing % yields an average of zero: $a^T b = 0$. % This has the nice interpretation of distance and mass. Suppose $a$ % represents the masses of a collection of fruits, $b$ represents a % possible arrangement of the fruits on a scale, indicating distance % (positive or negative) to the suspension point. The scale is % balenced if and only if the vectors $a$ and $b$ are orthogonal. Another way to chop up $A$ is like $$\matrix{ a_1^T \\ a_2^T \\ \vdots \\ a_M^T } x = \matrix{ b_1 \\ b_2 \\ \vdots \\ b_M }.$$ Here $a_i^T$ are the rows of $A$, written in terms of the column vectors $a_i$, the columns\footnote{I've found it benificial in most cases to represent row vectors as the transpose of column vectors, not to confuse my visual system too much when recognizing formula patterns.} of $A^T$. This arrangement expresses the scalar elements of $b$ as inner products between $x$ and rows of $A$, or $b_i = a_i^T x$. Written as this, $x$ takes the average of the elements on each row of $A$ independently, yielding $M$ scalars. Another way to put this is that the rows of $A$ represents linear functions (functionals) mapping vectors to numbers, and each row applied to $x$ gives one element of $b$. Once you can interpret the formula of matrix-vector multiplication, it easily generalizes to matrix-matrix multiplication $AX = B$, with $X$ and $B$ matrices containing columns $x_i$ and $b_i$ respectively. Which gives $$\begin{array}{lcl} AX & = & A \matrix{x_1 & \ldots & x_K}\\ & = & \matrix{Ax_1 & \ldots & Ax_K}\\ & = & \matrix{b_1 & \ldots & b_K}\\ & = & B. \end{array} $$ Matrix multiplication on the left can be seen as an independent transform of all the \emph{columns} of the matrix it operates on. This can be seen as a function which maps a matrix $X$ to a matrix $B$. Note that the only thing we are doing is still to chop matrices up into parts, and express the relationships of the parts by looking at the matrix multiplication formula from different angles. Taking the transpose of the expression above gives the dual interpretation: a matrix operating on the right independently transforms the \emph{rows} of a matrix\footnote{For me it makes sense to only think in terms of ``operator is on the left, all columns of the operand are operated on independently'', instead of mixing the left--right and row--column stuff. Because of non--commutativity, left and right \emph{does} matter a lot, but luckily, they are related by the transpose (or Hermitian conjugate).}. % \footnote{ Note, that the dual of a vector refers to a vector in the % space of linear functionals, the space of all linear transforms from % the vector space to the base field. Since we work in the less % abstract settings of $\RR^N$, vectors are column vectors, while their % duals, the linear functionals, are row vectors, transposes of column % vectors. Application of a functional is nothing but the % multiplication of $a^T$ with a vector $x$, or $a^Tx$, which is the % inner product of vectors $a$ and $x$.}. % Strange how the brain does not automaticly incorporate this kind of % symmetry. Well, you can learn it though. % Note that operator on the left/right relates to computer languages % like LISP or FORTH. Let's interpret a multiplication differently, as inner and outer products. Take vectors $a$ and $b$, both unit norm, so we can concentrate on the effect of the angle between these vectors. Their length has no real effect on products, other than a change of scale, so we can safely leave them out of the picture. We form the inner product as $b^Ta$. This is a scalar, equal to the cosine of the angle between both vectors. If this product is zero, $a$ and $b$ are said to be orthogonal. The inner product of a vector and itself is its norm squared. In our case, norm is $1$, meaning angle is $0$, meaning both vectors are the same. The outer product $b a^T$ is quite different. It is a linear operator (a matrix) which represents a projection on the space spanned by $a$, followed by a rotation in the direction of $b$. We can also call this a reflection, since there's no handedness to preserve in a 1--dimensional space. Rotations (even number of reflections) and reflections (odd number of reflections) have essentially the same effect. The outer product of a vector and itself, $a a^T$ is a projection operator onto the space spanned by $a$. % Now, we generalize this to multiple dimensions. % So, things to remember. A matrix multiplication is: a collective % transform of a (bunch of) vector(s), or a recombination of a bunch % of vectors or scalars. % A matrix multiplying from the left operates on columns % independently. A matrix multiplying from the right operates on rows % independently. Let's have a look at inverses. With $B^T = A^{-1}$ we have $B^T A = I = AB^T$. So the columns of $B$ and $A$ form mutually orthogonal basis sets. For one column $b_j$ of $B$ this means it is orthogonal to all the columns $a_i$ of $A$ with $j \neq i$. And we have $b_i^T a_i = 1$. What does this mean? Suppose we write $A A^{-1} = A B^T$ as $$ \matrix{a_1 & \ldots & a_N} \matrix{b_1 & \ldots & b_N}^T $$ This gives us $I = \sum_k a_k b_k^T$, a complicated expression for doing nothing! Digging into this we can, given any independent set of vectors $a_i$, write any vector $x$ in terms of it, by projecting it on a dual basis $b_i$. In other words, associated with any basis $a_i$, there is a set of linear functionals $b_i$, which when applied to a vector as $b_i^Tx$, give the cordinates of $x$ in the new basis $a_i$, yielding the expression $x = \sum_k a_k (b_k^T x)$. What can we learn about this? What is this duality all about? In fact, it gives us always two ways of looking at the same thing. For example, we saw above that $A^{-1}$ is just a disguised form of some other matrix $B^T$. An inverse can always be interpreted as the dual of some other matrix, where this other matrix contains the dual basis for the columns of the original matrix. Stretching this a bit, a matrix can either be interpreted colum-wize, as a collection of base vectors, where the vector it operates on is a set of coordinates Or it can be interpreted row--wize, as a collection of linear functionals producing a set of coordinates of some basis. Operating \emph{on} a matrix from the left changes rows independently, from the right changes columns independently. Conjugate (transpose, Hermitian conjugate) exchanges the dual views. % Next to that, we can chop it up into blocks to aid our understanding % of certain properties and operations. % Of course, in the abstract setting, coordinates are always relative to % some basis. Matrices are defined exactly in this matter, as grids of % coordinates, and \emph{true} vectors are coordinate free % entities. This doesn't remove the fact that a matrix, a block of % numbers, \emph{behaves} as a vector, so we can freely interpret it as % we like, either as a vector being acted on, or as a set of coordinates % explaining one vector in terms of other vectors. It's this footloose % property that makes linear algebra so powerful for formulating % real-world phenomena, while remaining very earth--bound, because it's % just a pack of numbers in the end. So, how to read matrix multiplications? When you see the same matrix on both sides of another one, there is both a row \emph{and} a column transform going on. In other words, there is a transform to another basis and back again. There are basicly two types: a congruence transform when the expression is about the signature of a quadratic form, and and a similarity transform, when the relative scaling (eigenvalue decomposition) is studied. For example the similarity transform $X' = Q X Q^{-1}$, with $Q$ regular. The columns of $Q$ represent a basis. So, building upon our investigation of what an inverse actually is, the first effect of $X'$ is to yield the coordinates of the vector it operates on in terms of the columns of $Q$. Then, in this new coordinate system, the operation $X$ is performed on the coordinates, after which, the coordinates are used to build a vector using as basis the columns of $Q$. When the eigenvalue decomposition exists, $X$ can be made diagonal, so the effect of $X'$ can be decoupled into a set of independent actions on the coordinates in the basis represented by the columns of $Q$, which in this case are the eigenvectors. If you see $Q X Q^T$ then you can safely bet that $X$ is symmetric, and we are mostly interested in its signature, i.e. the kind of quadratic form the matrix $X$ produces. Entry: Math and Music: Harmony vs. Melody Date: Fri Jun 19 16:04:53 CEST 2009 In a talk with Axel yesterday the harmony / melody topic came up. Axel's point was that it might be better to study harmony or a harmonious instrument (one that can play chords like guitar, piano) before studing melody. My remark was that it's strange to see melody from a mathematical point, because if you look at frequency ratios, what you see first is harmony. Then in the background of this one could develop tone scales using harmonic intervals, but it's never possible to really decouple melodic structurs from the harmonic background. Axel remarked that in the history of (western?) music there was this move from harmony -> melody -> re-interpreting harmony in terms of melody. That in classical music, harmony is somewhat implicitly understood with melody brought to the front. Entry: Complex Number Wave Shaping Date: Sat Jun 20 15:25:27 CEST 2009 Type: tex The idea present in this paper is related to the operations of \emph{modulation}, \emph{wave shaping} and \emph{sample playback}. These can be represented respectively by an injection (or a curve) $\RR \to \MM$ where $\MM$ is a manifold, a transformation $\MM \to \MM$ and a projection $\MM \to \RR$. Combining these operations allows the characterization of sound signals as $\RR \to \RR$ maps. \section{Complex Numbers} Before going to exotic structures, let's have a look at $\MM=\CC$. A sinusoidal oscillator can be modeled using the curve $f(t) = e^{i\omega t}.$ The trivial projection onto $\RR$ is to take the real or imaginary part. Using these this we can concentrate on transformations of $\CC$. Sticking to transformations that can be implemented directly using arithmetic functions we get addition, multiplication and function composition which lead us to polynomials, and division which leads us to infinite power series. % This paper is mainly intended as an illustration of the elegance of % working with complex signals, as opposed to the more cumbersome real % signal approach. \section{Polynomials} Polynomials in $\CC$ are closed under addition, multiplication and function composition, so it is interesting to look at what we can do with them. Performing $f(x) = x^2$ on $e^{j\omega t}$ yields $e^{2j\omega t}$. Squaring doubles\footnote{In $\RR$ on the interval $[-1,1]$ the same behaviour can be produced by the Chebyshev polynomials $T_n$. They satisfy the relation $T_n(\cos\theta) = \cos n \theta$.} the frequency. In general an order $N$ polynomial $$p(x) = \sum_{n=0}^N p_n x^n$$ produces a mixed spectrum according to $$p(e^{j\omega t})=\sum_{n=0}p_ne^{nj\omega t}.$$ This means we can talk in terms of the polynomials instead of the terms $e^{j\omega t}$, with $p_n$ corresponding to the amplitudes of the harmonics. The following talks about the polynomials $p(x) = \sum^N p_n x^n$ and $q(x) = \sum^M q_mx^m$. To simplify notation we take $p_n$ and $q_n$ to take the value $0$ when they fall outside of the range $0,\ldots,N$ and $0,\ldots,M$ respectively. Adding two polynomials produces another polynomial, with the coefficients added per term $$p(x) + q(x) = \sum^{\max(N,M)} (p_n + q_n)x^n.$$ Multiplying polynomials \emph{convolves} the spectra, which means that one spectrum is spread out with the pattern of the other spectrum. $$p(x) q(x) = \sum_{n=0}^{N+M} (\sum_{k+l=n}p_k q_l)x^n.$$ Composing two polynomials yields $$p(q(x)) = \sum_{n=0}p_n (\sum_{m=0}^M q_n x^M)^n$$ which has a less trivial closed form. Apart from special cases, like the Chebyshev polynomials which obey the nesting property $T_n(T_m(x)) = T_{n m}(x)$, composition and more specifically iterated composition is the realm of \emph{dynamical systems}. % \section{Special Polynomials} \section{Analytic Functions} Next to polynomials, \emph{power series} $$f(z) = \sum_{k=0}^{\infty} a_k z^n$$ are an interesting tool. They are a natural generalization as polynomials of infinite order. Power series represent analytic functions. Just as with polynomails, the coefficients $a_k$ represent the harmonic content of the resulting signal. % OPGELET: filters zijn ''fourier multipliers'', en geen waveshapers!!! % This is the ``mathematical'' Z--transform\footnote{ Note the duality % between time and frequency analysis, looking at the correspondence % between filtering and waveshaping. In discrete filter design and % analysis, the spectrum is continuous and periodic and lives on the % complex unit circle. Filters are causal, infinite length, discrete % sequences (discrete impulse responses). Here we turn the picture % around: signals live on the complex unit circle and are periodic % and continuous. Spectra are discrete (harmonic) infinite % series. I.e. rational waveshaping functions are the equivalent of % finite order discrete IIR linear filters, which are ``spectrum % waveshapers''. Finite waveshaping polynomials are in the same % sense equivalent to discrete FIR filters.}. % The function $\frac{z}{|z|}$, which maps the complex plane onto the % unit circle, can be used as a general purpose (nonlinear) normalizig % function. Since it is not differentiable, it is not an analytic % function and does not have a power series expansion, so it is not % very useful for analysis and should be used with care. \subsection{Fractional Linear Transforms} The function $$f_{a,\theta}(z) = e^{i\theta}\frac{z - a}{1 - \overline{a} z}$$ with $z,a \in \mathbb{C}$, $\theta \in \mathbb{R}$ and $|a| \neq 1$ is an invertible map from the unit circle onto itself. It is called a Fractional Linear Transform (FLT). If $|z| = 1$ then $|f_{a,\theta}(z)| = 1$. If $|a| < 1$ this function is analytic on the open unit disc. The first thing to remark is that the $f_{a,\theta}$ form a group with composition the group operation. The inverse element of $f_{a,\theta}$ is $f_{-a,-\theta}$. What this means is that performing this operation multiple times can always be related to a single such operation, so it isn't terribly interesting. % In the dual representation, this is of course the z transform of a % discrete all-pass filter. The more general fractional linear % transform can be used as well. The dual being a complex one % pole/zero filter. However, combining $f$ using \emph{multiplication} is meaningful. It gives rise to Schur functions. More specificly Blaschke products $f(z) = e^{i\theta} \prod_k \frac{z-a_k}{1-\bar{a_k}z}$. The Schur algorithm gives an alternative representation for this $$f_k(z) = \frac{1}{z}\frac{f_{k-1}(z) - \rho_k}{1 - \bar{\rho}_k f_{k-1}(z)},$$ with $\rho_k = f_{k-1}(0)$. This means that when we \emph{multiply} the outputs of several first order waveshapers we get a higher order waveshaper with a richer behaviour. \subsection{The Discrete Summation Formula} What happens when we use the $a$ parameter as an input? We set $\theta=0$ in the following because it will only amount to a constant phase shift. Let $a = r z'$ with $|z'| = 1$. Suppose $z$ and $z'$ are two unit norm complex oscillators defined by $z(t)=e^{i\alpha t}$ and $z'(t) = e^{i\beta t}$. Our modulator driven by the two oscillators has the form$$ \frac{z-rz'}{1-r \overline{z'}z}.$$ The signal generated is $$s(t) = \frac{e^{i\alpha t} - r e^{i\beta t}} { 1 - r e^{i(\alpha-\beta)t}}$$ and is periodic when $\frac{\alpha}{\beta}$ is rational. The constant $r$ can be seen as a modulation index. This can be written in a more useful way as $$s(t) = e^{i\alpha t}\frac{1 - re^{-i(\alpha-\beta) t}} { 1 - r e^{i(\alpha-\beta)t}}.$$ The factor $e^{i\alpha t}$ is just a frequency shift and the second factor only depends on $\delta = \alpha-\beta$, so we have $$s(t) = e^{i\alpha t} s'(t) (1 - re^{-i\delta t}),$$ with $$ s'(t) = \frac{1}{ 1 - r e^{i \delta t}} = \sum_{k=0}^\infty r^k e^{i k \delta t}.$$ This expansion is the limit case of the discrete summation formula (DSF) $$ \frac{1 - r^N e^{i N \delta t}}{ 1 - r e^{i \delta t}} = \sum_{k=0}^{N-1} r^k e^{i k \delta t}.$$ This establishes the link between what we can call FLT modulation and harmonic spectra with an $1/f$ envelope. The spectrum of $s(t)$ is thus harmonic with spacing $\delta$, except for the frequency shift $\alpha$. $$ \begin{align*} s(t) & = e^{i\alpha t} s_0(t) \\ s_0(t) & = re^{-i\delta t} + (1 - r^2) \sum_{k=0}^\infty r^k e^{ik\delta t}. \end{align*} $$ % .. which is the dual of truncated IIR filters. % The more general case of fractional linear transforms still has a % discrete spectrum. \subsection{Formants} Whenever we get a power series in $m(t) = e^{i\alpha t}$ we can shift it up and down the frequency spectrum by multiplying the signal with a polynomial in $m(t)$. The result will still be a power series in $m(t)$. Alternatively, the series can be combined with a polynomial through function composition, again yielding a harmonic power series. Such modulation cannot shift the \emph{spectra envelope} in steps smaller than the spacing of the harmonics. I.e. we would want to do this to simulate independent control of pitch and tone colour. However, it is possible to employ an interpolation trick by cross--fading\footnote{Combined with LFS this is essentially the Phase Aligned Formant (PAF) method.} between successive harmonics. The function that creates such a modulator from a base modulator $m$ and a fractional shift $x$ is given by $$c(m, x) = (x-[x]) m^{[x]} + (1-(x-[x])) m^{[x]+1}.$$ Here $[x]$ is the integral part of $x$, which makes $x_f = x-[x]$ the fractional part. Multiplying this with a power series in $m$ gives $$c(m, x) \sum a_im^{i} = \sum \big[ x_f a_i + (1-x_f) a_{i+1} \big] m^{i + [x]}$$ which gives another power series. When the two factors are synchronous, i.e. when when $m(t)=e^{i\alpha t}$, this indeed produces a harmonic spectrum with the $a_i$ fractionally shifted\footnote{This effect is analogous to fractional delay using linear interpolation. A straighforward observation is to extend the linear interpolation to a higher order polynomial interpolator. A polynomial fractional delay filter that preserves DC needs to have a zero at Nyquist to introduce the $180$ degree phase discontinuity when $a = 0.5$. Translating this to our problem, higher order polynomial interpolation of nearby harmonics is the same as amplitude modulation of an oscillator with frequency $x$ which has a phase discontinuity that is disguised by a vanishing waveshaper. The flatness of this waveshaper can be increased by increasing the order of the interpolation, which widens the bandwidth, and decreases amplitude modulation of the carrier wave.}. The function $c(e^{i\alpha t}, x)$ is periodic with a very narrow bandwidth, and so seems to behave like a single, localized oscillation in between harmonic $n$ and $n+1$. The method poses a problem when a time varying oscillator is implemented, i.e. if $m(t) = e^{i\phi(t)}$ where $\phi(t)$ is continuous but has a variable slope. Due to the locking, a change in frequency will cause a phase jump in at least one of the locked oscillators. This can be solved by smoothing out the discontinous frequency change, but not without additional frequency modulation. With locked oscillators, the frequency can only be changed without a phase discontinuity if it is done when the phase of the basic harmonic is $0$. \subsection{Power Series} From the previous it can be concluded that it is the division operation that introduces interesting harmonics. Let's continue with $(1 - re^{i\alpha t})^{-1}$ some more. An odd harmonic series can be synthesized using $$\frac{e^{i \alpha t}}{ 1 - r e^{i 2 \alpha t}} = \sum_{k=0}^\infty r^k e^{i (2 k + 1) \alpha t}.$$ Squaring the powerseries is equivalent to convolving the spectral envelope with itself. This gives a more localized formant structure. Taking the real part of $f$ turns the harmonic spectrum into a symmetric one that can be shifted up (or down) in frequency to build a formant like symmetric wave packet. Squaring the real signal before multiplication gives a smoother formant packet. If $r$ is complex, we can introduce a phase shift in the harmonics. Combining both $r$ and $\overline{r}$ we can construct damped sinusoidal spectral envelopes. This enables us to use standard sinusoidal methods to model a spectral envelope. If $f_{r,\alpha}(t)$ represents the power series or the DSF we see that $f_{\overline{r},\alpha}(t) = \overline{f_{r,-\alpha}(t)}$. [ TODO: Cleanup from here. ] Normalization can sometimes be desirable to get a constant power output over the modulation range\footnote{When using a ``squashed'' normalized DSF and allowing the modulation index to move beyond unit magnitude, we get a very nice sounding broad spectrum transient when $r$ passes through $|r|=1$. Especially so when the change in $r$ is not made instantanious, but i.e. processed by a lowpass filter (dezipped). During a fast change in $r$, the DSF normalization (which limits the number of harmonics to $N$) is not valid. This gives a large signal which then will be squashed by the normalizing function.}. If we define the inner product between two periodic signals $s(t)$ and $s'(t)$ with common period $T$ to be $\leftP_p=\frac{1}{T}\int_0^T \overline{s'(t)}s(t)dt$ and the power to be $P = \left$ and note that the components of the DSF are orthogonal, we can see the power is equal to \begin{equation} \sum_{k=0}^{N-1} |r|^{2k} = \frac{1-|r|^{2N}}{1-|r|^2} \end{equation} % Ik moet heel goed opletten dat ik de operaties niet door elkaar % haal! Namelijk: optelling, vermenigvuldiging en compositie. B.V. de % fractional linear transforms zijn een groep voor de compositie, maar % niet voor de vermenigvuldiging. Waveshaping is compositie van % waveshaper na oscillator. Filtering is een Fourier multiplier, geen % compositie. Spectral waveshaping is compressor! % De algemene rationale is dit: sferische ruimtes geven oscillatie, % terwijl hyperbolische ruimtes events genereren (fly-by). Nu die vage % onzin vertalen naar duidelijke termen... % Dit is te zien door de matrixvoorstellingen. % Unit norm complex numbers geven oscillatie op de eenheidscircel, % terwijl unit norm hypercomplex numbers een pad op de % eenheidshyperbool afleggen. % Dit is oscillatie vis a vis transienten. Als ik juist zit kan je met % clifford algebras directe sommen maken tussen zulkse ruimtes. Het % resultaat is, dat na normalisatie, de hyperbolische explosie er voor % zorgt dat de oscillaties een transient verloop hebben. \subsection{Normalization} The normalization function $$\sigma(z) = z/|z|$$ which maps $\CC$ to the unit circle, or its frequency--doubling square $$\sigma(z)^2 = z/\bar{z}$$ map $\CC \backslash \{0\}$ to the unit circle. \subsection{Normalized Addition} The operation $s(a,b) = \sigma(a+b)$ is interesting as it is closed on the unit circle and defined for all $a$, $b$ except $a = -b$. Is it possible to relate this somehow to the spectrum it produces? From ad--hoc experiments I can testify this yields very interesting sounds. \subsection{Conclusion} The basic idea is the same as in $\RR$. Polynomials of $e^{i\alpha t}$ produce harmonic spectra in a predictable way, just like polynomials of $\sin(\alpha t)$ and $\cos(\alpha t)$ do in $\RR$. However in $\CC$ polynomials and power series are a bit easier to handle. As long as one starts from coherent modulation, the story is quite straightforward. Polynomials allow the introduction of integer multiples of the base frequency, while power series (obtained by division or other non-polynomial analytical functions) represent broad spectra. This suggests an alternative organization: 1. investigate addition, multiplication, function composition driven by simple modulators, and 2. look at power series of some easy to compute functions. Basicly: what can polynomials do for you + what if you add division? \section{Other Manifolds} Investigate Clifford Algebras and certain matrix algebras. (i.e. quaternions, biquaternions, \ldots). Figure out how torsion and other non--geodesic leads to interesting paths through compact and connected manifolds. A lot can be done with division to generate interesting power series. Is it possible to change this to normalization (which has division), which is a more intuitive operation? Also, it seems that a lot of neat tricks are possible by using matrix embeddings of (compact) manifolds with a group action (Lie groups). Defining an additional operation as addition in the embedding space followed by projection onto the manifold subset in the embedding space seems to yield interesting ways of combining two elements in addition to the group action. Entry: Video Codecs Date: Tue Jul 7 09:23:02 CEST 2009 Time to get an overview of different techniques used in video codecs, to see how they relate to resource use. In general, MPEG-1 is mostly for video on CD, MPEG-2 is directed at TV (broadcast + DVD) with more emphasis on robustness, delay, different modes, ... and has an improved audio codec. MPEG-4 adds improved video coding and a more general-purpose decoder, but is a hodge-podge of optionally implemented features. Part 2 (divx) and 10 (avc) are important. MPEG-1 video [1]: - Group-Of-Pictures with Ineter-frame (keyframe) encoded as +- JPEG, Predicted-frame difference to previous frame incorporating motion vectors on macroblocks (16x16 = 4 luma 8x8 + 1 chroma 8x8). Bidirectional frame using forward and backward frame as reference. DC frames serve as "thumbnails" for fast-forward. - Motion estimation works on a fixed diamond region using quarter pixels. MVs are differentially encoded from neighbouring macroblocks (16x16 = 4 luma 8x8 + 1 chroma 8x8). - The DC part of the DCT coefficients is encoded differentially. The AC is coded in a zig-zag pattern (most energy is in the upper left corner around DC) which is then RLE encoded. Quantization uses 5 bits (0-31). Thresholding is adaptive (or user selectable). - The whole bitstream is Huffman encoded. MPEG-2 - Systems section: Transport Stream (lossy media like broadcast) and Program Stream (reliable media like DVD). - Video similar to MPEG-1, optimized for higher bitrates and more different formats (i.e. interlaced). Audio part contains AAC, which is more efficient, flexible and robust. (Part 2 = H.262) MPEG-4 - More advanced video coding + object oriented design. Decoder behaves more like a rendering engine. Variable block size motion compensation. - Part 2: DIVX - Part 10: H.264 / AVC (HD-DVD, Blue-ray) Many additions[9]. Highly nontrivial decoding/encoding. Other formats: - Theora (in Ogg container). Open but less well performing codec. - H.261 Low bit rate (ISDN) video conferencing. - H.263 Low bit rate video converencing. A variant (Sorenson H.263) is used in Apple Quicktime and Adobe Flash Video. Original base for Real Video. Part of 3GPP (MMS). [1] http://en.wikipedia.org/wiki/MPEG-1#Part_2:_Video [2] http://en.wikipedia.org/wiki/Advanced_Audio_Coding [3] http://en.wikipedia.org/wiki/MPEG [4] http://en.wikipedia.org/wiki/Flv [5] http://en.wikipedia.org/wiki/Theora [6] http://en.wikipedia.org/wiki/H.261 [7] http://en.wikipedia.org/wiki/H.263 [8] http://en.wikipedia.org/wiki/Video_codec [9] http://en.wikipedia.org/wiki/H.264/MPEG-4_AVC Entry: Binary Coding Date: Tue Jul 7 10:02:48 CEST 2009 [1] http://en.wikipedia.org/wiki/Huffman_coding [2] http://en.wikipedia.org/wiki/Entropy_coding [3] http://en.wikipedia.org/wiki/Arithmetic_coding [4] http://en.wikipedia.org/wiki/Context-adaptive_binary_arithmetic_coding Entry: MDCT Date: Tue Jul 7 10:54:32 CEST 2009 "Lapped" transform. Used in coding applications where overlap can be used to reduce artifacts. Note that H.264 uses several "exact-match" DCT variants. [1] http://en.wikipedia.org/wiki/Modified_discrete_cosine_transform Entry: Navier-Stokes Date: Mon Sep 28 14:47:26 CEST 2009 Time to look at the Navier-Stokes equation[1] and fluid dynamics. There is a description in the Princeton Companion to Mathematics[2] section III.23 p196. Another starting point is Feynman's Lectures on Physics, part II chapters 40-41. The Navier-Stokes (and Euler) equations are non-linear partial differential equations in terms of a velocity vector field u(x) and a pressure distribution p(x). The Euler equation is N-S with viscosity v=0. Apparently, even for small v these behave quite different. Additionally one can pose a constraint that expresses the fluid is not compressable. E and N-S are Newton's law applied to an infinitesimal portion of the fluid. The nonlinearity (which causes turbulence) is due to convective acceleration, which is an acceleration associated with the change in velocity over position. In practice, the interaction between large and fine scale behaviour requires such a fine mesh that the problem is often approximated, i.e. using the Reynolds-averaged Navier-Stokes equiation[3]. Following Feynman in chapter II.40 (The Flow of Dry Water). A liquid moves under shear stress. Hydrostatics, fluid in rest, amounts to the absence of shear stress. The important result here is that the force per unit volume is -grad(p). The important conclusion is that for a force defined by a potential energy, there is no general equilibrium solution if the density is not constant. (An exception arises when density depends only on pressure). For dynamic fluids, the clue seems to be to translate a velocity vector field description to a particle path description, and formulate Newton's law of conservation of momentum for that path. I.e. with dx denoting infinitesimals and a 2D equation. This yields an equation nonlinear in the velocity components. The force component of the N-S equation is where most of the variation lies. [1] http://en.wikipedia.org/wiki/Navier-Stokes_equations [2] isbn://0691118809 [3] http://en.wikipedia.org/wiki/Reynolds-averaged_Navier-Stokes_equations Entry: Material Derivative Date: Mon Sep 28 18:36:22 CEST 2009 Type: tex \def\duxx{\frac{\delta u_x}{\delta x}} \def\duyy{\frac{\delta u_y}{\delta y}} \def\duxy{\frac{\delta u_x}{\delta y}} \def\duyx{\frac{\delta u_y}{\delta x}} The nonlinearity of the Navier-Stokes equation comes from the expression of the derivative of a velocity vector along a particle path, starting from a velocity field. Let $n$ be the spatial dimension. Given a velocity field $u(x,t) = (u_1, \ldots, u_n)$ expressed in terms of $n$ spatial coordinates $x = (x_1,\ldots,x_n)$ and a time coordinate $t$, we can use the spatial and temporal partial derivatives $\frac{\delta u_i}{\delta x_j}$ and $\frac{\delta u_i}{\delta t}$ to make the following linear approximations in terms of time and space offsets. (1) The velocity vector at a space offset $\Delta x = (\Delta x_1, \ldots, \Delta x_n)$ and a time offset $\Delta t = t^{+} - t$ from $(x,t)$ is $(u_1^{+},\ldots,u_n^{+})$ with $$u_i^{+} = u_i + \frac{\delta u_i}{\delta t} \Delta t + \sum_{j=1}^n \frac{\delta u_i}{\delta x_j} \Delta x_j.$$ (2) A particle starting at a spatial point $x$ and time $t$, flowing according to the velocity vector field $u$ will after a time $\Delta t$ end up at the point $x^{+} = (x_i^{+},\ldots,x_n^{+})$ with $$x_i^{+} = x_i + u_i \Delta t.$$ Substituting $\Delta x_j = x_j^{+} - x_j = u_j \Delta t$ in (1) and dividing by $\Delta t$ then gives an approximation for the change of the velocity vector $\Delta u_i = u_i^{+} - u_i$ over a time interval $\Delta t$, along a flow path of the vector field $u$ through the point $(x,t)$. $$\frac{\Delta u_i}{\Delta t} = \frac{\delta u_i}{\delta t} + \sum_{j=1}^n \frac{\delta u_i}{\delta x_j} u_j.$$ The expression on the right hand side remains the same for the limit $\Delta t \to 0$ when the linear approximations are replaced by exact expressions, and is called the material derivative of $u$. For a generic (scalar-valued) multivariate function $f(x,t)$ and a velocity field $u(x,t)$ the fact that this is a simple application of the chain rule becomes more clear when we write the total derivative of $F(t) = f(x(t),t)$ as $$\frac{dF}{dt} = \frac{\delta f}{\delta t} + \sum_{j=1}^n \frac{\delta f}{\delta x_j} \frac{d x_i}{dt}.$$ Applying this for $f$ ranging over each component $u_i$ of $u$ then gives the previous expression. %[1] http://en.wikipedia.org/wiki/Substantive_derivative Entry: Informal proofs are difficult Date: Sat Oct 17 16:40:31 CEST 2009 This has bothered me for quite a while. Formal proofs can be quite tedious to read, but at least they will not put you in a position where you're reading a proof and you just can't see why a certain step is made, because the author assumed it was obvious. Granted, the ``obvious'' steps usually are obvious, that is from the perspective of somebody who understands the proof, or has a background in the matter (i.e. the justification that's left implicit is used a lot in other proofs). Entry: HSVD Date: Mon Nov 16 14:34:17 CET 2009 A summary of [1]. CHAPTER 2 Explains basic principles of Linear Prediction (LP) and State Space (SS) based methods for parameter estimation. The flow of HSVD is like this: 1. start with a signal of length N 2. create a hankel matrix (N-q x q) -> choose q 3. compute SVD 4. truncate: take first n singular vectors -> choose n 5. LS solve shift equation 6. compute eigenvectors The two values to be chosen are n (signal space dimension) and q (signal + noise space dimension). The n parameter follows from what one is looking for, but how to determine q? Translating LP to an AR problem shifts the statistics but makes it easier to estimate using LS. The essence seems to be this: trunctation by SVD is essentially noise reduction. This then makes the LS or TLS-based estimation for LP and SS approaches more precise. However, the statistically optimal (ML) estimates require nonlinear optimization. In general SS methods are better for larger number of poles since they obtain poles from an eigenvalue decomposition instead of polynomial rooting, which is ill-conditioned for large order. [1] ftp://ftp.esat.kuleuven.ac.be/pub/SISTA/vanhamme/phd.ps Entry: WSOLA - Waveform Similarity OverLap Add. Date: Tue Jan 26 18:15:18 CET 2010 Gist: An extension of the OLA time stretching algorithm where source frames are taken from a neighbourhood around the point a normal OLA would pick the frame. The offset is taken such that the overlapped part has maximal correlation. [1] http://etros1.vub.ac.be/Research/DSSP/Publications/int_conf/ICASSP-1993.pdf [2] http://music.columbia.edu/pipermail/music-dsp/2001-December/046763.html [3] http://mffmtimescale.sourceforge.net/ [4] http://diuf.unifr.ch/pai/people/juillera/PitchTech/Preview.html Entry: Never invert a matrix! Date: Wed Feb 3 17:22:40 CET 2010 Actually, there is a reason why one would want to invert it anyway: if the same matrix inverse is applied to a large number of vectors inversion might pay off because it can be parallellized. Solving a set of equations still has a longer data dependency chain. [1] http://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/ Entry: Binary Streams as Stochastic Variables (PDFs) Date: Thu Feb 18 10:44:38 CET 2010 (EDIT: Mon Dec 27 11:20:35 EST 2010: check [2]: Need to make sure that multiplication doesn't cause any undesired spectral content to modulate into the signal band.) Goal: find a way to look at boolean functions on binary bitstreams coming from sigma-delta modulators. I.e. where the bitstream represents an analog signal that is the result of applying a reconstruction filter R after mapping the binary signal into the analog domain. 1. Assumption: we're only interested in the long-term average, so in first approximation we deal with stochastic variables, not stochastic processes. 2. The ``physical'' variable we're interested in is modeled as the expected value of the stochastic variable, which approximates the real implementation which is a running average / low-pass filter. 3. As another simplification we assume variables are not correlated. Using these simplifications, we can work with Probability Density Functions; a stochastic variable is completely specified by its PDF. Let's denote PDFs as this: a(v), b(v), c(v) PDF of single variable ab(v1,v2), ac(v1,v2) joint PDF of 2 varaibles abc(v1,v2,v3) joint PDF of 3 ... Since we use independent variables, joint PDFs can be written as products: ab() = a()b(). Additionally PDFs of binary stochastic variables can be represented by a single number \in [0,1] : the probability the variable is "1" (or "0"). We will abuse this by taking a to mean a(1), i.e. the number a \in [0,1] is the probability that variable a == 1. Using this notation, computing expected values of binary boolean functions is straightforward. I.e. for the XOR function: <+> | 0 1 ----+---- 0 | 0 1 1 | 1 0 E (a <+> b) = 1 ab(0,1) + 1 ab(1,0) + 0 ab(1,1) + 0 ab(0,0) = ab(0,1) + ab(1,0) = a(0)b(1) + a(1)b(0) [independent] = (1 - a) b + a (1 - b) = a + b - 2ab To make sense of this result it's simpler to have the binary values represent {-1,1}. The transformation F that maps [-1,1] into [0,1] is: F x = (1 + x) / 2 This then yields: a <+> b = a + b - 2 a b implies (F a) <+> (F b) = F (- (a b)) In other words: the XOR operation is _inverted multiplication_ of expected values of variables on {-1,1}. The function <=>, the negation of <+> gives a similar result. <=> | 0 1 ----+---- 0 | 1 0 1 | 0 1 E (a <=> b) = ab(0,0) + ab(1,1) = a(0)b(0) + a(1)b(1) [ independent ] = (1 - a) (1 - b) + ab = 1 - a - b + 2ab a <=> b = 1 - a - b + 2ab implies (F a) <=> (F b) = F (a b) The EQV operation is multiplication of EVs of vars on {-1,1}. It seems it's only interesting to look at equivalence classes under inversion of inputs and outputs. See also [1]. I.e. the only other interesting operation to look at is AND. Similar functions (OR,=>,NAND,...) will yield the same result qualitatively by inverting inputs or outputs. . | 0 1 --+---- 0 | 0 0 0 | 0 1 E (a . b) = ab(1,1) = a(1)b(1) = a b So AND,OR,=>,... are multiplications; just in a different domain [0,1]. Conclusion: these binary logic operations give bilinear functions. On the [0,1] domain they are binary linear combinations of the 4 terms: ab, a(1-b), (1-a)b, (1-a)(1-b), where a,b are the probabilities/expected values of the random variables a and b. -- Further work: * What happens when variables are not independent? Can the correlation itself do anything useful? For signals close to 1/2 of the range, i.e. represented by an alternating 01 sequence, it is difficult to be practically independent. However, two coin toss processes are independent, so this should be built-in at the core somehow. The question then becomes: how to implement a modulator? I.e. how to introduce bias from a uniform random source, given the desired expected value. I.e. is it possible to take a structured signal and decorrelate it with a noise process? Doesn't seem to be straightforward, i.e. XOR or EQV with a uniform process are equivalent to multiplication with 0. It seems that one of the a-symmetric operators is needed to create non-uniform distributions from uniform ones. I.e. the AND of two 1/2 noise streams gives a 1/4 noise stream. * Can we use the apparent dependency on randomness (i.e. from noise injection) to transmit other information? * What about looking at these functions as ``modulation maps'' instead of binary endomaps of [0,1] or [-1,1], i.e. as: [0,1] x [-1,1] -> [-1,1] I.e. one of the inputs represents an asymmetric signal, while the other one is symmetric. Some literal Haskell code: > a `xor` b = a + b - 2 * a * b > a `eqv` b = (1 - a) * (1 - b) + a * b > > f x = (1 + x)/2 > > a === b = (abs (a - b)) < 1e-14 > > e_xor a b = ((f a) `xor` (f b)) === (f (- (a * b))) > e_eqv a b = ((f a) `eqv` (f b)) === (f (a * b)) [1] entry://20090609-145519 [2] entry://20101227-110515 Entry: Rhythm & Dance Date: Thu Apr 8 13:19:24 EDT 2010 Instead of looking at rythm as a series of events to be interpreted by a listener, it might be simpler to look at it as a series of motions, i.e. each up needs a down. The point being: "performability" might be a key element in rhythm. Performability meaning: what you hear is not a sequence of sounds, but a representation of the drummer's hand movements. I.e. rhythm _needs_ to be dancable. From this point it might make sense to approach rhythm from the phisical system way (choreography) as opposed to the auditory perception way. I.e. biological motion. A good base migh be to start from acceleration (muscle tension) and connected body dynamics (joints produce circular behaviour). Entry: Euler-Lagrange equations Date: Fri Apr 9 13:53:14 EDT 2010 Type: tex From Arnold[1] chapter 3, with slightly different Haskell-inspired notation. Suppose a functional $\Phi :: (\R \to \R) \to \R$ can be written as $$\Phi(x) = \int_{t_0}^{t_1} \! L(x,x',t) \, dt$$ where $L :: (\R,\R,\R) \to \R$ is a function of 3 variables and $x' = {dx \over dt}$ denotes the derivative of the curve $x :: \R \to \R$. The function $x :: \R \to \R$ is an \emph{extremal} of the functional $\Phi$ if and only if $$\frac{d}{dt}\left\frac{\partial L}{\partial x'}\right - \frac{\partial L}{\partial x} = 0.$$ The term $\frac{\partial L}{\partial x'}$ means the partial derivative of function $L(x,x',t)$ of 3 variables to its second variable. Substituting variables, this denotes the same function as $\frac{\partial}{\partial v_2}L(v_1,v_2,v_3)}.$ The result of the partial derivation operation is a new expression in terms of $(x,x',t)$. In this expression, one subsitutes the unary function $x(t)$ and its derivative $x'(t)$, yielding a unary function of $t$, of which the derivative to $t$ can be computed. Note that this is 300 year old sloppy notation. See next post and SICM[2], where Spivak's unambiguous notation[3] is used. This theorem is readily generalized to a multi--variate functional $\Phi :: (\R \to \R^n) \to \R$ operating on $n$--dimensional curves $:: \R \to \R^n$ and a function $L :: (\R^n, \R^n, \R) \to \R$ of $2n+1$ variables. Summarized, if a functional only depends on \emph{positions}, \emph{velocities} and \emph{time}, the Euler--Lagrange equation can be used to find concrete equations for an \emph{extremal} curve of the functional. %[1] isbn://0387968903 %[2] http://mitpress.mit.edu/SICM/ %[3] isbn://0805390219 Entry: Sussman about crappy notation Date: Fri Apr 9 20:03:34 EDT 2010 In the video[1] Arnold's book[3] is mentioned two times (and that he's a SOB). Sussman doesn't like the ambiguous notation. Arnold[3] p246: (not breaking with tradition and explaining what the Langrange equations mean) and p258: ``It is necessary to use the apparatus of partial derivatives, in which even the notation is ambiguous.'' Sussman uses the notation based on Spivak's Calculus on Manifolds[5] which handles it properly: partial differentials expressed in terms of position of parameter instead of names that can have different meanings (parameters or functions). See p.44 at the end of the chapter about differentiation[6]. His point: programming forces one to be precise and formal without being excessively rigorous. [1] http://video.google.com/videoplay?docid=-2726904509434151616&hl=en# [2] http://mitpress.mit.edu/sicm/ [3] isbn://0387968903 [4] http://www.docin.com/p-708956.html [5] http://en.wikipedia.org/wiki/Calculus_on_Manifolds_(book) [6] http://books.google.com/books?id=POIJJJcCyUkC&pg=PA44#v=onepage&q&f=false Entry: Legendre Transform Date: Sat Apr 10 09:47:32 EDT 2010 Type: tex \newcommand{\ip}[2]{\langle {#1} , {#2} \rangle} The Legendre Transform (LT) shows up in classical mechanics as the link between the Legendre equations and Hamilton equations. Some notes about what this transform actually does. First, how to compute? The LT of $f(x)$ is defined as the function $g(p) = \max_x F(x,p)$ where $F(x,p) = px - f(x)$. For a convex function the maximum occurs where $\nabla_x F(x,p) = 0$ or $p = \nabla_x f(x)$. Solving the latter equation yields an expression for $x(p)$ which can be substituted in $F$ to yield $g(p) = F(x(p),p)$. \begin{itemize} \item The LT transforms a convex function on a vector space to a convex function on the dual vector space. The coordinates $p = \nabla x$ can be seen as a representation of this dual space. (TODO: less handwaving please). \item Two functions make up an LT transform pair if their first derivatives are inverses of each other. Note that the derivative of a convex function is a monotone function and therefore invertible. \item For a quadratic form $f(x) = x^T F x$, we have $p(x) = 2 F x$. After subsituting $x(p) = \frac{1}{2} F^{-1} p$ in $g(p) = \ip{p}{x(p)} - f(x(p))$ we get $g(p) = \frac{1}{4}p^TF^{-1}p$. Note that the respective derivatives $x \to 2Fx$ and $p \to \frac{1}{2}F^{-1}p$ are inverses. \item For a QF we have additionally that $g(p) = f(x)$ or that $f(x) = \ip{x}{p(x)} - f(x)$ or $f(x) = \frac{1}{2}\ip{x}{\nabla f}$. This is not true in general. \item The Langrangian of a physical system $L = T - U$ is (assumed to be) convex w.r.t. the (generalized) velocities. This seems to make general sense: the kinetic energy grows for velocities that go to infinity, \emph{independent of direction}. In addition, $T$ will likely be a quadratic form. Note that even if the generalized coordinates live in a non-flat manifold, the velocities will still be vectors in the tangent space, and kinetic energy is expressed in terms of the weighted magnitudes of these vectors (a positive definite QF). \end{itemize} %[1] http://arxiv.org/pdf/0806.1147v2 %[2] isbn://0387968903 %[3] http://en.wikipedia.org/wiki/Legendre_transformation Entry: Computer Algebra System (CAS) Date: Sat Apr 10 16:21:09 EDT 2010 From what I gather, a generic "simplify" method is a collection of hacks, because there is no generic notion of what "simple" actually means. Side-stepping that issue, particular transformations are usually well-defined. From my own limited experience with static analysis & abstract interpretation, it seems best to map expressions between different domains. From an engineering point of view it is almost always simpler to split any transformation stap in at least two steps, and have formal intermediate languages. For any transformation step: * Convert equivalent expressions (L1) to a unique normal form (L2). * Express complicated transformations as directed equalities (pattern matching functions) on normal forms (L2) to some target domain (L3). * Optionally, re-embed the target domain (L3) into the original (L1) or some other domain. Entry: What is a differential? Date: Mon Apr 12 12:52:09 EDT 2010 Type: tex A differential of a function on a manifold $f :: M \to \R$ is a linear functional $df_p(t) :: T_p \to \R$ defined on the tangent space $T_p$ at $p \in M$. Entry: Conservative Systems and Chaos Date: Mon Apr 12 21:08:26 EDT 2010 Type: tex Apparently it is possible to have chaotic behaviour in a conservative (hamiltonian) system. A HS conserves phase-space volume, but not necessarily shape. Think of kneading dough: volume is conserved, but an initial ball of raisins spreads out. How to make an example? It needs to be a-symmetric to avoid integrability. Take a potential like $$U(x,y) = x^2 + y^2 + \frac{a}{1 + (x-b)^2 + y^2}$$ which looks like a harmonic oscillator potential well $x^2 + y^2$ at large scale, but has a bump at $(b,0)$. Can it produce chaotic behaviour? For simple systems, can the Lyapunov coefficient (LC) be computed locally? Sure it can, but what matters more is the \emph{average} LC, no? Need more study here\ldots Entry: d'Alembert Principle Date: Sun Apr 18 13:24:28 EDT 2010 Basic idea: the ``constraint force'' that keeps trajectories on a configuration manifold embedded in Euclidian space can never perform any work, i.e. change the energy of the system. This is the same as saying that the constraint force is always orthogonal to the tangent space of the configuration manifold. Entry: Spinors and Simplectic Geometry (Hamiltonian Mechanics) Date: Wed Apr 28 22:41:41 EDT 2010 Ever since an ex-collegue of mine made the remark that he heard some physicist friend of his say that ``I wonder why spinors aren't used more in signal processing'', I'be been intrigued by the idea but didn't really see the connection. Lately I've been looking into classical mechanics a bit more, and I'm about to enter the Hamiltonian Mechanics part. I take an interest in this as I'd like to see the links with sound synthesis. [1] http://en.wikipedia.org/wiki/Spinor [2] http://en.wikipedia.org/wiki/Symplectic_geometry [3] http://en.wikipedia.org/wiki/Linear_symplectic_space Entry: Differential Music Date: Fri Apr 30 16:17:58 EDT 2010 Type: tex In an attempt to create music in the form of differentiable functions on tori, I ran into the problem of simulating events anyway. The apporach that seems to work well is to apply a function that would return a smooth pulse wave when applied to $x(\theta) = cos \theta$. This lead to $$x \to \frac{1}{1 + k^2(1 + x)}.$$ Another one that might work is $$x \to \frac{1}{1 + (kx)^2}.$$ The reason for using differentiable functions is that they open the door for local analysis of parameter spaces of sound generating networks, and other analytic (computer algebra) techniques. The next question is: how to bring rhythm into the game? Entry: Cleaning up PhD research papers Date: Sun May 30 13:33:15 CEST 2010 I'm keeping general introductory papers, and audo synth + FX papers. For sinusoidal modeling, Petre Stoica seems to be a good starting point for generic approach. The other direction was Sabine VanHuffel for the fast algorithms. For wavelets it's Daubecies and Sweldens, I'm keeping some introductory papers. I'm throwing away the paper forms of specific ad-hoc papers about: - Blind source separation (statistics based) - CAS (computational source separation: perception model based) - Computational Sciene Analysis - Matching Pursuit (iterative filtering) - Audio Coding - Sinusoidal + complex exponential modeling - Wavelets + Applications to approximate LU I was not able to integrate most of that knowledge during my PhD years because of the many ways to characterize errors (what mathematical framework to use to express the modeling problem) and the non-linearity of the resulting optimization problems, which makes practical comparison quite difficult. Possible variations: amplitude or amp+freq estimation, polynomial phase, optimality (matching vs. linear models + error), noise coloring, etc. I'm not keeping Matching Pursuit papers: this method is too ad-hoc, and forgive my arrogance but I think I can re-invent most of this technology if I'm ever in need of it. I'm not keeping the sinusoidal modeling papers (peak picking, etc..). Same story as with MP: too ad-hoc and re-inventable. I'm limiting myself to more mathematically meaningful structures. I've vowed to not set a foot in the theatre of perception ever again! (I.e. speech recognition: most of this technology needs extra information about how the brain works; as it is that what we want to mimick in the first place.) Entry: Wavelets : local trigonometric basis Date: Sun May 30 14:09:19 CEST 2010 I'm keeping some wavelet and transform papers. Might be interesting to look into. Entry: Exponential sinusoidal modeling Date: Wed Jun 2 15:05:45 CEST 2010 I'm keeping the exponential modeling papers. Most sinusoidal modeling problems also suffer from ambiguities (optimization problems with many similar local minima), which require disambiguating constraints and assumptions to become useful. However, these can usually stay a bit analytically tractable. I'd like to eventually work out some kind of survey/indexing into this kind of research. Entry: Exponential modeling Date: Sat Jun 19 13:38:34 CEST 2010 I worked on sinusoidal modeling for a while, and if there is one thing that I can remember, it is that there is a lot of wiggle room in how to tackle the problem on the technical side, and that what we hear isn't so clearly related to what we can measure: our perception fills in a lot of gaps. Sinusoidal modeling highlights: * Fourier transform and DFT (FFT). The FT is interesting for reducing the complexity of convolutions, performing a whitening transform for adaptive filters (as a filterbank), and as an ad-hoc method for analysing harmonic sounds. * Autoregressive models + ladder filters (orthogonal polynomials and Shur's algorithm). The theory behind this is quite pretty. For structured matrices, modified Shur and rank-revealing decompositions can be used to yield fast algorithms. * Non-linear phase signals. Moving from linear phase (exponential sinusoidal) to chirp and higher order polynomial phase adds more complexity. There doesn't seem to be much structure to explore here. * Nonlinear optimization. Writing sinusoidal modeling as a generic optimization problem can use the fact that derivatives of exponentials in general do not look horrible. However, these functions are usually multi-modal, and give rise to ambiguities, suggesting that there are "many ways to see a signal" as a sum of sine waves. * Subspace based techniques. I see two distinct classes: one that operates directly on the signal/noise subspaces obtained through the SVD decomposition of signal marices, and one that uses signal spaces to further derive approximate linear system parameters (sinusoidal parameters). Entry: Subspace Identification Date: Sat Jun 19 15:02:00 CEST 2010 Subspace Identification[1] refers to system identification of linear MIMO dynamical models using linear algebra techniques (QR + SVD). The basic idea revolves around a two-step procedure: 1. Given input/output data, construct a _Kalman state sequence_ by projection onto a limited subspace. 2. Obtain state matrices from this using linear least squares. The fact that 1. is at all possible provides the main leverage. ( The following is generalized from the HSVD method. I'm not sure if it completely applies to the stochastic version in [1], but I have the impression it does. I need a more intuitive grasp of the Kalman filter first. ) Such SVD-based methods are good in _practice_ but the characterization of the approximation is theoretically very far removed from what can be obtained by maximum likelyhood methods using a more direct approach. Subspace methods have been used to categorize the "unreasonable effectiveness" of mathematics. I'm definitely no expert, but it seems to me that they fall more into the class of convenient, accedental hacks. (The link between the estimation error and an ML approach is not clear : probably there is none due to radical re-interpretation of the problem?) [1] ftp://ftp.esat.kuleuven.ac.be/pub/SISTA/nackaerts/other/alln.ps.gz Entry: Maximum Likelihood estimation and the Kalman Filter Date: Sat Jun 19 19:25:07 CEST 2010 Type: tex ( See [6] for a good introduction to ML and Bayesian estimation. ) Maximum likelihood estimation (MLE) is a way to estimate model parameters based on parameterized probability density functions. \begin{enumerate} \item Construct a conditional model $P(x|\theta)$ which gives the probability density function (PDF) of observables $x$ in terms of model parameters $\theta$. \item Interpret the PDF as a function $L(\theta) = P(x_0 | \theta)$, setting the observables to a particular observed outcome $x_0$, and find the $\theta_0$ that maximizes this function. \end{enumerate} This gives a $\theta_0$ that \emph{best explains} the data, as the probability of observing $x_0$ is highest for the parameter vector $\theta_0$. A simple example of MLE is linear least squares estimation with uniform noise assumption. From [6] p. 91, the Kalman filter arrises in a Bayesian framework from calculating updates of conditional probabilities, taking into account the next observation step. In the case where PDFs of parameter priors and noise sources are gaussian, there is a efficient update mechanism to compute parameter representation of these PDFs (mean and covariance matrix). In general the KF gives the best possible linear estimator in a MSE sense. In [6], p. 71-77 the differences between ML, MMSE and MAP estimators is explained using the concept of risk[7] which combines cost with probability. An estimator minimises risk. Different cost functions give rise to different estimators. In [8] a brief explanation is given about how one arrives at the Kalman filter relations from a recursive Bayesian setting. % [1] http://www.tina-vision.net/docs/memos/1996-002.pdf % [2] ftp://ftp.esat.kuleuven.ac.be/pub/SISTA/nackaerts/other/alln.ps.gz % [3] http://en.wikipedia.org/wiki/Maximum_likelihood % [4] http://en.wikipedia.org/wiki/Likelihood % [5] http://en.wikipedia.org/wiki/Ordinary_least_squares % [6] http://www-sigproc.eng.cam.ac.uk/~sjg/book/digital_audio_restoration.zip % [7] http://en.wikipedia.org/wiki/Risk_%28statistics%29 % [8] http://en.wikipedia.org/wiki/Kalman_filter#Relationship_to_recursive_Bayesian_estimation Entry: Carette's Implicit Model Specialization Date: Tue Jun 22 12:22:06 CEST 2010 Remarks from reading [1]. * Model transformation is straightforward, but the selection of the techniques still requires human insight. (I.e. too many degrees of freedom to automate). * Not only computation, also deduction is necessary (i.e. to prove that a particular computation can be eliminated). * Look into active libraries[2] and telescoping[4]. [1] http://www.cas.mcmaster.ca/~carette/newtongen/ [2] http://mozart-dev.sourceforge.net/activelib.html [3] http://awurl.com/SvpJqXPwZ [4] http://telescoping.rice.edu/ Entry: Digital Audio Restauration Date: Tue Jun 22 15:14:45 CEST 2010 Digital Audio Restoration - A Statistical Model-Based Approach by Simon J. Godsill and Peter J. W. Rayner[1]. It contains interesting introduction on Bayesian and ML estimation. [1] http://www-sigproc.eng.cam.ac.uk/~sjg/book/digital_audio_restoration.zip Entry: Annoyed by Computer Modern Date: Tue Jun 22 15:18:31 CEST 2010 I have this book [1] in print. The printed version is non-glossy paper, and it has a good looking "spill". Computer modern is too thin on the screen and on a laser printer. Was it designed with this spill in mind? TAOCP has less spill, non glossy but finer paper, but looks better than laser print. [1] http://www-sigproc.eng.cam.ac.uk/~sjg/book/digital_audio_restoration.zip Entry: Exponential sinusoidal modeling Date: Thu Jun 24 17:00:53 CEST 2010 Type: tex There are many ways to formulate the exponential sinusoidal modeling problem. It is important to distinguish between the \emph{stochastic} model, and the \emph{estimator}. A basic element of all approaches is the linear prediction (LP) or autoregressive (AR) relation $$y_n = \sum_{i=1}^N a_i y_{n-i}.$$ Placing this in a statistical framework can yield many variants. I.e. the stochastic AR model $$y_n = \sum_{i=1}^N a_i y_{n-i} + e_n,$$ and a noisy sum-of-sinusoids model $$x_n = \sum_{i=1}^N a_i x_{n-i} \text{ and } y_n = x_n + e_n.$$ The difference between these is that in the former, the noise signal can be seen as driving the input of a dynamical system, while in the latter there is only measurement noise. For each stochastic model, several estimators can be created. An estimator transforms an observation into a guess for the corresponding parameters, depending on known or estimated statistics. In the case of the AR model, when the signal covariance matrix is known, the AR model can be solved exactly. It is possible to solve the model in the LS sense, which is equivalent to building an estimate for the signal covariance. Entry: Superfast Toeplitz algorithms Date: Fri Jun 25 10:28:42 CEST 2010 The structure in linear algebra problems involving Toeplitz matrices can be exploited using generalizations of the Levinson[2] and Schur algorithms[4]. Further exploitation is possible using superfast algorithms based on the FFT or DCT. [1] http://people.cs.kuleuven.be/~marc.vanbarel/software/ [2] http://en.wikipedia.org/wiki/Levinson_algorithm [3] http://www.math.niu.edu/~ammar/cortona/cortona.html [4] http://nalag.cs.kuleuven.be/papers/ade/BWG93/ Entry: Displacement Rank & Generalized Schur Date: Fri Jun 25 14:58:42 CEST 2010 Type: tex Reading Mastronardi's PhD diss[1]. One of the things that has put me off before is the horrible notation. Many of the formulas talk about symmetric matrices, which yields a lot of $AJA^T$-style terms. Isn't there a more readable way to write this down? Anyway, the algorithms are quite elegant: instead of operating on a matrix $R$, one performs elimination operations on generators only, reducing the complexity order by one. What is interesting though is that one leaves the domain of signal algorithms to be able to use matrix algorithms, but one re-enters the domain of signal algorithms in the implementation, recovering some of the matrix structure. % [1] http://users.ba.cnr.it/~irmanm21/Welcome.html Entry: Predict and Update Date: Thu Jul 8 10:11:26 CEST 2010 If you look at the equations for the Kalman filter[1], it is remarkable that they are factored in two steps: 1. predict output using current input, previous state estimate and system equations. 2. update state estimate and noise statistics by observing the real (noisy) system output. [1] http://en.wikipedia.org/wiki/Kalman_filter Entry: Binary clocked modulation & averages Date: Thu Jul 22 07:26:44 CEST 2010 About this[1].. The problem seems to be that there isn't really much "space" when signals are binary and clocked; i.e. there is a finite amount. Maybe instead of trying to use infinite techniques (statistics: limits) it's better to look at properties of discrete structures. [1] entry://20100218-104438 Entry: CRC algo Date: Fri Jul 30 12:03:25 CEST 2010 What's with the table of coefficients in this CRC32 implementation? ( code: COPYRIGHT (C) 1986 Gary S. Brown. ) [1] https://aachen.uni-dsl.de/svn/unidsl_firmware/backfire/trunk/backfire/tools/firmware-utils/src/cyg_crc32.c Entry: Minimal erase counter Date: Tue Aug 3 09:06:50 CEST 2010 How to make a counter that has a minimal number of erases (0->1 transitions). i.e. 1111 erase 1110 1100 1000 1101 erase 1001 0001 ... It boils down to finding a sequence of paths through the bit-flip hypercube from 111... to 000... This is probably not unique, but how to find a representative with a simple structure? I.e. how to turn the directed (1->0) hypercube graph into a tree? (And step 2, how to order the branches of that tree?) The idea is to give each node only a single parent. I.e. given a bit vector, there is an algorithm to compute its parent. 10010011 -> 11010011 101001 -> 111001 Entry: Quadratic Residues Date: Sat Aug 7 20:25:32 CEST 2010 ( From conversation with Antti; looking up some background information.. ) Exactly half of the integers in range [1..(p-1)] are quadratic residues (QR), and the other half are non-residues of prime p. q is QR if there exists a x : x^2 = q (mod p) In other words, x is the square root of q in the field Z/pZ. I.e. for p = 5. 1^2 = 1 2^2 = 4 3^2 = 4 (9) 4^2 = 1 (16) Meaning 1 and 4 are QRs and 2 and 3 are not. Why is this? Each number can be written as g^n, where g is a generator of the cyclic multiplicative group of the field Z/pZ. If n is even, g^n has a square root as g^(n/2). With the exception of p-2, exactly half of the elements of the field Z/pZ has a square root, as there are always an even number of elements in the multiplicative group, which has order p-1. [1] http://en.wikipedia.org/wiki/Quadratic_residue Entry: Peak compression Date: Sun Aug 8 06:57:58 CEST 2010 For many applications that have constraints on bandwidth and dynamic range at the same time, it is useful to be able to control the maximal signal amplitude. The simplest way to produce wide-band constant-amplitude pulses is to use chirps. Another intriguing way, when the spectral content is known, is to use Schroeder phases[4][5]. ( Using n(n-1) phase offsets -- compare to n^2 for Newman phases. ) Now the interesting part is that there is a _digital_ variant of this. Here one starts from a signal with minimal dynamic range: a (PSK modulated) binary signal. Binary can only represent 1 and -1. When modulated as a PSK signal it is an analog signal with a small dynamic range. (Smallest known? See BER[3]). A Barker code[2] then is a sequence that is near-orthogonal to its own shifts, meaning that its autocorrelation has a distinct peak at shift=0, but has maginitude <= 1 for other shifts. (Apparently, orthogonality is too strict a constraint). Local refs: Optimal Binary Sequences for Spread Spectrum Multiplexing[6]. Synthesis of Low-Peak-Factor Signals and Binary Sequences With Low Autocorrelation[7]. [1] http://en.wikipedia.org/wiki/Pulse_compression [2] http://en.wikipedia.org/wiki/Barker_code [3] http://en.wikipedia.org/wiki/Bit_error_rate [4] http://books.google.be/books?id=hQ6bl3RG04sC&pg=PA290&lpg=PA290&dq=schroeder+phases&source=bl&ots=7IT7kYGcyu&sig=45_66hBv0Xxo1yw9YBlyXVU0usY&hl=nl&ei=oT5eTJ-IJIX80wSc-8XHBw&sa=X&oi=book_result&ct=result&resnum=6&ved=0CEIQ6AEwBQ#v=onepage&q=schroeder%20phases&f=false [5] http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1054411 [6] md5://c35e337e61dfbcd980e3728768f71af4 [7] md5://9b2913693c005de2515e3efeb8548e47 Entry: Schroeder phases and quadratic residue diffusors. Date: Sun Aug 8 07:30:28 CEST 2010 What is the idea behind a quadratic residue diffusor (QRD)? What is the link with Schroeder and Newman phases? ( Is there any link with the "maximal irrationality" of the golden ratio. Max. irr. here means partial fraction expansion is all ones which means that rational approximation converges as slow as possible. ) In [2] it is mentioned that surprisingly, the DFT of the exponentiated sequence (amplitude 1 and n^2 mod p used as the _phase_) has a constant magnitude. Why is this? [1] http://www.dpi.physik.uni-goettingen.de/~mrs/Vortraege/Remembering-the-Good-Days-at-Bell-Laboratories-2004/index.html [2] https://ccrma.stanford.edu/~kglee/pubs/2dmesh_QRD_rev1/node2.html Entry: Finite Field Automorphisms Date: Sun Aug 8 08:44:55 CEST 2010 A binary LFSR sequence ultimately comes from a cycle in the multiplicative group of a finite field GF(2^n). This is a unique mathematical structure with a bunch of representatives that are all isomorphic. So, what does the automorphism group of a finite field look like? One place to look for some intuition is in Gold Codes[1]. This is the only practical application I know where two sequences with different generator polynomials are combined to form a new sequence with useful properties. Interesting properties of Gold codes: * Low autocorrelation (this is a "meaningless" property, as they are selected * The XOR (+) operation is closed PROOF: Say s1(n) and s2(n) are LFSR sequences of the same lenght. A gold code is constructed as g(n) = s1(0) + s2(n). We then have g(a) + g(b) = s1(0) + s2(a) + s1(0) + s2(b) = s2(a) + s2(b) = ... ??? (hmm... it seemed obvious - not awake yet) A link I found about finite field automorphisms[3]. Time to finally get into Galois Theory[4]. But what do I know? It has to do with the structure of the multiplicative group. I would think that the symmetry group gets larger if the multiplicative group is very composite. So, which GF(2^n) have prime order multiplicative groups? Are there primes that look like 2^n-1? Mersenne primes[7]. But is it really primes we're looking for, not maximally composite numbers? [1] http://en.wikipedia.org/wiki/Gold_code [2] http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1054048 [3] http://everything2.com/title/automorphisms+of+finite+fields [4] http://en.wikipedia.org/wiki/Galois_theory [5] http://en.wikipedia.org/wiki/Safe_prime [6] http://en.wikipedia.org/wiki/Strong_prime [7] http://en.wikipedia.org/wiki/Mersenne_prime Entry: More PRN sequences: 1/x as fractional or p-adic number. Date: Sun Aug 8 10:13:06 CEST 2010 What I'm looking for are binary sequences that are naturally periodic and somehow "obvious". An interesting place to look is the fractional digits of 1/x, where x is some special number. At some point the digits start repeating. Apparently pseudo random noise generation is an application of Sophie Germain primes[2]. A prime number p is a Sophie Germain prime if 2p + 1 is also prime. ( Instead of the fractional expansion, can we also use the p-adic expansion? ) Constructing the bit sequence is done using long division. The period can be obtained from observing that in base b: 1 / (b^n - 1) = 0.0...10...10... <---> n digits For a number 1/q we find the smallest n such that q * x = b^n - 1 Here x is the repeated pattern in the fractional expansion of 1/q, one period of the pseudo random number sequence we are interested in. How can we make sure that n is as large as possible? In other words, find a q that maximises n, where n is the smallest number such that q divides b^n - 1. Rewriting it makes it a bit more clear. b^n = 1 + q * x -> b^n = 1 (mod q) How to make n as large as possible? Let's start with q a prime number. Then this becomes a statement about the multiplicative group of the field Z/qZ, which has order q-1. The multiplicative group of a finite field is cyclic and commutative, so completely determined by its order. Now, when q is a Sophie Germain prime of the form 2p + 1 the multiplicative group order is 2p. There are 3 possible cycles in this group, with periods 2, p, and 2p. What needs to be verified to make sure we have the maximal cycle length 2p is that b is a generator of this cycle (b is a primitive element), meaning b^2 != 1 mod q b^p != 1 mod q For b = 2 the first one is true when q > 3. The second one can be verified for given p and b, but probably also proven generally (??). [1] http://en.wikipedia.org/wiki/Sophie_Germain_prime [2] http://mathworld.wolfram.com/DecimalExpansion.html Entry: LFSR beacon detection Date: Sun Aug 8 10:35:36 CEST 2010 Actually, this might be interesting to use together with the krikit idea. Or how to make a variant of GPS using LFSR audio-beacons. Problem: we have plenty of bandwidth (say 3 - 10 kHz) but not dynamic range (environment noise, and "noisyness" of the transmitter). We want to transmit a low bitrate signal. Solution: using spread-spectrum techiques we can trade bandwidth for dynamic range. A PSK baseband signal after heterodyne filtering by the carrier wave is a complex phase signal. It might have some small residual modulation that is easy to track, so let's ignore it. The basic idea is to use the LFSR to perform a local prediction for the I and Q signals separately, keeping the operation itself linear. (I.e. if more than one bit of information is available, it can be used.) We integrate the output of this predictor to yield a slow-varying information carrier. A Schmitt trigger on the output of the integrator should be enough. Entry: Chirps from corrugated galvanised iron (golfplaat) Date: Sun Aug 8 11:58:43 CEST 2010 An open barn/shed with 3 walls made of corrugated galvanised iron[1], with lines arranged vertically, produces chirp impulse response at an angle of 45 degrees with the walls. Why? [1] http://en.wikipedia.org/wiki/Corrugated_galvanised_iron Entry: Size of GF(2^n) symmetry groups Date: Wed Aug 11 21:38:04 CEST 2010 I think I'm stuck with a false fact in my head.. Are the number of distinct irreducible polynomials and the size of the symmetry group of GF(2^n) actually the same? [1] http://oeis.org/wiki/A011260 [2] http://oeis.org/wiki/A000031 Entry: Binary Tree Permutations Date: Sat Aug 14 14:25:38 CEST 2010 ( Of relevance to understanding PF's memory model [1]. ) I'm trying to understand the group of binary tree permutations. Let's start with these two, intuitively basic operations: * swap: a b S = b a * rotate: a (b c) L = (a b) c (a b) c R = a (b c) Notation: Lower case letters or digits indicate trees, upper case letters are tree transformations. Group action of transformation T on tree t is denoted as tT. Binary trees have only one non-associative operator: (t1 t2) is the tree made from splicing together the subtrees t1 and t2. The relation t1 -T-> t2 is shorthand for t1 T = t2. Questions: 1) is this set minimal? and 2) does it generate all permutations of binary trees? For convenience, let's stick to permutations of infinite trees, but have them go only finitely deep. This means the group of transformations is infinite in size but more "regular" than a transformation group of finite-size trees as there are no special border conditions. The answer to 1) is no. It is possible to write L = SRSRS. Proof: a (b c) -S-> (b c) a -R-> b (c a) -S-> (c a) b -R-> c (a b) -S-> (a b) c We also can't write S in terms of L and R because both L and R leave invariant the ordering of the leaf nodes, while S does not. To understand more about 2) let's see what the relation is to lists. Suppose that a binary tree can be transformed to a (left or right leaning) list by a succession of L and R operations. The inverse is then also true: any list can be transformed back into an arbitrary binary tree with the same leaf-node order using a particular combination of L and R transformations. So is the premise true? Or do we need another operation? From inspection, it seems that rotating a tree to a (right leaning) list can be done by R operations on _subtrees_. The question is then: how to express R operations on subtrees? I.e. how to express: a ((b c) d) -> a (b (c d)) More generally, how to take a tree transformation T and re-root it at a different base node? Intuitively this seems to be a new operation as you "swipe something under the carpet". Let's try: if the basic operations of S,L,R can be constructed to operate only on the right (or left) node of the tree, expressed in terms of root-node S,L,R operations, then we can recursively transform any operation. r (a b) -S1-> r (b a) But I don't immediately see how to construct S1. My first attempt brought me to some other primitives: 3-element right-leaning rotations LS and SR, which are each other's inverse: 1 (2 3) -L-> (1 2) 3 -S-> 3 (1 2) 3 (1 2) -S-> (1 2) 3 -R-> 1 (2 3) The left-leaning variants are RS amd SL. (1 2) 3 -R-> 1 (2 3) -S-> (2 3) 1 (2 3) 1 -S-> 1 (2 3) -L-> (1 2) 3 Because they are embeddings of 3-element list rotations we have: (ST)^3 = (TS)^3 = I where T is L,R. From these basic operations I can't see how to implement the re-rooted swap operation: 1 (2 3) -> 1 (3 2), which seems to be a genuine primitive transformation. This reminds me of the hint at necessity of '>R' and 'R>' or 'dip' when writing Forth or Joy code: you need a second stack to hide intermediate results. (Additionally you need a "free" stack to implement creation and destruction of cells in a conservative way.) Then the question might be: if we replace S, R, L by S', R', L' all rooted at the right subtree, what can we generate? * swap': r (a b) S' = r (b a) * rotate': r (a (b c)) L' = r ((a b) c) r ((a b) c) R' = r (a (b c)) For sure, the group these generate is isomorphic to the other one, but it's not a complete set as it leaves the whole left subtree invariant. Here 'r' really resembles the Forth return stack. The '>R' and 'R>' operations could be r (a b) -L-> (r a) b (r a) b -R-> r (a b) ( However, in the s-expression encoding of Forth I would use the version which builds stacks as right-leaning trees. ) Again the questions: 1) minimal set? 2) all permutations? Approach: My hunch is that it's not necessary to provide an infinite set of primitives: I think you can build level 2 from the connection between 0 and 1. Then when that works, do we have all? ... Antti[4] points out I'm looking for Thompson's group[2]. Follow through on [3]. [1] entry://../libprim/20100814-114746 [2] http://en.wikipedia.org/wiki/Thompson_groups [3] http://www.math.binghamton.edu/matt/thompson/cfp.pdf [4] http://ndirty.cute.fi/~karttu/matikka/stebrota.htm Entry: Compensating small room low-frequency acoustic modes Date: Mon Nov 22 11:34:52 EST 2010 Problem: I have a ringing room mode at 132Hz, right where I'm sitting. Mentioned by Ethan Winer[1] on Gearslutz[2] that it doesn't make much sense to compensate for narrow (high-Q, long rining) room modes with EQ. First: trying to EQ out room modes works only for a small sweeet spot. I can live with that. But as he mentions, it doesn't remove the ringing and it can't compensate for zeros. [1] http://www.ethanwiner.com/acoustics.html [2] http://www.gearslutz.com Entry: Pink noise Date: Tue Nov 23 18:04:42 EST 2010 Where does pink noise come from? The wikipedia page[1] says ``There are no simple mathematical models to create pink noise.'' In general power laws[2] indicate scale invariance. However the 1/f noise is inbetween white noise and its integral (Brownian noise). [1] http://en.wikipedia.org/wiki/Pink_noise [2] http://en.wikipedia.org/wiki/Scaling_law Entry: Impedance and Composite Circuits (What is a parallel impedance geometrically?) Date: Tue Nov 23 19:28:39 EST 2010 Type: tex I'm trying to revive some intuition for reading analog circuits and strikes me now in a very clear way is that the eye to use for electronics is not the eye for dimensionless signals but the eye for impedances. Dimensionless signals are nice to work with mathematically, and in most DSP work these numbers are all you ever see. The basic building block is the unit--delay, or the integrator in the continuous case. One input and one output, all very neat. However, the low--level basic building blocks of electronics are \emph{impedances}, relations between the voltage $V$ across and the current $I$ through an edge representing a primitive circuit element or a composite circuit. For any element, the impedance is defined as the ratio between voltage and current: $Z = V / I$. The conductance is the inverse $S = I / V$. \begin {itemize} \item There are only 3 linear circuit elements \begin {itemize} \item resistor $V = R I$ \item capacitor $dV = C I dt$ \item inductor $dI = L V dt$ \end {itemize} \item There are only 2 ways to compose them: \begin {itemize} \item parallel, summing $I$, $S$ \item series, summing $V$, $Z$ \end{itemize} \end{itemize} When it is understood that the currents and voltages in a circuit have a sinusoidal form, impedances can be represented by complex numbers, which are 2D vectors with added multiplication and division (inverse). This is because the primitive impedances behave as integrators or differentiators and thus leave the sinusoidal shape intact, changing only phase and amplitude. The interesting question is then: while investigating a circuit, is it simpler to switch between variables V and I when trying to understand circuits, or should one build an intution for the transformed summing relation (invert--sum--invert) $$A \| B = (A^{-1} + B^{-1})^{-1}$$ One way to try to go the second route is to see how this works geometrically. It's simplest to take one of the two values as a reference point, working with $C = B/A$. $$1 \| C = (1 + C^{-1})^{-1}$$ The inverse of a complex number $C$ is expressed by $\frac{\bar{C}}{\|C\|^2}$. I've tried these on paper: conjugate (flip around 1), invert (magnitude), add, conjugate, invert. This doesn't give me much understanding as the addition is still done in the other domain, so I move from impedance to conductance or vise versa. However the conjugates are not necessary as they cancel out so simple magnitude inversion is enough. The resultant in first approximation resembles most the smallest one and in second approximation the the resultant is inbetween the 2, and smaller than both. Entry: Quantization and number theory Date: Mon Dec 27 11:05:15 EST 2010 This subject keeps coming back really. Context: Digital control of analog synthesizers. This pops up in an electronics project I'm working on. The idea is to build a multi-channel DAC for control signals (say up to 200Hz), using a cheap microcontroller. The main design idea is that approximation error is OK, as long as it averages to 0 and it doesn't correlate with the signal or with itself. This means ordinary PWM doesn't really cut it as it has very strong periodic content. Dithered Sigma-Delta seems to be the proper tool. Next to digital control, I'd also like to explore pure digital combination of binary modulated signals using logic gates. See previous entries, i.e. [5]. The key idea here is to make sure there is no correlation. However some remark in [4], section "Multiplying 2 1-bit signals and getting a noise-shaped result" make me think I'm missing some key point when multiplying: high frequency modulation noise being modulated down. Recently I found a link to a seemingly interesting book [1] about Analytic Number Theory here [2], and a 2005 workshop[3][4][6] dealing with the subject and some more from Robert Adams[7]. The last one probably deserves a separate post[8]. [1] http://148.202.11.158/ebooks/mathbooks/Number%20theory/Analytic%20Number%20Theory%20-%20Newman%20D.J..pdf [2] http://rjlipton.wordpress.com/2010/12/26/unexpected-connections-in-mathematics/ [3] http://www.cscamm.umd.edu/programs/ocq05/ [4] http://www.cscamm.umd.edu/programs/ocq05/adams/adams_ocq05.htm [5] entry://20100218-104438 [6] http://www.cscamm.umd.edu/programs/ocq05/wolfe_ocq05.htm [7] http://www.netsoc.tcd.ie/~fastnet/cd_paper/ICASSP/ICASSP_2005/pdfs/0400077.pdf Entry: A Signal-processing interpretation of the Riemann Zeta Function Date: Mon Dec 27 11:33:06 EST 2010 Just a link[1]. Interesting stuff. Basic idea: a log delay network (LDN) is a system with delays at logarithm of the natural numbers. These networks have closed composition (LDN . LDN \in LDN). This is then used to construct an infinite series network which has supposedly provable stability, from which we can conclude that the zeros of the eta/zeta function have to be on the real line. So where is the flaw that provents this from being used as a proof of the Riemann Hypothesis? [1] http://www.netsoc.tcd.ie/~fastnet/cd_paper/ICASSP/ICASSP_2005/pdfs/0400077.pdf [2] http://www.cscamm.umd.edu/programs/ocq05/adams/adams_ocq05.htm [3] md5://31aab89af3f503238b57126a5d268639 Entry: Music: scales and intervals: almost equal composite numbers Date: Sun Jan 2 13:05:26 EST 2011 The fact that there are different tuning systems for the chromatic scale indicates that there is something quite wrong here. The chromatic scale is a happy accident because our brain glosses over the differences between intervals that are close but quite different. There are many examples. Take the minor 7th[1] for example. It corresponds to 16 / 9 = 1.777... 9 / 5 = 1.800... both harmonic intervals have nothing to do with each other, but melodically they are very close, about 1%. (A minor step is about 5%). It seems that music relies on this kind of gloss-over substitution when tones take on different relations in chords and melodies. I've always found this to be one of the weirdest things about the theory behind music. I've never seen it being mentioned so explicitly either, only in the context of tuning systems where it is mentioned as a nuisance. I think Bach even called it a Devine Joke or something (TODO: find quote). In short, it seems as if we can "teleport" between ratios that are related by near-1 fractions. I.e. for the minor 7th above we have a difference of 16 * 5 / 9^2 = 80 / 81 In general this phenomenon is recognized as a comma[3]. The ratio above is called the syntonic comma[2]. So one, how to enumerate the commas? Since the intervals in music all use limited prime ratios (i.e. up to 7), and limited amount of octaves, there is a limit to the amount of mistune one can get. And second, I wonder if this can be turned around. Given a chord written in "human" chromatic notation glossing over commas, is it possible to "distill" its different harmonic meanings by finding intervals that match in certain ways? I.e.: 16 / 9 = (4 / 3)^2 : two perfect fourths 9 / 5 = ... [1] http://en.wikipedia.org/wiki/Minor_seventh [2] http://en.wikipedia.org/wiki/Syntonic_comma [3] http://en.wikipedia.org/wiki/Comma_%28music%29 Entry: Zeta function Date: Sun Jan 2 18:10:24 EST 2011 Type: tex I've always found the multiplicative form of the zeta function $\zeta(s)$ quite intriguing, even if the explanation is straightforward\footnote{For the real proof: I forgot what the conditions are for rearranging terms in infinite sums. I believe it's allowed for absolute convergent series, which is the case there since all terms are positive. Note that the region of convergence is $\text{Re}(s) > 1$, so any statements made about this expansion and its multiplicative form are only valid for that particular region!}. With $N$ the naturals and $P$ the primes we have $$\zeta(s) = 1 + 2^{-s} + 3^{-s} + \ldots = \sum_{n \in N} n^{-s} = \prod_{p \in P} \frac{1}{1 - p^{-s}}.$$ This is because $$\frac{1}{1 - p^{-s}} = 1 + p^{-s} + p^{-2s} + \dots$$ and if you multiply them out, each natural number will be due to exactly one combination of the terms of the different factors of the product. This is an explicit way of saying that each natural number can be written in exactly one way as a product of prime powers. The interesting trick to note here is that operations on infinite sums are used to represent \emph{quantification} over a domain, i.e. for all integers. %[1] http://en.wikipedia.org/wiki/Riemann_zeta_function Entry: Encoding pairs of numbers Date: Sun Jan 2 18:32:00 EST 2011 Type: tex To encode a pair of two integers, construct a discreate space filling curve. Let's take a look at the single quadrant triangle constructed from successive antidiagonals. $$ \begin{array}{cccc} 0 & 1 & 3 & 6 \\ 2 & 4 & 7 \\ 5 & 8 & \\ 9 & \\ \end{array} $$ The top row are the partial sums of the the natural numbers: $n(n+1)/2$. If $(x,y)$ is the number to be encoded, you first find the diagonal is associated to by picking its top row element $(x+y,0)$, and then move down to the desired spot, so the encoding is $$n = y + \frac{1}{2}(x+y)(x+y+1).$$ Given $n$, the inverse can be computed by solving a quadratic equation $$\frac{1}{2}s(s+1) = n$$ for $s$ and rounding down to the nearest integer $s_r$. This is the first coordinate of the top row element of the diagonal the element is on. Whe then just need to move down the diagonal by by $y = n - s_r$, and so $x = s_r - y$. Entry: Encoding finite sequences of numbers Date: Sun Jan 2 18:58:23 EST 2011 Type: tex Primes can be used to to represent a finite sequence of integers of arbitrary length as a single number. For a sequence $x_1,\ldots,x_n$, construct the number $\prod_n p_n^{x_n}$ where $p_n$ is the $n$th prime number. The operation can be reversed by factoring the number. Is there a simpler way? Entry: The 7 divisions of the string. Date: Sun Jan 2 20:48:09 EST 2011 From Antti[1]. This is a simple algorithm for constructing musical interval ratios, starting with small numbers. We stop when the denominator corresponds to a regular polygon not constructable by ruler and compass. Now that sounds quite magical, but what does it mean? I think the non-constructable polygons (by ruler and compass) are the same as the n-th order polynomials factorizable by radicals. This problem is really about symmetry / discrete groups. The question is then: is this a coincidence, i.e. simply a question of where the brain decides ``well, this is enough beating to be called stable sound'' or does it really have something to do with the symmetry behind the non-constructable polygons. [1] http://ndirty.cute.fi/~karttu/Kepler/a086592.htm [2] http://ndirty.cute.fi/~karttu/Kepler/HarMun.txt Entry: Tim Stilson PhD Introduction Date: Tue Jan 4 04:55:22 EST 2011 Looks like he finished his PhD in 2006[1]. His work is mostly not about oversampling and non-linearities, but about non-oversampled techniques. ( Though I doubt BLIT and BLEP can be considered non-oversampled, as they are essentially oversampled lowpass. ) Anyways, in the section about discretization of continuous filters he mentions the Delta Operator on page 33, which also appeared in the book "Finite Difference Equations"[2] I read the first part of recently. The idea is to use an integrator/differentiator analogy to note discrete systems instead of the unit delay as this is more elegant and more closely resembles the analog case. This then gives the difference calculus, a set of rules to manipulate formulas with deltas, as exposed in [2]. The advantage of this encoding is that it isn't plagued by numerical issues for highly oversampled systems, i.e. Sigma-Delta modulators. There should also be a link to automatic differentiation somewhere... I.e. using memoized / butterfly-style networks instead of multiplied-out direct formula resembling "convoluted" bell/binomial shaped coefficients as Tim mentions. I need to explore other references too.. Goldmine. Dana Massie: EQ, Harvey Thornburg: NL moog. Antti Huovilainen: moog circuit model. [1] https://ccrma.stanford.edu/~stilti/papers/Welcome.html [2] isbn://0486672603 Entry: Difference Calculus Date: Tue Jan 4 11:09:54 EST 2011 Following up on the references from Tim Stilson's PhD[2] thesis. Already have the book "Finite Difference Equations"[1] from 1961 by Levy and Lessman. ( I found the original in a used book store without ISBN, [1] is the Dover reissue. ) The references Tim mentions are [3][4][5][6][7][8] (books and paywalled papers). I found one accessible paper on the subject[9]. For x[k] a real valued sequence |N - > |R, we define the operator z: z x[k] = x[k+1] Then we can define d = z - 1 or d x[k] = x[k+1] - x[k] These are normalized equations with time step equal to 1. Naively relating d to the s-plane requires s/f_s normalization. Before getting into cargo cult DSP math, one needs to keep in mind that: * The d and z transforms of a discrete filter contain the same information: they are merely described in terms of different primitives: the unit integrator (accumulator) vs. the unit delay. * In the limit case of infinite sampling rate filter coefficients of d-plane and s-plane coincide. However, naively converting between the discrete and continuous case for finite sample rates is still wrong! In fact, naive conversion is the forward difference transform wich maps s -> z - 1, and which has known stability issues for coefficients away from z = 1. More general, all the caveats concerning traditional filter discretizations still apply. One can derive reformulated versions of the well known transforms: forward/backward difference, bilinear, impulse response invariance (pole mapping) and pole-zero mapping. These then look like naive (f_s scaled) direct s -> d mappings + some correction terms O(s / f_s). The main use seems to be one of implementation: coefficient sensitivity will be less in d-plane formulation for low-frequency poles and zeros. Of course one can start by simply substituting z = d + 1 in a fully expanded z-plane formulation, but that is not very insightful. Instead we do the following: * Factor the z-plane formulation first as a product of first order terms (z-z_k), including complementary complex ones. Performing the substitution in this form yields terms (d + 1 - z_k). We name d_k = 1 - z_k. For low-frequency phenomena it happens that the z_k are all close to one meaning the d_k are small in magnitude. So essentially, a formulation in d-plane form eliminates a subtraction that is the cause of loss of precision in the z-plane formulation. * 1st order IIR sections in the form of b/(d-a) can be implemented directly using leaky integrators[11]. A state variable 2nd order section can be used to implement the complex conjugate 1st order sections. For IIR filter design (when everything is analytic) there is little practical difference between z-plane and d-plane approach: simply transform (analytically) when you're done. For FIR filter design which is mostly numerical in nature, the d-plane approach seems also useful for avoiding numerical instabilities. I.e. to formulate the optimization problem in such a way that the FIR filter appears in factored form, instead of multiplied out. ( From my own experience I've found that FIR filter design problems involving low frequencies are seriously plagued by convergence issues due to insufficient numerical precision. I recall that using a spectral factorization method to design an FFT window filter I would obtain results that where clearly wrong after visual inspection, i.e. one coefficient value "sticking out" while there should be overall symmetry. It would surprise me if this hasn't been realized long before. Look into ladder FIR filters. ) [1] isbn://0486672603 [2] https://ccrma.stanford.edu/~stilti/papers/Welcome.html [3] http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1104162 [4] http://md1.csa.com/partners/viewrecord.php?requester=gs&collection=TRD&recid=2270973CI&q=&uid=790228084&setcookie=yes [5] isbn://9780817639341 [6] http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=0123294 [7] http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=73564 [8] http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=219661 [9] http://sites.google.com/site/jamesgibsonhomepage/projects/DeltaOperatorCaseStudy.pdf [11] entry://20110104-124734 Entry: Practical d-plane formulations Date: Tue Jan 4 12:47:34 EST 2011 As a continuation of [1]. Given a d-plane formulation, how to implement it? It's simplest to start from the leaky discrete integrator and see what it's d-plane formulation looks like. The time domain formulation y[k+1] - y[k] = a y[k] + b x[k] y[k+1] += ... expresses the update of y in terms of the weighted magnitude of y and x. It corresponds to the frequency domain formulation d y = a y + b x (d - a) y = b x y / x = b / d - a The non-leaky integrator is then b / d. Complex poles are handled in exaclty the same way: We take the complex form of one of the complementary pairs, feed it a real input and taking only the real part of the integrator as as output. TODO: - Make the 2nd order implementation explicit. Handwaving might be wrong. - What about d-zeros? [1] entry://20110104-110954 Entry: Lattice and Ladder filters Date: Tue Jan 4 14:02:24 EST 2011 Let's get some terminology ironed out. Mostly in relation to [2][3]. Structurally, a d-plane formulation can be associated to a ladder/lattice topology, and their good numeric properties are well known. They also have interesting mathematical properties (see linear prediction and orthogonal functions on the unit circle [10], and further work in generalized Schur and displacement rank algorithms for Toeplitz-structured matrix problems.). [1] http://en.wikipedia.org/wiki/Lattice_filter [2] entry://20110104-110954 [3] entry://20110104-124734 [10] http://www.emis.de/journals/BBMS/Bulletin/bul941/BULTHEEL.PDF Entry: Nonlinear digital Moog VCF + bandlimited sawtooth. Date: Tue Jan 4 15:31:08 EST 2011 Also mentioned in Stilson PhD[2] is Antti Huovilainen: Nonlinear digital implementation of the Moog ladder filter[1][3] ([5][4]). Interesting derivation, but not such a surprising result. However, the piecewize parabola is a nice trick! So I wonder if this can be taken further one order by representing the remaining "bounce" discontinuity by a 2nd order "brake" discontinuity and differentiating twice, i.e. by using piecewize 3rd order polynomials: (x-1) x (x+1) Deriving once gives the piecewize parabola, deriving again gives the sawtooth. The 3rd order polynomial's discontinuity is smoother so should roll off faster (18dB/octave). It shouldn't be too hard to implement in Pd either.. Indeed: differentiated parabolic (DPW) [6] twice differentiated cubic (2DCW) [7] Looks like the obvious extensions are already published[8]. [1] http://dafx04.na.infn.it/WebProc/Proc/P_061.pdf [2] https://ccrma.stanford.edu/~stilti/papers/Welcome.html [3] http://www.mitpressjournals.org/doi/abs/10.1162/comj.2006.30.2.19 [4] md5://81ef26b98b858bfc7ea351850b7f8872 [5] md5://ec26bdee832237793c6de78875942a60 [6] http://zwizwa.be/darcs/pd/abstractions/saw2~.pd [7] http://zwizwa.be/darcs/pd/abstractions/saw3~.pd [8] http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F10376%2F5446581%2F05153306.pdf%3Farnumber%3D5153306&authDecision=-203 Entry: 2nd order 4-multiply ladder EQ Date: Wed Jan 5 07:06:37 EST 2011 Also mentioned in Stilson PhD[2] is Dana Massie's work on 2nd order ladder EQ with separate frequency and Q controls[1]. [1] http://www.aes.org/e-lib/browse.cfm?elib=6994 [2] https://ccrma.stanford.edu/~stilti/papers/Welcome.html Entry: Model based control Date: Thu Jan 6 13:06:47 EST 2011 I think I just re-invented MBC: - Given a (non-linear) fixed model of the control->actuation transfer, perform feedforward or feedback control updates at high frequency. - Update model parameters at lower frequencies. Entry: Delta Operator Date: Thu Jan 6 14:28:24 EST 2011 Called divided difference as it's defined as: d x[k] = x[k+1] - x[k] / T I.e. the differences are taken relative to the sampling period T. In most cases where T is fixed this can be normalized to T=1 by moving the scaling to the coefficient end, i.e. coefficients will become small when T approaches 0. However, in the exposition of [1] explicit mention of T is desired to be able to take the limiting case of T->0 to make the bridge to the continuous case, and the cases with different T. See also fixed step integration[2], Time Scale Calculus[3]. [1] md5://3e41deebd80a5b988de09b645c036e57 [2] http://en.wikipedia.org/wiki/Eulers_method [3] http://en.wikipedia.org/wiki/Time_scale_calculus Entry: Matrix multiplication is quadratic? Date: Mon Jan 10 00:10:48 EST 2011 Basic idea: Matrix multiplication can be related to group multiplication, which can then be implemented by the fourier transform. This is freaky stuff. I think this paper applies to finite field matrices though. Does this also work for the reals, or in approximation for floating point numbers? Does it mean that in general, _all_ matrix multiplication problems can be expressed as convolution problems? I.e. the generalized Levinson / Schur style algorithms already exploit this in some sense.. Are all matrices "structured" ? [1] http://arxiv.org/abs/math.GR/0511460 Entry: Dithering and "FAT" saw stacks Date: Mon Jan 10 11:43:47 EST 2011 The idea: a stack of sawtooth oscillators sounds "FAT" when it's from an analog synth. I don't have one on hand here, but is this due to subtle frequency changes? Using the parabolic/cubic sawtooth generator in Pd I get a rich sound, but the very repetitive intermodulation is a bit disturbing. ( Problem however, I do not know if I'm just paying attention to something I wouldn't normally notice. Unfortunately I do not have an analog sawtooth oscillator handy. ) However, the idea in general is quite interesting. This is related to: - dithering in sigma-delta - peak limiting in spread-spectrum pulses and ADSL[1] - eliminating room resonance modes in acoustics (flattening) - same for artificial reverberators (i.e. FDN) Ultimately this is about integers, i.e. "almost equal" numbers. But maybe that's not what I'm looking for. Could this be an issue of coupling and chaos? I.e. if multiple saws go thorugh a discontinuity close together, could there be some secondary cause for them to spread apart in frequency? I.e. modulations in a string do just that: peaks increase the tension and thus temporarily disturb the linear order. [1] md5://9b2913693c005de2515e3efeb8548e47 Entry: Z-transform and generating functions Date: Tue Jan 11 15:58:12 EST 2011 Who chose that damn negative exponent in the Z-transform[1]? To be fair, I do see why it's done that way. It makes sense for "multiplied out" FIR and IIR filters since it leads to all-positive exponents corresponding to the index decrement unit delays. However, for expressing difference and update equations in state space form it is simpler to have "z" mean _index increment_ or feedforward in time, and "d" defined as d = z - 1. This makes update equations and difference equations directly expressible: x_{k+1} = a x_k => z x = a x x_{k+1} - x_k = b x_k => d x = b x In state space form, the relation between the z and d formulation is also pretty clear. For systems where the dynamics is slow compared to the sampling rate, the state space formulation in difference form also makes more sense: it has better numerical stability and corresponds to fixed-step[3] integration of a continuous differential equation. In addition to that, flipping the exponent around makes the Z-transform equal to the relation that expresses a generating function[2]. The inverse Z-transform then corresponds to the line integral expression for power series coefficients in the expansion of an analytical function around z=0. So, picking a non-(engineering-)standard notation means we need to take care when just copying expressions out of books and papers. However from context it is usually quite clear which is which. The most important differences for the postive-exponent Z-transform are: * Stability: convergence of a power series of an analytical function (representing a signal or transfer function) works for |z| < R where R is the location of the pole closest to 0. * Stable transfer functions need to have all poles _outside_ the unit circle. This is just the reverse of the flipped-sign Z-transform. Of course, a similar point could be made for the Laplace transform (LT). Why does it have a negative exponent? Same for the Fourier transform (FT). For the FT it seems to make sense to me, as the reconstruction is in terms of positive exponents, which somehow makes a more intuitive appeal. Anyways, it really doesn't matter that much -- just convention and something to nag about... [1] http://en.wikipedia.org/wiki/Z_transform [2] http://en.wikipedia.org/wiki/Generating_function [3] http://en.wikipedia.org/wiki/Euler_method Entry: Factored FIR Date: Fri Jan 14 15:44:14 EST 2011 Q: Are there any FIR filter design algorithms that explicitly rely on the equation to be in factored-out form, i.e. as a product of 2nd order sections? I suppose this introduces the problem of having to specify whether FIR zeros are complex or not. I seem to remember them mostly being complex, exept for odd orders where there would be one zero. Q: Is there a reason why FIR filters have "mostly complex" zeros? Is this equivalent to asking why their inverses the IIR all-pole filters have "mostly complex" poles? (Because that gives more locality.) Entry: Initial Algebra / Final Algebra Date: Fri Jan 21 12:01:08 EST 2011 This terminology is used in [1]. The concrete interpretation for me is that "final" means modeling in terms of functions while "initial" means modeling in terms of abstract data types. Other than that the terminology is obscure to me. Googling brings me to this[2]. So it's in the field of semantics. [1] http://www.cs.rutgers.edu/~ccshan/tagless/jfp.pdf [2] http://homepages.feis.herts.ac.uk/~comqejb/algspec/node12.html Entry: Squirrels & robot control Date: Sun Feb 20 12:38:52 EST 2011 Squirrels use very fast balancing corrections. Why is this? Some possibly relevant issues: - Fast corrections (high acceleration) are relatively independent of gravity. I.e. this is a "pulse compensation" as compared to a continuous control process. - They have litte mass m, which means that they don't need much force F to get to high acceleration a = F/m. - Fast pulse compensation is also present in eye movements. Apparently for the eye, tracking is more difficult than focusing on the same point for a longer period of time. Entry: Fuzzy class D multi channel amp Date: Wed Mar 2 12:33:07 EST 2011 Problem 1: I want a multi-output class D amp for synth control signals. BW = 200Hz is a must, anything higher is welcome. Problem 2: I want to use a cheap, off-the-shelf part (a PIC uC). In general, you'd want to solve the problem in hardware (i.e. PLC or FPGA) for one channel independently, and then duplicate it n times. However, the rates involved in my problem are quite low, so it's probably possible to use a cheaper uC-based solution. How to go about making that optimal? To do it on a PIC requires some cleverness, as flipping multiple bits at the exact calculated time will not work well because the PIC needs to do the necessary computations serially. Taking a strict time multplexing approach is probably not optimal, as in most cases no action needs to be taken, i.e. no toggle performed, so we might as well not have done the computation in the first place. So I wonder how to go about trying not to perform unnecessary work and using a more optimal multiplexing scheme. I.e. something like: - find a fast way to guess the next output that needs a toggle - verify the guess by performing a computation - perform the toggle (or not!) - record the actual time of the toggle so it can be incorporated in the next control computation The hard problem seems to be the first one: how to build a system that knows which current error is the largest, so requires immediate toggle action? If the guesses are accurate, there won't be many computations that are for nothing, i.e. that decide to not toggle because it's too soon. I'd say this is a data structure problem. Find a way to keep a sorted list of current errors. Then simply pick the top one to update. Entry: Sound impedance Date: Tue Mar 15 23:49:18 EDT 2011 If I get this right: speakers are cone shaped. At the narrow end, velocity is high but pressure is low. At the wide end it's reversed. Widening enables the same amount of mass to move with less velocity. WRONG! According to [1] it's just the other way around! I think it's time for me to pick up Feynman's lectures again.. [1] http://en.wikipedia.org/wiki/Horn_loudspeaker Entry: I don't know sound: 1. Pressure Date: Wed Mar 16 00:04:01 EDT 2011 I think I don't understand pressure. It's a statistical measure that represents force per unit area in a collection of moving point-masses. Note that it is really the force that is averaged: Take a very small piece of area such that it gets hit infrequently. The force necessary to keep it in place has a pulse-like nature due to collisions: force is high when we're "pushing back" on an incoming molecule, and zero when there is no contact. Therefore it is simpler to go up and down the integral stairs once: don't look at force but at impulse transfer per collision (integrate) and then look at the number of collisions per time unit (differentiate). From this perspective, why is it that a speaker driver (i.e. a flat surface moving in a direction perpendicular to its surface plane) has high impedance, meaning it produces mostly pressure and no displacement? Entry: Power spectrum as power of derivative of image Date: Thu May 5 08:36:15 EDT 2011 See [1] -> Listening to your webcam. I was wondering, since there is such a "naturally pleasing" map between the derivative of an intensity map and an audio spectrum, are there more such parallels to make? [1] http://homepages.kcbbs.gen.nz/tonyg/projects/tangents.html Entry: Queued Class-D DAC Date: Sun May 8 08:57:30 EDT 2011 Type: tex \def\us{\mu\text{s}} The objective is to use a cheap microcontroller to implement an $N$-channel pulse--modulated DAC, as opposed to using parallel programmable hardware, duplicating a $1$-channel DAC $N$ times. The following is a list of remarks about how to implement this functionality on a cheap microcontroller. \begin{itemize} \item The software switching performance of a typical uC like a PIC18F is about $10$ instructions or $1\us$, where each time slot reads and updates a state register and an output pin. \item The DAC is going to be used to modulate amplitude and frequency of audio signals. This places an extra constraint on the conversion error: its decorrelation is favoured over its inaudibility. \item A straightforward approach is to multiplex such time slots on a rigid grid, obtaining a switching period of $NT$ where $N$ is the number of channels. \item Depending on the modulation algorithm used, it might be possible to optimize this scheme. Instead of computing each DAC's state update at each tick period, it might be possible to just compute the time to next switch, and then queue these operations. If the scheduling problem can be solved exactly, this can bring the effective resolution from $NT$ in the dumb multiplexed case to $T$ in the queued case. \item If the scheduling problem can not be solved exactly, it might be possible to measure the switching error and feed it back into the state update. Some global strategy is probably necessary to make sure these errors are distributed over several DACs, and sufficiencly decorrelated in the case of signals that modulate audio. \item It might be possible to favour the $50\%$ duty cycle, and only encode differences with respect to this default. I.e. the signal $101011010$ would be represented by $000001111$ after demodulation by the default $2T$ square wave $10101010$. It would be interesting to find out if such an approach is beneficial for eliminating correlated noise ripples, or if it just shifts the problem. \end{itemize} Entry: Code generation and quasi-particles Date: Sun Jun 12 00:04:03 CEST 2011 (mumbo-jumbo) I've had it with this borking real-world stuff. Time for some seriously wacky ideas. What I'm trying to do in libprim is to make a compiler and VM at the same time. In practice, glossing over this means to take the simplest representation of that structure (an interpreter), i.e. represented as the composition A B C, and inserting "virtual particles" (which are operations and their inverses) at the boundaries of the VM and compiler. I.e. A B x x' C, where A B x is the VM and x' C is the compiler. That's the high level mumbo jumbo. Now, can this be done in practice? How to express code specialization in an explicit algebraic way, i.e. as symmetry transforms? From afar the theory seems sound: you take an element of a group which represents a function, and you find a representation of that element such that it can be split in two: a compile time and a run time element. Then use your understanding of the symmetry of that group to make automatic search a little bit feasible.. Entry: Programs and specifications Date: Sun Jun 12 15:22:28 CEST 2011 (mumbo-jumbo) The trouble is that representation needs to be as abstract as possible. This seems to be the hard part as very often it remains implicit. I.e. programmer thinks "Ah, let's use this bit for that to quickly hack around doing these things separately". That's exactly the implementation information that one wants to capture, and has an incredibly large amount of detail if made explicit. This then also indicates why this process is not factored in the following particular way: specification S . specializer (S->I) = implementation (I) but by writing the implementation I separately, and providing a proof that I embeds S. Q: What is more difficult, to provide an indirect proof that S is isomorphic to a subpart of I, or to construct a direct (parameterizable?) embedding function S->I? Putting it like that it seems that the former is simpler, as it allows the use of non-constructive proof techniques. However, making the S->I embedder parameterizable might give a whole family of implementations that do not need separate proofs. Entry: A Primer of Infinitesimal Analysis Date: Thu Jul 7 23:28:09 CEST 2011 A nice one to stack next to dual numbers[3] and automatic differentiation. Link from [2], which is quite an interesting post in itself. ``Smooth space-time, say `R^4`, allows only smooth motion and smooth distribution of mass. If we place non-smooth mass in it, the space will change to a _subset_ of `R^4` which carries additional information about the anomalies contained in it.'' I started reading [1] and I must say it's quite interesting to reason with non-non-existing entities ;) In the book they are likened to virtual particles: something that's not really present in end results, but useful as a tool in calculations. [1] isbn://0521624010 [2] http://math.andrej.com/2008/08/13/intuitionistic-mathematics-for-physics/ [3] http://en.wikipedia.org/wiki/Dual_numbers [4] http://en.wikipedia.org/wiki/Smooth_infinitesimal_analysis Entry: Getting that book feel on an e-reader Date: Tue Jul 12 22:00:30 CEST 2011 If the font is too thin, e-readers or screens don't give a very good feel. I.e. math books and papers in computer modern are really horrible. I'd like to emulate the feel of blurry ink on paper. This seems to work well with a 600dpi PDF with radius 6 gaussian blur (gimp's units) and a exponential looking J-shaped intensity curve. Before I tried this with a pixel-growin algorithm too which gives similar results[1]. [1] entry://20090511-181107 Entry: Human motion Date: Sun Oct 30 17:44:29 EDT 2011 Plot derivative of a moving laser dot to see the small "pulse corrections" and make a model for that. Entry: Time varying filters Date: Mon Oct 31 00:16:32 EDT 2011 What makes musical filters musical is their time-variant nature, their speech or phrasing. Filters by themselves are quite cheap, though controlling them, making them time variant to make an interesting texture is not at all a simple problem. It might be interesting to go into that problem a bit now that I (almost) have my code generation machinery running.. Entry: Logic Date: Sun Dec 18 16:42:08 EST 2011 For my EE-brain, "logic" is the logic of electronic circuits, which corresponds to propositional logic. The "logic" of type systems is at least first order logic. Quantification (things with holes) is necessary to express structure. Entry: Inference Date: Fri Dec 23 12:53:47 EST 2011 Is inference always generate-and-test? More specifically, I'm thinking of the task "design somthing that has these properties.". In general, there is no algorithm that goes from a description of the properties to the description of something that implements those properties (a logic and a model?). I'm inclined to think (in a vague an unrooted way) that any form of "algorithmic inference" is really just fixing parameters an already existing / well-defined family of models, not *ever* the construction of a model out of the blue. Why is this important? Compilation of specifications. Is it so that a model can be seen as a "constructive proof" of a set of properties? Is this kind of thinking formalized in some way? [1] http://en.wikipedia.org/wiki/Model_theory Entry: Distributivity vs. Commutation Date: Thu Dec 29 08:50:12 EST 2011 I'm still in the process of finding a good terminology for "type commutation" in Haskell, i.e. correspondences like this: a (b t) <-> b (a t) a (b t1) (b t2) <-> b (a t1 t2) and multi-arity variants of this, which seem to be more arbitrary due to the different ways the t_x can be permuted. Entry: Density of discrete entities Date: Sun Jan 1 17:04:12 EST 2012 There's something funny going on with the concept of density for a finite amount of particles. How is the "blurring" regularization for this to work usually handled? I guess that moving to a statistical interpretation solves this problem. The density is not a property of the particles, but one of the probability distrubution that is used as a model of the particles. I.e. model 10 particles by thinking of them as a (random) a sampling from a PDF, then talk about the PDF instead. Entry: Accounting Date: Mon Feb 13 13:27:37 EST 2012 How to use gnucash. Problem: I currently don't care about "proper" accounting (i.e. the accounting equation [1]). I just want a way to manage order reports (i.e Amazon, Ebay, ...) that can be obtained electronically, to categories of expenses (Office, Equipment, Transport, ...). I'm not sure if that makes any sense in a standard accounting approach. It's probably good to first learn how to do this properly. [1] http://en.wikipedia.org/wiki/Accounting_equation