Ideas in need of testing. Take what you find here with a grain of salt. Next to some random observations and thoughts, this blog is about (mathematical) intuition and vague hints at possibly interesting ideas. Entry: Triangular Matrices Date: Sun Oct 9 02:31:26 CEST 2005 Type: tex %% Displacement rank. $\Delta R = R - Z R Z^T$. This looks an awful lot %% like the update equation for the a--priori estimation covariance %% estimate $P_{k+1|k} = A P_{k|k} A^T + W$ in the Kalman filter. So, we %% could stretch things and give $R$ the meaning of covariance matrix, %% and $Z$ the dynamics of the system under study. Waking up now. This %% is of course because in our setting of displacement rank, $R$ is %% symmetric/Hermitian. More dreams were about triangular matrices again. My ``aha'' last week was about the property that they leave a \emph{subspace nest} invariant (a total ordering of subspaces), making them into a \emph{nest algebra}. This ``recursive quality'' leads to all kinds of properties explaining why they are so important for numerical linear algebra. The most striking is linear system solving, where the property leads to an inductive divide and conquer strategy which solves the (trivial) system in the one dimensional subspace and then updates the equations to reduce the problem size from $N$ to $N-1$. The second consequence is $LU$ and $LL^T$ factorization, which establishes the link to general matrices and to positive definite symmetric matrices. The algorithms for computing these have the same recursive properties. %% All this is of course intimately related to causality, where the nice %% example is of course the Schur complement used to express conditional %% probability. Now to say something more about the Hermitian/upper triangular vs.\ unitary matrix concept. (For real matrices, replace these with symmetric and orthogonal.) We know any matrix can be written in ``polar form'' $A = UR$, where $U$ is unitary and $R$ is positive definite Hermitian. BUT, we can also take $R$ to be upper triangular with a positive diagonal. What exactly is the link between these upper triangular matrices and Hermitian matrices, other than the Cholesky factorization? That is, we can take the square root of a positive definite Hermitian matrix to be another Hermitian matrix or a triangular matrix, but can we transform a positive definite Hermitian matrix directly into an upper triangular matrix? Is there some deeper structure relating the two generalizations of a positive real number? Let's have a look at square roots of triangular matrices first. For Hermitian matrices the square root is trivial to see from the eigenvalue decomposition. Writing it inductively we get $$\matrix{A & 0 \\ B & C}^2 = \matrix{A^2 & 0 \\ BA + CB & C^2} = \matrix{D & 0 \\ E & F}. $$ This is trivial to solve when all submatrices are scalars. If not, we can split off the two equations $A^2 = D$ and $C^2 = F$, solve those recursively, and propagate the results back up to solve $BA + CB = E$. Suppose $A = a$ is a scalar and $b$, $e$ are column vectors. Here we have $a = \sqrt{d}$, and since the scalar $a$ commutes with $b$, the remaining equation becomes $(C + aI)b = e$, which is a triangular system. So we have both the divide and conquer steps, giving us an algorithm.
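A small numerical sketch of this recursion (my own addition; the function name and the test matrix are arbitrary), written for a lower triangular matrix with positive diagonal:

\begin{verbatim}
import numpy as np

def sqrt_lower_triangular(m):
    # Recursive square root of a lower triangular matrix with positive
    # diagonal: returns lower triangular s with s @ s == m.
    # A sketch of the divide and conquer idea above, not production code.
    n = m.shape[0]
    if n == 1:
        return np.sqrt(m)                          # the trivial 1x1 system
    d, e, f = m[0, 0], m[1:, 0], m[1:, 1:]         # m = [[d, 0], [e, F]]
    a = np.sqrt(d)                                 # a = sqrt(d)
    c = sqrt_lower_triangular(f)                   # recurse: C^2 = F
    b = np.linalg.solve(c + a * np.eye(n - 1), e)  # (C + aI) b = e, triangular
    s = np.zeros_like(m)
    s[0, 0], s[1:, 0], s[1:, 1:] = a, b, c
    return s

# quick check on a random lower triangular matrix with positive diagonal
L = np.tril(np.random.randn(5, 5))
np.fill_diagonal(L, np.abs(np.diag(L)) + 1.0)
S = sqrt_lower_triangular(L)
print(np.allclose(S @ S, L))    # True
\end{verbatim}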
%% Now, onto Hermitian matrices. Unfortunately, they do not form a group %% under ordinary matrix multiplication like the triangular matrices do, %% only if they commute. So this is where to look. If they commute, they %% clearly do not present the same algebraic structure as lower %% triangular matrices, so what about commutators and anti--Hermitian %% matrices? %% This boils down to identifying with ``positive imaginary'' numbers %% instead of positive real numbers, and imposing a product like $A %% \times B = i[A,B]$ with $[A,B] = AB-BA$. The other interesting link %% might be that for triangular matrices, the commutator is %% nilpotent\footnote{ More generally, the invertible triangular matrices %% are a special case of a solvable Lie group, whose Lie algebra consists of the %% upper triangular matrices. }. %% Now, that's enough for today. I'm waking up. Pure intuition, there are %% two sides to this coin\ldots The obvious side is that some connections %% I make here are very trivial from a logic point of view, others are %% more far out, making me seem like I am on drugs. Now, to make one of %% those drugged statements: this is about category theory and patterns. %% Making explicit these more far out statements is caught in category %% theory, for which at this moment I do not have the energy to dive into %% fully, but it's about turning the equality sign into morphisms. Or in %% other words, making approximate intuition a bit more mathematically %% ``serious''. Entry: Entropy and non-invertibility Date: Mon Oct 10 05:18:05 CEST 2005 I wonder what the link is between entropy and non-invertibility, since both are related to one-way processes and information reduction. [EDIT Thu Jun 18 15:36:35 CEST 2009] This remark appeared in the log. It hints at something I don't yet understand enough to articulate. Look at the matrix computations tutorial (basics.tex). Entry: Displacement Date: Tue Oct 11 04:11:06 CEST 2005 Type: tex % here i'm falling in the ellipse-parabola-hyperbola trap again.. %% Some intuitive leaps. Separability versus displacement structure and %% differential equations like the Laplace and d'Alembert equation. And %% symplectic matrices. So what's this displacement business for symmetric matrices all about? In short: finding a way to write the sum or difference of two transforms of a single matrix $A$ as a sum of ``squares'' $$\Delta A = F A - A \tilde{F}^H = G \tilde{G}^H = \sum_{k=1}^r g_k \tilde{g}_k^H,$$ or more generally, as a limited sum of outer products. This should be done in a way that lowers the rank $r$, making it effectively separable. This way \emph{the original structure that we were forced to introduce to write our problem as an ordinary matrix problem can be recovered}. So the matrices $F$ and $\tilde{F}$ should really be gotten from the problem domain. For example, for filtering problems, they are shift matrices. Care has to be taken though that the original matrix $A$ can be reconstructed from this representation, since we want an alternative representation, not something completely different. One thing that has struck me thinking about structured matrices before is the question: if your equation is structured, why do you blow it up into a matrix? Naivety can be beautiful, because it usually hides the answer right there in the question: we can't do much equation solving in practice without using algorithms based on matrices and their factorizations, so we have to stick to formulating our problem in the language we know, to find a more general approach instead of exploiting the structure of the problem directly. This of course means that some of the (more obvious) fast algorithms we find by applying this philosophy have already been encountered before by exploiting the structure of the problem in a more direct fashion. The key example here is the Levinson algorithm.
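To make the displacement idea concrete, a small check I added (the matrix size is arbitrary): for a symmetric Toeplitz matrix $R$ and the lower shift matrix $Z$, the displacement $R - ZRZ^T$ has rank at most two, independent of the size of $R$.

\begin{verbatim}
import numpy as np
from scipy.linalg import toeplitz

n = 8
R = toeplitz(np.random.randn(n))     # symmetric Toeplitz matrix
Z = np.diag(np.ones(n - 1), -1)      # lower shift matrix

D = R - Z @ R @ Z.T                  # displacement of R
print(np.linalg.matrix_rank(D))      # 2 (generically), whatever n is
\end{verbatim}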
Entry: Hermitian matrices Date: Tue Oct 11 08:30:44 CEST 2005 Type: tex A bit more awake now. I'm still quite puzzled by this Hermitian versus triangular stuff. The main reason for my puzzlement is that they have the same number of degrees of freedom, and are both a kind of generalization of a (positive) real number, as indicated by the QR and polar decompositions, but they do not behave the same under multiplication. Maybe it's because I'm not making a distinction between Lie group and Lie algebra. Another difference is that multiplication of triangular matrices is closed, but this is not the case for Hermitian matrices. For that we have to move to the Lie bracket and anti--Hermitian matrices. Something else to remark: exponentiation is closed for all triangular matrices, but it is only closed for positive definite Hermitian matrices. So there is a very subtle difference between the two. All this weirdness originates in the asymmetry of the $A=QR=US$ relation of course, where $Q$ and $U$ are unitary, $R$ is upper triangular and $S$ is Hermitian. This is not a similarity transform. It is one--sided. Both $R$ and $S$ can be turned into each other by pre--multiplying with a unitary matrix. Entry: Symplectic forms Date: Tue Oct 11 09:30:44 CEST 2005 Type: tex Then, symplectic forms. I can't seem to find any decent explanation of this anywhere, so here we go. Partition all the coordinates in a phase space as $(a_i,b_i)$ pairs, which could be represented as complex numbers, but which we are just going to represent as reals here. The thing that's preserved under symplectic transforms is $$\sum_i a_i b_i \sin \theta_i.$$ Now what does this mean if we compare it to that other expression $\sum_i a_i b_i \cos \theta_i$? If I recall correctly, the last expression involves dissipation, so the other one must mean conservation, no? Let's have a look at the leapfrog method. It solves the nonlinear continuous time set of equations $$ \begin{array}{l} d q / d t = p \\ d p / d t = f(q), \end{array} $$ by approximating them as $$ \begin{array}{l} q_{n+1} = q_n + \Delta t \, p_n \\ p_{n+1} = p_n + \Delta t \, f(q_{n+1}). \end{array} $$ It turns out this gives good results because of the $q_{n+1}$. First updating $q$ and then using this to compute $p$ gives the method its name. It was realized that the reason this scheme works so well is that it respects \emph{symplectic conservation}, a symmetry in Newton's laws of physics analogous to conservation of energy and angular momentum. See also that nice picture of invariant tori and quasiperiodicity. I wonder if there's any nice sound in this. %% And maybe I can combine this displacement rank stuff to do some %% symplectic geometry. %[1] http://www.massey.ac.nz/~rmclachl/SciRevMcL.html
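A minimal numerical sketch of the leapfrog update above (the pendulum-style force $f(q) = -\sin q$, the step size and the run length are my choices): the point of the symplectic property is that the energy error stays bounded over long runs instead of drifting.

\begin{verbatim}
import numpy as np

def leapfrog(q, p, f, dt, steps):
    # q_{n+1} = q_n + dt * p_n ;  p_{n+1} = p_n + dt * f(q_{n+1})
    for _ in range(steps):
        q = q + dt * p
        p = p + dt * f(q)
    return q, p

f = lambda q: -np.sin(q)                    # example force (my choice)
energy = lambda q, p: 0.5 * p**2 + (1.0 - np.cos(q))

q0, p0 = 1.0, 0.0
q, p = leapfrog(q0, p0, f, dt=0.05, steps=20000)
print(energy(q0, p0), energy(q, p))         # close: no secular energy drift
\end{verbatim}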
Entry: Golden Ratio Date: Tue Oct 11 10:30:44 CEST 2005 Type: tex One interesting remark I read on Baez's This Week's Finds about the golden ratio, amongst all the witch hunting, was that the golden ratio can be seen as the most irrational ratio, because each term in its continued fraction expansion is $1$, so each term adds the minimal possible amount of accuracy to the approximation of the true value $(1+\sqrt{5})/2$. %[1] http://math.ucr.edu/home/baez/twf_ascii/week203 Entry: Questions Date: Tue Oct 11 17:44:17 CEST 2005 Type: tex So I should ask more questions. What is a Wiener-Hopf equation, and what is its relation to displacement rank? What's so special about symplectic conservation? Why did I hear about momentum and energy, but not symplectum? [sic. Basically, the answer is in Kepler's law.] Why is the wave equation not separable in more than one time and space dimension? [Hmm. Because it is intrinsically \emph{spatial}. It is about the \emph{coupling in space} of certain phenomena. In one dimension, coupling is only left--right, so there's no arbitrary angle. Maybe the right answer is: the equation is separable, but in so many parts (angles/rays) that it doesn't really compare to the $2$--part 1D solution.] What's the deal with this time-is-imaginary business, and Wick rotation? [It's just a way to deal with hyperbolic spaces differently?] Is this more than a notational trick, or does it really say something? And, last but not least, what is a stochastic process? [It is a distribution over functions, and not a function of a stochastic variable. Or, the stochastic variable is a function of time, a trajectory.] Entry: DFT Date: Thu Oct 13 08:17:03 CEST 2005 Type: tex Another thing I suddenly realized. If $F$ is the DFT matrix, containing as columns the DFT base functions in terms of positive $i$, we have $F^4 = I$, meaning $F$ represents a $90$ degree rotation combined with a reflection. The eigenvalues of $F$ are $4$th roots of unity $\in \{\pm 1, \pm i\}$. Note that a DFT of a matrix $A$ looks like $FAF^H$. For $2 \times 2$ matrices, the normalized $F$ looks like $$F_2 = {1 \over \sqrt{2}}\matrix{1 & 1 \\ 1 & -1}.$$ We can write this as $$\sqrt{2} F_2 = \matrix{1 & 0 \\ 0 & -1} + \matrix{0 & 1 \\ 1 & 0} = E_1 + E_2,$$ where $E_1^2 = E_2^2 = I$ and $E_1E_2 = -E_2E_1$. These generate a $2^2$--dimensional Clifford algebra. Note that for this case $F_2^2 = I$. For $4 \times 4$ matrices we have $$F_4 = {1 \over \sqrt{4}} \matrix{ 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i}.$$ Something which has always bugged me is why no--one talks about eigenvectors of a DFT. For the continuous case this is important, but for the DFT it doesn't really seem to make sense. Why? [Edited] Maybe there is an answer in the recursive formulas for the Hermite polynomials. Probably something needs to be said about the Gaussian wave packet. On the real line, the exponential of $-x^2$ somehow makes sense, but on the finite cycle on which the DFT is defined, it does not really. Entry: Square Root Matrix Notation Date: Sun Oct 16 16:25:56 EDT 2005 Type: tex Reading some displacement rank proofs for symmetric matrices, I found the notation to be rather cumbersome. Why not use a ``square root notation'', something like $QAQ^T = Q_A$? So the subscript denotes the signature/quadratic form, and the main symbol denotes the basis transform for the quadratic form. Absence of a signature implies $I$. This gives $\Delta R = I_R - Z_R = G_J$ as the main expression for $Z$--based symmetric displacement structure. The Cholesky decomposition then becomes $I_R = \sum_k l_k$, with $l_k$ the columns of the lower triangular Cholesky factor. Entry: Fisher Information Date: Mon Oct 17 13:15:33 EDT 2005 Type: tex Supposedly, Fisher information is used for continuous variables. But what about the difference between $\int_A p_X(x) \log p_X(x) d x$ and $\log (\int_A p_X(x) d x)$, for the event $A$? The second one clearly doesn't make much sense, but how can we relate mutual information to Bayes' rule then? The answer is rather obvious. Both rules are due to operations on events, instead of one being a consequence of the other. Interesting, but not for now: information geometry[1]: the study of probability and information by way of differential geometry.
%[1] http://en.wikipedia.org/wiki/Information_geometry Entry: Laplace Operator Date: Mon Oct 17 14:15:33 EDT 2005 Type: tex \def\dt{\delta \over \delta t} \def\dx{\delta \over \delta x} \def\d2x{\delta^2 \over \delta x^2} So, onto the relation between time and space. The main player here is the Laplace operator $\sum_k {\delta^2 \over \delta x_k^2}$, with $k$ ranging over all space dimensions. Instead of starting from some physical process which leads to an equation involving this operator, we start by interpreting the operator in several dimensions. For a univariate function it expresses just the second derivative. All this is nice to understand visually, by expressing the effect of the operator on a Gaussian ``packet'', since the operator is linear and most functions of interest can be modeled as a sum of a collection of Gaussian packets. Starting from $p(x) = \exp(-x^2/2)$, we have $Dp(x) = -x p(x)$ and $D^2p(x) = (x^2-1)p(x)$. What we have here is the Rodrigues formula for the Hermite polynomials, up to a scaling $(-1)^n$. Plotting these functions gives us a good idea of which way the push goes. For multivariate Gaussians the behaviour is essentially the same. We could start from $\int p(x) d x$ if we want to see the effect on a step-like scalar field. Where does this bring us? Take for example the diffusion equation $Lp = {\delta p \over \delta t}$, which equates the rate of change over $t$ to the Laplacian. This means concave sections will see a decrease, while convex sections will see an increase. Sections where the Laplacian is zero will be stationary. So the diffusion equation can be seen to ``straighten out'' a scalar field, evolving towards a state with minimal tension. Equating the Laplacian to zero gives harmonic functions. The equation $Lp = 0$ can be solved given boundary conditions. In one dimension, the solutions are just straight lines between the boundary points. In two dimensions, boundaries are one--dimensional, giving rise to a larger family of possible solutions. For the wave equation, the Laplacian does not express the rate of change of $p$, but its acceleration: $Lp = {\delta^2 p \over \delta t^2}$. This means we need extra information about the speed ${\delta p \over \delta t}$ if we want to interpret what happens for a Gaussian wave packet. In short, we need to know if it's moving. Standing still, a packet will split. When moving with the wave velocity, a packet will keep its shape. Question. Are the nonlinear waves produced in the soapbubble pdp patch solitons? Intermezzo. An interesting pattern. Orthogonal polynomials on the unit interval $[0,1]$ with weight function $w(x)=1$ (Legendre), on the positive real line $\R^+$ with weight function $w(x)=e^{-x}$ (Laguerre), and on the whole of $\R$ with weight function $w(x)=e^{-x^2}$ (Hermite). I wonder what this has to do with the Riemann mapping theorem. The 3 intervals above are basically the analogs of the unit disk, the upper half plane and the complex plane.
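A small finite difference sketch I added to check this reading (grid, time step and initial packet are arbitrary choices): convex regions of the field go up, concave regions go down, and the Gaussian packet flattens out.

\begin{verbatim}
import numpy as np

x = np.linspace(-5, 5, 201)
dx = x[1] - x[0]
p = np.exp(-x**2 / 2)                           # Gaussian packet

def laplacian(u, dx):
    l = np.zeros_like(u)
    l[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
    return l

dt = 0.4 * dx**2                                # stable explicit step
for _ in range(2000):
    p = p + dt * laplacian(p, dx)               # dp/dt = Lp

print(p.max())    # the peak has dropped well below 1: the packet spreads
\end{verbatim}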
Entry: Real Numbers Date: Tue Oct 18 11:37:34 EDT 2005 Type: tex Is it so that each irrational number is the limit of a sequence of rational numbers? Sounds very much like my surprise about $e$ in the last entry. Anyway, this is the continued fraction expansion. How did this work again? Continued fractions are motivated by a desire to have a ``mathematically pure'' representation for the real numbers as a sequence $a_k$ of integers. It is generated recursively as follows. With $[x]$ denoting the integer part of $x$, we construct the series as $x_0 = x$, and proceed recursively as $$a_k = [x_k],\qquad x_{k+1} = 1 / (x_k - a_k).$$ For example, the first recursion for $x$ gives $x = a_0 + 1 / x_1$, where $x_1 \geq 1$. Repeating this procedure for $x_1$ gives an infinite series if $x$ is irrational. Truncation of the continued fraction expansion gives in some sense the best rational approximation to an irrational number. Whenever an integer in this expansion is large, the truncation just before it is already a very good approximation. In this sense the ``golden ratio'' $(1+\sqrt{5})/2$ can be seen as the most irrational number, since its expansion consists of all ones.
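A direct transcription of this recursion (floating point only, so the terms degrade after a dozen or so steps); the golden ratio indeed gives all ones:

\begin{verbatim}
from math import floor, sqrt, pi

def continued_fraction(x, terms):
    # a_k = [x_k],  x_{k+1} = 1 / (x_k - a_k)
    out = []
    for _ in range(terms):
        a = floor(x)
        out.append(a)
        frac = x - a
        if frac == 0:            # rational input: the expansion terminates
            break
        x = 1.0 / frac
    return out

print(continued_fraction((1 + sqrt(5)) / 2, 10))   # [1, 1, 1, 1, ...]
print(continued_fraction(pi, 6))                   # [3, 7, 15, 1, 292, 1]
\end{verbatim}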
Entry: Beach Date: Sun Jun 4 21:44:37 EDT 2006 Type: tex Day on the beach. Pascal's triangle modulo 2 and its symmetries. Matrices of symmetric parts of Pascal's triangle. Linear cellular automata and the relation between Boolean algebra and Galois fields. Modulation and splitting fields. Shifts: generated by $x$; Pascal's triangle: generated by $1+x$. Been reading Tony Smith's page about Clifford algebras. He mentions the Clifford product is sort of an extension of the set--theoretic XOR from sets and subsets to vector spaces and subspaces. A limit can be seen as a (linear) operator mapping a sequence of things to a thing. So when does this operator commute with other things? When is $f(\lim_{n\to\infty} s_n) = \lim_{n\to\infty} f(s_n)$, for example? My guess would be that $f$ needs to be continuous. What's the difference between algebra and analysis? The concept of limit. Where does this fit in the construction of the real numbers? The real numbers are such that taking the limit of a convergent sequence is a closed operation. This is not true for the rational numbers, e.g. $e = \sum_k 1/k!$ is not a rational number. For the real numbers, every Cauchy sequence is convergent. (Now, for what spaces is this not true?) This allows one to talk about continuity, differentiation, integration, \ldots All of them defined in terms of limits. Construction of the reals is done axiomatically by imposing Dedekind--completeness: every non--empty subset of $\R$ with an upper bound in $\R$ has a least upper bound in $\R$. This is not true for the rationals, e.g. $S = \{x : x^2 < 2\}$. Something I didn't know is that most real numbers are not computable, because there are only countably many algorithms, or even definable, because the set of all logical formulas is countably infinite. So all this madness is in some sense the result of the fact that every natural number has a successor, and we somehow want to look at the behaviour of the ``last'' one. When we do this for sequences, a.k.a. maps from $\N$ to $\R$, we can give the image of this last element a name, and call it the limit of the sequence. For maps from $\N$ to $\Q$ this is not always possible, hence the need for $\R$. Dedekind--completeness is simply the easiest way to impose a property that makes us see the ``horizon'', being the supremum of a set. Entry: Measure Theory Date: Wed Oct 19 10:30:36 EDT 2005 Type: tex So, what is Lebesgue integrability, and why should I care? Probably because of its relation to distributions. First comment: Lebesgue integration interacts better with limits of sequences of functions, and as such is important in Fourier analysis. [ It bridges $\mathbb{N}$ and $\mathbb{R}$, or the discrete and the continuous. ] This brings us first to \emph{measure theory}, which discusses the notion of \emph{length}. We will work with the Lebesgue measure on $\R$ and $\R^n$, which assigns a length or $n$--volume to subsets of the line or $n$--dimensional Euclidean space. An important concept is a \emph{null set}, which is a set with measure zero. Countable sets are null sets, as are sets with dimension smaller than $n$. The integral is built starting from the measure, following a path through indicator functions, simple functions, non--negative functions, signed functions and complex valued functions. Intuitively, the difference between Riemann integration and Lebesgue integration can be understood as chopping up the domain---evaluate a function on a grid---versus the codomain---sum areas of contour maps. An important property of Lebesgue integration is that functions $f$ and $g$ which differ only on a null set (for instance a countable set) have the same integral. In other words, $f$ and $g$ are equal \emph{almost everywhere}. Now, all this is again very theoretical. I still don't see why in practice there is anything ``useful'' about Lebesgue integration. However, it does seem to be very necessary to prove things about convergence of series and all sorts of tricks involving different kinds of infinities, or in other words things which have to do with $\R$ not being countable. Supposedly it has something to do with the difference between Hilbert spaces (energy, norm from inner product) and Banach spaces. Important applications are probably the $1$ and $\infty$ norms related to control and systems theory. So, what is a Banach space? It is the generalization of the completeness of $\R$ to vector spaces, in a way that generalizes the concept of \emph{distance} to \emph{norm}. Completeness is imposed by requiring that every Cauchy sequence of vectors has a limit. So a Banach space is a complete normed vector space. Sequences and Cantor's diagonal argument. Some detail I got hung up on: representation of real numbers as continued fractions, which can be made unique, and Banach spaces of bounded sequences. For the continued fractions, the values are taken in $\N$. How does this relate to the cardinality of the powerset of $\N$? This brings us to the next question. Disentangle measure, norm, metric, quadratic form, 2--form, inner product. Entry: Progress.. Date: Thu Oct 20 15:53:21 EDT 2005 Some small dose of matrix Lie groups, and then on to trace and determinant. Been randomly reading a bit until I got to multilinear algebra. The thing that strikes me is ``Clifford algebra'' [2] and so many related things on the bottom of [1]. I didn't get anywhere with this. But I did prove the generalized Schur algorithm. There is one question though. How to solve a linear system on the fly without saving the Cholesky coefficients? Another random note, prefixed with a referral to a Feynman quote about always trusting that life will throw good things at you as long as you stay somewhere long enough. [1] http://en.wikipedia.org/wiki/Multilinear_algebra [2] http://en.wikipedia.org/wiki/Clifford_algebra Entry: Spinors Date: Fri Oct 21 06:08:52 EDT 2005 Type: tex Why are spinors not used more often in signal processing? To answer that question, we have to find a way to explain what spinors really are. As far as I can see right now, the essence is about representing a vector (a one--dimensional thingy) as a matrix. So, instead of looking at the transform $Qx$, we look at $QXQ^{-1}$. This is a double cover representation. It brings us straight to Clifford algebras and higher order tensors. For example, take a $16$--component vector.
The spinor is the $Q$. If $X$ is a vector, $Q$ is a $128$--dimensional even grade element called a spinor. One example of this is representing SO(3) by SU(2), its double cover. %% Now, can we do something with this? My immediate question would be, %% what are the analogs of the matrix decompositions for this double cover %% representation, and what can we say about displacement rank? %% \section{Sat Oct 22 12:24:07 EDT 2005} %% Been reading some subspace signal processing and identification stuff. %% Something I should have done way earlier, really. Anyways. Books %% arrived, so the tensor--sumsin estimation stuff will have to wait: %% Papy and Delathauwer. Entry: Unstructured remarks Date: Sun Oct 23 09:24:55 EDT 2005 Type: tex Some of yesterday's notes. Pad\'{e} approximation gives rise to a set of equations with displacement structure. It's no surprise that there exist recurrence relations. Bernoulli's method for finding the largest root of a polynomial and the power method for finding the largest eigenvalue are closely related. Sequences of polynomials give rise to triangular coefficient matrices. Sturm--Liouville problems, which are Hermitian $2$nd order linear differential equations, give orthogonal functions. Examples are all the orthogonal polynomials, and the Bessel functions. They are the analog of $(A-\lambda B)x = 0$, with $A,B$ symmetric and $B$ positive definite. The latter equation produces eigenvectors $x$ that are $B$--orthogonal. Real orthogonal polynomials have recurrence relations that go two steps back, while complex ones go only one step back. This is closely related to oscillatory behaviour being possible only for $2$nd degree real difference equations, but already for $1$st degree complex ones. Today's reading brings me to Green's functions. If I am correct, it is about inverting an operator (differential) equation like $Ly = f$, by transforming it to an operator equation (integral) like $y = Kf$. A special case (?) is when $L$ is Hermitian. This gives a countable spectrum with orthogonal eigenfunctions that span the Hilbert space. This is analogous to the orthogonal decomposition of a Hermitian matrix. The Green function can then be written as $K(x,x') = \sum_k \lambda_k^{-1}\phi_k(x)\overline{\phi_k(x')}$. Entry: Derivative of the Inverse Function Date: Mon Oct 24 16:25:02 EDT 2005 Type: tex Inspired by the question: ``Why does $\tan^{-1}$ have such a nice derivative?'' \subsection{Derivative of $f^{-1}$} Why do some inverse functions have such a nice derivative? With $D\{f\}$ denoting the derivative of $f$, we can say this is a consequence of the fact that the derivatives of $f$ and $f^{-1}$ are $D\{f\}$ and $D\{f\}^{-1}$, where the inverse in $f^{-1}$ is with respect to composition, while the inverse in $D\{f\}^{-1}$ is with respect to multiplication. In order for this to yield a simple derivative, we need $D\{f\} = g \circ f$, with $g$ simple. To distinguish the domain and codomain of $f$ we will denote them by $x$ and $y$ respectively. This gives for the above relations $y = f(x)$, $x = f^{-1}(y)$, and $D_y\{f^{-1}(y)\} = 1 / D_x\{f(x)\}$. Following the chain rule it is easy to see this is true, since the latter expression is simply a rearranged version of what we get by differentiating $x = f^{-1}(f(x))$ with respect to $x$. Now with $D_x\{f(x)\} = g(f(x))$ we can write $D_y\{f^{-1}(y)\} = 1 / g(f(x)) = g(y)^{-1}$. What we have expressed here is that $f$ is a solution of the first order differential equation $$y' = g(y).$$ The most famous example is of course $g(x) = x$, which yields $y = \exp(x)$, leading to $D\{\log x\} = 1/x$. Another is $g(x) = -x^2$, which yields $y = x^{-1}$.
There are some more interesting ones among the trigonometric functions. For example $g(x) = 1 + x^2$, which is satisfied by $y = \tan(x)$, and $g(x) = \sqrt{1 - x^2}$, satisfied by $y = \sin(x)$. Note that differentiation maps the inverse with respect to composition to the inverse with respect to multiplication. This can easily be seen geometrically: for $y = f(x)$, exchanging $x$ and $y$ reflects all tangent lines in the $y = x$ diagonal, which means $y = mx$ is mapped to $x = my$, i.e.\ $y = m^{-1} x$. With $m = \tan \alpha$, we have $m^{-1} = \tan(\pi/2 - \alpha) = \tan(\alpha)^{-1}$. There is more of course. We have $f \circ g = f(g(x))$, which is mapped to $(f' \circ g) g' = f'(g(x)) g'(x)$, so composition is mapped to multiplication. More about that later. Entry: Differential equations Date: Tue Oct 25 09:42:39 EDT 2005 Type: tex Let's talk a bit about differential equations. There is no single elegant theory for solving the general problem. In general we can only talk about existence and uniqueness. There is a theorem which states that under certain conditions (Lipschitz), a system of first order differential equations with initial conditions $y_k' = f_k(x, y_1, \ldots, y_N)$ has exactly one solution on an interval containing $x_0$. This is the most general form to which every ODE can be reduced. Finding expressions for a solution as an implicit $F(x,y)=0$ or explicit function $y(x)$ is often not possible. There are however classes of special cases which allow us to obtain such expressions. The simplest class is no doubt linear equations with constant coefficients, which has an elegant mapping to linear algebraic equations through the Laplace transform, resulting from $L\{y'\} = sL\{y\} - y_0$. The constant term is the result of integration by parts of $\int_0^\infty e^{-sx} y'(x) dx$. Note that the expression allows us to recover $y$ from $y'$ and $y_0$. Expressions for higher order derivatives $y_k^{(n)}$ follow through induction. The consequence of this relation is that a set of linear differential equations for the functions $y_k$ can be transformed into an algebraic set of equations linear in $L\{y_k\}$ with coefficients parametrized in $s$. After obtaining a solution in the Laplace domain by solving the algebraic set, it can be transformed back to obtain the $y_k(x)$. % An important property of the Laplace transform is that it transforms % convolutions into multiplications. An important tool for studying linear differential equations with non--constant coefficients is to treat them as operator equations in infinite dimensional vector spaces. They can often be transformed to integral equations using Green's functions. Classifying equations according to order, we can say some things about first order equations by rewriting them in terms of the differential $P(x,y)dx + Q(x,y)dy$, which we hope to be able to integrate. We will be able to do so if the differential is \emph{separable} or \emph{exact}. The former means it is reducible to the form $P(x)dx + Q(y)dy$, where we can integrate the two terms separately. The second method means we can write it as $dF(x,y)$, which can then be integrated. There are several tricks to reduce an equation to one of these two forms. Equations that can be reduced to the first form include special forms of $P(x,y)$ and $Q(x,y)$. If both are homogeneous in $x$ and $y$, they can be reduced to separable form by writing them in terms of $x/y$. The same is possible when both are linear in $x$ and $y$, where we can shift $x$ or $y$ so both become homogeneous.
For the second form we can try to multiply the differential by an integrating factor, which is used for example to reduce a linear first order equation to an exact differential. Another special case is the Bernoulli equation, which can be reduced to a linear equation. For higher order linear differential equations with non--constant coefficients it is possible to reduce an $N$th order equation to a first order one by using information about $N-1$ known solutions. An $N$th order equation explicit in $y^{(N)}$ can be written as a first order system of $N$ equations. There are more exotic tricks of course. It's quite a maze. From silly substitutions to deep connections? To understand a nonlinear differential equation (first order set), it is instructive to linearize it to first degree, to analyze the behaviour of the equation around a certain point $(x_0, y_0)$ as $y' = J_0 (y-y_0) + f_0$, where $J_0$ is the Jacobian of $f$. It is convenient to perform an affine coordinate transform which brings the linear equation to real normal form. Note that in this case we only study the difference from constant motion, since for $y = y_0$ we have $y' = f_0$. We assume the normal form can be decoupled into a $2 \times 2$ block diagonal system, so we can limit ourselves to the $2$--dimensional normal forms. We ignore the case where the normal form has a Jordan block of order larger than $2$. This brings us to $$y' = \matrix{\lambda_1 & 0 \\ 0 & \lambda_2} y$$ in case of real eigenvalues with $\lambda_1 > \lambda_2$. The solution for each component is $y_k(x) = e^{\lambda_k x}$. We call a positive eigenvalue \emph{repulsive} and a negative eigenvalue \emph{attractive}. In case of $2$ attractive eigenvalues, the largest one will dominate behaviour. This can be seen in the example $y_1(x) = e^{-x}$, $y_2(x) = e^{-2x}$. We can rewrite this as $y_2(x) = [y_1(x)]^2$, so points along the eigendirection of $y_2$ will be attracted much faster to the average behaviour (determined by $f_0$) than those along the direction of $y_1$. The effect of this is that we can mostly view the system as behaving one--dimensionally in this region, with dynamics determined by the eigenvalue $\lambda_1$. When both are repulsive, a similar thing happens. For example $y_1(x) = e^{2x}$, $y_2(x) = e^{x}$. Here $y_1(x) = [y_2(x)]^2$, which explodes much faster than $y_2(x)$, so again, we can view the system as essentially one--dimensional. If there is one positive and one negative eigenvalue, the same reasoning applies, only more strongly. The system behaves as an explosion along the direction of $y_1$, while it is compressed in the direction of $y_2$. So the general conclusion is that when we have a collection of real eigenvalues, the largest one will be dominant. When there are complex conjugate eigenvalues, we get $$y' = \matrix{r & -\omega \\ \omega & r}y = ry + \omega y_{\perp},$$ where $y_{\perp}$ denotes a $90$ degree counterclockwise rotation. This leads to a \emph{rotation} combined with an attraction or repulsion. If $r < 0$, the system will be attracted to the average behaviour of $f_0$. When $r > 0$ it will be repelled. In addition, there is a rotation (oscillation) around the average behaviour, of which the frequency is determined by $\omega$. Again, comparing $r$ to the real parts of the other eigenvalues shows which are dominant. The degenerate case $$y' = \matrix{\lambda & 1 \\ 0 & \lambda} y$$ is a bit more special. A solution would be $y_2 = e^{\lambda x}$, $y_1 = x e^{\lambda x}$, which gives $y_1 = y_2 \log(y_2) / \lambda$.
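A quick sketch (the function name and the example Jacobians are mine) that recovers this qualitative picture directly from the eigenvalues of $J_0$: real eigenvalues give attracting or repelling nodes or a saddle, complex ones give a spiral with rotation rate $\omega = \mathrm{Im}\,\lambda$.

\begin{verbatim}
import numpy as np

def classify_fixed_point(J):
    # Qualitative behaviour of y' = J (y - y0) near the fixed point.
    lam = np.linalg.eigvals(J)
    if np.allclose(lam.imag, 0):
        lam = np.sort(lam.real)
        if lam[-1] < 0:
            return "attracting node, dominated by lambda = %.2f" % lam[-1]
        if lam[0] > 0:
            return "repelling node, dominated by lambda = %.2f" % lam[-1]
        return "saddle: expansion along one direction, compression along the other"
    r, w = lam[0].real, abs(lam[0].imag)
    kind = "attracting" if r < 0 else ("repelling" if r > 0 else "neutral")
    return "%s spiral, rotation rate omega = %.2f" % (kind, w)

print(classify_fixed_point(np.array([[-1.0,  0.0], [0.0, -2.0]])))  # attracting node
print(classify_fixed_point(np.array([[-0.1, -2.0], [2.0, -0.1]])))  # attracting spiral
print(classify_fixed_point(np.array([[ 2.0,  0.0], [0.0, -1.0]])))  # saddle
\end{verbatim}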
% DAMN. Loop quantum gravity: 4D -> 2D for small scales.. this smells % like dominance of diff eq! Entry: Tensors Date: Tue Oct 25 14:55:42 EDT 2005 Type: tex [ Needs serious weeding. ] \def\notindex{\underbrace{\ldots}} \def\t{\mathcal} Tensors for dummies: tensors as grids of numbers. Working just with coordinate blocks, we start with $\R^N$. Transforms from $\R^N \to \R^N$ are matrices $T \in \R^{N \times N}$, which form an algebra. We can multiply $u_{ij}$ and $v_{ij}$ as $$t_{ij} = u_{ik}v_{kj},$$ using Einstein summation. Other kinds of multiplication can be defined, for example the outer product $t_{ij} = u_i v_j$ or the inner product $t = u_i v_i$. The multiplication for matrices is a combination of the two. A coordinate transform $UV\tilde{U}$ can be written as $u_{ik} v_{kl} \tilde{u}_{lj}$. When $\tilde{U}$ is $U^T$ this becomes $u_{ik} v_{kl} u_{jl}$, which makes it clear that both rows and columns of $v_{ij}$ will be transformed by $u_{ij}$. We can multiply a generic tensor with a matrix along a certain dimension as $$t_{\notindex_{k-1}\, j\, \notindex_{K-k}} = t_{\notindex_{k-1}\, i\, \notindex_{K-k}}\, u_{ij},$$ which performs a coordinate transform on the $k$--th coordinate array. We can write this in matrix notation as $\t{T} \times_k U$. To talk about subspaces generated by tensors we first make some observations. We call $N$ the dimension of the vector space generating the tensor space, and call $K$ the degree (also called the order, or sometimes confusingly the rank) of the tensor, which indicates the number of grid coordinates. The dimension of the space generated by all $N$--dimensional $k$--mode vectors can never be larger than $N$, while the number of possible vectors is $N^{K-1}$. The dimension of the space generated by all $N^2$--dimensional $(k,l)$--mode matrices can never be larger than $N^2$, while the number of possible submatrices is $N^{K-2}$. This indicates there is something special going on for $K = 2^k$. For $K=2$ there are two $N$--dimensional spaces generated: the column space $(1)$ and the row space $(2)$. For $K=4$ there are six $N^2$--dimensional spaces generated: $(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)$. Some like to define the rank of a $K$--tensor $\t{T}$ as the cardinality of the minimal set of rank $1$ tensors that sum to $\t{T}$. A rank $1$ tensor is the outer product of $K$ vectors. So, for a $4$--tensor of rank $R$ we have $$\t{T}_{ijkl} = \sum_r^R u_{i}^{(r)} v_{j}^{(r)} w_{k}^{(r)} x_{l}^{(r)}.$$ The SVD can be generalized to tensors. This leads to orthogonal tensors. But first, some weird shit. Take all tensors to be of dim $N^K$. There is a fundamental difference between even and odd $K$. Vectors $(K=1)$ are transformed by matrices $(K=2)$. We cannot multiply vectors and obtain vectors; we can only obtain scalars or matrices. But we can do this with matrices. Does the same thing generalize to higher $K$? Let's define some products. To multiply the $K_a$--tensor $\t{A}$ with the $K_b$--tensor $\t{B}$ we need to choose a subset of indices of both, identify, and sum. We already know what happens when we combine $K \in \{0,1,2\}$. So let's look at $K=3$. The multiplication $(3,0)$ is trivial: $t_{ijk} = u_{ijk} v$. The next one is $(3,1)$, for which we have several possibilities $t_{ij} = u_{ijk} v_k$, where we can choose $3$ different dimensions to multiply using an inner product. Using an outer product we get $t_{ijkl} = u_{ijk} v_l$. There are $4$ possible ways to do this. The resultant $K$s are $\{4,2\}$. The next combination is $(3,2)$.
An outer/outer product gives a $5$--tensor, an inner/outer product gives a $3$--tensor, while an inner/inner product gives a $1$--tensor. A $(3,3)$ combination gives an o/o/o product $6$, an o/o/i product $4$, an o/i/i product $2$, and an i/i/i product $0$. Let's move to $(4,0)$, which gives $4$. Then $(4,1)$ gives $\{5,3\}$, and $(4,2)$ gives $\{6,4,2\}$. Note a pattern here. Combining two even $K$s gives an even $K$. Combining two odd $K$s gives an even $K$. Combining an even $K$ and an odd $K$ gives an odd $K$. We could call the even $K$s \emph{transforms}, because they have the ability to leave $K$ the same. For example take $4 \times 2 \to 2$, or $t_{ij} = u_{ijkl} v_{kl}$. There are $\binom{4}{2} = 6$ possible ways of writing the product without changing the order of the indices. (Compare this to $t_{i} = u_{ik} v_k$, where there are $\binom{2}{1} = 2$ ways of writing the product, corresponding to $t = U v$ or $t = U^T v$.) If we choose $\t{U}$ to be separable, $u_{ijkl} = c_{ik} r_{jl}$, we get the ordinary two--sided matrix transform $C V R^T$, where $C$ operates on the rows and $R$ on the columns. % I do wonder why that $2^k$ structure doesn't show up anywhere. Entry: Central Limit Theorem Date: Tue Oct 25 20:20:41 EDT 2005 Type: tex Using characteristic functions, the proof of the central limit theorem is based on the fact that the characteristic function of a sum of independent variables is the product of their characteristic functions, which gives a relation like $$\lim_{n\to\infty}\left(1 + {x \over n} + o({x\over n}) \right)^n = e^x,$$ where $f \in o(x/n)$ means asymptotically negligible, or $\lim_{n\to\infty} n f(x/n) = 0$. More specifically, $x = -{t^2 \over 2}$ for a standard normal distribution, of which the right hand side is the characteristic function. Never knew it was so simple. Entry: Tensors and Displacement Rank Date: Wed Oct 26 08:49:13 EDT 2005 Type: tex [Probably a dead end.] So, what to do with tensors and displacement rank? First, we have to do this to solve inverse transforms. There is little value in computing a triangular (tetrahedral?) decomposition without using it to decouple spaces. Once this is cleared up, I think it is relatively straightforward to generalize the Schur algorithm to higher dimensions. Another thing I was thinking about: what about fractional $Z$? If we have a structured matrix with fractional delay structure $R - Z_f R Z_f^T$, with $0 < f < 1$, does this make sense? One more thing: how to keep track of covariant/contravariant indices when they're all on the same level. I understand that when using a certain signature/metric, inner products need to pass through the metric tensor. But with ordinary coordinate tensors, the inner product is implicitly defined. Does this only matter for defining the products? I.e. if you take care to define everything well (a coordinate transform is 1=contravariant, 2=covariant), there is nothing to worry about? Maybe it is a good idea to review the standard tensor transform rules and also have a look at the index--free approach, which I guess mostly deals with the grading of tensor spaces? Also it would be nice to finally figure out if it is possible to model (unitary) signals as spinors, or at least have a decent idea of what a spinor actually is, other than something that changes sign under a $2\pi$ rotation.
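A small einsum check of this separable case (dimensions and names are my choice): with $u_{ijkl} = c_{ik} r_{jl}$, the contraction $t_{ij} = u_{ijkl} v_{kl}$ is exactly the two--sided matrix transform, and the mode--$k$ product from the previous entry is the same kind of contraction.

\begin{verbatim}
import numpy as np

n = 4
C = np.random.randn(n, n)
R = np.random.randn(n, n)
V = np.random.randn(n, n)

U = np.einsum('ik,jl->ijkl', C, R)      # separable 4-tensor u_ijkl = c_ik r_jl
T = np.einsum('ijkl,kl->ij', U, V)      # t_ij = u_ijkl v_kl
print(np.allclose(T, C @ V @ R.T))      # True: C acts on rows, R on columns

# a mode-1 product T x_1 M is the same kind of contraction:
A = np.random.randn(3, 4, 5)
M = np.random.randn(3, 6)
B = np.einsum('ijk,il->ljk', A, M)      # transform the first coordinate array
print(B.shape)                          # (6, 4, 5)
\end{verbatim}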
Entry: Particle Music Date: Wed Oct 26 13:47:18 EDT 2005 Type: tex A knot in 3D is a map from the complex unit circle to Euclidean 3--space $$k : \delta \D \to \R^3 : \theta \mapsto (t,x,y).$$ This can be seen as the world line of a 3D spacetime ``fermionic vacuum fluctuation''. Meaning, if you take one of its coordinates to be time, for example the first one, and intersect it with the current time sheet $\tau = \{(t, x, y) : t=t_0\}$, you get a collection of moving/creating/annihilating particles and anti--particles. This could be nice to map to musical structures. Especially creation/annihilation is musically interesting. \def\p{\oplus} \def\m{\ominus} At a creation point, the two particles (roots) emerging could be labeled $\p$ and $\m$ depending on the direction of the curve. Let's call a $\p$ particle a root that follows the increasing part of the curve, and a $\m$ particle the decreasing part. This shows that annihilation is always between a $\p$ and a $\m$ particle. Now, what I thought was that nontrivial knots reflect recombinations of particles. But this is not the case. The trivial knot can be deformed into a fluctuation diagram that exchanges particles. So what exactly is expressed by the ``tangling of world lines''? Some more comments. For knots, which are 1D entities, it is not possible to have interactions other than creation and annihilation, since other interactions would require intersections. For 2D manifolds embedded in 3D this is possible. Before the collision, an intersection contains 2 circles. The moment of collision gives a ``twisted circle'', due to the local saddle point structure. After the collision, both fuse into a compressed circle. An interesting set of knots are the $(p,q)$ torus knots. Actually\ldots All this can be nicely cast into real/complex roots of polynomials. We could have real and imaginary particles. Take any polynomial $$f(x) = \sum_{k=0}^N a_k x^k,$$ and compute its factorization. The real roots correspond to particles. The complex roots could be seen as ``points in harmony'', since they come in complex conjugate pairs that can give birth to closely spaced real roots, which we can associate with repulsion due to inharmonicity. In case $N$ is odd, sweeping $f(x) = c$ from $-\infty$ to $+\infty$ will start out with one particle, and end with one. If $N$ is even, we start out with nothing and end with $2$ if $a_N$ is positive, or vice versa if $a_N$ is negative. To get as many particles as possible during the sweep, it is interesting to use oscillatory polynomials. Other oscillatory functions could be used, but polynomials are easier to work with. We will find good candidates amongst the orthogonal polynomials. Let's concentrate on those from the Sturm--Liouville problem. % this is very similar to the MI paper about algebraic systems
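A rough numerical sketch of this sweep (the choice of polynomial, an oscillatory Chebyshev one, and the sweep range are mine): count the real roots of $f(x) - c$ as the level $c$ moves, and watch particle pairs appear and disappear.

\begin{verbatim}
import numpy as np
from numpy.polynomial import chebyshev as cheb

# an oscillatory polynomial: T_5, converted to monomial coefficients
f = cheb.cheb2poly([0, 0, 0, 0, 0, 1])       # lowest degree first

def real_roots(coefs, c):
    p = coefs.copy()
    p[0] -= c                                # roots of f(x) - c
    r = np.roots(p[::-1])                    # np.roots wants highest degree first
    return np.sort(r[np.abs(r.imag) < 1e-9].real)

for c in np.linspace(-1.2, 1.2, 7):
    print("c = %+.2f  ->  %d particles" % (c, len(real_roots(f, c))))
# 1 particle for |c| > 1, then 5 once the level cuts into the oscillations
\end{verbatim}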
Entry: Chebyshev polynomials Date: Wed Oct 26 15:50:48 EDT 2005 Type: tex Something I learned yesterday reading Conway's Course in Functional Analysis. It's obvious in retrospect, but there's a theorem stating that Hilbert spaces whose orthonormal bases have the same cardinality are isomorphic. This is quite obvious for the Chebyshev polynomials on $[-1,1]$, which are orthogonal w.r.t.\ the weighting function $(1-x^2)^{-1/2}$ and have the property $$T_n(x) = \cos(n \cos^{-1}(x)).$$ With $\theta = \cos^{-1}(x)$, or $\cos\theta = x$, we write $T_n(\cos\theta) = \cos(n\theta)$. The mapping $x\mapsto\theta$ maps the interval $[-1,1]$ to $[\pi, 0]$. This mapping is invertible. It is a Hilbert space isomorphism because it preserves the inner product $$\int_{-1}^{+1}f(x)g(x)(1-x^2)^{-1/2}dx = \int_0^{\pi} f(\cos\theta)g(\cos\theta)d\theta.$$ This mapping maps the orthogonal cosine basis $\cos(n\theta)$ on $[0, \pi]$ to the $T_n$. Chebyshev polynomials satisfy the differential equation $$(1-x^2){d^2 y \over dx^2} - x {dy \over dx} + n^2 y = 0,$$ $${d^2 y \over d\theta^2} + n^2 y = 0,$$ and the recurrence relation $$T_n(x) = 2xT_{n-1}(x) - T_{n-2}(x),$$ $$\cos(n\theta) = 2\cos \theta \cos[(n-1)\theta] - \cos[(n-2)\theta],$$ where the bottom expression is obtained after the substitution $x=\cos\theta$. These polynomials have equiripple properties, which makes them show up in several minimax problems in filtering and beamforming techniques. One result in this domain is that the Chebyshev grid, determined by the zeros of $T_n(x)$, is optimal for polynomial interpolation, in a minimax sense. Furthermore, evaluation of the polynomial is related to the DCT. Now I wondered, if these have a simple bijection to the cosines, what about the other Sturm--Liouville polynomials? Probably, since I never heard about it anyway, the mapping is terribly simple for the things above, but pretty complex for the other polynomials. Entry: Knots Date: Thu Oct 27 12:02:22 EDT 2005 Type: tex Then knots. I'm still thinking about this particle business above. Whether it's trivial or not. Whether it makes a difference if a certain compact closed curve in $\R^3$ is a knot or not when you scan it to get dancing particles. Maybe after all the polynomial approach is better; at least it's easier to understand. Been toying a bit with the trefoil knot in its torus knot representation, with the slicing plane parallel to the revolution plane of the torus. Intersecting it from below gives $3$ pairs of particles. Suppose the parametrization is such that the $\m$ particles stay on the outside of the torus, while the $\p$ particles move through the hole. The dance they make is a $120$ degree rotation before they recombine. Now how is this permutation related to it being a knot or a link, looking only at torus knots/links? Started browsing Arnold's Mathematical Methods of Classical Mechanics. Very interesting book. Of course, my main interest right now is the magic word ``symplectic'' and differential geometry. So a little about that. I've edited it out and moved it to tomorrow's entry. Entry: Differential Geometry Date: Fri Oct 28 11:12:30 EDT 2005 Type: tex A new book on differential geometry by Erwin Kreyszig arrived today. The funny thing about the book is that, while all my orders from Amazon arrived in perfect state, this one was curved! Looking at it, it seems to not use the intrinsic view. It talks about embedded surfaces, so this is fundamentally different from the approach used in Arnold's book, which talks about differential forms, so it's not going to help much with my ``explanation'' of intrinsic differential geometry. Arnold's is quite something. Have to get used to the different kind of style. He's shooting out concepts at a rate you can't believe. But, nevertheless, very interesting. Especially the very basic geometric things about phase space I never really thought about. All in the introduction, but I have to take time to absorb these ideas in a thorough way. One thing I noticed about reading about differential forms is the importance of the determinant, since a $k$--form can be expressed as the determinant of $k$ one--forms.
So one can say, if the inner product relates to angle, the determinant relates to area/volume/\ldots. Today I also got Shilov's book on linear algebra. It starts with determinants. The last chapter is about categories. I just had a brief look at it. It talks about the concepts of algebra and linear family, which after choosing a basis basically describe the algebraic structure of non--square matrices. Have to take some time to see what extra insight this brings. To my knowledge, embedding an $M$--dim manifold in an $N$--dim Euclidean space is always possible when $N$ is large enough (the Whitney embedding theorem gives $N = 2M$). I do not know if it is possible to write it as a solution set of $N-M$ algebraic equations. The latter is somehow very interesting to me, because I see a direct way of plugging in some numerical linear algebra things to use for animation and the study of motion, hence my interest in Arnold's book. Anyway, I'll give Kreyszig a try, and leave the abstract differential geometry for later when I can relate the two. % For an implicit function $F(x_1, \ldots, x_N)$ I do recall something % like $dF = \sum_{k=1}^N {\pd F \over \pd x_k} dx_k,$ from which the % derivative of th \subsection{Differential Geometry} I start with an intuitive introduction of the concepts. We'd like to explore a manifold $M$ and functions defined on it using tools from differential and integral calculus. % * manifold : maps charts atlas (man = class of equivalent atlasses) % * derivative % * differential When we want to study a function $f$ defined on the Euclidean space $\R^N$, we often resort to its linear approximation around the point $x'$, which is given by $$f(x) \approx f_{x'}(x) = f(x') + \sum_{k=1}^N \left. {\pd f \over \pd x_k}\right|_{x'}(x_k - x_k').$$ This function is the equation of the tangent line/plane/volume/\ldots of $f$ at $x'$. Since the function $f_{x'}(x) - f(x')$ is a linear function of $x - x'$, it is a vector living in the dual space of the vector space $\R^N$, which enables us to use linear algebra tools to study the behaviour of $f$. Interpreting $f_{x'}(x) - f(x')$ as a function of $x$ and $x'$ gives a bivariate function called the \emph{differential}. It returns a linear functional for every point $x'$ in $\R^N$. There is no need to stop at linear approximations. Indeed, there are a lot of tools which study $f$ using constant, linear and quadratic terms from its Taylor series. This is all very useful. But how can we carry over these methods to a manifold $M$ which is not isomorphic to $\R^N$? It has to look somewhat like $\R^N$ if we want to use our known tools, so we suppose it is at least locally isomorphic, meaning there is an invertible map called a chart $\phi : M \to U$ from a neighbourhood in $M$ to a subset $U$ of $\R^N$. Since the codomain of $\phi$ is the space of $N$--tuples, we can identify each of its components as $x^k : M \to \R$, and call them \emph{coordinate maps}. If the domains of two such charts $\phi_n$ and $\phi_m$ overlap, we call the function $\phi_n \circ \phi_m^{-1} : \R^N \to \R^N$ a transition function, and require it to be smooth on the overlapping part. A collection of overlapping charts is called an \emph{atlas}. % transition functions % Jacobian of transition functions This structure allows us to relate $M$ to $\R^N$ locally. We now define what a tangent vector is, without assuming there is any algebraic structure on the manifold $M$, which we would have if it were embedded in some higher dimensional Euclidean space.
In the embedded view, a tangent vector at a point $p$ can be used to partition all the curves passing through $p$. In the intrinsic view, we can turn this around and define a tangent vector as the equivalence class of all curves which have a similarity at $p$. For two curves $\sigma, \tau : \R \to M$ to be similar we require $p = \sigma(0) = \tau(0)$ and $${d x^k(\sigma(t)) \over d t} = {d x^k(\tau(t)) \over d t},\qquad k = 1,\ldots,N,$$ where the $x^k$ are the coordinate maps of some chart\footnote{If it is true in one chart, it is true in any chart due to the existence of transition functions: the partial derivatives in one chart transform linearly to the partial derivatives in another chart through the Jacobian of the transition function.}, and the derivatives are evaluated at $t=0$. We denote all the curves similar to $\sigma$ as the equivalence class $[\sigma]$. This defines a partition which behaves as a vector space\footnote{We can define vector scaling $\lambda [\sigma(t)] = [\sigma(\lambda t)]$, and addition as $[\alpha] + [\beta] = [\sigma]$, where every representative $\sigma(t)$ of this class satisfies ${d x^k(\alpha(t)) \over d t} + {d x^k(\beta(t)) \over d t} = {d x^k(\sigma(t)) \over d t} ,\qquad k = 1,\ldots,N$ in some coordinate chart.} which we will denote as $TM_p$. The differential at a point $p \in M$ is then defined as the linear map $$df_p : TM_p \to \R : [\sigma] \mapsto {d f(\sigma(t)) \over d t},$$ where $\sigma(t)$ is any representative of the equivalence class $[\sigma]$. The differential $dx_p^k$ is then defined as the functional for which $dx_p^k([\chi^j]) = \delta_{jk}$. In other words, $[\chi^j]$ is the class of curves for which ${d \over dt} x^k(\chi^j(t)) = \delta_{jk}$. This defines a basis/dual basis pair for $TM_p$ and $TM_p^*$. The derivative of $f$ along a vector $[\sigma]$, which takes the role of a partial derivative if this vector is one of the basis vectors of $TM_p$, is defined as ${d \over dt}f(\sigma(t))$, where $\sigma(t)$ is any representative of $[\sigma]$. So the differential maps a vector $[\sigma]$ to the derivative of $f$ along $[\sigma]$. Interpreting the point $p$ as a variable, we obtain a function $df : TM \to \R$ defined on the tangent bundle $TM = \bigcup_p TM_p$. This is basically the same as the one--dimensional Euclidean version, where the differential of a function $f(x)$ in one independent variable $x$ can be written as $$df : \R^2 \to \R : (x,\Delta x) \mapsto f'(x) \Delta x.$$ The $\Delta x$ can be seen to live in $TM_x$, while $x$ lives in $M$, which makes $(x,\Delta x)$ an element of $TM$. Entry: Curves in 3D Date: Sat Oct 29 13:21:55 EDT 2005 Type: tex Started reading Kreyszig's book on differential geometry. It starts with curves. The central expression to remember is Frenet's formulas $$\matrix{ \dot{t} \\ \dot{p} \\ \dot{b} } = \matrix{ 0 & \kappa & 0 \\ -\kappa & 0 & \tau \\ 0 & -\tau & 0 } \matrix{ t \\ p \\ b }, $$ \def\SO{\text{SO}} where $x' = {d x \over d t } = \dot{x} \|x'\|$. So $\dot{x} = { d x \over d s }$, where $s$ denotes arc length. This gives the local orthonormal set of vectors consisting of the unit tangent $t = \dot{x}$, the principal unit normal $p = k / \kappa$ with $k = \dot{t}$, and the binormal $b = t \times p$. Note that this matrix is skew--symmetric, making it an infinitesimal generator of a subgroup of the rotation group $\SO(3)$.
Calling the axis of the infinitesimal rotation $d$ we get $$\dot{t} = d \times t, \qquad \dot{p} = d \times p, \qquad \dot{b} = d \times b.$$ This reads as: ``Locally, every curve in Euclidean 3--space behaves as a helix.'' Or in other words, locally every curve goes through a screw motion. This is the approximation of a curve by a 3rd order polynomial parametrization in the arc length. This approximation makes a 3rd order contact with the osculating circle. Entry: Generalized Schur Date: Sat Oct 29 14:21:55 EDT 2005 Type: tex While it is possible to compute the Cholesky factorization of a symmetric positive definite matrix $R$ using the generator representation $R-ZRZ^T = GJG^T$, the result is a dense set of matrices. So, while computation time is $O(N^2)$ instead of $O(N^3)$ for the general case, intermediate storage is still $O(N^2)$ and not $O(N)$ as it is for $R$. Can this be remedied? Is there for example a way to compute the solution, or a number of solutions, without actually storing the triangular factors? The tricky thing here is the forward/backward substitution in solving the linear system $LL^T x = b$. So the formal question becomes: is it possible to translate the problem into some other formulation so the triangular factors can be consumed immediately? Is it at all possible to solve the equations $L^T x = y$ and $L y = b$ by using some kind of ping--pong method between two spaces? \def\I{\widetilde{I}} \def\RRR{\widetilde{R}} \def\L{\widetilde{L}} \def\x{\widetilde{x}} \def\y{\widetilde{y}} \def\b{\widetilde{b}} Using the argument of symmetry, there is absolutely no reason why we should not consider the decomposition $$R = LL^T = \L^T \L,$$ which does not contain the arbitrary choice of lower/upper triangular. The relation between $L$ and $\L$ can be expressed as $$L = \I\L\I,$$ which means the first column of $L$ is equal to the reversal of the last column of $\L$. Since $\I^2 = I$ and $\I\RRR\I = R$ for a symmetric Toeplitz matrix, we can use both representations. This smells like Levinson and the triangular decomposition of $R^{-1}$. Is it? Probably it is, since the Levinson algorithm actually finds a solution to a set of equations. [NOTE: see later about Levinson. This is very confusing.] If $R$ is a symmetric Toeplitz matrix, we can generate both $L$ and $\L$ at the same time; now what does this bring us? We have the sets of equations $$ \left\{ \begin{array}{ccc} L^T x &=& y \\ L y &=& b \\ \L^T x' &=& y' \\ \L y' &=& b \\ \end{array} \right.,$$ where $x = \x$ is the solution to $Rx = b$. Now using $L = \I\L\I$ and $\I y' = \y$, $\I x' = \x$, and $\b = \I b$ this becomes $$ \left\{ \begin{array}{ccc} L^T x &=& y \\ L y &=& b \\ L^T \x &=& \y \\ L \y &=& \I b \\ \end{array} \right..$$ Rearranging some more we get $$L^T \matrix{x & \x} = \matrix{y & \y},$$ and $$L \matrix{y & \y} = \matrix{b & \b}.$$ Can we update this? More about this later. Check Bultheel's paper and Kailath's paper about combining triangularization of $T$ and $T^{-1}$. Note: need to look at Mastronardi's PhD to see how QR and RQ play together. [See Sun Nov 6 12:47:15 EST 2005] Entry: Matrix Computations Date: Sun Oct 30 11:30:08 EST 2005 Type: tex \subsection{Levinson algorithm} Now that I finally understand the Schur algorithm, it is time to return to the Levinson algorithm, where instead of computing a triangular factorization, we update the solution to a special kind of symmetric Toeplitz system.
\def\x{\times} The main properties which are exploited in the Levinson algorithm are the symmetric Toeplitz ``rotate'' property $T = \I T \I$, which relates the forward and backward solutions $Tx = b$ and $T (\I x) = (\I b)$, and the Toeplitz shift property, which can be expressed as $$T_{k+1}\matrix{x_k \\ 0} = \matrix{b_k \\ \x}$$ and $$T_{k+1}\matrix{0 \\ x_k} = \matrix{\x \\ b_k},$$ with $T_k x_k = b_k$. The latter is basically the observation that the submatrices indicated by $\x$ in $$ \matrix{ \x & \x & . \\ \x & \x & . \\ . & . & . } \text{ and } \matrix{ . & . & . \\ . & \x & \x \\ . & \x & \x \\ }$$ are equal. In other words $T_k$ occurs two times in $T_{k+1}$.

Combining these two relations enables us to write a recursive relation between $T_{k} x_{k} = b_{k}$ and $T_{k+1} x_{k+1} = b_{k+1}$ as $$T_{k+1} \left( \matrix{x_k \\ 0} + \rho_{k+1} \matrix{0 \\ \I x_k} \right) = \left( \matrix{b_k \\ \beta_k} + \rho_{k+1} \matrix{\beta_k \\ \I b_k} \right).$$ Whether this relation is helpful to solve linear systems depends on the way $b_k$ and $b_{k+1}$ are related. In other words, if we can solve $$b_{k+1} = \left( \matrix{b_k \\ \beta_k} + \rho_{k+1} \matrix{\beta_k \\ \I b_k} \right)$$ for $\rho_{k+1}$, we can obtain $x_{k+1}$. Note that in this scheme, the first coordinate of $x$ is never updated, so we can choose it to be equal to 1. If we choose all coefficients of $b$ except the top one to be zero, and allow that top element to contain an arbitrary value (i.e. it is not part of the data), we are effectively solving the problem $Tx = -t$, where $t$ is the second row of $T$ and the vector $b$ contains a ``crutch'' to allow this system to be updated. This crutch is called the prediction error in linear prediction, and the set of equations above are called the Yule--Walker equations. The bottom equation will yield $\rho_{k+1}$, while the top relation will compute the next prediction error $\beta_{k+1}$.

Now, the interesting thing is that this method can be extended to solve systems $Tx = b$, with arbitrary $b$, by observing that the problem $Tx = -t$ occurs as the core problem in solving $T_{k+1} y_{k+1} = b_{k+1}$. Introducing the new unknowns $x_k$ and $\gamma_k$, $y_k$ and $y_{k+1}$ can be related as $$\matrix{T_k & t \\ t^T & \tau_k} \matrix{y_k + x_k \gamma_k \\ \gamma_{k}} = \matrix{b_k \\ \beta_{k}}.$$ Using $T_k y_k = b_k$, the top equation gives $T_k x_k = -t$, which can be solved using Levinson recursion, while the bottom one gives $t^T (y_k + x_k \gamma_k) + \tau_k\gamma_k = \beta_k$, which can be solved for $\gamma_k$ using the solution of the first equation.

Now why is this so damn ingenious? The way the symmetry is introduced is rather remarkable. The key to solving the system $Tx=-t$ is to introduce the prediction error. Let's try to re--invent the algorithm. Because $t$ is a column of $T$, we can put it where it belongs and rewrite the equation as $$\matrix{t & T}\matrix{1 \\ x} = 0.$$ Now, to restore full symmetry to the matrix, we can add the top row $$\matrix{t_0 & t^T \\ t & T}\matrix{1 \\ x} = \matrix{\beta \\ 0},$$ giving the prediction error $\beta$. Can we learn something from this? Yes. Exploiting symmetry can be obvious sometimes, but it pops up in strange ways. The self--similarity of the Yule--Walker equations leading to the Levinson recursion is a good example of this: instead of focusing on $Ax=b$ with nearly exploitable symmetry, we can try to see it as part of a larger structure which has better exploitable structure.
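To make the recursion tangible, here is a minimal Python sketch (my own; function and variable names are made up) of the Durbin form of this recursion, solving the Yule--Walker system $T x = -t$ for a symmetric positive definite Toeplitz matrix in $O(n^2)$, checked against a dense solver:

  import numpy as np

  def durbin(r):
      # Solve toeplitz(r[0:n]) z = -r[1:n+1] by the order recursion:
      # append the reflection coefficient alpha, update z with its reversal,
      # and track the prediction error beta.
      r = np.asarray(r, dtype=float)
      n = len(r) - 1
      z = np.zeros(n)
      z[0] = -r[1] / r[0]
      beta, alpha = r[0], z[0]
      for k in range(1, n):
          beta *= (1.0 - alpha**2)
          alpha = -(r[k+1] + r[1:k+1][::-1] @ z[:k]) / beta
          z[:k] += alpha * z[:k][::-1]
          z[k] = alpha
      return z

  # quick check against a dense solver
  r = np.r_[1.0, 0.5 ** np.arange(1, 8)]       # r_k = 0.5^k, positive definite
  n = len(r) - 1
  T = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])
  z = durbin(r)
  print(np.allclose(T @ z, -r[1:]))            # True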
\subsection{Approximate updating}

\def\d{\Delta} Something I ran into yesterday. If $Ax = b$, how can we find the solution to $(A+\d A)x' = b$? Paying attention to which way to factor out $A$, we can write this as $x' = (I + A^{-1} \d A)^{-1} A^{-1}b$, where the parenthesized expression can be written as $\sum (-1)^k (A^{-1} \d A)^k$. Assuming $\| \d A \|$ is small, this series converges quickly. Using the factored form of $A$ this series is easy to evaluate. Another note is that $(1+ax)^{-1}$ has a binomial series expansion if for non--natural exponents we define $$\binom{x}{k} = {x(x-1)\ldots(x-k+1) \over k!}.$$ For more, check [1].

\subsection{Tensor calculus}

Been reading about tensors yesterday. I think I finally get what this covariant / contravariant thing is about. Lesson to learn: go to books instead of Wikipedia for really new things. An important distinction to make with matrices is whether they are used to represent coordinate transforms or quadratic forms. For example, using two contravariant vectors $x^i$ and $y^j$, a quadratic form $\left\langle x, y \right\rangle = a_{ij}x^i y^j$ is represented by a covariant form $a_{ij}$. The linear transformation $y^j = a_i^j x^i$ is represented by a mixed form $a_i^j$. In matrix lingo we can put this as $\bar{x} = R^{-1}x$, which gives the quadratic form $y^T A x = \bar{y}^T R^TAR \bar{x}$, and the linear transformation $\bar{y} = R^{-1}AR \bar{x}$. Both clearly do not transform in the same way under coordinate transform: the first is a congruence transform, while the other is a similarity transform. The missing thing here is of course contravariant matrices $a^{ij}$ which are related to bilinear forms of dual vectors.

%[1] http://www.math.niu.edu/~rusin/known-math/index/65FXX.html

Entry: Rational Approximation
Date: Mon Oct 31 08:43:43 EST 2005
Type: tex

Polynomials in polynomials, or mapping $x^n \mapsto q_n(x)$. The thing that triggered this is Chebyshev--Pad\'{e} approximation from AFCNA p. 304. First, a transform from a polynomial $p(x) = \sum_k^N p_k x^k$ to a new basis $q_n(x) = \sum_k^n q_{nk} x^k$ is a coordinate transform in the vector space of $N$th order polynomials. In general, Pad\'{e} approximation transforms the Taylor series $f(x) = \sum_k^\infty c_k x^k$ of a function to a rational function $a(x) / b(x)$, where $a(x) = \sum_{k=0}^p a_k x^k$ and $b(x) = \sum_{k=0}^q b_k x^k$. There are $p+q+1$ degrees of freedom in the $(p,q)$ approximation. We can safely take $b_0 = 1$ to fix the arbitrary scaling factor of $a(x)$ and $b(x)$. We impose $p+q+1$ constraints by requiring $f(x) - a(x)/b(x)$ to be maximally flat at $0$. This is equivalent to identifying the first $p+q+1$ coefficients in the series $f(x)b(x)$ with those in $a(x)$. In the case where $(p,q)\to(\infty,\infty)$ this leads to the infinite set of equations $$ \matrix{ c_0 \\ c_1 & c_0 \\ c_2 & c_1 & c_0 \\ \vdots & & & \ddots } \matrix{b_0 \\ b_1 \\ b_2 \\ \vdots} = \matrix{a_0 \\ a_1 \\ a_2 \\ \vdots}$$ or $C b = a$. The finite truncation of this gives $C\matrix{1 \\ b}=\matrix{a_0 \\ a \\ 0}$, where $C \in \R^{p+q+1 \times q+1}$ and $a$ and $b$ have coefficients $1\ldots p,q$. Rearranging some more we get $\matrix{C & -I}\matrix{1 \\ b \\ \hline a_0 \\ a} = 0$, which can be cast into $a_0 = c_0$ and the canonical form $$\matrix{L & -I \\ T & 0}\matrix{b \\ a} = {-c},$$ where $c$ contains coefficients $1\ldots(p+q)$, $L$ is a lower triangular Toeplitz (LTT) matrix, and $T$ is Toeplitz. This system is highly symmetric, so it should come as no surprise there are fast algorithms in the style of the Levinson recursion.
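As a concrete illustration of the construction above, here is a small sketch (mine, and deliberately naive: it solves the Toeplitz block for $b$ with a dense solver and then back--substitutes $a$, rather than using a fast Levinson--style algorithm) computing the $(2,2)$ Pad\'{e} approximant of $\exp(x)$:

  import numpy as np
  from math import factorial

  def pade(c, p, q):
      # (p,q) Pade coefficients from Taylor coefficients c[0..p+q]:
      # match the first p+q+1 terms of b(x) f(x) with a(x), taking b_0 = 1.
      c = np.asarray(c, dtype=float)
      # rows k = p+1 .. p+q:  sum_j b_j c_{k-j} = 0   (with c_m = 0 for m < 0)
      C = np.array([[c[k - j] if k >= j else 0.0 for j in range(1, q + 1)]
                    for k in range(p + 1, p + q + 1)])
      b = np.r_[1.0, np.linalg.solve(C, -c[p + 1:p + q + 1])]
      a = np.array([sum(b[j] * c[k - j] for j in range(min(k, q) + 1))
                    for k in range(p + 1)])
      return a, b

  # (2,2) Pade of exp(x) is (1 + x/2 + x^2/12) / (1 - x/2 + x^2/12)
  c = np.array([1.0 / factorial(k) for k in range(5)])
  a, b = pade(c, 2, 2)
  print(a)    # [ 1.   0.5   0.0833...]
  print(b)    # [ 1.  -0.5   0.0833...]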
For Chebyshev--Pad\'{e} approximation, all the coefficients $a,b,c$ are with respect to the basis $\{T_n(x)\}$ of Chebyshev polynomials, instead of $\{x^n\}$. This gives a better approximation in a minimax sense. Question: when doing Chebyshev--Pad\'{e} approximation, you need to start from $c_k$ obtained by projecting $f(x)$ onto the Chebyshev basis, and rearrange it in a matrix. I was thinking whether it would be possible to transform an LTT matrix like $C$ directly into another matrix $\bar{C}$ where $\bar{c}_k = f(c_0,\ldots)$ using some tensor transform. For a diagonal matrix it is simple. Still, a structure--preserving tensor transform seems to me something that does exist and is useful.

%[1] http://mathworld.wolfram.com/PadeApproximant.html

Entry: More..
Date: Mon Oct 31 09:43:43 EST 2005
Type: tex

\subsection{Differential geometry}

After not being able to really concentrate on matrix stuff, and giving up on operators in Hilbert spaces, I turned to Kreyszig again. Defined the tensor $b_{ij}$ containing the curvature information. For curves this is related to $\kappa$, just as the metric tensor $g_{ij}$ is related to the velocity vector. Hence, for curves it is possible to parametrize them in terms of $s$, such that $t = dx/ds$ is a unit vector. For surfaces this is of course not possible, since there are two independent directions, and the ``velocity'' is a tensor. However, it is possible to construct orthogonal parametrizations (but probably not for all surfaces/manifolds?), which at least diagonalize $g_{ij}$.

\subsection{Conjugate gradient}

The conjugate gradient method is an iterative way to solve the system $Ax = b$, in the case that $A$ is symmetric positive definite. This is used mostly for large sparse systems.

\subsection{Nonlinear equations and optimization}

Newton's algorithm for solving the set of nonlinear equations $f(x) = 0$ starts from the first order Taylor approximation at $x'$ given by $$f(x) \approx f(x') + J (x - x'),$$ where $j_{kl} = \left.{\partial f_k \over \partial x_l}\right|_{x'}$ is the Jacobian evaluated at $x'$. From this we determine the iterative method $$x_{k+1} = x_k - J_k^{-1}f(x_k).$$ From this we can derive several modified versions with different convergence properties, by writing them as $$x_{k+1} = x_k + t_k H_k f(x_k).$$

Entry: Quantum Calculus
Date: Tue Nov 1 11:17:26 EST 2005
Type: tex

Got another load of books yesterday. Probably the last, or I will exceed my allowed weight. First is a big one: Horowitz and Hill, The Art of Electronics. A classic, I've read. I'm not getting into that right now, takes me too far away from the current topics. Reference book. Then there's Weyl's book, The Classical Groups. Hard core group theory, I've read. Maybe this is a bit over my head. I don't know. Would be interesting to finally read a real math book \verb+:-)+ I also got Golub and Van Loan's Matrix Computations. One I should have gotten a long time ago. The last one is Conceptual Mathematics, a book on category theory, by Lawvere and Schanuel. Something I had to get into. Been reading it yesterday. It's nice, though a bit verbose. Maybe that's a good thing when it gets a bit harder\ldots So, onto pf stuff.

\subsection{Take 1}

Apparently, there's a branch of mathematics called \emph{quantum calculus}. There are two forms, the \emph{h--calculus} and \emph{q--calculus}, which replace the derivative by $${f(x+h)-f(x) \over h}$$ and $${f(qx)-f(x) \over qx-x}$$ respectively. The link between the two is $q = \exp(ih)$. For more info see [1].
%% Maybe the most important hunger I feel at this moment is that for %% justification. Why why why!

\subsection{Take 2}

Really, the quantum $h$--calculus is nothing more than discrete signal processing, where the derivative maps to the difference operator $$\lim_{h\to 0} {f(x + h) - f(x) \over h} \mapsto f_s \left[ f(x + {1 \over f_s}) - f(x) \right],$$ where we usually set $h = 1/f_s = 1$. Now, according to something I read, the more interesting modification is the $q$--calculus, where the derivative is replaced by $${f(qx) - f(x) \over qx - x},$$ with $q = \exp(ih)$, which reduces to the usual derivative when $q \to 1$.

%[1] http://math.ucr.edu/home/baez/week183.html

Entry: Levinson Algorithm
Date: Thu Nov 3 16:46:00 EST 2005
Type: tex

A bit more philosophical. This Levinson algorithm is a typical case of extending a problem to another one, one with higher dimensionality, which, due to symmetry, is easier to solve than the smaller problem. %% This idea also comes back in a more formal form in the theory of %% support vector machines.

The Levinson algorithm can be rewritten as a recursion for the solution of the proper system $Tx = b$, with a fixed RHS $b = e_1$, which is the same as the prediction error formulation, only with the RHS normalized. Dividing $x$ by $x_1$ gives the non--normalized solution. The recursion relations for the normalized system are $$T_{k+1} \left( \alpha_{k+1} \matrix{x_k \\ 0} + \beta_{k+1} \matrix{0 \\ \I x_k} \right) = \left( \alpha_{k+1} \matrix{b_k \\ \eta_k} + \beta_{k+1} \matrix{\eta_k \\ \I b_k} \right).$$ These can be solved by solving the symmetric Toeplitz system $$ \matrix{1 & \eta_k \\ \eta_k & 1} \matrix{\alpha_{k+1} \\ \beta_{k+1}} = \matrix{1 \\ 0}.$$ This clearly has the same form as the two solutions $T_n x_n = e_1$ and $T_{n+1}x_{n+1} = e_1$ it relates.

It is easy to see that linear prediction is basically orthogonalization: find an $x$ that is orthogonal to all rows of $T$ except the first one. Note that $T_n x_n = e_1$, but $T_n (\I x_n) = e_n$, so $a^H T (\I x_n)$ is zero whenever $a_i=0$ for $i \geq n$. This means the set of vectors $y_i, i \leq n$ derived from zero extending $\I x_i, i \leq n$ form an orthogonal basis for the inner product space determined by $T_n$. Suppose $l_i$ is the normalized set derived from $y_i$. This means $l_i^H T l_j = \delta_{ij}$, which gives $$\matrix{l_1 & \ldots & l_n}^H T \matrix{l_1 & \ldots & l_n} = L^H T L = I$$ or $$LL^H = T^{-1},$$ so the $l_i$ are the columns of the Cholesky factorization of $T^{-1}$.

Entry: Matrix Computations
Date: Fri Nov 4 14:46:43 EST 2005
Type: tex

Matrix Computations, Golub and Van Loan. Should have read that book a long time ago, instead of equating the world with the FFT. Anyways, it's interesting. Browsed Mastronardi's thesis and found out what's going on with the Hankel STLS problem. STLS1 solves a linear equality constrained LS problem, while STLS2 uses a (modified) Newton algorithm on the Lagrangian of the original non--linear optimization problem.

\subsection{Generalized Least squares}

Let's first have a look at two generalized least squares problems. The first is the equality constrained least squares problem (LSE), given by $$\min\|Ax-c\| \text{ s.t. }Bx=d.$$ The second is the Gauss--Markov linear model problem (GLM) $$\min\|y\| \text{ s.t. }Ax + By = d.$$ Both can be solved using the generalized RQ (GRQ) algorithm. If $B$ is square and nonsingular we can set $By = z$ where we minimize $\|B^{-1}z\|$, which is a weighted least squares problem.
But before we get into that, we first have a look at some shorthand formulas for matrix differentials. It's easy to go from $f(x)=ax^2$ to $f'(x)=2ax$. What is the analog for matrix expressions? The important thing to realize is that there are several kinds of derivatives, depending on how we mix functions. The simplest one is the differential of the parametrized vector $x(t)$, which is just the vector of differentials of the components of $x$ or $$dx_i = {dx_i \over dt} dt.$$ The dual of this is the differential of the functional $f(x)$, which we can write as an inner product between the gradient and the vector of differentials of $x_i$ or $$df = \sum {\pd f \over \pd x_i } dx_i.$$ For multivalued vector functions, the derivative is the combination of both $$df_j = \sum {\pd f_j \over \pd x_i } dx_i,$$ which we usually write in the shorthand form $df = J dx$, where $df$ and $dx$ are vectors of differentials, containing the differentials of the components, and $J$ is the Jacobian matrix, containing all the first order partial differentials of $f_j$ and $x_i$.

Let's have a look at some special cases. Affine vector function: $d(Ax+b) = Adx$. Bilinear form: $d(y^T A x) = y^T A dx + dy^T A x$. Setting $y = x$ gives the quadratic form (for $A = I$ the 2--norm squared) $d(x^T A x) = dx^T A x + x^T A dx$, which simplifies to $2 x^T A dx$ when $A$ is symmetric. Note that the matrix of a quadratic form in a scalar function can always be taken symmetric, as is the case for the Lagrangian used below. Note that when introducing the Lagrangian, we can usually clean up the equations a bit if we scale the multipliers $\lambda$ by an arbitrary constant, or equivalently, scale the cost function. The main reason for this introduction is to arrive at the Lagrange formulation for constrained optimization for a quadratic function with linear constraints, expressed above.

For the LSE problem we have a Lagrangian $$L(x,\lambda) = \frac{1}{2}(Ax-c)^T(Ax-c) + \lambda^T(Bx-d),$$ where the dimension of the Lagrange multiplier vector is determined by the number of constraints. The scalar differential $dL$ in terms of vector differentials $dx$ and $d\lambda$ is $$dL = dx^T [A^T(Ax-c) + B^T\lambda] + d\lambda^T[Bx-d],$$ where we write the differentials in the quadratic forms on the left so we can group them. This is possible because the terms that we sum here are all scalars. The extremum is reached when $dL = 0$, which means the vectors occurring in the scalar products with $dx$ and $d\lambda$ both vanish. This gives the set $$\matrix{A^T A & B^T \\ B & 0}\matrix{x \\ \lambda} = \matrix{A^Tc \\ d},$$ which clearly gives the unconstrained LS problem when $B=0$.

Doing the same for the GLM problem gives the Lagrangian $$L(x,y,\lambda) = \frac{1}{2}y^Ty + \lambda^T(Ax + By - d),$$ which has a differential $$dL = dx^T(A^T\lambda) + dy^T(y + B^T\lambda) + d\lambda^T(Ax + By -d).$$ Requiring $dL=0$ gives the set of equations $$\matrix{0 & 0 & A^T \\ 0 & I & B^T \\ A & B & 0} \matrix{x \\ y \\ \lambda} = \matrix{0 \\ 0 \\ d}.$$ Since the equation $y = -B^T \lambda$ is trivial we can simplify this to $A^T \lambda = 0$ and $Ax - BB^T \lambda = d$ or $$\matrix{BB^T & A \\ A^T & 0}\matrix{-\lambda \\ x} = \matrix{d \\ 0}.$$ The previous equation with $B=I$ gives an alternative representation of the ordinary least squares system where the errors are explicit.

\subsection{Root loci}

Actually, this is quite interesting. How do roots of polynomials and systems of polynomials move? There's an article in the Mathematical Intelligencer book about this.
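Before going further with the root loci: a small numpy sketch (mine, not from the text) checking that the LSE KKT system derived above reproduces the constrained minimizer obtained by the independent route of eliminating the constraint through the nullspace of $B$:

  import numpy as np

  rng = np.random.default_rng(1)
  m, n, p = 8, 5, 2
  A, c = rng.standard_normal((m, n)), rng.standard_normal(m)
  B, d = rng.standard_normal((p, n)), rng.standard_normal(p)

  # KKT system for min ||Ax - c|| s.t. Bx = d
  K = np.block([[A.T @ A, B.T],
                [B, np.zeros((p, p))]])
  x = np.linalg.solve(K, np.r_[A.T @ c, d])[:n]

  # independent check: particular solution plus the nullspace of B,
  # then an unconstrained least squares problem in the reduced variable z
  x0 = np.linalg.lstsq(B, d, rcond=None)[0]
  N = np.linalg.svd(B)[2][p:].T                 # columns span null(B)
  z = np.linalg.lstsq(A @ N, c - A @ x0, rcond=None)[0]
  print(np.allclose(x, x0 + N @ z))             # True
  print(np.allclose(B @ x, d))                  # constraint satisfied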
Then there's that knot/poly particle idea I had, and the ubiquity of statements about eigenvalues under perturbation of a matrix. Not going to make a list here, but maybe I can move my particle music stuff to matrices instead of polynomials? What is so intriguing about this? It has strange dynamics. %% Probably this is the reason why they say algebraic geometry leads to %% deep waters. Anyway, back to my basic books.

\subsection{Conjugate gradient}

It's a bit more elaborate than I thought. Plus there's a connection to the Lanczos method. Interesting, but somehow not surprising, since both have to do with orthogonalization. I think I understand the basics, after trying the local minimizing gradient method. The idea is to always search in directions $p_k$ orthogonal to $Ap_i$ for $i < k$.

7. empathy
8. morality
9. intuition (re-representation of body state, enables empathy)

These are connected to mindfulness, being present and aware (aware of awareness, attention to intention). Creating those integrated states through practice changes the structure of the brain. States become traits with practice. Effects: change to "approach state" as opposed to "withdrawal state". Immune system functions and several physiological properties improve.

[1] http://www.youtube.com/watch?v=Gr4Od7kqDT8

Entry: Scale feedback
Date: Wed Jun 10 14:25:39 CEST 2009

I worked a lot with wavelets during my time at the uni. Repeatedly I ran into the idea that it might be interesting to figure out what it means to feed information from a higher band into a lower band: letting localized small-scale phenomena influence large-scale ones. I didn't run into any use for this, nor did I encounter this in any formalized form, but it seems to be related to self-similarity, fractals and deterministic chaos. Let's look at the former.

If you look at a nonlinear dynamical system that can exhibit chaotic behaviour, it has the following properties:

- local instability
- global stability

Meaning that if you apply local linearization, there is at least one axis that experiences exponential growth. This growth is then limited on a larger scale by "bending" trajectories back into regions that exhibit local instability.

The initial conditions of a chaotic system can be represented as a finite set of real numbers. Suppose a typical 3-dimensional system. Real numbers can be represented by their binary expansion:

x(0) = .1001001010...
y(0) = .1110011001...
z(0) = .1011001110...

Folding the bits into each other gives a single sequence.

s(0) = .111010011101000010111001101010...

An initial condition thus has a countable number of bits of information. Computing the state vector at any time in the future can be done by integrating a set of differential equations. This can be done in such a way that the error is controllable. When the initial conditions are given with enough precision, it is possible to bound the error of the result and drive it arbitrarily low. (This probably needs some constraints on the continuity of the equation though..) What makes chaotic systems special is that for a given output precision, the number of input bits necessary _grows linearly_ with time. For locally stable differential equations this is not the case. What a chaotic system does is lifting information from the depth of the binary expansion into the observable world. This is essentially scale feedback: information that lives on a small scale is lifted into large scale.
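A toy illustration of this (my own sketch, not from the original entry): the doubling map x -> 2x mod 1 literally shifts the binary expansion one position per step, so a one-bit "measurement" at time t reads off bit t+1 of the initial condition, and the required input precision grows linearly with time.

  from fractions import Fraction

  def bits(x, n):                       # first n binary digits of x in [0,1)
      out = []
      for _ in range(n):
          x *= 2
          out.append(int(x))            # integer part = next bit
          x -= int(x)
      return out

  x0 = Fraction(1, 7) + Fraction(1, 2**20)   # exact arithmetic, no roundoff
  x = x0
  observed = []
  for t in range(24):
      observed.append(int(x >= Fraction(1, 2)))   # one-bit "measurement"
      x = (2 * x) % 1                             # the chaotic map

  print(observed)
  print(bits(x0, 24))                   # identical: small-scale bits surface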
As a matter of fact, the sequence s(0) resembles a _program_ executed from left to right, generating the entire future behaviour. The computer on which it runs is an integrator, which maps a sequence of bits to a sequence of bits, given a time difference (represented as a sequence of bits).

Questions:

* Given a classical example of a chaotic oscillator, what happens when you run it backwards?

Entry: What is a polynomial?
Date: Wed Jun 10 22:23:52 CEST 2009

Might sound silly, but if you look at a polynomial in RPN it's actually easier to understand. The polynomial

  a x^2 + b x + c

is written in RPN as

  0 x * a + x * b + x * c +

which is Horner's rule[1] for polynomial evaluation. (The first x * is there only for symmetry.) This leads to a straightforward geometric interpretation: a polynomial is a function that performs "iterative scaling and displacement". A power series is then the infinite extent of this:

  0 ... x * a_2 + x * a_1 + x * a_0 +

Somehow this makes more sense: zoom-shift-zoom-shift...

Now that's one way of seeing polynomials. Another is by collecting them into a set: the set of all (finite) polynomials is closed under addition, multiplication and composition. Are there any other entities that have this property? Polynomials are functions P : R -> R. The functions + and * can be lifted from R x R -> R to P x P -> P. Functions already have a "native" binary composition P x P -> P. This yields an algebraic structure with 3 binary operations: + * o. What is this structure called? (Addition preserves order, multiplication adds order and composition multiplies order.)

[1] http://mathworld.wolfram.com/HornersRule.html

Entry: Abstraction before Experience
Date: Sat Jun 13 14:08:30 CEST 2009

As abstracted from my own experience: people learn by example. It is futile to try to explain a model before you give any examples of real-world "stuff" that the model is trying to explain. The only people this does work for are mathematicians, since this is basically what they do.

[1] entry://../staapl/20090611-231018
[2] http://thefarmers.org/Habitat/2004/04/you_cant_tell_people_anything.html

Entry: Math dreams
Date: Sat Jun 13 16:31:29 CEST 2009

I did this before: recording weird math intuitions when I was studying some numerical algorithms theory. Mostly stuff related to linear algebra and calculus. Maybe it's a good time to fish up some ideas? Most of it was just consolidation: marveling at finding connections that I'd now label as obvious. One of them I find quite startling still: the importance of causality in triangular decompositions of matrices. The fact that you can factor a linear transform in a forward/backward way still strikes me as odd. Let's see if I can find the post..

EDIT: I started moving stuff from the old .tex blogs to here + I'm working on some way to render .tex to .png on the server.

Entry: Computation, Circuits and Physics.
Date: Sun Jun 14 14:44:11 CEST 2009

(Reworked deleted post from Mon Oct 8 15:53:12 CEST 2007) An attempt to link two ideas:

* The circuit-view of computation: complexity theory talks about size of static boolean circuits instead of run-time of Turing machines. The former has a simpler mathematical structure. Its essential property is that it makes _dependence_ explicit. State machine transitions leave this implicit: produce some state which might be collected later, or might not. Essentially, removing _time_ from the picture makes things simpler. Time is made implicit, no longer ordered. (Dataflow turns ordered time into a lattice.)
* Taking this apart into 2 processes: physical interpretation of computation as reversible _transformation_ followed by _selection_ (projection). The former is related to time evolution and the concept of energy, while the latter is related to projection / selection / wavefunction collapse, decoherence, ...

CIRCUITS

By ignoring the actual operation, but concentrating on the data dependency, the same pattern emerges in all computations. The essential concept is the function, the many to one map, the integration of information through selection. Between the mathematics of the natural numbers and the physical reality of computers there is only one essential difference: physical circuits cannot deal with infinite structures (like infinite digit sequences). However, they can usually deal with enumerable[1] structures up to the point of interest. What this means is that it's possible to keep thinking about infinite structures when modeling computation, but keep in mind that if you don't compute infinite results, you can get by with using only a finite amount of data from the infinite models[3]. Now keep in mind that this is not that surprising if you realise that all programming languages eventually boil down to physical circuits, so in some sense I'm stating the obvious. However, looking at how this _actually works_ still amazes me: everything eventually maps back to "selective" logic circuits (AND/OR), and all concepts are abstractions on top of this.

EXAMPLE: LINEAR FILTER (inner product)

Take the concept of linear filtering. It is several levels of abstraction removed from logic circuits, but it can be expressed quite simply. Linear filtering is the isolation of some information in a bunch of other information by transforming (rotating) it so it will stand out in an average: all parts one is not interested in will simply cancel each other out. It is interesting to look at how linear filtering is implemented, and how this high-level process of cancelling / integration relates to more elementary processes of information cancelling / integration.

1. Natural numbers can be represented by infinite bit sequences (2-adic numbers[2]). Taking double-infinite series this then leads to a representation of real numbers.

2. Addition of two numbers a, b, is the "carry connecting local computation", the local computation being the XOR of 3 inputs (a_i, b_i, carry_i). This rippling carry effect connects every bit in the output with all the input bits below it. One could call addition the "causal" operation.

3. Multiplication of two numbers can then be considered as "acausal": each bit in the input is connected to each bit in the output.

(Represented) real numbers with addition and multiplication, together with some laziness tricks to avoid infinite sequences and allow implementation on finite machines, are all that is necessary to make the concept of linear filters implementable.

TRANSFORMATION

Now, the interesting thing is to split computation into transformation and selection. In some sense, transformation is free in physics: it is time-evolution. Selection is then related to structure. Real/complex vector spaces are better suited for models of physical systems. So can't this be exploited in a better way? (Sat Apr 7 11:40:04 EDT 2007)

[1] http://en.wikipedia.org/wiki/Enumerable
[2] http://en.wikipedia.org/wiki/P-adic
[3] http://en.wikipedia.org/wiki/Lazy_evaluation

Entry: Logic vs. Math
Date: Sun Jun 14 15:39:03 CEST 2009

I'm trying to understand the basics of mathematical logic.
Let's try to answer the question: "What is the difference between a theory/logic and a model?", by looking at propositional logic vs. boolean algebra. Entry: Computation and Topology Date: Sun Jun 14 15:53:38 CEST 2009 (Reworked from Sat Apr 7 11:40:04 EDT 2007) The simplest model of computation is cicuits, which are directed a-cyclic graphs[1] or possibly lattices[2]. Now I wondered, can DAGs be made flat, or is "knotting" an essential part when you try to embed it into a topological structure. How does this relate to linear logic and reversable computation? Instead of not allowing destruction, what about not allowing copy/memoization? How does conservation of energy relate to conservation of "cons cells" in a tree rewriting machine? Permutation (automorphism) is actually ``Conservation of Energy'' for discrete structures. There was this crazy idea in the original post: 1) Waves going through space are boring. 2) Waves going through a knotted topology are interesting (essentially circuits). 3) Instead of looking at 3D space as the thing in which networks are embedded, why not turn it around? Maybe 3D is just the most random network? Space is disorder, therefore infinitely boring? [1] http://en.wikipedia.org/wiki/Directed_acyclic_graph [2] http://en.wikipedia.org/wiki/Partially_ordered_set Entry: Random truth Date: Mon Jun 15 11:29:57 CEST 2009 I've been reading a bit about incompleteness lately. I've read Meta Math[1] before, and a week ago I read a transcript[2] of a presentation Chaitin gave about his work. He's a funny guy, though a bit arrogant. He's a friend of Stephen Wolfram. Judging from how they write, I can imagine those two getting along :) One of the things that intrigued me and never realized until reading [1] is that computers are so central to the idea of incompleteness. I've read stuff about Goedel's proof before, but I don't seem to have gotten so much from it. What Chaitin said is that Goedel invented Lisp! Now, that got my attention. I still can't explain it to my own satisfaction, but it probably has to do with this meta-circularity of the lisp interpreter, and the relation to Goedel numbering of proofs. Somewhere there is this "level-shift" which Hofstadter talks about. I'd like to explore this a bit more. Programming and writing programming languages and compilers has given me a lot of intuitive understanding about computation. Maybe some more formal ideas won't hurt.. [1] isbn://1400077974 [2] http://www.cs.auckland.ac.nz/~chaitin/cmu.html [3] http://en.wikipedia.org/wiki/Gödel's_incompleteness_theorems Entry: Code vs. Data Date: Mon Jun 15 13:08:18 CEST 2009 If you look at the circuit model of computation, it turns out that dependency and connectedness are key ideas. Does it make sense to look at programs also _only_ from the data-flow point of view? In the end, control structures in programming languges are essentially re-used data dependency subgraphs, if you translate it to a circuit. So why is functional programming cool again? They make data dependencies explicit, allowing you to _move away_ from the iterated machine approach, where you essentially loose dependency information due to random access. Entry: From Wow! back to normal. Date: Mon Jun 15 14:22:47 CEST 2009 This happens to me quite often: I'm trying to solve some problem I can't get a handle on.. Then by manipulating it, I am eventually able to place the _problem_ into a larger picture which then leads to a very elegant solution. 
This makes me very excited ("This is going to change the world!") until I place the _solution_ itself into a wider context, and see that it isn't so special after all. All that happened is making an (initially) non-trivial link between two distinct representations of knowledge. Once the link is there, the whole thing seems trivial and boring, because this jolt of insight changed the way I'm thinking about the subject. I guess this is why I love math. It turns me into a hero for 5 seconds, and then places my feet back on the ground to go looking for more hits. Entry: 24 Date: Thu Jun 18 11:05:16 CEST 2009 I was originally intrigued by a post from TWF[2]. It has popped up in many other places, and is related to some interesting mathematical structures. Maybe a good starting point to do some exploring? [1] http://en.wikipedia.org/wiki/Dedekind_eta_function [2] http://math.ucr.edu/home/baez/TWF.html [3] entry://../math/20090609-145519 Entry: Fixed Points Date: Thu Jun 18 12:58:32 CEST 2009 Maybe the single most important principle in numerical computation: if you can't compute it directly, try to build a dynamical system that converges to something you can use to compute it directly. What I mean: iterative numerical algorithms take a model with a simple implementation namely update equations of discrete time dynamical systems, and use knowledge about their behaviour in the form of some mathematical theory consisting of a collection of axioms and theorems, to get to approximation of entities that are not directly computable, simply by observing the behaviour of these systems. Essentially this brings one from algebra, which in its basic form as usable on a computer is very well understood, to analysis (through limits) and theories _about_ these simple algebraic structures, which can lead to incredible diverse theories. Entry: How to do math? Date: Thu Jun 18 15:39:13 CEST 2009 (At least, what works for me?) 1. The human brain abstracts from examples, not the other way around. Do not attempt to go too far without doing active manipulations (exercise!). Mathematics is not a spectator sport. 2. Try to intuitively understand the basic concepts (definitions). If they are too abstract this means that you don't understand well enough the theory which _suggests_ these abstractions. Usually new definitions come from patterns that cannot be properly abstracted in some basic theory, but making the theory more abstract usually does expose the patterns. 3. Look at the most important theorems. Try to understand how they work intuitively, then (very important!) look at the proofs, or at least the core elements. Entry: 2D waveguides Date: Thu Jun 18 16:21:54 CEST 2009 (Not dated. From problems.tex) In one dimension, the solution of the wave equation can be split into two traveling wave equations, which can be easily discretizised using waveguides. In more dimensions, this doesn't work. How to work around that? Basically, I'd like to know what exactly is the complexity and error trade--off for simulating 2D and 3D waves. Probably, the only way to do this is using finite element methods, but, there must be some hacks floating in nD hack space. Entry: Our nonlinear ears Date: Thu Jun 18 16:22:43 CEST 2009 (Not dated. From problems.tex) Most of the current problems in computer music and musical signal processing are actually problems in ``Artifical Intelligence''. We, or some of us at least, are lucky that most sound producing phenomena are reasonably linear, or at least, have tractable non--linearities. 
Once we enter the domain of perceivable qualities, hell breaks loose. There is scarce information about how things actually work, so introspection and intuition prevail, leading to a patchwork of heuristics and seemingly original work. Can we do something about it? Two interesting problems here are pitch detection and source separation.

Entry: Cleaning up the papers/ dir
Date: Thu Jun 18 16:24:27 CEST 2009

So I'm collecting some ideas I wrote down over the years in this blog. Supposedly, separating them up into bite-size chunks will make it easier for the ideas to reach their final destination, be it the dust bin.

Entry: Geometric Algebra and Computer Music
Date: Thu Jun 18 16:32:01 CEST 2009
Type: tex

% More specifically, I will investigate for several manifolds what the % concepts of ``modulation'' and

[ Raw. Needs rework. ]

\section{Clifford Algebras}

A thing which has always intrigued me is the algebraic aspect of sound modulation. For example, two oscillators (oscillations) $a (t)$ and $b (t)$, when combined, produce a third oscillation $c (t)$. Take for example mixing and cross--modulation. These are no more than the operations of addition and multiplication in the field of the real numbers. If we make the functions take their value in the field of the complex numbers, we can construct a notion of oscillation easily by using complex multiplication to generate it. Complex numbers seem to have rotation \emph{built in}. This can be used to give a lot of sound generating and processing operations an algebraic feel. In general, this is the subject of geometric algebra.

In layman's terms, we can construct a geometric algebra, or Clifford algebra, by constructing a new vector space spanned by all distinct products of $n$ orthogonal basis vectors of an $n$--dimensional vector space $V$. This gives us a $2^n$ dimensional \emph{multivector} algebra. A Clifford algebra is a quotient of the tensor algebra, which is nothing more than the algebra constructed from a vector space, using the tensor product as multiplication. The tensor product can be seen as the \emph{most general bilinear product}, or most general multiplication of vectors. This produces a graded algebra, which is a direct sum of homogeneous subspaces for the vector addition. You can see this as a generalization of splitting the space of all polynomials into a direct sum of spaces of homogeneous polynomials of degree $n$. Other multiplications can be defined by restricting this most general multiplication. The way to do this is to impose an equivalence relation on the tensor product. For Clifford algebras used in geometric algebra, this equivalence relation is defined by the ideal generated by $\{v \otimes v - Q(v)\,1\}$, with $Q (v)$ the quadratic form defining the algebra. This means that the tensor square of a vector is identified with the (scalar) value of the quadratic form, with all of the other structure coming from the tensor product.

For example, take $V$ to be the 2--dimensional space spanned by two orthogonal vectors $\{\sigma_1, \sigma_2 \}$. We define a product satisfying $\sigma_1^2 = 1$, $\sigma_2^2 = 1$, $\sigma_1 \sigma_2 = - \sigma_2 \sigma_1$. The set of basis vectors for the resulting Clifford algebra is the group generated by these 2 elements using the multiplication rules above, which gives an 8 element group $\{\pm 1, \pm \sigma_1, \pm \sigma_2, \pm \sigma_1 \sigma_2 \}$.
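A small sketch to make this concrete (my own choice of representation, not from the text): realize $\sigma_1, \sigma_2$ as $2\times 2$ real reflection matrices and close them under multiplication; the closure has exactly the eight elements listed above.

  import numpy as np
  from itertools import product

  s1 = np.array([[1, 0], [0, -1]])      # reflection about the x-axis
  s2 = np.array([[0, 1], [1, 0]])       # reflection about the line y = x
  I  = np.eye(2, dtype=int)

  assert (s1 @ s1 == I).all() and (s2 @ s2 == I).all()
  assert (s1 @ s2 == -(s2 @ s1)).all()              # anticommutation
  assert ((s1 @ s2) @ (s1 @ s2) == -I).all()        # a 90 degree rotation

  # close {I, s1, s2} under multiplication -> the 8-element group
  elems, done = [I, s1, s2], False
  while not done:
      done = True
      for a, b in product(elems, repeat=2):
          c = a @ b
          if not any((c == e).all() for e in elems):
              elems.append(c)
              done = False
  print(len(elems))       # 8: {+-I, +-s1, +-s2, +-s1 s2}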
This is the Dihedral group representing 2 (orthogonal) reflections of the plane $\sigma_1$ and $\sigma_2$, and their composition, $90$ degree anticlockwize rotation $\sigma_1 \sigma_2$, which forms an order $4$ subgroup. Over the reals, this generates a basis for a 4--dimensional vector algebra. As the symmetry group generated by the basis vectors suggests. Entry: Manifolds and Sound Date: Thu Jun 18 16:35:12 CEST 2009 Type: tex With $\mathcal{M}$ an arbitrary (Riemannian) manifold, how to make sounds by defining functions $\mathcal{M} \to \mathbb{R}$ (wave shapers) and curves $\mathbb{R} \to \mathcal{M}$ (modulators) on these spaces? In the case where $\mathcal{M}$ is the complex unit circle, we have the basic theory of wave shaping and modulation, which is ``playing with sine and cosine''. This story started a couple of years ago (2002-2003?) when I was using complex and quaternions multiplication and addition to perform sound modulation. I then took a detour reading about Lie groups and differential geometry and lost track of the original idea, but the basic driver has stayed the same: I'd like to get a feel for navigating manifolds like SO(3) the group of 3D rotations (the 3D projective space) and its double cover and SU(2), the group of unit norm quaternions (the 3--sphere). I'd like to do this by making sound, to see if there are any interesting algorithms hidden here, or if everything just reduces to combinations of known things. Entry: Oscillations Date: Thu Jun 18 16:43:57 CEST 2009 Type: tex \def\GF{\text{GF}} \section{Introduction} Physics is about symmetry, about invarance under transformation. I'm writing this to order my thoughts about motion, rotations, oscillations and wave equations a bit better. %% [FIXME] Split everything in oscillations due to symmetry (constants %% of motion), and others. Maybe incorporate the QM picture: %% observables generate symmetry operations. \section{Motion} In the abstract, motion is the trajectory of a point $x(t)$ through a \emph{phase space} $X$, which is a set of points representing all the possible states of a system. % The parameter $t$ is referred to as \emph{time}. % A \emph{space} is a set with some notion of \emph{distance}. Conservative systems are a special kind of systems where the trajectory is restricted to a subspace satisfying an equation $E(x) = E_i$, where $i \in I$, the index set of all possible energy levels. In other words, the energy equation creates a partition $\{X_i\}$, where each $X_i$ is a constant energy subspace of $X$. For conservative motion, each trajectory $x(t) \in X_i$, with $t \in T$, the set of parameters called \emph{time}. The set $T$ is equipped with an order. The trajectory will have the form $$x(t) = g^t x(0) = \exp(w t) x(0).$$ Here $g$ is the generator of motion. The trajectory can be seen to be a parametrized morphism of the constant energy subspace $X_i$. This equation is taken as the definition of the exponential function, which mainly relates $g$ and $w$. The parameter $w$ is the representative of the linear equation of motion, or \emph{differential equation}. The structure preserved by $G_i$ is the algebraic structure used to express the equation of motion, which is here parametrized by $w$, the \emph{momentum}. I will now relate these abstract entities to several spaces $X$, with associated spaces $T$ and $I$. %% talk about temperature and entropy \section{Discrete systems} In this case all of $X$, $T$ and $I$ are taken to be finite or countable. 
\subsection{Binary system}

The simplest nontrivial state space that fits the abstract setting above is $X=\{(0,0), (0,1), (1,0), (1,1)\}$. Using the energy equation $E(i,j) = i + j$, there are $3$ possible energy levels $E \in \{0,1,2\}$, giving rise to the partition $\{X_i\}$ with $X_0 = \{(0,0)\}$, $X_1=\{(0,1),(1,0)\}$ and $X_2=\{(1,1)\}$. The only nontrivial trajectory can be found in $X_1$. To write equations of motion, some algebraic structure on $X$ is necessary. Take $X$ to be the two--dimensional vector space over $\GF(2)$. The equation of motion can be written as $$x(t+1) = \matrix{0 & 1 \\ 1 & 0} x(t) = Gx(t),$$ or $x(t) = G^t x(0)$, where $G$ will leave all spaces $X_i$ invariant. Identifying only one of the components as position yields an oscillation, where the other state is the hidden state, or momentum. Here we started from a space and constructed a compatible equation of motion, but in general one works the other way: given the equation of motion, construct the geometry of phase space.

Keeping the base field at $\GF(2)$, but increasing the dimensionality of the vector space, allows the expression of linear recursive equations like $$x_0(t+1) = \sum_i^{N-1} g_i x_i(t) \text{ and } x_{k+1}(t+1) = x_k(t),$$ which are used to generate pseudo--random bit sequences for example. This can still be expressed as $x(t) = G^t x(0)$ or $x(t+1) = G x(t)$. Conservative cellular automata fall in the same category. These are based on exchange operations similar to $G$. The typical mode of operation is to have two partitions, $P_0$ and $P_1$, of an $N$--dimensional grid, such that each element of the even partition has a maximal local overlap with the elements of the odd partition, and vice versa. This is the essential coupling necessary to get wave--like phenomena. % Sorting networks are related to this too, but have probably more in common with the heat equation. %% NOTE: rectilinear motion and harmonic oscillation are the same

\subsection{Finite fields}

%% The simplest possible set $X$ with non--trivial algebraic structure is %% $\GF(2^2)$, where $X=\{0,1,x,1+x\}$. Addition is modulo $2$, while %% multiplication is modulo $1+x+x^2$.

Looking at finite fields, there is a little more structure than the vector spaces, namely the elements can be multiplied. Let's have a look at the Galois fields $\GF(2^N)$ which can be represented by formal polynomials over $\GF(2)$, modulo an irreducible polynomial $m(x)$. The algebraic structures are equipped with multiplication and addition, which is enough to express the ideas of linear difference/differential equation and exponential, central in this paper. There are two basic structures to create oscillations, which are both related to each other. The first one deals with the multiplicative group, which is abelian. This group is also finite, which means it consists of a collection of primitive cycles of period $N_i$, such that $g_i^{N_i} = 1$. There can be several irreducible polynomials $m(x)$ that give a representation of the field $\GF(2^N)$, so it can be beneficial to choose the ``best'' one. Being only interested in cycles, I want the generator to be as simple as possible, meaning $x$, which corresponds to a simple shift. If $x$ is a generator of the largest possible cycle, of length $2^N - 1$, the modulo polynomial $m(x)$ is called primitive. %% Let's study the effect of $G$ on the basis $\{1, x, x^2, x^3\}$. This %% gives $G(1) = x + g_0$, $G(x) = x^2 + g_1$, $G(x^2) = x^3 + g_2$ and %% $G(x^3) = g_3$.
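A minimal sketch of the ``multiply by $x$ modulo $m(x)$'' shift (a Galois LFSR; my own code, and the choice $m(x) = x^4 + x + 1$ is just a standard primitive example): the orbit of $1$ runs through all $2^4 - 1$ nonzero field elements before repeating.

  M = 0b10011          # m(x) = x^4 + x + 1, bits are polynomial coefficients
  N = 4

  def mulx(a):
      a <<= 1                       # multiply by x
      if a & (1 << N):              # degree reached N: reduce mod m(x)
          a ^= M
      return a

  state, orbit = 1, []
  while True:
      orbit.append(state)
      state = mulx(state)
      if state == 1:
          break

  print(len(orbit))                              # 15 = 2**N - 1
  print(sorted(orbit) == list(range(1, 2**N)))   # True: all nonzero elements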
\subsection{Decimal expansion} It is well known that a fraction represented as a decimal expansion in base $x$ contains infinitely repeating terms if the denominator contains prime factors which do not occur in $x$. In general, a rational number can be represented as $${n \over d} = x^p (a_0 + a \sum_k x^{-Pk}) = x^p(a_0 + a {x^P \over x^{P} - 1}) = {Nx^p \over x^P - 1},$$ where $p$ denotes the position of the point after which the decimal series starts to repeat, and $P$ is the period of repetition. Anoter way of writing the fraction is $${ a \over b} = \prod_k p_k^{m_k},$$ where $p_k$ represents the $k$th prime number. This can be identified with the previous expression by forming the product $$\prod_{k \in K} p_k^{|m_k|},$$ where $K$ is the set of indices for which $m_k < 0$ and $p_k$ does not divide $x$. All the other factors can be incorporated in $N$ and $p$. Now find the smallest $n$ such that the expression above divides $2^n-1$. \subsection{Integers and rational numbers} Starting from Pythagorian triples, which are integer solutions to $a^2 + b^2 = c^2$ interpreted as the energy equation, it is possible to create rational conservative systems $x(t+1) = Gx(t)$, where $$G = {1 \over c}\matrix{a & -b \\ b & a}.$$ This works for example for the famous $(a,b,c) = (3,4,5)$. Sets of those can be computed from two arbitrary integer numbers $m>n$, using $a = m^2 - n^2$, $b=2mn$ and $c=m^2+n^2$. When the relation holds, the determinant of the matrix is $1$, which means $G$ and all powers $G^k$ preserve the $2$--norm or energy, and are thus conservative. Time $T=\mathbb{Z}$. The phase space $X=\mathbb{Q}^2$, the spaced of ordered pairs of rational numbers. There are an infinite number of constant energy partitions parametrized by all positive rational numbers, so $I = \mathbb{Q}^+$. \subsection{Cellular automata} I'm curious about two versions of cellular automata which enable analysis in the way I present here: linear CA, which can be analyzed using generating functions or polynomials in the finite case, and the slightly more general reversible CA, where time evolution is a group action, probably always a composition of elementary bit swaps? %% \subsection{Discrete circles} %% Another thing to look at is the classic discrete circle drawing %% algorithm. Viewing this as a conservative system requires some %% fiddling. The phase space here is again $2$--dimensional, represented %% by the pair $p,q$. The energy of the system is $E(p,q) = p^2 + %% q^2$. Suppose that motion does not require this energy balance to be %% constant all the time, but minimally different from $E_0$, given that %% the updates are a binary decision problem given by %% $$q^+ = q + \Delta_p\sigma(p) \text{ and } p^+ = p - %% \Delta_q\sigma(q),$$ where $\sigma(x)$ represents the sign function %% and the $\Delta_p, \Delta_q \in \{0,1\}$ are chosen such that %% $$B(p,q) = p^2 + q^2 - E_0,$$ the energy bucket, which can contain %% negative or positive energy, remains as close as possible to zero. In %% practice only one of $\Delta_p, \Delta_q$ will change per step, which %% one determined by the absolute values of $p$ and $q$. Moreover, $B$ %% can be updated between time steps. %% \subsection{Oscillation versus rectilinear motion} %% Note that rectilinear motion always produces oscillatory behaviour in %% finite fields, which is not very surprizing due to the finiteness. %% Given two variables $p$ and $q$, with the update equation $p[k+1] = %% p[k]$ and $q[k+1] = q[k] + p[k]$. 
In such a system, the energy is %% purely kinetic, maining $p[k] = p[0]$ is the energy equation. %% Another way of phrasing this is that harmonic oscillation is %% rectilinear motion on a manifold constrained by the energy %% equation. So both are really the same, it is the shape of the space %% which is different. %% One thing which strikes me is that motion seems to have a lot to do %% with morphisms of fields. There is this pattern: for a finite %% dimensional state space, an equation of motion is a linear update %% equation, expressed in terms of field multiplications and %% additions. Due to the linearity of these updates equations, it is %% possible to rephrase them in terms of automorphisms of the state %% space. %% So, motion is a one--parameter submanifold of a constant energy %% submanifold of phase space, the space of possible configurations of a %% system. The one--parameter submanifold is a trajectory generated by %% $g^t = \exp(t a)$, where $g$ is a generator of a subgroup of the %% automorphism group of the constant energy manifold, and $a$ is the %% momentum. %% \subsection{Hacks} %% \subsubsection{Truncation} %% An interesting question is wether it is possible to efficiently find %% numbers like that arbitrarily close to a certain angle. Taking $n=1$ %% it is possible to go arbitrarily close to $0$. If so, it would be %% possible to ``reduce'' a complicated fraction to a simpler one. This %% would allow for an oscillator which never leaves the unit circle, but %% can be reset by allowing a small phase error when truncating the %% representation. %% \subsubsection{Cordic} %% If logarithms where cheap, running exact discrete filters in the log %% domain would make updating them fairly trivial. Oscillation is just %% finite wordlength wraparound, while decay is increment and truncation %% of the log of the amplitude. I'm curious how cordic fits in this %% picture, and wether there are % cordic, discrete log \section{The continuum} Here I take $T=\mathbb{Z}$, and $X=\mathbb{R}^N$. \subsection{Real and Complex numbers} In one dimension there's only the function $f(x) = -x$ to generate the trivial binary oscillation. Using complex numbers or multi--dimensional linear transformations allows the representation of true rotations and oscillations as multiplication $f(x) = ax$, which gives $y[k+n] = a^n y[k]$. Trying to do get oscillations with real numbers requires a two--valued system. The discrete version of the harmonic oscillator with $q$ and $p$ as state variables is $$q[k+1] = q[k] + p[k] \text { and } p[k+1] = p[k] - q[k].$$ Here I'm not using coefficients, but give the general idea: $p$ contributes positively to $q$, and thus behaves as momentum, generator of motion, and $q$ contributes negatively to $p$, so which makes sure the point will eventually stop and return to the zero point. It's easier to collapse position and momentum into one entity $x$ instead of seeing them as different entities, and write the update equation as $x[k+1] = Ax[k]$. When $\det A = 1$ and $A$ is orthogonal it produces an oscillation. The state $x[k]$ is again constrained to a lower dimensional manifold $E(x[k]) = E_0$. The manifold in this example is clearly a circle since $E(x) = p^2 + q^2$. %% A generalization of oscillation is possible by %% letting $A$ be non--orthogonal. In that case, the phase space volume %% $pq$ is still preserved, but one of $p$ and $q$ will go to zero, while %% the other goes to infinity. 
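Looking back at the Pythagorean--triple rotations a little earlier: a small exact-arithmetic sketch (mine, not from the notes) with $(a,b,c) = (3,4,5)$, checking that the energy $p^2 + q^2$ is conserved exactly at every step, with no rounding drift.

  from fractions import Fraction as F

  # Exact rational rotation from the Pythagorean triple (3,4,5):
  # G = (1/5) [[3, -4], [4, 3]] has determinant 1 and preserves p^2 + q^2.
  a, b, c = 3, 4, 5
  G = [[F(a, c), F(-b, c)],
       [F(b, c), F(a, c)]]

  def step(v):
      return [G[0][0]*v[0] + G[0][1]*v[1],
              G[1][0]*v[0] + G[1][1]*v[1]]

  v = [F(1), F(0)]
  E0 = v[0]**2 + v[1]**2
  ok = True
  for _ in range(256):
      v = step(v)
      ok = ok and (v[0]**2 + v[1]**2 == E0)
  print(ok)    # True: the energy equation holds exactly along the trajectory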
\section{Continuous time}

\subsection{The number $e$}

The function $$e^x = \lim_{n\to\infty} \left(1 + {x \over n} \right)^n$$ pops up when moving from a discrete to a continuous time representation, transforming an iterated function into a differential equation. I change notation from $y[k]$ to $y(t)$ to indicate that $t \in \mathbb{R}$. Subdividing a state update from $t$ to $t+1$, represented by the complex multiplier $a$ in the state update equation $y(t+1) = a y(t)$, can be done by setting $a = (1 + s/n)^n$ or $s = n(a^{1 / n} -1)$. This gives $$y(t + 1) = \left(1 + {s \over n}\right)^n y(t).$$ Here the update has been replaced by $n$ smaller updates. Taking the limit $n \to \infty$ gives $$y(t + 1) = e^s y(t),$$ with $s = \log a$. Let's take a look at the intermediate values. Using $\epsilon = 1/n$ and taking only a single update gives $$y(t + \epsilon) = (1 + \epsilon s) y(t),$$ which can be rewritten as $${y(t + \epsilon) - y(t) \over \epsilon} = s y(t).$$ This gives the differential equation ${dy \over dt}=sy$ when $n \to \infty$, $\epsilon \to 0$.

\subsection{The wave equation}

%Waves are about energy and space and time and Minkowsky spacetime.

Deriving the one--dimensional wave equation from scratch goes as follows. Imagine a grid in spacetime $(t, x_1, \ldots, x_n)$. At each point in this spacetime, there are two state variables $(q, p)$ representing the position and momentum of a point. The update equations for these are similar to those of a simple oscillator; however, the influence of position on momentum is now a contribution of all neighbours. So qualitatively it is still true that $$q[t+1,x] = q[t,x] + p[t,x],$$ but the momentum update becomes $$p[t+1,x] = p[t,x] + \sum_{\{n\}} \left( q[t,x+n] - q[t,x] \right),$$ where $\{n\}$ is the set of all neighbours, which is $\{-1, +1\}$ for one space dimension ($2$--dimensional spacetime). The continuous versions of these are $${{\partial q} \over {\partial t}} = p,$$ and $${{\partial p} \over {\partial t}} = \sum_{\{s\}} {{\partial^2 q} \over {\partial x_s^2}},$$ where the neighbour sum in the discrete equation is replaced by a second derivative by substituting $$q(t, x + \epsilon) = q(t,x) + \epsilon q_x(t,x) + \frac{\epsilon^2}{2} q_{xx}(t,x) + o(\epsilon^2)$$ in $${{q(t,x+\epsilon) + q(t,x-\epsilon) -2q(t,x)} \over \epsilon^2},$$ which yields $q_{xx}(t,x) + o(\epsilon^2)/\epsilon^2$, which gives the second derivative after taking the limit $\epsilon \to 0$. The second differential expression can be interpreted as ``The change of momentum (force) is equal to the sum of the curvatures in all directions.'' In other words, if a point's neighbours lie on average closer to some direction, the point is attracted in that direction. %% At all times, the total energy of the system, or %% $$E = \sum_{x} p^2 + q^2$$ %% is constant.
I am interested in ratios decomposed in their prime components like $$r_n = 2^{n_1} 3^{n_2} 5^{n_3},$$ with $n_i \in \mathbb{Z}$. Let's call the space of such ordered triplets $\mathbb{B_N}$. This space is in $1-1$ correspondence with a subset of the rational numbers, which I'll call $\mathbb{Q_N}$, the set of all rationals composed of the first $N$ prime numbers. $\mathbb{B_N}$ is $N$--dimensional, so it can be equipped with a nontrivial geometry. Let's define a distance in this space as $$d(n,m) = \sum_k |n_k - m_k|.$$ This I call the harmonic distance. What can be said about this space? And what happens if I relate it to the ordinary distance measure in $\mathbb{Q}$, which we could call the melodic distance?

Because the ratio basis $B=\{2,3,5\}$ has relatively large unit steps, it is possible to use another basis which has smaller steps, i.e. $B'=\{2,\frac{3}{2}, \frac{5}{4}\}$, corresponding to octave, perfect fifth and major third. Here the coordinate vectors $n$ with respect to $B$ and $n'$ with respect to $B'$ are related by $n = Tn'$ and $n' = T^{-1}n$, where \begin{equation} T,T^{-1} = \left( \begin{array}{rrr} 1 & -1 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{array} \right), \left( \begin{array}{rrr} 1 & 1 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{array} \right). \end{equation}

\begin{table} %REF: http://www.redshift.com/~dcanright/harmser/ \begin{tabular}{|l|l|l|l|} \hline ratio & name & $\{2,3,5\}$ & $\{2,3/2,5/4\}$ \\ \hline $2/1$ & octave & $(1,0,0)$ & $(1,0,0)$ \\ \hline $3/2$ & 5th & $(-1,1,0)$ & $(0,1,0)$\\ $4/3$ & 4th & $(2,-1,0)$ & $(1,-1,0)$\\ \hline $5/4$ & M 3rd & $(-2,0,1)$ & $(0,0,1)$ \\ $8/5$ & m 6th & $(3,0,-1)$& $(1,0,-1)$ \\ $5/3$ & M 6th & $(0,-1,1)$ & $(1, -1,1)$\\ $6/5$ & m 3rd & $(1, 1,-1)$& $(0, 1, -1)$ \\ \hline $15/8$ & M 7th & $(-3, 1, 1)$& $(0,1,1)$ \\ $16/15$ & m 2nd & $(4, -1, -1)$& $(1,-1,-1)$ \\ \hline $9/8$& M 2nd & $(-3, 2, 0)$& $(-1,2,0)$ \\ $10/9$& M 2nd (2) & $(1,-2, 1)$& $(1,-2,1)$ \\ $16/9$& m 7th & $(4,-2, 0)$& $(2,-2,0)$ \\ $9/5$& m 7th (2) & $(0,2, -1)$& $(0,2,-1)$ \\ \hline \end{tabular} \caption{Some octave normalized positive intervals with inverses.} \label{tab1} \end{table}

\begin{table} \begin{tabular}{|l|l|l|} \hline ratio & name & $\{2,3,5,7\}$ \\ \hline $7/4$ & bluesy flat seventh & $(-2, 0, 0, 1)$ \\ $8/7$ & ? & $(3, 0, 0, -1)$ \\ $7/6$ & bluesy minor third & $(-1, -1, 0, 1)$ \\ $12/7$ & ? & $(2, 1, 0, -1)$ \\ $7/5$ & diminished fifth & $(0, 0, -1, 1)$ \\ $10/7$& ? & $(1, 0, 1, -1)$ \\ $9/7$& ? & $(0,2, 0, -1)$ \\ $14/9$& ? & $(1,-2, 0, 1)$ \\ \hline \end{tabular} \caption{More intervals based on $7$.} \label{tab2} \end{table}

Note that each $T$ mapping the lattice onto itself must be unimodular or $\det T = \pm 1$. See Table \ref{tab1} for a list of intervals from the $3$-dimensional basis. Table \ref{tab2} has some extensions from a $4$-dimensional interval space. %% In order to limit the grid to a small set of notes, we can introduce %% ``wormholes''. I.e. two points are considered equal if their melodic %% distance falls below a certain threshold. When traversing the grid to %% build a melody for instance, we can take big leaps with a small %% melodic distance, or zap to a location which is similar and continue %% the journey.

Entry: Scale and Rotate
Date: Fri Jun 19 10:59:12 CEST 2009
Type: tex

% Start EDIT here.

Linear algebra and basic geometry are very much related, which gives most operations on matrices an interesting geometric interpretation. We will see that matrices can be seen as a generalization of complex numbers. So what are complex numbers, in short?
Compared to real numbers, complex numbers are richer because they are 2--dimensional. This introduces an additional concept of \emph{angle}. Complex multiplication can modify an angle and can thus be seen as combining the effects of both \emph{scaling}, a quality inherited from the reals, and \emph{rotation}, a new quality involving the manipulation of direction. Going to higher dimensions, we will see the concepts of scaling and rotation are still there, though what we call rotation has become much richer. The biggest change in moving from complex numbers to matrices is the loss of commutativity. The same happens in geometry for dimensions $\geq 3$, where rotations no longer commute. So, what are the analogs? Over $\RR$ we have orthogonal and symmetric matrices. Over $\CC$ we have unitary and Hermitian matrices. Both are related to their conjugates (transpose versus complex conjugate transpose) in special ways. A unitary matrix $U$ is related to its conjugate as \begin{equation} U^H = U^{-1}. \end{equation} A Hermitian matrix $R$ is related to its conjugate as \begin{equation} R^H = R. \end{equation} The same goes for orthogonal and symmetric. Where did we see this before? Suppose the matrices are $1 \times 1$, so they behave like complex numbers. This gives $\bar{u} = u^{-1}$, and $\bar{r} = r$. This means $u$ resides on the unit circle and $r$ is a real number. For real matrices, the same analogy holds, only they cannot represent complex numbers in the $1 \times 1$ case, where $u \in \{\pm 1\}$, but if we take $2 \times 2$ matrices, they can! Take for example \begin{equation} U = \matrix { \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta } \text{ and } R = \matrix { r & 0 \\ 0 & r}. \end{equation} We can make it completely obvious by writing this as $U = I \cos\theta + J \sin\theta$ and $R = r I$, where $J^2 = -I$. On to the concept of eigenvalue. Eigenvalues are defined as the solutions to the equation \begin{equation} Ax = x \lambda, \end{equation} with $\lambda$ a number called an eigenvalue and $x$ a vector called an eigenvector. This means that from the perspective of $x$, the matrix $A$ really behaves as just a number. There are $N$ such directions for an $N \times N$ matrix $A$ in which $A$ behaves as just a number. There is one exception to this rule though, which is related to the generalization of the number $0$ to nilpotent transforms, which leads us to Jordan forms, but we can safely ignore this right now. If we arrange these $N$ vectors as columns of an $N \times N$ matrix $X$, and arrange the corresponding eigenvalues in a diagonal matrix $\Lambda$, we get the relation \begin{equation} AX = X \Lambda, \label{eigenvaluedecomp} \end{equation} which leads to $A=X \Lambda X^{-1}$, which says that $A$ and $\Lambda$ are \emph{similar}. An important remark to make here is that, even if $A$ is real, $\lambda$ can still be complex. The reason for this is that if $A$ represents a rotation, it can only be represented as a number if this number can represent a rotation, which only complex numbers can. We saw above that complex numbers can be represented by real matrices, so for directions where we cannot express the effect of $A$ as simply a scaling by $\lambda$ along a line, we can represent the effect as a rotation in a plane spanned by two vectors. In other words we have \begin{equation} A \matrix{x_1 & x_2} = r \matrix{x_1 & x_2} \matrix{\cos\theta & \sin\theta \\ -\sin\theta & \cos\theta}, \end{equation} with $r$ and $\theta$ real numbers.
The $2 \times 2$ matrix performs the rotation, and $r$ performs the scaling. When we have $2N$ dimensions, we can have $N$ orthogonal rotation planes, so (\ref{eigenvaluedecomp}) exhibits a matrix $\Lambda$ which carries $2\times 2$ blocks when we don't resort to the complex representation, where a rotation is represented by $2$ complex conjugate eigenvalues and $2$ complex conjugate eigenvectors. Ignoring the scaling factor $r$, and setting $x = (1/\sqrt{2}) (x_1 + i x_2)$, we have \begin{equation} \matrix{x & \bar{x}} \matrix{e^{i\theta} & 0 \\ 0 & e^{-i\theta}} \matrix{x & \bar{x}}^H. \end{equation} The complex conjugates assure the product is real. Following from $x^H x = 1$ and $\bar{x}^H x = 0$ we have $x_1^T x_1 = x_2^T x_2 = 1$ and $x_1^Tx_2 = 0$, yielding the real form \begin{equation} \matrix{x_1 & x_2} \matrix{ \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta} \matrix{x_1 & x_2}^T. \end{equation} This enables us to clearly see what it means to have \emph{complex conjugate} eigenvalue/eigenvector pairs: a 2D rotation in the plane spanned by the real and imaginary parts of the eigenvectors. With $\theta$ going counterclockwise, the rotation is from $x_1$ to $x_2$. So what does it really mean for two matrices to be similar? The definition says $A = QBQ^{-1}$. Now suppose $A$ and $B$ are complex numbers. In that case, we have commutativity leading to $A = B$. So similar matrices are one generalization of equality of complex numbers, reflecting the fact they have the same eigenvalues, since they are similar to the same diagonal form. Equality is mapped to an equivalence relation, which ignores \emph{coupling}. If similarity maps to equality in the complex number analog, where do we fit congruence? The complex analog for $A = QBQ^{H}$ is $a = q b \bar{q} = q\bar{q} b$. So the effect of $q$ changes magnitude, but not angle. Most of the time we encounter this, $B$ and $A$ will be Hermitian or symmetric, so the analogy reads as ``a congruence transform preserves signature''. So, we see that matrices are really just another kind of number, though they are richer since they exhibit a larger number of phenomena. The main differences are non--commutativity, making $AB \neq BA$ possible, which enables them to represent things like rotations in higher dimensions, and the presence of zero divisors, which we will discuss later. We saw above that a lot of structure survives from the complex numbers. Let's make some more links. The concept of a unit complex number survives as unitary or orthogonal matrix, and the concept of real number survives as Hermitian or symmetric matrix, since these have real eigenvalues. Another remark is that squares of real numbers are positive, while $X^H X$ and $X X^H$ are positive (semi)definite, which enables us to talk about \emph{real square roots} of positive definite Hermitian matrices. Note that each matrix $Z$ can be split in a Hermitian part $(Z+Z^H)/2$ and an anti--Hermitian part $(Z-Z^H)/2$. Rewriting the latter as $i(Z-Z^H)/(2i)$ enables us to split each matrix as $Z = X + iY$, with both $X$ and $Y$ Hermitian. The exponential of an imaginary number has unit modulus, while the exponential of a skew Hermitian matrix is a unitary matrix. The exponential of a matrix $Z$ is defined through the ordinary power series $\exp Z = \sum_k Z^k / k!$. For normal matrices, we can use the eigenvalue decomposition to compute this, which reduces it to decoupled scalar exponentials.
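To make the last remark concrete, here is a minimal numerical sketch (an illustration added here, not part of the original notes; it assumes Python with \texttt{numpy}, and the helper names \texttt{expm\_series} and \texttt{expm\_eig} are made up). It computes the exponential of a skew--symmetric matrix once through the truncated power series and once through the eigenvalue decomposition, and checks that the result is orthogonal, i.e. a rotation.

\begin{verbatim}
# Sketch: exp of a normal matrix via power series and via EVD (numpy assumed).
import numpy as np

def expm_series(Z, terms=30):
    """Truncated power series exp(Z) = sum_k Z^k / k!."""
    E = np.zeros_like(Z, dtype=complex)
    P = np.eye(Z.shape[0], dtype=complex)
    for k in range(terms):
        E += P
        P = P @ Z / (k + 1)
    return E

def expm_eig(Z):
    """exp(Z) via eigenvalue decomposition; valid for diagonalizable Z."""
    lam, X = np.linalg.eig(Z)
    return X @ np.diag(np.exp(lam)) @ np.linalg.inv(X)

theta = 0.3
Z = np.array([[0.0, theta], [-theta, 0.0]])  # skew-symmetric
E = expm_eig(Z)
print(np.allclose(E, expm_series(Z)))        # both routes agree
print(np.allclose(E.T @ E, np.eye(2)))       # exp(skew) is orthogonal: a rotation
\end{verbatim}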
Entry: Generalized rotations Date: Fri Jun 19 11:00:12 CEST 2009 Type: tex What is an elementary rotation or reflection from a more general point of view? For any real $2 \times 2$ orthogonal matrix $Q$ we have $Q^TQ = QQ^T = I$. For any vector $x$, its length is equal to the length of $Qx$. So transformation by $Q$ preserves length. What if we replace the condition on $Q$ with $Q^TJQ = QJQ^T = J$? Basically it is still the same principle, only we do not preserve the length, but a generalized squared length determined by the quadratic form $x^TJx$. In other words, we use a different \emph{signature}. If $J$ is not positive definite, it is no longer a norm. If it is positive semi--definite, it still represents a semi--norm. If we take length to mean square root of a quadratic form, an indefinite $J$ can yield ``imaginary length'', since the quadratic form can return negative results. Let's have a look at this. Changing as little as possible compared to ordinary elementary rotations, we investigate \begin{equation} J = \matrix{1 & 0 \\ 0 & -1}. \end{equation} A physicist would call a $2$--dimensional space equipped with such a quadratic form a $2$--dimensional spacetime. For example, special relativity requires the laws of physics to be invariant under transforms that conserve this entity, which is called \emph{spacetime interval}, here expressed as $x^TJx$, or $x_1^2 - x_2^2$. Transformations that do so are called \emph{Lorentz transformations}, but we will just call them hyperbolic rotations, because they preserve the entity $x_1^2 - x_2^2 = c$, which is the equation of a hyperbola. Its asymptotes are given by $(x_1-x_2)(x_1+x_2)=0$. Just as circular rotations leave circles invariant, hyperbolic rotations leave hyperbolas invariant. So both are essentially the same except for the effects of $J$. It is no surprise then that just like the ordinary $2$--dimensional rotation, this has just one degree of freedom. It can be written in the following (normalized) form \begin{equation} \frac{1}{\sqrt{a^2-b^2}}\matrix{a & b \\ b & a}. \end{equation} Compare this to the ordinary $2$--dimensional rotation \begin{equation} \frac{1}{\sqrt{a^2+b^2}}\matrix{a & b \\ -b & a} \end{equation} we treated above. Usually one sets $a = 1$ and calls $b$ the \emph{reflection coefficient}. A different parametrization is $a = \cosh t$ and $b = \sinh t$. Using the latter parametrization we get \begin{equation} \frac{1}{2} e^t \matrix{1 & 1 \\ 1 & 1} + \frac{1}{2} e^{-t} \matrix{1 & -1 \\ -1 & 1}, \end{equation} which is the expanded eigenvalue decomposition (or SVD) of the elementary hyperbolic rotation in terms of the asymptotes of the hyperbola $v_1 = \frac{1}{\sqrt{2}}\matrix{1 & 1}^T$ and $v_2 = \frac{1}{\sqrt{2}}\matrix{1 & -1}^T$ and can be written as \begin{equation} \matrix{v_1 & v_2} \matrix{e^t & 0 \\ 0 & e^{-t}} \matrix{v_1 & v_2}^T. \end{equation} Here it is clear to see that, although ordinary (non--hyperbolic) angles are not preserved, the determinant is still $1$, which means this transform does conserve area: expansion along one asymptote exactly compensates compression along the other. Another class of strange gizmos are the symplectic matrices, which are square $2N \times 2N$ matrices orthogonal with respect to the weighting \begin{equation} J_{2N} = \matrix{0 & I_N \\ -I_N & 0}. \end{equation} Alternatively, one uses a permuted version of this, a block diagonal matrix with $J_2$ blocks, which suggests that we are really talking about a different kind of number here.
So it suffices to study the effect of a single such block. The essence of $J$ is that it is anti--symmetric, which yields an anti--symmetric bilinear form $b^TJa = -a^TJb$. This is completely different from any other signature matrix we have seen before, which were all symmetric matrices related to positive (or negative) real numbers. In the same way, $J$ can thus be seen as the imaginary (anti--symmetric) part of some arbitrary matrix. Also note that $J_{2N}^2 = -I_{2N}$. Let's follow the lead here and move to a complex vector space $\CC^N$, coupling 2 components of the original vector space into one complex component, according to the block diagonal distribution of $J$. The bilinear form above is now represented by $\text{imag}(b^Ha) = \frac{1}{2}(b^H a - a^Hb)$. Now, an interesting question would be: are there any numbers that can represent these generalized rotations, in the same fashion the complex numbers can represent circular rotations? The answer is yes of course. Just imagine them by introducing a set of symbols which are not real numbers and defining their multiplication table. For hyperbolic rotations, this leads to the split complex numbers, defined by the relation $i^2 = 1$. For rotations in $3$ dimensions these are the quaternions defined by $i^2 = j^2 = k^2 = ijk = -1$. For arbitrary signature and dimension these generalizations are called Clifford algebras. Most of these are no longer division algebras but merely rings, with the exception of the quaternions, which do form a (non--commutative) division algebra. This is the main reason you don't see them much. The only way they show up usually is in the form of the occasional structured matrix block, as we saw before\footnote{ In fact, we can see matrices as a generalization of all these kinds of numbers. Matrices can represent quite a lot of structure. This is due to an analog of Cayley's theorem for groups. The algebra version says that each associative algebra with unit element is isomorphic to a subalgebra of the automorphism algebra of some vector space. Note that this is not true for non--associative algebras, like the octonions, which are fundamentally different from anything matrix multiplication can represent without modification. }. % what is a vector field over a division ring? Entry: Projection and Nilpotents Date: Fri Jun 19 11:01:12 CEST 2009 Type: tex One thing we didn't talk about yet is the ability of matrices to exhibit behaviour like $AB = 0$ when $A \neq 0$ and $B \neq 0$, and behaviour like $A^k = 0$ for $k > K$. In other words, there are \emph{zero divisors} and \emph{nilpotent elements}. This enables them to represent \emph{projection} and \emph{compaction}. Matrices are a \emph{ring} and not a \emph{field}. So some elements are not invertible. % entropy anyone? % \footnote{ It is possible to have multidimensional extensions of the % real numbers which are still fields, however, these can all be % nicely contained in the ring of matrices. It turns out that % giving up commutativity gives rise to a lot more structure so we % can represent more phenomena than with numbers alone. For % example, rotations in 3D do not commute. This is an essential % property of \emph{direction} in higher dimensions. is this really % true? probably.. but I can't just prove it.. e.g. z^N+1 with N even % gives an N-dim field over \RR representable as diagonal matrices % or semi-circulant matrices % } Entry: Triangular matrices Date: Fri Jun 19 11:02:12 CEST 2009 Type: tex So, in the light of the remarks above, what is a triangular matrix?
While symmetric matrices are important from a more theoretical point of view, as a kind of generalization of (positive) real number, a triangular matrix can also fulfill this role. However, triangular matrices are important from a practical perspective: they are easily invertible, and share this property with unitary/orthogonal matrices. They are a bit like diagonal matrices, but are only half--decoupled. Enough to have $O(N^2)$ multiplication \emph{and} solving time. The application of the inverse of a triangular matrix to a vector is easy to compute in a recursive way. This is quite important, so we have a look at it. Suppose we want to solve the following $N \times N$ system \begin{equation} Lx = y, \end{equation} with $L$ a lower triangular matrix, meaning $l_{ij} = 0$ for $i < j$. The first equation involves only $x_1$, so it can be solved directly. Substituting this value into the remaining equations leaves an $(N-1) \times (N-1)$ lower triangular system, which can be solved in the same way. % causality -> light cone -> hyperbolic space. Is there a direct connection between triangular matrices % and hyperbolic space? Indirectly there is displacement rank, the identification of % triangular matrices with polynomials, and `squares' of those polynomials % which form a kind of generalized hyperbola.. What a Givens rotation is for rotations, and a Householder reflection is for reflections, a Gauss matrix is for\ldots Gaussian elimination. This matrix is also called an atomic triangular matrix, and looks like \begin{equation} L_l = \matrix{ I_{i-1} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l & I_{N-i} } \end{equation} where $l$ is an arbitrary column vector residing in column $i$. Note that it is a vector--parametrized matrix, for vectors ranging from dimension $N-1$ to $1$. The effect of applying a row elimination downward can be expressed by applying such a matrix from the left. It can be seen as spreading out coordinate $i$ over all coordinates $j>i$\footnote{ The interesting thing about this matrix is that $L_l ^{-1} = L_{-l}$, so in that respect they behave a bit like rotations, for which some kind of conjugation represents the inverse. Another very interesting thing to remark is that $L_a L_b = L_{a+b}$: the map $l \to L_l$ turns addition into multiplication, so it behaves like an exponential. This shouldn't be so surprising, because these matrices represent the operation of adding multiples of one coordinate to the other coordinates, which obviously behaves additively.}. So we see triangular matrices are really important, because they allow us to introduce recurrence relations between matrices exhibiting some form of triangular structure. These recurrence relations are a direct consequence of subspace nests. Recall the subspaces $T_i \subset T_j$ if $i \geq j$. For a triangular transform $L$ we have $LT_i \subset T_i$. This means the effect of $L$ on the orthogonal complement of $T_i$ is independent of $T_i$. It is exactly this property which allows us to concentrate on the effect of $L$ on the $1$--dimensional complement of $T_{N-1}$, and reduce the dimensionality of the problem. In other words, matrix inversion is a decoupling process. We ``disentangle'' a matrix by identifying its invariant subspaces. However, by disentangling only in one direction, we are able to effectively map the problem to a causal solution method, which is all we need. \subsection{Outer product notation} In the sections to come, there are a lot of formulas that would look very redundant when we use standard matrix notation. The pattern which keeps recurring is $QJQ^H$, or more generally $Q\tilde{Q}^H$, where $\tilde{Q}$ is the dual of $Q$, with the word dual understood from the current context. For example, we can write the LU factors as $L\tilde{L}^H$.
In the case of symmetric matrices, the matrix $J$ is a signature matrix. Examples of new notation are $A = L\tilde{L}^H = [L] = \sum_k [l_k]$, where $[X]$ has the meaning of outer product between $X$ and its dual, clear from context. In case we want to make the signature explicit, we can write $QJQ^H = [Q]_J$. Later, for displacement rank expressions, we have $R - ZRZ^H = GJG^H$, which can more concisely be written as $R-[Z]_R = [G]_J$, or alternatively as $[I]_R-[Z]_R = [G]_J$. For a rank one matrix we could write $[x] = [x]_1$, to emphasize it is a matrix generated by a vector $x$ and its dual $\tilde{x}$. Entry: Cholesky decomposition Date: Fri Jun 19 11:03:12 CEST 2009 Type: tex Let's put this knowledge about triangular matrices to the test. Any Hermitian, positive definite matrix $A$ can be written as \begin{equation} A = L L^H = U^H U, \end{equation} with $L = U^H$ a lower triangular matrix. It can be seen as the triangular square root of $A$, as opposed to the Hermitian square root, which can be computed from the SVD or eigenvalue decomposition. It is unique as long as we require the diagonal elements to be positive; their product then equals $\sqrt{\det A}$, the product of the square roots of the (positive) eigenvalues of $A$. The algorithm to compute the decomposition has a recursive taste similar to the solving of triangular systems. Let's write $L$ in terms of columns $l_i$ as $\matrix{l_1 & \ldots & l_N}$. This enables us to write \begin{equation} A = \sum_k l_k l_k^H. \end{equation} Looking at this sum, it consists of a bunch of rank 1 matrices with top row and left column zero, except for the first, which is dense. The top row and left column are only influenced by $l_1 l_1^H$, so that's where we start. Looking at $l_1 l_1^H$, its top left element is $l_{11}^2$, so $l_{11} = \sqrt{a_{11}}$. The top row is the same as the left column, so we look only at the left column. The next element down $l_{21} l_{11}$ has to be equal to $a_{21}$, so we get $l_{21} = a_{21} / l_{11}$, and so on. This gives us $l_1$, the first column of $L$. Subtracting $l_1 l_1^H$ from $A$ now gives us a matrix with top row and left column equal to zero. If we remove those, we have a Cholesky decomposition problem of order $N-1$, which we solve recursively until we get to the trivial $1 \times 1$ problem. Entry: LU decomposition Date: Fri Jun 19 11:04:12 CEST 2009 Type: tex We continue our excursion into the land of triangular matrices. Next stop is the LU decomposition. Any invertible matrix $A$ can be written as \begin{equation} A = L U, \end{equation} with $L$ a lower triangular matrix and $U$ an upper triangular matrix. Sometimes a permutation matrix enters the picture, either because it is needed for the factorization to exist or to increase numerical stability. Because of the properties of triangular matrices, this can be regarded as an alternative representation of the matrix $A$ such that applying $A$ and $A^{-1}$ have the same complexity. See the remarks about triangular matrices above. A variant can be written $A = LDU$, with $D$ diagonal and $U,L$ unit upper and lower triangular. This is usually computed with Gaussian elimination, and used for solving linear systems, or any application where matrices and inverses are necessary, and numerical stability permits. It is interesting to rewrite the LU decomposition as $A = L \tilde{L}^H$, simply renaming $U$ to $\tilde{L}^H$. What's in a name! A lot, seemingly. This gives us a better picture about what is actually going on.
Applying the same notational trick as we did for the Cholesky decomposition gives \begin{equation} A = \sum_k l_k \tilde{l}_k ^H. \end{equation} Here again we can see $A$ modeled as a sum of rank 1 matrices, with top rows and left columns zero, though they are \emph{asymmetric} this time. We can follow almost the same recursive algorithm as we used for the Cholesky decomposition, only we need to compute the top row too, which is given by $\tilde{l}_1$. % Note: Gaussian elimination + structure -> generalized schur algo % see kailath slides_main.ps Writing the LU decomposition in terms of $2 \times 2$ block matrices gives \begin{equation} \matrix{A & B \\ C & D} = \matrix{I & 0 \\ CA^{-1} & I} \matrix{A & B \\ 0 & D - CA^{-1}B}. \label{schurcomp} \end{equation} The part $D - CA^{-1}B$ is called the \emph{Schur complement} of $A$. This leads to a decoupling of two complementary subspaces. If we write vectors as $(x,y)$ corresponding to the compartments, then the right block upper triangular factor leaves invariant the subspace $(x,0)$ and the left block lower triangular factor does the same for the subspace $(0,y)$. Interpreting $A$ in (\ref{schurcomp}) as a scalar gives us a way to reduce an $N \times N$ problem to a trivial one and an $(N-1) \times (N-1)$ problem for the Schur complement of $A$. The interesting thing to note is that if we have $A = LU$ and squeeze $\text{diag}(L, I)\,\text{diag}(L^{-1}, I)$ between the two factors above, absorbing the first part into the left factor and the second into the right, we get \begin{equation} \matrix{A & B \\ C & D} = \matrix{L & 0 \\ CU^{-1} & I} \matrix{U & L^{-1}B \\ 0 & D - CA^{-1}B}, \end{equation} which is a partial LU decomposition in terms of $A = LU$. Entry: QR decomposition Date: Fri Jun 19 11:05:12 CEST 2009 Type: tex We talked above about triangular matrices and their particularly attractive features having mostly to do with the nice recursive algorithms for their inversion, which ultimately boil down to the invariance of subspace nests. Looking at complexity alone, a triangular matrix behaves like an orthogonal one, save the fact that it is not orthogonal of course. (But, as we will see, this depends on the definition of orthogonal!). So let's have a look at the QR decomposition, which factors a square matrix $A$ as the product of an orthogonal (unitary in the complex case) matrix $Q$ and an upper triangular matrix $R$ \begin{equation} A = Q R. \end{equation} In terms of the outer product notation of the product of two matrices, this can alternatively be expressed as $A = QL^H = \sum_k q_k l_k^H$, with $L = R^H$ lower triangular. The first thing this shows us is that $L$ is in fact the Cholesky factor of $A^HA$. This makes the QR decomposition look a lot like a polar decomposition, identifying $R$ with the positive factor. This decomposition is not unique, unless we fix the diagonal elements of $R$ to be positive. If $A$ is real, both $Q$ and $R$ are real, with $Q$ orthogonal. The QR decomposition can be seen as an alternative polar decomposition like (\ref{polarmatrix}). A common misconception is that the diagonal of $R$ corresponds to the eigenvalues of $A$. This is not true; however, the determinant of $A$ is equal to the product of the diagonal elements of $R$, up to the unit--modulus factor $\det Q$. The factorization can be generalized to non--square $M \times N$ matrices, with $M > N$, by imposing $Q^H Q = I$, which is the first step in computing the SVD for an unstructured matrix. This is often used to solve the least squares problem $Ax \approx b$ by converting it to the triangular system $Rx = Q^Hb$. In that case $Q^Hb$ holds the coefficients of the projection of $b$ onto the column space of $A$.
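As an aside, here is a small numerical sketch of this least squares use of the QR decomposition (added for illustration, not part of the original notes; it assumes Python with \texttt{numpy}).

\begin{verbatim}
# Sketch: least squares via QR, solving R x = Q^H b (numpy assumed).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))        # tall M x N matrix
b = rng.standard_normal(8)

Q, R = np.linalg.qr(A)                 # reduced QR: Q is 8x3, R is 3x3 upper triangular
x = np.linalg.solve(R, Q.T @ b)        # solve the triangular system R x = Q^T b
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))
\end{verbatim}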
The QR decomposition can be computed using Gram--Schmidt orthogonalization, Givens rotations or Householder transformations. Using the Gram--Schmidt procedure, we write (\ref{gramsmidth}) with $W=I$ and $a_n$ the columns of $A$, or \begin{equation} a_n = \|f_n\| e_n + \sum_{i=0}^{n-1} (e_i^T a_n) e_i. \end{equation} Building $Q$ with columns $e_i$ gives us $R$ with diagonal elements $\|f_n\|$ and $e_i^T a_n$ in row $i$ and column $n$. A variant of this is the inverse QR decomposition \begin{equation} A R = Q, \end{equation} with $R$ triangular and $Q$ orthogonal. It can be computed as well using Gram--Schmidt orthogonalization by observing that the columns of the matrix $R$ are orthogonal with respect to the inner product weighted by $A^TA$, or $R^T A^T A R = Q^T Q = I$. This is related to an inverse Cholesky--like factorization of $A^TA$, since $(A^TA)^{-1} = R R^T$ with $R$ upper triangular. In practice this is of little use in the most general situation, since solving and applying a triangular matrix are of the same complexity order. % general pattern: write recurrence relation + use commute/inverse of % real/complex numbers i wonder if it is possible to automate this % search for algorithms :) % note that the relationship between QR and Schur decomp are very % related to the polar decomposition of complex numbers again. Entry: Schur decomposition Date: Fri Jun 19 11:06:12 CEST 2009 Type: tex Any square matrix $A$ can be written in the form \begin{equation} A = Q U Q^H, \label{schurdecomp} \end{equation} where $Q$ is unitary, and $U$ is upper triangular with the diagonal containing the eigenvalues of $A$. This is somehow a middle road between SVD and EVD. When $A$ is normal, it is the same as the EVD, and $U$ will be diagonal. If it is Hermitian and positive definite, it is in addition equal to the SVD. % SPINORS! % First let's see what this means going back to our complex number % analog. Suppose $a = qr$, with $q = e^{i\theta}$, we can rewrite % this as $a = p r \bar{p}$ with $p = e^{i \theta/2}$. Now this is % just an analogy, but there must be some relationship because. % To solve this, we resort to the same tricks we used before, which is % solving the $2 \times 2$ problem, and then generalizing it to more % than one dimension. % this by using tricks we can do with triangular and orthogonal matrices. Since this involves triangular matrices, let's see if we can come up with a recurrence relation like we did before. Rearranging (\ref{schurdecomp}) gives $AQ = QU$. The first column of $U$ contains only one nonzero element, which makes the first column of $AQ$ proportional to the first column of $Q$, making that column an eigenvector of $A$ and the nonzero element an eigenvalue. In order to construct the decomposition, we find one eigenpair $(\lambda, q)$. We complement the eigenvector $q$ with $N-1$ orthogonal vectors to form a unitary matrix $Q_0$. Computing the product gives $Q_0^H A Q_0 = L_0$. Rearranging this gives the recurrence relation \begin{equation} A = Q_0 \matrix{\lambda & \times \\ 0 & A_1} Q_0^H. \end{equation} We can expand this until the lower right block becomes a scalar. This routine just proves existence, and needs a ``subroutine'' to find the eigenvalue and eigenvector. If $A$ is normal, meaning $A A^H = A^H A$, the Schur decomposition is the same as the eigenvalue decomposition, and $U$ is diagonal. If $A$ is normal and positive definite, the eigenvalues are real and the Schur decomposition is equal to the Singular Value Decomposition.
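To illustrate the deflation step described above, here is a hedged numerical sketch (added here, not part of the original notes; it assumes Python with \texttt{numpy}): take one eigenpair $(\lambda, q)$, complete $q$ to a unitary $Q_0$, and check that $Q_0^H A Q_0$ has the promised block structure.

\begin{verbatim}
# Sketch: one step of the Schur recurrence (numpy assumed).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

lam, V = np.linalg.eig(A)
q = V[:, 0]                                    # one (possibly complex) eigenvector
M = np.column_stack([q, rng.standard_normal((4, 3))])
Q0, _ = np.linalg.qr(M)                        # unitary completion; first column ~ q
B = Q0.conj().T @ A @ Q0
print(np.allclose(B[1:, 0], 0))                # zeros below the (1,1) entry
print(np.allclose(B[0, 0], lam[0]))            # (1,1) entry is the eigenvalue
\end{verbatim}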
A special case of normal matrices are the circulant matrices, which are constant along diagonal lines that ``wrap around''. These are a special kind of Toeplitz matrix. In this case the unitary transform for diagonalization is the discrete Fourier transform, which can be computed in $O(N\log N)$ operations, and is of great importance in a lot of engineering applications. Entry: Eigenvalue decomposition Date: Fri Jun 19 11:07:12 CEST 2009 Type: tex An eigenvalue $\lambda$ of a square matrix $A$ satisfies \begin{equation} Ax = \lambda x \end{equation} for some eigenvector $x$. The algebraic multiplicity of eigenvalue $\lambda$ is the multiplicity of the root $\lambda$ in the polynomial $\det(A-\lambda I)$. The geometric multiplicity is the dimension of the kernel of $A-\lambda I$. If both are equal for all eigenvalues, we can write the normal form for the similarity transform, also called the eigenvalue decomposition \begin{equation} A = S \Lambda S^{-1}, \label{eigendecomp} \end{equation} with $\Lambda$ a diagonal matrix containing the eigenvalues of $A$, and the corresponding columns of $S$ containing the eigenvectors. If $A$ is normal, $S$ can be taken unitary, $S^{-1} = S^H$, containing the normalized eigenvectors, which form an orthonormal basis. The eigenvalue decomposition then corresponds to the Schur decomposition. In this case, we can obtain the normal form for the congruence transform by absorbing the square root of the eigenvalues into the left and right matrices, $X = S \sqrt{|\Lambda|}$, and keeping only their sign in the diagonal matrix. With $J$ a diagonal matrix containing only $\pm 1$ and $0$ elements, this gives the normal form for the congruence transform of a Hermitian matrix as \begin{equation} A = X J X^H. \end{equation} An example of a non--normal matrix is $I+Z$, with $Z$ the down shift matrix defined in (\ref{shiftdown}). Here it is clear to see that $ZZ^T \neq Z^TZ$. They are mirror images around the anti--diagonal, with a zero in the top left and bottom right corner respectively. Matrices like these can be brought into Jordan form. They are similar to a diagonal matrix plus a nilpotent matrix. Entry: Jordan form Date: Fri Jun 19 11:08:12 CEST 2009 Type: tex If the geometric and algebraic multiplicities for eigenvalues $\lambda_i$ do not coincide, we cannot diagonalize $A$ using the eigenvalue decomposition. However, we can still write its Jordan form\footnote{In practice, when the numbers in a matrix are a result of measurement, this only occurs in engineering applications if the matrix is structured. Nilpotent transforms, and more generally, rank--deficiency, have an \emph{exact} quality, that has effectively zero probability to occur as a result of measurements alone. For rational and finite fields this is of course different, but there it is arguable that this is again imposed structure, since we think of physical quantities mostly in terms of real numbers. }. Intuitively one could say the Jordan form expresses the fact that a linear map (endomorphism) can essentially perform only 2 kinds of space--invariant operations that are fundamentally different in nature: scaling/rotation and `collapse'. The first kind is expressed by the eigenvalue decomposition, while the last one can be found in its pure form in nilpotent transforms. The Jordan canonical form writes a linear transformation as a combination of both kinds. Suppose $Z$ is a nilpotent matrix with index $p$, meaning $Z^p = 0$ and $Z^{p-1} \neq 0$.
There is a vector $n$ such that $Z^{p-1} n \neq 0$. The vectors $n_k = Z^k n$ for $k = 0, \ldots, p-1$ are independent and span a $p$--dimensional subspace. Since $Z$ maps each $n_k$ to $n_{k+1}$, the space is invariant under transformation by $Z$. Suppose $\lambda$ is an eigenvalue of $A$ with algebraic multiplicity $m$, and $k$, with $1 \leq k < m$, is the dimension of the kernel of $A-\lambda I$. This means we can find $k$ independent eigenvectors. We can augment this basis with $m-k$ additional vectors to span a space that is invariant under $A$. This space is called the Jordan space corresponding to eigenvalue $\lambda$, and is equal to the kernel of $(A-\lambda I)^\mu$, with $\mu$ the smallest number for which the dimension of this kernel no longer grows. A Jordan chain is a sequence of vectors $H_k$, such that $H_0 = 0$ and $H_{k} = (A-\lambda I) H_{k+1}$. For each vector in the chain we have $AH_{k+1} = H_k + \lambda H_{k+1}$, which means the space spanned by such a chain is invariant under $A$. This makes $H_1$ an eigenvector since $AH_1 = \lambda H_1$. For each $\lambda$, we can find a number of Jordan chains to span the Jordan space. This gives rise to the decomposition \begin{equation} A = S (\Lambda + N) S^{-1}, \end{equation} with $\Lambda$ a diagonal matrix containing the eigenvalues, and $N$ a block diagonal nilpotent matrix built from shift--up blocks. The matrix $S$ contains linearly independent vectors built from Jordan chains. Entry: Singular value decomposition Date: Fri Jun 19 11:09:12 CEST 2009 Type: tex Any $M \times N$ matrix $A$ can be written in the form \begin{equation} A = U \Sigma V^H, \end{equation} where $U$ and $V$ are unitary, and $\Sigma$ is a rectangular matrix containing the singular values on the diagonal. The SVD can be seen as containing the information of the square roots of the EVDs of both squares of $A$: $A^HA = V \Sigma^2 V^H$ and $A A^H = U \Sigma^2 U^H$. It has the properties of an EVD of a positive (semi)definite Hermitian matrix: nonnegative real eigenvalues and orthogonal eigenvectors. The SVD is foremost an orthogonal decomposition: it gives orthogonal bases for the row/column spaces and left/right null spaces. It is central to so--called \emph{subspace methods}. One of its major applications is low rank approximation, with the \emph{total least squares} approximation method and PCA as special cases. The singular values are the square roots of the eigenvalues of $A^H A$, so the SVD could be seen as a sort of double polar representation. In fact, using the SVD one can compute what is ordinarily called the polar decomposition of $A$ in terms of a unitary and a positive definite Hermitian matrix \begin{equation} A = U \Sigma V^H = ( U V^H ) (V \Sigma V^H). \label{polarmatrix} \end{equation} Similarly, the right polar decomposition can be computed as $(U \Sigma U^H) (UV^H)$. This can be seen as a generalization of the polar representation of complex numbers $a = e^{i\theta} r$, where the positive factor is taken to be the Hermitian square root of $AA^H$ or $A^HA$. Note that this is very similar to the QR decomposition, where $R$ is triangular instead of Hermitian, again indicating a link between positive definite triangular and Hermitian matrices. This is reflected in the Cholesky decomposition. % which means there is certainly a deep, strong connection somewhere between % Hermitian matrices and triangular matrices % on the right it gives $( U \Sigma U^H ) (U V^H)$.
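Here is a small numerical sketch of the polar factorization (\ref{polarmatrix}) (added for illustration, not part of the original notes; it assumes Python with \texttt{numpy}): from the SVD we form the orthogonal factor $UV^H$ and the symmetric positive semidefinite factor $V\Sigma V^H$.

\begin{verbatim}
# Sketch: polar decomposition from the SVD (numpy assumed, real case).
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

U, s, Vh = np.linalg.svd(A)
W = U @ Vh                              # orthogonal factor U V^T
P = Vh.T @ np.diag(s) @ Vh              # symmetric positive semidefinite factor V S V^T
print(np.allclose(A, W @ P))            # A = W P
print(np.allclose(W.T @ W, np.eye(4)))  # W is orthogonal
print(np.allclose(P, P.T))              # P is symmetric
\end{verbatim}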
Entry: CS Decomposition Date: Fri Jun 19 11:10:12 CEST 2009 Type: tex For any $2 \times 2$ partitioning of a unitary matrix \begin{equation} Q = \matrix{ Q_{11} & Q_{12} \\ Q_{21} & Q_{22} } \end{equation} there exist unitary matrices $U_1$, $U_2$ and $V_1$, $V_2$, such that $D = U^H Q V$, with \begin{equation} U = \matrix{ U_1 & \\ & U_2 }, V = \matrix{ V_1 & \\ & V_2 } \end{equation} and \begin{equation} D = \matrix{ D_{11} & D_{12} \\ D_{21} & D_{22} }, \end{equation} where the $D_{ij}$ are essentially diagonal (permuted diagonal). This is essentially an SVD of each block, with $Q_{ij} = U_i D_{ij} V_j^H$. % It can be seen as the basic form for real matrices with complex % eigenvalues, without using complex numbers\footnote{ Note that the % block above is nothing more than a matrix representation of % complex numbers over the field of real numbers. This is the % algebraic extension field $\RR[\sqrt{-1}]$, isomorphic to the % quotient of the Euclidean ring of polynomials $\RR[x]$ and its % maximal ideal $\{p(x) | (x^2+1)q(x), q(x) \in \RR[x]\}$.}. The % essential observation is that the block above is unitary similar to % a $2 \times 2$ diagonal matrix containing the complex conjugate pair % $c \pm is$. % SEE page wei : history and generality of the CSD decomposition Entry: Vandermonde decomposition Date: Fri Jun 19 11:11:12 CEST 2009 Type: tex Given a sequence $h_k$ which satisfies the relation \begin{equation} h_k = \sum_{l=1}^{L} c_l z_l^k \end{equation} with all $z_l$ different, any $M \times N$ Hankel matrix of $h_k$ can be written as \begin{equation} H = \sum_{l=1}^{L} c_l \left[ \begin{array}{c} 1 \\ z_l \\ \vdots \\ z_l^{M-1} \\ \end{array} \right] \left[ \begin{array}{c} 1 \\ z_l \\ \vdots \\ z_l^{N-1} \\ \end{array} \right]^T, \end{equation} which we denote as $H = \sum_l c_l W_{l,M} W_{l,N}^T = W_M C W_N^T$, with $C$ a diagonal matrix containing the $c_l$, and $W_M$ and $W_N$ Vandermonde matrices built from the $z_l$. In general we can decompose any Hankel matrix $H$ in terms of confluent Vandermonde matrices obtained from the roots of the polynomial \begin{equation} p(z) = z^{L} + \sum_{l=1}^L a_l z^{L-l} \end{equation} % note that also MA signals can be modeled by this!! if all poles are zero, % the confluent vandermonde matrices represent a shifted unit impulse base.. defined by the $a_l$ satisfying the recurrence relation \begin{equation} h_k = - \sum_{l=1}^L a_l h_{k-l}. \end{equation} If $H$ is rank deficient, $p(z)$ can be constructed from an element of the null space of $H$. If the Hankel matrix is full rank, the decomposition can be obtained through the decomposition of the Hankel matrix obtained by the addition of an extra linearly dependent column. This is however no longer unique, since the extension is arbitrary. In the case where $H$ is (Hankel) circulant, the circulant rank deficient extension gives the (mirrored) Fourier decomposition $H = F C F^T$, with $p(z) = z^N - 1$ and $z_l = \exp(j2\pi l / N)$. Perturbing a Hankel matrix of full rank with a Hankel matrix of minimal Frobenius norm such that its rank is reduced to $L$ gives us a way to construct an order $L$ exponential model for a signal $h_k$ with minimal error in a least squares sense. This is a structured total least squares problem. % \subsection{Hankel shifts} % Observe $W_M^{+} = W_M^{-} W$, with $.^+$ and $.^-$ denoting the % removal of top and bottom rows. The diagonal matrix $W$ can be % understood as the diagonal representation of some kind of shift % operator.
% To see this, take $H_L = Z_M C Z_L^T$, the Hankel matrix built from % the first $L$ linearly independent columns of $H$. The product % $CZ_L^T$ is regular, so $Z_M$ and $H_L$ span the same space. We % write $H_L$ as a sum of Hankel matrices $Z_m$, which are zero except % for a one on the $m$th anti--diagonal. This gives % \begin{equation} % H_L = \sum_m h_m Z_m. % \end{equation} % , which is the analog of the % This could be understood as a diagonalizable shift operator. Indeed, % suppose $M \to \infty$, meaning $H \in \mathbb{C}^{\infty \times % L}$, % it is probably best to work the other way around, since it is not immediately clear to me % how that composition can be proven.. % This occurs in (\ref{linpred}) where we model the autocorrelation as % a sum of exponentials, by removing the top row. % \section{Frequency domain interpolation} % We use time domain interpolation to set the base pitch of the % synthesized signal. Let's have a look at the effect of doing this in % the frequency domain. Suppose we have a spectral sequence % $C_k$. What happens if we resample this vector? % \section{Duality} % To understand the benefits of this method, it is interesting to have % a look at time/frequency duality. Since we have a direct and % efficient way to switch between time and frequency domain % representation, some operations become very straightforward. % effects: single sideband modulation harmonic exciting + aliasing % free distortion % todo: pd objects for basic operations bfft/bifft formant shift (time % or frequency dilatation) % Alternative playback: synchronous retriggering. Entry: Subspace frequency estimation Date: Fri Jun 19 11:12:12 CEST 2009 Type: tex An interesting trick exists for relating the Vandermonde decomposition $H = W_M C W_N^T$ and the SVD $H = U \Sigma V^T$. Suppose $C,\Sigma \in \mathbb{C}^{L \times L}$, and $H \in \mathbb{C}^{M \times N}$, with $L < N < M$. We have \begin{equation} W_M^+ = W_M^- W, \label{vandershift} \end{equation} with $.^+$ and $.^-$ indicating the removal of the top and bottom row respectively. $W$ is a diagonal matrix containing the $z_l$. This can be used to compute the $z_l$. Since $W_M$ spans the column space of $H$, given some matrix $X$ spanning this space, there is a regular matrix $Q$ satisfying $W_M = X Q$. Removing the top and the bottom row of this relation respectively, and substituting into (\ref{vandershift}), we can construct the equation \begin{equation} X^+ Q = X^- Q W. \label{subspaceshift} \end{equation} Every basis for the column space of the rank $L$ Hankel matrix has such a shift property. We can solve this for the matrix $W' = QWQ^{-1}$, which has the same eigenvalues as $W$, so we can obtain the $z_l$. For $X$ we could use the first $L$ columns of $H$. An alternative approach is to use the SVD, which can also be used to compute an approximate solution in case $H$ is corrupted by noise. Take $U_L$ to be the matrix of left singular vectors corresponding to the $L$ largest singular values of $H$. In general, when $L$ is not equal to the rank of $H$, the approximation is no longer a Hankel matrix and the subspace shift property $U_L^+ = U_L^- W'$ is not valid. However, we can still solve this equation in a least--squares sense, which gives us an approximation of the $z_l$ in case $H$ is corrupted by noise. This method is called HSVD. If we solve the last equation in a total least squares sense, the method is called HTLS. The signal amplitudes can be recovered by solving a least squares problem after obtaining the signal poles.
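To make this concrete, here is a hedged numerical sketch of the shift trick in the noiseless case (added for illustration, not part of the original notes; it assumes Python with \texttt{numpy}): build a Hankel matrix from $h_k = \sum_l c_l z_l^k$, take the dominant left singular vectors, and recover the $z_l$ as eigenvalues of the least squares solution of the shifted equation.

\begin{verbatim}
# Sketch: recovering signal poles from a Hankel matrix (HSVD-style, numpy assumed).
import numpy as np

z = np.array([0.9 * np.exp(2j * np.pi * 0.10), 0.7 * np.exp(2j * np.pi * 0.27)])
c = np.array([1.0, 0.5])
k = np.arange(40)
h = (c[None, :] * z[None, :] ** k[:, None]).sum(axis=1)        # h_k = sum_l c_l z_l^k

M, N, L = 30, 11, 2
H = np.array([[h[i + j] for j in range(N)] for i in range(M)])  # M x N Hankel matrix

U, s, Vh = np.linalg.svd(H)
UL = U[:, :L]                                             # rank-L signal subspace
W = np.linalg.lstsq(UL[:-1], UL[1:], rcond=None)[0]       # solve U^- W' = U^+
print(np.sort_complex(np.linalg.eigvals(W)))              # recovered poles
print(np.sort_complex(z))                                 # true poles
\end{verbatim}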
% Applying a variant of this method to a signal covariance matrix % yields the (TLS--)ESPRIT algorithms. See Deprettere and Vanderveen % 93. ok, this brings us to 'realization theory' \def\O{\mathcal{O}} \def\C{\mathcal{C}} This method is a special case of ``realization theory'', where one computes a state--space model $$\matrix{x^+ \\ y} = \matrix{A & B \\ C & D} \matrix{x \\ u}$$ from a given impulse response $h_n$. This sequence is zero for $n<0$ because we assume the system is causal. We have $h_0 = D$, and for $n>0$ we have $h_n = CA^{n-1}B$. The parameters are recovered using the Hankel operator $H$, the submatrix of the Toeplitz impulse response matrix $T$ which relates past input $u_n : n<0$ to future output $y_n : n \geq 0$. Arranging the elements $h_n : n > 0$ into an infinite Hankel matrix gives $H$. The elements of $H$ are given by $H_{ij} = h_{i+j-1} = CA^{i+j-2}B$. In other words, $H$ can be factored as $H = \O\C$, where $$\O = \matrix{C \\ CA \\ CA^2 \\ \vdots}$$ is called the \emph{observability matrix} and $$\C = \matrix{B & AB & A^2B & \ldots}$$ is called the \emph{controllability matrix}. This clearly shows $H$ is rank deficient. The Vandermonde decomposition is a special case of this factorization. Both $\C$ and $\O$ satisfy shift properties: $\O A$ is equal to $\O$ with its top (block) row removed, and $A\C$ is equal to $\C$ with its first (block) column removed. The decomposition $H = \O\C$ is not unique, so we could use the SVD as $H = (U)(\Sigma V^T)$. We can obtain $A$ by applying the shift trick on $U$, and recover $B$, $C$ and $D$ from this. Entry: Displacement rank Date: Fri Jun 19 11:13:12 CEST 2009 Type: tex To explain the idea of displacement rank, we first concentrate on the Hermitian $N \times N$ matrix $A$. This enables us to employ the normal form for the congruence transform, which is a bit easier on the eye than asymmetric structures. The particular kind of structure we try to capture is shift structure, which occurs in a lot of problems involving time/space--invariant systems. We are going to investigate the difference between two quadratic forms $x^H A x$ and $(Zx)^H A (Zx)$, where $Z$ is the nilpotent down--shift matrix from (\ref{shiftdown}), which satisfies $Z^N = 0$. The difference between the two quadratic forms is represented by the matrix \begin{equation} \Delta A = A - Z A Z^T = G J G^H, \label{drank} \end{equation} where $J = I_p \oplus -I_q$ is an $r \times r$ diagonal signature matrix containing the elements $j_l \in \{ 1, -1\}$. The fact that we take a difference of two squares is a first indicator that we are going to run into some hyperbolic metric with signature $(p,q)$. Here $r$ is called the displacement rank of $A$, while the pair $(p,q)$ is called the displacement inertia. The redundancy in the dense matrix $A$ is captured by the pair $\{J, G\}$, called a generator of $A$. For most problems of interest $r \ll N$. The task that remains is to figure out how to translate ordinary operations we would like to perform on $A$ to operations on $\{J, G\}$. The first thing we might want to know is how to perform a matrix multiplication with $A$ using this representation. Expanding the recursive formula $A = \Delta A + ZAZ^T$ gives us \begin{equation} A = \sum_{k=0}^{N-1} (Z^k G) J (Z^k G)^H. \end{equation} Expanding this further in terms of columns of $G$ gives us an expansion in terms of ``squares'' of lower triangular Toeplitz (LTT) matrices $L_l L_l^H = \sum_k (Z^k g_l) (Z^k g_l)^H$.
The complete expression for $A$ then becomes \begin{equation} A = \sum_{l=1}^{p} L_l L_l^H - \sum_{l=p+1}^{r} L_l L_l^H, \end{equation} with $L_l$ the LTT matrix built from the $l$th column of $G$. So in words, $A$ is written as a difference of two positive (semi)definite Hermitian matrices, which are the sums of respectively $p$ and $q$ positive semidefinite Hermitian matrices. The remarkable fact is that for each of these matrices, we have the Cholesky decomposition available in terms of lower triangular Toeplitz matrices. The interesting thing about this representation is that multiplication with an $N \times N$ LTT matrix is essentially a convolution and can be performed using an FFT of size at least $2N-1$. This is our first victory! Complexity of application of $A$ is reduced from $O(N^2)$ to $O(N \log N)$. % Note that $A$ is expressed as a sum/difference of ``squares''. % For those who didn't pay close attention, the FFT sneaked in when we % started playing with $Z$. % In the more general non--symmetric case we have $A - Z A Z^T = G B % ^H$, giving rise to $A = \sum_l L_l^g (L_l^b)^H$, where the $k$th % column of the LTT matrices $L_l^g$ and $L_l^b$ are built from the % $l$th column of $Z^kG$ and $Z^kB$. So the matrices are all in LU % form. % These LTT matrices can be handled using FFT algorithms. In layman's % terms, exploiting displacement rank is exploiting shift--invariance % using a Fourier transform, together with a workaround to handle % boundary conditions and other transformations that result in this % invariance no longer strictly holding. Can we do more? An interesting property is \begin{equation} A-FAF^H \cong A^{-1} - F^H A^{-1} F, \label{congruence} \end{equation} with $\cong$ denoting congruence, for any $F$ and $A$ regular Hermitian. Note this is valid for any $F$! One application of this is about the inverse of a Toeplitz matrix. Let $\tilde{I}$ be the anti--diagonal exchange matrix. Using this matrix in a congruence transform reverses the order of the rows and columns of a matrix, effectively rotating the grid of entries over $180$ degrees. When we apply this to a (real) symmetric Toeplitz matrix $T$ or its inverse $T^{-1}$, both are left invariant. If we use $T$ and $Z$ from above in (\ref{congruence}), and transform the right hand side using $\tilde{I}$, we get % yes, this is really interesting!! I wonder whether these % things can be related to other symmetry groups etc.. \begin{equation} \Delta T \cong \Delta T^{-1}. \end{equation} The significance of this is of course that we can transform the generators of $T$ into the generators of $T^{-1}$, without having to represent either of the two in dense form. This is not a new result of course, and is effectively expressed in the Levinson algorithm, where the invariance under transform by $\tilde{I}$ is explicitly exploited, together with a recurrence relation. Note that this is less general and talks specifically about symmetric Toeplitz matrices. The most important general consequence of (\ref{congruence}) is that Schur complements inherit displacement structure. The generators of this complement can be computed using the generalized Schur algorithm. One thing to remark is that if $G$ is in normal form, which means the top row contains only a single nonzero element, in column $k$ say, and the matrix $Z$ is strictly lower triangular, the corresponding column $g_k$ is the first column of the Cholesky factor.
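Before switching notation, here is a hedged numerical sketch of the generator representation above (added for illustration, not part of the original notes; it assumes Python with \texttt{numpy}, and the particular generator chosen for a symmetric Toeplitz matrix is one standard choice, not the only one): the displacement of a symmetric Toeplitz matrix has rank $2$, and the matrix can be rebuilt from $\{J,G\}$ by expanding $\sum_k (Z^kG)J(Z^kG)^T$.

\begin{verbatim}
# Sketch: displacement rank 2 of a symmetric Toeplitz matrix (numpy assumed).
import numpy as np

N = 6
t = np.array([3.0, 1.2, 0.5, 0.2, 0.1, 0.05])                 # first column, t[0] > 0
T = np.array([[t[abs(i - j)] for j in range(N)] for i in range(N)])
Z = np.eye(N, k=-1)                                           # down-shift matrix

D = T - Z @ T @ Z.T                                           # displacement of T
print(np.linalg.matrix_rank(D))                               # displacement rank r = 2

g1 = t / np.sqrt(t[0])                                        # one possible generator
g2 = np.concatenate([[0.0], t[1:]]) / np.sqrt(t[0])
G, J = np.column_stack([g1, g2]), np.diag([1.0, -1.0])
R = sum(np.linalg.matrix_power(Z, k) @ G @ J @ G.T @ np.linalg.matrix_power(Z, k).T
        for k in range(N))
print(np.allclose(R, T))                                      # T rebuilt from {J, G}
\end{verbatim}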
Switching to our new notation, this becomes $$R = [L] = \sum_{k=1}^N [l_k].$$ We identify this with $$R = [G]_J + [Z]_R = \sum_{k=1}^r [g_k]_{j_k} + [Z]_R,$$ from which we conclude that $l_1 = g_1$, since all the other terms do not affect the first row and column of $R$. This gives us the first column in the Cholesky factorization of $R$. Supposing $j_1 = 1$, we can compute the rest from $$R - [g_1] = \bar{R} = \sum_{k=2}^r [g_k]_{j_k} + [Z]_R.$$ The only thing we need to do now is to write $[Z]_R$ in terms of $\bar{R}$. With $R = \bar{R} + [g_1]$, we have $[Z]_R = [Z]_{\bar{R} + [g_1]} = [Z]_{\bar{R}} + [Zg_1]$. The final word is thus $$\bar{R} = [Z g_1] + \sum_{k=2}^r [g_k]_{j_k} + [Z]_{\bar{R}},$$ which gives us the generator for the Schur complement. This is one step of the generalized Schur algorithm. The only thing we assumed is that we can bring any generator into normal form using $J$--orthogonal transforms. This will yield one column of the Cholesky factor. The generators of the Schur complement are then computed by shifting down the pivot column. The question which arises almost immediately is: we go to great lengths trying to keep the number of coefficients small, but in generating the Cholesky decomposition, we do generate $N(N-1)/2$ parameters. Can we somehow consume these parameters immediately? For example, solving $LL^H x = b$. Solving $Lx' = b$ is straightforward, but the $L^H$ factor gives some trouble. The thing to do here would be to use some kind of primal/dual trick. This is essentially what the Levinson algorithm does: instead of updating the Cholesky factors, it updates the solution of a symmetric positive definite Toeplitz system. % where $J|Q = QJQ^H$, $j_k$ is the $k$--th diagonal of $J$. This % gives $R - R|Z = J(G) = \sum_{k=1}^r j_k(g_k)$. We can write the % partial Cholesky decomposition as %$$ %\begin{array}{lll} %(I)_R & = & (L)_I \\ % & = & \sum_k (l_k)_1 \\ % & = & \sum_{k=1}^r (g_k)_{j_k} + (Z)_R \\ % & = & (l_1)_I + \sum_{k=2}^r (g_k)_{j_k} + (Z)_R. \\ %\end{array} % $$ % ok, now the new things are coming! continue understanding this tomorrow.. but the % moral of the story is: recursion is introduced because of the presence % of a relation between a matrix and its Schur complement. % ..., using a sequence of transforms that leave the metric % intact. This brings us to $J$ inner matrices, which are matrices % satisfying $Q^H J Q = J$. They are a generalization of unitary % matrices (where $J=I$), but preserve distance with respect to $J$. % maybe here I will finally get to scratch my symplectic itch % by introducing a symplectic form. % SCHUR : displacement structure allows us to speed up the computation % of schur complements triangular factorization can be stated in terms % of successive schur complements == generalized schur algorithm %If $A$ is Toeplitz, we have $J = \text{diag}(1,-1)$ and so %\footnote{ %An alternative definition is %\begin{equation} %\Delta_{F,A} (R) = FR - RA = GB^T, {%\end{equation} %using the $N \times L$ generator matrices $G$ and $B$.}. % Where can I apply this? To come back for a moment to what this % actually does: combining shift--invariance with boundary conditions. % \section{Heavy speculation} % So what's the whole idea when modeling audio signals? Present a % linear model using time--localized features. The problem is really % to define time--extent. Since we have a discrete number of features, % sliding over a signal will make some of them appear and % disappear.
This is the cause of all trouble, the fact that we are % really dealing with discrete events. % It would be interesting to investigate if it is possible to perform % event segmentation using HTLS. Basicly, starting from a signal % $s_n$, partition the signal such that each partition satisfies the % same prediction equation, minimising an error. This could result in % a separation of an AR filter and an impulse-like stimulus (impulse + % MA). % Without thinking too much, combining the backward and forward errors % from linear prediction in time localisation of impulses is an option % to consider. See p132 godsill & rayner % \subsection{The arrow of time} %[REWRITE] % this part is a bit over the top :) the only real info here is % choleski factor of variance/covariance matrix as conditional % entities arrow of time, ok, but in practice this only refers to the % relation between recursive formulation and causal COMPUTERS :) the % model is not necessarily causal.. % There's something mysterious about triangular matrices. Note the % interesting recurrent patterns that emerged while we were computing % the Cholesky and LU factorization, and in solving triangular % systems. This screams for a closer look. % But first, some buzzwords. Causality, assymetry, order, subspace % inclusion, information delay, entropy, conditional variance (they % occur as Cholesky factors = conditional expansion of mutivariate % gaussians). Time varying systems, the Kalman filter, autoregressive % modeling with reflection back--forth in time, the wave % equation\ldots Ok, enough madness. There is something deep going on % here i don't quite get yet, all centered around this ``conditional % probability'' thing. Bayesian networks, Schur complements. % What is so special about a Cholesky factorization? Writing $A = LL^H % = R^HR = (QR)^H(QR)$ with $Q$ unitary gives us quite a family of % ``square roots''. Because of its signature, $x^TAx = 1$ gives an % ellipse. Taking the square root of $A$ gives an alternative basis % where the ellipse becomes the unit sphere. Rotating and reflecting % this sphere leaves it invariant, hence we have the group of unitary % matrices $Q$ as the symmetry group of square roots. % So why choose $L$? The first and most obvious reason is of course a % pragmatic one. Algorithms involving triangular matrices are % reasonably fast compared to dense and orthogonal ones. But there is % a deeper reason, which is intimately linked to the algebraic % properties of triangular matrices: a Cholesky factorization can be % seen as a ``causal square root''. It is the one factorization that % preserves a specific kind of order. % The basis transform to bring the quadratic form in normal form is % $x' = L^Tx$. Now, we saw before that lower (or upper) triangular % matrices preserve some nesting property of subspaces. This nesting % property is an order relation intimitely linked to the concept of % time, for which causality determines an order relation of time % instances. It is exactly this property (symmetry) which allows % algorithms involving these matrices to be formulated using the % principle of induction. % decategorification of the category of finite sets + definition of % integers + order relation % now i start to sound as a real burgie % Where do we see this causality clearly? In statistical signal % processing. Suppose we have a noise processes $u_k$ and want to % capture its ``essence''. 
In order to do this, we need to make some % assumptions that make our life a lot easier, but are not too far % fetched. % We assume it is stationary, meaning its statistical properties do % not vary over time. The statistical properties can then be expressed % in terms of expected values of products of $u_k$ from different time % instances, which only depend on the difference of 2 time instances, % and not their absolute value. % In other words, time is a torsor. % We assume it is zero mean or $\E(u_k) = 0$. So we can forget about % the first degree term. We assume it is gaussian, which means we % forget about all degrees larger than $2$. This leaves us a second % order representation of the statistical properties, or, a quadratic % form, which we all learned to love. % Suppose we denote $u$ as the vector with each component $i$ a % statistical variable representing the observation at time instant % $i$ before now, we get a variance--covariance matrix given by $U = % \E(uu^T)$. To state the obvious, this is of course only equal to % $\E(u)\E(u)^T$ if all the components are independent, which is the % same as saying there is no correlation over time. If there is such a % correlation, it will be represented by the off--diagonal elements of % $U$. % Now, what is $L$ in $U=LL^T$? It is a certain \emph{causal} model % for the dependence of the components of $u$, such that, the modeled % \emph{features} $u' = L^Tu$ are \emph{independent}. If you draw this % as a graph, the interdependency network, or information flow from % $u$ to $u'$ only goes from past to future, not the other way around. % Note that not in all systems (square root algorithms) it has this % kind of interpretation. Since, due to it's complexity properties, % the Cholesky factorization can be a good for any kind of % decorrelation where we impose (force) some kind of causal or % conditional statistical structure. % BAYES! % To stay in the statistical framework, we explore the concept of % Cholesky decomposition combined with \emph{Schur complements}. % Let's have a look at conditional probability, Schur complements, and % the accumulation of knowledge (the Kalman filter). A multivariate, % zero mean gaussian distribution can be written as $p(x) = % \exp{-x^TAx}$, with $A > 0$ the variance--covariance matrix. This % has a Cholesky decomposition $A=LL^T$. This means that $p(x') = % \exp{-x'Tx'}$, with $x' = L^T x$ is a decoupled multivariate model. % It is in fact a recurrent Bayesian network. Since we explain the % variance of one term in terms of previous (co)variance and new % information. To see this, we have a look at the $2$--dimensional % case, which will bring us to Schur complements. Take % \begin{equation} % S = \matrix{E(xx) & E(xy) \\ E(yx) & E(yy)}. % \end{equation} % Computing the Cholesky factorisation of this gives $S = LL^T$ with % \begin{equation} % L = \matrix{\sqrt{E(xx)} & 0 \\ E(xy)/\sqrt{E(xx)} & \sqrt{E(yy) - E(xy)^2/E(xx)}}. % \end{equation} % NOTE; for triangular matrices, the commutator is not zero, but it is nilpotent, % They are very asymmetric, but are square roots of very symmetric % things. This pops up in AR modeling for instance. What about the % relation of $LL^H$ to $L+L^H$? Compare $a^2 + 2a + 1 = (a+1)^2$ and % $LL^H + L + L^H + I = (L+I)^H(L+I)$? For LTT matrixes this sort of % makes sense, since the diagonal is constant. 
% one more thing: LTI: polynomials (commutative), LTV: not commutative (because of time-varying rotations)

Entry: Trace and Determinant Date: Fri Jun 19 11:14:12 CEST 2009 Type: tex Trace and determinant are two group homomorphisms from square $N \times N$ matrices to the base field. The trace preserves the addition operator $\tr(A) + \tr(B) = \tr(A + B)$, and the determinant preserves the multiplication operator $\det(A)\det(B) = \det(AB)$. Note the determinant formula also works for determinant zero, but in that case it is no longer a group homomorphism. With respect to the eigenvalues we have $\tr(A) = \sum_k \lambda_k$, and $\det(A) = \prod_k \lambda_k$. They are related through the exponential function as $\exp(\tr(A)) = \det(\exp(A))$. The trace is interesting to define an inner product between matrices, when considering a space of matrices as a normed vector space. For the space of $M \times N$ matrices, one could define the inner product $\left< A, B \right> = \tr(B^HA)$. For $N=1$ or $M=1$ this is the ordinary inner product for column or row vectors. We also have $\left< A, A \right> = \|A\|^2_F = \sum_{ij} |a_{ij}|^2$, the Frobenius norm. Since $\tr(rA) = r\tr(A)$, it is a linear map. Also $\tr(AB) = \tr(BA)$, which can be generalized to cyclic permutations of a product of matrices. The trace of the Jacobian of a vector function is the divergence. While the trace $\tr(A)/N = \sum_k \lambda_k/N$ can be seen as an additive average of a square $N \times N$ matrix $A$, the determinant $\det(A)^{1/N} = \prod_k \lambda_k^{1/N}$ can be seen as a geometric average. In this light, it is no surprise the determinant is related to the inverse of a matrix. More specifically $A^{-1} = \adj(A) / \det(A)$. We define the adjugate as $\adj(A) = C^T$, the transpose of the cofactor matrix of $A$. The cofactors are given by $c_{ij} = (-1)^{i+j} m_{ij}$. Here $m_{ij}$ is a minor of $A$: the determinant of the matrix obtained from $A$ by removing row $i$ and column $j$. The adjugate is the part of the inverse of $A$ which can be computed without divisions. Laplace's formula gives an expression for the determinant in terms of minors $\det(A) = \sum_k a_{1k} c_{1k}$.

Entry: Matrix Equations Date: Fri Jun 19 11:15:12 CEST 2009 Type: tex Finding transforms subject to constraints. The discrete time Lyapunov equation in the unknown $X$ is given by $$A X A^H - X + Q = 0,$$ with $Q$ Hermitian. There is a unique solution iff no eigenvalue of $A$ is the reciprocal of an eigenvalue of $A^H$. If this condition is satisfied, and $\lim_{k\to\infty} A^k = 0$, the unique solution is given by $X = \sum_k (A)^k Q (A^H)^k$. It is Hermitian since it is the sum of (infinitely many) Hermitian matrices. % rewrite as Schur complement % [Riccati].

Entry: Statistics Date: Fri Jun 19 11:16:12 CEST 2009 Type: tex \subsection{Introduction} A very brief introduction to statistics. We represent a stochastic variable, denoted by a capital letter $X$, by its \emph{probability density function} or PDF $p_X(x)$, which is a function over the reals. This is also called a probability measure. It satisfies $p_X(x) \geq 0$ and \begin{equation} \int_{\RR} p_X(x) d x = 1. \end{equation} In general we talk about \emph{distributions} instead of functions, and we allow the PDF to vanish, so we can take $\RR$ to be the domain. An event $E$ is defined as a subdomain of the domain of $X$. For example, $E = \{x : x>10\}$.
The probability for the event $E$ is then defined to be \begin{equation} P(E) = \int_{x \in E} p_X(x) d x. \end{equation} % This function can be said to reflect a \emph{belief} about the value % of $X$. The more localized $p_X(x)$ is around a value $x_0$, the % stronger our belief is the value is $x_0$. Since events are subsets, we can compute with them using the operations of union $\cup$ and intersection $\cap$. The joint probability of two events is defined as $P(A,B) = P(A \cap B)$. A conditional probability is defined as $P(A|B) = P(A \cap B) / P(B)$. Two events are said to be independent if $P(A \cap B) = P(A) P(B)$. Two events are said to be mutually exclusive if $P(A \cap B) = 0$, while at least one of $P(A)$ and $P(B)$ does not vanish. Since intersection commutes, this enables us to derive Bayes' rule \begin{equation} P(A|B)P(B) = P(B|A)P(A). \end{equation} In practice, we usually encounter several variables $X_1, \dots, X_N$ that are related. These will be represented by a \emph{joint PDF}, a multivariate function $p_{X_1 \ldots X_N}(x_1, \ldots, x_N) \geq 0$ satisfying \begin{equation} \int_{\RR^N} p_{X_1 \ldots X_N}(x_1,\dots,x_N) d x_1 \ldots d x_N = 1. \end{equation} For a bivariate distribution $p_{XY}(x,y)$ we can say the variables $X$ and $Y$ are independent if $p_{XY}(x,y)=p_X(x)p_Y(y)$. In other words, the PDF is separable. In practice it is easier to disregard the normalization factor. This is possible since most formulas are linear in $p$. It is assumed understood that normalization needs to be performed whenever one integrates over a PDF. We will use the symbol $\sim$ to indicate proportionality up to normalization.

An important concept is the \emph{expectation} operator, which returns the expected value of a function of a stochastic variable. It is defined as \begin{equation} E\{f(X)\} = \int_\RR f(x) p_X(x) d x. \end{equation} This is a linear operator. Usually one talks not directly about $p_X(x)$ but about \emph{moments}. These are the expectation values of monomials in $X$. We call $m_X = E\{X\}$ the \emph{mean}. It is the center of mass of a distribution. We call $\sigma_X^2 = E\{(X - m_X)^2\} = E\{X^2\} - m_X^2$ the \emph{variance}. The square root of the variance is called the \emph{standard deviation}, and is a measure for the spread of mass around the center.

For an event, we can define the \emph{information content} $I$, also called \emph{surprisal} or \emph{self information}. It expresses the amount of information that knowledge about an event adds to one's overall knowledge. It can be seen as a measure for the reduction in uncertainty. It depends only on the probability of an event. Events with low probability carry a lot of self--information, while events with high probability carry little self--information. It is monotonic: $P(A) > P(B) \implies I(A) < I(B)$, and for independent events $A$ and $B$ it is additive: $I(A) + I(B) = I(A \cap B)$. This gives us $I(A) = -\log(P(A))$. A measure often used is the expected value of the self--information, called the \emph{entropy} of $X$, and is defined as \begin{equation} H(A) = E\{I(A)\} = - \int_{A} p_X(x) \log p_X(x) d x. \end{equation} If the event is taken to be the entire sample space, we talk about the source entropy, which is the average amount of information an observation of $X$ brings. What does it mean for a distribution to have maximal entropy? % Entropy is related to physical systems in thermal equilibrium, where % it is maximal. It is prominent in information theory.
In short, % given a choice over a family of distributions, the one with maximal % entropy is the desirable one. More about this later. % Most physical systems tend toward a configuration of maximum entropy % over time. For a class of PDFs, the one with the highest entropy is % the one with the least amount of prior knowledge. Among all distributions with a given mean and variance, the \emph{normal distribution} is the one with maximal entropy. It is given by \begin{equation} N(x ; m_X, \sigma_X) = \frac{1}{\sqrt{2\pi\sigma_X^2}} \exp\left(-\frac{(x-m_X)^2}{2\sigma_X^2}\right). \end{equation} The multivariate version of this is $\det(2\pi A)^{-\frac{1}{2}} \exp (-\frac{1}{2} (x-b)^T A^{-1} (x-b))$, with $b$ the vector of means and $A$ the variance--covariance matrix.

Adding two stochastic variables $X$ and $Y$ produces a third stochastic variable $Z = X+Y$. Written in terms of the distributions $p_X$, $p_Y$ and $p_Z$, and the joint distribution of $X$ and $Y$ written as the bivariate function $p_{XY}$, we can compute $p_Z$ by observing that $p_Z(z)$ collects the contributions of the joint density $p_{XY}(x,y)$ over all points for which $x + y = z$. This gives $$p_Z(z) = \int_{\RR}p_{XY}(z-y,y) d y.$$ When $X$ and $Y$ are independent, their joint distribution is separable, $p_{XY} = p_X p_Y$, and the above formula reduces to a convolution. This enables us to compute the PDF in a transform domain. For this we define the characteristic function of a variable $X$ as \begin{equation} \phi_X(t) = E\{e^{itX}\} = \int_\RR p(x) e^{itx} d x \end{equation} which is the inverse Fourier transform of $p(x)$, interpreting $p(x)$ as a spectrum. The distribution of a linear combination of independent random variables $S = \sum a_i X_i$ can be obtained from the relation $\phi_S(t) = \prod \phi_{X_i}(a_i t)$. For arbitrary many--to--many functions $y = f(x)$, we can write $p_Y(y)$ as an integral of $p_X$ over the level sets $f(x) = y$. For an invertible $f$ this reduces to $p_Y(y) = p_X(f^{-1}(y)) \left| \frac{d f^{-1}(y)}{d y} \right|$.

A word on \emph{interpretation}. The practical applicability of probabilistic methods to real world situations depends on the stance one takes. The \emph{frequentist} approach deals with probabilities as relative frequencies of occurrence or relative descriptions of populations. The \emph{Bayesian} approach deals with probability as a model for \emph{belief}, and is very much related to information. Of course this deals only with the applicability of probability theory, since its development is axiomatic, and independent of interpretation.

\subsection{Parameter estimation} The problem of estimation is defined as follows. We are interested in determining the value of some unobservable parameter $\theta$, given observed data $x_0$. To this end we will use an estimator $\hat{\theta}(x_0)$ that maps the data to an estimate. If we have information about the statistics of $x$, we can judge the value of the estimator by quantities like \emph{bias}, defined as $E\{\theta - \hat{\theta}\}$, and \emph{mean square error}, defined as $E\{(\theta-\hat{\theta})^2\}$. An often used estimator is the \emph{maximum likelihood estimator}. Here $\theta$ is a parameter of a probability density function $p(x;\theta)$. We suppose $x_0$ is drawn from $p$. A means to estimate $\theta$ is to maximize $L(\theta) = p(x_0;\theta)$, which is called the likelihood function. The estimator is then $\hat{\theta}(x_0) = \arg\max_\theta p(x_0;\theta)$. Note that this usually is a biased estimator.
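To make the bias remark concrete, here is a minimal numerical sketch (assuming numpy is available; the true parameters, sample size and number of trials are made-up values): the maximum likelihood variance estimate of a Gaussian is biased by a factor $(N-1)/N$.

\begin{verbatim}
# Minimal sketch: ML estimation of Gaussian parameters, illustrating bias.
# Assumes numpy; parameter values below are made-up examples.
import numpy as np

rng = np.random.default_rng(0)
theta_mean, theta_var = 2.0, 3.0     # true (unobservable) parameters
N, trials = 10, 20000

est_var = np.empty(trials)
for t in range(trials):
    x = rng.normal(theta_mean, np.sqrt(theta_var), N)
    # ML estimator for a Gaussian: sample mean and the 1/N variance
    est_var[t] = np.mean((x - np.mean(x))**2)

# The bias E{theta - theta_hat} is approximated by averaging over trials;
# the ML variance estimate is biased by a factor (N-1)/N.
print("mean ML variance estimate:", est_var.mean())
print("expected value (N-1)/N * var:", (N - 1) / N * theta_var)
\end{verbatim}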
An important concept is \emph{Fisher information}. It is thought of as the amount of information that an observable random variable $X$ carries about an unobservable parameter $\theta$ upon which the probability distribution of $X$ depends. Using the symbol $;$ to distinguish variables from parameters, we write a parametrized distribution for the variable $X$ as $p_X(x;\theta)$. This is a function of the sample $x$ and the unobservable parameter $\theta$. First we define the \emph{score} as ${d \over d \theta} \log p(x ; \theta)$, with $L(\theta) = p(x;\theta)$ the likelihood function. The Fisher information is then defined as the variance of the score. Since the expectation value of the score is zero, the Fisher information can be written as $$I(\theta) = E\{\left[{d \over d \theta} \log p(x ; \theta)\right]^2 \}$$ % alternative form: in terms of second derivative % TODO: think about this some more.. % it's not so hard, just need to make small leap when awake.

Some problems might be \emph{ill--conditioned}. This is typical for estimators that try to recover information going against the ``arrow of time'', or opposite to the normal evolution of a physical system. Problems of this class have a large sensitivity to measurement errors. % Note: we can only talk about estimators, but for some simple % problems it is clear what the best estimator is, so the properties % of that estimator are said to be properties of the problem. This is usually quantified by means of the condition number, which reflects the ratio of relative errors in the estimate to relative errors in the data. We write $$k(\hat\theta) = \max\{ {\left|\hat\theta(x) - \hat\theta(x_0) \right| \over \left| \hat\theta(x) \right|} / {\left| x - x_0 \right| \over \left| x \right|} : |x-x_0| < \epsilon \},$$ with $\epsilon$ defining a small neighbourhood around $x_0$. For a differentiable estimator we can let $\epsilon \to 0$ and write everything in terms of derivatives $$k(\hat\theta) = {\left| \hat\theta'(x) \right| \over \left| \hat\theta(x)\right|} \left|x\right|.$$ An important special case is the linear problem $A\theta=x$, for which $\hat\theta(x)=A^{-1}x$. Rearranging the formula above we get $$k(\hat\theta) = \max\{ {\left|A^{-1}(x-x_0)\right| \over \left|x-x_0\right|} / {\left|A^{-1}x\right| \over \left|x\right|} : |x-x_0| < \epsilon \}.$$ Over all $x \neq 0$, the maximum of $\|Ax\|/\|x\|$ is the largest singular value of $A$, and the minimum is the smallest singular value. The singular values of $A^{-1}$ are the inverses of the singular values of $A$. With $\|A\|$ denoting the operator norm, $k(\hat\theta) = \|A^{-1}\| \|A\| = \sigma_\text{max}(A) / \sigma_\text{min}(A)$, which reduces to $|\lambda_\text{max}(A) / \lambda_\text{min}(A)|$ for normal matrices. % In some cases it might be interesting to use a different matrix norm % to define condition number. % canonical example: poisson distribution % TODO: entropy, bias, MMSE. other estimators. fisher. CRB. % and bayes.

\subsection{Monte Carlo} In cases where it is impractical to obtain theoretical bounds on an estimate, one can perform numerical simulations to estimate the bias and variance of an estimator. This is usually done using Monte Carlo simulation. MC methods are usually employed for evaluating an integral $$\int_{x \in D} f(x)dx$$ when the dimensionality of $D$ is so large that standard numerical methods of integration, which have a complexity $O(\exp(\dim D))$, cannot be employed in practice. As long as $f$ is well behaved, we can relate it to a stochastic sampling problem.
This is due to the central limit theorem, which talks about the sum of independent variables $S_N = \sum_{i=1}^N X_i$, where the $X_i$ are independent variables taken from the same probability distribution with mean $m_X$ and variance $\sigma_X^2$. The theorem states that the distribution of $S_N$ converges to a normal distribution with mean $Nm_X$ and variance $N\sigma_X^2$. This means that the expected value of $S_N/N$ is $m_X$, and the standard deviation of $S_N/N$ is $\sigma_X / \sqrt{N}$. We can use this to evaluate the integral above. By selecting random points uniformly in $D$ and averaging the function values, the result converges to the average of $f$ over $D$, and hence, after scaling by the volume of $D$, to the value of the integral, though the central limit theorem does not by itself provide a convergence rate. There are several tricks to speed up the process which all work by influencing the sampling so important regions in $D$ are sampled more densely. Of course, to know where to sample is to solve the integration problem. One of those methods is \emph{importance sampling}. It works by implementing a part of $f$ as the distribution $g$ of a random number generator such that $$\int_D f(x)dx = \int_D {f(x) \over g(x)} g(x) dx.$$ The integral is then evaluated as $1/N \sum_i f(x_i)/g(x_i)$, with the $x_i$ drawn from $g$. The best choice is $g(x) \sim | f(x) |$, which is of course impractical, since normalizing it requires integrating $|f|$ and is thus as difficult as the original integration. % FIXME!! If the variance of $g(x)$ is smaller than the uniform % distribution the method will converge faster. Metropolis-Hastings % algorithm quasi-MC: low-discrepancy sequences. interesting :)

\subsection{Stochastic processes} % http://en.wikipedia.org/wiki/Stochastic_process A stochastic process is a stochastic function, i.e. a stochastic variable with values in a function space. This is NOT the same as a parametrized PDF! % There are some relations though.

\subsection{Exponential modeling} % % aka harmonic retrieval -- from papy thesis chap 2 Given a sequence of observations $x_n$, we are asked to construct a model $$x_n \approx \hat{x}_n = \sum_{k=1}^L c_k z_k^n$$ that approximates $x_n$ in some optimal way. We assume the measurements and model parameters live in $\C$. Given the $z_k$, this problem is linear in the $c_k$, so it is safe to concentrate on the exponential sequences $z_k^n$. The most useful property of such sequences is that they are eigenfunctions of the shift operator $Z\{x_n\} = x_{n+1}$, or $$Z\{z_k^n\} = z_k^{n+1} = z_k z_k^n.$$ This might look entirely trivial, but it is this fact that is the basis of most of the tools used in sinusoidal estimation. We introduce the polynomial $a_1(z) = 1 - z / z_1$. If we apply it to the operator $Z$ we get $a_1(Z)\{z_1^n\} = 0$. In other words, the sequence $z_1^n$ is an element of the null space of the operator $a_1(Z)$. This means $a_1(Z)$ can ``filter out'' the sequence $z_1^n$. This is true for any $cz_1^n$, $c \in \C$, since $a_1(Z)$ is a linear operator. Similarly, we can define $a_2(z)$. An important property of polynomials in $Z$ is that they commute. This means applying first the operation $a_1(Z)$ and then the operation $a_2(Z)$ is the same as applying them in reverse order. When we apply $a_1(Z)a_2(Z)$ to $c_1z_1^n + c_2z_2^n$ we again get zero. So, for the entire model we can construct the polynomial $$a(z) = \prod_{k=1}^L (1 - z/z_k) = \sum_{k=0}^L a_k z^k$$ which has the property $a(Z)\{\hat{x}_n\} = 0$. This is called a linear prediction equation.
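To make the annihilation property $a(Z)\{\hat{x}_n\} = 0$ concrete, here is a minimal numerical sketch (assuming numpy; the poles $z_k$ and amplitudes $c_k$ below are made-up example values):

\begin{verbatim}
# Minimal check: a(z) = prod_k (1 - z/z_k), applied to the shift operator
# Z{x_n} = x_{n+1}, annihilates any signal x_n = sum_k c_k z_k^n.
# Assumes numpy; poles and amplitudes are hypothetical examples.
import numpy as np
from functools import reduce

z = np.array([0.9 * np.exp(0.3j), 0.7 * np.exp(-1.1j)])  # example poles z_k
c = np.array([1.0 + 0.5j, 2.0 - 1.0j])                   # example amplitudes c_k

n = np.arange(50)
x = (c[None, :] * z[None, :] ** n[:, None]).sum(axis=1)  # x_n = sum_k c_k z_k^n

# coefficients of a(z) in ascending powers, with a_0 = 1
a = reduce(np.convolve, [np.array([1.0, -1.0 / zk]) for zk in z])

# a(Z){x}_n = sum_j a_j x_{n+j}; should vanish up to rounding errors
L = len(a) - 1
residual = np.array([np.dot(a, x[m:m + L + 1]) for m in range(len(x) - L)])
print(np.max(np.abs(residual)))   # close to machine precision
\end{verbatim}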
Noting that the coefficient $a_0 = 1$, we can see this by rewriting the equation above explicitly in terms of the sequence $\hat{x}_n$ and its delayed versions $$\hat{x}_n = -\sum_{k=1}^{L}a_k\hat{x}_{n-k}.$$ The polynomials we defined above can be seen as the frequency domain representation of a signal. Representing any signal $x_n$ by its generating function, also called its Z--transform $x(z) = \sum_k x_k z^k$, allows us to manipulate shift and scale operations on the sequence $x_n$ in terms of multiplication of $x(z)$ by a polynomial. The only thing we need to do now is to make the approximation error explicit. We can do this by putting it in a statistical framework, devising some optimality criterion for the error, and solving the problem. Or we can work the other way, by inventing some estimator with good properties (i.e. a fast algorithm) and analyzing its statistical properties to compare with other methods. % SUBSPACE: ESPRIT (covariance EVD) LS/TLS % HSVD/HTLS (data matrix, 'square root' SVD) LS/TLS % \subsection{Entropy}

\subsection{The Kalman filter} Suppose we have a linear system which behaves like this \begin{equation} \matrix{x_k \\ y_k} = \matrix{A & B \\ C & D} \matrix{x_{k-1} \\ u_k} + \matrix{w_k \\ v_k}, \end{equation} where $x_k$ is the current state, $y_k$ the output, $u_k$ the observable input, and $w_k$ and $v_k$ are noise processes with covariance $W$ and $V$ influencing the state update and observation respectively. What we are interested in is to find an estimate for $x_k$ and its estimate covariance $X_k$. The filter operates in essentially four steps. First a prediction is made using information from the previous step. The second step is an observation. The third step is the calculation of the prediction error (the innovation), and the last step is the update. The error will be used to update the prediction, but only in proportion to a gain factor which depends on ``uncertainty''. This prediction--correction cycle is essential to its workings and indicates the combination of prediction and observation. To not overload the notation we keep the state matrix and noise matrices constant. We will set $D = 0$ for simplicity. The prediction steps are reasonably straightforward \begin{equation} \begin{array}{l} \hat{x}_{k|k-1} = A \hat{x}_{k-1|k-1} + B u_k \\ X_{k|k-1} = A X_{k-1|k-1} A^T + W. \end{array} \end{equation} During this step, the system is followed as if it was deterministic, which is expressed in the first equation. The second equation computes the estimated a priori error covariance $E[(x-\hat{x})(x-\hat{x})^T]$ as a combination of attenuation of the previous estimate by the dynamics $A$ of the system and injection of new uncertainty due to the random evolution of $w_k$, expressed by $W$. % These are just added together because of the assumption they are independent. We use the estimates we just computed to estimate the output of the system in much the same way, but subtract it from the observed output $y_k$, which gives the \emph{innovation} and an estimate of its statistics \begin{equation} \begin{array}{l} \hat{e}_k = y_k - C \hat{x}_{k|k-1}\\ E_k = C X_{k|k-1} C^T + V. \end{array} \end{equation} where we take again the same approach as for the state update. The first equation computes the error using a prediction for the output based on our prediction of the state. The second equation computes an uncertainty estimate for this prediction, which is the same as the variance of the innovation: the innovation equals $C(x_k - \hat{x}_{k|k-1}) + v_k$, and the state estimation error and the measurement noise are assumed to be independent.
Now we have some authoritative information about how we're doing so far, in the form of a prediction error and an estimate of its variance. This allows us to compute the Kalman gain $K_k$ which is used in the update step \begin{equation} \begin{array}{l} K_k = X_{k|k-1} C^T E_k^{-1} \\ \hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \hat{e}_k \\ X_{k|k} = (I - K_k C) X_{k|k-1}. \end{array} \end{equation} This deserves some explanation. If $\hat{e}_k$ is small, we're on track, and the state estimate is left where it is. The biggest mystery is in the $K_k$ of course. It is obtained by minimizing the a--posteriori error covariance $X_{k|k}$. TODO: geometric interpretation.

Entry: Numeric Date: Fri Jun 19 11:17:04 CEST 2009 Type: tex When we are faced with solving problems involving real numbers, we are bound to resort to approximate methods, because of the simple fact that (digital) computers can store and manipulate only a finite amount of information, and most real numbers can only be represented by an infinite string of bits. So what we do in practice is to represent real numbers by rational ones, and keep an eye on how approximation errors propagate through a system. An often used representation is the floating point number, where $r \approx n 2^m$, with $m$ and $n$ integers over a limited range. This gives an almost constant relative precision over a large dynamic range. This leads to the concept of \emph{machine precision}, which we denote as $\epsilon$. % well--posed problems resemble physical problems. reverse problems % are usually ill--posed. this can be seen in the light of 'going % against the arrow of entropy'. information is lost in a dissipative % system, what a reverse problem tries to do is to re-construct this % information, which leads to a very high sensitivity to measurement % errors. see http://en.wikipedia.org/wiki/Well-posed_problem If a % problem is well-posed, then it stands a good chance of solution on a % computer using a stable algorithm. If it is not well-posed, it needs % to be re-formulated for numerical treatment. Typically this involves % including additional assumptions, such as smoothness of % solution. This process is known as regularization. % condition number An important concept regarding numerical properties for matrices is the condition number, $k(A) = \|A^{-1}\| \|A\|$. % forward / backward stability

Entry: Projective Geometry Date: Fri Jun 19 11:18:39 CEST 2009 Type: tex For a regular square $N \times N$ matrix $A$, and $N \times 1$ vectors $x$ and $b$, the equation $Ax = b$ is satisfied by a unique point $x = x_0$. In case $A$ has rank $N-1$, we can find a vector $x$ such that $Ax = 0$. If $b$ lies in the column space of $A$, we can find some other vector $x_0$ such that $Ax_0 = b$. The general solution is of the form $x_0 + \lambda x$, with $\lambda$ an arbitrary number. For $A$ of rank $N-k$ we can do similar things, yielding a $k$--dimensional space of solutions. The solution spaces are called affine spaces. They are \emph{shifted} vector spaces. These define the geometric entities of points, lines, planes, volumes, etc\ldots The observation to make here is that these sets are not vector spaces themselves. A vector space can act on them as a symmetry group (translation) but they are cumbersome to represent, due to the presence of $b$. What if we dispense with $b$? A way to do so without changing too much about the solution space is to construct the set of equations $(A - bI)x = 0$.
This leaves the solution space intact, but brings the point $x_0$ to the origin of a vector space again. So we can safely change notation and just drop $b$ altogether, writing $Ax = 0$ to characterise a space of solutions. What can we say about this space? If some $x_0$ satisfies $Ax_0 = 0$, we know that $\lambda x_0$ also satisfies this equation. There is not much difference between the two, so we can just as well identify this 1--dimensional space with one and the same thing, defining an equivalence class identifying all points on one line through the origin, leaving their only distinguishing quality to be their \emph{direction}. % collection = set of sets This construction is called the projective space of dimension $N-1$ derived from the vector space of dimension $N$, as a collection of 1--dimensional subspaces. It can be visualized in different ways. We can think about them as (light) rays originating from the origin, or dually as light rays starting at some point in space and arriving at the origin. We can reduce these one--dimensional rays to zero--dimensional points by intersecting them with some part of $N$--space. Two common subspaces are the unit sphere, which gives two intersection points with a ray that have to be identified, and a plane at unit distance from the origin, which gives one intersection point, but does not include the point/line/\ldots at infinity. Projective space can be seen as the extension of Euclidean space by ``joining ends''. To see what this looks like, we take two easily visualizable examples. First is the projective line, constructed from 1--dimensional subspaces of the plane. This can be identified as a circle, with opposite points identified, which remains a circle. The alternative view is the line, together with the point at infinity. Second is the projective plane, which is again a sphere with opposite points identified. It can be seen as an extension of the Euclidean plane with a line at infinity. % What are transformations in projective space? Things taking spaces % to spaces, instead of points to points. In this way, a rotation in % $N+1$ space can be seen as a translation in the projective space. % conics % Actually, i should add a bit about the Erlangen programme. The links % between different kinds of geometry and some simple matrix algebra % consequences. The only 2 i can see that are important at this moment % are euclidian and projective. hyperbolic maybe to talk about FLT. % And some things about Lie groups & all. And what kind of geometry % are triangular matrices? % notation: pqbd % $\matrix{p & q \\ b & d}$ All this is better seen in terms of Lie groups. The Euclidean group $E(n)$ can be represented by a subgroup of $GL(n+1)$, as $\matrix{R & x \\ 0 & 1}$, where $R$ is an orthogonal matrix, and $x$ a translation vector. The affine group is represented similarly as $\matrix{A & b \\ 0 & 1}$, which represents all affine transforms $x \to Ax+b$; restricting $A$ to Lorentz transformations gives the Poincar\'{e} group in the same form.

Entry: Interpolation Date: Fri Jun 19 11:19:54 CEST 2009 Type: tex Some notes about interpolation. This is basically resampling. Shannon, Nyquist, filtering and polynomial interpolation. We present 2 interpolation methods: polyphase filtering and polynomial approximation. Because we use arbitrary resampling, the standard techniques for sample rate conversion, where the ratio between original and resulting sample rate is a rational number, cannot be employed directly.
% see jos interpolation presentation for an overview % linear % allpass (only for fractional delay, because of IIR) % sinc % windowed sinc % lagrange % farrow: incremental update % thirian allpass % least squares filter design % also something odd: VT interpolation is called time-varying.. % indeed, the FIR mask changes each time, but taken as a whole it is % interpolation + resampling, is that really so? isn't it better to % view this as time-invariant, as a convolution over the whole signal? Arbitrary resampling can be understood as a 2 stage process. We first construct a continuous time signal \begin{equation} s(t) = \sum_n s_n h(t-n), \label{ctinterpol} \end{equation} using an interpolation kernel $h(t)$. This is the same as convolving $\sum_n s_n \delta(t-n)$ with $h(t)$. The effects of interpolation can be studied by examining the Fourier transform of $h(t)$. For practical purposes, we require $h(t)$ to have compact support. Causality of $h(t)$ is not really an issue in our scheme, since all the elements of $s_n$ are known in advance. \def\emax{\epsilon_{\text{max}}} \def\emin{\epsilon_{\text{min}}} Interpolating a discrete sequence $s_n$ reduces to evaluating (\ref{ctinterpol}) at a point $t'$. We set $t' = t_0' + \epsilon$, with $t_0'$ an integer or half-integer reference point we will use to preserve symmetry in our notation. $\epsilon$ spans a range of $1$, or $\emin \leq \epsilon < \emax$ and $\emax - \emin = 1$. This gives us \begin{equation} s(t_0' + \epsilon) = \sum_n s_n h(t_0' + \epsilon - n) \label{dtinterpol} \end{equation} which we will simplify using the following change of variables: $x_i = i - N/2$, $y_i = s_{t_0' + x_i}$ and $f(t) = s(t_0' + t)$. This turns (\ref{dtinterpol}) into \begin{equation} f(\epsilon) = \sum_i y_i h(\epsilon - x_i). \label{simplepol} \end{equation} We require that $h(t)$ only assumes values other than zero in the interval $-N/2 + \epsilon_{\text{min}} \leq t < N/2 + \epsilon_{\text{max}}$. This means we can limit the sum over $x_i$ to $-N/2 \leq x_i \leq N/2$, which means the index $i$ in (\ref{simplepol}) runs from $0$ to $N$. The neighbourhood grid $X$ is then $(-N/2, \ldots, N/2)$, and we can use the following matrix representation \begin{equation} f(\epsilon) = \left[ \begin{array}{c} h(\epsilon - x_0) \\ h(\epsilon - x_1) \\ \vdots \\ h(\epsilon - x_N) \\ \end{array} \right]^T \left[ \begin{array}{c} y_0 \\ y_1 \\ \vdots \\ y_N \\ \end{array} \right] \label{mdtinterpol} \end{equation} We will abbreviate (\ref{mdtinterpol}) as $f(\epsilon) = H(\epsilon)^T Y$. $H(\epsilon)$ is a function mapping the fractional displacement $\epsilon$ relative to $t_0'$ to a weighting function for the neighbourhood values $Y$. It can be understood as an infinitely large polyphase filterbank, with each filter implementing a fractional delay $\epsilon$. % This is actually 'polyphase filtering'

\subsection{Polyphase filtering} In the case where $\epsilon$ takes on a finite number of rational values $m/M$, with $m,M \in \mathbb{N}$, we can use FIR filter design techniques to determine $h(t)$, which then has the form \begin{equation} h(t) = \sum_m h_m \delta(t-m/M), \label{ppinterpol} \end{equation} with $\delta(t)$ the Dirac impulse function. The number of nonzero coefficients $h_m$ is taken to be $MN$. This is the standard approach in sample rate conversion where the ratio of 2 sample rates is a rational number $L/M$.
The filter coefficients $h_m$ are designed to interpolate the $L$ times oversampled signal in such a way that subsequent decimation by $M$ does not introduce significant aliasing artifacts. This can be done using any FIR filter design method. Because the number of taps $MN$ is rather large, one often resorts to a `windowed sinc' design method, where the filter is obtained by multiplying a function with compact support (the window) with the ideal low-pass filter which has a $\sin(t)/t$ form. By choosing different windowing functions and moving the cutoff frequency of the ideal lowpass filter, aliasing vs. bandwidth tradeoffs can be made. Note that (\ref{ppinterpol}) is still discrete. To get to a continuous function we can evaluate, we need to refine $h(t)$ further. We can do this using polynomial interpolation, which is treated below. If $M$ is large enough, low order interpolation is sufficient. In the case where we approximate $\epsilon$ by the nearest $m/M$, (\ref{ppinterpol}) is convolved with a block impulse, making the resulting $h(t)$ piecewise constant. For linear interpolation, it is convolved with a hat function, making $h(t)$ piecewise linear.

\subsection{Polynomial interpolation} An alternative to polyphase filtering for determining $h(t)$ is to use polynomial interpolation only. In this case, interpolation is performed by evaluating an $N$th order polynomial through $N+1$ points. % $C(\epsilon)$ is usually computed using the method of Lagrange interpolation. % Using this method, the function $c(t)$ can be directly computed. % figure out how that $c(t)$ is actually constructed, and see what % is wrong with my interpretation here.. % 'generalized sinc' % equivalent to sinc + binomial window In our scheme we have \begin{equation} f(\epsilon) = \sum_i a_i \epsilon^i \end{equation} with $a_i$ the coefficients of the interpolating polynomial. With $E = (1, \epsilon, \ldots, \epsilon^N)$ we can write this as $f(\epsilon) = A^T E$. This polynomial satisfies the following set of equations \begin{equation} f(x_n) = \sum_i a_i x_n^i = y_n. \end{equation} Using the Vandermonde matrix of the neighbourhood coordinates $X$ \begin{equation} V = \left[ \begin{array}{cccc} 1 & x_0 & \dots & x_0^N \\ 1 & x_1 & \dots & x_1^N \\ \vdots & \vdots & & \vdots \\ 1 & x_N & \dots & x_N^N \\ \end{array} \right], \end{equation} we can rewrite this set of equations as $VA=Y$ or $A = V^{-1}Y$. This gives $f(\epsilon) = E^T V^{-1} Y$. We move on to determining $t_0'$ and the range for $\epsilon$ which will enable us to obtain an expression for $h(t)$. Setting $W = V^{-1}$, equating $E^TWY$ with (\ref{mdtinterpol}) and requiring it to be valid for all $Y$ and $\emin \leq \epsilon < \emax$ gives \begin{equation} H(\epsilon) = W^T \left[ \begin{array}{c} 1 \\ \epsilon \\ \vdots \\ \epsilon^N \\ \end{array} \right] = W^T E(\epsilon). \label{pwpoly} \end{equation} Column $i$ of $W$ contains the coefficients of the polynomial defining a segment of $h(t)$ in the neighbourhood of $x_i$, which makes $h(t)$ piecewise polynomial, i.e.\ a spline. The segments are the center intervals of the Lagrange base polynomials $l_k$. More explicitly for segment $k$ \begin{equation} h(\epsilon - x_k) = l_k(\epsilon) = \prod_{l \neq k} \frac{\epsilon - x_l}{x_k - x_l} \end{equation} for $\emin \leq \epsilon < \emax$. An important property of $h(t)$ is continuity. This constraint translates to requiring $l_k(\emin) = l_{k+1}(\emax)$, for $k$ from $0$ to $N-1$.
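A minimal numerical sketch of this construction (assuming numpy; $N$, the neighbourhood values and $\epsilon$ below are example values): it checks that the weights $H(\epsilon) = W^T E(\epsilon)$ coincide with the Lagrange base polynomials $l_k(\epsilon)$.

\begin{verbatim}
# Minimal sketch: polynomial interpolation weights from the Vandermonde
# inverse, compared against the Lagrange basis. Assumes numpy; the grid,
# order and epsilon are example values.
import numpy as np

N = 3                                     # cubic: N+1 = 4 neighbour samples
X = np.arange(N + 1) - N / 2              # neighbourhood grid (-1.5,...,1.5)
V = np.vander(X, N + 1, increasing=True)  # V[n,i] = x_n^i
W = np.linalg.inv(V)

def H(eps):
    E = eps ** np.arange(N + 1)           # E(eps) = (1, eps, ..., eps^N)
    return W.T @ E

def lagrange(eps):
    return np.array([np.prod([(eps - xl) / (xk - xl)
                              for xl in X if xl != xk]) for xk in X])

eps = 0.2
print(np.allclose(H(eps), lagrange(eps)))   # True
y = np.array([0.1, 0.7, 0.5, -0.3])         # example neighbourhood values Y
print(H(eps) @ y)                            # interpolated value f(eps)
\end{verbatim}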
As stated above, to preserve symmetry around $0$ in $X$, which yields symmetric coefficients in $W$ useful to simplify implementation, we choose $t_0'$ integer when $N$ is even, and half integer when $N$ is odd. The range of $\epsilon$ is best chosen to be as close to $0$ as possible, in the middle of $X$, since this reduces the interpolation error. This leads us to the following choices. When $N$ is odd, we take $t_0'$ to be half integer, and set $(\emin, \emax) = (-1/2, 1/2)$. This leads to a symmetric $h(t)$. When $N$ is even, we have two optimal choices, $(\emin, \emax) = (-1,0)$ or $(0,1)$. This will yield an asymmetric $h(t)$. The splines $h(t-n)$ map the signal $s_n$ to a space of piecewise polynomials of degree up to $N$. The spectrum of $h(t)$, denoted by $H(\omega)$, is maximally flat, meaning $\frac{d^n}{d\omega^n}H(\omega) = 0$ at $\omega = 0$ for $1 \leq n \leq N$. Note that $h(t)$ is not necessarily smooth, only continuous.

\subsection{Aliasing} The interpolation kernel $h(t)$ mentioned above must be designed in a way that minimizes aliasing. In order to do this, we must know the rate at which (\ref{ctinterpol}) is sampled before we can design an optimal kernel. The sampling rate in our application is highly variable, and the approaches we presented above result in a fixed $h(t)$, where we would like this to depend on the resampling ratio. Since we do not have such a technique at our disposal, we need to find a workaround.

\subsubsection{Multiple $h(t)$} In the case where we use a finite polyphase filterbank optionally followed by low order polynomial interpolation, it is possible to design several $h(t)$ directly, each with a different cutoff frequency, and choose the most appropriate one based on $r$. If the spacing of the different interpolators is large (i.e.\ one $h(t)$ per octave), linear interpolation between neighbouring $h(t)$ might be necessary.

\subsubsection{Multiresolution} Another approach is multi stage resampling. This is often applied in fixed rate sample rate converters, where it has the interesting property of reducing the total number of filter coefficients, and so increases efficiency. This can be done efficiently when $N$ is a composite number. Since we will mostly use $N$ to be a power of $2$, our resampling algorithm for a ratio $r = Nf_n$ reduces to a sequence of $m$ resampling operations with a decimation factor of $2$, with the final step implementing the ratio $r/2^m < 1$, so it does not introduce aliasing. One possible framework for multiresolution is the Discrete Wavelet Transform (DWT).

Entry: Sequences and Polynomials Date: Fri Jun 19 11:20:07 CEST 2009 Type: tex To simplify notation we define the Z--transform for a sequence $s_n$ to be its generating function $s(z) = \sum_n s_n z^n$, which maps causal FIR filters to polynomials. This is in contrast to the engineering convention which defines it in terms of $z^{-n}$, so all statements about poles and zeros of transfer functions should be interpreted in this respect. With causal signals we mean $s_n = 0$ for $n<0$. For a non--causal signal, the Z--transform is the Laurent series, which is assumed to converge on the unit circle. This means the generating function has no poles on the unit circle. The Fourier series is defined as $c_n = r^n s_n$, with $f(\omega) = \sum_n c_n e^{jn\omega} = s(r e^{j \omega})$, and $r$ in the annulus of convergence. In most cases $s(z)$ is bounded on the unit circle and we can set $r=1$.
The bivariate version of the Z--transform relates semi--infinite matrices to a bivariate complex function $m(x,y) = \sum_{i,j=0}^{\infty} m_{ij} x^i \overline{y}^j$. We call $m(x,y)$ the generating function of the matrix $[m_{ij}]$. Below we will elaborate on some finite approximations of the Z--transform, where we associate polynomials and matrices to finite sequences.

\subsection{Periodic signals} The Z--transform can be used as a frequency domain representation of infinite length sequences with finite energy. It maps shift invariant operators to multiplication in the transform domain. For periodic signals, the Z--transform does not converge. Similar tools can be constructed however, by using vectors in $\mathbb{C}^N$ to represent a single period of an $N$--periodic signal. The circular down shift operator is then defined as the permutation operator \begin{equation} Z = \left[ \begin{array}{ccccc} 0 & 0 & \hdots & 0 & 1 \\ 1 & 0 & \hdots & 0 & 0 \\ 0 & 1 & \hdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \hdots & 1 & 0 \\ \end{array} \right], \end{equation} which satisfies $Z^0 = Z^N = I$. The powers of the circular down shift operator form an $N$--dimensional basis for circulant $N \times N$ matrices. We can associate an order $N-1$ polynomial $a(z) = \sum_{k=0}^{N-1} a_k z^k$ with a circulant matrix $A = \sum_k a_k Z^k$. The first column of $A$ contains the coefficients of $a(z)$. Define the eigenvalues and eigenvectors of $Z$ by \begin{equation} ZW_k = w_{-k}W_k. \label{shifteig} \end{equation} We have $Z^nW_k = w_{-k}^nW_k$. From $Z^N=I$ we get $w_k^N = 1$, which gives the $N$ solutions $w_k = e^{j2\pi k/N}$ and $W_k = (1, w_k, \ldots, w_k^{N-1})$. We also have $AW_k = a(Z)W_k = a(w_{-k})W_k$, which gives us the eigenvalues of $A$ as the discrete Fourier transform of the coefficients of $a(z)$. This allows us to write the eigenvalue decomposition as \begin{equation} A = FCF^H = \frac{1}{N} \sum_k c_k W_k W_k^H = \sum_k c_k F_k, \label{circeig} \end{equation} with $C$ a diagonal matrix containing the eigenvalues $c_k = a(w_{-k})$ and $F$ containing the normalized orthogonal Fourier basis with Vandermonde columns $W_k / \sqrt{N}$. This means the circulant matrices are similar to diagonal matrices, with the similarity transform equal to the Fourier transform. The diagonal $N \times N$ matrices form a commutative unital ring over $\mathbb{C}$. This ring has a general matrix representation $QDQ^{-1}$, of which the circulant matrices are the special case that can be represented using the basis of shift operators $Z^k$. This case is of special interest to us, since it gives two isomorphic algebras with multiplication defined respectively as circular convolution and componentwise multiplication. % The polynomial multiples of $z^N-1$ form an ideal in \mathbb{C}[x] % Two polynomials $a(z)$ and $b(z)$ are congruent if their difference % is in I, or $a(z)-b(z) = q(z)(z^N-1) The equivalence classes for % this congruence relation are isomorphic to the circulant (and % diagonal) matrices Note that $Z^{k+N} = Z^k$, so the matrix polynomials $a(Z)$ are isomorphic to the factor ring of polynomials modulo $z^N-1$. In this ring the multiplicative unit element $1 = z^N$ corresponds to the $N \times N$ unit matrix $I=Z^N$. The prime ideals $I_k$ are generated by $w_k(z) = 1 - w_kz$ and correspond to matrices with $c_k = 0$. Since $\prod_k w_k(z) = 1 - z^N = 0$, the nonzero elements of the prime ideals are all zero divisors.
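As a quick numerical sanity check of the eigenvalue decomposition above, here is a minimal sketch (assuming numpy and scipy; the coefficients of $a(z)$ are made-up values): the eigenvalues of the circulant matrix $A = a(Z)$ are the DFT of its first column, with the Fourier vectors $W_k$ as eigenvectors.

\begin{verbatim}
# Minimal check: eigenvalues of a circulant matrix = DFT of its first column.
# Assumes numpy and scipy; the coefficients are example values.
import numpy as np
from scipy.linalg import circulant

a = np.array([1.0, 0.5, -0.25, 0.1])   # example coefficients of a(z)
A = circulant(a)                       # first column = coefficients of a(z)
c = np.fft.fft(a)                      # candidate eigenvalues c_k = a(w_{-k})

# each Fourier vector W_k is an eigenvector of A with eigenvalue c_k
N = len(a)
W = np.exp(2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)
print(np.allclose(A @ W, W * c))       # True
\end{verbatim}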
Each rank $1$ circulant Hermitian projection matrix $F_k = \frac{1}{N}W_k W_k^H$ in the eigenvalue decomposition of $A$ corresponds to the polynomial \begin{equation} f_k(z) = \frac{1}{N} \sum_{l=0}^{N-1} (w_k z)^l = \frac{1}{N} \prod_{l \neq k} (1 - w_l z). \end{equation} The matrix $F_k$ projects on the one--dimensional space generated by $W_k$. These matrices satisfy $F_k F_l = \delta_{kl} F_k$. For the polynomials we have $f_k(z) f_l(z) = f_k(z) \delta_{kl}$. The analog of (\ref{circeig}) then becomes \begin{equation} a(z) = \sum_{k=0}^{N-1} c_k f_k(z). \end{equation} If the matrix $A$ is full rank, meaning all $c_k \neq 0$, its inverse exists and is given by $A^{-1} = \sum_k c_k^{-1} F_k = FC^{-1}F^H$. For the polynomials we can write $a(z) = a_0 z^m \prod_n (1-z_nz)$. This has an inverse in the ring of polynomials modulo $z^N-1$ if all of its factors have an inverse. The inverse of the factor $z^m$ is $z^{N-m}$. The inverse of the factor $1 - a z$ exists if $a^N \neq 1$. Using $(az)^N = a^N$, we find it to be \begin{equation} (1-az)^{-1} = \frac{1}{1-a^N}\sum_{k=0}^{N-1} (az)^k. \end{equation} The factor $1/(1-a^N)$ can be understood as the effect of the wrapping of the infinite exponential tail. This makes the ARMA model $b(z)/a(z)$ well defined for periodic signals represented by polynomials modulo $z^N-1$, as long as $a(z)$ does not vanish at the $w_k$. %\footnote{The the stability requirement $|a| < 1$ is not necessary for periodic signals.}. The limit case of $N$ tending to $\infty$ gives us the Z--transform of $a_n = a^n$, which converges for $|az| < 1$. % We could abuse notation and define $(1 - w_kz)^{-1} = f_k(z)$, but % remembering that it is a projection operator, not an inverse. This % makes sense, because $f_k(z) = \frac{1 - z^N}{1 - w_k z} \mod 1 - % z^N$. % As a side note, it might be interesting to look at matrices of the % form $W_2^TDW_1$, with $W_1$ and $W_2$ biorthogonal wavelet bases. % So: shift-invariant operations -> exponentials. What is the equivalent for wavelet bases? Note that the polynomial ring we talked about is over $\mathbb{C}$, which is algebraically closed, so quotients like the ones above, modulo a polynomial of degree larger than one, can never be fields. When we take $\mathbb{R}$ as the starting field, we can construct fields as quotients of $\mathbb{R}[x]$, using maximal ideals generated by irreducible polynomials. For $z^2 + 1$ we get the complex numbers. Since the irreducible polynomials over $\mathbb{R}$ have degree at most two, this is essentially the only new field obtained this way: for even $N > 2$ the polynomial $z^N + 1$ is already reducible over $\mathbb{R}$ (it factors into real quadratics), and the quotient is a product of copies of $\mathbb{C}$ rather than a field. Polynomials modulo $z^2 - 1$, which is reducible, give the ring of split complex numbers. % what is the significance of this?

\subsection{Finite signals} Instead of working with circular shifts and finite periods, we can work with finite signals. Take $Z$ to be the nilpotent $N \times N$ down--shift matrix \begin{equation} Z = \left[ \begin{array}{ccccc} 0 & 0 & \hdots & 0 & 0 \\ 1 & 0 & \hdots & 0 & 0 \\ 0 & 1 & \hdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \hdots & 1 & 0 \\ \end{array} \right], \label{shiftdown} \end{equation} satisfying $Z^N = 0$. Together with $Z^0 = I$, the $Z^k$ form a basis for lower triangular Toeplitz matrices (LTT)\footnote{ which can still be handled by FFT algorithms by embedding them in $(2N-1) \times (2N-1)$ circulant matrices. % Take for example % \begin{equation} % C = left[ % \begin{array} % T 0 % \end{array} % \end{equation} }.
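The embedding trick mentioned in the footnote can be checked numerically. A minimal sketch (assuming numpy and scipy; the coefficients and signal are example values): an LTT matrix-vector product is the first $N$ samples of a linear convolution, so it can be computed with a zero-padded FFT of length $2N-1$.

\begin{verbatim}
# Minimal check: LTT matrix-vector product A x, with A = a(Z) and Z the
# nilpotent down-shift, via a (2N-1)-point circular (FFT) convolution.
# Assumes numpy and scipy; coefficients and signal are example values.
import numpy as np
from scipy.linalg import toeplitz

a = np.array([1.0, -0.8, 0.3, 0.1])    # example coefficients of a(z)
x = np.array([2.0, 1.0, -1.0, 0.5])
N = len(a)

A = toeplitz(a, np.zeros(N))           # LTT: first column a, first row (a_0,0,...)
direct = A @ x

M = 2 * N - 1                          # circulant embedding size
fast = np.fft.ifft(np.fft.fft(a, M) * np.fft.fft(x, M)).real[:N]
print(np.allclose(direct, fast))       # True
\end{verbatim}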
Note that each matrix $M$ which commutes with $Z$ is necessarily LTT, since these are the only matrices for which shifting the rows down ($ZM$) and shifting the columns left ($MZ$) give the same result. With each polynomial of degree smaller than $N$ we can associate an LTT matrix $A = a(Z)$. Relating this to the ring of polynomials over $\mathbb{C}$, the LTT matrices are isomorphic to the quotient ring obtained by the ideal generated by $z^N$, the polynomials truncated to degree $N-1$. In this quotient ring, the zero divisors are given by multiples of $z$, which are nilpotent\footnote{ Note that $a(0)$ is the eigenvalue of $A$ with algebraic multiplicity $N$.}. A polynomial $a(z)$ is invertible if all its factors are invertible. If $a(0) \neq 0$, it is invertible and can be written as $a(z) = a(0)\prod_n (1-z_nz)$. The inverse of $1-az$ is given by \begin{equation} (1-az)^{-1} = \sum_{k=0}^{N-1} (az)^k, \end{equation} which tends to the Z--transform of $a^n$ when $N \to \infty$. % Constructing a basis for any $N \times N$ matrix $M$ can be done in % the following way.

\subsection{Matrix generating function} \def\y{\overline{y}} Now we extend this to two dimensions. The interpretation of the inverse $x^{-1}$ depends on the ring we work in. Suppose this to be the polynomials modulo $z^N$, and associate with matrix $M$ a bivariate function \begin{equation} m(x,y) = \sum_{ij} m_{ij} x^i \y^j = X M Y^H, \end{equation} with row vectors $X = (1,x,\ldots,x^{N-1})$ and $Y = (1,y,\ldots,y^{N-1})$. This associates the term $x^i \y^j$ to the matrix with a $1$ on the $i$th row and $j$th column, counting from zero. Using the notation $Z^{Tj} = (Z^T)^j$, this matrix can be written as \begin{equation} Z^{i} Z^{Tj} - Z^{(i+1)} Z^{T(j+1)} = Z^{i} (I - ZZ^T) Z^{Tj}. \end{equation} When $N \to \infty$ we have the generating function for the semi--infinite matrix $M$. Using this representation, the LTT matrix $A = a(Z)$ corresponds to \begin{equation} a(x,y) = a(x) \sum_{k=0}^{N-1} (x \y)^k = a(x) (1 - x \y)^{-1}, \end{equation} while $A^H$ corresponds to $\overline{a(y)}(1-x\y)^{-1}$. % These relations are tied to the product of Toeplitz matrices. % Furthermore we have $(1-x\y)^{-1} = XY^H$, $x(1-x\y)^{-1} = XZY^H$, % and $\y(1-x\y)^{-1} = XZ^TY^H$. % A seperable matrix $S$ can be associated to a product of two % polynomials $s(x,y) = p(x) q(y)$. % Now take the bivariate polynomial $b(z,z')$ and associate it with % the matrix polynomial $b(Z,Z^T)$. % Now consider the mapping $a(z) \to a(Z) a(Z)^H$. % The Jordan form ... % We can associate with $\overline{a(1/\bar{z})}$ the upper triangular % Toeplitz (UTT) matrix $A^H = \overline{a(Z^T)}$. A positive definite % toeplitz matrix can then be written using its Cholesky factorization % $A = L L^H = l(Z)\overline{l(Z^T)}$. % This brings us to the concept of displacement rank. RICCATI - % BERNOULLI - WIENER/HOPF - KALMAN FILTER ( kailath berkely lecture ) % NOTE: F^2 = \tilde{I} (reversal) displacement rank: sum of cholesky % factorizations, chol = LL^H, with L represented by a polynomial

\subsection{Orthogonal polynomials on the unit circle} In the general case, $s_n$ is a causal sequence in $l_p$, and $s(z) = \sum_{n=0}^{\infty} s_n z^n$ belongs to the Hardy space $H_p$. $r(z) = s(z)\overline{s(1/\bar{z})}$ satisfies $r(e^{j\omega}) = |s(e^{j\omega})|^2 \geq 0$. We can define an inner product for the weighted Hilbert space of polynomials on the unit circle, using the measure $d \mu = r(z)d \omega$ with $z=\exp(j\omega)$.
It is given by \begin{equation} \left< g, h \right> = \frac{1}{2 \pi}\int_{-\pi}^{\pi} \overline{g(z)} h(z) d \mu. \label{polyprod} \end{equation} The time domain equivalent space is the space spanned by shifts of the sequence $s_n$. This means $h_n * s_n$ maps to $h(z)=\sum_n h_n z^n$. By incorporating the effect of $s_n$ in the measure, also called the spectral measure, we can work with the polynomials only. The time domain equivalent of the inner product is \begin{equation} \left< g, h \right> = \sum_n \bar{s}^g_n s^h_n, \label{polyprodtime} \end{equation} with $s^g_n = s_n * g_n$ and $s^h_n = s_n * h_n$ the residuals after filtering $s_n$ with $g_n$ and $h_n$ respectively. This inner product defines a norm $\| h(z) \|^2 = \left< h, h \right>$, which gives the energy of the residual after filtering $s_n$ with $h_n$. In this interpretation, we write the autocorrelation of $s_n$ in terms of inner products of monomials in this Hilbert space; setting $g(z)=z^l$ and $h(z)=z^k$ we get \begin{equation} \left< z^l, z^k \right> = \sum_i \bar{s}_{i-l} s_{i-k} = r_{k-l}, \end{equation} which defines the Gram matrix of the base of monomials $\{ z^k \}$. We can construct an orthogonal base for this Hilbert space, by applying the Gram--Schmidt orthogonalization procedure to the base of monomials. This will solve the least squares linear prediction problem.

To see this, we formulate the effect in the time domain using linear algebra notation. We define the infinite dimensional column vectors $S_n$ to be the signal $s_n$ delayed by $n$ and construct the $\infty \times L$ Hankel matrix $S$ by joining the columns $S_1$ to $S_L$. Setting $A = (a_1, \ldots, a_L)$ and $E$ the infinite dimensional column vector containing the prediction error, we require $A$ to be the solution to the least squares problem $S A = -S_0 + E$, minimizing $\| E \| ^2$. The vector $E$ is orthogonal to the column space of $S$. By rearranging this equation to $\left[S_0 S\right]\left[1 A\right]^T = E$, and observing that the norm $\| E \|$ in the time domain space is the same as the norm $\|a(z)\|$ in the frequency domain, with $a(z) = \sum_{n=0}^{L} a_n z^n$ and $a_0 = 1$, we need to find a minimal norm polynomial $a(z)$. For this polynomial, the time domain orthogonality translates to the frequency domain orthogonality to the span of $\{z,z^2,\ldots,z^{L}\}$. Defining $\phi(z) = z^L \overline{a(1/\bar{z})} = a^*(z)$ gives a monic polynomial orthogonal to the span of $\{1,z,\ldots,z^{L-1}\}$. The time domain equivalent is a `backward predictor', which produces a residual $F$ that is orthogonal to the span of $\{ S_0, \ldots, S_{L-1}\}$. It is trivial to show that $\|a(z)\| = \|\phi(z)\|$, so we can concentrate on the monic polynomial $\phi(z)$. In a Hilbert space, the monic polynomial of minimal norm is the orthogonal one. So our problem reduces to constructing an orthogonal base of polynomials $\phi_n(z)$ using the Gram--Schmidt procedure. The crucial observation is that there exists a recurrence relation between the orthogonal polynomials $\phi_n(z)$, similar to orthogonal polynomials on the real line. This will provide us with a fast algorithm to compute the predictor of order $L+1$ in terms of the predictor of order $L$. Define the partial correlation coefficients as \begin{equation} \rho_{n+1} = \frac{\left< z^{n+1}, a_n \right>}{\|a_n\|^2} \end{equation} These express the correlation between the $n$th order prediction error and the signal delayed by $n+1$.
We can write this as \begin{equation} \rho_{n+1} = \frac{\left< z\phi_n, \phi_n^* \right>}{\|\phi_n\| \|\phi_n^*\|}, \label{parcorlev} \end{equation} since $a_n(z)$ is orthogonal to all the terms in $z\phi_n(z)$ except $z^n$. The recurrence relation for the $\phi_n(z)$ is given by \begin{equation} \left[ \begin{array}{c} \phi_n(z) \\ \phi_n^*(z) \\ \end{array} \right] = \left[ \begin{array}{cc} 1 & - \bar{\rho_n} \\ - \rho_n & 1 \\ \end{array} \right] \left[ \begin{array}{cc} z & 0 \\ 0 & 1 \\ \end{array} \right] \left[ \begin{array}{c} \phi_{n-1}(z) \\ \phi_{n-1}^*(z) \\ \end{array} \right]. \label{szego} \end{equation} This can be shown by expanding the relation for $\phi_n(z)$ in terms of all $\phi_k(z)$, with $k < n$, which gives the Gram--Schmidt procedure. The time domain equivalent of the recurrence relation gives the lattice filter realization in terms of the partial correlation coefficients. The time domain equivalent of the combination of (\ref{parcorlev}) and (\ref{szego}) is the Levinson algorithm, which exploits the symmetry in (\ref{linpred}). \def\car{Carath\'{e}odory } % \subsubsection{\car and Schur functions} % From a symmetric sequence $s_n$ construct the Toeplitz matrices $S_n = \left[ s_{|i-j|}\right]_{i=0,j=0}^{n}$. Now the $S_n$ are positive definite iff the Laurent series of $s_n$ is nonnegative on the unit circle, which means it is a power spectral density function, with in particular $\text{Im}(s(z)) = 0$ for $|z| = 1$. \car linked this to the function $c(z) = s_0 + 2 \sum_{n>0} s_n z^n$, which is equal to the Laurent series of $s_n$, except it replaces all terms in $z^{-n}$ by $z^n$. This function is positive real iff $s(z)$ is nonnegative on the unit circle. Using the bilinear transform, a \car function can be transformed into a Schur function. % Define a \car function as a positive real function $\text{Re } c(z) \geq 0$, for $|z|<1$.

Entry: Spectral Factorization Date: Fri Jun 19 11:21:37 CEST 2009 Type: tex We will have a brief look at spectral factorization and linear prediction. A filter $h(z)$ is minimum phase if it is causal and its inverse $1/h(z)$ exists and is causal. Robinson's energy delay theorem states the minimum phase sequence $h_n$ has the property of minimum energy delay, which means most of its energy is packed towards $n=0$. We write $h(z)$ as \begin{equation} h(z) = \sum_n h_n z^n = h_0 \prod_i (1 - z_iz). \end{equation} If the sum is finite $h(z)$ is a polynomial corresponding to a FIR filter. Using the `mathematical' Z--transform, the minimum phase property requires $|z_i| < 1$, which means the zeros $1/z_i$ are outside the unit circle. Any causal filter $h_n$ can be made minimum phase without changing its magnitude response, by reflecting its zeros outside the unit circle. To see that this indeed produces the same magnitude spectrum, we write the minimum phase version as \begin{equation} h'(z) = h(z) \prod_{i \in I} \frac{z - \bar{z}_i}{1 - z_i z}, \end{equation} with $I$ the set of indices satisfying $|z_i| > 1$. This maps all zeros $1/z_i$ inside the unit circle to $\bar{z}_i$. $h'(z)$ is a polynomial if $h(z)$ is a polynomial. $h'(z)/h(z)$ is an all--pass filter, which has a magnitude response of $1$. This operation preserves the autocorrelation \begin{equation} r(z) = h(z) \overline{h(1/\bar{z})} = h'(z) \overline{h'(1/\bar{z})}, \label{autocorr} \end{equation} since this just switches around zeros in its factorization \begin{equation} r(z) = h_0\bar{h}_0 \prod_i (1 - z_i z) (1 - \bar{z}_i / z).
\end{equation} If $h'(z)$ is causal minimum phase, then its inverse $a(z) = 1/h'(z)$ exists, and we can write \begin{equation} r(z)a(z) = \overline{h'(1/\bar{z})} \label{specfac} \end{equation} which is anti--causal, so the constant term in the Laurent series of (\ref{specfac}) is $\bar{h}'_0 = 1/\bar{a}_0$, and all the other terms corresponding to $z^n$, $n>0$ are zero. If the order of $a(z)$ is $L$, this gives us a set of $L+1$ equations to obtain the $a_i$. The normal form can be obtained by dividing the unknowns and the right hand side by $a_0$. Using $\gamma_L^2 = 1/a_0\bar{a}_0$ this gives the well known symmetric set of Toeplitz equations \begin{equation} \left[ \begin{array}{cccc} r_0 & r_1 & \ldots & r_L \\ r_1 & r_0 & \ldots & r_{L-1} \\ \vdots & & & \vdots \\ r_L & r_{L-1} & \ldots & r_0 \\ \end{array} \right] \left[ \begin{array}{c} 1 \\ a_1 \\ \vdots \\ a_L \\ \end{array} \right] = \left[ \begin{array}{c} \gamma_L^2 \\ 0 \\ \vdots \\ 0 \\ \end{array} \right] \label{linpred} \end{equation} The first row gives the prediction error energy in terms of the $a_i$, while the other rows determine the $a_i$ in terms of the autocorrelation.

In the part on the exponential model for $h_n$, we had a remark about the starting phase of the approximation of $p_n$. Spectral factorization allows us to avoid that problem by modeling the minimum phase part of a sequence as a sum of exponentials. Let's have a look at the error. Let $P_k$ denote the vector with the elements of $p_n$ shifted up by $k$. The shift is done either in a cyclic way, or by adding zeros. Assume $p_n$ is real. Let $A = [a_1 \dots a_L]^T$, $E = [e_0 \ldots e_{N-1}]^T$, and $P = [P_1 \ldots P_L]$. The linear prediction problem solves \begin{equation} PA = -P_0 + E \label{lsar} \end{equation} in a least squares sense, minimizing $E^TE$. The solution is given by pre--multiplying this equation by $P^T$. Since $P^T P$ is invertible, we can write \begin{equation} A = - (P^T P)^{-1} P^T P_0, \end{equation} with $P$ the Hankel signal matrix and $P^TP$ the symmetric positive definite signal autocorrelation matrix. The solution $A$ projects $-P_0$ on the column space of $P$. This can be solved using the Levinson--Durbin algorithm. An alternative approach is the Schur algorithm, which computes $a(z)$ in terms of its lattice filter representation directly. Both have complexity $O(L^2)$. Superfast $O(L\log L)$ algorithms exist, but they are only better for large $L$. % In the framework of linear prediction of a WSS process, the error is % well defined. The predictor for the WSS process behaves as a Minimum % Mean Square Error (MMSE) predictor, minimizing $E(e_k^2)$. In the % case where the WSS process is an autoregressive process of order % $L$, we have in addition $E(e_k, e_l) = 0$, $k \neq l$.

The factorization can be computed analytically by separating the series expansion \begin{equation} \log r(z) = \sum_{n \geq 1} \frac{(-1)^{n+1}}{n}[r(z)-1]^n = \sum_n u_n z^n, \end{equation} which converges when $0 < |r(z)| < 2$ on the unit circle. The lower bound means the corresponding Toeplitz matrix is nonsingular, the upper bound can be ensured with proper scaling. We separate the causal and anticausal part \begin{equation} \log r(z) = \left( \frac{u_0}{2} + \sum_{n>0} u_n z^n \right) + \left( \frac{u_0}{2} + \sum_{n<0} u_n z^n \right), \end{equation} or $\log r(z) = u^{+} (z) + u^{-} (z)$. Because $r(z) = r(1/z)$ we have $u^{-} (z) = \bar{u}^{+}(1/z)$. Taking the exponential gives us $r(z) = h'(z) \bar{h}'(1/z)$. This preserves causality of $h'(z) = \exp(u^{+}(z))$.
% Which amounts to the Levinson algorithm for solving the Toeplitz set % of equations (\ref{linpred}). If $\left< g, h \right>$ is zero, the % residuals after filtering are orthogonal. % This algorithm is based on the observation that extending the matrix % with one row and column, and extending the vector of unknowns with a % zero produces a system of equations which has the same form when the % order of unknowns and right hand side is reversed. Taking the % weighted sum of the original and reflected set of equations, % weighted by $(1, \rho_{L+1})$, allows us to parametrize the solution % for $L+1$ in terms of the solution for $L$. The weighting parameter % $\rho_{L+1}$ is determined by requiring the right hand side of the % weighted sum to have the same form as (\ref{linpred}). % This algorithm can be restated in the frequency domain in terms of % polynomials. % Using linear algebra arguments, the Levinson algorithm can be % explained as follows. We write the equation in (\ref{linpred}) as % $R_L a_L = e_L$. By adding an extra row and column to the Toeplitz % matrix $R_L$ we obtain $R_{L+1}$, and we get % \begin{equation} % R_{L+1} % \left( % \left[ % \begin{array}{c} % a_L \\ % 0 % \end{array} % \right] % + \rho_{L+1} % \left[ % \begin{array}{c} % 0 \\ % a_L' % \end{array} % \right] % \right) % = % \end{equation} % To relate this to Gram--Schmidt orthogonalization in the infinite % dimensional vector space spanned by shifts of $s$, we compute an % orthogonal base with the weighting matrix $W$ in (\ref{gramsmidth}) % equal to $S^H S$ and the $\delta_i$ equal to the columns of % $I_\infty$, we have % \begin{equation} % f_n = \delta_n - \sum_{i=1}^{n-1} \frac{f_i^H S^H S \delta_n}{f_i^H S^H S f_i} f_i. % \end{equation} % With the vectors $s_n = S\delta_n$ the delayed signal and $e_n = % Sf_n$ the prediction error this becomes % \begin{equation} % f_n = \delta_n - \sum_{i=1}^{n-1} \frac{e_i^H s_n}{e_i^H e_i} f_i. % \end{equation} % The trick of the Levinson algorithm is to update the inner products % from $n$ to $n+1$, by taking into account one extra term of the % autocorrelation sequence at each step. The frequency domain version % of the above equation is % \begin{equation} % f_n(z) = z^n - \sum_{i=1}^{n-1} \frac{\left< f_i, z^n \right>}{\left< f_i,f_i \right> } f_i(z). % \end{equation} % We can relate this to the QR decomposition of the $\infty \times % L+1$ Hankel matrix $S_L$ which has as its columns the delayed versions % of the signal $s_n$. Entry: Autoregressive Model Date: Fri Jun 19 11:22:40 CEST 2009 Type: tex \subsection{AR modeling for periodic signals} In the case where $s_n$ is periodic with period $N$, the polynomials are taken modulo $z^N-1$, and we can use the finite vector space $\mathbb{C}^N$ as the alternative of the Hilbert space of polynomials on the unit circle. With $S$ the circulant matrix constructed from $s_n$, the time domain weighted inner product for the space spanned by the columns of $S$ can be written as \begin{equation} \left< g, h \right> = g^H S^HS h. \end{equation} Because $S$ is circulant, we can express it as $S = FCF^H$, with $C$ a diagonal matrix. The equivalent frequency domain inner product then becomes \begin{equation} \left< g, h \right> = g^H F C^HC F^H h. \end{equation} Here $F^H h$ is the analogue of $h(z)$, and the diagonal matrix $R = C^HC$ is the analogue of $r(z)$. Finding the equivalent of a base of orthogonal polynomials stated above can be restated as an inverse QR problem. Note that the columns of $F$ are also an orthogonal base, though they do not solve our successive AR approximation problem since these vectors are dense.
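A quick numerical check of this circulant picture (toy code; the sizes, test signal and variable names are mine): diagonalize the circulant $S$ with the unitary DFT matrix and verify that the time domain and frequency domain weighted inner products agree.

import numpy as np
from scipy.linalg import circulant, dft

N = 16
rng = np.random.default_rng(0)
s = rng.standard_normal(N)

S = circulant(s)                                # columns are circular shifts of one period s_n
F = dft(N, scale='sqrtn').conj()                # unitary DFT matrix in the convention of these notes
C = np.diag(np.diag(F.conj().T @ S @ F))        # F^H S F is diagonal for a circulant S
print(np.allclose(F @ C @ F.conj().T, S))       # S = F C F^H -> True

g = rng.standard_normal(N) + 1j * rng.standard_normal(N)
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)
td = g.conj() @ (S.conj().T @ S) @ h                                  # g^H S^H S h
fd = (F.conj().T @ g).conj() @ (C.conj().T @ C) @ (F.conj().T @ h)    # same inner product in the frequency domain
print(np.allclose(td, fd))                      # -> True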
\subsection{AR as inverse QR} Finding an orthogonal base in a weighted vector space is related to the inverse QR decomposition of a matrix $S$, using the Gram--Schmidt procedure. This decomposition is written as \begin{equation} SR = Q \label{iqr1} \end{equation} with $R$ triangular and $Q$ orthogonal. This is equivalent to stating that $R$ is orthogonal with respect to the inner product weighted by $S^HS$, or \begin{equation} R^HS^HSR = Q^HQ = I, \end{equation} which is the time domain equivalent of the orthogonal polynomials treated above. % another way of saying this is that $R$ reduces the quadratic form defined by $S^HS$ to a sum of squares. Taking $S$ to be the $\infty \times N$ Toeplitz matrix containing shifts of the signal, and computing $R$ using Gram--Schmidt with weighting $S^HS$ gives us the normalized backward predictors as columns of $R$. The columns of $Q$ are then the normalized backward prediction errors. Multiplying (\ref{iqr1}) by the diagonal matrix $E'$ such that the diagonal elements of $\Phi = RE'$ are $1$ gives us the predictors as columns of $\Phi$, while the diagonal elements of $E'$ contain the square root of the prediction error energies. We can obtain this by computing the columns of $R$ by not normalizing (\ref{gramsmidth}), since we start from the base $I$. With $E = QE'$ this gives \begin{equation} S\Phi = E. \end{equation} The corresponding columns of $\Phi$ and $E$ give us the solution to the respective order least squares backward prediction problem. This can be seen because the $n$th column of $E$ is orthogonal to the columns of $S$ that enter the $n$th order prediction problem. The Gram--Schmidt procedure referred to above uses the weighted inner product $\left< a, b \right> = a^T W b$, with $W$ a positive definite matrix. This defines the norm $\| a \|^2 = \left< a, a \right>$. We start from the base $a_n$ of mutually independent vectors and want to generate an orthogonal base which spans the same subspace. The base vectors are given by expressing $a_n$ as a component in the space spanned by $\{e_1, \ldots, e_{n-1}\}$ and a component $f_n$ orthogonal to this space, or \begin{equation} f_n = a_n - \sum_{i=1}^{n-1} \frac{f_i^T W a_n}{f_i^T W f_i} f_i. \end{equation} With $e_n = f_n / \|f_n\|$ we have \begin{equation} f_n = a_n - \sum_{i=1}^{n-1} (e_i^T W a_n) e_i. \label{gramsmidth} \end{equation}
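The whole construction fits in a few lines of numpy (a sketch under assumptions of mine: a tall but finite Toeplitz matrix instead of the $\infty \times N$ one, real data, an arbitrary test signal):

import numpy as np
from scipy.linalg import toeplitz, qr

rng = np.random.default_rng(1)
n, order = 400, 4
x = np.cos(0.3 * np.arange(n + order)) + 0.1 * rng.standard_normal(n + order)

S = toeplitz(x[order:], x[order::-1])        # column j is the signal delayed by j samples

Q_, R_ = qr(S, mode='economic')              # ordinary QR: S = Q_ R_
R = np.linalg.inv(R_)                        # "inverse QR": S R = Q_, with R triangular
print(np.allclose(R.T @ (S.T @ S) @ R, np.eye(order + 1), atol=1e-8))

Phi = R / np.diag(R)                         # rescale columns to unit diagonal: predictor coefficients
E = S @ Phi                                  # prediction errors, one column per order
print(np.sum(E ** 2, axis=0))                # error energies, non-increasing with order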
Entry: Moore's Law Date: Sat Jun 20 10:41:56 CEST 2009 It's all getting just too expensive[1][2][3]. The idea is that in order to keep the exponential going, software needs to become more efficient/effective. This in turn will probably lead to another exponential productivity curve, keeping Moore's law just as it is. [1] http://www.dspdesignline.com/news/218100277 [2] http://c2.com/cgi/wiki?EndOfMooresLaw [3] http://c2.com/cgi/wiki?MooresSecondLaw Entry: DWS: blurbs removed from paper Date: Sat Jun 20 13:38:15 CEST 2009 Type: tex \section{Related Techniques} \subsection{Wavetable synthesis} In wavetable synthesis, a sound signal is generated by interpolating the values of a waveform vector $P \in \mathbb{R}^N$. This is related to our model with ${\pdifft} p = 0$. To simplify indexing we choose the following units: phase $\phi$ and frequency $f$ are assumed to be modulo $1$, instead of $2\pi$ as chosen above. The periodicity of $p$ is taken to be $N$ in the phase dimension. Explicitly this gives $$ \begin{array}{rcl} s_n &=& p(N\phi_n) \\ \phi_{n+1} &=& \phi_n + f_n \end{array} \label{wavetab} $$ with $f_n$ the instantaneous frequency relative to the sample frequency of $s_n$, $\phi_n$ the current waveform phase, and $p$ a function which interpolates $P$ satisfying $p(i) = p_m$, $m = i \mod N$. The basic system we use is an extension of this, where the vector $P$ is replaced by a vector sequence $P_k$, updated at a rate $T$. This turns the equations into \begin{equation} \begin{array}{rcl} s_n &=& p_k(N\phi_n) \\ k &=& [n / T] \\ \phi_{n+1} &=& \phi_n + f_n \end{array} \label{dynwavtab} \end{equation} with $[x]$ the integer part of $x$, and $p_k$ a function interpolating the vector $P_k$. The general form which takes into account time interpolation of successive $P_k$ is \begin{equation} \begin{array}{rcl} s_n &=& p(N\phi_n, n) \\ \phi_{n+1} &=& \phi_n + f_n \end{array} \label{dtdynwav} \end{equation} which is equal to our original model, except for scaling of the parameters. Here $p$ is interpreted as interpolating the grid $P_k$ in both the phase dimension $\phi_n$ and the time dimension $n$. To relate our model to the general overlap--add block approach, we present a special case of this for $2$--point interpolation of the $P_k$, \begin{equation} \begin{array}{rcl} s_n &=& p_k(N\phi_n) w(n - kT) \\ &+& p_{k-1}(N\phi_n) w(n - (k-1)T) \\ k &=& [n / T] \\ \phi_{n+1} &=& \phi_n + f_n \end{array} \label{dws} \end{equation} Here $w$ is the interpolation function used for interpolating the time dimension of our time--phase model, which is zero outside the range $[-T, T]$. It has the property $w(n) + w(n-T) = 1$, $n \in [0, T]$. This is related to the windowing function in sinusoidal modeling, where it is used to limit the time extent of a periodic base function with infinite extent.
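A direct transcription of (\ref{dynwavtab}) as a toy oscillator (my own sketch: linear interpolation in the phase dimension, no interpolation in $k$, and made-up frame contents and parameter values):

import numpy as np

N, T, nframes = 64, 256, 20
# one period per frame: a fundamental plus a fading 3rd harmonic
P = np.sin(2 * np.pi * np.arange(N) / N) \
    + np.linspace(1.0, 0.0, nframes)[:, None] * np.sin(6 * np.pi * np.arange(N) / N)

def wavetable(P, f):
    """s_n = p_k(N phi_n), k = [n/T], phi_{n+1} = phi_n + f_n (f in cycles/sample)."""
    nsamples = len(f)
    phi = np.cumsum(np.concatenate(([0.0], f[:-1]))) % 1.0   # phase accumulator, modulo 1
    k = np.minimum(np.arange(nsamples) // T, len(P) - 1)     # which period vector P_k is active
    x = N * phi                                              # position in the wavetable
    i0 = np.floor(x).astype(int)
    i1 = (i0 + 1) % N                                        # wrap around the period
    frac = x - i0
    return (1 - frac) * P[k, i0] + frac * P[k, i1]           # linear interpolation p_k(N phi_n)

s = wavetable(P, np.full(nframes * T, 440.0 / 44100.0))      # constant pitch
print(s.shape, s[:4])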
\subsection{Sinusoidal modeling} We now consider the frequency domain interpretation of our basic model. We work from the discrete representation (\ref{dtdynwav}). Set $P_k = F C_k$, with $F$ the IDFT. This makes $P_k$ a linear combination of the columns of $F$, so the instantaneous waveform is a sum of harmonically spaced complex exponentials. Translated to $s_n$, the effect of resampling to $f_k$ is to shift each harmonic relative to $f_k$. Let's concentrate on column $l+1$ of $F$, setting the corresponding coefficient in $C_k$ to $1$, with the rest $0$. The column is a vector with a frequency of $l$ relative to the period $N$. Assuming perfect interpolation with $f_n$ constant and $\phi_n = 0$, we have \def\fscale{\frac{1}{\sqrt{N}}} \begin{equation} s_n = \fscale \exp(j2\pi f_n l n). \label{partial} \end{equation} Taking into account interpolation between successive $P_k$, and a variable $f_k$, gives us a special case of sinusoidal modeling, where a signal is expressed as a weighted sum of sinusoidal `blobs' which have most of their energy concentrated in a small part of the time--frequency plane. The frequency locality is an effect of the columns of $F$. The time locality is an effect of interpolation between successive $P_k$. What we are interested in now is to express the effect of phase change of the elements of $C_k$ from $k$ to $k+1$ on the resulting signal $s_n$. There are 2 terms contributing to this. The first is phase change due to $f_n$. With $f_n$ constant, the accumulated phase change over $T$ samples in (\ref{partial}) is $2\pi f_n l T$. The second term is independent of $f_n$ and is merely due to the phase change per $k$ update of the corresponding element of $C_k$. Setting this equal to $\Delta \phi'$ per $k$ update, and defining the instantaneous frequency as phase change over time, gives us \begin{equation} f' = f_n l + \frac{\Delta \phi'}{2 \pi T}. \end{equation} The consequence of this is that it is possible to represent slightly inharmonic sounds. Updating the phase of element $l$ of $C_k$ on each $k$ has the effect of detuning the frequency of the corresponding harmonic. In the limit case where the phase is reversed on each update, significant amplitude modulation distortion occurs as a result of interpolation between successive $P_k$. However, for smaller updates, this distortion is hardly noticeable. On the other hand, the AM modulation between neighbouring harmonics resulting from this slight detuning is pleasant to the ear. It is characteristic of quite a lot of stringed instruments, and makes the signal sound less synthetic. \subsection{Phase modulation} Since phase is an explicit parameter in our model, applying phase modulation to (\ref{torus}) is trivial. Using a modulator $\phi'(t)$, we can transform a signal $s(t)$ to \begin{equation} s'(t) = p(\phi(t) + \phi'(t), t). \label{phasemod} \end{equation} Phase modulation (and its relative, frequency modulation) can be seen as phase warping. Here this would be $r(\phi(t), t) = \phi(t) + \phi'(t)$. In the case where $r$ is independent of $t$, which boils down to periodicity relationships between $\phi$ and $\phi'$, the warping can be embedded in $p$. A warped $p$ then looks like \begin{equation} p'(\phi, t) = p(r(\phi), t). \end{equation} In the discrete representation, this is equivalent to resampling $P_k$ according to $r(\phi)$. The dual operation is to do this in the frequency domain, by resampling $C_k = F^HP_k$. \section{Interpolation} TODO: two things to explain: aliasing and phase cancellation. Interpolation or resampling is central to our approach, so it makes sense to reflect on the possible options we have to implement this. [ See basics article: this is basically Shannon--Nyquist, but with some extra hacks ordinarily not found in the treatment of sample rate conversion with rational ratios. ] \subsubsection{Filtering $P_k$} The resampling ratio in (\ref{dtdynwav}) is $r = N f_n$. This is the ratio between the old and new sample rate. As long as the new sample rate is higher than the original one, which means $r<1$ or $f_n < 1/N$, there is no aliasing. We can eliminate undesired frequencies from $P_k$ before it is interpolated. Several approaches are possible. Since $P_k$ is periodic, we can simply eliminate the higher frequencies in its spectrum $C_k$. Depending on the specific layout of the processing algorithm at hand, this might be free if the last step before determining $P_k$ is an IDFT, for example when using a spectral synthesis or processing operation. If not, this approach might be too expensive, since it involves extra DFT and IDFT steps. An alternative is to filter $P_k$ in the time domain. This can be done using both FIR and IIR techniques. Since we have the whole signal at hand we can use IIR filters in a forward/backward way to obtain a linear phase response. For IIR filters, running the filter once on $P_k$ and once more on zero input, and adding these 2 responses, allows us to approximate the steady state response to the periodic extension of $P_k$. FIR and IIR filter coefficients could be pre-computed, and chosen depending on the current resampling rate at run time. When using IIR filters, the coefficients can be computed at run time using standard filter design techniques, of which Butterworth filters are interesting due to the maximally flat property. This approach is related to the previous one, with the exception that the different $h(t)$ are determined using a 2 stage process. The resulting kernel is obtained by convolving the filter for $P_k$ with the interpolation kernel used.
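To pin down the first option (zeroing bins of $C_k$), here is a toy band-limiting step; the aliasing criterion follows from $r = N f_n$, but the function, its name and the test values are my own:

import numpy as np

def band_limit(P_k, f):
    """Zero the harmonics of one period P_k that would alias at playback frequency f (cycles/sample)."""
    N = len(P_k)
    C_k = np.fft.fft(P_k) / np.sqrt(N)              # spectrum, unitary scaling as in the notation section
    r = N * f                                       # resampling ratio
    if r > 1.0:                                     # only needed when reading faster than one sample per sample
        max_harmonic = int(np.floor((N / 2) / r))   # highest harmonic still below the new Nyquist rate
        keep = np.zeros(N, dtype=bool)
        keep[:max_harmonic + 1] = True              # DC .. max_harmonic
        if max_harmonic > 0:
            keep[N - max_harmonic:] = True          # and their conjugate bins
        C_k = np.where(keep, C_k, 0.0)
    return np.real(np.fft.ifft(C_k) * np.sqrt(N))

P_k = np.random.default_rng(2).standard_normal(64)  # a harmonically rich period
print(np.abs(np.fft.fft(band_limit(P_k, 5.0 / 64)))[:12].round(3))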
\section{Pitch/formant coupling} One of the drawbacks of our model is the coupling between pitch and formants, which is often referred to as the `chipmunk effect'. Resampling a signal $s_n$ changes not only its pitch and time base, but also its formant structure. Formant structure in this context means the `spectral envelope' of a sound, which is usually largely independent of its pitch. An advantage of our model is that time scale modification independent of pitch and formant structure is possible by resampling $P_k$ in the $k$ dimension, and keeping $\phi_n$ as it is. This could be seen as a `harmonic phase vocoder'. Resampling is core to our model, and since the chipmunk effect is generally undesirable because it sounds artificial, we have to take it into account. A positive side of this is that if we solve this problem, we can do arbitrary pitch/time/formant manipulations. % In perceiving sounds, our brain seems to separate pitch and formant % structure into largely independent properties. This can be understood % as an adaptation to the behaviour of most sound sources in our % natural environment. Formant structure is largely an effect of % resonating cavities, the size and shape of objects, and is mostly % constant, with the exception to the rule being voice, where the % formant structure is largely variable due to a variable shape % cavity. Pitch on the other hand, is determined by oscillating % entities placed in or near resonators, like strings (vocal cords). \subsection{Source/filter model} In general, this problem is solvable if we can decompose a signal $s(t)$ as \begin{equation} s(t) = h(t) * s'(t), \label{sourcefilter} \end{equation} with $h(t)$ the impulse response of a filter representing the formant structure, and $s'(t)$ an oscillator. Changing the pitch and formant structure independently amounts to changing the time base of $s'$ and $h$ respectively. For sound synthesis this is trivial, since $h(t)$ is explicitly synthesized. In our model this translates to synthesizing $C_k$ using a parametric method generating a spectral envelope. We simply need to take the playback pitch $f_n$ into account to compensate for formant shifting when generating the spectrum. Without further information or assumptions about the signal $s(t)$, the deconvolution equation (\ref{sourcefilter}) is underdetermined. For some sounds, this model is too crude an approximation. A better model is given by \begin{equation} s(t) = h^b(t) * h^r(t) * s'(t), \end{equation} where $s'(t)$ is an arbitrary excitation signal, $h^r(t)$ is a harmonic resonator determining the pitch, for example an air column or a string, and $h^b(t)$ contains the overall spectral envelope, which can be assumed to be constant. However, in our representation we combine the effects of the last two terms into the oscillator. % Below we will have a look at several approaches that try to solve % the problem of independent time/pitch/formant modification, and will % illustrate how this relates to decomposing $s(t)$ in a source and a % filter component using additional constraints and assumptions.
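The synthesis direction of (\ref{sourcefilter}) as a toy script (not from the paper; the resonance frequencies, pole radii and the impulse-train source are made-up assumptions): an octave pitch change only changes the impulse spacing, while the formant filter stays fixed.

import numpy as np
from scipy.signal import lfilter

fs = 16000

def formant_filter():
    # two fixed resonances (~700 Hz and ~1200 Hz), poles well inside the unit circle
    a = np.array([1.0])
    for f0, radius in [(700.0, 0.97), (1200.0, 0.95)]:
        w = 2 * np.pi * f0 / fs
        a = np.convolve(a, [1.0, -2 * radius * np.cos(w), radius ** 2])
    return a

def synth(pitch_hz, nsamples=8000):
    src = np.zeros(nsamples)
    src[::int(round(fs / pitch_hz))] = 1.0            # impulse train oscillator s'
    return lfilter([1.0], formant_filter(), src)      # all-pole formant filter h

low, high = synth(110.0), synth(220.0)                # an octave apart, same spectral envelope
print(low[:5], high[:5])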
\subsection{Periodic signals} % The coefficients are then given as $s_n = % \frac{1}{n!}\frac{\diff^n}{\diff z^n}s(z)$. In this section we work with periodic sequences with period $N$. The tools to use here are circulant matrices and polynomials modulo $z^N-1$. (See notes). For $a^N \neq 1$ the inverse of a degree 1 polynomial exists and is given by \begin{equation} (1-az)^{-1} = \frac{1}{1-a^N}\sum_{k=0}^{N-1} (az)^k, \label{ringinverse} \end{equation} which allows us to transform rational functions (ARMA models) to polynomials modulo $z^N-1$. The exponential models in the following section take the form \begin{equation} p(z) = \frac{c(z)}{a(z)}, \end{equation} where we can interpret $p(z)$, $c(z)$ and $1/a(z)$ either as the Z--transform of the respective infinite sequences, or as the polynomials $p(z)$, $c(z)$, $a(z)^{-1}$ modulo $z^N-1$, of which the coefficients represent one period of a periodic signal. In the latter case the inverse of a polynomial $a(z)$ is taken to be the polynomial $a'(z)$ satisfying $a'(z) a(z) = q(z)(z^N-1) + 1$ for some polynomial $q(z)$. We normalize the filter denominator polynomial of degree $L$ as \begin{equation} a(z) = \prod_l (1 - z_l z), \end{equation} with zeros given by $1/z_l$, for which we have $|z_l| < 1$, which means the zeros are all outside the unit circle, so the filter is stable. This normalization allows us to write the impulse response $h(z) = 1/a(z)$ as a weighted sum of complex exponentials \begin{equation} h_n = \sum_l c_l z_l^n. \end{equation} Below we take a look at several ways to formulate the problem so we can shape the decomposition of $p(z) = s(z)h(z)$ in such a way that $s(z)$ can serve as the oscillator in our source--filter model. \subsection{ARMA model} Let's restate our goals. We want to separate $s(z) = h(z)s'(z)$ with $h(z)$ a filter and $s'(z)$ an oscillator. This corresponds to the physics of most sounds occurring in nature, including human speech. The reason we want to separate the two is to be able to change the frequency of the oscillator $s'(z)$, without changing the filtering effect $h(z)$, which we can define as the property which is clearly recognized by our auditory system as being an invariant of a certain sound, independent of the pitch. In general, we model $h(z)$ using an ARMA model \begin{equation} s(z) = \frac{b(z)}{a(z)} s'(z), \label{arma} \end{equation} where $a(z)$ and $b(z)$ are low order polynomials. % This equation is understood in the following way. If $s'(z)$ is periodic with period $N$, the steady state response $s(z)$ is equal to the periodic extension of $p_n$. The only way we can make this problem computable is by introducing additional information, or making guesses about properties of both $s'(z)$ and $h(z)$, based on physical or perceptual motivations. For the ARMA model, this boils down to choosing the model order for $h(z)$ and an expression for $s'(z)$, which includes the modeling error. This gives us a general solution to the source--filter separation problem. Below we will look at two approaches which give us such a model. The difference is mostly in the shape of the error term. Once we have a model which incorporates a relatively small error term, we are still free to manipulate (\ref{arma}) by refining the assumptions about $s'(z)$. For example, depending on additional information, we can move some of the poles and zeros of $h(z)$ to the excitation function $s'(z)$. Since we are working with steady state responses, we can even move a pole of $h(z)$ to a set of $N$ zeros in $s'(z)$.
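Before moving on, a short check of (\ref{ringinverse}) (toy code; $a$ and $N$ are arbitrary): multiplying $1 - az$ by the claimed inverse and reducing modulo $z^N - 1$ should give $1$.

import numpy as np

def polymul_mod(p, q, N):
    """Multiply two coefficient vectors (index = power of z) modulo z^N - 1."""
    full = np.convolve(p, q)
    out = np.zeros(N, dtype=complex)
    for k, c in enumerate(full):
        out[k % N] += c                     # z^N == 1, so fold higher powers back
    return out

N, a = 8, 0.3 + 0.4j
one_minus_az = np.zeros(N, dtype=complex)
one_minus_az[:2] = [1.0, -a]
inverse = a ** np.arange(N) / (1 - a ** N)  # 1/(1-a^N) * sum_k (a z)^k
print(np.round(polymul_mod(one_minus_az, inverse, N), 12))   # -> [1, 0, 0, ...]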
\subsection{ARMA Poles} Taking a look at the zeros of $a(z)$, we can basically separate them into two classes. Zeros close to the unit circle correspond to narrow bandwidth oscillatory modes and are grouped in $h^r(z)$, while zeros further from the unit circle correspond to wideband resonance peaks, the part we are really interested in, and are grouped in $h^b(z)$. \begin{equation} s(z) = h^b(z) h^r(z) s'(z), \end{equation} which can be obtained from (\ref{arma}). When shifting the frequency of $s'(z)$, we probably want to shift the narrow band modes too. This means the excitation of the system is actually $h^r(z) s'(z)$ instead of impulse like, so we have little information about the true shape of the spectrum. How can we deal with this or avoid it altogether? We could try to translate this to a preprocessing step where we weight the spectrum of $p_n$ to accentuate its high frequency modes. After fitting the model, we could inverse filter the oscillator waveform to compensate for this. Another remark to be made is that narrowband modes can easily be picked from the harmonic spectrum of $p_n$. By removing them beforehand, it might be possible to push the ARMA model toward broader peaks. \subsection{Exponential modeling} A way to look at the finite sequence $p_n$ is to view it as the steady state response of some IIR filter $h_n$ to some periodic signal. In the time domain this is \begin{equation} p_n + e_n = h_n = \sum_l c_l z_l^n. \label{expmod} \end{equation} We write this in the frequency domain modulo $z^N-1$. Using (\ref{ringinverse}) this becomes \begin{equation} h(z) = \sum_l \frac{c_l'}{1 - z_l z} = \frac{b(z)}{a(z)} \end{equation} with $c'_l = (1 - z_l^N) c_l$ the non--wrapped amplitudes and $a(z) = \prod_l (1 - z_l z)$. The numerator $b(z)$ can be computed from the partial fraction expansion using the $c_l'$. We can solve (\ref{expmod}) minimizing $\sum_n |e_n|^2$. Looking at the error, we seek a sequence $e_n$ such that $p_n + e_n$ is a sum of exponentials. We set $E_k$ to be the vector containing the error sequence shifted up by $k$, in the same fashion as we define $P_k$. Set $E = [E_1 \ldots E_L]$. The exponential modeling problem solves $(P + E)A = -(P_0 + E_0)$ minimizing $E^T E$. With $P' = [P_0 P]$, $A' = [1 A^T]^T$, and $E' = [E_0 E]$ it amounts to finding a Hankel matrix $E'$ that satisfies $(P' + E') A' = 0$. This boils down to finding a rank $L$ Hankel matrix which approximates $P'$, minimizing $\| E' \|_F$. This is a structured (Hankel) total least squares problem, or structured errors--in--variables model. We will also have a look at a subspace approximation called HSVD, where the Lanczos method can be used to find an (unstructured) rank $L$ approximation. One important question remains. In our model the finite sequence $p_n$ is one period of a periodic sequence. Shifting the elements of $p_n$ in a circular way is possible in our model, but gives a different optimization problem for (\ref{expmod}). This can be seen as choosing the starting phase of the impulse train. We will see below that working from the autocorrelation function allows us to ignore the phase problem. This leads us to another formulation of the exponential modeling problem in terms of linear prediction, where we do not present an impulse input, but a broad spectrum noise signal.
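To see the unstructured subspace step in isolation (a toy sketch with my own signal and sizes; this is not the structured problem itself, just the rank-$L$ SVD truncation it starts from):

import numpy as np
from scipy.linalg import hankel, svd

N, L = 64, 4
n = np.arange(N)
p = (0.95 ** n) * np.cos(0.4 * n) + 0.9 * (0.8 ** n) * np.cos(1.1 * n)    # ~ a sum of L=4 complex exponentials
p = p + 0.01 * np.random.default_rng(3).standard_normal(N)                # plus a small error e_n

H = hankel(p[:N // 2], p[N // 2 - 1:])         # Hankel matrix built from shifts of p
U, s, Vt = svd(H, full_matrices=False)
print(np.round(s[:8], 3))                      # L dominant singular values, then the noise floor
H_L = (U[:, :L] * s[:L]) @ Vt[:L]              # best unstructured rank-L approximation (no longer Hankel)
print(np.linalg.norm(H - H_L) / np.linalg.norm(H))   # small relative misfit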
% NOTE: LS or TLS is of course latency versus misfit.. what this is % actually supposed to mean here is not clear. It only says something % about the error distribution: for TLS the additive noise -> white, and % for LS the residual (the source signal) -> white. % Once the model is fitted, the residual $e_i$ can be inverse filtered % to replace the assumption of $s(t)$ being an impulse train. This % gives us the separate description in (\ref{sourcefilter}), with the % added advantage that $h$ is parametrized in frequency through the % poles $z_l$. % Another extension is possible: by deconvolving the inverse filtered % error term using some constraint or assumption about $h$, we could % attribute one part of the deconvolution to $h$, turning it into an % IIR filter instead of allpole. \subsection{Frequency domain AR} Instead of modeling the period vector as a (periodic) AR model, we can take the dual approach and write its spectrum as an AR model. This is equivalent to modeling the time domain waveform as a sequence of pulses, and might be interesting for formant preserving time stretching, by allowing us to extrapolate the waveform. % In the framework of linear prediction of a WSS process, the error is % well defined. The predictor for the WSS process behaves as a Minimum % Mean Square Error (MMSE) predictor, minimizing $E(e_k^2)$. In the % case where the WSS process is an autoregressive process of order % $L$, we have in addition $E(e_k, e_l) = 0$, $k \neq l$. % So we can relate exponential modeling to linear prediction. In the % case where we compute $r_n$ from $p_n$, $r_n$ will be periodic, and % can be written as a sum of sinusoids. % \subsection{Frequency domain AR} % One possible extension is to compute the AR model in the frequency % domain. For the periodic model we can pre--multiply both sides of the % equation (\ref{lsar}) by the matrix $F^H$ representing the Fourier % transform, or % \begin{equation} % C A = - C_0 + F^HE, % \end{equation} % with $C$ the matrix of phase shifted spectra. Since $F^H$ is unitary % the error norm is preserved $\|F^HE\| = \| E \|$ and we have the % same solution % \begin{equation} % A = - (C^H C)^{-1} C^H C_0, % \end{equation} % with $C^H C = (F^HP)^H(F^HP) = P^TP$, and $C^H C_0 = % (F^HP)^H(F^HP_0) = P^TP_0$. % NOTE NOTE: alternative interpretation of AR: the AR model writes the % spectrum in terms of phase shifted versions of the spectrum. NOTE % NOTE: instead of computing the correlation matrix from the power % spectrum $c(\omega)$, one could compute it from % $\sigma(c(\omega))$ (see support vector machines: each linear algo % in terms of inner products can be written as nonlinear using a % kernel function). NOTE NOTE: fourier - mul - inverse fourier = % multiplier theory (i.e. fractional differentiation) % \subsection{Extensions} % We propose a combined method of spectral factorization, spectral % weights and exponential modeling. % First, the spectrum $c_n$ is weighted by $\sqrt{n}$. This % corresponds to the average pink noise distribution of sound % signals. The fundamental is discarded. % As a second step, we compute the minimum phase version of the % weighted spectrum, recording the phase adjustment. This signal will % be treated as an impulse response which will be modeled as a sum of % exponentials. % The resulting error will be inverse filtered to add to the impulse % stimulus. The phase and amplitude effects of the weighting will be % applied in reverse to obtain the original stimulus. % In speech coding, the guesses amount to assuming $s(z)$ is either a % periodic impulse (voiced speech) or white noise (unvoiced % speech). The filter $h(z)$ corresponds to the transfer function of % the vocal tract.
This seems to work reasonably well for coding % applications, but since our goals are high quality sound processing, % we have to do better than that. % Take for example the violin. A rough approximation would be the % domposition $p(z) = h(z) s(z) b(z)$, where $b(z)$ is the input % oscillator, modeled as a sequence of pulses due to sliding of the % bow over the strings. $s(z)$ is the transfer function of the strings % combined with the effect of position and pressure of the fingers of % the player. This determines the eventual pitch. and $h(z)$ is the % effect of the violin body and the transfer function from bridge to % body. In our view we can combine the effect of $b(z)$ and $s(z)$. % That's quite something. Luckily, we can make some % approximations. Let's take a look at the results of exponential % modeling. In the case where the error $e_k=0$, we have the harmonic % decomposition of the signal $p_k$. This is clearly not what we want, % since it is what we had, a periodic oscillator. % Reducing the filter order allows us to move some information from % $h(z)$ to $e(z)$. What remains in $h(z)=1/a(z)$ can be split in 2 % parts. Strong resonnances could be attributes to harmonics of the % oscillator, while weak resonnances can be attributed to modes of a % filter. The closer a zero of $a(z)$ is to $\exp(j2\pi t/T)$, the % more likely it is to be due to an effect we like to incorporate in % the oscillator, not the filter. % We can use this information not only to do a better source/filter % separation, but also to improve the condition of the estimation % problem, since poles close to the unit circle severely infect the % condition of an exponential estimation problem. % \subsection{Fixed zero ARMA} % In the case where the periodic signal $p_k$ has a strong harmonic % content, we try to model its effect by a zero instead of a pole. We % can do this because we are dealing with periodic signals. % \subsection{Resampling lattice filters} % REF = bultheel's paper % It would be interesting to have an explicit polar representation of % $h(z)$, as in (\ref{pureexp}). Though, depending on the application, % a representation using $\rho_l$ might be more interesting. % We have a brief look at the possibility to use reflection % coefficients to represent spectral data, instead of polar or % polynomial form. % \subsection{Phase vocoder} % Without resorting to an extensive description of the algorithm, the % phase vocoder can be seen as an analysis/synthesis tool for time % base modification independent of pitch and formant structure. It % assumes a signal $s_n$ can be represented as a sum of sinusoidal % components which vary slowly over time. % This presents a model in terms of amplitude, phase and % frequency. Contracting the time base is then equivalent to % resynthesis with keeping the amplitudes and frequencies fixed, but % modifying the phase according to the new time base. % Several variants of the algorithm exist. Most are based on DFT % analysis and `peak picking' methods, where the sinusoidal components % are identified by looking at the peaks of a spectrum. Sythesis is % done by direct modification of the phase of a DFT spectrum. % It does not separate pitch and formants, and has the same drawbacks % as our method when used for pich shifting. In fact, our method can % be seen as a harmonic phase vocoder, with all sinusoidal components % harmonicly spaced. %\subsection{WSOLA} % \section{Remarks} % The core of the solution is very simple. 
Stretching or compressing % the spectrum $C_k$ is exactly what we need. Dually, we can compress % or stretch $P_k$. The difficulty lies in the boundary conditions. % When stretching $C_k$, wrapping around from Nyquist to DC is not a % good idea, since it introduces aliasing. The dual operation is % compressing $P_k$, which indeed can introduce aliasing if there are % high frequencies present. So it seems shifting formants up can best % be done in the frequency domain. % When stretching $P_k$, wrapping around is not necessarily a bad % thing. Or is it? % Here we can use some kind of hack. Since most real filters in nature % are causal, we could use this information. Other hack: spectrum % warping: leave the first couple of harmonics intact, and shift only % the higher ones? % Another hack: using HTLS/HSVD on the period vector to isolate % formants? Suppose we have a % Here we will discuss the effects of deforming the function $p(\phi, % t)$ in the phase (or frequency) dimension. This is the basic tool we % have to re--organize the spectral content at time $t$. % in how far is it possible and/or desirable to generalize the model % to pure sinusoidal modeling based on absorbing inharmonicity in the % phase spectrum? \section{Effect processing} Spectral shift. Pitch correction. Harmonic enhancement (aliasing free waveshaping). Phase coherence. \section{Data reduction} Approximation of all discrete sequences (spectra, AR models, pitch envelope) using DWT compression. Phase encoding (constant difference). \section{Pitch detection} A crucial part for any analysis algorithm for our model is a pitch (periodicity) detection scheme. Moreover, given a perfect pitch estimator, parameter estimation for our model is relatively trivial, and amounts to segmentation of a signal $s(t)$ into a sequence of intervals $s_i(t)$ related to a sequence of periods $p_i(\phi)$. The length of each interval is determined by the instantaneous period $T_i$ at time instance $t_i$, with $t_i = \sum_{k<i} T_k$. % FT -> $\sigma$ in FD % There are a lot of approaches to pitch detection to be found in the literature. % Time domain: autocorrelation. Frequency domain: peak picking, harmonic sum % spectrum. Cepstrum methods. % Fiddle uses an ad-hoc likelihood function.
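The simplest item in that list as a toy sketch (the lag range and the test signal are my own; no claim that this is good enough for the model):

import numpy as np

def detect_period(s, min_lag=20, max_lag=400):
    s = s - np.mean(s)
    r = np.correlate(s, s, mode='full')[len(s) - 1:]    # autocorrelation, lags 0 .. len(s)-1
    r = r / r[0]                                        # normalize by the signal energy
    lag = min_lag + np.argmax(r[min_lag:max_lag])
    return lag, r[lag]                                  # period in samples, plus a crude confidence

fs = 8000
t = np.arange(2048) / fs
s = np.sign(np.sin(2 * np.pi * 123.0 * t))              # a harmonically rich 123 Hz test signal
T_est, conf = detect_period(s)
print(T_est, fs / T_est, conf)                          # ~65 samples -> ~123 Hz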
\section{Non--periodicity} Our model assumes signal periodicity, which makes it generally unsuitable for modeling complex sounds. In addition to that, we need to take into account transients. % Although the model we present is mainly intended as an overcomplete representation of a signal which allows real--time modification of a lot of its parameters, and therefore intended as a musical tool, there are possibilities to employ compression techniques to limit the storage needed to store sound expansions. \section{Notation} We use the following notation. $s_i$ represents (an element of) a discrete sequence or a vector. Multiple indices denote elements of discrete grids or matrices. Capital letters denote matrices and vectors. The elements of the $K \times L$ matrix $M$ are denoted as $m_{kl}$, with $k$ the row index running from $1$ to $K$, and $l$ the column index running from $1$ to $L$. $s(x)$ represents a real or complex function. All entities take their values from $\mathbb{R}$ or $\mathbb{C}$. We use $j = \sqrt{-1}$. We represent the discrete Fourier transform (DFT) using the matrix $F \in \mathbb{C}^{N \times N}$ with elements $f_{k+1,l+1} = \frac{1}{\sqrt{N}}\exp(j2\pi kl / N)$. We have $F^{-1} = F^H$ due to normalization, with $(.)^H$ denoting the complex conjugate transpose. The DFT of vector $X$ is then denoted as $F^H X$, and the inverse discrete Fourier transform (IDFT) of $X$ is $FX$. Entry: Complex Waveshaping Extras Date: Sat Jun 20 16:34:42 CEST 2009 Type: tex Stuff below this is just for inspiration. \subsection{More unrelated stuff} The space spanned by the $f_{a,\theta}$ is isomorphic to the space of matrices of the form \begin{equation} m'_{a,\theta} = e^{i\theta} \left( \begin{array}{rr} 1 & a \\ \bar a & 1 \\ \end{array} \right). \end{equation} $|a| \neq 1$ ensures $\det m'_{a,\theta} \neq 0$. Dividing the numerator and denominator of (\ref{themap}) by $\sqrt{1-|a|^2}$ does not change the value of $f_{a,\theta}$, so we will let the matrix with unit--modulus determinant \begin{equation} m_{a,\theta} = \frac{e^{i\theta}}{\sqrt{1-|a|^2}} \left( \begin{array}{rr} 1 & a \\ \bar a & 1 \\ \end{array} \right). \label{ma} \end{equation} correspond to $f_{a,\theta}$. The set of $m_{a,\theta}$ is a group under multiplication. The mapping between $f$ and $m$ is a group homomorphism mapping composition of $f$ to multiplication of $m$. The inverse of $m_{a,\theta}$ is given by $m_{-a,-\theta}$. This means that $f_{a,\theta} \circ f_{-a,-\theta} (z) = z$. The $m_{a,\theta}$ span a hyperbolic space where the inverse is given by the anti-hermitian of $m$. For the angular representation we write $a = r e^{i\beta}$. For $0 < |r| < 1$ we can set $r = \sin \gamma$. Transforming (\ref{themap}) we get \begin{equation} f_{\beta,\gamma}(z) = \frac{z + \sin \gamma e^{i\beta}}{1 + \sin\gamma e^{-i\beta} z}. \label{themap_angular} \end{equation} Transforming (\ref{ma}) in the same way we get \begin{equation} m_{\gamma,\beta} = \frac{1}{\cos \gamma} \left( \begin{array}{cc} 1 & r e^{i\beta} \\ r e^{-i\beta} & 1 \\ \end{array} \right). \label{ma_angular} \end{equation} Writing (\ref{themap_angular}) as a function of $\alpha$ with $z = e^{i\alpha}$ we get \begin{equation} f(\alpha,\beta,\gamma) = \frac{e^{i\alpha} + \sin \gamma e^{i\beta}}{1 + \sin\gamma e^{-i\beta} e^{i\alpha}}. \label{themap_alpha} \end{equation} \subsection{Lie groups} It would be nice to check out what can be said about modulation in algebras that can be represented by matrices over the reals or complex numbers. % I don't know much about this subject, but find it intriguing nonetheless. The oscillator $x(t) = \exp(i\omega t)$ could be generalized to \begin{equation} X(t) = \exp(\Omega t), \label{lieosc} \end{equation} where $X(t)$ is an element of a Lie group, and $\Omega t$ is the corresponding element of the Lie algebra. Using matrix representations of both allows us to construct methods that can be implemented on a digital computer. The oscillator (\ref{lieosc}) can be written as a set of coupled oscillators, by using the eigenvalue decomposition of $\Omega = S\Lambda S^{-1}$, assuming it exists. This gives \begin{equation} X(t) = S \exp(\Lambda t) S^{-1}. \end{equation} Take for example $\text{SO}(3)$ with its universal (double) cover $\text{Spin}(3)$ or $\text{SU}(2)$, which is diffeomorphic to the 3--sphere $\text{S}^3$, the unit quaternions. The Lie algebra $\text{su}(2)$ can be represented by the Pauli matrices \begin{equation} \sigma_1 = \left[ \begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array} \right],\text{ } \sigma_2 = \left[ \begin{array}{cc} 0 & i \\ -i & 0 \end{array} \right],\text{ } \sigma_3 = \left[ \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array} \right].
\end{equation} Extended with $\sigma_0 = I$, the identity matrix, $(\sigma_0, i\sigma_1, i\sigma_2, i\sigma_3)$ gives a representation for the quaternion basis $(1,i,j,k)$ with $i^2 = j^2 = k^2 = ijk = -1$. Writing (\ref{lieosc}) in terms of the quaternions, we have \begin{equation} x(t) = \exp(q t). \end{equation} If $q$ is purely imaginary, $x$ will have unit norm. % Other examples include the biquaternions and general Clifford Algebras. % AARGH!! % This basis is equivalent to the quaternions % $a + bi + cj + dk$ with $-1 = i^2 = j^2 = k^2$, which are represented by % a\sigma_0 + ib\sigma_1 + ic\sigma_2 + id\sigma_3 = %\begin{equation} %\left[ %\begin{array}{cc} %a + id & c + ib \\ %-c + ib & a - id %\end{array} %\right]. %\end{equation} % http://en.wikipedia.org/wiki/Rotation_group % http://en.wikipedia.org/wiki/Spinor Entry: Polynomial Composition Date: Wed Jun 24 18:46:25 CEST 2009 Type: tex I'm trying to give a meaningful interpretation for polynomial composition. While addition and multiplication have a straightforward interpretation, composition seems to have less structure. Composing two polynomials yields $$p(q(x)) = \sum_{n=0}^N p_n \Big(\sum_{m=0}^M q_m x^m\Big)^n$$ which doesn't have a trivial closed form. It is a linear combination weighted by $p_n$ of self--convolutions of the original polynomial $q(x)$. Let's look at the self--convolution by investigating the squaring operation $$p(x)^2 = (\sum_{n=0}^N p_n x^n)^2 = \sum_{n=0}^{2N}(\sum_{k+l = n} p_k p_l)x^n$$ and the effect of higher powers of a simple polynomial $$(1 + x)^M = \sum_{m=0}^M \binom{M}{m} x^m.$$ The result is a polynomial which is in some sense ``flatter'' than the original, in that it is more spread out. It wouldn't be surprising if the Bernstein basis[2][4] were a good place to start looking for meaning. $$ \begin{align*} 1 &= [x + (1 - x)]^n \\ &= \sum_{k=0}^n \binom{n}{k}x^k(1-x)^{n-k} \\ &= \sum_{k=0}^n b_{k,n}(x). \end{align*}$$ Doing some random searches online I don't find anything special about polynomial composition, other than iterated functions (dynamical systems) which have a very rich behaviour that is not easily expressed in terms of coefficient values. This leads me to think that maybe it is better to study the polynomials as opaque \emph{maps} instead of focussing on linear combination of polynomial powers. There are some special cases. The Chebyshev polynomials obey the nesting property $T_n(T_m(x)) = T_{n m}(x)$, as do the monomials: $(x^m)^n = x^{nm}$. It looks like it makes more sense to look at composition from the effect it has on polynomial roots. Let's limit ourselves to monic polynomials. With $p(x) = \prod_k (x-p_k)$ the composition is written as $$p(q(x)) = \prod_k(q(x)-p_k).$$ This constructs a new polynomial as a union of the roots of the polynomials $q(x)-p_k$. Here each root $p_k$ \emph{moves} the roots of $q(x)$. % Is there a polynomial that, raised to a power yields a similar % expression? %% For $x \in I = [0,1]$ all the terms are positive which makes them %% map $I$ to itself. Let's collect other polynomials that are closed %% under composition on subsets of $\CC$ to get a better idea of %% composition. Chebyshev polynomials[5] are closed on $[-1,1]$. % I wonder what the link is between $p(x)^n$ with large $n$ and the % central limit theorem. % Also it is related to building higher order interpolating % polynomials from lower order (Neville interpolation [3]). EDIT: no % this is multiplication as it raises the order by one every % multiplication.
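Two quick numerical checks of the special cases mentioned above (toy code; the particular $p$ and $q$ are arbitrary): the Chebyshev nesting property, and the root picture of composition where each root $p_k$ of $p$ contributes the full root set of $q(x) - p_k$.

import numpy as np

x = np.linspace(-1, 1, 1001)
T = lambda n, x: np.cos(n * np.arccos(x))            # Chebyshev polynomials on [-1, 1]
print(np.allclose(T(3, T(4, x)), T(12, x)))          # nesting property -> True

p_roots = np.array([0.5, -1.0])                      # p(x) = (x - 0.5)(x + 1)
q = np.array([1.0, 0.0, -2.0])                       # q(x) = x^2 - 2, coefficients high -> low
comp = np.polymul(np.polysub(q, [0, 0, p_roots[0]]),
                  np.polysub(q, [0, 0, p_roots[1]])) # p(q(x)) = prod_k (q(x) - p_k)
print(np.sort(np.roots(comp)))
print(np.sort(np.concatenate([np.roots(np.polysub(q, [0, 0, r])) for r in p_roots])))  # same root set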
% This is related to interpolation, for which the function $f_a(x,y) = % x + a (y - x)$ plays an important role. %[1] http://www.math.utah.edu/~pa/MDS/bform.html %[2] entry://20070505-215632 %[3] http://en.wikipedia.org/wiki/Neville%27s_algorithm %[4] http://en.wikipedia.org/wiki/Bernstein_polynomial %[5] http://en.wikipedia.org/wiki/Chebyshev_polynomials %[6] entry://../library/071ac18443ca8d29abe3e2efcf114b1b Entry: dna vs. static code Date: Wed Jul 1 15:41:36 CEST 2009 One of the discussions I find interesting is the static vs. dynamic debate. To simplify: Smalltalk vs. Haskell Is code a construct from which one can create statically linked executable code (early binding), or should the links allow late binding? This is very similar to what you find in nature: genotype vs. pheonotype. Apparently evolution has come up with a mechanism that builds an image (body) from code (dna), and regularly discards the image to rebuild it from scratch. It looks to me that this is mostly about efficiency of storage: genotypes are a _lot_ smaller than phenotype. Most of the information of the phenotype is a consequence of the structure of the compiler. However, without the compiler there is no semantics. On the other hand, the pheonotype contains a lot of information and is "self-hosted" on top of physics, but this huge information store makes it vulnerable to bit rot. Of course this ignores information stored in the nervous system, but the analogy seems to hold in that "similar" phenotypes can be created as long as the compiler stays intact. Here compiler also includes social and environmental information. Entry: The Point Date: Thu Jul 2 01:06:27 CEST 2009 I think it's time to stop wanting to do original research, and try to understand what is already there through study and building useful things. Entry: preserving data Date: Wed Jul 8 16:35:36 CEST 2009 This interpreter + data business hinted at in the previous posts and recent struggle with build tools made me think about something: there isn't any static semantics for data we're storing. If something doesn't build you're fucked. Entry: Loose ideas about embedded programming Date: Tue Jul 14 16:39:22 CEST 2009 Some scrap ideas and frustrations that I can't place yet (contains duplication). * Maybe it's time to look at the semiconductor design industry. I think it is safe to say that all a semiconductor design company does is to try to get it right the first time. It's a hard life. Getting it wrong means going out of business. The best way to do this is to write compilers and verification systems that try to catch all possible errors. * To me embedded software development is really about tools: languages that make abstraction easier and and make debugging more straightforward. The hardware industry has known this for a very long time already. Maybe the reason is that there is more pressure to get things right in a minimal number of iterations, and that this industry is not so much hindered by software standards. But when they provide language support, they provide C and really only C. * What saddens me however is that the "don't touch anything" frame of mind seems to be quite important in this endeavor. Systems are complicated, but once they work they are left as is. Is there anything to learn from this? Is real software necessarily messy? Like biology? * (About me not liking C) What tears me apart in some sense is that it seems to be about closed mindedness. At one point there is the point that you really can't beat C and Unix as a base platform. 
These tools are there to stay. This is in some sense good: standards are good and make development cheaper. What saddens me is that because of the structure of C and especially the C preprocessor, the commercial success of C and Unix (Linux in particular) have eliminated two variables from the equiation: it is no longer feasible to pick an optimal implementation language or operating system. C and CPP are just good enough to solve most problems with a bit of developper's pain. * About Staapl to Lisp, Forth and C people: The weird thing is that the basic idea seems to be completely obvious to me, which is probably the reason why I find it so hard to explain. The only people that seem to get what it is about are Forth and Lisp people, and they think it is fairly trivial: an interesting excercise but "No thanks, I'll roll my own." Most embedded software engineers I talk to find it esoteric and far removed from the real world. I've even been accused of talking like "management". So why is this? Why aren't people concerned about the tools they are using? Why does the current patchwork of ad-hoc glue code that is called a GCC toolchain actually work? I found the Forth and Scheme communities to be quite happy with the gem they discovered and generally unwilling (or demotivated?) to spend any energy in advocating the general idea. I find myself in the middle of two worlds. What I wonder is why this is. My disgruntled response is that embedded engineers should learn about programming language design, as this is very helpful in shifting your idea about writing programs to writing compilers / debuggers / profilers. If anything it helps you to get a good idea about how dynamic and static relate to each other: how debugging and reaching optimal implementations are diametrically opposed and how this beast actually has a name. * Maybe the problem (building a standard meta-system for C) is just too complicated, and maybe it is true that each application does deserve its own ad-hoc system. I don't think this is the case however. I think the problem is really worse-is-better: that the C preprocessor is just good enough to get the job done in most of the cases, if you find someone skilled enough to actually pull it off, with lot's of TDD traps to catch the awkward errors CPP can produce. * I really can't ignore the C++ template langue in this sense. It is equally horrible, also probably because it is not meant as a general purpose language, but is used as such. The idea is that a macro system should really be a proper language: otherwise people will find ways to circumvent it. You can't squash turing completeness. Look at syntax-rules vs. syntax-case. The problem is how to merge this with static checks.. * "Premature Flexibilization". YAGNI: Ya Ain't Gonna Need It. Writing managable software seems to be about being able to grow in the correct direction from simple seeds. You need a language (and tools) that makes refactoring easy. Start with concrete and move to more abstract later, but make sure it is possible, and _don't_ waste too much time on making things too general. Move generality to the development tools, not the application. This is close to Chuck Moore's philosophy of early binding + writing highly specialized code. Only languages and low-level libraries are reusable. Languages provide the "general" and libraries the "specific". The space in between is the realm of complex disposable glue code. 
Move languages from dynamic (debuggable, specializable, easy to write) to static (robust, specialized, easy to maintain). * I don't like the idea of large test suits for exactly this reason: they try to keep a dynamic implementation under control in the case where you really want to keep most of the program as-is: where the _overall structure_ of your program is something you're more interested in. In this case a stricter static implementation might be a better idea than to try to keep all things open sticking to a dynamic implementation. * Once your program is correct, this virtual layer should be eliminated completely (unless it provides optimality in another sense: i.e. memory usage). This should be possible for any part of your program that provides flexibility that is not needed in the end product. It should be possible to seal up the end product in a "blob" that does _only_ what it is inteded to do, and nothing more. Simple behaviour is easier to get right. Compile time complexity is managable. Unnecessary run time complexity is a horrible waste of time. * Anything that is supposed to be configurable in the end product (as a feature) should have the semantics of a real (dynamic) programming language, or at least have a possible growth path toward one. If you don't acknowledge this early on, it will creep up later in a very bad form. The basic idea is if you want to allow your users to tweak your product to their needs, it you should go all the way and pick a standard scripting language + some high level sugar on top of that to making writing "dumb glue" easy. * I define "embedded programming" as developping a system that is supposed to be sealed up and forgotten about: a localized blob of intelligence that does not require human intervention to operate. If at some time you find out your system needs to be configurable, it is probably best to expose this as a real, complete programming language, possibly the debugging inspection system properly sealed down to eliminate security issues. * Once you realize this dynamic vs. static problem, it is very well possible to do it using standard tools (C, shell scripts and some C code generation) as long as you find a way to keep the structure of the metasystem under control. * Embedded programming is about writing debuggers. * The moral for embedded software seems to be to focus on cross-interface binding. Move as much complexity as possible to compile time. * What Staapl does is to explore this paradigm, currently sidestepping the issue of having to read and write C code by focussing on two languages (Forth and Scheme) with a simple syntactic structure. Staapl's own code base has been largely on this dynamic -> static track. * While C itself is a rock-solid standard that is not going anywhere, anything remotely high-level built on top of it quickly leads to a jungle of incompatibilities and ad-hoc solutions. * I need to open up GCC or LLVM to give compilable intermediate code. * The theoretical problem related to this seems to be the unification of macros (general purpose compile-time computations) with type systems and program verification. This endeavor seems to be mostly about finding ways around the halting problem (you can't say what a program does without running it, once it has conditional jumps) Generic program inspection is undecidable, so clever selective simplification of higher level semantics is necessary. * The one thing which is really important in a large software system is its large-scale stucture. 
You want it to be brittle, so that refactoring (changing internal interfaces on one side) immediately breaks the other side in the most spectacular way. It seems that static type systems, due to their conservatism, are better bet here than dynamic test suits. * Note that this approach (dynamic bootstrap + static metaprogramming) probably doesn't work in environments with highly volatile code. Embedded programming seems to me mostly about writing components that stay relatively fixed once they are built. This is in stark contrast with systems that deal a lot with "arbitrariness of business logic", one example being web-based user interfaces. * While Staapl might not be ready for commercial prime-time yet, I believe the ideas behind it are quite solid. I would like to continue applying it to practical problems to bring it closer to where it might pay off in the real world: serving as a basic tool for a development team of a couple of skilled engineers, and as a basic tool in the development of specialized software libraries. Next to that I would like to see these ideas carried over to a C-based programming system, to make integration into existing projects more straightforward. This is not a simple task. I've been trying to find a proper attack angle for many years, but it seems I'm not there yet in my current skill set. * All the design tools I know will hide the metaprogramming behind some dumbed-down user interface. From what I know of people actually using these tools is that these interfaces are really complicated and behave quite a lot as "magic". An example that generalized to source code is The MathWorks' Real-Time Workshop, which is more of an actual compiler for a specific language. Is the problem they solve just too complicated to expose these systems as programming languages? This basic principle would make sure that the end user is not limited in the way the system is composed with other components outside of the designer's vision. Entry: Speed reading Date: Thu Jul 16 14:38:47 CEST 2009 I'm accumulating books and papers much faster than I can possibly manage to ever read. However, if I could read all the books I have I could probably solve some problems I'm encountering in a simpler way, because these books create the right view by stimulating thought in a certain way.. I've found some way to deal with this: to collect whatever might be interesting and take some time here and there to go over good books and articles and try to extract a single idea from them. There is one reason why I pick up a book: I know there is something hidden behind the stroll through the stacking of abstractions that I can instantly grasp by looking inbetween the lines and finding that important idea. Good books simply _need_ to contain a huge amount of scaffolding to clearly express good ideas. However, this scaffolding is only there to get you to the "aha" if you're disciplined enough to follow the reasoning. There are two reasons why I put down a book: I get bored by details because the exposition doesn't immediately come to the point (basicly, I can't find what I'm looking for) or I get saturated with the abstraction level of the ideas on the path/part I know I have to read to get the idea (meaning I am not "smart" enough yet to accomodate the ideas in my concept space.) I'm probably going to stop altogether trying to read books from front to back in great deatail. It takes too much time and is overall suboptimal. 
I find it better to combine speed-reading with spend most time first trying to figure out why a problem needs solving and then to go through the incremental buildup of abstraction once more. And yes, this is advice given to me by many a wise (wo)man. It's just something you need to feel first I guess.. Entry: Energy and angles Date: Thu Jul 16 14:54:49 CEST 2009 I've talked before[1] about energy as the thing that makes stuff rotate. This is probably a bit ill-put. Energy describes a space within which motion is possible, and in general, these spaces are compact, leading necessarily to some form of "rotation". Otoh.. there is the E = hν thingy.. [1] entry://20060505-005951 Entry: Scroderiders Date: Sun Jul 19 12:11:56 CEST 2009 In "A Fire Upon The Deep" ... [1] http://groups.google.com.vn/group/rec.arts.sf.written/browse_thread/thread/cc9fb4f47bb52343 Entry: a working day Date: Tue Jul 21 14:24:30 CEST 2009 All attempts to bring structure in my day have failed miserably. For "normal" work there is little trouble: just do it. For the stimulating but difficult tasks however, some management might be necessary. Mostly I just work until I can't work no more, which often leads to dissatisfaction. Since I don't really need to steer the good days, let's try to capture the bad ones: Problems: * I can't think. Usually these are related to not getting enough sleep, being in a stressful environment, and getting hung up on a dead end and not wanting to realize it. This involves _accepting_ the stuck state and _getting out_ either by doing something non work-related, or picking a simple rote task from the list. * I can't focus - not knowing where to start. Pick a difficult task and try to advance it. Don't start anything new! Starting new things is a symptom of badly organized "next actions". Do this in the morning, 1 hour after waking. * I have no inspiration. This happens usually in the afternoon. What fixes it is to simply fix a problem by making it look significant. Methods: * Leave rote tasks for later. There are plenty of moments where thinking doesn't work, and it helps to just relax and do something less demanding. Make a list of rote tasks. This could involve soldering, sysadmin at 1/2 speed, houshold chores, ... * Reading works better at night as long as the material doesn't need too many leaps outside of known concepts. I find I have more patience at night. Let's call this "re-enforcement reading". Try to focus on simple ideas. * Don't waste the productive hours of the day to do simple things. This will bore you to death and kill motivation. Entry: ar and order of .o files Date: Tue Aug 18 10:35:48 CEST 2009 ( How not noticing important small details makes you doubt everything you know.. ) I'm getting linker errors. The rules are very simple: when you're linking .o, .a and .so files: make sure the order is right! Usually putting the low level things on which everything depends in the back works best. Actually, I've never really understood how this works exactly - my mental model of it was good enough uptil now. My guess is that all .o files get included in their entirety, but only if they are somehow referenced beforehand. But.. 
Now I have this:

  tom@zni:~/libprim/ex$ objdump -t main.o |grep yy_start
  0000000000000000       *UND*  0000000000000000 yy_start
  tom@zni:~/libprim/ex$ objdump -t sexp.o |grep yy_start
  0000000000000609 l     F .text 000000000000004f yy_start

but:

  tom@zni:~/libprim/ex$ gcc -o main main.o sexp.o
  main.o: In function `main':
  /home/tom/libprim/ex/main.c:3: undefined reference to `yy_start'
  collect2: ld returned 1 exit status

and:

  tom@zni:~/libprim/ex$ gcc -o main sexp.o main.o
  main.o: In function `main':
  /home/tom/libprim/ex/main.c:3: undefined reference to `yy_start'
  collect2: ld returned 1 exit status

I don't understand.. Damn this sucks: look at the little "l" in the objdump output: it means *LOCAL*. Indeed, the function was declared "static" through a macro. (So: a "static" definition has internal linkage and can never satisfy a reference from another object file, whatever the order; and the order rule itself only matters for .a archives, which are scanned left to right, pulling in just the members that resolve symbols still undefined at that point, while .o files on the command line are always included in their entirety.)

Entry: The Lost BASIC Generation
Date: Fri Sep 11 09:11:07 CEST 2009

Dijkstra mentioned in one of his reports that BASIC has done a lot of harm in forming the minds of programmers of my generation (1975). I got my first computer at the age of 12. It was an MSX that booted up with BASIC. I think this is a valid point. BASIC disconnects programming from mathematics.

I liked math when I was in high school. At the same time I loved computers. The strange thing is that I never saw the link between math and programming until much later in college. In retrospect, this is really strange. I remember one day trying to write a starfield demo, being able to express some mathematics, but not being able to turn it into an algorithm that actually computed a simulation. I've learned since then that this is actually quite normal :) Algorithms are strange beasts. And they are quite different from high-school math.

But that's not my point. What I mean is that if somebody had pointed me to functional programming or Lisp in that fertile time of age 13-15, when I was discovering machine language and C, things might have been quite different. I got into electrical engineering, which satisfied my desire to do something with math (algebra, numerics, DSP, control, ...) but at the same time killed any growth in programming techniques. All I knew was C/C++, and this led me to re-invent dynamic languages (mostly through Pure Data and later Forth and my Forth/Lisp hybrid PF) until I finally arrived at Scheme and the typed functional languages. It's only now that I re-connect math and software (not counting numerical algorithms based on linear algebra), after a long detour in C-land.

Anyways. This detour did give me a unique perspective. I feel I am now where I want to be: in a position where I want to make _implementation_ of numeric code easier, through better understanding of programming languages and machines. I felt the pain of low-level implementation, but I've recently realized that this doesn't have to be the end of the story.

Entry: Quit Caffeine
Date: Sun Dec 20 12:55:51 CET 2009

I quit coffee last Wednesday. Well, I had one, Wednesday after dinner, but none during the day or in the morning. Waking up is more difficult, but I find it works mostly on my motivation: it makes me a lot more lazy. I don't get hyped up about what I'm doing. When I actually get to working (once motivated) it doesn't seem to have such a large effect: I'm still able to work, and I'm even a bit more patient when things are not going as planned. Overall: even if I feel a bit down and tired, it seems better not to be so hyped up the whole day.

Entry: Overwhelmed by TODO
Date: Thu Dec 31 10:41:34 CET 2009

My interests are getting a bit dispersed.
I've got the day job now, so not much time to meander. Should I leave all of these on the back burner, and focus on a single thing? After quitting caffeine, the direct bursts of motivation in the morning are lacking. By now I should be recovered from any over-expenditure..

- fixing sweb: get rid of mutable graph structure
- explore FPGA board + Xilinx tools
- learn Haskell
- fix SC (better VM)
- port staapl to 16-bit PICs
- fix staapl docs
- make a standard forth for staapl
- make a compiler for PF
- do something with the OSD2
- autodiff and numerics
- fix staapl's asm expression language dep
- use staapl to build some uc circuits

Some of these are really not trivial and require a lot of work. The bigger question, however, is: where do I want to end up? It's not really the caffeine: it's a combination of relevance and perceived feasibility which can lead to intrinsic motivation, or more specifically the perception of a possibility of incremental growth without getting stuck in problems that require a huge amount of current context.

Currently the simplest task seems to be to learn Haskell. At least, I've found an incremental approach to it (book: ``Real World Haskell''). Second is the Scheme VM, which requires a single insight on how to refactor the current code without breaking it. Implementation should take only a day or so at full mental capacity.

I am really craving coffee at this moment. Maybe I should use it only on the weekend, or only at home, not at work? I'm giving in.. Funny. Immediately I also crave pastries. Maybe this wasn't such a good idea after all..

Entry: Learning boring stuff - There's light at the end of the tunnel.
Date: Thu Jan 14 22:41:51 CET 2010

Learning boring stuff is painful. I didn't really enjoy learning Java and the Android API. (And before that, even less EtherCAT and USB.) Now, after a very intensive month, I'm comfortable with it. I can see the straightforward design (of Android) behind the horrible mess that is a large collection of Java callbacks in a language that doesn't have a succinct syntax for closures or support for proper tail calls.

It does amaze me every time _how_ painful it is to fill your head with details for which you do not have any intrinsic motivation, especially after having had the luxury to be able to learn a lot of interesting things in the months before that new job. It's painful for a long time and then suddenly it is all obvious, and actually gets interesting, as the mind that was bogged down integrating a mountain of details has the freedom to think again and `be clever' using the recently learned tool, however ugly, in creative ways.

Entry: Free time
Date: Sat Jan 16 14:05:40 CET 2010

Scheme VM: Ha! I spent at least two quite full days to end up with something that needs some revision. It's not at all trivial! There are many design choices to make, and some of them are hard to abstract away.

Haskell: mostly reading.. See ``beautiful differentiation''. I still don't have a clear goal - except for a hardware mapper or a compiler like Staapl, where I would be more inclined to use OCaml/MetaOCaml.

Sweb: mostly thinking that I should keep the current implementation as is, but separate out some modules as purely functional code (the parsers + formatters).

Entry: Semantics is a stub
Date: Wed Mar 17 09:48:38 CET 2010

Compilation versus interpretation, this diagram again:

  data         int
   |     stx ====> fun
   |      :
   | comp :
   V      :
  data'   V
         stx'

Interpretation is used to give meaning to syntax and its manipulation. I.e.
to relate proof construction (logic) to function composition (semantics). On the more philosophical level one could argue that the real world consists of only structure and its manipulation, and that no such thing as meaning exists. Meaning is a "stub". I.e. the interpreter in the diagram above doesn't really map stx to fun. It is a formal transformation method that derives data' from (stx, data). Anyways..

Entry: Filtered Stock Market
Date: Thu Apr 22 09:40:37 EDT 2010

What would happen if transactions were not discrete events, but finite time price integrals? A world without conquest? A world without predators? Only herbivores? Essentially, it doesn't exist: all food consists of transactions (i.e. blades of grass). What matters is the size of the transaction.

Entry: TF-synth and nonlinearities
Date: Sun Apr 25 19:50:09 EDT 2010
Type: tex

The main issue in the ``analog modeling'' problem of music DSP is dealing with aliasing due to discontinuities and non-linearities. Using a dynwav TF representation, is it possible to solve this analytically? I.e. given a sum of sines and some sequence representation of a nonlinear function (i.e. a power series), can we compute the spectrum directly?

Briefly touched before in [1], this is the realm of function composition. Given a signal $x(t)$ and a wave shaper $y(x)$, a new signal $y \circ x$ can be constructed. The interesting question is: how does $y$ need to be represented when we have a representation of $x(t)$ as a spectrum sequence, and we want to get a similar spectral representation for $y(t) = y(x(t))$?

Is it possible to cheat by picking a particular representation of a periodic signal? I.e. start from phase modulation, and compose the modulators? If the wave shaping function $y$ is closed on the interval (the interesting ones that \emph{squash} a signal), there might be a way to compose shapers.

I'm looking at this in the wrong way. It's probably simpler to see this in terms of roots: polynomial composition turns a single root into an $n$--fold root by \emph{shifting} the other poly according to the roots of the first. With $p(x) = \prod_i (x-x_i)$, a composition $p \circ q$ with $q(x) = x + \epsilon$ gives $$(p \circ q)(x) = \prod_i (x + \epsilon - x_i).$$

%[1] entry://../math/20090620-152527
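A minimal worked case to make the question concrete (my own simplification, not the dynwav representation the entry is after): for a single sine through a power series shaper, the output spectrum can be written down directly, because powers of a cosine reduce to finitely many harmonics,
$$\cos^n \omega t = \frac{1}{2^n} \sum_{k=0}^{n} {n \choose k} \cos\big( (n-2k) \omega t \big), \qquad \textrm{e.g.} \quad \cos^2 \omega t = \frac{1}{2} + \frac{1}{2} \cos 2 \omega t.$$
So a degree $n$ shaper maps a line spectrum to a line spectrum with at most $n$ times the bandwidth, which is also exactly where the aliasing comes from; for a sum of sines the cross terms reduce the same way through the product--to--sum formulas.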
Entry: There are no events
Date: Fri Apr 30 16:55:16 EDT 2010

Maybe this is what is necessary to get a better understanding of many things related to music and DSP. People need events to think (words, language). Computers need events to work; they essentially _are_ the concept of discrete, directed time. However, such structures do not tend to lend themselves well to analysis because they are ultimately local, with no (obvious) connection to the global. They do not tie into their surroundings like continuous systems do. Working with continuous systems allows events to be decoupled from parameters. The latter are still a discrete set.

Entry: About abstraction
Date: Mon May 10 16:49:19 EDT 2010

Non-obvious obvious stuff. (NOOS?) If you have a concrete approach, the eventual much more abstract approach that you'll spend the whole night looking for is probably going to reduce to it. It won't solve the problem better, but it will make your original low-level solution seem pretty obvious. Probably, but not always.. That is, not if you failed to exploit an essential redundancy in your low-level approach. The question is: is this high level pattern-seeking activity worthwhile or not? From my experience, in the real world, where things just need to be done once and as quickly as possible, it is not worth it. However, by solving the problem abstractly, you will be able to solve more related problems and get a better understanding.

Entry: The day
Date: Mon Jun 21 12:02:43 CEST 2010

morning: input. study: absorb knowledge through asking the right questions. more idiosyncratic and stubborn, but open to external input.

noon: disappointment and confusion. can't do it all - do some mundane stuff instead.

day: output. creative action, problem/self driven. very idiosyncratic. brain seems to be organizing existing knowledge. not open to input.

evening: input. study: absorb knowledge through passive listening / more tolerant.

Entry: What is a difficult problem?
Date: Fri Jun 25 14:17:10 CEST 2010

One that has a difficult solution. Let's see:

- simple problem: a solution is clear intuitively at a glance
- complex problem: a factoring into simple problems is known
- intractable problem: no factoring is known

How to become an expert?

* Know the simple solutions of your domain.
* Don't build unnecessarily complex solutions through ill-informed decomposition.
* Build more intuitive understanding of composite solutions by familiarizing yourself with the ``inbetween space''.
* Recognize the smell of intractability.

Entry: Symplectic updates
Date: Thu Jul 8 10:17:52 CEST 2010

(For lack of a better name.) Probably completely unrelated to the previous post, but this makes me think of "symplectic updates", where a system state transition is factored into two steps: a position update and a velocity update (two triangular matrices). Such a discrete simulation of a continuous system keeps certain invariants intact (while still making slight errors). I wonder how that relates to Schur/Levinson style forward/backward prediction. I suspect it is actually the same..
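To make the two triangular factors concrete, a minimal sketch (my own toy example, a harmonic oscillator x'' = -w^2 x, not anything from the entry above): one step is a velocity update followed by a position update, each factor a unit triangular matrix acting on the state (x, v), so the composed step has determinant one and preserves phase space area even though it makes O(h) errors.

  -- Semi-implicit (symplectic) Euler for x'' = -w^2 x.
  -- Each update is one unit-triangular factor acting on (x, v).
  step :: Double -> Double -> (Double, Double) -> (Double, Double)
  step h w (x, v) = (x', v')
    where
      v' = v - h * w * w * x   -- lower triangular: [[1, 0], [-h*w^2, 1]]
      x' = x + h * v'          -- upper triangular: [[1, h], [0,      1]]

  -- e.g. take 5 (iterate (step 0.01 1.0) (1.0, 0.0))

Whether this is really the same mechanism as the Schur/Levinson forward/backward split, I don't know; the factorization into two triangular maps is at least the common shape.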
Entry: Selling stuff - the cost of wear
Date: Sat Jul 17 15:41:30 CEST 2010

I'm having trouble getting rid of my old music equipment. I don't think there is too much emotional value attached, so why don't I want to sell it? Isn't it simply conversion of (useless) material goods into cash?

The problem is price. How to think about it? Price should be the amount it would cost me to buy the same kind of gear back when I need it. Most of it has significant wear, and it is still worth more to me than I can probably charge for it. I.e. the burns are not worth the devaluation to me: I have an incorrect concept of the cost of wear. Maybe it is emotional value then: to me the wear is "character", but to a buyer it is just devaluation.

Entry: Dynamic range
Date: Mon Aug 2 08:30:25 CEST 2010

Playing a bit with my analog synth, and comparing it to what I can do with the Behringer digital encoder box, it's clear that solving the dynamic range problem is the essence of making a good controller / synth on a digital computer. Using analog pots, making small adjustments is really simple. The effective resolution is very high. So, the question is: can this be reproduced digitally, either using dither or something else? (Noise doesn't seem to be so much of a problem; the average is much more important.) Alternatively: what is needed to have a high-res analog controller?

Entry: Binary decorrelation
Date: Mon Aug 2 08:46:53 CEST 2010

What about simply using dithered PDM? S-D PDM gives periodic waveforms. Dithering could at least de-correlate the phase of these. Is it possible to see S-D PDM as a form of frequency-dependent phase width modulation? If the modulation can be made more explicit (i.e. what comes out has some more properties), then maybe decorrelation is feasible.

Entry: PDM: Binary Sequences and base functions
Date: Sat Aug 7 17:24:44 CEST 2010

Instead of trying to start from signals, is it possible to start from a set of base functions? Can signals be represented as a (linear?) composition of primitive signals with certain properties that carry over after combining them? The problem is that the useful linear operation (XOR) doesn't carry over to a linear operation on the densities (add -> mul). Can we use logarithms? Entropy? From afar.. If F(a+b) = F(a)F(b) then F behaves as an exponential, or + is really a multiplication. For XOR this is indeed the case. (quack quack)
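To pin down that last step (my own rephrasing, nothing beyond the identity itself): map a bit b in {0,1} to its sign s = (-1)^b. Then

  (-1)^(a XOR b) = (-1)^a (-1)^b

so XOR on bit streams becomes plain multiplication on sign streams. For the densities: a stream with ones-density d has mean sign m = 1 - 2d, and XOR-ing two independent streams multiplies the mean signs,

  1 - 2 d(a XOR b) = (1 - 2 d(a)) (1 - 2 d(b))

which is the add -> mul mismatch; the thing that does become additive is log |m|, so that would be the "logarithms" direction.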
Entry: int profile
Date: Tue Sep 7 16:22:29 CEST 2010

ubt orifuke (right hand misaligned)

Entry: DLL hell, API versions, reusability and modularity
Date: Fri Sep 10 13:09:08 CEST 2010

I think the important element isn't really reusability - it's modularity: being able to oversee a design by minimising coupling. Explicit interfaces are necessary for large projects, but they add a positive feedback loop that leads to even larger projects. What is necessary is "global API optimization": some mechanism that feeds the "global consequences" of APIs back into the APIs and _forces everyone_ to use them.

It's a bit like garbage collection and finalization: you need global information (who has a reference but isn't using it) when you perform a local operation (I want this resource now). Let's call this the "global feedback problem" (GFP) and hold it responsible for all problems in computer science. Yeah. GFP you suck.

From the other side: some local decisions have huge global impact, so in order to make good local decisions, you need to know their impact factor _before_ you make the decision. Is there a way to somehow express this more formally? Dependency graph of API decisions?

Entry: shell
Date: Sun Sep 19 00:26:58 CEST 2010

It's just a shell. Do not forget; it's easily shed.

Entry: I'm enjoying electronics
Date: Wed Dec 22 17:03:49 EST 2010

I've always been intrigued by electronics, but until recently the practical side of things didn't work so well. It requires a certain kind of "observational consistency" that doesn't work so well for my mind. Essentially, I'm not a good scientist. Like my mother and now my wife say: I don't look! I guess that's why I like programming better: more consistency, more reliable and redundant tools, easier debugging. With electronics, everything has to be right or it simply doesn't work, with very little feedback. Finding that error feedback seems to be 95% of the design process. Quite interesting and humbling, coming from paper-only design!

Entry: Basic equation solving
Date: Thu Dec 23 19:34:52 EST 2010

I've become quite bad at the on-paper manipulation for solving basic sets of equations. I guess I need to practice more. I always was sloppy, making many errors doing manipulations. It's gotten much worse though.

Entry: I'm fearful with money but fearless with time!
Date: Tue Dec 28 22:07:13 EST 2010

So what does a person need in this day and age of industrialized madness?

[1] http://www.moralquotes.com/index.php/a/2008/08/09/life_is_an_ongoing_process_of_choosing_b

Entry: Microwork
Date: Wed Dec 29 02:05:54 EST 2010

It would be great to create a more direct correspondence between need and effort for the more abstract jobs. Too bad there isn't a way to perform freelance skilled work on the micro-scale, i.e. I need something for this gizmo, so I work for 1 hour on something else. The problem is that the setup time is usually way too large, demand is of the form "want everything here, now", and the routing problem is way too complicated. Skilled work needs "storage space" in the form of credits for time spent.

Entry: Serendipity
Date: Fri Dec 31 01:28:10 EST 2010

Feynman's trick [1]: to keep a number of problems in your head such that when you learn a new trick you can see if it works with any of them. This generalizes quite well: work on many things at once, and switch between them often. I've found that getting stuck (or getting bored) means you need a break. Work on something else. If it's a real problem, and it's within your reach to solve, the solution will often come just by itself. The reality is that problems don't get solved by thinking hard. They get solved by creating lots and lots of context and allowing solutions to bubble up. I.e. be prepared, know your stuff, and you will find the way :)

[1] http://www.hooversbiz.com/2009/01/16/keep-big-problems-in-your-head/

Entry: Integer skipping
Date: Sun Jan 2 12:55:18 EST 2011

This is fairly vague and serves mostly as bait / rehash: I have a way to express the solution to a numerical problem exactly (as integers / rationals). However, the number of digits in the solution grows (linearly) with time, as each time step involves a multiplication by a rational number. How to "renormalize rationals"? More concretely, it involves a linear autoregressive system in the rationals. The main idea is that it really needs to be exact, as it involves subtraction that would kill accuracy if it were done in limited precision. This is very similar to the relation between harmony and melody, i.e. how harmonically ("semantically") different intervals can be glossed over and substituted with other ones. See also "almost equal composite numbers" [1].

[1] entry://../math/20110102-130526
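A minimal sketch of the digit growth (made-up coefficients, not the actual system from the entry): an exact AR(2) recursion over Haskell's Rational, printing the decimal sizes of numerator and denominator. Each step multiplies by fixed rationals, so the sizes grow roughly linearly with the number of steps, which is exactly the renormalization problem.

  import Data.Ratio ((%), numerator, denominator)

  -- Exact AR(2) over the rationals: x[n+1] = a*x[n] + b*x[n-1].
  ar2 :: Rational -> Rational -> Rational -> Rational -> [Rational]
  ar2 a b x0 x1 = go x0 x1
    where go u v = u : go v (a * v + b * u)

  -- Decimal digit counts of numerator and denominator.
  size :: Rational -> (Int, Int)
  size x = (length (show (numerator x)), length (show (denominator x)))

  main :: IO ()
  main = mapM_ (print . size) (take 30 (ar2 (9 % 10) (-7 % 8) 1 (1 % 3)))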
Entry: Interested in number theory
Date: Sun Jan 2 20:15:49 EST 2011

Mood: intrigued, tired, a bit dull/dumb.

It irks me that I don't "get" number theory. I mean, I don't get the point. I can see that it is really useful to know a bit about the integers and how they interact, but it seems that the structure you find when you start looking is very "jagged", compared to the "smooth" math that pervades engineering disciplines such as basic calculus and linear algebra over complex vector spaces. And yes, pun intended wrt. smooth. The discrete is truly mysterious, and the continuous seems to be only mysterious in the way it interacts with the discrete.

Entry: Our audio/video perception is not 1st order roll-off
Date: Thu Jan 6 13:10:05 EST 2011

What I mean is that some perceptual boundaries seem to be quite abrupt (possibly with a little bit of hysteresis due to attention). This contrasts with the "gradual" roll-off exposed by physical systems. I.e. there's no way we can hear 40kHz sound, no matter how loud it is. If our system were 1st order with a fixed noise floor, simple amplification would lift it over the threshold. FIXME: find a better way to express this and relate it to some actual measurements of hearing / vision.

Entry: Bytes in a row
Date: Tue Mar 15 15:01:26 EDT 2011

There's this word I keep forgetting. It is used to indicate that a bunch of data is stored at addresses without gaps, i.e. one after the other. My mind switched to using the word "consecutive", somehow blocking the recall of the word I'm looking for. Annoying, especially because this keeps happening, especially when I'm tired. I found the word, it's "contiguous"[1].

EDIT: Maybe I should name my band "contiguity theory" or something, so I won't forget. It just happened again..

[1] http://www.merriam-webster.com/dictionary/contiguous

Entry: On getting lazy
Date: Sat Jun 11 10:30:27 CEST 2011

I think it's happening to me. Why, I don't know exactly, but it has to do with the "wrong kind" of stimulus. When I'm left idling and "tricked" into remembering how good it feels to find things out, many of my old questions and projects come back. I have a lot of them, and the seeds are all over [1]. Thinking about it, there have been two kinds of wrong stimulus:

- Small things to fix, i.e. infrastructure maintenance.
- Paid contract work

They are not "bad" by themselves, but they take away the combination of boredom + freshness that is necessary to get to motivation. Basically, they stop making me think hard. I'd like to figure out how to fix that.

[1] entry://../topics

Entry: Getting Stuck
Date: Sat Oct 15 09:40:56 EDT 2011

Well voiced[1]. I have this all the time writing low-level system drivers (reading datasheets and OS manuals/code). Also when learning new things, at this moment that's Haskell's type system. Whenever a mountain of complexity threatens to bury me, it feels as if the last thing on earth I want to do is to get near it, though mostly this is not a conscious process, just as Jeff describes. Getting conscious of this problem and accepting it helps in overcoming it.

[1] http://www.jeffwofford.com/?p=835

Entry: Commutation
Date: Sat Oct 22 15:22:55 EDT 2011

I've been playing with Haskell type classes for representing meta languages, and it seems that most of the time expressing functionality is about expressing commutation relations. An interesting example of this is lambda and application for an embedded simply typed lambda calculus.

  -- Simply typed LC
  class LAM1 r where
    lit1 :: t -> r t
    app1 :: r (a -> b) -> r a -> r b
    lam1 :: (r a -> r b) -> r (a -> b)

Here app1 and lam1 deal with the commutation of the r and (->) type constructors.
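A minimal sketch of what an instance might look like (my own example, nothing from the original note): a direct evaluator where r is the identity wrapper, so commuting r with (->) is literally just wrapping and unwrapping.

  -- Direct evaluation instance of the LAM1 class above.
  newtype I a = I { runI :: a }

  instance LAM1 I where
    lit1 = I
    app1 (I f) (I a) = I (f a)
    lam1 f = I (runI . f . I)

  -- e.g. runI (app1 (lam1 (\x -> x)) (lit1 (42 :: Int)))  ==  42

A compiler-flavored instance would wrap syntax instead of values, and there lam1 is where the commutation actually costs something: turning a host function on representations into a representation of a function.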
Entry: Structure can't hide
Date: Sun Oct 23 12:38:02 EDT 2011

The crazy thing about mathematics is that even if you don't really know what you're doing, you can't really do anything wrong. I mean, you can make meaningless things that don't work, but you can't make things that _seem_ to work but don't. Of course this is obvious, but if you look at it from the perspective of a low-level C programmer who is getting beaten up by the Haskell type system, it is actually quite amazing.

Entry: Learning haskell
Date: Mon Oct 24 13:54:22 EDT 2011

I keep running into constructs where I can't get the details right. Mostly this has to do with commutation of type constructors: a (b c) vs b (a c), or mixing up functionality. Is there a way to abstract this pattern of misunderstanding? Or are some days just prone to not working out right? Often when I do get mixed up, I can't really focus on other things that are usually not such a problem. Is it just focus?

Entry: Getting confused takes a lot of energy
Date: Thu Oct 27 11:13:03 EDT 2011

I see this pattern pop up almost every day. If I stay in flow I can work for hours in a row. If I stumble out of flow and get confused, I get tired very quickly: my brain just stops working and I feel quite uncomfortable.

Entry: mutations
Date: Sat Oct 29 20:09:11 EDT 2011

I'm generating syntax trees with a state continuation monad. I get it wrong often, and this creates trees whose connections are not quite right. The difference in code is sometimes subtle, leading to widely different results..

Entry: Kernel vs. Periphery
Date: Wed Nov 2 23:13:11 EDT 2011

Many interesting systems have a structure with a highly intertwined core, where it's not clear where the center is, and a periphery that's clearly focused on the center. I.e. a programming language: lots of "feedback" on the inside, but usually libraries have more directional / bottom-up dependencies. Does this pattern have a name?

Entry: reverb: fractal compression?
Date: Mon Nov 28 20:56:34 EST 2011

Take a real impulse response, and try to compress its tail.

Entry: Studying programming language theory ...
Date: Thu Dec 15 22:13:17 EST 2011

.. messes with your head. It teaches you the habit of turning things upside down, seeing everything as the same, or just a little bit different. I'm referring more specifically to writing compilers, transforming data structures. The whole "code is data" business. Maybe I'm just educated in a slanted way, but this thinking sets me apart from most of my colleagues, who are mostly interested in getting the work done. Actually, maybe this means some of the madness I thought I lost is coming back? The kind that makes many mistakes and seems out of control, but produces a couple of gems here and there, hidden in debris.

Entry: Spatial density interface
Date: Tue Mar 27 14:39:12 CEST 2012

I never liked vector graphics, but I wonder: how to get a high-bandwidth interface to the brain other than using visuals or permanent surgery? Instead of thinking in objects at the level of representation, it might be best to let the brain do its object-recognition. This is how medical imaging works: give an MD some tools to see certain things and his/her brain will be trained to do the pattern recognition. So what data visualization should be about is just ways to project information into 2D or 3D images. 2D is straightforward, but can we do 3D? I.e. density maps? I'd like to find out how to do this density map thing with some 3D glasses. What's the thing to buy? Maybe that's a nice goal to apply that lurking DSP knowledge..

Entry: Bigger keys
Date: Thu May 3 12:48:34 EDT 2012

Got a KeyTronic KT800U2 big keyboard with big keys. It's been a while since I've used one of those. Need to learn how to type again. Curious if it will make a difference from the flatter Kensington. I'm not sure... maybe the previous one was better.. Let's sit it out for a bit.

Entry: Natural flicker, 4Hz
Date: Thu Jun 7 14:09:15 EDT 2012

This is going to sound crazy, but in my meditation exercises I've been seeing a "natural flicker" turn up in my visual field lately. More about that here [1]. But what I have seen before is that this range of frequencies is at the boundary of "events" and "vibrations". Stay below 4Hz and it feels like a rhythm with distinct beats; go above and it starts to feel like a continuous stream of vibrations that has its own quality, some mix of discrete events and continuous "buzz". Of course the transition is not a vague one, but it can be clearly experienced when doubling up. Try making a drum pattern of 4 Hz (240 bpm) and then doubling it to 8 Hz. There is a change of quality there that doesn't happen in the jumps from e.g. 2->4 and 8->12. At 16Hz it starts to become less of a "vibration" and more of a "sound". So I'd say vibration is between 6 and 16 Hz. Does this have something to do with the resonance frequency of the human body / flesh?
I did find this[2]: the RF (resonance frequency) of the human eye is about 18Hz ;) I wonder if, by making this frequency range more explicit in thinking about music, it could lead to some different way of composing rhythm?

[1] entry://../dharma/20120607-141016
[2] http://en.wikipedia.org/wiki/Infrasound

Entry: USA Dutch?
Date: Sat Jan 26 15:48:42 CET 2013

I heard yesterday that the main language in the U.S. was almost set to Dutch. I don't know if this is true, but maybe that's why U.S. English sounds so "kaks" :) urr urrr :)

Entry: Perpetually confused
Date: Mon Feb 18 13:09:04 CET 2013

I seem to do better in the role of perpetually confused student, as opposed to tutor or consultant (in the original sense). In a sense, I perform better when I don't 100% understand what I'm doing. It takes a long time to get to mastery, simplicity.

Entry: Appreciating research
Date: Sun Apr 21 10:07:04 EDT 2013

The best way to appreciate research is to try to solve a problem yourself, realize it is not easy, and then read a real solution. I have often found that my arrogance gets in the way, underestimating the complexity of the problem, or completely missing some structural element that can be exploited.

Entry: Distributed resource allocator
Date: Sun Dec 13 19:37:45 EST 2015

The problem with trade is that it is hard to do without a universal placeholder such as money. Is it possible to redefine money in terms of computer-assisted trade interactions? E.g. a system that keeps "score", to eliminate abuse. E.g. is it possible to introduce "local virtual money"? Is it possible to reconstruct stable money without state intervention? Something ephemeral, that can't be stockpiled? Maybe the stockpiling is the real problem?

Problems:

- generalization: you want an apple, here's a pear
- truncation: can't bring 100 people together to trade
- time management: drop off, pick up sequencing
- distributed architecture (this needs to run server-less)

Entry: Do we not see glue code?
Date: Fri Feb 12 15:16:33 EST 2016

Or glue logic. Whatever. The thing that's omnipresent in coding is the connection and transformation of arbitrary representations, i.e. the implementation of isomorphisms and projections. The "actual stuff" that's in a representation is merely a mental model, but it is that model that is important in reasoning about how data is combined into new data. I spend a very large amount of time shuffling between transformations. Is it possible to reduce that?

Entry: Everything is broken
Date: Sun Feb 28 21:01:54 EST 2016

Drunken idealist speaking. What can be done about it? The problem is interaction: APIs are incompatible, and state assumptions are wrong. There seems to be only one way, which is to build an AI that can do refactoring.

EDIT: 2016/12/27 Interestingly, there seems to be some merit to setting up "chains" to solve resource allocation for tasks, e.g. where to run code based on optimizing I/O overhead.

Entry: Programming is too low-level
Date: Tue Mar 8 12:34:53 EST 2016

What I find is that I still spend most of my time finding neat things to do given the tools I have. There is no true purpose in this other than the joy of finding things out. I am most confronted with this in the synth work. Bogged down in infrastructure, continuously. All previous attempts to build a workable programmer-computer interface have not reached the point of actually reducing the barrier. So what is the real problem?

Entry: Interfacing
Date: Mon Dec 26 17:56:13 EST 2016

( Slight manic freeflow - maybe there's something there. )
Interfacing has been the problem ever since I started programming: adapting formats so meaning can be connected. Shared meaning = connection after unpacking, translation of interfaces. The shared meaning is there because the translation exists. Human thinking seems to abstract away that translation layer, which is what makes programming so difficult: making the translation explicit and adapting the small differences in _actual_ meaning is where the bulk of the work goes. Because the bulk of the work is in the translation, it might be argued that meaning itself resides in the encoding/decoding process, connecting internal representation to effects in the world. 99% of the work I do is ad-hoc translation. Computation and algorithms by themselves are only a small subset. Is there a way to remove this boilerplate once and for all?

( The funny bit is that the last couple of posts talk only about this. )

Entry: Ideas with type errors
Date: Sat Jul 21 22:21:53 EDT 2018

Wine night, so free associations abound. Why I need a framework: my mind generates a lot of nonsense, but if I'm living in a formal system, that really constrains it enough to get to something. My mind generates a lot of ideas. Most of those are nonsense. Having a formal system right in front of you (a "nature") really helps to quickly test them and weed out the junk. Without that, it goes astray very quickly.

Entry: What about this log
Date: Sat Jul 21 22:24:23 EDT 2018

It's been a while. What's in here? I haven't read it yet, but I think it is about "intuitive madness", i.e. what happens when ideas that are based on some fundamental misunderstanding are allowed to proliferate without checks. Yeah, I started reading and this is indeed insanity. But there might be some leads in here. What it needs, though, is some formalization. It helps to write (Haskell) programs to encode some of this.