Tue Jun 9 11:13:03 CEST 2009

Sigma-Delta analog synth

I think Sigma-Delta[1] modulation is a great idea.  The idea behind
this method is what triggered the previous posts.  It has been part of
my lingering background noise for a long time, influencing a lot of
the ideas that went into the Sheep synth.  This modulation scheme has
an aesthetic appeal I can't really describe.

So, to get to know the modulation form a bit better, I'd like to build
a modular analog synth using this representation for signals, by
patching together a couple of PICs and discrete logic chips.  However, this
is probably more suited to FPGA implementation.

Let's see what operations are supported:

  * switch/mix:     interleave two streams
  * distorted amp:  perform AND/OR on an unbiased signal
  * inversion:      XOR with a stream of known average
  * differential:   pulse->edge: amplitude to frequency conversion.

Some real-world hacks on top of this:

  * A->D:           PIC's 12-bit AD


Instead of using higher order filters, as in an SD modulator tailored
to accommodate a low-noise band of arbitrary shape, we use an n-bit
wrap-around accumulator as a discrete integrator.

To convert an n-bit signal to this representation, simply accumulate
it.  This yields a "digital phasor" which rotates around 2^n states
over time.  Observing the carry bit of the accumulation as a binary
signal yields a signal with the same average as the original signal.
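
A minimal sketch of the accumulate-and-take-the-carry step described
above, assuming unsigned n-bit input samples.  The name sd_modulate
and the n_bits parameter are made up for illustration.

```python
def sd_modulate(samples, n_bits=8):
    """First-order modulator: n-bit wrap-around accumulator, output = carry bit.

    Illustrative sketch; the function name and parameters are assumptions.
    """
    modulus = 1 << n_bits
    acc = 0
    bits = []
    for s in samples:
        if not 0 <= s < modulus:
            raise ValueError("sample does not fit in n_bits")
        acc += s
        bits.append(acc >> n_bits)  # carry out: 1 whenever the phasor wraps
        acc &= modulus - 1          # keep only the low n bits
    return bits


# Constant input 64 with n = 8: the phasor wraps once every 4 steps.
bits = sd_modulate([64] * 256, n_bits=8)
print(bits[:8])               # [0, 0, 0, 1, 0, 0, 0, 1]
print(sum(bits) / len(bits))  # 0.25, i.e. 64 / 2^8
```

The average of the bit stream is the input divided by 2^n, so the
carry stream carries the same average as the original signal up to
that scale factor.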

The reason this encoding can be used as a representation of audio is
that the error is small for low frequencies (where it matters).
Almost all encoding errors are present in the higher frequency bands.
Using higher order integrating filters will give better results if the
objective is good quality audio.  See [1] for more information.

However, for us the simplicity of the representation is more
important, since we want to use it as a computation substrate.  Its
connection to frequency modulation is of key importance.


Converting from a bit stream to a signal stream can be done using
accumulation and subsampling.  Some information is lost in the
modulated form (high frequency spectral content) and when converting
back to a stream of numbers, we'll lose some more.
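
One possible reading of "accumulation and subsampling", as a sketch:
count the 1 bits in consecutive blocks and emit one number per block.
The name sd_demodulate and the block parameter are assumptions, and a
plain block count is only the crudest choice of low-pass filter.

```python
def sd_demodulate(bits, block):
    """Accumulate the bits, emitting one count per `block` input bits.

    Illustrative sketch; a trailing partial block is dropped.
    """
    out = []
    for start in range(0, len(bits) - block + 1, block):
        out.append(sum(bits[start:start + block]))
    return out


# A stream with average 1/4 followed by one with average 1/2.
stream = [0, 0, 0, 1] * 8 + [0, 1] * 16
print(sd_demodulate(stream, 16))  # [4, 4, 8, 8]
```

A larger block gives more amplitude resolution per output number but
a lower output rate, which is where the extra loss comes from.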

Averages: 0, 1 and 1/2

There are 3 special signals to distinguish:

  signal   average
  0000...   0
  0101...  1/2
  1111...   1

The all-zero signal corresponds to a phasor that doesn't change.  The
alternating 0/1 signal corresponds to a phasor in a binary
oscillation, which corresponds to an input signal of 2^(n-1).  The
all-one signal, however, is a limit case and cannot be generated by
the phasor mechanism.  The maximum that can be generated is a signal
with a single 0 bit in every 2^n bits, reflecting the maximal input
signal amplitude of 2^n-1.
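
To check these three cases by hand, here is a small sketch with n = 3
so the patterns stay short.  The helper carry_stream is a made-up name
for the constant-input accumulator.

```python
def carry_stream(value, n_bits, length):
    """Carry bits of an n-bit accumulator fed with a constant value.

    Illustrative helper; name and signature are assumptions.
    """
    modulus = 1 << n_bits
    acc, bits = 0, []
    for _ in range(length):
        acc += value
        bits.append(acc >> n_bits)  # carry out of the n-bit accumulator
        acc &= modulus - 1
    return bits


n = 3
zero = carry_stream(0, n, 16)             # input 0
half = carry_stream(1 << (n - 1), n, 16)  # input 2^(n-1) = 4
top = carry_stream((1 << n) - 1, n, 16)   # input 2^n - 1 = 7

print("".join(map(str, zero)))  # 0000000000000000
print("".join(map(str, half)))  # 0101010101010101
print("".join(map(str, top)))   # 0111111101111111
```

The last line shows the maximal input: one 0 bit in every 2^n = 8
bits, so the average is 7/8 and never quite reaches 1.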

To simplify interfacing to the analog world, let's call 0101.. the
reference signal (zero).  This allows 0000.. to be the clip min and
1111.. to be the clip max amplitude.


In order to calculate with SD modulated signals and prove some of
their properties we need to clearly define how such binary sequences
relate to sequences of real numbers.  It is remarkable that magnitude
is encoded in a single event stream.  How do we fish out the magnitude?

In first approximation we forget about time altogether and interpret a
binary sequence as draws from a random variable.  This allows us to
work with the expected value defined as:

    E(a) = \sum p(a=x) x

With x ranging over the values {0,1} this simply reduces to

    E(a) = p(a=1)

This mathematical model connects to the messy physical world by
observing that a finite average of a sequence of a_i samples
approximates E(a).  Anything we say about the statistical property
E(a) in some sense carries over to what the estimator produces.  In
practice this means that the sequences can represent time-varying low
frequency signals by cutting off the integration at a particular low


The logical AND of two signals behaves as attenuation.  A 0 in one
signal can cancel the transmission of a 1 in the other.  As long as
this doesn't lead to cases where a disproportionate number of 1s is
canceled due to correlation of the two bitstreams, the effect on the
expected value is to be attenuated by an amount proportional to the
expected value of the other signal.  

Representing AND as real number multiplication on the set {0,1} allows
the expression of the expected value of the product

    E(ab) = \sum p(a_x, b_y) a_x b_y

for events a_x and b_y ranging over {0,1}.  Of these 4 terms only
p(a=1,b=1) produces a nonzero contribution.  This can be simplified to
E(a) E(b) if the random variables are independent[3], meaning

    p(a_x, b_y) = p(a_x) p(b_y).
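
A quick numerical check of E(ab) = E(a) E(b), as a sketch.  It uses
independent pseudo-random bit streams rather than SD modulator
outputs, precisely so that the independence assumption holds.  The
helper names, the seed and the probabilities 0.5 and 0.3 are arbitrary
choices for illustration.

```python
import random


def bernoulli_stream(p, length, rng):
    """Independent random bits with P(bit = 1) = p (illustrative helper)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def mean(bits):
    """Finite average, used as the estimator of the expected value."""
    return sum(bits) / len(bits)


rng = random.Random(1)  # fixed seed so runs are repeatable
a = bernoulli_stream(0.5, 100_000, rng)
b = bernoulli_stream(0.3, 100_000, rng)
ab = [x & y for x, y in zip(a, b)]  # bitwise AND of the two streams

# The two printed numbers should agree closely, both near 0.5 * 0.3.
print(mean(ab), mean(a) * mean(b))
```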

Note that "time slots" erased in a signal can be filled up with those
from another one, which might be created through an AND operation in a
similar manner.  This then provides a means for implementing addition
without the need for recoding.  This naturally leads to consider the
interpolation of two signals a and b, by a third signal x

    OR(AND (x,a), AND(~x,b))

This fills up all available time slots by multiplexing the signals a
and b proportionally to E(x).  A special case of this happens when b =
~a, leading to the (inverted) XOR of two signals

    ~XOR(x,a) = OR(AND(x,a), AND(~x,~a))

This interpolates between a signal and its polarity reversal.
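
A sketch of the interpolation OR(AND(x,a), AND(~x,b)) on bit lists.
For independent streams its average should come out near
E(x) E(a) + (1 - E(x)) E(b).  With b = ~a the output is, bit for bit,
the complement of XOR(x,a).  The helper names, the seed and the
probabilities are assumptions made for the example.

```python
import random


def mux(x, a, b):
    """OR(AND(x, a), AND(~x, b)) per bit: pass a where x = 1, else b."""
    return [(xi & ai) | ((1 - xi) & bi) for xi, ai, bi in zip(x, a, b)]


def bernoulli_stream(p, length, rng):
    """Independent random bits with P(bit = 1) = p (illustrative helper)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def mean(bits):
    return sum(bits) / len(bits)


rng = random.Random(2)  # fixed seed so runs are repeatable
x = bernoulli_stream(0.25, 100_000, rng)
a = bernoulli_stream(0.8, 100_000, rng)
b = bernoulli_stream(0.2, 100_000, rng)

mixed = mux(x, a, b)
print(mean(mixed))  # close to 0.25 * 0.8 + 0.75 * 0.2 = 0.35

# Special case b = ~a: the mux equals the complemented XOR of x and a.
not_a = [1 - ai for ai in a]
special = mux(x, a, not_a)
print(special == [1 - (xi ^ ai) for xi, ai in zip(x, a)])  # True
```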

Note that multiplication with factors >1 is not possible locally.  The
signal would need to be demodulated into a form where extra energy can
be added, which would then result in this being spread out over time.
However, it might be possible to construct some kind of buffered
mechanism that merges two streams, ensuring conservation of energy.


Signals coming from an SD modulator are very repetitive, which means
that they can become correlated, breaking down multiplication.  In
order for this to work some form of decorrelation must be added.
Every modulator could be equipped with a dither module, which would
also improve the SNR.
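
A small sketch of how badly this can go.  Two modulators fed the same
half-scale input both produce 0101..., so the AND has average 1/2 when
the streams are in phase and 0 when one is shifted by a single step.
Neither is the 1/4 a real multiplication would give.  The helper
carry_stream is a made-up name for the constant-input accumulator.

```python
def carry_stream(value, n_bits, length):
    """Carry bits of an n-bit accumulator fed with a constant value.

    Illustrative helper; name and signature are assumptions.
    """
    modulus = 1 << n_bits
    acc, bits = 0, []
    for _ in range(length):
        acc += value
        bits.append(acc >> n_bits)
        acc &= modulus - 1
    return bits


def mean(bits):
    return sum(bits) / len(bits)


a = carry_stream(128, 8, 256)  # 0101..., average 1/2
b = carry_stream(128, 8, 256)  # identical, fully correlated with a

in_phase = [x & y for x, y in zip(a, b)]
b_shifted = b[1:] + b[:1]      # same stream rotated by one time step
out_of_phase = [x & y for x, y in zip(a, b_shifted)]

print(mean(in_phase), mean(out_of_phase))  # 0.5 0.0
```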

An analogy

(Thu Oct 19 22:21:22 CEST 2006) Think of an SD-modulator as a bus stop
where on each timestep (say 10 minutes) a bus either leaves or not. A
bus only leaves if it's full, and the number of people that arrive in
a single time step is never greater than the capacity of the bus. This
way, the bus stop never needs to accommodate more than one bus worth
of people. Once I got this picture in my head I couldn't get rid of it
any more. It clearly illustrates the conservation properties of a bus
stop, which implies the SDM preserves averages and low frequency


I've read some papers by Philips people about SACD.  They included
some operations that can be performed on an SACD stream without
decoding it.  What I remember is the use of interleaving to mix two
streams.  For other operations they always used (de / re) modulation
around an ordinary n-bit representation.  I found this[2] on Chuck
Moore's site, which is related but talks about analog domain VCOs.

[1] http://en.wikipedia.org/wiki/Delta-sigma_modulation
[2] http://colorforth.com/immunity.htm
[3] http://en.wikipedia.org/wiki/Statistical_independence