Thu Mar 31 16:04:33 EDT 2011

Looking at "kb"

The "kb" display is quite slow.  Looking at it on the scope: there is
still a 3ms delay in the sync due to more than one packet being used,
but there is also an extended period, about 50 ms, where the host is
polling but gets no reply.

That's 500 instructions.  That doesn't seem right.  Each chkblk loop
is only 64 bytes.

Wait.. The loop is 5 instructions.  That's 6 clocks per iteration
including the 2 clocks for the branch.

 	0218 0009 [tblrd*+]
	021A 50F5 [movf 245 0 0]
	021C 14ED [andwf .L111 0 0]
	021E 06E7 [decf 231 1 0]
	0220 E1FA [bpz 1 .L116]

Actually, that's already 38.4 ms for just the loop, so that indeed
makes sense.  So why is the display so slow?  On the serial line it's
much faster.
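
A rough sketch of that loop-time arithmetic.  The 6 cycles per
iteration is from the count above; the instruction cycle time and the
total number of iterations are not stated in these notes, so both are
parameters here, and the numbers in the usage line are assumptions:
just one combination that lands on 38.4 ms.

```python
# Back-of-envelope timing for the chkblk inner loop.
# From the notes: 4 single-cycle instructions + a 2-cycle branch.
CYCLES_PER_ITERATION = 6

def loop_time_ms(iterations, cycle_ns):
    """Time spent in the loop, in milliseconds.

    iterations: number of bytes checked (one byte per iteration)
    cycle_ns:   instruction cycle time in ns (ASSUMPTION, not in the notes)
    """
    return iterations * CYCLES_PER_ITERATION * cycle_ns / 1e6

def iterations_for(time_ms, cycle_ns):
    """How many loop iterations fit in time_ms at the given cycle time."""
    return time_ms * 1e6 / (CYCLES_PER_ITERATION * cycle_ns)

# Hypothetical numbers: 64000 iterations at a 100 ns instruction cycle.
print(loop_time_ms(64000, 100))
```

With those made-up inputs the loop alone comes out at 38.4 ms, which
is why the 50 ms of silence is plausible.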


Typical delays:

       3ms      2ms                4ms              3ms
  sync <-> send <-> receive_header <-> receive_body <-> sync
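
Adding up the gaps in that diagram gives the cost of one full
exchange.  A minimal sketch, using only the four delays read off the
scope and ignoring the time of the transfers themselves; the function
names are mine.

```python
# Gaps between protocol phases in ms, as read off the scope.
GAPS_MS = {
    ("sync", "send"): 3,
    ("send", "receive_header"): 2,
    ("receive_header", "receive_body"): 4,
    ("receive_body", "sync"): 3,
}

def round_trip_ms(gaps=GAPS_MS):
    """Total idle time for one sync-to-sync exchange."""
    return sum(gaps.values())

def max_exchanges_per_second(gaps=GAPS_MS):
    """Upper bound on exchanges per second if only the gaps cost time."""
    return 1000 / round_trip_ms(gaps)

print(round_trip_ms(), "ms per exchange")
print(round(max_exchanges_per_second(), 1), "exchanges per second at best")
```

So one exchange costs about 12 ms of pure waiting, which caps things
at roughly 83 exchanges per second no matter how fast the bytes move.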

The actual byte transfers are barely noticeable: they use 3us clocks
and are on the order of 100us in total length.  This is far from

I don't see a simple way to fix this, so for now this will have to do.
Maybe lower the clock speed too as that doesn't seem to have much
influence either.

Yep.  Switched to 100us and can't see any difference, except that
there is a bit less idle bus time.

I'm not sure how to fix this..  Sending larger messages and using fewer
handshakes will help, but it seems to be an inherent problem with the
pk2 as there is no pipelining to hide the 1ms usb bus clock.

Hmm.. That's not really true.  In one direction it pipelines just
fine.  Sending 2 x 26-byte transfers spaces them 1ms apart with only
200us wasted space.

So it does burst well..  The problem then is the pingpong handshake.
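
To put numbers on burst vs. pingpong, a small model.  The 1 ms frame
and the 12 ms round trip (3 + 2 + 4 + 3 from the typical delays) come
from these notes; treating every packet as exactly one frame and every
handshake as exactly one round trip is a simplifying assumption.

```python
FRAME_MS = 1.0        # the "1ms usb bus clock" from the notes
ROUND_TRIP_MS = 12.0  # sum of the typical delays: 3 + 2 + 4 + 3

def burst_ms(n_packets, frame_ms=FRAME_MS):
    """One direction, back to back: roughly one packet per frame."""
    return n_packets * frame_ms

def pingpong_ms(n_exchanges, round_trip_ms=ROUND_TRIP_MS):
    """Each exchange waits for the reply before the next one starts."""
    return n_exchanges * round_trip_ms

for n in (2, 16, 64):
    print(n, "packets:", burst_ms(n), "ms burst vs",
          pingpong_ms(n), "ms pingpong")
```

In this model the handshake costs about 12x what a burst does for the
same number of packets, so cutting round trips matters far more than
anything done to the transfers themselves.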

So how to fix?  The only annoying part is bulk read and write.  These
can probably be optimized by using larger packets.

Ok, so roadmap for faster pk2:

  - Use RAM buffering for program upload, send one page at a time.

  - Make checkblock work on 1 k blocks.
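
A quick estimate of what the larger checkblock size might buy.  It
assumes one handshake round trip of about 12 ms per block (the sum of
the typical delays above) and ignores the device-side loop time; the
32 KiB region size is a made-up example, not a figure from these notes.

```python
ROUND_TRIP_MS = 12.0  # 3 + 2 + 4 + 3 ms, from the typical delays

def handshakes(region_bytes, block_bytes):
    """Number of checkblock calls needed to cover the region."""
    return -(-region_bytes // block_bytes)  # integer ceiling division

def handshake_time_ms(region_bytes, block_bytes,
                      round_trip_ms=ROUND_TRIP_MS):
    """Time spent purely on handshakes (ASSUMES one round trip per block)."""
    return handshakes(region_bytes, block_bytes) * round_trip_ms

REGION = 32 * 1024  # hypothetical region size in bytes
for block in (64, 1024):
    print(block, "byte blocks:",
          handshakes(REGION, block), "handshakes,",
          handshake_time_ms(REGION, block), "ms")
```

For that example region, going from 64 byte to 1 k blocks drops the
handshake count from 512 to 32, a factor of 16 less waiting.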