Making a couple of changes:
1. Loop vars are explicitly typed as Int.
2. First pass keeps the current loop stack.

What about this: perform the lifting in the first step, and perform the optimization (re-using internals) in the second. Hmmm... makes me doubt the whole auto-lifting approach again. It seems simple in the abstract: introduce references.