Software Architecture / Engineering

Notes about the practical side of building software and getting it to
where it needs to be.  This is a counterpart to more technical issues
which are handled in the compsci notes[1].

[1] entry://../compsci/


Entry: Coding The Architecture
Date: Mon Mar 1 16:57:49 CET 2010

I'm reading some of the material here[1].  First impression: close the
gap between architects and implementors.  What is architecture about,
really?

The role of a software architect:

DEFINITION:
  * Management of non-functional requirements
  * Architecture definition
  * Technology selection
  * Architecture evaluation
  * Architecture collaboration (?)

DELIVERY:
  * Ownership of the bigger picture
  * Leadership
  * Coaching and mentoring
  * Quality assurance
  * Design, development and testing

[1] http://codingthearchitecture.com


Entry: Architecture document
Date: Mon Mar 1 17:08:15 CET 2010

Summary of [1].

1. An outline description of the software architecture, including
   major software components and their interactions.

2. A common understanding of the drivers (requirements, constraints
   and principles) that influence the architecture.

3. A description of the hardware and software platforms on which the
   system is built and deployed.

4. Explicit justification of how the architecture satisfies the
   drivers.

[1] http://www.codingthearchitecture.com/pages/book/software-architecture-document-guidelines.html


Entry: What did I learn this time?
Date: Sun Mar 21 10:06:39 CET 2010

The project I'm about to finish was something new to me.  In some
sense it was quite frustrating as it wasn't much about design and
programming, but more about communication and politics.

All my work up till now has been highly technical: understand the
problem deeply, write a solution, debug, iterate.  This time the
technical challenges were minor, apart from absorbing knowledge about
a new system and its quirks (Android) and a new programming language
(Java).
Most work went into:

- Very difficult path from requirements to specifications.  Things
  seemed obvious at first, but workarounds for secondary effects out
  of our control greatly complicated the design and implementation.
  New use-case requirements kept popping up.  Lack of adequate tests
  to see what is really needed.

- No real feedback from particular use case to general requirements.
  This seems to be largely a consequence of the hierarchical way of
  planning and making decisions common in large companies.  This
  project was special as it was a small one, not important enough for
  upper layers to bother.

- Documentation.  Getting the point across is difficult when using
  concepts from functional programming in an OO shop.  Writing good
  design documentation is hard.  Keeping it up to date is harder.
  Make sure that documentation and implementation are linked!  Turn
  every question ever asked into a FAQ item.

- Struggle with tools.  The Android development tools are not very
  good.  They are slow, buggy and error-prone.  The long
  write-compile-test cycle makes working with this system quite
  frustrating.

- Meetings drain energy.  My efficient work modes are problem solving
  (long stretches of deep concentration) and process management
  (keeping an eye on and steering a lot of not too complicated
  semi-automatic processes).  Dealing with people and disagreements is
  not something that fits into this plan.  It is as if _everything_ in
  my mind needs to be swapped to secondary storage to make room for
  the social interaction rules.


Entry: Robust filesystem
Date: Sun Apr 24 21:59:35 EDT 2011

I'm burning my fingers on a current project.  Some things that went
wrong:

- No specification, evolutionary what-can-we-get-away-with-cheaply
  design.  Considering the application, that was actually not such a
  bad approach: functional requirements were very simple and
  straightforward.  Eventually there was one ill-specified requirement
  that caused a bit of complexity.
- Complete underestimation of non-functional requirements:
  robustness.

- Difficult refactoring: merging two subsystems that were "almost the
  same" caused many headaches when they were actually placed on top of
  the same abstraction.

- Splitting another part of the code into separate modules proved
  difficult due to insufficient understanding of the coupling
  involved.

- Premature optimization.  Not for speed, but for memory usage, in
  this case disk buffers.  This led to an implementation that was very
  hard to change.

- Inadequate test suite: the stateful nature of the system, and the
  nature of the errors that should be recovered (lots of invalid
  intermediate state) make it very hard to test.

If I can name one major point, it is state.  The system as it is at
this moment has too many degrees of freedom.  This makes many things
very difficult:

  * Testing: almost impossible to cover all corner cases.

  * Change: the highly sequential nature of operation makes it very
    difficult to separate responsibilities over multiple objects, or
    perform simple, incremental changes.

  * Ownership: at least one bad factorization is due to unclear
    ownership of data structures.

  * Temporary storage management: A non-functional requirement is to
    use a minimal memory footprint.

If state is the problem, the solution that immediately comes to mind
is to switch to a transaction-based approach where pre and post
conditions can be expressed clearly (even if only in the test suite).

In a fully transaction-based approach, there is a very simple, even
_stupid simple_ way of handling errors: whenever it fails, retry.

In the current implementation that approach doesn't work completely:
physical errors are system state changes: the system is changed from a
consistent to an inconsistent state.  The difficulty is in recovering
from that.


Entry: Robustness : external mutators
Date: Sun Apr 24 22:31:05 EDT 2011

Follow-up of last post.  How to make a data storage system robust?
- In a reliable system (no externally introduced faults), inconsistent
  state has to be a consequence of bad design, as there are tried and
  true principles to build such systems correctly.  The main idea is
  to clearly define what a state change is, and allow it to occur or
  not, but never to expose any intermediate state.  This is captured
  in more detail in the ACID[1] rules used in database design:

    * Atomicity: a transaction succeeds or fails.  No intermediate
      (inconsistent) state is ever visible.

    * Consistency: each transaction maintains consistency rules.

    * Isolation: concurrent interaction should not interfere.

    * Durability: a completed transaction persists.

  All these are reasonably obvious, especially if you stick to the
  simpler approach of serialization as the isolation principle: a
  state change either succeeds or fails.

- In a system with transient errors, recovery is possible through
  transaction abort+retry if inconsistencies are discovered soon
  enough.  Here "soon enough" means before an inconsistent read leads
  to a write that would not otherwise occur.  Let's call these "read
  faults" or "wire faults".  In practice, such errors can be caught
  using redundancy.  Checksumming, error detecting and correcting
  codes, ...  As an optimization, nested transactions can be retried
  locally.

- In a system with "external state mutation", consistency maintenance
  requires extra effort, i.e. actual "repair".  This is hard, as it
  requires "intelligence", i.e. knowledge that is not inside the
  system and its consistency rules.  It seems that the best approach
  is to design the system in such a way that such repair operations
  are kept to a minimum.  If the external mutations are known, they
  could possibly be caught by not allowing them to cause actual
  inconsistencies.

[1] http://en.wikipedia.org/wiki/Database#The_ACID_rules


Entry: Abstraction isn't always the solution
Date: Mon May 9 09:07:50 EDT 2011

It's hard to quantify this impression.
Some heterogeneous examples:

* http://zwizwa.be/darcs/staapl/staapl/pk2/libusb.ss

  Originally, Jakub wrote this code with a naming scheme that's close
  to the C header file.  I changed it to have ``pretty''
  dash-separated scheme-ish names.  I later changed it back so that
  all entities that are direct wrappers of C functions and structures
  are verbatim copies of those names ( and struct:).  Then only the
  functionality that is built on top of this has the pretty scheme
  names.

  This has the advantage that libusb documentation remains valid.  In
  essence, the original (lower level) API remains visible.  In a 2nd
  layer this can then be abstracted out if necessary.

  Moral: Exposing the low-level names seems dirty at first (that's why
  I originally changed Jakub's naming) but seems to make a lot of
  sense from a documentation pov.  If you include the existence of
  libusb as a C library, and the fact that many people who would use
  it from scheme would also use it from C, then the overall complexity
  seems to decrease if the horribleness of the C API is leaked into
  the Scheme code.  ( Where it can be abstracted over later if
  necessary, but only app-specific, not wrapper-specific. )

* Exposing data structure internals in interfaces.

  C-specific: When you expose a structure in a header file, you
  forever break binary upgradability of the representation (apart from
  adding members to the end of the struct).

  For core data structures, exposing the internals is almost always a
  bad idea.  For mutable structures which need to respect data
  invariants, accessors that explicitly maintain invariants are
  absolutely necessary.

  In some cases though it seems not such a big deal to just use a wide
  open struct and be done with it, instead of a plethora of
  constructor and accessor methods that do not add much value.
  However, this seems to work only for "constant" structures: those
  that never change shape after they have been constructed.  Such a
  structure behaves as a "configuration file".
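  A minimal sketch of such a wide-open "constant" structure, in C.
  All names (buffer_config, buffer_total_bytes, example_total) are
  made up for illustration and do not come from any of the projects
  mentioned here.  The struct is filled in once by the caller and
  only ever read through a const pointer, so there are no invariants
  for accessors to protect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical public "configuration" struct: all members are visible
   in the header.  It is constructed once and never modified. */
struct buffer_config {
    size_t block_size;   /* bytes per block */
    size_t nb_blocks;    /* number of blocks */
    int    read_only;    /* nonzero: writes are rejected */
};

/* The single (hypothetical) API function that consumes the struct.
   The const pointer documents that the config is read-only here. */
size_t buffer_total_bytes(const struct buffer_config *cfg) {
    assert(cfg != NULL);
    return cfg->block_size * cfg->nb_blocks;
}

/* Caller side: the config is an ordinary local variable, filled in
   with designated initializers. */
size_t example_total(void) {
    const struct buffer_config cfg = {
        .block_size = 512,
        .nb_blocks  = 8,
        .read_only  = 1,
    };
    return buffer_total_bytes(&cfg);
}
```

  If the layout ever has to change, the struct and the one function
  that takes it can be retired together, which is the same kind of
  break as changing a function signature.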
  ( In C, one additional reason to expose internals of a struct is to
  allow temporary structured data allocated as local variables,
  avoiding malloc() issues. )

  Note that when the use of such a public configuration structure is
  limited to one or a handful of API methods, upgrade is not such an
  issue.  Simply remove one method together with its configuration
  structure, and replace it with another method with another
  structure.  Here the binary linking issues are the same for
  structures and functions.

  ( The general idea seems to be that when structures are read-only,
  the difference with functions starts to look more blurry.  This is
  especially apparent in a pure functional language like Haskell [1].
  )

[1] entry://../compsci/20110509-093238


Entry: On doing things differently
Date: Mon Oct 24 19:08:58 EDT 2011

If you violate a rule or invariant in a sound model, or design
pattern, then you should know *why* you violate it.  Meaning, the
violation should not be out of ignorance of how to do it properly, but
out of knowing why you have to do it differently.

Often it is indeed easier to start from scratch and build something
simpler, with fewer moving parts.  Though this should never be out of
ignorance, because usually in good designs some possibly hard to
understand complexity might actually be essential in ways you don't
get yet.


Entry: Writing stateful code
Date: Fri Nov 25 14:14:34 EST 2011

I've recently had the pleasure (!) of writing some file system code.
The main problems I run into are 1. robustness (error detection and
recovery) and 2. simply getting it to work correctly in the first
place.

I talk about robustness somewhere else[1].  This post is about how to
handle stateful code.

File systems generally have a lot of state and a lot of invariants
covering relationships between state elements.  Often a lot of this
state is cache or index data: some *duplication* of state that is
available from other state, but is too expensive to compute.
A practical way to solve this is to use an approximation of Hoare
logic[2].  In logic, structure is expressed as predicates.  To use
this in programming, make sure these predicates can be evaluated by
reusing some of the code that computes the caches:

1. Make sure the code that computes the cached/index data is readily
   available as subroutines.

2. Add assert checks that verify stored caches against computed data
   after major state updates occur.  Make these optional, so they can
   be run only during testing/debugging or under specially constructed
   conditions.

[1] entry://20111125-141642
[2] http://en.wikipedia.org/wiki/Hoare_logic


Entry: Robustness
Date: Fri Nov 25 14:16:42 EST 2011

Main issues are:

- Transient vs. permanent errors.

  Transient errors are errors that do not violate system state, but
  are a consequence of communication errors.  The criterion is that
  they can often be solved by detecting them (using data redundancy)
  followed by simply retrying the operation.

  Permanent errors are those that permanently violate invariants.  The
  issue here is to identify which invariants are broken, and how to
  fix them by throwing away part of the information, causing minimal
  damage.

- Local or global recovery.

  It is hard to implement robust code that is also modular.  There
  seems to always be a tension between abstraction (solving problems
  locally) and the need for broad-ranging information (solving
  problems globally).


Entry: Darcs Emacs
Date: Mon Nov 28 15:10:33 EST 2011

After getting used to git a bit more (magit), I really miss decent
support for darcs in emacs.  I currently have these installed:

  vc-darcs.el[1]
  xdarcs.el[2]

[1] http://www.loveshack.ukfsn.org/emacs/vc-darcs.el
[2] http://chumsley.org/download/xdarcs.el


Entry: State machines are hard
Date: Sat Dec 17 10:32:46 EST 2011

I'm on an embedded project with a lot (lot lot) of state that is very
hard to test properly.  We're basically like "oh yes, this is a case
that needs to be tested also."
but only after we see it failing in more expensive black box testing,
past the first line of developer tests.

How can this problem be reduced in a more systematic way?  I sort of
saw it coming, but did not have a way to counter it.


Entry: Testing a stateful monster
Date: Tue Dec 20 14:10:35 EST 2011

I'm done with this approach..  Same project as last post, more state
problems.

The thing is also that there really aren't enough seconds in all of
time to write test cases for every possible initial state.

How to apply property-based testing to a stateful program in a
systematic way?


Entry: Branching in Darcs / Git
Date: Sat Dec 31 07:41:58 EST 2011

Been using git for work for a while.  I actually like it better than
darcs.  Maybe it's the "chronological" model that meshes better with
how I think about code.  I don't think I ever used the "clean,
commuting patch" model of darcs.  I'm just not that clean!  Or it's
more about the kind of early-stage development that has large,
cross-cutting changes instead of small feature additions or bug
fixes..

So I'm thinking of moving my stuff to git, at least the current
Haskell work, or figuring out how to branch and inspect branches in
darcs.