The life and death of a piece of code

How is code born? What do good developers look for when they write their code? The devel­op­ment of a sys­tem can be meas­ured by its trade-offs. Some sys­tems can­not be slow, some oth­ers can­not fail, some oth­ers can­not be too expens­ive. All sys­tems want to have everything, obvi­ously. But they can­not, obviously.

Before we study this case, let’s establish some terminology.


Systems, architectures, and domains

  • System: “a set of things working together as parts of a mechanism or an interconnecting network; a complex whole.” 1
  • Architecture: quoting Booch 2, “An architecture is the set of significant decisions about the organization of a software system, the selection of the structural elements and their interfaces by which the system is composed, together with their behaviour as specified in the collaborations among those elements, the composition of these structural and behavioural elements into progressively larger subsystems, and the architectural style that guides this organization: these elements and their interfaces, their collaborations, and their composition.”
  • Problem Domain: quoting Wikipedia, “A problem domain is the area of expertise or application that needs to be examined to solve a problem.” 3

With these definitions, we can classify any given project according to how it fits into these categories. An operating system, for example, addresses the problem domain of running multiple programs concurrently on a given piece of hardware. Its architecture divides that problem into scheduling, virtual memory management, and drivers, all of which together build up the System as we use it.

The concerns of a project

We write code because we need to solve a prob­lem. Automation, data­bases, tele­com­mu­nic­a­tions, web­sites, games, serv­ers, or A.I., just to name a few. Each prob­lem requires dif­fer­ent decisions from dif­fer­ent archi­tec­tures, to build dif­fer­ent sys­tems, to finally solve our prob­lem domain. Each prob­lem has dif­fer­ent require­ments, and can be solved with dif­fer­ent tools.

We usually write code, obviously, in a given language. There are Domain Specific Languages, intended to solve one specific problem, like HTML for webpage rendering or XML for markup; and General Purpose Languages, intended to solve just about anything, like most mainstream languages (C++, Haskell, Java, and whatnot). But each language, even if General Purpose, carries along certain decisions about the architecture it was made for, suiting certain problem domains better while imposing certain architectural constraints.

There are also meta-requirements for any coding project. These are not the direct technical concerns, like resource management, concurrency requirements, event handling, or pretty GUIs; rather, I call “meta” those things like development costs, production demands, teamwork, and the everlasting concerns of failures and performance, as we all want faultless, lightning-fast systems after all.

The gestation stages of a piece of code

A trend in the industry is to pay a performance penalty in exchange for ease of development. A ubiquitous example these days is the Garbage Collector: resource management is not only complicated but also incredibly dangerous, so certain architectures trust an external agent, the Garbage Collector, to keep track of resource ownership and release what is no longer needed, automatically and safely. But a GC might have bugs as well: it might release things prematurely, it might consume too many resources of its own, or it might interact badly with non-GC code, releasing things that the non-GC code still needs. All this must be considered as well: performance at the risk of erroneous resource management, or automatic resource management at the cost of performance penalties and the risk of (very few anyway!) errors? When is the complexity of manual resource management worth the performance, and when is the performance penalty negligible compared to a substantial improvement in resource management?

Modern advances need to be taken seriously into account: are we making our product scale for the future? How much concurrency and parallelism is necessary, and how much benefit will they bring to the product? Locking paradigms for multi-threading are having serious problems scaling, and new models are being developed and popularised, like Transactional Memory or Message Passing. And if you’ve heard they’re slow, remember: locking is fast only because we built hardware support for it. In the past, locks were spin locks that relied on scheduler dequeueing, kernel support, and disabling interrupts. The price!
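To make the Message Passing idea concrete, here is a minimal sketch in Python. The names counter_actor and run_demo are made up for this illustration. One thread owns the counter, and every other thread can only send it messages. Note that queue.Queue uses a lock internally; the point is only that the application code shares no mutable state and takes no locks of its own.

```python
import queue
import threading


def counter_actor(inbox: queue.Queue, result: queue.Queue) -> None:
    """Own the counter exclusively; other threads may only send messages."""
    count = 0
    while True:
        message = inbox.get()
        if message == "stop":
            result.put(count)  # report the final state and terminate
            return
        if message == "increment":
            count += 1


def run_demo(workers: int = 4, increments_each: int = 1000) -> int:
    inbox: queue.Queue = queue.Queue()
    result: queue.Queue = queue.Queue()
    actor = threading.Thread(target=counter_actor, args=(inbox, result))
    actor.start()

    def client() -> None:
        # Clients never touch the counter, they only post messages.
        for _ in range(increments_each):
            inbox.put("increment")

    clients = [threading.Thread(target=client) for _ in range(workers)]
    for thread in clients:
        thread.start()
    for thread in clients:
        thread.join()

    inbox.put("stop")  # sent only after every increment has been queued
    actor.join()
    return result.get()
```

Calling run_demo(4, 1000) should count every increment exactly once, because only one thread ever mutates the counter.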

Every language, as well, imposes an architecture. There are different programming paradigms, suitable for different problem domains, that different languages stand for: from the old Von Neumann style of imperative programming, to the (merely glorified imperative) style of Object Orientation, whose core ideas are statefulness and sequentiality; Event-Driven frameworks for push-pull information or UIs; the Concurrency Oriented style for distribution; or the functional style for modelling transformations 4.

In parallel to software scalability, pun intended, it is important to consider the meta-scalability: how many people need to be involved? Do we assign the whole system to the whole team? Or do we divide the architecture into modular elements that can be distributed across the developers? How do we separate concerns? The same way we talk about good coding practices, we need to consider good team organisation practices: will the mistakes of one developer affect the good work of another? A lesson might be taken from the Erlang ecosystem: programs are designed to have an arbitrarily large number of processes, and the language’s semantics allow us to put concurrency in some modules and sequential code in others. Therefore, the experts can work on the concurrent parts of the program, classically harder, and the newbies on the easier sequential parts. In contrast, many frameworks are only as strong as their weakest developer, a weakness that needs to be addressed.

And also importantly, how do we share and synchronize the work being done? Name it: a Version Control System. There are centralised VCSs and distributed VCSs. There are VCSs that promote branching and experimenting, and make it easy to keep unrelated code separate until merges are decided. There are VCSs that are fault-tolerant, consistent, and quick. Say its name. I’m talking about git 5.

The life and evolution of a piece of code

Clients evolve, and with them, the requirements of a project. But code can be chaotic, and adding new features can be a daunting task that prevents the project from evolving, which in the end will just shorten its lifespan. There will be those horrible days when the system is crashing, or worse, doing the wrong thing unnoticed. We then need to come back to our code and touch it. To add things, to remove things, to fix things. But what if the code turns out to be untouchable?

Architectural decisions are of extreme importance when the system is designed, in order to plan for the future. An architecture needs to be designed to be ready for extension, while making sure that future changes don’t accidentally break what was working correctly. Several of the glorious S.O.L.I.D. principles of Object-Oriented programming are about exactly this. “Open for extension, closed for modification”, for example, tells us that the architecture should be designed so that adding new features requires no changes to previous code, thereby avoiding the risk of breakage: instead of modifiable, code should be extendable 6.
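As a small illustration of “open for extension, closed for modification”, here is a sketch in Python. The discount classes and the checkout function are hypothetical names invented for this example, and prices are integer cents to keep the arithmetic exact. The new FlatOff rule is added at the end without touching any of the code written before it.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class Discount(ABC):
    """Extension point: every pricing rule implements this interface."""

    @abstractmethod
    def apply(self, cents: int) -> int: ...


@dataclass(frozen=True)
class NoDiscount(Discount):
    def apply(self, cents: int) -> int:
        return cents


@dataclass(frozen=True)
class PercentageOff(Discount):
    percent: int

    def apply(self, cents: int) -> int:
        return cents * (100 - self.percent) // 100


def checkout(prices: list[int], discount: Discount) -> int:
    """Closed for modification: knows the interface, not the concrete rules."""
    return sum(discount.apply(price) for price in prices)


# Added later as a new feature. Nothing above this line had to change.
@dataclass(frozen=True)
class FlatOff(Discount):
    amount: int

    def apply(self, cents: int) -> int:
        return max(cents - self.amount, 0)
```

Since checkout depends only on the Discount interface, the old rules keep behaving exactly as before when a new one arrives.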

Automated testing is an essential requirement, and it is very good at preventing breakage in distant code: make a change in your code, and run the tests to ensure that the things that shouldn’t change still behave exactly as expected. And testing doesn’t fall short of benefits: a Test-Driven Development approach helps make sure that the software meets its expectations, that it does what it is supposed to do.

These tests need to be quick, informative, and correct, or the developer won’t trust them, or won’t bother to wait an eternity to check them. The tests need to be flexible enough to evolve with the project, or the developer won’t bother changing them when required, and the tests will end up failing and abandoned. For all of this to work, the architecture of the project needs to be designed to be testable to begin with: dependencies should be reasonably easy to mock, and Dependency Injection techniques should be ubiquitous in the code base to facilitate both the mocking and the final testing.
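A minimal sketch of Dependency Injection making code mockable, in Python with the standard unittest.mock module. PaymentGateway and OrderService are hypothetical names for this illustration. The service receives its dependency through the constructor, so a test can hand it a Mock instead of a real payment system.

```python
from typing import Protocol
from unittest.mock import Mock


class PaymentGateway(Protocol):
    """Hypothetical interface for whatever service actually moves the money."""

    def charge(self, account: str, cents: int) -> bool: ...


class OrderService:
    def __init__(self, gateway: PaymentGateway) -> None:
        # The dependency is injected, never constructed in here.
        self._gateway = gateway

    def place_order(self, account: str, cents: int) -> str:
        if cents <= 0:
            raise ValueError("an order must have a positive total")
        charged = self._gateway.charge(account, cents)
        return "confirmed" if charged else "declined"


def test_confirmed_order_charges_the_account() -> None:
    gateway = Mock()
    gateway.charge.return_value = True
    assert OrderService(gateway).place_order("acc-1", 500) == "confirmed"
    gateway.charge.assert_called_once_with("acc-1", 500)


def test_declined_payment_is_reported() -> None:
    gateway = Mock()
    gateway.charge.return_value = False
    assert OrderService(gateway).place_order("acc-1", 500) == "declined"


def test_invalid_total_never_reaches_the_gateway() -> None:
    gateway = Mock()
    try:
        OrderService(gateway).place_order("acc-1", 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected a ValueError")
    gateway.charge.assert_not_called()
```

These tests run in milliseconds and need no network, which is exactly what keeps developers willing to run them.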

An architecture should also be modular if it wants to be able to evolve. We see this concept in Erlang again, and as far back as a 1985 report 7, where it is argued that modularity is a requirement for fault-tolerance: when a failure occurs, modules encapsulate it, keeping failure costs low; and modules are replaceable, making failures easier to fix. Barbara Liskov said (more than) once 8:

I kind of envied elec­tric­al engin­eers […], because there was abso­lutely no struc­ture super­im­posed on our com­puter pro­grams, and so you could just do any­thing, it was infin­itely plastic. Whereas I thought the engin­eers, they have to work with com­pon­ents and con­nect them by wires, and this forced a cer­tain kind of dis­cip­line in the prac­tice of organ­ising things, that was totally lack­ing in the soft­ware world.
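The fault-tolerance argument for modularity can be sketched in a few lines of Python. This is only loosely inspired by Erlang supervisors and is not the OTP API; Supervisor and make_flaky_doubler are made-up names. The failure stays inside the module, and the module is replaced by a fresh instance.

```python
from typing import Callable, Optional

Module = Callable[[int], int]


class Supervisor:
    """Runs a replaceable module and contains its failures."""

    def __init__(self, factory: Callable[[], Module], max_restarts: int = 3) -> None:
        self._factory = factory
        self._module = factory()
        self._max_restarts = max_restarts
        self.restarts = 0

    def call(self, value: int) -> Optional[int]:
        try:
            return self._module(value)
        except Exception:
            if self.restarts >= self._max_restarts:
                raise  # give up: the failure is no longer containable here
            self.restarts += 1
            self._module = self._factory()  # swap in a fresh module
            return None  # this one request is lost, the system survives


def make_flaky_doubler() -> Module:
    """Build a module whose private state breaks on its third call."""
    calls = 0

    def doubler(value: int) -> int:
        nonlocal calls
        calls += 1
        if calls == 3:
            raise RuntimeError("internal state corrupted")
        return value * 2

    return doubler


supervisor = Supervisor(make_flaky_doubler)
results = [supervisor.call(n) for n in range(1, 6)]
```

Only the third request is lost: the broken module is discarded along with its corrupted state, and the callers never see the exception.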

Finally, code can be formally analysed, enabling a whole world of checks, both at compile-time and at run-time. I’m talking mostly about Type Systems: formal logic systems that can check and analyse formal properties of your program. The Curry-Howard correspondence tells us that types correspond to logical propositions, and programs to proofs of those propositions, in a logic that has been thoroughly analysed and studied. Checking that a program type-checks is therefore equivalent to checking that a proof is valid; hence, ensuring we don’t write invalid proofs amounts to ensuring we don’t write nonsensical programs: we detect errors. Types are also a design language, a specification and documentation language (the type of an entity tells us a lot about this entity!), and, their biggest merit, an amazing maintenance tool: change something in some place, and the type-checker will tell you if you broke something somewhere else! A gold standard of strong static typing these days is Haskell, a language from which we can learn a lot about correctness 9.
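The maintenance point can be illustrated even far away from Haskell. Below is a minimal sketch in Python, assuming an external static checker such as mypy is run over the code, since Python itself ignores type hints at run-time. UserId, OrderId, Order and find_orders are hypothetical names. Both identifiers are plain integers underneath, yet the checker keeps them apart.

```python
from dataclasses import dataclass
from typing import NewType

# Two distinct types for the checker, both plain ints at run-time.
UserId = NewType("UserId", int)
OrderId = NewType("OrderId", int)


@dataclass(frozen=True)
class Order:
    order_id: OrderId
    owner: UserId
    total_cents: int


def find_orders(orders: list[Order], owner: UserId) -> list[Order]:
    """The signature documents the intent: orders are looked up by their owner."""
    return [order for order in orders if order.owner == owner]


orders = [
    Order(OrderId(1), UserId(7), 1500),
    Order(OrderId(2), UserId(8), 900),
]
mine = find_orders(orders, UserId(7))

# A static checker is expected to reject the call below, because an OrderId
# is not a UserId. Python itself would run it without complaint, which is
# exactly why the check has to happen before run-time.
# find_orders(orders, OrderId(7))
```

This is far weaker than what a Haskell type-checker offers, but it shows the same mechanism: the mistake is caught where it is written, not where it eventually explodes.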

These three features alone, Type Systems, Modularity, and Automated Testing, are what really keep quality standards high, maintenance easy, lifespans long, and future costs low for any given project.

There are also the performance concerns. Before anything can be said about performance, before getting too obsessive about it, one thing needs to be said first: premature optimisation is the root of all evil! These are not my words, they’re Donald Knuth’s 10. Only when performance really is a concern, and we have actually profiled and analysed it, can we discuss this topic. The architects need a thorough knowledge of data structures and algorithms, and complex decisions need to be made about how we model our problem domain. Once clear and well-known data structures and algorithms have been chosen, we can start wondering about our technological stack: GC? Multi-threading? Hardware? Going “low-level” is perhaps not a thing of the present anymore, but working more closely with our tool-chains will be important: do know your compilers, discover their secrets. Compilers are often incredibly huge toolboxes full of useful surprises. And if you start scratching your head about a hardware architecture, remember: you’re getting more performance with a given architecture just because we have built hardware for that architecture. So, just look forward to our more functional and less imperative hardware. If this is still a problem (is it? Really? Are you sure? Well, ok, maybe, sometimes…), go for your filthy little assembly dreams.
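“Profile first, then pick the data structure” can be sketched with Python’s standard cProfile and pstats modules. The functions slow_unique, fast_unique and profile are made up for this example. Both de-duplicate a list while preserving order; the profiler report is what tells us whether the slow one is actually our bottleneck.

```python
import cProfile
import io
import pstats


def slow_unique(items):
    """Quadratic: every membership test scans a list."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def fast_unique(items):
    """Linear on average: dict keys are hashed and keep insertion order."""
    return list(dict.fromkeys(items))


def profile(func, *args) -> str:
    """Run func under the profiler and return a short textual report."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(5)
    return report.getvalue()


data = [i % 500 for i in range(5000)]
report = profile(slow_unique, data)
```

Printing report shows where the time went. Only if slow_unique dominates there is it worth swapping in fast_unique, and the swap is a change of data structure, not of language or hardware.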

The death of a piece of code

I once heard Joe Armstrong make a very nice comment in a talk: the senior developers of previous generations should be honoured as national heroes for the amount of work that they, and no one else, have created: the legacy code! Just look at the market: much more work goes into keeping old stuff alive than into developing new things. This is painful, and it is important to see the point: either we develop techniques to deal with legacy code, or we decide when, if ever, is the moment to kill this legacy.

Research exists on the former option. For example, there is an analysis that relates the functions and variables of a program by uses-what and is-used-by; this relation can be modelled as a mathematical lattice of so-called concepts, whose order graph we can analyse to extract disconnected sub-graphs, suggesting a better modularization automatically 11. Some compilers, like many C++ ones, can also compute a called-by ordering of all the procedures in the code, which likewise builds an order on the set of all procedures, whose graph structure can again be analysed and restructured.
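The step of extracting disconnected sub-graphs can be sketched in Python. To be clear, this is a much simpler stand-in, plain connected components over a uses relation, and not the concept-lattice analysis of the cited paper. The function modules_from_usage and the sample program are invented for this illustration.

```python
from collections import defaultdict


def modules_from_usage(uses: dict[str, set[str]]) -> list[set[str]]:
    """Group functions and variables into connected components of the usage graph."""
    graph: dict[str, set[str]] = defaultdict(set)
    for function, variables in uses.items():
        graph[function]  # make sure functions using nothing still appear
        for variable in variables:
            graph[function].add(variable)  # uses-what
            graph[variable].add(function)  # is-used-by

    seen: set[str] = set()
    components: list[set[str]] = []
    for node in sorted(graph):
        if node in seen:
            continue
        component: set[str] = set()
        stack = [node]
        while stack:  # depth-first walk over everything reachable
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            stack.extend(graph[current] - seen)
        components.append(component)
    return components


legacy_uses = {
    "open_account": {"accounts"},
    "close_account": {"accounts"},
    "log_event": {"log_file"},
    "rotate_logs": {"log_file"},
}
candidate_modules = modules_from_usage(legacy_uses)
```

Here the account functions and the logging functions share no variables, so they fall into two separate groups, each a candidate module.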

Then there’s the key moment: the code has reached the end of its life. This is easy when there’s nothing to lose: there are no clients of this code, no system uses it, and its responsibilities are not necessary anyway. This is nearly impossible when the code is used by numerous banking systems (read, COBOL), or by high-performance programs (read, FORTRAN). If the system is needed and the shut-down costs outweigh the maintenance costs, a third cost needs to be weighed: that of rebuilding the entire system in a different architecture, hopefully with a positive outcome.

Some final words

In the end, I leave the final decision to management (after all, I’m only a scientist, not a businessman), but in the meantime, I’m going to make sure all the cards are on the table when managerial decisions are made. That is what I can do, as a mere scientist. Talk facts.

[Image: xkcd comic on optimization]

  4. This will all be a top­ic of the future!
  5. One GIT to rule them all!
  6. Not to glorify O.O. Design: indeed, I consider it nothing more than a glorified imperative Von Neumann machine, full of state and assignments, lacking all sorts of safety, and whatnot.
  7. Jim Gray. Why do com­puters stop and what can be done about it? Technical Report 85.7, Tandem Computers, 1985.
  9. Best practices for strong typing are most often seen in functional languages, like Haskell, OCaml, F#, and the old-school Standard ML. These are statically typed, while other languages like Perl and Python are dynamically typed: that is, type-checking happens at compile time in the former and is deferred to run-time in the latter. We’ll see much more on these in the future.
  10. Donald Knuth. Structured programming with go to statements. Stanford University, 1974.
  11. Christian Lindig et al. Assessing Modular Structure of Legacy Code Based on Mathematical Concept Analysis, 1997.
