Up to now, we have been focusing on where memory management actually occurs, the heap. But this is not the only memory segment relevant to a program. Indeed, in all garbage collected techniques, there’re other players having a relevant role in the game: the mutator and the root set. In the reference counted garbage collection, it’s the mutator who directs the life of its subjects, as it notifies in real time about the changes in references outside of itself. In the tracing garbage collection, the root set plays a fundamental role, as all tracing starts in this set, but it’s again the mutator who directs the life of the root set. The mutator is, essentially, composed of the Call Stack.
This call stack is a masterpiece of computer engineering all by itself. I can’t recall where was it introduced for the first time, but many problems the call stack solves seem to be legit origins. Indeed, it’s so universal these days that it’s hard to find a chip without support for call stack structures, and, at least to me, it’s hard to imagine a model of computations without it — please tell me if you know of any, I really want to know!
The raison d’être of the call stack
Subroutines are an example of an idea that can be implemented with stacks. A call stack is a stack of call frames: each time a subroutine is called, the return address of where this subroutine should transfer control back when it’s done is pushed onto the stack1. The call stack will record the address of the place in the .data segment where it should tell the CPU to continue executing from once it’s done.
If a subroutine needs parameters to be passed, much in the style of a pushdown automata these parameters can be pushed onto the stack. If the subroutine needs local variables for its computations, it can keep its local memory within its stack frame, on the call stack; once the subroutine is done, it can simply drop the entire stack frame. Note how simple and quick it is to allocate space within a stack frame: you only need a pointer to the beginning of the frame, and another pointer to the end, and expanding a frame is just moving its top pointer further up, much in the way that brk works, but basically instant at the lack of a system call — notice nevertheless that this then depends on how much space is previously reserved for the stack: this is what is known as a Stack Overflow, when a new frame surpasses the allocated space and the program segfaults, and the kernel kills you.
Parameter passing is an interesting feature. If you’ve done any Windows programming, you might have seen somewhere keywords like [cci_cpp]__stdcall[/cci_cpp] or [cci_cpp]__cdecl[/cci_cpp]: this is a calling convention. When a subroutine calls another one, a binary convention needs to be established: where are parameters passed, if by registers or by the stack, and in which order, how a return value is passed back to the caller, and who should clean the parameters, if the caller or the callee. Let me share with you some anecdotal curiosities. Let’s suppose a classic x86 architecture (32-bits), and the following equivalent examples, with their compiled machine code:
int __cdecl callee_cdecl(int a)
return a + 1;
int a = callee_cdecl(1);
int __stdcall callee_stdcall(int a)
return a + 1;
int a = callee_stdcall(1);
Notice the differences. It’s all on who is responsible to clean the given parameter. In both cases, an [cci_c]int[/cci_c] of value 1 is passed as a parameter when the caller says [cci_c]push 1[/cci_c]. The caller then executes [cci_c]call[/cci_c], which saves the return address of the next instruction in the caller on top of the stack and transfers control to the callee. Now, the callee computes its value and saves it in the register [cci_c]eax[/cci_c], and it cleans up after himself: here is where the differences starts. After popping the stack frame’s base pointer ([cci_c]pop ebp[/cci_c]), cdecl’s callee then says [cci_c]ret 0[/cci_c], which means return execution to the address saved on top of the stack, while stdcall runs a [cci_c]ret 4[/cci_c], which means return execution to the address 4 bytes behind the top of the stack, updating at once the stack frame’s top pointer. As cdecl didn’t pop the parameter (the earlier [cci_c]push 1[/cci_c]), cdecl then executes [cci_c]add esp,4[/cci_c], which basically means move the stack frame’s top pointer 4 bytes up — exactly what [cci_c]ret 4[/cci_c] did in stdcall.
Now imagine the mess if we mix cdecl callers with stdcall callees or vice versa. A cdecl caller pops what the stdcall callee already popped, that is, we would have the stack pointer updated 8 bytes instead of 4, hence, considering that instructions are often executed by offsetting [cci_c]esp[/cci_c], we would have everything offset: stack corruption. Similar with an stdcall caller with a cdecl callee: none of them pops the pushed parameter, so a [cci_c]ret[/cci_c] call from the caller would actually transfer control to that [cci_c]int[/cci_c] never popped: hopefully a segfault.
Stack allocations and deallocations
But let’s come back a bit, let’s pay attention again to how memory management on the stack works…
Deallocating a stack frame is nothing else but updating the stack frame and the stack pointer back to the previous stack frame base and top. The cells of the current stack frame are nothing but ignored and can be freely overwritten on the next stack frame push or growth of the current one — in the very same way, they can be re-read by uninitialized variables of a subsequent stack frame allocated in the same space. This is an implementation detail.
But what does this means for the semantics of the language? It means that the variables of your language can be divided in two kinds: those with value semantics, and those with pointer semantics. The former are those cells semantically equivalent to a value: [cci_cpp]int i = 3[/cci_cpp] means that i is a three, and not that i points to a distant three. The latter, are those cells semantically equivalent to an address: [cci_cpp]int* ptr_i = malloc(sizeof(int))[/cci_cpp] means that [cci_cpp]ptr_i[/cci_cpp] is an identifier to another cell, and not a value by itself.
A stack frame popping will simply treat everything within itself as having value semantics. It will consider that [cci_cpp]ptr_i[/cci_cpp] is a value, and make the stack forget about some value valued (pun intended) 0x7ffc78e71b60. This is, a stack frame popping effectively cleans the [cci_cpp]int[/cci_cpp], and the pointer. But not the value the pointer points to.
At last, let me leave you with a recommended read, if you want to know more in depth the workings of the call stack: https://blog.holbertonschool.com/category/hack-the-virtual-memory/
- Before trying to picture this in your mind, it is important to remember one thing: the stack does not contain executable code! For security reasons, your kernel can keep track of what parts of your code are executable, or read-write, or read-only, and the stack is just read-write, all executable code is read from the .data memory segment alone.