Processors, Part Two: the Present. Meltdown and Spectre


Imagine a restaurant. A professional chef, in charge of the best pizzas in town, and nobody knows how they're made; those magic ingredients are a closely guarded secret. Now imagine a bunch of apprentices who are supposed to help him, following the chef's orders. The chef goes to the fridge to fetch some fresh cheese. An apprentice isn't sure whether the next pizza will need tomato sauce or whether it will be a white pizza, so in the meantime he takes the magic sauce from the drawer and prepares it on the table. The chef comes back and says: "No, no, this pizza is white, I even brought our secret cheese for it!" The apprentice puts the tomato sauce back in the drawer and continues, impassive. But a drop of the secret sauce was spilled on the table.

And a spy hired by the competition is there just in time to collect that drop and carry it back to the lab for examination.

A modern concern

This is, essentially and over-simplified, what happens when you mix branch prediction and speculative execution with an L3 cache that is global to all cores. The L3 cache is that big table, the chef is the program's control flow, the apprentices are the cores, including that one spy, who has access to the same table as all the others, and the apprentices grab the tomato sauce out of habit, even before knowing whether they'll need it.

You may or may not have heard about it. Last January, what are said to be the worst CPU bugs in history 1 were discovered, independently, by several research teams, mainly Google's Project Zero and E.U.-funded projects in Europe. They were reported secretly to Intel, The Linux Foundation, and Microsoft, and published once some fixes were ready for deployment 2.

A Trade-Off

The issues, fantastically described in a dedicated website, affect virtually every single Intel processor alive today, and a vast number of AMD and ARM processors as well. And the fixes are, to say the least, expensive performance-wise.

For decades, the chip industry has been focused on performance. Nothing to blame there: it is precisely what the market asks of them. But some of the tricks that have made chips fast are precisely the ones that have suddenly made them unsafe. Some people have gone as far as saying that these performance improvements are merely a quest to make C faster, not to make chips faster. And for the most part, regaining that safety comes at the price of giving up much of the performance they gained.

But what are these vulnerabilities all about? Let's analyse the cases.

Enter cache

Last time we saw that, to alleviate the memory bottleneck, architectures build a layered memory model. While main memory lives on its own piece of hardware, integrated into the chip there is a cache that serves the fast accesses. As we also saw, there are three layers: L1 and L2 are private to each core 3, while L3 is shared across all cores and threads. Every time a core attempts to fetch from memory, it checks each level, and if in the end it needs to go all the way down to main memory, it copies the found data all the way up, exploiting the locality we saw last time.

This has an important flaw 4: a second core can measure the time it takes to read a given address by flushing the cache and reading the address repeatedly; if the access is too fast, it means that some other core has already accessed that data, thereby leaking information about what other cores are doing.

Enter the Address Space

Because memory is one giant pool where everything 5 lives together, it needs to be virtualised to keep processes from messing with each other. And because a program has no idea where the operating system will load it, programs are designed to believe that they will always be loaded at address zero and have the entire memory space for themselves. So, in reality, memory is virtualised: the OS keeps a mapping of addresses for each process, between what the process believes it has and what only the OS knows about its actual location. In this virtual address space, for performance reasons, the OS maps its own kernel at the end of it.

Here it is important to know what a context switch is: in any standard OS these days, the processor is continuously changing its context, doing a bit of every process every few milliseconds, thereby achieving progress everywhere and preventing any one process from locking up the processor. But changing from one process to another is not that easy: before doing so, you have to hide all the information of the first process from the second, and also keep that information safe so the first process can continue exactly where it was. This is done by the kernel: a context switch means the following transition: "process A -> kernel switching mechanism -> process B -> kernel …".

Besides, a process often needs services from the kernel, like message passing with other processes, memory allocations, or hard drive reads… For this reason, the kernel always maps itself within the private address space of every process, so that servicing requests or switching contexts, which are already expensive, can be made a bit cheaper by saving memory accesses and far jump instructions.

And this kernel memory is, of course, protected, so only the kernel can read it.

So now we know everything we need to know to understand the problems.

Meltdown 6


In the official paper, we can find this revealing "Toy Example":
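If I recall the paper's listing correctly, it boils down to something like this (pseudocode reproduced from memory, so possibly imprecise in the details):

```
raise_exception();
// the line below is never reached
access(probe_array[data * 4096]);
```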





The problem is then crystal-clear: with out-of-order execution, the second instruction does indeed get executed, even though, semantically, it was unreachable. The raised exception gives control to the kernel, which will then terminate the program, and out-of-order execution ensures that operations that shouldn't have been executed won't be committed, leaving no architectural trace. However, there is a micro-architectural effect: the data fetched from memory gets loaded into the cache, which is not flushed, for performance reasons. Furthermore, the access, if to kernel memory, was illegal and will raise an access violation exception, but this is not triggered immediately, because it still has to travel through the pipeline, long behind the actual violation, because a core cannot waste time waiting for exceptions or whatnot.

From here on, the cache is a global unprotected memory for all cores, where, just by timing memory accesses, another core can see which address was loaded, and with some tricky pointer arithmetic, it can see its contents. Repeating the process, it can dump the entire kernel, without any kernel privileges. And if you have access to the kernel, you have access to the entire memory map. Yes, including your browser's tab checking your bank accounts, and all your saved passwords.

Some fixes

Immediately after acknowledgement, all major OSes patched their kernels. Basically, Meltdown can make any process read its entire address space, including the protected part. Therefore, we shouldn't map the entire kernel in the process's space, right? But some necessary pieces still need to be mapped: what is needed to enter/exit the kernel, such as the entry/exit functions, interrupt descriptors (IDT) and the kernel trampoline stacks. This minimal set of data can still reveal the kernel's ASLR base address; but it is all trusted, which makes it harder to exploit.7

Meltdown does not attack any software vulnerability; it just bypasses the hardware protection. Hence, any software patch will always leave some small vulnerable memory surface. But disabling out-of-order execution, flushing the cache continuously, or serialising all permission checks before fetches would all involve a significant overhead, sometimes reported as high as a 50% slowdown.

Spectre 8

Spectre, according to the official site, is called as it is because "The name is based on the root cause, speculative execution. As it is not easy to fix, it will haunt us for quite some time."

Speculative execution, quite similar to out-of-order execution but not quite the same, is based on guessing branches and far jumps. Just like with Meltdown, the goal is to force a fetch that shouldn't have happened, and then steal the leaked data from the cache. Unlike Meltdown, predictor mechanisms are not shared across cores, and they must first be trained to predict a pattern successfully and then be tricked into predicting that pattern when it is no longer true. For these reasons, Spectre leaks at a much lower rate than Meltdown, and it's much harder to reproduce and attack; but at the same time, it's also much harder to fix. Furthermore, even if kernel memory is protected from every fetch, nothing stops Spectre from stealing user-space memory from any available process. It is even feasible through JavaScript attacks!

Consider the following code:


if (x < array1_size)
    y = array2[array1[x] * 4096];


In a case like this, the pipeline has to decide whether the bounds check will pass or not. We can easily imagine giving the pipeline valid values of x many times, called the training phase, and then giving it one incorrect and maliciously crafted x. The branch predictor will then push the fetch into the pipeline, and if we can arrange for array1_size and array2 to be out of the cache (by clflush-ing them), and we can make array1[x] be cached beforehand, then issuing the fetch will be a lot faster than comparing the values, and therefore the pipeline won't be flushed in time to prevent the execution of the fetch.

And then we’re back to cache side-attacks ter­rit­ory: we’ve suc­cess­fully viol­ated memory accesses, and make the CPU serve into cache the value we want exactly when we asked him to, there­fore mak­ing it too easy to leak it to our interests.

Spectre has many variants, which makes it all worse. We can poison branch prediction, but we can also poison indirect jumps, that is, jumps to an address that needs to be calculated first, since the speculative execution engine also remembers previous jumps and speculates on where the next jump will land. There's also a variant where, even if speculative execution does not modify the cache whatsoever, the state of the cache still affects the timing of speculatively executed code, revealing "metadata" about that state; and there are even variants that don't involve the cache at all, or that target victim code without branches or jumps (the OS's interrupt services still provoke jumps anyway).

JIT compilers, the web worker feature of HTML5, virtual machines, and old compiled code are all vulnerable. And new Spectre variants are being discovered all the time 9.

Some fixes

Hoping to keep this post short, I'll just enumerate some of the fixes implemented to this day. In software, pointers can be secretly XORed to poison them, so that an attacker would read garbage, and bounds checking can be replaced by index masking. In hardware, "future processors can track whether data was fetched as a result of a speculative operation", and branch predictors could stop being shared not only across cores, but also across threads within the same core. And finally, there's Google's well-respected solution, retpolines.

What do we do?

Next, we will explore some of the ideas that academia and research groups are currently pursuing. In the meantime, yes, once again, xkcd has the best conclusion to the thought:

[xkcd comic on Meltdown and Spectre]

  2. Please enjoy Linus Torvalds' opinion on the fixes :D
  3. L1 actually follows a Harvard architecture, an interesting variation on Von Neumann's.
  4. Not news at all: Evict-, Flush- and Prime-based cache side-channel attacks have been known for a long time.
  5. Everything, including your browser visiting your bank account!
  6. meltdown.pdf
  8. spectre.pdf
