Piero Bosio Social Web Site Personale

A social forum federated with the rest of the world. It's not the instances that count, it's the people.

Cloudflare’s Outages and Why Cool Kids Test on Prod

    Every system administrator worth their salt knows that the right way to coax changes to network infrastructure onto a production network is to validate them first on a Staging network: a replica of the Production (Prod) network. Meanwhile, the developers working on upcoming changes are kept in their own padded safety rooms in the form of Test, Dev and similar environments, where Test tends to be the pre-staging phase and Dev is for new and breaking changes. This is the process anyone should use, and yet, judging by their latest outage, Cloudflare apparently deems itself too cool for such a rational, time-tested approach.

    In their post-mortem on the December 5th outage, they describe how they started rolling out a change that raised a buffer size to 1 MB, as part of mitigating the critical CVE-2025-55182 in React Server Components (RSC). During this roll-out on Prod, it was discovered that an internal testing tool didn’t support the increased buffer size, and it was decided to disable that tool globally, bypassing the gradual roll-out mechanism.
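As a rough illustration of what a gradual roll-out mechanism buys you, here is a minimal Python sketch of a staged deployment gate. It is not Cloudflare's system: the `Stage` type, the stage names and traffic fractions, and the `apply_change`, `is_healthy` and `undo_change` callbacks are all hypothetical. The point is only that a change which bypasses such a loop goes to every stage at once, with no health check in between.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Stage:
    name: str
    traffic_fraction: float  # share of production traffic exposed to the change


# Hypothetical stage plan; the names and fractions are illustrative only.
DEFAULT_STAGES = (
    Stage("staging", 0.0),
    Stage("canary", 0.01),
    Stage("partial", 0.25),
    Stage("global", 1.0),
)


def roll_out(
    apply_change: Callable[[Stage], None],
    is_healthy: Callable[[Stage], bool],
    undo_change: Callable[[Stage], None],
    stages: Sequence[Stage] = DEFAULT_STAGES,
) -> Optional[str]:
    """Apply a change one stage at a time.

    Returns None if every stage passed its health check. Otherwise returns
    the name of the stage that failed, after undoing every stage applied so
    far in reverse order. Later stages are never touched.
    """
    applied: List[Stage] = []
    for stage in stages:
        apply_change(stage)
        applied.append(stage)
        if not is_healthy(stage):
            for done in reversed(applied):
                undo_change(done)
            return stage.name
    return None
```

With a health check that fails at the canary stage, `roll_out` returns `"canary"` and the "partial" and "global" stages never see the change, which is exactly the containment a global switch gives up.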

    This follows the recent implosion at Cloudflare, when their brand-new, Rust-based FL2 proxy keeled over after encountering a corrupted input file. This time, disabling the testing tool created a condition in the original Lua-based FL1 proxy where a nil value was encountered, after which requests through this proxy began to fail with HTTP 500 errors. The one saving grace here is that the issue was detected and corrected fairly quickly, unlike the FL2 incident, which was triggered by an issue elsewhere in the network and took much longer to diagnose and fix.
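The general failure class (code that assumes a component is present, running after that component has been switched off) can be sketched in a few lines of Python. This is an analogy only, not Cloudflare's Lua code: Python's `None` stands in for Lua's nil, and the rule dictionary, the `handler` field and all function names are hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical rule shape: a disabled rule has no handler attached.
Rule = Dict[str, Optional[Callable[[], int]]]


def evaluate_unguarded(rule: Rule) -> int:
    """Assumes the handler is always present; breaks when it is None."""
    return rule["handler"]()  # TypeError if the handler is None


def evaluate_guarded(rule: Rule) -> int:
    """Treats a missing handler as 'rule disabled' and lets the request pass."""
    handler = rule.get("handler")
    if handler is None:
        return 200  # assumption: failing open is acceptable for this rule
    return handler()


def serve(rule: Rule, evaluator: Callable[[Rule], int]) -> int:
    """Stand-in for a proxy: any unhandled error becomes an HTTP 500."""
    try:
        return evaluator(rule)
    except Exception:
        return 500


enabled_rule: Rule = {"handler": lambda: 200}
disabled_rule: Rule = {"handler": None}
```

Here `serve(disabled_rule, evaluate_unguarded)` yields 500 while `serve(disabled_rule, evaluate_guarded)` yields 200. Whether a disabled rule should fail open or fail closed is a design decision in its own right; the sketch picks fail-open purely for illustration.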

    Aside from Cloudflare clearly having systemic issues with actually testing code and validating configurations prior to ‘testing’ on Prod, this ought to serve as a major warning to anyone else who feels that a ‘quick deployment on Prod’ isn’t such a big deal. Many of us have dealt with companies where testing and development happened on Staging, and the real staging happened on Prod. Even if that arrangement is management-enforced, it doesn’t help much once stuff catches fire and angry customers start lighting up the phone queue.


    hackaday.com/2025/12/28/cloudf…

