Spent my morning figuring out why Nginx was dead on a server with many days of uptime.

18 posts, 6 authors, 1 view
    Stefano Marinelli wrote (last edited by stefano@mastodon.bsd.cafe):
    #1

    Spent my morning figuring out why Nginx was dead on a server with many days of uptime. No reboot, no kernel panic. Just... down. Ubuntu 24.04.

    The cause? An automatic unattended-upgrade of libc6. This prompted systemd to work its magic, wisely deciding to restart every running service to apply the patch. Fine.

    The problem is, in the exact same minute, the systemd timer for certbot decided it was time to renew certificates.

    The result:

    - systemd stops Nginx.
    - Port 80 becomes free.
    - certbot, in standalone mode, immediately grabs it for validation.
    - systemd tries to restart Nginx, which fails with "Address already in use".

    The web server was knocked offline by its own certificate renewal script.
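
The four steps above can be reproduced in miniature. The following is only a sketch, not the actual nginx/certbot/systemd interaction: it assumes two plain TCP listeners stand in for the web server and the standalone ACME client, and it uses an OS-assigned port instead of port 80 so it runs without root. The names `try_bind` and `simulate_race` are hypothetical.

```python
import errno
import socket


def try_bind(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Bind and listen on host:port, returning the listening socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


def simulate_race() -> str:
    """Return "OK" or the errno name seen when the 'web server' restarts."""
    # Stand-in for the web server. Port 0 asks the OS for any free port,
    # so the demo needs no privileges (the real incident involved port 80).
    web_server = try_bind(0)
    port = web_server.getsockname()[1]

    # Steps 1-2: the service manager stops the web server; the port is free.
    web_server.close()

    # Step 3: the standalone ACME client grabs the now-free port.
    acme_client = try_bind(port)

    # Step 4: the web server restart fails because the port is taken.
    try:
        try_bind(port).close()
        return "OK"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        acme_client.close()


if __name__ == "__main__":
    print(simulate_race())
```

On Linux, `simulate_race()` should return `EADDRINUSE`, the errno behind the "Address already in use" message quoted above.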

    I swear, this is the kind of cascading failure that has never happened to me in years of running *BSD. With a classic cron job, certbot would have failed, logged an error, and tried again the next day. The web server would have remained untouched.

    systemd was doing its job, but something failed because of the interactions.

    Sometimes, too much automation and too many interconnected parts just create more spectacular ways for things to break.

    #SysAdmin #Linux #SystemD #Rant #KISS

      Farooq | فاروق [Master Patata] wrote:
      #2

      @stefano

      Hmm, I think the problem here is using certbot in standalone mode. Don't you think so?

        Doerk wrote:
        #3

        @stefano I have never been a fan of systemd. Can’t see any advantage over using the good old tools that come with FreeBSD.

          hyperreal 🅅 wrote:
          #4

          @stefano I'm not sure about the renew subcommand of certbot, but I know there is an --nginx flag that will tell certbot to use the already running nginx instance. You would need the python3-certbot-nginx package installed.

            Stefano Marinelli wrote:
            #5

            @farooqkz Looking at the logs, it seems that certbot runs in --nginx mode if it finds an active nginx, but it didn't find one when launched, so it used standalone mode.

              Stefano Marinelli wrote:
              #6

              @hyperreal It usually runs in --nginx mode, but looking at the logs, it seems it didn't detect a running nginx, so it fell back to standalone mode.

                Monospace Mentor wrote:
                #7

                @stefano I read this as a simple race condition for port 80, and can't see how this is an "only on Linux" thing.

                  Stefano Marinelli wrote:
                  #8

                  @monospace I didn't say it's a problem "only on Linux". It's more of a "let's make things complex" problem. The fact that it's never happened on BSDs is directly related to the fact that they don't provide that kind of automation - so it can't break anything. 🙂

                    Monospace Mentor wrote:
                    #9

                    @stefano To me, this is just a coincidence of two scheduled jobs (package upgrades and certificate renewal) running at the same time. Maybe I'm missing something, but port 80 being open to be taken over by certbot would have happened with a traditional cron job on any old Unix system just the same.

                      Stefano Marinelli wrote (last edited by stefano@mastodon.bsd.cafe):
                      #10

                      @monospace The certbot renewal cron job usually enforces --nginx (or --apache), so it would fail if nginx/apache is down. This script tries to detect whether nginx or apache is running and, if not, runs certbot in standalone mode. That created the problem; otherwise, it would just fail and retry the next morning.
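
The difference between the two approaches described here can be sketched as follows. This is a minimal illustration, not the script Ubuntu ships and not certbot's own logic: the function names, the `RenewalSkipped` exception, and the idea of passing in a set of running services are all assumptions made for the example. Only the `--nginx`, `--apache`, and `--standalone` flags come from the discussion.

```python
class RenewalSkipped(RuntimeError):
    """Raised when renewal is skipped rather than falling back (hypothetical)."""


def choose_with_fallback(running: set[str]) -> str:
    """Behaviour as described in the thread: use a running web server's
    plugin if one is found, otherwise fall back to standalone mode."""
    for server in ("nginx", "apache"):
        if server in running:
            return f"--{server}"
    # This is the branch that ends up competing for port 80.
    return "--standalone"


def choose_fail_closed(required: str, running: set[str]) -> str:
    """Alternative: insist on one plugin and refuse to listen on our own."""
    if required not in running:
        raise RenewalSkipped(
            f"{required} is not running; retry on the next scheduled run"
        )
    return f"--{required}"


if __name__ == "__main__":
    # nginx is mid-restart, so it is not in the set of running services.
    print(choose_with_fallback(set()))
    try:
        choose_fail_closed("nginx", set())
    except RenewalSkipped as exc:
        print(f"skipped: {exc}")
```

With the fail-closed variant, a renewal that lands in the restart window is simply skipped and the port stays free for the web server.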

                        Monospace Mentor wrote:
                        #11

                        @stefano Wait, no, I see your point. I had to edit in "post-install restarts", and realized that systemd taking care of that is indeed something special. I will still put the blame on certbot for taking over port 80 even though instructed to use nginx. That should have resulted in a fatal error.

                          Monospace Mentor wrote:
                          #12

                          @stefano I agree, it's certbot's behaviour that caused the issue in the end, not systemd doing a good job at system maintenance.

                            Stefano Marinelli wrote:
                            #13

                            @monospace Exactly, I agree.

                              Stefano Marinelli wrote:
                              #14

                              @monospace I've updated the original post to clarify that systemd did its job, but the interaction caused problems.

                                Haelwenn /элвэн/ :triskell: wrote:
                                #15
                                @stefano Says more about certbot than systemd though.
                                The web server can just stay up by using the other ACME challenges (DNS, or reverse-proxying to the ACME client), so it never has to go down.
                                  Stefano Marinelli wrote:
                                  #16

                                  @lanodan When I create cron jobs, I force "--nginx" or "--apache", so certbot never starts listening on its own. The script shipped with Ubuntu seems to fall back to "standalone" mode if nginx|apache isn't running.

                                    Farooq | فاروق [Master Patata] wrote:
                                    #17

                                    @stefano

                                    I agree about the problem with Ubuntu here. But I don't think the behavior of certbot is fine here either.

                                    I don't think running certbot --nginx and having it fall back to standalone without an explicit request from the user (here, the sysadmin) aligns well with the Unix philosophy and design. To be honest, certbot itself doesn't align very well with the Unix philosophy, IMO.

                                      Stefano Marinelli wrote:
                                      #18

                                      @farooqkz I agree. On many of my servers, I'm using acme.sh or lego. Or acme-client on OpenBSD, of course.
