IT professional with a strong love for all things #FLOSS. Soon-to-be-retired #soccer player, #guitar player and sizeable #LEGO bricks addict.

GPG 0x736EDD9A0151287B

https://keyoxide.org/26E947141F348287FF494EAE736EDD9A0151287B

Pixelfed: @pete@pixel.cyano.at
PeerTube: @pete@tube.cyano.at

  • 1 Post
  • 9 Comments
Joined 1 year ago
cake
Cake day: July 11th, 2023

help-circle


  • Ironically, if I would have had more services running in docker I might not have experienced such a fundamental outage. Since docker services usually tend to spin up their exclusive database engine you kind of “roll the dice” as far as data corruption goes with each docker service individually. Thing is, I don’t really believe in bleeding CPU computation cycles by running redundant database services. And since many of my services are already very long-serving they’ve been set up from source and all funneled towards a single, central and busy database server - thus, if that one experiences sudden outage (for instance power failure) all kinds of corruption and despair can arise. ;-)

    Guess I should really look into a small UPS and automated shutdown. On top of better backup management of course! Always the backups.



  • Excellent choice. I’m running a physical Routerboard and a virtual RouterOS inside my hypervisor for redundancy.
    The license for virtual RouterOS is dirt cheap and has more features than you could ever dream of with any of the the big network device manufacturers.
    The physical devices are very well designed for their relatively modest price and likewise fully featured. Perfect for any home lab or to play around with IEEE conform protocols.




  • I don’t see a clear indication that you have too low RAM… RAM should be “used” fully at all times and your “cached” RAM value suggest you still have quite a bunch of RAM that could potentially be consumed by applications when they need it.
    I cannot clearly see a swap usage in the graphs - that would be an interesting value to judge the overall stability of the system with regards to fluctuating RAM usage.

    However, once you notice the problem again, right after you manage to log in, run a “dmesg -T | grep -i oom” and see if any processes get killed due to temporarily spiking RAM consumption. If you’re lucky that command might lend some insight even now still.

    Also, what if you run a “top” command for a while, what’s the value for “wa” in the second line like? “wa” stands for I/O wait and if that value is anything above 5 it might indicate that your CPU is being bottlenecked by for instance hard disk speed.