Ubuntu headless server failure

Discussion in 'Other Operating Systems' started by toxic_spill, Nov 3, 2019.

  1. toxic_spill

    toxic_spill Member

    Joined:
    Jan 8, 2002
    Messages:
    351
    Location:
    Eltham
    I'm going to try to put this in as close to chronological order as I can.

    Hardware:
    HP Proliant Microserver Gen8 G2020T
    4 x 4TB WD Red drives, 8TB RAID 5 and 2TB RAID 10
    128GB Samsung SSD (root, swap, etc)
    1GB SD card (boot) (This is the boot drive due to the peculiarities of the HP hardware - the SSD cannot be set as the boot drive)

    Until recently I was running Ubuntu 14.04 on a headless server as a sonarr/Plex box. After a couple of power outages, the internal fan for the server failed, and I replaced it with a standard case fan.

    Later on, the system seemed to be having some difficulties, only successfully booting intermittently. (I wasn't generally seeking to reboot it, but on occasion had to do so). When troubleshooting this, I vaguely recall something about being unable to write to the root drive, but I cannot recall the specific cause/message.

    This weekend I decided to have a go at getting it working again - which involved taking my monitor, keyboard and mouse from one end of the house to the other, as the server has no WiFi. As a result, my ability to comprehensively troubleshoot was somewhat limited.

    I managed to successfully get the system to boot by selecting an older kernel (3.13 version, with 148 being the most recent functional, and 170 being the most recent actual) - however I didn't trust that this was stable, and I also realised that 14.04 had reached end of life stage. Given that the system was running fine, I decided to update to 16.04 as a stepping stone to 18.04.

    Moving to 16.04 seemed to go fine, with everything downloading and installing without issue. A new kernel was also applied, 4.4, with the 3.13.170 kernel being retained. On boot however, the system will get to GRUB and give me 6 options (3 for each kernel) of normal, upstart and recovery. In each instance the system seems to either hang at "loading initial ramdisk", or loads into "BusyBox", and that's where I am stuck.

    I have tried booting onto a live USB of 16.04 and running boot-repair, but that said everything was OK.

    I have also tried running fsck on the Samsung SSD (root drive), but that seems to go into an infinite run of error discovery.

    Other than simply attempting to install a totally new version of Ubuntu onto the drive, what are my options here? Is there anything I can do to attempt to diagnose/check/repair the SSD on my Windows machine?

    If I do have to go down the route of a new OS install, will I still be able to access the data on my storage drives, or am I facing a total loss here?
     
  2. juggernaut88

    juggernaut88 Member

    Joined:
    Aug 5, 2015
    Messages:
    226
    Location:
    /dev/null
    If the data is on separate drives from the SSD then you will be fine and won't lose anything. Depending on how far gone the SSD is, you ought to be able to copy your home directory from it to the new installation on a new drive.
     
  3. HyRax1

    HyRax1 ¡Viva la Resolutión!

    Joined:
    Jun 28, 2001
    Messages:
    7,893
    Location:
    At a desk
    Correct - unlike Windows, Linux is much easier to recover in situations like this.

    Going forward, if you do end up reinstalling (and for a server, this is usually the easiest path), boot up into a Live environment and back up your /etc folder so you can reference some of the config files for your custom settings in the new install (don't just copy these files back over your new install).
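
    A minimal sketch of that /etc backup, run from the live session. The device name /dev/sdX2, the mount point /mnt/oldroot and the output path are placeholders rather than values from this thread; check `lsblk -f` for the real root partition first.

```shell
# Placeholder names (assumptions): adjust after checking `lsblk -f`.
OLD_ROOT_DEV="${OLD_ROOT_DEV:-/dev/sdX2}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/oldroot}"

# Mount the old root filesystem read-only so nothing on it is modified.
mount_old_root() {
    sudo mkdir -p "$MOUNT_POINT" &&
    sudo mount -o ro "$OLD_ROOT_DEV" "$MOUNT_POINT"
}

# Archive the etc directory found under root directory $1 into tarball $2.
backup_etc() {
    tar -czf "$2" -C "$1" etc
}

# Typical use, from a root shell so protected files under etc are readable:
#   mount_old_root
#   backup_etc "$MOUNT_POINT" /media/usb/etc-backup.tar.gz
```

    The tarball is only for reference when configuring the new install; it is not meant to be unpacked over the new /etc.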

    Keep your boot drive and data folder physically separate - I have a nice and cheap 128GB boot SSD for my server and all the data is stored on a big mechanical disk array. I keep a Clonezilla backup of the boot drive just in case a power outage or something makes it go tits up. If the drive fails, I simply buy another, restore the backup (via USB boot drive) and the only thing that will be missing is any system updates and changed config files since the backup was taken. This also allows you to easily test upgrades to newer releases of Ubuntu to see if anything breaks without worrying about how to rollback.

    Where possible/practical, I also keep other regular-changing data like databases (general MySQL, Plex, MythTV, etc) and place them on my array too, away from the boot drive.

    With respect to backing up apps like Plex, they are straightforward "back up the folder" and then "restore after reinstall", or consider taking the opportunity to start with a clean database and rebuild it from scratch.
     
  4. koss

    koss Member

    Joined:
    Mar 6, 2009
    Messages:
    5,816
    Location:
    Vic
    Normally an fsck -y on your SSD from a live CD or repair drive should fix it. If it doesn't fix it within a minute, then the SSD, or something on your motherboard, may be failing.
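
    As a rough sketch of that check from a live session: /dev/sdX1 is a placeholder for the SSD's root partition (not a name given in this thread), and the helper names are made up for illustration. The filesystem should not be mounted while fsck repairs it, so the sketch checks that first.

```shell
# Placeholder device (assumption): find the real one with `lsblk -f`.
DEV="${DEV:-/dev/sdX1}"

# Succeeds if the device appears as a mount source in /proc/mounts.
is_mounted() {
    awk -v d="$1" '$1 == d { found = 1 } END { exit !found }' /proc/mounts
}

# Run a non-interactive check, but only on an unmounted filesystem.
# fsck exits 0 when it finds no errors and 1 when it corrected some.
repair_fs() {
    if is_mounted "$1"; then
        echo "$1 is mounted; unmount it before running fsck" >&2
        return 1
    fi
    sudo fsck -y "$1"
}

# Usage from the live session:
#   repair_fs "$DEV"
```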

    Ubuntu 18.04 is the current LTS version, with 20.04 being the next LTS release in April.
    14.04 is way past it. I just use Debian Buster on my servers; Ubuntu has a tad too much baggage.
     
  5. toxic_spill (OP)

    Am I able to grab the SSD out of the server and plug it into my Windows box for testing without damaging anything, or do I just have to reinstall the OS and hope for the best?

    In my case the 1GB MicroSD card contains GRUB and /boot, while the SSD contains everything else bar the data.

    Is there anything I should be aware of regarding preserving/reading the RAID drives? I'm *reasonably* sure that the RAID was done in software only, not with a hardware controller.

    Hmmm. Could I boot a live Ubuntu OS and read the SSD, or would it be restricted because I am not booting into the actual install OS? If I can copy off the configurations info etc that would be excellent.

    The fsck ran for ages with me literally setting up a book and pen to hold down the y key (because I didn't initially use the -y option, and I don't know how to kill a running process).

    I realise that the OS is old, but for the longest time it had just ticked along without issue, only having problems when I actually tried to change something, hence attempting to leave it alone if possible.
     
  6. Quadbox

    Quadbox Member

    Joined:
    Jun 27, 2001
    Messages:
    6,269
    Location:
    Brisbane
    Ctrl-C in the terminal you launched it from. Or if you can't use that terminal, see "kill", "killall" and "ps" ("ps aux" would be a broad option, listing every process)
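
    A small demonstration of the ps/kill route, using a background `sleep` as a harmless stand-in for the stuck job (the stand-in is an assumption for illustration; with a real fsck you would look up its PID with ps from a second console).

```shell
# Stand-in for a runaway job: a background sleep.
sleep 300 &
pid=$!

# Show the process; `ps aux` would list everything on the system instead.
ps -p "$pid" -o pid,comm || echo "ps did not report PID $pid"

# kill sends SIGTERM by default; `killall sleep` would match by name instead.
kill "$pid"

# wait returns 128 plus the signal number for a child ended by a signal.
status=0
wait "$pid" 2>/dev/null || status=$?
echo "exit status: $status"
```

    If a process ignores the default signal, `kill -9` (SIGKILL) is the usual last resort.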
     
  7. toxic_spill (OP)

    I might have to try Ctrl-c - since I am attempting to boot the machine without much luck, I am generally working direct from the prompt, rather than in a separate terminal window.
     
  8. HyRax1

    On the Windows box, the most you'd be able to test is the SSD's SMART data. Windows won't be able to read the filesystem on it and will simply assume that it's empty and offer you the opportunity to format it.

    Yes, you'll want to change the MicroSD boot arrangement. MicroSD cards are cool as boot devices, but they don't like being randomly written to all the time (this is why you have to regularly reformat dashcam cards, for example). Have an SSD as a boot drive and a second drive (SSD or mechanical) for all your data. Make a backup of the boot drive with a tool like Clonezilla to keep on hand in the event of future failure.

    If you set it up with "mdadm" then it's software RAID. While there is some OS-level configuration data, this is not critical and the drives will happily import into any other system that supports mdadm. For example, you can boot up on a Live Ubuntu USB stick or DVD, install the "mdadm" package and then magically see all your RAID drives (and then add any additional filesystem packages such as ZFS to read the content on those drives if you are not using a standard out-of-the-box filesystem).
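
    A hedged sketch of that live-session import. The function names are made up for illustration; `mdadm --assemble --scan` asks mdadm to find and assemble whatever arrays it can identify on the attached drives, and /proc/mdstat is where the kernel reports software RAID status.

```shell
# Install mdadm in the live session and assemble any arrays it can find.
# Defined but not run here; call it by hand once booted from the live USB.
assemble_arrays() {
    sudo apt-get update &&
    sudo apt-get install -y mdadm &&
    sudo mdadm --assemble --scan
}

# Print the name and state of each md array listed in an mdstat-format file
# (defaults to /proc/mdstat).
list_md_arrays() {
    awk '$1 ~ /^md/ && $2 == ":" { print $1, $3 }' "${1:-/proc/mdstat}"
}

# Usage from the live session:
#   assemble_arrays
#   list_md_arrays
```

    If no arrays show up after assembling, that would suggest the RAID was not built with mdadm after all.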

    Yes, you can boot a live Ubuntu OS and read the SSD. You can read the files but not execute anything, because the Live environment doesn't have the apps that use that data. E.g. if you have a Plex installation, that doesn't mean Plex will suddenly work in a Live environment - you just have access to its data.

    It is in your interests to keep the OS up to date. That doesn't mean you have to check and install things every week, but at least aim for once a month. Note that, if you selected the option during the install, Ubuntu will auto-install critical security updates but leave the non-critical ones to you.

    I'd also recommend setting up a cron job to periodically check the SMART data of your drives (and other housekeeping things like available storage space) and, if a threshold is crossed or an error is found, send you an automatic email about it so you can pre-emptively respond to potential issues before they actually become issues. On my server, for example, I have it poll my LSI hardware RAID controller for any drives that are misbehaving and pop me an email accordingly (it just looks for keywords such as "degraded" or "offline" in the output of the status query command). I also have a separate alert for low disk space (past 85% full) and one for my UPS telling me the battery is bad, or that power has failed and it's on battery, and later that it's back off battery, etc.
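
    A minimal sketch of the two simplest checks described above, the keyword scan and the disk-space threshold. This is not the poster's actual script; the function names, the /dev/md0 device and the email address are hypothetical.

```shell
# Succeeds if the status text on stdin mentions a keyword worth an alert.
has_raid_problem() {
    grep -iqE 'degraded|offline'
}

# Reads `df -P` output on stdin; prints filesystems fuller than $1 percent.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)
        if (use + 0 > limit) print $6 " is " $5 " full"
    }'
}

# Hypothetical wiring, e.g. saved as a script and run hourly from cron:
#   report=$(df -P | over_threshold 85)
#   [ -n "$report" ] && printf '%s\n' "$report" | mail -s "Low disk space" you@example.com
#   sudo mdadm --detail /dev/md0 | has_raid_problem &&
#       echo "check the array" | mail -s "RAID alert" you@example.com
```

    Keeping each check as a small filter over command output means the same pattern can be pointed at a different status command, such as a hardware controller's query tool or `smartctl -H`.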
     
