My Synology NAS seems to be doing this thing where no services are available after a restart. I can't get to the web interface, Samba isn't available, and a bunch of scripts don't seem to work from the command line. I really didn't have "let's go through a bunch of logs and see what's broken today" on my non-existent bingo card, but here we are, I guess.
Patrick Perdue
Fun new development:
The volume my RAID lives on isn't mounted. Yeah, this might be more serious than I thought. Neat!
Patrick Perdue
Broken NAS update:
Good news:
Taking out the motherboard and replacing the CR1220 battery wasn't particularly difficult. I've done way worse things.
Bad news:
Other than the new battery, I'm still right where I was before.
The RAID isn't mounting, whatever is supposed to sync the time isn't, and everything is still broken... except now it keeps time after a reboot. And, by the way, due to whatever IO junk is going on, it takes about 7 minutes to reboot after executing sudo shutdown -r now.
sudo reboot just never works.
So, yeah, fun.
Next step, probably, is to install a single, initialized drive (which I don't have) and set the operating system up as new to see if it will import the RAID from the existing pool when I put all the original drives back in, unless someone has a better idea.
Patrick Perdue
So, I want to post some questions about this DS1815+ problem on Reddit.
I've never used Reddit before, and the official Synology support forum has a big stupid CAPTCHA that, yes, I could solve with AI or whatever, but I just don't have the spoons to deal with all that today.
So, trying to sign up on Reddit, I was greeted with no fewer than seven "something went wrong" errors in different places, and a verification code but no place to put one in. Very exciting.
Patrick Perdue
in reply to Patrick Perdue • • •Even more updates:
It seems like the NAS is trying to repair itself. All my shares are now visible under /volume1 again, even though essentially none of the built-in services -- web management, samba, etc. are currently useable. The drives are chugging madly, and the shell is still not quite completely unresponsive.
Also, I manually set the clock less than an hour ago just to see what it would do, as it went back to January 01, 2014 again, since NTP isn't happening for whatever reason, and it's already gained about 24 minutes since then.
Uptime has this to say for itself:
22:12:42 up 51 min, 2 users, load average: 26.05, 27.44, 22.21 [IO: 26.04, 27.32, 21.93 CPU: 0.01, 0.12, 0.28]
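To put numbers on that line, here is a minimal Python sketch that pulls the three groups of figures apart. It assumes the bracketed IO and CPU values are the I/O and CPU components of the load average, as their labels suggest; the function name `parse_load` is made up for illustration.

```python
import re

def parse_load(line):
    """Split a DSM-style uptime line into total, IO and CPU load averages."""
    num = r"(\d+(?:\.\d+)?)"
    triple = rf"{num},\s*{num},\s*{num}"
    pattern = rf"load average:\s*{triple}\s*\[IO:\s*{triple}\s*CPU:\s*{triple}"
    match = re.search(pattern, line)
    if match is None:
        raise ValueError("not a recognised uptime line")
    values = [float(v) for v in match.groups()]
    # Each list holds the 1-, 5- and 15-minute averages, in that order.
    return {"total": values[0:3], "io": values[3:6], "cpu": values[6:9]}

line = ("22:12:42 up 51 min, 2 users, load average: 26.05, 27.44, 22.21 "
        "[IO: 26.04, 27.32, 21.93 CPU: 0.01, 0.12, 0.28]")
load = parse_load(line)
io_share = load["io"][0] / load["total"][0]
print(f"1-minute load {load['total'][0]}, I/O share {io_share:.1%}")
```

Under that reading, practically all of the load is processes waiting on disk I/O, and the CPU is close to idle.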
So, I guess I'll just let it sit here and do whatever it does, and hopefully it will fix itself in time, and not blow up and make terrible noises in the middle of the night.
It's probably just coincidence that the CMOS battery was dead.
Hopefully, at some point, I can just restart this thing again, and it updates the system clock, and things start working as normal.
I've got a couple of projects that are currently in limbo until I can access resources again. Great timing.
Update: Samba just came back online, though it's very, very sloooooow.
Anyway, it's progress.
Patrick Perdue
Basically, all this happened because at least one drive has completely failed, and another one very badly wants to. This pool has two-disk fault tolerance, so I have to do stuff now, right this very now, to prevent data loss. But, for whatever reason, I got no notifications. No beeps, no emails or push notifications. Anything that should have alerted me completely failed to do so.
On top of that, the CMOS battery just happened to die, which turned out to, I think, be mostly unrelated.
I've just ordered four 12TB IronWolf drives. Can't afford to buy eight in one go, so I'll replace half now, half later, obviously one disk at a time.
Patrick Perdue
in reply to Patrick Perdue • • •Continuing to play with this messed up NAS, because I can't put it down:
I have no idea how I'm going to deactivate drives and replace them in the prescribed way. The Synology web management portal is brutally slow, often times out internally, and sometimes, elements of the page just don't appear at all for minutes at a time in it's current state. There is also a potential accessibility problem highlighting a drive in the storage manager, when I can even get to the storage manager. I say potential, because I can't tell if it's not working due to accessibility, or because the interface is so ridiculously slow in it's current broken state. This is so much fun!
Edit:
It was the thing being slow. I'll have to be very careful replacing the drives on this thing.
Patrick Perdue
I've been running a full S.M.A.R.T. test on one of the failing drives of the NAS since around 1:30 AM.
The ETA was just a bit over 10 hours.
It's been stuck at 90% (granularity is only in 10% increments) since this morning at around 11:30.
Four of the new hard drives will be here tomorrow.
Will this thing finish its test before then? Stay tuned... or don't.
Patrick Perdue
Drive 1 of the first set of 4 has been physically replaced, and is now theoretically being integrated into the RAID.
Too bad I can't figure out where to go in the stupid DSM web management interface to get status on that so I know when to replace the next drive, and email notifications are all still broken. The fix for the notification system requires a DSM update, which I don't want to do while it's repairing the RAID.
Patrick Perdue
The rest of the drives haven't shipped yet, though. Just replacing this one drive will buy me a little time, anyway.
Patrick Perdue
The rest of the initial set of four replacement drives showed up today, so I am now repairing the RAID after replacing drive 8. Drives 6 and 4 will be replaced next, then, eventually, the other four, probably next month.
Worryingly though, it seems to think that drive 7, which I just replaced, has bad sectors. I'll scan it again after this is done. Maybe it's a holdover in the log somewhere from when the original drive 7 was basically unusable.
[>....................] recovery = 0.7% (43711056/5855691456) finish=1318.3min speed=73474K/sec
It's taking so long for each drive, because my storage pool is at 87% capacity.
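The finish estimate in that recovery line can be reproduced from its other fields. This is a small Python sketch, assuming the two numbers in parentheses are completed and total 1K blocks and that the speed is in K per second, which is how the kernel's mdstat output is normally read; `recovery_eta_minutes` is a name invented for this example.

```python
import re

def recovery_eta_minutes(line):
    """Estimate remaining rebuild time from an mdstat-style recovery line."""
    match = re.search(r"\((\d+)/(\d+)\).*?speed=(\d+)K/sec", line)
    if match is None:
        raise ValueError("not a recognised recovery line")
    done, total, speed = (int(v) for v in match.groups())
    # Remaining blocks divided by blocks per second gives seconds.
    return (total - done) / speed / 60

line = ("[>....................] recovery = 0.7% (43711056/5855691456) "
        "finish=1318.3min speed=73474K/sec")
eta = recovery_eta_minutes(line)
print(f"about {eta:.0f} minutes, or {eta / 60:.1f} hours")
```

The result lands within a fraction of a minute of the reported finish value, so each drive in this set needs most of a day to rebuild at the current speed.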
Patrick Perdue
As I suspected, drive 7 doesn't actually have any bad sectors.
The new drive 8 has now been fully integrated. Drive 6 has now been replaced.
Only one more out of this set after this one. The next four, when I get them, will go faster.
[>....................] recovery = 0.0% (3435332/5855691456) finish=1431.1min speed=68153K/sec
Patrick Perdue
The last of the first set of four new hard drives for the NAS has now been installed, and is currently processing.
After this, I get a level-up in the volume's storage capacity to 43.6 TB, which will increase again when I get the next set of four 12TB drives.
[>....................] recovery = 0.2% (12360500/5855691456) finish=1285.1min speed=75776K/sec
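The 43.6 TB figure can be sanity-checked with a rough capacity model. The Python sketch below rests on two assumptions that the thread never states outright: that the original drives are 6 TB each (suggested by the roughly 5.86 billion 1K blocks in the recovery lines), and that the pool is a hybrid layout with two-disk tolerance that stacks RAID 6 style layers across drives of different sizes. The function `usable_tb` is a hypothetical, simplified model, not Synology's actual algorithm.

```python
def usable_tb(drive_sizes_tb):
    """Rough usable capacity for a layered, two-disk-tolerance pool."""
    usable = 0
    previous = 0
    for size in sorted(set(drive_sizes_tb)):
        # Drives big enough to take part in this layer.
        members = sum(1 for d in drive_sizes_tb if d >= size)
        if members >= 4:  # assumed minimum for a two-parity layer
            usable += (members - 2) * (size - previous)
        previous = size
    return usable

mixed = usable_tb([6] * 4 + [12] * 4)   # after the first four swaps
full = usable_tb([12] * 8)              # after all eight swaps
to_tib = lambda tb: tb * 1e12 / 2**40   # decimal TB to binary TiB
print(f"mixed: {mixed} TB ({to_tib(mixed):.2f} TiB)")
print(f"full:  {full} TB ({to_tib(full):.2f} TiB)")
```

Under those assumptions the mixed pool comes out at 48 decimal TB, which is a little under 43.7 when expressed in binary units, in line with the figure quoted above once some system overhead is taken off.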