dcohn
Premium Member
join:2004-05-22
North Bergen, NJ

dcohn

Premium Member

[hard drive] RAID 1 Array with Trouble DELL Intel Raid

I have a Dell 780 with Intel RST. He showed me a screen shot that stated Intel Matrix Storage Manager ROM 8.5.2.1002 ICH10R/DO.

It showed the RAID as FAILED, and both drives (Dell-supplied Seagate ST1000DM003-9YN1 1 TB, shown as 931.5GB) were listed with "Error Occurred".

Then the next screen shot is the system running with one of the two drives as active.

I included the three screen shots as attachments. The drives were purchased directly from Dell as supported drives for the OptiPlex 780.

Will smartmontools help in this case?

norwegian
Premium Member
join:2005-02-15
Outback

norwegian

Premium Member

With RAID 1, the array should be able to keep running on just one drive.
Mirroring (RAID 1) allows that.
How you do it on Intel I'm not sure; I had an Nvidia chipset for most of my tests.

On smartmontools, I noticed that these hard drives have a firmware update at Seagate too, found here.
A windows based package to check firmware from Seagate is here.
Smartmontools can produce an offline report that lists the firmware version. I gather this computer has stopped booting to the desktop since the error occurred, and you are looking for help sorting out the error in the BIOS?
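
For anyone wanting to pull the firmware version out of a saved report, here is a minimal Python sketch. It assumes the report was made with smartctl -i and saved to a text file, and that the identity section uses "Key: value" lines such as "Firmware Version:". The sample text and the function name parse_identity are made up for illustration; they are not output from the drives in this thread.

```python
# Illustrative sample only: values are invented, not from a real drive.
SAMPLE_INFO = """\
=== START OF INFORMATION SECTION ===
Device Model:     EXAMPLE-MODEL-1000
Serial Number:    EXAMPLE123
Firmware Version: AB12
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
"""


def parse_identity(info_output):
    """Collect 'Key: value' lines from smartctl -i text into a dict."""
    identity = {}
    for line in info_output.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            identity[key.strip()] = value.strip()
    return identity


identity = parse_identity(SAMPLE_INFO)
print(identity.get("Firmware Version"))
```

The firmware string found this way is what you would compare against whatever version Seagate lists for the update.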

koitsu
MVM
join:2002-07-16
Mountain View, CA
Humax BGW320-500

koitsu to dcohn

MVM

to dcohn
smartmontools can talk "through" the Intel RST/RAID layer to the individual disks, but only by using special device names (e.g. smartctl -a C: will not work). Use smartctl --scan to get some ideas of what you should try (they will likely be /dev/csmiX), then provide output from smartctl -x for both drives in the array. Please store the results in separate text files (one for each drive) and don't copy-paste; that messes up the formatting and makes it harder for me to decipher. For example, once you find the working syntax, just do smartctl ... > C:\drive0.txt for one drive and smartctl ... > C:\drive1.txt for the other, then provide those here as attachments. There is no need to hide or edit the serial numbers; nobody can do anything with them without physically possessing the drive.
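
The scan-then-dump steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the sample scan text is illustrative (your device names and -d types may differ), and the helper names parse_scan, build_commands and save_reports are hypothetical. It keeps only /dev/csmi devices and writes one file per drive.

```python
import ntpath
import subprocess

# Illustrative scan text; real output from smartctl --scan may differ.
SAMPLE_SCAN = """\
/dev/sda -d ata # /dev/sda, ATA device
/dev/csmi0,0 -d ata # /dev/csmi0,0, ATA device
/dev/csmi0,1 -d ata # /dev/csmi0,1, ATA device
"""


def parse_scan(scan_output, prefix="/dev/csmi"):
    """Return (device, type) pairs for scan lines starting with prefix."""
    devices = []
    for line in scan_output.splitlines():
        # Text after '#' is a human-readable comment; drop it.
        fields = line.split("#", 1)[0].split()
        if not fields or not fields[0].startswith(prefix):
            continue
        dev_type = None
        if "-d" in fields:
            i = fields.index("-d")
            if i + 1 < len(fields):
                dev_type = fields[i + 1]
        devices.append((fields[0], dev_type))
    return devices


def build_commands(devices, out_dir="C:\\"):
    """Pair each smartctl -x argument list with its own output file."""
    commands = []
    for n, (device, dev_type) in enumerate(devices):
        args = ["smartctl", "-x"]
        if dev_type:
            args += ["-d", dev_type]
        args.append(device)
        # ntpath builds Windows-style paths on any platform.
        commands.append((args, ntpath.join(out_dir, "drive%d.txt" % n)))
    return commands


def save_reports(commands):
    """Run each command, redirecting its output to the paired file."""
    for args, path in commands:
        with open(path, "w") as fh:
            # smartctl's exit status is a bitmask, so nonzero is not
            # necessarily a failure; do not raise on it.
            subprocess.run(args, stdout=fh, check=False)


for args, path in build_commands(parse_scan(SAMPLE_SCAN)):
    print(" ".join(args), ">", path)
```

On the real machine you would feed parse_scan the actual text from smartctl --scan and then call save_reports on the result, probably from an elevated prompt.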

Chances are only one of them is misbehaving and that the option ROM has a bug in it where for some reason it marks the entire array as failed when only one member is misbehaving. Hard to say. But there's also the chance both drives are misbehaving, but maybe only one is doing it intermittently.

Regardless, you should open a ticket with Dell about this matter. They should be able to step you through exactly what needs to be done, plus get you replacement drives, assuming everything is under warranty. If things are not under warranty, please state that.

P.S. -- You initially said "I" then later used "he". Is this your system or someone else's (i.e. a client)?
dcohn
Premium Member
join:2004-05-22
North Bergen, NJ

dcohn to norwegian

Premium Member

to norwegian
Thanks.

We had weird issues where the RAID would first show drive 0 as failed, then after a restart drive 1 would be failed and drive 0 would be running.

We removed the drive from the RAID and restarted, and it ran chkdsk. He ran chkdsk manually twice more until he had no errors.

The machine boots, but every time he attempts an image backup using Windows Backup it fails toward the end of the process.

Had him run sfc /scannow and it shows unresolved issues. I suggested a reinstall as this is several years old already but the boss wants the setup as it is. It is the boss's machine.

So he is troubleshooting himself right now. I am waiting for updates.

Can drive Firmware cause this type of issue?

Thanks

Doug

norwegian
Premium Member
join:2005-02-15
Outback

norwegian

Premium Member


Isn't it great when you get told no to the very thing that is required to help someone?
Wait till it really crashes and the "I told you so" comes out.

Firmware - depends on what problem they fixed with the update. Answer: yes it can, if the firmware update was critical. I've not researched it, so this is theory; only the hard drive manufacturers can tell you what was fixed.

For your peace of mind, not the boss's, talk to koitsu about smartmontools to check the HDDs.
Then if you can report the drives are failing, there will be nothing this boss can blame on you.

koitsu
MVM
join:2002-07-16
Mountain View, CA
Humax BGW320-500

koitsu to dcohn

MVM

to dcohn
The CHKDSK method may or may not solve the problem permanently. In fact, it may actually create more problems than it solves: if CHKDSK finds issues, it causes write operations to happen, which can affect the general status/state of things at a deeper level. If one (or possibly both) of the drives is experiencing LBA or sector-level anomalies, there is no way for CHKDSK to know about that (unless the LBA is unreadable), so it's possible that there may have been transparent data loss as a result of this. (No CHKDSK output was provided, so I can't analyse that either.)

I would not have advocated doing that. I would have rather seen actual SMART attributes for each of the disks first, then waited on my analysis beforehand. But the "damage" has been done at this point and there's no going back.

If you still want me to do an analysis of the drives, please get smartmontools 6.3 (you need smartctl.exe and drivedb.h) and figure out the syntax in question + provide what I asked in my previous post.

As for whether or not drive firmware could cause this problem: anything is possible. I doubt it's responsible in this scenario. Those models of drives do have at least one major firmware bug (that I know of), but it would not manifest itself in the manner shown here. Would I advocate updating the F/W anyway? Yes, but only after doing full backups. I have a gut feeling what the issue is, but I'd rather not speculate. I'll let you decide which avenue to pursue.
dcohn
Premium Member
join:2004-05-22
North Bergen, NJ

dcohn to koitsu

Premium Member

to koitsu
The machine is used by the boss of the company I work for. The warranty was allowed to lapse against my wishes. The current drives were bought from Dell in Oct 2013 but are only warranted for 90 days when not included as part of a system. Luckily they had the exact same drive in stock, same firmware even, and sold it for $47.00 with next-day shipping. Very nice of them. We bought one and initially tried rebuilding the array, which failed.

I read your last email as well, and I will retain that bit of info about not allowing chkdsk to run when having this type of issue.

Before I request he run smartmontools, I want to be sure you follow that the current system is no longer under Intel RAID. It is running on a single drive, one of the two that were in the RAID 1 set; only that drive was removed from the RAID set, as an attempt to get it to boot.

When neither would boot as a degraded RAID, we saved one of the drives as it was (and still have it; no chkdsk or anything else was run against it), and with the other went into the BIOS and removed the RAID. The system then booted with the single non-RAID drive, but it started running chkdsk. I understand that in this circumstance you do not recommend it, and I assume that means we should have cancelled it before it started?

So now my question is: should I put the leftover single RAID 1 drive into the system and try to run smartmontools? Is there a method you suggest when the drive does not boot? Or should he run smartmontools on the current drive, which is operating but is unable to create a backup image and shows errors on SFC /SCANNOW?

I greatly appreciate your advice in this.

Regards

Doug
dcohn

dcohn to norwegian

Premium Member

to norwegian
Thank you for all the assistance. From a business view spending this amount of time is hard to value yet buying extended warranty is a way to save? Seems backwards.

I would love to know if anyone has a doc that details at what point you cancel chkdsk if it starts on reboot and you did not set it manually. When do you let it run, and when do you stop it?

Again Thank you. I responded to koitsu and think it is wonderful having someone with such skills available.
dcohn

dcohn to koitsu

Premium Member

to koitsu
Thank you. I am waiting to see what his own "research" has discovered. I had asked him for the chkdsk results but they were never supplied. Would they be in the Event Logs? Unless he cannot work, though, he does not take any time out to deal with the actual issue.

Again I thank you and will get you as much proper data as I can based on your suggestions.

Be well

Doug

koitsu
MVM
join:2002-07-16
Mountain View, CA
Humax BGW320-500

koitsu

MVM

It would not be in the event logs. All you'd find there is a computer reboot or possibly a "I ran CHKDSK Hehehehehe" message of sorts. If CHKDSK makes modifications to a filesystem, it can sometimes write that data to a .txt file (maybe it's a .log file), but where it puts that I can't remember. A Windows/Microsoft support forum would be able to answer that, and that's just a sub-part/subsection of what's all required. That's just purely CHKDSK output, it doesn't tell me anything about the condition of the underlying disks in any way/shape/form.

The simple thing to understand is: disks have their own set of characteristics and "health states" that are completely independent of a filesystem. CHKDSK is for filesystems, smartmontools is for disks. This is why having both pieces of information is useful, where the latter is needed to determine if the former should even be run (explained why above, re: CHKDSK makes modifications to the filesystem, thus submits writes to the disk(s), thus can affect the underlying health statistics that the disk keeps track of. But these are completely independent of one another; neither the disk nor the filesystem/CHKDSK has any concept of the other).
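
To make the disk-side view concrete, here is a rough Python sketch that reads a saved SMART attribute table and flags a few sector-related counters with nonzero raw values. Assumptions: the table is in the classic ten-column layout printed by smartctl -A (smartctl -x uses a different, brief layout), the sample rows are invented, and the three watched attributes are just a common starting point, not the analysis koitsu would do. The names parse_attributes and flagged are hypothetical.

```python
# Invented sample rows in the smartctl -A ten-column layout.
SAMPLE_ATTRS = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""

# Assumption: these three counters are the ones worth a first look.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable")


def parse_attributes(attr_output):
    """Map attribute name to its raw-value string."""
    attrs = {}
    for line in attr_output.splitlines():
        cols = line.split()
        # Data rows start with a numeric ID and have at least 10 columns.
        if len(cols) >= 10 and cols[0].isdigit():
            attrs[cols[1]] = " ".join(cols[9:])
    return attrs


def flagged(attrs):
    """Return watched attributes whose raw value is a nonzero integer."""
    result = {}
    for name in WATCHED:
        first = attrs.get(name, "0").split()[0]
        if first.isdigit() and int(first) > 0:
            result[name] = int(first)
    return result


print(flagged(parse_attributes(SAMPLE_ATTRS)))
```

A nonzero count here is a hint that the disk itself has problems, which is a reason to hold off on CHKDSK until the drive has been looked at properly.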
dcohn
Premium Member
join:2004-05-22
North Bergen, NJ

dcohn

Premium Member

Understood.

Thanks for the clear explanation. Seems that smartmontools or similar should be on every system whose stability you are concerned about.

Again thanks very much

Doug
dcohn

dcohn to koitsu

Premium Member

to koitsu
Koitsu -
I have a similar issue as my original post here and I figured I would post my logs here with a question. If this is unacceptable and I need to create a new ticket please let me know and I will.

Otherwise, I have a system where I could not get disk space back from disconnected offline files, several hundred gig. It is a Lenovo W520 laptop with i7, Win7 64, 8 gig RAM, and a 500 gig notebook drive.

I received errors in Lenovo Solution Center stating hard drive errors. I called them and they sent a new blank drive, but I would like to fix the current one. The machine boots and runs seemingly OK, but I have avoided using it for a few months because of the issues (using other laptops I luckily had).

At your advice I did nothing else but run smartmontools. (I used GSmartControl but have the non-GUI version if I must rerun.) I attached the log.