

koitsu
Premium,MVM
join:2002-07-16
Mountain View, CA
kudos:23
reply to robman50

Re: video card failing?

I'm talking about the RAM that's on the actual video card.


robman50

join:2010-12-14

makes sense.


robman50

join:2010-12-14
reply to koitsu

 
I did the VMT test and it completed with no errors and I did it the way you said.
This is the ALL IN WONDER series of the X800.
The GPU and PCB temp didn't go higher than 51C


koitsu
Premium,MVM
join:2002-07-16
Mountain View, CA
kudos:23

Okay, then that rules out video card memory as the source of problems. Thanks for the screenshots too -- those definitely help give some details about the card/etc.

Not sure what to say at this point other than at least the RAM on the card itself looks OK.

I should note that the memory tester is not the same thing as a stress test. It doesn't really stress the GPU much (you might think it would but it doesn't). For actual stress/load testing things like OCCT / OCCTPT do the job.
--
Making life hard for others since 1977.
I speak for myself and not my employer/affiliates of my employer.


robman50

join:2010-12-14

Is that the next step? Oh, I should mention that I have overclocked the CPU from 3200 MHz to 3360 MHz. Should I go back to 3200 MHz to run OCCT?



koitsu
Premium,MVM
join:2002-07-16
Mountain View, CA
kudos:23

Does the problem you're experiencing go away if you stop overclocking?


robman50

join:2010-12-14

nope, it really makes no difference.


robman50

join:2010-12-14
reply to koitsu

I am going to run OCCT now. Is there a certain test I should be running?


robman50

join:2010-12-14

said by robman50:

I am going to run OCCT now. Is there a certain test I should be running?

I would guess GPU:3D for 1 hour with DX9, since it is an older card. Any other settings?

robman50

join:2010-12-14
reply to koitsu

Ran OCCT and within 2 minutes I got 61306 errors. GPU temp got up to 95C and PCB got up to 60C.
Settings I used were DirectX 9, Shader Complexity 8, error check enabled.


robman50

join:2010-12-14
reply to koitsu

After a bunch of tests and changing the settings I have figured out that once the GPU reaches 90C I get tons of errors and if the GPU reaches 91.3C the system completely freezes up.
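For anyone wanting to repeat this kind of threshold hunt, here is a minimal Python sketch of the idea: collect (temperature, error count) pairs from several runs and report the lowest temperature at which errors appear. The function name and the sample readings are hypothetical, made up for illustration; they are not actual OCCT or GPU-Z output.

```python
from typing import Iterable, Optional, Tuple


def first_error_temperature(samples: Iterable[Tuple[float, int]]) -> Optional[float]:
    """Return the lowest temperature (deg C) at which any errors were seen.

    Each sample is (gpu_temperature_c, error_count). Returns None if no
    sample reported errors.
    """
    temps_with_errors = [temp for temp, errors in samples if errors > 0]
    return min(temps_with_errors) if temps_with_errors else None


# Hypothetical readings, noted by hand during test runs (illustrative only).
readings = [(72.0, 0), (85.5, 0), (89.0, 0), (90.0, 3), (91.0, 40)]
threshold = first_error_temperature(readings)
print(threshold)
```

With readings like these the function returns the first temperature at which errors show up, which is the number to stay under when judging whether a cooling fix worked.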

If the card is failing I would think it would be due to overheating damage.

GPU Fan is running at 100%.

What are some signs of overheating damage?



aurgathor

join:2002-12-01
Lynnwood, WA
kudos:1

said by robman50:

What are some signs of overheating damage?

Bunch of errors....

Now that you know the root cause, you can either reinstall the heatsink and the fan (and perhaps get a higher CFM fan and a bigger heatsink) or just install a different video card.

Not sure if you can do these, but slightly *lowering* the voltage and the clock speed of the GPU should make it run cooler.
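As a rough illustration of why that helps: dynamic power in CMOS chips is commonly approximated as proportional to voltage squared times clock frequency. The sketch below uses that textbook approximation only; it ignores leakage (static) power, and the 5% and 10% figures are hypothetical, not settings for this card.

```python
def relative_dynamic_power(voltage_ratio: float, clock_ratio: float) -> float:
    """Estimate dynamic power relative to stock, assuming P ~ C * V^2 * f.

    voltage_ratio and clock_ratio are new/stock, e.g. 0.95 for a 5% reduction.
    Capacitance C is assumed unchanged, so it cancels out.
    """
    if voltage_ratio <= 0 or clock_ratio <= 0:
        raise ValueError("ratios must be positive")
    return voltage_ratio ** 2 * clock_ratio


# Hypothetical example: 5% lower voltage and 10% lower clock.
estimate = relative_dynamic_power(0.95, 0.90)
print(f"{estimate:.3f}")
```

Under those assumptions the estimate comes out to about 0.81, i.e. roughly 19% less dynamic power, which is why a small voltage drop tends to matter more than the same percentage drop in clock speed.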
--
Wacky Races 2012!

robman50

join:2010-12-14

I have removed the cooler, cleaned the dust out, wiped the old paste off the GPU, put fresh thermal paste (Arctic Silver) on the GPU and put the cooler back on. I also noticed the HDD IDE cable was blocking the cool air from getting to the GPU cooler. I have also cleared away the dust from the CPU, PSU, HDD, AGP slot and motherboard. I tried to round the floppy cable for better airflow. Lastly, I removed one back plate so the GPU cooler can blow the hot air out of the back of the case.

Running the system idle and GPU-Z reports PCB is 42C and GPU is 45.6C.


robman50

join:2010-12-14

1 edit
reply to robman50

Here are a bunch of pictures of the video card and the cooler.

robman50

join:2010-12-14

and one more picture.


koitsu
Premium,MVM
join:2002-07-16
Mountain View, CA
kudos:23
reply to robman50

The first thing I notice is how there's no HSF contact on any of the RAM modules. Some older (and even some present) video cards were designed this way, but vendors eventually realised that it's generally not a good idea. After-market coolers rarely address this (depends on what you get).

The second thing I notice is what appears to be an exposed TIM pad in this picture. I'm talking about "the silver square" on the back. I can't tell if it's TIM or foam or what -- there's no way for me to tell without physically inspecting it.

Did the original HSF that came with this card cover both sides of the card or "wrap around" the card (example)?

I'm also amused that there's no HSF coverage on the RAM (and the jury's still out on the TIM pad or whatever it is on the back), yet there are gold heatsinks on what are probably the VRMs. Heh.


robman50

join:2010-12-14

That silver square feels soft. I thought it was to protect the back of the GPU because that cooler has a bracket that clamps on to the card.

»euroalps.eu/technology/Computing···ink.html

»www.gideontech.com/content/articles/303/1

The card came to me with an older computer system that I took in so I do not know much about the parts.



koitsu
Premium,MVM
join:2002-07-16
Mountain View, CA
kudos:23

I had no idea what the card looked like with the HSF assembled on it, so thank you for the pictures -- yes, it's just a foam pad then, not TIM. Pshew.


robman50

join:2010-12-14

Here is what the picture of the card on the cardboard box looks like.

robman50

join:2010-12-14
reply to koitsu

said by koitsu:

I had no idea what the card looked like with the HSF assembled on it, so thank you for the pictures -- yes, it's just a foam pad then, not TIM. Pshew.

I figured taking tons of pictures is better than taking too few.

robman50

join:2010-12-14

2 edits
reply to koitsu

I had to reset the BIOS back to defaults because the system was starting to lock up when the GPU temps were low.
Wow, this box is just full of surprises. lol
What is the next item to test? I can test the HDD and RAM, and that's about it. I don't own or have access to a multimeter to test the PSU. What are the chances of the CPU going bad? Maybe I just need to format and reinstall Windows XP?



koitsu
Premium,MVM
join:2002-07-16
Mountain View, CA
kudos:23

I would say the CPU is probably not what's bad. You're able to boot into Windows and run GPU-Z and run VMT without crashes. Your pre-POST crashes could indicate a bad CPU, but I don't think so. A bad CPU causes massive havoc at almost all times, and the behaviour is usually noticed pre-POST, immediately after POST, or *definitely* as Windows boots/loads drivers.

You could test system RAM if you wanted using memtest86+ (download the pre-built ISO, burn it, boot it -- nothing else you need to do). Let it run until "Pass xx%" has exceeded 100% or has wrapped back to 0%. This usually takes a few hours. What you're looking for is something like this. If this tool locks up hard then the problem probably isn't RAM-related but rather "something else" (PSU, voltages, motherboard, etc. -- but not the GPU. GPU isn't heavily used in memtest86+, obviously).

PSU testing is only helpful if you can test the PSU while it's in use, i.e. an "inline" or "passive" test. I have one of these which is not an inline tester (this device has said "OK!" many times with PSUs which were absolutely 100% bad when put under even light load). I have a Fluke MM myself but I am completely/entirely afraid of anything pertaining to electricity (really!) so with PSUs I tend to just buy another and re-test.

If I had to take a wild guess at the mess of a system? It'd be that you have two actual problems:

1. Possibly bad motherboard. Northbridge or southbridge, VRMs, or some other anomaly (cracked traces somewhere between layers, etc.). This is really hard to diagnose, and often manifests itself as the system just flat out locking up hard. Since VRMs are involved, and those are what provide voltage to your GPU, it's possible that a busted VRM could cause OCCT to report errors (since your video card is powered off the AGP bus, not off the PSU directly).

2. Your GPU is by far running way too hot. Rather than futz with this I would recommend just buying another video card. This is hard to do since you're limited to AGP (good luck finding AGP cards these days -- I'm sure some are still made but ha!).

Let me ask you something very bluntly: are you willing to replace the parts in this system with something more recent, assuming you can do so on a budget with some help (from me)? I have hardware right now which I could send you (mostly free of charge -- some you would need to pay for however) that could build you a reliable replacement system (Socket 775, PCI Express, etc.). Specifically: motherboard, CPU, and video card. The board uses DDR3 RAM (one of the few Socket 775s which do) and I don't have any of that for you, but it's *super* cheap right now. I have the details typed out and ready to go, so just say yes/no.


robman50

join:2010-12-14

Well, I have always thought that the motherboard was the problem. Would this type of problem come and go? Like, one day it would act up and then be good for a month or so?
About the video card, I can always put my old X800 (CrossFire edition, still have no idea why I got that model) PCI Express card back into the PCI Express slot (like how it was a few years ago).
I have used memtest86+ tons of times in the past, so yeah, I boot it and just let it run for 2 passes.



koitsu
Premium,MVM
join:2002-07-16
Mountain View, CA
kudos:23

We don't know what the root cause of the problem is, so it's hard to say whether or not the issue can come and go. But yes, generally speaking with circuits and ICs and things like VRMs, an issue can indeed "come and go" depending on what the real (scientific) root cause is.

How would you insert a PCI Express card into an AGP slot? I was under the impression your motherboard only has an AGP slot, and I'm not aware of any motherboards that offer both PCI Express and AGP slots (I wouldn't be surprised if one existed, but still).

Let me know about my last paragraph too, please.


robman50

join:2010-12-14

It's one of the joys of a hybrid board. It comes with AGP 8x and PCI Express x4. Also DDR and DDR2 slots.

Here is the link for the full specs.
»www.asrock.com/mb/VIA/775Dual-880Pro/


robman50

join:2010-12-14
reply to koitsu

Going to run memtest86+ now, I tossed the image on to a floppy disk.


robman50

join:2010-12-14

said by robman50:

Going to run memtest86+ now, I tossed the image on to a floppy disk.

I ran it and I did 2 passes without errors. So the system RAM is good.

robman50

join:2010-12-14
reply to koitsu

So the drive is failing. I'm not surprised since it is really old. Are there any utilities to fix it? Will remapping the sectors work, or chkdsk?
Or should I just reach into my box of spare IDE drives and pull out another?


koitsu
Premium,MVM
join:2002-07-16
Mountain View, CA
kudos:23

This hard disk is not "failing". 1 remapped LBA in a total of 3231 power-on hours is really not that surprising given how old the drive is (manufacturing date). The drive has no other anomalies (I'm not sure how to read SMART attribute 3 on a Maxtor drive of that age; if the drive is truly taking almost 15 seconds to spin up, that might explain some strangeness during pre-POST, but I could be interpreting the attribute wrong -- that drive is VERY OLD!!!)

Any data at the remapped LBA, at the time of the remap, was lost. That would be 512 bytes of data. The LBA is usable but is now remapped to a different physical sector. Any writes to that LBA will function/work just fine.
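To put the "512 bytes" in concrete terms, here is a small Python sketch that converts an LBA number to the byte range it covers on the drive, assuming 512-byte sectors as stated above. The function name and the example LBA 1000 are hypothetical; the actual remapped LBA on this drive is not known from the thread.

```python
from typing import Tuple

SECTOR_SIZE = 512  # bytes per LBA, as stated in the post


def lba_to_byte_range(lba: int, sector_size: int = SECTOR_SIZE) -> Tuple[int, int]:
    """Return the (first_byte, last_byte) offsets covered by one LBA."""
    if lba < 0:
        raise ValueError("LBA must be non-negative")
    start = lba * sector_size
    return start, start + sector_size - 1


# LBA 0 is the very first sector of the drive.
print(lba_to_byte_range(0))     # (0, 511)
# Hypothetical LBA, for illustration only.
print(lba_to_byte_range(1000))  # (512000, 512511)
```

Whatever file happened to occupy that one 512-byte range at the time of the remap is the only data affected; everything outside the range is untouched.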

The only way to ensure that no file/software on the filesystem is using the remapped LBA (meaning "to ensure the file whose data was stored in that LBA no longer has 512 bytes of zeros when it should have legitimate data") is to replace the file -- and doing that is easy if you know what file it is (there's no way to determine this without a checksumming filesystem), or -- the more common way -- to reinstall Windows entirely.

The hard disk also has nothing to do with your system randomly locking up hard / crashing, or your GPU temperatures being skyhigh. The hard disk having a reallocated LBA would not explain the system crashing after POST.

If the reallocated LBA was LBA 0, your system would never ever boot into Windows. So, the hard disk LBA reallocation is not the source of your problems, nor is it much of a concern at this point in time.

Finally, and for a third time: please see the last paragraph of my post and please answer me.


robman50

join:2010-12-14
reply to koitsu

said by koitsu:

Let me ask you something very bluntly: are you willing to replace the parts in this system with something more recent, assuming you can do so on a budget with some help (from me)? I have hardware right now which I could send you (mostly free of charge -- some you would need to pay for however) that could build you a reliable replacement system (Socket 775, PCI Express, etc.). Specifically: motherboard, CPU, and video card. The board uses DDR3 RAM (one of the few Socket 775s which do) and I don't have any of that for you, but it's *super* cheap right now. I have the details typed out and ready to go, so just say yes/no.

No thanks, that is okay. If it were to just die and stop working, I would scrap this old P4 and replace it with my Core i5 2500K system, which I just got to replace this old spare system.