  memtest: how reliable? 
 
Juan Pena Dec 05, 2011, 06:49am EST
I left memtest testing a 512 MB DDR module all night long. When I checked, it had run over 30 passes, and on only two of them did it find something wrong. The other 28 passes all went well.

What am I to think of this? Is the module bad or not? Can memtest also misdiagnose?

thanks in advance,



john albrich Dec 05, 2011, 11:22am EST
Edited: Dec 05, 2011, 11:44am EST
Memtest uses different test patterns and methods as it progresses. The pattern of the data can actually affect the content of adjacent physical memory "locations" in a memory chip. Kind of like dancing next to other people...they can bump into you and affect your dance and cause mistakes. Other things can also affect whether an error occurs (current "location" temperature, adjacent electrical fields, slight voltage or timing variations, whether you're overclocking, etc.).

See http://en.wikipedia.org/wiki/Memtest86 for some basic info on what memtest does and how it works.
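To illustrate why the data pattern matters, here is a toy sketch in Python. It is not how memtest itself is implemented. It simulates a tiny "memory" with one made-up fault: writing a particular value to one cell disturbs its neighbour. The class `ToyMemory`, the function `pattern_test`, and the chosen addresses and trigger value are all hypothetical, picked only for illustration.

```python
class ToyMemory:
    """Simulated byte-addressable memory with one hypothetical coupling fault."""

    def __init__(self, size, victim, aggressor, trigger):
        self.cells = bytearray(size)
        self.victim = victim        # address whose data gets disturbed
        self.aggressor = aggressor  # neighbouring address that disturbs it
        self.trigger = trigger      # aggressor value that causes the disturbance

    def write(self, addr, value):
        self.cells[addr] = value
        # Simulated fault: writing the trigger value to the aggressor
        # flips the lowest bit of the victim cell.
        if addr == self.aggressor and value == self.trigger:
            self.cells[self.victim] ^= 0x01

    def read(self, addr):
        return self.cells[addr]


def pattern_test(mem, size, pattern):
    """Fill memory with `pattern` in ascending order, then return failing addresses."""
    for addr in range(size):
        mem.write(addr, pattern)
    return [addr for addr in range(size) if mem.read(addr) != pattern]


# Try several fill patterns against the same simulated fault.
for pattern in (0x00, 0x55, 0xAA, 0xFF):
    memory = ToyMemory(size=16, victim=10, aggressor=11, trigger=0xAA)
    print(hex(pattern), pattern_test(memory, 16, pattern))
```

In this toy model only the 0xAA fill exposes the fault; the other patterns pass cleanly. A real tester cycles through many patterns and access orders for the same reason.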

As described, it can take a convergence of multiple factors to cause an error, and those factors may not converge each and every time you go through a test pass. As one example, it may take a high temperature AND a specific memory pattern shift to cause an error. Out of 100 passes you may see the error occur only once...but one error during normal operations is all that would be needed to screw up your data, registry, etc.
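A rough way to see what a "once in 100 passes" error means for test length: assuming each pass fails independently with the same probability (a simplification, since temperature and voltage drift over time), you can estimate how many passes are needed before you are likely to see the error at least once. The function name `passes_needed` is just for this sketch.

```python
import math


def passes_needed(p_per_pass, confidence):
    """Smallest number of passes n with P(at least one failing pass) >= confidence.

    Assumes independent passes, so P(no failure in n passes) = (1 - p) ** n.
    """
    if not 0.0 < p_per_pass < 1.0:
        raise ValueError("p_per_pass must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_per_pass))


# An error that shows up in about 1 pass out of 100:
print(passes_needed(0.01, 0.95))  # prints 299
```

So a fault that bites once per 100 passes needs roughly 300 passes before you have a 95% chance of having seen it, which is why a short clean run proves little.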

You can also find various papers on memtest and memory testing in general through internet searches.


As for whether memtest can misdiagnose (false positive)...the answer is yes. ANY program can produce an error given the right conditions, timing, etc. It might be triggered by an OS error, a program sub-routine error (usually programming that doesn't properly handle an 'unexpected' set of conditions), or hardware error in some other part of the computer (e.g. the CPU, memory interface chips, motherboard design sensitivities, etc).

Even the impedance of the address, control, and data lines to the memory module can impact results. For example, a 4GB module might fail (even though the module itself is perfectly fine) whereas a 2GB module in the same location might work without any errors. Similarly, let's assume you have 2x4GB modules in slots 0 and 1. Even changing which of the two 100% OK modules goes in slot 0 could make a difference. Of course, in that instance it strongly suggests your system is too sensitive to varying electrical loads on the memory slots.

However, extended testing may give you some confidence in a specific configuration. You can also try increasing the voltage slightly at a given "speed" to improve reliability...retest...and assess if you can live with a given degree of reliability.

And one needs to keep in mind that a memory error (or errors) may NOT result in immediately detected failures. For example, a given error might not be detected for months until someone notices your over-budget report inaccurately said you only spent $16B instead of the correct $16T (and won't you be embarrassed then) ;)

Juan Pena Dec 06, 2011, 12:27pm EST
thanks John, for your contribution.

Out of 100 passes only 3 show any errors. I think I can disregard those pretty much.

john albrich Dec 06, 2011, 01:43pm EST
I consider that a fairly high error-rate.

If, however, you simply use the machine for gaming or some other non-critical function, and don't care that you may have to re-install things from scratch* from time to time, it might not be a serious issue for a use like that.

But, I personally wouldn't take the risk if the integrity of the data or even the accessibility to data and filesystem, or the reliability of the machine were important to me to any degree.


*You may have to reinstall everything from scratch even if you make backups, because the backups themselves could be compromised due to memory errors.

Juan Pena Dec 07, 2011, 04:29pm EST
thanks John for your comments.

I understand what you mean; the risk, the loss of data, may be more than I am willing to accept.

Anyhow, I stopped the program and rebooted. I have been running memtest now for almost 33 hours, 142 passes.... not one error yet.

Just FYI.

Again, thanks,

john albrich Dec 07, 2011, 07:57pm EST
Edited: Dec 07, 2011, 08:08pm EST
Some things to ponder...I'm not saying this is the situation with your machine...it's more like just showing you how interesting things can get when diagnosing problems.

If you didn't change ANYTHING between the time you got the 3 errors in 100 passes, and the new testing where you've had 0 errors, it suggests you could have stability issues. IF that is the case, I think two of the more likely culprits are typically:

1) a voltage stability issue. Either the PSU or the motherboard's voltage regulators might calibrate slightly differently between Power-On-Resets (PORs). An initial power-on state might influence how well the regulators track and maintain voltage. Varying temperature could also make a difference (for example, if the room temp varied by a few degrees that could be reflected in a slight change in voltage regulation accuracy). It depends on the VR design, component and manufacturing quality, and tolerances of the components used.

2) a mechanically affected issue which could be introduced by significantly "touching" the system or by mechanically moving parts like HDDs, DVDs, etc. OR mechanical/electrical changes affected by temperatures. One example, if the RAM connections were "iffy" (corrosion, etc) then if you had a high-power CPU cooler assembly that vibrated quite a bit at high speeds, it could intermittently improve (or make worse) a corrosion or seating tolerance issue on the nearby RAM modules.
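To put a rough number on why the change from 3 errors in 100 passes to 0 errors in 142 passes looks like "something changed" rather than luck, here is a back-of-the-envelope sketch. It assumes each pass fails independently at the earlier rate, which is a simplification; `p_zero_errors` is a name made up for this example.

```python
def p_zero_errors(p_per_pass, passes):
    """Probability of a completely clean run, assuming independent passes."""
    return (1.0 - p_per_pass) ** passes


rate_before = 3 / 100  # earlier result: 3 failing passes out of 100
clean_run = p_zero_errors(rate_before, 142)
print(f"Chance of 142 clean passes at the old error rate: {clean_run:.1%}")
```

Under that assumption a clean 142-pass run would happen only about 1% of the time, so it is more plausible that the conditions differed between the two runs than that the module simply got lucky.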

A real-life example I can share was a system that operated OK at "normal" moderate temps with occasional and short excursions to fairly "high" temps...but when the CPU/memory was exposed to constant high temperatures over a period of maybe 30 minutes or more, data integrity errors in the filesystem started popping up...but the system never shut down or reported errors during tests. Apparently memtest86+ never got things hot enough to cause the errors. But using prime95 stress-test modes brought the errors out...which ONLY appeared after trying to re-boot the system. The problem symptom was that it would often (but not always) fail to boot after such a test because of file corruption on the boot disk.

You might also try using a program that monitors and logs the high-low voltages and temps of your system over time. One freeware program I like is:
http://majorgeeks.com/Open_Hardware_Monitor_d6396.html
http://openhardwaremonitor.org/
Like most of these monitoring programs, you may have to manually correlate reported values with a given parameter. For example, it may show THREE or more CPU temperatures with generic labels. It will be up to you to determine which temp is which (e.g. CPU Tdie, Tcase, Tcore(n)). You may be VERY surprised by the changes that you see reported in primary voltages like +5, +12, and +3.3 volts over a long period of time (many hours or even days). You may even see voltages that approach or fall outside of the specified tolerance for a given voltage rail (e.g. more than +/-5% or +/-10% depending on the rail). That would be cause for concern, because the monitoring programs only take "snapshots". The highs and lows they log are only the highest and lowest samples, not the absolute extremes. In between sampling intervals a voltage (or temp) could stray from the specified value by quite a lot more than is reported and you'd never know it.
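If you copy the logged readings for a rail into a list, a few lines of Python can check them against a tolerance band. This is only a sketch: the function name `rail_summary` is made up, the sample readings are invented for illustration, and the +/-5% figure is an assumption you should replace with the tolerance that actually applies to your rail.

```python
def rail_summary(samples, nominal, tolerance):
    """Summarize logged readings for one rail against nominal +/- tolerance."""
    low, high = min(samples), max(samples)
    lower_limit = nominal * (1.0 - tolerance)
    upper_limit = nominal * (1.0 + tolerance)
    return {
        "min": low,
        "max": high,
        # True if any logged sample falls outside the allowed band.
        "out_of_spec": low < lower_limit or high > upper_limit,
    }


# Made-up readings for illustration only, not measurements from a real system.
twelve_volt_samples = [12.05, 11.98, 11.71, 12.10, 11.35]
print(rail_summary(twelve_volt_samples, nominal=12.0, tolerance=0.05))
```

Keep in mind that this only sees the samples the monitor happened to log; an excursion between samples will not show up here either.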

