Here is a post that is outside of my ordinary vein. It shows the process of isolating a problem, testing solutions, and applying a fix. This is the essence of engineering. Feel free to skip over this if you are not interested in gritty technical details.
Recently, overcome by computer envy, I decided to buy a new computer. There are a plethora of choices available, from new laptops to Do-It-Yourself kits. I decided that since it has been a long time since I built a computer, it was time to try another kit.
Years ago, I routinely built computers from components. I would purchase motherboards, processors, hard drives, power supplies, and other components and put them all together to build a new machine.
However, the continuous advances in technology soon made it a money-losing proposition to build computers from components. It was easier and cheaper to buy a new computer, which usually included software packages, than it was to put a package together.
It is still cheaper in most cases to purchase a fully assembled computer with software included than to build one from a kit. But if you want to mix and match components to get the best performance, a kit is the way to go.
I also wanted to have a chance to test my skills, so it was time to build a new machine.
I ordered a kit from Newegg.com. My components were:
Newegg, as usual, did a great job and most of the components arrived quickly. One drawback was that, because the parts came from multiple sources, my components were shipped separately. This caused a delay in the receipt of the Intel i5 processor.
However, the components still arrived in the promised time.
The assembly went smoothly, although the case required judicious amounts of force to line up the screw holes.
Note that I had ordered a 4 GB SD card. This was to be used as a bootable device for installing a new operating system. As a member of Microsoft’s TechNet community, I can download and use almost any version of the Windows operating system for testing purposes. I chose to install the latest version of 64-bit Windows 7. I used my other computer to download the OS and write a bootable copy of it onto the SD card.
The system booted normally, and I could see the BIOS screen as it came up. The BIOS recognized all of the memory modules, the hard drive, and the SD slots. Inside the BIOS setup, which ASUS has made very user-friendly, I set the system to boot from the SD slot. I wrote down the “out-of-the-box” BIOS version, 1803, so that I could compare it to the latest BIOS on the ASUS website. It would need to be updated to the latest version.
I did note some problems at this point with the memory. The ASUS MB has a system that automatically sets the memory timing so that it can work with different manufacturers’ memory. I had to reboot several times and noted that the memory test was failing. I removed and re-inserted the four G.SKILL SDRAM memory modules and the problem appeared to go away. I wrote it off as a poor connection. I would later find that this was a hint of problems to come.
Once the memory problem was cleared up, I inserted the SD card and rebooted the machine, and proceeded with the Windows 7 installation. The install went smoothly and the system was soon set up with Windows 7.
After connecting to my home network, the next step was authenticating and updating Windows. I also installed Microsoft Security Essentials, a free anti-virus solution from Microsoft.
Soon, more problems appeared. I noted that after the computer went to sleep and was woken up, the keyboard and mouse no longer worked. A reboot of the computer was required to get the mouse and keyboard working again.
In these cases, it is best to make sure the BIOS has been updated to the latest version. After navigating to the ASUS website, I checked for the most current BIOS version for my motherboard. There was some confusion as to the exact model number of my motherboard. The ASUS website has a software selection service that is SUPPOSED to indicate the correct motherboard type and the corresponding BIOS version. The ASUS automatic identification application reported my MB as a P87XXXXX Pro, not the correct P8H67-M EVO, and pointed me to the wrong BIOS for my MB.
Luckily, after downloading the incorrect BIOS and attempting to burn it to the MB, the software was smart enough to stop me from overwriting the original BIOS with the incorrect version. After research showed the problem with the mis-identifying software, I did a manual search and found, downloaded and installed the correct BIOS version. The latest BIOS version for my MB was 2103.
This example serves as a caution not to trust everything on the manufacturer’s website.
After the BIOS update, the machine seemed to be more stable, no longer locking up after entering sleep mode.
I soon ran into other problems. I was using the new computer as a Home Theater, connecting it to my large TV via the HDMI port. The built-in HDMI port was one of the reasons I chose this MB.
The next symptom of instability occurred while streaming video from my Hulu Plus account. I would get about five minutes of good video, then the browser would crash. I was using Internet Explorer 9 at first. It would crash and not recover, requiring a restart of the computer to clear the fault. Updating the browser offered no improvement.
Next, I tried using the Chrome browser from Google. This was an improvement as it would last longer before crashing, and could be recovered without restarting the machine, but the problem with streaming video still persisted.
Research on the Internet indicated that this model MB has a history of problems. I wish I had read all of those reviews on the Newegg site before I ordered the MB. Although there were a lot of recommendations (replace the MB, replace the Intel processor, replace the memory), there were no clear reasons for this type of fault.
Intermittent problems can be the toughest to isolate and repair. The first step is to find a set of conditions that will consistently reproduce the fault. Since my testing indicated the problem occurred when the MB was stressed (due to video buffering), I looked for a way to run stress tests on the MB. ASUS provides a set of software tools to do exactly that. There is the ASUS AI Suite II, which can be used to monitor temperatures and fan speeds, and the ASUS PC Diagnostic Tool, which can be used to perform stress tests of the CPU, memory, and video. I downloaded and installed the software packages.
Note that the software installation was not a simple matter. The first few times I tried installing ASUS AI Suite II, the installation did not work. ASUS responded to many user complaints by issuing another software package to cleanly remove the ASUS AI Suite II program so that a clean re-installation can take place. Other minor problems included: no progress monitor on the installation process, hidden acknowledgment screens, and “hangs” during the installation. After several attempts, I managed to get both software packages installed and working.
By running the Processor Stress test (using ASUS PC Diagnostics) and monitoring the Processor temperature (using ASUS AI Suite II), I could see what was happening to the MB during the test. At first, I used the standard 1 minute test for both the memory and CPU. This did not give me consistent faults. I changed the length of the CPU stress test to 5 minutes and finally was able to cause the computer to lock up every time.
Strangely enough, although the problem was later shown to be the memory, the Memory Stress Test did not consistently fail. The CPU stress test proved to be a better way to test the MB.
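The idea of lengthening a test until the fault shows up every time can be sketched in a few lines of Python. This is only an illustration, not what the ASUS tools do: `run_test` is a hypothetical callback that reports whether the machine locked up, and the simulated machine below (short runs fail only some of the time, runs of five minutes or more always fail) is an assumption made up for the example.

```python
from typing import Callable, Iterable, Optional


def shortest_reliable_duration(
    run_test: Callable[[int], bool],
    durations_min: Iterable[int],
    trials: int = 5,
) -> Optional[int]:
    """Return the shortest duration (in minutes) at which every trial fails.

    run_test(duration) is a hypothetical callback that returns True when
    the machine locked up during a stress test of that length.
    """
    for duration in sorted(durations_min):
        # A duration is only useful if the fault appears on every trial.
        if all(run_test(duration) for _ in range(trials)):
            return duration
    return None


def make_simulated_machine() -> Callable[[int], bool]:
    """Build a fake machine for illustration (made-up behavior)."""
    calls = {"n": 0}

    def run_test(duration_min: int) -> bool:
        calls["n"] += 1
        if duration_min >= 5:
            return True                 # long runs always lock up
        return calls["n"] % 2 == 0      # short runs fail intermittently

    return run_test


reliable = shortest_reliable_duration(make_simulated_machine(), [1, 5])
no_luck = shortest_reliable_duration(make_simulated_machine(), [1, 2, 3])
print(reliable, no_luck)
```

With a real machine, each call to `run_test` would mean a manual stress run and, on failure, a reboot, so the number of trials has to stay small.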
Being able to cause the fault every time was a great step forward, as I now had a method to verify if any changes I made to the BIOS parameters or components would actually improve the functioning of the machine. My five minute stress test showed the Processor temperature was hitting 52 degrees Centigrade (125 degrees Fahrenheit). This is warm, but still much lower than the maximum working temperature of the Processor. Intel’s specs on the Processor show a max of about 72 degrees Centigrade (160 Fahrenheit).
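For anyone who wants to check the temperature figures, here is a small sketch of the Centigrade to Fahrenheit conversion. The 52 and 72 degree values are the ones quoted above; the function and variable names are my own.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert degrees Centigrade (Celsius) to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


observed_c = 52  # peak seen during the five-minute stress test
limit_c = 72     # approximate maximum quoted for the processor

observed_f = celsius_to_fahrenheit(observed_c)
limit_f = celsius_to_fahrenheit(limit_c)
headroom_c = limit_c - observed_c  # margin left before the limit

print(f"{observed_c} C = {observed_f:.1f} F")   # 52 C = 125.6 F
print(f"{limit_c} C = {limit_f:.1f} F")         # 72 C = 161.6 F
print(f"Headroom: {headroom_c} C")              # Headroom: 20 C
```

A margin of about 20 degrees Centigrade is what made overheating an unlikely cause of the lockups.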
I targeted the memory first. This was because I had no spare Processors or MB. Also, since the memory was composed of four DDR SDRAM modules of 4 GB each, they could be removed and rearranged for testing. I removed all but the first module and ran the stress test again.
Success! The stress test ran for the full five minutes without crashing. I then installed a second module, which caused a failure during the stress test. I put this module in the “BAD” pile and installed a third module in the second slot. This module worked normally, causing no problems on the stress test. Installation of the fourth module also caused problems. It looked like my memory had a 50% failure rate.
To verify that the fault was caused by defective memory and not a bad memory slot on the MB, I moved the good memory to different slots. The two previously tested “BAD” modules were bad, no matter which slot they were used in.
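The logic of separating a bad module from a bad slot can be written down as a short Python sketch. It is a simplified variant of what I did: it tests each module alone in every slot rather than adding modules one by one. The module names, slot labels, and the `passes_stress_test` callback are all hypothetical stand-ins for a manual five-minute stress run.

```python
from typing import Callable, Dict, List


def classify_modules(
    modules: List[str],
    slots: List[str],
    passes_stress_test: Callable[[str, str], bool],
) -> Dict[str, str]:
    """Classify each module by testing it alone in every slot.

    passes_stress_test(module, slot) is a hypothetical callback that
    returns True when the stress test completes without a lockup.
    """
    verdicts: Dict[str, str] = {}
    for module in modules:
        results = [passes_stress_test(module, slot) for slot in slots]
        if all(results):
            verdicts[module] = "good"          # works everywhere
        elif not any(results):
            verdicts[module] = "bad"           # fails everywhere
        else:
            verdicts[module] = "suspect slot"  # result depends on the slot
    return verdicts


# Simulated data for illustration only: two of four modules are defective.
modules = ["M1", "M2", "M3", "M4"]
slots = ["A1", "A2", "B1", "B2"]
defective = {"M2", "M4"}

verdicts = classify_modules(modules, slots, lambda m, s: m not in defective)
print(verdicts)
```

A module that fails in every slot points at the module itself, while results that change with the slot would have pointed at the MB instead.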
Still, I was stuck with a 64-bit machine and only 8 GB of memory, after paying for 16 GB. Due to the amount of time I spent testing, I had exceeded Newegg’s 30-day return policy. I thought my options were to continue using the machine with only 8 GB of memory (perhaps selling the defective memory on eBay as “slightly used”), RMA the memory to the manufacturer, or get more memory from Newegg.
Not wanting to inflict this intermittent memory on another poor user, I ordered another 16 GB of memory from Newegg. Why order more memory than I needed? G.Skill recommends replacing the entire memory module kit when problems are found, as they are a matching set.
Upon arrival, I installed the new memory modules, testing them one at a time. They all worked correctly. In accordance with the manufacturer’s suggestion, I replaced the entire set. My computer has been working for several weeks now without issue.
I was able to RMA the old memory to Newegg. I notified Newegg that the memory was defective. They can argue with the manufacturer. After paying the extra shipping and the “restocking fee”, I ended up getting about three-quarters of my money back. Of course, that doesn’t count the endless hours of frustration caused by the bad memory.
With the clarity of hindsight, I revisited Newegg’s “feedback” pages for the G.Skill memory. About 30% of the reviewers of this memory had problems: either one or more sticks were DOA, or there were setup problems due to overclock settings. Although G.Skill offers a “lifetime warranty”, the time and effort needed to send back a full set of memory modules exceeds the savings achieved by buying “cheap” memory.
To their credit, Newegg stands behind their warranty and gave me no trouble with the return. The only problem was the delay inherent in shipping components. I have never had a problem with Newegg’s service. That is why they are my first choice when ordering electronics.