02Pilot
02Pilot Dork
4/11/18 5:07 p.m.

My previously reliable-as-gravity Win7 box (Win7SP1-64, I5 4690K, 16GB DDR3, MSI Z97) started giving me BSODs yesterday. I ran Clonezilla and imaged the OS SSD (120GB Toshiba), just in case; during this process I got a bunch of CPU hardware errors, but the image tested OK. Had all sorts of boot failures after that, but multiple restarts seemed to get that sorted out.

Temps reported OK, but I took the box out and blew out the dust, reseated the RAM, power, and SATA connectors. I had installed new updates yesterday, so I planned to do a System Restore, but somehow this seems to have been disabled, so I have no restore point. Instead, I manually uninstalled the two updates. It locked up when I restarted in the "Windows is preparing your computer" screen. I hard rebooted after about 10 minutes of waiting. It's running now, but who knows when the next BSOD will appear.

What's my best course of action here? I've got the Win7 installer disk (full retail version, not OEM). I suspect a hardware issue based on what I've been reading, but I'm not sure how to narrow it down from there, since everything seems to test OK. If there are logs from the previous crashes I can post them if someone will tell me where to find them. The OS SSD is pretty full - would that have an impact on this? (I've got a larger SSD on order, will clone the install when it arrives). I'm reading that it's possible to repair the Win7 install without nuking the whole thing - if it's worth doing, what's the best way? Suggestions in reasonably plain English (I know a little, but not that much) would be most appreciated.

GameboyRMH
GameboyRMH GRM+ Member and MegaDork
4/11/18 5:15 p.m.

Yeah, probably a hardware problem. First you should run a memory test to see if your computer is insane. Here's how to run the built-in Windows memory test, which you have to sit and watch to see the results:

https://technet.microsoft.com/en-us/library/ff700221.aspx

Also most Linux LiveCDs come with a memory testing tool that isn't a PITA.

red_stapler
red_stapler Dork
4/11/18 5:40 p.m.

Are there any files in %windir%\minidump? That would be where to start; those crash dumps can be analyzed for a root cause (usually a device driver).
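
If you'd rather not dig through Explorer after every crash, something like the following Python sketch could list whatever dump files exist. It assumes the default dump location of %windir%\Minidump; the function names `list_minidumps` and `default_minidump_dir` are made up for illustration, and it only lists the files (it does not analyze them).

```python
import os
from datetime import datetime
from pathlib import Path


def default_minidump_dir():
    """Build the default minidump path from %windir% (assumed fallback: C:\\Windows)."""
    return os.path.join(os.environ.get("windir", r"C:\Windows"), "Minidump")


def list_minidumps(dump_dir):
    """Return (name, size_bytes, modified_time) for each .dmp file, newest first."""
    folder = Path(dump_dir)
    if not folder.is_dir():
        # Folder missing: no dumps were ever written, or something cleaned them up.
        return []
    dumps = [p for p in folder.iterdir()
             if p.is_file() and p.suffix.lower() == ".dmp"]
    # Sort by modification time so the most recent crash comes first.
    dumps.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [(p.name, p.stat().st_size, datetime.fromtimestamp(p.stat().st_mtime))
            for p in dumps]
```

Calling `list_minidumps(default_minidump_dir())` after a BSOD should put the newest dump at the top; an empty list means nothing was saved or a cleanup tool removed it.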

02Pilot
02Pilot Dork
4/11/18 6:32 p.m.

The minidump directory is empty; this may be due to aggressive CCleaner settings. I'll change that and check after the next BSOD. FWIW, no new devices or drivers have been installed (at least that I'm aware of - virtually all of my driver updates are set to manual).

Memory Diagnostic turned up no problems.

02Pilot
02Pilot Dork
4/11/18 7:31 p.m.

Ran a Malwarebytes scan just in case: nothing found. Also took manual control of the pagefile setup and moved it off the crowded SSD to one of the HDs, which freed up quite a bit of space (SSD is now ~30% free space). Haven't been doing much on the machine this evening, but no further crashes to report at this point.

02Pilot
02Pilot Dork
4/12/18 10:50 a.m.

So in digging through the (incredibly annoying) Windows tools, I discovered that a file called igfxCUIService.exe had crashed a bunch of times recently, corresponding more or less with the times I've had the BSOD or other issues. Apparently this is an Intel graphics driver for onboard chips, which is very odd, considering that my onboard graphics are disabled in BIOS, and thus this shouldn't be running at all AFAIK. Haven't had any new instances of the problem since last night, and igfxCUIService.exe is not currently running (according to the Services tab in the Task Manager). Any ideas why this would be activating itself all of a sudden, and how to prevent it from happening in the future?

TenToeTurbo
TenToeTurbo Dork
4/12/18 11:25 a.m.

Reseat everything. Check for bad capacitors on your motherboard. Try a different power supply. 

GameboyRMH
GameboyRMH GRM+ Member and MegaDork
4/12/18 12:43 p.m.

Are any Intel graphics drivers listed in your add/remove programs list? If so, you could start by removing it there. I suspect the driver crashing is more of a symptom than a cause though. In Win7+, a graphics driver crash shouldn't normally cause a BSOD.

02Pilot said:

Also took manual control of the pagefile setup and moved it off the crowded SSD to one of the HDs, which freed up quite a bit of space (SSD is now ~30% free space). Haven't been doing much on the machine this evening, but no further crashes to report at this point.

This will make the computer much slower; the paging file is one of the last files you want to move off your SSD. Manually setting a fixed size is a good idea though, especially on HDDs. The best performance can be had by disabling the swap file entirely if you have enough RAM, since Windows' swap management is idiotic and it will use some no matter how much free RAM you have.

The0retical
The0retical UltraDork
4/12/18 12:48 p.m.

Are you getting a different BSOD error every time it crashes? I was chasing a problem for a while where I was seeing issues with network stacks, video drivers, memory exceptions and a kernel exception. It turned out the SSD was failing and the Windows disk analysis tool wasn't catching it.

SSDs will do some weird things as they start to go. Boot failures and random driver failures would have me starting there after all that.

lastsnare
lastsnare Reader
4/12/18 12:55 p.m.

I haven't diagnosed a lot of BSOD-related problems, but I do remember trying this once; it might give you some clue as to what exactly exploded.

If I remember correctly, you feed it the minidump file and it tries to make it somewhat human-readable:

https://www.nirsoft.net/utils/blue_screen_view.html

02Pilot
02Pilot Dork
4/18/18 8:56 a.m.

OK, after a week without problems I had another BSOD this morning. No minidump file was saved for whatever reason. I uninstalled the Intel graphics driver and ICSTAgent (the latter showed up as having errors in the event viewer), and updated the Nvidia graphics driver just because. Next step is to install the new SSD and use Clonezilla to make a copy of the current drive onto it.

Regarding the pagefile location, I didn't see any significant slowdown, but I'm happy to get rid of it altogether. Is 16GB of RAM sufficient to do so?

I don't have records of the past BSODs, but I do believe they have not been identical, which led me to suspect the SSD or a heat issue. Everything has been reseated and the dust removed. Fans are operating properly and temps are well within acceptable specs.

What is the preferred tool for checking the integrity of an SSD? Is it chkdsk or something else? It reports good health, but I know that's a fairly superficial guide.

GameboyRMH
GameboyRMH GRM+ Member and MegaDork
4/18/18 9:39 a.m.
02Pilot said:

Regarding the pagefile location, I didn't see any significant slowdown, but I'm happy to get rid of it all together. Is 16GB of RAM sufficient to do so?

Definitely, 8~12GB is enough these days.

Chkdsk would check for file system damage, which can happen independently of the type of storage; it won't check for lower-level physical damage to the SSD (or higher-level file corruption, which is another possibility). Usually the first sign of physical damage to an SSD is a system crash followed by a suddenly unbootable computer and unrecoverable data, but you may have an unusual situation on your hands.

SSDs require special, manufacturer-specific interpretations of SMART data to diagnose problems, so the best way to test one is to use an SSD diagnostic tool from the manufacturer.

If there isn't one, you'll have to use a generic SMART tool like Crystal Diskmark (Edit: Whoops, actually CrystalDiskInfo), wmic or smartmontools and do some research.
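
For the "do some research" part, here is a rough Python sketch of pulling the numbers out of smartmontools output. It assumes the attribute table that `smartctl -A` prints for ATA/SATA drives (columns ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). The function names are hypothetical, and the "suspect" rule is only a generic first pass; what each attribute actually means is still manufacturer-specific.

```python
def parse_smart_attributes(smartctl_output):
    """Parse the attribute table printed by `smartctl -A` into a list of dicts."""
    attributes = []
    in_table = False
    for line in smartctl_output.splitlines():
        if line.strip().startswith("ID#"):
            in_table = True  # header row: attribute rows follow
            continue
        if not in_table:
            continue
        # Split into 10 columns; the raw value (last column) may contain spaces.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # blank line or trailing text, not an attribute row
        try:
            value, worst, thresh = (int(fields[i]) for i in (3, 4, 5))
        except ValueError:
            continue  # skip rows with non-numeric placeholders
        attributes.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": value,
            "worst": worst,
            "thresh": thresh,
            "when_failed": fields[8],
            "raw": fields[9].strip(),
        })
    return attributes


def flag_suspect(attributes):
    """Names of attributes marked as failed, or at/below a nonzero threshold."""
    suspect = []
    for attr in attributes:
        failed = attr["when_failed"] != "-"
        at_threshold = attr["thresh"] > 0 and attr["value"] <= attr["thresh"]
        if failed or at_threshold:
            suspect.append(attr["name"])
    return suspect
```

You would save the output of `smartctl -A` for the drive to a text file (or capture it in a script) and pass the text to `parse_smart_attributes`. An empty list from `flag_suspect` only means nothing has crossed its threshold, not that the drive is guaranteed healthy.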

02Pilot
02Pilot Dork
4/18/18 2:40 p.m.

The SSD is a Mushkin Chronos 120GB. I couldn't find a manufacturer-specific diagnostic tool, but I did run SSDLife Pro on it and it reports everything good with an estimated life stretching to 2026 (not sure just how valid this is, but it's the only SSD-specific tool I found in a quick search). Going to run chkdsk next.

EDIT: I was reading some things about chkdsk and SSDs that were rather unclear about whether this tool is a) useful and/or b) advisable. Some also suggest that if TRIM is enabled (as it is on my SSD) chkdsk is unnecessary. Getting a bit out of my depth here....

GameboyRMH
GameboyRMH GRM+ Member and MegaDork
4/18/18 3:02 p.m.

I looked at SSDLife but it seems to be just a tarted-up SMART tool...it's better than nothing, but not as good as checking the SMART stats yourself.

TRIM has to do with sub-filesystem level SSD optimization; it won't do anything to reduce the need to run CHKDSK. CHKDSK is just as useful and advisable on an SSD as on an HDD.

02Pilot
02Pilot Dork
4/18/18 4:00 p.m.

So just the standard chkdsk /f, or do I need to run some other parameters as well?

The0retical
The0retical UltraDork
4/18/18 4:30 p.m.

You may want to use /r as it looks for bad sectors and also includes /f.

I never had it find anything when mine failed. You may also want to try

sfc /scannow

just to check the integrity of Windows files.
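
If sfc does report something, the details end up in %windir%\Logs\CBS\CBS.log, mixed in with a lot of unrelated servicing entries; the sfc lines are tagged [SR]. A small Python sketch for filtering them is below. The function names are hypothetical, and the "cannot repair" keyword is an assumption about the wording to look for, so adjust it to whatever your log actually says.

```python
def sfc_entries(log_lines):
    """Keep only the lines that System File Checker tagged with [SR]."""
    return [line.rstrip("\n") for line in log_lines if "[SR]" in line]


def sfc_problem_entries(log_lines, keywords=("cannot repair",)):
    """Of the [SR] lines, keep those containing any keyword (case-insensitive)."""
    problems = []
    for line in sfc_entries(log_lines):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            problems.append(line)
    return problems
```

Both functions accept any iterable of lines, so an open file works: `with open(path, errors="replace") as f: print(sfc_problem_entries(f))`, where `path` points at a copy of CBS.log.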

02Pilot
02Pilot Dork
4/18/18 6:11 p.m.

chkdsk and sfc both turned up nothing.

GameboyRMH
GameboyRMH GRM+ Member and MegaDork
4/18/18 6:21 p.m.

I think it's safe to say by now that the SSD is not the problem: the tests turned up no hardware problems, and none of the filesystem or file corruption that a failing drive would likely cause.

02Pilot
02Pilot Dork
4/18/18 9:35 p.m.

If it's not the SSD, any guesses as to other likely culprits? I haven't had a problem since I uninstalled the Intel graphics and Smart Connect drivers, and both showed up in the event viewer as having problems concurrent with the crashes, but I don't know if it was cause or effect.

GameboyRMH
GameboyRMH GRM+ Member and MegaDork
4/19/18 8:59 a.m.

Since it's still most likely a hardware problem, try stress-testing your system to see if that triggers a BSOD. I'd start with a demanding videogame or graphics benchmark to stress everything at once. I accidentally found that Mirror's Edge 2 would expose RAM stability problems before anything else (causing a hard reset). Then if there are crashes we can try component-specific stress tests to narrow it down.

The0retical
The0retical UltraDork
4/19/18 11:53 a.m.

In reply to 02Pilot :

If you're confident it isn't the SSD, the RAM or the power supply would be the next starting points.

RAM you can check by popping out a pair at a time and seeing if the issue continues. Additionally you can run something like MemTest86 or the Windows utility mdsched.exe.

Power supply requires a known good one to check with. It's possible something in it is failing and producing dirty power, which would explain the weird, inconsistent faults.

I'd still be wary of the SSD though just from my experience of chkdsk and sfc not finding anything then having the boot sector fail.

02Pilot
02Pilot Dork
4/19/18 12:32 p.m.

I'm a little wary of the SSD too, just because of the potential for sudden failure.

That said, I just (inadvertently) provoked a BSOD with a very specific action. I plugged in a USB flash drive (data storage only) and it instantly crashed. The only other things open were a couple browsers (one of them running a stream of music), an email client, and two graphics programs - nothing stressful. Certainly points to a hardware problem, but what? I find it hard to believe that one bad USB port could result in crashes when the port was unoccupied.

GameboyRMH
GameboyRMH GRM+ Member and MegaDork
4/19/18 12:36 p.m.

Have you checked for swollen/leaking capacitors on the mainboard already?

02Pilot
02Pilot Dork
4/19/18 12:46 p.m.

Not yet. I won't have a chance to pull it apart again until tonight or tomorrow. Will advise.

02Pilot
02Pilot Dork
4/20/18 8:09 a.m.

OK, visual inspection showed zero indications of anything burned, damaged, or out of place on the motherboard. Clonezilla is currently creating a metal-to-metal copy of the boot SSD onto the new, larger SSD; thus far at least, it is not popping up the system errors that it did when I made the backup. Interestingly, now that I think about it, I had the external drive I made the backup copy on hooked up to the same USB port that invoked the BSOD I had yesterday. Could just be a coincidence, and I think I've had crashes with that port empty, but it does make me wonder a bit.

Once the copy is done I will boot from the new SSD - hopefully without any issues - and go from there. If I still get crashes I'll have to start trying other components.

EDIT: SSD copy was seemingly successful, as the machine is up and running normally from the new SSD (the copy made the new partition the same size as the old drive, which I will fix with gparted eventually). Now the waiting begins....
