John Carrona - BSOD analyst

www.carrona.org



How To Fix a BSOD
Last updated:  02 September 2017
Added info about imaging the system

How to troubleshoot your own BSOD.

The easiest way that I've found is:
- Update your BIOS/UEFI to the latest available version
- Get ALL Windows Updates
- Get ALL OEM updates (for your system or for your motherboard).  It's better to uninstall, then install a fresh copy.
- Update any attached devices that didn't come with the system or with the motherboard.  It's better to uninstall, then install a fresh copy.
- Update any software that was installed after you first got the system or the motherboard.  It's better to uninstall, then install a fresh copy.
- Run these free hardware diagnostics:  http://www.carrona.org/hwdiag.html  The first few are bootable - so you can do them even if you can't boot to Windows.


If this doesn't fix it, then...
Back up your stuff (as this will wipe the system clean) and then try a clean install of Windows...
[QUOTE]
A clean install is:
- Windows is installed to a freshly partitioned hard drive with legitimate installation media (W10:  https://www.microsoft.com/en-us/software-download/windows10 ).
- The installation media is only a copy of Windows, not the OEM recovery disks that you can make on some systems.
- Windows is fully updated after it's installed.  That's ALL updates - none excepted.
- NO 3rd party software is installed.
- There are no errors in Device Manager (if you find any, post back for suggestions).

This will wipe everything off of the computer, so it's advisable to back up your stuff first.
Also, it will wipe out all the special software that the OEM added to the system, so if you rely on any of that - let us know what it is so we can figure out a way to save/download it (the easiest way is to create/obtain the OEM's recovery media) - but do this BEFORE you start the clean install!

If unable to find recovery media that has the software (or if you suspect that this is a hardware problem), you can make an image of your system that'll preserve everything in the state that it was in when you made the image. One drawback to this is that you're making an image of a malfunctioning system - so, if there are errors in the system software, you'll have a nice copy of them :( Another drawback is that the image of the system will be very large - so you'll most likely need a large external drive to store it on. But, this will allow you to save everything on the hard drive (although you'll need an image viewer to get things out of the image). The point here is that, if it's a hardware problem, then you can restore the system to the point it was when you made the image - after you repair the hardware problem. You can obtain more info on imaging in the Backup/Imaging/DiskMgmt forums located here: http://www.bleepingcomputer.com/forums/f/238/backup-imaging-and-disk-management-software/

The point of doing this (the clean install) is to:
- rule out Windows as a problem (if the problem continues, it's not a Windows problem as you completely replaced Windows)
- rule out 3rd party software (if the problem continues, it's not a 3rd party software problem as you didn't install any 3rd party software)
- so, if the problem continues, it must be a hardware problem.

OTOH, if the problem stops, then it was either a Windows or 3rd party software problem.  If the problem doesn't come back, then you've fixed it.  Then all that remains is setting the computer back up the way that you'd like it and importing your data from the backup you made.
[/QUOTE]


Then, if the problem still happens after doing the clean install, I suggest running hardware diagnostics:
Ensure that you have completed these free diagnostics:  http://www.carrona.org/hwdiag.html
The first few are bootable - so you can do them even if you can't boot to Windows.
If the system is UEFI based (if the system came with W8 or W10, it's most likely UEFI based), post back if you have trouble booting to the media.

Then, if all of the diagnostics pass, try this stripdown procedure to find the problem device:  http://www.carrona.org/strpdown.html
This procedure works well for desktop systems - both OEM and custom-built - but doesn't work well with laptops (as most components are built into the motherboard).
On most laptops the only components you can easily remove are the hard drive, RAM, and wireless card.



More information (Good reading, but it may put you to sleep!)

Here are 2 excellent articles about BSOD troubleshooting from Microsoft.
This one is for Home Users (Microsoft's term):  https://support.microsoft.com/en-us/help/14238/windows-10-troubleshoot-blue-screen-errors
And this one is for more advanced users:  https://support.microsoft.com/en-us/help/3106831/troubleshooting-stop-error-problems-for-it-pros
Also, this is my DRAFT on how memory dumps are generated:  http://www.carrona.org/dumpgen.html
This'll remain a draft until someone comes out with a more definitive article on this.

Please note this description from the last Microsoft link that I posted (these figures seem to have remained constant since the XP days, when they appeared in Windows Internals 4th Edition):
[QUOTE]
Our analysis (Microsoft's) of the root causes of crashes indicates the following:

    70 percent are caused by third-party driver code
    10 percent are caused by hardware issues
    5 percent are caused by Microsoft code
    15 percent have unknown causes (because the memory is too corrupted to analyze)
[/QUOTE]
Other estimates (I can't recall where I found this) were:
- about 90% due to 3rd party drivers
- about 10% due to hardware problems
- less than 1% due to Windows drivers (presuming a fully updated Windows Update)


Third party code is generally non-Windows code.  But there's some third party code (such as processor drivers, storage drivers, etc.) that is included in Windows.  And there is some Microsoft code (such as XBox controller drivers or Office products) that isn't included in Windows.  In short, if the driver didn't come from Windows or Windows Update - then it's 3rd party.  This includes drivers for 3rd party devices/software that are offered from Microsoft Update.

Hardware issues are any sort of problem that revolves around the hardware and isn't influenced by the software.  There is a "grey" area of compatibility - which is usually software problems that manifest as hardware issues.  The easiest way to avoid this is by using hardware that fully supports the OS that you're using, and by keeping the system fully updated (BIOS/UEFI, Windows Update, OEM/system updates, and hardware/software updates).
Here's my list that I generally post for hardware issues:
[QUOTE]
This list is not in any sort of order:
- borked (broken) hardware (several different procedures used to isolate the problem device)
- BIOS issues (check for updates at the motherboard manufacturer's website)
- overclocking/overheating - You'll know if you're overclocking or not. If uncertain we can suggest things to check.
- dirt/dust/hair/fur/crud inside the case.  Blow out the case/vents with canned air (DO NOT use an air compressor or vacuum as they can cause damage to the system)
- missing Windows Updates
- compatibility issues (3rd party hardware/drivers), older systems, or even pirated systems
- low-level driver problems
- or even [b]malware[/b] (scanned for when we ask for hardware diagnostics from http://www.carrona.org/initdiag.html or http://www.carrona.org/hwdiag.html ).
[/QUOTE] 

Microsoft code - this was, in the past, listed as Windows code. 
If this is Windows code, it's fairly simple - if there are no errors after a clean install of Windows that's fully updated and doesn't have any errors in Device Manager, then there's not a problem with Windows (a possible exception is faulty installation media).
If this is Microsoft code, then you'll have to remove the Microsoft stuff that's not part of Windows. 
Drawing this line (between Windows and other Microsoft code) is rather difficult.
XBox controllers and Office software are 2 rather obvious examples.  But what of .NET, SQL, C++, etc.?
My only suggestion here is that if it doesn't come with a clean Windows installation, then it's a Microsoft product that's NOT a part of Windows.



What I do to analyze a BSOD (this is very brief):
Also, my techniques do change - so this description is from 13 August 2017

First, I look at the Speccy report to assess the general state of the system and to see what the temps were.
Then I look at the System Information report (systeminfo.txt).  I look for the OS version, the original install date, the BIOS/UEFI date, the amount of free RAM (Available Physical Memory), and how many Windows Update hotfixes are installed.
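
For anyone who wants to pull those same fields out of a systeminfo.txt report automatically, here is a minimal Python sketch.  It assumes the report is the English-language output of the systeminfo command, with one "Name: value" pair per line.  The function names and the sample text are made up for this example and aren't part of any tool mentioned on this page.

```python
import re

# Field names as they are assumed to appear in English systeminfo output.
FIELDS = (
    "OS Version",
    "Original Install Date",
    "BIOS Version",
    "Available Physical Memory",
    "Hotfix(s)",
)


def summarize_systeminfo(text):
    """Return the fields of interest from systeminfo-style 'Name: value' text."""
    summary = {}
    for line in text.splitlines():
        # Split at the first colon only, so times like 9:15:22 stay intact.
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name in FIELDS and name not in summary:
            summary[name] = value.strip()
    return summary


def hotfix_count(summary):
    """Return the leading number from the 'Hotfix(s)' value, or None if absent."""
    match = re.match(r"(\d+)", summary.get("Hotfix(s)", ""))
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Illustrative sample only, not taken from a real system.
    sample = (
        "OS Version:                10.0.15063 N/A Build 15063\n"
        "Original Install Date:     4/12/2017, 9:15:22 AM\n"
        "Available Physical Memory: 5,210 MB\n"
        "Hotfix(s):                 3 Hotfix(s) Installed.\n"
        "                           [01]: KB0000001\n"
    )
    info = summarize_systeminfo(sample)
    for name, value in info.items():
        print(name, "=", value)
    print("hotfixes:", hotfix_count(info))
```

On a non-English system the field names will differ, so the FIELDS tuple would need to be adjusted.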

Next, I look at the MSINFO32 report, looking at:
- Components--Storage--Drives (looking for drives with less than 15% free space)
- Components--Storage--Disks (depending on the number of Drives I see - more than 3 makes me wonder about the PSU being adequate)
- Components--Problem Devices (have to fix these)
- Software Environment--Windows Error Reporting (look for signs of different errors - remember the difference between user mode and kernel mode when looking here)
- Software Environment--Startup Programs (look for known problem programs/conflicts/compatibility/etc)
- Software Environment--Program Groups (look for known problem programs/conflicts/compatibility/etc)
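
The free space check from the first item in that list can also be done on a running system.  The following is a minimal Python sketch of it: it asks the operating system directly (through the standard library's shutil.disk_usage) instead of reading the MSINFO32 report, and the 15% threshold is simply the figure mentioned above.  The function name and the `usage` parameter are assumptions made for this example.

```python
import os
import shutil


def low_space_drives(paths, threshold=0.15, usage=shutil.disk_usage):
    """Return (path, free_fraction) for each drive with less free space than threshold.

    `usage` is any callable returning (total, used, free) in bytes; it defaults
    to shutil.disk_usage and can be swapped out for testing.
    """
    flagged = []
    for path in paths:
        total, _used, free = usage(path)
        if total and free / total < threshold:
            flagged.append((path, free / total))
    return flagged


if __name__ == "__main__":
    # Check the root of the current drive; on Windows you might pass
    # a list such as ["C:\\", "D:\\"] instead.
    root = os.path.abspath(os.sep)
    for path, fraction in low_space_drives([root]):
        print(f"{path} has only {fraction:.0%} free")
```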

Then, as needed, I look at the other reports for additional information.  Most often I'll look at the KernelDumpList, the different Driver files, and the Application and System Event Viewer logs.
As the Event Viewer logs are in text format, if I need to look deeper, I'll request a copy of the Administrative logfile.
The Event Viewer Administrative log is simply a compilation of the Critical, Error, and Warning entries from all of the logs.
But, it's sortable in the MMC viewer, so it's easier to look through (and the Information entries aren't there to clutter up the search).
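
To make the idea of the Administrative log concrete, here is a minimal Python sketch of that kind of filter.  It assumes the events have already been exported into a list of dictionaries with "level" and "time" keys; that record layout and the function name are hypothetical, chosen only for this illustration, and this is not how Event Viewer itself builds the view.

```python
# Levels that the Administrative view keeps; Information entries are dropped.
ADMIN_LEVELS = {"Critical", "Error", "Warning"}


def administrative_view(events):
    """Keep Critical/Error/Warning entries from all logs, newest first.

    Each event is assumed to be a dict with a "level" string and a "time"
    string in ISO 8601 format (which sorts correctly as plain text).
    """
    kept = [event for event in events if event["level"] in ADMIN_LEVELS]
    return sorted(kept, key=lambda event: event["time"], reverse=True)


if __name__ == "__main__":
    # Illustrative entries only, not taken from a real log.
    events = [
        {"log": "System", "level": "Information", "time": "2017-08-13T10:00:00"},
        {"log": "System", "level": "Error", "time": "2017-08-13T10:05:00"},
        {"log": "Application", "level": "Warning", "time": "2017-08-13T10:07:00"},
    ]
    for event in administrative_view(events):
        print(event["time"], event["log"], event["level"])
```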

Next I look through the dump files.  This is the most difficult part, and the pages at my website ( http://www.carrona.org ) may be helpful.
I use the Sysnative BSOD Processing app to automate a lot of the work here - processing multiple memory dumps and generating reports that help me see what's going on inside the crash dump.

I review the memory dump information and the drivers that are in the dump to see if I can find a pattern.  Included in this is the absence of a pattern - as many hardware issues will present themselves as a series of seemingly unrelated errors.  It's my opinion that there is no such thing as a truly random error in a computer system - it's just that the error is presented in such a way that the pattern is not obvious to us.
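
A rough way to picture this pattern hunting is a simple tally across several dumps.  The Python sketch below is only an illustration of the idea and not the method used by the Sysnative app or by me: it assumes each dump has already been reduced to a dictionary with a "bugcheck" code and a "blamed" module name (both hypothetical field names), and the "scattered" test is a crude heuristic invented for this example.

```python
from collections import Counter


def summarize_dumps(dumps):
    """Tally bug check codes and blamed modules across a list of dumps."""
    stop_codes = Counter(dump["bugcheck"] for dump in dumps)
    blamed = Counter(dump["blamed"] for dump in dumps)
    return stop_codes, blamed


def looks_scattered(dumps, min_dumps=3):
    """True when no bug check code and no blamed module repeats.

    A series of seemingly unrelated errors is treated here as a hint to look
    at hardware; with fewer than min_dumps dumps no conclusion is drawn.
    """
    if len(dumps) < min_dumps:
        return False
    stop_codes, blamed = summarize_dumps(dumps)
    return max(stop_codes.values()) == 1 and max(blamed.values()) == 1


if __name__ == "__main__":
    # Made-up dump summaries with placeholder driver names.
    dumps = [
        {"bugcheck": "0x3B", "blamed": "example_a.sys"},
        {"bugcheck": "0x1E", "blamed": "example_b.sys"},
        {"bugcheck": "0x50", "blamed": "example_c.sys"},
    ]
    stop_codes, blamed = summarize_dumps(dumps)
    print("bug checks:", dict(stop_codes))
    print("blamed:", dict(blamed))
    print("scattered (suggests checking hardware):", looks_scattered(dumps))
```

If one module tops the "blamed" tally in most of the dumps, that's the obvious place to start; if nothing repeats, the hardware diagnostics are the next step.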

You'll note that the dump output doesn't list Windows drivers.  That's because Windows drivers are infrequently the cause, the repairs for Windows problems are fairly straightforward, and (if the repairs don't work) you're going to end up doing a clean install.  Caveat:  I haven't updated the DRT (Driver Reference Table) recently, so newer Windows drivers still appear in the dump analysis output.








© 2017 - John D. Carrona
Forum screen name: usasma