Last updated: 02 September 2017
Added info about imaging the system
How to troubleshoot your own BSOD.
The easiest way that I've found is:
- Update your BIOS/UEFI to the latest available version
- Get ALL Windows Updates
- Get ALL OEM updates for your system or for your motherboard (it's better to uninstall, then install a fresh copy)
- Update any attached devices that didn't come with the system or with
the motherboard (it's better to uninstall, then install a fresh copy)
- Update any software that was installed after you first got the system
or the motherboard (it's better to uninstall, then install a fresh copy)
- Run these free hardware diagnostics: http://www.carrona.org/hwdiag.html The first few are bootable - so you can do them even if you can't boot to Windows.
If this doesn't fix it, then...
Back up your stuff (as this will wipe the system clean) and then try a clean install of Windows...
[QUOTE]
A clean install is:
- Windows is installed to a freshly partitioned hard drive with
legitimate installation media (W10: https://www.microsoft.com/en-us/software-download/windows10 ).
- The installation media is only a copy of Windows, not the OEM
recovery disks that you can make on some systems.
- Windows is fully updated after it's installed. That's ALL
updates - none excepted.
- NO 3rd party software is installed.
- There are no errors in Device Manager (if you find any, post back for
suggestions).
This will wipe everything off of the computer, so it's advisable to
back up your stuff first.
Also, it will wipe out all the special software that the OEM added to
the system, so if you rely on any of that - let us know what it is so
we can figure out a way to save/download it (the easiest way is to
create/obtain the OEM's recovery media) - but do this BEFORE you start
the clean install!
If unable to find recovery media that has the software (or if you suspect that this is a hardware problem), you can make an image of your system that'll preserve everything in the state that it was in when you made the image.
One drawback to this is that you're making an image of a malfunctioning system - so, if there are errors in the system software, you'll have a nice copy of them :(
Another drawback is that the image of the system will be very large - so you'll most likely need a large external drive to store it on.
But, this will allow you to save everything on the hard drive (although you'll need an image viewer to get things out of the image).
The point here is that, if it's a hardware problem, then you can restore the system to the state it was in when you made the image - after you repair the hardware problem.
You can obtain more info on imaging in the Backup/Imaging/DiskMgmt forums located here: http://www.bleepingcomputer.com/forums/f/238/backup-imaging-and-disk-management-software/
The point of doing this (the clean install) is to:
- rule out Windows as a problem (if the problem continues, it's not a
Windows problem as you completely replaced Windows)
- rule out 3rd party software (if the problem continues, it's not a 3rd
party software problem as you didn't install any 3rd party software)
- so, if the problem continues, it must be a hardware problem.
OTOH, if the problem stops, then it was either a Windows or 3rd party
software problem. If the problem doesn't come back, then you've
fixed it. Then all that remains is setting the computer back up
the way that you'd like it and importing your data from the backup you
made.
[/QUOTE]
Then, if the problem still happens
after doing the clean install, make sure that you have completed these
free hardware diagnostics: http://www.carrona.org/hwdiag.html
The first few are bootable - so you can do them even if you can't boot
to Windows.
If the system is UEFI based (if the system came with W8 or W10, it's
most likely UEFI based), post back if you have trouble booting to the
media.
Then, if all of the diagnostics pass,
try this stripdown procedure to find the problem device: http://www.carrona.org/strpdown.html
This procedure works well for desktop systems - both OEM and
custom-built - but doesn't work well with laptops (as most components
are built into the motherboard).
On most laptops the only components you can easily remove are the hard
drive, RAM, and wireless card.
More information (Good reading, but it may put you to sleep!)
Here are 2 excellent articles about BSOD troubleshooting from Microsoft.
This one is for Home Users (Microsoft's term): https://support.microsoft.com/en-us/help/14238/windows-10-troubleshoot-blue-screen-errors
And this one is for more advanced users: https://support.microsoft.com/en-us/help/3106831/troubleshooting-stop-error-problems-for-it-pros
Also, this is my DRAFT on how memory dumps are generated: http://www.carrona.org/dumpgen.html
This'll remain a draft until someone comes out with a more definitive
article on this.
Please note this description from the last Microsoft link that I
posted (these figures seem to have remained constant since the XP days
- found in Windows Internals 4th Edition):
[QUOTE]
Our analysis (Microsoft's) of the root causes of crashes indicates the
following:
70 percent are caused by third-party driver code
10 percent are caused by hardware issues
5 percent are caused by Microsoft code
15 percent have unknown causes (because the memory
is too corrupted to analyze)
[/QUOTE]
Other estimates (I can't recall where I found this) were:
- about 90% due to 3rd party drivers
- about 10% due to hardware problems
- less than 1% due to Windows drivers (presuming Windows is fully
updated through Windows Update)
Third party code is generally non-Windows code. But, some of it
(such as processor drivers, storage drivers, etc.) is
included in Windows. And, there is some Microsoft code (such as XBox
controller drivers or Office products) that isn't included in
Windows. In short, if the driver didn't come from Windows or
Windows Update - then it's 3rd party. This includes drivers for
3rd party devices/software that are offered from Microsoft
Update.
Hardware issues are any sort of problem that revolves around the
hardware and isn't influenced by the software. There is a "grey"
area of compatibility - which is usually software problems that
manifest as hardware issues. The easiest way to avoid this is by
using hardware that fully supports the OS that you're using, and by
keeping the system fully updated (BIOS/UEFI, Windows Update, OEM/system
updates, and hardware/software updates).
Here's my list that I generally post for hardware issues:
[QUOTE]
This list is not in any sort of order:
- borked (broken) hardware (several different procedures used to
isolate the problem device)
- BIOS issues (check for updates at the motherboard manufacturer's
website)
- overclocking/overheating - You'll know if you're overclocking or not.
If uncertain we can suggest things to check.
- dirt/dust/hair/fur/crud inside the case. Blow out the
case/vents with canned air (DO NOT use an air compressor or vacuum as
they can cause damage to the system)
- missing Windows Updates
- compatibility issues (3rd party hardware/drivers), older systems, or
even pirated systems
- low-level driver problems
- or even [b]malware[/b] (scanned for when we ask for hardware
diagnostics from http://www.carrona.org/initdiag.html
or http://www.carrona.org/hwdiag.html ).
[/QUOTE]
Microsoft code - this was, in the past, listed as Windows code.
If this is Windows code, it's fairly simple - if there are no errors
after a clean install of Windows that's fully updated and doesn't have
any errors in Device Manager, then there's not a problem with Windows
(a possible exception is faulty installation media).
If this is Microsoft code, then you'll have to remove the Microsoft
stuff that's not part of Windows.
Drawing this line (between Windows and other Microsoft code) is rather
difficult.
XBox controllers and Office software are 2 rather obvious
examples. But what of .NET, SQL, C++, etc.?
My only suggestion here is that if it doesn't come with a clean Windows
installation, then it's a Microsoft product that's NOT a part of
Windows.
What I do to analyze a BSOD (this is very brief):
My techniques do change - so note that this description is from 13 August
2017.
First, I look at the Speccy report to assess the general state of the
system and to see what the temps were.
Then I look at the System Information report (systeminfo.txt). I
look for the OS version, the original install date, the BIOS/UEFI date,
the amount of free RAM (Available Physical Memory), and how many
Windows Update hotfixes are installed.
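As a rough illustration of that step, here's a minimal Python sketch that pulls those fields out of a systeminfo.txt file. It assumes the English-language labels printed by the Windows systeminfo command; the function name and the sample text are made up for illustration and aren't taken from a real report.

```python
# Labels as printed by the Windows `systeminfo` command on an
# English-language system (an assumption; other locales differ).
FIELDS = (
    "OS Version",
    "Original Install Date",
    "BIOS Version",
    "Available Physical Memory",
    "Hotfix(s)",
)


def summarize_systeminfo(text):
    """Return a dict of the fields of interest found in systeminfo output."""
    summary = {}
    for line in text.splitlines():
        # Top-level lines look like "Label:    value"; split on the first colon
        # so that colons inside the value (e.g. in a time) are preserved.
        label, sep, value = line.partition(":")
        if sep and label.strip() in FIELDS:
            summary[label.strip()] = value.strip()
    return summary


# Made-up sample text for illustration only.
SAMPLE = """\
OS Name:                   Microsoft Windows 10 Pro
OS Version:                10.0.15063 N/A Build 15063
Original Install Date:     4/12/2017, 9:15:02 AM
BIOS Version:              American Megatrends Inc. 1.20, 3/3/2016
Available Physical Memory: 5,120 MB
Hotfix(s):                 4 Hotfix(s) Installed.
"""

if __name__ == "__main__":
    for key, value in summarize_systeminfo(SAMPLE).items():
        print(f"{key}: {value}")
```

This only gathers the values in one place; judging whether a BIOS date is old or the hotfix count is low is still done by eye.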
Next, I look at the MSINFO32 report, looking at:
- Components--Storage--Drives (looking for drives with less than 15%
free space)
- Components--Storage--Disks (depending on the number of Drives I see -
more than 3 makes me wonder about the PSU being adequate)
- Components--Problem Devices (have to fix these)
- Software Environment--Windows Error Reporting (look for signs of
different errors - remember the difference between user mode and kernel
mode when looking here)
- Software Environment--Startup Programs (look for known problem
programs/conflicts/compatibility/etc)
- Software Environment--Program Groups (look for known problem
programs/conflicts/compatibility/etc)
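The free-space check from the Drives item in the list above can be sketched in Python like this. It's only a sketch: the function names are made up, and it checks a live system with shutil.disk_usage rather than reading the numbers out of an MSINFO32 report.

```python
import shutil

LOW_SPACE_THRESHOLD = 0.15  # the 15% figure from the list above


def free_fraction(total_bytes, free_bytes):
    """Fraction of the drive that is free (0.0 for a zero-size drive)."""
    return free_bytes / total_bytes if total_bytes else 0.0


def is_low_on_space(total_bytes, free_bytes, threshold=LOW_SPACE_THRESHOLD):
    """True if the free fraction is below the threshold."""
    return free_fraction(total_bytes, free_bytes) < threshold


def check_drive(path):
    """Check a mounted drive or folder path on the running system."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return is_low_on_space(usage.total, usage.free)


if __name__ == "__main__":
    # "." is just a convenient path that exists everywhere; on Windows you
    # would pass a drive root such as "C:\\".
    print("low on space:", check_drive("."))
```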
Then, as needed, I look at the other reports for additional
information. Most often I'll look at the KernelDumpList, the
different Driver files, and the Application and System Event Viewer
logs.
As the Event Viewer logs are in text format, if I need to look deeper,
I'll request a copy of the Administrative logfile.
The Event Viewer Administrative log is simply a compilation of the
Critical, Error, and Warning entries from all of the logs.
But, it's sortable in the MMC viewer, so it's easier to look through
(and the Information entries aren't there to clutter up the search).
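To show what that compilation amounts to, here's a small Python sketch that filters a list of log entries down to the Critical, Error, and Warning levels and sorts them. The entry layout (the dict keys) and the sample entries are assumptions made for illustration - they aren't an Event Viewer export format.

```python
# Level names as shown in Event Viewer's "Level" column.
ADMIN_LEVELS = ("Critical", "Error", "Warning")


def administrative_view(entries):
    """Keep Critical/Error/Warning entries from all logs, newest first."""
    kept = [e for e in entries if e["level"] in ADMIN_LEVELS]
    # ISO-8601 timestamps sort correctly as plain strings.
    return sorted(kept, key=lambda e: e["time"], reverse=True)


# Hypothetical entries for illustration; sources are made-up names.
SAMPLE_ENTRIES = [
    {"time": "2017-08-13T10:00:00", "log": "System", "level": "Information", "source": "ExampleService"},
    {"time": "2017-08-13T10:05:00", "log": "System", "level": "Error", "source": "ExampleDisk"},
    {"time": "2017-08-13T10:07:00", "log": "Application", "level": "Warning", "source": "ExampleApp"},
    {"time": "2017-08-13T10:09:00", "log": "System", "level": "Critical", "source": "ExamplePower"},
]

if __name__ == "__main__":
    for entry in administrative_view(SAMPLE_ENTRIES):
        print(entry["time"], entry["log"], entry["level"], entry["source"])
```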
Next I look through the dump files. This is the most difficult
part, and the pages at my website ( http://www.carrona.org ) may be helpful.
I use the Sysnative BSOD Processing app to automate a lot of the work
here - processing multiple memory dumps and generating reports that
help me see what's going on inside the crash dump.
I review the memory dump information and the drivers that are in the
dump to see if I can find a pattern. Included in this is the
absence of a pattern - as many hardware issues will present themselves
as a series of seemingly unrelated errors. It's my opinion that
there is no such thing as a truly random error in a computer system -
it's just that the error is presented in such a way that the pattern is
not obvious to us.
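One simple way to look for a pattern across several dumps is to count how often each driver shows up and how many different STOP codes there are. The Python sketch below does just that. It is not how the Sysnative app works; the data layout and the driver names are made up for illustration.

```python
from collections import Counter


def driver_frequency(dumps):
    """Count in how many dumps each driver name appears."""
    counts = Counter()
    for dump in dumps:
        counts.update(set(dump["drivers"]))  # set(): count once per dump
    return counts


def distinct_stop_codes(dumps):
    """Set of STOP (bug check) codes seen across the dumps."""
    return {dump["stop_code"] for dump in dumps}


# Hypothetical data: driver names are made up; the dict layout is assumed.
SAMPLE_DUMPS = [
    {"stop_code": 0x0A, "drivers": ["examplenet.sys", "examplegpu.sys"]},
    {"stop_code": 0x3B, "drivers": ["examplenet.sys"]},
    {"stop_code": 0x50, "drivers": ["examplenet.sys", "exampleav.sys"]},
]

if __name__ == "__main__":
    total = len(SAMPLE_DUMPS)
    for name, n in driver_frequency(SAMPLE_DUMPS).most_common():
        print(f"{name}: in {n} of {total} dumps")
    codes = sorted(hex(c) for c in distinct_stop_codes(SAMPLE_DUMPS))
    print("distinct stop codes:", codes)
```

A driver that appears in every dump was only loaded each time - that's a lead, not proof. Likewise, many different STOP codes with no common driver is the "absence of a pattern" that makes me think of hardware.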
You'll note that the dump output doesn't list Windows drivers.
That's because Windows drivers are infrequently the cause, the
repairs for Windows problems are fairly straightforward, and (if
the repairs don't work) you're going to end up doing a clean
install. Caveat: I haven't updated the DRT (Driver
Reference Table) recently, so newer Windows drivers still appear
in the dump analysis output.