BSOD issues Nitro 5 AN515-54

Unknownhost
Unknownhost Member Posts: 17 Troubleshooter
edited December 2023 in 2020 Archives
Hi all, hoping for some more help. A little background to maybe help resolve my issue.

So about 2-3 months ago I put a new Crucial MX500 2.5" 1TB SSD and 8GB of RAM into my system, boosting it up to 16GB of RAM and adding storage, while still keeping the factory-installed 256GB M.2 SSD. Everything was working fine for about a week. After that I got a random crash whilst playing a game, followed by a "no bootable device" error. After restarting, the laptop booted up fine and continued to work as usual. I did some virus checks, ran disk checks etc. after following guides on the web (I'm no expert and am very new to trying to diagnose my system) and nothing was found. So I continued to use it, and it was fine again for about 2-3 weeks this time.

This is where my issues are now arising. For about a month and a half, nearly every day without fail, I get a BSOD (only once a day, and then it works fine again), and each time a different error pops up. Here is the list of the ones I've managed to get pictures of before it restarts: System service exception (what failed: nvlddmkm.sys); System service exception (no reason given), twice; System service exception (what failed: win32kfull.sys). There have also been a few others where I couldn't get my phone out quickly enough before the reboot. One was a kernel issue, and the one I got today while watching Sky Go was a thread issue, I believe, but I cannot remember the full message.

As said, I have run disk checks, run PowerShell (I think that's what it was called), run virus checkers, and updated all drivers both manually and automatically where needed, following all the guides I have managed to find on the specific issues. I've run hardware checks and everything says it's working fine. I've also raised the voltage back up, as the system was undervolted due to the temperature issues these devices seem to have, and I am now at a loss as to what else I can do.

The only thing I haven't done, which all of these guides leave as a last resort, is a fresh reset, in case it is an OS issue. I do not want to do this, as I don't want to lose everything I currently have on the laptop. It is only game services and clients, but I don't want to have to reinstall all of them unless it's a dead certainty that this will resolve my issues.

So I guess after all this info my question is: can anybody think of anything else that might be causing these issues, and a way to fix it using a dummies' guide? Or is it better to start fresh with all the new hardware in and see if that fixes it? I have so much installed now that I have no idea what software could be causing the issues, so I can't go down the uninstall, test, reinstall route I have seen mentioned. I was OK with it before, as it happened once a day and then ran fine, but now I really want to get to the bottom of it, as the laptop was only bought in March 2020.

Well, thank you to anyone who takes the time to read and respond. Maybe one of you knows something I haven't seen, or has had the same issue and can help me out. :)


Answers

  • aphanic
    aphanic Member Posts: 959 Seasoned Specialist
    We could look at running some tests to see if we find something that could be causing it. You may have done some of them already, but I'd like you to run them again, for good measure.

    First thing I'd like to know is the full model number of your Nitro. We know the series for now, an AN515-54, but we don't really know what processor is in there, for example. The letters and/or numbers that follow would tell us the sub-model. For example, I have an A515-54G-70Y9, and that last part is enough to know what my laptop had out of the box.

    Now, I'd like to know details on those RAM sticks you have, because you said you added 8 GB to the other 8 you already have and I just want to double check that they're actually compatible, or mostly compatible at some level. Sometimes getting mixes like those can cause problems, which is why I always tend to upgrade using matched pairs and end up selling the one I had before.

    It may be nothing, they may be perfectly compatible, but it doesn't hurt to know more about them. Download Thaiphoon Burner (the free version suffices, and you can disregard the warning you'll get whenever you start the program; it's just telling you that you can't hack the memory, so to speak) and post a screenshot of the data it reads from each of the sticks:



    Next, I'd like you to check that your Windows installation is "healthy", not my term, but the next command runs some sanity checks. Open an administrative command prompt or PowerShell (if you right click the Start Menu it's in that list) and run the following command (highlighted in yellow):



    That's the output I'd like to see, no problem at all. Because you'd be running it on an SSD the second command will complete rather quickly, but otherwise it could take a while, which is normal.

    We'll then proceed from there and see what's up ;)
  • StevenGen
    StevenGen ACE Posts: 12,480 Trailblazer
    edited August 2020

    Firstly, "Unknownhost", all the BSODs you have listed (e.g. "System service exception, what failed: nvlddmkm.sys" and the win32kfull.sys one) are system service exception blue screens, in which the named .sys file points to a device driver error. nvlddmkm.sys, for instance, is the graphics driver, so these are driver-related issues. The nvlddmkm.sys error (also seen as "Display Driver Stopped Responding and Has Recovered") has to do with your NVIDIA driver, and there are numerous fixes:

    Fixing the nvlddmkm error:

    1. Power Supply.
    2. Changing the Theme.
    3. New Drivers.
    4. Windows Update.
    5. Take out and reseat the RAM, or check for incompatible RAM
    6. Better Cooling
    7. Graphics Card problems

    The easiest way is to first uninstall your NVIDIA driver, then get the latest driver for your Nitro 5 AN515-54 and reinstall it. You can get it either from the Acer site for the Nitro 5 AN515-54, which has the NVIDIA driver version 417.88 dated 2019/09/26, or from the NVIDIA site, which may have a newer driver.

    Also, upgrading the SSD cannot cause any of these BSODs. With the RAM, the one problem you could have (as I see in your screenshots) is that the Hynix DDR4-2666V downbin RAM is a slightly different "Speed Grade" from the Micron Tech RAM. That could be a problem, as they have to be the same. You should have used CPU-Z, looked in the "SPD" section for your OEM RAM's exact specs, and matched the new RAM to the same manufacturer and the same or similar speed and timings; otherwise your new RAM could be incompatible and could cause BSODs. Get CPU-Z and send us current screenshots of both SPD #1 and #3 and of the "Memory" section, as that is a better reference.

    If these steps don't work and you are still getting BSODs, then you have no option except to do a Reset, which offers two options: "Keep my files" and "Remove everything". Either way, what I would do first is separately back up all your personal files to an external backup, e.g. with Macrium Reflect v7 or EaseUS, and do the Reset after.


  • aphanic
    aphanic Member Posts: 959 Seasoned Specialist
    I completely disagree with your CPU-Z statement, @StevenGen. For example, it doesn't even show all of the JEDEC profiles that are embedded in the SPD chip, only the last 4 (or fewer if it's XMP-enabled memory), and it doesn't show some of the other sub-timings. You wouldn't be able to see the subtle difference in tRRDS in there, which, if memory serves, is the activate-to-activate delay between banks in different bank groups.

    In my opinion, it's always better to go to specialized tools (in any field) instead of general ones, especially when they're free. AIDA64, for example, would also list all of the profiles the stick has, its number of ranks and banks and all; but being a paid app, some of the data may only be visible in the paid version.

    Anyway, back to the point: Windows reports the system to be alright, so at first glance I don't see a need for formatting the unit, and in that case a Reset, like Steven suggested, could be a suitable alternative. It's certainly not as thorough as a clean installation, because it keeps certain things like the stuff you have in your personal folders, but it gets rid of all of the apps (the ones you'd hate to reinstall) and, if I'm not mistaken, the content of the registry plus %AppData% and %LocalAppData% as well. So if you have any save games in there that are not synced to an online service, you may need to back them up before proceeding.

    But I can't avoid mentioning the RAM. Although you can see that most of the details of both sticks are the same (they're both 1-rank sticks, with the same organization), the difference in tRC and tRRDS could be what's causing your headaches.

    tRC should just be tRAS (time from activate to precharge) + tRP (row precharge), and you can see it doesn't match in the stick that came with the laptop (I'm assuming it's the one on the left?). It should be 62 instead of 61, just like in the other stick. There are different ways that can happen, since SPD profiles can be specified at the factory or overwritten (that's the warning you got at the beginning; your laptop won't allow you to do that). That stick being a downbin could explain it.
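    The tRC relation above can be written as a quick consistency check. This is only a sketch: the tRAS and tRP values below (43 and 19 clock cycles) are assumed illustrative numbers for a DDR4-2666 profile, not values read from the screenshots in this thread, and the function names are made up for the example.

```python
def expected_trc(tras_cycles: int, trp_cycles: int) -> int:
    """tRC (row cycle time) = tRAS (activate to precharge) + tRP (row precharge)."""
    return tras_cycles + trp_cycles


def trc_is_consistent(trc_cycles: int, tras_cycles: int, trp_cycles: int) -> bool:
    """True when the tRC stored in a profile equals tRAS + tRP."""
    return trc_cycles == expected_trc(tras_cycles, trp_cycles)


# Assumed example timings in clock cycles (hypothetical, for illustration only).
TRAS = 43
TRP = 19

print(expected_trc(TRAS, TRP))           # sum of the two assumed timings
print(trc_is_consistent(62, TRAS, TRP))  # a profile that stores 62
print(trc_is_consistent(61, TRAS, TRP))  # a profile that stores 61
```

    With those assumed timings, a profile reporting 61 would be one cycle short of the sum, which is the kind of mismatch described above.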

    Don't quote me on this, but when processors or memory are not performing well at their rated specifications, instead of being discarded they're rebranded of sorts. In processors, faulty cores can be disabled and the chip sold as a lower SKU, and in memory the chips are down-binned. Instead of rating the sticks to run at 3200 MHz (for example), the manufacturer chooses to make 2666 MHz the max frequency at which they can run by not including any profile for 3200 MHz. They should test that, of course, to confirm the stick performs as expected at those frequencies.

    But that's just it, finding a matching pair for an existing stick can be complicated. Adding to the complication, the sticks you find plugged into laptops by default are often manufactured for the laptop brand itself, so you won't find an exact match unless you go second hand from someone who has upgraded theirs.

    We could have the laptop run some memory testing, maybe: create a bootable USB drive with a memory testing app and leave it running several cycles overnight while you sleep. If you see any errors in the morning, you'd know for sure the RAM you have is misbehaving, or incompatible when working together.

    You could also run a different test: take one of the sticks out and use the laptop for a while. You'll have a performance penalty for sure, because not only would you be running with half the RAM but also in single channel, but if the system becomes stable you'd have another test corroborating the incompatibility between the sticks.
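    To put a rough number on the single-channel penalty, here is a back-of-the-envelope sketch. It assumes the nominal DDR4-2666 rate of 2666 million transfers per second and the standard 64-bit (8-byte) data bus per channel; these are theoretical peaks, not what you would measure in practice, and the function name is made up for the example.

```python
def peak_bandwidth_gb_s(transfers_mt_s: float, channels: int, bus_width_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return transfers_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


single = peak_bandwidth_gb_s(2666, channels=1)
dual = peak_bandwidth_gb_s(2666, channels=2)
print(f"single channel: {single:.1f} GB/s")
print(f"dual channel:   {dual:.1f} GB/s")
```

    Pulling one stick halves the theoretical peak, which is why a drop in performance during that test is expected and is not itself a sign of a fault.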
  • Unknownhost
    Unknownhost Member Posts: 17 Troubleshooter
    Hello there StevenGen

    Thanks for your reply. A lot of what you have said has been lost on me. I'm not great at software stuff, so you'll have to excuse my ignorance and lack of knowledge on this front :)

    So from your list of 1-7, I don't understand what you mean by a few of them.
    1. What do you mean by power supply?
    2. What do you mean by changing the theme?
    3. I use GeForce Experience to update the drivers and am currently on Game Ready driver version 451.67, dated 07/09/2020.
    4. I have everything up to date, checked and double-checked :)
    5. I did that the night I first posted this, as I've seen reports that something could have happened due to heating up and cooling down, though I got another BSOD last night while it was just sat running, doing nothing.
    6. I have had a few issues on this and have undervolted it. I did have it at -125 but have put it up to about -110, as a friend said the card might not be getting enough power. Before undervolting, the system would sit at 90 degrees and sometimes go higher when heavy gaming. Now, with the undervolt, the fans running at max speed, and a cooling fan positioned to blow air at the under tray, it runs at around 65-70 degrees when plugged in and heavy gaming, and lower when unplugged.
    7. Hopefully not. I've only had it for about 6 months, and since I've added stuff, I guess that if there is an issue with the graphics card I'm up the creek without a paddle, so to speak.

    When I ordered the RAM I did use CPU-Z to find out all the RAM specs, and the difference you spotted (which I also saw) was not mentioned. I might also add that I used this page and was told it would be fine as long as I matched the speed, so that the new stick didn't knock the other one down, and that the Crucial one was the best bet. I was going to buy two sticks and then put the stick already in my device into the second laptop I have, for my partner. So if this is just a RAM stick issue, then maybe I could move the Crucial one to hers and take her stick, which should be identical to mine: they are exactly the same model, bought on the same day, and everything is the same.


    As I have been writing I see that you have also replied, aphanic :)

    After reading, I agree with you both that the factory-fitted RAM stick is slightly different, and these issues all started when I put the new stick in. I think I'm going to (see my last bit to Steven) take the RAM stick from my partner's laptop, give her the Crucial one, and see if that changes things. It's annoying, as when I went to buy the two sticks the supplier wanted an extra £200 over the RRP that Crucial wanted, so I bought just the one instead. But I suppose I might be lucky, as I have two of these systems, so I'm guessing they have the same stick. I will do a quick check on hers to see if they are identical, and if so will go ahead with that first. Is that what you would do first?


    I'm glad it's not an OS issue or something worse (or at least not thought to be, currently).


  • aphanic
    aphanic Member Posts: 959 Seasoned Specialist
    Yep, if you have two sticks that are the same (bought individually, but still) I'd try that first.

    Matching RAM is not cut-and-dried, it can be a tricky business and sometimes even when you get two sticks that are apparently the same (as in, they both were bought in the same store at the same time but individually packed) they can fail! At least by what some memory gurus state, I have never had the misfortune of stumbling upon that.

    But in that regard, if at all possible try to go for matched pairs (those sold in packs already) and sell the one you have on eBay or something; it would be the best course of action towards getting full compatibility. Those that are sold in packs have been binned together and are guaranteed to work in dual (or quad) channel.

    There's a guide that right now is not necessary for you but if you have the need to upgrade another machine may come in handy: https://community.acer.com/en/discussion/608352/guide-how-to-find-out-if-you-can-upgrade-the-ram-and-which-one-you-need

    But! If I understood correctly, your partner has the same laptop as you do, not another Crucial stick, right? Were they both bought at the same site/shop and at the same time? I'm asking because while the machines may look the same on the outside, they could be completely different on the inside, from differently branded (or different speed) RAM sticks, to different LCD panels and even different chips soldered to the motherboard.

    They both should perform similarly, but due to availability, or pricing of the sub-components at the time of manufacturing the manufacturer (Acer in this case) can opt to use different components. All this goes to say that the RAM stick inside that other laptop may not even be Hynix made, but Kingston for example. Still, check the data with Thaiphoon and see if their profiles match, maybe it is a match for the Crucial you have instead, who knows.

    And let's hope that's where the problem stems from, because there's no commonality in the BSODs you were seeing, except the undervolting. That's another thing that could cause unrelated BSODs. You could try reducing the undervolt further, or even leave it at stock and see what happens. If the undervolt led to an unstable system, that could be another cause for what you are experiencing.

    There is a procedure called re-pasting: exchanging the thermal paste that sits between the CPU, GPU and sometimes even VRMs and the cooling assembly (those copper pipes that go to the heatsink near the fan(s)). It is not a complicated procedure, but it involves opening the laptop, removing the cooling assembly, using alcohol (isopropyl is best) to remove the sub-par stock paste, applying a healthy dose of a high-performance one and reassembling the whole thing.

    So while I said it is not complicated, it is somewhat involved, but you can have an experienced friend do it, or watch some of the myriad videos online about the matter to get a grasp of what it is before doing it to your laptop. That is, in my opinion, the best thing you can do to improve the thermals of a machine. No software trick is going to get close to that; we're talking about some 20ºC of difference sometimes.

    Explaining it a bit, what that paste does is help transfer the heat those parts produce to the cooling assembly. The sooner that happens, the cooler the parts run. That's where the better thermal compounds come into play: they are able to conduct heat much more quickly.

    Hell, why do I always end up writing such long posts?! Hahaha, feel free to read it a couple of times or ask about anything you don't understand, I just let my mind roam free sometimes.
  • Unknownhost
    Unknownhost Member Posts: 17 Troubleshooter
    Lol 

    It's cool buddy, the more in-depth explaining is welcome and very helpful.

    Yes, both systems were bought on the same day at the same place, both with the exact same specs, both AN515-54 52NBs, and hers is still running sweet with no issues. It's just mine since the upgrade, so I'm thinking this could well be the issue, going by what you and Steven have said.

    I also bought some Kryonaut when I bought the RAM and SSD and was going to apply it at the same time, but the paste still looked good, it wasn't dried out, so I thought I'd leave it. If the difference can be that good, I might go back in and do that today once I've checked the other RAM, and swap that if it's the same. I'm good at tinkering with things mechanically and do like to take stuff apart and see the inner workings, so this doesn't bother me too much :) It's just the software and everything from that point that I have trouble with haha

    Thank you for the time and help so far. I will update you in a few days after I've done all this and seen how it goes.
  • Unknownhost
    Unknownhost Member Posts: 17 Troubleshooter
    @aphanic

    So it seems that we have resolved the issue :D After changing the RAM stick I've been running for 3 days of heavy gaming, really pushing the system to check things, and no BSOD at all. I also changed the paste and have seen a good result, though I think I might need to redo it, as I got a weird spike in temps whilst running this morning. Before that it was sitting at a steady 50-55 degrees whilst plugged in, with the fans on auto instead of max.
    The one thing I did notice, though, is that the paste I saw, which I thought was on the CPU and GPU, was in fact not. I don't know what chips they were, around the edges of the heatsink, but they were covered in a much thicker pink paste. In all the repasting videos these have had little heat pads on them. Is there a difference between the paste and the pads?

    Thanks again for all the help. I was starting to fear the worst with the BSODs.
  • aphanic
    aphanic Member Posts: 959 Seasoned Specialist
    Aha! While I can't tell you what that thicker pink paste is, it's just another thermal interface between the memory chips (if I were to hazard a guess) and the cooling solution. It is thicker probably because the distance between the two was longer, that's why some employ pads, but the paste should perform much better.

    Full disclosure, I have yet to try those pads that have some graphite in them, but generally speaking thermal compounds in paste form are easily better at conducting heat. Hell, I bet even if you went overboard with the same paste you used for the dies in those parts it would work as well.

    And I'm glad the RAM sticks were compatible after all! You had all of the chances since they were bought in the same shop on the same day, they probably come from the same batch so same pieces. But yes, finding companions for RAM chips can be tricky.
  • Unknownhost
    Unknownhost Member Posts: 17 Troubleshooter
    OK, I'll leave the pads alone then and hope that paste does the job.

    New issue now though: my partner's laptop has now got the BSOD!! After all this, it seems that the RAM is an issue. I cannot see why though, as it's supposed to be compatible. You saw all the details from mine; do you think it's not compatible, or is that RAM stick dodgy?
  • aphanic
    aphanic Member Posts: 959 Seasoned Specialist
    Now that's puzzling, because the Crucial stick is on its own, so it doesn't have to play nice with anybody else and it certainly matches the specs the processor wants (it's a DDR4-2666 module), plus it's nowhere near the theoretical limit of that memory controller.

    Could it be that the stick is malfunctioning? Or somehow incompatible with those systems?

    Is it possible to run a memory test program for some hours while you sleep? I'll give you instructions on how to make a bootable USB with one such program if it is, we could see if it shows anything odd.

    The BSODs, can you make out what they are showing? How big are the minidumps (at C:\Windows\minidump, may be hidden)? Maybe you could upload them (compressed) so I can take a look. There used to be a nice service to analyze them online, but it's been shut down. We're now limited to WhoCrashed and a couple of other programs.
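    If it helps, checking the dump sizes and zipping them up could be done with something like the sketch below. It assumes Python is installed and that the dumps are .dmp files in the folder mentioned above; the function names and the output file name are made up for the example, and reading that folder may need an administrator prompt.

```python
from pathlib import Path
import zipfile


def list_minidumps(folder):
    """Return (path, size_in_bytes) for each .dmp file in folder, oldest first."""
    folder = Path(folder)
    if not folder.is_dir():
        return []  # folder missing or not readable as a directory
    dumps = sorted(folder.glob("*.dmp"), key=lambda p: p.stat().st_mtime)
    return [(p, p.stat().st_size) for p in dumps]


def zip_minidumps(folder, archive_path):
    """Compress every .dmp file in folder into archive_path; return the file count."""
    dumps = list_minidumps(folder)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path, _size in dumps:
            zf.write(path, arcname=path.name)  # store just the file name, not the full path
    return len(dumps)
```

    For example, `list_minidumps(r"C:\Windows\minidump")` would give the sizes, and `zip_minidumps(r"C:\Windows\minidump", "minidumps.zip")` would write a single archive to upload.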
  • Unknownhost
    Unknownhost Member Posts: 17 Troubleshooter
    edited August 2020
    Yeah, that was my thought too. I've emailed Crucial and have to wait a few days for them to get back to me, as it's just gone past the 30-day warranty... typical.

    It's compatible, or it should be. I used the Crucial system check to see what was compatible and it came up with the stick I ordered, so I am at a loss with it now, and she is obviously now saying I need to sort it out haha.

    I could do it through the night whilst I am awake, to be fair (bit of a night owl), but I would need a proper dummies' guide, as I'd have no idea what to do. Plus I might not have a USB device for it, I don't keep those around anymore. What kind of size would I need?

    OK, I'll have to search for them on her system now, as the ones on this laptop won't be any good, will they? Or would the last one from here plus the one from hers be a good idea, as a kind of cross-reference? Or should I just send hers across? The last one and the 3 previous ones on mine are 2054 KB. Do I just zip it up and then send the file?

    And just when I thought I'd sorted it...
    PS. Do you know what the "downbin" bit means on the factory-fitted stick? I can't find any info about it.

     
  • aphanic
    aphanic Member Posts: 959 Seasoned Specialist
    Oh Crucial system check... how I dislike that utility haha, or rather we have a love-hate relationship.

    Don't get me wrong, it may work fine many times, but I've found it unreliable. In my case for example it tells me to "upgrade" to an SSD that is slower than what I'm currently running, but it's Crucial branded of course. And wrongly tells me that I have 4 GBs of memory soldered to the motherboard when there's no such thing... but the terms we agree to before using the app already state that it can be wrong and insecure (and they wash their hands responsibility-wise, don't even state which data they collect, everything is too hush hush for my taste, doesn't even comply with GDPR for European countries).

    If it were brand agnostic for example, something developed by a 3rd party that not only suggested things from a brand but gave various options (and was more accurate) I could get behind it, but in its current state I'd just stay out of it. But I see things from my perspective, I understand all of the things that one has to take into account are too much for the average Joe so I see the appeal and usefulness of that app too.



    Anyway, I'm a night owl too haha, although I keep trying not to be, but it's like I'm wired up this way, nights are much more calm and peaceful. Maybe I should just relocate, move to a place where my usual bed time is the regular one over there :D

    I suggested overnight because those kinds of apps run without an OS or anything, barebones, so the computer can't be used while it's doing the testing. But it could be anytime you don't need to use the computer; it's just that it takes a couple of hours, because it writes several patterns through the whole memory and reads them back to see if there's a "mistake". If I remember correctly you could specify the number of cycles you want it to run, or it was endless, but the testing can be stopped at any time. I'll get you some details tomorrow on how to do it, with screenshots and the like. Maybe it isn't needed this time, but if you ever suspect a RAM chip could be failing, that'd be a way to know.
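    To give an idea of what "runs several patterns and reads them back" means, here is a toy sketch. It is not a real memory tester: real ones boot without an OS and exercise the physical RAM, while this only walks over a Python object. The `StuckBitMemory` class is a made-up fake with one faulty bit, included purely so the check has something to catch.

```python
PATTERNS = (0x00, 0xFF, 0xAA, 0x55)  # all zeros, all ones, and two alternating-bit patterns


def pattern_test(memory, patterns=PATTERNS):
    """Write each pattern to every cell, read it back, and collect mismatches.

    Returns a list of (offset, expected, actual) tuples; empty means no errors.
    """
    errors = []
    for pattern in patterns:
        for offset in range(len(memory)):   # write pass
            memory[offset] = pattern
        for offset in range(len(memory)):   # read-back pass
            actual = memory[offset]
            if actual != pattern:
                errors.append((offset, pattern, actual))
    return errors


class StuckBitMemory:
    """Hypothetical fake memory in which one bit at one offset is stuck at 0."""

    def __init__(self, size, bad_offset, stuck_mask):
        self.cells = bytearray(size)
        self.bad_offset = bad_offset
        self.stuck_mask = stuck_mask

    def __len__(self):
        return len(self.cells)

    def __setitem__(self, offset, value):
        if offset == self.bad_offset:
            value &= ~self.stuck_mask  # the faulty bit never stores a 1
        self.cells[offset] = value

    def __getitem__(self, offset):
        return self.cells[offset]


print(pattern_test(bytearray(64)))                 # healthy buffer
print(pattern_test(StuckBitMemory(64, 10, 0x04)))  # fake buffer with a stuck bit at offset 10
```

    Using several patterns matters: in the fake above, the stuck bit only shows up with the patterns that try to set that particular bit to 1.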

    Size for the USB would be minimal, hell I think old versions even fit in floppy disks so while there are several versions of it (and I think Windows comes with one too, I'll look into that first because there may not be a need for anything), even an old 64 MB USB stick would do.



    For the minidumps yep, it would have to be from the system that is failing, preferably hers because it is the one that only has the Crucial stick and nothing else. They vary in size, if they are too big maybe we have to abandon the idea, but I'll tell you how to look for anything useful over there if it's even possible.



    That downbin... complicated to know without working in the industry, but I can give you an educated? guess. My take on it is that there were imperfections in the memory chips that make up that RAM stick during manufacturing (as in, they were intended to be used in a stick that was rated for higher clock speeds for example, like 3200 MHz) so instead of discarding them, they're set to be run at a lower speed in which they run well and per spec.

    Defects in manufacturing of semiconductors are common, for example when switching to a new manufacturing process or a new node the yield could be very low (see the lack of Ice Lake processors when they were announced for example, I bet Intel had trouble getting them made), there could be many that don't perform up to spec. There's something that's called "risk production" before the mass production phase in which no real production ready chip is made, but the process is already being tested. Even in mass production, at the testing phase is when products are binned or grouped according to how they perform. Some may be discarded, others destined for different products. Maybe that's what downbin means.
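    As a toy illustration of that guess about binning: each chip is tested, and it gets labelled with the highest speed grade it can run stably. The grade list and the "max stable speed" numbers below are hypothetical, chosen only to show the idea, and this is not a claim about how any manufacturer actually does it.

```python
SPEED_GRADES = (3200, 2933, 2666, 2400)  # hypothetical grades a product line might sell


def assign_bin(max_stable_speed, grades=SPEED_GRADES):
    """Return the highest grade the chip can run at, or None if it meets no grade."""
    for grade in sorted(grades, reverse=True):
        if max_stable_speed >= grade:
            return grade
    return None  # below every grade: would be discarded in this toy model


# Hypothetical test results for four chips intended for the 3200 grade.
for measured in (3300, 3000, 2700, 2000):
    print(measured, "->", assign_bin(measured))
```

    In this toy model, a chip meant for 3200 that is only stable up to 2700 ends up in the 2666 bin, which would be a "downbin".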

    It's happened in processors before, I think it was AMD but I may be mistaken so take it with a grain of salt; some quad core processors had defective cores so instead of just discarding the things they were labeled and sold as dual core processors with those cores disabled. Back in the day people would try to hack their firmware to "enable" those defective cores, but nowadays when that happens the cores are physically disabled instead.

    And it's not misleading by the way, because you get what you buy, it's as if you were buying an i5 from Intel, it would be labelled as such and perform as expected, but in reality the chip could be an i7 that just didn't perform as expected in its range. This analogy is purely hypothetical, but you get the drift.

    And there goes another monster of text to be read (facepalm emoji)