Source Code Project Mantis - FSSCP
View Issue Details
ID: 0003130 | Project: FSSCP | Category: beams | View Status: public | Date Submitted: 2014-11-21 19:07 | Last Update: 2015-12-19 18:06
Reporter: Goober5000
Assigned To: FSO 4
Priority: normal | Severity: major | Reproducibility: sometimes
Status: confirmed | Resolution: suspended
Platform: | OS: | OS Version:
Product Version: 3.7.2 RC4
Target Version: | Fixed in Version:
Summary: 0003130: The insidious beam crash bug
Description: I suddenly started experiencing it myself and now I've noticed some common features. All of the following seem to be necessary:

1) A recent build (latest trunk as of r11178 will do it)
2) A beam begins firing
3) The player is using an NVIDIA card

It occurs on both release and debug builds but NOT when a debug build is run through MSVC 2005, at least. It happens both with and without the 2014 mediaVPs. I've attached a couple of fso_open.log files.

I didn't initially experience this crash when playing through the FS2 campaign on MediaVPs 2014, but then I realized I was playing using the integrated graphics card. As soon as I switched to NVIDIA, the crashes started reliably happening.

This is a show-stopper for the 3.7.2 release as far as I'm concerned.
Additional Information: Threads where the crash has been reported:

http://www.hard-light.net/forums/index.php?topic=88649.0
http://www.hard-light.net/forums/index.php?topic=88662.0
http://www.hard-light.net/forums/index.php?topic=88654.0
Tags: No tags attached.
related to 0000001 (resolved, taylor): CTD With "Target In Reticle"
Attached Files:
log fs2_open-mediavps.log (47,382) 2014-11-21 19:15
http://scp.indiegames.us/mantis/file_download.php?file_id=2614&type=bug
log fs2_open-nomediavps.log (43,030) 2014-11-21 19:16
http://scp.indiegames.us/mantis/file_download.php?file_id=2615&type=bug
jpg crash error.jpg (15,660) 2014-11-22 13:31
http://scp.indiegames.us/mantis/file_download.php?file_id=2616&type=bug
txt call stack.txt (6,326) 2014-11-22 13:31
http://scp.indiegames.us/mantis/file_download.php?file_id=2617&type=bug
zip MVP 2014 patches.zip (4,112) 2014-11-22 14:45
http://scp.indiegames.us/mantis/file_download.php?file_id=2618&type=bug
txt call stack 2.txt (2,458) 2014-11-23 14:30
http://scp.indiegames.us/mantis/file_download.php?file_id=2619&type=bug
txt call stack 3.txt (2,558) 2014-11-23 14:30
http://scp.indiegames.us/mantis/file_download.php?file_id=2620&type=bug
log fs2_open 3.log (48,048) 2014-11-23 14:31
http://scp.indiegames.us/mantis/file_download.php?file_id=2621&type=bug
log fs2_open 4.log (456,666) 2014-11-23 15:15
http://scp.indiegames.us/mantis/file_download.php?file_id=2622&type=bug
txt call stack 4.txt (2,558) 2014-11-23 15:15
http://scp.indiegames.us/mantis/file_download.php?file_id=2623&type=bug
txt call stack 5.txt (2,558) 2015-01-03 21:07
http://scp.indiegames.us/mantis/file_download.php?file_id=2635&type=bug
txt call stack 6.txt (2,558) 2015-01-05 00:31
http://scp.indiegames.us/mantis/file_download.php?file_id=2636&type=bug
patch hud.cpp.patch (1,806) 2015-01-05 02:30
http://scp.indiegames.us/mantis/file_download.php?file_id=2637&type=bug
patch gropengltexture.cpp.patch (791) 2015-01-05 03:31
http://scp.indiegames.us/mantis/file_download.php?file_id=2638&type=bug
patch smother_with_error_checking.patch (2,375) 2015-01-07 17:04
http://scp.indiegames.us/mantis/file_download.php?file_id=2639&type=bug
txt call_stack-2015-01-10.txt (2,558) 2015-01-10 22:43
http://scp.indiegames.us/mantis/file_download.php?file_id=2640&type=bug
log fs2_open-2015-01-10.log (510,630) 2015-01-10 22:43
http://scp.indiegames.us/mantis/file_download.php?file_id=2641&type=bug
txt bmpman_entry.txt (1,404) 2015-02-08 21:04
http://scp.indiegames.us/mantis/file_download.php?file_id=2646&type=bug
txt bm_bitmaps watch.txt (1,328) 2015-02-22 20:16
http://scp.indiegames.us/mantis/file_download.php?file_id=2653&type=bug
patch disable_reuse.patch (582) 2015-02-26 19:56
http://scp.indiegames.us/mantis/file_download.php?file_id=2655&type=bug
txt call stack wo dummy.txt (2,559) 2015-03-04 00:45
http://scp.indiegames.us/mantis/file_download.php?file_id=2663&type=bug
txt call stack w dummy.txt (2,558) 2015-03-04 00:45
http://scp.indiegames.us/mantis/file_download.php?file_id=2664&type=bug
txt call stack w dummy and kills disabled.txt (2,558) 2015-03-04 00:46
http://scp.indiegames.us/mantis/file_download.php?file_id=2665&type=bug
txt different crash call stack.txt (2,453) 2015-03-04 00:46
http://scp.indiegames.us/mantis/file_download.php?file_id=2666&type=bug
7z different crash fs2_open.7z (126,238) 2015-03-04 00:46
http://scp.indiegames.us/mantis/file_download.php?file_id=2667&type=bug
patch 3130.patch (9,855) 2015-03-04 01:06
http://scp.indiegames.us/mantis/file_download.php?file_id=2668&type=bug
patch redundancy.patch (482) 2015-03-05 03:29
http://scp.indiegames.us/mantis/file_download.php?file_id=2669&type=bug
txt callstack w redundancy.txt (2,559) 2015-03-10 00:23
http://scp.indiegames.us/mantis/file_download.php?file_id=2671&type=bug
txt units.txt (651) 2015-03-10 00:23
http://scp.indiegames.us/mantis/file_download.php?file_id=2672&type=bug
patch hudtarget.cpp.patch (1,343) 2015-03-10 23:51
http://scp.indiegames.us/mantis/file_download.php?file_id=2673&type=bug
txt enabling triangles.txt (2,546) 2015-03-15 02:16
http://scp.indiegames.us/mantis/file_download.php?file_id=2674&type=bug
patch swifty.patch (1,553) 2015-03-23 02:08
http://scp.indiegames.us/mantis/file_download.php?file_id=2677&type=bug
patch desaturate_fix.patch (2,863) 2015-04-08 03:49
http://scp.indiegames.us/mantis/file_download.php?file_id=2689&type=bug
txt desaturate-callstack.txt (2,559) 2015-04-08 23:10
http://scp.indiegames.us/mantis/file_download.php?file_id=2690&type=bug
log desaturate-fs2_open.log (404,039) 2015-04-08 23:11
http://scp.indiegames.us/mantis/file_download.php?file_id=2691&type=bug
patch shader_mode_hunch.patch (4,404) 2015-04-09 23:01
http://scp.indiegames.us/mantis/file_download.php?file_id=2692&type=bug
txt hunch-callstack.txt (2,558) 2015-04-10 00:13
http://scp.indiegames.us/mantis/file_download.php?file_id=2693&type=bug
log hunch-fs2_open.log (403,794) 2015-04-10 00:13
http://scp.indiegames.us/mantis/file_download.php?file_id=2694&type=bug
patch hunch2.patch (4,582) 2015-04-10 00:17
http://scp.indiegames.us/mantis/file_download.php?file_id=2695&type=bug
txt hunch2-callstack.txt (2,559) 2015-04-10 23:50
http://scp.indiegames.us/mantis/file_download.php?file_id=2696&type=bug
log hunch2-fs2_open.log (404,961) 2015-04-10 23:50
http://scp.indiegames.us/mantis/file_download.php?file_id=2697&type=bug

Notes
(0016390)
MageKing17   
2014-11-21 20:20   
The second thread linked doesn't fit the pattern because TheHound's logs show him using an Intel HD 4000, not an nVidia card. I think you're also the first person to report that the crash still occurs with a debug build... if running the debug build through Visual Studio fails to reproduce the problem, have you tried attaching the debugger to a release build?
(0016391)
Goober5000   
2014-11-22 13:20   
(Last edited: 2014-11-22 13:22)
Tried that today. It refuses to crash in MSVC whether it's through release or debug.

But since the crash does occur in a debug build, it should be possible to narrow down the problem by use of logging statements. The cause seems to be in the shader system, because the debug log contains shader information immediately prior to the crash, both with and without MVPs.

If someone wants to heavily instrument the shader system with logging statements and provide me a patch, I'd be happy to run tests.

(0016392)
Goober5000   
2014-11-22 13:31   
Actually, let me modify that comment. It doesn't crash if I launch the process from MSVC, but it does crash if I launch the process separately and then attach the debugger.

The crash occurs in opengl_texture_state::Enable, inside the glBindTexture function. The actual source of the crash is in the nvoglv32.dll, for which we don't have the debug symbols. I'm attaching a screenshot of the error as well as the call stack.
(0016393)
Goober5000   
2014-11-22 14:45   
I've uploaded a zip that contains versions of the MVP 2014 tables so that 3.7.0 can run with the MediaVPs 2014. I still get the crash under 3.7.0.
(0016394)
Goober5000   
2014-11-22 16:36   
Updating the NVIDIA drivers did not solve the crash.
(0016395)
m_m   
2014-11-23 11:24   
The call stack shows that the bitmap_handle argument for gr_opengl_tcache_set_internal is 0. If I read bm_get_next_handle() in bmpman.cpp correctly that should never happen.
Is there a mission where this crash happens consistently?
(0016396)
Goober5000   
2014-11-23 12:08   
Yeah, if you load up Exodus (SM3-06) the Nebtuu and Abraxis are firing beams at each other at mission start, so the crash happens almost immediately.
(0016397)
m_m   
2014-11-23 12:50   
I tried running the mission and checking for bitmap_handle==0 in gr_opengl_tcache_set_internal but that condition is apparently never true on my machine.
(0016398)
MageKing17   
2014-11-23 13:38   
It shouldn't be possible for bitmap_handle to be 0 in gr_opengl_tcache_set_internal() because it's only called through gr_opengl_tcache_set(), which specifically checks if (bitmap_handle <= 0) and returns 0 if it is. For some reason, gr_opengl_tcache_set() is missing from the attached call stack, which doesn't seem like it should be possible.
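
The guard described above can be sketched as follows. This is a simplified stand-in, not the FSO source: `tcache_set()` and `tcache_set_internal()` are hypothetical names mirroring gr_opengl_tcache_set() and gr_opengl_tcache_set_internal(), and the body of the internal function is a placeholder.

```cpp
#include <cassert>

// Hypothetical stand-in for the internal function: it assumes the caller
// has already rejected non-positive handles.
int tcache_set_internal(int bitmap_handle)
{
    assert(bitmap_handle > 0);  // would fire if the guard below were bypassed
    return 1;                   // placeholder for "texture set successfully"
}

// Hypothetical stand-in for the public wrapper: invalid handles return 0
// before the internal function is ever reached.
int tcache_set(int bitmap_handle)
{
    if (bitmap_handle <= 0) {
        return 0;
    }
    return tcache_set_internal(bitmap_handle);
}
```

If every call goes through a wrapper of this shape, a handle of 0 in the internal function suggests the debugger is showing a misleading frame or value rather than a real argument.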
(0016399)
Goober5000   
2014-11-23 15:15   
More logs attached.
(0016420)
chief1983   
2014-12-19 10:03   
Goober, if you're getting crashes with debug, attaching a debugger and snooping around could be very very helpful here. Perhaps in conversation with another dev or something.
(0016422)
Goober5000   
2014-12-20 16:29   
I've done that already. See comment #c16392 that I wrote on 11/22. I tracked down the location of the crash and attached several call stacks.

I'm happy to do more collaboration with whoever wants to step up to fix this, but nobody has stepped up yet. And not being a graphics coder, I don't have the expertise to fix this myself.
(0016423)
chief1983   
2014-12-20 16:34   
One of our members mentioned that the crash happens only when directly looking at beam sources. Was this something you observed? If not it might be different bugs.
(0016424)
Goober5000   
2014-12-20 17:29   
That's an interesting note. Every crash that I saw was when I was looking directly at the beam, but I didn't try looking away from the beam.

I don't think that tells us much though. We already knew it was graphics related, and there's no need to draw the beam if you're not looking at it.
(0016425)
MageKing17   
2014-12-20 17:36   
Would be interesting to see if it could be linked to a specific part of drawing the beam (e.g. disable particles and see if it still happens); I'm going to try to reproduce this issue on an nVidia computer today and, with luck, get more data (although I'm also pessimistic the data might be "it doesn't happen on that card").
(0016426)
chief1983   
2014-12-20 17:53   
It's not just the beam itself: he could look at the beam freely and not crash until he looked at its origin. People mentioned that beam warmups had been causing it, so personally I hadn't ruled out that we just had a small sample pool and that it was possibly still audio related, although now I'm more willing to rule that out. But if it's only the origin of the beam, it seems to indicate something with the way the warmup or base animation points are being drawn, or possibly how beam origin animations are interacting with the surrounding ship model. With the description of your callstack you mentioned earlier, perhaps it has to do with the animation somehow failing to have loaded in the first place, and upon trying to draw it to load it ends up crapping out because it isn't there to draw in the first place? I take it no one has had this happen with retail, so that might imply it has to do with DDS textures and PCX are immune. That or the actual difference between how MVP beam glows and retail beam glows are configured is related.
(0016427)
MageKing17   
2014-12-20 19:55   
Null result on a GTX 580; sounds like this problem only affects 600-series and later cards.
(0016428)
Goober5000   
2014-12-20 21:45   
Mine is a Quadro K2100M, if that helps.
(0016429)
chief1983   
2014-12-20 23:23   
That appears to share a chipset with the GTX 660, so that would be in line with the findings. And if you're running Quadro drivers it means they're not any more immune to the issue than the GeForce line drivers.
(0016435)
z64555   
2015-01-03 16:59   
(Last edited: 2015-01-03 17:02)
Thought I would try to help out, but I could not repro the crash on my machine.

Build Info:
 SVN r11204
 MSVS Debug AVX configuration

Test Mission:
 SM3-06

GPUs:
 Intel 4700-MQ
 Nvidia GeForce 770M (driver 347.09)

Assets:
 Retail
 mediavps_3612
 MediaVPs_2014

(0016436)
Goober5000   
2015-01-03 17:04   
If one of the graphics coders is able to do a code review of the section that crashes, that would be helpful even if he can't reproduce the crash on his own machine. Scrutinizing the code may reveal improperly formatted data, a bad API call, a function being called out-of-order, data being accessed after being freed, or any number of things.

Refer to my previous comments and stack traces, particularly this: "The crash occurs in opengl_texture_state::Enable, inside the glBindTexture function."
(0016437)
Goober5000   
2015-01-03 17:49   
It's definitely HUD related. I loaded up Exodus and immediately toggled the HUD off, and the beams fired without crashing. Then I toggled the HUD back on while the beam was firing, and insta-crash, even before the HUD became visible again.

(0016439)
niffiwan   
2015-01-04 02:08   
I notice that the call stacks always show that the Kills Gauge is being rendered, does the crash still occur if just that one gauge is disabled?

(0016441)
Goober5000   
2015-01-04 02:38   
If you can whip up a test mission, I'll test it out. I've been using Exodus (sm3-06) and applying various tweaks.
(0016443)
MageKing17   
2015-01-04 02:54   
Should be as simple as using the in-game options menu and turning it off; hit F2, enter "HUD Config", click on the kills gauge (it's between ETS and CM count), and set it to "off" and see what happens.

(And then, of course, if it doesn't crash, try turning it on while a beam is firing, although at that point I think it'd be a pretty safe bet that it crashes again.)
(0016444)
Goober5000   
2015-01-05 00:33   
Very interesting. That's exactly what happened... starting the mission with the kills gauge disabled prevented the crash, and then re-enabling that gauge while the mission was in progress caused a crash as soon as I hit Accept. New call stack (number 6) attached, but it's pretty much the same as the previous ones.

(0016445)
MageKing17   
2015-01-05 02:30   
(Last edited: 2015-01-05 03:32)
Well, I've gone all over the HudGaugeKills code and can't find anything unusual about it. I did find one oddity in hud.cpp that shouldn't be causing any problems, but just to be safe, I've attached a patch file that removes some unused things. Could you compile a build with the patch attached and see if it affects the problem? I don't expect it to, but who knows what could be affecting it...

EDIT: Another patch attached that adds some error checking near the problem area.

(0016448)
Goober5000   
2015-01-09 10:56   
I haven't had a chance to test the patch yet, but hopefully I can do so this evening.
(0016449)
Goober5000   
2015-01-10 22:44   
Or the following evening. >.> Crashed again in the same place. New log and stack trace attached.
(0016450)
MageKing17   
2015-01-10 22:51   
It looks like it didn't trigger any of the new OpenGL error checking. Did you use the smother_with_error_checking.patch or just gropengltexture.cpp.patch?
(0016451)
Goober5000   
2015-01-11 17:30   
Just the smother_with_error_checking.patch actually. But I just ran a new build with both patches and it looked like I got the same result.
(0016452)
MageKing17   
2015-01-11 17:39   
smother_with_error_checking.patch includes gropengltexture.cpp.patch; clearly, whatever is going wrong here isn't throwing any OpenGL errors.

Someone on the forums reported that the problem has gone away since installing the latest nVidia drivers; is there an updated driver for the Quadro K2100M?
(0016453)
chief1983   
2015-01-11 17:42   
Looks like it. http://www.nvidia.com/download/driverResults.aspx/80141/en-us 341.21, on December 5.
(0016454)
Goober5000   
2015-01-17 21:30   
I installed the driver update. Crash still occurs.
(0016466)
niffiwan   
2015-01-29 07:44   
(Last edited: 2015-01-29 16:33)
There seems to have been another driver update recently; it may be worth testing with it?
http://www.nvidia.com/download/driverResults.aspx/81666/en-us

Version: 347.25 WHQL
Release Date: 2015.1.23

Also, do we know if the problem occurs with mediavps_3612? And do we know the earliest release that it occurs on? (pre 3.7.0)

update: info from krevett62
http://www.hard-light.net/forums/index.php?topic=89060.msg1775367#msg1775367

... for me before updating my nvidia drivers, I was crashing when playing with 3.7.2_RC4 and with MVP 2014 or MVP 3.6.12, but no crash occurred with retail data and the same exe. Also 3.7.0 with MVP 3.6.12 worked fine at this time.

(0016467)
Goober5000   
2015-02-01 02:59   
Updated the driver again. Still crashes.
(0016475)
Inglonias   
2015-02-07 12:04   
(Last edited: 2015-02-07 12:20)
I have this bug, but trying to reproduce it in the debug builds results in the game running just fine. Weird.

Using a GTX 670 with the latest drivers.

EDIT: Reading the other notes reveals that this is not new information. My bad. Any way I can help with just the debug builds and normal builds?

(0016476)
Goober5000   
2015-02-07 12:52   
Debug builds will run on the Intel Integrated graphics card unless you specifically set them to run with the NVIDIA card. I think that's why some people have not seen crashes in debug builds.

I think that all the help testers can give has been given. We know where in the code it crashes, we know about an unusual setting that disables the crashes (turning off the HUD gauge), and we know a possible cause (NVIDIA has said this can happen with badly formatted graphics data).

This bug will only be fixed when a group of graphics coders sit down, take a serious look at it, and make a serious attempt to solve it. It's not going to solve itself, and three separate driver updates have not fixed it.
(0016477)
niffiwan   
2015-02-08 17:09   
(Last edited: 2015-02-08 21:22)
@Goober5000; have you had a discussion with Nvidia regarding this issue? If you've got more info from them it'd be useful to see the entire conversation, and in particular understand what "badly formatted graphics data" refers to. Maybe there's something wrong with the retail hud kills gauge .ani?

I've also had a look through the hud kills gauge code and can't see any obvious issues (apart from the one that MageKing17 has already identified). Coverity is reporting some uninitialised variables in the HUD code class constructors but I can't see anywhere obvious that these are being *used* uninitialised (i.e. they're being initialised outside the constructors). Regardless I'll see if I can put together a patch that'll fix those issues and we can see if it makes any difference or not.

One other idea I had was to investigate the possibility that (any of) the beam textures were corrupting/overwriting the kills gauge textures. Would it be possible for you to check the contents of BMPMAN (when FSO crashes) at the relevant bmpman index (532715 in most of the stack traces; and it's not exactly an index given the modulo operations involved, the real index is probably 532715%4750 = 715) and maybe extract the data into a file to check if it looks like valid data?
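
The handle-to-index arithmetic above can be written out as a short sketch. The table size of 4750 is taken from the modulo in this note rather than read from the FSO source, and `handle_to_index()` is a hypothetical helper name.

```cpp
#include <cassert>

// Assumption: the bitmap table has 4750 slots, as implied by the
// arithmetic in this ticket; the value is not read from the FSO source.
constexpr int kMaxBitmaps = 4750;

// Hypothetical helper: recover the table slot from a bitmap handle
// as seen in a stack trace.
int handle_to_index(int bitmap_handle)
{
    assert(bitmap_handle >= 0);
    return bitmap_handle % kMaxBitmaps;
}
```

For the handle in most of the stack traces, `handle_to_index(532715)` gives 715, since 4750 * 112 = 532000.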

@Inglonias; I think it'd be useful if you could confirm whether you have dual video cards in your system and whether the debug builds are using the Intel card or the Nvidia one.

(0016478)
niffiwan   
2015-02-08 17:31   
(Last edited: 2015-02-08 21:30)
more info; Axem is unable to reproduce the issue.

Nvidia 960
Drivers 347.25
Exodus (SM3-06) with mediavps_2014

So maybe the issue only affects 6xx/7xx series cards and their Quadro equivalents?

edit:

DahBlount also can't repro with a 970

(0016479)
Goober5000   
2015-02-08 21:04   
(Last edited: 2015-02-08 21:05)
@niffiwan: I haven't discussed anything with NVIDIA directly. That comment came from a search on the error message which returned this result:

https://devtalk.nvidia.com/default/topic/720651/opengl/access-violation-in-nvoglv32-dll-how-do-i-track-down-the-problem-/
"Access violations during glDrawArrays or glDrawElements are most often due to incorrectly enabled vertex attribute arrays.
Please check your current state of the enabled vertex attributes very carefully. One which is left enabled inadvertently without providing any or enough data will result in these kinds of errors when sourcing data out of bounds during the draw call."

Good call on checking the contents of bmpman. I looked at bm_bitmaps using the bitmap_handle passed to gr_opengl_tcache_set_internal() at the time of the crash, and it corresponded to kills1.ani. I have attached the rest of the entry as a text file to this ticket but I wouldn't know how to extract the graphical data into a file.

(0016480)
MageKing17   
2015-02-08 21:27   
Except it's not during glDrawArrays() or glDrawElements(), it's during glBindTexture().

Have you tried taking some other ANI file (like, say, time1.ani), copying it to your /data/hud/ folder, renaming it to "kills1.ani", and seeing what happens?
(0016482)
Goober5000   
2015-02-08 23:42   
Yes, I know that glBindTexture is a different function, but the same principle could apply. In the first place, it was a very similar error; in the second place, the forum poster may not have exhaustively listed every single potential crash source; and in the third place, checking that your inputs are valid is a good thing to do no matter what API you are using.

Copying and renaming time1.ani to kills1.ani produced exactly the same crash.
(0016483)
niffiwan   
2015-02-08 23:52   
The odd thing is that there's a heap of other hud gauges rendered in similar fashion. I guess it'll need a more thorough comparison of the exact calls+params used in each case. I'm kinda assuming it's not a stale OGL state issue since then I'd expect it to crash at the next gauge to be rendered when the kills gauge is off.
(0016485)
MageKing17   
2015-02-09 00:23   
"Copying and renaming time1.ani to kills1.ani produced exactly the same crash."

And you can see the background of the kills gauge changed, thereby ensuring that this copy is being loaded and not being overridden by a higher-priority kills1.ani somewhere else in the file tree? Just want to make sure. It sounds incredibly odd for it to not be working when time1.ani renders just fine for the mission time gauge.

Both gauges render in pretty much exactly the same way; they call setGaugeColor(), then renderBitmap() on the appropriate frame. For HudGaugeKills, this renderBitmap() call is apparently crashing whenever a beam is onscreen. For HudGaugeMissionTime, it isn't. Just about the only difference I can see is in how they behave slightly differently if first_frame is < 0, but that shouldn't be relevant when both have background images (or cause any problems even if they didn't).

Just about the only zany testing change I can think of to make at this point is to rearrange retail_gauges[] to swap HUD_OBJECT_MISSION_TIME and HUD_OBJECT_KILLS, just to see if it still crashes during HudGaugeKills.
(0016486)
Goober5000   
2015-02-09 02:22   
(Last edited: 2015-02-09 02:23)
Swapping HUD_OBJECT_MISSION_TIME and HUD_OBJECT_KILLS now results in a crash in the mission-time HUD gauge (time1.ani), but at the same bitmap handle as the kills gauge in the previous test. This lends support to niffiwan's earlier hypothesis about memory overwrites:

"One other idea I had was to investigate the possibility that (any of) the beam textures were corrupting/overwriting the kills gauge textures."

(0016487)
niffiwan   
2015-02-09 05:21   
(Last edited: 2015-02-09 05:42)
OK, so thinking in the same direction, the 1st beams to fire in SM3-06 would be the VSlash on the Nebtuu? Could you try switching out each texture used by the VSlash, one at a time? i.e.

particleexp01
beam-white3
beam-orange2
beam-orange
(I think they're the only VSlash textures that are unique to beams)
(actually, isn't there a beam glow texture loaded somewhere as well?)
(and hopefully I'm looking in the correct tables, there could be a table I've missed in the mediavps_2014 that's overriding the VSlash)

As for what to replace them with; maybe any original retail textures?
ParticleExp01.ani
beam-white3.pcx
beam-orange2.pcx
beam-orange.pcx

The idea is to isolate which texture is the cause of the issue to hopefully narrow down the area of code that needs checking.

In conjunction with this, maybe use a debug_filter.cfg containing the following lines, in order to get some extra info about the texture load process?
+General
+Warning
+Error
+BmpMan
+BmpInfo
+BMP DEBUG
+BmpFastLoad


edit: D'oh! There is, in mv_effects.vp. Maybe disable that single VP as the 1st test?

edit2: I think I've found the correct textures now.
particle_yellow.eff
VasBeam2Core.dds
VasBeam2Glow.dds
VasBeam2GlowHaze.dds
OrangeFade.dds
beamglow6.eff
capflash.eff
exp06.eff
exp04.eff

Maybe try at replacing the .eff files 1st as I recall that EFF loading/handling changed within the last few years. More recently than any DDS changes at least. Disabling MV_Advanced.vp will probably remove a fair few of them.

(0016494)
Goober5000   
2015-02-09 22:00   
The crash occurs regardless of whether the MediaVPs are active or not. In all of my recent tests, I've been running with just the standard no-mods configuration. So they are already all retail textures.
(0016498)
MageKing17   
2015-02-18 20:28   
I don't suppose there's some way to find out exactly what section of memory the kills gauge is stored in and then make FSO break whenever that section of memory changes?
(0016499)
Goober5000   
2015-02-18 22:00   
It's very possible, and in fact I've tracked down some devious bugs that way. But I would need to know what I should be looking for. Memory changes all the time in perfectly normal ways.

I'll give you an example of one such bug. Remember my gigantic campaign to fix memset/memmove/memcpy in 2013? That was prompted by a crash where a pointer had a null value that had no business being null. I set a breakpoint on that pointer to trigger when its value changed, and then I ran the test mission. Upon reaching the breakpoint, I immediately saw that memset was being used incorrectly.
(0016500)
MageKing17   
2015-02-19 13:00   
Well, I was thinking that once it's loaded, the bitmap data shouldn't be changing again, so any change to that memory should be incorrect after the initial loading stage.
(0016501)
chief1983   
2015-02-19 13:10   
So the difference between this and that example is that in that example, the pointer's value was changing, but here, we're looking to see if the data in the memory the pointer points to is changing? I'm guessing that's harder to set up a watch for? Or at least not the same.
(0016502)
niffiwan   
2015-02-19 20:22   
I don't think it'll be much different. Set a data watchpoint on this pointer address, i.e.:

&bm_bitmaps[715]->bm->data

From this page I believe you can set the number of bytes to watch from that point on, which should let you watch all the data for that bitmap.
http://stackoverflow.com/questions/621535/what-are-data-breakpoints

To get the number of bytes I think you just need to read the contents of bm_bitmaps[715].data_size (if I'm reading this correctly BMPMAN_DEBUG is always defined in DEBUG builds)
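
Where a debugger data breakpoint is impractical, the same idea (the bytes should not change after loading) can be approximated in software. The sketch below is an illustrative alternative, not something used in this ticket or present in FSO: `checksum_bytes()` and `BufferWatch` are hypothetical names, and the hash is plain 32-bit FNV-1a.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: 32-bit FNV-1a hash over a byte range.
std::uint32_t checksum_bytes(const unsigned char* data, std::size_t size)
{
    std::uint32_t hash = 2166136261u;   // FNV-1a 32-bit offset basis
    for (std::size_t i = 0; i < size; ++i) {
        hash ^= data[i];
        hash *= 16777619u;              // FNV-1a 32-bit prime
    }
    return hash;
}

// Records a checksum once (e.g. right after a bitmap is loaded) and later
// reports whether the watched bytes still hash to the same value.
struct BufferWatch {
    const unsigned char* data;
    std::size_t size;
    std::uint32_t expected;

    BufferWatch(const unsigned char* d, std::size_t n)
        : data(d), size(n), expected(0)
    {
        assert(d != nullptr || n == 0); // a null buffer can only be watched as empty
        expected = checksum_bytes(d, n);
    }

    bool unchanged() const { return checksum_bytes(data, size) == expected; }
};
```

Calling `unchanged()` once per frame would narrow an overwrite down to a frame rather than to an instruction, so it is a coarser tool than a real data breakpoint.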
(0016503)
Goober5000   
2015-02-20 23:51   
K. I'll give that a try on Saturday.
(0016505)
LotF   
2015-02-22 16:03   
No issues with my GeForce GTX 660 Ti (Vendor: MSI, Driver: nvlddmkm 9.18.13.3788 (ForceWare 337.88) / Vista 64bit)
(0016509)
Goober5000   
2015-02-22 20:15   
So I set a data watchpoint on bm_bitmaps[257]->bm->data (not 715, because that's exp05_1) and it had the value 0 during the entire run of the executable. Never changed at all, and was still 0 when the crash occurred.

I am attaching the data watch of the bm_bitmaps[257] entry at the time the crash occurred.
(0016510)
MageKing17   
2015-02-22 22:41   
ref_count is 0; that means that entry isn't being drawn at the time the crash is happening. Given the fact that bm->data is 0, this almost certainly means that this bitmap entry has never been drawn.

It's rather odd that this is bm_bitmaps[257] when the bitmap_handle in previous call stacks was 532715 (which, modulo MAX_BITMAPS, should be 715 as niffiwan referenced earlier). This isn't still with HUD_OBJECT_MISSION_TIME and HUD_OBJECT_KILLS swapped in retail_gauges[], is it?
(0016511)
Goober5000   
2015-02-22 23:42   
Well, bitmap handles aren't set in stone. Their loading order probably depends on the sequence in which you do things. In recent test runs the bitmap_handle has been 332757, which mod 4750 is 257.

The HUD_OBJECT_MISSION_TIME and HUD_OBJECT_KILLS are not still swapped.
(0016515)
MageKing17   
2015-02-26 19:57   
After stepping through the related graphics code, it would seem that the crash is actually occurring before there's ever a chance to even load the relevant bitmap data; this suggests that the problem isn't related to the graphics data, but is instead related to how OpenGL is being used. As an experiment, I've uploaded a patch that disables a bit of code that causes the engine to re-use a texture slot; if it still crashes with this patch, then I'm out of ideas.
(0016523)
Goober5000   
2015-03-02 02:01   
Crashed again, same place. Same bitmap handle, in fact. (Yes, I double-checked that your patch had been applied.)
(0016524)
MageKing17   
2015-03-02 02:40   
Well, I wouldn't expect the bitmap handle to change; the texture slots I am referring to are the texture IDs generated by glGenTextures(); I had thought that perhaps forcing it to free the IDs and regenerate fresh ones would avoid some wonky driver issue and fix the problem. Since that appears to not be the case... as I previously said, I'm out of ideas.
(0016533)
MageKing17   
2015-03-03 22:17   
After reading an offhanded comment niffiwan made in the March newsletter ("Maybe we should just reorder the hud gauges array such that an unused/little used HUD gauge is in the (ahem) *slot of doom* and then the workaround is to turn off the gauge"), I've come up with a stupid idea. Latest patch creates a "dummy" HUD gauge that's more-or-less an exact copy of HudGaugeKills, but with a render() function that doesn't do anything, and inserts it before HUD_OBJECT_KILLS in retail_gauges[]. Given the behavior regarding swapping HUD_OBJECT_MISSION_TIME and HUD_OBJECT_KILLS, and the fact that turning off the kills gauge makes it not crash, this should theoretically fix the problem (even if it is a dirty, dirty hack).
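
The shape of that hack can be sketched with simplified stand-ins. None of these types are the real FSO HUD classes: `Gauge`, `KillsGauge`, `DummyGauge` and `render_all()` are hypothetical, and the draw counter only stands in for the renderBitmap() call.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins; these are not the real FSO HUD classes.
struct Gauge {
    virtual ~Gauge() = default;
    virtual void render(int& draw_calls) = 0;
};

struct KillsGauge : Gauge {
    // Stands in for setGaugeColor() followed by renderBitmap().
    void render(int& draw_calls) override { ++draw_calls; }
};

struct DummyGauge : Gauge {
    // Occupies a slot in the gauge list but intentionally draws nothing.
    void render(int&) override {}
};

// Renders every gauge in list order and returns how many actually drew.
int render_all(const std::vector<std::unique_ptr<Gauge>>& gauges)
{
    int draw_calls = 0;
    for (const auto& gauge : gauges) {
        assert(gauge != nullptr);
        gauge->render(draw_calls);
    }
    return draw_calls;
}
```

Placing a `DummyGauge` immediately before the `KillsGauge` in the list shifts the kills gauge out of its original position without adding any draw call of its own.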
(0016534)
Goober5000   
2015-03-04 00:47   
Well, this was interesting. With your patch applied, the game still crashes in the same place, with the same bitmap_handle, and using the same kills gauge. Not only that, but disabling the kills gauge in the HUD config doesn't work -- the gauge still appears in the mission and the crash still occurs.

To double-check this, I reverted the patch and ran things again. With the patch reverted, disabling the kills gauge worked and prevented the crash, while leaving the HUD gauge active caused the crash.

I did this series of tests twice. The first time I realized adding the extra HUD gauge might introduce some unknown problem with pilot profiles, so during the second cycle I created a new pilot for each of the trunk build and the patched build. Same result in all respects, including the patched build not actually disabling the kills gauge in-mission when you disable the kills gauge in your settings.

I've uploaded call stacks of the second cycle's series of tests.

(It's worth noting that during the first cycle, with the potentially compromised pilot files, I encountered one crash that was completely different, though it *also* occurred right at the beginning of Exodus. I've uploaded the call stack and log for that one crash as well.)
(0016535)
MageKing17   
2015-03-04 01:16   
Inability to disable kills gauge was probably due to inserting the entry in the wrong order in hudconfig.cpp; patch updated to hopefully fix that (although the fact that the crash still happened at all means that the dummy gauge isn't doing what it was expected to do).

The "different crash call stack" looks like "call stack 2" above, except with the addition of gr_opengl_string() in the stack trace... the odd part being that line 537 is the closing brace of the function. I guess that means some variable in that function is causing a crash due to going out of scope; the only non POD variable being declared in gr_opengl_string() would seem to be the GLboolean cull_face.

Before we go too far down this rabbit hole, it looks like there was another driver update last month: http://www.nvidia.com/download/driverResults.aspx/82061/en-us
Version: 347.52
Release Date: 2015.2.11

Might be worth trying one more time to see if the new drivers get rid of the issue.
(0016536)
MageKing17   
2015-03-05 03:31   
On a working assumption that the driver update won't fix the problem, I've attached another patch. This adds a glBindTexture() call immediately before the crashing one that binds the texture target to 0. This should be a completely redundant call that should have no effect whatsoever.

"Should" being the operative word.
(0016539)
chief1983   
2015-03-09 11:07   
Since we do have a workaround for this of disabling the kills gauge, plus it seems a smaller subset of users are still having issues at all now, there's talk of going ahead with the 3.7.2 final release without a final fix for this bug. Objections?
(0016540)
Inglonias   
2015-03-09 12:28   
As one of the smaller subset of users who is still having this issue... yes?

EDIT: I'll use the workaround if I have to. No harm done.

(0016541)
chief1983   
2015-03-09 12:34   
You are? We were thinking that recent driver updates had almost narrowed it down to only Quadro users still affected by it. Are you able to reproduce the issue the same way Goober has been in the more recent posts in this thread? And disabling the kills gauge prevents the crash for you as well?
(0016542)
Inglonias   
2015-03-09 12:36   
Oh, that's right. I DID update my drivers didn't I? Huh.

I haven't tested it with the new drivers. Ignore my ramblings.
(0016543)
MageKing17   
2015-03-09 12:37   
Inglonias, when asked for more information earlier, you never commented again; without feedback, we had no way of knowing you even still had the issue.

The best way to help track down the cause of this bug is to provide as much information as possible!
(0016544)
Inglonias   
2015-03-09 12:43   
Right, yes, but I was also told that more testing information was not needed at that point. Specifically, Goober said "I think that all the help testers can give has been given."

So really, it's Goober's fault. /sarcasm

If it helps, I'll try reproducing the bug again as soon as I have access to my desktop in a few minutes. (Someone else is using it)
(0016545)
Inglonias   
2015-03-09 13:01   
Alright, so I was on my desktop with a GTX 670. If it makes you feel better, I was staring right at the Psamtik as it fired in "Surrender, Belisarius!" and the game doesn't crash.

Loading up Exodus, the game doesn't crash there either. I guess I'm not affected by this any more after all. As I said:
1. Ignore my mad ramblings
2. Goober. Blame him. It's his fault. Because I say so.
(0016546)
Goober5000   
2015-03-09 23:05   
When I said "I think that all the help testers can give has been given", I meant that the testers had narrowed down the location and context of the crash, and the next step was for someone with deep understanding of the graphics code to sit down, construct a fault tree, and start zeroing in on what might be causing the crash. That person never showed up. MageKing17 has been making a valiant attempt and is to be commended, but it amounts to throwing darts at the code to see if something gets hit. That's a lousy way to debug a problem. Nevertheless, because MageKing17 has been persevering, we owe it to him as testers to continue to provide feedback.

In short, Inglonias, my comment was directed at the developers, not the testers. It was meant to shame the graphics coders into showing up. :p

I've been busy over the past few days (painting, ugh) but I will update the drivers again and try MageKing17's latest patch.

Inglonias, are you *positive* that the game is running using the NVIDIA graphics processor and not Intelgrated or whatever the fallback is? If you have multiple processors you can right-click on the EXE and select "run with high performance processor" to make it explicit.

If Inglonias confirms that he's using the right processor, and if neither the driver update nor the latest patch solves the problem, AND if nobody else has the problem, then I will accede to closing this bug as "won't fix" and I'll just live with using the standard processor.

I will post again with the results of my tests shortly.
(0016547)
Goober5000   
2015-03-10 00:28   
Well now we're getting somewhere... I think. The bad news is that it still crashed after the driver update. The good news is that something unexpected happened: it crashed on the newly inserted line:

glBindTexture(units[active_texture_unit].texture_target, 0);

The bitmap_handle variable is still 332757 and still points to kills1.ani. On the assumption that this "units" struct has something to do with the crash, I've uploaded a variable watch for that reference. (The active_texture_unit variable itself is 0.)
(0016548)
MageKing17   
2015-03-10 00:37   
(Last edited: 2015-03-10 01:07)
The texture_target is just GL_TEXTURE_2D (3553 == 0x0DE1, which is how GL_TEXTURE_2D is #defined in /code/graphics/gl/Gl.h). glBindTexture(GL_TEXTURE_2D, 0) is just saying "make the active 2D texture the default". The fact that it's crashing on what should be the safest, most boring OpenGL call means that the problem almost certainly lies elsewhere; something else is changing the state in such a fashion that the next glBindTexture() call is resulting in a crash.
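The decimal-to-hex reading of that watch value can be checked with a few lines. This is only a sketch: the constants repeat the values from the OpenGL headers so it compiles standalone, and texture_target_name() is a made-up helper for decoding debugger watches, not something in the FSO code.

```cpp
#include <string>

// Values as defined in the OpenGL headers, repeated here so the sketch
// compiles without including them.
constexpr unsigned int kGlTexture1D = 0x0DE0;       // GL_TEXTURE_1D
constexpr unsigned int kGlTexture2D = 0x0DE1;       // GL_TEXTURE_2D
constexpr unsigned int kGlTextureCubeMap = 0x8513;  // GL_TEXTURE_CUBE_MAP

// The watch showed 3553 in decimal, which is the same number as 0x0DE1.
static_assert(kGlTexture2D == 3553, "3553 decimal is 0x0DE1");

// Hypothetical helper: put a name on a texture target seen in a variable watch.
std::string texture_target_name(unsigned int target) {
    switch (target) {
        case kGlTexture1D:      return "GL_TEXTURE_1D";
        case kGlTexture2D:      return "GL_TEXTURE_2D";
        case kGlTextureCubeMap: return "GL_TEXTURE_CUBE_MAP";
        default:                return "unrecognized";
    }
}
```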

EDIT: Does it still not crash with the kills gauge disabled?

EDIT2: I just noticed that neither gauge after the kills gauge in retail_gauges[] calls renderBitmap(). It's possible that this is the reason the dummy gauge failed; moving something that calls renderBitmap() after the kills gauge may cause that gauge to crash if the kills gauge is disabled.

Perhaps it's time to try moving HUD_OBJECT_KILLS up in retail_gauges[] until the crash stops occurring (swap it with the item above it one by one so we know which gauge is causing the OpenGL state to become corrupted).
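The one-slot-at-a-time procedure can be sketched like this. It is an illustration only: the order is held as a vector of strings and move_up_one() is a hypothetical helper, whereas the real test means editing retail_gauges[] by hand and rebuilding between runs.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Move the entry `name` one slot toward the front by swapping it with its
// predecessor. Returns the name of the entry it jumped over, or an empty
// string if `name` is missing or already first.
std::string move_up_one(std::vector<std::string>& order, const std::string& name) {
    auto it = std::find(order.begin(), order.end(), name);
    if (it == order.end() || it == order.begin()) {
        return "";
    }
    auto prev = it - 1;
    std::iter_swap(prev, it);
    return *it;  // after the swap, `it` holds the entry that was jumped over
}
```

After each call you would rebuild and rerun the mission. The entry returned by the call that makes the crash go away is the one to look at.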

EDIT3: Actually, HudGaugeEtsRetail() eventually winds up calling renderBitmapEx() which winds up in the same set of functions, and HudGaugeFixedMessages (HUD_OBJECT_FIXED_MESSAGES, immediately after HUD_OBJECT_KILLS) calls renderString(), which eventually ends up in gr_opengl_tcache_set() again. So... I dunno. Still want to know what happens if HUD_OBJECT_KILLS is moved up the list.

(0016549)
Inglonias   
2015-03-10 01:46   
In regards to Goober asking if I'm sure about the crashing: What part of "Ignore my mad ramblings" do you not understand?

But, if it makes you happy, I loaded up Exodus on 3.7.2 RC5. Using a non-debug build. I was able to play the mission, and watch some beams before jumping out and being court-martialed. They were pretty.

Lack of a crash in this case from my end proves... that my copy of Freespace Open didn't crash. I don't know what else to say.

Goober, you don't HAVE to use the standard processor. Just disable the kills gauge. Nobody ever looks at it anyway!

(0016550)
Goober5000   
2015-03-10 20:39   
I don't think randomly shuffling HUD_OBJECT_KILLS is a good use of testing resources. It takes time to run each test, and every time the game crashes it takes longer to run the next test than the previous one.* Plus this is even more blatantly throwing darts than usual.

And it's not a good test either. It has only two possible outcomes: reordering the gauges will prevent the crash or the crash will keep happening. Neither outcome tells us anything.

You need to think strategically. It makes sense that the root cause lies elsewhere and what we're seeing is merely a symptom. This is consistent with NVIDIA's statement that malformed data can cause problems (though not conclusive). So, how might we check that the data isn't being corrupted or incorrectly constructed? What logging statements can we add? What Asserts can we add? Where should we expand our search? If we think that one of the other HUD gauges is corrupting the OpenGL state, is there some sort of sanity check that we can temporarily insert into each gauge's code? (This is what I meant by "construct a fault tree" and "zero in on the problem".)
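One possible shape for that kind of per-gauge sanity check is sketched below. Everything in it is hypothetical (GaugeStep, first_gauge_breaking_state(), the callbacks). In a real build the state_ok callback might wrap something like glGetError() == GL_NO_ERROR, or compare cached state against what the driver reports; here it is a plain callback so the example runs without a GL context.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one gauge in the render loop.
struct GaugeStep {
    std::string name;
    std::function<void()> render;
};

// Render each gauge in order and run `state_ok` immediately afterwards.
// Returns the name of the first gauge after which the check fails, or an
// empty string if every check passes.
std::string first_gauge_breaking_state(const std::vector<GaugeStep>& gauges,
                                       const std::function<bool()>& state_ok) {
    for (const auto& gauge : gauges) {
        gauge.render();
        if (!state_ok()) {
            return gauge.name;
        }
    }
    return "";
}
```

Checking after every gauge rather than once per frame is what turns "something corrupted the state" into "this gauge was the last thing to run before the state went bad".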


*I assume this is caused by memory leaks due to the crash. The game will randomly freeze for a second or so, and every time it crashes and I run another test the freezes will occur more frequently. However, the crash is not dependent on whether or how frequently the freezing occurs.
(0016551)
Goober5000   
2015-03-10 21:11   
Because apparently I like to contradict myself, I ran a few tests with moving HUD_OBJECT_KILLS up a few indexes in the retail_gauges[] array. Turns out that the crash occurs if it is immediately below HUD_OBJECT_TARGET_TRI, but does not occur if it is immediately above it.
(0016552)
Goober5000   
2015-03-10 21:12   
(Note that usually HUD_OBJECT_KILLS is two spaces below HUD_OBJECT_TARGET_TRI, not one space.)
(0016553)
MageKing17   
2015-03-10 22:12   
And that is exactly why I wanted you to move HUD_OBJECT_KILLS around; because now we know that we need to look at HudGaugeTargetTriangle for odd behavior. Not sure what's not strategic about wanting to find that out...

Given that HUD_OBJECT_HOSTILE_TRI, HUD_OBJECT_TARGET_TRI, and HUD_OBJECT_MISSILE_TRI all use HudGaugeReticleTriangle::renderTriangle() for their drawing, there may be something wrong with that function that's causing the kills gauge to crash.

If you want to verify that renderTriangle() is causing the problem, you could try disabling the gauges that use it instead of the kills gauge (with HUD_OBJECT_KILLS in its original position, of course) and see if it still crashes. They're "locked missile direction" and "current target direction" (not "target orientation", which is HUD_OBJECT_ORIENTATION_TEE) in the in-game HUD config menu (HUD_OBJECT_HOSTILE_TRI can't be turned off via this menu, but since it didn't crash for you before with HUD_OBJECT_KILLS immediately below it, that probably won't be a problem for this test).
(0016554)
MageKing17   
2015-03-11 00:04   
For experimentation purposes, I've attached a patch that makes the triangles get drawn with an OpenGL mode of GL_TRIANGLES instead of GL_TRIANGLE_FAN. Theoretically there should be no difference between the two, but, well, theoretically, glBindTexture() shouldn't be crashing at all.
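To spell out why the two modes should be equivalent here: a fan over n vertices produces the triangles (0, i, i+1), so for a single three-vertex triangle the fan and the plain list describe exactly the same primitive. A small sketch of that expansion follows (fan_to_list() and Triangle are names made up for the example, and it works on vertex indices only):

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Triangle = std::array<std::size_t, 3>;

// Expand vertices 0..vertex_count-1, interpreted as a triangle fan, into the
// equivalent list of independent triangles: (0, 1, 2), (0, 2, 3), ...
std::vector<Triangle> fan_to_list(std::size_t vertex_count) {
    std::vector<Triangle> triangles;
    for (std::size_t i = 1; i + 1 < vertex_count; ++i) {
        triangles.push_back(Triangle{{0, i, i + 1}});
    }
    return triangles;
}
```

With three vertices the result is the single triangle (0, 1, 2), which is the same thing GL_TRIANGLES would draw from those vertices.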

One last retail_gauges[] rearrangement test I'd like to see the results of is HUD_OBJECT_ETS_RETAIL swapped with HUD_OBJECT_KILLS. If the crash occurs in HudGaugeEtsRetail, then we'll know that there's no difference between renderBitmap() and renderBitmapEx(), and then maybe we can take a look at why HudGaugeFixedMessages::render() not only doesn't trigger the crash, but fixes the problem for gauges rendered afterwards.

(If HudGaugeEtsRetail doesn't crash, then we'll be freaking out because there's very little difference between gr_opengl_aabitmap() and gr_opengl_aabitmap_ex(); they both call opengl_aabitmap_ex_internal().)
(0016556)
Goober5000   
2015-03-15 02:15   
With the kills gauge in the original place, I disabled "locked missile direction" and "current target direction". No crash. Then I re-enabled "current target direction" and got a completely different crash at the same time instant: the one shown in the "enabling triangles" attachment.

I'll try your triangles patch next, but I need to restart first. Too many memory leaks...
(0016557)
Goober5000   
2015-03-15 02:56   
(Last edited: 2015-03-15 02:57)
First I swapped HUD_OBJECT_ETS_RETAIL with HUD_OBJECT_KILLS. No crash at all.

Then I reverted the code, added your TMAP_FLAG_TRILIST patch (the hudtarget.cpp.patch uploaded on March 10), and created a new pilot. Crashed immediately. (That patch doesn't seem to contain either GL_TRIANGLES or GL_TRIANGLE_FAN; do I have the right patch?)

(0016558)
MageKing17   
2015-03-15 04:06   
"That patch doesn't seem to contain either GL_TRIANGLES or GL_TRIANGLE_FAN; do I have the right patch?"

Yes. It adds a flag that causes a later function to use GL_TRIANGLES instead of GL_TRIANGLE_FAN. It shouldn't have made a difference, and it didn't. Good to know.

"First I swapped HUD_OBJECT_ETS_RETAIL with HUD_OBJECT_KILLS. No crash at all."

This is mind-boggling. HudGaugeEtsRetail calls a function that calls HudGauge::renderBitmapEx(), which follows an extremely similar codepath to HudGauge::renderBitmap(), so I don't--

...Huh. I guess it does call HudGauge::renderPrintf() before that (to draw the "G", "S", "E" labels), so maybe that's why it doesn't crash. I guess that makes more sense than HudGaugeFixedMessages::render() fixing anything, since it shouldn't actually do anything most of the time. I wonder why gr_opengl_string() getting called avoids the crash, given that it also calls gr_opengl_tcache_set().

...In fact, in terms of actual OpenGL calls, the two should be doing exactly the same thing, up to the point where the kills gauge crashes. Both gr_opengl_string() and opengl_aabitmap_ex_internal() do the following things in the following order (with identical code):

GL_state.SetTextureSource(TEXTURE_SOURCE_NO_FILTERING);
GL_state.SetAlphaBlendMode(ALPHA_BLEND_ALPHA_BLEND_ALPHA);
GL_state.SetZbufferType(ZBUFFER_TYPE_NONE);

if ( !gr_opengl_tcache_set(gr_screen.current_bitmap, TCACHE_TYPE_AABITMAP, &u_scale, &v_scale) ) {
(0016559)
Goober5000   
2015-03-16 00:24   
(Last edited: 2015-03-16 00:29)
Did your comment get truncated? Not sure if your code snippet is meant to go to the closing brace, or if you were going to propose another test.

(0016560)
MageKing17   
2015-03-16 00:33   
No, I stopped quoting on the line that the kills gauge crashed on, and I have no other suggestions given there shouldn't actually be any difference in OpenGL calls between those two gauges.
(0016561)
Goober5000   
2015-03-16 11:10   
Did you get anything out of the "enabling triangles.txt" stack trace?
(0016562)
MageKing17   
2015-03-16 12:44   
Given that it happened in std::_Uninit_move(), I'm pretty sure there's memory corruption going on. At this point my recommendation is to reorder retail_gauges[] and/or immerse the computer in holy water before nuking it from orbit, then try to forget this ever happened.
(0016563)
chief1983   
2015-03-16 12:59   
Reordering sounds like a slightly better solution than asking people to disable their kills gauge, and I'm fine to release 3.7.2 with that but it would still be nice to someday understand this behavior more fully.
(0016564)
Goober5000   
2015-03-16 22:07   
Let's not "fix" it by reordering the array, otherwise the problem may manifest itself in a different mysterious way and we'll be back to the wild goose chase again.

We really need someone who has a solid knowledge of the graphics code and can audit our API calls to make sure the data is passed in properly. (Or who at least knows what Assertions to put in place that can more precisely detect the problem.) Do we have nobody on the SCP who can fill that role?
(0016571)
MageKing17   
2015-03-22 03:16   
Swifty just found an interesting potential problem while working on the deferred+shadows branch: http://www.hard-light.net/forums/index.php?topic=89379.0

Does this patch affect the issue in any way?
(0016572)
Goober5000   
2015-03-23 02:12   
Alas, no effect. It crashes in the same place.
(0016578)
Goober5000   
2015-03-26 21:32   
On a whim, I decided to try the nightly builds in case this was due to a compiler optimization issue like the infamous Y targeting bug. No such luck. The standard, SSE, and no-SSE builds all crashed.
(0016579)
chief1983   
2015-03-27 09:49   
How about Swifty's deferred lighting branch? If it doesn't crash that way for you, then surely this is just a temporary issue at worst that is already resolved in what will be merged into trunk soon anyway.
(0016582)
Goober5000   
2015-03-27 22:23   
I thought I had posted about that but I guess not. No, neither Swifty's patch against trunk nor Swifty's full branch fixes the crash.
(0016584)
chief1983   
2015-03-28 08:54   
I might have missed it if you had but it's good to have documentation here. Still, I think we're going to have to release without knowing the full cause of the issue at this point. Can we get a reordering patch that at least works around the issue so anyone on Goober's hardware wouldn't have to disable the kills gauge?
(0016585)
Goober5000   
2015-03-28 14:57   
I don't want to "fix" it with a reordering patch. It'll just hide the issue and it's possible that it will then be caused by some other strange combination of settings and we'll have to track it down all over again.

So far no graphics coder has taken a serious look at this ticket. Since Swifty is knowledgeable in the ways of OpenGL I want him to take a look first. If he's stumped too, then we can release.
(0016586)
chief1983   
2015-03-28 23:47   
I appreciate that we do want to get the root cause of this issue fixed, but as it seems to only currently affect an incredibly narrow set of circumstances, of which the ones we're aware of we can work around, I think it's safe to go with the workaround now and see if Swifty can investigate down the road, possibly with his deferred lighting branch which he is probably more comfortable with at this point. I've been requested by our commander in chief to get a release out very soon and don't think we should be waiting any longer for this issue.
(0016587)
Goober5000   
2015-03-29 00:41   
The workaround we know of is to disable the kills gauge. That's reasonable as a workaround. But if we apply a code hack that we neither understand nor know the consequences of, it could backfire. It's better to release with a known bug than a buried land mine.

However, I have managed to find a knowledgeable person who provided some diagnostic logging statements and has a test plan. Let's wait to see how that pans out first.
(0016588)
The_E   
2015-03-29 04:11   
Let's not. This issue seems to be fixed for most people affected due to driver updates, and we've held up this release for far too long already.

I fully understand that the workarounds available are bad. I would absolutely prefer to ship 3.7.2 with this bug understood, fixed and buried, but without some sort of timeframe for this fix, and workarounds available, I'd rather we not keep everyone else waiting.
(0016589)
m_m   
2015-03-29 12:38   
@Goober: We could try looking at the OpenGL calls FSO executes. Can you install GLIntercept (https://code.google.com/p/glintercept/wiki/Downloads?tm=2) and use the FullDebug mode while reproducing the crash?
For comparison it could also help if you provide a glintercept log for when you have the kills gauge disabled. Maybe that will show if something is wrong.
(0016590)
Goober5000   
2015-03-29 13:16   
The E, the timeframe is very short: just a matter of days. We're already making some significant progress -- for instance, we've discovered that the crash occurs on glIsTexture(), not just glBindTexture(), and we've ruled out certain graphics modes.

m!m, I'll take a look at that.
(0016591)
Swifty   
2015-03-29 17:45   
(Last edited: 2015-03-29 17:51)
Some thoughts:

- Can anyone who can reproduce this issue try running with a build before 11165? Perhaps my changes to gr_opengl_string are causing issues.

- While we're on that, can someone try to run the code before the large amount of Xt changes were merged in? My changes to gr_opengl_string were prompted by the gr_opengl_string changes from the Xt merges. Basically run a build before 11116.

- Just for the interests of science, can someone comment out the line with renderBitmap in HudGaugeKills::render and see what happens?

- Also try to comment out beam_render, beam_render_muzzle_glow, or beam_generate_muzzle_particles to see what happens.

- Also try out pabst_bleu_ribbon (https://github.com/SamuelCho/Freespace-Open-Swifty/tree/pabst_bleu_ribbon) to see if the bug is persisting there. I replaced all fixed function rendering with a passthrough shader. Maybe Nvidia's drivers aren't very tolerant of fixed function rendering anymore.

(0016592)
Goober5000   
2015-03-29 23:02   
Doesn't seem to be due to the Xt changes. Revision 11115 crashes.

Commenting out the renderBitmap line prevents the crash. But commenting out beam_render, beam_render_muzzle_glow, or beam_generate_muzzle_particles does not.

I checked out the PBR branch and that doesn't crash, but there are no stars and all the ships are black silhouettes.
(0016593)
Goober5000   
2015-03-29 23:49   
M!m, there is no crash when running using GLIntercept. This actually makes sense if GLIntercept uses debug code and the crash is due to a data problem, because garbage values would be sanitized to NULL or other known values.
(0016597)
Goober5000   
2015-04-02 00:43   
Just a quick update to say that progress is being made. We have some debugging callbacks running in the OpenGL functions now.
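For readers unfamiliar with the approach: the thread doesn't say which mechanism was used, but one common option is the KHR_debug / OpenGL 4.3 debug output facility, where the driver calls back into the application with a message for each problem it notices. The sketch below assumes that mechanism and shows only the message-formatting half, so it compiles without GL headers. format_debug_message() and severity_name() are hypothetical helpers, and the registration calls appear only in a comment because they need a live context.

```cpp
#include <string>

// Enum values from the KHR_debug / OpenGL 4.3 headers, repeated here so the
// sketch compiles standalone.
constexpr unsigned int kSeverityHigh = 0x9146;          // GL_DEBUG_SEVERITY_HIGH
constexpr unsigned int kSeverityMedium = 0x9147;        // GL_DEBUG_SEVERITY_MEDIUM
constexpr unsigned int kSeverityLow = 0x9148;           // GL_DEBUG_SEVERITY_LOW
constexpr unsigned int kSeverityNotification = 0x826B;  // GL_DEBUG_SEVERITY_NOTIFICATION

std::string severity_name(unsigned int severity) {
    switch (severity) {
        case kSeverityHigh:         return "HIGH";
        case kSeverityMedium:       return "MEDIUM";
        case kSeverityLow:          return "LOW";
        case kSeverityNotification: return "NOTE";
        default:                    return "UNKNOWN";
    }
}

// Hypothetical formatter for one driver message; a real callback would pass
// the result to the game's log.
std::string format_debug_message(unsigned int id, unsigned int severity,
                                 const char* message) {
    return "[GL " + severity_name(severity) + " #" + std::to_string(id) + "] " +
           (message ? message : "");
}

// In a real build, with a debug context available, registration would look
// roughly like:
//   glEnable(GL_DEBUG_OUTPUT);
//   glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS);  // report on the offending call's own stack
//   glDebugMessageCallback(my_callback, nullptr);
```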
(0016608)
chief1983   
2015-04-07 15:09   
All right, progress is nice but seriously, we need like daily updates at this point to keep justifying waiting on this bug.
(0016620)
Goober5000   
2015-04-08 00:28   
Sorry, Easter intervened. We're back in contact now though.

Unfortunately the OpenGL debug callback is *itself* crashing, which is leaving us both stumped. I want to give this another day or two, but if after that time we haven't figured it out, then I'll punt and we can release.

So start warming up your release script, and someone should do a quick pass through the tickets to make sure there are no patches for fixed bugs waiting to be committed.
(0016621)
Goober5000   
2015-04-08 01:42   
Actually I may have spoken too soon. I found a way to get the debug callback to run, and I've sent along the appropriate logs. Hopefully they can provide a clue to solving this.

Or they may tell us nothing, in which case revert to my previous comment.
(0016622)
Swifty   
2015-04-08 03:54   
So, I took a careful look at the logs posted in this ticket and I noticed the shaders are displaying a lot of warnings related to the desaturate uniforms in the model rendering shaders. Unfortunately those warnings don't seem to appear on any other machine I compile those shaders on (Notably an AMD Radeon 290, a Geforce 750M, a Geforce 750M with Mac drivers, and a Geforce 470 GTX). So, this is just a stab in the dark to see if this little bit of code cleanup will fix the issue. Check the attached desaturate_fix.patch file I've uploaded.

I've also noticed that each log file seems to end with the Diffuse and Glow Map shader variant being compiled, so I tried to see if there's anything peculiar about the #defines and the #ifdefs concerning glow maps and diffuse in the shaders, but I've found nothing.
(0016627)
Goober5000   
2015-04-08 23:13   
Alas, the desaturate patch didn't fix the beam crash. I've attached the stack trace and log anyway.

On my end, we found and fixed a VBO configuration error and generated a new batch of logs. No smoking gun yet.

(0016632)
Goober5000   
2015-04-09 22:29   
Well, poo. The debug callbacks into the OpenGL functions told us exactly nothing useful. Which means we're out of ideas.

Time to punt on this and release 3.7.2. It sucks, but what are you going to do.
(0016634)
Swifty   
2015-04-09 23:06   
Another patch to try. I noticed that we never call SetShaderMode on the texture unit state handlers when rendering soft particles. Maybe us calling glEnable on a texture unit being used in shader draw calls is causing issues?
(0016635)
Goober5000   
2015-04-10 00:13   
I applied the patch to clean trunk. Alas, it crashed just as before. Logs attached.
(0016636)
Swifty   
2015-04-10 00:21   
Last hunch. Basically I make sure to disable all texture units in opengl_render_internal if we're not doing any texturing.
(0016640)
Goober5000   
2015-04-10 23:50   
Crashed again. :( Logs attached as before.
(0016641)
Goober5000   
2015-04-15 23:08   
Assigning this bug to FSO 4 and untargeting from 3.7.2.
(0016710)
Goober5000   
2015-05-20 01:08   
Hmm. VS 2005 builds from both r11289 and 3.7.2 final exhibit the crash. But VS 2013 builds from both r11289 and 3.7.2 *do not* exhibit the crash.

This makes me think that this error *was* a compiler problem, except that the nightly builds I tested from r11289 to rule out this possibility (see #c16578) crashed as well. I've asked chief1983 what compiler he used to create the r11289 builds.
(0016711)
MageKing17   
2015-05-20 01:13   
All Nightly builds prior to 2015-05-07 (935af40) were built with VS 2008 (since then, they've been built with VS 2013; see also http://www.hard-light.net/forums/index.php?topic=89681.msg1784993#msg1784993).
(0016713)
chief1983   
2015-05-20 08:29   
Yup, what he said. That was the cutover date, and I haven't looked back since.
(0016715)
Goober5000   
2015-05-20 23:46   
I'll test the two nightlies on either side of that date. Tomorrow though.
(0016716)
Goober5000   
2015-05-22 00:05   
Well, so much for that theory. Both 2015-05-06 (4d90765) and 2015-05-07 (935af40) crashed -- and there was even an attempt when 2015-05-06, the VS 2008 build, did *not* crash. This does not make any sense.

Guess I'll shove this ticket back under the bed then.
(0016805)
Goober5000   
2015-12-19 18:06   
Changing description. It's not widely reported anymore, but it has popped up again for a couple of people, and it certainly is insidious.

Issue History
2014-11-21 19:07Goober5000New Issue
2014-11-21 19:07Goober5000File Added: fs2_open-goober5000.log
2014-11-21 19:07Goober5000Statusnew => confirmed
2014-11-21 19:09Goober5000Additional Information Updatedbug_revision_view_page.php?rev_id=951#r951
2014-11-21 19:10Goober5000Description Updatedbug_revision_view_page.php?rev_id=953#r953
2014-11-21 19:15Goober5000File Deleted: fs2_open-goober5000.log
2014-11-21 19:15Goober5000File Added: fs2_open-mediavps.log
2014-11-21 19:16Goober5000File Added: fs2_open-nomediavps.log
2014-11-21 19:16Goober5000Description Updatedbug_revision_view_page.php?rev_id=954#r954
2014-11-21 20:20MageKing17Note Added: 0016390
2014-11-22 13:20Goober5000Note Added: 0016391
2014-11-22 13:20Goober5000Note Edited: 0016391bug_revision_view_page.php?bugnote_id=16391#r956
2014-11-22 13:22Goober5000Note Edited: 0016391bug_revision_view_page.php?bugnote_id=16391#r957
2014-11-22 13:31Goober5000Note Added: 0016392
2014-11-22 13:31Goober5000File Added: crash error.jpg
2014-11-22 13:31Goober5000File Added: call stack.txt
2014-11-22 14:45Goober5000File Added: MVP 2014 patches.zip
2014-11-22 14:45Goober5000Note Added: 0016393
2014-11-22 16:36Goober5000Note Added: 0016394
2014-11-23 11:24m_mNote Added: 0016395
2014-11-23 12:08Goober5000Note Added: 0016396
2014-11-23 12:50m_mNote Added: 0016397
2014-11-23 13:38MageKing17Note Added: 0016398
2014-11-23 14:30Goober5000File Added: call stack 2.txt
2014-11-23 14:30Goober5000File Added: call stack 3.txt
2014-11-23 14:31Goober5000File Added: fs2_open 3.log
2014-11-23 15:15Goober5000File Added: fs2_open 4.log
2014-11-23 15:15Goober5000File Added: call stack 4.txt
2014-11-23 15:15Goober5000Note Added: 0016399
2014-12-14 14:39chief1983Product Version3.7.2 => 3.7.2 RC4
2014-12-19 10:03chief1983Note Added: 0016420
2014-12-20 16:29Goober5000Note Added: 0016422
2014-12-20 16:34chief1983Note Added: 0016423
2014-12-20 17:29Goober5000Note Added: 0016424
2014-12-20 17:36MageKing17Note Added: 0016425
2014-12-20 17:53chief1983Note Added: 0016426
2014-12-20 19:55MageKing17Note Added: 0016427
2014-12-20 21:45Goober5000Note Added: 0016428
2014-12-20 23:23chief1983Note Added: 0016429
2015-01-03 16:59z64555Note Added: 0016435
2015-01-03 17:00z64555Note Edited: 0016435bug_revision_view_page.php?bugnote_id=16435#r961
2015-01-03 17:02z64555Note Edited: 0016435bug_revision_view_page.php?bugnote_id=16435#r962
2015-01-03 17:04Goober5000Note Added: 0016436
2015-01-03 17:49Goober5000Note Added: 0016437
2015-01-03 17:49Goober5000Note Edited: 0016437bug_revision_view_page.php?bugnote_id=16437#r964
2015-01-03 21:07Goober5000File Added: call stack 5.txt
2015-01-04 02:08niffiwanNote Added: 0016439
2015-01-04 02:08niffiwanNote Edited: 0016439bug_revision_view_page.php?bugnote_id=16439#r966
2015-01-04 02:38Goober5000Note Added: 0016441
2015-01-04 02:54MageKing17Note Added: 0016443
2015-01-05 00:31Goober5000File Added: call stack 6.txt
2015-01-05 00:33Goober5000Note Added: 0016444
2015-01-05 00:33Goober5000Note Edited: 0016444bug_revision_view_page.php?bugnote_id=16444#r968
2015-01-05 02:30MageKing17File Added: hud.cpp.patch
2015-01-05 02:30MageKing17Note Added: 0016445
2015-01-05 03:31MageKing17File Added: gropengltexture.cpp.patch
2015-01-05 03:32MageKing17Note Edited: 0016445bug_revision_view_page.php?bugnote_id=16445#r970
2015-01-07 17:04MageKing17File Added: smother_with_error_checking.patch
2015-01-09 10:56Goober5000Note Added: 0016448
2015-01-10 22:43Goober5000File Added: call_stack-2015-01-10.txt
2015-01-10 22:43Goober5000File Added: fs2_open-2015-01-10.log
2015-01-10 22:44 Goober5000 Note Added: 0016449
2015-01-10 22:51 MageKing17 Note Added: 0016450
2015-01-11 17:30 Goober5000 Note Added: 0016451
2015-01-11 17:39 MageKing17 Note Added: 0016452
2015-01-11 17:42 chief1983 Note Added: 0016453
2015-01-17 21:30 Goober5000 Note Added: 0016454
2015-01-29 07:44 niffiwan Note Added: 0016466
2015-01-29 16:33 niffiwan Note Edited: 0016466 bug_revision_view_page.php?bugnote_id=16466#r978
2015-02-01 02:59 Goober5000 Note Added: 0016467
2015-02-07 12:04 Inglonias Note Added: 0016475
2015-02-07 12:20 Inglonias Note Edited: 0016475 bug_revision_view_page.php?bugnote_id=16475#r986
2015-02-07 12:52 Goober5000 Note Added: 0016476
2015-02-08 17:09 niffiwan Note Added: 0016477
2015-02-08 17:31 niffiwan Note Added: 0016478
2015-02-08 21:04 Goober5000 File Added: bmpman_entry.txt
2015-02-08 21:04 Goober5000 Note Added: 0016479
2015-02-08 21:05 Goober5000 Note Edited: 0016479 bug_revision_view_page.php?bugnote_id=16479#r988
2015-02-08 21:22 niffiwan Note Edited: 0016477 bug_revision_view_page.php?bugnote_id=16477#r990
2015-02-08 21:27 MageKing17 Note Added: 0016480
2015-02-08 21:30 niffiwan Note Edited: 0016478 bug_revision_view_page.php?bugnote_id=16478#r992
2015-02-08 23:42 Goober5000 Note Added: 0016482
2015-02-08 23:52 niffiwan Note Added: 0016483
2015-02-09 00:23 MageKing17 Note Added: 0016485
2015-02-09 02:22 Goober5000 Note Added: 0016486
2015-02-09 02:23 Goober5000 Note Edited: 0016486 bug_revision_view_page.php?bugnote_id=16486#r994
2015-02-09 05:21 niffiwan Note Added: 0016487
2015-02-09 05:22 niffiwan Note Edited: 0016487 bug_revision_view_page.php?bugnote_id=16487#r996
2015-02-09 05:23 niffiwan Note Edited: 0016487 bug_revision_view_page.php?bugnote_id=16487#r997
2015-02-09 05:24 niffiwan Note Edited: 0016487 bug_revision_view_page.php?bugnote_id=16487#r998
2015-02-09 05:42 niffiwan Note Edited: 0016487 bug_revision_view_page.php?bugnote_id=16487#r999
2015-02-09 22:00 Goober5000 Note Added: 0016494
2015-02-18 20:28 MageKing17 Note Added: 0016498
2015-02-18 22:00 Goober5000 Note Added: 0016499
2015-02-19 13:00 MageKing17 Note Added: 0016500
2015-02-19 13:10 chief1983 Note Added: 0016501
2015-02-19 20:22 niffiwan Note Added: 0016502
2015-02-20 23:51 Goober5000 Note Added: 0016503
2015-02-22 16:03 LotF Note Added: 0016505
2015-02-22 20:15 Goober5000 Note Added: 0016509
2015-02-22 20:16 Goober5000 File Added: bm_bitmaps watch.txt
2015-02-22 22:41 MageKing17 Note Added: 0016510
2015-02-22 23:42 Goober5000 Note Added: 0016511
2015-02-26 19:56 MageKing17 File Added: disable_reuse.patch
2015-02-26 19:57 MageKing17 Note Added: 0016515
2015-03-02 02:01 Goober5000 Note Added: 0016523
2015-03-02 02:40 MageKing17 Note Added: 0016524
2015-03-03 22:14 MageKing17 File Added: 3130.patch
2015-03-03 22:17 MageKing17 Note Added: 0016533
2015-03-03 22:18 MageKing17 File Deleted: 3130.patch
2015-03-03 22:18 MageKing17 File Added: 3130.patch
2015-03-03 22:45 MageKing17 File Deleted: 3130.patch
2015-03-03 22:45 MageKing17 File Added: 3130.patch
2015-03-04 00:45 Goober5000 File Added: call stack wo dummy.txt
2015-03-04 00:45 Goober5000 File Added: call stack w dummy.txt
2015-03-04 00:46 Goober5000 File Added: call stack w dummy and kills disabled.txt
2015-03-04 00:46 Goober5000 File Added: different crash call stack.txt
2015-03-04 00:46 Goober5000 File Added: different crash fs2_open.7z
2015-03-04 00:47 Goober5000 Note Added: 0016534
2015-03-04 01:06 MageKing17 File Deleted: 3130.patch
2015-03-04 01:06 MageKing17 File Added: 3130.patch
2015-03-04 01:16 MageKing17 Note Added: 0016535
2015-03-05 03:29 MageKing17 File Added: redundancy.patch
2015-03-05 03:31 MageKing17 Note Added: 0016536
2015-03-09 11:07 chief1983 Note Added: 0016539
2015-03-09 12:28 Inglonias Note Added: 0016540
2015-03-09 12:28 Inglonias Note Edited: 0016540 bug_revision_view_page.php?bugnote_id=16540#r1009
2015-03-09 12:34 chief1983 Note Added: 0016541
2015-03-09 12:36 Inglonias Note Added: 0016542
2015-03-09 12:37 MageKing17 Note Added: 0016543
2015-03-09 12:43 Inglonias Note Added: 0016544
2015-03-09 13:01 Inglonias Note Added: 0016545
2015-03-09 23:05 Goober5000 Note Added: 0016546
2015-03-10 00:23 Goober5000 File Added: callstack w redundancy.txt
2015-03-10 00:23 Goober5000 File Added: units.txt
2015-03-10 00:28 Goober5000 Note Added: 0016547
2015-03-10 00:37 MageKing17 Note Added: 0016548
2015-03-10 00:38 MageKing17 Note Edited: 0016548 bug_revision_view_page.php?bugnote_id=16548#r1011
2015-03-10 00:45 MageKing17 Note Edited: 0016548 bug_revision_view_page.php?bugnote_id=16548#r1012
2015-03-10 01:07 MageKing17 Note Edited: 0016548 bug_revision_view_page.php?bugnote_id=16548#r1013
2015-03-10 01:46 Inglonias Note Added: 0016549
2015-03-10 01:46 Inglonias Note Edited: 0016549 bug_revision_view_page.php?bugnote_id=16549#r1015
2015-03-10 20:39 Goober5000 Note Added: 0016550
2015-03-10 21:11 Goober5000 Note Added: 0016551
2015-03-10 21:12 Goober5000 Note Added: 0016552
2015-03-10 22:12 MageKing17 Note Added: 0016553
2015-03-10 23:51 MageKing17 File Added: hudtarget.cpp.patch
2015-03-11 00:04 MageKing17 Note Added: 0016554
2015-03-15 02:15 Goober5000 Note Added: 0016556
2015-03-15 02:16 Goober5000 File Added: enabling triangles.txt
2015-03-15 02:56 Goober5000 Note Added: 0016557
2015-03-15 02:57 Goober5000 Note Edited: 0016557 bug_revision_view_page.php?bugnote_id=16557#r1017
2015-03-15 04:06 MageKing17 Note Added: 0016558
2015-03-16 00:24 Goober5000 Note Added: 0016559
2015-03-16 00:29 Goober5000 Note Edited: 0016559 bug_revision_view_page.php?bugnote_id=16559#r1019
2015-03-16 00:33 MageKing17 Note Added: 0016560
2015-03-16 11:10 Goober5000 Note Added: 0016561
2015-03-16 12:44 MageKing17 Note Added: 0016562
2015-03-16 12:59 chief1983 Note Added: 0016563
2015-03-16 22:07 Goober5000 Note Added: 0016564
2015-03-22 03:16 MageKing17 Note Added: 0016571
2015-03-23 02:08 Goober5000 File Added: swifty.patch
2015-03-23 02:12 Goober5000 Note Added: 0016572
2015-03-26 21:32 Goober5000 Note Added: 0016578
2015-03-27 09:49 chief1983 Note Added: 0016579
2015-03-27 22:23 Goober5000 Note Added: 0016582
2015-03-28 08:54 chief1983 Note Added: 0016584
2015-03-28 14:57 Goober5000 Note Added: 0016585
2015-03-28 23:47 chief1983 Note Added: 0016586
2015-03-29 00:41 Goober5000 Note Added: 0016587
2015-03-29 04:11 The_E Note Added: 0016588
2015-03-29 12:38 m_m Note Added: 0016589
2015-03-29 13:16 Goober5000 Note Added: 0016590
2015-03-29 17:45 Swifty Note Added: 0016591
2015-03-29 17:46 Swifty Note Edited: 0016591 bug_revision_view_page.php?bugnote_id=16591#r1027
2015-03-29 17:51 Swifty Note Edited: 0016591 bug_revision_view_page.php?bugnote_id=16591#r1028
2015-03-29 23:02 Goober5000 Note Added: 0016592
2015-03-29 23:49 Goober5000 Note Added: 0016593
2015-04-02 00:43 Goober5000 Note Added: 0016597
2015-04-07 15:09 chief1983 Note Added: 0016608
2015-04-08 00:28 Goober5000 Note Added: 0016620
2015-04-08 01:42 Goober5000 Note Added: 0016621
2015-04-08 03:49 Swifty File Added: desaturate_fix.patch
2015-04-08 03:54 Swifty Note Added: 0016622
2015-04-08 21:05 Goober5000 Changeset attached => fs2open trunk r11298
2015-04-08 23:10 Goober5000 File Added: desaturate-callstack.txt
2015-04-08 23:11 Goober5000 File Added: desaturate-fs2_open.log
2015-04-08 23:13 Goober5000 Note Added: 0016627
2015-04-08 23:13 Goober5000 Note Edited: 0016627 bug_revision_view_page.php?bugnote_id=16627#r1032
2015-04-09 22:29 Goober5000 Note Added: 0016632
2015-04-09 23:01 Swifty File Added: shader_mode_hunch.patch
2015-04-09 23:06 Swifty Note Added: 0016634
2015-04-10 00:13 Goober5000 File Added: hunch-callstack.txt
2015-04-10 00:13 Goober5000 File Added: hunch-fs2_open.log
2015-04-10 00:13 Goober5000 Note Added: 0016635
2015-04-10 00:17 Swifty File Added: hunch2.patch
2015-04-10 00:21 Swifty Note Added: 0016636
2015-04-10 02:44 The_E Priority urgent => normal
2015-04-10 02:44 The_E Severity block => major
2015-04-10 23:50 Goober5000 File Added: hunch2-callstack.txt
2015-04-10 23:50 Goober5000 File Added: hunch2-fs2_open.log
2015-04-10 23:50 Goober5000 Note Added: 0016640
2015-04-15 23:08 Goober5000 Note Added: 0016641
2015-04-15 23:08 Goober5000 Assigned To => FSO 4
2015-04-15 23:08 Goober5000 Resolution open => suspended
2015-04-15 23:08 Goober5000 Target Version 3.7.2 =>
2015-05-20 01:08 Goober5000 Note Added: 0016710
2015-05-20 01:09 Goober5000 Relationship added related to 0000001
2015-05-20 01:13 MageKing17 Note Added: 0016711
2015-05-20 08:29 chief1983 Note Added: 0016713
2015-05-20 23:46 Goober5000 Note Added: 0016715
2015-05-22 00:05 Goober5000 Note Added: 0016716
2015-12-19 18:06 Goober5000 Note Added: 0016805
2015-12-19 18:06 Goober5000 Summary The widely reported beam crash bug => The insidious beam crash bug