Source Code Project Mantis - FSSCP
View Issue Details
ID: 0000859 | Project: FSSCP | Category: sound | View Status: public | Date Submitted: 2006-03-11 01:31 | Last Update: 2007-02-02 23:37
Assigned To: taylor
Platform: | OS: | OS Version:
Product Version:
Target Version: | Fixed in Version: 3.6.9
Summary: 0000859: Nebulas + 64bit + CVS + big bomb explosions = seg fault
Description: I have an Athlon64 X2 and a GeForce 6800GT. I run Fedora Core 4 with the latest updates and a 64 bit SMP kernel. Most missions work fine except those with big explosions (usually from Shivan cyclops bombs) in a nebula. This is a 64bit build or system issue because I have an older 32bit system that works fine with these missions using the EXACT same install including ~/.fs2_open files. I've tried various CVS builds from March 6 - 9, all have had the same problem on my 64bit system. I use the delta media vps. This might even be a dual core problem.
Additional Information: I am not convinced it is a mediavp problem because I have tried playing without the mediavps (by omitting the -mod mediavps switch). The problem is less frequent but still occurs. I don't know how to upload multiple files so for now all my segfault logs are here, where Z is a number between 1 and 9. They are mixed console output and debug build logs.
Tags: No tags attached.
Attached Files:
txt segfault2.txt (12,568) 2006-03-11 01:31
txt segfault1.txt (497) 2006-03-11 01:33
txt segfault3.txt (10,080) 2006-03-11 01:33
txt segfault4.txt (7,720) 2006-03-11 01:34
txt segfault5.txt (220) 2006-03-11 01:34
txt segfault6.txt (21,020) 2006-03-11 01:35
txt segfault7.txt (23,439) 2006-03-11 01:35
log gdb.log (3,717) 2006-03-19 00:42
log fs2_open.log (83,754) 2006-03-19 22:44
txt ptr_patch.txt (18,645) 2006-05-26 19:19

2006-03-11 01:38   
Well, I figured out how to upload more logs. I forgot to mention the missions I have problems with most often are "A Game of Tag", "Battle of the Wilderness", and that mission after the first Sathanas is destroyed. I forgot what it is called.
2006-03-11 13:37   
This shouldn't be directly related to MediaVPs. I'll try to take a look at it when I get the chance.
2006-03-14 13:41   
I'm starting to think this is a sound issue after all. I've tried the -nosound switch and I was able to play through the usual seg fault locations. -snd_preload -nomusic did not help, however. The latest OpenAL for Fedora Core 4 from the yum repositories is a 2006-02-02cvs build apparently. I'll see if I can try a newer OpenAL version later.
2006-03-14 13:55   
I'll try upgrading my OpenAL drivers (I'm using something old) and try again to reproduce this. There seem to be a couple of sound related crashes lately so this is getting bumped up a bit on the priority list.
2006-03-18 04:38   
I updated to the same version of OpenAL as you but can't recreate any of the problems. I did fix one possibly unrelated error though. Grab updated CVS and see if it makes a difference. It will probably do the same thing as before, but it's worth a try.
2006-03-18 13:35   
Unfortunately the CVS didn't help. However, I do have some good news because I've been able to narrow down the actual problem. Nebulas no longer have anything to do with the crash. It has to do with a certain high-pitched squeal of Shivan bombers. The sound is a doppler effect sound, I believe. It usually occurs when passing a Shivan bomber while it is flying toward you. I usually get the following on the command line after the seg fault... ./fs2_open: line 2: 9710 Segmentation fault. But the 9710 number varies. If I'm very careful, I can avoid the circumstances that cause this sound.
2006-03-18 15:04   
Hmm, can you run a debug build through gdb and get a backtrace from it? I appear to be able to play the flyby sound just fine so I'm not sure if it's the sound itself or something in the code that's playing it (object code, species code, or sound code).
2006-03-19 00:46   
I ran a debug build through gdb. I've never run it before so I'm not exactly sure if I got all the needed output. Anyways it is in gdb.log. I also ran strace, however, it was 21mb. When compressed it is 919K, so if you want it I can upload it too.
2006-03-19 14:06   
If you still have the ~/.fs2_open/data/fs2_open.log file from that gdb session then I will probably need that too.

These model (like bmpman) problems are basically impossible to fix unless I can recreate them myself. It's easy to know what went wrong but it's extremely difficult to know why without debugging it step-by-step, and that's not something you can tell someone how to do. I'll keep trying to reproduce the problem but other than that there isn't anything I can do until then. If you find any other places that this appears to happen on a regular basis then please let me know, the more places I can try to replicate this the better.
2006-03-19 22:51   
I uploaded the fs2_open.log file. Fedora Core 5 should come out some time later this month and I'll probably upgrade. I'll let you know if the problem persists then. I read most of the manual for GDB and I can see why the process can't really be explained.
2006-04-07 22:45   
I have good news and bad news. Bad news is I've tried Fedora Core 5 64bit and it did not help. Good news is I tried the 32bit version as well and it did work. I was able to do the flyby without a seg fault. So I guess my problem (whatever it is) is unresolvable for now. I think I'll stick with 32bit OSes for now until I can get more firefox plugins, zsnes, etc to work. Thanks for the help thus far.
2006-05-09 22:04   
bug 0000908 is a duplicate of this one
I have isolated the problem to a corrupted timer callback. Just need to find out why :)
Will post more when I have more data.
2006-05-25 01:27   
Found it! As I thought, 32/64 bit pointer handling. This is going to take me a while to fix, as it has implications anywhere sound is used, so it may take me a few days. Using DWORD to hold a void*. I see this so often in Windows code. GAAAAAHHHH!
2006-05-26 03:36   
Where is it??? I haven't found any 32/64-bit issues, a 128/64-bit issue or two, but not the other. GCC shouldn't even let a DWORD hold a void* and should error out during compile so I'm a little surprised that something has slipped through.
2006-05-26 15:15   
It is in a whole bunch of places in the timer code
a few examples
295 BOOL Create (UINT nPeriod, UINT nRes, DWORD dwUser, TIMERCALLBACK pfnCallback);
1489 m_timer.Create (m_nBufService, m_nBufService, ptr_u (this), TimerCallback);
DWORD is unsigned int, ptr_u is unsigned long long
this is cast from a pointer to an unsigned long long (no warning) and then to an unsigned int (no warning)
this is just one of about a dozen locations I've found so far
the problem is, these functions work with timeSetEvent which expects DWORD values for pointers (the Windows API is full of this kind of crap)
there may also be something wonky in the PCM decode functions, but that is going to take more time. C has some completely braindead type promotion rules, so it makes it hard to spot (like sign extension of unsigned integers)
am working on a patch
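
The truncation described in this note can be reproduced in isolation. The following is a minimal sketch and not the FreeSpace timer code: `DWORD32`, `roundtrip_through_dword`, `survives_dword` and `roundtrip_through_uintptr` are hypothetical names, and `std::uint32_t` stands in for the Windows `DWORD`. It assumes a platform where pointers are 64 bits wide; each explicit cast compiles without a warning, which matches the "no warning" observation above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the Windows DWORD type (always 32 bits).
typedef std::uint32_t DWORD32;

// Squeeze a pointer-sized integer through a 32-bit DWORD and widen it again,
// as happens when a `this` pointer travels in a DWORD user parameter.
std::uintptr_t roundtrip_through_dword(std::uintptr_t value)
{
    DWORD32 narrowed = static_cast<DWORD32>(value); // upper bits are discarded here
    return static_cast<std::uintptr_t>(narrowed);
}

// True if the value comes back unchanged, i.e. it already fit in 32 bits.
bool survives_dword(std::uintptr_t value)
{
    return roundtrip_through_dword(value) == value;
}

// Lossless alternative: uintptr_t is defined to hold any object pointer.
void *roundtrip_through_uintptr(void *p)
{
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    void *back = reinterpret_cast<void *>(bits);
    assert(back == p); // the round trip must preserve the pointer
    return back;
}
```

A value that fits in 32 bits passes through unchanged, so code like this can appear to work on a 64-bit machine for as long as the objects involved happen to live at low addresses. Whether that explains the differing results reported in this thread is not established here.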
2006-05-26 16:21   
... Wow, I'm totally amazed that I missed this. I'm probably more confused as to why this has never messed up for me.

Good work tracking this down.
2006-05-26 16:52   
It is a tough one to spot. Even if you are specifically looking for it. What bugs me, though, is it looks like I'm still missing something :( I'm going through all references to ptr_u right now, before I dig into the PCM decode logic. I suspect it because even though all other sounds are OK, I get audio chatter in .mve playback. Another tack I'm considering is maybe a corrupted VP file somewhere. If it isn't too much trouble, could you check these md5's against your version?

8a34dd671563bc06645999809896d4e3 fsport3_0_3.vp
94cb4ca3bae95419a7298591581ad506 fsport-fs1_training.vp
9cca931a15d0c53fd47ae0e75dd019b3 fsport-glow.vp
5a7f76cab6c9350819ae7ad706363f84 fsport-hi_res.vp
71f8282fed8fdf89aedc86c24f620277 fsport-missions.vp
43727c2b55895bcc789ef92e8b966e70 fsport-shine.vp
bab2a33574ec19f1e05503cf8dea5317 mv_core.vp
a1e16a0ab2398b23c6673eb716141694 mv_effects.vp
13dbd48906f4efe466004d6336101bd9 mv_models.vp
db92955baf5e508e8d677890f691c3c3 mv_music.vp
e8ec4396fd6a6da0af9e438af147cc5b mv_textures.vp
42bc56a410373112dfddc7985f66524a root_fs2.vp
0d662decc0b443ccb8e8aa2e3a0887ce smarty_fs2.vp
2a47bdf14860071cf0196d92e9ee7c2f sparky_fs2.vp
acb785362792927b17fae8eb62b21473 sparky_hi_fs1.vp
97661124cdc47c0a2f0678982b8cbd91 sparky_hi_fs2.vp
5fc56b02fa454d60dd9ba423309776fe stu_fs1.vp
e88f0e0011b3e525a5ad625933684c03 stu_fs2.vp
8ca7330cfe63329b41868efc2e40e048 tango1_fs2.vp
6fb6e9a36248980540155a9777c51c47 tango2_fs2.vp
d42c20b6ffb4782e431899c211ae55c4 tango3_fs2.vp
f7f346e4c0339ba38cff4d9d4dc663f3 tangoA_fs2.vp
532fc3f8b68f19b062c18dafc67bc966 tangoB_fs2.vp
38213b6f6222b2e94fc12ee9e36dd588 tango_fs1.vp
b849bbee619fc3a28ebe6ce7710a8063 warble_fs1.vp
d1f3c39d4fe1bbd56b7b06fe66eef4a6 warble_fs2.vp
2006-05-26 18:17   
Now I'm annoyed. Fixing the pointer issues didn't fix it. What did fix it was replacing the S_Flyby2.wav sound effect from sparky_fs2.vp :/ The version I have has a 16 bit .wav for some reason, but freespace is hard coded for 8 bit sound effects. I'll finish up the pointer patch first, then I'll take a look at the sound effects code. I guess this proves the old adage - if it ain't one thing, it's something else. :P
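
Checking the sample width of a .wav file does not require the engine. The following is a minimal sketch, assuming a canonical RIFF/WAVE layout whose `fmt ` chunk carries at least the 16-byte PCM header; `read_le` and `wav_bits_per_sample` are hypothetical helpers and not part of the FreeSpace source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Read a little-endian unsigned integer of n bytes (n <= 4).
static std::uint32_t read_le(const unsigned char *p, std::size_t n)
{
    assert(n <= 4);
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < n; ++i)
        v |= static_cast<std::uint32_t>(p[i]) << (8 * i);
    return v;
}

// Return bits-per-sample from the "fmt " chunk of a RIFF/WAVE image,
// or -1 if the data does not look like a well-formed WAVE header.
int wav_bits_per_sample(const std::vector<unsigned char> &data)
{
    if (data.size() < 12) return -1;
    if (std::memcmp(&data[0], "RIFF", 4) != 0) return -1;
    if (std::memcmp(&data[8], "WAVE", 4) != 0) return -1;

    std::size_t pos = 12; // first chunk starts after the 12-byte RIFF header
    while (pos + 8 <= data.size()) {
        std::uint32_t chunk_size = read_le(&data[pos + 4], 4);
        if (std::memcmp(&data[pos], "fmt ", 4) == 0) {
            // bitsPerSample is the 16-bit field 14 bytes into the chunk body.
            if (chunk_size < 16 || pos + 8 + 16 > data.size()) return -1;
            return static_cast<int>(read_le(&data[pos + 8 + 14], 2));
        }
        // Skip this chunk; chunk bodies are padded to an even length.
        std::uint64_t advance = 8ull + chunk_size + (chunk_size & 1u);
        if (advance > data.size() - pos) return -1;
        pos += static_cast<std::size_t>(advance);
    }
    return -1;
}
```

Loading the bytes of a file such as S_Flyby2.wav into the vector and calling `wav_bits_per_sample` would report 8 or 16 for ordinary PCM files.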
2006-05-26 19:18   
Patch for 32/64 bit pointer handling, with misc other fixes.

Eliminates DWORD<->ptr conversions in sound system, except where required by Windows API. Will patch other modules separately.
-code/sound/audiostr.cpp - this is only used on Windows, so not modified
-code/sound/audiostr-openal.cpp - left Windows API as is. Someone with a Windows dev box please check that I didn't break anything :/

Eliminates ptr_s

Eliminates ptr_u - all uses have been removed or converted to void*
-GR_SCREEN_PTR - modified to use size_t
-GR_SCREEN_PTR_SIZE does not appear to be used anymore, commented out

fixes PDWORD and LPDWORD in code/windows_stub/config.h so they match DWORD on 64 bit

fixes typo in code/sound/acm.cpp

adds newline to end of code/ship/shipcontrails.cpp so GCC won't complain anymore
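
One reading of the `PDWORD`/`LPDWORD` item is that the pointer typedefs should be derived from `DWORD` itself, so that a write through them always touches exactly four bytes. The sketch below illustrates that reading only; the typedefs are assumptions and the real definitions in code/windows_stub/config.h may differ. `store_dword` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Assumed stub typedefs for a non-Windows build.
typedef std::uint32_t DWORD;   // 32 bits on every platform
typedef DWORD *PDWORD;         // pointer typedefs built from DWORD itself,
typedef DWORD *LPDWORD;        // so their pointee width cannot drift from it

static_assert(sizeof(DWORD) == 4, "DWORD must stay 32 bits wide");

// Write a value through an LPDWORD; only the four bytes of *out are touched.
void store_dword(LPDWORD out, DWORD value)
{
    assert(out != 0);
    *out = value;
}
```

If `LPDWORD` instead pointed at a type that is 64 bits wide on the target, the same store would overwrite the neighbouring four bytes as well.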
2006-05-26 21:29   
Not going to get rid of ptr_s and ptr_u. I know that they aren't really needed, I did it like you did originally, but changed it to the ptr_? stuff when I actually got finished with the original 64-bit port. I just liked it better, and it sticks closer to [V]'s original code (they could have done it differently too).

Not going to make the full set of timeSetEvent() changes either. It's meant to be a wrapper around the WinAPI function so it needs to take the same basic options. Otherwise I just would have gone with SDL calls instead (and may go ahead and do that).

I'll try and get the rest of the fixes in this weekend though.

My s_flyby2.wav is 16-bit too, and Freespace isn't hardcoded for 8-bit files. It works better with 16-bit actually, it has more trouble with 8-bit files (the OpenAL code that is). Some of the 8-bit stuff is just hold-over from FS1, but probably 75-80% of all WAVs in FS2 are 16-bit. Here is my md5sum for s_flyby2.wav just in case:
ab6d2f9a82bf77d89656b89ff56c3c9e s_flyby2.wav

I use my own MediaVPs so I'm not including md5sums for those, but they don't include sound effects anyway. I'm probably missing a couple of things from your list too, so keep that in mind when you are checking:

8a34dd671563bc06645999809896d4e3 fsport3_0_3.vp
9cca931a15d0c53fd47ae0e75dd019b3 fsport_glow.vp
5a7f76cab6c9350819ae7ad706363f84 fsport_hi-res.vp
71f8282fed8fdf89aedc86c24f620277 fsport-missions.vp
eec0fdbd0412b1eba7d14cc34bad6ef0 fsport-strpatch.vp
0d9fd69acfe8b29d616377b057d2fc04 root_fs2.vp
0d662decc0b443ccb8e8aa2e3a0887ce smarty_fs2.vp
2a47bdf14860071cf0196d92e9ee7c2f sparky_fs2.vp
acb785362792927b17fae8eb62b21473 sparky_hi_fs1.vp
97661124cdc47c0a2f0678982b8cbd91 sparky_hi_fs2.vp
5fc56b02fa454d60dd9ba423309776fe stu_fs1.vp
e88f0e0011b3e525a5ad625933684c03 stu_fs2.vp
8ca7330cfe63329b41868efc2e40e048 tango1_fs2.vp
6fb6e9a36248980540155a9777c51c47 tango2_fs2.vp
d42c20b6ffb4782e431899c211ae55c4 tango3_fs2.vp
f7f346e4c0339ba38cff4d9d4dc663f3 tangoA_fs2.vp
38213b6f6222b2e94fc12ee9e36dd588 tango_fs1.vp
b849bbee619fc3a28ebe6ce7710a8063 warble_fs1.vp
d1f3c39d4fe1bbd56b7b06fe66eef4a6 warble_fs2.vp
2006-05-26 21:59   
As you wish it. I'll baseline to the changes you choose so we stay in sync. The only .vp that is different between us is root_fs2.vp and S_flyby2.wav is the same, so that can't be it, either. :? Back to the drawing board.
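
Comparing two checksum lists like the ones above can be done mechanically. The following is a minimal sketch, assuming md5sum-style lines of the form "digest filename" with no spaces in file names; `parse_md5_list` and `mismatched` are hypothetical helpers.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> DigestMap; // filename -> digest

// Parse whitespace-separated "digest filename" pairs into a map.
DigestMap parse_md5_list(const std::string &text)
{
    DigestMap out;
    std::istringstream in(text);
    std::string digest, name;
    while (in >> digest >> name)
        out[name] = digest;
    return out;
}

// Return the names present in both lists whose digests differ.
std::vector<std::string> mismatched(const DigestMap &a, const DigestMap &b)
{
    std::vector<std::string> diff;
    for (DigestMap::const_iterator it = a.begin(); it != a.end(); ++it) {
        DigestMap::const_iterator other = b.find(it->first);
        if (other != b.end() && other->second != it->second)
            diff.push_back(it->first);
    }
    return diff;
}
```

Files that appear in only one list are ignored by this sketch, which also means that entries spelled differently in the two lists (for example fsport-glow.vp and fsport_glow.vp) are never compared.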
2006-05-27 00:58   
I've got your changes (well, the basic idea of them anyway ;)) folded in to my tree so I'll commit those later on today/tomorrow. Once it hits CVS be sure to double-check me in case I forgot something, or introduced a new bug somewhere. Also be sure to run your debug test again just to make sure it really is still happening. I've fixed numerous unrelated issues, but unrelated things have been known to wreak havoc in this code base.

There is still the possibility that something is wrong with the PCM code, I kinda doubt it, but it's not unthinkable. If you have the time, and the desire, go over it and just make sure that nothing is wrong. I have a feeling that this has a more subtle cause though.

As far as the root_fs2.vp goes, be sure that you have the 1.20 version. I've got a Loki Patch updater for it if you need. I've never run that particular patch on 64-bit though (made it long before I went 64-bit) but I think it should work. You can find the updater here:
2006-05-27 12:49   
All but a couple, really small, commits are now in. Give it a look over and let me know if there are any problems with my changes.
2006-06-03 06:57   
Sorry for dropping off the planet like that, been kinda fried this last week. It seems I didn't have the 1.20 patch after all. :/ We'll see what glitches it fixes. My s_flyby2.wav is identical to yours, but if I copy s_flyby2.wav (or any other .wav, for that matter) to $FREESPACE/data/sounds/8b22k/S_Flyby2.wav, the problem disappears. I'll try 1.20 without my workaround and see if that changes anything.

As for the patch
ptr_u/void* - I see where you are coming from. It does look cleaner, and I have been known to do things like that myself sometimes. I just find it is too easy to get burned in odd ways sometimes. :P

audiostr-openal.cpp line 512
I am still concerned that this may be a double free of the timer. The way I read the SDL documentation, returning 0 from this function tells SDL to free the timer itself. I haven't been able to get it to trigger, so I haven't been able to verify this. I'll have to remember to look at the SDL source later. :)

found an unrelated bug:
code/freespace2/freespace.cpp line 2542
void init_decals();

this line causes a compiler error if ENABLE_DECALS is not defined.
this function is defined in code/decals/decals.h
if ENABLE_DECALS is not defined, an empty macro called init_decals is defined, making this line read 'void ;'
commenting out the line fixes the problem.
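
The failure mode can be shown in a few lines. This is a sketch that assumes the stub is a function-like macro with an empty body, which is consistent with the 'void ;' expansion described above; the real header may be written differently, and `game_init_sketch` is a hypothetical caller.

```cpp
#include <cassert>

// ENABLE_DECALS is left undefined here to mimic the failing configuration.
#ifdef ENABLE_DECALS
void init_decals() { /* real decal setup would go here */ }
#else
#define init_decals()   // assumed stub: function-like macro with an empty body
#endif

// A stray redeclaration such as
//     void init_decals();
// placed below the macro would be rewritten by the preprocessor to `void ;`
// and rejected by the compiler, so the declaration belongs only in the
// ENABLE_DECALS branch above.

// The call site compiles in both configurations; with the stub it is just `;`.
int game_init_sketch()
{
    int steps = 0;
    init_decals();
    ++steps;
    assert(steps == 1);
    return steps;
}
```

Keeping the only declaration next to the macro means a build without ENABLE_DECALS never sees a declaration that the macro could mangle.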

I'll test things over the weekend. Right now I need sleep. :)
2006-06-03 07:11   
Hmm, it does sound more like a problem with that particular WAV file then. I don't remember that file getting changed with the 1.20 update, but I guess it's worth a try.

I'll have to read the docs on the SDL timer situation. I didn't know about that but it's definitely something that we need to protect against. If you've got time to look into that yourself then please do. :) I'll try and get to it next weekend if you don't have the time.

Regarding the decals problem, part of my commit is missing. I did remove that line (it wasn't in the decals header originally) but I apparently missed that in the commit. Since my code tree had the line removed I never noticed it. I'll go ahead and fix that now. Thanks for the heads-up. :)
2006-06-03 17:23   
Updating root_fs2.vp had no effect.

The SDL docs that come with libsdl1.2 don't mention it. You have to look on their website. The relevant page is in the Description section. I took a look at the source for SDL 1.2.9. src/timer/SDL_timer.c in function SDL_ThreadedTimerCheck around line 150 it does remove the timer if the callback returns 0. You should not call SDL_RemoveTimer from within the callback anyway, because you are inside a lock.

re: decals problem: NP :)
2006-06-22 10:59   
* BUMP *

So, is there a code problem here still, or is it data related?

Looking at the SDL source I don't really see an issue with SDL_RemoveTimer(). It stores the timer ids in a linked list and runs through it, finds the passed value, then free()'s it and removes it from the list. After it's been removed once it wouldn't be found the second time so a double-free should be almost impossible. The same rule should apply to any of the other timer functions removing that timer_id as well. Did you see something else here?
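
The "look up, then free" argument can be modelled without SDL. The sketch below is a simplified stand-in and NOT SDL's implementation: `TimerList`, `one_shot` and `repeating` are hypothetical names, timers are plain integer ids, and the locking that SDL performs around its timer list is left out entirely.

```cpp
#include <cassert>
#include <list>

// Simplified model of a timer registry (not SDL code).
struct TimerList {
    std::list<int> ids; // ids of timers that are still registered
    int frees;          // how many times an entry was actually released

    TimerList() : frees(0) {}

    int add(int id) { ids.push_back(id); return id; }

    // Remove by id: release the entry only if it is still in the list.
    bool remove(int id)
    {
        for (std::list<int>::iterator it = ids.begin(); it != ids.end(); ++it) {
            if (*it == id) {
                ids.erase(it);
                ++frees;
                return true;
            }
        }
        return false; // already gone, so nothing is released a second time
    }

    // Model of the dispatcher: a callback that returns 0 cancels its own timer.
    void fire(int id, unsigned (*callback)(unsigned), unsigned interval)
    {
        if (callback(interval) == 0)
            remove(id);
    }
};

unsigned one_shot(unsigned) { return 0; }      // asks to be cancelled
unsigned repeating(unsigned ms) { return ms; } // asks to run again
```

In this model a second removal of the same id finds nothing and releases nothing, which is the behaviour the note relies on. It says nothing about calling the removal function from inside the callback while the list is locked, which is the separate concern raised earlier in the thread.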
2006-07-08 23:34   

Assuming this is fixed if there isn't a post to the contrary by tomorrow.
2006-07-09 10:07   
It ain't fixed, at least if the crash I have in the Training Simulator module TSM-122x (main campaign) is actually related to this bug.

When: when a bomb explodes very close to the ship (or is it that the bomber is too close?)

My setup : current CVS build, 3.6.8-zeta mediaVP (3.6.8-beta = same problem), debian sid on amd64 3200+
CFLAGS/CXXFLAGS : -march=k8 -O2 -pipe -msse3 (I did try without those, no change)

command line parameters : -glow -spec -spec_exp 11 -spec_point 0.6 -spec_static 0.8 -spec_tube 0.4 -fps -jpgtga -ambient_factor 75 -targetinfo -orbradar -ballistic_gauge -rearm_timer -nograb

hope it helps,
2006-07-24 14:52   
Can you please run it in GDB and attach a backtrace of when it crashes?
2006-07-28 10:00   
1001 is a potential duplicate of this bug
2006-08-19 23:37   
@nodens: If you run with -nosound does it still crash? If so then we can probably dupe your bug as 1001 and go ahead and resolve this one since it should be unrelated.
2006-08-26 16:06   
It seems to work OK with -nosound.
Thanks (and sorry for being so slow to respond, too much work to play these past weeks) :)
2006-08-26 16:31   
2006-08-26 16:34   
Crap. That's what I get for referring to my own bugnote. This would be unrelated if, with -nosound, it still crashed. With sound it may still be this bug.

Just try and get that gdb backtrace, we aren't going to be able to do anything else about this otherwise.
2006-11-15 05:52   
Well, a few months with no response, no one else has reported the problem, and I still can't reproduce it. Either this is no longer an issue, or it will have to be debugged on the side of the person having the problem.

Until new info is available for this I'm just going to close and suspend this issue. It can be reopened when someone has the time to provide a good backtrace, or has more info on the cause of the problem.
2007-02-02 23:37   
Ok, I'm told that this cannot be reproduced in the current Linux builds (3.6.9, RC8+ specifically). There are numerous possible things that might have fixed this, but I don't have a clue as to which one it actually was. Hopefully we'll just never see this bug again. :)


Issue History
Date Modified | Username | Field | Change
2006-03-11 01:31 | AncientConsole | New Issue
2006-03-11 01:31 | AncientConsole | File Added: segfault2.txt
2006-03-11 01:33 | AncientConsole | File Added: segfault1.txt
2006-03-11 01:33 | AncientConsole | File Added: segfault3.txt
2006-03-11 01:34 | AncientConsole | File Added: segfault4.txt
2006-03-11 01:34 | AncientConsole | File Added: segfault5.txt
2006-03-11 01:35 | AncientConsole | File Added: segfault6.txt
2006-03-11 01:35 | AncientConsole | File Added: segfault7.txt
2006-03-11 01:38 | AncientConsole | Note Added: 0005115
2006-03-11 13:36 | taylor | Assigned To | WMCoolmon =>
2006-03-11 13:36 | taylor | Status | assigned => new
2006-03-11 13:36 | taylor | Category | mediaVP => ---------
2006-03-11 13:37 | taylor | Note Added: 0005116
2006-03-14 13:41 | AncientConsole | Note Added: 0005127
2006-03-14 13:55 | taylor | Note Added: 0005128
2006-03-18 04:38 | taylor | Note Added: 0005164
2006-03-18 13:35 | AncientConsole | Note Added: 0005167
2006-03-18 15:04 | taylor | Note Added: 0005172
2006-03-19 00:42 | AncientConsole | File Added: gdb.log
2006-03-19 00:46 | AncientConsole | Note Added: 0005186
2006-03-19 14:06 | taylor | Note Added: 0005188
2006-03-19 22:44 | AncientConsole | File Added: fs2_open.log
2006-03-19 22:51 | AncientConsole | Note Added: 0005193
2006-04-07 22:45 | AncientConsole | Note Added: 0005294
2006-05-09 22:04 | Spike | Note Added: 0005494
2006-05-25 01:27 | Spike | Note Added: 0005615
2006-05-26 03:36 | taylor | Note Added: 0005628
2006-05-26 15:15 | Spike | Note Added: 0005629
2006-05-26 16:21 | taylor | Note Added: 0005630
2006-05-26 16:52 | Spike | Note Added: 0005631
2006-05-26 18:17 | Spike | Note Added: 0005632
2006-05-26 19:18 | Spike | Note Added: 0005634
2006-05-26 19:19 | Spike | File Added: ptr_patch.txt
2006-05-26 21:29 | taylor | Note Added: 0005635
2006-05-26 21:59 | Spike | Note Added: 0005636
2006-05-27 00:58 | taylor | Note Added: 0005637
2006-05-27 02:10 | taylor | Category | --------- => sound
2006-05-27 12:49 | taylor | Note Added: 0005652
2006-05-28 00:05 | taylor | Status | new => assigned
2006-05-28 00:05 | taylor | Assigned To | => taylor
2006-06-03 06:57 | Spike | Note Added: 0005738
2006-06-03 07:11 | taylor | Note Added: 0005739
2006-06-03 17:23 | Spike | Note Added: 0005749
2006-06-22 10:59 | taylor | Note Added: 0005903
2006-07-08 23:34 | taylor | Note Added: 0006106
2006-07-09 10:07 | nodens | Note Added: 0006113
2006-07-24 14:52 | taylor | Note Added: 0006326
2006-07-28 10:00 | Kazan | Note Added: 0006368
2006-08-19 23:37 | taylor | Note Added: 0006492
2006-08-26 16:06 | nodens | Note Added: 0006505
2006-08-26 16:31 | taylor | Status | assigned => resolved
2006-08-26 16:31 | taylor | Resolution | open => fixed
2006-08-26 16:31 | taylor | Note Added: 0006506
2006-08-26 16:34 | taylor | Status | resolved => feedback
2006-08-26 16:34 | taylor | Resolution | fixed => reopened
2006-08-26 16:34 | taylor | Note Added: 0006507
2006-08-26 16:34 | taylor | Status | feedback => assigned
2006-11-15 05:52 | taylor | Status | assigned => resolved
2006-11-15 05:52 | taylor | Resolution | reopened => suspended
2006-11-15 05:52 | taylor | Note Added: 0007133
2007-02-02 23:37 | taylor | Note Added: 0007581
2007-02-02 23:37 | taylor | Resolution | suspended => fixed
2007-02-02 23:37 | taylor | Fixed in Version | => 3.6.9