View Issue Details

IDProjectCategoryView StatusLast Update
0002205FSSCPAIpublic2010-07-10 15:03
ReporterEchelon9 Assigned ToSushi_CW  
PrioritynormalSeveritycrashReproducibilityrandom
Status resolvedResolutionfixed 
Product Version3.6.12 RC2 
Fixed in Version3.6.13 
Summary0002205: Division by zero in do_random_sidethrust()
DescriptionCrash in game occurred because of a division by zero in do_random_sidethrust(). I'd assume extra checks that strafeHoldDirAmount != zero (although it's a bit odd that static_rand_range() would return a zero in this case.

Additional InformationTop of the stack:

Exception Type: EXC_ARITHMETIC (SIGFPE)
Exception Codes: EXC_I386_DIV (divide by zero)
Crashed Thread: 0 Dispatch queue: com.apple.main-thread

Thread 0 Crashed: Dispatch queue: com.apple.main-thread
0 FS2_Open (debug) 0x00091ed3 do_random_sidethrust(ai_info*, ship_info*) + 139
1 FS2_Open (debug) 0x000921ee attack_set_accel(ai_info*, ship_info*, float, float, float) + 604
2 FS2_Open (debug) 0x000944a3 ai_chase_attack(ai_info*, ship_info*, vec3d*, float, int) + 739
3 FS2_Open (debug) 0x000a8d70 ai_chase() + 2304
4 FS2_Open (debug) 0x000aecb1 ai_execute_behavior(ai_info*) + 189
5 FS2_Open (debug) 0x000af98a ai_frame(int) + 2524
6 FS2_Open (debug) 0x000afb5e ai_process(object*, int, float) + 356
TagsNo tags attached.

Activities

Echelon9

2010-05-09 07:45

developer   ~0011948

Sushi_CW, contact me if you'd like directions on which (non-public) mission I experienced this in.

The_E

2010-05-09 13:11

administrator   ~0011949

There is no do_random_sidethrust() call in attack_set_accel(), at least not on my checkout... (I could just be blind, however).

Also, the function's signature reads "void attack_set_accel(ai_info *aip, float dist_to_enemy, float dot_to_enemy, float dot_from_enemy)", so the question is whether this actually happens on current trunk as well?

Echelon9

2010-05-09 13:28

developer   ~0011950

Sorry to clarify, I am using a trunk build, being the latest r6094.

In trunk, we can see how we get to this point within aicode.cpp:

6695: void attack_set_accel(ai_info *aip, ship_info *sip, float dist_to_enemy, float dot_to_enemy, float dot_from_enemy)
{
...
6737: do_random_sidethrust(aip, sip);
...

The_E

2010-05-09 13:36

administrator   ~0011951

Last edited: 2010-05-09 13:42

Ahh, my mistake. My checkout script apparently failed to run....

EDIT: Right, looking at it, this should only ever happen if sip->slide_accel is -1, and static_rand_range returns the defined minimum (1, in this case).

Considering past experiences, I would recommend tracing further back to see if sip->slide_accel actually is valid at that point, is there something in the mission that would cause that variable to become NaN?

Sushi_CW

2010-05-09 14:05

developer   ~0011952

Yeah, that is surprising: the static_rand_range should go from 1 to 3, inclusive. What is the slide accel on the ship this triggers for? It isn't negative, is it?

Echelon9

2010-06-28 14:42

developer   ~0012153

Apologies for not getting better testing details to you Sushi, I've been flat out these two weeks.

I can confirm this division by zero crash still occurs regularly with trunk and the same missions.

The_E

2010-06-28 18:19

administrator   ~0012154

Yes, well, can you get those details to all of us? A test scenario using retail assets would be good....

Echelon9

2010-07-04 08:31

developer   ~0012182

In testing strafeHoldDirAmount was equal to 0 (thus causing the crash) when:

(int)(sip->slide_accel) = 0, and
static_rand_range((Missiontime + static_rand(aip->shipnum)) >> 18, 1, 3) = 0.

static_rand_range() returned zero when (Missiontime + static_rand(aip->shipnum)) >> 18 was equal to -6813 is this case.

Echelon9

2010-07-04 09:02

developer   ~0012183

Sushi will know this section of code much better, but could this be because of loss of precision changing the float to an int? The '$Slide accel' on the relevant ship is 0.05.

The_E

2010-07-04 14:29

administrator   ~0012184

Last edited: 2010-07-04 14:45

Umm.

Actually, that would hint more towards a failure in static_rand_range(), as that one should return a number >= 1 && <= 3, but not 0.

Which raises the question whether static_rand_range (and by extension, static_rand) deal with negative numbers properly.

Does the same error appear if you add a check for negative numbers to static_rand, like in the attached patch?

2010-07-04 14:46

 

staticrand.patch (295 bytes)   
Index: math/staticrand.cpp
===================================================================
--- math/staticrand.cpp	(revision 6278)
+++ math/staticrand.cpp	(working copy)
@@ -35,6 +35,9 @@
 {
 	int	a, b, c;
 
+	if (num < 0) 
+		num *= -1;
+
 	if (!Semirand_inited)
 		init_semirand();
 
staticrand.patch (295 bytes)   

Sushi_CW

2010-07-05 21:26

developer   ~0012191

Last edited: 2010-07-05 21:26

So, I exhaustively tested static_rand_range by feeding it every possible int value and verifying that the resulting value was in the 1-3 range (same usage scenario as in the FSO code under discussion). Code for the test is attached. It passed the test, at least for me. Manual inspection of the static_rand_range code doesn't see anything objectionable either.

I'm not sure where to go from here, since I can't reproduce the error. Diaspora uses circle strafe, which invokes do_random_sidethrust, and also uses small values for slide accel (which shouldn't make a difference anyway: the slide accel value doesn't even go into static_rand_range). Despite a lot of AI testing in Diaspora, I've never run into this error.

The only way I see for the divide-by-zero to occur is if slide_accel < 0. Do you get a different error if you add

Assert(sip->slide_accel) >= 0;

to the beginning of do_random_sidethrust?

2010-07-05 21:28

 

static_rand_range_test.cpp.txt (1,708 bytes)   
// static_rand_range_test.cpp : Defines the entry point for the console application.
//

#pragma once

#include "targetver.h"

#include <stdio.h>
#include <cstdlib>
#include <tchar.h>

int Rand_count;

#define	SEMIRAND_MAX_LOG	4
#define	SEMIRAND_MAX		(2 << SEMIRAND_MAX_LOG)	//	Do not change this!  Change SEMIRAND_MAX_LOG!

int Semirand_inited = 0;
int Semirand[SEMIRAND_MAX];

int myrand()
{
	int rval;
	rval = rand();
	Rand_count++;
//	nprintf(("Alan","RAND: %d\n", rval));
	return rval;
}

//	Initialize Semirand array.
void init_semirand()
{
	int	i;

	Semirand_inited = 1;

	for (i=0; i<SEMIRAND_MAX; i++)
		Semirand[i] = (myrand() << 15) + myrand();
}

int static_rand(int num)
{
	int	a, b, c;

	if (!Semirand_inited)
		init_semirand();

	a = num & (SEMIRAND_MAX - 1);
	b = (num >> SEMIRAND_MAX_LOG) & (SEMIRAND_MAX - 1);
	c = (num >> (2 * SEMIRAND_MAX_LOG)) & (SEMIRAND_MAX - 1);

	return Semirand[a] ^ Semirand[b] ^ Semirand[c];
}

// Note: min and max are inclusive
int static_rand_range(int num, int min, int max)
{
	int	rval = static_rand(num);
	rval = (rval % (max - min + 1)) + min;
	return rval;
}

int main(int argc, char* argv[])
{
	int min = 1;
	int max = 3;
	FILE *file;
	file = fopen("file.txt","w+");
	for (int i = INT_MIN; i < INT_MAX; i++)
	{
		int result = static_rand_range(i, min, max);
		if (result < min || result > max)
		{
			printf("ANOMALY: Seed=%d, Min=%d, Max=%d, Output=%d\n", i, min, max, result);
			fprintf(file, "ANOMALY: Seed=%d, Min=%d, Max=%d, Output=%d\n", i, min, max, result);
		}
		if (i % 10000000 == 0)
		{
			printf("Seed @ %d\n", i);
			fprintf(file, "Seed @ %d\n", i);
		}
	}
	fclose(file); /*done!*/
	return 0;
}


static_rand_range_test.cpp.txt (1,708 bytes)   

The_E

2010-07-09 13:54

administrator   ~0012208

Could we please get a sample tbl and sample mission with which to reproduce this?

Echelon9

2010-07-10 12:15

developer   ~0012217

static_rand_range() was returning zero, which is fixed by Mondragon's patch to check for negative numbers in static_rand.

This may be caused by the underlying system's generation of rand(). I'm testing on a Mac here, and in Diaspora missions the AI was causing this crash regularly after 2-4mins. With the patch it is rock solid again.

I assume Sushi and Mondragon aren't on Mac or Linux (i.e. on Windows)?

The_E

2010-07-10 13:52

administrator   ~0012219

Well, yeah, I am on Windows. But it's weird, is there something in the random number functions that works differently on Linux/MacOS?

Sushi_CW

2010-07-10 14:56

developer   ~0012220

Yeah, I'm on Windows.

This is indeed very weird. :) I'm guessing Mondragon's patch should work to solve the issue, though.

The_E

2010-07-10 15:03

administrator   ~0012221

Patch committed in revision 6289. If anyone can make it clear why this happens, I'd be grateful.

Issue History

Date Modified Username Field Change
2010-05-09 07:44 Echelon9 New Issue
2010-05-09 07:44 Echelon9 Status new => assigned
2010-05-09 07:44 Echelon9 Assigned To => Sushi_CW
2010-05-09 07:45 Echelon9 Note Added: 0011948
2010-05-09 13:11 The_E Note Added: 0011949
2010-05-09 13:28 Echelon9 Note Added: 0011950
2010-05-09 13:36 The_E Note Added: 0011951
2010-05-09 13:39 The_E Note Edited: 0011951
2010-05-09 13:42 The_E Note Edited: 0011951
2010-05-09 14:05 Sushi_CW Note Added: 0011952
2010-06-28 14:42 Echelon9 Note Added: 0012153
2010-06-28 18:19 The_E Note Added: 0012154
2010-07-04 08:31 Echelon9 Note Added: 0012182
2010-07-04 09:02 Echelon9 Note Added: 0012183
2010-07-04 14:29 The_E Note Added: 0012184
2010-07-04 14:36 The_E Note Edited: 0012184
2010-07-04 14:45 The_E Note Edited: 0012184
2010-07-04 14:46 The_E File Added: staticrand.patch
2010-07-05 21:26 Sushi_CW Note Added: 0012191
2010-07-05 21:26 Sushi_CW Note Edited: 0012191
2010-07-05 21:28 Sushi_CW File Added: static_rand_range_test.cpp.txt
2010-07-09 13:54 The_E Note Added: 0012208
2010-07-10 12:15 Echelon9 Note Added: 0012217
2010-07-10 13:52 The_E Note Added: 0012219
2010-07-10 14:56 Sushi_CW Note Added: 0012220
2010-07-10 15:03 The_E Note Added: 0012221
2010-07-10 15:03 The_E Status assigned => resolved
2010-07-10 15:03 The_E Fixed in Version => 3.6.13
2010-07-10 15:03 The_E Resolution open => fixed