Debugging Statistical Methods in JavaScript
I occasionally contribute to the jStat JavaScript library, which aims to provide a decent suite of statistical functions in native JavaScript. A recent bug report (helpfully filed by akrawitz  thanks!) reported that the jStat.centralF.pdf(x, df1, df2)
function was returning NaN
instead of the proper value for large values of df1
and df2
.
I decided to fix this bug and journal the process here, as an experiment in selfreflective coding. I’m writing in the present tense, because I am writing this post as I go through the bug fix. I understand if first person present tense is bothersome; apologies to those readers.
First, the bug report helpfully describes the problem and gives two failing cases, with the correct value even! This is quite helpful. I have a degree in statistics and am certainly familiar with the F distribution but haven’t touched this part of the jStat code. I don’t know what to expect. I know there is division in the formula for the probability distribution function of a F random variable; perhaps that’s where the NaN comes from.
This is all a hunch, though. To begin fixing the issue, first I make sure I have the newest copy of the jStat master to work off of. Which means forgetting how to set an upstream branch to track the jstat/jstat project instead of my fork; a little searching (this article was helpful) and I have my local repository working.
I confirm that the existing tests pass. Then I write some tests that fail, which follow in the next code block. In the process, I check to see if there are any existing tests of the centralF
distribution; there aren’t, but in scanning the appropriate file I find that there are some duplicated tests. I go ahead and remove them, and commit.
Given that there are no tests for the centralF
distribution, it is time to add some. I fire up R to get some values to test against, and copy the structure of one of the other test groups. The code I added is below.
1 2 3 4 5 6 7 8 9 10 11 12 

The last two values were given to me from the bug report, I know they fail so I include them. The first is a sanity test  there are no tests for this function, so if it fails on the first one I know that it is totally broken. The second tests my theory that it is large denominator degrees of freedom  which correspond to small denominator values in the F pdf calculation  that is causing the problem.
I run the tests and… they pass?
Add a test sure to fail to make sure I’m not insane:
1 2 3 4 5 6 7 

Sure enough, I get a broken
message with the offending line. I remove it. I’m still a doubting Thomas; I open the node.js REPL and try to see for myself. I’m not a node expert so it takes a couple tries to get the file require
’d correctly; I’m used to using Ruby’s reuqire_relative
and didn’t realize I could just list a file path in require()
. Sure enough jStat.centralF.pdf(1, 100, 100)
returns NaN
.
Some logging and further investigation is in order:
1 2 3 4 5 6 7 8 9 

The tests pass, and the log statement dutifully shows NaN. Apparently, I was wrong to assume that assert.epsilon
would return false
when comparing a number to a NaN
. A quick grep
through my source directory shows that assert.epsilon
is defined in the vows node module, in the lib/assert/macros.js file.
Here’s the function:
1 2 3 4 5 

Playing around in the REPL, I rediscover that NaN > x
returns false
for all x
, so if NaN
is passed, the function can never fail the assertion. This means that manual NaN
checks are going to be required everywhere. There are a few other options; the two logical ones would be to wrap assert.epsilon
in a custom, jStat only function that automates the NaN
check or to monkeypatch the vows library to make assert.epsilon
have the requested behavior.
Since I’m not the project maintainer, I’ll bring it up in the pull request and discuss it with the maintainer, Trevor Norris. As an aside, Trevor is absolutely wonderful  he spent way more time than I deserve helping me learn how to contribute to jStat and FLOSS. For now, I’ll stay on track fixing this function. Here’s a fixed iteration of the test suite, which fails as expected.
(I removed the #1 and #2 test case from above after testing them manually in the REPL; they worked, so the remaining two examples were sufficient.)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 

At this point, I commit, with a note about the assert.epsilon
problem. Time to actually fix the function, which is shown here:
1 2 3 4 5 6 7 8 

Checking against one of my old statistics textbooks, I see that this is totally correct. The problem is numerical. Time to verify:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 

Sure enough, the sqrtNumerator
and sqrtDenominator
are returning Infinity
 not surprising when one calls Math.pow(df2, df2)
!
How to fix this is another issue. There isn’t a clear option in JavaScript to just declare the numerator or denominator as a bigger type of number, as all numbers are represented the same way. There are some big number libraries around, but adding a dependency like that to the project isn’t a decision to be made lightly, and evaluating big number libraries is certainly outside the scope of this blog post.
The only other idea I have is to see if someone else has a cleverer way to calculate the central F distribution pdf. I know this is pretty common; oftentimes the way that a statistical value is calculated is quite different from the mathematical definition found in a textbook. If there is a clever way to do it, it is a pretty sure bet that R will be using it.
The R project exposes its SVN repository as HTML; I searched it just by typing (“df site:df site:https://svn.rproject.org/R/trunk”) into a search engine, where “df” is the name of the function I was looking for. df.c ends up being the sixth or seventh result, but still a lot easier than navigating the source.
Good news : it appears that Catherine Loader from Bell Labs wrote a super quick implementation that can handle reasonably large numbers, but understanding the implementation is nontrivial. I think that a better understanding of how Catherine Loader’s implementation works is required  time to do some research. In the mean time, I submitted a PR for the double test.
UPDATE: I fixed the bug! Read about the process in Part 2.