Review for paper "Quantifying and Evaluating Uncertainty in the Internal Representations of Robots" (Project 6) ================ Best Project? No Top 3? Maybe Organization/Clarity: 5 Idea: 8 Research Contribution: 10 ================ Feedback ================ Note: This review will focus on the math section and the results section; I will sometimes use the term "paper" to refer only to this subset. Overall, this paper has useful ideas and information, but needs serious revision in order to be suitable for its target audience. The paper is written for a reader deeply interested in the mathematical intricacies of the formulas and methods described (i.e. a mathematician), but the average reader will be more interested in the results and the means of applying them (i.e. an engineer). 4 things need to be made clear that aren't: first, which ideas already exist that address this sort of problem; second, which ideas originate in this paper; third, how those ideas are better than existing ones; and fourth, the exact manner in which the ideas were applied to produce the results outlined at the paper's end. The problem that the entropy method addresses is that of determining the degree of certainty with which two discrete random variables can be said to be (in)dependent. That a chi-square test is the "standard" statistical way to solve this problem will likely be known to the reader, but in the paper, the chi-square test is not at all mentioned before the explanation of mutual information. You should formulate the problem, mention the chi-squared test, and state your intention to develop a better method before any explanation of entropy. It is very unclear which mathematical ideas are original to the paper. It appears that the paper mostly summarizes previous mathematics work, and that propositions 1-7 add a bit of extra analysis (propositions 8-11 and the result (40) appear to be misguided). It is particularly confusing when you repeat existing work inline, because it is hard to differentiate original work from previously existing work when shifting rapidly back and forth between the two. All unoriginal proofs should be removed either to appendices or entirely. In particular, the proof of Pearson's Theorem seriously disrupts the flow of your paper; it was frustrating having to scroll back and forth through it when I needed to compare information below and above it. I feel it would be best if you were to first explain the existing work in a succinct fashion, preferring references to appendices or to other works instead of inline discussion, and then provide your original work below that. Remember that although there are readers (myself among them) that will enjoy reading the details of existing work, there are others that won't, and you frustrate the latter type when you mix different content as you have. Note that despite the problems I mention in this paragraph, the paper is still very useful (and made an interesting read for me personally) because it brings entropy to the attention of the robotics community in a compact form. Since the chi-square test was never explicitly mentioned (although numerous relationships between it and entropy appeared throughout the paper), there was no comparison between it and the entropy method. 
Such a comparison, I feel, should be the crux of your paper: you need to justify the advantages of the entropy method over the chi-square method, and this justification should ideally involve both a theoretical explanation and experimental results; ideally you would give side-by-side graphs of an entropy implementation and a chi-square implementation for each of the 4 experiments you analyzed. As part of this, you should give the reader a quantitative sense of the significance of certain formulas (e.g. the bounds on Bias(H_{MM}); it is not at all apparent from the formula that they are tight enough to be useful). Finally, you need to provide an explicit (algorithmic) description of how you computed p-values from the data in the various experiments (a sketch of one standard recipe is appended as a postscript below). I was unable to infer the use of Fisher's Theorem for this (in fact, I could not decipher the purpose of the theorem at all), so it was not sufficiently clear. You should also provide source code (preferably under a new-BSD license).

As I alluded to before, your paper needs to be streamlined for the reader who does not want to understand but only wants to implement; you can still provide as many auxiliary facts and explanations as you feel appropriate, as long as they are given in appendices, and the overall mathematical quality of your paper will not suffer from this measure.

There were other minor flaws, such as the implicit use of Jensen's inequality around proposition 4; L_{H_{MLE}} as a function of (m - 1)/N needs to be convex for the proof to work, and it is, but as it stands the proof seems to make a naive mistake that just happens to work (perhaps I am just misreading it, though). Also, it is redundant to say that H_{MM} makes a good estimator when N >> m, because the formula shows that when N >> m, H_{MM} \approx H_{MLE}, and it was already stated that H_{MLE} is a good estimator when N >> m. H_{MM} should only be useful when N/m is small.

I hope that this feedback will prove useful in improving the paper before its final submission.
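P.S. To illustrate the level of algorithmic explicitness I would like to see for the p-value computation, here is a sketch of one standard recipe: a permutation test on the mutual information, with the Miller-Madow correction included for completeness. I emphasize that this is my own guess at a reasonable procedure, not a reconstruction of what the paper actually did, and it assumes your H_{MM} is the standard Miller-Madow estimator; all function names are mine.

    # Permutation-test p-value for mutual information between two discrete
    # samples, plus the Miller-Madow bias correction for entropy.
    import numpy as np

    rng = np.random.default_rng(0)

    def mle_entropy(labels):
        # Plug-in (MLE) entropy of a discrete sample, in nats.
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log(p))

    def miller_madow_entropy(labels):
        # H_MLE plus the (m - 1)/(2N) correction; the correction vanishes
        # when N >> m, which is why H_MM only earns its keep when N/m is small.
        m = len(np.unique(labels))
        N = len(labels)
        return mle_entropy(labels) + (m - 1) / (2 * N)

    def mutual_information(x, y):
        # I(X;Y) = H(X) + H(Y) - H(X,Y), all via the plug-in estimator.
        _, joint_counts = np.unique(np.stack([x, y]), axis=1,
                                    return_counts=True)
        p = joint_counts / joint_counts.sum()
        h_xy = -np.sum(p * np.log(p))
        return mle_entropy(x) + mle_entropy(y) - h_xy

    def permutation_pvalue(x, y, n_perm=1000):
        # Shuffling y destroys any dependence on x; the fraction of shuffles
        # whose MI reaches the observed value is a one-sided p-value (the +1
        # keeps the estimate away from exactly zero).
        observed = mutual_information(x, y)
        hits = sum(mutual_information(x, rng.permutation(y)) >= observed
                   for _ in range(n_perm))
        return (hits + 1) / (n_perm + 1)

An appendix containing something of this shape, together with your actual source code, would answer my question completely.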