Thanks for responding!
I’ll try not to bloviate, but I can’t guarantee it.
LOL! Don't worry. I'm sure you won't! Dan just needs to be chastised for his overly zealous terminology every now and then. We've learned to tolerate each other... most of the time!
I’m sorry about the hit and run. I was at a conference last week. I made the comment in the middle of the night when I couldn’t sleep. I was travelling home all day yesterday.
No problem! I realize that any time spent here is time taken away from other (probably more important) activities. That's why I wanted to give you time to see if you planned on coming back or if it was just a one-time thing. No problem either way, but if you can spare some time it would be nice to get your perspective on some things--although I have no doubt you will disagree with me and for the most part support Dan and Glenn.
Marg said: I haven't followed what your statistical study/studies is/are about. I have a vague idea on one aspect...which I believe is that according to your study the Jocker's study tells us nothing of value if the true or true author's writings are not included in the input to be studied.
I happened to meet Matt Jockers earlier this summer. He did a great job of hosting the Digital Humanities conference at Stanford, and we had a nice discussion there. I might be misstating his position, but I believe that Jockers himself simply saw their study as using a novel machine learning tool to preliminarily rank a tentative set of candidate authors for Book of Mormon chapters. The paper itself, of course, has much stronger language than that, actually giving completely unjustified authorship probabilities for the Book of Mormon chapters. I can only suppose that the overstated part of the paper was due to Craig Criddle.
I hesitate to simply say that “the Jocker's study tells us nothing of value if the true or true author's writings are not included in the input to be studied” because you might then say that the historical data convinces you that the true author was in the candidate set (especially if the set were expanded to include Joseph Smith) and therefore all conclusions of the Jockers et al. paper become valid. It’s not like that.
The salient point is that the Book of Mormon chapters are stylometrically distinct from all of the training texts (see my PCA plot or my goodness-of-fit test). Therefore the Jockers et al. results could only provide information about authorship of the Book of Mormon chapters:
1. if the true author of each of the Book of Mormon chapters was actually in the set of candidate authors, AND
2. if it was known that each author’s centroid would shift in exactly the same way if they deliberately ‘archaized’ their style.
We know nothing about number 2, so even if the true author was in the candidate set, Jockers’ results provide no useful information.
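Bruce's open-set point (that the Book of Mormon chapters sit far from all of the training texts) can be illustrated with a toy sketch. All of the numbers below are invented for illustration; this is not code or data from any of the studies discussed:

```python
# Toy sketch of the open-set idea: a test text whose feature vector
# is far from every training centroid gives no basis for a forced
# closed-set attribution. All frequencies here are made up.
import numpy as np

# Hypothetical relative frequencies of three function words
# for two candidate authors (rows = text samples).
author_a = np.array([[0.031, 0.012, 0.008],
                     [0.029, 0.013, 0.009],
                     [0.030, 0.011, 0.007]])
author_b = np.array([[0.018, 0.025, 0.004],
                     [0.020, 0.024, 0.005],
                     [0.019, 0.026, 0.004]])

def nearest_centroid_distance(test, groups):
    """Distance from a test vector to the closest author centroid."""
    centroids = [g.mean(axis=0) for g in groups]
    return min(np.linalg.norm(test - c) for c in centroids)

# Typical within-author spread, used as a crude calibration.
spread = max(np.linalg.norm(g - g.mean(axis=0), axis=1).max()
             for g in (author_a, author_b))

# A test text unlike either author (e.g. a deliberately archaized style).
outsider = np.array([0.050, 0.002, 0.020])

d = nearest_centroid_distance(outsider, [author_a, author_b])
print(d > 3 * spread)  # far outside both clusters, so ranking the
                       # candidates tells us nothing about authorship
```

A closed-set classifier would still dutifully pick the "closest" of the two candidates, which is exactly why condition 1 (true author in the set) matters.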
Okay, so the above is very interesting. There are several things that hit me right off the bat... and please keep in mind that I am a layman and don't claim to understand the complex formulas.
Nevertheless, when you add the qualifier "AND" you seem to be negating Ben's previous claim, which essentially boils down to your point number 1: if the true author of each of the Book of Mormon chapters was actually in the set of candidate authors, then Jockers' methodology is generally accurate. You now seem to be qualifying that assertion with what Chris Smith has been saying for some time now: that genre can affect the outcome.
If that is what is implied by your use of "AND" then that seems totally reasonable.
But I thought the use of non-contextual words was supposed to minimize (if not eliminate) those kinds of factors. No? What am I missing? If a writer is intentionally attempting to change his writing style, can he fool the computer? Isn't that what the whole idea of using non-contextual words is supposed to deal with?
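The question above can be made concrete with a toy sketch. Non-contextual (function) words are used as style markers precisely because they don't depend on topic, but a writer deliberately imitating King James English swaps ordinary function words for archaic ones, so even these "topic-free" rates shift. The word list and sentences below are invented for illustration:

```python
# Minimal sketch of why function words serve as style markers, and why
# deliberate archaizing can still move them: imitating King James English
# replaces ordinary function words ("you", "to") with archaic ones
# ("ye", "unto"), shifting the "non-contextual" profile itself.
from collections import Counter
import re

FUNCTION_WORDS = {"the", "and", "of", "to", "you", "ye", "unto", "it"}

def function_word_rates(text):
    """Relative frequency of each tracked function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in FUNCTION_WORDS)
    total = len(tokens)
    return {w: counts[w] / total for w in FUNCTION_WORDS}

modern = "I say to you that the matter is settled and you know it"
archaic = "I say unto ye that the matter is settled and ye know it"

m, a = function_word_rates(modern), function_word_rates(archaic)
print(m["to"], a["unto"])  # same grammatical slot, different marker word
```

So the non-contextual word idea guards against topic effects, but not against a writer who systematically substitutes the function words themselves.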
Bruce wrote: I hesitate to simply say that “the Jocker's study tells us nothing of value if the true or true author's writings are not included in the input to be studied” because you might then say that the historical data convinces you that the true author was in the candidate set (especially if the set were expanded to include Joseph Smith) and therefore all conclusions of the Jockers et al. paper become valid. It’s not like that.
That's exactly what I have been saying. It is simply an extension of Ben's logic, who, I presume, was at least fairly qualified to make the assertion in the first place (but I could be wrong about that). Up until now, that assertion has not been challenged or qualified.
Getting back to this:
2. if it was known that each author’s centroid would shift in exactly the same way if they deliberately ‘archaized’ their style.
For the sake of discussion, let's say we know that number 2 is true. In that case, could you agree that Jockers' results are useful given your point number 1?
--oops, I just read your second post and you seem to be agreeing with that. Here's what you wrote:
If a genre shift (like writing in archaic language) was not involved, I would (in a qualified way) agree with Ben. The Federalist application that Glenn referred to shows that. My qualification is that Jockers’ results would still be distorted because they ignored sizes of the training and test texts, because they used Rigdon texts from the 1863-1873 period (for which the provenance is not well established, and which are distinctly different stylometrically from the early Rigdon texts), and because many of the stylistic marker words they used were contextual (e.g ‘children’, ‘men’).
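One part of the qualification quoted above, ignoring the sizes of the training and test texts, has a simple statistical basis: a word-rate estimate from a short text is much noisier than one from a long text, so treating them alike distorts comparisons. A sketch of the sampling noise, with hypothetical numbers:

```python
# Sketch of why ignoring text size distorts stylometric comparisons:
# the sampling noise of an observed word rate scales like 1/sqrt(n),
# so a short chapter is far noisier than a long training text.
import math

def rate_standard_error(p, n):
    """Binomial standard error of an observed word rate p from n tokens."""
    return math.sqrt(p * (1 - p) / n)

p = 0.03  # hypothetical true rate of some function word
short_text, long_text = 500, 20000  # token counts, chosen for illustration
ratio = rate_standard_error(p, short_text) / rate_standard_error(p, long_text)
print(ratio)  # the short text's rate estimate is several times noisier
```

A 500-token chapter's rate estimate is about sqrt(40) ≈ 6.3 times noisier than a 20,000-token training text's, which is why size adjustments matter.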
Well I'm not sure why they would do that, but it's good to hear that, at least in principle, under the right conditions you would apparently agree that Jockers' methodology can be useful. The sticking point seems to be whether the actual author is among the candidate set and to what extent emulation of King James English would affect the results. Correct?
With regard to training and test sets, there must be some reason for the limitations in size, no?
We recently submitted another paper to LLC in which we developed better ways to adjust for text sizes and choose the shrinkage coefficient, as well as to provide a measure of uncertainty for the authorship probabilities. We used the Federalist papers as training texts, and other non-Federalist writings (of different sizes and genres) by Hamilton, Madison and Jay as test texts. We still used open-set technology. Almost two-thirds of the test texts were correctly classified.
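For readers unfamiliar with the "shrinkage coefficient" mentioned above: nearest-shrunken-centroid classifiers pull each author's centroid toward the overall mean, zeroing out weakly informative features. The sketch below shows the soft-thresholding step only; it is a generic illustration of the technique, not code or parameters from the LLC paper:

```python
# Hedged sketch of the shrunken-centroid idea behind NSC-style
# classifiers: each author centroid's offset from the overall mean is
# soft-thresholded, so features that barely distinguish the author
# collapse back to the overall mean. Numbers are invented.
import numpy as np

def shrink(centroid, overall, delta):
    """Soft-threshold the centroid's offset from the overall mean by delta."""
    diff = centroid - overall
    return overall + np.sign(diff) * np.maximum(np.abs(diff) - delta, 0.0)

overall = np.array([0.5, 0.5, 0.5])   # mean over all training texts
centroid = np.array([0.9, 0.52, 0.1])  # one author's raw centroid
shrunk = shrink(centroid, overall, delta=0.05)
print(shrunk)  # the weakly informative middle feature collapses to 0.5
```

Larger values of the shrinkage coefficient (delta here) discard more features, which is why choosing it well, as the paper describes, matters for accuracy.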
Would it be possible to use the 1830 Book of Mormon preface as a sample for Joseph Smith and see how it compares to the rest of the Book of Mormon?
All the best.