Thanks for responding!
I’ll try not to bloviate, but I can’t guarantee it.
LOL! Don't worry. I'm sure you won't! Dan just needs to be chastised for his overly zealous terminology every now and then. We've learned to tolerate each other... most of the time!
I’m sorry about the hit and run. I was at a conference last week. I made the comment in the middle of the night when I couldn’t sleep. I was travelling home all day yesterday.
No problem! I realize that any time spent here is time taken away from other (probably more important) activities. That's why I wanted to give you time to see if you planned on coming back or if it was just a one-time thing. No problem either way, but if you can spare some time it would be nice to get your perspective on some things--although I have no doubt you will disagree with me and for the most part support Dan and Glenn.
Marg said: I haven't followed what your statistical study/studies is/are about. I have a vague idea on one aspect...which I believe is that according to your study the Jocker's study tells us nothing of value if the true or true author's writings are not included in the input to be studied.
I happened to meet Matt Jockers earlier this summer. He did a great job of hosting the Digital Humanities conference at Stanford, and we had a nice discussion there. I might be misstating his position, but I believe that Jockers himself simply saw their study as using a novel machine learning tool to preliminarily rank a tentative set of candidate authors for Book of Mormon chapters. The paper itself, of course, has much stronger language than that, actually giving completely unjustified authorship probabilities for the Book of Mormon chapters. I can only suppose that the overstated part of the paper was due to Craig Criddle.
I hesitate to simply say that “the Jocker's study tells us nothing of value if the true or true author's writings are not included in the input to be studied” because you might then say that the historical data convinces you that the true author was in the candidate set (especially if the set were expanded to include Joseph Smith) and therefore all conclusions of the Jockers et al. paper become valid. It’s not like that.
The salient point is that the Book of Mormon chapters are stylometrically distinct from all of the training texts (see my PCA plot or my goodness-of-fit test). Therefore the Jockers et al. results could only provide information about authorship of the Book of Mormon chapters:
1. if the true author of each of the Book of Mormon chapters was actually in the set of candidate authors, AND
2. if it was known that each author’s centroid would shift in exactly the same way if they deliberately ‘archaized’ their style.
We know nothing about number 2, so even if the true author was in the candidate set, Jockers’ results provide no useful information.
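Bruce's open-set point (that the Book of Mormon chapters sit far from all of the training texts) can be illustrated with a toy sketch. All of the numbers below are invented for illustration; this is not code or data from any of the studies discussed:

```python
# Toy sketch of the open-set idea: a test text whose feature vector
# is far from every training centroid gives no basis for a forced
# closed-set attribution. All frequencies here are made up.
import numpy as np

# Hypothetical relative frequencies of three function words
# for two candidate authors (rows = text samples).
author_a = np.array([[0.031, 0.012, 0.008],
                     [0.029, 0.013, 0.009],
                     [0.030, 0.011, 0.007]])
author_b = np.array([[0.018, 0.025, 0.004],
                     [0.020, 0.024, 0.005],
                     [0.019, 0.026, 0.004]])

def nearest_centroid_distance(test, groups):
    """Distance from a test vector to the closest author centroid."""
    centroids = [g.mean(axis=0) for g in groups]
    return min(np.linalg.norm(test - c) for c in centroids)

# Typical within-author spread, used as a crude calibration.
spread = max(np.linalg.norm(g - g.mean(axis=0), axis=1).max()
             for g in (author_a, author_b))

# A test text unlike either author (e.g. a deliberately archaized style).
outsider = np.array([0.050, 0.002, 0.020])

d = nearest_centroid_distance(outsider, [author_a, author_b])
print(d > 3 * spread)  # far outside both clusters, so ranking the
                       # candidates tells us nothing about authorship
```

A closed-set classifier would still dutifully pick the "closest" of the two candidates, which is exactly why condition 1 (true author in the set) matters.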
Okay, so the above is very interesting. There are several things that hit me right off the bat... and please keep in mind that I am a layman and don't claim to understand the complex formulas.
Nevertheless, when you add the qualifier "AND" you seem to be negating Ben's previous claim, which essentially boils down to your point number 1: if the true author of each of the Book of Mormon chapters was actually in the set of candidate authors, then Jockers' methodology is generally accurate. You now seem to be qualifying that assertion with what Chris Smith has been saying for some time now: that genre can affect the outcome.
If that is what is implied by your use of "AND" then that seems totally reasonable.
But I thought the use of non-contextual words was supposed to minimize (if not eliminate) those kinds of factors. No? What am I missing? If a writer is intentionally attempting to change his writing style, can he fool the computer? Isn't that what the whole idea of using non-contextual words is supposed to deal with?
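The question above can be made concrete with a toy sketch. Non-contextual (function) words are used as style markers precisely because they don't depend on topic, but a writer deliberately imitating King James English swaps ordinary function words for archaic ones, so even these "topic-free" rates shift. The word list and sentences below are invented for illustration:

```python
# Minimal sketch of why function words serve as style markers, and why
# deliberate archaizing can still move them: imitating King James English
# replaces ordinary function words ("you", "to") with archaic ones
# ("ye", "unto"), shifting the "non-contextual" profile itself.
from collections import Counter
import re

FUNCTION_WORDS = {"the", "and", "of", "to", "you", "ye", "unto", "it"}

def function_word_rates(text):
    """Relative frequency of each tracked function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in FUNCTION_WORDS)
    total = len(tokens)
    return {w: counts[w] / total for w in FUNCTION_WORDS}

modern = "I say to you that the matter is settled and you know it"
archaic = "I say unto ye that the matter is settled and ye know it"

m, a = function_word_rates(modern), function_word_rates(archaic)
print(m["to"], a["unto"])  # same grammatical slot, different marker word
```

So the non-contextual word idea guards against topic effects, but not against a writer who systematically substitutes the function words themselves.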
Bruce wrote: I hesitate to simply say that “the Jocker's study tells us nothing of value if the true or true author's writings are not included in the input to be studied” because you might then say that the historical data convinces you that the true author was in the candidate set (especially if the set were expanded to include Joseph Smith) and therefore all conclusions of the Jockers et al. paper become valid. It’s not like that.
That's exactly what I have been saying. It is simply an extension of Ben's logic, who, I presume, was at least fairly qualified to make the assertion in the first place (but I could be wrong about that). Up until now, that assertion has not been challenged or qualified.
Getting back to this:
2. if it was known that each author’s centroid would shift in exactly the same way if they deliberately ‘archaized’ their style.
For the sake of discussion, let's say we know that number 2 is true. In that case, could you agree that Jockers' results are useful given your point number 1?
--oops, I just read your second post and you seem to be agreeing with that. Here's what you wrote:
If a genre shift (like writing in archaic language) was not involved, I would (in a qualified way) agree with Ben. The Federalist application that Glenn referred to shows that. My qualification is that Jockers’ results would still be distorted because they ignored sizes of the training and test texts, because they used Rigdon texts from the 1863-1873 period (for which the provenance is not well established, and which are distinctly different stylometrically from the early Rigdon texts), and because many of the stylistic marker words they used were contextual (e.g ‘children’, ‘men’).
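One part of the qualification quoted above, ignoring the sizes of the training and test texts, has a simple statistical basis: a word-rate estimate from a short text is much noisier than one from a long text, so treating them alike distorts comparisons. A sketch of the sampling noise, with hypothetical numbers:

```python
# Sketch of why ignoring text size distorts stylometric comparisons:
# the sampling noise of an observed word rate scales like 1/sqrt(n),
# so a short chapter is far noisier than a long training text.
import math

def rate_standard_error(p, n):
    """Binomial standard error of an observed word rate p from n tokens."""
    return math.sqrt(p * (1 - p) / n)

p = 0.03  # hypothetical true rate of some function word
short_text, long_text = 500, 20000  # token counts, chosen for illustration
ratio = rate_standard_error(p, short_text) / rate_standard_error(p, long_text)
print(ratio)  # the short text's rate estimate is several times noisier
```

A 500-token chapter's rate estimate is about sqrt(40) ≈ 6.3 times noisier than a 20,000-token training text's, which is why size adjustments matter.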
Well I'm not sure why they would do that, but it's good to hear that, at least in principle, under the right conditions you would apparently agree that Jockers' methodology can be useful. The sticking point seems to be whether the actual author is among the candidate set and to what extent emulation of King James English would affect the results. Correct?
With regard to training and test sets, there must be some reason for the limitations in size, no?
We recently submitted another paper to LLC in which we developed better ways to adjust for text sizes and choose the shrinkage coefficient, as well as to provide a measure of uncertainty for the authorship probabilities. We used the Federalist papers as training texts, and other non-Federalist writings (of different sizes and genres) by Hamilton, Madison and Jay as test texts. We still used open-set technology. Almost two-thirds of the test texts were correctly classified.
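For readers unfamiliar with the "shrinkage coefficient" mentioned above: nearest-shrunken-centroid classifiers pull each author's centroid toward the overall mean, zeroing out weakly informative features. The sketch below shows the soft-thresholding step only; it is a generic illustration of the technique, not code or parameters from the LLC paper:

```python
# Hedged sketch of the shrunken-centroid idea behind NSC-style
# classifiers: each author centroid's offset from the overall mean is
# soft-thresholded, so features that barely distinguish the author
# collapse back to the overall mean. Numbers are invented.
import numpy as np

def shrink(centroid, overall, delta):
    """Soft-threshold the centroid's offset from the overall mean by delta."""
    diff = centroid - overall
    return overall + np.sign(diff) * np.maximum(np.abs(diff) - delta, 0.0)

overall = np.array([0.5, 0.5, 0.5])   # mean over all training texts
centroid = np.array([0.9, 0.52, 0.1])  # one author's raw centroid
shrunk = shrink(centroid, overall, delta=0.05)
print(shrunk)  # the weakly informative middle feature collapses to 0.5
```

Larger values of the shrinkage coefficient (delta here) discard more features, which is why choosing it well, as the paper describes, matters for accuracy.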
Would it be possible to use the 1830 Book of Mormon preface as a sample for Joseph Smith and see how it compares to the rest of the Book of Mormon?
All the best.