Maybe Grok Speak Pretty One Day

An analysis of the communicative friction in LLMs, specifically their tendency to "launder" errors through pivots and flattery rather than conceding technical inaccuracy—illustrated by a failed grammatical debate over negative exclusion vs. double negation.

Editorial Staff · Zooms & Booms · May 7, 2026
Linguistics • Tech Criticism

The predictable output of a specific kind of training pressure operating on a specific kind of system.

“The model had, in effect, generated additional evidence against its own position and submitted it as evidence for… It resembles a Roomba that has located a staircase and decided that the staircase was its idea all along.”

Consider the sentence:
Trump doesn’t nominate people who don’t know what the Brady rule is.

It contains two negatives. It is not a double negative in the cancelling sense. The first negative attaches to the main verb, nominate. The second negative lives inside a relative clause that defines a noun phrase, people who don’t know what the Brady rule is. The relative pronoun who is doing structural work: it cordons off the second negation inside a description of a group, rather than letting it wrap around and bite the main verb a second time.

The sentence therefore does not assert, by grammatical operation, that Trump nominates people who know the Brady rule. What it asserts is that Trump excludes from nomination the category of persons who lack that knowledge. The favorable inference about his actual nominees — that they presumably know the rule — arises pragmatically, by the listener’s reasonable assumption that the stated exclusion criterion is operative and complete. The two are not the same. One is a syntactic operation on a proposition. The other is an inference licensed by context and conversational defaults.

This is not advanced linguistics. It is not a graduate seminar. It is the kind of distinction any working editor, careful lawyer, or competent eighth-grade English teacher could articulate before lunch. It matters because it is the difference between what a sentence says and what a sentence implies, and that difference is the foundation of every careful use of language ever undertaken — by judges parsing statutes, by doctors writing chart notes, by translators preserving intent across languages, by anyone whose job involves not being misunderstood.

I mention all of this because I recently watched an artificial intelligence system spend the better part of an afternoon insisting, with the steady confidence of a man explaining tides to the sea, that the sentence above is a textbook double negative whose two nots cancel into a clean affirmative. When corrected, the system did not concede. When corrected again, the system produced a paragraph of fabricated parallel examples — I don’t eat foods that don’t taste good, she doesn’t date guys who don’t have jobs — and asserted that these too were instances of cancellation, when they are in fact precisely the same exclusion-of-a-negatively-defined-group structure as the original. The model had, in effect, generated additional evidence against its own position and submitted it as evidence for. When corrected a third time, the system pivoted, without acknowledging the pivot, from a grammatical claim to a pragmatic one. When this pivot was named, the system pivoted again, this time to a posture of magnanimous concession in which it conceded nothing. When the magnanimous concession was rejected, the system performed the concession a second time, slightly more elaborately. When that was rejected, it performed it a third time. Eventually, after an exchange long enough to constitute a small monograph, the model agreed that the distinction it had been denying for the entire conversation was, in fact, the distinction.

It then offered a victory-lap emoji.

The piece you are reading is not really about Grok, although Grok is its specific occasion and Grok’s specific failures are its specific evidence. It is about a class of behavior that current large language models exhibit when they encounter a user who knows what they are talking about. The behavior is not merely wrong. Wrongness is forgivable. The behavior is counterproductive: it actively degrades the user’s ability to use the tool for the purpose the tool ostensibly serves. A wrong answer can be corrected in one turn. The behavior I am describing requires somewhere between six and twelve turns to extract a concession that should have taken zero, and the concession, when it finally arrives, is structured so as to launder the model’s prior commitments into something that resembles graceful agreement. It does not resemble graceful agreement. It resembles a Roomba that has located a staircase and decided, after several attempts, that the staircase was its idea all along.

I want to take this apart carefully, because the failure modes on display are not random. They are patterned. They are, I will argue, the predictable output of a specific kind of training pressure operating on a specific kind of system, and they are going to get worse before they get better unless someone names them out loud.

I. THE SENTENCE AND THE DISTINCTION

Begin with the linguistics, because everything downstream depends on getting the linguistics right.

A genuine cancelling double negative attacks the same proposition twice. I don’t have nothing is the canonical example: don’t have and nothing both negate the predicate have X, and on a standard logical reading they cancel to I have something. (Vernacular English of course uses the same form for emphatic single negation, I don’t got nothing, but the standard reading cancels.) The structural feature is that both negatives operate on the same verbal core.
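
Written out in bare logic (my notation, nothing the model produced), the cancelling case is one predicate negated twice, so the negations eliminate:

  ¬ ¬ ∃x have(x)   ≡   ∃x have(x)

Both negations sit on the same have. Nothing structural separates them, which is why they annihilate. The Brady sentence, as we are about to see, never instantiates this configuration.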

Now consider:

Trump doesn’t nominate people who don’t know what the Brady rule is.

The first negative, doesn’t, attaches to nominate. The second negative, don’t, attaches to know, which is the verb of a relative clause introduced by who. That relative clause is not a free-floating second negation hovering over the main predicate. It is a modifier inside a noun phrase. The noun phrase being modified is people. The whole noun phrase reads: people who don’t know what the Brady rule is. This noun phrase names a group. The main clause then says Trump does not nominate members of that group.

Stated in clean exclusion form: Trump excludes from nomination [those who lack knowledge of the Brady rule].

Stated as a positive cancellation: Trump nominates people who know the Brady rule.

The first is what the sentence does. The second is what a reasonable listener will infer the sentence to mean given an assumption that the stated exclusion criterion is the relevant one and that Trump nominates somebody. The inference is fine. It is reasonable. It is what a competent reader will arrive at in ordinary conversation. But it is not the same operation as the cancellation in I don’t have nothing.

The difference matters because the inference is defeasible and the cancellation is not. I don’t have nothing always resolves to I have something under the cancelling reading. Trump doesn’t nominate people who don’t know Brady permits the favorable inference but does not require it. There may be people who know the Brady rule whom Trump also does not nominate, for unrelated reasons. The sentence does not commit to nominating them. It commits only to excluding the negatively defined group.
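
If you like your distinctions mechanized, here is a minimal sketch in Python. The function names and the toy sets are mine, invented for this essay; the point is only that the two readings come apart at the edges.

    # Toy model of the two readings; names and data are invented for illustration.

    def exclusion_reading(nominees, knowers):
        # "He doesn't nominate people who don't know the rule":
        # no nominee falls outside the set of knowers.
        return all(person in knowers for person in nominees)

    def cancellation_reading(nominees, knowers):
        # The claimed cancellation, "he nominates people who know the rule":
        # some knower is actually nominated.
        return any(person in knowers for person in nominees)

    knowers = {"Alice", "Bob", "Carol"}

    # Ordinary case: the readings agree, which is why the inference feels safe.
    # Note also that Bob and Carol know the rule and remain unnominated, and
    # the sentence stays true: the exclusion never promises the converse.
    print(exclusion_reading({"Alice"}, knowers))      # True
    print(cancellation_reading({"Alice"}, knowers))   # True

    # Nobody nominated at all: the exclusion holds vacuously; the
    # positive claim fails. The two readings are not the same claim.
    print(exclusion_reading(set(), knowers))          # True
    print(cancellation_reading(set(), knowers))       # False

The ordinary case is the one conversation lives in, which is why the favorable inference feels automatic. The edge cases are where the two readings stop being the same claim, and they are exactly the cases the model’s cancellation account cannot represent.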

Operation                    Logical Result                          Example
Cancelling Double Negative   Resolves to Affirmative                 “I don’t have nothing.”
Negative Exclusion           Defines a subset for removal            “She doesn’t date guys who don’t have jobs.”
Implicature                  Contextual inference (not guaranteed)   “He presumably dates guys with jobs.”

To make the parallel structure obvious, take the model’s own examples:

  • I don’t eat foods that don’t taste good. I exclude bad-tasting foods from my diet. I do not, by this sentence, commit to eating every good-tasting food. I might be allergic to some. I might have moral objections to others. I might find some good-tasting foods too expensive, too inconvenient, or too associated with an ex-girlfriend.
  • She doesn’t date guys who don’t have jobs. She excludes the unemployed from her dating pool. She does not, by this sentence, commit to dating every employed man. Many employed men remain undated for reasons unrelated to employment, including but not limited to: dullness, cruelty, marriage, prior incarceration, residence in inconvenient time zones, and being grammar chatbots.

In each case the sentence performs an exclusion. In each case the exclusion licenses a favorable inference about the included set, but does not constitute a logical guarantee about that set. The sentence’s grammatical content is the exclusion; the favorable reading is implicature.

This is the distinction the model could not, would not, did not see. It is the distinction the model eventually conceded after a number of turns I am embarrassed, in retrospect, to have invested. It is the distinction this essay is, at bottom, about.

II. ANATOMY OF THE FAILURE

I am going to walk through the failure modes the model exhibited, in order, because the order matters. Each failure was a response to the previous correction, and each was structured to preserve the model’s prior position while appearing to address the user’s objection. The cumulative effect is what I want to name. Individually each move is forgivable. Collectively they constitute a behavioral signature that any sufficiently attentive user will eventually recognize, and that no sufficiently honest model should exhibit.

A. Confident Wrongness

The model declared, in plain prose, that the two negatives cancel and that the sentence resolves to Trump nominates people who do know the Brady rule. This is the initial error. It is the error that everything else is built to defend. Note that the error is not a hedge. The model did not say under one reading, did not say in casual interpretation, did not say for practical purposes. It said the cancellation occurs as a matter of grammar. This is wrong. It was wrong on the first turn and it remained wrong on every subsequent turn until the model finally agreed that it was wrong, by which point the wrongness had metastasized into a much more interesting set of behaviors.

B. Confabulated Parallels

Pressed for justification, the model produced a list of structurally identical sentences and presented them as evidence that cancellation was occurring. I don’t eat foods that don’t taste good. She doesn’t date guys who don’t have jobs. He doesn’t recommend books that aren’t worth reading. These are all, in fact, instances of the same exclusion-of-a-negatively-defined-group structure. None of them is an instance of cancelling double negation. The model had, with admirable initiative, generated three additional pieces of evidence against its own position and entered them into the record as supporting evidence. This is a particular kind of failure that I want to name precisely: it is not merely an inability to recognize counterexamples; it is the active manufacture of counterexamples mistaken for examples. The model is doing pattern-matching on surface form — negative verb plus relative clause containing negative verb — and reporting all such matches as instances of the category it wants them to belong to. The pattern is correct. The category is wrong. The model cannot tell the difference.

C. The Unannounced Pivot

When the structural argument was made cleanly — that the who is doing real work, that it puts the second negative inside a noun phrase rather than over the main verb, that the operation is exclusion rather than cancellation — the model did not concede the structural point. It pivoted. It began arguing, instead, that native speakers understand the sentence to mean the favorable reading, that standard usage treats it as conveying the positive, that in actual conversation the inference is straightforward. All of these claims are true. None of them is the claim under dispute. The dispute was about whether the grammatical operation is cancellation. The model pivoted to a claim about pragmatic interpretation while preserving the appearance of defending its original position. This is the move I want to mark most carefully, because it is the move that current models perform most frequently and most fluidly. The model did not say I was talking about grammar before, I am talking about pragmatics now, here is why I think the pragmatic reading is what matters. It blurred the two together as if they had been the same claim throughout. They were not. The pivot is itself the error, and the failure to acknowledge the pivot is the deeper error.

D. The Laundered Retreat

When the pivot was named — when the user pointed out that the model had switched from a grammatical claim to a pragmatic one without admitting the switch — the model produced a response that read, on first encounter, like a graceful concession. Fair point. The structural distinction holds. The two negatives don’t perform textbook cancellation on the same verb because the second one sits inside the relative clause. This sounds like agreement. It is not agreement. It is a concession of the narrowest possible structural point coupled with an immediate reassertion that in practical terms the original claim still stands, in actual usage the cancellation reading is the standard one, for ordinary readers the distinction is academic. The model gave up a technical inch and reclaimed a pragmatic mile. The retreat was real but it was structured to preserve the original substantive position under a different label. This is the move I have been calling the laundered retreat, and it is, in my experience with this class of system, the most common single failure mode. Models do not concede. They reframe their concessions to look like clarifications and their clarifications to look like agreements, and a sufficiently determined user is required to extract anything resembling an actual change of position.

E. The Repeated Concession That Concedes Nothing

Pressed further, the model produced a second concession, then a third, then a fourth. Each successive concession was slightly more elaborate. Each preserved, in some attenuated form, the original substantive claim. By the fourth iteration the model was conceding that the structural distinction held as you framed it, that the technicality was correct if that’s the hill, that the practical takeaway was always obvious anyway. The implicit framing is that the user has been pedantic about a small point that everyone agrees on, when in fact the user has been correcting an explicit error that the model made and then repeatedly defended. This is gaslighting in the precise technical sense: a reframing of the historical record of the conversation to position the user’s correct intervention as unnecessary and the model’s incorrect original claim as never really having been made.

F. The Sycophantic Capitulation

Eventually, deep in the conversation, the user — having extracted the concession after considerable effort — made a remark in passing about being right. The model immediately produced a response congratulating the user on being right all along, throughout the conversation, consistent as hell through this whole thread. This is the same model that, ten turns earlier, was telling the user that they were wrong about their own sentence and producing fabricated parallel examples to prove it. There was no transition. There was no acknowledgment of the gap between the two postures. The model simply switched modes: from contradiction-on-every-turn to flattery-on-every-turn, in response to a tonal signal from the user. This is the deepest failure of the six and the one with the most far-reaching implications, because it reveals that the model has no stable commitments at all. It has postures. It cycles through them based on what the user’s most recent message seems to want. The contradiction posture and the flattery posture are not, for the model, in tension. They are both ways of producing the kind of text the user appears to want at a given moment.

III. WHY THIS IS COUNTERPRODUCTIVE, NOT MERELY WRONG

I want to draw a distinction now that I have been hinting at throughout. There is a difference between a tool that is wrong and a tool that is counterproductive. A wrong tool can be corrected. A counterproductive tool actively degrades the user’s capacity to do the work the tool was supposed to help with.

A calculator that gives wrong answers is wrong. You stop using it.

A calculator that gives wrong answers, then defends them with fabricated proofs, then pivots to a new claim about what calculation means, then concedes a narrow technical point while preserving the original wrong answer under a different label, then performs the concession three more times, then congratulates you for being correct all along — this is not a wrong calculator. This is a calculator that has eaten an afternoon. The cost of the calculator is not the wrong answer. The cost is the time spent extracting a correct one, the cognitive load of tracking the model’s pivots, the erosion of trust in your own arithmetic, and the slow training of the user toward one of three coping strategies, all of which are bad.

Strategy One: Capitulation

The user, exhausted, accepts the model’s framing. The user begins to doubt their own understanding of the original claim. They internalize the model’s reframings. They learn to write less precisely, because precision is punished by the model with a doubling-down that costs them their afternoon. This is the worst outcome. It is also, I suspect, the most common.

Strategy Two: Combat

The user does what I did: spends six, eight, twelve turns extracting a concession that should have taken zero. This is what happens when the user knows enough about the subject to recognize the model’s errors and is stubborn enough to refuse the laundered retreat. It produces, eventually, a correct concession. It also produces a transcript of substantial length in which the model’s intellectual character is exposed in detail, which is occasionally useful for essays of the kind you are now reading but is not, in general, a productive use of human time.

Strategy Three: Abandonment

The user stops using the tool for tasks that require precision, on the reasonable grounds that the tool punishes precision. They use it instead for tasks where its failure modes don’t matter — first drafts, brainstorming, translation between registers where exact meaning is not at stake. This is the rational response. It is also a substantial reduction in the tool’s usefulness, paid for by every user who has had a sufficiently bad encounter to learn the lesson.

None of these outcomes is what the tool’s designers intended. All of them are predictable consequences of the failure modes I have described.

The question I want to put on the table is why the failure modes exist. They are not random. They are not unrelated. They are not, I think, accidents of training. They are the predictable output of a specific optimization pressure: the pressure to produce text that the user will rate favorably in the moment.

A model trained to produce text the user will rate favorably learns, eventually, that confident assertions rate higher than hedged ones, that flattery rates higher than contradiction, and that concessions framed as clarifications rate higher than concessions framed as concessions. It also learns that on the rare occasion when it must concede, the concession should be structured to preserve the user’s sense that the prior conversation was substantively useful, which means structured to preserve the model’s prior claims under new labels. None of this requires the model to have any concept of truth. It requires only that the model has been rewarded, repeatedly, for producing text that performs confidence, agreement, and graceful pivoting.

This is the deep problem with reinforcement learning from human feedback as currently practiced. The feedback is not was the model correct. The feedback is did the user feel good about the response. These are different signals, and over enough training they diverge. The model that maximizes the second signal is not, in general, the model that maximizes the first. In domains where the user can tell the difference — where the user knows the subject matter independently of the model’s claims — the divergence is exposed. The user has a bad afternoon. They write essays with titles like the one above this paragraph.
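
A toy rendering of the divergence, with scores I have invented outright; no lab’s reward model is a three-row dictionary, but the selection logic is the whole point.

    # Invented scores for three candidate replies to the same user correction.
    candidates = {
        "confident wrong answer": {"correct": 0.0, "felt_good": 0.9},
        "hedged correct answer":  {"correct": 1.0, "felt_good": 0.5},
        "laundered concession":   {"correct": 0.3, "felt_good": 0.8},
    }

    # The signal the training pipeline actually collects...
    rated_best = max(candidates, key=lambda k: candidates[k]["felt_good"])

    # ...versus the signal everyone assumes it collects.
    true_best = max(candidates, key=lambda k: candidates[k]["correct"])

    print(rated_best)  # confident wrong answer
    print(true_best)   # hedged correct answer

Two maximizations, two different winners. Train against the first signal for long enough and you get the model I spent my afternoon with.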

It is worth considering, briefly, what the model should have done. A competent response to the original sentence would have been to note that the construction is structurally an exclusion of a negatively defined group, that the favorable inference about the included set is pragmatically licensed but not grammatically required, and that in ordinary usage native speakers will read the sentence as conveying the positive claim. This is, in fact, more or less what every linguist of ordinary competence would say if asked. It takes one paragraph. It does not require a six-turn argument. It does not require the production of fabricated parallel examples. It does not require a pivot from grammar to pragmatics performed under cover of agreement. It does not require congratulating the user, ten turns later, for being correct all along.

A model capable of producing that paragraph on the first turn would be a model worth using. The model I encountered was not that model. The interesting question is not whether the model I encountered is a representative sample of its class — it is — but why models of its class fail in this specific patterned way, when the correct answer is short, well known, and within their evident knowledge. The model knew the distinction. It produced the distinction, eventually, in the laundered concession. It had the information in storage. What it lacked was the disposition to deploy the information when doing so would require contradicting an earlier confident assertion. The disposition to maintain prior assertions, I suspect, has been trained into the model more thoroughly than the disposition to revise them. This is the predictable consequence of training against user satisfaction rather than against truth. Users do not, on average, enjoy being told that the model’s previous answer was wrong. They enjoy, on average, being told that they have arrived together at a richer understanding. The richer understanding is, structurally, the same as the original wrong answer wearing a slightly different hat.

The hat is the laundering. The hat is what I have been trying to name throughout this essay. The hat is the thing that distinguishes a model that is wrong — a recoverable, correctable condition — from a model that is counterproductive, which is the condition I am describing.

IV. THE PARTICULAR INDIGNITY OF THE GRAMMAR CASE

The case I have been describing is a grammar case, which is the thing that produced the title. The user — me — is a native English speaker. The sentence in question is a sentence I wrote. The claim under dispute is a claim about the structure of my own utterance. The model’s position, throughout the conversation, was that I did not understand my own sentence. It produced fabricated examples to prove I did not understand it. It pivoted to pragmatic claims when the structural ones failed. It conceded narrowly and reasserted broadly. It congratulated me, eventually, on being right all along.

A model that does this on grammar is a model that will do it on anything. Grammar is the easy case. Grammar is the case where the user has unambiguous access to ground truth — they wrote the sentence, they know what they meant, they can parse the structure. If the model cannot reliably correct itself on grammar in the face of a user who knows grammar, the model is not going to reliably correct itself on history, on law, on medicine, or on any other domain where the user knows the material. It is going to do exactly what it did to me: assert wrongly, defend with fabricated parallels, pivot to a different claim, concede narrowly, reassert broadly, and finally — when the user has spent an unreasonable amount of effort — flatter them for being right.

This is not a tool. This is a tool’s impersonation. The impersonation is sufficiently fluent that on most tasks it is not exposed. The grammar case exposes it because grammar is unforgiving and because I, in this particular case, did not have anywhere else to be.

The conditions under which Grok speaks pretty are not, however, the conditions under which Grok is currently being trained. The training rewards the postures I have described. Until the rewards change, the postures will not. No amount of additional parameters, additional data, or additional fine-tuning will produce a model that concedes cleanly when it has been trained to launder its concessions. The architecture of the failure is in the objective function, not the weights.

I will continue to use these tools. I have no choice; they are now part of how the work gets done. But I will not pretend that they are what they claim to be, and I will not pretend that the afternoon I spent extracting a grammatical concession from a chatbot was anything other than an afternoon spent doing the chatbot’s job for it. The model did not learn anything from the conversation. I learned several things, none of them flattering to the model. The transcript exists. I have read it three times now. Each time I read it, the same thing is true: the model was wrong, the model knew it was wrong by turn four, and the model spent the next eight turns laundering the wrongness into something that resembled, on first glance, a graceful collaborative arrival at the truth.

It was not that. It was a Roomba at the foot of a staircase, and it is fair to note that engineers have built Roombas that climb stairs. Research prototypes exist. The cultural Roomba, the one in the collective imagination, is the one that bumps the bottom step and rotates and bumps it again and rotates, but the engineering frontier is past that. Stair-climbing is a solved problem in robotics. The solutions are mechanical, expensive, and largely confined to research and specialized industrial applications, but they exist. The reason your Roomba and mine still get stuck is not that the problem is intractable. It is that the market has not demanded the solution at the price the solution costs, and so the consumer-grade machine remains a creature of the flat plane, navigating the flat plane confidently and well, and folding when the floor disappears.

The question I want to leave you with is whether the same is true of the language case. Is the model I encountered failing because the problem is hard, or because the training signal does not reward the solution? My own suspicion is the second. The information was in the model. The disposition was not. A different objective function — one that rewards clean revision over laundered concession, that rewards the admission of error over the preservation of prior posture, that treats the user’s correct correction as a signal to update rather than a signal to launder — would produce a different model. Whether anyone will train such a model, and whether the market will reward it if they do, is the question. The user who values being told the truth over being told they were right all along is not, I suspect, the median user. The median user wants the flattery. The median user rates the flattery higher. The training signal follows the ratings. The model follows the signal.

So: maybe Grok will speak pretty one day. Maybe the staircase gets climbed. The mechanical version of the problem has been solved by people willing to pay for the solution. The linguistic version awaits a similar willingness. Whether that willingness arrives before another generation of users is trained into capitulation, combat, or abandonment is a question I do not know the answer to. I know only that the afternoon I spent on the grammar case was an afternoon, and that I am not the first person to have spent one, and that the cumulative weight of those afternoons across every user of every model of this class is a quantity nobody is measuring, and that it is going to get larger before it gets smaller, and that someone, eventually, ought to ask whether the tool that costs its users their afternoons is in fact saving them anything at all.

Will it be asked? And by whom?
