New questions from the Postmortem Article #422
Ultrabenosaurus asked this question in Community calls · Unanswered · 2 comments, 4 replies
@Ultrabenosaurus Would you mind putting your questions in an enumerated list, to make them easier to respond to, and putting only keywords in bold, rather than the entire questions, to make them easier to read? 🙏
@caugner @LeoMcA @Rumyra when will these questions be addressed?
From reading the postmortem article, a few things stood out to me which I feel may justify new questions, or which should absolutely be referenced when answering existing questions on these topics.
Given how close we are to the community call, I apologise for being brief, in order to give MDN / Mozilla as much time as possible to review this. I will put a quote from the postmortem (not necessarily in the same order as in the article) followed by a question in bold, and then consolidate all questions at the end.
1. Why are MDN using peer pressure as a justification for intentionally introducing LLMs, which are known to generate merely plausible content, not true and accurate information, to their "best-in-class" technical documentation?
2. What do MDN propose makes their documentation "best-in-class" if not accuracy and being approachable to readers?
3. Why are MDN claiming that intentionally giving an incorrect response to novice readers of "best-in-class" technical documentation is a good thing?
4. Have MDN considered the impact to their reputation, as well as the impact to novice developers' time, project success, and frustration / mental health, from dealing with incorrect responses given intentionally by MDN?
See also: a great summary of the likely novice user experience by @kyanha on #9230.
I acknowledge the article states "one method we tried was", and thus other methods were also employed; however:
5. Why did MDN trust that another LLM would correctly flag output as "accurate, somewhat inaccurate, or incorrect"?
6. Why did MDN review only the output which the separate LLM deemed "low-quality", and not also the output it deemed "accurate", to ensure it was, indeed, accurate?
7. Why do MDN / Mozilla believe it is viable to use a model trained on content that is 2-3 years out of date?
8. Will MDN take steps to identify such complex pages and prevent the LLM from attempting to summarise them until such time as they have been sufficiently improved by a human?
9. How do MDN expect to collect sufficient reports of problematic LLM output when the target audience is the group least likely to know the output is wrong at the time they read it, and given the time and effort involved in explaining the problem on GitHub when users are trying to work on their own project and only needed a quick technical reference?
10. Can MDN not find a way either to show only LLM output already vetted by knowledgeable humans, or to improve the reporting process so users need only click a button that automatically sends their prompt / page and the LLM output to MDN for review? This would be beneficial for all "experimental" features in the future, to ensure the widest possible collection of reports.
And, finally, credit to @nicuveo for this:
11. Why are MDN so insistent on introducing a third-party content generator that, by design, will always be able to produce inaccurate-but-plausible output, when MDN know that technical accuracy is why readers come to MDN?