
The AI help button is very good but it links to a feature that should not exist #9230

Open
3 tasks done
nyeogmi opened this issue Jul 4, 2023 · 168 comments
Labels
🧑‍🤝‍🧑 community contributions by our wonderful community · idle

Comments

@nyeogmi

nyeogmi commented Jul 4, 2023

Summary

I made a previous issue pointing out that the AI Help feature lies to people and should not exist because of potential harm to novices.

This was renamed by @caugner to "AI Help is linked on all pages." AI Help being linked on all pages is the intended behavior of the feature, and @caugner therefore pointed out that the button looks good and works even better, which I agree with -- it is a fantastic button and when I look at all the buttons on MDN, the AI Help button clearly stands out to me as the radiant star of the show.

The issue was therefore closed without being substantively addressed (because the button is so good, which I agree with).

I think there are several reasons the feature shouldn't exist which have been observed across multiple threads on platforms Mozilla does not control. Actually, the response has been universally negative, except on GitHub where the ability to have a universally negative response was quietly disabled Monday morning.

Here is a quick summary of some of those reasons.

One: the AI model is frequently wrong. Mozilla claims it intends to fix this, but Mozilla doesn't employ any GPT-3.5 developers, and OpenAI has been promising to fix it for months. It's unlikely this will actually happen.

Two: contrary to @caugner's opinion, it's very often wrong about core web topics, including trivial information where there is no obvious excuse. Here are some examples:

Even examples posted by people who support the existence of the AI contain significant errors:

(I say examples, but note: this is the only usage example provided by a person who supported the existence of the feature, and it contained an error.)

This is identical to one of the categories of problem seen on StackExchange when StackExchange introduced its generative AI assistant based on the same model, and it led to Stack removing the assistant because it was generating bizarre garbage.

Three: it's not clear that any documentation contributors were involved in developing the feature. Actually, it's still unclear who outside of @fiji-flo and @caugner was involved in the feature. Some contributors including @sideshowbarker have now objected and the process has produced a default outcome, which is that AI Explain was voluntarily rolled back and AI Help remains in the product.

It is probably OK for those contributors to review each other's code, but they're also managing the response to the backlash. After a bunch of people have already signaled "hey, I have an active interest in this feature" by engaging with a relevant issue, excluding those people reflects that a ruling of "actually, you do not have an active interest!" has been reached, and it's not clear on what basis that ruling would have been reached.

Four: the existence of this feature suggests that product decisions are being made by people who don't understand the technology or who don't think I understand it.


Overall, the change tells the story that MDN doesn't know who its average user is, but assumes that the average user is (1) highly dissimilar to the GitHub users who were involved in the backlash and (2) easy to sell to.

The fact is that in one day, measured in upvotes, you attracted comparable backlash to what the entire StackOverflow strike attracted in a month. It would be a mistake to think only a small group of people are concerned. This attitude would be wishful thinking.

It seems like the fork in the road for MDN is:

  1. run at a loss and hope that corporate sponsors like Google continue to fund it anyway. (It's hard to imagine this happening because almost all of the value of MDN is created inside the OWD team, which is funded separately.)
  2. mimic Mozilla's strategy: increasingly productize itself to users via features like the AI assistant, and increasingly productize itself to corporations by selling its credibility
  3. stop existing

If option 1 isn't sustainable, then between option 2 and option 3, option 3 is obviously better for humanity in the long run, and I would encourage MDN to make plans for its own destruction.

In the worst possible world, the attitude is correct and the users are easy to sell to. Well, in that case, you've created another product company and in doing so you've metaphorically elected to serve both God and money -- and as is evidenced by the recent implosions of every siloed social media company, that is always a great idea.


Again, the AI Help button is absolutely gorgeous and functions as intended. This issue is not about the AI Help button and therefore should not be closed as a button-related wontfix, or renamed by @caugner into a description of the behavior of the button.

URL

#9208
#9214

Reproduction steps

Pivot to a more aggressive funding model, then engage in a mix of panic and corporate groupthink.

Expected behavior

I think the button is amazing and you are doing a great job.

Actual behavior

The AI help feature should not exist.

Device

Desktop

Browser

Chrome

Browser version

Stable

Operating system

Windows

Screenshot

image

Anything else?

No response

Validations

@github-actions github-actions bot added the needs triage label (Triage needed by staff and/or partners. Automatically applied when an issue is opened.) Jul 4, 2023
@NoraCodes

I just want to agree with this report wholeheartedly. The use of large language models to offset labor is problematic enough, but doing so when those LLMs do not even consistently produce reasonable or correct output is utterly unconscionable. MDN is supposed to be a simple, authoritative source for the web platform; with the introduction of "AI Help", you're throwing that reputation away. I never would have imagined I'd be recommending w3schools over MDN to new programmers, but that's where we are today.

I'm a long-time Firefox user. I've worked with Mozillians in the past, including on the 2nd edition of Programming Rust. I know you're decent people; do the right thing and ditch the AI bullshit.

@sideshowbarker
Collaborator

sideshowbarker commented Jul 4, 2023

To provide some context here about the relationship of OWD to MDN and about my own role in all this:

OWD funds the work of a group of writers, whose names you can find at https://openwebdocs.org/team/#writers — and the funding for OWD itself is organized through an Open Collective, which has a formal Team, the names of whose members you can find under the Team tab at https://opencollective.com/open-web-docs#section-contributors.

While I am among the 150+ individual people who have donated to OWD, I am neither formally one of the OWD writers nor formally one of the OWD Team members.

To be clear on my actual role: I’m one of the core reviewers/maintainers who have push/merge access to the https://github.com/mdn/content/ repo (the content of MDN), doing reviews of incoming PRs and otherwise contributing to the repo. The set of core reviewers/maintainers includes the OWD writers, but it also includes some writers who work for Mozilla, and includes me and some others who are neither formally OWD writers nor writers from Mozilla.

See https://github.com/orgs/mdn/teams?query=@sideshowbarker for the list of relevant GitHub teams I belong to, and https://github.com/mdn/content/pulls?q=reviewed-by:sideshowbarker for the reviews I’ve done (3858 so far) and https://github.com/mdn/content/graphs/contributors to see my own commits (and those of other contributors).

And FWIW here I’ll mention that I actually also have push/merge access to the Yari repo at https://github.com/mdn/yari/, which has the source code for the platform on which MDN runs — including code for things like the “AI Explain” button, but also code for all kinds of good things that aren’t controversial at all.

I am not a core Yari reviewer/maintainer, but I have actually done reviews there (20 so far), as shown in https://github.com/mdn/yari/pulls?q=is:pr+reviewed-by:sideshowbarker — in cases where it has made sense for me to review — and commits (42 so far), as shown in https://github.com/mdn/yari/commits?author=sideshowbarker.

@resuna

resuna commented Jul 4, 2023

I do not believe there is currently a theoretical framework for making statistical text generation distinguish the truth of a statement, so there is no likelihood of this being fixed with any anticipated development based on the current technology.

@nyeogmi

This comment was marked as outdated.

@Xe

Xe commented Jul 4, 2023

I'm not sure if this is the right implementation, but the idea is on the right track. A better way to implement this with large language models, should that be desired, would be to have an AI generate a set of options for knowledgeable human technical writers to pick from and shape into a composite, so that the answers stay technically accurate while the large language model assists the creative flow.

I think that the current implementation is wishful thinking at best, and I am sad to see such a critical web resource fall prey to hype cycles that cut out the best part of MDN: the technical writers. Hopefully what I proposed is a viable middle path.

@nyeogmi
Author

nyeogmi commented Jul 4, 2023

I do not believe there is currently a theoretical framework for making statistical text generation distinguish the truth of a statement, so there is no likelihood of this being fixed with any anticipated development based on the current technology.

As far as I can tell, the framework AI Help is using is described here. Basically, it feeds the user's question to a search engine and then uses the search engine to make sure at least one relevant MDN doc can be surfaced to the LLM before it outputs anything.

The idea seems to be "well, AIs are better at summarizing text and doing question/answer tasks about specific passages than they are at answering questions off the cuff," which I think is probably true.

(Does this work? I don't know. When I tried the tool, I was trying to trick it, and it mostly just told me "I can't answer that," which I suspect means it was falling over at the search engine step.)
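A rough sketch of that flow, as I understand it (the helper names here are hypothetical, not the actual Yari code):

async function aiHelp(question) {
  // Search the indexed MDN content for documents related to the question.
  const docs = await searchMdnDocs(question); // hypothetical helper
  if (docs.length === 0) {
    // Nothing relevant surfaced: refuse rather than answer off the cuff.
    return "I can't answer that.";
  }
  // Otherwise pass the retrieved excerpts to the LLM as context for its answer.
  const context = docs.map((doc) => doc.excerpt).join("\n\n");
  return askLlm({ context, question }); // hypothetical helper
}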

I would say this is actually really close to the model AI Explain used, so I would expect it to produce similar mistakes. From talking to Augner, it sounds like Augner doesn't believe any examples taken from AI Explain are representative of likely weaknesses in AI Help, which is surprising to me, but that appears to be their current position.

Overall, I think an affirmative case for "AI would be good at this task" is still missing. Augner wants an affirmative case that it won't work, I want an affirmative case that it will, so we're basically talking past each other.

@mdn mdn deleted a comment from nyeogmi Jul 5, 2023
@fiji-flo
Contributor

fiji-flo commented Jul 5, 2023

AI Help works very differently from AI Explain. We restrict the LLM to base its answers on the context we give it (which is actual, up-to-date MDN content), so you won't see the issues AI Explain was exhibiting.


A basic helpful answer (to the following question) would be:

How can I detect that I'm in offline mode?

To detect if you are in offline mode, you can use the navigator.onLine property. This property returns a boolean value, with true indicating that the browser is online and false indicating that it is offline.

Here is an example of how you can use navigator.onLine to check if you are in offline mode:

if (navigator.onLine) {
  console.log("Online");
} else {
  console.log("Offline");
}

In this example, if navigator.onLine returns true, it means that the browser is online, and the message "Online" will be logged to the console. If navigator.onLine returns false, it means that the browser is offline, and the message "Offline" will be logged to the console.

Please note that if the browser does not support the navigator.onLine property, the above example will always return false or undefined.

If you want to listen for changes in the network status, you can use the online and offline events. Here is an example of how to do that:

window.addEventListener("offline", (e) => {
  console.log("Offline");
});

window.addEventListener("online", (e) => {
  console.log("Online");
});

By adding event listeners for the offline and online events, you can execute specific code when the browser goes offline or comes back online.

MDN content that I've consulted that you might want to check:

@Zarthus

Zarthus commented Jul 5, 2023

@fiji-flo it looks like the links you've provided are broken. (Maybe the URLs were relative?)

My main concern is that AI Help does not have a place in technical documentation. Yes, in theory it could help out a few people, but the target audience it seems to aim for (new developers, or someone unfamiliar with the concept they are trying to learn about), coupled with our current understanding and research about LLMs (in a nutshell: they can confidently present inaccurate information), seems to be a hugely concerning mismatch.

You need someone to fact-check the response from an LLM; a four-eyes principle is often applied to technical docs (one writer and at least one reviewer), and that review step is missing from the LLM.

Therefore, there is a significantly increased risk that the LLM provides wrong information to someone not knowledgeable enough about the subject to tell whether the AI is confidently providing misinformation or is actually accurate.

How does the team behind AI Explain hope to alleviate this concern, beyond plastering the user with warnings (which might be a hint that this is not a product-market fit)?

@caugner
Contributor

caugner commented Jul 5, 2023

Here's another helpful answer for the following question about a brand-new web feature:

How does oklch() relate to oklab()?

The oklch() functional notation is related to the oklab() functional notation as they both express colors in the Oklab color space. However, they differ in the way they represent colors.

While oklab() uses a Cartesian coordinate system with the a and b axes to represent colors, oklch() uses polar coordinates with the C (Chroma) and H (Hue) values. Both notations share the same L axis, which represents the perceived lightness of the color.

In oklch(), the C value represents the chroma or the intensity of the color, while the H value represents the hue or the angle of the color on the color wheel.

Using oklch(), you can express colors in the Oklch color space by specifying the L, C, and H values. This notation allows you to work with polar coordinates, which can be useful for certain color manipulations or when you want to represent colors in a different way than the Cartesian coordinates used in oklab().

Here is an example of using the oklch() notation:

[data-color="blue"] {
  background-color: oklch(50% 70 240);
}

In this example, the oklch() function is used to set the background color of an element to a blue color with a lightness of 50%, a chroma of 70, and a hue of 240 degrees.

Overall, oklch() provides an alternative way to represent colors in the Oklab color space using polar coordinates, while oklab() uses Cartesian coordinates.

MDN content that I've consulted that you might want to check:

@Zarthus

Zarthus commented Jul 5, 2023

@caugner I think it's a little bit impolite to respond to @nyeogmi with an almost completely automatically generated answer (not relevant to the subject). Would you like to respond to their message (the main content of the issue) as well, please?

@caugner
Contributor

caugner commented Jul 5, 2023

@Zarthus Both this and that comment respond to @nyeogmi who requested positive examples of answers produced by AI Help:

Overall, I think an affirmative case for "AI would be good at this task" is still missing. (...), I want an affirmative case that it will, so we're basically talking past each other.

@Zarthus

Zarthus commented Jul 5, 2023

@caugner: If that was the essence of what the contributors of AI Explain and AI Help have taken away from this issue, and is their official response to it, I shall pardon myself from this thread.

@mdn mdn deleted a comment from GabrielRavier Jul 5, 2023
@rileyinman

This is honestly quite embarrassing. I've been a vocal proponent of Mozilla, their products, and MDN for quite a long time. Seeing the consistent non-acknowledgment of perfectly valid, calmly laid out reasoning against this feature in its current state is disheartening. If Mozilla is set on its current path and will refuse to bend to criticism on this feature, at least do the service of outright saying so - then we can all stop wasting our time.

@obfusk

obfusk commented Jul 5, 2023

I really really didn't want to be part of this discussion. But if people are worried about this feature producing convincing but inaccurate/wrong/misleading output (which LLMs are known to do), providing examples of correct output will not convince them. That only proves that the LLM is capable of being correct and useful (which I don't think anyone has disputed). Not that it is likely to be correct most of the time. Nor that it will not provide really bad results some of the time. Nor does it address the issue that users may not be able to tell these cases apart.

It's really easy to create an algorithm that produces correct output some of the time, or even most of the time, but that fails spectacularly in some (edge) cases. That may be acceptable if it's clear beforehand when it will fail, so that people can avoid the edge cases, or if it's easy to tell when it has failed. But algorithms are a lot more predictable than LLMs. You can usually at least prove they are correct under certain conditions. LLMs are much harder to predict. And we know that LLMs can "hallucinate" perfectly convincing but non-existent sources for their claims.

Even if the LLM produces accurate, useful, output 99% of the time, can I know whether the output I'm currently getting is in fact accurate without fact-checking it every time?

@joepie91

joepie91 commented Jul 5, 2023

@Zarthus Both this and that comment respond to @nyeogmi who requested positive examples of answers produced by AI Help:

Overall, I think an affirmative case for "AI would be good at this task" is still missing. (...), I want an affirmative case that it will, so we're basically talking past each other.

My understanding is that they were requesting an affirmative case to be made for it being structurally good at this task, rather than providing an individual question that it managed to answer sufficiently accurately (which does not say much about structural fitness for the task).

@caugner
Contributor

caugner commented Jul 5, 2023

@sideshowbarker Please stop hiding or deleting comments in this repository. Thank you!

@eevee

eevee commented Jul 5, 2023

Please note that if the browser does not support the navigator.onLine property, the above example will always return false or undefined.

what? if a property isn't supported, it will always be undefined. it would make no sense for a browser to specifically define navigator.onLine to only be false.

most of that answer is complete fluff, and more importantly it does not really answer the original question — because of exactly the problem that it struggles to raise. if you want to know for sure that you're in offline mode, you would have to check navigator.onLine === false, so you know you're not mistaking lack of support for being offline.
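i.e. the check you actually want is something along these lines (a sketch of the distinction only, not a fix for the rest of the quoted answer):

// only treat the page as offline when the browser affirmatively says so,
// so a missing/unsupported property isn't mistaken for being offline
if (navigator.onLine === false) {
  console.log("Offline");
} else {
  console.log("Online, or support unknown");
}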

The oklch() functional notation is related to the oklab() functional notation as they both express colors in the Oklab color space. However, they differ in the way they represent colors.

...

Using oklch(), you can express colors in the Oklch color space...

so are they the same colorspace or not? this seems like the crux of the question, but the bulk of the response is rambling that rephrases parts of the linked articles (including repeated mention of the cartesian/polar distinction, which i doubt will help someone who isn't already visualizing a colorspace in their head), rather than a direct answer. it's mostly explaining oklch() and barely touching on oklab() at all.

a good direct answer would probably say that you just want oklch() if you're not already familiar with Lab's A/B axes and how they correspond to colors. instead we get "here's what blue looks like", without even explaining why it's blue.
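for instance, something as simple as showing the same color written both ways would have made the relationship concrete (a sketch; the values are picked so the polar conversion is easy to follow):

/* with a = 0 and b = 0.1, chroma is sqrt(0² + 0.1²) = 0.1
   and hue is atan2(0.1, 0) = 90deg, so these two are the same color */
.cartesian { color: oklab(0.5 0 0.1); }
.polar     { color: oklch(0.5 0.1 90deg); }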


but an LLM can't give an answer like that, because it doesn't understand context, or what common sticking points might look like. or anything at all. all it can do is babble, and hopefully not babble something that's incorrect.

but you can't ever be confident that it won't be wrong about some percentage of arbitrary questions. and if it is wrong, you can't directly correct it the way you might correct a static article. all you can do is keep feeding it more text and cross your fingers that it starts babbling more correctly, in an infinite game of whack-a-mole.

it might seem like i'm being nitpicky here. and i am — because these examples were specifically cherry-picked to defend the existence of the feature itself. they are the best case scenario. and they are, charitably, mediocre.

ultimately, if you create a chatbot (which you explicitly call "trusted"!) that can't really do much more than restate the contents of existing articles, and you're relying on the reader to sift through its rambling to find the actual information they asked for... then what was the point? they could just as well have sifted through the articles themselves to find the information they wanted, without risking that crucial details will get lost or twisted.

@kyanha

kyanha commented Jul 5, 2023

I would say this is actually really close to the model AI Explain used, so I would expect it to produce similar mistakes. From talking to Augner, it sounds like Augner doesn't believe any examples taken from AI Explain are representative of likely weaknesses in AI Help, which is surprising to me, but that appears to be their current position.

Overall, I think an affirmative case for "AI would be good at this task" is still missing. Augner wants an affirmative case that it won't work, I want an affirmative case that it will, so we're basically talking past each other.

I think the affirmative case for Augner should be "there are already many cited examples of incorrect information being provided."

I'd like to read what precisely the proponents think it's going to help with.

@sideshowbarker
Collaborator

@sideshowbarker Please stop hiding or deleting comments in this repository. Thank you!

Lest anyone else here be led to believe I hid or deleted any comments nefariously or something: Allow me to be fully transparent about exactly what I did actually hide and delete —

So, for the record here: The only comments I hid or deleted were completely innocuous cleanup of outdated comments related to updates that got made to the issue description. (See the remaining related comment at #9230 (comment).)

Specifically: I had posted a comment correcting some things that had been in the issue description, and there were some follow-up comments from the OP and another commenter about that — and then the issue description was subsequently updated based on my corrections.

So that update of the issue description rendered all those comments outdated and no-longer-necessary, and they were therefore amicably deleted by agreement with the OP — with the point being that keeping those comments hanging around would have just been noise that distracted from the substance of the discussion here.

@aardrian

aardrian commented Jul 5, 2023

@fiji-flo (and @caugner)

AI Help works very differently from AI Explain. We restrict the LLM to base its answers on the context we give it (which is actual, up-to-date MDN content), so you won't see the issues AI Explain was exhibiting.

I tried to confirm this assertion by pasting some code into AI Help and asking it to explain the code. I used my first CSS example from issue 9208. (I do not have an account, so I don't want to use up my free checks for today.)

For the example, I got this final paragraph (after the LLM explained each included property that visually hides the pseudo-content):

These pseudo-elements and their styles are used to visually indicate the start and end of strikethrough text. However, it's important to note that the presence of the s element is not announced by most screen reading technology in its default configuration. To make it accessible, additional techniques may need to be applied.

I italicized the part that seems questionable given the context it just provided (that the styles visually hide the content it claims visually indicates the start and end of an element).

I agree that it seems less overtly wrong, but it is still wrong. In a more subtle way.
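(For context, the kind of pattern under discussion is visually hidden pseudo-content intended for screen readers, roughly like the sketch below; this is a sketch, not the exact example from issue 9208.)

s::before,
s::after {
  /* visually hide the generated text while keeping it available to assistive tech */
  clip-path: inset(100%);
  height: 1px;
  width: 1px;
  overflow: hidden;
  position: absolute;
  white-space: nowrap;
}

s::before {
  content: " [start of stricken text] ";
}

s::after {
  content: " [end of stricken text] ";
}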

@caugner
Contributor

caugner commented Jul 5, 2023

@aardrian Can you please use the (new) "Report a problem with this answer on GitHub" link at the bottom of the AI Help answer, so that the team can follow up on the specific problem you're experiencing? Thanks! 🙏

@Ultrabenosaurus

@aardrian Can you please use the (new) "Report a problem with this answer on GitHub" link at the bottom of the AI Help answer, so that the team can follow up on the specific problem you're experiencing? Thanks! 🙏

@aardrian's comment is valid in this thread.

Encouraging users to report each incident separately seems like "divide and conquer" tactics to obscure the true scale and prevalence of the problem. By chopping it up into smaller, specific blocks they can be "addressed" with cherry-picked responses as attempted earlier in this thread, only with less context due to being isolated single Issues, not contributing to the overall picture.

Like how @nyeogmi's previous issue was renamed to obfuscate the real problem being raised, and then closed without addressing said problem properly and prompting the creation of this Issue. And how #9208 was also renamed to obfuscate and downplay the very concerning issue being discussed.

@aardrian

aardrian commented Jul 5, 2023

@caugner

Can you please use the (new) "Report a problem with this answer on GitHub" link at the bottom of the AI Help answer…

No. First, I am already giving my free labor by engaging on this (versus swearing off MDN) and second, what @Ultrabenosaurus said.

@GabrielRavier

GabrielRavier commented Jul 5, 2023

@sideshowbarker Please stop hiding or deleting comments in this repository. Thank you!

Given that one of the comments that were deleted was mine, I'd like to further emphasize that what @sideshowbarker said in #9230 (comment) is in fact completely accurate: my comment (along with other deleted ones) related entirely and only to minor cleanup and did not need to be present after that was cleared up. I have no issue at all with the deletion of the comment and fully agree that leaving it there would just have cluttered things up.

@nyeogmi
Author

nyeogmi commented Jul 5, 2023

My understanding is that they were requesting an affirmative case to be made for it being structurally good at this task, rather than providing an individual question that it managed to answer sufficiently accurately (which does not say much about structural fitness for the task).

I don't think the value of good examples is literally zero. But if advocates of the feature are rejecting isolated examples of bad answers as evidence that the feature is bad, then I am reluctant to accept isolated examples of good answers as evidence that the feature is good.

Specifically: if we accuse one side of cherry picking w/o specific basis, we have to accuse both sides of cherrypicking and throw out all the examples. If we just take everyone's evidence at face value, we conclude that it produces both good and bad answers with roughly equal likelihood, which is more consistent with the case that it's bad.

@GabrielRavier

My understanding is that they were requesting an affirmative case to be made for it being structurally good at this task, rather than providing an individual question that it managed to answer sufficiently accurately (which does not say much about structural fitness for the task).

I don't think the value of good examples is literally zero. But if advocates of the feature are rejecting isolated examples of bad answers as evidence that the feature is bad, then I am reluctant to accept isolated examples of good answers as evidence that the feature is good.

Specifically: if we accuse one side of cherry picking w/o specific basis, we have to accuse both sides of cherrypicking and throw out all the examples. If we just take everyone's evidence at face value, we conclude that it produces both good and bad answers with roughly equal likelihood, which is more consistent with the case that it's bad.

Well, I also think it's quite relevant that the side submitting "good examples" is actually submitting examples that seem superficially good but have large problems.

@tanepiper

This whole debacle is making W3Schools more useful than MDN. That's embarrassing.

@meejah

meejah commented Jul 15, 2023

Since we apparently want "AI" in everything, here is why ChatGPT-3.5 thinks it would be a bad idea to use ChatGPT4 to interpret or produce technical documentation [*]:


As a seasoned web developer, I can provide several reasons why relying solely on a language model like ChatGPT4 for technical documentation is not a good idea:

  1. Lack of Contextual Understanding: While language models like ChatGPT4 are impressive in generating human-like text, they lack contextual understanding. They may generate plausible-sounding
    explanations, but they might not accurately comprehend the intricacies and nuances of technical concepts. This can lead to misleading or incorrect information being provided.

  2. Incomplete or Outdated Information: Language models are trained on vast amounts of data, but they might not have access to the most up-to-date information. Technology and programming
    languages evolve rapidly, and relying solely on a language model might result in outdated or incomplete documentation. Developers need accurate and current information to ensure their
    work is efficient and secure.

  3. Limited Problem-Solving Abilities: Technical documentation often involves troubleshooting and problem-solving. While ChatGPT4 can generate responses, it lacks the ability to actively
    debug or diagnose issues. Developers need guidance that goes beyond generic explanations and includes specific solutions to common problems, which a language model might not be able
    to provide.

  4. Lack of Domain-Specific Knowledge: Technical documentation requires domain-specific knowledge and expertise. It involves understanding the underlying technologies, frameworks, and
    best practices. Language models like ChatGPT4 are trained on a wide range of topics, but they may not possess the deep understanding required to provide accurate and reliable technical
    documentation.

  5. Absence of Real-World Experience: Technical documentation should ideally be written by experienced developers who have hands-on experience in implementing the concepts they document.
    Language models lack real-world experience and practical insights, which are crucial for providing meaningful and effective documentation.

  6. Inability to Adapt to User Needs: Developers often have specific requirements or constraints that need to be considered while working with technical documentation. Language models
    like ChatGPT4 generate responses based on general patterns and examples, but they might not be able to adapt to individual user needs or provide tailored solutions.

While language models like ChatGPT4 can be useful for generating text and providing general information, they should not be relied upon as the sole source of technical documentation.
Developers should seek well-curated, accurate, and up-to-date documentation created by experts in the field to ensure they have the necessary guidance for their work.

[*] -- warning, this was produced by a LLM and may not be accurate.

@Be-ing

Be-ing commented Jul 15, 2023

This is pointless.

I agree. These wall-of-text posts aren't going to get them to change. So, take it out of their control. If I did web development more than on rare occasions, I'd already be using the content and code to set up a fork of MDN on a new domain with all this LLM nonsense removed.

@faintbeep

Sorry to be another voice chiming in, but it seems the thread is lacking examples of bad responses from AI Help, leading to a dismissal of the issues as being only related to AI Explain. So here's a very bad response I got from AI Help about reacting to element size changes:

To detect changes in the size of an element, you can use the MutationObserver API in JavaScript. Here's an example of how you can use MutationObserver to detect size changes of an element:

// Select the node that will be observed for size changes
const targetNode = document.getElementById("your-element-id");

// Options for the observer (which mutations to observe)
const config = { attributes: true, attributeFilter: ["style"] };

// Callback function to execute when size changes are observed
const callback = (mutationList, observer) => {
  for (const mutation of mutationList) {
    if (mutation.type === "attributes" && mutation.attributeName === "style") {
      console.log("The size of the element has changed.");
      // Perform your desired actions here
    }
  }
};

// Create an observer instance linked to the callback function
const observer = new MutationObserver(callback);

// Start observing the target node for size changes
observer.observe(targetNode, config);

// Later, you can stop observing
observer.disconnect();

In this example, we select the element that you want to observe for size changes using getElementById. Then, we define the options for the observer, specifying that we want to observe changes to the style attribute of the element.

The callback function is executed when size changes are observed. It checks if the mutation type is attributes and if the attribute name is style. If both conditions are met, it logs a message indicating that the size of the element has changed. You can perform your desired actions within this callback function.

Finally, we create an instance of MutationObserver linked to the callback function and start observing the target node for size changes. When you no longer need to observe size changes, you can call disconnect() on the observer to stop observing.

Remember to replace "your-element-id" with the actual ID of the element you want to observe for size changes.

MDN content that I've consulted that you might want to check:

MutationObserver
MutationObserver: MutationObserver() constructor

This is obviously wrong in every way, and I'd be surprised to see someone defend it as inaccurate but useful.

Full disclosure, I deliberately tricked the LLM by asking how to use MutationObserver for this purpose. But IMO that's a question a confused beginner is likely to ask, and the documentation should correct them rather than hallucinate a world in which they are correct.
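(For reference, the API actually designed for this is ResizeObserver; a minimal sketch of what a correct answer could have pointed to:)

// React whenever the observed element's rendered size changes.
const target = document.getElementById("your-element-id");

const observer = new ResizeObserver((entries) => {
  for (const entry of entries) {
    const { width, height } = entry.contentRect;
    console.log(`The element is now ${width}x${height}.`);
  }
});

observer.observe(target);

// Later, when no longer needed:
// observer.disconnect();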

@FinalStaticFox

What is even the point of this feature? If I wanted to ask ChatGPT for explanations, I would just... you know... go on OpenAI's website and do it. With all the precautions it comes with, like staying skeptical of all output because of the non-negligible risk of it being confidently wrong.

But so far I never even felt the need to ask an LLM about MDN documentation, because it is well written and sufficient. So, at best this AI help button is useless, and at worst it is harmful, because of the risk that someone might end up misinformed by the output.

Also the idea that "incorrect information can still be helpful" is asinine. This is technical documentation, not Fox News.

@DoubleHyphen

So let me get this straight… the objections to the original post were that the answers it outputs are correct, for the most part?

@yoe

yoe commented Jul 15, 2023

@faintbeep,

Full disclosure, I deliberately tricked the LLM by asking how to use MutationObserver for this purpose. But IMO that's a question a confused beginner is likely to ask, and the documentation should correct them rather than hallucinate a world in which they are correct.

This is actually a generic problem with ChatGPT: if you ask it something that is impossible, it simply cannot tell you that what you ask is impossible; instead, it will hallucinate a world wherein the thing you ask for is in fact possible and then come up with an overly elaborate answer, with full code examples and everything, but it will never work because it's not possible and it does not have the ability to tell you this.

I don't know whether this is a generic problem with LLMs or a specific problem with ChatGPT, but on all the interactions that I've had with it, I've never seen it tell me that a thing is impossible, and believe me, this was not for lack of trying.

In other words, ChatGPT is an XY problem amplifier. You want to do something with an API that wasn't made to do that thing, you ask the tool on MDN how to do it, it hallucinates some gibberish for you that makes it sound like it's possible, and now you're stuck even further in your XY problem.

This is not something MDN should be doing, ever, but it does, both with AI Help and with AI Explain.

@caugner
Contributor

caugner commented Jul 15, 2023

Full disclosure, I deliberately tricked the LLM by asking how to use MutationObserver for this purpose. But IMO that's a question a confused beginner is likely to ask, and the documentation should correct them rather than hallucinate a world in which they are correct.

@faintbeep Thanks for being honest, and glad to hear you had to trick AI Help to get a seemingly incorrect answer. Could you please report the answer using the "Report a problem with this answer on GitHub" link to create a (public) GitHub issue for it? That issue will then contain both the question(s) you asked and the answer you received, which makes it easier to reproduce and follow up. (So far we have received only 5 issue reports - all valid - since we added the link.)

It's important to mention that had you asked if you can detect size changes using MutationObserver instead (e.g. "Can I detect size changes with MutationObserver?"), AI Help would have told you that you cannot and pointed you to ResizeObserver. And my question "How can I detect size changes with MutationObserver?" was just rejected by AI Help. So I'm curious how you phrased that question.

It seems you insisted specifically on a solution with MutationObserver, and AI Help gave you what seems to me like a possibly valid solution to a subset of size changes (namely through style attribute changes, which may effectively change the size of an element), without mentioning this limitation though. Luckily there are the two links that allow the beginner (who, kudos, already heard about MutationObserver) to double-check, deepen their knowledge about MutationObserver and discover ResizeObserver through the "See also" section. Even if you don't find this helpful, maybe we can agree that there is some helpfulness in this?

But seriously, if you actually report this as an issue, we can look into what improvements can avoid this kind of scenario. For example, we could update the MutationObserver page to better explain its differences from ResizeObserver, or adding an overview page for all the observers with their respective use cases could help (maybe it already exists; then we could look into why it wasn't deemed relevant enough, and ensure it's passed as context). And last but not least, it's an option to update our system instructions to prevent GPT-3.5 from suggesting solutions using an unsuitable feature, even if the user specifically asked for it.

PS: Just to make this clear once and for all, we are aware of the limitations of LLMs, and we know that the LLM doesn't understand the question or these instructions, and only uses statistics to come up with the next words. However, the crux is that it works surprisingly well, which is the reason why LLMs can provide value for users, why AI Help's answers are mostly helpful, and why we experiment with an LLM as part of this beta feature. The success of this experiment is yet to be evaluated, and all feedback is going to be taken into consideration.

@acdha

acdha commented Jul 15, 2023

However, the crux is that it works surprisingly well, which is the reason why LLMs can provide value for users

You’ve asserted this but not supported the claim. Even if we ignore the inaccuracies, the positive examples provided have mostly been disorganized and turgid, so I think the better way to convince people would be by having real human testimonials: survey learners in the target audience and see how helpful they found it for solving real problems.

@meejah

meejah commented Jul 15, 2023

You’ve asserted this but not supported the claim. Even if we ignore the inaccuracies, the positive examples provided have mostly been disorganized and turgid, so I think the better way to convince people would be by having real human testimonials: survey learners in the target audience and see how helpful they found it for solving real problems.

Definitely a better approach than asking LLMs to evaluate each other!

Perhaps this could be improved further: divide the target audience into two and give them all the same (short) task. One group gets to use only MDN for help and the other gets to use MDN + "AI Help". Have professionals evaluate the quality of the results from both groups.

@acdha

acdha commented Jul 15, 2023

Perhaps this could be improved further: divide the target audience into two and give them all the same (short) task. One group gets to use only MDN for help and the other gets to use MDN + "AI Help". Have professionals evaluate the quality of the results from both groups.

The sad part is that old Mozilla could have had volunteers to do this if they were training an open LLM and approached this as a research project without a predetermined outcome. As a former donor and contributor, “help Open AI pro bono” is just not as compelling a pitch.

@Xkeeper0

I decided to test the "ask it for something impossible and it will answer as if it was possible" thing above by asking a question I've had myself many times over the years: How do I use CSS selectors to select an element only if it contains a specific child element?

The AI response not only gets the asked question backwards, but then answers that rewritten question (which misses the entire point).

(screenshot of the AI Help response)

A Google query, css select element if it has a specific child, gives as its first result this Stack Overflow answer; reading the results quickly instills the idea that there likely isn't anything usable just yet but might be in the future, and points to what it will likely be, the ":has" selector... which, amusingly, Firefox is the only browser not to support right now.
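(For anyone curious, the selector in question looks roughly like this, in browsers that support it:)

/* style a list item only if it contains a link marked as current */
li:has(> a[aria-current="page"]) {
  outline: 2px solid rebeccapurple;
}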

For curiosity's sake I decided to reformat the question and try again; by this point I know it won't give me an accurate, correct answer, but once again it manages to get basic details wrong:

(screenshot of the second AI Help response)

The only "trick" involved in this was asking it a question I already knew the answer to.

@workingjubilee

workingjubilee commented Jul 15, 2023

How can anyone validate the information provided by an AI assistant if the sites they were supposed to validate that information against are the ones providing that "AI assistance"? How do they know who to trust? This problem most severely negatively affects those who do not have an abundance of spare time, energy, and knowledge to validate the output of AI tools, which are the people who most need assistance from things like MDN.

MDN's AI Help sets the baseline of trustworthiness for help from MDN, because it is so much lower than the rest of the site, and if it is trusted as a vector of information, there is no reason to believe such information has not been incorporated elsewhere on the site in less obviously perceptible ways. No one is auditing the edit history of every single article here, and the obvious next step is "the AI starts making edits". Now that you've made it clear you are happy to incorporate this tool into the text displayed for individual articles via "AI Explain", it's not enough to roll things back to "AI Help". The entire thing has to go; otherwise I have no reason to assume you're not just going to reimplement AI Explain later, when things quiet down, as everyone tends to. Thus, in order for MDN to be useful, I will have to start auditing the edit history of every article, which is harder for people to do now that it's a git history (git has notoriously poor UX).

Defensive maneuvers against misinformation should not cost more than the misinformation costs to generate. Otherwise the misinformation wins. Checking to see if "AI Editing" was enabled when I was away every time I reference or cite MDN is not cost-efficient. So the only defensive maneuver that makes sense is to assume you've abandoned your responsibility to providing reliable and accurate information, as that is the easiest explanation for why a tool that does not provide reliable and accurate information was incorporated into a website that does provide reliable and accurate information. "It generates value" is not enough if it raises the cost of using the resources on MDN overall.

@nyeogmi
Author

nyeogmi commented Jul 15, 2023

(Periodic reminder: this thread has literally no multiplier effect and the devs aren't listening to you. If you want anything to happen, post about it on a platform that has a multiplier effect.)

@ToxicFrog

As announced in the Community Call invitation, we're sharing our answers for anyone who couldn't attend. We'll be adding them in the individual GitHub Discussions threads.

Is there a timeline for this? When can we expect answers and/or the transcript to be posted?

@obfusk

obfusk commented Jul 18, 2023

I keep seeing the proponents of this conflate seeming to be helpful with actually being helpful, and assume that there is no meaningful difference between inaccurate information provided by well-meaning people (e.g. on Stack Overflow) and the kind of inaccurate information that an LLM can produce.

See my comment here.

Pebbly-Wisdom added a commit to Ambiguous-Epilogue/github-drama that referenced this issue Jul 24, 2023
MDN AI drama:
Archive and cite reputable journalism sources, such as The Register.

Links to the GitHub issues:
mdn/yari#9208
mdn/yari#9230
@DavidJCobb

DavidJCobb commented Jul 24, 2023

It seems you insisted specifically on a solution with MutationObserver, and AI Help gave you what seems to me like a possibly valid solution to a subset of size changes (namely through style attribute changes, which may effectively change the size of an element), without mentioning this limitation though.

There is no world in which "tell me when inline styles change" or even "tell me when size-related attributes change" could ever be an adequate answer to "tell me when the size of a typical element changes." The latter is asking about an effect; the former focuses only on one cause among so, so very many. (And it's overbroad in its wrongness, too: it doesn't even double-check offsetWidth and so on for actual changes; every style change is assumed to be a size change!) Calling this "possibly valid" is a breathtakingly flimsy rationalization.

And last but not least, it's an option to update our system instructions to prevent GPT-3.5 from suggesting solutions using unsuitable feature, even if the user specifically asked for it.

Explain how.

PS: Just to make this clear once and for all, we are aware of the limitations of LLMs, and we know that the LLM doesn't understand the question or these instructions, and only uses statistics to come up with the next words.

You say this, but it directly contradicts your last remark. You can "update your system instructions" to overcome the fundamental nature of LLMs? You're acknowledging the limitations of LLMs but refusing to actually consider them, and this is evident in everything you've been saying: it's evident in you projecting confidence that with the right prompt, the right prayer to the toaster oracle, you can get it to reliably correct mistakes; it's evident in you assuming that someone definitely has to be acting in bad faith and insisting that your genius machine provide a wrong answer, for the machine to do so.

(The LLM provided a correct answer when you asked it, so clearly, it "knows" the answer, right? If it gave someone else a wrong answer, it must be because shenanigans are afoot. It can't be that innocent enough variations in wording or phrasing -- variations you simply haven't thought of and tested -- might trip up a program that reacts entirely and blindly to wording with no mental model of what words actually mean.)

And let's not forget the context of you failing to actually demonstrate the awareness you say you have: multiple GitHub issues with hundreds upon hundreds of comments' worth of explanations of LLMs' limitations, presented and explained in just about every way possible, in some cases with examples pulled from MDN itself.

At best, assuming good faith as hard as I can, you've shown an appalling level of myopia that should immediately disqualify someone from making or in any way being involved in any noteworthy decisions about how one of the web's most critical developer documentation sites should be run; but it's becoming increasingly difficult to believe that this is the thoughtlessness it looks like.

@Xkeeper0

Xkeeper0 commented Jul 26, 2023

I feel it's worth pointing out what one of the community call answers had to say: https://github.com/orgs/mdn/discussions/414#discussioncomment-6541058

It's MDN's fault for completely failing to listen to the community here and to consider them when developing new features for MDN, and that's why so many people felt the need to express their concerns.

An extremely vocal small set of our community is not the entire MDN community. We thank you for your feedback, and concern, and we’re taking substantial portions of it on board.

We're just "an extremely vocal small" minority, apparently, because anyone who simply hasn't responded clearly finds AI integration to be a flawless addition.

@obfusk

obfusk commented Jul 26, 2023

And this feature was built for a subset of our community not particularly represented on the issues discussing this feature, and whom many people commenting on the feature entirely forgot about: learners and those not yet capable of finding the correct information on MDN.

I'm pretty sure we've actually expressed a lot of concern that adding more incorrect information to MDN will not help those "not yet capable of finding the correct information" instead of forgetting about them; quoting myself:

This has me worried. We've raised multiple concerns about the inaccuracy of the LLM output. Saying "you can ignore it" just shifts the responsibility for determining whether the output is inaccurate and should be ignored or fact-checked to the users, which is especially problematic given that:

Those most likely to want a simple summary of technical documentation are those least likely to determine the truth and accuracy of an LLM's output supposedly explaining the content they are not knowledgeable about

@alahmnat

we are aware of the limitations of LLMs, and we know that the LLM doesn't understand the question or these instructions, and only uses statistics to come up with the next words.

See, you say that, but then your very next words are

However, the crux is that it works surprisingly well

No, it doesn’t. It appears to work surprisingly well, but you can never be certain whether you’ve gotten the one true book containing your life’s story or one of the ones that’s just 60,000 q’s in a row from the infinite library of every combination of words ever made, and that is fundamentally the problem.

As for “incorrect answers can be helpful,” I’d like to go on record as saying that I find incorrect answers given to me by a tool that is supposed to give me correct information to be nothing but infuriating. I don’t even like getting wrong information from Stack Overflow answers because now I’m having to waste more of my time trying to figure out why it’s not working as expected. I’m sure we’re all more than familiar with adapting Stack Overflow answers that sort of answer the same question we’re trying to ask, but that, too, is a fundamentally different process than “ask the magic answer box my exact question and get an exact answer that should work”.

Finally, I think if you really wanted to impress upon your users the limitations of these tools, you wouldn’t call them “AI” anything. You’d call them “LLM Help” and “LLM Explain”. “AI” has so many sci-fi implications about sentience and reasoning and understanding embedded in it that expecting people to see “AI” in the name of a tool and think “box that makes convincing-sounding sentences” is, frankly, laughable. Despite disclaimers plastered every which way, people are still using ChatGPT to do things like write translations and write legal briefs full of hallucinated court case citations. People will not use these tools the way you expect them to, doubly so if you keep insisting on calling them something they very blatantly are not: artificial intelligence.

@resuna

resuna commented Jul 28, 2023

Finally, I think if you really wanted to impress upon your users the limitations of these tools, you wouldn’t call them “AI” anything. You’d call them “LLM Help” and “LLM Explain”. “AI” has so many sci-fi implications about sentience and reasoning and understanding embedded in it that expecting people to see “AI” in the name of a tool and think “box that makes convincing-sounding sentences” is, frankly, laughable.

In a fair world the people who introduced these programs by referring to them as AI would have burst into black flames for the sheer hubris of it all. They are parody generators. Nothing more.

@ghalfacree

I am aware that management has long moved on and am not expecting a response, here, but I wanted to raise this nevertheless just in case someone who can effect change sees it by chance.

The paper Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions, Kabir et al, 2023 (preprint) delivers exactly what its title suggests. It finds that ChatGPT answers for software engineering questions are wrong 52 per cent of the time - to within a margin of error the same as tossing a coin.

But it goes deeper than that. Because ChatGPT and other LLMs write very, very convincingly, their answers are often preferred over human equivalents (from Stack Overflow, in the case of Kabir et al) - 39.34 per cent of the time, in this case. Of the preferred answers, over 77 per cent were wrong.

So, given MDN is using the same technology, I believe it would not be unreasonable to assume the same holds true: of those users clicking the button to report an answer as "helpful," as many as 77 per cent may have done so on an answer which is wrong. But, because they're unfamiliar with the subject matter and ChatGPT's output is designed to sound helpful, they have no idea they're being led up the garden path.

@noahbroyles

noahbroyles commented Aug 21, 2023

In my professional opinion, LLMs have no place being included on MDN, where developers come looking for trustworthy technical information.

As someone who has used ChatGPT for technical questions numerous times, I know from experience that although it can be quite useful sometimes, it very frequently spews out misinformation and leads you down a rabbit hole of plausible-looking garbage. Often it can take more time trying to get ChatGPT to arrive at a working solution than it would to just use a trustworthy source of documentation (like MDN is supposed to be).

This is very confusing and frustrating, especially for newer developers. The things that LLMs can actually answer accurately (most of the time) are simple, well-known things that a quick Google search would have sufficed for. There is a reason why ChatGPT is banned on StackOverflow:

Overall, because the average rate of getting correct answers from ChatGPT and other generative AI technologies is too low, the posting of answers created by ChatGPT and other generative AI technologies is substantially harmful to the site and to users who are asking questions and looking for correct answers.

I also find it very concerning that newer developers turn to ChatGPT and AI in general as a source of guidance. It is too easy for developers to use it as a crutch. This is dangerous because unlike a calculator being used in mathematics, LLMs/ChatGPT do not always present factually accurate outputs. While using a calculator will always provide an accurate answer for the problem entered, LLMs have no such guarantee. Using GPT is not just detrimental to developers because it reduces their ability to do their own work, but also because it introduces a higher probability of error and often can waste a lot of time.

TL;DR: LLMs are not a good source of factual information, and as such MDN shouldn't expect to be considered a reliable source while they have it included on their website.

@kyanha

kyanha commented Sep 27, 2023

I know that no action is going to be taken on this.

But I would be remiss if I didn't provide this link (not written by me): https://www.zdnet.com/article/third-party-ai-tools-are-responsible-for-55-of-ai-failures-in-business/

@megmorsie

This is dangerous because unlike a calculator being used in mathematics, LLMs/ChatGPT do not always present factually accurate outputs. While using a calculator will always provide an accurate answer for the problem entered, LLMs have no such guarantee.

Yes! I just made this exact comparison to someone recently. So often the applications people are pushing LLMs for already have solutions (keyword searches, math calculations, boilerplates/templates, etc). And those solutions aren't using an insane amount of processing to get results, sapping communities of potable water, requiring a precarious data training labor pool, etc. The externalities of "AI" and LLMs are massive and it's so frustrating that people hand-wave these important factors away on top of the technology itself being demonstrably worse than things we already have.

@github-actions github-actions bot added the idle label Jan 24, 2024