WARNING: Bad News for LLM Fine-Tuning

Published 2024-07-08

All Comments (21)
  • @unclecode
    Hey, as usual, such a good paper you brought up. I read this paper a few days ago, and tbh I am a bit skeptical; let me share my points, I'd like to know yours.

    First, I totally agree with your take on what fine-tuning is really about. It's not about teaching the model new facts, but helping it access its existing knowledge more efficiently or in specific ways. For example, if we want the model to always respond in JSON format, we'd fine-tune for that; we're not teaching it new info, just tweaking how it presents what it knows.

    Now, I've got three main concerns with this study:
    1/ They didn't mention how much of the model they actually fine-tuned. If they used something like LoRA, which is common, they're only training a tiny fraction of a massive model. That's a major flaw in their methodology, because fine-tuning a small portion of the model on unknown knowledge could just be adding noise to the model's activations, leading to hallucinations. This could invalidate their whole claim.
    2/ They only tested on a huge model like PaLM 2-M, which probably has over 340 billion parameters (if I'm not wrong). What if the results are totally different for smaller models, like a 7B one? We really need to see this tested on a range of model sizes to draw any meaningful conclusions.
    3/ What if they fine-tuned most of the model, like 80%? That would be more like pre-training, and the results could be way different. They didn't explore this scenario at all.

    These gaps make me skeptical about how useful or generalizable their findings really are. It feels like they missed some crucial aspects of how fine-tuning actually works in practice. I couldn't find these details in their paper. To be honest, I didn't go through it in detail; perhaps I have to check it again. Kudos to your taste in selecting papers for your channel.
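    A minimal sketch of the LoRA point above, assuming a Hugging Face transformers + peft setup (the base model name and LoRA hyperparameters are illustrative, not anything from the paper): it just makes visible how small the trainable fraction typically is.

    ```python
    # Illustrative LoRA setup: only small adapter matrices on a few projection
    # layers are trained; the base weights stay frozen.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative model

    lora_cfg = LoraConfig(
        r=8,                                  # low-rank adapter dimension
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],  # only attention projections get adapters
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, lora_cfg)

    # Prints trainable vs. total parameters; with settings like these it is
    # typically well under 1% of the model.
    model.print_trainable_parameters()
    ```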
  • @fhsp17
    Thumbnail: STOP FINE-TUNING. Opens the video. It's discussing a Google paper with a clickbait title. They state way more than this paper entitles them to, as if it were a general answer for every use case and every method. It's just for their own controlled closed-book setup, using precisely whatever method they used to fine-tune (which should be meticulously described in the paper for validation, because that's the only setup the results are useful for). No more. Lol.
  • Great. There is also a late-June paper that appeared in Nature, applying semantic entropy to detect hallucinations. You can keep it in the backlog for calmer weeks.
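    For reference, a rough sketch of the semantic-entropy idea from that paper (a paraphrase, not the paper's code): sample several answers to the same question, group them into meaning clusters, and take the entropy over the clusters. The `same_meaning` check below is a placeholder for the bidirectional-entailment model the paper actually uses.

    ```python
    import math

    def same_meaning(a: str, b: str) -> bool:
        # Placeholder for a bidirectional-entailment (NLI) check.
        return a.strip().lower() == b.strip().lower()

    def semantic_entropy(samples: list[str]) -> float:
        # Greedily cluster answers that mean the same thing.
        clusters: list[list[str]] = []
        for s in samples:
            for c in clusters:
                if same_meaning(s, c[0]):
                    c.append(s)
                    break
            else:
                clusters.append([s])
        # Entropy over the cluster distribution: high means the answers disagree.
        n = len(samples)
        return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)

    answers = ["Paris", "paris", "Lyon", "Paris", "Marseille"]
    print(semantic_entropy(answers))  # higher values suggest a likely hallucination
    ```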
  • I have to say I am not convinced. We would need to see more "examples" where this adversely affects different models. Also, my guess is that the method used to fine-tune will have a different effect. Another issue is that I am not seeing much in the way of "specifics". I would like to see the example set of all questions with answers (without fine-tuning) vs. hallucinated responses from the fine-tuned model, to see how it correlates with their definitions of hallucination.
  • @dlyog
    Great work and completely agree
  • @aks8285
    This I can correlate with my experience with vision models; they also behave similarly under fine-tuning, like you said.
  • @testales
    I wonder what the implications of this are for finetuning diffusion models or whether that is a completely different story.
  • @DB-Barrelmaker
    I have thought since last year that the miracle of LLMs was that they managed to understand referencing, a.k.a. linguistic pointers. The increase in hallucination upon fine-tuning clearly points to a negative on that front. That means the door is open!
  • @KevinKreger
    I can enhance hallucinations with one ICL example if there is a near void in that space.
  • @SonGoku-pc7jl
    Yes, the best example of fine-tuning I see is for style of speech: making the model sound similar to somebody by fine-tuning on, for example, a lot of interview transcripts. As you said, it serves the style.
  • @medirobot96
    How do we know whether the data we use for fine-tuning is unknown to the LLM or not?
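    One way to probe this, loosely in the spirit of how the paper sorts facts into known vs. unknown: sample the base model several times on each question and check whether it ever produces the gold answer. The `generate` function below is a hypothetical stand-in for whatever inference API you use.

    ```python
    from typing import Callable

    def is_known(question: str,
                 gold_answer: str,
                 generate: Callable[[str, float], str],
                 n_samples: int = 10,
                 temperature: float = 0.7) -> bool:
        """True if the base model can already produce the gold answer."""
        # One greedy attempt, then several sampled attempts at higher temperature.
        attempts = [generate(question, 0.0)]
        attempts += [generate(question, temperature) for _ in range(n_samples)]
        return any(gold_answer.lower() in a.lower() for a in attempts)
    ```

    Facts for which this stays False would count as unknown to the model, and those are the ones the paper flags as risky to fine-tune on.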
  • @Tony-cw6om
    Where can we find similar papers to keep up with what's happening and learn new things?
  • @Basant5911
    Fine-tuning creates misalignment in the weights, so do it with caution.
  • @therobotocracy
    How about diffusion models? Fine tuning is night and day!
  • @freeideas
    I find this disturbing. How, then, do we give an LLM new knowledge? RAG makes the prompt quite a bit larger and more expensive, and there are a few pieces of information that will be fed to the LLM in the prompt over and over. It seems way more efficient to teach the LLM directly. One example: baby otters are very good swimmers, but they can't dive because too much air is trapped in their fur. This is too obscure for most LLMs to know, but it will dramatically affect the quality of reasoning about the lives of baby otters. Do I need to feed that plus 1000 other obscure truths into an LLM's prompt every time the LLM is used? Apologies if the answer is already in the video, but it was not clear to my simple mind. :)
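    A minimal sketch of the RAG alternative being weighed here, assuming sentence-transformers for the embeddings (the model name is just a common default, and the otter fact is the comment's own example): store the obscure facts once, then pull in only the few relevant to each question instead of all 1000.

    ```python
    import numpy as np
    from sentence_transformers import SentenceTransformer

    facts = [
        "Baby otters are good swimmers but cannot dive because air trapped "
        "in their fur makes them too buoyant.",
        # ...the other obscure truths would go here...
    ]

    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    fact_vecs = encoder.encode(facts, normalize_embeddings=True)

    def build_prompt(question: str, k: int = 3) -> str:
        # Retrieve only the k facts most relevant to this question.
        q_vec = encoder.encode([question], normalize_embeddings=True)[0]
        scores = fact_vecs @ q_vec  # cosine similarity (embeddings are unit-normalized)
        top = np.argsort(scores)[::-1][:k]
        context = "\n".join(facts[i] for i in top)
        return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

    print(build_prompt("Why can't baby otters dive?"))
    ```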