Low-Rank Adaptation of Large Language Models: Explaining the Key Concepts Behind LoRA

Published 2023-04-30
In this video, I go over how LoRA works and why it's crucial for affordable Transformer fine-tuning.

LoRA learns low-rank decompositions of the weight updates to slash the cost of fine-tuning huge language models. Instead of updating entire weight matrices, it freezes the pretrained weights and trains only small low-rank factors, achieving major memory savings with little loss in quality.
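The core trick from the description can be sketched numerically. This is a minimal illustration, not code from the video; the layer shape and rank below are made-up example values:

```python
import numpy as np

d, k, r = 1024, 1024, 8  # hypothetical layer shape and LoRA rank

# Frozen pretrained weight: never updated during fine-tuning
W = np.random.randn(d, k)

# Trainable low-rank factors; B starts at zero so training
# begins exactly at the pretrained weights (W + B @ A == W)
A = np.random.randn(r, k) * 0.01
B = np.zeros((d, r))

# Effective weight used in the forward pass
W_adapted = W + B @ A

# Parameter savings: train d*r + r*k values instead of d*k
full_params = d * k            # 1,048,576
lora_params = d * r + r * k    # 16,384
print(f"trainable fraction: {lora_params / full_params:.4%}")
```

With rank 8 on a 1024×1024 layer, the trainable parameters shrink to about 1.6% of the full matrix, which is where the memory savings come from.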

🔗 LoRA Paper: arxiv.org/pdf/2106.09685.pdf
🔗 Intrinsic Dimensionality Paper: arxiv.org/abs/2012.13255

About me:
Follow me on LinkedIn: www.linkedin.com/in/csalexiuk/
Check out what I'm working on: getox.ai/

All Comments (21)
  • The high-level intuition you gave at the end of the video was great. As a mathematician I'm aware of the theory behind low-rank decomposition and the classic applications, but the way it is applied in the context of LLMs is interesting.
  • @AlexG4MES
    This was beautifully explained. As someone who relies exclusively on self-learning from online materials, the mathematical barrier of differing notations and overly complex wording is the most time-consuming challenge. Thank you for such a distilled explanation, with only the notation and wording that make sense for an intuitive first dive. Subscribed!
  • @user-qy9sx7bn1l
    I particularly appreciate the depth of research and preparation that clearly goes into this video. It's evident that you're passionate about the topics you cover, and your enthusiasm is contagious. Your dedication to providing accurate information while maintaining an accessible and entertaining format is commendable.
  • @Ali-ts6po
    This is the first video I watched from your channel, and I loved it! Now I have 12 tabs open to watch the rest of your videos. Simply amazing! (And wow, I could not imagine LoRA could be so good. It will save me a ton of resources in developing the product I am working on.)
  • @mdraugelis
    Thanks for the great intro to LoRA. I liked your graphics and your takeaways, also your energetic presentation :)
  • @kartikpodugu
    I came across this video a month back; at that time, I didn't understand your excitement, though I understood the technique. Now I have a better understanding of LLMs and how they are trained and fine-tuned for downstream tasks, and I share your excitement.
  • Thank you so much Chris! This was an awesome video and has stopped me from going down the fine-tuning rabbit hole! Just dipping my toe into AI so it’s really great to find an informative channel like yours!
  • Thank you for this nice, deep yet easy to follow explanation of LoRA. Nice job!
  • @user-qn8zn4rj4t
    Wow, the best explanation I've found so far. Coming from a mathematics background, it really amazed me. Thank you
  • @nmirza2013
    Thanks a lot for this amazing explanation. I am fine-tuning Mixtral 8x7B, and using QLoRA I have been able to perform test runs on Colab Pro on an A100 machine.
  • @AhmedKachkach
    Really simple explanation without skimming through the intuition and other important details. Subscribed for more content like this :)!
  • @necbranduc
    Great explanation and first video I've seen from your channel. Somehow, I had the impression that you have 624K subscribers. Was shocked to see there's no actual "K" in there, when browsing your whole video history and seeing your channel is only ~2 months old. You'd deserve that "K" from my pov. Looking forward to the implementation video! (new subscriber)
  • @OwenIngraham
    bears repeating that you should continue doing these videos, wish I had your communication skillz
  • @swfsql
    Thanks a lot for this video! This is the first time I've seen a good explanation of this LoRA thing! 14:45 One minor note: it would indicate that the model itself has low intrinsic dimensionality only if you could get rid of the original weights and just stick to the LoRA, that is, if during the LoRA fine-tuning you could get away with decaying the original (non-LoRA) weights down to zero. So I think what has low intrinsic dimensionality is "what you have to change from the base model" for your fine-tuning needs, but not the base model itself.
  • The AutoGPT team is learning about LoRA and I recommended this because it is such a clear explanation. Thanks for the awesome resource!