MIT 6.S191 (2023): Recurrent Neural Networks, Transformers, and Attention

Published 2023-03-17
MIT Introduction to Deep Learning 6.S191: Lecture 2
Recurrent Neural Networks
Lecturer: Ava Amini
2023 Edition

For all lectures, slides, and lab materials: introtodeeplearning.com/

Lecture Outline
0:00 - Introduction
3:07 - Sequence modeling
5:09 - Neurons with recurrence
12:05 - Recurrent neural networks
13:47 - RNN intuition
15:03 - Unfolding RNNs
18:57 - RNNs from scratch
21:50 - Design criteria for sequential modeling
23:45 - Word prediction example
29:57 - Backpropagation through time
32:25 - Gradient issues
37:03 - Long short term memory (LSTM)
39:50 - RNN applications
44:50 - Attention fundamentals
48:10 - Intuition of attention
50:30 - Attention and search relationship
52:40 - Learning attention with neural networks
58:16 - Scaling attention and applications
1:02:02 - Summary
Subscribe to stay up to date with new deep learning lectures at MIT, or follow us @MITDeepLearning on Twitter and Instagram to stay fully-connected!!

All Comments (21)
  • I just can't believe how amazing the educators are, and damn!! they're providing it out here for free... Hats off to the team!!
  • @deepakspace
    I am a Professor and this is the best course I have found to learn about Machine learning and Deep learning....
  • @joxa6119
    Of all the videos on YouTube that explain the Transformer architecture (including visual explanations), this is the BEST EXPLANATION ever done. Simple, contextual, high-level, step-by-step complexity progression. Thank you to the educators and MIT!
  • @tgyawali
    Thank you so much, MIT and instructors, for making these very high-quality lectures available to everyone. Students from developing countries who aspire to achieve something big now have a real chance with this type of content and information!
  • @gemini_537
    Summary by Gemini: The lecture is about recurrent neural networks, transformers, and attention. The speaker, Ava, starts the lecture by introducing the concept of sequential data and how it differs from the data we typically work with in neural networks. She then goes on to discuss the different types of sequential modeling problems, such as text generation, machine translation, and image captioning.
    Next, Ava introduces the concept of recurrent neural networks (RNNs) and how they can be used to process sequential data. She explains that RNNs are able to learn from the past and use that information to make predictions about the future. However, she also points out that RNNs can suffer from vanishing and exploding gradients, which can make them difficult to train.
    To address these limitations, Ava introduces the concept of transformers. Transformers are a type of neural network that does not rely on recurrence. Instead, they use attention to focus on the most important parts of the input data. Ava explains that transformers have been shown to be very effective for a variety of sequential modeling tasks, including machine translation and text generation. In the last part of the lecture, Ava discusses the applications of transformers in various fields, such as biology, medicine, and computer vision. She concludes the lecture by summarizing the key points and encouraging the audience to ask questions. (A rough NumPy sketch of the RNN recurrence described here appears after the comments below.)
  • As a CS student from the University of Tehran, you guys don't have any idea how helpful such content can be, and the fact that all of this is free makes it really amazing. Really appreciate it, Alexander and Ava. Best hopes.
  • @lazydart4117
    Watching these MIT courses alongside the courses at my uni in Poland, so grateful to be able to experience such high-quality education
  • @xvaruunx
    Best end to the lecture: “Thank you for your attention.” ❤😂
  • One of the best lectures I have seen on Sequence Models, with crystal clear explanations! :)
  • @sorover111
    ty to MIT for giving back a little in an impactful way
  • @gidi1899
    This is my favorite subject :) (what follows is a self-clarification of words that may feel exaggerated) 4:08 - binary classification or filtering is a sequence of steps:
    - new recording
    - retrieval of a constant record
    - compare the new and constant records
    - express a property of the comparison process
    So, sequencing really is a property of maybe all systems, while "wave sequencing" is built on top of a Sequencer System that repeatedly uses the "same actions" per sequence element.
  • @MrPejotah
    These are some spectacular lessons. Thank you very much for making this available.
  • @hamza-325
    I watched and read a lot of content about Transformers and never understood what those three Q, K, and V vectors were doing, so I couldn't understand how attention works, until today, when I watched this lecture with its analogy of YouTube search and the Iron Man picture. Now it has become much, much clearer! Thanks for the brilliant analogies! (See the attention sketch after the comments below.)
  • Your explanation of attention took me 2 revisits to this video to truly understand! But now that I have, my love for deep learning has grown stronger :)
  • Extremely informative, well structured and paced. A pleasure to watch and follow. Thank you.
  • This is what we need in this day and age; the teaching is amazing and can be understood by people at all levels. Nice work and thanks for this course.
  • @roy11883
    Truly commendable how this lecture has been structured and how a difficult topic like self-attention has been lucidly explained. Thanks to the instructors, really appreciated.
  • @ViniciusVA1
    This is incredible! Thanks a lot for this video, it’s going to help me a lot in my undergrad research :)
  • Thank you for this amazing content! There are many concepts discussed intuitively!
  • @Djellowman
    She absolutely killed it. Amazing lecture(r)!
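
RNN recurrence sketch (referenced from the Gemini summary above). This is a rough illustration only, not the lecture's lab code: all variable names, dimensions, and weights below are made-up assumptions used to show how a vanilla RNN carries a hidden state from one time step to the next.

import numpy as np

def rnn_forward(inputs, h0, W_xh, W_hh, W_hy, b_h, b_y):
    # Run a simple (vanilla) RNN over a sequence of input vectors.
    # inputs : list of arrays of shape (input_dim,)
    # h0     : initial hidden state, shape (hidden_dim,)
    # Returns the per-step outputs and the final hidden state.
    h = h0
    outputs = []
    for x in inputs:
        # The hidden state mixes the current input with the previous state,
        # so information from earlier steps can influence later predictions.
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        # An output is produced at every time step from the hidden state.
        outputs.append(W_hy @ h + b_y)
    return outputs, h

# Tiny usage example with random (untrained) weights.
rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 3, 4, 2
seq = [rng.normal(size=input_dim) for _ in range(5)]
outs, h_final = rnn_forward(
    seq,
    h0=np.zeros(hidden_dim),
    W_xh=rng.normal(size=(hidden_dim, input_dim)),
    W_hh=rng.normal(size=(hidden_dim, hidden_dim)),
    W_hy=rng.normal(size=(output_dim, hidden_dim)),
    b_h=np.zeros(hidden_dim),
    b_y=np.zeros(output_dim),
)

Repeated multiplication by W_hh during backpropagation through time is what drives the vanishing and exploding gradients mentioned in the summary, which LSTMs and, later, attention-based models were designed to mitigate.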
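
Attention sketch (referenced from the Q, K, V comment above). A minimal NumPy sketch of scaled dot-product attention following the search analogy from the lecture: queries are compared against keys, and the resulting weights select a mix of the values. Shapes and names are illustrative assumptions, not the lecture's code.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Q: (n_queries, d), K: (n_keys, d), V: (n_keys, d_v)
    d = Q.shape[-1]
    # Score how well each query matches each key (like a search query
    # compared against video titles).
    scores = Q @ K.T / np.sqrt(d)
    # Convert the scores into weights that sum to 1 for each query.
    weights = softmax(scores, axis=-1)
    # Return a weighted mix of the values: the "content" that gets retrieved.
    return weights @ V, weights

rng = np.random.default_rng(1)
Q = rng.normal(size=(2, 8))   # 2 queries
K = rng.normal(size=(5, 8))   # 5 keys
V = rng.normal(size=(5, 16))  # 5 values
out, w = attention(Q, K, V)
print(out.shape, w.shape)     # (2, 16) (2, 5)

In a Transformer, Q, K, and V are produced by learned linear layers applied to the same input sequence (self-attention), and several such attention heads run in parallel; the sketch above only shows the core weighting step.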