AI beats multiple World Records in Trackmania

Published 2024-03-13
I trained an AI in Trackmania with reinforcement learning, and made it compete against human World Records on 3 different pipe tracks.

Between research, programming and editing, these videos take a long time to produce. Any support on Patreon will help me spend more time on them in the future :)
• Patreon : www.patreon.com/Yoshtm

Contact
• Discord: yosh_tm
• Twitter: yoshtm1
• Mail: [email protected]

The maps shown in this video can be downloaded on TMX and played in Trackmania Nations Forever:
• 1) One Hella Long Pipe (It requires TMUnlimiter!) - tmnf.exchange/trackshow/8484272
• 2) Calm Down - tmnf.exchange/trackshow/1293088
• 3) Are You Serious ?! - tmnf.exchange/trackshow/5152869

Wirtual made a nice video about his world record on the track Calm Down, don't hesitate to watch it:    • How I Beat Trackmania's Hardest Patie...  
More generally, if you want to learn more about this game, check out his streams and videos, he makes fantastic content!
• twitch.tv/wirtual
youtube.com/@Wirtual

You can find a list of the music I used at the end of the video. Special thanks to Beik Poel for allowing me to use their song En aften ved svanefossen :    • En aften ved svanefossen  

Thanks to Donadigo for TMInterface!
donadigo.com/tminterface/

All Trackmania Nations Forever tricks TIERLIST – Fliks.
   • All Trackmania Nations Forever tricks...  

Into The Breach vs. Karmine Corp | Semifinal 2 | World Championship 2023 – Trackmania World Tour
   • Into The Breach vs. Karmine Corp | Se...  

All Comments (21)
  • @yoshtm
    Thanks for watching! Some additional details not explained in the video, which might help to better understand the irregularities observed:
    - I didn't show all the AI inputs in the video to keep things simple. In reality it has access to more information, such as x, y, z velocity, velocity rates, roll-pitch-yaw rates, etc. But maybe it's still missing some crucial information, it's hard to know.
    - The irregularities observed are not due to hardware or framerate issues. Everything shown in the video is made with a tool called TMInterface, which allows me to inject action commands into the game at precise timings in a way that is 100% repeatable. The same sequence of actions on the same map will always lead to the same outcome, even on a different computer. It's completely deterministic, and it's used a lot for Tool-Assisted Speedruns (TAS); you can find many examples of that on YouTube.
    I have a bit of extra footage that I couldn't fit into this long video. I plan to post some of this extra content on my Patreon in the next weeks. Any support there is a great motivation to keep making these videos :) patreon.com/Yoshtm
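    For readers wondering what such an expanded input might look like, here is a minimal sketch of an observation vector containing the quantities mentioned above. It is not the author's actual code: the `state` object, its field names, and the `lidar_distances` sensor array are all hypothetical placeholders.

    ```python
    import numpy as np

    def build_observation(state, lidar_distances):
        """Concatenate car state and track sensors into one flat vector.

        `state` is a hypothetical object holding the quantities mentioned in
        the comment (position, x/y/z velocity, velocity rates, roll/pitch/yaw
        and their rates); `lidar_distances` stands in for the distance
        sensors displayed on screen in the video.
        """
        return np.concatenate([
            np.asarray(state.position, dtype=np.float32),       # x, y, z
            np.asarray(state.velocity, dtype=np.float32),       # vx, vy, vz
            np.asarray(state.velocity_rate, dtype=np.float32),  # acceleration estimate
            np.asarray(state.orientation, dtype=np.float32),    # roll, pitch, yaw
            np.asarray(state.angular_rate, dtype=np.float32),   # roll/pitch/yaw rates
            np.asarray(lidar_distances, dtype=np.float32),      # track geometry sensors
        ])
    ```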
  • @tylerlarsen1842
    I came here to see AI destroy humans on Trackmania, but I got a 37 minute essay on how pipes in Trackmania are an exercise in chaos theory instead. 10/10
  • @night0x453
    PhD student in deep RL here. The behavior at the end seems mathematically chaotic, which might lead you to conclude that you cannot predict the behavior of the car deterministically. However, that does not mean you can't improve the performance by a long shot with a few simple tricks.
    You are doing model-free reinforcement learning, which basically means that you don't need to predict exactly what is going to happen (which is extremely hard here); you just need to figure out which actions are best in a given situation. In most environments this is actually much easier to do, and it is a reason why learning an accurate model is often harder than just straight up optimizing for performance with model-free methods.
    The second problem is that RL assumes you are in a Markov Decision Process (MDP), i.e. that you have full observability of the world, i.e. that the probability of taking the next action ONLY depends on the current information you have and not on the past, i.e. that adding information about the past does NOT help you make better decisions. HOWEVER, you are NOT in an MDP here, since you lack critical information about the state to take decisions, as many pointed out.
    In practice, here is what you can do that will most certainly improve performance:
    - Either manually add all the missing info (speed, rotations, ...), or more simply create a concatenated vector of the previous N states and inputs and use that as the input to your NN (an LSTM/RNN works too, but is more complex and often not necessary when dealing with a few steps of history).
    - Use more layers (a deeper network can learn a more complex function).
    - Sticky/random actions: manually introduce randomness into the agent's actions, for example by repeating the past action with a small probability. Do NOT decrease this probability over time. The agent then has to learn the whole time in a now stochastic (i.e. random) environment and has no choice but to cope with the randomness you added, making it far more robust than a deterministic agent. This is a common flaw of fully deterministic agents in this kind of environment, by the way.
    - Look at extensions like maximum entropy RL (Soft Actor-Critic, for instance), where you maximize both reward and the entropy of the policy. Some papers prove that it makes the learned policy more robust to out-of-distribution perturbations, i.e. perturbations that were never seen during training. In your case, this will help the agent learn "recovering behaviors" to recover from deviations caused by physics bugs, randomness, or whatever else. Maximum entropy RL will also help a lot with exploration, which might help the AI be more creative in finding solutions.
    - Look at other tricks that improve performance "for free". I don't know how much you have implemented already, but a basic start is looking at Rainbow DQN, PPO, etc. But you are probably already using those..?
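    As a rough illustration of two of these suggestions (frame stacking and sticky actions), here is a minimal, hypothetical sketch; the class, parameter values, and interface are assumptions for illustration, not the video's actual training setup.

    ```python
    import random
    from collections import deque

    import numpy as np


    class FrameStackWrapper:
        """Concatenates the last N observations and actions into one input vector."""

        def __init__(self, n_history, act_dim):
            self.history = deque(maxlen=n_history)
            self.act_dim = act_dim

        def reset(self, first_obs):
            # Pad history with copies of the first observation and zero actions.
            self.history.clear()
            for _ in range(self.history.maxlen):
                self.history.append(
                    (np.asarray(first_obs, dtype=np.float32),
                     np.zeros(self.act_dim, dtype=np.float32))
                )
            return self.stacked()

        def step(self, obs, last_action):
            self.history.append(
                (np.asarray(obs, dtype=np.float32),
                 np.asarray(last_action, dtype=np.float32))
            )
            return self.stacked()

        def stacked(self):
            # Flat vector of the N most recent (observation, action) pairs.
            return np.concatenate([np.concatenate([o, a]) for o, a in self.history])


    def sticky_action(proposed_action, previous_action, p_sticky=0.1):
        """With probability p_sticky, repeat the previous action instead of the new one.

        Keeping p_sticky constant (not annealed) forces the policy to stay
        robust to this injected randomness for the whole training run.
        """
        if previous_action is not None and random.random() < p_sticky:
            return previous_action
        return proposed_action
    ```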
  • @no-no-noku
    The editing, the music choice, the compactness of the information while still remaining coherent, the pacing. This video is a masterpiece! Even small things like using the butterfly-shaped pinhole transition when bringing up that topic for the first time, or how the hue of the cars matches up with what generation they're from, so many little details that I could write up my own essay detailing every little thing you put extra effort into. I was enraptured the whole time. Bravo!
  • @MinecraftSubi
    I love your video style and how you're displaying the learning process. Thanks for the great video!
  • @TheWizardsOfOz
    Waiting for all the Trackmania YouTubers to react to this.
  • @bettaTM
    So you’re saying my world record fails are actually the butterfly effect’s fault…
  • @Arturius1987
    That ending sequence was an absolute joy. Perfect syncing with the music.
  • @tomepedro2560
    This is one of the best videos I've ever seen. Amazing work!
  • @sreynoldshaertle
    Roboticist here. Some regions of configuration space are more chaotic than others; the demo where you perturbed a car that was about to fall off showed a smooth increase in success rate as you moved the perturbation further away from the point at which outcomes diverged. That suggests the failure lay in a fairly narrow chaotic region, and moving out of that region made the system more stable and enabled more consistent success. This is what unnamed did to complete the jumps track: found a strategy that consistently avoided highly divergent regions of configuration space.
    One problem is that, if the display of the inputs is representative, the AI doesn't have the right sensors. Specifically, the AI has its position and speed but not its _velocity_; it doesn't know how much of its speedo reading is down the track vs. across the track vs. away from the track. More critically, it also doesn't have roll-pitch-yaw rates: it only knows its current orientation, not the rate at which its orientation is changing. I suspect the car also needs to know the difference between a simple pipe elbow, a tee, and a cross; even if the extra geometry isn't in the direction of travel, it will almost certainly still interact.
    Finally, consider increasing the numerical precision of your network. This is one of those cases where running floats vs. doubles might actually matter.
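    To make the velocity point concrete, here is a small hypothetical sketch that rotates the world-frame velocity into the car's body frame, splitting the speedometer reading into forward / lateral / vertical components. The ZYX Euler-angle convention and the function names are assumptions; Trackmania's internal axis conventions may differ.

    ```python
    import numpy as np


    def euler_to_matrix(yaw, pitch, roll):
        """Body-to-world rotation matrix from ZYX (yaw-pitch-roll) Euler angles."""
        cy, sy = np.cos(yaw), np.sin(yaw)
        cp, sp = np.cos(pitch), np.sin(pitch)
        cr, sr = np.cos(roll), np.sin(roll)
        rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])   # yaw about z
        ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])   # pitch about y
        rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])   # roll about x
        return rz @ ry @ rx


    def body_frame_velocity(v_world, yaw, pitch, roll):
        """Express the world-frame velocity in the car's own frame.

        The result separates the speedometer reading into forward, lateral
        and vertical components, which is the information the comment says
        the agent is missing when it only sees a scalar speed.
        """
        r_body_to_world = euler_to_matrix(yaw, pitch, roll)
        return r_body_to_world.T @ np.asarray(v_world, dtype=np.float64)
    ```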
  • @yuxufgVOD
    The song that played when you showed Wirtual's WR was just peak video making lol
  • @Hurricane6220
    What an insanely good video! I haven't even played Trackmania in many years... but the way you explain the details and visualize everything just makes this soooo interesting! 😌👏👏👏
  • @kirdow
    I'm not a TrackMania player but I love watching TrackMania videos. I also love game AI vids, and this video really did it for me. Being fascinated by chaos theory I've been screaming at my screen throughout the whole video "bro it's chaos theory", and then you even mention it yourself. Top tier content, it's what I'm here for. You just gained a new subscriber 😄
  • @xardenas2636
    28:50 “Now let’s have the AI drive upside down” had me rolling. For a hot second I really thought you were about to blow my mind once again 🤣
  • @MagicWazam
    28:40 was a comedic masterpiece, the editing, the pacing, incredible!
  • @gui6987
    Incredible video, one of the best I've seen in a long time, thank you!
  • @AlmightyRawks
    Apart from the excellent research, I'd just like to comment on how amazing the editing was for this! That kept it interesting AND funny!
  • @qweytr9964
    Your analysis about chaos theory is exactly what is going on, and TrackMania's chaotic physics are very familiar to me from another TrackMania project. The best way to control that chaos is by going slower, as the deviations caused by the bugs are less severe at lower speeds. Perhaps giving less reward for speed and more reward for having a high chance of getting far would help with consistency.
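    A minimal sketch of the reward shaping this comment suggests, assuming a hypothetical per-step reward with access to the distance progressed and a crash flag; the weights and function name are placeholders, not the reward actually used in the video.

    ```python
    def shaped_reward(progress_m, speed_kmh, crashed,
                      w_progress=1.0, w_speed=0.05, crash_penalty=50.0):
        """Per-step reward favoring reliable progress over raw speed.

        progress_m: distance advanced along the track this step (meters).
        speed_kmh:  current speedometer reading.
        crashed:    True if the car fell off / ended the run this step.
        Shifting weight from w_speed to w_progress, and penalizing crashes,
        trades top speed for consistency in reaching the finish.
        """
        reward = w_progress * progress_m + w_speed * speed_kmh
        if crashed:
            reward -= crash_penalty
        return reward
    ```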
  • Yosh: "I am going to lock your brakes and acceleration so you can't slow down" AI: "Pfft, check this out"
  • This video (and all of your other videos) is a true masterpiece.