AI Builds in Creative Mode | Mindcraft

86,499
Published 2024-05-26
In this video, I use AI agents powered by different large language models to build various things in Minecraft. It is a test of their ability to code, create, follow instructions, and problem solve. They blow up some TNT and build ruins in a false world. I test #gpt4o #gemini #llama #claude on #minecraft
Part one of this vid: AI Builds Stuff in Minecraft | Mindcraft

𒃴 𒅌

Support me on Patreon: www.patreon.com/emergentgarden
Code base: github.com/kolbytn/mindcraft
Discord: discord.gg/ZsrAAByEnr
My twitter: twitter.com/max_romana
Kolby's twitter (project owner): twitter.com/kolbytn

Timestamps
(0:00) The Great Pyramid of Andy
(1:13) Meet the Models
(2:51) Roman Columns
(5:11) Desert Castle
(8:48) Redstone
(11:38) Nether Portal


𒆨

All Comments (21)
  • @yo_gab
    9:33 the way Llama kept trying to flip the switch as if it’ll make the lamp light up somehow 😂
  • @RichConnerGMN
    10:28 that "does anyone know where i can find some" makes me really want to see these ais try to do something together. like just set them loose in survival mode and see what happens
  • @ataarono
    @Emergent Garden Recommendation from me: your prompts have no leverage. What I mean is that the LLM does not handle complex building tasks well because it's limited by the single-shot answer it needs to generate. Your template for "NewAction" is a great idea; my idea to improve its leverage is to add another template, "NewActionPlan", which it then fills with a list of generated prompts that are fed back into itself one after another (kind of like writing a to-do list before getting started). My vision for it was something like this: you whisper "Build a bridge for me"; "Okay, let's plan this out" (used newActionPlan); "Okay, let's see what's first on the to-do list..." (used actionPlan[0]); "Sure, I will build the supporting pillars" (used newAction); ...etc. Getting a shared reference point for superimposed building actions is of course something to consider. Using plans recursively might also be interesting, like making a plan for planning multiple plans for even more abstracted tasks. Some way of sensing the world is possible too: maybe you can let it take screenshots of the game and feed the image into one of the multimodal models capable of image recognition.
  • @MintBiscuit
    the aliens gave egyptians creative mode? I see!
  • @Rasteriser
    I don’t know exactly how your system works but have you tried letting them use something like mathematical curves for building? Like vectors at positions pointing to positions with some formulas on top if required? Another thing you could do to help them out is allow them to write classes per object in a build. I think this would be great for things like columns because they then realise there would be spatial rules like spacing.
  • @andyrawrz
    the future of gaming looks amazing, imagine having multiple ai bots that help you in your world all dwarf fortress style base building
  • @IAmThisFact
    It is still interesting to see these LLMs do their best at understanding how to build in Minecraft. I wonder if more of them will ever get image scanning abilities; you could let them take pictures of builds or the environment so they can see what they built and auto-correct.
  • @MinerBat
    i think it would be cool if you let every iteration build a skyscraper and add them all to a single city which will then grow with skyscrapers that are slowly getting better so you can see the improvement in one place
  • @_BangDroid_
    Doing this without computer vision is interesting and really makes me appreciate how incredibly complex the human brain is to be able to do so much in real time. Imagine the resources needed to give a multimodal model with vision/language/action the ability to play in real time, the power requirements, where we can just eat for energy
  • @bobblebardsley
    4:08 In Llama's defence, I can see how those could be described as one-block columns spaced 'one block apart' as requested at 2:48, it's just included the column itself in the measurement of 'spaced'.
  • @phil-jc8hp
    Gemini 1.5 has been generally available via Vertex AI for a couple of days. You can also create an API key via AI Studio; it's not only their chatbot interface, and it's a little easier to create an account.
  • @itissatno
    this is SO cool! gpt4 going to the nether had me in awe
  • @IceTank
    Yeah, mineflayer-pathfinder definitely needs some improvements, especially in the scaffolding department. Maybe I can get myself to work on it some more. This is actually not the first time people have tried to use general AI with mineflayer. There was also a French Microsoft team that did the same before with GPT. I think having the agents write the code has huge potential, especially if the models were trained on the existing mineflayer code.
  • @ViceZone
    The fact that they are imperfect makes it more amazing.
  • @RikkTheGaijin
    Is it possible to use GPT-4o's vision capabilities to let it "see" what it is doing? That could significantly improve the quality.
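
The plan-then-act idea from @ataarono's comment can be sketched roughly as below. This is only an illustration of the suggestion, not how Mindcraft works: `new_action_plan`, `run_plan`, and `fake_model` are hypothetical names, and the "model" is a canned stand-in rather than a real LLM call.

```python
from typing import Callable, List

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
Model = Callable[[str], str]


def new_action_plan(model: Model, request: str) -> List[str]:
    """Ask the model for a to-do list, one step per line."""
    reply = model(f"Break this task into short build steps, one per line:\n{request}")
    # Drop empty lines and leading "- " bullets from the reply.
    return [line.strip("- ").strip() for line in reply.splitlines() if line.strip()]


def run_plan(model: Model, request: str) -> List[str]:
    """Feed each planned step back to the model as its own prompt."""
    results = []
    for step in new_action_plan(model, request):
        results.append(model(f"Overall task: {request}\nCurrent step: {step}"))
    return results


def fake_model(prompt: str) -> str:
    """Canned replies so the sketch runs without any API (assumption)."""
    if prompt.startswith("Break this task"):
        return "- build the supporting pillars\n- lay the deck\n- add railings"
    step = prompt.rsplit("Current step: ", 1)[1]
    return f"newAction({step!r})"


if __name__ == "__main__":
    for result in run_plan(fake_model, "Build a bridge for me"):
        print(result)
```

Each step still carries the overall task in its prompt, which is one simple way to give the separate actions a shared reference point; a real version would also need shared coordinates for the build.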