AI Builds in Creative Mode | Mindcraft

Published 2024-05-26
In this video, I use AI agents powered by different large language models to build various things in Minecraft. It is a test of their ability to code, create, follow instructions, and solve problems. They blow up some TNT and build ruins in a false world. I test #gpt o #gemini #llama #claude on #minecraft
Part one of this vid: AI Builds Stuff in Minecraft | Mindcraft

𒃴 𒅌

Support me on Patreon: www.patreon.com/emergentgarden
Code base: github.com/kolbytn/mindcraft
Discord: discord.gg/ZsrAAByEnr
My twitter: twitter.com/max_romana
Kolby's twitter (project owner): twitter.com/kolbytn

Timestamps
(0:00) The Great Pyramid of Andy
(1:13) Meet the Models
(2:51) Roman Columns
(5:11) Desert Castle
(8:48) Redstone
(11:38) Nether Portal


𒆨

All Comments (21)
  • @MintBiscuit
    the aliens gave egyptians creative mode? I see!
  • @yo_gab
    9:33 the way LLama kept trying to flip the switch as if it’ll make the lamp light up somehow 😂
  • @RichConnerGMN
    10:28 that "does anyone know where i can find some" makes me really want to see these ais try to do something together. like just set them loose in survival mode and see what happens
  • @andyrawrz
    the future of gaming looks amazing, imagine having multiple ai bots that help you in your world all dwarf fortress style base building
  • @IAmThisFact
    It is still interesting to see these LLMs do their best at understanding how to build in Minecraft. I wonder, if more of them ever get image scanning abilities, could you let them take pictures of builds or the environment so they can see what they built and auto-correct?
  • @ataarono
    @Emergent Garden Recommendation from me: Your prompts have no leverage. What I mean is that the LLM does not handle complex building tasks well because it's limited by the single-shot answer it needs to generate. Your template for "NewAction" is a great idea. My idea to improve its leverage is to add another template, "NewActionPlan", which it then fills with a list of generated prompts that will be fed back into itself one after another (kind of like writing a todo list before getting started). My vision for it was kind of like this:
    - You whisper "Build a bridge for me"
    - "Okay, let's plan this out" used newActionPlan
    - "Okay, let's see what's first on the todo list..." used actionPlan[0]
    - "Sure, I will build the supporting pillars" used newAction
    - ...etc
    Getting a shared reference point for superimposed building actions is of course something to consider. Using plans recursively might also be interesting, like making a plan for planning multiple plans for even more abstracted tasks. Some way of sensing the world is possible: maybe you can let it take screenshots of the game and feed the image into some of the multimodal models capable of image recognition.
  • @MinerBat
    i think it would be cool if you let every iteration build a skyscraper and add them all to a single city which will then grow with skyscrapers that are slowly getting better so you can see the improvement in one place
  • @Rasteriser
    I don’t know exactly how your system works but have you tried letting them use something like mathematical curves for building? Like vectors at positions pointing to positions with some formulas on top if required? Another thing you could do to help them out is allow them to write classes per object in a build. I think this would be great for things like columns because they then realise there would be spatial rules like spacing.
  • @itissatno
    this is SO cool! gpt4 going to the nether had me in awe
  • @bobblebardsley
    4:08 In Llama's defence, I can see how those could be described as one-block columns spaced 'one block apart' as requested at 2:48, it's just included the column itself in the measurement of 'spaced'.
  • @phil-jc8hp
    Gemini 1.5 has been generally available via Vertex AI for a couple of days. You can also create an API key via AI Studio; it's not only their chatbot interface, and it's a little easier to create an account.
  • @_BangDroid_
    Doing this without computer vision is interesting and really makes me appreciate how incredibly complex the human brain is to be able to do so much in real time. Imagine the resources needed to give a multimodal model with vision/language/action the ability to play in real time, the power requirements, where we can just eat for energy
  • @ViceZone
    The fact that they are imperfect makes it more amazing.
  • @sootguy
    LLAMA: dahhh lets build a solid block house☝️🥴
  • @dragonuh7915
    you know we are doomed when gpt doesn't know how to do an OR gate but loves TNT
  • @luco9155
    Wouldn't it be interesting to have naturally generated structures in the world built by AI? Instead of finding the same structures over and over, you could find useless, not so good looking and simple but surely enigmatic structures, giving Minecraft back its old feeling of mystery and the feeling of seeing things for the first time, like in the old days.
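
A rough sketch of the plan-then-act loop that @ataarono describes above, written in Python purely for illustration. It is not taken from the Mindcraft code base: `run_with_plan`, `fake_plan`, and `fake_act` are hypothetical names, and the two callables stand in for whatever LLM calls a real agent would make.

```python
from typing import Callable, List


def run_with_plan(
    goal: str,
    plan_fn: Callable[[str], List[str]],
    act_fn: Callable[[str], str],
) -> List[str]:
    """Ask for a todo list first, then feed each step back one at a time."""
    steps = plan_fn(goal)  # the "newActionPlan" idea: one call that returns a list of sub-prompts
    results = []
    for index, step in enumerate(steps):
        # Each sub-prompt carries the overall goal so every step shares one reference point.
        prompt = f"Step {index + 1} of {len(steps)} for '{goal}': {step}"
        results.append(act_fn(prompt))  # the "newAction" idea: one call per step
    return results


# Hypothetical stand-ins for model calls, so the sketch runs without any API.
def fake_plan(goal: str) -> List[str]:
    return ["build the supporting pillars", "lay the deck", "add railings"]


def fake_act(prompt: str) -> str:
    return f"done: {prompt}"


if __name__ == "__main__":
    for line in run_with_plan("Build a bridge for me", fake_plan, fake_act):
        print(line)
```

Passing the planner and the actor in as plain functions keeps the loop independent of any particular model, and the recursive planning from the same comment could be tried by letting `act_fn` call `run_with_plan` again for a step that is still too large.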