AI Builds in Creative Mode | Mindcraft

Published 2024-05-26

In this video, I use AI agents powered by different large language models to build various things in Minecraft. It is a test of their ability to code, create, follow instructions, and solve problems. They blow up some TNT and build ruins in a false world. I test #gpt o #gemini #llama #claude on #minecraft
Part one of this vid: AI Builds Stuff in Minecraft | Mindcraft

𒃴 𒅌

Support me on Patreon: www.patreon.com/emergentgarden
Code base: github.com/kolbytn/mindcraft
Discord: discord.gg/ZsrAAByEnr
My twitter: twitter.com/max_romana
Kolby's twitter (project owner): twitter.com/kolbytn

Timestamps
(0:00) The Great Pyramid of Andy
(1:13) Meet the Models
(2:51) Roman Columns
(5:11) Desert Castle
(8:48) Redstone
(11:38) Nether Portal

𒆨

All Comments (21)

@ChannelMiner 1 month ago

GPT4 going to the nether was hilarious
@yo_gab 1 month ago

9:33 the way Llama kept trying to flip the switch as if it’ll make the lamp light up somehow 😂
@ryaquaza3offical 1 month ago

“Ok go make a castle” Llama: “birthday cake got it”
@scoutantyfan 1 month ago

Gpt went like: U SAID COLUMNS, YOU GOT THEM.
@RichConnerGMN 1 month ago

10:28 that "does anyone know where i can find some" makes me really want to see these ais try to do something together. like just set them loose in survival mode and see what happens
@real_Clone_Gordon_Freeman 1 month ago

When are we getting their group speedrun attempt to beat the dragon?
@ataarono 1 month ago

@Emergent Garden Recommendation from me: Your prompts have no leverage. What I mean is that the LLM does not handle complex building tasks well because it's limited by the single-shot answer it needs to generate. Your template for "NewAction" is a great idea. My idea to improve its leverage is to add another template, "NewActionPlan", which it then fills with a list of generated prompts that are fed back into itself one after another (kind of like writing a todo list before getting started). My vision for it was kind of like this:
- You whisper "Build a bridge for me"
- "Okay, let's plan this out" used newActionPlan
- "Okay, let's see what's first on the todo list..." used actionPlan[0]
- "Sure, I will build the supporting pillars" used newAction
- ...etc
Getting a shared reference point for superimposed building actions is of course something to consider. Using plans recursively might also be interesting, like making a plan for planning multiple plans for even more abstracted tasks. Some way of sensing the world is possible: maybe you can let it take screenshots of the game and feed the image into some of the multimodal models capable of image recognition.
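The plan-then-act loop proposed in this comment can be sketched roughly as follows. This is a minimal Python illustration, not Mindcraft's actual code: `new_action_plan`, `new_action`, `run_plan`, and the `fake_llm` stub are hypothetical names, and a real agent would call a language model where the stub returns canned text.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a language model call: prompt in, text out.
LLM = Callable[[str], str]


def new_action_plan(llm: LLM, request: str) -> List[str]:
    """Ask the model for a todo list and return one step per line."""
    reply = llm(f"Write a numbered todo list for: {request}")
    steps = []
    for line in reply.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop a leading "1." or "2)" style prefix if present.
        head, sep, rest = line.partition(" ")
        if sep and head.rstrip(".)").isdigit():
            line = rest.strip()
        steps.append(line)
    return steps


def new_action(llm: LLM, step: str) -> str:
    """Ask the model to produce an action for a single step."""
    return llm(f"Write code to do: {step}")


def run_plan(llm: LLM, request: str) -> List[Tuple[str, str]]:
    """Plan first, then feed each step back to the model in order."""
    return [(step, new_action(llm, step))
            for step in new_action_plan(llm, request)]


def fake_llm(prompt: str) -> str:
    """Canned replies so the sketch runs without any model."""
    if prompt.startswith("Write a numbered todo list"):
        return ("1. Build the supporting pillars\n"
                "2. Lay the deck\n"
                "3. Add railings")
    return f"<action for: {prompt}>"


if __name__ == "__main__":
    for step, action in run_plan(fake_llm, "Build a bridge for me"):
        print(step, "->", action)
```

A recursive variant, as the comment suggests, would let a step expand into its own plan instead of a single action. The shared reference point the comment mentions (for example, a common origin coordinate passed to every step) is left out of this sketch.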
@MintBiscuit 1 month ago

the aliens gave egyptians creative mode? I see!
@Rasteriser 1 month ago

I don’t know exactly how your system works but have you tried letting them use something like mathematical curves for building? Like vectors at positions pointing to positions with some formulas on top if required? Another thing you could do to help them out is allow them to write classes per object in a build. I think this would be great for things like columns because they then realise there would be spatial rules like spacing.
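One way to picture the "class per object" idea from this comment is sketched below in Python. It is only an illustration under assumptions: `Column` and `Colonnade` are hypothetical classes, coordinates are plain `(x, y, z)` integer tuples, and nothing here reflects how Mindcraft actually places blocks.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

Block = Tuple[int, int, int]  # (x, y, z) block coordinates


@dataclass
class Column:
    """A square pillar; x and z mark its lowest corner."""
    x: int
    z: int
    base_y: int
    height: int
    width: int = 1

    def blocks(self) -> Iterator[Block]:
        """Yield every block position the column occupies."""
        for dx in range(self.width):
            for dz in range(self.width):
                for dy in range(self.height):
                    yield (self.x + dx, self.base_y + dy, self.z + dz)


@dataclass
class Colonnade:
    """A row of identical columns along the x axis with a fixed gap."""
    origin: Block
    count: int
    height: int
    width: int = 1
    gap: int = 1  # empty blocks between neighbouring columns

    def columns(self) -> List[Column]:
        x0, y0, z0 = self.origin
        # The spacing rule lives in one place: start-to-start distance.
        pitch = self.width + self.gap
        return [Column(x0 + i * pitch, z0, y0, self.height, self.width)
                for i in range(self.count)]


if __name__ == "__main__":
    row = Colonnade(origin=(0, 64, 0), count=3, height=4)
    print([c.x for c in row.columns()])  # [0, 2, 4]
```

Because the spacing rule is stated once in `Colonnade.columns`, changing `gap` or `width` moves every column consistently, which is the kind of spatial rule the comment hopes a model would notice.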
@andyrawrz 1 month ago

the future of gaming looks amazing, imagine having multiple ai bots that help you in your world all dwarf fortress style base building
@IAmThisFact 1 month ago

It is still interesting to see these LLMs do their best at understanding how to build in Minecraft. I wonder, if more of them ever get image-scanning abilities, could you let them take pictures of builds or the environment so they can see what they built and auto-correct?
@MinerBat 1 month ago

i think it would be cool if you let every iteration build a skyscraper and add them all to a single city which will then grow with skyscrapers that are slowly getting better so you can see the improvement in one place
@_BangDroid_ 1 month ago

Doing this without computer vision is interesting and really makes me appreciate how incredibly complex the human brain is to be able to do so much in real time. Imagine the resources needed to give a multimodal model with vision/language/action the ability to play in real time, the power requirements, where we can just eat for energy
@bobblebardsley 1 month ago

4:08 In Llama's defence, I can see how those could be described as one-block columns spaced 'one block apart' as requested at 2:48, it's just included the column itself in the measurement of 'spaced'.
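The ambiguity described in this comment can be made concrete with a small Python sketch. The function name `column_xs` and its parameters are hypothetical, and the sketch does not claim to reproduce what any model built in the video. It only shows how the same instruction yields two layouts depending on whether "spaced" counts the column itself.

```python
from typing import List


def column_xs(count: int, spacing: int, width: int = 1,
              spacing_includes_column: bool = False) -> List[int]:
    """Return the x position of each column along a row.

    spacing_includes_column=False: spacing is the empty gap between columns.
    spacing_includes_column=True: spacing is measured start to start,
    so the column's own width is part of the measurement.
    """
    pitch = spacing if spacing_includes_column else width + spacing
    return [i * pitch for i in range(count)]


if __name__ == "__main__":
    # "One-block columns spaced one block apart", read two ways.
    print(column_xs(3, 1))                                # [0, 2, 4]
    print(column_xs(3, 1, spacing_includes_column=True))  # [0, 1, 2]
```

Under the second reading, one-block columns end up touching, so stating the gap explicitly ("leave one empty block between columns") removes the ambiguity from the prompt.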
@phil-jc8hp 1 month ago

Gemini 1.5 has been generally available via Vertex AI for a couple of days. You can also create an API key via AI Studio; it's not only their chatbot interface, and it's a little easier to create an account.
@itissatno 1 month ago

this is SO cool! gpt4 going to the nether had me in awe
@and5177 1 month ago

Love this project!
@IceTank 1 month ago

Yeah, mineflayer-pathfinder definitely needs some improvements, especially in the scaffolding department. Maybe I can get myself to work on it some more. This is actually not the first time people have tried to use general AI with mineflayer. There was also a French Microsoft team that did the same before with GPT. I think having the agents write the code has huge potential, especially if the models were trained on the existing mineflayer code.
@ViceZone 1 month ago

The fact that they are imperfect makes it more amazing.
@RikkTheGaijin 1 month ago

Is it possible to use GPT-4o's vision capabilities to let it "see" what it is doing? That could significantly improve the quality.