Minecraftify
Minecraftify converts any image into Minecraft style, live!
What would your photos look like if they were built entirely out of Minecraft blocks?
That question led to the creation of Minecraftify, a Hugging Face Space that transforms ordinary images into faithful Minecraft-style recreations while preserving the original scene's structure, layout, and composition.
Unlike simple style-transfer approaches, Minecraftify attempts to maintain the identity of the scene while converting textures, materials, objects, and geometry into something that feels like it belongs inside vanilla Minecraft.
The project is powered by a fine-tuned FLUX.2-Klein-4B image-to-image model and a custom LoRA trained specifically for Minecraft scene conversion.
Space: Minecraftify
Upload an image or use your webcam and instantly see your world transformed into Minecraft.
Features include:
Many image stylization models dramatically alter a scene, changing camera angles, objects, or compositions.
Minecraftify was designed with a different objective:
The result should feel like the original image was rebuilt inside Minecraft rather than completely reimagined.
Input:
Output:
The ideal output remains immediately recognizable while clearly belonging to the Minecraft aesthetic.
Training data is often the most important part of a fine-tuning project.
For Minecraftify, I created a paired image dataset consisting of approximately 400 image pairs.
Each sample contains:
The Minecraft versions were generated using Qwen-Edit-25-12, which made it possible to build a paired dataset suitable for image-to-image training.
Dataset structure:
source_image
edited_image
prompt_used
Where:
source_image = original image
edited_image = Minecraft-style target
prompt_used = caption used during generation
A major design goal was keeping the project lightweight.
Instead of training or serving a very large model, Minecraftify uses:
FLUX.2-Klein-4B
This smaller FLUX model provides:
The final deployment combines:
FLUX.2-Klein-4B
+
Minecraft LoRA
=
Minecraftify
This allows the application to remain efficient while producing high-quality edits.
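The base-plus-adapter combination above can be sketched with Diffusers. This is a minimal sketch, not the Space's actual loading code: it assumes the generic `DiffusionPipeline.from_pretrained` resolves the right pipeline class for the base repository, that the installed Diffusers version supports LoRA loading for it, and that `AnimeOverlord/flux2-klein-4b-mc-v2` (listed at the end of this post) is the Minecraft LoRA. The `loader` parameter is a hypothetical hook added only so the function can be exercised without downloading weights.

```python
"""Sketch: attach a style LoRA to a base image-editing pipeline."""

BASE_MODEL_ID = "black-forest-labs/FLUX.2-klein-4B"
LORA_ID = "AnimeOverlord/flux2-klein-4b-mc-v2"  # assumed to be the Minecraft LoRA


def _default_loader(repo_id):
    # Imported lazily so this module can be imported without the GPU stack.
    import torch
    from diffusers import DiffusionPipeline

    return DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16)


def load_minecraftify(base_id=BASE_MODEL_ID, lora_id=LORA_ID, loader=_default_loader):
    """Load the base pipeline, then layer the LoRA adapter on top of it."""
    pipe = loader(base_id)
    pipe.load_lora_weights(lora_id)
    return pipe
```

Keeping the adapter separate from the base weights means the same base checkpoint can be reused while only the small LoRA file changes between versions.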
The model was trained using the Hugging Face Diffusers DreamBooth LoRA workflow adapted for image-to-image training.
Key training settings:
| Parameter | Value |
|---|---|
| Batch Size | 1 |
| Gradient Accumulation | 4 |
| Learning Rate | 1 |
| Precision | bf16 |
| Rank | 64 |
| Training Steps | 1200 |
| Optimizer | Prodigy |
| Warmup Steps | 200 |
Additional optimizations:
These settings allowed training on a relatively small dataset while maintaining scene fidelity.
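To put the table in perspective, the arithmetic below works out the effective batch size and roughly how many passes over the data the run makes. It assumes that "Training Steps" counts optimizer updates (as in the Diffusers training scripts) and treats the dataset as exactly 400 pairs, which the post only gives as an approximation.

```python
# Values taken from the training settings table.
BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 4
TRAIN_STEPS = 1200
WARMUP_STEPS = 200
DATASET_PAIRS = 400  # "approximately 400" in the post, treated as exact here

# Gradients from several micro-batches are summed before each update.
effective_batch = BATCH_SIZE * GRAD_ACCUM_STEPS

# Total image pairs consumed over the whole run.
samples_seen = TRAIN_STEPS * effective_batch

# Approximate number of passes over the dataset.
approx_epochs = samples_seen / DATASET_PAIRS

# Share of the run spent in learning-rate warmup.
warmup_fraction = WARMUP_STEPS / TRAIN_STEPS

print(f"effective batch size: {effective_batch}")   # 4
print(f"samples seen:         {samples_seen}")      # 4800
print(f"approx. epochs:       {approx_epochs:.1f}") # 12.0
print(f"warmup share:         {warmup_fraction:.1%}")  # 16.7%
```

Under those assumptions each pair is seen about 12 times. The learning rate of 1 is consistent with how Prodigy is usually configured, since the optimizer estimates the step size itself and treats the learning rate as a multiplier.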
The processing pipeline is intentionally simple:
Input Image
↓
FLUX.2-Klein Image-to-Image
↓
Minecraft LoRA Adapter
↓
Minecraftified Output
For webcam mode:
Camera Frame
↓
Latest Frame Buffer
↓
Model Inference
↓
Minecraft Output
Only the most recent frame is processed to keep latency manageable.
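One way to implement a "latest frame" buffer is a single-slot container in which every new frame overwrites the previous one. The class below is a minimal sketch of that idea; the name `LatestFrameBuffer` and its methods are hypothetical and not taken from the Space's code.

```python
import threading


class LatestFrameBuffer:
    """Single-slot buffer: a new frame overwrites any frame not yet processed."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self._dropped = 0

    def put(self, frame):
        """Called by the camera side for every incoming frame."""
        with self._lock:
            if self._frame is not None:
                self._dropped += 1  # the stale frame is never processed
            self._frame = frame

    def take(self):
        """Return the newest frame and clear the slot, or None if it is empty."""
        with self._lock:
            frame, self._frame = self._frame, None
            return frame

    @property
    def dropped(self):
        with self._lock:
            return self._dropped


buffer = LatestFrameBuffer()
for frame_id in range(5):  # the camera delivers 5 frames while the model is busy
    buffer.put(frame_id)

print(buffer.take())    # 4, only the newest frame reaches the model
print(buffer.take())    # None, nothing new has arrived since
print(buffer.dropped)   # 4, frames 0 to 3 were skipped
```

Dropping stale frames bounds latency to roughly one inference pass, at the cost of a lower output frame rate than the camera provides.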
Minecraftify is deployed as a Gradio Space.
The Space includes:
To avoid repeatedly downloading models after restarts, persistent storage is used:
/data/models
/data/.huggingface
This significantly improves startup times and reduces bandwidth usage.
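A minimal sketch of how these paths might be wired up is shown below. `HF_HOME` is the standard Hugging Face cache variable; using `/data/models` as a separate directory for model files is an assumption about the Space, and the helper names are hypothetical.

```python
import os
from pathlib import PurePosixPath

PERSISTENT_ROOT = PurePosixPath("/data")  # persistent storage mount point


def cache_env(root=PERSISTENT_ROOT):
    """Environment variables that point the Hugging Face cache at persistent storage."""
    return {"HF_HOME": str(root / ".huggingface")}


def configure_cache(root=PERSISTENT_ROOT, environ=os.environ):
    """Set cache variables (without overriding existing ones) and return the models dir.

    Call this before importing huggingface_hub or diffusers, because the cache
    location is read when those libraries are imported.
    """
    for key, value in cache_env(root).items():
        environ.setdefault(key, value)
    return str(root / "models")  # assumed location for downloaded model files


if __name__ == "__main__":
    models_dir = configure_cache()
    print(os.environ["HF_HOME"], models_dir)
```

Using `setdefault` keeps any value already configured in the Space settings, so the code only supplies a fallback.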
For the best balance between speed and quality:
| Setting | Value |
|---|---|
| Steps | 3 |
| Guidance Scale | 3.0 |
| Seed | Fixed |
| Input | Well-lit scenes |
These settings were chosen specifically for interactive usage within Hugging Face Spaces.
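The recommended values can be captured in a small settings object and passed to the pipeline call. This is a sketch under assumptions: `pipe` is a Diffusers image-editing pipeline whose call accepts `image`, `prompt`, `num_inference_steps`, `guidance_scale`, and `generator`; the seed value `0` is arbitrary, since the post only says the seed is fixed; and the prompt text is left to the caller because the exact wording used by the Space is not given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceSettings:
    steps: int = 3
    guidance_scale: float = 3.0
    seed: int = 0  # any fixed value works; 0 is an arbitrary choice

    def as_call_kwargs(self):
        """Translate the settings into common Diffusers call arguments."""
        return {
            "num_inference_steps": self.steps,
            "guidance_scale": self.guidance_scale,
        }


def minecraftify(pipe, image, prompt, settings=InferenceSettings()):
    """Run one image-to-image edit with a fixed seed for repeatable output."""
    import torch  # imported lazily so the settings can be used without torch

    generator = torch.Generator("cpu").manual_seed(settings.seed)
    result = pipe(
        image=image,
        prompt=prompt,
        generator=generator,
        **settings.as_call_kwargs(),
    )
    return result.images[0]
```

Fixing the seed makes consecutive webcam frames less likely to flicker between different interpretations of the same scene, which is one plausible reason for the recommendation.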
A few lessons emerged during development:
The 4B FLUX Klein model proved surprisingly capable when paired with a targeted LoRA.
Even with roughly 400 examples, careful pairing and consistent transformations produced useful results.
One of the biggest challenges was encouraging the model to change visual style without changing the scene itself.
Prompt design and image-to-image conditioning played a major role in achieving this balance.
Potential future directions include:
Space: Minecraftify
Base model: black-forest-labs/FLUX.2-klein-4B
LoRA: AnimeOverlord/flux2-klein-4b-mc-v2
Dataset: custom paired Minecraft image dataset (~400 samples)
Minecraftify began as a hackathon-style experiment with a simple question: could a lightweight image editing model convincingly rebuild the world using Minecraft blocks?
Despite being developed in a relatively short timeframe, the project demonstrates how a specialized visual transformation task can be achieved using a compact 4B-parameter model, LoRA fine-tuning, and a carefully curated image-to-image dataset.
The current version should be viewed as a strong first iteration rather than a finished product. Due to time constraints, many ideas and improvements remain unexplored, including larger training datasets, improved character and creature generation, better block consistency, real-time video support, texture-pack variations, and more advanced scene preservation techniques.
What excites me most is the project's potential. The underlying approach has already shown promising results with limited training data and development time, suggesting there is significant room for improvement as the dataset, training process, and inference pipeline continue to evolve.
Minecraftify is ultimately an exploration of how far small, specialized models can be pushed for creative visual tasks. This release is just the beginning, and future iterations will continue to expand its capabilities and bring generated scenes even closer to the experience of stepping into a real Minecraft world.