Minecraftify: Turning Real Images into Minecraft Worlds with FLUX.2-Klein

Community Article Published June 15, 2026

Mohammed Taha Rafi Farooqui

MTahaRF

build-small-hackathon

kalam

AnimeOverlord

build-small-hackathon

Introduction

What would your photos look like if they were built entirely out of Minecraft blocks?

That question led to the creation of Minecraftify, a Hugging Face Space that transforms ordinary images into faithful Minecraft-style recreations while preserving the original scene's structure, layout, and composition.

Unlike simple style-transfer approaches, Minecraftify attempts to maintain the identity of the scene while converting textures, materials, objects, and geometry into something that feels like it belongs inside vanilla Minecraft.

The project is powered by FLUX.2-Klein-4B, a compact image-to-image model, fine-tuned with a custom LoRA trained specifically for Minecraft scene conversion.

Try It Yourself

Space: Minecraftify

Upload an image or use your webcam and instantly see your world transformed into Minecraft.

Features include:

Image upload
Webcam capture
Live processing mode
Adjustable inference settings
Side-by-side comparison
Downloadable results

Try it here: https://huggingface.co/spaces/build-small-hackathon/Minecraftify

The Goal

Many image stylization models dramatically alter a scene, changing camera angles, objects, or compositions.

Minecraftify was designed with a different objective:

Preserve the original composition
Preserve camera perspective
Keep recognizable objects
Maintain scene structure
Convert surfaces and geometry into Minecraft-style blocks

The result should feel like the original image was rebuilt inside Minecraft rather than completely reimagined.

Example Transformation

Input:

Real-world photograph
Natural lighting
Real textures and materials

Output:

Minecraft blocks
Voxel-style geometry
Minecraft-inspired materials
Similar composition and scene layout

The ideal output remains immediately recognizable while clearly belonging to the Minecraft aesthetic.

Building the Dataset

Training data is often the most important part of a fine-tuning project.

For Minecraftify, I created a paired image dataset consisting of approximately 400 image pairs.

Each sample contains:

Original image
Minecraft-style edited image
Caption describing the transformation

The Minecraft versions were generated using Qwen-Edit-25-12, which made it possible to build a paired dataset suitable for image-to-image training without editing each image by hand.

Dataset structure:

source_image
edited_image
prompt_used

Where:

source_image = original image
edited_image = Minecraft-style target
prompt_used = caption used during generation
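
As a rough illustration of the schema above, the following sketch models one record and filters out rows with missing fields. The `PairedSample` class and `validate_rows` helper are hypothetical names introduced here, and the image fields are represented as file paths purely for simplicity; the actual dataset may store images differently.

```python
from dataclasses import dataclass

# Column names taken from the dataset structure described above.
REQUIRED_COLUMNS = ("source_image", "edited_image", "prompt_used")


@dataclass(frozen=True)
class PairedSample:
    """One training pair (hypothetical container, not the author's code)."""
    source_image: str   # original image (assumed here to be a file path)
    edited_image: str   # Minecraft-style target
    prompt_used: str    # caption used during generation


def validate_rows(rows):
    """Split raw dict rows into valid samples and the indices of bad rows."""
    samples, rejected = [], []
    for index, row in enumerate(rows):
        values = [row.get(name) for name in REQUIRED_COLUMNS]
        # Reject rows where any field is missing or an empty string.
        if all(isinstance(v, str) and v.strip() for v in values):
            samples.append(PairedSample(*values))
        else:
            rejected.append(index)
    return samples, rejected
```

A check like this is cheap to run before training and catches pairs where the edited image or caption went missing during generation.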

Why FLUX.2-Klein?

A major design goal was keeping the project lightweight.

Instead of training or serving a very large model, Minecraftify uses:

FLUX.2-Klein-4B

This smaller FLUX model provides:

Fast inference
Lower VRAM requirements
Strong image editing capabilities
Excellent compatibility with LoRA fine-tuning

The final deployment combines:

FLUX.2-Klein-4B
        +
Minecraft LoRA
        =
Minecraftify

This allows the application to remain efficient while producing high-quality edits.

Training the Minecraft LoRA

The LoRA was trained using the Hugging Face Diffusers DreamBooth LoRA workflow, adapted for image-to-image training.

Key training settings:

Parameter	Value
Batch Size	1
Gradient Accumulation	4
Learning Rate	1
Precision	bf16
Rank	64
Training Steps	1200
Optimizer	Prodigy
Warmup Steps	200

Additional optimizations:

Latent caching
Gradient checkpointing
Aspect ratio buckets
8-bit optimizer support

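These settings allowed training on a relatively small dataset while maintaining scene fidelity. The learning rate of 1 is intentional: Prodigy estimates the step size itself and is normally run with a nominal learning rate of 1.0.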
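
To put the training settings in perspective, the short calculation below estimates how many times the model saw each pair. It assumes that the 1200 training steps count optimizer updates (one update per 4 accumulated batches) and that the dataset holds exactly 400 pairs; both are assumptions, since the article gives only approximate figures. The `training_exposure` helper is a hypothetical name.

```python
def training_exposure(batch_size, grad_accum, steps, dataset_size):
    """Return (effective batch size, samples seen, approximate epochs)."""
    effective_batch = batch_size * grad_accum   # samples per optimizer update
    samples_seen = effective_batch * steps      # total samples processed
    epochs = samples_seen / dataset_size        # passes over the dataset
    return effective_batch, samples_seen, epochs


effective_batch, samples_seen, epochs = training_exposure(
    batch_size=1, grad_accum=4, steps=1200, dataset_size=400
)
print(effective_batch, samples_seen, epochs)  # 4 4800 12.0
```

Under these assumptions each pair is seen roughly 12 times.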

Architecture

The processing pipeline is intentionally simple:

Input Image
      ↓
FLUX.2-Klein Image-to-Image
      ↓
Minecraft LoRA Adapter
      ↓
Minecraftified Output

For webcam mode:

Camera Frame
      ↓
Latest Frame Buffer
      ↓
Model Inference
      ↓
Minecraft Output

Only the most recent frame is processed to keep latency manageable.
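
One common way to implement a "latest frame" policy is a single-slot buffer in which each new frame overwrites any frame that has not been consumed yet. The sketch below shows that idea; the `LatestFrameBuffer` class is a hypothetical illustration and not the Space's actual code.

```python
import threading


class LatestFrameBuffer:
    """Single-slot buffer: a new frame overwrites any unconsumed frame."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self.dropped = 0  # frames overwritten before inference picked them up

    def put(self, frame):
        """Called by the camera side for every incoming frame."""
        with self._lock:
            if self._frame is not None:
                self.dropped += 1
            self._frame = frame

    def take(self):
        """Return the newest frame and clear the slot, or None if empty."""
        with self._lock:
            frame, self._frame = self._frame, None
            return frame
```

The camera callback would call `put` on every frame while the inference loop calls `take` whenever the model is free. When inference is slower than the camera, the backlog is discarded instead of queued.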

Running on Hugging Face Spaces

Minecraftify is deployed as a Gradio Space.

The Space includes:

Persistent model caching
Live webcam support
Adjustable generation controls
Side-by-side preview interface

To avoid repeatedly downloading models after restarts, persistent storage is used:

/data/models
/data/.huggingface

This significantly improves startup times and reduces bandwidth usage.
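
A minimal sketch of how such a setup might be wired is shown below. It assumes that `/data/.huggingface` serves as the Hugging Face cache root (via the `HF_HOME` environment variable) and that `/data/models` is a separate directory for model files; the article does not say how the two paths are used, and `configure_model_cache` is a hypothetical helper.

```python
import os
from pathlib import Path


def configure_model_cache(data_root="/data", environ=None):
    """Point Hugging Face caches at persistent storage when it is available.

    Returns the models directory, or None if no writable volume exists.
    """
    environ = os.environ if environ is None else environ
    root = Path(data_root)
    if not (root.is_dir() and os.access(root, os.W_OK)):
        return None  # no persistent volume: keep the library defaults

    hf_home = root / ".huggingface"
    models_dir = root / "models"
    hf_home.mkdir(parents=True, exist_ok=True)
    models_dir.mkdir(parents=True, exist_ok=True)

    # huggingface_hub reads HF_HOME when it is imported, so this must run
    # before importing huggingface_hub, diffusers, or transformers.
    environ.setdefault("HF_HOME", str(hf_home))
    return models_dir
```

The function would be called at the very top of the app script, and the returned path could then be passed as `cache_dir` when loading the model.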

Recommended Settings

For the best balance between speed and quality:

Setting	Value
Steps	3
Guidance Scale	3.0
Seed	Fixed
Input	Well-lit scenes

These settings were chosen specifically for interactive usage within Hugging Face Spaces.
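
The recommended values can be captured in a small settings object, sketched below. `InferenceSettings` is a hypothetical class; the keyword names `num_inference_steps` and `guidance_scale` follow the usual Diffusers pipeline convention, and whether the Space passes them exactly this way is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceSettings:
    """Defaults mirror the recommended settings table above."""
    steps: int = 3
    guidance_scale: float = 3.0
    seed: int = 0  # any fixed value works; 0 is an arbitrary placeholder

    def __post_init__(self):
        if self.steps < 1:
            raise ValueError("steps must be at least 1")
        if self.guidance_scale < 0:
            raise ValueError("guidance_scale must be non-negative")

    def pipeline_kwargs(self):
        """Keyword arguments in the naming style used by Diffusers pipelines."""
        # The seed is not included here: with Diffusers it is typically
        # applied through a seeded torch.Generator passed as `generator`.
        return {
            "num_inference_steps": self.steps,
            "guidance_scale": self.guidance_scale,
        }
```

Keeping the seed fixed means repeated runs on the same input produce comparable outputs, which makes it easier to judge the effect of changing steps or guidance.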

What Worked Well

A few lessons emerged during development:

Small Models Can Go a Long Way

FLUX.2-Klein-4B proved surprisingly capable when paired with a targeted LoRA.

Dataset Quality Matters More Than Quantity

Even with roughly 400 examples, careful pairing and consistent transformations produced useful results.

Scene Preservation Is Hard

One of the biggest challenges was encouraging the model to change visual style without changing the scene itself.

Prompt design and image-to-image conditioning played a major role in achieving this balance.

Future Improvements

Potential future directions include:

Better Minecraft character generation
Support for different Minecraft texture packs
Real-time video processing
Larger and more diverse training datasets
Multiple Minecraft style presets
Improved block-level consistency

Resources

Space

Minecraftify

Base Model

black-forest-labs/FLUX.2-klein-4B

LoRA

AnimeOverlord/flux2-klein-4b-mc-v2

Dataset

Custom paired Minecraft image dataset (~400 samples)

Training Framework

Hugging Face Diffusers
Accelerate
PEFT
PyTorch

Closing Thoughts

Minecraftify began as a hackathon-style experiment with a simple question: could a lightweight image editing model convincingly rebuild the world using Minecraft blocks?

Despite a short development window, the project shows that a specialized visual transformation task is achievable with a compact 4B-parameter model, LoRA fine-tuning, and a carefully curated image-to-image dataset.

The current version should be viewed as a strong first iteration rather than a finished product. Due to time constraints, many ideas and improvements remain unexplored, including larger training datasets, improved character and creature generation, better block consistency, real-time video support, texture-pack variations, and more advanced scene preservation techniques.

What excites me most is the project's potential. The underlying approach has already shown promising results with limited training data and development time, suggesting there is significant room for improvement as the dataset, training process, and inference pipeline continue to evolve.

Minecraftify is ultimately an exploration of how far small, specialized models can be pushed for creative visual tasks. This release is just the beginning, and future iterations will continue to expand its capabilities and bring generated scenes even closer to the experience of stepping into a real Minecraft world.
