MatFuse
MatFuse generates tileable PBR material maps (diffuse, normal, roughness, specular) from text, reference images, sketches, and/or color palettes.
Paper: MatFuse: Controllable Material Generation with Diffusion Models (CVPR 2024). Project page: https://gvecchio.com/matfuse/
```python
import torch
from diffusers import DiffusionPipeline

# Load the custom pipeline; trust_remote_code=True executes code shipped in the model repo.
pipe = DiffusionPipeline.from_pretrained(
    "gvecchio/MatFuse",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Text-only generation with a fixed seed for reproducibility.
result = pipe(
    text="red brick wall",
    num_inference_steps=50,
    guidance_scale=4.0,
    generator=torch.Generator("cuda").manual_seed(42),
)

# Save the first generated sample of each material map.
result["diffuse"][0].save("diffuse.png")
result["normal"][0].save("normal.png")
result["roughness"][0].save("roughness.png")
result["specular"][0].save("specular.png")
```
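The four `save` calls above can be folded into a small loop. The helper below is a sketch, not part of MatFuse: `save_maps` and `MAP_NAMES` are hypothetical names, and it assumes only what the snippet above shows, namely that `result[name][0]` is an object with a `.save(path)` method.

```python
from pathlib import Path

# Map names as used in the quick-start snippet.
MAP_NAMES = ("diffuse", "normal", "roughness", "specular")


def save_maps(result, out_dir, index=0):
    """Save the four maps of one generated sample as PNG files in out_dir.

    Hypothetical helper: `result` is assumed to be indexable by map name and
    sample index, with each item exposing `.save(path)` like a PIL image.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in MAP_NAMES:
        path = out_dir / f"{name}.png"
        result[name][index].save(path)
        paths.append(path)
    return paths
```

With the `result` from the snippet above, `save_maps(result, "red_brick_wall")` would write `diffuse.png`, `normal.png`, `roughness.png`, and `specular.png` into one folder per material.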
All conditions are optional and freely composable:
| Input | Type | Description |
|---|---|---|
| `text` | `str` | Text description of the material |
| `image` | `PIL.Image` | Reference image for style/appearance |
| `sketch` | `PIL.Image` (grayscale) | Binary edge map for structure |
| `palette` | `list[tuple]` | Up to 5 RGB colour tuples (0–255) |
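The palette constraints listed above (at most 5 colours, each an RGB triple in the 0 to 255 range) are easy to check before calling the pipeline. The following is a minimal sketch; `check_palette` is a hypothetical helper, and whether the pipeline performs its own validation is not stated here.

```python
def check_palette(palette, max_colors=5):
    """Return the palette as a list of (r, g, b) int tuples, or raise ValueError.

    Hypothetical helper: enforces only the constraints from the inputs table.
    """
    colors = [tuple(color) for color in palette]
    if len(colors) > max_colors:
        raise ValueError(f"expected at most {max_colors} colours, got {len(colors)}")
    for color in colors:
        if len(color) != 3:
            raise ValueError(f"expected an (r, g, b) triple, got {color!r}")
        for channel in color:
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(channel, bool) or not isinstance(channel, int):
                raise ValueError(f"channel values must be integers, got {channel!r}")
            if not 0 <= channel <= 255:
                raise ValueError(f"channel values must be in 0..255, got {channel!r}")
    return colors
```

A checked palette can then be passed as `palette=check_palette([(120, 80, 60), (90, 60, 40)])`.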
```python
from PIL import Image

result = pipe(
    image=Image.open("reference.png"),
    text="rough stone texture",
    palette=[(120, 80, 60), (90, 60, 40), (150, 110, 80), (70, 50, 30), (180, 140, 100)],
    num_inference_steps=50,
    guidance_scale=4.0,
)
```
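The example above combines `image`, `text`, and `palette`; the remaining condition, `sketch`, is described as a grayscale binary edge map. One possible way to derive such a map from a photo with Pillow is sketched below. `make_sketch` and its `threshold` default are hypothetical, and the polarity (white edges on a black background) is an assumption, so it may need inverting for this model.

```python
from PIL import Image, ImageFilter


def make_sketch(image, threshold=32):
    """Return a binary edge map (mode "L", pixel values 0 or 255).

    Hypothetical helper: uses Pillow's built-in FIND_EDGES kernel followed by
    a fixed threshold. Edges come out white on black, which is an assumption
    about what the pipeline expects.
    """
    gray = image.convert("L")
    edges = gray.filter(ImageFilter.FIND_EDGES)
    # Map every pixel to pure black or pure white.
    return edges.point(lambda value: 255 if value > threshold else 0)
```

The result could then be supplied alongside the other conditions, for example `pipe(sketch=make_sketch(Image.open("reference.png")), text="rough stone texture")`.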
| Component | Class | Key parameters |
|---|---|---|
| UNet | `UNet2DConditionModel` | in=16, out=12, blocks=[256,512,1024], cross_attn=512 |
| VAE | `MatFuseVQModel` (custom) | 4 encoders + 4 VQ codebooks (4096×3), shared decoder, f=8 |
| Scheduler | `DDIMScheduler` | β 0.0015–0.0195, scaled_linear, ε-prediction |
| Conditioning | `MultiConditionEncoder` (custom) | CLIP ViT-B/16 · sentence-transformers · palette MLP · sketch CNN |
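Two rows of the table can be turned into concrete numbers. The sketch below rests on stated assumptions: it reads "4096×3" as 4096 codebook entries of dimension 3, so four maps give 4 × 3 = 12 latent channels (matching out=12), it reads f=8 as an 8× spatial downsampling factor, and it assumes 1000 training timesteps, which the table does not state. The `scaled_linear` formula is the one diffusers uses for that schedule name: linear in √β, then squared. Both function names are hypothetical.

```python
def latent_shape(height, width, downsample=8, num_maps=4, code_dim=3):
    """Latent tensor shape (channels, height, width) for one material.

    Assumes each of the num_maps encoders yields code_dim channels and that
    the VAE downsamples each spatial side by `downsample`.
    """
    if height % downsample or width % downsample:
        raise ValueError("height and width must be multiples of the downsampling factor")
    return (num_maps * code_dim, height // downsample, width // downsample)


def scaled_linear_betas(beta_start=0.0015, beta_end=0.0195, num_steps=1000):
    """Beta schedule that is linear in sqrt(beta), then squared.

    num_steps=1000 is an assumption; the table gives only the beta range.
    """
    lo, hi = beta_start ** 0.5, beta_end ** 0.5
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


print(latent_shape(256, 256))  # (12, 32, 32)
```

The table does not say what the remaining 4 of the UNet's 16 input channels carry, so that is left out here.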
```bibtex
@inproceedings{vecchio2024matfuse,
    author    = {Vecchio, Giuseppe and Sortino, Renato and Palazzo, Simone and Spampinato, Concetto},
    title     = {MatFuse: Controllable Material Generation with Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {4429-4438}
}
```
This project is licensed under the MIT License.