Abstract
Vector glyphs are the atomic units of digital typography, yet most learning-based pipelines still depend on carefully curated exemplar sheets and raster-to-vector postprocessing, which limits accessibility and editability. We introduce VecGlypher, a single multimodal language model that generates high-fidelity vector glyphs directly from text descriptions or image exemplars. Given a style prompt, optional reference glyph images, and a target character, VecGlypher autoregressively emits SVG path tokens, avoiding raster intermediates and producing editable, watertight outlines in one pass. A typography-aware data and training recipe makes this possible: (i) a large-scale continuation stage on 39K noisy Envato fonts to master SVG syntax and long-horizon geometry, followed by (ii) post-training on 2.5K expert-annotated Google Fonts with descriptive tags and exemplars to align language and imagery with geometry; preprocessing normalizes coordinate frames, canonicalizes paths, de-duplicates families, and quantizes coordinates for stable long-sequence decoding. On cross-family OOD evaluation, VecGlypher substantially outperforms both general-purpose LLMs and specialized vector-font baselines for text-only generation, while image-referenced generation reaches state-of-the-art performance, with marked gains over DeepVecFont-v2 and DualVector. Ablations show that model scale and the two-stage recipe are critical, and that absolute-coordinate serialization yields the best geometry. VecGlypher lowers the barrier to font creation by letting users design with words or exemplars, and provides a scalable foundation for future multimodal design tools.
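The abstract's preprocessing step (coordinate normalization, quantization, and absolute-coordinate SVG serialization) can be illustrated with a minimal sketch. This is not the paper's released code: the function names, the 1000-unit em size, and the 256-level grid are illustrative assumptions; the paper does not specify these values here.

```python
# Hypothetical sketch of coordinate quantization and absolute-coordinate
# SVG path serialization, as described at a high level in the abstract.
# The em size and bin count below are assumed values, not from the paper.

def quantize_path(points, em_size=1000.0, bins=256):
    """Normalize absolute glyph coordinates over the em square, then snap
    them to `bins` discrete levels so each coordinate maps to one token
    from a small, stable vocabulary (aids long-sequence decoding)."""
    tokens = []
    for x, y in points:
        qx = min(bins - 1, max(0, round(x / em_size * (bins - 1))))
        qy = min(bins - 1, max(0, round(y / em_size * (bins - 1))))
        tokens.extend([qx, qy])
    return tokens

def serialize_absolute(commands):
    """Serialize an outline as absolute-coordinate SVG path tokens,
    e.g. [('M', (100, 700)), ('L', (500, 700))] -> 'M 100 700 L 500 700'."""
    parts = []
    for op, coords in commands:
        parts.append(op)
        parts.extend(str(c) for c in coords)
    return " ".join(parts)
```

Absolute coordinates (rather than relative deltas) match the ablation finding that absolute-coordinate serialization yields the best geometry, since each token carries position information independent of earlier decoding errors.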