Transformers documentation
PP-Chart2Table
This model was released on 2025-05-20 and added to Hugging Face Transformers on 2026-03-18.
Overview
PP-Chart2Table is a state-of-the-art (SOTA) multimodal model developed by the PaddlePaddle team, specializing in chart parsing for both Chinese and English. Its high performance is driven by a novel “Shuffled Chart Data Retrieval” training task which, combined with a refined token-masking strategy, significantly improves its efficiency in converting charts to data tables. The model is further strengthened by an advanced data synthesis pipeline that uses high-quality seed data, retrieval-augmented generation (RAG), and LLM persona design to create a richer, more diverse training set. To address the challenge of large-scale unlabeled, out-of-distribution (OOD) data, the team implemented a two-stage distillation process, ensuring robust adaptability and generalization on real-world data.
Model Architecture
PP-Chart2Table adopts a multimodal fusion architecture that combines a vision tower for chart feature extraction and a language model for table structure generation, enabling end-to-end chart-to-table conversion.
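As a rough illustration of this fusion, a vision tower encodes the chart into per-patch features, a projector maps those features into the language model's embedding space, and the language model decodes the data table. The sketch below is a toy stand-in with made-up module bodies and dimensions, not PP-Chart2Table's actual architecture or weights:

```python
# Toy sketch of the multimodal fusion described above. All module bodies and
# dimensions here are illustrative assumptions, not the real model's.

NUM_PATCHES = 4   # assumed number of chart patches
VISION_DIM = 8    # assumed vision feature size
TEXT_DIM = 16     # assumed language-model hidden size

def vision_tower(chart_pixels):
    # Encode the chart image into one feature vector per patch (stubbed).
    return [[0.1] * VISION_DIM for _ in range(NUM_PATCHES)]

def projector(patch_features):
    # Map each vision feature into the language model's embedding space
    # (stubbed here as zero-padding up to TEXT_DIM).
    return [f + [0.0] * (TEXT_DIM - len(f)) for f in patch_features]

def language_model(image_embeds, prompt):
    # Autoregressively generate table text conditioned on the image
    # embeddings and the prompt (stubbed with a fixed table row).
    return "| Category | Value |"

chart_pixels = None  # stands in for a decoded chart image
image_embeds = projector(vision_tower(chart_pixels))
table = language_model(image_embeds, "Chart to table")
print(len(image_embeds), len(image_embeds[0]))  # 4 patches, each 16-dim
```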
Usage
Single input inference
The example below demonstrates how to convert a chart image into a data table with PP-Chart2Table using Pipeline.
from transformers import pipeline
pipe = pipeline("image-text-to-text", model="PaddlePaddle/PP-Chart2Table_safetensors")
# PPChart2TableProcessor uses hardcoded "Chart to table" instruction internally via chat template
conversation = [
{
"role": "user",
"content": [
{
"type": "image",
"url": "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png",
},
],
},
]
result = pipe(text=conversation)
print(result[0]["generated_text"])
Batched inference
Here is how to run batched inference with PP-Chart2Table using Pipeline:
from transformers import pipeline
pipe = pipeline("image-text-to-text", model="PaddlePaddle/PP-Chart2Table_safetensors")
# PPChart2TableProcessor uses hardcoded "Chart to table" instruction internally via chat template
conversation = [
{
"role": "user",
"content": [
{
"type": "image",
"url": "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png",
},
],
},
]
result = pipe(text=[conversation, conversation])
print(result[0][0]["generated_text"])
PPChart2TableConfig
class transformers.PPChart2TableConfig
< source >( transformers_version: str | None = None architectures: list[str] | None = None output_hidden_states: bool | None = False return_dict: bool | None = True dtype: typing.Union[str, ForwardRef('torch.dtype'), NoneType] = None chunk_size_feed_forward: int = 0 is_encoder_decoder: bool = False id2label: dict[int, str] | dict[str, str] | None = None label2id: dict[str, int] | dict[str, str] | None = None problem_type: typing.Optional[typing.Literal['regression', 'single_label_classification', 'multi_label_classification']] = None vision_config: dict | transformers.configuration_utils.PreTrainedConfig | None = None text_config: dict | transformers.configuration_utils.PreTrainedConfig | None = None image_token_index: int = 151859 image_seq_length: int = 576 tie_word_embeddings: bool = True )
Parameters
- vision_config (Union[dict, ~configuration_utils.PreTrainedConfig], optional) — The config object or dictionary of the vision backbone.
- text_config (Union[dict, ~configuration_utils.PreTrainedConfig], optional) — The config object or dictionary of the text backbone.
- image_token_index (int, optional, defaults to 151859) — The image token index used as a placeholder for input images.
- image_seq_length (int, optional, defaults to 576) — Sequence length of one image embedding.
- tie_word_embeddings (bool, optional, defaults to True) — Whether to tie weight embeddings according to the model’s tied_weights_keys mapping.
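To make image_token_index and image_seq_length concrete, the sketch below shows the general placeholder-expansion pattern used by multimodal models: each image placeholder in the token sequence is expanded so that image_seq_length positions are reserved for the image embedding. This is a simplified illustration, an assumption about the common pattern rather than the library's exact internal code:

```python
# Illustrative sketch (not the library's internal implementation) of image
# placeholder expansion, using the PPChart2TableConfig defaults above.

IMAGE_TOKEN_INDEX = 151859  # placeholder id for an input image
IMAGE_SEQ_LENGTH = 576      # embedding slots one image occupies

def expand_image_tokens(input_ids):
    """Replace each image placeholder with IMAGE_SEQ_LENGTH copies so the
    sequence reserves one position per image-patch embedding."""
    expanded = []
    for tok in input_ids:
        if tok == IMAGE_TOKEN_INDEX:
            expanded.extend([IMAGE_TOKEN_INDEX] * IMAGE_SEQ_LENGTH)
        else:
            expanded.append(tok)
    return expanded

# Hypothetical token ids around a single image placeholder:
ids = [101, IMAGE_TOKEN_INDEX, 2023, 102]
print(len(expand_image_tokens(ids)))  # 3 text tokens + 576 image slots = 579
```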
This is the configuration class to store the configuration of a PPChart2TableModel. It is used to instantiate a PP-Chart2Table model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of PaddlePaddle/PP-Chart2Table_safetensors.
Configuration objects inherit from PreTrainedConfig and can be used to control the model outputs. Read the documentation from PreTrainedConfig for more information.
Example:
>>> from transformers import GotOcr2ForConditionalGeneration, PPChart2TableConfig
>>> # Initializing a PPChart2Table style configuration
>>> configuration = PPChart2TableConfig()
>>> # Initializing a model from the PaddlePaddle/PP-Chart2Table_safetensors style configuration
>>> model = GotOcr2ForConditionalGeneration(configuration)  # the underlying architecture is GOT-OCR2
>>> # Accessing the model configuration
>>> configuration = model.config
PPChart2TableImageProcessor
class transformers.PPChart2TableImageProcessor
< source >( **kwargs: typing_extensions.Unpack[transformers.processing_utils.ImagesKwargs] )
Parameters
- **kwargs (ImagesKwargs, optional) — Additional image preprocessing options. Model-specific kwargs are listed above; see the TypedDict class for the complete list of supported arguments.
Constructs a PPChart2TableImageProcessor image processor.
PPChart2TableImageProcessorPil
class transformers.PPChart2TableImageProcessorPil
< source >( **kwargs: typing_extensions.Unpack[transformers.processing_utils.ImagesKwargs] )
Parameters
- **kwargs (ImagesKwargs, optional) — Additional image preprocessing options. Model-specific kwargs are listed above; see the TypedDict class for the complete list of supported arguments.
Constructs a PPChart2TableImageProcessorPil image processor.
PPChart2TableProcessor
class transformers.PPChart2TableProcessor
< source >( image_processor = None tokenizer = None chat_template = None **kwargs )
Constructs a PPChart2TableProcessor which wraps an image processor and a tokenizer into a single processor.
PPChart2TableProcessor offers all the functionalities of PPChart2TableImageProcessor and the underlying tokenizer. See ~PPChart2TableImageProcessor and the tokenizer documentation for more information.