---
license: apache-2.0
datasets:
- dair-ai/emotion
language:
- en
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
---

# Emotion Classification with BERT + RL Fine-tuning

This model combines a BERT architecture with reinforcement learning (RL) for emotion classification. It was first fine-tuned on the `dair-ai/emotion` dataset (20k English sentences labeled with six emotions), then further optimized with PPO reinforcement learning to refine its prediction behavior.
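For reference, a small helper mapping class ids to emotion names, assuming the standard id-to-label order of the `dair-ai/emotion` dataset (illustrative sketch only):

```python
# Standard label order of the dair-ai/emotion dataset (id -> name)
EMOTION_LABELS = ["sadness", "joy", "love", "anger", "fear", "surprise"]

def id_to_label(label_id: int) -> str:
    """Map a numeric class id to its emotion name."""
    return EMOTION_LABELS[label_id]
```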

## Training Approach

1. **Supervised Phase**:
   - Base BERT model fine-tuned with cross-entropy loss
   - Achieved strong baseline performance
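The supervised phase minimizes the cross-entropy loss, i.e. the negative log-likelihood of the true emotion label. A minimal plain-Python sketch for a single example (illustrative only, not the training code):

```python
import math

def cross_entropy(probs: list[float], true_idx: int) -> float:
    """Negative log-likelihood of the true class.

    `probs` is the model's softmax output over the emotion classes.
    """
    return -math.log(probs[true_idx])
```

A confident, correct prediction gives a loss near 0, while assigning low probability to the true class is penalized heavily.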

2. **RL Phase**:
   - Implemented an Actor-Critic architecture
   - Policy-gradient optimization with custom rewards
   - PPO clipping (ε=0.2) and entropy regularization
   - Custom reward function: `+1.0` for correct, `-0.1` for incorrect predictions
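The reward scheme and PPO clipping described above can be sketched in plain Python (a single-sample illustration under the stated ε=0.2 and reward values; the actual training code lives in the linked repository):

```python
def reward_fn(pred_label: int, true_label: int) -> float:
    """Reward from the card: +1.0 for a correct prediction, -0.1 otherwise."""
    return 1.0 if pred_label == true_label else -0.1

def ppo_clip_loss(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO clipped surrogate loss for a single sample.

    `ratio` is pi_new(a|s) / pi_old(a|s); the loss is the negative of
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return -min(ratio * advantage, clipped_ratio * advantage)
```

The clipping keeps the policy update conservative: once the probability ratio moves outside `[1 - eps, 1 + eps]`, further movement yields no additional gain.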

## Performance Comparison

| Metric    | Pre-RL | Post-RL | Δ (relative) |
|-----------|--------|---------|--------------|
| Accuracy  | 0.9205 | 0.9310  | +1.14%       |
| F1-Score  | 0.9227 | 0.9298  | +0.77%       |
| Precision | 0.9325 | 0.9305  | -0.21%       |
| Recall    | 0.9205 | 0.9310  | +1.14%       |
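The Δ column reports the relative change between the two runs; a quick sketch of how those figures are derived:

```python
def relative_delta(pre: float, post: float) -> float:
    """Relative change from pre to post, in percent (the table's Δ column)."""
    return (post - pre) / pre * 100.0
```

For accuracy, for example, (0.9310 - 0.9205) / 0.9205 ≈ +1.14%.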

Key observation: RL fine-tuning yielded modest but consistent gains in accuracy, F1, and recall, at the cost of a slight drop in precision.

## Usage

```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
classifier = pipeline(
    "text-classification",
    model="SimoGiuffrida/SentimentRL",
    tokenizer="bert-base-uncased",
)

results = classifier("I'm thrilled about this new opportunity!")
```

## Key Features
- Hybrid training: supervised learning + reinforcement learning
- Optimized for nuanced emotion detection
- Handles class imbalance (see confusion matrix in repo)

For full training details and analysis, visit the [GitHub repository](https://github.com/SimoGiuffrida/DLA2).