I. Conceptual Framework
Ethoscore measures framing—the rhetorical posture and emotional intensity of written content—independent of factual accuracy or political orientation. The system quantifies linguistic urgency, threat language, and dramatic framing.
- Neutral: measured, sourced, proportionate tone
- Loaded: emotionally charged, moralizing language
- Alarmist: imminent threat, catastrophic framing
Increases Score
- Doom/threat language intensity
- High-certainty catastrophe predictions
- Temporal urgency framing
Decreases Score
- Source attribution and transparency
- Uncertainty acknowledgment
- Balanced risk presentation
Output Formats
- Discrete classification: categorical label with confidence probability
- Continuous scale score: normalized 0–100 intensity metric
- Class probabilities: full probability distribution across categories
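For downstream consumers, a minimal sketch of how these three output formats might be bundled together is shown below. The class names come from the framework above; the field names and example values are illustrative assumptions, not Ethoscore's actual response schema.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class FramingResult:
    """Illustrative container for the three Ethoscore output forms."""
    label: str                     # discrete class: "Neutral", "Loaded", or "Alarmist"
    confidence: float              # probability of the predicted class
    scale_score: float             # normalized 0-100 intensity metric
    class_probs: Dict[str, float]  # full distribution across the three classes

# Hypothetical scored article
result = FramingResult(
    label="Loaded",
    confidence=0.72,
    scale_score=48.3,
    class_probs={"Neutral": 0.18, "Loaded": 0.72, "Alarmist": 0.10},
)
print(result.label, result.scale_score)
```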
II. Model Architecture and Training Protocol
Transfer Learning Architecture
1. Teacher Model Annotation
125,014 articles from NewsAPI.ai Event Registry were annotated using Llama 3.3 (70 billion parameters) as the teacher model. Articles were sourced across 149 topic categories selected to maximize framing diversity.
Model: Llama-3.3-70B-Instruct-Turbo
Parameters: ~70B
Task: Three-class framing annotation
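A minimal sketch of how this annotation step could be scripted against an OpenAI-compatible chat endpoint follows. The endpoint URL, prompt wording, and response parsing are assumptions for illustration and do not reproduce the project's actual annotation pipeline.

```python
# Hedged sketch of teacher-model annotation via an OpenAI-compatible chat API.
from openai import OpenAI

# Hypothetical endpoint and key; the real provider and credentials are not specified here.
client = OpenAI(base_url="https://api.example-provider.com/v1", api_key="YOUR_KEY")

PROMPT = (
    "Classify the framing of the following news article as Neutral, Loaded, or Alarmist. "
    "Judge rhetorical posture and emotional intensity only, not factual accuracy or "
    "political orientation. Reply with a single word.\n\n{article}"
)

def annotate(article_text: str) -> str:
    """Ask the teacher model for a single framing label."""
    response = client.chat.completions.create(
        model="Llama-3.3-70B-Instruct-Turbo",  # teacher model named above
        messages=[{"role": "user", "content": PROMPT.format(article=article_text)}],
        temperature=0.0,                        # deterministic labels for distillation
    )
    return response.choices[0].message.content.strip()
```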
2. Student Model Training
Two student models based on DeBERTa-v3-xsmall were trained via knowledge distillation on the teacher's labels. The architecture trades embedding width for network depth, allowing the compact model to capture more abstract semantic content.
Base: microsoft/deberta-v3-xsmall
Context window: 1,500 tokens
Training set: 121,888 articles
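The sketch below outlines how such a student could be fine-tuned on the teacher's labels with Hugging Face Transformers. The toy dataset, hyperparameters, and column names are assumptions; only the base checkpoint, the three-class setup, and the 1,500-token truncation come from the methodology above.

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL = "microsoft/deberta-v3-xsmall"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)

# Tiny stand-in for the 121,888-article teacher-labelled corpus.
# Labels: 0 = Neutral, 1 = Loaded, 2 = Alarmist.
train_ds = Dataset.from_dict({
    "text": ["Officials released the quarterly report on schedule.",
             "This outrageous decision betrays every hard-working family.",
             "Act now: collapse is imminent and nothing can stop it."],
    "labels": [0, 1, 2],
})

def tokenize(batch):
    # Truncate to the 1,500-token context window described above
    return tokenizer(batch["text"], truncation=True, max_length=1500)

train_ds = train_ds.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ethoscore-student",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train_ds,
    tokenizer=tokenizer,  # enables dynamic padding of variable-length batches
)
trainer.train()
```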
Dual Model Architecture
Ordinal Classification Model
Projects article representations onto a one-dimensional latent scale with learned threshold parameters. Outputs cumulative probabilities converted to class probabilities and continuous scale scores.
Accuracy: 89.73%
F1 (Macro): 83.12%
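A compact sketch of this kind of cumulative-link ordinal head is shown below (PyTorch). The hidden size matches DeBERTa-v3-xsmall and the threshold initialization mirrors the transition points in Section III, but the exact parameterization is an assumption rather than the model's verified implementation.

```python
import torch
import torch.nn as nn

class OrdinalHead(nn.Module):
    """One-dimensional latent scale with learned class thresholds."""
    def __init__(self, hidden_size: int = 384, num_classes: int = 3):
        super().__init__()
        self.scale = nn.Linear(hidden_size, 1)              # 1-D latent framing scale
        # Learned cut points; initialized near the boundaries cited in Section III.
        self.thresholds = nn.Parameter(torch.tensor([-0.5, 6.0]))

    def forward(self, pooled: torch.Tensor):
        score = self.scale(pooled)                          # (batch, 1) latent score
        cum = torch.sigmoid(score - self.thresholds)        # P(y > k) at each threshold
        upper = torch.cat([torch.ones_like(cum[:, :1]), cum], dim=1)
        lower = torch.cat([cum, torch.zeros_like(cum[:, :1])], dim=1)
        class_probs = upper - lower                         # P(y = k); each row sums to 1
        return score.squeeze(-1), class_probs

head = OrdinalHead()
pooled = torch.randn(2, 384)        # stand-in for pooled DeBERTa representations
score, probs = head(pooled)
print(score, probs.sum(dim=1))      # latent scores and rows summing to 1
```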
Softmax Classification Model
Standard three-class classifier with softmax activation. Outputs discrete probabilities for each category without ordinal constraint. Optimized for categorical accuracy.
Accuracy: 89.51%
F1 (Macro): 81.19%
Training Data Distribution
Article distribution across 149 topic categories selected to represent contemporary news coverage patterns. Categories span politics, technology, health, environment, economics, and social issues.

III. Score Normalization System
Calibrated 0–100 Scale
The ordinal model's latent scale scores (ranging approximately -15 to +20) are transformed onto a 0–100 normalized scale via region-based linear interpolation. This normalization preserves ordinal relationships while providing an intuitively interpretable value.
Normalization Mapping
- Neutral: raw range -10.0 to +2.0
- Loaded: raw range +1.0 to +6.0
- Alarmist: raw range +5.0 to +15.0 and above
Key Transition Points
- Raw -0.5: Neutral ↔ Loaded boundary (~27.5/100)
- Raw +6.0: Loaded ↔ Alarmist boundary (~65/100)
Boundary Smoothing
Scores within ±1.0 unit of class boundaries receive smoothing to ensure continuous transitions and prevent artificial cliffs in the normalized scale. This maintains model confidence fidelity.
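The sketch below illustrates the region-based linear interpolation with a plain piecewise mapping. The two interior anchors are the transition points listed above; the outer anchors (raw -10.0 mapping to 0 and raw +15.0 mapping to 100) are assumptions, and the ±1.0 boundary smoothing is noted but not reproduced.

```python
import numpy as np

# Anchor points for the piecewise-linear mapping. The interior anchors are the
# documented transition points; the outer anchors are illustrative assumptions.
RAW_ANCHORS  = np.array([-10.0, -0.5,  6.0,  15.0])
NORM_ANCHORS = np.array([  0.0, 27.5, 65.0, 100.0])

def normalize_score(raw: float) -> float:
    """Map a raw latent scale score onto the calibrated 0-100 scale."""
    # np.interp interpolates linearly within each region and clamps values
    # outside the anchor range; the additional +/-1.0 boundary smoothing
    # described above is omitted from this sketch.
    return float(np.interp(raw, RAW_ANCHORS, NORM_ANCHORS))

print(normalize_score(-0.5))  # ~27.5, the Neutral/Loaded boundary
print(normalize_score(6.0))   # ~65.0, the Loaded/Alarmist boundary
```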
IV. Validation and Performance Metrics
Held-Out Validation Protocol
To ensure generalization, 3,126 articles (2.5%) were held out as a validation set that was never exposed during training. Both models were evaluated on this independent set to estimate real-world performance.
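A brief sketch of this partition, assuming a class-stratified random split with a fixed seed (neither is confirmed by the methodology), is shown below with synthetic placeholder data.

```python
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the 125,014 annotated articles and their labels.
articles = [f"article {i}" for i in range(125_014)]
labels = [i % 3 for i in range(125_014)]   # 0 = Neutral, 1 = Loaded, 2 = Alarmist

train_texts, val_texts, train_labels, val_labels = train_test_split(
    articles, labels,
    test_size=3_126,     # held-out validation count reported above (~2.5%)
    stratify=labels,     # assumption: class proportions kept comparable across splits
    random_state=42,     # assumption: fixed seed for reproducibility
)
print(len(train_texts), len(val_texts))    # 121888 and 3126
```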
Ordinal and Softmax Model Performance
Both models were evaluated against the held-out set; their accuracy and macro F1 figures are listed under Dual Model Architecture above.
Train-Test Partition Schema
Visualization of the training and validation split methodology, ensuring independent evaluation.

V. Probability and Confidence Computation
Standard Softmax Classification
The model outputs raw logits for each class, which are transformed via the softmax function into a probability distribution summing to 1.0. The highest probability determines the predicted class, and its value represents confidence.
Input logits: [2.0, 1.0, 0.1]
→ Softmax transformation
Output probabilities: [0.66, 0.24, 0.10]
Prediction: Neutral (66% confidence)
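The worked example can be reproduced with a few lines of NumPy; the max-subtraction step is a standard numerical-stability trick and not specific to Ethoscore.

```python
import numpy as np

def softmax(logits):
    z = np.asarray(logits, dtype=float)
    z = z - z.max()              # subtract the max for numerical stability
    exp = np.exp(z)
    return exp / exp.sum()

probs = softmax([2.0, 1.0, 0.1])
labels = ["Neutral", "Loaded", "Alarmist"]
print(probs.round(2))                                     # [0.66 0.24 0.1 ]
print(labels[int(probs.argmax())], f"{probs.max():.0%}")  # Neutral 66%
```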
Confidence Interpretation Guidelines
- High (≥80%): strong model certainty; suitable for automated classification workflows
- Medium (50–80%): moderate certainty; the article may exhibit mixed signals or boundary characteristics
- Low (<50%): high ambiguity; manual review recommended for critical applications
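As a usage illustration, the guidelines above can be turned into a simple routing helper; the function below is hypothetical and not part of the Ethoscore API.

```python
def route_by_confidence(confidence: float) -> str:
    """Map the top-class probability to a handling recommendation per the guidelines above."""
    if confidence >= 0.80:
        return "high: accept the automated classification"
    if confidence >= 0.50:
        return "medium: treat as a mixed/boundary case and spot-check"
    return "low: send to manual review"

print(route_by_confidence(0.66))   # medium tier, matching the worked example above
```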
VI. Limitations and Transparency
Inherited Biases
Labels originate from a teacher model (Llama 3.3), meaning any systematic biases in the teacher's classifications are partially inherited by the student models. While the teacher model was instructed to focus on framing rather than content, topic-specific annotation patterns may persist.
Context Window Truncation
Articles are truncated to 1,500 tokens for computational efficiency. Extended articles may lose nuance from later sections, potentially affecting classification accuracy for documents with evolving framing patterns.
Domain-Specific Language Variations
Certain topics (e.g., disaster reporting, public health emergencies) naturally employ more urgent language. The model attempts to distinguish between contextually appropriate urgency and exaggerated framing, but edge cases require human judgment.
Ethoscore quantifies framing and rhetorical tone, not factual accuracy or authorial intent. High alarmism scores do not imply falsehood, nor do low scores guarantee accuracy. This system should complement—not replace—human editorial judgment in consequential decision-making contexts.
Our sincere thanks to Dr. Matthias for his indispensable work on model training and for making this research possible.