Technical documentation

Ethoscore methodology

A comprehensive framework for measuring article framing along a calibrated spectrum of Neutral, Loaded, and Alarmist, using transfer learning, ordinal classification, and held-out validation on 125,014 news articles.

I. Conceptual Framework

Framing Spectrum

Ethoscore measures framing—the rhetorical posture and emotional intensity of written content—independent of factual accuracy or political orientation. The system quantifies linguistic urgency, threat language, and dramatic framing.

Neutral

Measured, sourced, proportionate tone

Loaded

Emotionally charged, moralizing language

Alarmist

Imminent threat, catastrophic framing

Linguistic Signals

Increases Score

  • Doom/threat language intensity
  • High-certainty catastrophe predictions
  • Temporal urgency framing

Decreases Score

  • Source attribution and transparency
  • Uncertainty acknowledgment
  • Balanced risk presentation

Model Outputs

Discrete Classification

Categorical label with confidence probability

Continuous Scale Score

Normalized 0–100 intensity metric

Class Probabilities

Full probability distribution across categories
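
As a minimal sketch of how these three outputs fit together for a single article (the field names and values are illustrative, not the tool's actual schema):

    # Hypothetical shape of one analysis result; field names are illustrative,
    # not the tool's actual API or schema.
    result = {
        "label": "Loaded",            # discrete classification
        "confidence": 0.81,           # probability of the predicted class
        "scale_score": 47.2,          # normalized 0-100 intensity metric
        "probabilities": {            # full distribution, sums to 1.0
            "Neutral": 0.12,
            "Loaded": 0.81,
            "Alarmist": 0.07,
        },
    }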

II. Model Architecture and Training Protocol

Transfer Learning Architecture

1. Teacher Model Annotation

125,014 articles from NewsAPI.ai Event Registry were annotated using Llama 3.3 (70 billion parameters) as the teacher model. Articles were sourced across 149 topic categories selected to maximize framing diversity.

Model: Llama-3.3-70B-Instruct-Turbo

Parameters: ~70B

Task: Three-class framing annotation
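
As a minimal sketch of the annotation step, assuming an OpenAI-compatible inference endpoint serving the listed model; the endpoint URL, prompt wording, and parsing are illustrative assumptions, not the production annotation prompt:

    # Sketch of teacher-model annotation. The base_url, prompt, and output
    # handling are assumptions for illustration only.
    from openai import OpenAI

    client = OpenAI(base_url="https://example-inference-host/v1", api_key="...")

    PROMPT = (
        "Classify the framing of the following article as exactly one of: "
        "Neutral, Loaded, or Alarmist. Judge rhetorical posture and emotional "
        "intensity only, not factual accuracy or political orientation. "
        "Answer with the single label.\n\nArticle:\n{article}"
    )

    def annotate(article_text: str) -> str:
        response = client.chat.completions.create(
            model="Llama-3.3-70B-Instruct-Turbo",
            messages=[{"role": "user", "content": PROMPT.format(article=article_text)}],
            temperature=0.0,
        )
        return response.choices[0].message.content.strip()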

2. Student Model Training

Two student models based on DeBERTa-v3-xsmall were trained via knowledge distillation from the teacher's annotations. The xsmall architecture trades embedding width for network depth, helping the compact models capture abstract semantic content.

Base: microsoft/deberta-v3-xsmall

Context window: 1,500 tokens

Training set: 121,888 articles
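
A minimal sketch of the student setup with Hugging Face transformers: a three-class sequence-classification head on deberta-v3-xsmall, with inputs truncated to the stated 1,500-token window. The label ordering is an assumption.

    # Sketch of student-model setup; label order is an assumption.
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    LABELS = ["Neutral", "Loaded", "Alarmist"]

    tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-xsmall")
    model = AutoModelForSequenceClassification.from_pretrained(
        "microsoft/deberta-v3-xsmall",
        num_labels=len(LABELS),
        id2label=dict(enumerate(LABELS)),
        label2id={label: i for i, label in enumerate(LABELS)},
    )

    article_text = "..."  # full article body to score
    inputs = tokenizer(article_text, truncation=True, max_length=1500, return_tensors="pt")
    logits = model(**inputs).logits  # shape: (1, 3), one logit per framing class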

Dual Model Architecture

Ordinal Classification Model

Projects article representations onto a one-dimensional latent scale with learned threshold parameters. Outputs cumulative probabilities converted to class probabilities and continuous scale scores.

Accuracy: 89.73%

F1 (Macro): 83.12%
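
For concreteness, a minimal PyTorch sketch of a cumulative-link head of this kind follows; the pooling strategy, threshold initialization, and parameterization are illustrative assumptions rather than the production implementation. The latent score z corresponds to the continuous scale score, and the class probabilities back the discrete label and confidence.

    # Sketch of a cumulative-link (ordinal) head over pooled encoder features.
    import torch
    import torch.nn as nn

    class OrdinalHead(nn.Module):
        def __init__(self, hidden_size: int):
            super().__init__()
            self.scale = nn.Linear(hidden_size, 1)  # projects features to a 1-D framing score
            # num_classes - 1 cutpoints; initialized ordered (must stay ordered in training)
            self.thresholds = nn.Parameter(torch.tensor([0.0, 1.0]))

        def forward(self, pooled: torch.Tensor):
            z = self.scale(pooled)                      # (batch, 1) latent scale score
            cum = torch.sigmoid(self.thresholds - z)    # P(y <= k) for each cutpoint
            zeros = torch.zeros_like(z)
            ones = torch.ones_like(z)
            cdf = torch.cat([zeros, cum, ones], dim=1)  # (batch, num_classes + 1)
            probs = cdf[:, 1:] - cdf[:, :-1]            # class probabilities, sum to 1
            return z.squeeze(-1), probs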

Softmax Classification Model

Standard three-class classifier with softmax activation. Outputs discrete probabilities for each category without ordinal constraint. Optimized for categorical accuracy.

Accuracy: 89.51%

F1 (Macro): 81.19%

Training Data Distribution

Article distribution across 149 topic categories selected to represent contemporary news coverage patterns. Categories span politics, technology, health, environment, economics, and social issues.

Training data distribution across 149 topic areas showing article counts per category

III. Score Normalization System

Calibrated 0–100 Scale

The ordinal model's latent scale scores (ranging approximately -15 to +20) are transformed to a standardized 0–100 normalized scale via region-based linear interpolation. This normalization preserves ordinal relationships while providing intuitive interpretability.

Normalization Mapping

0–30: Neutral

Raw range: -10.0 to +2.0

25–65: Loaded

Raw range: +1.0 to +6.0

60–100: Alarmist

Raw range: +5.0 to +15.0+

Key Transition Points

  • Raw score -0.5: Neutral ↔ Loaded boundary (~27.5/100)
  • Raw score +6.0: Loaded ↔ Alarmist boundary (~65/100)

Boundary Smoothing

Scores within ±1.0 unit of class boundaries receive smoothing to ensure continuous transitions and prevent artificial cliffs in the normalized scale. This maintains model confidence fidelity.
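
As a minimal sketch of this region-based interpolation, using the raw and normalized ranges listed above; the blending of overlapping regions and the smoothing width are illustrative assumptions, not the exact production formula:

    # Sketch of region-based linear interpolation with boundary smoothing.
    import numpy as np

    # (raw_lo, raw_hi, norm_lo, norm_hi) per region, from the mapping above
    REGIONS = [
        (-10.0,  2.0,   0.0,  30.0),   # Neutral
        (  1.0,  6.0,  25.0,  65.0),   # Loaded
        (  5.0, 15.0,  60.0, 100.0),   # Alarmist
    ]

    def normalize(raw: float, smooth_width: float = 1.0) -> float:
        # Interpolate within each region near the raw score, then blend the
        # overlapping regions so class boundaries transition without cliffs.
        raw = float(np.clip(raw, REGIONS[0][0], REGIONS[-1][1]))
        values, weights = [], []
        for raw_lo, raw_hi, norm_lo, norm_hi in REGIONS:
            if raw_lo - smooth_width <= raw <= raw_hi + smooth_width:
                t = np.clip((raw - raw_lo) / (raw_hi - raw_lo), 0.0, 1.0)
                values.append(norm_lo + t * (norm_hi - norm_lo))
                # weight falls off linearly outside the region's core range
                dist = max(raw_lo - raw, raw - raw_hi, 0.0)
                weights.append(max(1.0 - dist / smooth_width, 1e-6))
        return float(np.clip(np.average(values, weights=weights), 0.0, 100.0))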

IV. Validation and Performance Metrics

Held-Out Validation Protocol

To ensure generalization, 3,126 articles (2.5% of the corpus) were held out as a validation set that was never seen during training. Both models were evaluated on this independent set to estimate real-world performance.
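
A minimal sketch of the 97.5% / 2.5% partition with scikit-learn; stratifying by the teacher-assigned label and the fixed seed are assumptions, while the resulting counts match the 121,888 / 3,126 split described above.

    # Sketch of the held-out split; stratification and seed are assumptions.
    from sklearn.model_selection import train_test_split

    # texts: list of article bodies; labels: teacher-assigned framing labels
    train_texts, val_texts, train_labels, val_labels = train_test_split(
        texts, labels,
        test_size=0.025,     # 3,126 of 125,014 articles held out
        stratify=labels,     # keep class balance comparable across splits
        random_state=42,     # fixed seed so the held-out set never changes
    )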

Ordinal Model Performance

Overall Accuracy: 89.73%
F1 Score (Macro): 83.12%
F1 Score (Weighted): 89.66%
Neutral F1: 94.01%
Loaded F1: 81.89%
Alarmist F1: 73.44%

Softmax Model Performance

Overall Accuracy: 89.51%
F1 Score (Macro): 81.19%
F1 Score (Weighted): 89.18%
Neutral F1: 94.06%
Loaded F1: 80.93%
Alarmist F1: 68.59%

Train-Test Partition Schema

Visualization of the training and validation split methodology ensuring independent evaluation.

Train-test split diagram showing features, target, and ML model pipeline

V. Probability and Confidence Computation

Standard Softmax Classification

The model outputs raw logits for each class, which are transformed via the softmax function into a probability distribution summing to 1.0. The highest probability determines the predicted class, and its value represents confidence.

Input logits: [2.0, 1.0, 0.1]

→ Softmax transformation

Output probabilities: [0.66, 0.24, 0.10]

Prediction: Neutral (66% confidence)
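
The worked example can be reproduced with a few lines of NumPy; the stability shift is a standard trick and does not change the resulting probabilities.

    # Reproducing the worked example with a numerically stable softmax.
    import numpy as np

    def softmax(logits):
        z = np.asarray(logits, dtype=float)
        z = z - z.max()              # stability shift; leaves probabilities unchanged
        exp = np.exp(z)
        return exp / exp.sum()

    probs = softmax([2.0, 1.0, 0.1])
    print(probs.round(2))            # [0.66 0.24 0.1 ]
    print(probs.argmax())            # 0 -> Neutral, ~66% confidence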

Confidence Interpretation Guidelines

High (≥80%)

Strong model certainty; suitable for automated classification workflows

Medium (50–80%)

Moderate certainty; article may exhibit mixed signals or boundary characteristics

Low (<50%)

High ambiguity; manual review recommended for critical applications
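
One way to encode these guidance bands in a downstream pipeline (the wording of the returned advice is illustrative, not part of the tool's output):

    # Sketch mapping a confidence value to the interpretation bands above.
    def confidence_band(confidence: float) -> str:
        if confidence >= 0.80:
            return "high: suitable for automated classification workflows"
        if confidence >= 0.50:
            return "medium: mixed or boundary signals possible"
        return "low: manual review recommended for critical applications"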

VI. Limitations and Transparency

Inherited Biases

Labels originate from a teacher model (Llama 3.3), meaning any systematic biases in the teacher's classifications are partially inherited by the student models. While the teacher model was instructed to focus on framing rather than content, topic-specific annotation patterns may persist.

Context Window Truncation

Articles are truncated to 1,500 tokens for computational efficiency. Extended articles may lose nuance from later sections, potentially affecting classification accuracy for documents with evolving framing patterns.

Domain-Specific Language Variations

Certain topics (e.g., disaster reporting, public health emergencies) naturally employ more urgent language. The model attempts to distinguish between contextually appropriate urgency and exaggerated framing, but edge cases require human judgment.

Try the tool

Analyze any article in your browser. Paste a link or text and get framing scores instantly.

Our sincere thanks to Dr. Matthias for his indispensable work on model training and for making this research possible.