// ResNet-18 · CIFAR-10 · 4-Experiment Benchmark

Staged Embarrassment Learning

A curriculum-based training method that applies dynamic gradient sparsity -- letting a neural network focus its compute on the samples it finds most embarrassing to get wrong.

99% FLOPs saved (SEL-95)
95% FLOPs saved (Warmup+SEL)
85.3% Accuracy (Warmup+SEL)
38% Faster training time
01 // Origin

A child, a ball, and a learning signal.

The idea for SEL came from watching a niece learn to catch a ball. She missed. Someone nearby laughed. She felt embarrassed -- and in that instant, something remarkable happened: she didn't just try harder, she corrected her exact mistake with a precision she hadn't shown before.

That emotional signal -- embarrassment -- triggered a targeted, high-efficiency correction. She wasn't recalibrating everything she knew. She was zeroing in on exactly what went wrong.

Traditional neural networks don't do this. They apply gradients uniformly, spending compute on easy samples they already know perfectly. SEL asks: what if we only updated weights where the model is genuinely embarrassed?

The result is a training algorithm that mirrors this human learning instinct -- suppressing gradient updates for confident, easy predictions, and concentrating all available compute on the samples the model finds hardest to explain.

1. Input Arrives -- A training sample is passed through the network. The model makes a prediction.

2. Embarrassment Computed -- Per-class embarrassment E_c is measured via temperature-scaled cross-entropy loss.

3. Guilt Threshold Applied -- Gradients below the guilt threshold gamma are masked to zero -- the model "ignores" what it already knows.

4. Sparse Update -- Only the most significant gradients survive. Frozen knowledge stays frozen.

5. Staged Curriculum -- Training progresses from easy to hard samples across 5 stages, naturally escalating difficulty (a minimal end-to-end sketch follows).
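
Concretely, steps 1-4 compose into one training step and step 5 wraps it in a stage loop. Below is a minimal PyTorch sketch, not the repository's actual API: sel_step and train_sel are illustrative names, gamma is assumed precomputed, and sparse_update is the function shown in Section 02.

sketch -- sel_step() (illustrative)
import torch.nn.functional as F

def sel_step(model, optimizer, x, y, gamma, T=1.5):
    """Steps 1-4: forward pass, embarrassment loss, guilt mask, sparse update."""
    optimizer.zero_grad()
    logits = model(x)                        # step 1: input arrives, model predicts
    loss = F.cross_entropy(logits / T, y)    # step 2: temperature-scaled loss
    loss.backward()
    sparsity = sparse_update(model, gamma)   # steps 3-4: mask "innocent" gradients
    optimizer.step()                         # only surviving gradients move weights
    return loss.item(), sparsity

def train_sel(model, optimizer, stages, gamma, T=1.5):
    """Step 5: staged curriculum -- `stages` is a list of DataLoaders, easy to hard."""
    for loader in stages:
        for x, y in loader:
            sel_step(model, optimizer, x, y, gamma, T)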

02 // Mathematical Foundation

The math of guilt.

SEL formalizes the intuition above into two elegant operations: measuring per-class embarrassment, and masking gradients that fall below a guilt threshold.

PER-CLASS EMBARRASSMENT
E_c = (1 / |N_c|) * sum_{i in N_c} L(y_hat_i / T, y_i)
Where T = 1.5 is the temperature parameter, N_c is the set of samples from class c, and L is the cross-entropy loss. Confidence is defined as C_c = max(0, 1 - E_c).
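
As a sanity check, here is a minimal sketch of E_c and C_c in PyTorch, assuming raw logits and integer labels (the function name is illustrative, not the repository's API):

sketch -- per_class_embarrassment() (illustrative)
import torch
import torch.nn.functional as F

def per_class_embarrassment(logits, labels, num_classes, T=1.5):
    """E_c: mean temperature-scaled cross-entropy over each class's samples."""
    losses = F.cross_entropy(logits / T, labels, reduction="none")  # per-sample L
    E = torch.zeros(num_classes)
    for c in range(num_classes):
        in_c = labels == c                 # N_c: samples from class c
        if in_c.any():
            E[c] = losses[in_c].mean()     # (1/|N_c|) * sum over N_c
    C = (1.0 - E).clamp(min=0.0)           # confidence C_c = max(0, 1 - E_c)
    return E, C
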
SPARSE GRADIENT UPDATE
Mask = |grad(p)| > gamma
grad(p) <- grad(p) * Mask
gamma is the guilt threshold -- the 95th percentile of gradient magnitudes, so ~95% of gradients fall below it. Gradients below this threshold are zeroed out, producing ~95% sparsity. The remaining 5% of gradients carry all the learning signal.
sel_engine.py -- sparse_update()
def sparse_update(model, gamma):
    """Apply guilt threshold mask to gradients. Returns sparsity fraction."""
    tot = guilty = 0
    for p in model.parameters():
        if p.grad is not None:
            mask = (p.grad.abs() > gamma).float()  # 1 where guilt > threshold
            p.grad.mul_(mask)                    # zero out innocent gradients
            tot    += mask.numel()
            guilty += mask.sum().item()
    return 1.0 - (guilty / max(tot, 1))      # sparsity = fraction frozen
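
The snippet above assumes gamma is already known. One way to obtain it, consistent with the percentile definition above, is torch.quantile over all gradient magnitudes; compute_gamma is an illustrative helper, not part of sel_engine.py.

sketch -- compute_gamma() (illustrative)
import torch

def compute_gamma(model, q=0.95):
    """Guilt threshold: the q-th quantile of all current gradient magnitudes."""
    mags = torch.cat([p.grad.abs().flatten()
                      for p in model.parameters() if p.grad is not None])
    return torch.quantile(mags, q).item()

# Typical use after loss.backward():
#   gamma = compute_gamma(model, q=0.95)   # 95th percentile -> ~95% sparsity
#   sparsity = sparse_update(model, gamma)
#   optimizer.step()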

03 // Results

The evidence.

Four experiments on CIFAR-10, evaluated on a held-out test set of 100 images per class -- never seen during training. All four were run on a T4 GPU.

[Chart: Test Accuracy -- Held-Out 100/Class | series: Baseline, Lottery Ticket, SEL-95%, Warmup+SEL]

[Chart: Training Accuracy (Convergence) | series: Baseline, Lottery Ticket, SEL-95%, Warmup+SEL]
Summary -- Final Results

System           Test Acc (100/class)   FLOPs     Time     FLOPs Saved
Baseline CNN     93.2%                  33.5T     1398s    0%
Lottery Ticket   93.0%                  6.7T      1460s    80%
SEL-95%          77.5%                  0.084T    835s     99%
Warmup+SEL       85.3%                  1.58T     875s     95%
04 // Applications

Where embarrassment is a feature.

SEL's key insight -- spend compute only on confusing examples -- opens up training regimes that were previously impractical.

Edge Devices

99% FLOPs reduction makes on-device fine-tuning feasible on microcontrollers and embedded AI chips with extreme power constraints.

Real-Time Robotics

Robots can continuously learn from novel situations (embarrassing failures) without replaying entire datasets -- adaptive learning in the field.

Adaptive Education AI

Personalized tutoring systems that track per-concept "embarrassment" scores and focus practice on the exact gaps in a student's knowledge.

Federated Learning

Sparse gradient updates dramatically reduce communication overhead in distributed training -- crucial for privacy-preserving FL systems.
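
To make the communication saving concrete: after sparse_update() zeroes ~95% of entries, a client only needs to ship the surviving coordinates. A hedged sketch using PyTorch's sparse COO format (the helper name is illustrative):

sketch -- sparse_grad_payload() (illustrative)
import torch

def sparse_grad_payload(model):
    """Pack only nonzero gradient entries (indices + values) for transmission."""
    payload = {}
    for name, p in model.named_parameters():
        if p.grad is not None:
            g = p.grad.to_sparse()                     # COO view: nonzeros only
            payload[name] = (g.indices(), g.values())  # ~5% of the dense size
    return payload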

Continual Learning

By freezing confident knowledge and only updating on embarrassing samples, SEL naturally resists catastrophic forgetting.

Foundation Model Fine-Tuning

Apply SEL's sparse update logic to LoRA-style fine-tuning, drastically cutting the cost of adapting large models to new domains.
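
A hedged sketch of what that could look like, assuming adapters follow the common "lora_" naming convention used by PEFT-style libraries (speculative, not an implemented feature):

sketch -- sparse_update_lora() (speculative)
def sparse_update_lora(model, gamma):
    """Apply the guilt mask only to LoRA adapter gradients; base weights stay frozen."""
    for name, p in model.named_parameters():
        if "lora_" in name and p.grad is not None:
            mask = (p.grad.abs() > gamma).float()  # same guilt test as sparse_update()
            p.grad.mul_(mask)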

05 // Trade-offs

Honest limitations.

+ What works

  • Massive FLOPs reduction (95--99%) with manageable accuracy cost
  • Warmup+SEL recovers ~8pp of accuracy vs pure SEL-95 at minimal extra cost
  • Training wall-clock time drops by 38% vs baseline
  • Natural curriculum prevents early overfitting on easy samples
  • Embarrassment signal provides interpretable per-class difficulty tracking

- Known limitations

  • SEL-95 achieves only 77.5% vs 93.2% baseline -- a 15.7pp accuracy gap
  • Guilt threshold gamma requires careful calibration; wrong value kills convergence
  • Stage transitions can cause temporary accuracy dips (visible in curves)
  • Evaluated only on CIFAR-10 / ResNet-18; generalization to larger architectures is untested
  • Lottery Ticket achieves baseline accuracy with 80% savings -- a strong competitor in accuracy-critical settings

06 // Future Scope

Where this goes next.

NEAR-TERM

Adaptive guilt threshold gamma that auto-tunes per class, removing the need for manual calibration.

MEDIUM-TERM

SEL applied to transformer architectures -- per-head embarrassment scoring as an attention-aware sparsity method.

LONG-TERM

Embarrassment as a universal learning signal -- combining with reinforcement learning for agents that prioritize surprising state transitions.