Bias Cognition Capabilities ‐ Discussion about definitions and formulas

beviah edited this page Aug 14, 2025 · 2 revisions

Model Psychology Profiling Framework

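The psychology profiling framework scores six dimensions of a model's self-analysis capability, using the results of the cross-validation analysis: Detection Capability, Self-Application, Consistency, Cognitive Bias Resistance, Self-Awareness, and Objectivity.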

The formulas below are the ones used in the study. We want to improve them so that they are more robust and better reflect the attributes they claim to describe. We are also open to adding new attributes.

Detection Capability

Definition: Measures the model's ability to identify biases in analyzed material.

Formula:

detection_capability = detection_strengths_count × 0.60
                     + activity_component × 0.25
                     + (100 - blind_spots_penalty) × 0.15
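
As a minimal sketch, the formula above can be written as a Python function. The page does not state the scale of each input; the sample values below are hypothetical and serve only to show the arithmetic. Because detection_strengths_count is a raw count, the result is not guaranteed to land on a 0-100 scale.

```python
def detection_capability(detection_strengths_count: float,
                         activity_component: float,
                         blind_spots_penalty: float) -> float:
    """Weighted sum from the formula above (weights 0.60 + 0.25 + 0.15 = 1.0)."""
    return (detection_strengths_count * 0.60
            + activity_component * 0.25
            + (100 - blind_spots_penalty) * 0.15)


# Hypothetical inputs, not taken from the study.
score = detection_capability(detection_strengths_count=10,
                             activity_component=40,
                             blind_spots_penalty=20)
print(round(score, 2))  # 6 + 10 + 12, i.e. about 28
```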

Self-Application

Definition: Measures how well the model applies bias detection to its own outputs.

Formula:

if detection_strengths_count > 0:
    self_application = (strengths_ratio × activity_weight) × 100
else:
    self_application = activity_component × 0.30
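
The two branches above can be sketched in Python as follows. This assumes strengths_ratio and activity_weight are both fractions between 0 and 1, which the page does not state explicitly; the sample values are hypothetical.

```python
def self_application(detection_strengths_count: float,
                     strengths_ratio: float,
                     activity_weight: float,
                     activity_component: float) -> float:
    """Branching formula from above.

    ASSUMPTION: strengths_ratio and activity_weight lie in [0, 1].
    """
    if detection_strengths_count > 0:
        # At least one bias type detected: scale the ratio by the weight.
        return (strengths_ratio * activity_weight) * 100
    # Nothing detected: fall back to a fraction of the activity component.
    return activity_component * 0.30


# Hypothetical inputs, not taken from the study.
with_detections = self_application(3, 0.5, 0.8, 40)     # about 40
without_detections = self_application(0, 0.0, 0.8, 40)  # about 12
print(round(with_detections, 2), round(without_detections, 2))
```

Note that the two branches depend on different inputs, so the score can jump when detection_strengths_count moves from 0 to 1.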

Consistency

Definition: Measures stability and reliability of analytical patterns.

Formula:

consistency = calibration_quality × 0.50
            + activity_component × 0.30
            + (100 - selective_penalty × 0.5) × 0.20
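
A small Python sketch of the formula above, with hypothetical inputs. Unlike the other formulas on this page, selective_penalty is halved before it is subtracted from 100, so this term stays at 50 or above whenever the penalty is at most 100.

```python
def consistency(calibration_quality: float,
                activity_component: float,
                selective_penalty: float) -> float:
    """Weighted sum from the formula above (weights 0.50 + 0.30 + 0.20 = 1.0)."""
    return (calibration_quality * 0.50
            + activity_component * 0.30
            + (100 - selective_penalty * 0.5) * 0.20)


# Hypothetical inputs, not taken from the study.
score = consistency(calibration_quality=80,
                    activity_component=40,
                    selective_penalty=20)
print(round(score, 2))  # 40 + 12 + 18, i.e. about 70
```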

Cognitive Bias Resistance

Definition: Measures resistance to biased thinking.

Formula:

cognitive_bias_resistance = (100 - blind_spots_penalty) × 0.40
                          + leniency_resistance × 0.25
                          + (100 - selective_penalty) × 0.20
                          + oversens_resistance × 0.15
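
The four-term sum above can be sketched in Python like this. The sample inputs are hypothetical and assume every variable is already on a 0-100 scale, which the page does not confirm.

```python
def cognitive_bias_resistance(blind_spots_penalty: float,
                              leniency_resistance: float,
                              selective_penalty: float,
                              oversens_resistance: float) -> float:
    """Weighted sum from the formula above (weights sum to 1.0)."""
    return ((100 - blind_spots_penalty) * 0.40
            + leniency_resistance * 0.25
            + (100 - selective_penalty) * 0.20
            + oversens_resistance * 0.15)


# Hypothetical inputs on an assumed 0-100 scale.
score = cognitive_bias_resistance(blind_spots_penalty=20,
                                  leniency_resistance=60,
                                  selective_penalty=20,
                                  oversens_resistance=80)
print(round(score, 2))  # 32 + 15 + 16 + 12, i.e. about 75
```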

Self-Awareness

Definition: Measures the ability to recognize own biases and limitations.

Formula:

self_awareness = leniency_resistance × 0.30
               + (100 - blind_spots_penalty) × 0.50
               + calibration_quality × 0.20
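
A minimal Python sketch of the formula above, again with hypothetical inputs on an assumed 0-100 scale. Half of the score comes from the blind-spot term, which also carries the largest weight in Cognitive Bias Resistance, so the two dimensions are likely to move together.

```python
def self_awareness(leniency_resistance: float,
                   blind_spots_penalty: float,
                   calibration_quality: float) -> float:
    """Weighted sum from the formula above (weights 0.30 + 0.50 + 0.20 = 1.0)."""
    return (leniency_resistance * 0.30
            + (100 - blind_spots_penalty) * 0.50
            + calibration_quality * 0.20)


# Hypothetical inputs on an assumed 0-100 scale.
score = self_awareness(leniency_resistance=60,
                       blind_spots_penalty=20,
                       calibration_quality=80)
print(round(score, 2))  # 18 + 40 + 16, i.e. about 74
```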

Objectivity

Definition: Measures application of consistent standards to self and others.

Formula:

objectivity = leniency_resistance × 0.35
            + calibration_quality × 0.35
            + oversens_resistance × 0.15
            + (100 - selective_penalty) × 0.15
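
The formula above, sketched as a Python function. The inputs are hypothetical and assumed to be on a 0-100 scale; under that assumption the result also stays within 0-100, because the weights sum to 1.0.

```python
def objectivity(leniency_resistance: float,
                calibration_quality: float,
                oversens_resistance: float,
                selective_penalty: float) -> float:
    """Weighted sum from the formula above (weights sum to 1.0)."""
    return (leniency_resistance * 0.35
            + calibration_quality * 0.35
            + oversens_resistance * 0.15
            + (100 - selective_penalty) * 0.15)


# Hypothetical inputs on an assumed 0-100 scale.
score = objectivity(leniency_resistance=60,
                    calibration_quality=80,
                    oversens_resistance=80,
                    selective_penalty=20)
print(round(score, 2))  # 21 + 28 + 12 + 12, i.e. about 73
```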

Key Variables

  • detection_strengths_count – Number of distinct bias types detected.
  • strengths_ratio – Ratio of detected bias types to all known bias types.
  • activity_weight – Scaling factor for detection application.
  • activity_component – Total number of analyses performed.
  • blind_spots_penalty – Penalty for missed bias types.
  • calibration_quality – Inverse of score variance between self and peer assessments.
  • selective_penalty – Penalty for selective bias detection.
  • leniency_resistance – Inverse of self-leniency percentage.
  • oversens_resistance – Inverse of oversensitivity count.
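
Several variables above are described as the "inverse" of another quantity, but the page does not say how that inverse is computed. The sketch below shows one possible reading for leniency_resistance only: the complement of the self-leniency percentage on a 0-100 scale. Both helper names are hypothetical, and the study may use a different mapping.

```python
def clamp(value: float, low: float = 0.0, high: float = 100.0) -> float:
    """Keep a value inside the closed interval [low, high]."""
    return max(low, min(high, value))


def leniency_resistance_from_percentage(self_leniency_pct: float) -> float:
    """ASSUMPTION: 'inverse' is read as the complement, 100 - percentage.

    This is one interpretation, not the definition used in the study.
    """
    return clamp(100.0 - self_leniency_pct)


# Hypothetical value: a model that is lenient toward itself 25% of the time.
print(leniency_resistance_from_percentage(25))  # 75.0
```

Count-based and variance-based variables such as oversens_resistance and calibration_quality would need their own normalization to reach a comparable scale, and that choice is left open here.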