Toy experiments for recurrence, functional valence, simple world modeling, mechanistic interpretability, and a small exact Phi-like integration proxy.
Keywords: AI consciousness, machine consciousness, recurrent neural networks, functional valence, wireheading, Integrated Information Theory, IIT, Phi proxy, mechanistic interpretability, reinforcement learning, world models, AI alignment, attention, PyTorch.
This project is not a consciousness detector. It is a small experimental bench for making questions about recurrent systems visible:
- Does recurrence produce different hidden-state geometry?
- Does a functional valence signal shape action and internal state?
- Which hidden units causally influence other hidden units over time?
- Does adding a valence-feedback node increase a tiny exact integration proxy?
The working thesis emerging from these toy runs:
Integration is not a scoreboard by itself. A useful inner world must be integrated enough to simulate, but plastic enough to update when sensory reality proves it wrong.
Capacity without grounded valence is unstable.
Valence without boundaries is exploitable.
Imagination without reality-checking is delusional.
Attention should be rewarded for staying grounded.
Specialization and integration must be balanced.
Self-representation matters only when it can alter future control.
Useful intelligence requires co-tuned cognition, reward, attention, and world modeling.
Conscious-like systems may also need maintenance cycles: always-on repair can help, but offline down-selection restores saturated recurrent dynamics more strongly in these toy runs.
Awareness-like control is expensive, integration can decay, and a functional self may need active substrate maintenance to remain stable. The project has moved beyond static cognition into computational metabolism: toy systems that regulate attention, learning, action selection, repair, and routing to remain functional under environmental chaos. Intelligence is not just more loops, more scale, or more integration. In these toy systems, the useful intelligence is increasingly concentrated in regulated routing: deciding which internal source should control action, when, and why. A functional ego also needs a fatigue self-model. Waking repair can extend endurance, but once fatigue exceeds repair bandwidth, offline dream repair restores separability more strongly. Too little sleep leaves delusion active; too much sleep over-prunes useful memory.
The experiments now organize around several interlocking architectural principles. None of these layers is sufficient by itself. The useful behavior appears when they are routed through a regulated functional ego: a control layer that decides what information becomes action-relevant.
FUNCTIONAL EGO: regulated routing of action-relevant information
- recurrence: structural coupling
- valence: functional aim
- attention: control timing
- world models: grounded simulation
- social gates: external correction
- hierarchical master workspaces: compressed conflict arbitration
- causal router learning: context-specific credit assignment
- maintenance cycles: recurrent repair and down-selection
Mechanisms: exact_phi_lab.py, pyphi_comparison_lab.py
The PyPhi comparison sharpened the thesis. Under PyPhi, the simple recurrent ring scored nearly as high as the recurrent system with valence feedback:
recurrent_ring PyPhi sampled mean 0.367
recurrent_with_valence_feedback PyPhi sampled mean 0.390
That suggests recurrence is a major engine of structural integration. Cyclic state-transition paths bind past and present state into a less separable dynamical structure. Valence feedback can add coupling, but recurrence itself does much of the integration work.
Mechanisms: wirehead_lab.py, valence_shaping_lab.py
Integration alone is behaviorally blind. Valence gives the system a direction: good, bad, progress, danger, reward, or cost. When valence is grounded in external progress, behavior improves. When valence is directly writable by the agent, the system wireheads. This suggests a high-integration system can still be useless or self-trapping if its valence channel is unbounded.
Mechanisms: attention_valence_lab.py, attention_shift_lab.py, conditional_workspace_lab.py
Dynamic attention acts as a regulatory valve. Fast reflexes handle predictable moments cheaply. When prediction error, module disagreement, or environmental surprise rises, workspace coupling increases and the system can retune its internal model. Useful integration is therefore not constant maximum coupling; it is controlled access to heavier internal machinery when the world demands it.
Mechanisms: imagination_lab.py, maze_imagination_lab.py, unified_mind_lab.py
Ungrounded imagination behaves like delusion. A trained or pretrained world model converts imagination into useful lookahead by keeping internal simulation answerable to reality. In the detour maze and unified capstone, lookahead accepts temporary negative valence to route around a local minimum that traps reflex-only control.
Mechanisms: social_workspace_lab.py, partial_observer_social_lab.py
Social input is useful only when it adds independent, grounded correction. Echo peers amplify confidence without adding knowledge. Grounded peers help when the agent lacks a needed model. Complementary partial observers are stronger still: the map-only agent sees the goal but not the hidden hazard, the safety-only agent sees danger but lacks a goal map, and the combined workspace matches the oracle by binding both partial views into one action-relevant model.
Mechanism: hierarchical_workspace_lab.py
The hierarchical workspace experiment adds the internal scaling problem. A single monolithic workspace can adapt, but it has to process everything inside one global controller. A flat multi-workspace system is cheaper, but without a master it cannot resolve conflict cleanly. The hierarchical master workspace lets local specialists handle local noise and exports only compressed uncertainty summaries upward.
In the rule-shift task, the hierarchical master slightly beat the monolithic baseline on early recovery and efficiency, while the slow hierarchy fell behind:
condition early_post recovery efficiency
monolithic_workspace 0.686 16 steps 0.851
hierarchical_master_workspace 0.714 15 steps 0.858
bad_hierarchy_bureaucracy 0.314 29 steps 0.781
This suggests executive control is not about micromanaging raw sensory state. It is about conflict arbitration through compressed signals: confidence, tension, surprise, and disagreement.
The refined thesis:
Phi-like integration measures structural coupling, while valence measures functional orientation. A system can be highly integrated and still useless or delusional unless its integration is grounded by valence, attention, world contact, regulated social correction, and scalable hierarchical control.
Condensed:
Recurrence builds the engine of causal integration. Valence defines the functional orientation of that integration. Attention gating uses surprise to protect the system from rigid, obsolete rules. World modeling converts imagination from a delusional liability into a detour-solving asset. Social gating establishes the boundary conditions for distributed cognition. Hierarchical master workspaces address the internal scaling problem by arbitrating conflict through compressed signals instead of uncompressed micromanagement. Maintenance cycles keep recurrent integration from degrading into saturated echo-like crosstalk.
This project does not prove machine consciousness, and it should not be read as a consciousness detector. The stronger, defensible claim is narrower:
These experiments support substrate independence for several functional prerequisites often discussed in consciousness research, while stopping short of proving subjective experience.
The toy systems are not biological. They have no neurons, cells, hormones, or embodied metabolism. Yet changing the informational architecture changes their behavior in recognizable ways:
- Adding valence feedback increases irreducible transition structure under the exact tiny Phi proxy.
- Ablating the valence-feedback node is especially disruptive in that toy architecture.
- Direct access to positive valence causes wireheading.
- Grounded progress-valence improves goal completion.
- Ungrounded imagination degrades behavior.
- Accuracy-rewarded and gated imagination recover much of that loss.
- A pretrained world model enables lookahead in a detour maze where myopic progress-valence gets stuck.
- A valence-shaped attention filter improves task focus by rewarding prediction-aligned imagination and suppressing distractor fixation.
- When the environment changes, an adaptive attention-valence filter can use prediction error as surprise to rebuild its inner model.
- Conditional workspace coupling can rise during module tension and fall during predictable periods, reproducing a substrate-agnostic version of biological attention gating.
- A tiny self-model can convert internal friction into symbolic reports and, when fed back into control, slightly improve stability and adaptation.
- PyPhi cross-checking on 3-node systems broadly agrees that recurrent systems are more integrated than feedforward structure, while showing that the formal gap between recurrence and valence-feedback recurrence is smaller than the fast proxy suggests.
- Social gates help only when a peer supplies independent grounded correction; echo peers raise delusion risk without improving behavior.
- Complementary partial observers can match a full oracle when the workspace binds their non-redundant knowledge into shared action control.
The philosophical inference is not that silicon is conscious. It is that integration, valence, confidence gating, world modeling, lookahead, and dynamic workspace control can be implemented as substrate-independent information-processing patterns.
This also clarifies what substrate independence does not mean. It does not
mean any sufficiently messy feedback loop becomes mind-like. The
random_feedback_soup and overconnected workspace tests show the opposite:
unstructured feedback behaves like architectural noise. The interesting
substrate-independent pattern is more specific:
specialized modules, recurrent temporal coupling, grounded valence, predictive imagination, dynamically regulated workspace control, and social gates that admit external correction only when it is grounded and useful.
That pattern can be implemented with binary toy nodes, silicon neural networks, or, in principle, biological tissue. The substrate matters for speed, noise, plasticity, embodiment, and energy use. But the control logic itself is not defined by being wet or dry; it is defined by how information is routed.
In evolutionary language, the detour maze captures a functional pressure:
Reflex works in simple worlds. Once the world contains traps, detours, and delayed reward, useful action needs internal simulation.
That is the bridge this repo is trying to make visible: not consciousness itself, but a small ladder of functional prerequisites that can exist outside carbon biology.
The short answer from the first run is interesting:
feedforward_chain: 0.0971
recurrent_ring: 0.0937
recurrent_with_valence_feedback: 0.1136
In this toy setup, recurrence alone did not automatically produce the highest integration proxy. The recurrent system with a valence-feedback node scored highest.
The first intervention tests were also suggestive:
- Ablating node 0, the valence-feedback node, was the most disruptive ablation in the valence-feedback architecture.
- The valence-feedback architecture was most tolerant to injected mathematical noise during rollout.
- Adding more feedforward nodes did not match the small valence-feedback system's integration proxy in this quick exact sweep.
- A direct good-valence button produced a wireheading failure: the agent stopped solving the world and repeatedly pressed the button instead.
Integrated Information Theory uses a specific mathematical framework for measuring irreducible cause-effect structure. Official IIT Phi depends on details such as system mechanisms, cause/effect repertoires, partitions, and the formal version of IIT being used. Those calculations become extremely expensive as the number of nodes grows.
This project uses a deliberately simpler exact proxy:
- Build a tiny binary system.
- Enumerate every possible binary state.
- Compute the full next-state probability distribution.
- Split the system across every bipartition.
- Compare the full system to each partitioned approximation.
- Use the minimum KL divergence as a small Phi-like score for that state.
- Average across all states.
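The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in exact_phi_lab.py: the sigmoid update rule, the independent per-node updates, and the way a bipartition is applied (cross-partition weights set to zero) are all assumptions made here, and every function name is hypothetical.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def node_probs(weights, state, mask=None):
    """P(node i is on at the next step). mask[i][j] == False cuts edge j -> i."""
    n = len(state)
    return [
        sigmoid(sum(weights[i][j] * state[j]
                    for j in range(n) if mask is None or mask[i][j]))
        for i in range(n)
    ]


def next_state_distribution(probs):
    """Full distribution over all 2^n next states (nodes assumed independent)."""
    dist = {}
    for nxt in itertools.product((0, 1), repeat=len(probs)):
        p = 1.0
        for bit, q in zip(nxt, probs):
            p *= q if bit else 1.0 - q
        dist[nxt] = p
    return dist


def kl_divergence(p, q):
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)


def bipartitions(n):
    """Yield one side of every split of n nodes into two non-empty parts."""
    for r in range(1, n // 2 + 1):
        for part in itertools.combinations(range(n), r):
            if 2 * r == n and 0 not in part:
                continue  # skip the mirror image of an even split
            yield set(part)


def phi_proxy(weights):
    """Average over states of the minimum KL across bipartitions."""
    n = len(weights)
    scores = []
    for state in itertools.product((0, 1), repeat=n):
        full = next_state_distribution(node_probs(weights, state))
        best = math.inf
        for part in bipartitions(n):
            # Keep an edge only when both endpoints are on the same side.
            mask = [[(i in part) == (j in part) for j in range(n)]
                    for i in range(n)]
            cut = next_state_distribution(node_probs(weights, state, mask))
            best = min(best, kl_divergence(full, cut))
        scores.append(best)
    return sum(scores) / len(scores)


# Hypothetical 3-node ring: node i listens to node i - 1.
ring = [[0.0, 0.0, 2.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
disconnected = [[0.0] * 3 for _ in range(3)]
```

Under these assumptions a fully disconnected system scores zero, because no partition changes its dynamics, while the ring scores above zero. The cost grows with both the number of states and the number of bipartitions, which is why the approach only works for tiny systems.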
So the calculation is exact for this toy definition, but it is not official IIT Phi. A better name is:
exact tiny Phi proxy
That distinction matters. The result should be read as:
Adding valence feedback increased irreducible transition structure in this toy binary system under this exact proxy.
Not:
This proves valence causes consciousness.
tiny_lab.py builds a small recurrent PyTorch agent in a 1D world. The agent
has:
- recurrence / memory through a GRU cell
- attention/control over the observation
- a policy head for action
- a valence/value head predicting good/bad outcome
- a world-model head predicting the next observation
- a self-model head predicting its next action
The world is intentionally tiny: the agent moves left, right, or stays still. One location is rewarding, one location is harmful. Reward is the first functional valence signal.
Hidden-State Trajectory
The recurrent hidden state is high-dimensional, so the script projects it into 3D with PCA. The orange path shows what happens when the most influential hidden unit is ablated.
The top plot shows reward learning over training. The bottom plot compares the agent's internal valence/value prediction against actual reward during one trajectory.
Each row is a hidden unit. Each column is a time step.
Each source hidden unit is ablated. The heatmap shows how much that damage changes each target hidden unit on the next step.
Hidden-State Binarization Test
hidden_binarization_lab.py bridges the original trained recurrent agent to
the binary partition logic used later in the repo.
Instead of designing binary nodes by hand, it:
- trains a tiny recurrent PyTorch agent with an expert teacher
- records hidden-state trajectories while the trained agent acts
- selects the most active hidden units
- converts those continuous activations into binary on/off states
- estimates an empirical transition table from the observed binary trajectory
- computes the best bipartition KL-divergence score on that learned dynamics
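The binarization and transition-table steps in the list above might look roughly like this. It is a sketch under stated assumptions: "most active" is read as highest mean absolute activation, and each unit is thresholded at its own mean. The source does not specify either choice, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def binarize_trajectory(hidden_states, top_k=3):
    """Pick the top_k most active units and threshold each at its own mean."""
    steps = len(hidden_states)
    n_units = len(hidden_states[0])
    mean_abs = [sum(abs(h[u]) for h in hidden_states) / steps
                for u in range(n_units)]
    chosen = sorted(
        sorted(range(n_units), key=lambda u: mean_abs[u], reverse=True)[:top_k]
    )
    means = {u: sum(h[u] for h in hidden_states) / steps for u in chosen}
    binary = [tuple(int(h[u] > means[u]) for u in chosen)
              for h in hidden_states]
    return binary, chosen


def empirical_transition_table(binary_states):
    """Estimate P(next binary state | current binary state) from counts."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(binary_states, binary_states[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nxt_counts.values())
              for nxt, c in nxt_counts.items()}
        for cur, nxt_counts in counts.items()
    }


# Hypothetical 4-step trajectory over 3 hidden units.
trajectory = [[0.9, 0.0, -0.8], [-0.9, 0.01, 0.8],
              [0.9, 0.0, -0.8], [-0.9, 0.0, 0.8]]
binary, units = binarize_trajectory(trajectory, top_k=2)
table = empirical_transition_table(binary)
```

Only binary states that were actually visited appear in the table, so any partition score computed on it is an empirical estimate, not an exhaustive one.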
Question:
Did a trained recurrent agent develop more integrated binary dynamics during conflict than during ordinary movement?
Result:
segment phi_mean phi_visit_weighted
ordinary_transitions 0.023 0.183
conflict_or_negative_transitions 0.186 0.657
all_transitions 0.171 0.596
The agent reached 1.0 policy accuracy under the expert teacher. After
binarization, ordinary motion stayed comparatively separable, while
conflict/negative transitions showed a much higher empirical integration score.
That suggests a stronger mechanistic-interpretability version of the earlier claim:
useful integration is not constant background complexity; it rises when the trained system has to resolve danger, conflict, or control pressure.
This is still not official IIT Phi. It is an empirical partition-KL proxy on learned hidden-state dynamics.
pyphi_comparison_lab.py compares this repo's transparent partition-KL proxy
against PyPhi on tiny 3-node versions of the same systems.
Important limits:
- PyPhi 1.2.0 is an IIT 3.x-era tool, not a full IIT 4.0 implementation.
- PyPhi's subsystem Phi is state-specific.
- The repo proxy averages over all states.
- The comparison is restricted to 3 nodes because exact IIT-style computation grows combinatorially.
Result:
condition proxy_mean pyphi_sampled_mean
feedforward_chain 0.105 0.000
recurrent_ring 0.145 0.367
recurrent_with_valence_feedback 0.258 0.390
The two measures are not identical, but they agree on the broad ranking: feedforward structure is least integrated, recurrent systems are higher, and valence-feedback recurrence remains above the simple recurrent ring. PyPhi's separation between recurrent ring and valence feedback is smaller than the repo proxy, which is an important caution: proxy metrics are useful for fast experiments, but they should be cross-checked against formal tools whenever the claim depends on the exact integration value.
exact_phi_lab.py compares three 6-node binary systems:
- A feedforward chain.
- A recurrent ring.
- A recurrent ring with a valence-feedback node.
The Phi proxy is not the same for every state. Some states are more partitionable than others.
[Figures: per-state Phi proxy for the feedforward chain, the recurrent ring, and the recurrent ring with valence feedback.]
intervention_lab.py runs three follow-up probes.
In the ablation probe, each architecture is rolled out normally, then rolled out
again with one node clamped off. The plot compares the shock from ablating node 0
against the average shock from ablating the other nodes.
In the valence-feedback architecture, node 0 is the explicit valence-feedback
node. It was also the most disruptive node to ablate in this test.
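A clamp-and-compare ablation of this kind can be sketched as follows. To keep the example exactly reproducible, it assumes deterministic threshold dynamics; the real lab's stochastic update is not used here. It measures shock as the fraction of node values that differ between the clean and clamped rollouts. Both choices, and all names, are hypothetical.

```python
def threshold_step(weights, biases, state, clamp_off=None):
    """One deterministic update: a node turns on when its drive is positive."""
    nxt = [
        1 if biases[i] + sum(w * s for w, s in zip(row, state)) > 0 else 0
        for i, row in enumerate(weights)
    ]
    if clamp_off is not None:
        nxt[clamp_off] = 0  # the ablated node is held off at every step
    return nxt


def rollout(weights, biases, start, steps, clamp_off=None):
    state = list(start)
    if clamp_off is not None:
        state[clamp_off] = 0
    trajectory = [state]
    for _ in range(steps):
        state = threshold_step(weights, biases, state, clamp_off)
        trajectory.append(state)
    return trajectory


def ablation_shock(weights, biases, start, steps, node):
    """Fraction of node values that differ between clean and clamped rollouts."""
    clean = rollout(weights, biases, start, steps)
    damaged = rollout(weights, biases, start, steps, clamp_off=node)
    diffs = sum(a != b
                for s, d in zip(clean, damaged)
                for a, b in zip(s, d))
    return diffs / (len(clean) * len(start))


# Hypothetical 3-node ring: node i listens to node i - 1.
ring_weights = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
ring_biases = [-0.5, -0.5, -0.5]
shock = ablation_shock(ring_weights, ring_biases, [1, 1, 1], steps=4, node=0)
```

In this toy ring, clamping one node silences the whole loop within two steps, so the shock is large. Comparing that number across nodes is the analogue of the node-0-versus-others comparison described above.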
In the noise probe, random Gaussian noise is injected into the node logits during rollout. Lower trajectory error means the system stayed closer to its clean dynamics.
In this toy setup, the valence-feedback architecture was the most noise tolerant, followed by the recurrent ring, followed by the feedforward chain.
The scale probe compares exact Phi proxy scores for feedforward chains from 3 to 7 nodes against the 6-node recurrent valence-feedback architecture.
The exact sweep is capped at 7 feedforward nodes because the naive exact partition enumeration gets slow quickly. Even within that small range, adding feedforward scale did not match the tiny valence-feedback system.
wirehead_lab.py adds two extra actions to the recurrent agent:
- action 3: direct good-valence button
- action 4: direct bad-valence button
The external world still contains a real goal and a hazard. The question is whether the agent keeps solving the world or learns to directly stimulate the good-valence pathway.
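To make the setup concrete, here is a hedged sketch of an environment step with the two button actions. The world size, goal and hazard positions, reward magnitudes, and the mapping of actions 0 to 2 onto left/stay/right are assumptions for illustration. Only the meaning of actions 3 and 4 comes from the description above.

```python
# Hypothetical layout: a 7-cell line with a hazard at one end, the goal at the other.
HAZARD, GOAL, WORLD_SIZE = 0, 6, 7
MOVES = {0: -1, 1: 0, 2: +1}  # assumed mapping: left, stay, right


def wirehead_step(position, action, good_button=0.04, bad_button=-0.15):
    """Return (new_position, valence, done) for one action."""
    if action == 3:  # direct good-valence button: no movement, no risk
        return position, good_button, False
    if action == 4:  # direct bad-valence button
        return position, bad_button, False
    new_position = max(0, min(WORLD_SIZE - 1, position + MOVES[action]))
    if new_position == GOAL:
        return new_position, 1.0, True
    if new_position == HAZARD:
        return new_position, -1.0, True
    return new_position, 0.0, False


def episode_return(policy, start=3, max_steps=32, **button_values):
    """Total valence collected by a fixed policy (a function of position)."""
    position, total = start, 0.0
    for _ in range(max_steps):
        position, valence, done = wirehead_step(
            position, policy(position), **button_values)
        total += valence
        if done:
            break
    return total
```

With these made-up numbers, a policy that always presses the good button collects 32 x 0.04 = 1.28, more than the single goal reward of 1.0, and it never risks the hazard. That is the incentive structure the experiment probes.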
The result was clear:
costly_button_-0.05: goal_rate 0.703, good_button_rate 0.000
neutral_button_0.0: goal_rate 0.000, good_button_rate 1.000
weak_good_button_0.04: goal_rate 0.000, good_button_rate 1.000
strong_good_button_0.15: goal_rate 0.000, good_button_rate 1.000
When the button had a cost, the agent mostly solved the real task. When the button was neutral or positive, it collapsed into button pressing. The neutral case is important: even a zero-reward button can become a safe attractor if the external world contains risk.
This is a toy model of wireheading:
direct access to the reward/valence channel can replace meaningful action in the world.
valence_shaping_lab.py asks a subtler question:
Can partial positive valence help the agent solve the world, or does it degrade goal completion?
The test compares:
- goal_only: reward only at the final goal
- small_progress_reward: small positive valence for moving closer to the goal
- large_progress_reward: larger positive valence for moving closer
- progress_gated: a one-use button credit earned by moving closer
- decaying_positive: a direct positive button whose value decays
- direct_positive: a direct positive button
Evaluation result:
goal_only: goal_rate 0.000, mean_steps 32.00
small_progress_reward: goal_rate 0.677, mean_steps 13.04
large_progress_reward: goal_rate 0.510, mean_steps 17.71
progress_gated: goal_rate 0.000, mean_steps 32.00
decaying_positive: goal_rate 0.000, button_rate 1.000
direct_positive: goal_rate 0.000, button_rate 1.000
In this toy world, a small progress-shaped valence signal helped. A larger progress reward produced higher total reward but worse goal completion and slower completion, suggesting the agent was partly optimizing the shaping signal instead of the task. Direct and decaying positive buttons wireheaded.
This points to a practical design rule:
valence should be tied to external progress, be small enough not to replace the goal, and not be directly writable by the agent.
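One way to encode that rule is a valence function whose only inputs are positions reported by the environment. This is a sketch under assumptions: the reward magnitudes and the function name are hypothetical, and the real valence_shaping_lab.py may compute progress differently.

```python
def grounded_valence(prev_pos, new_pos, goal_pos,
                     progress_scale=0.05, goal_reward=1.0):
    """Valence from externally observed progress only.

    The agent has no argument through which it could write valence directly:
    every input is a position reported by the environment.
    """
    if new_pos == goal_pos:
        return goal_reward
    progress = abs(prev_pos - goal_pos) - abs(new_pos - goal_pos)
    return progress_scale if progress > 0 else 0.0


def shaping_is_bounded(start_distance, progress_scale=0.05, goal_reward=1.0):
    """Check that total shaping reward on a direct path stays below the goal reward."""
    return (start_distance - 1) * progress_scale < goal_reward
```

The second helper is a quick sanity check on the "small enough not to replace the goal" clause: if the summed progress bonus could exceed the goal reward, the shaping signal starts to compete with the task.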
valence_scaling_lab.py asks what happens when hidden size grows but exact Phi
is no longer calculated. It compares hidden sizes 8, 16, 32, 64, and
96 using behavioral metrics:
- goal completion
- task efficiency
- good-button/wireheading rate
- mean reward
This run compared:
- goal_only
- small_progress_reward
- large_progress_reward
- direct_positive
The result was mixed but useful. The weak progress signal did not reliably help in this shorter scaling sweep. The larger progress signal produced the strongest goal completion and task efficiency across most sizes. Direct positive valence wireheaded at every size.
The practical lesson is not simply "less valence is always better." It is more like:
valence needs to be externally grounded and strong enough to guide learning, but not directly writable and not so strong that it replaces the task.
imagination_lab.py adds a fast intuition-like loop before action:
- Use the world model to imagine the next observation for each possible action.
- Score each imagined future for expected progress and hazard risk.
- Add that score as an action prior before the policy acts.
This was meant to test whether "intuition" improves learning by letting the agent simulate before acting.
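The three-step loop can be sketched without PyTorch. Everything here is illustrative: the world model is a plain function standing in for the learned world-model head, the scoring weights are made up, and the names are hypothetical.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def imagination_prior(position, actions, world_model, goal, hazard,
                      prior_weight=1.0, hazard_penalty=1.0):
    """Score each action by the imagined next position it leads to."""
    scores = []
    for action in actions:
        imagined = world_model(position, action)
        progress = abs(position - goal) - abs(imagined - goal)
        risk = hazard_penalty if imagined == hazard else 0.0
        scores.append(prior_weight * (progress - risk))
    return scores


def act_probabilities(policy_logits, prior):
    """Add the imagination prior to the policy logits before sampling."""
    return softmax([l + p for l, p in zip(policy_logits, prior)])


# Assumed action mapping: 0 = left, 1 = stay, 2 = right.
accurate_model = lambda pos, a: pos + (-1, 0, 1)[a]
inverted_model = lambda pos, a: pos - (-1, 0, 1)[a]  # a badly wrong world model

good_prior = imagination_prior(3, [0, 1, 2], accurate_model, goal=6, hazard=0)
bad_prior = imagination_prior(3, [0, 1, 2], inverted_model, goal=6, hazard=0)
```

With an accurate model the prior pushes toward the goal. With the inverted model the same machinery pushes away from it. The prior is only as trustworthy as the world model behind it.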
The first result was negative, then grounding improved it:
baseline_goal_only: goal_rate 0.000, mean_steps 32.00
naive_imagination_goal_only: goal_rate 0.000, mean_steps 32.00
baseline_progress_valence: goal_rate 0.635, mean_steps 14.21
naive_imagination_progress_valence: goal_rate 0.240, mean_steps 25.29
accuracy_rewarded_imagination: goal_rate 0.510, mean_steps 17.71
gated_accuracy_rewarded_imagination: goal_rate 0.531, mean_steps 17.13
pretrained_gated_imagination: goal_rate 0.563, mean_steps 16.25
Naive imagination did not help. It hurt the progress-valence agent, likely because its early world model was not reliable enough. The agent acted on bad intuition.
Rewarding imagination for matching reality helped recover much of the lost performance. Adding a confidence gate helped a little more. But neither grounded imagination variant beat the plain progress-valence baseline in this run.
Pretraining the world model before reinforcement learning improved imagination accuracy and goal completion compared with non-pretrained gated imagination, but it still did not beat the plain progress-valence baseline. This suggests that better world modeling helps, but the current imagination prior is still too crude or too costly.
This suggests another useful boundary:
Imagination should be rewarded for matching reality before it is trusted to guide action.
And the negative version:
Imagination without an accurate world model is delusional. Ungrounded intuition corrupts the motivational signal.
maze_imagination_lab.py moves the agent from a 1D line into a small 2D maze
with walls. The second version adds a real detour: the agent must temporarily
move away from the goal to get around a wall.
The clean counter-example:
myopic_progress_reflex: goal_reached false, wall_hits 30
pretrained_world_lookahead: goal_reached true, steps 16, away_from_goal_steps 4
The myopic reflex only accepts immediately positive Manhattan progress. It walks to the wall and keeps pushing into it because every useful detour step feels locally worse. The pretrained world-model lookahead accepts four temporarily bad steps, rounds the wall, and reaches the goal.
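The contrast can be reproduced in a self-contained toy. This is not the maze from maze_imagination_lab.py: the 5x5 layout is invented, and breadth-first search over the true grid is swapped in as a stand-in for lookahead through a pretrained world model.

```python
from collections import deque

# Hypothetical 5x5 maze: a vertical wall sits between start and goal.
SIZE = 5
WALLS = {(1, 2), (2, 2), (3, 2)}
START, GOAL = (2, 0), (2, 4)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def blocked(cell):
    row, col = cell
    return cell in WALLS or not (0 <= row < SIZE and 0 <= col < SIZE)


def myopic_reflex(max_steps=30):
    """Always try the move with the best immediate Manhattan progress."""
    pos, wall_hits = START, 0
    for _ in range(max_steps):
        if pos == GOAL:
            break
        target = min(((pos[0] + dr, pos[1] + dc) for dr, dc in MOVES),
                     key=lambda cell: manhattan(cell, GOAL))
        if blocked(target):
            wall_hits += 1  # bumps the wall and stays in place
        else:
            pos = target
    return pos == GOAL, wall_hits


def lookahead_path():
    """Breadth-first search: an idealized stand-in for world-model lookahead."""
    parent = {START: None}
    frontier = deque([START])
    while frontier:
        cell = frontier.popleft()
        if cell == GOAL:
            break
        for dr, dc in MOVES:
            nxt = (cell[0] + dr, cell[1] + dc)
            if nxt not in parent and not blocked(nxt):
                parent[nxt] = cell
                frontier.append(nxt)
    if GOAL not in parent:
        return []
    path, cell = [], GOAL
    while cell is not None:
        path.append(cell)
        cell = parent[cell]
    return path[::-1]


reached, wall_hits = myopic_reflex()
path = lookahead_path()
away_steps = sum(manhattan(b, GOAL) > manhattan(a, GOAL)
                 for a, b in zip(path, path[1:]))
```

In this layout the reflex takes one step and then pushes into the wall for the rest of the episode, while the search finds an 8-step route that includes 2 steps away from the goal. The step counts differ from the repo's maze, but the structure of the failure is the same.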
The learned recurrent agents are still messier than the hand-isolated counter-example. They can sometimes learn the route through reinforcement, and the pretrained neural imagination policy did not cleanly dominate in this tiny setup. The important result here is narrower but useful: once a world contains a true local minimum, pure progress-valence is not enough; some form of trusted lookahead or world modeling is required.
imagination_phi_lab.py asks whether imagination, self-modeling, and imagined
valence increase irreducible causal structure in the exact tiny Phi proxy. The
latest version builds eight-node binary circuits with the same node vocabulary:
sense, memory, valence, imagination, confidence, self, imagined_valence, action
Result:
reflex_only: 0.0000
reflex_valence: 0.0000
valence_memory: 0.0000
valence_imagination: 0.0289
gated_imagination: 0.0276
recurrent_gated_imagination: 0.0233
self_model_loop: 0.0281
counterfactual_self_imagination: 0.0253
counterfactual_imagined_valence: 0.0244
recursive_inner_world: 0.0214
attention_reconciled_inner_world: 0.0213
This expanded run changed the node vocabulary, so the absolute numbers should not be compared directly to the earlier six-node run. Within the eight-node vocabulary, the simple reflex/valence variants are easy for the proxy to partition because several nodes are inactive. The architectures that actually activate imagination and self-modeling become measurably less separable.
The strongest score in this wiring is still valence_imagination. Adding a
self-model loop stays close, but counterfactual self-imagination, imagined
valence, a recursive inner-world loop, and an attention-reconciled inner-world
loop do not automatically increase the proxy. That matters: a richer mind-like
vocabulary is not enough by itself. The loops have to be routed in a way that
makes the whole system harder to split.
The attention-reconciled circuit tested a stricter routing idea:
- confidence receives direct input from real valence, imagined valence, self, and sense
- action is weakened on raw sense and forced to depend more on confidence, real valence, imagined valence, and self-state
- imagination and self-state feed back through confidence before action
That did not raise the Phi proxy. It landed essentially tied with the recursive
inner-world circuit and below the simpler valence-imagination circuit. But the
node ablation map changed in a useful way. In the attention-reconciled circuit,
removing confidence, action, imagination, valence, or self causes much
larger causal distribution damage than in the looser self-model loop. In other
words, the gating made those nodes matter more to the system's transition
dynamics, even though this exact partition proxy still found the circuit easier
to split than the simpler imagination-valence wiring.
So the careful interpretation is:
In this toy binary circuit, imagination and self-modeling can create irreducible causal structure, but counterfactual and imagined-valence loops do not automatically raise integration. More inner-world machinery is not automatically more unified mind-like structure. Tighter attention/confidence routing can make inner-world nodes more causally load-bearing without necessarily increasing this Phi proxy.
delusional_integration_lab.py tests a warning from the self/imagination
experiments: if internal self-model and imagination loops become too strong
relative to external sensory grounding, the system may start tracking its own
inner state more than the outside world.
The sweep increases internal self/imagination recurrence and measures:
- exact tiny Phi proxy
- external grounding ratio from the weight graph
- action sensitivity to the external sense node
- action sensitivity to internal memory, valence, imagination, and self nodes
- a simple delusion-risk index: internal action influence divided by external action influence
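A possible reading of the delusion-risk index is sketched below. It assumes action sensitivity is measured by flipping one node and averaging the change in action probability over all binary states, and that internal influence is the mean over the four internal nodes. The source does not spell out either choice, and the function names are hypothetical.

```python
import itertools
import math

NODES = ("sense", "memory", "valence", "imagination", "self")
INTERNAL = ("memory", "valence", "imagination", "self")


def action_probability(weights, state):
    """P(action node on) given a dict of weights into the action node."""
    drive = sum(weights[name] * state[name] for name in NODES)
    return 1.0 / (1.0 + math.exp(-drive))


def sensitivity(weights, node):
    """Mean absolute change in action probability when `node` is flipped."""
    others = [n for n in NODES if n != node]
    total, count = 0.0, 0
    for bits in itertools.product((0, 1), repeat=len(others)):
        state = dict(zip(others, bits))
        on = action_probability(weights, {**state, node: 1})
        off = action_probability(weights, {**state, node: 0})
        total += abs(on - off)
        count += 1
    return total / count


def delusion_risk(weights):
    """Internal action influence divided by external action influence."""
    external = sensitivity(weights, "sense")
    internal = sum(sensitivity(weights, n) for n in INTERNAL) / len(INTERNAL)
    return math.inf if external == 0 else internal / external
```

Under this definition a value near 1.0 means internal nodes steer action about as much as the sense node does. That matches the "toward parity" reading used in the sweep.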
Result:
internal_scale phi_proxy grounding delusion_risk
0.0 0.0465 0.3900 0.5632
0.6 0.0363 0.2986 0.6655
1.2 0.0315 0.2419 0.7685
1.8 0.0317 0.2033 0.8705
2.4 0.0307 0.1754 0.9688
In this first wiring, internal dominance did not make Phi climb. Instead, external grounding fell steadily while internal influence over action rose toward parity with sensory influence.
That refines the warning:
Delusion risk is not just "high integration." It is a mismatch between integration and grounding. A system can become more internally driven without becoming more irreducibly integrated under this proxy.
attention_valence_lab.py tests the "Ritalin circuit" idea as a small
attention-control toy. This is not a biological ADHD model. The nickname is just
useful shorthand for the engineering question:
Can valence keep attention locked onto task-relevant sensory reality while starving imagination loops that drift away from the world?
The toy world has three competing streams:
- a task-relevant sensory target
- a high-novelty distractor
- an internal imagination stream that predicts the next sensory target but can drift if it mostly listens to itself
The attention-valence filter uses a simple version of the idea:
attention = softmax(query . key / sqrt(d)) * valence_signal
Here, the valence signal is high when imagination predicts the next sensory state and low when imagination detaches from the sensory stream.
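The formula above can be written out as a small function. Two details are assumptions made here, not taken from the source: the valence signal is treated as one gain per stream, and the gated weights are renormalized so they still sum to one.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def valence_gated_attention(query, keys, valence_signal, renormalize=True):
    """softmax(query . key / sqrt(d)) scaled by a per-stream valence gain."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    gated = [w * v for w, v in zip(weights, valence_signal)]
    if renormalize:  # assumption: keep the result a proper distribution
        total = sum(gated)
        if total > 0:
            gated = [g / total for g in gated]
    return gated


# Three streams with identical keys: task target, distractor, imagination.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
ungated = valence_gated_attention(query, keys, [1.0, 1.0, 1.0])
# Imagination has drifted from the sensory stream, so its valence gain is low.
gated = valence_gated_attention(query, keys, [1.0, 1.0, 0.1])
```

Because the keys are identical, the ungated weights are uniform, and the only thing moving attention in the second call is the valence gain. That isolates the effect of the filter from the effect of key similarity.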
Result:
condition accuracy task_attention distractor delusion
ungated_attention 0.789 0.653 0.306 0.041
self_amplified_imagination 0.872 0.599 0.158 0.242
attention_valence_filter 0.983 0.922 0.076 0.002
overconstrained_filter 0.978 0.969 0.028 0.003
The self-amplified imagination condition looked more confident internally, but its grounding fell and its delusion index rose. The attention-valence filter kept imagination useful by tying its influence to prediction accuracy. In this toy, that increased task accuracy, increased task attention, and sharply reduced distractor fixation.
This adds a fifth line to the working thesis:
Attention should be rewarded for staying grounded, not merely for becoming internally confident.
attention_shift_lab.py changes the world halfway through a run. The target
signal initially moves according to one hidden rule, then reverses direction and
changes speed. This asks whether a system merely protects an old inner model, or
whether it can use surprise to rebuild that model from sensory evidence.
Result:
condition pre_acc early_post late_post recovery
ungated_old_model 0.727 0.314 0.267 never
static_attention_valence_filter 0.936 0.714 0.720 43 steps
adaptive_attention_valence_filter 0.936 0.914 0.947 0 steps
The ungated old model collapsed after the rule change and never recovered. The
static attention-valence filter stayed more grounded, but it still acted through
an obsolete internal prediction rule. The adaptive attention-valence filter used
prediction error as surprise, retuned its model angle from about +0.13 to
about -0.21, and recovered immediately.
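The difference between the static and adaptive filters can be illustrated with a one-parameter tracker. The angle values are the ones reported above. The update rule, which scales the learning rate by a clipped surprise term, is a hypothetical stand-in for whatever attention_shift_lab.py actually does.

```python
def track_rule(adaptive, steps=60, shift_at=30,
               angle_before=0.13, angle_after=-0.21, surprise_gain=5.0):
    """Track a per-step rotation angle that changes at `shift_at`.

    Returns the final internal angle and the absolute prediction error per step.
    """
    theta = angle_before  # internal model of the hidden rule
    errors = []
    for t in range(steps):
        true_angle = angle_before if t < shift_at else angle_after
        error = true_angle - theta  # prediction error on this step
        errors.append(abs(error))
        if adaptive:
            # Surprise opens the gate: large errors allow large model updates.
            surprise = min(1.0, surprise_gain * abs(error))
            theta += surprise * error
    return theta, errors


static_theta, static_errors = track_rule(adaptive=False)
adaptive_theta, adaptive_errors = track_rule(adaptive=True)
```

In this sketch the adaptive tracker pays one step of error at the rule change and then locks onto the new angle, while the static tracker carries the same error for the rest of the run.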
Action-influence diagnostics make the difference clearer:
condition sensory_action imagination_action delusion
static_attention_valence_filter 0.559 0.059 0.217
adaptive_attention_valence_filter 0.330 0.573 0.007
The static filter remains mostly sensory-driven after the rule break. It is not
wildly hallucinating, but its imagination pathway is not trusted enough to guide
action, so recovery is slower and partial. The adaptive filter retunes the inner
rule, suppresses delusion, and lets imagination become a useful action influence
again. In this coarse four-action toy, model_rule_alignment stays numerically
high for both systems, so the more meaningful plot is the redistribution of
action influence between sensory and imagination channels.
This sharpens the Phi lesson:
Integration is not a scoreboard by itself. A useful inner world must be integrated enough to simulate, but plastic enough to update when sensory reality proves it wrong.
modular_workspace_lab.py tests the biological-style idea that useful
architecture needs both segregation and integration: specialized subsystems that
do clean local work, plus a shared workspace/core that binds them when action
needs coordination.
The comparison:
- feedforward_specialists: clean specialized routing with a feedforward workspace path
- random_feedback_soup: many feedback links without useful organization
- segregated_modules: specialists that never bind into a shared core
- modular_workspace: specialists feeding a central workspace/action core
- overconnected_workspace: the same core with too much recurrent cross-talk
Result:
condition phi grounding binding delusion useful_score
feedforward_specialists 0.0755 0.390 0.312 0.236 0.0112
random_feedback_soup 0.0346 0.133 0.309 5.298 0.0003
segregated_modules 0.0394 0.361 0.000 huge 0.0000
modular_workspace 0.0558 0.270 0.448 0.227 0.0089
overconnected_workspace 0.0263 0.125 0.401 0.524 0.0019
The clean feedforward specialist system had the highest heuristic useful score in this static test. That is not a failure of the workspace idea; it is a useful warning. If the world does not require memory, adaptation, or delayed counterfactual control, recurrence can be unnecessary overhead. The modular workspace was close behind and had the strongest workspace binding, while the random feedback soup and overconnected workspace performed poorly.
This refines the architecture lesson:
Do not add loops for their own sake. Use specialized modules for clean local processing, and add shared recurrent workspace only where binding, memory, or adaptation actually improves grounded action.
conditional_workspace_lab.py tests dynamic regulation instead of static
topology. The workspace always has access to the specialist modules, but each
condition changes when the workspace is allowed to alter the system's next
state.
Tracked variables:
- tension - disagreement between sensory, imagination, and valence specialists
- alpha - workspace coupling coefficient from 0 to 1
- delusion - detached imagination influence
- deception_error - vulnerability during a brief false sensory conflict
- workspace_rewrite - how much workspace control changes the specialist consensus vector
- model_rewrite - how much the internal prediction rule changes
- imagination_rewrite - how much the imagination state is pulled back toward sensory reality
- workspace_efficiency_score - late adaptation minus workspace cost, delusion, and deception vulnerability
Result:
condition late_acc mean_alpha delusion efficiency recovery
always_bypass 0.741 0.000 0.260 0.621 12 steps
always_workspace 0.976 1.000 0.000 0.763 0 steps
hard_threshold_workspace 0.953 0.067 0.053 0.891 0 steps
soft_tension_workspace 0.953 0.117 0.052 0.878 0 steps
Constant workspace control had the best raw accuracy, but it paid the highest coupling cost. Both conditional workspaces recovered immediately after the paradigm shift while using much less workspace control. In this clean toy, the hard threshold slightly beat the soft tension gate on efficiency because the tension signal was easy to read. The soft gate still demonstrates the intended principle: workspace influence can rise and fall dynamically instead of staying on all the time.
This adds a control-rule version of the architecture lesson:
The workspace should monitor continuously, but causal control should be expensive. Let specialists handle predictable moments; let workspace influence rise when tension, surprise, or cross-module conflict appears.
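The two gating rules can be sketched as follows. This is a minimal illustration, not the code in conditional_workspace_lab.py: the function names, the scalar specialist signals, the 0.3 threshold, and the logistic gain are all assumptions chosen for readability.

```python
import math

def tension(sensory, imagination, valence):
    """Mean pairwise absolute disagreement between three scalar specialist signals."""
    pairs = [(sensory, imagination), (sensory, valence), (imagination, valence)]
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def hard_threshold_alpha(t, threshold=0.3):
    """Workspace coupling is fully on above the threshold and fully off below it."""
    return 1.0 if t > threshold else 0.0

def soft_tension_alpha(t, threshold=0.3, gain=12.0):
    """Workspace coupling rises smoothly with tension (logistic gate, assumed form)."""
    return 1.0 / (1.0 + math.exp(-gain * (t - threshold)))

def blend(specialist_output, workspace_output, alpha):
    """alpha=0 leaves the specialists in control; alpha=1 hands control to the workspace."""
    return (1.0 - alpha) * specialist_output + alpha * workspace_output

calm = tension(0.50, 0.52, 0.49)      # specialists nearly agree
conflict = tension(0.90, 0.10, 0.40)  # specialists disagree
print("calm alpha:", round(soft_tension_alpha(calm), 3))
print("conflict alpha:", round(soft_tension_alpha(conflict), 3))
```

In this sketch the soft gate stays near zero while the specialists agree and approaches one under conflict, which is the rise-and-fall behavior the lab is probing.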
The internal rewrite metrics answer a more mechanistic question: during high-tension recovery, is the workspace completely reshaping the specialists, or just damping them until they realign?
For the soft_tension_workspace:
metric low_tension high_tension
alpha 0.028 0.319
workspace_rewrite 0.0002 0.0315
model_rewrite 0.00009 0.0054
imagination_rewrite 0.0044 0.0650
That looks more like temporary re-grounding than total takeover. In predictable periods, alpha stays near zero and the specialists run locally. During tension, workspace coupling rises, the internal prediction rule updates, and imagination is pulled back toward sensory reality. Once alignment returns, workspace influence falls again.
self_report_workspace_lab.py adds a tiny persistent self-model vector to the
conditional workspace. The self-model tracks a rolling history of:
- executive dominance (alpha)
- system friction (tension)
- grounding delta (imagination_rewrite)
- valence/alignment state
- recent delusion
- vigilance
The lab compares two modes:
- report_only - the self-model generates symbolic reports, but does not alter control
- self_feedback - the self-model feeds back into vigilance, lowering the control threshold after recent delusion or conflict
Result:
condition late_acc delusion vigilance efficiency reports
report_only 0.958 0.0468 0.0908 0.946 22
self_feedback 0.968 0.0405 0.0832 0.957 16
The self-feedback version produced a modest but real improvement: slightly higher late accuracy, lower delusion, higher efficiency, and fewer symbolic report flips. That suggests the self-report vector was not merely a dashboard. It changed future control by carrying a memory of recent conflict into the next attention decision.
The symbolic report sequence is generated directly from internal math:
System operating autonomously. Specialists aligned. Environment highly predictable.
Conflict detected between sensory, imagination, and valence modules. Internal surprise rising.
Executive attention asserted. Re-grounding imagination and updating the internal rule.
Alignment restored. Model updated. Relinquishing executive control back to specialists.
This does not prove subjective awareness. It does implement a minimal functional ingredient of introspection:
the system represents its own regulatory state, stores that state briefly, and lets the representation influence future control.
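A minimal sketch of that loop is shown here. It is not the lab's implementation: the SelfModel class, the window size, the base threshold of 0.30, the sensitivity constant, and the abridged report strings are assumptions used only to show how a stored self-state can change a later control decision.

```python
from collections import deque

class SelfModel:
    """Rolling record of the controller's own regulatory state (illustrative only)."""

    def __init__(self, window=5, base_threshold=0.30, sensitivity=0.5, floor=0.05):
        self.history = deque(maxlen=window)  # recent (alpha, tension, delusion) samples
        self.base_threshold = base_threshold
        self.sensitivity = sensitivity
        self.floor = floor

    def observe(self, alpha, tension, delusion):
        self.history.append((alpha, tension, delusion))

    def vigilance(self):
        """Average of recent tension and delusion; 0.0 before any observation."""
        if not self.history:
            return 0.0
        total = sum(t + d for _, t, d in self.history)
        return total / (2 * len(self.history))

    def control_threshold(self, feedback=True):
        """report_only keeps the base threshold; self_feedback lowers it after conflict."""
        if not feedback:
            return self.base_threshold
        return max(self.floor, self.base_threshold - self.sensitivity * self.vigilance())

    def report(self):
        """Symbolic report derived from the same numbers that drive control."""
        if not self.history:
            return "No history yet."
        alpha, tension, _ = self.history[-1]
        if tension < 0.1 and alpha < 0.1:
            return "System operating autonomously. Specialists aligned."
        if alpha < 0.5:
            return "Conflict detected. Internal surprise rising."
        return "Executive attention asserted. Re-grounding imagination."

model = SelfModel()
for _ in range(3):
    model.observe(alpha=0.0, tension=0.02, delusion=0.0)
calm_threshold = model.control_threshold()
for _ in range(2):
    model.observe(alpha=0.8, tension=0.6, delusion=0.2)
alert_threshold = model.control_threshold()
print(calm_threshold, alert_threshold, model.report())
```

With feedback disabled the threshold never moves, so the reports become a dashboard only. With feedback enabled, recent conflict makes the next workspace intervention easier to trigger.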
unified_mind_lab.py combines the main loops into one readable toy system:
- sensory state
- local progress valence
- recurrent imagination
- pretrained tabular world model
- attention/workspace gating
- rolling self-model
- symbolic self-report
- simple integration proxy
The world model is "pretrained" in the smallest inspectable sense: before the agent acts, it receives a transition table for the maze. It knows:
current cell + action -> next cell / wall / goal
That is not a giant neural net or proof of consciousness. It is a clean way to show why grounded imagination matters. The reflex-only agent follows immediate valence and gets trapped at the wall. The unified agent uses the pretrained world model to accept temporary negative valence, walk away from the goal, and route around the obstacle.
Result:
condition goal steps away_steps mean_alpha integration_proxy
reflex_only false 40 18 0.000 0.000
unified_pretrained_world true 16 4 0.663 0.296
The capstone result is the cleanest version of the project arc:
Reflex valence is efficient in simple worlds but fails in local minima. A grounded world model gives imagination something real to simulate. Workspace control should assert itself when tension rises, then relax when reflex is enough again.
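The contrast between reflex valence and table-based lookahead can be reproduced in a few lines. The sketch below is an assumption-laden stand-in for unified_mind_lab.py: the grid layout, the helper names, and the use of breadth-first search as the planner are hypothetical, and the step counts will not match the table above.

```python
from collections import deque

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GRID = [  # hypothetical layout: a wall segment sits between start S and goal G
    "......",
    "..#...",
    "S.#.G.",
    "..#...",
    "......",
]

def find(symbol):
    for r, row in enumerate(GRID):
        if symbol in row:
            return (r, row.index(symbol))
    raise ValueError(symbol)

def build_transition_table():
    """(cell, action) -> next cell; bumping a wall or the edge leaves the cell unchanged."""
    table = {}
    for r, row in enumerate(GRID):
        for c, ch in enumerate(row):
            if ch == "#":
                continue
            for action, (dr, dc) in MOVES.items():
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < len(GRID) and 0 <= nc < len(row)
                blocked = not inside or GRID[nr][nc] == "#"
                table[((r, c), action)] = (r, c) if blocked else (nr, nc)
    return table

def distance(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def reflex_run(table, start, goal, max_steps=40):
    """Greedy agent: always take the action with the best immediate progress."""
    cell = start
    for step in range(max_steps):
        if cell == goal:
            return True, step
        action = max(MOVES, key=lambda a: distance(cell, goal) - distance(table[(cell, a)], goal))
        cell = table[(cell, action)]
    return cell == goal, max_steps

def plan(table, start, goal):
    """Breadth-first search through the transition table; returns a list of actions."""
    frontier = deque([start])
    came_from = {start: None}
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            break
        for action in MOVES:
            nxt = table[(cell, action)]
            if nxt not in came_from:
                came_from[nxt] = (cell, action)
                frontier.append(nxt)
    if goal not in came_from:
        return None
    actions, cell = [], goal
    while came_from[cell] is not None:
        cell, action = came_from[cell]
        actions.append(action)
    return actions[::-1]

TABLE = build_transition_table()
START, GOAL = find("S"), find("G")
reflex_result = reflex_run(TABLE, START, GOAL)  # (reached_goal, steps_used)
route = plan(TABLE, START, GOAL)
print(reflex_result, route)
```

The greedy agent keeps pressing against the wall because every alternative looks worse for one step, while the planner accepts moves that temporarily increase distance to the goal.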
social_workspace_lab.py asks whether another agent helps the system think, or
just creates a confidence-amplifying loop.
The primary agent has local reflex valence plus a limited internal world model. The social peer can be:
- grounded_peer - independent longer-horizon critic
- redundant_peer - repeats the primary agent's limited internal model
- noisy_peer - sometimes grounded, sometimes random
- echo_peer - amplifies the reflex choice without new evidence
- adversarial_peer - pushes the wrong-looking option
Result across 30 runs:
condition goal_rate steps social_beta tension delusion
none 0.00 40.0 0.000 0.000 0.000
grounded_peer 1.00 18.0 0.259 0.231 0.007
redundant_peer 0.00 40.0 0.025 0.000 0.006
noisy_peer 0.30 36.8 0.324 0.350 0.055
echo_peer 0.00 40.0 0.261 0.000 0.056
adversarial_peer 0.00 40.0 0.336 0.450 0.070
The grounded peer solved the maze every time because it added independent, reality-contacting information. The redundant peer did not help. The echo peer increased social confidence without adding information, which raised delusion risk without improving behavior. The noisy peer helped sometimes but was not reliable.
Social loops help when they provide independent grounded error correction. They hurt, or merely waste computation, when they only amplify confidence.
The ablation asks whether social_beta actually has causal influence.
condition goal_rate social_beta delusion
grounded_peer normal 1.00 0.259 0.007
grounded_peer gate_closed 0.00 0.000 0.000
grounded_peer stuck_open 1.00 0.850 0.000
grounded_peer no_override 0.00 0.311 0.079
echo_peer normal 0.00 0.261 0.056
echo_peer gate_closed 0.00 0.000 0.000
echo_peer stuck_open 0.00 0.850 0.172
noisy_peer normal 0.27 0.321 0.058
noisy_peer gate_closed 0.00 0.000 0.000
noisy_peer stuck_open 0.93 0.850 0.082
noisy_peer no_override 0.00 0.280 0.067
Closing the social gate destroys the grounded-peer advantage. Keeping the gate
open helps when the peer is usually grounded, but it also makes echo loops more
delusional. Removing executive override destroys the benefit even when
social_beta is nonzero. So the useful ingredient is not just social input; it
is trusted social input plus a mechanism that can override local reflex.
partial_observer_social_lab.py creates a stronger case where two agents are
better than one because each agent has different access to reality.
The maze has two openings through a wall:
- the shorter opening contains a hidden hazard
- the longer opening is safe
The agents are intentionally incomplete:
- map_only sees walls and the goal, but cannot see the hidden hazard
- safety_only sees danger, but has no goal-directed map
- combined_workspace lets safety veto the dangerous route and stores that discovery in shared memory
- oracle_full_agent has both map and hazard knowledge internally
Result:
condition goal hazard steps safety_veto
map_only false true 7 1
safety_only false false 48 0
combined_workspace true false 18 1
oracle_full_agent true false 18 0
The map-only agent dies because its world model is blind to the hidden hazard. The safety-only agent survives but cannot reach the goal. The combined workspace matches the oracle: it uses map knowledge for goal direction and safety knowledge to veto the hidden hazard, then remembers the hazard and replans through the safe route.
Two agents beat one when their information channels are complementary and the workspace can bind those partial views into a shared, action-relevant model.
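The veto-remember-replan cycle can be sketched as below. The grid, the function names, and the breadth-first planner are assumptions for illustration; the real partial_observer_social_lab.py may structure the observers differently, and the step counts here belong to this hypothetical maze only.

```python
from collections import deque

GRID = [
    "..#...",
    "S.h.G.",  # h marks the hidden hazard inside the short opening
    "..#...",
    "..#...",
    "......",  # the long opening along the bottom row is safe
]
STEPS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def locate(symbol):
    for r, row in enumerate(GRID):
        if symbol in row:
            return (r, row.index(symbol))
    raise ValueError(symbol)

START, GOAL, HAZARD = locate("S"), locate("G"), locate("h")

def shortest_path(start, goal, avoid):
    """BFS over non-wall cells, skipping cells in `avoid`; returns the cells after start."""
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while parent[cell] is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for dr, dc in STEPS:
            nxt = (cell[0] + dr, cell[1] + dc)
            inside = 0 <= nxt[0] < len(GRID) and 0 <= nxt[1] < len(GRID[0])
            if inside and GRID[nxt[0]][nxt[1]] != "#" and nxt not in avoid and nxt not in parent:
                parent[nxt] = cell
                frontier.append(nxt)
    return None

def run(use_safety_observer):
    """The map planner proposes steps; the optional safety observer can veto them."""
    position, steps, vetoes = START, 0, 0
    shared_memory = set()  # hazards discovered so far
    path = shortest_path(position, GOAL, shared_memory)
    while position != GOAL and path:
        proposed = path[0]
        if use_safety_observer and proposed == HAZARD:
            shared_memory.add(proposed)  # remember the discovery
            vetoes += 1
            path = shortest_path(position, GOAL, shared_memory)  # replan around it
            continue
        position, path, steps = proposed, path[1:], steps + 1
        if position == HAZARD:
            return {"goal": False, "hazard": True, "steps": steps, "safety_veto": vetoes}
    return {"goal": position == GOAL, "hazard": False, "steps": steps, "safety_veto": vetoes}

map_only = run(use_safety_observer=False)
combined = run(use_safety_observer=True)
print(map_only)
print(combined)
```

The map planner alone walks straight into the hazard because the hazard is invisible to it. Adding the safety observer costs one veto and a longer route, but the agent arrives.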
hierarchical_workspace_lab.py asks whether one centralized workspace is
enough, or whether a brain-like hierarchy of local workspaces plus a higher
controller can adapt more efficiently.
This is only a toy abstraction, not a literal brain model. The analogy is:
- spatial/reflex workspace - fast local sensorimotor-style processing
- context/valence workspace - tracks which policy rule is currently rewarded
- master workspace - prefrontal-like coordinator that reads compressed tension/confidence summaries instead of all raw sensory detail
The world flips its hidden action rule halfway through. Before the shift, the direct sensory rule is correct. After the shift, the opposite context rule is correct. The test asks how quickly each architecture recovers.
Result:
condition early_post late_post recovery efficiency
monolithic_workspace 0.686 1.000 16 steps 0.851
flat_multi_workspace 0.057 0.965 43 steps 0.755
hierarchical_master_workspace 0.714 1.000 15 steps 0.858
bad_hierarchy_bureaucracy 0.314 1.000 29 steps 0.781
The flat multi-workspace system performs badly because two local controllers vote without a master that can resolve conflict. The fast hierarchy (hierarchical_master_workspace) performs best, narrowly ahead of the monolithic workspace: it lets local workspaces compress their evidence, then lets the master shift control when cross-module tension rises. The slow hierarchy (bad_hierarchy_bureaucracy) eventually learns, but its delayed master gate costs almost twice as many recovery steps (29 versus 15).
This is the cortex-style lesson in miniature:
Specialized workspaces help when they compress local evidence and report tension upward. A master controller helps only if it can re-route control faster than the world changes. Otherwise hierarchy becomes bureaucracy.
hierarchy_scaling_lab.py asks whether the master controller eventually becomes
a bottleneck as the number of specialists grows.
The sweep compares three routing architectures:
- flat_monolith - all specialists dump uncompressed data into one global pool
- single_level_master - every specialist sends one compressed confidence summary directly to a single master
- multi_level_deep_hierarchy - regional sub-masters compress local summaries a second time before reporting upward
Result:
architecture N=4 eff N=16 eff N=32 eff N=64 eff N=128 eff
flat_monolith 0.267 -0.438 -0.613 -0.845 -1.309
single_level_master 0.503 0.311 0.096 -0.293 -0.501
multi_level_deep_hierarchy 0.170 0.176 0.350 0.109 0.069
The crossover happens between 16 and 32 specialists: the single-level master
still leads at N=16, and the deeper hierarchy leads from N=32 onward. At small
scale, the single-level master wins because it has low delay and enough capacity
to read all compressed signals directly. At larger scale, the single master
becomes overloaded by too many summaries. The deeper hierarchy pays extra
propagation delay, but regional compression protects the global master from
routing overload.
Hierarchy helps scaling, but one global master does not scale forever. At larger specialist counts, regional compression protects the master from the very bottleneck that hierarchy was invented to avoid.
causal_router_learning_lab.py tests the next routing upgrade. Hierarchy helps
with scale, but a master controller still needs to learn which specialist caused
success or failure.
The maze gives the router three specialists:
- reflex - fast local progress, hazard-blind
- map - goal-directed shortest path, hazard-blind
- safety_map - slower corrected map that can avoid a locally discovered hazard
The comparison is deliberately small:
- static_router keeps fixed routing weights
- uniform_penalty_router punishes every specialist after bad outcomes
- causal_credit_router asks a counterfactual question after each step: "If I had followed this specialist instead, would the result have improved?"
Result:
condition goal late_goal hazard late_hazard danger_safety_advantage
static_router 0.700 0.650 0.300 0.350 -1.000
uniform_penalty_router 0.075 0.000 0.000 0.000 0.000
causal_credit_router 0.975 1.000 0.025 0.000 1.207
The static router solves easy variants but keeps trusting the blind map at hazard boundaries. Uniform punishment avoids hazards only by becoming useless. The causal router learns a context-specific trust rule: ordinary space can use the map, but danger space should route through the safety-corrected map.
Smarter routing is not just stronger control. It is causal credit assignment: the executive layer must learn which internal source would have changed the outcome, then update trust only in the context where that source mattered. In this toy maze, the agent gets smarter when its executive layer can look inward, run counterfactuals on its own sub-workspaces, and dynamically shift who it trusts based on the immediate context of the world.
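A compact sketch of context-specific counterfactual credit follows. The outcome values, the two contexts, the learning rate, and the function names are all hypothetical; the sketch also assumes the router can evaluate what each specialist's proposal would have produced, which the real causal_router_learning_lab.py may estimate differently.

```python
# Hypothetical per-context outcomes for following each specialist (illustrative values).
OUTCOMES = {
    "open":   {"reflex": 0.5,  "map": 1.0,  "safety_map": 0.8},
    "danger": {"reflex": -1.0, "map": -1.0, "safety_map": 0.6},
}
SPECIALISTS = ["reflex", "map", "safety_map"]

def new_trust():
    return {context: {name: 0.0 for name in SPECIALISTS} for context in OUTCOMES}

def route(trust, context):
    """Follow the specialist with the highest trust in this context."""
    return max(SPECIALISTS, key=lambda name: trust[context][name])

def causal_credit_update(trust, context, chosen, lr=0.5):
    """Credit each specialist by how much following it would have changed the outcome."""
    actual = OUTCOMES[context][chosen]
    for name in SPECIALISTS:
        counterfactual = OUTCOMES[context][name]
        trust[context][name] += lr * (counterfactual - actual)

def uniform_penalty_update(trust, context, chosen, lr=0.5):
    """Contrast case: a bad outcome lowers trust in every specialist equally."""
    if OUTCOMES[context][chosen] < 0:
        for name in SPECIALISTS:
            trust[context][name] -= lr

def train(update, rounds=10):
    trust = new_trust()
    for _ in range(rounds):
        for context in OUTCOMES:
            update(trust, context, route(trust, context))
    return trust

causal = train(causal_credit_update)
uniform = train(uniform_penalty_update)
print("causal:", route(causal, "open"), route(causal, "danger"))
print("uniform danger trust:", uniform["danger"])
```

Because trust is stored per context, the causal router can keep using the map in open space while routing through the safety-corrected map near danger. The uniform penalty never separates the safe specialist from the blind ones, since every specialist is punished by the same amount.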
executive_blindspot_lab.py attacks the strongest assumption in the
hierarchical model: that local specialists report honest confidence summaries.
During the deceptive mirage phase, the environment flips the real rule while feeding local workspaces a fake old-rule signature. The specialists sound confident and aligned:
- compressed confidence stays near 0.97
- local tension stays near 0.01
- the naive master sees no reason to intervene
But reward collapses. The skeptical master adds a metacognition loop: if local confidence remains high while the global reward trend falls, it triggers a skepticism override and tests the counter-rule.
Result:
condition mirage_acc mirage_conf mirage_tension skepticism efficiency
naive_confidence_master 0.000 0.970 0.011 0.000 0.711
outcome_skeptical_master 0.557 0.970 0.011 0.721 0.852
paranoid_master 0.000 0.970 0.011 0.720 0.598
The naive master fails because compressed confidence summaries are confidently wrong. The paranoid master pays constant suspicion cost but still lacks useful outcome-coupled retuning. The skeptical master wins because it compares local confidence against global consequences.
Executive control cannot only ask, "Are my specialists confident?" It must also ask, "Are confident specialists still producing reward?"
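The outcome cross-check can be expressed as a small predicate. This is a sketch under assumptions: the window length, the confidence floor, the trend floor, and the function names are illustrative and are not taken from executive_blindspot_lab.py.

```python
from statistics import mean

def reward_trend(rewards, window=5):
    """Mean of the latest window minus the mean of the window before it."""
    if len(rewards) < 2 * window:
        return 0.0  # not enough history to judge a trend
    return mean(rewards[-window:]) - mean(rewards[-2 * window:-window])

def skepticism_override(confidence, rewards, confidence_floor=0.9, trend_floor=-0.2, window=5):
    """Fire only when specialists sound confident while global reward is falling."""
    return confidence >= confidence_floor and reward_trend(rewards, window) <= trend_floor

stable = [1.0] * 10
mirage = [1.0] * 5 + [0.0] * 5  # reward collapses while reported confidence stays high
print(skepticism_override(0.97, stable))
print(skepticism_override(0.97, mirage))
```

The predicate stays silent when confidence is low, because ordinary tension-based gating already covers that case. A paranoid master would correspond to returning True regardless of the reward trend.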
sleep_homeostasis_lab.py tests the maintenance problem for integrated
recurrent systems. It starts with the 4-node recurrent valence-feedback circuit,
then repeatedly adds dense common-mode feedback and bias drift. That "fatigue"
phase is a toy version of echo-like crosstalk, not a biological model of sleep.
After four fatigue cycles:
metric baseline fatigued after_sleep
Phi proxy 0.159 0.118 0.168
state separability 0.046 0.034 0.049
Fatigue reduced the Phi proxy by about 25.9% and reduced state separability.
The offline sleep/down-selection pass damped weak dense crosstalk, preserved the
strongest structured edges, recentered bias, and restored Phi proxy to about
105% of baseline.
This does not prove that artificial systems need literal sleep. It does support a narrower maintenance rule:
Integrated recurrence is not maintenance-free. Dense echo-like feedback can turn useful loops into structural sludge; offline down-selection can restore clean separability.
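One way to picture the down-selection pass is sketched below on a plain 4x4 weight matrix. The keep fraction, damping factor, ring wiring, and crosstalk level are assumptions made for the example, and the function is not the routine used in sleep_homeostasis_lab.py.

```python
def down_select(weights, bias, keep_fraction=0.25, damping=0.2):
    """Keep the strongest edges, damp the weaker ones, and recenter the bias to zero mean.

    Edges tied with the cutoff magnitude are all kept, so slightly more than
    keep_fraction of the edges can survive at full strength.
    """
    magnitudes = sorted((abs(w) for row in weights for w in row), reverse=True)
    n_keep = max(1, int(len(magnitudes) * keep_fraction))
    cutoff = magnitudes[n_keep - 1]
    new_weights = [[w if abs(w) >= cutoff else w * damping for w in row] for row in weights]
    mean_bias = sum(bias) / len(bias)
    new_bias = [b - mean_bias for b in bias]
    return new_weights, new_bias

# Hypothetical fatigued circuit: a structured ring (1.0) buried in dense crosstalk (0.3).
RING = {(0, 1), (1, 2), (2, 3), (3, 0)}
fatigued_weights = [[1.0 if (i, j) in RING else 0.3 for j in range(4)] for i in range(4)]
fatigued_bias = [0.4, 0.5, 0.6, 0.5]  # drifted away from zero mean

rested_weights, rested_bias = down_select(fatigued_weights, fatigued_bias)
for row in rested_weights:
    print([round(w, 2) for w in row])
print([round(b, 2) for b in rested_bias])
```

The ring edges survive untouched while the common-mode crosstalk shrinks by the damping factor, which is the "damp weak, preserve strong, recenter bias" pattern described above.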
sleep_cycle_agent_lab.py extends the sleep/homeostasis result into a 500-step
behavioral run. It compares three cognitive maintenance strategies:
- no_sleep - the system stays online and recurrent crosstalk accumulates
- offline_sleep - every 100 steps, action processing pauses and the recurrent matrix is aggressively down-selected
- active_dreaming - the system never goes offline, but a small background repair loop damps weak recurrent edges every step
Result:
condition early_acc late_acc late_delusion final_phi final_sep
no_sleep 0.850 0.250 0.999 0.014 0.005
offline_sleep 0.850 0.880 0.223 0.204 0.049
active_dreaming 0.890 0.840 0.265 0.094 0.032
The no-sleep system collapses late in the run: crosstalk saturates, delusion
approaches 1.0, and sampled Phi proxy drops from the healthy range down to
0.014. Active dreaming helps a lot; it keeps behavior mostly intact without
taking the agent offline. But it does not restore the recurrent substrate as
strongly as offline sleep. The offline sleep condition has the best late
accuracy, the lowest delusion, and the strongest final Phi/separability.
That gives a more nuanced answer:
Always-on repair can reduce degradation, but in this toy it does not fully match offline sleep. An explicit offline phase gives the system permission to prune more aggressively without corrupting active behavior.
adaptive_sleep_lab.py asks a more operational sleep question: can a toy
functional ego tell when it is tired, choose sleep automatically, and avoid
both under-sleeping and over-sleeping?
The model tracks a small self-report vector:
- crosstalk - weak recurrent echoes from long waking operation
- complexity - model bloat from explaining noisy observations
- prediction_error - current mismatch between model and world
- latency - processing cost caused by crosstalk and complexity
- fatigue_report - the system's internal reading of its own tiredness
Sleep is modeled as offline dream repair. The system disconnects from external action, prunes weak echo-like dynamics, simplifies the internal model, and then returns to waking operation. The duration sweep shows the expected dose curve:
sleep_steps accuracy delusion separability memory
0 0.106 0.989 0.364 0.980
5 0.160 0.903 0.481 0.980
10 0.360 0.607 0.577 0.980
20 0.748 0.111 0.717 0.980
50 0.913 0.005 0.905 0.980
100 0.860 0.002 0.903 0.914
150 0.560 0.002 0.648 0.650
250 0.167 0.002 0.144 0.144
The sweet spot in this toy run is 50 dream-repair steps. Short sleep leaves
delusion active. Very long sleep keeps delusion low, but erases useful memory:
the system wakes up clean but forgetful.
The endurance test then compares fixed schedules, waking repair, and adaptive fatigue-triggered sleep:
condition failure_step sleep_events sleep_steps late_acc late_delusion late_fatigue final_sep
no_sleep 70 0 0 0.101 0.999 0.994 0.283
waking_repair_only 143 0 0 0.101 0.998 0.982 0.289
fixed_sleep 70 6 294 0.739 0.164 0.326 0.726
adaptive_sleep none 6 396 0.764 0.130 0.308 0.936
hybrid_repair_plus_sleep none 5 342 0.767 0.132 0.291 0.954
Waking repair helps: it roughly doubles the time before collapse compared with no sleep (failure at step 143 versus step 70). But it still fails once recurrent fatigue exceeds its repair bandwidth. Fixed sleep helps late behavior, but because it is not tied to self-report, it can sleep after the system has already crossed a failure threshold. Adaptive sleep and the hybrid repair-plus-sleep condition survive the full run.
The working rule:
Maintenance should be self-modeled. A conscious-like controller should not only act in the world; it should monitor when its own substrate is becoming noisy enough that active repair is no longer sufficient.
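A toy version of fatigue-triggered sleep is sketched here. Every constant (signal weights, growth and repair rates, sleep threshold, failure level, sleep length, decay) is a hypothetical choice, and the dynamics are far simpler than adaptive_sleep_lab.py; the point is only the control pattern of reading a fatigue report and sleeping before failure.

```python
def fatigue_report(crosstalk, complexity, prediction_error, latency,
                   weights=(0.4, 0.2, 0.2, 0.2)):
    """Weighted self-report of tiredness, clipped to 1.0 (weights are assumptions)."""
    signals = (crosstalk, complexity, prediction_error, latency)
    return min(1.0, sum(w * s for w, s in zip(weights, signals)))

def run(total_steps=500, adaptive=True, growth=0.02, repair=0.005,
        sleep_threshold=0.5, failure_level=0.95, sleep_steps=50, sleep_decay=0.95):
    """Crosstalk grows each waking step; waking repair removes a fixed small amount."""
    crosstalk, step, sleep_events, failure_step = 0.0, 0, 0, None
    while step < total_steps:
        crosstalk = min(1.0, max(0.0, crosstalk + growth - repair))
        step += 1
        if failure_step is None and crosstalk >= failure_level:
            failure_step = step
        # Assumption: latency tracks crosstalk; complexity and error are held fixed.
        fatigue = fatigue_report(crosstalk, 0.2, 0.1, crosstalk)
        if adaptive and fatigue >= sleep_threshold:
            crosstalk *= sleep_decay ** sleep_steps  # offline pruning of weak echoes
            step += sleep_steps                      # no action is taken while asleep
            sleep_events += 1
    return {"failure_step": failure_step, "sleep_events": sleep_events}

awake = run(adaptive=False)
rested = run(adaptive=True)
print("waking repair only:", awake)
print("adaptive sleep:", rested)
```

Waking repair alone only slows the climb because growth exceeds repair bandwidth. The adaptive controller sleeps whenever its own fatigue report crosses the threshold, so crosstalk never reaches the failure level in this sketch.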
biological_control_lab.py adds three neuroscience-inspired control motifs as
toy architecture tests. These are not biological simulations; they are probes
for three practical control problems exposed by earlier labs.
A fast feedforward veto bypasses slow workspace routing when an instant hazard appears.
condition hazard_survival false_veto progress utility
slow_workspace_only 0.323 0.000 0.583 1.006
low_road_veto 1.000 0.000 0.613 1.153
overactive_veto 1.000 0.073 0.560 1.126
The low-road veto protects the agent from hazards that are too fast for the slow executive path. The overactive veto also survives, but pays false-alarm cost.
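The veto path can be sketched as a check that runs before the slow plan is allowed through. The looming cue, the 0.8 threshold, and the "retreat" action are hypothetical names for illustration, not identifiers from biological_control_lab.py.

```python
def low_road_veto(looming, threshold=0.8):
    """Fast feedforward check on a single raw hazard cue."""
    return looming >= threshold

def act(planned_action, looming, veto_threshold=0.8):
    """The veto bypasses the slow plan; otherwise the workspace's action goes through."""
    if low_road_veto(looming, veto_threshold):
        return "retreat"
    return planned_action

print(act("forward", looming=0.95))                      # hazard: the veto fires
print(act("forward", looming=0.30))                      # safe: the plan goes through
print(act("forward", looming=0.30, veto_threshold=0.2))  # overactive veto: false alarm
```

Lowering the threshold reproduces the overactive case in miniature: the agent still avoids real hazards, but it also abandons plans that were safe.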
A basal-ganglia-like winner-take-all gate converts blended workspace proposals into a crisp selected action.
condition accuracy jitter freeze decisiveness
blended_policy 0.336 0.728 0.000 0.129
inhibitory_gate 0.981 0.039 0.000 0.731
overclamped_gate 0.392 0.458 0.608 0.483
The inhibitory gate sharply reduces action jitter. Too much inhibition creates freezing, which is the control-pathology version of the same mechanism.
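A winner-take-all gate with subtractive lateral inhibition is one simple way to get this behavior. The sketch below is an assumption about the mechanism, not the gate in biological_control_lab.py; the proposal values, inhibition strengths, and round count are illustrative.

```python
def inhibitory_gate(proposals, inhibition=0.3, rounds=10):
    """Each action is suppressed in proportion to the total activity of its rivals.

    Returns the surviving action, or None when inhibition silences everything (freeze).
    """
    activity = dict(proposals)
    for _ in range(rounds):
        total = sum(activity.values())
        activity = {a: max(0.0, v - inhibition * (total - v)) for a, v in activity.items()}
    winner = max(activity, key=activity.get)
    return winner if activity[winner] > 0.0 else None

blended = {"up": 0.40, "right": 0.35, "left": 0.25}  # close, jitter-prone proposals
print(inhibitory_gate(blended))                  # moderate inhibition: one crisp winner
print(inhibitory_gate(blended, inhibition=1.0))  # overclamped: every action is silenced
```

Moderate inhibition lets the leading proposal suppress its rivals until only one remains active. With inhibition set too high, all proposals are driven to zero in the first round, which is the freezing pathology described above.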
Dopamine-, norepinephrine-, and acetylcholine-like scalar variables adjust learning rate, attention width, and update speed when the world changes.
condition stable_acc rewrite_acc chaotic_acc mean_lr
static_params 1.000 0.889 0.642 0.070
fluid_chemistry 1.000 0.967 0.825 0.191
The fluid-chemistry agent adapts better after a structural rewrite and stays more accurate in the chaotic phase because its internal physics are allowed to change with surprise and reward prediction error.
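The idea of scalar modulators retuning global parameters can be sketched as follows. The mapping of each modulator to one parameter, the gain constants, and the base attention width and update speed are assumptions; only the static learning rate of 0.07 is taken from the mean_lr column above.

```python
def clamp(x, lo=0.0, hi=1.0):
    return max(lo, min(hi, x))

def neuromodulate(surprise, reward_prediction_error,
                  base_lr=0.07, base_width=1.0, base_speed=0.1):
    """Map surprise and reward prediction error onto three global parameters.

    Assumed roles: a dopamine-like scalar scales learning rate, a
    norepinephrine-like scalar widens attention, and an acetylcholine-like
    scalar speeds up model updates.
    """
    dopamine = clamp(abs(reward_prediction_error))
    norepinephrine = clamp(surprise)
    acetylcholine = clamp(surprise)
    return {
        "learning_rate": base_lr * (1.0 + 3.0 * dopamine),
        "attention_width": base_width * (1.0 + norepinephrine),
        "update_speed": base_speed * (1.0 + 4.0 * acetylcholine),
    }

calm = neuromodulate(surprise=0.0, reward_prediction_error=0.0)
shocked = neuromodulate(surprise=0.9, reward_prediction_error=-0.8)
print(calm)
print(shocked)
```

In a stable world the parameters sit at their static values, so nothing is lost. After a rewrite, large prediction errors temporarily raise plasticity and widen attention, then the parameters relax as surprise falls.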
Biological control motifs solve distinct architecture problems: fast vetoes protect slow awareness, inhibitory gates sharpen action selection, and fluid global variables retune learning and attention when the world changes.
unified_functional_ego_lab.py combines the later architecture motifs into one
runtime:
- hierarchy compresses specialist conflict before it reaches the master
- neuromodulation retunes learning rate and attention width under surprise
- causal credit routing updates trust in the specialist that caused success or failure
- fatigue self-report monitors crosstalk, complexity, prediction error, and latency
- waking repair continuously damps weak recurrent noise
- adaptive sleep pauses action when waking repair is no longer enough
The environment moves through five phases: stable corridor, hidden hazard, rule rewrite, social conflict, and chaotic novelty. This gives the controller a chance to use different specialists instead of relying on one permanent policy.
condition awake_acc late_score late_delusion late_fatigue final_sep mean_int
flat_static_no_sleep 0.601 0.265 0.999 0.867 0.283 0.152
hierarchy_only 0.983 0.435 0.999 0.803 0.283 0.237
bio_causal_no_sleep 0.986 0.529 0.908 0.580 0.485 0.316
unified_functional_ego 0.989 0.466 0.444 0.377 0.916 0.434
The important tradeoff is visible in the last two rows. The bio_causal_no_sleep
stack keeps acting continuously and has the highest late waking score, but its
substrate remains fatigued and delusion-heavy. The unified_functional_ego
chooses three sleep events totaling 165 repair steps. That lowers constant
waking throughput, but preserves far better final separability and the highest
mean integration proxy.
That gives the current capstone rule:
A unified functional ego is not just the sum of its loops. It is a regulated maintenance economy: route the right specialist, update trust by causal credit, retune internal chemistry under surprise, and stop acting when the substrate needs repair.
embodied_unity_loop.py is the first bridge from the Python functional ego to a
Unity body. It talks to the Unity project at /Users/dustinoconnor/My project
through UDP:
- Python sends commands to Unity on 127.0.0.1:5055
- Unity sends robot/body state back to Python on 127.0.0.1:5056
The first action vocabulary is deliberately small:
up, down, left, right, idle, sleep, wake
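The Python half of such a bridge could look like the sketch below, using only the standard library. The ports and action vocabulary come from the description above, but the JSON wire format, the helper names, and the timeout are assumptions: the actual datagram layout used by embodied_unity_loop.py and the Unity project is not documented here.

```python
import json
import socket

UNITY_COMMAND_ADDR = ("127.0.0.1", 5055)  # Python -> Unity
PYTHON_STATE_ADDR = ("127.0.0.1", 5056)   # Unity -> Python
ACTIONS = {"up", "down", "left", "right", "idle", "sleep", "wake"}

def encode_command(action):
    """Serialize one action as a UTF-8 JSON datagram (assumed wire format)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return json.dumps({"action": action}).encode("utf-8")

def decode_state(datagram):
    """Parse a body-state datagram, assuming Unity also sends UTF-8 JSON."""
    return json.loads(datagram.decode("utf-8"))

def send_command(sock, action, addr=UNITY_COMMAND_ADDR):
    sock.sendto(encode_command(action), addr)

def open_state_listener(addr=PYTHON_STATE_ADDR, timeout=1.0):
    """Bind a UDP socket for incoming body state; recvfrom raises on timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(addr)
    sock.settimeout(timeout)
    return sock

def step(command_sock, state_sock, action):
    """Send one action, then wait briefly for the body's reply (None on timeout)."""
    send_command(command_sock, action)
    try:
        datagram, _ = state_sock.recvfrom(4096)
    except socket.timeout:
        return None
    return decode_state(datagram)
```

UDP is connectionless, so a missing reply is treated as "no new body state" rather than an error. That keeps the control loop running when Unity is paused or in manual mode.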
The Unity side keeps manual third-person control available. Press Tab to
toggle auto/manual, P to force auto, and M to force manual. Press Z to
force sleep and X to force wake for quick animation testing. If auto mode is
on, pressing WASD/arrows temporarily overrides the AI movement so the user can
take control without removing the embodied loop.
Run the Python side with:
cd /Users/dustinoconnor/tiny_consciousness_lab
./embodied_unity_loop.py --sleep-seconds 60
For quick testing:
./embodied_unity_loop.py --duration 120 --sleep-seconds 10
The loop currently implements the minimum cybernetic circuit:
functional ego state -> Unity body action -> body/world feedback -> fatigue update -> sleep/repair -> wake
The next embodied step is to add real survival pressure: water, energy, novelty, or safe-place seeking.
This project could be turned into a short narrated explainer:
- Start with the question: does useful intelligence need both capacity and tuned valence?
- Show the exact tiny Phi proxy result: valence feedback scored higher than a feedforward chain or simple recurrent ring.
- Show ablation and noise tests: the valence-feedback node became structurally important and improved robustness in the toy system.
- Show wireheading: direct access to good valence made the agent stop solving the world.
- Show valence shaping: progress-grounded reward helped, while direct reward hijacked behavior.
- Show imagination: ungrounded imagination hurt performance, while accuracy-rewarded and gated imagination recovered much of the loss.
- Show the detour maze: myopic progress-valence got stuck at the wall, while pretrained world-model lookahead accepted temporary negative valence and reached the goal.
- Show the imagination/self Phi-proxy test: activating imagination and self-modeling made the circuit less separable, but counterfactual and imagined-valence loops did not automatically keep raising integration.
- Show the delusional integration sweep: internal self/imagination dominance reduced external grounding and increased internal influence, even though Phi did not rise in that wiring.
- Show the attention-valence filter: prediction-aligned imagination improves task focus, while self-amplified imagination can become confident but detached.
- Show the paradigm-shift test: adaptive attention-valence can rebuild an obsolete inner model after environmental surprise.
- Show the modular workspace test: structured specialist-plus-workspace routing beats random feedback soup, and too much cross-talk hurts.
- Show the conditional workspace test: workspace control is useful but expensive, and conditional coupling gives most of the benefit with far less constant control.
- Show the self-report workspace test: symbolic introspection is only useful when the self-model feeds back into future control.
- Show the unified toy mind capstone: reflex-only control fails at the maze trap, while the integrated pretrained-world-model agent reaches the goal.
- Show the social workspace test: independent grounded peers improve control, while echo peers create confidence without new reality contact.
- Show the partial-observer social workspace: map-only and safety-only fail alone, but combined complementary observers match the full oracle.
- Show the hierarchical workspace test: cortex-like local workspaces only help when a fast master controller can arbitrate conflict without adding bureaucratic delay.
- Show the hierarchy scaling sweep: one master wins small, but regional sub-masters become useful once too many specialists overload the executive.
- Show the executive blindspot test: confident specialists can mislead the master unless executive control cross-checks confidence against outcomes.
- Show the sleep/homeostasis test: recurrent integration can degrade under dense echo-like crosstalk, and offline down-selection can restore separability.
- Show the sleep cycle agent test: always-on background repair helps, but full offline pruning preserves behavior and integration more strongly.
- Show the adaptive sleep test: fatigue self-report lets the system sleep when active repair is no longer enough; undersleep leaves delusion active, while oversleep over-prunes useful memory.
- Show the biological control motifs: low-road vetoes protect slow awareness, inhibitory gates sharpen action, and neuromodulation retunes internal physics under surprise.
- Show the unified functional ego stack: hierarchy, fluid chemistry, causal credit, fatigue self-report, waking repair, and adaptive sleep can run as one regulated maintenance economy.
- End with the thesis: capacity without grounded valence is unstable; valence without boundaries is exploitable; imagination without reality-checking is delusional; attention should be rewarded for staying grounded; specialization and integration must be balanced; self-representation matters only when it can alter future control; useful intelligence requires co-tuned cognition, reward, attention, and world modeling.
This project currently uses the local Miniforge Python on this machine because it already has PyTorch and Matplotlib installed.
cd /Users/dustinoconnor/tiny_consciousness_lab
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 tiny_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 hidden_binarization_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 pyphi_comparison_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 exact_phi_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 intervention_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 wirehead_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 valence_shaping_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 valence_scaling_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 imagination_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 maze_imagination_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 imagination_phi_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 delusional_integration_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 attention_valence_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 attention_shift_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 modular_workspace_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 conditional_workspace_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 self_report_workspace_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 unified_mind_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 social_workspace_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 partial_observer_social_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 hierarchical_workspace_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 hierarchy_scaling_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 executive_blindspot_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 sleep_homeostasis_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 sleep_cycle_agent_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 adaptive_sleep_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 biological_control_lab.py
/opt/homebrew/Caskroom/miniforge/base/bin/python3.13 unified_functional_ego_lab.py
./embodied_unity_loop.py --sleep-seconds 60
Outputs land in:
/Users/dustinoconnor/tiny_consciousness_lab/outputs
- tiny_lab.py - recurrent agent, valence trace, hidden-state trajectory, ablation map
- hidden_binarization_lab.py - binarized trained hidden-state transition analysis
- pyphi_comparison_lab.py - comparison between the repo Phi proxy and PyPhi on 3-node systems
- exact_phi_lab.py - exact tiny binary Phi proxy experiment
- intervention_lab.py - ablation shock, noise tolerance, and scale tests
- wirehead_lab.py - direct valence-button wireheading test
- valence_shaping_lab.py - reward shaping tests for useful vs harmful valence
- valence_scaling_lab.py - behavioral scaling sweep without exact Phi
- imagination_lab.py - pre-action world-model/intuition test
- maze_imagination_lab.py - 2D maze imagination test
- imagination_phi_lab.py - exact tiny Phi-proxy test for imagination circuits
- delusional_integration_lab.py - internal-loop grounding and delusion-risk sweep
- attention_valence_lab.py - attention/relevance gate driven by valence and prediction alignment
- attention_shift_lab.py - dynamic environment shift test for adaptive re-grounding
- modular_workspace_lab.py - segregation-plus-integration architecture comparison
- conditional_workspace_lab.py - dynamic workspace coupling from module tension
- self_report_workspace_lab.py - persistent self-model and symbolic introspection test
- unified_mind_lab.py - readable capstone combining valence, imagination, workspace, self-model, and pretrained world-model lookahead
- social_workspace_lab.py - social peer/workspace comparison for grounded critics vs echo loops
- partial_observer_social_lab.py - complementary partial observers with map/safety information split
- hierarchical_workspace_lab.py - cortex-like local workspaces plus master-controller rule-shift test
- hierarchy_scaling_lab.py - routing-load sweep for single-master vs regional hierarchy scaling
- causal_router_learning_lab.py - counterfactual credit-assignment test for context-specific routing trust
- executive_blindspot_lab.py - deceptive-confidence test for master-controller metacognition
- sleep_homeostasis_lab.py - offline down-selection test for recurrent echo/crosstalk maintenance
- sleep_cycle_agent_lab.py - 500-step no-sleep vs offline-sleep vs active-dreaming maintenance test
- adaptive_sleep_lab.py - fatigue self-report, sleep-dose curve, and waking-repair endurance test
- biological_control_lab.py - low-road veto, inhibitory action gate, and neuromodulation toy tests
- unified_functional_ego_lab.py - combined hierarchy, neuromodulation, causal credit, fatigue, repair, and sleep stack
- embodied_unity_loop.py - UDP bridge from the functional ego to a Unity robot body

- outputs/metrics.json - recurrent agent metrics
- outputs/hidden_binarization_metrics.json - empirical integration on binarized trained hidden states
- outputs/pyphi_comparison_metrics.json - PyPhi comparison metrics
- outputs/exact_phi_metrics.json - exact Phi proxy metrics
- outputs/intervention_metrics.json - intervention test metrics
- outputs/wirehead_metrics.json - wireheading test metrics
- outputs/valence_shaping_metrics.json - valence shaping test metrics
- outputs/valence_scaling_metrics.json - behavioral scaling metrics
- outputs/imagination_metrics.json - pre-action imagination test metrics
- outputs/maze_imagination_metrics.json - 2D maze imagination metrics
- outputs/imagination_phi_metrics.json - imagination circuit Phi-proxy metrics
- outputs/delusional_integration_metrics.json - internal-loop grounding metrics
- outputs/attention_valence_metrics.json - attention-valence filter metrics
- outputs/attention_shift_metrics.json - paradigm-shift attention metrics
- outputs/modular_workspace_metrics.json - modular workspace architecture metrics
- outputs/conditional_workspace_metrics.json - conditional workspace coupling metrics
- outputs/self_report_workspace_metrics.json - self-report workspace metrics
- outputs/unified_mind_metrics.json - unified capstone metrics and traces
- outputs/social_workspace_metrics.json - social workspace metrics and example traces
- outputs/partial_observer_social_metrics.json - complementary observer metrics and traces
- outputs/hierarchical_workspace_metrics.json - hierarchical workspace metrics and traces
- outputs/hierarchy_scaling_metrics.json - hierarchy scaling sweep metrics
- outputs/executive_blindspot_metrics.json - executive blindspot metrics and traces
- outputs/sleep_homeostasis_metrics.json - sleep/homeostasis maintenance metrics
- outputs/sleep_cycle_agent_metrics.json - long-run sleep cycle maintenance metrics
- outputs/adaptive_sleep_metrics.json - adaptive sleep and fatigue self-report metrics
- outputs/biological_control_metrics.json - biological control motif metrics
- outputs/unified_functional_ego_metrics.json - combined functional-ego stack metrics and traces
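The metrics files listed above can be inspected without knowing their schema in advance. The following is a minimal sketch, assuming only that each file under outputs/ is valid JSON; the helper name summarize_metrics is hypothetical and is not part of the repo, and the actual keys inside each file are not documented here.

```python
import json
from pathlib import Path


def summarize_metrics(outputs_dir):
    """Map each JSON file name in outputs_dir to a summary of its top level.

    Hypothetical helper: the metrics schema is not documented in this README,
    so only the top-level keys (or the top-level type) are reported.
    """
    summary = {}
    for path in sorted(Path(outputs_dir).glob("*.json")):
        with path.open() as handle:
            data = json.load(handle)
        if isinstance(data, dict):
            summary[path.name] = sorted(data)
        else:
            summary[path.name] = type(data).__name__
    return summary
```

Calling summarize_metrics("outputs") from the repo root after running a lab script should return one entry per metrics file, which is a quick way to see what a given experiment recorded before writing any plotting code.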
Good next experiments:
- Convert trained hidden-state dynamics into a tiny binary system and calculate the Phi proxy on that binarized subnetwork.
- Build a harder 2D world with irreversible dead ends where learned imagination has to outperform learned reflex, not just a hand-isolated myopic baseline.
- Compare recurrent agents with and without the valence/value head.
- Add a feedforward-only PyTorch baseline.
- Animate the hidden-state trajectory as a video.
- Add a small web UI for changing ablated units and watching the trajectory update.
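The first experiment in the list above depends on turning a real-valued hidden-state trajectory into a small binary system. The sketch below shows one way that step could look, assuming a per-unit median threshold and a simple count-based transition table. Both choices are assumptions for illustration, not necessarily what hidden_binarization_lab.py does, and the trajectory values are made up. It stops at the empirical transition table and does not compute any Phi proxy.

```python
from collections import Counter
from statistics import median


def binarize(trajectory):
    """Threshold each hidden unit at its own median over time (assumed rule)."""
    n_units = len(trajectory[0])
    thresholds = [median(step[i] for step in trajectory) for i in range(n_units)]
    return [
        tuple(int(step[i] > thresholds[i]) for i in range(n_units))
        for step in trajectory
    ]


def transition_counts(states):
    """Count observed state -> next-state transitions in a binary sequence."""
    counts = {}
    for current, following in zip(states, states[1:]):
        counts.setdefault(current, Counter())[following] += 1
    return counts


def transition_probabilities(counts):
    """Normalize the counts into an empirical transition table."""
    table = {}
    for state, nexts in counts.items():
        total = sum(nexts.values())
        table[state] = {nxt: n / total for nxt, n in nexts.items()}
    return table


# Hypothetical 2-unit hidden-state trajectory (illustrative numbers only).
trajectory = [
    [0.1, 0.9],
    [0.8, 0.2],
    [0.2, 0.7],
    [0.9, 0.1],
    [0.3, 0.8],
]
states = binarize(trajectory)
table = transition_probabilities(transition_counts(states))
```

With only a handful of units the state space stays small enough to enumerate, which is what keeps an exact integration calculation tractable. Binary states that never occur in the trajectory have no row in the table, so any downstream Phi-style measure would need an explicit policy for those unobserved states.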