Describe the bug
Lines 452 and 523 in pkg/agent/controller/networkpolicy/networkpolicy_controller.go have a typo in the error handling code. They're missing := in the if statement, so errors from replaceAll() get silently dropped. This means if writing the policy cache to disk fails (like when disk is full), the agent doesn't log anything but the on-disk cache becomes stale. When the agent restarts later, it loads this stale cache and installs wrong OVS flows, which can break network policy enforcement.
To Reproduce
- Deploy antrea on a cluster with some network policies applied
- Fill up the disk on a node to around 98% (this happens in production with log buildup)
- Restart the antrea controller pod - this makes the agent reconnect and try to sync all policies
- The
replaceAll() call fails because of disk space but no error is logged
- Free up some disk space, agent keeps running fine (in-memory state is ok)
- Now restart the agent pod or reboot the node
- Agent loads stale data from disk cache and installs incorrect flows
Expected
When replaceAll() fails, it should log an error message like it does for networkPolicyStore on line 381. This way we know the disk cache is broken and can fix it before restarting.
Actual behavior
No error is logged. The bug is in these two places:
Line 452:
if c.appliedToGroupStore.replaceAll(objs); err != nil {
klog.ErrorS(err, "Failed to store the AppliedToGroups to files")
}
Line 523:
if c.addressGroupStore.replaceAll(objs); err != nil {
klog.ErrorS(err, "Failed to store the AddressGroups to files")
}
Should be (add :=):
if err := c.appliedToGroupStore.replaceAll(objs); err != nil {
if err := c.addressGroupStore.replaceAll(objs); err != nil {
The way it's written now, it's checking the outer err variable which is always nil, not the return value from replaceAll(). Line 381 already does it correctly for networkPolicyStore.
Versions:
This affects all versions - the bug has been there since the file cache feature was added. I'm looking at current main branch.
Additional context
I wrote a test that proves this using afero readonly filesystem to force the disk write to fail. With the bug, old stale data stays on disk and no error appears. With the fix, error gets caught properly.
This is pretty serious because it's completely silent - no logs, no events, nothing. And it happens during normal cluster operations like controller upgrades + node restarts. The network policy enforcement just silently breaks.
Describe the bug
Lines 452 and 523 in
pkg/agent/controller/networkpolicy/networkpolicy_controller.gohave a typo in the error handling code. They're missing:=in the if statement, so errors fromreplaceAll()get silently dropped. This means if writing the policy cache to disk fails (like when disk is full), the agent doesn't log anything but the on-disk cache becomes stale. When the agent restarts later, it loads this stale cache and installs wrong OVS flows, which can break network policy enforcement.To Reproduce
replaceAll()call fails because of disk space but no error is loggedExpected
When
replaceAll()fails, it should log an error message like it does for networkPolicyStore on line 381. This way we know the disk cache is broken and can fix it before restarting.Actual behavior
No error is logged. The bug is in these two places:
Line 452:
Line 523:
Should be (add
:=):The way it's written now, it's checking the outer
errvariable which is always nil, not the return value fromreplaceAll(). Line 381 already does it correctly for networkPolicyStore.Versions:
This affects all versions - the bug has been there since the file cache feature was added. I'm looking at current main branch.
Additional context
I wrote a test that proves this using afero readonly filesystem to force the disk write to fail. With the bug, old stale data stays on disk and no error appears. With the fix, error gets caught properly.
This is pretty serious because it's completely silent - no logs, no events, nothing. And it happens during normal cluster operations like controller upgrades + node restarts. The network policy enforcement just silently breaks.