
Commit df88e44

Tom Lasswell committed
feat(lan): probe devStatus per device in diagnostics (#57)
Extend the read-only LAN helper beyond discovery: after the scan, query each discovered device's devStatus (unicast <ip>:4003) and attach the full runtime reply per device under lan_discovery.devices[].status, so the LAN data surface can be measured empirically on real hardware before LAN control is built.

- api/lan.py: async_probe_lan_devstatus() + _DevStatusProtocol. One shared 4002 socket (group-joined, so it catches both unicast and multicast replies) and one bounded collection window regardless of device count; probes are capped at 64 devices. The whole devStatus data dict is captured (no allowlist), so a firmware that returns more than the four verified fields (onOff/brightness/color/colorTemInKelvin) is not silently discarded. ptReal is deliberately not probed: it is write-only and state-changing.
- diagnostics.py: per-device status sub-dict (None for non-responders) plus probe_attempted/probe_response_count/probe_error. Redaction still uses the unchanged key-name TO_REDACT list, so ip/device/mac keys echoed inside a reply are redacted automatically.
- docs: note the verified readable-vs-writable LAN surface in §6.
- tests: +11 unit, +5 integration (including the unknown-field-preserved and echoed-ip/device-redacted guarantees).

Verified against Galorhallen/govee-local-api and wez/govee2mqtt.

Claude-Session: https://claude.ai/code/session_01QVkrSto5stGSV1NNS5pmKM
1 parent 1584ba1 commit df88e44

5 files changed

Lines changed: 524 additions & 124 deletions


custom_components/govee/api/lan.py

Lines changed: 142 additions & 24 deletions
```diff
@@ -1,27 +1,49 @@
-"""Read-only Govee LAN (UDP) discovery helper.
+"""Read-only Govee LAN (UDP) observation helper.
 
 Govee exposes a local UDP control API (must be toggled on per device in the
-Govee app) on a subset of mostly-light SKUs. This module performs ONLY the
-discovery half — a single bounded multicast ``scan`` — so a user can capture
-which of their devices answer on the LAN and what they report, and attach it to
-a diagnostics download. That community data is the prerequisite for the full
-LAN transport requested in issue #57 (the maintainer has no LAN hardware to
-test against).
-
-Deliberately scoped: no control commands, no entities, no persistent socket —
-one scan, collect responses for a short timeout, return them. Protocol per
-``docs/govee-protocol-reference.md`` §6:
-
-- Scan request -> 239.255.255.250:4001 ``{"msg":{"cmd":"scan",...}}``
-- Scan response -> 239.255.255.250:4002 ``{"msg":{"cmd":"scan","data":{...}}}``
+Govee app) on a subset of mostly-light SKUs. This module performs ONLY read-only
+probes so a user can capture which of their devices answer on the LAN and what
+state they report, and attach it to a diagnostics download. That community data
+is the prerequisite for the full LAN transport requested in issue #57 (the
+maintainer has no LAN hardware to test against).
+
+Two probes, both safe to attach to a diagnostics download:
+
+- ``async_scan_lan_devices`` — one bounded multicast ``scan`` (discovery): which
+  devices answer and their identity/firmware metadata.
+- ``async_probe_lan_devstatus`` — a unicast ``devStatus`` query per discovered
+  device, capturing its full runtime reply so we can measure empirically how
+  much state the LAN API actually exposes. Verified against
+  ``Galorhallen/govee-local-api`` and ``wez/govee2mqtt``, a ``devStatus`` reply
+  carries exactly four runtime fields — ``onOff``, ``brightness``, ``color`` and
+  ``colorTemInKelvin`` — but we capture the whole ``data`` dict so a firmware
+  that returns more is not silently discarded (the entire point is discovery).
+
+``ptReal`` (the BLE-over-WiFi passthrough that drives scenes/segments/music) is
+deliberately NOT probed: both reference libraries send it fire-and-forget with
+no response to read back, and emitting one is a state-changing control write —
+forbidden in this read-only module. So scene/segment/music/sensor state is
+simply not readable over the LAN API; only the four ``devStatus`` fields and the
+discovery metadata are.
+
+Deliberately scoped: no control writes, no entities, no persistent socket — each
+call opens a socket, collects responses for a short timeout, and returns them.
+Protocol per ``docs/govee-protocol-reference.md`` §6:
+
+- Scan request -> 239.255.255.250:4001 ``{"msg":{"cmd":"scan",...}}``
+- Scan response -> 239.255.255.250:4002 ``{"msg":{"cmd":"scan","data":{...}}}``
+- devStatus query -> <device-ip>:4003 ``{"msg":{"cmd":"devStatus","data":{}}}``
+- devStatus reply -> our :4002 (unicast OR multicast, firmware-dependent)
 
 Critical protocol detail (the reason early builds returned zero devices, issue
 #57): a Govee device sends its scan *response* as **multicast** to the group on
 port 4002 — it does NOT unicast the reply back to the sender. So the receive
 socket MUST join the ``239.255.255.250`` group via ``IP_ADD_MEMBERSHIP`` or the
 kernel silently drops every reply before it reaches us. Binding port 4002 alone
-is not enough. This mirrors ``govee-local-api`` (the library behind Home
-Assistant's ``govee_light_local``) and ``wez/govee2mqtt``.
+is not enough. The devStatus probe reuses the same group-joined 4002 socket so
+it catches replies whether a given firmware answers unicast or multicast. This
+mirrors ``govee-local-api`` (the library behind Home Assistant's
+``govee_light_local``) and ``wez/govee2mqtt``.
 """
 
 from __future__ import annotations
```
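The docstring hinges on one socket option: the receive socket must join 239.255.255.250 with `IP_ADD_MEMBERSHIP`, not merely bind port 4002. The `_build_socket`, `_join_group` and `_drop_group` helpers that do this are not part of this diff, so what follows is only a standard-library sketch of such a group-joined receive socket. `make_mreq`, `open_group_socket` and `close_group_socket` are hypothetical names, and skipping a failed join per adapter (instead of raising) is an assumption about how the real helpers behave.

```python
from __future__ import annotations

import socket

LAN_MULTICAST_GROUP = "239.255.255.250"


def make_mreq(group: str, iface: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: 4-byte group address followed by 4-byte interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def open_group_socket(port: int, interfaces: list[str]) -> tuple[socket.socket, list[str]]:
    """Bind a UDP socket on ``port`` and join the Govee group on each interface.

    INADDR_ANY ("0.0.0.0") is always tried as a catch-all. A join that fails on
    one adapter is skipped, so the caller still gets a listening socket.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))  # wildcard bind so datagrams sent to a joined group are delivered
    joined: list[str] = []
    for iface in dict.fromkeys([*interfaces, "0.0.0.0"]):  # de-duplicate, keep order
        try:
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, make_mreq(LAN_MULTICAST_GROUP, iface))
            joined.append(iface)
        except OSError:
            continue  # e.g. no multicast route, or already joined on this adapter
    return sock, joined


def close_group_socket(sock: socket.socket, joined: list[str]) -> None:
    """Leave every joined group (IP_DROP_MEMBERSHIP) before closing the socket."""
    for iface in joined:
        try:
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, make_mreq(LAN_MULTICAST_GROUP, iface))
        except OSError:
            pass
    sock.close()
```

Without the `IP_ADD_MEMBERSHIP` call the bind still succeeds, which is exactly why the failure mode in issue #57 was silent: the socket looks healthy but the kernel never hands it the group traffic.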
```diff
@@ -44,15 +66,25 @@
 LAN_MULTICAST_GROUP = "239.255.255.250"
 LAN_DISCOVERY_PORT = 4001  # devices listen here for the scan request
 LAN_RESPONSE_PORT = 4002  # devices multicast scan responses here; we listen
+LAN_COMMAND_PORT = 4003  # devices listen here for unicast devStatus/control
 LAN_MULTICAST_TTL = 2  # let a scan / reply cross at most one router hop
 
+# devStatus probe budget. All probes share ONE socket and ONE collection window
+# (sends are fire-and-forget; replies arrive asynchronously), so total wall time
+# is bounded by the window regardless of device_count — 11 devices cost the same
+# ~2s as one. The cap bounds send-loop work against a large CIDR sweep, not wait.
+LAN_PROBE_WINDOW = 2.0  # seconds to collect all devStatus replies
+LAN_PROBE_MAX_DEVICES = 64  # hard cap on how many IPs we probe in one batch
+
 # INADDR_ANY: join/egress on the kernel's default-route interface. Always added
 # alongside any explicit interface IPs as a catch-all for single-NIC hosts.
 _DEFAULT_INTERFACE = "0.0.0.0"
 
-_SCAN_REQUEST = json.dumps(
-    {"msg": {"cmd": "scan", "data": {"account_topic": "reserve"}}}
-).encode("utf-8")
+_SCAN_REQUEST = json.dumps({"msg": {"cmd": "scan", "data": {"account_topic": "reserve"}}}).encode("utf-8")
+
+# Empty-data devStatus query; matches DevStatusMessage in govee-local-api and
+# Request::DevStatus{} in wez/govee2mqtt. Sent unicast to <device-ip>:4003.
+_DEVSTATUS_REQUEST = json.dumps({"msg": {"cmd": "devStatus", "data": {}}}).encode("utf-8")
 
 # Packed multicast group address, reused for every IP_ADD/DROP_MEMBERSHIP call.
 _GROUP_BYTES = socket.inet_aton(LAN_MULTICAST_GROUP)
```
```diff
@@ -106,8 +138,7 @@ def _add(address: str) -> None:
         network = IPv4Network(token, strict=False)
         if network.num_addresses > MAX_LAN_TARGET_ADDRESSES:
             raise LanTargetError(
-                f"Subnet {token} is larger than /24 — list device IPs "
-                "or a /24 (or smaller) subnet instead."
+                f"Subnet {token} is larger than /24 — list device IPs " "or a /24 (or smaller) subnet instead."
             )
         for host in network.hosts():
             _add(str(host))
```
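The subnet guard can be read in isolation. This sketch assumes `MAX_LAN_TARGET_ADDRESSES` is 256 (its value is not shown in the diff; the error message implies a /24 limit), and `expand_subnet` plus the local `LanTargetError` class are hypothetical stand-ins used only to show which CIDR tokens expand and which are rejected.

```python
from __future__ import annotations

from ipaddress import IPv4Network

MAX_LAN_TARGET_ADDRESSES = 256  # assumption: value implied by the "/24" error message


class LanTargetError(ValueError):
    """Hypothetical stand-in for the integration's exception of the same name."""


def expand_subnet(token: str) -> list[str]:
    """Expand one CIDR token into host IPs, refusing anything larger than a /24."""
    network = IPv4Network(token, strict=False)  # strict=False: "192.168.1.77/24" is accepted
    if network.num_addresses > MAX_LAN_TARGET_ADDRESSES:
        raise LanTargetError(f"Subnet {token} is larger than /24; list device IPs or a /24 (or smaller) subnet instead.")
    # For ordinary subnets, hosts() skips the network and broadcast addresses.
    return [str(host) for host in network.hosts()]
```

A /24 passes with 254 hosts while a /23 (512 addresses) is refused, which keeps the later devStatus send loop well inside the probe cap's neighbourhood instead of sweeping thousands of addresses.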
```diff
@@ -156,6 +187,40 @@ def error_received(self, exc: Exception) -> None:  # pragma: no cover - rare
         _LOGGER.debug("LAN scan socket error: %s", exc)
 
 
+class _DevStatusProtocol(asyncio.DatagramProtocol):
+    """Collects raw Govee ``devStatus`` replies, keyed by responder IP.
+
+    Separate from ``_ScanProtocol`` because that one hard-drops ``cmd != "scan"``.
+    Captures the ENTIRE ``data`` dict (no field allowlist) — the purpose of the
+    probe is to discover what firmware actually returns, so an allowlist would
+    throw away exactly the signal we want. Redaction happens downstream in
+    diagnostics ``_redact`` (key-name based: any ``ip``/``device``/``mac`` key a
+    firmware echoes inside ``data`` is auto-redacted there).
+    """
+
+    def __init__(self) -> None:
+        self.responses: dict[str, dict[str, Any]] = {}
+
+    def datagram_received(self, data: bytes, addr: tuple[str, int]) -> None:
+        try:
+            payload = json.loads(data.decode("utf-8", errors="replace"))
+            msg = payload.get("msg", {})
+            if msg.get("cmd") != "devStatus":
+                return  # ignore scan replies / unrelated multicast noise
+            body = msg.get("data", {})
+            if not isinstance(body, dict):
+                return
+        except (ValueError, AttributeError):
+            return
+        # Key by the datagram SOURCE IP — correct for both reply paths: a
+        # unicast reply to our 4002 source and a multicast reply to the group
+        # both carry the device's own IP as the UDP source. Last reply wins.
+        self.responses[addr[0]] = body
+
+    def error_received(self, exc: Exception) -> None:  # pragma: no cover - rare
+        _LOGGER.debug("LAN devStatus socket error: %s", exc)
+
+
 def _build_socket() -> socket.socket:
     """Create the bound UDP receive socket for the scan (raises ``OSError``).
 
```
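`_DevStatusProtocol.datagram_received` is effectively a filter followed by a dict write, so its behaviour can be checked without any socket. The sketch below lifts that logic into a hypothetical pure function `parse_devstatus` plus a `collect` loop keyed by source IP. It keeps the diff's exception handling: `json.JSONDecodeError` is a `ValueError`, calling `.get` on a JSON list or string raises `AttributeError`, and decoding with `errors="replace"` means malformed UTF-8 never raises.

```python
from __future__ import annotations

import json
from typing import Any


def parse_devstatus(datagram: bytes) -> dict[str, Any] | None:
    """Return the ``data`` dict of a devStatus reply, or None for anything else."""
    try:
        payload = json.loads(datagram.decode("utf-8", errors="replace"))
        msg = payload.get("msg", {})
        if msg.get("cmd") != "devStatus":
            return None  # scan replies and other multicast chatter
        body = msg.get("data", {})
    except (ValueError, AttributeError):  # bad JSON, or a non-dict payload/msg
        return None
    return body if isinstance(body, dict) else None


def collect(datagrams: list[tuple[bytes, tuple[str, int]]]) -> dict[str, dict[str, Any]]:
    """Key replies by UDP source IP; a later reply from the same IP replaces an earlier one."""
    responses: dict[str, dict[str, Any]] = {}
    for data, addr in datagrams:
        body = parse_devstatus(data)
        if body is not None:
            responses[addr[0]] = body
    return responses
```

Keying on `addr[0]` rather than on anything inside the payload is what lets one collector serve both reply paths, since unicast and multicast replies carry the same source address.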
```diff
@@ -226,9 +291,7 @@ def _send_scan(
     """
     for iface in interfaces or [_DEFAULT_INTERFACE]:
         try:
-            sock.setsockopt(
-                socket.IPPROTO_IP, socket.IP_MULTICAST_IF, socket.inet_aton(iface)
-            )
+            sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF, socket.inet_aton(iface))
         except OSError as err:  # pragma: no cover - bad interface, try the send anyway
             _LOGGER.debug("LAN scan: egress select on %s failed: %s", iface, err)
         transport.sendto(_SCAN_REQUEST, (LAN_MULTICAST_GROUP, LAN_DISCOVERY_PORT))
```
```diff
@@ -290,3 +353,58 @@ async def async_scan_lan_devices(
         transport.close()
 
     return list(protocol.responses.values())
+
+
+async def async_probe_lan_devstatus(
+    ips: list[str],
+    timeout: float = LAN_PROBE_WINDOW,
+    interface_ips: list[str] | None = None,
+) -> dict[str, dict[str, Any]]:
+    """Unicast ``devStatus`` to each IP and collect raw replies for ``timeout`` s.
+
+    Returns ``{responder_ip: raw_data_dict}`` capturing the WHOLE reply body for
+    each device that answers — the probe exists to measure the real LAN data
+    surface, so no field allowlist is applied here (redaction is downstream in
+    diagnostics). A device that discovers but does not answer ``devStatus`` (LAN
+    control disabled in the app, BLE-only SKU) simply has no entry — the caller
+    treats a missing IP as "no status".
+
+    Sends are fire-and-forget to ``<ip>:4003``; replies may return unicast to our
+    4002 source OR multicast to ``239.255.255.250:4002`` depending on firmware,
+    so we reuse the scan socket pattern (bound 4002 + group-joined) to catch
+    both. All probes share one socket and one collection window, so total wall
+    time is bounded by ``timeout`` regardless of device count. ``ips`` is capped
+    at ``LAN_PROBE_MAX_DEVICES`` so a large ``extra_targets`` sweep cannot blow up
+    the send loop.
+
+    ``interface_ips`` join the multicast group on each adapter (multi-homed
+    coverage), mirroring ``async_scan_lan_devices``.
+
+    Raises ``OSError`` if the response socket cannot be bound (port 4002 held by
+    a non-sharing local-control app); callers should treat that as "no data",
+    the same contract as ``async_scan_lan_devices``.
+    """
+    if not ips:
+        return {}
+
+    interfaces = list(interface_ips or [])
+    targets = ips[:LAN_PROBE_MAX_DEVICES]
+
+    loop = asyncio.get_running_loop()
+    sock = _build_socket()  # raises OSError if port 4002 cannot be bound
+    joined = _join_group(sock, interfaces)  # catch multicast replies too
+
+    transport, protocol = await loop.create_datagram_endpoint(_DevStatusProtocol, sock=sock)
+    assert isinstance(protocol, _DevStatusProtocol)
+    try:
+        for ip in targets:
+            try:
+                transport.sendto(_DEVSTATUS_REQUEST, (ip, LAN_COMMAND_PORT))
+            except OSError as err:  # one bad/unreachable IP must not abort the batch
+                _LOGGER.debug("LAN probe: send to %s failed: %s", ip, err)
+        await asyncio.sleep(timeout)
+    finally:
+        _drop_group(sock, joined)
+        transport.close()
+
+    return dict(protocol.responses)
```
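The central design choice in `async_probe_lan_devstatus` is send-all-then-wait-once: every query is fired before a single `asyncio.sleep(timeout)`, so wall time does not grow with the number of targets. Below is a minimal loopback sketch of that pattern. It is a stand-in under stated assumptions, not the integration's code: a hypothetical `_FakeDevice` plays the lamp on an ephemeral 127.0.0.1 port and answers unicast, the reply filter is simplified, and there is no multicast group join, no fixed 4002/4003 ports and no device cap.

```python
from __future__ import annotations

import asyncio
import json
from typing import Any

DEVSTATUS_REQUEST = json.dumps({"msg": {"cmd": "devStatus", "data": {}}}).encode("utf-8")


class _Collector(asyncio.DatagramProtocol):
    """Keeps the ``data`` dict of every devStatus reply, keyed by source IP."""

    def __init__(self) -> None:
        self.responses: dict[str, dict[str, Any]] = {}

    def datagram_received(self, data: bytes, addr: tuple[str, int]) -> None:
        try:
            msg = json.loads(data)["msg"]
            if msg["cmd"] == "devStatus" and isinstance(msg["data"], dict):
                self.responses[addr[0]] = msg["data"]
        except (ValueError, KeyError, TypeError):
            pass  # not a well-formed devStatus reply


class _FakeDevice(asyncio.DatagramProtocol):
    """Hypothetical stand-in for a lamp: answers devStatus with a fixed state, unicast."""

    def __init__(self, state: dict[str, Any]) -> None:
        self.state = state
        self.transport: Any = None

    def connection_made(self, transport: Any) -> None:
        self.transport = transport

    def datagram_received(self, data: bytes, addr: tuple[str, int]) -> None:
        if json.loads(data)["msg"]["cmd"] == "devStatus":
            reply = {"msg": {"cmd": "devStatus", "data": self.state}}
            self.transport.sendto(json.dumps(reply).encode("utf-8"), addr)


async def probe(targets: list[tuple[str, int]], window: float) -> dict[str, dict[str, Any]]:
    """Fire every query first, then wait ONE window for all replies."""
    loop = asyncio.get_running_loop()
    transport, collector = await loop.create_datagram_endpoint(_Collector, local_addr=("127.0.0.1", 0))
    try:
        for target in targets:
            transport.sendto(DEVSTATUS_REQUEST, target)  # fire-and-forget
        await asyncio.sleep(window)  # the only wait, however many targets there are
    finally:
        transport.close()
    return dict(collector.responses)


async def demo() -> dict[str, dict[str, Any]]:
    loop = asyncio.get_running_loop()
    state = {"onOff": 1, "brightness": 42, "color": {"r": 255, "g": 0, "b": 0}, "colorTemInKelvin": 0}
    device, _ = await loop.create_datagram_endpoint(lambda: _FakeDevice(state), local_addr=("127.0.0.1", 0))
    try:
        port = device.get_extra_info("sockname")[1]
        return await probe([("127.0.0.1", port)], window=0.3)
    finally:
        device.close()
```

`asyncio.run(demo())` returns a one-entry dict keyed by `127.0.0.1` holding the fake lamp's four fields. Adding more targets adds more `sendto` calls but no extra waiting, which is the property the `LAN_PROBE_WINDOW` comment relies on.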

custom_components/govee/diagnostics.py

Lines changed: 38 additions & 17 deletions
```diff
@@ -21,7 +21,12 @@
 from homeassistant.core import HomeAssistant
 from homeassistant.helpers.device_registry import DeviceEntry
 
-from .api.lan import LanTargetError, async_scan_lan_devices, expand_lan_targets
+from .api.lan import (
+    LanTargetError,
+    async_probe_lan_devstatus,
+    async_scan_lan_devices,
+    expand_lan_targets,
+)
 from .const import (
     CONF_API_KEY,
     CONF_EMAIL,
```
```diff
@@ -91,10 +96,7 @@ def _anon_id(value: str) -> str:
 
 def _anonymize_device_keys(data: dict[str, Any]) -> dict[str, Any]:
     """Replace MAC-format dict keys with stable hashes; leave other keys intact."""
-    return {
-        (_anonymize_device_id(k) if isinstance(k, str) and _looks_like_mac(k) else k): v
-        for k, v in data.items()
-    }
+    return {(_anonymize_device_id(k) if isinstance(k, str) and _looks_like_mac(k) else k): v for k, v in data.items()}
 
 
 def _serialize_state(state: Any) -> dict[str, Any] | None:
```
```diff
@@ -260,9 +262,7 @@ async def _lan_source_interfaces(hass: HomeAssistant) -> tuple[list[str], list[s
     return ips, classes
 
 
-async def _lan_discovery_diag(
-    hass: HomeAssistant, lan_targets_raw: str = ""
-) -> dict[str, Any]:
+async def _lan_discovery_diag(hass: HomeAssistant, lan_targets_raw: str = "") -> dict[str, Any]:
     """Run one read-only LAN scan for the diagnostics download (issue #57).
 
     Captures which of the user's devices answer Govee's local UDP discovery and
```
```diff
@@ -293,15 +293,40 @@ async def _lan_discovery_diag(
         "interface_classes": interface_classes,
         "extra_target_count": len(extra_targets),
         "error": None,
+        "probe_attempted": False,
+        "probe_response_count": 0,
+        "probe_error": None,
     }
     try:
-        devices = await async_scan_lan_devices(
-            interface_ips=interface_ips, extra_targets=extra_targets
-        )
+        devices = await async_scan_lan_devices(interface_ips=interface_ips, extra_targets=extra_targets)
         result["device_count"] = len(devices)
         result["devices"] = devices
     except Exception as err:  # never break the diagnostics download
         result["error"] = str(err)
+        return result
+
+    # MAX-DATA probe: query each discovered device for its full LAN runtime state
+    # (issue #57 follow-up). The whole devStatus reply is captured per device so
+    # we can measure empirically how much state the LAN API exposes — building
+    # toward LAN-primary control. Best-effort and never-raise, same contract as
+    # the scan. Each devStatus field (onOff/brightness/color/colorTemInKelvin) is
+    # non-PII; any ip/device/mac key a firmware echoes is auto-redacted by the
+    # shared _redact pass. NOTE: if community downloads surface a NEW PII key
+    # (hostname, ssid, wifi MAC under a fresh key name), add it to TO_REDACT.
+    probe_ips = [d["ip"] for d in devices if d.get("ip")]
+    result["probe_attempted"] = bool(probe_ips)
+    status_by_ip: dict[str, dict[str, Any]] = {}
+    if probe_ips:
+        try:
+            status_by_ip = await async_probe_lan_devstatus(probe_ips, interface_ips=interface_ips)
+        except Exception as err:  # never break the diagnostics download
+            result["probe_error"] = str(err)
+    result["probe_response_count"] = len(status_by_ip)
+    # Attach each raw reply onto its device record by IP; non-responders get
+    # status=None so the community data shows the responder ratio explicitly.
+    for device in devices:
+        device_ip = device.get("ip")
+        device["status"] = status_by_ip.get(device_ip) if isinstance(device_ip, str) else None
     return result
 
 
```
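The attach step at the end of `_lan_discovery_diag` can be exercised as a pure function. In this hypothetical `attach_status` sketch, the first returned number mirrors `len(status_by_ip)` exactly as the diff computes `probe_response_count`, while the second counts only replies that landed on a discovered device. Because the probe listens on a group-joined socket, a `devStatus` reply from an IP that was not in `probe_ips` could in principle be collected, in which case the two numbers differ.

```python
from __future__ import annotations

from typing import Any


def attach_status(devices: list[dict[str, Any]], status_by_ip: dict[str, dict[str, Any]]) -> tuple[int, int]:
    """Set each device's "status" to its devStatus reply, or None for non-responders.

    Returns (response_count, matched): every IP that replied vs. replies that
    landed on a discovered device.
    """
    matched = 0
    for device in devices:
        device_ip = device.get("ip")
        status = status_by_ip.get(device_ip) if isinstance(device_ip, str) else None
        device["status"] = status
        if status is not None:
            matched += 1
    return len(status_by_ip), matched
```

Writing `None` explicitly for non-responders, rather than omitting the key, is what makes the responder ratio visible in a downloaded diagnostics file.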

```diff
@@ -341,9 +366,7 @@ async def async_get_config_entry_diagnostics(
         "raw_api_devices": coordinator.api_client.last_raw_devices,
         "leak_sensors": _leak_diag(coordinator),
         # Read-only local-network scan to seed the LAN-API work (issue #57).
-        "lan_discovery": await _lan_discovery_diag(
-            hass, entry.options.get(CONF_LAN_TARGETS, "")
-        ),
+        "lan_discovery": await _lan_discovery_diag(hass, entry.options.get(CONF_LAN_TARGETS, "")),
         **_runtime_diag(coordinator),
     }
     return _redact(diagnostics_data)
```
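The commit relies on redaction being key-name based at every depth, so an `ip`, `device` or `mac` key echoed inside `status` is caught without new code. Below is a minimal sketch of that behaviour, assuming a recursive dict/list walk; `redact` and this three-key `TO_REDACT` set are hypothetical stand-ins for the integration's own `_redact` and its larger key list, and the `**REDACTED**` placeholder is an assumption. The last assertion in spirit illustrates the NOTE in `_lan_discovery_diag`: a PII value under a new key name such as `hostname` passes through until that key is added.

```python
from __future__ import annotations

from typing import Any

TO_REDACT = frozenset({"ip", "device", "mac"})  # assumption: the real set is larger
REDACTED = "**REDACTED**"


def redact(value: Any, keys: frozenset[str] = TO_REDACT) -> Any:
    """Recursively replace the value of any dict key in ``keys``, at any depth."""
    if isinstance(value, dict):
        return {k: (REDACTED if k in keys else redact(v, keys)) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(item, keys) for item in value]
    return value
```

Matching on key names rather than on value shapes is what lets an unknown firmware field pass through untouched (useful for discovery) while any echoed identifier under a known key is still masked.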
```diff
@@ -393,9 +416,7 @@ async def async_get_device_diagnostics(
         "name": device.name_by_user or device.name,
         # Anonymize the MAC inside each (domain, id) identifier tuple — it is
         # a list element, so async_redact_data's key match won't reach it.
-        "identifiers": [
-            [dom, _anon_id(ident)] for dom, ident in device.identifiers
-        ],
+        "identifiers": [[dom, _anon_id(ident)] for dom, ident in device.identifiers],
         "model": device.model,
         "sw_version": device.sw_version,
         "hw_version": device.hw_version,
```

docs/govee-protocol-reference.md

Lines changed: 12 additions & 0 deletions
````diff
@@ -1233,6 +1233,18 @@ Local network control without cloud dependency. Must be enabled in Govee app dev
 }
 ```
 
+> **Readable vs writable over LAN (verified against `Galorhallen/govee-local-api`
+> + `wez/govee2mqtt`):** `devStatus` is the ONLY read path and it returns exactly
+> these four runtime fields — `onOff`, `brightness` (0–100), `color {r,g,b}`
+> (whole-device), `colorTemInKelvin`. There is **no** scene, segment, music, DIY,
+> or sensor field. `turn`/`brightness`/`colorwc`/`ptReal` are **write-only** (no
+> response body); `ptReal` is fire-and-forget, so active scene, segment colors,
+> music mode and DIY state are **not readable over LAN** — they must come from
+> MQTT/API/optimistic state. The `lan_discovery` diagnostics block probes each
+> discovered device with `devStatus` (capturing the whole reply, to catch any
+> firmware that returns more) so this can be confirmed empirically on real
+> hardware before LAN control is built (issue #57).
+
 ### 5.4 BLE Passthrough (ptReal)
 
 Send BLE commands through WiFi for devices supporting it:
````
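Because the probe stores the whole `data` dict, reviewing community downloads comes down to separating the four documented fields from anything unexpected. A sketch of that triage step follows; `split_devstatus` is a hypothetical helper, and the `scene` key in the usage test is invented purely to show an unknown field surfacing, not a field any firmware is known to return.

```python
from __future__ import annotations

from typing import Any

VERIFIED_FIELDS = ("onOff", "brightness", "color", "colorTemInKelvin")


def split_devstatus(data: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split a raw devStatus ``data`` dict into the four verified fields and any extras."""
    known = {k: data[k] for k in VERIFIED_FIELDS if k in data}
    extra = {k: v for k, v in data.items() if k not in VERIFIED_FIELDS}
    return known, extra
```

An empty `extra` across many downloads would support the note above; any non-empty `extra` is exactly the signal the whole-reply capture was designed to preserve.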
