1111
1212- ``async_scan_lan_devices`` — one bounded multicast ``scan`` (discovery): which
1313 devices answer and their identity/firmware metadata.
14- - ``async_probe_lan_devstatus`` — a unicast ``devStatus`` query per discovered
15- device, capturing its full runtime reply so we can measure empirically how
16- much state the LAN API actually exposes. Verified against
17- ``Galorhallen/govee-local-api`` and ``wez/govee2mqtt``, a ``devStatus`` reply
18- carries exactly four runtime fields — ``onOff``, ``brightness``, ``color`` and
19- ``colorTemInKelvin`` — but we capture the whole ``data`` dict so a firmware
20- that returns more is not silently discarded (the entire point is discovery).
21-
22- ``ptReal`` (the BLE-over-WiFi passthrough that drives scenes/segments/music) is
23- deliberately NOT probed: both reference libraries send it fire-and-forget with
24- no response to read back, and emitting one is a state-changing control write —
25- forbidden in this read-only module. So scene/segment/music/sensor state is
26- simply not readable over the LAN API; only the four ``devStatus`` fields and the
27- discovery metadata are.
14+ - ``async_probe_lan_raw`` — a "reality probe": fire a battery of safe READ-ONLY
15+ queries (``devStatus`` + ``status`` + a unicast ``scan``) at each discovered
16+ device and capture **every** datagram it emits, completely unfiltered — whole
17+ payload, any command, any field, even undecodable bytes. We do NOT trust any
18+ other integration's notion of which commands exist or which fields a reply
19+ carries (``govee-local-api`` parses only 4 ``devStatus`` fields and never sends
20+ ``status`` at all); the point is to measure on real hardware what the firmware
21+ actually exposes rather than inherit someone else's parser. In particular the
22+ ``status`` command's ``pt`` (BLE-passthrough hex) field may carry
23+ segment/scene/sensor state that the 4-field ``devStatus`` omits.
24+
25+ ``ptReal`` and the other control verbs (``turn``/``brightness``/``colorwc``) are
26+ deliberately NOT sent: they are state-changing writes, forbidden in this
27+ read-only module. Capturing what the device *volunteers* in response to read
28+ queries is the safe way to map the surface.
2829
2930Deliberately scoped: no control writes, no entities, no persistent socket — each
3031call opens a socket, collects responses for a short timeout, and returns them.
3132Protocol per ``docs/govee-protocol-reference.md`` §6:
3233
3334- Scan request -> 239.255.255.250:4001 ``{"msg":{"cmd":"scan",...}}``
3435- Scan response -> 239.255.255.250:4002 ``{"msg":{"cmd":"scan","data":{...}}}``
35- - devStatus query -> <device-ip>:4003 ``{"msg":{"cmd":"devStatus","data":{}}}``
36- - devStatus reply -> our :4002 (unicast OR multicast, firmware-dependent)
36+ - Read queries -> <device-ip>:4003/4001 ``{"msg":{"cmd":"devStatus|status|scan","data":{}}}``
37+ - Replies -> our :4002 (unicast OR multicast, firmware-dependent)
3738
3839Critical protocol detail (the reason early builds returned zero devices, issue
3940#57): a Govee device sends its scan *response* as **multicast** to the group on
4041port 4002 — it does NOT unicast the reply back to the sender. So the receive
4142socket MUST join the ``239.255.255.250`` group via ``IP_ADD_MEMBERSHIP`` or the
4243kernel silently drops every reply before it reaches us. Binding port 4002 alone
43- is not enough. The devStatus probe reuses the same group-joined 4002 socket so
44- it catches replies whether a given firmware answers unicast or multicast. This
44+ is not enough. The reality probe reuses the same group-joined 4002 socket so it
45+ catches replies whether a given firmware answers unicast or multicast. This
4546mirrors ``govee-local-api`` (the library behind Home Assistant's
4647``govee_light_local``) and ``wez/govee2mqtt``.
4748"""
6970LAN_COMMAND_PORT = 4003 # devices listen here for unicast devStatus/control
7071LAN_MULTICAST_TTL = 2 # let a scan / reply cross at most one router hop
7172
72- # devStatus probe budget. All probes share ONE socket and ONE collection window
73+ # Reality-probe budget. All probes share ONE socket and ONE collection window
7374# (sends are fire-and-forget; replies arrive asynchronously), so total wall time
7475# is bounded by the window regardless of device_count — 11 devices cost the same
7576# ~2s as one. The cap bounds send-loop work against a large CIDR sweep, not wait.
76- LAN_PROBE_WINDOW = 2.0 # seconds to collect all devStatus replies
77+ LAN_PROBE_WINDOW = 2.5 # seconds to collect all probe replies
7778LAN_PROBE_MAX_DEVICES = 64 # hard cap on how many IPs we probe in one batch
79+ LAN_PROBE_MAX_REPLIES_PER_IP = 32 # guard against a chatty device flooding output
7880
7981# INADDR_ANY: join/egress on the kernel's default-route interface. Always added
8082# alongside any explicit interface IPs as a catch-all for single-NIC hosts.
8183_DEFAULT_INTERFACE = "0.0.0.0"
8284
8385_SCAN_REQUEST = json.dumps({"msg": {"cmd": "scan", "data": {"account_topic": "reserve"}}}).encode("utf-8")
8486
85- # Empty-data devStatus query; matches DevStatusMessage in govee-local-api and
86- # Request::DevStatus{} in wez/govee2mqtt. Sent unicast to <device-ip>:4003.
87- _DEVSTATUS_REQUEST = json.dumps({"msg": {"cmd": "devStatus", "data": {}}}).encode("utf-8")
87+ # Read-only LAN query battery for the reality probe (issue #57). We do NOT trust
88+ # any other integration's field/command list — we send every safe READ query we
89+ # know of and capture whatever the hardware actually emits, so the real LAN data
90+ # surface is measured, not assumed. STRICTLY read-only: NO writes
91+ # (turn/brightness/colorwc/ptReal) — a diagnostics probe must never mutate device
92+ # state. Each entry is ``(cmd, port, data)`` sent unicast; ``data`` sets no control
93+ # parameters (empty, except ``scan``'s ``account_topic``). Replies are captured raw on 4002 regardless of cmd.
94+ #
95+ # - ``devStatus`` (:4003) — the documented status read (4 known fields).
96+ # - ``status`` (:4003) — undocumented in HA libs but a ``StatusResponse`` with
97+ # a ``pt`` (base64 BLE passthrough) field exists in govee-local-api yet is never
98+ # sent; it may carry segment/scene/sensor state the 4-field devStatus omits.
99+ # - ``scan`` (:4001) — unicast discovery, captured WHOLE (not the 7-field
100+ # allowlist) so any extra identity/firmware fields surface.
101+ LAN_PROBE_COMMANDS: tuple[tuple[str, int, dict[str, Any]], ...] = (
102+     ("devStatus", LAN_COMMAND_PORT, {}),
103+     ("status", LAN_COMMAND_PORT, {}),
104+     ("scan", LAN_DISCOVERY_PORT, {"account_topic": "reserve"}),
105+ )
88106
89107# Packed multicast group address, reused for every IP_ADD/DROP_MEMBERSHIP call.
90108_GROUP_BYTES = socket.inet_aton(LAN_MULTICAST_GROUP)
@@ -187,38 +205,38 @@ def error_received(self, exc: Exception) -> None: # pragma: no cover - rare
187205        _LOGGER.debug("LAN scan socket error: %s", exc)
188206
189207
190- class _DevStatusProtocol(asyncio.DatagramProtocol):
191- """Collects raw Govee ``devStatus`` replies, keyed by responder IP.
208+ class _RawProbeProtocol(asyncio.DatagramProtocol):
209+ """Captures EVERY datagram received during a probe, raw, keyed by source IP.
210+
211+ Deliberately unfiltered — no ``cmd`` check, no field allowlist, no shape
212+ assumptions. The goal is to record exactly what the hardware emits, including
213+ commands and fields no reference library parses, so the real LAN data surface
214+ is *measured* rather than inherited from another integration's parser. The
215+ whole ``{"msg": ...}`` payload is kept; an undecodable datagram is captured as
216+ a truncated ``_unparsed`` string rather than dropped (even garbage is signal).
192217
193- Separate from ``_ScanProtocol`` because that one hard-drops ``cmd != "scan"``.
194- Captures the ENTIRE ``data`` dict (no field allowlist) — the purpose of the
195- probe is to discover what firmware actually returns, so an allowlist would
196- throw away exactly the signal we want. Redaction happens downstream in
197- diagnostics ``_redact`` (key-name based: any ``ip``/``device``/``mac`` key a
198- firmware echoes inside ``data`` is auto-redacted there).
218+ Keyed by the datagram SOURCE IP — correct for both reply paths (a unicast
219+ reply to our 4002 source and a multicast reply to the group both carry the
220+ device's own IP as the UDP source). Each IP accumulates a LIST of replies so
221+ multiple commands' responses (devStatus + status + scan) are all retained.
222+ Redaction is downstream in diagnostics (key-name + value-level address scrub).
199223 """
200224
201225    def __init__(self) -> None:
202-         self.responses: dict[str, dict[str, Any]] = {}
226+         self.replies: dict[str, list[Any]] = {}
203227
204228    def datagram_received(self, data: bytes, addr: tuple[str, int]) -> None:
229+         bucket = self.replies.setdefault(addr[0], [])
230+         if len(bucket) >= LAN_PROBE_MAX_REPLIES_PER_IP:
231+             return  # chatty device / broadcast storm: keep the dump bounded
205232        try:
206-             payload = json.loads(data.decode("utf-8", errors="replace"))
207-             msg = payload.get("msg", {})
208-             if msg.get("cmd") != "devStatus":
209-                 return  # ignore scan replies / unrelated multicast noise
210-             body = msg.get("data", {})
211-             if not isinstance(body, dict):
212-                 return
213-         except (ValueError, AttributeError):
214-             return
215-         # Key by the datagram SOURCE IP, correct for both reply paths: a
216-         # unicast reply to our 4002 source and a multicast reply to the group
217-         # both carry the device's own IP as the UDP source. Last reply wins.
218-         self.responses[addr[0]] = body
233+             payload: Any = json.loads(data.decode("utf-8", errors="replace"))
234+         except ValueError:
235+             payload = {"_unparsed": data.decode("utf-8", errors="replace")[:512]}
236+         bucket.append(payload)
219237
220238    def error_received(self, exc: Exception) -> None:  # pragma: no cover - rare
221-         _LOGGER.debug("LAN devStatus socket error: %s", exc)
239+         _LOGGER.debug("LAN raw-probe socket error: %s", exc)
222240
223241
224242def _build_socket() -> socket.socket:
@@ -355,27 +373,30 @@ async def async_scan_lan_devices(
355373    return list(protocol.responses.values())
356374
357375
358- async def async_probe_lan_devstatus(
376+ async def async_probe_lan_raw(
359377    ips: list[str],
360378    timeout: float = LAN_PROBE_WINDOW,
361379    interface_ips: list[str] | None = None,
362- ) -> dict[str, dict[str, Any]]:
363- """Unicast ``devStatus`` to each IP and collect raw replies for ``timeout`` s.
364-
365- Returns ``{responder_ip: raw_data_dict}`` capturing the WHOLE reply body for
366- each device that answers — the probe exists to measure the real LAN data
367- surface, so no field allowlist is applied here (redaction is downstream in
368- diagnostics). A device that discovers but does not answer ``devStatus`` (LAN
369- control disabled in the app, BLE-only SKU) simply has no entry — the caller
370- treats a missing IP as "no status".
371-
372- Sends are fire-and-forget to ``<ip>:4003``; replies may return unicast to our
373- 4002 source OR multicast to ``239.255.255.250:4002`` depending on firmware,
374- so we reuse the scan socket pattern (bound 4002 + group-joined) to catch
375- both. All probes share one socket and one collection window, so total wall
376- time is bounded by ``timeout`` regardless of device count. ``ips`` is capped
377- at ``LAN_PROBE_MAX_DEVICES`` so a large ``extra_targets`` sweep cannot blow up
378- the send loop.
380+     commands: tuple[tuple[str, int, dict[str, Any]], ...] = LAN_PROBE_COMMANDS,
381+ ) -> dict[str, list[Any]]:
382+ """Reality probe: fire a read-only query battery at each IP, capture all replies.
383+
384+ Returns ``{responder_ip: [raw_payload, ...]}`` — the WHOLE ``{"msg": ...}`` of
385+ every datagram each device emits during the window, completely unfiltered. We
386+ do not trust any other integration's idea of which commands exist or which
387+ fields a reply carries: we send every safe READ query in ``commands`` and
388+ record exactly what comes back, so the real LAN data surface is measured. A
389+ device that does not answer simply has no entry.
390+
391+ ``commands`` is ``((cmd, port, data), ...)`` — STRICTLY read-only (default
392+ ``LAN_PROBE_COMMANDS``: devStatus + status + unicast scan). No control writes
393+ are ever sent. Each is unicast to ``<ip>:port``; replies may return unicast to
394+ our 4002 source OR multicast to ``239.255.255.250:4002`` depending on
395+ firmware, so we reuse the scan socket pattern (bound 4002 + group-joined) to
396+ catch both. All probes share one socket and one collection window, so total
397+ wall time is bounded by ``timeout`` regardless of device count. ``ips`` is
398+ capped at ``LAN_PROBE_MAX_DEVICES``; per-IP capture is capped at
399+ ``LAN_PROBE_MAX_REPLIES_PER_IP``.
379400
380401 ``interface_ips`` join the multicast group on each adapter (multi-homed
381402 coverage), mirroring ``async_scan_lan_devices``.
@@ -389,22 +410,26 @@ async def async_probe_lan_devstatus(
389410
390411    interfaces = list(interface_ips or [])
391412    targets = ips[:LAN_PROBE_MAX_DEVICES]
413+     requests = [
414+         (json.dumps({"msg": {"cmd": cmd, "data": data}}).encode("utf-8"), port) for cmd, port, data in commands
415+     ]
392416
393417    loop = asyncio.get_running_loop()
394418    sock = _build_socket()  # raises OSError if port 4002 cannot be bound
395419    joined = _join_group(sock, interfaces)  # catch multicast replies too
396420
397-     transport, protocol = await loop.create_datagram_endpoint(_DevStatusProtocol, sock=sock)
398-     assert isinstance(protocol, _DevStatusProtocol)
421+     transport, protocol = await loop.create_datagram_endpoint(_RawProbeProtocol, sock=sock)
422+     assert isinstance(protocol, _RawProbeProtocol)
399423    try:
400424        for ip in targets:
401-             try:
402-                 transport.sendto(_DEVSTATUS_REQUEST, (ip, LAN_COMMAND_PORT))
403-             except OSError as err:  # one bad/unreachable IP must not abort the batch
404-                 _LOGGER.debug("LAN probe: send to %s failed: %s", ip, err)
425+             for request, port in requests:
426+                 try:
427+                     transport.sendto(request, (ip, port))
428+                 except OSError as err:  # one bad/unreachable IP must not abort the batch
429+                     _LOGGER.debug("LAN raw probe: send to %s:%s failed: %s", ip, port, err)
405430        await asyncio.sleep(timeout)
406431    finally:
407432        _drop_group(sock, joined)
408433        transport.close()
409434
410-     return dict(protocol.responses)
435+     return dict(protocol.replies)
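
The capture rule in ``_RawProbeProtocol.datagram_received`` and the ``{responder_ip: [raw_payload, ...]}`` return shape can be exercised without a socket. The sketch below is illustrative only and is not part of this commit: ``capture`` restates the bounded, unfiltered append as a free function, and ``summarize`` is a hypothetical helper showing how a caller might list which commands and ``data`` keys each device actually returned. The constant ``MAX_REPLIES_PER_IP`` stands in for ``LAN_PROBE_MAX_REPLIES_PER_IP``, and the addresses are documentation-range placeholders.

```python
import json
from typing import Any

MAX_REPLIES_PER_IP = 32  # stand-in for LAN_PROBE_MAX_REPLIES_PER_IP


def capture(replies: dict[str, list[Any]], data: bytes, source_ip: str) -> None:
    """Append one datagram to its source IP's bucket, unfiltered but bounded."""
    bucket = replies.setdefault(source_ip, [])
    if len(bucket) >= MAX_REPLIES_PER_IP:
        return  # bucket full: ignore further datagrams from this source
    try:
        payload: Any = json.loads(data.decode("utf-8", errors="replace"))
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        payload = {"_unparsed": data.decode("utf-8", errors="replace")[:512]}
    bucket.append(payload)


def summarize(replies: dict[str, list[Any]]) -> dict[str, dict[str, list[str]]]:
    """Hypothetical helper: per IP, map each reply ``cmd`` to the data keys seen."""
    summary: dict[str, dict[str, list[str]]] = {}
    for ip, payloads in replies.items():
        per_cmd: dict[str, list[str]] = {}
        for payload in payloads:
            msg = payload.get("msg") if isinstance(payload, dict) else None
            if not isinstance(msg, dict):
                # Not a {"msg": {...}} envelope: record it without any keys.
                per_cmd.setdefault("_unparsed", [])
                continue
            cmd = str(msg.get("cmd"))
            keys = set(per_cmd.get(cmd, []))
            body = msg.get("data")
            if isinstance(body, dict):
                keys.update(body)
            per_cmd[cmd] = sorted(keys)
        summary[ip] = per_cmd
    return summary


if __name__ == "__main__":
    seen: dict[str, list[Any]] = {}
    capture(seen, b'{"msg":{"cmd":"devStatus","data":{"onOff":1,"brightness":50}}}', "192.0.2.10")
    capture(seen, b"\xff\xfenot json", "192.0.2.10")
    print(summarize(seen))
```

Keeping the summary separate from the capture step preserves the design choice in the diff: the probe records everything raw, and any interpretation happens downstream.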