Zigbee Lock: Support SLGA Migration #2939
Conversation
First, I had AI do the following:

- Rename subdrivers: using-old-capabilities -> legacy-handlers; promote using-new-capabilities handlers to top-level defaults
- Make new capability handling the driver default; the legacy-handlers subdriver overrides for pre-migration devices only
- Top-level vendor subdrivers (yale, samsungsds, yale-fingerprint-lock) now only activate for migrated devices, to avoid double-dispatch
- Remove the duplicate lock-without-codes top-level subdriver (handled entirely within legacy-handlers, since these devices never migrate)
- Add an init lifecycle handler to legacy-handlers to populate lockCodes state for pre-migration devices on driver restart

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Then, I cleaned up the driver further by fixing the git understanding of which files were created in which year, and by introducing small changes to remove some sub-subdrivers from the legacy subdriver altogether, and vice versa.
Duplicate profile check: Passed - no duplicate profiles detected.

Test Results: 72 files, 510 suites, 0s ⏱️ Results for commit d74f6a5.

Coverage report generated by 🐒 cobertura-action against d74f6a5.
```lua
local init = function(driver, device)
  lock_utils.reload_tables(device)
  device.thread:call_with_delay(15, function(d)
    reload_all_codes(device)
```
Do we have to reload all codes every time or can we put a guard on this so it's only done in the cases where it's needed?
It seems that all of the actual sends in this function are gated on whether the cache is actually empty, so I think that's effectively done.
It doesn't look like SendPINOverTheAir and GetPINCode are gated
Ah yeah, my bad. Seems like this was 1-1 copied from the current Zigbee Lock handling (which is now in legacy-handlers), and the capability cache checks have just been moved to the new capabilities.
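For reference, a guard along these lines could skip the queries entirely when the cache is already populated. This is only a rough sketch: the `lockCodes` field name and surrounding structure are assumptions, not the driver's actual code.

```lua
-- Rough sketch, not the driver's actual code: skip the per-slot reads
-- entirely when a populated cache already exists.
local function reload_all_codes(device)
  local lock_codes = device:get_field("lockCodes") -- field name assumed
  if lock_codes ~= nil and next(lock_codes) ~= nil then
    return -- cache already populated; nothing to re-read
  end
  -- otherwise fall through to the SendPINOverTheAir / GetPINCode sends as today
end
```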
```lua
if (credential_index == device:get_field(lock_utils.CHECKING_CODE)) then
  -- the credential we're checking has arrived
  local last_slot = device:get_latest_state("main", capabilities.lockCredentials.ID,
```
Are we guaranteed to have pinUsersSupported state at this point or do we need to handle the case where it's not set yet?
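If that state can be unset at this point, a guard like the following sketch might help. It is hypothetical and only mirrors the snippet above; the fallback behavior is a suggestion, not the driver's actual handling.

```lua
-- Hypothetical guard: bail out and re-read if pinUsersSupported hasn't arrived yet.
local last_slot = device:get_latest_state("main", capabilities.lockCredentials.ID,
  capabilities.lockCredentials.pinUsersSupported.NAME)
if last_slot == nil then
  -- state not populated yet; trigger a re-read instead of indexing nil
  device:send(LockCluster.attributes.NumberOfPINUsersSupported:read(device))
  return
end
```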
```lua
device:emit_event(event)
end

local function device_added(driver, device)
  if device:supports_capability_by_id(capabilities.lockCodes.ID) and device._provisioning_state == "TYPED" then
```
This provisioning-state TYPED check should only be false for devices that are driver switched. Do we need to consider the driver-switch case wrt migration? I assume this would rarely happen, and if it did, it would be a user moving from a custom driver to ours, in which case IMO we can ask them to re-onboard if anything goes wrong.
FWIW, in the matter lock driver we just have a separate handler for the driverSwitched event that mirrors do_configure. If we want to address the driver-switch case, I think we could encapsulate the configuration logic in a helper function and call that same function from both the doConfigure and driverSwitched handlers.
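Something like this sketch would capture that pattern; the function and table names here are illustrative, not the matter driver's actual code.

```lua
-- Illustrative sketch: share one configuration routine between lifecycle events.
local function configure_device(driver, device)
  -- existing do_configure logic would live here
end

local lifecycle_handlers = {
  doConfigure = configure_device,
  driverSwitched = configure_device, -- mirrors do_configure, as in the matter lock driver
}
```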
IMO driver switching locks should be treated like a LAN device, and basically not be supported officially. Should anyone try to do it and face issues, the solution is re-onboarding.
That was my thinking. Agreed that we could add a driverSwitched handler if we wanted to support that, but it should not match the normal user experience.
Also, fyi the added handler shouldn't really be related to the SWITCHED state.
```lua
device:emit_event(LockCodes.maxCodeLength(value.value, { visibility = { displayed = false } }))

local add_user_handler = function(driver, device, command)
  if lock_utils.busy_check_and_set(device, {name = lock_utils.ADD_USER, type = lock_utils.LOCK_USERS}) then
    return
```
If this happens, whats the user experience? Does the app just time out?
```lua
device:emit_event(LockCredentials.minPinCodeLen(min_code_len, { visibility = { displayed = false } }))
device:emit_event(LockCredentials.maxPinCodeLen(max_code_len, { visibility = { displayed = false } }))
device:emit_event(LockCredentials.pinUsersSupported(max_codes, { visibility = { displayed = false } }))
device:emit_event(LockCredentials.credentials(lock_credentials, { visibility = { displayed = false } }))
```
You could consider setting the credential and user tables on the device at this point. I think everything should work since the tables are reloaded when accessed, but its a bit fragile and potentially prone to race conditions to rely on the state cache to be immediately updated.
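For example, something shaped like this sketch; the field-name constants and the shape of the tables are assumptions.

```lua
-- Sketch: persist the tables directly instead of relying on the state cache
-- being updated immediately. Field-name constants here are assumptions.
device:set_field(lock_utils.CREDENTIALS, lock_credentials, { persist = true })
device:set_field(lock_utils.USERS, lock_users, { persist = true })
```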
```lua
  SLGA_MIGRATED = "slgaMigrated"
}
```

```lua
lock_utils.DELETE_ALL_USERS = 0xFF
```
This overrides what was previously set ("deleteAllUsers"). I think everything will still work as intended (just a weird command status name during the delete operation), but it does look like a potential bug, and is misleading. I think you should add a new constant for the protocol credential-index special value that means all users.
Actually I think we will emit an event for command status with 0xFF as the command name.
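A separate constant would keep the two meanings apart, e.g. as in this sketch; `ALL_USERS_INDEX` is a made-up name, not something in the driver.

```lua
-- Suggested split: keep the human-readable command-status name and the
-- Zigbee sentinel index as distinct constants. ALL_USERS_INDEX is a made-up name.
lock_utils.DELETE_ALL_USERS = "deleteAllUsers" -- command status name
lock_utils.ALL_USERS_INDEX = 0xFF              -- protocol "all users" user-ID sentinel
```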
```lua
lock_utils.send_events(device)
end

-- ignore handling the busy state for these commands, they are handled within their own handlers
if command ~= nil and command ~= lock_utils.DELETE_ALL_CREDENTIALS and command ~= lock_utils.DELETE_ALL_USERS then
```
These command ~= ... checks will always be true since command is a table.
Fyi I've decided to make some updates to all the command name stuff, so I'd hold off on reviewing this for the moment.
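For reference, the comparison presumably wants the command's name rather than the table itself, along the lines of this sketch (the `command.command` field follows the usual capability-command shape, but treat it as an assumption here):

```lua
-- Sketch: compare the command name, not the command table itself.
local command_name = command and command.command
if command_name ~= nil
    and command_name ~= lock_utils.DELETE_ALL_CREDENTIALS
    and command_name ~= lock_utils.DELETE_ALL_USERS then
  -- clear the busy state here
end
```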
```lua
local user_index = tonumber(command.args.userIndex)
local user_type = command.args.userType
local credential_type = command.args.credentialType
local credential_data = command.args.credentialData
```
I think we should also extract credentialName, and include that in the ACTIVE_CREDENTIAL field, so the data is not lost when we emit an event. Same goes for the other places we set ACTIVE_CREDENTIAL in update_credential_handler and delete_credential_handler, and then the lock_utils.add_credential function would need to consider it as well (I don't think I missed any...)
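Roughly like this sketch; the table layout of the ACTIVE_CREDENTIAL field is an assumption.

```lua
-- Sketch: carry credentialName through the ACTIVE_CREDENTIAL field so it can
-- be included in the later event. The table layout here is an assumption.
device:set_field(lock_utils.ACTIVE_CREDENTIAL, {
  userIndex = user_index,
  credentialType = credential_type,
  credentialName = command.args.credentialName,
})
```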
```lua
end
event.data = {method = METHOD[0], codeId = code_id .. "", codeName = code_name}
```

```lua
event.data = { method = METHOD[0], codeId = code_id .. "", codeName = code_name }
```
I think this is the old event data, and it should include the new lock capability fields. The full list of fields in the lock attribute is:
- codeId
- codeName
- userIndex
- userName
- userType
- timeout
- unlockDirection
I am not sure what the plugin expects for these events, but the requirements say that unlock events should include the user information, and the codeId/codeName are legacy fields.
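That is, something shaped like this sketch; the `user` lookup is a placeholder, and timeout/unlockDirection are left out since their sources depend on context.

```lua
-- Sketch: event data including the new lock capability fields alongside the
-- legacy codeId/codeName. The `user` lookup here is a placeholder.
event.data = {
  method = METHOD[0],
  codeId = code_id .. "",   -- legacy field
  codeName = code_name,     -- legacy field
  userIndex = user.userIndex,
  userName = user.userName,
  userType = user.userType,
  -- timeout and unlockDirection would be filled in where applicable
}
```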
```lua
    lock_codes[code_id] ~= nil) then
  code_name = lock_codes[code_id]
local code_name = "Code " .. code_id
local user = lock_utils.get_user(device, code_id)
```
The zigbee message's user_id maps to our credential capability not the users capability.
```diff
- local user = lock_utils.get_user(device, code_id)
+ local credential = lock_utils.get_credential(device, code_id)
+ local user = lock_utils.get_user(device, credential.userIndex)
```
```lua
device.thread:call_with_delay(delay + 2, function(d)
  device:send(LockCluster.server.commands.GetPINCode(device, credential_index))
end)
delay = delay + 2
```
If there are more than 5 credentials being deleted, the "busy state" will be cleared prior to the command being completed.
We could consider upping the busy-state timeout to 30 or 60 seconds, but that comes with the possibility that the user will see unexplained failures if they try to recover from a failure by issuing more commands within the window of the initial failed command.
A similar timing situation occurs with delete_all_users
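One option is scaling the timeout to the scheduled work rather than picking a fixed 30/60s. This sketch assumes `busy_check_and_set` could accept a timeout parameter, which is an assumed extension of its API, and `num_credentials` is a placeholder.

```lua
-- Sketch: derive the busy-state timeout from the number of queued deletions.
-- Passing a timeout into busy_check_and_set is an assumed extension of its API.
local per_slot_delay = 2
local timeout = num_credentials * per_slot_delay + 10 -- headroom for the last reply
lock_utils.busy_check_and_set(device,
  {name = lock_utils.DELETE_ALL_CREDENTIALS, type = lock_utils.LOCK_CREDENTIALS, timeout = timeout})
```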
```lua
  device:send(LockCluster.attributes.NumberOfPINUsersSupported:read(device))
end
if (device:get_latest_state("main", capabilities.lockUsers.ID, capabilities.lockUsers.totalUsersSupported.NAME) == nil) then
  device:send(LockCluster.attributes.NumberOfTotalUsersSupported:read(device))
```
I don't think this is needed: 1) there is no handler for the attribute, and 2) it has no bearing on our credentials/users capabilities. I think the total number of users supported is equal to the total number of PIN users supported, although I am not entirely sure how the legacy driver supported RFID credentials on Zigbee locks... per the spec, the max users is the larger of NumberOfPINUsersSupported and NumberOfRFIDUsersSupported.
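Under that reading of the spec, the capacity calculation would look like this sketch (variable names are placeholders):

```lua
-- Sketch of the spec's capacity rule: max users is the larger of the PIN and
-- RFID capacities (variable names are placeholders).
local max_users = math.max(pin_users_supported or 0, rfid_users_supported or 0)
```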
```lua
-- Licensed under the Apache License, Version 2.0

return function(opts, driver, device)
  local fingerprints = require("yale-fingerprint-lock.fingerprints")
```
I think this should be
```diff
- local fingerprints = require("yale-fingerprint-lock.fingerprints")
+ local fingerprints = require("legacy-handlers.yale-fingerprint-lock.fingerprints")
```
Nah, it points to the right place. The same fingerprints that route into this legacy subdriver should also route into the updated subdriver.
```lua
if device.data.lockCodes ~= nil and device:get_field(legacy_lock_utils.MIGRATION_COMPLETE) ~= true then
  -- build the lockCodes table
  local lockCodes = {}
  local lc_data = json.decode(device.data.lockCodes)
```
Maybe add a nil check for device.data.lockCodes, in case migration is incomplete or something?
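For example, guarding the decode result as well (a sketch only; the early-return behavior is a suggestion):

```lua
-- Sketch: tolerate missing or malformed migration data instead of assuming
-- json.decode always returns a table.
local lc_data = device.data.lockCodes and json.decode(device.data.lockCodes)
if type(lc_data) ~= "table" then
  return -- nothing usable to migrate
end
```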
```lua
device:set_field(lock_utils.CHECKING_CODE, checkingCode)
device:send(LockCluster.server.commands.GetPINCode(device, checkingCode))
end

driver:inject_capability_command(device, {
```
Do you know why this is done via injecting the refresh command rather than just calling a separate function? If a delay is necessary, a timer could be used. This might just be something historical from the old driver code, I just didn't know the reason why it was done this way and was curious.
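If a delay is all that's needed, the timer route mentioned above could look like this sketch; `do_refresh` is assumed to be a local function holding the same logic, which is not necessarily how the driver is factored.

```lua
-- Sketch of the timer alternative: call the refresh logic directly after a
-- delay instead of injecting a capability command. `do_refresh` is an assumed
-- local function holding the same logic.
device.thread:call_with_delay(2, function()
  do_refresh(driver, device)
end)
```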
Description of Change
Follow-up to #2937, where I've responded to some comments that I and others left, in order to clarify certain functionality.
Note on structure: The legacy-handlers subdriver keeps some subdrivers with non-conformant behavior that relates specifically to lockCodes. This includes lock-without-codes, which no longer needs to exist in the modern implementation, as well as the yale-fingerprint-lock subdriver, which sets the lockCodes maxCodes state. Similarly, the yale subdriver also has lockCodes-specific behavior.
However, upon analysis it was clear that the yale-bad-battery-reporter sub-subdriver was 1) not unique to Yale, and 2) distinct from any lockCodes behavior, so it was pulled out into the now-default subdrivers. Also, the samsungds subdriver had minimal differences between the old and new handling, so I condensed them.
The other primary thing removed from the now-legacy-handlers subdriver is the old added handler, for two reasons: 1) generally, devices should no longer be added to this driver, and 2) we now use the main driver's added implementation to initially set the migrated keyword to true.
I find it a little precarious that the entire migrated state depends on a keyword set only once in added, but nothing initially stands out to me as a way this could break.
Summary of Completed Tests
The migration process, plus lockUsers/lockCredentials functionality, has been tested on-device with two different Kwikset locks.
Unit tests added (I renamed some of the previous tests to `_legacy`, to clarify that they are testing pre-migration functionality).