Re: [PATCH net-next 08/14] ice: protect init and calibrating fields with spinlock
From: Jacob Keller <jacob.e.keller@intel.com>
Date: 2022-12-02 18:36:52
On 12/1/2022 1:18 AM, Leon Romanovsky wrote:
On Wed, Nov 30, 2022 at 11:43:24AM -0800, Tony Nguyen wrote:quoted
From: Jacob Keller <jacob.e.keller@intel.com> Ensure that the init and calibrating fields of the PTP Tx timestamp tracker structure are only modified under the spin lock. This ensures that the accesses are consistent and that new timestamp requests will either begin completely or get ignored. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <redacted> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> --- drivers/net/ethernet/intel/ice/ice_ptp.c | 55 ++++++++++++++++++++++-- drivers/net/ethernet/intel/ice/ice_ptp.h | 2 +- 2 files changed, 52 insertions(+), 5 deletions(-)diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c index a7d950dd1264..0e39fed7cfca 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp.c +++ b/drivers/net/ethernet/intel/ice/ice_ptp.c@@ -599,6 +599,42 @@ static u64 ice_ptp_extend_40b_ts(struct ice_pf *pf, u64 in_tstamp) (in_tstamp >> 8) & mask); } +/** + * ice_ptp_is_tx_tracker_init - Check if the Tx tracker is initialized + * @tx: the PTP Tx timestamp tracker to check + * + * Check that a given PTP Tx timestamp tracker is initialized. Acquires the + * tx->lock spinlock. + */ +static bool +ice_ptp_is_tx_tracker_init(struct ice_ptp_tx *tx) +{ + bool init; + + spin_lock(&tx->lock); + init = tx->init; + spin_unlock(&tx->lock); + + return init;How this type of locking can be correct? It doesn't protect anything and equal to do not have locking at all. Thanks
The init field is used to by the Tx timestamp work item to have it exit if it is executed after the Tx tracker starts being de-initialized. I guess technically reading the value would be atomic.. Though using a spinlock also ensures the appropriate memory barriers are in place around reading the value, preventing re-ordering. We clear the init field and then the Tx timestamp thread will stop re-arming itself. We used to have a kthread item which we would then call flush to ensure the last queued instance of the work item was removed... but ugh now that we're using a threaded interrupt I am not sure if we have any way to guarantee that the Tx work has completed. I am not sure how to address that now with a threaded interrupt. Hmm. We can't hold a spin lock for the entire duration of the Tx timestamp work because it might sleep while checking for an outstanding Tx timestamp via the device Admin queue interface to read the PHY timestamp. The goal is to ensure that a) timestamp requests guarantee that they each get unique indexes or fail to start. This is covered by the fact that timestamp requests all hold the lock over the duration of checking that requests are being accepted, picking the index, and setting that index's in_use bit b) timestamp completions are reported only once. The only place that processes timestamp completions is the threaded interrupt function, which can only have one instance executing at a time. This function doesn't hold the spin lock while iterating the in_use bitmap, but it does hold the lock when it reports the timestamp and before doing so it rechecks the in_use bit in case the timestamp got flushed. We used to have a separate thread that handled discarding old timestamps after 2 seconds but we now do that from within the same threaded function. Its possible that a request thread might execute while this function is processing and *set* a new in_use bit. In this case, either the function will see the update when iterating the in_use bitmap, in which case it will check the PHY register and process the timestamp if it happened to be completed. Or, it will miss the timestamp in that run. At the end of the loop, the lock is re-acquired and we check if any timestamp requests are outstanding. If so, we exit such that the threaded interrupt function will re-execute the Tx timestamp thread shortly to check again. c) In previous versions we had a kthread task which was used to run the Tx timestamp function. In that version we used kthread flush after clearing the init flag to ensure that the threaded function was completed. I think this is a gap since the commit that introduced a threaded interrupt instead. We need some way for the tear down to ensure that no more timestamp processing will occur. There is a small window after we clear init that we could race with tearing down and removing the Tx tracker memory now :( I am not sure what the best way to fix c) is. Perhaps an additional flag of some sort which indicates that the timestamp thread function is processing so that tear down can wait until after the interrupt function completes. Once init is cleared, the function will stop re-executing, but we need a way to wait until it has stopped. That method in tear down can't hold the lock or else we'd potentially deadlock... Maybe an additional "processing" flag which is set under lock only if the init flag is set? and then cleared when the function exits. Then the tear down can check and wait for the processing to be complete? Hmm. I realize this whole scheme is rather complicated. The biggest problem is that while reading timestamps we need to interact with firmware over the Admin queue, but we also need to safely be able to set new timestamp requests from the hot path with minimal disruption. If we locked over the entire Tx timestamp read process it would block hot path. I think there's a gap now with the threaded interrupt needing a way to ensure that no new Tx timestamp interrupts will be processed since we can no longer use kthread flushing to handle that. I believe the other parts are correct. Thanks, Jake