Re: [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
From: Shuah Khan <skhan@linuxfoundation.org>
Date: 2021-03-09 15:23:12
Also in:
lkml
On 3/8/21 9:27 AM, Shuah Khan wrote:
On 3/8/21 3:10 AM, Tetsuo Handa wrote:quoted
On 2021/03/08 16:35, Tetsuo Handa wrote:quoted
On 2021/03/08 12:53, Shuah Khan wrote:quoted
Fix the above problems: - Stop using kthread_get_run() macro to create/start threads. - Create threads and get task struct reference. - Add kthread_create() failure handling and bail out. - Hold usbip_device lock to update local and shared states after creating rx and tx threads. - Update usbip_device status to SDEV_ST_USED. - Update usbip_device tcp_socket, sockfd, tcp_rx, and tcp_tx - Start threads after usbip_device (tcp_socket, sockfd, tcp_rx, tcp_tx, and status) is complete.No, the whole usbip_sockfd_store() etc. should be serialized using a mutex, for two different threads can open same file and write the same content at the same moment. This results in seeing SDEV_ST_AVAILABLE and creating two threads and overwiting global variables and setting SDEV_ST_USED and starting two threads by each of two thread, which will later fail to call kthread_stop() on one of two thread because global variables are overwritten. kthread_crate() (which involves GFP_KERNEL allocation) can take long time enough to hit usbip_sockfd_store() must perform if (sdev->ud.status != SDEV_ST_AVAILABLE) {Oops. This is if (sdev->ud.status == SDEV_ST_AVAILABLE) { of course.quoted
/* misc assignments for attach operation */ sdev->ud.status = SDEV_ST_USED; } under a lock, or multiple ud->tcp_{tx,rx} are created (which will later cause a crash like [1]) and refcount on ud->tcp_socket is leaked when usbip_sockfd_store() is concurrently called. problem. That's why my patch introduced usbip_event_mutex lock.And I think that same serialization is required between "rh_port_connect() from attach_store()" and "rh_port_disconnect() from vhci_shutdown_connection() via usbip_event_add(&vdev->ud, VDEV_EVENT_DOWN) from vhci_port_disconnect() from detach_store()", for both vhci_rx_pdu() from vhci_rx_loop() and vhci_port_disconnect() from detach_store() can queue VDEV_EVENT_DOWN event which can be processed without waiting for attach_store() to complete.Yes. We might need synchronization between events, threads, and shutdown in usbip_host side and in connection polling and threads in vhci. I am also looking at the shutdown sequences closely as well since the local state is referenced without usbip_device lock in these paths. I am approaching these problems as peeling the onion an expression so we can limit the changes and take a spot fix approach. We have the goal to address these crashes and not introduce regressions. I don't seem to be able to reproduce these problems consistently in my env. with the reproducer. I couldn't reproduce them in normal case at all. Hence, the this cautious approach that reduces the chance of regressions and if we see regressions, they can fixed easily. https://syzkaller.appspot.com/text?tag=ReproC&x=14801034d00000 If this patch series fixes the problems you are seeing, I would like get these fixes in and address the other two potential race conditions in another round of patches. I also want to soak these in the next for a few weeks. Please let me know if these patches fix the problems you are seeing in your env.
Can you verify these patches in your environment and see if you are seeing any problems? I want to first see where we are with these fixes. thanks, -- Shuah