Thread (10 messages) 10 messages, 4 authors, 2020-03-20

Re: [linux-next/mainline][bisected 3acac06][ppc] Oops when unloading mpt3sas driver

From: Michael Ellerman <mpe@ellerman.id.au>
Date: 2020-01-16 06:42:27
Also in: linux-next, linux-scsi

Abdul Haleem [off-list ref] writes:
On Thu, 2020-01-09 at 06:22 -0800, Christoph Hellwig wrote:
quoted
On Thu, Jan 09, 2020 at 02:27:25PM +0530, Abdul Haleem wrote:
quoted
+ CC Christoph Hellwig
The only thing this commit changed for the dma coherent case (which
ppc64 uses) is that we now look up the page to free by the DMA address
instead of the virtual address passed in.  Which suggests this call
stack passes in a broken dma address.  I suspect we somehow managed
to disable the ppc iommu bypass mode after allocating memory, which
would cause symptoms like this, and thus the commit is just exposing
a pre-existing problem.
Trace with printk added for page->addr, will this help ?

mpt3sas_cm0: removing handle(0x000f), sas_addr(0x500304801f080d3d)
mpt3sas_cm0: enclosure logical id(0x500304801f080d3f), slot(12)
mpt3sas_cm0: enclosure level(0x0000), connector name(     )
mpt3sas_cm0: mpt3sas_transport_port_remove: removed:
sas_addr(0x500304801f080d3f)
mpt3sas_cm0: expander_remove: handle(0x0009),
sas_addr(0x500304801f080d3f)
mpt3sas_cm0: sending diag reset !!
mpt3sas_cm0: diag reset: SUCCESS 
page->vaddr = 0xc000003f2d200000
page->vaddr = 0xc000003f2ef00000
page->vaddr = 0xc000003f38430000
page->vaddr = 0xc000003f3d7d0000
page->vaddr = 0xc000003f75760000
BUG: Unable to handle kernel data access on write at 0xc04a000000017c34
We also want the dma address, Abdul did another run resulting in:

  mpt3sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x500304801f080d3f)
  mpt3sas_cm0: expander_remove: handle(0x0009), sas_addr(0x500304801f080d3f)
  mpt3sas_cm0: sending diag reset !!
  mpt3sas_cm0: diag reset: SUCCESS
  page->vaddr = 0xc000003fc5880000
  page->dma = 0x800003fc5880000
  page->vaddr = 0xc000003fc5900000
  page->dma = 0x800003fc5900000
  page->vaddr = 0xc000003fc5980000
  page->dma = 0x800003fc5980000
  page->vaddr = 0xc000003fc5990000
  page->dma = 0x800003fc5990000
  page->vaddr = 0xc000003fc7c70000
  page->dma = 0x5f00000
  BUG: Unable to handle kernel data access on write at 0xc04a000000017c34
  Faulting instruction address: 0xc000000000300780
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA PowerNV
  Modules linked in: iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc iptable_filter btrfs blake2b_generic xor zstd_decompress zstd_compress lzo_compress raid6_pq vmx_crypto gf128mul powernv_rng rng_core kvm_hv nfsd kvm binfmt_misc ip_tables x_tables xfs libcrc32c qla2xxx ixgbe nvme_fc nvme_fabrics mdio nvme_core i40e mpt3sas(-) raid_class scsi_transport_sas autofs4
  CPU: 149 PID: 17518 Comm: rmmod Not tainted 5.5.0-rc5-next-20200108-autotest-00002-g36e1367-dirty #2
  NIP:  c000000000300780 LR: c0000000001aabe4 CTR: c00000000004a030
  REGS: c0000078ffab75d0 TRAP: 0380   Not tainted  (5.5.0-rc5-next-20200108-autotest-00002-g36e1367-dirty)
  MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002424  XER: 20000000
  CFAR: c0000000001aabe0 IRQMASK: 0
  GPR00: c00000000004a0c8 c0000078ffab7860 c000000001321a00 c04a000000017c00
  GPR04: 0000000000000000 c000003fc7c70000 003e000000017c00 0000000000000000
  GPR08: 0000000000000000 c0000000013cd000 c04a000000017c34 0000000000000230
  GPR12: c00000000004a030 c000007ffef35000 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 00000100140c0180 0000000010020098
  GPR20: 0000000010020050 0000000010020038 0000000005f00000 c000000000d60870
  GPR24: c000000000d60890 c000000000d608a8 0000000000000000 c0000000012a9818
  GPR28: 0000000005f00000 c000003fc7c70000 0000000000010000 c000003fdaa4c8a8
  NIP [c000000000300780] __free_pages+0x10/0x50
  LR [c0000000001aabe4] dma_direct_free_pages+0x54/0x90
  Call Trace:
  [c0000078ffab7880] [c00000000004a0c8] dma_iommu_free_coherent+0x98/0xd0
  [c0000078ffab78d0] [c0000000001a9c10] dma_free_attrs+0x110/0x120
  [c0000078ffab7920] [c000000000317750] dma_pool_destroy+0x1d0/0x270
  [c0000078ffab79d0] [c00800000dc51e98] _base_release_memory_pools+0x1d8/0x4b0 [mpt3sas]
  [c0000078ffab7a60] [c00800000dc5b9f0] mpt3sas_base_detach+0x40/0x150 [mpt3sas]
  [c0000078ffab7ad0] [c00800000dc6c92c] scsih_remove+0x24c/0x3e0 [mpt3sas]
  [c0000078ffab7b90] [c0000000006199a4] pci_device_remove+0x64/0x110
  [c0000078ffab7bd0] [c0000000006cf1a4] device_release_driver_internal+0x154/0x260
  [c0000078ffab7c10] [c0000000006cf37c] driver_detach+0x8c/0x140
  [c0000078ffab7c50] [c0000000006cd488] bus_remove_driver+0x78/0x100
  [c0000078ffab7c80] [c0000000006d0090] driver_unregister+0x40/0x90
  [c0000078ffab7cf0] [c0000000006190c8] pci_unregister_driver+0x38/0x110
  [c0000078ffab7d40] [c00800000dc7f188] _mpt3sas_exit+0x50/0x4118 [mpt3sas]
  [c0000078ffab7da0] [c0000000001dda18] sys_delete_module+0x1a8/0x2a0
  [c0000078ffab7e20] [c00000000000b9d0] system_call+0x5c/0x68
  Instruction dump:
  88830051 2fa40000 41de0008 4bffe7fc 7d234b78 4bfffe94 60000000 60420000
  3c4c0102 38421290 39430034 7c0004ac <7d005028> 3108ffff 7d00512d 40c2fff4
  ---[ end trace b8cbc679eff3dfcc ]---
  Segmentation fault


cheers
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help