Re: Firmware debugging patches?

From: Emmanuel Grumbach <hidden>
Date: 2014-06-02 19:29:05

Emmanuel Grumbach
egrumbach@gmail.com


On Mon, Jun 2, 2014 at 9:58 PM, Ben Greear [off-list ref] wrote:

On 06/02/2014 11:46 AM, Emmanuel Grumbach wrote:

[Good stuff snipped, adding linux-wireless as this is a more
general issue if we are going to consider general framework]


Maybe we should start with goals before getting to implementation
details.  Here's my wish list that is ath10k specific, but probably
similar to other firmware users:

1)  We need the firmware crash text currently printed to
/var/log/messages.

2)  It would be nice to get the firmware RAM and stack dumps at time of
crash to debug more interesting crashes.

Right - but typically you'll have closed source / IP / whatever there..

I mean that we need the raw data (i.e., a binary dump, something printed
in ASCII hex, etc).  I understand it will take proprietary tools to
decode it to something a developer can actually debug.

3)  It would be nice to know about firmware debug messages for
the period of time directly before the crash (maybe 2-5 minutes?)

4)  It would be nice to have this interleaved with kernel, supplicant,
and related logs.
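Wish-list items 3) and 4) can be illustrated with a minimal sketch. It is not taken from ath10k or any other driver: the `RecentLog` class, the `interleave` helper, and the `(timestamp, source, message)` tuple layout are all hypothetical, and it assumes every log source stamps its entries from the same clock and emits them in time order.

```python
import heapq
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

# Hypothetical entry layout: (timestamp in seconds, source name, message).
LogEntry = Tuple[float, str, str]


class RecentLog:
    """Keep only entries from the last `window_s` seconds before the newest one."""

    def __init__(self, window_s: float = 300.0) -> None:
        self.window_s = window_s
        self.entries: Deque[LogEntry] = deque()

    def add(self, ts: float, source: str, msg: str) -> None:
        self.entries.append((ts, source, msg))
        # Drop entries that have fallen out of the retention window.
        while self.entries and self.entries[0][0] < ts - self.window_s:
            self.entries.popleft()


def interleave(*streams: Iterable[LogEntry]) -> Iterator[LogEntry]:
    """Merge streams that are each already time-sorted into one ordered stream."""
    return heapq.merge(*streams, key=lambda entry: entry[0])
```

With a 300-second window (the upper end of the "2-5 minutes" above), the firmware history stays bounded, and `interleave(fw_log.entries, kernel_entries, supplicant_entries)` yields one combined timeline.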


We need a solution for different types of users.  I suspect the number
of crashes seen in the wild will be more for users nearer the top
of this list.

a) Normal Fedora/Ubuntu/etc default-installed distribution user
with ath10k NIC has wifi issues, firmware crashes, they don't
really know what firmware means or that it crashed, but some automated crash-log
tool notices and gathers debug info for automated bug reporting.

I am working on that for our firmware. I recently added such a capability, relying on udev to notify userspace that something bad happened. I gather all the data and prepare a binary file that is sent through debugfs (pulled by a script triggered by udev). I remember the first crash only.

How is this binary blob encoded?

Different TLV based binary blobs concatenated. The actual encoding of
each of them is another story.
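As a rough illustration of "TLV based binary blobs concatenated", here is a minimal sketch of packing and walking such a file. The actual header layout is not given in this thread, so the little-endian 32-bit type and 32-bit length fields are an assumption, and `pack_tlv` / `iter_tlv` are hypothetical names. The value bytes are treated as opaque, matching the point that decoding each blob is a separate matter.

```python
import struct
from typing import Iterator, List, Tuple

# Assumed header: little-endian u32 type followed by u32 length.
_HDR = struct.Struct("<II")


def pack_tlv(records: List[Tuple[int, bytes]]) -> bytes:
    """Concatenate (type, value) records into one blob."""
    return b"".join(_HDR.pack(rec_type, len(value)) + value
                    for rec_type, value in records)


def iter_tlv(blob: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, value) pairs; the value is left undecoded."""
    offset = 0
    while offset < len(blob):
        if len(blob) - offset < _HDR.size:
            raise ValueError("truncated TLV header")
        rec_type, length = _HDR.unpack_from(blob, offset)
        offset += _HDR.size
        if len(blob) - offset < length:
            raise ValueError("truncated TLV value")
        yield rec_type, blob[offset:offset + length]
        offset += length
```

A reader that does not understand a given type can skip it by its length, which is what makes it practical to concatenate unrelated blobs in one file.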

At least for drivers that can recover from firmware crashes, I think
we should continue to report crashes, not just the first.

I remember the first until udev kicks the script that will empty the
buffer. Then I take the second crash's log.
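The "remember the first until the script empties the buffer" behaviour can be modelled as a single slot. This is only a sketch of the policy as described here, not the driver code: `CrashSlot` and its methods are hypothetical names, and the `dropped` counter is an addition for illustration.

```python
from typing import Optional


class CrashSlot:
    """Hold one crash dump; later crashes are ignored until it is read."""

    def __init__(self) -> None:
        self._dump: Optional[bytes] = None
        self.dropped = 0  # crashes that arrived while the slot was full

    def record(self, dump: bytes) -> bool:
        """Store the dump if the slot is empty; return whether it was kept."""
        if self._dump is not None:
            self.dropped += 1
            return False
        self._dump = dump
        return True

    def read_and_clear(self) -> Optional[bytes]:
        """Return the stored dump (if any) and free the slot for the next crash."""
        dump, self._dump = self._dump, None
        return dump
```

Here the udev-triggered script plays the role of the caller of `read_and_clear()`: once it has pulled the data, the next crash is recorded.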

Maybe we could store another one after the initial crash has been read
and 1 minute has elapsed, or if the initial crash has not been read
in 1 day, or something like that.
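The suggested policy can be written down as a small predicate. This is one possible reading, not an agreed design: it assumes the 1 minute is counted from the time the stored crash was read, and the names `StoredCrash` and `may_store_new` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

READ_HOLDOFF_S = 60.0             # 1 minute after the stored crash was read
UNREAD_EXPIRY_S = 24 * 60 * 60.0  # 1 day without anyone reading it


@dataclass
class StoredCrash:
    stored_at: float                 # time the crash was recorded (seconds)
    read_at: Optional[float] = None  # time it was read, if it has been


def may_store_new(stored: Optional[StoredCrash], now: float) -> bool:
    """Decide whether a new crash may replace the stored one."""
    if stored is None:
        return True
    if stored.read_at is not None:
        # Already read: allow a replacement once the hold-off has passed.
        return now - stored.read_at >= READ_HOLDOFF_S
    # Never read: only replace it once it has gone stale.
    return now - stored.stored_at >= UNREAD_EXPIRY_S
```

Both thresholds are placeholders in the spirit of "or something like that" and would need tuning per driver.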

Also, if we use debugfs then we require upstream kernels to have this
compiled in and mounted if we want to handle this class of user.

Agreed. I rely on debugfs. But this is "just" the way to reach the filesystem.
Give me another way and I am fine with it.
FWIW Ubuntu which is not exactly the distribution of the super
advanced users has it mounted by default.

I am not sure this is really the case currently.  But, once the
blob is generated and stored in RAM, it would be easy enough to
add an ethtool option to dump it without debugfs support.  This will
still not really address my concerns because it may take a year
or two for the latest ethtool binary to make it to normal-ish users.

I understand.

b) Slightly more advanced user actually notices the problem at coffee shop
earlier today, posts about it when they get home, and we ask for
debug info.

c) Experienced and determined user has similar issues, but is able to
reproduce the problem and/or turn on more advanced debugging efforts.

d)  Even more determined user that can and will recompile kernels and/or
try patches.


Anything that has to be enabled before-hand will not help a) and b) above.

If support is not compiled into default kernels, c) will not help you either.

If it is difficult or requires acquiring cutting edge tools not in their
distribution by default, many of c) and some of d) will just ignore the problem or use
different hardware.

If we are storing crashes for something like ethtool to report, we need
RAM and/or disk storage so the firmware RAM dumps and such can be stored until
the user and/or automated tools ask for them.  We need some way to automatically
clean up old crashes so disk/ram is not overly utilized.  For APs,
they are low on both RAM and 'disk', so storing crash logs for any
length of time may be problematic.
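The automatic clean-up requirement can be sketched as a store with a byte budget and a maximum age. Nothing here reflects an existing kernel or ethtool interface: `CrashStore`, its parameters, and the evict-oldest-first choice are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List, Tuple


class CrashStore:
    """Keep crash dumps within a byte budget and a maximum age."""

    def __init__(self, max_bytes: int, max_age_s: float) -> None:
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self.used = 0
        self._dumps: Deque[Tuple[float, bytes]] = deque()  # oldest first

    def _evict_oldest(self) -> None:
        _, dump = self._dumps.popleft()
        self.used -= len(dump)

    def expire(self, now: float) -> None:
        """Drop dumps older than max_age_s."""
        while self._dumps and now - self._dumps[0][0] > self.max_age_s:
            self._evict_oldest()

    def add(self, now: float, dump: bytes) -> bool:
        """Store a dump, evicting the oldest ones if the budget is exceeded."""
        self.expire(now)
        if len(dump) > self.max_bytes:
            return False  # can never fit, even in an empty store
        while self.used + len(dump) > self.max_bytes:
            self._evict_oldest()
        self._dumps.append((now, dump))
        self.used += len(dump)
        return True

    def dumps(self) -> List[bytes]:
        return [dump for _, dump in self._dumps]
```

On an AP with little RAM and flash, `max_bytes` might only have room for a single dump, in which case this degenerates to keeping just the most recent crash.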

I did something simpler - but it works. I don't really know the ethtool infrastructure though.

I think ethtool would not be overly hard to implement... the basic framework is already
in the wifi stack.

Thanks,
Ben


--
Ben Greear [off-list ref]
Candela Technologies Inc  http://www.candelatech.com
