Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

From: Maxime Ripard <hidden>
Date: 2019-01-30 07:58:03
Also in: linux-media, linux-rockchip, lkml

On Wed, Jan 30, 2019 at 12:35:41PM +0900, Tomasz Figa wrote:

On Wed, Jan 30, 2019 at 11:29 AM Alexandre Courbot
[off-list ref] wrote:

On Wed, Jan 30, 2019 at 6:41 AM Nicolas Dufresne [off-list ref] wrote:

On Tuesday, 29 January 2019 at 16:44 +0900, Alexandre Courbot wrote:

On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
[off-list ref] wrote:

Hi,

On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:

Sent from my iPad

On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski [off-list ref] wrote:

Hi,

On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
I forgot an important thing: the rkvdec and rk hevc decoders require
the cabac table, scaling list, picture parameter set and reference
picture set to be stored in one or several DMA buffers. I am not talking
about parsed data; the decoder requires the raw data.

For the pps and rps, it is possible to reuse the slice header: just let
the decoder know their offset in the bitstream buffer. I would suggest
adding three properties (including one for the sps) for them. But I think
we need a method to mark an OUTPUT side buffer as holding that aux data.

I'm quite confused about the hardware implementation then. From what
you're saying, it seems that it takes the raw bitstream elements rather
than parsed elements. Is it really a stateless implementation?

The stateless implementation was designed with the idea that only the
raw slice data should be passed in bitstream form to the decoder. For
H.264, it seems that some decoders also need the slice header in raw
bitstream form (because they take the full slice NAL unit), see the
discussions in this thread:
media: docs-rst: Document m2m stateless video decoder interface

Stateless just means it won't track the previous result, but I don't
think you can define what data the hardware would need. Even if you
just build a dpb for the decoder, it is still stateless; parsing
less or more data from the bitstream doesn't stop a decoder from being
a stateless decoder.

Yes, fair enough, the format in which the hardware decoder takes the
bitstream parameters does not make it stateless or stateful per se.
It's just that stateless decoders should have no particular reason for
parsing the bitstream on their own since the hardware can be designed
with registers for each relevant bitstream element to configure the
decoding pipeline. That's how GPU-based decoder implementations work
(VAAPI/VDPAU/NVDEC, etc).

So the format we have agreed on so far for the stateless interface is
to pass parsed elements via v4l2 control structures.

If the hardware can only work by parsing the bitstream itself, I'm not
sure what the best solution would be. Reconstructing the bitstream in
the kernel is a pretty bad option, but so is parsing in the kernel or
having the data both in parsed and raw forms. Do you see another
possibility?

Is reconstructing the bitstream so bad? The v4l2 controls provide a
generic interface to an encoded format which the driver needs to
convert into a sequence that the hardware can understand. Typically
this is done by populating hardware-specific structures. Can't we
consider that in this specific instance, the hardware-specific
structure just happens to be identical to the original bitstream
format?

At the maximum allowed bitrate for, let's say, HEVC (940MB/s iirc), yes,
it would be really, really bad. In the GStreamer project we have discussed
for a while (but have never done anything about) adding the ability to
select, through a bitmask, which parts of the stream need to be parsed,
as parsing itself was causing some overhead. Maybe a similar thing applies
here, though as per our new design it's the fourcc that dictates the
driver behaviour, so we'd need yet another fourcc for drivers that want
the full bitstream (which seems odd if you have already parsed
everything; I think this needs some clarification).

Note that I am not proposing to rebuild the *entire* bitstream
in-kernel. What I am saying is that if the hardware interprets some
structures (like SPS/PPS) in their raw format, this raw format could
be reconstructed from the structures passed by userspace at negligible
cost. Such manipulation would only happen on a small amount of data.
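
To give a rough idea of why this could be cheap: most SPS/PPS syntax
elements are either fixed-width fields or unsigned Exp-Golomb (ue(v))
codes, so re-serializing a handful of them is a short bit-writing loop.
The following is only a minimal sketch in C. The names (struct bitwriter,
bw_init, bw_put_bits, bw_put_ue) are hypothetical and not taken from any
existing driver, and it leaves out emulation prevention bytes and the
actual SPS/PPS field layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical MSB-first bit writer, for illustration only. */
struct bitwriter {
	uint8_t *buf;
	size_t size;	/* capacity in bytes */
	size_t bitpos;	/* index of the next bit to write */
};

static void bw_init(struct bitwriter *bw, uint8_t *buf, size_t size)
{
	memset(buf, 0, size);
	bw->buf = buf;
	bw->size = size;
	bw->bitpos = 0;
}

/* Append the low 'n' bits of 'val', most significant bit first. */
static void bw_put_bits(struct bitwriter *bw, uint32_t val, unsigned int n)
{
	assert(n <= 32 && bw->bitpos + n <= bw->size * 8);
	while (n--) {
		if ((val >> n) & 1)
			bw->buf[bw->bitpos / 8] |= 0x80 >> (bw->bitpos % 8);
		bw->bitpos++;
	}
}

/*
 * Unsigned Exp-Golomb code, ue(v): 'len' zero bits followed by
 * (val + 1) written in len + 1 bits, where len = floor(log2(val + 1)).
 */
static void bw_put_ue(struct bitwriter *bw, uint32_t val)
{
	uint64_t code = (uint64_t)val + 1;
	unsigned int len = 0;

	assert(val < UINT32_MAX);	/* keep the code word within 32 bits */
	while (code >> (len + 1))
		len++;
	bw_put_bits(bw, 0, len);
	bw_put_bits(bw, (uint32_t)code, len + 1);
}
```

Writing the values 0, 1, 2 and 3 with bw_put_ue() yields the bit strings
1, 010, 011 and 00100, i.e. 12 bits in total. A real driver would call
such helpers once per syntax element, in the order the hardware expects.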

Exposing finer-grained driver requirements through a bitmask may
deserve more exploring. Maybe we could end with a spectrum of
capabilities that would allow us to cover the range from fully
stateless to fully stateful IPs more smoothly. Right now we have two
specifications that only consider the extremes of that range.

I gave it a bit more thought and if we combine what Nicolas suggested
about the bitmask control with the userspace providing the full
bitstream in the OUTPUT buffers, split into some logical units and
"tagged" with their type (e.g. SPS, PPS, slice, etc.), we could
potentially get an interface that would work for any kind of decoder I
can think of, actually eliminating the boundary between stateful and
stateless decoders.

For example, a fully stateful decoder would have the bitmask control
set to 0 and accept data from all the OUTPUT buffers as they come. A
decoder that doesn't do any parsing on its own would have all the
valid bits in the bitmask set and ignore the data in OUTPUT buffers
tagged as any kind of metadata. And then, we could have any cases in
between, including stateful decoders which just can't parse the stream
on their own, but still manage anything else themselves, or stateless
ones which can parse parts of the stream, like the rk3399 vdec can
parse the H.264 slice headers on its own.
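
The routing rule described above could be sketched roughly as follows in
C. Everything here is an assumption made for illustration: the unit tags,
the UNIT_BIT() macro and hw_consumes_unit() do not exist in the V4L2
UAPI, and treating the slice header and slice data as separately tagged
units is only one possible split. A set bit means "userspace provides this
element in parsed form", so the raw unit is not handed to the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tags attached to each OUTPUT buffer. */
enum unit_type {
	UNIT_SPS,
	UNIT_PPS,
	UNIT_SLICE_HEADER,
	UNIT_SLICE_DATA,
};

#define UNIT_BIT(t)	(1u << (t))

/* All metadata bits set: the decoder does no parsing on its own. */
#define MASK_NO_HW_PARSING \
	(UNIT_BIT(UNIT_SPS) | UNIT_BIT(UNIT_PPS) | UNIT_BIT(UNIT_SLICE_HEADER))

/* No bit set: fully stateful decoder, takes every unit as it comes. */
#define MASK_FULLY_STATEFUL	0u

/*
 * Decide whether a tagged raw unit should be handed to the hardware.
 * Slice data is always consumed; metadata units are consumed only if
 * the corresponding "parsed by userspace" bit is clear.
 */
static bool hw_consumes_unit(uint32_t parsed_mask, enum unit_type type)
{
	if (type == UNIT_SLICE_DATA)
		return true;
	return !(parsed_mask & UNIT_BIT(type));
}
```

Under these assumptions, a decoder that parses slice headers itself but
needs the SPS and PPS in parsed form would report
UNIT_BIT(UNIT_SPS) | UNIT_BIT(UNIT_PPS), and only the raw SPS and PPS
units would be skipped.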

That could potentially let us completely eliminate the distinction
between the stateful and stateless interfaces and just have one that
covers both.

Thoughts?

If we have to provide the whole bitstream in the buffers, then it
entirely breaks the sole software stack we have running and working
currently, for a use case and a driver that hasn't seen a single line
of code.

Seriously, this is a *private* API, and we kept it private precisely so
that we can change it and only then make it public. Why not do just that?

Maxime

-- 
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
