Thread (18 messages, 12 authors, 2021-05-05)

Re: [dpdk-dev] Questions about API with no parameter check

From: Morten Brørup <hidden>
Date: 2021-05-03 15:19:25

From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tyler Retzlaff
Sent: Friday, April 30, 2021 2:16 AM

> On Thu, Apr 29, 2021 at 09:49:24PM +0300, Dmitry Kozlyuk wrote:
> > 2021-04-29 09:16 (UTC-0700), Tyler Retzlaff:
> > > On Wed, Apr 07, 2021 at 05:10:00PM +0100, Ferruh Yigit wrote:
> > > > On 4/7/2021 4:25 PM, Hemant Agrawal wrote:
> > > > > > +1
> > > > > > But are we going to check all parameters?
> > > > > +1
> > > > >
> > > > > It may be better to limit the number of checks.
> > > >
> > > > +1 to verify input for APIs.
> > > >
> > > > Why not do all? What is the downside of checking all input for
> > > > control path APIs?
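Ferruh's "check all input" option usually means validating every argument at the top of a control-path function and reporting a violation as a negative errno value. A minimal sketch, assuming a hypothetical `struct widget` and `widget_set_mtu()` (neither is a DPDK API, and the MTU bounds are illustrative only):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical control-path object; not a DPDK type. */
struct widget {
    uint16_t mtu;
};

/* Illustrative bounds, chosen only for this sketch. */
#define WIDGET_MTU_MIN 68
#define WIDGET_MTU_MAX 9000

/*
 * Return-an-error style: every argument is checked, and a contract
 * violation is reported to the caller as -EINVAL instead of crashing.
 */
int widget_set_mtu(struct widget *w, uint16_t mtu)
{
    if (w == NULL)
        return -EINVAL;
    if (mtu < WIDGET_MTU_MIN || mtu > WIDGET_MTU_MAX)
        return -EINVAL;

    w->mtu = mtu;
    return 0;
}
```

Each check adds a branch plus an error path that every caller is expected to handle.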
> > >
> > > why not assert them then? what is the purpose of returning an error
> > > to a caller for an api contract violation like a `parameter shall
> > > not be NULL`?
> > >
> > > * assert.h/cassert can be compiled away for those pundits who don't
> > >   want to see extra branches in their code
> > >
> > > * when not compiled away it gives you an immediate stack trace or
> > >   dump to operate on, immediately identifying the problem instead of
> > >   having to trawl through hokey, inconsistently formatted logging.
> > >
> > > * it catches callers who don't bother to check the error returned
> > >   from the function (debug builds), instead of some arbitrary failure
> > >   at some unrelated part of the code where the corrupted program
> > >   state is relied upon.
> > >
> > > we aren't running in kernel, we can crash.
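The fail-fast alternative Tyler describes states the contract with standard `assert()`: in a debug build a violation prints a diagnostic and calls `abort()` at the point of the bug (leaving a core dump where the system is configured for one), and defining `NDEBUG` removes the check. A minimal sketch, assuming a hypothetical `struct widget` and `widget_reset()` that are not DPDK APIs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical control-path object; not a DPDK type. */
struct widget {
    uint16_t mtu;
};

/* Illustrative default, chosen only for this sketch. */
#define WIDGET_DEFAULT_MTU 1500

/*
 * Fail-fast style: passing NULL violates the contract, so it is asserted
 * rather than reported. With NDEBUG defined, the check compiles away.
 */
void widget_reset(struct widget *w)
{
    assert(w != NULL && "widget_reset: w must not be NULL");
    w->mtu = WIDGET_DEFAULT_MTU;
}
```

Because a valid call cannot fail, the function returns `void`, so there is no error value for a caller to ignore.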
> >
> > As library developers we can't assume stability requirements at the
> > call site. There may be temporary files to clean up, for example, or
> > other threads in the middle of their work.
>
> if a caller's state is so incoherent that it is passing NULL to
> functions that contractually expect non-NULL, it is already way past
> the point of no return. continuing to run only accomplishes destroying
> the state that might be used to diagnose the originating flaw in
> program logic.
>
> if you return an error instead of failing fast, at best you'll crash
> soon, but more often than not you'll keep running and produce incorrect
> results, or at worst keep running with security compromised.
>
> about the only valid argument that can be made for this silly error
> pattern is when many-party code is running inside the same process and
> you don't want someone else's bad code taking your process down. that
> is a problem i am acutely aware of from allowing 3rd party code to run
> in kernel space. (but this is mostly? mitigated by multi-process mode).
>
> > As an application developer I'd hate to get a crash inside a library
> > and have to debug it. What is usually installed are release builds,
> > with assertions compiled away.
>
> so it wouldn't crash at all, at least not at the point of failure. the
> only difference, i guess, is that you wouldn't get the log message you
> get with what is being done now.

> could we turn this around and have it tunable by policy instead of
> opting everyone in to this behavior, maybe? i'm just making some ideas
> up on the fly, but couldn't we just have a compile-time policy?
>
> #include <assert.h>
>
> #ifdef EAL_FAILURE_POLICY_RETURN
> #define EAL_FAILURE(condition, error) \
>     do { \
>         if (condition) \
>             return (error); \
>     } while (0)
> #else
> /* assert() takes a single expression; error is unused here */
> #define EAL_FAILURE(condition, error) \
>     assert(!(condition))
> #endif
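Tyler's policy macro can be exercised as below. Under `EAL_FAILURE_POLICY_RETURN` the `return` inside the macro leaves the calling function, so it only fits functions whose return type can carry `error`. A minimal sketch, assuming that policy is selected and using a hypothetical `widget_get_mtu()`; `EAL_FAILURE` is the macro proposed in this thread, not an existing DPDK macro:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Select the "return an error" policy for this translation unit. */
#define EAL_FAILURE_POLICY_RETURN

#ifdef EAL_FAILURE_POLICY_RETURN
/* Report the violation by returning from the enclosing function. */
#define EAL_FAILURE(condition, error) \
    do { \
        if (condition) \
            return (error); \
    } while (0)
#else
/* Fail fast; error is unused and the check vanishes under NDEBUG. */
#define EAL_FAILURE(condition, error) \
    assert(!(condition))
#endif

/* Hypothetical object, for illustration only. */
struct widget {
    uint16_t mtu;
};

/* Returns the MTU, or -EINVAL for a NULL widget under the return policy. */
int widget_get_mtu(const struct widget *w)
{
    EAL_FAILURE(w == NULL, -EINVAL);
    return w->mtu;
}
```

Under the assert policy in a release build (`NDEBUG` defined), the same `widget_get_mtu(NULL)` call would skip the check and dereference the pointer, which is the "wouldn't crash at the point of failure" case discussed above.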
I agree with the overall idea - it's better to fail immediately when a violation is detected. And more asserts are better than fewer.

However, I don't see the need for a completely new macro. For testing contract violations and similar, we already have RTE_VERIFY() and RTE_ASSERT(), where the latter can be controlled by RTE_ENABLE_ASSERT at compile time.
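The existing pair already splits the two policies: `RTE_VERIFY()` always checks and panics on failure, while `RTE_ASSERT()` only does so when `RTE_ENABLE_ASSERT` is defined at compile time. The sketch below mimics that split with stand-in macros so it builds without DPDK; `MY_VERIFY`, `MY_ASSERT`, `MY_ENABLE_ASSERT` and `name_count()` are hypothetical names, and `fprintf()` plus `abort()` stand in for `rte_panic()`:

```c
#include <stdio.h>
#include <stdlib.h>

/* Always-on check, modeled on RTE_VERIFY(): report and abort on failure. */
#define MY_VERIFY(exp) \
    do { \
        if (!(exp)) { \
            fprintf(stderr, "%s:%d: verify \"%s\" failed\n", \
                    __FILE__, __LINE__, #exp); \
            abort(); \
        } \
    } while (0)

/* Debug-only check, modeled on RTE_ASSERT() and RTE_ENABLE_ASSERT. */
#ifdef MY_ENABLE_ASSERT
#define MY_ASSERT(exp) MY_VERIFY(exp)
#else
#define MY_ASSERT(exp) do { } while (0)
#endif

/* Hypothetical API: count the entries of a NULL-terminated array. */
size_t name_count(const char *const *names)
{
    size_t n = 0;

    MY_VERIFY(names != NULL);           /* contract checked in every build */
    while (names[n] != NULL) {
        MY_ASSERT(names[n][0] != '\0'); /* extra check, debug builds only */
        n++;
    }
    return n;
}
```

Building with `-DMY_ENABLE_ASSERT` turns on the per-entry check; the NULL check stays on either way.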
> also, i'll point out that lately a lot of patches have been accepted
> that call functions without evaluating their return value, and the
> reason is that those functions really should never have been
> "failable". so we'll just see more of that as we stack on failure
> returns for what are often compile-time or immediate runtime errors.
> of course, the compatibility of the code calling these functions is
> only as good as its implicit dependency on the implementation... until
> the implementation changes and the application misbehaves.
>
> i'll also throw another gripe in here: there are a lot of
> "deallocation" functions in dpdk that, according to their api, can
> fail, again because of this kind of "oh i'll fail because i got a bad
> parameter" design.
>
> deallocation should never fail, ever, and i shouldn't need to write
> logic around a deallocation to handle failures. imagine if free failed?
>
> p = malloc(...);
> if (p == NULL)
>      return -1;
>
> ... do work with p ...
>
> rv = free(p);
> if (rv != 0) ... what the hell? yet this pattern exists in a bunch of
> places. it's insane. (i'll quietly ignore the design error that free
> accepting NULL as a no-op was standardized *facepalm*).

> anyway, i guess i've ranted enough. there are some users who would
> prefer not to have this, but i admit there are an overwhelming number
> of people who seem to want it.

Good rant. More thought should be put into API design. We certainly don't need failure return values for functions that shouldn't be able to fail in the first place.

Application errors are caught earlier in the development process, when libraries fail hard (e.g. rte_panic()) on errors instead of trying to handle them gracefully!