Thread (18 messages) 18 messages, 4 authors, 2013-10-11

Re: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

From: Scott Wood <hidden>
Date: 2013-09-30 23:06:38

On Sun, 2013-09-29 at 01:57 -0500, Wang Dongsheng-B40534 wrote:
quoted
-----Original Message-----
From: Wood Scott-B07421
Sent: Saturday, September 28, 2013 5:33 AM
To: Wang Dongsheng-B40534
Cc: Wood Scott-B07421; Bhushan Bharat-R65777; linuxppc-
dev@lists.ozlabs.org
Subject: Re: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and
altivec idle

On Thu, 2013-09-26 at 22:34 -0500, Wang Dongsheng-B40534 wrote:
quoted
cycle = div_u64(ns * tb_ticks_per_usec, 1000); It's look good.
But why we need:
	if (ns >= 10000)
		cycle = ((ns + 500) / 1000) * tb_ticks_per_usec; ?

I think "cycle = div_u64(ns * tb_ticks_per_usec, 1000)" is good
enough. :)
It's to deal with overflow if a very large value of ns is provided
(and/or tb_ticks_per_usec is larger than we expect).
:), as I think, it's to deal with overflow. But you version cannot do deal with it.
Because ns is u64, if ns > 0xffffffff_fffffe0b, ns + 500 will overflow, And if tb_ticks_per_usec > 1000 and ns > (0xffffffff_ffffffff / 10) cycle also will overflow.
Sigh... It significantly increases the value of ns at which you'll get
overflow problems. :-)  I was more concerned with
large-but-somewhat-reasonable values of ns (e.g. than with trying to
handle every possible input.  Even without that test, getting overflow
is stretching the bounds of reasonableness (e.g. a 1 GHz timebase would
require a timeout of over 7 months to overflow), but it was low-hanging
fruit.  And the worst case is we go to pw20 sooner than the user wanted,
so let's not go too overboard.

I doubt you would see > 0xffffffff_fffffe0b except if someone is trying
to disable it by setting 0xffffffff_ffffffff even though a separate API
is provided to cleanly disable it.
I think we need to do this:

#define U64_LOW_MASK            0xffffffffULL
#define U64_MASK                0xffffffffffffffffULL

        u32 tmp_rem;
        u64 ns_u_rem, ns_u, ns_l, ns_l_carry;
        u64 cycle;

        ns_u = ns >> 32;
        ns_l = ns & U64_LOW_MASK;

        ns_l *= tb_ticks_per_usec;
        ns_l_carry = ns_l >> 32;
        ns_u *= tb_ticks_per_usec;
        ns_u += ns_l_carry;

        ns_u = div_u64_rem(ns_u, 1000, &tmp_rem);
        ns_u_rem = tmp_rem;
        ns_l = (ns_l & U64_LOW_MASK) | ((ns_u_rem) << 32);
        ns_l = div_u64(ns_l, 1000);

        if (ns_u >> 32)
                cycle = U64_MASK;
        else
                cycle = (ns_u << 32) | (ns_l & U64_LOW_MASK);

I has already tested this code, and works good. :)
Ugh.  I don't think we need to get this complicated (and I'd rather not
spend the time verifying the correctness of this).

If for some reason we did need something like this in some other context
(I don't want to see it just for pw20), I'd be more inclined to see
general 128-bit mult/divide support.

-Scott
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help