Re: [RFR] gianfar ethernet driver

From: Jeff Garzik <hidden>
Date: 2004-07-07 05:27:40

Possibly related (same subject, not in this thread)

2004-08-22 · Re: [RFR] gianfar ethernet driver · Jeff Garzik <hidden>
2004-08-16 · Re: [RFR] gianfar ethernet driver · Kumar Gala <hidden>
2004-08-04 · Re: [RFR] gianfar ethernet driver · Andy Fleming <hidden>
2004-08-02 · Re: [RFR] gianfar ethernet driver · Andy Fleming <hidden>
2004-08-02 · Re: [RFR] gianfar ethernet driver · Jeff Garzik <hidden>

Andy Fleming wrote:

quoted

7) ethtool_ops should not be kmalloc'd.
+       dev_ethtool_ops =
+               (struct ethtool_ops *)kmalloc(sizeof(struct 
ethtool_ops),
+                                             GFP_KERNEL);


Is there a particular reason?  The reason I did it this way is because 
the driver supports a 10/100 controller which does not have the RMON 
statistics, and therefore should not read from that register space.  So 
I needed to use different functions to fill in the statistics lists.

Just switch out one static pointer for another, once you know the 
hardware you're dealing with.  You _must_ have that knowledge anyway, 
before calling register_netdev(), otherwise you're got a race in 
initialization versus interface-up.

quoted

8) I recommend enabling _both_ hardware interrupt coalescing and NAPI,
at the same time.  Several other Linux net drivers need to be changed to
do this, as well


Ok... but this is possible with the driver as it is.  Interrupt 
Coalescing is runtime configurable, and NAPI is a compile-time option.  
NAPI can sometimes hurt performance, and so we'd like to have the 
ability to disable it for some deployments.  I guess I'm not sure what 
change this suggestion was meant to cause.

Default hardware mitigation to on in all cases, and preferably NAPI as well.

If you see cases that hurt performance, that wants investigating.

quoted

14) surely gfar_set_mac_address() needs some sort of synchronization?


I'm not sure why.  It only gets called during gfar_open(), and 
startup_gfar() gets called after it.

Incorrect.  Set-mac-address can be called when the interface is up and 
active, such as by the bonding driver.

quoted

15) gfar_change_mtu() synchronization appears absent?


I'm not exactly sure what kind of synchronization is expected here.  
stop_gfar() and startup_gfar() do modify the appropriate hardware values.

Same conditions (and response) as set-mac-address.  These can be called 
while you're actively DMAing packets.

quoted

23) gfar_set_multi() does not appear to take into account huge
multi-cast lists.  Drivers usually handle large dev->mc_count by falling
back to ALLMULTI behavior.


Actually, it does.  The bits which are set represent hash table values. 
 Essentially, the MAC addr is converted to an 8-bit CRC.  This value 
then serves as an index to the 256-bit hash table.  If the list is 
large, then certain bits may be written twice, but if all the bits are 
set, then the behavior is the same as in the ALLMULTI behavior -- every 
multicast packet arrives.  It would be a mistake, IMHO, to cut it off 
after N entries, as several of those entries could overlap in the 
bitmap.  Clearly, the less bits which are set, the more effective the 
hardware filter, so checking each address will, I think, always be a 
performance win or tie (on the filtering side) over ALLMULTI

ok

quoted

24) setting IFF_MULTICAST is suspicious at best, possibly wrong:
+       dev->mtu = 1500;
+       dev->set_multicast_list = gfar_set_multi;
+       dev->flags |= IFF_MULTICAST;
+


I'll believe you, but is there a reason for this?  I guess it's already 
set, by default, so easy fix.

follow the other drivers, and behave predictably :)

quoted

28) I see you support register dump (good!), now please send the
register-pretty-print patch to the ethtool package :)


Heh.  Well, I haven't written that, yet.  There's actually a slight 
problem, in that the 85xx registers are 32-bit, and refuse to be 
accessed as bytes.  I'll be sure to look at that in the near...ish future.

Nothing wrong with accessing registers as 32-bit quantities.  That 
attribute is common to a lot of NICs.

quoted

31) infinite loops, particularly on 1-bits, are discouraged:
+       /* Wait for the transaction to finish */
+       while (gfar_read(&regbase->miimind) & (MIIMIND_NOTVALID |
MIIMIND_BUSY))
+               cpu_relax();


What is the suggested method for waiting for the bus to be free?  Should 
I timeout after some time, and bring the driver down?

it's really up to you, as it depends on the implementation platforms.

The most simple is to include a counter that counts down to zero, and 
starts some absurdly large number like 100000.

quoted

33) merge-stopper:  mii_parse_sr().  never wait for autonegotiation to
complete.  it is an asynchronous operation that could exceed 30 seconds.


Hmm...

quoted

34) merge-stopper:  dm9161_wait().  same thing as #33.


This may be a problem.  That function is there to work around an issue 
in the PHY, wherein if you try to configure it before it has come up all 
the way, it refuses to bring the link up.  We've sworn at this code many 
times, but there has, as of yet, not been a good suggestion as to how we 
can ensure that the 9161 is  ready before we configure it.

I interpret that as a driver bug :)

As common sense, regardless of phy bugs, you should not be trying to 
configure the MAC _or_ the phy in the middle of autonegotiation. 
Presumeably you are using a combination of netif_stop_queue(), 
netif_carrier_off(), and netif_device_detach() to achieve this.

quoted

35) liberally borrow code and ideas from Ben H's sungem_phy.c.
Eventually we want to move to a generic phy module.


Heh.  ironically, I stole liberally from the ibm_emac driver, which now 
looks exactly like the sungem_phy code for phy handling.  A generic phy 
module would be a good thing, and I'm even interested in helping with 
code and/or suggestions.  If nothing else, I'll jump on the new generic 
phy module bandwagon!

Cool.  This item #35 is more of a long term "think about it" type of 
request.  Please do not hesitate to think of ways that you could share 
code with sungem_phy.c, and/or make them both use the same API, and 
submit patches along those lines :)

It's not enough to just write a driver, help change Linux for the better :)

 From Jamal:

quoted

1) The check (in gfar_start_xmit()):

Should happen much sooner - i.e before the skb is touched.


I'm not sure I agree here.  What I am doing is detecting the full state 
before an skb needs to be rejected.  I am testing the NEXT descriptor to 
see if it is ready (if not, dirty_tx will be pointing to it).  This way, 
I always handle any skb that is passed to gfar_start_xmit()

See my comments to jamal as well:  if you guarantee that you always have 
room on the DMA ring for an additional skb, that check should never be 
needed.

Some extremely cautious/paranoid programmers will add a check, with a 
printk noting it's a BUG (i.e. impossible) condition, such as

	if (unlikely(... no more descriptors ...)) {
		printk(KERN_ERR "%s: BUG: no room for TX\n", dev->name);
		netif_stop_queue();
		spin_unlock_irqrestore();
		return 1;
	}

	... queue TX to hardware ...

	if (no more descriptors)
		netif_stop_queue()

	spin_unlock_irqrestore()

quoted

2) Also its pretty scary if you are doing:
txbdp->status |= TXBD_INTERRUPT for every packet.
Look at other drivers, they try to do this every few packets;
or enable tx mitigation to slow the rate  of those interupts.


I don't believe this is as dire as you think.  This bit only indicates 
that, if the conditions are right, an interrupt will be generated once 
that descriptor is processed, and its data sent.  Conditions which 
mitigate that are:
1) coalescing is on, and the timer and counter have not triggered the 
interrupt yet
2) NAPI is enabled, and so interrupts are disabled after the first 
packet arrives
3) NAPI is disabled, but the driver is currently handling a previous 
interrupt, so the interrupts are disabled for now.

Even at 10/100 speeds, you really don't want to be generating one 
interrupt per Tx...

	Jeff

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help