aboutsummaryrefslogtreecommitdiffstats
path: root/net/core/dev.c
Commit message (Collapse)AuthorAgeFilesLines
* net: Fix userspace RTM_NEWLINK notifications.Eric W. Biederman2009-12-131-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | I received some bug reports about userspace programs having problems because after RTM_NEWLINK was received they could not immediate access files under /proc/sys/net/ because they had not been registered yet. The original problem was trivially fixed by moving the userspace notification from rtnetlink_event() to the end of register_netdevice(). When testing that change I discovered I was still getting RTM_NEWLINK events before I could access proc and I was also getting RTM_NEWLINK events after I was seeing RTM_DELLINK. Things practically guaranteed to confuse userspace. After a little more investigation these extra notifications proved to be from the new notifiers NETDEV_POST_INIT and NETDEV_UNREGISTER_BATCH hitting the default case in rtnetlink_event, and triggering unnecessary RTM_NEWLINK messages. rtnetlink_event now explicitly handles NETDEV_UNREGISTER_BATCH and NETDEV_POST_INIT to avoid sending the incorrect userspace notifications. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Handle NETREG_UNINITIALIZED devices correctlyKrishna Kumar2009-12-111-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix two problems: 1. If unregister_netdevice_many() is called with both registered and unregistered devices, rollback_registered_many() bails out when it reaches the first unregistered device. The processing of the prior registered devices is unfinished, and the remaining devices are skipped, and possible registered netdev's are leaked/unregistered. 2. System hangs or panics depending on how the devices are passed, since when netdev_run_todo() runs, some devices were not fully processed. Tested by passing intermingled unregistered and registered vlan devices to unregister_netdevice_many() as follows: 1. dev, fake_dev1, fake_dev2: hangs in run_todo ("unregister_netdevice: waiting for eth1.100 to become free. Usage count = 1") 2. fake_dev1, dev, fake_dev2: failure during de-registration and next registration, followed by a vlan driver Oops during subsequent registration. Confirmed that the patch fixes both cases. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netdevice: provide common routine for macvlan and vlan operstate managementPatrick Mullaney2009-12-031-0/+27
| | | | | | | | | | Provide common routine for the transition of operational state for a leaf device during a root device transition. Signed-off-by: Patrick Mullaney <pmullaney@novell.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Move network device exit batchingEric W. Biederman2009-12-031-0/+25
| | | | | | | | Move network device exit batching from a special case in net_namespace.c to using common mechanisms in dev.c Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Simplfy default_device_exit and improve batching.Eric W. Biederman2009-12-011-10/+6
| | | | | | | | | | - Defer dellink to net_cleanup() allowing for batching. - Fix comment. - Use for_each_netdev_safe again as dev_change_net_namespace touches at most one network device (unlike veth dellink). Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: NETDEV_UNREGISTER_PERNET -> NETDEV_UNREGISTER_BATCHEric W. Biederman2009-12-011-27/+9
| | | | | | | | | | | | | | | | | | | | | The motivation for an additional notifier in batched netdevice notification (rt_do_flush) only needs to be called once per batch not once per namespace. For further batching improvements I need a guarantee that the netdevices are unregistered in order allowing me to unregister an all of the network devices in a network namespace at the same time with the guarantee that the loopback device is really and truly unregistered last. Additionally it appears that we moved the route cache flush after the final synchronize_net, which seems wrong and there was no explanation. So I have restored the original location of the final synchronize_net. Cc: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Move && and || to end of previous lineJoe Perches2009-11-291-3/+4
| | | | | | | | | | | Not including net/atm/ Compiled tested x86 allyesconfig only Added a > 80 column line or two, which I ignored. Existing checkpatch plaints willfully, cheerfully ignored. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* veth: move loopback logic to common locationArnd Bergmann2009-11-261-0/+40
| | | | | | | | | | | | | | | The veth driver contains code to forward an skb from the start_xmit function of one network device into the receive path of another device. Moving that code into a common location lets us reuse the code for direct forwarding of data between macvlan ports, and possibly in other drivers. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: use net_eq to compare netsOctavian Purdila2009-11-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | Generated with the following semantic patch @@ struct net *n1; struct net *n2; @@ - n1 == n2 + net_eq(n1, n2) @@ struct net *n1; struct net *n2; @@ - n1 != n2 + !net_eq(n1, n2) applied over {include,net,drivers/net}. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Fix missing kernel-doc notationJaswinder Singh Rajput2009-11-221-1/+1
| | | | | | | | | Fix the following htmldocs warning: Warning(net/core/dev.c:5378): bad line: Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: rename skb->iif to skb->skb_iifEric Dumazet2009-11-201-3/+3
| | | | | | | To help grep games, rename iif to skb_iif Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: device name allocation cleanupsOctavian Purdila2009-11-181-45/+24
| | | | | Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* linkwatch: linkwatch_forget_dev() to speedup device dismantleEric Dumazet2009-11-181-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Herbert Xu a écrit : > On Tue, Nov 17, 2009 at 04:26:04AM -0800, David Miller wrote: >> Really, the link watch stuff is just due for a redesign. I don't >> think a simple hack is going to cut it this time, sorry Eric :-) > > I have no objections against any redesigns, but since the only > caller of linkwatch_forget_dev runs in process context with the > RTNL, it could also legally emit those events. Thanks guys, here an updated version then, before linkwatch surgery ? In this version, I force the event to be sent synchronously. [PATCH net-next-2.6] linkwatch: linkwatch_forget_dev() to speedup device dismantle time ip link del eth3.103 ; time ip link del eth3.104 ; time ip link del eth3.105 real 0m0.266s user 0m0.000s sys 0m0.001s real 0m0.770s user 0m0.000s sys 0m0.000s real 0m1.022s user 0m0.000s sys 0m0.000s One problem of current schem in vlan dismantle phase is the holding of device done by following chain : vlan_dev_stop() -> netif_carrier_off(dev) -> linkwatch_fire_event(dev) -> dev_hold() ... And __linkwatch_run_queue() runs up to one second later... A generic fix to this problem is to add a linkwatch_forget_dev() method to unlink the device from the list of watched devices. dev->link_watch_next becomes dev->link_watch_list (and use a bit more memory), to be able to unlink device in O(1). After patch : time ip link del eth3.103 ; time ip link del eth3.104 ; time ip link del eth3.105 real 0m0.024s user 0m0.000s sys 0m0.000s real 0m0.032s user 0m0.000s sys 0m0.001s real 0m0.033s user 0m0.000s sys 0m0.000s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: introduce NETDEV_UNREGISTER_PERNETOctavian Purdila2009-11-181-2/+27
| | | | | | | | | | | | | This new event is called once for each unique net namespace in batched unregister operations (with the argument set to a random device from that namespace) and once per device in non-batched unregister operations. It allows us to factorize some device unregister work such as clearing the routing cache. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: add dev_txq_stats_fold() helperEric Dumazet2009-11-171-19/+29
| | | | | | | | | Some drivers ndo_get_stats() method need to perform txqueue stats folding. Move folding from dev_get_stats() to a new dev_txq_stats_fold() function Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2009-11-171-5/+6
|\ | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/can/Kconfig
| * net: Fix the rollback test in dev_change_name()Eric Dumazet2009-11-161-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | net: Fix the rollback test in dev_change_name() In dev_change_name() an err variable is used for storing the original call_netdevice_notifiers() errno (negative) and testing for a rollback error later, but the test for non-zero is wrong, because the err might have positive value as well - from dev_alloc_name(). It means the rollback for a netdevice with a number > 0 will never happen. (The err test is reordered btw. to make it more readable.) Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Optimize hard_start_xmit() return checkingJarek Poplawski2009-11-151-17/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent changes in the TX error propagation require additional checking and masking of values returned from hard_start_xmit(), mainly to separate cases where skb was consumed. This aim can be simplified by changing the order of NETDEV_TX and NET_XMIT codes, because the latter are treated similarly to negative (ERRNO) values. After this change much simpler dev_xmit_complete() is also used in sch_direct_xmit(), so it is moved to netdevice.h. Additionally NET_RX definitions in netdevice.h are moved up from between TX codes to avoid confusion while reading the TX comment. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: check the return value of ndo_select_queue()Eric Dumazet2009-11-151-0/+15
| | | | | | | | | | | | | | | | | | | | Check the return value of ndo_select_queue(). If the value isn't smaller than the real_num_tx_queues, print a warning message, and reset it to zero. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> ---- Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: allow to propagate errors through ->ndo_hard_start_xmit()Patrick McHardy2009-11-131-6/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the ->ndo_hard_start_xmit() callbacks are only permitted to return one of the NETDEV_TX codes. This prevents any kind of error propagation for virtual devices, like queue congestion of the underlying device in case of layered devices, or unreachability in case of tunnels. This patches changes the NET_XMIT codes to avoid clashes with the NETDEV_TX codes and changes the two callers of dev_hard_start_xmit() to expect either errno codes, NET_XMIT codes or NETDEV_TX codes as return value. In case of qdisc_restart(), all non NETDEV_TX codes are mapped to NETDEV_TX_OK since no error propagation is possible when using qdiscs. In case of dev_queue_xmit(), the error is propagated upwards. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netdev: fold name hash properly (v3)stephen hemminger2009-11-111-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The full_name_hash function does not produce well distributed values in the lower bits, so most code uses hash_32() to fold it. This is really a bug introduced when name hashing was added, back in 2.5 when I added name hashing. hash_32 is all that is needed since full_name_hash returns unsigned int which is only 32 bits on 64 bit platforms. Also, there is no point in using hash_32 on ifindex, because the is naturally sequential and usually well distributed. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Introduce for_each_netdev_rcu() iteratorEric Dumazet2009-11-041-14/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds RCU management to the list of netdevices. Convert some for_each_netdev() users to RCU version, if it can avoid read_lock-ing dev_base_lock Ie: read_lock(&dev_base_loack); for_each_netdev(net, dev) some_action(); read_unlock(&dev_base_lock); becomes : rcu_read_lock(); for_each_netdev_rcu(net, dev) some_action(); rcu_read_unlock(); Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: RCU locking for simple ioctl()Eric Dumazet2009-11-011-4/+4
| | | | | | | | | | | | | | | | | | | | All ioctls() implemented by dev_ifsioc_locked() : SIOCGIFFLAGS, SIOCGIFMETRIC, SIOCGIFMTU, SIOCGIFHWADDR, SIOCGIFSLAVE, SIOCGIFMAP, SIOCGIFINDEX & SIOCGIFTXQLEN can use RCU lock instead of dev_base_lock rwlock Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | veth: Fix unregister_netdevice_queue for vethEric W. Biederman2009-11-011-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I tested the recent unregister many changes and got a weird, nasty and seemingly unrelasted kernel oops. Changing unregister_netdevice_queue to use list_move_tail fixes the problem for me. ip link add type veth rmmod veth ls /sys/class/net/ showed one of the veth devices still present. A subsequent ip link oopsed the box. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Introduce dev_get_by_name_rcu()Eric Dumazet2009-11-011-9/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Some workloads hit dev_base_lock rwlock pretty hard. We can use RCU lookups to avoid touching this rwlock (and avoid touching netdevice refcount) netdevices are already freed after a RCU grace period, so this patch adds no penalty at device dismantle time. However, it adds a synchronize_rcu() call in dev_change_name() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: use hlist_for_each_entry()Eric Dumazet2009-10-301-8/+8
| | | | | | | | | | | | | | | | Small cleanup of __dev_get_by_name() and __dev_get_by_index() to use hlist_for_each_entry() : They'll look like their _rcu variant. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | gro: Change all receive functions to return GRO result codesBen Hutchings2009-10-291-23/+15
| | | | | | | | | | | | | | | | | | | | | | | | This will allow drivers to adjust their receive path dynamically based on whether GRO is being applied successfully. Currently all in-tree callers ignore the return values of these functions and do not need to be changed. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | gro: Name the GRO result enumeration typeBen Hutchings2009-10-291-5/+14
| | | | | | | | | | | | | | | | | | This clarifies which return and parameter types are GRO result codes and not RX result codes. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Introduce dev_get_by_index_rcu()Eric Dumazet2009-10-291-10/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | Some workloads hit dev_base_lock rwlock pretty hard. We can use RCU lookups to avoid touching this rwlock. netdevices are already freed after a RCU grace period, so this patch adds no penalty at device dismantle time. dev_ifname() converted to dev_get_by_index_rcu() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vlan: Optimize multiple unregistrationEric Dumazet2009-10-281-0/+1
| | | | | | | | | | | | | | Use unregister_netdevice_many() to speedup master device unregister. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: add a list_head parameter to dellink() methodEric Dumazet2009-10-281-1/+1
| | | | | | | | | | | | | | | | | | Adding a list_head parameter to rtnl_link_ops->dellink() methods allow us to queue devices on a list, in order to dismantle them all at once. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Introduce unregister_netdevice_many()Eric Dumazet2009-10-281-32/+65
| | | | | | | | | | | | | | | | | | | | Introduce rollback_registered_many() and unregister_netdevice_many() rollback_registered_many() is able to perform necessary steps at device dismantle time, factorizing two expensive synchronize_net() calls. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Introduce unregister_netdevice_queue()Eric Dumazet2009-10-281-7/+13
| | | | | | | | | | | | | | | | | | This patchs adds an unreg_list anchor to struct net_device, and introduces an unregister_netdevice_queue() function, able to queue a net_device to a list instead of immediately unregister it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vlan: allow null VLAN ID to be usedEric Dumazet2009-10-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently use a 16 bit field (vlan_tci) to store VLAN ID/PRIO on a skb. Null value is used as a special value, meaning vlan tagging not enabled. This forbids use of null vlan ID. As pointed by David, some drivers use the 3 high order bits (PRIO) As VLAN ID is 12 bits, we can use the remaining bit (CFI) as a flag, and allow null VLAN ID. In case future code really wants to use VLAN_CFI_MASK, we'll have to use a bit outside of vlan_tci. #define VLAN_PRIO_MASK 0xe000 /* Priority Code Point */ #define VLAN_PRIO_SHIFT 13 #define VLAN_CFI_MASK 0x1000 /* Canonical Format Indicator */ #define VLAN_TAG_PRESENT VLAN_CFI_MASK #define VLAN_VID_MASK 0x0fff /* VLAN Identifier */ Reported-by: Gertjan Hofman <gertjan_hofman@yahoo.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | rtnetlink: speedup rtnl_dump_ifinfo()Eric Dumazet2009-10-241-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | When handling large number of netdevice, rtnl_dump_ifinfo() is very slow because it has O(N^2) complexity. Instead of scanning one single list, we can use the 256 sub lists of the dev_index hash table. This considerably speedups "ip link" operations Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Use sk_tx_queue_mapping for connected socketsKrishna Kumar2009-10-201-6/+18
| | | | | | | | | | | | | | | | | | | | | | For connected sockets, the first run of dev_pick_tx saves the calculated txq in sk_tx_queue_mapping. This is not saved if either the device has a queue select or the socket is not connected. Next iterations of dev_pick_tx uses the cached value of sk_tx_queue_mapping. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Use netdev_alloc_skb_ip_align()Eric Dumazet2009-10-131-10/+3
| | | | | | | | | | Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Make UFO on master device independent of attached devicesSridhar Samudrala2009-10-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Now that software UFO is supported, UFO can be enabled on master devices like bridge, bond even though the attached device doesn't support this feature in hardware. This allows UFO to be used between KVM host and guest even when a physical interface attached to the bridge doesn't support UFO. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: introduce NETDEV_POST_INIT notifierJohannes Berg2009-10-051-0/+6
|/ | | | | | | | | | | | For various purposes including a wireless extensions bugfix, we need to hook into the netdev creation before before netdev_register_kobject(). This will also ease doing the dev type assignment that Marcel was working on for cfg80211 drivers w/o touching them all. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: restore tx timestamping for accelerated vlansEric Dumazet2009-09-301-3/+3
| | | | | | | | | | | | | | | | | | | | | | Since commit 9b22ea560957de1484e6b3e8538f7eef202e3596 ( net: fix packet socket delivery in rx irq handler ) We lost rx timestamping of packets received on accelerated vlans. Effect is that tcpdump on real dev can show strange timings, since it gets rx timestamps too late (ie at skb dequeueing time, not at skb queueing time) 14:47:26.986871 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 1 14:47:26.986786 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 1 14:47:27.986888 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 2 14:47:27.986781 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 2 14:47:28.986896 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 3 14:47:28.986780 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 3 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* bonding: remap muticast addresses without using dev_close() and dev_open()Moni Shoua2009-09-151-2/+2
| | | | | | | | | | | | | | | | | | | | This patch fixes commit e36b9d16c6a6d0f59803b3ef04ff3c22c3844c10. The approach there is to call dev_close()/dev_open() whenever the device type is changed in order to remap the device IP multicast addresses to HW multicast addresses. This approach suffers from 2 drawbacks: *. It assumes tha the device is UP when calling dev_close(), or otherwise dev_close() has no affect. It is worth to mention that initscripts (Redhat) and sysconfig (Suse) doesn't act the same in this matter. *. dev_close() has other side affects, like deleting entries from the routing table, which might be unnecessary. The fix here is to directly remap the IP multicast addresses to HW multicast addresses for a bonding device that changes its type, and nothing else. Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds2009-09-141-319/+337
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits) netxen: update copyright netxen: fix tx timeout recovery netxen: fix file firmware leak netxen: improve pci memory access netxen: change firmware write size tg3: Fix return ring size breakage netxen: build fix for INET=n cdc-phonet: autoconfigure Phonet address Phonet: back-end for autoconfigured addresses Phonet: fix netlink address dump error handling ipv6: Add IFA_F_DADFAILED flag net: Add DEVTYPE support for Ethernet based devices mv643xx_eth.c: remove unused txq_set_wrr() ucc_geth: Fix hangs after switching from full to half duplex ucc_geth: Rearrange some code to avoid forward declarations phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs drivers/net/phy: introduce missing kfree drivers/net/wan: introduce missing kfree net: force bridge module(s) to be GPL Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded ... Fixed up trivial conflicts: - arch/x86/include/asm/socket.h converted to <asm-generic/socket.h> in the x86 tree. The generic header has the same new #define's, so that works out fine. - drivers/net/tun.c fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that switched over to using 'tun->socket.sk' instead of the redundantly available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks to the TUN driver") which added a new 'tun->sk' use. Noted in 'next' by Stephen Rothwell.
| * net: force bridge module(s) to be GPLStephen Hemminger2009-09-111-2/+2
| | | | | | | | | | | | | | | | | | | | The only valid usage for the bridge frame hooks are by a GPL components (such as the bridge module). The kernel should not leave a crack in the door for proprietary networking stacks to slip in. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: Remove debugging codeEric Dumazet2009-09-031-2/+0
| | | | | | | | | | | | | | Remove a debugging aid I accidently left in previous 'cleanup' patch Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: net/core/dev.c cleanupsEric Dumazet2009-09-031-297/+292
| | | | | | | | | | | | | | Pure style cleanup patch before surgery :) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: convert remaining non-symbolic return values in dev_queue_xmitKrishna Kumar2009-08-301-1/+1
| | | | | | | | | | | | | | Patch compiled and 32 simultaneous netperf testing ran fine. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Drop ARPHRD_IEEE802154_PHYDmitry Eremin-Solenikov2009-08-191-2/+2
| | | | | | | | | | | | | | There are not maste devices in mac802154 anymore, so drop ARPHRD_IEEE802154_PHY definition. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
| * Merge branch 'master' of ↵David S. Miller2009-08-121-9/+14
| |\ | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: arch/microblaze/include/asm/socket.h
| * | net: Avoid enqueuing skb for default qdiscsKrishna Kumar2009-08-061-13/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dev_queue_xmit enqueue's a skb and calls qdisc_run which dequeue's the skb and xmits it. In most cases, the skb that is enqueue'd is the same one that is dequeue'd (unless the queue gets stopped or multiple cpu's write to the same queue and ends in a race with qdisc_run). For default qdiscs, we can remove the redundant enqueue/dequeue and simply xmit the skb since the default qdisc is work-conserving. The patch uses a new flag - TCQ_F_CAN_BYPASS to identify the default fast queue. The controversial part of the patch is incrementing qlen when a skb is requeued - this is to avoid checks like the second line below: + } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) && >> !q->gso_skb && + !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) { Results of a 2 hour testing for multiple netperf sessions (1, 2, 4, 8, 12 sessions on a 4 cpu system-X). The BW numbers are aggregate Mb/s across iterations tested with this version on System-X boxes with Chelsio 10gbps cards: ---------------------------------- Size | ORG BW NEW BW | ---------------------------------- 128K | 156964 159381 | 256K | 158650 162042 | ---------------------------------- Changes from ver1: 1. Move sch_direct_xmit declaration from sch_generic.h to pkt_sched.h 2. Update qdisc basic statistics for direct xmit path. 3. Set qlen to zero in qdisc_reset. 4. Changed some function names to more meaningful ones. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: mark read-only arrays as constJan Engelhardt2009-08-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | String literals are constant, and usually, we can also tag the array of pointers const too, moving it to the .rodata section. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>