Thread: 24 messages, 7 authors, 2007-06-21

Re: [IPV4] LVS: Allow to send ICMP unreachable responses when real-servers are removed

From: Simon Horman <horms@verge.net.au>
Date: 2007-05-15 05:26:15

On Mon, May 14, 2007 at 07:41:48PM +0200, Patrick McHardy wrote:
Janusz Krzysztofik wrote:
Patrick McHardy wrote:
Janusz Krzysztofik wrote:
... ICMP port unreachable messages are not generated inside
IPVS code, they are just sent, with help of the patch in question, from
udp_input() or netfilter REJECT.

Both use icmp_send(), which should always pick a local source, so I
don't understand why this change was needed. Could you describe
the specific case when the packet generated by icmp_send() does
not have a local source?

Yes, it happens when a packet with a non-local destination IP address is
routed locally in order to reach ip_vs_in(), but is not caught there
because there is no associated connection and no matching service, so it
is passed through and ends up in udp_input(). Then, inside udp_input(),
icmp_send() is invoked with the original non-local destination IP as the
source address.

So you're adding a local route for non-local destination and the
address selection in icmp_send() uses the original destination
address as source because the route has RTCF_LOCAL set, resulting
in an error in ip_route_output_slow().
I'm not entirely sure that "adding a local route" is the right
terminology, but then again, perhaps I'm misunderstanding exactly
what that means.

My understanding of the problem is that IPVS likes to send ICMP to notify
end-users when real-servers are down. The source IP of such ICMP is the
VIP, that is, the IP address associated with the virtual service.
However, it is quite valid for this VIP not to be configured on the
machine that is running IPVS. Thus the machine in question wants to send
ICMP packets with a non-local source address.

http://archive.linuxvirtualserver.org/html/lvs-users/2007-01/msg00109.html

I think that your patch looks good, assuming that inet_addr_type(VIP)
is going to return RTN_LOCAL (except in the unlikely case that VIP is
multicast or something silly like that).

However, I wonder if, for efficiency or safety reasons, it might
be better for IPVS to pass some sort of OK_ITS_SUPPOSED_TO_BE_NON_LOCAL
flag into ip_route().

Just a thought.
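
To make the flag idea a bit more concrete, here is a rough userspace
sketch, not kernel code. It models the source check in
ip_route_output_slow() as a lookup in a table of local addresses and adds
a caller-supplied flag that skips the check. Every name in it
(toy_flow, toy_route_output, TOY_FLOW_NONLOCAL_SRC_OK) is made up for
illustration; no such flag exists in the tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: the caller states that a non-local source is intended. */
#define TOY_FLOW_NONLOCAL_SRC_OK 0x1u

/* Toy stand-in for the flow key handed to the route lookup. */
struct toy_flow {
	uint32_t src;        /* requested source address, 0 = none */
	unsigned int flags;  /* TOY_FLOW_* bits */
};

/*
 * Toy model of the source validation done during an output route lookup.
 * local_addrs stands for the addresses a lookup such as ip_dev_find()
 * would resolve. Returns 0 when the source is accepted, -1 when rejected.
 */
int toy_route_output(const uint32_t *local_addrs, size_t n_local,
		     const struct toy_flow *fl)
{
	size_t i;

	if (fl->src == 0)
		return 0;	/* no source requested, nothing to validate */

	/* With the flag set the lookup is skipped entirely. */
	if (fl->flags & TOY_FLOW_NONLOCAL_SRC_OK)
		return 0;

	for (i = 0; i < n_local; i++)
		if (local_addrs[i] == fl->src)
			return 0;

	return -1;		/* non-local source and no flag: reject */
}
```

In this model only a caller that sets the flag (IPVS, in the scenario
above) gets a non-local source accepted; everyone else keeps the strict
check, and the flagged caller avoids the address lookup.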
If that's correct then this patch should also work; it changes
icmp_send() to check if the original destination address is
non-local when deciding whether to pick a new address (and
reverts the routing changes).

Signed-off-by: Patrick McHardy <redacted>
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index d38cbba..b964863 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -513,7 +513,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 	 */
 
 	saddr = iph->daddr;
-	if (!(rt->rt_flags & RTCF_LOCAL)) {
+	if (inet_addr_type(saddr) != RTN_LOCAL) {
 		if (sysctl_icmp_errors_use_inbound_ifaddr)
 			saddr = inet_select_addr(skb_in->dev, 0, RT_SCOPE_LINK);
 		else
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index cb76e3c..df9fe4f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2396,7 +2396,7 @@ static int ip_route_output_slow(struct rtable **rp, const struct flowi *oldflp)
 
 		/* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
 		dev_out = ip_dev_find(oldflp->fl4_src);
-		if ((dev_out == NULL) && !(sysctl_ip_nonlocal_bind))
+		if (dev_out == NULL)
 			goto out;
 
 		/* I removed check for oif == dev_out->oif here.
@@ -2407,7 +2407,7 @@ static int ip_route_output_slow(struct rtable **rp, const struct flowi *oldflp)
 		      of another iface. --ANK
 		 */
 
-		if (dev_out && oldflp->oif == 0
+		if (oldflp->oif == 0
 		    && (MULTICAST(oldflp->fl4_dst) || oldflp->fl4_dst == htonl(0xFFFFFFFF))) {
 			/* Special hack: user can direct multicasts
 			   and limited broadcast via necessary interface
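
For readers who want to see the effect of the icmp.c hunk in isolation,
the following is a small userspace model, not kernel code. It assumes
that "local" means whatever inet_addr_type() would report as RTN_LOCAL,
and it collapses the replacement-address logic (the sysctl branch and
its else arm) into a single fallback_addr field. All toy_* names are
illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Models the route attached to the incoming packet. */
struct toy_route {
	bool rtcf_local;	/* stands for rt->rt_flags & RTCF_LOCAL */
};

/* Models the host's view of its own addresses. */
struct toy_host {
	const uint32_t *local_addrs;	/* addresses treated as RTN_LOCAL */
	size_t n_local;
	uint32_t fallback_addr;		/* assumed replacement source */
};

/* Stands for inet_addr_type(addr) == RTN_LOCAL. */
bool toy_addr_is_local(const struct toy_host *h, uint32_t addr)
{
	size_t i;

	for (i = 0; i < h->n_local; i++)
		if (h->local_addrs[i] == addr)
			return true;
	return false;
}

/* Old check: keep the original destination whenever the route is local. */
uint32_t toy_pick_saddr_old(const struct toy_host *h,
			    const struct toy_route *rt, uint32_t daddr)
{
	uint32_t saddr = daddr;

	if (!rt->rtcf_local)
		saddr = h->fallback_addr;
	return saddr;
}

/* New check: keep the original destination only if the address is local. */
uint32_t toy_pick_saddr_new(const struct toy_host *h, uint32_t daddr)
{
	uint32_t saddr = daddr;

	if (!toy_addr_is_local(h, saddr))
		saddr = h->fallback_addr;
	return saddr;
}
```

With a packet that was routed locally (rtcf_local set) but addressed to
an IP that is not in local_addrs, the old function returns that
non-local address as the source, while the new one returns
fallback_addr. For a destination that is in local_addrs both functions
agree.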

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/