Fwd: LVS on local node

Previous thread: by Mr Tomo Sand on Wednesday, July 21, 2010 - 5:43 pm. (1 message)

Next thread: macvtap: Limit packet queue length by Herbert Xu on Wednesday, July 21, 2010 - 11:41 pm. (13 messages)
From: Franchoze Eric
Date: Wednesday, July 21, 2010 - 8:51 pm

Hello,

I'm trying to load-balance incoming traffic across my applications. These applications are not very SMP friendly, so I want to run several instances on a single machine, according to the number of CPUs, and balance incoming traffic/connections across these instances.
It looks like it should be similar to http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.localnode.html

Linux kernel 2.6.32, with or without the hide interface patches. I tried different configurations but could not see packets at the application layer.

192.168.1.165 - eth0 - interface for external connections
195.0.0.1 - dummy0 - virtual interface; the real application is bound to that address.

Configuration is:
-A -t 192.168.1.165:1234 -s wlc
-a -t 192.168.1.165:1234 -r 195.0.0.1:1234 -g -w

#ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.1.165:1234 wlc
  -> 195.0.0.1:1234               Local   1      0          0        
#
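
For reference, the two configuration lines above look like ipvsadm restore-format rules. A minimal sketch of the equivalent stand-alone commands follows. It only prints the commands (dry run); gen_ipvs_local is a hypothetical helper name, and the weight of 1 is an assumption taken from the "Weight" column of the listing, since the original "-w" has no value.

```shell
#!/bin/sh
# Dry run: print the ipvsadm commands instead of executing them.
# gen_ipvs_local is a hypothetical helper, not part of ipvsadm.
gen_ipvs_local() {
    vip=$1      # virtual service address:port
    rip=$2      # real server address:port (here on the same host)
    weight=$3   # assumed weight, the listing above shows 1
    # Add the virtual TCP service with the wlc scheduler.
    echo "ipvsadm -A -t $vip -s wlc"
    # Add the real server in gatewaying (direct routing) mode.
    echo "ipvsadm -a -t $vip -r $rip -g -w $weight"
}

gen_ipvs_local 192.168.1.165:1234 195.0.0.1:1234 1
```

To apply the rules for real, the printed lines could be piped to a root shell, e.g. `gen_ipvs_local ... | sh`.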

Log:
[ 2106.897409] IPVS: lookup/out TCP 192.168.1.165:44847->192.168.1.165:1234 not hit
[ 2106.897412] IPVS: lookup service: fwm 0 TCP 192.168.1.165:1234 hit
[ 2106.897414] IPVS: ip_vs_wlc_schedule(): Scheduling...
[ 2106.897416] IPVS: WLC: server 195.0.0.1:1234 activeconns 0 refcnt 2 weight 1 overhead 1
[ 2106.897418] IPVS: Enter: ip_vs_conn_new, net/netfilter/ipvs/ip_vs_conn.c line 693
[ 2106.897421] IPVS: Bind-dest TCP c:192.168.1.165:44847 v:192.168.1.165:1234 d:195.0.0.1:1234 fwd:L s:0 conn->flags:181 conn->refcnt:1 dest->refcnt:3
[ 2106.897425] IPVS: Schedule fwd:L c:192.168.1.165:44847 v:192.168.1.165:1234 d:195.0.0.1:1234 conn->flags:1C1 conn->refcnt:2
[ 2106.897429] IPVS: TCP input  [S...] 195.0.0.1:1234->192.168.1.165:44847 state: NONE->SYN_RECV conn->refcnt:2
[ 2106.897431] IPVS: Enter: ip_vs_null_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 212
[ 2106.897439] IPVS: lookup/in TCP 192.168.1.165:1234->192.168.1.165:44847 not hit
[ 2106.897441] ...
From: Eric Dumazet
Date: Wednesday, July 21, 2010 - 11:56 pm

lvs seems not very SMP friendly and a bit complex.

I would use an iptables setup and a slightly modified REDIRECT target
(and/or a nf_nat_setup_info() change)

Say you have 8 daemons listening on different ports (1000 to 1007)

iptables -t nat -A PREROUTING -p tcp --dport 1234 -j REDIRECT --rxhash-dist --to-port 1000-1007

rxhash would be provided by RPS on recent kernels or locally computed if
not already provided by core network (or old kernel)

This rule would be triggered only at connection establishment.
Conntrack takes care of the following packets and is SMP friendly.



--

From: Changli Gao
Date: Thursday, July 22, 2010 - 2:10 am

I think maybe REDIRECT is enough. If the public port is one of the
real ports, you need to append "random" option to iptables target
REDIRECT. If not, "REDIRECT --to-ports 1000-1007" is good enough, and
the destination port will be selected in the round-robin manner.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--

From: Eric Dumazet
Date: Thursday, July 22, 2010 - 2:46 am

Yes, on 2.6.32, no RPS, so undocumented --random option is probably the
best we can offer. (random option was added in 2.6.22)

iptables -t nat -A PREROUTING -p tcp --dport 1234 -j REDIRECT --random --to-port 1000-1007

Here is a patch to add "random" help to REDIRECT iptables target

Thanks

[PATCH] extensions: REDIRECT: add random help

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---

diff --git a/extensions/libipt_REDIRECT.c b/extensions/libipt_REDIRECT.c
index 3dfcadf..324d0eb 100644
--- a/extensions/libipt_REDIRECT.c
+++ b/extensions/libipt_REDIRECT.c
@@ -17,7 +17,8 @@ static void REDIRECT_help(void)
 	printf(
 "REDIRECT target options:\n"
 " --to-ports <port>[-<port>]\n"
-"				Port (range) to map to.\n");
+"				Port (range) to map to.\n"
+" [--random]\n");
 }
 
 static const struct option REDIRECT_opts[] = {


--

From: Changli Gao
Date: Thursday, July 22, 2010 - 2:52 am

FYI: the random option is documented in the manual page of iptables.

   REDIRECT
       This  target is only valid in the nat table, in the PREROUTING and OUT-
       PUT chains, and user-defined chains which are only  called  from  those
       chains.   It redirects the packet to the machine itself by changing the
       destination IP  to  the  primary  address  of  the  incoming  interface
       (locally-generated packets are mapped to the 127.0.0.1 address).

       --to-ports port[-port]
              This  specifies  a  destination  port  or range of ports to use:
              without this, the destination port is never  altered.   This  is
              only valid if the rule also specifies -p tcp or -p udp.

       --random
              If  option --random is used then port mapping will be randomized
              (kernel >= 2.6.22).


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--

From: Eric Dumazet
Date: Thursday, July 22, 2010 - 2:59 am

Note my patch has nothing to do with the man page, it's already up to date.

I usually don't read the Fine manuals, do you?

Try :

iptables -t nat -A PREROUTING -p tcp --dport 1234 -j REDIRECT --help

REDIRECT target options:
 --to-ports <port>[-<port>]
				Port (range) to map to.


You see [--random] is missing.



--

From: Changli Gao
Date: Thursday, July 22, 2010 - 3:06 am

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--

From: Patrick McHardy
Date: Friday, July 23, 2010 - 3:54 am

Applied, thanks Eric.
--

From: Eric Dumazet
Date: Thursday, July 22, 2010 - 5:59 am

Hi Simon

I am not familiar with LVS code, so I am probably wrong, but it seems it
could be changed a bit.

Some rwlocks might become spinlocks (faster than rwlocks)

__ip_vs_securetcp_lock for example is always used with
write_lock()/write_unlock().
This can be a regular spinlock without even knowing the code.

Some lookups could use RCU to avoid cache line misses, and to be able to
use spinlocks for the write side.

It would be good to have a bench setup with the case of 16 legacy
daemons, and check how many new connections per second can be
established, in an LVS setup and an iptables based one.

With 2.6.35 and RPS, a REDIRECT based solution can choose the target port
without taking any lock (not counting conntrack internal costs of
course), each cpu accessing local memory only.

# Not needed if eth0 is a multiqueue NIC
echo ffff >/sys/class/net/eth0/queues/rx-0/rps_cpus

for c in `seq 0 15`
do
  iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu $c \
    -j REDIRECT --to-port $((1000 + $c))
done
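
The loop above hard-codes 16 CPUs and port 80. A minimal sketch of the same idea with the values passed as parameters is shown below. It only prints the rules (dry run) so they can be reviewed first; gen_cpu_redirect_rules is a hypothetical helper name, and the convention that daemon N listens on port base+N is taken from the loop above.

```shell
#!/bin/sh
# Dry run: print one REDIRECT rule per CPU, nothing is applied.
# gen_cpu_redirect_rules is a hypothetical helper, not part of iptables.
gen_cpu_redirect_rules() {
    vport=$1    # public port that clients connect to
    base=$2     # first daemon port; daemon N listens on base+N
    ncpus=$3    # number of CPUs, one daemon per CPU
    c=0
    while [ "$c" -lt "$ncpus" ]; do
        # Packets handled on CPU $c are redirected to that CPU's daemon.
        echo "iptables -t nat -A PREROUTING -p tcp --dport $vport -m cpu --cpu $c -j REDIRECT --to-ports $((base + c))"
        c=$((c + 1))
    done
}

# Same values as the loop above: port 80, daemons on 1000-1015.
gen_cpu_redirect_rules 80 1000 16
```

On a real machine the CPU count could come from `nproc`, for example `gen_cpu_redirect_rules 80 1000 "$(nproc)" | sh` when run as root.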



--

From: Simon Horman
Date: Thursday, July 22, 2010 - 6:20 am

Agreed. I took a look at RCUing things a while back, but got bogged

It's hard for LVS to compete with those kinds of lightweight solutions and
it probably shouldn't. However, I'd just like to see LVS working as
well as it can within the constraint that, as you pointed out, it's rather
complex. Thanks for your suggestions.

--

From: Simon Horman
Date: Thursday, July 22, 2010 - 5:24 am

On Thu, Jul 22, 2010 at 08:56:51AM +0200, Eric Dumazet wrote:


I'd be interested to hear some thoughts on
how the SMP aspect of that statement could
be improved.
--

From: Franchoze Eric
Date: Thursday, July 22, 2010 - 11:45 am

Thanks Eric, your solution works (I checked with -A PREROUTING -j DNAT). But there were 3 reasons why I wanted to do it with LVS:
1. Use smarter schedulers than simple random (schedule according to the number of connections, or try to spread the network load equally across all ports).
2. Keepalive. LVS knows if a service is dead or does not respond and skips routing connections to it.
3. Connection tracking: statistics on how many clients are on each port and where they were switched.
--

From: Simon Horman
Date: Thursday, July 22, 2010 - 6:25 am

Hi,

While others have suggested not using LVS for this task for various reasons,
I would just like to comment that this should work, and this smells
like a bug to me. I will try to confirm that, but it won't be today.

--

From: Franchoze Eric
Date: Thursday, July 22, 2010 - 9:59 am

With the latest kernel I see that LVS accepts connections and selects the right destination (if round robin is selected, the destination changes accordingly), then it detects that it is the local node and calls:
net/netfilter/ipvs/ip_vs_xmit.c:
   ip_vs_null_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 struct ip_vs_protocol *pp)

Which does nothing with the skb (I do not understand what happens to that packet afterwards).
I think if LVS could change the destination for packets which go from the local node to the local node, then the connection could be established. Is that reasonable?
--

From: Simon Horman
Date: Sunday, September 19, 2010 - 10:56 pm

Hi,

Sorry for taking a very long time to look into this more closely.
I believe that the problem is that you need to make sure that
the source IP address used by the client (application) is not the VIP
as the VIP also exists on the real-server and thus the real-server will
ignore packets from the VIP.

You also need to make sure that the source address can receive
packets from the real-server. So you can't use 127.0.0.1.

What I suggest is have a different 192.168.1.x address as the primary
address on eth0 (or whatever the interface in question is) on the
client/linux-director. And add 192.168.1.165 as a secondary address on the
same interface.
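
A minimal sketch of the address layout suggested above, printed as a dry run. The primary address 192.168.1.164 and the /24 prefix length are assumptions for illustration only (the message says just "a different 192.168.1.x address"), and gen_director_addrs is a hypothetical helper name. On Linux, the first address added in a subnet becomes the primary one and later addresses in the same subnet become secondaries, so the order of the two commands matters.

```shell
#!/bin/sh
# Dry run: print the ip commands instead of executing them.
# gen_director_addrs is a hypothetical helper, not part of iproute2.
gen_director_addrs() {
    dev=$1       # interface carrying both addresses
    primary=$2   # assumed non-VIP address, used as the client source address
    vip=$3       # virtual IP, added second so it becomes a secondary address
    echo "ip addr add $primary dev $dev"
    echo "ip addr add $vip dev $dev"
}

# 192.168.1.164/24 is an assumed example; 192.168.1.165 is the VIP from the thread.
gen_director_addrs eth0 192.168.1.164/24 192.168.1.165/24
```

After applying the printed commands as root, `ip addr show dev eth0` should list the second address with the "secondary" flag.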

--
