Hello,

I'm trying to load-balance incoming traffic to my applications. These applications are not very SMP-friendly, so I want to run several instances on a single machine, one per CPU, and balance the incoming traffic/connections across them. It looks like this should be similar to http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.localnode.html

Kernel: Linux 2.6.32, with or without the hidden interface patches. I tried different configurations but could not see any packets at the application layer.

192.168.1.165 - eth0 - interface for external connections
195.0.0.1 - dummy0 - virtual interface; the real application is bound to this address

Configuration:

-A -t 192.168.1.165:1234 -s wlc
-a -t 192.168.1.165:1234 -r 195.0.0.1:1234 -g -w

# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 192.168.1.165:1234 wlc
  -> 195.0.0.1:1234 Local 1 0 0
#

Log:
[ 2106.897409] IPVS: lookup/out TCP 192.168.1.165:44847->192.168.1.165:1234 not hit
[ 2106.897412] IPVS: lookup service: fwm 0 TCP 192.168.1.165:1234 hit
[ 2106.897414] IPVS: ip_vs_wlc_schedule(): Scheduling...
[ 2106.897416] IPVS: WLC: server 195.0.0.1:1234 activeconns 0 refcnt 2 weight 1 overhead 1
[ 2106.897418] IPVS: Enter: ip_vs_conn_new, net/netfilter/ipvs/ip_vs_conn.c line 693
[ 2106.897421] IPVS: Bind-dest TCP c:192.168.1.165:44847 v:192.168.1.165:1234 d:195.0.0.1:1234 fwd:L s:0 conn->flags:181 conn->refcnt:1 dest->refcnt:3
[ 2106.897425] IPVS: Schedule fwd:L c:192.168.1.165:44847 v:192.168.1.165:1234 d:195.0.0.1:1234 conn->flags:1C1 conn->refcnt:2
[ 2106.897429] IPVS: TCP input [S...] 195.0.0.1:1234->192.168.1.165:44847 state: NONE->SYN_RECV conn->refcnt:2
[ 2106.897431] IPVS: Enter: ip_vs_null_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 212
[ 2106.897439] IPVS: lookup/in TCP 192.168.1.165:1234->192.168.1.165:44847 not hit
[ 2106.897441] ...
LVS does not seem very SMP-friendly, and it is a bit complex. I would use an iptables setup with a slightly modified REDIRECT target (and/or a change to nf_nat_setup_info()).

Say you have 8 daemons listening on different ports (1000 to 1007):

iptables -t nat -A PREROUTING -p tcp --dport 1234 -j REDIRECT --rxhash-dist --to-port 1000-1007

(--rxhash-dist is the proposed modification; it does not exist in current iptables.) The rxhash would be provided by RPS on recent kernels, or computed locally if the core network stack does not already provide it (or on old kernels).

This rule would be triggered only at connection establishment. Conntrack takes care of the following packets and is SMP-friendly.
--
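The core of the proposed --rxhash-dist idea, mapping a per-flow hash onto the port range, can be sketched in C as below. This is a minimal sketch under assumptions: pick_port_by_hash is a hypothetical name, and a real implementation would live in the kernel NAT setup path rather than in a standalone function. Because the choice is made only once per connection and conntrack reuses it afterwards, every packet of a flow reaches the same daemon.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a per-flow hash (for example the rxhash computed
 * by RPS) onto the port range [lo, hi]. Equal hashes always give the same
 * port, so a flow sticks to one daemon while different flows spread out.
 */
static uint16_t pick_port_by_hash(uint32_t flow_hash, uint16_t lo, uint16_t hi)
{
	assert(lo <= hi);
	uint32_t nports = (uint32_t)hi - lo + 1;  /* e.g. 8 for 1000-1007 */
	return (uint16_t)(lo + flow_hash % nports);
}
```

With the range 1000-1007, hashes 0, 7 and 8 map to ports 1000, 1007 and 1000 respectively.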
I think plain REDIRECT may be enough. If the public port is one of the real ports, you need to add the "random" option to the iptables REDIRECT target. If not, "REDIRECT --to-ports 1000-1007" is good enough, and the destination port will be selected in a round-robin manner.
--
Regards,
Changli Gao(xiaosuo@gmail.com)
--
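A minimal sketch of the round-robin behaviour described above, assuming a single "rover" offset that advances by one for each new connection. struct rr_range and rr_next_port are hypothetical names; the real NAT code must also ensure the rewritten tuple is unique, which this sketch ignores, and the shared rover is not thread-safe as written.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical port range with a rover remembering where the last pick ended. */
struct rr_range {
	uint16_t lo, hi;
	uint32_t rover;   /* offset of the next port to hand out */
};

/* Return the next port in [lo, hi], wrapping around at the end of the range. */
static uint16_t rr_next_port(struct rr_range *r)
{
	assert(r->lo <= r->hi);
	uint32_t nports = (uint32_t)r->hi - r->lo + 1;
	uint16_t port = (uint16_t)(r->lo + r->rover % nports);
	r->rover = (r->rover + 1) % nports;   /* advance for the next connection */
	return port;
}
```

Starting from rover 0 on 1000-1007, successive new connections get 1000, 1001, ..., 1007 and then 1000 again.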
Yes, on 2.6.32 there is no RPS, so the undocumented --random option is probably the
best we can offer (the random option was added in 2.6.22).
iptables -t nat -A PREROUTING -p tcp --dport 1234 -j REDIRECT --random --to-port 1000-1007
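With --random, the offset into the port range comes from a random number instead of a rover or a flow hash. The sketch below simulates that choice for a batch of new connections over ports 1000-1007 to show that they spread across all eight daemons. xorshift32 is only a stand-in for the kernel's random source, and random_port/simulate are hypothetical helpers; as with the hash approach, only the first packet of a connection is affected because conntrack pins the flow afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* xorshift32: a tiny PRNG used here in place of the kernel's random source.
 * The state must be non-zero; it then never becomes zero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Pick a port in [lo, hi] at random (slight modulo bias ignored). */
static uint16_t random_port(uint32_t *state, uint16_t lo, uint16_t hi)
{
	assert(lo <= hi && *state != 0);
	uint32_t nports = (uint32_t)hi - lo + 1;
	return (uint16_t)(lo + xorshift32(state) % nports);
}

/* Simulate n new connections; counts[i] (zeroed by the caller) receives
 * the number of connections sent to port lo + i. */
static void simulate(uint32_t seed, unsigned n, uint16_t lo, uint16_t hi,
		     unsigned *counts)
{
	uint32_t state = seed;
	for (unsigned i = 0; i < n; i++)
		counts[random_port(&state, lo, hi) - lo]++;
}
```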
Here is a patch that adds --random to the help text of the REDIRECT iptables target.
Thanks
[PATCH] extensions: REDIRECT: add random help
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/extensions/libipt_REDIRECT.c b/extensions/libipt_REDIRECT.c
index 3dfcadf..324d0eb 100644
--- a/extensions/libipt_REDIRECT.c
+++ b/extensions/libipt_REDIRECT.c
@@ -17,7 +17,8 @@ static void REDIRECT_help(void)
printf(
"REDIRECT target options:\n"
" --to-ports <port>[-<port>]\n"
-" Port (range) to map to.\n");
+" Port (range) to map to.\n"
+" [--random]\n");
}
static const struct option REDIRECT_opts[] = {
--
FYI: the random option is documented in the manual page of iptables.
REDIRECT
This target is only valid in the nat table, in the PREROUTING and OUTPUT
chains, and user-defined chains which are only called from those
chains. It redirects the packet to the machine itself by changing the
destination IP to the primary address of the incoming interface
(locally-generated packets are mapped to the 127.0.0.1 address).
--to-ports port[-port]
This specifies a destination port or range of ports to use:
without this, the destination port is never altered. This is
only valid if the rule also specifies -p tcp or -p udp.
--random
If option --random is used then port mapping will be randomized
(kernel >= 2.6.22).
--
Regards,
Changli Gao(xiaosuo@gmail.com)
--
Note my patch has nothing to do with the man page; it's already up to date. I usually don't read the Fine manuals, do you? Try:

iptables -t nat -A PREROUTING -p tcp --dport 1234 -j REDIRECT --help
REDIRECT target options:
 --to-ports <port>[-<port>]
    Port (range) to map to.

You see, [--random] is missing.
--
Hi Simon

I am not familiar with the LVS code, so I am probably wrong, but it seems it could be changed a bit.

Some rwlocks might become spinlocks (faster than rwlocks). __ip_vs_securetcp_lock, for example, is always used with write_lock()/write_unlock(), so it can be a regular spinlock without even knowing the code.

Some lookups could use RCU to avoid cache line misses, and to be able to use spinlocks for the write side.

It would be good to have a benchmark setup with 16 legacy daemons and check how many new connections per second can be established, in an LVS setup and in an iptables-based one.

With 2.6.35 and RPS, a REDIRECT-based solution can choose the target port without taking any lock (not counting conntrack internal costs, of course), with each CPU accessing only local memory.

# Not needed if eth0 is a multiqueue NIC
echo ffff >/sys/class/net/eth0/queues/rx-0/rps_cpus

for c in `seq 0 15`
do
    iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu $c -j REDIRECT --to-port $((1000 + $c))
done
--
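The point about __ip_vs_securetcp_lock is that a reader-writer lock only ever taken for writing gives no concurrency benefit, so a plain spinlock does the same job more cheaply. Below is a user-space analogue using POSIX spinlocks, not the IPVS code itself: struct secure_tcp_state, its counter, and the worker threads are hypothetical, chosen only to show a write-only lock being replaced.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical state that is only ever modified, never read under a shared lock. */
struct secure_tcp_state {
	pthread_spinlock_t lock;
	long updates;
};

static void state_update(struct secure_tcp_state *s)
{
	pthread_spin_lock(&s->lock);    /* stands in for write_lock() */
	s->updates++;
	pthread_spin_unlock(&s->lock);  /* stands in for write_unlock() */
}

static void *worker(void *arg)
{
	struct secure_tcp_state *s = arg;
	for (int i = 0; i < 10000; i++)
		state_update(s);
	return NULL;
}

/* Run nthreads workers concurrently and return the final update count. */
static long run_workers(int nthreads)
{
	struct secure_tcp_state s;
	pthread_t tids[16];
	int rc;

	assert(nthreads > 0 && nthreads <= 16);
	rc = pthread_spin_init(&s.lock, PTHREAD_PROCESS_PRIVATE);
	assert(rc == 0);
	s.updates = 0;
	for (int i = 0; i < nthreads; i++) {
		rc = pthread_create(&tids[i], NULL, worker, &s);
		assert(rc == 0);
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	pthread_spin_destroy(&s.lock);
	return s.updates;
}
```

With 4 workers the count comes out at exactly 4 x 10000, since every increment happens under the lock.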
Agreed. I took a look at RCUing things a while back, but got bogged down.

It's hard for LVS to compete with those kinds of lightweight solutions, and it probably shouldn't try. However, I'd like to see LVS working as well as it can within the constraint that, as you pointed out, it's rather complex. Thanks for your suggestions.
--
On Thu, Jul 22, 2010 at 08:56:51AM +0200, Eric Dumazet wrote:

I'd be interested to hear some thoughts on how the SMP aspect of that statement could be improved.
--
Thanks Eric, your solution works (I checked with -A PREROUTING -j DNAT). But there were three reasons why I wanted to do it with LVS:

1. Smarter schedulers than simple random (scheduling according to the number of connections, or trying to spread network load evenly across all ports).
2. Keepalive: LVS knows when a service is dead or not responding and stops routing connections to it.
3. Connection tracking: statistics on how many clients are on each port and where they were switched.
--
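For comparison with random or round-robin port selection, here is a sketch of a weighted least-connection choice of the kind the -s wlc scheduler in the first message performs. The factor of 256 between active and inactive connections is my understanding of how IPVS computes a server's overhead and should be treated as an assumption; struct real_server and wlc_pick are hypothetical names. A plain REDIRECT/DNAT rule has no access to these per-server counts, which is the gap point 1 above describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical real-server record. */
struct real_server {
	int port;
	int weight;        /* 0 means quiesced: never chosen */
	int activeconns;
	int inactconns;
};

/* Active connections count far more than inactive ones (factor assumed). */
static long overhead(const struct real_server *rs)
{
	return (long)rs->activeconns * 256 + rs->inactconns;
}

/* Pick the server with the lowest overhead/weight ratio.
 * a/wa < b/wb is tested as a*wb < b*wa to avoid division. */
static struct real_server *wlc_pick(struct real_server *rs, size_t n)
{
	struct real_server *best = NULL;

	assert(rs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (rs[i].weight <= 0)
			continue;
		if (best == NULL ||
		    overhead(&rs[i]) * best->weight < overhead(best) * rs[i].weight)
			best = &rs[i];
	}
	return best;   /* NULL if no server has a positive weight */
}
```

Raising a server's weight lets it carry proportionally more connections before another server looks cheaper.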
Hi, while others have suggested not using LVS for this task for various reasons, I would just like to comment that this should work, and it smells like a bug to me. I will try to confirm that, but it won't be today.
--
With the latest kernel I see the following: LVS accepts connections and selects the right destination (with round robin selected, the destination changes accordingly), then it detects that the destination is the local node and calls:

net/netfilter/ipvs/ip_vs_xmit.c:
ip_vs_null_xmit(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp)

which does nothing with the skb. (I do not understand what happens to the packet after that.)

I think that if LVS could change the destination of packets going from the local node to the local node, the connection could be established. Is that reasonable?
--
Hi, sorry for taking a very long time to look into this more closely.

I believe the problem is that you need to make sure the source IP address used by the client (application) is not the VIP: the VIP also exists on the real server, so the real server will ignore packets coming from the VIP. You also need to make sure that the source address can receive packets from the real server, so you can't use 127.0.0.1.

What I suggest is to have a different 192.168.1.x address as the primary address on eth0 (or whatever the interface in question is) on the client/linux-director, and add 192.168.1.165 as a secondary address on the same interface.
--
