[*v3 PATCH 07/22] IPVS: netns preparation for proto_udp

Previous thread: [*v3 PATCH 15/22] IPVS: netns, ip_vs_stats and its procfs by hans on Thursday, December 30, 2010 - 3:50 am. (1 message)

Next thread: [net-next-2.6 PATCH v2 0/4] dcbnl: Extending dcbnl to support non-host DCBX by Shmulik Ravid on Thursday, December 30, 2010 - 9:26 am. (2 messages)
From: hans
Date: Thursday, December 30, 2010 - 3:50 am

From: Hans Schillstrom <hans.schillstrom@ericsson.com>

This patch series adds network name space support to the LVS.

REVISION

This is version 3

OVERVIEW

The patch doesn't remove or add any functionality except for netns.
For users that don't use network name space (netns) this patch is
completely transparent.

Now it's possible to run LVS in a Linux container (see lxc-tools),
i.e. lightweight virtualization. For example, it's possible to run
one or several LVS instances on a real server, each in its own network name space.
From the LVS point of view it looks like it runs on its own machine.

IMPLEMENTATION
Basic requirements for netns awareness
 - Global variables have to be moved to dynamically allocated memory.
 - No or very little performance loss

The large hash tables (the connection hash and the service hashes) still
reside in global memory, with a net ptr added to the hash key.
Most global variables now reside in a struct ipvs { } in netns/ip_vs.h.
The per name space size is 2096 bytes (for x86_64) and a little less
for 32-bit archs.
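
To illustrate the "net ptr added in hash key" idea, here is a minimal userspace sketch. It is not the IPVS code: every name prefixed with sketch_ is hypothetical, the hash mix is an arbitrary choice, and struct sketch_net only stands in for struct net. It shows one global table shared by all name spaces, where the name space pointer takes part in both the hash and the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TAB_BITS 8
#define SKETCH_TAB_SIZE (1u << SKETCH_TAB_BITS)
#define SKETCH_TAB_MASK (SKETCH_TAB_SIZE - 1)

/* Hypothetical stand-in for struct net. */
struct sketch_net { int id; };

/* Hypothetical, heavily reduced connection entry. */
struct sketch_conn {
    struct sketch_conn *next;
    const struct sketch_net *net;   /* owning name space */
    uint32_t addr;
    uint16_t port;
};

/* One global table shared by every name space. */
static struct sketch_conn *sketch_tab[SKETCH_TAB_SIZE];

/* Mix the name space pointer into the key (arbitrary mix, assumption). */
static unsigned int sketch_hashkey(const struct sketch_net *net,
                                   uint32_t addr, uint16_t port)
{
    uintptr_t n = (uintptr_t)net;
    uint32_t h = addr ^ port ^ (uint32_t)(n >> 4);

    h ^= h >> 16;
    return h & SKETCH_TAB_MASK;
}

/* Insert at the head of the bucket. */
static void sketch_conn_hash(struct sketch_conn *cp)
{
    unsigned int h = sketch_hashkey(cp->net, cp->addr, cp->port);

    assert(cp->net != NULL);
    cp->next = sketch_tab[h];
    sketch_tab[h] = cp;
}

/* Lookup must compare the name space too, not only addr and port. */
static struct sketch_conn *sketch_conn_get(const struct sketch_net *net,
                                           uint32_t addr, uint16_t port)
{
    struct sketch_conn *cp;

    for (cp = sketch_tab[sketch_hashkey(net, addr, port)]; cp; cp = cp->next)
        if (cp->net == net && cp->addr == addr && cp->port == port)
            return cp;
    return NULL;
}
```

Two name spaces can then hold entries with identical address and port without seeing each other's connections, at the cost of one extra pointer compare per lookup.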

Statistics counters are now lock-free, i.e. incremented per CPU;
the estimator sums them when it uses them.
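
A rough userspace sketch of that scheme follows. It is an assumption-laden model, not the patch: the sketch_ names and the fixed SKETCH_NR_CPUS are hypothetical, and it assumes each slot is only ever written by its own CPU (which is what makes the increment lock-free in the kernel case). The reader simply walks all slots and adds them up.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4   /* hypothetical fixed CPU count */

/* Hypothetical counter set, modelled on the procfs columns. */
struct sketch_counters {
    uint64_t conns, inpkts, outpkts, inbytes, outbytes;
};

/* One slot per CPU; a CPU only ever touches its own slot. */
struct sketch_stats {
    struct sketch_counters cpu[SKETCH_NR_CPUS];
};

/* Writer side: no lock, only the local slot is updated. */
static void sketch_in_stats(struct sketch_stats *s, unsigned int cpu,
                            uint64_t len)
{
    assert(cpu < SKETCH_NR_CPUS);
    s->cpu[cpu].inpkts++;
    s->cpu[cpu].inbytes += len;
}

/* Reader side (what an estimator would do): sum over all slots. */
static struct sketch_counters sketch_stats_sum(const struct sketch_stats *s)
{
    struct sketch_counters sum = { 0, 0, 0, 0, 0 };
    unsigned int i;

    for (i = 0; i < SKETCH_NR_CPUS; i++) {
        sum.conns    += s->cpu[i].conns;
        sum.inpkts   += s->cpu[i].inpkts;
        sum.outpkts  += s->cpu[i].outpkts;
        sum.inbytes  += s->cpu[i].inbytes;
        sum.outbytes += s->cpu[i].outbytes;
    }
    return sum;
}
```

The sketch leaves out how a 64-bit counter is read consistently on 32-bit machines, which is the point the review further down this thread raises.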

Procfs: ip_vs_stats_percpu is added to reflect the per-CPU counters,
e.g.:
# cat /proc/net/ip_vs_stats
       Total Incoming Outgoing         Incoming         Outgoing
CPU    Conns  Packets  Packets            Bytes            Bytes
  0        0        3        1               9D               34
  1        0        1        2               49               70
  2        0        1        2               34               76
  3        1        2        2               70               74
  ~        1        7        7              18A              18E

     Conns/s   Pkts/s   Pkts/s          Bytes/s          Bytes/s
           0        0        0                0                0

Algorithm files are untouched except for lblc and lblcr.

STEP BY STEP
The first patch creates network name space init for all files that need it.
However if a new name space is ...
From: hans
Date: Thursday, December 30, 2010 - 3:50 am

From: Hans Schillstrom <hans.schillstrom@ericsson.com>

Add support for per name-space protocol data.
In struct ip_vs_protocol, appcnt will be removed once all protos
have been modified for network name-space.

This patch causes warnings about unused functions; they will be used
once the next patch is applied.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/net/ip_vs.h              |   20 +++++++++++-
 include/net/netns/ip_vs.h        |    3 ++
 net/netfilter/ipvs/ip_vs_proto.c |   66 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+), 1 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 70c5462..4fc61bc 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -350,6 +350,7 @@ struct iphdr;
 struct ip_vs_conn;
 struct ip_vs_app;
 struct sk_buff;
+struct ip_vs_proto_data;
 
 struct ip_vs_protocol {
 	struct ip_vs_protocol	*next;
@@ -364,6 +365,10 @@ struct ip_vs_protocol {
 
 	void (*exit)(struct ip_vs_protocol *pp);
 
+	void (*init_netns)(struct net *net, struct ip_vs_proto_data *pd);
+
+	void (*exit_netns)(struct net *net, struct ip_vs_proto_data *pd);
+
 	int (*conn_schedule)(int af, struct sk_buff *skb,
 			     struct ip_vs_protocol *pp,
 			     int *verdict, struct ip_vs_conn **cpp);
@@ -415,7 +420,20 @@ struct ip_vs_protocol {
 	int (*set_state_timeout)(struct ip_vs_protocol *pp, char *sname, int to);
 };
 
-extern struct ip_vs_protocol * ip_vs_proto_get(unsigned short proto);
+/*
+ * protocol data per netns
+ */
+struct ip_vs_proto_data {
+	struct ip_vs_proto_data	*next;
+	struct ip_vs_protocol	*pp;
+	int			*timeout_table;	/* protocol timeout table */
+	atomic_t		appcnt;		/* counter of proto app incs. */
+	struct tcp_states_t 	*tcp_state_table;
+};
+
+extern struct ip_vs_protocol   * ip_vs_proto_get(unsigned short proto);
+extern struct ip_vs_proto_data * ip_vs_proto_data_get(struct net *net,
+						      unsigned short proto);
 
 struct ip_vs_conn_param {
 	const union ...
From: hans
Date: Thursday, December 30, 2010 - 3:50 am

From: Hans Schillstrom <hans.schillstrom@ericsson.com>

In this phase (one), all local vars are moved to the ipvs struct.

Remaining work: add a struct net *net param to a couple of
functions that are common to all protos, and use ip_vs_proto_data.

*v3
Removed unused function set_state_timeout()

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/net/netns/ip_vs.h            |    8 +++
 net/netfilter/ipvs/ip_vs_proto.c     |    3 +
 net/netfilter/ipvs/ip_vs_proto_udp.c |   86 +++++++++++++++++-----------------
 3 files changed, 54 insertions(+), 43 deletions(-)

diff --git a/include/net/netns/ip_vs.h b/include/net/netns/ip_vs.h
index 512cdd0..4975026 100644
--- a/include/net/netns/ip_vs.h
+++ b/include/net/netns/ip_vs.h
@@ -40,6 +40,14 @@ struct netns_ipvs {
 	struct list_head 	tcp_apps[TCP_APP_TAB_SIZE];
 	spinlock_t		tcp_app_lock;
 #endif
+	/* ip_vs_proto_udp */
+#ifdef CONFIG_IP_VS_PROTO_UDP
+	#define	UDP_APP_TAB_BITS	4
+	#define	UDP_APP_TAB_SIZE	(1 << UDP_APP_TAB_BITS)
+	#define	UDP_APP_TAB_MASK	(UDP_APP_TAB_SIZE - 1)
+	struct list_head 	udp_apps[UDP_APP_TAB_SIZE];
+	spinlock_t		udp_app_lock;
+#endif
 
 	/* ip_vs_lblc */
 	int 			sysctl_lblc_expiration;
diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index 90d69c5..ec71d47 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -310,6 +310,9 @@ static int  __net_init  __ip_vs_protocol_init(struct net *net)
 #ifdef CONFIG_IP_VS_PROTO_TCP
 	register_ip_vs_proto_netns(net, &ip_vs_protocol_tcp);
 #endif
+#ifdef CONFIG_IP_VS_PROTO_UDP
+	register_ip_vs_proto_netns(net, &ip_vs_protocol_udp);
+#endif
 	return 0;
 }
 
diff --git a/net/netfilter/ipvs/ip_vs_proto_udp.c b/net/netfilter/ipvs/ip_vs_proto_udp.c
index 5ab54f6..71a4721 100644
--- a/net/netfilter/ipvs/ip_vs_proto_udp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_udp.c
@@ -9,7 +9,8 @@
  *              as published by the Free Software Foundation; either version
  *         ...
From: hans
Date: Thursday, December 30, 2010 - 3:51 am

From: Hans Schillstrom <hans.schillstrom@ericsson.com>

This patch makes the defense work timer per name-space.
A net ptr had to be added to the ipvs struct,
since it's needed by defense_work_handler.
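
Why the back pointer is needed can be shown with a small userspace sketch. This is an assumption about the shape of the handler, not a copy of it: the sketch_ names are hypothetical, struct sketch_work stands in for struct delayed_work, and the container_of-style macro is written out by hand. A work handler is only handed the work item, so it recovers the enclosing per name-space struct from it and then follows the net ptr.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written equivalent of the kernel's container_of(). */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct net and struct delayed_work. */
struct sketch_net { int id; };
struct sketch_work { void (*func)(struct sketch_work *w); };

/* Reduced per name-space state with the work item embedded. */
struct sketch_ipvs {
    int drop_rate;
    struct sketch_work defense_work;
    struct sketch_net *net;          /* back pointer to the owner */
};

/* Records which name space the handler last ran for (demo only). */
static int sketch_last_net_id = -1;

/* The handler only receives the work item, nothing else. */
static void sketch_defense_handler(struct sketch_work *w)
{
    struct sketch_ipvs *ipvs =
        sketch_container_of(w, struct sketch_ipvs, defense_work);

    assert(ipvs->net != NULL);
    /* A real handler would pass ipvs->net on to per-netns helpers. */
    sketch_last_net_id = ipvs->net->id;
}
```

Without the net member the handler could reach the per name-space variables but would have no way to call helpers that take a struct net * argument.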

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/net/ip_vs.h             |    2 +-
 include/net/netns/ip_vs.h       |    3 +++
 net/netfilter/ipvs/ip_vs_conn.c |    5 +++--
 net/netfilter/ipvs/ip_vs_core.c |    1 +
 net/netfilter/ipvs/ip_vs_ctl.c  |   20 +++++++++-----------
 5 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 99828f0..918382a 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -855,7 +855,7 @@ extern const char * ip_vs_state_name(__u16 proto, int state);
 
 extern void ip_vs_tcp_conn_listen(struct net *net, struct ip_vs_conn *cp);
 extern int ip_vs_check_template(struct ip_vs_conn *ct);
-extern void ip_vs_random_dropentry(void);
+extern void ip_vs_random_dropentry(struct net *net);
 extern int ip_vs_conn_init(void);
 extern void ip_vs_conn_cleanup(void);
 
diff --git a/include/net/netns/ip_vs.h b/include/net/netns/ip_vs.h
index 2e9a1b3..1c8c3c4 100644
--- a/include/net/netns/ip_vs.h
+++ b/include/net/netns/ip_vs.h
@@ -72,6 +72,7 @@ struct netns_ipvs {
 
 	int 			num_services;    /* no of virtual services */
 	/* 1/rate drop and drop-entry variables */
+	struct delayed_work	defense_work;   /* Work handler */
 	int 			drop_rate;
 	int			drop_counter;
 	atomic_t 		dropentry;
@@ -131,6 +132,8 @@ struct netns_ipvs {
 	/* multicast interface name */
 	char 			master_mcast_ifn[IP_VS_IFNAME_MAXLEN];
 	char 			backup_mcast_ifn[IP_VS_IFNAME_MAXLEN];
+	/* net name space ptr */
+	struct net 		*net;            /* Needed by timer routines */
 };
 
 #endif /* IP_VS_H_ */
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 5ba205a..28bdaf7 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -1138,7 ...
From: Julian Anastasov
Date: Saturday, January 1, 2011 - 5:27 am

Hello,


 	Only some comments for two of the patches:

v3 PATCH 10/22 use ip_vs_proto_data as param

 	- Can ip_vs_protocol_timeout_change() walk
 	proto_data_table instead of ip_vs_proto_table to avoid the
 	__ipvs_proto_data_get call?

v3 PATCH 15/22 ip_vs_stats

 	- Is ustats_seq allocated with alloc_percpu?

 	Such reader sections should be changed to use tmp vars because
 	on retry we risk adding the values multiple times. For example:

 		do {
 			start = read_seqcount_begin(seq_count);
 			ipvs->ctl_stats->ustats.inbytes += u->inbytes;
 			ipvs->ctl_stats->ustats.outbytes += u->outbytes;
 		} while (read_seqcount_retry(seq_count, start));

 	should be changed as follows:

 		u64 inbytes, outbytes;

 		do {
 			start = read_seqcount_begin(seq_count);
 			inbytes = u->inbytes;
 			outbytes = u->outbytes;
 		} while (read_seqcount_retry(seq_count, start));
 		ipvs->ctl_stats->ustats.inbytes += inbytes;
 		ipvs->ctl_stats->ustats.outbytes += outbytes;
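
 	The double-counting can be reproduced in a single-threaded userspace
 	sketch. This is only a simulation under stated assumptions: the sketch_
 	names are hypothetical, the sequence counter has no memory barriers,
 	and a "concurrent" writer is faked by a hook that fires once inside
 	the read section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU source with a bare sequence counter. */
struct sketch_src {
    unsigned int sequence;
    uint64_t inbytes;
    int pending_writes;   /* how many fake writer hits remain */
};

static unsigned int sketch_read_begin(const struct sketch_src *u)
{
    return u->sequence;
}

static int sketch_read_retry(const struct sketch_src *u, unsigned int start)
{
    return u->sequence != start;
}

/* Simulated writer that sneaks in during the read section. */
static void sketch_maybe_interfere(struct sketch_src *u)
{
    if (u->pending_writes > 0) {
        u->pending_writes--;
        u->sequence++;        /* write begins */
        u->inbytes += 5;
        u->sequence++;        /* write ends */
    }
}

/* Accumulates inside the loop: a retry adds the value again. */
static void sketch_sum_buggy(uint64_t *total, struct sketch_src *u)
{
    unsigned int start;

    do {
        start = sketch_read_begin(u);
        *total += u->inbytes;
        sketch_maybe_interfere(u);
    } while (sketch_read_retry(u, start));
}

/* Reads into a tmp var and accumulates once, after the loop. */
static void sketch_sum_fixed(uint64_t *total, struct sketch_src *u)
{
    unsigned int start;
    uint64_t inbytes;

    do {
        start = sketch_read_begin(u);
        inbytes = u->inbytes;
        sketch_maybe_interfere(u);
    } while (sketch_read_retry(u, start));
    *total += inbytes;
}
```

 	With one interfering write, the buggy variant adds both the stale and
 	the fresh value to the total, while the tmp var variant adds only the
 	value from the pass that completed without a retry.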

 	Or it is better to create a new struct for percpu stats;
 	it will have its own syncp, because we cannot
 	change struct ip_vs_stats_user. syncp should be percpu
 	because we remove locks.

 	For example:

 	struct ip_vs_cpu_stats {
 		struct ip_vs_stats_user	ustats;
 		struct u64_stats_sync	syncp;
 	};

 	Then we can add this in struct netns_ipvs:

 	struct ip_vs_cpu_stats __percpu *stats;	/* Statistics */

 	without the seqcount_t * ustats_seq;

 	Then syncp does not need any initialization, since it seems
 	alloc_percpu returns a zeroed area.

 	When we use percpu stats for all places (dest and svc) we
 	can create a new struct ip_vs_counters, so that we
 	can reduce the memory usage from percpu data. Now stats
 	include counters and estimated values. The estimated
 	values should not be percpu. Then ip_vs_cpu_stats
 	will be shorter (it is not visible to user space):

 	struct ip_vs_cpu_stats {
 		struct ip_vs_counters	ustats;
 		struct u64_stats_sync	syncp;
 	};

 	For writer ...
From: Hans Schillstrom
Date: Sunday, January 2, 2011 - 9:27 am

OK, I will take a look at this when I'm back from my vacation on 20 Jan.
I guess it might be worth the work to make all the stat counters per_cpu.

--
