[PATCH 4/8] net/stmmac: use generic recycling infrastructure

Previous thread: [PATCH 3/8] net/mv643xx: use generic recycling infrastructure by Sebastian Andrzej Siewior on Friday, July 2, 2010 - 12:20 pm. (1 message)

Next thread: Re: by ($10,500,000.00) Donation for Charitable Goals on Friday, July 2, 2010 - 12:33 pm. (1 message)
From: Sebastian Andrzej Siewior
Date: Friday, July 2, 2010 - 12:20 pm

This is version two of generic rx-recycling followed by version one of
emergency skb pools which are built on top of rx-recycling.
The change from v1 of generic rx-recycling is that the list access is
unlocked instead of locked.
Patch six which introduces the emergency pools adds the locking back.
This is required since we now have two not serialized users. In order
not to punish everyone patch eight removes this locking again. That
patch converts only two drivers so you have an idea what I think is
required to get the locking removed.

The idea behind emergency pools is to have pre-allocated skbs for TX and
RX. Using the memory allocator for it leads to latencies during memory
pressure. The pre-allocated skb are "tagged" and should get back to the
pool once they are through the stack so the pool should never get
exhausted.

While it was easy to convert the drivers which share the same concept of
rx-recycling to use the emergency pools it was difficult to hook up the
more complex drivers like e1000e. The e1000e can use split skbs / a frag
list which is different from the allocation currently used. So instead of
forcing all drivers to use the same way of doing things I've been thinking
about providing a dedicated callback for skb allocation and checking if
this skb is "good enough". This is not yet implemented.

I would be glad to receive some feedback on this patch series before I go
any further. Unfortunately I'm on vacation for the next two weeks so I
can't respond earlier. tglx is on Cc and should be able respond earlier :)

Sebastian
--

From: Sebastian Andrzej Siewior
Date: Friday, July 2, 2010 - 12:20 pm

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Right now both users (socket and network driver) are accessing the
emergency pools locked. If there is no socket user then we have only the
NIC driver which is touching the pool in its napi callback. The locking
is not required since the driver has its own locking.

Disabling the emergency pools results in emerg_skb_users going down to
zero. Once the driver notices this then further allocations will be lock
less. This is performed while holding the list lock of the pool to
ensure that all users which touch the struct locked are gone.
Enabling the pools for the socket user requires that the NIC driver is
not touching the pool unlocked. This is ensured by the ndo_emerg_reload
callback which performs a lightweight disable/enable of the card.

As a side effect, the emergency mode of the nic is now deactivated once
the last user is gone.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/gianfar.c     |   10 ++++
 drivers/net/ucc_geth.c    |    8 ++++
 include/linux/netdevice.h |   55 ++++++++++++++++++++++--
 net/core/dev.c            |    1 +
 net/core/skbuff.c         |   19 ++++++++-
 net/core/sock.c           |  101 ++++++++++++++++++++++++++++++++++++++-------
 6 files changed, 172 insertions(+), 22 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 1a1a249..0c891ea 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -116,6 +116,7 @@ static void gfar_new_rxbdp(struct gfar_priv_rx_q *rx_queue, struct rxbd8 *bdp,
 		struct sk_buff *skb);
 static int gfar_set_mac_address(struct net_device *dev);
 static int gfar_change_mtu(struct net_device *dev, int new_mtu);
+static int gfar_emerg_reload(struct net_device *dev);
 static irqreturn_t gfar_error(int irq, void *dev_id);
 static irqreturn_t gfar_transmit(int irq, void *dev_id);
 static irqreturn_t gfar_interrupt(int irq, void *dev_id);
@@ -464,6 +465,7 @@ static const struct net_device_ops ...
From: Sebastian Andrzej Siewior
Date: Friday, July 2, 2010 - 12:20 pm

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/stmmac/stmmac.h      |    1 -
 drivers/net/stmmac/stmmac_main.c |   26 +++++++-------------------
 2 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/drivers/net/stmmac/stmmac.h b/drivers/net/stmmac/stmmac.h
index ebebc64..dbf9f95 100644
--- a/drivers/net/stmmac/stmmac.h
+++ b/drivers/net/stmmac/stmmac.h
@@ -44,7 +44,6 @@ struct stmmac_priv {
 	unsigned int dirty_rx;
 	struct sk_buff **rx_skbuff;
 	dma_addr_t *rx_skbuff_dma;
-	struct sk_buff_head rx_recycle;
 
 	struct net_device *dev;
 	int is_gmac;
diff --git a/drivers/net/stmmac/stmmac_main.c b/drivers/net/stmmac/stmmac_main.c
index a31d580..722a5e6 100644
--- a/drivers/net/stmmac/stmmac_main.c
+++ b/drivers/net/stmmac/stmmac_main.c
@@ -636,18 +636,7 @@ static void stmmac_tx(struct stmmac_priv *priv)
 			p->des3 = 0;
 
 		if (likely(skb != NULL)) {
-			/*
-			 * If there's room in the queue (limit it to size)
-			 * we add this skb back into the pool,
-			 * if it's the right size.
-			 */
-			if ((skb_queue_len(&priv->rx_recycle) <
-				priv->dma_rx_size) &&
-				skb_recycle_check(skb, priv->dma_buf_sz))
-				__skb_queue_head(&priv->rx_recycle, skb);
-			else
-				dev_kfree_skb(skb);
-
+			net_recycle_add(priv->dev, skb);
 			priv->tx_skbuff[entry] = NULL;
 		}
 
@@ -843,6 +832,9 @@ static int stmmac_open(struct net_device *dev)
 	priv->dma_buf_sz = STMMAC_ALIGN(buf_sz);
 	init_dma_desc_rings(dev);
 
+	net_recycle_init(priv->dev, priv->dma_rx_size, priv->dma_buf_sz +
+			NET_IP_ALIGN);
+
 	/* DMA initialization and SW reset */
 	if (unlikely(priv->hw->dma->init(ioaddr, priv->pbl, priv->dma_tx_phy,
 					 priv->dma_rx_phy) < 0)) {
@@ -894,7 +886,6 @@ static int stmmac_open(struct net_device *dev)
 		phy_start(priv->phydev);
 
 	napi_enable(&priv->napi);
-	skb_queue_head_init(&priv->rx_recycle);
 	netif_start_queue(dev);
 	return 0;
 }
@@ -925,7 ...
From: Sebastian Andrzej Siewior
Date: Friday, July 2, 2010 - 12:20 pm

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/ucc_geth.c |   30 ++++++++----------------------
 drivers/net/ucc_geth.h |    2 --
 2 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index dc32a62..9d6097b 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -210,10 +210,7 @@ static struct sk_buff *get_new_skb(struct ucc_geth_private *ugeth,
 {
 	struct sk_buff *skb = NULL;
 
-	skb = __skb_dequeue(&ugeth->rx_recycle);
-	if (!skb)
-		skb = dev_alloc_skb(ugeth->ug_info->uf_info.max_rx_buf_length +
-				    UCC_GETH_RX_DATA_BUF_ALIGNMENT);
+	skb = net_recycle_get(ugeth->ndev);
 	if (skb == NULL)
 		return NULL;
 
@@ -1992,8 +1989,6 @@ static void ucc_geth_memclean(struct ucc_geth_private *ugeth)
 		iounmap(ugeth->ug_regs);
 		ugeth->ug_regs = NULL;
 	}
-
-	skb_queue_purge(&ugeth->rx_recycle);
 }
 
 static void ucc_geth_set_multi(struct net_device *dev)
@@ -2069,6 +2064,7 @@ static void ucc_geth_stop(struct ucc_geth_private *ugeth)
 	ugeth->phydev = NULL;
 
 	ucc_geth_memclean(ugeth);
+	net_recycle_cleanup(ugeth->ndev);
 }
 
 static int ucc_struct_init(struct ucc_geth_private *ugeth)
@@ -2205,9 +2201,6 @@ static int ucc_struct_init(struct ucc_geth_private *ugeth)
 			ugeth_err("%s: Failed to ioremap regs.", __func__);
 		return -ENOMEM;
 	}
-
-	skb_queue_head_init(&ugeth->rx_recycle);
-
 	return 0;
 }
 
@@ -3213,12 +3206,8 @@ static int ucc_geth_rx(struct ucc_geth_private *ugeth, u8 rxQ, int rx_work_limit
 			if (netif_msg_rx_err(ugeth))
 				ugeth_err("%s, %d: ERROR!!! skb - 0x%08x",
 					   __func__, __LINE__, (u32) skb);
-			if (skb) {
-				skb->data = skb->head + NET_SKB_PAD;
-				skb->len = 0;
-				skb_reset_tail_pointer(skb);
-				__skb_queue_head(&ugeth->rx_recycle, skb);
-			}
+			if (skb)
+				net_recycle_add(dev, skb);
 
 			ugeth->rx_skbuff[rxQ][ugeth->skb_currx[rxQ]] = ...
From: Sebastian Andrzej Siewior
Date: Friday, July 2, 2010 - 12:20 pm

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

skb_clone() creates a clone of the skb: a new head is allocated from the
slab cache and the reference counter for the data part is incremented.
For the skbs from the emergency pool, we don't really want to clone
them that way:
- talking to slab may lead to lock contention which in turn increases
  the latency.
- the original (with the data part) may return earlier to the pool than
  the clone. In that case we would "lose" the skb from the emergency
  pool.

Instead we do a copy of head and data into a skb from the emergency
pool.
This patch cuts pskb_copy() into a helper function which does
the bare work and the remaining pskb_copy() allocates a new skb and
calls it.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/core/skbuff.c |   80 +++++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 59 insertions(+), 21 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f02737d..9e094fc 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -613,6 +613,7 @@ struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src)
 }
 EXPORT_SYMBOL_GPL(skb_morph);
 
+static int __pskb_copy(struct sk_buff *skb, struct sk_buff *n);
 /**
  *	skb_clone	-	duplicate an sk_buff
  *	@skb: buffer to clone
@@ -631,6 +632,20 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 {
 	struct sk_buff *n;
 
+	if (skb->emerg_dev) {
+		n = skb_dequeue(&skb->emerg_dev->rx_recycle);
+		if (!n)
+			goto norm_clone;
+		/* remove earlier reservers */
+		skb_reserve(n, - skb_headroom(n));
+		if (!__pskb_copy(skb, n)) {
+			n->emerg_dev = skb->emerg_dev;
+			dev_hold(skb->emerg_dev);
+			return n;
+		}
+		net_recycle_add(skb->emerg_dev, n);
+	}
+norm_clone:
 	n = skb + 1;
 	if (skb->fclone == SKB_FCLONE_ORIG &&
 	    n->fclone == SKB_FCLONE_UNAVAILABLE) {
@@ -720,31 +735,22 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t gfp_mask)
 ...
From: Sebastian Andrzej Siewior
Date: Friday, July 2, 2010 - 12:20 pm

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

This patch implements emergency pools which are bound to a specific
network device. They can be activated via the socket interface and used
for a specific socket.
The pools are built on top of rx-recycling. The socket interface allows
to set the number of skbs in the pool and to active the pool.
The size of the skb which are accepted / added to the pool can not be
changed. It is set by the network driver and get altered on MTU change.
This requires to drop the current pool and re-allocate it. If the driver
does not set the skb size, the emergency pools can not be used.
Once the emergency pools are activated all rx-skbs allocation by the
network driver are taken from the pool. tx-skbs are allocated from the
emergency pool only for the relevant socket, i.e. that one which
activated the emergency mode.
Since the socket _and_ the driver can add/remove skbs to/from the pool
the list operations are using now skb_queue_head() and skb_dequeue().
There is patch later in the series which tries to bring the old unlock
behavior back if the emergency pools are not used by the user.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/alpha/include/asm/socket.h   |    4 +
 arch/arm/include/asm/socket.h     |    3 +
 arch/avr32/include/asm/socket.h   |    3 +
 arch/cris/include/asm/socket.h    |    5 +-
 arch/frv/include/asm/socket.h     |    4 +-
 arch/h8300/include/asm/socket.h   |    3 +
 arch/ia64/include/asm/socket.h    |    3 +
 arch/m32r/include/asm/socket.h    |    3 +
 arch/m68k/include/asm/socket.h    |    3 +
 arch/mips/include/asm/socket.h    |    3 +
 arch/mn10300/include/asm/socket.h |    3 +
 arch/parisc/include/asm/socket.h  |    3 +
 arch/powerpc/include/asm/socket.h |    3 +
 arch/s390/include/asm/socket.h    |    3 +
 arch/sparc/include/asm/socket.h   |    3 +
 arch/xtensa/include/asm/socket.h  |    3 +
 include/asm-generic/socket.h      |    4 +
 include/linux/netdevice.h         |   52 ...
From: Sebastian Andrzej Siewior
Date: Friday, July 2, 2010 - 12:20 pm

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/gianfar.c         |   28 ++++++++--------------------
 drivers/net/gianfar.h         |    2 --
 drivers/net/gianfar_ethtool.c |    1 +
 3 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index fccb7a3..1a1a249 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -1148,6 +1148,9 @@ static int gfar_probe(struct of_device *ofdev,
 		priv->rx_queue[i]->rxic = DEFAULT_RXIC;
 	}
 
+	net_recycle_init(dev, DEFAULT_RX_RING_SIZE,
+			priv->rx_buffer_size + RXBUF_ALIGNMENT);
+
 	/* enable filer if using multiple RX queues*/
 	if(priv->num_rx_queues > 1)
 		priv->rx_filer_enable = 1;
@@ -1768,7 +1771,7 @@ static void free_skb_resources(struct gfar_private *priv)
 			sizeof(struct rxbd8) * priv->total_rx_ring_size,
 			priv->tx_queue[0]->tx_bd_base,
 			priv->tx_queue[0]->tx_bd_dma_base);
-	skb_queue_purge(&priv->rx_recycle);
+	net_recycle_cleanup(priv->ndev);
 }
 
 void gfar_start(struct net_device *dev)
@@ -1949,8 +1952,6 @@ static int gfar_enet_open(struct net_device *dev)
 
 	enable_napi(priv);
 
-	skb_queue_head_init(&priv->rx_recycle);
-
 	/* Initialize a bunch of registers */
 	init_registers(dev);
 
@@ -2366,6 +2367,7 @@ static int gfar_change_mtu(struct net_device *dev, int new_mtu)
 		stop_gfar(dev);
 
 	priv->rx_buffer_size = tempsize;
+	net_recycle_size(dev, tempsize + RXBUF_ALIGNMENT);
 
 	dev->mtu = new_mtu;
 
@@ -2498,16 +2500,7 @@ static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 			bdp = next_txbd(bdp, base, tx_ring_size);
 		}
 
-		/*
-		 * If there's room in the queue (limit it to rx_buffer_size)
-		 * we add this skb back into the pool, if it's the right size
-		 */
-		if (skb_queue_len(&priv->rx_recycle) < rx_queue->rx_ring_size &&
-				skb_recycle_check(skb, priv->rx_buffer_size ...
From: Eric Dumazet
Date: Friday, July 2, 2010 - 11:23 pm

Le vendredi 02 juillet 2010 à 21:20 +0200, Sebastian Andrzej Siewior a

Sebastian

I read all patches, and my initial feeling is all this is very complex
and have many shortcomings.

Most modern NICS are multiqueue, so that each cpu can use a queue on its
own without slowing down other cpus.

Yet rx recycling has one queue per device, defeating part of the
multiqueue goal.

Patch 6/8 even touches dev->refcnt on each emerg packet
Patch 6/8 adds 8 bytes (emerg_dev) to skb. Oh well...

Adding cache layers, especially dumb ones like this one, is probably the
sign something more fundamental is broken somewhere.


I do believe for example that netdev_alloc_skb() should not try to use
the node affinity of the device, but use current cpu node for sk_buff at
least, and possibly for data part too.

One other problem of skb are the two memory blocs involved, and fact
that first one (skb) is already very big and fat, and filled/dirtied
many cycles before its use in RX path.

Maybe its time to provide new API, so that a driver can build an skb at
the time RX interrupt is handled, not at the time the rx ring buffer is
renewed. RX ring should only provide the data part to NIC, and skb
should be built when NIC delivers the frame, so that we provide to IP
stack a real hot skb.



--

From: David Miller
Date: Friday, July 2, 2010 - 11:46 pm

From: Eric Dumazet <eric.dumazet@gmail.com>

Drivers do this already, and in fact I am very sure I'm mentioned this
to you at least other time in the past, and one such driver is NIU :-)

It's trivial to do with devices which work on power-of-2 chopped up
pages.  In fact I'm surprised that RX buffer management scheme is
not more prevalent in network devices.
--

From: Eric Dumazet
Date: Saturday, July 3, 2010 - 12:31 am

You did indeed. I should said "provide a generic and universal API",
usable by average NIC driver, not only big ones. (NIU is more than
10.000 lines of code ;) )

NIU still uses netdev_alloc_skb() API. I was thinking of a function to

Many of them were written in last century :)



--

From: Sebastian Andrzej Siewior
Date: Thursday, August 26, 2010 - 10:31 am

True. gianfar uses multiqueue howver in its napi poll one CPU grabs a

The memory for the skbs is comming from slab. slab's latency is getting
quite high on memory pressure. Therefore I try to avoid talking to slab

And where are you allocating the memory from? Remember, SLAB could be
slow :)

Sebastian
--

Previous thread: [PATCH 3/8] net/mv643xx: use generic recycling infrastructure by Sebastian Andrzej Siewior on Friday, July 2, 2010 - 12:20 pm. (1 message)

Next thread: Re: by ($10,500,000.00) Donation for Charitable Goals on Friday, July 2, 2010 - 12:33 pm. (1 message)