Re: [PATCH 2/2] lib: cpu_rmap: CPU affinity reverse-mapping

Previous thread: UASP updates (uasp.c) by Luben Tuikov on Tuesday, January 4, 2011 - 11:15 am. (1 message)

Next thread: Null pointer exception for local variables in stack with C++ kernel modules by Leo Prasath on Tuesday, January 4, 2011 - 12:46 pm. (2 messages)
From: Ben Hutchings
Date: Tuesday, January 4, 2011 - 12:37 pm

This patch series is intended to support queue selection on multiqueue
IRQ-per-queue network devices (accelerated RFS and XPS-MQ) and
potentially queue selection for other classes of multiqueue device.

The first patch implements IRQ affinity notifiers, based on the outline
that Thomas wrote in response to my earlier patch series for accelerated RFS.

The second patch is a generalisation of the CPU affinity reverse-
mapping, plus functions to maintain such a mapping based on the new IRQ
affinity notifiers.

I would like to be able to use this functionality in networking for
2.6.38.  Thomas, if you are happy with this, could these changes go
through net-next-2.6?  Alternatively, if Linus pulls from linux-2.6-tip
and David pulls from Linus during the merge window, I can (re-)submit
the dependent changes after that.

Ben.

Ben Hutchings (2):
  genirq: Add IRQ affinity notifiers
  lib: cpu_rmap: CPU affinity reverse-mapping

 include/linux/cpu_rmap.h  |   73 +++++++++++++
 include/linux/interrupt.h |   41 +++++++
 include/linux/irqdesc.h   |    3 +
 kernel/irq/manage.c       |   81 ++++++++++++++
 lib/Kconfig               |    4 +
 lib/Makefile              |    2 +
 lib/cpu_rmap.c            |  262 +++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 466 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/cpu_rmap.h
 create mode 100644 lib/cpu_rmap.c

-- 
1.7.3.4


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

--

From: Ben Hutchings
Date: Tuesday, January 4, 2011 - 12:38 pm

When initiating I/O on a multiqueue and multi-IRQ device, we may want
to select a queue for which the response will be handled on the same
or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add a
notification mechanism to support this.

This is based closely on work by Thomas Gleixner <tglx@linutronix.de>.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 include/linux/interrupt.h |   41 +++++++++++++++++++++++
 include/linux/irqdesc.h   |    3 ++
 kernel/irq/manage.c       |   81 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 125 insertions(+), 0 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 55e0d42..09d6039 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -14,6 +14,8 @@
 #include <linux/smp.h>
 #include <linux/percpu.h>
 #include <linux/hrtimer.h>
+#include <linux/kref.h>
+#include <linux/workqueue.h>
 
 #include <asm/atomic.h>
 #include <asm/ptrace.h>
@@ -231,6 +233,28 @@ static inline void resume_device_irqs(void) { };
 static inline int check_wakeup_irqs(void) { return 0; }
 #endif
 
+/**
+ * struct irq_affinity_notify - context for notification of IRQ affinity changes
+ * @irq:		Interrupt to which notification applies
+ * @kref:		Reference count, for internal use
+ * @work:		Work item, for internal use
+ * @notify:		Function to be called on change.  This will be
+ *			called in process context.
+ * @release:		Function to be called on release.  This will be
+ *			called in process context.  Once registered, the
+ *			structure must only be freed when this function is
+ *			called or later.
+ */
+struct irq_affinity_notify {
+        unsigned int irq;
+        struct kref kref;
+#if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
+        struct work_struct work;
+#endif
+        void (*notify)(struct irq_affinity_notify *, const cpumask_t *mask);
+        void (*release)(struct kref *ref);
+};
+
 #if defined(CONFIG_SMP) && ...

--

From: Ben Hutchings
Date: Tuesday, January 4, 2011 - 12:39 pm

When initiating I/O on a multiqueue and multi-IRQ device, we may want
to select a queue for which the response will be handled on the same
or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add
library functions to support a generic reverse-mapping from CPUs to
objects with affinity and the specific case where the objects are
IRQs.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 include/linux/cpu_rmap.h |   73 +++++++++++++
 lib/Kconfig              |    4 +
 lib/Makefile             |    2 +
 lib/cpu_rmap.c           |  262 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 341 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/cpu_rmap.h
 create mode 100644 lib/cpu_rmap.c

diff --git a/include/linux/cpu_rmap.h b/include/linux/cpu_rmap.h
new file mode 100644
index 0000000..6e2f5ff
--- /dev/null
+++ b/include/linux/cpu_rmap.h
@@ -0,0 +1,73 @@
+/*
+ * cpu_rmap.h: CPU affinity reverse-map support
+ * Copyright 2010 Solarflare Communications Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation, incorporated herein by reference.
+ */
+
+#include <linux/cpumask.h>
+#include <linux/gfp.h>
+#include <linux/slab.h>
+
+/**
+ * struct cpu_rmap - CPU affinity reverse-map
+ * @near: For each CPU, the index and distance to the nearest object,
+ *      based on affinity masks
+ * @size: Number of objects to be reverse-mapped
+ * @used: Number of objects added
+ * @obj: Array of object pointers
+ */
+struct cpu_rmap {
+	struct {
+		u16     index;
+		u16     dist;
+	} near[NR_CPUS];
+	u16		size, used;
+	void		*obj[0];
+};
+#define CPU_RMAP_DIST_INF 0xffff
+
+extern struct cpu_rmap *alloc_cpu_rmap(unsigned int size, gfp_t flags);
+
+/**
+ * free_cpu_rmap - free CPU affinity reverse-map
+ * @rmap: Reverse-map allocated with alloc_cpu_rmap(), or %NULL
+ */
+static inline ...

--

From: Eric Dumazet
Date: Tuesday, January 4, 2011 - 2:17 pm

This [NR_CPUS] is highly suspect.


I really doubt you need other than GFP_KERNEL. (Especially if you switch


--

From: Ben Hutchings
Date: Tuesday, January 4, 2011 - 2:23 pm

I think that would be a waste of space in shared caches, as this is
[...]

I agree, but this is consistent with ~all other allocation functions.

Ben.



--

From: Eric Dumazet
Date: Tuesday, January 4, 2011 - 2:45 pm

This is a slow path, unless I have misunderstood the intent.

Cache lines don't matter. I was not concerned about speed but about
memory needs.

NR_CPUS can be 4096 on some distros, that means a 32Kbyte allocation.

Really, you'll have to have very strong arguments to introduce an
[NR_CPUS] array in the kernel today.
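
The 32 Kbyte figure can be checked with a little arithmetic. The sketch
below is a userspace approximation, not kernel code: struct near_entry
stands in for one element of near[] from the patch, the object count of 8
is a hypothetical queue count, structure padding is ignored, and
round_up_pow2() rests on the assumption that the allocator serves large
requests from power-of-two size classes.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for one element of near[] in the patch (u16 -> uint16_t). */
struct near_entry {
	uint16_t index;
	uint16_t dist;
};

/* Bytes taken by a near[] array with one entry per CPU. */
static size_t near_bytes(size_t nr_cpus)
{
	return nr_cpus * sizeof(struct near_entry);
}

/*
 * Bytes requested for the whole map: near[], the size/used pair and
 * n_objs object pointers.  Padding is ignored in this sketch.
 */
static size_t rmap_request_bytes(size_t nr_cpus, size_t n_objs)
{
	return near_bytes(nr_cpus) + 2 * sizeof(uint16_t) +
	       n_objs * sizeof(void *);
}

/*
 * Round up to the next power of two.
 * ASSUMPTION: models an allocator with power-of-two size classes.
 */
static size_t round_up_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}
```

With nr_cpus = 4096, near_bytes() gives 16384, so near[] alone fills a
16 Kbyte size class. Adding the size/used fields and any object pointers
pushes the request just past that, and under the power-of-two assumption
round_up_pow2(rmap_request_bytes(4096, 8)) lands in the 32768-byte class.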



--

From: Ben Hutchings
Date: Tuesday, January 4, 2011 - 3:04 pm

get_rps_cpu() will need to read from an arbitrary entry in cpu_rmap (not
the current CPU's entry) for each new flow and for each flow that went
idle for a while.  That's not fast path but it is part of the data path,

I could replace this with a pointer to an array of size
num_possible_cpus().  But I think per_cpu is wrong here.

Ben.
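
One way the "pointer to an array" idea could look is sketched below in
userspace C. Everything here is hypothetical and is not taken from the
patch: the names dyn_rmap, dyn_rmap_alloc() and dyn_rmap_lookup(), the
choice to carve near[] out of the same allocation as obj[], and the use of
calloc() in place of a kernel allocator. Only the index/dist pair and the
0xffff "infinite distance" value mirror the posted header.

```c
#include <stdint.h>
#include <stdlib.h>

#define RMAP_DIST_INF 0xffff	/* mirrors CPU_RMAP_DIST_INF in the patch */

struct near_entry {
	uint16_t index;
	uint16_t dist;
};

/* Hypothetical variant of struct cpu_rmap with near[] sized at runtime. */
struct dyn_rmap {
	uint16_t size, used;
	unsigned int nr_cpus;		/* number of entries in near[] */
	struct near_entry *near;	/* points just past obj[] */
	void *obj[];
};

/* Allocate the map, the object array and near[] as one block. */
static struct dyn_rmap *dyn_rmap_alloc(unsigned int nr_cpus, uint16_t size)
{
	size_t obj_bytes = sizeof(struct dyn_rmap) + size * sizeof(void *);
	struct dyn_rmap *rmap;
	unsigned int cpu;

	rmap = calloc(1, obj_bytes + nr_cpus * sizeof(struct near_entry));
	if (!rmap)
		return NULL;

	rmap->size = size;
	rmap->nr_cpus = nr_cpus;
	rmap->near = (struct near_entry *)((char *)rmap + obj_bytes);

	/* No object is near any CPU until affinity information arrives. */
	for (cpu = 0; cpu < nr_cpus; cpu++)
		rmap->near[cpu].dist = RMAP_DIST_INF;
	return rmap;
}

/* Read the entry for an arbitrary CPU; -1 means no object is known. */
static int dyn_rmap_lookup(const struct dyn_rmap *rmap, unsigned int cpu)
{
	if (cpu >= rmap->nr_cpus || rmap->near[cpu].dist == RMAP_DIST_INF)
		return -1;
	return rmap->near[cpu].index;
}
```

The lookup takes the CPU number as a parameter rather than using the
caller's own CPU, which matches the access pattern described for
get_rps_cpu() above and is the reason a per-CPU variable is a poor fit.
The whole map is released with a single free().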


--

From: Eric Dumazet
Date: Tuesday, January 4, 2011 - 3:19 pm

Yes, a dynamic array is acceptable.

You probably mean nr_cpu_ids.
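
The distinction matters when CPU numbering is sparse. In the kernel,
num_possible_cpus() counts the CPUs in the possible mask, while nr_cpu_ids
is one more than the highest possible CPU number. An array indexed by CPU
number therefore needs nr_cpu_ids entries. The toy functions below model
both values on a 64-bit mask in userspace; the toy_ names are invented for
this illustration and are not kernel interfaces.

```c
#include <stdint.h>

/* Count of set bits: models num_possible_cpus() on a toy 64-CPU mask. */
static unsigned int toy_num_possible_cpus(uint64_t possible_mask)
{
	unsigned int n = 0;

	/* Clear the lowest set bit on each pass. */
	for (; possible_mask; possible_mask &= possible_mask - 1)
		n++;
	return n;
}

/* Highest set bit plus one: models nr_cpu_ids on the same toy mask. */
static unsigned int toy_nr_cpu_ids(uint64_t possible_mask)
{
	unsigned int ids = 0;

	for (; possible_mask; possible_mask >>= 1)
		ids++;
	return ids;
}
```

For a mask with CPUs 0, 2 and 5 possible (0x25), the count is 3 but the
highest CPU number is 5, so an array of only 3 entries would be overrun
when indexed by CPU 5; toy_nr_cpu_ids() returns 6 for that mask.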




--
