Re: dmaengine: fix dma_unmap (was: Re: [PATCH 06/13] DMAENGINE: driver for the ARM PL080/PL081 PrimeCells)

From: Dan Williams
Date: Monday, January 3, 2011 - 9:36 am

On Mon, Jan 3, 2011 at 3:14 AM, Russell King - ARM Linux

This requires that a copy of the mapped addresses be maintained
outside the driver's physical descriptor.  This needs support from the
client to set up storage for this information (probably a
scatterlist).  The dmaengine core could use this to implement a common
unmap routine.  However, this still has the problem of how to prevent
unmapping too early in the multi-operation raid case and how to
communicate the full set of addresses to unmap to the final descriptor
in such a chain.  I think the only way to fully solve this is to make
the client solely responsible for both mapping and unmapping.
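
A minimal sketch of what client-owned mapping could look like, assuming
the client keeps a reference count per stripe so that the buffers are
unmapped only after every operation in the chain has completed.  All
names here (stripe_dma_state, stripe_map, stripe_submit, stripe_put) are
hypothetical and are not part of dmaengine or md; the real mapping and
unmapping calls are reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical client-side bookkeeping for one stripe's buffers. */
struct stripe_dma_state {
	bool mapped;	/* buffers are currently mapped for a device */
	int refs;	/* submitter's reference + operations in flight */
};

/* Map once, up front.  The submitter holds the initial reference. */
static void stripe_map(struct stripe_dma_state *s)
{
	assert(!s->mapped);
	s->mapped = true;	/* real code would call the DMA mapping API here */
	s->refs = 1;
}

/* Account for one more operation issued against the mapped buffers. */
static void stripe_submit(struct stripe_dma_state *s)
{
	assert(s->mapped);
	s->refs++;
}

/*
 * Drop one reference: called from each completion callback, and once by
 * the submitter after the whole chain has been issued.  Returns true
 * when this was the last reference and the buffers were unmapped.
 */
static bool stripe_put(struct stripe_dma_state *s)
{
	assert(s->mapped && s->refs > 0);
	if (--s->refs > 0)
		return false;
	s->mapped = false;	/* real code would unmap here */
	return true;
}
```

The submitter's own reference is what keeps an early completion from
unmapping the buffers while later operations in the chain are still
being issued.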

For raid this will have implications for architectures that split
operation types onto different physical channels.  Preparing the
entire operation chain ahead of time is not possible on such
configurations because we need to remap the buffers for each channel
transition.  So, raid will have an optimized path for engines like
mv_xor, ioatdma, and iop-adma (iop13xx) where all buffers can be
mapped upfront (against a single physical channel) and then unmapped
when all stripe operations complete.  For the others, iop-adma (iop3xx)
and ppc44x, we need to wait for each leg to finish before mapping and
issuing the next leg.  There will most likely be negative performance
implications of waiting and reissuing, but as far as I can see this is

Longer term I do not see these flags surviving, but yes a 2.6.38
change along these lines makes sense.

--
Dan
--

From: Russell King - ARM Linux
Date: Monday, January 3, 2011 - 9:52 am

That's not entirely true.  You will only need to remap buffers if
old_chan->device != new_chan->device, as the underlying struct device
will be different and could possibly have a different IOMMU or
DMA-able memory parameters.

So, when changing channels, the optimization is not engine specific,
but can be effected when the chan->device points to the same dma_device
structure.  That means it should still be possible to chain several
operations together, even if it means that they occur on different
channels on the same device.
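
A minimal sketch of that check, written as a userspace illustration.
The two structs are stand-ins that model only the chan->device pointer,
not the kernel's definitions, and chan_switch_needs_remap is a
hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types: only the chan->device relationship is modelled. */
struct dma_device { int id; };
struct dma_chan { struct dma_device *device; };

/*
 * Hypothetical helper: buffers mapped for old_chan can be reused on
 * new_chan only when both channels belong to the same dma_device.
 */
static bool chan_switch_needs_remap(const struct dma_chan *old_chan,
				    const struct dma_chan *new_chan)
{
	assert(old_chan != NULL && new_chan != NULL);
	return old_chan->device != new_chan->device;
}
```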

One passing idea is that the async_* operations need to chain buffers in
terms of <virtual page+offset, len, dma_addr_t, struct dma_device *>, or
maybe <struct dma_device *, scatterlist>.  If the dma_device pointer is
initialized, the scatterlist is already mapped.  If this differs from
the dma_device for the next selected operation, the previous operations
need to be run, then unmap and remap for the new device.
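
The <struct dma_device *, scatterlist> variant could be sketched roughly
as follows.  This is only an illustration of the rule described above:
the struct definitions are userspace stand-ins, and mapped_buf,
buf_action and next_action are hypothetical names, not existing async_tx
or dmaengine interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration only. */
struct dma_device { int id; };
struct scatterlist { int nents; };

/* Hypothetical buffer record carried along an async_* chain. */
struct mapped_buf {
	struct dma_device *mapped_for;	/* NULL: scatterlist not mapped yet */
	struct scatterlist *sg;
};

enum buf_action {
	BUF_MAP,		/* not mapped: map for the next device */
	BUF_REUSE,		/* already mapped for this device */
	BUF_FLUSH_UNMAP_REMAP,	/* run pending ops, unmap, then remap */
};

/* Decide what must happen before the next operation can use buf. */
static enum buf_action next_action(const struct mapped_buf *buf,
				   const struct dma_device *next_dev)
{
	assert(buf != NULL && next_dev != NULL);
	if (buf->mapped_for == NULL)
		return BUF_MAP;
	if (buf->mapped_for == next_dev)
		return BUF_REUSE;
	return BUF_FLUSH_UNMAP_REMAP;
}
```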


Well, if the idea is to kill those flags, then it would be a good idea
not to introduce new uses of them as that'll only complicate matters.

I do have an untested patch which adds the unmap to pl08x, but I'm
wondering if it's worth it, or whether to disable the memcpy support
for the time being.
--


On Mon, Jan 3, 2011 at 8:52 AM, Russell King - ARM Linux

Yes, but currently operation capabilities are organized per dma device
(i.e. all channels on a dma device share the same set of
capabilities).  The channel allocator will keep the chain on a single
channel where possible, but if it determines we need to switch to a
channel with a different capability set then we have also switched dma
devices at that point.

iop3xx and ppc4xx have this dma_device-per-dma_chan
organization currently.  They could switch to a model of hiding
multiple hw channels behind a single dma_chan, but then they would
need to handle the operation ordering and channel transitions

Yes, but the dma driver still does not have enough information to
determine when it is finally safe to unmap / allow speculative reads.
The raid driver can make a much cleaner guarantee of "this stripe now
belongs to a dma device" and "all dma operations have completed this

We could disable the driver if NET_DMA or ASYNC_TX_DMA are selected.
That still allows the driver to be exercised with dmatest.  Although
I notice the driver is already marked experimental, do we need
something stronger for 37-final?

--
Dan
--

From: Linus Walleij
Date: Tuesday, January 4, 2011 - 3:34 pm

Your pick, IMHO. To use it out-of-the-box with 2.6.37 is not possible
on any system anyway - we have not patched in the required
platform data to any ARM system! Those who do such things surely
know what they're doing.

Yours,
Linus Walleij
--
