Re: git: HAMMER - Add live dedup sysctl and support

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Ilya Dryomov
Date: Tuesday, January 4, 2011 - 3:11 am

On Tue, Jan 04, 2011 at 10:00:18AM +0100, Thomas Nikolajsen wrote:

Bear in mind that while it basically works, it is marked experimental:
extensive testing is still ahead (obscure races, etc.).  But any feedback
is very welcome!


I wanted Matt to write the commit message, but he refused ;)


The thing is that, as the quoted comment says, there is an on-media
requirement which, while simplifying the code tremendously, also limits
its use cases (that is, the number of situations in which duplicate data
will be dedup'ed).  The primary use case is cp and cpdup of files and
directory hierarchies.  If you want to get the most out of it, set the
sysctl vfs.hammer.live_dedup=2, but note that this will slightly impact
the performance of the normal write path.
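
To illustrate why cp and cpdup are the sweet spot, here is a minimal C
sketch of a live dedup cache.  It is a simplified model, not HAMMER's
code: the names (on_read, on_write, cache_add, and the live_dedup
variable standing in for the sysctl), the direct-mapped table, the
FNV-1a hash standing in for the data CRC and the bump allocator are all
hypothetical, and the on-media requirement mentioned above is not
modeled.  It also assumes that level 1 fills the cache only from reads
while level 2 fills it from ordinary writes too, which is consistent
with the write-path cost noted above but is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  64   /* hypothetical record size, kept tiny */
#define CACHE_SLOTS 256  /* hypothetical fixed cache size */

/* Hypothetical stand-in for vfs.hammer.live_dedup (0, 1 or 2). */
static int live_dedup = 1;

struct dedup_entry {
    int           valid;
    uint64_t      hash;              /* FNV-1a stands in for the data CRC */
    uint64_t      offset;            /* where the data already lives */
    unsigned char data[BLOCK_SIZE];  /* copy kept only to verify matches */
};

static struct dedup_entry cache[CACHE_SLOTS];
static uint64_t next_free_offset = 0;  /* trivial bump allocator */
static unsigned long dedup_hits = 0;

static uint64_t fnv1a(const unsigned char *p, size_t n)
{
    uint64_t h = 14695981039346656037ULL;   /* 64-bit FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;              /* 64-bit FNV prime */
    }
    return h;
}

/* Direct-mapped: a new entry simply evicts whatever shared its slot. */
static void cache_add(const unsigned char *data, uint64_t offset)
{
    uint64_t h = fnv1a(data, BLOCK_SIZE);
    struct dedup_entry *e = &cache[h % CACHE_SLOTS];
    e->valid = 1;
    e->hash = h;
    e->offset = offset;
    memcpy(e->data, data, BLOCK_SIZE);
}

/* Read path: when live dedup is on, remember what was just read. */
static void on_read(const unsigned char *data, uint64_t offset)
{
    if (live_dedup >= 1)
        cache_add(data, offset);
}

/* Write path: returns the offset the new record should reference. */
static uint64_t on_write(const unsigned char *data)
{
    if (live_dedup >= 1) {
        uint64_t h = fnv1a(data, BLOCK_SIZE);
        struct dedup_entry *e = &cache[h % CACHE_SLOTS];
        /* A hash match is only a hint: compare the bytes before sharing. */
        if (e->valid && e->hash == h &&
            memcmp(e->data, data, BLOCK_SIZE) == 0) {
            dedup_hits++;
            return e->offset;   /* reference existing data, write nothing */
        }
    }
    uint64_t off = next_free_offset;
    next_free_offset += BLOCK_SIZE;   /* "write" the block to fresh space */
    assert(next_free_offset > off);
    /* Level 2 also feeds fresh writes into the cache; this extra
     * hashing and copying is the cost on the normal write path. */
    if (live_dedup == 2)
        cache_add(data, off);
    return off;
}
```

With cp, reading the source fills the cache, so the write of the same
bytes that immediately follows hits and only references the existing
data.  In this model level 2 also catches duplicates that were never
read, at the price of hashing and caching every write.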

As for the relation between the two, they can and should be used
together.  Live dedup can be turned on at all times, while offline dedup
should be run periodically to pick up all the leftovers.  In combination,
this arrangement gives us full-fledged deduplication support in HAMMER
without major complications in the implementation.


Yes, offline dedup is per-PFS.  Online (live) dedup is fs-wide, that is,
it will dedup data across PFSs (cp pfs1/a pfs2/b will be dedup'ed).
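
The division of labor can be sketched in C as a periodic offline pass
that catches what the live cache missed (evicted or never-cached data).
This is a hypothetical model, not the actual offline dedup code: struct
record, offline_dedup() and the whole_fs switch are invented for
illustration, FNV-1a stands in for the data CRC, and whole_fs = 1 models
the fs-wide 'dedup-everything' behavior proposed below.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 16   /* hypothetical record size, kept tiny */

struct record {
    int           pfs;      /* PFS the record belongs to */
    uint64_t      hash;     /* stand-in for the record's data CRC */
    uint64_t      offset;   /* where the record's data lives */
    unsigned char data[BLOCK_SIZE];
};

/* qsort() takes no context argument, so the comparator reads the
 * scope from a file-scope variable set by offline_dedup(). */
static int sort_fs_wide;

static uint64_t fnv1a(const unsigned char *p, size_t n)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Order by (pfs, hash, offset), or (hash, offset) when fs-wide, so
 * candidates are adjacent and the lowest offset leads each group. */
static int cmp_records(const void *pa, const void *pb)
{
    const struct record *a = pa, *b = pb;

    if (!sort_fs_wide && a->pfs != b->pfs)
        return a->pfs < b->pfs ? -1 : 1;
    if (a->hash != b->hash)
        return a->hash < b->hash ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Returns how many records were redirected to already-stored data.
 * Note: the array is reordered by the sort. */
static size_t offline_dedup(struct record *r, size_t n, int whole_fs)
{
    size_t merged = 0, head = 0;

    for (size_t i = 0; i < n; i++)
        r[i].hash = fnv1a(r[i].data, BLOCK_SIZE);
    sort_fs_wide = whole_fs;
    qsort(r, n, sizeof(*r), cmp_records);

    for (size_t i = 1; i < n; i++) {
        int same_scope = whole_fs || r[i].pfs == r[head].pfs;
        if (!same_scope || r[i].hash != r[head].hash) {
            head = i;            /* start a new candidate group */
            continue;
        }
        /* Equal hashes are only candidates: verify the bytes. */
        if (memcmp(r[i].data, r[head].data, BLOCK_SIZE) == 0 &&
            r[i].offset != r[head].offset) {
            r[i].offset = r[head].offset;   /* share the group head's data */
            merged++;
        }
    }
    return merged;
}
```

Sorting with the PFS as the leading key keeps each PFS's candidates
together, which is why a per-PFS pass never pairs pfs1/a with pfs2/b;
in this model, dropping the PFS from the key is the only change the
fs-wide variant needs.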


Apart from testing and closing races in live dedup:

1) The main issue with both offline and online dedup is the reblocker.
Under certain (rare) circumstances it may partially re-duplicate dedup'ed
data ;)  So the reblocker has to be made aware of dedup'ed data, but that
is pretty much a separate project.

2) 'hammer dedup-everything' directive - fs-wide (as opposed to per-PFS)
offline dedup.  Actually I consider per-PFS separation more of a feature
than a drawback, but for people who think they may have duplicates
across PFSs it will be useful.

3) Per-file (and possibly per-directory) nodedup flag.

4) Make live dedup cache size a tunable (for now I think I'll just make
it a sysctl, but it clearly has to scale automatically).
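
One way to picture item 1: a reblocker that copies data record by
record gives two records that were dedup'ed onto the same data two
fresh copies, quietly re-duplicating it.  Below is a minimal C sketch,
assuming a dedup-aware reblocker can remember where it already moved
each piece of data.  reblock_record(), struct reblock_state and the
linear lookup table are hypothetical; a real implementation would need
something that scales, such as a tree keyed on the old offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MOVES 64   /* hypothetical bound for the sketch */

struct move {
    uint64_t from, to;
};

struct reblock_state {
    struct move moved[MAX_MOVES];  /* old offset -> new offset */
    size_t      nmoved;
    uint64_t    next_free;         /* where the reblocker packs data */
    size_t      copies;            /* data copies actually written */
};

/* Return the new offset for a record's data.  When dedup_aware is set,
 * data already moved for another record is reused, not copied again. */
static uint64_t reblock_record(struct reblock_state *st, uint64_t old_off,
                               uint64_t len, int dedup_aware)
{
    if (dedup_aware) {
        for (size_t i = 0; i < st->nmoved; i++)
            if (st->moved[i].from == old_off)
                return st->moved[i].to;   /* shared data stays shared */
    }

    uint64_t new_off = st->next_free;
    st->next_free += len;                 /* "copy" the data */
    st->copies++;

    if (dedup_aware) {
        assert(st->nmoved < MAX_MOVES);
        st->moved[st->nmoved].from = old_off;
        st->moved[st->nmoved].to = new_off;
        st->nmoved++;
    }
    return new_off;
}
```

Feeding three records, two of which share one data offset, through both
modes shows the difference: the naive mode writes three copies, while
the dedup-aware mode writes two and keeps the shared pair pointing at a
single copy.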

This is all I can remember off the top of my head; any thoughts and
comments are welcome.

Thanks,

		Ilya


Messages in current thread:
Re: git: HAMMER - Add live dedup sysctl and support, Thomas Nikolajsen, (Tue Jan 4, 2:00 am)
Re: git: HAMMER - Add live dedup sysctl and support, Ilya Dryomov, (Tue Jan 4, 3:11 am)