Evening everyone, This is the latest incarnation of gitweb w/ caching. Per the general consensus and requests from the recent Gittogether I'm re-submitting my patches. Biggest changes are bringing it back in line with Junio's tree, couple of locking fixes (specifically targeted at binary files), and at the nagging behest of Sverre I've cut the old patch down to a much more manageable (and in his words "reviewable") series. This still differs, slightly, from whats in production on kernel.org. It's missing the index page git:// link, and kernel.org is still running the broken out output buffering but I'm likely to flip over to this just so I'm not maintaining quite so much (and the changes are easier to track with upstream) v7: - Rework output system, now central STDOUT redirect - Various fixes to caching brought in from existing running system v6: - Never saw the light of day - Various testing, and reworks. v5: - Missed a couple of things that were in my local tree, and added them back in. - Split up the die_error and the version matching patch - Set version matching to be on by default - otherwise this really is code that will never get checked, or at best enabled by default by distributions - Added a minor code cleanup with respect to $site_header that was already in my tree - Applied against a more recent git tree vs. 1.6.6-rc2 - Removed breakout patch for now (did that in v4 actually) and will deal with that separately http://git.kernel.org/?p=git/warthog9/gitweb.git;a=shortlog;h=refs/heads/gitweb-ml-v5 v4: - major re-working of the caching layer to use file handle redirection instead of buffering output - other minor improvements http://git.kernel.org/?p=git/warthog9/gitweb.git;a=shortlog;h=refs/heads/gitweb-ml-v4 v3: - various minor re-works based on mailing list feedback, this series was not sent to the mailing list. v2: - Better breakout - You can actually disable the cache now - John ...
This adds output buffering for gitweb, mainly in preparation for caching support. This is a dramatic change to how caching was being done, mainly in passing around the variable manually and such. This centrally flips the entire STDOUT to a variable, which after the completion of the run, flips it back and does a print on the resulting data. This should save on the previous 10K line patch (or so) that adds more explicit output passing. --- gitweb/gitweb.perl | 29 +++++++++++++++++++++++++++++ 1 files changed, 29 insertions(+), 0 deletions(-)
Nice solution. I'll steal it for GitwebCache::Capture::Redirect (or Doesn't this enable capture unconditionally? Wouldn't that screw-up the blame_data part of blame_interactive Ajax-y -- Jakub Narebski Poland ShadeHawk on #git --
This adds $git_versions_must_match variable, which is set to true, checks that we are running on the same version of git that we shipped with, and if not throw '500 Internal Server Error' error. What is checked is the version of gitweb (embedded in building gitweb.cgi), against version of runtime git binary used. Gitweb can usually run with a mismatched git install. This is more here to give an obvious warning as to whats going on vs. silently failing. By default this feature is turned on. Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org> Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- gitweb/README | 4 ++++ gitweb/gitweb.perl | 27 +++++++++++++++++++++++++++ 2 files changed, 31 insertions(+), 0 deletions(-)
Were you bitten by mismatch between git version and gitweb version? Just how serious is that (hypothetical?) situation? I think last time this patch stalled on discussion whether this feature should be turned on by default, where it can break some setups; or be turned off by default, where it is much less useful For me, if $git_versions_must_match is to be on by default, I would prefer also update to t/gitweb-lib.sh, so hat I don't have to Shouldn't it be before check_loadavg()? -- Jakub Narebski Poland ShadeHawk on #git --
I do not understand why this variable needs to be turned _on_ by default, but more importantly, isn't this more or less independent from what we discussed at GitTogether, which is to get your battle-tested caching layer that will tremendously help any nontrivial site merged earlier rather than later by reducing load average from 1000 to 3-or-4? --
This is a relatively large patch that implements the file based caching layer that is quite similar to the one used on such large sites as kernel.org and soon git.fedoraproject.org. This provides a simple, and straight forward caching mechanism that scales dramatically better than Gitweb by itself. The caching layer basically buffers the output that Gitweb would normally return, and saves that output to a cache file on the local disk. When the file is requested it attempts to gain a shared lock on the cache file and cat it out to the client. Should an exclusive lock be on a file (it's being updated) the code has a choice to either update in the background and go ahead and show the stale page while update is being performed, or stall the client(s) until the page is generated. There are two forms of stalling involved here, background building and non-background building, both of which are discussed in the configuration page. There are still a few known "issues" with respect to this: - Code needs to be added to be "browser" aware so that clients like wget that are trying to get a binary blob don't obtain a "Generating..." page Caching is disabled by default with the $cache_enable variable, setting this to 1 will enable file based caching. It is expected that this will be extended to include additional types of caching (like memcached) in the future and should not be exclusively considered a binary value. --- gitweb/cache.pm | 365 ++++++++++++++++++++++++++++++++++++++++++++++ gitweb/gitweb.perl | 99 +++++++++++-- gitweb/static/gitweb.css | 6 + 3 files changed, 458 insertions(+), 12 deletions(-) create mode 100644 gitweb/cache.pm
I am not commenting on minor style issues, like using camelCase names
of variables and subroutines, of 'if( ... )' instead of 'if (...)'
etc. minor issues.
"Quite similar" - what does it mean? In what way it is different from
O.K. This means caching HTTP response (caching output of CGI script),
using "simple" file based caching engine, without any serialization.
But this doesn't tell us how the cache expiration is handled, i.e. how
In my rewrite of earlier version of your gitweb caching patch it is
responsibility of caching engine (GitwebCache::FileCacheWithLocking)
to serialize (via locking) (re)generating cache entry. Gitweb just
passes page key, function that returns captured output to cache
instance; it passes callback for "staling" the client to cache
constructor.
It is not the "file" that is being requested: on the gitweb level it
is a page / action / view that is being requested, on the level of
caching negine it is cache entry that is requested... only at the low
Right. This is also missing from my rewrite.
Sidenote: why "binary blob"? Most downloading utilities do not follow
Why not do the same as in my rewrite, namely $cache_enabled or
$caching_enabled *boolean* value (not "binary"), and $cache variable
It would be nice to have at least some minimal testing for caching
enabled gitweb. Hmmm... I should have split update to t9500-* test
so you would be able to cherry-pick it.
Also, it would be good idea to have 'make install-gitweb' to install
'cache.pm' alongside gitweb, especially that you do not do any error
Wouldn't it be better to use real Perl module, i.e. start with
Because [stable] Perl earlier than 5.10.0 has pre-2.00 version of
File::Path in core, which didn't have make_path and remove_tree.
At least that is compatibility with pre-2.00 File::Path should be
mentioned in this comment, if we are to mention it at all.
Wouldn't it be better to use boolean 'if (!$caching_enabled) {' rather
than explicitely test for ...Yeah, when I saw your print-to-variable-assignment that was one of the things that came to my mind. Sounds like a reasonable thing to do. --
I am getting this in the gitweb.log:
[Fri Oct 29 22:21:12 2010] gitweb.perl: Undefined subroutine &main::cache_fetch called at .../t/../gitweb/gitweb.perl line 1124.
which seems to cause t9500 to fail.
--
This is caused by three issues (bugs) in v7 caching code. First is the reason why t9500 exhibits this bug. The gitweb caching v7 includes file with subroutines related to caching in the following way: do 'cache.pm'; Note that the relative path is used; for t9500 it is relative from somewhere witjin 't/', and not from 'gitweb/', so "do 'cache.pm';" doesn't find it. In my earlier rewrites I used do $cache_pm; and 't/gitweb-lib.sh' set $cache_pm appriopriately. Second is why this bug is bad. There is no error checking that "do 'cache.pm';" succeeded. It should be do $cache_pm; die $@ if $@; at least. Perhaps even better would be to simply turn off caching support if there is an error in 'cache.pm' (which probably should be called 'cache.pl', as it is not proper Perl module)... though on the other hand side it would could hide the fact that caching is not working. Third is why this matters. In v7 the cache_fetch() subroutine, defined in 'cache.pm', is run *unconditionally*, and has test if the caching is enabled *inside it*, instead of having gitweb (caller) use it only if caching is disabled. This coupled with the fact that 'make install-gitweb' would *not* install 'cache.pm' alongside gitweb.cgi means that anybody upgrading gitweb, whether he/she wants caching or not, would have gitweb stop working after upgrade... unless he/she knows that he/she has to copy 'cache.pm' file manually. On the other hand having test first if caching is enabled would make t9500 tests pass (because they do not even test minimally cache enabled / cache disabled cases, like in my rewrite), but would hide problem when caching is enabled and 'cache.pm' is not installed... (perhaps also in persistent environments; I don't know where pwd is then). -- Jakub Narebski Poland ShadeHawk on #git --
John, where should cache.pm go in the installed system? Does it typically
go next to gitweb.perl?
I think "do 'that-file'" honors path specified by the -I option, so I do
not think "do $cache_pm" is necessary. My preference is to run gitweb
That can come later.
Jakub, can we have an absolute minimum fix-up, so that we can give
this wider exposure? I think there are only
four issues:
(1) exclude Ajax-y stuff from caching;
(2) install cache.pm the same way gitweb.perl is installed via
the Makefile;
(3) running tests with appropriate -I so that cache.pm is found; and
(4) die if 'cache.pm' cannot be "done").
I think the change in gitweb-cache v7 is small and safe enough that we
should fast-track it to give usability to the real world sites. It may be
a low-risk "obviously correct" approach that is quick-and-dirty, but that
is exactly why this should be fast-tracked. It does not touch the logic
or formatting in any way, just bypasses the page generation altogether
when it can clearly do so when it can tell the output cannot possibly be
incorrect (albeit sometimes it might be stale if in certain cases, e.g. it
is relative to HEAD).
I know you and others were aiming to split things up, but I think the
amount of the effort that is needed for that line of work on top of the
current codebase is not much different from what is needed on top of
gitweb-cache v7.
--
From `perldoc -f do`
do 'stat.pl';
is just like
eval 'cat stat.pl';
except that it's more efficient and concise, keeps track of the current
filename for error messages, searches the @INC libraries, and updates %INC
if the file is found. ^^^^^^^^^^^^^^^^^^^^^^^^^^^
So I think it respects '-I<directory>' given to perl interpreter.
On the other hand I don't know how it would work with mod_perl (uwing
ModPerl::Registry handler), whether it wouldn't too require extra
configuration.
So I think a better solution would be to base 'Gitweb caching v7' plus
necessary fixes and improvements on top of
gitweb: Prepare for splitting gitweb
This means that the directory that gitweb is in would be added to @INC
via
use lib __DIR__.'/lib';
The 'cache.pm' or 'cache.pl' file would be moved to 'gitweb/lib', but
Easy, but only if check whether to do capturing and caching is moved
out of cache_fetch to caller, i.e. to gitweb script. See also comments
below about what need and should be done.
Another solution is to turn off Ajax-y stuff when caching is enabled.
It can be done quite easily, just sprinkle some !$caching_enabled
(see comment below about naming and semantic of this config variable)
With "gitweb: Prepare for splitting gitweb" as introductory patch this
would be just adding
# gitweb output caching
GITWEB_MODULES += cache.pm
(or cache.pl - it is not a proper Perl package!) e.g. above GITWEB_REPLACE
No change to test necessary if we use "use lib __DIR__.'/lib';" in
gitweb.perl
I'd like to port from my rewrite change to t/t9500-gitweb-standalone-no-errors.sh
adding minimal test to check if running gitweb with caching enabled
doesn't generate any warnings, and (modified) change to the
t/t9502-gitweb-standalone-parse-output.sh, adding minimal test that
O.K. (4) is one liner.
There is are also other issues
(5) naming and semantic of gitweb config variables configuring caching;
at least change ...This series is a bit fixed up and a tiny bit cleaned up version of "Gitweb caching v7" series from John 'Warthog9' Hawley. This series is based on top of 8ff76f4 (gitweb: Move call to evaluate_git_version after evaluate_gitweb_config, 2010-09-26) currently in 'pu'. This commit fixes a bug in gitweb which shows in running testsuite. The patch "gitweb: Add option to force version match" was removed from this series as it doesn't belong to it, c.f. http://article.gmane.org/gmane.comp.version-control.git/160236 [PATCHv7.1 1/4] gitweb: Prepare for splitting gitweb [PATCHv7.1 2/4] gitweb: add output buffering and associated functions [PATCHv7.1 3/4] gitweb: File based caching layer (from git.kernel.org) [PATCHv7.1 4/4] gitweb: Minimal testing of gitweb caching Jakub Narebski (2): gitweb: Prepare for splitting gitweb gitweb: Minimal testing of gitweb caching John 'Warthog9' Hawley (2): gitweb: add output buffering and associated functions gitweb: File based caching layer (from git.kernel.org) gitweb/Makefile | 20 ++- gitweb/gitweb.perl | 134 +++++++++++- gitweb/lib/cache.pl | 348 +++++++++++++++++++++++++++++ gitweb/static/gitweb.css | 6 + t/t9500-gitweb-standalone-no-errors.sh | 19 ++ t/t9502-gitweb-standalone-parse-output.sh | 39 ++++ 6 files changed, 555 insertions(+), 11 deletions(-) create mode 100644 gitweb/lib/cache.pl -- 1.7.3 --
This series is a bit fixed up and a tiny bit cleaned up version of "Gitweb caching v7" series from John 'Warthog9' Hawley: http://thread.gmane.org/gmane.comp.version-control.git/160147 This series is based on top of 'next', because it contains 'jn/gitweb-test' branch. The difference from v7.2 is that it takes into account 'test-installed' target in gitweb/Makefile in first patch of its series, and that testing of caching support is slightly extended. Note that some of those tests fail currently, not because of error in gitweb caching code, but because I was not able to disable calling cacheWaitForUpdate(), which hinders testing. Those differences were also described in "Re: What's cooking in git.git (Nov 2010, #01; Tue, 9)": http://article.gmane.org/gmane.comp.version-control.git/161309 Table of contents: ================== [PATCHv7.2 1/4] gitweb: Prepare for splitting gitweb [PATCHv7.1 2/4] gitweb: add output buffering and associated functions [PATCHv7.1 3/4] gitweb: File based caching layer (from git.kernel.org) [PATCHv7.2 4/4] gitweb: Minimal testing of gitweb caching Shortlog: ========= Jakub Narebski (2): gitweb: Prepare for splitting gitweb gitweb: Minimal testing of gitweb caching John 'Warthog9' Hawley (2): gitweb: add output buffering and associated functions gitweb: File based caching layer (from git.kernel.org) Diffstat: ========= gitweb/Makefile | 20 ++- gitweb/gitweb.perl | 134 +++++++++++- gitweb/lib/cache.pl | 348 +++++++++++++++++++++++++++++ gitweb/static/gitweb.css | 6 + t/gitweb-lib.sh | 15 ++ t/t9500-gitweb-standalone-no-errors.sh | 20 ++ t/t9501-gitweb-standalone-http-status.sh | 13 + t/t9502-gitweb-standalone-parse-output.sh | 33 +++ 8 files changed, 579 insertions(+), 10 deletions(-) create mode 100644 gitweb/lib/cache.pl -- 1.7.3 --
Prepare gitweb for having been split into modules that are to be installed alongside gitweb in 'lib/' subdirectory, by adding use lib __DIR__.'/lib'; to gitweb.perl (to main gitweb script), and preparing for putting modules (relative path) in $(GITWEB_MODULES) in gitweb/Makefile. This preparatory work allows to add new module to gitweb by simply adding GITWEB_MODULES += <module> to gitweb/Makefile (assuming that the module is in 'gitweb/lib/' directory). While at it pass GITWEBLIBDIR in addition to GITWEB_TEST_INSTALLED to test instaleed version of gitweb and installed version of modules (for tests which check individual (sub)modules). Using __DIR__ from Dir::Self module (not in core, that's why currently gitweb includes excerpt of code from Dir::Self defining __DIR__) was chosen over using FindBin-based solution (in core since perl 5.00307, while gitweb itself requires at least perl 5.8.0) because FindBin uses BEGIN block, which is a problem under mod_perl and other persistent Perl environments (thought there are workarounds). At Pavan Kumar Sankara suggestion gitweb/Makefile uses install [OPTION]... SOURCE... DIRECTORY format (2nd format) with single SOURCE rather than install [OPTION]... SOURCE DEST format (1st format) because of security reasons (race conditions). Modern GNU install has `-T' / `--no-target-directory' option, but we cannot rely that the $(INSTALL) we are using supports this option. The install-modules target in gitweb/Makefile uses shell 'for' loop, instead of make's $(foreach) function, to avoid possible problem with generating a command line that exceeded the maximum argument list length. Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- This version is rebased on top of 'next', which means that it takes 'jn/gitweb-test' into account, in particular 958a846 (gitweb/Makefile: Add 'test' and 'test-installed' targets, 2010-09-26). That means adding GITWEBLIBDIR for future testing of individual modules (it was not ...
You confused me. "git grep" after test-applying this 4-patch series on top of next shows only one GITWEBLIBDIR, which means that the symbol is In the earlier series, 478da52 (gitweb: Prepare for splitting gitweb, 2010-10-30) removed the trailing blank line from here but this does not do so anymore, which is a micronit regression. --
That's why I wrote "for _future_ testing"... but perhaps it is not worth I was not sure if such "while at it" style improvements are encouraged or not. Will remember it for v7.3. -- Jakub Narebski Poland --
Prepare gitweb for having been split into modules that are to be installed alongside gitweb in 'lib/' subdirectory, by adding use lib __DIR__.'/lib'; to gitweb.perl (to main gitweb script), and preparing for putting modules (relative path) in $(GITWEB_MODULES) in gitweb/Makefile. This preparatory work allows to add new module to gitweb by simply adding GITWEB_MODULES += <module> to gitweb/Makefile (assuming that the module is in 'gitweb/lib/' directory). While at it pass GITWEBLIBDIR in addition to GITWEB_TEST_INSTALLED to allow testing installed version of gitweb and installed version of modules (for future tests which would check individual (sub)modules). Using __DIR__ from Dir::Self module (not in core, that's why currently gitweb includes excerpt of code from Dir::Self defining __DIR__) was chosen over using FindBin-based solution (in core since perl 5.00307, while gitweb itself requires at least perl 5.8.0) because FindBin uses BEGIN block, which is a problem under mod_perl and other persistent Perl environments (thought there are workarounds). At Pavan Kumar Sankara suggestion gitweb/Makefile uses install [OPTION]... SOURCE... DIRECTORY format (2nd format) with single SOURCE rather than install [OPTION]... SOURCE DEST format (1st format) because of security reasons (race conditions). Modern GNU install has `-T' / `--no-target-directory' option, but we cannot rely that the $(INSTALL) we are using supports this option. The install-modules target in gitweb/Makefile uses shell 'for' loop, instead of make's $(foreach) function, to avoid possible problem with generating a command line that exceeded the maximum argument list length. Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- BUGFIX: shell variables should not be in single quotes I am very sorry about this bug... gitweb/Makefile | 18 +++++++++++++++--- gitweb/gitweb.perl | 8 ++++++++ 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/gitweb/Makefile ...
Hmm, how did you find the issue, and more importantly, how did I or other people who saw this patch so easily fail to notice it? FWIW, I do run "make install" from the toplevel as part of my pre-push test, so I _should_ have noticed it. Ah, I don't run the install step for a revision that does not pass its selftest, so I haven't run "make install" on 'pu' for some time. That may explain it. Anyway thanks for a fixup. --
When working on my total rewrite of J.H. gitweb caching series, available as 'gitweb/cache-kernel-pu' branch in git://repo.or.cz/git/jnareb-git.git and git://github.com/jnareb/git.git repositories I finally did a *clean* reinstal (i.e. remove whole directory, then run "make install-gitweb"). This branch uses the same "gitweb: Prepare for splitting gitweb" patch Hmmm... I thought that "make install" doesn't install gitweb, but it does with "$(MAKE) -C gitweb install"... though I am not sure if "make all" Sorry for the bug. -- Jakub Narebski Poland --
I think it does, and it should if it doesn't. --
Anyway "install" target in gitweb/Makefile runs "all" target in gitweb/Makefile, so "make install" -> "make -C gitweb install" -> "make -C gitweb all" -> gitweb/gitweb.cgi is generated gitweb/gitweb.cgi is in $(OTHER_PROGRAMS), and we have all:: [...] $(OTHER_PROGRAMS) GIT-BUILD-OPTIONS Hmmm... shouldn't it be 'gitweb', not 'gitweb/gitweb.cgi'? Just wondering (we have gitweb/gitweb.cgi target in main Makefile, which just proxies to gitweb/Makefile). -- Jakub Narebski Poland --
From: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
This adds output buffering for gitweb, mainly in preparation for
caching support. This is a dramatic change to how caching was being
done, mainly in passing around the variable manually and such.
This centrally flips the entire STDOUT to a variable, which after the
completion of the run, flips it back and does a print on the resulting
data.
This should save on the previous 10K line patch (or so) that adds more
explicit output passing.
[jn: modified reset_output to silence 'gitweb.perl: Name "main::STDOUT_REAL"
used only once: possible typo at ../gitweb/gitweb.perl line 1130.' warning]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
This is unchanged from previous version.
Reminder: it uses
open STDOUT, ">&", \*STDOUT_REAL;
rather than
open(STDOUT,">&STDOUT_REAL");
to silence spurious (invalid) warning
gitweb.perl: Name "main::STDOUT_REAL" used only once: possible typo
at ../gitweb/gitweb.perl line 1130.
gitweb/gitweb.perl | 29 +++++++++++++++++++++++++++++
1 files changed, 29 insertions(+), 0 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index cfa511c..cae0e34 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -39,6 +39,9 @@ BEGIN {
our $version = "++GIT_VERSION++";
+# Output buffer variable
+our $output = "";
+
our ($my_url, $my_uri, $base_url, $path_info, $home_link);
sub evaluate_uri {
our $cgi;
@@ -1134,6 +1137,25 @@ sub evaluate_argv {
);
}
+sub change_output {
+ our $output;
+
+ # Trap the 'proper' STDOUT to STDOUT_REAL for things like error messages and such
+ open(STDOUT_REAL,">&STDOUT") or die "Unable to capture STDOUT $!\n";
+
+ # Close STDOUT, so that it isn't being used anymore.
+ close STDOUT;
+
+ # Trap STDOUT to the $output variable, which is what I was using in the original
+ # patch anyway.
+ open(STDOUT,">", \$output) || die "Unable to open STDOUT: $!"; #open STDOUT handle to use ...From: John 'Warthog9' Hawley <warthog9@eaglescrag.net> This is a relatively large patch that implements the file based caching layer that is quite similar to the one used on such large sites as kernel.org and soon git.fedoraproject.org. This provides a simple, and straight forward caching mechanism that scales dramatically better than Gitweb by itself. The caching layer basically buffers the output that Gitweb would normally return, and saves that output to a cache file on the local disk. When the file is requested it attempts to gain a shared lock on the cache file and cat it out to the client. Should an exclusive lock be on a file (it's being updated) the code has a choice to either update in the background and go ahead and show the stale page while update is being performed, or stall the client(s) until the page is generated. There are two forms of stalling involved here, background building and non-background building, both of which are discussed in the configuration page. There are still a few known "issues" with respect to this: - Code needs to be added to be "browser" aware so that clients like wget that are trying to get a binary blob don't obtain a "Generating..." page Caching is disabled by default. You can turn it on by setting $caching_enabled variable to true to enable file based caching. [jn: added error checking to loading 'cache.pl'; moved check for $caching_enabled outside out of cache_fetch, which required update to die_error()] Signed-off-by: Junio C Hamano <gitster@pobox.com> --- This patch is unchanged from previous version. Reminder: the main difference from J.H. patch was moving check for $caching_enabled out of cache_fetch, so that when caching is disabled then gitweb works (almost) the same as before this patch. Also, error checking for loading 'cache.pl' file. Interdiff in previous series. gitweb/Makefile | 3 + gitweb/gitweb.perl | 105 ++++++++++++-- gitweb/lib/cache.pl | 348 ...
From: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
This is a relatively large patch that implements the file based
caching layer that is quite similar to the one used on such large
sites as kernel.org and soon git.fedoraproject.org. This provides
a simple, and straight forward caching mechanism that scales
dramatically better than Gitweb by itself.
The caching layer basically buffers the output that Gitweb would
normally return, and saves that output to a cache file on the local
disk. When the file is requested it attempts to gain a shared lock
on the cache file and cat it out to the client. Should an exclusive
lock be on a file (it's being updated) the code has a choice to either
update in the background and go ahead and show the stale page while
update is being performed, or stall the client(s) until the page
is generated.
There are two forms of stalling involved here, background building
and non-background building, both of which are discussed in the
configuration page.
There are still a few known "issues" with respect to this:
- Code needs to be added to be "browser" aware so
that clients like wget that are trying to get a
binary blob don't obtain a "Generating..." page
Caching is disabled by default. You can turn it on by setting
$caching_enabled variable to true to enable file based caching.
[jn: added error checking to loading 'cache.pl'; moved check
for $caching_enabled outside out of cache_fetch, which required
update to die_error()]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
^^^^^^^^^^^^^^
I have made mistake with this line when moving $caching_enabled check
out of cache_fetch to its caller.
Reusing $fullhashpath variable as a *capture buffer* (it has nothing
[...]
gitweb/Makefile | 3 +
gitweb/gitweb.perl | 105 ++++++++++++--
gitweb/lib/cache.pl | 348 ...Errr... I meant that this abuse didn't help avoiding my mistake. -- Jakub Narebski Poland --
Wait a bit. This seems to match what I have already queued on 'pu', no? Am I hallucinating? --
Damn, it looks like my mailer included wrong (older) version of a file.
I'm sorry. Below there is interdiff:
diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
index dd14bfb..4360b3d 100644
--- a/gitweb/lib/cache.pl
+++ b/gitweb/lib/cache.pl
@@ -99,7 +99,7 @@ sub cacheUpdate {
$lockStatus = $lockStatBG;
}else{
- open(cacheFile, '>:utf8', \$fullhashpath);
+ open(cacheFile, '>:utf8', $fullhashpath);
my $lockStat = flock(cacheFile,LOCK_EX|LOCK_NB);
$lockStatus = $lockStat;
-- >8 --
From: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
This is a relatively large patch that implements the file based
caching layer that is quite similar to the one used on such large
sites as kernel.org and soon git.fedoraproject.org. This provides
a simple, and straight forward caching mechanism that scales
dramatically better than Gitweb by itself.
The caching layer basically buffers the output that Gitweb would
normally return, and saves that output to a cache file on the local
disk. When the file is requested it attempts to gain a shared lock
on the cache file and cat it out to the client. Should an exclusive
lock be on a file (it's being updated) the code has a choice to either
update in the background and go ahead and show the stale page while
update is being performed, or stall the client(s) until the page
is generated.
There are two forms of stalling involved here, background building
and non-background building, both of which are discussed in the
configuration page.
There are still a few known "issues" with respect to this:
- Code needs to be added to be "browser" aware so
that clients like wget that are trying to get a
binary blob don't obtain a "Generating..." page
Caching is disabled by default. You can turn it on by setting
$caching_enabled variable to true to enable file based caching.
[jn: added error checking to loading 'cache.pl'; moved check
for $caching_enabled outside out of cache_fetch, which ...Thanks. What I had was what I pulled from you. I take it that you want me to rebase the two tip commits on the branch? --
Yes, please. Though it still doesn't pass new tests in t9501 and t9502; the problem is more with setting up tests rather than with caching code, though. -- Jakub Narebski Poland --
Why do you use dynamically scoped file handles here as opposed to lexically scoped ones? And why do you change the output discipline on BINOUT immediately before closing the file, and after you print data to it? Doing so sortof makes sense when the filedhandle is STDOUT, but not when it is BINOUT. cheers, Yves -- perl -Mre=debug -e "/just|another|perl|hacker/" --
Also in this code: Why is that preferred to: require 'cache.pl'; And why is this thing even a .pl file? Why isnt it called lib/GitWeb/Cache.pm or something like that? -- perl -Mre=debug -e "/just|another|perl|hacker/" --
Because it is not a Perl module; in particular 'cache.pl' uses global variables from gitweb.perl (like $my_url, or $cachedir, or %action) and subroutines from gitweb.perl (like change_output() and reset_output()). That is why it needs to be injected via do, rather than included in its owne namespace with package/require. P.S. This is not my code, this is patch by J.H. (John Hawley); I did only *minimal* fixups. P.P.S. My rewrite can be found in 'gitweb/cache-kernel-pu' branch in my repository (links are to web interface) http://repo.or.cz/w/git/jnareb-git.git https://github.com/jnareb/git Sent in http://thread.gmane.org/gmane.comp.version-control.git/158313 -- Jakub Narebski Poland --
Add basic tests of caching support to t9500-gitweb-standalone-no-errors
test: set $caching_enabled to true and check for errors for first time
run (generating cache) and second time run (retrieving from cache) for a
single view - summary view for a project. Check also that request for
non-existent object (which results in die_error() codepath to be called)
doesn't produce errors.
Check in t9501-gitweb-standalone-http-status that request for
non-existent object produces correct output (HTTP headers and HTML
output) also when caching is enabled.
Check in the t9502-gitweb-standalone-parse-output test that gitweb
produces the same output with and without caching, for first and
second run, with binary (raw) or plain text (utf8) output.
The common routine that enables cache, gitweb_enable_caching, is
defined in t/gitweb-lib.sh
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
As I wrote in the cover letter, because of cacheWaitForUpdate() generating
extra output some of those newly introduced tests fail, some
intermittently.
John, could you take a look at it and check if the problem is: in tests,
in my simplification, or in caching code...
t/gitweb-lib.sh | 15 +++++++++++++
t/t9500-gitweb-standalone-no-errors.sh | 20 +++++++++++++++++
t/t9501-gitweb-standalone-http-status.sh | 13 +++++++++++
t/t9502-gitweb-standalone-parse-output.sh | 33 +++++++++++++++++++++++++++++
4 files changed, 81 insertions(+), 0 deletions(-)
diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index b9bb95f..16ce811 100644
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -52,6 +52,21 @@ EOF
export SCRIPT_NAME
}
+gitweb_enable_caching () {
+ test_expect_success 'enable caching' '
+ cat >>gitweb_config.perl <<-\EOF &&
+ our $caching_enabled = 1;
+ our $minCacheTime = 60*60*24*7*30; # very long expiration time for tests (a month)
+ our $maxCacheTime = 60*60*24*7*30*365; # upper bound for dynamic (adaptive) caching
+ our $cachedir = ...Ooops. Still, fixing this doesn't fix tests. I think I'd have to make another extra commit, adding configure knob to turn off cacheWaitForUpdate()... Or perhaps I did something wrong when simplifying, but I have checked diff between full and simplified version (in comments in v7.1 series), and it doesn't look like it... -- Jakub Narebski Poland --
From: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
This adds output buffering for gitweb, mainly in preparation for
caching support. This is a dramatic change to how caching was being
done, mainly in passing around the variable manually and such.
This centrally flips the entire STDOUT to a variable, which after the
completion of the run, flips it back and does a print on the resulting
data.
This should save on the previous 10K line patch (or so) that adds more
explicit output passing.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Commit message unchanged.
I have modified reset_output to silence the following warning from t9500:
gitweb.perl: Name "main::STDOUT_REAL" used only once: possible typo
at ../gitweb/gitweb.perl line 1130.
Here is the interdiff for gitweb.perl from 2e5c355 (J.H. patch in 'pu'):
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index b2b0a23..91e274f 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -32,7 +39,7 @@ BEGIN {
our $version = "++GIT_VERSION++";
# Output buffer variable
-$output = "";
+our $output = "";
our ($my_url, $my_uri, $base_url, $path_info, $home_link);
sub evaluate_uri {
@@ -1143,13 +1138,12 @@ sub change_output {
}
sub reset_output {
- # This basically takes STDOUT_REAL and puts it back as STDOUTl
- open(STDOUT,">&STDOUT_REAL");
+ # This basically takes STDOUT_REAL and puts it back as STDOUT
+ open STDOUT, ">&", \*STDOUT_REAL;
}
gitweb/gitweb.perl | 29 +++++++++++++++++++++++++++++
1 files changed, 29 insertions(+), 0 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index e4c08ba..91e274f 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -38,6 +38,9 @@ BEGIN {
our $version = "++GIT_VERSION++";
+# Output buffer variable
+our $output = "";
+
our ($my_url, $my_uri, $base_url, $path_info, $home_link);
sub evaluate_uri {
...Prepare gitweb for having been split into modules that are to be installed alongside gitweb in 'lib/' subdirectory, by adding use lib __DIR__.'/lib'; to gitweb.perl (to main gitweb script), and preparing for putting modules (relative path) in $(GITWEB_MODULES) in gitweb/Makefile. This preparatory work allows to add new module to gitweb by simply adding GITWEB_MODULES += <module> to gitweb/Makefile (assuming that the module is in 'gitweb/lib/' directory). Using __DIR__ from Dir::Self module (not in core, that's why currently gitweb includes excerpt of code from Dir::Self defining __DIR__) was chosen over using FindBin-based solution (in core since perl 5.00307, while gitweb itself requires at least perl 5.8.0) because FindBin uses BEGIN block, which is a problem under mod_perl and other persistent Perl environments (thought there are workarounds). At Pavan Kumar Sankara suggestion gitweb/Makefile uses install [OPTION]... SOURCE... DIRECTORY format (2nd format) with single SOURCE rather than install [OPTION]... SOURCE DEST format (1st format) because of security reasons (race conditions). Modern GNU install has `-T' / `--no-target-directory' option, but we cannot rely that the $(INSTALL) we are using supports this option. The install-modules target in gitweb/Makefile uses shell 'for' loop, instead of make's $(foreach) function, to avoid possible problem with generating a command line that exceeded the maximum argument list length. Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- This patch would conflict with the patch 958a846 (gitweb/Makefile: Add 'test' and 'test-installed' targets, 2010-09-26), which is in 'pu'. Resolving those conflicts is easy, though non-trivial. Please tell me if I should send this series rebased on top of 'jn/gitweb-test' (which includes 958a846) instead. This patch introduces infrastructure which would be required later for splitting gitweb. gitweb/Makefile | 17 ++++++++++++++--- gitweb/gitweb.perl ...
Prepare gitweb for having been split into modules that are to be installed alongside gitweb in 'lib/' subdirectory, by adding use lib __DIR__.'/lib'; to gitweb.perl (to main gitweb script), and preparing for putting modules (relative path) in $(GITWEB_MODULES) in gitweb/Makefile. This preparatory work allows to add new module to gitweb by simply adding GITWEB_MODULES += <module> to gitweb/Makefile (assuming that the module is in 'gitweb/lib/' directory). While at it pass GITWEBLIBDIR in addition to GITWEB_TEST_INSTALLED to test instaleed version of gitweb and installed version of modules (for tests which check individual (sub)modules). Using __DIR__ from Dir::Self module (not in core, that's why currently gitweb includes excerpt of code from Dir::Self defining __DIR__) was chosen over using FindBin-based solution (in core since perl 5.00307, while gitweb itself requires at least perl 5.8.0) because FindBin uses BEGIN block, which is a problem under mod_perl and other persistent Perl environments (thought there are workarounds). At Pavan Kumar Sankara suggestion gitweb/Makefile uses install [OPTION]... SOURCE... DIRECTORY format (2nd format) with single SOURCE rather than install [OPTION]... SOURCE DEST format (1st format) because of security reasons (race conditions). Modern GNU install has `-T' / `--no-target-directory' option, but we cannot rely that the $(INSTALL) we are using supports this option. The install-modules target in gitweb/Makefile uses shell 'for' loop, instead of make's $(foreach) function, to avoid possible problem with generating a command line that exceeded the maximum argument list length. Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- This version requires and applies cleanly on top of patch 958a846 (gitweb/Makefile: Add 'test' and 'test-installed' targets, 2010-09-26) in 'pu'. gitweb/Makefile | 17 +++++++++++++++-- gitweb/gitweb.perl | 8 ++++++++ 2 files changed, 23 insertions(+), 2 ...
From: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
This is a relatively large patch that implements the file based
caching layer that is quite similar to the one used on such large
sites as kernel.org and soon git.fedoraproject.org. This provides
a simple, and straight forward caching mechanism that scales
dramatically better than Gitweb by itself.
The caching layer basically buffers the output that Gitweb would
normally return, and saves that output to a cache file on the local
disk. When the file is requested it attempts to gain a shared lock
on the cache file and cat it out to the client. Should an exclusive
lock be on a file (it's being updated) the code has a choice to either
update in the background and go ahead and show the stale page while
update is being performed, or stall the client(s) until the page
is generated.
There are two forms of stalling involved here, background building
and non-background building, both of which are discussed in the
configuration page.
There are still a few known "issues" with respect to this:
- Code needs to be added to be "browser" aware so
that clients like wget that are trying to get a
binary blob don't obtain a "Generating..." page
Caching is disabled by default. You can turn it on by setting
$caching_enabled variable to true to enable file based caching.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This is *minimal* fixup of J.H. patch, addressing only major concerns
of Junio C Hamano (JH) and me (JN).
JH> Jakub, can we have an absolute minimum fix-up, so that we can give
JH> this wider exposure? I think there are only
JH> four issues:
JH>
JH> (1) exclude Ajax-y stuff from caching
Done, via "if ($caching_enabled && is_cacheable($action) { ... }".
Alternate solution would be to degrade Ajax-y actions to their
non-Ajax-y versions (i.e. 'blame_incremental' to just 'blame') when
caching is enabled.
JH> (2) install cache.pm the same way ...Add basic tests of caching support to t9500-gitweb-standalone-no-errors test: set $caching_enabled to true and check for errors for first time run (generating cache) and second time run (retrieving from cache) for a single view - summary view for a project. Check in the t9502-gitweb-standalone-parse-output test that gitweb produces the same output with and without caching, for first and second run, with binary or text output. Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- All tests, including those added here, passes. t/t9500-gitweb-standalone-no-errors.sh | 19 ++++++++++++++ t/t9502-gitweb-standalone-parse-output.sh | 39 +++++++++++++++++++++++++++++ 2 files changed, 58 insertions(+), 0 deletions(-) diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh index 4f2b9b0..0ad5fc8 100755 --- a/t/t9500-gitweb-standalone-no-errors.sh +++ b/t/t9500-gitweb-standalone-no-errors.sh @@ -676,4 +676,23 @@ test_expect_success HIGHLIGHT \ gitweb_run "p=.git;a=blob;f=test.sh"' test_debug 'cat gitweb.log' +# ---------------------------------------------------------------------- +# caching + +cat >>gitweb_config.perl <<\EOF +$caching_enabled = 1; +$cachedir = 'cache'; # to clear right thing +EOF +rm -rf cache + +test_expect_success \ + 'caching enabled (project summary, first run, generating cache)' \ + 'gitweb_run "p=.git;a=summary"' +test_debug 'cat gitweb.log' + +test_expect_success \ + 'caching enabled (project summary, second run, cached version)' \ + 'gitweb_run "p=.git;a=summary"' +test_debug 'cat gitweb.log' + test_done diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh index dd83890..f3d0cf0 100755 --- a/t/t9502-gitweb-standalone-parse-output.sh +++ b/t/t9502-gitweb-standalone-parse-output.sh @@ -112,4 +112,43 @@ test_expect_success 'snapshot: hierarchical branch name (xx/test)' ' ' test_debug 'cat gitweb.headers' + +# ...
