[PATCHv7.2 4/4] gitweb: Minimal testing of gitweb caching

Previous thread: [PATCH] completion: fix zsh check under bash with 'set -u' by Mark Lodato on Wednesday, October 27, 2010 - 6:08 pm. (2 messages)

Next thread: Undo git svn reset? by Marat Radchenko on Wednesday, October 27, 2010 - 11:20 pm. (1 message)
From: John 'Warthog9' Hawley
Date: Wednesday, October 27, 2010 - 5:42 pm

Evening everyone,
 
This is the latest incarnation of gitweb w/ caching.  Per the general
consensus and requests from the recent Gittogether I'm re-submitting 
my patches.

Biggest changes are bringing it back in line with Junio's tree, couple
of locking fixes (specifically targeted at binary files), and at the 
nagging behest of Sverre I've cut the old patch down to a much more
manageable (and in his words "reviewable") series.

This still differs, slightly, from whats in production on kernel.org.
It's missing the index page git:// link, and kernel.org is still running
the broken out output buffering but I'm likely to flip over to this just
so I'm not maintaining quite so much (and the changes are easier to track
with upstream)

v7:
	- Rework output system, now central STDOUT redirect
	- Various fixes to caching brought in from existing
	  running system

v6:
	- Never saw the light of day
	- Various testing, and reworks.

v5:
	- Missed a couple of things that were in my local tree, and
	  added them back in.
	- Split up the die_error and the version matching patch
	- Set version matching to be on by default - otherwise this
	  really is code that will never get checked, or at best
	  enabled by default by distributions
	- Added a minor code cleanup with respect to $site_header
	  that was already in my tree
	- Applied against a more recent git tree vs. 1.6.6-rc2
	- Removed breakout patch for now (did that in v4 actually)
	  and will deal with that separately 

	http://git.kernel.org/?p=git/warthog9/gitweb.git;a=shortlog;h=refs/heads/gitweb-ml-v5

v4:
	- major re-working of the caching layer to use file handle
	  redirection instead of buffering output
	- other minor improvements

	http://git.kernel.org/?p=git/warthog9/gitweb.git;a=shortlog;h=refs/heads/gitweb-ml-v4
v3:
	- various minor re-works based on mailing list feedback,
	  this series was not sent to the mailing list.
v2:
	- Better breakout
	- You can actually disable the cache now

- John ...
From: John 'Warthog9' Hawley
Date: Wednesday, October 27, 2010 - 5:42 pm

This adds output buffering for gitweb, mainly in preparation for
caching support.  This is a dramatic change to how caching was being
done, mainly in passing around the variable manually and such.

This centrally flips the entire STDOUT to a variable, which after the
completion of the run, flips it back and does a print on the resulting
data.

This should save on the previous 10K line patch (or so) that adds more
explicit output passing.
---
 gitweb/gitweb.perl |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

From: Jakub Narebski
Date: Thursday, October 28, 2010 - 2:56 am

Nice solution.  I'll steal it for GitwebCache::Capture::Redirect (or

Doesn't this enable capture unconditionally?  

Wouldn't that screw-up the blame_data part of blame_interactive Ajax-y


-- 
Jakub Narebski
Poland
ShadeHawk on #git
--

From: John 'Warthog9' Hawley
Date: Wednesday, October 27, 2010 - 5:42 pm

This adds $git_versions_must_match variable, which is set to true,
checks that we are running on the same version of git that we
shipped with, and if not throw '500 Internal Server Error' error.
What is checked is the version of gitweb (embedded in building
gitweb.cgi), against version of runtime git binary used.

Gitweb can usually run with a mismatched git install.  This is more
here to give an obvious warning as to whats going on vs. silently
failing.

By default this feature is turned on.

Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/README      |    4 ++++
 gitweb/gitweb.perl |   27 +++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 0 deletions(-)

From: Jakub Narebski
Date: Thursday, October 28, 2010 - 2:52 am

Were you bitten by mismatch between git version and gitweb version?
Just how serious is that (hypothetical?) situation?


I think last time this patch stalled on discussion whether this
feature should be turned on by default, where it can break some
setups; or be turned off by default, where it is much less useful

For me, if $git_versions_must_match is to be on by default, I would
prefer also update to t/gitweb-lib.sh, so hat I don't have to


Shouldn't it be before check_loadavg()?

-- 
Jakub Narebski
Poland
ShadeHawk on #git
--

From: Junio C Hamano
Date: Thursday, October 28, 2010 - 3:08 pm

I do not understand why this variable needs to be turned _on_ by default,
but more importantly, isn't this more or less independent from what we
discussed at GitTogether, which is to get your battle-tested caching layer
that will tremendously help any nontrivial site merged earlier rather than
later by reducing load average from 1000 to 3-or-4?



--

From: John 'Warthog9' Hawley
Date: Wednesday, October 27, 2010 - 5:42 pm

This is a relatively large patch that implements the file based
caching layer that is quite similar to the one used  on such large
sites as kernel.org and soon git.fedoraproject.org.  This provides
a simple, and straight forward caching mechanism that scales
dramatically better than Gitweb by itself.

The caching layer basically buffers the output that Gitweb would
normally return, and saves that output to a cache file on the local
disk.  When the file is requested it attempts to gain a shared lock
on the cache file and cat it out to the client.  Should an exclusive
lock be on a file (it's being updated) the code has a choice to either
update in the background and go ahead and show the stale page while
update is being performed, or stall the client(s) until the page
is generated.

There are two forms of stalling involved here, background building
and non-background building, both of which are discussed in the
configuration page.

There are still a few known "issues" with respect to this:
- Code needs to be added to be "browser" aware so
  that clients like wget that are trying to get a
  binary blob don't obtain a "Generating..." page

Caching is disabled by default with the $cache_enable variable,
setting this to 1 will enable file based caching.  It is expected
that this will be extended to include additional types of caching
(like memcached) in the future and should not be exclusively
considered a binary value.
---
 gitweb/cache.pm          |  365 ++++++++++++++++++++++++++++++++++++++++++++++
 gitweb/gitweb.perl       |   99 +++++++++++--
 gitweb/static/gitweb.css |    6 +
 3 files changed, 458 insertions(+), 12 deletions(-)
 create mode 100644 gitweb/cache.pm

From: Jakub Narebski
Date: Thursday, October 28, 2010 - 9:10 am

I am not commenting on minor style issues, like using camelCase names
of variables and subroutines, of 'if( ... )' instead of 'if (...)'
etc. minor issues.



"Quite similar" - what does it mean?  In what way it is different from

O.K.  This means caching HTTP response (caching output of CGI script),
using "simple" file based caching engine, without any serialization.

But this doesn't tell us how the cache expiration is handled, i.e. how

In my rewrite of earlier version of your gitweb caching patch it is
responsibility of caching engine (GitwebCache::FileCacheWithLocking)
to serialize (via locking) (re)generating cache entry.  Gitweb just
passes page key, function that returns captured output to cache
instance; it passes callback for "staling" the client to cache
constructor.

It is not the "file" that is being requested: on the gitweb level it
is a page / action / view that is being requested, on the level of
caching negine it is cache entry that is requested... only at the low

Right.  This is also missing from my rewrite.

Sidenote: why "binary blob"?  Most downloading utilities do not follow

Why not do the same as in my rewrite, namely $cache_enabled or
$caching_enabled *boolean* value (not "binary"), and $cache variable

It would be nice to have at least some minimal testing for caching
enabled gitweb.  Hmmm... I should have split update to t9500-* test
so you would be able to cherry-pick it.

Also, it would be good idea to have 'make install-gitweb' to install
'cache.pm' alongside gitweb, especially that you do not do any error

Wouldn't it be better to use real Perl module, i.e. start with


Because [stable] Perl earlier than 5.10.0 has pre-2.00 version of
File::Path in core, which didn't have make_path and remove_tree.

At least that is compatibility with pre-2.00 File::Path should be
mentioned in this comment, if we are to mention it at all.





Wouldn't it be better to use boolean 'if (!$caching_enabled) {' rather
than explicitely test for ...
From: Junio C Hamano
Date: Thursday, October 28, 2010 - 3:29 pm

Yeah, when I saw your print-to-variable-assignment that was one of the
things that came to my mind.  Sounds like a reasonable thing to do.
--

From: Junio C Hamano
Date: Friday, October 29, 2010 - 3:25 pm

I am getting this in the gitweb.log:

    [Fri Oct 29 22:21:12 2010] gitweb.perl: Undefined subroutine &main::cache_fetch called at .../t/../gitweb/gitweb.perl line 1124.

which seems to cause t9500 to fail.
--

From: Jakub Narebski
Date: Saturday, October 30, 2010 - 1:58 am

This is caused by three issues (bugs) in v7 caching code.

First is the reason why t9500 exhibits this bug.  The gitweb caching
v7 includes file with subroutines related to caching in the following
way:

  do 'cache.pm';

Note that the relative path is used; for t9500 it is relative from
somewhere witjin 't/', and not from 'gitweb/', so "do 'cache.pm';"
doesn't find it.

In my earlier rewrites I used

  do $cache_pm;

and 't/gitweb-lib.sh' set $cache_pm appriopriately.


Second is why this bug is bad.  There is no error checking that 
"do 'cache.pm';" succeeded.  It should be

  do $cache_pm;
  die $@ if $@;

at least.  Perhaps even better would be to simply turn off caching
support if there is an error in 'cache.pm' (which probably should be
called 'cache.pl', as it is not proper Perl module)... though on the
other hand side it would could hide the fact that caching is not
working.


Third is why this matters.  In v7 the cache_fetch() subroutine,
defined in 'cache.pm', is run *unconditionally*, and has test if the
caching is enabled *inside it*, instead of having gitweb (caller) use
it only if caching is disabled.

This coupled with the fact that 'make install-gitweb' would *not*
install 'cache.pm' alongside gitweb.cgi means that anybody upgrading
gitweb, whether he/she wants caching or not, would have gitweb stop
working after upgrade... unless he/she knows that he/she has to copy
'cache.pm' file manually.

On the other hand having test first if caching is enabled would make
t9500 tests pass (because they do not even test minimally cache
enabled / cache disabled cases, like in my rewrite), but would hide
problem when caching is enabled and 'cache.pm' is not installed...
(perhaps also in persistent environments; I don't know where pwd is
then).

-- 
Jakub Narebski
Poland
ShadeHawk on #git
--

From: Junio C Hamano
Date: Saturday, October 30, 2010 - 9:24 pm

John, where should cache.pm go in the installed system?  Does it typically
go next to gitweb.perl?

I think "do 'that-file'" honors path specified by the -I option, so I do
not think "do $cache_pm" is necessary.  My preference is to run gitweb


That can come later.

Jakub, can we have an absolute minimum fix-up, so that we can give
this wider exposure?  I think there are only
four issues:

 (1) exclude Ajax-y stuff from caching;
 (2) install cache.pm the same way gitweb.perl is installed via
     the Makefile;
 (3) running tests with appropriate -I so that cache.pm is found; and
 (4) die if 'cache.pm' cannot be "done").

I think the change in gitweb-cache v7 is small and safe enough that we
should fast-track it to give usability to the real world sites.  It may be
a low-risk "obviously correct" approach that is quick-and-dirty, but that
is exactly why this should be fast-tracked.  It does not touch the logic
or formatting in any way, just bypasses the page generation altogether
when it can clearly do so when it can tell the output cannot possibly be
incorrect (albeit sometimes it might be stale if in certain cases, e.g. it
is relative to HEAD).

I know you and others were aiming to split things up, but I think the
amount of the effort that is needed for that line of work on top of the
current codebase is not much different from what is needed on top of
gitweb-cache v7.
--

From: Jakub Narebski
Date: Sunday, October 31, 2010 - 2:21 am

From `perldoc -f do`

      do 'stat.pl';

   is just like

      eval 'cat stat.pl';

   except that it's more efficient and concise, keeps track of the current
   filename for error messages, searches the @INC libraries, and updates %INC
   if the file is found.        ^^^^^^^^^^^^^^^^^^^^^^^^^^^

So I think it respects '-I<directory>' given to perl interpreter.

On the other hand I don't know how it would work with mod_perl (uwing
ModPerl::Registry handler), whether it wouldn't too require extra
configuration.

So I think a better solution would be to base 'Gitweb caching v7' plus
necessary fixes and improvements on top of 

  gitweb: Prepare for splitting gitweb

This means that the directory that gitweb is in would be added to @INC
via

  use lib __DIR__.'/lib';

The 'cache.pm' or 'cache.pl' file would be moved to 'gitweb/lib', but


Easy, but only if check whether to do capturing and caching is moved
out of cache_fetch to caller, i.e. to gitweb script.  See also comments
below about what need and should be done.

Another solution is to turn off Ajax-y stuff when caching is enabled.
It can be done quite easily, just sprinkle some !$caching_enabled
(see comment below about naming and semantic of this config variable)

With "gitweb: Prepare for splitting gitweb" as introductory patch this
would be just adding

  # gitweb output caching
  GITWEB_MODULES += cache.pm

(or cache.pl - it is not a proper Perl package!) e.g. above GITWEB_REPLACE

No change to test necessary if we use "use lib __DIR__.'/lib';" in
gitweb.perl

I'd like to port from my rewrite change to t/t9500-gitweb-standalone-no-errors.sh
adding minimal test to check if running gitweb with caching enabled
doesn't generate any warnings, and (modified) change to the
t/t9502-gitweb-standalone-parse-output.sh, adding minimal test that

O.K. (4) is one liner.

There is are also other issues

   (5) naming and semantic of gitweb config variables configuring caching;
       at least change ...
From: Jakub Narebski
Date: Monday, November 1, 2010 - 3:24 am

This series is a bit fixed up and a tiny bit cleaned up version of
"Gitweb caching v7" series from John 'Warthog9' Hawley.

This series is based on top of 8ff76f4 (gitweb: Move call to
evaluate_git_version after evaluate_gitweb_config, 2010-09-26)
currently in 'pu'.  This commit fixes a bug in gitweb which shows in
running testsuite.

The patch "gitweb: Add option to force version match" was removed from
this series as it doesn't belong to it, c.f. 
  http://article.gmane.org/gmane.comp.version-control.git/160236

 [PATCHv7.1 1/4] gitweb: Prepare for splitting gitweb
 [PATCHv7.1 2/4] gitweb: add output buffering and associated functions
 [PATCHv7.1 3/4] gitweb: File based caching layer (from git.kernel.org)
 [PATCHv7.1 4/4] gitweb: Minimal testing of gitweb caching

Jakub Narebski (2):
  gitweb: Prepare for splitting gitweb
  gitweb: Minimal testing of gitweb caching

John 'Warthog9' Hawley (2):
  gitweb: add output buffering and associated functions
  gitweb: File based caching layer (from git.kernel.org)

 gitweb/Makefile                           |   20 ++-
 gitweb/gitweb.perl                        |  134 +++++++++++-
 gitweb/lib/cache.pl                       |  348 +++++++++++++++++++++++++++++
 gitweb/static/gitweb.css                  |    6 +
 t/t9500-gitweb-standalone-no-errors.sh    |   19 ++
 t/t9502-gitweb-standalone-parse-output.sh |   39 ++++
 6 files changed, 555 insertions(+), 11 deletions(-)
 create mode 100644 gitweb/lib/cache.pl

-- 
1.7.3

--

From: Jakub Narebski
Date: Friday, November 12, 2010 - 4:35 pm

This series is a bit fixed up and a tiny bit cleaned up version of
"Gitweb caching v7" series from John 'Warthog9' Hawley:
  http://thread.gmane.org/gmane.comp.version-control.git/160147

This series is based on top of 'next', because it contains 
'jn/gitweb-test' branch.

The difference from v7.2 is that it takes into account 'test-installed'
target in gitweb/Makefile in first patch of its series, and that testing
of caching support is slightly extended.  Note that some of those tests
fail currently, not because of error in gitweb caching code, but because
I was not able to disable calling cacheWaitForUpdate(), which hinders
testing.

Those differences were also described in "Re: What's cooking in git.git
(Nov 2010, #01; Tue, 9)":
  http://article.gmane.org/gmane.comp.version-control.git/161309


Table of contents:
==================
 [PATCHv7.2 1/4] gitweb: Prepare for splitting gitweb
 [PATCHv7.1 2/4] gitweb: add output buffering and associated functions
 [PATCHv7.1 3/4] gitweb: File based caching layer (from git.kernel.org)
 [PATCHv7.2 4/4] gitweb: Minimal testing of gitweb caching

Shortlog:
=========
Jakub Narebski (2):
  gitweb: Prepare for splitting gitweb
  gitweb: Minimal testing of gitweb caching

John 'Warthog9' Hawley (2):
  gitweb: add output buffering and associated functions
  gitweb: File based caching layer (from git.kernel.org)

Diffstat:
=========
 gitweb/Makefile                           |   20 ++-
 gitweb/gitweb.perl                        |  134 +++++++++++-
 gitweb/lib/cache.pl                       |  348 +++++++++++++++++++++++++++++
 gitweb/static/gitweb.css                  |    6 +
 t/gitweb-lib.sh                           |   15 ++
 t/t9500-gitweb-standalone-no-errors.sh    |   20 ++
 t/t9501-gitweb-standalone-http-status.sh  |   13 +
 t/t9502-gitweb-standalone-parse-output.sh |   33 +++
 8 files changed, 579 insertions(+), 10 deletions(-)
 create mode 100644 gitweb/lib/cache.pl

-- 
1.7.3

--

From: Jakub Narebski
Date: Friday, November 12, 2010 - 4:41 pm

Prepare gitweb for having been split into modules that are to be
installed alongside gitweb in 'lib/' subdirectory, by adding

  use lib __DIR__.'/lib';

to gitweb.perl (to main gitweb script), and preparing for putting
modules (relative path) in $(GITWEB_MODULES) in gitweb/Makefile.

This preparatory work allows to add new module to gitweb by simply
adding

  GITWEB_MODULES += <module>

to gitweb/Makefile (assuming that the module is in 'gitweb/lib/'
directory).

While at it pass GITWEBLIBDIR in addition to GITWEB_TEST_INSTALLED
to test instaleed version of gitweb and installed version of modules
(for tests which check individual (sub)modules).


Using __DIR__ from Dir::Self module (not in core, that's why currently
gitweb includes excerpt of code from Dir::Self defining __DIR__) was
chosen over using FindBin-based solution (in core since perl 5.00307,
while gitweb itself requires at least perl 5.8.0) because FindBin uses
BEGIN block, which is a problem under mod_perl and other persistent
Perl environments (thought there are workarounds).

At Pavan Kumar Sankara suggestion gitweb/Makefile uses

  install [OPTION]... SOURCE... DIRECTORY

format (2nd format) with single SOURCE rather than

  install [OPTION]... SOURCE DEST

format (1st format) because of security reasons (race conditions).
Modern GNU install has `-T' / `--no-target-directory' option, but we
cannot rely that the $(INSTALL) we are using supports this option.

The install-modules target in gitweb/Makefile uses shell 'for' loop,
instead of make's $(foreach) function, to avoid possible problem with
generating a command line that exceeded the maximum argument list
length.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This version is rebased on top of 'next', which means that it takes
'jn/gitweb-test' into account, in particular 958a846 (gitweb/Makefile:
Add 'test' and 'test-installed' targets, 2010-09-26).

That means adding GITWEBLIBDIR for future testing of individual
modules (it was not ...
From: Junio C Hamano
Date: Wednesday, November 17, 2010 - 4:10 pm

You confused me.  "git grep" after test-applying this 4-patch series on
top of next shows only one GITWEBLIBDIR, which means that the symbol is

In the earlier series, 478da52 (gitweb: Prepare for splitting gitweb,
2010-10-30) removed the trailing blank line from here but this does not do
so anymore, which is a micronit regression.
--

From: Jakub Narebski
Date: Thursday, November 18, 2010 - 6:37 am

That's why I wrote "for _future_ testing"... but perhaps it is not worth

I was not sure if such "while at it" style improvements are encouraged
or not.  Will remember it for v7.3.

-- 
Jakub Narebski
Poland
--

From: Jakub Narebski
Date: Thursday, December 2, 2010 - 3:17 am

Prepare gitweb for having been split into modules that are to be
installed alongside gitweb in 'lib/' subdirectory, by adding

  use lib __DIR__.'/lib';

to gitweb.perl (to main gitweb script), and preparing for putting
modules (relative path) in $(GITWEB_MODULES) in gitweb/Makefile.

This preparatory work allows to add new module to gitweb by simply
adding

  GITWEB_MODULES += <module>

to gitweb/Makefile (assuming that the module is in 'gitweb/lib/'
directory).

While at it pass GITWEBLIBDIR in addition to GITWEB_TEST_INSTALLED to
allow testing installed version of gitweb and installed version of
modules (for future tests which would check individual (sub)modules).


Using __DIR__ from Dir::Self module (not in core, that's why currently
gitweb includes excerpt of code from Dir::Self defining __DIR__) was
chosen over using FindBin-based solution (in core since perl 5.00307,
while gitweb itself requires at least perl 5.8.0) because FindBin uses
BEGIN block, which is a problem under mod_perl and other persistent
Perl environments (thought there are workarounds).

At Pavan Kumar Sankara suggestion gitweb/Makefile uses

  install [OPTION]... SOURCE... DIRECTORY

format (2nd format) with single SOURCE rather than

  install [OPTION]... SOURCE DEST

format (1st format) because of security reasons (race conditions).
Modern GNU install has `-T' / `--no-target-directory' option, but we
cannot rely that the $(INSTALL) we are using supports this option.

The install-modules target in gitweb/Makefile uses shell 'for' loop,
instead of make's $(foreach) function, to avoid possible problem with
generating a command line that exceeded the maximum argument list
length.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
BUGFIX: shell variables should not be in single quotes

I am very sorry about this bug...

 gitweb/Makefile    |   18 +++++++++++++++---
 gitweb/gitweb.perl |    8 ++++++++
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/gitweb/Makefile ...
From: Junio C Hamano
Date: Thursday, December 2, 2010 - 10:37 am

Hmm, how did you find the issue, and more importantly, how did I or other
people who saw this patch so easily fail to notice it?

FWIW, I do run "make install" from the toplevel as part of my pre-push
test, so I _should_ have noticed it.

Ah, I don't run the install step for a revision that does not pass its
selftest, so I haven't run "make install" on 'pu' for some time.  That may
explain it.

Anyway thanks for a fixup.
--

From: Jakub Narebski
Date: Thursday, December 2, 2010 - 12:01 pm

When working on my total rewrite of J.H. gitweb caching series, available
as 'gitweb/cache-kernel-pu' branch in git://repo.or.cz/git/jnareb-git.git
and git://github.com/jnareb/git.git repositories I finally did a *clean*
reinstal (i.e. remove whole directory, then run "make install-gitweb").
This branch uses the same "gitweb: Prepare for splitting gitweb" patch

Hmmm... I thought that "make install" doesn't install gitweb, but it does
with "$(MAKE) -C gitweb install"... though I am not sure if "make all"

Sorry for the bug.

-- 
Jakub Narebski
Poland
--

From: Junio C Hamano
Date: Thursday, December 2, 2010 - 12:21 pm

I think it does, and it should if it doesn't.
--

From: Jakub Narebski
Date: Thursday, December 2, 2010 - 12:36 pm

Anyway "install" target in gitweb/Makefile runs "all" target in
gitweb/Makefile, so 
   "make install" -> "make -C gitweb install" -> "make -C gitweb all"
   -> gitweb/gitweb.cgi is generated

gitweb/gitweb.cgi is in $(OTHER_PROGRAMS), and we have

  all:: [...] $(OTHER_PROGRAMS) GIT-BUILD-OPTIONS


Hmmm... shouldn't it be 'gitweb', not 'gitweb/gitweb.cgi'?  Just wondering
(we have gitweb/gitweb.cgi target in main Makefile, which just proxies
to gitweb/Makefile).

-- 
Jakub Narebski
Poland
--

From: Jakub Narebski
Date: Friday, November 12, 2010 - 4:44 pm

From: John 'Warthog9' Hawley <warthog9@eaglescrag.net>

This adds output buffering for gitweb, mainly in preparation for
caching support.  This is a dramatic change to how caching was being
done, mainly in passing around the variable manually and such.

This centrally flips the entire STDOUT to a variable, which after the
completion of the run, flips it back and does a print on the resulting
data.

This should save on the previous 10K line patch (or so) that adds more
explicit output passing.

[jn: modified reset_output to silence 'gitweb.perl: Name "main::STDOUT_REAL"
 used only once: possible typo at ../gitweb/gitweb.perl line 1130.' warning]

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
This is unchanged from previous version.


Reminder: it uses

  open STDOUT, ">&", \*STDOUT_REAL;

rather than

  open(STDOUT,">&STDOUT_REAL");

to silence spurious (invalid) warning

  gitweb.perl: Name "main::STDOUT_REAL" used only once: possible typo
  at ../gitweb/gitweb.perl line 1130.


 gitweb/gitweb.perl |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index cfa511c..cae0e34 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -39,6 +39,9 @@ BEGIN {
 
 our $version = "++GIT_VERSION++";
 
+# Output buffer variable
+our $output = "";
+
 our ($my_url, $my_uri, $base_url, $path_info, $home_link);
 sub evaluate_uri {
 	our $cgi;
@@ -1134,6 +1137,25 @@ sub evaluate_argv {
 	);
 }
 
+sub change_output {
+	our $output;
+
+	# Trap the 'proper' STDOUT to STDOUT_REAL for things like error messages and such
+	open(STDOUT_REAL,">&STDOUT") or die "Unable to capture STDOUT $!\n";
+
+	# Close STDOUT, so that it isn't being used anymore.
+	close STDOUT;
+
+	# Trap STDOUT to the $output variable, which is what I was using in the original
+	# patch anyway.
+	open(STDOUT,">", \$output) || die "Unable to open STDOUT: $!"; #open STDOUT handle to use ...
From: Jakub Narebski
Date: Friday, November 12, 2010 - 4:56 pm

From: John 'Warthog9' Hawley <warthog9@eaglescrag.net>

This is a relatively large patch that implements the file based
caching layer that is quite similar to the one used  on such large
sites as kernel.org and soon git.fedoraproject.org.  This provides
a simple, and straight forward caching mechanism that scales
dramatically better than Gitweb by itself.

The caching layer basically buffers the output that Gitweb would
normally return, and saves that output to a cache file on the local
disk.  When the file is requested it attempts to gain a shared lock
on the cache file and cat it out to the client.  Should an exclusive
lock be on a file (it's being updated) the code has a choice to either
update in the background and go ahead and show the stale page while
update is being performed, or stall the client(s) until the page
is generated.

There are two forms of stalling involved here, background building
and non-background building, both of which are discussed in the
configuration page.

There are still a few known "issues" with respect to this:
- Code needs to be added to be "browser" aware so
  that clients like wget that are trying to get a
  binary blob don't obtain a "Generating..." page

Caching is disabled by default.  You can turn it on by setting
$caching_enabled variable to true to enable file based caching.

[jn: added error checking to loading 'cache.pl'; moved check
 for $caching_enabled outside out of cache_fetch, which required
 update to die_error()]

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
This patch is unchanged from previous version.

Reminder: the main difference from J.H. patch was moving check for
$caching_enabled out of cache_fetch, so that when caching is disabled
then gitweb works (almost) the same as before this patch.

Also, error checking for loading 'cache.pl' file.

Interdiff in previous series.

 gitweb/Makefile          |    3 +
 gitweb/gitweb.perl       |  105 ++++++++++++--
 gitweb/lib/cache.pl      |  348 ...
From: Jakub Narebski
Date: Sunday, November 28, 2010 - 4:22 am

From: John 'Warthog9' Hawley <warthog9@eaglescrag.net>

This is a relatively large patch that implements the file based
caching layer that is quite similar to the one used  on such large
sites as kernel.org and soon git.fedoraproject.org.  This provides
a simple, and straight forward caching mechanism that scales
dramatically better than Gitweb by itself.

The caching layer basically buffers the output that Gitweb would
normally return, and saves that output to a cache file on the local
disk.  When the file is requested it attempts to gain a shared lock
on the cache file and cat it out to the client.  Should an exclusive
lock be on a file (it's being updated) the code has a choice to either
update in the background and go ahead and show the stale page while
update is being performed, or stall the client(s) until the page
is generated.

There are two forms of stalling involved here, background building
and non-background building, both of which are discussed in the
configuration page.

There are still a few known "issues" with respect to this:
- Code needs to be added to be "browser" aware so
  that clients like wget that are trying to get a
  binary blob don't obtain a "Generating..." page

Caching is disabled by default.  You can turn it on by setting
$caching_enabled variable to true to enable file based caching.

[jn: added error checking to loading 'cache.pl'; moved check
 for $caching_enabled outside out of cache_fetch, which required
 update to die_error()]

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---

                                            ^^^^^^^^^^^^^^

I have made mistake with this line when moving $caching_enabled check
out of cache_fetch to its caller.

Reusing $fullhashpath variable as a *capture buffer* (it has nothing

[...]

 gitweb/Makefile          |    3 +
 gitweb/gitweb.perl       |  105 ++++++++++++--
 gitweb/lib/cache.pl      |  348 ...
From: Jakub Narebski
Date: Sunday, November 28, 2010 - 4:31 am

Errr... I meant that this abuse didn't help avoiding my mistake.

-- 
Jakub Narebski
Poland
--

From: Junio C Hamano
Date: Monday, November 29, 2010 - 3:13 pm

Thanks.
--

From: Junio C Hamano
Date: Monday, November 29, 2010 - 3:20 pm

Wait a bit.

This seems to match what I have already queued on 'pu', no?  Am I
hallucinating?
--

From: Jakub Narebski
Date: Monday, November 29, 2010 - 4:09 pm

Damn, it looks like my mailer included wrong (older) version of a file.
I'm sorry.  Below there is interdiff:

  diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
  index dd14bfb..4360b3d 100644
  --- a/gitweb/lib/cache.pl
  +++ b/gitweb/lib/cache.pl
  @@ -99,7 +99,7 @@ sub cacheUpdate {
   
   		$lockStatus = $lockStatBG;
   	}else{
  -		open(cacheFile, '>:utf8', \$fullhashpath);
  +		open(cacheFile, '>:utf8', $fullhashpath);
   		my $lockStat = flock(cacheFile,LOCK_EX|LOCK_NB);
   
   		$lockStatus = $lockStat;

-- >8 --
From: John 'Warthog9' Hawley <warthog9@eaglescrag.net>

This is a relatively large patch that implements the file based
caching layer that is quite similar to the one used  on such large
sites as kernel.org and soon git.fedoraproject.org.  This provides
a simple, and straight forward caching mechanism that scales
dramatically better than Gitweb by itself.

The caching layer basically buffers the output that Gitweb would
normally return, and saves that output to a cache file on the local
disk.  When the file is requested it attempts to gain a shared lock
on the cache file and cat it out to the client.  Should an exclusive
lock be on a file (it's being updated) the code has a choice to either
update in the background and go ahead and show the stale page while
update is being performed, or stall the client(s) until the page
is generated.

There are two forms of stalling involved here, background building
and non-background building, both of which are discussed in the
configuration page.

There are still a few known "issues" with respect to this:
- Code needs to be added to be "browser" aware so
  that clients like wget that are trying to get a
  binary blob don't obtain a "Generating..." page

Caching is disabled by default.  You can turn it on by setting
$caching_enabled variable to true to enable file based caching.

[jn: added error checking to loading 'cache.pl'; moved check
 for $caching_enabled outside out of cache_fetch, which ...
From: Junio C Hamano
Date: Monday, November 29, 2010 - 5:51 pm

Thanks.  What I had was what I pulled from you.  I take it that you want
me to rebase the two tip commits on the branch?

--

From: Jakub Narebski
Date: Tuesday, November 30, 2010 - 3:21 am

Yes, please.

Though it still doesn't pass new tests in t9501 and t9502; the problem
is more with setting up tests rather than with caching code, though.

-- 
Jakub Narebski
Poland
--

From: demerphq
Date: Monday, November 29, 2010 - 4:07 pm

Why do you use dynamically scoped file handles here as opposed to
lexically scoped ones?

And why do you change the output discipline on BINOUT immediately
before closing the file, and after you print data to it?

Doing so sortof makes sense when the filedhandle is STDOUT, but not
when it is BINOUT.

cheers,
Yves



-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
--

From: demerphq
Date: Monday, November 29, 2010 - 4:26 pm

Also in this code:


Why is that preferred to:

   require 'cache.pl';

And why is this thing even a .pl file? Why isnt it called
lib/GitWeb/Cache.pm or something like that?


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
--

From: Jakub Narebski
Date: Monday, November 29, 2010 - 4:54 pm

Because it is not a Perl module; in particular 'cache.pl' uses global 
variables from gitweb.perl (like $my_url, or $cachedir, or %action)
and subroutines from gitweb.perl (like change_output() and 
reset_output()).  That is why it needs to be injected via do, rather
than included in its owne namespace with package/require.


P.S. This is not my code, this is patch by J.H. (John Hawley); I did
only *minimal* fixups.

P.P.S. My rewrite can be found in 'gitweb/cache-kernel-pu' branch in
my repository (links are to web interface)
  http://repo.or.cz/w/git/jnareb-git.git
  https://github.com/jnareb/git
Sent in http://thread.gmane.org/gmane.comp.version-control.git/158313

-- 
Jakub Narebski
Poland
--

From: Jakub Narebski
Date: Friday, November 12, 2010 - 5:01 pm

Add basic tests of caching support to t9500-gitweb-standalone-no-errors
test: set $caching_enabled to true and check for errors for first time
run (generating cache) and second time run (retrieving from cache) for a
single view - summary view for a project.  Check also that request for
non-existent object (which results in die_error() codepath to be called)
doesn't produce errors.

Check in t9501-gitweb-standalone-http-status that request for
non-existent object produces correct output (HTTP headers and HTML
output) also when caching is enabled.

Check in the t9502-gitweb-standalone-parse-output test that gitweb
produces the same output with and without caching, for first and
second run, with binary (raw) or plain text (utf8) output.

The common routine that enables cache, gitweb_enable_caching, is
defined in t/gitweb-lib.sh

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
As I wrote in the cover letter, because of cacheWaitForUpdate() generating
extra output some of those newly introduced tests fail, some
intermittently.

John, could you take a look at it and check if the problem is: in tests,
in my simplification, or in caching code...

 t/gitweb-lib.sh                           |   15 +++++++++++++
 t/t9500-gitweb-standalone-no-errors.sh    |   20 +++++++++++++++++
 t/t9501-gitweb-standalone-http-status.sh  |   13 +++++++++++
 t/t9502-gitweb-standalone-parse-output.sh |   33 +++++++++++++++++++++++++++++
 4 files changed, 81 insertions(+), 0 deletions(-)

diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index b9bb95f..16ce811 100644
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -52,6 +52,21 @@ EOF
 	export SCRIPT_NAME
 }
 
+gitweb_enable_caching () {
+	test_expect_success 'enable caching' '
+		cat >>gitweb_config.perl <<-\EOF &&
+		our $caching_enabled = 1;
+		our $minCacheTime = 60*60*24*7*30;     # very long expiration time for tests (a month)
+		our $maxCacheTime = 60*60*24*7*30*365; # upper bound for dynamic (adaptive) caching
+		our $cachedir = ...
From: Junio C Hamano
Date: Wednesday, November 17, 2010 - 3:37 pm

leftover....

--

From: Jakub Narebski
Date: Wednesday, November 17, 2010 - 4:12 pm

Ooops.

Still, fixing this doesn't fix tests.  I think I'd have to make another
extra commit, adding configure knob to turn off cacheWaitForUpdate()...


Or perhaps I did something wrong when simplifying, but I have checked
diff between full and simplified version (in comments in v7.1 series),
and it doesn't look like it...

-- 
Jakub Narebski
Poland
--

From: Jakub Narebski
Date: Monday, November 1, 2010 - 3:24 am

From: John 'Warthog9' Hawley <warthog9@eaglescrag.net>

This adds output buffering for gitweb, mainly in preparation for
caching support.  This is a dramatic change to how caching was being
done, mainly in passing around the variable manually and such.

This centrally flips the entire STDOUT to a variable, which after the
completion of the run, flips it back and does a print on the resulting
data.

This should save on the previous 10K line patch (or so) that adds more
explicit output passing.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Commit message unchanged.

I have modified reset_output to silence the following warning from t9500:

  gitweb.perl: Name "main::STDOUT_REAL" used only once: possible typo
  at ../gitweb/gitweb.perl line 1130.

Here is the interdiff for gitweb.perl from 2e5c355 (J.H. patch in 'pu'):

  diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
  index b2b0a23..91e274f 100755
  --- a/gitweb/gitweb.perl
  +++ b/gitweb/gitweb.perl
  @@ -32,7 +39,7 @@ BEGIN {
   our $version = "++GIT_VERSION++";
   
   # Output buffer variable
  -$output = "";
  +our $output = "";
   
   our ($my_url, $my_uri, $base_url, $path_info, $home_link);
   sub evaluate_uri {
  @@ -1143,13 +1138,12 @@ sub change_output {
   }
   
   sub reset_output {
  -	# This basically takes STDOUT_REAL and puts it back as STDOUTl
  -	open(STDOUT,">&STDOUT_REAL");
  +	# This basically takes STDOUT_REAL and puts it back as STDOUT
  +	open STDOUT, ">&", \*STDOUT_REAL;
   }
   

 gitweb/gitweb.perl |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index e4c08ba..91e274f 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -38,6 +38,9 @@ BEGIN {
 
 our $version = "++GIT_VERSION++";
 
+# Output buffer variable
+our $output = "";
+
 our ($my_url, $my_uri, $base_url, $path_info, $home_link);
 sub evaluate_uri {
 ...
From: Jakub Narebski
Date: Monday, November 1, 2010 - 3:24 am

Prepare gitweb for having been split into modules that are to be
installed alongside gitweb in 'lib/' subdirectory, by adding

  use lib __DIR__.'/lib';

to gitweb.perl (to main gitweb script), and preparing for putting
modules (relative path) in $(GITWEB_MODULES) in gitweb/Makefile.

This preparatory work allows to add new module to gitweb by simply
adding

  GITWEB_MODULES += <module>

to gitweb/Makefile (assuming that the module is in 'gitweb/lib/'
directory).

Using __DIR__ from Dir::Self module (not in core, that's why currently
gitweb includes excerpt of code from Dir::Self defining __DIR__) was
chosen over using FindBin-based solution (in core since perl 5.00307,
while gitweb itself requires at least perl 5.8.0) because FindBin uses
BEGIN block, which is a problem under mod_perl and other persistent
Perl environments (thought there are workarounds).

At Pavan Kumar Sankara suggestion gitweb/Makefile uses

  install [OPTION]... SOURCE... DIRECTORY

format (2nd format) with single SOURCE rather than

  install [OPTION]... SOURCE DEST

format (1st format) because of security reasons (race conditions).
Modern GNU install has `-T' / `--no-target-directory' option, but we
cannot rely that the $(INSTALL) we are using supports this option.

The install-modules target in gitweb/Makefile uses shell 'for' loop,
instead of make's $(foreach) function, to avoid possible problem with
generating a command line that exceeded the maximum argument list
length.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch would conflict with the patch 958a846 (gitweb/Makefile: Add
'test' and 'test-installed' targets, 2010-09-26), which is in 'pu'.
Resolving those conflicts is easy, though non-trivial.

Please tell me if I should send this series rebased on top of 
'jn/gitweb-test' (which includes 958a846) instead.

This patch introduces infrastructure which would be required later for
splitting gitweb.

 gitweb/Makefile    |   17 ++++++++++++++---
 gitweb/gitweb.perl ...
From: Jakub Narebski
Date: Monday, November 1, 2010 - 11:50 am

Prepare gitweb for having been split into modules that are to be
installed alongside gitweb in 'lib/' subdirectory, by adding

  use lib __DIR__.'/lib';

to gitweb.perl (to main gitweb script), and preparing for putting
modules (relative path) in $(GITWEB_MODULES) in gitweb/Makefile.

This preparatory work allows to add new module to gitweb by simply
adding

  GITWEB_MODULES += <module>

to gitweb/Makefile (assuming that the module is in 'gitweb/lib/'
directory).

While at it pass GITWEBLIBDIR in addition to GITWEB_TEST_INSTALLED
to test instaleed version of gitweb and installed version of modules
(for tests which check individual (sub)modules).


Using __DIR__ from Dir::Self module (not in core, that's why currently
gitweb includes excerpt of code from Dir::Self defining __DIR__) was
chosen over using FindBin-based solution (in core since perl 5.00307,
while gitweb itself requires at least perl 5.8.0) because FindBin uses
BEGIN block, which is a problem under mod_perl and other persistent
Perl environments (thought there are workarounds).

At Pavan Kumar Sankara suggestion gitweb/Makefile uses

  install [OPTION]... SOURCE... DIRECTORY

format (2nd format) with single SOURCE rather than

  install [OPTION]... SOURCE DEST

format (1st format) because of security reasons (race conditions).
Modern GNU install has `-T' / `--no-target-directory' option, but we
cannot rely that the $(INSTALL) we are using supports this option.

The install-modules target in gitweb/Makefile uses shell 'for' loop,
instead of make's $(foreach) function, to avoid possible problem with
generating a command line that exceeded the maximum argument list
length.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This version requires and applies cleanly on top of patch 958a846 
(gitweb/Makefile: Add 'test' and 'test-installed' targets, 2010-09-26)
in 'pu'.

 gitweb/Makefile    |   17 +++++++++++++++--
 gitweb/gitweb.perl |    8 ++++++++
 2 files changed, 23 insertions(+), 2 ...
From: Jakub Narebski
Date: Monday, November 1, 2010 - 3:24 am

From: John 'Warthog9' Hawley <warthog9@eaglescrag.net>

This is a relatively large patch that implements the file based
caching layer that is quite similar to the one used  on such large
sites as kernel.org and soon git.fedoraproject.org.  This provides
a simple, and straight forward caching mechanism that scales
dramatically better than Gitweb by itself.

The caching layer basically buffers the output that Gitweb would
normally return, and saves that output to a cache file on the local
disk.  When the file is requested it attempts to gain a shared lock
on the cache file and cat it out to the client.  Should an exclusive
lock be on a file (it's being updated) the code has a choice to either
update in the background and go ahead and show the stale page while
update is being performed, or stall the client(s) until the page
is generated.

There are two forms of stalling involved here, background building
and non-background building, both of which are discussed in the
configuration page.

There are still a few known "issues" with respect to this:
- Code needs to be added to be "browser" aware so
  that clients like wget that are trying to get a
  binary blob don't obtain a "Generating..." page

Caching is disabled by default.  You can turn it on by setting
$caching_enabled variable to true to enable file based caching.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This is *minimal* fixup of J.H. patch, addressing only major concerns
of Junio C Hamano (JH) and me (JN).

JH> Jakub, can we have an absolute minimum fix-up, so that we can give
JH> this wider exposure?  I think there are only
JH> four issues:
JH>
JH>  (1) exclude Ajax-y stuff from caching

Done, via "if ($caching_enabled && is_cacheable($action) { ... }".

Alternate solution would be to degrade Ajax-y actions to their
non-Ajax-y versions (i.e. 'blame_incremental' to just 'blame') when
caching is enabled.

JH>  (2) install cache.pm the same way ...
From: Jakub Narebski
Date: Monday, November 1, 2010 - 3:24 am

Add basic tests of caching support to t9500-gitweb-standalone-no-errors
test: set $caching_enabled to true and check for errors for first time
run (generating cache) and second time run (retrieving from cache) for a
single view - summary view for a project.  Check in the
t9502-gitweb-standalone-parse-output test that gitweb produces the same
output with and without caching, for first and second run, with binary
or text output.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
All tests, including those added here, passes.

 t/t9500-gitweb-standalone-no-errors.sh    |   19 ++++++++++++++
 t/t9502-gitweb-standalone-parse-output.sh |   39 +++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index 4f2b9b0..0ad5fc8 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -676,4 +676,23 @@ test_expect_success HIGHLIGHT \
 	 gitweb_run "p=.git;a=blob;f=test.sh"'
 test_debug 'cat gitweb.log'
 
+# ----------------------------------------------------------------------
+# caching
+
+cat >>gitweb_config.perl <<\EOF
+$caching_enabled = 1;
+$cachedir = 'cache'; # to clear right thing
+EOF
+rm -rf cache
+
+test_expect_success \
+	'caching enabled (project summary, first run, generating cache)' \
+	'gitweb_run "p=.git;a=summary"'
+test_debug 'cat gitweb.log'
+
+test_expect_success \
+	'caching enabled (project summary, second run, cached version)' \
+	'gitweb_run "p=.git;a=summary"'
+test_debug 'cat gitweb.log'
+
 test_done
diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh
index dd83890..f3d0cf0 100755
--- a/t/t9502-gitweb-standalone-parse-output.sh
+++ b/t/t9502-gitweb-standalone-parse-output.sh
@@ -112,4 +112,43 @@ test_expect_success 'snapshot: hierarchical branch name (xx/test)' '
 '
 test_debug 'cat gitweb.headers'
 
+
+# ...
Previous thread: [PATCH] completion: fix zsh check under bash with 'set -u' by Mark Lodato on Wednesday, October 27, 2010 - 6:08 pm. (2 messages)

Next thread: Undo git svn reset? by Marat Radchenko on Wednesday, October 27, 2010 - 11:20 pm. (1 message)