Re: [PATCH v8] gitweb: add test suite with Test::WWW::Mechanize::CGI

Previous thread: Re: Mercurial to git converter. by Nick Andrew on Saturday, June 14, 2008 - 3:23 am. (7 messages)

Next thread: Document clone of clone loosing branches? by Vaclav Hanzl on Saturday, June 14, 2008 - 6:05 am. (20 messages)
From: Jakub Narebski
Date: Saturday, June 14, 2008 - 5:47 am

This test uses Perl module Test::WWW::Mechanize::CGI to check
gitweb output, using HTML::Lint (if present) to validate HTML.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Review, review, review!

This test requires test_external_without_stderr in t/test-lib.sh by
Lea Wiemann, send to git mailing list in
  [PATCH] t/test-lib.sh: add test_external and test_external_without_stderr
  http://thread.gmane.org/gmane.comp.version-control.git/83415
  Message-Id: <1212276975-27428-1-git-send-email-LeWiemann@gmail.com>
also present in git://repo.or.cz/git/gitweb-caching.git since commit
(currently) d28d31467db2ec5737948685ade281d5c0704a27.

This is very much work in progress; it uses Test::WWW::Mechanize::CGI 
Perl module (from CPAN), which makes writing tests checking form (but
not requiring exact details) of gitweb output.  The test driver, i.e.
t/t9503-gitweb-Mechanize.sh skips tests if they cannot be done; note
that it assumes that if one jas TWM::CGI module it also has all
modules it depends on (Test::WWW::Mechanize, HTTP::Request::AsCGI,...)
installed.

The alternatives would be to redo work of TWM::CGI using standard or
simply more common modules (LWP::* modules, HTTP::Request::AsCGI), or
(proposed during off-line discussion with Lea Wiemann) to create
expected output documents and diff (compare) literal output.  

The second solution has two disadvantages I can see.  First, it
freezes gitweb output format, making improvements more difficult.
Second, if there is change that invalidates some or all tests vectors,
you would have to regenerate those vectors from actual gitweb output
without having any tests to check this output; it would be quite easy
to put into test vectors errorneous output.

On the other hand current approach means much, much more tests.
It is also bit more challenging, because you have to find invariants
of the output and test that they are fullfilled.


NOTE: Currently test_external_without_stderr fails because when trying
to access URL for ...
From: Lea Wiemann
Date: Saturday, June 14, 2008 - 7:40 am

For everyone's convenience, here's the raw patch: 
http://article.gmane.org/gmane.comp.version-control.git/83415/raw
(When repo.or.cz is up again you can also locate a version with fixed 

Awesome, thank you so much for making a start here!  Here are some quick 

Without having looked at the cause of that, I think that gitweb should 
not be writing stuff to stderr unless an internal or serious error 
occurs; in particular trying to access invalid commits shouldn't cause 
messages on stderr, only to log files if at all.

That said, as long as it isn't fixed, here's my workaround to 
temporarily discard stderr (from my t/t9710/test.pl):

our $old_stderr;
sub discard_stderr {
         open our $old_stderr, ">&", STDERR or die "cannot save STDERR";
         close STDERR;
}
sub restore_stderr {
         open STDERR, ">&", $old_stderr or die "cannot restore STDERR";
}


t9500 seems to be doing the same(?) thing, but this somehow doesn't work 
with your t9503 test:

$ git     # no git in PATH to make sure it picks up the right git binary
bash: git: command not found
$ ./t9500-gitweb-standalone-no-errors.sh | grep passed
* passed all 75 test(s)
$ ./t9503-gitweb-Mechanize.sh -v
[...]
	gitweb.perl: Can't exec "git": No such file or directory at 
/home/lea/source/git/fresh-git/gitweb/gitweb.perl line 380.

-- Lea
--

From: Jakub Narebski
Date: Saturday, June 14, 2008 - 11:07 am

Actually it isn't gitweb (or Perl) writing to stderr, but git itself.
Somehow, at least for gitweb run as CGI script (and under legacy
mod_perl) with Apache 2 as web server this error message:
  fatal: bad revision 'non-existent'
doesn't land in web server logs (/var/log/httpd/error_log).  So it lands
in /dev/null when running gitweb as web script, so it was deemed not
important (also fixing this is not very easy, as you can read below).

t/t9500-gitweb-standalone-no-errors.sh considers as errors only those
error message which would make it into web server logs, see gitweb_run()
function there.  This catches compilation errors.


Fixing this is not that simple.  There is no option to git-rev-list
to not write any output to stderr (no, '--quiet' is about something
else), and I'd rather not lose all advantages of list (shell-less)
form of magical "|-" open only for "2>/dev/null" redirection as
in git_object subroutine in gitweb.

Perhaps it could be solved in Git.pm, and when gitweb is rewritten
to use "use Git" (and global $repo object instead of global $git_dir
variable) it would automatically fix it (using Git.pm would have
the advantage of making gitweb more portable, I think - to ActiveState
broken Perl implementation, with broken magic "|-" open).

Or git-rev-list, or even git wrapper itself, could acquire option to
redirect all stderr to dev null... I think adding it in git wrapper
would be even better; simply change "warning" and "die" to null


It should work.  test-lib.sh sets up $PATH to have 'git' binary (just

...and it would be very strange for t9500 to pass, but t9503 do not
pass.  (Of course both tests passes at my computer, otherwise
I wouldn't send this patch in current form).

Hmmm... perhaps $PATH doesn't get passed down... strange.


But thanks to your report I have found bug in gitweb.  I have changed
t/t9503-gitweb-Mechanize.sh... 

diff --git a/t/t9503-gitweb-Mechanize.sh b/t/t9503-gitweb-Mechanize.sh
index 5df22c4..a5be275 100755
--- ...
From: Lea Wiemann
Date: Saturday, June 14, 2008 - 11:31 am

Okay, this one will go away with the new API I'm writing, which uses 
cat-file --batch-check instead of rev-list.  In the meantime (and in 
other cases) I guess diverting stderr in the test code is fine.  (I 
wouldn't want to ignore stderr in all cases, even where you're not 
expecting any output on stderr, since that might actually indicate an 

Since you're accessing http://localhost/ URLs, the web server's PATH is 
in effect, which doesn't get overridden by the tests.  But anyways, 
using an environment variable (see my other email) will move the 

For t9503 this'll go away if the GITWEB_TEST_BASE_URL thing is 
implemented.  For t9500 (which contains $GIT = "git" as well), it should 
  be fine as is since test-lib.sh sets PATH and thus gitweb picks up the 
right binary.

-- Lea
--

From: Jakub Narebski
Date: Saturday, June 14, 2008 - 11:59 am

Won't work.  Well, it would work for p=commit;h=non-existent, but it
would not work for p=log;h=non-existent, where git-rev-list
(or git-log) is really needed.  And I'd rather have parse_commit()
(used in 'commit' view) and parse_commits() (used in log-like views)
share the same commit parser, parse_commit_text(), so it's rev-list
also for git_commit, not cat-file.


Well, I'd divert stderr in tests cases (or use simply 'test_external'),
or better filter stderr, only in those cases where there is spurious
but not dangerous thing on stderr.

But again, I think the solution would be either to add feature to
Git.pm, something like command_output_pipe_no_stderr, which would
redirect stderr to /dev/null, or modify git wrapper to redirect
stderr to /dev/null in scripts / when calling script, and redefine

It isn't.  http://localhost/ is just access convention, and
WWW::Mechanize::CGI actually calls system()[*1*] on provided path made
absolute (hence error when it contains whitespace), setting CGI
environmental variables before calling it.

[*1*] it would be nice to have perl_application in WWW::Mechanize::CGI,
which would simply setup %ENV and use do() instead of system() on
provided application.  Perhaps it would be better in meantime to
simply craft $mech->cgi( sub { ... } ), and do not use generic
$mech->cgi_application($path); we could do without all the checking,
too.

-- 
Jakub Narebski
Poland
--

From: Lea Wiemann
Date: Saturday, June 14, 2008 - 2:12 pm

Since you can only run whole perl scripts with test_external, we'll have 
to resort to using those discard_/restore_stderr functions.  I doubt 

(FYI, Git.pm has such a feature in the command method, though it it 
simply closes STDERR and doesn't properly re-open it, I believe.)  The 
Git::Repo API I'm writing doesn't currently generate any spurious output 

Yup, I should've read TFM. :)  On my system, $PATH is empty in 
gitweb.cgi for some strange reason, but since using an absolute path for 

Gitweb and probably CGI::Carp qw(fatalsToBrowser) use 'exit', so we'd 
have to use Test::Trap (or so) to catch those.  I think we should defer 
this until performance actually becomes an issue.

-- Lea
--

From: Jakub Narebski
Date: Sunday, June 15, 2008 - 1:36 am

The idea was more to avoid bug in cgi_application() method of WM::CGI,
which fails on pathname containing embedded spaces (that is why
workaround in setting $gitweb path, to avoid 'trash directory' in it),
than for performance reasons.

Nevertheless naive $mech->cgi( sub { do "$gitweb"; } ); doesn't work;
and after thinking about it a little it simply couldn't work...

-- 
Jakub Narebski
Poland
--

From: Lea Wiemann
Date: Saturday, June 14, 2008 - 11:18 am

Just noticed this -- here's the cause of the problem I encountered in my 
previous post.

You can't really probe for a running gitweb installation on localhost, 
because even if there's one, you don't know if you're testing against 
your tree and your test repo.

So unless it's (easily) possible to force Mechanize to eat gitweb.cgi's 
output in lieu of making actual HTTP requests, I'd suggest that we 
simply use an environment variable (e.g. 
GITWEB_TEST_BASE_URL=http://localhost/cgi-bin/gitweb.cgi) -- if it's 
unset, the tests get skipped, otherwise they're run against that base URL.

*wanders-off-to-write-a-patch*

-- Lea
--

From: Jakub Narebski
Date: Saturday, June 14, 2008 - 11:31 am

Errr... WWW::Mechanize::CGI does not access web server, but runs CGI
script (or CGI subroutine) as if it was installed in the root directory
of localhost.  'http://localhost' simply means installed CGI script.

For example tests works for me, even though I have gitweb installed
somewhere deeper than directly at locahost:
  http://localhost/cgi-bin/gitweb/gitweb.cgi
Besides you can check output and see that it is consistent with set
and used gitweb config in t9503* test.

Test::WWW::Mechanize::CGI has the following description:
http://search.cpan.org/~mramberg/Test-WWW-Mechanize-CGI-0.1/lib/Test/WWW/Mechanize/CGI.pm

  Provides a convenient way of testing CGI applications without a external
  daemon.

The documentation of module (0.1 in the case of TWM::CGI, 0.3 in the
case of WM::CGI) leaves something to be desired, so I mainly used
examples provided as guideline.

-- 
Jakub Narebski
Poland
--

From: Lea Wiemann
Date: Saturday, June 14, 2008 - 4:57 pm

From: Jakub Narebski <jnareb@gmail.com>

This test uses Test::WWW::Mechanize::CGI to check gitweb's output.  It
also uses HTML::Lint (if present) to validate the HTML.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
---
This requires the following two patches:
[PATCH] gitweb: Make it work with $GIT containing spaces
    http://article.gmane.org/gmane.comp.version-control.git/85036/raw
[PATCH] t/test-lib.sh: add test_external and test_external_without_stderr
    http://article.gmane.org/gmane.comp.version-control.git/83415/raw

Changed since v1:

- Use "$GIT_EXEC_PATH/git" instead of "git" for the git binary path,
  since on my system $PATH gets lost mysteriously.

- Use test_external instead of test_external_without_stderr so that
  the test doesn't fail if there is output on stderr.

  Discarding and restoring STDERR (as I mentioned in a recent message)
  doesn't work with Mechanize, as Mechanize somehow seems to save the
  STDERR handle and reuse it, so subsequent calls to Mechanize keep
  using the discarded STDERR.  (The discard/restore functions work
  fine in other test modules, it's really just Mechanize here.)

  Using test_external means that in verbose mode we get some spurious
  error messages in the test output, but they're not harmful (and it
  should be clear to anyone running the test suite that they're OK).

- Reworded commit message a little for clarity.

The test works now on my system.  If you run this successfully (or
unsuccessfully), please post a note on the list.
Comments/patches/additions, in particular from people who've used
Mechanize before, is still much appreciated!

This patch is WIP and not for inclusion.  At some point we'll take
further revisions off the list and use a public repository instead
(once repo.or.cz is up again); we'll post a notice when that happens.
When the test suite is finished we'll post it here again.

 t/t9503-gitweb-Mechanize.sh |  127 ...
From: Jakub Narebski
Date: Sunday, June 15, 2008 - 11:01 am

Actually I think it is CGI module itself catching and redirecting
STDERR from Perl to logfile, and WWW::Mechanize::CGI having to catch
and redirect all stderr of invoked application, but I'm not sure.


What should we think about now, I guess, is
1.) Should we put all tests in one file, or should they be split
    according to what do they test?  For example separate tests
    for correct 4xx and 5xx responses for requests for non-exiting
    objects, and for not permitted views.

2.) What invariants should we test, beside obvious HTML validation
    using HTML::Lint (by the way, is there some Perl module for RSS,
    Atom and OPML validation?).  Checking for example if all items
    are listed in a 'tree' view, or if all inner links (#link) are
    valid would be a good start... pity that Mechanize modules don't
    have very good documetation.

3.) What invariants you want to test for your caching efforts, e.g.
    checking (not necessary with TWM::CGI) if cached output matches
    non-cached, with exception of marking output as cached?

-- 
Jakub Narebski
Poland
--

From: Lea Wiemann
Date: Sunday, June 15, 2008 - 11:45 am

I'd suggest we leave it in a single file until test execution time 
becomes an issue.  Then (when it has become too large) we'll be able to 

Yup; completeness of item lists is especially relevant for paginated 
output.  Also check for the presence and validity of links (like 
"parent" links, etc.), and for the presence of certain elements (like 
the file modes in the tree view).

Also, with a $ENV{LONG_GIT_TEST} variable or so, we could automatically 
validate all links for each page we're checking -- it takes a long time, 
but it's still way more efficient than exhaustive spidering of the whole 

I can't find anything on Google right now, but piping them into external 
validators might be just as fine.  Also, since those formats are 
generated using print statements (which is really error-prone for XML 

How about this:

1. Run the Mechanize tests (and possibly also the existing t9500 tests) 
*without* caching, recording the URL's and contents of all pages the 
test suite accesses.

2. Get all those URL's again *with* caching (from a cold cache), and 
assert that the output is identical.

3. Get all those URL's again *with* caching (from a warm cache), and 
assert that the output is identical.  Perhaps also assert that no call 
to the git binary is made (i.e. everything has actually been cached). 
(Of course we might need options for the production site to not cache 
certain things, but let's defer this discussion.)

-- Lea
--

From: Jakub Narebski
Date: Sunday, June 15, 2008 - 5:40 pm

I wanted to split tests mainly not because of performance, but because
of making it easier to maintain.  Although perhaps single driver-test,




We can use Test::XML / Test::XML::Valid / Test::XML::Simple for being
well-formed XML.  If RSS / Atom / OPML have good DTD / XML Schema /
/ Relax-NG schema / Sablotron rules, they could be checked using that


Well, it might be identical, but it also might have "cached output"

Or at least (if we don't cache everything, and that could be good idea)
to check if there are less git binary calls.

-- 
Jakub Narebski
Poland
--

From: Lea Wiemann
Date: Monday, June 16, 2008 - 2:10 am

We can make that deactivatable so that the output is actually identical.

-- Lea

--

From: Jakub Narebski
Date: Monday, June 16, 2008 - 1:15 pm

Good idea.  Or we can just do diff and check the difference.

BTW. another thing worth considering: generating "Generating page..."
or something like that like kernel.org's gitweb does when it has to
regenerate cache.  Nice feature, that is.

-- 
Jakub Narebski
Poland
--

From: Lea Wiemann
Date: Thursday, June 19, 2008 - 8:18 pm

From: Jakub Narebski <jnareb@gmail.com>

This test uses Test::WWW::Mechanize::CGI to check gitweb's output.  It
also uses HTML::Lint (if present) to validate the HTML.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
---
I haven't gotten around to merging Jakub's recent XML validation patch
yet, but I think it'd be good to have some review now; I'll merge it
tomorrow and send another patch.

Changes since v2:

t9503-gitweb-Mechanize.sh has stayed the same, but t9503/test.pl has
been overhauled: extracted common code into helper functions, only
validate if --long-tests if given, added link-checking with
--long-tests, added some minor tests for page contents, and added
tests for the summary page at the bottom.

This runs on top of the *next* branch plus the following patches:

[PATCH] gitweb: respect $GITPERLLIB
    http://article.gmane.org/gmane.comp.version-control.git/85586/raw
[PATCH 1/2 v3] t/test-lib.sh: add test_external and test_external_without_stderr
    http://article.gmane.org/gmane.comp.version-control.git/85504/raw
[PATCH v3] gitweb: standarize HTTP status codes
    http://article.gmane.org/gmane.comp.version-control.git/85522/raw

(The latter two patches might be in next already by the time you're
reading this.)


 t/t9503-gitweb-Mechanize.sh |  128 +++++++++++++++++++++++++++++++++++++++++++
 t/t9503/test.pl             |  128 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 256 insertions(+), 0 deletions(-)
 create mode 100755 t/t9503-gitweb-Mechanize.sh
 create mode 100755 t/t9503/test.pl

diff --git a/t/t9503-gitweb-Mechanize.sh b/t/t9503-gitweb-Mechanize.sh
new file mode 100755
index 0000000..abcf987
--- /dev/null
+++ b/t/t9503-gitweb-Mechanize.sh
@@ -0,0 +1,128 @@
+#!/bin/sh
+#
+# Copyright (c) 2008 Jakub Narebski
+# Copyright (c) 2008 Lea Wiemann
+#
+
+test_description='gitweb as CGI (using WWW::Mechanize)
+
+This test uses Perl module Test::WWW::Mechanize::CGI to
+check ...
From: Jakub Narebski
Date: Friday, June 20, 2008 - 5:08 am

If you have rewritten the patch I won't mind if you change authorship

First, it is not XML validation[*1*], but check if XML is well-formed.
Second, I think it would be better if adding XML checks (RSS, Atom,
OPML) would be left as separate commit.
 
[*1*] There is W3C feed validation service, http://validator.w3.org/feed/
which provides also standalone Python script for feed validation, and
also SOAP interface with WebService::Validator::Feed::W3C wrapper (but
it would require having Internet connection), or we can try validate
against XML Schema Definition (there exists one for RSS, and there is
[cut]

I have created this part as a copy of older t9500 gitweb test, thinking
about what we might want to test, but the WIP of Mechanize based t9503

Actually it is not that we cannot could properly when skipping, because
there are two ways to have skipped tests and test count upfront, both
implemented in my initial version (v1) of t/t9503/test.pl.

You can set number of tests upfront depending if some feature is
available, e.g. "plan tests => <n> - $use_lint*<m2> - $check_xml*<m2>"
or just "if ($use_lint) { plan tests => <m> } else ...".

Or you can explicitly skip tests using SKIP: BLOCK (see Test::More(3pm)).


What we have here is that we don't know number of tests upfront,
because it is complicated, and that's what I'd like to see in the

Why do you use shortened SHA1 identifier, instead of full SHA1? Links


Another solution would be to copy relevant parts of cgi_application
(without all the checks for example), and use $mech->cgi( sub { ... } );

I think we can use 'my' here; gitweb uses 'our' only to be able to run

Style: I would write "our (", with space between keyword and opening

Style: I would use function call form, i.e. "test_page(...)", not
command-like (or script-like) form.


As to the rest of the test: I think as preliminary test it is a good
thing to have.  We can think about what tests to add later (unless you
rather have exhaustive test suite ...
From: Lea Wiemann
Date: Friday, June 20, 2008 - 6:49 am

Sure; FWIW I'm generally in favor of having a large initial commit for 
new independent files (and since we don't need to worry about conflicts 
I'm not fervent about getting this into the next branch ASAP), but we 

I was thinking about that.  Right now the tests are so generic that you 
can replace the test repository with anything else as long as it has 

We're skipping tests if a page-load fails to prevent a slew of failure, 
using constructs like if(test_page '?...') { ... inspect the page ...}. 
It's my impression that we shouldn't end up with a wrong test count even 
if one test fails; but then we'd have to replace those ifs with 
cumbersome skip blocks.

I'm generally not in favor of maintaining any test count plans; they're 
an unnecessary failure source, and I don't think they buy us much, if 

That's for readability in the test output; links get checked anyway, 
don't they?  If you think we should be testing against full SHA1s, 


ISTR that using cgi(sub ...) gives us problems with untrappable exits in 
gitweb.cgi (and possibly more things), right?  I'm fine with the 



I'll be writing tests as I go and change parts of gitweb.  I won't be 
able to write exhaustive tests, but I at least want to make sure that 
the code I'm changing is covered somehow.

-- Lea
--

From: Jakub Narebski
Date: Friday, June 20, 2008 - 11:03 am

I just think that having this separate could help bisectability in

Well, I've tried to put the cases where something can go wrong, and to
cover wide range of possibilities.  Rename, copy, typechange, merge
commit for testing *diff views, symlink to test 'tree' view, different
types of tags and tagged objects...

What could be added is different types of stage output: filenames with
'*', '+', '=', ':', '?', whitespace, etc.  Checking if submodules

While they can detect otherwise unnoticed errors, I think most of time

This would reduce number of operations when crawling gitweb output.

 
[about workaround bug in TWM::CGI when path to application contains

Actually ->cgi_application(<path>) is implemented using ->cgi(<sub>)
in TWM::CGI.  The bug is that it uses straight "system($application)",
without any quoting, after ensuring that $application is absolute path

They work around the fact that we use 'trash directory', but they
would fail if you run test from the directory which contains spaces,
for example "/home/Lea Wiemann/git/t" (I think this has greater
probability happening on operating systems other than Linux, for

Actually 'my' would work here too; the problem with gitweb being
forced to use 'our' to make it work with mod_perl isn't about the
fact that they are global variables, but with initializing them,

Write test when we notice something breaking seems a common theme
in git development... ;-)

-- 
Jakub Narebski
Poland
--

From: Lea Wiemann
Date: Friday, June 20, 2008 - 3:04 pm

Hm...  I wouldn't think that bisect could be necessary for a long linear 

Yup, filenames with ampersands and semicolons would be fun, too. 


OK, now I see what you mean.  Well, awesome, borked lirbareis.  I've put 

Fixed, and thanks for the offline explanations.

-- Lea
--

From: Lea Wiemann
Date: Friday, June 20, 2008 - 3:18 pm

This test uses Test::WWW::Mechanize::CGI to check gitweb's output.  It
also uses HTML::Lint (if present) to validate the HTML.

TOOD: Make this runnable even when the path to the git tree (i.e. your
working copy) contains blanks.
http://mid.gmane.org/200806202003.55919.jnareb@gmail.com

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
---
This for everyone's reference so you have my most recent version to
base changes on.

Changed since v3: applied Jakub's suggestions, and added TODO note to
commit message so this doesn't get applied accidentally.

 t/t9503-gitweb-Mechanize.sh |  128 +++++++++++++++++++++++++++++++++++++++++++
 t/t9503/test.pl             |  128 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 256 insertions(+), 0 deletions(-)
 create mode 100755 t/t9503-gitweb-Mechanize.sh
 create mode 100755 t/t9503/test.pl

diff --git a/t/t9503-gitweb-Mechanize.sh b/t/t9503-gitweb-Mechanize.sh
new file mode 100755
index 0000000..abcf987
--- /dev/null
+++ b/t/t9503-gitweb-Mechanize.sh
@@ -0,0 +1,128 @@
+#!/bin/sh
+#
+# Copyright (c) 2008 Jakub Narebski
+# Copyright (c) 2008 Lea Wiemann
+#
+
+test_description='gitweb as CGI (using WWW::Mechanize)
+
+This test uses Perl module Test::WWW::Mechanize::CGI to
+check gitweb output, using HTML::Lint to validate HTML.'
+
+# helper functions
+
+safe_chmod () {
+	chmod "$1" "$2" &&
+	if [ "$(git config --get core.filemode)" = false ]
+	then
+		git update-index --chmod="$1" "$2"
+	fi
+}
+
+. ./test-lib.sh
+
+# check if test can be run
+perl -MEncode -e 'decode_utf8("", Encode::FB_CROAK)' >/dev/null 2>&1 || {
+	test_expect_success \
+		'skipping gitweb tests, perl version is too old' :
+	test_done
+	exit
+}
+
+perl -MTest::WWW::Mechanize::CGI -e '' >/dev/null 2>&1 || {
+	test_expect_success \
+		'skipping gitweb tests, Test::WWW::Mechanize::CGI not found' :
+	test_done
+	exit
+}
+
+# set up test repository
+test_expect_success 'set up test ...
From: Lea Wiemann
Date: Sunday, June 22, 2008 - 5:45 pm

This test uses Test::WWW::Mechanize::CGI to check gitweb's output.  It
also uses HTML::Lint, XML::Parser, and Archive::Tar (if present, each)
to validate the HTML/XML/tgz output, and checks all links on the
tested pages if --long-tests is given.

Also add a GITPERL environment variable that allows running Perl-based
tests with different perl binaries (and thus under different
versions).

Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Windows users: Please apply this patch and run the test, I'd love to
know if it works on Windows!

This patch applies against the *next* branch (without any other
patches, finally).

Note that in order to actually run the test you need the
Test::WWW::Mechanize::CGI Perl module.

Changes since v4:

- Add a $GITPERL variable so you can choose the perl binary and run
  the test suite with different Perl versions.  Note that this affects
  test-lib.sh.

- Add a workaroud for WWW::Mechanize::CGI so that the test suite also
  works if the path to the Git tree contains blanks (which could
  happen on Windows, for instance).

- Integrated all of Jakub's "[RFC/PATCH (WIP)] gitweb: Check that RSS,
  Atom and OPML output is well formed XML", which uses XML::Parser.

- Also added snapshot validation using Archive::Tar.

- Spidering (with --long-tests) is now much more efficient, in that it
  doesn't check any page more than once.

- Added checks to ensure that fragments have corresponding name/id
  attributes in the page.

- Added plenty more actual tests: check more elements on the summary
  page, and also test the commit, commitdiff, tree, blame, history,
  blob, and blob_plain views.  Some of these tests don't do anything
  more than downloading a few pages with the given action (without
  checking any of the content), but this already helps a lot since it
  checks that gitweb doesn't die, that the HTML is valid, and (if
  --long-tests is given) that all links work.

I've tested the ...
From: Lea Wiemann
Date: Sunday, June 22, 2008 - 6:14 pm

This test uses Test::WWW::Mechanize::CGI to check gitweb's output.  It
also uses HTML::Lint, XML::Parser, and Archive::Tar (if present, each)
to validate the HTML/XML/tgz output, and checks all links on the
tested pages if --long-tests is given.

Also add a GITPERL environment variable that allows running Perl-based
tests with different perl binaries (and thus under different
versions).

Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Changed since v5: honor $ENV{GITPERL} for gitweb.cgi execution, not
just test execution:

    diff --git a/t/t9503/test.pl b/t/t9503/test.pl
    index 28894c5..947e2e8 100755
    --- a/t/t9503/test.pl
    +++ b/t/t9503/test.pl
    @@ -54,5 +54,5 @@ my $gitweb = abs_path(File::Spec->catfile('..', '..', 'gitweb', 'gitweb.cgi'));
     my $cgi = sub {
            # Use exec, not the shell, to support blanks in the path.
    -	my $status = system $gitweb $gitweb;
    +	my $status = system { $ENV{GITPERL} } $ENV{GITPERL}, $gitweb;
            my $value  = $status >> 8;


 t/t9503-gitweb-Mechanize.sh |  132 ++++++++++++++++
 t/t9503/test.pl             |  354 +++++++++++++++++++++++++++++++++++++++++++
 t/test-lib.sh               |    2 +
 3 files changed, 488 insertions(+), 0 deletions(-)
 create mode 100755 t/t9503-gitweb-Mechanize.sh
 create mode 100755 t/t9503/test.pl

diff --git a/t/t9503-gitweb-Mechanize.sh b/t/t9503-gitweb-Mechanize.sh
new file mode 100755
index 0000000..3fe6d8b
--- /dev/null
+++ b/t/t9503-gitweb-Mechanize.sh
@@ -0,0 +1,132 @@
+#!/bin/sh
+#
+# Copyright (c) 2008 Jakub Narebski
+# Copyright (c) 2008 Lea Wiemann
+#
+
+# This test supports the --long-tests option.
+
+# This test only runs on Perl 5.8 and later versions, since
+# Test::WWW::Mechanize::CGI requires Perl 5.8.
+
+test_description='gitweb tests (using WWW::Mechanize)
+
+This test uses Test::WWW::Mechanize::CGI to test gitweb.'
+
+# helper functions
+
+safe_chmod () {
+	chmod "$1" "$2" ...
From: Jakub Narebski
Date: Monday, June 23, 2008 - 6:31 am

I'd rather either skip this comment all together, or change it to
read something like that (I don't have good solution):

+# Counting tests upfront is difficult, as number of tests depends
+# on existence of several Perl modules, and is next to impossible
+# when spidering gitweb output (for --long-test).  Additionally,
+# having number of tests planned stated at beginning is not necessary,
+# as this test is to be run from git test suite, and not to be
+# processed further by TAP (Test Anything Protocol) Perl modules.

See http://www.perlfoundation.org/perl5/index.cgi?testing for
explanation why (and when) 'plan' is useful:

  TAP output usually starts with a plan line:
  
     1..9
  
  which specifies how many tests are to follow. The above example
  specifies 9 tests.
  
  This line is optional, but recommended. Should a test suite die
  part-way through, the plan allows the testing framework to recognise

Here also it could be "my $long_tests"; but I am not a Perl hacker,

Wouldn't be easier to use '--format=%(refname)' in git-for-each-ref
invocation?  Besides, gitweb now uses in link _full_ refname, i.e.
"refs/heads/<name>" and "refs/tags/<name>" to avoid branch / tag



Errr... I think that without my comment you removed it is hard to
understand what this comment talks about.  "Thus ..." without any

blanks = embedded whitespace?


Could you tell us here what are the limitations?  Are those limits


This I think should be written using Test::More equivalent of
test_expect_failure in t/test-lib.sh, namely TODO block, either
as 

  TODO: {
    local $TODO = "why";

    ...normal testing code...
  }

or if it causes problem with t9503-gitweb-Mechanize.sh failing, perhaps
as

  TODO: {
    todo_skip "why", <n> if <condition>;

    ...normal testing code...
  }




How it is different from PERL_PATH?

-- 
Jakub Narebski
ShadeHawk on #git
Thorn, Poland
--

From: Lea Wiemann
Date: Monday, June 23, 2008 - 10:57 am

[Open issue about PERL_PATH at the bottom!]


I'll just remove it.  Anybody trying to add a test count will give up 

This one is to make "local $long_tests = 0" work -- it wouldn't work 

*checks*  No, since I want to strip the leading refs/heads/ anyway (in 
order to test the page text, which mentions heads and tags without the 
directory prefix) -- hence the 'split' wouldn't go away with 


That's a tyop: s/Thus/This/ (fixed).  It refers to the following "my 

Yup.  Improved comment, and simplified system call:

-	# Use exec, not the shell, to support blanks in the path.
-	my $status = system { $ENV{GITPERL} } $ENV{GITPERL}, $gitweb;
+	# Use exec, not the shell, to support embedded whitespace in
+	# the path to gitweb.cgi.

No idea.  I tried it and it told me that the URL was too long -- I 
suspect it's the W3C server that rejected it.  I wouldn't spend more 
effort trying to get online validation to work; it's probably easier to 
just occasionally validate manually. ;)  XML well-formedness tests 

Right -- here's my new version (which still fails if the feeds die or 
are not well-formed XML -- I'll want that when I change gitweb):

# RSS/Atom/OPML view
# Simply retrieve and verify well-formedness, but don't spider.
$mech->get_ok('?p=.git;a=atom', 'Atom feed') and _verify_page;
$mech->get_ok('?p=.git;a=rss', 'RSS feed') and _verify_page;
TODO: {
	# Now spider -- but there are broken links.
	# http://mid.gmane.org/485EB333.5070108@gmail.com
	local $TODO = "fix broken links in Atom/RSS feeds";
	test_page('?p=.git;a=atom', 'Atom feed');
	test_page('?p=.git;a=rss', 'RSS feed');
}

Right, I didn't think of that.  PERL_PATH isn't available in the tests 
though, it's only used internally by the Makefile to generate (among 
other things) gitweb.cgi.  This means that while we can control under 
which Perl version gitweb.cgi runs, we cannot control under which Perl 
version the test suite runs (at least without $PATH trickery).  Does 
this bother us?

If yes, I'd ...
From: Jakub Narebski
Date: Monday, June 23, 2008 - 3:18 pm

I just think that not having comment is better than have comment which
is not fully true, or five lines of explanations which could be
expanded further.

Simply: planning (coming with number of tests upfront) is hard, and for

Hmmmm...

Siednote: some time ago I though that I understood difference between


It makes sense now, thanks.

Although I guess it would be nice to have link to the ticket to the bug
(report) in WWW::Mechanize::CGI, so we could use cgi_application()

...if not for the trick with explicitly calling Perl (GITPERL), to

Right.
 
Well, W3C feed validator provides SOAP interface we could use, using
POST of course and not GET (but it requires net connection), and also


That's what I was asking about.

-- 
Jakub Narebski
Poland
--

From: Lea Wiemann
Date: Monday, June 23, 2008 - 7:01 pm

Sure, I've added a link to 
http://rt.cpan.org/Ticket/Display.html?id=36654 for completeness, though 
we won't be able to use it if it gets fixed, since we can't break 
compatibility to older versions.  Btw, you may want to add a reply to 
the bug report (if that's possible -- haven't used CPAN's ticket system 
before) that "system $application $application" would be a better 

No, my point was that it will fail *in case* it breaks (which is good). 
  Putting everything into a TODO block would silently ignore issues with 

Breaking the indecisiveness, I've ripped it out. ;-)  We can still test 
gitweb under different Perl versions by make'ing it with different 
PERL_PATHs, so I think it'll be fine.

-- Lea
--

From: Lea Wiemann
Date: Monday, June 23, 2008 - 7:18 pm

This test uses Test::WWW::Mechanize::CGI to check gitweb's output.  It
also uses HTML::Lint, XML::Parser, and Archive::Tar (if present, each)
to validate the HTML/XML/tgz output, and checks all links on the
tested pages if --long-tests is given.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
---
Changes since v6:

- Removed GITPERL again.

- Added some basic tests for the page contents of blob and blob_plain
  views.

- Incorporated Jakub's suggestions (see parent posts); in particular
  blame and feed views *do* get spidered now in --long-tests mode, but
  the spider tests are marked as TODO and thus don't cause the test
  suite to fail.

- Some minor (internal) changes: @files only contains blobs, not
  trees; improved the logic that avoids checking pages twice; die if
  test.pl is run directly rather than from the shell script.

 t/t9503-gitweb-Mechanize.sh |  132 +++++++++++++++
 t/t9503/test.pl             |  378 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 510 insertions(+), 0 deletions(-)
 create mode 100755 t/t9503-gitweb-Mechanize.sh
 create mode 100755 t/t9503/test.pl

diff --git a/t/t9503-gitweb-Mechanize.sh b/t/t9503-gitweb-Mechanize.sh
new file mode 100755
index 0000000..6f7168e
--- /dev/null
+++ b/t/t9503-gitweb-Mechanize.sh
@@ -0,0 +1,132 @@
+#!/bin/sh
+#
+# Copyright (c) 2008 Jakub Narebski
+# Copyright (c) 2008 Lea Wiemann
+#
+
+# This test supports the --long-tests option.
+
+# This test only runs on Perl 5.8 and later versions, since
+# Test::WWW::Mechanize::CGI requires Perl 5.8.
+
+test_description='gitweb tests (using WWW::Mechanize)
+
+This test uses Test::WWW::Mechanize::CGI to test gitweb.'
+
+# helper functions
+
+safe_chmod () {
+	chmod "$1" "$2" &&
+	if [ "$(git config --get core.filemode)" = false ]
+	then
+		git update-index --chmod="$1" "$2"
+	fi
+}
+
+. ./test-lib.sh
+
+# check if test can be run
+perl -MEncode -e 'decode_utf8("", ...
From: Lea Wiemann
Date: Thursday, June 26, 2008 - 6:47 am

This test uses Test::WWW::Mechanize::CGI to check gitweb's output.  It
also uses HTML::Lint, XML::Parser, and Archive::Tar (if present, each)
to validate the HTML/XML/tgz output, and checks all links on the
tested pages if --long-tests is given.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
---
This patch applies on next.

Changes since v7:

- In Makefile, dump $PERL_PATH to GIT-BUILD-OPTIONS (which gets
  source'd by test-lib.sh), and use it in t9710-perl-git-repo.sh.

- .git/description in the test repository no longer depends on $0
  (this would e.g. cause 'cd t; make' to fail).

- Add test_link subroutine and use it everywhere in place of
  ok(find_link...) so that links whose presence get tested get checked
  and spidered in --long-tests mode.

- Add simple page caching (reduces execution time without --long-tests
  by 25%).  That's *really* helpful when you have to run those tests
  on a regular basis. ;)  (WWW::Mechanize::Cached won't work with
  TWM::CGI, so we have to implement it ourselves; but it's easier
  anyway.)

- Follow redirects rather than failing.

- Test subdirectories in tree view.

- Test error handling for non-existent hashes or hashes of wrong type.

- Test commitdiff_plain view.

- Expand test for history view.

- Test tag objects (not just symbolic tags) in tag list.

- Test a specific bug (under "diff formatting", marked TODO).

- Do not test correctness of line number fragments (#l[0-9]+); they're
  broken too often right now.

- Probably some more minor improvements I've forgotten about. :)

 Makefile                    |    1 +
 t/t9503-gitweb-Mechanize.sh |  138 +++++++++++
 t/t9503/test.pl             |  553 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 692 insertions(+), 0 deletions(-)
 create mode 100755 t/t9503-gitweb-Mechanize.sh
 create mode 100755 t/t9503/test.pl

diff --git a/Makefile b/Makefile
index a129491..3bf4f14 100644
--- a/Makefile
+++ ...
From: Lea Wiemann
Date: Thursday, June 26, 2008 - 6:48 am

This test uses Test::WWW::Mechanize::CGI to check gitweb's output.  It
also uses HTML::Lint, XML::Parser, and Archive::Tar (if present, each)
to validate the HTML/XML/tgz output, and checks all links on the
tested pages if --long-tests is given.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
---
[Resent with correct subject.]

This patch applies on next.

Changes since v7:

- In Makefile, dump $PERL_PATH to GIT-BUILD-OPTIONS (which gets
  source'd by test-lib.sh), and use it in t9710-perl-git-repo.sh.

- .git/description in the test repository no longer depends on $0
  (this would e.g. cause 'cd t; make' to fail).

- Add test_link subroutine and use it everywhere in place of
  ok(find_link...) so that links whose presence get tested get checked
  and spidered in --long-tests mode.

- Add simple page caching (reduces execution time without --long-tests
  by 25%).  That's *really* helpful when you have to run those tests
  on a regular basis. ;)  (WWW::Mechanize::Cached won't work with
  TWM::CGI, so we have to implement it ourselves; but it's easier
  anyway.)

- Follow redirects rather than failing.

- Test subdirectories in tree view.

- Test error handling for non-existent hashes or hashes of wrong type.

- Test commitdiff_plain view.

- Expand test for history view.

- Test tag objects (not just symbolic tags) in tag list.

- Test a specific bug (under "diff formatting", marked TODO).

- Do not test correctness of line number fragments (#l[0-9]+); they're
  broken too often right now.

- Probably some more minor improvements I've forgotten about. :)

 Makefile                    |    1 +
 t/t9503-gitweb-Mechanize.sh |  138 +++++++++++
 t/t9503/test.pl             |  553 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 692 insertions(+), 0 deletions(-)
 create mode 100755 t/t9503-gitweb-Mechanize.sh
 create mode 100755 t/t9503/test.pl

diff --git a/Makefile b/Makefile
index ...
From: Jakub Narebski
Date: Sunday, June 29, 2008 - 3:47 pm

Wouldn't it be better to use "(each if present)"?

I am not a native English speaker, though.



I'd rather use more descriptive name, so when you look it up in gitweb
to manually (visually?) checks its output you would know of which test
is it.  Something like "gitweb Mechanize test repository", or "test


What do you need caching for?  You check if page was already accessed
when spidering...

If it is about checking links, alternate solution would be to replace
simple $mech->page_links_ok( [ $desc ] ) by finding all the links
either using $mech->followable_links() or $mech->find_all_links( ... ),
or just $mech->links, then filtering out links which you have checked

Why this package is not named WWW::Mechanize::CGI::Cached, I wonder?

By the way, if you want to add a comment to mentioned WM::CGI ticket
http://rt.cpan.org/Ticket/Display.html?id=36654 you have to either
register, or send comment via mail with the following info

From "Bugs in WWW-Mechanize-CGI via RT" <bug-WWW-Mechanize-CGI@rt.cpan.org>:
|
| Please include the string:
|
|         [rt.cpan.org #36654]
|
| in the subject line of all future correspondence about this issue. To do so, 


More tests are always nice to have, although I begin to wonder if

 
I'll try to investigate; I guess this uses wrong name or wrong hash



 
Good work!

-- 
Jakub Narebski
Poland
--

From: Lea Wiemann
Date: Sunday, June 29, 2008 - 4:39 pm

This is actually about the short tests (I don't think it gains much for
spidering, percentage-wise).  The test suite uses some repetitive set-up
code (like get_summary && follow_link 'tree' && ...), so the second and


Yeah, basically. :)  Plus, it's not to be confused with a proper
implementation of a TWM::CGI::Cached package.  (For instance, it uses
the URL rather than the complete request as cache key.  Hence it ignores
headers like Referer; but that's great for the Gitweb tests since TWM
apparently sets the Referer, but Gitweb doesn't check it, so there's no


test_page doesn't use $mech->get_ok anymore, but rather calls $mech->get
and checks that the status is [23][0-9][0-9].  If it's 3xx, it also

It's not yet painful (or long-running) enough for me to care. :)  I'm



This is only visible with --long-tests -- there are links to line
numbers that don't exist (IOW the fragment doesn't exist in a name or id
attribute on the target page).  I've even seen links to #l0.  Bug fixes
welcome. ;-)

(There are also some other [mostly minor] issues marked with "TODO:" in
the test code; I didn't want to swamp the list with half a dozen bug

Thanks! :-)  I'll send v9 in (hopefully) 2-3 days, together with the
first working version of the caching code.

-- Lea
--

From: Jakub Narebski
Date: Sunday, June 29, 2008 - 4:56 pm

Thanks for the info.

By the way, some time ago I have send a patch (dropped, perhaps because
of it being feature freeze, or just lost) which converted support for
links with hash and without action (introduced in 7f9778b by Gerrit
Pape) to use redirect (like for 'object' action) instead of silently
filling correct action based on type of object (given by hash).  IMHO
it is better as it should prevent bookmarking "expensive" URL.


I wonder if those are intentional (or at least known) breaking, to form
approximate blame file history browsing; there was discussion about it
on git mailing list some time ago (it resulted in adding "parent" line

One issue of note, after brief peek at Git::Repo code: you should not
error out on unknown header in commit object, but either save its value
under its name, or just skip it.

Unless this has changed...

-- 
Jakub Narebski
Poland
--

From: Lea Wiemann
Date: Sunday, June 29, 2008 - 5:30 pm

Haha, I'd actually suggest the opposite. ;-)  Figuring out the right
action is almost free since you have to fetch the object anyways, so I
doubt it'll make any difference performance-wise (though it'd be
interesting to benchmark this).  However, gitweb's URLs are
prohibitively long -- so that nobody uses them in email --, and
(automatically?) dropping the action parameter where possible would be a
good first step to shortening them.  Another idea would be to shorten

*nods*  I wasn't following this in detail; if it turns out to be
unfixable, we could also remove the fragment checks for line numbers

Unless this can actually happen in practice, I'd rather die aggressively
-- it prevents errors (like cache hiccups) slipping through unnoticed.
--

From: Jakub Narebski
Date: Monday, June 30, 2008 - 2:55 pm

It is one fork more[*1*], which doesn't matter on operating systems
like Linux where forking is fairly cheap.

But perhaps avoiding redirect would be better solution... well, yet
another solution would be to leave it up to gitweb configuration ;-)

[*1*] Or pre-fetching object (together with its type) which might be

About shortening the hashes: _usually_ 8. characters is enough... but
sometimes it isn't.  What then?


If Git::Commit is to be *generic* object oriented interface to git
repository (and not only for gitweb), it has either to skip unknown
commit headers, or save them as-is.

As to checking cache: you can always check if header name is sane...
-- 
Jakub Narebski
Poland
--

From: Junio C Hamano
Date: Monday, June 23, 2008 - 9:20 pm

That sounds wrong, as the point of tests would be to make sure the stuff
you are going to install would work with what you thought will be used
from the system.

If "isn't available in the tests" is the problem, is it possible to make
it available?  We are passing down SHELL_PATH from primary Makefile to t/
and you should be able to do the same for Perl path...

About the Test::WWW:Mechanize::CGI thing, how widely available is it?  I
do not think it is packaged for Debian nor Ubuntu, for example.

--

From: Lea Wiemann
Date: Tuesday, June 24, 2008 - 1:37 am

Not very, you basically have to install it from CPAN.  If it's not 
installed, the only message you get from the test is:

ok 1: skipping gitweb tests, Test::WWW::Mechanize::CGI not found

Should optional test dependencies like Test::WWW::Mechnanize::CGI be 
documented in INSTALL?

Best,

     Lea
--

From: Jakub Narebski
Date: Tuesday, June 24, 2008 - 2:23 am

Not very widely; practically you have to install it from CPAN.  
Test::WWW::Mechanize::CGI is at v0.1, WWW::Mechanize::CGI is at v0.3.

But if Test::WWW::Mechanize::CGI is not installed, test would be not 
run.  If you are gitweb developer, then installing locally in $HOME 
from CPAN is I guess viable option; if you are not gitweb developer, we 
still have t/t9500-gitweb-standalone-no-errors.sh


The whole point of using those packages was that it makes it _easy_
to write those tests.  This consist of two parts: 

1). running gitweb as if it was CGI application, which otherwise would 
either require deep knowledge of how CGI application is invoked (what 
it is in WWW::Mechanize::CGI and required dependence 
HTTP::Request::AsCGI which I think does all the work) or working
web server, done a la "git instaweb",

2.) accessing and testing gitweb output (what Test::WWW::Mechanize,
WWW::Mechanize does; we could do the same with LWP* modules from 
libwww-perl / perl-libwww-perl package and Test::More from Perl, but
it would be repeating Test::WWW::Mechanize work).
-- 
Jakub Narebski
Poland
--

Previous thread: Re: Mercurial to git converter. by Nick Andrew on Saturday, June 14, 2008 - 3:23 am. (7 messages)

Next thread: Document clone of clone loosing branches? by Vaclav Hanzl on Saturday, June 14, 2008 - 6:05 am. (20 messages)