Re: More on git over HTTP POST

Previous thread: [PATCH] Add Pascal/Delphi (.pas file) funcname pattern. by Avery Pennarun on Friday, August 1, 2008 - 2:00 pm. (1 message)

Next thread: extracting to/cc addresses for stg mail by Bjorn Helgaas on Friday, August 1, 2008 - 3:50 pm. (3 messages)
From: H. Peter Anvin
Date: Friday, August 1, 2008 - 2:50 pm

Hi all,

I have investigated a bit what it would take to support git protocol 
(smart transport) over HTTP POST transactions.

The current proxy system is broken, for a very simple reason: it doesn't 
convey information about when the channel should be turned around.

HTTP POST (or, for that matter, any RPC-style transport) is a half-duplex 
transport: only one direction can be active at a time, after 
which the channel has to be explicitly turned around.  The "turning 
around" consists of posting the queued transaction and listening for the 
reply.

Ultimately, it comes down to the following: the transactor needs to be 
given explicit information when the git protocol goes from writing to 
reading (the information for the opposite direction is obvious).  I was 
hoping that it would be possible to get this information by snooping the 
protocol, but we don't seem to be that lucky.

I started to hack on a variant which would embed a VFS-style interface 
in git itself, looking something like:

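#include <stdint.h>	/* intptr_t */
#include <sys/types.h>	/* ssize_t, size_t */

struct transactor;

/* Operations a transport backend provides in place of raw fd I/O. */
struct transact_ops {
	ssize_t (*read)(struct transactor *, void *, size_t);
	ssize_t (*write)(struct transactor *, const void *, size_t);
	int (*close)(struct transactor *);
};

struct transactor {
	union {
		void *p;	/* backend-private state */
		intptr_t i;	/* e.g. a plain file descriptor */
	} u;
	const struct transact_ops *ops;
};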

Replacing the usual fd operations with this interface would allow a 
different transactor to see the phase changes explicitly; the default 
transactor, which simply wraps xread() and xwrite(), is obvious.
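A minimal sketch of two transactors behind this interface. The fd_* backend is the obvious full-duplex case; POSIX read()/write() stand in for git's xread()/xwrite() so the sketch compiles on its own. The rpc_* backend is hypothetical: it queues writes and treats the first read after a write as the signal to turn the channel around, and rpc_post() merely echoes the request back in place of a real HTTP POST.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

struct transactor;

struct transact_ops {
	ssize_t (*read)(struct transactor *, void *, size_t);
	ssize_t (*write)(struct transactor *, const void *, size_t);
	int (*close)(struct transactor *);
};

struct transactor {
	union {
		void *p;	/* backend state (rpc backend) */
		intptr_t i;	/* file descriptor (fd backend) */
	} u;
	const struct transact_ops *ops;
};

/* Full-duplex backend: plain fd I/O (git would use xread()/xwrite()). */
static ssize_t fd_read(struct transactor *t, void *buf, size_t len)
{
	return read((int)t->u.i, buf, len);
}

static ssize_t fd_write(struct transactor *t, const void *buf, size_t len)
{
	return write((int)t->u.i, buf, len);
}

static int fd_close(struct transactor *t)
{
	return close((int)t->u.i);
}

static const struct transact_ops fd_ops = { fd_read, fd_write, fd_close };

/* Hypothetical half-duplex backend: one queued request at a time. */
struct rpc_state {
	char out[4096], in[4096];
	size_t out_len, in_len, in_pos;
	int turnarounds;	/* how often the channel was turned around */
};

/* Stand-in for "POST the queued request, then read the reply";
 * this toy server just echoes the request body back. */
static void rpc_post(struct rpc_state *s)
{
	memcpy(s->in, s->out, s->out_len);
	s->in_len = s->out_len;
	s->in_pos = 0;
	s->out_len = 0;
	s->turnarounds++;
}

static ssize_t rpc_write(struct transactor *t, const void *buf, size_t len)
{
	struct rpc_state *s = t->u.p;

	if (len > sizeof(s->out) - s->out_len)
		return -1;	/* request too large for this sketch */
	memcpy(s->out + s->out_len, buf, len);
	s->out_len += len;
	return (ssize_t)len;
}

static ssize_t rpc_read(struct transactor *t, void *buf, size_t len)
{
	struct rpc_state *s = t->u.p;
	size_t avail;

	if (s->out_len)	/* write -> read phase change: turn around */
		rpc_post(s);
	avail = s->in_len - s->in_pos;
	if (len > avail)
		len = avail;
	memcpy(buf, s->in + s->in_pos, len);
	s->in_pos += len;
	return (ssize_t)len;
}

static int rpc_close(struct transactor *t)
{
	(void)t;
	return 0;
}

static const struct transact_ops rpc_ops = { rpc_read, rpc_write, rpc_close };
```

Callers only ever go through t->ops, so protocol code written against this interface cannot tell which backend it is talking to; only the rpc backend cares about the write-to-read transition.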

Of course, I started hacking on it and found myself with zero time to 
continue, but I thought I'd post what I had come up with.

	-hpa
--

From: Shawn O. Pearce
Date: Saturday, August 2, 2008 - 1:57 pm

I have started to think about this more myself, not just for POST
but also for some form of GET that can return an efficient pack,
rather than making the client walk the object chains itself.

Have you looked at the Mercurial wire protocol?  It runs over HTTP
and uses a relatively efficient means of deciding where to cut the
transfer at.

  http://www.selenic.com/mercurial/wiki/index.cgi/WireProtocol

Most of their smarts are in the branches() and between() operations.

Unfortunately this documentation isn't very complete and/or there
are some simplifications that the Mercurial team took due to their
repository format not initially supporting multiple branches like

Well, over git:// (or any protocol that wraps git:// like ssh)
we assume a full-duplex channel.  Some proxy systems are able to

No, the git:// protocol implementation in fetch-pack/upload-pack
runs more efficiently than that by keeping a sliding window of stuff
that is in flight.  It's, I guess, two async RPCs running in parallel,
but from the client and server perspective both RPCs go into the
same computation.

HTTP POST is actually trivial if you don't want to support the new
tell-me-more extension that was added to git-push.  Hell, I could
write the CGI in a few minutes I think.  It's really just a small
wrapper around git-receive-pack.

What's a bitch is the efficient fetch, and getting tell-me-more to
work on push.

-- 
Shawn.
--

From: Daniel Stenberg
Date: Saturday, August 2, 2008 - 2:00 pm

Yes it does. The CONNECT method is used to get a full-duplex channel to a 
remote site through an HTTP proxy. The downside is, of course, that 
most proxies are set up to disallow CONNECT to any port other than 443 
(the HTTPS default port).

-- 

  / daniel.haxx.se
--

From: Shawn O. Pearce
Date: Saturday, August 2, 2008 - 2:08 pm

Ah, yes.  CONNECT.  Very few servers wind up supporting it I think.

I know one very big company who cannot use or support Git because
Git over HTTP is too slow to be useful.  They support other tools
like Subversion instead.  :-|

Really we just need smart protocol support in half-duplex RPC like
hpa was going after.  Then it doesn't matter what we serialize it
into, almost any RPC system will be useful.  Of course the only
one that probably matters in practice is HTTP.

-- 
Shawn.
--

From: Petr Baudis
Date: Saturday, August 2, 2008 - 2:23 pm

On what projects? I'm currently using Git over HTTP (read-only) a lot
and it doesn't seem really all that impractical to me. Maybe just using
a more dumb-friendly packing scheme could help a lot?

				Petr "Pasky" Baudis
--

From: Shawn O. Pearce
Date: Saturday, August 2, 2008 - 2:32 pm

They tested by taking the SVN source code and importing it into
both Git and Hg, then cloned them both over a WAN link.  Git was
22x slower.  I suspect they didn't pack the Git repository at all,
so Git had to issue thousands of HTTP GET requests for the loose
objects.  But I also suspect there was bias in the testing so they
didn't realize they needed to repack, and didn't care to find out.

I've probably already said too much.  I'm under NDAs.

But anyway.  The point I was trying to make was that there are
not just some proxy servers, but also some server platforms, that
cannot handle bidirectional communication.  E.g. servers that are
behind reverse proxies, where the reverse proxy is acting as a sort
of firewall or content cache accelerator.

-- 
Shawn.
--

From: Shawn O. Pearce
Date: Saturday, August 2, 2008 - 7:56 pm

So I have this draft of how smart push might work.  It's slated
for the Documentation/technical directory.  Thus far I have only
written about push support, but Ilari on #git has some ideas about
how to do a smart fetch protocol.

Implementation-wise, in C git I think this is just a new C
program (git-http-backend?) that turns around and proxies
into git-receive-pack, at least for the push support.

What I don't know is how we could configure URI translation from
/path/to/repository.git received out of the $PATH_INFO in the
CGI environment to a physical directory.  Should we rely on the
server's $PATH_TRANSLATED?


Smart HTTP transfer protocols
=============================

Git supports two HTTP-based transfer protocols: a "dumb" protocol,
which requires only a standard HTTP server on the server end of the
connection, and a "smart" protocol, which requires a Git-aware CGI
(or server module).  This document describes the "smart" protocol.

Authentication
--------------

Standard HTTP authentication is used, and must be configured and
enforced by the HTTP server software.

Chunked Transfer Encoding
-------------------------

For performance reasons the HTTP/1.1 chunked transfer encoding is
used frequently to transfer variable length objects.  This avoids
needing to produce large results in memory to compute the proper
content-length.
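For reference, a minimal sketch of how one chunk is framed on the wire (hex size, CRLF, data, CRLF), with a zero-size chunk terminating the body. format_chunk() is a hypothetical helper, not code from git or from any HTTP library.

```c
#include <stdio.h>
#include <string.h>

/* Format one HTTP/1.1 chunk, "<size in hex>\r\n<data>\r\n", into out.
 * Calling it with len == 0 (and data == "") yields "0\r\n\r\n", the
 * last-chunk marker that ends a chunked body when no trailers follow.
 * Returns the bytes written (excluding the NUL), or -1 if out is too small. */
static int format_chunk(char *out, size_t outsz, const char *data, size_t len)
{
	int hdr = snprintf(out, outsz, "%zx\r\n", len);

	if (hdr < 0 || (size_t)hdr + len + 2 >= outsz)
		return -1;
	memcpy(out + hdr, data, len);
	memcpy(out + hdr + len, "\r\n", 2);
	out[hdr + len + 2] = '\0';
	return hdr + (int)len + 2;
}
```

Because each chunk carries its own size, the sender can stream a pack of unknown length without first buffering it to compute Content-Length.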

Detecting Smart Servers
-----------------------

HTTP clients can detect a smart Git-aware server by sending the
show-ref request (below) to the server.  If the response has a
status of 200 and the magic application/x-git-refs content type
then the server can be assumed to be a smart Git-aware server.

If any other response is received the client must assume dumb
protocol support, as the server did not correctly respond to
the request.


Show Refs
---------

Obtains the available refs from the remote repository.  The response
is a sequence of git "packet lines", one per ref, and a final flush
packet line to indicate the end ...
From: Junio C Hamano
Date: Saturday, August 2, 2008 - 8:27 pm

As for the initial protocol exchange request, I suspect that you would
regret it if you did not leave room for some "capability advertisement"
in this exchange.

With the git native protocol, we luckily found space to do so after the
ref payload (because pkt-line is "length + payload" format but the code
that reads payload happened to ignore anything after NUL).  You would want
to define how these are given by the server to the client over HTTP
channel.  For example, putting them on extra HTTP headers is probably Ok.
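A minimal sketch of the trick described above, assuming the native protocol's layout for the first advertised ref, "<sha1> SP <refname> NUL <capabilities>", with capabilities separated by spaces. split_ref_caps() and has_capability() are hypothetical helpers, and the code assumes the payload buffer is NUL-terminated after its last byte.

```c
#include <string.h>

/* Split the payload of the first ref advertisement line,
 *   "<40-hex sha1> <refname>\0<cap> <cap> ...\n",
 * into refname and capability list.  refname is NUL-terminated only when
 * the embedded NUL is present.  Returns 1 if capabilities were found. */
static int split_ref_caps(const char *payload, size_t len,
			  const char **refname, const char **caps)
{
	const char *nul = memchr(payload, '\0', len);
	size_t head = nul ? (size_t)(nul - payload) : len;
	const char *sp = memchr(payload, ' ', head);

	*refname = sp ? sp + 1 : NULL;
	*caps = nul ? nul + 1 : NULL;
	return nul != NULL;
}

/* Check whether a space-separated capability list contains name exactly. */
static int has_capability(const char *caps, const char *name)
{
	size_t n = strlen(name);

	while (caps && *caps) {
		size_t tok = strcspn(caps, " \n");

		if (tok == n && !strncmp(caps, name, n))
			return 1;
		caps += tok;
		caps += strspn(caps, " \n");	/* skip separators */
	}
	return 0;
}
```

An older reader that treats the payload as a C string stops at the NUL and sees only the sha1 and refname, which is what made this extension backward compatible.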
--

From: Shawn O. Pearce
Date: Saturday, August 2, 2008 - 8:31 pm

Yea, I thought that the HTTP headers would be more than enough
space to add capability advertisements.  Most client libraries
will happily parse and store these for the application, and won't
make a fuss if the application doesn't read them.

Hence there's more than enough room in the protocol to extend it
in the future with additional capabilities.

We do have to be careful though.  Any cacheable resource must only
rely upon the URI and the standard headers which compute into the
cache key for a request.  There aren't many, though I think the
Content-Type header may be among them.

-- 
Shawn.
--

From: H. Peter Anvin
Date: Saturday, August 2, 2008 - 8:47 pm

I think that would be a mistake, just because it's one more thing for 
proxies to screw up on.  It's better to have negotiation information in 
the payload, before the "real" data.

Obviously one thing that needs to be included in each transaction is a 
transaction ID that will be reported back on the next transaction, since 
you can't rely on a persistent connection.

	-hpa
--

From: Shawn O. Pearce
Date: Saturday, August 2, 2008 - 9:10 pm

I didn't realize we were in an era of proxies that are that
brain-damaged that they cannot relay the other headers.  The Amazon
S3 service relies heavily upon their own extended headers to make
their REST API work.  If proxies stripped that stuff out then the
client wouldn't work at all.



No.  That requires the server to maintain state.  We don't want to
do that if we can avoid it.  I would much rather have the clients
handle the state management as it simplifies the server side,
especially when you start talking about reverse proxies and/or
load-balancers running in front of the server farm.

-- 
Shawn.
--

From: david
Date: Sunday, August 3, 2008 - 1:10 am

actually, it's not just a matter of not getting 'past this dark age of the 
Internet', it's an issue that so many people are tunneling _everything_ 
over http (including the bad guys tunneling malware) that proxies are 
getting more aggressive than they have ever been before in pulling apart 
the payload and analysing it before letting it get through to the far 
side.

--

From: H. Peter Anvin
Date: Sunday, August 3, 2008 - 4:42 am

... which, of course, is happening because of said proxies, too.

There are too many idiots out there building "security software" and 
running IT departments, that's really the bottom line.

By the way, I want to say *thank you* to Shawn for tackling this 
project: this has been a major issue for kernel.org, and getting 
something like this deployed would be incredibly helpful.

	-hpa

--

From: H. Peter Anvin
Date: Sunday, August 3, 2008 - 4:29 am

If we were, there wouldn't be a need for this project at all.  The whole 
purpose of it is to deal with corporate proxies that try to prevent 
actual communication because of "security", and it's really hard to 
predict what utterly arbitrary heuristics they have applied.

	-hpa

--

From: H. Peter Anvin
Date: Saturday, August 2, 2008 - 8:51 pm

Note: you cannot rely on HTTP/1.1 being supported by an intermediate 
proxy; you might have to handle HTTP/1.0, where the data is terminated 
by connection close.

	-hpa
--

From: Shawn O. Pearce
Date: Saturday, August 2, 2008 - 9:12 pm

Well, that proxy is going to be crying when we upload a 120M pack
during a push to it, and it buffers the damn thing to figure out
the proper Content-Length so it can convert an HTTP/1.1 client
request into an HTTP/1.0 request to forward to the server.  That's
just _stupid_.

But from the client-side perspective, the chunked transfer encoding
is used only to avoid generating the content in advance to compute the
Content-Length header.  I fully expect the encoding to disappear
(e.g. in a proxy, or in the HTTP client library) before any sort
of Git code gets its fingers on the data.

Hence to your other remark, I _do not_ rely upon the encoding
boundaries to remain intact.  That is why there are Git pkt-line
encodings inside of the HTTP data stream.  We can rely on the
pkt-line encoding being present, even if the HTTP chunks were
moved around (or removed entirely) by a proxy.
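To illustrate why pkt-line framing survives whatever a proxy does to the HTTP chunks, a minimal sketch of a reader that only looks at its own buffered bytes, so it yields the same packets whether the body arrived as one chunk, many, or de-chunked. The 4-hex-digit length (which counts the 4 length bytes themselves) and the "0000" flush packet follow git's pkt-line format; pkt_reader and its functions are hypothetical, and the fixed 64 KiB buffer is a simplifying assumption.

```c
#include <stddef.h>
#include <string.h>

struct pkt_reader {
	char buf[65536];	/* large enough for one maximal pkt-line */
	size_t len;
};

enum { PKT_NEED_MORE = -1, PKT_FLUSH = -2, PKT_ERROR = -3 };

/* Append bytes exactly as they arrived from the HTTP layer. */
static int pkt_feed(struct pkt_reader *r, const char *data, size_t n)
{
	if (n > sizeof(r->buf) - r->len)
		return PKT_ERROR;
	memcpy(r->buf + r->len, data, n);
	r->len += n;
	return 0;
}

static int hex_digit(char c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/* Drop the first n buffered bytes. */
static void pkt_consume(struct pkt_reader *r, size_t n)
{
	memmove(r->buf, r->buf + n, r->len - n);
	r->len -= n;
}

/* Extract the next complete pkt-line payload into out.  Returns its length,
 * PKT_NEED_MORE if the packet is not fully buffered yet, PKT_FLUSH for
 * "0000", or PKT_ERROR on malformed input. */
static int pkt_next(struct pkt_reader *r, char *out, size_t outsz)
{
	int i, size = 0;

	if (r->len < 4)
		return PKT_NEED_MORE;
	for (i = 0; i < 4; i++) {
		int v = hex_digit(r->buf[i]);

		if (v < 0)
			return PKT_ERROR;
		size = size * 16 + v;
	}
	if (size == 0) {
		pkt_consume(r, 4);
		return PKT_FLUSH;
	}
	if (size < 4 || (size_t)(size - 4) > outsz)
		return PKT_ERROR;
	if (r->len < (size_t)size)
		return PKT_NEED_MORE;
	memcpy(out, r->buf + 4, size - 4);
	pkt_consume(r, size);
	return size - 4;
}
```

Feeding "0009hello0000" in pieces such as "00", "09hel", "lo0000" produces the same result as feeding it whole: one 5-byte payload "hello", then a flush.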

-- 
Shawn.
--

From: H. Peter Anvin
Date: Sunday, August 3, 2008 - 4:31 am

Excellent.  I did not mean that as criticism, obviously; I just wanted 
that to be clear.

HTTP/1.1 does chunked encoding, and HTTP/1.0 does terminate on 
connection close; both serve the same purpose.

	-hpa
--

From: H. Peter Anvin
Date: Saturday, August 2, 2008 - 9:01 pm

One more thing about chunked transfer encodings: you cannot assume that 
a proxy will maintain chunk boundaries, any more than you can assume 


I really think it would make more sense to use POST requests for 
everything, and have the command part of the POSTed payload.  Putting 
stuff in the URL just complicates the namespace to the detriment of the 

Transfer-Encoding: chunked is illegal with an HTTP/1.0 client.

	-hpa
--

From: Mike Hommey
Date: Saturday, August 2, 2008 - 11:43 pm

If you want, I have a patch series that introduces a small API to make
HTTP requests easier to make.

Mike
--

From: Shawn O. Pearce
Date: Sunday, August 3, 2008 - 12:25 am

The new --report-status flag forces the status report feature of
the push protocol to be enabled.  This can be useful in a CGI
program that implements the server side of a "smart" Git-aware
HTTP transport.  The CGI code can perform the selection of the
feature and ask receive-pack to enable it automatically.

The new --no-advertise-heads causes receive-pack to bypass its usual
display of known refs to the client, and instead immediately start
reading the commands and pack from stdin.  This is useful in a CGI
situation where we want to hand off all input to receive-pack.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 receive-pack.c |   19 ++++++++++++++-----
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/receive-pack.c b/receive-pack.c
index d44c19e..512eae6 100644
--- a/receive-pack.c
+++ b/receive-pack.c
@@ -464,6 +464,7 @@ static int delete_only(struct command *cmd)
 
 int main(int argc, char **argv)
 {
+	int advertise_heads = 1;
 	int i;
 	char *dir = NULL;
 
@@ -472,7 +473,15 @@ int main(int argc, char **argv)
 		char *arg = *argv++;
 
 		if (*arg == '-') {
-			/* Do flag handling here */
+			if (!strcmp(arg, "--report-status")) {
+				report_status = 1;
+				continue;
+			}
+			if (!strcmp(arg, "--no-advertise-heads")) {
+				advertise_heads = 0;
+				continue;
+			}
+
 			usage(receive_pack_usage);
 		}
 		if (dir)
@@ -497,10 +506,10 @@ int main(int argc, char **argv)
 	else if (0 <= receive_unpack_limit)
 		unpack_limit = receive_unpack_limit;
 
-	write_head_info();
-
-	/* EOF */
-	packet_flush(1);
+	if (advertise_heads) {
+		write_head_info();
+		packet_flush(1);
+	}
 
 	read_head_info();
 	if (commands) {
-- 
1.6.0.rc1.221.g9ae23

--

From: Shawn O. Pearce
Date: Sunday, August 3, 2008 - 12:25 am

This CGI can be loaded into an Apache server using ScriptAlias,
such as with the following configuration:

  LoadModule cgi_module /usr/libexec/apache2/mod_cgi.so
  LoadModule alias_module /usr/libexec/apache2/mod_alias.so
  ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/

Repositories are accessed via the translated PATH_INFO.

The CGI is backwards compatible with the dumb client, allowing the
client to detect the server's smarts by looking at the content-type
returned from "GET /repo.git/info/refs".  If the returned content
type is the magic application/x-git-refs type then the client can
assume the server is Git-aware.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .gitignore                                |    1 +
 Documentation/technical/http-protocol.txt |   88 +++++++++
 Makefile                                  |    1 +
 http-backend.c                            |  302 +++++++++++++++++++++++++++++
 4 files changed, 392 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/technical/http-protocol.txt
 create mode 100644 http-backend.c

diff --git a/.gitignore b/.gitignore
index a213e8e..02eaf3a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -51,6 +51,7 @@ git-gc
 git-get-tar-commit-id
 git-grep
 git-hash-object
+git-http-backend
 git-http-fetch
 git-http-push
 git-imap-send
diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
new file mode 100644
index 0000000..6cb96f3
--- /dev/null
+++ b/Documentation/technical/http-protocol.txt
@@ -0,0 +1,88 @@
+Smart HTTP transfer protocols
+=============================
+
+Git supports two HTTP-based transfer protocols: a "dumb" protocol,
+which requires only a standard HTTP server on the server end of the
+connection, and a "smart" protocol, which requires a Git-aware CGI
+(or server module).  This document describes the "smart" protocol.
+
+As a design feature smart servers automatically degrade to the
+dumb protocol when speaking with a ...
From: H. Peter Anvin
Date: Sunday, August 3, 2008 - 4:38 am

Maybe I am slightly confused, but I thought handling HTTP chunking for 
HTTP/1.1+ clients was usually done by Apache above the level of the CGI 
script?

	-hpa
--

From: Shawn O. Pearce
Date: Sunday, August 3, 2008 - 2:25 pm

You may be right.  Apache undoes the chunking during a POST before
feeding the data to the CGI script.  If we can omit this mess of
code from git-http-backend that's a good thing.

Thanks for the sanity check.

-- 
Shawn.
--

From: Junio C Hamano
Date: Sunday, August 3, 2008 - 3:16 pm

I very much like it.

But could you be a bit more explicit than application/x-git-refs magic?  I
suspect very strongly that clueless server operators would advertise the
type on repositories statically hosted there, and would defeat the point
of your patch.

We are not changing update-server-info, so if we can find a place in
its output to hide the "magic", it would be much more robust.

Perhaps a "#" comment line in info/refs that is ignored on the reading side
but that update-server-info never generates on its own?

Or perhaps sort the output differently from how update-server-info
produces it, so that older clients would not care but a magic-aware
client could notice?



--

From: Shawn O. Pearce
Date: Sunday, August 3, 2008 - 8:59 pm

This is a very valid concern.  I started to worry about it myself
last night, but decided it was late enough and just wanted to start

This is a good idea.  I think anyone who consumes info/refs does
so with the understanding that "#" comment lines exist, and should
be skipped, but this is not something that has been heavily tested
in the wild yet.
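A sketch of a dumb-protocol info/refs reader that tolerates the proposed "#" lines, assuming the usual "<sha1> TAB <refname>" line format. parse_info_refs_line() is hypothetical, and whether existing readers really skip such lines is exactly the open question here; any text after the "#" is likewise unspecified.

```c
#include <string.h>

/* Parse one info/refs line, "<40-hex sha1>\t<refname>\n".
 * Returns 1 and fills sha1/refname for a ref line, 0 for a line to skip
 * ("#" comment or empty), -1 for a malformed line. */
static int parse_info_refs_line(const char *line, char sha1[41],
				char *refname, size_t refsz)
{
	const char *tab, *end;
	size_t n;

	if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
		return 0;
	tab = strchr(line, '\t');
	if (!tab || tab - line != 40)
		return -1;
	end = tab + 1 + strcspn(tab + 1, "\n");
	n = (size_t)(end - (tab + 1));
	if (n == 0 || n >= refsz)
		return -1;
	memcpy(sha1, line, 40);
	sha1[40] = '\0';
	memcpy(refname, tab + 1, n);
	refname[n] = '\0';
	return 1;
}
```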

My concern here goes back to the remark you made above. What if a
server owner mirrors a smart server by a non-Git aware device like
wget?  They will now have a copy of the info/refs content which will
suggest we have Git smarts on the backend, but really it isn't there.

Perhaps the smart server detection is something like:

	Smart Server Detection
	----------------------

	To detect a smart (Git-aware) server a client sends an
	empty POST request to info/refs; if a 200 OK response is
	received with the proper content type then the server can
	be assumed to be Git-aware, and the result contains the
	current info/refs data for that repository.

		C: POST /repository.git/info/refs HTTP/1.0
		C: Content-Length: 0

		S: HTTP/1.0 200 OK
		S: Content-Type: application/x-git-refs
		S:
		S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31	refs/heads/maint

Then clients should just attempt this POST first before issuing
a GET info/refs.  Non-Git-aware servers will return an error code;
the client can then retry with a standard GET request and assume
the server isn't a newer-style one.
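A minimal sketch of the client-side decision in this proposal: only a 200 with the magic content type means "smart"; anything else, including the 404 or 405 a plain web server may return, means falling back to GET info/refs. classify_server() is a hypothetical helper, and accepting media-type parameters such as "; charset=..." is an assumption.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

enum server_kind { SERVER_DUMB, SERVER_SMART };

/* Classify a server from its response to an empty
 * "POST /repository.git/info/refs".  content_type may be NULL
 * if the response carried no Content-Type header. */
static enum server_kind classify_server(int status, const char *content_type)
{
	static const char magic[] = "application/x-git-refs";
	size_t n = sizeof(magic) - 1;
	char next;

	if (status != 200 || !content_type)
		return SERVER_DUMB;	/* e.g. 404/405: retry with GET */
	if (strncasecmp(content_type, magic, n) != 0)
		return SERVER_DUMB;	/* media types compare case-insensitively */
	next = content_type[n];
	if (next != '\0' && next != ';' && next != ' ')
		return SERVER_DUMB;	/* e.g. "application/x-git-refsx" */
	return SERVER_SMART;
}
```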

-- 
Shawn.
--

From: Rogan Dawes
Date: Monday, August 4, 2008 - 2:53 am

I don't understand why you would want to keep the commands in the URL 
when you are doing a POST?

How about something like:

	C: POST /repository.git/ HTTP/1.0
	C: Content-Length: <calculated>
	C:
	C: <whatever command you want>

A dumb server will respond with:

	S: HTTP/1.1 405 Method not allowed

(expected according to the RFC)

Or

	S: HTTP/1.1 404 Not Found

(resulting from testing against my own repo :-) )

While a smart server will respond with a "200 OK" and the results of the 
command.

Also, if everything is done via POST, you don't have to worry about a 
wget-cloned server appearing to be "smart", since no "smarts" will ever 
be returned in response to a GET request (and to the best of my 
knowledge, wget can't mirror using POST).

Rogan
--

From: Johannes Schindelin
Date: Monday, August 4, 2008 - 3:08 am

Hi,


Caching.

Hth,
Dscho
--

From: Rogan Dawes
Date: Monday, August 4, 2008 - 3:14 am

If you are expecting something to be cacheable, then should you not be 
using a GET anyway?


This doesn't seem negotiable to me.

Unless I am misunderstanding your "Caching" comment to mean "To enable 
caching", as opposed to "To prevent caching"?

Rogan
--

From: Johannes Schindelin
Date: Monday, August 4, 2008 - 3:26 am

Hi,


Yes.

And I think the wget thing is not an issue: we should not try to prevent 
every single idiocy.

Ciao,
Dscho
--

From: Shawn O. Pearce
Date: Monday, August 4, 2008 - 7:48 am

Well, as Dscho pointed out this partly has to do with caching and
the transparent dumb server functionality.  By using the command in
the URL, and having the command match that of the dumb server file,
it's easier to emulate a dumb server and also to permit caching.

Currently git-http-backend requests no caching for info/refs, but
I could see us tweaking that to permit several minutes of caching,
especially on big public sites like kernel.org.  Having info/refs
report stale by 5 minutes is not an issue when writes to there
already have a lag due to the master-slave mirroring system in use.
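A sketch of the caching policy described above, assuming a hypothetical cache_control() helper in the backend: a GET of info/refs may be cached for max_age seconds (300 would match the 5-minute example), while everything else is marked no-cache. This is illustrative only, not what git-http-backend currently does.

```c
#include <stdio.h>
#include <string.h>

/* Choose the Cache-Control value for one response; the caller would emit
 * it as "Cache-Control: <value>" among the CGI response headers. */
static void cache_control(char *out, size_t outsz, const char *method,
			  const char *command, int max_age)
{
	if (!strcmp(method, "GET") && !strcmp(command, "info/refs") && max_age > 0)
		snprintf(out, outsz, "max-age=%d", max_age);
	else
		snprintf(out, outsz, "no-cache");
}
```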

Because git-http-backend emulates a dumb server there is a command
dispatch table based upon the URL submitted.  Thus we already have
the command dispatch behavior implemented in the URL and doing it

I think we fixed the wget-cloned server issue by requesting
that clients use POST /info/refs to identify a smart server.
A wget-cloned repository will fail on this, and the client can
fall back to GET /info/refs and assume it must use the object
walker to fetch (or WebDAV to push).  A smart server would
respond to the POST /info/refs request correctly and the
client would know it's smart.

-- 
Shawn.
--

From: Rogan Dawes
Date: Monday, August 4, 2008 - 8:45 am

Fair enough, but what about the quote from RFC2616 that I posted in 
rebuttal to Dscho?

 > 13.10 Invalidation After Updates or Deletions
 >
 > ...
 >
 > Some HTTP methods MUST cause a cache to invalidate an entity. This is
 > either the entity referred to by the Request-URI, or by the Location
 > or Content-Location headers (if present). These methods are:
 >
 >       - PUT
 >       - DELETE
 >       - POST

This doesn't seem negotiable to me.

For those resources that are expected to be cacheable, the request 

Not by a huge amount, surely?

if (!strcmp(method, "GET")) command = ...;
else if (!strcmp(method, "POST")) command = ...;
dispatch(command);

Rogan
--

From: Shawn O. Pearce
Date: Monday, August 4, 2008 - 8:59 am

That's exactly what we are doing.  Where caching is reasonable we are
using a GET request.  Where caching cannot be performed as the server
state is changing (e.g. actually updating refs) we are using POST.
That is entirely within the guidelines of the RFC.

However we are "abusing" POST for "POST /info/refs" to detect a
Git-aware HTTP server.  Sending POST to a static resource should

Well, true, we could do that.  But then we have to break the
command name out of the input stream.  In some cases we may just be
exec'ing another Git process and letting it handle the input stream.
Shoving the command name into the start of it just makes it that
much harder to parse out.

We already have to handle splitting PATH_TRANSLATED into a pair of
(GIT_DIR, command) so we can handle that for a GET.  We might as
well just use that very same code for POST to select the command.
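A sketch of the (GIT_DIR, command) split, assuming the backend matches a fixed set of command suffixes at the end of PATH_TRANSLATED. The table contains only "info/refs" and "receive-pack", two names that appear in this thread; it and split_git_path() are hypothetical, not git-http-backend's actual code.

```c
#include <string.h>

/* Hypothetical table of commands recognized at the end of a path. */
static const char *const commands[] = { "info/refs", "receive-pack", NULL };

/* Split "/srv/git/linux.git/info/refs" into git_dir "/srv/git/linux.git"
 * and command "info/refs".  Returns 0 on success, -1 if no known command
 * matches (a real backend might then serve the path as a dumb file). */
static int split_git_path(const char *path, char *git_dir, size_t dirsz,
			  const char **command)
{
	size_t plen = strlen(path);
	int i;

	for (i = 0; commands[i]; i++) {
		size_t clen = strlen(commands[i]);
		size_t dlen;

		/* need a non-empty dir, a '/' separator, then the command */
		if (plen <= clen + 1 || path[plen - clen - 1] != '/' ||
		    strcmp(path + plen - clen, commands[i]))
			continue;
		dlen = plen - clen - 1;
		if (dlen >= dirsz)
			return -1;
		memcpy(git_dir, path, dlen);
		git_dir[dlen] = '\0';
		*command = commands[i];
		return 0;
	}
	return -1;
}
```

The same function serves GET and POST; only the per-command handler needs to care which method was used.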

Besides, by placing the command name into the URL server admins can
use regex filters in their configurations to control access.  If we
shove the command name into the body of a POST they cannot do this.

I can see sites wanting to offer anonymous smart fetch, but require
password protected smart push on the same repository URL.  Slapping
a directive like:

	<Location ~ ^/git/.*/receive-pack$>
		require valid-user
		...
	</Location>

would easily make Apache implement this for us.  Most modern HTTP
servers should be able to be configured like this.

One of the problems with these RPC-in-HTTP systems is always the
fact that the true nature of the action isn't visible in the method
and URL, causing servers and proxies to have to parse the stream to
implement firewall rules.  Or to provide access control.  I'm trying
to reuse as much of the access control support as possible from the
HTTP server and put as little of it as possible into the backend CGI.

Since the backend CGI is based upon git-receive-pack itself admins
can use the standard pre-receive/update hook pair to manage branch
level security in a ...
From: Rogan Dawes
Date: Monday, August 4, 2008 - 9:18 am

Aha. So now I see the objective. I had misunderstood the intention to be 

Right. Either with a "405 Method not supported", or a "404 Not found". 


Works for me!

Thanks for doing all the hard thinking for this feature :-)

Rogan
--

From: H. Peter Anvin
Date: Monday, August 4, 2008 - 6:03 pm

Let's put it this way: we're not seeing a huge amount of load from git 
protocol requests, and I'm going to assume "git+http" protocol to be 
used only by sites behind braindamaged firewalls (everyone else would 
use git protocol), so I'm not really all that worried about it.

I'm not sure if "emulating a dumb server" is desirable at all; it seems 
like it would at least in part defeat the purpose of minimizing the 
transaction count and otherwise be as much of a "smart" server as the 
medium permits.

	-hpa
--

From: Shawn O. Pearce
Date: Monday, August 4, 2008 - 6:24 pm

Agreed.  There's another application I want git+http for, but that
may never materialize.  Or maybe it will someday.  I just have to

I think it is a really good idea.  Then clients don't have to worry
about which HTTP URL is the "correct" one for them to be using.
End users will just magically get the smart git+http variant if
both sides support it and they need to use HTTP due to firewalls.
Clients will fall back onto the dumb protocol if the server doesn't
support smart clones.  Older clients (pre git+http) will still be
able to talk to a smart server, just slower.  This is nice for the
end user.  No thinking is required.

Never ask a human to do what a machine can do in less time.

I think it's just one extra HTTP hit per fetch/push done against
a dumb server.  On a smart server that first hit will also give
us what we need to begin the conversation (the info/refs data).
On a dumb server it's a wasted hit, but a dumb server is already
going to suck.  One extra HTTP request against a dumb server is a
drop in the bucket.  It's also a pretty small request (an empty POST).

-- 
Shawn.
--

From: H. Peter Anvin
Date: Monday, August 4, 2008 - 6:35 pm

Not arguing that URL compatibility isn't a good thing, but there are 
other ways to accomplish it, too.  After detecting either a smart or 
dumb server, we can use a redirect to point them to a different URL, as 
appropriate.

Furthermore, in the case of round-robin sites like kernel.org, this is 
actually *mandatory* in the case of a stateful server (we need a 
redirect to a server-specific URL), and highly recommended in the case 
of a stateless server (because of potential skew.)

	-hpa
--

From: Shawn O. Pearce
Date: Monday, August 4, 2008 - 6:57 pm

I'm not sure this is necessary.

Of course it all comes down to "how does an admin map Git repositories
into the URL space of the server"?

I thought it would be simple if the admin was able to map
repositories using a ScriptAlias and allow the server to perform
path info translation to give us the filesystem location of the
repository.  Then we don't have to configure our own map of the
available Git repositories.

Once you do that though you now have the URL space associated with
that repository served by a CGI.  For older clients we need to
either serve them the file, or issue a redirect to serve the file.
The redirect is messy because we need some configuration to explain
where the files are available in the server's URL space.

Or you go the other way, and have newer git+http clients try to
find the git aware server by a redirect.  Again we have to explain
where that git aware server is in the URL space of the server.


Well, the git+http protocol will hold all state in the client, making
each RPC a stateless RPC operation.  The only issue is then dealing with
skew in a server farm.

I guess we need to ask client implementations to honor a redirect
on the first request and reuse that new base URL for all subsequent
requests that are part of the same "operation".  Then server farms
can issue a redirect to a server-specific hostname if a client
comes in with a round-robin DNS hostname, thus ensuring that for
this current operation there isn't skew.
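A minimal sketch of the client rule proposed here, assuming every operation starts with a request for <base>/info/refs and that the client's HTTP library reports the final URL after any redirect. base_after_redirect() and the example.org host names are hypothetical.

```c
#include <string.h>

/* Given the URL the first request finally landed on (after following any
 * redirect), derive the base URL for the rest of this operation.
 * Returns 0 on success, -1 if the URL does not end in "/info/refs"
 * or the base does not fit in basesz bytes. */
static int base_after_redirect(const char *final_url, char *base, size_t basesz)
{
	static const char suffix[] = "/info/refs";
	size_t ulen = strlen(final_url), slen = sizeof(suffix) - 1;

	if (ulen < slen || strcmp(final_url + ulen - slen, suffix))
		return -1;
	if (ulen - slen >= basesz)
		return -1;
	memcpy(base, final_url, ulen - slen);
	base[ulen - slen] = '\0';
	return 0;
}
```

Every later request in the same operation would then be built from this base (for example base + "/receive-pack"), so the whole operation hits the same server in the farm.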

-- 
Shawn.
--

From: H. Peter Anvin
Date: Monday, August 4, 2008 - 7:02 pm

Either that, or you can pass a "chase URL" in the payload of the 
request... it's more or less the same concept.

	-hpa
--

From: H. Peter Anvin
Date: Tuesday, August 12, 2008 - 6:56 pm

Anything we can do to keep this moving forward?  I was extremely 
encouraged with the fast progress on this; this would be great to get to 
the point where we (kernel.org) can deploy it at least for testing.

	-hpa

--

From: Shawn O. Pearce
Date: Tuesday, August 12, 2008 - 7:37 pm

Sorry, I dropped it with my egit work.  I'll pick it up again and
try to continue it further.  I left off trying to implement the
push client and saying "damn, jgit is better structured to make this
sort of change than C git" and decided it was too late at night to
continue it more.  That was like a week ago.

-- 
Shawn.
--
