Thread (19 messages) 19 messages, 2 authors, 2016-09-28

Re: [PATCH v2] gitweb: use highlight's shebang detection

From: Ian Kelling <hidden>
Date: 2016-09-23 09:08:40

On Thu, Sep 22, 2016, at 03:50 PM, Jakub Narębski wrote:
W dniu 22.09.2016 o 00:18, Ian Kelling napisał:
quoted
The highlight binary can detect language by shebang when we can't tell
the syntax type by the name of the file. In that case, pass the blob
to "highlight --force" and the resulting html will have markup for
highlighting if the language was detected.
This description feels a bit convoluted. Perhaps something like this:

  The "highlight" binary can, in some cases, determine the language type
  by the means of file contents, for example the shebang in the first
  line
  for some scripting languages.  Make use of this autodetection for files
  which syntax is not known by gitweb.  In that case, pass the blob
  contents to "highlight --force"; the parameter is needed to make it
  always generate HTML output (which includes HTML-escaping).
Nice. Using it in v3.
Also, we might want to have the information about performance of this
solution either in the commit message, or in commit comments.
I tested it more rigorously and added to v3 commit message.
quoted
Document the feature and improve syntax highlight documentation, add
test to ensure gitweb doesn't crash when language detection is used,
All right.
quoted
and remove an unused parameter from gitweb_check_feature().
First, that is guess_file_syntax(), not gitweb_check_feature().
Second, this change could be made into independent patch, for example
preparatory one.

Oops. I split it out in v3.
quoted
Signed-off-by: Ian Kelling <redacted>
---
 Documentation/gitweb.conf.txt          | 21 ++++++++++++++-------
 gitweb/gitweb.perl                     | 14 +++++++-------
 t/t9500-gitweb-standalone-no-errors.sh |  8 ++++++++
 3 files changed, 29 insertions(+), 14 deletions(-)
diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index a79e350..e632089 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -246,13 +246,20 @@ $highlight_bin::
 	Note that 'highlight' feature must be set for gitweb to actually
 	use syntax highlighting.
 +
-*NOTE*: if you want to add support for new file type (supported by
-"highlight" but not used by gitweb), you need to modify `%highlight_ext`
-or `%highlight_basename`, depending on whether you detect type of file
-based on extension (for example "sh") or on its basename (for example
-"Makefile").  The keys of these hashes are extension and basename,
-respectively, and value for given key is name of syntax to be passed via
-`--syntax <syntax>` to highlighter.
+*NOTE*: for a file to be highlighted, its syntax type must be detected
+and that syntax must be supported by "highlight".  The default syntax
+detection is minimal, and there are many supported syntax types with no
+detection by default.  There are three options for adding syntax
+detection.  The first and second priority are `%highlight_basename` and
+`%highlight_ext`, which detect based on basename (the full filename, for
+example "Makefile") and extension (for example "sh").  The keys of these
+hashes are the basename and extension, respectively, and the value for a
+given key is the name of the syntax to be passed via `--syntax <syntax>`
+to "highlight".  The last priority is the "highlight" configuration of
+`Shebang` regular expressions to detect the language based on the first
+line in the file, (for example, matching the line "#!/bin/bash").  See
+the highlight documentation and the default config at
+/etc/highlight/filetypes.conf for more details.
 +
I think the rewrite is a bit more readable.
quoted
 For example if repositories you are hosting use "phtml" extension for
 PHP files, and you want to have correct syntax-highlighting for those
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 33d701d..44094f4 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3913,7 +3913,7 @@ sub blob_contenttype {
 # guess file syntax for syntax highlighting; return undef if no highlighting
 # the name of syntax can (in the future) depend on syntax highlighter used
 sub guess_file_syntax {
-	my ($highlight, $mimetype, $file_name) = @_;
+	my ($highlight, $file_name) = @_;
Right.
quoted
 	return undef unless ($highlight && defined $file_name);
 	my $basename = basename($file_name, '.in');
 	return $highlight_basename{$basename}
@@ -3931,15 +3931,16 @@ sub guess_file_syntax {
 # or return original FD if no highlighting
 sub run_highlighter {
 	my ($fd, $highlight, $syntax) = @_;
-	return $fd unless ($highlight && defined $syntax);
+	return $fd unless ($highlight);
Run highlighter if it is defined, even if gitweb doesn't know syntax,
right.
quoted
 	close $fd;
+	my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
 	open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
 	          quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse',
 	            '$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
 	            '--', "-fe=$fallback_encoding")." | ".
 	          quote_command($highlight_bin).
-	          " --replace-tabs=8 --fragment --syntax $syntax |"
+	          " --replace-tabs=8 --fragment $syntax_arg |"
Use '--force' if syntax is unknown, right.
quoted
 		or die_error(500, "Couldn't open file or run syntax highlighter");
 	return $fd;
 }
@@ -7062,9 +7063,8 @@ sub git_blob {
 	$have_blame &&= ($mimetype =~ m!^text/!);

 	my $highlight = gitweb_check_feature('highlight');
-	my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
-	$fd = run_highlighter($fd, $highlight, $syntax)
-		if $syntax;
+	my $syntax = guess_file_syntax($highlight, $file_name);
+	$fd = run_highlighter($fd, $highlight, $syntax);
Remove unused parameter from callsite, *and* run highlighter even if we
don't know syntax.
quoted
 	git_header_html(undef, $expires);
 	my $formats_nav = '';
@@ -7117,7 +7117,7 @@ sub git_blob {
 			$line = untabify($line);
 			printf qq!<div class="pre"><a id="l%i" href="%s#l%i" class="linenr">%4i</a> %s</div>\n!,
 			       $nr, esc_attr(href(-replay => 1)), $nr, $nr,
-			       $syntax ? sanitize($line) : esc_html($line, -nbsp=>1);
+			       $highlight ? sanitize($line) : esc_html($line, -nbsp=>1);
This is a bit of code duplication / sync from run_highlighter(), but
it is not your fault; it was there (and I don't know how to improve it).
quoted
 		}
 	}
 	close $fd
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index e94b2f1..576db6d 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
Nice.
quoted
@@ -709,6 +709,14 @@ test_expect_success HIGHLIGHT \
 	 git commit -m "Add test.sh" &&
 	 gitweb_run "p=.git;a=blob;f=test.sh"'

+test_expect_success HIGHLIGHT \
+	'syntax highlighting (highlighter language autodetection)' \
+	'git config gitweb.highlight yes &&
Modern way would be

  +       'test_config gitweb.highlight yes &&

but other tests in this file do not use it.
quoted
+	 echo "#!/usr/bin/ruby" > test &&
Preferred style would be

  +        echo "#!/usr/bin/ruby" >test &&

but other tests in this file do not use it.
Agreed, but leaving it as is for consistency.
Sidenote: why Ruby, and not sh / bash, Perl or Python?
Not sh / bash, just to exercise more functionality of highlight by using
a different language than the other test. ruby just was the first thing
to come to mind since I've worked with it recently, but since you made
me think of it, perl is more likely to exist in the builtin config for
longer, and it seems a bit more fitting with gitweb, so its perl in v3.
quoted
+	 git add test &&
+	 git commit -m "Add test" &&
+	 gitweb_run "p=.git;a=blob;f=test"'
+
 # ----------------------------------------------------------------------
 # forks of projects
Thank you for your work.
--
Jakub Narębski
The only changes in v3 are the ones I described here.

Thank you for reviewing this.
--
Ian Kelling
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help