Thread: 90 messages, 5 authors, 2022-05-26

Re: [PATCH v3 0/9] Incremental po/git.pot update and new l10n workflow

From: Jiang Xin <hidden>
Date: 2022-05-23 16:13:35

On Mon, May 23, 2022 at 10:56 PM Ævar Arnfjörð Bjarmason
[off-list ref] wrote:

On Mon, May 23 2022, Jiang Xin wrote:
On Mon, May 23, 2022 at 4:19 PM Ævar Arnfjörð Bjarmason
 $(LOCALIZED_SH_GEN_PO): .build/pot/po/%.po: %
        $(call mkdir_p_parent_template)
@@ -2786,11 +2780,24 @@ sed -e 's|charset=CHARSET|charset=UTF-8|' \
 echo '"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\\n"' >>$@
 endef

-.build/pot/git.header: $(LOCALIZED_ALL_GEN_PO)
+.build/pot/git.header:
No. We should rebuild the pot header if any po file needs to be updated,
because we want to refresh the timestamp in the "POT-Creation-Date:"
field of the pot header.
Okay, I did leave a question about this in an earlier E-Mail though,
i.e. does anything actually rely on this, or the header at all, or is
this just cargo-culting?

I haven't found anything in our toolchain that cares about the header at
all (for the *.pot, not *.po!) let alone the update timestamp.
When creating a new po/XX.po manually using msginit from a POT file with
or without a header, the newly generated po/XX.po has a different header.

  $ msginit -i po/git.pot -o po/XX-with-header.po \
      --locale=ja --no-translator
  $ msginit -i po/git-headless.pot -o po/XX-without-header.po \
      --locale=ja --no-translator
  $ diff po/XX-with-header.po po/XX-without-header.po
  1,5d0
  < # Japanese translations for Git package.
  < # Copyright (C) 2022 THE Git'S COPYRIGHT HOLDER
  < # This file is distributed under the same license as the Git package.
  < # Automatically generated, 2022.
  < #
  8,11c3
  < "Project-Id-Version: Git\n"
  < "Report-Msgid-Bugs-To: Git Mailing List [off-list ref]\n"
  < "POT-Creation-Date: 2022-05-23 23:27+0800\n"
  < "PO-Revision-Date: 2022-05-23 23:27+0800\n"
  ---
  > "Project-Id-Version: git 2.36.0.7.g31429651cf.dirty\n"
  16c8
  < "Content-Type: text/plain; charset=UTF-8\n"
  ---
  > "Content-Type: text/plain; charset=ASCII\n"

Should we ignore this change?
        $(call mkdir_p_parent_template)
        $(QUIET_GEN)$(gen_pot_header)

-po/git.pot: .build/pot/git.header $(LOCALIZED_ALL_GEN_PO)
+# We go through this dance of having a prepared
+# e.g. .build/pot/po/grep.c.po and copying it to
+# .build/pot/to-cat/grep.c only because some IDEs (e.g. VSCode) pick
+# up on the "real" extension for the purposes of auto-completion, even
+# if the .build directory is in .gitignore.
+LOCALIZED_ALL_GEN_TO_CAT = $(LOCALIZED_ALL_GEN_PO:.build/pot/po/%.po=.build/pot/to-cat/%)
+ifdef AGGRESSIVE_INTERMEDIATE
+.INTERMEDIATE: $(LOCALIZED_ALL_GEN_TO_CAT)
+endif
+$(LOCALIZED_ALL_GEN_TO_CAT): .build/pot/to-cat/%: .build/pot/po/%.po
+       $(call mkdir_p_parent_template)
+       $(QUIET_GEN)cat $< >$@
Copy each po file in ".build/pot/po/" to another location
".build/pot/to-cat/", but without the ".po" extension.

Let's take "date.c" as an example:

1. Copy "date.c" to an intermediate C source file
".build/pot/po-munged/date.c" and replace PRItime with PRIuMAX in it.

2. Call xgettext to create ".build/pot/po/date.c.po" from the
intermediate C source file ".build/pot/po-munged/date.c".

3. Optionally remove intermediate C source files like
".build/pot/po-munged/date.c". Having two identical C source files in
the same worktree is not good, because some software may break. So I
chose to remove them.

4. Copy the po file (".build/pot/po/date.c.po") created in step 2 to
an intermediate fake C source file ".build/pot/to-cat/date.c" which is
a file without the ".po" extension. Please note this intermediate fake
C source file ".build/pot/to-cat/date.c" is not a valid C file, but a
PO file.

5. Call msgcat to create "po/git.pot" from all the intermediate fake C
source files including ".build/pot/to-cat/date.c".

6. Optionally remove all the intermediate fake C source files in
".build/pot/to-cat/". I chose to remove them, because leaving lots of
invalid C source files in the worktree is not good.
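
Steps 1 and 4 above can be sketched in plain shell for a single file.
This is only an illustration, not the Makefile rules from the patch:
the function names (munge_source, to_cat_path, copy_to_cat) are
hypothetical, and the sed expression is my assumption of what the
PRItime replacement looks like. Steps 2 and 5 (xgettext and msgcat)
are left out on purpose.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative and not taken from the patch.

# Step 1: write a munged copy of a C source file to
# ".build/pot/po-munged/<path>", with PRItime replaced by PRIuMAX.
munge_source () {
	src=$1
	out=.build/pot/po-munged/$src
	mkdir -p "$(dirname "$out")" &&
	sed -e 's|PRItime|PRIuMAX|g' <"$src" >"$out"
}

# Step 4 (naming): map ".build/pot/po/<path>.po" to
# ".build/pot/to-cat/<path>", i.e. drop the ".po" extension.
to_cat_path () {
	po=$1
	rel=${po#.build/pot/po/}
	printf '%s\n' ".build/pot/to-cat/${rel%.po}"
}

# Step 4 (copy): copy a generated po file to its extension-less name.
copy_to_cat () {
	dst=$(to_cat_path "$1")
	mkdir -p "$(dirname "$dst")" &&
	cat "$1" >"$dst"
}
```

For example, "to_cat_path .build/pot/po/date.c.po" prints
".build/pot/to-cat/date.c", which is the fake C source file from step 4.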

For example, ".build/pot/po/date.c.po" was created from
+
+po/git.pot: .build/pot/git.header $(LOCALIZED_ALL_GEN_TO_CAT)
        $(QUIET_GEN)$(MSGCAT) $(MSGCAT_FLAGS) $^ >$@
7. "po/git.pot" depends on the intermediate fake C source files. If
any single C source file has changed and we chose to remove these
intermediate fake C source files, make will rerun step 4 to copy all
po files in ".build/pot/po" to the corresponding fake C source files in
".build/pot/to-cat/".

This implementation is too heavy for such a trivial issue. I think we
can move this patch series forward and leave these comments in
"po/git.pot":
If you find it too "heavy" & are trying to optimize it for some reason
then that whole extra special-dance can be made conditional on
MAKE_AVOID_REAL_EXTENSIONS_IN_GITIGNORED_FILES.

But really, it's 15MB of .build/pot in my local HEAD with this fix-up,
it's 1.4MB without it, but this whole thing just seems like premature
optimization. Especially given:

    $ git hyperfine -r 3 -L rev origin/master,HEAD~,HEAD,avar/Makefile-incremental-po-git-pot-rule~,avar/Makefile-incremental-po-git-pot-rule -p 'git clean -dxf; git reset --hard' 'make pot' --warmup 1
    Benchmark 1: make pot' in 'origin/master
      Time (mean ± σ):      1.970 s ±  0.014 s    [User: 1.683 s, System: 0.353 s]
      Range (min … max):    1.955 s …  1.982 s    3 runs

    Benchmark 2: make pot' in 'HEAD~
      Time (mean ± σ):     931.3 ms ±   4.7 ms    [User: 3358.5 ms, System: 1088.7 ms]
      Range (min … max):   927.0 ms … 936.3 ms    3 runs

    Benchmark 3: make pot' in 'HEAD
      Time (mean ± σ):      1.506 s ±  0.389 s    [User: 4.655 s, System: 1.363 s]
      Range (min … max):    1.257 s …  1.955 s    3 runs

    Benchmark 4: make pot' in 'avar/Makefile-incremental-po-git-pot-rule~
      Time (mean ± σ):      1.015 s ±  0.002 s    [User: 3.615 s, System: 1.224 s]
      Range (min … max):    1.013 s …  1.017 s    3 runs

    Benchmark 5: make pot' in 'avar/Makefile-incremental-po-git-pot-rule
      Time (mean ± σ):      1.014 s ±  0.008 s    [User: 3.540 s, System: 1.068 s]
      Range (min … max):    1.007 s …  1.023 s    3 runs

    Summary
      'make pot' in 'HEAD~' ran
        1.09 ± 0.01 times faster than 'make pot' in 'avar/Makefile-incremental-po-git-pot-rule'
        1.09 ± 0.01 times faster than 'make pot' in 'avar/Makefile-incremental-po-git-pot-rule~'
        1.62 ± 0.42 times faster than 'make pot' in 'HEAD'
        2.12 ± 0.02 times faster than 'make pot' in 'origin/master'

I.e. all of this is much faster than what we have on "master" now. My
22434ef36ae (Makefile: avoid "sed" on C files that don't need it,
2022-04-08) (avar/Makefile-incremental-po-git-pot-rule) is then just 10%
slower than the "grep or xgettext" approach; its "~" is the
corresponding unoptimized version.

The HEAD here is with my fix-up, and HEAD~ is your series here.
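
The ratios in the "Summary" can be rechecked from the reported mean
times. The sketch below is only arithmetic on the numbers printed
above, not a rerun of the benchmark; the "ratio" helper is a
hypothetical name.

```shell
#!/bin/sh
# Ratio of two mean wall-clock times, rounded to two decimals.
# Arguments: slower mean, faster mean (both in seconds).
ratio () {
	awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 1.970 0.9313   # origin/master vs HEAD~
ratio 1.506 0.9313   # HEAD vs HEAD~
ratio 1.014 0.9313   # avar/Makefile-incremental-po-git-pot-rule vs HEAD~
```

These give 2.12, 1.62 and 1.09, matching the summary, so the "10%
slower" figure is closer to 9% when taken from the means.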

Anyway, if you really feel strongly about it let's go with your way of
doing it.

It just sounded like you weren't actually trying to optimize anything,
but to work around your editor. So if we had a method to do that....
But users who prefer to delete all the intermediate files in
".build/pot/to-cat" and ".build/pot/po-munged/" will get a
performance penalty.
...except it seems you also care about making it much faster than
"master" (or care about <20MB of disk space), which to be blunt seems a
bit crazy to me :) Last I checked "make test" ended up creating ~1GB of
data (not all at once, but in parallel testing a lot more than 10MB is
often in play at once).

As this was a pretty obscure target that I only expect CI, you,
translators & me to run in practice, a small difference in the initial
run didn't seem to matter, especially as it's all an improvement over
"master".

Anyway, you do whatever you think is best with that :)
        $ grep '#-#' po/git.pot
        #. #-#-#-#-#  git-add--interactive.perl.po  #-#-#-#-#
        #. #-#-#-#-#  add-patch.c.po  #-#-#-#-#
        #. #-#-#-#-#  git-add--interactive.perl.po  #-#-#-#-#
        #. #-#-#-#-#  branch.c.po  #-#-#-#-#
        #. #-#-#-#-#  object-name.c.po  #-#-#-#-#
        #. #-#-#-#-#  grep.c.po  #-#-#-#-#