Thread (16 messages) 16 messages, 3 authors, 2020-03-13

Re: [PATCH 4/5] t6022, t6046: test expected behavior instead of testing a proxy for it

From: Elijah Newren <hidden>
Date: 2020-03-13 17:46:06

On Fri, Mar 13, 2020 at 10:17 AM SZEDER Gábor [off-list ref] wrote:
On Thu, Mar 12, 2020 at 09:01:57PM +0100, SZEDER Gábor wrote:
quoted
quoted
quoted
quoted
I saw this test fail today in one of my custom CI builds:

  +git checkout change
  Switched to branch 'change'
  +test-tool chmtime =-1 M
  +test-tool chmtime --get M
  +GIT_MERGE_VERBOSITY=3 git merge change+rename
  +test-tool chmtime --get B
  +cat old-mtime
  +cat new-mtime
  +test 1583967731 -lt 1583967731
  error: last command exited with $?=1
  not ok 12 - merge of identical changes in a renamed file

The contents of 'out', i.e. the output of the 'git merge' command
before the failure is:

  Auto-merging B
  Merge made by the 'recursive' strategy.
   A => B | 0
   1 file changed, 0 insertions(+), 0 deletions(-)
   rename A => B (100%)

This is a rare failure, this is the first time I saw it, and to make
things worse, this one time it happened on big-endian and with all the
GIT_TEST_* knobs enabled.

  https://travis-ci.org/github/szeder/git-cooking-topics-for-travis-ci/jobs/661294571#L4020
This is very troubling.  The workflow is basically:
  - Manually set the mtime to a file to something old (I happened to
pick 1 second before now, but picking something from 1970 would have
been fine too).
  - Run a merge which is known to need to overwrite the file.  Your
output ("Auto-merging B") suggests that we should have been in such a
case.
  - Verify that the file was actually updated as expected.  Since the
OS is supposed to update the mtime when it writes the file, it should
have set it to something recent, i.e. something *different* than what
it had before.

So, now I'm left wondering how the mtime possibly could have been not
updated.  Maybe the file wasn't actually written?  (But if so, why
didn't other people see the failure?  Or your stress runs not see it?)
 Or maybe it was written but all file contents and metadata were
delayed in writing to disk such that a subsequent command still sees
the old file??  Or maybe it was written but the mtime update was
delayed and the test was able to check it in that intermediate
state???  Or perhaps the mtime check before the merge raced with the
setting of the mtime backwards and got the mtime before it was
rewound????

I don't have a plausible scenario under which any of these should be
possible; I'm at a loss.
Hmm.  Maybe leap seconds, or clock updates via ntp at an unfortunate
time?
I'm now fairly confident that 'git merge' is OK, and suspect that it's
an issue with Travis CI's s390x environment (multi-architecture
support is an alpha-stage feature).
I could finally reproduce the issue on my own machine, so apparently
it's not an issue in Travis CI's multi-arch environments.
quoted
  test_expect_success 'test' '
      touch file &&
      old=$(test-tool chmtime --get =-1 file) &&
      touch file &&
      new=$(test-tool chmtime --get file) &&
      test $old -lt $new
  '
I wanted to report the issue to Travis CI, and in order to do so I
turned the above test case into a script that doesn't at all depend on
out test framework and 'test-tool' but uses only coreutils commands
(with a whole lot of GNUisms...):

  cat >timestamp.sh <<-\EOF
  #!/bin/sh

  set -ex

  i=0
  while true
  do
        printf "$i\r"
        i=$((i + 1))

        # set a file's mtime to one second ago
        now=$(date "+%s.%N")
        one_sec_ago=$(date -d "@$((${now%.*} - 1))" "+%Y%m%d%H%M%S")
        touch -t ${one_sec_ago%??}.${one_sec_ago#????????????} file
        # save its actual mtime
        old=$(date -r file "+%s.%N")
        # set its mtime to now
        touch file
        # the current the mtime should be different, but sometimes it isn't
        new=$(date -r file "+%s.%N")
        test "${old%.*}" != "${new%.*}"
  done
  EOF
  chmod u+x timestamp.sh
  ./timestamp.sh 2>out
  echo
  tail -n11 out

And this script usually fails after a few hundred iterations on my
machine:

  308
  + date +%s.%N
  + now=1584118408.002458987
  + date -d @1584118407 +%Y%m%d%H%M%S
  + one_sec_ago=20200313175327
  + touch -t 202003131753.27 file
  + date -r file +%s.%N
  + old=1584118407.000000000
  + touch file
  + date -r file +%s.%N
  + new=1584118407.997464837
  + test 1584118407 != 1584118407

Note that the mtime update at the end of the iteration results in an
mtime that is _before_ the current time at the beginning of the
iteration.  Well, I'm puzzled :)
Interesting...

From https://stackoverflow.com/questions/14392975/timestamp-accuracy-on-ext4-sub-millsecond:
'''
date will call the gettimeofday system call which will always return
the most accurate time available based on the cached kernel time,
adjusted by the CPU cycle time if available to give nanosecond
resolution. The timestamps stored in the file system however, are only
based on the cached kernel time. ie The time calculated at the last
timer interrupt.
'''

also, from https://apenwarr.ca/log/20181113 says:

'''
Does mtime get set to >= the current time?

No, this depends on clock granularity. For example, gettimeofday() can
return times in microseconds on my system, but ext4 rounds timestamps
down to the previous ~10ms (but not exactly 10ms) increment, with the
surprising result that a newly-created file is almost always created
in the past:

  $ python -c "
  import os, time
  t0 = time.time()
  open('testfile', 'w').close()
  print os.stat('testfile').st_mtime - t0
  "

  -0.00234484672546
'''


So, perhaps instead of setting the file mtime relative to the current
clock (and truncated to the nearest integer), we need to set the file
mtime relative to its recorded mtime.
Yes, setting the old timestamp 2 seconds in the past seems to be
sufficient, at least I haven't seen it fail in a few 100k repetitions.
Or yeah, we could just use bigger numbers, but given that wall clock
and file-system clock are apparently different, I like the idea of
setting them relative to the mtimes that existed before.  Looks like
`test-tool chmtime` supports that too, so I'll cook up a patch.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help