-----============= acceptance-small: replay-dual ============----- Tue Apr 1 03:44:26 EDT 2025
mgs: Rocky Linux release 8.10 (Green Obsidian)
MGS_OS_ID_LIKE=rhel centos fedora rocky
MGS_OS_VERSION_ID=8.10
MGS_OS_ID=rocky
MGS_OS_VERSION_CODE=134873088
mds1: Rocky Linux release 8.10 (Green Obsidian)
MDS1_OS_VERSION_ID=8.10
MDS1_OS_VERSION_CODE=134873088
MDS1_OS_ID_LIKE=rhel centos fedora rocky
MDS1_OS_ID=rocky
ost1: Rocky Linux release 8.10 (Green Obsidian)
OST1_OS_VERSION_CODE=134873088
OST1_OS_ID_LIKE=rhel centos fedora rocky
OST1_OS_VERSION_ID=8.10
OST1_OS_ID=rocky
client: Rocky Linux release 8.10 (Green Obsidian)
CLIENT_OS_ID=rocky
CLIENT_OS_VERSION_CODE=134873088
CLIENT_OS_VERSION_ID=8.10
CLIENT_OS_ID_LIKE=rhel centos fedora rocky
oleg626-server: ls: cannot access '/home/green/git/lustre-release/lustre/tests/except/replay-dual.*ex': No such file or directory
excepting tests: 14b 21b 21b
skipping tests SLOW=no: 21b
=== replay-dual: start setup 03:45:16 (1743493516) ===
Starting client oleg626-client.virtnet: -o user_xattr,flock 192.168.206.126@tcp:/lustre /mnt/lustre2
Started clients oleg626-client.virtnet: 192.168.206.126@tcp:/lustre on /mnt/lustre2 type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,encrypt,statfs_project)
oleg626-client.virtnet: executing check_config_client /mnt/lustre
oleg626-client.virtnet: Checking config lustre mounted on /mnt/lustre
Checking servers environments
Checking clients oleg626-client.virtnet environments
Using TIMEOUT=20
osc.lustre-OST0000-osc-ffff9d0107a91000.idle_timeout=debug
osc.lustre-OST0000-osc-ffff9d0108fe4000.idle_timeout=debug
osc.lustre-OST0001-osc-ffff9d0107a91000.idle_timeout=debug
osc.lustre-OST0001-osc-ffff9d0108fe4000.idle_timeout=debug
disable quota as required
oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all
=== replay-dual: finish setup 03:46:26 (1743493586) ===
== replay-dual test 0a: expired recovery with lost client
========================================================== 03:46:31 (1743493591) Check file is LU482_FAILED=/tmp/replay-dual.lu482.vDo0am UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3200 2205440 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre total: 50 open/close in 3.28 seconds: 15.25 ops/second fail_loc=0x80000514 Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 03:47:00 (1743493620) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 03:47:54 (1743493674) targets are mounted 03:47:54 (1743493674) facet_failover done Starting client: oleg626-client.virtnet: -o user_xattr,flock 192.168.206.126@tcp:/lustre /mnt/lustre2 - unlinked 0 (time 1743493761 ; total 0 ; last 0) total: 50 unlinks in 4 seconds: 12.500000 unlinks/second PASS 0a (189s) == replay-dual test 0b: lost client during waiting for next transno ========================================================== 03:49:41 (1743493781) UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3200 2205440 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 03:50:07 (1743493807) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov 
lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 03:51:01 (1743493861) targets are mounted 03:51:01 (1743493861) facet_failover done Starting client: oleg626-client.virtnet: -o user_xattr,flock 192.168.206.126@tcp:/lustre /mnt/lustre Starting client: oleg626-client.virtnet: -o user_xattr,flock 192.168.206.126@tcp:/lustre /mnt/lustre2 PASS 0b (171s) == replay-dual test 1: |X| simple create ================= 03:52:33 (1743493953) UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3200 2205440 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 03:52:56 (1743493976) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 03:53:49 (1743494029) targets are mounted 03:53:49 (1743494029) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec PASS 1 (112s) == replay-dual test 2: |X| mkdir adir ==================== 03:54:25 (1743494065) UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3200 2205440 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre Failing mds1 
on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 03:54:50 (1743494090) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 03:55:41 (1743494141) targets are mounted 03:55:41 (1743494141) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec PASS 2 (114s) == replay-dual test 3: |X| mkdir adir, mkdir adir/bdir === 03:56:18 (1743494178) UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3200 2205440 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 03:56:43 (1743494203) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 03:57:35 (1743494255) targets are mounted 03:57:35 (1743494255) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec PASS 3 (114s) == replay-dual test 4: |X| mkdir adir (-EEXIST), mkdir adir/bdir ========================================================== 03:58:12 (1743494292) UUID 
1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3200 2205440 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre mkdir: cannot create directory '/mnt/lustre/adir': File exists Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 03:58:37 (1743494317) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 03:59:29 (1743494369) targets are mounted 03:59:29 (1743494369) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec PASS 4 (106s) == replay-dual test 5: open, unlink |X| close ============ 03:59:59 (1743494399) multiop /mnt/lustre2/a vo_tSc TMPPIPE=/tmp/multiop_open_wait_pipe.7348 UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3200 2205440 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 04:00:24 (1743494424) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started 
lustre-MDT0000 04:01:16 (1743494476) targets are mounted 04:01:16 (1743494476) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec PASS 5 (111s) == replay-dual test 6: open1, open2, unlink |X| close1 [fail mds1] close2 ========================================================== 04:01:50 (1743494510) multiop /mnt/lustre2/a vo_c TMPPIPE=/tmp/multiop_open_wait_pipe.7348 multiop /mnt/lustre/a vo_c TMPPIPE=/tmp/multiop_open_wait_pipe.7348 UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3200 2205440 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 04:02:13 (1743494533) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 04:03:07 (1743494587) targets are mounted 04:03:07 (1743494587) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec PASS 6 (111s) == replay-dual test 8: replay of resent request ========== 04:03:41 (1743494621) UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3200 2205440 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre 
fail_loc=0x119 fail_loc=0 Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 04:04:27 (1743494667) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 04:05:09 (1743494709) targets are mounted 04:05:09 (1743494709) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec PASS 8 (124s) == replay-dual test 9: resending a replayed create ======= 04:05:46 (1743494746) UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3200 2205440 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre fail_loc=0x80000119 Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 04:06:12 (1743494772) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 04:07:04 (1743494824) targets are mounted 04:07:04 (1743494824) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec fail_loc=0 PASS 9 (130s) == replay-dual test 10: resending a replayed unlink ====== 04:07:56 (1743494876) 
UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3200 2205440 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre fail_loc=0x80000119 Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 04:08:21 (1743494901) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 04:09:14 (1743494954) targets are mounted 04:09:14 (1743494954) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec fail_loc=0 PASS 10 (131s) == replay-dual test 11: both clients timeout during replay ========================================================== 04:10:07 (1743495007) UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3200 2205440 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre fail_loc=0x0119 Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 04:10:31 (1743495031) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 04:11:11 
(1743495071) targets are mounted 04:11:11 (1743495071) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 8 sec fail_loc=0 PASS 11 (101s) == replay-dual test 12: open resend timeout ============== 04:11:48 (1743495108) UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3200 2205440 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre multiop /mnt/lustre/f12.replay-dual vmo_c TMPPIPE=/tmp/multiop_open_wait_pipe.7348 fail_loc=0x80000302 Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 04:12:12 (1743495132) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 04:13:03 (1743495183) targets are mounted 04:13:03 (1743495183) facet_failover done fail_loc=0 /mnt/lustre/f12.replay-dual /mnt/lustre/f12.replay-dual has type file OK PASS 12 (93s) == replay-dual test 13: close resend timeout ============= 04:13:21 (1743495201) multiop /mnt/lustre/f13.replay-dual vmo_c TMPPIPE=/tmp/multiop_open_wait_pipe.7348 UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3200 2205440 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre fail_loc=0x80000115 Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 04:13:45 (1743495225) shut down facet: mds1 
facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 04:14:40 (1743495280) targets are mounted 04:14:40 (1743495280) facet_failover done fail_loc=0 /mnt/lustre/f13.replay-dual /mnt/lustre/f13.replay-dual has type file OK PASS 13 (95s) SKIP: replay-dual test_14b skipping ALWAYS excluded test 14b == replay-dual test 15a: timeout waiting for lost client during replay, 1 client completes ========================================================== 04:15:01 (1743495301) UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3200 2205440 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre total: 25 open/close in 1.27 seconds: 19.63 ops/second total: 1 open/close in 0.05 seconds: 19.23 ops/second Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 04:15:22 (1743495322) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 04:16:02 (1743495362) targets are mounted 04:16:02 (1743495362) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec - unlinked 0 (time 1743495448 ; total 0 ; last 0) total: 25 unlinks in 1 seconds: 25.000000 unlinks/second Starting 
client: oleg626-client.virtnet: -o user_xattr,flock 192.168.206.126@tcp:/lustre /mnt/lustre2 PASS 15a (161s) == replay-dual test 15c: remove multiple OST orphans ===== 04:17:42 (1743495462) UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3200 2205440 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 04:20:55 (1743495655) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 04:21:36 (1743495696) targets are mounted 04:21:36 (1743495696) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Starting client: oleg626-client.virtnet: -o user_xattr,flock 192.168.206.126@tcp:/lustre /mnt/lustre2 PASS 15c (329s) == replay-dual test 16: fail MDS during recovery (3571) == 04:23:12 (1743495792) UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3200 2205440 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre total: 25 open/close in 1.26 seconds: 19.88 ops/second total: 1 open/close in 0.04 seconds: 24.31 ops/second Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 04:23:37 (1743495817) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server 
Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 04:24:29 (1743495869) targets are mounted 04:24:29 (1743495869) facet_failover done Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 04:25:02 (1743495902) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 04:25:56 (1743495956) targets are mounted 04:25:56 (1743495956) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec - unlinked 0 (time 1743496050 ; total 1 ; last 1) total: 25 unlinks in 2 seconds: 12.500000 unlinks/second Starting client: oleg626-client.virtnet: -o user_xattr,flock 192.168.206.126@tcp:/lustre /mnt/lustre2 PASS 16 (273s) == replay-dual test 17: fail OST during recovery (3571) == 04:27:44 (1743496064) total: 25 open/close in 1.41 seconds: 17.71 ops/second total: 1 open/close in 0.07 seconds: 14.56 ops/second UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3328 2205312 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre Failing ost1 on oleg626-server Stopping /mnt/lustre-ost1 (opts:) on oleg626-server 04:28:07 (1743496087) shut down facet: ost1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover ost1 
to oleg626-server mount facets: ost1 Starting ost1: -o localrecov lustre-ost1/ost1 /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-OST0000 04:28:47 (1743496127) targets are mounted 04:28:47 (1743496127) facet_failover done Failing ost1 on oleg626-server Stopping /mnt/lustre-ost1 (opts:) on oleg626-server 04:29:17 (1743496157) shut down facet: ost1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover ost1 to oleg626-server mount facets: ost1 Starting ost1: -o localrecov lustre-ost1/ost1 /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-OST0000 04:29:56 (1743496196) targets are mounted 04:29:57 (1743496197) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid in FULL state after 0 sec - unlinked 0 (time 1743496216 ; total 0 ; last 0) total: 25 unlinks in 1 seconds: 25.000000 unlinks/second Starting client: oleg626-client.virtnet: -o user_xattr,flock 192.168.206.126@tcp:/lustre /mnt/lustre2 PASS 17 (167s) == replay-dual test 18: ldlm_handle_enqueue succeeds on evicted export (3822) ========================================================== 04:30:31 (1743496231) debug=+dlmtrace using seed 2656223920 running for 500 iterations total: 500 stats in 2 seconds: 250.000000 stats/second fail_loc=0x8000030b ldlm.namespaces.MGC192.168.206.126@tcp.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0000-mdc-ffff9d0106a09000.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0000-mdc-ffff9d0107a0c000.early_lock_cancel=0 ldlm.namespaces.lustre-OST0000-osc-ffff9d0106a09000.early_lock_cancel=0 
ldlm.namespaces.lustre-OST0000-osc-ffff9d0107a0c000.early_lock_cancel=0 ldlm.namespaces.lustre-OST0001-osc-ffff9d0106a09000.early_lock_cancel=0 ldlm.namespaces.lustre-OST0001-osc-ffff9d0107a0c000.early_lock_cancel=0 fail_loc=0x80000305 Error in opening file "/mnt/lustre2/d18.replay-dual/f18.replay-dual"(flags=O_RDONLY) 2: No such file or directory ldlm.namespaces.MGC192.168.206.126@tcp.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0000-mdc-ffff9d0106a09000.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0000-mdc-ffff9d0107a0c000.early_lock_cancel=1 ldlm.namespaces.lustre-OST0000-osc-ffff9d0106a09000.early_lock_cancel=1 ldlm.namespaces.lustre-OST0000-osc-ffff9d0107a0c000.early_lock_cancel=1 ldlm.namespaces.lustre-OST0001-osc-ffff9d0106a09000.early_lock_cancel=1 ldlm.namespaces.lustre-OST0001-osc-ffff9d0107a0c000.early_lock_cancel=1 fail_loc=0 fail_loc=0 PASS 18 (78s) == replay-dual test 19: resend of open request =========== 04:31:49 (1743496309) UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3328 2205312 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre fail_loc=0x157 - open/close 0 (time 1743496412.20 total 88.04 last 0.00) total: 1 open/close in 88.04 seconds: 0.01 ops/second fail_loc=0 Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 04:33:47 (1743496427) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 04:34:25 (1743496465) targets are mounted 04:34:25 (1743496465) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount 
(FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec PASS 19 (187s) == replay-dual test 20: recovery time is not increasing == 04:34:56 (1743496496) UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3328 2205312 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 04:35:19 (1743496519) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 04:36:12 (1743496572) targets are mounted 04:36:12 (1743496572) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Starting client: oleg626-client.virtnet: -o user_xattr,flock 192.168.206.126@tcp:/lustre /mnt/lustre2 UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210688 3328 2205312 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 04:39:01 (1743496741) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing 
set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 04:39:54 (1743496794) targets are mounted 04:39:54 (1743496794) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Starting client: oleg626-client.virtnet: -o user_xattr,flock 192.168.206.126@tcp:/lustre /mnt/lustre2 PASS 20 (455s) == replay-dual test 21a: commit on sharing =============== 04:42:31 (1743496951) mdt.lustre-MDT0000.commit_on_sharing=1 Replay barrier on lustre-MDT0000 Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 04:42:55 (1743496975) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 04:43:35 (1743497015) targets are mounted 04:43:35 (1743497015) facet_failover done Starting client: oleg626-client.virtnet: -o user_xattr,flock 192.168.206.126@tcp:/lustre /mnt/lustre2 mdt.lustre-MDT0000.commit_on_sharing=0 PASS 21a (217s) SKIP: replay-dual test_21b skipping SLOW test 21b == replay-dual test 22a: c1 lfs mkdir -i 1 dir1, M1 drop reply & fail, c2 mkdir dir1/dir ========================================================== 04:46:12 (1743497172) SKIP: replay-dual test_22a needs >= 2 MDTs SKIP 22a (7s) == replay-dual test 22b: c1 lfs mkdir -i 1 d1, M1 drop reply & fail M0/M1, c2 mkdir d1/dir ========================================================== 04:46:19 (1743497179) SKIP: replay-dual test_22b needs >= 2 MDTs SKIP 22b (8s) == replay-dual test 22c: c1 lfs mkdir -i 1 d1, M1 drop update & fail M1, c2 mkdir d1/dir 
========================================================== 04:46:27 (1743497187) SKIP: replay-dual test_22c needs >= 2 MDTs SKIP 22c (8s) == replay-dual test 22d: c1 lfs mkdir -i 1 d1, M1 drop update & fail M0/M1,c2 mkdir d1/dir ========================================================== 04:46:35 (1743497195) SKIP: replay-dual test_22d needs >= 2 MDTs SKIP 22d (7s) == replay-dual test 23a: c1 rmdir d1, M1 drop reply and fail, client2 mkdir d1 ========================================================== 04:46:42 (1743497202) SKIP: replay-dual test_23a needs >= 2 MDTs SKIP 23a (7s) == replay-dual test 23b: c1 rmdir d1, M1 drop reply and fail M0/M1, c2 mkdir d1 ========================================================== 04:46:49 (1743497209) SKIP: replay-dual test_23b needs >= 2 MDTs SKIP 23b (7s) == replay-dual test 23c: c1 rmdir d1, M0 drop update reply and fail M0, c2 mkdir d1 ========================================================== 04:46:56 (1743497216) SKIP: replay-dual test_23c needs >= 2 MDTs SKIP 23c (8s) == replay-dual test 23d: c1 rmdir d1, M0 drop update reply and fail M0/M1, c2 mkdir d1 ========================================================== 04:47:04 (1743497224) SKIP: replay-dual test_23d needs >= 2 MDTs SKIP 23d (8s) == replay-dual test 24: reconstruct on non-existing object ========================================================== 04:47:12 (1743497232) fail_loc=0x119 fail_loc=0 truncate: cannot truncate '/mnt/lustre/f24.replay-dual' to length 100: No such file or directory PASS 24 (104s) == replay-dual test 25: replay|resend ==================== 04:48:57 (1743497337) 1+0 records in 1+0 records out 512 bytes copied, 0.0164106 s, 31.2 kB/s fail_loc=0x80000325 Failing ost1 on oleg626-server fail_loc=0x304 Stopping /mnt/lustre-ost1 (opts:) on oleg626-server 04:49:13 (1743497353) shut down facet: ost1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover ost1 to oleg626-server mount facets: ost1 Starting ost1: -o localrecov 
lustre-ost1/ost1 /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 fail_loc=0 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-OST0000 04:49:51 (1743497391) targets are mounted 04:49:51 (1743497391) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid in FULL state after 0 sec (multiop)$: no process found PASS 25 (82s) == replay-dual test 26: dbench and tar with mds failover ========================================================== 04:50:19 (1743497419) Starting client oleg626-client.virtnet: -o user_xattr,flock 192.168.206.126@tcp:/lustre /mnt/lustre Started clients oleg626-client.virtnet: 192.168.206.126@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,encrypt,statfs_project) Started tar loop with pid 44302 Started dbench loop with 44304 looking for dbench program /usr/bin/dbench found dbench client file /usr/share/dbench/client.txt UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210560 3200 2205312 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3742720 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3753984 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7496704 1% /mnt/lustre '/usr/share/dbench/client.txt' -> 'client.txt' running 'dbench 1 -D /mnt/lustre2/d26.replay-dual/run_dbench -t 100' on /mnt/lustre at Tue Apr 1 04:50:31 EDT 2025 waiting for dbench pid 44381 dbench version 4.00 - Copyright Andrew Tridgell 1999-2004 Running for 100 seconds with load 'client.txt' and minimum warmup 20 secs failed to create barrier semaphore 0 of 1 processes prepared for launch 0 sec 1 of 1 processes prepared for launch 0 sec releasing clients 1 21 0.46 MB/sec warmup 1 sec latency 43.589 ms 1 61 1.28 MB/sec warmup 2 sec 
latency 204.907 ms 1 95 1.42 MB/sec warmup 3 sec latency 221.820 ms 1 129 1.44 MB/sec warmup 4 sec latency 211.305 ms 1 155 1.30 MB/sec warmup 5 sec latency 196.372 ms 1 189 1.26 MB/sec warmup 6 sec latency 145.470 ms test_26 fail mds1 1 times 1 208 1.15 MB/sec warmup 7 sec latency 220.813 ms 1 230 1.11 MB/sec warmup 8 sec latency 208.646 ms 1 256 1.07 MB/sec warmup 9 sec latency 169.945 ms 1 286 1.06 MB/sec warmup 10 sec latency 139.886 ms Failing mds1 on oleg626-server 1 305 1.01 MB/sec warmup 11 sec latency 337.712 ms 1 324 0.97 MB/sec warmup 12 sec latency 209.978 ms Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 1 341 0.94 MB/sec warmup 13 sec latency 477.860 ms 1 376 0.95 MB/sec warmup 14 sec latency 202.972 ms 1 405 0.95 MB/sec warmup 15 sec latency 150.533 ms 1 437 0.96 MB/sec warmup 16 sec latency 126.047 ms 1 469 0.96 MB/sec warmup 17 sec latency 228.985 ms 1 497 0.96 MB/sec warmup 18 sec latency 222.462 ms 1 519 0.94 MB/sec warmup 19 sec latency 189.210 ms 1 649 0.32 MB/sec execute 1 sec latency 190.137 ms 1 664 0.21 MB/sec execute 2 sec latency 698.564 ms 1 678 0.15 MB/sec execute 3 sec latency 708.121 ms 1 711 0.12 MB/sec execute 4 sec latency 143.507 ms 1 728 0.11 MB/sec execute 5 sec latency 396.104 ms 1 738 0.10 MB/sec execute 6 sec latency 207.993 ms 1 778 0.09 MB/sec execute 7 sec latency 123.595 ms 1 794 0.09 MB/sec execute 8 sec latency 435.340 ms 1 821 0.08 MB/sec execute 9 sec latency 135.573 ms 1 873 0.09 MB/sec execute 10 sec latency 141.173 ms 1 888 0.08 MB/sec execute 11 sec latency 386.989 ms 1 913 0.09 MB/sec execute 12 sec latency 150.835 ms 1 923 0.08 MB/sec execute 13 sec latency 378.092 ms 1 969 0.08 MB/sec execute 14 sec latency 96.187 ms 1 983 0.08 MB/sec execute 15 sec latency 392.109 ms 1 1031 0.13 MB/sec execute 16 sec latency 153.848 ms 1 1079 0.13 MB/sec execute 17 sec latency 87.809 ms 1 1127 0.12 MB/sec execute 18 sec latency 99.213 ms 1 1155 0.12 MB/sec execute 19 sec latency 149.559 ms 1 1194 0.11 MB/sec execute 20 
sec latency 119.299 ms 1 1225 0.11 MB/sec execute 21 sec latency 125.204 ms 1 1262 0.11 MB/sec execute 22 sec latency 149.949 ms 1 1384 0.23 MB/sec execute 23 sec latency 148.453 ms 1 1415 0.22 MB/sec execute 24 sec latency 137.531 ms 1 1453 0.21 MB/sec execute 25 sec latency 91.307 ms 1 1481 0.20 MB/sec execute 26 sec latency 204.103 ms 1 1503 0.20 MB/sec execute 27 sec latency 378.059 ms 1 1524 0.19 MB/sec execute 28 sec latency 172.937 ms 1 1532 0.18 MB/sec execute 29 sec latency 914.130 ms 1 1532 0.18 MB/sec execute 30 sec latency 1914.808 ms 1 1532 0.17 MB/sec execute 31 sec latency 2915.139 ms 1 1532 0.17 MB/sec execute 32 sec latency 3915.577 ms 1 1532 0.16 MB/sec execute 33 sec latency 4915.849 ms 04:51:25 (1743497485) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server 1 1532 0.16 MB/sec execute 34 sec latency 5922.622 ms 1 1532 0.15 MB/sec execute 35 sec latency 6922.985 ms 1 1532 0.15 MB/sec execute 36 sec latency 7923.398 ms 1 1532 0.14 MB/sec execute 37 sec latency 8923.759 ms 1 1532 0.14 MB/sec execute 38 sec latency 9924.160 ms 1 1532 0.14 MB/sec execute 39 sec latency 10926.971 ms 1 1532 0.13 MB/sec execute 40 sec latency 11927.213 ms 1 1532 0.13 MB/sec execute 41 sec latency 12932.738 ms 1 1532 0.13 MB/sec execute 42 sec latency 13933.111 ms 1 1532 0.12 MB/sec execute 43 sec latency 14933.380 ms 1 1532 0.12 MB/sec execute 44 sec latency 15933.686 ms Failover mds1 to oleg626-server mount facets: mds1 1 1532 0.12 MB/sec execute 45 sec latency 16934.485 ms 1 1532 0.12 MB/sec execute 46 sec latency 17934.842 ms 1 1532 0.11 MB/sec execute 47 sec latency 18935.974 ms 1 1532 0.11 MB/sec execute 48 sec latency 19936.599 ms 1 1532 0.11 MB/sec execute 49 sec latency 20936.964 ms 1 1532 0.11 MB/sec execute 50 sec latency 21937.408 ms 1 1532 0.10 MB/sec execute 51 sec latency 22938.548 ms 1 1532 0.10 MB/sec execute 52 sec latency 23941.283 ms 1 1532 0.10 MB/sec execute 53 sec latency 24941.639 ms Starting mds1: -o localrecov 
lustre-mdt1/mdt1 /mnt/lustre-mds1 1 1532 0.10 MB/sec execute 54 sec latency 25941.893 ms 1 1532 0.10 MB/sec execute 55 sec latency 26942.629 ms 1 1532 0.10 MB/sec execute 56 sec latency 27946.786 ms 1 1532 0.09 MB/sec execute 57 sec latency 28949.413 ms 1 1532 0.09 MB/sec execute 58 sec latency 29955.265 ms 1 1532 0.09 MB/sec execute 59 sec latency 30962.709 ms 1 1532 0.09 MB/sec execute 60 sec latency 31967.379 ms 1 1532 0.09 MB/sec execute 61 sec latency 32967.633 ms 1 1532 0.09 MB/sec execute 62 sec latency 33969.074 ms 1 1532 0.08 MB/sec execute 63 sec latency 34970.928 ms 1 1532 0.08 MB/sec execute 64 sec latency 35971.199 ms 1 1532 0.08 MB/sec execute 65 sec latency 36971.458 ms 1 1532 0.08 MB/sec execute 66 sec latency 37972.115 ms 1 1532 0.08 MB/sec execute 67 sec latency 38974.943 ms 1 1532 0.08 MB/sec execute 68 sec latency 39978.208 ms 1 1532 0.08 MB/sec execute 69 sec latency 40978.478 ms 1 1532 0.08 MB/sec execute 70 sec latency 41989.326 ms 1 1532 0.08 MB/sec execute 71 sec latency 42994.906 ms 1 1532 0.07 MB/sec execute 72 sec latency 43995.201 ms 1 1532 0.07 MB/sec execute 73 sec latency 45002.369 ms 1 1532 0.07 MB/sec execute 74 sec latency 46003.352 ms 1 1532 0.07 MB/sec execute 75 sec latency 47003.648 ms 1 1532 0.07 MB/sec execute 76 sec latency 48004.112 ms 1 1532 0.07 MB/sec execute 77 sec latency 49004.380 ms 1 1532 0.07 MB/sec execute 78 sec latency 50004.644 ms 1 1532 0.07 MB/sec execute 79 sec latency 51005.063 ms oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all 1 1532 0.07 MB/sec execute 80 sec latency 52005.566 ms 1 1532 0.07 MB/sec execute 81 sec latency 53005.893 ms pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 1 1532 0.07 MB/sec execute 82 sec latency 54011.070 ms Started lustre-MDT0000 1 1532 0.06 MB/sec execute 83 sec latency 55012.252 ms 04:52:15 (1743497535) targets are mounted 04:52:15 (1743497535) facet_failover done 1 1532 0.06 MB/sec execute 84 sec latency 56012.512 ms 1 1532 0.06 
MB/sec execute 85 sec latency 57018.967 ms 1 1532 0.06 MB/sec execute 86 sec latency 58022.097 ms 1 1532 0.06 MB/sec execute 87 sec latency 59024.358 ms 1 1532 0.06 MB/sec execute 88 sec latency 60027.470 ms 1 1532 0.06 MB/sec execute 89 sec latency 61027.750 ms 1 1532 0.06 MB/sec execute 90 sec latency 62028.936 ms 1 1532 0.06 MB/sec execute 91 sec latency 63029.305 ms 1 1532 0.06 MB/sec execute 92 sec latency 64029.546 ms 1 1532 0.06 MB/sec execute 93 sec latency 65033.475 ms 1 1532 0.06 MB/sec execute 94 sec latency 66033.873 ms 1 1532 0.06 MB/sec execute 95 sec latency 67034.439 ms 1 1532 0.06 MB/sec execute 96 sec latency 68038.234 ms 1 1532 0.06 MB/sec execute 97 sec latency 69038.558 ms 1 1532 0.05 MB/sec execute 98 sec latency 70040.193 ms 1 1532 0.05 MB/sec execute 99 sec latency 71042.174 ms 1 cleanup 100 sec 1 cleanup 101 sec 1 cleanup 102 sec 1 cleanup 103 sec 1 cleanup 104 sec 1 cleanup 105 sec 1 cleanup 106 sec 1 cleanup 107 sec 1 cleanup 108 sec 1 cleanup 109 sec 1 cleanup 110 sec 1 cleanup 111 sec 1 cleanup 112 sec 1 cleanup 113 sec 1 cleanup 114 sec 1 cleanup 115 sec 1 cleanup 116 sec 0 cleanup 117 sec Operation Count AvgLat MaxLat ---------------------------------------- NTCreateX 150 627.884 83056.689 Close 126 15.173 52.691 Rename 10 101.802 149.533 Unlink 20 41.905 96.520 Qpathinfo 122 23.645 172.912 Qfileinfo 24 2.912 11.574 Qfsinfo 33 4.771 36.378 Sfileinfo 34 69.438 133.899 Find 54 8.543 43.933 WriteX 103 12.907 62.200 ReadX 236 0.393 6.982 Flush 25 233.560 708.100 Throughput 0.054031 MB/sec 1 clients 1 procs max_latency=71042.174 ms stopping dbench on /mnt/lustre at Tue Apr 1 04:52:49 EDT 2025 with return code 0 clean dbench files on /mnt/lustre /mnt/lustre /mnt/lustre oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid removed 'client.txt' /mnt/lustre dbench successfully finished looking for dbench program /usr/bin/dbench found dbench client file /usr/share/dbench/client.txt 
mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec '/usr/share/dbench/client.txt' -> 'client.txt' running 'dbench 1 -D /mnt/lustre2/d26.replay-dual/run_dbench -t 100' on /mnt/lustre at Tue Apr 1 04:53:01 EDT 2025 waiting for dbench pid 45307 dbench version 4.00 - Copyright Andrew Tridgell 1999-2004 Running for 100 seconds with load 'client.txt' and minimum warmup 20 secs 0 of 1 processes prepared for launch 0 sec 1 of 1 processes prepared for launch 0 sec releasing clients 1 14 0.00 MB/sec warmup 1 sec latency 13.667 ms 1 54 1.04 MB/sec warmup 2 sec latency 247.660 ms 1 91 1.38 MB/sec warmup 3 sec latency 165.600 ms 1 125 1.38 MB/sec warmup 4 sec latency 228.157 ms 1 153 1.29 MB/sec warmup 5 sec latency 172.754 ms UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210560 4096 2204416 1% /mnt/lustre[MDT:0] 1 179 1.20 MB/sec warmup 6 sec latency 199.800 ms lustre-OST0000_UUID 3771392 40960 3699712 2% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 17408 3736576 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 58368 7436288 1% /mnt/lustre 1 199 1.11 MB/sec warmup 7 sec latency 265.353 ms 1 224 1.07 MB/sec warmup 8 sec latency 158.765 ms 1 247 1.04 MB/sec warmup 9 sec latency 197.104 ms 1 276 1.03 MB/sec warmup 10 sec latency 179.448 ms 1 304 1.01 MB/sec warmup 11 sec latency 175.293 ms 1 324 0.97 MB/sec warmup 12 sec latency 237.064 ms 1 345 0.94 MB/sec warmup 13 sec latency 253.260 ms 1 375 0.95 MB/sec warmup 14 sec latency 193.652 ms 1 410 0.96 MB/sec warmup 15 sec latency 176.802 ms 1 453 0.99 MB/sec warmup 16 sec latency 105.350 ms test_26 fail mds1 2 times 1 493 1.01 MB/sec warmup 17 sec latency 127.325 ms 1 510 0.98 MB/sec warmup 18 sec latency 298.997 ms 1 532 0.97 MB/sec warmup 19 sec latency 202.455 ms Failing mds1 on oleg626-server 1 639 0.65 MB/sec execute 1 sec latency 168.635 ms 1 664 0.37 MB/sec execute 2 sec latency 249.064 ms Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 1 674 0.25 MB/sec execute 3 sec latency 
453.383 ms 1 697 0.20 MB/sec execute 4 sec latency 193.075 ms 1 717 0.17 MB/sec execute 5 sec latency 198.535 ms 1 717 0.15 MB/sec execute 6 sec latency 1202.913 ms 1 717 0.12 MB/sec execute 7 sec latency 2203.221 ms 1 717 0.11 MB/sec execute 8 sec latency 3203.470 ms 1 717 0.10 MB/sec execute 9 sec latency 4204.746 ms 04:53:31 (1743497611) shut down 1 717 0.09 MB/sec execute 10 sec latency 5238.589 ms facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server 1 717 0.08 MB/sec execute 11 sec latency 6238.861 ms 1 717 0.07 MB/sec execute 12 sec latency 7239.135 ms 1 717 0.07 MB/sec execute 13 sec latency 8241.764 ms 1 717 0.06 MB/sec execute 14 sec latency 9242.029 ms 1 717 0.06 MB/sec execute 15 sec latency 10244.006 ms 1 717 0.05 MB/sec execute 16 sec latency 11245.354 ms 1 717 0.05 MB/sec execute 17 sec latency 12249.862 ms 1 717 0.05 MB/sec execute 18 sec latency 13255.072 ms 1 717 0.05 MB/sec execute 19 sec latency 14255.356 ms 1 717 0.04 MB/sec execute 20 sec latency 15255.643 ms Failover mds1 to oleg626-server mount facets: mds1 1 717 0.04 MB/sec execute 21 sec latency 16258.884 ms 1 717 0.04 MB/sec execute 22 sec latency 17270.362 ms 1 717 0.04 MB/sec execute 23 sec latency 18270.592 ms 1 717 0.04 MB/sec execute 24 sec latency 19274.331 ms 1 717 0.03 MB/sec execute 25 sec latency 20275.881 ms 1 717 0.03 MB/sec execute 26 sec latency 21276.370 ms 1 717 0.03 MB/sec execute 27 sec latency 22285.717 ms 1 717 0.03 MB/sec execute 28 sec latency 23286.941 ms 1 717 0.03 MB/sec execute 29 sec latency 24293.268 ms Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 1 717 0.03 MB/sec execute 30 sec latency 25296.721 ms 1 717 0.03 MB/sec execute 31 sec latency 26297.803 ms 1 717 0.03 MB/sec execute 32 sec latency 27299.353 ms 1 717 0.03 MB/sec execute 33 sec latency 28302.724 ms 1 717 0.03 MB/sec execute 34 sec latency 29306.971 ms 1 717 0.02 MB/sec execute 35 sec latency 30307.272 ms 1 717 0.02 MB/sec execute 36 sec latency 31307.535 ms 1 
717 0.02 MB/sec execute 37 sec latency 32308.538 ms 1 717 0.02 MB/sec execute 38 sec latency 33313.684 ms 1 717 0.02 MB/sec execute 39 sec latency 34321.247 ms 1 717 0.02 MB/sec execute 40 sec latency 35323.782 ms 1 717 0.02 MB/sec execute 41 sec latency 36324.228 ms 1 717 0.02 MB/sec execute 42 sec latency 37324.802 ms 1 717 0.02 MB/sec execute 43 sec latency 38325.066 ms oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all 1 717 0.02 MB/sec execute 44 sec latency 39327.835 ms 1 717 0.02 MB/sec execute 45 sec latency 40335.712 ms pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 1 717 0.02 MB/sec execute 46 sec latency 41336.584 ms 1 717 0.02 MB/sec execute 47 sec latency 42340.002 ms Started lustre-MDT0000 04:54:09 (1743497649) targets are mounted 04:54:09 (1743497649) facet_failover done 1 717 0.02 MB/sec execute 48 sec latency 43344.706 ms 1 717 0.02 MB/sec execute 49 sec latency 44345.001 ms 1 717 0.02 MB/sec execute 50 sec latency 45345.902 ms 1 717 0.02 MB/sec execute 51 sec latency 46346.146 ms 1 721 0.02 MB/sec execute 52 sec latency 46528.900 ms 1 730 0.02 MB/sec execute 53 sec latency 402.679 ms 1 761 0.02 MB/sec execute 54 sec latency 196.044 ms 1 780 0.02 MB/sec execute 55 sec latency 199.179 ms oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid 1 796 0.02 MB/sec execute 56 sec latency 398.324 ms 1 809 0.02 MB/sec execute 57 sec latency 230.784 ms 1 851 0.02 MB/sec execute 58 sec latency 138.377 ms 1 873 0.02 MB/sec execute 59 sec latency 366.331 ms 1 885 0.02 MB/sec execute 60 sec latency 640.983 ms mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec 1 905 0.02 MB/sec execute 61 sec latency 232.287 ms 1 921 0.02 MB/sec execute 62 sec latency 389.543 ms 1 954 0.02 MB/sec execute 63 sec latency 201.327 ms 1 970 0.02 MB/sec execute 64 sec latency 155.187 ms 1 981 0.02 MB/sec execute 65 sec latency 677.976 ms 1 995 0.02 MB/sec execute 66 sec 
latency 230.040 ms tar: Unexpected EOF in archive tar: Unexpected EOF in archive 1 1018 0.03 MB/sec execute 67 sec latency 236.939 ms 1 1074 0.04 MB/sec execute 68 sec latency 105.637 ms 1 1126 0.04 MB/sec execute 69 sec latency 99.080 ms 1 1147 0.04 MB/sec execute 70 sec latency 244.856 ms 1 1177 0.04 MB/sec execute 71 sec latency 350.398 ms 1 1219 0.04 MB/sec execute 72 sec latency 89.600 ms 1 1247 0.04 MB/sec execute 73 sec latency 122.828 ms 1 1278 0.04 MB/sec execute 74 sec latency 110.934 ms 1 1391 0.07 MB/sec execute 75 sec latency 149.892 ms 1 1418 0.07 MB/sec execute 76 sec latency 207.494 ms 1 1462 0.07 MB/sec execute 77 sec latency 89.564 ms 1 1496 0.07 MB/sec execute 78 sec latency 395.710 ms 1 1538 0.07 MB/sec execute 79 sec latency 111.198 ms tar: Error is not recoverable: exiting now dbench killed by signal 15 stopping dbench on /mnt/lustre at Tue Apr 1 04:54:41 EDT 2025 with return code 0 1 1576 0.07 MB/sec execute 80 sec latency 76.980 ms 45307 pts/0 S+ 0:00 dbench -c client.txt 1 -D /mnt/lustre2/d26.replay-dual/run_dbench -t 100 killed dbench main pid 45307 clean dbench files on /mnt/lustre /mnt/lustre /mnt/lustre removed 'client.txt' /mnt/lustre dbench successfully finished PASS 26 (309s) == replay-dual test 28: lock replay should be ordered: waiting after granted ========================================================== 04:55:28 (1743497728) sleep 5 for ZFS MDS Waiting for MDT destroys to complete 1+0 records in 1+0 records out 4096 bytes (4.1 kB, 4.0 KiB) copied, 0.0181102 s, 226 kB/s fail_loc=0x80000324 fail_loc=0x32a Failing ost1 on oleg626-server Stopping /mnt/lustre-ost1 (opts:) on oleg626-server 04:55:52 (1743497752) shut down facet: ost1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover ost1 to oleg626-server mount facets: ost1 Starting ost1: -o localrecov lustre-ost1/ost1 /mnt/lustre-ost1 1+0 records in 1+0 records out 4096 bytes (4.1 kB, 4.0 KiB) copied, 0.0271061 s, 151 kB/s 
seq.cli-lustre-OST0000-super.width=65536 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-OST0000 04:56:24 (1743497784) targets are mounted 04:56:24 (1743497784) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid in FULL state after 0 sec PASS 28 (83s) == replay-dual test 29: replay vs update with the same xid ========================================================== 04:56:51 (1743497811) SKIP: replay-dual test_29 needs >= 2 MDTs SKIP 29 (6s) == replay-dual test 30: layout lock replay is not blocked on IO ========================================================== 04:56:57 (1743497817) 10+0 records in 10+0 records out 40960 bytes (41 kB, 40 KiB) copied, 0.0346213 s, 1.2 MB/s 10+0 records in 10+0 records out 40960 bytes (41 kB, 40 KiB) copied, 0.0387908 s, 1.1 MB/s fail_loc=0x32e fail_val=4 Failing mds1 on oleg626-server Stopping /mnt/lustre-mds1 (opts:) on oleg626-server 04:57:08 (1743497828) shut down facet: mds1 facet_host: oleg626-server facet_failover_host: oleg626-server Failover mds1 to oleg626-server mount facets: mds1 Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-MDT0000 04:57:45 (1743497865) targets are mounted 04:57:45 (1743497865) facet_failover done 160+0 records in 160+0 records out 81920 bytes (82 kB, 80 KiB) copied, 44.9882 s, 1.8 kB/s oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec PASS 30 (72s) == replay-dual test 31: deadlock on file_remove_privs and occupied mod rpc slots 
========================================================== 04:58:09 (1743497889) Failing ost1 on oleg626-server Stopping /mnt/lustre-ost1 (opts:) on oleg626-server Creating to objid 2113 on ost lustre-OST0000... total: 32 open/close in 1.71 seconds: 18.71 ops/second 04:58:23 (1743497903) shut down facet: ost1 facet_host: oleg626-server facet_failover_host: oleg626-server at_max=0 fail_loc=0x80001420 file /mnt/lustre2/d31.replay-dual/mdtdir/f31.replay-dual is not ready, wait 0.5 second... file /mnt/lustre2/d31.replay-dual/mdtdir/f31.replay-dual is ready Failover ost1 to oleg626-server mount facets: ost1 Starting ost1: -o localrecov lustre-ost1/ost1 /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg626-server: oleg626-server.virtnet: executing set_default_debug -1 all pdsh@oleg626-client: oleg626-server: ssh exited with exit code 1 Started lustre-OST0000 04:58:55 (1743497935) targets are mounted 04:58:55 (1743497935) facet_failover done oleg626-client.virtnet: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid in FULL IDLE state after 0 sec pids: 50257 50259 50264 50265 50266 50267 50268 50269 50270 UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 2210560 3584 2204928 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3771392 3072 3766272 1% /mnt/lustre[OST:1] filesystem_summary: 7542784 6144 7532544 1% /mnt/lustre UUID Inodes IUsed IFree IUse% Mounted on lustre-MDT0000_UUID 496504 393 496111 1% /mnt/lustre[MDT:0] lustre-OST0000_UUID 118187 427 117760 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 118153 393 117760 1% /mnt/lustre[OST:1] filesystem_summary: 235913 393 235520 1% /mnt/lustre at_max=600 PASS 31 (71s) == replay-dual test 32: gap in update llog shouldn't break recovery ========================================================== 04:59:20 (1743497960) SKIP: replay-dual test_32 needs >= 2 
MDTs SKIP 32 (6s) == replay-dual test 33: Check for OBD_INCOMPAT_MULTI_RPCS in last_rcvd after abort_recovery ========================================================== 04:59:27 (1743497967) SKIP: replay-dual test_33 ldiskfs only test SKIP 33 (6s) == replay-dual test complete, duration 4503 sec ========== 04:59:33 (1743497973) === replay-dual: start cleanup 04:59:36 (1743497976) === Stopping clients: oleg626-client.virtnet /mnt/lustre2 (opts:) Stopping client oleg626-client.virtnet /mnt/lustre2 opts: === replay-dual: finish cleanup 04:59:48 (1743497988) ===