-----============= acceptance-small: sanity-lfsck ============----- Mon Mar 16 09:36:56 EDT 2026
mgs: Rocky Linux release 8.10 (Green Obsidian)
MGS_OS_ID_LIKE=rhel centos fedora rocky
MGS_OS_VERSION_ID=8.10
MGS_OS_ID=rocky
MGS_OS_VERSION_CODE=134873088
mds1: Rocky Linux release 8.10 (Green Obsidian)
MDS1_OS_VERSION_ID=8.10
MDS1_OS_VERSION_CODE=134873088
MDS1_OS_ID_LIKE=rhel centos fedora rocky
MDS1_OS_ID=rocky
ost1: Rocky Linux release 8.10 (Green Obsidian)
OST1_OS_VERSION_CODE=134873088
OST1_OS_ID_LIKE=rhel centos fedora rocky
OST1_OS_VERSION_ID=8.10
OST1_OS_ID=rocky
client: Rocky Linux release 8.10 (Green Obsidian)
CLIENT_OS_ID=rocky
CLIENT_OS_VERSION_CODE=134873088
CLIENT_OS_VERSION_ID=8.10
CLIENT_OS_ID_LIKE=rhel centos fedora rocky
oleg442-server: ls: cannot access '/home/green/git/lustre-release/lustre/tests/except/sanity-lfsck.*ex': No such file or directory
excepting tests: 18b 23b
/home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
MODOPTS_LIBCFS=
Force libcfs to create 2 CPU partitions
loading modules on: 'oleg442-server'
oleg442-server: oleg442-server.virtnet: executing load_modules_local
oleg442-server: Loading modules from /home/green/git/lustre-release/lustre
oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
oleg442-server: detected 4 online CPUs by sysfs
oleg442-server: MODOPTS_LIBCFS=
oleg442-server: Force libcfs to create 2 CPU partitions
client=34681601 MDS=34681601 OSS=34681601
Stopping clients: oleg442-client.virtnet /mnt/lustre (opts:)
Stopping client oleg442-client.virtnet /mnt/lustre opts:
Stopping clients: oleg442-client.virtnet /mnt/lustre2 (opts:)
Stopping /mnt/lustre-mds1 (opts:-f) on oleg442-server
Stopping /mnt/lustre-mds2 (opts:-f) on oleg442-server
Stopping /mnt/lustre-ost1 (opts:-f) on oleg442-server
Stopping /mnt/lustre-ost2 (opts:-f) on oleg442-server
unloading modules via unload_modules_local on: 'oleg442-server'
oleg442-server: oleg442-server.virtnet: executing unload_modules_local
oleg442-server: modules unloaded.
=== sanity-lfsck: start setup 09:37:49 (1773668269) ===
Stopping clients: oleg442-client.virtnet /mnt/lustre (opts:-f)
Stopping clients: oleg442-client.virtnet /mnt/lustre2 (opts:-f)
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 2
oleg442-server: oleg442-server.virtnet: executing set_hostid
/home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
MODOPTS_LIBCFS=
Force libcfs to create 2 CPU partitions
ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1'
quota/lquota options: 'hash_lqs_cur_bits=3'
mdt/mdt options: 'mdt_enable_flr_ec=1'
ln: failed to create symbolic link '/sbin/.libs': Read-only file system
loading modules on: 'oleg442-server'
oleg442-server: oleg442-server.virtnet: executing load_modules_local
oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
oleg442-server: Loading modules from /home/green/git/lustre-release/lustre
oleg442-server: detected 4 online CPUs by sysfs
oleg442-server: MODOPTS_LIBCFS=
oleg442-server: Force libcfs to create 2 CPU partitions
oleg442-server: ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1'
oleg442-server: quota/lquota options: 'hash_lqs_cur_bits=3'
oleg442-server: mdt/mdt options: 'mdt_enable_flr_ec=1'
Formatting mgs, mds, osts
Format mds1: /dev/vdc
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Format mds2: /dev/vdd
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Format ost1: /dev/vde
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Format ost2: /dev/vdf
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Checking servers environments
Checking clients oleg442-client.virtnet environments
/home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
MODOPTS_LIBCFS=
Force libcfs to create 2 CPU partitions
loading modules on: 'oleg442-server'
oleg442-server: oleg442-server.virtnet: executing load_modules_local
oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
oleg442-server: Loading modules from /home/green/git/lustre-release/lustre
oleg442-server: detected 4 online CPUs by sysfs
oleg442-server: MODOPTS_LIBCFS=
oleg442-server: Force libcfs to create 2 CPU partitions
Setup mgs, mdt, osts
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Start mds1: mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Commit the device label on /dev/vdc
Started lustre-MDT0000
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Start mds2: mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Commit the device label on /dev/vdd
Started lustre-MDT0001
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Start ost1: mount -t lustre -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1
seq.cli-lustre-OST0000-super.width=65536
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Commit the device label on /dev/vde
Started lustre-OST0000
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Start ost2: mount -t lustre -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2
seq.cli-lustre-OST0001-super.width=65536
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Commit the device label on /dev/vdf
Started lustre-OST0001
Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
Starting client oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
Started clients oleg442-client.virtnet:
192.168.204.142@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose)
Using TIMEOUT=20
osc.lustre-OST0000-osc-ffff9cb08738d000.idle_timeout=debug
osc.lustre-OST0001-osc-ffff9cb08738d000.idle_timeout=debug
setting jobstats to procname_uid
Setting lustre.sys.jobid_var from disable to procname_uid
Waiting 90s for 'procname_uid'
Updated after 5s: want 'procname_uid' got 'procname_uid'
disable quota as required
osd-ldiskfs.track_declares_assert=1
=== sanity-lfsck: finish setup 09:39:20 (1773668360) ===
== sanity-lfsck test 0: Control LFSCK manually =========== 09:39:21 (1773668361)
preparing... 3 * 3 files will be created Mon Mar 16 09:39:21 EDT 2026.
total: 3 mkdir in 0.01 seconds: 597.25 ops/second
total: 3 create in 0.00 seconds: 651.32 ops/second
total: 3 mkdir in 0.00 seconds: 760.43 ops/second
prepared Mon Mar 16 09:39:23 EDT 2026.
fail_val=3
fail_loc=0x1600
Started LFSCK on the device lustre-MDT0000: scrub namespace
name: lfsck_namespace
magic: 0xa06249ff
version: 2
status: scanning-phase1
flags:
param:
last_completed_time: N/A
time_since_last_completed: N/A
latest_start_time: 1773668363
time_since_latest_start: 1 seconds
last_checkpoint_time: N/A
time_since_last_checkpoint: N/A
latest_start_position: 13, N/A, N/A
last_checkpoint_position: N/A, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 0
checked_phase2: 0
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 0
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 0
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
linkea_overflow_cleared: 0
agent_entries_repaired: 0
success_count: 0
run_time_phase1: 0 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 0 items/sec
average_speed_phase2: N/A
average_speed_total: 0 items/sec
real_time_speed_phase1: 0 items/sec
real_time_speed_phase2: N/A
current_position: 12, N/A, N/A
Stopped LFSCK on the device lustre-MDT0000.
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0
fail_val=0
Waiting 32s for 'completed'
Updated after 3s: want 'completed' got 'completed'
Started LFSCK on the device lustre-MDT0000: scrub namespace
stopall, should NOT crash LU-3649
Stopping clients: oleg442-client.virtnet /mnt/lustre (opts:)
Stopping client oleg442-client.virtnet /mnt/lustre opts:
Stopping clients: oleg442-client.virtnet /mnt/lustre2 (opts:)
Stopping /mnt/lustre-mds1 (opts:-f) on oleg442-server
Stopping /mnt/lustre-mds2 (opts:-f) on oleg442-server
Stopping /mnt/lustre-ost1 (opts:-f) on oleg442-server
Stopping /mnt/lustre-ost2 (opts:-f) on oleg442-server
PASS 0 (45s)
== sanity-lfsck test 1a: LFSCK can find out and repair crashed FID-in-dirent ========================================================== 09:40:06 (1773668406)
Checking servers environments
Checking clients oleg442-client.virtnet environments
/home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
MODOPTS_LIBCFS=
Force libcfs to create 2 CPU partitions
loading modules on: 'oleg442-server'
oleg442-server: oleg442-server.virtnet: executing load_modules_local
oleg442-server: Loading modules from /home/green/git/lustre-release/lustre
oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
oleg442-server: detected 4 online CPUs by sysfs
oleg442-server: MODOPTS_LIBCFS=
oleg442-server: Force libcfs to create 2 CPU partitions
Setup mgs, mdt, osts
Start mds1: mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Started lustre-MDT0000
Start mds2: mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Started lustre-MDT0001
Start ost1: mount -t lustre -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1
seq.cli-lustre-OST0000-super.width=65536
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Started lustre-OST0000
Start ost2: mount -t lustre -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2
seq.cli-lustre-OST0001-super.width=65536
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Started lustre-OST0001
Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
Starting client oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
Started clients oleg442-client.virtnet:
192.168.204.142@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose)
Using TIMEOUT=20
osc.lustre-OST0000-osc-ffff9cb084d03800.idle_timeout=debug
osc.lustre-OST0001-osc-ffff9cb084d03800.idle_timeout=debug
disable quota as required
preparing... 1 * 1 files will be created Mon Mar 16 09:40:53 EDT 2026.
total: 1 mkdir in 0.00 seconds: 340.03 ops/second
total: 1 create in 0.00 seconds: 319.81 ops/second
total: 1 mkdir in 0.00 seconds: 269.82 ops/second
prepared Mon Mar 16 09:40:55 EDT 2026.
fail_loc=0x1501
fail_loc=0
192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0
Stopping client oleg442-client.virtnet /mnt/lustre (opts:)
Started LFSCK on the device lustre-MDT0000: scrub namespace
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
fail_loc=0x1505
fail_loc=0
PASS 1a (67s)
== sanity-lfsck test 1b: LFSCK can find out and repair the missing FID-in-LMA ========================================================== 09:41:13 (1773668473)
preparing... 1 * 1 files will be created Mon Mar 16 09:41:14 EDT 2026.
total: 1 mkdir in 0.00 seconds: 237.42 ops/second
total: 1 create in 0.00 seconds: 374.39 ops/second
total: 1 mkdir in 0.00 seconds: 405.36 ops/second
prepared Mon Mar 16 09:41:16 EDT 2026.
fail_loc=0x1502
fail_loc=0
192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0
Stopping client oleg442-client.virtnet /mnt/lustre (opts:)
fail_loc=0x1506
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
fail_loc=0x1505
fail_loc=0
PASS 1b (23s)
== sanity-lfsck test 1c: LFSCK can find out and repair lost FID-in-dirent ========================================================== 09:41:37 (1773668497)
preparing... 1 * 1 files will be created Mon Mar 16 09:41:38 EDT 2026.
total: 1 mkdir in 0.00 seconds: 308.84 ops/second
total: 1 create in 0.00 seconds: 327.76 ops/second
total: 1 mkdir in 0.00 seconds: 294.81 ops/second
prepared Mon Mar 16 09:41:40 EDT 2026.
fail_loc=0x1504
fail_loc=0
192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0
Stopping client oleg442-client.virtnet /mnt/lustre (opts:)
Started LFSCK on the device lustre-MDT0000: scrub namespace
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
fail_loc=0x1505
fail_loc=0
PASS 1c (21s)
== sanity-lfsck test 2a: LFSCK can find out and repair crashed linkEA entry ========================================================== 09:41:58 (1773668518)
preparing... 1 * 1 files will be created Mon Mar 16 09:41:59 EDT 2026.
total: 1 mkdir in 0.00 seconds: 403.41 ops/second
total: 1 create in 0.00 seconds: 505.03 ops/second
total: 1 mkdir in 0.00 seconds: 426.64 ops/second
prepared Mon Mar 16 09:42:00 EDT 2026.
fail_loc=0x1603
fail_loc=0
192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0
Stopping client oleg442-client.virtnet /mnt/lustre (opts:)
Started LFSCK on the device lustre-MDT0000: scrub namespace
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
PASS 2a (19s)
== sanity-lfsck test 2b: LFSCK can find out and remove invalid linkEA entry ========================================================== 09:42:17 (1773668537)
preparing... 1 * 1 files will be created Mon Mar 16 09:42:18 EDT 2026.
total: 1 mkdir in 0.01 seconds: 197.00 ops/second
total: 1 create in 0.00 seconds: 428.65 ops/second
total: 1 mkdir in 0.00 seconds: 255.03 ops/second
prepared Mon Mar 16 09:42:20 EDT 2026.
fail_loc=0x1604
fail_loc=0
192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0
Stopping client oleg442-client.virtnet /mnt/lustre (opts:)
Started LFSCK on the device lustre-MDT0000: scrub namespace
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
PASS 2b (30s)
== sanity-lfsck test 2c: LFSCK can find out and remove repeated linkEA entry ========================================================== 09:42:47 (1773668567)
preparing... 1 * 1 files will be created Mon Mar 16 09:42:49 EDT 2026.
total: 1 mkdir in 0.01 seconds: 87.89 ops/second
total: 1 create in 0.00 seconds: 223.22 ops/second
total: 1 mkdir in 0.00 seconds: 212.13 ops/second
prepared Mon Mar 16 09:42:53 EDT 2026.
fail_loc=0x1605
fail_loc=0
192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0
Stopping client oleg442-client.virtnet /mnt/lustre (opts:)
Started LFSCK on the device lustre-MDT0000: scrub namespace
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
PASS 2c (37s)
== sanity-lfsck test 2d: LFSCK can recover the missing linkEA entry ========================================================== 09:43:24 (1773668604)
preparing... 1 * 1 files will be created Mon Mar 16 09:43:26 EDT 2026.
total: 1 mkdir in 0.00 seconds: 207.77 ops/second
total: 1 create in 0.00 seconds: 292.06 ops/second
total: 1 mkdir in 0.01 seconds: 157.23 ops/second
prepared Mon Mar 16 09:43:29 EDT 2026.
fail_loc=0x161d
fail_loc=0
192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0
Stopping client oleg442-client.virtnet /mnt/lustre (opts:)
Started LFSCK on the device lustre-MDT0000: scrub namespace
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
PASS 2d (28s)
== sanity-lfsck test 2e: namespace LFSCK can verify remote object linkEA ========================================================== 09:43:52 (1773668632)
fail_loc=0x1603
fail_loc=0
Started LFSCK on the device lustre-MDT0000: scrub namespace
PASS 2e (7s)
== sanity-lfsck test 3: LFSCK can verify multiple-linked objects ========================================================== 09:43:59 (1773668639)
preparing... 4 * 4 files will be created Mon Mar 16 09:44:00 EDT 2026.
total: 4 mkdir in 0.02 seconds: 213.74 ops/second
total: 4 create in 0.02 seconds: 201.92 ops/second
total: 4 mkdir in 0.01 seconds: 283.00 ops/second
prepared Mon Mar 16 09:44:03 EDT 2026.
fail_loc=0x1603
fail_loc=0x1604
fail_loc=0
Started LFSCK on the device lustre-MDT0000: scrub namespace
PASS 3 (11s)
== sanity-lfsck test 4: FID-in-dirent can be rebuilt after MDT file-level backup/restore ========================================================== 09:44:10 (1773668650)
preparing... 3 * 3 files will be created Mon Mar 16 09:44:12 EDT 2026.
total: 3 mkdir in 0.02 seconds: 170.36 ops/second
total: 3 create in 0.02 seconds: 195.52 ops/second
total: 3 mkdir in 0.01 seconds: 231.30 ops/second
prepared Mon Mar 16 09:44:16 EDT 2026.
Stopping clients: oleg442-client.virtnet /mnt/lustre (opts:)
Stopping client oleg442-client.virtnet /mnt/lustre opts:
file-level backup/restore on mds1:/dev/mapper/mds1_flakey
backup data
reformat new device
Format mds1: /dev/mapper/mds1_flakey
restore data
remove recovery logs
removed '/mnt/lustre-brpt/CATALOGS'
start mds1 with disabling OI scrub
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
fail_val=1
fail_loc=0x1601
Started LFSCK on the device lustre-MDT0000: scrub namespace
Waiting 32s for 'inconsistent'
Updated after 2s: want 'inconsistent' got 'inconsistent'
fail_loc=0
fail_val=0
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
fail_loc=0x1505
fail_loc=0
PASS 4 (78s)
== sanity-lfsck test 5: LFSCK can handle IGIF object upgrading ========================================================== 09:45:28 (1773668728)
preparing... 1 * 1 files will be created Mon Mar 16 09:45:29 EDT 2026.
fail_loc=0x1504
total: 1 mkdir in 0.00 seconds: 244.44 ops/second
total: 1 create in 0.00 seconds: 224.11 ops/second
total: 1 mkdir in 0.01 seconds: 154.97 ops/second
fail_loc=0
prepared Mon Mar 16 09:45:33 EDT 2026.
Stopping clients: oleg442-client.virtnet /mnt/lustre (opts:)
Stopping client oleg442-client.virtnet /mnt/lustre opts:
file-level backup/restore on mds1:/dev/mapper/mds1_flakey
backup data
reformat new device
Format mds1: /dev/mapper/mds1_flakey
restore data
remove recovery logs
removed '/mnt/lustre-brpt/CATALOGS'
start mds1 with disabling OI scrub
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
fail_val=1
fail_loc=0x1601
Started LFSCK on the device lustre-MDT0000: scrub namespace
Waiting 32s for 'inconsistent,upgrade'
Updated after 6s: want 'inconsistent,upgrade' got 'inconsistent,upgrade'
fail_loc=0
fail_val=0
Waiting 32s for 'completed'
Updated after 2s: want 'completed' got 'completed'
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
fail_loc=0x1505
fail_loc=0
PASS 5 (55s)
== sanity-lfsck test 6a: LFSCK resumes from last checkpoint (1) ========================================================== 09:46:23 (1773668783)
preparing... 5 * 5 files will be created Mon Mar 16 09:46:25 EDT 2026.
total: 5 mkdir in 0.04 seconds: 119.61 ops/second
total: 5 create in 0.03 seconds: 144.77 ops/second
total: 5 mkdir in 0.03 seconds: 181.74 ops/second
prepared Mon Mar 16 09:46:27 EDT 2026.
fail_val=1
fail_loc=0x1600
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0x80001608
fail_val=1
fail_loc=0x1600
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0
fail_val=0
PASS 6a (17s)
== sanity-lfsck test 6b: LFSCK resumes from last checkpoint (2) ========================================================== 09:46:40 (1773668800)
preparing... 5 * 5 files will be created Mon Mar 16 09:46:42 EDT 2026.
total: 5 mkdir in 0.02 seconds: 229.98 ops/second
total: 5 create in 0.03 seconds: 152.95 ops/second
total: 5 mkdir in 0.03 seconds: 185.89 ops/second
prepared Mon Mar 16 09:46:45 EDT 2026.
fail_val=1
fail_loc=0x1601
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0x80001609
fail_val=1
fail_loc=0x1601
Started LFSCK on the device lustre-MDT0000: scrub namespace
Additional debug for 6b
name: lfsck_namespace
magic: 0xa06249ff
version: 2
status: scanning-phase1
flags:
param:
last_completed_time: 1773668797
time_since_last_completed: 20 seconds
latest_start_time: 1773668815
time_since_latest_start: 2 seconds
last_checkpoint_time: 1773668812
time_since_last_checkpoint: 5 seconds
latest_start_position: 20102, [0x2000061c1:0x78:0x0], 0x3ac86768a5e609c
last_checkpoint_position: 20099, [0x2000061c1:0x78:0x0], 0x2e817831c194fb1
first_failure_position: N/A, N/A, N/A
checked_phase1: 5
checked_phase2: 0
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 2
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 0
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
linkea_overflow_cleared: 0
agent_entries_repaired: 0
success_count: 14
run_time_phase1: 8 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 0 items/sec
average_speed_phase2: N/A
average_speed_total: 0 items/sec
real_time_speed_phase1: 0 items/sec
real_time_speed_phase2: N/A
current_position: 20101, [0x2000061c1:0x78:0x0], 0x3ac86768a5e609c
fail_loc=0
fail_val=0
Waiting 32s for 'completed'
Updated after 2s: want 'completed' got 'completed'
PASS 6b (22s)
== sanity-lfsck test 7a: non-stopped LFSCK should auto restarts after MDS remount (1) ========================================================== 09:47:02 (1773668822)
preparing... 5 * 5 files will be created Mon Mar 16 09:47:04 EDT 2026.
total: 5 mkdir in 0.03 seconds: 164.43 ops/second
total: 5 create in 0.03 seconds: 193.63 ops/second
total: 5 mkdir in 0.05 seconds: 108.11 ops/second
prepared Mon Mar 16 09:47:07 EDT 2026.
192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0
Stopping client oleg442-client.virtnet /mnt/lustre (opts:)
fail_val=1
fail_loc=0x1601
Started LFSCK on the device lustre-MDT0000: scrub namespace
stop mds1
fail_loc=0
fail_val=0
start mds1
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Waiting 30s for 'completed'
PASS 7a (26s)
== sanity-lfsck test 7b: non-stopped LFSCK should auto restarts after MDS remount (2) ========================================================== 09:47:28 (1773668848)
Checking servers environments
Checking clients oleg442-client.virtnet environments
/home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
MODOPTS_LIBCFS=
Force libcfs to create 2 CPU partitions
loading modules on: 'oleg442-server'
oleg442-server: oleg442-server.virtnet: executing load_modules_local
oleg442-server: Loading modules from /home/green/git/lustre-release/lustre
oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
oleg442-server: detected 4 online CPUs by sysfs
oleg442-server: MODOPTS_LIBCFS=
oleg442-server: Force libcfs to create 2 CPU partitions
Setup mgs, mdt, osts
Start mds1: mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
oleg442-server: mount.lustre: according to /etc/mtab /dev/mapper/mds1_flakey is already mounted on /mnt/lustre-mds1
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 17
Start of /dev/mapper/mds1_flakey on mds1 failed 17
Start mds2: mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
oleg442-server: mount.lustre: according to /etc/mtab /dev/mapper/mds2_flakey is already mounted on /mnt/lustre-mds2
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 17
Start of /dev/mapper/mds2_flakey on mds2 failed 17
Start ost1: mount -t lustre -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1
oleg442-server: mount.lustre: according to /etc/mtab /dev/mapper/ost1_flakey is already mounted on /mnt/lustre-ost1
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 17
seq.cli-lustre-OST0000-super.width=65536
Start of /dev/mapper/ost1_flakey on ost1 failed 17
Start ost2: mount -t lustre -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2
oleg442-server: mount.lustre: according to /etc/mtab /dev/mapper/ost2_flakey is already mounted on /mnt/lustre-ost2
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 17
seq.cli-lustre-OST0001-super.width=65536
Start of /dev/mapper/ost2_flakey on ost2 failed 17
Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
Starting client oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
Started clients oleg442-client.virtnet:
192.168.204.142@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose)
Using TIMEOUT=20
osc.lustre-OST0000-osc-ffff9cb090214800.idle_timeout=debug
osc.lustre-OST0001-osc-ffff9cb090214800.idle_timeout=debug
disable quota as required
preparing... 2 * 2 files will be created Mon Mar 16 09:48:07 EDT 2026.
total: 2 mkdir in 0.01 seconds: 211.52 ops/second
total: 2 create in 0.01 seconds: 330.42 ops/second
total: 2 mkdir in 0.01 seconds: 207.25 ops/second
prepared Mon Mar 16 09:48:10 EDT 2026.
fail_loc=0x1604
fail_val=1
fail_loc=0x1602
Started LFSCK on the device lustre-MDT0000: scrub namespace
192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0
Stopping client oleg442-client.virtnet /mnt/lustre (opts:)
stop mds1
fail_loc=0
fail_val=0
start mds1
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Waiting 30s for 'completed'
Updated after 2s: want 'completed' got 'completed'
PASS 7b (60s)
== sanity-lfsck test 8: LFSCK state machine ============== 09:48:28 (1773668908)
formatall
oleg442-server: oleg442-server.virtnet: executing set_hostid
/home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
oleg442-server: oleg442-server.virtnet: executing load_modules_local
oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
setupall
/home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
oleg442-server: oleg442-server.virtnet: executing load_modules_local
oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Using TIMEOUT=20
preparing... 20 * 20 files will be created Mon Mar 16 09:49:58 EDT 2026.
total: 20 mkdir in 0.07 seconds: 268.00 ops/second
total: 20 create in 0.05 seconds: 377.41 ops/second
total: 20 mkdir in 0.06 seconds: 314.60 ops/second
prepared Mon Mar 16 09:50:01 EDT 2026.
fail_loc=0x1603
fail_loc=0x1604
192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0
Stopping client oleg442-client.virtnet /mnt/lustre (opts:)
fail_val=2
fail_loc=0x1601
Started LFSCK on the device lustre-MDT0000: scrub namespace
Stopped LFSCK on the device lustre-MDT0000.
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0x80001609
Waiting 32s for 'failed'
fail_loc=0x1600
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0x160a
stop mds1
fail_loc=0x160b
start mds1
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
fail_loc=0x1601
Started LFSCK on the device lustre-MDT0000: scrub namespace
stop mds1
fail_loc=0x160b
start mds1
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
stop mds1
start mds1 without resume LFSCK
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
fail_val=2
fail_loc=0x1602
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0
fail_val=0
Waiting 32s for 'completed'
PASS 8 (142s)
== sanity-lfsck test 9a: LFSCK speed control (1) ========= 09:50:51 (1773669051)
Checking servers environments
Checking clients oleg442-client.virtnet environments
/home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
MODOPTS_LIBCFS=
Force libcfs to create 2 CPU partitions
loading modules on: 'oleg442-server'
oleg442-server: oleg442-server.virtnet: executing load_modules_local
oleg442-server: Loading modules from /home/green/git/lustre-release/lustre
oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
oleg442-server: detected 4 online CPUs by sysfs
oleg442-server: MODOPTS_LIBCFS=
oleg442-server: Force libcfs to create 2 CPU partitions
Setup mgs, mdt, osts
Start mds1: mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
oleg442-server: mount.lustre: according to /etc/mtab /dev/mapper/mds1_flakey is already mounted on /mnt/lustre-mds1
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 17
Start of /dev/mapper/mds1_flakey on mds1 failed 17
Start mds2: mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
oleg442-server: mount.lustre: according to /etc/mtab /dev/mapper/mds2_flakey is already mounted on /mnt/lustre-mds2
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 17
Start of /dev/mapper/mds2_flakey on mds2 failed 17
Start ost1: mount -t lustre -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1
oleg442-server: mount.lustre: according to /etc/mtab /dev/mapper/ost1_flakey is already mounted on /mnt/lustre-ost1
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 17
seq.cli-lustre-OST0000-super.width=65536
Start of /dev/mapper/ost1_flakey on ost1 failed 17
Start ost2: mount -t lustre -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2
oleg442-server: mount.lustre: according to /etc/mtab /dev/mapper/ost2_flakey is already mounted on /mnt/lustre-ost2
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 17
seq.cli-lustre-OST0001-super.width=65536
Start of /dev/mapper/ost2_flakey on ost2 failed 17
Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
Starting client oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
Started clients oleg442-client.virtnet:
192.168.204.142@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose)
Using TIMEOUT=20
osc.lustre-OST0000-osc-ffff9cb086efd000.idle_timeout=debug
osc.lustre-OST0001-osc-ffff9cb086efd000.idle_timeout=debug
disable quota as required
- open/close 1428 (time 1773669099.79 total 10.00 last 142.73)
- open/close 2813 (time 1773669109.79 total 20.01 last 138.49)
- open/close 4264 (time 1773669119.79 total 30.01 last 145.00)
total: 5000 open/close in 35.09 seconds: 142.50 ops/second
Started LFSCK on the device lustre-MDT0000: scrub layout
PASS 9a (100s)
== sanity-lfsck test 9b: LFSCK speed control (2) ========= 09:52:31 (1773669151)
preparing... 0 * 0 files will be created Mon Mar 16 09:52:50 EDT 2026.
prepared Mon Mar 16 09:52:52 EDT 2026.
Preparing another 50 * 50 files (with error) at Mon Mar 16 09:52:52 EDT 2026.
fail_loc=0x1604
total: 50 mkdir in 0.17 seconds: 299.04 ops/second
total: 50 create in 0.17 seconds: 299.93 ops/second
fail_loc=0x160c
Started LFSCK on the device lustre-MDT0000: scrub namespace
Waiting 10s for 'stopped'
Updated after 3s: want 'stopped' got 'stopped'
fail_loc=0
Prepared at Mon Mar 16 09:53:18 EDT 2026.
Started LFSCK on the device lustre-MDT0000: scrub namespace
PASS 9b (85s)
== sanity-lfsck test 10: System is available during LFSCK scanning ========================================================== 09:53:57 (1773669237)
preparing... 1 * 1 files will be created Mon Mar 16 09:54:43 EDT 2026.
total: 1 mkdir in 0.01 seconds: 121.78 ops/second
total: 1 create in 0.01 seconds: 94.85 ops/second
total: 1 mkdir in 0.01 seconds: 198.49 ops/second
prepared Mon Mar 16 09:54:47 EDT 2026.
Preparing more files with error at Mon Mar 16 09:54:47 EDT 2026.
fail_loc=0x1603
fail_loc=0x1604
fail_loc=0
Prepared at Mon Mar 16 09:56:44 EDT 2026.
192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0
Stopping client oleg442-client.virtnet /mnt/lustre (opts:)
Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
Started LFSCK on the device lustre-MDT0000: scrub namespace
Waiting 90s for 'completed'
Waiting 70s for 'completed'
Waiting 60s for 'completed'
Waiting 50s for 'completed'
Waiting 30s for 'completed'
Updated after 62s: want 'completed' got 'completed'
PASS 10 (314s)
== sanity-lfsck test 11a: LFSCK can rebuild lost last_id ========================================================== 09:59:11 (1773669551)
total: 64 open/close in 0.59 seconds: 108.15 ops/second
stopall
remove LAST_ID on ost1: idx=0
removed '/mnt/lustre-ost1/O/0/LAST_ID'
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
fail_val=3
fail_loc=0x160e
trigger LFSCK for layout on ost1 to rebuild the LAST_ID(s)
Started LFSCK on the device lustre-OST0000: scrub layout
fail_val=0
fail_loc=0
Waiting 32s for 'completed'
Updated after 3s: want 'completed' got 'completed'
the LAST_ID(s) should have been rebuilt
PASS 11a (210s)
== sanity-lfsck test 11b: LFSCK can rebuild crashed last_id ========================================================== 10:02:41 (1773669761)
Checking servers environments
Checking clients oleg442-client.virtnet environments
/home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
MODOPTS_LIBCFS=
Force libcfs to create 2 CPU partitions
loading modules on: 'oleg442-server'
oleg442-server: oleg442-server.virtnet: executing load_modules_local
oleg442-server: Loading modules from /home/green/git/lustre-release/lustre
oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
oleg442-server: detected 4 online CPUs by sysfs
oleg442-server: MODOPTS_LIBCFS=
oleg442-server: Force libcfs to create 2 CPU partitions
Setup mgs, mdt, osts
Start mds1: mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Started lustre-MDT0000
Start mds2: mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Started lustre-MDT0001
Start ost1: mount -t lustre -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1
oleg442-server: mount.lustre: according to /etc/mtab /dev/mapper/ost1_flakey is already mounted on /mnt/lustre-ost1
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 17
seq.cli-lustre-OST0000-super.width=65536
Start of /dev/mapper/ost1_flakey on ost1 failed 17
Start ost2: mount -t lustre -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2
seq.cli-lustre-OST0001-super.width=65536
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Started lustre-OST0001
Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
Starting client oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
Started clients oleg442-client.virtnet:
192.168.204.142@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose)
Using TIMEOUT=20
osc.lustre-OST0000-osc-ffff9cb086ef8000.idle_timeout=debug
osc.lustre-OST0001-osc-ffff9cb086ef8000.idle_timeout=debug
disable quota as required
set fail_loc=0x160d to skip the updating LAST_ID on-disk
fail_loc=0x160d
- precreated_ost_obj_count lustre-OST0000-osc-MDT0000 prealloc_last_id: 3297 prealloc_next_id: 3266 count: 32
total: 64 open/close in 0.51 seconds: 124.90 ops/second
192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0
Stopping client oleg442-client.virtnet /mnt/lustre (opts:)
Stopping /mnt/lustre-ost1 (opts:) on oleg442-server
fail_loc=0x215
Start ost1: mount -t lustre -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1
seq.cli-lustre-OST0000-super.width=65536
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Started lustre-OST0000
the on-disk LAST_ID should be smaller than the expected one
trigger LFSCK for layout on ost1 to rebuild the on-disk LAST_ID
Started LFSCK on the device lustre-OST0000: scrub layout
Stopping /mnt/lustre-ost1 (opts:) on oleg442-server
Start ost1: mount -t lustre -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1
seq.cli-lustre-OST0000-super.width=65536
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Started lustre-OST0000
the on-disk LAST_ID should have been rebuilt
fail_loc=0
Stopping clients: oleg442-client.virtnet /mnt/lustre (opts:)
Stopping clients: oleg442-client.virtnet /mnt/lustre2 (opts:)
Stopping /mnt/lustre-mds1 (opts:-f) on oleg442-server
Stopping /mnt/lustre-mds2 (opts:-f) on oleg442-server
Stopping /mnt/lustre-ost1 (opts:-f) on oleg442-server
Stopping /mnt/lustre-ost2 (opts:-f) on oleg442-server
PASS 11b (103s)
== sanity-lfsck test 12a: single command to trigger LFSCK on all devices ========================================================== 10:04:24 (1773669864)
Checking servers environments
Checking clients oleg442-client.virtnet environments
/home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
MODOPTS_LIBCFS=
Force libcfs to create 2 CPU partitions
loading modules on: 'oleg442-server'
oleg442-server: oleg442-server.virtnet: executing load_modules_local
oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
oleg442-server: Loading modules from /home/green/git/lustre-release/lustre
oleg442-server: detected 4 online CPUs by sysfs
oleg442-server: MODOPTS_LIBCFS=
oleg442-server: Force libcfs to create 2 CPU partitions
Setup mgs, mdt, osts
Start mds1: mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Started lustre-MDT0000
Start mds2: mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Started lustre-MDT0001
Start ost1: mount -t lustre -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1
seq.cli-lustre-OST0000-super.width=65536
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Started lustre-OST0000
Start ost2: mount -t lustre -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2
seq.cli-lustre-OST0001-super.width=65536
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Started lustre-OST0001
Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
Starting client oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
Started clients oleg442-client.virtnet:
192.168.204.142@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose)
Using TIMEOUT=20
osc.lustre-OST0000-osc-ffff9cb0835ee000.idle_timeout=debug
osc.lustre-OST0001-osc-ffff9cb0835ee000.idle_timeout=debug
disable quota as required
total: 100 open/close in 0.33 seconds: 306.66 ops/second
total: 100 open/close in 0.29 seconds: 345.51 ops/second
Start namespace LFSCK on all targets by single command (-s 1).
Started LFSCK on the device lustre-MDT0000: scrub namespace
All the LFSCK targets should be in 'scanning-phase1' status.
Stop namespace LFSCK on all targets by single lctl command.
Stopped LFSCK on the device lustre-MDT0000.
All the LFSCK targets should be in 'stopped' status.
Re-start namespace LFSCK on all targets by single command (-s 0).
Started LFSCK on the device lustre-MDT0000: scrub namespace
All the LFSCK targets should be in 'completed' status.
debug=-1
debug_mb=150
debug=-1
debug_mb=150
Start layout LFSCK on all targets by single command (-s 1).
Started LFSCK on the device lustre-MDT0000: scrub layout
All the LFSCK targets should be in 'scanning-phase1' status.
Stop layout LFSCK on all targets by single lctl command.
Stopped LFSCK on the device lustre-MDT0000.
All the LFSCK targets should be in 'stopped' status.
Re-start layout LFSCK on all targets by single command (-s 0).
Started LFSCK on the device lustre-MDT0000: scrub layout
All the LFSCK targets should be in 'completed' status.
debug_mb=21
debug_mb=21
PASS 12a (44s)
== sanity-lfsck test 12b: auto detect Lustre device ====== 10:05:08 (1773669908)
Start LFSCK without '-M' specified.
Started LFSCK on the device lustre-MDT0000: scrub layout namespace
Start layout LFSCK on the node with multipe targets, but not specify '-M'/'-A' option. Should get failure.
oleg442-server: Detect multiple devices on current node. Please specify the device explicitly via '-M' option or '-A' option for all.
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 22
PASS 12b (4s)
== sanity-lfsck test 13: LFSCK can repair crashed lmm_oi ========================================================== 10:05:12 (1773669912)
#####
The lmm_oi in layout EA should be consistent with the MDT-object FID; otherwise, the LFSCK should re-generate the lmm_oi from the MDT-object FID.
#####
Inject failure stub to simulate bad lmm_oi
fail_loc=0x160f
total: 1 open/close in 0.01 seconds: 114.03 ops/second
fail_loc=0
Trigger layout LFSCK to find out the bad lmm_oi and fix them
Started LFSCK on the device lustre-MDT0000: scrub layout
PASS 13 (4s)
== sanity-lfsck test 14a: LFSCK can repair MDT-object with dangling LOV EA reference (1) ========================================================== 10:05:16 (1773669916)
#####
The OST-object referenced by the MDT-object should be there; otherwise, the LFSCK should re-create the missing OST-object.
without '--delay-create-ostobj' option.
#####
Inject failure stub to simulate dangling referenced MDT-object
fail_loc=0x1610
- precreated_ost_obj_count lustre-OST0000-osc-MDT0000 prealloc_last_id: 3457 prealloc_next_id: 3413 count: 45
total: 61 open/close in 0.22 seconds: 277.18 ops/second
touch: setting times of '/mnt/lustre/d14a.sanity-lfsck/guard0': No such file or directory
touch: setting times of '/mnt/lustre/d14a.sanity-lfsck/guard1': No such file or directory
fail_loc=0
debug=-1
debug_mb=150
debug=-1
debug_mb=150
- precreated_ost_obj_count lustre-OST0000-osc-MDT0000 prealloc_last_id: 3521 prealloc_next_id: 3492 count: 30
total: 30 open/close in 0.26 seconds: 113.25 ops/second
'ls' should fail because of dangling referenced MDT-object
Trigger layout LFSCK to find out dangling reference
Started LFSCK on the device lustre-MDT0000: scrub layout
'stat' should fail because of not repair dangling by default
Trigger layout LFSCK to repair dangling reference
Started LFSCK on the device lustre-MDT0000: scrub layout
'stat' should success after layout LFSCK repairing
debug_mb=21
debug_mb=21
stopall to cleanup object cache
setupall
/home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
oleg442-server: oleg442-server.virtnet: executing load_modules_local
oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Using TIMEOUT=20
PASS 14a (107s)
== sanity-lfsck test 14b: LFSCK can repair MDT-object with dangling LOV EA reference (2) ========================================================== 10:07:03 (1773670023)
#####
The OST-object referenced by the MDT-object should be there; otherwise, the LFSCK should re-create the missing OST-object.
with '--delay-create-ostobj' option.
#####
Inject failure stub to simulate dangling referenced MDT-object
fail_loc=0x1610
- precreated_ost_obj_count lustre-OST0000-osc-MDT0000 prealloc_last_id: 3585 prealloc_next_id: 3554 count: 32
total: 63 open/close in 0.47 seconds: 134.92 ops/second
touch: setting times of '/mnt/lustre/d14b.sanity-lfsck/guard': No such file or directory
fail_loc=0
debug=-1
debug_mb=150
debug=-1
debug_mb=150
- precreated_ost_obj_count lustre-OST0000-osc-MDT0000 prealloc_last_id: 3649 prealloc_next_id: 3618 count: 32
total: 32 open/close in 0.40 seconds: 80.71 ops/second
'ls' should fail because of dangling referenced MDT-object
Trigger layout LFSCK to find out dangling reference
Started LFSCK on the device lustre-MDT0000: scrub layout
'stat' should fail because of not repair dangling by default
Trigger layout LFSCK to repair dangling reference
Started LFSCK on the device lustre-MDT0000: scrub layout
'stat' should success after layout LFSCK repairing
debug_mb=21
debug_mb=21
stopall to cleanup object cache
setupall
/home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
oleg442-server: oleg442-server.virtnet: executing load_modules_local
oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Using TIMEOUT=20
PASS 14b (81s)
== sanity-lfsck test 15a: LFSCK can repair unmatched MDT-object/OST-object pairs (1) ========================================================== 10:08:24 (1773670104)
#####
If the OST-object referenced by the MDT-object back points to some non-exist MDT-object, then the LFSCK should repair the OST-object to back point to the right MDT-object.
#####
Inject failure stub to make the OST-object to back point to non-exist MDT-object.
fail_loc=0x1611
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0167685 s, 62.5 MB/s
257+0 records in
257+0 records out
1052672 bytes (1.1 MB, 1.0 MiB) copied, 0.089762 s, 11.7 MB/s
fail_loc=0
Trigger layout LFSCK to find out unmatched pairs and fix them
Started LFSCK on the device lustre-MDT0000: scrub layout
PASS 15a (5s)
== sanity-lfsck test 15b: LFSCK can repair unmatched MDT-object/OST-object pairs (2) ========================================================== 10:08:29 (1773670109)
#####
If the OST-object referenced by the MDT-object back points to other MDT-object that doesn't recognize the OST-object, then the LFSCK should repair it to back point to the right MDT-object (the first one).
#####
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0166196 s, 63.1 MB/s
Inject failure stub to make the OST-object to back point to other MDT-object
fail_loc=0x1612
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0174649 s, 60.0 MB/s
2+0 records in
2+0 records out
2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0385947 s, 54.3 MB/s
fail_loc=0
Trigger layout LFSCK to find out unmatched pairs and fix them
Started LFSCK on the device lustre-MDT0000: scrub layout
PASS 15b (5s)
== sanity-lfsck test 15c: LFSCK can repair unmatched MDT-object/OST-object pairs (3) ========================================================== 10:08:34 (1773670114)
SKIP: sanity-lfsck test_15c MDS newer than 2.7.55, LU-6475
SKIP 15c (1s)
== sanity-lfsck test 15d: LFSCK don't crash upon dir migration failure ========================================================== 10:08:35 (1773670115)
total: 100 open/close in 0.52 seconds: 194.01 ops/second
total: 100 mkdir in 0.28 seconds: 355.75 ops/second
Migrate /mnt/lustre/d15d.sanity-lfsck to MDT1
fail_loc=0x1709
lfs migrate: /mnt/lustre/d15d.sanity-lfsck/s27 migrate failed: Input/output error (5)
lfs migrate: cb_migrate_mdt_fini: error completing migration of /mnt/lustre/d15d.sanity-lfsck: Directory not empty (39)
fail_loc=0
fail_loc=0x1709
fail_loc=0
Started LFSCK on the device lustre-MDT0000: scrub namespace
lfs migrate: cb_migrate_mdt_fini: error completing migration of /mnt/lustre/d15d.sanity-lfsck: Directory not empty (39)
lfs rm_entry: error on ioctl 0xc03066f0 for '*' (3): No such file or directory (2)
debug=0
Stopping clients: oleg442-client.virtnet /mnt/lustre (opts:)
Stopping client oleg442-client.virtnet /mnt/lustre opts:
Stopping clients: oleg442-client.virtnet /mnt/lustre2 (opts:)
Stopping /mnt/lustre-mds1 (opts:-f) on oleg442-server
Stopping /mnt/lustre-mds2 (opts:-f) on oleg442-server
Stopping /mnt/lustre-ost1 (opts:-f) on oleg442-server
Stopping /mnt/lustre-ost2 (opts:-f) on oleg442-server
unloading modules via unload_modules_local on: 'oleg442-server'
oleg442-server: oleg442-server.virtnet: executing unload_modules_local
oleg442-server: modules unloaded.
=== sanity-lfsck: start setup 10:10:02 (1773670202) ===
Stopping clients: oleg442-client.virtnet /mnt/lustre (opts:-f)
Stopping clients: oleg442-client.virtnet /mnt/lustre2 (opts:-f)
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 2
oleg442-server: oleg442-server.virtnet: executing set_hostid
/home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
MODOPTS_LIBCFS=
Force libcfs to create 2 CPU partitions
ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1'
quota/lquota options: 'hash_lqs_cur_bits=3'
mdt/mdt options: 'mdt_enable_flr_ec=1'
ln: failed to create symbolic link '/sbin/.libs': Read-only file system
loading modules on: 'oleg442-server'
oleg442-server: oleg442-server.virtnet: executing load_modules_local
oleg442-server: Loading modules from /home/green/git/lustre-release/lustre
oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
oleg442-server: detected 4 online CPUs by sysfs
oleg442-server: MODOPTS_LIBCFS=
oleg442-server: Force libcfs to create 2 CPU partitions
oleg442-server: ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1'
oleg442-server: quota/lquota options: 'hash_lqs_cur_bits=3'
oleg442-server: mdt/mdt options: 'mdt_enable_flr_ec=1'
Formatting mgs, mds, osts
Format mds1: /dev/vdc
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Format mds2: /dev/vdd
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Format ost1: /dev/vde
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Format ost2: /dev/vdf
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Checking servers environments
Checking clients oleg442-client.virtnet environments
/home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
MODOPTS_LIBCFS=
Force libcfs to create 2 CPU partitions
loading modules on: 'oleg442-server'
oleg442-server: oleg442-server.virtnet: executing load_modules_local
oleg442-server: Loading modules from /home/green/git/lustre-release/lustre
oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy
oleg442-server: detected 4 online CPUs by sysfs
oleg442-server: MODOPTS_LIBCFS=
oleg442-server: Force libcfs to create 2 CPU partitions
Setup mgs, mdt, osts
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Start mds1: mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Commit the device label on /dev/vdc
Started lustre-MDT0000
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Start mds2: mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Commit the device label on /dev/vdd
Started lustre-MDT0001
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Start ost1: mount -t lustre -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1
seq.cli-lustre-OST0000-super.width=65536
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Commit the device label on /dev/vde
Started lustre-OST0000
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Start ost2: mount -t lustre -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2
seq.cli-lustre-OST0001-super.width=65536
oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1
Commit the device label on /dev/vdf
Started lustre-OST0001
Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
Starting client oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre
Started clients oleg442-client.virtnet:
192.168.204.142@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose)
Using TIMEOUT=20
osc.lustre-OST0000-osc-ffff9cb090068000.idle_timeout=debug
osc.lustre-OST0001-osc-ffff9cb090068000.idle_timeout=debug
setting jobstats to procname_uid
Setting lustre.sys.jobid_var from disable to procname_uid
Waiting 90s for 'procname_uid'
Updated after 5s: want 'procname_uid' got 'procname_uid'
disable quota as required
osd-ldiskfs.track_declares_assert=1
=== sanity-lfsck: finish setup 10:11:23 (1773670283) ===
PASS 15d (171s)
== sanity-lfsck test 16: LFSCK can repair inconsistent MDT-object/OST-object owner ========================================================== 10:11:26 (1773670286)
#####
If the OST-object's owner information does not match the owner information stored in the MDT-object, then the LFSCK trust the MDT-object and update the OST-object's owner information.
##### 1+0 records in 1+0 records out 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0208359 s, 50.3 MB/s running as uid/gid/euid/egid 500/500/500/500, groups: 500 [createmany] [-o] [/mnt/lustre/d16.sanity-lfsck/d1/o] [100] total: 100 open/close in 0.55 seconds: 181.99 ops/second Inject failure stub to skip OST-object owner changing fail_loc=0x1613 fail_loc=0 Trigger layout LFSCK to find out inconsistent OST-object owner and fix them Started LFSCK on the device lustre-MDT0000: scrub layout PASS 16 (5s) == sanity-lfsck test 17: LFSCK can repair multiple references ========================================================== 10:11:31 (1773670291) ##### If more than one MDT-objects reference the same OST-object, and the OST-object only recognizes one MDT-object, then the LFSCK should create new OST-objects for such non-recognized MDT-objects. ##### Inject failure stub to make two MDT-objects to refernce the OST-object fail_val=0 fail_loc=0x1614 1+0 records in 1+0 records out 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0151087 s, 69.4 MB/s total: 1 open/close in 0.01 seconds: 102.01 ops/second fail_loc=0 fail_val=0 /mnt/lustre/d17.sanity-lfsck/f0 and /mnt/lustre/d17.sanity-lfsck/guard use the same OST-objects /mnt/lustre/d17.sanity-lfsck/f1 and /mnt/lustre/d17.sanity-lfsck/guard use the same OST-objects Trigger layout LFSCK to find out multiple refenced MDT-objects and fix them Started LFSCK on the device lustre-MDT0000: scrub layout /mnt/lustre/d17.sanity-lfsck/f0 and /mnt/lustre/d17.sanity-lfsck/guard should use diff OST-objects 2+0 records in 2+0 records out 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0249781 s, 84.0 MB/s /mnt/lustre/d17.sanity-lfsck/f1 and /mnt/lustre/d17.sanity-lfsck/guard should use diff OST-objects 2+0 records in 2+0 records out 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0182838 s, 115 MB/s PASS 17 (5s) == sanity-lfsck test 18a: Find out orphan OST-object and repair it (1) ========================================================== 10:11:36 (1773670296) ##### 
The target MDT-object is there, but related stripe information is lost or partly lost. The LFSCK should regenerate the missing layout EA entries. ##### 2+0 records in 2+0 records out 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0367347 s, 57.1 MB/s [0x200000402:0x73:0x0] /mnt/lustre/d18a.sanity-lfsck/a1/f1 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 108 0x6c 0x280000401 2+0 records in 2+0 records out 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0363441 s, 57.7 MB/s [0x240000402:0x2:0x0] /mnt/lustre/d18a.sanity-lfsck/a2/f2 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 1 obdidx objid objid group 1 2 0x2 0x2c0000400 0 2 0x2 0x280000400 2+0 records in 2+0 records out 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0426327 s, 49.2 MB/s [0x200000402:0x75:0x0] /mnt/lustre/d18a.sanity-lfsck/f3 lcm_layout_gen: 3 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0x6d:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x2c0000401:0x2:0x0] } Inject failure, to make the MDT-object lost its layout EA fail_loc=0x1615 fail_loc=0x1615 fail_loc=0 fail_loc=0 The file size should be incorrect since layout EA is lost Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout [0x200000402:0x73:0x0] /mnt/lustre/d18a.sanity-lfsck/a1/f1 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 0 obdidx objid objid group 0 108 
0x6c 0x280000401 [0x240000402:0x2:0x0] /mnt/lustre/d18a.sanity-lfsck/a2/f2 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 2 lmm_stripe_offset: 1 obdidx objid objid group 1 2 0x2 0x2c0000400 0 2 0x2 0x280000400 [0x200000402:0x75:0x0] /mnt/lustre/d18a.sanity-lfsck/f3 lcm_layout_gen: 1 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0x6d:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x2c0000401:0x2:0x0] } The file size should be correct after layout LFSCK scanning PASS 18a (9s) SKIP: sanity-lfsck test_18b skipping excluded test 18b == sanity-lfsck test 18c: Find out orphan OST-object and repair it (3) ========================================================== 10:11:46 (1773670306) ##### The target MDT-object is lost, and the OST-object FID is missing. The LFSCK should re-create the MDT-object with new FID under the directory .lustre/lost+found/MDTxxxx. 
##### Inject failure, to simulate the case of missing parent FID fail_loc=0x1617 2+0 records in 2+0 records out 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0328917 s, 63.8 MB/s /mnt/lustre/d18c.sanity-lfsck/a1/f1 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 110 0x6e 0x280000401 2+0 records in 2+0 records out 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0349036 s, 60.1 MB/s /mnt/lustre/d18c.sanity-lfsck/a2/f2 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 3 0x3 0x280000400 2+0 records in 2+0 records out 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0516371 s, 40.6 MB/s /mnt/lustre/d18c.sanity-lfsck/f3 lcm_layout_gen: 3 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0x6f:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x2c0000401:0x3:0x0] } fail_loc=0 Inject failure, to simulate the case of missing the MDT-object fail_loc=0x1616 fail_loc=0x1616 fail_loc=0 fail_loc=0 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout total 12 144115188109410307 dr-x------ 4 root root 4096 Dec 31 1969 . 
144115205306056705 drwx------ 2 root root 4096 Mar 16 10:11 MDT0000 162129603815538689 drwx------ 2 root root 4096 Mar 16 10:11 MDT0001 There should NOT be some stub under .lustre/lost+found/MDT0001/ There should be some stub under .lustre/lost+found/MDT0000/ PASS 18c (9s) == sanity-lfsck test 18d: Find out orphan OST-object and repair it (4) ========================================================== 10:11:55 (1773670315) ##### The target MDT-object layout EA is corrupted, but the right OST-object is still alive as orphan. The layout LFSCK will not create new OST-object to occupy such slot. ##### [0x200000402:0x81:0x0] /mnt/lustre/d18d.sanity-lfsck/a1/f1 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 112 0x70 0x280000401 [0x200000402:0x82:0x0] /mnt/lustre/d18d.sanity-lfsck/a1/f2 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 113 0x71 0x280000401 [0x200000402:0x83:0x0] /mnt/lustre/d18d.sanity-lfsck/a1/f3 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 114 0x72 0x280000401 [0x200000402:0x84:0x0] /mnt/lustre/d18d.sanity-lfsck/a1/f4 lcm_layout_gen: 2 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0x73:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: 0 lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: -1 Inject failure to make /mnt/lustre/d18d.sanity-lfsck/a1/f1 and /mnt/lustre/d18d.sanity-lfsck/a1/f2 to reference the same OST-object (which is f1's OST-object). 
Then drop /mnt/lustre/d18d.sanity-lfsck/a1/f1 and its OST-object, so f2 becomes dangling reference case, but f2's old OST-object is there. The failure also makes /mnt/lustre/d18d.sanity-lfsck/a1/f3 and /mnt/lustre/d18d.sanity-lfsck/a1/f4 to reference the same OST-object (which is f3's OST-object). Then drop /mnt/lustre/d18d.sanity-lfsck/a1/f3 and its OST-object, so f4 becomes dangling reference case, but f4's old OST-object is there. fail_loc=0x1618 fail_loc=0 stopall to cleanup object cache setupall /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy oleg442-server: oleg442-server.virtnet: executing load_modules_local oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Using TIMEOUT=20 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout The file size should be correct after layout LFSCK scanning The LFSCK should find back the original data. 
foo [0x200000402:0x82:0x0] /mnt/lustre/d18d.sanity-lfsck/a1/f2 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 0 obdidx objid objid group 0 113 0x71 0x280000401 foo [0x200000402:0x84:0x0] /mnt/lustre/d18d.sanity-lfsck/a1/f4 lcm_layout_gen: 2 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0x73:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: 0 lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: -1 PASS 18d (103s) == sanity-lfsck test 18e: Find out orphan OST-object and repair it (5) ========================================================== 10:13:38 (1773670418) ##### The target MDT-object layout EA slot is occpuied by some new created OST-object when repair dangling reference case. Such conflict OST-object has been modified by others. To keep the new data, the LFSCK will create a new file to refernece this old orphan OST-object. 
##### [0x200000bd1:0x4:0x0] /mnt/lustre/d18e.sanity-lfsck/a1/f1 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 162 0xa2 0x280000401 [0x200000bd1:0x5:0x0] /mnt/lustre/d18e.sanity-lfsck/a1/f2 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 163 0xa3 0x280000401 [0x200000bd1:0x6:0x0] /mnt/lustre/d18e.sanity-lfsck/a1/f3 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 164 0xa4 0x280000401 [0x200000bd1:0x7:0x0] /mnt/lustre/d18e.sanity-lfsck/a1/f4 lcm_layout_gen: 2 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0xa5:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: 0 lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: -1 Inject failure to make /mnt/lustre/d18e.sanity-lfsck/a1/f1 and /mnt/lustre/d18e.sanity-lfsck/a1/f2 to reference the same OST-object (which is f1's OST-object). Then drop /mnt/lustre/d18e.sanity-lfsck/a1/f1 and its OST-object, so f2 becomes dangling reference case, but f2's old OST-object is there. Also the failure makes /mnt/lustre/d18e.sanity-lfsck/a1/f3 and /mnt/lustre/d18e.sanity-lfsck/a1/f4 to reference the same OST-object (which is f3's OST-object). Then drop /mnt/lustre/d18e.sanity-lfsck/a1/f3 and its OST-object, so f4 becomes dangling reference case, but f4's old OST-object is there. 
fail_loc=0x1618 fail_loc=0 stopall to cleanup object cache setupall /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy oleg442-server: oleg442-server.virtnet: executing load_modules_local oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Using TIMEOUT=20 fail_val=10 fail_loc=0x1602 debug=-1 debug_mb=150 debug=-1 debug_mb=150 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout Write new data to f2/f4 to modify the new created OST-object. 
fail_val=0 fail_loc=0 Waiting 120s for 'completed' Updated after 8s: want 'completed' got 'completed' debug_mb=21 debug_mb=21 There should be stub file under .lustre/lost+found/MDT0000/ The stub file should keep the original f2 or f4 data foo [0x200000403:0x7:0x0] /mnt/lustre/.lustre/lost+found/MDT0000/[0x200000403:0x7:0x0]-[0x200000bd1:0x7:0x0]-0-C-0 lcm_layout_gen: 3 lcm_mirror_count: 1 lcm_entry_count: 1 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0xa5:0x0] } foo [0x200000403:0x6:0x0] /mnt/lustre/.lustre/lost+found/MDT0000/[0x200000403:0x6:0x0]-[0x200000bd1:0x5:0x0]-0-C-0 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 0 obdidx objid objid group 0 163 0xa3 0x280000401 The f2/f4 should contains new data. dummy [0x200000bd1:0x5:0x0] /mnt/lustre/d18e.sanity-lfsck/a1/f2 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 162 0xa2 0x280000401 dummy [0x200000bd1:0x7:0x0] /mnt/lustre/d18e.sanity-lfsck/a1/f4 lcm_layout_gen: 2 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0xa4:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: 0 lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: -1 PASS 18e (112s) == sanity-lfsck test 18f: Skip the failed OST(s) when handle orphan OST-objects ========================================================== 10:15:30 (1773670530) ##### The target MDT-object is lost. 
The LFSCK should re-create the MDT-object under .lustre/lost+found/MDTxxxx. If some OST fail to verify some OST-object(s) during the first stage-scanning, the LFSCK should skip orphan OST-objects for such OST. Others should not be affected. ##### 2+0 records in 2+0 records out 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0262221 s, 80.0 MB/s 2+0 records in 2+0 records out 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0286465 s, 73.2 MB/s 2+0 records in 2+0 records out 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0271106 s, 77.4 MB/s /mnt/lustre/d18f.sanity-lfsck/a1/f1 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 195 0xc3 0x280000401 /mnt/lustre/d18f.sanity-lfsck/a2/f2 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 196 0xc4 0x280000401 1 66 0x42 0x2c0000401 2+0 records in 2+0 records out 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0277981 s, 75.4 MB/s 2+0 records in 2+0 records out 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.028015 s, 74.9 MB/s 2+0 records in 2+0 records out 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0253764 s, 82.6 MB/s /mnt/lustre/d18f.sanity-lfsck/a3/f3 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 67 0x43 0x280000400 /mnt/lustre/d18f.sanity-lfsck/a4/f4 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 68 0x44 0x280000400 1 66 0x42 0x2c0000400 Inject failure, to simulate the case of missing the MDT-object fail_loc=0x1616 fail_loc=0x1616 fail_loc=0 fail_loc=0 Inject failure, to simulate the OST0 fail to handle MDT0 LFSCK request during the first-stage scanning. 
fail_loc=0x161c fail_val=0 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout fail_loc=0 fail_val=0 Trigger layout LFSCK on all devices again to cleanup Started LFSCK on the device lustre-MDT0000: scrub layout PASS 18f (11s) == sanity-lfsck test 18g: Find out orphan OST-object and repair it (7) ========================================================== 10:15:41 (1773670541) ##### The target MDT-object is lost, but related OI mapping is there The LFSCK should recreate the lost MDT-object without affected by the stale OI mapping. ##### 2+0 records in 2+0 records out 2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0208969 s, 100 MB/s [0x2000013a1:0x10:0x0] /mnt/lustre/d18g.sanity-lfsck/a1/f1 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 197 0xc5 0x280000401 1 67 0x43 0x2c0000401 Inject failure to simulate lost MDT-object but keep OI mapping fail_loc=0x162e fail_loc=0 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout Move the files from ./lustre/lost+found/MDTxxxx to namespace [0x2000013a1:0x10:0x0] /mnt/lustre/d18g.sanity-lfsck/a1/f1 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 2 lmm_stripe_offset: 0 obdidx objid objid group 0 197 0xc5 0x280000401 1 67 0x43 0x2c0000401 PASS 18g (4s) == sanity-lfsck test 18h: LFSCK can repair crashed PFL extent range ========================================================== 10:15:45 (1773670545) ##### The PFL extent crashed. During the first cycle LFSCK scanning, the layout LFSCK will keep the bad PFL file(s) there without scanning its OST-object(s). Then in the second stage scanning, the OST will return related OST-object(s) to the MDT as orphan. And then the LFSCK on the MDT can rebuild the PFL extent with the 'orphan(s)' stripe information. 
##### 0+1 records in 0+1 records out 312813 bytes (313 kB, 305 KiB) copied, 0.00440003 s, 71.1 MB/s Inject failure stub to simulate bad PFL extent range fail_loc=0x162f fail_loc=0 dd: error writing '/mnt/lustre/d18h.sanity-lfsck/f0': No data available 1+0 records in 0+0 records out 0 bytes copied, 0.00287665 s, 0.0 kB/s Trigger layout LFSCK to find out the bad lmm_oi and fix them Started LFSCK on the device lustre-MDT0000: scrub layout Data in /mnt/lustre/d18h.sanity-lfsck/f0 should not be broken Write should succeed after LFSCK repairing the bad PFL range 1+0 records in 1+0 records out 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0122798 s, 85.4 MB/s PASS 18h (6s) == sanity-lfsck test 19a: OST-object inconsistency self detect ========================================================== 10:15:51 (1773670551) Inject failure, then client will offer wrong parent FID when read fail_loc=0x1619 Read RPC with wrong parent FID should be denied cat: /mnt/lustre/d19a.sanity-lfsck/a0: Operation not permitted cat: /mnt/lustre/d19a.sanity-lfsck/a1: Operation not permitted fail_loc=0 PASS 19a (5s) == sanity-lfsck test 19b: OST-object inconsistency self repair ========================================================== 10:15:56 (1773670556) Inject failure stub to make the OST-object to back point to non-exist MDT-object fail_loc=0x1611 fail_loc=0 Nothing should be fixed since self detect and repair is disabled Read RPC with right parent FID should be accepted, and cause parent FID on OST to be fixed foo1 foo2 PASS 19b (6s) == sanity-lfsck test 20a: Handle the orphan with dummy LOV EA slot properly ========================================================== 10:16:02 (1773670562) ##### The target MDT-object and some of its OST-object are lost. The LFSCK should find out the left OST-objects and re-create the MDT-object under the direcotry .lustre/lost+found/MDTxxxx/ with the partial OST-objects (LOV EA hole). 
New client can access the file with LOV EA hole via normal system tools or commands without crash the system. For old client, even though it cannot access the file with LOV EA hole, it should not cause the system crash. ##### 257+0 records in 257+0 records out 1052672 bytes (1.1 MB, 1.0 MiB) copied, 0.0502443 s, 21.0 MB/s 257+0 records in 257+0 records out 1052672 bytes (1.1 MB, 1.0 MiB) copied, 0.0556476 s, 18.9 MB/s 257+0 records in 257+0 records out 1052672 bytes (1.1 MB, 1.0 MiB) copied, 0.0637729 s, 16.5 MB/s [0x2000013a1:0x25:0x0] /mnt/lustre/d20a.sanity-lfsck/a1/f0 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 204 0xcc 0x280000401 1 70 0x46 0x2c0000401 [0x2000013a1:0x26:0x0] /mnt/lustre/d20a.sanity-lfsck/a1/f1 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 205 0xcd 0x280000401 1 71 0x47 0x2c0000401 [0x2000013a1:0x27:0x0] /mnt/lustre/d20a.sanity-lfsck/a1/f2 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 206 0xce 0x280000401 1 72 0x48 0x2c0000401 Inject failure... 
To simulate f0 lost MDT-object fail_loc=0x1616 To simulate f1 lost MDT-object and OST-object0 fail_loc=0x161a To simulate f2 lost MDT-object and OST-object1 fail_val=1 192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0 Stopping client oleg442-client.virtnet /mnt/lustre (opts:) fail_loc=0 fail_val=0 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre Check /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x25:0x0]-R-0, which is the old f0 /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x25:0x0]-R-0 lmm_magic: 0x0BD10BD0 lmm_seq: 0x2000013a1 lmm_object_id: 0x25 lmm_fid: [0x2000013a1:0x25:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 2 lmm_stripe_offset: 0 obdidx objid objid group 0 204 0xcc 0x280000401 1 70 0x46 0x2c0000401 Check /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x26:0x0]-R-0, it contains the old f1's stripe1 /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x26:0x0]-R-0 lmm_magic: 0x0BD10BD0 lmm_seq: 0x2000013a1 lmm_object_id: 0x26 lmm_fid: [0x2000013a1:0x26:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: 40000001 lmm_layout_gen: 1 lmm_stripe_offset: 0 obdidx objid objid group 0 0 0 0 1 71 0x47 0x2c0000401 cat: '/mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x26:0x0]-R-0': Input/output error dd: error writing '/mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x26:0x0]-R-0': Input/output error 1+0 records in 0+0 records out 0 bytes copied, 0.000430551 s, 0.0 kB/s 1+0 records in 1+0 records out 4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00371365 s, 1.1 MB/s /home/green/git/lustre-release/lustre/tests/sanity-lfsck.sh: line 3369: echo: write error: Input/output error Check 
/mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x27:0x0]-R-0, it contains the old f2's stripe0 /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x27:0x0]-R-0 lmm_magic: 0x0BD10BD0 lmm_seq: 0x2000013a1 lmm_object_id: 0x27 lmm_fid: [0x2000013a1:0x27:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: 40000001 lmm_layout_gen: 1 lmm_stripe_offset: 0 obdidx objid objid group 0 206 0xce 0x280000401 0 0 0 0 cat: '/mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x27:0x0]-R-0': Input/output error dd: error writing '/mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x27:0x0]-R-0': Input/output error 1+0 records in 0+0 records out 0 bytes copied, 0.000539038 s, 0.0 kB/s PASS 20a (31s) == sanity-lfsck test 20b: Handle the orphan with dummy LOV EA slot properly - PFL case ========================================================== 10:16:33 (1773670593) PASS 20b (2s) == sanity-lfsck test 21: run all LFSCK components by default ========================================================== 10:16:35 (1773670595) total: 100 open/close in 0.28 seconds: 352.33 ops/second Start all LFSCK components by default (-s 1) Started LFSCK on the device lustre-MDT0000: scrub layout namespace namespace LFSCK should be in 'scanning-phase1' status layout LFSCK should be in 'scanning-phase1' status Stop all LFSCK components by default Stopped LFSCK on the device lustre-MDT0000. PASS 21 (4s) == sanity-lfsck test 22a: LFSCK can repair unmatched pairs (1) ========================================================== 10:16:39 (1773670599) ##### The parent_A references the child directory via some name entry, but the child directory back references another parent_B via its .. name entry. The parent_B does not exist. Then the namespace LFSCK will repair the child directory's .. name entry. ##### Inject failure stub on MDT0 to simulate bad dotdot name entry The dummy's dotdot name entry references the guard. 
fail_loc=0x161e fail_loc=0 Trigger namespace LFSCK to repair unmatched pairs Started LFSCK on the device lustre-MDT0000: scrub namespace 'ls' should success after namespace LFSCK repairing PASS 22a (3s) == sanity-lfsck test 22b: LFSCK can repair unmatched pairs (2) ========================================================== 10:16:42 (1773670602) ##### The parent_A references the child directory via the name entry_B, but the child directory back references another parent_C via its .. name entry. The parent_C exists, but there is no the name entry_B under the parent_C. Then the namespace LFSCK will repair the child directory's .. name entry and its linkEA. ##### Inject failure stub on MDT0 to simulate bad dotdot name entry and bad linkEA. The dummy's dotdot name entry references the guard. The dummy's linkEA references n non-exist name entry. fail_loc=0x161e fail_loc=0 fid2path should NOT work on the dummy's FID [0x2000013a3:0x6f:0x0] Trigger namespace LFSCK to repair unmatched pairs Started LFSCK on the device lustre-MDT0000: scrub namespace fid2path should work on the dummy's FID [0x2000013a3:0x6f:0x0] after LFSCK PASS 22b (4s) == sanity-lfsck test 23a: LFSCK can repair dangling name entry (1) ========================================================== 10:16:46 (1773670606) ##### The name entry is there, but the MDT-object for such name entry does not exist. The namespace LFSCK should find out and repair the inconsistency as required. 
##### Inject failure stub on MDT1 to simulate dangling name entry fail_loc=0x1620 fail_loc=0 'ls' should fail because of dangling name entry Trigger namespace LFSCK to find out dangling name entry Started LFSCK on the device lustre-MDT0000: scrub namespace 'ls' should fail because not re-create MDT-object by default Trigger namespace LFSCK again to repair dangling name entry Started LFSCK on the device lustre-MDT0000: scrub namespace 'ls' should success after namespace LFSCK repairing PASS 23a (4s) SKIP: sanity-lfsck test_23b skipping excluded test 23b == sanity-lfsck test 23c: LFSCK can repair dangling name entry (3) ========================================================== 10:16:51 (1773670611) ##### The objectA has multiple hard links, one of them corresponding to the name entry_B. But there is something wrong for the name entry_B and cause entry_B to references non-exist object_C. In the first-stage scanning, the LFSCK will think the entry_B as dangling, and re-create the lost object_C. And then others modified the re-created object_C. When the LFSCK comes to the second-stage scanning, it will find that the former re-creating object_C maybe wrong and try to replace the object_C with the real object_A. But because object_C has been modified, so the LFSCK cannot replace it. 
##### debug=-1 debug_mb=150 debug=-1 debug_mb=150 parent_fid=[0x2000013a3:0x73:0x0] total: 10 open/close in 0.07 seconds: 152.08 ops/second f0_fid=[0x2000013a3:0x7e:0x0] f1_fid=[0x2000013a3:0x7f:0x0] Inject failure stub on MDT0 to simulate dangling name entry fail_val=0x7f fail_loc=0x1621 fail_val=0 fail_loc=0 - unlinked 0 (time 1773670613 ; total 0 ; last 0) total: 10 unlinks in 0 seconds: inf unlinks/second 'ls' should fail because of dangling name entry fail_val=10 fail_loc=0x1602 Trigger namespace LFSCK to find out dangling name entry Started LFSCK on the device lustre-MDT0000: scrub namespace fail_val=0 fail_loc=0 Waiting 32s for 'completed' Updated after 10s: want 'completed' got 'completed' debug_mb=21 debug_mb=21 PASS 23c (16s) == sanity-lfsck test 23d: LFSCK can repair a dangling name entry to a remote object ========================================================== 10:17:07 (1773670627) Stopping /mnt/lustre-mds1 (opts:) on oleg442-server oleg442-server: debugfs 1.47.3-wc2 (11-Nov-2025) Start mds1: mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Started lustre-MDT0000 cat: /mnt/lustre/d23d.sanity-lfsck/mdt1dir/foo: Bad address Started LFSCK on the device lustre-MDT0001: scrub namespace Started LFSCK on the device lustre-MDT0000: scrub layout PASS 23d (12s) == sanity-lfsck test 24: LFSCK can repair multiple-referenced name entry ========================================================== 10:17:19 (1773670639) ##### Two MDT-objects back reference the same name entry via their each own linkEA entry, but the name entry only references one MDT-object. The namespace LFSCK will remove the linkEA entry for the MDT-object that is not recognized. 
If such MDT-object has no other linkEA entry after the removing, then the LFSCK will add it as orphan under the .lustre/lost+found/MDTxxxx/. ##### [0x2400013a2:0x8:0x0] [0x2400013a2:0x9:0x0] Inject failure stub on MDT0 to simulate the case that the /mnt/lustre/d24.sanity-lfsck/d0/dummy/foo has the 'bad' linkEA entry that references /mnt/lustre/d24.sanity-lfsck/d0/guard/foo. Then remove the name entry /mnt/lustre/d24.sanity-lfsck/d0/dummy/foo. So the MDT-object /mnt/lustre/d24.sanity-lfsck/d0/dummy/foo will be left there with the same linkEA entry as another MDT-object /mnt/lustre/d24.sanity-lfsck/d0/guard/foo has fail_loc=0x1622 [0x2000013a3:0x84:0x0] fail_loc=0 stat /mnt/lustre/d24.sanity-lfsck/d0/dummy/foo should fail Trigger namespace LFSCK to repair multiple-referenced name entry Started LFSCK on the device lustre-MDT0000: scrub namespace There should be an orphan under .lustre/lost+found/MDT0000/ total 8 144115272414920836 drwxr-xr-x 2 root root 4096 Mar 16 10:17 . 144115205306056705 drwx------ 3 root root 4096 Mar 16 10:16 .. PASS 24 (4s) == sanity-lfsck test 25: LFSCK can repair bad file type in the name entry ========================================================== 10:17:23 (1773670643) ##### The file type in the name entry does not match the file type claimed by the referenced object. Then the LFSCK will update the file type in the name entry. ##### Inject failure stub on MDT0 to simulate the case that the file type stored in the name entry is wrong. fail_loc=0x1623 fail_loc=0 Trigger namespace LFSCK to repair bad file type in the name entry Started LFSCK on the device lustre-MDT0000: scrub namespace total 8 144115272414920838 drwxr-xr-x 2 root root 4096 Mar 16 10:17 . 144115272414920837 drwxr-xr-x 3 root root 4096 Mar 16 10:17 .. 
144115272414920839 -rw-r--r-- 1 root root 0 Mar 16 10:17 foo PASS 25 (4s) == sanity-lfsck test 26a: LFSCK can add the missing local name entry back to the namespace ========================================================== 10:17:27 (1773670647) ##### The local name entry back referenced by the MDT-object is lost. The namespace LFSCK will add the missing local name entry back to the normal namespace. ##### Inject failure stub on MDT0 to simulate the case that foo's name entry will be removed, but the foo's object and its linkEA are kept in the system. fail_loc=0x1624 fail_loc=0 Trigger namespace LFSCK to repair the missing remote name entry Started LFSCK on the device lustre-MDT0000: scrub namespace 144115272414920842 -rw-r--r-- 2 root root 0 Mar 16 10:17 /mnt/lustre/d26a.sanity-lfsck/d0/foo PASS 26a (3s) == sanity-lfsck test 26b: LFSCK can add the missing remote name entry back to the namespace ========================================================== 10:17:30 (1773670650) ##### The remote name entry back referenced by the MDT-object is lost. The namespace LFSCK will add the missing remote name entry back to the normal namespace. ##### Inject failure stub on MDT0 to simulate the case that foo's name entry will be removed, but the foo's object and its linkEA are kept in the system. fail_loc=0x1624 fail_loc=0 Trigger namespace LFSCK to repair the missing remote name entry Started LFSCK on the device lustre-MDT0000: scrub namespace total 8 144115272414920844 drwxr-xr-x 2 root root 4096 Mar 16 10:17 . 162129670907625483 drwxr-xr-x 3 root root 4096 Mar 16 10:17 .. PASS 26b (4s) == sanity-lfsck test 27a: LFSCK can recreate the lost local parent directory as orphan ========================================================== 10:17:34 (1773670654) ##### The local parent referenced by the MDT-object linkEA is lost. The namespace LFSCK will re-create the lost parent as orphan. 
##### Inject failure stub on MDT0 to simulate the case that foo's name entry will be removed, but the foo's object and its linkEA are kept in the system. And then remove another hard link and the parent directory. fail_loc=0x1624 fail_loc=0 Trigger namespace LFSCK to repair the lost parent Started LFSCK on the device lustre-MDT0000: scrub namespace There should be an orphan under .lustre/lost+found/MDT0000/ total 12 144115205306056705 drwx------ 3 root root 4096 Mar 16 10:17 . 144115188109410307 dr-x------ 4 root root 4096 Dec 31 1969 .. 144115272414920846 drwx------ 2 root root 4096 Dec 31 1969 [0x2000013a3:0x8e:0x0]-P-0 PASS 27a (3s) == sanity-lfsck test 27b: LFSCK can recreate the lost remote parent directory as orphan ========================================================== 10:17:37 (1773670657) ##### The remote parent referenced by the MDT-object linkEA is lost. The namespace LFSCK will re-create the lost parent as orphan. ##### [0x2400013a2:0xc:0x0] Inject failure stub on MDT0 to simulate the case that foo's name entry will be removed, but the foo's object and its linkEA are kept in the system. And then remove the parent directory. fail_loc=0x1624 fail_loc=0 Trigger namespace LFSCK to repair the missing remote name entry Started LFSCK on the device lustre-MDT0000: scrub namespace total 12 144115188109410307 dr-x------ 4 root root 4096 Dec 31 1969 . 144115205306056705 drwx------ 2 root root 4096 Mar 16 10:17 MDT0000 162129603815538689 drwx------ 3 root root 4096 Mar 16 10:15 MDT0001 There should be an orphan under .lustre/lost+found/MDT0001/ total 12 162129603815538689 drwx------ 3 root root 4096 Mar 16 10:15 . 144115188109410307 dr-x------ 4 root root 4096 Dec 31 1969 .. 
162129670907625484 drwx------ 3 root root 4096 Dec 31 1969 [0x2400013a2:0xc:0x0]-P-0 PASS 27b (4s) == sanity-lfsck test 28: Skip the failed MDT(s) when handle orphan MDT-objects ========================================================== 10:17:41 (1773670661) ##### The target name entry is lost. The LFSCK should insert the orphan MDT-object under .lustre/lost+found/MDTxxxx. But if the MDT (on which the orphan MDT-object resides) has ever failed to respond some name entry verification during the first stage-scanning, then the LFSCK should skip to handle orphan MDT-object on this MDT. But other MDTs should not be affected. ##### Inject failure stub on MDT0 to simulate the case that d1/a1's name entry will be removed, but the d1/a1's object and its linkEA are kept in the system. And the case that d2/a2's name entry will be removed, but the d2/a2's object and its linkEA are kept in the system. fail_loc=0x1624 fail_loc=0x1624 fail_loc=0 fail_loc=0 Inject failure, to simulate the MDT0 fail to handle MDT1 LFSCK request during the first-stage scanning. fail_loc=0x161c fail_val=0 Trigger namespace LFSCK on all devices to find out orphan object Started LFSCK on the device lustre-MDT0000: scrub namespace fail_loc=0 fail_val=0 Trigger namespace LFSCK on all devices again to cleanup Started LFSCK on the device lustre-MDT0000: scrub namespace PASS 28 (7s) == sanity-lfsck test 29b: LFSCK can repair bad nlink count (2) ========================================================== 10:17:48 (1773670668) ##### The object's nlink attribute is smaller than the object's known name entries count. The LFSCK will repair the object's nlink attribute to match the known name entries count ##### Inject failure stub on MDT0 to simulate the case that foo's nlink attribute is smaller than its name entries count. 
fail_loc=0x1626 fail_loc=0 Trigger namespace LFSCK to repair the nlink count Started LFSCK on the device lustre-MDT0000: scrub namespace PASS 29b (3s) == sanity-lfsck test 29c: verify linkEA size limitation == 10:17:51 (1773670671) ##### The namespace LFSCK will create many hard links to the target file as to exceed the linkEA size limitation. Under such case the linkEA will be marked as overflow that will prevent the target file to be migrated. Then remove some hard links to make the left hard links to be held within the linkEA size limitation. But before the namespace LFSCK adding all the missed linkEA entries back, the overflow mark (timestamp) will not be cleared. ##### Create 150 hard links should succeed although the linkEA overflow total: 150 link in 0.66 seconds: 225.91 ops/second The object with linkEA overflow should NOT be migrated Remove 100 hard links to save space for the missed linkEA entries - unlinked 0 (time 1773670674 ; total 0 ; last 0) total: 100 unlinks in 0 seconds: inf unlinks/second Trigger namespace LFSCK to clear the overflow timestamp Started LFSCK on the device lustre-MDT0000: scrub namespace PASS 29c (9s) == sanity-lfsck test 29d: accessing non-existing inode shouldn't turn fs read-only (ldiskfs) ========================================================== 10:18:00 (1773670680) ##### The object's nlink attribute is smaller than the object's known name entries count. The LFSCK will repair the object's nlink attribute to match the known name entries count ##### Inject failure stub on MDT0 to simulate the case that foo's nlink attribute is smaller than its name entries count. 
fail_loc=0x1626 fail_loc=0 stat: cannot statx '/mnt/lustre/d29d.sanity-lfsck/d0/foo': No such file or directory rm_entry total 0 -rw-r--r-- 1 root root 0 Mar 16 10:18 foo0 PASS 29d (3s) == sanity-lfsck test 30: LFSCK can recover the orphans from backend /lost+found ========================================================== 10:18:03 (1773670683) ##### The namespace LFSCK will move the orphans from backend /lost+found directory to normal client visible namespace or to global visible ./lustre/lost+found/MDTxxxx/ directory ##### Inject failure stub on MDT0 to simulate the case that directory d0 has no linkEA entry, then the LFSCK will move it into .lustre/lost+found/MDTxxxx/ later. fail_loc=0x161d fail_loc=0 Inject failure stub on MDT0 to simulate the case that the object's name entry will be removed, but not destroy the object. Then backend e2fsck will handle it as orphan and add them into the backend /lost+found directory. fail_loc=0x1624 fail_loc=0 192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0 Stopping client oleg442-client.virtnet /mnt/lustre (opts:) Stopping /mnt/lustre-mds1 (opts:) on oleg442-server run e2fsck on mds1 e2fsck -d -v -t -t -f -y /dev/mapper/mds1_flakey -m8 oleg442-server: e2fsck 1.47.3-wc2 (11-Nov-2025) oleg442-server: Use max possible thread num: 1 instead Pass 1: Checking inodes, blocks, and sizes [Thread 0] Scan group range [0, 1], used inodes 290/40000 [Thread 0] jumping to group 0 [Thread 0] e2fsck_pass1_run:2316: increase inode 7 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 81 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 82 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 82 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2680: increase inode 83 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2680: increase inode 89 badness 0 to 2 for 10084 
[Thread 0] e2fsck_pass1_run:2316: increase inode 93 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 93 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2680: increase inode 94 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2680: increase inode 95 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2680: increase inode 96 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2680: increase inode 97 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 98 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 98 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 99 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 99 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 100 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 100 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 101 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 101 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 102 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 102 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 103 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 103 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 104 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 104 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 105 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 105 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 106 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 106 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2680: increase inode 107 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 108 
badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 108 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 109 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 109 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 110 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 110 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 111 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 111 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 112 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 112 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 113 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 113 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 114 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 114 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 115 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 115 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 116 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 116 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 117 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 117 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 118 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 118 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 119 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 119 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 120 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 120 badness 1 to 3 for 10084 [Thread 0] 
e2fsck_pass1_run:2680: increase inode 121 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 122 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 122 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2680: increase inode 123 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 124 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2316: increase inode 126 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 126 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 127 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 127 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 128 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 128 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 129 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 129 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 133 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 133 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 134 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 134 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2680: increase inode 135 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 136 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 136 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 137 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 137 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 138 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 138 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 139 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 139 badness 
1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 140 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 140 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 141 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 141 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 142 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 142 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 143 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 143 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 144 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 144 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 145 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 145 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 146 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 146 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 147 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 147 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 148 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 148 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 149 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 149 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 150 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 150 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 151 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 151 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 152 badness 0 to 1 for 1005c [Thread 0] 
e2fsck_pass1_run:2680: increase inode 152 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2680: increase inode 153 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 154 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 154 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 155 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 155 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 156 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 156 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 157 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 157 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 158 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 158 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 159 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 159 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 160 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 160 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 161 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 161 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 162 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 162 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 163 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 163 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 164 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 164 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 165 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 165 badness 
1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 166 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 166 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 167 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 167 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 168 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 168 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 169 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 169 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2680: increase inode 173 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2680: increase inode 174 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2680: increase inode 175 badness 0 to 2 for 10084 [Thread 0] group 0 finished [Thread 0] e2fsck_pass1_run:2316: increase inode 20036 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2680: increase inode 20036 badness 1 to 3 for 10084 [Thread 0] e2fsck_pass1_run:2316: increase inode 20098 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2316: increase inode 20099 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2316: increase inode 20100 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2316: increase inode 20102 badness 0 to 1 for 1005c [Thread 0] e2fsck_pass1_run:2316: increase inode 20103 badness 0 to 1 for 1005c [Thread 0] group 1 finished [Thread 0] Pass 1: Memory used: 264k/0k (146k/119k), time: 0.00/ 0.00/ 0.00 [Thread 0] Pass 1: I/O read: 1MB, write: 0MB, rate: 249.25MB/s [Thread 0] Scanned group range [0, 1], used inodes 290/389 Pass 2: Checking directory structure oleg442-server: [QUOTA WARNING] Usage inconsistent for ID 0:actual (2682880, 280) != expected (2682880, 281) oleg442-server: [QUOTA WARNING] Usage inconsistent for ID 0:actual (2682880, 280) != expected (2682880, 281) oleg442-server: [QUOTA WARNING] Usage
inconsistent for ID 0:actual (2682880, 280) != expected (2682880, 281) pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Pass 2: Memory used: 264k/0k (97k/168k), time: 0.00/ 0.00/ 0.00 Pass 2: I/O read: 1MB, write: 0MB, rate: 207.56MB/s Pass 3: Checking directory connectivity Peak memory: Memory used: 264k/0k (97k/168k), time: 0.01/ 0.00/ 0.00 Unconnected directory inode 20099 (was in /ROOT/d30.sanity-lfsck/foo) Connect to /lost+found? yes Unconnected directory inode 20102 (was in /lost+found/#20099) Connect to /lost+found? yes Pass 3A: Memory used: 264k/0k (98k/167k), time: 0.00/ 0.00/ 0.00 Pass 3A: I/O read: 0MB, write: 0MB, rate: 0.00MB/s Pass 3: Memory used: 264k/0k (96k/169k), time: 0.00/ 0.00/ 0.00 Pass 3: I/O read: 1MB, write: 0MB, rate: 2816.90MB/s Pass 4: Checking reference counts Unattached inode 181 Connect to /lost+found? yes Inode 181 ref count is 2, should be 1. Fix? yes Unattached inode 183 Connect to /lost+found? yes Inode 183 ref count is 2, should be 1. Fix? yes Unattached inode 184 Connect to /lost+found? yes Inode 184 ref count is 2, should be 1. Fix? yes Unattached inode 194 Connect to /lost+found? yes Inode 194 ref count is 2, should be 1. Fix? yes Inode 20102 ref count is 3, should be 2. Fix? yes Inode 20103 ref count is 1, should be 2. Fix? yes Pass 4: Memory used: 264k/0k (83k/182k), time: 0.00/ 0.00/ 0.00 Pass 4: I/O read: 1MB, write: 1MB, rate: 604.59MB/s Pass 5: Checking group summary information Pass 5: Memory used: 372k/0k (82k/291k), time: 0.00/ 0.00/ 0.00 Pass 5: I/O read: 1MB, write: 1MB, rate: 583.09MB/s Update quota info for quota type 0? yes Update quota info for quota type 1? yes Update quota info for quota type 2?
yes lustre-MDT0000: ***** FILE SYSTEM WAS MODIFIED ***** 290 inodes used (0.72%, out of 40000) 4 non-contiguous files (1.4%) 0 non-contiguous directories (0.0%) # of inodes with ind/dind/tind blocks: 0/0/0 Extent depth histogram: 211/2 12738 blocks used (50.95%, out of 25000) 0 bad blocks 1 large file 157 regular files 123 directories 0 character device files 0 block device files 0 fifos 4294967294 links 0 symbolic links (0 fast symbolic links) 0 sockets ------------ 274 files Memory used: 372k/0k (85k/288k), time: 0.02/ 0.01/ 0.00 I/O read: 1MB, write: 1MB, rate: 52.20MB/s oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Trigger namespace LFSCK to recover backend orphans Started LFSCK on the device lustre-MDT0000: scrub namespace Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre File: /mnt/lustre/d30.sanity-lfsck/foo/f0 Size: 0 Blocks: 0 IO Block: 4194304 regular empty file Device: 2c54f966h/743766374d Inode: 144115272414920866 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2026-03-16 10:18:03.000000000 -0400 Modify: 2026-03-16 10:18:03.000000000 -0400 Change: 2026-03-16 10:18:11.000000000 -0400 Birth: 2026-03-16 10:18:04.000000000 -0400 total 12 144115188109410307 dr-x------ 4 root root 4096 Dec 31 1969 . 144115205306056705 drwx------ 3 root root 4096 Mar 16 10:17 MDT0000 162129603815538689 drwx------ 2 root root 4096 Mar 16 10:17 MDT0001 d0 should become orphan under .lustre/lost+found/MDT0000/ total 16 144115205306056705 drwx------ 3 root root 4096 Mar 16 10:17 . 144115188109410307 dr-x------ 4 root root 4096 Dec 31 1969 .. 
144115272414920830 -rw-r--r-- 1 root root 6 Mar 16 10:16 [0x2000013a3:0x7e:0x0]-O-0 144115272414920847 -rw-r--r-- 1 root root 0 Mar 16 10:17 [0x2000013a3:0x8f:0x0]-O-0 144115272414920867 drwxr-xr-x 3 root root 4096 Mar 16 10:18 [0x2000013a3:0xa3:0x0]-[0x2000013a3:0xa1:0x0]-D-0 File: /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a3:0xa3:0x0]-[0x2000013a3:0xa1:0x0]-D-0/d1 Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115272414920869 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2026-03-16 10:18:04.000000000 -0400 Modify: 2026-03-16 10:18:04.000000000 -0400 Change: 2026-03-16 10:18:11.000000000 -0400 Birth: 2026-03-16 10:18:04.000000000 -0400 File: /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a3:0xa3:0x0]-[0x2000013a3:0xa1:0x0]-D-0/f1 Size: 0 Blocks: 0 IO Block: 4194304 regular empty file Device: 2c54f966h/743766374d Inode: 144115272414920868 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2026-03-16 10:18:04.000000000 -0400 Modify: 2026-03-16 10:18:04.000000000 -0400 Change: 2026-03-16 10:18:11.000000000 -0400 Birth: 2026-03-16 10:18:04.000000000 -0400 PASS 30 (15s) == sanity-lfsck test 31a: The LFSCK can find/repair the name entry with bad name hash (1) ========================================================== 10:18:18 (1773670698) ##### For the name entry under a striped directory, if the name hash does not match the shard, then the LFSCK will repair the bad name entry ##### Inject failure stub on client to simulate the case that some name entry should be inserted into other non-first shard, but inserted into the first shard by wrong fail_loc=0x1628 fail_val=0 total: 2 mkdir in 0.01 seconds: 162.83 ops/second fail_loc=0 fail_val=0 Trigger namespace LFSCK to repair bad name hash Started LFSCK on the device lustre-MDT0000: scrub namespace 192.168.204.142@tcp:/lustre /mnt/lustre lustre 
rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0 Stopping client oleg442-client.virtnet /mnt/lustre (opts:) Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre File: /mnt/lustre/d31a.sanity-lfsck/striped_dir/d0 Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339490230275 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2026-03-16 10:18:18.000000000 -0400 Modify: 2026-03-16 10:18:18.000000000 -0400 Change: 2026-03-16 10:18:18.000000000 -0400 Birth: 2026-03-16 10:18:18.000000000 -0400 File: /mnt/lustre/d31a.sanity-lfsck/striped_dir/d1 Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339490230276 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2026-03-16 10:18:18.000000000 -0400 Modify: 2026-03-16 10:18:18.000000000 -0400 Change: 2026-03-16 10:18:18.000000000 -0400 Birth: 2026-03-16 10:18:18.000000000 -0400 PASS 31a (3s) == sanity-lfsck test 31b: The LFSCK can find/repair the name entry with bad name hash (2) ========================================================== 10:18:21 (1773670701) ##### For the name entry under a striped directory, if the name hash does not match the shard, then the LFSCK will repair the bad name entry ##### Inject failure stub on client to simulate the case that some name entry should be inserted into other non-second shard, but inserted into the secod shard by wrong fail_loc=0x1628 fail_val=1 total: 10 mkdir in 0.06 seconds: 175.55 ops/second fail_loc=0 fail_val=0 Trigger namespace LFSCK to repair bad name hash Started LFSCK on the device lustre-MDT0000: scrub namespace repaired 1 name entries with bad hash 192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0 Stopping client oleg442-client.virtnet 
/mnt/lustre (opts:) Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre File: /mnt/lustre/d31b.sanity-lfsck/striped_dir/d0 Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007491 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2026-03-16 10:18:22.000000000 -0400 Modify: 2026-03-16 10:18:22.000000000 -0400 Change: 2026-03-16 10:18:22.000000000 -0400 Birth: 2026-03-16 10:18:22.000000000 -0400 File: /mnt/lustre/d31b.sanity-lfsck/striped_dir/d1 Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007492 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2026-03-16 10:18:22.000000000 -0400 Modify: 2026-03-16 10:18:22.000000000 -0400 Change: 2026-03-16 10:18:22.000000000 -0400 Birth: 2026-03-16 10:18:22.000000000 -0400 File: /mnt/lustre/d31b.sanity-lfsck/striped_dir/d2 Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007493 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2026-03-16 10:18:22.000000000 -0400 Modify: 2026-03-16 10:18:22.000000000 -0400 Change: 2026-03-16 10:18:22.000000000 -0400 Birth: 2026-03-16 10:18:22.000000000 -0400 File: /mnt/lustre/d31b.sanity-lfsck/striped_dir/d3 Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007494 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2026-03-16 10:18:22.000000000 -0400 Modify: 2026-03-16 10:18:22.000000000 -0400 Change: 2026-03-16 10:18:22.000000000 -0400 Birth: 2026-03-16 10:18:22.000000000 -0400 File: /mnt/lustre/d31b.sanity-lfsck/striped_dir/d4 Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007495 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2026-03-16 10:18:22.000000000 -0400 Modify: 2026-03-16 
10:18:22.000000000 -0400 Change: 2026-03-16 10:18:22.000000000 -0400 Birth: 2026-03-16 10:18:22.000000000 -0400 File: /mnt/lustre/d31b.sanity-lfsck/striped_dir/d5 Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007496 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2026-03-16 10:18:22.000000000 -0400 Modify: 2026-03-16 10:18:22.000000000 -0400 Change: 2026-03-16 10:18:22.000000000 -0400 Birth: 2026-03-16 10:18:22.000000000 -0400 File: /mnt/lustre/d31b.sanity-lfsck/striped_dir/d6 Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007497 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2026-03-16 10:18:22.000000000 -0400 Modify: 2026-03-16 10:18:22.000000000 -0400 Change: 2026-03-16 10:18:22.000000000 -0400 Birth: 2026-03-16 10:18:22.000000000 -0400 File: /mnt/lustre/d31b.sanity-lfsck/striped_dir/d7 Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007498 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2026-03-16 10:18:22.000000000 -0400 Modify: 2026-03-16 10:18:22.000000000 -0400 Change: 2026-03-16 10:18:22.000000000 -0400 Birth: 2026-03-16 10:18:22.000000000 -0400 File: /mnt/lustre/d31b.sanity-lfsck/striped_dir/d8 Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007499 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2026-03-16 10:18:22.000000000 -0400 Modify: 2026-03-16 10:18:22.000000000 -0400 Change: 2026-03-16 10:18:22.000000000 -0400 Birth: 2026-03-16 10:18:22.000000000 -0400 File: /mnt/lustre/d31b.sanity-lfsck/striped_dir/d9 Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007500 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2026-03-16 10:18:22.000000000 -0400 Modify: 2026-03-16 
10:18:22.000000000 -0400 Change: 2026-03-16 10:18:22.000000000 -0400 Birth: 2026-03-16 10:18:22.000000000 -0400 PASS 31b (4s) == sanity-lfsck test 31c: Re-generate the lost master LMV EA for striped directory ========================================================== 10:18:25 (1773670705) ##### For some reason, the master MDT-object of the striped directory may lost its master LMV EA. If nobody created files under the master directly after the master LMV EA lost, then the LFSCK should re-generate the master LMV EA. ##### Inject failure stub on MDT0 to simulate the case that the master MDT-object of the striped directory lost the LMV EA. fail_loc=0x1629 fail_loc=0 Trigger namespace LFSCK to re-generate master LMV EA Started LFSCK on the device lustre-MDT0000: scrub namespace 192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0 Stopping client oleg442-client.virtnet /mnt/lustre (opts:) Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre PASS 31c (31s) == sanity-lfsck test 31d: Set broken striped directory (modified after broken) as read-only ========================================================== 10:18:56 (1773670736) ##### For some reason, the master MDT-object of the striped directory may lost its master LMV EA. If somebody created files under the master directly after the master LMV EA lost, then the LFSCK should NOT re-generate the master LMV EA, instead, it should change the broken striped dirctory as read-only to prevent further damage ##### Inject failure stub on MDT0 to simulate the case that the master MDT-object of the striped directory lost the LMV EA. 
fail_loc=0x1629 fail_loc=0x0 192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0 Stopping client oleg442-client.virtnet /mnt/lustre (opts:) Stopping /mnt/lustre-mds1 (opts:) on oleg442-server Start mds1: mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Started lustre-MDT0000 Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre Trigger namespace LFSCK to find out the inconsistency Started LFSCK on the device lustre-MDT0000: scrub namespace File: /mnt/lustre/d31d.sanity-lfsck/striped_dir/dummy Size: 0 Blocks: 0 IO Block: 4194304 regular empty file Device: 2c54f966h/743766374d Inode: 144115373044662273 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2026-03-16 10:19:14.000000000 -0400 Modify: 2026-03-16 10:19:14.000000000 -0400 Change: 2026-03-16 10:19:14.000000000 -0400 Birth: 2026-03-16 10:19:14.000000000 -0400 touch: setting times of '/mnt/lustre/d31d.sanity-lfsck/striped_dir/foo': No such file or directory Trigger namespace LFSCK to find out the inconsistency Started LFSCK on the device lustre-MDT0000: scrub namespace Stopping /mnt/lustre-mds1 (opts:) on oleg442-server Start mds1: mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Started lustre-MDT0000 PASS 31d (30s) == sanity-lfsck test 31e: Re-generate the lost slave LMV EA for striped directory (1) ========================================================== 10:19:26 (1773670766) ##### For some reason, 
the slave MDT-object of the striped directory may lost its slave LMV EA. The LFSCK should re-generate the slave LMV EA. ##### Inject failure stub on MDT0 to simulate the case that the slave MDT-object (that resides on the same MDT as the master MDT-object resides on) lost the LMV EA. fail_loc=0x162a fail_val=0 fail_loc=0x0 fail_val=0 Trigger namespace LFSCK to re-generate slave LMV EA Started LFSCK on the device lustre-MDT0000: scrub namespace PASS 31e (4s) == sanity-lfsck test 31f: Re-generate the lost slave LMV EA for striped directory (2) ========================================================== 10:19:30 (1773670770) ##### For some reason, the slave MDT-object of the striped directory may lost its slave LMV EA. The LFSCK should re-generate the slave LMV EA. ##### Inject failure stub on MDT0 to simulate the case that the slave MDT-object (that resides on different MDT as the master MDT-object resides on) lost the LMV EA. fail_loc=0x162a fail_val=1 fail_loc=0x0 fail_val=0 Trigger namespace LFSCK to re-generate slave LMV EA Started LFSCK on the device lustre-MDT0000: scrub namespace PASS 31f (3s) == sanity-lfsck test 31g: Repair the corrupted slave LMV EA ========================================================== 10:19:33 (1773670773) ##### For some reason, the stripe index in the slave LMV EA is corrupted. The LFSCK should repair the slave LMV EA. 
##### Inject failure stub on MDT0 to simulate the case that the slave LMV EA on the first shard of the striped directory claims the same index as the second shard claims fail_loc=0x162b fail_val=0 fail_loc=0x0 fail_val=0 Trigger namespace LFSCK to repair the slave LMV EA Started LFSCK on the device lustre-MDT0000: scrub namespace 192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0 Stopping client oleg442-client.virtnet /mnt/lustre (opts:) Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre PASS 31g (33s) == sanity-lfsck test 31h: Repair the corrupted shard's name entry ========================================================== 10:20:06 (1773670806) ##### For some reason, the shard's name entry in the striped directory may be corrupted. The LFSCK should repair the bad shard's name entry. ##### Inject failure stub on MDT0 to simulate the case that the first shard's name entry in the striped directory claims the same index as the second shard's name entry claims. fail_loc=0x162c fail_val=0 fail_loc=0x0 fail_val=0 Trigger namespace LFSCK to repair the shard's name entry Started LFSCK on the device lustre-MDT0000: scrub namespace 192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0 Stopping client oleg442-client.virtnet /mnt/lustre (opts:) Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre PASS 31h (5s) == sanity-lfsck test 32a: stop LFSCK when some OST failed ========================================================== 10:20:11 (1773670811) preparing... 5 * 5 files will be created Mon Mar 16 10:20:12 EDT 2026. 
total: 5 mkdir in 0.01 seconds: 579.64 ops/second total: 5 create in 0.01 seconds: 713.05 ops/second total: 5 mkdir in 0.01 seconds: 501.41 ops/second prepared Mon Mar 16 10:20:13 EDT 2026. 192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0 Stopping client oleg442-client.virtnet /mnt/lustre (opts:) fail_val=3 fail_loc=0x162d Started LFSCK on the device lustre-MDT0000: scrub layout stop ost1 fail_loc=0 fail_val=0 stop LFSCK Stopped LFSCK on the device lustre-MDT0000. oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 PASS 32a (15s) == sanity-lfsck test 32b: stop LFSCK when some MDT failed ========================================================== 10:20:26 (1773670826) Checking servers environments Checking clients oleg442-client.virtnet environments /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs MODOPTS_LIBCFS= Force libcfs to create 2 CPU partitions loading modules on: 'oleg442-server' oleg442-server: oleg442-server.virtnet: executing load_modules_local oleg442-server: Loading modules from /home/green/git/lustre-release/lustre oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy oleg442-server: detected 4 online CPUs by sysfs oleg442-server: MODOPTS_LIBCFS= oleg442-server: Force libcfs to create 2 CPU partitions Setup mgs, mdt, osts Start mds1: mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg442-server: mount.lustre: according to /etc/mtab /dev/mapper/mds1_flakey is already mounted on /mnt/lustre-mds1 pdsh@oleg442-client: oleg442-server: ssh exited with 
exit code 17 Start of /dev/mapper/mds1_flakey on mds1 failed 17 Start mds2: mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 oleg442-server: mount.lustre: according to /etc/mtab /dev/mapper/mds2_flakey is already mounted on /mnt/lustre-mds2 pdsh@oleg442-client: oleg442-server: ssh exited with exit code 17 Start of /dev/mapper/mds2_flakey on mds2 failed 17 Start ost1: mount -t lustre -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 oleg442-server: mount.lustre: according to /etc/mtab /dev/mapper/ost1_flakey is already mounted on /mnt/lustre-ost1 pdsh@oleg442-client: oleg442-server: ssh exited with exit code 17 seq.cli-lustre-OST0000-super.width=65536 Start of /dev/mapper/ost1_flakey on ost1 failed 17 Start ost2: mount -t lustre -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2 oleg442-server: mount.lustre: according to /etc/mtab /dev/mapper/ost2_flakey is already mounted on /mnt/lustre-ost2 pdsh@oleg442-client: oleg442-server: ssh exited with exit code 17 seq.cli-lustre-OST0001-super.width=65536 Start of /dev/mapper/ost2_flakey on ost2 failed 17 Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre Starting client oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre Started clients oleg442-client.virtnet: 192.168.204.142@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose) Using TIMEOUT=20 osc.lustre-OST0000-osc-ffff9cb084f2c000.idle_timeout=debug osc.lustre-OST0001-osc-ffff9cb084f2c000.idle_timeout=debug disable quota as required preparing... 5 * 5 files will be created Mon Mar 16 10:20:53 EDT 2026. total: 5 mkdir in 0.01 seconds: 452.36 ops/second total: 5 create in 0.01 seconds: 456.83 ops/second total: 5 mkdir in 0.01 seconds: 527.76 ops/second prepared Mon Mar 16 10:20:54 EDT 2026. 
192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0 Stopping client oleg442-client.virtnet /mnt/lustre (opts:) fail_val=3 fail_loc=0x162d Started LFSCK on the device lustre-MDT0000: scrub namespace stop mds2 fail_loc=0 fail_val=0 stop LFSCK Stopped LFSCK on the device lustre-MDT0000. oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 PASS 32b (44s) == sanity-lfsck test 33: check LFSCK paramters =========== 10:21:10 (1773670870) Checking servers environments Checking clients oleg442-client.virtnet environments /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs MODOPTS_LIBCFS= Force libcfs to create 2 CPU partitions loading modules on: 'oleg442-server' oleg442-server: oleg442-server.virtnet: executing load_modules_local oleg442-server: Loading modules from /home/green/git/lustre-release/lustre oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy oleg442-server: detected 4 online CPUs by sysfs oleg442-server: MODOPTS_LIBCFS= oleg442-server: Force libcfs to create 2 CPU partitions Setup mgs, mdt, osts Start mds1: mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg442-server: mount.lustre: according to /etc/mtab /dev/mapper/mds1_flakey is already mounted on /mnt/lustre-mds1 pdsh@oleg442-client: oleg442-server: ssh exited with exit code 17 Start of /dev/mapper/mds1_flakey on mds1 failed 17 Start mds2: mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 oleg442-server: mount.lustre: according to /etc/mtab /dev/mapper/mds2_flakey is already mounted 
on /mnt/lustre-mds2 pdsh@oleg442-client: oleg442-server: ssh exited with exit code 17 Start of /dev/mapper/mds2_flakey on mds2 failed 17 Start ost1: mount -t lustre -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 oleg442-server: mount.lustre: according to /etc/mtab /dev/mapper/ost1_flakey is already mounted on /mnt/lustre-ost1 pdsh@oleg442-client: oleg442-server: ssh exited with exit code 17 seq.cli-lustre-OST0000-super.width=65536 Start of /dev/mapper/ost1_flakey on ost1 failed 17 Start ost2: mount -t lustre -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2 oleg442-server: mount.lustre: according to /etc/mtab /dev/mapper/ost2_flakey is already mounted on /mnt/lustre-ost2 pdsh@oleg442-client: oleg442-server: ssh exited with exit code 17 seq.cli-lustre-OST0001-super.width=65536 Start of /dev/mapper/ost2_flakey on ost2 failed 17 Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre Starting client oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre Started clients oleg442-client.virtnet: 192.168.204.142@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose) Using TIMEOUT=20 osc.lustre-OST0000-osc-ffff9cb09a5c8800.idle_timeout=debug osc.lustre-OST0001-osc-ffff9cb09a5c8800.idle_timeout=debug disable quota as required preparing... 5 * 5 files will be created Mon Mar 16 10:21:37 EDT 2026. total: 5 mkdir in 0.01 seconds: 576.70 ops/second total: 5 create in 0.01 seconds: 677.79 ops/second total: 5 mkdir in 0.01 seconds: 565.54 ops/second prepared Mon Mar 16 10:21:38 EDT 2026. 
Started LFSCK on the device lustre-MDT0000: scrub layout Started LFSCK on the device lustre-MDT0000: scrub namespace PASS 33 (32s) == sanity-lfsck test 34: LFSCK can rebuild the lost agent object ========================================================== 10:21:42 (1773670902) SKIP: sanity-lfsck test_34 Only valid for ZFS backend SKIP 34 (1s) == sanity-lfsck test 35: LFSCK can rebuild the lost agent entry ========================================================== 10:21:43 (1773670903) preparing... 1 * 1 files will be created Mon Mar 16 10:21:44 EDT 2026. total: 1 mkdir in 0.00 seconds: 451.63 ops/second total: 1 create in 0.00 seconds: 387.32 ops/second total: 1 mkdir in 0.00 seconds: 435.14 ops/second prepared Mon Mar 16 10:21:45 EDT 2026. fail_loc=0x1631 fail_loc=0 Started LFSCK on the device lustre-MDT0000: scrub namespace stopall to cleanup object cache setupall /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy oleg442-server: oleg442-server.virtnet: executing load_modules_local oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 
1 Using TIMEOUT=20 Started LFSCK on the device lustre-MDT0000: scrub namespace PASS 35 (74s) == sanity-lfsck test 36a: rebuild LOV EA for mirrored file (1) ========================================================== 10:22:57 (1773670977) SKIP: sanity-lfsck test_36a needs >= 3 OSTs SKIP 36a (1s) == sanity-lfsck test 36b: rebuild LOV EA for mirrored file (2) ========================================================== 10:22:58 (1773670978) SKIP: sanity-lfsck test_36b needs >= 3 OSTs SKIP 36b (1s) == sanity-lfsck test 36c: rebuild LOV EA for mirrored file (3) ========================================================== 10:22:59 (1773670979) SKIP: sanity-lfsck test_36c needs >= 3 OSTs SKIP 36c (1s) == sanity-lfsck test 37: LFSCK must skip a ORPHAN ======== 10:23:00 (1773670980) multiop /mnt/lustre/d37.sanity-lfsck/d0 vD_c TMPPIPE=/tmp/multiop_open_wait_pipe.8184 Started LFSCK on the device lustre-MDT0000: scrub namespace stat: cannot statx '/mnt/lustre/d37.sanity-lfsck/d0': No such file or directory PASS 37 (3s) == sanity-lfsck test 38: LFSCK does not break foreign file and reverse is also true ========================================================== 10:23:03 (1773670983) striped dir -i0 -c2 -H crush /mnt/lustre/d38.sanity-lfsck lfm_magic: 0x0BD70BD0 lfm_length: 73 lfm_type: 0x00000000 (none) lfm_flags: 0x0000DA05 lfm_value: '95546897-1a60-41d4-a0ec-ab0fa45a6f79@77e8e0d5-5230-4e43-b20d-9d4d22dd110c' lfs setstripe: setstripe error for '/mnt/lustre/d38.sanity-lfsck/f38.sanity-lfsck': stripe already set Started LFSCK on the device lustre-MDT0000: scrub namespace Started LFSCK on the device lustre-MDT0000: scrub layout post-lfsck checks of foreign file lfm_magic: 0x0BD70BD0 lfm_length: 73 lfm_type: 0x00000000 (none) lfm_flags: 0x0000DA05 lfm_value: '95546897-1a60-41d4-a0ec-ab0fa45a6f79@77e8e0d5-5230-4e43-b20d-9d4d22dd110c' lfs setstripe: setstripe error for '/mnt/lustre/d38.sanity-lfsck/f38.sanity-lfsck': stripe already set cat: /mnt/lustre/d38.sanity-lfsck/f38.sanity-lfsck: 
No data available cat: write error: Bad file descriptor PASS 38 (35s) == sanity-lfsck test 39: LFSCK does not break foreign dir and reverse is also true ========================================================== 10:23:38 (1773671018) striped dir -i1 -c2 -H fnv_1a_64 /mnt/lustre/d39.sanity-lfsck lfm_magic: 0x0CD50CD0 lfm_length: 73 lfm_type: 0x00000000 (none) lfm_flags: 0x0000DA05 lfm_value: 'e2f1a446-2908-4995-8ab7-07233ed2e188@62f2979d-9c84-4bff-bc44-17044ad275d2' touch: cannot touch '/mnt/lustre/d39.sanity-lfsck/d39.sanity-lfsck2/f39.sanity-lfsck': No data available Started LFSCK on the device lustre-MDT0000: scrub namespace Started LFSCK on the device lustre-MDT0000: scrub layout post-lfsck checks of foreign dir lfm_magic: 0x0CD50CD0 lfm_length: 73 lfm_type: 0x00000000 (none) lfm_flags: 0x0000DA05 lfm_value: 'e2f1a446-2908-4995-8ab7-07233ed2e188@62f2979d-9c84-4bff-bc44-17044ad275d2' touch: cannot touch '/mnt/lustre/d39.sanity-lfsck/d39.sanity-lfsck2/f39.sanity-lfsck': No data available PASS 39 (5s) == sanity-lfsck test 40a: LFSCK correctly fixes lmm_oi in composite layout ========================================================== 10:23:43 (1773671023) Migrate /mnt/lustre/d40a.sanity-lfsck/dir1 from MDT1 to MDT0 trigger LFSCK for layout Started LFSCK on the device lustre-MDT0000: scrub layout PASS 40a (5s) == sanity-lfsck test 41: SEL support in LFSCK ============ 10:23:48 (1773671028) debug=+lfsck trigger LFSCK for SEL layout Started LFSCK on the device lustre-MDT0000: scrub layout namespace debug=super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck PASS 41 (6s) == sanity-lfsck test 42: LFSCK repairs inconsistent MDT-object/OST-object encryption flags ========================================================== 10:23:54 (1773671034) ##### If the MDT-object has the encryption flag but the OST-object does not, add it to the OST-object. 
##### 192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose 0 0 Stopping client oleg442-client.virtnet /mnt/lustre (opts:) Starting client: oleg442-client.virtnet: -o user_xattr,flock,test_dummy_encryption 192.168.204.142@tcp:/lustre /mnt/lustre 1+0 records in 1+0 records out 1 byte copied, 0.00978886 s, 0.1 kB/s fail_loc=0x1632 fail_loc=0x1632 1+0 records in 1+0 records out 1 byte copied, 0.0125235 s, 0.1 kB/s fail_loc=0x0 fail_loc=0x0 Trigger layout LFSCK to find out inconsistent OST-object enc flag Started LFSCK on the device lustre-MDT0000: scrub layout 192.168.204.142@tcp:/lustre /mnt/lustre lustre rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,test_dummy_encryption=%s,test_dummy_encryption,nouser_fid2path,user_xattr,verbose 0 0 Stopping client oleg442-client.virtnet /mnt/lustre (opts:) Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre 1 keys reaped PASS 42 (37s) == sanity-lfsck test 43: LFSCK does not loop endlessly on iget failure in scanning-phase1 ========================================================== 10:24:31 (1773671071) Stopping /mnt/lustre-mds2 (opts:) on oleg442-server Start mds2: mount -t lustre -o localrecov -o abort_recov /dev/mapper/mds2_flakey /mnt/lustre-mds2 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Started lustre-MDT0001 fail_loc=0x1e2 Started LFSCK on the device lustre-MDT0001: scrub namespace oleg442-server: oleg442-server.virtnet: executing wait_import_state FULL osp.lustre-MDT0001-os[pc]-MDT0000.*_server_uuid oleg442-server: osp.lustre-MDT0001-os[pc]-MDT0000.*_server_uuid in FULL state after 1 sec PASS 43 (11s) == sanity-lfsck test 44: umount while lfsck is stopping == 10:24:42 (1773671082) preparing... 
3 * 3 files will be created Mon Mar 16 10:24:43 EDT 2026. total: 3 mkdir in 0.00 seconds: 683.07 ops/second total: 3 create in 0.00 seconds: 763.94 ops/second total: 3 mkdir in 0.00 seconds: 659.21 ops/second prepared Mon Mar 16 10:24:44 EDT 2026. fail_val=3 fail_loc=0x1600 Started LFSCK on the device lustre-MDT0000: scrub namespace Stopped LFSCK on the device lustre-MDT0000. Stopping /mnt/lustre-mds1 (opts:) on oleg442-server fail_val=0 fail_loc=0 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 PASS 44 (12s) == sanity-lfsck test 45: LFSCK should fix UID/GID/PROJID of OST object ========================================================== 10:24:54 (1773671094) using SAVE_PROJECT_SUPPORTED=0 /home/green/git/lustre-release/lustre/utils/lfs project -p 1000 /mnt/lustre//f45.sanity-lfsck-0 running as uid/gid/euid/egid 500/500/500/500, groups: 500 [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre//f45.sanity-lfsck-0] [count=30] 30+0 records in 30+0 records out 31457280 bytes (31 MB, 30 MiB) copied, 0.299063 s, 105 MB/s check the quota usage after initial write -u 500 space:30720 -g 500 space:30720 -p 1000 space:30720 Stopping /mnt/lustre-ost1 (opts:) on oleg442-server clear the UID/GID/PROJID of the test file -rw-rw-rw- 1 root root 31457280 Mar 16 10:24 /mnt/lustre-ost1/O/280000401/d2/282 0 --------------e------- /mnt/lustre-ost1/O/280000401/d2/282 Start ost1: mount -t lustre -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Started lustre-OST0000 UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 43504 2304 35212 7% /mnt/lustre[MDT:0] lustre-MDT0001_UUID 43504 2168 35348 
6% /mnt/lustre[MDT:1] lustre-OST0000_UUID 71076 35688 28388 56% /mnt/lustre[OST:0] lustre-OST0001_UUID 71076 2836 61240 5% /mnt/lustre[OST:1] filesystem_summary: 142152 38524 89628 31% /mnt/lustre oleg442-server: oleg442-server.virtnet: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0000.ost_server_uuid 50 oleg442-server: os[cp].lustre-OST0000-osc-MDT0000.ost_server_uuid in FULL state after 0 sec oleg442-server: oleg442-server.virtnet: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid 50 oleg442-server: os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid in FULL state after 0 sec oleg442-client.virtnet: executing wait_import_state FULL osc.lustre-OST0000-osc-ffff9cb083fde800.ost_server_uuid 50 osc.lustre-OST0000-osc-ffff9cb083fde800.ost_server_uuid in FULL state after 0 sec check the quota usage after UID/GID/PROJID is cleared -u 500 space:0 -g 500 space:0 -p 1000 space:0 the quota usage should be transferred to root -u root space:42884 -g root space:42884 -p 0 space:42884 fix the UID/GID/PROJID by LFSCK Started LFSCK on the device lustre-MDT0000: scrub layout the quota usage should be fixed -u 500 space:30720 -g 500 space:30720 -p 1000 space:30720 PASS 45 (41s) debug=0 Stopping clients: oleg442-client.virtnet /mnt/lustre (opts:) Stopping client oleg442-client.virtnet /mnt/lustre opts: Stopping clients: oleg442-client.virtnet /mnt/lustre2 (opts:) Stopping /mnt/lustre-mds1 (opts:-f) on oleg442-server Stopping /mnt/lustre-mds2 (opts:-f) on oleg442-server Stopping /mnt/lustre-ost1 (opts:-f) on oleg442-server Stopping /mnt/lustre-ost2 (opts:-f) on oleg442-server unloading modules via unload_modules_local on: 'oleg442-server' oleg442-server: oleg442-server.virtnet: executing unload_modules_local oleg442-server: modules unloaded. 
=== sanity-lfsck: start setup 10:26:13 (1773671173) === Stopping clients: oleg442-client.virtnet /mnt/lustre (opts:-f) Stopping clients: oleg442-client.virtnet /mnt/lustre2 (opts:-f) pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 pdsh@oleg442-client: oleg442-server: ssh exited with exit code 2 oleg442-server: oleg442-server.virtnet: executing set_hostid /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs MODOPTS_LIBCFS= Force libcfs to create 2 CPU partitions ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1' quota/lquota options: 'hash_lqs_cur_bits=3' mdt/mdt options: 'mdt_enable_flr_ec=1' ln: failed to create symbolic link '/sbin/.libs': Read-only file system loading modules on: 'oleg442-server' oleg442-server: oleg442-server.virtnet: executing load_modules_local oleg442-server: Loading modules from /home/green/git/lustre-release/lustre oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy oleg442-server: detected 4 online CPUs by sysfs oleg442-server: MODOPTS_LIBCFS= oleg442-server: Force libcfs to create 2 CPU partitions oleg442-server: ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1' oleg442-server: quota/lquota options: 'hash_lqs_cur_bits=3' oleg442-server: mdt/mdt options: 'mdt_enable_flr_ec=1' Formatting mgs, mds, osts Format mds1: /dev/vdc pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Format mds2: /dev/vdd pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Format ost1: /dev/vde pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Format ost2: /dev/vdf pdsh@oleg442-client: 
oleg442-server: ssh exited with exit code 1 Checking servers environments Checking clients oleg442-client.virtnet environments /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs MODOPTS_LIBCFS= Force libcfs to create 2 CPU partitions loading modules on: 'oleg442-server' oleg442-server: oleg442-server.virtnet: executing load_modules_local oleg442-server: Loading modules from /home/green/git/lustre-release/lustre oleg442-server: /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 1040: echo: write error: Device or resource busy oleg442-server: detected 4 online CPUs by sysfs oleg442-server: MODOPTS_LIBCFS= oleg442-server: Force libcfs to create 2 CPU partitions Setup mgs, mdt, osts pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Start mds1: mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Commit the device label on /dev/vdc Started lustre-MDT0000 pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Start mds2: mount -t lustre -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Commit the device label on /dev/vdd Started lustre-MDT0001 pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Start ost1: mount -t lustre -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck 
all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Commit the device label on /dev/vde Started lustre-OST0000 pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Start ost2: mount -t lustre -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2 seq.cli-lustre-OST0001-super.width=65536 oleg442-server: oleg442-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg442-client: oleg442-server: ssh exited with exit code 1 Commit the device label on /dev/vdf Started lustre-OST0001 Starting client: oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre Starting client oleg442-client.virtnet: -o user_xattr,flock 192.168.204.142@tcp:/lustre /mnt/lustre Started clients oleg442-client.virtnet: 192.168.204.142@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,encrypt,flock,lazystatfs,lruresize,nolock,statfs_project,nouser_fid2path,user_xattr,verbose) Using TIMEOUT=20 osc.lustre-OST0000-osc-ffff9cb098994800.idle_timeout=debug osc.lustre-OST0001-osc-ffff9cb098994800.idle_timeout=debug setting jobstats to procname_uid Setting lustre.sys.jobid_var from disable to procname_uid Waiting 90s for 'procname_uid' Updated after 5s: want 'procname_uid' got 'procname_uid' disable quota as required osd-ldiskfs.track_declares_assert=1 === sanity-lfsck: finish setup 10:27:34 (1773671254) === == sanity-lfsck test complete, duration 3038 sec ========= 10:27:35 (1773671255) === sanity-lfsck: start cleanup 10:27:35 (1773671255) === === sanity-lfsck: finish cleanup 10:27:36 (1773671256) ===