[ 2792.747242] Lustre: DEBUG MARKER: User quota (limit: 200)
[ 2793.885143] Lustre: DEBUG MARKER: Write 100M (buffered) ...
[ 2794.694866] LustreError: 31825:0:(osd_handler.c:694:osd_ro()) lustre-MDT0000: *** setting device osd-zfs read-only ***
[ 2794.935126] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
[ 2795.326957] Lustre: DEBUG MARKER: Fail mds for 0 seconds
[ 2795.867592] Lustre: Failing over lustre-MDT0000
[ 2795.993787] Lustre: server umount lustre-MDT0000 complete
[ 2806.485800] Lustre: 2613:0:(client.c:2309:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1686832893/real 1686832893] req@ffff8800beb26880 x1768769582689152/t0(0) o400->MGC192.168.203.165@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1686832900 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 uid:0 gid:0 job:'kworker/u8:2.0'
[ 2806.493108] Lustre: 2613:0:(client.c:2309:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
[ 2806.495538] LustreError: 166-1: MGC192.168.203.165@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
[ 2812.499335] Lustre: Evicted from MGS (at 192.168.203.165@tcp) after server handle changed from 0xa42055622c03b95e to 0xa42055622c163095
[ 2812.503328] Lustre: MGC192.168.203.165@tcp: Connection restored to 192.168.203.165@tcp (at 0@lo)
[ 2812.589281] Lustre: lustre-MDT0000-lwp-OST0001: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
[ 2812.595544] Lustre: Skipped 1 previous similar message
[ 2812.629695] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180
[ 2812.650897] Lustre: lustre-MDT0000: in recovery but waiting for the first client to connect
[ 2813.420244] Lustre: DEBUG MARKER: oleg365-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 8
[ 2813.600068] Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 1 client reconnects
[ 2813.625800] Lustre: lustre-MDT0000: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.
[ 2813.638967] Lustre: lustre-OST0000: deleting orphan objects from 0x240000400:2081 to 0x240000400:2113
[ 2813.638969] Lustre: lustre-OST0001: deleting orphan objects from 0x280000400:2077 to 0x280000400:2113
[ 2815.690803] Lustre: DEBUG MARKER: oleg365-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[ 2816.067092] Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
[ 2817.622741] LustreError: 2610:0:(ldlm_resource.c:1125:ldlm_resource_complain()) lustre-MDT0000-lwp-OST0000: namespace resource [0x200000006:0x20000:0xea60].0x0 (ffff8800af426b00) refcount nonzero (1) after lock cleanup; forcing cleanup.
[ 2817.630679] LustreError: 2610:0:(ldlm_resource.c:1125:ldlm_resource_complain()) Skipped 5 previous similar messages
[ 2836.180747] Lustre: ll_ost_io00_004: service thread pid 27906 was inactive for 40.067 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 2836.187277] Pid: 27906, comm: ll_ost_io00_004 3.10.0-7.9-debug #1 SMP Sat Mar 26 23:28:42 EDT 2022
[ 2836.190063] Call Trace:
[ 2836.191120] [<0>] ptlrpc_set_wait+0x7cf/0x850 [ptlrpc]
[ 2836.192977] [<0>] ptlrpc_queue_wait+0x86/0x250 [ptlrpc]
[ 2836.195019] [<0>] qsd_send_dqacq+0x318/0x370 [lquota]
[ 2836.197016] [<0>] qsd_acquire+0xaf6/0xe60 [lquota]
[ 2836.198913] [<0>] qsd_op_begin0+0x1a1/0xa80 [lquota]
[ 2836.200745] [<0>] qsd_op_begin+0x2a6/0x530 [lquota]
[ 2836.202546] [<0>] osd_declare_quota+0xfe/0x4d0 [osd_zfs]
[ 2836.204269] [<0>] osd_declare_write_commit+0x3b0/0x810 [osd_zfs]
[ 2836.206384] [<0>] ofd_commitrw_write+0x543/0x1930 [ofd]
[ 2836.208111] [<0>] ofd_commitrw+0x5f2/0xdc0 [ofd]
[ 2836.209804] [<0>] tgt_brw_write+0x1ab5/0x2580 [ptlrpc]
[ 2836.212037] [<0>] tgt_request_handle+0x93a/0x19c0 [ptlrpc]
[ 2836.214685] [<0>] ptlrpc_server_handle_request+0x251/0xc00 [ptlrpc]
[ 2836.216556] [<0>] ptlrpc_main+0xc41/0x16a0 [ptlrpc]
[ 2836.217808] [<0>] kthread+0xe4/0xf0
[ 2836.218615] [<0>] ret_from_fork_nospec_begin+0x7/0x21
[ 2836.219837] [<0>] 0xfffffffffffffffe
[ 2840.154770] Lustre: 27906:0:(client.c:2309:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1686832890/real 1686832890] req@ffff8800aa9da140 x1768769582689088/t0(0) o601->lustre-MDT0000-lwp-OST0000@0@lo:23/10 lens 336/336 e 0 to 1 dl 1686832934 ref 2 fl Rpc:XNQr/200/ffffffff rc 0/-1 uid:0 gid:0 job:'ll_ost_io00_004.0'
[ 2840.164214] LustreError: 15027:0:(qsd_reint.c:635:qqi_reint_delayed()) lustre-OST0000: Delaying reintegration for qtype:0 until pending updates are flushed.
[ 2840.168955] LustreError: 15027:0:(qsd_reint.c:635:qqi_reint_delayed()) Skipped 6 previous similar messages
[ 2843.532289] Lustre: DEBUG MARKER: (dd_pid=1833, time=25, timeout=600)
[ 2848.668771] Lustre: 2614:0:(client.c:2309:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1686832898/real 1686832898] req@ffff8800beb22600 x1768769582689472/t0(0) o400->lustre-MDT0000-lwp-OST0000@0@lo:12/10 lens 224/224 e 0 to 1 dl 1686832942 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 uid:0 gid:0 job:'kworker/u8:2.0'
[ 2848.668778] Lustre: 2613:0:(client.c:2309:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1686832898/real 1686832898] req@ffff8800beb22f80 x1768769582689408/t0(0) o400->lustre-MDT0000-lwp-OST0001@0@lo:12/10 lens 224/224 e 0 to 1 dl 1686832942 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 uid:0 gid:0 job:'kworker/u8:2.0'
[ 2848.668781] Lustre: 2613:0:(client.c:2309:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[ 2867.429739] Lustre: DEBUG MARKER: User quota (limit: 200)
[ 2868.538187] Lustre: DEBUG MARKER: Write 100M (directio) ...
[ 2869.346848] LustreError: 4174:0:(osd_handler.c:694:osd_ro()) lustre-MDT0000: *** setting device osd-zfs read-only ***
[ 2869.607906] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
[ 2869.994375] Lustre: DEBUG MARKER: Fail mds for 0 seconds
[ 2870.533264] Lustre: Failing over lustre-MDT0000
[ 2870.634993] Lustre: server umount lustre-MDT0000 complete
[ 2879.716774] Lustre: 2614:0:(client.c:2309:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1686832966/real 1686832966] req@ffff8801171063c0 x1768769582704064/t0(0) o400->MGC192.168.203.165@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1686832973 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 uid:0 gid:0 job:'kworker/u8:2.0'
[ 2879.724841] Lustre: 2614:0:(client.c:2309:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[ 2879.726958] LustreError: 166-1: MGC192.168.203.165@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
[ 2885.730375] Lustre: Evicted from MGS (at 192.168.203.165@tcp) after server handle changed from 0xa42055622c163095 to 0xa42055622c163978
[ 2885.863175] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180
[ 2885.879351] Lustre: lustre-MDT0000: in recovery but waiting for the first client to connect
[ 2886.610736] Lustre: DEBUG MARKER: oleg365-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 8
[ 2887.498490] Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 1 client reconnects
[ 2887.527499] Lustre: lustre-MDT0000: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.
[ 2887.542582] Lustre: lustre-OST0000: deleting orphan objects from 0x240000400:2115 to 0x240000400:2145
[ 2887.542593] Lustre: lustre-OST0001: deleting orphan objects from 0x280000400:2077 to 0x280000400:2145
[ 2888.918801] Lustre: DEBUG MARKER: oleg365-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[ 2889.319283] Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
[ 2890.853721] LustreError: 2610:0:(ldlm_resource.c:1125:ldlm_resource_complain()) lustre-MDT0000-lwp-OST0000: namespace resource [0x200000006:0x20000:0xea60].0x0 (ffff8800a6b4b700) refcount nonzero (1) after lock cleanup; forcing cleanup.
[ 2906.814819] LustreError: 7788:0:(qsd_reint.c:635:qqi_reint_delayed()) lustre-OST0000: Delaying reintegration for qtype:0 until pending updates are flushed.
[ 2906.818381] LustreError: 7788:0:(qsd_reint.c:635:qqi_reint_delayed()) Skipped 11 previous similar messages
[ 2914.884780] Lustre: 2611:0:(client.c:2309:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1686832965/real 1686832965] req@ffff8800ad998000 x1768769582704000/t0(0) o601->lustre-MDT0000-lwp-OST0000@0@lo:23/10 lens 336/336 e 0 to 1 dl 1686833008 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 uid:0 gid:0 job:'lquota_wb_lustr.0'
[ 2921.881276] Lustre: DEBUG MARKER: (dd_pid=4187, time=30, timeout=600)
[ 2944.196181] Lustre: DEBUG MARKER: sanity-quota test_18: @@@@@@ FAIL: [ 2836.180747] Lustre: ll_ost_io00_004: service thread pid 27906 was inactive for 40.067 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
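The `ptlrpc_expire_one_request()` lines above carry several useful fields: the send time (`sent`), the deadline (`dl`), the opcode (`oNNN`), the import the request went out on, and the job name. With those fields, the quota-acquire RPCs can be compared with the ping RPCs. The sketch below is a minimal Python example that pulls them out. It assumes one dmesg message per line. The regular expression, the `ExpiredRequest` type and `parse_expired` are hypothetical helpers, not part of Lustre. Treating `dl - sent` as the timeout window is a reading of the log format, not a documented contract. The opcode meanings in the comments are also assumptions: 400 is taken to be the ping, and 601 the quota acquire, which is consistent with `qsd_send_dqacq` in the stack trace.

```python
import re
from typing import List, NamedTuple

# Hypothetical pattern for a single "@@@ Request sent has timed out" line.
# "Skipped N previous similar messages" lines do not contain "@@@" and are ignored.
EXPIRE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]"                      # dmesg timestamp (seconds since boot)
    r".*ptlrpc_expire_one_request\(\)\) @@@ "
    r".*\[sent (?P<sent>\d+)/real \d+\]"             # wall-clock send time (epoch seconds)
    r".* o(?P<opc>\d+)->(?P<peer>[^\s:]+)"           # opcode and import, up to the portal ":"
    r".* dl (?P<dl>\d+) "                            # deadline (epoch seconds)
    r".*job:'(?P<job>[^']*)'"
)


class ExpiredRequest(NamedTuple):
    kernel_ts: float  # dmesg timestamp of the expiry message
    opcode: int       # assumed: 400 = ping, 601 = quota acquire
    peer: str         # import name, e.g. lustre-MDT0000-lwp-OST0000@0@lo
    sent: int
    deadline: int
    job: str

    @property
    def timeout(self) -> int:
        # Seconds between sending the request and its deadline.
        return self.deadline - self.sent


def parse_expired(lines: List[str]) -> List[ExpiredRequest]:
    out = []
    for line in lines:
        m = EXPIRE_RE.search(line)
        if m:
            out.append(ExpiredRequest(
                kernel_ts=float(m["ts"]), opcode=int(m["opc"]), peer=m["peer"],
                sent=int(m["sent"]), deadline=int(m["dl"]), job=m["job"]))
    return out


SAMPLE = [
    "[ 2806.485800] Lustre: 2613:0:(client.c:2309:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1686832893/real 1686832893] req@ffff8800beb26880 x1768769582689152/t0(0) o400->MGC192.168.203.165@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1686832900 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 uid:0 gid:0 job:'kworker/u8:2.0'",
    "[ 2806.493108] Lustre: 2613:0:(client.c:2309:ptlrpc_expire_one_request()) Skipped 3 previous similar messages",
    "[ 2840.154770] Lustre: 27906:0:(client.c:2309:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1686832890/real 1686832890] req@ffff8800aa9da140 x1768769582689088/t0(0) o601->lustre-MDT0000-lwp-OST0000@0@lo:23/10 lens 336/336 e 0 to 1 dl 1686832934 ref 2 fl Rpc:XNQr/200/ffffffff rc 0/-1 uid:0 gid:0 job:'ll_ost_io00_004.0'",
]

if __name__ == "__main__":
    for req in parse_expired(SAMPLE):
        print(req.opcode, req.peer, req.timeout, req.job)
    # prints:
    # 400 MGC192.168.203.165@tcp@0@lo 7 kworker/u8:2.0
    # 601 lustre-MDT0000-lwp-OST0000@0@lo 44 ll_ost_io00_004.0
```

On these two sample lines, the MGC ping had a 7-second window. The quota acquire issued from `ll_ost_io00_004` had a 44-second window, which is longer than the 40.067 seconds the thread was reported inactive. Because the parser matches line by line, it does not count the requests hidden behind "Skipped N previous similar messages".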