Closed
Description
In 3.11, when classic queues v2 are used with mirroring, they can crash like this:
** Reason for termination ==
** {{badmatch,-1},
[{rabbit_mirror_queue_slave,update_delta,2,
[{file,"rabbit_mirror_queue_slave.erl"},
{line,1081}]},
{rabbit_mirror_queue_slave,process_instruction,2,
[{file,"rabbit_mirror_queue_slave.erl"},
{line,1009}]},
{rabbit_mirror_queue_slave,handle_cast,2,
[{file,"rabbit_mirror_queue_slave.erl"},
{line,299}]},
{gen_server2,handle_msg,2,[{file,"gen_server2.erl"},{line,1067}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}
crasher:
initial call: rabbit_prequeue:init/1
pid: <0.12215.0>
registered_name: []
exception exit: {{badmatch,-1},
[{rabbit_mirror_queue_slave,update_delta,2,
[{file,"rabbit_mirror_queue_slave.erl"},
{line,1081}]},
{rabbit_mirror_queue_slave,process_instruction,2,
[{file,"rabbit_mirror_queue_slave.erl"},
{line,1009}]},
{rabbit_mirror_queue_slave,handle_cast,2,
[{file,"rabbit_mirror_queue_slave.erl"},{line,299}]},
{gen_server2,handle_msg,2,
[{file,"gen_server2.erl"},{line,1067}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,240}]}]}
in function gen_server2:terminate/3 (gen_server2.erl, line 1183)
ancestors: [<0.12062.0>,<0.1605.0>,<0.1588.0>,<0.1587.0>,
rabbit_vhost_sup_sup,rabbit_sup,<0.1313.0>]
message_queue_len: 0
messages: []
links: [<0.12062.0>,<0.12216.0>]
dictionary: [{credit_blocked,[]},
{process_name,
{rabbit_mirror_queue_slave,
{resource,<<"/">>,queue,
<<"cmq-publish_then_consume">>}}},
{guid,{{548389537,174457376,3517217523,4084119289},1}},
{explicit_gc_run_operation_threshold,1000},
{rand_seed,
{#{jump => #Fun<rand.3.34006561>,
max => 288230376151711743,
next => #Fun<rand.5.34006561>,type => exsplus},
[247167938309581976|185153570766432085]}},
{credit_flow_default_credit,{400,200}}]
trap_exit: true
status: running
heap_size: 18481566
stack_size: 35
reductions: 56778258
neighbours:
neighbour:
pid: <0.12216.0>
registered_name: []
initial call: gm:init/1
current_function: {gen_server2,process_next_msg,1}
ancestors: [<0.12215.0>,<0.12062.0>,<0.1605.0>,<0.1588.0>,<0.1587.0>,
rabbit_vhost_sup_sup,rabbit_sup,<0.1313.0>]
message_queue_len: 0
links: [<0.12215.0>]
trap_exit: false
status: waiting
heap_size: 610
stack_size: 11
reductions: 115417
current_stacktrace: [{gen_server2,process_next_msg,1,
[{file,"gen_server2.erl"},{line,673}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,240}]}]
supervisor: {<0.12062.0>,rabbit_amqqueue_sup}
errorContext: child_terminated
reason: {{badmatch,-1},
[{rabbit_mirror_queue_slave,update_delta,2,
[{file,"rabbit_mirror_queue_slave.erl"},{line,1081}]},
{rabbit_mirror_queue_slave,process_instruction,2,
[{file,"rabbit_mirror_queue_slave.erl"},{line,1009}]},
{rabbit_mirror_queue_slave,handle_cast,2,
[{file,"rabbit_mirror_queue_slave.erl"},{line,299}]},
{gen_server2,handle_msg,2,
[{file,"gen_server2.erl"},{line,1067}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,240}]}]}
offender: [{pid,<0.12215.0>},
{id,rabbit_amqqueue},
{mfargs,
{rabbit_prequeue,start_link,
[{amqqueue,
{resource,<<"/">>,queue,
<<"cmq-publish_then_consume">>},
true,false,none,
[{<<"x-max-length">>,long,100000},
{<<"x-queue-mode">>,longstr,<<"lazy">>},
{<<"x-queue-version">>,long,2}],
<48209.8417.0>,[],[],[],
[{vhost,<<"/">>},
{name,<<"ha">>},
{pattern,<<"cmq.*">>},
{'apply-to',<<"queues">>},
{definition,
[{<<"ha-mode">>,<<"exactly">>},
{<<"ha-params">>,3},
{<<"ha-sync-mode">>,<<"automatic">>}]},
{priority,1}],
undefined,
[{<48209.8420.0>,<48209.8417.0>}],
[],live,0,[],<<"/">>,
#{user => <<"guest">>},
rabbit_classic_queue,#{}},
slave,<0.12061.0>]}},
{restart_type,intrinsic},
{shutdown,600000},
{child_type,worker}]
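The `{badmatch,-1}` termination reason above is an Erlang assertion-style pattern match failing: matching a literal pattern such as `0 = Expr` against a value that turns out to be `-1` exits the process with `{badmatch,-1}`. A minimal sketch of how a mirror's depth delta going negative would produce exactly this crash (hypothetical code and names; not the actual `rabbit_mirror_queue_slave:update_delta/2` implementation):

```erlang
%% Hypothetical sketch -- not the real rabbit_mirror_queue_slave code.
%% A mirror tracks the depth delta between itself and the queue master;
%% processing instructions (e.g. acks) only ever shrinks that delta.
-module(delta_demo).
-export([update_delta/2]).

update_delta(DeltaChange, Delta) when DeltaChange =< 0 ->
    NewDelta = Delta + DeltaChange,
    if
        NewDelta > 0 ->
            NewDelta;
        true ->
            %% Assertion: the mirror may catch up with the master but
            %% never overshoot. If something (here, hypothetically, a
            %% double-counted ack from the v2 backing queue) drives the
            %% delta to -1, this literal match fails and the process
            %% terminates with {badmatch,-1}, as in the report above.
            0 = NewDelta
    end.
```

For example, `delta_demo:update_delta(-1, 1)` returns `0` (mirror caught up), while `delta_demo:update_delta(-2, 1)` exits with `{badmatch,-1}`.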
Steps to reproduce:
- git checkout v3.11.x
- bazel run start-cluster
- Apply a mirroring policy:
rabbitmqctl -n rabbit-0 set_policy --apply-to queues --priority 1 ha "cmq.*" '{"ha-mode":"exactly", "ha-params": 3, "ha-sync-mode": "automatic"}'
- Publish messages to a queue and then consume them:
perf-test -x 1 -y 0 -c 3000 -u cmq-publish_then_consume -qa x-max-length=100000,x-queue-version=2,x-queue-mode=lazy -C 100000 -ad false -f persistent
perf-test -x 0 -y 1 -u cmq-publish_then_consume -qa x-max-length=100000,x-queue-version=2,x-queue-mode=lazy -ad false -f persistent
The issue doesn't seem to occur with CQv1. With CQv2 it occurs with both lazy and non-lazy queues, but it seems easier to reproduce with lazy. In both cases, given enough messages in the queue (the 100k above is generally not sufficient; increase it to 5M or so), the crash leads to a memory spike. Especially with non-lazy CQv2, that spike often triggers a memory alarm or even an OOM kill. (Memory usage screenshot from the original issue not reproduced here.)
I was not able to reproduce this using main.