Description
Context
There is no advantage to increasing `n_batch` above `n_ubatch` with embedding models that use pooling, because the entire batch must fit in a physical batch (i.e. `n_ubatch`). `n_batch` is always >= `n_ubatch`.
- See @slaren's comment in: server: docs: `--threads` and `--threads`, `--ubatch-size`, `--log-disable` #6254 (comment)
Proposition
Exit with failure if `--embedding` is set and `--ubatch-size` != `--batch-size` in the `server` example. Probably also in the `retrieval` example in #6193.
Also, the KV `bert.context_size` probably must be taken into account.
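
A rough sketch of what the proposed check could look like, for discussion only. The struct `embedding_batch_params` and the function `check_embedding_batch` are hypothetical names made up for this illustration, not existing code in the `server` example, and the default values are placeholders. It covers only the `--ubatch-size` != `--batch-size` condition and does not handle `bert.context_size`.

```cpp
#include <cstdint>
#include <string>

// Hypothetical parameter holder for illustration; not the actual server type.
struct embedding_batch_params {
    bool     embedding = false; // set by --embedding
    uint32_t n_batch   = 512;   // set by --batch-size (logical batch), placeholder default
    uint32_t n_ubatch  = 512;   // set by --ubatch-size (physical batch), placeholder default
};

// Returns an empty string when the combination is accepted,
// otherwise a message describing why it would be rejected.
std::string check_embedding_batch(const embedding_batch_params & p) {
    if (p.embedding && p.n_ubatch != p.n_batch) {
        return "--embedding requires --ubatch-size (" + std::to_string(p.n_ubatch) +
               ") to be equal to --batch-size (" + std::to_string(p.n_batch) + ")";
    }
    return "";
}
```

The caller would print the returned message and exit with a failure status when it is non-empty. Returning a message rather than exiting inside the function keeps the check reusable if the `retrieval` example ends up needing the same rule.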