AudioBERT uses a retrieval-based framework to inject auditory knowledge into language models. Its key components include:
- Auditory Knowledge Span Detector: Detects text spans where auditory knowledge is needed, identifying the key tokens (sound or object mentions) that serve as the query for audio retrieval.
- CLAP Retrieval: Once a span is identified, CLAP retrieves the most relevant audio by matching the text span against candidate audio samples; the retrieved audio embedding is then injected into the language model to enhance auditory understanding (see the retrieval sketch after this list).
- AudioBERT (LoRA): Dynamically adapts the model with auditory embeddings only when they are needed, preserving general performance on other language tasks.
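Conceptually, the retrieval step reduces to nearest-neighbor search in a shared text-audio embedding space. Below is a minimal sketch, assuming the Hugging Face `ClapModel` with the `laion/clap-htsat-unfused` checkpoint as a stand-in for the CLAP retriever; the `retrieve_audio` helper and its arguments are illustrative, not the repository's API.

```python
import torch
from transformers import ClapModel, ClapProcessor

# Assumed checkpoint; the repository may use a different CLAP variant.
model = ClapModel.from_pretrained("laion/clap-htsat-unfused")
processor = ClapProcessor.from_pretrained("laion/clap-htsat-unfused")

def retrieve_audio(span_text, audio_waveforms, sampling_rate=48000):
    """Return the index of the candidate clip whose CLAP embedding best matches the detected span."""
    text_inputs = processor(text=[span_text], return_tensors="pt", padding=True)
    audio_inputs = processor(audios=audio_waveforms, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        text_emb = model.get_text_features(**text_inputs)     # shape (1, d)
        audio_emb = model.get_audio_features(**audio_inputs)  # shape (N, d)
    # Cosine similarity between the span embedding and every candidate clip
    sims = torch.nn.functional.cosine_similarity(text_emb, audio_emb)
    return sims.argmax().item()
```

The embedding of the top-ranked clip is what gets injected into the language model in the next step.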
We employ a BERT-base model for the auditory knowledge span detector, trained for 5 epochs with a batch size of 16 and a learning rate of 1e-5 using the AdamW optimizer.
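As a minimal sketch, the detector can be framed as token classification over BERT-base with the hyperparameters above; the binary in-span/out-of-span labeling scheme and the dataloader are placeholders, not the repository's actual training code.

```python
import torch
from torch.optim import AdamW
from transformers import AutoModelForTokenClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Assumed labeling scheme: one binary label per token,
# inside vs. outside an auditory-knowledge span.
model = AutoModelForTokenClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = AdamW(model.parameters(), lr=1e-5)

def train(dataloader, epochs=5):
    """dataloader is assumed to yield batches of 16 tokenized examples with per-token labels."""
    model.train()
    for _ in range(epochs):
        for batch in dataloader:
            loss = model(**batch).loss  # token-level cross-entropy
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```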
For the language model we use BERT, with an AST encoder producing the auditory knowledge embeddings that are injected. We trained for 20 epochs with a batch size of 32 and a learning rate of 3e-4 using the AdamW optimizer. For LoRA, we set the rank to 64 and alpha to 128.
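The LoRA configuration maps directly onto the `peft` library. Below is a minimal sketch with rank 64 and alpha 128 as stated above; the choice of target modules (the query/value projections of BERT's attention) and the dropout value are common defaults and assumptions on our part.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForMaskedLM

base = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
lora_config = LoraConfig(
    r=64,                               # LoRA rank
    lora_alpha=128,                     # LoRA scaling factor
    target_modules=["query", "value"],  # assumed: BERT attention projections
    lora_dropout=0.1,                   # assumed value; not stated in the text
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the low-rank adapters are trainable
```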
AudioBERT outperforms baseline language models such as BERT, RoBERTa, Gemma2-2B, and LLaMA3.1-8B on auditory tasks, achieving significantly higher accuracy on both AuditoryBench tasks in the test set.
| Model | Animal Sound (Acc. %) | Sound Pitch (Acc. %) | Combined (Acc. %) |
|---|---|---|---|
| BERT-large | 15.85 | 58.90 | 44.41 |
| RoBERTa-large | 14.70 | 56.64 | 42.52 |
| Gemma2-2B | 15.11 | 60.45 | 45.19 |
| LLaMA3.1-8B | 21.80 | 62.55 | 48.83 |
| AudioBERT | 36.69 | 76.31 | 62.97 |
To run AudioBERT on a sample auditory knowledge task, use the following commands (TBD):

- Training the model: `python train.py`
- Evaluating the model: `python evaluate.py`