Overview

The reasons why I conducted this research are shown below.

2024-09-15: Dataset Collection and Preprocessing

  • Goal: Gather diverse audio samples for training.
  • Actions: Sourced 1,000 audio clips from public datasets and simulated environments. Extracted MFCC features using Librosa.
  • Code Snippet:
  • Challenges:
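    • Choosing and obtaining appropriate datasets.
    • After storing the various open datasets (CREMA-D, ESC-50, FSD50K, AudioSet, RAVDESS), it was difficult to preprocess and classify the data in each one.
  • Datasets
  • Unzip (merge the split archive into a single file first, then extract it):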
zip -s 0 FSD50k.dev_audio.zip --out unsplit.zip

unzip unsplit.zip

Criteria:

  • FSD50K
    • Facts
      • In the AudioSet ontology, there is a distinct class called “Screaming”, defined as a sharp, high-pitched human vocalization.
      • However, FSD50K uses only a subset of the AudioSet classes.
      • “Screaming” appears to be included among the 200 FSD50K classes.
    • Code snippet
import os
import shutil

import pandas as pd

# Load the dev-set metadata and keep the clips whose tags mention "scream".
fsd_meta = pd.read_csv('/content/drive/MyDrive/FSD50K/FSD50K.ground_truth/dev_clips.csv')
fsd_scream = fsd_meta[fsd_meta['tags'].str.contains('scream', case=False, na=False)]
# Copy the matching clips into a separate folder, skipping files that are missing.
os.makedirs('/content/drive/MyDrive/FSD50K/screams', exist_ok=True)
for fname in fsd_scream['fname']:
    src = f'/content/drive/MyDrive/FSD50K/FSD50K.dev_audio/{fname}.wav'
    dst = f'/content/drive/MyDrive/FSD50K/screams/{fname}.wav'
    if os.path.exists(src):
        shutil.copy(src, dst)

2024-09-15: Model

CNN

Epoch 1/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 2s 17ms/step - accuracy: 1.0000 - loss: 1.1653e-05 - val_accuracy: 0.8645 - val_loss: 1.8799
Epoch 2/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 1.0051e-05 - val_accuracy: 0.8645 - val_loss: 1.8835
Epoch 3/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 1.1417e-05 - val_accuracy: 0.8659 - val_loss: 1.8832
Epoch 4/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 9.5432e-06 - val_accuracy: 0.8659 - val_loss: 1.9068
Epoch 5/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 7.6834e-06 - val_accuracy: 0.8645 - val_loss: 1.8992
Epoch 6/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 1.0186e-05 - val_accuracy: 0.8645 - val_loss: 1.8996
Epoch 7/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 9.7836e-06 - val_accuracy: 0.8645 - val_loss: 1.9095
Epoch 8/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 5.7895e-06 - val_accuracy: 0.8645 - val_loss: 1.9164
Epoch 9/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 1.0000 - loss: 6.9273e-06 - val_accuracy: 0.8645 - val_loss: 1.9266
Epoch 10/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 6.6463e-06 - val_accuracy: 0.8645 - val_loss: 1.9170
Epoch 11/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 5.6640e-06 - val_accuracy: 0.8645 - val_loss: 1.9285
Epoch 12/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 5.6058e-06 - val_accuracy: 0.8631 - val_loss: 1.9327
Epoch 13/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 5.2367e-06 - val_accuracy: 0.8631 - val_loss: 1.9413
Epoch 14/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 4.9884e-06 - val_accuracy: 0.8631 - val_loss: 1.9389
Epoch 15/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 1.0000 - loss: 4.4711e-06 - val_accuracy: 0.8631 - val_loss: 1.9546
Epoch 16/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 1.0000 - loss: 4.5350e-06 - val_accuracy: 0.8631 - val_loss: 1.9494
Epoch 17/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 1.0000 - loss: 4.2300e-06 - val_accuracy: 0.8631 - val_loss: 1.9608
Epoch 18/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 1.0000 - loss: 3.5147e-06 - val_accuracy: 0.8631 - val_loss: 1.9714
Epoch 19/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 1.0000 - loss: 3.1768e-06 - val_accuracy: 0.8631 - val_loss: 1.9723
Epoch 20/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 1.0000 - loss: 2.9217e-06 - val_accuracy: 0.8631 - val_loss: 1.9845
CNN Train is finished===

CRNN

Epoch 1/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 0.9979 - loss: 0.0104 - val_accuracy: 0.9073 - val_loss: 0.4617
Epoch 2/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.9962 - loss: 0.0100 - val_accuracy: 0.9073 - val_loss: 0.4210
Epoch 3/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.9999 - loss: 0.0032 - val_accuracy: 0.8859 - val_loss: 0.5132
Epoch 4/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.9461 - loss: 0.1919 - val_accuracy: 0.9087 - val_loss: 0.3831
Epoch 5/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.9914 - loss: 0.0261 - val_accuracy: 0.8987 - val_loss: 0.5386
Epoch 6/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.9947 - loss: 0.0209 - val_accuracy: 0.8987 - val_loss: 0.4155
Epoch 7/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 0.0029 - val_accuracy: 0.9101 - val_loss: 0.4390
Epoch 8/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 0.0012 - val_accuracy: 0.9073 - val_loss: 0.4446
Epoch 9/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 9.4883e-04 - val_accuracy: 0.9116 - val_loss: 0.4512
Epoch 10/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 7.2539e-04 - val_accuracy: 0.9101 - val_loss: 0.4535
Epoch 11/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 5.4815e-04 - val_accuracy: 0.9116 - val_loss: 0.4647
Epoch 12/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 5.0530e-04 - val_accuracy: 0.9130 - val_loss: 0.4685
Epoch 13/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 1.0000 - loss: 4.4567e-04 - val_accuracy: 0.9144 - val_loss: 0.4716
Epoch 14/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 4.0068e-04 - val_accuracy: 0.9116 - val_loss: 0.4769
Epoch 15/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 3.2456e-04 - val_accuracy: 0.9130 - val_loss: 0.4801
Epoch 16/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 1.0000 - loss: 3.6989e-04 - val_accuracy: 0.9130 - val_loss: 0.4830
Epoch 17/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 1.0000 - loss: 2.9874e-04 - val_accuracy: 0.9130 - val_loss: 0.4877
Epoch 18/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 2.7565e-04 - val_accuracy: 0.9130 - val_loss: 0.4937
Epoch 19/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 1.0000 - loss: 2.7009e-04 - val_accuracy: 0.9130 - val_loss: 0.4936
Epoch 20/20
88/88 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 1.0000 - loss: 2.4103e-04 - val_accuracy: 0.9130 - val_loss: 0.5008
CRNN Train is finished===

YAMNet

Training the classifier model...
Epoch 1/50
88/88 ━━━━━━━━━━━━━━━━━━━━ 3s 17ms/step - accuracy: 0.6908 - loss: 0.6759 - val_accuracy: 0.5592 - val_loss: 0.7654
Epoch 2/50
88/88 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7237 - loss: 0.5160 - val_accuracy: 0.8374 - val_loss: 0.4223
Epoch 3/50
88/88 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7650 - loss: 0.4653 - val_accuracy: 0.8488 - val_loss: 0.4032
Epoch 4/50
88/88 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7714 - loss: 0.4473 - val_accuracy: 0.8417 - val_loss: 0.3812
Epoch 5/50
88/88 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7603 - loss: 0.4388 - val_accuracy: 0.6320 - val_loss: 0.6263
Epoch 6/50
88/88 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7589 - loss: 0.4639 - val_accuracy: 0.6947 - val_loss: 0.5400
Epoch 7/50
88/88 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7685 - loss: 0.4090 - val_accuracy: 0.8502 - val_loss: 0.3662
Epoch 8/50
88/88 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8013 - loss: 0.4005 - val_accuracy: 0.8203 - val_loss: 0.4330
Epoch 9/50
88/88 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7913 - loss: 0.3819 - val_accuracy: 0.8531 - val_loss: 0.4303
Epoch 10/50
88/88 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7651 - loss: 0.4139 - val_accuracy: 0.8516 - val_loss: 0.3928
Epoch 11/50
88/88 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8123 - loss: 0.3809 - val_accuracy: 0.8046 - val_loss: 0.4560
Epoch 12/50
88/88 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7971 - loss: 0.3908 - val_accuracy: 0.8359 - val_loss: 0.4327
Training finished.
YAMNet Train is finished===

Ensemble

yamnet predict>>>>>>>>>>>>>>>>>>>>>>
2/2 ━━━━━━━━━━━━━━━━━━━━ 11s 2s/step
cnn predict>>>>>>>>>>>>>>>>>>>>>>>
2/2 ━━━━━━━━━━━━━━━━━━━━ 1s 632ms/step
crnn predict>>>>>>>>>>>>>>>>>>>>>>>
1/2 ━━━━━━━━━━━━━━━━━━━━ 0s 312ms/step
2/2 ━━━━━━━━━━━━━━━━━━━━ 1s 417ms/step
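
This post does not say how the YAMNet, CNN, and CRNN predictions are combined. One common option is soft voting: average the per-clip probabilities from each model and threshold the result. The sketch below assumes that approach; the function name `soft_vote`, the 0.5 threshold, and the sample probabilities are hypothetical.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average per-clip probabilities across models and threshold them.

    prob_lists: one list of probabilities per model, all of equal length.
    Returns (averaged probabilities, predicted 0/1 labels).
    """
    n_models = len(prob_lists)
    # zip(*prob_lists) groups the predictions of all models for each clip.
    averaged = [sum(per_clip) / n_models for per_clip in zip(*prob_lists)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


if __name__ == "__main__":
    # Hypothetical scream probabilities for two clips from three models.
    yamnet_probs = [0.9, 0.2]
    cnn_probs = [0.8, 0.4]
    crnn_probs = [0.7, 0.3]
    averaged, labels = soft_vote([yamnet_probs, cnn_probs, crnn_probs])
    print(averaged, labels)
```

Weighted averaging or majority voting over hard labels would be equally plausible alternatives.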
Scream Metrics: 
(1.0, 0.8775510204081632, 0.9347826086956522, None)
Fear Metrics: 
(0.0, 0.0, 0.0, None)
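
The four-element tuples above look like (precision, recall, F1, support), the shape scikit-learn's `precision_recall_fscore_support` returns when an `average` is given (support is then `None`). The sketch below shows in plain Python how the first three values are derived. The function name and the label counts (49 positive clips, 43 of them detected, no false positives) are hypothetical: they are one combination consistent with the Scream values, not numbers taken from the experiment.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Fall back to 0.0 when a denominator is zero (no predicted or no true positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical labels: 49 positives (43 detected, 6 missed) and 10 negatives.
    y_true = [1] * 49 + [0] * 10
    y_pred = [1] * 43 + [0] * 6 + [0] * 10
    print(precision_recall_f1(y_true, y_pred))
    # A class that is never predicted correctly scores zero on all three.
    print(precision_recall_f1([1, 1, 0], [0, 0, 0]))
```

With these definitions, a result of (0.0, 0.0, 0.0) means the class had no true positives at all, which is worth checking against the label distribution of the test set.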
