
TUB-IRML at the MediaEval 2014 Violent Scenes Detection Task

by esra-acar


Transcript

  1. TUB-IRML at the MediaEval 2014 Violent Scenes Detection Task: Violence Modeling through Feature Space Partitioning
     Esra Acar, Sahin Albayrak
     Competence Center Information Retrieval & Machine Learning
  2. Outline
     ► The Violence Detection Method
        Video Representation
        Violence Detection Model
     ► Results & Discussion
     ► Conclusions & Future Work
     16 October 2014 – TUB-IRML at the MediaEval 2014 Violent Scenes Detection Task
  3. The Violence Detection Method
     ► The two main components of our method are: (1) the representation of video segments, and (2) the learning of a violence model.
  4. Video Representation (1)
     The generation process of sparse-coding-based audio and visual representations for video segments.
  5. Video Representation (2)
     The generation of audio and visual dictionaries with sparse coding.
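The slides do not show the dictionary-learning step in code. As a minimal sketch of the idea, assuming scikit-learn's `MiniBatchDictionaryLearning` as the sparse-coding backend and max-pooling of the sparse codes (the pooling choice is a guess, not stated on the slide), a mid-level segment representation could be built like this:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Toy stand-in for low-level descriptors (e.g. MFCC frames) extracted
# from training video segments: 500 descriptors of dimension 20.
rng = np.random.default_rng(0)
descriptors = rng.standard_normal((500, 20))

# Learn an overcomplete dictionary with sparse coding.
# n_components and alpha are illustrative values, not from the paper.
dico = MiniBatchDictionaryLearning(n_components=64, alpha=1.0, random_state=0)
dico.fit(descriptors)

# Encode the descriptors of one video segment and pool the sparse
# codes into a single mid-level representation for that segment.
segment = rng.standard_normal((30, 20))       # 30 frames of one segment
codes = dico.transform(segment)               # (30, 64) sparse codes
representation = np.max(np.abs(codes), axis=0)  # (64,) mid-level feature
print(representation.shape)
```

The same recipe would be run twice, once on audio descriptors and once on visual descriptors, yielding separate audio and visual dictionaries.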
  6. Video Representation (3)
     ► In addition to the mid-level audio and visual representations, we use low-level features:
        Motion-related descriptors – Violent Flow (ViF), a descriptor proposed for real-time detection of violent crowd behavior, and
        Static content representations – affect-related color descriptors such as statistics on saturation, brightness and hue in the HSL color space, and colorfulness.
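The slide names the color statistics but not their exact form. A hedged sketch of one plausible static descriptor, using per-frame hue/saturation/lightness moments plus the Hasler–Süsstrunk colorfulness measure (the specific statistic set and the colorfulness formula are assumptions, not confirmed by the slides):

```python
import numpy as np
import colorsys

def affect_color_features(rgb_frame):
    """Mean/std of hue, lightness and saturation plus a colorfulness
    score for one RGB frame (H x W x 3, float values in [0, 1]).
    Hypothetical helper for illustration only."""
    flat = rgb_frame.reshape(-1, 3)
    # colorsys.rgb_to_hls returns (hue, lightness, saturation).
    hls = np.array([colorsys.rgb_to_hls(*px) for px in flat])
    hue, light, sat = hls[:, 0], hls[:, 1], hls[:, 2]

    # Colorfulness via opponent-channel statistics (Hasler & Suesstrunk).
    r, g, b = flat[:, 0], flat[:, 1], flat[:, 2]
    rg, yb = r - g, 0.5 * (r + g) - b
    colorfulness = (np.sqrt(rg.std() ** 2 + yb.std() ** 2)
                    + 0.3 * np.sqrt(rg.mean() ** 2 + yb.mean() ** 2))

    return np.array([hue.mean(), hue.std(), sat.mean(), sat.std(),
                     light.mean(), light.std(), colorfulness])

frame = np.random.default_rng(1).random((24, 32, 3))
features = affect_color_features(frame)  # 7-dimensional static descriptor
print(features.shape)
```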
  7. Violence Detection Model
     ► Violence is a concept which can be audio-visually expressed in diverse manners.
     ► We learn multiple models for the violence concept instead of a unique model:
        Feature space partitioning by clustering video segments in the training dataset, and
        Learning a different model for each violence sub-concept.
     ► We perform classifier selection to solve the classifier combination issue.
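The partition-then-specialize scheme above can be sketched as follows. The slides confirm SVM classifiers (a unique-model SVM baseline appears in the results), but the clustering algorithm and the selection rule are not specified; k-means and nearest-centroid routing are assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X_train = rng.standard_normal((300, 16))   # toy segment features
y_train = rng.integers(0, 2, 300)          # 1 = violent, 0 = non-violent

# Partition the feature space: cluster training segments into
# sub-concepts and fit one expert classifier per cluster.
k = 4
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_train)
experts = []
for c in range(k):
    mask = km.labels_ == c
    experts.append(SVC().fit(X_train[mask], y_train[mask]))

# Classifier selection: route each test segment to the expert of its
# nearest cluster centroid (one plausible realisation of the step).
X_test = rng.standard_normal((5, 16))
assigned = km.predict(X_test)
scores = np.array([experts[c].decision_function(x[None])[0]
                   for c, x in zip(assigned, X_test)])
print(scores.shape)
```

Selecting a single expert per segment sidesteps the question of how to fuse the outputs of all k classifiers, which is the "classifier combination issue" the slide refers to.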
  8. Results & Discussion
     The MAP2014 and MAP@100 of our method with different representations:

     Method                  MAP2014 (Movies)  MAP@100 (Movies)  MAP2014 (Web videos)  MAP@100 (Web videos)
     Run1                    0.169             0.368             0.517                 0.582
     Run2                    0.139             0.284             0.371                 0.478
     Run3                    0.080             0.208             0.477                 0.495
     Run4                    0.172             0.409             0.489                 0.586
     Run5                    0.170             0.406             0.479                 0.567
     SVM-based unique model  0.093             0.302             -                     -

     Run1: MFCC-based mid-level audio representations
     Run2: HoG- and HoF-based mid-level features and ViF
     Run3: Affect-related color features
     Run4: Audio and visual features (except color)
     Run5: All audio-visual representations, linearly fused at the decision level
  9. Conclusions & Future Work
     ► The mid-level audio representation based on MFCC and sparse coding provides promising performance in terms of the MAP2014 and MAP@100 metrics, and also outperforms our visual representations.
     ► As future work, we need to extend and improve our visual representation set, and further investigate the feature space partitioning concept.
  10. Thanks!
      Esra Acar, M.Sc.
      Researcher
      esra.acar@tu-berlin.de
      Competence Center Information Retrieval & Machine Learning
      www.dai-labor.de
      Fon: +49 (0) 30 / 314 – 74 013
      Fax: +49 (0) 30 / 314 – 74 003
      DAI-Labor, Technische Universität Berlin
      Fakultät IV – Elektrotechnik & Informatik
      Sekretariat TEL 14
      Ernst-Reuter-Platz 7
      10587 Berlin, Deutschland