Software

The provided software is available as is, without warranty of any kind. Please cite the related paper when using the code.


ConflictNET: end-to-end learning for speech-based conflict intensity estimation

Python code to estimate the level of verbal conflict from raw speech signals, as presented in

V. Rajan, A. Brutti, A. Cavallaro, ConflictNET: end-to-end learning for speech-based conflict intensity estimation, IEEE Signal Processing Letters, Vol. 26, Issue 11, pp. 1668-1672, November 2019


EdgeFool: an adversarial image enhancement filter

Code to generate enhanced adversarial images after training a Fully Convolutional Neural Network, as presented in

A.S. Shamsabadi, C. Oh, A. Cavallaro, EdgeFool: an adversarial image enhancement filter, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, 4-8 May 2020


Private-FGSM, a privacy preserving adversarial image approach

Code for generating adversarial images to preserve privacy in scene classification, as presented in

C. Y. Li, A.S. Shamsabadi, R. Sanchez-Matilla, R. Mazzon, A. Cavallaro, Private-FGSM, a privacy preserving adversarial image approach, Proc. of IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12-17 May 2019
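As a point of reference for this entry, the underlying Fast Gradient Sign Method (FGSM) perturbs an input along the sign of the loss gradient. The sketch below is a generic, minimal illustration on a toy linear model with squared loss (where the input gradient is analytic), not the Private-FGSM repository code; all names and values are illustrative.

```python
# Generic FGSM illustration (not the Private-FGSM code).
# For a linear score s = w.x with squared loss L = (s - y)^2,
# the input gradient is dL/dx = 2 * (s - y) * w.

def fgsm_step(x, w, y, eps):
    s = sum(wi * xi for wi, xi in zip(w, x))
    grad = [2.0 * (s - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    # Perturb each input coordinate by eps in the gradient-sign direction.
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

x = [0.2, -0.5, 0.9]          # toy input
w = [1.0, -2.0, 0.5]          # toy model weights
x_adv = fgsm_step(x, w, y=0.0, eps=0.1)
```

A single step like this bounds the per-coordinate change by eps, which is why FGSM perturbations are often near-imperceptible for small eps.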


PrivEdge: from local to distributed private training and prediction

Keras implementation of a privacy-preserving technique that safeguards the privacy of users who provide their data for training, as presented in

A.S. Shamsabadi, A. Gascon, H. Haddadi, A. Cavallaro, PrivEdge: from local to distributed private training and prediction, IEEE Transactions on Information Forensics and Security (TIFS), to appear


ColorFool: semantic adversarial colorization

Python code to generate adversarial images, as presented in

A.S. Shamsabadi, R. Sanchez-Matilla, A. Cavallaro, ColorFool: semantic adversarial colorization, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, Washington, USA, 14-19 June 2020


Multi-speaker tracking from an audio-visual sensing device

Matlab code for multi-speaker tracking in 3D using the multi-modal signals captured by a small-size co-located audio-visual sensing platform, as presented in

X. Qian, A. Brutti, O. Lanz, M. Omologo, A. Cavallaro, Multi-speaker tracking from an audio-visual sensing device, IEEE Transactions on Multimedia, Vol. 21, Issue 10, pp. 2576-2588, October 2019


Polyphonic sound event tracking using linear dynamical systems

This code performs sound event detection in complex acoustic environments, as presented in

E. Benetos, G. Lafay, M. Lagrange, M. D. Plumbley, Polyphonic sound event tracking using linear dynamical systems, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 25, Issue 6, pp. 1266-1277, June 2017


To bee or not to bee: Investigating machine learning approaches to beehive sound recognition

Code to create a system that can automatically identify different states of a hive based on audio recordings made from inside beehives, as presented in

I. Nolasco and E. Benetos, To bee or not to bee: Investigating machine learning approaches to beehive sound recognition, Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 19-20 November 2018


A long short-term memory convolutional neural network for first-person vision activity recognition

The code implements a first-person vision motion representation that uses stacked spectrograms, as presented in

G. Abebe and A. Cavallaro, A long short-term memory convolutional neural network for first-person vision activity recognition, Proc. of ICCV workshop on Assistive Computer Vision and Robotics (ACVR), Venice, Italy, 28 October 2017


Inertial-vision: cross-domain knowledge transfer for wearable sensors

The code implements multi-modal ego-centric proprioceptive activity recognition based on a convolutional neural network (CNN), as presented in

G. Abebe and A. Cavallaro, Inertial-vision: cross-domain knowledge transfer for wearable sensors, Proc. of ICCV workshop on Assistive Computer Vision and Robotics (ACVR), Venice, Italy, 28 October 2017


Networked Computer Vision: the importance of a holistic simulator

WiSE-MNet++, based on Castalia/OMNeT++, enables modelling of the communication layers, sensing and distributed applications of wireless multimedia sensor networks, as presented in

J.C. SanMiguel and A. Cavallaro, Networked Computer Vision: the importance of a holistic simulator, IEEE Computer, Vol. 50, Issue 7, pp.35-43, July 2017


Computational models of miscommunication phenomena

Python code to characterize communication quality using miscommunication phenomena, as presented in

M. Purver, J. Hough and C. Howes, Computational models of miscommunication phenomena, Topics in Cognitive Science, Vol. 10, Issue 2, pp. 425-451, 2018


POKer: a Partial Order Kernel for comparing strings with alternative substrings

Python code that can be used for comparison and classification of strings containing alternative substrings of variable length, as presented in

M. Abdollahyan and F. Smeraldi, POKer: a Partial Order Kernel for comparing strings with alternative substrings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Bruges, Belgium, pp. 263-268, 26-28 April 2017


Acoustic event detection

Matlab code for the baseline system for acoustic event detection, developed as part of the IEEE D-CASE Challenge by Dimitrios Giannoulis and Emmanouil Benetos, as presented in

D. Giannoulis, D. Stowell, E. Benetos, M. Rossignol, M. Lagrange, and M.D. Plumbley, A database and challenge for acoustic scene classification and event detection, in Proc. of the 21st European Signal Processing Conf., Marrakech, Morocco, 20-25 June 2013
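To illustrate the idea behind such baseline systems (not the D-CASE baseline itself), a minimal event detector can mark onsets where short-time frame energy crosses a threshold; frame length and threshold below are illustrative choices.

```python
import math

# Illustrative frame-energy event detector (not the D-CASE baseline code):
# report the sample index of each frame where the short-time energy first
# rises above a threshold.

def detect_events(samples, frame_len, threshold):
    events = []
    active = False
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy > threshold and not active:
            events.append(start)      # event onset (in samples)
            active = True
        elif energy <= threshold:
            active = False
    return events

# Silence, then a tone burst, then silence: one onset expected at sample 100.
tone = [0.0] * 100 + [math.sin(0.3 * n) for n in range(100)] + [0.0] * 100
onsets = detect_events(tone, frame_len=20, threshold=0.05)
```

Real systems replace the raw energy with spectral features and a trained classifier, but the onset/offset bookkeeping is the same.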


Color filter arrays: representation, analysis and a design methodology

The code generates frequency structure, multiplexing matrix and demosaicking matrix for analysis of a given color filter array pattern, as presented in

P. Hao, Y. Li, Z. Lin, E. Dubois, A geometric method for optimal design of color filter arrays, IEEE Trans. on Image Processing, Vol. 20, Issue 3, pp. 709-722, March 2011


Dialogue similarity calculation tools

This code implements a set of tools for calculating similarity between speakers in dialogue, across standard and randomised corpora, as presented in

C. Howes, P. G. T. Healey, and M. Purver, Tracking lexical and syntactic alignment in conversation, in Proc. of Annual Conf. of the Cognitive Science Society, Portland, Oregon, USA, 15-19 August 2010


Distance blurring for space-variant image coding

The code performs the depth-based blurring described in

T. Popkin, A. Cavallaro and D. Hands, Image coding using depth blurring for aesthetically acceptable distortion, IEEE Trans. Image Processing, Vol. 20, Issue 11, November 2011


DyLan (Dynamics of Language) dialogue system and toolkit

This code implements a Dynamic Syntax parser and generator for the English language, within a word-by-word incremental dialogue system for the travel domain, as described in

M. Purver, A. Eshghi, and J. Hough, Incremental semantic construction in a Dialogue System, Proc. of Int. Conf. on Computational Semantics, Oxford, UK, 12-14 January 2011


Efficient depth blurring with occlusion handling

The code performs the depth-based blurring described in

T. Popkin, A. Cavallaro and D. Hands, Efficient depth blurring with occlusion handling, in Proc. of IEEE Int. Conf. on Image Processing, Brussels, Belgium, 11-14 September 2011


GM-PHD filter implementation (Gaussian mixture probability hypothesis density filter)

This Python code implements multi-target pitch tracking of vibrato sources in noise using the GM-PHD filter, as presented in

D. Stowell and M. D. Plumbley, Multi-target pitch tracking of vibrato sources in noise using the GM-PHD filter, in Int. Workshop on Machine Learning and Music, Edinburgh, UK, 30 June 2012


Landmark localization and registration for 3D faces

This software registers 3D faces and calculates their differences using the algorithms described in

P. Nair and A. Cavallaro, 3D face detection, landmark localization and registration using a Point Distribution Model, IEEE Trans. on Multimedia, Vol. 11, No. 4, June 2009


Max-Margin Semi-NMF

This code implements Max-Margin Semi-NMF (MNMF), as presented in

V. Kumar, I. Kotsia and I. Patras, Max-Margin Semi-NMF, British Machine Vision Conf., Dundee, UK, 29 August - 2 September 2011


Multi-feature object trajectory clustering

The code performs the clustering procedure described in

N. Anjum and A. Cavallaro, Multi-feature object trajectory clustering for video analysis, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 18, Issue 11, pp. 1555-1564, November 2008


Multi-foveation filtering

The code performs the spatial filtering described in

T. Popkin, A. Cavallaro and D. Hands, Multi-foveation filtering, in Proc. of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Taipei, Taiwan, 19-24 April 2009


Protocol for evaluating trackers (PFT)

The code reproduces the evaluation protocol described in

T. Nawaz and A. Cavallaro, PFT: A protocol for evaluating video trackers, in Proc. of IEEE Int. Conf. on Image Processing, Brussels, Belgium, 11-14 September 2011


Probabilistic Subpixel Temporal Registration for Facial Expression Analysis

This code implements the PSTR technique for sequence registration as presented in

E. Sariyanidi, H. Gunes and A. Cavallaro, Probabilistic subpixel temporal registration for facial expression analysis, in Proc. of the Asian Conf. on Computer Vision, Singapore, 1-5 November 2014


Quantised Local Zernike Moments

This code implements the Quantised Local Zernike Moments (QLZM) image representation, as presented in

E. Sariyanidi, H. Gunes, M. Gokmen and A. Cavallaro, Local Zernike Moment representation for facial affect recognition, British Machine Vision Conf., Bristol, UK, 9-13 September 2013


Sentimental

These APIs implement Chatterbox's sentiment/emotion detection tools (free to use below a certain data limit), based around the method described in

M. Purver and S. Battersby, Experimenting with distant supervision for emotion classification, Proc. of Conf. of the European Chapter of the Association for Computational Linguistics, Avignon, France, 23-27 April 2012
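The distant-supervision idea behind this entry can be sketched generically: texts are labelled automatically by surface markers (e.g. emoticons), which are then removed so a classifier cannot simply memorise them. The marker sets below are illustrative, not the paper's.

```python
# Generic distant-supervision labelling sketch (illustrative markers,
# not the method's actual marker inventory).

MARKERS = {"happy": [":)", ":-)"], "sad": [":(", ":-("]}

def distant_label(text):
    """Return (label, text-with-marker-removed), or (None, text) if no
    marker is present; unlabelled texts are skipped at training time."""
    for label, marks in MARKERS.items():
        for m in marks:
            if m in text:
                return label, text.replace(m, "").strip()
    return None, text

label, clean = distant_label("great day at the beach :)")
```

The labelled, marker-stripped texts then feed any standard text classifier.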


Space-variant Gaussian blurring

The code performs the spatial blurring described in

T. Popkin, A. Cavallaro and D. Hands, Accurate and efficient method for smoothly space-variant Gaussian blurring, IEEE Trans. Image Processing, Vol. 19, No. 5, pp. 1362-1370, May 2010
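For orientation, the core operation can be sketched in 1-D: each output sample is a Gaussian-weighted average whose standard deviation varies with position. This naive per-sample loop illustrates the idea only; it is not the paper's efficient algorithm, and the sigma schedule below is made up.

```python
import math

# Naive 1-D space-variant Gaussian smoothing: sigma differs per position.
# Border samples are handled by clamping indices (an illustrative choice).

def space_variant_blur(signal, sigmas):
    out = []
    for i, sigma in enumerate(sigmas):
        if sigma <= 0:                    # no blurring at this position
            out.append(signal[i])
            continue
        radius = max(1, int(3 * sigma))   # truncate the kernel at ~3 sigma
        total, acc = 0.0, 0.0
        for k in range(-radius, radius + 1):
            j = min(max(i + k, 0), len(signal) - 1)
            w = math.exp(-0.5 * (k / sigma) ** 2)
            acc += w * signal[j]
            total += w
        out.append(acc / total)
    return out

sig = [0.0] * 5 + [1.0] * 5                       # a step edge
blurred = space_variant_blur(sig, [0.5 + 0.2 * i for i in range(10)])
```

The cost of this direct form grows with sigma, which is exactly the problem the paper's method addresses.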


Support Tucker Machines

This code implements Support Tucker Machines (STuMs) and Sw-STuMs, as presented in

I. Kotsia and I. Patras, Support Tucker Machines, in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, Colorado Springs, Colorado, USA, 21-25 June 2011


Tensor Regression

This code implements Support Tensor Regression (STR), as presented in

W. Guo, I. Kotsia and I. Patras, Tensor Learning for Regression, IEEE Trans. on Image Processing, Vol. 21, No. 2, pp. 816-827, February 2012


Xamrt - cross-associative tree regression algorithm

This code implements the cross-associative multivariate tree regression (XAMRT) algorithm, as presented in

D. Stowell and M. D. Plumbley, Learning Timbre Analogies from Unlabelled Data by Multivariate Tree Regression, Journal of New Music Research, Vol. 40, No. 4, pp. 325-336, 2011


