Software
The provided software is available as is, without warranty of any kind. Please cite the related paper when using the code.
ConflictNET: end-to-end learning for speech-based conflict intensity estimation
Python code to estimate the level of verbal conflict from raw speech signals, as presented in
V. Rajan, A. Brutti, A. Cavallaro, ConflictNET: end-to-end learning for speech-based conflict intensity estimation, IEEE Signal Processing Letters, Vol. 26, Issue 11, pp. 1668-1672, November 2019
EdgeFool: an adversarial image enhancement filter
Code to generate enhanced adversarial images after training a Fully Convolutional Neural Network, as presented in
A.S. Shamsabadi, C. Oh, A. Cavallaro, EdgeFool: an adversarial image enhancement filter, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, 4-8 May 2020
Private-FGSM, a privacy preserving adversarial image approach
Code for generating adversarial images to preserve privacy in scene classification, as presented in
C. Y. Li, A.S. Shamsabadi, R. Sanchez-Matilla, R. Mazzon, A. Cavallaro, Private-FGSM, a privacy preserving adversarial image approach, Proc. of IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12-17 May 2019
PrivEdge: from local to distributed private training and prediction
Keras implementation of a privacy-preserving technique that safeguards the privacy of users who provide their data for training, as presented in
A.S. Shamsabadi, A. Gascon, H. Haddadi, A. Cavallaro, PrivEdge: from local to distributed private training and prediction, IEEE Transactions on Information Forensics and Security (TIFS), (to appear)
ColorFool: semantic adversarial colorization
Python code to generate adversarial images, as presented in
A.S. Shamsabadi, R. Sanchez-Matilla, A. Cavallaro, ColorFool: semantic adversarial colorization, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, Washington, USA, 14-19 June 2020
Multi-speaker tracking from an audio-visual sensing device
Matlab code for multi-speaker tracking in 3D using the multi-modal signals captured by a small-size co-located audio-visual sensing platform, as presented in
X. Qian, A. Brutti, O. Lanz, M. Omologo, A. Cavallaro, Multi-speaker tracking from an audio-visual sensing device, IEEE Transactions on Multimedia, Vol. 21, Issue 10, pp. 2576-2588, October 2019
Polyphonic sound event tracking using linear dynamical systems
This code performs sound event detection in complex acoustic environments, as presented in
E. Benetos, G. Lafay, M. Lagrange, M. D. Plumbley, Polyphonic sound event tracking using linear dynamical systems, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 25, Issue 6, pp. 1266-1277, June 2017
To bee or not to bee: Investigating machine learning approaches to beehive sound recognition
Code to create a system that can automatically identify different states of a hive based on audio recordings made from inside beehives, as presented in
I. Nolasco and E. Benetos, To bee or not to bee: Investigating machine learning approaches to beehive sound recognition, Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 19-20 November 2018
A long short-term memory convolutional neural network for first-person vision activity recognition
The code implements a first-person vision motion representation that uses stacked spectrograms, as presented in
G. Abebe and A. Cavallaro, A long short-term memory convolutional neural network for first-person vision activity recognition, Proc. of ICCV workshop on Assistive Computer Vision and Robotics (ACVR), Venice, Italy, 28 October 2017
Inertial-vision: cross-domain knowledge transfer for wearable sensors
The code implements multi-modal ego-centric proprioceptive activity recognition based on a convolutional neural network (CNN), as presented in
G. Abebe and A. Cavallaro, Inertial-vision: cross-domain knowledge transfer for wearable sensors, Proc. of ICCV workshop on Assistive Computer Vision and Robotics (ACVR), Venice, Italy, 28 October 2017
Networked Computer Vision: the importance of a holistic simulator
WiSE-MNet++, based on Castalia/Omnet++, enables the modeling of the communication layers, sensing, and distributed applications of wireless multimedia sensor networks, as presented in
J.C. SanMiguel and A. Cavallaro, Networked Computer Vision: the importance of a holistic simulator, IEEE Computer, Vol. 50, Issue 7, pp. 35-43, July 2017
Computational models of miscommunication phenomena
Python code to characterize communication quality using miscommunication phenomena, as presented in
M. Purver, J. Hough and C. Howes, Computational models of miscommunication phenomena, Topics in Cognitive Science, Vol. 10, Issue 2, pp. 425-451, 2018
POKer: a Partial Order Kernel for comparing strings with alternative substrings
Python code that can be used for comparison and classification of strings containing alternative substrings of variable length, as presented in
M. Abdollahyan and F. Smeraldi, POKer: a Partial Order Kernel for comparing strings with alternative substrings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Bruges, Belgium, pp. 263-268, 26-28 April 2017
Matlab code for a baseline acoustic event detection system, developed by Dimitrios Giannoulis and Emmanouil Benetos as part of the IEEE D-CASE Challenge, as presented in
D. Giannoulis, D. Stowell, E. Benetos, M. Rossignol, M. Lagrange, and M.D. Plumbley, A database and challenge for acoustic scene classification and event detection, in Proc. of the 21st European Signal Processing Conf., Marrakech, Morocco, 20-25 June 2013
Color filter arrays: representation, analysis and a design methodology
The code generates frequency structure, multiplexing matrix and demosaicking matrix for analysis of a given color filter array pattern, as presented in
P. Hao, Y. Li, Z. Lin, E. Dubois, A geometric method for optimal design of color filter arrays, IEEE Trans. on Image Processing, Vol. 20, Issue 3, pp. 709-722, March 2011
Dialogue similarity calculation tools
This code implements a set of tools for calculating similarity between speakers in dialogue, across standard and randomised corpora, as presented in
C. Howes, P. G. T. Healey, and M. Purver, Tracking lexical and syntactic alignment in conversation, in Proc. of Annual Conf. of the Cognitive Science Society, Portland, Oregon, USA, 15-19 August 2010
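As a rough illustration of comparing similarity in a real dialogue against a randomised control, the sketch below scores each turn against the preceding turn by a different speaker and repeats the measurement after shuffling the utterances. It is not the authors' tool: the Jaccard word-overlap measure, the shuffling scheme, and all function names are assumptions chosen for brevity.

```python
import random

def jaccard(a, b):
    """Jaccard overlap between the word sets of two utterances (assumed measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def mean_cross_speaker_similarity(turns):
    """Average similarity of each turn to the previous turn by a different speaker."""
    scores = [jaccard(prev[1], cur[1])
              for prev, cur in zip(turns, turns[1:])
              if prev[0] != cur[0]]
    return sum(scores) / len(scores) if scores else 0.0

def randomised_baseline(turns, seed=0):
    """Control condition: keep the speaker sequence, shuffle the utterances."""
    rng = random.Random(seed)
    texts = [text for _, text in turns]
    rng.shuffle(texts)
    return [(speaker, text) for (speaker, _), text in zip(turns, texts)]

# Toy dialogue as (speaker, utterance) pairs; purely illustrative data.
dialogue = [
    ("A", "shall we take the early train"),
    ("B", "the early train is fine"),
    ("A", "fine I will book it"),
    ("B", "book it for two"),
]
real = mean_cross_speaker_similarity(dialogue)
control = mean_cross_speaker_similarity(randomised_baseline(dialogue))
print(f"real={real:.3f} control={control:.3f}")
```

A real-dialogue score that sits consistently above the shuffled control would suggest that speakers reuse each other's words more than chance ordering predicts.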
Distance blurring for space-variant image coding
The code performs the depth-based blurring described in
T. Popkin, A. Cavallaro and D. Hands, Image coding using depth blurring for aesthetically acceptable distortion, IEEE Trans. Image Processing, Vol. 20, Issue 11, November 2011
DyLan (Dynamics of Language) dialogue system and toolkit
This code implements a Dynamic Syntax parser and generator for the English language, within a word-by-word incremental dialogue system for the travel domain, as described in
M. Purver, A. Eshghi, and J. Hough, Incremental semantic construction in a Dialogue System, Proc. of Int. Conf. on Computational Semantics, Oxford, UK, 12-14 January 2011
Efficient depth blurring with occlusion handling
The code performs the depth-based blurring described in
T. Popkin, A. Cavallaro and D. Hands, Efficient depth blurring with occlusion handling, in Proc. of IEEE Int. Conf. on Image Processing, Brussels, Belgium, 11-14 September 2011
GM-PHD filter implementation (Gaussian mixture probability hypothesis density filter)
This Python code implements the GM-PHD filter for multi-target pitch tracking of vibrato sources in noise, as presented in
D. Stowell and M. D. Plumbley, Multi-target pitch tracking of vibrato sources in noise using the GM-PHD filter, in Int. Workshop on Machine Learning and Music, Edinburgh, UK, 30 June 2012
Landmark localization and registration for 3D faces
This software registers 3D faces and calculates their differences using the algorithms described in
P. Nair and A. Cavallaro, 3D face detection, landmark localization and registration using a Point Distribution Model, IEEE Trans. on Multimedia, Vol. 11, No. 4, June 2009
This code implements Max-Margin Semi-NMF (MNMF), as presented in
V. Kumar, I. Kotsia and I. Patras, Max-Margin Semi-NMF, British Machine Vision Conf., Dundee, UK, 29 August - 2 September 2011
Multi-feature object trajectory clustering
The code performs the clustering procedure described in
N. Anjum and A. Cavallaro, Multi-feature object trajectory clustering for video analysis, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 18, Issue 11, pp. 1555-1564, November 2008
The code performs the spatial filtering described in
T. Popkin, A. Cavallaro and D. Hands, Multi-foveation filtering, in Proc. of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Taipei, Taiwan, 19-24 April 2009
Protocol for evaluating trackers (PFT)
The code reproduces the evaluation protocol described in
T. Nawaz and A. Cavallaro, PFT: A protocol for evaluating video trackers, in Proc. of IEEE Int. Conf. on Image Processing, Brussels, Belgium, 11-14 September 2011
Probabilistic Subpixel Temporal Registration for Facial Expression Analysis
This code implements the PSTR technique for sequence registration as presented in
E. Sariyanidi, H. Gunes and A. Cavallaro, Probabilistic subpixel temporal registration for facial expression analysis, in Proc. of the Asian Computer Vision Conf., Singapore, 1-5 November 2014
Quantised Local Zernike Moments
This code implements the Quantised Local Zernike Moments (QLZM) image representation, as presented in
E. Sariyanidi, H. Gunes, M. Gokmen and A. Cavallaro, Local Zernike Moment representation for facial affect recognition, British Machine Vision Conf., Bristol, UK, 9-13 September 2013
These APIs implement Chatterbox's sentiment/emotion detection tools (free to use below a certain data limit), based around the method described in
M. Purver and S. Battersby, Experimenting with distant supervision for emotion classification, Proc. of Conf. of the European Chapter of the Association for Computational Linguistics, Avignon, France, 23-27 April 2012
Space-variant Gaussian blurring
The code performs the spatial blurring described in
T. Popkin, A. Cavallaro and D. Hands, Accurate and efficient method for smoothly space-variant Gaussian blurring, IEEE Trans. Image Processing, Vol. 19, No. 5, pp. 1362-1370, May 2010
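To make the idea of space-variant blurring concrete, here is a naive sketch in which every output pixel is smoothed with a Gaussian whose standard deviation comes from a per-pixel map. It is a brute-force illustration only, not the accurate and efficient method of the paper; the function name, the 3-sigma kernel radius, and the border clamping are assumptions.

```python
import math

def space_variant_blur(image, sigma_map):
    """Naive space-variant Gaussian blur on a 2D list of grey levels.

    Each output pixel is a normalised Gaussian-weighted average of its
    neighbourhood, using that pixel's own sigma from sigma_map.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sigma = sigma_map[y][x]
            if sigma <= 0:
                out[y][x] = float(image[y][x])  # no blur requested here
                continue
            r = max(1, math.ceil(3 * sigma))  # assumed 3-sigma support
            total = weight_sum = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to the border
                    xx = min(max(x + dx, 0), w - 1)
                    wgt = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
                    total += wgt * image[yy][xx]
                    weight_sum += wgt
            out[y][x] = total / weight_sum
    return out

# Illustrative input: one bright pixel, with blur strength growing left to right.
size = 7
image = [[0.0] * size for _ in range(size)]
image[3][3] = 1.0
sigma_map = [[0.3 * x for x in range(size)] for _ in range(size)]
blurred = space_variant_blur(image, sigma_map)
```

The cost grows with the square of the largest sigma for every pixel, which is the kind of expense an efficient method would aim to avoid.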
This code implements Support Tucker Machines (STuMs) and Sw-STuMs, as presented in
I. Kotsia and I. Patras, Support Tucker Machines, in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, Colorado Springs, Colorado, USA, 21-25 June 2011
This code implements Support Tensor Regression (STR), as presented in
W. Guo, I. Kotsia and I. Patras, Tensor Learning for Regression, IEEE Trans. on Image Processing, Vol. 21, No. 2, pp. 816-827, February 2012
Xamrt - cross-associative tree regression algorithm
This code implements the multivariate tree regression method for learning timbre analogies from unlabelled data, as presented in
D. Stowell and M. D. Plumbley, Learning Timbre Analogies from Unlabelled Data by Multivariate Tree Regression, Journal of New Music Research, Vol. 40, No. 4, pp. 325-336, 2011