2017 Intelligent Sensing Summer School
A three-day event on advanced Intelligent Sensing topics, including wearable sensing, object tracking, sensing groups and crowds, sound processing, affective computing, machine learning, audio-visual sensing, and sensing people. Participants are expected to attend all three days. Accommodation near the event venue: [QMUL campus], [short stay], [hotels]. Where: David Sizer Lecture Theatre, Bancroft building (number 31 in [map]).
7 September

8:30 | Registration
9:15 | Welcome and opening
9:30–11:15 | Session chair: Maryam Abdollahyan
Wearable sensing

9:30 | Initial investigations into characterizing DIY e-textile stretch sensors | Sophie Skach
An evaluation of three electronic textile (e-textile) stretch sensors: two variations of fabric knit with a stainless steel and polyester yarn, and a knit fabric coated with a conductive polymer. Although these materials are accessible to designers and engineers, the properties of each sensor have not previously been formally analysed. We evaluate the sensors' performance as they are stretched and released.

Large scale mood and stress self-assessments on a smartwatch | Katrin Hansel
Modern sensing technology is becoming increasingly ubiquitous. We present an easy-to-use application that logs current emotional states on a widely used smartwatch and collects additional body-sensing data, building a basis for new algorithms, interventions and technology-supported therapy to promote emotional and mental well-being.

A long short-term memory convolutional neural network for first-person vision activity recognition | Girmaw Abebe Tadesse
We propose a motion representation that uses stacked temporal spectrograms and a long short-term memory (LSTM) network for the recognition of proprioceptive activities in first-person vision (FPV). Experimental results show that the proposed approach achieves state-of-the-art performance on the largest public dataset for FPV activity recognition.
Object tracking

10:15 | Active visual tracking in multi-agent scenarios | Yiming Wang
We propose an active visual tracker with collision avoidance for camera-equipped robots in dense multi-agent scenarios. The objective of each tracking agent (robot) is to maintain visual fixation on its moving target while updating its velocity to avoid other agents. We address the problem of robots that lack accessible collision-free paths while keeping the target centred in the field of view and at a certain size.

Online multi-target tracking with strong and weak detections | Ricardo Sanchez-Matilla
An online multi-target tracker that exploits both high- and low-confidence target detections in a Probability Hypothesis Density Particle Filter framework. Results show that our method outperforms alternative online trackers on the Multiple Object Tracking 2016 and 2015 benchmark datasets in terms of tracking accuracy, false negatives and speed.
Sensing groups and crowds

10:45 | Generic to specific recognition models for membership analysis in group videos | Wenxuan Mou
We present an automatic analysis of group membership, i.e. recognising which group a given individual belongs to, based on a specific recognition model. The model is implemented as a novel two-phase Support Vector Machine (SVM) trained using an optimised generic recognition model.

Crowd analysis using visual and non-visual sensors: a survey | Muhammad Irfan
A critical survey of crowd analysis techniques using visual and non-visual sensors. The survey identifies different approaches as well as relevant work on crowd analysis, covering the crowd phenomenon and its dynamics from social and psychological aspects to computational perspectives.
11:15 | Coffee break

11:30–12:45 | Session chair: Yiming Wang
Sound processing

11:30 | A study on LSTM networks for polyphonic music sequence modelling | Adrien Ycart
We empirically investigate the predictive power of simple long short-term memory (LSTM) networks for polyphonic MIDI sequences. Such networks can then be used as a music language model which, combined with an acoustic model, can improve automatic music transcription (AMT) performance. Results are compared in terms of note prediction accuracy. (A generic sketch of such a model follows this session block.)

Efficient learning of harmonic priors for pitch detection in polyphonic music | Pablo Alvarado Duran
We study whether introducing physically inspired Gaussian process (GP) priors into audio content analysis models improves the extraction of patterns required for automatic music transcription (AMT). We demonstrate that the key to improving pitch detection is to learn priors that fit the frequency content of the sound events to be detected.
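To make the LSTM sequence-modelling idea above concrete, here is a minimal sketch of a next-frame predictor over binary piano-roll frames. It is a generic illustration, not the speakers' actual models: the layer sizes and the random `piano_rolls` tensor are assumptions for demonstration only.

```python
# Minimal sketch of an LSTM music language model: given a binary
# piano-roll frame (88 pitches), predict which pitches are active
# in the next frame. Generic illustration, not the speakers' models.
import torch
import torch.nn as nn

class PianoRollLSTM(nn.Module):
    def __init__(self, n_pitches=88, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(n_pitches, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_pitches)

    def forward(self, x):              # x: (batch, time, n_pitches)
        h, _ = self.lstm(x)
        return self.out(h)             # logits for the next frame

model = PianoRollLSTM()
criterion = nn.BCEWithLogitsLoss()     # multi-label: several pitches per frame
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# `piano_rolls` is a hypothetical (batch, time, 88) 0/1 tensor.
piano_rolls = torch.randint(0, 2, (8, 100, 88)).float()
inputs, targets = piano_rolls[:, :-1], piano_rolls[:, 1:]

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()
```

Trained this way, the network's per-pitch probabilities can play the role of a language model alongside an acoustic model's frame-level pitch estimates.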
Affective computing

12:00 | Effects of valence and arousal on working memory performance in virtual reality gaming | Daniel Gabana Arellano
This work explores how working memory (WM) performance is affected when playing a Virtual Reality (VR) game, and the effects of valence and arousal in this context. We also discuss the application of machine learning to detect affective states from the player's hand and head motion.
Machine learning

12:15 | Class rectification hard mining for imbalanced deep learning | Qi Dong
Recognising detailed facial or clothing attributes in images of people is a challenging task for computer vision, especially when the training data are both very large in scale and extremely imbalanced across attribute classes. To address this problem, we formulate a novel scheme for batch incremental hard sample mining of minority attribute classes from imbalanced large-scale training data.

L1 graph based sparse model for label de-noising | Xiaobin Chang
We propose a novel robust graph-based approach to label de-noising that (i) smooths labels via a visual similarity graph and (ii) explicitly models the label noise pattern. An efficient algorithm is formulated to optimise the proposed model, whose objective function contains multiple robust $L_1$ terms and is thus non-trivial to optimise.
Audio-visual sensing

14:00 | Audio-visual multi-speaker tracking with PHD filtering | Wenwu Wang (invited speaker)
Detection and tracking of multiple moving speakers in indoor environments is often required in applications such as automatic camera steering in video conferencing, individual speaker discrimination in multi-speaker environments, and surveillance and monitoring for security. In this lecture, we present recent developments in multi-speaker tracking with audio-visual information under the Bayesian framework. In particular, we present adaptive particle filtering, PHD filtering and sparse-sampling-based PHD filtering algorithms, and provide demos to show the performance of these tracking algorithms. (A toy illustration of the underlying particle-filter recursion follows this entry.)
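To give a flavour of the Bayesian tracking recursion underlying the lecture, here is a minimal bootstrap particle filter for a single target in one dimension. It is a toy sketch under assumed constant-velocity dynamics and Gaussian measurement noise, not the lecture's algorithms; PHD filtering extends this predict–update–resample loop to an unknown, time-varying number of targets.

```python
# Minimal bootstrap particle filter for a single 1-D target with a
# constant-velocity model. Illustrative only: a PHD filter generalises
# this recursion to an unknown, varying number of targets.
import numpy as np

rng = np.random.default_rng(0)
N = 1000                                   # number of particles
particles = rng.normal(0.0, 1.0, (N, 2))   # columns: [position, velocity]
weights = np.full(N, 1.0 / N)

def step(particles, weights, z, dt=0.1, q=0.05, r=0.2):
    # Predict: propagate each particle through the motion model.
    particles[:, 0] += dt * particles[:, 1]
    particles[:, 1] += rng.normal(0.0, q, N)
    # Update: reweight by the Gaussian likelihood of measurement z.
    weights *= np.exp(-0.5 * ((z - particles[:, 0]) / r) ** 2)
    weights /= weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < N / 2:
        idx = rng.choice(N, N, p=weights)
        particles, weights = particles[idx], np.full(N, 1.0 / N)
    return particles, weights

for t in range(50):                        # synthetic measurements of a
    z = 0.5 * 0.1 * t + rng.normal(0, 0.2) # target moving at 0.5 units/s
    particles, weights = step(particles, weights, z)
    estimate = np.average(particles[:, 0], weights=weights)
```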
15:00 | Coffee break
15:30 | Unsupervised cross-modal adaptation for audio-visual target identification with wearable cameras | Alessio Brutti (invited speaker)
The increasing availability of body-worn cameras is facilitating applications such as life-logging and activity detection. In particular, recognising objects or the identity of people from egocentric data is an important capability. Model adaptation is fundamental for wearable devices because of limited training material and rapidly varying operational conditions and target appearances. This talk discusses the specific issues of audio-visual target identification with wearable cameras and presents an approach to adapting models in an unsupervised, online way: each mono-modal model is adapted using the unsupervised labelling provided by the other modality, leveraging the complementarity of the information available in the audio and visual streams.
16:30 | CIS demos
8 September

9:15 | Introduction to the day
Mobile sensing

9:30 | Spatial perception for mobile robots | Stefan Leutenegger (invited speaker)
Mobile robots need dedicated sensing and processing for localisation and mapping as well as scene understanding. Recent years have brought tremendous advances in vision sensors (e.g. RGB-D cameras) and processing power (e.g. GPUs), which have led us to design new algorithms that will empower the next generation of mobile robots. With the arrival of deep learning, we are now also in a position to link its unprecedented performance in scene understanding with 3D mapping. In this talk, I will go through some recent algorithms and software we have developed, as well as their application to mobile robots, including drones.
10:30 | Coffee break
11:00 | Mobile sensing for human behaviour monitoring and mobile health: challenges and applications | Cecilia Mascolo (invited speaker)
With the advent of powerful and inexpensive sensing technology, studying human behaviour and activity at large scale and over long periods is becoming a firm reality. Wearables and mobile devices further allow continuous monitoring at unprecedented granularity. This reality generates new challenges but also opens the door to potentially innovative ways of understanding our daily lives. The range of devices and apps released in recent years for both medical and general fitness use has fuelled user interest in tracking activity with increased accuracy: this has not only revealed the potential of the domain but also highlighted its challenges and limitations. In this talk I will discuss our experience with large mobile sensor deployments and analytics in the areas of health and well-being, covering challenges and opportunities at the system, data analytics and inference levels, as well as potential future directions.
Sensing people

14:00 | Deep learning for unconstrained face analysis | Georgios Tzimiropoulos (invited speaker)
Recently, methods based on deep learning have been shown to produce remarkable performance on a variety of difficult computer vision tasks, including recognition, detection and semantic segmentation, outperforming prior work by a large margin. A key feature of these approaches is that they integrate non-linear hierarchical feature extraction with the classification or regression task at hand, while also capitalising on the very large datasets that are now readily available. In this talk, I will review the most recent deep learning methods for a number of important face analysis tasks, including face detection, 2D and 3D facial landmark localisation (i.e. face alignment), facial part segmentation, 3D face reconstruction, and face recognition, and show how these methods have significantly advanced the state of the art on the most challenging face datasets to date.
15:00 | Introduction to the challenge: Multi-modal analysis of body-sensor data
Hands-on activity: participants will be divided into groups and compete to solve an assigned challenge within a limited time span. Solutions will be presented in front of a judging panel that will vote for the best groups.
15:30 | *Challenge starts*
9 September

13:00 | *Challenge submission deadline*
14:00 | Groups present the challenge results in front of a judging panel
16:00 | Awards and closure
Logistics: Muhammad Irfan
Filming: Ricardo Sanchez-Matilla
Sponsor: Shahnawaz Ahmed