2019 Intelligent Sensing Summer School
A five-day event on advanced intelligent sensing and AI topics, including computer vision, machine listening, natural language processing and tactile sensing.
Attendees will learn the most recent methodologies and applications for each of the themes.
Posters showcasing projects at the Centre for Intelligent Sensing will also be presented during the school. A hands-on activity (the CORSMAL Challenge) will see participants divided into groups that compete to solve an assigned task within a limited time span. Solutions will be presented in front of a judging panel that will vote for the best groups.
The target audience is researchers from industry, postdocs, and MSc & PhD students. QMUL PhD students will receive Skills Points for participating, presenting or helping.
Registration: now closed! Send an [email] for late registrations.
Registration for QMUL students and staff (free but mandatory): send an email to [email] stating (i) your full name, (ii) your supervisor or line manager and (iii) three keywords defining your research interests.
Where: FB 1.13a, Bancroft building (number 31 in [map]).
Accommodation near the event location: [QMUL campus], [short stay], [hotels].
For any queries: [email]
Programme at a glance
Monday | Afternoon: Welcome and opening; Vision Sensing & AI (video enhancement, image categorisation and segmentation, 3D sensing)
Tuesday | Morning: Tactile Sensing & AI (vision and tactile sensing; force, pressure and touch; in-hand manipulation; self-calibration) | Afternoon: Natural Language Processing & AI (conversational agents, news filtering, gamifying crowdsourcing, multimodal data)
Wednesday | Morning: The CORSMAL Challenge (working in groups) | Afternoon: Sound Sensing & AI (sound event detection, situational awareness, speaker localization, speech recognition)
Thursday | Morning and afternoon: The CORSMAL Challenge (working in groups)
Friday | Morning: The CORSMAL Challenge (working in groups) | Afternoon: The CORSMAL Challenge (presentation of the results); Closing and awards
Detailed programme
2 September
13:00-14:00 | Registration
14:00-14:10 | Welcome to CIS and Opening of the Summer School
Vision Sensing & AI
14:10-14:20 | Opening by the session chair | Qianni Zhang
14:20-15:00 | Use of machine learning in video enhancement | Marta Mrak (BBC)
The BBC is well-known for its stunning visual content and rich archives. We are researching how signal processing and machine learning techniques can add even more value to that content: from enriching both the pixels and the semantics of existing videos, to delivering them more efficiently to our audiences. This talk will cover recent trends and our contributions to the topic, including the design of algorithms for reducing video encoder complexity, colourisation of visual data (still images for now), interpretability of deep neural networks, and super-resolution.
15:00-15:30 | Coffee break
15:30-15:40 | Unsupervised deep learning by neighbourhood discovery | Jiabo Huang
Deep convolutional neural networks (CNNs) have demonstrated remarkable success in computer vision by learning strong visual feature representations under supervision. In this talk, we introduce a generic unsupervised deep learning approach for training deep models without any manual label supervision. Specifically, we progressively discover sample-anchored (centred) neighbourhoods to reason about and learn the underlying class decision boundaries iteratively and accumulatively. Experiments on image classification show the performance advantages of the proposed method over state-of-the-art unsupervised learning models on six benchmarks covering both coarse-grained and fine-grained object image categorisation.
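As a minimal illustration of the neighbourhood-discovery idea (a sketch under simplifying assumptions, not the presented implementation), the snippet below treats each sample's k nearest neighbours in feature space as a pseudo-positive set and pulls the sample towards them:

```python
# Sketch: one round of neighbourhood-based pseudo-supervision for
# unsupervised representation learning (illustrative only).
import torch
import torch.nn.functional as F

def neighbourhood_loss(features, k=5, temperature=0.1):
    """features: (N, D) embeddings produced by any CNN backbone for one batch."""
    feats = F.normalize(features, dim=1)
    sim = feats @ feats.t()                                    # cosine similarities, (N, N)
    mask = torch.eye(len(feats), dtype=torch.bool, device=feats.device)
    sim = sim.masked_fill(mask, float("-inf"))                 # exclude self-similarity
    neighbours = sim.topk(k, dim=1).indices                    # discovered neighbourhood per sample
    log_prob = F.log_softmax(sim / temperature, dim=1)
    return -log_prob.gather(1, neighbours).mean()              # pull each sample towards its neighbours

# usage: loss = neighbourhood_loss(backbone(images)); loss.backward()
```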
15:40-15:50 | 3D sensing and analysis for high-quality point clouds | Andrej Satnik
Multi-view 3D reconstruction techniques enable the digital reconstruction of 3D objects from the real world by fusing different viewpoints of the same object into a single 3D representation. This process is by no means trivial, and the acquisition of high-quality point cloud representations of dynamic scenes is still an open problem. Addressing the increasing demand for real-time reconstruction, this work proposes a low-cost 3D studio environment that enables photo-realistic reconstruction of human avatars while eliminating the background. The proposed approach combines several inpainting and filtering methods, which search the local neighbourhood and share depth data between adjacent sensors, to create a single point cloud representation in real time by fusing data from multiple RGB-D sensors.
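To make the fusion step concrete, here is a small sketch (assuming an idealised pinhole camera model and known calibration, not the speaker's pipeline) that back-projects depth maps from several RGB-D sensors and merges them into one world-frame point cloud:

```python
# Sketch: naive fusion of calibrated RGB-D depth maps into a single point cloud.
import numpy as np

def depth_to_points(depth, K, T_world_cam):
    """depth: (H, W) in metres; K: 3x3 intrinsics; T_world_cam: 4x4 extrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    valid = z > 0                                          # keep pixels with a depth reading
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]                # back-project along x
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]                # back-project along y
    pts_cam = np.stack([x, y, z, np.ones_like(z)])[:, valid]
    return (T_world_cam @ pts_cam)[:3].T                   # 3D points in the world frame

def fuse(depth_maps, intrinsics, extrinsics):
    clouds = [depth_to_points(d, K, T) for d, K, T in zip(depth_maps, intrinsics, extrinsics)]
    return np.concatenate(clouds, axis=0)                  # concatenate all sensors' points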
15:50-16:00 | AI for digital pathology image analysis | Zhaoyang Xu
Histopathology imaging is a type of microscopy imaging commonly used for the micro-level clinical examination of a patient's pathology. Due to the extremely large size of histopathology images, especially whole slide images (WSIs), it is difficult for pathologists to make a quantitative assessment by inspecting the details of a WSI. Hence, a computer-aided system is necessary to provide an objective and consistent assessment of the WSI for personalised treatment decisions. In this presentation, deep learning frameworks for the automatic analysis of whole slide histopathology images are presented for the first time, aiming to address the challenging task of assessing and grading colorectal liver metastasis (CRLM). Quantitative evaluations of a patient's condition with CRLM are conducted by quantifying the different tissue components in resected tumorous specimens. This study mimics the visual examination process of human experts by focusing on three levels of information - the tissue level, cell level and pixel level - to achieve the step-by-step segmentation of histopathology images.
16:00-16:30 | Presentation of the CORSMAL Challenge | Ricardo Sanchez-Matilla
16:30-17:00 | Self-presentation of participants to the summer school and presentation of the list of [posters]
3 September
9:00-9:30 | Registration
Tactile Sensing & AI
9:30-9:40 | Welcome and opening by the session chair | Lorenzo Jamone
9:40-10:10 | Multimodal and cross-modal robotic perception with vision and tactile sensing | Shan Luo (University of Liverpool)
Future robots, as embodied agents, should make the best use of all available sensing modalities to interact with the environment. This talk will introduce research on combining vision and touch sensing, from the perspective of how touch sensing complements vision to achieve better robot perception.
10:10-11:00 | Applications of tactile sensing in industry | Rich Walker (Shadow Robot)
Giving robots a sense of touch is vital to making them perform tasks that currently only humans can do. In this talk, Rich will explain current challenges and opportunities - what works and what doesn't - and give some insights into future research challenges.
11:00-11:30 | Coffee break with interactive demos
11:30-12:00 | One sensor to measure two modalities: force information and tactile information | Wanlin Li
In this research, we present a novel design for an elastomer-based tactile and force sensing device that senses both types of information within one elastomer. The proposed sensor has a soft and compliant design employing an opaque elastomer. An optical sensing method is used to measure both types of information simultaneously, based on the deformation of the reflective elastomer structure and a flexure structure.
Learning robotic in-hand manipulation tasks from demonstration | Gokhan Solak
In-hand manipulation requires handling two problems simultaneously: controlling the object trajectory and keeping the object in a stable grasp. Multiple fingers should move in coordination while keeping robust contact with the object. We combine learning from demonstration and the virtual-spring framework to address both of these problems, and use tactile force sensing to adapt the grasp forces in reaction to the trajectory-control forces.
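As a minimal sketch of the virtual-spring idea (illustrative only, with assumed gains, not the presented controller), each fingertip can be commanded by a spring-damper law towards a moving reference point, so that coordinated references move the object while the spring stiffness maintains the grasp:

```python
# Sketch: virtual-spring force command for one fingertip.
import numpy as np

def virtual_spring_force(p_ref, p_tip, v_tip, k=200.0, d=10.0):
    """p_ref, p_tip, v_tip: 3D reference position, fingertip position and velocity."""
    return k * (p_ref - p_tip) - d * v_tip    # Cartesian force pulling the tip to the reference

# usage: tau = J.T @ virtual_spring_force(p_ref, p_tip, v_tip)
# with J the finger Jacobian, to obtain joint torques.
```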
Robot self-calibration from touch events | Rodrigo Zenha
Robots often rely on a numerical representation of their body in order to interact with the environment; notably, such a model needs to be continuously calibrated, for example through some form of parameter estimation, to cope with changes over time. We will present a novel online strategy that allows a robot to self-calibrate its model by touching planar surfaces in its environment. We achieve this using adaptive parameter estimation (an Extended Kalman Filter) that incorporates the planar constraints obtained at each contact detection. Testing this method on simulated and real-world robotic setups, we show that it significantly improves the accuracy of the robot model for future reaching and grasping tasks.
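The following is a generic sketch (not the presented system) of how a single touch event can update kinematic parameters with an Extended Kalman Filter: at contact the end-effector must lie on the known plane n·p = d, so the signed distance to the plane serves as a zero-valued measurement. The forward-kinematics function and noise values here are illustrative assumptions.

```python
# Sketch: EKF parameter update from one contact with a known plane.
import numpy as np

def touch_ekf_update(theta, P, fk, n, d, meas_var=1e-6, eps=1e-6):
    """theta: (K,) kinematic parameters; P: (K, K) covariance;
    fk(theta) -> (3,) end-effector position; plane defined by n·p = d."""
    residual = d - n @ fk(theta)                        # innovation: distance of the tip to the plane
    # numerical Jacobian of the plane distance w.r.t. the parameters
    H = np.array([(n @ fk(theta + eps * e) - n @ fk(theta)) / eps
                  for e in np.eye(len(theta))])
    S = H @ P @ H + meas_var                            # innovation variance (scalar)
    K_gain = P @ H / S                                  # Kalman gain, shape (K,)
    theta = theta + K_gain * residual
    P = P - np.outer(K_gain, H @ P)
    return theta, P
```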
Smart arse: textile pressure sensing in trousers | Sophie Skach
Textiles are a material we are very familiar with and that serves as an interface to the world. In the form of clothes, textiles follow our movements and can therefore be explored as an unobtrusive modality for body-centric computing. Here, we introduce sensing trousers that use embedded fabric pressure sensors to classify sitting postures and, furthermore, social behaviours.
12:00-12:30 | Robots with sense of touch | Perla Maiolino (University of Oxford)
Robots operating in dynamic and unstructured environments must exhibit advanced forms of interaction with objects and humans. A sense of touch can play a fundamental role in enhancing the perceptual, cognitive and operative capabilities of robots, specifically when they physically interact with objects and humans in the environment. Many solutions for designing, engineering and manufacturing tactile sensors have been presented, because the availability of appropriate sensing technologies is the first and necessary step; however, the effective use of the sense of touch in robots also depends on understanding the tactile perception mechanisms through which the robot builds an appropriate world model. The lecture will present the technological and research challenges involved in providing robots with a sense of touch.
13:00-13:50 | Registration
Natural Language Processing & AI
13:50-14:00 | Welcome and opening by the session chair | Massimo Poesio
14:00-14:45 | Conversational agents in games | Edward Minnett (Spirit AI)
Games present both a challenge and an opportunity for conversation modelling. Ed will speak about the combination of AI techniques and design strategies that allows designers to build AI characters that respond to a wide range of player input (including natural language, gestures and in-game actions) while staying on track to deliver the desired story and play experience.
14:45-15:30 | Applying AI in news filtering | Raymond Ng (Signal AI)
At Signal, we use AI to monitor and analyse news and media, focusing on reputation management and market intelligence analysis. Quantity, speed, customisation and user interaction are fundamental requirements. In this talk, we will explain how AI techniques can be applied to vast amounts of text data. We will look at particular examples of entity processing, which require deriving knowledge about arbitrary named entities from the data. We will introduce the state of the art in research, highlight how the appropriate use of data can help achieve robust performance to a commercial standard, and discuss some core challenges in applying AI at scale in an industrial setting.
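As a toy illustration of entity processing on news text (not Signal's pipeline), the sketch below extracts named entities with spaCy, assuming the library and its small English model are installed (pip install spacy; python -m spacy download en_core_web_sm):

```python
# Sketch: named entity recognition on a news sentence.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Queen Mary University of London hosted the summer school in September 2019.")

for ent in doc.ents:
    print(ent.text, ent.label_)   # entity span and its predicted type
```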
15:30-16:00 | Coffee break
16:00-16:40 | Hands-on session: deep learning for Natural Language Processing (NLP) | Juntao Yu
Deep learning plays an important role in state-of-the-art Natural Language Processing (NLP) applications and is now used in the most recent systems developed for NLP tasks. In this session, we will explore a neural machine translation system based on the sequence-to-sequence (seq2seq) model with an attention mechanism (Sutskever et al., 2014; Cho et al., 2014). The model explored in this session is similar to Google's Neural Machine Translation system (GNMT) (Wu et al., 2016) and can achieve performance competitive with GNMT when using larger and deeper networks.
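For reference, here is a minimal sketch of the attention step in a seq2seq decoder (Luong-style dot-product attention); it illustrates the mechanism covered in the session, not the session's actual notebook:

```python
# Sketch: dot-product attention over encoder states.
import torch
import torch.nn.functional as F

def attend(decoder_state, encoder_states):
    """decoder_state: (B, H); encoder_states: (B, T, H).
    Returns the context vector (B, H) and the attention weights (B, T)."""
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2)).squeeze(2)  # (B, T)
    weights = F.softmax(scores, dim=1)                                         # align over source positions
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)       # (B, H)
    return context, weights

# the decoder typically concatenates `context` with its hidden state
# before predicting the next target token.
```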
16:40-17:05 | Gamifying crowdsourcing | Chris Madge
Crowdsourcing was historically applied to simple labelling tasks, like picking out objects in pictures. However, the tasks we use supervised learning for, and require human computation for, are getting increasingly complex. Harnessing the power of the crowd now requires the design and integration of bespoke interfaces, automated pipelines, training, task assignment, aggregation and various other measures to turn non-experts into a workforce that can match expert workers. This talk will look at how we used gamification in crowdsourcing to address these problems when annotating candidate mentions for coreference resolution.
17:05-17:30 | Language and vision tasks: models and what they learn | Ravi Shekhar
In the literature, several tasks have been proposed to combine linguistic and visual information, and different models have been developed to solve them. These models implement the bottom-up processing of the "Hub and Spoke" architecture proposed in cognitive science to represent how the brain processes and combines multi-sensory inputs; in particular, the Hub is implemented as a neural network encoder. This talk will provide an overview of these tasks and models, and will show that the linguistic skills of the models differ dramatically despite comparable task success rates. The later part of the talk will focus on how to systematically investigate the effect of various vision-and-language tasks on the encoder.
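As a rough illustration of a "hub"-style multimodal encoder (an assumed minimal design, not one of the models discussed in the talk), image features and an encoded sentence can be projected into a shared space and fused into a single hidden representation:

```python
# Sketch: fuse image features and text into one "hub" representation.
import torch
import torch.nn as nn

class HubEncoder(nn.Module):
    def __init__(self, img_dim=2048, vocab_size=10000, emb_dim=300, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.text_rnn = nn.GRU(emb_dim, hidden, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hidden)
        self.fuse = nn.Linear(2 * hidden, hidden)        # the shared "hub"

    def forward(self, img_feats, token_ids):
        _, h = self.text_rnn(self.embed(token_ids))      # final text state, (1, B, hidden)
        text = h.squeeze(0)
        img = torch.relu(self.img_proj(img_feats))       # projected image features, (B, hidden)
        return torch.tanh(self.fuse(torch.cat([img, text], dim=1)))
```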
4 September
The CORSMAL Challenge
9:00-9:30 | Start of the CORSMAL Challenge
10:00-11:30 | Working in groups
12:00-12:50 | Registration
Sound Sensing & AI
12:50-13:00 | Welcome and opening by the session chair | Lin Wang
13:00-13:45 | The challenges and benefits of sound sensing | Sacha Krstulovic (Audio Analytic)
In a context where advances in AI have successfully turned the intelligent sensing of images, music, speech, health and people's identity into practical and commercial realities, sound sensing in a wider sense than speech and music has only recently started to break through as a missing piece of the perceptual AI puzzle. This talk will illustrate the various aspects of the journey taken by Audio Analytic to develop pioneering sound event detection technology from scratch and turn it into commercial products yielding benefits to millions of customers. Topics will cover the research challenges, data challenges and privacy questions which underlie the intelligent sensing of acoustic scenes and events.
13:45-14:30 | Localize, track, and interact: machine listening for AI | Christine Evers (Imperial College London)
Audio signals encapsulate vital information required for autonomous agents to make sense of their surrounding environment. This talk focuses on state-of-the-art approaches in machine listening that equip autonomous agents with situational awareness. The first part of the talk will provide an overview of existing approaches for the localization and tracking of sound sources. The second part will focus on practical insights gained from the recent LOCATA Challenge. In the third part, we will explore current and future directions, such as the self-localization of moving microphone arrays using acoustic SLAM, and the fusion of data from acoustic sensor networks in smart environments.
14:30-14:45 | Speaker localization and tracking using multi-modal signals | Xinyuan Qian
This talk focuses on exploiting the complementarity of the audio and video modalities to accurately estimate the trajectories of targets under challenging scenarios, such as partial occlusions and environmental noise. We propose the AV3T algorithm, which estimates the 3D mouth position from face detections and models the likelihood in the camera's spherical coordinates based on the uncertainties derived from the image-to-3D projection. Moreover, AV3T uses video to indicate the most likely speaker-height plane for the acoustic map computation and, during misdetections, switches to a generative model based on colour spatiograms. We will also present a newly collected audio-visual dataset with annotations.
14:45-15:15 | Coffee break
15:15-16:00 | Distant microphone speech recognition in everyday environments: from CHiME-5 to CHiME-6 | Jon Barker (University of Sheffield)
The CHiME challenge series aims to advance robust automatic speech recognition technology by promoting research at the interface of speech and language processing, signal processing and machine learning. This talk will present the outcomes of the 5th CHiME Challenge, which considered the task of distant multi-microphone conversational speech recognition in domestic environments. The talk will give an overview of the CHiME-5 dataset, a fully transcribed audio-video dataset capturing 50 hours of audio from 20 separate dinner parties held in real homes, each with 6 video channels and 32 audio channels, and will discuss the design of the lightweight recording setup that allowed highly natural data to be recorded. I will present an analysis of the data, highlighting the major sources of difficulty it presents for recognition systems. The talk will summarise the outcomes of the challenge itself and the recent advances that now represent the state of the art, and will conclude by discussing future directions and introducing the CHiME-6 challenge that is due to launch later this year.
16:00-16:35 | Embedded sound processing with Bela | Andrew McPherson
This talk will present Bela (http://bela.io), an embedded computing platform for creating ultra-low-latency interactive audio systems. Bela is based on the BeagleBone Black, a 1GHz ARM single-board computer. It combines the performance of the Xenomai real-time Linux environment, flexible connectivity to a wide variety of sensors and actuators, and an easy-to-use browser-based development environment. Bela is a fully open-source platform for makers, musicians and researchers to create highly responsive interactive systems.
16:35-16:50 | Explainable Machine Learning and its applications to Machine Listening | Saumitra Mishra
Explainable Machine Learning (EML) algorithms aim to make Deep Neural Networks (DNNs) transparent through post-hoc analysis. This talk will introduce two key categories of EML algorithms: those that explain a model and those that explain individual model predictions. We will cover recent advances in understanding DNNs and highlight some of the key research challenges that EML methods face. We will conclude with a demonstration of our recent work on explaining machine listening models that classify audio.
16:50-17:30 | Panel discussion: Christine Evers, Sacha Krstulovic and Jon Barker. Moderator: Lin Wang
5 September
The CORSMAL Challenge
9:00-17:30 | Working in groups
6 September
The CORSMAL Challenge
9:00-13:00 | Working in groups
13:00 | Submission of the CORSMAL Challenge results
14:00-15:30 | Presentation of the results in front of a panel
15:30 | Closing and awards
Posters
Adapting the quality of experience framework for audio archive evaluation [pdf] A. Ragano, E. Benetos, A. Hines
Analysing the predictions of a CNN-based replay spoofing detection system [pdf] B. Chettri, S. Mishra, B. L. Sturm, E. Benetos
An elastomer-based flexible optical force and tactile sensor [pdf] W. Li, J. Konstantinova, Y. Noh, Z. Ma, A. Alomainy, K. Althoefer
Background light estimation for depth-dependent underwater image restoration [pdf] C.Y. Li, A. Cavallaro
Distributed one-class learning [pdf] A.S. Shamsabadi, H. Haddadi, A. Cavallaro
Effect of textile properties on a low-profile wearable loop antenna for healthcare applications [pdf] I.I. Labiano, A. Alomainy, M.M. Bait-Suwailam
End-to-end probabilistic inference for nonstationary audio analysis [pdf] W. Wilkinson, M.R. Andersen, J.D. Reiss, D. Stowell, A. Solin
Knowledge distillation by on-the-fly native ensemble [pdf] X. Lan, X. Zhu, S. Gong
Learning action representations for self-supervised visual exploration [pdf] C. Oh, A. Cavallaro
MORB: A multi-scale binary descriptor [pdf] A. Xompero, O. Lanz, A. Cavallaro
Multiview 3D sensing and analysis for high quality point cloud capturing and model generation [pdf] A. Satnik, E. Izquierdo
Real-time quality assessment of videos from body-worn cameras [pdf] Y.Y. Chang, R. Mazzon, A. Cavallaro
Region based user-generated human body scan registration [pdf] Z. Xu, Q. Zhang
Scene privacy protection [pdf] C.Y. Li, A.S. Shamsabadi, R. Sanchez-Matilla, R. Mazzon, A. Cavallaro
Self-referenced deep learning [pdf] X. Lan, X. Zhu, S. Gong
Sound-based transportation mode recognition with smartphones [pdf] L. Wang, D. Roggen
Sparse Gaussian process audio source separation using spectrum priors in the time-domain [pdf] P.A. Alvarado, M.A. Alvarez, D. Stowell
SubSpectralNet - using sub-spectrogram based Convolutional Neural Networks for acoustic scene classification [pdf] S.S.R. Phaye, E. Benetos, Y. Wang
Tracking a moving sound source from a multi-rotor drone [pdf] L. Wang, R. Sanchez-Matilla, A. Cavallaro
Unifying probabilistic models for time-frequency analysis [pdf] W. Wilkinson, M.R. Andersen, J.D. Reiss, D. Stowell, A. Solin
Visual localization in the presence of appearance changes using the partial order kernel [pdf] M. Abdollahyan, S. Cascianelli, E. Bellocchio, G. Costante, T.A. Ciarfuglia, F. Bianconi, F. Smeraldi, M.L. Fravolini
Logistics | Filming
Muhammad Farrukh S.
Ashish Alex
Vandana Rajan
Xinyuan Qian
Chau Yi Li
Ali Shahin Shamsabadi
Sponsors