2019 Intelligent Sensing Summer School
A five-day event on advanced intelligent sensing and AI topics, including computer vision, machine listening, natural language processing and tactile sensing.
Attendees will learn the most recent methodologies and applications for each of the themes.
Posters showcasing projects at the Centre for Intelligent Sensing will also be presented during the school. A hands-on activity (the CORSMAL Challenge) will see participants divided into groups that compete to solve an assigned task within a limited time span. Solutions will be presented in front of a judging panel that will vote for the best groups.
The target audience is researchers from industry, postdocs, and MSc & PhD students. QMUL PhD students will receive Skills Points for participating, presenting or helping.
Registration: now closed! Send an [email] for late registrations.
Registration for QMUL students and staff (free but mandatory): send an email to [email] stating (i) your full name, (ii) your supervisor or line manager and (iii) three keywords defining your research interests.
Where: FB 1.13a, Bancroft building (number 31 in [map]).
Accommodation near the event location: [QMUL campus], [short stay], [hotels].
For any queries: [email]
Programme at a glance
Monday | Afternoon: Welcome and opening; Vision Sensing & AI (video enhancement, image categorisation and segmentation, 3D sensing)
Tuesday | Morning: Tactile Sensing & AI (vision and tactile sensing; force, pressure and touch; in-hand manipulation; self-calibration) | Afternoon: Natural Language Processing & AI (conversational agents, news filtering, gamifying crowdsourcing, multimodal data)
Wednesday | Morning: The CORSMAL Challenge (working in groups) | Afternoon: Sound Sensing & AI (sound event detection, situational awareness, speaker localization, speech recognition)
Thursday | Morning and afternoon: The CORSMAL Challenge (working in groups)
Friday | Morning: The CORSMAL Challenge (working in groups) | Afternoon: The CORSMAL Challenge (presentation of the results); Closing and awards
Detailed programme
2 September
13:00-14:00 | Registration
14:00-14:10 | Welcome to CIS and Opening of the Summer School
Vision Sensing & AI
14:10-14:20 | Opening by the session chair | Qianni Zhang
14:20-15:00 | Use of machine learning in video enhancement | Marta Mrak (BBC)
The BBC is well-known for its stunning visual content and rich archives. We are researching how signal processing and machine learning techniques can add even more value to that content: from enriching both the pixels and the semantics of existing videos, to delivering them more efficiently to our audiences. This talk will cover recent trends and our contributions to the topic, including the design of algorithms for reducing video encoder complexity, colourisation of visual data (still images for now), interpretability of deep neural networks, and super-resolution.
15:00-15:30 | Coffee break
15:30-15:40 | Unsupervised deep learning by neighbourhood discovery | Jiabo Huang
Deep convolutional neural networks (CNNs) have demonstrated remarkable success in computer vision by learning strong visual feature representations under supervision. In this talk, we introduce a generic unsupervised deep learning approach for training deep models without any manual label supervision. Specifically, we progressively discover sample-anchored (centred) neighbourhoods to reason about and learn the underlying class decision boundaries iteratively and accumulatively. Experiments on image classification show the performance advantages of the proposed method over state-of-the-art unsupervised learning models on six benchmarks covering both coarse-grained and fine-grained object image categorisation.
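As a minimal illustration of the neighbourhood-discovery idea (a sketch under simplifying assumptions, not the presented implementation), the snippet below treats each sample's k nearest neighbours in feature space as a pseudo-positive set and pulls the sample towards them:

```python
# Sketch: one round of neighbourhood-based pseudo-supervision for
# unsupervised representation learning (illustrative only).
import torch
import torch.nn.functional as F

def neighbourhood_loss(features, k=5, temperature=0.1):
    """features: (N, D) embeddings produced by any CNN backbone for one batch."""
    feats = F.normalize(features, dim=1)
    sim = feats @ feats.t()                                    # cosine similarities, (N, N)
    mask = torch.eye(len(feats), dtype=torch.bool, device=feats.device)
    sim = sim.masked_fill(mask, float("-inf"))                 # exclude self-similarity
    neighbours = sim.topk(k, dim=1).indices                    # discovered neighbourhood per sample
    log_prob = F.log_softmax(sim / temperature, dim=1)
    return -log_prob.gather(1, neighbours).mean()              # pull each sample towards its neighbours

# usage: loss = neighbourhood_loss(backbone(images)); loss.backward()
```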
15:40-15:50 | 3D sensing and analysis for high-quality point clouds | Andrej Satnik
Multi-view 3D reconstruction techniques enable the digital reconstruction of 3D objects from the real world by fusing different viewpoints of the same object into a single 3D representation. This process is by no means trivial, and the acquisition of high-quality point cloud representations of dynamic scenes is still an open problem. Addressing the increasing demand for real-time reconstruction, this work proposes a low-cost 3D studio environment that enables photo-realistic reconstruction of human avatars while eliminating the background. The proposed approach combines several inpainting and filtering methods, which search the local neighbourhood and share depth data between adjacent sensors, to create a single point cloud representation in real time by fusing data from multiple RGB-D sensors.
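To make the fusion step concrete, here is a small sketch (assuming an idealised pinhole camera model and known calibration, not the speaker's pipeline) that back-projects depth maps from several RGB-D sensors and merges them into one world-frame point cloud:

```python
# Sketch: naive fusion of calibrated RGB-D depth maps into a single point cloud.
import numpy as np

def depth_to_points(depth, K, T_world_cam):
    """depth: (H, W) in metres; K: 3x3 intrinsics; T_world_cam: 4x4 extrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    valid = z > 0                                          # keep pixels with a depth reading
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]                # back-project along x
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]                # back-project along y
    pts_cam = np.stack([x, y, z, np.ones_like(z)])[:, valid]
    return (T_world_cam @ pts_cam)[:3].T                   # 3D points in the world frame

def fuse(depth_maps, intrinsics, extrinsics):
    clouds = [depth_to_points(d, K, T) for d, K, T in zip(depth_maps, intrinsics, extrinsics)]
    return np.concatenate(clouds, axis=0)                  # concatenate all sensors' points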
15:50-16:00 | AI for digital pathology image analysis | Zhaoyang Xu
Histopathology imaging is a type of microscopy imaging commonly used for the micro-level clinical examination of a patient's pathology. Due to the extremely large size of histopathology images, especially whole slide images (WSIs), it is difficult for pathologists to make a quantitative assessment by inspecting the details of a WSI. Hence, a computer-aided system is necessary to provide an objective and consistent assessment of the WSI for personalised treatment decisions. In this presentation, deep learning frameworks for the automatic analysis of whole slide histopathology images are presented for the first time, aiming to address the challenging task of assessing and grading colorectal liver metastasis (CRLM). Quantitative evaluations of a patient's condition with CRLM are conducted by quantifying the different tissue components in resected tumorous specimens. This study mimics the visual examination process of human experts by focusing on three levels of information - the tissue level, cell level and pixel level - to achieve the step-by-step segmentation of histopathology images.
16:00-16:30 | Presentation of the CORSMAL Challenge | Ricardo Sanchez-Matilla
16:30-17:00 | Self-presentation of participants to the summer school and presentation of the list of [posters]
3 September
9:00-9:30 | Registration
Tactile Sensing & AI
9:30-9:40 | Welcome and opening by the session chair | Lorenzo Jamone
9:40-10:10 | Multimodal and cross-modal robotic perception with vision and tactile sensing | Shan Luo (University of Liverpool)
Future robots, as embodied agents, should make the best use of all available sensing modalities to interact with the environment. This talk will introduce research on combining vision and touch sensing, from the perspective of how touch sensing complements vision to achieve better robot perception.
10:10-11:00 | Applications of tactile sensing in industry | Rich Walker (Shadow Robot)
Giving robots a sense of touch is vital to making them perform tasks that currently only humans can do. In this talk, Rich will explain current challenges and opportunities - what works and what doesn't - and give some insights into future research challenges.
11:00-11:30 | Coffee break with interactive demos
11:30-12:00 | One sensor to measure two modalities: force information and tactile information | Wanlin Li
In this research, we present a novel design for an elastomer-based tactile and force sensing device that senses both types of information within one elastomer. The proposed sensor has a soft and compliant design employing an opaque elastomer. An optical sensing method is used to measure both types of information simultaneously, based on the deformation of the reflective elastomer structure and a flexure structure.
Learning robotic in-hand manipulation tasks from demonstration | Gokhan Solak
In-hand manipulation requires handling two problems simultaneously: controlling the object trajectory and keeping the object in a stable grasp. Multiple fingers should move in coordination while keeping robust contact with the object. We combine learning from demonstration and the virtual-spring framework to address both of these problems, and use tactile force sensing to adapt the grasp forces in reaction to the trajectory-control forces.
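As a minimal sketch of the virtual-spring idea (illustrative only, with assumed gains, not the presented controller), each fingertip can be commanded by a spring-damper law towards a moving reference point, so that coordinated references move the object while the spring stiffness maintains the grasp:

```python
# Sketch: virtual-spring force command for one fingertip.
import numpy as np

def virtual_spring_force(p_ref, p_tip, v_tip, k=200.0, d=10.0):
    """p_ref, p_tip, v_tip: 3D reference position, fingertip position and velocity."""
    return k * (p_ref - p_tip) - d * v_tip    # Cartesian force pulling the tip to the reference

# usage: tau = J.T @ virtual_spring_force(p_ref, p_tip, v_tip)
# with J the finger Jacobian, to obtain joint torques.
```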
Robot self-calibration from touch events | Rodrigo Zenha
Robots often rely on a numerical representation of their body in order to interact with the environment; notably, such a model needs to be continuously calibrated, for example through some form of parameter estimation, to cope with changes over time. We will present a novel online strategy that allows a robot to self-calibrate its model by touching planar surfaces in its environment. We achieve this using adaptive parameter estimation (an Extended Kalman Filter) that incorporates the planar constraints obtained at each contact detection. Testing this method on simulated and real-world robotic setups, we show that it significantly improves the accuracy of the robot model for future reaching and grasping tasks.
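The following is a generic sketch (not the presented system) of how a single touch event can update kinematic parameters with an Extended Kalman Filter: at contact the end-effector must lie on the known plane n·p = d, so the signed distance to the plane serves as a zero-valued measurement. The forward-kinematics function and noise values here are illustrative assumptions.

```python
# Sketch: EKF parameter update from one contact with a known plane.
import numpy as np

def touch_ekf_update(theta, P, fk, n, d, meas_var=1e-6, eps=1e-6):
    """theta: (K,) kinematic parameters; P: (K, K) covariance;
    fk(theta) -> (3,) end-effector position; plane defined by n·p = d."""
    residual = d - n @ fk(theta)                        # innovation: distance of the tip to the plane
    # numerical Jacobian of the plane distance w.r.t. the parameters
    H = np.array([(n @ fk(theta + eps * e) - n @ fk(theta)) / eps
                  for e in np.eye(len(theta))])
    S = H @ P @ H + meas_var                            # innovation variance (scalar)
    K_gain = P @ H / S                                  # Kalman gain, shape (K,)
    theta = theta + K_gain * residual
    P = P - np.outer(K_gain, H @ P)
    return theta, P
```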
Smart arse: textile pressure sensing in trousers | Sophie Skach
Textiles are a material we are very familiar with and that serves as an interface to the world. In the form of clothes, textiles follow our movements and can therefore be explored as an unobtrusive modality for body-centric computing. Here, we introduce sensing trousers that use embedded fabric pressure sensors to classify sitting postures and, furthermore, social behaviours.
12:00-12:30 | Robots with sense of touch | Perla Maiolino (University of Oxford)
Robots operating in dynamic and unstructured environments must exhibit advanced forms of interaction with objects and humans. A sense of touch can play a fundamental role in enhancing the perceptual, cognitive and operative capabilities of robots, specifically when they physically interact with objects and humans in the environment. Many solutions for designing, engineering and manufacturing tactile sensors have been presented, because the availability of appropriate sensing technologies is the first and necessary step; however, the effective use of the sense of touch in robots also depends on understanding the tactile perception mechanisms through which the robot builds an appropriate world model. The lecture will present the technological and research challenges involved in providing robots with a sense of touch.
13:00-13:50 | Registration
Natural Language Processing & AI
13:50-14:00 | Welcome and opening by the session chair | Massimo Poesio
14:00-14:45 | Conversational agents in games | Edward Minnett (Spirit AI)
Games present both a challenge and an opportunity for conversation modelling. Ed will speak about the combination of AI techniques and design strategies that allows designers to build AI characters that respond to a wide range of player input (including natural language, gestures and in-game actions) while staying on track to deliver the desired story and play experience.
14:45-15:30 | Applying AI in news filtering | Raymond Ng (Signal AI)
At Signal, we use AI to monitor and analyse news and media, focusing on reputation management and market intelligence analysis. Quantity, speed, customisation and user interaction are fundamental requirements. In this talk, we will explain how AI techniques can be applied to vast amounts of text data. We will look at particular examples of entity processing, which require deriving knowledge about arbitrary named entities from the data. We will introduce the state of the art in research, highlight how the appropriate use of data can help achieve robust performance to a commercial standard, and discuss some core challenges in applying AI at scale in an industrial setting.
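As a toy illustration of entity processing on news text (not Signal's pipeline), the sketch below extracts named entities with spaCy, assuming the library and its small English model are installed (pip install spacy; python -m spacy download en_core_web_sm):

```python
# Sketch: named entity recognition on a news sentence.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Queen Mary University of London hosted the summer school in September 2019.")

for ent in doc.ents:
    print(ent.text, ent.label_)   # entity span and its predicted type
```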
15:30-16:00 | Coffee break
16:00-16:40 | Hands-on session: deep learning for Natural Language Processing (NLP) | Juntao Yu
Deep learning plays an important role in state-of-the-art Natural Language Processing (NLP) applications and is now used in the most recent systems developed for NLP tasks. In this session, we will explore a neural machine translation system based on the sequence-to-sequence (seq2seq) model with an attention mechanism (Sutskever et al., 2014; Cho et al., 2014). The model explored in this session is similar to Google's Neural Machine Translation system (GNMT) (Wu et al., 2016) and can achieve performance competitive with GNMT when using larger and deeper networks.
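For reference, here is a minimal sketch of the attention step in a seq2seq decoder (Luong-style dot-product attention); it illustrates the mechanism covered in the session, not the session's actual notebook:

```python
# Sketch: dot-product attention over encoder states.
import torch
import torch.nn.functional as F

def attend(decoder_state, encoder_states):
    """decoder_state: (B, H); encoder_states: (B, T, H).
    Returns the context vector (B, H) and the attention weights (B, T)."""
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2)).squeeze(2)  # (B, T)
    weights = F.softmax(scores, dim=1)                                         # align over source positions
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)       # (B, H)
    return context, weights

# the decoder typically concatenates `context` with its hidden state
# before predicting the next target token.
```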
16:40-17:05 | Gamifying crowdsourcing | Chris Madge
Crowdsourcing was historically applied to simple labelling tasks, like picking out objects in pictures. However, the tasks we use supervised learning for, and require human computation for, are getting increasingly complex. Harnessing the power of the crowd now requires the design and integration of bespoke interfaces, automated pipelines, training, task assignment, aggregation and various other measures to turn non-experts into a workforce that can match expert workers. This talk will look at how we used gamification in crowdsourcing to address these problems when annotating candidate mentions for coreference resolution.
17:05-17:30 | Language and vision tasks: models and what they learn | Ravi Shekhar
In the literature, several tasks have been proposed to combine linguistic and visual information, and different models have been developed to solve them. These models implement the bottom-up processing of the "Hub and Spoke" architecture proposed in cognitive science to represent how the brain processes and combines multi-sensory inputs; in particular, the Hub is implemented as a neural network encoder. This talk will provide an overview of these tasks and models, and will show that the linguistic skills of the models differ dramatically despite comparable task success rates. The later part of the talk will focus on how to systematically investigate the effect of various vision-and-language tasks on the encoder.
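As a rough illustration of a "hub"-style multimodal encoder (an assumed minimal design, not one of the models discussed in the talk), image features and an encoded sentence can be projected into a shared space and fused into a single hidden representation:

```python
# Sketch: fuse image features and text into one "hub" representation.
import torch
import torch.nn as nn

class HubEncoder(nn.Module):
    def __init__(self, img_dim=2048, vocab_size=10000, emb_dim=300, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.text_rnn = nn.GRU(emb_dim, hidden, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hidden)
        self.fuse = nn.Linear(2 * hidden, hidden)        # the shared "hub"

    def forward(self, img_feats, token_ids):
        _, h = self.text_rnn(self.embed(token_ids))      # final text state, (1, B, hidden)
        text = h.squeeze(0)
        img = torch.relu(self.img_proj(img_feats))       # projected image features, (B, hidden)
        return torch.tanh(self.fuse(torch.cat([img, text], dim=1)))
```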
4 September
The CORSMAL Challenge
9:00-9:30 | Start of the CORSMAL Challenge
10:00-11:30 | Working in groups
12:00-12:50 | Registration
Sound Sensing & AI
12:50-13:00 | Welcome and opening by the session chair | Lin Wang
13:00-13:45 | The challenges and benefits of sound sensing | Sacha Krstulovic (Audio Analytic)
In a context where advances in AI have successfully turned the intelligent sensing of images, music, speech, health and people's identity into practical and commercial realities, sound sensing in a wider sense than speech and music has only recently started to break through as a missing piece of the perceptual AI puzzle. This talk will illustrate the various aspects of the journey taken by Audio Analytic to develop pioneering sound event detection technology from scratch and turn it into commercial products yielding benefits to millions of customers. Topics will cover the research challenges, data challenges and privacy questions which underlie the intelligent sensing of acoustic scenes and events.
13:45-14:30 | Localize, track, and interact: machine listening for AI | Christine Evers (Imperial College London)
Audio signals encapsulate vital information required for autonomous agents to make sense of their surrounding environment. This talk focuses on state-of-the-art approaches in machine listening that equip autonomous agents with situational awareness. The first part of the talk will provide an overview of existing approaches for the localization and tracking of sound sources. The second part will focus on practical insights gained from the recent LOCATA Challenge. In the third part, we will explore current and future directions, such as the self-localization of moving microphone arrays using acoustic SLAM, and the fusion of data from acoustic sensor networks in smart environments.
14:30-14:45 | Speaker localization and tracking using multi-modal signals | Xinyuan Qian
This talk focuses on exploiting the complementarity of the audio and video modalities to accurately estimate the trajectories of targets under challenging scenarios, such as partial occlusions and environmental noise. We propose the AV3T algorithm, which estimates the 3D mouth position from face detections and models the likelihood in the camera's spherical coordinates based on the uncertainties derived from the image-to-3D projection. Moreover, AV3T uses video to indicate the most likely speaker-height plane for the acoustic map computation and, during misdetections, switches to a generative model based on colour spatiograms. We will also present a newly collected audio-visual dataset with annotations.
14:45-15:15 | Coffee break
15:15-16:00 | Distant microphone speech recognition in everyday environments: from CHiME-5 to CHiME-6 | Jon Barker (University of Sheffield)
The CHiME challenge series aims to advance robust automatic speech recognition technology by promoting research at the interface of speech and language processing, signal processing and machine learning. This talk will present the outcomes of the 5th CHiME Challenge, which considered the task of distant multi-microphone conversational speech recognition in domestic environments. The talk will give an overview of the CHiME-5 dataset, a fully transcribed audio-video dataset capturing 50 hours of audio from 20 separate dinner parties held in real homes, each with 6 video channels and 32 audio channels, and will discuss the design of the lightweight recording setup that allowed highly natural data to be recorded. I will present an analysis of the data, highlighting the major sources of difficulty it presents for recognition systems. The talk will summarise the outcomes of the challenge itself and the recent advances that now represent the state of the art, and will conclude by discussing future directions and introducing the CHiME-6 challenge that is due to launch later this year.
16:00-16:35 | Embedded sound processing with Bela | Andrew McPherson
This talk will present Bela (http://bela.io), an embedded computing platform for creating ultra-low-latency interactive audio systems. Bela is based on the BeagleBone Black, a 1GHz ARM single-board computer. It combines the performance of the Xenomai real-time Linux environment, flexible connectivity to a wide variety of sensors and actuators, and an easy-to-use browser-based development environment. Bela is a fully open-source platform for makers, musicians and researchers to create highly responsive interactive systems.
16:35-16:50 | Explainable Machine Learning and its applications to Machine Listening | Saumitra Mishra
Explainable Machine Learning (EML) algorithms aim to make Deep Neural Networks (DNNs) transparent through post-hoc analysis. This talk will introduce two key categories of EML algorithms: those that explain a model and those that explain individual model predictions. We will cover recent advances in understanding DNNs and highlight some of the key research challenges that EML methods face. We will conclude with a demonstration of our recent work on explaining machine listening models that classify audio.
16:50-17:30 | Panel discussion: Christine Evers, Sacha Krstulovic and Jon Barker. Moderator: Lin Wang
5 September
The CORSMAL Challenge
9:00-17:30 | Working in groups
6 September
The CORSMAL Challenge
9:00-13:00 | Working in groups
13:00 | Submission of the CORSMAL Challenge results
14:00-15:30 | Presentation of the results in front of a panel
15:30 | Closing and awards
Posters
Adapting the quality of experience framework for audio archive evaluation [pdf] A. Ragano, E. Benetos, A. Hines
Analysing the predictions of a CNN-based replay spoofing detection system [pdf] B. Chettri, S. Mishra, B. L. Sturm, E. Benetos
An elastomer-based flexible optical force and tactile sensor [pdf] W. Li, J. Konstantinova, Y. Noh, Z. Ma, A. Alomainy, K. Althoefer
Background light estimation for depth-dependent underwater image restoration [pdf] C.Y. Li, A. Cavallaro
Distributed one-class learning [pdf] A.S. Shamsabadi, H. Haddadi, A. Cavallaro
Effect of textile properties on a low-profile wearable loop antenna for healthcare applications [pdf] I.I. Labiano, A. Alomainy, M.M. Bait-Suwailam
End-to-end probabilistic inference for nonstationary audio analysis [pdf] W. Wilkinson, M.R. Andersen, J.D. Reiss, D. Stowell, A. Solin
Knowledge distillation by on-the-fly native ensemble [pdf] X. Lan, X. Zhu, S. Gong
Learning action representations for self-supervised visual exploration [pdf] C. Oh, A. Cavallaro
MORB: A multi-scale binary descriptor [pdf] A. Xompero, O. Lanz, A. Cavallaro
Multiview 3D sensing and analysis for high quality point cloud capturing and model generation [pdf] A. Satnik, E. Izquierdo
Real-time quality assessment of videos from body-worn cameras [pdf] Y.Y. Chang, R. Mazzon, A. Cavallaro
Region based user-generated human body scan registration [pdf] Z. Xu, Q. Zhang
Scene privacy protection [pdf] C.Y. Li, A.S. Shamsabadi, R. Sanchez-Matilla, R. Mazzon, A. Cavallaro
Self-referenced deep learning [pdf] X. Lan, X. Zhu, S. Gong
Sound-based transportation mode recognition with smartphones [pdf] L. Wang, D. Roggen
Sparse Gaussian process audio source separation using spectrum priors in the time-domain [pdf] P.A. Alvarado, M.A. Alvarez, D. Stowell
SubSpectralNet - using sub-spectrogram based Convolutional Neural Networks for acoustic scene classification [pdf] S.S.R. Phaye, E. Benetos, Y. Wang
Tracking a moving sound source from a multi-rotor drone [pdf] L. Wang, R. Sanchez-Matilla, A. Cavallaro
Unifying probabilistic models for time-frequency analysis [pdf] W. Wilkinson, M.R. Andersen, J.D. Reiss, D. Stowell, A. Solin
Visual localization in the presence of appearance changes using the partial order kernel [pdf] M. Abdollahyan, S. Cascianelli, E. Bellocchio, G. Costante, T.A. Ciarfuglia, F. Bianconi, F. Smeraldi, M.L. Fravolini
Logistics | Filming
Muhammad Farrukh S.
Ashish Alex
Vandana Rajan
Xinyuan Qian
Chau Yi Li
Ali Shahin Shamsabadi
Sponsors