Program: Deep Learning @UCA Event 2019, July 15 to 19
• 2 h 45 min lecture in the morning (9 am – 12:15 pm)
• Coffee break (around 10:15)
• Optional: UCA Lab sessions (2pm to 4:30pm)

If you are a researcher, an engineer, a PhD student, or a master's student, working in a tech company or in an academic lab, this school is made for you!


Rohit Prabhavalkar, PhD, Researcher at Google: Rohit Prabhavalkar received his PhD in Computer Science and Engineering from The Ohio State...
Professor Martial Hebert: Martial Hebert is a Professor of Robotics and Director of the Robotics Institute at Carnegie Mellon...
Professor Alexandre Alahi: Alexandre Alahi is a tenure-track assistant professor at EPFL leading the Visual Intelligence for...
Professor Graham Taylor: Graham Taylor is a Canada Research Chair and Associate Professor of Engineering at the University of...
Tomas Mikolov, PhD, Researcher at Facebook: Tomas Mikolov has been a research scientist at Facebook AI Research since May 2014. Previously he has been...

End-to-end models for automatic speech recognition by Rohit Prabhavalkar, PhD, Researcher at Google

Automatic speech recognition (ASR) systems, which convert input speech into word hypotheses, are becoming ubiquitous in our daily lives. ASR technologies have become the backbone that powers our interactions with smartphones and digital assistants, allowing us to access the wealth of available information, improve productivity, and communicate faster and more easily than before.

The technologies and the underlying approaches that power traditional ASR systems have remained fairly stable over the last few years. Traditional ASR systems consist of a set of separate components: an acoustic model (AM), a pronunciation model (PM), and a language model (LM). The AM takes acoustic features as input and predicts a distribution over sub-word units (i.e., the individual sounds in the target language). The PM, which is traditionally a hand-engineered pronunciation dictionary, maps the sequence of sub-word units produced by the acoustic model to words. Finally, the LM assigns probabilities to various word hypotheses. In traditional ASR systems, these components are trained independently on separate datasets, while making a number of independence assumptions for tractability.
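
As a rough illustration of how the three components described above combine, here is a minimal Python sketch. Every number, unit symbol, and word in it is a made-up assumption for illustration, not data from the lecture, and real systems search over far larger hypothesis spaces. The hypothetical `decode` function scores each dictionary entry by adding the acoustic log-probability of its sub-word units to the language model log-probability of the word.

```python
import math

# Toy "acoustic model" output (assumed values): for each of three speech
# segments, a probability distribution over sub-word units.
AM_POSTERIORS = [
    {"k": 0.9, "g": 0.1},
    {"ae": 0.8, "ah": 0.2},
    {"t": 0.7, "p": 0.3},
]

# Toy "pronunciation model" (assumed): a hand-written dictionary that maps
# sequences of sub-word units to words.
LEXICON = {
    ("k", "ae", "t"): "cat",
    ("k", "ae", "p"): "cap",
    ("g", "ah", "t"): "gut",
}

# Toy "language model" (assumed): unigram word probabilities.
LM = {"cat": 0.6, "cap": 0.3, "gut": 0.1}


def decode(posteriors, lexicon, lm):
    """Return the (word, log-score) pair with the highest combined score."""
    best_word, best_score = None, -math.inf
    for units, word in lexicon.items():
        if len(units) != len(posteriors):
            continue  # this toy example only aligns one unit per segment
        # Acoustic score: sum of log-probabilities of the units.
        am_score = sum(
            math.log(dist.get(unit, 1e-10))
            for dist, unit in zip(posteriors, units)
        )
        # Combined score: acoustic score plus language model score.
        score = am_score + math.log(lm[word])
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


word, score = decode(AM_POSTERIORS, LEXICON, LM)
print(word, round(score, 3))
```

Each table here stands in for a component that, in a traditional system, is built or trained separately; end-to-end models replace all three with a single network.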

The dominance of traditional ASR systems, however, has been challenged recently by growing interest in end-to-end ASR systems, which attempt to learn these separate components jointly in a single system. Examples of such systems include attention-based models [1, 5], the recurrent neural network transducer [2, 3], and connectionist temporal classification with word targets [4]. A common feature of all of these models is that they are composed of a single neural network, which accepts acoustic frames as input and outputs a probability distribution over characters or word hypotheses. In fact, as has been demonstrated in recent work, such end-to-end models can surpass the performance of conventional ASR systems [5].

In this lecture, I shall provide a detailed introduction to the topic of end-to-end modeling in the context of ASR. I shall begin by charting out the historical development of these systems, while emphasizing commonalities and differences between the various end-to-end approaches that have been considered in the literature. Next, I shall discuss a number of recently introduced innovations that have significantly improved the performance of end-to-end models, allowing them to surpass the performance of conventional ASR systems. I shall then describe some of the exciting applications of this research, along with possible fruitful directions to explore. Finally, I shall discuss some of the shortcomings of existing end-to-end modeling approaches and ongoing efforts to address these challenges.

[1] W. Chan, N. Jaitly, Q. V. Le, and O. Vinyals, “Listen, Attend and Spell,” in Proc. ICASSP, 2016.
[2] A. Graves, “Sequence Transduction with Recurrent Neural Networks,” in Proc. ICASSP, 2012.
[3] K. Rao, H. Sak, and R. Prabhavalkar, “Exploring Architectures, Data and Units for Streaming End-to-End Speech Recognition with RNN-Transducer,” in Proc. ASRU, 2017.
[4] H. Soltau, H. Liao, and H. Sak, “Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition,” in Proc. Interspeech, 2017.
[5] C.-C. Chiu, T. N. Sainath, Y. Wu, R. Prabhavalkar, P. Nguyen, Z. Chen, A. Kannan, R. J. Weiss, K. Rao, E. Gonina, N. Jaitly, B. Li, J. Chorowski, and M. Bacchiani, “State-of-the-art Speech Recognition with Sequence-to-Sequence Models,” in Proc. ICASSP, 2018.

Speech Recognition Using Deep Learning

Open this link, then choose File -> Save a copy in your Google Drive

Research challenges in using computer vision in robotics systems by Professor Martial Hebert

The past decade has seen a remarkable increase in the level of performance of computer vision techniques, driven in part by the introduction of effective deep learning techniques. Much of this progress is in the form of rapidly increasing performance on standard, curated datasets. However, translating these results into operational vision systems for robotics applications remains a formidable challenge. This talk will explore some of the fundamental questions at the boundary between deep learning/computer vision and robotics that need to be addressed. These include minimizing supervision (low-shot learning, meta-learning, self-supervision), introspection/self-awareness of performance, anytime algorithms for computer vision, multi-hypothesis generation, and rapid learning and adaptation. The discussion will be illustrated by examples from autonomous air and ground robots.


Deep Reinforcement Learning

Open this link, then choose File -> Save a copy in your Google Drive

Deep Learning for Self-driving cars and Beyond by Professor Alexandre Alahi

Will deep learning reshape the AI pillars of self-driving cars, or of any autonomous moving platform? In this talk, we will give an overview of state-of-the-art deep-learning-based methods for perception, prediction, and planning.


Object Detection Using Deep Learning

Open this link, then choose File -> Save a copy in your Google Drive

Towards interpretable and robust machine learning systems by Professor Graham Taylor

The scale of research and application of deep learning continues to accelerate. In our work and our daily lives, we see more dependence on deep learning systems for making predictions. In some cases, these are automated, and in others, they involve humans "in the loop". Unfortunately, these systems can fail silently. Examples of failure include erroneous predictions made with high confidence, susceptibility to adversarial attacks, and so-called fooling images.

In the first part of the tutorial, I will review the adversarial examples phenomenon and current work that aims to address a model's fault tolerance with respect to its input. I will present some recent work from our lab that aims to characterize tolerance to diverse input faults, and also a surprising result that relates the widely used batch normalization technique to adversarial vulnerability.

In the second part of the tutorial, I will discuss one way to build more fault-tolerant machine learning systems: that is, by calibrating their confidence or uncertainty measures for interpretation by humans or other systems. Such measures are useful, for example, to refer data points to a human expert or another system for further processing. In automated scenarios, they can be used to hand control to a human. I will also highlight how calibration is related to fairness in machine learning. I will demonstrate the use of confidence and uncertainty measures for out-of-distribution detection and for improving exploration in reinforcement learning.
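
To make the idea of referring data points to a human expert concrete, here is a minimal Python sketch. It is an illustrative assumption, not the method presented in the tutorial: the function name `predict_or_refer`, the threshold value, and the use of the maximum softmax probability as a confidence measure are all hypothetical choices. In particular, raw softmax probabilities are often poorly calibrated, which is exactly why calibration matters before a threshold like this can be trusted.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_refer(logits, threshold=0.8):
    """Return ("predict", label, confidence) when the model is confident
    enough, otherwise ("refer", label, confidence) so that a human expert
    or another system can take over. The threshold is an assumed value."""
    probs = softmax(logits)
    confidence = max(probs)
    label = probs.index(confidence)
    if confidence < threshold:
        return ("refer", label, confidence)
    return ("predict", label, confidence)


# One clearly separated input and one ambiguous input (made-up logits).
print(predict_or_refer([5.0, 0.0, 0.0]))
print(predict_or_refer([1.0, 0.9, 0.8]))
```

With these made-up logits, the first input is predicted automatically and the second is referred, since no class reaches the threshold.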

Sentiment analysis lab - From a baseline to an RNN with attention

Download the zip file:

Then uncompress it directly in Google Drive, inside a folder named "Colab Notebooks" that you have to create first

Then, from your Google Drive, you will be able to "Open with" Colaboratory the ipynb files

Representational learning in NLP & Neural language models by Tomas Mikolov, PhD, Researcher at Facebook

We will cover representation learning from text, including algorithms such as word2vec and fastText. We will describe the differences between various algorithms for learning representations. Efficient supervised text classification with the fastText algorithm will also be discussed. Statistical language models based on neural networks will be introduced, and certain advanced topics, such as vanishing and exploding gradients and learning longer-term memory in recurrent networks, will be explained. We will also talk about the limitations of current learning algorithms, of generalization in the context of sequential data, and of learning from language in general.
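
The vanishing and exploding gradient problem mentioned above can be seen in a very small worked example. The sketch below is a simplifying assumption, not material from the lecture: it uses a scalar linear recurrence h_t = w * h_(t-1), for which the derivative of the final state with respect to the initial state is w multiplied by itself once per time step. The function name `gradient_through_time` is hypothetical.

```python
def gradient_through_time(w, steps):
    """Derivative of h_T with respect to h_0 for the scalar linear
    recurrence h_t = w * h_(t-1): one factor of w per time step."""
    grad = 1.0
    for _ in range(steps):
        grad *= w
    return grad


# A recurrent weight below 1 shrinks the gradient toward zero (vanishing),
# while a weight above 1 makes it grow without bound (exploding).
for w in (0.5, 1.0, 1.5):
    print(w, gradient_through_time(w, 50))
```

Real recurrent networks use weight matrices and nonlinearities, so the behavior depends on more than a single number, but the same repeated multiplication is at the root of the difficulty in learning long-range dependencies.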



Access & Accommodation

Accommodation facilities for academic members:
The University provides individual student rooms and studios in Nice at very attractive prices.
Please visit the following site:
Résidences ICARE - Crous de Nice-Toulon
For the English version, go to the bottom of the page and click on "Langage".

To become a partner, please

