Book description
Multimodal Scene Understanding: Algorithms, Applications and Deep Learning presents recent advances in multi-modal computing, with a focus on computer vision and photogrammetry. It provides the latest algorithms and applications that involve combining multiple sources of information, and describes the role and approaches of multi-sensory data and multi-modal deep learning. The book is ideal for researchers in computer vision, remote sensing, robotics, and photogrammetry, helping to foster interdisciplinary interaction and collaboration between these fields.
Researchers collecting and analyzing multi-sensory data collections (for example, the KITTI benchmark, which combines stereo and laser data) from platforms such as autonomous vehicles, surveillance cameras, UAVs, planes, and satellites will find this book very useful.
- Contains state-of-the-art developments on multi-modal computing
- Focuses on algorithms and applications
- Presents novel deep learning topics on multi-sensor fusion and multi-modal deep learning
Table of contents
- Cover image
- Title page
- Table of Contents
- Copyright
- List of Contributors
- Chapter 1: Introduction to Multimodal Scene Understanding
- Chapter 2: Deep Learning for Multimodal Data Fusion
- Chapter 3: Multimodal Semantic Segmentation: Fusion of RGB and Depth Data in Convolutional Neural Networks
- Chapter 4: Learning Convolutional Neural Networks for Object Detection with Very Little Training Data
- Chapter 5: Multimodal Fusion Architectures for Pedestrian Detection
- Chapter 6: Multispectral Person Re-Identification Using GAN for Color-to-Thermal Image Translation
- Chapter 7: A Review and Quantitative Evaluation of Direct Visual–Inertial Odometry
- Chapter 8: Multimodal Localization for Embedded Systems: A Survey
- Chapter 9: Self-Supervised Learning from Web Data for Multimodal Retrieval
- Abstract
- Acknowledgements
- 9.1. Introduction
- 9.2. Related Work
- 9.3. Multimodal Text–Image Embedding
- 9.4. Text Embeddings
- 9.5. Benchmarks
- 9.6. Retrieval on InstaCities1M and WebVision Datasets
- 9.7. Retrieval in the MIRFlickr Dataset
- 9.8. Comparing the Image and Text Embeddings
- 9.9. Visualizing CNN Activation Maps
- 9.10. Visualizing the Learned Semantic Space with t-SNE
- 9.11. Conclusions
- References
- Chapter 10: 3D Urban Scene Reconstruction and Interpretation from Multisensor Imagery
- Chapter 11: Decision Fusion of Remote-Sensing Data for Land Cover Classification
- Chapter 12: Cross-modal Learning by Hallucinating Missing Modalities in RGB-D Vision
- Index
Product information
- Title: Multimodal Scene Understanding
- Author(s):
- Release date: July 2019
- Publisher(s): Academic Press
- ISBN: 9780128173596