Simultaneous Localization and Mapping (SLAM) has been studied extensively over the past 20 years, yet there is still no efficient method for large-scale, long-term indoor/outdoor applications.
Robust and efficient features are essential for visual place recognition (VPR). We apply an unsupervised feature learning method in which the raw image is converted into a lower-dimensional code for place encoding and fast retrieval. The major challenges for visual place recognition are:

  1. In the real world, appearances vary widely, yet some distinct places look very similar;
  2. The same place may look different under different conditions;
  3. Dynamic objects add noise to place recognition;
  4. No labels are available for visual place recognition.

We apply a CapsuleNet-based encoder module, which has an efficient dynamic routing mechanism. As shown in the left figure, the first several convolution layers extract 4 properties for N local descriptors. In the dynamic routing step, the N local descriptors are clustered into a smaller set of M descriptors (N >> M) via an Expectation-Maximization-like method. These M descriptors serve as the local feature description of the scene.
The overall framework is Autoencoder-GAN-like. The CapsuleNet-based encoder extracts the visual features. A generative adversarial network (GAN) is applied to ensure that the visual features capture enough geometric detail to generate realistic images.
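
The routing step described above can be illustrated with a minimal NumPy sketch. It uses routing-by-agreement (the iterative soft-assignment scheme from the original CapsuleNet work) as a stand-in, since the exact routing variant is not specified here. The names `route`, `squash`, and `u_hat`, the shapes, and the number of iterations are all assumptions made for illustration; this is not the actual encoder.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Shrink each vector's length into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s * s, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def route(u_hat, n_iters=3):
    """Softly cluster N local descriptors into M output descriptors.

    u_hat: array of shape (N, M, D), the "vote" of each of the N input
           descriptors for each of the M output descriptors (hypothetical
           input; in a real network these votes come from learned weights).
    Returns (v, c): v has shape (M, D); c has shape (N, M) and holds the
    soft assignments used to compute v (each row sums to 1).
    """
    n, m, _ = u_hat.shape
    b = np.zeros((n, m))  # routing logits; zeros give uniform assignments
    for _ in range(n_iters):
        # Assignment step: softmax over the M outputs for each input.
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)
        # Update step: each output is the assignment-weighted sum of votes.
        s = np.einsum("nm,nmd->md", c, u_hat)
        v = squash(s)
        # Raise the logit of input-output pairs whose vote agrees with v.
        b = b + np.einsum("nmd,md->nm", u_hat, v)
    return v, c

# Toy usage with random votes: N=64 inputs, M=8 outputs, D=16 dimensions.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(64, 8, 16))
v, c = route(u_hat)
print(v.shape, c.shape)  # (8, 16) (64, 8)
```

The alternation between computing soft assignments and recomputing the output descriptors is what makes the procedure resemble Expectation-Maximization.
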
[Figures AGVSLAMF3 to AGVSLAMF7: Place recognition under varying conditions]
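
Once each image is reduced to a compact place code, recognition can be treated as nearest-neighbour search over stored codes. The sketch below is a rough illustration under stated assumptions: each place is represented by a single flat vector, cosine similarity is the matching score, and the function name `retrieve` and the toy numbers are hypothetical. How the M descriptors are actually combined and compared is not specified here.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def retrieve(query_code, db_codes, top_k=3):
    """Return indices and cosine similarities of the top_k closest codes."""
    q = l2_normalize(np.asarray(query_code, dtype=float))
    db = l2_normalize(np.asarray(db_codes, dtype=float))
    sims = db @ q                       # cosine similarity to every stored place
    order = np.argsort(-sims)[:top_k]   # most similar first
    return order, sims[order]

# Toy database of three place codes (made-up numbers).
db = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.7, 0.7, 0.0]])
query = np.array([0.9, 0.1, 0.0])
idx, sims = retrieve(query, db, top_k=2)
print(idx)  # indices 0 and 2 are the closest stored places
```

A brute-force scan like this is linear in the number of stored places; for large maps an approximate nearest-neighbour index would be the usual replacement.
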
[Figures AGVSLAMF8 to AGVSLAMF10: Scene reconstruction from the extracted place features]