
Detection of unknown galaxy types in large databases of galaxy images

10 pages. Published: March 1, 2021


Modern digital sky surveys use robotic telescopes that collect extremely large, multi-petabyte astronomical databases. While these databases can contain billions of galaxies, most of them are “regular” galaxies of known types. A small portion, however, are rare “peculiar” galaxies of types that are not yet known. These unknown galaxies are of paramount scientific interest, but because of the enormous size of astronomical databases they are practically impossible to find without automation. Since these novelty galaxies are, by definition, not known, machine learning models cannot be trained to detect them. This paper proposes an unsupervised machine learning method for automatic detection of novelty galaxies in large databases. The method is based on a large and comprehensive set of numerical image content descriptors weighted by their entropy, and the farthest neighbors are rank-ordered to handle the self-similar peculiar galaxies that are expected in very large datasets. Experimental results using data from the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) show that the method detects novelty galaxies better than other shallow learning methods such as one-class SVM, Local Outlier Factor, and K-Means, and also better than newer deep learning-based methods such as auto-encoders. The dataset used to evaluate the method is publicly available and can be used as a benchmark to test future algorithms for automatic detection of peculiar galaxies.
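As a rough illustration of the two ideas named in the abstract (entropy-weighted descriptors and rank-ordered neighbor distances), the following minimal Python sketch scores each sample by its weighted distance to its r-th nearest neighbor. It is not the paper's implementation. The function names, the histogram-based entropy estimate, the choice to use entropy directly as the weight, and the parameters `r` and `bins` are all assumptions made for illustration. Looking past the single nearest neighbor is one way to keep a handful of mutually similar outliers from masking one another.

```python
import math

def feature_entropy(values, bins=10):
    """Shannon entropy (in bits) of a histogram of one feature column."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature carries no information
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def novelty_ranking(samples, r=2, bins=10):
    """Return sample indices ordered from most to least outlying.

    Assumption: each feature is weighted by its histogram entropy, and a
    sample's score is its weighted distance to its r-th nearest neighbor.
    """
    columns = list(zip(*samples))
    weights = [feature_entropy(col, bins) for col in columns]
    # Scale every feature by its range so no descriptor dominates by unit.
    spans = [(max(col) - min(col)) or 1.0 for col in columns]

    def dist(a, b):
        return math.sqrt(sum(
            w * ((x - y) / s) ** 2
            for w, s, x, y in zip(weights, spans, a, b)
        ))

    scores = []
    for i, a in enumerate(samples):
        d = sorted(dist(a, b) for j, b in enumerate(samples) if j != i)
        scores.append(d[min(r, len(d)) - 1])
    return sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)

# Hypothetical two-descriptor data: a tight cluster plus two similar outliers.
samples = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (0.02, 0.08),
    (5.0, 5.0), (5.1, 5.0),
]
ranking = novelty_ranking(samples, r=2)
```

With `r=2`, the two mutually similar outliers (indices 6 and 7) are ranked first, because each one's second-nearest neighbor lies in the distant cluster.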

Keyphrases: Astroinformatics, astrophysics, knowledge discovery, Novelty Galaxies, Peculiar galaxies in Pan-STARRS

In: Alexander Redei, Rui Wu and Frederick C. Harris Jr (editors). SEDE 2020. 29th International Conference on Software Engineering and Data Engineering, vol 76, pages 29--38

BibTeX entry
  author    = {Venkata Siva Kumar Margapuri and Basant Thapa and Lior Shamir},
  title     = {Detection of unknown galaxy types in large databases of galaxy images},
  booktitle = {SEDE 2020. 29th International Conference on Software Engineering and Data Engineering},
  editor    = {Alex Redei and Rui Wu and Frederick Harris},
  series    = {EPiC Series in Computing},
  volume    = {76},
  pages     = {29--38},
  year      = {2021},
  publisher = {EasyChair},
  bibsource = {EasyChair,},
  issn      = {2398-7340},
  url       = {},
  doi       = {10.29007/5xhn}}