TaBSEh17ascom

D. Tapiador, A. Berihuete, L.M. Sarro, F. Julbe, E. Huedo. Enabling data science in the Gaia mission archive: The present-day mass function and age distribution. Astronomy and Computing, 19:1-15, 2017.

Abstract

Recent advances in large-scale computing architectures enable new opportunities to extract value out of the vast amounts of data currently being generated. However, their successful adoption is not straightforward in areas like science, as there are still some barriers that need to be overcome. Those comprise (i) the existence of legacy code that needs to be ported, (ii) the lack of high-level and use-case-specific frameworks that facilitate a smoother transition, or (iii) the scarcity of profiles with balanced skill sets between the technological and scientific domains. The European Space Agency’s Gaia mission will create the largest and most precise three-dimensional chart of our galaxy (the Milky Way), providing unprecedented position, parallax and proper motion measurements for about one billion stars. The successful exploitation of this data archive will depend on the ability to offer the proper infrastructure upon which scientists will be able to do exploration and modelling with this huge data set. In this paper, we present and contextualize these challenges by building two probabilistic models using Hierarchical Bayesian Modelling. These models represent a key challenge in astronomy and are of paramount importance for the Gaia mission itself. Moreover, we approach the implementation by leveraging a generic distributed processing engine through an existing software package for Markov chain Monte Carlo sampling. The two computationally intensive models are then validated with simulated data in different scenarios under specific restrictions, and their performance is assessed to prove their scalability. We argue that this approach will not only serve for the models in hand but also for exemplifying how to address similar problems in science, which may need to both scale to bigger data sets and reuse existing software as much as possible. This will lead to shorter time to science in massive data archives.

Keywords

[ Tin2015-65469-p ] [ Cloud ] [ Grid ]

Contact

Eduardo Huedo

BibTex Reference

@article{TaBSEh17ascom,
   Author = {Tapiador, D. and Berihuete, A. and Sarro, L.M. and Julbe, F. and Huedo, E.},
   Title = {Enabling data science in the Gaia mission archive: The present-day mass function and age distribution},
   Journal = {Astronomy and Computing},
   Volume = {19},
   Pages = {1--15},
   Year = {2017}
}
