
Big data for the universe

February 09, 2017

Astronomers at Lomonosov Moscow State University, in cooperation with their French colleagues and with the help of citizen scientists, have released «The Reference Catalog of galaxy SEDs» (RCSED), which contains value-added information about 800,000 galaxies. The catalog is accessible on the web, and its description has been published in the Astrophysical Journal Supplement (impact factor 11.257). Two co-authors of the research paper are undergraduate students at the Faculty of Physics, Lomonosov Moscow State University. While still working on the catalog, the team published several research papers based on its data, including a study in the prestigious interdisciplinary journal Science.

What can one learn using RCSED and why is it unique?

RCSED describes the properties of 800,000 galaxies derived from elaborate data analysis. For every galaxy, it presents the stellar composition and the brightness at ultraviolet, optical, and near-infrared wavelengths. From RCSED, one can also access galaxy spectra obtained by the Sloan Digital Sky Survey, measurements of spectral lines, and properties determined from them, such as the chemical composition of the stars and gas in those galaxies. This makes RCSED the first catalog of its kind to contain the results of a detailed, homogeneous analysis for such a large number of objects. Dr. Igor Chilingarian, an astronomer at the Smithsonian Astrophysical Observatory, USA, and a Lead Researcher at Sternberg Astronomical Institute, Lomonosov Moscow State University, says: "For every galaxy we also provide a small cutout image from three sky surveys, which show how the galaxy looks at different wavelengths. This provides us with the data for further investigations." Dr. Ivan Katkov, a Senior Researcher at Sternberg Astronomical Institute, adds: "The analysis of emission line profiles presented in RCSED is substantially more detailed and accurate than the data published in other catalogs".

RCSED is flexible and easy to use. When a user enters an object name or its coordinates in the search field, the web site returns, on a single page, all the information about that object contained in the catalog. One can also use the catalog through Virtual Observatory applications such as TOPCAT. The RCSED web site also provides tutorials, including one that describes the technique Igor Chilingarian and Ivan Zolotukhin used to discover new compact elliptical galaxies, later published in the research paper «Isolated compact elliptical galaxies: Stellar systems that ran away».
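To illustrate what a lookup by coordinates involves, here is a minimal sketch of a cone search over a small in-memory table. It is not the RCSED web site's implementation or API: the function names, the row layout (`name`, `ra`, `dec` in degrees), and the sample rows are all hypothetical and chosen only for illustration.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # The haversine form stays numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(rows, ra, dec, radius_deg):
    """Return the rows lying within radius_deg of (ra, dec), nearest first."""
    scored = [(angular_separation_deg(ra, dec, r["ra"], r["dec"]), r) for r in rows]
    scored.sort(key=lambda pair: pair[0])
    return [r for sep, r in scored if sep <= radius_deg]


# Hypothetical rows, not taken from RCSED.
SAMPLE_ROWS = [
    {"name": "example_a", "ra": 10.00, "dec": 20.0},
    {"name": "example_b", "ra": 10.01, "dec": 20.0},
    {"name": "example_c", "ra": 50.00, "dec": -5.0},
]

if __name__ == "__main__":
    for row in cone_search(SAMPLE_ROWS, ra=10.0, dec=20.0, radius_deg=0.05):
        print(row["name"])
```

A real query against 800,000 objects would rely on a spatial index or on a Virtual Observatory service rather than a linear scan; the sketch only shows the geometry behind matching a position to catalog entries.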

Another interesting detail about RCSED is that the team actively used the help of citizen scientists to develop the project web site. Among them were high-level experts in software development and web design who have day jobs at the largest Russian IT companies. Dr. Ivan Zolotukhin, a Researcher at Sternberg Astronomical Institute, explains: "Programmers sometimes get burnt out by their routine work, and they would like to do something interesting and pleasant in their spare time, for instance, to help scientists. We are very grateful to them; they have become important members of our team and significantly strengthened our project. It has always been interesting for us to cooperate with IT specialists, and we have many more projects to which they can contribute. So if you use git, program in Python or know HTML/CSS, love stars, have a bit of spare time and are willing to help an international research team, please contact us using the address published on the web page."

Dr. Ivan Katkov adds: "The RCSED catalog became possible thanks to the application of an interdisciplinary Big Data approach as we had to apply very complex scientific algorithms to a large dataset in a massively parallel way. Eventually, the expertise and resources available at large IT companies would undoubtedly allow researchers to significantly increase the quality and the quantity of research results and to make many important discoveries in astrophysics".

The fact that the RCSED catalog attracted serious interest in the scientific community even during its assembly phase demonstrates its great potential. Over the last three years, several external researchers were given access to the catalog on request and, using RCSED data, published over a dozen articles in professional peer-reviewed journals (Astrophysical Journal, Astronomy & Astrophysics, MNRAS). The catalog is the world's largest homogeneous value-added dataset for nearby galaxies, containing information collected with ground-based and space telescopes. The unique research material for extragalactic astrophysics contained in RCSED will certainly help astrophysicists achieve interesting new scientific results, some of which would probably qualify for publication in the interdisciplinary journals Science and Nature.

RCSED expansion prospects: one million galaxies will be there soon

The current release of the RCSED catalog could have comprised a larger number of galaxies or contained additional information about the objects already included, but for now the scientists have decided to focus on well-characterized datasets that are described in detail and have known advantages and disadvantages. However, given the project's importance for extragalactic astronomy and observational cosmology, the RCSED team is going to move forward and expand the catalog in the near future.

There are two principal directions for further RCSED development: expanding the galaxy sample and incorporating new data for existing objects. The team is considering including near- and mid-infrared data from the WISE satellite all-sky survey for the entire galaxy sample. However, this requires some additional methodological work to homogenize the data for galaxies at different redshifts.

Moreover, it is possible to expand the principal galaxy sample by including spectra from the latest data release of the SDSS-III survey. This would increase the sample from 800,000 to 1.5 million objects.

Incorporating the publicly available spectral data from the Hectospec archive (Igor Chilingarian has played a major role in the Hectospec archive project) will add 300,000 to 400,000 objects at larger distances, whose spectra were collected with the 6.5-meter MMT telescope in Arizona. The current RCSED release comprises mostly nearby galaxies (by cosmological measures) with redshifts smaller than 0.4, because SDSS did not include faint objects. Therefore, the early Universe is not represented in the catalog at all. The Hectospec archive will allow the team to move a little further along the cosmological distance scale, out to a redshift of 0.7. If they add several thousand galaxies from the DEEP2 survey, conducted with the 10-meter Keck telescope in the early 2000s, they could gain insights into objects at redshifts up to 1.0, when the Universe was less than half of its present age.
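The link between redshift and the age of the Universe quoted above can be checked with a short calculation. The sketch below uses the closed-form age of a flat universe containing only matter and a cosmological constant. The parameter values (a Hubble constant of 70 km/s/Mpc and a matter density of 0.3) are assumptions chosen for illustration and do not come from the RCSED paper; the function names are likewise hypothetical.

```python
import math

# Assumed flat Lambda-CDM parameters (illustrative, not from the article).
H0_KM_S_MPC = 70.0
OMEGA_M = 0.3
OMEGA_L = 1.0 - OMEGA_M  # flatness: matter + dark energy = 1, radiation neglected

# 1 / (1 km/s/Mpc) is approximately 977.8 Gyr.
GYR_PER_INVERSE_KM_S_MPC = 977.8


def age_gyr(z):
    """Age of the Universe in Gyr at redshift z for the assumed flat model."""
    hubble_time = GYR_PER_INVERSE_KM_S_MPC / H0_KM_S_MPC
    x = math.sqrt(OMEGA_L / OMEGA_M) * (1.0 + z) ** -1.5
    return hubble_time * 2.0 / (3.0 * math.sqrt(OMEGA_L)) * math.asinh(x)


def age_fraction(z):
    """Age at redshift z as a fraction of the present age (z = 0)."""
    return age_gyr(z) / age_gyr(0.0)


if __name__ == "__main__":
    # Redshift limits mentioned in the text: SDSS, Hectospec, DEEP2.
    for z in (0.4, 0.7, 1.0):
        print(f"z = {z:.1f}: age = {age_gyr(z):.2f} Gyr, "
              f"fraction of present age = {age_fraction(z):.2f}")
```

With these assumed parameters, the fraction at a redshift of 1.0 comes out at about 0.43, consistent with the statement that the Universe was then less than half of its present age. Other reasonable parameter choices shift the numbers slightly but not the conclusion.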

Igor Chilingarian concludes: "We shall be able to see the global picture in about ten years from now, when large surveys like DESI have collected 25-30 million galaxy spectra out to intermediate redshifts."
-end-
The RCSED project has been supported by a collaborative grant provided by the Russian Foundation for Basic Research (RFBR) and the French National Center for Scientific Research (Centre National de la Recherche Scientifique, CNRS). At earlier stages, the project was supported by grants from the Russian Science Foundation (RScF) and the President of the Russian Federation, along with French resources available in the framework of the VO-Paris Data Center at the Paris Observatory.

Lomonosov Moscow State University
