Big data for the universe
February 09, 2017
Astronomers at Lomonosov Moscow State University, in cooperation with their French colleagues and with the help of citizen scientists, have released «The Reference Catalog of galaxy SEDs» (RCSED), which contains value-added information about 800,000 galaxies. The catalog is accessible on the web, and its description has been published in the Astrophysical Journal Supplement (impact factor 11.257). Two co-authors of the research paper are undergraduate students at the Faculty of Physics, Lomonosov Moscow State University. While still working on the catalog, the team published a few research papers based on its data, including a study published in the prestigious interdisciplinary journal Science.
What can one learn using RCSED and why is it unique?
RCSED describes the properties of 800,000 galaxies derived from an elaborate data analysis. For every galaxy, it presents the stellar composition and the brightness at ultraviolet, optical, and near-infrared wavelengths. From RCSED, one can also access galaxy spectra obtained by the Sloan Digital Sky Survey, measurements of spectral lines, and properties determined from them, such as the chemical composition of the stars and gas contained in those galaxies. This makes RCSED the first catalog of its kind, containing the results of a detailed homogeneous analysis for such a large number of objects. Dr. Igor Chilingarian, an astronomer at the Smithsonian Astrophysical Observatory, USA, and a Lead Researcher at Sternberg Astronomical Institute, Lomonosov Moscow State University, says: "For every galaxy we also provide a small cutout image from three sky surveys, which show how the galaxy looks at different wavelengths. This provides us with the data for further investigations." Dr. Ivan Katkov, a Senior Researcher at Sternberg Astronomical Institute, adds: "The analysis of emission line profiles presented in RCSED is substantially more detailed and accurate than the data published in other catalogs".
RCSED is flexible and easy to use. When a user enters an object name or its coordinates in the search field, the web site returns, on a single page, all the information about that object contained in the catalog. One can also use the catalog through Virtual Observatory applications such as TOPCAT. The RCSED web site also provides tutorials, including one that describes a technique that Igor Chilingarian and Ivan Zolotukhin used to discover new compact elliptical galaxies, which were later reported in the research paper «Isolated compact elliptical galaxies: Stellar systems that ran away».
Another interesting detail about RCSED is that the team actively used the help of citizen scientists to develop the project web site. Among them were high-level experts in software development and web design who have daytime jobs at the largest Russian IT companies. Dr. Ivan Zolotukhin, a Researcher at Sternberg Astronomical Institute, explains: "Programmers sometimes get burnt out by their routine work, and they would like to do something interesting and pleasant in their spare time, for instance, to help scientists. We are very grateful to them; they have become important members of our team and significantly strengthened our project. It has always been interesting for us to cooperate with IT specialists, and we have a lot more projects where they can contribute. So if you use git, program in Python or know HTML/CSS, love stars, have a bit of spare time and are willing to help an international research team, please contact us using the address published on the web page."
Dr. Ivan Katkov adds: "The RCSED catalog became possible thanks to the application of an interdisciplinary Big Data approach as we had to apply very complex scientific algorithms to a large dataset in a massively parallel way. Eventually, the expertise and resources available at large IT companies would undoubtedly allow researchers to significantly increase the quality and the quantity of research results and to make many important discoveries in astrophysics".
The fact that the RCSED catalog attracted serious interest in the scientific community even during its assembly phase proves its great potential. During the last three years, several external researchers were given access to the catalog on request and, using RCSED data, published over a dozen articles in professional peer-reviewed journals (Astrophysical Journal, Astronomy & Astrophysics, MNRAS). The catalog is the world's largest homogeneous value-added dataset for nearby galaxies, containing information collected with ground-based and space telescopes. The unique research material for extragalactic astrophysics contained in RCSED will certainly help astrophysicists to achieve interesting new scientific results, some of which would probably qualify for publication in the interdisciplinary journals Science and Nature.
RCSED expansion prospects: one million galaxies will be there soon
The current release of the RCSED catalog could have comprised a larger number of galaxies or contained extra information about the currently included objects, but for now the scientists have decided to focus on well-characterized datasets, which are described in detail and have known advantages and disadvantages. However, given the project's importance for extragalactic astronomy and observational cosmology, the RCSED team is going to move forward and expand the catalog in the near future.
There are two principal directions for further RCSED development: expanding the galaxy sample and incorporating new data for existing objects. The team is considering including near- and mid-infrared data from the WISE satellite all-sky survey for the entire galaxy sample. However, this requires some additional methodological work in order to homogenize the data for galaxies at different redshifts.
Moreover, it is possible to expand the principal galaxy sample by including spectra from the latest data release of the SDSS-III survey. This would increase the sample from 800,000 to 1.5 million objects.
Incorporating the publicly available spectral data from the Hectospec archive (Igor Chilingarian has played a major role in the Hectospec archive project) will add 300-400 thousand objects at larger distances, whose spectra were collected with the 6.5-meter MMT telescope in Arizona. The current RCSED release comprises mostly nearby galaxies (by cosmological measures), whose redshifts are smaller than 0.4, because SDSS did not include faint objects. Therefore, the early Universe is not represented in the catalog at all. The Hectospec archive will allow the team to move a little further along the cosmological distance scale, out to a redshift of 0.7. If they add several thousand galaxies from the DEEP2 survey conducted with the 10-meter Keck telescope in the early 2000s, they could get insights into objects at redshifts up to 1.0, when the Universe was less than half of its present age.
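The link between the redshift limits quoted above and the age of the Universe can be checked with a short calculation. The sketch below is an illustration only: it assumes a flat universe containing just matter and a cosmological constant, with a Hubble constant of 70 km/s/Mpc and a matter density of 0.3. None of these values come from the RCSED team, and the function names are hypothetical.

```python
import math

# Assumed cosmological parameters (illustrative, not taken from the article).
H0_KM_S_MPC = 70.0        # Hubble constant in km/s/Mpc
OMEGA_M = 0.3             # matter density parameter
OMEGA_L = 1.0 - OMEGA_M   # cosmological constant term (flat universe)

KM_PER_MPC = 3.0857e19    # kilometres in one megaparsec
SEC_PER_GYR = 3.1557e16   # seconds in one billion years


def age_gyr(z):
    """Age of a flat matter + Lambda universe at redshift z, in Gyr.

    Uses the closed-form solution
    t(z) = 2 / (3 H0 sqrt(Omega_L)) * asinh(sqrt(Omega_L / Omega_M) * (1 + z)^(-3/2)).
    """
    hubble_time_gyr = KM_PER_MPC / H0_KM_S_MPC / SEC_PER_GYR
    x = math.sqrt(OMEGA_L / OMEGA_M) * (1.0 + z) ** -1.5
    return 2.0 / (3.0 * math.sqrt(OMEGA_L)) * math.asinh(x) * hubble_time_gyr


def age_fraction(z):
    """Age at redshift z as a fraction of the present-day age."""
    return age_gyr(z) / age_gyr(0.0)


if __name__ == "__main__":
    # Redshift limits mentioned for SDSS, Hectospec, and DEEP2.
    for z in (0.4, 0.7, 1.0):
        print(f"z = {z}: {age_gyr(z):.1f} Gyr, "
              f"{age_fraction(z):.0%} of the present age")
```

Under these assumed parameters, a redshift of 1.0 corresponds to a bit over 40 percent of the present age, consistent with the "less than half" statement, while a redshift of 0.7 still lies slightly above the halfway point.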
Igor Chilingarian concludes: "We shall be able to see the global picture in about ten years from now, when large surveys like DESI have collected 25-30 million galaxy spectra out to intermediate redshifts."
Lomonosov Moscow State University