Data Sets
Articles tagged with Data Sets
Governments may shape what AI chatbots say by shaping the web they learn from, new Nature study finds
Method for stress-testing cloud computing algorithms helps avoid network failures
Researchers from MIT have developed a more user-friendly and efficient method to identify potential system failures in cloud computing algorithms. The 'MetaEase' technique analyzes an algorithm's source code directly to uncover hidden blind spots that might cause unexpected failures, reducing the risk of costly network outages.
OpenBind’s first data and model release marks a milestone for AI-enabled drug discovery
The UK-led OpenBind initiative has released its first publicly available dataset and predictive AI model, accelerating the discovery of new medicines using artificial intelligence. The release showcases high-quality, standardized experimental data and a trained predictive model, enabling researchers worldwide to drive the next generati...
FAU study reveals how camels ‘beat the heat’ at the cellular level
Researchers found that camels have a more flexible and coordinated response to heat stress, allowing them to maintain stability even at higher temperatures. In contrast, human cells tend to respond in a more rigid way, making them less adaptable under heat stress.
Enabling privacy-preserving AI training on everyday devices
Researchers at MIT developed a technique to overcome memory constraints and communication bottlenecks in federated learning, enabling faster and more accurate AI model training. The new framework, FTTE, uses a subset of model parameters and an asynchronous approach to reduce lag time and improve training performance.
Machine learning facilitates the development of China's 1-km daily soil moisture dataset
A new study developed China's high-precision, 1 km resolution soil moisture dataset using machine learning techniques. This dataset enables daily monitoring of soil dryness and wetness conditions across the country, providing critical support for drought early warning, flood forecasting, and agricultural management.
A faster way to estimate AI power consumption
MIT researchers have created an 'EnergAIzer' method that generates reliable results in seconds, allowing data center operators to optimize resource allocation and reduce energy waste. The tool leverages patterns from AI workloads and software optimizations to provide fast but accurate power estimates.
Texas Children’s researcher awarded $6.7 million NIH grant to accelerate Alzheimer’s drug discovery and advance new therapies
Researchers will use DNA-encoded chemical libraries and artificial intelligence to screen hundreds of millions of potential drug compounds, identifying those most likely to succeed in treating Alzheimer's. The project aims to shorten the timeline for identifying new treatments, bringing them to patients faster and with greater precision.
USC and Tempus form strategic collaboration aimed at accelerating innovation across research and patient care
The Keck School of Medicine of USC and Tempus are creating a system-wide framework to integrate clinical care, clinical trials, and research through AI-powered precision medicine tools. The goal is to enhance patient care and accelerate research and innovation.
Mapping the molecules of life: expanding the quantum-mechanical foundation for biomolecular AI
Researchers introduced QCell, a curated collection of 525,000 new quantum-mechanical calculations for biomolecular fragments. The dataset addresses the limited coverage of nucleic acids, lipids, and carbohydrates, enabling reliable simulations of critical biological processes such as DNA dynamics and membrane behavior.
Reported 2025 drug overdose ‘spike’ was an illusion, new study finds
A new Northwestern University study confirms that US drug overdose deaths have continued to decline following a peak in August 2023, contrary to speculation of manipulated CDC data. The study highlights the importance of accurate data for public health response and calls for greater transparency in federal data systems.
Helping data centers deliver higher performance with less hardware
Researchers developed Sandook, a software-based system that tackles three major sources of performance-hampering variability simultaneously. The two-tier architecture optimizes task distribution for the overall pool while faster schedulers on each SSD react to urgent events.
Center for BrainHealth forms groundbreaking research collaborative to enable data sharing, accelerate discovery
The BrainHealth Network connects researchers across the country to understand brain health improvement through advanced MRI imaging and data analysis. The network leverages a comprehensive multimodal brain imaging dataset, including a longitudinal study of 100,000 healthy participants over 10 years.
Study explores frameworks for improved Indigenous data sovereignty
Researchers examined how practice-based research and learning networks approach data governance, identifying the importance of building knowledge of Indigenous data sovereignty. Existing Indigenous governance frameworks provide guidance on incorporating Indigenous data sovereignty principles into PBRLNs.
Shorebird science and conservation collective shows big data can protect birds
The Shorebird Science and Conservation Collective uses big data to inform conservation efforts by analyzing tracking data from over 3,400 individual birds. The collective brings together data from various organizations to provide actionable information for land managers and decision-makers.
Coastal ocean chemistry now substantially shaped by humans
A global analysis of over 2,300 seawater samples reveals human-made chemicals make up a significant portion of organic matter in coastal oceans. Industrial chemicals, including plastics and consumer products, dominate the anthropogenic chemical signal, persisting even 20 kilometers offshore.
New robotic microfluidic platform brings AI to lipid nanoparticle design
Engineers at the University of Pennsylvania have developed LIBRIS, an automated microfluidic platform capable of generating lipid nanoparticle formulations at high speed and scale. This enables the creation of large, systematic datasets needed to train predictive AI models, accelerating the design of lipid nanoparticles for mRNA delivery.
UT Arlington appoints Lal to lead precision health, informatics
Dr. Dennis Lal has been appointed as the new executive director of the Center for Innovation in Health Informatics at UT Arlington, succeeding Marion Ball. He will lead initiatives on precision health, clinical AI, and health care-scale informatics.
Big data and human height: ISTA scientists develop algorithm to boost biobank data retrieval & analysis
Researchers from ISTA developed an algorithm that can extract and analyze information from the world’s most extensive biobank with unprecedented accuracy and speed. The method, dubbed gVAMP, enhances the framework's ability to extract complex information from the dataset at hand, providing a detailed overview of the effects on a trait ...
With the right prompts, AI chatbots analyze big data accurately
Researchers at UCSF and Wayne State University found that generative AI tools can perform orders of magnitude faster than human teams in analyzing health data. Junior researchers paired with AI produced viable prediction models in minutes, while experienced programmers took hours or days.
Treasure trove of data on worms in Europe's seas
A collaborative effort by researchers from the University of Göttingen and other institutions is creating a genomic inventory of European marine annelids. The goal is to accelerate biodiversity research worldwide and counteract the 'silent extinction' of marine species.
Uncovering patterns amid chaos
A recent NSF grant will support the development of new diagnostics and predictive models for understanding self-competition and weak asymmetry in turbulent flows. The project aims to uncover hidden patterns that current models miss, leading to improved simulations in weather forecasting, climate modeling, and engineering design.
Longest running study into open research practices shows strong researcher adoption, yet recognition gaps and regional variations remain
The State of Open Data report shows open data has become embedded into research practice with FAIR awareness widely recognized. Researchers need systems that reward openness and workflows that make sharing effortless to sustain progress.
UC3M presents a platform to improve mobility in the city of Madrid
NEXMO Datahub is a mobility data space that enables secure and reliable data exchange between public and private organizations, fostering innovative solutions for smarter and more sustainable mobility. The platform aims to accelerate the digital transformation of the sector through data sharing among key stakeholders.
WhatsApp data show: We often deceive ourselves
A study by Bielefeld University used anonymized WhatsApp metadata to show that personalized feedback can help people understand their communication habits. Many participants adjusted their views on response speed and chat participation after seeing data-based visualizations.
New video dataset to advance AI for health care
Researchers have launched a new multimodal medical dataset, Observer, capturing anonymized, real-time interactions between patients and clinicians. The dataset links video, audio, transcripts, and electronic health records to study subtleties like body language and environmental factors affecting care.
Subnational income inequality: Regional successes may hold key to addressing widening gap globally
A new study maps three decades of income inequality data globally, revealing worsening trends for half the world's population but 'bright spots' in regions with effective policies. Regional efforts such as investments in public health and education in India and cash transfer programs in Brazil show promise in reducing inequality.
Contactless pulse measurement falters at high heart rates
Researchers analyzed AI methods for detecting pulse rates from facial video recordings and found significant errors at elevated heart rates. The study highlights weaknesses of the remote photoplethysmography (rPPG) technique under challenging conditions.
New report demonstrates how harnessing digitally generated data can transform humanitarian aid
A new report demonstrates how harnessing digital data from mobile phone applications and social media platforms can provide faster, more spatially fine-grained estimates of population movement. This information is crucial for delivering timely humanitarian assistance during crises.
IMDEA Networks creates a secure watermarking tool to protect institutional data
IMDEA Networks has created a secure watermarking tool called FreqyWM, which allows institutions to tag their data with a unique signature and helps prevent leaks and unauthorized copies. The tool enables the exchange and reuse of information while complying with legal frameworks on security and privacy.
UH engineers making AI faster, reducing power consumption
The team created a specialized two-dimensional thin film dielectric designed to replace traditional heat-generating components in integrated circuit chips. This breakthrough aims to reduce the significant energy cost and heat produced by high-performance computing necessary for AI.
KLU Press Release: Open Data for Global Mobility
A new open-source tool, SCGraph, enables users to calculate the shortest connection between two points worldwide across different transport modes. The system is based on a novel shortest path model that integrates data from road, rail, and maritime routes.
First-of-its-kind 3D model lets you explore Easter Island statues up close
A team of researchers from Binghamton University has created the first-ever high-resolution 3D model of Rano Raraku quarry, revealing over 1,000 moai statues. The model allows users to zoom in and pan across various features, providing a detailed look at the island's quarries and challenging previous theories about its history.
Researchers release landmark dataset tracking Japan's social psychology through the COVID-19 pandemic
A comprehensive dataset detailing Japanese adults' responses to the pandemic offers unprecedented insights into public attitudes and behaviors. The 30-wave panel survey captures how risk perception, preventive behaviors, and psychological distress evolved over four years.
New public dataset maps Medicare home health use
The Home Health Focus dataset provides insights into Medicare home health use from 2016 to 2019, including demographic data and patient function indicators. The analysis shows a rise in home health stays among beneficiaries alongside a decrease in active agencies during the same period.
Bigger datasets aren’t always better
The algorithm identifies the minimum set of locations where field studies would guarantee finding the least expensive route, considering problem structure and uncertainty. This method can be applied to broad classes of structured decision-making problems under uncertainty.
Pusan National University researchers show how AI can help in fashion trend prediction
Pusan National University researchers develop a novel prompting technique to improve ChatGPT's accuracy in predicting fashion trends. The study reveals that ChatGPT can capture emerging themes and identify new trends not found in existing data.
Stowers Institute appoints first AI Fellow to help advance biological research with artificial intelligence
The Stowers Institute has appointed its first AI Fellow, Sumner Magruder, to harness the potential of artificial intelligence in biological research. He will collaborate with researchers to design new algorithms and unlock insights from large datasets.
Global platform for pandemic preparedness to be established at DTU National Food Institute
The Global Pathogen Analysis Platform (GPAP) will enable low- and middle-income countries to conduct research and surveillance of infectious diseases independently. The platform aims to prevent disease outbreaks from developing into pandemics by detecting genetic sequences of potential pathogens.
The African Data Drive
The African Data Drive is an interactive tool that provides accessible, quality-assured spatial data to empower decision-makers to balance development needs with conservation priorities. The platform enables users to assess potential risks to biodiversity and access the most appropriate information on sustainable development.
The Gabriella Miller Kids First Data Resource Center (Kids First DRC) has launched the Variant Workbench
The Variant Workbench enables researchers to explore genetic data in a single, integrated workspace, linking genomic information with clinical conditions. By reducing data complexity, the tool facilitates scientific discovery and accelerates the pace of research.
Project aims to advance workforce readiness in molecular bioscience
A new project aims to enhance workforce readiness in molecular bioscience by creating open-access resources and modules tailored to student needs. The Molecular Data Education Hub will host instructional materials and case studies for instructors to implement into their courses.
New AI-powered method helps protect global chip supply chains from cyber threats
Researchers at the University of Missouri have developed an AI-powered method that detects hidden hardware trojans in chip designs with 97% accuracy. The approach leverages large language models to scan for suspicious code and provides explanations for detected threats.
90% of Science Is Lost: Frontiers’ revolutionary AI-powered service transforms data sharing to deliver breakthroughs faster
Frontiers' revolutionary AI-powered FAIR² Data Management service is transforming the way scientific data is shared, enabling researchers to fuel progress and earn credit for their work. The service ensures every dataset is preserved, validated, citable, and reusable.
Newly released dataset BIRDBASE tracks ecological traits for 11,000 birds
The BIRDBASE dataset covers 78 ecological traits across 11,589 bird species, revealing that 54% are insectivores, with many tropical forest species under pressure. Fruit-eating birds disperse seeds in tropical forests, and fish-eating seabirds face elevated extinction risk.
The relaxed birder
A new framework for flexible data collection has been developed by Masumi Hisano, allowing for counts to take place in various settings, including cities and daily routines. This approach can help increase sample size and provide valuable insights into bird species assemblage datasets linked to landscape characteristics.
Vesalius cell-mapping tool provides insightful multi-layered view of cancer behavior
Researchers developed Vesalius to interpret complex data on cancer cell interactions, leading to potential discoveries in treating hard-to-treat cancers. The tool analyzes whole tissue architecture to identify predictive biomarkers and inform treatment options based on individual disease types.
Meet IDEA: An AI assistant to help geoscientists explore Earth and beyond
The Intelligent Data Exploring Assistant (IDEA) framework combines large language models with scientific data to analyze complex geoscience data. Researchers can ask IDEA to retrieve data, run analyses, and generate plots using plain-language questions.
Boosting behavior science with video datasets
Researchers created a large-scale mouse behavior dataset to study complex behaviors like problem-solving and goal-directed actions. The dataset, featuring over 100 hours of footage and detailed human annotations, provides a benchmark for developing algorithms that analyze animal cognition.
Release of high-resolution dynamical downscaling dataset for the Arctic Ocean
A new dataset has been developed to study climate change in the Arctic Ocean. The dataset spans from 1900 to 2100 with high resolution, offering a more accurate understanding of the region's changing climate. It significantly reduces simulation errors and improves representation of key variables.
Answer ALS Launches AI drug development collaboration with Tulane, Pennington Biomedical Research Center and GATC Health to advance ALS treatment discovery
Answer ALS is launching a groundbreaking collaboration with Tulane University and the Pennington Biomedical Research Center to harness AI for ALS treatment discovery. The Louisiana AI Drug Development Infrastructure for ALS (LADDIA) will prioritize therapeutic targets using AI-driven insights from Answer ALS's Neuromine Data Portal.
Reliance on administrative billing codes to track medical conditions can lead to high diagnostic error rates
Researchers found that billing codes may mistakenly identify diseases in up to 45% of cases, highlighting a significant limitation in using administrative data for clinical research. The study examined records of 1.36 million patients and found discrepancies between coded diagnoses and actual disease presence.
Paleontologists will convene in Kansas to boost sharing and crediting of scholarly data
A conclave of paleontologists, data scientists, and journal editors will meet in Kansas to improve how data is shared among professionals and beyond. The event aims to align paleontological data with FAIR practices, making it easier for researchers to access and reuse.
Collecting large-scale data from impoverished communities
Researchers from Sapiens Labs created two ongoing data acquisition programs in India and Tanzania to collect large-scale, high-quality neuroimaging data. The programs have collected data from over 7,900 participants, with data quality comparable to lab settings and at lower cost.
Enhancing environmental data sharing: Policy brief Recommendations on Managing Data in the Green Deal Data Space
The joint policy brief from four EU projects aims to guide successful implementation of the Green Deal Data Space. It recommends standardised data exchange technologies, inclusive governance frameworks, and metadata management to unlock Europe's full potential in environmental data.
Gene networks decode human brain architecture from health to glioma
Gene coexpression analysis reveals optimal markers of cell types and states, providing opportunities for developing novel biomarkers and targeted treatment strategies for glioma patients. Dr. Oldham's work tackles the reproducibility crisis in science, emphasizing the standardization of data and metadata.
Towards better communication schemes for IoT-driven societies
A research team developed an analytical model to evaluate the performance of grant-free communications schemes in densely populated IoT environments. They found that interference cancellation improved base station throughput but did not resolve the near-far problem, while power control addressed it but led to decreased overall network ...
Creature culture: What animal behavior can teach us about saving nature
Researchers have developed an open-access catalog of animal traditions to explore the role of social learning in shaping animal behavior. The Animal Culture Database features vocal communications, mating displays, play, and other social behaviors observed in dozens of species from around the world.
Real-time, large-scale graph neural network inference through BingoCGN
BingoCGN accelerates real-time large-scale graph neural network inference through cross-partition message quantization and a novel training algorithm, achieving up to 65-fold speedup and 107-fold increase in energy efficiency compared to state-of-the-art accelerators.