
FoCo-sing on training AI

05.30.25 | Singapore Management University


By Alvin Lee

SMU Office of Research – AI tools’ capabilities have expanded beyond many people’s expectations, with software such as DALL-E (image generation), Cursor (AI coding assistant), and Claude (information processing) delivering real-world impact that would have been unimaginable merely five years ago. Among the better-known AI models, ChatGPT and Gemini continue to iterate and improve, pushing AI to ever higher adoption rates.

These popular AI models are trained using optimisers, whose function and high costs SMU Assistant Professor of Computer Science Zhou Pan explained to the Office of Research in a 2024 article. In his latest research project, which clinched a Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 2 grant, Professor Zhou examined optimisers such as SGD (Stochastic Gradient Descent) and AdamW (Adaptive Moment Estimation with decoupled weight decay) that are used to train these AI models, and identified three main issues that contribute to the high training costs: slow convergence, heavy communication between machines during distributed training, and the large memory consumed by model parameters and optimiser states.
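
For readers curious what these update rules actually look like, below is a minimal NumPy sketch of the textbook SGD and AdamW steps. It is illustrative only, not code from Professor Zhou's project; note the extra running averages AdamW must store, which is a big part of an optimiser's memory cost.

```python
# A minimal, illustrative sketch of the two textbook update rules,
# not code from Professor Zhou's project.
import numpy as np

def sgd_step(theta, grad, lr=0.1):
    """Plain stochastic gradient descent: step against the gradient."""
    return theta - lr * grad

def adamw_step(theta, grad, m, v, t,
               lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01):
    """One AdamW step: Adam's moment estimates plus decoupled weight decay.

    m and v are running averages of the gradient and its square; this
    extra per-parameter state is where much of the memory cost lives.
    """
    m = beta1 * m + (1 - beta1) * grad         # first moment (mean)
    v = beta2 * v + (1 - beta2) * grad ** 2    # second moment (scale)
    m_hat = m / (1 - beta1 ** t)               # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta, m, v

# Toy usage: minimise f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 201):
    theta, m, v = adamw_step(theta, 2 * theta, m, v, t)
print(theta)  # approaches the minimiser [0, 0]
```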

Not FOMO but FoCo

Professor Zhou’s project, “FoCo: Fast, Communication- and Memory-Efficient Optimizers for Training Large AI Models”, aims to address those concerns by developing optimisers that converge faster while cutting communication and memory costs.

“Improvements in one aspect positively affect others,” Professor Zhou elaborates. “For instance, reducing memory costs enables larger minibatches that can reduce gradient noise and typically accelerate optimisers. Given the widespread adoption and immense potential of large AI models across various fields, along with their current challenges like high training costs, lengthy development cycles, significant electricity consumption and carbon dioxide emissions, the study of FoCo is more necessary and urgent than ever before.”

Much of the work will involve reducing ‘gradient noise’. The ‘gradient’ tells the AI how to change its parameters to reach ‘convergence’, while ‘gradient noise’ refers to the random fluctuations in the gradient caused by computing it from a small sample of training data points, rather than the entire training set, at each training iteration. The goal is to reach convergence as quickly as possible, Professor Zhou explains.
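
The effect is easy to see numerically. The toy example below (hypothetical data, not from the project) estimates the gradient of a simple regression loss from minibatches of different sizes and measures how much those estimates fluctuate around the full-data gradient; the spread shrinks roughly as one over the square root of the batch size, which is why the larger minibatches Professor Zhou mentions reduce gradient noise.

```python
# A toy numerical illustration of gradient noise (hypothetical data,
# not from the FoCo project).
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
x = rng.normal(size=N)
y = 3.0 * x + rng.normal(scale=0.5, size=N)    # noisy line, true slope 3

def grad(w, idx):
    """Gradient of mean squared error (w*x - y)^2 over the points in idx."""
    return np.mean(2 * (w * x[idx] - y[idx]) * x[idx])

w = 0.0
full = grad(w, np.arange(N))                   # gradient on all N points
for B in (8, 64, 512):
    estimates = np.array([grad(w, rng.choice(N, size=B, replace=False))
                          for _ in range(1000)])
    print(f"B={B:4d}  spread around full gradient: {estimates.std():.3f}")
# The spread shrinks roughly as 1/sqrt(B): bigger minibatches, less noise,
# and typically steadier, faster progress per step.
```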

“When training an AI model, we iteratively update its parameters to minimise the training loss,” says Professor Zhou, referring to the model’s mistakes, such as identifying a handwritten ‘8’ as the letter ‘B’. “In each training step, the optimiser adjusts the parameters but it cannot immediately reach the optimal values – instead, it gradually approaches them over many iterations. Convergence occurs when the parameters stabilise, meaning further updates no longer significantly improve the model (the training loss cannot be reduced).

“The convergence speed refers to the number of training steps required to reach this stable state. If the required training step number is big, then the convergence speed is slow; otherwise, the speed is fast. By improving the optimiser’s update strategy, we can reduce the number of steps needed, accelerating training without sacrificing model performance.”
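
In code, convergence speed can be read off directly as a step count. The sketch below uses a toy quadratic loss and an arbitrary stopping threshold, chosen purely for illustration: it runs plain gradient descent until the loss stops improving and reports how many steps that took. A better update strategy (here, simply a better step size) reaches the same stable state in fewer steps, which is the kind of gain FoCo pursues with smarter optimisers.

```python
# A sketch of 'convergence speed' as a step count on a toy problem;
# the loss and threshold are illustrative, not values from FoCo.
import numpy as np

def train(lr, tol=1e-8, max_steps=10_000):
    theta = np.array([5.0, -3.0])
    loss = lambda th: np.sum(th ** 2)          # toy loss, minimum at 0
    prev = loss(theta)
    for step in range(1, max_steps + 1):
        theta = theta - lr * 2 * theta         # gradient of sum(theta^2)
        cur = loss(theta)
        if prev - cur < tol:                   # updates no longer help:
            return step                        # the model has converged
        prev = cur
    return max_steps

# Fewer steps to the same stable state means faster convergence.
for lr in (0.01, 0.1, 0.4):
    print(f"lr={lr}: converged in {train(lr)} steps")
```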

Artificial intelligence, real impact

FoCo-derived AI improvements can have significant real-world benefits in dynamic environments, such as self-driving cars. “These systems can be updated or fine-tuned more frequently and cost-effectively, leading to quicker deployment of safer, more responsive, and contextually aware AI. Moreover, smaller carbon footprint aligns with ESG goals for tech companies,” says Professor Zhou.

Additionally, FoCo could significantly lower training costs and reduce resource demands such as memory and GPU usage, while its optimisations will democratise access to large AI models, observes the computer scientist. “Smaller companies, startups, or academic labs with limited computing infrastructure will be better positioned to train or fine-tune state-of-the-art models without prohibitive investment in hardware.”

He adds: “This research is poised to shift how the AI community approaches large model training – from relying solely on hardware improvements to embracing algorithmic efficiency. For models like GPT and LLaMA, it could enable more sustainable scaling, continuous training, and faster experimentation. Moreover, FoCo’s innovations may inspire new directions in optimiser design, setting a benchmark for how future foundation models are trained globally – faster, greener, and more economically.”

Contact Information

Lijie Goh
Singapore Management University
ljgoh@smu.edu.sg

How to Cite This Article

APA:
Singapore Management University. (2025, May 30). FoCo-sing on training AI. Brightsurf News. https://www.brightsurf.com/news/LDEOZ4X8/foco-sing-on-training-ai.html
MLA:
"FoCo-sing on training AI." Brightsurf News, May. 30 2025, https://www.brightsurf.com/news/LDEOZ4X8/foco-sing-on-training-ai.html.