The Research and Development Center for Large Language Models (LLMC) at the National Institute of Informatics (NII) (Director-General: KUROHASHI Sadao; Chiyoda-ku, Tokyo, Japan), part of the Research Organization of Information and Systems, has developed a large language model (LLM) with approximately 172 billion parameters (*1), a scale comparable to that of GPT-3, trained from scratch on 2.1 trillion tokens of training data, and has released it to the public under the name "llm-jp-3-172b-instruct3." This model is the largest in the world among LLMs that make not only the model parameters but also the training data publicly available.
The model surpasses GPT-3.5 in performance on benchmarks such as "llm-jp-eval," which measures a range of Japanese language understanding capabilities, and the "llm-leaderboard" developed as part of the GENIAC project (*2), a program by the Ministry of Economy, Trade, and Industry (METI) and the New Energy and Industrial Technology Development Organization (NEDO) to support the development of generative AI.
This model builds on the results of training a 13-billion-parameter LLM on mdx (*3), a platform for building a data-empowered society, and on a trial development of a 175-billion-parameter model on the AI Bridging Cloud Infrastructure (ABCI), supported by the 2nd Large-scale Language Model Building Support Program of the National Institute of Advanced Industrial Science and Technology (AIST).
LLMC plans to use "llm-jp-3-172b-instruct3" to advance research and development aimed at ensuring the transparency and reliability of LLMs.
1. Overview of the Released LLM
(1) Computing Resources
(2) Model Training Corpus (*4)
(3) Model
(4) Tuning
(5) Evaluation
(6) URL for the Released Model, Tools, and Corpus
https://llm-jp.nii.ac.jp/en/release
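For reference, the following is a minimal sketch of how the released instruction-tuned model might be loaded for inference with the Hugging Face transformers library. The repository identifier "llm-jp/llm-jp-3-172b-instruct3" and the presence of a chat template are assumptions; please consult the release page above for the official distribution details. Note that a 172-billion-parameter model requires multiple GPUs even in half precision.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face Hub identifier; see the release page for the official location.
model_id = "llm-jp/llm-jp-3-172b-instruct3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
    device_map="auto",           # shard the model across available GPUs
)

# Instruction-style prompting, assuming the tokenizer ships a chat template.
messages = [{"role": "user", "content": "自然言語処理について簡単に説明してください。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))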
Notes:
2. Future Plans
(Reference 1) Overview of LLM-jp
1. LLM-jp, organized by NII, consists of over 1,900 participants (as of December 24, 2024) from universities, companies, and research institutions, primarily researchers in natural language processing and computer systems. LLM-jp shares information on LLM research and development through hybrid meetings, online sessions, and Slack, and also conducts joint research on building LLMs. Specific activities include:
2. LLM-jp has established working groups such as the "Corpus Construction WG," "Model Construction WG," "Fine-tuning & Evaluation WG," "Safety WG," "Multi-modal WG," and "Real Environment Interaction WG," led respectively by Professor KAWAHARA Daisuke of Waseda University, Professor SUZUKI Jun of Tohoku University, Professor MIYAO Yusuke of the University of Tokyo, Project Professor SEKINE Satoshi of NII, Professor OKAZAKI Naoaki of Institute of Science Tokyo, and Professor OGATA Tetsuya of Waseda University. Each group is engaged in research and development activities.
Additional contributions come from many other individuals, including Professor TAURA Kenjiro and Associate Professor KUGA Yohei of the University of Tokyo (computational infrastructure) and Professor YOKOTA Rio of Institute of Science Tokyo (parallel computing methods).
3. For more details, visit the official website: https://llm-jp.nii.ac.jp/en/
(Reference 2)
This achievement was made possible through a grant from the New Energy and Industrial Technology Development Organization (NEDO) and a subsidy from the Ministry of Education, Culture, Sports, Science and Technology (MEXT).
(*1) Number of Parameters: Large language models are neural networks trained on language data, and the number of parameters is one of the indicators of the network’s size. It is generally believed that more parameters lead to higher performance.
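To make the notion of "number of parameters" concrete, the following is a minimal sketch that counts the trainable parameters of a toy PyTorch network; the network itself is purely illustrative and unrelated to the released model.

import torch.nn as nn

# Illustrative only: a tiny network, not the released 172-billion-parameter model.
tiny_model = nn.Sequential(
    nn.Embedding(32000, 512),  # vocabulary embedding table
    nn.Linear(512, 2048),
    nn.ReLU(),
    nn.Linear(2048, 512),
)

# The "number of parameters" is the total count of trainable weights.
num_params = sum(p.numel() for p in tiny_model.parameters() if p.requires_grad)
print(f"{num_params:,} parameters")  # about 18.5 million for this toy network

The released model has roughly 172 billion such weights, nearly ten thousand times more than this toy example.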
(*2) GENIAC (Generative AI Accelerator Challenge): A program jointly conducted by the Ministry of Economy, Trade, and Industry (METI) and NEDO to strengthen domestic capabilities in developing generative AI. The initiative primarily focuses on providing computational resources for the development of foundation models, which are core technologies of generative AI, as well as supporting pilot studies for utilizing data and AI.
(*3) mdx (a platform for building a data-empowered society): A high-performance virtual environment focused on data utilization, jointly operated by a consortium of nine universities and two research institutes. It is a platform for data collection, accumulation, and analysis that allows users to build, expand, and integrate research environments on-demand in a short amount of time, tailored to specific needs.
(*4) Corpus: A database that stores large amounts of natural language text in a structured manner.
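To illustrate how a corpus is measured in tokens (the unit used for the 2.1 trillion tokens of training data mentioned above), the following is a minimal sketch that tokenizes a toy corpus and counts the resulting tokens; the tokenizer identifier is an assumption, and any subword tokenizer would illustrate the same point.

from transformers import AutoTokenizer

# Assumed identifier; any subword tokenizer serves for illustration.
tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-3-172b-instruct3")

# A toy "corpus": in practice this would be billions of documents.
corpus = [
    "大規模言語モデルは大量のテキストから学習します。",
    "Large language models are trained on large collections of text.",
]

# Training data size is usually reported in tokens, i.e. the subword units
# the tokenizer splits text into, rather than in characters or documents.
total_tokens = sum(len(tokenizer.encode(doc)) for doc in corpus)
print(f"This toy corpus contains {total_tokens} tokens.")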
###
About the National Institute of Informatics (NII)
NII is Japan's only academic research institute dedicated to the new discipline of informatics. Its mission is to "create future value" in informatics. NII conducts both long-term basic research and practical research aimed at solving social problems in a wide range of informatics research fields, from fundamental theories to the latest topics, such as artificial intelligence, big data, the Internet of Things, and information security.
As an inter-university research institute, NII builds and operates academic information infrastructure essential for the research and educational activities of the entire academic community, including the Science Information Network (SINET), and develops services that provide academic content and service platforms.
https://www.nii.ac.jp/en/
About the Research Organization of Information and Systems (ROIS)
ROIS is the parent organization of four national institutes (the National Institute of Polar Research, the National Institute of Informatics, the Institute of Statistical Mathematics, and the National Institute of Genetics) and the Joint Support-Center for Data Science Research. ROIS's mission is to promote integrated, cutting-edge research that goes beyond the boundaries of these institutions, in addition to facilitating their research activities as members of the inter-university research institutes.