Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling (2023)

Stella Biderman, Hailey Schoelkopf, Quentin Gregory Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, Usvsn Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar Van Der Wal

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:2397-2430, 2023.

Abstract

How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce Pythia, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We intend Pythia to facilitate research in many areas, and we present several case studies including novel results in memorization, term frequency effects on few-shot performance, and reducing gender bias. We demonstrate that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics. Trained models, analysis code, training code, and training data can be found at https://github.com/EleutherAI/pythia.
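As a concrete illustration of the suite's intended use, the following is a minimal sketch of loading one model at an intermediate training checkpoint via the Hugging Face transformers library, mirroring the usage documented in the linked repository. The particular model size (70M) and training step shown here are illustrative; any of the 16 model sizes and 154 saved revisions can be substituted.

from transformers import GPTNeoXForCausalLM, AutoTokenizer

# Load one of the 16 suite models at an intermediate training checkpoint.
# "step3000" selects one of the 154 saved revisions; omitting the revision
# (or using "main") loads the fully trained model.
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m",
    revision="step3000",
)
tokenizer = AutoTokenizer.from_pretrained(
    "EleutherAI/pythia-70m",
    revision="step3000",
)

# Generate a short continuation from the checkpoint.
inputs = tokenizer("Hello, I am", return_tensors="pt")
tokens = model.generate(**inputs)
print(tokenizer.decode(tokens[0]))

Because every model in the suite saw the public training data in exactly the same order, comparing the same checkpoint revision across model sizes isolates scale as the variable of interest.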

Cite this Paper

BibTeX

@InProceedings{pmlr-v202-biderman23a,
  title     = {Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling},
  author    = {Biderman, Stella and Schoelkopf, Hailey and Anthony, Quentin Gregory and Bradley, Herbie and O'Brien, Kyle and Hallahan, Eric and Khan, Mohammad Aflah and Purohit, Shivanshu and Prashanth, Usvsn Sai and Raff, Edward and Skowron, Aviya and Sutawika, Lintang and Van Der Wal, Oskar},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {2397--2430},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/biderman23a/biderman23a.pdf},
  url       = {https://proceedings.mlr.press/v202/biderman23a.html},
  abstract  = {How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce Pythia, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We intend Pythia to facilitate research in many areas, and we present several case studies including novel results in memorization, term frequency effects on few-shot performance, and reducing gender bias. We demonstrate that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics. Trained models, analysis code, training code, and training data can be found at https://github.com/EleutherAI/pythia.}
}

Endnote

%0 Conference Paper
%T Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
%A Stella Biderman
%A Hailey Schoelkopf
%A Quentin Gregory Anthony
%A Herbie Bradley
%A Kyle O'Brien
%A Eric Hallahan
%A Mohammad Aflah Khan
%A Shivanshu Purohit
%A Usvsn Sai Prashanth
%A Edward Raff
%A Aviya Skowron
%A Lintang Sutawika
%A Oskar Van Der Wal
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-biderman23a
%I PMLR
%P 2397--2430
%U https://proceedings.mlr.press/v202/biderman23a.html
%V 202
%X How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce Pythia, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We intend Pythia to facilitate research in many areas, and we present several case studies including novel results in memorization, term frequency effects on few-shot performance, and reducing gender bias. We demonstrate that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics. Trained models, analysis code, training code, and training data can be found at https://github.com/EleutherAI/pythia.

APA

Biderman, S., Schoelkopf, H., Anthony, Q.G., Bradley, H., O'Brien, K., Hallahan, E., Khan, M.A., Purohit, S., Prashanth, U.S., Raff, E., Skowron, A., Sutawika, L. & Van Der Wal, O. (2023). Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:2397-2430. Available from https://proceedings.mlr.press/v202/biderman23a.html.
