Scaling Laws for Neural Language Models

GPTKB entity

Statements (29)
Predicate Object
gptkbp:instanceOf gptkb:research_paper
gptkbp:arXivID 2001.08361
gptkbp:author gptkb:Ilya_Sutskever
gptkb:Benjamin_Chess
gptkb:Tom_B._Brown
gptkb:Scott_Gray
gptkb:Alec_Radford
gptkb:Dario_Amodei
gptkb:Jared_Kaplan
gptkb:Rewon_Child
gptkb:Sam_McCandlish
gptkb:Tom_Henighan
gptkb:Jeffrey_Wu
gptkbp:citation high
gptkbp:foundIn Performance of neural language models improves predictably as model size, dataset size, and compute increase
There are diminishing returns to increasing model size or dataset size alone
Test loss scales as a power-law with respect to model size, dataset size, and compute
Optimal allocation of compute between model size and dataset size can be derived
https://www.w3.org/2000/01/rdf-schema#label Scaling Laws for Neural Language Models
gptkbp:influenced subsequent research on large language models
gptkbp:memberSchool gptkb:OpenAI
gptkbp:publicationYear 2020
gptkbp:publishedIn gptkb:arXiv
gptkbp:topic deep learning
scaling laws
language models
gptkbp:bfsParent gptkb:NeurIPS_2022
gptkb:Jared_Kaplan
gptkbp:bfsLayer 6
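
Note: the gptkbp:foundIn statements above summarize the paper's central quantitative claim. As a reader's aid, a minimal sketch of the reported functional forms follows, where L is test loss, N is the number of non-embedding parameters, D is the dataset size in tokens, and C_min is the minimal compute budget; the constants N_c, D_c, C_c are empirical fits, and the exponent values are the approximate figures reported in the paper, quoted here for orientation only.

  L(N)        \approx (N_c / N)^{\alpha_N},                \alpha_N \approx 0.076
  L(D)        \approx (D_c / D)^{\alpha_D},                \alpha_D \approx 0.095
  L(C_{\min}) \approx (C_c / C_{\min})^{\alpha_C^{\min}},  \alpha_C^{\min} \approx 0.050

Under these fits, the compute-optimal allocation statement corresponds to spending most of any additional compute on larger models rather than on more data, roughly N_{\text{opt}} \propto C_{\min}^{0.73} as reported in the paper.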