References

All citations across the knowledge base resolve to this page. Each entry links back to every page that cites it; click the ↩ arrows next to an entry's key to jump back to the citing page.

Bibliography source: references.bib in the repo root.

[EL23]

Ronen Eldan and Yuanzhi Li. TinyStories: how small can language models be and still speak coherent English? arXiv preprint arXiv:2305.07759, 2023. URL: https://arxiv.org/abs/2305.07759, arXiv:2305.07759.

[GLB+18]

Thomas George, César Laurent, Xavier Bouthillier, Nicolas Ballas, and Pascal Vincent. Fast approximate natural gradient descent in a Kronecker-factored eigenbasis. In Advances in Neural Information Processing Systems (NeurIPS). 2018. URL: https://arxiv.org/abs/1806.03884, arXiv:1806.03884.

[GBA+23]

Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, and Samuel R. Bowman. Studying large language model generalization with influence functions. arXiv preprint arXiv:2308.03296, 2023. URL: https://arxiv.org/abs/2308.03296, arXiv:2308.03296.

[KL17]

Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning (ICML). 2017. URL: https://arxiv.org/abs/1703.04730, arXiv:1703.04730.

[LXT+18]

Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. Visualizing the loss landscape of neural nets. In Advances in Neural Information Processing Systems (NeurIPS). 2018. URL: https://arxiv.org/abs/1712.09913, arXiv:1712.09913.

[MG15]

James Martens and Roger Grosse. Optimizing neural networks with Kronecker-factored approximate curvature. In Proceedings of the 32nd International Conference on Machine Learning (ICML). 2015. URL: https://arxiv.org/abs/1503.05671, arXiv:1503.05671.

[Pea94]

Barak A. Pearlmutter. Fast exact multiplication by the Hessian. Neural Computation, 6(1):147–160, 1994.

[R+26]

Jennifer Rosser and others. Infusion: reverse engineering influence functions. In 3rd DATA-FM Workshop at ICLR. 2026. Project page: https://jrosser.co.uk/infusion/.

[SWZ+24]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, Y. K. Li, Yu Wu, and Daya Guo. DeepSeekMath: pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024. Introduces GRPO. URL: https://arxiv.org/abs/2402.03300, arXiv:2402.03300.