12 August 2025
Tiny Training
How to run an end-to-end PyTorch distributed training job
TinyStories is a synthetic dataset from Microsoft Research, built to train language models under 10M parameters. Here’s the abstract:
In this work, we introduce TinyStories, a synthetic dataset of short stories that only contain words that a typical 3 to 4-year-olds usually understand, generated by GPT-3.5 and GPT-4. We … train and evaluate LMs that are … below 10 million total parameters … [which] produce fluent and consistent stories with several paragraphs that are diverse and have almost perfect grammar, and demonstrate reasoning capabilities.
We also introduce a new paradigm for the evaluation of language models: We suggest a framework which uses GPT-4 to grade the content generated by these models as if those were stories written by students and graded by a (human) teacher.
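To make the judging paradigm concrete, here is a minimal sketch of a GPT-4-as-teacher grading call, assuming the openai Python client; the RUBRIC prompt is illustrative, not the paper’s actual prompt.

# Sketch of LLM-as-judge grading (assumes `pip install openai` and
# OPENAI_API_KEY in the environment; the rubric below is hypothetical).
from openai import OpenAI

client = OpenAI()
RUBRIC = (
    "Grade this story as a teacher grading a student's work: "
    "score grammar, creativity, and consistency from 1 to 10, then justify briefly."
)

def grade(story: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": story},
        ],
    )
    return response.choices[0].message.content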
Data first
We’ll fetch the data from the HuggingFace Hub, then take a quick peek.
!curl -L https://huggingface.co/datasets/roneneldan/TinyStories/resolve/main/TinyStories-valid.txt -o data/tiny-stories.txt
!head data/tiny-stories.txt
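If you’d rather not handle the raw file, the same data is on the Hub as a datasets dataset. A sketch, assuming datasets is installed and the split exposes a "text" column:

# Alternative: pull the validation split with the HuggingFace datasets library.
from datasets import load_dataset

ds = load_dataset("roneneldan/TinyStories", split="validation")
print(ds[0]["text"][:200])  # each row holds one story under "text"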
Looks like the stories are separated by <|endoftext|>.
contents = open("data/tiny-stories.txt").read()
# str.strip(), not .trim(); also drop the empty chunk left after the final delimiter
stories = [story.strip() for story in contents.split("<|endoftext|>") if story.strip()]
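A quick sanity check that the parse looks right:

print(f"{len(stories):,} stories")
print(stories[0][:300])  # opening of the first story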