2021-22 Vocab Stats

I wanted to write a short text using the most frequent words students have read so far this whole year. Although I might have been able to predict what most of those words were, the data was insightful. To be clear, this is a *minimum* of what students have read. I copied text from seven novellas we read as a whole class, as well as any class texts in the digital library, then ran it through Voyant Tools. What does NOT appear in the data is the day’s opening greeting, which lives on a Google Doc with the date and some statements, as well as any short Type & Talk that didn’t make its way into an edited text for the digital library. The data also does NOT account for what’s heard in class, which is a considerable amount of the input students have received, especially at the beginning of the year. I can’t say including all that would double the stats for every word you see, but it might for some, and certainly would for the ones at the top of this list. Let’s start with the top words appearing at least 100 times:

  • 1225 = esse
  • 508 = in
  • 439 = nōn
  • 373 = et
  • 300 = velle
  • 265 = sed
  • 186 = habēre
  • 181 = placēre
  • 144 = iam
  • 129 = lutulārī
  • 105 = quoque
  • 100 = gladiātōrēs

This list makes a lot of sense, right? The curveballs are lutulārī and gladiātōrēs, but those high counts come from how often they occur in the topic-specific novellas. In fact, this data really does highlight the kind of exposure students can get from books written for the language learner (vs. “authentic” or unadapted texts). We, as teachers, might feel the repetition in books with that kind of sheltering of vocab, but beginning students neeeeeed it, and benefit from it.
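For anyone curious how counts like the ones above come together, here is a minimal Python sketch. It is only an illustration, not what Voyant Tools does under the hood, and the function names (`word_counts`, `at_least`) and the sample sentence are made up for the example. It also counts word *forms* as they appear on the page, so forms like est and sunt would still need to be grouped under a headword by hand.

```python
import re
import unicodedata
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count word forms case-insensitively, keeping macrons intact."""
    # Normalize to NFC so a vowel plus a combining macron becomes one letter.
    text = unicodedata.normalize("NFC", text).lower()
    # [^\W\d_]+ matches runs of letters only (no digits, no underscores).
    return Counter(re.findall(r"[^\W\d_]+", text))


def at_least(counts: Counter, minimum: int) -> list:
    """Return (word, count) pairs with count >= minimum, most frequent first."""
    return [(word, n) for word, n in counts.most_common() if n >= minimum]


# Hypothetical sample text; a real run would use the full class texts.
sample = "Nōn est. Est in lūdō, sed nōn est in thermīs."
counts = word_counts(sample)
print(at_least(counts, 2))
```

With a full set of texts, raising `minimum` to 100 would give the kind of cutoff used for the list above.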

It’s worth noting that none of these words are intentionally…taught! Granted, I do try to shelter vocabulary as much as possible in class texts, expressing what could be said in few words that get repeated more often. N.B. this reduces cognitive demand. When students read a text with a handful of meanings, they can keep those meanings in mind better than if they read a text with hundreds of different meanings. However, I don’t begin classes with a list of these top words, and I don’t set targets for what students should learn, either. Yet over time, even just across 48 classes since September, the most frequent words popped up naturally. The words above tend to be the most useful for communicating ideas with the fewest meanings possible, so the concept holds up. Let’s break down a few of those verb headwords (lemmas), too:

esse (1225 total):

  • 563 = est
  • 232 = sunt
  • 161 = esse
  • 123 = erat
  • 61 = sum
  • 46 = erant
  • 21 = sumus
  • 18 = es

velle (300 total):

  • 163 = vult
  • 62 = volō
  • 48 = voluit
  • 15 = vīsne?
  • 12 = volēbat

habēre (186 total):

  • 86 = habet
  • 32 = habēre
  • 31 = habeō
  • 22 = habēbat
  • 15 = habent
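As a quick sanity check, the form counts in the three breakdowns can be added back up and compared with the headword totals from the first list. The sketch below assumes the three lists belong to esse, velle, and habēre, in that order, which the sums suggest; the variable names are only for illustration.

```python
# Form counts copied from the three breakdown lists above.
forms_by_lemma = {
    "esse": {"est": 563, "sunt": 232, "esse": 161, "erat": 123,
             "sum": 61, "erant": 46, "sumus": 21, "es": 18},
    "velle": {"vult": 163, "volō": 62, "voluit": 48, "vīsne": 15, "volēbat": 12},
    "habēre": {"habet": 86, "habēre": 32, "habeō": 31, "habēbat": 22, "habent": 15},
}

# Add up every form under its headword.
lemma_totals = {lemma: sum(forms.values()) for lemma, forms in forms_by_lemma.items()}

for lemma, forms in forms_by_lemma.items():
    top_form = max(forms, key=forms.get)  # the single most frequent form
    share = forms[top_form] / lemma_totals[lemma]
    print(f"{lemma}: {lemma_totals[lemma]} total, top form {top_form} ({share:.0%})")
```

The totals come out to 1225, 300, and 186, matching the figures for esse, velle, and habēre in the top-words list.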

At first glance, I’m surprised a word like habent didn’t make its way into a text more than 15 times. Then again, these frequency stats follow the same trend we get from literature analyses. Diederich (1936) found that 7.1% of Latin verbs were 3rd person singular, while only 2.4% of them were plural. Otherwise, I do like how the unsheltering of grammar shines through. For each of the most frequent verbs, students are exposed to a range of perspectives (i.e., persons) and tenses. This is all in the first year, with no intentional teaching of any grammatical point whatsoever. Now, I didn’t include word forms that appeared fewer than 15 times, but there were plenty in the 5-10 range spanning moods, etc., further showing the exposure to a variety of grammatical uses. Here’s the next batch of words in the 100-50x frequency category shown by headword (lemma):

  • 86 = familia
  • 82 = multī
  • 75 = arma
  • 73 = ad
  • 71 = domī
  • 67 = magica
  • 65 = malus
  • 64 = omnēs
  • 63 = magus
  • 59 = ergō
  • 56 = lūdit
  • 54 = sōla
  • 53 = thermīs
  • 53 = studentī

While some of those are also topic-specific words from novellas, there’s a decent amount of general vocab in there as well, and the batch shows a variety of grammatical functions, such as the locative domī. Here’s 50-25x:

  • 48 = lūdus
  • 47 = animal
  • 42 = īre
  • 41 = māter
  • 41 = Rōmae
  • 41 = sonus
  • 40 = energīa
  • 40 = subitō
  • 40 = diēs
  • 39 = optimē
  • 39 = templum
  • 34 = āthlētica
  • 33 = aqua
  • 31 = sēcrētē
  • 31 = videt
  • 29 = ātra
  • 29 = nunc
  • 29 = obiectum
  • 28 = appāret
  • 28 = sorōrēs
  • 27 = cum
  • 26 = īnscrīptiō
  • 26 = mōnstrum
  • 26 = ubīque
  • 26 = parentēs
  • 25 = scytala

In this batch we get īre, which is supposedly one of the Top 5 storytelling verbs (goes), along with is, has, wants, and likes. However, at 42 instances, īre trails well behind the other four verbs, which range from 181 to 1225 instances. This tells me we didn’t have as many stories in which a character goes somewhere. I haven’t been into storytelling with acting for years, and that’s usually when the verb is used the most: moving from one physical space to another dramatizes travel in the story from one location to the next. Without acting, or multiple locations, that verb isn’t used as much. And that’s fine.

Although I’d never claim any word is acquired, there’s a good chance the ones with just 25 instances will be forgotten at some point. Then again, the unpredictable nature of what makes an event, and its language, memorable means something like subitō (40 instances) might be more understandable than lūdit (56 instances). Although I stopped at the 25x mark for this post, there’s also a good chance that words that appear in stories merely a handful of times hold more meaning, for whatever reason, than certain ones that appear 10x, 20x, or 30x! That’s just the nature of language and language acquisition.
