2021-22 Vocab Stats

I wanted to write a short text using the most frequent words students have read so far this whole year. Although I might have been able to predict what most of those words were, the data was insightful. To be clear, this is a *minimum* amount students have read. I copied text from seven novellas we read as a whole class, as well as any class texts in the digital library, then ran it through Voyant Tools. What does NOT appear in the data is the day’s opening greeting I have on a Google Doc that has the date and some statements, as well as any short Type & Talk that didn’t make its way into an edited text for the digital library. The data also does NOT account for what’s heard in class, which is a considerable amount of the input students have received, especially at the beginning of the year. I can’t say including all that would double the stats for every word you see, but it might for some, and certainly would for the ones at the top of this list. Let’s start with the top words appearing at least 100 times:

1225 = esse
508 = in
439 = nōn
373 = et
300 = velle
265 = sed
186 = habēre
181 = placēre
144 = iam
129 = lutulārī
105 = quoque
100 = gladiātōrēs

This list makes a lot of sense, right? The curveballs are lutulārī and gladiātōrēs, but those high repetition figures are from how often they occurred in the topic-specific novellas. In fact, this data really does highlight the kind of exposure students can get from books written for the language learner (vs. “authentic” or unadapted texts). We—as teachers—might feel the repetition in books with that kind of sheltering of vocab, but beginning students neeeeeed it, and benefit from it.
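The counts in this post came from Voyant Tools, not from code. For anyone who wants to try a similar tally on their own class texts, here is a minimal sketch in Python. The function names and the sample sentence are made up for illustration, and the script counts word forms exactly as they appear in the text, not headwords.

```python
import re
import unicodedata
from collections import Counter

def word_counts(text):
    """Count each word form in a text, ignoring case and punctuation."""
    # Normalize so macron vowels (ā, ē, ō...) are single characters
    text = unicodedata.normalize("NFC", text).lower()
    # \w+ matches runs of letters, including macron vowels
    return Counter(re.findall(r"\w+", text))

def at_least(counts, minimum):
    """Return (word, count) pairs seen at least `minimum` times, most frequent first."""
    return [(word, n) for word, n in counts.most_common() if n >= minimum]

# Hypothetical sample, not taken from the actual class texts
sample = "Mārcus gladiātor est. Mārcus nōn est magus, sed Mārcus magus esse vult."
counts = word_counts(sample)
for word, n in at_least(counts, 2):
    print(n, "=", word)
```

Swapping the sample for the full text of a novella and raising the minimum to 100 would give a list in the same shape as the one above.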

It’s worth noting that none of these words are intentionally…taught! That is, I do try to shelter vocabulary as much as possible in class texts, expressing what could be said in few words that get repeated more often. N.B. this reduces cognitive demand. When students read a text with a handful of meanings, they can keep those meanings in mind better than if they read a text with hundreds of different meanings. However, I don’t begin classes with a list of these top words, and I don’t set targets for what students should learn, either. Yet over time, even just across 48 classes since September, the most frequent words popped up naturally. The words above tend to be the most useful in order to truly communicate ideas with the fewest meanings possible, so the concept holds up. Let’s break down a few of those verb headwords (lemmas), too:

esse

563 = est
232 = sunt
161 = esse
123 = erat
61 = sum
46 = erant
21 = sumus
18 = es

velle

163 = vult
62 = volō
48 = voluit
15 = vīsne?
12 = volēbat

habēre

86 = habet
32 = habēre
31 = habeō
22 = habēbat
15 = habent
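Each headword total in the first list is the sum of the forms listed under it. As a quick check, here is a minimal Python sketch using the counts from this post. Grouping the forms under their headwords was done by hand here, and the names are just for illustration.

```python
# Form counts copied from the lists above, grouped by headword (lemma) by hand
forms_by_lemma = {
    "esse": {"est": 563, "sunt": 232, "esse": 161, "erat": 123,
             "sum": 61, "erant": 46, "sumus": 21, "es": 18},
    "velle": {"vult": 163, "volō": 62, "voluit": 48, "vīsne": 15, "volēbat": 12},
    "habēre": {"habet": 86, "habēre": 32, "habeō": 31, "habēbat": 22, "habent": 15},
}

def lemma_totals(forms_by_lemma):
    """Add up the counts of every form listed under each headword."""
    return {lemma: sum(forms.values()) for lemma, forms in forms_by_lemma.items()}

totals = lemma_totals(forms_by_lemma)
for lemma, total in totals.items():
    print(total, "=", lemma)
# prints 1225 = esse, 300 = velle, 186 = habēre
```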

At first glance, I’m surprised a word like habent didn’t make its way into a text more than 15 times. Then again, these frequency stats follow the same trend we get from literature analyses. Diederich (1936) found that 7.1% of Latin verbs were 3rd person singular, while only 2.4% were 3rd person plural. Otherwise, I do like how the unsheltering of grammar shines through. For each of the most frequent verbs, students are exposed to a range of perspectives (i.e., persons) and tenses. This is all in the first year, with no intentional teaching of any grammatical point whatsoever. Now, I didn’t include word forms that appeared fewer than 15 times, but there were plenty in the 5-10 range spanning moods, etc., further showing the exposure to a variety of grammatical uses. Here’s the next batch of words in the 100-50x frequency category shown by headword (lemma):

86 = familia
82 = multī
75 = arma
73 = ad
71 = domī
67 = magica
65 = malus
64 = omnēs
63 = magus
59 = ergō
56 = lūdit
54 = sōla
53 = thermīs
53 = studentī

While some of those are also topic-specific words from novellas, there’s a decent amount of general vocab in there as well, also highlighting a variety of grammatical functions, such as domī. Here’s 50-25x:

48 = lūdus
47 = animal
42 = īre
41 = māter
41 = Rōmae
41 = sonus
40 = energīa
40 = subitō
40 = diēs
39 = optimē
39 = templum
34 = āthlētica
33 = aqua
31 = sēcrētē
31 = videt
29 = ātra
29 = nunc
29 = obiectum
28 = appāret
28 = sorōrēs
27 = cum
26 = īnscrīptiō
26 = mōnstrum
26 = ubīque
26 = parentēs
25 = scytala

In this batch we get īre, which is supposedly one of the Top 5 storytelling verbs along with is, has, wants, and likes. However, īre trails behind (42 instances) the other four verbs ranging from 181 to 1225 instances. This tells me we didn’t have as many stories in which the character goes somewhere. And that’s fine. I haven’t been into storytelling with acting for years, and that’s usually when the verb is used the most, moving from one physical space to another, dramatizing travel in the story from one location to the next. Without acting, or multiple locations, that verb isn’t used as much. And that’s fine.

Although I’d never claim any word is acquired, there’s a good chance the ones with just 25 instances will be forgotten at some point. Then again, the unpredictable nature of what makes an event, and its language, memorable means something like subitō (40 instances) might be more understandable than lūdit (56 instances). Although I stopped at the 25x mark for this post, there’s also a good chance that words that appear in stories merely a handful of times hold more meaning, for whatever reason, than certain ones that appear 10x, 20x, or 30x! That’s just the nature of language and language acquisition.

Magister P.