In defence of learning words in word pairs

- but only when doing it the 'right' way!


Rob Waring                This version: Feb 2004


In defence of the arguments put forward on this page you are invited to read a thesis based on these principles. The case study showed that, with 30 minutes of learning per day for 30 consecutive days, the learner learned 468 words (16 words per hour)! Two months later 395 had been retained, and even at 7 months she still knew 311 of them (a net effect of 10 words per hour).


Read on.


A 'word pair' refers to the word in your mother tongue and its foreign label.  For example the English word apple  is ringo in Japanese.  The word pair is apple-ringo. A common way to learn word pairs is in a list of pairs or with word cards.

Before we begin I should add that this page is directed at foreign language learning for young adults and over and would not apply to children, particularly younger children.


List learning

One of the most common forms of word learning is list learning. This involves putting pairs of words together in a list like this.


apple       ringo

book        hon

pencil       empitsu

orange      orenji



Usually people go down the list trying to remember the words by looking at both words in each pair. However, this is a problem because it does not force our brains to recall the item. If we can see the translation then we are not making the mental effort required to learn it. Thus it is better to cover one side of the list with a piece of paper and try to remember the word before looking at it. For many people this is quite successful, but in fact there are several real problems with learning in lists. Here are a few of the problems.

a) Each word has to be learned in turn. As we go down the list some will be known and some will not. As we have to look at each word, and our list does not allow us to change the order of the words, it is wasting time to look at words we already know. This is called the "order effect".

b) Research shows that we remember things best in the way that we learned them.  Thus if we learn words in a list we will remember them best that way. This means that if we have to demonstrate our knowledge in a way other than in a list, then we may not recall it. This is called the "learning method effect".

c) Research shows that as we learn we create a context for our learning. In list learning the context is the other words. This can be a problem because we can remember the answer to the next word in our list before we even look at it. This is called the "serial effect".

d) If we have learned 18 of a list of 20 words we still have two words we do not know and we do not want to look at the whole list to remember two words, thus we tend to neglect the last two words. One way to deal with this is to re-write them on a new 'difficult words' list.  This wastes time.

e) Learners tend to write words in alphabetical lists. This is a problem because if one has covered the mother tongue words and is looking at the translations, the next mother tongue word in the list can easily be guessed because it will start with similar letters. For example if the words are apple, apply, appreciate we can easily guess the next word will also begin with app... While this does jog our memory, it does so only in this order, and denies us the chance to retrieve the word without the hint. This is called the "alphabet order effect".


Word card learning is far more efficient than list learning.

A word card is a small piece of card that has the foreign word on one side and its translation on the other. (It does not have to be a translation; a definition or picture may also be fine. In fact anything that the learner feels can help her learn the word is fine.) If we write each word on a word card we immediately get rid of most of the problems with list learning.

a) Word card learning is dynamic as we can change the order of the learning by shuffling the word cards. List learning by comparison is static. By shuffling the cards we get rid of the 'order effect' and the 'alphabet order effect'.

b) We can easily learn receptively or productively. We do this by choosing whether to look at the foreign word side and trying to recall the mother tongue word (receptively) or by looking at the mother tongue side and trying to recall the foreign word (productively).

c) The learner is learning only words she does not know and does not have to look at words she already knows.

d) It is a motor-manual activity (i.e. using both mental and physical activities) which reinforces learning.

f) The learner is paying more attention to the word than she normally would in reading, and this reactivates knowledge systems.

g) This type of learning forces retrieval.

h) The learner controls the pace of the learning and when it occurs.

i) Each word receives only the amount of attention it needs.

j) The type of definition is flexible (pictures, translation, definition etc.)

k) The learner can see what has been learned easily and thus she can set learning targets and measure the learning easily.


Criticisms of word pair learning - and a response

There are five main criticisms of word pair learning: firstly that it is translation dependent, secondly that it is behaviourist, thirdly that it is boring, fourthly that it does not suit all learners, and lastly that this type of learning has processing limits. Let us look at these in turn.

    Word pairs and translations

Most practitioners of word pair learning tend to use translations. Some people criticize this practice because they say that words in the two languages are not always the same. This is of course absolutely correct, but one has to remember that word pair learning is only the START of learning about a word. Word pair learning is about matching a meaning with its spelling or pronunciation (its form). It does not attempt to help the learner learn about how to use the word, which words it goes with, how it is similar to and different from other words, whether to use it in polite conversation or written prose, what reactions the use of the word may have and so on. Word pair learning ONLY allows the learner to get a grip on the form and meaning. This HAS TO be supported by more work on the word: word study, lots of reading and listening, use in conversation and writing, and so on.

Often when people criticize word pair learning they say that translation is bad and that learners have to learn the word in the context within which it is found, i.e. in the foreign language environment, say in a text. This certainly would be best but it is not always achievable or successful. Let's look at an example. Look at these sentences. What does 'blunger' mean?

    The blunger walked into the bar and ordered a drink. The blunger then talked to the barman about his day.

We know that a 'blunger' is a person because it walks and talks, and that it is probably a human as it orders a drink. We cannot guess though whether blunger means 'man' or 'woman', or whether it is a job title, a murderer, a comedian, or any one of a thousand other possibilities. It will probably take us several dozen meetings with the word to guess what it is exactly. Thus a quick 'gloss' (translation or explanation) will be more efficient in making the meaning of the word apparent. (Note that 'blunger' is an imaginary word used as an example.)

Another line taken to reject translations of meanings is to suggest that we should train our learners to guess from context and make them independent of teachers and dictionaries.  I completely agree. BUT (a big but) they cannot do this until they have enough critical mass of language to be able to guess successfully.  Research suggests that successful guessing can only take place after 98-99% of the surrounding words are already known.  Thus until this level of text awareness is achieved the learner has to fall back on other strategies, the most effective of which is translation.
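The coverage figure can be checked roughly for any text. The sketch below is only an illustration: the function name known_word_coverage, the simple regular-expression tokenizer and the tiny known-word set are assumptions made for this example, not a published procedure. It counts what fraction of the running words in a text the learner already knows, using the first 'blunger' sentence from above.

```python
import re


def known_word_coverage(text, known_words):
    """Return the fraction of running words in `text` found in `known_words`."""
    # Assumption: a very simple tokenizer, lower-cased letters and apostrophes only.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


text = "The blunger walked into the bar and ordered a drink."
# Hypothetical learner vocabulary: every word except 'blunger'.
known = {"the", "walked", "into", "bar", "and", "ordered", "a", "drink"}

coverage = known_word_coverage(text, known)
print(f"coverage: {coverage:.0%}")
print("enough for guessing (98%)?", coverage >= 0.98)
```

Here one unknown word in ten running words gives 90% coverage, which is well short of the 98-99% mentioned above.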

Other people reject word pair learning by pointing to the very strong finding in the psychology literature that the more mental effort is put into learning something, the better the learning outcome (often called "depth of processing"). While this may be true to some extent, research dating back to the late 1940s has shown that learners learn more from word pair learning than from guessing word meanings in context. The research does not seem to support the notion of depth of processing at the initial stage of word learning. In fact there is a faster way into the meaning - and that is translation. Why? Because a translation is more solid, more concrete than a guess. A guess is tentative whereas a translation gives the learner a stronger hold on a meaning as there is an anchor (the mother tongue meaning).

Translation is a typical strategy used in the classroom. Teachers tend to give translations whenever new words are met, so what is wrong with learners doing the same? Besides, translation is just as valid a learning method as any other. We should NOT though think of translation as the ONLY strategy. There are many more useful techniques for developing word knowledge beyond the meaning-form relationship too.

    Word pairs and behaviourism

Behaviourism as a language learning strategy is said to be out of favour. The basic notion is that repeated exposure to a word will help build a stimulus-response relationship and cement it in memory. Opponents of this view suggest that learning is not about stimulus and response but about learning a system, an evolving language. They say that stimulus-response is too mechanical and not human-like. I disagree. Just because behaviourist notions do not suit some forms of learning does not mean they do not suit any form of learning. We all know that to learn to dance well we have to repeatedly perform the same routines, and the same goes for practising for sports, playing the piano and so forth. Why then should we throw the baby out with the bathwater and reject behaviourist ways of learning words? If it works for some learners, then it is a very valid activity. Besides, word-pair learning has existed for hundreds of years, and millions of people were successful language learners using behaviourist techniques long before behaviourism as a notion was ever conceived.

    Word pair learning need not be boring

One reason this type of learning can become boring is that people do it in a "massed practice" manner. This means they study word cards for a long time, say once a week. This is not the best method. As we shall see below it is better to space out the learning into small manageable time slots.

    Word pair learning can suit most learners

Word pair learning need not be the main word learning strategy for all learners. Some love it and learn very quickly with it, and others hate this form of learning. As has been mentioned if learners can see progress

    Word pair learning gets faster the more you do it

It is commonly believed that our brains will get full too quickly if we learn lots of words this way. It is thus posed that this puts processing limits on what we can achieve and will ultimately slow down our learning. Contrary to what many people believe, research shows that the rate at which we can learn word pairs actually increases rather than decreases as we get better at it.  It is obvious why.  The parallel is the same for learning to drive or play the piano.  We get better at it the more we do it.

    Success with word pair learning

In 1908 a researcher asked his students of German to learn 1000 word pairs as soon as they could. They did this successfully within a few weeks. At the time they were not aware of many of the modern learning methods including mnemonics and so had to learn only by rote. Since that time the vocabulary learning literature has been littered with success stories of word pair learning, particularly when learners have used mnemonics (see below).


How do we learn from word cards?

1. We make the cards. (People often complain about the time it takes to make word cards. But this is the same effort as writing them in a list. Besides, simply writing a word down is the first stage in remembering it. The time invested will repay itself many times over.)

2. Then we divide the word cards into small sets of about 10 words.

3. We go through the set of cards trying to remember them. We look at one side of the card trying to recall what is on the reverse side. So if our card has apple on one side we would try to remember its translation into the foreign word. It is vital that we try to recall BEFORE looking. We have to make a mental effort to help it go into memory.

4. If we recall the word, we put it on the left side and if we get it wrong we put it on the right.

5. When we have gone through the set of 10 words we pick up the ones we got wrong and try again. As before, if we recall the word, we put it on the left side and if we get it wrong we put it on the right.

6. We continue like this until all the words are learned.

7. Then we go through the whole set of 10 words again before moving onto set 2. We only move onto set 2 when all the words in a set are known.
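The left pile / right pile routine in steps 3 to 6 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the names study_set and make_learner are invented for the example, and the learner's recall is simulated by a made-up rule (each card needs a fixed number of attempts) rather than by a real person. The final check of the whole set in step 7 is not modelled.

```python
import random


def study_set(cards, knows):
    """Run the left/right pile routine on one set of word cards.

    cards: list of (prompt, answer) pairs, e.g. ("apple", "ringo").
    knows: callable(prompt, answer) -> bool, standing in for the learner
           trying to recall the answer BEFORE turning the card over.
    Returns the number of passes needed until every card was recalled.
    """
    pile = list(cards)
    passes = 0
    while pile:
        passes += 1
        random.shuffle(pile)      # shuffling removes the order effect
        wrong = []                # the right-hand pile: cards we got wrong
        for prompt, answer in pile:
            if not knows(prompt, answer):
                wrong.append((prompt, answer))
        pile = wrong              # only the missed cards are tried again
    return passes


def make_learner(attempts_needed):
    """Hypothetical learner: recalls a card on its n-th attempt."""
    seen = {}

    def knows(prompt, answer):
        seen[prompt] = seen.get(prompt, 0) + 1
        return seen[prompt] >= attempts_needed[prompt]

    return knows


cards = [("apple", "ringo"), ("book", "hon"),
         ("pencil", "empitsu"), ("orange", "orenji")]
# Assumed difficulty: 'pencil' needs three attempts, 'book' two, the rest one.
learner = make_learner({"apple": 1, "book": 2, "pencil": 3, "orange": 1})
passes = study_set(cards, learner)
print("passes needed:", passes)
```

Note how the easy cards are looked at once and then left alone, while the hardest card decides how many passes the set takes.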


It is extremely important that this is done systematically.


How to learn word pairs systematically

We all know that any form of learning is best done within some kind of framework and with a plan in mind. This applies to learning to play the piano as much as it does to learning to drive and to word learning. We all know that:

- the learning needs to be structured into steps. For example, we need to know what the pedals in a car do before we switch on the engine.
- we also need instruction at the right level of difficulty so that the next step in learning can take place. For example, we need to be able to drive in a straight line before we can learn to overtake.
- we should learn little bits at a time rather than lots at once.

By far the most important concept in word pair vocabulary learning is the notion of forgetting. It has long been known that forgetting words is natural and happens to all of us whether we like it or not. Fortunately, as we shall see, there is a lot that we can do about this if we are systematic about it.

Forgetting occurs for many reasons.

- Forgetting occurs because the first time we learn something the knowledge is quite fragile and thus easily lost. This is because we have not fitted the new knowledge into our knowledge system correctly yet.
- Forgetting occurs because the new word is not met again. We need to meet the word many times before we have 'got it'. Research suggests that it takes about 8-20 meetings of an average word to learn it.

Researchers have shown that there is a thing called the 'forgetting curve'. The importance of this cannot be overstated.  It looks like this.

How does it work? At the start (time0) let us assume that you have just learned say 20 new words from a word list. Your knowledge is "perfect", so to speak, at time0. If you were tested then, you would score 20/20. But the knowledge is only in a short-term memory state and will not stay in your head (in long-term memory) unless you meet the words again, and soon. If we let nature take its course and do not attempt to meet the words again or relearn them, then by tomorrow you can maybe remember only 15 words, by next week only 8, and by next month you will have forgotten most of them. Research into memory decay shows this finding to be very consistent. You may not even remember having met a particular word. It is important to note that most forgetting occurs very soon after learning, and if one does not meet the word again soon then it is likely to be forgotten forever. This means that students have to meet words again soon after first learning them so that forgetting is minimized. This is called "working against the Forgetting Curve". Thus a realistic distance between the first learning (time0) and the first relearning (time1) should be very short, say a few minutes.

A Relearning Schedule

We also know from memory research that every time we relearn something the knowledge gets stronger and thus more resistant to decay. Thus the gap between time1 and time2 can be longer than the gap of a few minutes between time0 and time1. Paul Pimsleur calculated the ideal distance as multiples of 5, so each relearning should take place after a gap 5 times longer than the previous gap. For example, this would be 5 minutes, 25 minutes, 125 minutes, about 10 hours, about 50 hours and so on. Note that after about time10 the distance is very wide, but the person has met the word 9 times previously and there is a very high probability that the word will be cemented in memory.
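The arithmetic behind such a schedule is easy to reproduce. The following is a small sketch, assuming a first gap of 5 minutes and a multiplier of 5 as in the example figures; the function name relearning_gaps and its parameters are made up for the illustration.

```python
def relearning_gaps(first_gap_minutes=5, factor=5, count=5):
    """Return the gaps (in minutes) before each successive relearning.

    Each gap is `factor` times longer than the previous one.
    """
    gaps = []
    gap = first_gap_minutes
    for _ in range(count):
        gaps.append(gap)
        gap *= factor
    return gaps


gaps = relearning_gaps()
for n, minutes in enumerate(gaps, start=1):
    # The gap before t1 is the time between first learning (t0) and t1, etc.
    print(f"gap before t{n}: {minutes} min (about {minutes / 60:.1f} hours)")
```

The fourth and fifth gaps come out at 625 and 3125 minutes, which is where the rounded figures of roughly 10 and 50 hours come from. Learners with different forgetting curves could try a different first gap or multiplier.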

Diagrammatically an example learning and relearning schedule would look like this. t0 refers to the first time the words are learned, t1 refers to the first relearning and so on.

[Figure: learning and relearning schedule. Label: Natural forgetting]

Note several things

- There is relatively little effort needed to relearn previous knowledge compared to having not met it again. This is because
- The forgetting curves get progressively less steep as relearning goes on, i.e. the knowledge is getting cemented more strongly in memory with every meeting.
- We can widen the gap between each relearning stage.

What are the implications of this?

We have to plan the relearning and ensure it takes place at the correct time so that we can minimize the effort in relearning something.  For word learning, this can be done with an efficient word pair learning system.

Of course not all learners will have the same forgetting curves so we have to bear this in mind when constructing relearning schedules.

Designing a word-pair learning system to minimize the effect of the forgetting curve.

There is a difference between massed practice and distributed practice (sometimes called spaced or distributed rehearsal). Massed practice means learning lots at one time whereas distributed practice refers to learning little bits at a time over a longer period. Sitting down and learning 100 word pairs at once is massed practice. Distributed practice is learning the 100 words in sets of say 10 words, coming back to them after a time.

Research has shown that distributed practice is more effective than massed practice because of the effects of the forgetting curve. This implies that it is better to break a list of words into sets of say 10-12 words rather than trying to learn 100 at once. We can take the benefit of a relearning schedule and distributed rehearsal if we time the learning of the sets to minimize the effect of the Forgetting Curve. This is how we do it. First we learn all 10 words in set 1. Then we learn the 10 words in set 2. In order that we do not forget too many of set 1 we have to meet them again, and similarly we have to meet set 2 again before we go on to set 3. Notice that the gap between the times we return to set 1 is lengthening in line with the relearning schedule.

Thus the set learning order will be


1 2

1 2 3

1 2 3 4

1 2 3 4 5

1 2 3 4 5 6

1 2 3 4 5 6 7

1 2 3 4 5 6 7 8

1 2 3 4 5 6 7 8 9

1 2 3 4 5 6 7 8 9 10

By this time, set 1 will have been met 10 times and it is likely that it will be remembered.
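The set learning order can be generated rather than written out by hand. The sketch below is an illustration only: the function name set_learning_order is invented, and it assumes the schedule starts with a session on set 1 alone (the listing above starts at "1 2"), which is what makes set 1 be met 10 times. It also measures how the gap between successive meetings of set 1 grows.

```python
def set_learning_order(total_sets=10):
    """Return one list of set numbers per session: [1], [1, 2], [1, 2, 3], ..."""
    sessions = []
    for newest in range(1, total_sets + 1):
        # Revisit every earlier set, then learn the newest one.
        sessions.append(list(range(1, newest + 1)))
    return sessions


sessions = set_learning_order(10)
for session in sessions:
    print(" ".join(str(s) for s in session))

# How often is set 1 met, and how far apart are the meetings?
meetings_of_set_1 = sum(1 for session in sessions if 1 in session)
flat = [s for session in sessions for s in session]
positions = [i for i, s in enumerate(flat) if s == 1]
gaps = [later - earlier for earlier, later in zip(positions, positions[1:])]
print("set 1 is met", meetings_of_set_1, "times; gaps between meetings:", gaps)
```

The gaps between meetings of set 1 lengthen by one set each time, which is the widening pattern a relearning schedule asks for, although here it grows step by step rather than by a fixed multiple.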

Unfortunately most learners are not this systematic. Most learners tend to select any set of words or any word to practise without considering the effects of the Forgetting Curve. Compare the above  to a learner who 'organized'  her learning of the sets like this.

1 3 6 7 4 3 8 2 5 1 9 2 8 7 3 2 5 9 .......

Note that we have to wait a long time before returning to set 1, and some sets are never relearned. The Forgetting Curve will be working overtime! It is thus no wonder that learners do not learn much in this way and soon get dispirited and feel a lack of progress.
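A quick count makes the problem with this haphazard order visible. The snippet below is a small sketch that only looks at the portion of the sequence shown above (the original trails off); the variable names are chosen for the example.

```python
from collections import Counter

# The portion of the haphazard set order shown above.
haphazard = [1, 3, 6, 7, 4, 3, 8, 2, 5, 1, 9, 2, 8, 7, 3, 2, 5, 9]

visits = Counter(haphazard)
# Sets that were studied once and never relearned in this stretch.
met_only_once = sorted(s for s, n in visits.items() if n == 1)
# Position at which set 1 is met for the second time (positions start at 0).
first_return_to_set_1 = haphazard.index(1, 1)

print("visits per set:", dict(sorted(visits.items())))
print("sets met only once:", met_only_once)
print("other sets studied before returning to set 1:", first_return_to_set_1 - 1)
```

In this stretch sets 4 and 6 are never relearned, and eight other sets are studied before set 1 is seen a second time.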


The advantages of a relearning schedule system are obvious:

a) The learning is minimizing the effect of the Forgetting Curve both by spacing the learning and by recycling the words.

b) Learning is planned. The learner knows exactly where to start learning, what needs relearning and when, and where to go next.

c) The learner is only working with a manageable amount of words.  In our case 10 sets of 10 words. Each learner will have to get a feel of her own learning ability and either adjust the number of sets worked with or adjust the number of words in each set.  Some learners for example may find it easier to have 6 sets of 8 words while others may be able to handle 15 sets of 12 words.

d) The learner can actually see progress, and the progress is numerical. Our learner can keep records of how many words have been learned, which allows for the setting of numerical word learning goals.


When can we say a set has been learned?

If the whole set is known without mistakes say 3 times, we can then put Set 1 aside and come back to it much later to confirm it is in long-term memory. In effect we can say that the set of words has been 'learned'. We can then add a new set (set 11). There is probably a maximum number of sets one can deal with at once. 10 sets may be a good number.
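Keeping track of this rule is simple bookkeeping. The sketch below is one possible way to do it, with two assumptions that go beyond the text: the three error-free passes are taken to be consecutive, and the function name update_streaks and the dictionary of streaks are invented for the example.

```python
def update_streaks(streaks, set_id, mistakes, needed=3):
    """Record one pass through a set.

    streaks: dict mapping set number -> current run of error-free passes.
    Returns True when the set has been known without mistakes `needed`
    times in a row and can be put aside.
    """
    if mistakes == 0:
        streaks[set_id] = streaks.get(set_id, 0) + 1
    else:
        streaks[set_id] = 0   # assumption: a mistake restarts the count
    return streaks[set_id] >= needed


streaks = {}
# Hypothetical record for set 1: number of mistakes on five successive passes.
results = [update_streaks(streaks, 1, mistakes) for mistakes in [0, 1, 0, 0, 0]]
print(results)
```

Only the last pass in this record completes three error-free passes in a row, so only then would set 1 be put aside and set 11 added.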

How can we manage the sets?

Jan-Arjen Mondria devised a system he called the 'hand-held supercomputer', which is a system for learning individual words rather than sets of words. His system works like this.

1. Each word to be learned is put on a word card.  This is a piece of paper with the target word on one side and its translation on the other.

2. Then you need to make a box with different slots in it.  This can easily be made of cardboard. I use an old shirt box which is about 3 cm tall.  I sellotaped strips of cardboard inside to make divisions. I have 8 divisions on the left side of the box and 8 divisions on the right side.

3. Put 10 word cards in the front-most division (slot 1) and learn them. When a word is learned, its card goes in the division behind (slot 2). When slot 1 is empty of word cards, look at slot 2. If a word card is remembered, put it in slot 3; if it is forgotten, put it back in slot 1. And so on. All forgotten words are returned to the start.

It looks like this.

      Slot 1  ->  Slot 2  ->  Slot 3  ->  Slot 4  ->  ....

      (Successful learning moves a card on to the next slot.)
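The movement of cards through the box can be sketched in code. This is only a rough model under stated assumptions: the box is a list of lists, the function name review_slot is invented, a card recalled in the last slot simply stays there, and the learner's memory is faked with a simple rule.

```python
def review_slot(box, slot, recalled):
    """Review every card currently in `slot` (0 is the front-most slot).

    A recalled card moves one slot back in the box; a forgotten card
    is returned to the first slot.
    """
    cards, box[slot] = box[slot], []
    for card in cards:
        if recalled(card):
            # Assumption: cards in the last slot have nowhere further to go.
            target = min(slot + 1, len(box) - 1)
        else:
            target = 0
        box[target].append(card)


# A small box with 4 slots and four cards waiting in the first slot.
box = [[] for _ in range(4)]
box[0] = ["ringo", "hon", "empitsu", "orenji"]

# Hypothetical learner: forgets 'hon' on the first review ...
review_slot(box, 0, lambda card: card != "hon")
# ... and forgets 'empitsu' when the second slot is reviewed.
review_slot(box, 1, lambda card: card != "empitsu")

for number, cards in enumerate(box, start=1):
    print(f"Slot {number}: {cards}")
```

After these two reviews the two forgotten cards are back in slot 1 while the other two have reached slot 3, so each word gets exactly the attention it needs.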

There are several reasons why this system is more effective than learning words in sets.

a) Each word gets only the amount of attention it needs. If words are kept in a set, some will be known and some not, so time is wasted looking at the known ones again.

b) Once you put the lid on the box you can carry the system anywhere and each word is ready for you and in its right place at any time.

c) Word learning is motor-manual, that is, the learning involves physical movement as well as mental effort. Research shows that the more modes of learning we use, the more likely it is that learning will take place.



Mnemonics are 'memory tricks' that allow you to remember words more easily. Research clearly shows that word pair learning using mnemonics is a very powerful, in fact the most powerful, way to learn word pairs. There have been hundreds of studies comparing mnemonic learning with other forms of learning and no other method has been more successful.

There are many types of mnemonics, but they all work in similar ways. The key to success with mnemonics is to connect the foreign word's meaning with a sound and an image. Sometimes this can be done by creating a story in your head about the new word. For example I was trying to learn the Japanese word kaisuken, which is a kind of prepaid strip of discounted tickets one can use to pay for rides on a bus. Trying to remember this word I imagined a student of mine called Kai and my cousin Sue paying for the ride on a bus using one of these tickets. I know the Japanese word for ticket is ken. This little story has all 4 elements of a successful mnemonic.

- an image  -  I created an image of Kai and Sue with their ken on the bus

- the sounds     -      Kai + Sue + ken   =  kaisuken

- the meaning   -   there is a bus ticket in the picture

- the target word -     kaisuken is in the story

Sometimes word mnemonics are easy to create and sometimes not. Research also shows that training in mnemonic techniques has a tremendous benefit for learning. Mnemonic techniques can be applied to any form of discrete item learning such as names, places, dates, formulae and so on. Moreover research has shown that the more one practises mnemonics the better one gets at it.

Why are mnemonics so successful? Research into memory shows that if something is learned in a multi-faceted way, it is learned better than in a single way. The multiple connections involved strengthen the bonds between the new word and the rest of the lexicon and tie it in more strongly. Besides, the multiple connections make it easier to 'find' the word again, as one can 'look' at any of the several connections to get to the match between the foreign word and its meaning. Thus simple rote learning is not as effective because there may be only one route (meaning to form) with which to 'find' the word in the head.

Many websites discuss memory techniques. They can be found by using your favourite search engine.


Training in word pair learning

Thus in order to have a successful experience with word pair learning the teacher must explain the principles underlying word pair learning, how it works and how to do it successfully. Many of these points have been made above.


Very important endnote

There are several very important points to be made about the limits of word pair learning.

a) It is worth repeating that word pair learning will only be successful at one level of word learning, that is the meaning-form relationship: in other words the relationship between a word's meaning and its spelling or pronunciation. Word pair learning connects a previously known concept (the one in your mother tongue) with a new form in the foreign language and that is all. You also need the 'deeper' knowledge about a word.

b) Word pair learning does not necessarily mean that the word is fully 'known'. The word pair learning of the form-meaning relationship is only the start of learning the word. This is extremely important. Note that the form-meaning relationship is probably only available at the recognition level, i.e. when you hear or read it, and may in some circumstances be available for productive use in speaking or writing. However there is still so much more to learn to make it available for productive use. For example, you need to know

- which words it goes with (for example why we say Happy Birthday not Merry Birthday, or mild cheese not weak cheese). These are called the word's collocations.
- which grammar words it goes with. For example we need to know that we say "give something to someone", not "give something at someone". These are called the word's colligations.
- in which circumstances you can and cannot use the word (for example in polite conversation or public speaking, for formal essay writing or a note to a friend)
- how the word is similar or dissimilar to other words
- its shades of meaning
- etc.

c) It is too easy to forget that word pairs are not always exactly the same in the two languages. Concrete words like apple are probably exactly the same concept in both languages, but more abstract words are less likely to translate exactly.  Thus you should be on your guard to look out for the differences and similarities in use and shades of meaning.


These three points imply that we need to deepen the word pair level knowledge with further practice. There are several ways to get this 'deeper' level of word knowledge.

a) By reading and listening to lots and lots of the foreign language. Research shows that we learn best from reading and listening when it is just above our current ability level. By reading lots of text you will meet the new word many times in different contexts and soon get a feel for how the word is used. The best form of reading and listening practice is graded reading using special small books made especially for language learners called graded readers. This type of learning is often called Extensive Reading.

b) By studying the word carefully in a dictionary or in a vocabulary practice book.

c) By trying to use it in your speaking and writing so that you can see if you are using it successfully (if not, your speaking partner will say "huh? Sorry, what do you mean?").

