Some of my students are blissfully unaware of unscored questions. However, depending on which TOEFL iBT tutors you have studied with before, or which Facebook groups you hang out in, you may have heard about the “unscored” or “experimental” questions or passage in the TOEFL iBT Reading and Listening sections.
Specifically, I’m referring to the “unscored” questions that are mentioned by ETS in 2 places… First on this page of their website…
If you have been worried about these extra questions, keep reading!
My goal is to convince you that worrying about these actively damages your score and delays your progress.
- 3 Guiding Principles: Reliability, Validity, Fairness
- How We Know Those Principles Matter
- The Unscored Questions are NOT “Experimental”
- The Actual Purpose of the Unscored Questions
- The Quantity and Frequency of Unscored Questions
- The Best Way to Deal With Unscored Questions
- Citations / Sources
3 Guiding Principles: Reliability, Validity, Fairness
You may feel like you don’t care, but it’s actually really helpful to talk about this first because you’ll have context for the rest of our discussion about “unscored” questions. Plus, you’ll start to see connections to the entire TOEFL iBT test — not just Reading and Listening.
ETS is one of many companies that produce tests. Every company that makes tests cares about 3 guiding principles… because if they ignore them, then they create a flawed test that isn’t trusted… and then it isn’t used.
Test makers like ETS are supremely concerned with these 3 concepts. Why? Because their reputation literally depends on them. This may be the first time that you’re seeing these terms, so I’m giving you the word families because I’ll be using these words quite a bit…
|Noun|Adjective|Opposite|
|---|---|---|
|reliability|reliable|unreliable|
|validity|valid|invalid|
|fairness|fair|unfair|
What "reliability" means
Reliability is like consistency or comparability. Can we compare your scores from a TOEFL test this month to your scores from a TOEFL test that you took last year?
In other words, does a Reading score of 30/30 from a test that was taken 2 weeks ago mean the same thing as a Reading score of 30/30 from a test that was taken 1 year ago? … Or 5 years ago? Another way to think about this is like quality control for the scores.
It’s very important that the scores are consistent and reliable over time.
What "validity" means
Validity is whether the activities inside the test actually measure what they are supposed to measure. Can questions and activities on TOEFL iBT actually measure your ability to read or listen or speak or write in any kind of meaningful way?
So for example, if they ask someone to read a paragraph and answer questions, do those questions actually measure the person’s ability to read in English? Or if they ask someone to listen to a conversation and then identify the main idea, or details that each speaker mentioned, is that actually measuring that person’s listening ability?
If yes, then the activities or the test is “valid.” If no, then it’s “invalid.”
What "fairness" means
Fairness is whether every test-taker, whatever their background, has a real chance of being able to answer.
Fairness is one of the reasons why the Independent prompts are so general and bland on TOEFL iBT Speaking Task 1 and the Independent Writing essay. TOEFL test writers need to make sure that people from different cultural backgrounds and socioeconomic levels all have the same opportunity to answer questions.
So, for example, you will never see a super specific question like this:
“Do you prefer Android smartphones or iPhones?”
Why wouldn’t they ask a question like that? Because it assumes 2 things: First, it assumes the test-taker has used a smartphone before… What if the test-taker didn’t have the money to use a smartphone? What if they only had a flip phone? Or just a landline? And second, the question assumes that the test-taker has experience with 2 particular brands. Apple products tend to cost more, so it’s unfair to assume that EVERY test-taker who must answer this question had the money to try an Apple product or had the life experience to even form an opinion about it.
You may be thinking, “Hey wait a minute! The topics in the TOEFL iBT Reading and Listening passages aren’t fair because I never heard about _____ before…” The people who write the tests are worried about that. And it’s the reason why they give you enough details and information inside each activity that you can answer the questions without previous knowledge.
So yes, you must bring patience and focus to read or listen — and you must bring a big vocabulary. But if you bring those things, every single TOEFL iBT Reading and Listening task has all of the puzzle pieces that you need in order to answer correctly.
How We Know Those Principles Matter
It’s easy for test-takers and tutors alike to just brush off TOEFL iBT and say, “It’s unfair.” or “These test results are meaningless.” Despite those attitudes, ETS has published many things that let us know they care an awful lot about those 3 guiding principles.
At the end of the day, if TOEFL iBT wants to maintain its reputation as a strong, good alternative to IELTS, then they simply must care about these concepts.
Here is how we know they care about reliability, validity and fairness.
One indication is just how long they take to produce a new version of the test. According to ETS (on page 6 of this document), it takes 6-18 months to produce a new version of the TOEFL iBT test.
During those 6 to 18 months, each volume goes through the following process:
- Writing passages, dialogues, lectures, questions and answer choices that are fair, culturally-accessible and do not require specialized studies in any particular industry
- Content review
- Fairness review
- Final editing
They aren’t cranking these out like Starbucks lattes! They are deliberately slow… presumably to avoid mistakes because they understand those would negatively impact you.
Another indication they care about the 3 guiding principles is this.
In the rare situation when they doubt (for whatever reason) that someone’s scores on Reading or Listening are valid, they won’t report those scores.
This is a very rare occurrence and Michael Goodine wrote a blog post about how to deal with it if you experience it… But the point is that ETS already has a procedure in place to deal with it if and when scores might not be valid.
From the homepage of the website for TOEFL iBT, right on through the detailed research summaries, and into the research papers, you’ll see this message over and over and over again in one form or another…
“TOEFL is trustworthy.”
ETS’s entire reputation depends on reliability, validity and fairness. If TOEFL lost that reputation, then universities and professional licensing organizations would say, “Hey you know what? Let’s stop using TOEFL. We should use IELTS instead because it’s a more accurate measurement of people’s English.” The people who work for ETS won’t take that risk! That’s why they spend 6 to 18 months developing a single test that you answer in less than 4 hours.
The Unscored Questions are NOT "Experimental"
This is so important to discuss. I heard the questions are experimental from my students, who heard it from their friends, who heard it from some tutor or other, who read it in some book… who got it from where?
Every day on Facebook, we can see the confusion in the community. People say “the experimental questions” or “the unscored passage is trying out new testing material.”
These claims are not only false; they also send the message that ETS is experimenting on test-takers. This creates needless worry and frustration.
When ETS is experimenting, they tell you!
First, we know that when ETS does research on test-takers, they tell people and they pay participants! For example, in April 2021, ETS sent out the following email.
They originally offered to pay test-takers $45 and then later increased it to $50 to participate in a research study. ETS has the money and the means to conduct actual, legitimate research.
Tricking You Would Make Your Scores "Invalid"
Second, the 3 guiding principles of reliability, validity and fairness again give us a clear reason why ETS would not experiment on you in some fast and loose way during a real test… a test which everyone is counting on to be accurate.
It would be unfair (and wildly unethical… like to a scandalous degree) if they collected the cost of test registration from you and then turned around and performed research on you without your knowledge or consent.
This hypothetical scenario would also make the results unreliable and invalid.
Remember, we know they have the money and the ability to conduct actual research. They also have the money to pay their team to spend 6 to 18 months developing a 4-hour test. So if they want to experiment on you, why would they risk all the damage to their reputation by tricking you into participating?
I just don’t think they would.
"Unscored" Questions Aren't There for Experimenting
Third, ETS’s own literature explains the purpose of the unscored questions. We’ll get to that in a minute. Basically, they make it very clear that unscored questions are added for 2 reasons separate from research or experimentation.
We have to stop alluding to these as "experimental"
As I conclude this section, I want to caution the entire community (particularly tutors): it is important that we collectively stop referring to these as “experimental” or “trying out new material,” as if there’s some irresponsible process that will fly out of control and damage students’ scores.
The Actual Purpose of the Unscored Questions
ETS explicitly states 2 reasons why they include unscored, “extra” questions.
In other words...
The unscored questions serve as a kind of “quality control” to make sure that TOEFL iBT Reading and Listening scores are statistically reliable and comparable from week to week, month to month and year to year.
The unscored questions are not included to hurt you or mess you up or confuse you.
They’re included to monitor the accuracy of scoring and to make sure that your score is consistently reliable.
And before you get worried about reason #2, just remember that there’s a difference between “experimenting” and “determining.” They spend 6 to 18 months developing each test. By the time you ever even see new test questions, they’ve been scrutinized a ton by multiple professionals.
Want more details?
The official term for the kind of quality control that ETS does is “equating” and the unscored questions are called “anchor blocks” because they “anchor” the scores from test to test and help them be more comparable to each other from week to week and year to year. If you want, you can read more about “equating” on pages 12-13 of this document, and in chapter 6 of this book.
The Quantity and Frequency of Unscored Questions
As an answer to the questions, “How many unscored questions are there? How frequently do we get them?” I found only guesses and stories.
Unlike above where I was able to copy-paste information or show you screenshots of official ETS documents, so far, I haven’t been able to find any satisfying public, official documentation that even hints at how many unscored questions you get, or how often you encounter them.
To me, proof is not someone’s guess. It’s also not a story about what someone else experienced. It’s not a document that none of us have permission to see.
To me, proof is something we can all link to, read, look at and discuss without making wild guesses.
If I missed a public, official source, I hope you will reach out with a copy of it so I can update this article. I haven’t been able to find actual proof from ETS about either the quantity or the frequency.
I realize you may find this alarming and stressful… but I am not going to leave you in a low place. Keep reading because I’ve got a solid way to deal with this.
If I were going to continue investigating this, the next step I would take would be to pay the incredibly high price to buy a particular guidebook that is written by 2 specialists with PhDs in educational measurement. ETS says here on page 13 (last paragraph) that they use this particular book as a guide. So this book may be the closest any member of the public could get to proof (without being an employee of ETS, who is probably legally prevented from discussing any of this unless they want to face a massive lawsuit). When the book arrived, I would read Chapter 6. Chapter 6 outlines something called “Item Response Theory (IRT) equating methods” and, like I said, it’s how ETS makes sure that your Reading and Listening scores from a current test are comparable to your scores from previous tests that you took.
But there are 3 reasons I won’t continue chasing this:
#1: The book is very expensive.
#2: Even if I wanted to spend money on that, I don’t have the skill or expertise to read, understand, interpret or discuss that book. I can score 30/30 on TOEFL iBT Reading or Listening, but that doesn’t mean I can decode something written by 2 guys with PhDs in “educational measurement.” The process of score equating across time and space is way above my pay grade.
But mostly #3: I don’t think it matters whether I understand the tedious process of using unscored questions to equate TOEFL iBT Reading and Listening scores across time and space.
Why? Because of the discovery I am going to share about what roughly 75% of my students do when they are scoring 23+ on Reading and Listening …
The Best Way to Deal With Unscored Questions
While I was reading all these sources and writing this article, one question became the most important:
“If there’s so much that we may never have proof for… How do we deal with these unscored questions?”
The damage caused by worrying about unscored questions is very clear. I know lots of students are concerned that these unscored questions hurt their scores. There’s the feeling of outrage at being distracted and exhausted by questions that don’t even matter. The sense of unfairness of answering a question right and not being rewarded. Generally, there is a sense of indignation and shock when people learn about these unscored questions. Plus, this discussion takes up a ton of time and energy for test-takers and tutors alike, both during classes and during the test itself.
… BUT in my 11+ years in this industry, lots of students never even mentioned or asked about these unscored questions. For every student who is struggling to score 23+ on TOEFL Reading or Listening, there’s another student easily scoring 23+…
So I kept wondering again and again…
“How much time and energy do / did the students who score 23+ on TOEFL Reading and Listening spend thinking about these unscored questions?”
So I decided to conduct my own private, informal survey to get some kind of answer.
Is there any relationship between people’s TOEFL iBT Reading and Listening scores and how much time and energy they spend thinking about unscored questions during the test?
For my survey, I asked everyone these same questions…
I wanted to focus on a particular type of student, the one who regularly and repeatedly gets or got minimum scores of 23+ on TOEFL iBT Reading. In other words, I wanted to focus on people whose scores typically look like this…
… because it is really worth knowing, “Do people who regularly and repeatedly get high scores spend any time worrying about unscored questions?”
I’m still gathering data so I’ll continue updating this list as students respond. SO FAR… Here’s what I’ve got.
Results of test-takers who regularly and repeatedly get scores of 23+ on TOEFL iBT Reading and Listening
- 76.9% spent no time at all thinking about unscored questions
- 23% spent very little time thinking about unscored questions
Are you surprised? Are you thinking about unscored questions more than they are?
My Advice to Test-takers
Your obsession with and anxiety about unscored questions could be a legitimate and active factor in your low scores.
If you’re worried about how unscored questions are impacting your score, the most important thing to remind yourself of is this: People who are regularly and repeatedly getting scores of 23+ on Reading and Listening spend very little to no time thinking about unscored questions!
Now is a great time for you to consciously adopt the mindset of people who get high scores.
Get away from worry-fests.
Get away from wondering how many unscored questions there are or how often they come up.
- They aren’t experimenting on you!
- The extra unscored questions exist as a form of “quality control.” There’s no punishment here.
- Students who get high scores are just focusing and blocking out distractions and they’re answering the questions—it’s time to do the same.
- Take some deep breaths. Remember what it feels like to know something. Remember that at home, you’ve gotten great results on Reading and Listening… and that was because you took each question seriously and you gave 100% effort to answering every one. 💖 You can do it. It’s totally possible.
And if you see people talking or worrying about experimental or unscored questions on Facebook, share the link to this article so we as an industry can stop recycling falsehoods.
My Advice to Tutors
First, I urge tutors to immediately cease referring to these as “experimental” questions or “there to help ETS try out new questions” when all published official documentation indicates that ETS is a deliberately slow, methodical organization. If you have an edit to make in your books, blog posts, social media posts or videos, I hope you will take the time to get in touch with your subscribers. Acknowledge it, retract it and correct it with your students.
Then, I urge every tutor to conduct their own survey and correlate the attitudes of high-scoring students with those of struggling students. If you find results similar to mine, I hope you’ll do some serious soul-searching about what the benefit is of bringing up a topic that does more harm than good.
If we’re here to truly help students and change lives, then we can’t brush off or overlook opportunities to prevent pointless damage.
Citations / Sources
Here’s a summary of the sources that I cited above.
- ETS (2021-2022) TOEFL iBT Information Bulletin
- ETS (undated) About the Test: TOEFL iBT Reading Section
- ETS (undated) TOEFL iBT Research Insight: Volume 1: Test Framework and Test Development
- ETS (undated) TOEFL iBT Research Insight: Volume 3: Reliability and Comparability of TOEFL iBT Scores.
- Kolen, M. J., & Brennan, R. L. (2004) Test equating, scaling, and linking: Methods and practices (2nd ed.). New York,