Does testing actually reduce the time a student spends gaining an education? The answer will surprise you.
(SALEM, Ore.) - In a recent opinion piece, Neal Feldman makes numerous suggestions that he believes will turn education around. Among them is this one:
Do away with 'multiple guess' exams returning to essay responses and blank answer spots where work is shown how they arrived at their answers. Yes, this makes grading more time consuming but the multiple guess model often shows only a student's ability to guess rather than accurately guaging [sic] their solid knowledge of the topic.
Neal provides a good opening for discussion about testing. Before we get too deeply into the issue, it would be helpful to define some terminology. While the terms assessment and evaluation are sometimes used interchangeably, it's clarifying to define them in the following way.
Assessment is what teachers do to guide their instruction. Assessments are seatwork, homework, quizzes, tests, class participation, body language--anything which gives the teacher feedback as to whether the student is learning and how much. Teachers reflect daily on these assessments and modify their plans for the next day based on the results. If the teacher is teaching students how to multiply fractions, for example, the teacher either re-teaches key concepts the next day or moves on to the next lesson depending on these assessments.
Evaluation is defined as giving value to a student's body of work. It is also important to understand that evaluations are specifically designed with a target audience in mind. SAT scores, for example, are primarily for college admissions boards and are expressed in values which colleges best understand. While parents may look at SAT scores, they don’t give them the weight of their students’ report cards. Evaluations don't guide current instruction because evaluations are determined after the course work is complete. They are a summation, expressed in a measure shared and understood by the target audience.
Like Neal, educators are troubled by what are called snapshot evaluations. Neal is right when he says multiple choice tests don't accurately measure what students know (although they are useful in the daily assessments teachers do to guide their instruction). Evaluation is most accurate when it is derived from a large and varied body of evidence. Educators believe that a range of products created during student learning will provide the most accurate measure. This range might include multiple choice and essay writing, but might also include performance-based projects. In woodshop, for example, you wouldn't take an essay test to demonstrate your learning--you'd make a bookcase. The same concept can be applied to other subjects as well. The George Lucas Educational Foundation's site, edutopia.org, is a great resource for thinking on project-based learning.
Another method of determining what students know is to interview them. This gives the teacher an opportunity to ask students probing and open-ended questions. For a variety of reasons, students often know more than they demonstrate on written tests. They may misinterpret the question. They may suffer from test anxiety. They may talk themselves out of the right answer. They may express their ideas better aloud than in writing. They may find the test boring. A skillful interviewer can ask questions in a way which teases out what the student knows without giving away the answer. This is time-consuming and is usually reserved for students pursuing advanced degrees, such as those in medicine or law, or a Ph.D.
An important question to ask is: who is the target audience for state testing? Is it parents? Perhaps. State test scores are reported to parents, but I don't believe they are the sole audience. If they were, school report cards wouldn't be reported in newspapers. Another audience for state test scores is the taxpayers. The scores are meant to demonstrate that taxpayers are getting value for their investment. That's what I think is meant when politicians demand accountability. But measuring accountability with test scores is tricky.
Two more important definitions:
Criterion-referenced testing is testing which sets a fixed goal which students must achieve to pass. It does not compare students to other students. It doesn’t get more difficult if students all do well.
Norm-referenced testing is testing which is designed so that only a certain percentage of students will pass it. Sometimes you'll hear about a test being "graded on a curve." So many students will get an A, so many students will fail and so on. The results often resemble a bell curve.
A criterion-referenced test can be given and if every student achieves the benchmarks, then every student passes. Teachers and the public have been told that state testing is criterion-based. If all students achieve the benchmark, all students will pass. But when a large number of students do pass, we are then told, "the tests are too easy" and next year's tests are more difficult. The harder the students work, the harder the tests become.
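For readers who like to see the difference spelled out, here is a minimal sketch of the two approaches in Python. The student names, scores, cut score, and passing fraction are all made up for illustration, and the function names are hypothetical; this is not how any state actually scores its tests.

```python
def criterion_pass(scores, cut_score):
    """Criterion-referenced: a student passes by meeting a fixed cut score.

    Nobody is compared with anybody else, so everyone can pass.
    """
    return {name: score >= cut_score for name, score in scores.items()}


def norm_pass(scores, pass_fraction):
    """Norm-referenced: only the top fraction of students passes.

    A student's result depends on rank, not on the score itself.
    """
    n_pass = int(len(scores) * pass_fraction)
    ranked = sorted(scores, key=scores.get, reverse=True)
    passers = set(ranked[:n_pass])
    return {name: name in passers for name in scores}


# Hypothetical class in which every student clears a benchmark of 80.
scores = {"Ana": 92, "Ben": 88, "Cal": 85, "Dee": 81}

print(criterion_pass(scores, cut_score=80))   # every student passes
print(norm_pass(scores, pass_fraction=0.5))   # only the top half passes
```

With the same four scores, the criterion-referenced rule passes all four students, while the norm-referenced rule fails Cal and Dee even though both cleared the benchmark.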
Although it went largely unnoticed in the press, the Department of Education in Oregon last school year entered into this game of Catch-22.
How did this happen?
Elementary schools in the state of Oregon have in the past done reasonably well on state tests. Secondary school test scores were not as good. It was argued that since you couldn't predict scores at the secondary level by looking at the elementary scores, either the elementary school tests were too easy or the secondary tests were too hard, or both. Put another way, they claimed the percentage of students who passed the tests at the elementary and secondary levels should be about the same, and it wasn’t. To correct the discrepancy, the Department of Education raised the passing scores in elementary by about 3 points and lowered the passing score for secondary by about 3 points. Needless to say, elementary teachers felt they were punished for doing a good job.
This in my mind raises questions.
What if elementary really is doing better than secondary? You’d never be able to measure it, because every time the data demonstrated the difference the passing criteria would get adjusted. Might it be that high school students often work after school, are involved in sports, and are going through puberty, all of which might negatively influence their test scores? Might it be that the models of elementary and secondary education are different enough that there is a difference in the measured outcomes? (It is said that elementary teachers teach students and secondary teachers teach subjects.) That doesn’t mean that secondary teachers are less skillful or work less diligently than their elementary counterparts. It just means that you can’t compare apples (secondary) with oranges (elementary).
Why test elementary students at all if their scores are handicapped to match the performance of secondary students? Just take the scores of the secondary students and post them to elementary principals. Save the time and energy wasted on testing so that elementary teachers can recover lost instruction time.
What if the reverse situation had occurred? What if elementary scores had been low and secondary educators were asked to raise their passing scores so that they lined up with the elementary scores? I think you would have heard a protest from secondary teachers which would have rattled your teeth.
If state testing is going to be norm-referenced (that is by design a certain number will pass, a certain number will fail) then the public needs to understand that. If testing is going to be criterion-referenced, tests must not be made more difficult when students perform well.
Testing is not instruction. Repeat that three times and commit it to memory. Testing, and especially test preparation, takes time away from class instruction. As a teacher I cringe to see how many days of instruction have been erased by high-stakes testing. There is nothing about the act of testing which improves a student's education if by education we mean the preparation of students to be citizen-workers who can solve problems, make good decisions and work collaboratively with others. It's an important distinction to make.
I’d like to finish with a quote from a New York Times article, "Free to Be" (1/12/03):
---Begin quote---
"Japanese children have famously scored at the top in international contests for math and sciences. But in a 1996 survey of scientific literacy by the Organization for Economic Cooperation and Development, an international organization for developed economies, Japanese adults placed second from the bottom among the 14 most advanced countries." (Danish adults scored highest, with the United States sixth.)
"I am hypothesizing that Japanese kids do well on tests when they are forced to study," says Mr. Sumitani of Learnnet. "But they did not learn out of curiosity and did not go through a self-motivated process of why they are learning and how studying will serve them."
---End quote---
Sixth place doesn't give us bragging rights, but clearly the United States is doing something better than Japan.
More to say about Japan in another essay.
-----------------------------------------------------
Glen L. Bledsoe is a 4th grade teacher currently working for the Molalla River School District. He previously taught in the Salem-Keizer School District. Glen also has taught in the School of Education at Willamette University and is adjunct faculty at the University of Oregon. He is deeply interested in the uses and impact of technology in education. For two and a half years Glen wrote a series of monthly essays about the issues of technology and education for the National Education Association at NEA.org. Glen has also written for Today's OEA and NEA Today magazines, among others. He and his wife Karen are the authors of over seventeen books. Glen is better known as Leonardo to readers of Salem-news.com’s weekly comic, Nota Bene.
Testing 1, 2, 3 - Salem-News.com