Goodhart's Law Runs Rampant in Auto Industry? (Grassroots Motorsports forum, page 3)

Rigante New Reader
3/5/21 12:39 p.m.

I thought that increased curb weight, from much better crash performance and a more luxurious interior, was probably the main culprit behind lower efficiency.

BUT:

A 1998 Camry's 3,086 lb is more than a current Corolla's 2,900 lb.

A 1998 BMW E39 is 3,307 lb, and a 2020 5 Series is only about 100 lb more... so it's not that.

rslifkin UberDork
3/5/21 12:58 p.m.

Rigante said:
I thought that increased curb weight, from much better crash performance and a more luxurious interior, was probably the main culprit behind lower efficiency.

BUT:

A 1998 Camry's 3,086 lb is more than a current Corolla's 2,900 lb.

A 1998 BMW E39 is 3,307 lb, and a 2020 5 Series is only about 100 lb more... so it's not that.

What about a current Camry? The 98 Camry is in between the current Corolla and Camry in size.

DaewooOfDeath SuperDork
3/5/21 8:25 p.m.

alfadriver (Forum Supporter) said:

Are there companies and cars out there that have to specifically do things to pass the cycle and get good FE? Sure. Is it the whole fleet? No. Some of us are specifically told to avoid that, and to turn in anyone who demands that we do cycle-specific things like that (which is one more reason the whole VW thing was so mystifying to me).

And I can very much see how one could come to the conclusion that it's all about gaming the system. But it's not everyone, and there is real data showing that fleet fuel economy is getting better. Of course, that is offset by increased fleet miles driven...

I understand, and I apologize if I've come across as too negative. Emissions testing is certainly working on a fleet level. FE testing is as well, though not as much. And as much as I hate rev hang and Hyundai/Kia being lazy, it's a small price to pay for living in a city of 1.3 million people and not having to chew my air. My intention with this thread is not to say the entire regime has failed, but rather the following two things:

1. To see whether FE and WOT emissions were being partially gamed. I think we've gotten some pretty good answers here. Namely, there's a combination of my selection bias on FE (mostly that I ignored trucks and SUVs, which have improved the most), gaming by some (especially in the EU pre-VW), and an emphasis on full-throttle particulates/enrichment schemes over comprehensive emissions.

2. I wanted to see how the EPA fights against Goodhart behavior and if their strategies align with the things I've seen attempted in academic testing.

If you don't mind, here are the general strategies I've seen in academic testing. I'd like your opinion if they are roughly analogous to what the EPA does and if the EPA is doing anything that academic evaluators are missing. So ...

a) Surprise testing. There are a million ways to do this, but the general theme is that Dieselgate (or cramming for the SAT) is only really possible at a reasonable cost when you know when and how the test will happen. So the evaluator says, "There will be one or more tests this year, but I'm not telling you when." You can also achieve this by arranging things so that the people being evaluated never realize an evaluation has happened. The school district sometimes does this by sending evaluators who are also parents, posing as ordinary parents, to wander into PTA meetings and the like. Either approach makes it more likely that you observe the manufacturer/student/administrator in their "default" state rather than at the peak of a Goodhart distortion.

b) Hidden criteria. This is something big testers do, but it's easier to explain with a strategy I started using in Corona-era online classes to calculate participation/attendance scores. A problem I ran into pretty early was students logging into class and then heading off to do karaoke or whatever while still logged in. They could do this because they thought they knew the criterion for participation/attendance (login records), so they quickly figured out how to Goodhart the system. I solved this with a hidden criterion. I hit each student with a minimum of 3 individual questions per class and type a backup record of everything into a chat room. At the end of the semester, I look over those records and count the number and judge the quality of the answers each student left in the class chat to derive a participation score. The students have no idea this is what I'm doing, so they don't know how to Goodhart the system. To be fair, I tell them "you will be graded on the frequency and quality of your responses," but I don't tell them how I'm measuring frequency and quality, so they can't cheat.

c) Qualitative rather than quantitative evaluation strategies. Take the example of a history test on the SAT. The purpose of a history education is to make us better citizens/voters, to help us learn from the mistakes and successes of people in the past, and to help us discover the reasons for and sources of our own traditions and those of other cultures. However, a bad history question such as "What year was Henry VIII born?" shows how easy this is to Goodhart. A student can simply memorize a list of birth dates, get a perfect score, and gain nothing that makes him or her a better citizen, helps them learn from the past, or helps them understand traditions. Qualitative questions partially solve this problem. If instead of "What year was Henry VIII born?" we ask "What is the significance of Henry VIII's reign to modern Europe?", the student can't really answer without properly understanding the subject.

These are the general strategies I've seen reduce Goodhart behavior. They are also, I'm sure you'll notice, much more vulnerable to litigation than crappy but objective tests.
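For readers who want to picture the hidden-criterion approach in (b), here is a minimal Python sketch of the end-of-semester tally. Everything specific in it is hypothetical: the `ChatEntry` record, the 0-2 quality scale, and the 50/50 split between frequency and quality are illustrative choices, not the poster's actual rubric. What it shows is that the score depends only on questions asked and answers given, so logging in and walking away earns nothing.

```python
from collections import defaultdict
from dataclasses import dataclass

MAX_QUALITY = 2  # hypothetical scale: 0 = no answer, 1 = weak answer, 2 = solid answer


@dataclass
class ChatEntry:
    """One question put to a student, reconstructed from the backup chat record."""
    student: str
    quality: int  # instructor's after-the-fact grade on the 0..MAX_QUALITY scale


def participation_scores(log: list[ChatEntry]) -> dict[str, float]:
    """Score each student 0-100 from the semester's chat records.

    Hypothetical weighting: 50 points for frequency (share of questions answered)
    plus 50 points for the mean quality of the answers actually given.
    """
    asked: dict[str, int] = defaultdict(int)
    answered: dict[str, int] = defaultdict(int)
    quality_total: dict[str, int] = defaultdict(int)
    for entry in log:
        asked[entry.student] += 1
        if entry.quality > 0:  # a blank or missing reply counts as not answered
            answered[entry.student] += 1
            quality_total[entry.student] += entry.quality

    scores: dict[str, float] = {}
    for student, n_asked in asked.items():
        frequency = answered[student] / n_asked
        mean_quality = (quality_total[student] / answered[student] / MAX_QUALITY
                        if answered[student] else 0.0)
        scores[student] = round(50 * frequency + 50 * mean_quality, 1)
    return scores


if __name__ == "__main__":
    semester_log = [
        ChatEntry("alice", 2), ChatEntry("alice", 2), ChatEntry("alice", 1),
        ChatEntry("bob", 0), ChatEntry("bob", 0), ChatEntry("bob", 1),
    ]
    print(participation_scores(semester_log))
```

Login records never enter the calculation, which is the whole point of keeping the criterion hidden.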

DaewooOfDeath SuperDork
3/5/21 8:33 p.m.

bobzilla said:
In reply to DaewooOfDeath :

I find this interesting... Every one of our Korean cars destroys the EPA ratings. The first-gen Forte was a consistent 34 mpg highway cruiser at 75 mph, 36 at 70. The '00 Accent 5-speed averaged 39 mpg over the 150k miles we had it. The '08 Rio auto would "only" get 35 in comparison. The 2014 Koup 2.0 was a consistent 36 mpg on the highway. The current Rio is the black sheep at only 35 highway for the wife. I get 38 when driving it, so I think it's an issue with her and not the car.

With all the Korean stuff that has mechanical throttles, I've beaten the EPA estimates. It's the e-throttle stuff that's been worse for me. I think the highway/city split is part of it as well: the e-throttle Korean stuff hasn't suffered when I do pure highway driving, but it has fallen well short in city driving.

Pete. (l33t FS) GRM+ Member and MegaDork
3/5/21 9:01 p.m.

Rigante said:
I thought that increased curb weight, from much better crash performance and a more luxurious interior, was probably the main culprit behind lower efficiency.

BUT:

A 1998 Camry's 3,086 lb is more than a current Corolla's 2,900 lb.

A 1998 BMW E39 is 3,307 lb, and a 2020 5 Series is only about 100 lb more... so it's not that.

Vehicle weight has remarkably little to do with economy once you're moving. It's more down to drivetrain losses, rolling resistance, aerodynamic drag coefficient (Cd), and frontal area.

I tend to ignore city economy, because there are so many variables that can't be easily accounted for.
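Pete's point about steady-speed driving can be sanity-checked with the textbook road-load relation for constant speed on flat ground, F = C_rr·m·g + ½·ρ·C_d·A·v². The Python sketch below uses assumed ballpark values (C_rr = 0.010, C_d = 0.30, A = 2.2 m², a 1,500 kg sedan), not data for any car in this thread, and it ignores drivetrain losses and acceleration. Under those assumptions, adding about 100 lb (45 kg) raises the 70 mph load by under 1%, and aerodynamic drag is the largest term.

```python
G = 9.81      # gravitational acceleration, m/s^2
RHO = 1.225   # sea-level air density, kg/m^3


def road_load_newtons(mass_kg: float, speed_ms: float,
                      c_rr: float = 0.010, cd: float = 0.30,
                      frontal_area_m2: float = 2.2) -> tuple[float, float]:
    """Return (rolling_resistance, aero_drag) in newtons at constant speed, flat road.

    Drivetrain losses, accessories, and acceleration are ignored; the default
    c_rr, cd, and frontal area are assumed ballpark values for a midsize sedan.
    """
    rolling = c_rr * mass_kg * G
    aero = 0.5 * RHO * cd * frontal_area_m2 * speed_ms ** 2
    return rolling, aero


def total_load(mass_kg: float, speed_ms: float) -> float:
    """Sum of rolling resistance and aerodynamic drag, in newtons."""
    rolling, aero = road_load_newtons(mass_kg, speed_ms)
    return rolling + aero


if __name__ == "__main__":
    highway = 70 * 0.44704        # 70 mph converted to m/s
    base_mass = 1500.0            # roughly a 3,300 lb sedan (assumed)
    heavier = base_mass + 45.0    # roughly 100 lb heavier
    base = total_load(base_mass, highway)
    extra = total_load(heavier, highway) - base
    print(f"baseline load: {base:.0f} N, extra from ~100 lb: {extra:.1f} N "
          f"({100 * extra / base:.1f}%)")
```

Weight matters more in stop-and-go driving, where the car is repeatedly accelerated and this constant-speed model no longer applies, which fits with city numbers being the hard ones to pin down.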

alfadriver (Forum Supporter) MegaDork
3/5/21 9:17 p.m.

In reply to DaewooOfDeath :

Random in-use testing happens all the time, both by the regulatory agencies and by OEMs on an official basis. And with the relative affordability of PEMS (portable emissions measurement systems), there are a lot of people other than the agencies and OEMs out there testing cars. So that's already in place.

If I understand your point about qualitative evaluation, it's about being more robust to real-world conditions than the test is. Short of actual on-road testing, the testing we are required to do covers a pretty wide range of driving styles, including really aggressive driving at speeds over the legal limit. Hot, cool, and cold testing, with A/C, too. And the fuel economy, at least the figure advertised on the label, is calculated using 5 of the required cycles.

The reason I don't see the US trending toward an on-road test is 50 years of model correlation between test results and how the real world has developed.

Other than that, it's pretty much impossible to have "intent" rules vs. actual ones.

DaewooOfDeath SuperDork
3/5/21 11:33 p.m.

The goal of evaluation, in my view, has little to do with policing intent. The goal is to align the incentives of the evaluated party with the goals of the evaluator. This is harder to do with "objective" testing, when the evaluated party knows exactly how the testing will occur and when they stand to gain from focusing on the test itself.

alfadriver (Forum Supporter) MegaDork
3/6/21 7:58 a.m.

In reply to DaewooOfDeath :

OK, well, I would suggest that at this point you engage directly with the regulatory agencies. They actually have job openings right now. Unfortunately, the USAJobs site is down at the moment, or I would post all of the OTAQ job openings here in Ann Arbor. I'm sure there are also openings at the California ARB in Sacramento. Both are very much interested in lowering CO2 and criteria emissions.

At the very least, you should share your ideas with them.