Frontloading HQ: FHQ electoral college methodology

Monday, October 20, 2008

What About North Carolina? Can Obama Swing the Tar Heel State?

Yeah, what about North Carolina? Recently, FHQ has begun discussing the toss up states on our Watch List in terms of "magic numbers." Basically, this asks what it would take from the very next poll released from a state to force that state across, in this case, the partisan line. So, what is the Tar Heel state's magic number? This came up in the comments, and my response ended up being too long not to simply create a new post.

----

Part of the reason FHQ shifted from a simple weighted average to a graduated weighted average (one that progressively discounts polls based on when they were in the field) is that states like Minnesota and North Carolina were unresponsive to a series of new polls that ran counter to where our averages had each state.

Some of that unresponsiveness was remedied with the methodological change, but the change did not move either state as much as some would have liked. The bubble seems to have burst in Minnesota for McCain, so the North Star state has worked its way back to essentially where it was prior to the conventions -- a strong Obama state. (Sure, some of that has to do with the threshold being dropped.)

North Carolina, though, is a bit different. In the Tar Heel state we have witnessed a string of polls that have shown Obama ahead by margins up to 6 points with just a few pro-McCain polls peppered in. Yet, it is still seemingly stuck in the McCain toss up category. Much of this has to do with the amount of information we have in North Carolina. Even with the older polls discounted, there is an awful lot of McCain support inherent in the average. In other words, there are a lot of McCain polls for this recent series of Obama polls to overcome. The Tar Heel state has had around 50 polls conducted this year and none of them (other than the Zogby internet polls) favored Obama until after the Lehman collapse. That's a lot of McCain support in the average.

As far as a magic number is concerned, North Carolina is a lot like Ohio: it is going to take a lot to move things just a little. For the next poll to push North Carolina into the blue, it would have to give Obama a margin of 45 points. That's just not going to happen. But we may continue to see numbers come in under the current average of 2.4 (for McCain) that continue to chip away at that margin. In fact, I think that is likely. Between now and election day, we are likely to see polls that are in the +/-3 point range with some outliers thrown in.
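To see why a deep pile of older polls makes the magic number so large, here is a minimal Python sketch. It assumes a plain normalized weighted average with margins expressed as McCain minus Obama; the function name, the weights and the sample polls are hypothetical stand-ins for illustration, not FHQ's actual data or exact formula.

```python
def magic_number(past_margins, past_weights, new_weight=1.0):
    """Break-even margin for the next poll under a weighted average.

    Margins are McCain minus Obama, so a negative result is an Obama lead.
    Assumes the average is sum(w * m) / sum(w), which is a simplification
    and not necessarily FHQ's exact formula.
    """
    weighted_sum = sum(m * w for m, w in zip(past_margins, past_weights))
    # Solve (weighted_sum + new_weight * x) / (total weight) = 0 for x.
    return -weighted_sum / new_weight


# Hypothetical state: 40 discounted polls, each showing McCain +2.4.
margins = [2.4] * 40
weights = [0.4] * 40
needed = magic_number(margins, weights)
print(round(needed, 1))
```

In this made-up case the next poll would need to show Obama ahead by about 38 points just to pull the average to zero. The more discounted polls sit behind the average, the bigger that break-even number gets.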

Just for the heck of it, let's do an exercise here. What if we lopped off all the polls conducted before Obama clinched the Democratic nomination and kept everything from June 3 on? (FHQ has done something similar before.) How would that affect things? Once we reweight the polls based on a lower number of days in the period examined, we find that Obama gains, but McCain still leads by 1.7 points (down from 2.4). What is North Carolina's magic number then? Not surprisingly, it drops, but not to any more manageable a level. It would still require a poll with Obama ahead by 25 points to turn North Carolina blue. Obviously, the number of pro-Obama polls it would take to successfully chip away at that average and turn it blue would be far fewer in this instance.

This is in line with my thinking about North Carolina. I'm a native Tar Heel and though I'm not there now, I still have family ties to the state. My sense is that North Carolina is a "close but not quite" state for Obama. Sure, I've been out of the state for a while, but North Carolina still feels like a state that is a continued demographic shift away from becoming less reliably Republican -- at the presidential level -- and more reliably competitive. (And yes, that certainly strays from the black and white we get from the numbers typically leaned on here at FHQ.) It speaks to the Democratic tilt of this election that North Carolina is talked about in the same breath with the Ohios and Floridas on the map.

UPDATE: Our discussion in the comments has extended beyond North Carolina to encompass much of the South. Scott has taken the Census data on the African American percentage of the population and regressed Obama's support among whites in these states on it. A simple bivariate regression with some rather interesting results.

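Here are the states Scott looked at, each listed with Obama's support among whites followed by the African American percentage of the population in parentheses (all have at one point or another shown John McCain and Barack Obama within single digits of each other):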

Virginia: 39% (20%)
North Carolina: 38% (22%)
Georgia: 28% (30%)
South Carolina: 25% (29%)
Louisiana: 18% (32%)
Mississippi: 16% (37%)

Below is the plot of that relationship; one that shows a rather high correlation between the two variables. The data above are rank ordered based on the dependent variable (Obama's white support) and are displayed as such below.

Obama's White Support as a Function of the Percentage African American
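For anyone who wants to check the relationship behind the plot, here is a quick Python sketch of the bivariate regression using only the six data points listed above. The variable names are mine, and this is a plain least-squares fit, which may or may not match exactly how Scott ran it.

```python
# Data from the list above:
# (Obama's white support %, African American % of the population)
states = {
    "Virginia": (39, 20),
    "North Carolina": (38, 22),
    "Georgia": (28, 30),
    "South Carolina": (25, 29),
    "Louisiana": (18, 32),
    "Mississippi": (16, 37),
}

y = [white for white, _ in states.values()]   # dependent variable
x = [black for _, black in states.values()]   # independent variable
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products around the means.
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

slope = sxy / sxx                       # least-squares slope
intercept = mean_y - slope * mean_x     # least-squares intercept
r = sxy / (sxx * syy) ** 0.5            # Pearson correlation

print(f"slope={slope:.2f}, intercept={intercept:.1f}, r={r:.2f}")
```

On these six points the slope comes out at roughly -1.5 with a correlation near -0.97: the higher the African American share of the population, the lower Obama's support among whites. With only six states, that is suggestive and not much more.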

A big tip of the cap to SarahLawrenceScott for a nice addition to our discussion. Kudos!

Recent Posts:
The Electoral College Map (10/20/08)

The Electoral College Map (10/19/08)

The Electoral College Map (10/18/08)

Tuesday, October 7, 2008

Frequently Asked Questions: Electoral College Analysis

What data are you using to differentiate between states?

FHQ uses all state-level, trial-heat polls in its averages for each state. We use all the polls available to us since Super Tuesday, when the race for the Democratic nomination officially became a two-person race; one with two seemingly evenly matched candidates. The argument can be made that Obama was even in the race following his Iowa victory, but did not fully quash the "flash in the pan" argument until after the split of the contests on February 5.

Also, I use only the polls that avoid the selection bias inherent in internet-based polls or mail-in polls. As such, the three waves of Zogby Interactive polls are excluded as are the mail-in Columbus Dispatch polls.

Finally, the data used at this stage in the game is the data attendant to the "likely" voter samples. With a month to go, those samples are more accurate than they would have been only a couple of months ago. Also, in the event that a polling firm posts two different versions of a poll based on whether third party candidates are included, it is FHQ's policy to take the version with those candidates on the sample ballot.

Why use those past polls at all?

Indeed, why not just use the most recent poll or polls like everyone else? Well, if I'm just doing what everyone else is doing, why even do it? I can quit now and go look at what Pollster or Real Clear Politics, to name just a couple, have to say on the matter. That's part of the reasoning, but the main reason for the inclusion of past polls is to avoid the volatility of polling. FHQ doesn't want fluctuation for the sake of fluctuation. If one poll is an outlier, fine, but that one poll should not be able to fundamentally shift the average and the projected outcome of any given state. The past polls are included because they represent the feelings of a group of respondents at a particular point in the race. Those feelings may be latent in the current environment, but in FHQ's estimation, should be accounted for in some way, shape or form. If the McCain campaign were able to effectively make Jeremiah Wright an issue again, we could return to some degree to the polling distribution of that period. Will that happen? Maybe, maybe not, but that will be controlled for nonetheless.

How do you determine which states go into which categories on your map?

Early on in this process, it was simply a matter of averaging the polling data we had at our disposal. But as new polling data emerged, the older data served as an anchor on trends of the race -- at that time in the midst of the Democratic nomination battle. From May through the close of the nominating phase of the race, FHQ took the average of a state's polls, but discounted all but the three most recent polls. Following Clinton's withdrawal from the race, we took the opportunity to tweak that yet again, discounting all but the single most recent poll in a given state. The goal then was to make the average more responsive to developing trends in the race, but not responsive to the point that a single poll fundamentally shifted the outlook in a state.

That responsiveness balance is an important element here. Lately, as the polls have trended toward Obama, FHQ's averages have stagnated, moving very little in the face of the Obama flavor to the polls out in the wake of the economic situation on Wall Street. So we have once again fine-tuned our formula in the hopes of being responsive to a new direction in the race, but not simply responsive to one potentially outlier poll.

As I said in Saturday's electoral college post, our method of averaging serves us well in most states, but the exceptions are potentially consequential to the race for the White House. If you look at the Electoral College Spectrum, for example, that rank ordering of the states seems about right. The underlying averages in states like Florida, Minnesota and North Carolina, though, place them in positions outside of where the current trend would likely place them. At issue is the weighting formula for all the past polls backing up to Super Tuesday. All but the most recent poll had been discounted at the same rate and that meant that polls in March were treated the same as polls in September. Under the old configuration, that most recent poll counted as two-thirds of the average and all the other polls, treated with a blanket discount rate, accounted for the remaining one-third.

How exactly are you weighting those past polls?

As I explained above, FHQ's practice has been to discount each poll at the same rate. However, that is likely causing problems for the averages in some states. There is, then, a need to re-examine those weights specifically. The method we have settled on is to use what we are calling a graduated weighted average. And what that does is to discount polls in February at a level greater than more recent polls from August or September.

So, how exactly is FHQ weighting those past polls? The first step was to determine how many days there will have been between Super Tuesday (February 5) and election day (November 4). There are 273 days counting November 4, but that number won't be useful until that actual day. The real point of that determination is to assign a number to each date in between. February 6, then, was day one and yesterday, October 6, was day number 244. To determine the weight, the day number of the median point at which a poll was in the field is used as the numerator, while the day we are currently in -- today's numbers reflect yesterday's changes, so 244 -- is the denominator. That equation gives us the weight of any given poll. The poll numbers on that day are then multiplied by that weight.
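The day numbering and the basic weight described above can be sketched in a few lines of Python. The function names and the sample poll dates are hypothetical, chosen only to illustrate the arithmetic.

```python
from datetime import date

SUPER_TUESDAY = date(2008, 2, 5)  # day zero; February 6 is day one


def day_number(d):
    """Number of days elapsed since Super Tuesday."""
    return (d - SUPER_TUESDAY).days


def poll_weight(field_start, field_end, as_of):
    """Weight = day number of the poll's field midpoint / current day number."""
    median_day = (day_number(field_start) + day_number(field_end)) / 2
    return median_day / day_number(as_of)


# Hypothetical poll in the field September 29 through October 1,
# weighted as of October 6 (day 244).
w = poll_weight(date(2008, 9, 29), date(2008, 10, 1), date(2008, 10, 6))
print(day_number(date(2008, 10, 6)), round(w, 3))
```

A poll from early October counts at nearly full value under this scheme, while one fielded in late March (around day 50) would count at only about a fifth.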

However, there is one more twist I'll add to this. The effect this change has is only at the margins. Why? Well, there are a couple of things happening here. First, the graduated weighting essentially averages out to the blanket weight applied to all polls before. There are differences, but they are minimal in most cases. The other, related issue is that the relative weight of the most recent poll shrinks after the reweighting of the other polls. The blanket discount rate on past polls basically cut each past poll's value in half. Now that polling frequency has increased, though, there are a lot more polls that are at greater than 80% value. That threatens the preeminent position of the most recent poll. It is too much of an anchor on that poll. To confront this problem, and to give the most recent poll a little more oomph, we cut the weights in half. Relative to each other, then, the past polls are treated with the same basic weight they had before, but relative to the most recent poll they have been minimized.
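Here is a rough Python sketch of that last twist. Two things in it are my assumptions and not something spelled out above: the most recent poll is given a full weight of 1.0, and the weighted margins are normalized by the sum of the weights. The function names and sample numbers are hypothetical.

```python
def graduated_weights(poll_days, current_day):
    """Weights for polls sorted oldest to newest.

    Past polls get half of (field day / current day).
    ASSUMPTION: the most recent poll keeps a full weight of 1.0.
    """
    weights = [0.5 * day / current_day for day in poll_days[:-1]]
    weights.append(1.0)
    return weights


def weighted_margin(margins, weights):
    """ASSUMPTION: a normalized weighted average of the poll margins."""
    return sum(m * w for m, w in zip(margins, weights)) / sum(weights)


# Hypothetical polls fielded on days 61, 122 and 240, viewed from day 244.
# Margins are McCain minus Obama, so negative means an Obama lead.
days = [61, 122, 240]
margins = [4.0, 2.0, -3.0]
weights = graduated_weights(days, 244)
print(weights, round(weighted_margin(margins, weights), 2))
```

Halving leaves the past polls in the same proportion to one another (the day 122 poll still counts twice as much as the day 61 poll) while shrinking their combined pull against the newest poll.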

Why are the thresholds between categories on the map where they are?

For much of this process, the threshold between a strong state for either candidate and a lean state was arbitrarily set at a 10 point margin. Likewise, the margin separating a lean state from a toss up state was 5 points. However, as we have approached election day, it has obviously become more difficult for the candidates to make up enough ground to, if not overtake the other candidate in a state, become competitive there. In a nod to that fact, the thresholds were dropped to 9 points and 4 points, respectively, following the first debate. After the final debate, with just less than three weeks left in the race, the threshold between the toss up and lean states will be dropped again, to 3 points. By then, it probably will not be necessary to discuss the race in terms of three categories; it will be a question of which states are close and which states aren't. However, FHQ will evaluate where the potential breaking point is between the lean states and the strong states at that time. It may not be necessary to talk about lean states, but that distinction does add an element of clarity to how we perceive all the states in relation to each other.
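The category rules boil down to two cutoffs on the size of the average margin. Here is a small Python sketch; the function name is hypothetical, and how a margin sitting exactly on a cutoff gets treated is my assumption, since that is not specified above.

```python
def categorize(margin, toss_up_cutoff=4.0, strong_cutoff=9.0):
    """Place a state by the size of its average margin, in points.

    Defaults are the post-first-debate thresholds (4 and 9 points).
    ASSUMPTION: a margin exactly on a cutoff goes to the stronger category.
    """
    size = abs(margin)  # the sign only says which candidate leads
    if size < toss_up_cutoff:
        return "toss up"
    if size < strong_cutoff:
        return "lean"
    return "strong"


print(categorize(2.4))              # a 2.4 point average margin
print(categorize(9.5, 5.0, 10.0))   # a 9.5 point margin, original cutoffs
print(categorize(9.5))              # same margin, lowered cutoffs
```

The last two calls show how a state can change category without its average moving at all, simply because the cutoffs were lowered.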

Wednesday, October 1, 2008

Here's the Deal...

Alright folks. This is a change election, or so I've been told, and to have an electoral college analysis that does not respond well to changes in polling is not necessarily a plus. I can see the writing on the wall and have now for a few days.

So here's the plan, both short and long term.

Tonight, I'll update the map as if there was no change to the formula and then have a few words to say in a separate post about the Muhlenberg polling discussion that sprang out of last night's update. It will likely not be tonight, but tomorrow I will revise the formula behind the map and see if we can kick start things around here.

Now the big question is, well, why didn't you make this change after McCain's convention in St. Paul? Things were moving in his direction then. Why favor Obama with a change to the formula now? These are all valid and good questions. [I should have thought of them myself.] The reason is that conventions are part of what Jim Campbell would call the predictable campaign. We expected McCain to get at least something of a bounce out of the convention. The average's job in that scenario wasn't to mute the shift toward McCain, but to account for the likely temporary nature of the fluctuation. What we are witnessing now in the polls is something different and the mountain of past polls in our data set is too much of an anchor on the new -- and different -- data we now have. In other words, some revision is necessary to capture the true nature of the change. Whereas the convention bounce was temporary, the movement now likely isn't.

I'll be back shortly. I need to add in the afternoon polls to the averages.

Recent Posts:
The Electoral College Map (10/1/08)

The Electoral College Map (9/30/08)

The Electoral College Map (9/29/08)