Converting Quarterly Data into Monthly Using Cubic Spline Interpolation

(See this post at Chamberlain Economics, L.L.C.)

A common problem with time-series data is getting them into the right time interval. Some data are daily or weekly, while others are in monthly, quarterly or annual intervals. Since most regression models require consistent time intervals, an econometrician's first job is usually getting the data into the same frequency.

In this post I'll explain how to solve a common problem I've run into: how to divide quarterly data into monthly data. To do so, we'll use a method known as "cubic spline interpolation."

Cubic Spline Interpolation
One of the most widely used data sources in economics is the National Income and Product Accounts (NIPAs) from the U.S. Bureau of Economic Analysis. They're the official source for U.S. GDP, personal income, trade flows and more. Unfortunately, most data are published only quarterly or annually. So if you're hoping to run a regression using monthly observations -- for example, this simple estimate of the price elasticity of demand for gasoline -- you'll need to split these quarterly data into monthly ones.

A common way to do this is by "cubic spline interpolation." Here's how it works. We start with n quarterly data points. That means we have n-1 spaces between them. Across each space, we draw a unique 3rd-degree (or "cubic") polynomial connecting the two points. This is called a "piecewise polynomial" function.

To make sure our connecting lines form a smooth line, we force all our first and second derivatives to be continuous; that is, at each connecting point we make them equal to the derivitive on either side. When all these requirements are met -- along with a couple end-point conditions you can read about here -- we have a (4n-4) x (4n-4) linear system that can be solved for the coefficients of all n-1 cubic polynomials.

Once we have these n-1 piecewise polynomials, we can plug in x values for whatever time intervals we want: monthly, weekly or even daily. The polynomials will give us a pretty good interpolation between our known quarterly data points.

An Example Using MATLAB
While the above method seems simple, doing cubic splines by hand is not. A spline for just four data points requires setting up and solving a 12 x 12 linear system, then manually evaluating three different polynomials at the desired x values. That's a lot of work. To get a sense of how hard this is, here's my own Excel file showing what's involved in fitting a cubic spline to four data points by hand.

In practice, the best way to do a cubic spline is to use MATLAB. It takes about five minutes. Here's how to do it.

MATLAB has a built-in "spline()" function that does the dirty work of cubic spline interpolation for you. It requires three inputs: a list of x values from the quarterly data you want to split; a list of y values from the quarterly data; and a list of x values for the monthly time intervals you want. The spline() function formulates the n-1 cubic polynomials, evaluates them at your desired x values, and gives you a list of interpolated monthly y values.

Here's an Excel file showing how to use MATLAB to split quarterly data into monthly. In the file, the first two columns are quarterly values from BEA's Personal Income series. Our goal is to convert these into monthly values. The next three columns (highlighted in yellow) are the three inputs MATLAB needs: the original quarterly x values (x); the original quarterly y values (y); and the desired monthly x values (xx).

In the Excel file, note that the first quarter is listed as month 2, the second quarter as month 5, and so on. Why is this? BEA's quarterly data represent an average value over the three-month quarter. That means they should be treated as a mid-point of the quarter. For Q1 that's month 2, for Q2 that's month 5, and so on.

The next step is to open MATLAB and paste in these three columns of data. In MATLAB, type " x = [ ", cut and paste the column of x values in from Excel, type " ] " and hit return. This creates an n x 1 vector with the x values. Repeat this for the y, and xx values in the Excel file.

Once you have x, y, and xx defined in MATLAB, type "yy = spline(x,y,xx)" and hit return. This will create a new vector yy with the interpolated monthly y values we're looking for. Each entry in yy will correspond to one of the x values you specified in the xx vector.

Copy these yy values from MATLAB, paste them into Excel, and we're done. We now have an estimated monthly Personal Income series.

Here's an Excel file summarizing the above example for splitting quarterly Personal Income data into monthly using MATLAB. Also, here's a MATLAB file with the x, y, xx, and yy vectors from the above exercise.

Posted by Andrew on Wednesday January 20, 2010 | Feedback?



* * *


The Linear Algebra View of the Fibonacci Sequence

The Fibonacci sequence is a beautiful mathematical concept, making surprise appearances in everything from seashell patterns to the Parthenon. It's easy to write down the first few terms -- it starts with 0, 1, 1, 2, 3, 5, 8, 11, with each term equal to the sum of the previous two. But what about the 100th or 100,000th term? Can we find them without doing thousands of calculations?

In a previous post, I explained how simple linear models can help us understand where systems like this are headed in the long run. In this post, I'll explain how the same tools can help us see the "long run" values of the Fibonacci sequence. The result is an elegant model that illustrates the connection between the Fibonacci sequence and the so-called "golden ratio", an aesthetic principle appearing throughout art, architecture, music, book design and more.

Setting Up the Model
Each term in the Fibonacci sequence equals the sum of the previous two. That is, if we let Fn denote the nth term in the sequence we can write . To make this into a linear system, we need at least one more equation. Let's use an easy one: . Putting these together, here's our system of linear equations for the Fibonacci sequence:





We can write this in matrix notation as the following:



Here's what this is saying. If we start with the vector on the right and multiply it by this 2 x 2 matrix, the result is the vector on the left. That is, multiplying our starting vector by the matrix above gives us the next element in the sequence. The n-1 element from the right becomes the n element on the left, and the n-2 element becomes the n-1 element. Each time we multiply by this matrix we're moving the system from one term of the Fibonacci sequence to the next.

As explained here, the general form for dynamic linear models like this is:



where u0 is the initial state, u1 is the final state, and A is the transformation matrix that moves us from one state to the next. As explained in the earlier post, we can use eigenvalues and eigenvectors to solve for the long run values of systems like this.

Here's how we do that. Let S be a 2 x 2 matrix where the columns are the two eigenvectors of our transformation matrix A. And let be a 2 x 2 matrix with the two eigenvalues of A along the diagonal and zeros elsewhere. By the definition of eigenvalues and eigenvectors, we have the following identity:



Solving for A, we have:



Plugging this into our above equation relating u0 and u1, we have the following system:



This equation relates the initial state vector u0 to the next state u1. But each time we multiply by A it moves us to the next state, and the next state, and so on. Imagine we multiply k times. We get the following general form:



This equation is what we're looking for. It allows us to quickly find the kth term in the Fibonacci sequence with a simple calculation. It relies only on the initial state vector u0 and the eigenvalues and eigenvectors of the transformation matrix A. That's our linear model.

Putting the Pieces Together
Now that we have a model, let's find the S, and other parts that make it up.

The first step is to find the eigenvalues of the A matrix. This is given by:



I won't do the math here, but if you write down the characteristic polynomial and solve for the two eigenvalues, you'll get the following:





For a quick check that these are right, note that the trace of A is 1. The sum of the two eigenvalues is also 1. Since the eigenvalues must always sum to the trace, we've got the right values here.

The next step is to find the two eigenvectors that correspond to the eigenvalues. Let's start with . To do this, we write the following:





This implies that . Our "free" variable here is x2, so we'll let that equal 1. Then we can solve for x1. We get the following:







Using the old algebra trick for the difference of squares -- that is, -- we can simplify this by multiplying both numerator and denominator by as follows:



So our first eigenvector v1 is equal to:



Following the same process for give us the other eigenvector:



Note that the vectors v1 and v2 are orthogonal to each other. That is, the dot product between them is zero. This comes from a basic theorem of linear algebra: every symmetric matrix will have orthogonal eigenvectors. Since A is symmetric, our two eigenvectors are perpendicular to each other.

Now that we have the eigenvalues and eigenvectors, we can write the S and matrices as follows:





To complete our model, we also need to know the inverse of the S matrix. Thankfully, there's a simple formula for the inverse of a 2 x 2 matrix. If A is given by:



then the inverse of A is found by transposing the diagonal terms, putting a negative sign on the off-diagonal terms, and multiplying by 1/(determinant of A), or



Using that formula, we get:



As explained in our earlier post, our final step is to write our initial state vector u0 as a combination of the columns of S. That is, we can write:



where c is a 2 x 1 vector of scalars. Solving for c, we get:



For this example, let's let our initial state vector u0 be (1,1). These are the second and third terms of the Fibonacci sequence. Note that you can use any two subsequent terms for this step -- I'm just using (1,1) because I like the way the math works for it. So we have:



Since , that means



The Result
Putting everything together, we can write our final model for the Fibonacci sequence:





Multiplying this out, we get the following extensive form of the model:



This equation gives the k+3 and k+2 terms of the Fibonacci sequence as a function of just one variable: k. This allows us to easily find any term we'd like -- just plug in k. For example, imagine we want the 100th term. Simply let k = 98, and solve the above for F101 and F100. This is a huge number, by the way -- 218,922,995,834,555,000,000 -- something you can easily verify in this Excel spreadsheet. (Note that whether this is the 99th or 100th term depends on whether you label 0 or 1 to be the first term of the sequence; here I've made zero the first term, but many others use 1.)

Some Analysis
So what happens to the above system as k grows very large? Do the terms in the Fibonacci sequence display some regular pattern as we move outward?

All changes from one term to the next are determined by k in the above model. Imagine k grows very large. In the second term in the equation, we can see that



That means that as k gets bigger, the second term in the equation goes to zero. That leaves only the first term. That means as



So as k grows large, the ratio of the k+1 to the k term in the Fibonacci sequence approaches a constant, or



This is a pretty amazing result. As we move far out in the Fibonacci sequence, the ratio of two subsequent terms approaches a constant. And that constant is equal to the first eigenvalue of our linear system above.

More importantly, this value is also equal to the famous "golden ratio", which appears in myriad forms throughout Western art, architecture, music and more. In the limit, the ratio of subsequent terms in the Fibonacci sequence equals to the golden ratio -- something that's not obvious at all, but which we can verify analytically using our model above.

If you'd like to see how this works in a spreadsheet, here's an Excel file where you can plug in values for k and find the k+2 and k+1 terms in the sequence.

Posted by Andrew on Friday October 16, 2009 | Feedback?



* * *


New Study of the Waxman-Markey Cap-and-Trade Bill

We've released my latest study today, examining the economic impact of the Waxman-Markey cap-and-trade bill (H.R. 2454) passed by the U.S. House of Representatives in June.

The study is basically a critique of recent Congressional Budget Office (CBO) distributional estimates that suggest the bill’s impact is likely to be progressive across income groups. We find the bill is much more likely to be regressive across income groups once the microeconomic response of regulated public utilities is taken into account. Under this framework, we estimate the bill will result in net benefits to the nation’s highest-earning 20 percent of earners, while imposing net costs on the lowest-earning 80 percent of U.S. households.

Check out the full study and news release here.

Posted by Andrew on Wednesday September 30, 2009 | Feedback?



* * *


Deriving the Poisson Distribution from the Binomial Distribution

At first glance, the binomial distribution and the Poisson distribution seem unrelated. But a closer look reveals a pretty interesting relationship. It turns out the Poisson distribution is just a special case of the binomial -- where the number of trials is large, and the probability of success in any given one is small.

In this post I'll walk through a simple proof showing that the Poisson distribution is really just the binomial with n approaching infinity and p approaching zero.

The Proof
The binomial distribution works when we have a fixed number of events n, each with a constant probability of success p. Imagine we don't know the number of trials that will happen. Instead, we only know the average number of successes per time period. So we know the rate of successes per day, but not the number of trials n or the probability of success p that led to that rate.

Define a number . Let this be the rate of successes per day. It's equal to np. That's the number of trials n -- however many there are -- times the chance of success p for each of those trials. Think of it like this: if the chance of success is p and we run n trials per day, we'll observe np successes per day on average. That's our observed success rate lambda.

Recall that the binomial distribution looks like this:



As mentioned above, let's define lambda as follows:



Solving for p, we get:



What we're going to do here is substitute this expression for p into the binomial distribution above, and take the limit as n goes to infinity, and try to come up with something useful. That is,



Pulling out the constants and and splitting the term on the right that's to the power of (n-k) into a term to the power of n and one to the power of -k, we get



Now let's take the limit of this right-hand side one term at a time. We'll do this in three steps. The first step is to find the limit of



In the numerator, we can expand n! into n terms of (n)(n-1)(n-2)...(1). And in the denominator, we can expand (n-k) into n-k terms of (n-k)(n-k-1)(n-k-2)...(1). That is,



Written this way, it's clear that many of terms on the top and bottom cancel out. The (n-k)(n-k-1)...(1) terms cancel from both the numerator and denominator, leaving the following:



Since we canceled out n-k terms, the numerator here is left with k terms, from n to n-k+1. So this has k terms in the numerator, and k terms in the denominator since n is to the power of k. Expanding out the numerator and denominator we can rewrite this as:



This has k terms. Clearly, every one of these k terms approaches 1 as n approaches infinity. So we know this portion of the problem just simplifies to one. So we're done with the first step.

The second step is to find the limit of the term in the middle of our equation, which is



Recall that the definition of e = 2.718... is given by the following:



Our goal here is to find a way to manipulate our expression to look more like the definition of e, which we know the limit of. Let's define a number x as . Now let's substitute this into our expression and take the limit as follows:



This terms just simplifies to e^(-lambda). So we're done with our second step. That leaves only one more term for us to find the limit of. Our third and final step is to find the limit of the last term on the right, which is



This is pretty simple. As n approaches infinity, this term becomes 1^(-k) which is equal to one. And that takes care of our last term.

Putting these three results together, we can rewrite our original limit as



This just simplifies to the following:



This is equal to the familiar probability density function for the Poisson distribution, which gives us the probability of k successes per period given our parameter lambda. So we've shown that the Poisson distribution is just a special case of the binomial, in which the number of n trials grows to infinity and the chance of success in any particular trial approaches zero. And that completes the proof.

Posted by Andrew on Monday September 14, 2009 | Feedback?



* * *


A Simple Explanation of Why Lagrange Multipliers Actually Works

The method of Lagrange multipliers is the economist's workhorse for solving optimization problems. The technique is a centerpiece of economic theory, but unfortunately it's usually taught poorly. Most textbooks focus on mechanically cranking out formulas, leaving students mystified about why it actually works to begin with.

In this post, I'll explain a simple way of seeing why Lagrange multipliers actually do what they do -- that is, solve constrained optimization problems through the use of a semi-mysterious Lagrangian function.

Some Background
Before you can see why the method works, you've got to know something about gradients. For functions of one variable there is -- usually -- one first derivative. For functions of n variables, there are n first derivatives. A gradient is just a vector that collects all the function's partial first derivatives in one place.

Each element in the gradient is one of the function's partial first derivatives. An easy way to think of a gradient is that if we pick a point on some function, it gives us the "direction" the function is heading. If our function is labeled , the notation for the gradient of f is .

The most important thing to know about gradients is that they always point in the direction of a function's steepest slope at a given point. To help illustrate this, take a look at the drawing below. It illustrates how gradients work for a two-variable function of x1 and x2.

The function f in the drawing forms a hill. Toward the peak I've drawn two regions where we hold the height of f constant at some level a. These are called level curves of f, and they're marked f = a1, and f = a2.



Imagine yourself standing on one of those level curves. Think of a hiking trail on a mountainside. Standing on the trail, in what direction is the mountain steepest? Clearly the steepest direction is straight up the hill, perpendicular to the trail. In the drawing, these paths of steepest ascent are marked with arrows. These are the gradients at various points along the level curves. Just as the steepest hike is always perpendicular to our trail, the gradients of f are always perpendicular to its level curves.

That's the key idea here: level curves are where , and .

How the Method Works
To see how Lagrange multipliers work, take a look at the drawing below. I've redrawn the function f from above, along with a constraint g = c. In the drawing, the constraint is a plane that cuts through our hillside. I've also drawn a couple level curves of f.



Our goal here is to climb as high on the hill as we can, given that we can't move any higher than where the constraint g = c cuts the hill. In the drawing, the boundary where the constraint cuts the function is marked with a heavy line. Along that line are the highest points we can reach without stepping over our constraint. That's an obvious place to start looking for a constrained maximum.

Imagine hiking from left to right on the constraint line. As we gain elevation, we walk through various level curves of f. I've marked two in the picture. At each level curve, imagine checking its slope -- that is, the slope of a tangent line to it -- and comparing that to the slope on the constraint where we're standing. If our slope is greater than the level curve, we can reach a higher point on the hill if we keep moving right. If our slope is less than the level curve -- say, toward the right where our constraint line is declining -- we need to move backward to the left to reach a higher point.

When we reach a point where the slope of the constraint line just equals the slope of the level curve, we've moved as high as we can. That is, we've reached our constrained maximum. Any movement from that point will take us downhill. In the figure, this point is marked with a large arrow pointing toward the peak.

At that point, the level curve f = a2 and the constraint have the same slope. That means they're parallel and point in the same direction. But as we saw above, gradients are always perpendicular to level curves. So if these two curves are parallel, their gradients must also be parallel. That means the gradients of f and g both point in the same direction, and differ at most by a scalar. Let's call that scalar . That is,



Solving for zero, we get



This is the condition that must hold when we've reached the maximum of f subject to the constraint g = c.

Now, if we're clever we can write a single equation that will capture this idea. This is where the familiar Lagrangian equation comes in:



or more explicitly,



To see how this equation works, watch what happens when follow the usual Lagrangian procedure.

First, we find the three partial first derivatives of L,

, ,

and set them equal to zero. That is, we need to set the gradient equal to zero.

To find , we take the three partial derivatives of L with respect to x1, x2 and lambda. Then we place each as an element in a 3 x 1 vector. That gives us the following:



Recall that we have two "rules" to follow here. First, the gradients of f and g must point in the same direction, or . And second, we have to satisfy our constraint, or .

The first and second elements of make sure the first rule is followed. That is, they force , assuring that the gradients of f and g both point in the same direction. The third element of is simply a trick to make sure g = c, which is our constraint. In the Lagrangian function, when we take the partial derivative with respect to , it simply returns back to us our original constraint equation.

At this point, we have three equations in three unknowns. So we can solve this for the optimal values of x1 and x2 that maximize f subject to our constraint. And we're done. So the bottom line is that Lagrange multipliers is really just an algorithm that finds where the gradient of a function points in the same direction as the gradients of its constraints, while also satisfying those constraints.

As with most areas of mathematics, once you see to the bottom of things -- in this case, that optimization is really just hill-climbing, which everyone understands -- things are a lot simpler than most economists make them out to be.

Posted by Andrew on Friday August 21, 2009 | Feedback?



* * *


Deriving the Trigonometric Functions Without Geometry

Most people first meet sine and cosine in a basic geometry class. The lesson starts with a right triangle, and derives sine and cosine diagramtically using angles, lines and ratios. This is the "triangle view" of the trigonometric functions.

Most people are suprised to learn that's not the only way to derive them. While the triangle view of sine and cosine is intuitive, it actually conceals many interesting properties. But more importantly, the focus on lines and angles gives many people the impression that sine and cosine are "something from geometry" -- not objects with their own interesting properties and deep connections to other branches of math.

In this post, I'll explain a more elegant way of deriving sine and cosine that uses no geometry. This is the "real analysis" view. We'll start with a single assumption, and develop them analytically based on our earlier discussion of the Taylor series expansion (read here).

Starting Simple
Imagine a function with the following property. The function's second derivative is equal to the negative of the function itself. That is,



This is our starting assumption. It's just another way of saying that sine and cosine -- whatever they are -- need to have this one property. If we start with sine or cosine and take their derivative twice, we should end up back at the original function with a negative sign on it. Our goal is to show how to get from this assumption to actual algebraic functions for sine and cosine.

Let's take a closer look at our assumption. It relates the function to its second derivative -- a simple differential equation. But what about the other derivitives? To find them, let's take the next few derivatives of our assumption and see what happens.

Differentiating both sides and solving each time for f'', f''' and so on, we get the following













We can already see the pattern here. As we take more and more derivatives, we always end up back at either f or f'. The only thing that changes is the positive or negative sign in front.

Now that we have n derivitives of f, we're ready to use our earlier finding about Taylor series expansions and write an expression for f.

Recall the Taylor expansion of f around the point c is given by



Letting c = 0 -- which is technically a Maclaurin rather than a Taylor -- we get the following expansion



But as we've seen above, we already have expressions for the second, third and so on derivitives of f. They're all either f, -f, f' or -f'. Substituting these in, we get the following



Notice the pattern in the signs. The expansion starts with two positive terms, then has two negative terms, then has two positive terms, and so on. The pattern repeats forever, and each term involves only f or f'.

Since the function is really based two parameters -- f(0) and f'(0) -- we can rewrite this in a more useful way as follows. Label our two parameters c1 and c2. That is, let f(0) = c1, and let f'(0) = c2. Rewriting with this notation we get



Now let's pick two sets of values for our parameters c1 and c2. These will define two distinct functions, both of which will satisfy our initial assumption.

First, let c1 = 0, and c2 = 1. Plugging into the above, we get the following function for sine





This is the sine function we're looking for. When x = 0, the above function is zero. When x = pi/2 the function is one. And it works for all real values of x.

For our second case, let c1 = 1, and c2 = 0. Plugging this into our expansion, we get the following for cosine:





This is the cosine function we're looking for. When x = 0, the function is one. When x = pi/2, it's equal to zero. And like the sine function above, it works for all .

We can easily check that these both satisfy our initial assumption that f'' = -f. And we can also verify another basic relationship between sine and cosine from calculus by taking the first derivative of each, which is that



and also



Now that we've developed functions for sine and cosine, we can go on to derive all the other trigonometric functions -- tangent, cosecent, and so on -- from these.

For more on the "real analysis" view of sine and cosine, see this famous primer, especially Chapter 7, Section 4.

Posted by Andrew on Thursday June 18, 2009 | Feedback?



* * *


A Sewing-Circle Method for Estimating Pi

Here's another method for estimating pi that basically requires a hardwood floor, a bag of sewing needles and plenty of time.

Imagine standing over a hardwood floor. Notice the parallel lines between the boards. Now imagine dropping a single sewing needle onto it. It bounces, and comes to rest. Question: What is the probability the needle will come to rest on a line?

The picture below illustrates the problem. The distance between the lines is marked d, and the length of the needle is marked k. Let's assume k is less than or equal to d. The illustration shows two possible resting points for the needle.



Looking at the picture, it's easy to see how to approach the problem. All we care about is the horizontal "width" of the needle when it comes to rest, relative to the total distance between the lines d. The ratio of the former to the latter will be the probability of landing on a line.

In the picture, the horizontal width of the needle is given by . We use the absolute value since physical length is always positive. This is marked in the picture.

Here's how we reason this out. First, the probability that the needle will land at any particular angle is just . And given the angle it happens to land at, the probability it will cross a line is , which is just the horizontal width of the needle divided by the distance between lines. (Note this only works for k <= d.)

Putting these two probabilities together -- since both events must occur at the same time -- we get the probability that a particular needle at angle theta will cross a line as



But this is only for one theta. What about the others? To find the probability that a dropped needle will hit a line for every possible theta, we have the following integral



Pulling out the constants, we can re-write this as



The key to simplifying this is to note the following relationship between ordinary cosine and the absolute value of cosine. For ordinary cosine, the integral between 0 and 2pi is just zero -- half the time it's positive over that range, and half the time it's negative. But we want the absolute value of cosine, which is always positive.

The picture below illustrates how to translate between the two.



The picture shows regular cosine plotted between 0 and 2pi. In absolute value terms, the areas under the curve marked 1, 2, 3, and 4 are all equal. The two positive areas exactly equal the two negative areas. That's another way of saying that the integral of from 0 to 2pi is equal to four times the integral of from 0 to pi/2.

That means we can re-write our integral as











So the probability that a needle will land on a line is just , as long as k is less than or equal to d.

Now imagine we're clever and choose sewing needles exactly the length of the width of boards in our floor. In that case, k = d. Then the probability it will hit a line becomes



Now we have an expression we can use to estimate pi. The process is simple.

Start tossing needles on the floor. Count how many land on the lines, as a percentage of total throws. That's P(k,d). Then divide this number into 2. The result will be pretty close to pi. The more needles you throw, the better the estimate will get.

I won't bore you with the enormous literature of mathematicians actually trying to estimate pi using this awkward method. But for those who want to see how this works with a Java-based simulation, see here.

Posted by Andrew on Wednesday June 3, 2009 | Feedback?



* * *


An Easy Way to Remember the Taylor Series Expansion

The Taylor expansion is one of the most beautiful ideas in mathematics. The intuition is simple: most functions are smooth over ranges we’re interested in. And polynomials are also smooth. So for every smooth function, we should be able to write down a polynomial that approximates it pretty well. And in fact, if our polynomial contains enough terms it will exactly equal the original function. Since polynomials are easier to work with than almost any other kind of function, this usually turns hard problems into easy ones.

The Taylor formula is the key. It gives us an equation for the polynomial expansion for every smooth function f. However, while the intuition behind it is simple, the actual formula is not. It can be pretty daunting for beginners, and even experts have a hard time remembering if they haven’t seen it for a while.

In this post, I’ll explain a quick and easy trick I use to re-derive the Taylor formula from scratch whenever I have trouble remembering it.

Starting from Scratch
The idea behind the Taylor expansion is that we can re-write every smooth function as an infinite sum of polynomial terms. The first step is therefore to write down a general nth-degree polynomial. Here it is:



Where a0, a1, … are coefficients on each polynomial term, and c is a constant that represents where along the x-axis we want to start our approximation (if we don’t care where we start, just let c = 0, which is technically known as a Maclaurin rather than a Taylor). This series can be written in closed form as the following:



The goal here is to find a clever way to find the coefficients a0, a1, … in that equation, given some function f and an initial value of c. Here is the logic for doing that.

Polynomials are smooth, so that guarantees they’re differentiable. That is, we can calculate the first, second, third and so on derivatives of them. So starting with our polynomial above, let’s take the first few derivatives of it, like this:









Clearly we’re seeing a pattern already. We’ll use that in a minute. Now that we have n derivatives of f, let’s evaluate them for some number that will cause most of their terms to drop away. This is the key step. If we’re clever, we’ll notice that if we evaluate them at x = c, most of their terms will go to zero. That will leave behind only the coefficients a1, a2, … multiplied by some constant. So here’s that step:













Now we have a set of simple equations we can solve for a1, a2, … Simply divide both sides by n!. That gives us the following:













The pattern here is beautiful. The nth coefficient is just the nth derivative of the original function, evaluated at c, divided by n factorial. Now we have our n coefficients. The next step is to plug them back into our beginning expression for a general nth-degree polynomial, like this:





This equation is what we’re looking for. It gives a polynomial expansion for every smooth function f. We just need to calculate the first n derivatives of f, evaluate them at c, divide each one by n!, and sum up these terms. The result will be a good approximation to our original function. The more terms we add on, the more accurate the polynomial approximation will be.

The Result: the Taylor Formula
The final step is to write this infinite series in closed form. This is the last step in the trick for remembering the formula. Writing the above as a summation, we get our final result:



The lesson here is simple: don’t waste your time learning formulas. Learn methods. If you can remember the basic logic of where the Taylor expansion comes from, you can quickly and easily re-derive the formula from scratch.

To see how this works in a spreadsheet, here's an Excel file that gives a step-by-step process for the Taylor expansion of f(x) = 2^x at x = 3, using 6 polynomial terms.

Posted by Andrew on Friday May 29, 2009 | Feedback?



* * *


Some Interesting Mathematics of the Rule of 72

Most people first encounter the “rule of 72” in a basic finance course. The idea is simple: divide 72 by your growth rate times 100, and the result is the number of years it takes to double the investment. However, there's some interesting math going on behind this rule that most people never learn.

In this post I’ll explain the hidden math of the rule of 72, show you how it’s derived algebraically, and point out some important limitations.

Some Background
The key idea behind the rule of 72 is that it’s designed for quick mental calculations. Seventy-two gives a decent ballpark approximation for doubling times. But more importantly, 72 is evenly divisible by lots of common growth rates -- 2, 3, 4, 6, 8, 9, 12, 18, and so on. That makes it ideal for back-of-the-envelope approximations.

However, 72 has some major drawbacks. It's inaccurate in many cases, sometimes giving answers that are off by years. Recognizing this, some textbooks offer more accurate (but harder to use) alternatives. Some recommend a “rule of 70,” while others suggest a “rule of 69.” While these are both harder to use, they're considerably more accurate in many cases. As we'll see in a minute, the rule of 69 is probably the most mathematically defensible of all, despite being pretty useless for mental calculation.

Deriving the Optimal Rule of Thumb
So where do these rules come from? In this section, I'll show you how they're derived algebraically from basic compound interest formulas.

When we say we're looking for a rule of thumb for doubling time, here's what we really mean. We want some constant M (like 72, 70, 69, etc.) that we can divide by our growth rate times 100 and get a good estimate of the time to double an investment. The first step is to write an expression for the value of the investment at time t, given some initial value a0. Then we set this expression equal to twice the initial value, or 2a0. That is



The a0s cancel out on both sides of the above, leaving us with



Taking the natural log of both sides and solving for t, we get



This expression gives the time t it takes to double the investment, given some r and k. What we want is a number M such that when we divide by 100r, it gives us t. That is, we want an M such that M/(100r) = t. Solving for M, we see that M = t(100r). So our rule of thumb M is simply given by 100r times our equation for t above, or



As you can see, the optimal rule of thumb M is a function of both r and k. So for different growth rates and frequency of compounding, you’ll want different rules of thumb. The ubiquitous M=72 is really only a good rule of thumb for a very small number of cases.

The graph below is a plot of the function M(r,k) for r between 1 percent and 30 percent and k between 1 and 30. R and k are shown on the horizontal axes, and the optimal rule M is given on the vertical axis. The colored bands show areas where each rule is optimal. For example, the light green band near the center shows where the rule of 70 is optimal, and the flat red plane shows where the rule of 69 is best, and so on.



As you can see, the optimal rules are all over the map for small values of k. In the extreme case of k=1, the usual rules of 70 and 72 are only close for growth rates between 1 percent and 10 percent. Beyond that they quickly break down. When k=1, the optimal rule M grows linearly as r increases, so no rule holds up well in that range. Overall, 72 looks pretty lousy from this perspective -- it's the small aqua-blue band in the center-left. Seventy looks slightly better -- the green band corresponding to 70 covers much more area -- but it also only works over a pretty small range.

A Special Case: The Rule for Continuous Compounding
The rule that seems to hold up best is actually the awkward rule of 69, which corresponds to the broad, flat plane of red on the right-hand side of the graph. It works for a huge range of values as the function flattens out to the right and roughly approaches 69. Beyond k=15, the graph is essentially flat regardless of the growth rate r. This is no surprise, as the rule of 69 turns out to be a special case for continuously compounded growth rates.

With continuously compounded growth, k is very large. To see how this affects our optimal rule M, let's return to our first equation for the time to double our investment.

If growth is continuously compounded, we have the following equation for the value at time t



Again, the a0s cancel out on both sides, giving us



Solving for t we get



Since we're looking for an M such that M/(100r)=t, or M=(100r)t, we get the following optimal rule M



So with continuously compounded growth, the optimal rule becomes pretty simple. Just divide (100)ln 2 -- or roughly 69.3 -- by your growth rate times 100. In the graph, that’s the value our function M is approaching as it flattens out to the right. Put differently, the limit of M as k approaches infinity is (100)ln 2 or about 69.3.

We can easily see this result analytically by taking the limit of the function M as k grows to infinity. That is, we can find the following limit



To work out this limit, we need to do some tricky algebra to the denominator. First, let's move the k from the left side up into the exponent on the log to its right. Then let's define a new number x such that x = k/r. This trick will let us collapse most of the denominator into the constant e, using the fact that e equals the limit of (1 + 1/x)^x as x grows to infinity.

Substituting in x = k/r, we get



Since x = k/r, letting k grow to infinity is equivalent to letting x grow to infinity. But we know that the term in the denominator (1+1/x)^x just becomes e as x grows to infinity. So we have



And that's the result we alluded to above. As k grows, the function M converges very quickly to 69.314... So that's the optimal rule for a huge number of values for r and k -- infinitely many. So in some long-run mathematical sense, the most defensible rule actually turns out to be the awkward and practically useless rule of 69.

To see how this argument works in a spreadsheet, here's an Excel file with the equations, data and graph from above.

Posted by Andrew on Tuesday May 26, 2009 | Feedback?



* * *


A Barroom Method for Estimating Pi

Here's a method for estimating pi that basically requires a dartboard, faith in the relative frequency interpretation of probability, and plenty of time.

Step one, build yourself a dartboard that looks like this:



The area of the circular dart board is . And the area of the square behind the board is .

Imagine throwing darts at this board. What's the probability it will land inside the circle? Easy enough -- it's the area of the circle divided by the area of the square, or





Now we have an expression we can use to estimate pi. The process is simple.

Start throwing darts at the board. Be sure to throw randomly -- you'll need a uniform distribution of throws. Count the percentage that fall inside the circle. Then multiply this percentage by four. The result will be pretty close to pi. The more darts you throw, the better the estimate will get.

For those who want to see how this works in a spreadsheet, here's an Excel file with a simulation for n = 500 and n = 10,000 using random numbers from random.org for the x and y coordinates of the dart throws.

Update: A reader via email suggests another interesting method known as "Buffon's Needle Problem," which you can read about here.

Posted by Andrew on Monday May 11, 2009 | Feedback?



* * *


Walking Through the Bolzano-Weierstrass Theorem

Economic theory relies pretty heavily on a handful of mathematical proofs. One of the most important for proving the existence of equilibria is the Bolzano-Weierstrass theorem.

In this post I'll give a basic explanation of the Bolzano-Weierstrass theorem, and walk you through a simple proof of it.

Some Background
Bolzano-Weierstrass is a theorem about sequences of real numbers. Sequences can either be convergent -- that is, they can approach some long-run value -- or they can be divergent and blow up to infinity. For many equilibrium concepts, it's necessary to show that a given sequence is convergent and stays within some reasonable set of values as we move far out along the sequence in the long run.

Bolzano-Weierstrass basically says that we'll always end up with a convergent subsequence if we start with the right kind of sequence to begin with. Specifically, it says that if we start with a closed and bounded sequence -- that is, if our sequence is finite and contains its own endpoints -- then it's guaranteed to have a subsequence that converges. Often that means the equilibrium we're interested in actually exists, which is a good thing.

In the rest of this post, I'll walk through a typical proof of the theorem. Different variations on this basic proof are used pretty widely in the more mathematical branches of economics such as game theory and general equibrium theory.

Proving the Theorem
Here's the formal statement of the Bolzano-Weierstrass theorem: Every closed and bounded real sequence has a convergent subsequence.

Proof. Let be a bounded real sequence. Then there exists a closed interval [c, d] such that for all positive integers n -- that is, for all .

Now divide the interval [c,d] into the following two intervals:



Since the sequence is bounded, infinitely many terms of it are contained in the original interval [c,d]. Now that we've split that interval in two, one of those two intervals must contain for infinitely many positive integers n. Call this interval .

Now repeat this step, dividing into the following:



Similarly, one of these intervals must also contain for infinitely many positive integers n. Call this . Continuing this process, we get a sequence of intervals such that



where each of these intervals contains for infinitely many positive integers n, and the width of each of the k intervals is given by



Now pick a positive integer such that . Since the next interval contains for infinitely many positive integers n, we can now choose an such that . Continuing this process, we can obtain elements such that



and



This means the newly created sequence is a subsequence of . So we've established that every bounded real sequence has a subsequence. The next step is to go further and show that this subsequence actually converges.

To see why this subsequence converges, consider the illustration below. It shows the original interval [c, d] at the top, with the subsequent intervals [c1, d1], [c2, d2], ...] we created by repeatedly dividing it below.



Note that each interval is half the length of the previous interval, and that the boundaries of each interval are always moving inward or staying the same -- they're never moving outward. That is, the sequence of lower bounds is always moving inward as k grows. Similarly, the sequence of upper bounds is always pushing inward as k grows. That means both the lower and upper bounds are monotone and bounded. And that guarantees they are both convergent.

Let's label the value that the lower bounds converge to as . Similarly, let's label the value the upper bounds converge to as . Using our equation for the width of each interval from above, we know that



Taking the limit of both sides, we see that









So this shows that both the upper and lower bounds converge to the same value of L = M.

We saw above that , so that means for all k. But since and converge to the same limit of L = M, by the so-called "squeeze theorem" we know that must also converge to the same limit of L = M.

That shows that every closed and bounded real sequence -- that is, every sequence defined on a compact set -- has a subsequence , and that this subsequence is convergent. And that completes the proof.

For more on Bolzano-Weierstrass, check out the Wiki article here.

Posted by Andrew on Monday May 11, 2009 | Feedback?



* * *


Using Eigenvectors to Solve Dynamic Linear Systems

Each year, some fraction of Seattle's population moves to Portland. And some fraction of Portland's population moves back to Seattle. If this keeps up year after year, what happens in the long run? Will the populations reach a steady state, or will one city empty out into the other?

These are questions we can answer with simple linear algebraic models. In this post I'll explain how to set up models for problems like this, and how to solve them for their long-run values.

Setting Up the Problem
Here are some basic facts. Imagine Seattle and Portland both have populations of 500,000. Each year, 5 percent of Seattle's population moves to Portland and 20 percent of Portland's population moves to Seattle. The rest don't move. This pattern repeats every year, forever.

The first step is to write this as a linear system. The idea is that we have a system -- in this case, the population in both cities -- that starts at an initial state u0 at time t = 0. At time t = 1, some transformation happens, taking us to state u1. Then the transformation is repeated at time t = 2, 3, ... bringing us to states u2, u3, and so on. The question we're after is what does the system look like after k steps, and what happens in the long run when k gets large?

In our problem, we can write the initial state u0 as a vector of populations in the two cities. The first entry is Seattle's population and the second entry is Portland's:



The population shifts between the two cities -- that is, the transformation that happens at time t = 1, 2, ... -- can be described by this matrix:



Here's how to read this. The first column and first row describe Seattle's population, and the second column and second row describe Portland's. Reading down the first column, it says 95 percent of Seattle's population stays in Seattle, while 5 percent moves to Portland. Similarly, the second column tells us that 20 percent of Portland's population moves to Seattle, while 80 percent stays at home. Since the columns of A all sum to 1, this is what's known as a Markov matrix.

Putting these together, our model works like this. Start with the vector describing the initial state, u0. To move to state u1, we multiply by the matrix A. That is,



To move to state u2, u3, ... we again multiply by A, or




That's our basic model describing the population in the two cites after k steps. In the rest of this post, I'll explain how to solve it for the long-run equilibrium.

Finding Eigenvalues
For any linear system, the key to understanding how it behaves over time is through eigenvalues and eigenvectors. We need to get to the bottom of what the matrix A is doing to the system each time it is multiplied. The way to see that is by examining A's eigenvalues and eigenvectors.

The first step is to find the eigenvalues of A. Since we've got a 2x2 matrix, we'll normally expect to find two of them.

Here's how to find them. Start with the definition of eigenvalues as,



To solve for we first rewrite this as



Looking at this expression, we definitely know that the matrix isn't invertible for any non-zero vector x. If isn't invertible, that means its determinant must be zero. Using that fact, we can solve for the eigenvalues.

In our case, we have a 2x2 matrix which has a pretty simple determinant,



That last equation is called the "characteristic polynomial" of A. It's what we solve to find the eigenvalues. In this case, it's a quadratic since we're working with a 2x2 matrix, which will have two solutions. If the problem was for n cities instead, we'd end up with a characteristic polynomial of degree n.

In this case, we have an easy solution to the polynomial:



These two s are the two eigenvalues of A.

It's no coincidence that one of our eigenvalues is 1. That's always the case with Markov matrices. And that makes the characteristic polynomial easy to solve. We always know that the trace of A -- the sum of A's elements down the diagonal -- is equal to the sum of the eigenvalues. If one of them is 1, then the other must be 1.75 minus 1 or .75. Those are our two solutions, so we can immediately factor and solve.

Finding Eigenvectors
Now that we have the two eigenvalues, the next step is to find the eigenvectors x1 and x2. Here's how we do that. For each eigenvalue and , we find the null space of the matrix . That is, we find the vectors x that make .

We do this by using row reduction on until we get it in row echelon form. Then we can pick off the pivots and free variables, and work out the eigenvectors.

First take the case of . Then the matrix becomes



Adding row one to row two, we get



So our equation becomes,



Since -.05 is attached to our "pivot" and .20 is attached to our "free" variable, we set our free variable x2 equal to 1 and solve for the pivot variable x1, or



So our first eigenvector x1 is



Following the same procedure for the other eigenvalue , we find the second eigenvector x2 is



Creating S and Λ Matrices
The next step is to form a new matrix S which has the eigenvectors found above as its columns. That is, let



Also, let's form a new matrix which has the eigenvalues we found above along the diagonal and zeros elsewhere. That is, let



We know from the definition of eigenvalues and eigenvectors, . Now that we've got our eigenvectors in the columns of the matrix S, and our eigenvalues are along the diagonal of , we can rewrite our basic relationship as



Solving this for A, we can see an important relationship as the system moves from one state to the next:



Note what happens to this equation if we multiply by A over and over, moving from the first state u1 to the second, third, and so on:



Our original model was in the form . But now we've found that just equals . So we can rewrite our model as



But we can compress this model even further. It's possible to write the initial state vector u0 as a linear combination of the two eigenvectors x1 and x2. That is, we can write



which can be solved for c as



This expression for c can now be substituted back into the model above to get



This is the model we've been working to find. It says the kth state of our model is equal to the matrix of eigenvectors S times the matrix of eigenvalues raised to the power of k, times some vector c that gives combinations of them.

Interpreting the Model
Expanding out the expression , we can see better why this is a useful way to model our population problem,



What this is saying is that the kth state of our model is just the sum of each of the eigenvectors, times their corresponding eigenvalues raised to the power of k, times some constant c. Written this way, it's easy to see where this model is headed in the long run. All changes as we move forward in time are determined by the eigenvalues and .

If the absolute value of an eigenvalue is less than one -- that is, if -1 < < 1 -- it will rapidly decay and go to zero as k grows. If the absolute value is > 1, it will explode and grow without bound as k grows. And if it equals one, it's what we call a "steady state" of the model since it remains unchanged as k grows.

In our case, because we set up our problem using population proportions that always sum to one, our matrix A was a Markov matrix. It always has an eigenvalue that equals one, and the rest have absolute values less than one. That means our model is guaranteed to have a steady state, as any other factors influencing the model will decay and disappear over time as k grows.

The Solution
Plugging in the values for our population problem above, here's our actual solution,





At this point, it's easy to see what will happen to this model in the long run. Let k grow very large. The term on the right will rapidly shrink to zero -- this is the "transitory" component of the model. The term on the left just gets multiplied over and over by 1. So that's our steady state, or



This is the long-run equilibrium for our model. What it means is that starting with populations of 500,000 each in Seattle and Portland, migration slowly eroded Portland and expanded Seattle. And in the long run, Seattle's population levels off at 800,000 and Portland's shrinks to 200,000.

For those who want to see how this works in an Excel spreadsheet, here's a file with the above model.

[Here's a post where I explain how to use this method to generate the (k+1)th term of the Fibonacci sequence given some input k.]

Posted by Andrew on Sunday April 19, 2009 | Feedback?



* * *


A Different View of Least-Squares Regression

Linear regression is the most important statistical tool most people ever learn. However, the way it's usually taught makes it hard to see the essence of what regression is really doing.

Most courses focus on the "calculus" view. In this view, regression starts with a large algebraic expression for the sum of the squared distances between each observed point and a hypothetical line. The expression is then minimized by taking the first derivative, setting it equal to zero, and doing a ton of algebra until we arrive at our regression coefficients.

Most textbooks walk students through one painful calculation of this, and thereafter rely on statistical packages like Stata or Excel -- practically inviting students to become dependent on software and never develop deep intuition about what's going on.

This is the way people who don't understand math teach regression. In this post I'll illustrate a more elegant view of least-squares regression -- the so-called "linear algebra" view.

The Problem
The goal of regression is to fit a mathematical model to a set of observed points. Say we're collecting data on the number of machine failures per day in some factory. Imagine we've got three data points:

(day, number of failures)
(1,1)
(2,2)
(3,2)

The goal is to find a linear equation that fits these points. We believe there's an underlying mathematical relationship that maps "days" uniquely to "number of machine failures," or in the form , where b is the number of failures per day, x is the day, and C and D are the regression coefficients we're looking for.

We can write these three data points as a simple linear system like this:



For the first two points the model is a perfect linear system. When x = 1, b = 1; and when x = 2, b = 2. But things go wrong when we reach the third point. When x = 3, b = 2 again, so we already know the three points don't sit on a line and our model will be an approximation at best.

Now that we have a linear system we're in the world of linear algebra. That's good news, since it helps us step back and see the big picture. Rather than hundreds of numbers and algebraic terms, we only have to deal with a few vectors and matrices.

Here's our linear system in the matrix form Ax = b:



What this is saying is that we hope the vector b lies in the column space of A, C(A). That is, we're hoping there's some linear combination of the columns of A that gives us our vector of observed b values. Unfortunately, we already know b doesn't fit our model perfectly. That means it's outside the column space of A. This is just another way of saying A is not an invertible matrix. It's not possible for us to find an A^-1. So we can't simply solve that equation for the vector x.

Let's look at a picture of what's going on. In the drawing below the column space of A is marked C(A). It forms a flat plane in three-space. If we think of the columns of A as vectors a1 and a2, the plane is all possible linear combinations of a1 and a2. These are marked in the picture.



By contrast, the vector of observed values b doesn't lie in the plane. It sticks up in some direction, marked "b" in the drawing. The plane C(A) is really just our hoped-for mathematical model. And the errant vector b is our observed data that unfortunately doesn't fit the model.

So what should we do? The linear regression answer is that we should forget about finding a model that perfectly fits b, and instead swap out b for another vector that's pretty close to it but that fits our model. Specifically, we want to pick a vector p that's in the column space of A, but is also as close as possible to b.

The picture below illustrates the process. Think of shining a flashlight down onto b from above. This casts a shadow onto C(A). This is the projection of the vector b onto the column space of A. This projection is labeled p in the drawing.



The line marked e is the "error" between our observed vector b and the projected vector p that we're planning to use instead. The goal is to choose the vector p to make e as small as possible. That is, we want to minimize the error between the vector p used in the model and the observed vector b.

In the drawing, e is just the observed vector b minus the projection p, or b - p. And the projection itself is just a combination of the columns of A -- that's why it's in the column space after all -- so it's equal to A times some vector x-hat.

To minimize e, we want to choose a p that's perpendicular to the error vector e, but points in the same direction as b. In the figure, the intersection between e and p is marked with a 90-degree angle. The geometry makes it pretty obvious what's going on. We started with b, which doesn't fit the model, and then switched to p, which is a pretty good approximation and has the virtue of sitting in the column space of A.

Solving for Regression Coefficients
Since the vector e is perpendicular to the plane of A's column space, that means the dot product between them must be zero. That is,



But since e = b - p, and p = A times x-hat, we get,



Solving for x-hat, we get



The elements of the vector X-hat are the estimated regression coefficients C and D we're looking for. They minimize the distance e between the model and the observed data in an elegant way that uses no calculus or explicit algebraic sums.

Here's an easy way to remember how this works. Doing linear regression is just trying to solve Ax = b. But if any of the observed points in b deviate from the model, A won't be an invertible matrix. So instead we force it to become invertible by multiplying both sides by the transpose of A. The transpose of A times A will always be square and symmetric, so it's always invertible. Then we just solve for x-hat.

There are other good things about this view as well. For one, it's a lot easier to interpret the correlation coefficient r. If our x and y data points are normalized about their means -- that is, if we subtract their mean from each observed value -- r is just the cosine of the angle between b and the flat plane in the drawing.

Cosine ranges from -1 to 1, just like r. If the regression is perfect, r = 1, which means b lies in the plane. If b lies in the plane, the angle between them is zero, which makes sense since cos 0 = 1. If the regression is terrible, r = 0, and b points perpendicular to the plane. In that case, the angle between them is 90 degrees or pi/2 radians. This makes sense also, since the cos (pi/2) = 0 as well.

For those who want to try this at home, here's a handy Excel file summarizing the argument here, along with the cut-and-paste formulas you'll need in Excel.

Posted by Andrew on Friday April 10, 2009 | Feedback?



* * *


New Study of Household Burdens from a U.S. Cap-and-Trade System

We’ve released a new study today exploring the annual cost of a typical cap-and-trade system to U.S. households.

The study uses a standard Leontief input-output model to estimate how cap-and-trade burdens are passed down to households in the form of higher prices for carbon-intensive products. We present distributional estimates by income groups, age groups, family types and U.S. regions. The paper is No. 6 in the Working Paper series at the Tax Foundation in Washington, D.C.

The full study is here.

Posted by Andrew on Monday March 16, 2009 | Feedback?



* * *


Deriving Tax Incidence from Price Elasticities

(See this post at Chamberlain Economics, LLC.)

All good economics starts with theory. The world is a complicated place—far too complex to make sense of directly. Economic theory helps collapse that complexity into a few key relationships we can work out mathematically and check against the facts. The first step in every analysis is to sit down with pencil and pad to work out the theory.

To help our clients better understand the economic theory underlying our work, we’ll be posting an ongoing series of articles titled “Core Concepts” at the Chamberlain Economics, L.L.C. site. The goal is to provide a collection of simple and brief introductions to the core theoretical concepts used in our work.

As the first in the series we’ve posted “Core Concepts: The Economics of Tax Incidence”. The piece is designed as a refresher on the basics of tax incidence, and how it’s derived analytically from elasticities of supply and demand in the marketplace. This idea serves as the foundation for nearly all of our work on tax modeling and policy analysis.

Check out the article here.

Posted by Andrew on Saturday December 13, 2008 | Feedback?



* * *


New Study of the "Gang of Ten" Energy Bill's Section 199 Repeal

We released my latest study today, which explores the tax burden and economic impact of the proposed “Gang of Ten” energy bill currently working its way through Congress.

The energy bill proposes roughly $84 billion in new federal spending on conservation and alternative energy programs, to be funded partly by corporate income tax changes affecting the oil and gas industry. My study explores the tax side of the proposal, estimating how the tax burden will be split between labor and capital, and how these burdens will impact the broader U.S. economy.

You can download a draft of the study in the “Publications” section at the new Chamberlain Economics, L.L.C. website. Here’s the abstract:

The recently released “Gang of Ten” energy proposal includes revenue offsets that would exclude domestic oil and gas companies from the Section 199 deduction for domestic production activity. Using a simple input-output model we estimate the state-by-state impact of this proposal on tax burdens, employment, household earnings and economic output. We estimate the proposal will increase corporate tax burdens by approximately $13.57 billion over ten years, 44 percent of which will fall on households in the petroleum manufacturing states of Texas, California and Louisiana. Using RIMS II multipliers we estimate the proposal will reduce U.S. employment by roughly 637,000 jobs over ten years, reduce household earnings by $34.97 billion, and reduce total U.S. economic output by $185.95 billion.

A draft of the full study is available here.

(Note: For the official published version of the study from the Institute for Energy Research see here.)

Posted by Andrew on Wednesday September 10, 2008 | Feedback?



* * *


Creating an Input-Output Table from BEA Data: A How-to Guide

(See this article at Chamberlain Economics, L.L.C. also.)

One of the hard parts about building Leontief input-output models is that the source data are hard to use.

Instead of producing a real input-output table, the Bureau of Economic Analysis (BEA) produces what they call “use” and “make” tables. The make table shows products produced by each industry, while the use table shows how products get used by industries, consumers and government. However, what we need for Leontief models is a table that shows only the industry-by-industry relationships.

In this post I’ll explain how to create an input-output table from BEA’s make and use tables. At the bottom, I’ve posted a spreadsheet with an I-O table I developed from the new 2002 BEA Benchmark Input-Output Data.

Building the I-O Table
The first step is to download the BEA’s use and make tables. The easiest ones to work with are the “Standard Make and Use Tables at the summary level,” which you’ll find at http://www.bea.gov/industry/xls/2002summary_makeuse.xls. These have 133 industry groups, which is enough detail for most research.

Once you’ve downloaded them, open the use table. Delete the two rows labeled “Noncomparable Imports” and “ROW adjustment”. There’s no domestic industry that produces these things, so you won’t need them in your industry-by-industry table. Next, label the intermediate-uses portion of the Use table “U”—you can label ranges in Excel by typing names in the upper-left box in the toolbar.

Next, open the make table. This is a 133×134 table showing industries down the rows and the products they make across the columns. Below the make table, create a new 133×134 matrix where each element is equal to the corresponding make table element divided by the column total. Label this matrix “M”.

Now we’re ready to build the I-O table. The classic table has three sections, which we’ll build one at a time: intermediate uses, final demands, and industry value added.

First, let’s build the intermediate-uses section. Call this “S”. You’ll calculate the “S” matrix by multiplying M*U, which we’ve defined above. This will create a 133×133 matrix that’s the intermediate-uses part of the table. Set this aside for now.

Next let’s calculate the final demands part of the table. That’s the right-hand section showing how much output from each industry is used by consumers, government and the rest of the world. In the use table, label the final demands portion of the table “D”. Create a new matrix labeled “Di” by multiplying M*D. This gives us a 133×13 matrix of final demands for the I-O table. This Di matrix will sit to the right of the intermediate portion of the input-output table (“S”) we calculated above. As a final step, add a column on the far right of the table that sums the intermediate uses and final demand for each row. This is total output for each industry.

Finally, let’s fill in the value-added section at the bottom of the table. To do this, first copy the total industry output we added to the far right column and transpose it into the bottom row of the table. This forces the row output of each industry to equal their column output, which is a fundamental equality in I-O tables.

Once you’ve pasted in the total output line, you’ll need to use the entries in the “Gross Operating Surplus” row in the table as balancing items to get the column totals for each industry to equal the row totals.

Once the row and column totals are equal—that is, output used for intermediate uses + final demands = inputs purchased + value added for all 133 industries—we’re done with our I-O table.

Bottom Line: 2002 Input-Output Table for the U.S.
Following the process above, you should end up with an I-O table that looks something like this. Here’s my own table built from the 2002 BEA Benchmark tables, which you’re free to use:

2002 U.S. Input-Output Table

Once you’ve got an I-O table like this, it’s pretty easy to turn it into a Leontief model. This will let you model the distributional impact of carbon taxes, measure tax pyramiding of gross receipts taxes, and more.

In a future post, I’ll share a couple of these models I’ve put together over the last year or so.

Posted by Andrew on Saturday August 9, 2008 | Feedback?



* * *


Estimating Price Elasticity of Gasoline: A How-To Guide

(See this article at Chamberlain Economics, L.L.C. also.)

Like most concepts in economics, price elasticity is easy to talk about but hard to measure. Most people understand the basic idea—some things have lots of substitutes while others we can’t do without. But few learn how to actually measure elasticity in the real world.

In this post, I’ll show you a simple way economists actually measure elasticities. We’ll develop a simple theory, write it down mathematically, find some real-world data, and crunch the numbers in Excel. At the end, I’ll hand over a spreadsheet with my own elasticity estimates for retail gasoline that replicate the numbers from a well-known recent econometric study.

Start with the Theory
Our goal here is to estimate the price elasticity of demand for retail gasoline. The first step is to start with a theory about the demand for gas.

The simplest theory is that we know gasoline — like everything else — should have a demand curve. What should it look like? In the simplest case, it should be driven by two things: the price of gas, and how much income people have. If gas prices rise consumption should fall; conversely, if income goes up gas consumption should rise also.

So there’s our theory. Now, let’s write it down mathematically. If gas demand is a function of prices and income, one way we can write it is like this:

G = a*P + b*Y

Where:

G = Gallons of gas demanded per year
P = The price of gas
Y = Average income in the economy
a, b = Coefficients for the magnitude of the impact of prices and income on gas demand. (Note: According ot our theory, the “a” coefficient on prices should be a negative number and the “b” coefficient on income should be positive.)

Now that we’ve got a theory written down, the next step is to translate it into a form that we can estimate in the real world. Think of the theory as an architect’s drawing — it’s a guide, but our goal is to actually to build the darn thing with hammer and nails.

To do this, think about real-world factors that might complicate our simple theory. For one, we should probably control for population by using per capita figures. Next, we should control for inflation by inflation-adjusting everything. Finally, we should control for seasonal variation somehow, since gas demand always peaks in summer and slows in winter.

Taking these messy details into account, here’s how we translate our theory into a relationship we can actually estimate. Economists call this “specifying the model”:

Gij = A + a*Pij + b*Yij + ei + eij

Where:

G = Per capita gas demand in month i and year j
A = The y-intercept term in our linear demand curve
Pij = The inflation-adjusted gas price in month i and year j
Yij = Real per capita disposable personal income in month i and year j
a, b = Coefficients on price and income
ei = A dummy variable for the month of the year to control for seasonal variation (there are actually eleven of these, one for each month January through November; they’re one if it’s the month in question and zero otherwise); this is called “seasonal fixed effects”
eij = A mean-zero random error term for month i and year j.

This way of specifying our model is called “linear”. This isn’t the only way to do it — at the end I’ll mention another way called “double log” that has some advantages. But for now, we’re ready to collect some data and run a regression.

Go Find the Data
The best source for data is always official government sources. Here’s the data we’ll use for this:

1. Gallons of gas demanded: We’ll use data from the U.S. Energy Information Administration. It’s called “product supplied”. The numbers are in barrels, so you’ll have to multiply them by 42 to convert them to gallons:

http://tonto.eia.doe.gov/dnav/pet/hist/mgfupus1m.htm

2. Gas prices: We’ll use numbers from the U.S. Bureau of Labor Statistics here. It’s from their “average price data” series, and it’s the monthly retail price of gas:

http://www.bls.gov/cpi/home.htm

3. Income: We’ll use data from the U.S. Bureau of Economic Analysis for this one. It’s called “disposable personal income”, and it comes from line 26 on Table 2.1 from the National Income and Product Account (NIPA) tables. Note that you’ll have to split these quarterly data into monthly values using cubic splines interpolation or some other method — here’s a post on how to do that:

http://www.bea.gov/national/nipaweb/SelectTable.asp?Selected=N#S2

4. Something to inflation-adjust prices and income: For this one, we’ll use the implicit price deflator for GDP from the Bureau of Economic Analysis. It’s on line 1 of NIPA Table 1.1.9:

http://www.bea.gov/national/nipaweb/SelectTable.asp?Selected=N#S1

5. Population figures to turn gallons and income into per capita figures: This is a hard one, because we need monthly figures and the U.S. Census Bureau only produces annual figures. Also, it’s hard to piece together a consistent series from before and after each decennial census. Thankfully I’ve done the hard work for you — the Excel files below include a monthly population series I put together myself.

Once you’ve compiled these data in columns in an Excel sheet, you’re ready to run your regression. When you do, you should find something like this:

For 2000-2007, the coefficient on gas prices should be about -1.07, and the coefficient on income should be about 0.0007. Using the formula for price elasticity of E = (Average price over the period/Average quantity over the period)*(price coefficient), that implies a price elasticity of demand of about -0.048 and an income elasticity of about 0.51.

And that’s about what we’d expect. We know short-run demand for gas is inelastic, and has a negative relationship with prices. And we know that income should be positively related to gas demand, which it is.

By the way, the above method is based on a well-known 2006 study from Hughes, Knittel and Sperling which you can read at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=930730. If you’d like to try the regression yourself, here are the files you’ll need:

Average U.S. Gas Prices 1978-2007
Gasoline Demand 1945-2007
GDP Price Deflators 1947-2007
Personal Income Data 1947-2007
US Population 1900-2007

For those who want to see how everything should look in a final Excel file, here are my own estimates. They’re basically similar to what Hughes, Knittel and Sperling found. The first file uses the linear specification above. The second one uses a “double log” specification, which basically takes the log of the data. The big advantage of the latter is that the regression coefficients are also the price and income elasticities, which is handy:

Price elasticity of demand for gasoline: Linear model.
Price elasticity of demand for gasoline: Double log model.

If you want the full data as STATA files instead, here they are:

Price elasticity of demand for gasoline: Linear model.
Price elasticity of demand for gasoline: Double log model.

Posted by Andrew on Friday August 1, 2008 | Feedback?



* * *


New report on tax burdens and government spending by age groups

Released a new report today exploring the distribution of tax burdens and government expenditures by age group in 2004. It’s a spin-off project from the fiscal incidence model we built back in March. Here are some key findings:

• As the Baby Boom generation prepares to retire, lawmakers should be aware of the distribution of taxes and government spending across age groups.

• America’s youngest households aged 25 and under received $2.32 in government spending for each dollar of taxes paid in 2004. Middle-aged households aged 45 to 54 received $0.73 per tax dollar, and America’s oldest households aged 75 and over received $4.93 per dollar of taxes paid;

• As a group, households aged 35 to 64 pay more in taxes than they receive in government spending, while households under age 35 and over age 64 receive more government spending than they pay in taxes. Overall between $376 billion and $872 billion per year is fiscally transferred from middle-aged groups to the youngest and oldest Americans each year through government taxes and spending;

• Over a lifetime, government spending follows a U-shaped pattern, with large education and welfare spending in youth and large Social Security and Medicare payments in old age. But even within each age group, there are large differences in taxes and government spending across households at different income levels.

Full report is here.

Posted by Andrew on Monday June 4, 2007 | Feedback?



* * *


New York Sun on fiscal incidence study

Our fiscal incidence study got a nice write up in this morning’s New York Sun. The author compares our findings to an influential new study of tax progressivity from Profs. Thomas Piketty and Emmanuel Saez:

Messrs. Chamberlain and Prante took an entirely different approach [from Piketty and Saez]. Rather than focusing solely on federal individual taxes, they analyzed the distribution of all taxes paid — federal, state, and local. They looked at fifths of the population and didn’t produce data for fractions of the top percent. They analyzed the period between 1991 and 2004, thereby avoiding the pitfalls of comparing incomes before and after 1988.

And they added another dimension to the analysis. They analyzed government spending and which income groups it benefits — all government spending for goods and services, and also transfer payments to individuals, such as unemployment benefits and Social Security.

Households in the two lowest quintiles — a quintile is one-fifth of the whole — received 51% of all government spending because they received more transfer payments. Messrs. Chamberlain and Prante concluded that “both taxes and spending appear to have large distributional effects on households,” and that our tax system is very progressive. The share of total taxes paid by the top quintile rose to 49% in 2004 from 46% in 1991, after peaking at 51% in 2000.

Households in the lowest fifth of incomes received about $8 in federal, state, and local spending for every tax dollar they paid, whereas households in the top fifth of earners received only 41 cents. This shows a tax system with substantial progressiveness.

Looking at the burden of taxes paid in light of benefits received makes far more sense than looking at these taxes in isolation, so Messrs. Chamberlain and Prante are more persuasive.

Full piece is here.

Posted by Andrew on Friday April 20, 2007 | Feedback?



* * *