A Basic Guide to Correlation in Excel

Do you have a problem concerning statistics and you cannot stand the thought of doing quantitative analysis? If you are studying a course where you don't have a choice on the matter, this may be a good text to read. Not because you really want to, but it is by far the easiest statistical test to perform for most basic economic, psychological, political, and other courses.


What is Correlation?


While different industries have different uses for measuring correlation, correlation generally tends to refer to the measure to which two variables move in relation to each other. The most common measurement of correlations, Person’s Coefficient, can be easily measured in Excel. The value of the coefficient can be between -1 and +1.


Values of the coefficient ranging between 0 and 0.3 are considered weak correlation. If the coefficient value is between 0.3 and 0.7, this is considered a moderately strong correlation. Anything higher than 0.7 is considered to be a strong correlation. Don’t get confused by the potentially negative results you may get. A positive correlation (such as 0.85) means that two variables tend to move together, while a negative correlation (such as -0.72) means that as the value of one variable goes up, the value of the other variable tends to go down.


A Practical Example of Correlation


Let’s use a practical example. If we take a look at your monthly spending and the amount of money in your bank, it is almost certain that there will be a strong negative correlation between these two variables. Unless you are very lucky, the more you spend, the more you are likely to have less money in the bank. Similarly, when we look at the amount of money invested in healthcare, it is logical that the expected living age will increase as a result of the government investing more money into the health of its citizens.


Why do We Need Correlation?


Reding the previous paragraph, it may seem that a correlation test is completely redundant. Actually, it’s very important because everything mentioned in the previous paragraph is an example of a hypothesis that needs to be tested – it is not an actual example of correlation. For the example of negative correlation, someone can be spending more money because his wage increased or he got a large inheritance, meaning that the assumption that there is a negative correlation is entirely wrong.

Similarly, a government may be making a huge increase in healthcare spending because there is an outbreak of a deadly virus – meaning that there will once again likely be no correlation between the two variables. Regardless of how much you can believe two events to be connected, you cannot prove correlation without performing a statistical test for it. Don’t worry, this is relatively easy to do in Excel and there are numerous free websites where you can download the data for free.


Finding the Data


Usually, you won’t have much trouble finding the data to perform a relevant statistical test. If you’re searching for macroeconomic data, you’ll find a lot of data available on sites such as the World Bank or country-specific statistical agencies. If you need data such as stock price, Yahoo Finance usually has a solid archive of relevant stock indexes.


Preparing the Data for the Test

Data for Correlation Test









To provide a quick example, I’ve downloaded the data for the past year for three stock indexes – the Germen DAX Index, the American S&P, and the Chinese SSE Composite Index. Important note if you’re comparing stock indexes – always use the close values. Based on this data, I will test two hypothesis that ‘’feel true’’ and that can be tested by using correlation:

Hypothesis 1: there will likely be a strong correlation between all three variables due to the interconnectivity of the global economy.

Null hypothesis: the interconnectivity of global economies, even if present, does not have to lead to a strong correlation between the country’s stock indexed.

Hypothesis 2: the correlation between the German DAX Index and S&P will be stronger than the correlation between the correlation between the SSE and any of the western indexes.

Null hypothesis: the geographic or political positioning of countries does not have an impact on the correlation between stock indexes.


Performing the Correlation Test


Note that there is something called a ‘’Null hypothesis’’. This is the hypothesis I should objectively accept if my results prove that I am wrong. After such a long build-up, the actual way of how to perform this test may seem like a bit of a letdown. The only order you have to input in Excel is ‘’=pearson(array 1, array 2)’’ where under array 1 and 2 you select the data you are analyzing. The results can be seen in the picture below.

Results of the Pearson Correlation Test

The correlation between S&P and DAX is close to nonexistent, there is a moderate correlation between DAX and SSE, and the SSE and S&P actually have a moderate negative correlation. Meaning, my ‘’feel true’’ hypotheses both fall through and I have to accept the null hypothesis – at least until I find more data or come up with a better hypothesis.

The result provided by the correlation test is actually quite logical. China and the US have been engaged in a trade war for the better part of the past year for the observed time period, making it perfectly logical for their stock indexes to be negatively correlated. The US has been so isolated in trade decisions during the Trump administration that it should come as no shock that the German and Chinese stock index are more connected than that of US and Germany.


Interpreting the Results


Remember the correlation vs. causation dilemma that likely bores you to death? Well, don’t mix the two up. Correlation does not imply causation. This means that regardless of how high the correlation between your two variables, this does not prove that one of the variables causes another. If you want to prove this, there are other methods you should try out such as the Granger causality test.

While even Granger causality implies a very specific form of causation, this is something you shouldn’t have to worry about the moment.  Just don’t interpret the result as ‘’an increase in the German stock index causes an increase in the Chinese stock index’’. This is just false. The correct interpretation is that the German stock index values tend to move with that of the Chinese stock index.

While you will get no specific idea on what causes what, as more specific forms of statistical analysis are required for that, hopefully the above example will show you that correlation can be a useful tool to test some hypotheses. If nothing else, it can help you write your midterm report with as little quantitative analysis as possible.