How to reduce size of large Excel files?

Introduction

Last week, I solved an interesting problem with a customer, who had a very large Excel file compared to others with practically the same amount of data.

And I share here the solution.

The problem

When we work with files - especially Microsoft Office files - moving data, creating worksheets, copying and pasting, formatting, we can end up making the file huge without realizing it.

But this causes several problems such as slowness: whether to open, manipulate and save. Not to mention, it’s worse to send/receive large files via email.

The first thing to do is identify what is causing the problem.

In most cases, removing worksheets, data, and formulas and even removing formatting solves the problem - bringing Excel back to a normal size.

It may be that there are blank lines and/or columns, taking up more space - and in this case, it is always good to delete them - as they are unnecessary.

A tip in this case is to learn how to use the shortcuts CTRL + HOME (a shortcut that takes you to the first cell of the worksheet) and CTRL + END (another shortcut, but now it takes you to the last cell of the worksheet). And in that case, you will see if your worksheet actually ends up where it should (for example at the end of the table) or if it is considering blank cells, and leading Excel to think that it has data in them (like invisible data) and ends up increasing the size of the file unnecessarily.

There are two other great shortcuts I also use to see where data in rows and columns ends up: CTRL + DOWN ARROW (shortcut to last row that has data) and CTRL + RIGHT ARROW (shortcut to last containing data).

These are shortcuts that help you identify the real extent of your data - not what Excel thinks it is (data)!

Saving the file in XLSB (binary) format instead of default (XLSX) also helps to reduce the size a bit. (Only note in this case that XLSX has XML format and is open-source readable - while XLSB does not; that is, depending on your application with other third-party tools you may need to leave it in XLSX format).

But in this particular case today, everything had been removed, and yet the file was still huge!

It was already a large file, a Website Performance file (Traffic, Volume, Drops, Accessibility, etc.) for several cells and several days, and a week’s data resulted in a file of about 18 MB.

There were files for several weeks, and one of them was 31 MB.

image

The cause

Upon starting the investigation, I found that the larger file had nothing but a pivot table, which someone created, to see some summary, and saved the file.

That’s when I discovered that pivot tables can increase Excel file size!

Every time you create a pivot table, Excel automatically creates a “cache” of them!

In other words, a “replica” of your entire data source. And like every replica, we also have increased file size.

Problem identified, let’s go to the solution.

The solution

Well, of course, the quickest and simplest solution would be simple: remove the pivot table! Yes, this solves the size problem after you save the file.

But what if we want to “keep” the pivot table?

That’s where today’s tip comes in.

You can keep your pivot table, and just remove the cache from it!

When you change your pivot table, Excel is actually using the data cache, not the data source (you don’t have a “link” with the data source, as you might imagine).

Of course, this has an advantage - more speed when making changes to your pivot table. But it also has a problem: it takes up a lot of space!

Recommended solution

The recommended solution in this case is to delete the cache of the pivot table and keep the source data (in our case the table with the data).

To do this, right-click any cell on your pivot table, and click “PivotTable Options”.

You will see in the tab that your pivot table has the default options.

image

Then:

  • Uncheck “Save source data with file”.

  • Check the option “Refresh data when opening file”.

And click the “OK” button.

Two messages will appear, informing you of what you have done. Click “OK” on both.

In other words, you are saying that:

  • No longer saving source data along with pivot table - removing cache.

  • Excel will only regenerate the cache when the file is opened (or you click to refresh manually).

Save your file, and exit. Done.

Alternative solution

There is another much simpler way to reduce the size, which is to delete the source data instead of Pivot Cache.

And if you need the original dataset, why regenerate it - simply twice in the “Grand Total” cell in the PivotTable.

But it can be weird when you send the spreadsheet to someone, and they don’t know how to access the original data.

For that reason, I prefer to use the first alternative (remove the cache and leave the original data).

Conclusion

PivotTables are designed to quickly get an overview or verify data. However, they can significantly increase your file size if you no longer need them.

It takes a lot of patience to deal with very large Excel spreadsheet files as they take a long time to update when you make changes.

And they also take a long time to open and respond. In short, it makes you waste a lot of time, and it becomes even more critical when you have limited time.

Not to mention that sharing (eg emailing) can be difficult.

And for these reasons, reducing file size is sometimes the most realistic approach to take.

There are a few ways to shrink the Excel file, such as removing formatting, etc.).

But with today’s tip - specifically for those who use pivot tables - you learned one that gives you the possibility to make your Excel files with pivot tables much smaller.

If you liked the tip, please comment below (and do not forget to share with your teammates).

5 Likes

Thanks for the TIP

Thanks for this “indirect” tip.

My file was slow, and the reason was thousands of "ghost’ cells, empty cells that Excel think have data.

image