My New Favorite Excel Function: INDIRECT
I will confess that somehow I never really learned pivot table mechanics in Excel, so I struggle with three dimensional data. One example might be a spreadsheet with individual tabs where each tab is a different corporate division, and on each tab is a P&L by month (so three dimensions: Month, P&L category, Division). Let's assume the P&L is arranged the same on every page, with, for example, one divisions June's total revenue number in the same cell number as the June total revenue number on every other divisions' tab, ie this field is in cell c8 on every worksheet tab.
Long ago I created a simple way to get a total of a particular cell across all spreadsheets. I would add a spreadsheet tab in the workbook before all the division tabs and another after all the division tabs. Let's say I just name these tabs "Posta" and "Postb". Then the sum of all cells C8 that are located on a spreadsheet tab between tabs "Posta" and "Postb" would be
=SUM(Posta:Postb!C8).
The problem comes when one wants to create a summary worksheet tab that doesn't sum all the values for C8 but summarizes them in a table. Imagine a table where column A is the division name (that matches the name of the tab for that division) and column B is that division's June revenue, ie the value of cell C8 for that division from its individual spreadsheet. The only way I knew how to do this before was manually and tediously.
But laziness is the mother of invention, and I finally encountered a workbook that was so tedious to summarize manually that I had to find another way. I had a spreadsheet of 150 tabs, each worksheet tab being one of our locations containing online customer review scores formatted the same way into the same cells. That is when I found the INDIRECT function. Basically it allows one to craft a custom cell reference in a text string, feed that to the indirect function, which will output the contents of that cell. So if our location names in the first column exactly match the worksheet tab names, then we can write
=INDIRECT("'"&$A3&"'"&"!C$8")
The ampersand symbols are basically text string concatenation operators, and are there to create the text string of a cell reference in the format excel expects. The funny triple quotes is just to add a single quote mark before and after the tab name. This particular string will give us the value of cell C8 that is in the worksheet tab with the name that is in cell A3.
You can also use this cell value from the INDIRECT function in more complicated formulas. For example
=COUNTIF( INDIRECT("'"&$A3&"'"&"!$B$3:$B$500"),D$4)
would look in the spreadsheet tab whose name is in A3 and on that tab count all the values in the range B3 to B500 on that tab that have the value given in cell D4. For example, if D4 is equal to "5" we could be counting all the reviews that had a score of "5".
Postscript: This may be my record for the blog post with the niche-iest audience. Mainly it is aimed at my son, who has the enviable job of being an analyst for a craft beer company in La Jolla. He has learned not to complain much to me about his job, as my first job was in an oil refinery in Baytown, Texas, so my sympathy level is maybe lower than it should be. Anyway, as part of a geeky family, he and I compete on Excel knowledge so this post is mainly my way of counting coup on him.
Update: I totally agree with the comments that a relational database is needed. Unfortunately, at the time I did this work, we had only within weeks been given access to the review data from the government recreation reservations database, and it was all in excel. Faster to do the analysis in excel than to figure out how to read the sheets into something like mysql (of which I am positive there are a million simple tools for doing so). But I will accept it as a challenge for this year.
These days I spend more time in tools like Excel than I do in software development tools, so tips like this are most welcome. I've had to puzzle through Excel a bit at a time - thanks for sharing!
It could be more niche. It could be a post about using Excel to track your model trains.
Mr. Meyer, 'nice solution. There's an even better way. Were you fishing for it? :-) Instead of scattering all your data across separate worksheets and then trying to consolidate it, keep all your data in a table on a single worksheet and use it in the individual reports. That's how relational database systems work, and they are the gold standard for working with data. If your data were organized that way, there would be several ways to read data out of your single data table for each division's report worksheet. PivotTables is one. But my personal favorite is to use to emulate a relational database application by a combination of (a) assigning a simply numeric primary key to each row in the data table; say, 1 through "n" (e.g., 1,000); (b) on each report worksheet use Excel's SMALL and IF functions and a division's name (selected from a dropdown, to name one possibility) to read out the data table's primary key for each row of the division's data; and (c) using INDEX and MATCH to look up each row's data across all columns of interest.
Excel has to be the single most useful program in all of global, computing history so that's a pretty big niche.
I've never really used pivot tables but I know my wife loves them.
My biggest achievement in Excel is a spreadsheet I built to create script code to run macros in a separate program which produces quarterly client reports and then saves/combines them as .pdf docs in dated, client sub-directories. Every quarter the dates change but most everything else stays the same although each client has a unique combination of reports...Q4 gets certain year-end reports etc. etc.
Basically, I have one tab where you plug in all the unique variables for that quarter and all the other tabs are different groups of clients with bits of script code for their own reports. In those tabs, in each column are all the little bits of script code and in the last column everything gets CONCATENATE-ed together into a single line of instruction/code. That column is the code that gets copy/pasted, top to bottom into a .scr file.
It's very effective and has mostly automated a lot of dumb-dumb manual, error-prone report assembly at our firm - allowing me to F-around on the internet and hustle on the side.
I even have "switches" to "turn on" various optional reports with universal and manual override switches on top of those. It took me all day to get this syntax nightmare of a formula down which copies over CONCATEATE-ed code to the last cell only if all the switches allow for it:
=IF(AND($G408="Y",$H408="N"),"",IF(AND($G408="Y",$H408=""),$T408,IF(AND($G408="N",$H408=""),"",IF(AND($G408="N",$H408="Y"),$T408,IF(AND($G408="Y",$H408="Y"),$T408,IF(AND($G408="N",$H408="N"),"",
I may try to convert/interface everything into/with Python some day which is a whole 'nuther sub-culture:
http://www.python-excel.org/
My greatest achievement was to use Excel to load a table with jobs to run for a datamart about 20 years ago.
YOu need a better system. Use a relational database structure instead of 150 excel tabs.
Another function that similarly blew my mind when I found it was OFFSET. It's a little hard to understand, but it's similarly useful when needing to lookup and/or aggregate cells/ranges that vary in location and/or size.
Yeah, I built a heat map application in Excel that would work for multiple clients for a company that my brother works for. It's something that should have been done in JavaScript. But, hey, that needed an MVP fast and cheap. They are still using it today 4 or 5 years later. Still haven't gotten that JS version out there, granted they are doing a database migration which is a huge deal! I had to export the stuff into html/css files since Excel couldn't handle all the data. Regardless, Excel is an amazing tool that you can do amazing things in, even things you really shouldn't do!
I have used the INDIRECT function previously, along with OFFSET, but I wonder if they could not improve the context with fewer many single and double quotes. Don't know how, but this is not my job. You need to copy the function statement and then organize the quote hierachies.
Without actually seeing your data requirements, I agree that a relational DB is most probably a better solution. Having said that, in the 90's, I worked on a $1.5B project, thousands of people, products, labor cost, etc. that was managed with linked Excel spreadsheets on Windows 3.1. Excel is a great tool and can function as a DB of sorts in some situations. It's major failing compared to a true DB is where multiple simultaneous user access is needed. Excel has a "share" feature but it will break and cause data loss always at the worse possible time.
Any time you can smoke the young 'un is a good time
Try using named ranges instead of absolute cell coordinates:
https://support.office.com/en-us/article/define-and-use-names-in-formulas-4d0f13ac-53b7-422e-afd2-abd7ff379c64
That way, you don't have to have the same format for every sheet. You can refer to, "monthly_revenue," instead of, "$C$34." The real utility is this - if you start with named ranges (and a named range is often a single cell), and then add rows or columns, the named range moves like it should, and you don't have to keep track of it. If a named range covers multiple cells, it will shrink and grow.
I like to think I just saved you a few hundred hours a year complying with California reporting requirements.
Gah, Excel (and computer spreadsheets in general) is already almost universally abused in ways equivalent to using GOTO to make spaghetti code, but this is like using a computed GOTO.
In my, rather unsuccessful, period as a high school Physics teacher I always found that in any school the most advanced Excel users were the Phys Ed teachers.
I always put it down to them having too little intellectual stimulation in the day job.
Personally I am nostalgic for Visicalc on an Apple 2e. All the functions could be covered in a small paperback book and it still shook the business world to the core!
150 tabs? Excel is my baby and I've spent my fair share of time coaxing it to do crazy calculations, but once you've hit that scale, you need to seriously consider a better suited tool. Seconding the relational database suggestion (though NoSQL could possibly be an option too). More importantly, grab a programmer with data architecture experience to review what you have and see if there's a better architecture for expressing your data's relationships as well.
Baytown may be vile for all I know but there's nothing wrong with an oil refinery. I enjoyed working on one. Though I think I preferred working in a petrochemicals division: better smells.
I'd agree with those who are suggesting that it's time to look at moving this functionality into a database or some sort of profesionally-developed analytics system. Yes, it is often an expensive, nightmare-level pain in the ass to accomplish this and get it working the way you want... but if you're making important business decisions off of the data it really is something to take a hard look at. Spreadsheets are awesome for ad-hoc analysis and prototyping analytic systems. They really suck in long-term production. Dig into the statistics on significant errors in critical business spreadsheet implementation (especially financials) sometime ... it's a horror show. In practice, these spreadsheets act as applications themselves, but spreadsheets encourage just about every application development worst practice / anti-pattern there is. Unless you're exceptionally meticulous and disciplined, you're going to screw it up. And even if you ware exceptionally meticulous and disciplined, you'll still probably screw it up. I've personally found sheets full of data being used to drive companies and the output was just dead wrong because of some arcane error in a six hundred character formula in a cell somewhere. It's very much a trap.
Vanishingly few temporary quick fixes are ever temporary. :-)
I agree that a referential database is the way to go. If you're not too keen on learning access, an easy way to do it would be to create a referential spreadsheet to categorize the data, and then an additional spreedsheet to summarize the categorized data. So, you create Sheet151, dump all your accounts into it, and categorize each account based on how you'd like to see it on the summarized sheet (Sheet 152). Then, you can use the VLOOKUP function on each tab so that each account has the referential summary column labeled next to it. Then use the SUMIF function on the summarized sheet to get the summary values. Pros of this: easy to do, will take 2 hours. Cons: low referential integrity. Me personally: I despise Access, and have no problems with password protect :)
Yeah, at my last employer I dumped a bunch of F# code. All approved by my employer. It was supposed to be temp code. Until they laid me off! I was in the process of converting a bunch of Access databases over to SQL Server that were multitenant/lingual. Lot's of work. But the temp developer that they have on hand was able to get it working just fine.
There are better relational databases out there than Access. I use old fashion FoxPro. The benefit of that is it would not take 2 hours to handle that data.
Another problem with Excel sheets is if you fudge one of the numbers in one of the 150+ sheets your entire system is wrong and you have to find the error in a massive tangle of tabs. I had this happen to me onetime when the data entry person was supposed to code .001 and instead typed 1. These were percentages so instead of a total of $10,000 being calculated, $10 million was the end result. I therefore went into a meeting presenting forecasted sales numbers off by $10 million, about 25% of the total. Fun to get caught in that.
I have never read a blog post about an excel function. Super cool.
I used to use INDIRECT but stopped because I found the inflexibility to any changes in row or column positions made it a net loss on time (ie if I add a column in a sheet anywhere to the left of data I am calling using INIDRECT, the cell references in the INDIRECT function will not update whereas cell references in a regular function will). I didnt use enough multi tab sheets to make INDIRECT worth it.
After a decade using Excel I didn't know =SUM(Posta:Postb!C8) was even possible. Neat.
Let me make a suggestion - Look at python (2.7.x OR 3.X.X) and the "openpyxl" library (there are others) - I have been helping with analysis of data from multiple xlsx files with multiple sheets to create summary sheets (using specific criteria) - Yes, "excel" is a terrific tool, but writing macros/formula when there are lots of entries is a bit daunting - may be this is me, but I am always afraid of the program corrupting the values/formula - keep the raw data/info in the xlsx, use python for analysis/organization (!)
I remember looking at some Excel files with complicated "IF - THEN - IF THEN - " and said "forget it" _ let python do it (!) ... if you do a LOT of calculations in excel (multiple xlsx, multiple sheets) - just go with python (!)
My first job was programming binary load lifters, very similar to your vaporators..... ;-)
The deal here is that its "work on my complicated spreadsheet myself" vs. hire someone at about $100/hr to work on this for me....
Sometimes, just keep it simple. Although 150 sheets, whoa.
I've seen fairly simple data sets on post-its get input into Excel spreadsheets, go to Access and then to SQL Server and become the basis for legacy systems that hang on for decades. You almost have to put a stake in the heart of some archaic systems to get people to let go of them.
On your update:
I know several posters here don't like Access but for your situation, it might be a viable option.
I hadn't done any database application development since DB2 under msdos. I have friends (attorneys) that were familiar with spreadsheets but not DB's. They started a class action lawsuit using Excel spreadsheets to assemble data on a broad spectrum of information. As the spreadsheets became enormous and unwieldy, they tried breaking them up to make them load more quickly and allow multiple people to work simultaneously. They asked me to help with several data conversion, aggregation and linking issues. I told them early in the process, they should convert to a DB. It wasn't until several major screw up and data loss that they relented on the DB. They hired a local consultant with DB experience. I didn't want the job, being officially retired and not having done any DB work in 30+ years. The consultant turned out to be a disaster so I ended up agreeing to help them out until they could find a permanent solution. To shorten the story, I opened up Access, played with it for a few days, spent a weekend importing their spreadsheets, breaking it up into logical data tables and building several screens for data IO. It's now almost 10 years later and the lawsuit is ongoing and the DB is still in daily use.
My long winded point is that converting from Excel spreadsheets to a Access DB is relatively easy. The import and exports tools are straight forward and you can still use VB for specialized functions. I built several data extract queries to Excel for ad hoc reporting. Excel is the preferred tool for playing "what if" with numbers, graphing, etc. They are happy with that operating environment. The only thing I still help them with is mass updates from an external data source. There is not a doubt in my mind that you can do this. Play with it, you may like it.
Sorry about the length of this post.
I use that function in a spreadsheet designed to "roll the dice" to make random selections of items for a Dungeons and Dragons game.
In a traditional table, you have a list of items, and next to each is a numerical range. You roll dice to generate a random number between 1 and 100, and if the result is in a given range, you assign the corresponding item.
The spreadsheet I created has the column of items (column A), next to a column containing the lower number in each range (column B). The column next to that (column C) is an IF statement, returning 1 if the random number generated is greater than or equal to the value in column B.
Count the number of values of 1 returned, and that tells you what row was selected by the random die roll.
And that's the number of rows down from the top which is fed into the INDIRECT statement, causing the selected row to display in the right box.
Maybe a little more niche than your post...
Mr. Meyer, I read your update in which you said you agree with many commenters' recommendation that you use a relational database application. At the risk of belaboring the issue, with all due respect, as someone who does both Excel development, on the one hand, and Access and SQL Server relational database development, on the other hand, I disagree. Use the simplest tool that will do the job. You can drive a cabinetry nail with a pneumatic sledgehammer, but you should use a carpenter's hammer. The use case you described in your original post was not too much for Excel to handle easily. You simply need one worksheet to host your data and one worksheet in which to implement a report template in which the report content can be determined simply by selecting from a dropdown the Division you want to report on. It would take only a few minutes to set that up in Excel; significantly longer for most non-relational-database developers in a relational database. I hope that's useful.
But if he does end up putting it in a DB put it in PostgreSQL rather than MySQL. MySQL sucks. It doesn't have a very powerful query engine.