Dominance of spreadsheets
I’m not exaggerating when I say that spreadsheets are the de facto tool for number crunching in aluminium manufacturing. Whether it’s a budget forecast, a process calculation, or a metal’s chemical composition, a spreadsheet is usually the format for storing and sharing both the data and the calculation logic.
Why are spreadsheets so popular?
What makes spreadsheets, and more specifically Excel, so widely adopted? In my experience, the reason lies in the familiarity of the tool. Every office worker, regardless of education, has at least some exposure to spreadsheets. Whenever data is discussed, spreadsheets are the language.
The popularity of spreadsheets is not without merit: even though I personally dislike them, I admit that for most data tasks spreadsheets are a powerful tool. Storing both the data and the logic in one file that can be shared and understood by any office worker in the company is a massive benefit. To draw a parallel with websites, a spreadsheet is like a frontend that allows data manipulation in a generally understandable way.
A final reason why spreadsheets are hard to replace is that they are free: some engineer, sometime in the past, built the spreadsheet tool that the business now completely depends on. While replacing this spreadsheet with a better tool is desirable, the replacement would come at a cost: the new software, the development hours, etc. The Excel solution was built “for free” by the engineer (in their spare time), so as far as Finance is concerned, its cost is zero.
Problems with spreadsheets
My main problem with spreadsheets is that they are constantly used as a “backend”, i.e. as a database. People copy-paste or, worse, type numbers into a spreadsheet and save it for future reference. This way, instead of having one central place to store all the data of your company, the data is scattered across a wild growth of spreadsheet files. The fact that a table is often spread out over multiple tabs and formatted columns makes it very difficult to retrieve and combine the data using another program.
The second problem pertains to data analysis and has to do with reproducibility: is the logic sufficiently clear, and the data exactly the same, so that a person performing the same analysis on the data will end up with the same results? Reproducibility is important because it builds trust in the report. After all, if no one can reproduce the results, how can they be trusted? Spreadsheets make reproducibility hard for two reasons:
- Unlike code, spreadsheet calculations are hard to follow. You have to jump from cell to cell to see how each value in a cell interacts with another value. For large calculations, this can become incredibly complex. It’s not easy to spot a mistake.
- Because the data is often present in the file, we can’t be certain of its origin and therefore its correctness. When data is pulled from a central source, like a database, we know that we’re operating on the same data as someone else. But how do you know where the data your colleague is showing you in column A really comes from?
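To make the contrast concrete, here is a minimal sketch of what the same kind of calculation could look like in code. Everything in it is an assumption for illustration: the `measurements` table, its columns, the cast IDs and the values are made up, and an in-memory SQLite database stands in for whatever central source system a plant actually uses.

```python
import sqlite3
from statistics import mean


def load_iron_content(conn, cast_id):
    """Fetch the Fe measurements (wt%) for one cast from the central table."""
    rows = conn.execute(
        "SELECT fe_pct FROM measurements WHERE cast_id = ? ORDER BY sample_no",
        (cast_id,),
    ).fetchall()
    return [row[0] for row in rows]


def average_iron_content(conn, cast_id):
    """Average Fe content of a cast, rounded to three decimals."""
    return round(mean(load_iron_content(conn, cast_id)), 3)


# Hypothetical stand-in for the central database (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (cast_id TEXT, sample_no INTEGER, fe_pct REAL)"
)
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [("C-001", 1, 0.12), ("C-001", 2, 0.14), ("C-001", 3, 0.13), ("C-002", 1, 0.20)],
)

result = average_iron_content(conn, "C-001")
print(result)
```

Both points from the list are visible here: the logic reads top to bottom in two small functions, and the origin of the numbers is an explicit query rather than values typed into column A.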
The future of spreadsheets
In an ideal world, code tools such as R or Python would be the default for data analysis across the enterprise. However, I’m a realist: spreadsheets (Excel) are not going anywhere. I don’t expect any busy manager or process engineer to learn how to code; in my experience, it’s simply not going to happen. What I ask myself instead is how best to integrate Excel into the workplace.
Returning to the website frontend/backend metaphor, I believe it’s fine to keep using spreadsheets as a frontend. By all means, crunch some numbers in a spreadsheet and show the resulting graph on a PowerPoint slide. However, make sure that the source data does not come from Excel: it has to come from the actual source system. This way, everyone is sure that all spreadsheets are manipulating the same data.
There are plenty of integrations that allow for this. For example, Databricks provides a convenient way to integrate with Excel. This way, big data analytics can be performed in Databricks using Spark, while the final results can be queried from Databricks directly into Excel.
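The frontend/backend split can be sketched in a few lines of Python. This is not the Databricks integration itself: an in-memory SQLite database plays the role of the source system, and the `production` table, its columns and its values are hypothetical. The idea is only that the spreadsheet receives an extract generated by a query, instead of holding the master copy of the data.

```python
import csv
import io
import sqlite3


def export_for_spreadsheet(conn, query, out):
    """Run a query on the source system and write the result as CSV."""
    cursor = conn.execute(query)
    writer = csv.writer(out)
    # Header row taken from the column names of the query result.
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)


# Hypothetical stand-in for the source system (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE production (line TEXT, tonnes REAL)")
conn.executemany(
    "INSERT INTO production VALUES (?, ?)",
    [("A", 120.5), ("B", 98.0), ("A", 110.0)],
)

buffer = io.StringIO()  # in practice this would be a file opened in Excel
export_for_spreadsheet(
    conn,
    "SELECT line, SUM(tonnes) AS total_tonnes "
    "FROM production GROUP BY line ORDER BY line",
    buffer,
)
csv_text = buffer.getvalue()
print(csv_text)
```

Anyone who reruns the export gets the same extract from the same source, so every spreadsheet built on it starts from identical numbers.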
Conclusion
All in all, I certainly wish Excel (spreadsheets) were less prevalent in the industry, but this is neither the reality nor likely the future. It means we’ll have to balance convenience against reliability by making smart use of spreadsheets.
__________________________________________________________
About Denis Gontcharov
Denis is a data consultant who helps aluminium manufacturers break down data silos. For the past five years, he has supported the aluminium industry with IT and data services as an independent contractor.
Previously, Denis was employed as a data engineer at Novelis in Germany, a leading aluminium rolling and recycling enterprise, where he played a pivotal role in transferring process data from production machinery to cloud systems. Prior to this, he was employed as a process engineer at TRIMET’s aluminium smelters in France and Germany, developing control software for the electrolysis process. Denis is a graduate in Materials Engineering from KU Leuven, Belgium, and is currently based in Berlin.
__________________________________________________________
Decoding Data with Denis: In this exclusive AL Circle column, Denis delves into the evolving data management landscape within the aluminium industry. He explores how manufacturers are actively breaking data silos to integrate information across operations.
Keep an eye out for his column on the future prospects of unified data systems, highlighting their potential to enhance efficiency, decision-making, and innovation throughout the entire aluminium value chain.