I'm not a statistic professional and I see this 'var' measure option in a Hana calculation view for number data. I want to verify the Hana calculation of Var, so I'm asking where to find the documentation of Hana's equation. I've noticed before, with Hana CV, the 'avg' measure does not actually return the real Avg of the data, so I want to double check the Var/Std calculations.
does this video provide the correct equation? : https://youtu.be/iKpDXCpiqSo?t=93
Thank you.
I've searched the documentation for Hana and haven't found it: https://help.sap.com/viewer/42668af650f84f9384a3337bcd373692/1.0.12/en-US/b7e4d37e811a4fc7aed194cd5908bd1c.html?q=HANA%20VARIANCE%20CALCULATION%20VIEW Additional Aggregation Functions for Measures (Changed) In addition to the existing aggregation functions for measures, SAP HANA modeler also supports the aggregations, Average (AVG), Variance (VAR), and Standard deviation (STDDEV).
Hmm... simply searching the HANA documentation for "variance" returns several hits that cover the different "variance" calculations available in HANA.
For the graphical modeller, the aggregation functions "avg", "stddev" and "var" are the equivalents of the SQL expressions https://help.sap.com/viewer/4fe29514fd584807ac9f2a04f6754767/1.0.12/en-US/20a4389775191014b5a6bf2ccc0df2ed.html?q=variance%20aggregation.
Now, granted, the documentation for those expressions is cyclical (but not wrong) by stating:
VARReturns the variance of the given expression as the square of the standard deviation.STDDEVReturns the standard deviation of the given expression as the square root of the VAR function.From here, even without being a "statistics professional" checking e.g. https://en.wikipedia.org/wiki/Variance might be a good idea.
Looking at the (assumed) random numbers in a column, databases can merely go and calculate the variance for "Discrete random variables".
That means, the "average" (mean) for the values is calculated; then for every value the difference to this mean is squared, and finally, the "average" (mean) of these squared differences is computed.
That's what the variance expression should return.
If you'd had a little bit broader look into the documentation, you would've found the other pages, e.g. about the aggregate functions VAR_POP/VAR_SAMP where this calculation is also explained:
Returns the population variance of the expression as the sum of squares of the difference of <expression> from the mean of <expression>, divided by the number of rows remaining.
Finally, yes, the HANA Academy video explains this the same way and is correct.
Add a comment