approx_top_k
Returns the top k most frequently occurring item values in an expr along with their approximate counts.
This function is supported from v3.0.
Syntaxβ
APPROX_TOP_K(<expr> [ , <k> [ , <counter_num> ] ] )
Argumentsβ
- 
expr: An expression of STRING, BOOLEAN, DATE, DATETIME, or numeric type.
- 
k: An optional INTEGER literal greater than 0. Ifkis not specified, it defaults to5. The maximum value is100000.
- 
counter_num: An optional INTEGER literal greater than or equal tok, The larger thecounter_numis, the more accurate the result will be. However, this also comes with increased CPU and memory costs.- The maximum value is 100000.
- If counter_numis not specified, it defaults tomax(min(2 * k, 100), 100000).
 
- The maximum value is 
Returnsβ
Results are returned as an ARRAY of type STRUCT, where each STRUCT contains an item field for the value (with its original input type) and a count field (of type BIGINT) with the approximate number of occurrences. The array is sorted by count descending.
The aggregate function returns the top k most frequently occurring item values in an expression expr along with their approximate counts. The error in each count may be up to 2.0 * numRows / counter_num where numRows is the total number of rows. Higher values of counter_num provide better accuracy at the cost of increased memory usage. Expressions that have fewer than counter_num distinct items will yield exact item counts. Results include NULL values as their own item in the results.
Examplesβ
Use data in the scores table as an example.
-- Calculate the score distribution of each subject.
MySQL > SELECT subject, APPROX_TOP_K(score)  AS top_k FROM scores GROUP BY subject;
+---------+--------------------------------------------------------------------------------------------------------------------+
| subject | top_k                                                                                                              |
+---------+--------------------------------------------------------------------------------------------------------------------+
| physics | [{"item":99,"count":2},{"item":null,"count":1},{"item":100,"count":1},{"item":85,"count":1},{"item":60,"count":1}] |
| english | [{"item":null,"count":1},{"item":92,"count":1},{"item":98,"count":1},{"item":100,"count":1},{"item":85,"count":1}] |
| NULL    | [{"item":90,"count":1}]                                                                                            |
| math    | [{"item":80,"count":2},{"item":null,"count":1},{"item":92,"count":1},{"item":95,"count":1},{"item":70,"count":1}]  |
+---------+--------------------------------------------------------------------------------------------------------------------+
-- Calculate the score distribution of the math subject.
MySQL > SELECT subject, APPROX_TOP_K(score)  AS top_k FROM scores WHERE subject IN  ('math') GROUP BY subject;
+---------+-------------------------------------------------------------------------------------------------------------------+
| subject | top_k                                                                                                             |
+---------+-------------------------------------------------------------------------------------------------------------------+
| math    | [{"item":80,"count":2},{"item":null,"count":1},{"item":95,"count":1},{"item":92,"count":1},{"item":70,"count":1}] |
+---------+-------------------------------------------------------------------------------------------------------------------+
keywordβ
APPROX_TOP_K