The MySQL CUME_DIST function is a window function, available in MySQL 8.0 and later, that calculates the cumulative distribution of a value within a group of rows. For each row, it returns the number of rows whose value is less than or equal to that row's value in the sort order, divided by the total number of rows in the group. The result is greater than 0 and at most 1, and rows with equal values receive the same result.
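To make the definition concrete, here is a minimal Python sketch of the same calculation for a single group sorted in ascending order. The helper name `cume_dist` and the sample scores are hypothetical; the function only mirrors the behavior described above and is not part of MySQL or of any library.

```python
def cume_dist(values):
    """For each value, return the fraction of values less than or equal to it."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


# Hypothetical sample data; the two 20s are tied and share the same result.
scores = [10, 20, 20, 40]
print(cume_dist(scores))  # [0.25, 0.75, 0.75, 1.0]
```

Note that the smallest value gets 1/4 rather than 0: the current row is always counted, so the result is never 0.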
The MySQL CUME_DIST function is commonly used in statistical analysis and ranking tasks. Here are some specific applications:
Identifying percentiles: The cumulative distribution can be used to locate percentiles, such as the 25th percentile (Q1), the 50th percentile (median), and the 75th percentile (Q3), by finding the first value whose cumulative distribution reaches 0.25, 0.50, or 0.75.
Identifying low-performing or high-performing groups: Filtering on the cumulative distribution isolates the rows at either end of a ranking, for example the students or products in the bottom 10% (a value of 0.1 or less) or the top 10% (a value above 0.9).
Comparing cumulative distributions across different groups: This can be useful for analyzing performance trends or identifying potential outliers.
Visualizing data distributions: The cumulative distribution can be represented graphically to provide a clearer understanding of the spread and distribution of data.
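As an illustration of the percentile use case, the following Python sketch picks the smallest value whose cumulative distribution reaches a target fraction. This is one possible convention (a discrete percentile) chosen for illustration; other percentile definitions interpolate between values. The function names and the sample amounts are hypothetical.

```python
def cume_dist(values):
    """Map each value to the fraction of values less than or equal to it."""
    n = len(values)
    return {v: sum(1 for other in values if other <= v) / n for v in values}


def discrete_percentile(values, fraction):
    """Return the smallest value whose cumulative distribution is >= fraction."""
    dist = cume_dist(values)
    return min(v for v in values if dist[v] >= fraction)


# Hypothetical sample data with eight rows.
amounts = [50, 75, 100, 120, 150, 180, 200, 260]
q1 = discrete_percentile(amounts, 0.25)      # 75
median = discrete_percentile(amounts, 0.50)  # 120
q3 = discrete_percentile(amounts, 0.75)      # 180
print(q1, median, q3)
```

In SQL, the equivalent approach would be to compute CUME_DIST in a subquery and then select the minimum value among the rows that meet the threshold.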
Syntax
Here is the syntax for the CUME_DIST function:
    CUME_DIST() OVER (
        [PARTITION BY partition_expression, ...]
        ORDER BY sort_expression [ASC | DESC], ...
    )
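PARTITION BY: Optional clause that divides the result set into partitions; the CUME_DIST function is applied to each partition independently. If it is omitted, the whole result set is treated as a single partition.
ORDER BY: Specifies the column or columns by which the rows within each partition are ordered. The CUME_DIST function uses this order when calculating the cumulative distribution, and rows with equal sort values receive the same result.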
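The sort direction changes which rows count toward each result: with ASC, a row's value reflects the rows at or below it, and with DESC, the rows at or above it. The Python sketch below illustrates this for a single partition. The `cume_dist` helper and its `descending` parameter are hypothetical names used only to mirror the ORDER BY direction.

```python
def cume_dist(values, descending=False):
    """Fraction of rows that come before, or tie with, each row in sort order."""
    n = len(values)
    if descending:
        # ORDER BY ... DESC: larger values come first, so count values >= v.
        return [sum(1 for other in values if other >= v) / n for v in values]
    # ORDER BY ... ASC: smaller values come first, so count values <= v.
    return [sum(1 for other in values if other <= v) / n for v in values]


# Hypothetical sample data.
amounts = [50.0, 75.5, 120.0, 180.25]
print(cume_dist(amounts))                   # [0.25, 0.5, 0.75, 1.0]
print(cume_dist(amounts, descending=True))  # [1.0, 0.75, 0.5, 0.25]
```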
Example
Now, let’s look at an example to better understand the usage of the CUME_DIST function. Consider a table named sales:
    CREATE TABLE sales (
        product_id INT,
        sale_amount DECIMAL(10, 2)
    );

    INSERT INTO sales (product_id, sale_amount) VALUES
        (1, 100.50), (1, 150.25), (1, 200.75),
        (2, 50.00), (2, 75.50),
        (3, 120.00), (3, 180.25);
Now, if you want to calculate the cumulative distribution of the sale amounts within each product group, you can use the CUME_DIST function:
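    SELECT
        product_id,
        sale_amount,
        CUME_DIST() OVER (
            PARTITION BY product_id
            ORDER BY sale_amount
        ) AS cumulative_distribution
    FROM sales
    -- Sort the final output; the window's ORDER BY alone does not guarantee row order.
    ORDER BY product_id, sale_amount;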
In this example, the CUME_DIST function is applied within each product_id partition, with the rows ordered by sale_amount. The result shows the cumulative distribution of sale amounts for each product. The output might look like this, with values rounded to four decimal places:

| product_id | sale_amount | cumulative_distribution |
|------------|-------------|-------------------------|
| 1          | 100.50      | 0.3333                  |
| 1          | 150.25      | 0.6667                  |
| 1          | 200.75      | 1.0000                  |
| 2          | 50.00       | 0.5000                  |
| 2          | 75.50       | 1.0000                  |
| 3          | 120.00      | 0.5000                  |
| 3          | 180.25      | 1.0000                  |
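In this result, the cumulative_distribution column gives, for each row, the fraction of rows in the same product group whose sale_amount is less than or equal to that row's sale_amount. For product 1, for example, two of the three rows have a sale_amount of 150.25 or less, so that row's value is 2/3, or about 0.6667. The row with the highest sale_amount in each group always has a value of 1.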
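As a cross-check on the table above, the following Python sketch recomputes the same values from the rows inserted into the sales table. It is only an illustration of the arithmetic, assuming ascending order within each product_id; the function name `cume_dist_by_partition` is hypothetical, and rounding to four decimal places is done here purely to match the table.

```python
from collections import defaultdict

# Rows from the sales table in the example: (product_id, sale_amount).
sales = [
    (1, 100.50), (1, 150.25), (1, 200.75),
    (2, 50.00), (2, 75.50),
    (3, 120.00), (3, 180.25),
]


def cume_dist_by_partition(rows):
    """Mimic CUME_DIST() OVER (PARTITION BY product_id ORDER BY sale_amount)."""
    # Group the sale amounts by product_id (the PARTITION BY step).
    partitions = defaultdict(list)
    for product_id, amount in rows:
        partitions[product_id].append(amount)

    result = []
    for product_id in sorted(partitions):
        amounts = sorted(partitions[product_id])  # the ORDER BY step
        n = len(amounts)
        for amount in amounts:
            rows_at_or_below = sum(1 for other in amounts if other <= amount)
            result.append((product_id, amount, round(rows_at_or_below / n, 4)))
    return result


for row in cume_dist_by_partition(sales):
    print(row)
```

Each partition is processed on its own, which is why product 2 and product 3 both reach 0.5 after their first row even though their sale amounts differ.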
The MySQL CUME_DIST function is a useful window function for understanding how values are distributed within a group and for ranking rows relative to one another, which makes it well suited to percentile and ranking queries in MySQL.