pd.value_counts() is a function in the Pandas library used to count the frequency of each value. It calculates how many times each unique value appears in an array and returns them sorted by frequency in descending order.
This is one of the most commonly used functions in categorical data analysis, allowing you to quickly understand the distribution of your data, such as counting voting results or product category distributions.
Word Definition: value_counts means "count values", i.e., count how many times each value occurs.
Basic Syntax and Parameters
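pd.value_counts() is a top-level function in the Pandas library for counting the frequency of each unique value. In pandas 2.1 and later it is deprecated in favor of the equivalent Series.value_counts() method, which accepts the same sort, ascending, normalize, bins, and dropna parameters.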
Syntax Format
pd.value_counts(values, sort=True, ascending=False, normalize=False, bins=None, dropna=True)

Parameter Description

- Parameter: values
  - Type: Series or array-like object.
  - Description: The data to count frequencies for. Usually a Series.
- Parameter: sort
  - Type: Boolean.
  - Description: Whether to sort by frequency. Default is True (descending).
- Parameter: ascending
  - Type: Boolean.
  - Description: If True, sort by frequency ascending; if False (default), sort descending.
- Parameter: normalize
  - Type: Boolean.
  - Description: If True, return proportions (between 0 and 1) instead of absolute counts. Default is False.
- Parameter: bins
  - Type: Integer or None.
  - Description: If an integer is specified, bin numerical data and count frequencies (similar to histograms). Not suitable for categorical data.
- Parameter: dropna
  - Type: Boolean.
  - Description: Whether to exclude NaN from the count. Default is True (exclude).
Function Description
- Return Value: Returns a Series with unique values as index and their frequencies (or proportions) as values.
- Effect: Counts and displays how often each unique value occurs.
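As a minimal sketch of how these parameters behave, the snippet below uses the Series.value_counts() method form, which takes the same keyword arguments and avoids the deprecation warning that newer pandas versions raise for the top-level function. The `sizes` data is a hypothetical sample, not part of this tutorial's examples.

```python
import pandas as pd

# Hypothetical sample data (an assumption for illustration only).
sizes = pd.Series(["S", "M", "M", "L", "M", "S", None])

# Defaults: sort=True, ascending=False, dropna=True.
counts = sizes.value_counts()

# normalize=True returns proportions of the non-missing values.
shares = sizes.value_counts(normalize=True)

# dropna=False also counts the missing entry.
with_missing = sizes.value_counts(dropna=False)

print(counts)
print(shares)
print(with_missing)
```

The three results share the same index of unique values; only the numbers (and whether the missing entry appears) change.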
Examples
The following examples progress from simple to complex and cover the main uses of pd.value_counts().
Example 1: Basic Usage - Counting Frequencies of Categorical Data
Example

import pandas as pd

# 1. Create a Series with duplicate values
colors = pd.Series(['red', 'blue', 'green', 'red', 'blue', 'yellow', 'red', 'blue', 'blue'])

print("=== Original Series ===")
print(colors)

# 2. Use pd.value_counts() to count frequencies
result = pd.value_counts(colors)

print("\n=== pd.value_counts() Frequency Count ===")
print(result)

Expected Output:
=== Original Series ===
0       red
1      blue
2     green
3       red
4      blue
5    yellow
6       red
7      blue
8      blue
dtype: object

=== pd.value_counts() Frequency Count ===
blue      4
red       3
green     1
yellow    1
Name: count, dtype: int64

Code Explanation:
- The result is sorted by frequency in descending order: blue appears most (4 times), red next (3 times), green and yellow once each.
- The returned value is a Series where the index is the color values and the values are the counts.
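Because the result is an ordinary Series, individual counts can be read back by label or position. The sketch below assumes the same `colors` data as Example 1 and uses the Series.value_counts() method form; the variable names are illustrative.

```python
import pandas as pd

colors = pd.Series(['red', 'blue', 'green', 'red', 'blue', 'yellow', 'red', 'blue', 'blue'])
counts = colors.value_counts()

# With the default descending sort, the first entry is the most frequent value.
most_common = counts.index[0]
top_count = counts.iloc[0]

# Look up a single value by its label.
red_count = counts["red"]

# Keep only the two most frequent values.
top_two = counts.head(2)

print(most_common, top_count, red_count)
print(top_two)
```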
Example 2: Using the normalize Parameter to Calculate Proportions
normalize=True converts frequencies into proportions, making it easier to see relative distributions.
Example

import pandas as pd

# Create voting data
votes = pd.Series(['A', 'B', 'A', 'C', 'A', 'B', 'A', 'A', 'C', 'B', 'A', 'B'])

print("=== Voting Data ===")
print(votes)

# 1. Absolute frequencies
print("\n=== Absolute Frequencies ===")
print(pd.value_counts(votes))

# 2. Relative proportions (normalize=True)
print("\n=== Relative Proportions ===")
result_normalized = pd.value_counts(votes, normalize=True)
print(result_normalized)

print(f"\nSum of proportions: {result_normalized.sum():.2f}")

# 3. Sort ascending
print("\n=== Ascending Order ===")
print(pd.value_counts(votes, ascending=True))

Expected Output:
=== Voting Data ===
0    A
1    B
2    A
3    C
4    A
5    B
...

=== Absolute Frequencies ===
A    6
B    4
C    2
Name: count, dtype: int64

=== Relative Proportions ===
A    0.500000
B    0.333333
C    0.166667
Name: proportion, dtype: float64

Sum of proportions: 1.00

=== Ascending Order ===
C    2
B    4
A    6
Name: count, dtype: int64

Code Explanation:
- Using normalize=True allows quick calculation of category shares.
- All proportions sum up to 1.0, useful for proportion analysis.
- Using ascending=True shows results sorted by frequency ascending.
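The proportions are simply each count divided by the total number of counted values. A minimal sketch of that arithmetic, assuming the same `votes` data as Example 2 and the Series.value_counts() method form (the names `manual` and `percent` are illustrative):

```python
import pandas as pd

votes = pd.Series(['A', 'B', 'A', 'C', 'A', 'B', 'A', 'A', 'C', 'B', 'A', 'B'])

counts = votes.value_counts()
shares = votes.value_counts(normalize=True)

# normalize=True is equivalent to dividing each count by the total.
manual = counts / counts.sum()

# Convert the proportions to percentages rounded to one decimal place.
percent = (shares * 100).round(1)

print(manual)
print(percent)
```

Here A has 6 of 12 votes, so its share is 6 / 12 = 0.5, or 50.0 percent.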
Example 3: Handling Numerical Data and Using the bins Parameter
For continuous numerical data, the bins parameter can be used for binning statistics.
Example

import pandas as pd

# 1. Create numerical data
scores = pd.Series([85, 90, 78, 92, 88, 76, 95, 82, 70, 89, 91, 77, 84, 86])

print("=== Student Scores ===")
print(scores)

# 2. No binning - count each unique value
print("\n=== Count Each Score ===")
print(pd.value_counts(scores))

# 3. Use bins parameter for binning
print("\n=== Binned into 4 bins ===")
result_bins = pd.value_counts(scores, bins=4)
print(result_bins)

# 4. View the data range that the bins cover
print("\n=== Bin Boundaries ===")
print(f"Min: {scores.min()}, Max: {scores.max()}")

Expected Output:
=== Student Scores ===
0     85
1     90
2     78
3     92
4     88
5     76
6     95
7     82
8     70
9     89
10    91
11    77
12    84
13    86
dtype: int64

=== Count Each Score ===
85    1
90    1
78    1
92    1
88    1
76    1
95    1
82    1
70    1
89    1
91    1
77    1
84    1
86    1
Name: count, dtype: int64

=== Binned into 4 bins ===
(88.75, 95.0]      5
(82.5, 88.75]      4
(76.25, 82.5]      3
(69.974, 76.25]    2
Name: count, dtype: int64

=== Bin Boundaries ===
Min: 70, Max: 95

Every score in this data is unique, so the unbinned count is 1 for each value and says little about the distribution. The binned result resembles a histogram: the counts rise with each bin, and the top bin (88.75, 95.0] holds the most scores (5). The bins parameter splits the numerical range into equal-width intervals and then counts the number of data points in each interval.
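When the automatically chosen edges are awkward to read, one alternative is to bin with pd.cut first and count the resulting categories. This is a sketch under assumptions: the edges and labels below are illustrative choices, not values from the example above, and the intervals are right-closed (a score of 90 falls in the 81-90 range).

```python
import pandas as pd

scores = pd.Series([85, 90, 78, 92, 88, 76, 95, 82, 70, 89, 91, 77, 84, 86])

# Assumed bin edges and labels, chosen only for illustration.
edges = [60, 70, 80, 90, 100]
labels = ["61-70", "71-80", "81-90", "91-100"]

# pd.cut assigns each score to a right-closed interval such as (70, 80].
binned = pd.cut(scores, bins=edges, labels=labels)

# Count per range, then order by range rather than by frequency.
per_range = binned.value_counts().sort_index()

print(per_range)
```

Sorting by index keeps the ranges in their natural order, which reads more like a histogram than the default frequency ordering.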
Example 4: Handling Missing Values
The dropna parameter controls whether missing values are included in the count.
Example

import pandas as pd
import numpy as np

# 1. Data with missing values
data = pd.Series(['a', 'b', np.nan, 'a', None, 'c', np.nan, 'b', 'a'])

print("=== Data with Missing Values ===")
print(data)

# 2. Default excludes NaN (dropna=True)
print("\n=== Default Excludes NaN ===")
print(pd.value_counts(data))

# 3. Include NaN (dropna=False)
print("\n=== Includes NaN ===")
print(pd.value_counts(data, dropna=False))

# 4. Count numeric missing values
numeric_data = pd.Series([1, 2, np.nan, 3, 2, np.nan, 1, 4])

print("\n=== Numeric Data with NaN ===")
print(pd.value_counts(numeric_data, dropna=False))

Expected Output:
=== Data with Missing Values ===
0       a
1       b
2     NaN
3       a
4    None
5       c
6     NaN
7       b
8       a
dtype: object

=== Default Excludes NaN ===
a    3
b    2
c    1
Name: count, dtype: int64

=== Includes NaN ===
a      3
b      2
NaN    2  # 2 NaN values (2 np.nan)
c      1
Name: count, dtype: int64

=== Numeric Data with NaN ===
1.0    2
2.0    2
NaN    2  # 2 NaN values
3.0    1
4.0    1
Name: count, dtype: int64

Code Explanation:
- By default (dropna=True), NaN values are not counted.
- Using dropna=False lets you see the count of missing values, which is useful for data quality analysis.
- Numeric data also works with this parameter.
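For a data quality check, the NaN row reported with dropna=False should agree with a direct count of missing values. A minimal sketch using the numeric data from Example 4 and the Series.value_counts() method form (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

numeric_data = pd.Series([1, 2, np.nan, 3, 2, np.nan, 1, 4])

# Direct count of missing entries.
missing = numeric_data.isna().sum()

with_nan = numeric_data.value_counts(dropna=False)
without_nan = numeric_data.value_counts()

# Select only the NaN row of the dropna=False result.
nan_rows = with_nan[with_nan.index.isna()]

print(f"Missing values: {missing}")
print(f"Counted with NaN: {with_nan.sum()}, without NaN: {without_nan.sum()}")
print(nan_rows)
```

The difference between the two totals equals the number of missing values, which is a quick way to confirm that nothing was silently dropped.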
Tip: pd.value_counts() is one of the most important functions in exploratory data analysis, helping you quickly understand data distribution. If you only need unique values without frequencies, use the pd.unique() function.
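To illustrate the difference mentioned in the tip, the sketch below compares pd.unique(), Series.nunique(), and the Series.value_counts() method form on the `colors` data from Example 1. The conversion to a list is only for display.

```python
import pandas as pd

colors = pd.Series(['red', 'blue', 'green', 'red', 'blue', 'yellow', 'red', 'blue', 'blue'])

# Unique values only, in order of first appearance (no frequencies).
unique_values = list(pd.unique(colors))

# Number of distinct values.
n_unique = colors.nunique()

# Unique values together with their frequencies.
counts = colors.value_counts()

print(unique_values)
print(n_unique)
print(counts)
```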