Pandas pd.value_counts() Function

pd.value_counts() is a Pandas function that counts how often each unique value occurs. It tallies every unique value in an array or Series and, by default, returns the counts sorted by frequency in descending order.
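
Conceptually, this is a tally of each unique value followed by a sort on the counts. The sketch below illustrates only that idea, using the standard library's collections.Counter rather than pandas (pandas uses its own optimized implementation). The colors list is sample data.

```python
from collections import Counter

colors = ['red', 'blue', 'green', 'red', 'blue', 'yellow', 'red', 'blue', 'blue']

# Tally each unique value, then order by count (highest first).
# most_common() keeps first-seen order among equal counts.
tally = Counter(colors).most_common()

for value, count in tally:
    print(value, count)
```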

This is one of the most commonly used functions in categorical data analysis, allowing you to quickly understand the distribution of your data, such as counting voting results or product category distributions.

Word Definition: value_counts means "count values", i.e., count how many times each value occurs.

Basic Syntax and Parameters

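pd.value_counts() is a top-level function in the Pandas library for counting the frequency of each unique value. It behaves like the Series.value_counts() method applied to the same data. Note that since pandas 2.1 the top-level pd.value_counts() is deprecated and emits a FutureWarning; the recommended replacement is pd.Series(obj).value_counts(), which takes the same sort, ascending, normalize, bins, and dropna parameters.

Syntax Format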

```python
pd.value_counts(values, sort=True, ascending=False, normalize=False, bins=None, dropna=True)
```
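
A minimal sketch comparing the top-level call with the Series method, assuming pandas is installed; the values data is made up for illustration. Because the top-level function is deprecated (and may be missing from newer releases), the sketch only calls it when it still exists and silences its FutureWarning.

```python
import warnings

import pandas as pd

values = pd.Series(['x', 'y', 'x', 'z', 'x', 'y'])

# Recommended, version-independent form: call the Series method directly.
method_result = values.value_counts()
print(method_result)

# Call the deprecated top-level function only if this pandas version has it.
if hasattr(pd, "value_counts"):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", FutureWarning)
        top_level_result = pd.value_counts(values)
    print(top_level_result.equals(method_result))  # True
```

Both calls produce the same counts, so migrating existing code is usually a matter of rewriting pd.value_counts(s, ...) as s.value_counts(...).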

Parameter Description

Parameter: values
- Type: Series or array-like object.
- Description: The data whose values are counted. Usually a Series.

Parameter: sort
- Type: Boolean.
- Description: Whether to sort the result by frequency. Default is True; the sort direction is set by ascending.

Parameter: ascending
- Type: Boolean.
- Description: If True, sort by frequency in ascending order; if False (default), sort in descending order.

Parameter: normalize
- Type: Boolean.
- Description: If True, return proportions (between 0 and 1) instead of absolute counts. Default is False.

Parameter: bins
- Type: Integer or None.
- Description: If an integer is given, group numeric values into that many equal-width, half-open bins (a convenience for pd.cut, similar to a histogram) and count the values in each bin. Works only with numeric data. Default is None.

Parameter: dropna
- Type: Boolean.
- Description: If True (default), missing values (NaN) are excluded from the counts; set it to False to count them as well.
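
The sort parameter is the only one not exercised in the examples below. This sketch uses the Series.value_counts() method, which accepts the same parameters as pd.value_counts(); the fruits data is hypothetical. It shows that sort=False changes only the row order, not the counts.

```python
import pandas as pd

fruits = pd.Series(['pear', 'apple', 'pear', 'fig', 'apple', 'pear'])

sorted_counts = fruits.value_counts()              # sort=True (default)
unsorted_counts = fruits.value_counts(sort=False)  # skip the frequency sort

print(sorted_counts)
print(unsorted_counts)

# Same counts either way; aligning both by label makes that explicit.
print(sorted_counts.sort_index().equals(unsorted_counts.sort_index()))  # True
```

The row order with sort=False is not something to rely on; sort explicitly (for example with sort_index()) when order matters.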

Function Description

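Return Value: A Series whose index holds the unique values and whose values are their counts (or proportions when normalize=True). In pandas 2.0 and later, this Series is named count (or proportion when normalize=True), which is why printed results end with a line such as Name: count, dtype: int64.
Effect: Counts how often each unique value occurs; the function returns the result rather than printing it.

Examples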

Let's go through a series of examples from simple to complex to fully master the usage of pd.value_counts().

Example 1: Basic Usage - Counting Frequencies of Categorical Data

Example

```python
import pandas as pd

# 1. Create a Series with duplicate values
colors = pd.Series(['red', 'blue', 'green', 'red', 'blue', 'yellow', 'red', 'blue', 'blue'])

print("=== Original Series ===")
print(colors)

# 2. Use pd.value_counts() to count frequencies
#    (deprecated since pandas 2.1; colors.value_counts() gives the same result)
result = pd.value_counts(colors)

print("\n=== pd.value_counts() Frequency Count ===")
print(result)
```

Expected Output:

```text
=== Original Series ===
0       red
1      blue
2     green
3       red
4      blue
5    yellow
6       red
7      blue
8      blue
dtype: object

=== pd.value_counts() Frequency Count ===
blue      4
red       3
green     1
yellow    1
Name: count, dtype: int64
```

Code Explanation:

The result is sorted by frequency in descending order: blue appears most (4 times), red next (3 times), green and yellow once each.
The returned value is a Series where the index is the color values and the values are the counts.
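
Because the result is an ordinary Series, the usual Series tools apply to it. A short sketch using the same colors data, with colors.value_counts() standing in for the equivalent pd.value_counts(colors):

```python
import pandas as pd

colors = pd.Series(['red', 'blue', 'green', 'red', 'blue', 'yellow', 'red', 'blue', 'blue'])
result = colors.value_counts()

# The index holds the unique values; the data holds their counts.
print(result['red'])        # 3: count for a single label
print(result.idxmax())      # blue: the most frequent value
print(result.index.tolist())
print(result[result >= 2])  # keep only values seen at least twice
```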

Example 2: Using the normalize Parameter to Calculate Proportions

normalize=True converts frequencies into proportions, making it easier to see relative distributions.

Example

```python
import pandas as pd

# Create voting data
votes = pd.Series(['A', 'B', 'A', 'C', 'A', 'B', 'A', 'A', 'C', 'B', 'A', 'B'])

print("=== Voting Data ===")
print(votes)

# 1. Absolute frequencies
print("\n=== Absolute Frequencies ===")
print(pd.value_counts(votes))

# 2. Relative proportions (normalize=True)
print("\n=== Relative Proportions ===")
result_normalized = pd.value_counts(votes, normalize=True)
print(result_normalized)

print(f"\nSum of proportions: {result_normalized.sum():.2f}")

# 3. Sort ascending
print("\n=== Ascending Order ===")
print(pd.value_counts(votes, ascending=True))
```

Expected Output:

```text
=== Voting Data ===
0     A
1     B
2     A
3     C
4     A
5     B
6     A
7     A
8     C
9     B
10    A
11    B
dtype: object

=== Absolute Frequencies ===
A    6
B    4
C    2
Name: count, dtype: int64

=== Relative Proportions ===
A    0.500000
B    0.333333
C    0.166667
Name: proportion, dtype: float64

Sum of proportions: 1.00

=== Ascending Order ===
C    2
B    4
A    6
Name: count, dtype: int64
```

Code Explanation:

Using normalize=True gives each category's share directly.
The proportions sum to 1.0: A has 6 of the 12 votes (0.5), B has 4 (about 0.333), and C has 2 (about 0.167).
Using ascending=True sorts the results from least to most frequent.
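
normalize=True amounts to dividing each count by the total of the counted values. The sketch below checks that on small, made-up vote data that includes one missing value, using the Series.value_counts() method (same parameters as pd.value_counts()). With the default dropna=True the NaN is left out of the denominator; with dropna=False it counts toward the total.

```python
import numpy as np
import pandas as pd

votes = pd.Series(['A', 'B', 'A', 'C', np.nan, 'A'])

counts = votes.value_counts()                 # A 3, B 1, C 1 (NaN dropped)
shares = votes.value_counts(normalize=True)   # each count divided by 5

manual = counts / counts.sum()
print(shares)
print(np.allclose(shares.sort_index(), manual.sort_index()))  # True

# Including the NaN changes the denominator from 5 to 6.
with_nan = votes.value_counts(normalize=True, dropna=False)
print(with_nan['A'])  # 0.5
```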

Example 3: Handling Numerical Data and Using the bins Parameter

For continuous numerical data, the bins parameter groups values into ranges and counts how many fall into each range.

Example

```python
import pandas as pd

# 1. Create numerical data
scores = pd.Series([85, 90, 78, 92, 88, 76, 95, 82, 70, 89, 91, 77, 84, 86])

print("=== Student Scores ===")
print(scores)

# 2. No binning - count each unique value
print("\n=== Count Each Score ===")
print(pd.value_counts(scores))

# 3. Use bins parameter for binning
print("\n=== Binned into 4 bins ===")
result_bins = pd.value_counts(scores, bins=4)
print(result_bins)

# 4. The minimum and maximum bound the four equal-width bins
print("\n=== Bin Boundaries ===")
print(f"Min: {scores.min()}, Max: {scores.max()}")
```

Expected Output:

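```text
=== Student Scores ===
0     85
1     90
2     78
3     92
4     88
5     76
6     95
7     82
8     70
9     89
10    91
11    77
12    84
13    86
dtype: int64

=== Count Each Score ===
85    1
90    1
78    1
92    1
88    1
76    1
95    1
82    1
70    1
89    1
91    1
77    1
84    1
86    1
Name: count, dtype: int64

=== Binned into 4 bins ===
(88.75, 95.0]      5
(82.5, 88.75]      4
(76.25, 82.5]      3
(69.974, 76.25]    2
Name: count, dtype: int64

=== Bin Boundaries ===
Min: 70, Max: 95
```

Because all 14 scores are distinct, counting individual values gives 1 for every score (the order of these tied rows may vary between versions), which says little about the distribution. With bins=4, the range from 70 to 95 is split into four equal-width intervals of 6.25 points each, and the counts resemble a histogram: most scores (9 of 14) fall in the upper two bins, from 82.5 to 95. The first bin's left edge (69.974) lies slightly below the minimum of 70 because pandas widens the lowest edge so that the minimum value is included in the first half-open interval.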
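
The pandas documentation describes bins as a convenience for pd.cut. This sketch performs the binning explicitly with pd.cut on the same scores, so you can see which interval each score is assigned to before counting.

```python
import pandas as pd

scores = pd.Series([85, 90, 78, 92, 88, 76, 95, 82, 70, 89, 91, 77, 84, 86])

# Assign each score to one of four equal-width, right-closed bins.
binned = pd.cut(scores, bins=4)

# Show which bin every score landed in.
print(pd.DataFrame({'score': scores, 'bin': binned}))

# Count per bin, listed from the lowest bin to the highest.
per_bin = binned.value_counts().sort_index()
print(per_bin)
```

The per-bin counts (2, 3, 4, 5 from lowest to highest) match the bins=4 result above; only the ordering differs, since value_counts sorts by frequency.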

Example 4: Handling Missing Values

The dropna parameter controls whether missing values are included in the count.

Example

```python
import pandas as pd
import numpy as np

# 1. Data with missing values (np.nan and None)
data = pd.Series(['a', 'b', np.nan, 'a', None, 'c', np.nan, 'b', 'a'])

print("=== Data with Missing Values ===")
print(data)

# 2. Default excludes NaN (dropna=True)
print("\n=== Default Excludes NaN ===")
print(pd.value_counts(data))

# 3. Include NaN (dropna=False)
print("\n=== Includes NaN ===")
print(pd.value_counts(data, dropna=False))

# 4. Count numeric missing values
numeric_data = pd.Series([1, 2, np.nan, 3, 2, np.nan, 1, 4])

print("\n=== Numeric Data with NaN ===")
print(pd.value_counts(numeric_data, dropna=False))
```

Expected Output:

```text
=== Data with Missing Values ===
0       a
1       b
2     NaN
3       a
4    None
5       c
6     NaN
7       b
8       a
dtype: object

=== Default Excludes NaN ===
a    3
b    2
c    1
Name: count, dtype: int64

=== Includes NaN ===
a       3
b       2
NaN     2
None    1
c       1
Name: count, dtype: int64

=== Numeric Data with NaN ===
1.0    2
2.0    2
NaN    2
3.0    1
4.0    1
Name: count, dtype: int64
```

Code Explanation:

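By default (dropna=True), missing values (NaN and None) are not counted.
Using dropna=False shows how many values are missing, which is useful for data-quality checks. In this object-dtype Series, np.nan and None are different objects, so they are reported as separate entries (NaN 2, None 1).
Numeric data works the same way: its two NaN values appear as a single NaN entry. Rows with equal counts (for example b and NaN) may be listed in a different order depending on the pandas and NumPy versions.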
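
A quick cross-check for data-quality work: the missing rows reported with dropna=False should add up to Series.isna().sum(). The helper missing_count below is hypothetical, written only for this sketch; summing over counts.index.isna() works whether None and NaN end up as one entry or two.

```python
import numpy as np
import pandas as pd

data = pd.Series(['a', 'b', np.nan, 'a', None, 'c', np.nan, 'b', 'a'])
numeric_data = pd.Series([1, 2, np.nan, 3, 2, np.nan, 1, 4])


def missing_count(s: pd.Series) -> int:
    """Hypothetical helper: total count of missing labels in value_counts."""
    counts = s.value_counts(dropna=False)
    # Add up every row whose label is missing (NaN or None).
    return int(counts[counts.index.isna()].sum())


print(missing_count(data), data.isna().sum())                  # 3 3
print(missing_count(numeric_data), numeric_data.isna().sum())  # 2 2
```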

Tip: pd.value_counts() is one of the most important functions in exploratory data analysis, helping you quickly understand data distribution. If you only need unique values without frequencies, use the pd.unique() function.
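
A short comparison on the Example 1 colors data: pd.unique() returns the distinct values in order of first appearance, value_counts() returns the same values paired with counts and ordered by frequency, and Series.nunique() returns only how many distinct values there are.

```python
import pandas as pd

colors = pd.Series(['red', 'blue', 'green', 'red', 'blue', 'yellow', 'red', 'blue', 'blue'])

uniques = pd.unique(colors)     # distinct values, in order of first appearance
counts = colors.value_counts()  # distinct values with counts, most frequent first

print(uniques)
print(counts.index.tolist())
print(colors.nunique())         # 4: just the number of distinct values
```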
