
Pandas pd.read_json() Function


read_json() is the pandas function for reading JSON (JavaScript Object Notation) data. It supports several JSON layouts, selected with the orient parameter.

JSON is a lightweight data interchange format that is easy for humans to read and write and easy for machines to parse and generate. It is widely used in Web APIs and configuration files. read_json() converts JSON data into a pandas DataFrame, which makes it convenient for data analysis.

Basic Syntax and Parameters


Syntax Format

    pandas.read_json(path_or_buf, orient=None, typ='frame', dtype=None,
                     convert_axes=None, convert_dates=True, keep_default_dates=True,
                     precise_float=False, date_unit=None, encoding=None,
                     encoding_errors='strict', lines=False, chunksize=None, ...)

Parameter Description

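| Parameter | Type | Description | Default Value |
|---|---|---|---|
| path_or_buf | str, path object, or file-like object | JSON file path, URL, or file-like object (wrap a JSON string in StringIO) | Required |
| orient | str | Expected layout of the JSON data: 'split', 'records', 'index', 'columns', 'values', or 'table' | None |
| typ | str | Return type: 'frame' returns a DataFrame, 'series' returns a Series | 'frame' |
| dtype | bool or dict | True infers column types, False disables inference, a dict maps column names to types | None |
| convert_axes | bool | Whether to try converting the axes (row and column labels) to suitable dtypes | None |
| convert_dates | bool or list of str | Whether to convert date-like columns, or a list of column names to convert | True |
| numpy | bool | Whether to decode directly to NumPy arrays (deprecated in pandas 1.0 and removed in pandas 2.0) | False |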
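The effect of the convert_axes parameter listed above can be sketched with a small example. The data and variable names here are made up for illustration: a JSON object keyed by zero-padded ID strings. With convert_axes=False the row labels are kept as text, whereas with the default setting pandas may coerce such labels to numbers.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: row labels are zero-padded ID strings
json_by_id = '{"001": {"name": "Tom"}, "002": {"name": "Jerry"}}'

# convert_axes=False keeps the row labels exactly as they appear in the JSON
df_raw = pd.read_json(StringIO(json_by_id), orient="index", convert_axes=False)

print(df_raw.index.tolist())
print(df_raw.loc["001", "name"])
```

This is worth considering whenever the keys of the JSON object are identifiers rather than numbers.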

Return Value

  • Return type: pd.DataFrame or pd.Series
  • By default, returns a DataFrame, which is a two-dimensional table.
  • Returns a Series when typ='series'.
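As a minimal sketch of the typ='series' case, assume a flat JSON object that maps names to scores (the sample data is invented for this illustration). Its keys become the Series index and its values become the Series values.

```python
from io import StringIO

import pandas as pd

# Hypothetical flat JSON object: one value per key
json_scores = '{"Tom": 85, "Jerry": 92, "Mike": 78}'

# typ="series" asks read_json() for a Series instead of a DataFrame
scores = pd.read_json(StringIO(json_scores), typ="series")

print(type(scores).__name__)
print(scores["Jerry"])
```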

Examples

The following examples cover the most common uses of read_json().

Example 1: Reading JSON Data in Different Formats

JSON data comes in several layouts, and read_json() supports the common ones.

Example

    import pandas as pd

    # Sample data: JSON array format (records format)
    json_records = '''
    [
        {"name": "Tom", "age": 28, "city": "Beijing", "salary": 8000},
        {"name": "Jerry", "age": 35, "city": "Shanghai", "salary": 12000},
        {"name": "Mike", "age": 42, "city": "Guangzhou", "salary": 15000}
    ]
    '''

    # Write the JSON string to a file
    with open('data_records.json', 'w', encoding='utf-8') as f:
        f.write(json_records)

    # Read the JSON file (records format, most commonly used)
    # orient='records' means each array element is one row
    df_records = pd.read_json('data_records.json', orient='records')
    print("Records format:")
    print(df_records)
    print()

    # Sample data: JSON object format (index format)
    json_index = '''
    {
        "Tom": {"age": 28, "city": "Beijing", "salary": 8000},
        "Jerry": {"age": 35, "city": "Shanghai", "salary": 12000},
        "Mike": {"age": 42, "city": "Guangzhou", "salary": 15000}
    }
    '''

    with open('data_index.json', 'w', encoding='utf-8') as f:
        f.write(json_index)

    # Read the JSON file (index format: the outer keys become the row index)
    df_index = pd.read_json('data_index.json', orient='index')
    print("Index format:")
    print(df_index)
    print()

    # Sample data: JSON column format (columns format)
    json_columns = '''
    {
        "name": ["Tom", "Jerry", "Mike"],
        "age": [28, 35, 42],
        "city": ["Beijing", "Shanghai", "Guangzhou"],
        "salary": [8000, 12000, 15000]
    }
    '''

    with open('data_columns.json', 'w', encoding='utf-8') as f:
        f.write(json_columns)

    # Read the JSON file (columns format: the outer keys become column names)
    df_columns = pd.read_json('data_columns.json', orient='columns')
    print("Columns format:")
    print(df_columns)

Expected Output:

    Records format:
        name  age       city  salary
    0    Tom   28    Beijing    8000
    1  Jerry   35   Shanghai   12000
    2   Mike   42  Guangzhou   15000

    Index format:
           age       city  salary
    Tom     28    Beijing    8000
    Jerry   35   Shanghai   12000
    Mike    42  Guangzhou   15000

    Columns format:
        name  age       city  salary
    0    Tom   28    Beijing    8000
    1  Jerry   35   Shanghai   12000
    2   Mike   42  Guangzhou   15000

Code Explanation:

  • orient='records': JSON array format, where each array element is a JSON object describing one row; the most commonly used format.
  • orient='index': JSON object format, with the outer keys as row index labels.
  • orient='columns': JSON column format, with keys as column names and values as arrays.
  • Correctly specifying the orient parameter is crucial for correctly parsing JSON data.
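The two remaining orient values from the parameter list, 'split' and 'values', are not covered above. The following is a minimal sketch with invented sample data: 'split' stores column names, row labels, and cell values under separate keys, while 'values' is a bare array of rows with no labels at all.

```python
from io import StringIO

import pandas as pd

# orient="split": columns, index, and data are stored under separate keys
json_split = """
{
    "columns": ["name", "age"],
    "index": ["a", "b"],
    "data": [["Tom", 28], ["Jerry", 35]]
}
"""
df_split = pd.read_json(StringIO(json_split), orient="split")

# orient="values": only the cell values; pandas assigns default labels
json_values = '[["Tom", 28], ["Jerry", 35]]'
df_values = pd.read_json(StringIO(json_values), orient="values")

print(df_split)
print(df_values)
```

The 'split' layout is compact because column names are written once rather than repeated in every row.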

Example 2: Reading JSON from Strings and URLs

Besides files, read_json() can also read data from strings (wrapped in StringIO) and from URLs.

Example

    import pandas as pd
    from io import StringIO

    # Example 2a: Reading from a JSON string
    json_string = '''
    [
        {"product": "A", "sales": 100, "region": "North"},
        {"product": "B", "sales": 200, "region": "South"},
        {"product": "C", "sales": 150, "region": "East"}
    ]
    '''

    # Use StringIO to convert the string to a file-like object
    df_from_string = pd.read_json(StringIO(json_string))
    print("Reading from string:")
    print(df_from_string)
    print()

    # Passing a literal JSON string directly is deprecated since pandas 2.1,
    # so one-line strings should be wrapped in StringIO as well
    json_str_direct = '[{"product": "A", "sales": 100}, {"product": "B", "sales": 200}]'
    df_direct = pd.read_json(StringIO(json_str_direct))
    print("Reading a one-line string:")
    print(df_direct)
    print()

    # Example 2b: Reading JSON Lines format (each line is a JSON object)
    # JSON Lines is a common log format
    json_lines = '''{"name": "Tom", "score": 85}
    {"name": "Jerry", "score": 92}
    {"name": "Mike", "score": 78}
    {"name": "Lucy", "score": 95}'''

    with open('data_lines.json', 'w', encoding='utf-8') as f:
        f.write(json_lines)

    # lines=True tells read_json() to parse one JSON object per line
    df_lines = pd.read_json('data_lines.json', lines=True)
    print("Reading JSON Lines format:")
    print(df_lines)
    print()

    # Example 2c: Reading from an API URL (requires network access)
    # Placeholder URL; replace it with a real one in practice
    # df_api = pd.read_json('https://api.example.com/data')
    # print(df_api)
    print("Note: Reading from URL requires actual network request")

Expected Output:

    Reading from string:
      product  sales region
    0       A    100  North
    1       B    200  South
    2       C    150   East

    Reading a one-line string:
      product  sales
    0       A    100
    1       B    200

    Reading JSON Lines format:
        name  score
    0    Tom     85
    1  Jerry     92
    2   Mike     78
    3   Lucy     95

    Note: Reading from URL requires actual network request

Code Explanation:

  • read_json() reads from file-like objects, so a JSON string should be wrapped in StringIO; passing a literal JSON string directly is deprecated since pandas 2.1.
  • JSON Lines format has one independent JSON object per line and is often used in log processing; read_json() parses it directly when lines=True is set.
  • When reading from a URL, simply pass the URL string; network access is required.
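For JSON Lines input, a hedged sketch of the lines=True option together with nrows follows. It uses an in-memory StringIO buffer in place of a file, and the sample records are invented. nrows is only accepted together with lines=True and limits how many lines are parsed.

```python
from io import StringIO

import pandas as pd

# Hypothetical JSON Lines text: one JSON object per line
jsonl_text = """{"name": "Tom", "score": 85}
{"name": "Jerry", "score": 92}
{"name": "Mike", "score": 78}
{"name": "Lucy", "score": 95}
"""

# lines=True turns each line into one DataFrame row
df_all = pd.read_json(StringIO(jsonl_text), lines=True)

# nrows stops parsing after the first two lines
df_head = pd.read_json(StringIO(jsonl_text), lines=True, nrows=2)

print(len(df_all), len(df_head))
print(df_head)
```

Reading only the first few lines this way is a cheap check of a large log file's structure before loading all of it.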

Example 3: Handling Dates and Type Conversion

Date and numeric types in JSON data require special handling.

Example

    import pandas as pd

    # Example 3a: Handling date fields
    json_with_date = '''
    [
        {"name": "Tom", "birthday": "1995-03-15", "join_date": "2020-01-10"},
        {"name": "Jerry", "birthday": "1988-07-22", "join_date": "2019-03-05"},
        {"name": "Mike", "birthday": "1981-11-30", "join_date": "2018-06-20"}
    ]
    '''

    with open('data_with_date.json', 'w', encoding='utf-8') as f:
        f.write(json_with_date)

    # These column names are not date-like, so the values stay as strings
    df_date = pd.read_json('data_with_date.json')
    print("Default read (dates as strings):")
    print(df_date)
    print("birthday type:", df_date['birthday'].dtype)
    print()

    # Use convert_dates to name the columns that should be parsed as dates
    df_date_converted = pd.read_json('data_with_date.json', convert_dates=['birthday', 'join_date'])
    print("After converting dates:")
    print(df_date_converted)
    print("birthday type:", df_date_converted['birthday'].dtype)
    print()

    # Example 3b: Specifying data types
    json_mixed = '''
    [
        {"id": "1", "name": "Tom", "score": 85.5},
        {"id": "2", "name": "Jerry", "score": 92.0},
        {"id": "3", "name": "Mike", "score": 78.5}
    ]
    '''

    with open('data_mixed.json', 'w', encoding='utf-8') as f:
        f.write(json_mixed)

    # By default, the numeric-looking id strings are inferred as integers,
    # name stays a string, and score is read as float
    df_mixed = pd.read_json('data_mixed.json')
    print("Default type inference:")
    print(df_mixed)
    print("id type:", df_mixed['id'].dtype)
    print()

    # Use dtype to explicitly specify types
    df_typed = pd.read_json('data_mixed.json', dtype={'id': str, 'score': float})
    print("After specifying types:")
    print(df_typed)
    print("id type:", df_typed['id'].dtype)

Expected Output:

    Default read (dates as strings):
        name    birthday   join_date
    0    Tom  1995-03-15  2020-01-10
    1  Jerry  1988-07-22  2019-03-05
    2   Mike  1981-11-30  2018-06-20
    birthday type: object

    After converting dates:
        name   birthday  join_date
    0    Tom 1995-03-15 2020-01-10
    1  Jerry 1988-07-22 2019-03-05
    2   Mike 1981-11-30 2018-06-20
    birthday type: datetime64[ns]

    Default type inference:
       id   name  score
    0   1    Tom   85.5
    1   2  Jerry   92.0
    2   3   Mike   78.5
    id type: int64

    After specifying types:
      id   name  score
    0  1    Tom   85.5
    1  2  Jerry   92.0
    2  3   Mike   78.5
    id type: object

(Output shown for pandas 2.x; the exact dtype names can differ in other pandas versions.)

Code Explanation:

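  • The convert_dates parameter accepts a list of column names that should be converted to date types.
  • With the default convert_dates=True, only columns with date-like names are converted automatically (for example 'date', 'modified', names starting with 'timestamp', or names ending in '_at' or '_time'), so columns such as 'birthday' stay as strings unless they are listed explicitly.
  • The dtype parameter allows explicit specification of column data types to avoid incorrect type inference, such as numeric-looking ID strings being read as integers.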
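The default date handling can be sketched as follows, under the assumption that the JSON contains one column with a date-like name ('created_at', chosen for this illustration) and one without ('birthday'). Only the former is expected to be parsed automatically, and convert_dates=False switches the automatic conversion off entirely.

```python
from io import StringIO

import pandas as pd

# Hypothetical records: "created_at" has a date-like name, "birthday" does not
json_events = """
[
    {"event": "signup", "created_at": "2020-01-10", "birthday": "1995-03-15"},
    {"event": "login", "created_at": "2020-02-01", "birthday": "1988-07-22"}
]
"""

# Default settings: only date-like column names are candidates for parsing
df_default = pd.read_json(StringIO(json_events))

# convert_dates=False leaves every column as it appears in the JSON
df_no_dates = pd.read_json(StringIO(json_events), convert_dates=False)

is_datetime = pd.api.types.is_datetime64_any_dtype
print("created_at parsed by default:", is_datetime(df_default["created_at"]))
print("birthday parsed by default:", is_datetime(df_default["birthday"]))
print("created_at parsed with convert_dates=False:", is_datetime(df_no_dates["created_at"]))
```

Checking dtypes with is_datetime64_any_dtype avoids depending on the exact dtype string, which varies between pandas versions.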

Notes

  • Correctly specifying the orient parameter is key to reading JSON data; different JSON structures require different orient values.
  • Date strings are converted automatically only in columns whose names look date-like (such as 'date' or names ending in '_at'); other date columns must be listed in the convert_dates parameter.
  • For large JSON Lines files, consider using the chunksize parameter for chunked reading; it requires lines=True.
  • read_json() supports reading from file paths, URLs, and file-like objects; wrap a JSON string in StringIO, because passing a literal string is deprecated since pandas 2.1.
  • JSON Lines files can be read directly with lines=True, without parsing each line manually.
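Chunked reading can be sketched as below. This is a minimal illustration with invented data held in a StringIO buffer; in practice the first argument would be the path of a large JSON Lines file. With lines=True and chunksize set, read_json() returns an iterator that yields one DataFrame per chunk.

```python
from io import StringIO

import pandas as pd

# Hypothetical JSON Lines data with five records
jsonl_text = """{"id": 0, "value": 0}
{"id": 1, "value": 10}
{"id": 2, "value": 20}
{"id": 3, "value": 30}
{"id": 4, "value": 40}
"""

chunk_lengths = []
total = 0

# Each iteration yields a DataFrame holding at most two rows
for chunk in pd.read_json(StringIO(jsonl_text), lines=True, chunksize=2):
    chunk_lengths.append(len(chunk))
    total += int(chunk["value"].sum())

print(chunk_lengths)  # [2, 2, 1]
print(total)          # 100
```

Aggregating per chunk, as with the running total here, keeps memory use bounded by the chunk size rather than by the file size.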

Summary


read_json() is the core function in pandas for reading JSON data, supporting multiple JSON formats. As a common format for Web APIs and data interchange, JSON is widely used in practical data analysis work.


The key to mastering read_json() lies in understanding different orient formats and methods for handling dates and types. Readers are encouraged to practice reading JSON data in various formats in their actual work.
