read_json() is the pandas function for loading JSON (JavaScript Object Notation) data. It accepts several JSON layouts and returns a DataFrame or a Series.

JSON is a lightweight data interchange format that is easy for humans to read and write and easy for machines to parse and generate. It is widely used in Web APIs and configuration files. Converting JSON into a pandas DataFrame with read_json() makes the data convenient to analyze.
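Basic Syntax and Parameters

Syntax Format

    pandas.read_json(path_or_buf, *, orient=None, typ='frame', dtype=None, convert_axes=None, convert_dates=True, keep_default_dates=True, precise_float=False, date_unit=None, lines=False, chunksize=None, ...)

This signature follows pandas 2.x: arguments after path_or_buf are keyword-only, date_unit defaults to None (the unit is detected automatically), and the numpy argument found in older releases has been removed.

Parameter Description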
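
| Parameter | Type | Description | Default Value |
|---|---|---|---|
| path_or_buf | str, path object, or file-like object | JSON file path, URL, or file-like object (for example StringIO); passing a raw JSON string is deprecated since pandas 2.1 | Required |
| orient | str | Expected layout of the JSON data: 'split', 'records', 'index', 'columns', 'values', or 'table' | None ('columns' for a DataFrame, 'index' for a Series) |
| typ | str | Return type: 'frame' returns DataFrame, 'series' returns Series | 'frame' |
| dtype | bool or dict | True infers column types, False disables inference, a dict maps column names to types | None (acts as True unless orient='table') |
| convert_axes | bool | Whether to try converting the axes (row and column labels) to suitable dtypes | None (acts as True unless orient='table') |
| convert_dates | bool or list of str | True converts columns with date-like names; a list names the columns to convert | True |
| numpy | bool | Whether to decode directly into numpy arrays; deprecated in pandas 1.0 and removed in 2.0 | False |
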
Return Value

- Return Type: pd.DataFrame or pd.Series
- By default, returns a DataFrame, which is a two-dimensional table.
- Returns a Series when typ='series'.
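
The typ='series' case can be illustrated with a short sketch. The sample JSON and the variable names below are hypothetical, and the string is wrapped in StringIO so that read_json() receives a file-like object:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: a flat JSON object mapping labels to values.
json_scores = '{"Tom": 85, "Jerry": 92, "Mike": 78}'

# typ='series' asks for a Series; the object's keys become the index labels.
scores = pd.read_json(StringIO(json_scores), typ='series')

print(type(scores).__name__)
print(scores)
```

For a flat object like this one, a Series is usually the more natural result, since there is only one dimension of labels.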
Examples

The following examples cover the most common uses of read_json().

Example 1: Reading JSON Data in Different Formats

JSON data can be laid out in several ways, and read_json() supports the common ones through the orient parameter.

Example

    import pandas as pd

    # Sample data: JSON array format (records format)
    json_records = '''
    [
        {"name": "Tom", "age": 28, "city": "Beijing", "salary": 8000},
        {"name": "Jerry", "age": 35, "city": "Shanghai", "salary": 12000},
        {"name": "Mike", "age": 42, "city": "Guangzhou", "salary": 15000}
    ]
    '''

    # Write JSON string to file
    with open('data_records.json', 'w', encoding='utf-8') as f:
        f.write(json_records)

    # Read JSON file (records format, most commonly used)
    # orient='records' means each array element is one row
    df_records = pd.read_json('data_records.json', orient='records')
    print("Records format:")
    print(df_records)
    print()

    # Sample data: JSON object format (index format)
    json_index = '''
    {
        "Tom": {"age": 28, "city": "Beijing", "salary": 8000},
        "Jerry": {"age": 35, "city": "Shanghai", "salary": 12000},
        "Mike": {"age": 42, "city": "Guangzhou", "salary": 15000}
    }
    '''

    with open('data_index.json', 'w', encoding='utf-8') as f:
        f.write(json_index)

    # Read JSON file (index format: the outer keys become the index)
    df_index = pd.read_json('data_index.json', orient='index')
    print("Index format:")
    print(df_index)
    print()

    # Sample data: JSON column format (columns format)
    json_columns = '''
    {
        "name": ["Tom", "Jerry", "Mike"],
        "age": [28, 35, 42],
        "city": ["Beijing", "Shanghai", "Guangzhou"],
        "salary": [8000, 12000, 15000]
    }
    '''

    with open('data_columns.json', 'w', encoding='utf-8') as f:
        f.write(json_columns)

    # Read JSON file (columns format: the keys become the column names)
    df_columns = pd.read_json('data_columns.json', orient='columns')
    print("Columns format:")
    print(df_columns)

Expected Output:

    Records format:
        name  age       city  salary
    0    Tom   28    Beijing    8000
    1  Jerry   35   Shanghai   12000
    2   Mike   42  Guangzhou   15000

    Index format:
           age       city  salary
    Tom     28    Beijing    8000
    Jerry   35   Shanghai   12000
    Mike    42  Guangzhou   15000

    Columns format:
        name  age       city  salary
    0    Tom   28    Beijing    8000
    1  Jerry   35   Shanghai   12000
    2   Mike   42  Guangzhou   15000

Code Explanation:

- orient='records': JSON array format, where each element is a JSON object describing one row; the most commonly used format.
- orient='index': JSON object format, with keys as index labels.
- orient='columns': JSON column format, with keys as column names and values as arrays.
- Correctly specifying the orient parameter is crucial for correctly parsing JSON data.
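
The parameter table also lists the 'split' and 'values' layouts, which the example above does not cover. The following is a minimal sketch of both, using hypothetical sample data wrapped in StringIO:

```python
from io import StringIO

import pandas as pd

# Hypothetical data in the 'split' layout: labels and values are stored separately.
json_split = '''
{
    "columns": ["name", "age"],
    "index": [0, 1],
    "data": [["Tom", 28], ["Jerry", 35]]
}
'''
df_split = pd.read_json(StringIO(json_split), orient='split')

# The 'values' layout is a bare array of rows, so pandas assigns default integer labels.
json_values = '[["Tom", 28], ["Jerry", 35]]'
df_values = pd.read_json(StringIO(json_values), orient='values')

print(df_split)
print(df_values)
```

The 'split' layout keeps the column names and index explicitly, while 'values' carries only the cell values, so column names have to be assigned afterwards if they are needed.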
Example 2: Reading JSON from Strings and URLs

Besides files, read_json() can read JSON held in a string (wrapped in a file-like object) and JSON served from a URL.

Example

    import pandas as pd
    from io import StringIO

    # Example 2a: Reading from a JSON string
    json_string = '''
    [
        {"product": "A", "sales": 100, "region": "North"},
        {"product": "B", "sales": 200, "region": "South"},
        {"product": "C", "sales": 150, "region": "East"}
    ]
    '''

    # Use StringIO to convert the string to a file-like object
    df_from_string = pd.read_json(StringIO(json_string))
    print("Reading from string:")
    print(df_from_string)
    print()

    # Passing the JSON string itself as the argument is deprecated since pandas 2.1,
    # so a single-line string is wrapped in StringIO as well
    json_str_direct = '[{"product": "A", "sales": 100}, {"product": "B", "sales": 200}]'
    df_direct = pd.read_json(StringIO(json_str_direct))
    print("Single-line string:")
    print(df_direct)
    print()

    # Example 2b: Reading JSON Lines format (each line is a JSON object)
    # JSON Lines is a common log format
    json_lines = '''{"name": "Tom", "score": 85}
    {"name": "Jerry", "score": 92}
    {"name": "Mike", "score": 78}
    {"name": "Lucy", "score": 95}'''

    with open('data_lines.json', 'w', encoding='utf-8') as f:
        f.write(json_lines)

    # lines=True tells read_json() to parse one JSON object per line
    # (parsing each line manually with json.loads also works)
    df_lines = pd.read_json('data_lines.json', lines=True)
    print("Reading JSON Lines format:")
    print(df_lines)
    print()

    # Example 2c: Reading from API URL (requires network access)
    # Using example API, replace with real URL in practice
    # df_api = pd.read_json('https://api.example.com/data')
    # print(df_api)
    print("Note: Reading from URL requires actual network request")

Expected Output:

    Reading from string:
       product  sales  region
    0        A    100   North
    1        B    200   South
    2        C    150    East

    Single-line string:
       product  sales
    0        A    100
    1        B    200

    Reading JSON Lines format:
        name  score
    0    Tom     85
    1  Jerry     92
    2   Mike     78
    3   Lucy     95

    Note: Reading from URL requires actual network request

Code Explanation:

- read_json() accepts file-like objects, so a JSON string can be read by wrapping it in StringIO; passing the raw string itself is deprecated since pandas 2.1.
- JSON Lines format has one independent JSON object per line and is often used in log processing. It can be read with lines=True, or parsed line by line and merged into a DataFrame.
- When reading from a URL, simply pass the URL string; network access is required.
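
JSON Lines input can also be parsed by read_json() itself. The sketch below assumes the lines and nrows parameters available in recent pandas releases and uses hypothetical in-memory records instead of a file:

```python
from io import StringIO

import pandas as pd

# Hypothetical log records, one JSON object per line (JSON Lines).
json_lines = '''{"name": "Tom", "score": 85}
{"name": "Jerry", "score": 92}
{"name": "Mike", "score": 78}
{"name": "Lucy", "score": 95}
'''

# lines=True parses each line as a separate record.
df_all = pd.read_json(StringIO(json_lines), lines=True)

# nrows limits how many lines are read; it is only valid together with lines=True.
df_head = pd.read_json(StringIO(json_lines), lines=True, nrows=2)

print(df_all)
print(df_head)
```

Reading only the first few lines this way is a quick check of a file's structure before loading it in full.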
Example 3: Handling Dates and Type Conversion

Dates and numeric types in JSON data often need explicit handling.

Example

    import pandas as pd

    # Example 3a: Handling date fields
    json_with_date = '''
    [
        {"name": "Tom", "birthday": "1995-03-15", "join_date": "2020-01-10"},
        {"name": "Jerry", "birthday": "1988-07-22", "join_date": "2019-03-05"},
        {"name": "Mike", "birthday": "1981-11-30", "join_date": "2018-06-20"}
    ]
    '''

    with open('data_with_date.json', 'w', encoding='utf-8') as f:
        f.write(json_with_date)

    # By default these columns stay strings: their names are not recognized as date-like
    df_date = pd.read_json('data_with_date.json')
    print("Default read (dates as strings):")
    print(df_date)
    print("birthday type:", df_date['birthday'].dtype)
    print()

    # Use convert_dates to name the columns that should be parsed as dates
    df_date_converted = pd.read_json('data_with_date.json', convert_dates=['birthday', 'join_date'])
    print("After converting dates:")
    print(df_date_converted)
    print("birthday type:", df_date_converted['birthday'].dtype)
    print()

    # Example 3b: Specifying data types
    json_mixed = '''
    [
        {"id": "1", "name": "Tom", "score": 85.5},
        {"id": "2", "name": "Jerry", "score": 92.0},
        {"id": "3", "name": "Mike", "score": 78.5}
    ]
    '''

    with open('data_mixed.json', 'w', encoding='utf-8') as f:
        f.write(json_mixed)

    # By default, id is read as integer, name as string, score as float
    df_mixed = pd.read_json('data_mixed.json')
    print("Default type inference:")
    print(df_mixed)
    print("id type:", df_mixed['id'].dtype)
    print()

    # Use dtype to explicitly specify types
    df_typed = pd.read_json('data_mixed.json', dtype={'id': str, 'score': float})
    print("After specifying types:")
    print(df_typed)
    print("id type:", df_typed['id'].dtype)

Expected Output:

    Default read (dates as strings):
        name    birthday   join_date
    0    Tom  1995-03-15  2020-01-10
    1  Jerry  1988-07-22  2019-03-05
    2   Mike  1981-11-30  2018-06-20
    birthday type: object

    After converting dates:
        name   birthday  join_date
    0    Tom 1995-03-15 2020-01-10
    1  Jerry 1988-07-22 2019-03-05
    2   Mike 1981-11-30 2018-06-20
    birthday type: datetime64[ns]

    Default type inference:
       id   name  score
    0   1    Tom   85.5
    1   2  Jerry   92.0
    2   3   Mike   78.5
    id type: int64

    After specifying types:
       id   name  score
    0   1    Tom   85.5
    1   2  Jerry   92.0
    2   3   Mike   78.5
    id type: object

The dtype names shown are those printed by pandas 2.x; other versions may report different string or datetime unit names.

Code Explanation:
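
- The convert_dates parameter can list the columns that should be converted to date types.
- With the default convert_dates=True, only columns whose names look date-like (for example names ending in _at or _time, starting with timestamp, or equal to date or modified) are converted automatically, which is why birthday and join_date stay strings in the default read.
- The dtype parameter allows explicit specification of column data types to avoid incorrect type inference.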
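
How the default date handling depends on column names can be seen in a small sketch. The records and column names below are hypothetical; created_at is chosen because it ends in _at, one of the name patterns pandas treats as date-like:

```python
from io import StringIO

import pandas as pd

# Hypothetical records: 'created_at' has a date-like name, 'birthday' does not.
json_events = '''
[
    {"user": "Tom", "created_at": "2020-01-10", "birthday": "1995-03-15"},
    {"user": "Jerry", "created_at": "2019-03-05", "birthday": "1988-07-22"}
]
'''

# Default settings: only the column with a date-like name is converted.
df_default = pd.read_json(StringIO(json_events))

# convert_dates=False leaves every column as read from the JSON.
df_raw = pd.read_json(StringIO(json_events), convert_dates=False)

print(df_default.dtypes)
print(df_raw.dtypes)
```

Listing columns explicitly in convert_dates is the more predictable choice when column names do not follow these patterns.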
Notes

- Correctly specifying the orient parameter is key to reading JSON data; different JSON structures require different orient values.
- Date strings are converted automatically only in columns with date-like names; for other columns, list them in the convert_dates parameter.
- For large JSON Lines files, consider using the chunksize parameter (together with lines=True) for chunked reading.
- read_json() supports reading from file paths, URLs, and file-like objects; wrap JSON strings in StringIO.
- JSON Lines format can be read directly with lines=True, or parsed line by line and merged into a DataFrame.
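
Chunked reading can be sketched as follows. The in-memory data is a hypothetical stand-in for a large JSON Lines file, and the chunk size of 4 is chosen only for illustration:

```python
from io import StringIO

import pandas as pd

# Hypothetical JSON Lines data standing in for a large file.
json_lines = "\n".join(
    '{"id": %d, "value": %d}' % (i, i * 10) for i in range(10)
)

# chunksize requires lines=True and yields DataFrames of at most 4 rows each.
total = 0
with pd.read_json(StringIO(json_lines), lines=True, chunksize=4) as reader:
    for chunk in reader:
        total += chunk["value"].sum()
        print(len(chunk), "rows in this chunk")

print("Sum of value:", total)
```

Aggregating per chunk keeps only one chunk in memory at a time, which is the main reason to use chunksize.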
Summary

read_json() is the core pandas function for reading JSON data and supports multiple JSON layouts. Because JSON is a common format for Web APIs and data interchange, it appears frequently in practical data analysis work.
The key to mastering read_json() lies in understanding the different orient formats and how dates and types are converted. Readers are encouraged to practice reading JSON data in various formats in their own work.