Linux split Command
split is a standard file splitting tool on Linux (part of GNU coreutils) that can cut a file by line count, size, or a specified number of pieces.
The split command is used to split a large file into several smaller files, making it convenient for transfer, storage, or parallel processing.
The resulting pieces can be merged back into the original file using the cat command.
When you encounter a log file too large to open with an editor, need to upload a large file in chunks, or want to process data in parallel, split is the go-to solution.
By default, the output files are named with the prefix x followed by an alphabetical sequence: xaa, xab, xac, and so on.
split does not delete or modify the original file; the splitting operation is safe. However, be mindful of disk space when splitting into many small files.
Command Syntax
The basic syntax for split is as follows:
split [OPTION]... [FILE [PREFIX]]
If no input file is specified, data is read from standard input.
If no output file prefix is specified, the default prefix x is used.
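As a minimal sketch of both defaults, the following reads ten lines from a pipe, first with no arguments besides -l and then with - as the input name so that a prefix can follow. The scratch directory and the piece_ prefix are illustrative choices, not something split requires.

```shell
# Work in a scratch directory so the generated pieces are easy to clean up
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# No FILE: split reads standard input. No PREFIX: pieces are named xaa, xab, xac
seq 1 10 | split -l 4
ls x*

# "-" names standard input explicitly, which makes room for a PREFIX argument
seq 1 10 | split -l 4 - piece_
ls piece_*
```

With ten input lines and -l 4, each run produces three pieces; the last one holds the two leftover lines.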
Common options for split are summarized below:
| Option | Function | Example |
|---|---|---|
| -b, --bytes=SIZE | Split file by specified number of bytes | split -b 100M large.log |
| -l, --lines=NUMBER | Split file by specified number of lines | split -l 1000 data.txt |
| -n, --number=CHUNKS | Split into a specified number of smaller files | split -n 5 data.txt |
| -a, --suffix-length=N | Specify suffix length (default is 2) | split -a 3 -l 100 data.txt |
| -d, --numeric-suffixes | Use numeric suffixes instead of alphabetic suffixes | split -d -l 100 data.txt |
| -C, --line-bytes=SIZE | Split by size, but keep each line intact | split -C 10M log.txt |
| --verbose | Display detailed information about the splitting process | split --verbose -b 10M file.bin |
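SIZE accepts unit suffixes. In GNU split, K, M, G, and T are powers of 1024 (K = 1024 bytes, M = 1024 × 1024 bytes), while KB, MB, GB, and TB are powers of 1000 (KB = 1000 bytes).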
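The unit suffix matters more than it looks: in GNU split, K means 1024 bytes while KB means 1000 bytes. The sketch below, which assumes GNU coreutils and uses a made-up 3000-byte sample file, shows the difference in the resulting piece sizes.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 3000 bytes of zeros as sample input (file name is illustrative)
head -c 3000 /dev/zero > sample.bin

# 1K = 1024 bytes: pieces of 1024, 1024 and 952 bytes
split -b 1K sample.bin bin_

# 1KB = 1000 bytes: three pieces of exactly 1000 bytes
split -b 1KB sample.bin dec_

# Show the byte count of every piece
wc -c bin_* dec_*
```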
Difference between -b and -C:
| Option | Behavior | Applicable Scenario |
|---|---|---|
| -b | Cuts strictly by byte count, possibly truncating in the middle of a line | Binary files, scenarios where line breaks are not a concern |
| -C | Cuts at line boundaries when reaching the size limit, ensuring each line remains complete | Text logs, CSVs, and other scenarios where line integrity is required |
When splitting text files, prefer -C over -b to avoid having a line of data truncated across two files.
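A small side-by-side run makes the difference concrete. This is only a sketch on a three-line sample file (demo.txt and the b_ and c_ prefixes are illustrative), using a 16-byte limit that is deliberately shorter than two lines.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three lines of 11, 12 and 11 bytes (newline included)
printf 'first line\nsecond line\nthird line\n' > demo.txt

# -b 16 cuts after exactly 16 bytes, so b_aa ends in the middle of "second line"
split -b 16 demo.txt b_

# -C 16 stops at the last complete line that fits, so c_aa holds only "first line"
split -C 16 demo.txt c_

# Print each piece with a header to compare where the cuts fall
for f in b_* c_*; do
    echo "== $f"
    cat "$f"
    echo
done
```

Both variants still concatenate back to the same original file; only the cut positions differ.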
Detailed Usage
Split by Line Count
Splitting by line count is the most intuitive method, suitable for processing log files or CSV data.
# Generate a test file with 5000 lines
$ seq 1 5000 > data.txt
# Split into smaller files every 1000 lines
$ split -l 1000 data.txt part_
# View the generated files
$ ls -lh part_*
-rw-r--r-- 1 tutorial tutorial 3.9K May 19 14:30 part_aa
-rw-r--r-- 1 tutorial tutorial 4.9K May 19 14:30 part_ab
-rw-r--r-- 1 tutorial tutorial 4.9K May 19 14:30 part_ac
-rw-r--r-- 1 tutorial tutorial 4.9K May 19 14:30 part_ad
-rw-r--r-- 1 tutorial tutorial 4.9K May 19 14:30 part_ae
After running, 5 files are generated, each containing exactly 1000 lines, with filenames prefixed by part_.
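When the line count is not a multiple of the -l value, the last piece simply holds the remainder. A minimal sketch, assuming a 2500-line input generated for the purpose:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 2500 lines do not divide evenly into pieces of 1000
seq 1 2500 > data.txt
split -l 1000 data.txt part_

# part_aa and part_ab hold 1000 lines each, part_ac holds the remaining 500
wc -l part_*
```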
Split by File Size
Splitting by size is suitable for cutting binary files or controlling the maximum size of individual files.
# Generate a 10MB test file
$ dd if=/dev/urandom of=test.bin bs=1M count=10
# Split every 2MB
$ split -b 2M test.bin chunk_
# View the split results
$ ls -lh chunk_*
-rw-r--r-- 1 tutorial tutorial 2.0M May 19 14:32 chunk_aa
-rw-r--r-- 1 tutorial tutorial 2.0M May 19 14:32 chunk_ab
-rw-r--r-- 1 tutorial tutorial 2.0M May 19 14:32 chunk_ac
-rw-r--r-- 1 tutorial tutorial 2.0M May 19 14:32 chunk_ad
-rw-r--r-- 1 tutorial tutorial 2.0M May 19 14:32 chunk_ae
Specify Number of Splits
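Use -n to split a file into a fixed number of pieces. With a plain number (for example -n 3), split divides the input by bytes into chunks of nearly equal size, so a chunk can end in the middle of a line. The form -n l/N also aims for equal sizes but cuts only at line boundaries, which is usually what you want for text.
# Split the file into 3 chunks of nearly equal size without breaking lines
$ split -n l/3 data.txt equal_
# Check the byte count and line count of each file
$ wc -c equal_*
$ wc -l equal_*
The chunks are balanced by size, not by line count. data.txt is 23893 bytes, so each chunk is close to 7964 bytes. Because the first 1000 lines are shorter than the rest, equal_aa ends up holding more lines than equal_ab or equal_ac.
When the total size is not evenly divisible by the number of chunks, the chunk sizes differ slightly.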
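In GNU split, a plain -n N divides the input by bytes, while -n l/N keeps lines whole. The following sketch, assuming GNU coreutils and the same 5000-line test data, runs both forms so the results can be compared; the bytes_ and lines_ prefixes are illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 5000 > data.txt

# -n 3: three chunks of near-equal byte size; a chunk may end mid-line
split -n 3 data.txt bytes_

# -n l/3: three chunks of near-equal byte size, cut only at line ends
split -n l/3 data.txt lines_

# Compare byte counts and line counts of the two variants
wc -c bytes_* lines_*
wc -l lines_*

# Both variants reassemble to the original file
cat bytes_* | cmp - data.txt && cat lines_* | cmp - data.txt && echo "both reassemble cleanly"
```

Note that neither form balances line counts: pieces are sized in bytes, so a piece made of short lines contains more of them.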
Use Numeric Suffixes
The default alphabetic suffixes (aa, ab...) are not very intuitive. You can use -d to switch to numeric suffixes.
# Use numeric suffixes and specify a suffix length of 3
$ split -d -a 3 -l 1000 data.txt tutorial_
# The generated filenames
$ ls tutorial_*
tutorial_000 tutorial_001 tutorial_002 tutorial_003 tutorial_004
Here, a suffix length of 3 means it supports up to 1000 files (from 000 to 999). When splitting into a large number of small files, you can increase the value of -a.
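The capacity follows directly from the suffix alphabet: N digits give 10^N names, and N letters give 26^N (so the default two letters allow 26 × 26 = 676 files). As a hedged sketch, GNU split also offers --numeric-suffixes=FROM to start counting at a chosen number and --additional-suffix to append a file extension; both are assumed to be available here (reasonably recent GNU coreutils).

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 3000 > data.txt

# Start numbering at 1, use three digits, and append .txt to every piece
split --numeric-suffixes=1 -a 3 --additional-suffix=.txt -l 1000 data.txt tutorial_

# Expected names: tutorial_001.txt, tutorial_002.txt, tutorial_003.txt
ls tutorial_*
```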
Split Text While Maintaining Line Integrity
Use -C to split by size while ensuring that no line is truncated.
# Generate a test log with lines of varying lengths
$ for i in $(seq 1 100); do echo "Line $i: TUTORIAL testing data $(head -c $((RANDOM % 50 + 10)) /dev/urandom | base64 -w 0)"; done > log.txt
# Split by 1KB while maintaining line integrity
$ split -C 1K log.txt log_
# Check if the last line of each file is complete (all end with a newline character)
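$ for f in log_*; do echo "$f: $(tail -c 1 "$f" | xxd | grep -c '0a')"; done
-C packs as many complete lines as fit within the size limit into each file, so a line is not split in half across two files. The one exception is a single line longer than the limit, which split has to break across files.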
Merge Split Files
Split files can be merged back losslessly:
# Merge all split files to restore the original file
$ cat part_* > restored.txt
# Verify that the merged content matches the original file
$ diff data.txt restored.txt && echo "Files are identical, merge successful"
Files are identical, merge successful
When merging, cat concatenates the files in the alphabetical order determined by the shell's wildcard expansion, which exactly matches the order in which split generated the files.
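diff works well for text; for binary data, cmp or a checksum is the more natural check. The following is a minimal sketch on a randomly generated file (test.bin, restored.bin and the chunk_ prefix are illustrative names).

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# About 5 MB of random data as a stand-in for a real binary file
head -c 5000000 /dev/urandom > test.bin

# Split into 1 MiB chunks, then reassemble in glob (alphabetical) order
split -b 1M test.bin chunk_
cat chunk_* > restored.bin

# cmp exits with 0 only if the two files are byte-for-byte identical
cmp test.bin restored.bin && echo "byte-for-byte identical"

# Matching checksums are a second, transfer-friendly way to verify
sha256sum test.bin restored.bin
```

When chunks are uploaded or copied to another machine, publishing the checksum of the original alongside them lets the receiver verify the merged result.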
Common Questions
The split files still occupy the same total disk space as the original file (actually a bit more due to filesystem metadata overhead). Please ensure you have sufficient free disk space.
If the number of split files exceeds what the suffix length given with -a can represent (e.g., -d -a 2 allows a maximum of 100 files), split will exit with an error. This can be resolved by increasing the value of the -a parameter.
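The failure is easy to reproduce on a small scale. In this sketch a one-digit suffix allows only ten names while twelve pieces are needed, so the first split is expected to stop with a non-zero exit status; the file and prefix names are illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 12 > data.txt

# One digit gives only 10 names (0-9), but 12 one-line pieces are needed
if split -d -a 1 -l 1 data.txt few_; then
    result="ok"
else
    result="failed"
fi
echo "with -a 1: $result"

# Two digits give 100 names, which is enough for 12 pieces
split -d -a 2 -l 1 data.txt ok_
ls ok_* | wc -l
```

Pieces written before the error are left on disk, so clean them up before rerunning with a longer suffix.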
If you need to split based on specific content (e.g., by a delimiter), split does not support this. You should use the csplit command instead.
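For comparison, here is a minimal csplit sketch that cuts a file at every line consisting of ---. The sample content, the delimiter, and the section_ prefix are assumptions made for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input with "---" lines acting as delimiters
printf 'alpha\n---\nbeta\n---\ngamma\n' > sections.txt

# -s: do not print byte counts; -z: drop empty pieces; -f: output file prefix
# '/^---$/' starts a new piece at each delimiter line; '{*}' repeats the pattern as often as it matches
csplit -s -z -f section_ sections.txt '/^---$/' '{*}'

# Produces section_00, section_01 and section_02
ls section_*
```

Each delimiter line becomes the first line of the piece that follows it, so section_01 and section_02 begin with ---.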
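When reading data from standard input, split cannot determine the total size in advance, so splitting into a fixed number of size-based chunks with -n N may fail (older GNU versions report that the file size cannot be determined). The round-robin form -n r/N hands out lines in turn and does not need the size, so it works with piped input.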
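For piped input, GNU split's round-robin mode is one way to still get a fixed number of files, since it deals lines out in turn rather than computing chunk sizes. A short sketch, assuming GNU coreutils; the rr_ prefix is illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Deal ten lines from a pipe into 3 files, one line at a time
seq 1 10 | split -n r/3 - rr_

# rr_aa gets lines 1, 4, 7, 10; rr_ab gets 2, 5, 8; rr_ac gets 3, 6, 9
for f in rr_*; do
    echo "$f: $(paste -sd' ' "$f")"
done
```

Because lines are interleaved, concatenating the pieces with cat does not restore the original order; this mode suits parallel processing rather than split-and-merge transfers.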
Related Commands
| Command | Function |
|---|---|
| cat | Concatenate files or display file contents |
| csplit | Split a file based on content (regular expressions) |
| wc | Count lines, words, and bytes in files |
| dd | Convert and copy files, can extract data by block size |
| head | Output the beginning part of a file |
| tail | Output the ending part of a file |