Linux split Command

split is a file-splitting tool from GNU coreutils, installed by default on most Linux systems, that can cut a file by line count, by size, or into a specified number of pieces.

The split command is used to split a large file into several smaller files, making it convenient for transfer, storage, or parallel processing.

The resulting pieces can be merged back into the original file with the cat command.

When you encounter a log file too large to open with an editor, need to upload a large file in chunks, or want to process data in parallel, split is the go-to solution.

The split files are named by default with an alphabetical sequence like xaa, xab, xac, etc.

split does not delete or modify the original file; the splitting operation is safe. However, be mindful of disk space when splitting into many small files.

Command Syntax

The basic syntax for split is as follows:

split [OPTION]... [FILE [PREFIX]]

If no input file is specified, data is read from standard input.

If no output file prefix is specified, the default prefix x is used.
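The two defaults above can be tried with a short pipeline. This is a minimal sketch assuming GNU split; the scratch directory and the piece_ prefix are made up for illustration.

```shell
# Work in a scratch directory so the default x* files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# No input file: split reads the 10 lines from the pipe.
# No prefix either, so the pieces are named xaa, xab, xac.
seq 1 10 | split -l 4

# "-" names standard input explicitly; it is needed when a prefix follows.
seq 1 10 | split -l 4 - piece_

# List everything that was created.
ls
```

The listing should contain three x* files and three piece_* files, the last of each holding the two leftover lines.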

Common options for split are summarized below:

| Option | Function | Example |
| --- | --- | --- |
| -b, --bytes=SIZE | Split file by specified number of bytes | split -b 100M large.log |
| -l, --lines=NUMBER | Split file by specified number of lines | split -l 1000 data.txt |
| -n, --number=CHUNKS | Split into a specified number of smaller files | split -n 5 data.txt |
| -a, --suffix-length=N | Specify suffix length (default is 2) | split -a 3 -l 100 data.txt |
| -d, --numeric-suffixes | Use numeric suffixes instead of alphabetic suffixes | split -d -l 100 data.txt |
| -C, --line-bytes=SIZE | Split by size, but keep each line intact | split -C 10M log.txt |
| --verbose | Display detailed information about the splitting process | split --verbose -b 10M file.bin |

SIZE accepts unit suffixes. In GNU split, K, M, G, and T are powers of 1024 (KiB, MiB, GiB, TiB), while KB, MB, GB, and TB are powers of 1000.
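How a size suffix is interpreted can be checked directly. The sketch below assumes GNU split, where K means 1024 bytes and KB means 1000 bytes; the file name sample.bin and both prefixes are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 3000 bytes of zeros as sample input.
head -c 3000 /dev/zero > sample.bin

# K is 1024 bytes: chunks of 1024, 1024, and 952 bytes.
split -b 1K sample.bin bin_

# KB is 1000 bytes: three chunks of exactly 1000 bytes.
split -b 1KB sample.bin dec_

# Show the byte size of every chunk.
wc -c bin_* dec_*
```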

Difference between -b and -C:

| Option | Behavior | Applicable Scenario |
| --- | --- | --- |
| -b | Cuts strictly by byte count, possibly truncating in the middle of a line | Binary files, scenarios where line breaks are not a concern |
| -C | Cuts at line boundaries when reaching the size limit, ensuring each line remains complete | Text logs, CSVs, and other scenarios where line integrity is required |

When splitting text files, prefer -C over -b to avoid having a line of data truncated across two files.
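The difference can be observed by splitting the same text file both ways and counting the pieces that do not end with a newline. This is a sketch under assumptions: count_broken is a hypothetical helper written for this example, and the file name and prefixes are arbitrary.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Print how many of the given files end without a trailing newline.
count_broken() {
    broken=0
    for f in "$@"; do
        # Command substitution strips a trailing newline, so a non-empty
        # result means the last byte is something other than "\n".
        if [ -n "$(tail -c 1 "$f")" ]; then
            broken=$((broken + 1))
        fi
    done
    echo "$broken"
}

seq 1 300 > nums.txt

split -b 100 nums.txt b_   # strict 100-byte cuts
split -C 100 nums.txt c_   # at most 100 bytes per file, whole lines only

echo "pieces cut mid-line with -b: $(count_broken b_*)"
echo "pieces cut mid-line with -C: $(count_broken c_*)"
```

Both sets of pieces still concatenate back to the original file; only the positions of the cuts differ.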

Detailed Usage

Split by Line Count

Splitting by line count is the most intuitive method, suitable for processing log files or CSV data.

# Generate a test file with 5000 lines
$ seq 1 5000 > data.txt

# Split into smaller files every 1000 lines
$ split -l 1000 data.txt part_

# View the generated files
$ ls -lh part_*
-rw-r--r-- 1 tutorial tutorial 3.9K May 19 14:30 part_aa
-rw-r--r-- 1 tutorial tutorial 4.9K May 19 14:30 part_ab
-rw-r--r-- 1 tutorial tutorial 4.9K May 19 14:30 part_ac
-rw-r--r-- 1 tutorial tutorial 4.9K May 19 14:30 part_ad
-rw-r--r-- 1 tutorial tutorial 4.9K May 19 14:30 part_ae

After running, 5 files are generated, each containing exactly 1000 lines, with filenames prefixed by part_.

Split by File Size

Splitting by size is suitable for cutting binary files or controlling the maximum size of individual files.

# Generate a 10MB test file
$ dd if=/dev/urandom of=test.bin bs=1M count=10

# Split every 2MB
$ split -b 2M test.bin chunk_

# View the split results
$ ls -lh chunk_*
-rw-r--r-- 1 tutorial tutorial 2.0M May 19 14:32 chunk_aa
-rw-r--r-- 1 tutorial tutorial 2.0M May 19 14:32 chunk_ab
-rw-r--r-- 1 tutorial tutorial 2.0M May 19 14:32 chunk_ac
-rw-r--r-- 1 tutorial tutorial 2.0M May 19 14:32 chunk_ad
-rw-r--r-- 1 tutorial tutorial 2.0M May 19 14:32 chunk_ae

Specify Number of Splits

Use -n to split a file into a specified number of chunks. Note that a plain -n N divides the file by byte count, not by line count, so a line can be cut across two chunks. For text files, GNU split offers -n l/N, which produces N chunks of similar size without splitting any line.

# Split the file into 3 chunks of similar size without breaking lines
$ split -n l/3 data.txt equal_

# Check the line count and byte size of each file
$ wc -l -c equal_*

The three files come out close in byte size, but their line counts differ: lines near the start of data.txt have fewer digits, so the first file holds more lines than the later ones.

To give every file the same number of lines regardless of line length, use -n r/N, which deals lines out round-robin. Consecutive lines then end up in different files.
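A small comparison of the byte-based and line-aware forms of -n may help here. The sketch assumes GNU split, where -n 3 divides the input by byte count and -n l/3 only cuts at line boundaries; the bytes_ and lines_ prefixes are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 5000 > data.txt

# Plain -n 3 divides the byte count by three and may cut through a line.
split -n 3 data.txt bytes_

# l/3 also aims for three similar-sized chunks but only cuts after a newline.
split -n l/3 data.txt lines_

# Compare line counts and byte sizes of both sets.
wc -l -c bytes_* lines_*
```

In both cases, concatenating the three pieces in name order reproduces data.txt.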

Use Numeric Suffixes

The default alphabetic suffixes (aa, ab...) are not very intuitive. You can use -d to switch to numeric suffixes.

# Use numeric suffixes and specify a suffix length of 3
$ split -d -a 3 -l 1000 data.txt tutorial_

# The generated filenames
$ ls tutorial_*
tutorial_000 tutorial_001 tutorial_002 tutorial_003 tutorial_004

Here, a suffix length of 3 means it supports up to 1000 files (from 000 to 999). When splitting into a large number of small files, you can increase the value of -a.
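The capacity limit of a suffix can be reproduced on a tiny file. The sketch below assumes GNU split, which stops with an error once an explicitly set suffix length runs out of names; the file name and prefixes are arbitrary.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 25 lines at 2 lines per file requires 13 output files.
seq 1 25 > small.txt

# One numeric digit can name only 10 files, so this run fails partway
# through and split returns a non-zero exit status.
if split -d -a 1 -l 2 small.txt short_ 2>/dev/null; then
    echo "unexpected: one digit was enough"
else
    echo "suffix length 1 is too short for 13 files"
fi

# Two digits allow 100 names, which is enough.
split -d -a 2 -l 2 small.txt wide_
ls wide_* | wc -l
```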

Split Text While Maintaining Line Integrity

Use -C to split by size while ensuring that no line is truncated.

# Generate a test log with lines of varying lengths
$ for i in $(seq 1 100); do echo "Line $i: TUTORIAL testing data $(head -c $((RANDOM % 50 + 10)) /dev/urandom | base64 -w 0)"; done > log.txt

# Split by 1KB while maintaining line integrity
$ split -C 1K log.txt log_

# Check if the last line of each file is complete (all end with a newline character)
$ for f in log_*; do echo "$f: $(tail -c 1 "$f" | xxd -p | grep -c '0a')"; done

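-C ensures that the last line of each split file is complete, preventing a line from being split in half across two files. The one exception is a single line longer than SIZE, which cannot fit in one file and is therefore split.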

Merge Split Files

Split files can be merged back losslessly:

# Merge all split files to restore the original file
$ cat part_* > restored.txt

# Verify that the merged content matches the original file
$ diff data.txt restored.txt && echo "Files are identical, merge successful"
Files are identical, merge successful

When merging, cat concatenates the files in the alphabetical order determined by the shell's wildcard expansion, which exactly matches the order in which split generated the files.
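For binary data, cmp is a convenient alternative to diff when verifying a merge. The following is a minimal sketch with made-up file names, using random bytes as the test input.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 5000 random bytes split into 1 KiB chunks (the last chunk is shorter).
head -c 5000 /dev/urandom > original.bin
split -b 1K original.bin chunk_

# The glob expands in sorted order (chunk_aa, chunk_ab, ...),
# which is the order in which split wrote the chunks.
cat chunk_* > restored.bin

# cmp -s prints nothing and returns 0 only if the files are byte-identical.
if cmp -s original.bin restored.bin; then
    echo "restored file matches the original"
else
    echo "restored file differs from the original"
fi
```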


Common Questions

The split files occupy the same total disk space as the original file (slightly more, due to filesystem metadata overhead), so make sure enough free disk space is available.

With an explicit suffix length, split exits with an error once the number of output files exceeds what the suffix can represent (for example, -d -a 2 allows at most 100 files). Increasing the value of -a resolves this.

If you need to split based on specific content (e.g., by a delimiter), split does not support this. You should use the csplit command instead.

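When reading from standard input, split cannot determine the total size in advance, so the size-based forms of -n (such as -n 5) may be rejected, depending on the coreutils version. The round-robin form -n r/N does not need the total size and works with piped input.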
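As an illustration of chunking piped input, the sketch below uses the round-robin form -n r/N of GNU split, which hands out lines in turn and therefore does not depend on knowing the total size. The rr_ prefix is made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Round-robin over two files: line 1 goes to rr_aa, line 2 to rr_ab,
# line 3 to rr_aa again, and so on.
seq 1 10 | split -n r/2 - rr_

echo "rr_aa:"; cat rr_aa   # the odd numbers
echo "rr_ab:"; cat rr_ab   # the even numbers
```

Because lines are interleaved, cat rr_* does not restore the original order; this mode suits distributing work across workers more than splitting for later reassembly.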


Related Commands

| Command | Function |
| --- | --- |
| cat | Concatenate files or display file contents |
| csplit | Split a file based on content (regular expressions) |
| wc | Count lines, words, and bytes in files |
| dd | Convert and copy files, can extract data by block size |
| head | Output the beginning part of a file |
| tail | Output the ending part of a file |
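For the content-based splitting that csplit handles, a minimal sketch might look like the following. It assumes GNU csplit (the '{*}' repeat count is a GNU extension); the input file, the "---" delimiter, and the rec_ prefix are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: three records separated by lines containing only "---".
printf 'alpha\n---\nbeta\n---\ngamma\n' > records.txt

# -s suppresses the byte counts csplit normally prints, -z drops empty
# pieces, and -f sets the output prefix. '/^---$/' starts a new piece at
# each matching line, and '{*}' repeats the pattern as often as possible.
csplit -s -z -f rec_ records.txt '/^---$/' '{*}'

ls rec_*
```

Each delimiter line becomes the first line of the piece that follows it, so rec_01 and rec_02 start with "---".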