Miller
Like awk/sed/cut for name-indexed data (CSV, JSON, etc.).
Features

Multi-format · Streaming · SQL-like · DSL

Why use Miller?
Miller is like awk, sed, and cut for structured data. It works with CSV, TSV, JSON, JSON Lines, and other name-indexed formats, providing a unified interface for data transformation, filtering, and aggregation, all without leaving the command line.
Multi-format Support
Process CSV, TSV, JSON, JSON Lines, and more with consistent commands. Convert between formats effortlessly.
SQL-like Operations
Use familiar SQL concepts: SELECT, GROUP BY, WHERE, JOIN, and aggregation functions.
Powerful DSL
Miller has its own domain-specific language (the Miller DSL) for complex transformations and custom logic.
Streaming Processing
Process data line-by-line without loading entire files. Perfect for large datasets and pipelines.
Installation
# macOS (Homebrew)
brew install miller
# Ubuntu/Debian
sudo apt install miller
# Arch Linux
sudo pacman -S miller
# Windows (Chocolatey)
choco install miller
# From source (Miller 6 is written in Go)
git clone https://github.com/johnkerl/miller
cd miller
make && sudo make install

# Verify the install
mlr --version

Basic Usage
Working with CSV
# View CSV data
mlr --csv cat data.csv
# Pretty-print CSV as aligned columns
mlr --icsv --opprint cat data.csv
# Convert CSV to JSON
mlr --icsv --ojson cat data.csv > data.json
# Convert CSV to TSV
mlr --icsv --otsv cat data.csv
# Display first N rows
mlr --csv head -n 10 data.csv
Selection & Filtering
# Select specific columns
mlr --csv cut -f name,email,age data.csv
# Filter rows with condition
mlr --csv filter '$age > 30' data.csv
# Multiple conditions
mlr --csv filter '$age > 25 && $status == "active"' data.csv
# Exclude columns
mlr --csv cut -x -f temp_field data.csv
# Filter and select together
mlr --csv filter '$status == "active"' then cut -f name,email data.csv
Transformations
# Rename columns
mlr --csv rename old_name,new_name data.csv
# Add calculated field
mlr --csv put '$total = $price * $quantity' data.csv
# Format strings
mlr --csv put '$name = toupper($name)' data.csv
# Round numbers to two decimal places
mlr --csv put '$value = fmtnum($value, "%.2f")' data.csv
# Conditional assignment
mlr --csv put '$status = $age >= 18 ? "adult" : "minor"' data.csv

Common Patterns
Aggregation and Grouping
# Group by category and count
mlr --csv count -g category data.csv
# Sum values by category
mlr --csv stats1 -a sum -f price -g category data.csv
# Multiple aggregations
mlr --csv stats1 -a count,sum,mean -f price,age -g department data.csv
# Count distinct values
mlr --csv count-distinct -f product_id data.csv
# Get min/max per group
mlr --csv stats1 -a min,max -f rating -g store_id data.csv
Sorting
# Sort by column
mlr --csv sort -f age data.csv
# Sort numeric (reverse)
mlr --csv sort -nr salary data.csv
# Sort by multiple columns
mlr --csv sort -f department -nf age data.csv
# Case-insensitive (case-folded) sort
mlr --csv sort -c name data.csv
Format Conversion
# CSV to JSON
mlr --icsv --ojson cat data.csv
# CSV to TSV
mlr --icsv --otsv cat data.csv
# JSON to CSV
mlr --ijson --ocsv cat data.json
# TSV to JSON
mlr --itsv --ojson cat data.tsv
# Pretty-print JSON (one field per line)
mlr --json --jvstack cat data.json
Joining
# Left outer join: users.csv (left, via -f) with orders.csv (right)
mlr --csv join --ul -f users.csv -l id -r user_id -j id orders.csv
# Inner join (the default)
mlr --csv join -f file1.csv -j id file2.csv
# Anti-join: orders with no matching user
mlr --csv join --np --ur -f users.csv -l id -r user_id -j id orders.csv
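A self-contained join sketch; the temporary files and field names here are invented for illustration:

```shell
tmp=$(mktemp -d)
printf 'id,name\n1,alice\n2,bob\n' > "$tmp/users.csv"
printf 'user_id,amount\n1,9.99\n' > "$tmp/orders.csv"
# Inner join: users (left, via -f) against orders (right, from the argument list)
mlr --icsv --ocsv join -f "$tmp/users.csv" -l id -r user_id -j id "$tmp/orders.csv"
rm -r "$tmp"
```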
Advanced Transforms
# Fill missing fields so all records share the same keys
mlr --csv unsparsify data.csv
# Reservoir-sample 100 records
mlr --csv sample -k 100 data.csv
# Reverse record order
mlr --csv tac data.csv
# Repeat each record
mlr --csv repeat -n 3 data.csv
# Generate a sequence and derive a field from it
mlr --ojson seqgen --start 1 --stop 1000 then put '$squared = $i ** 2'

Advanced Features
Using the Miller DSL
# Maintain running aggregates with out-of-stream (@) variables
mlr --csv put '@sum += $amount; @count += 1; $running_avg = @sum / @count' data.csv
# Define a begin-block variable and filter against it
mlr --csv filter 'begin { @threshold = 100 } $value > @threshold' data.csv
# Use built-in functions
mlr --csv put '$lower = tolower($name); $len = strlen($email)' data.csv
# C-style loops over map-valued variables
mlr --csv put 'for (i = 1; i <= 10; i += 1) { @count[i] = 0 }' data.csv
Streaming
# Process NDJSON / JSON Lines (newline-delimited JSON)
mlr --jsonl cat large-file.ndjson
# Filter while streaming
mlr --jsonl filter '$status == "active"' large-file.ndjson
# Streaming aggregation (memory bounded by the number of groups)
mlr --csv count -g category data.csv
# Process multiple files (each file's header is handled automatically)
mlr --csv cat *.csv
# Specify a custom field separator
mlr --csv --fs ';' cat custom-data.csv
Statistics
# Compute percentiles
mlr --csv stats1 -a p10,p50,p90 -f salary data.csv
# Count distinct values (cardinality)
mlr --csv count-distinct -f product_id data.csv
# Standard deviation and variance
mlr --csv stats1 -a stddev,var -f price data.csv
# Top N values
mlr --csv top -n 10 -f value data.csv
Regex
# Filter with regex
mlr --csv filter '$email =~ ".*@gmail\.com"' data.csv
# Match pattern negation
mlr --csv filter '$email !~ ".*@company\.com"' data.csv
# Extract with regex
mlr --csv put '$domain = sub($email, ".*@", "")' data.csv
# Case-insensitive regex (trailing "i" modifier)
mlr --csv filter '$name =~ "john"i' data.csv

Command Reference
| Command | Description | Example |
|---|---|---|
| cat | Output records | mlr --csv cat data.csv |
| cut | Select columns | mlr --csv cut -f name,email |
| filter | Filter rows | mlr --csv filter '$age > 30' |
| put | Add/modify fields | mlr --csv put '$total = $a * $b' |
| stats1 | Aggregate statistics | mlr --csv stats1 -a sum -f price -g category |
| sort | Sort records | mlr --csv sort -f name |
| join | Join files | mlr --csv join -f left.csv -j id right.csv |
Tips
- Always specify input and output formats explicitly with --icsv, --ojson, etc. for clarity and reliability
- Use then to chain multiple operations: mlr --csv filter ... then cut ... then sort ...
- Miller is excellent for exploratory data analysis; use it to understand your data before loading it into a database
- Miller accepts multiple input files natively (e.g. mlr --csv cat *.csv), handling each file's header for you
- Use --opprint --barred or --omd (markdown) for pretty-printed output