Diffing CSV with SQLite
2022-06-10
Say you want to diff two CSV files, like these:
$ cat 1.csv
id,name,fav_color
1,bill,blue
2,sue,red
3,abe,green
$ cat 2.csv
id,name,fav_color
1,bill,blue
2,sue,red
4,ali,orange
How do you do it?
The most straightforward way would be to use diff
, like this:
$ diff 1.csv 2.csv
4c4
< 3,abe,green
---
> 4,ali,orange
diff
works great, especially in the simplest case.
You can also use SQLite, like this:
$ cat diff_csv.sql
with left_diff as (
select * from left except select * from right
), right_diff as (
select * from right except select * from left
)
select *, 'left' as which from left_diff
union all
select *, 'right' as which from right_diff;
$ sqlite3 \
-cmd ".mode csv" \
-cmd ".headers on" \
-cmd ".import 1.csv left" \
-cmd ".import 2.csv right" < diff_csv.sql
id,name,fav_color,which
3,abe,green,left
4,ali,orange,right
This does a few things:
- Puts SQLite into CSV mode so it can load CSVs
- Turns on headers, so they show on output
- Creates the table
left
and imports the file1.csv
into it - Creates the table
right
and imports the file2.csv
into it - Runs
diff_csv.sql
This might seem a bit more complicated than just running diff
, but it gives you the entire power of SQL.
You can decide which columns you want to diff on, or only diff certain subsets of each file, easily. You might add new rows inline.
You can transform the diff'd results into something else, or summarize them. Maybe you want to diff 3 or more CSV files. No problem.
There's a ton of power you get over doing it all in text, and using grep/sed/awk. You're only limited by your knowledge of SQL.