I have a series of files. I would like to sort each file, select the bottom line from each, and write those lines to a single new file.
My files look like this:
1, 100, 2.5
2, 100, 3.3
3, 100, 5.1
4, 100, 1.2
These files are all named 51_Sur_extracted_data.csv
The files are housed in parent directories as follows:
Track_0001/output_dfsu/51_Sur_extracted_data.csv
Track_0002/output_dfsu/51_Sur_extracted_data.csv
So I would like to sort each csv file on the 3rd column, extract the bottom line, and place it into a new summary file. Basically, the goal is to produce a file that has the maximum value from column three for each of the parent directories. Ideally, I would also like to add a column in the output file that contains the name of the parent directory (for example, Track_0002).
I have the sort figured out, but the rest is proving a bit troublesome for me. For instance, can I sort all of the csv files at the same time and write the output to new files (example 51_Sur_extracted_data_sort.csv)? Then I can grep the last line and pipe it to a new file?
sort -t"," -k3,3g filename
Thanks, K
If you want the last line of any command's output, use tail.
So for you, you would do this:
sort -t"," -k3,3g filename | tail -n1 > newfilename
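As for sorting every file in one go and keeping a sorted copy next to each original, a loop over the paths would do it. This is only a sketch: the function name sort_all is made up for illustration, the glob assumes the Track_*/output_dfsu layout from the question, and the g sort key assumes a sort that supports general-numeric ordering (GNU sort does).

```shell
#!/bin/sh
# sort_all: hypothetical helper name, not an existing command.
# Writes a copy of each data file, sorted on column 3, next to the original
# as 51_Sur_extracted_data_sort.csv.
sort_all() {
    for f in Track_*/output_dfsu/51_Sur_extracted_data.csv
    do
        [ -f "$f" ] || continue        # the glob matched nothing, skip
        # LC_ALL=C so that "." is always read as the decimal point
        LC_ALL=C sort -t, -k3,3g "$f" > "${f%.csv}_sort.csv"
    done
}
```

Run sort_all from the directory that contains the Track_* folders. After that, tail -n 1 on any of the _sort.csv files gives the row with the largest third column.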
I'd do something like this:
for a in */*/*.csv
do
dname="$(basename "$(dirname "$a")")"
echo -e "$dname\t$(sort -t"," -k3,3g "$a" | tail -n 1)"
done
On my test files it returned:
output_abcd 3, 100, 9.1
output_bcde 3, 100, 5.1
output_cdef 3, 100, 5.1
output_abcd 3, 100, 5.1
output_bcde 3, 100, 5.1
output_cdef 3, 100, 5.1
output_abcd 3, 100, 5.1
output_bcde 3, 100, 5.1
output_cdef 1, 100, 7.5
output_abcd 3, 100, 5.1
output_bcde 3, 100, 5.1
output_cdef 3, 100, 5.1
output_abcd 3, 100, 5.1
output_bcde 3, 100, 5.1
output_cdef 3, 100, 5.1
output_abcd 3, 100, 5.1
output_bcde 3, 100, 5.1
output_cdef 3, 100, 5.1
output_abcd 3, 100, 5.1
output_bcde 2, 100, 42.3
output_cdef 3, 100, 5.1
output_abcd 3, 100, 5.1
output_bcde 3, 100, 5.1
output_cdef 3, 100, 5.2
output_abcd 3, 100, 5.1
output_bcde 3, 100, 5.1
output_cdef 3, 100, 5.1
Yeah, my input data is a bit boring. Of course you can sort the end result again, but I'll leave that to your imagination (you already had that step) :)
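Note that the loop above prints the directory directly containing each file, which would be output_dfsu for the layout in the question. If the top-level Track_NNNN name is what should go in the extra column, something along these lines might work. It is a sketch under assumptions: summarize_max is a made-up function name, the glob is taken from the paths in the question, and the label is written as a leading comma-separated column rather than tab-separated.

```shell
#!/bin/sh
# summarize_max: hypothetical helper name, not an existing command.
# For every Track_*/output_dfsu/51_Sur_extracted_data.csv below the current
# directory, prints "<Track dir>, <row with the largest 3rd column>".
summarize_max() {
    for f in Track_*/output_dfsu/51_Sur_extracted_data.csv
    do
        [ -f "$f" ] || continue        # the glob matched nothing, skip
        track=${f%%/*}                 # keep only the part before the first slash
        # LC_ALL=C so that "." is always read as the decimal point
        printf '%s, %s\n' "$track" "$(LC_ALL=C sort -t, -k3,3g "$f" | tail -n 1)"
    done
}
```

Calling summarize_max > summary.csv from the directory that holds the Track_* folders collects one line per track into a single file.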