{"id":20,"date":"2010-07-08T22:42:42","date_gmt":"2010-07-08T20:42:42","guid":{"rendered":"http:\/\/blog.garion.org\/?p=20"},"modified":"2010-07-08T22:45:15","modified_gmt":"2010-07-08T20:45:15","slug":"2-files-finding-lines-that-are-in-only-1-of-those-files","status":"publish","type":"post","link":"https:\/\/joost.vunderink.net\/blog\/2010\/07\/08\/2-files-finding-lines-that-are-in-only-1-of-those-files\/","title":{"rendered":"2 files: finding lines that are in only 1 of those files"},"content":{"rendered":"<p>Today, I learned a trick at work.<\/p>\n<p>I had a list of phone numbers in a file, and wanted to obtain a second list of phone numbers: the ones that were in a table in our database, but not in the given file.<\/p>\n<p>Of course, you can solve this in SQL, for example with a &#8220;NOT IN&#8221; clause. However, with tens of thousands of phone numbers involved, this would probably be rather slow. Besides, I&#8217;m not very good at SQL, and I don&#8217;t like reading the <a href=\"http:\/\/www.postgresql.org\/docs\/\">PostgreSQL documentation<\/a>.<\/p>\n<p>A colleague mentioned a trick with <a href=\"http:\/\/unixhelp.ed.ac.uk\/CGI\/man-cgi?cat\">cat<\/a>, <a href=\"http:\/\/unixhelp.ed.ac.uk\/CGI\/man-cgi?sort\">sort<\/a> and <a href=\"http:\/\/unixhelp.ed.ac.uk\/CGI\/man-cgi?uniq\">uniq<\/a>. If you do <strong>cat filename | sort | uniq -u<\/strong>, only lines that appear exactly once in <em>filename<\/em> will be printed. The sort is needed because uniq only compares adjacent lines, so identical lines have to be grouped together first. So, if you have 2 files, <em>file1<\/em> and <em>file2<\/em>, neither of them having duplicate lines, and you want to know which lines are in <em>file2<\/em> but not in <em>file1<\/em>, you do this:<\/p>\n<p><strong>cat file1 file1 file2 | sort | uniq -u<\/strong><\/p>\n<p><span style=\"color: #000000;\">Because <em>file1<\/em> is listed twice, every line from <em>file1<\/em> appears at least twice in the combined input, so uniq&#8217;s -u option filters out both the lines that appear only in <em>file1<\/em> and the lines that appear in both <em>file1<\/em> and <em>file2<\/em>. What remains are the lines that appear only in <em>file2<\/em>. For example, if <em>file1<\/em> contains 111 and 222, and <em>file2<\/em> contains 222 and 333, the combined input contains 111 twice, 222 three times and 333 once, so only 333 is printed.<\/span><\/p>\n<p><span style=\"color: #000000;\">I made a file containing a list of <em>all<\/em> phone numbers in our database and used it as <em>file2<\/em>, with the given list as <em>file1<\/em>. The trick above then gave me the phone numbers that were in the database but not in the given file. Fast and easy.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today, I learned a trick at work. I had a list of phone numbers in a file, and wanted to obtain a second list of phone numbers: the ones that were in a table in our database, but not in &hellip; <a href=\"https:\/\/joost.vunderink.net\/blog\/2010\/07\/08\/2-files-finding-lines-that-are-in-only-1-of-those-files\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8,7],"tags":[],"class_list":["post-20","post","type-post","status-publish","format-standard","hentry","category-linux","category-unix"],"_links":{"self":[{"href":"https:\/\/joost.vunderink.net\/blog\/wp-json\/wp\/v2\/posts\/20"}],"collection":[{"href":"https:\/\/joost.vunderink.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/joost.vunderink.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/joost.vunderink.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/joost.vunderink.net\/blog\/wp-json\/wp\/v2\/comments?post=20"}],"version-history":[{"count":2,"href":"https:\/\/joost.vunderink.net\/blog\/wp-json\/wp\/v2\/posts\/20\/revisions"}],"predecessor-version":[{"id":22,"href":"https:\/\/joost.vunderink.net\/blog\/wp-json\/wp\/v2\/posts\/20\/revisions\/22"}],"wp:attachment":[{"href":"https:\/\/joost.vunderink.net\/blog\/wp-json\/wp\/v2\/media?parent=20"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/joost.vunderink.net\/blog\/wp-json\/wp\/v2\/categories?post=20"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/joost.vunderink.net\/blog\/wp-json\/wp\/v2\/tags?post=20"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}