How to compare 2 csv files

Ed7 asked Dec 6, '21 | LimitlessTechnology-2700 edited Dec 7, '21

Hello,

I would like to compare the values in 2 CSV files (names and MD5 hashes).

The script I have so far does not do what I am aiming for. I want it to display whether the values match or not, and to export each result to a CSV file.
Could you help me with this, please?

Below is my script


$file1 = import-csv "C:\pathtofile.csv"
$file2 = import-csv "C:\Users\users\Desktop\file.csv"

Compare-Object -ReferenceObject $file1 -DifferenceObject $file2 -Property 'Name', 'Hash' -IncludeEqual

if(Compare-Object $file1.Hash -eq $file2.Hash){
    throw "Properties are not the same!" $file1.Hash $file2.Hash
}
else{
    Compare-Object $file1 $file2 -Property Hash
}

windows-server windows-server-powershell office-excel-itpro office-scripts-excel-dev


Pandas is a popular Python library for creating and manipulating dataframes. You can read CSV files, manipulate them, and export the final CSV file afterwards. Suppose you have two CSV files and want to compare them. How would you do it? In this tutorial, you will learn how to compare two CSV files in Python using pandas, with more than one method.

This section walks through all the steps required to compare two CSV files in Python using pandas. Follow them in order for a better understanding.

Step 1: Create sample CSV files

The first step is to create sample CSV files to work with. I will create two sample dataframes and then export each dataframe to a CSV file. These files are read back and compared in the later steps.

Execute the lines of code below to create the two sample CSV files.

import pandas as pd

# First sample file: four countries
data1 = {"country": ["India", "USA", "UK", "Germany"], "dial_code": [91, 1, 44, 49]}
df1 = pd.DataFrame(data1)
df1.to_csv("data1.csv", index=False)

# Second sample file: the same four countries plus two more
data2 = {"country": ["India", "USA", "UK", "Germany", "Australia", "China"], "dial_code": [91, 1, 44, 49, 61, 86]}
df2 = pd.DataFrame(data2)
df2.to_csv("data2.csv", index=False)

print(df1, "\n")
print(df2)

Output

[Image: the two sample dataframes created for the comparison]

Step 2: Read the CSV files

The second step is to read the CSV files you just created. You can read a CSV file using the pandas read_csv() function. Just pass it the filename of the CSV file. It loads the CSV data into a dataframe that you can then manipulate.

Run the lines of code below to read your CSV files.

import pandas as pd

df1 = pd.read_csv("data1.csv")
df2 = pd.read_csv("data2.csv")

print(df1, "\n")
print(df2, "\n")

Output

   country  dial_code
0    India         91
1      USA          1
2       UK         44
3  Germany         49

     country  dial_code
0      India         91
1        USA          1
2         UK         44
3    Germany         49
4  Australia         61
5      China         86
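
One thing to keep in mind when reading files for a comparison is that read_csv() infers column types by default, which can change values such as codes or identifiers that start with zeros. The sketch below illustrates this with hypothetical in-memory data (io.StringIO stands in for a file on disk, and the zero-padded dial codes are made up for the example). Passing dtype=str is one way to keep the values exactly as written.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "country,dial_code\nIndia,091\nUSA,001\n"

# Default behaviour: dial_code is parsed as an integer, so leading zeros are lost.
inferred = pd.read_csv(io.StringIO(csv_text))

# dtype=str keeps every column as text, exactly as written in the file.
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(inferred["dial_code"].tolist())
print(as_text["dial_code"].tolist())
```

Reading both files as text is usually the safer choice when the goal is to check whether values such as hashes are identical.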

Step 3: Compare the two CSV files in Python using pandas

Now that the CSV files have been read, let's compare them. There is more than one way to compare two CSV files in Python.

Pandas provides an isin() method that checks, element by element, whether each value is contained in another collection of values, and returns True or False accordingly. The result is a boolean mask, which you can then use to select the matching rows with df[mask].

Execute the lines of code below to compare the two CSV files.

# Method 1: isin()
import pandas as pd

df1 = pd.read_csv("data1.csv")
df2 = pd.read_csv("data2.csv")

# Convert each row to a tuple, then keep the rows of df1 that also occur in df2
c_result = df1[df1.apply(tuple, axis=1).isin(df2.apply(tuple, axis=1))]
print(c_result)

Output

[Image: comparing two CSV files using the isin() method]

Here I am also using the apply() method with tuple along axis 1, which converts every row of each dataframe into a tuple. This lets isin() compare whole rows rather than individual cells.
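
The same idea can be turned around to find the rows that do not match. The following is a minimal sketch, not part of the original tutorial, that rebuilds the Step 1 sample data in memory and negates the mask with ~ to list the rows that appear only in the second dataframe. The variable names rows1, rows2, in_both and only_in_df2 are illustrative.

```python
import pandas as pd

# Same sample data as in Step 1, built in memory so the sketch is self-contained.
df1 = pd.DataFrame({"country": ["India", "USA", "UK", "Germany"],
                    "dial_code": [91, 1, 44, 49]})
df2 = pd.DataFrame({"country": ["India", "USA", "UK", "Germany", "Australia", "China"],
                    "dial_code": [91, 1, 44, 49, 61, 86]})

# Turn every row into a tuple so whole rows can be compared.
rows1 = df1.apply(tuple, axis=1)
rows2 = df2.apply(tuple, axis=1)

# Boolean mask: True where a row of df2 also appears in df1.
in_both = rows2.isin(rows1)

# ~ negates the mask, leaving the rows that exist only in df2.
only_in_df2 = df2[~in_both]
print(only_in_df2)
```

With the sample data, only the Australia and China rows remain, since the other four rows exist in both dataframes.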

Pandas also has a merge() function that is useful for comparing the two CSV files. It can perform inner, outer, left, or right joins on columns. Pass the two dataframes you want to compare as the first two arguments. With the default settings, merge() performs an inner join on the columns the two dataframes have in common, so the returned dataframe contains only the rows present in both files.

Run the lines of code below to compare the CSV files.

# Method 2: merge()
import pandas as pd

df1 = pd.read_csv("data1.csv")
df2 = pd.read_csv("data2.csv")

c_result_m = pd.merge(df1, df2)
print(c_result_m)

Output

[Image: comparing two CSV files using the merge() method]
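
An inner join only shows the rows the files share. If you also want to see which rows are missing from one side, one option is an outer join with merge()'s indicator parameter. This is a sketch under the assumption that the Step 1 sample data is used. The name report is illustrative.

```python
import pandas as pd

# Same sample data as in Step 1, built in memory so the sketch is self-contained.
df1 = pd.DataFrame({"country": ["India", "USA", "UK", "Germany"],
                    "dial_code": [91, 1, 44, 49]})
df2 = pd.DataFrame({"country": ["India", "USA", "UK", "Germany", "Australia", "China"],
                    "dial_code": [91, 1, 44, 49, 61, 86]})

# how="outer" keeps rows from both sides; indicator=True adds a "_merge" column
# whose value is "both", "left_only" or "right_only" for each row.
report = pd.merge(df1, df2, how="outer", indicator=True)
print(report)

# Count how many rows fall into each category.
print(report["_merge"].value_counts())
```

Here "left_only" would mark rows found only in the first dataframe and "right_only" rows found only in the second, so the report shows matches and differences in a single table.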

Conclusion

Pandas is a capable Python package for manipulating large datasets. If you have CSV files, you can compare them using either of the methods above.
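
To relate this back to the original question, here is a rough sketch of how the merge() approach might be applied to two files with Name and Hash columns. Only those two column names come from the question. The file contents, the suffixes, the Match column and the variable names are hypothetical, and io.StringIO stands in for the real files.

```python
import io
import pandas as pd

# Hypothetical file contents; only the column names Name and Hash come from the question.
csv1 = "Name,Hash\na.txt,aa11\nb.txt,bb22\nc.txt,cc33\n"
csv2 = "Name,Hash\na.txt,aa11\nb.txt,ff99\nd.txt,dd44\n"

# Read everything as text so hash values are compared exactly as written.
file1 = pd.read_csv(io.StringIO(csv1), dtype=str)
file2 = pd.read_csv(io.StringIO(csv2), dtype=str)

# Line the two files up by Name; a name missing on one side gets NaN for that hash.
merged = pd.merge(file1, file2, on="Name", how="outer",
                  suffixes=("_file1", "_file2"))

# NaN never equals anything, so names found in only one file count as a mismatch.
merged["Match"] = merged["Hash_file1"] == merged["Hash_file2"]

report = merged[["Name", "Hash_file1", "Hash_file2", "Match"]]
print(report)

# Split into two reports; to_csv() without a path returns the CSV text.
matches_csv = report[report["Match"]].to_csv(index=False)
mismatches_csv = report[~report["Match"]].to_csv(index=False)
```

Passing a filename to to_csv() instead would write each report to disk, which covers the "export as CSV" part of the question.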

I hope you liked this tutorial. If you have any suggestions or would like other methods to be included in it, you can contact us.