A simple Python script to deduplicate a mailbox (mbox format).
#!/usr/bin/env python
# Created by Evaggelos Balaskas on Thu Jul 29 21:22:41 EEST 2010
# Remove duplicate mails from mbox using message-id

import sys
import mailbox

if len(sys.argv) == 2:
    mid = []
    for message in mailbox.mbox(sys.argv[1]):
        s = message['message-id']
        if s not in mid:
            mid.append(s)
            print message
else:
    print "Usage should be: " + sys.argv[0] + " mbox > new.mbox"
You can also take a look at my other Python script: How to remove specific mails from a mbox by subject

Friday, July 30, 2010 - 05:02:32
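Using a set (mid = set(); note that a bare {} creates an empty dict) instead of a list would give better performance, because a membership test on a set does not have to scan every stored ID. The required changes are very few.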
I can’t test it right now because I need sleep, but I think something like:
uniq_message_ids = {m['message-id'] for m in mailbox.mbox(sys.argv[1])}
would work
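To make the set idea concrete, here is a minimal Python 3 sketch of how the deduplication might look. It is not the original script: the function name dedupe_mbox and the choice to write into a second mbox file (rather than printing to stdout) are assumptions made for illustration, and unlike the script above it keeps every message that has no Message-ID at all.

```python
import mailbox


def dedupe_mbox(src_path, dst_path):
    """Copy src_path to dst_path, keeping the first message per Message-ID.

    Hypothetical helper for illustration. Returns the number of messages kept.
    """
    seen = set()  # Message-IDs already written; set lookups avoid a linear scan
    kept = 0
    src = mailbox.mbox(src_path, create=False)  # fail if the source is missing
    dst = mailbox.mbox(dst_path)                # created if it does not exist
    try:
        for message in src:
            mid = message['message-id']
            if mid is not None:
                # str() guards against header objects that are not hashable
                key = str(mid).strip()
                if key in seen:
                    continue  # duplicate: skip it
                seen.add(key)
            # Assumption: messages without a Message-ID are always kept
            dst.add(message)
            kept += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return kept
```

Calling dedupe_mbox("old.mbox", "new.mbox") would then play the role of "script.py old.mbox > new.mbox"; the two file names are placeholders.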