Wednesday, November 16, 2016

Python - Decode ISO/UTF Character Encodings

Given a string of the format:

=?iso-2022-jp?B?GyRCO2QkTzYyJG0kNyQkSjg7ek5zJEckOSEqGyhCIBskQjlsMjshKhsoQg?=

This should decode it properly:

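import re
import base64

# RFC 2047 encoded word: =?charset?encoding?payload?=
uniString = "=?iso-2022-jp?B?GyRCO2QkTzYyJG0kNyQkSjg7ek5zJEckOSEqGyhCIBskQjlsMjshKhsoQg?="
# The charset sits between the leading "=?" and the next "?".
charset = re.search(r"^=\?(.*?)\?", uniString).group(1)
# The Base64 payload follows "?B?" and stops at the closing "?" or at any "=" padding.
b64String = re.search(r"\?[Bb]\?(.*?)[?=]", uniString).group(1)
print(b64String)
# Restore stripped "=" padding so the length is a multiple of 4.
missing_padding = len(b64String) % 4
if missing_padding != 0:
    b64String += "=" * (4 - missing_padding)
dec = base64.b64decode(b64String)
print(dec.decode(charset))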

It decodes into "私は恐ろしい文字列です！ 轟音！", which, according to Google Translate, means "I am a fearsome string! Roar!"
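
Strings of this shape are RFC 2047 "encoded words" as used in mail headers, so the standard library's email.header module may be able to do the same job without hand-written regexes. The following is a minimal sketch, assuming Python 3; the helper name decode_mime_header is made up for illustration, and treating unlabelled byte chunks as ASCII is an assumption, not something the format guarantees.

```python
from email.header import decode_header


def decode_mime_header(value):
    """Decode RFC 2047 encoded words in value (hypothetical helper name)."""
    parts = []
    # decode_header yields (chunk, charset) pairs; charset is None for plain text.
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Assumption: chunks without a declared charset are plain ASCII.
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


uniString = "=?iso-2022-jp?B?GyRCO2QkTzYyJG0kNyQkSjg7ek5zJEckOSEqGyhCIBskQjlsMjshKhsoQg?="
print(decode_mime_header(uniString))
```

Compared with the regex approach, decode_header also understands the Q (quoted-printable) encoding and headers that contain several encoded words, which may matter if the input comes from real mail rather than a single known string.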
