How do I read/convert an HDF file containing a pandas dataframe written in Python 2.7 in Python 3.6?

by irenemeanspeace   Last Updated July 13, 2019 01:26 AM

I wrote a dataframe in Python 2.7 but now I need to open it in Python 3.6, and vice versa (I want to compare two dataframes written in both versions).

If I open a Python2.7-generated HDF file using pandas in Python 3.6, this is the error produced: UnicodeDecodeError: 'ascii' codec can't decode byte 0xde in position 1: ordinal not in range(128)

If I open a Python3.6-generated HDF file using pandas in Python 2.7, this is the error: ValueError: unsupported pickle protocol: 4

For both cases I simply saved the file by df.to_hdf.

Does anybody have a clue how to go about this?



Answers 2


Not exactly a solution but more of a workaround.

I simply read each file in the Python version that wrote it and saved it as a CSV file, which can then be read in any version of Python.
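A minimal sketch of that round trip, assuming a small sample frame in place of the one returned by pd.read_hdf and a hypothetical temporary path; in practice the write step runs in the interpreter that created the HDF file and the read step in the other one:

```python
import os
import tempfile

import pandas as pd

# Stand-in for the frame that pd.read_hdf(...) would return in the
# interpreter that originally wrote the HDF file (sample data, assumed).
df = pd.DataFrame({"name": ["a", "b"], "value": [1.5, 2.5]})

csv_path = os.path.join(tempfile.mkdtemp(), "df.csv")  # hypothetical path

# Step 1 (writing interpreter): dump to CSV with an explicit encoding.
df.to_csv(csv_path, encoding="utf-8")

# Step 2 (other interpreter): read it back; column 0 holds the saved index.
df_back = pd.read_csv(csv_path, index_col=0, encoding="utf-8")
```

Passing the encoding explicitly on both sides avoids depending on each interpreter's default.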

irenemeanspeace
March 06, 2018 16:14 PM

Converting to CSV (proposed by @irenemeanspeace) will not work if some columns of the original dataframe contain lists or dicts.
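To illustrate the problem, here is a small sketch with a made-up column of lists (the column name and data are assumptions, not from the original dataframe). After a CSV round trip the cells come back as plain strings rather than lists:

```python
import io

import pandas as pd

# Hypothetical frame with a column of Python lists.
df = pd.DataFrame({"tags": [[1, 2], [3]]})

# Round-trip through CSV using an in-memory buffer.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
back = pd.read_csv(buf)

print(type(df["tags"][0]))    # the original cell is a list
print(type(back["tags"][0]))  # the cell read back from CSV is a str
```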

I have found a workaround that can deal with simple objects like lists and dicts: write the dataframe to JSON in py2.7 and then read that JSON file in py3.6.

# Run this in py2.7
###################################################
import pandas as pd
# read the dataframe in py2.7
path = 'df.hdf5' # path to the dataframe saved in py2.7
df = pd.read_hdf(path)
# to_json is a DataFrame method (there is no pd.to_json); passing a path
# lets pandas gzip the output itself (compression= needs pandas >= 0.21)
df.to_json('df.json.gz', compression='gzip')


###################################################
# Now run in py3.6
###################################################
import pandas as pd
# read the gzipped JSON back directly; no manual open() is needed
df = pd.read_json('df.json.gz', compression='gzip')

This is a more general solution.
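As a rough check that lists and dicts survive the JSON route, here is a self-contained sketch that runs in a single interpreter. The sample frame and the temporary path are assumptions for illustration only:

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame with list and dict cells.
df = pd.DataFrame({
    "tags": [[1, 2], [3]],
    "meta": [{"k": 1}, {"k": 2}],
})

json_path = os.path.join(tempfile.mkdtemp(), "df.json.gz")  # hypothetical path

# Write gzipped JSON, then read it back the same way.
df.to_json(json_path, compression="gzip")
back = pd.read_json(json_path, compression="gzip")

print(type(back["tags"][0]))  # list cells come back as lists
print(type(back["meta"][0]))  # dict cells come back as dicts
```

Note that JSON only carries basic types, so custom Python objects in a column would still need their own conversion.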

Temak
July 13, 2019 01:23 AM
