Is there a faster way to store a big dictionary, than pickle or regular Python file? [closed]
I want to store a dictionary which only contains data in the following format:
{
"key1" : True,
"key2" : True,
.....
}
In other words, just a quick way to check if a key is valid or not. I can do this by storing a dict called foo
in a file called bar.py
, and then in my other modules, I can import it as follows:
from bar import foo
Or, I can save it in a pickle file called bar.pickle
, and import it at the top of the file as follows:
import pickle

with open('bar.pickle', 'rb') as f:
    foo = pickle.load(f)
Which would be the ideal and faster way to do this?
python pickle
closed as primarily opinion-based by Engineero, Patrick Artner, eyllanesc, Jean-François Fabre, Paul Roub Jan 2 at 22:58
Many good questions generate some degree of opinion based on expert experience, but answers to this question will tend to be almost entirely based on opinions, rather than facts, references, or specific expertise. If this question can be reworded to fit the rules in the help center, please edit the question.
pickle is just a binary format which can help you save a dict more efficiently. Python definitely needs to spend a little more effort to read/parse a binary format than a plain-text format.
– Windchill
Jan 2 at 19:24
there are lots of alternatives here, what's best/ideal/fastest almost certainly depends on much more information than you've given. i.e. how often does this data change, who changes it, how do they change it, do you want to keep track of these changes, do you care about portability to different systems/versions of Python… and many more
– Sam Mason
Jan 2 at 19:25
You could also consider dumping it as a json, since the format is essentially the same, unless your dictionary has other Python objects inside, that is.
– Idlehands
Jan 2 at 19:31
If you don't have any need to distinguish between False and varieties of N/A, you can just create a list of the keys with True values and write that to file. Then instead of doing if my_dict[key], you can do if key in my_list.
– Acccumulation
Jan 2 at 19:35
@Acccumulation but that would be trading an O(1) membership test for an O(N) membership test.
– juanpa.arrivillaga
Jan 2 at 19:38
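As a sketch of the variant the last two comments converge on (filenames are hypothetical): storing just the valid keys as a set, rather than a list, keeps the O(1) membership test while dropping the redundant True values.

```python
import pickle

# Keep only the keys whose value is True. A set gives an O(1) membership
# test, unlike the O(N) scan a list would require.
valid_keys = {"key1", "key2"}

with open("bar_keys.pickle", "wb") as f:
    pickle.dump(valid_keys, f)

with open("bar_keys.pickle", "rb") as f:
    loaded = pickle.load(f)

print("key1" in loaded)  # True
print("nope" in loaded)  # False
```

pickle serializes sets natively, so this round-trips without any conversion.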
asked Jan 2 at 19:21 by darkhorse, edited Jan 2 at 20:16 by spectras
2 Answers
Python File
Using a Python file makes caching easy: if you "import" it multiple times, it only has to be parsed once. However, Python syntax is complicated, so the parser that loads the file may not be well optimized for the limited complexity of the data you're saving (unless you're including arbitrary Python objects and code). It's easy to view, edit, and use, but it's not easy to transport.
EDIT: to clarify, raw Python files are easy for a human to modify, but very hard for a computer to edit. If your code edits the data and you ever want that to be reflected in the dictionary, you're pretty much up a creek: instead, use one of the methods below.
Pickle File
If you use a pickle file, you'd either re-load the file each time you use it, or need some management code to cache the file after reading it the first time. Like raw Python files, pickle files can store almost arbitrary Python objects, so the loader might not be optimized for your particular data types. They're hard for a regular human to view or edit, they're only readable from Python, and you might encounter portability issues if you move the data around. You also need to consider pickle's security implications: loading a pickle file can execute arbitrary code, so it should only be done with trusted files.
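The "management code to cache the file" can be as little as a module-level cache around the load; a minimal sketch (the file name is hypothetical):

```python
import pickle

_cache = None

def load_foo(path="bar.pickle"):
    """Load the pickled dict from disk once, then reuse it on later calls."""
    global _cache
    if _cache is None:
        with open(path, "rb") as f:
            _cache = pickle.load(f)
    return _cache
```

Every caller after the first gets the already-loaded dict back without touching the disk, which is roughly what importing a Python module gives you for free.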
JSON File
If all you're storing is simple objects (dictionaries, lists, strings, booleans, numbers), consider using the JSON file format. Python has a built-in json module that's just as easy to use as pickle, so there's no added complexity. These files are easy to store, view, edit, and compress (if desired), and look almost exactly like a Python dictionary. JSON is highly portable (most common languages support reading/writing it these days), and if you need to improve file loading speed, the ujson module is a faster, drop-in replacement for the standard json module. Since the JSON format is fairly restricted, I'd expect its parsers and writers to be quite a bit faster than the regular Python or pickle parsers (especially using ujson).
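A minimal round trip with the built-in json module, mirroring the pickle usage in the question (the file name is hypothetical). Note that json.dump writes text, so the file is opened in text mode rather than binary:

```python
import json

foo = {"key1": True, "key2": True}

# json.dump produces str output, so open in text mode ('w'), not 'wb'.
with open("bar.json", "w") as f:
    json.dump(foo, f)

with open("bar.json") as f:
    loaded = json.load(f)

print(loaded == foo)  # True
```

The on-disk file is also readable by any other language with a JSON parser, which is the portability advantage over pickle.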
Yes, yes and yes, please use a common format such as json. It's not just faster, it's also safer, portable, human-readable, re-usable…
– spectras
Jan 2 at 19:58
I wouldn't consider json exactly human-readable (beyond the literal definition)... after all, that's why hjson exists. But I do agree on using a common format like it.
– Idlehands
Jan 2 at 20:06
YMMV, it's reasonably human-readable while reasonably fast. Requirements will certainly shift the cursor. I wouldn't use json for any file intended to be written by a human. Nor would I use it for performance-critical serialization. Probably I'd use YAML for the former and FlatBuffers for the latter, or something along those lines. JSON is a good in-the-middle general-purpose data format :)
– spectras
Jan 2 at 20:12
Agreed, YAML is a great format for human readability, and custom formats like FlatBuffers and Protobuf are great for performance and serialization size. However, support for YAML is a bit more spotty (even Python needs a third party module to read it), and custom binary formats often involve more setup (e.g. defining and compiling the data structure in advance). JSON is a great compromise, being extremely easy to use in general, and (if formatted properly) easy to read/write. It's a good next step for people who don't want to get swamped in all the possible serialization methods.
– scnerd
Jan 2 at 21:35
To add to @scnerd's comment, here are the timings in IPython for different load situations.
Here we create a dictionary and write it to 3 formats:
import random
import json
import pickle

letters = 'abcdefghijklmnopqrstuvwxyz'
d = {''.join(random.choices(letters, k=6)): random.choice([True, False])
     for _ in range(100000)}

# write a python file
with open('mydict.py', 'w') as fp:
    fp.write('d = {\n')
    for k, v in d.items():
        fp.write(f"'{k}': {v},\n")
    fp.write('None: False}')

# write a pickle file
with open('mydict.pickle', 'wb') as fp:
    pickle.dump(d, fp)

# write a json file (text mode, since json.dump writes str)
with open('mydict.json', 'w') as fp:
    json.dump(d, fp)
Python file:
# on first import the file will be cached.
%%timeit -n1 -r1
from mydict import d
644 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
# after creating the __pycache__ folder, import is MUCH faster
%%timeit
from mydict import d
1.37 µs ± 54.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
pickle file:
%%timeit
with open('mydict.pickle', 'rb') as fp:
    pickle.load(fp)
52.4 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
json file:
%%timeit
with open('mydict.json') as fp:
    json.load(fp)
81.3 ms ± 2.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# here is the same test with ujson
import ujson
%%timeit
with open('mydict.json') as fp:
    ujson.load(fp)
51.2 ms ± 304 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Since I did specifically mention ujson to improve json speed, could you add a timing with that as well?
– scnerd
Jan 2 at 21:31
Also, for the Python file, you don't need the None: False at the end; at least in Python 3.6+ (probably any 3.x, not sure), it's valid syntax to have a trailing comma at the end of a dictionary literal.
– scnerd
Jan 2 at 21:32
Oh! Did not know that.
– James
Jan 2 at 21:35
Added the ujson test. It looks comparable to using pickle.
– James
Jan 2 at 21:41
answered Jan 2 at 19:35 by scnerd, edited Jan 2 at 21:45
Yes, yes and yes, please use a common format such as json. It's not just faster, it's also safer, portable, human-readable, re-usable…
– spectras
Jan 2 at 19:58
I wouldn't considerjson
exactly human-readable (beyond the literal definition)... after all that's why hjson exists. But I do agree on using a common format like itself.
– Idlehands
Jan 2 at 20:06
YMMV, it's reasonably human-readable while reasonably fast. Requirements will certainly shift the cursor. I wouldn't use json for any file intended to be written by a human. Nor would I use it for performance-critical serialization. Probably I'd use YAML for the former and FlatBuffers for the latter, or something along those lines. JSON is a good in-the-middle general-purpose data format :)
– spectras
Jan 2 at 20:12
Agreed, YAML is a great format for human readability, and custom formats like FlatBuffers and Protobuf are great for performance and serialization size. However, support for YAML is a bit more spotty (even Python needs a third party module to read it), and custom binary formats often involve more setup (e.g. defining and compiling the data structure in advance). JSON is a great compromise, being extremely easy to use in general, and (if formatted properly) easy to read/write. It's a good next step for people who don't want to get swamped in all the possible serialization methods.
– scnerd
Jan 2 at 21:35
add a comment |
Yes, yes and yes, please use a common format such as json. It's not just faster, it's also safer, portable, human-readable, re-usable…
– spectras
Jan 2 at 19:58
I wouldn't considerjson
exactly human-readable (beyond the literal definition)... after all that's why hjson exists. But I do agree on using a common format like itself.
– Idlehands
Jan 2 at 20:06
YMMV, it's reasonably human-readable while reasonably fast. Requirements will certainly shift the cursor. I wouldn't use json for any file intended to be written by a human. Nor would I use it for performance-critical serialization. Probably I'd use YAML for the former and FlatBuffers for the latter, or something along those lines. JSON is a good in-the-middle general-purpose data format :)
– spectras
Jan 2 at 20:12
Agreed, YAML is a great format for human readability, and custom formats like FlatBuffers and Protobuf are great for performance and serialization size. However, support for YAML is a bit more spotty (even Python needs a third party module to read it), and custom binary formats often involve more setup (e.g. defining and compiling the data structure in advance). JSON is a great compromise, being extremely easy to use in general, and (if formatted properly) easy to read/write. It's a good next step for people who don't want to get swamped in all the possible serialization methods.
– scnerd
Jan 2 at 21:35
Yes, yes and yes, please use a common format such as json. It's not just faster, it's also safer, portable, human-readable, re-usable…
– spectras
Jan 2 at 19:58
Yes, yes and yes, please use a common format such as json. It's not just faster, it's also safer, portable, human-readable, re-usable…
– spectras
Jan 2 at 19:58
I wouldn't consider
json
exactly human-readable (beyond the literal definition)... after all that's why hjson exists. But I do agree on using a common format like itself.– Idlehands
Jan 2 at 20:06
I wouldn't consider
json
exactly human-readable (beyond the literal definition)... after all that's why hjson exists. But I do agree on using a common format like itself.– Idlehands
Jan 2 at 20:06
YMMV, it's reasonably human-readable while reasonably fast. Requirements will certainly shift the cursor. I wouldn't use json for any file intended to be written by a human. Nor would I use it for performance-critical serialization. Probably I'd use YAML for the former and FlatBuffers for the latter, or something along those lines. JSON is a good in-the-middle general-purpose data format :)
– spectras
Jan 2 at 20:12
YMMV, it's reasonably human-readable while reasonably fast. Requirements will certainly shift the cursor. I wouldn't use json for any file intended to be written by a human. Nor would I use it for performance-critical serialization. Probably I'd use YAML for the former and FlatBuffers for the latter, or something along those lines. JSON is a good in-the-middle general-purpose data format :)
– spectras
Jan 2 at 20:12
Agreed, YAML is a great format for human readability, and custom formats like FlatBuffers and Protobuf are great for performance and serialization size. However, support for YAML is a bit more spotty (even Python needs a third party module to read it), and custom binary formats often involve more setup (e.g. defining and compiling the data structure in advance). JSON is a great compromise, being extremely easy to use in general, and (if formatted properly) easy to read/write. It's a good next step for people who don't want to get swamped in all the possible serialization methods.
– scnerd
Jan 2 at 21:35
To add to @scnerd's comment, here are the timings in IPython for different load situations.
Here we create a dictionary and write it to 3 formats:
import random
import json
import pickle
letters = 'abcdefghijklmnopqrstuvwxyz'
d = {''.join(random.choices(letters, k=6)): random.choice([True, False])
     for _ in range(100000)}
# write a python file
with open('mydict.py', 'w') as fp:
    fp.write('d = {\n')
    for k, v in d.items():
        fp.write(f"'{k}': {v},\n")
    fp.write('None: False}')
# write a pickle file
with open('mydict.pickle', 'wb') as fp:
    pickle.dump(d, fp)
# write a json file (json.dump writes str, so the file must be opened in text mode)
with open('mydict.json', 'w') as fp:
    json.dump(d, fp)
Python file:
# on first import the file will be cached.
%%timeit -n1 -r1
from mydict import d
644 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
# after creating the __pycache__ folder, import is MUCH faster
%%timeit
from mydict import d
1.37 µs ± 54.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
pickle file:
%%timeit
with open('mydict.pickle', 'rb') as fp:
    pickle.load(fp)
52.4 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
json file:
%%timeit
with open('mydict.json', 'rb') as fp:
    json.load(fp)
81.3 ms ± 2.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# here is the same test with ujson
import ujson
%%timeit
with open('mydict.json', 'rb') as fp:
    ujson.load(fp)
51.2 ms ± 304 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
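Outside IPython, the same comparison can be sketched with the standard timeit module (my addition, not part of the original answer; absolute numbers will vary by machine):

```python
import json
import pickle
import random
import timeit

# recreate the dictionary and the two files from the answer above
letters = 'abcdefghijklmnopqrstuvwxyz'
d = {''.join(random.choices(letters, k=6)): random.choice([True, False])
     for _ in range(100000)}

with open('mydict.pickle', 'wb') as fp:
    pickle.dump(d, fp)
with open('mydict.json', 'w') as fp:
    json.dump(d, fp)

def load_pickle():
    with open('mydict.pickle', 'rb') as fp:
        return pickle.load(fp)

def load_json():
    with open('mydict.json') as fp:
        return json.load(fp)

# average time per load over 10 runs
for name, fn in [('pickle', load_pickle), ('json', load_json)]:
    per_load = timeit.timeit(fn, number=10) / 10
    print(f'{name}: {per_load * 1000:.1f} ms per load')
```

Both loaders return a dict equal to the original, so this also doubles as a round-trip sanity check.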
Since I did specifically mention ujson to improve json speed, could you add a timing with that as well?
– scnerd
Jan 2 at 21:31
Also, for the Python file, you don't need the None: False at the end; at least in Python 3.6+ (probably any 3.X, not sure), it's valid syntax to have an extra comma at the end of a dictionary literal.
– scnerd
Jan 2 at 21:32
Oh! Did not know that.
– James
Jan 2 at 21:35
Added the ujson test. It looks comparable to using pickle.
– James
Jan 2 at 21:41
answered Jan 2 at 20:02, edited Jan 2 at 21:40
– James
pickle is just a binary format which can help you to save a dict more efficiently. Python definitely needs to spend a little bit more effort to read/parse a binary format than a plain text format.
– Windchill
Jan 2 at 19:24
there are lots of alternatives here, what's best/ideal/fastest almost certainly depends on much more information than you've given. i.e. how often does this data change, who changes it, how do they change it, do you want to keep track of these changes, do you care about portability to different systems/versions of Python… and many more
– Sam Mason
Jan 2 at 19:25
You could also consider dumping it as a json since the format is essentially the same, unless your dictionary has other Python objects inside, that is.
– Idlehands
Jan 2 at 19:31
If you don't have any need to distinguish between False and varieties of N/A, you can just create a list of the keys with True values and write that to file. Then instead of doing if my_dict[key], you can do if key in my_list.
– Acccumulation
Jan 2 at 19:35
@Acccumulation but that would be trading an O(1) membership test for an O(N) membership test.
– juanpa.arrivillaga
Jan 2 at 19:38
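Following up on the last two comments: if only the keys matter, a set (rather than a list) keeps the O(1) membership test and pickles just as easily. A minimal sketch of that idea (my addition, not from the thread; the key names are made up):

```python
import pickle

# keep only the keys whose value is True
d = {'key1': True, 'key2': True, 'key3': False}
valid_keys = {k for k, v in d.items() if v}

# a set pickles like any other built-in container
with open('valid_keys.pickle', 'wb') as fp:
    pickle.dump(valid_keys, fp)

with open('valid_keys.pickle', 'rb') as fp:
    loaded = pickle.load(fp)

# membership test is O(1), unlike `key in my_list` which is O(N)
print('key1' in loaded)  # True
print('key3' in loaded)  # False
```

This keeps the file smaller too, since the True values are implied by membership.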