Is there a faster way to store a big dictionary than pickle or a regular Python file? [closed]












I want to store a dictionary which only contains data in the following format:



{
    "key1": True,
    "key2": True,
    .....
}


In other words, I just want a quick way to check whether a key is valid or not. I can do this by storing a dict called foo in a file called bar.py, and then in my other modules, I can import it as follows:



from bar import foo


Or, I can save it in a pickle file called bar.pickle, and import it at the top of the file as follows:



import pickle
with open('bar.pickle', 'rb') as f:
    foo = pickle.load(f)


Which of these would be the better (and faster) way to do this?










closed as primarily opinion-based by Engineero, Patrick Artner, eyllanesc, Jean-François Fabre, Paul Roub Jan 2 at 22:58


Many good questions generate some degree of opinion based on expert experience, but answers to this question will tend to be almost entirely based on opinions, rather than facts, references, or specific expertise. If this question can be reworded to fit the rules in the help center, please edit the question.



















  • pickle is just a binary format that can help you save a dict more efficiently. Python definitely needs to spend a little more effort to read and parse a binary format than a plain-text format.

    – Windchill
    Jan 2 at 19:24











  • There are lots of alternatives here; what's best/ideal/fastest almost certainly depends on much more information than you've given, e.g. how often does this data change, who changes it, how do they change it, do you want to keep track of these changes, do you care about portability to different systems/versions of Python… and many more.

    – Sam Mason
    Jan 2 at 19:25











  • You could also consider dumping it as JSON since the format is essentially the same, unless, that is, your dictionary has other Python objects inside.

    – Idlehands
    Jan 2 at 19:31













  • If you don't have any need to distinguish between False and varieties of N/A, you can just create a list of the keys with True values and write that to file. Then instead of doing if my_dict[key], you can do if key in my_list.

    – Acccumulation
    Jan 2 at 19:35






  • @Acccumulation but that would be trading an O(1) membership test for an O(N) membership test.

    – juanpa.arrivillaga
    Jan 2 at 19:38
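
A hedged sketch of how the last two comments can be reconciled: if every value is True, the keys can be written to disk as a plain list and loaded back into a set, which keeps the average O(1) membership test of a dict. The file name valid_keys.json and the helpers save_keys and load_keys are made up for illustration; they are not from the question.

```python
import json
import os
import tempfile


def save_keys(keys, path):
    """Write the valid keys to disk as a JSON array (hypothetical helper)."""
    with open(path, "w") as fp:
        json.dump(sorted(keys), fp)


def load_keys(path):
    """Read the keys back into a frozenset for hash-based lookups."""
    with open(path) as fp:
        return frozenset(json.load(fp))


# Illustrative data; the real keys would come from your own dictionary.
foo = {"key1": True, "key2": True}

path = os.path.join(tempfile.mkdtemp(), "valid_keys.json")
save_keys(foo.keys(), path)
valid = load_keys(path)

print("key1" in valid)  # True
print("nope" in valid)  # False
```

Testing membership against the loaded frozenset behaves like testing against the dict's keys, whereas testing against a list would scan it element by element.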
















python pickle






edited Jan 2 at 20:16 by spectras

asked Jan 2 at 19:21 by darkhorse




2 Answers


















Python File



Using a Python file gives you caching for free: imported modules are kept in sys.modules, so if you "import" it multiple times in the same process, it only has to be parsed once. However, Python syntax is complicated, so the parser that loads the file may not be well optimized for the limited complexity of the data you're saving (unless you're including arbitrary Python objects and code). A Python file is easy to view, edit, and use, but it's not easy to transport.



EDIT: to clarify, raw Python files are easy for a human to modify, but very hard for a computer to edit. If your code edits the data and you ever want that to be reflected in the dictionary, you're pretty much up a creek: instead, use one of the methods below.
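
To illustrate the caching behaviour described in this section, here is a minimal sketch. It assumes a throwaway module called bar_demo written to a temporary directory; that name and the directory are invented for the demo and are not part of the question.

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module to a temporary directory (names are made up).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "bar_demo.py"), "w") as fp:
    fp.write("foo = {'key1': True, 'key2': True}\n")

sys.path.insert(0, module_dir)
importlib.invalidate_caches()

first = importlib.import_module("bar_demo")   # parsed and executed here
second = importlib.import_module("bar_demo")  # served from sys.modules

print(first is second)            # True
print("bar_demo" in sys.modules)  # True
print(first.foo["key1"])          # True
```

Both imports return the same module object, so the dictionary is only built once per process.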



Pickle File



If you use a pickle file, you'd either re-load the file each time you use it, or need some management code to cache the result after reading it the first time. Like raw Python files, pickle files can store most arbitrary Python objects, so they can be quite complex, and the loader might not be optimized for your particular data types. They're also hard for a regular human to view and edit, and you might encounter portability issues if you move the data around. Pickle is only readable by Python, and you need to think about security: loading a pickle file can execute arbitrary code, so it should only be done with trusted files.
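
The "management code" mentioned above can be quite small. As one possible sketch (not the only way to do it), functools.lru_cache from the standard library can memoize a loader so each path is unpickled once per process. The helper name load_pickle and the temporary bar.pickle file are assumptions for the demo.

```python
import functools
import os
import pickle
import tempfile


@functools.lru_cache(maxsize=None)
def load_pickle(path):
    """Load a pickle file once per path; later calls return the cached object."""
    with open(path, "rb") as fp:
        return pickle.load(fp)  # only unpickle files you trust


# Create a small demo file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "bar.pickle")
with open(path, "wb") as fp:
    pickle.dump({"key1": True, "key2": True}, fp)

foo = load_pickle(path)
foo_again = load_pickle(path)  # no disk access this time

print(foo is foo_again)  # True
```

Note that every caller gets the same dict object back, so a mutation made in one place is visible everywhere; that is fine for a read-only lookup table like the one in the question.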



JSON File



If all you're storing is simple objects (dictionaries, lists, strings, booleans, numbers), consider using the JSON file format. Python has a built-in json module that's just as easy to use as pickle, so there's no added complexity. JSON files are easy to store, view, edit, and compress (if desired), and look almost exactly like a Python dictionary. The format is highly portable (most common languages support reading and writing JSON these days), and if you need to improve file loading speed, the third-party ujson module is a faster, largely drop-in replacement for the standard json module. Since the JSON file format is fairly restricted, I'd expect its parsers and writers to be quite a bit faster than the regular Python or pickle parsers (especially using ujson).
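
For reference, a minimal round trip with the standard json module might look like the sketch below. The bar.json file name and the temporary directory are assumptions; the dictionary mirrors the shape given in the question.

```python
import json
import os
import tempfile

foo = {"key1": True, "key2": True}

# Write the dictionary out; indent=2 only makes the file easier to read.
path = os.path.join(tempfile.mkdtemp(), "bar.json")
with open(path, "w") as fp:
    json.dump(foo, fp, indent=2)

# Read the raw text back and parse it.
with open(path) as fp:
    text = fp.read()
loaded = json.loads(text)

print(text)
print(loaded == foo)  # True
```

The one visible difference from a Python literal is that JSON spells the booleans as true and false; the json module converts them back to True and False on load. JSON object keys are always strings, which matches the data in the question.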






  • Yes, yes and yes, please use a common format such as json. It's not just faster, it's also safer, portable, human-readable, re-usable…

    – spectras
    Jan 2 at 19:58











  • I wouldn't consider json exactly human-readable (beyond the literal definition)... after all, that's why hjson exists. But I do agree on using a common format like it.

    – Idlehands
    Jan 2 at 20:06













  • YMMV, it's reasonably human-readable while reasonably fast. Requirements will certainly shift the cursor. I wouldn't use json for any file intended to be written by a human. Nor would I use it for performance-critical serialization. Probably I'd use YAML for the former and FlatBuffers for the latter, or something along those lines. JSON is a good in-the-middle general-purpose data format :)

    – spectras
    Jan 2 at 20:12











  • Agreed, YAML is a great format for human readability, and custom formats like FlatBuffers and Protobuf are great for performance and serialization size. However, support for YAML is a bit more spotty (even Python needs a third-party module to read it), and custom binary formats often involve more setup (e.g. defining and compiling the data structure in advance). JSON is a great compromise, being extremely easy to use in general, and (if formatted properly) easy to read/write. It's a good next step for people who don't want to get swamped in all the possible serialization methods.

    – scnerd
    Jan 2 at 21:35



















To add to @scnerd's comment, here are the timings in IPython for different load situations.



Here we create a dictionary and write it to 3 formats:



import random
import json
import pickle

letters = 'abcdefghijklmnopqrstuvwxyz'
d = {''.join(random.choices(letters, k=6)): random.choice([True, False])
     for _ in range(100000)}

# write a python file
with open('mydict.py', 'w') as fp:
    fp.write('d = {\n')
    for k, v in d.items():
        fp.write(f"'{k}':{v},\n")
    fp.write('None:False}')

# write a pickle file
with open('mydict.pickle', 'wb') as fp:
    pickle.dump(d, fp)

# write a json file (json.dump writes str, so the file is opened in text mode)
with open('mydict.json', 'w') as fp:
    json.dump(d, fp)


Python file:



# first import: the file is parsed and compiled (the bytecode is normally also written to __pycache__)
%%timeit -n1 -r1
from mydict import d

644 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

# later imports in the same session hit the in-memory module cache (sys.modules), so they are MUCH faster
%%timeit
from mydict import d

1.37 µs ± 54.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


pickle file:



%%timeit
with open('mydict.pickle', 'rb') as fp:
    pickle.load(fp)

52.4 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


json file:



%%timeit
with open('mydict.json', 'rb') as fp:
    json.load(fp)

81.3 ms ± 2.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# here is the same test with ujson
import ujson

%%timeit
with open('mydict.json', 'rb') as fp:
    ujson.load(fp)

51.2 ms ± 304 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
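
The %%timeit cells above only work inside IPython. For anyone who wants to repeat the comparison from a plain script, here is a rough sketch using the standard timeit module. It is an adaptation, not the code that produced the numbers above: it uses a smaller dictionary, writes to a temporary directory, and leaves out ujson because that is a third-party package. The helper names load_pickle and load_json are made up.

```python
import json
import os
import pickle
import random
import tempfile
import timeit

letters = "abcdefghijklmnopqrstuvwxyz"
# Smaller than the 100000 entries used above so the sketch runs quickly.
d = {"".join(random.choices(letters, k=6)): random.choice([True, False])
     for _ in range(10000)}

tmp = tempfile.mkdtemp()
pickle_path = os.path.join(tmp, "mydict.pickle")
json_path = os.path.join(tmp, "mydict.json")

with open(pickle_path, "wb") as fp:
    pickle.dump(d, fp)
with open(json_path, "w") as fp:
    json.dump(d, fp)


def load_pickle():
    with open(pickle_path, "rb") as fp:
        return pickle.load(fp)


def load_json():
    with open(json_path) as fp:
        return json.load(fp)


for name, func in [("pickle", load_pickle), ("json", load_json)]:
    # Best of 5 repeats, 10 loads per repeat, reported per single load.
    best = min(timeit.repeat(func, number=10, repeat=5)) / 10
    print(f"{name}: {best * 1e3:.2f} ms per load")
```

Absolute numbers will differ by machine, Python version, and dictionary size, so treat the output as a relative comparison only.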





  • Since I did specifically mention ujson to improve json speed, could you add a timing with that as well?

    – scnerd
    Jan 2 at 21:31











  • Also, for the Python file, you don't need the None: False at the end, at least in Python 3.6+ (probably any 3.X, not sure), it's valid syntax to have an extra comma at the end of a dictionary literal

    – scnerd
    Jan 2 at 21:32











  • Oh! Did not know that.

    – James
    Jan 2 at 21:35






  • Added the ujson test. It looks comparable to using pickle.

    – James
    Jan 2 at 21:41
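
On the trailing-comma remark in the comments above, a quick way to check it is to parse a small literal with ast.literal_eval. The literal below is made up for the demo; the point is only that a dictionary literal ending in a comma parses fine, so a generated file would not need a dummy final entry.

```python
import ast

# Note the trailing comma before the closing brace.
source = "{'key1': True, 'key2': False,}"
d = ast.literal_eval(source)

print(d)  # {'key1': True, 'key2': False}
```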


















First answer (Python File / Pickle File / JSON File): answered Jan 2 at 19:35 by scnerd, edited Jan 2 at 21:45













  • Yes, yes and yes, please use a common format such as json. It's not just faster, it's also safer, portable, human-readable, re-usable…

    – spectras
    Jan 2 at 19:58











  • I wouldn't consider json exactly human-readable (beyond the literal definition)... after all that's why hjson exists. But I do agree on using a common format like itself.

    – Idlehands
    Jan 2 at 20:06













  • YMMV, it's reasonably human-readable while reasonably fast. Requirements will certainly shift the cursor. I wouldn't use json for any file intended to be written by a human. Nor would I use it for performance-critical serialization. Probably I'd use YAML for the former and FlatBuffers for the latter, or something along those lines. JSON is a good in-the-middle general-purpose data format :)

    – spectras
    Jan 2 at 20:12











  • Agreed, YAML is a great format for human readability, and custom formats like FlatBuffers and Protobuf are great for performance and serialization size. However, support for YAML is a bit more spotty (even Python needs a third party module to read it), and custom binary formats often involve more setup (e.g. defining and compiling the data structure in advance). JSON is a great compromise, being extremely easy to use in general, and (if formatted properly) easy to read/write. It's a good next step for people who don't want to get swamped in all the possible serialization methods.

    – scnerd
    Jan 2 at 21:35

































To add to @scnerd's answer, here are IPython timings for loading the dictionary from each format.



Here we create a dictionary and write it to 3 formats:



    import random
    import json
    import pickle

    letters = 'abcdefghijklmnopqrstuvwxyz'
    # up to 100,000 random six-letter keys mapped to random booleans
    d = {''.join(random.choices(letters, k=6)): random.choice([True, False])
         for _ in range(100000)}

    # write a python file
    with open('mydict.py', 'w') as fp:
        fp.write('d = {\n')
        for k, v in d.items():
            fp.write(f"'{k}':{v},\n")
        fp.write('None:False}')

    # write a pickle file
    with open('mydict.pickle', 'wb') as fp:
        pickle.dump(d, fp)

    # write a json file (text mode, because json.dump writes str, not bytes)
    with open('mydict.json', 'w') as fp:
        json.dump(d, fp)


Python file:



    %%timeit -n1 -r1
    # first import in this session: the source is compiled and the bytecode is cached in __pycache__
    from mydict import d

    644 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

    %%timeit
    # repeated imports in the same session are answered from sys.modules,
    # so this measures a cache lookup, not a load from __pycache__
    from mydict import d

    1.37 µs ± 54.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
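The 1.37 µs figure comes from repeating the import inside one IPython session, where Python simply returns the module it already holds in `sys.modules`. To see what a new process pays once `__pycache__` exists, the import has to be timed in a fresh interpreter. The sketch below is one way to do that; the module name `mydict_demo`, the helper `run_seconds`, and the 50,000-key size are assumptions of mine, and the numbers include interpreter start-up, which is why a baseline run is measured too.

```python
import os
import subprocess
import sys
import tempfile
import time


def run_seconds(code, cwd):
    """Time a fresh interpreter running `code` from directory `cwd`."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", code], cwd=cwd, check=True)
    return time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    # Write a throwaway module holding a large dictionary literal.
    with open(os.path.join(tmp, "mydict_demo.py"), "w") as fp:
        fp.write("d = {\n")
        for i in range(50000):
            fp.write(f"'key{i}': True,\n")
        fp.write("}\n")

    baseline = run_seconds("pass", tmp)                    # interpreter start-up only
    first = run_seconds("from mydict_demo import d", tmp)  # compiles the source
    cached = run_seconds("from mydict_demo import d", tmp) # can reuse the cached bytecode

print(f"start-up only:        {baseline * 1000:.1f} ms")
print(f"first import:         {first * 1000:.1f} ms")
print(f"import with bytecode: {cached * 1000:.1f} ms")
```

The second import is normally cheaper than the first because compilation is skipped, but it still has to build the dictionary, so it should not be expected to approach the microsecond figure.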


pickle file:



    %%timeit
    with open('mydict.pickle', 'rb') as fp:
        pickle.load(fp)

    52.4 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


json file:



    %%timeit
    with open('mydict.json', 'rb') as fp:
        json.load(fp)

    81.3 ms ± 2.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    # here is the same test with ujson
    import ujson

    %%timeit
    with open('mydict.json', 'rb') as fp:
        ujson.load(fp)

    51.2 ms ± 304 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
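Outside IPython, roughly the same pickle-versus-json comparison can be run with the standard `timeit` module. This is only a sketch: the file names `mydict_demo.pickle` and `mydict_demo.json` and the repeat counts are arbitrary choices of mine, and the absolute numbers will differ from the ones above depending on machine and Python version.

```python
import json
import os
import pickle
import random
import tempfile
import timeit

letters = "abcdefghijklmnopqrstuvwxyz"
d = {"".join(random.choices(letters, k=6)): random.choice([True, False])
     for _ in range(100000)}

# Placeholder file names in a temporary directory.
tmp = tempfile.mkdtemp()
pickle_path = os.path.join(tmp, "mydict_demo.pickle")
json_path = os.path.join(tmp, "mydict_demo.json")

with open(pickle_path, "wb") as fp:
    pickle.dump(d, fp)
with open(json_path, "w") as fp:
    json.dump(d, fp)


def load_pickle():
    with open(pickle_path, "rb") as fp:
        return pickle.load(fp)


def load_json():
    with open(json_path, "r") as fp:
        return json.load(fp)


# Best of 3 repeats, 5 loads per repeat, reported per load.
loads_per_repeat = 5
results = {}
for name, func in [("pickle", load_pickle), ("json", load_json)]:
    best = min(timeit.repeat(func, number=loads_per_repeat, repeat=3))
    results[name] = best / loads_per_repeat

for name, seconds in results.items():
    print(f"{name}: {seconds * 1000:.1f} ms per load")
```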













edited Jan 2 at 21:40

























answered Jan 2 at 20:02









James













  • Since I did specifically mention ujson to improve json speed, could you add a timing with that as well?

    – scnerd
    Jan 2 at 21:31











  • Also, for the Python file, you don't need the None: False at the end, at least in Python 3.6+ (probably any 3.X, not sure), it's valid syntax to have an extra comma at the end of a dictionary literal

    – scnerd
    Jan 2 at 21:32











  • Oh! Did not know that.

    – James
    Jan 2 at 21:35






  • Added the ujson test. It looks comparable to using pickle.

    – James
    Jan 2 at 21:41


















