CopyPastor

Detecting plagiarism made easy.

Score: -1; Reported for: Open both answers

Possible Plagiarism

Reposted on 2018-01-12

Original Post

Original - Posted on 2016-10-11



            
Present in both answers; Present only in the new answer; Present only in the old answer;

## Yes, dictionaries are ordered in CPython 3.7 (and guaranteed to be so in all future versions)
Guido von Rossam, the "benevolent dictator for life" of CPython, [stated in an email](https://code.activestate.com/lists/python-dev/148139/) that starting in CPython 3.7, dictionaries will have their *insertion order* preserved. Ordering by insertion order is *guaranteed*, meaning it can be relied upon in CPython 3.7 and all future versions.
Here's the [origin proposal](https://mail.python.org/pipermail/python-dev/2012-December/123028.html) by Raymond Hettinger, which was implemented (but not guaranteed) in CPython 3.6.
## What does this mean for the ordinary user? <sup>(compared to CPython <= 3.5)</sup>
- Significant memory savings (likely >30%) - Faster iteration - Faster resizing - Likely better cache utilization (*read: even faster!*)

## How?
For CPython <= 3.5, a dictionary such as `d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'} ` would be stored as follows:
entries = [['--', '--', '--'], [-8522787127447073495, 'barry', 'green'], ['--', '--', '--'], ['--', '--', '--'], ['--', '--', '--'], [-9092791511155847987, 'timmy', 'red'], ['--', '--', '--'], [-6480567542315338377, 'guido', 'blue']]
Under the new structure, it is instead represented as:
indices = [None, 1, None, None, None, 0, None, 2] entries = [[-9092791511155847987, 'timmy', 'red'], [-8522787127447073495, 'barry', 'green'], [-6480567542315338377, 'guido', 'blue']]
Because of the way dictionaries were implemented, they were *never* more than 2/3 full. This is still the case, but only *one* array (rather than *three*) has that requirement. The other two arrays (the data and value) can be 100% full. When a key is removed, a hole is left in the table. If you never remove keys, the main table will *always* be 100% full!
<sub>* Examples from Raymond Hettinger in his proposal.</sub>
> **Are dictionaries ordered in Python 3.6+?**
Yes, for the CPython implementation of Python, dictionaries are ordered as of Python 3.6. *This is considered an implementation detail in Python 3.6*; you need to use `OrderedDict` if you want ordering that's *guaranteed* across other implementations of Python.
**As of Python 3.7**, this is no longer an implementation detail and instead becomes a language feature. [From a python-dev message by GvR][11]:
> Make it so. "Dict keeps insertion order" is the ruling. Thanks!
This simply means that *you can depend on it*. Other implementations of Python must also offer an ordered dictionary if they wish to be a conforming implementation of Python 3.7.
---
> **How does the Python `3.6` dictionary implementation perform better* than the older one while preserving element order?**
Essentially, by *keeping two arrays*.
- The first array, [`dk_entries`][1], holds the entries ([of type ` PyDictKeyEntry`][2]) for the dictionary in the order that they were inserted. Preserving order is achieved by this being an append only array where new items are always inserted at the end (insertion order).
- The second, [`dk_indices`][3], holds the indices for the `dk_entries` array (that is, values that indicate the position of the corresponding entry in `dk_entries`). This array acts as the hash table. When a key is hashed it leads to one of the indices stored in `dk_indices` and the corresponding entry is fetched by indexing `dk_entries`. Since only indices are kept, the type of this array depends on the overall size of the dictionary (ranging from type [`int8_t`][4](`1` byte) to [`int32_t`][5]/[`int64_t`][6] (`4`/`8` bytes) on `32`/`64` bit builds)
In the previous implementation, a sparse array of type `PyDictKeyEntry` and size `dk_size` had to be allocated; unfortunately, it also resulted in a lot of empty space since that array was not allowed to be more than `2/3 * dk_size` full [for performance reasons][7]. (and the empty space *still* had `PyDictKeyEntry` size!).
This is not the case now since only the *required* entries are stored (those that have been inserted) and a sparse array of type `intX_t` (`X` depending on dict size) `2/3 * dk_size`s full is kept. The empty space changed from type `PyDictKeyEntry` to `intX_t`.
So, obviously, creating a sparse array of type `PyDictKeyEntry` is much more memory demanding than a sparse array for storing `int`s.
You can see the full conversation [on Python-Dev][8] regarding this feature if interested, it is a good read.
---
[In the original proposal made by Raymond Hettinger][9], a visualization of the data structures used can be seen which captures the gist of the idea.
>For example, the dictionary: > d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'} > is currently stored as: > entries = [['--', '--', '--'], [-8522787127447073495, 'barry', 'green'], ['--', '--', '--'], ['--', '--', '--'], ['--', '--', '--'], [-9092791511155847987, 'timmy', 'red'], ['--', '--', '--'], [-6480567542315338377, 'guido', 'blue']] > Instead, the data should be organized as follows: > indices = [None, 1, None, None, None, 0, None, 2] entries = [[-9092791511155847987, 'timmy', 'red'], [-8522787127447073495, 'barry', 'green'], [-6480567542315338377, 'guido', 'blue']]

As you can visually now see, in the original proposal, a lot of space is essentially empty to reduce collisions and make look-ups faster. With the new approach, you reduce the memory required by moving the sparseness where it's really required, in the indices.
<sub> *The new dictionary implementations performs better **memory wise** by being designed more compactly; that's the main benefit here. Speed wise, the difference isn't so drastic, there's places where the new dict might introduce slight regressions ([key-lookups, for example][10]) while in others (iteration and resizing come to mind) a performance boost should be present. </sub>
<sub> Overall, the performance of the dictionary, especially in real-life situations, improves due to the compactness introduced. </sub>

[1]: https://github.com/python/cpython/blob/474ef63e38726d4bcde14f6104984a742c6cb747/Objects/dictobject.c#L551 [2]: https://github.com/python/cpython/blob/c30098c8c6014f3340a369a31df9c74bdbacc269/Objects/dict-common.h#L4 [3]: https://github.com/python/cpython/blob/c30098c8c6014f3340a369a31df9c74bdbacc269/Objects/dict-common.h#L70 [4]: https://github.com/python/cpython/blob/c30098c8c6014f3340a369a31df9c74bdbacc269/Objects/dict-common.h#L64 [5]: https://github.com/python/cpython/blob/c30098c8c6014f3340a369a31df9c74bdbacc269/Objects/dict-common.h#L66 [6]: https://github.com/python/cpython/blob/c30098c8c6014f3340a369a31df9c74bdbacc269/Objects/dict-common.h#L68 [7]: https://github.com/python/cpython/blob/474ef63e38726d4bcde14f6104984a742c6cb747/Objects/dictobject.c#L375 [8]: https://mail.python.org/pipermail/python-dev/2016-September/146327.html [9]: https://mail.python.org/pipermail/python-dev/2012-December/123028.html [10]: http://bugs.python.org/msg275587 [11]: https://mail.python.org/pipermail/python-dev/2017-December/151283.html

        
Present in both answers; Present only in the new answer; Present only in the old answer;