CopyPastor

Detecting plagiarism made easy.

Score: 1; Reported for: Exact paragraph match

Possible Plagiarism

Plagiarized on 2018-07-22
by MSeifert

Original Post

Original - Posted on 2013-06-10
by jamylak



            

It's actually quite tricky to get that right in a general way.
It essentially boils down to two basic problems:
- Checking if two lists contain the same elements
- Removing all lists that contain the same elements
I'll tackle these separately.
## Check if two lists contain the same elements
It's best to refer to Raymond Hettinger's answer from [here](https://stackoverflow.com/a/7829388/5393381):
> **O(n)**: The *[Counter()][1]* method is best (if your objects are hashable):
>
>     from collections import Counter
>     def compare(s, t):
>         return Counter(s) == Counter(t)
> **O(n log n)**: The *[sorted()][2]* method is next best (if your objects are orderable):
>
>     def compare(s, t):
>         return sorted(s) == sorted(t)
> **O(n * n)**: If the objects are neither hashable, nor orderable, you can use equality:

>     def compare(s, t):
>         t = list(t)   # make a mutable copy
>         try:
>             for elem in s:
>                 t.remove(elem)
>         except ValueError:
>             return False
>         return not t
In your case you don't want any imports, so you could replace `collections.Counter` with:

    def count(it):
        d = {}
        for item in it:
            try:
                d[item] += 1
            except KeyError:
                d[item] = 1
        return d

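With that helper in place, a drop-in replacement for the `Counter`-based comparison might look like this (a minimal sketch using the `count` function defined above):

    def compare(s, t):
        # same idea as Counter(s) == Counter(t), but without any imports
        return count(s) == count(t)

    print(compare([1, 2, 2], [2, 1, 2]))  # True  - same elements, same multiplicities
    print(compare([1, 2], [1, 2, 2]))     # False - different multiplicities
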
In case the items are hashable and you don't care about the count of the items (for example `[1, 1, 2]` should be treated as equal to `[1, 2, 2]`), or they will always be unique, you could also use `set`s:

    def compare(s, t):
        return set(s) == set(t)

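A quick illustration of that caveat (a minimal sketch):

    print(set([1, 1, 2]) == set([1, 2, 2]))        # True  - duplicate counts are ignored
    print(sorted([1, 1, 2]) == sorted([1, 2, 2]))  # False - duplicate counts matter
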
That way you can check if two sublists contain the same elements. One possible optimization, in case the lists *might* have different lengths, is to add

    if len(s) != len(t):
        return False

at the beginning of each of these functions.
## Removing duplicates from the list
That also depends on assumptions about the result (should the non-duplicates keep their relative order or not) and the contents (again, can you hash the contents or can they be ordered).
If the items are hashable (or can be converted to something hashable) you could use a `set` to remove duplicates. If you care about the order you can still use a set, but only for lookups, for example with the [`unique_everseen`](https://docs.python.org/library/itertools.html#itertools-recipes) recipe from the itertools documentation:

    from itertools import filterfalse

    def unique_everseen(iterable, key=None):
        "List unique elements, preserving order. Remember all elements ever seen."
        # unique_everseen('AAAABBBCCDAABBB') --> A B C D
        # unique_everseen('ABBCcAD', str.lower) --> A B C D
        seen = set()
        seen_add = seen.add
        if key is None:
            for element in filterfalse(seen.__contains__, iterable):
                seen_add(element)
                yield element
        else:
            for element in iterable:
                k = key(element)
                if k not in seen:
                    seen_add(k)
                    yield element

You mentioned no imports, but fortunately we don't need the `key is None` part anyway (see below), so you can simply use:

    def unique_everseen(iterable, key):
        seen = set()
        seen_add = seen.add
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

Note that the approaches to compare the inner lists use sets, dictionaries and lists, which are unhashable. But all of them can be converted to hashable collections, like frozensets or tuples:

    # for sets
    frozenset(s)
    # for dictionaries
    frozenset(d.items())
    # for lists
    tuple(l)

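To make the connection explicit, here is a small sketch (assuming the `count` helper from above) showing how a sublist can be turned into a hashable key that a set accepts:

    sublist = [1, 1, 2]
    key = frozenset(count(sublist).items())   # e.g. frozenset({(1, 2), (2, 1)})
    seen = set()
    seen.add(key)                             # works because frozensets are hashable
    print(key in seen)                        # True
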
However, the last comparison approach (for items that are unhashable and cannot be ordered) can't be used here, so let's ignore it for now.
Basically you could then use `unique_everseen` like this:

    list(unique_everseen(your_list, key=lambda sublist: frozenset(count(sublist).items())))
    # Or with collections.Counter instead of count

Or if you don't care about the duplicates (or there will be no duplicates) inside your sublists:

    list(unique_everseen(your_list, key=frozenset))

Or if they are not hashable but can be ordered:

    list(unique_everseen(your_list, key=lambda sublist: tuple(sorted(sublist))))

Only the case in which the items in your sublists are neither hashable nor orderable cannot be handled by that fast `unique_everseen` approach. You'll have to use a slower variant:

    def compare(s, t):
        t = list(t)   # make a mutable copy
        try:
            for elem in s:
                t.remove(elem)
        except ValueError:
            return False
        return not t

    def unique_everseen_slow(iterable):
        seen = []
        for element in iterable:
            for already_seen_item in seen:
                if compare(element, already_seen_item):
                    break  # We found a match, so stop looking
            else:
                seen.append(element)
                yield element

    list(unique_everseen_slow(your_list))

The `else` clause belongs to the `for` loop and is only entered when there was no `break`. You could instead also check for [`any`](https://docs.python.org/library/functions.html#any) to avoid this `for`-`else`:

    def unique_everseen_slow(iterable):
        seen = []
        for element in iterable:
            if not any(compare(element, seen_element) for seen_element in seen):
                seen.append(element)
                yield element

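For example, sublists containing dictionaries are neither hashable nor orderable, so only this slow variant works for them (a hypothetical illustration, not taken from the question):

    data = [
        [{'a': 1}, {'b': 2}],
        [{'b': 2}, {'a': 1}],   # same dicts in a different order -> duplicate
        [{'c': 3}],
    ]
    print(list(unique_everseen_slow(data)))
    # [[{'a': 1}, {'b': 2}], [{'c': 3}]]
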
---
In your case it's actually very easy because the integers in the sublists are hashable and orderable. But this can become very complex for more general cases.
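For instance, putting the pieces together for factor pairs might look like this (a sketch with hypothetical input, using the simplified `unique_everseen` from above):

    pairs = [[1, 200], [200, 1], [2, 100], [100, 2], [4, 50]]
    print(list(unique_everseen(pairs, key=frozenset)))
    # [[1, 200], [2, 100], [4, 50]]
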
However, in your case you could even avoid creating duplicate factors in the first place by simply checking that the factors are sorted (and stopping if they are not):

    def factors(number):
        for candidate in range(1, number + 1):
            if number % candidate == 0:
                other_factor = number // candidate
                if candidate > other_factor:
                    return
                yield [candidate, other_factor]

For example:

    >>> list(factors(200))
    [[1, 200], [2, 100], [4, 50], [5, 40], [8, 25], [10, 20]]

  [1]: https://docs.python.org/library/collections.html#collections.Counter
  [2]: https://docs.python.org/library/functions.html#sorted
**Edit 2016**
As Raymond [pointed out](https://stackoverflow.com/a/39835527/336527), in Python 3.5+, where `OrderedDict` is implemented in C, the list comprehension approach will be slower than `OrderedDict` (unless you actually need the list at the end - and even then, only if the input is very short). So the best solution for 3.5+ is `OrderedDict`.
**Important Edit 2015**
As [@abarnert][1] notes, the [`more_itertools`][2] library (`pip install more_itertools`) contains a [`unique_everseen`][3] function that is built to solve this problem without any **unreadable** (`not seen.add`) **mutations** in list comprehensions. It is also the fastest solution:

    >>> from more_itertools import unique_everseen
    >>> items = [1, 2, 0, 1, 3, 2]
    >>> list(unique_everseen(items))
    [1, 2, 0, 3]

Just one simple library import and no hacks. This comes from an implementation of the itertools recipe [`unique_everseen`][4], which looks like:

    from itertools import filterfalse  # needed by the recipe below

    def unique_everseen(iterable, key=None):
        "List unique elements, preserving order. Remember all elements ever seen."
        # unique_everseen('AAAABBBCCDAABBB') --> A B C D
        # unique_everseen('ABBCcAD', str.lower) --> A B C D
        seen = set()
        seen_add = seen.add
        if key is None:
            for element in filterfalse(seen.__contains__, iterable):
                seen_add(element)
                yield element
        else:
            for element in iterable:
                k = key(element)
                if k not in seen:
                    seen_add(k)
                    yield element

_____
In Python `2.7+` the <strike>accepted common idiom</strike> (which works but isn't optimized for speed, I would now use [`unique_everseen`][3]) for this uses [`collections.OrderedDict`](http://docs.python.org/3/library/collections.html#collections.OrderedDict):
Runtime: **O(N)**

    >>> from collections import OrderedDict
    >>> items = [1, 2, 0, 1, 3, 2]
    >>> list(OrderedDict.fromkeys(items))
    [1, 2, 0, 3]

This looks much nicer than:

    seen = set()
    [x for x in seq if x not in seen and not seen.add(x)]

and doesn't utilize the **ugly hack**:

    not seen.add(x)

which relies on the fact that `set.add` is an in-place method that always returns `None` so `not None` evaluates to `True`.
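A quick interactive check (an illustrative sketch) shows why that works:

    >>> seen = set()
    >>> result = seen.add(1)   # add() mutates the set in place ...
    >>> print(result)          # ... and returns None
    None
    >>> not result             # so `not seen.add(x)` always evaluates to True
    True
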
Note however that the hack solution is faster in raw speed though it has the same runtime complexity O(N).
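If you want to verify the speed claim on your own machine, a rough `timeit` comparison might look like this (a sketch only; actual numbers depend on your Python version and data):

    import timeit

    setup = "from collections import OrderedDict; items = list(range(1000)) * 2"
    print(timeit.timeit("list(OrderedDict.fromkeys(items))", setup=setup, number=1000))
    print(timeit.timeit(
        "seen = set(); [x for x in items if x not in seen and not seen.add(x)]",
        setup=setup, number=1000))
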
  [1]: https://stackoverflow.com/a/19279812/1219006
  [2]: https://pythonhosted.org/more-itertools/api.html
  [3]: https://pythonhosted.org/more-itertools/api.html#more_itertools.unique_everseen
  [4]: https://docs.python.org/3/library/itertools.html#itertools-recipes

        