Intro
Recently I did an interview with the following problem:
You have a list of strings, on our list, some of the strings were corrupted on the reading process. You want to avoid duplications so do an algorithm to solve this.
List example: ttoo, toto, oott.
Example output: toto.
List example 2: xt, tx.
Example output 2: xt.
Solutions
There are several ways to solve the problem and basically, you can understand it as a sort algorithm, a kind of special sort, but a sort. Therefore a bubble sort or insertion sort can be applied.
Solution 1
My second solution was to come back to a dict and try to use it to optimize the size comparison. However, since I’m using a dict (i.e. an encoding operation) I will need therefore an operation for translate it in a list (i.e. decoding operation).
Part A – encoding:
for each_word in words: key = len(each_word) if key not in result: result[key] = [each_word] else: list_values = result[key] list_values.append(each_word) result[key] = list_values
Part A2: I use an aux function to end correctly in the dict
def create_new_list(list_values): #['ot', 'to', 'tt'] #['to', 'tt'] sorted(list_values) list_new = [list_values[0]] flag = True for i in range(1, len(list_values)): flag = True for each in list_new: if set(list_values[i]) == set(each): flag = False if flag: list_new.append(list_values[i]) return list_new
Solution 2
This was actually my first solution on the head, but second to be implemented. And its more or less based on the insertion sort algorithm and basically inserts an element on the final list if and only if the set of this element not there. [I mean based on insertion because you will get the elements on the list and compare with the final list.] + the survival flag for each element.
However, you still need to do the size comparison to avoid tt in ttoo problems:
flag = True for i in range(1, len(words)): flag = True for each in final: if len(words[i]) == len(each) and set(words[i]) == set(each): flag = False if flag: final.append(words[i]) return final
The solution can be improved by using the sorted using len option (i) + the comparison is first done on the length and then on the set, avoiding unnecessary comparisons.
(i) sorted(xs, key=len)
Solution 3
Solution 3 would use a dict comprehension on the first solution. But after thinking about it, it just too noisy and unclear for the reader.
(i) I’m creating the values of the Dict on the fly, so I don’t need an extra function to do it for me (encoding).
(ii) I just put a list comprehension in the end (for the decoder operation).
(i)for each_word in words: key = len(each_word) if key not in result: result[key] = [each_word] else: list_values = result[key] # 2:[cra] - car for each in list_values: if set(each) != set(each_word): list_values.append(each_word) result[key] = list_values
(ii) return [result.get(each_key)[0] for each_key in result.keys()]
Solution 4
It’s totally possible to do it in just one list comprehension, though it would be very long though, basically because one would need to condense the survival code within the list comprehension. I don’t think it worth though.
Source
All the solutions are there.
https://github.com/FranciscoMeloJr/Python-Tests/tree/master/banks
REFs
[1] https://www.datacamp.com/community/tutorials/python-dictionary-comprehension