Brushing up on my Python skills since it's been a while.
The problem:
You have two lists. The first list contains the keywords. The second contains the sentences in which those keywords are used. You just need to return the top (n) most-used keywords:
Example:
First list ~ keywords:
["anacel", "betacellular", "cetracular", "deltacelular", "eurocell"]
numFeatureRequests= 3
Second list ~ sentences:
["Best services provided by anacell", "betacellular has great services, anacell provides much better services than all other", "anacell"]
Results
Get me the top 2, so the two most used words.
['anacel', 'betacellular']
Implementation:
~ I thought about three ways to solve this: the basic one, one using a histogram (with a dictionary), and one using a linked list:
    # Main function:
    def popularNFeatures(numFeatures, topFeatures, possibleFeatures, numFeatureRequests, featureRequests, print_flag=False):
Using a histogram (and Python's Counter):
~~~
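# Excerpt from the body of popularNFeatures. It needs these imports at module level:
#   from collections import Counter
#   import operator

# Counter tallies each item of featureRequests, so the list must hold the
# individual words to count (not whole sentences).
word_counts = Counter(featureRequests)
print(word_counts)
# e.g. Counter({'hello': 2, 'ciao': 1, 'hi': 1})
# word_counts[eachFeature] gives the count for a single feature.

# Sort the (word, count) pairs by count, highest first.
sorted_d = sorted(word_counts.items(), key=operator.itemgetter(1), reverse=True)
# e.g. [('hello', 2), ('ciao', 1), ('hi', 1)]

# Keep only the words of the first topFeatures pairs.
return [item[0] for item in sorted_d[:topFeatures]]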
~~~
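To try the histogram idea on the example from the top of the post, here is a minimal, self-contained sketch. It is not the code from my repository: the function name `top_features_counter` and its parameters are made up for illustration, and it assumes keywords are counted as substrings of the sentences (so "anacel" also matches inside "anacell").

```python
from collections import Counter


def top_features_counter(top_n, possible_features, feature_requests):
    """Return the top_n keywords by number of occurrences in the sentences."""
    # Join the sentences so each keyword is counted against a single string.
    text = " ".join(feature_requests)
    # Assumption: substring counting, so "anacel" also matches "anacell".
    counts = Counter({feature: text.count(feature) for feature in possible_features})
    # most_common(top_n) returns (keyword, count) pairs, highest count first.
    return [feature for feature, _ in counts.most_common(top_n)]


keywords = ["anacel", "betacellular", "cetracular", "deltacelular", "eurocell"]
sentences = [
    "Best services provided by anacell",
    "betacellular has great services, anacell provides much better services than all other",
    "anacell",
]
print(top_features_counter(2, keywords, sentences))  # ['anacel', 'betacellular']
```

`Counter.most_common` does the sorting and the slicing in one call, which replaces the manual `sorted` plus list comprehension.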
Using a linked list:
    for eachFeature in possibleFeatures:
        value = endstring.count(eachFeature)
        aux = module_node.node(eachFeature, value)
        list_nodes.append(aux)
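The loop relies on a node class that is not shown in this post. A rough sketch of what it might look like follows. The class `Node` is a hypothetical stand-in for `module_node.node`, the `get_name` accessor is my own addition, and `endstring` is assumed to be all the sentences joined into one string.

```python
class Node:
    """Hypothetical stand-in for module_node.node: a keyword and its count."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def get_name(self):
        return self.name

    def get_value(self):
        return self.value


possibleFeatures = ["anacel", "betacellular", "cetracular", "deltacelular", "eurocell"]
featureRequests = [
    "Best services provided by anacell",
    "betacellular has great services, anacell provides much better services than all other",
    "anacell",
]

# Assumption: endstring is every sentence joined into a single string.
endstring = " ".join(featureRequests)

list_nodes = []
for eachFeature in possibleFeatures:
    # str.count returns the number of non-overlapping occurrences.
    value = endstring.count(eachFeature)
    list_nodes.append(Node(eachFeature, value))

print([(n.get_name(), n.get_value()) for n in list_nodes])
```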
The linked list is trivial, as Francis Girauldeau would put it, and the main part is the insertion, which has three cases:
    def insert(self, new_node, print_flag=False):
        """Insert new_node, keeping list_nodes sorted by value, highest first."""
        if print_flag:
            new_node.printNode()
        # The list is empty
        if len(self.list_nodes) == 0:
            self.list_nodes.append(new_node)
            return True
        max_value = self.list_nodes[0].get_value()
        min_value = self.list_nodes[-1].get_value()
        value = new_node.get_value()
        if value >= max_value:
            # Case 1: new maximum, the node goes to the front
            self.list_nodes.insert(0, new_node)
        elif value <= min_value:
            # Case 2: new minimum, the node goes to the end
            self.list_nodes.append(new_node)
        else:
            # Case 3: strictly in between, insert once before the first
            # node whose value is not greater than the new one
            for idx, node in enumerate(self.list_nodes):
                if node.get_value() <= value:
                    self.list_nodes.insert(idx, new_node)
                    break
        return True
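The insertion above keeps the nodes in a plain Python list. For comparison, here is a minimal sketch of the same descending-order insertion on a list with explicit `next` pointers. The names `LinkedNode`, `SortedLinkedList` and `top` are hypothetical and not taken from my source code, and the counts are the ones obtained by counting each keyword as a substring in the example sentences.

```python
class LinkedNode:
    """A keyword, its count, and a pointer to the next node."""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.next = None


class SortedLinkedList:
    """Singly linked list kept in descending order of value."""

    def __init__(self):
        self.head = None

    def insert(self, new_node):
        # Empty list or new maximum: the node becomes the head.
        if self.head is None or new_node.value >= self.head.value:
            new_node.next = self.head
            self.head = new_node
            return
        # Walk while the next node is still larger than the new one.
        current = self.head
        while current.next is not None and current.next.value > new_node.value:
            current = current.next
        # Middle or tail: splice the node in right after current.
        new_node.next = current.next
        current.next = new_node

    def top(self, n):
        """Return the names of the first n nodes."""
        names = []
        current = self.head
        while current is not None and len(names) < n:
            names.append(current.name)
            current = current.next
        return names


counts = {"anacel": 3, "betacellular": 1, "cetracular": 0, "deltacelular": 0, "eurocell": 0}
ranked = SortedLinkedList()
for name, value in counts.items():
    ranked.insert(LinkedNode(name, value))
print(ranked.top(2))  # ['anacel', 'betacellular']
```

Because the list stays sorted after every insertion, reading the top n keywords is just a walk over the first n nodes.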
The source code and tests are here