Brushing up my Python skills since it's been a while.
The problem:
You have two lists. The first list contains the keywords. The second contains the sentences in which those keywords are used. You just need to return the top n most used keywords:
Example:
First list ~ keywords:
["anacel", "betacellular", "cetracular", "deltacelular", "eurocell"]
numFeatureRequests = 3
Second list ~ sentences:
["Best services provided by anacell", "betacellular has great services, anacell provides much better services than all other", "anacell"]
Result:
Asking for the top 2, i.e. the two most used keywords, gives:
['anacel', 'betacellular']
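The expected result depends on what counts as a "use" of a keyword. The keyword list has "anacel", but the sentences only contain "anacell", so the example only works if a keyword counts whenever it appears as a substring, which is what `endstring.count(...)` does in the linked-list approach. A small check under that reading, with whole-word matching (via `re.findall`) shown only for contrast:

~~~python
import re

keywords = ["anacel", "betacellular", "cetracular", "deltacelular", "eurocell"]
sentences = [
    "Best services provided by anacell",
    "betacellular has great services, anacell provides much better services than all other",
    "anacell",
]

# Substring counting: "anacel" is found inside every "anacell"
substring_counts = {kw: sum(s.count(kw) for s in sentences) for kw in keywords}

# Whole-word counting (for contrast): split each lowercased sentence into words first
words = [w for s in sentences for w in re.findall(r"[a-z]+", s.lower())]
word_counts = {kw: words.count(kw) for kw in keywords}

print(substring_counts)
print(word_counts)
~~~

With substring counting "anacel" scores 3 and "betacellular" 1, matching the expected top 2; with whole-word counting "anacel" drops to 0.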
Implementation:
I thought about three ways to solve this: the basic one, a histogram (with a dictionary), and a linked list:
~~~
# Main function:
def popularNFeatures(numFeatures, topFeatures, possibleFeatures, numFeatureRequests, featureRequests, print_flag=False):
    ...  # the body depends on the approach
~~~
Using a histogram (and Python's Counter):
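~~~
from collections import Counter
import operator

def popularNFeatures(numFeatures, topFeatures, possibleFeatures, numFeatureRequests, featureRequests, print_flag=False):
    # Histogram: for each keyword, count its occurrences across all sentences
    word_counts = Counter()
    for eachFeature in possibleFeatures:
        word_counts[eachFeature] = sum(request.count(eachFeature) for request in featureRequests)
    if print_flag:
        print(word_counts)
    # For the example: Counter({'anacel': 3, 'betacellular': 1, 'cetracular': 0, ...})
    sorted_d = sorted(word_counts.items(), key=operator.itemgetter(1), reverse=True)
    # [('anacel', 3), ('betacellular', 1), ...]
    return [item[0] for item in sorted_d[:topFeatures]]
~~~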
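Since `Counter` can already rank its entries, the `sorted(..., key=operator.itemgetter(1))` step can be replaced by `Counter.most_common(n)`. A sketch with the same substring counting (the function name `popular_n_features` and its reduced parameter list are hypothetical, not the post's signature):

~~~python
from collections import Counter

def popular_n_features(top_features, possible_features, feature_requests):
    # Histogram: keyword -> total occurrences (as a substring) over all sentences
    counts = Counter({kw: sum(s.count(kw) for s in feature_requests) for kw in possible_features})
    # most_common(n) returns the n (keyword, count) pairs with the highest counts
    return [kw for kw, _ in counts.most_common(top_features)]

keywords = ["anacel", "betacellular", "cetracular", "deltacelular", "eurocell"]
sentences = [
    "Best services provided by anacell",
    "betacellular has great services, anacell provides much better services than all other",
    "anacell",
]
print(popular_n_features(2, keywords, sentences))  # ['anacel', 'betacellular']
~~~

For keywords with equal counts, `most_common` keeps them in the order they were first encountered.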
Using a linked list:
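~~~
# One node per keyword, holding the keyword and how often it is used
endstring = " ".join(featureRequests)  # assumed: all sentences joined into one string
list_nodes = []
for eachFeature in possibleFeatures:
    value = endstring.count(eachFeature)
    aux = module_node.node(eachFeature, value)
    list_nodes.append(aux)
~~~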
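The `module_node` module is not shown in the post. A minimal stand-in for `module_node.node`, assuming it only has to store a keyword with its count and expose the `get_value()` and `printNode()` methods that `insert` calls (the attribute names `name` and `value` are hypothetical):

~~~python
class node:
    """Hypothetical stand-in for module_node.node: one keyword and its count."""

    def __init__(self, name, value):
        self.name = name    # the keyword, e.g. "anacel"
        self.value = value  # how many times it appears in the sentences

    def get_value(self):
        return self.value

    def printNode(self):
        print(self.name, self.value)


# Usage with the example sentences
featureRequests = [
    "Best services provided by anacell",
    "betacellular has great services, anacell provides much better services than all other",
    "anacell",
]
endstring = " ".join(featureRequests)
list_nodes = [node(kw, endstring.count(kw)) for kw in ["anacel", "betacellular", "cetracular"]]
for n in list_nodes:
    n.printNode()
~~~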
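The linked list is trivial, as Francis Girauldeau would put it; the main part is the insertion, which keeps the nodes sorted by value, highest first. Besides the empty list, there are three cases: the new value is the new maximum, the new minimum, or somewhere in between: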
~~~
def insert(self, new_node, print_flag=False):
    """Insert new_node keeping self.list_nodes sorted by value, highest first."""
    if print_flag:
        new_node.printNode()
    # The list is empty
    if len(self.list_nodes) == 0:
        self.list_nodes.append(new_node)
        return True
    max_value = self.list_nodes[0].get_value()
    min_value = self.list_nodes[-1].get_value()
    # Case 1: new highest value, goes at the head
    if new_node.get_value() >= max_value:
        self.list_nodes.insert(0, new_node)
    # Case 2: new lowest value, goes at the tail
    elif new_node.get_value() <= min_value:
        self.list_nodes.append(new_node)
    # Case 3: in between, goes once, before the first node with a value <= its own
    else:
        new_list = []
        inserted = False
        for node in self.list_nodes:
            if not inserted and node.get_value() <= new_node.get_value():
                new_list.append(new_node)
                inserted = True
            new_list.append(node)
        self.list_nodes = new_list
    return True
~~~
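The same descending order can be maintained without the three explicit cases. A sketch using the standard `bisect` module (a swapped-in technique, not the post's implementation; `SortedCounts` is a hypothetical name) finds each insertion slot by binary search over negated counts. The list insertion itself is still O(n), as in the original, and taking the top n is then a slice:

~~~python
import bisect

class SortedCounts:
    """Hypothetical helper: keywords kept ordered by count, highest first."""

    def __init__(self):
        self._neg_counts = []  # negated counts, ascending == counts descending
        self._keywords = []    # keywords, aligned index by index with _neg_counts

    def insert(self, keyword, count):
        # Binary search for the slot; bisect_right puts a tie after the
        # existing equal counts, so earlier keywords stay first.
        idx = bisect.bisect_right(self._neg_counts, -count)
        self._neg_counts.insert(idx, -count)
        self._keywords.insert(idx, keyword)

    def top(self, n):
        return self._keywords[:n]


keywords = ["anacel", "betacellular", "cetracular", "deltacelular", "eurocell"]
sentences = [
    "Best services provided by anacell",
    "betacellular has great services, anacell provides much better services than all other",
    "anacell",
]
endstring = " ".join(sentences)
ranking = SortedCounts()
for kw in keywords:
    ranking.insert(kw, endstring.count(kw))
print(ranking.top(2))  # ['anacel', 'betacellular']
~~~

Note the tie handling differs slightly from the three-case version, which puts a node equal to the current maximum in front of the existing one.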
The source code and tests are here.