[Algorithm] LeetCode #819 - Most Common Word

개요

LeetCode #819, Most Common Word 문제를 풀어봅니다.

Given a string paragraph and a string array of the banned words banned, return the most frequent word that is not banned. It is guaranteed there is at least one word that is not banned, and that the answer is unique.

The words in paragraph are case-insensitive and the answer should be returned in lowercase.

Example 1:

Input: paragraph = “Bob hit a ball, the hit BALL flew far after it was hit.”, banned = [“hit”] Output: “ball” Explanation: “hit” occurs 3 times, but it is a banned word. “ball” occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. Note that words in the paragraph are not case sensitive, that punctuation is ignored (even if adjacent to words, such as “ball,”), and that “hit” isn’t the answer even though it occurs more because it is banned.

Example 2:

Input: paragraph = “a.”, banned = [] Output: “a”

Constraints:

1 <= paragraph.length <= 1000
paragraph consists of English letters, space ' ', or one of the symbols: "!?',;.".
0 <= banned.length <= 100
1 <= banned[i].length <= 10
banned[i] consists of only lowercase English letters.

첫번째 방식은 받은 paragraph를 적절한 단어만 남겨서 리스트로 만들고 딕셔너리에 개수를 넣는 방식입니다. 딕셔너리에 key로 들어가있지 않은 단어면 새로 추가하면서 0을 지정하고, 그 단어가 나올 때마다 1씩 더해서 개수를 셉니다. 개수 딕셔너리에서 가장 값이 큰 key를 가져오면 됩니다. 두번째 방식은 collections 라이브러리의 Counter를 사용해서 원소별 개수를 세고 most_common으로 개수 큰 순서로 정렬하고 가장 큰 것만 뽑아줍니다.

import collections

class Solution:
    def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
        # r: r 뒤의 문자열 그대로 반환
        # \w: 알파벳, 숫자, _ 중 하나
        # \w에 해당하지 않는 것은 공백으로 처리
        # 대소문자 구분 없고 소문자 출력
        # banned에 들어가면 안 됨
        words = [word for word in re.sub(r'[^\w]',' ', paragraph)
                 .lower().split() 
                 if word not in banned]
        
        counter = {}
        
        for word in words:
            if word not in counter:
                counter[word] = 0
            counter[word] += 1
        
        # counter에서 값 가장 큰 key 가져옴
        return max(counter, key=counter.get)
    
    def mostCommonWord_ver2(self, paragraph: str, banned: List[str]) -> str:
        # r: r 뒤의 문자열 그대로 반환
        # \w: 알파벳, 숫자, _ 중 하나
        # \w에 해당하지 않는 것은 공백으로 처리
        # 대소문자 구분 없고 소문자 출력
        # banned에 들어가면 안 됨
        words = [word for word in re.sub(r'[^\w]',' ', paragraph)
                 .lower().split() 
                 if word not in banned]
        
        # collections 라이브러리의 Counter 클래스. words에서 각 원소별로 몇 개 나오는지 딕셔너리로 돌려줌
        counts = collections.Counter(words)
        # most_common까지 하면 개수 순으로 정렬해서 주는데 [('a',3), ('b',2), ...] 이런 식
        # 1을 넣어주면 첫 번째 값으로 [('a',3)]. [0][0] 하면 'a'
        return counts.most_common(1)[0][0]

개요

Share this post