Count the number of occurrences of a given substring in a string

lottoking 2020. 5. 23. 09:26


How can I count the number of times a given substring is present within a string in Python?

For example:

>>> 'foo bar foo'.numberOfOccurrences('foo')
2

Use string.count(substring), like this:

>>> "abcdabcva".count("ab")
2

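Update: As pointed out in the comments, this is how to do it for non-overlapping occurrences. If you need to count overlapping occurrences, check the answers to "Python regex find all overlapping matches?" or look at the other answers below.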

s = 'arunununghhjj'
sb = 'nun'
results = 0
sub_len = len(sb)
# Compare the window starting at every index, so overlapping matches are counted
for i in range(len(s)):
    if s[i:i+sub_len] == sb:
        results += 1
print(results)

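Depending on what you actually mean, I'd suggest one of the following:

You mean a list of space-separated substrings and want to know the position of a given substring among them:
```
>>> s = 'sub1 sub2 sub3'
>>> s.split().index('sub2')
1
```
You mean the character position of the substring within the string:
```
>>> s.find('sub2')
5
```
You mean the (non-overlapping) number of occurrences of the substring:
```
>>> s.count('sub2')
1
>>> s.count('sub')
3
```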
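The three readings above can be tied together in one helper. The sketch below is only an illustration: the name find_all and its overlapping flag are hypothetical and not part of the standard library. It calls str.find repeatedly with a start offset and returns every match position. Advancing by len(sub) reproduces the non-overlapping behaviour of str.count, and advancing by 1 also reports overlapping matches.

```python
def find_all(s: str, sub: str, overlapping: bool = False) -> list[int]:
    """Return the start index of every occurrence of sub in s (hypothetical helper)."""
    if not sub:
        return []  # treat an empty substring as having no occurrences
    positions = []
    start = s.find(sub)
    while start != -1:
        positions.append(start)
        # Skip the whole match for a non-overlapping search, or move one character otherwise
        step = 1 if overlapping else len(sub)
        start = s.find(sub, start + step)
    return positions


print(find_all('sub1 sub2 sub3', 'sub'))         # [0, 5, 10]
print(find_all('aaaa', 'aa'))                    # [0, 2]
print(find_all('aaaa', 'aa', overlapping=True))  # [0, 1, 2]
```

For a non-empty sub, len(find_all(s, sub)) should equal s.count(sub).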

You can count the frequency in two ways:

Using count() on a str:

a.count(b)
Or you can use:

len(a.split(b))-1

where a is the string and b is the substring whose frequency you want to count.
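You can check that the two expressions agree by comparing them on a few inputs. This sketch assumes b is non-empty. With an empty separator, str.split raises ValueError, while str.count('') returns len(a) + 1. The helper name count_via_split is hypothetical.

```python
def count_via_split(a: str, b: str) -> int:
    # Splitting on b yields one more piece than there are non-overlapping occurrences of b
    return len(a.split(b)) - 1


for a, b in [('abcdabcva', 'ab'), ('foo bar foo', 'foo'), ('aaaa', 'aa'), ('xyz', 'q')]:
    assert count_via_split(a, b) == a.count(b)

try:
    count_via_split('abc', '')
except ValueError:
    print('str.split rejects an empty separator')

print('abc'.count(''))  # 4: count treats the empty string as occurring at every position
```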

To find overlapping occurrences of a substring in a string in Python 3, this algorithm works:

def count_substring(string, sub_string):
    l = len(sub_string)
    count = 0
    # Slide a window of length l across the string; overlapping matches are counted
    for i in range(len(string) - l + 1):
        if string[i:i+l] == sub_string:
            count += 1
    return count

I tested this algorithm myself and it works.

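The best way to find overlapping substrings in a given string is a Python regular expression. A lookahead pattern lets the re module find all overlapping matches. Put the substring in the pattern (the first argument) and the string to search in the second argument:

import re
print(len(re.findall('(?=aa)', 'caaaab')))
3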
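The lookahead trick can also return where each match starts, not only how many there are. This is a sketch, and the helper name overlapping_positions is hypothetical. It wraps the substring in re.escape so characters such as '.' or '*' match literally. Without escaping, the substring would be interpreted as a regex.

```python
import re


def overlapping_positions(text: str, sub: str) -> list[int]:
    # (?=...) is a zero-width lookahead: it matches at a position without consuming
    # characters, so a new match can start at every index, overlaps included.
    pattern = '(?=' + re.escape(sub) + ')'
    return [m.start() for m in re.finditer(pattern, text)]


print(overlapping_positions('caaaab', 'aa'))       # [1, 2, 3]
print(len(overlapping_positions('caaaab', 'aa')))  # 3
```

An empty sub matches at every index from 0 to len(text), so guard against it if that matters.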

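The current top answer, which uses the count method, doesn't count overlapping occurrences, and it doesn't handle empty substrings sensibly either. For example:

>>> a = 'caatatab'
>>> b = 'ata'
>>> print(a.count(b)) #overlapping
1
>>> print(a.count('')) #empty string
9

If overlapping substrings are taken into account, the first answer should be 2, not 1. For the second, it would be better for an empty substring to return 0.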

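The following code handles both cases:

def num_of_patterns(astr, pattern):
    # Note: leading/trailing whitespace is stripped from both inputs
    astr, pattern = astr.strip(), pattern.strip()
    if pattern == '':
        return 0

    ind, count, start_flag = 0, 0, 0
    while True:
        try:
            if start_flag == 0:
                # First match anywhere in the string
                ind = astr.index(pattern)
                start_flag = 1
            else:
                # Next match starting one character after the previous one (allows overlap)
                ind += 1 + astr[ind+1:].index(pattern)
            count += 1
        except ValueError:
            # str.index raises ValueError when no further match exists
            break
    return count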

Now when you run it:

>>> num_of_patterns('caatatab', 'ata') #overlapping
2
>>> num_of_patterns('caatatab', '') #empty string
0
>>> num_of_patterns('abcdabcva','ab') #normal
2

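The question isn't very clear, but I'll answer what you are asking, taken literally.

Consider a string S that is L characters long, where S[1] is its first character and S[L] is its last. S has the following substrings:

1. The null string ''. There is exactly one of these.
2. For every value A from 1 to L and every value B from A to L, the string S[A]..S[B] (inclusive). There are L + (L-1) + (L-2) + ... + 1 of these, for a total of 0.5 * L * (L + 1).
Note that the second item includes S[1]..S[L], i.e. the entire original string S.

So a string of length L contains 0.5 * L * (L + 1) + 1 substrings. This counts substrings by position, so identical substrings at different positions are counted separately. Render that expression in Python and you have the number of substrings in the string.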
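You can check the formula by brute force. The sketch below uses Python's 0-based, half-open slices s[a:b] in place of the 1-based S[A]..S[B] notation above. The helper names are hypothetical. It also shows that the formula counts substrings by position, not distinct substrings.

```python
def substring_count_formula(s: str) -> int:
    # 0.5 * L * (L + 1) + 1, written with integer arithmetic
    n = len(s)
    return n * (n + 1) // 2 + 1


def all_substrings(s: str) -> list[str]:
    # The empty string plus s[a:b] for every 0 <= a < b <= len(s)
    subs = ['']
    for a in range(len(s)):
        for b in range(a + 1, len(s) + 1):
            subs.append(s[a:b])
    return subs


for s in ['', 'a', 'abc', 'aaa']:
    assert len(all_substrings(s)) == substring_count_formula(s)

print(substring_count_formula('abc'))   # 7
print(len(set(all_substrings('aaa'))))  # 4 distinct: '', 'a', 'aa', 'aaa'
```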

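One way is to use re.subn. For example, to count the occurrences of 'hello' in any mix of upper and lower case, do the following:

import re
astring = 'Hello there, hello again, HELLO!'  # example input
_, count = re.subn(r'hello', '', astring, flags=re.I)
print('Found', count, 'occurrences of "hello"')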
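re.subn returns a tuple (new_string, number_of_substitutions), so the answer above keeps only the count. The sketch below compares it with len(re.findall(...)), which gives the same non-overlapping count without building the substituted string. It also shows why re.escape matters when the needle contains regex metacharacters. The sample strings are made up for illustration.

```python
import re

text = 'Hello there, hello again, HELLO!'
_, n_subn = re.subn(r'hello', '', text, flags=re.I)     # count of replacements made
n_findall = len(re.findall(r'hello', text, flags=re.I))  # count of non-overlapping matches
print(n_subn, n_findall)  # 3 3

needle = 'a.b'
# Unescaped, '.' matches any character, so 'axb' is counted as well
print(len(re.findall(needle, 'a.b axb')))             # 2
print(len(re.findall(re.escape(needle), 'a.b axb')))  # 1
```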

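I'll keep the accepted answer as the "simple and obvious way to do it", but it doesn't cover overlapping occurrences. Those can be counted naively by checking slices repeatedly, as in sum("GCAAAAAGH"[i:].startswith("AAA") for i in range(len("GCAAAAAGH"))), which yields 3.

It can also be done with a regular-expression trick, as shown in "Python regex find all overlapping matches?", which also makes for fine code golfing. Below is my "handmade" count of overlapping occurrences of a pattern in a string. It tries not to be extremely naive; at least it doesn't create new string objects on each iteration: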

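from array import array

def find_matches_overlapping(text, pattern):
    if not pattern:
        return []  # nothing sensible to match for an empty pattern
    lpat = len(pattern) - 1
    matches = []
    # array("u", ...) stores Unicode characters; the "u" typecode is deprecated in recent Python versions
    text = array("u", text)
    pattern = array("u", pattern)
    indexes = {}  # start index of each partial match -> index of the last matched pattern character
    # Scan every character so that matches ending at the last character are found too
    for i in range(len(text)):
        if text[i] == pattern[0]:
            indexes[i] = -1  # a new candidate match may start here
        for index, counter in list(indexes.items()):
            counter += 1
            if text[i] == pattern[counter]:
                if counter == lpat:
                    matches.append(index)  # full match found
                    del indexes[index]
                else:
                    indexes[index] = counter
            else:
                del indexes[index]  # candidate failed
    return matches

def count_matches(text, pattern):
    return len(find_matches_overlapping(text, pattern))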
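For a linear-time overlapping count that doesn't use regular expressions, the Knuth-Morris-Pratt (KMP) algorithm is a standard option. It is a different technique from the hand-rolled matcher above. This is a minimal sketch, and the function name kmp_count_overlapping is hypothetical. After each full match, it falls back through the failure table instead of restarting, so overlapping matches are counted.

```python
def kmp_count_overlapping(text: str, pattern: str) -> int:
    """Count (possibly overlapping) occurrences of pattern in text with KMP."""
    if not pattern:
        return 0
    # failure[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    count = 0
    k = 0  # number of pattern characters currently matched
    for ch in text:
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            count += 1
            k = failure[k - 1]  # keep the longest border so overlapping matches are found
    return count


print(kmp_count_overlapping('caatatab', 'ata'))   # 2
print(kmp_count_overlapping('GCAAAAAGH', 'AAA'))  # 3
```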

Overlapping occurrences:

def olpcount(string, pattern, case_sensitive=True):
    # Optionally ignore case by lowercasing both inputs
    if not case_sensitive:
        string = string.lower()
        pattern = pattern.lower()
    l = len(pattern)
    ct = 0
    # Check the window at every position, so overlapping matches are counted
    for c in range(0, len(string)):
        if string[c:c+l] == pattern:
            ct += 1
    return ct

test = 'my maaather lies over the oceaaan'
print(test)
print(olpcount(test, 'a'))
print(olpcount(test, 'aa'))
print(olpcount(test, 'aaa'))

Result:

my maaather lies over the oceaaan
6
4
2
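For case-insensitive matching, Python documents str.casefold() as intended for caseless comparison. It can match strings that str.lower() does not. The sketch below is a variant of olpcount under that assumption, and the name olpcount_casefold is hypothetical.

```python
def olpcount_casefold(string: str, pattern: str) -> int:
    # Same overlapping count as olpcount, but normalizing both inputs with casefold()
    string, pattern = string.casefold(), pattern.casefold()
    size = len(pattern)
    return sum(1 for i in range(len(string) - size + 1) if string[i:i + size] == pattern)


text = 'Straße STRASSE'
print(olpcount_casefold(text, 'strasse'))  # 2: casefold() maps 'ß' to 'ss'
print(text.lower().count('strasse'))       # 1: lower() keeps 'ß'
```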

For overlapping counts you can use:

def count_substring(string, sub_string):
    count = 0
    beg = string.find(sub_string)
    # After each match, resume the search one character later (allows overlap)
    while beg != -1:
        count += 1
        beg = string.find(sub_string, beg + 1)
    return count

For non-overlapping counts you can use the count() function:

string.count(sub_string)
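str.count also accepts optional start and end positions, interpreted like slice bounds. This lets you count within part of a string without slicing it yourself. A short illustration with made-up inputs:

```python
s = 'abcabcabc'
print(s.count('abc'))        # 3
print(s.count('abc', 1))     # 2: only s[1:] is searched
print(s.count('abc', 0, 6))  # 2: only s[0:6] is searched
print('aaaa'.count('aa'))    # 2: matches never overlap
```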

To find the number of occurrences of a substring inside a string, use the code below. The code is easy to understand, so I skipped the comments. :)

string = input()
sub_string = input()
start = 0
answer = 0
length = len(string)
index = string.find(sub_string, start, length)
while index != -1:
    start = index + 1
    answer = answer + 1
    index = string.find(sub_string, start, length)
print(answer)

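How about a one-liner with a list comprehension? Technically it's 93 characters long, so spare me the PEP-8 purism. The regex.findall answer is the most readable if this is high-level code. If you're building something low-level and don't want dependencies, this one is pretty lean and mean. I'm giving the overlapping answer; if there's no overlap, just use count like the highest-scoring answer.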

def count_substring(string, sub_string):
    return len([i for i in range(len(string)) if string[i:i+len(sub_string)] == sub_string])
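A variation on the same one-liner avoids building a slice at every position by using the optional start argument of str.startswith. It also uses sum() over a generator instead of materializing a list. This is a sketch, and the name count_substring_startswith is hypothetical. Like the original, it counts overlapping matches. An empty sub_string is counted len(string) + 1 times.

```python
def count_substring_startswith(string: str, sub_string: str) -> int:
    # Only positions where sub_string could still fit are checked
    last = len(string) - len(sub_string)
    return sum(1 for i in range(last + 1) if string.startswith(sub_string, i))


print(count_substring_startswith('caatatab', 'ata'))  # 2
```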

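Scenario 1: occurrences of a word in a sentence. For example, take str1 = "This is an example and is easy" and count the occurrences of the word "is". Let str2 = "is":

count = str1.count(str2)

Note that str.count matches substrings, not whole words. The "is" inside "This" is counted too, so count is 3 here.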
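To count whole words only, compare whitespace-separated tokens or use a regex word boundary. A sketch on the sentence from Scenario 1:

```python
import re

str1 = 'This is an example and is easy'
print(str1.count('is'))                  # 3: also matches the 'is' inside 'This'
print(str1.split().count('is'))          # 2: compares whole whitespace-separated tokens
print(len(re.findall(r'\bis\b', str1)))  # 2: \b requires a word boundary on both sides
```

The split() version treats punctuation as part of a word (so "is," would not match), whereas \b does not.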

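Scenario 2: occurrences of a pattern in a sentence.

string = "ABCDCDC"
substring = "CDC"

def count_substring(string, sub_string):
    len1 = len(string)
    len2 = len(sub_string)
    j = 0
    counter = 0
    # Check every position; compare the full slice only when the first character matches
    while j < len1:
        if string[j] == sub_string[0]:
            if string[j:j+len2] == sub_string:
                counter += 1
        j += 1
    return counter

print(count_substring(string, substring))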

Thanks!

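I'm not sure whether this has been covered already, but I came up with this as a solution for a 'disposable' word:

word = 'bananas'  # example input: the word being searched
term = 'ana'      # example input: the term being looked for
count = 0
for i in range(len(word)):
    # Compare the front of the remaining word with term, then drop its first character
    if word[:len(term)] == term:
        count += 1
    word = word[1:]

print(count)

Here word is the word being searched and term is the term you're looking for.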

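string = "abc"
mainstr = "ncnabckjdjkabcxcxccccxcxcabc"
count = 0
# Only consider start positions where the whole of string still fits inside mainstr
for i in range(0, len(mainstr) - len(string) + 1):
    k = 0
    # Advance k while the characters keep matching
    while k < len(string):
        if string[k] == mainstr[i+k]:
            k += 1
        else:
            break
    if k == len(string):
        count += 1
print(count)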

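import re
string = 'abcXabcXabc'  # example input
searching = 'abc'       # example substring
# re.escape treats the substring literally; finditer yields non-overlapping matches
d = [m.start() for m in re.finditer(re.escape(searching), string)]
print(len(d))
print(d)

This finds how many times the substring occurs in the string and shows the index of each occurrence.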

my_string = """Strings are amongst the most popular data types in Python.
               We can create the strings by enclosing characters in quotes.
               Python treats single quotes the same as double quotes."""

# split() with no argument splits on any run of whitespace, including newlines
words = my_string.lower().split()
count_string = words.count("string")
count_strings = words.count("strings")
print("The number of occurrences of the word String is:", count_string)
print("The number of occurrences of the word Strings is:", count_strings)

I risk a downvote because two or more people have already posted this solution, and I even upvoted one of them. But mine is probably the easiest for beginners to understand.

def count_substring(string, sub_string):
    slen  = len(string)
    sslen = len(sub_string)
    range_s = slen - sslen + 1
    count = 0
    for i in range(range_s):
        if (string[i:i+sslen] == sub_string):
            count += 1
    return count

For simple space-delimited strings, using a dict is quite fast. See the code below:

def getStringCount(mnstr: str, sbstr: str = '') -> int:
    """Takes two strings, the text and the substring to look for,
    and returns the number of occurrences of the substring
    among the space-separated words of the text.
    """
    sbstr = sbstr.strip()
    x = dict()
    x[sbstr] = 0
    for st in mnstr.split(' '):
        if st != sbstr:
            continue
        x[st] += 1  # the key was initialized above, so no KeyError can occur
    return x[sbstr]

s = 'foo bar foo test one two three foo bar'
print(getStringCount(s, 'foo'))
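The standard library already provides a dict specialized for counting, collections.Counter. This sketch counts every word in one pass. A missing key reads as 0 instead of raising KeyError.

```python
from collections import Counter

s = 'foo bar foo test one two three foo bar'
counts = Counter(s.split())   # maps each whitespace-separated word to its frequency
print(counts['foo'])          # 3
print(counts['baz'])          # 0
print(counts.most_common(2))  # [('foo', 3), ('bar', 2)]
```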

You can use the startswith method:

def count_substring(string, sub_string):
    x = 0
    for i in range(len(string)):
        if string[i:].startswith(sub_string):
            x += 1
    return x

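The logic below works for any string, including special characters. It removes all whitespace from both strings before counting, so "the sky" also matches "thesky":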

def cnt_substr(inp_str, sub_str):
    inp_join_str = ''.join(inp_str.split())
    sub_join_str = ''.join(sub_str.split())

    return inp_join_str.count(sub_join_str)

print(cnt_substr("the sky is   $blue and not greenthe sky is   $blue and not green", "the sky"))

Here is a case-insensitive solution in Python 3:

s = 'foo bar foo'.upper()
sb = 'foo'.upper()
results = 0
sub_len = len(sb)
for i in range(len(s)):
    if s[i:i+sub_len] == sb:
        results += 1
print(results)

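def count_substring(string, sub_string):  # header reconstructed; the function name is an assumption
    count = 0
    i = 0
    j = 0
    while i < len(string):
        # i and j advance together, so this is the window string[i:i+len(sub_string)]
        sub_string_out = string[i:len(sub_string)+j]
        if sub_string == sub_string_out:
            count += 1
        i += 1
        j += 1
    return count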

#counting occurrences of a substring in another string (overlapping)
s = input('enter the main string: ')  # e.g. 'bobazcbobobegbobobgbobobhaklpbobawanbobobobob'
p = input('enter the substring: ')    # e.g. 'bob'

counter = 0

for i in range(len(s)-len(p)+1):
    c = 0  # number of characters of p matched at start position i
    for j in range(len(p)):
        if s[i+j] == p[j]:
            c = c + 1
        else:
            break
    if c == len(p):
        counter += 1
print('number of occurrences of the substring in the main string is:', counter)

s = input('enter the main string: ')
p=input('enter the substring: ')
l=[]
for i in range(len(s)):
    l.append(s[i:i+len(p)])
print(l.count(p))

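If you want to count all occurrences of the substring, including overlapping ones, use this method:

import re
def count_substring(string, sub_string):
    # Zero-width lookahead: one match at every index where sub_string starts, overlaps included
    regex = '(?=' + re.escape(sub_string) + ')'
    return len(re.findall(regex, string))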

def num_occ(str1, str2):
    l1, l2 = len(str1), len(str2)
    return len([str1[i:i + l2] for i in range(l1 - l2 + 1) if str1[i:i + l2] == str2])

Reference URL: https://stackoverflow.com/questions/8899905/count-number-of-occurrences-of-a-given-substring-in-a-string
