
SpeechRecognition

Tags: Python
Created: Mar 15, 2023 05:20 PM
Last Updated: July 15, 2023
This page uses basic speech-to-text (STT) code with the SpeechRecognition package, focusing on the Google API, which is the package default, and on Vosk and Whisper, which support Korean and offline mode.

1. Development environment setup

pip3 install SpeechRecognition
# for macOS
brew install portaudio
pip3 install pyaudio
# for Ubuntu
sudo apt-get install python-pyaudio python3-pyaudio
sudo apt-get install portaudio19-dev python-all-dev python3-all-dev
sudo pip install pyaudio
# for the Vosk and Whisper recognizers
python3 -m pip install vosk
python3 -m pip install git+https://github.com/openai/whisper.git soundfile
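Before running the examples, it can help to confirm that the packages installed above are importable. The following is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical, and the listed import names (`speech_recognition`, `pyaudio`, `vosk`, `whisper`, `soundfile`) are the names these packages are normally imported under, which is worth verifying against the installed versions.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    # find_spec locates a module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the packages installed above.
    required = ["speech_recognition", "pyaudio", "vosk", "whisper", "soundfile"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All modules found.")
```

An empty result only means the modules can be located; it does not check that a microphone or the PortAudio library works.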

2. Example code

1. Google API

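  • Code
    • import speech_recognition as sr
      r = sr.Recognizer()
      # Record from the default microphone: wait up to 10 s for speech to
      # start, and capture at most 10 s of it.
      with sr.Microphone() as source:
          print('listening...')
          audio = r.listen(source, timeout=10, phrase_time_limit=10)
          print("......")
      try:
          # Send the audio to the Google API; 'ko' selects Korean.
          text = r.recognize_google(audio, language='ko')
          print(text)
      except sr.UnknownValueError:
          # The speech could not be understood.
          print("Recognizer Failed..")
      except sr.RequestError as e:
          # The request to the recognizer failed.
          print("Request Failed...", e)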
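The test code in section 3 calls `recognize_google` with `show_all=True` and then picks the first transcript out of the raw response. The sketch below isolates that step as a small function. It assumes the response is either a dict with an `alternative` list of `{"transcript": ...}` entries or an empty list when nothing was recognized; that shape is inferred from the test code and should be checked against the installed SpeechRecognition version. The function name `best_google_transcript` and the sample data are hypothetical.

```python
def best_google_transcript(response):
    """Return the first transcript from a show_all=True response, or ""."""
    # Assumption: an empty result arrives as an empty list, not a dict.
    if not isinstance(response, dict):
        return ""
    alternatives = response.get("alternative") or []
    if not alternatives:
        return ""
    return alternatives[0].get("transcript", "")


# Hand-written sample shaped like the assumed response (not real API output).
sample = {
    "alternative": [
        {"transcript": "hello world", "confidence": 0.92},
        {"transcript": "hello word"},
    ],
    "final": True,
}
print(best_google_transcript(sample))
print(repr(best_google_transcript([])))
```

Keeping this in a function avoids repeating the dictionary lookup and makes the empty-result case explicit.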

2. Vosk

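  • Code
    • import speech_recognition as sr
      r = sr.Recognizer()
      # Record from the default microphone: wait up to 10 s for speech to
      # start, and capture at most 10 s of it.
      with sr.Microphone() as source:
          print('listening...')
          audio = r.listen(source, timeout=10, phrase_time_limit=10)
          print("......")
      try:
          # Transcribe with Vosk; the result is a JSON string.
          text = r.recognize_vosk(audio, language='ko')
          print(text)
      except sr.UnknownValueError:
          # The speech could not be understood.
          print("Recognizer Failed..")
      except sr.RequestError as e:
          # The request to the recognizer failed.
          print("Request Failed...", e)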
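The Vosk example prints the raw return value, while the test code in section 3 decodes it with `json.loads` and reads the `text` field. The sketch below wraps that decoding defensively. Two points are assumptions rather than facts from this page: that `recognize_vosk` loads its model from a directory named `model` in the working directory, and that it may return a plain message instead of JSON when the model is missing. The helper names `vosk_text` and `has_vosk_model` are hypothetical.

```python
import json
import os


def vosk_text(raw):
    """Extract the "text" field from a Vosk JSON result string.

    Returns None when `raw` is not a JSON object, for example when a
    plain message was returned instead of a recognition result.
    """
    try:
        result = json.loads(raw)
    except (TypeError, ValueError):
        return None
    if not isinstance(result, dict):
        return None
    return result.get("text", "")


def has_vosk_model(path="model"):
    """Check for the model directory (the name "model" is an assumption)."""
    return os.path.isdir(path)


print(vosk_text('{"text": "hello"}'))
print(vosk_text("not json"))
```

Checking `has_vosk_model()` before recording gives an early, readable failure instead of an unexpected return value later.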

3. Whisper

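  • Code
    • import speech_recognition as sr
      r = sr.Recognizer()
      # Record from the default microphone: wait up to 10 s for speech to
      # start, and capture at most 10 s of it.
      with sr.Microphone() as source:
          print('listening...')
          audio = r.listen(source, timeout=10, phrase_time_limit=10)
          print("......")
      try:
          # Transcribe with Whisper; 'ko' selects Korean.
          text = r.recognize_whisper(audio, language='ko')
          print(text)
      except sr.UnknownValueError:
          # The speech could not be understood.
          print("Recognizer Failed..")
      except sr.RequestError as e:
          # The request to the recognizer failed.
          print("Request Failed...", e)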
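The three examples above differ only in which `recognize_*` method is called, so one way to combine them is to try the engines in order and keep the first result. This is a minimal sketch of that idea, not something the original page does. The function `first_successful` and the stand-in engines are hypothetical, and the stand-ins replace the real recognizers so the sketch runs without a microphone or network.

```python
def first_successful(audio, engines, errors=(Exception,)):
    """Try each (name, recognize) pair in order; return the first result.

    `engines` is a list of (name, callable) pairs. Each callable takes the
    audio object and returns text. Exceptions listed in `errors` mean
    "this engine failed, try the next one".
    Returns (name, text, failures); name and text are None if all fail.
    """
    failures = []
    for name, recognize in engines:
        try:
            return name, recognize(audio), failures
        except errors as exc:
            failures.append((name, exc))
    return None, None, failures


# Stand-in engines (hypothetical) used only to exercise the function.
def always_fails(audio):
    raise ConnectionError("no network")


def echo(audio):
    return f"heard: {audio}"


name, text, failures = first_successful(
    "dummy-audio", [("google", always_fails), ("whisper", echo)]
)
print(name, text, [engine for engine, _ in failures])
```

With the real package, the engines would be wrappers such as `lambda a: r.recognize_google(a, language='ko')`, and `errors` would be `(sr.UnknownValueError, sr.RequestError)`, matching the exceptions caught in the examples above.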

3. Test

  • Test file
    • 안녕하세요. 이것은 테스트 문장입니다. (Hello. This is a test sentence.)
  • Code
    • import speech_recognition as sr
      import json
      r = sr.Recognizer()
      # Record once, then pass the same audio to all three recognizers.
      with sr.Microphone() as source:
          print('listening...')
          audio = r.listen(source, timeout=10, phrase_time_limit=10)
          print("......")
      try:
          # show_all=True returns the raw response; take the top alternative, if any.
          raw_google = r.recognize_google(audio, language='ko', show_all=True)
          if raw_google and 'alternative' in raw_google:
              text_google = raw_google['alternative'][0]['transcript']
          else:
              text_google = ""
          # Vosk returns a JSON string; the transcript is in its 'text' field.
          text_vosk = r.recognize_vosk(audio, language='ko')
          text_vosk = json.loads(text_vosk)['text']
          text_whisper = r.recognize_whisper(audio, language='ko')
          print("[Google]", text_google)
          print("[Vosk]", text_vosk)
          print("[whisper]", text_whisper)
      except sr.UnknownValueError:
          print("Recognizer Failed..")
      except sr.RequestError as e:
          print("Request Failed...", e)
  • Recognition results
    • [Google] 안녕하세요 이것은 테스트 문장입니다
      [Vosk] 륜 리아의 아이버슨 테스트 문자 입니다
      [whisper] 안녕하세요 이것은 테스트 문장입니다
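The recognition results above can be compared against the test sentence with a character error rate (CER), which is the edit distance between the two strings divided by the length of the reference. This metric is not part of the original test; the sketch below is one plain-Python way to compute it. The helper names are hypothetical, and spacing and punctuation are stripped before comparing, which is a design choice rather than a standard.

```python
def normalize(text):
    """Keep only letters and digits so spacing and punctuation are ignored."""
    return "".join(ch for ch in text if ch.isalnum())


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row DP)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance / reference length."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


# Test sentence and recognizer outputs copied from this page.
reference = "안녕하세요. 이것은 테스트 문장입니다."
results = {
    "Google": "안녕하세요 이것은 테스트 문장입니다",
    "Vosk": "륜 리아의 아이버슨 테스트 문자 입니다",
    "whisper": "안녕하세요 이것은 테스트 문장입니다",
}
for engine, hypothesis in results.items():
    print(f"[{engine}] CER = {cer(reference, hypothesis):.2f}")
```

A CER of 0 means the output matches the reference exactly after normalization; a single utterance is too small a sample to rank the engines in general.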

References