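This post walks through basic STT (speech-to-text) code built on the SpeechRecognition package. It focuses on the Google API, which is the package's default recognizer, and on Vosk and Whisper, which support Korean and can run offline.

1. Development environment setup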
    pip3 install SpeechRecognition

    # for MAC
    brew install portaudio
    pip3 install pyaudio

    # for Ubuntu
    sudo apt-get install python-pyaudio python3-pyaudio
    sudo apt-get install portaudio19-dev python-all-dev python3-all-dev
    sudo pip install pyaudio

    # for api
    python3 -m pip install vosk
    python3 -m pip install git+https://github.com/openai/whisper.git soundfile
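After running the install commands above, it can help to confirm that each package is importable before touching the microphone. The following is a minimal sketch using only the standard library; the helper name `check_stt_dependencies` and the `STT_MODULES` list are illustrative and not part of any of the packages. Note that the import name can differ from the pip name (SpeechRecognition is imported as `speech_recognition`).

```python
import importlib.util

# Import names of the packages installed above (assumed list for this post).
STT_MODULES = ["speech_recognition", "pyaudio", "vosk", "whisper", "soundfile"]


def check_stt_dependencies(modules=STT_MODULES):
    """Return {module_name: bool}, True when the module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_stt_dependencies().items():
        print(f"{name:20s} {'OK' if found else 'MISSING'}")
```

A module reported as MISSING usually means the matching install command above has not been run in the active Python environment.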

2. Example code
1. Google API
- Code
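    import speech_recognition as sr

    r = sr.Recognizer()

    # Record from the default microphone: wait up to 10 s for speech to start,
    # and capture at most 10 s of it.
    with sr.Microphone() as source:
        print('listening...')
        audio = r.listen(source, timeout=10, phrase_time_limit=10)
        print("......")

    try:
        # Send the audio to the Google Web Speech API (requires a network connection).
        text = r.recognize_google(audio, language='ko')
        print(text)
    except sr.UnknownValueError:
        # The speech could not be understood.
        print("Recognizer Failed..")
    except sr.RequestError as e:
        # The request to the API failed.
        print("Request Failed...", e)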

2. Vosk
- Environment setup
  - Download a model
  - After downloading the model file, create a model folder under the project folder and extract the archive into it.
- Code
    import speech_recognition as sr

    r = sr.Recognizer()

    with sr.Microphone() as source:
        print('listening...')
        audio = r.listen(source, timeout=10, phrase_time_limit=10)
        print("......")

    try:
        # Uses the model extracted into the model folder of the project.
        # The result is a JSON string; the recognized text is under the "text" key.
        text = r.recognize_vosk(audio, language='ko')
        print(text)
    except sr.UnknownValueError:
        print("Recognizer Failed..")
    except sr.RequestError as e:
        print("Request Failed...", e)
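The model setup step for Vosk (download, create a model folder, extract) can also be scripted. The sketch below uses only the standard library and assumes the downloaded archive is a zip file whose contents are wrapped in a single top-level folder; if that is not the case, the files are copied as they are. The function name `prepare_vosk_model` and the file name `downloaded-model.zip` are hypothetical.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def prepare_vosk_model(zip_path, dest="model"):
    """Extract a model archive so that its files sit directly inside `dest`."""
    dest = Path(dest)
    if dest.exists():
        raise FileExistsError(f"{dest} already exists; remove it first")

    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        entries = list(Path(tmp).iterdir())
        # Assumption: the archive wraps everything in one top-level folder.
        # If so, unwrap it; otherwise copy the extracted files unchanged.
        if len(entries) == 1 and entries[0].is_dir():
            source = entries[0]
        else:
            source = Path(tmp)
        shutil.copytree(source, dest)
    return dest


# Example (hypothetical file name), run from the project folder:
# prepare_vosk_model("downloaded-model.zip")
```

Refusing to overwrite an existing folder is a deliberate choice here, so that a working model is not silently replaced.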

3. whisper
- Code
    import speech_recognition as sr

    r = sr.Recognizer()

    with sr.Microphone() as source:
        print('listening...')
        audio = r.listen(source, timeout=10, phrase_time_limit=10)
        print("......")

    try:
        # Runs locally through the whisper package installed above.
        text = r.recognize_whisper(audio, language='ko')
        print(text)
    except sr.UnknownValueError:
        print("Recognizer Failed..")
    except sr.RequestError as e:
        print("Request Failed...", e)

3. Test
- Test file
  - 안녕하세요. 이것은 테스트 문장입니다. ("Hello. This is a test sentence.")
- Code
    import json

    import speech_recognition as sr

    r = sr.Recognizer()

    with sr.Microphone() as source:
        print('listening...')
        audio = r.listen(source, timeout=10, phrase_time_limit=10)
        print("......")

    try:
        # show_all=True returns the full response; take the top alternative if there is one.
        result_google = r.recognize_google(audio, language='ko', show_all=True)
        if result_google and 'alternative' in result_google:
            text_google = result_google['alternative'][0]['transcript']
        else:
            text_google = ""

        # Vosk returns a JSON string; the transcript is under the "text" key.
        text_vosk = r.recognize_vosk(audio, language='ko')
        text_vosk = json.loads(text_vosk)['text']

        text_whisper = r.recognize_whisper(audio, language='ko')

        print("[Google]", text_google)
        print("[Vosk]", text_vosk)
        print("[whisper]", text_whisper)
    except sr.UnknownValueError:
        print("Recognizer Failed..")
    except sr.RequestError as e:
        print("Request Failed...", e)
- Recognition results
    [Google] 안녕하세요 이것은 테스트 문장입니다
    [Vosk] … 테스트 … 입니다
    [whisper] 안녕하세요 이것은 테스트 문장입니다
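To compare recognizers with a number instead of by eye, each transcript can be scored against the test sentence. The sketch below computes a character error rate (edit distance divided by reference length) after dropping punctuation and whitespace. This metric is an addition for illustration and is not something the SpeechRecognition package provides; the function names are hypothetical.

```python
import unicodedata


def normalize(text):
    """Keep only letters and digits, so punctuation and spacing are ignored."""
    return "".join(ch for ch in text if unicodedata.category(ch)[0] in ("L", "N"))


def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def char_error_rate(reference, hypothesis):
    """Edit distance between normalized strings, relative to the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref, hyp) / len(ref)


reference = "안녕하세요. 이것은 테스트 문장입니다."
hypothesis = "안녕하세요 이것은 테스트 문장입니다"  # the [Google] and [whisper] output above
print(char_error_rate(reference, hypothesis))  # 0.0
```

A score of 0.0 means the transcript matches the reference exactly once punctuation and spaces are ignored; larger values mean more character-level errors.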