CopyPastor

Instead of using a `while True` loop that can cause high CPU usage, make sure the transcribing function operates efficiently within an asynchronous context.
**Refactored version:**

```py
import asyncio
import threading
import time

import azure.cognitiveservices.speech as speechsdk

# speech_config, microphone_audio_config and speaker_audio_config are
# assumed to be created beforehand (they are not defined in this snippet).

# Azure Speech-to-Text Conversation Transcriber callbacks
def transcribing(evt, name):
    print(f"{name} transcribing: {evt.result.text}")

def transcribed(evt, name):
    print(f"{name} transcribed: {evt.result.text}")

async def start_recognition(audio_config, speech_config, name, stop_event):
    transcriber = speechsdk.transcription.ConversationTranscriber(
        speech_config=speech_config, audio_config=audio_config
    )
    transcriber.transcribed.connect(lambda evt: transcribed(evt, name))
    transcriber.transcribing.connect(lambda evt: transcribing(evt, name))

    # The SDK's *_async methods return a ResultFuture, not an awaitable,
    # so wait for completion with .get() instead of `await`.
    transcriber.start_transcribing_async().get()
    print(f"{name} started!")

    while not stop_event.is_set():
        await asyncio.sleep(0.1)  # Non-blocking wait

    transcriber.stop_transcribing_async().get()
    print(f"{name} stopped!")

def run_recognition_thread(audio_config, speech_config, name, stop_event):
    # Each thread gets its own event loop
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(start_recognition(audio_config, speech_config, name, stop_event))

# Event to signal the threads to stop
stop_event = threading.Event()

# Individual threads for each transcriber
microphone_thread = threading.Thread(target=run_recognition_thread, args=(microphone_audio_config, speech_config, "Microphone", stop_event))
speaker_thread = threading.Thread(target=run_recognition_thread, args=(speaker_audio_config, speech_config, "Speaker", stop_event))

# Start threads
microphone_thread.start()
speaker_thread.start()

try:
    # Main thread sleeps between checks instead of spinning
    while microphone_thread.is_alive() and speaker_thread.is_alive():
        time.sleep(1)
except KeyboardInterrupt:
    pass
finally:
    # Always signal the remaining thread(s) so join() cannot hang
    stop_event.set()

# Join threads to ensure clean exit
microphone_thread.join()
speaker_thread.join()
```

- In the above, the main thread waits with `time.sleep(1)` and checks whether either thread has stopped. `asyncio.sleep(1)` would not work there: outside a coroutine and without `await`, it returns immediately and the loop spins.
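The stop signal used above can be tried without the Speech SDK. Here is a minimal sketch, assuming stand-in worker threads in place of the transcribers (`worker` and `run_demo` are illustrative names, not part of the answer's code). Each thread blocks on `threading.Event.wait()`, which sleeps until the event is set rather than polling.

```python
import threading
import time

def worker(name, stop_event, log):
    """Stand-in for a transcriber thread: idles until asked to stop."""
    log.append(f"{name} started")
    # Blocks without consuming CPU until stop_event.set() is called.
    stop_event.wait()
    log.append(f"{name} stopped")

def run_demo(run_seconds=0.3):
    """Start two workers, let them idle briefly, then stop and join them."""
    stop_event = threading.Event()
    log = []
    threads = [
        threading.Thread(target=worker, args=(name, stop_event, log))
        for name in ("Microphone", "Speaker")
    ]
    for t in threads:
        t.start()
    time.sleep(run_seconds)  # main thread sleeps instead of spinning
    stop_event.set()         # one event stops every worker
    for t in threads:
        t.join()
    return log

if __name__ == "__main__":
    print(run_demo())
```

Because no thread spins while waiting, the interpreter stays free for whatever callbacks the real transcriber would deliver.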
Alternatively, try [OtterPilot](https://otter.ai/welcome/otter_assistant_questionnaire?utm_source=bing_ads&utm_medium=search&utm_campaign=search-prospecting-nonbrand-transcription-exact-us-otteraichat-maxconv-9192023&utm_term=audio%20transcription%20software&msclkid=33f859ea7dfa13ae960c3f8a08974795&utm_content=lp_oaichat_aw_cta_signup&new_onboarding=ThirdPartySignupFlow&third_party=google) to record the audio from both devices.

This is a "sorta" answer, still not perfect.
Turns out, you can [specify an input device][1] for audio configs and use that for the transcriber: You need the "audio device endpoint ID string [...] from the IMMDevice object".
This took a while to find, but I came across this [code][2] (credit to @Damien) that finds exactly that string:

```python
import subprocess

sd = subprocess.run(
    ["pnputil", "/enum-devices", "/connected", "/class", "AudioEndpoint"],
    capture_output=True,
    text=True,
)

output = sd.stdout.split("\n")[1:-1]

def getDevices(devices):
    deviceList = {}
    for device in range(len(devices)):
        if "Instance ID:" in devices[device]:
            deviceList[devices[device+1].split(":")[-1].strip()] = devices[device].split("\\")[-1].strip()
    return deviceList

print(getDevices(output))
```
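To see what that parsing does without running `pnputil`, here is a small sketch that applies the same idea to a sample string. The sample text is an assumption about the output layout (an `Instance ID:` line followed by a `Device Description:` line), the device descriptions are made up, and `parse_endpoints` is an illustrative name. The two endpoint IDs are the ones quoted in this post.

```python
# Hypothetical sample of `pnputil /enum-devices /connected /class AudioEndpoint`
# output; the real layout and descriptions may differ on your machine.
SAMPLE_OUTPUT = r"""Microsoft PnP Utility

Instance ID:                SWD\MMDEVAPI\{0.0.1.00000000}.{6dd64d0d-e876-4f3f-b1fe-464843289599}
Device Description:         Microphone (Example Audio)
Class Name:                 AudioEndpoint

Instance ID:                SWD\MMDEVAPI\{0.0.1.00000000}.{c4c4d95c-5bd1-4f09-a07e-ad3a96c381f0}
Device Description:         Stereo Mix (Example Audio)
Class Name:                 AudioEndpoint
"""

def parse_endpoints(text):
    """Map each device description to its audio endpoint ID string."""
    lines = text.splitlines()
    endpoints = {}
    for i, line in enumerate(lines[:-1]):
        if "Instance ID:" in line:
            # The endpoint ID is the last backslash-separated component.
            endpoint_id = line.split("\\")[-1].strip()
            # The description sits on the following line, after the first colon.
            description = lines[i + 1].split(":", 1)[-1].strip()
            endpoints[description] = endpoint_id
    return endpoints

if __name__ == "__main__":
    for description, endpoint_id in parse_endpoints(SAMPLE_OUTPUT).items():
        print(f"{description}: {endpoint_id}")
```

Splitting the description on the first colon only (`split(":", 1)`) is a small deviation from the snippet above, chosen so that a description containing a colon is not truncated.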
Unfortunately, it doesn't seem like I can use my headset speaker with Azure (just don't get any transcriptions), but I am able to use the Stereo Mix which after a lot of finicking does allow me to transcribe the speaker output. Additionally, the transcription is very slow and inaccurate.
I am attaching relevant code for reference:

```python
import azure.cognitiveservices.speech as speechsdk
from dotenv import load_dotenv
import os
import threading

# Store credentials in .env file, initialize speech_config
load_dotenv()
audio_key = os.getenv("audio_key")
audio_region = os.getenv("audio_region")
speech_config = speechsdk.SpeechConfig(subscription=audio_key, region=audio_region)
speech_config.speech_recognition_language = "en-US"

# Endpoint strings found using aforementioned code
mic = "{0.0.1.00000000}.{6dd64d0d-e876-4f3f-b1fe-464843289599}"
stereo_mix = "{0.0.1.00000000}.{c4c4d95c-5bd1-4f09-a07e-ad3a96c381f0}"

# Initialize audio_config as shown in Azure documentation
microphone_audio_config = speechsdk.audio.AudioConfig(device_name=mic)
speaker_audio_config = speechsdk.audio.AudioConfig(device_name=stereo_mix)

# Azure Speech-to-Text Conversation Transcriber
def transcribing(evt, name):
    print(f"{name} transcribing!")

def transcribed(evt, name):
    print(f"{name} transcribed!")

# Function to start Azure speech recognition
def start_recognition(audio_config, speech_config, name):
    transcriber = speechsdk.transcription.ConversationTranscriber(speech_config=speech_config, audio_config=audio_config)
    transcriber.transcribed.connect(lambda evt: transcribed(evt, name))
    transcriber.transcribing.connect(lambda evt: transcribing(evt, name))

    transcriber.start_transcribing_async()

    print(f"{name} started!")

    # Infinite Loop to continue transcription
    while True:
        pass

# Individual threads for each transcriber
threading.Thread(target=start_recognition, args=(microphone_audio_config, speech_config, "Microphone",)).start()
threading.Thread(target=start_recognition, args=(speaker_audio_config, speech_config, "Speaker",)).start()
```

Note: This is an altered version of my code, so it is possible that there is something that I accidentally removed that is in fact necessary, but the main idea is there. Feel free to ask questions if something doesn't work.
Didn't have to use Soundcard library or create any custom Audio Stream classes (which I found very painful). I'm gonna open up another question about the slowness, though I suspect that it is due to the threads rather than the Azure service itself.
[1]: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-select-audio-input-devices
[2]: https://stackoverflow.com/questions/72413426/get-audio-device-guid-in-python
