Instead of using a bare `while True` loop, which pins a CPU core, let the transcribing function wait efficiently inside an asynchronous context.
**Refactored version:**
```py
# Azure Speech-to-Text Conversation Transcriber
# Assumes speechsdk is imported and speech_config, microphone_audio_config and
# speaker_audio_config are set up as in the full example further down.
import asyncio
import threading
import time

def transcribing(evt, name):
    print(f"{name} transcribing: {evt.result.text}")

def transcribed(evt, name):
    print(f"{name} transcribed: {evt.result.text}")

async def start_recognition(audio_config, speech_config, name, stop_event):
    transcriber = speechsdk.transcription.ConversationTranscriber(speech_config=speech_config, audio_config=audio_config)
    transcriber.transcribed.connect(lambda evt: transcribed(evt, name))
    transcriber.transcribing.connect(lambda evt: transcribing(evt, name))
    # The SDK's *_async() calls return a ResultFuture, not an awaitable; .get() waits for completion
    transcriber.start_transcribing_async().get()
    print(f"{name} started!")
    while not stop_event.is_set():
        await asyncio.sleep(0.1)  # Non-blocking wait
    transcriber.stop_transcribing_async().get()
    print(f"{name} stopped!")

def run_recognition_thread(audio_config, speech_config, name, stop_event):
    # Each thread runs its own event loop
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(start_recognition(audio_config, speech_config, name, stop_event))

# Event to signal the threads to stop
stop_event = threading.Event()

# Individual threads for each transcriber
microphone_thread = threading.Thread(target=run_recognition_thread, args=(microphone_audio_config, speech_config, "Microphone", stop_event))
speaker_thread = threading.Thread(target=run_recognition_thread, args=(speaker_audio_config, speech_config, "Speaker", stop_event))

# Start threads
microphone_thread.start()
speaker_thread.start()

try:
    # Main thread waits without spinning and exits if either worker dies
    while microphone_thread.is_alive() and speaker_thread.is_alive():
        time.sleep(1)
except KeyboardInterrupt:
    pass
finally:
    stop_event.set()
    # Join threads to ensure clean exit
    microphone_thread.join()
    speaker_thread.join()
```
- In the above, each worker coroutine waits on `asyncio.sleep(0.1)` instead of spinning, and the main thread polls with `time.sleep(1)` and shuts everything down if either transcriber thread stops; a simpler single-thread variant is sketched below.
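Since `start_transcribing_async()` returns a future immediately and the SDK raises its events on its own worker threads, you may not need Python threads or event loops at all. Here is a minimal single-thread sketch, assuming the `speech_config` and the two audio configs from the full code further down:

```py
import time
import azure.cognitiveservices.speech as speechsdk

def start_transcriber(audio_config, speech_config, name):
    # ConversationTranscriber delivers events from the SDK's own worker threads
    transcriber = speechsdk.transcription.ConversationTranscriber(
        speech_config=speech_config, audio_config=audio_config)
    transcriber.transcribed.connect(
        lambda evt: print(f"{name} transcribed: {evt.result.text}"))
    # start_transcribing_async() returns a ResultFuture; .get() waits until it has started
    transcriber.start_transcribing_async().get()
    return transcriber

# speech_config, microphone_audio_config and speaker_audio_config are assumed
# to be set up exactly as in the full example further down
transcribers = [
    start_transcriber(microphone_audio_config, speech_config, "Microphone"),
    start_transcriber(speaker_audio_config, speech_config, "Speaker"),
]

try:
    while True:
        time.sleep(1)  # idle; the SDK keeps transcribing in the background
except KeyboardInterrupt:
    for t in transcribers:
        t.stop_transcribing_async().get()
```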
Alternatively, you could try [OtterPilot](https://otter.ai/welcome/otter_assistant_questionnaire) to record the audio from both devices.
This is a "sorta" answer, still not perfect.
It turns out you can [specify an input device][1] for audio configs and use that for the transcriber: you need the "audio device endpoint ID string [...] from the IMMDevice object".
This took a while to find, but I came across this [code][2] (credit to @Damien) that finds exactly that string:
```python
import subprocess

# List all connected audio endpoint devices via pnputil
sd = subprocess.run(
    ["pnputil", "/enum-devices", "/connected", "/class", "AudioEndpoint"],
    capture_output=True,
    text=True,
)
output = sd.stdout.split("\n")[1:-1]

def getDevices(devices):
    # Map each device's description (the line after "Instance ID:") to the
    # endpoint ID string at the end of its Instance ID
    deviceList = {}
    for device in range(len(devices)):
        if "Instance ID:" in devices[device]:
            deviceList[devices[device + 1].split(":")[-1].strip()] = devices[device].split("\\")[-1].strip()
    return deviceList

print(getDevices(output))
```
Unfortunately, it doesn't seem like I can use my headset speaker with Azure (I just don't get any transcriptions), but I am able to use the Stereo Mix, which, after a lot of finicking, does allow me to transcribe the speaker output. However, the transcription is very slow and inaccurate.
I am attaching relevant code for reference:
```python
import azure.cognitiveservices.speech as speechsdk
from dotenv import load_dotenv
import os
import threading

# Store credentials in .env file, initialize speech_config
load_dotenv()
audio_key = os.getenv("audio_key")
audio_region = os.getenv("audio_region")
speech_config = speechsdk.SpeechConfig(subscription=audio_key, region=audio_region)
speech_config.speech_recognition_language = "en-US"

# Endpoint strings found using aforementioned code
mic = "{0.0.1.00000000}.{6dd64d0d-e876-4f3f-b1fe-464843289599}"
stereo_mix = "{0.0.1.00000000}.{c4c4d95c-5bd1-4f09-a07e-ad3a96c381f0}"

# Initialize audio_config as shown in Azure documentation
microphone_audio_config = speechsdk.audio.AudioConfig(device_name=mic)
speaker_audio_config = speechsdk.audio.AudioConfig(device_name=stereo_mix)

# Azure Speech-to-Text Conversation Transcriber
def transcribing(evt, name):
    print(f"{name} transcribing!")

def transcribed(evt, name):
    print(f"{name} transcribed!")

# Function to start Azure speech recognition
def start_recognition(audio_config, speech_config, name):
    transcriber = speechsdk.transcription.ConversationTranscriber(speech_config=speech_config, audio_config=audio_config)
    transcriber.transcribed.connect(lambda evt: transcribed(evt, name))
    transcriber.transcribing.connect(lambda evt: transcribing(evt, name))
    transcriber.start_transcribing_async()
    print(f"{name} started!")
    # Infinite Loop to continue transcription
    while True:
        pass

# Individual threads for each transcriber
threading.Thread(target=start_recognition, args=(microphone_audio_config, speech_config, "Microphone",)).start()
threading.Thread(target=start_recognition, args=(speaker_audio_config, speech_config, "Speaker",)).start()
```
Note: this is a trimmed version of my code, so it's possible I accidentally removed something that is in fact necessary, but the main idea is there. Feel free to ask questions if something doesn't work.
I didn't have to use the SoundCard library or create any custom audio stream classes (which I found very painful). I'm going to open another question about the slowness, though I suspect it is due to the threads rather than the Azure service itself.
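If the busy `while True: pass` loop is indeed the bottleneck (it keeps both worker threads spinning and competing for the GIL), a minimal change is to block each worker on a shared `threading.Event` instead. This is only a sketch against the attached code; `stop_event` is a name I'm introducing here, and the imports, configs and callbacks are assumed to stay the same:

```python
import threading

stop_event = threading.Event()  # hypothetical shared signal; set it to end transcription

def start_recognition(audio_config, speech_config, name):
    transcriber = speechsdk.transcription.ConversationTranscriber(
        speech_config=speech_config, audio_config=audio_config)
    transcriber.transcribed.connect(lambda evt: transcribed(evt, name))
    transcriber.transcribing.connect(lambda evt: transcribing(evt, name))
    transcriber.start_transcribing_async().get()
    print(f"{name} started!")
    stop_event.wait()  # blocks without burning CPU, unlike `while True: pass`
    transcriber.stop_transcribing_async().get()
    print(f"{name} stopped!")
```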
[1]: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-select-audio-input-devices
[2]: https://stackoverflow.com/questions/72413426/get-audio-device-guid-in-python