Hi guys, I’ve spent a little time with chat talking through an app idea, and it came back with something like this:
Yes, it’s possible to create an app that integrates with Google Assistant to achieve your desired functionality. Here's how you can do it, broken down step by step:
1. Launching the App from Google Assistant:
App Actions: Google Assistant can already open an installed app by name, so "Hey Google, open Voice Typer" should work once the app is installed under that name. Conversational Actions built in the Actions on Google console were shut down in June 2023, so that route is no longer available. For deeper integration, Android offers App Actions.
Steps to set up App Actions:
1. Add a shortcuts.xml resource to the app and reference it from AndroidManifest.xml.
2. Declare a capability that maps a built-in intent (for example actions.intent.OPEN_APP_FEATURE) to an activity in your app.
3. When Assistant matches the intent, it launches that activity in your Android app.
2. Voice Recording and Transcription in the App:
Once your app is opened via Google Assistant, it will:
1. Record Voice Input: Capture audio with the Android AudioRecord API, which exposes raw PCM samples. Whisper works on 16 kHz mono audio, so recording in that format avoids a resampling step. MediaRecorder writes encoded files and SpeechRecognizer runs its own recognition rather than handing the audio back, so neither fits as well here.
2. Transcribe Using Whisper: Integrate OpenAI’s Whisper for local transcription. You'll need to bundle Whisper so that it runs efficiently on mobile devices (possibly using a smaller model like tiny or base converted to TensorFlow Lite).
3. Silence Detection: Implement a timeout (e.g., 4-5 seconds) to detect when the user stops speaking. Since the app records the audio itself, this can be done by comparing audio levels against a threshold. SpeechRecognizer's end-of-speech callbacks only apply to its own recognition sessions.
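The silence timeout in step 3 could be sketched roughly like this in plain Java. SilenceDetector is a made-up helper name, and the threshold value, frame size, and 16 kHz sample rate are assumptions you would tune on a real device. Each frame would come from whatever buffer the recorder fills.

```java
/** Hypothetical helper: reports end of speech after a run of quiet audio frames. */
final class SilenceDetector {
    private final double rmsThreshold; // assumed loudness cutoff on the 16-bit sample scale
    private final long timeoutMs;      // e.g. 4000 for a 4-second timeout
    private final int sampleRate;      // e.g. 16000 Hz mono
    private long silentMs = 0;         // length of the current run of silence

    SilenceDetector(double rmsThreshold, long timeoutMs, int sampleRate) {
        this.rmsThreshold = rmsThreshold;
        this.timeoutMs = timeoutMs;
        this.sampleRate = sampleRate;
    }

    /** Root-mean-square amplitude of the first `length` samples of a 16-bit PCM frame. */
    static double rms(short[] frame, int length) {
        if (length <= 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (int i = 0; i < length; i++) {
            sumSquares += (double) frame[i] * frame[i];
        }
        return Math.sqrt(sumSquares / length);
    }

    /** Feed one frame; returns true once silence has lasted at least timeoutMs. */
    boolean onFrame(short[] frame, int length) {
        long frameMs = (length * 1000L) / sampleRate;
        if (rms(frame, length) < rmsThreshold) {
            silentMs += frameMs;   // still quiet: extend the run
        } else {
            silentMs = 0;          // speech detected: start over
        }
        return silentMs >= timeoutMs;
    }
}
```

A fixed threshold is the simplest option. In a noisy room it may need to adapt to the background level, which this sketch does not attempt.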
3. Copying Text to Clipboard:
After the transcription completes, the app will:
Use the ClipboardManager API to copy the transcribed text to the clipboard:
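// Runs inside an Activity or other Context; uses android.content.ClipboardManager, ClipData, and Context.
ClipboardManager clipboard = (ClipboardManager) getSystemService(Context.CLIPBOARD_SERVICE);
// The first argument is a label describing the clip; the second is the text itself.
ClipData clip = ClipData.newPlainText("transcribed text", transcribedText);
clipboard.setPrimaryClip(clip);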
4. Returning to Google Keyboard (Gboard):
Once the text is in the clipboard, the user can easily paste it into any text field using Gboard or another keyboard by long-pressing and selecting "Paste."
5. Example Workflow:
You’re typing a message in any app using Gboard.
When you want to switch to voice input, say "Hey Google, open Voice Typer."
Google Assistant opens your app.
You speak your text, and after 4-5 seconds of silence, the app transcribes the speech and copies it to the clipboard.
You go back to the text input field and paste the transcribed text using Gboard.
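One way to keep the workflow above straight in code is a tiny state machine. This is only a sketch: VoiceTyperFlow and its method names are invented for illustration and are not part of any Android API. The real app would call these from its activity, recorder, and transcriber callbacks.

```java
/** Hypothetical model of the app's flow: open -> record -> transcribe -> copy. */
final class VoiceTyperFlow {
    enum State { IDLE, RECORDING, TRANSCRIBING, COPIED }

    private State state = State.IDLE;

    State state() {
        return state;
    }

    /** The app was opened (for example by Assistant): start recording. */
    void onAppOpened() {
        require(State.IDLE);
        state = State.RECORDING;
    }

    /** The silence timeout fired: stop recording and start transcription. */
    void onSilenceTimeout() {
        require(State.RECORDING);
        state = State.TRANSCRIBING;
    }

    /** Transcription finished and the text was placed on the clipboard. */
    void onTranscriptCopied() {
        require(State.TRANSCRIBING);
        state = State.COPIED;
    }

    // Rejects out-of-order events so bugs show up early instead of silently.
    private void require(State expected) {
        if (state != expected) {
            throw new IllegalStateException("Expected " + expected + " but was " + state);
        }
    }
}
```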
Tools and Resources You’ll Need:
1. App Actions: Android's mechanism for launching app features from Google Assistant (the conversational Actions on Google platform has been shut down).
2. Android SDK: For building the app, recording audio, and integrating Whisper.
3. Whisper Integration: Use OpenAI Whisper or a TensorFlow Lite version for local processing.
4. ClipboardManager: Android's API for managing clipboard content.
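On the Whisper integration item: Android recorders typically hand you 16-bit PCM samples, while Whisper pipelines generally start from floating-point audio in the range -1 to 1. A minimal conversion sketch is below. PcmConverter is a hypothetical name, and whether your particular Whisper port wants this raw waveform or a precomputed mel spectrogram depends on how the model was converted, so check that first.

```java
/** Hypothetical helper: converts 16-bit PCM samples to floats in [-1.0, 1.0). */
final class PcmConverter {
    private PcmConverter() {
        // static helper, not meant to be instantiated
    }

    static float[] toFloat(short[] pcm) {
        float[] out = new float[pcm.length];
        for (int i = 0; i < pcm.length; i++) {
            // 32768 is the magnitude of the most negative 16-bit sample,
            // so the result never falls outside [-1.0, 1.0).
            out[i] = pcm[i] / 32768.0f;
        }
        return out;
    }
}
```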
Key Considerations:
Performance: Whisper's larger models can be resource-intensive. Optimizing for mobile performance (by using a smaller model like tiny.en) will be essential to avoid lag.
Permissions: The app will need the RECORD_AUDIO permission, requested at runtime, to record audio. Writing text to the clipboard does not require a permission.
By following this approach, you can create an app that meets your needs with minimal intervention on the device, as Google Assistant would handle the voice command, and your app would take care of the recording, transcription, and clipboard management.
So if anyone with Android app development experience finds this interesting, feel free to pick it up.