An idea for Android app developers

Hi all, I’ve spent a little time with ChatGPT discussing an app idea, and it came back with something like this:

Yes, it’s possible to create an app that integrates with Google Assistant to achieve your desired functionality. Here's how you can do it, broken down step by step:

1. Creating the App with Google Actions:

Google Actions: You'll need to build a Google Assistant Action for your app using the Actions on Google platform. This Action will respond to a voice command like "Hey Google, open Voice Typer."

Steps to build the Google Action:

1. Create a new project on the Actions on Google developer console.

2. Define an intent in the Action to recognize a custom command like "open voice typer."

3. When the intent is triggered, it will open your custom Android app.

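One caveat worth adding to the steps above: conversational Actions on Google were sunset in June 2023, so on current Android the practical way to have "open voice typer" launch the app is an App Action declared in shortcuts.xml. A minimal sketch; the package name, activity class, and parameter key are placeholders for this hypothetical app:

```xml
<!-- res/xml/shortcuts.xml - App Actions sketch; package, class,
     and parameter names are illustrative placeholders. -->
<shortcuts xmlns:android="http://schemas.android.com/apk/res/android">
    <capability android:name="actions.intent.OPEN_APP_FEATURE">
        <intent
            android:action="android.intent.action.VIEW"
            android:targetPackage="com.example.voicetyper"
            android:targetClass="com.example.voicetyper.MainActivity">
            <parameter
                android:name="feature"
                android:key="feature" />
        </intent>
    </capability>
</shortcuts>
```

With this in place, "Hey Google, open voice typer in Voice Typer" can deep-link straight into the app without any server-side Action.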
2. Voice Recording and Transcription in the App:

Once your app is opened via Google Assistant, it will:

1. Record Voice Input: Implement the recording feature using the Android MediaRecorder API or the SpeechRecognizer API to capture audio input.

2. Transcribe Using Whisper: Integrate OpenAI’s Whisper for local transcription. You’ll need to bundle Whisper so it can run efficiently on mobile devices (possibly a smaller model like tiny or base converted to TensorFlow Lite).

3. Silence Detection: Implement a timeout (e.g., 4-5 seconds) to detect when the user stops speaking. This could be done using audio level thresholds, or via Android’s SpeechRecognizer API, which provides end-of-speech detection.

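The silence-timeout logic in step 3 can be kept entirely separate from Android: feed amplitude readings (e.g. MediaRecorder.maxAmplitude polled every few hundred milliseconds) into a small detector class. A minimal sketch; the class name and threshold values are illustrative assumptions, not Android APIs:

```kotlin
// End-of-speech detector: accumulates time spent below an amplitude
// threshold and fires once the quiet stretch reaches silenceMs.
// Threshold and timeout values here are illustrative, not tuned defaults.
class SilenceDetector(
    private val threshold: Int = 2000,  // amplitude below this counts as silence
    private val silenceMs: Long = 4500  // ~4-5 s of quiet ends the utterance
) {
    private var quietFor = 0L

    /** Call once per poll; returns true when the utterance should end. */
    fun onSample(amplitude: Int, intervalMs: Long): Boolean {
        quietFor = if (amplitude < threshold) quietFor + intervalMs else 0L
        return quietFor >= silenceMs
    }
}

fun main() {
    val detector = SilenceDetector(threshold = 2000, silenceMs = 1000)
    check(!detector.onSample(5000, 500)) // speech: quiet timer resets
    check(!detector.onSample(100, 500))  // 500 ms of quiet so far
    check(detector.onSample(100, 500))   // 1000 ms of quiet: stop recording
    println("silence detection checks passed")
}
```

In the app, the poll loop would run on a Handler or coroutine while recording is active; a true result stops the recorder and hands the audio to Whisper.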

3. Copying Text to Clipboard:

After the transcription completes, the app will:

Use the ClipboardManager API to copy the transcribed text to the clipboard:

ClipboardManager clipboard = (ClipboardManager) getSystemService(Context.CLIPBOARD_SERVICE);
ClipData clip = ClipData.newPlainText("transcribed text", transcribedText);
clipboard.setPrimaryClip(clip);



4. Returning to Google Keyboard (Gboard):

Once the text is in the clipboard, the user can easily paste it into any text field using Gboard or another keyboard by long-pressing and selecting "Paste."


5. Example Workflow:

You’re typing a message in any app using Gboard.

When you want to switch to voice input, say "Hey Google, open Voice Typer."

Google Assistant opens your app.

You speak your text, and after 4-5 seconds of silence, the app transcribes the speech and copies it to the clipboard.

You go back to the text input field and paste the transcribed text using Gboard.


Tools and Resources You’ll Need:

1. Google Assistant SDK for creating Actions: Actions on Google

2. Android SDK: For building the app, recording audio, and integrating Whisper.

3. Whisper Integration: Use OpenAI Whisper or a TensorFlow Lite version for local processing.

4. ClipboardManager: Android's API for managing clipboard content.

Key Considerations:

Performance: Whisper's larger models can be resource-intensive. Optimizing for mobile performance (by using a smaller model like tiny.en) will be essential to avoid lag.

Permissions: The app will need the RECORD_AUDIO permission for recording; writing to the clipboard requires no special permission.


By following this approach, you can create an app that meets your needs with minimal intervention on the device, as Google Assistant would handle the voice command, and your app would take care of the recording, transcription, and clipboard management.


So if anyone with Android app development experience finds this interesting, please help yourself.


I don’t think that OpenAI has any interest or experience in building Android apps. They are all iPhone-loving people, and tragically Android (and Windows) is an annoying afterthought.

Even if that were true, your comment only leaves more breathing room for developers; it doesn’t nullify the topic.

I can also generate and iterate AI text on an idea. Let’s see how that goes.


Comprehensive Guide to Building an Android AI Voice Assistant

This document provides step-by-step instructions to create an advanced AI voice assistant app that:

  • Can be set as the default voice assistant.
  • Activates via a long press of the Home button.
  • Listens for a wake word in the background and activates voice commands upon detection.

Prerequisites

Before starting, ensure the following:

  1. Development Environment:

    • Android Studio: Version Arctic Fox (2020.3.1) or later.
    • Programming Language: Kotlin (recommended) or Java.
    • Knowledge: Familiarity with Android development and modern app architecture patterns.
  2. System Requirements:

    • Android SDK: Minimum SDK 21 (Lollipop) or higher.
    • Target SDK: API Level 33 (Android 13) or the latest available.
    • Device/Emulator for Testing: Android 10 (API 29) or higher recommended.
  3. Libraries and Tools:

    • Picovoice Porcupine for wake word detection.
    • Android Jetpack Libraries for modern app development.
    • Android SpeechRecognizer API.
    • TextToSpeech API for voice responses.

1. Setting Up the Android Project

a. Create a New Project

  1. Open Android Studio.

  2. Select New Project > Empty Activity.

  3. Configure:

    • Name: VoiceAssistantApp.
    • Package Name: com.example.voiceassistantapp.
    • Language: Kotlin.
    • Minimum SDK: API 21 (Android 5.0 Lollipop).

b. Configure Build Gradle

In your app-level build.gradle.kts (Kotlin DSL shown), set up the following:

android {
    compileSdk = 33

    defaultConfig {
        applicationId = "com.example.voiceassistantapp"
        minSdk = 21
        targetSdk = 33
        versionCode = 1
        versionName = "1.0"

        // For multidex support if needed
        multiDexEnabled = true
    }

    buildTypes {
        release {
            isMinifyEnabled = true
            proguardFiles(
                getDefaultProguardFile("proguard-android-optimize.txt"),
                "proguard-rules.pro"
            )
        }
    }
}

c. Add Dependencies

Add the necessary dependencies in the dependencies section:

dependencies {
    implementation("androidx.core:core-ktx:1.10.1")
    implementation("androidx.appcompat:appcompat:1.6.1")
    implementation("com.google.android.material:material:1.9.0")

    // Porcupine SDK for wake word detection
    implementation("ai.picovoice:porcupine-android:2.1.0")

    // Lifecycle components
    implementation("androidx.lifecycle:lifecycle-livedata-ktx:2.6.1")
    implementation("androidx.lifecycle:lifecycle-viewmodel-ktx:2.6.1")
}

Note: Replace version numbers with the latest available at the time of development.

d. Add Permissions

In AndroidManifest.xml, add the following permissions:

<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.FOREGROUND_SERVICE" />

Important: You must request these permissions at runtime for devices running Android 6.0 (API 23) and above.


2. Declaring the App as the Default Assistant

To enable your app to be set as the system’s default assistant:

a. Update AndroidManifest.xml

Add an activity with the appropriate intent filters:

<application ... >
    <activity android:name=".AssistantActivity">
        <intent-filter>
            <action android:name="android.intent.action.MAIN" />
            <category android:name="android.intent.category.LAUNCHER" />
        </intent-filter>
        <intent-filter>
            <action android:name="android.intent.action.ASSIST" />
            <category android:name="android.intent.category.DEFAULT" />
        </intent-filter>
    </activity>
    ...
</application>

b. Handling Assistant Action in Activity

In AssistantActivity.kt:

class AssistantActivity : AppCompatActivity() {

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        handleIntent(intent)
    }

    override fun onNewIntent(intent: Intent?) {
        super.onNewIntent(intent)
        handleIntent(intent)
    }

    private fun handleIntent(intent: Intent?) {
        if (Intent.ACTION_ASSIST == intent?.action) {
            // Start voice recognition
            startVoiceRecognition()
        }
    }

    private fun startVoiceRecognition() {
        // Implement the logic to start voice recognition
    }
}

3. Implementing Wake Word Detection

a. Integrate Picovoice Porcupine SDK

  1. Get an Access Key: Sign up on the Picovoice Console and copy your AccessKey.

  2. Add Porcupine Dependency:

    implementation("ai.picovoice:porcupine-android:2.1.0")
    
  3. Include Wake Word Model:

    • Use built-in keywords or create custom ones using the Picovoice Console.

b. Create a Foreground Service

Create a service that runs in the foreground to handle wake word detection.

i. WakeWordService.kt
class WakeWordService : Service() {

    private lateinit var porcupineManager: PorcupineManager

    override fun onCreate() {
        super.onCreate()

        val accessKey = "YOUR_PICOVOICE_ACCESS_KEY" // Replace with your key

        porcupineManager = PorcupineManager.Builder()
            .setAccessKey(accessKey)
            .setKeyword(Porcupine.BuiltInKeyword.PORCUPINE)
            .setProcessErrorCallback { throwable ->
                // Handle error
            }
            .build(applicationContext) { keywordIndex ->
                // Wake word detected
                onWakeWordDetected()
            }

        // Enter the foreground state before opening the microphone
        startForegroundServiceWithNotification()

        porcupineManager.start()
    }

    private fun onWakeWordDetected() {
        // Start the assistant activity
        val intent = Intent(this, AssistantActivity::class.java)
        intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK)
        startActivity(intent)
    }

    private fun startForegroundServiceWithNotification() {
        val notificationChannelId = "WAKE_WORD_SERVICE_CHANNEL"
        val channelName = "Wake Word Detection"

        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
            val chan = NotificationChannel(
                notificationChannelId, channelName, NotificationManager.IMPORTANCE_LOW
            )
            val manager = getSystemService(Context.NOTIFICATION_SERVICE) as NotificationManager
            manager.createNotificationChannel(chan)
        }

        val notificationBuilder = NotificationCompat.Builder(this, notificationChannelId).apply {
            setContentTitle("Voice Assistant")
            setContentText("Listening for wake word")
            setSmallIcon(R.drawable.ic_notification)
            setOngoing(true)
        }

        val notification = notificationBuilder.build()
        startForeground(1, notification)
    }

    override fun onDestroy() {
        super.onDestroy()
        porcupineManager.stop()
        porcupineManager.delete()
    }

    override fun onBind(intent: Intent?): IBinder? = null
}

Note: Don’t forget to replace "YOUR_PICOVOICE_ACCESS_KEY" with your actual access key.

ii. Update AndroidManifest.xml
<service
    android:name=".WakeWordService"
    android:exported="false"
    android:foregroundServiceType="microphone" />

Explanation: foregroundServiceType="microphone" indicates that the service uses the microphone. On Android 14 (API 34) and above, this type additionally requires the FOREGROUND_SERVICE_MICROPHONE permission in the manifest.

c. Starting the Service

In your MainActivity.kt or appropriate location:

private fun startWakeWordService() {
    val serviceIntent = Intent(this, WakeWordService::class.java)
    ContextCompat.startForegroundService(this, serviceIntent)
}

Ensure you start the service after obtaining the necessary permissions.


4. Requesting Runtime Permissions

a. Check and Request Permissions

In your activity (e.g., MainActivity):

private val RECORD_AUDIO_PERMISSION_CODE = 100

private fun checkMicrophonePermission() {
    if (ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
        != PackageManager.PERMISSION_GRANTED) {
        ActivityCompat.requestPermissions(
            this,
            arrayOf(Manifest.permission.RECORD_AUDIO),
            RECORD_AUDIO_PERMISSION_CODE
        )
    } else {
        // Permission granted, start the service
        startWakeWordService()
    }
}

override fun onRequestPermissionsResult(
    requestCode: Int, permissions: Array<out String>, grantResults: IntArray
) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults)
    if (requestCode == RECORD_AUDIO_PERMISSION_CODE) {
        if ((grantResults.isNotEmpty() && grantResults[0] == PackageManager.PERMISSION_GRANTED)) {
            startWakeWordService()
        } else {
            Toast.makeText(this, "Permission denied", Toast.LENGTH_SHORT).show()
        }
    }
}

Call checkMicrophonePermission() at the appropriate point in your activity lifecycle.


5. Capturing and Processing Voice Commands

a. Setting Up Speech Recognition

In your AssistantActivity.kt:

class AssistantActivity : AppCompatActivity(), RecognitionListener {

    private lateinit var speechRecognizer: SpeechRecognizer
    private lateinit var recognizerIntent: Intent
    private var textToSpeech: TextToSpeech? = null

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_assistant)

        initializeSpeechRecognizer()
        initializeTextToSpeech()
        startListening()
    }

    private fun initializeSpeechRecognizer() {
        speechRecognizer = SpeechRecognizer.createSpeechRecognizer(this)
        speechRecognizer.setRecognitionListener(this)

        recognizerIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
            putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
            putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault().toLanguageTag())
            putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, packageName)
        }
    }

    private fun initializeTextToSpeech() {
        textToSpeech = TextToSpeech(this) { status ->
            if (status != TextToSpeech.ERROR) {
                textToSpeech?.language = Locale.getDefault()
            }
        }
    }

    private fun startListening() {
        speechRecognizer.startListening(recognizerIntent)
    }

    override fun onResults(results: Bundle?) {
        val matches = results?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
        val command = matches?.firstOrNull()
        if (command != null) {
            processCommand(command)
        } else {
            respond("I didn't catch that. Please try again.")
        }
    }

    // Implement other RecognitionListener methods...

    override fun onDestroy() {
        speechRecognizer.destroy()
        textToSpeech?.shutdown()
        super.onDestroy()
    }
}

b. Processing Commands

Implement the processCommand function:

private fun processCommand(command: String) {
    when {
        command.contains("weather", ignoreCase = true) -> respond("Fetching the weather for you.")
        command.contains("time", ignoreCase = true) -> announceTime()
        else -> respond("I'm sorry, I didn't understand that.")
    }
}

private fun announceTime() {
    val currentTime = SimpleDateFormat("hh:mm a", Locale.getDefault()).format(Date())
    val response = "The current time is $currentTime."
    respond(response)
}

private fun respond(response: String) {
    textToSpeech?.speak(response, TextToSpeech.QUEUE_FLUSH, null, null)
}

Note: Ensure that your responses are appropriate and that you handle the TextToSpeech lifecycle correctly.


6. Managing Background Execution

Background execution has strict limitations, especially in recent Android versions.

a. Foreground Service

  • Ensure that your wake word detection runs in a foreground service with an ongoing notification.

b. Battery Optimization Exemption

To prevent the system from stopping your service, request the user to exclude your app from battery optimizations:

if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M) {
    val pm = getSystemService(Context.POWER_SERVICE) as PowerManager
    if (!pm.isIgnoringBatteryOptimizations(packageName)) {
        val intent = Intent(Settings.ACTION_REQUEST_IGNORE_BATTERY_OPTIMIZATIONS)
        intent.data = Uri.parse("package:$packageName")
        startActivity(intent)
    }
}

Add Permission in Manifest:

<uses-permission android:name="android.permission.REQUEST_IGNORE_BATTERY_OPTIMIZATIONS" />

c. Microphone Usage Compliance

  • Ensure compliance with Android’s policies on background microphone usage.
  • Because the app uses a foreground service with an ongoing notification, it may access the microphone while in the background; be prepared to justify this behavior in Google Play review.

7. Testing and Debugging

a. Unit Testing

  • Write unit tests for your command processing logic.
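As a concrete example, if the command routing from section 5b is extracted into a pure function, it can be tested without an emulator. routeCommand below is a hypothetical pure counterpart of processCommand, returning an intent label instead of speaking:

```kotlin
// Hypothetical pure counterpart of processCommand: maps a recognized
// utterance to an intent label so the routing logic is testable off-device.
fun routeCommand(command: String): String = when {
    command.contains("weather", ignoreCase = true) -> "WEATHER"
    command.contains("time", ignoreCase = true) -> "TIME"
    else -> "UNKNOWN"
}

fun main() {
    // Plain assertions; in the app these would live in a JUnit test class.
    check(routeCommand("What's the WEATHER like?") == "WEATHER")
    check(routeCommand("tell me the current time") == "TIME")
    check(routeCommand("sing a song") == "UNKNOWN")
    println("all command-routing checks passed")
}
```

The activity then only translates the returned label into a spoken response, keeping the TextToSpeech side effects out of the testable core.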

b. Integration Testing

  • Test the interaction between wake word detection, speech recognition, and response generation.

c. Device Testing

  • Test on real devices from different manufacturers to ensure proper functioning under various conditions.

d. Debugging Tools

  • Use Android Studio’s debugging tools and logcat to monitor app behavior.
  • Utilize ADB commands for triggering assistant actions during testing.

8. Distribution and Compliance

a. Privacy Policy

  • Provide a clear privacy policy that explains how the app collects, uses, and shares user data.

b. User Consent and Transparency

  • Obtain explicit user consent for microphone access.
  • Clearly inform users about the app’s background behavior and the need for the ongoing notification.

c. Google Play Store Policies

  • Ensure compliance with all Google Play policies.
  • Be transparent about foreground service usage and background microphone access.

d. Data Security

  • Secure any data collected and transmitted by the app.
  • Avoid collecting unnecessary user data.

9. Future Enhancements

a. Advanced Natural Language Processing

  • Integrate NLP capabilities using libraries like TensorFlow Lite or Google’s ML Kit for more sophisticated command understanding.

b. Multilingual Support

  • Expand the app to support additional languages by configuring speech recognition and TTS for different locales.

c. Custom Wake Words

  • Allow users to customize the wake word using Porcupine’s custom wake word feature.

d. User Interface Improvements

  • Enhance the app’s UI for better user interaction, including settings for personalization.

e. Integration with Services

  • Connect the assistant with external APIs for extended functionalities like weather updates, news, or smart home controls.

This documentation provides a detailed, modern, and compliant guide to building an Android voice assistant with wake word detection and Home button activation. It adheres to the latest Android development best practices and policies as of October 2023.

(welcome to the dead internet, where AI can masquerade for documentation)

sounds like RTC and function calling… the idea is behind what OpenAI already has out there, in my humble opinion

Actually, it has no RTC or function calling mentioned. This focuses on the OS hooks that an app like Google Assistant would use to become the default assistant. This is notably absent in the ChatGPT app (or at least on my stone-age devices with a few years on them). You don’t need to integrate with Google if you are the new Google. The rest you can consider introductory but boilerplate or placeholder.

Actually, it has no RTC or function calling mentioned.

  1. Voice Recording and Transcription in the App:

When you want to switch to voice input, say “Hey Google, open Voice Typer.”

number 2 talks about using whisper…

anyways, wasn’t one of the OpenAI 12 days announcements about something similar to this for the ChatGPT app?

Yes, advanced voice mode in ChatGPT with looking through your camera, talking to Santa. But not: hold down the Android home button to get the voice assistant or having it burn battery to hear “hey Google”.

The killer app to go along with what’s in your pocket would be a home hardware voice assistant like used for Alexa. Hopefully with a plan to not lose a billion dollars like Amazon, and to meet mom’s expectations of getting the weather and playing a song.

Yes, the goal was to use Whisper locally instead of Google’s remote transcription. But I have no idea how to replace the STT button in the Google keyboard (great keyboard, by the way), and it seems almost impossible without rebuilding the whole keyboard app. So the easiest thing was to launch an STT-only app that transcribes speech and stores it somewhere…

Either you have a separate editor (not the goal, nor practical in normal workflow) or you simply save it in the clipboard.

Launching with Google assistant is optional and needs no dev effort (the assistant is already installed on the device, is used by many, and would act as an easy UI to launch your app without closing the current window).

So I kind of presented my vision on a simple to do app by those who know how to code on Android…

How many years has Whisper been available? Two? And still no local app that is worth looking at…

All other speech to text tools are seriously crap for guys like me:

  • 2 native languages,
  • 2 more - bilingual,
  • and 1 more - fluent.

In three I have a light accent, in one a bigger one…

Thinking and working in whichever language is faster for the task.

Whisper is kind of the only one that shows hope of keeping up with me. And nothing local with decent functionality (UI, not AI) in two years? That’s what I call a business niche.

It takes 2 minutes to make a version for yourself and 2 years to make it consumer-ready by yourself… and by consumer-ready I mean everything a normal for-profit project needs, from marketing to security to sales to a bank account and so on.

The hardest part is converting the time spent into cash. By the time someone finishes this app and asks its users for money, all major apps might already have something like this (a lot already do)… And if it bothers Google, they can just add it everywhere in 24-48 hours.

Anyone who has the skills to do this will do it in 2 minutes as a local server they can run with their own keys on their phone or home network. Making the effort to publish something like this without getting killed in the market would require deep pockets. Low-hanging fruit that won’t be a good idea 2 years from now. At least that’s how I view it.

The goal is to have this thing run on the device, be installable from marketplace, and be premium under something like $20 a year.

I agree that marketing is totally another beast to fight with.

But the first part, development and compliance, might take someone about a week, not counting the time to get approved.

So spending a week to have a chance to bother Google, I’m taking it.

If someone is willing to develop this for me, I’m ready to hire you for a reasonable “2 minutes” of development time and take care of the marketing (I think it would take just a couple of email campaigns to get the first 100 users).