Creating a Transcription Job

Create and submit a job to transcribe one or more media files to text files in the Speech service.

Before you begin

Store the media files that you want to transcribe in an Object Storage bucket.
To compare the Whisper and Oracle ASR models for transcription job creation, see Comparing Whisper and Oracle ASR Models.

Comparing Whisper and Oracle ASR Models

Compare the Whisper and Oracle ASR models before you create a transcription job.

In addition to the native Oracle ASR speech model, Speech supports the Whisper model from OpenAI. Whisper is trained on a large corpus of multilingual data collected from the web, and it supports file-based speech-to-text transcription for over 50 languages. The Whisper model uses the same service endpoints, API, and SDK interfaces as the Oracle ASR model, so you can switch between the models without changing how you call the service. The Whisper model also uses diarization to label individual speakers in the recording.

Use the following comparison of the Whisper and Oracle ASR models to choose the correct model when creating a transcription job.


Feature	Oracle ASR Model	Whisper Model in OCI Speech
Real time transcriptions	Supported	Not supported
Large file size	Up to 2 GB	Up to 2 GB
Word level timestamp	Supported	Supported
File format	AAC, AC3, AMR, AU, FLAC, M4A, MKV, MP3, MP4, OGA, OGG, OPUS, WAV, WEBM	AAC, AC3, AMR, AU, FLAC, M4A, MKV, MP3, MP4, OGA, OGG, OPUS, WAV, WEBM
Multilingual support	English, Spanish, French, German, Italian, Portuguese, and Hindi	Same as Oracle ASR model plus 50 other languages*
Diarization	Supported	Supported

* OpenAI Whisper FAQ
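
The comparison above can be read as a small decision rule. The following Python sketch encodes only what the table states; the function names, constant names, and the "Oracle ASR" and "Whisper" labels are hypothetical and aren't part of the Speech service. For a language outside the Oracle ASR list, the sketch assumes that Whisper supports it, so confirm the language against the OpenAI Whisper FAQ.

```python
# Hypothetical helpers that encode the comparison table as data.
ORACLE_ASR_LANGUAGES = {
    "English", "Spanish", "French", "German", "Italian", "Portuguese", "Hindi",
}
SUPPORTED_EXTENSIONS = {
    "aac", "ac3", "amr", "au", "flac", "m4a", "mkv",
    "mp3", "mp4", "oga", "ogg", "opus", "wav", "webm",
}


def candidate_models(language, needs_real_time=False):
    """Return the model families that the table allows for these needs."""
    candidates = []
    if language in ORACLE_ASR_LANGUAGES:
        candidates.append("Oracle ASR")
    # Whisper covers the Oracle ASR languages plus others (assumed here for
    # any other language), but the table lists no real-time support for it.
    if not needs_real_time:
        candidates.append("Whisper")
    return candidates


def is_supported_format(file_name):
    """Check the file extension against the formats listed for both models."""
    _, dot, extension = file_name.rpartition(".")
    return bool(dot) and extension.lower() in SUPPORTED_EXTENSIONS
```

An empty result from candidate_models means that neither model fits, for example a real-time request in a language that only Whisper covers.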

To create a transcription job, follow these steps:
1. Open the navigation menu and click Analytics & AI. Under AI Services, click Speech.
2. In the left-side navigation menu, click Transcription jobs.
3. Under List scope, select the compartment that you want to work in.
4. Click Create job.
5. On the basic information page, enter a unique name (255 character limit) for the job. The name can contain only alphanumeric characters, dashes, and underscores, in any order. If you don't provide a name, one is generated automatically.
  
  For example:
  
  AiSpeechTranscriptionJob20220804134759
6. (Optional) Enter a description (400 character limit) for the job.
7. Select the compartment to create the job in, if different from the one displayed.
8. Under Input, select a data input bucket that contains the media file that you want to transcribe.
  
  If the bucket that you want isn't in the selected compartment, change the compartment.
9. Under Output, select where you want to store the output files, either in the input bucket or in a different bucket. To use a different bucket, select it.
10. (Optional) Enter an output prefix to separate and sort the files in the bucket.
  
  For example, you could enter call_ctr for call center media files.
  
  You can also create an output folder in your bucket by using a slash (/). For example, MyResults/ stores all the transcribed files in a MyResults folder in the bucket.
11. Select the model type of the job you're creating.
  
  Note
  
  See Comparing Whisper and Oracle ASR Models to determine the model type to use.
12. If you selected a Whisper model in the previous step, select the model subtype. Otherwise, proceed to the next step.
13. Select the language of the media file.
  
  You can search for the language by name or, for the Oracle model, by language code. US English is the default.
14. (Optional) To include both the SRT and JSON formats in the transcription, select Get SRT transcription format.
15. If you don't want your transcription punctuated, clear Enable punctuation.
  
  Note
  
  Enable punctuation is selected for Whisper models and can't be cleared.
16. (Optional) To identify the speakers in the input file, select Enable diarization.
  
  You can let the Speech service automatically detect the number of unique speakers in the input file, or you can enter a number from 2 through 16.
  
  Note
  
  Using diarization increases the transcription task latency, which is why this option is disabled by default.
17. To add filters to change the way the output file is generated, click Add filter.
  
  Select a filter type. Profanity is the default.
  
  Select the filter mode. For example, the profanity filter offers these modes:
  
  Mask: Any detected profanity is masked in the transcription with asterisks, except for the first letter.
  
  Remove: Any detected profanity is replaced with one asterisk in the transcription.
  
  Tag: Profanity isn't masked or removed but is marked as TYPE: "Profanity" in the transcription.
18. (Optional) Click Show advanced options to assign tags to the job. To add a tag, select a tag namespace, then enter the key and value. Tags help you locate and track resources.
  
  Tagging describes the various tags that you can use to organize and find resources, including cost-tracking tags.
19. Click Next to choose the files for the job.
20. Select the checkboxes for the media files that you want to transcribe or select them all by selecting the checkbox next to Name.
  
  Note
  
  The maximum file size is 2 GB.
  
  File duration is a maximum of 4 hours.
21. Click Submit to start the job.
  
  A job can run in seconds or hours depending on the size and number of files that you select. While running, the job is in an in-progress state that changes to succeeded or failed when it finishes. You can select a job to go to its details page.
  
  Each job can have up to 100 tasks.
  
  Jobs are retained for 90 days.
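
The limits quoted in the steps above can be checked before you submit a job. The following Python sketch is a hypothetical pre-submission check and isn't part of the Speech service or its SDK. It makes two assumptions that the steps don't confirm: 2 GB is read as 2 x 1024^3 bytes, and each selected file counts as one task.

```python
import re

# Limits quoted in the console steps.
MAX_NAME_LENGTH = 255
MAX_DESCRIPTION_LENGTH = 400
MIN_SPEAKERS, MAX_SPEAKERS = 2, 16
MAX_FILE_BYTES = 2 * 1024**3        # assumption: "2 GB" read as binary gigabytes
MAX_DURATION_SECONDS = 4 * 60 * 60  # 4 hours
MAX_TASKS_PER_JOB = 100             # assumption: one task per selected file

# Alphanumeric characters, dashes, or underscores, in any order.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def check_job(name=None, description="", speakers=None, files=()):
    """Return a list of problems; an empty list means every check passed.

    name=None means the service generates the name.
    speakers=None means the service detects the number of speakers.
    files is an iterable of (object_name, size_bytes, duration_seconds).
    """
    problems = []
    if name is not None and (
        len(name) > MAX_NAME_LENGTH or not NAME_PATTERN.fullmatch(name)
    ):
        problems.append("name: use 1-255 letters, digits, dashes, or underscores")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description: longer than 400 characters")
    if speakers is not None and not MIN_SPEAKERS <= speakers <= MAX_SPEAKERS:
        problems.append("speakers: must be between 2 and 16")
    files = list(files)
    if len(files) > MAX_TASKS_PER_JOB:
        problems.append("files: more than 100 selected")
    for object_name, size_bytes, duration_seconds in files:
        if size_bytes > MAX_FILE_BYTES:
            problems.append(f"{object_name}: larger than 2 GB")
        if duration_seconds > MAX_DURATION_SECONDS:
            problems.append(f"{object_name}: longer than 4 hours")
    return problems
```

For example, check_job("AiSpeechTranscriptionJob20220804134759") returns an empty list, while a name that contains a space returns one problem.
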
Use the create command and required parameters to create a transcription job.
```
oci speech transcription-job create [OPTIONS]
```
Avoid entering confidential information.

For a complete list of flags and variable options for CLI commands, see the CLI Command Reference.
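
The create command takes the input and output locations as structured values. The following Python sketch assembles one possible invocation. The flag names (--compartment-id, --input-location, --output-location) and the JSON field names are assumptions based on common OCI CLI and Speech API conventions, so confirm them in the CLI Command Reference before use. The function name and the placeholder values are hypothetical.

```python
import json
import shlex


def build_create_command(compartment_id, namespace, bucket, object_names,
                         output_prefix=""):
    """Return the argument list for a transcription-job create call.

    Flag names and JSON field names are assumptions; check the CLI reference.
    """
    input_location = {
        "locationType": "OBJECT_LIST_INLINE_INPUT_LOCATION",
        "objectLocations": [{
            "namespaceName": namespace,
            "bucketName": bucket,
            "objectNames": list(object_names),
        }],
    }
    # Same bucket for output here; a prefix such as "MyResults/" acts as a folder.
    output_location = {
        "namespaceName": namespace,
        "bucketName": bucket,
        "prefix": output_prefix,
    }
    return [
        "oci", "speech", "transcription-job", "create",
        "--compartment-id", compartment_id,
        "--input-location", json.dumps(input_location),
        "--output-location", json.dumps(output_location),
    ]


# Placeholder values only; replace them with your own identifiers.
command = build_create_command(
    "<compartment_OCID>", "<namespace>", "<bucket>", ["call1.wav"], "MyResults/"
)
print(shlex.join(command))  # shell-quoted command line that you can review first
```

Building the arguments as a list and quoting them with shlex.join keeps the embedded JSON intact when the command is pasted into a shell.
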
Use the CreateTranscriptionJob and ChangeTranscriptionJobCompartment operations to create a job.
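
As a rough illustration of how the console options might map to a CreateTranscriptionJob request body, the following Python sketch builds the option fields as a plain dictionary. All field names, the model type strings, and the en-US language code are assumptions, so check them against the API reference. The function name is hypothetical, and the input and output locations are omitted for brevity.

```python
import json

PROFANITY_MODES = {"MASK", "REMOVE", "TAG"}


def build_job_options(compartment_id, model_type="ORACLE", language_code="en-US",
                      punctuation=True, srt=False, profanity_mode=None,
                      diarization=False, speakers=None):
    """Build the option part of a request body (field names are assumptions)."""
    if profanity_mode is not None and profanity_mode not in PROFANITY_MODES:
        raise ValueError(f"unknown profanity mode: {profanity_mode}")
    # Punctuation is always on for Whisper models and can't be cleared.
    if model_type.upper().startswith("WHISPER"):
        punctuation = True
    body = {
        "compartmentId": compartment_id,
        "modelDetails": {"modelType": model_type, "languageCode": language_code},
        "normalization": {"isPunctuationEnabled": punctuation, "filters": []},
    }
    if srt:
        # Request SRT output in addition to the JSON transcription.
        body["additionalTranscriptionFormats"] = ["SRT"]
    if profanity_mode is not None:
        body["normalization"]["filters"].append(
            {"type": "PROFANITY", "mode": profanity_mode}
        )
    if diarization:
        settings = {"isDiarizationEnabled": True}
        if speakers is not None:  # None lets the service detect the count
            settings["numberOfSpeakers"] = speakers
        body["modelDetails"]["transcriptionSettings"] = {"diarization": settings}
    return body


print(json.dumps(build_job_options("<compartment_OCID>", srt=True,
                                   profanity_mode="MASK"), indent=2))
```

Diarization is left off unless requested, which matches the console default.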
