## Installation

Install using `vpkg`:

```
vpkg get https://github.com/thecodrr/vspeech
```

Install using `V`'s `vpm`:

```
v install thecodrr.vspeech
```

Install using `git`:

```
cd path/to/your/project
git clone https://github.com/thecodrr/vspeech
```

You can use `thecodrr.vave` alongside this module to read raw samples from WAV audio files.

Then wherever you want to use it:

```v
import thecodrr.vspeech // OR simply vspeech depending on how you installed
// Optional
import thecodrr.vave
```
## Manual

Perform the following steps:

- Download the latest `native_client.<your system>.tar.xz` matching your system from DeepSpeech's Releases.
- Extract the `.tar.xz` into a `libs` folder in your project directory. It MUST be in the `libs` folder. If you don't have one, create it and extract into it.
- Download the pre-trained model from DeepSpeech's Releases (the file named `deepspeech-0.6.1-models.tar.gz`). It's pretty big (1.1G) so make sure you have the space.
- Extract the model anywhere you like on your system.
- Extra: if you don't have any audio files for testing etc., you can download the samples from DeepSpeech's Releases (the file named `audio-0.6.1.tar.gz`).
- When you are done, run this command in your project directory:
```
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PWD/libs/
```
And done!
## Automatic

// TODO: I will add a `bash` script to automate the manual steps above.
## Usage

There is a complete example of how to use this module in `cmd/main.v`:

```v
import thecodrr.vspeech

// specify values for use later
const (
	beam_width = 300
	lm_weight = 0.75
	valid_word_count_weight = 1.85
)

// create a new model with the beam width defined above
mut model := vspeech.new("/path/to/the/model.pbmm", beam_width)

lm := "/path/to/the/lm/file" // it's in the models archive
trie := "/path/to/the/trie/file" // it's in the models archive

// enable the decoder with language model (optional)
model.enable_decoder_with_lm(lm, trie, lm_weight, valid_word_count_weight)

data := byteptr(0) // raw audio samples (use the thecodrr.vave module for this)
data_len := 0 // the total length of the buffer

// convert the audio to text
text := model.speech_to_text(data, data_len)
println(text)

// make sure to free everything
unsafe {
	model.free()
	model.free_string(text)
}
```
## API

### `vspeech.new(model_path, beam_size)`

Creates a new `Model` with the given `model_path` and `beam_size`. A larger `beam_size` makes the decoder consider more candidate transcriptions, improving accuracy at the cost of speed. `model_path` must point to the model file, either `.pb` or `.pbmm`.
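For instance, a minimal sketch (the path and beam size are placeholders; `output_graph.pbmm` is the file name used in DeepSpeech's 0.6.1 models archive):

```v
import thecodrr.vspeech

fn main() {
	// .pbmm is the memory-mapped variant of the graph; it loads
	// faster and uses less memory than the plain .pb file.
	mut model := vspeech.new("/path/to/models/output_graph.pbmm", 300)
	// ...use the model...
	unsafe { model.free() }
}
```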
### The `Model` struct

The main `struct` of the module. It exposes the following methods:

1. `enable_decoder_with_lm(lm_path, trie_path, lm_weight, valid_word_count_weight)`

   Load the Language Model and enable the decoder to use it. Read the method comments to know what each param does.

2. `get_model_sample_rate()`

   Use this to get the sample rate expected by the model. The audio samples you need converted MUST have this exact sample rate (see the sketch after this list).

3. `speech_to_text(buffer, buffer_size)`

   This is the method that you are looking for. It's where all the magic happens (and also all the bugs). `buffer` is the buffer of raw audio samples and `buffer_size` is its total size.

4. `speech_to_text_with_metadata(buffer, buffer_size)`

   Same as `speech_to_text` but returns `Metadata` with extra details about the transcription.

5. `create_stream()`

   Create a stream for streaming audio data (from a microphone, for example) into the decoder. This, however, isn't an actual stream, i.e. there's no seek etc. This will initialize the `streaming_state` in your `Model`.

6. `free()`

   Free the `Model` and its resources.

7. `free_string(text)`

   Free the `string` returned by `speech_to_text`.
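As a quick illustration of items 2-4, here is a minimal sketch assuming the API exactly as listed above. The `byteptr(0)` placeholders stand in for real audio samples (read, e.g., with `thecodrr.vave`):

```v
import thecodrr.vspeech

fn main() {
	mut model := vspeech.new("/path/to/the/model.pbmm", 300)

	// The model only accepts audio at its own sample rate,
	// so query it before converting or feeding any samples.
	expected_rate := model.get_model_sample_rate()
	println("model expects ${expected_rate} Hz audio")

	data := byteptr(0) // placeholder: raw samples at expected_rate
	data_len := 0      // placeholder: total buffer length

	// Plain transcription returns a string...
	text := model.speech_to_text(data, data_len)
	println(text)

	// ...while the metadata variant also carries timing info.
	meta := model.speech_to_text_with_metadata(data, data_len)
	println(meta.get_text())

	// Free everything when done.
	unsafe {
		meta.free()
		model.free_string(text)
		model.free()
	}
}
```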
### `StreamingState`

The streaming state is used to handle pseudo-streaming of audio content into the decoder. It exposes the following methods:

1. `feed_audio_content(buffer, buffer_size)`

   Use this for feeding multiple chunks of data into the stream continuously (see the sketch after this list).

2. `intermediate_decode()`

   You can use this to get the output of the current data in the stream. However, this is quite expensive due to no streaming capabilities in the decoder. Use this only when necessary.

3. `finish_stream()`

   Call this when streaming is finished and you want the final output of the whole stream.

4. `finish_stream_with_metadata()`

   Same as `finish_stream` but returns `Metadata`.

5. `free()`

   Call this when done to free the captured `StreamingState`.
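A rough sketch of that flow, assuming the methods exactly as listed above and that `streaming_state` is exposed as a field on `Model` (per item 5 of the `Model` methods). The chunk is a placeholder for audio arriving from a microphone or a file read in pieces:

```v
import thecodrr.vspeech

fn main() {
	mut model := vspeech.new("/path/to/the/model.pbmm", 300)

	// create_stream() initializes model.streaming_state.
	model.create_stream()
	mut stream := model.streaming_state

	chunk := byteptr(0) // placeholder: one chunk of raw samples
	chunk_len := 0      // placeholder: size of that chunk

	// Feed chunks continuously as audio arrives (normally in a loop).
	stream.feed_audio_content(chunk, chunk_len)

	// Optional partial result; expensive, so use sparingly.
	partial := stream.intermediate_decode()
	println("partial: ${partial}")

	// Final transcript once all audio has been fed.
	text := stream.finish_stream()
	println(text)

	unsafe {
		stream.free()
		model.free()
	}
}
```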
### `Metadata`

Fields:

- `items`: the `MetadataItem` elements of the transcription.
- `num_items`: the number of items.
- `confidence`: approximated confidence value for the transcription.

Methods:

- `get_items()`: returns all the `MetadataItem` elements as a V array.
- `get_text()`: joins the characters of the `MetadataItem` elements into the transcribed `string`.
- `free()`: frees the `Metadata`.

### `MetadataItem`

Fields:

- `character`: the decoded character.
- `timestep`: the timestep at which the character occurs.
- `start_time`: the time at which the character starts.

Methods:

- `str()`: renders the `MetadataItem` as a readable `string` (see the sketch below).
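To illustrate, a hypothetical helper that walks over a `Metadata` result using the fields and methods above (the helper name and its parameter type are assumptions, not part of the module):

```v
import thecodrr.vspeech

// Hypothetical helper: print everything a Metadata result carries.
fn print_metadata(meta vspeech.Metadata) {
	println("confidence: ${meta.confidence} over ${meta.num_items} items")
	for item in meta.get_items() {
		// str() renders a single MetadataItem for quick inspection.
		println(item.str())
		// Or read the fields directly:
		println("'${item.character}' at timestep ${item.timestep}, start_time ${item.start_time}")
	}
}
```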
## Find this library useful? :heart:

Support it by joining the [stargazers](https://github.com/thecodrr/vspeech/stargazers) of this repository.
## License

MIT License

Copyright (c) 2019 Abdullah Atta
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.