a new autosub and an older autosub

  • I briefly look at the developers behind two versions of the python package called autosub
  • I try out each of these versions of autosub on the same English-language mp4 file
  • I post the resulting .srt files to github 

There are at least two python packages that are called autosub. One of them is a newer version of the older one. What are they and what are/were they used for? I am going to quickly try both of them here without digging too deep.

Both of these packages can be used for voice-to-text transcription from the command line.

an older autosub:

agermanidis's autosub

This project is labeled as no longer maintained. At the time of writing, it seems to have last been updated 15 months ago. It's on the github of Anastasis Germanidis, who I guess is one of the RunwayML founders. I don't know anything about RunwayML except that it exists and that some people talk about it in connection with machine learning and computer vision.

When I installed this version of autosub from the github repo, it worked on the first try using python 3.7.7. These were the steps I took to install it and run it with its default settings on a video of English spoken with an Austrian accent:

$ pip install git+https://github.com/agermanidis/autosub.git
$ autosub movies/test.mp4 

a newer autosub: 

BingLingGroup's autosub

This project, from what I can tell, is being actively maintained. It's a fork of a fork of the agermanidis autosub.

The first fork is on Wang Jiaxiang's github. I looked for a second to see what other kind of work Wang Jiaxiang does, and I believe they're a designer/programmer from Shandong, China.

The fork of Jiaxiang's fork, which is the one I'm installing from, is maintained on BingLingGroup's github. BingLingGroup describes themselves as "BingLing Fansub Group github account" located at Kadic Academy. I guess there's a real Kadic Academy in France somewhere, and there's a fictional Kadic Academy in Code Lyoko (a French animated series).

I'm kind of interested in learning more about the series and whether this github account is listing their location as the animated or real-world version of Kadic Academy. Can it count as a post for this "captions and subtitles" blog, if I do some future post on Code Lyoko? 

Back to autosub: the documentation for this version is heftier than the documentation for the other version I tried. It lists several branches I can try installing. I chose the most recent release, then installed it and used its Google Speech V2 API option like this:
$ pip install git+https://github.com/BingLingGroup/[email protected] ffmpeg-normalize
$ autosub -i movies/test2.mp4 -S en

Comparing the results

If it's helpful or interesting to you, the two resulting .srt files are posted to my github here.

After trying out agermanidis's autosub and taking a look at the .srt file, the part I most wanted to compare to BingLingGroup's autosub default output was this particular caption group. From agermanidis's autosub with default settings, this was what I got:

The actual speech is "It handles URL requests and URL mapping for you." Kinda cute result, I liked it.

So now I'm going to check the BingLingGroup's autosub. Here's what I get:

Well, it would appear to be the exact same understanding of the speech. Cool.

There are quite a few differences in how the speech was transcribed and organized into caption groups, though. I included a file that shows the differences in the github I linked above, if you would like to see.
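
For a rougher, scriptable comparison of two .srt files, here is a minimal sketch in Python using only the standard library. The function names (parse_srt, words, compare, compare_files) are made up for illustration and are not part of either autosub package. It assumes the files are ordinary UTF-8 .srt files with blank lines between caption groups. It ignores timing entirely and only looks at how many caption groups each file has and which words differ, so it is one possible starting point rather than a definitive measure of transcription quality.

```python
import difflib
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def parse_srt(text):
    """Return a list of (timing, caption_text) tuples, one per caption group."""
    captions = []
    # Caption groups in an .srt file are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        # The caption text is everything after the timing line.
        for i, line in enumerate(lines):
            if TIMING.match(line):
                captions.append((line, " ".join(lines[i + 1:])))
                break
    return captions


def words(captions):
    """Flatten caption groups into one lowercase word list, ignoring timing."""
    return " ".join(text for _, text in captions).lower().split()


def compare(srt_a, srt_b):
    """Summarize how two transcripts of the same audio differ."""
    caps_a, caps_b = parse_srt(srt_a), parse_srt(srt_b)
    words_a, words_b = words(caps_a), words(caps_b)
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    # Keep only the stretches where the two word sequences disagree.
    changes = [
        (tag, " ".join(words_a[i1:i2]), " ".join(words_b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return {
        "groups": (len(caps_a), len(caps_b)),
        "word_similarity": matcher.ratio(),
        "changes": changes,
    }


def compare_files(path_a, path_b):
    """Read two .srt files from disk and compare them."""
    with open(path_a, encoding="utf-8") as a, open(path_b, encoding="utf-8") as b:
        return compare(a.read(), b.read())
```

Calling compare_files with the paths of the two .srt files returns the number of caption groups in each file, a word-level similarity ratio between 0 and 1, and a list of the word stretches that were transcribed differently. Comparing words instead of lines is a deliberate choice here, since the two tools split the same speech into caption groups differently.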

  • What's a meaningful way for me to compare these files?
  • Are there other speech transcription APIs I can use via BingLingGroup's autosub?
  • How is this nice image useful for me (from the BingLingGroup's autosub readme):