Thanks to @mxincredible for alerting me to this. Think safety videos for air flights are boring? Take a look at this one from Virgin America with some great visuals used in the subtitles:
Something I have not blogged much about to date is the topic of machine translation and its use within a subtitling context. Having read about a project titled SUMAT, I was lucky enough to put some questions on this topic to Yota Georgakopoulou:
Q1: What does SUMAT stand for? (Is it an acronym?)
Yes, it stands for SUbtitling by MAchine Translation.
Q2: How is SUMAT funded and what industries/companies are involved?
SUMAT is funded by the European Commission through Grant Agreement nº 270919 of the funding scheme ICT CIP-PSP – Theme 6, Multilingual Online Services.
There are a total of nine legal entities involved in the project. Four of them are subtitling companies, four are technical centres in charge of building the MT systems we are using in the project, and the ninth is responsible for integrating all systems in an online interface through which the service will be offered.
Q3: Can you give us a little bit of information on your background and what your involvement in SUMAT has been to date?
I have been working in translation and subtitling ever since I was a BA student in the early 90’s. I was working in the UK as a translator/subtitler, teaching and studying for a PhD in subtitling at the time of the DVD ‘revolution’, with all the changes it brought to the subtitling industry. This was when I was asked to join the European Captioning Institute (ECI), to set up the company’s translation department that would handle multi-language subtitling in approximately 40 languages for the DVD releases of major Hollywood studios. That’s how my career in the industry began. It was a very exciting time, as the industry was undergoing major changes, much like what is happening today.
Due to my background in translation, I was always interested in machine translation and was closely following all attempts to bring it to the subtitling world. At the same time, I was looking for a cost-effective way to make use of ECI’s valuable archive of parallel subtitle files in 40+ languages, and the opportunity came up with the SUMAT consortium. ECI has since been acquired by Deluxe, who saw the value of the SUMAT project and brought further resources to it. Our involvement in the project has been that of data providers, evaluators and end users.
Q4: Machine Translation (MT) already has some history of being used to translate traditional text. Why has machine translation not been put to use for translating subtitles?
Actually, it has. There have been at least two other European projects which have attempted to use machine translation as part of a workflow that was meant to automate the subtitling process: MUSA (2002-2004) and eTITLE (2004-2006). Unfortunately, these projects were not commercialized in the end. Part of the reason for this is likely to be that the MT output was not of good enough quality for a commercial setting. As professional quality parallel subtitle data are typically the property of subtitling companies and their clients, this is not surprising. The SUMAT consortium invested a large amount of effort at the beginning of the project harvesting millions of professional parallel subtitles from the archives of partner subtitling companies, then cleaning and otherwise processing them for the training of the Statistical Machine Translation (SMT) systems our Research and Technical Development (RTD) partners have built as part of the project.
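The harvesting and cleaning work described above is essentially a corpus-alignment problem: matching each source-language subtitle to its counterpart in the target-language file so the pairs can feed an SMT training pipeline. As a rough illustration only (this is not the SUMAT consortium's actual pipeline; the function names and overlap threshold are my own assumptions), parallel SRT files can be paired by timecode overlap:

```python
import re

# Matches an SRT timecode line, e.g. "00:00:01,000 --> 00:00:03,000"
TIME = re.compile(
    r"(\d+):(\d+):(\d+)[,.](\d+)\s*-->\s*(\d+):(\d+):(\d+)[,.](\d+)")

def parse_srt(srt_text):
    """Return a list of (start_sec, end_sec, text) cues from SRT text."""
    cues = []
    for block in srt_text.strip().split("\n\n"):
        lines = [l.strip() for l in block.strip().splitlines()]
        for i, line in enumerate(lines):
            m = TIME.match(line)
            if m:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000.0
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000.0
                cues.append((start, end, " ".join(lines[i + 1:])))
                break
    return cues

def align_by_overlap(src_cues, tgt_cues, min_overlap=0.5):
    """Pair source/target cues whose on-screen time overlaps by at least
    `min_overlap` seconds -- a rough proxy for 'same dialogue'."""
    pairs = []
    for s_start, s_end, s_text in src_cues:
        for t_start, t_end, t_text in tgt_cues:
            overlap = min(s_end, t_end) - max(s_start, t_start)
            if overlap >= min_overlap:
                pairs.append((s_text, t_text))
    return pairs

if __name__ == "__main__":
    en = "1\n00:00:01,000 --> 00:00:03,000\nHello there."
    fr = "1\n00:00:01,200 --> 00:00:03,100\nBonjour."
    print(align_by_overlap(parse_srt(en), parse_srt(fr)))
```

Real pipelines would also need to handle re-timed files, merged or split cues and formatting tags, which is presumably where much of the cleaning effort the project describes goes.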
Q5: Some readers might be concerned that a machine could never replace the accuracy of a human subtitler translating material. What is your response to that concern?
Well, actually, I also believe that a machine will never replace a subtitler – at least not in my lifetime. MT is not meant to replace humans, it is simply meant to be another tool at their disposal. Even if machines were so smart that they could translate between natural languages perfectly, the source text in the case of film is the video as a whole, not just the dialogue. The machine will only ‘see’ the dialogue as source file input, with no contextual information, and will translate just that. Would a human be able to produce great subtitles simply by translating from script without ever watching the film? Of course not. Subtitling is a lot more complex than that. So why would anyone expect that an MT system could be able to do this? I haven’t heard anyone claiming this, so I am continuously surprised to see this coming up as a topic for discussion. I think some translators are so afraid of technology, because they think it will take their jobs away or make their lives hard because they will have to learn how to use it, that they are missing the point altogether: MT is not there to do their job, it is there to help them do their job faster!
Q6: Is the technology behind SUMAT similar to that used by YouTube for its ‘automated subtitles’?
Yes, in a way. YouTube also uses SMT technology to translate subtitles. However, the data YouTube’s SMT engines have been trained with is different. It is not professional quality subtitle data, but vast amounts of amateur quality subtitle data found on the internet, coupled with even larger amounts of any type of parallel text data found on the web and utilized by Google Translate. Also, one should bear in mind that many ‘issues’ found in YouTube subtitles, such as poor subtitle segmentation, are a result of the input text, which in some cases is an automatic transcription of the source audio. Thus, errors in these transcriptions (including segmentation of text in subtitle format) are propagated in the ‘automatic subtitles’ provided by YouTube.
SUMAT also uses SMT engines built with the Moses toolkit. This is an open source toolkit that has been developed as part of another EU-funded project. In SUMAT, the SMT engines have been trained with professional quality subtitle data in the 14 language pairs we deal with in the project, and supplemented with other freely available data. Various techniques have been used to improve the core SMT systems (e.g. refined data selection, translation model combination, etc.), with the aim of ironing out translation problems and improving the quality of the MT output. Furthermore, the MT output of SUMAT has been evaluated by professional subtitlers. Human evaluation is the most costly and time-consuming part of any MT project, and this is why SUMAT is so special: we are dedicating almost an entire year to such human evaluation. We have already completed the 1st round of this evaluation, where we focused on the quality output of the system, and we have now moved on to the 2nd round which focuses on measuring the productivity gain that the system helps subtitlers achieve.
Q7: Why do you think machine translation is needed in the field of subtitling?
I work in the entertainment market, and there alone the work volumes in recent years have skyrocketed, while at the same time clients require subtitle service providers to deliver continuous improvement on turnaround times and cost reduction. The only way I see to meet current client needs is by introducing automation to speed up the work of subtitlers.
Aside from entertainment material, there is a huge amount of other audiovisual material that needs to be made accessible to speakers of other languages. We have witnessed the rise of crowdsourcing platforms for subtitling purposes in recent years specifically as a result of this. Alternative workflows involving MT could also be used in order to make such material accessible to all. In fact, there are other EU-funded projects, such as transLectures and EU-Bridge, which are trying to achieve this level of automation for material such as academic videolectures, meetings, telephone conversations, etc.
Q8: How do you control quality of the output if it is translated by a machine?
The answer is quite simple. The output is not meant to be published as is. It is meant to be post-edited by an experienced translator/subtitler (a post editor) in order for it to reach publishable quality. So nothing changes here: it is still a human who quality-checks the output.
However, we did go through an extensive evaluation round measuring MT quality in order to finalise the SMT systems to be used in the SUMAT online service, as explained below. The point of this evaluation was to measure MT quality, pinpoint recurrent and time-consuming errors and dedicate time and resources to improving the final system output quality-wise. Retraining cycles of MT systems and other measures to improve system accuracy should also be part of MT system maintenance after system deployment, so that new post-edited data can be used to benefit the system and to ensure that the quality of the system output continues to improve.
Q9: How do you intend to measure the quality/accuracy of SUMAT?
We have designed a lengthy evaluation process specifically to measure the quality and accuracy of SUMAT. The first round of this evaluation was focused on quality: we asked the professional translator/subtitlers who participated to rank MT output on a 1-5 scale (1 being incomprehensible MT output that cannot be used, and 5 being near perfect MT output that requires little to no post-editing effort), as well as annotate recurrent MT errors according to a typology we provided, and give us their opinion on the MT output and the post-editing experience itself. The results of this evaluation showed that over 50% of the MT subtitles were ranked as 4 or 5, meaning little post-editing effort is required for the translations to reach publishable quality.
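To make figures like "over 50% ranked 4 or 5" concrete, here is a minimal sketch (using made-up rankings, not SUMAT's actual evaluation data) of how a 1–5 ranking exercise like the one described might be tallied:

```python
from collections import Counter

def rank_summary(rankings):
    """Summarise 1-5 MT quality rankings: the score distribution plus the
    share of subtitles ranked 4 or 5 (needing little or no post-editing)."""
    counts = Counter(rankings)
    total = len(rankings)
    usable = counts[4] + counts[5]
    return {
        "distribution": {score: counts.get(score, 0) for score in range(1, 6)},
        "pct_usable": 100.0 * usable / total if total else 0.0,
    }

# Illustrative rankings only -- not actual SUMAT evaluation data.
sample = [5, 4, 4, 3, 5, 2, 4, 5, 1, 4]
summary = rank_summary(sample)
print(summary["pct_usable"])  # 70.0 for this made-up sample
```

In a real evaluation the rankings would of course come per subtitle and per language pair, alongside the error annotations and free-form comments mentioned above.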
At the second and final stage of evaluation that is currently under way, we are measuring the benefits of MT in a professional use case scenario, i.e. checking the quality of MT output indirectly, by assessing its usefulness. We will thus measure the productivity gain (or loss) achieved through post-editing MT output as opposed to translating subtitles from a template. We have also planned for a third scenario, whereby the MT output is filtered automatically to remove poor MT output, so that translators’ work is a combination of post-editing and translation from source. One of the recurrent comments translators made during the first round of evaluation was that it was frustrating to have to deal with poor MT output and that there was significant cognitive effort involved in deciding how to treat such output before actually proceeding with post-editing it. We concluded it was important to deal with such translator frustrations as they may have a negative impact on productivity and have designed our second round of experiments accordingly.
Q10: Are there any examples of translated subtitles created by SUMAT?
Yes, the SUMAT demo is live and can be found on the project website (www.sumat-project.eu). Users can upload subtitle files in various subtitle formats and they will be able to download a machine translated version of their file in the language(s) they have selected. We have decided to limit the number of subtitles that can be translated through the demo, so that people do not abuse it and try to use it for commercial purposes.
Q11: Does SUMAT have a role to play in Same Language Subtitles for Access? (Subtitles for the Deaf and HOH)
No. SUMAT is a service that offers automation when one needs to translate existing subtitles from one language to another and presupposes the existence of a source subtitle file as input.
Q12: You recently gave a workshop for SUMAT at the Media For All conference, can you tell us a little bit about the results of the workshop?
The workshop at Media for All was the culmination of our dissemination efforts and the first time the SUMAT demo was shown to professionals (other than staff of the subtitling companies that are partners in this project). These professionals had the chance to upload their own subtitle files and download machine-translated versions thereof. There were approximately 30 participants at the workshop, who were first briefed on the background of the project, the way the MT systems were built and automatically evaluated, as well as on the progress of our current evaluation with professional translators.
In general, participants seemed impressed with the demo and the quality of the MT output. Representatives of European universities teaching subtitling to their students acknowledged that post-editing will have an important role to play in the future of the industry and were very interested in hearing our thoughts on it. We were also invited to give presentations on post-editing to their students, some of which have already been scheduled.
Q13: Where can readers go to find out more about this project?
The best source of information on the project is the project website: http://www.sumat-project.eu. We have recently re-designed it, making it easier to navigate. One can also access our live demo through it and will eventually be able to access the online service itself.
Q14: Is there anything readers can do if they wish to get involved in the project?
Although the project is almost complete, with less than half a year to go, contributions are more than welcome both until project end and beyond.
Once people have started using the live demo (or, later on, the service itself), any type of feedback would be beneficial to us, especially if specific examples of files, translations, etc. are mentioned. We plan to continue improving our systems’ output after the end of the project, as well as add more language pairs, depending on the data and resources we will have available. As we all know, professional human evaluation is time-consuming and costly, so we would love to hear from all translators that end up using the service – both about the good and the bad, but especially about the bad, so we can act on it!
Q15: If you could translate any subtitling of your choice using SUMAT what would it be?
Obviously MT output is most useful to the translator when its accuracy is at its highest. From our evaluation of the SUMAT systems so far, we have noticed trends that indicate that scripted material is translated with higher accuracy than unscripted material. This is something that we are looking at in detail during the second round of evaluations that are now underway, but it is not surprising. MT fares better with shorter textual units that have a fairly straightforward syntax. If there are a great many disfluencies, as one typically finds in free speech, the machine may struggle with these, so I’m expecting our experiments to confirm this. I suppose we will need to wait until March 2014, when our SUMAT evaluation will be completed, before I can give you a definite answer to this question.
Thanks again to Yota for agreeing to the Q&A and for providing such informative answers.
This webcast posted by the Society of Motion Picture and Television Engineers (SMPTE) is a good introduction to current US captioning regulatory requirements and to new requirements due to come into play in the USA. All US broadcasters must caption online content that has previously been broadcast on linear TV by the end of this month. This includes pre-recorded content that has been edited for broadcast online. By March 2014, this will also apply to live and near-live content. Whilst the webcast is US-centric, the technical problems and solutions it discusses around captioning formats for online and multi-platform broadcast content are relevant to all global broadcasters. The webcast covers both pre-recorded/block-style captioning and live subtitling. It is captioned and you can view it below:
Here are two fun videos that illustrate two very different results when captioning music.
The first is a lyric video of One Direction lyrics as captioned by YouTube’s auto-captioning system. (You can also view the results for Taylor Swift’s lyrics.)
Machine translation does have a role to play in providing access and, despite these funny videos, it continues to improve, but that is for another blog post.
Continuing on, compare the above with the fantastic skill of this stenographer and watch them subtitle Eminem’s Lose Yourself in real-time (music starts at 1:35 in).
Stenography is also used to caption/subtitle live television – see #subtitlefail! TV
A lot of visitors to this blog arrive using search terms that seem to relate to using subtitles to learn a second language, so depending on the feedback I get I may later turn this blog post into a reference page. My question is: how many good resources are there on the web that help with this? Of course anyone can watch a DVD and/or download a subtitle file in the language they are trying to learn, but what about other resources?
Here are some I have found to date. I cannot vouch for their usefulness, since I am not using any of them to learn a language, but I have included them because they offer something more than just a subtitle file.
An interesting website for learning Chinese.
Clip Flair describes itself as Foreign Language Learning through Interactive Captioning and Revoicing of Clips. It is an online tool that allows users to create clips, revoice them, and subtitle them. The video below demonstrates how it works. (Note: there is no audio dialogue on this video)
Anyone learning English as a second language who is also a music lover might want to check out the musicESL YouTube channel and the website MusicEnglish for collections of subtitled music videos. If music is not your thing, then Voice of America (VOA) has captioned YouTube videos for viewers to learn American English, and much more, with captioned news reports read at a slower speed.
If anyone else knows of any good online resources please comment and share. Thanks!
A Video On Demand service that is fully subtitled from launch, with the aim of eventually also providing fully BSL-signed movies. Imagine that! Well, one business entrepreneur, Shaun Sadlier, is planning to do just that through Films14. Read the Q&A with Shaun below and watch the video for more information:
Q: Your service is called Films14. Is there a story behind the name?
A: I was looking for a name that is easy to remember, with a maximum of seven letters or numbers. Films is what we provide, and 14 references 2014, when we want to launch.
Q: You are based in the UK but the internet is global. Can anyone sign up to Films14 or is it UK residents only?
A: That’s correct, we are a global brand, but we are starting out in the UK, and if it goes well then we will expand across the world. Anyone can sign up, but the service is for UK residents only; anyone who isn’t a UK resident will have to wait for us to come over.
Q: Can you reveal what content there will be available to watch?
A: We’ve got two types of content: Subscription and On Demand. There will be 50+ movies/TV shows in the first month, and an additional 50 or more every month, for Subscription. There will be 60+ blockbuster movies every year for On Demand.
Q: The subscription content – does that cost extra to access it in addition to the monthly fee? Or does the monthly fee give you access to the subscription content?
A: No, it will not cost extra. The monthly fee gives you access to the subscription content and a discount on blockbuster movies from On Demand.
Q: Are there any benefits to signing up in advance of the Films14 launch?
A: Yes, there are several benefits:
1. £4.99 for the first month and then £6.99 monthly
2. Access to subscription movies and TV series (50+ movies and TV series added every month)
3. Discounted blockbuster movies On Demand (60+ new movies a year)
4. Membership can be cancelled after the first month
5. Pay nothing until launch
6. 100% subtitles and an in-vision signer for sign language (with an on/off feature!) – a world first!
7. A mystery gift on launch day for pre-launch members only
About the mystery gift:
1. If we get over 20,000 UK residents signed up before launch, then pre-launch members will pay £4.99 monthly for life.
2. If we get over 50,000 UK residents signed up before launch, then pre-launch members will pay £3.99 monthly for life.
3. If we get over 150,000 UK residents signed up before launch, then pre-launch members will pay £2.99 monthly for life.
Q: How is this service funded?
A: This service will be funded by crowdfunding and then by membership sign-ups in the first month after launch. Our Seed Enterprise Investment Scheme and Enterprise Investment Scheme applications are currently pending, which takes 4 to 6 weeks.
Q: How will the subtitles be provided, are you creating them?
A: Our content distributors provide movies with subtitles included. I won’t accept any movie or TV show without subtitles available because, in my view, it is junk.
Q: How will the BSL be provided, are you creating them?
A: I have a studio which I can use and can hire professional BSL signers, but it will take a lot of time to edit them, so I am looking around for a professional company that can offer a good deal.
Q: Will all content released on the website have subtitles and BSL immediately?
A: All content will have subtitles immediately, and BSL will start out with a few titles because it is very expensive and the technology is new. Eventually, all movies will have sign language included. That’s our mission.
Q: What are the challenges you are facing in getting this service up and running?
A: The most challenging part is getting as many subscribers as possible to cover the costs and the in-vision signer feature. I am very confident it will go OK.
Q: Will you be able to watch the content on all internet enabled devices or desktop and laptops only?
A: It will work on PlayStation 3, Wii, iPad and any device with an internet connection and a screen, because we are going to use an HTML5 video player.
Q: What can readers do to help get the service up and running?
A: Readers can help us find weaknesses in our service – and sign up, please!
Q: What is your favourite subtitled content?
A: 100% subtitles with options for size, colour and background colour to suit the viewer’s needs. I don’t have a favourite subtitled movie because I love so many movies that it is very difficult to choose. But I mostly watch sci-fi, horror, thriller, adventure and drama. Sometimes comedy.
Q: What is your favourite BSL content?
A: An in-vision signer with an on/off feature. We are going to start with British Sign Language, and when we expand to the USA we will add American Sign Language. Americans are excited and want us to come over – Australia as well! I don’t have a favourite British Sign Language movie because I haven’t seen one yet; we don’t get 24/7 access to entertainment, and access is currently very limited. When a movie with an in-vision signer is shown on TV, it is normally at 2am, which is frustrating for us. And some BSL TV series can only be watched on a PC or laptop, which limits the devices you can use. Our company, by contrast, offers 24/7 access: you can watch anytime, anywhere, on any device with an internet connection and a screen. It will also be the fastest way to watch movies.
Q: Why do you think current content providers are so slow at providing access?
A: They don’t think about how important our access needs are, because they don’t see how we have felt after all these years. I feel so frustrated at having limited access to entertainment, and it is getting worse. So, here I am.
Q: Is there anything else you’d like readers to know about Films14?
A: Films14 is a Deaf-led company, and we know what we need to access the enjoyment of movies and TV series. Also, we are the world’s first to have sign language with an on/off feature – just like subtitles.
All the best!
Shaun has already made a BSL signed and subtitled video explaining the service which you can watch on the Films14 website or watch it below:
Readers who are keeping up to date with subtitling solutions and projects might be pleased to know that the former Indiegogo project for a subtitling solution for cinemas is a project now being developed by GeoJaX Ltd and Mystery Technology LLP.
Entrepreneur George Georgiou and inventor Jack Ezra have teamed up to form GeoJaX Ltd and Mystery Technology LLP, which will develop an “Off-Screen Cinema Subtitle System” for the deaf and hard of hearing. The development work will be carried out in Sri Lanka, China and the UK over the coming months, with a fully working system hopefully being tested in October/November 2013. The Off-Screen Cinema Subtitle System uses a special display under the movie screen which is invisible to the general audience until you wear special lightweight glasses; the subtitles are then viewable to anyone in the audience wishing to see them.
I was lucky enough to be shown a prototype of the technology last week. It had already been built as a demo on a laptop, and I was shown what appeared to be a blank screen. However, as soon as I put on a standard pair of 3D glasses (the same kind worn for 3D movies at the cinema now) I could see letters and numbers displayed across the screen. It was great to see a real working example of a technology I had heard described as a potential way of displaying subtitles at the cinema that are viewable only to those who wish to see them. It is the closest experience I have had where a technology other than open captions still gives the same feel as open captions, or as switching on the subtitles on the television or a DVD. The text was easy to read and the glasses comfortable to wear. The next step will be for the company to build a fully working system and get feedback. I for one will be keeping an eye on the progress of this project. And I am not the only one – industry professionals such as Regal in the USA and, in the UK, the Cinema Exhibitors Association and Cineworld, have offered their help to the new venture in the form of feedback, testing and promotion of the technology.
I have blogged previously about this popular BBC TV series before and how Sherlock uses visual text on screen as part of the storytelling process (this is actually one of my most popular posts for hit counts!). Last week the BBC did something rather cool involving the subtitles for the deaf and hard of hearing for this series.
On Friday 12th July the BBC scheduled a repeat of an episode and urged viewers to tune in to look for clues, not included in any previous broadcast of the episode, that would give fans a sneak peek at an episode title for the next series, due to air later this year or at the beginning of 2014.
Like previous broadcasts, the episode was subtitled for the deaf and hard of hearing. But for this repeat only, the subtitles also displayed, in the top left-hand corner, letters that acted as clues for viewers as part of the promotion to encourage repeat viewing and speculation about the new series. This had nothing to do with providing access, but it was a fitting way to use subtitles as part of a promotional campaign for the series. If you were watching without the subtitles switched on you would have missed the clues, but this is entirely fitting for a campaign urging fans to think outside the box and consider all their options. Below are screenshots showing the letters being displayed in the subtitles in the top left of the screen, positioned so as not to be confused with any of the subtitled audio dialogue:
It spells HIS, now fans just have to work out its significance in terms of the episode title.
Back in March 2013, some of us, myself included, were lucky enough to take part in a trial of some personalised technology that provides subtitles in cinemas. The trial took place in London and was organised by the Cinema Exhibitors Association, which has now shared the results with those who attended. I have summarised the main points below:
The project was designed to gather:
- Findings from a demonstration of four of the leading CC technologies for interested industry partners;
- Initial and headline structured feedback from a small sample of people with varying degrees of hearing loss on their experience of using the systems;
- And preliminary feedback from an operator perspective on the potential management, practical and technical considerations around each of the systems.
The suppliers and products involved were:
- Doremi – Captiview for CC, and Fidelio for audio description (AD) and hearing assist.
- Sony – Entertainment Access Glasses (SEAG) for CC and connecting headphones for AD and hearing assist.
- USL – Captionwear glasses and screens for CC and connecting headphones for AD and hearing assist.
While the AD functionalities of the products were part of the industry showcase, the audience screenings concentrated solely on CC, that being the technology which offers something completely new for customers.
For more details, read the CEA’s published report. Now that this detail has finally been released, I can talk more freely about the device I got to test. I was given the Captiview device to watch the movie Wreck-It Ralph. The good thing about it was that the subtitles worked and were pretty accurate, with the exception of a few letters dropping off the ends of words at the end of a line on the screen. They were easy to follow for someone used to reading subtitles, but trying to watch the action on screen at the same time is much harder, so the movie experience itself was not as immersive as it would otherwise have been, through no fault of the movie itself. More recently, whilst on holiday in the United States, I got to use the device again in a real screening, for Iron Man 3:
I got a few strange looks from some people in the cinema who clearly hadn’t seen this device being used before, but that didn’t bother me. What did bother me was the fact that I couldn’t get the device positioned correctly. Why? Because the device is supposed to sit in the cup holder on your seat, except in this cinema it didn’t fit correctly. This made it an even worse experience than during the trial, where the device was fitted for me, correctly, before I sat in my seat. Again, whilst the subtitles were accurate, it’s the practicality of using the device that left me feeling a bit disheartened. For a start, you collect a device at the point where you purchase your ticket and then have to carry it around. It is not very heavy, but it is bulky. Try juggling that whilst also purchasing popcorn; and what if you want a toilet visit prior to being allowed into the cinema to take your seat? What do you do with the piece of kit you are carrying around? (I hope the cinemas that provide these devices consider hygiene and wipe them clean after each use.)
Back in the UK, open subtitled cinema screenings have been a bit of a mixed bag. I failed to get to see Star Trek Into Darkness with subtitles because the advertised subtitled screening I wanted to go to was cancelled. More recently, though, I did successfully get to a subtitled screening of Man of Steel. As a lifelong fan of Superman, Man of Steel is actually the first Superman-related subtitled cinema screening I have ever attended. Being able to hear all the dialogue at the cinema, rather than turning on the subtitles months later on DVD after struggling to watch without them, is a complete joy and something I suspect hearing people take for granted (I can’t tell you the number of movies I’ve re-watched on DVD with the subtitles on after the cinema release, only to find myself thinking ‘Oh, so that’s what they said, now I get it!’).
Will the UK see personalised subtitling solutions in cinemas? The CEA don’t have an answer for that just yet. Since the feedback from the trials was mixed and sometimes conflicting, I hope there will be more trials to come before a commitment is made to the right technological solution. The CEA have said that if/when there is further progress they will make this known, so keep an eye on the CEA website.
Ever wonder what YouTube’s auto-captioning would make of Taylor Swift’s song lyrics? Wonder no more, thanks to this funny video from Rhett and Link.