Now On Sale!!
VOCALOID is a singing voice synthesis technology developed by Yamaha Corporation, and also the name of this software application. The software lets users enter a melody and lyrics to synthesize a singing voice, so a vocal part can be produced without a human singer. The synthesized singing is built from Singer Libraries, databases of voice fragments and phonemes recorded from actual singers. Changing the Singer Library changes the voice. Many songs and other works that users have created with VOCALOID-based products can be found on YouTube and in many other media.
Formerly, if you wanted someone to sing a song you had composed, you had to go through onerous steps such as finding and negotiating with a singer and then recording them in a studio. Now you have VOCALOID. With this tool you do not need additional devices such as a microphone or an audio interface. It serves as your exclusive vocalist whenever you want to create a song for it to sing. It can also help you create a demo song for human vocalists to practice with, or even provide sound effects for a DJ performance. VOCALOID will support your music life with its singing voice.
To start producing a singing voice using VOCALOID, you need to own both a VOCALOID editor and a library (voicebank). The editor is the composing software and singing voice synthesizing engine, and there are two types available: "VOCALOID4 Editor" and "VOCALOID4 Editor for Cubase". The library is the product that contains the voice databank, such as "CYBER DIVA".
You need to load a library into the editor to use it.
This tool not only provides instrument features which can be used for temporary tracks before a human vocalist is selected, but also offers the possibility of new unique musical expressions which humans are unable to sing.
The first VOCALOID exclusive library with an American English female voice
Flexible voice type with clear speech, powerful long tones, and smooth vocalization that can sing any genre of song.
#CYBER DIVA has only one library, so you cannot use cross synthesis for now.
VOCALOID4 Editor for Cubase runs inside Cubase. Start the editor from within Cubase to create singing data, then use Cubase's rich set of features to complete original songs by making backing tracks and mixing in VOCALOID tracks.
Operating VOCALOID4 Editor for Cubase is very simple. Using the mouse and keyboard, you can compose vocal parts by simply entering notes and lyrics on the piano roll. In addition, you can graphically adjust qualities such as the strength and brightness of the sound with the provided control parameters.
Building on the same core technologies found in Steinberg’s Cubase advanced music production software, Cubase AI is a special compact version offering all the basic tools for recording, editing and mixing everything from the basic idea to the final masterpiece. You can edit audio and MIDI data in real time, use 28 high-quality VST3 plug-in effects, play more than 180 instrument sounds with HALion Sonic SE, and use MixConsole to cover a variety of routing. Cubase AI is the perfect entry ticket into the world of computer-based music production.
VOCALOID4 Editor is stand-alone software for composing with VOCALOID.
Enter lyrics and melody lines, and import a backing track in wav format to complete your masterpiece. Enjoy creating your own song!
I'm Michael Wilson, an employee at Yamaha Corporation's Corporate Research and Development Center. Here's my story about working on CYBER DIVA:
I joined Yamaha in November 2012 to help improve English VOCALOID. The common consensus on the Internet has been that English is too difficult to realize properly in the VOCALOID engine. So one of my first tasks was to try to figure out why. Listening to a lot of examples and playing with existing English VOCALOID libraries made by third parties, a few things became clear:
1) There was a mix of UK English and American English pronunciation in our spelling to phonetic symbol converter, probably frustrating speakers of both dialects.
2) There were several cases where the sounds of the phonetic symbols didn't match the sounds that they were supposed to have. This was either due to errors in recording or errors in creating the library.
3) Noise, from processing or otherwise.
When I joined, a project was just starting to create a new Yamaha-branded American English library. This was a big deal because if the quality wasn't good enough we wouldn't release it under the Yamaha name. So there was a lot of pressure to do things well. Iriyama-san, who had worked a lot on English VOCALOID before, was the project leader and I was the "English expert" on the team.
One month after I started we recorded our first candidate singer for CYBER DIVA (of course, we didn't call it CYBER DIVA at that time). And about a week after that we recorded our second candidate singer. Iriyama-san and I made some test libraries ("Singer Libraries") from these candidates, with some help from Baba-san who would eventually make the final CYBER DIVA library.
Iriyama-san did a detailed analysis of one of the test libraries and found many cases where sounds we expected were missing, sounds we didn't expect were present, things were noisy, or coarticulation effects were very strong. He also noted some very expressive or clear sounds.
Iriyama-san also found what we called the "Aspiration Problem". In English VOCALOID we separate plosive sounds like "p" into two phonetic symbols: an "aspirated" symbol that's supposed to come at the beginning of stressed syllables, and an "unaspirated" symbol that's supposed to come at the end of a syllable or in a few other cases. What we found was that in a large number of cases our recordings didn't match the labels we had put on these sounds, resulting in some unnatural synthesis results.
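As a rough illustration of the labeling rule described above, the following Python sketch flags plosives whose label disagrees with their phonetic context. It is not the tooling used on the project. The symbol names (`ph`, `p`, and so on), the `Plosive` record, and the decision to ignore the "few other cases" are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical symbol pairs (aspirated, unaspirated); placeholder names only.
PLOSIVE_SYMBOLS = {"p": ("ph", "p"), "t": ("th", "t"), "k": ("kh", "k")}


@dataclass
class Plosive:
    sound: str              # "p", "t" or "k"
    syllable_initial: bool  # True if the plosive starts its syllable
    stressed: bool          # True if that syllable is stressed
    label: str              # symbol that was assigned to the recording


def expected_symbol(p: Plosive) -> str:
    """Simplified rule: aspirated at the start of a stressed syllable,
    unaspirated everywhere else (the 'few other cases' are ignored)."""
    aspirated, unaspirated = PLOSIVE_SYMBOLS[p.sound]
    if p.syllable_initial and p.stressed:
        return aspirated
    return unaspirated


def find_mislabeled(plosives):
    """Return the plosives whose label differs from the expected symbol."""
    return [p for p in plosives if p.label != expected_symbol(p)]
```

A check like this only compares labels with context. Confirming that the recorded audio really is aspirated or unaspirated still requires listening or acoustic analysis.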
We wanted expressive sounds, but in my thinking the first thing we needed to do was make sure we recorded the right sounds for each phonetic symbol. So I took all of February 2013 to rewrite our recording script, compacting it and strictly controlling the phonetic context so that we could fix the aspiration problem and other mislabeling problems.
In March of 2013 we did two new test recordings with two new singers, using the new recording script. The new script was shorter than the old one, but more difficult for the singer to sing. What we found from the test libraries was that the sounds were a lot more consistent, but perhaps less expressive. Nonetheless, in my estimation the synthesis results were easier to understand and we didn't have the mislabeling problems from before. There were some issues with the script we discovered during these recordings that I was able to fix.
After building two reasonable-quality test libraries from these recording sessions we had to decide if we were going to make our product with one of these or look for a new singer. After a lot of deliberation the judgment came from product planning: we should use one of the singers but coach her to sound more like the other singer. Basically we wanted a best of all worlds approach. This wasn't going to be easy, but we were going to give it our best shot.
So at the beginning of April 2013 we did two eight-hour recording sessions to run through the revised recording script three times.
There are three basic types of data in a VOCALOID library: stationaries, diphones, and triphones. Stationaries are sustained sounds, mostly vowel sounds, and for this product we recorded six different pitches which is more than any English library that has been released yet. For diphones and triphones we decided to record three pitches, which is as many as any other released English library. Some other English libraries only include two pitches. Triphones aren't strictly necessary but they can really improve naturalness. I hadn't focused on them at all, so Iriyama-san produced a supplemental recording script to record useful triphones. We ended up including 231 different triphones per pitch in the final product, more than any other English library released before.
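To make the terms concrete, here is a minimal Python sketch that lists the diphones and triphones a phoneme sequence would call for. It assumes the usual concatenative-synthesis meaning of the words (a diphone spans two adjacent phonemes, a triphone three), which the text above does not spell out. The phoneme symbols and the `sil` silence marker are placeholders, not the library's actual symbol set.

```python
def diphones(phonemes):
    """Every pair of adjacent phonemes in the sequence."""
    return list(zip(phonemes, phonemes[1:]))


def triphones(phonemes):
    """Every run of three adjacent phonemes in the sequence."""
    return list(zip(phonemes, phonemes[1:], phonemes[2:]))


# Hypothetical transcription of a one-syllable word, padded with silence.
word = ["sil", "k", "a", "t", "sil"]
needed_diphones = diphones(word)    # 4 pairs
needed_triphones = triphones(word)  # 3 triples
```

Under this reading, a library with no matching triphone can still cover the same span with two overlapping diphones, which is consistent with triphones being optional but helpful for naturalness.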
After that, we had to actually make the library. Baba-san had a lot of experience making libraries so he started working full-time on this task. There are a lot of subtle things about a language that are difficult for a non-native speaker to evaluate, so I still was involved in checking things.
Actually, this last point was a bit difficult for me at first because I was the only American in the office. So I would try to insist that we change something because the pronunciation sounded weird to me, for instance, but since it sounded fine to the other team members it wouldn't get priority. After a couple of months we hired two American English teachers living in Hamamatsu as testers, and from that point on we could outvote the Japanese team members on issues of pronunciation. But of course pronunciation isn't consistent even among Americans, so there were times when the three of us disagreed. When we agreed on something it would get fixed, even if the distinction wasn't clear to the Japanese team members.
Since tracking all of this was difficult, Iriyama-san wrote a custom bug tracker for the library. This was a big help for keeping things organized during development.
While Baba-san was making the library there were still the issues of noise and mixed pronunciation. So we tried to look into ways to improve in those areas too.
We do collaborative research with the Music Technology Group (MTG) at Universitat Pompeu Fabra, which is where the technology underlying VOCALOID got its start. We also knew that their VOCALOID libraries tended to sound very clean and natural. So I asked them what they were doing, and they mentioned that one of the things they did was make sure that EpR was estimated very well for their libraries. EpR (Excitation plus Resonance) is basically a kind of model built on top of the recorded data, which allows the data to be manipulated in many ways. I asked around and found out that we weren't doing anything special to handle this at Yamaha, so I consulted with MTG and learned their procedure for estimating EpR. We applied this to CYBER DIVA and it made a bit of a difference, smoothing out some sounds. We also started applying this procedure to libraries in other languages, so hopefully this will push the overall quality of VOCALOID up a tiny bit.
A bigger issue was pronunciation. Even though we now had pretty accurate recordings, there were still issues with converting words typed into the VOCALOID software into the phonetic symbols that get translated into sounds. As I mentioned before, there was a mix of various dialects, and there were also some words that just didn't have very good phonetic transcriptions. CYBER DIVA also includes a reduced vowel called a "schwa", but some English libraries don't have this sound, so it's not used by default. We wanted CYBER DIVA users to have the best conversion possible for the CYBER DIVA library, so we wanted the schwa to be used wherever it fit well.
The VOCALOID Editor software has a special function where you can override certain spelling-to-phoneme rules, called the User Dictionary function. After considering many possibilities, we decided to make a custom User Dictionary for CYBER DIVA to match the conversion rules as closely as possible to the library.
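The override idea can be pictured as a two-level lookup, sketched below in Python. This is only an illustration of the concept and not the editor's implementation or file format. The word lists, the phonetic strings (including `@` for the schwa), and the function name `to_phonemes` are all hypothetical.

```python
# Hypothetical built-in conversions (no schwa used by default).
DEFAULT_RULES = {
    "about": "V b aU t",
    "love": "l V v",
}

# Hypothetical user dictionary entries tuned for one library.
USER_DICTIONARY = {
    "about": "@ b aU t",  # "@" stands in for the schwa
}


def to_phonemes(word, user_dict=USER_DICTIONARY, default=DEFAULT_RULES):
    """Return the user-dictionary entry if there is one,
    otherwise the default conversion, otherwise None."""
    key = word.lower()
    if key in user_dict:
        return user_dict[key]
    return default.get(key)
```

Because the user dictionary is consulted first, a library-specific entry wins over the default rule, and words without an entry fall through to the default conversion unchanged.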
So in July 2013 I started working on the dictionary. I compared a lot of different information sources and wrote some computer programs to do some of it automatically, and did the rest by hand. Since this was for a product every word had to be listened to by a human to be included in the final version, so the two testers and I spent a good amount of time verifying the 10,222 entries one by one. Most of them went pretty quickly (I wrote a custom tool to help check things) but a few words sparked long discussions about what the right phonemes should be. If CYBER DIVA sings something slightly differently from what you expect I'm sorry; we tried to pick the best compromises we could! In the end though, the dictionary should get things a lot closer to what customers in the US would expect. And users can always edit the User Dictionary or change phonetic symbols by hand in the editor if they want a special pronunciation.
Incidentally, this dictionary is much, much larger than any User Dictionary that has been made before. When we first tried to load it we realized there was an inefficiency in the loading code that made it add several seconds to the startup time of the VOCALOID Editor. This was promptly fixed by the development team.
We finalized the user dictionary in October of 2013, and finished release testing of the library on March 13, 2014.
After that my involvement was mostly waiting to see when it would be released. So I have to thank the sales and marketing division for their work looking for ways to promote and market the product, and also the development division for their work building the installer and taking care of all the small details that are required in taking something from "complete" to shippable.
Time went by and the new version of the VOCALOID engine was set to be released. One of the new features was "Growl" but it required some additional samples, so we did another short recording session with the original singer to get some growl sounds in August 2014 and added them to the library. And now CYBER DIVA is slated to be released for the VOCALOID4 engine in early 2015.
And that's one story of how CYBER DIVA was made.
Just telling it like that doesn't really show how uncertain the project was for most of its development.
From the start there wasn't a lot of confidence that we would make something good. I remember an early planning meeting, well before CYBER DIVA was completed, where we were projecting sales. We were planning to sell more in Japan than overseas. And this was for an American English accented library with no Japanese library attached, so that should say something about how confident we were in making something that would appeal to a Western sensibility! Our initial target was to have 80% of the lyrics in a given song understandable, which, if you think of it as "one out of five words is unintelligible", isn't that great. Even as late as December 2014 we were told by our overseas marketing division that if we didn't take care of some promotional material details, they didn't think it would be wise to launch the product.
So it has been an uphill battle the whole way. We were doing a lot of new things, which is always prone to failure, but whenever it seemed like we'd never make it everyone just kept pushing until we got through. The testers spending hours and hours listening to words and phrases and recording bugs. Baba-san going over the same samples countless times trying to get a result that sounded good. Iriyama-san keeping the whole thing together. Even though we know CYBER DIVA isn't a perfect reproduction of a human voice, it's a step forward for English VOCALOID in a lot of ways and the product of a lot of love and effort. I'm really looking forward to hearing what people make with it!