As you can see, there are no longer extra spaces around the English words in the output. Being able to supply a prompt before transcribing the audio, to steer the model toward better speech recognition, is one of Whisper's highlights. If you expect the audio to contain proper nouns that the model is likely to get wrong, you can simply add them to the prompt. For example, in the transcription above the model misheard ChatGPT as ChatGBT, and Google's PaLM model came out as POM, with its full name, Pathways Language Model, missing an s. To fix these mistakes, we only need to adjust the prompt and the model transcribes them correctly.
audio_file = open("./data/podcast_clip.mp3", "rb")
translated_prompt = """This is a podcast discussing ChatGPT and PaLM model. The full name of PaLM is Pathways Language Model."""
transcript = openai.Audio.translate("whisper-1", audio_file, prompt=translated_prompt)
print(transcript['text'])
Output:
Welcome to Onboard. Real first-line experience. New investment thinking. I am Monica. I am Gao Ning. Let's talk about how software can change the world. Hello everyone, welcome to Onboard. I am Monica. Since the release of ChatGPT by OpenAI, the world's AI has been in a frenzy. In less than three months, it has accumulated more than 100 million active users, and more than 13 million active users. It really shows the amazing ability of AI. It also makes many people say that this is the future of the next Internet. Many viewers said that they wanted us to do another AI discussion. So this discussion came. This time we invited a researcher from Google Brain, Xue Zhi. He is one of the authors of Google's large-scale model PaLM, Pathways Language Model. You should know that the number of parameters of this model is three times more than ChatGPT-3. In addition, there are two AI product big cows. One is from the famous company behind Stable Diffusion, Stability AI. The other is from a Silicon Valley technology factory. He was also the product manager in Professor Wu Wenda's Landing AI. In addition, Monica also invited a friend of AI who has been paying attention to AI, Bill, as my special guest host. We mainly discuss several topics. On the one hand, from the perspective of research, what are the most cutting-edge researchers paying attention to? Where are the cutting-edge technologies and the large variables of the future? From the perspective of products and business, what is a good AI product? What kind of evolution may the whole state follow? More importantly, what can we learn from the previous wave of AI entrepreneurship? Finally, Monica and Bill will also make a review, summary and reflection from the perspective of investors. Here is a small update. When this issue was released, Google also responded to the explosive growth of ChatGPT. We are testing an Apprentice Bot based on Lambda model. What kind of surprises will be released? We are looking forward to it. AI is undoubtedly one of the most exciting variables in the coming years. Monica also hopes to invite more first-line entrepreneurs to discuss this topic from different angles. Whether you want to do entrepreneurship, research, product or investment, I hope these conversations will help you understand the possibilities of these technical horizons and business. Even in the future, it can cause some thoughts and inspire us to think about what it means to each person and each society. This discussion is a bit technical, and requires you to have some basic understanding of the biometric AI model. The papers and important concepts involved in the discussion will also be summarized in this episode's summary, which is for your reference. You have worked in North America for many years, and you may have some English mistakes. Please understand. Welcome to the future. Enjoy. Let me give you a brief introduction. Some of your past experiences. A fun fact. Using an AI to represent the world is now palped.
The Whisper API limits each uploaded audio file to 25 MB, so a long podcast has to be split into smaller pieces first. Here we cut it into 15-minute chunks with PyDub (the source file path below is just an example; point it at your own long audio file):

from pydub import AudioSegment

# load the full-length podcast; the path is an example
podcast = AudioSegment.from_mp3("./data/podcast_long.mp3")

# PyDub handles time in milliseconds
fifteen_minutes = 15 * 60 * 1000

total_length = len(podcast)

start = 0
index = 0
while start < total_length:
    end = start + fifteen_minutes
    if end < total_length:
        chunk = podcast[start:end]
    else:
        chunk = podcast[start:]
    with open(f"./data/podcast_clip_{index}.mp3", "wb") as f:
        chunk.export(f, format="mp3")
    start = end
    index += 1
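Before sending anything to the API, it is worth confirming that every exported chunk really stays below the 25 MB upload limit. The check below is a small sanity-check sketch of my own, not part of the original walkthrough; it assumes the chunks were written to ./data/ as above.

import os

# Sanity check (my addition): each chunk must stay under the
# Whisper API's 25 MB upload limit.
for i in range(index):
    path = f"./data/podcast_clip_{i}.mp3"
    size_mb = os.path.getsize(path) / (1024 * 1024)
    print(f"{path}: {size_mb:.1f} MB")
    assert size_mb < 25, f"{path} is too large for the Whisper API"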
Once the splitting is done, we transcribe the audio files one by one; the corresponding code is below.
prompt = "这是一段Onboard播客,里面会聊到ChatGPT以及PALM这个大语言模型。这个模型也叫做Pathways Language Model。" for i inrange(index): clip = f"./data/podcast_clip_{i}.mp3" audio_file= open(clip, "rb") transcript = openai.Audio.transcribe("whisper-1", audio_file, prompt=prompt) # mkdir ./data/transcripts if not exists ifnot os.path.exists("./data/transcripts"): os.makedirs("./data/transcripts") # write to file withopen(f"./data/transcripts/podcast_clip_{i}.txt", "w") as f: f.write(transcript['text']) # get last sentence of the transcript sentences = transcript['text'].split("。") prompt = sentences[-1]
Instead of calling the API, we can also run the open-source Whisper model ourselves, here on Colab:

import whisper

model = whisper.load_model("large")
index = 11  # number of files to transcribe
def transcript(clip, prompt, output):
    result = model.transcribe(clip, initial_prompt=prompt)
    with open(output, "w") as f:
        f.write(result['text'])
    print("Transcribed: ", clip)
original_prompt = "这是一段Onboard播客,里面会聊到ChatGPT以及PALM这个大语言模型。这个模型也叫做Pathways Language Model。\n\n"
prompt = original_prompt
for i in range(index):
    clip = f"./drive/MyDrive/colab_data/podcast/podcast_clip_{i}.mp3"
    output = f"./drive/MyDrive/colab_data/podcast/transcripts/local_podcast_clip_{i}.txt"
    transcript(clip, prompt, output)
    # get last sentence of the transcript; use a new variable name here
    # so we don't shadow the transcript() function defined above
    with open(output, "r") as f:
        text = f.read()
    sentences = text.split("。")
    prompt = original_prompt + sentences[-1]
One thing worth noting: like the other open-source models we looked at earlier, Whisper comes in several sizes, and the argument you pass to load_model decides which one is loaded. Here we use the largest, large, which needs roughly 10 GB of GPU memory. The GPU Colab provides is an NVIDIA T4 with 16 GB, so that is more than enough.
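If your GPU has less memory, you can fall back to a smaller checkpoint. The sketch below is my own illustration, not from the original text: the model names are the ones whisper.load_model accepts, and the memory thresholds are rough approximations (around 10 GB for large, roughly 5 GB for medium), not official figures.

import torch
import whisper

# Rough illustration (my addition): pick a Whisper checkpoint based on the
# available GPU memory. The thresholds are approximate.
if torch.cuda.is_available():
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024 ** 3
else:
    vram_gb = 0.0

if vram_gb >= 10:
    model_name = "large"   # ~10 GB of VRAM, best accuracy
elif vram_gb >= 5:
    model_name = "medium"
else:
    model_name = "base"    # small enough to run even on CPU, lower accuracy

model = whisper.load_model(model_name)
print(f"Loaded the '{model_name}' model ({vram_gb:.1f} GB of GPU memory available)")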
from langchain.chat_models import ChatOpenAI
from langchain.text_splitter import SpacyTextSplitter
from llama_index import GPTListIndex, LLMPredictor, ServiceContext, SimpleDirectoryReader
from llama_index.node_parser import SimpleNodeParser