MusicGen
About
MusicGen is a versatile AI model for creating original music from text descriptions or audio examples. It lets users specify genre, mood, tempo, and instrumentation to produce polished background tracks, jingles, or full musical ideas. You can either type a prompt (for example: “upbeat lo-fi guitar with mellow keys, 90 BPM”) or supply an audio clip to continue or mimic a style — ideal for remixes, mashups, or extending short melodies. MusicGen supports both text-to-audio and audio-to-audio workflows, giving creators practical ways to prototype and finalize music quickly.
Designed for ease of use, MusicGen is suitable for hobbyists, video producers, game developers, and sound designers who need fast, customizable music without deep music-production experience. It outputs high-quality samples that are ready for use in videos, podcasts, games, or demo tracks, and offers controls over tempo, mood, and instrumentation to match project needs. Multiple model sizes let you balance quality and compute resources, and the model was trained on a broad set of licensed music to produce diverse results.
What makes MusicGen particularly useful is its blend of quality, controllability, and support for audio references — you can get both fresh compositions from text and faithful continuations of an existing piece. Limitations to be aware of: vocal synthesis can be less realistic than instrumental output, and prompts in languages other than English may produce variable results depending on training coverage. Overall, MusicGen streamlines music creation, enabling fast iteration and creative exploration across personal and commercial projects.
Perks
High quality
Controllable
Supports references
Settings
BPM- Beats per minute. Sets the tempo of the generated track.
Track duration- The length of the generated track.
Model version- The model variant to use. Different versions produce different-sounding results.
Normalization Strategy- Strategy for normalizing audio
Temperature- The temperature of the model. Higher values make the model more creative and lower values make it more focused.
Top K- Restricts sampling to the K most likely tokens at each step. Higher values consider more candidates and produce more diverse outputs.
Top P- Restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches p. When set to `0` (default), Top K sampling is used instead.
Classifier Free Guidance- Increases the influence of the inputs on the output. Higher values produce lower-variance outputs that adhere more closely to the inputs.
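
To make the interaction between Temperature, Top K, Top P, and Classifier Free Guidance more concrete, here is a minimal sketch of how these four settings are commonly combined when a model picks its next token. It is an illustration in plain Python, not MusicGen's actual implementation: the function names (`apply_cfg`, `softmax`, `filter_candidates`, `sample_token`) and the default values for `top_k` and `guidance` are assumptions made for this example. Only the rule that Top P set to `0` falls back to Top K comes from the settings above.

```python
import math
import random

def apply_cfg(cond_logits, uncond_logits, guidance):
    # Classifier-free guidance: move the scores away from the
    # unconditional prediction and toward the prompt-conditioned one.
    return [u + guidance * (c - u) for c, u in zip(cond_logits, uncond_logits)]

def softmax(logits, temperature=1.0):
    # Temperature must be > 0. Values below 1 sharpen the distribution,
    # values above 1 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def filter_candidates(probs, top_k=250, top_p=0.0):
    # Returns {token_index: renormalized probability}.
    # Top P is used when top_p > 0; otherwise Top K is used.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_p > 0.0:
        kept, cumulative = [], 0.0
        for i in ranked:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
    else:
        kept = ranked[:top_k]
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def sample_token(cond_logits, uncond_logits, temperature=1.0,
                 top_k=250, top_p=0.0, guidance=3.0, rng=random):
    # Hypothetical end-to-end step: guidance, then temperature,
    # then candidate filtering, then a weighted random draw.
    logits = apply_cfg(cond_logits, uncond_logits, guidance)
    probs = softmax(logits, temperature)
    candidates = filter_candidates(probs, top_k, top_p)
    indices = list(candidates)
    weights = [candidates[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]
```

For example, `sample_token([0.0, 5.0, 1.0], [0.0, 0.0, 0.0], top_k=1)` always returns index `1`, because with a single candidate only the most likely token survives the filter. Raising `top_k` or `temperature` lets less likely tokens through, which is where the extra variety comes from.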