Summary
>Video + Sound Creator
>Video Creator
>Fast Image Generator
>AI Music Creator
>Smart Image Editor
>Quick Image Generator
1. Log in to forge.vyvo.com and go to Explore.
2. Select the tool you want to use. For example, Quick Image Generator.
3. In the Prompt field, describe the image you want Forge to generate in detail.
4. In the Negative Prompt field, when available, describe what you don’t want in the image. This field is optional, but filling it in can improve the result.
5. In the settings, depending on the tool you choose, you can set the media dimensions, how closely the final result should match the prompt, the quality of the media, the duration for videos, and more. Some tools also include Advanced settings. Required fields are marked with a red asterisk. Fields without an asterisk are optional.
6. When you’re happy with your prompts, make sure you have enough credits, then click the Run button. The final result will appear in the window on the right side of the screen.
To create with Forge, you’ll need credits. Each tool requires a different number of credits.
To earn credits, you need to subscribe to a Vyvo AI plan. Each plan includes a set number of credits, ranging from a minimum of 2,000 to a maximum of 10,000 credits per month. At each renewal, your credit balance resets.
You can see how many credits a tool will use on the selected tool page. For example, on the “Video Creator” page, scroll down to the “Run” button. The number of credits required will be displayed above it.
Jobs that fail are not charged.
You can see your remaining credits on the homepage, or at the top-right side of the page.
How can I top up Credits to use in Vyvo Forge? →
Create videos with matching sound effects and audio. Describe your scene and get a complete video with synchronized audio. Works with text or images as input.
When you open the tool, you land on the Playground page. This is where you tell Forge what content you want to create. On the left side of the screen, you’ll find Inputs and Settings, where you can provide your instructions.
Inputs.
- Prompt (required): here you have to describe the scene, motion, and overall look you want in the final video.
Example: A motorcyclist racing through a neon-lit tunnel, side tracking shot, reflections streaking across wet pavement, pulsing synth bass, engine roar, tunnel reverb, high-contrast cyberpunk atmosphere.
- First Frame (optional): here you can upload the image you want to animate; Forge generates the whole scene based on that image. Clear portraits usually work best. Supported formats are jpg, png, gif, webp.
- End Frame (optional): if you want the video to have different scenes/subjects, here you can upload the ending image of your scene, or a different image if you want the video to move from one image to another. Supported formats are jpg, png, gif, webp.
- Audio Input Mode: you can upload a finished voice track (Upload Audio - Optional) or type the speech and let Forge generate the audio of it (Generate Speech - if selected, the Speech Text is required).
- Voice Id (optional): you can choose the voice used for generated speech, among the many names available (male or female voices).
- Reference Audio (optional): you can upload a voice recording that Forge will use as a sample, generating speech that sounds as close as possible to it. Supported formats are mp3, wav, ogg, flac.
Settings.
- Mode: you can choose how the video should be created: Text to Video, Image to Video, or Interpolate.
- Resolution: here you can choose the output size format among the available ones. Higher resolutions take longer and are more demanding.
- Duration: define how long the video should be, from a minimum of 2 seconds to a maximum of 20. Longer videos take more time to generate.
- Enhance Prompt: if switched on, this allows Forge to expand your prompt automatically. Leave it off if you want tighter control over the result, based exclusively on the inputs and settings you provided.
Advanced.
- Seed: the seed controls the randomness used by the AI when it generates your content. If you leave the seed empty or set it to a random value, Forge will produce a new variation every time, even if the prompt stays identical. If you set a fixed seed (for example, 12345) and keep all other parameters the same, Forge will produce the same result every time. Change the seed, and you get a new variation of the same idea.
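The seed behavior described above can be illustrated with a minimal Python sketch. It uses Python's standard random number generator as a stand-in for the AI model; `generate_variation` is a hypothetical helper for illustration only and is not part of Forge.

```python
import random
from typing import List, Optional


def generate_variation(prompt: str, seed: Optional[int] = None) -> List[float]:
    """Stand-in for a generator: returns three pseudo-random numbers.

    Hypothetical helper, not a Forge function. The prompt is accepted only to
    mirror the idea that the prompt stays identical while the seed changes.
    """
    rng = random.Random(seed)  # seed=None means a fresh, unpredictable start
    return [round(rng.random(), 4) for _ in range(3)]


same_a = generate_variation("neon tunnel", seed=12345)
same_b = generate_variation("neon tunnel", seed=12345)
other = generate_variation("neon tunnel", seed=54321)

print(same_a == same_b)  # True: same seed and same inputs give the same result
print(same_a == other)   # a different seed gives a different variation
```

In the same way, keeping the seed fixed in Forge only reproduces a result if every other input and setting also stays the same.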
At the bottom of the page, you will also find a series of pre-set prompts you can choose from.
Once you’re happy with the settings you provided, click the “Run” button. Your results will appear in the window on the right side of the screen.
Turn your ideas into videos. Describe a scene and watch it come to life, from cinematic landscapes to animated sequences, up to a minute long.
When you open the tool, you land on the Playground page. This is where you tell Forge what content you want to create. On the left side of the screen, you’ll find Inputs and Settings, where you can provide your instructions.
Inputs.
- Prompt (required): here you have to describe the scene, motion, and overall look you want in the final video.
Example: A cinematic drone shot gliding over terraced rice fields at sunrise, soft mist between layers, warm golden light, subtle parallax, realistic vegetation detail, premium travel film look.
- Image: here you can upload the image you want to animate; Forge generates the whole scene based on that image. Supported formats are jpg, png, gif, webp. If you want to base the scene only on the text prompt you provided, leave this field empty.
Settings.
- Num Frames: you can decide the total number of frames in your video. The total is the video length in seconds multiplied by the frames per second (Fps). For example, a 60-second video at 30 fps contains 60×30=1,800 frames. More frames make a longer video at the same Fps, but can increase file size.
- Resolution: select the output size of your video among the ones available. If you select “Auto”, Forge will use the input image ratio when possible; exact 1280x720 is not supported in the current Helios pipeline, so use 1280x704 instead.
- Fps: here you can decide how many frames (individual images) are shown every second in the video. The standard frame rate is 24 fps.
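The relationship between Num Frames, Fps, and video length in the settings above is simple arithmetic. A minimal sketch follows; the function names are hypothetical and only illustrate the calculation.

```python
def total_frames(duration_seconds: float, fps: int) -> int:
    """Number of frames needed for a clip of the given length."""
    return round(duration_seconds * fps)


def clip_length_seconds(num_frames: int, fps: int) -> float:
    """Length of a clip, in seconds, for a given frame count and frame rate."""
    return num_frames / fps


print(total_frames(60, 30))           # 1800, the example from the Num Frames setting
print(clip_length_seconds(1800, 24))  # 75.0: the same frames play longer at 24 fps
```

This shows why the two settings should be chosen together: the same Num Frames value gives a shorter clip at a higher Fps.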
Advanced.
- Seed: the seed controls the randomness used by the AI when it generates your content. If you leave the seed empty or set it to a random value, Forge will produce a new variation every time, even if the prompt stays identical. If you set a fixed seed (for example, 12345) and keep all other parameters the same, Forge will produce the same result every time. Change the seed, and you get a new variation of the same idea.
- Num Inference Steps: Forge's video engine uses a technique called pyramid stages: the video is built across multiple stages, starting from a lower-resolution draft and progressively moving to the full resolution. The Num Inference Steps value tells Forge how many refinement steps to perform within each pyramid stage. Unless you have a specific reason to experiment, we recommend leaving this value at 2.
- Is Amplify First Chunk: Forge creates videos in chunks, and the first chunk is the most important because it defines the video’s overall look, style, lighting, and color. Set it to “true” to keep this boost on. This is the default and recommended setting. Set it to “false” to turn it off for slightly faster generation, but with a less refined opening segment.
At the bottom of the page, you will also find a series of pre-set prompts you can choose from.
Once you’re happy with the settings you provided, click the “Run” button. Your results will appear in the window on the right side of the screen.
Generate stunning images from text in under a second. Describe what you want to see and get a high-quality image instantly.
When you open the tool, you land on the Playground page. This is where you tell Forge what content you want to create. On the left side of the screen, you’ll find Inputs and Settings, where you can provide your instructions.
Inputs.
- Prompt (required): here you have to describe the image you want to generate.
Example: Editorial portrait of a boxer in a cream suit under moody softbox lighting, subtle sweat on skin, textured fabric, medium format fashion photography, rich cinematic contrast.
- Source Image: here you can upload a reference image that Forge will use as a sample. You can ask Forge to just edit it or base the final result on that reference.
Settings.
- Num Inference Steps: Forge uses a technique called pyramid stages: the image is built across multiple stages, starting from a lower-resolution draft and progressively moving to the full resolution. The Num Inference Steps value tells Forge how many refinement steps to perform within each pyramid stage. Unless you have a specific reason to experiment, we recommend leaving this value at 4.
- Guidance Scale: here you can decide how closely the image generation process has to follow the text prompt. The higher the value, the more closely the generated image sticks to the text input.
- Width: decide the image width (in pixels).
- Height: decide the image height (in pixels).
- Seed: the seed controls the randomness used by the AI when it generates your content. If you leave the seed empty or set it to a random value, Forge will produce a new variation every time, even if the prompt stays identical. If you set a fixed seed (for example, 12345) and keep all other parameters the same, Forge will produce the same result every time. Change the seed, and you get a new variation of the same idea.
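This guide does not say how Forge implements the Guidance Scale setting above. In many diffusion-based image generators, a setting with this name corresponds to classifier-free guidance, where the model's prediction without the prompt is pushed toward its prediction with the prompt. The sketch below illustrates that general technique under this assumption; the function name and the toy numbers are hypothetical.

```python
from typing import List


def apply_guidance(unconditional: List[float],
                   conditional: List[float],
                   guidance_scale: float) -> List[float]:
    """Classifier-free guidance on toy predictions (illustrative only).

    Each output value starts from the prompt-free prediction and moves toward
    the prompt-conditioned prediction by a factor of guidance_scale.
    """
    return [u + guidance_scale * (c - u)
            for u, c in zip(unconditional, conditional)]


uncond = [0.0, 1.0, 2.0]  # toy prediction that ignores the prompt
cond = [1.0, 1.0, 0.0]    # toy prediction that follows the prompt

print(apply_guidance(uncond, cond, 1.0))  # [1.0, 1.0, 0.0]: exactly the prompt prediction
print(apply_guidance(uncond, cond, 3.0))  # [3.0, 1.0, -4.0]: pushed further toward the prompt
```

Under this assumption, higher values exaggerate the difference the prompt makes, which matches the behavior described for the setting.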
At the bottom of the page, you will also find a series of pre-set prompts you can choose from.
Once you’re happy with the settings you provided, click the “Run” button. Your results will appear in the window on the right side of the screen.
Create original music from a text description. Describe the mood, genre, and style you want and get a full song with vocals and instrumentals. Supports 30+ languages and genres from pop to classical.
When you open the tool, you land on the Playground page. This is where you tell Forge what content you want to create. On the left side of the screen, you’ll find Inputs and Settings, where you can provide your instructions.
Inputs.
- Reference Audio: here you can upload a reference audio that Forge will use as a sample to base the final result on.
- Prompt (required): here you have to describe the music you want to generate. Describe the style, genre, mood, tempo and instrumentation.
Example: Afro house groove with deep bass, layered percussion, hypnotic vocal chops, warm club atmosphere, elegant build, late-night rooftop energy.
- Lyrics: if you plan to create a song with singing parts, you can enter your own lyrics or have Forge generate them for you by clicking on the “Generate Lyrics with AI” button.
- Tags: choose a music style by entering tags like Lo-fi, upbeat, groovy etc.
Settings.
- Duration: decide the duration of the track, from 10 seconds up to 300 seconds (5 minutes).
- Bpm: this stands for Beats Per Minute (BPM), and it indicates the number of beats in one minute (60 seconds). A higher BPM means faster music (e.g., 150+ BPM for dance), while a lower BPM means slower, calmer, or more solemn music (e.g., 60-80 BPM for ballads or walking pace). If you don’t want to decide this, you can set the value to 0 and let Forge choose a suitable tempo automatically.
- Task Type: here you can decide how to generate your song, based on your inputs. Choose Text to Music if you used only a text input, or Audio to Audio if you uploaded a Reference Audio.
- Audio Cover Strength: with this you can decide how much Forge has to follow the reference audio you provided. The higher the value, the more the music generated sticks to a given audio input.
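The Duration and Bpm settings above are related by simple arithmetic: a beat lasts 60 divided by the BPM in seconds. A minimal sketch follows; the function names are hypothetical and only illustrate the calculation.

```python
def seconds_per_beat(bpm: float) -> float:
    """Length of one beat in seconds."""
    if bpm <= 0:
        # A Bpm of 0 means Forge picks the tempo, so there is nothing to compute.
        raise ValueError("bpm must be greater than 0")
    return 60.0 / bpm


def beats_in_track(duration_seconds: float, bpm: float) -> float:
    """Number of beats that fit into a track of the given length."""
    return duration_seconds / seconds_per_beat(bpm)


print(seconds_per_beat(120))    # 0.5: two beats every second
print(beats_in_track(10, 150))  # 25.0 beats in the shortest track at a dance tempo
print(beats_in_track(300, 60))  # 300.0 beats in a 5-minute ballad
```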
Advanced.
- Seed: the seed controls the randomness used by the AI when it generates your content. If you leave the seed empty or set it to a random value, Forge will produce a new variation every time, even if the prompt stays identical. If you set a fixed seed (for example, 12345) and keep all other parameters the same, Forge will produce the same result every time. Change the seed, and you get a new variation of the same idea.
- Key: here you can enter the key scale around which your piece of music revolves, minor or major. Just enter the key into the field (e.g. C major, A minor), or leave it empty to let Forge decide automatically.
- Time signature: this indicates how many note values of a particular type fit into each measure. Choose the time signature among the ones available to decide the rhythm of your music.
- Infer Steps: this parameter sets the number of refinement steps performed by the Diffusion Transformer (DiT), the model that turns your prompt into a structured musical result. At each step, the Diffusion Transformer predicts how to remove a little noise while following the prompt, style, melody, or conditioning signals. The Infer Steps value tells Forge how many of these refinement steps to perform within each stage.
- Shift: this parameter can change the balance between structure, smoothness, and detail during the generation of your music (not the prompt itself). A higher shift factor usually puts more emphasis on building the coarse musical structure first, then refining details later. A lower value keeps the step distribution more even. It goes from a minimum of 1.0 to a maximum of 5.0. 1.0 means effectively no shift, while the default parameter is 3.0.
- Solver: this is a sampling algorithm that turns noisy latent audio into a finished result over a series of timesteps. ODE/Euler is better when you want speed, stability, and reproducibility. SDE is better when you want a more exploratory sampling path that may produce richer variation, but with less consistency from run to run.
- Audio Format: select your preferred output audio format among the ones available.
- Batch Size: You can choose the number of tracks to generate, from a minimum of 1 to a maximum of 4.
- LM temperature: this affects how adventurous the generated lyrics, symbolic music tokens, or structural choices feel. Lower values give safer, cleaner results. Higher values can add originality, but may also create odd or less coherent passages.
- Lm Cfg Scale: this parameter controls how strongly the LM follows your prompt when generating music-related text, metadata, lyrics, or audio codes. Higher values mean stronger prompt adherence. Lower values allow more freedom and variation. If you don’t want this regulation, set this parameter to 1.0.
- Lm Top K: this mainly affects the LM planning stage such as how captions, lyrics, metadata, or semantic codes are sampled before the audio model renders the final music. A low number generates safer, more predictable output, while a higher number generates more variety and exploration. When choosing 0, this regulation is disabled.
- Lm Top P: this mainly affects the LM planning part before audio rendering, such as how captions, metadata, lyrics, or semantic codes are sampled. Similarly to Lm Top K, a lower Top-P makes the output safer and more predictable, while a higher Top-P allows more variety and surprise. When choosing 1.0, this regulation is disabled.
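The LM temperature, Lm Top K, and Lm Top P settings above follow the usual pattern for sampling from a language model. The sketch below shows the general technique on a toy set of candidate tokens; it is not Forge's implementation, and the function name, token names, and scores are hypothetical.

```python
import math
from typing import Dict


def filter_distribution(logits: Dict[str, float],
                        temperature: float = 1.0,
                        top_k: int = 0,
                        top_p: float = 1.0) -> Dict[str, float]:
    """Return token probabilities after temperature, top-k, and top-p filtering."""
    # Temperature: dividing the scores by a small value sharpens the
    # distribution (safer choices); a large value flattens it (more variety).
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((tok, w / total) for tok, w in weights.items()),
                    key=lambda item: item[1], reverse=True)

    # Top-k: keep only the k most likely tokens. 0 disables this filter.
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p: keep the shortest ranked prefix whose cumulative probability
    # (measured before renormalizing) reaches top_p. 1.0 disables this filter.
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(prob for _, prob in ranked)
    return {tok: prob / norm for tok, prob in ranked}


toy_logits = {"C": 2.0, "G": 1.0, "Am": 0.5, "F#dim": -1.0}

print(list(filter_distribution(toy_logits)))             # all four candidates survive
print(list(filter_distribution(toy_logits, top_k=2)))    # ['C', 'G']
print(list(filter_distribution(toy_logits, top_p=0.5)))  # ['C']
```

With the filters disabled (Top K at 0, Top P at 1.0), every candidate stays available, which matches the "regulation is disabled" values described in the settings.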
At the bottom of the page, you will also find a series of pre-set prompts you can choose from.
Once you’re happy with the settings you provided, click the “Run” button. Your results will appear in the window on the right side of the screen.
Edit photos with natural language instructions. Tell it what to change and it handles the rest. It is perfect for quick touch-ups, creative edits, and style transformations.
When you open the tool, you land on the Playground page. This is where you tell Forge what content you want to create. On the left side of the screen, you’ll find Inputs and Settings, where you can provide your instructions.
Inputs.
- Source Image: here you can upload the image that Forge will have to edit for you.
- Prompt (required): here you have to describe the edit you want to apply to the source image.
Example: Premium wireless headphones floating above a soft gradient backdrop, precise rim light, crisp materials, luxury product hero shot, clean advertising aesthetic.
Settings.
- Num Inference Steps: Forge uses a technique called pyramid stages: the image is built across multiple stages, starting from a lower-resolution draft and progressively moving to the full resolution. The Num Inference Steps value tells Forge how many refinement steps to perform within each pyramid stage. Unless you have a specific reason to experiment, we recommend leaving this value at 4.
- Guidance Scale: here you can decide how closely the image generation process has to follow the text prompt. The higher the value, the more closely the generated image sticks to the text input.
- Seed: the seed controls the randomness used by the AI when it generates your content. If you leave the seed empty or set it to a random value, Forge will produce a new variation every time, even if the prompt stays identical. If you set a fixed seed (for example, 12345) and keep all other parameters the same, Forge will produce the same result every time. Change the seed, and you get a new variation of the same idea.
At the bottom of the page, you will also find a series of pre-set prompts you can choose from.
Once you’re happy with the settings you provided, click the “Run” button. Your results will appear in the window on the right side of the screen.
Generate beautiful images from text descriptions in seconds. Great for quick concepts, social media content, and creative brainstorming. Fast and easy to use.
When you open the tool, you land on the Playground page. This is where you tell Forge what content you want to create. On the left side of the screen, you’ll find Inputs and Settings, where you can provide your instructions.
Inputs.
- Prompt (required): here you have to describe the image you want to generate.
Example: A confident streetwear designer leaning against a neon vending machine in Seoul, glossy rain-soaked pavement, anime-inspired realism, vibrant cyan and coral highlights.
- Negative Prompt: here you can describe what you don’t want in the generated image.
Example: distorted anatomy, duplicated features, poorly drawn clothing, awkward pose, multiple people, crowd, childlike appearance, cartoonish style.
Settings.
- Width: decide the image width (in pixels).
- Height: decide the image height (in pixels).
- Num Inference Steps: Forge uses a technique called pyramid stages: the image is built across multiple stages, starting from a lower-resolution draft and progressively moving to the full resolution. The Num Inference Steps value tells Forge how many refinement steps to perform within each pyramid stage. Unless you have a specific reason to experiment, we recommend leaving this value at 4.
- Guidance Scale: here you can decide how closely the image generation process has to follow the text prompt. The higher the value, the more closely the generated image sticks to the text input.
- Seed: the seed controls the randomness used by the AI when it generates your content. If you leave the seed empty or set it to a random value, Forge will produce a new variation every time, even if the prompt stays identical. If you set a fixed seed (for example, 12345) and keep all other parameters the same, Forge will produce the same result every time. Change the seed, and you get a new variation of the same idea.
At the bottom of the page, you will also find a series of pre-set prompts you can choose from.
Once you’re happy with the settings you provided, click the “Run” button. Your results will appear in the window on the right side of the screen.