Overall, the video showcases Genmo AI’s potential for creating dynamic animations and visually appealing effects. Genmo.ai serves both businesses (B2B) and consumers (B2C), with clients ranging from individual content creators to large enterprises. The company earns revenue through tiered subscription plans that differ in the level of access and features offered.
Generative AI systems trained on words or word tokens include GPT-3, GPT-4, GPT-4o, LaMDA, LLaMA, BLOOM, Gemini and others. An AsymmDiT efficiently processes user prompts alongside compressed video tokens by streamlining text processing and focusing neural network capacity on visual reasoning. It jointly attends to text and visual tokens with multi-modal self-attention and learns separate MLP layers for each modality, similar to Stable Diffusion 3. However, the visual stream has nearly four times as many parameters as the text stream because it uses a larger hidden dimension. Since the two streams therefore differ in width, the model unifies them for self-attention with non-square QKV and output projection layers, which map each stream into and out of a shared attention dimension.
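To make the role of the non-square projections concrete, here is a minimal single-head sketch in plain Python. It is an illustration under stated assumptions, not the AsymmDiT implementation: the function names, the parameter keys (`text_q`, `vis_out`, and so on), and the toy dimensions are all hypothetical, and it leaves out multi-head splitting, normalization, positional encodings, and the per-modality MLPs.

```python
import math
import random


def rand_matrix(rows, cols, rng):
    """Small random matrix as a list of rows (stand-in for learned weights)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Multiply matrices stored as lists of rows: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(d_text, d_vis, d_attn, rng):
    """Hypothetical per-modality weights, non-square whenever width != d_attn."""
    params = {}
    for name, width in (("text", d_text), ("vis", d_vis)):
        for proj in ("q", "k", "v"):
            params[f"{name}_{proj}"] = rand_matrix(width, d_attn, rng)
        params[f"{name}_out"] = rand_matrix(d_attn, width, rng)
    return params


def joint_attention(text, visual, params):
    """Single-head self-attention over the concatenated text and visual tokens."""
    # Project each stream from its own width into the shared attention width,
    # then concatenate along the token axis (text tokens first).
    q = matmul(text, params["text_q"]) + matmul(visual, params["vis_q"])
    k = matmul(text, params["text_k"]) + matmul(visual, params["vis_k"])
    v = matmul(text, params["text_v"]) + matmul(visual, params["vis_v"])

    # Scaled dot-product attention: every token attends to every other token,
    # regardless of modality.
    scale = 1.0 / math.sqrt(len(q[0]))
    weights = [
        softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
        for qi in q
    ]
    mixed = matmul(weights, v)

    # Split the result by modality and project back to each stream's own width.
    n_text = len(text)
    text_out = matmul(mixed[:n_text], params["text_out"])
    vis_out = matmul(mixed[n_text:], params["vis_out"])
    return text_out, vis_out, weights


rng = random.Random(0)
d_text, d_vis, d_attn = 4, 16, 8  # illustrative sizes only, not the real model's
params = init_params(d_text, d_vis, d_attn, rng)
text_tokens = rand_matrix(3, d_text, rng)    # 3 text tokens of width d_text
visual_tokens = rand_matrix(5, d_vis, rng)   # 5 visual tokens of width d_vis
text_out, vis_out, weights = joint_attention(text_tokens, visual_tokens, params)
print(len(text_out), len(text_out[0]))
print(len(vis_out), len(vis_out[0]))
```

The two streams never need to share a hidden size. Only the attention step runs in a common width, and each output returns to the width of the stream it came from, so the visual stream can be made wider without widening the text stream.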
Scientists have been looking for alternative architectures that can replace transformers. However, much research has already gone into improving transformers and making them more efficient. For now, transformers remain the dominant architecture for language models.
By offering cost-effective, multi-modal solutions, Reka has the potential to make advanced AI more accessible and drive new applications across multiple industries. Industry dominance in AI research suggests that companies will continue to drive advancements in the field, leading to more advanced and capable AI systems. However, the rising costs of AI training may pose challenges, since they could limit access to cutting-edge AI technology for smaller organizations and researchers. Apple’s commitment to user data privacy is commendable, but eliminating cloud-based processing and internet connectivity may impede the implementation of more advanced features. Nevertheless, it presents an opportunity for Apple to differentiate itself from competitors by offering users a choice between privacy-focused on-device processing and more powerful cloud-based features.
In separate work, a team of researchers used the fidelity of geometric constraints to measure how closely generated videos conform to real-world physical principles. Stability AI released Stable Video 3D (SV3D), a new generative AI tool for rendering 3D videos. SV3D can create multi-view 3D models from a single image, allowing users to see an object from any angle. This technology is expected to be valuable in the gaming sector for creating 3D assets and in e-commerce for generating 360-degree product views. Researchers at Anthropic discovered a new way to get advanced AI language models to bypass their safety restrictions and provide unethical or dangerous information.