Gen AI & ML
The eCommerce industry increasingly relies on 3D content for digital twins and spatial experiences, but traditional 3D asset creation remains labor-intensive, requiring custom models derived from manual modeling, 3D CAD optimization, photogrammetry, or LiDAR scans. However, advancements in NeRF, Gaussian Splatting, and Generative AI are opening new possibilities. In particular, generative AI is making significant strides in 3D content creation, helping to overcome key production bottlenecks.
In our recent research, we explored the evolving landscape of generative 3D technologies, assessing AI models' capabilities, limitations, and potential workflow integrations. Here are our initial test results, featuring real-world product samples—check them out! You can also view the models in augmented reality on Android or iOS devices.
The displayed 3D models (GLB/USDZ) are direct outputs from AI prediction inference, with minimal adjustments to dimensions and colors. Geometry and texture are AI-generated. The initial tests on Home category objects (cabinets, chairs, sectionals, and a bench) provided valuable insights:
Few Images, Varied Angles: High-quality images with neutral lighting and diverse angles are crucial to produce good results, even when only a limited number (1-5) are available.
Clean Backgrounds: Effective background removal significantly improves inferences.
API Integration: The availability of API support streamlines integration into existing workflows (see the sketch after this list).
Multiple Image and 3D Shape Support: Leveraging multiple images, and even 3D shapes in some cases, enhances model accuracy.
Feasible Cost and Time: Cloud and local compute inference times (3-6 minutes) make the technology increasingly practical for adoption.
Textured Output with Limitations: While all solutions generate albedo textures, automatic PBR and normal map generation still require improvement for added realism.
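To give a sense of what such an API integration could look like, here is a minimal Python sketch of submitting a few product photos to an image-to-3D service and downloading the resulting GLB. The endpoint, parameters, and response fields are hypothetical placeholders, not a specific vendor's API.

```python
import time
import requests

API_URL = "https://api.example-3d.com/v1/generate"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

def generate_3d_model(image_paths, output_path="model.glb"):
    """Submit a few product photos (clean backgrounds, varied angles) and download the GLB result."""
    headers = {"Authorization": f"Bearer {API_KEY}"}
    files = [("images", open(p, "rb")) for p in image_paths]

    # Kick off an asynchronous generation job (request/response shape is assumed).
    job = requests.post(API_URL, headers=headers, files=files).json()

    # Poll until inference finishes; 3-6 minutes was typical in our tests.
    while True:
        status = requests.get(f"{API_URL}/{job['job_id']}", headers=headers).json()
        if status["state"] == "done":
            break
        time.sleep(15)

    # Download the generated mesh with its albedo textures.
    glb = requests.get(status["glb_url"], headers=headers)
    with open(output_path, "wb") as f:
        f.write(glb.content)
    return output_path

generate_3d_model(["cabinet_front.jpg", "cabinet_side.jpg", "cabinet_back.jpg"])
```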
The rapid growth of eCommerce has fueled a demand for 3D content to create rich, engaging digital experiences. Recent advancements in generative AI for 3D content creation show great potential for increasing efficiency and quality. While the current output may not be suitable for all production needs, it can significantly reduce artist workload by providing a base geometry shape, serving as background stand-ins, or enabling rapid concept visualization. The future of 3D content creation is evolving at a fast pace. What are your thoughts on generative 3D? Let's discuss in the comments!
#3D #AI #GenerativeAI #Innovation #eCommerce #AR #3DModeling
The demand for high-quality product visuals is ever-present in the online marketplace. Balancing the need for speed, budget, and artistic control can be a challenge. Can generative AI automate product imagery while maintaining artistic integrity? This article explores this possibility, using an experiment with ComfyUI to find a balance between automation and artistry.
Can a predefined workflow, tailored to a specific product category, generate diverse images with consistent style and quality?
What does effective automation look like in this context?
Following an initial exploration of home furniture in the previous article, this article selects perfume as the category to strategically test automated image generation. Perfume's association with elegance makes it an ideal test case for assessing whether generative AI can capture subtle brand nuances. The article focuses on automating lifestyle shots of perfume bottles in curated settings. The intent is to create specialized workflows for specific product categories while ensuring consistent and aesthetically pleasing results.
To simulate a creative process designing perfume product lifestyle imagery, three assumptions are made:
Desired aesthetics, such as soft lighting, rich colors, and a sense of depth, can be predefined.
A centered product composition, accented by inspirational supporting elements.
Thematic backdrops aligned with the perfume's scent profile. In this research, I defined 2 backdrop themes: floral and beach.
The primary automation goals are:
Creative Direction Input: A user-friendly interface for providing visual feedback and adjustments.
Consistency: A unified visual style across all product images.
Scalability and Efficiency: Easy scaling of image production to meet demands.
The automation workflow is broken down into four parts:
User-Friendly Front-End UI: The front-end UI prioritizes simplicity and ease of use: a simple webpage with a chat interface where users upload a product image and enter a text prompt to select the background theme, Floral or Beach.
Automation Functions: Middleware functions connect the front-end UI with the ComfyUI backend running on a remote server (a minimal sketch follows this list).
ComfyUI Backend Mechanics: A remote server configured with ComfyUI and its dependencies. Automated processes include image masking, prompt-based background theme generation, compositing, relighting, and color correction.
Output Retrieval: Generated images are automatically saved to a designated location - e.g. Google Drive.
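As a rough illustration of part 2, here is a minimal Python sketch of a function that submits a pre-built workflow to ComfyUI's HTTP API and retrieves the rendered image. The workflow file, node IDs, and server address are assumptions specific to this example.

```python
import json
import time
import requests

COMFY_URL = "http://192.168.1.50:8188"  # remote server running ComfyUI (address is an assumption)

def render(product_image_name: str, theme: str) -> bytes:
    """Queue a pre-built ComfyUI workflow and return the first output image as bytes."""
    # Load a workflow exported via "Save (API Format)" in ComfyUI.
    with open("perfume_lifestyle_workflow.json") as f:
        workflow = json.load(f)

    # Patch the nodes that vary per request (node IDs depend on the exported graph).
    workflow["11"]["inputs"]["image"] = product_image_name          # LoadImage node
    workflow["25"]["inputs"]["text"] = f"{theme} themed backdrop"   # positive prompt node

    prompt_id = requests.post(f"{COMFY_URL}/prompt", json={"prompt": workflow}).json()["prompt_id"]

    # Poll the history endpoint until the job appears with outputs.
    while True:
        history = requests.get(f"{COMFY_URL}/history/{prompt_id}").json()
        if prompt_id in history and history[prompt_id].get("outputs"):
            break
        time.sleep(2)

    # Fetch the first saved image from the output nodes.
    outputs = history[prompt_id]["outputs"]
    image_info = next(iter(outputs.values()))["images"][0]
    params = {"filename": image_info["filename"],
              "subfolder": image_info["subfolder"],
              "type": image_info["type"]}
    return requests.get(f"{COMFY_URL}/view", params=params).content
```

The returned bytes can then be uploaded to the designated output location, such as a Google Drive folder (part 4).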
The results demonstrate the automation potential, with the web user interface communicating with a local server and output saved to Google Drive. The process proved efficient, with rendering times of 30-120 seconds per image on a GPU (RTX 3070 or better). Image quality and theme consistency are demonstrated below.
Generative AI is opening new avenues in eCommerce, transforming content creation to captivate and inspire customers. My latest experiment explores how varied product imagery can enhance visualization—recognizing it as a key driver for KPIs like conversion rates and average order value.
In this experiment, I trained a Low-Rank Adaptation (LoRA) model with Flux.1-Dev using about 20 reference images of a puffer bag, an accessory with distinct appeal. The resulting photorealistic images showcase the product’s charm across diverse lifestyle settings, suggesting new possibilities for engaging customers in dynamic ways. These visuals highlight how AI innovation can align seamlessly with content needs in eCommerce.
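For context, here is a minimal sketch of how a trained LoRA can be applied at inference time with the diffusers library; the LoRA file name, trigger word, and prompt are assumptions from this experiment rather than a published recipe.

```python
import torch
from diffusers import FluxPipeline

# Load the base FLUX.1-dev pipeline, then attach the LoRA trained on ~20 puffer bag photos.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.load_lora_weights("puffer_bag_lora.safetensors")  # hypothetical local LoRA file
pipe.to("cuda")

# The trigger word below is whatever token the LoRA was trained with (an assumption here).
prompt = "photo of pufferbag_v1 on a cafe table, morning light, shallow depth of field"
image = pipe(prompt, num_inference_steps=28, guidance_scale=3.5).images[0]
image.save("puffer_bag_cafe.png")
```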
This exploration supports the idea that AI can be applied effectively to deliver inspiring outcomes and shape future business strategy.
Recently I've been experimenting with FLUX.1 Tools and the FLUX Pro Fine-Tuning API to explore how generative AI can streamline product image creation. This led me to two key questions:
What might generative AI content automation look like in practice if only API calls are used?
How far can a single product image be transformed into compelling lifestyle imagery (in contrast to LoRA fine-tuning, which requires about 20 images)?
Inspired by Anthropic's blog post about Agentic Workflows, I hypothesized that structured, step-by-step generation—using techniques like prompt chaining—could lead to more predictable and controlled results.
🔹 Baseline: Single-shot approach - Quick but inconsistent, with AI hallucinations affecting product representation.
🔹 Experimental Condition: Multi-step, agentic-like workflow with human intervention - A structured process produced better outcomes (a sketch of the chaining follows these steps):
1. Focused only on background inference.
2. Added over-the-table elements as a separate pass.
3. Used FLUX.1 Fill (outpainting) to expand the composition dynamically.
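Structurally, the multi-step workflow amounts to prompt chaining: each pass consumes the previous pass's output, with a quick human check in between. Below is a minimal Python sketch of that control flow; the call_flux_api helper, task names, and prompts are placeholders rather than the actual API surface.

```python
def call_flux_api(task: str, prompt: str, image_path: str) -> str:
    """Hypothetical wrapper around an image-generation API (e.g. a FLUX endpoint).
    Returns the path of the generated image; implementation details omitted."""
    raise NotImplementedError("wire this up to your image API of choice")

def human_review(image_path: str) -> bool:
    """Pause for a quick manual check before the next pass."""
    return input(f"Accept {image_path}? [y/n] ").strip().lower() == "y"

def chained_workflow(product_image: str) -> str:
    # Step 1: infer only the background, keeping the product untouched.
    step1 = call_flux_api("background", "soft-lit marble tabletop, warm tones", product_image)
    if not human_review(step1):
        raise SystemExit("Rejected at step 1")

    # Step 2: add over-the-table elements as a separate pass.
    step2 = call_flux_api("edit", "add scattered rose petals and a linen napkin", step1)
    if not human_review(step2):
        raise SystemExit("Rejected at step 2")

    # Step 3: outpaint (Fill) to expand the composition dynamically.
    return call_flux_api("fill_outpaint", "extend the scene to a wide tabletop vignette", step2)
```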
Although this simplistic approach would benefit from additional processes (compositing, relighting, color correction, etc.), it demonstrates that automating sequential workflows can improve consistency, usability, and creative control when generating AI-driven content.
The experiment deliberately focused on utilizing only APIs to showcase the workflow's capabilities. While this approach demonstrates the potential of automation, it also reveals that certain nuanced, creative decision-making is still best achieved with tools like ComfyUI. As such tools continue to evolve, they provide greater flexibility for experimenting and refining AI-driven content generation. Once workflows are well-defined, structuring them into sequential steps—with human oversight—enhances reliability, ensuring a balance between automation and creative control.
How do you see AI shaping creative workflows in your field? What key considerations do you find most important? Would love to hear your thoughts!
Generative AI continues to amaze me! I’ve been experimenting a bit for a few years, and the pace of evolution is astonishing.
Back in 2021, I immersed myself in Pix2Pix and StyleGAN2-ADA, training models on thousands of carefully crafted image pairs to identify edge contours and color blocks and transform them into vivid imagery. Inspired by Egon Schiele's bold lines and emotive portraits, I used some of his sketches as my experimental canvas. Despite long cloud-based training sessions (72+ hours!) and unpredictable results, the potential was clear.
Now, in 2024, I’ve resumed this journey using a local setup with pre-trained model checkpoints like Dreamshaper (SD 1.5), Epicrealism (SD 1.5), and Flux 1.1. The difference is extraordinary! Running complex generative tasks locally on my laptop feels surreal compared to just a few years ago. See the video below.
Next on my horizon is exploring ControlNet and fine-tuning pre-trained models with my own custom materials.
Experimenting with NVIDIA instant-ngp to train a NeRF model in seconds.
ML Model: NVlabs Instant-ngp
Environment: Windows 10, Anaconda
To experiment with instant-ngp, I recorded a 20-second, 1080p video of a toy car with a mobile phone. After preparing my own NeRF dataset from the video clip, I started the interactive training and rendering in the UI. I am blown away by the results and by how researchers have accelerated NeRF training from hours down to seconds! With the ability to view NeRF in real time and to generate 3D geometry outputs, NeRF points to an exciting future for AI and 3D visualization.
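For reference, here is a minimal sketch of the dataset-preparation and training steps, driven from Python; the script names and flags follow the NVlabs instant-ngp repository as I used it and may differ across versions.

```python
import subprocess

# 1. Convert the 20-second phone video into a NeRF dataset (COLMAP camera poses + extracted frames).
#    colmap2nerf.py ships with instant-ngp; flag names may vary by version.
subprocess.run([
    "python", "scripts/colmap2nerf.py",
    "--video_in", "toy_car.mp4",
    "--video_fps", "2",            # extract roughly 40 frames from the clip
    "--run_colmap",
    "--aabb_scale", "4",
    "--out", "data/toy_car/transforms.json",
], check=True)

# 2. Launch interactive training and rendering in the instant-ngp GUI.
subprocess.run([
    "python", "scripts/run.py",
    "--scene", "data/toy_car",
    "--gui",
], check=True)
```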
This is an experiment on generating photorealistic synthetic human faces using StyleGAN2-ADA. Here's the training result in video format.
Dataset: Flickr-Faces-HQ Dataset (FFHQ)
ML Model: StyleGAN2-ada
Environment: Google Colab Pro
To train the model, I used the first 6,000 images at 1K resolution. With the trained model, I applied image projection into latent space. The result is a progression of synthetic faces that share similar visual landmarks and features, even the glasses and hairstyle!
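The "progression of synthetic faces" in the video is essentially latent-space interpolation. Here is a minimal sketch using the NVlabs stylegan2-ada-pytorch code; the checkpoint filename is a placeholder for this training run, and the repo's dnnlib/legacy modules are assumed to be on the path.

```python
import numpy as np
import torch
import PIL.Image
import dnnlib, legacy  # provided by the stylegan2-ada-pytorch repository

device = torch.device("cuda")
with dnnlib.util.open_url("ffhq-6k-1024.pkl") as f:  # checkpoint from this training run (name assumed)
    G = legacy.load_network_pkl(f)["G_ema"].to(device)

# Map two random z vectors to W space, then walk between them.
z = torch.from_numpy(np.random.RandomState(0).randn(2, G.z_dim)).to(device)
w = G.mapping(z, None, truncation_psi=0.7)          # shape [2, num_ws, w_dim]

for i, t in enumerate(np.linspace(0, 1, 60)):
    w_interp = (1 - t) * w[0:1] + t * w[1:2]
    img = G.synthesis(w_interp, noise_mode="const")
    img = (img.permute(0, 2, 3, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8)
    PIL.Image.fromarray(img[0].cpu().numpy(), "RGB").save(f"frame_{i:03d}.png")
```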
Exploring ML for Shoe Design using StyleGAN2-ADA. Here's the training result in video format.
Dataset: Shoes dataset from Kaggle with 7,000 images
ML Model: StyleGAN2-ada
Environment: Google Colab Pro
My hypothesis is that ML could be used to assist and inspire the product design process, using the StyleGAN2 model. By projecting inspiration images into latent space, the following videos show the meandering progression of results.
First, using a regular shoe as the inspiration (left) to see what variations of the target images (right) the ML model could generate.
Let's try something different. How about a red fox?
What about a cat?
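Projecting an inspiration image into the shoe model's latent space can be done with the projector bundled in the stylegan2-ada-pytorch repo. Here is a minimal sketch; the network pickle and target image names are placeholders from this experiment.

```python
import subprocess

# Project an inspiration image (e.g. the red fox photo) into the trained shoe model's latent space.
# Flags follow the projector.py script bundled with stylegan2-ada-pytorch.
subprocess.run([
    "python", "projector.py",
    "--network", "shoes-7k-snapshot.pkl",   # checkpoint from the Kaggle shoes training run (name assumed)
    "--target", "red_fox.png",              # inspiration image, cropped to the training resolution
    "--num-steps", "1000",
    "--save-video", "1",                    # writes the meandering optimization progression as a video
    "--outdir", "out/fox_projection",
], check=True)
```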
Experimentation of Neural Style Transfer with TensorFlow on Colab
Environment: TensorFlow 2.0 on Colab
Discover how art style transfer can transform images into unique creations by pairing them with famous art styles, all powered by TensorFlow on an Android device... (Read more)
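As a quick illustration, here is a minimal sketch of arbitrary style transfer using the Magenta model on TensorFlow Hub; the image file names are placeholders, and the Android (TFLite) variant from the experiment is not shown here.

```python
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import PIL.Image

def load_image(path, max_dim=512):
    """Load an image, resize it, and add a batch dimension as float32 in [0, 1]."""
    img = np.array(PIL.Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    img = tf.image.resize(img, (max_dim, max_dim))
    return img[tf.newaxis, ...]

content = load_image("photo.jpg")              # placeholder content image
style = load_image("starry_night.jpg", 256)    # placeholder style image

# Pre-trained arbitrary image stylization model from TF Hub (Magenta).
hub_model = hub.load("https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2")
stylized = hub_model(tf.constant(content), tf.constant(style))[0]

PIL.Image.fromarray(np.uint8(stylized[0].numpy() * 255)).save("stylized.png")
```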