Although recent work on text-conditional 3D object generation has shown promising results, state-of-the-art methods typically require multiple GPU-hours to produce a single sample. This is in stark contrast to state-of-the-art generative image models, which produce samples in seconds or minutes. In this paper, we explore an alternative method for 3D object generation that produces 3D models in only 1-2 minutes on a single GPU. Our method first generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model that conditions on the generated image. While our method still falls short of the state-of-the-art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases. We release our pre-trained point cloud diffusion models, as well as evaluation code and models, at this https URL.
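To make the two-stage pipeline concrete, below is a minimal sketch of the text-to-image-to-point-cloud flow described above. The helper names (load_text_to_image_model, load_point_cloud_model, text_to_point_cloud) and the dummy return values are hypothetical placeholders for illustration, not the released API; the real models are diffusion models rather than the stubs shown here.

```python
# Hypothetical sketch of the two-stage pipeline; the helpers below are
# placeholders, not the released code's interface.
import numpy as np


def load_text_to_image_model():
    """Placeholder for the text-to-image diffusion model (stage 1)."""
    def generate(prompt: str) -> np.ndarray:
        # Real model: iterative denoising of an image conditioned on the prompt.
        return np.zeros((64, 64, 3), dtype=np.float32)
    return generate


def load_point_cloud_model():
    """Placeholder for the image-conditional point cloud diffusion model (stage 2)."""
    def generate(image: np.ndarray, num_points: int = 1024) -> np.ndarray:
        # Real model: iterative denoising of point coordinates conditioned on
        # the generated synthetic view.
        return np.zeros((num_points, 3), dtype=np.float32)
    return generate


def text_to_point_cloud(prompt: str) -> np.ndarray:
    """Run both stages: text -> single synthetic view -> 3D point cloud."""
    text_to_image = load_text_to_image_model()
    image_to_points = load_point_cloud_model()
    synthetic_view = text_to_image(prompt)          # stage 1: text -> image
    point_cloud = image_to_points(synthetic_view)   # stage 2: image -> points
    return point_cloud


if __name__ == "__main__":
    pc = text_to_point_cloud("a red motorcycle")
    print(pc.shape)  # (1024, 3): one xyz coordinate per generated point
```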