Volcano is a system for running high-performance workloads on Kubernetes. It provides a suite of mechanisms that Kubernetes currently lacks and that many classes of high-performance workloads commonly require, including:
- Machine learning/Deep learning,
- BioInformatics/Genomics, and
- Other “big data” applications.
These types of applications typically run on generalized domain frameworks such as TensorFlow, Spark, PyTorch, and MPI, which Volcano integrates with.
Some examples of the mechanisms and features that Volcano adds to Kubernetes are:
Scheduling extensions, e.g.:
- Co-scheduling
- Fair-share scheduling
- Queue scheduling
- Preemption and reclaims
- Reservations and backfills
- Topology-based scheduling
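
To make the first item concrete, co-scheduling (often called gang scheduling) means that the pods of a job are placed together or not at all, so a distributed job never holds some resources while waiting indefinitely for the rest. The sketch below is a toy illustration of that all-or-nothing idea only. It is not Volcano's implementation or API: the `gang_schedule` function, the CPU-only resource model, and the first-fit placement are assumptions made for this example.

```python
from typing import Dict, List, Optional


def gang_schedule(
    pod_cpu_requests: List[int], free_cpu: Dict[str, int]
) -> Optional[Dict[int, str]]:
    """Place every pod of a job or none of them (hypothetical first-fit, CPU only)."""
    # Work on a copy so that a failed attempt leaves the cluster state untouched.
    remaining = dict(free_cpu)
    placement: Dict[int, str] = {}
    for index, request in enumerate(pod_cpu_requests):
        # First-fit: pick the first node with enough free CPU for this pod.
        node = next(
            (name for name, free in remaining.items() if free >= request), None
        )
        if node is None:
            # One pod cannot be placed, so the whole job stays pending.
            return None
        remaining[node] -= request
        placement[index] = node
    # Commit the reservation only after every pod has found a node.
    free_cpu.update(remaining)
    return placement
```

With this sketch, a job whose pods do not all fit returns `None` and reserves nothing, which is the behaviour that distinguishes co-scheduling from placing pods one at a time.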
Job management extensions and improvements, e.g.:
- Multi-pod jobs
- Improved error handling
- Indexed jobs
- Others (in upstream)
Optimizations for throughput, round-trip latency, etc.
Volcano builds upon a decade and a half of experience running a wide variety of high-performance workloads at scale using several systems and platforms, combined with best-of-breed ideas and practices from the open source community.
- Twitter: https://twitter.com/volcano_sh
- Slack: https://volcano-sh.slack.com
- Website: https://volcano.sh
- Documentation: https://volcano.sh/docs/