Open source data processing tools, ranked by score and analyzed honestly. Part of our Data & Storage coverage.
Updated weekly.
Stream processing framework
Unified analytics engine for large-scale data processing
TypeScript-first schema validation with type inference
Extremely fast DataFrame query engine
Distributed event streaming platform
Workflow orchestration for resilient data pipelines
Distributed SQL query engine for big data
Data transformation using software engineering practices
The leading data integration platform for ETL/ELT data pipelines from APIs, databases, and files to data warehouses, data lakes, and data lakehouses. Available both self-hosted and cloud-hosted.
Domain-specific language designed to streamline the development of high-performance GPU, CPU, and accelerator kernels
A large scale non-linear optimization library
Python Streaming DataFrames for Kafka
A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 91+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use via CLI, REST API, or MCP server.
High-performance analytical database combining the DuckDB SQL engine, Parquet storage, and the Arrow format. 18M+ records/sec.