omnigraph/Cargo.toml
Andrew Altshuler 74eb5a5380
Parallel per-type load writes + omnigraph optimize/cleanup CLI (#46)
* Parallel per-type load writes + omnigraph optimize/cleanup CLI

## MR-677.3 — parallel per-type load writes

The load path already groups records into one RecordBatch per type and
makes one Lance commit per table (loader::mod.rs:249-..), but those
commits ran sequentially. Wrap the node and edge write loops in
`futures::stream::StreamExt::buffered(N)`, driving a new helper,
`write_batches_concurrently`. Concurrency is tunable via
`OMNIGRAPH_LOAD_CONCURRENCY` (default 8).
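The fan-out that `buffered(N)` provides is bounded and order-preserving, and it can be modeled without an async runtime. The following is a std-only stand-in, not the loader's code. OS threads replace futures, and `map_bounded` and `concurrency_from_env` are hypothetical names. Treating an unset, non-numeric, or zero `OMNIGRAPH_LOAD_CONCURRENCY` as the default of 8 is an assumption about how the override is parsed.

```rust
use std::env;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Default bound, matching the documented default of 8.
const DEFAULT_CONCURRENCY: usize = 8;

/// Hypothetical helper: read a concurrency limit from `var`.
/// Assumption: unset, non-numeric, or zero values fall back to `default`.
fn concurrency_from_env(var: &str, default: usize) -> usize {
    env::var(var)
        .ok()
        .and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(default)
}

/// Hypothetical stand-in for `write_batches_concurrently`: apply `f` to every
/// item with at most `limit` calls running at once, returning results in
/// input order (the ordering `buffered(N)` gives).
fn map_bounded<T, R, F>(items: Vec<T>, limit: usize, f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let next = AtomicUsize::new(0);
    // One slot per input so each result lands at its original index.
    let slots: Vec<Mutex<Option<R>>> = items.iter().map(|_| Mutex::new(None)).collect();
    let workers = limit.max(1).min(items.len());
    thread::scope(|s| {
        for _ in 0..workers {
            // Each worker claims the next unprocessed index until none remain.
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= items.len() {
                    break;
                }
                let out = f(&items[i]);
                *slots[i].lock().unwrap() = Some(out);
            });
        }
    });
    slots
        .into_iter()
        .map(|m| m.into_inner().unwrap().expect("every index was processed"))
        .collect()
}

fn main() {
    let limit = concurrency_from_env("OMNIGRAPH_LOAD_CONCURRENCY", DEFAULT_CONCURRENCY);
    // Illustrative per-type batches: (table name, row count).
    let tables = vec![("Person", 3usize), ("Company", 5), ("WORKS_AT", 2)];
    let results = map_bounded(tables, limit, |(name, rows)| format!("{name}: {rows} rows"));
    for line in &results {
        println!("{line}");
    }
}
```

As with `buffered`, at most `limit` calls are in flight and results come back in input order. `buffer_unordered` would instead yield them as they finish.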

## MR-676 — `omnigraph optimize` and `omnigraph cleanup`

New CLI subcommands that walk every node + edge table in the repo:

- `omnigraph optimize <uri>`: runs Lance `compact_files` on each
  table to merge small fragments into fewer, larger ones.
- `omnigraph cleanup <uri> [--keep N] [--older-than 7d] --confirm`:
  runs Lance `cleanup_old_versions` to prune historical manifests and
  the fragments unique to them. Requires `--confirm` because it is
  destructive. Supports count-based retention, time-based retention,
  or both AND'd together. Time-based retention uses chrono
  `DateTime<Utc>` (added as a workspace dep with default features off).
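A small model makes it easier to see how the two retention criteria combine. This is a sketch under assumptions, not Lance's `cleanup_old_versions` logic. It models versions as (version number, Unix-seconds timestamp) pairs, and `parse_older_than`, `Retention`, and `versions_to_prune` are hypothetical. The duration grammar (an integer plus `d`, `h`, or `m`) is assumed beyond the documented `7d` example. "AND'd" is read as follows: a version is pruned only if it falls outside the newest `keep` *and* is older than the cutoff. The model also assumes the newest version is always retained.

```rust
/// Hypothetical parser for `--older-than` values such as "7d".
/// Assumed grammar: unsigned integer followed by d (days), h (hours) or m (minutes).
fn parse_older_than(s: &str) -> Option<u64> {
    let s = s.trim();
    let unit = s.chars().last()?;
    let n: u64 = s[..s.len() - unit.len_utf8()].parse().ok()?;
    let secs_per_unit = match unit {
        'd' => 86_400,
        'h' => 3_600,
        'm' => 60,
        _ => return None,
    };
    n.checked_mul(secs_per_unit)
}

/// Hypothetical policy mirroring `--keep N` and `--older-than`.
struct Retention {
    keep: Option<usize>,
    older_than_secs: Option<u64>,
}

/// Return version numbers to prune from `versions` (number, unix seconds).
/// A version is pruned only if it passes every check that is set: it ranks
/// outside the newest `keep`, AND its timestamp is before `now - older_than`.
/// With no criterion set, nothing is pruned.
fn versions_to_prune(versions: &[(u64, u64)], policy: &Retention, now: u64) -> Vec<u64> {
    if policy.keep.is_none() && policy.older_than_secs.is_none() {
        return Vec::new();
    }
    let mut sorted = versions.to_vec();
    // Newest first, so the index is the version's recency rank.
    sorted.sort_by(|a, b| b.0.cmp(&a.0));
    let cutoff = policy.older_than_secs.map(|age| now.saturating_sub(age));
    sorted
        .iter()
        .enumerate()
        .skip(1) // assumption: the newest version is always retained
        .filter(|(rank, _)| policy.keep.map_or(true, |k| *rank >= k))
        .filter(|(_, (_, ts))| cutoff.map_or(true, |c| *ts < c))
        .map(|(_, (v, _))| *v)
        .collect()
}

fn main() {
    let day = 86_400;
    let now = 100 * day;
    // (version, created_at): v1 is 40 days old, v2 5 days, v3 2 days, v4 current.
    let versions = [(1, now - 40 * day), (2, now - 5 * day), (3, now - 2 * day), (4, now)];
    let policy = Retention { keep: Some(2), older_than_secs: parse_older_than("7d") };
    // v2 is outside the newest 2 but younger than 7 days, so only v1 goes.
    println!("{:?}", versions_to_prune(&versions, &policy, now)); // prints [1]
}
```

Under this reading, adding `--older-than` to `--keep` can only retain more versions, never fewer. That matches the conservative intent of a destructive, `--confirm`-gated command.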

Both commands run their per-table loops in parallel (at most 8 tables
at a time, overridable via `OMNIGRAPH_MAINTENANCE_CONCURRENCY`).
Smoke-tested against the 114-table prod graph: `optimize` went from
7m15s sequential to 1m28s parallel, and `cleanup --keep 1` removed
137 historical versions across 114 tables in 1m57s without disrupting
`/healthz` or query responses.
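For scale, the reported `optimize` timings work out to roughly a 4.9× wall-clock speedup. This is plain arithmetic on the two smoke-test numbers and implies no further measurements:

$$
\frac{7\,\text{min}\;15\,\text{s}}{1\,\text{min}\;28\,\text{s}} = \frac{435\,\text{s}}{88\,\text{s}} \approx 4.94
$$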

Public API on `Omnigraph`:

  pub async fn optimize(&mut self) -> Result<Vec<TableOptimizeStats>>
  pub async fn cleanup(&mut self, opts: CleanupPolicyOptions)
      -> Result<Vec<TableCleanupStats>>

All 10 existing loader tests still pass.

Closes MR-676.
Partially addresses MR-677: this is the .3 piece (parallel by type).
MR-677.1 covers the `omnigraph embed` path rather than load, since
load doesn't call Gemini directly; .2 was already in place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate openapi.json

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-04-25 14:22:14 +03:00


[workspace]
resolver = "2"
members = [
    "crates/omnigraph-compiler",
    "crates/omnigraph",
    "crates/omnigraph-cli",
    "crates/omnigraph-server",
]
default-members = [
    "crates/omnigraph",
    "crates/omnigraph-cli",
    "crates/omnigraph-server",
]

[workspace.dependencies]
arrow-array = "57"
arrow-ipc = "57"
arrow-schema = "57"
arrow-select = "57"
arrow-cast = { version = "57", features = ["prettyprint"] }
arrow-ord = "57"
datafusion-physical-plan = "52"
datafusion-physical-expr = "52"
datafusion-execution = "52"
datafusion-common = "52"
datafusion-expr = "52"
datafusion-functions-aggregate = "52"
lance = { version = "4.0.0", default-features = false, features = ["aws"] }
lance-datafusion = "4.0.0"
lance-file = "4.0.0"
lance-index = "4.0.0"
lance-linalg = "4.0.0"
lance-namespace = "4.0.0"
lance-namespace-impls = "4.0.0"
lance-table = "4.0.0"
ulid = "1"
futures = "0.3"
async-trait = "0.1"
chrono = { version = "0.4", default-features = false, features = ["clock"] }
pest = "2"
pest_derive = "2"
thiserror = "2"
tokio = { version = "1", features = ["rt-multi-thread", "macros", "time", "net", "signal", "sync"] }
clap = { version = "4", features = ["derive"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
serde_yaml = "0.9"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter", "fmt"] }
tower = "0.5"
tower-http = { version = "0.6", features = ["trace"] }
color-eyre = "0.6"
tempfile = "3"
ahash = "0.8"
base64 = "0.22"
ariadne = "0.4"
regex = "1"
reqwest = { version = "0.12", default-features = false, features = ["json", "rustls-tls"] }
object_store = { version = "0.12.5", default-features = false, features = ["aws"] }
fail = "0.5"
time = { version = "0.3", features = ["formatting"] }
axum = { version = "0.8", features = ["json", "macros"] }
utoipa = { version = "5", features = ["axum_extras"] }
url = "2"
cedar-policy = "4.9"
sha2 = "0.10"
subtle = "2"

[profile.dev]
debug = 0

[profile.dev.package."*"]
opt-level = 2

[profile.release]
opt-level = 2
lto = "thin"
codegen-units = 16
strip = true