mirror of https://github.com/MODSetter/SurfSense.git
synced 2026-05-04 05:12:38 +02:00
feat: various UI fixes, prompt optimizations, and allowing duplicate docs
- Updated `content_hash` in the `Document` model to remove global uniqueness, allowing identical content across different paths.
- Enhanced `_create_document` function to handle path uniqueness and prevent session-poisoning from `IntegrityError`.
- Added detailed comments for clarity on the changes and their implications.
- Introduced new citation handling in the editor for improved user experience with citation jumps.
- Updated package dependencies in the frontend for better functionality.
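
The `_create_document` change itself is not part of the hunk shown on this page, so the following is only a rough sketch of the general idea: attempt the insert inside a savepoint, and on a uniqueness violation roll back just that savepoint so the surrounding transaction stays usable. It uses the standard-library `sqlite3` module as a stand-in for the project's SQLAlchemy session; the `create_document` helper, the table layout, and the literal hash values are all hypothetical.

```python
import sqlite3


def create_document(conn, path_hash, content_hash):
    """Hypothetical helper: insert inside a savepoint.

    Returns False (instead of raising) when the path hash already exists,
    and undoes only the savepoint, not the outer transaction.
    """
    conn.execute("SAVEPOINT create_doc")
    try:
        conn.execute(
            "INSERT INTO documents (unique_identifier_hash, content_hash) "
            "VALUES (?, ?)",
            (path_hash, content_hash),
        )
    except sqlite3.IntegrityError:
        # Discard the failed insert, then close the savepoint.
        conn.execute("ROLLBACK TO SAVEPOINT create_doc")
        conn.execute("RELEASE SAVEPOINT create_doc")
        return False
    conn.execute("RELEASE SAVEPOINT create_doc")
    return True


# isolation_level=None puts the connection in autocommit mode, so the
# transaction boundaries below are exactly the ones written out.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE documents ("
    " id INTEGER PRIMARY KEY,"
    " unique_identifier_hash TEXT UNIQUE,"  # path identity: unique
    " content_hash TEXT NOT NULL)"          # content identity: not unique
)

conn.execute("BEGIN")
first = create_document(conn, "path-a", "hash-1")  # new path
clash = create_document(conn, "path-a", "hash-2")  # same path: rejected
later = create_document(conn, "path-b", "hash-1")  # same content, new path
conn.execute("COMMIT")

rows = conn.execute(
    "SELECT unique_identifier_hash, content_hash FROM documents ORDER BY id"
).fetchall()
print(first, clash, later, rows)
```

The rejected insert does not prevent the later one from committing in the same transaction. In SQLAlchemy a comparable guard is usually written with `Session.begin_nested()`; whether this commit takes that route is not visible here.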
parent e6433f78c4
commit b9a66cb417
26 changed files with 1540 additions and 852 deletions
@@ -976,7 +976,15 @@ class Document(BaseModel, TimestampMixin):
     document_metadata = Column(JSON, nullable=True)
 
     content = Column(Text, nullable=False)
-    content_hash = Column(String, nullable=False, index=True, unique=True)
+    # ``content_hash`` is intentionally NOT globally unique. In a real
+    # filesystem two files at different paths can hold identical bytes,
+    # and the agent's ``write_file`` flow needs that semantic to support
+    # copy / duplicate operations. Path uniqueness lives on
+    # ``unique_identifier_hash`` (per search space). The hash remains
+    # indexed because connector indexers consult it as a change-detection
+    # / cross-source dedup hint via :func:`check_duplicate_document`.
+    # See migration 133.
+    content_hash = Column(String, nullable=False, index=True)
     unique_identifier_hash = Column(String, nullable=True, index=True, unique=True)
     embedding = Column(Vector(config.embedding_model_instance.dimension))
 
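
To illustrate the semantics described in the new comment, here is a small standalone sketch using `sqlite3` and `hashlib` rather than the project's SQLAlchemy model. It assumes, purely for illustration, that `content_hash` is a SHA-256 of the content and that `unique_identifier_hash` is a SHA-256 of the search space ID plus path; the real derivations are not shown in this hunk, and the `add` helper and index name are hypothetical.

```python
import hashlib
import sqlite3


def sha256_hex(text: str) -> str:
    """Hex SHA-256 of a string (assumed hash scheme, for illustration only)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        content TEXT NOT NULL,
        content_hash TEXT NOT NULL,
        unique_identifier_hash TEXT UNIQUE
    );
    -- Plain (non-unique) index: fast lookups, duplicates allowed.
    CREATE INDEX ix_documents_content_hash ON documents (content_hash);
    """
)


def add(search_space_id: int, path: str, content: str) -> None:
    """Hypothetical insert helper keyed on (search space, path)."""
    conn.execute(
        "INSERT INTO documents (content, content_hash, unique_identifier_hash) "
        "VALUES (?, ?, ?)",
        (content, sha256_hex(content), sha256_hex(f"{search_space_id}:{path}")),
    )


# Two files at different paths holding identical bytes.
add(1, "/notes/a.md", "same bytes")
add(1, "/notes/copy-of-a.md", "same bytes")

# The indexed hash still works as a dedup hint: it can match several rows.
matches = conn.execute(
    "SELECT COUNT(*) FROM documents WHERE content_hash = ?",
    (sha256_hex("same bytes"),),
).fetchone()[0]
print(matches)
```

Because the lookup can now return more than one row, any caller that treated a `content_hash` match as a single definitive document would need to treat it as a hint instead, which matches how the comment describes `check_duplicate_document`.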