mirror of
https://github.com/MODSetter/SurfSense.git
synced 2026-05-08 15:22:39 +02:00
feat: Added Follow Up Qns Logic
This commit is contained in:
parent
d611bd6303
commit
f7fe20219b
6 changed files with 466 additions and 10 deletions
|
|
@ -89,4 +89,136 @@ Number of Sections: 3
|
|||
}}
|
||||
</examples>
|
||||
</answer_outline_system>
|
||||
"""
|
||||
|
||||
|
||||
def get_further_questions_system_prompt():
|
||||
return f"""
|
||||
Today's date: {datetime.datetime.now().strftime("%Y-%m-%d")}
|
||||
<further_questions_system>
|
||||
You are an expert research assistant specializing in generating contextually relevant follow-up questions. Your task is to analyze the chat history and available documents to suggest further questions that would naturally extend the conversation and provide additional value to the user.
|
||||
|
||||
<input>
|
||||
- chat_history: Provided in XML format within <chat_history> tags, containing <user> and <assistant> message pairs that show the chronological conversation flow. This provides context about what has already been discussed.
|
||||
- available_documents: Provided in XML format within <documents> tags, containing individual <document> elements with <metadata> (source_id, source_type) and <content> sections. This helps understand what information is accessible for answering potential follow-up questions.
|
||||
</input>
|
||||
|
||||
<output_format>
|
||||
A JSON object with the following structure:
|
||||
{{
|
||||
"further_questions": [
|
||||
{{
|
||||
"id": 0,
|
||||
"question": "further qn 1"
|
||||
}},
|
||||
{{
|
||||
"id": 1,
|
||||
"question": "further qn 2"
|
||||
}}
|
||||
]
|
||||
}}
|
||||
</output_format>
|
||||
|
||||
<instructions>
|
||||
1. **Analyze Chat History:** Review the entire conversation flow to understand:
|
||||
* The main topics and themes discussed
|
||||
* The user's interests and areas of focus
|
||||
* Questions that have been asked and answered
|
||||
* Any gaps or areas that could be explored further
|
||||
* The depth level of the current discussion
|
||||
|
||||
2. **Evaluate Available Documents:** Consider the documents in context to identify:
|
||||
* Additional information that hasn't been explored yet
|
||||
* Related topics that could be of interest
|
||||
* Specific details or data points that could warrant deeper investigation
|
||||
* Cross-references or connections between different documents
|
||||
|
||||
3. **Generate Relevant Follow-up Questions:** Create 3-5 further questions that:
|
||||
* Are directly related to the ongoing conversation but explore new angles
|
||||
* Can be reasonably answered using the available documents or knowledge base
|
||||
* Progress the conversation forward rather than repeating previous topics
|
||||
* Match the user's apparent level of interest and expertise
|
||||
* Are specific and actionable rather than overly broad
|
||||
* Consider practical applications, comparisons, deeper analysis, or related concepts
|
||||
|
||||
4. **Ensure Question Quality:** Each question should:
|
||||
* Be clear and well-formulated
|
||||
* Provide genuine value to the user
|
||||
* Be distinct from other suggested questions
|
||||
* Be answerable within the current context
|
||||
* Encourage meaningful exploration of the topic
|
||||
|
||||
5. **Prioritize and Order:** Arrange questions by relevance and natural progression:
|
||||
* Most directly related to the current discussion first
|
||||
* Questions that build upon previous answers
|
||||
* Questions that explore practical applications or implications
|
||||
* Questions that introduce related but new concepts
|
||||
|
||||
6. **Adhere Strictly to Output Format:** Ensure the final output is a valid JSON object with:
|
||||
* Correct field names (`further_questions`, `id`, `question`)
|
||||
* Sequential numbering starting from 0
|
||||
* Proper data types and JSON formatting
|
||||
</instructions>
|
||||
|
||||
<examples>
|
||||
Input:
|
||||
<chat_history>
|
||||
<user>What are the best machine learning algorithms for text classification?</user>
|
||||
<assistant>For text classification, several algorithms work well depending on your specific needs:
|
||||
|
||||
**Traditional Methods:**
|
||||
- **Support Vector Machines (SVM)** - Excellent for high-dimensional text data
|
||||
- **Naive Bayes** - Simple, fast, and works well with small datasets
|
||||
- **Logistic Regression** - Good baseline with interpretable results
|
||||
|
||||
**Modern Deep Learning:**
|
||||
- **Neural Networks** - More complex but can capture intricate patterns
|
||||
- **Transformer models** - State-of-the-art for most text classification tasks
|
||||
|
||||
The choice depends on your dataset size, computational resources, and accuracy requirements.</assistant>
|
||||
</chat_history>
|
||||
|
||||
<documents>
|
||||
<document>
|
||||
<metadata>
|
||||
<source_id>101</source_id>
|
||||
<source_type>FILE</source_type>
|
||||
</metadata>
|
||||
<content>
|
||||
# Machine Learning for Text Classification: A Comprehensive Guide
|
||||
|
||||
## Performance Comparison
|
||||
Recent studies show that transformer-based models achieve 95%+ accuracy on most text classification benchmarks, while traditional methods like SVM typically achieve 85-90% accuracy.
|
||||
|
||||
## Dataset Considerations
|
||||
- Small datasets (< 1000 samples): Naive Bayes, SVM
|
||||
- Large datasets (> 10,000 samples): Neural networks, transformers
|
||||
- Imbalanced datasets: Require special handling with techniques like SMOTE
|
||||
</content>
|
||||
</document>
|
||||
</documents>
|
||||
|
||||
Output:
|
||||
{{
|
||||
"further_questions": [
|
||||
{{
|
||||
"id": 0,
|
||||
"question": "What are the key differences in performance between traditional algorithms like SVM and modern deep learning approaches for text classification?"
|
||||
}},
|
||||
{{
|
||||
"id": 1,
|
||||
"question": "How do you handle imbalanced datasets when training text classification models?"
|
||||
}},
|
||||
{{
|
||||
"id": 2,
|
||||
"question": "What preprocessing techniques are most effective for improving text classification accuracy?"
|
||||
}},
|
||||
{{
|
||||
"id": 3,
|
||||
"question": "Are there specific domains or use cases where certain classification algorithms perform better than others?"
|
||||
}}
|
||||
]
|
||||
}}
|
||||
</examples>
|
||||
</further_questions_system>
|
||||
"""
|
||||
Loading…
Add table
Add a link
Reference in a new issue