plano/model_server/app/main.py

import os

from app.commons.globals import handler_map
from app.model_handler.base_handler import ChatMessage
from app.model_handler.guardrails import GuardRequest

from fastapi import FastAPI, Response
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.resources import Resource


resource = Resource.create(
    {
        "service.name": "model-server",
    }
)

# Initialize the tracer provider
trace.set_tracer_provider(TracerProvider(resource=resource))
tracer = trace.get_tracer(__name__)


app = FastAPI()

FastAPIInstrumentor().instrument_app(app)

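# DEFAULT_OTLP_HOST = "http://localhost:4317"
DEFAULT_OTLP_HOST = "none"

# Configure the OTLP exporter (Jaeger, Zipkin, etc.).
# The "none" placeholder is not a reachable collector address, so the exporter
# is attached only when OTLP_HOST points at a real endpoint.
otlp_host = os.getenv("OTLP_HOST", DEFAULT_OTLP_HOST)

if otlp_host.lower() != "none":
    otlp_exporter = OTLPSpanExporter(endpoint=otlp_host)
    trace.get_tracer_provider().add_span_processor(
        BatchSpanProcessor(otlp_exporter)
    )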


@app.get("/healthz")
async def healthz():
    return {"status": "ok"}


@app.get("/models")
async def models():
    return {
        "object": "list",
        "data": [{"id": model_name, "object": "model"} for model_name in handler_map],
    }


@app.post("/function_calling")
async def function_calling(req: ChatMessage, res: Response):
    try:
        intent_response = await handler_map["Arch-Intent"].chat_completion(req)

        if handler_map["Arch-Intent"].detect_intent(intent_response):
            # [TODO] measure agreement between intent detection and function calling
            try:
                function_calling_response = await handler_map[
                    "Arch-Function"
                ].chat_completion(req)
                return function_calling_response
            except Exception as e:
                # [TODO] Review: update how to collect debugging outputs
                # logger.error(f"Error in chat_completion from `Arch-Function`: {e}")
                res.status_code = 500
                return {"error": f"[Arch-Function] - {e}"}
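        # [TODO] Review: define the behavior if `Arch-Intent` doesn't detect an intent.
        # Until then this path falls through and the handler returns None,
        # which FastAPI serializes as a JSON `null` body with status 200.
        # else: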

    except Exception as e:
        # [TODO] Review: update how to collect debugging outputs
        # logger.error(f"Error in chat_completion from `Arch-Intent`: {e}")
        res.status_code = 500
        return {"error": f"[Arch-Intent] - {e}"}


@app.post("/guardrails")
async def guardrails(req: GuardRequest, res: Response, max_num_words=300):
    # NOTE: `max_num_words` is currently unused in this handler.
    try:
        guard_result = handler_map["Arch-Guard"].predict(req)
        return guard_result
    except Exception as e:
        # [TODO] Review: update how to collect debugging outputs
        res.status_code = 500
        return {"error": f"[Arch-Guard] - {e}"}
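

# Hypothetical helper, not part of the original module: a minimal sketch of how
# the three `except` blocks above could share one error payload. The name
# `error_payload` and its signature are assumptions made for illustration.
def error_payload(source: str, exc: Exception) -> dict:
    """Build the `{"error": "[<source>] - <message>"}` body used by the handlers."""
    return {"error": f"[{source}] - {exc}"}


# Usage sketch inside a handler's `except Exception as e:` block:
#     res.status_code = 500
#     return error_payload("Arch-Guard", e)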