
Commit 5221f22

feat(integrations): update spider api call to support async + removed duplicate code + readme update

1 parent 1e96a7d commit 5221f22

File tree

9 files changed: +213 −52 lines changed


Diff for: README.md (+204 −5)
````diff
@@ -76,7 +76,6 @@ Your contributions, big or small, are valuable to us. Let's build something amaz
 - [Integrations](#integrations)
 - [Other Features](#other-features)
 - [Adding Tools to Agents](#adding-tools-to-agents)
-- [Managing Sessions and Users](#managing-sessions-and-users)
 - [Document Integration and Search](#document-integration-and-search)
 - [Reference](#reference)
 - [SDK Reference](#sdk-reference)
@@ -86,6 +85,15 @@ Your contributions, big or small, are valuable to us. Let's build something amaz
 - [Different Use Cases](#different-use-cases)
 - [Different Form Factor](#different-form-factor)
 - [In Summary](#in-summary)
+- [Document Integration and Search](#document-integration-and-search-1)
+- [Reference](#reference-1)
+- [SDK Reference](#sdk-reference-1)
+- [API Reference](#api-reference-1)
+- [Local Quickstart](#local-quickstart-1)
+- [What's the difference between Julep and LangChain etc?](#whats-the-difference-between-julep-and-langchain-etc-1)
+- [Different Use Cases](#different-use-cases-1)
+- [Different Form Factor](#different-form-factor-1)
+- [In Summary](#in-summary-1)
 
 <!-- END doctoc generated TOC please keep comment here to allow auto update -->
 
@@ -1400,6 +1408,9 @@ output:
 <td>
 
 ```yaml
+setup:
+  # No specific setup parameters are required for Wikipedia
+
 arguments:
   query: string # The search query string
   load_max_docs: integer # (Optional) Maximum number of documents to load. Default is 2.
@@ -1409,7 +1420,6 @@ output:
 ```
 
 </td>
-
 <td>
 
 **Example cookbook**: [cookbooks/03-trip-planning-assistant.ipynb](https://github.com/julep-ai/julep/blob/dev/cookbooks/03-trip-planning-assistant.ipynb)
@@ -1433,7 +1443,6 @@ output:
 ```
 
 </td>
-
 </tr>
 
 <tr>
@@ -1456,9 +1465,14 @@ output:
 ```
 
 </td>
+<td>
+
+**Example cookbook**: [cookbooks/07-personalized-research-assistant.ipynb](https://github.com/julep-ai/julep/blob/dev/cookbooks/07-personalized-research-assistant.ipynb)
 
+</td>
 </tr>
 
+
 <tr>
 <td> <b>Cloudinary</b> </td>
 <td>
@@ -1489,16 +1503,45 @@ output:
 ```
 
 </td>
-
 <td>
 
 **Example cookbook**: [cookbooks/05-video-processing-with-natural-language.ipynb](https://github.com/julep-ai/julep/blob/dev/cookbooks/05-video-processing-with-natural-language.ipynb)
 
 </td>
 </tr>
 
-</table>
+<tr>
+<td> <b>Arxiv</b> </td>
+<td>
 
+```yaml
+method: search # The method to use for the Arxiv integration
+
+setup:
+  # No specific setup parameters are required for Arxiv
+
+arguments:
+  query: string # The search query for searching with Arxiv
+  id_list: list[string] | None # (Optional) The list of Arxiv IDs to search with
+  max_results: integer # The maximum number of results to return, must be between 1 and 300000
+  download_pdf: boolean # Whether to download the PDF of the results. Default is false.
+  sort_by: string # The sort criterion for the results, options: relevance, lastUpdatedDate, submittedDate
+  sort_order: string # The sort order for the results, options: ascending, descending
+
+output:
+  result: list # A list of search results, each containing: entry_id, title, updated, published, authors, summary, comment, journal_ref, doi, primary_category, categories, links, pdf_url, pdf_downloaded
+```
+
+</td>
+
+<td>
+
+**Example cookbook**: [cookbooks/07-personalized-research-assistant.ipynb](https://github.com/julep-ai/julep/blob/dev/cookbooks/07-personalized-research-assistant.ipynb)
+
+</td>
+</tr>
+
+</table>
 For more details, refer to our [Integrations Documentation](#integrations).
 
 <div align="center">
@@ -1508,6 +1551,10 @@ For more details, refer to our [Integrations Documentation](#integrations).
 <a href="#-table-of-contents">
 <img src="https://img.shields.io/badge/Table%20of%20Contents-000000?style=for-the-badge&logo=github&logoColor=white" alt="Table of Contents">
 </a>
+</a>&nbsp;|&nbsp;
+<a href="#-table-of-contents">
+<img src="https://img.shields.io/badge/Table%20of%20Contents-000000?style=for-the-badge&logo=github&logoColor=white" alt="Table of Contents">
+</a>
 </div>
 
 ## Other Features
@@ -1529,6 +1576,158 @@ client.agents.tools.create(
         "setup": {"api_key": "your_brave_api_key"},
     },
 )
+
+Julep offers a range of advanced features to enhance your AI workflows:
+### Managing Sessions and Users
+### Adding Tools to Agents
+Julep provides robust session management for persistent interactions:
+Extend your agent's capabilities by integrating external tools and APIs:
+```python
+session = client.sessions.create(
+    agent_id=agent.id,
+    user_id=user.id,
+    context_overflow="adaptive"
+)
+```python
+# Continue conversation in the same session
+response = client.sessions.chat(
+    session_id=session.id,
+    messages=[
+        {
+            "role": "user",
+            "content": "Follow up on the previous conversation."
+        }
+    ]
+)
+```
+
+### Document Integration and Search
+
+Easily manage and search through documents for your agents:
+
+```python
+# Upload a document
+document = client.agents.docs.create(
+    title="AI advancements",
+    content="AI is changing the world...",
+    metadata={"category": "research_paper"}
+)
+
+# Search documents
+results = client.agents.docs.search(
+    text="AI advancements",
+    metadata_filter={"category": "research_paper"}
+)
+```
+    agent_id=agent.id,
+    name="web_search",
+    description="Search the web for information.",
+    integration={
+</a>&nbsp;|&nbsp;
+<a href="#-table-of-contents">
+<img src="https://img.shields.io/badge/Table%20of%20Contents-000000?style=for-the-badge&logo=github&logoColor=white" alt="Table of Contents">
+</a>
+</div>
+
+## Reference
+
+### SDK Reference
+
+- **Node.js** [SDK Reference](https://github.com/julep-ai/node-sdk/blob/main/api.md) | [NPM Package](https://www.npmjs.com/package/@julep/sdk)
+- **Python** [SDK Reference](https://github.com/julep-ai/python-sdk/blob/main/api.md) | [PyPI Package](https://pypi.org/project/julep/)
+
+### API Reference
+
+Explore our API documentation to learn more about agents, tasks, and executions:
+
+- [Agents API](https://dev.julep.ai/api/docs#tag/agents)
+- [Tasks API](https://dev.julep.ai/api/docs#tag/tasks)
+- [Executions API](https://dev.julep.ai/api/docs#tag/executions)
+
+<div align="center">
+<a href="#top">
+<img src="https://img.shields.io/badge/Back%20to%20Top-000000?style=for-the-badge&logo=github&logoColor=white" alt="Back to Top">
+</a>&nbsp;|&nbsp;
+<a href="#-table-of-contents">
+<img src="https://img.shields.io/badge/Table%20of%20Contents-000000?style=for-the-badge&logo=github&logoColor=white" alt="Table of Contents">
+</a>
+</div>
+
+## Local Quickstart
+
+**Requirements**:
+
+- latest docker compose installed
+
+**Steps**:
+
+1. `git clone https://github.com/julep-ai/julep.git`
+2. `cd julep`
+3. `docker volume create cozo_backup`
+4. `docker volume create cozo_data`
+5. `cp .env.example .env # <-- Edit this file`
+6. `docker compose --env-file .env --profile temporal-ui --profile single-tenant --profile self-hosted-db up --build`
+
+<div align="center">
+<a href="#top">
+<img src="https://img.shields.io/badge/Back%20to%20Top-000000?style=for-the-badge&logo=github&logoColor=white" alt="Back to Top">
+</a>&nbsp;|&nbsp;
+<a href="#-table-of-contents">
+<img src="https://img.shields.io/badge/Table%20of%20Contents-000000?style=for-the-badge&logo=github&logoColor=white" alt="Table of Contents">
+</a>
+</div>
+
+---
+
+## What's the difference between Julep and LangChain etc?
+
+### Different Use Cases
+
+Think of LangChain and Julep as tools with different focuses within the AI development stack.
+
+LangChain is great for creating sequences of prompts and managing interactions with LLMs. It has a large ecosystem with lots of pre-built integrations, which makes it convenient if you want to get something up and running quickly. LangChain fits well with simple use cases that involve a linear chain of prompts and API calls.
+
+Julep, on the other hand, is more about building persistent AI agents that can maintain context over long-term interactions. It shines when you need complex workflows that involve multi-step tasks, conditional logic, and integration with various tools or APIs directly within the agent's process. It's designed from the ground up to manage persistent sessions and complex workflows.
+
+Use Julep if you imagine building a complex AI assistant that needs to:
+
+- Keep track of user interactions over days or weeks.
+- Perform scheduled tasks, like sending daily summaries or monitoring data sources.
+- Make decisions based on prior interactions or stored data.
+- Interact with multiple external services as part of its workflow.
+
+Then Julep provides the infrastructure to support all that without you having to build it from scratch.
+
+### Different Form Factor
+
+Julep is a **platform** that includes a language for describing workflows, a server for running those workflows, and an SDK for interacting with the platform. In order to build something with Julep, you write a description of the workflow in `YAML`, and then run the workflow in the cloud.
+
+Julep is built for heavy-lifting, multi-step, and long-running workflows and there's no limit to how complex the workflow can be.
+
+LangChain is a **library** that includes a few tools and a framework for building linear chains of prompts and tools. In order to build something with LangChain, you typically write Python code that configures and runs the model chains you want to use.
+
+LangChain might be sufficient and quicker to implement for simple use cases that involve a linear chain of prompts and API calls.
+
+### In Summary
+
+Use LangChain when you need to manage LLM interactions and prompt sequences in a stateless or short-term context.
+
+Choose Julep when you need a robust framework for stateful agents with advanced workflow capabilities, persistent sessions, and complex task orchestration.
+
+<div align="center">
+<a href="#top">
+<img src="https://img.shields.io/badge/Back%20to%20Top-000000?style=for-the-badge&logo=github&logoColor=white" alt="Back to Top">
+</a>&nbsp;|&nbsp;
+<a href="#-table-of-contents">
+<img src="https://img.shields.io/badge/Table%20of%20Contents-000000?style=for-the-badge&logo=github&logoColor=white" alt="Table of Contents">
+</a>
+</div>
+
+        "provider": "brave",
+        "method": "search",
+        "setup": {"api_key": "your_brave_api_key"},
+    },
+)
 ```
 
 ### Managing Sessions and Users
````
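The README hunk above documents `client.agents.docs.search` with a `text` query and a `metadata_filter`. As a hedged, offline illustration of the filtering semantics that call implies (this is a hypothetical stand-in, not the Julep SDK itself — the hosted search is semantic, while this sketch just matches substrings and exact metadata pairs):

```python
def search_docs(docs, text, metadata_filter):
    # Keep documents whose title/content mention the query text and
    # whose metadata matches every key/value in the filter exactly.
    return [
        d for d in docs
        if text.lower() in (d["title"] + " " + d["content"]).lower()
        and all(d["metadata"].get(k) == v for k, v in metadata_filter.items())
    ]
```

This mirrors the README example: searching for "AI advancements" restricted to `{"category": "research_paper"}` returns only the uploaded research paper.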

Diff for: agents-api/agents_api/autogen/Tools.py (+2 −2)

```diff
@@ -1639,7 +1639,7 @@ class SpiderFetchArguments(BaseModel):
     """
     The URL to fetch data from
     """
-    mode: Literal["scrape"] = "scrape"
+    mode: Literal["crawl", "scrape"] = "scrape"
     """
     The type of crawler to use
     """
@@ -1661,7 +1661,7 @@ class SpiderFetchArgumentsUpdate(BaseModel):
     """
     The URL to fetch data from
     """
-    mode: Literal["scrape"] = "scrape"
+    mode: Literal["crawl", "scrape"] = "scrape"
    """
     The type of crawler to use
     """
```

Diff for: agents-api/agents_api/routers/sessions/chat.py (−12)

```diff
@@ -56,17 +56,6 @@ async def chat(
         ChatResponse: The chat response.
     """
 
-    # check if the developer is paid
-    if "paid" not in developer.tags:
-        # get the session length
-        sessions = count_sessions_query(developer_id=developer.id)
-        session_length = sessions["count"]
-        if session_length > max_free_sessions:
-            raise HTTPException(
-                status_code=status.HTTP_403_FORBIDDEN,
-                detail="Session length exceeded the free tier limit",
-            )
-
     # check if the developer is paid
     if "paid" not in developer.tags:
         # get the session length
@@ -108,7 +97,6 @@ async def chat(
         )
         for ref in doc_references
     ]
-
     # Render the system message
     if situation := chat_context.session.situation:
         system_message = dict(
```
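The deleted lines were an exact duplicate of the free-tier guard that remains in `chat`. Shown standalone, the surviving check's logic amounts to something like this sketch (plain function, hypothetical names, and an assumed `max_free_sessions` default — the real handler raises an `HTTPException` with status 403):

```python
def enforce_free_tier(tags, session_count, max_free_sessions=10):
    # Paid developers are never limited.
    if "paid" in tags:
        return
    # Free-tier developers are capped at max_free_sessions sessions.
    if session_count > max_free_sessions:
        raise PermissionError("Session length exceeded the free tier limit")
```

Running the check once instead of twice changes no behavior; the duplicate simply queried the session count a second time for the same result.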

Diff for: integrations-service/gunicorn_conf.py (−26)

```diff
@@ -24,32 +24,6 @@
 preload_app = False
 
 
-def when_ready(server):
-    """Run when server is ready to handle requests."""
-    # Ensure proper permissions for any required directories
-    for directory in ["logs", "run"]:
-        path = os.path.join(os.getcwd(), directory)
-        if not os.path.exists(path):
-            os.makedirs(path, mode=0o755)
-
-
-def on_starting(server):
-    """Run when server starts."""
-    server.log.setup(server.app.cfg)
-
-
-def worker_exit(server, worker):
-    """Clean up on worker exit."""
-    server.log.info(f"Worker {worker.pid} exiting gracefully")
-
-
-loglevel = "info"
-graceful_timeout = 30
-max_requests = 1000
-max_requests_jitter = 50
-preload_app = False
-
-
 def when_ready(server):
     """Run when server is ready to handle requests."""
     # Ensure proper permissions for any required directories
```

Diff for: integrations-service/integrations/autogen/Tools.py (+2 −2)

```diff
@@ -1639,7 +1639,7 @@ class SpiderFetchArguments(BaseModel):
     """
     The URL to fetch data from
     """
-    mode: Literal["scrape"] = "scrape"
+    mode: Literal["crawl", "scrape"] = "scrape"
     """
     The type of crawler to use
     """
@@ -1661,7 +1661,7 @@ class SpiderFetchArgumentsUpdate(BaseModel):
     """
     The URL to fetch data from
     """
-    mode: Literal["scrape"] = "scrape"
+    mode: Literal["crawl", "scrape"] = "scrape"
     """
     The type of crawler to use
     """
```

Diff for: integrations-service/integrations/utils/integrations/ffmpeg.py (+1 −1)

```diff
@@ -67,7 +67,7 @@ async def bash_cmd(arguments: FfmpegSearchArguments) -> FfmpegSearchOutput:
     # Decode base64 input
     try:
         input_data = base64.b64decode(arguments.file)
-    except Exception as e:
+    except Exception:
         return FfmpegSearchOutput(
             fileoutput="Error: Invalid base64 input", result=False, mime_type=None
         )
```
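The change drops the unused `as e` binding: the handler never inspects the exception, it only returns an error marker. A self-contained sketch of that pattern (hypothetical `decode_input` helper; the real code wraps the result in `FfmpegSearchOutput`):

```python
import base64

def decode_input(file_b64):
    # Try to decode; on any failure, report the same error string the
    # handler uses. No `as e` needed since the exception isn't used.
    try:
        return base64.b64decode(file_b64), None
    except Exception:
        return None, "Error: Invalid base64 input"
```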

Diff for: integrations-service/integrations/utils/integrations/spider.py (+1 −3)

```diff
@@ -1,5 +1,3 @@
-import asyncio
-
 from beartype import beartype
 from langchain_community.document_loaders import SpiderLoader
 from tenacity import retry, stop_after_attempt, wait_exponential
@@ -45,5 +43,5 @@ async def crawl(
         params=arguments.params,
     )
 
-    documents = await asyncio.to_thread(spider_loader.load)
+    documents = await spider_loader.aload()
     return SpiderFetchOutput(documents=documents)
```
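The change replaces a thread-hop around the blocking `load()` with the loader's native async `aload()`, which also lets the `asyncio` import go. The two call styles can be contrasted with a stub (the `StubLoader` below is a hypothetical stand-in for `SpiderLoader`, not the real class):

```python
import asyncio

class StubLoader:
    # Hypothetical stand-in for SpiderLoader with both call styles.
    def load(self):
        # Blocking, synchronous API.
        return ["doc1", "doc2"]

    async def aload(self):
        # Native async API; no worker thread required.
        return self.load()

async def fetch_before(loader):
    # Old code: push the blocking .load() onto a worker thread.
    return await asyncio.to_thread(loader.load)

async def fetch_after(loader):
    # New code: await the loader's native async .aload() directly.
    return await loader.aload()
```

Both return the same documents; the async path just avoids tying up a thread while the crawl runs.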
