About Orderfox
Orderfox is redefining AI-driven industrial intelligence with cutting-edge solutions for data-driven decision-making and smart procurement automation. Our mission is to empower businesses with next-generation AI tools that enhance operational efficiency, uncover market insights, and drive competitive advantage.
Our Products
* Gieni AI – An advanced AI-powered market intelligence engine that analyzes data from over 380 million websites worldwide, providing businesses with real-time insights, deep industry analytics, and predictive intelligence to stay ahead of the competition.
* Partfox – An AI-powered, commission-free platform that connects buyers with manufacturers based on precise CNC machine capabilities. Buyers can upload part specifications (STEP, PDF) and receive instant matches with suitable manufacturers. The platform ensures transparent, efficient procurement by enabling direct communication without intermediaries. Manufacturers can showcase their machine capacities and gain visibility to attract the right projects.
Tasks
Please note: we can only accept applications from within the EU. Visa support is not available.
We are looking for a Data Scientist specializing in data quality, completeness, and AI-driven automation, with a strong focus on LLM-powered retrieval and process orchestration.
This role is not about traditional machine learning model training—our focus is on agentic AI workflows, efficient data processing, and scalable automation using LLMs via inference APIs (OpenAI, Bedrock, AI Foundry, etc.). You will work with distributed computing, data lakes, vector search, and retrieval-augmented generation (RAG) techniques to optimize knowledge access for AI-driven applications.
Core Focus Areas (Top Priorities)
1. Agentic Prompt Engineering (Langflow, AgentOps, etc.) – Designing dynamic LLM prompts, chains, and automation workflows.
2. Distributed Computing & Process Orchestration (Airflow, Kestra, AWS Batch, Databricks, etc.) – Managing large-scale workflows with automation and fault tolerance.
3. Data Lake Access & Processing (Presto, AWS Athena, etc.) – Efficiently handling big data retrieval, querying, and optimization.
4. Vector & Semantic Search (pgvector, ChromaDB, MeiliSearch, etc.) – Building high-performance search and retrieval systems for LLM-powered AI.
5. Advanced / Agentic RAG (Graph RAG, Recursive RAG, Modular RAG, text chunking techniques, etc.) – Implementing next-generation knowledge retrieval architectures.
Additional Responsibilities
* Ensure data quality & completeness across large-scale structured and unstructured datasets.
* Automate data validation, cleansing, and transformation for LLM & AI pipelines.
* Develop scalable vector databases & search solutions to improve retrieval efficiency.
* Work with graph databases for knowledge representation and structured reasoning.
* Optimize RAG-based AI pipelines for fast, contextually relevant information retrieval.
* Collaborate with engineers & AI researchers to refine data access and orchestration.
* Manage version control & automation using Git and CI/CD best practices.
* Stay updated on the latest AI trends, LLM advancements, and data infrastructure.
Requirements
Required Skills & Experience
* Programming: Python, SQL, R.
* Agentic Prompt Engineering: Langflow, AgentOps, LlamaIndex, AutoGPT frameworks.
* Process Orchestration & Distributed Computing: Airflow, Kestra, AWS Batch, Databricks.
* Data Lake Access & Processing: Presto, AWS Athena for large-scale data retrieval.
* Vector & Semantic Search: pgvector, ChromaDB, MeiliSearch.
* Advanced RAG Techniques: Graph RAG, Recursive RAG, Modular RAG, text chunking.
* Big Data & Storage: Spark, Hadoop for processing large datasets.
* Cloud Platforms: AWS, GCP, Azure for AI-powered workloads.
* Data Quality & Governance: Ensuring high data integrity & compliance.
* Version Control & Collaboration: Git, CI/CD pipelines for AI workflow management.
* Ethics & AI Governance: Understanding of data privacy and ethical AI applications.
Nice to Have (Bonus Skills)
* Graph Databases (Neo4j, ArangoDB) for AI knowledge graphs.
* MLOps & LLMOps (Docker, Kubernetes) for scalable AI deployments.
* Enterprise AI Data Pipelines – Experience building production-ready AI retrieval systems.
* LLM Fine-tuning & Multimodal AI (if needed in the future).
Benefits
* Be at the forefront of AI innovation – Work with LLMs, agentic AI, vector search, and advanced RAG in a high-impact environment.
* Shape the future of AI-driven industry intelligence – Develop solutions that redefine market research, procurement, and supply chain automation.
* Global impact – Our products power businesses across industries, enabling smarter decision-making and efficiency at scale.
* Tech-first culture – Work with the latest in LLM orchestration, distributed computing, and AI-powered retrieval, collaborating with top engineers and data scientists.
* Career growth and learning – Join a team that values continuous learning, innovation, and professional development.
* Competitive compensation and flexibility – Benefit from a strong salary, remote work options, and an agile, fast-moving team environment.
We Look Forward to Your Application
If you are passionate about AI-driven data quality, retrieval, and process automation, we would love to hear from you. Join us in shaping the future of industrial intelligence and apply today!