TL;DR for operators
A paper on a “real-time stock analyst” sounds, at first blush, like another attempt to place a crystal ball inside a chatbot and call it alpha. Fortunately, this one is more useful than that. Taniv Ashraf’s paper, A Serverless Architecture for Real-Time Stock Analysis using Large Language Models, is best read as a build-and-debug case study, not as evidence that Gemini can reliably predict stock prices.[1]
The operational lesson is simple: a single developer can now assemble a daily financial-intelligence pipeline with almost no traditional backend. The system fetches end-of-day stock data and news, asks Gemini for structured qualitative analysis, writes the result into a JSON file, and serves it through a static frontend. GitHub Actions does the scheduling and execution. GitHub itself stores the latest artifact. The frontend simply reads the updated JSON. No server fleet. No database. No heroic Kubernetes mural painted across the office wall.
The business value is not “AI trader in a box”. It is cheaper instrumentation. Teams can use the same pattern for lightweight dashboards, monitored watchlists, internal research briefs, customer-facing market summaries, or automated operational reports. The architecture is attractive because the outputs are explicit files, the workflow is reproducible, and the cost base is small.
The boundary is equally important. The paper does not present robust backtesting, portfolio-level evaluation, risk-adjusted return analysis, or proof of profitable directional accuracy. It also relies on qualitative LLM output, limited data inputs, and a prototype deployment path. Treat it as a useful architecture pattern and a debugging diary. Treat it as an investment product only after much more validation, governance, and legal work. Markets are rude like that.
The real story starts where the system breaks
Most AI finance demos prefer to start with a gleaming interface. A ticker goes in, a prediction comes out, and somewhere in the background the word “sentiment” does unsupervised yoga. Ashraf’s paper is more honest. Its centre of gravity is the sequence of things that failed before the system became stable.
That matters because modern AI products often fail at the seams, not in the slideware. The model may answer. The API may return. The frontend may render. Then the pipeline collapses because a pandas object cannot be serialised into JSON, a workflow token cannot push to its own repository, or a cloud platform develops the software equivalent of a haunted corridor.
The paper’s value as a case study comes from these seams. The final system is not technically exotic, but its development path exposes the kind of mundane friction that determines whether a prototype becomes a working tool or a beautiful screenshot abandoned after the second demo.
The architecture is deliberately small:
| Layer | Tool or component | Operational role |
|---|---|---|
| Data ingestion | yfinance and NewsAPI | Fetch daily market data and recent headlines |
| Backend logic | Python script | Orchestrate fetching, prompting, parsing, and artifact generation |
| LLM analysis | Gemini Pro via API | Produce structured qualitative stock analysis |
| Automation | GitHub Actions | Run the pipeline on a daily schedule |
| State and artifact | predictions.json in the repository | Store the latest machine-readable output |
| Frontend | Static HTML, CSS, and JavaScript | Display the latest analysis by fetching the JSON file |
The obvious summary is “serverless stock dashboard”. The more useful summary is “an event-driven artifact pipeline for recurring financial analysis”. That distinction is not semantic decoration. It changes what the system is good for.
A dashboard is an interface. An artifact pipeline is an operating pattern. It fetches inputs, generates a structured output, preserves that output in a known location, and lets other systems consume it. For business teams, this is the more transferable idea.
The pipeline is small because the artifact does the work
The paper’s system operates on a daily cycle. A GitHub Actions cron job triggers a Python script. The script fetches market and news data, reads the previous predictions.json file, performs an accuracy check against earlier predictions, prompts Gemini, parses the response, combines it with current price data and metrics, then overwrites predictions.json. A static site reads that file and displays the updated output.
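The daily cycle can be sketched as one function. This is a minimal illustration under stated assumptions, not the paper’s code: `run_daily_cycle`, the injected `fetch_market`, `fetch_news`, and `ask_model` callables, the prompt wording, and the artifact fields are all hypothetical, and the demo uses offline stubs instead of yfinance, NewsAPI, or Gemini.

```python
import json
from pathlib import Path
from typing import Callable


def run_daily_cycle(
    artifact_path: Path,
    fetch_market: Callable[[str], dict],
    fetch_news: Callable[[str], list],
    ask_model: Callable[[str], str],
    ticker: str,
) -> dict:
    """One pass of a hypothetical fetch -> check -> prompt -> parse -> overwrite loop."""
    # 1. Fetch today's inputs (stand-ins for yfinance and NewsAPI calls).
    market = fetch_market(ticker)
    headlines = fetch_news(ticker)

    # 2. Read the previous artifact, if any, so yesterday's call can be scored.
    previous = None
    if artifact_path.exists():
        previous = json.loads(artifact_path.read_text(encoding="utf-8"))

    # 3. Naive accuracy check: did the price move the way the last outlook implied?
    previous_call_correct = None
    if previous is not None:
        moved_up = market["close"] > previous["current_price"]
        previous_call_correct = (previous["analysis"]["outlook"] == "bullish") == moved_up

    # 4. Prompt the model for structured output and parse the reply as JSON.
    prompt = (
        f"Analyse {ticker}. Latest close: {market['close']}. "
        f"Headlines: {'; '.join(headlines) or 'none'}. "
        'Reply only with JSON: {"outlook": "bullish" or "bearish", "summary": "..."}'
    )
    analysis = json.loads(ask_model(prompt))

    # 5. Merge with price data and overwrite the single artifact the frontend reads.
    artifact = {
        "ticker": ticker,
        "current_price": round(float(market["close"]), 2),
        "previous_call_correct": previous_call_correct,
        "analysis": analysis,
    }
    artifact_path.write_text(json.dumps(artifact, indent=2), encoding="utf-8")
    return artifact


if __name__ == "__main__":
    import tempfile

    # Offline stubs standing in for yfinance, NewsAPI, and Gemini.
    with tempfile.TemporaryDirectory() as folder:
        path = Path(folder) / "predictions.json"
        run_daily_cycle(path, lambda t: {"close": 100.0}, lambda t: ["Earnings beat"],
                        lambda p: '{"outlook": "bullish", "summary": "Strong quarter."}', "ACME")
        print(run_daily_cycle(path, lambda t: {"close": 101.5}, lambda t: [],
                              lambda p: '{"outlook": "bearish", "summary": "Momentum fading."}', "ACME"))
```

Injecting the fetchers and the model call keeps the loop testable without network access or API keys; a real version would also need output validation and error handling.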
This is not a full financial platform. It is closer to a scheduled research clerk with a filing cabinet.
That is precisely why the design is interesting. Many early AI products are overbuilt because teams assume intelligence requires infrastructure mass: persistent backend services, databases, queues, orchestration layers, dashboards, authentication systems, and monitoring stacks. Sometimes it does. Often it does not, at least not at prototype stage.
Ashraf’s design uses the repository as both codebase and lightweight state store. The generated JSON file becomes the handoff point between analysis and presentation. This gives the system three practical advantages.
First, the output is auditable. A JSON artifact can be inspected, versioned, diffed, and rolled back. For financial analysis, that matters. If an AI system recommended caution yesterday and enthusiasm today, operators need to know whether the input changed, the prompt changed, the model changed, or the system simply got theatrical.
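Because the artifact is plain JSON, the question of what changed between two runs can be answered mechanically. The sketch below is illustrative only: `changed_paths` is a hypothetical helper, and the `model`, `prompt_version`, and `inputs` fields are assumptions, since the paper’s artifact does not necessarily record provenance this way.

```python
def changed_paths(old, new, prefix: str = "") -> list:
    """Return the dotted paths whose values differ between two JSON-like values."""
    if isinstance(old, dict) and isinstance(new, dict):
        paths = []
        for key in sorted(set(old) | set(new)):
            path = f"{prefix}.{key}" if prefix else key
            if key not in old or key not in new:
                paths.append(path)  # field added or removed
            else:
                paths.extend(changed_paths(old[key], new[key], path))
        return paths
    # Leaves (and lists) are compared as whole values.
    return [] if old == new else [prefix or "<root>"]


if __name__ == "__main__":
    # Hypothetical provenance fields stored alongside the analysis itself.
    yesterday = {"model": "model-a", "prompt_version": "v3",
                 "inputs": {"close": 100.0}, "analysis": {"outlook": "bearish"}}
    today = {"model": "model-a", "prompt_version": "v4",
             "inputs": {"close": 100.0}, "analysis": {"outlook": "bullish"}}
    print(changed_paths(yesterday, today))  # prints ['analysis.outlook', 'prompt_version']
```

Here the outlook flipped while the inputs and model stayed fixed but the prompt version changed, which points at the prompt rather than the market.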
Second, the frontend remains simple. Static hosting reduces the operational surface area. There is no application server to maintain, patch, scale, or accidentally misconfigure into a public incident.
Third, the architecture is cheap enough to be disposable. That sounds unromantic, but disposable prototypes are powerful. A team can test whether a workflow produces useful internal decisions before committing to production infrastructure. The graveyard of enterprise AI is crowded with systems that scaled before they learned what job they were actually doing.
Gemini is being used as a qualitative analyst, not a price oracle
The most tempting misunderstanding is also the most dangerous one: that this paper validates LLM-based stock prediction. It does not.
The paper positions Gemini as a qualitative analysis engine. It receives market data and news context, then returns a structured JSON object. That is materially different from claiming that the model can generate tradeable forecasts with proven accuracy.
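In practice, “returns a structured JSON object” means the pipeline should refuse anything that is not one before it reaches the artifact. A minimal validation sketch, assuming a hypothetical schema with `outlook`, `confidence`, and `summary` fields (not the paper’s documented schema) and assuming the model occasionally wraps its reply in a Markdown code fence:

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker that some models wrap around JSON
ALLOWED_OUTLOOKS = {"bullish", "bearish", "neutral"}  # hypothetical label set


def parse_analysis(raw: str) -> dict:
    """Parse a model reply into a validated dict, raising ValueError on anything malformed."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag) and the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError

    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    outlook = data.get("outlook")
    if not isinstance(outlook, str) or outlook not in ALLOWED_OUTLOOKS:
        raise ValueError(f"unexpected outlook: {outlook!r}")
    confidence = data.get("confidence")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError(f"confidence must be a number, got {confidence!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    summary = data.get("summary")
    if not isinstance(summary, str) or not summary.strip():
        raise ValueError("summary must be a non-empty string")

    # Return only the expected fields, so stray keys never reach the artifact.
    return {"outlook": outlook, "confidence": float(confidence), "summary": summary.strip()}
```

Failing loudly here turns a malformed reply into a failed run rather than a quietly broken dashboard card.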
This distinction matters because financial prediction has a brutal evaluation problem. A system can sound insightful while being useless. It can be directionally right for the wrong reason. It can be lucky for a week, overfit to recent volatility, or produce language that feels calibrated but has no measurable edge. Markets have a special talent for punishing adjectives.
The paper does include a mechanism for checking previous predictions against subsequent price movement, but it does not provide the kind of long-horizon, statistically meaningful backtest that would support claims about profitability or robust directional performance. Future work explicitly includes longer backtesting and quantitative assessment of accuracy and potential profitability.
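To see why a short run of checks cannot separate skill from luck, here is an exact two-sided binomial test of a directional hit rate against a coin-flip null. It is a standard textbook test chosen for illustration, not a method reported in the paper, and the function name is hypothetical.

```python
from math import comb


def two_sided_binomial_p(hits: int, calls: int) -> float:
    """Exact two-sided p-value for `hits` correct directional calls out of `calls`
    under a coin-flip null (each call right with probability 0.5)."""
    if calls < 1 or not 0 <= hits <= calls:
        raise ValueError("need calls >= 1 and 0 <= hits <= calls")
    distance = abs(hits - calls / 2)
    # Under p = 0.5 the distribution is symmetric, so sum every outcome at least as far from the mean.
    extreme = sum(comb(calls, k) for k in range(calls + 1) if abs(k - calls / 2) >= distance)
    return extreme / 2 ** calls


if __name__ == "__main__":
    print(two_sided_binomial_p(6, 10))    # two good-looking weeks; prints 0.75390625
    print(two_sided_binomial_p(60, 100))  # roughly five good-looking months
```

Six right out of ten gives p ≈ 0.75, entirely consistent with guessing; even 60 out of 100 only reaches p ≈ 0.057. A real evaluation would also have to handle base rates, since markets do not rise on exactly half of all days.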
So the right interpretation is narrower and more useful:
| What the paper directly shows | What Cognaptus infers for operators | What remains uncertain |
|---|---|---|
| A daily automated LLM analysis pipeline can be built with public, low-cost tools | Similar pipelines can support lightweight internal intelligence workflows | Whether the analysis improves investment decisions |
| Gemini can be prompted to return structured JSON for a stock-analysis dashboard | LLMs can act as qualitative synthesis components inside deterministic workflows | Whether the model’s qualitative judgement is stable, calibrated, or financially useful |
| GitHub Actions can schedule and execute the workflow | CI/CD platforms can double as low-cost automation layers for small analytical products | Whether this pattern is robust enough for regulated, customer-facing finance |
| Debugging required fixes across data types, permissions, and platform behaviour | The main implementation risk sits at integration boundaries | How the system behaves under scale, API outages, model changes, and compliance requirements |
The replacement belief should be: “LLMs can generate structured qualitative analysis inside low-cost automation loops.” That is less exciting than “AI beats Wall Street”, but it has the advantage of being plausible.
The pandas bug is a small error with a large lesson
The first major failure was a data serialization error: TypeError: Object of type Series is not JSON serializable.
This is the kind of bug that looks trivial after it is fixed and deeply irritating before it is understood. The system tried to place a value from yfinance into a dictionary destined for JSON output. The value looked like a number, behaved near enough like a number for analytical purposes, but remained a pandas Series. Python’s standard JSON library did not know how to convert it.
The fix was explicit casting:
'current_price': round(float(stock_data['Close'].iloc[-1]), 2)
Technically, this is a one-line change. Operationally, it is a reminder that AI workflows are still software workflows. Every boundary between libraries, APIs, file formats, and interfaces can introduce type assumptions. LLMs do not repeal serialization. They simply make it easier to build systems that later discover serialization still exists.
For business teams, this has a practical implication. When designing AI-enabled reporting pipelines, pay attention to artifact boundaries. Where does a dataframe become JSON? Where does a model response become a typed object? Where does a generated recommendation become a displayed card? Where does an internal note become customer-visible content?
These boundaries need explicit handling. Otherwise, the system’s weakest point may not be reasoning quality. It may be the quiet mismatch between “number-like” and “number”.
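One way to make those boundaries explicit is to cast and check values at the point where they enter the artifact. A minimal standard-library sketch; `to_price` and `dump_artifact` are hypothetical helpers, not the paper’s code. `json.dumps` already raises `TypeError` for objects it cannot encode (the paper’s error), but it writes `NaN` by default, which is not valid JSON and will be rejected by a browser’s `JSON.parse`, so the sketch also turns that silent case into a loud one.

```python
import json
import math


def to_price(value) -> float:
    """Cast a number-like value to a plain, finite Python float rounded to two decimals."""
    price = float(value)  # accepts ints, floats, numeric strings, and NumPy scalars
    if not math.isfinite(price):
        raise ValueError(f"non-finite price: {value!r}")
    return round(price, 2)


def dump_artifact(record: dict) -> str:
    """Serialise strictly: allow_nan=False raises instead of writing NaN or Infinity."""
    return json.dumps(record, allow_nan=False, indent=2)


if __name__ == "__main__":
    print(dump_artifact({"current_price": to_price("101.4567")}))
    try:
        dump_artifact({"current_price": float("nan")})
    except ValueError as err:
        print("rejected:", err)
```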
The GitHub Actions permission bug is governance in miniature
The second failure occurred after the Python script ran successfully. The workflow attempted to push the updated predictions.json file back into the repository and received a 403 Forbidden error. The GitHub Actions bot did not have permission to write.
The fix was to add a permissions block:
permissions:
  contents: write
This is more than a configuration footnote. It is a small example of a larger governance pattern. Automated systems need permissions, but not too many. The secure default is often read-only. The operational need is sometimes write access. The engineering task is to grant the minimum required capability deliberately.
In a financial-analysis context, this matters. A scheduled AI system that can generate and publish outputs is not just “running code”. It is changing the artifacts that users consume. In the paper’s prototype, that artifact is a JSON file. In a more mature business environment, it might be a customer briefing, a dashboard signal, a compliance-sensitive recommendation, or an internal risk flag.
The permission question then becomes: who or what is allowed to publish analysis, overwrite prior output, approve changes, and trigger downstream decisions?
The paper does not build a full governance layer, and it does not need to. But the permission bug usefully surfaces the issue early. A working automation pipeline is already a delegated actor. Giving it write access should be an architectural decision, not an accident discovered when the bot is politely denied entry.
The “ghost action” is the paper’s best debugging lesson
The most interesting failure was not caused by the author’s code. The GitHub Actions workflow began failing at the setup stage with an “action not found” error when trying to resolve a standard public action. The debugging path was systematic: verify workflow syntax, test different action versions, change the runner from ubuntu-latest to ubuntu-22.04, check account settings, and create a minimal workflow that did nothing except reproduce the failure.
That minimal test was important. It separated code failure from environment failure. When the stripped-down case still failed, the problem was no longer plausibly inside the stock-analysis script. The repository environment itself became suspect.
The eventual resolution was blunt: create a new blank repository, migrate the same code and secrets, and run the workflow there. It succeeded on the first attempt.
This is the sort of detail that rarely survives polished AI case studies, which is exactly why it is useful. Real systems sometimes fail because the platform state is strange. Not theoretically impossible. Not elegantly diagnosed. Just strange.
For operators, the lesson is not “always recreate the repository”. That would be the wrong moral, and a fine way to turn debugging into ritual sacrifice. The lesson is to escalate the hypothesis only after isolating the failure:
- Check the application code.
- Check the workflow configuration.
- Check dependency versions.
- Check permissions and account policy.
- Reduce to a minimal reproducible test.
- If the minimal case fails, consider the environment itself.
The paper’s debugging sequence is valuable because it treats platform failure as a valid final hypothesis, not as a first excuse. That distinction separates engineering judgement from superstition.
The human-AI workflow is architecture plus execution, not magic co-authorship
Ashraf includes an author’s note describing the project as a human-guided AI execution workflow. The human author acted as architect, strategist, and project manager: defining goals, sequencing tasks, diagnosing logical errors, and specifying desired fixes. The AI generated code, drafted text, and implemented solutions under direction.
This is easy to either overstate or dismiss. The boring interpretation is “the author used AI assistance”. The overheated interpretation is “AI built the system”. Neither is quite right.
The workflow described in the paper reflects a shift in where human expertise is applied. The human role moves upward: task decomposition, architectural judgement, debugging direction, and acceptance criteria. The AI role becomes implementation acceleration: code generation, drafting, iteration, and local fixes.
That division works only if the human can recognise when the output is wrong. In the debugging case, the important moves were not just generating code. They were identifying the category of failure, narrowing the search space, and deciding when the environment had become the likely culprit.
This is an uncomfortable point for organisations hoping to replace expertise with prompting. The paper suggests something more specific: AI can compress implementation time when paired with human architectural control. It does not show that architectural control can be skipped. Sorry, procurement departments.
Where the system is actually useful in business
The immediate financial domain makes the system feel like a stock tool. But the deeper pattern is broader: scheduled data ingestion, model-assisted interpretation, structured artifact generation, and static publication.
That pattern can support several business workflows.
| Use case | Why this architecture fits | What would need hardening |
|---|---|---|
| Internal market watchlists | Daily summaries can be generated cheaply and reviewed by analysts | Source quality, model stability, analyst approval flow |
| Executive briefing dashboards | Static delivery is simple and accessible | Access control, version history, explanation standards |
| Customer-facing research snippets | JSON artifacts can feed lightweight content modules | Compliance review, disclaimers, suitability controls |
| Competitor or sector monitoring | News and public data can be summarised on a schedule | Entity resolution, deduplication, source weighting |
| Operational anomaly reports | The same pattern can monitor non-financial metrics | Alert thresholds, escalation logic, false-positive handling |
The common thread is not stock prediction. It is recurring analysis with low infrastructure burden.
For a small team, this can be enough. A founder, analyst, or internal innovation group can test whether an AI-generated daily artifact is useful before investing in a full platform. If users ignore it, the cost of learning is low. If users rely on it, the team has evidence for where to harden the system.
That is a healthier adoption path than buying a large platform first and discovering later that the workflow was mostly theatre with a login screen.
What must change before this becomes a serious financial product
The paper’s boundaries are not incidental. They define the distance between prototype and product.
First, the system needs real evaluation. A daily accuracy check is a start, but investment relevance requires longer backtesting, benchmark comparisons, and clarity about the prediction target. Is the model forecasting next-day direction, medium-term sentiment, volatility, drawdown risk, or merely summarising context? Each target requires different evaluation.
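Benchmark comparison can start very simply. The sketch below scores daily “up”/“down” calls against two naive baselines: always predicting up, and repeating yesterday’s move. The function names and labels are hypothetical, and a serious evaluation would add longer windows, transaction costs, and risk-adjusted measures rather than relying on hit rates alone.

```python
def hit_rate(predicted: list[str], realised: list[str]) -> float:
    """Fraction of days on which the predicted direction matched the realised one."""
    if not predicted or len(predicted) != len(realised):
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == r for p, r in zip(predicted, realised)) / len(predicted)


def compare_to_baselines(model: list[str], realised: list[str]) -> dict[str, float]:
    """Score daily 'up'/'down' calls against two naive baselines on the same days."""
    if len(model) != len(realised) or len(realised) < 2:
        raise ValueError("need equal-length histories of at least two days")
    # Start at day 1 so the persistence baseline has a previous day to copy.
    days = range(1, len(realised))
    actual = [realised[i] for i in days]
    return {
        "model": hit_rate([model[i] for i in days], actual),
        "always_up": hit_rate(["up"] * len(actual), actual),
        "persistence": hit_rate([realised[i - 1] for i in days], actual),  # repeat yesterday's move
    }


if __name__ == "__main__":
    realised = ["up", "up", "down", "up", "up", "down"]
    model = ["down", "up", "down", "down", "up", "up"]
    print(compare_to_baselines(model, realised))
    # prints {'model': 0.6, 'always_up': 0.6, 'persistence': 0.4}
```

In this toy history the model merely ties the always-up baseline, a result that a standalone 60% accuracy figure would hide.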
Second, the data layer is thin. End-of-day prices and headlines can support a basic qualitative summary, but serious financial analysis usually needs more: fundamentals, earnings data, macro indicators, sector context, analyst revisions, liquidity, corporate actions, and cleaner news ingestion. More data does not automatically mean better judgement, but too little data guarantees shallow judgement.
Third, LLM output needs stability controls. If the same inputs produce materially different recommendations across runs, the system becomes difficult to trust. Structured JSON helps with parsing, but not necessarily with calibration. Operators would need prompt versioning, model-version tracking, deterministic settings where possible, output validation, and human review for high-stakes uses.
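Two of these controls fit in a few lines: fingerprinting the prompt so every artifact can record which prompt version produced it, and running identical inputs several times to measure agreement before publishing. This sketch assumes repeated model calls are affordable and that the output carries a categorical label; the helper names and the 0.8 threshold are hypothetical.

```python
import hashlib
from collections import Counter


def prompt_fingerprint(prompt_template: str) -> str:
    """Short, stable ID for a prompt version: the first 12 hex digits of its SHA-256."""
    return hashlib.sha256(prompt_template.encode("utf-8")).hexdigest()[:12]


def stability_report(labels: list[str], min_agreement: float = 0.8) -> dict:
    """Summarise repeated runs on identical inputs and flag low agreement for human review."""
    if not labels:
        raise ValueError("need at least one run")
    # most_common breaks ties by first occurrence, so ties resolve to the earliest run's label.
    majority, count = Counter(labels).most_common(1)[0]
    agreement = count / len(labels)
    return {"majority": majority, "agreement": agreement, "needs_review": agreement < min_agreement}


if __name__ == "__main__":
    template = "Analyse {ticker} given {headlines}. Reply with JSON."
    runs = ["bullish", "bullish", "bearish", "bullish", "bullish"]  # five calls, same inputs
    print(prompt_fingerprint(template), stability_report(runs))
```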
Fourth, compliance cannot be bolted on at the end. A tool that displays analysis about securities may drift into regulated advice depending on jurisdiction, audience, personalisation, and call-to-action. The paper’s prototype is a research system. A commercial version would need disclaimers, suitability boundaries, audit logs, approval workflows, and probably legal review from someone whose job is to ruin everyone’s fun for good reasons.
Finally, platform resilience matters. GitHub Actions and static hosting are excellent for small prototypes. They are not automatically sufficient for uptime-sensitive, regulated, customer-facing products. API failures, rate limits, secret rotation, dependency changes, and model availability all need operational planning.
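For API failures and rate limits, a common first defence is retrying transient errors with exponential backoff. A minimal sketch: `call_with_retries` is a hypothetical helper rather than part of any SDK, and which exceptions count as transient depends on the client library actually used, so the `ConnectionError`/`TimeoutError` defaults are placeholders.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: tuple = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying listed transient errors with exponential backoff plus random jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: fail the scheduled run visibly
            delay = base_delay * 2 ** attempt  # 1s, 2s, 4s, ... with the defaults
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
    raise AssertionError("unreachable")


if __name__ == "__main__":
    outcomes = iter([ConnectionError("reset"), TimeoutError("slow"), "quote data"])

    def flaky_fetch():
        result = next(outcomes)
        if isinstance(result, Exception):
            raise result
        return result

    print(call_with_retries(flaky_fetch, base_delay=0.01))
```

Injecting `sleep` keeps the helper testable, and catching only the listed exception types means non-transient errors, such as a rejected API key, fail immediately instead of being retried.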
The paper’s contribution is practical, not predictive
The best way to read this paper is as a disciplined prototype diary. It shows that the components for lightweight AI financial analysis are now accessible to individual builders: public market-data libraries, news APIs, LLM APIs, scheduled CI workflows, static frontends, and repository-based artifacts.
Its strongest evidence is implementation evidence. The system was built. It ran. It broke. The paper documents how it broke and how the author fixed it. That is not the same as scientific validation of market performance, but it is still useful knowledge.
For Cognaptus readers, the practical takeaway is a design principle:
Build the smallest automated intelligence loop that produces a reviewable artifact.
That artifact might be a JSON file, a Markdown brief, a dashboard card, a CSV, or a PDF report. The key is that it should be structured, inspectable, versioned, and easy to consume. Once the artifact proves useful, the surrounding system can mature: better data, better validation, stronger governance, richer interfaces, and more careful deployment.
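One concrete way to keep a JSON artifact inspectable and safe to consume is to write it deterministically and atomically, so readers never see a half-written file and version-control diffs stay small. A minimal sketch, assuming a local file; `write_artifact_atomically` is a hypothetical helper, and sorting keys is a choice made for cleaner diffs, not something the paper prescribes.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path


def write_artifact_atomically(path: Path, record: dict) -> None:
    """Write a JSON artifact via a temporary file so readers never see a half-written file."""
    path = Path(path)
    # Sorted keys and a trailing newline keep version-control diffs small and stable.
    payload = json.dumps(record, indent=2, sort_keys=True, allow_nan=False) + "\n"
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace is atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```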
This reverses the usual AI product mistake. Instead of starting with a grand platform and searching for a decision, start with a recurring decision and build the lightest loop that improves it.
Conclusion: zero infrastructure is not zero engineering
Ashraf’s serverless stock analyst is not a market oracle. It is not a validated trading strategy. It is not a substitute for financial research. Good. Those claims would be much less interesting and much more suspicious.
Its real contribution is showing how little infrastructure is now required to build an automated, LLM-assisted analysis loop — and how much engineering judgement is still required to make that loop work. The pandas serialization bug, the GitHub permissions failure, and the ghost action are not side quests. They are the substance of the case.
For operators, the lesson is clear. Serverless AI systems can reduce deployment cost, speed up experimentation, and make lightweight intelligence workflows practical for small teams. But the hard parts do not disappear. They move into data boundaries, permission models, platform behaviour, evaluation design, and governance.
The bulls and bears may get the headline. The JSON file does the work.
Cognaptus: Automate the Present, Incubate the Future.
[1] Taniv Ashraf, “A Serverless Architecture for Real-Time Stock Analysis using Large Language Models: An Iterative Development and Debugging Case Study,” arXiv:2507.09583, 2025. https://arxiv.org/abs/2507.09583