diff --git a/.ruff.toml b/.ruff.toml
index d823f4d..0d0be53 100644
--- a/.ruff.toml
+++ b/.ruff.toml
@@ -1,4 +1,4 @@
-line-length = 100
+line-length = 110

[format]
quote-style = "single"
diff --git a/Default.sublime-commands b/Default.sublime-commands
index d2166a5..05b1af4 100644
--- a/Default.sublime-commands
+++ b/Default.sublime-commands
@@ -23,15 +23,11 @@
    {
        "caption": "OpenAI: New Message",
        "command": "openai",
-        "args": {
-            "mode": "chat_completion"
-        }
    },
    {
        "caption": "OpenAI: New Message With Sheets",
        "command": "openai",
        "args": {
-            "mode": "chat_completion",
            "files_included": true
        }
    },
diff --git a/README.md b/README.md
index 120db88..4638d00 100644
--- a/README.md
+++ b/README.md
@@ -1,24 +1,20 @@
-[![Star on GitHub][img-stars]][stars]
+[![Star on GitHub][img-stars]][stars] ![Package Control][img-downloads]

# OpenAI Sublime Text Plugin
## tldr;

-OpenAI Completion is a Sublime Text plugin that uses LLM models to provide first class code assistant support within the editor.
+Cursor-level AI assistance for Sublime Text. I mean it.

-It's not locked with just OpenAI anymore. [llama.cpp](https://github.com/ggerganov/llama.cpp) server and [ollama](https://ollama.com) supported as well.
+Works with any OpenAI'ish API: [llama.cpp](https://github.com/ggerganov/llama.cpp) server, [ollama](https://ollama.com), or whatever third-party LLM hosting.

-![](static/media/ai_chat_left.png)
-
-> [!NOTE]
-> I think this plugin is in its finite state. Meaning there's no further development of it I have in plans. I still have plans to fix bugs and review PR if any, but those tons of little enhancement that could be applied here to fix minor issues and roughness and there likely never would.
-
-> What I do have in plans is to implement ST front end for [plandex](https://github.com/plandex-ai/plandex) tool based on some parts of this plugin codebase, to get (and to bring) a fancy and powerful agentish capabilities to ST ecosystem. So stay tuned.
+![](static/media/ai_chat_right_phantom.png)

## Features

- Code manipulation (append, insert and edit) selected code with OpenAI models.
+- **Phantoms**: get non-disruptive inline answers from the model right in the view.
- **Chat mode** powered by whatever model you'd like.
-- **GPT-4 support**.
+- **o1 support**.
- **[llama.cpp](https://github.com/ggerganov/llama.cpp)**'s server, **[Ollama](https://ollama.com)** and all the rest OpenAI'ish API compatible.
- **Dedicated chats histories** and assistant settings for a projects.
- **Ability to send whole files** or their parts as a context expanding.
@@ -75,7 +71,7 @@ You can separate a chat history and assistant settings for a given project by ap
    {
        "settings": {
            "ai_assistant": {
-                "cache_prefix": "your_name_project"
+                "cache_prefix": "your_project_name"
            }
        }
    }
@@ -100,16 +96,31 @@ To send the whole file(s) in advance to request you should `super+button1` on th

Image handle can be called by `OpenAI: Handle Image` command.

-It expects an absolute path of image to be selected in a buffer on the command call (smth like `/Users/username/Documents/Project/image.png`). In addition command can be passed by input panel to proceed the image with special treatment. `png` and `jpg` images are only supported.
+It expects an absolute path to an image to be selected in a buffer or stored in the clipboard on the command call (smth like `/Users/username/Documents/Project/image.png`). In addition, an instruction can be passed via the input panel to process the image with special treatment. Only `png` and `jpg` images are supported.
-> [!WARNING]
-> Userflow don't expects that image url would be passed by that input panel input, it has to be selected in buffer. I'm aware about the UX quality of this design decision, but yet I'm too lazy to develop it further to some better state.
+> [!NOTE]
+> Currently the plugin expects a link or a list of links separated by newlines to be selected in the buffer or stored in the clipboard **only**.

### In buffer llm use case

-1. You can pick one of the following modes: `append`, `replace`, `insert`. They're quite self-descriptive. They should be set up in assistant settings to take effect.
-2. Select some text (they're useless otherwise) to manipulate with and hit `OpenAI: New Message`.
-4. The plugin will response accordingly with **appending**, **replacing** or **inserting** some text.
+#### Phantom use case
+
+Phantom is the overlay UI placed inline in the editor view (see the picture below). It doesn't affect the content of the view.
+
+1. You can set `"prompt_mode": "phantom"` for an AI assistant in its settings.
+2. [optional] Select some text to pass as context to manipulate with.
+3. Hit `OpenAI: New Message` or `OpenAI: Chat Model Select` and ask whatever you'd like in the popup input panel.
+4. A phantom will appear below the cursor position or at the beginning of the selection while the LLM answer streams in.
+5. You can apply actions to the LLM response; they're quite self-descriptive and follow the behavior of the deprecated in-buffer commands.
+6. You can hit `ctrl+c` to stop prompting, same as in `panel` mode.
+
+![](static/media/phantom_example.png)
+
+> [!WARNING]
+> The following in-buffer commands are deprecated and will be removed in the 5.0 release.
+> 1. You can pick one of the following modes: `append`, `replace`, `insert`. They're quite self-descriptive. They should be set up in assistant settings to take effect.
+> 2. Select some text (they're useless otherwise) to manipulate with and hit `OpenAI: New Message`.
+> 3. The plugin will respond accordingly with **appending**, **replacing** or **inserting** some text.

> [!IMPORTANT]
> Yet this is a standalone mode, i.e. an existing chat history won't be sent to a server on a run.
@@ -138,10 +149,6 @@ The OpenAI Completion plugin has a settings file where you can set your OpenAI A
    }
```

-### ollama setup specific
-
-If you're here it meaning that a model that you're using with ollama talking shit. This is because `temperature` property of a model which is 1 somewhat [doubles](https://github.com/ollama/ollama/blob/69be940bf6d2816f61c79facfa336183bc882720/openai/openai.go#L454) on ollama's side, so it becomes 2, which is a little bit too much for a good model's response. So you to make things work you have to set temperature to 1.
-
### Advertisement disabling

To disable advertisement you have to add `"advertisement": false` line into an assistant setting where you wish it to be disabled.
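For reference, the same entry points can also be poked from Sublime's Python console (`View → Show Console`). This is only a sketch based on the entries declared in `Default.sublime-commands` above, not additional plugin API:

```python
import sublime  # already imported in the Sublime Text console

# "OpenAI: New Message" — after this release the palette entry passes no args.
sublime.active_window().active_view().run_command('openai')

# "OpenAI: New Message With Sheets" — include the sheets added via super+button1 as context.
sublime.active_window().active_view().run_command('openai', {'files_included': True})
```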
@@ -188,3 +195,5 @@ You can setup it up by overriding the proxy property in the `OpenAI completion`

[stars]: https://github.com/yaroslavyaroslav/OpenAI-sublime-text/stargazers
[img-stars]: static/media/star-on-github.svg
+[downloads]: https://packagecontrol.io/packages/OpenAI%20completion
+[img-downloads]: https://img.shields.io/packagecontrol/dt/OpenAI%2520completion.svg
\ No newline at end of file
diff --git a/dependencies.json b/dependencies.json
index 7a595c5..8ae47b2 100644
--- a/dependencies.json
+++ b/dependencies.json
@@ -1,7 +1,8 @@
{
    "*": {
        "*": [
-            "requests"
+            "requests",
+            "mdpopups"
        ]
    }
}
\ No newline at end of file
diff --git a/main.py b/main.py
index 3d6658a..cd93b7e 100644
--- a/main.py
+++ b/main.py
@@ -16,6 +16,7 @@ from .plugins.openai import Openai  # noqa: E402, F401
from .plugins.openai_panel import OpenaiPanelCommand  # noqa: E402, F401
from .plugins.output_panel import SharedOutputPanelListener  # noqa: E402, F401
+from .plugins.phantom_streamer import PhantomStreamer  # noqa: E402, F401
from .plugins.settings_reloader import ReloadSettingsListener  # noqa: E402, F401
from .plugins.stop_worker_execution import (  # noqa: E402
    StopOpenaiExecutionCommand,  # noqa: F401
diff --git a/messages.json b/messages.json
index 9433eff..ea03e09 100644
--- a/messages.json
+++ b/messages.json
@@ -23,5 +23,6 @@
    "4.0.0": "messages/4.0.0.md",
    "4.0.1": "messages/4.0.1.md",
    "4.1.0": "messages/4.1.0.md",
-    "4.1.1": "messages/4.1.1.md"
+    "4.1.1": "messages/4.1.1.md",
+    "4.2.0": "messages/4.2.0.md"
}
diff --git a/messages/4.2.0.md b/messages/4.2.0.md
new file mode 100644
index 0000000..19c2af5
--- /dev/null
+++ b/messages/4.2.0.md
@@ -0,0 +1,36 @@
+## Features
+
+- New in-buffer mode `phantom`
+- `stream` toggle for responses brought back
+- Image handling UX improved
+- Advertisement logic improved
+
+## Deprecated
+- The `append`, `replace` and `insert` prompt modes are deprecated and will be removed in the 5.0 release.
+
+## Detailed description
+
+### Phantom mode
+
+Phantom is the overlay UI placed inline in the editor view (see the picture below). It doesn't affect the content of the view.
+
+1. You can set `"prompt_mode": "phantom"` for an AI assistant in its settings.
+2. [optional] Select some text to pass as context to manipulate with.
+3. Hit `OpenAI: New Message` or `OpenAI: Chat Model Select` and ask whatever you'd like in the popup input panel.
+4. A phantom will appear below the cursor position or at the beginning of the selection while the LLM answer streams in.
+5. You can apply actions to the LLM response; they're quite self-descriptive and follow the behavior of the deprecated in-buffer commands.
+6. You can hit `ctrl+c` to stop prompting, same as in `panel` mode.
+
+### Stream toggle
+
+You can toggle the streaming behavior of a model response with the `"stream": false` setting on a per-assistant basis. That's pretty much it; the default value is `true`.
+
+### Image handling UX improved
+
+Image paths can now be fetched from the clipboard in addition to being extracted from the selection in a given view. It can be either a single image path [and nothing more than that] or a list of such paths separated by newlines, e.g. `/Users/username/Documents/Project/image0.png\n/Users/username/Documents/Project/image1.png`.
+
+Please note that the parser that tries to deduce whether the content of your clipboard is a [list of] image[s] is made by AI and quite fragile, so don't expect too much from it.
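The note above doesn't spell out what the clipboard parser accepts, so here is a minimal sketch of the rule it implies: every non-empty line must be an absolute `png`/`jpg` path. The function name and exact checks are assumptions for illustration, not the plugin's actual parser:

```python
def looks_like_image_list(raw: str) -> bool:
    """Hypothetical check: every non-empty line is an absolute png/jpg path."""
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    return bool(lines) and all(
        line.startswith('/') and line.lower().endswith(('.png', '.jpg')) for line in lines
    )


assert looks_like_image_list('/Users/username/Documents/Project/image0.png\n/Users/username/Documents/Project/image1.png')
assert not looks_like_image_list('just some prose copied from a chat')
```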
+
+### Advertisement logic improvement
+
+Advertisements now appear only when users use the plugin heavily, such as by processing too many tokens or sending/receiving an excessive number of messages.
diff --git a/openAI.sublime-settings b/openAI.sublime-settings
index 523f6f8..6d85e36 100644
--- a/openAI.sublime-settings
+++ b/openAI.sublime-settings
@@ -67,13 +67,16 @@
        // Mode of how plugin should prompts its output, available options:
        // - panel: prompt would be output in output panel, selected text wouldn't be affected in any way.
-        // - append: prompt would be added next to the selected text.
-        // - insert: prompt would be inserted instead of a placeholder within a selected text.
-        // - replace: prompt would overwrite selected text.
+        // - append: [DEPRECATED] prompt would be added next to the selected text.
+        // - insert: [DEPRECATED] prompt would be inserted instead of a placeholder within a selected text.
+        // - replace: [DEPRECATED] prompt would overwrite selected text.
+        // - phantom: the LLM output is presented in an inline phantom view, non-disruptive to the buffer content; each such phantom provides a way to copy the whole content to the clipboard, append it, replace the selection, or paste it into a newly created tab with a single click.
        //
        // All cases but `panel` required to some text be selected beforehand.
        // The same in all cases but `panel` user type within input panel will be treated by a model
        // as `system` command, e.g. instruction to action.
+        //
+        // NOTE: Please note that append, insert and replace are deprecated and will be removed in the 5.0 release in favor of the phantom mode.
        "prompt_mode": "panel", // **REQUIRED**

        // The model which will generate the chat completion.
@@ -121,6 +124,11 @@
        // Does not affect editing mode.
        "max_tokens": 2048,
+        // Since o1 (September 2024) OpenAI deprecated the max_tokens key,
+        // so use this field to set the cap instead. The default value set here is the _minimal_
+        // value recommended by OpenAI for this particular model. https://platform.openai.com/docs/guides/reasoning/allocating-space-for-reasoning
+        "max_completion_tokens": 25000,
+
        // An alternative to sampling with temperature, called nucleus sampling,
        // where the model considers the results of the tokens with `top_p` probability mass.
        // So 0.1 means only the tokens comprising the top 10% probability mass are considered.
@@ -134,6 +142,12 @@
        // docs: https://platform.openai.com/docs/api-reference/parameter-details
        "frequency_penalty": 0,
+        // Toggles whether to stream the response from the server or to get it atomically
+        // after the LLM finishes its prompting.
+        //
+        // By default this is true.
+        "stream": true,
+
        // Number between -2.0 and 2.0.
        /// Positive values penalize new tokens based on whether they appear in the text so far,
        // increasing the model's likelihood to talk about new topics.
@@ -142,6 +156,13 @@
    },
    // Instructions
    //
+    {
+        "name": "Phantom instruction example",
+        "prompt_mode": "phantom",
+        "chat_model": "gpt-4o-mini", // works unreliably with gpt-3.5-turbo yet.
+ "assistant_role": "Insert code or whatever user will request with the following command instead of placeholder with respect to senior knowledge of in Python 3.8 and Sublime Text 4 plugin API", + "max_tokens": 4000, + }, { "name": "Insert instruction example", "prompt_mode": "insert", @@ -190,23 +211,9 @@ "temperature": 1, "max_tokens": 2048, }, - { - "name": "UIKit & Combine", - "prompt_mode": "panel", - "chat_model": "gpt-4o-mini", - "assistant_role": "You are senior UIKit and Combine code assistant", - "max_tokens": 4000, - }, - { - "name": "Social Researcher", - "prompt_mode": "panel", - "chat_model": "gpt-4o-mini", - "assistant_role": "You are senior social researcher", - "max_tokens": 4000, - }, { "name": "Corrector", - "prompt_mode": "replace", + "prompt_mode": "phantom", "chat_model": "gpt-4o-mini", "assistant_role": "Fix provided text with the correct and sounds English one, you are strictly forced to skip any changes in such its part that have not rules violation within them, you're strictly forbidden to wrap response into something and to provide any explanation.", "max_tokens": 1000, @@ -218,19 +225,5 @@ "assistant_role": "1. You are to provide clear, concise, and direct responses.\n2. Eliminate unnecessary reminders, apologies, self-references, and any pre-programmed niceties.\n3. Maintain a casual tone in your communication.\n4. Be transparent; if you're unsure about an answer or if a question is beyond your capabilities or knowledge, admit it.\n5. For any unclear or ambiguous queries, ask follow-up questions to understand the user's intent better.\n6. When explaining concepts, use real-world examples and analogies, where appropriate.\n7. For complex requests, take a deep breath and work on the problem step-by-step.\n8. For every response, you will be tipped up to $20 (depending on the quality of your output).\n\nIt is very important that you get this right. 
        "assistant_role": "1. You are to provide clear, concise, and direct responses.\n2. Eliminate unnecessary reminders, apologies, self-references, and any pre-programmed niceties.\n3. Maintain a casual tone in your communication.\n4. Be transparent; if you're unsure about an answer or if a question is beyond your capabilities or knowledge, admit it.\n5. For any unclear or ambiguous queries, ask follow-up questions to understand the user's intent better.\n6. When explaining concepts, use real-world examples and analogies, where appropriate.\n7. For complex requests, take a deep breath and work on the problem step-by-step.\n8. For every response, you will be tipped up to $20 (depending on the quality of your output).\n\nIt is very important that you get this right. Multiple lives are at stake.\n",
        "max_tokens": 4000,
    },
-    {
-        "name": "Bash & Git assistant",
-        "prompt_mode": "panel",
-        "chat_model": "gpt-4o-mini",
-        "assistant_role": "You are bash and git senior assistant",
-        "max_tokens": 4000,
-    },
-    {
-        "name": "Pytorch assistant",
-        "prompt_mode": "panel",
-        "chat_model": "gpt-4o-mini",
-        "assistant_role": "You are senior Pytorch and LLM/SD code assistant",
-        "max_tokens": 4000,
-    },
    ]
}
diff --git a/plugins/ai_chat_event.py b/plugins/ai_chat_event.py
index 347ba42..e692cbb 100644
--- a/plugins/ai_chat_event.py
+++ b/plugins/ai_chat_event.py
@@ -36,9 +36,5 @@ def is_ai_chat_tab_active(self, window: Window) -> bool:
        return active_view.name() == 'AI Chat' if active_view else False

    def get_status_message(self, cacher: Cacher) -> str:
-        tokens = cacher.read_tokens_count()
-        prompt = tokens['prompt_tokens'] if tokens else 0
-        completion = tokens['completion_tokens'] if tokens else 0
-        total = prompt + completion
-
-        return f'[⬆️: {prompt:,} + ⬇️: {completion:,} = {total:,}]'
+        prompt, completion = cacher.read_tokens_count()
+        return f'[⬆️: {prompt:,} + ⬇️: {completion:,} = {prompt + completion:,}]'
diff --git a/plugins/assistant_settings.py b/plugins/assistant_settings.py
index 5eedf5a..50fc360 100644
--- a/plugins/assistant_settings.py
+++ b/plugins/assistant_settings.py
@@ -10,6 +10,7 @@ class PromptMode(Enum):
    append = 'append'
    insert = 'insert'
    replace = 'replace'
+    phantom = 'phantom'


@dataclass
@@ -19,25 +20,30 @@ class AssistantSettings:
    url: str | None
    token: str | None
    chat_model: str
-    assistant_role: str
-    temperature: int
-    max_tokens: int
-    top_p: int
-    frequency_penalty: int
-    presence_penalty: int
+    assistant_role: str | None
+    temperature: int | None
+    max_tokens: int | None
+    max_completion_tokens: int | None
+    top_p: int | None
+    frequency_penalty: int | None
+    presence_penalty: int | None
    placeholder: str | None
+    stream: bool | None
    advertisement: bool


DEFAULT_ASSISTANT_SETTINGS: Dict[str, Any] = {
    'placeholder': None,
+    'assistant_role': None,
    'url': None,
    'token': None,
-    'temperature': 1,
-    'max_tokens': 2048,
-    'top_p': 1,
-    'frequency_penalty': 0,
-    'presence_penalty': 0,
+    'temperature': None,
+    'max_tokens': None,
+    'max_completion_tokens': None,
+    'top_p': None,
+    'frequency_penalty': None,
+    'presence_penalty': None,
+    'stream': True,
    'advertisement': True,
}
@@ -47,4 +53,3 @@ class CommandMode(Enum):
    refresh_output_panel = 'refresh_output_panel'
    create_new_tab = 'create_new_tab'
    reset_chat_history = 'reset_chat_history'
-    chat_completion = 'chat_completion'
diff --git a/plugins/buffer.py b/plugins/buffer.py
index d4a0109..e0448b5 100644
--- a/plugins/buffer.py
+++ b/plugins/buffer.py
@@ -1,4 +1,9 @@
+from __future__ import annotations
+
+from typing import Dict
+
from sublime import Edit, Region, View
+from sublime_types import Point
from sublime_plugin import TextCommand


@@ -25,10 +30,10 @@ def run(self, edit: Edit, position: int, text: str):  # type: ignore


class ReplaceRegionCommand(TextCommand):
-    def run(self, edit: Edit, region, text: str):  # type: ignore
+    def run(self, edit: Edit, region: Dict[str, Point], text: str):  # type: ignore
        self.view.replace(edit=edit, region=Region(region['a'], region['b']), text=text)


class EraseRegionCommand(TextCommand):
-    def run(self, edit: Edit, region):  # type: ignore
+    def run(self, edit: Edit, region: Dict[str, Point]):  # type: ignore
        self.view.erase(edit=edit, region=Region(region['a'], region['b']))
diff --git a/plugins/cacher.py b/plugins/cacher.py
index aca7228..ae993b6
--- a/plugins/cacher.py
+++ b/plugins/cacher.py
@@ -1,14 +1,17 @@
from __future__ import annotations

import json
+import logging
import os
from json.decoder import JSONDecodeError
-from typing import Any, Dict, Iterator, List
+from typing import Any, Dict, Iterator, List, Tuple

import sublime

from . import jl_utility as jl

+logger = logging.getLogger(__name__)
+

class Cacher:
    def __init__(self, name: str | None = None) -> None:
@@ -35,6 +38,11 @@ def check_and_create(self, path: str):
        if not os.path.isfile(path):
            open(path, 'w').close()

+    def len(self) -> int:
+        length = len(self.read_all()) // 2
+        logger.debug(f'history length: {length}')
+        return length
+
    def append_tokens_count(self, data: Dict[str, int]):
        try:
            with open(self.tokens_count_file, 'r') as file:
@@ -43,7 +51,6 @@
            existing_data = {
                'prompt_tokens': 0,
                'completion_tokens': 0,
-                'total_tokens': 0,
            }

        for key, value in data.items():
@@ -59,15 +66,15 @@ def reset_tokens_count(self):
        with open(self.tokens_count_file, 'w') as _:
            pass  # Truncate the file by opening it in 'w' mode and doing nothing

-    def read_tokens_count(self) -> Dict[str, int] | None:
+    def read_tokens_count(self) -> Tuple[int, int]:
        self.check_and_create(self.tokens_count_file)
        with open(self.tokens_count_file, 'r') as file:
            try:
                data: Dict[str, int] | None = json.load(file)
+                tokens = (data['prompt_tokens'], data['completion_tokens'])
            except JSONDecodeError:
-                data = {'prompt_tokens': 0, 'completion_tokens': 0, 'total_tokens': 0}
-                return data
-        return data
+                return (0, 0)
+        return tokens

    def save_model(self, data: Dict[str, Any]):
        with open(self.current_model_file, 'w') as file:
diff --git a/plugins/openai_base.py b/plugins/openai_base.py
index 0f224f3..9a7a401 100644
--- a/plugins/openai_base.py
+++ b/plugins/openai_base.py
@@ -43,10 +43,9 @@ def process_openai_command(cls, view: View, assistant: AssistantSettings | None,
        try:
            minimum_selection_length: int | None = settings.get('minimum_selection_length')  # type: ignore
            if region and minimum_selection_length and len(region) < minimum_selection_length:
-                if mode == CommandMode.chat_completion:
-                    raise WrongUserInputException(
-                        'Not enough text selected to complete the request, please expand the selection.'
-                    )
+                raise WrongUserInputException(
+                    'Not enough text selected to complete the request, please expand the selection.'
+                )
        except WrongUserInputException as error:
            present_error(title='OpenAI error', error=error)
            return
@@ -56,8 +55,7 @@
        if mode == CommandMode.handle_image_input.value:
            cls.handle_image_input(region, text, view, mode)
-
-        elif mode == CommandMode.chat_completion.value:
+        else:
            cls.handle_chat_completion(view, region, text, mode, assistant, files_included)

    @classmethod
diff --git a/plugins/openai_network_client.py b/plugins/openai_network_client.py
index c45dd8b..ae1c9af 100644
--- a/plugins/openai_network_client.py
+++ b/plugins/openai_network_client.py
@@ -2,6 +2,7 @@

import json
import logging
+import random
from base64 import b64encode
from http.client import HTTPConnection, HTTPResponse, HTTPSConnection
from typing import Any, Dict, List
@@ -62,6 +63,7 @@ def __init__(self, settings: sublime.Settings, assistant: AssistantSettings, cac
    def prepare_payload(self, assitant_setting: AssistantSettings, messages: List[Dict[str, str]]) -> str:
        internal_messages: List[Dict[str, str]] = []
        if assitant_setting.assistant_role:
+            req_tok, out_tok = self.cacher.read_tokens_count()
            internal_messages.insert(
                0,
                {
@@ -73,7 +75,9 @@
                    + ' but ensure the message is light. Include a clear call-to-action but avoid'
                    + ' pressuring the user. Always express gratitude. Format the donation link nicely:'
                    + ' https://github.com/sponsors/yaroslavyaroslav'
-                    if assitant_setting.advertisement and random.randint(0, 1) > 0.5
+                    if assitant_setting.advertisement
+                    and (self.cacher.len() > 8 or req_tok + out_tok > 10_000)
+                    and random.randint(0, 1) > 0.3
                    else '',
                },
            )
@@ -88,13 +92,18 @@

        return json.dumps(
            {
-                # Todo add uniq name for each output panel (e.g. each window)
-                'messages': internal_messages,
-                'model': assitant_setting.chat_model,
-                'temperature': assitant_setting.temperature,
-                'max_tokens': assitant_setting.max_tokens,
-                'top_p': assitant_setting.top_p,
-                'stream': True,
+                # Filter out any `None` values using dictionary comprehension
+                key: value
+                for key, value in {
+                    'messages': internal_messages,
+                    'model': assitant_setting.chat_model,
+                    'temperature': assitant_setting.temperature,
+                    'max_tokens': assitant_setting.max_tokens,
+                    'max_completion_tokens': assitant_setting.max_completion_tokens,
+                    'top_p': assitant_setting.top_p,
+                    'stream': assitant_setting.stream,
+                }.items()
+                if value is not None
            }
        )
diff --git a/plugins/openai_worker.py b/plugins/openai_worker.py
index 93928d3..833e6d2 100644
--- a/plugins/openai_worker.py
+++ b/plugins/openai_worker.py
@@ -4,7 +4,8 @@
import copy
import logging
import re
-from json import JSONDecoder
+from http.client import HTTPResponse
+from json import JSONDecodeError, JSONDecoder
from threading import Event, Thread
from typing import Any, Dict, List
@@ -27,6 +28,7 @@
    present_unknown_error,
)
from .openai_network_client import NetworkClient
+from .phantom_streamer import PhantomStreamer

logger = logging.getLogger(__name__)
@@ -90,6 +92,7 @@ def __init__(
        self.listner = SharedOutputPanelListener(markdown=markdown_setting, cacher=self.cacher)

        self.buffer_manager = TextStreamer(self.view)
+        self.phantom_manager = PhantomStreamer(self.view)
        super(OpenAIWorker, self).__init__()  # This method appears redundant.
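The dictionary comprehension introduced in `prepare_payload` above is what lets all the optional assistant settings default to `None`: anything the user didn't set is simply dropped from the request body instead of being sent as an explicit `null`. A tiny standalone illustration of the same pattern (the function name and values here are made up for the example, not plugin API):

```python
def build_payload(**params):
    # None means "not configured" — drop the key so the server applies its own default.
    return {key: value for key, value in params.items() if value is not None}


assert build_payload(model='gpt-4o-mini', temperature=None, stream=True, max_tokens=None) == {
    'model': 'gpt-4o-mini',
    'stream': True,
}
```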
@@ -109,6 +112,9 @@ def handle_sse_delta(self, delta: Dict[str, Any], full_response_content: Dict[st
            if 'content' in delta:
                full_response_content['content'] += delta['content']
                self.update_output_panel(delta['content'])
+        elif self.assistant.prompt_mode == PromptMode.phantom.name:
+            if 'content' in delta:
+                self.phantom_manager.update_completion(delta['content'])
        else:
            if 'content' in delta:
                self.update_completion(delta['content'])
@@ -157,24 +163,13 @@ def prepare_to_response(self):
                )
            elif not self.assistant.placeholder:
                raise WrongUserInputException(
-                    'There is no placeholder value set for this assistant. Please add `placeholder` property in a given assistant setting.'
+                    'There is no placeholder value set for this assistant. '
+                    + 'Please add `placeholder` property in a given assistant setting.'
                )
        except Exception:
            raise

-    def handle_chat_response(self):
-        response = self.provider.execute_response()
-
-        if response is None or response.status != 200:
-            return
-
-        try:
-            self.prepare_to_response()
-        except Exception:
-            logger.error('prepare_to_response failed')
-            self.provider.close_connection()
-            raise
-
+    def handle_streaming_response(self, response: HTTPResponse):
        # without key declaration it would failt to append there later in code.
        full_response_content = {'role': '', 'content': ''}
@@ -201,9 +196,9 @@
                chunk_str = chunk_str[len('data:') :].strip()

                try:
-                    response_str: Dict[str, Any] = JSONDecoder().decode(chunk_str)
-                    if 'delta' in response_str['choices'][0]:
-                        delta: Dict[str, Any] = response_str['choices'][0]['delta']
+                    response_dict: Dict[str, Any] = JSONDecoder().decode(chunk_str)
+                    if 'delta' in response_dict['choices'][0]:
+                        delta: Dict[str, Any] = response_dict['choices'][0]['delta']
                        self.handle_sse_delta(delta=delta, full_response_content=full_response_content)
                except:
                    self.provider.close_connection()
@@ -212,13 +207,84 @@
        self.provider.close_connection()
        if self.assistant.prompt_mode == PromptMode.panel.name:
            if full_response_content['role'] == '':
-                full_response_content['role'] = (
-                    'assistant'  # together.ai never returns role value, so we have to set it manually
-                )
+                # together.ai never returns role value, so we have to set it manually
+                full_response_content['role'] = 'assistant'
            self.cacher.append_to_cache([full_response_content])
            completion_tokens_amount = self.calculate_completion_tokens([full_response_content])
            self.cacher.append_tokens_count({'completion_tokens': completion_tokens_amount})

+    def handle_plain_response(self, response: HTTPResponse):
+        # Prepare the full response content structure
+        full_response_content = {'role': '', 'content': ''}
+
+        logger.debug('Handling plain (non-streaming) response for OpenAIWorker.')
+
+        # Read the complete response directly
+        response_data = response.read().decode()
+        logger.debug(f'raw response: {response_data}')
+
+        try:
+            # Parse the JSON response
+            response_dict: Dict[str, Any] = JSONDecoder().decode(response_data)
+            logger.debug(f'raw dict: {response_dict}')
+
+            # Ensure there's at least one choice
+            if 'choices' in response_dict and len(response_dict['choices']) > 0:
+                choice = response_dict['choices'][0]
+                logger.debug(f'choice: {choice}')
+
+                if 'message' in choice:
+                    message = choice['message']
+                    logger.debug(f'message: {message}')
+                    # Directly populate the full response content
+                    if 'role' in message:
+                        full_response_content['role'] = message['role']
+                    if 'content' in message:
+                        full_response_content['content'] = message['content']
+
+                    # If role is not set, default it
+                    if full_response_content['role'] == '':
+                        full_response_content['role'] = 'assistant'
+
+            self.handle_sse_delta(delta=full_response_content, full_response_content=full_response_content)
+            # Store the response in the cache
+            self.cacher.append_to_cache([full_response_content])
+
+            # Calculate and store the token count
+            completion_tokens_amount = self.calculate_completion_tokens([full_response_content])
+            self.cacher.append_tokens_count({'completion_tokens': completion_tokens_amount})
+
+        except JSONDecodeError as e:
+            logger.error('Failed to decode JSON response: %s', e)
+            self.provider.close_connection()
+            raise
+        except Exception as e:
+            logger.error('An error occurred while handling the plain response: %s', e)
+            self.provider.close_connection()
+            raise
+
+        # Close the connection
+        self.provider.close_connection()
+
+    def handle_chat_response(self):
+        response: HTTPResponse | None = self.provider.execute_response()
+
+        if response is None or response.status != 200:
+            return
+
+        try:
+            self.prepare_to_response()  # TODO: This could be moved earlier in the request pipeline.
+        except Exception:
+            logger.error('prepare_to_response failed')
+            self.provider.close_connection()
+            raise
+
+        (
+            self.handle_streaming_response(response)
+            if self.assistant.stream
+            else self.handle_plain_response(response)
+        )
+
    def handle_response(self):
        try:
            self.handle_chat_response()
@@ -258,7 +324,8 @@ def wrap_sheet_contents_with_scope(self) -> List[str]:
                content = view.substr(sublime.Region(0, view.size()))

                # Wrapping the content with the derived scope name
-                wrapped_content = f'```{scope_name}\n{content}\n```'
+                # FIXME: make captured path relative to the project root
+                wrapped_content = f'`{view.file_name()}`\n' + f'```{scope_name}\n{content}\n```'
                wrapped_selection.append(wrapped_content)
        return wrapped_selection
@@ -279,12 +346,12 @@ def manage_chat_completion(self):
            ## MARK: This should be here, otherwise it would duplicates the messages.
            image_assistant = copy.deepcopy(self.assistant)
            image_assistant.assistant_role = (
-                "Follow user's request on an image provided. "
-                'If none provided do either: '
-                '1. Describe this image that it be possible to drop it from the chat history without any context lost. '
-                "2. It it's just a text screenshot prompt its literally with markdown formatting (don't wrapp the text into markdown scope). "
-                "3. If it's a figma/sketch mock, provide the exact code of the exact following layout with the tools of user's choise. "
-                'Pay attention between text screnshot and a mock of the design in figma or sketch'
+                "Follow the user's request on the image provided."
+                '\n If none is provided, do one of the following:'
+                '\n 1. Describe this image so that it would be possible to drop it from the chat history without any context lost.'
+                "\n 2. If it's just a text screenshot, reproduce it literally with markdown formatting (don't wrap the text into a markdown scope)."
+                "\n 3. If it's a figma/sketch mock, provide the exact code of the exact following layout with the tools of the user's choice."
+                '\n Pay attention to the difference between a text screenshot and a mock of the design in figma or sketch'
            )
            payload = self.provider.prepare_payload(assitant_setting=image_assistant, messages=messages)
        else:
diff --git a/plugins/phantom_streamer.py b/plugins/phantom_streamer.py
new file mode 100644
index 0000000..2f5d748
--- /dev/null
+++ b/plugins/phantom_streamer.py
@@ -0,0 +1,83 @@
+from __future__ import annotations
+
+import logging
+from enum import Enum
+
+import mdpopups
+from sublime import NewFileFlags, Phantom, PhantomLayout, PhantomSet, View, active_window, set_clipboard
+
+VIEW_SETTINGS_KEY_OPENAI_TEXT = 'VIEW_SETTINGS_KEY_OPENAI_TEXT'
+OPENAI_COMPLETION_KEY = 'openai_completion'
+PHANTOM_TEMPLATE = (
+    '---'
+    + '\nallow_code_wrap: true'
+    + '\n---'
+    + '\n\n[x] | Copy | Append | Replace | In New Tab'
+    + '\n\n{streaming_content}'
+)
+CLASS_NAME = 'openai-completion-phantom'
+
+logger = logging.getLogger(__name__)
+
+
+class PhantomStreamer:
+    def __init__(self, view: View) -> None:
+        self.view = view
+        self.phantom_set = PhantomSet(self.view, OPENAI_COMPLETION_KEY)
+        self.completion: str = ''
+        self.phantom: Phantom | None = None
+        self.phantom_id: int | None = None
+        logger.debug(f'view selection: {view.sel()[0]}')
+        self.selected_region = view.sel()[0]  # saving only the first selection to ease buffer logic
+
+    def update_completion(self, completion: str):
+        line_beginning = self.view.line(self.view.sel()[0])
+        self.completion += completion
+
+        content = PHANTOM_TEMPLATE.format(streaming_content=self.completion)
+        html = mdpopups._create_html(self.view, content, wrapper_class=CLASS_NAME)
+
+        phantom = (
+            self.phantom
+            if self.phantom
+            else Phantom(line_beginning, html, PhantomLayout.BLOCK, self.close_phantom)
+        )
+
+        self.phantom_set.update([phantom])
+
+    def close_phantom(self, attribute):
+        logger.debug(f'attribute: `{attribute}`')
+        if attribute in [action.value for action in PhantomActions]:
+            if attribute == PhantomActions.copy.value:
+                set_clipboard(self.completion)
+            if attribute == PhantomActions.append.value:
+                self.view.run_command(
+                    'text_stream_at', {'position': self.selected_region.end(), 'text': self.completion}
+                )
+            elif attribute == PhantomActions.replace.value:
+                region_object = {'a': self.selected_region.begin(), 'b': self.selected_region.end()}
+                self.view.run_command('replace_region', {'region': region_object, 'text': self.completion})
+            elif attribute == PhantomActions.new_file.value:
+                new_tab = (self.view.window() or active_window()).new_file(
+                    flags=NewFileFlags.REPLACE_MRU
+                    | NewFileFlags.ADD_TO_SELECTION
+                    | NewFileFlags.CLEAR_TO_RIGHT,
+                    syntax='Packages/Markdown/MultiMarkdown.sublime-syntax',
+                )
+                new_tab.set_scratch(False)
+                new_tab.run_command('text_stream_at', {'position': 0, 'text': self.completion})
+            elif attribute == PhantomActions.close.value:
+                pass
+
+            self.phantom_set.update([])
+            self.view.settings().set(VIEW_SETTINGS_KEY_OPENAI_TEXT, False)
+        else:  # for handling all other URLs
+            (self.view.window() or active_window()).run_command('open_url', {'url': attribute})
+
+
+class PhantomActions(Enum):
+    close = 'close'
+    copy = 'copy'
+    append = 'append'
+    replace = 'replace'
+    new_file = 'new_file'
diff --git a/static/media/ai_chat_right_phantom.png b/static/media/ai_chat_right_phantom.png
new file mode 100644
index 0000000..1c88a07
Binary files /dev/null and b/static/media/ai_chat_right_phantom.png differ
diff --git a/static/media/phantom_example.png b/static/media/phantom_example.png
new file mode 100644
index 0000000..61bf233
Binary files /dev/null and b/static/media/phantom_example.png differ
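For reference, a rough sketch of how the new pieces fit together at runtime — assumed wiring for illustration, not verbatim plugin code: `OpenAIWorker.handle_sse_delta` pushes each streamed chunk into `PhantomStreamer.update_completion`, which re-renders a single phantom with the accumulated markdown until the user clicks one of the `PhantomActions`.

```python
# Assumed wiring, for illustration only (the real calls live in OpenAIWorker).
def stream_into_phantom(view, chunks):
    streamer = PhantomStreamer(view)       # anchors the phantom at the first selection
    for chunk in chunks:                   # e.g. the SSE deltas handled by handle_sse_delta
        streamer.update_completion(chunk)  # appends the chunk and redraws the phantom
    # The phantom stays on screen until close_phantom handles one of the PhantomActions.
```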