That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

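That fetch-and-run step is a single command. The exact Hugging Face repo path and quantization tag below are assumptions based on Unsloth's usual naming, so check their listings for the variant you want:

```shell
# Pull the distilled 8B GGUF from Hugging Face and drop into a chat session
# (repo path and Q8_0 tag are assumptions - verify against the Unsloth repo)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
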
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

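Because LLM keeps its logs in a SQLite database, earlier experiments can be pulled back up later. A sketch, assuming a standard LLM install:

```shell
# Show the most recently logged prompt and response
llm logs -n 1

# Print the path to the underlying SQLite database,
# which can be opened with any SQLite tool
llm logs path
```
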
I installed the plugin and ran it like this:

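A sketch of those two steps; the model ID passed to `-m` is an assumption (run `ollama list` to see the exact name Ollama registered after the download):

```shell
# Install the plugin that exposes local Ollama models to LLM
llm install llm-ollama

# Prompt the model through LLM so the exchange gets logged to SQLite
# (model ID is an assumption - check `ollama list` for the real name)
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
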
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

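Since the chain of thought arrives as a `<think>...</think>` block at the front of the response text, it's easy to split out mechanically. A minimal sketch, assuming R1-style tagging; the sample response string is invented for illustration:

```python
import re

def split_thinking(response: str) -> tuple[str, str]:
    """Separate the <think>...</think> chain of thought from the final answer."""
    match = re.match(r"<think>(.*?)</think>\s*(.*)", response, re.DOTALL)
    if match is None:
        # No thinking block present; the whole response is the answer
        return "", response
    thinking, answer = match.groups()
    return thinking.strip(), answer.strip()

# Invented example response, for illustration only:
raw = "<think>Okay, so I need to come up with a joke...</think>Here is the joke."
thinking, answer = split_thinking(raw)
print(answer)  # → Here is the joke.
```
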
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're generally found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up line introduces "krill", a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B, currently my favourite GPT-4 class model that I've run on my own machine, I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:

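The 34GB download can be fetched the same way via Ollama. The repo path and Q3_K_M quantization tag below are assumptions, so confirm the exact name against Unsloth's Hugging Face listings:

```shell
# Roughly 34GB at this quantization level
# (repo path and tag are assumptions - verify on the Unsloth repo)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
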
Can it draw a pelican?

I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well:

It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

Other ways to try DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:

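A sketch of what that entry might look like, following LLM's documented extra-openai-models.yaml format; the model_id and api_key_name values here are assumptions:

```yaml
# Registers DeepSeek's OpenAI-compatible endpoint with LLM
# (model_id and api_key_name are assumptions)
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
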
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.