That model was trained in part using their unreleased R1 "reasoning" model. Today they have released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.

> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

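The exact command didn't survive in this excerpt; with Ollama's support for pulling GGUF files directly from Hugging Face, a plausible invocation would be (the `Q8_0` quantization tag is my assumption, not something stated in the post):

```shell
# Pull the Unsloth GGUF straight from Hugging Face and start chatting.
# The :Q8_0 quantization tag is an assumption - pick whichever fits your hardware.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
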
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

I installed the plugin and ran it like this:

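A sketch of that setup, assuming the model name LLM uses matches the Hugging Face path Ollama pulled it under (the quantization tag is again an assumption):

```shell
# Install the plugin that teaches LLM to talk to local Ollama models
llm install llm-ollama

# Run the prompt from the post against the distilled model
# (model name assumed to match what `ollama list` reports after the pull)
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
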
Each response starts with a ... pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

> Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

> First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

> [...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the closing tag:

> The Joke:

> A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its huge size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.

> After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"

> Explanation:

> - Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
> - Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is terrible. But the process of getting there was such an interesting insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:

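The invocation isn't preserved in this excerpt. Assuming Unsloth published a 70B repo following the same naming convention as their 8B one, it would look something like this (both the repo name and the quantization tag, chosen so the download lands around 34GB, are assumptions):

```shell
# 70B distill - repo name and quantization tag are assumptions
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
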
Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:

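Run through LLM against the local model, that prompt would look something like this (the model identifier is the same assumed name as above, not something stated in the post):

```shell
# Model identifier is an assumption
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```
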
It looked to me like it got the order of the elements wrong, so I followed up with:

> the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

Other ways to try DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:

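The configuration itself was stripped from this excerpt; something like the following should work, assuming the key names LLM's extra-openai-models.yaml format uses (`model_id`, `model_name`, `api_base`, `api_key_name`) - check the LLM documentation if these have changed:

```yaml
# Register DeepSeek's OpenAI-compatible endpoint with LLM
# (key names assumed from LLM's extra-openai-models.yaml format)
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
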
Then run llm keys set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.

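Spelled out, those two commands from the paragraph above are:

```shell
# Store the DeepSeek API key (paste it when prompted)
llm keys set deepseek

# Run a prompt against the hosted reasoning model
llm -m deepseek-reasoner 'prompt'
```
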
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.