<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[DataBites]]></title><description><![CDATA[Weekly curated insights to make you a better data professional 🧩]]></description><link>https://www.databites.tech</link><image><url>https://substackcdn.com/image/fetch/$s_!kyJ6!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe930fbab-b8df-40ef-9676-3d9ca5d49eae_714x714.png</url><title>DataBites</title><link>https://www.databites.tech</link></image><generator>Substack</generator><lastBuildDate>Thu, 30 Apr 2026 21:35:46 GMT</lastBuildDate><atom:link href="https://www.databites.tech/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Josep Ferrer]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[databites.hi@gmail.com]]></webMaster><itunes:owner><itunes:email><![CDATA[databites.hi@gmail.com]]></itunes:email><itunes:name><![CDATA[Josep Ferrer]]></itunes:name></itunes:owner><itunes:author><![CDATA[Josep Ferrer]]></itunes:author><googleplay:owner><![CDATA[databites.hi@gmail.com]]></googleplay:owner><googleplay:email><![CDATA[databites.hi@gmail.com]]></googleplay:email><googleplay:author><![CDATA[Josep Ferrer]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[How to Actually Get Started with HuggingFace 🤗]]></title><description><![CDATA[A clear (and human) guide to get started without drowning]]></description><link>https://www.databites.tech/p/how-to-actually-get-started-with-b80</link><guid isPermaLink="false">https://www.databites.tech/p/how-to-actually-get-started-with-b80</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Tue, 
28 Oct 2025 13:15:02 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c711045e-5919-47eb-9c31-5b43631fe9b0_976x864.png" length="0" type="image/png"/><content:encoded><![CDATA[<p>If you still think &#129303; is just a WhatsApp emoji, <strong>you&#8217;ve missed a lot. </strong></p><p>AI isn&#8217;t stuck in research labs anymore; it&#8217;s in products, back-office flows, and tiny scripts that save hours each week. </p><p><strong>Hugging Face is the community backbone behind much of that shift:</strong> an open-source platform that has become essential for anyone working in Machine Learning (ML) and Natural Language Processing (NLP).</p><p>Whether you&#8217;re an experienced data scientist or just starting, Hugging Face offers a wide variety of tools and resources to help you bring your AI projects to life.</p><p><strong>Trust me when I say, you&#8217;ll want to be a part of it!</strong></p><p>Before we dive in, I strongly recommend checking out my previous issue on <em>How to Get Started with LLMs</em> (if you haven&#8217;t already). 
Trust me, it&#8217;s a great primer!</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;2fc20ea9-1d63-4d47-8925-b0957add7c47&quot;,&quot;caption&quot;:&quot;LLMs are moving faster than your backlog.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How to Actually Get Started with LLMs&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech Tech Writer @KDnuggets @DataCamp &#128073;&#127995;Inquiries in rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-10-08T13:33:57.035Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aa672bbc-1acb-447c-b2a4-5259717b2089_976x864.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.databites.tech/p/how-to-actually-get-started-with&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:175617350,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:10,&quot;comment_count&quot;:0,&quot;publication_id&quot;:2143185,&quot;publication_name&quot;:&quot;DataBites&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!kyJ6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe930fbab-b8df-40ef-9676-3d9ca5d49eae_714x714.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><h1><strong>Hugging Face, or The 
GitHub of ML</strong></h1><p>Hugging Face is often described as the &#8220;GitHub of the ML world&#8221;, a collaborative platform with lots of pre-trained models and datasets (ready to load and use!).</p><p>But it goes beyond that description. Think of it as <strong>GitHub + model hosting + serving for AI</strong>: a massive <strong>Hub</strong> of models/datasets, the <strong>Transformers</strong> library (not just NLP anymore), easy <strong>Datasets</strong>, and simple ways to <strong>demo</strong> (Spaces) and <strong>serve</strong> (Inference Endpoints, TGI) models.</p><h4>Why you should care</h4><ul><li><p><strong>Speed:</strong> pre-trained models + one-line pipelines get you to a baseline in minutes.</p></li><li><p><strong>Breadth:</strong> text, vision, audio, multimodal, diffusion&#8212;you name it.</p></li><li><p><strong>Community:</strong> model cards, evals, PRs, and fast iteration on SOTA ideas.</p></li></ul><p><em>So&#8230; where does this company come from?</em></p><h3><strong>From Chatbot to Open-Source Powerhouse</strong></h3><p>Founded in 2016, Hugging Face originally aimed to create a chatbot targeted at teenagers. 
However, <strong>the company quickly pivoted after open-sourcing its underlying model, leading to the creation of the Transformers library in 2018.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NXS4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f3f6056-83e3-4045-9b33-0a125a7db122_1472x518.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NXS4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f3f6056-83e3-4045-9b33-0a125a7db122_1472x518.png 424w, https://substackcdn.com/image/fetch/$s_!NXS4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f3f6056-83e3-4045-9b33-0a125a7db122_1472x518.png 848w, https://substackcdn.com/image/fetch/$s_!NXS4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f3f6056-83e3-4045-9b33-0a125a7db122_1472x518.png 1272w, https://substackcdn.com/image/fetch/$s_!NXS4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f3f6056-83e3-4045-9b33-0a125a7db122_1472x518.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NXS4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f3f6056-83e3-4045-9b33-0a125a7db122_1472x518.png" width="1456" height="512" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f3f6056-83e3-4045-9b33-0a125a7db122_1472x518.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:512,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:416168,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databites.tech/i/177365648?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f3f6056-83e3-4045-9b33-0a125a7db122_1472x518.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NXS4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f3f6056-83e3-4045-9b33-0a125a7db122_1472x518.png 424w, https://substackcdn.com/image/fetch/$s_!NXS4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f3f6056-83e3-4045-9b33-0a125a7db122_1472x518.png 848w, https://substackcdn.com/image/fetch/$s_!NXS4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f3f6056-83e3-4045-9b33-0a125a7db122_1472x518.png 1272w, https://substackcdn.com/image/fetch/$s_!NXS4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f3f6056-83e3-4045-9b33-0a125a7db122_1472x518.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Today, Hugging Face is a central hub for AI professionals and enthusiasts, fostering a community that continually pushes the boundaries of what&#8217;s possible with machine learning.</p><p><em>Isn&#8217;t it crazy how things change up so fast?</em></p><h2>Core pieces you&#8217;ll actually use</h2><p>One of the biggest advantages of Hugging Face is how easy it is to get started. </p><h3><strong>#1. Transformers Library</strong></h3><p>The Transformers library is a comprehensive suite of state-of-the-art ML models specially designed for NLP that contains an extensive collection of pre-trained models optimized for tasks such as text classification, language generation, translation, and summarization, among others</p><p>It abstracts common NLP tasks into a simple-to-use pipeline() method, an easy-to-use API for performing a wide variety of tasks. 
The Transformers library simplifies the implementation of NLP models in several key ways:</p><ol><li><p><strong>Abstraction of complexity:</strong> It abstracts away the complexity involved in initializing models, managing pipelines, and handling tokenization.</p></li><li><p><strong>Pre-trained models:</strong> It provides one of the largest collections of pre-trained models, reducing the time and resources required to develop NLP applications from scratch.</p></li><li><p><strong>Flexibility and modularity:</strong> The library is designed with modularity in mind, allowing users to plug in different components as required.</p></li><li><p><strong>Community and support: </strong>Hugging Face has fostered a strong community around its tools, with extensive documentation, tutorials, and forums.</p></li><li><p><strong>Continuous updates and expansion: </strong>The library is constantly updated with the latest breakthroughs in NLP, incorporating new models and methodologies.</p></li></ol><h3><strong>#2. Model Hub</strong></h3><p>The Model Hub is the public face of the community, a platform where thousands of models and datasets are at your fingertips. It allows users to share and discover models contributed by the community, promoting a collaborative approach to NLP development.</p><p>You can go check it out <a href="https://substack.com/redirect/8e3b6836-14e0-46bf-aa01-9914cb11ee26?j=eyJ1IjoiMjcwZHAxIn0.hGTR9CXb_nmPcUKqllDE9vqggNRtE3-4-yLAzGi9eWs">on their official website</a>. 
There, open the Model Hub by clicking the Models button in the navigation bar, and a view like the following should appear:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qnEg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db3dcec-7e63-4980-bc77-a75b686fec79_1999x1129.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qnEg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db3dcec-7e63-4980-bc77-a75b686fec79_1999x1129.png 424w, https://substackcdn.com/image/fetch/$s_!qnEg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db3dcec-7e63-4980-bc77-a75b686fec79_1999x1129.png 848w, https://substackcdn.com/image/fetch/$s_!qnEg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db3dcec-7e63-4980-bc77-a75b686fec79_1999x1129.png 1272w, https://substackcdn.com/image/fetch/$s_!qnEg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db3dcec-7e63-4980-bc77-a75b686fec79_1999x1129.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qnEg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db3dcec-7e63-4980-bc77-a75b686fec79_1999x1129.png" width="1456" height="822" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2db3dcec-7e63-4980-bc77-a75b686fec79_1999x1129.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:822,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Screenshot of Hugging Face Model Hub main view.&quot;,&quot;title&quot;:&quot;Screenshot of Hugging Face Model Hub main view.&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Screenshot of Hugging Face Model Hub main view." title="Screenshot of Hugging Face Model Hub main view." srcset="https://substackcdn.com/image/fetch/$s_!qnEg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db3dcec-7e63-4980-bc77-a75b686fec79_1999x1129.png 424w, https://substackcdn.com/image/fetch/$s_!qnEg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db3dcec-7e63-4980-bc77-a75b686fec79_1999x1129.png 848w, https://substackcdn.com/image/fetch/$s_!qnEg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db3dcec-7e63-4980-bc77-a75b686fec79_1999x1129.png 1272w, https://substackcdn.com/image/fetch/$s_!qnEg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db3dcec-7e63-4980-bc77-a75b686fec79_1999x1129.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Screenshot of Hugging Face Model Hub main view.</figcaption></figure></div><p>As you can see, in the left-sidebar, there are multiple filters regarding the main task to be performed.</p><p>Contributing to the Model Hub is made straightforward by Hugging Face&#8217;s tools, which guide users through the process of uploading their models. Once contributed, these models are available for the entire community to use, either directly through the hub or via integration with the Hugging Face Transformers library.</p><p><em>Isn&#8217;t it exciting?</em></p><p><strong>This ease of access and contribution fosters a dynamic ecosystem where state-of-the-art models are constantly refined and expanded upon</strong>, providing a rich, collaborative foundation for NLP advancement.</p><h3><strong>#3. 
Tokenizers</strong></h3><p>Tokenizers are crucial in NLP: they convert text into a format that machine learning models can understand, which is essential for processing different languages and text structures.</p><p>They break text down into tokens&#8212;basic units like words, subwords, or characters&#8212;preparing the data for machine learning models to process. These tokens are the building blocks that enable models to understand and generate human language.</p><p>They also map tokens to the numerical representations the model expects as input, and handle padding and truncation for uniform sequence lengths.</p><p>Hugging Face provides a range of user-friendly tokenizers, optimized for their Transformers library, which are key to the seamless preprocessing of text. </p><h3><strong>#4. Datasets Library</strong></h3><p>Another key component is the Hugging Face Datasets library, a vast repository of NLP datasets that support the training and benchmarking of ML models.</p><p>This library is a crucial tool for developers in the field, as it offers a diverse collection of datasets that can be used to train, test, and benchmark NLP models across a wide variety of tasks.</p><p>One of its main benefits is its simple, user-friendly interface. 
While you can browse and explore all datasets on the Hugging Face Hub, the datasets library lets you download any of them into your code effortlessly.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fa5y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff5383f6-2189-44fd-92d6-5ac85cabd592_1999x1217.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fa5y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff5383f6-2189-44fd-92d6-5ac85cabd592_1999x1217.png 424w, https://substackcdn.com/image/fetch/$s_!fa5y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff5383f6-2189-44fd-92d6-5ac85cabd592_1999x1217.png 848w, https://substackcdn.com/image/fetch/$s_!fa5y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff5383f6-2189-44fd-92d6-5ac85cabd592_1999x1217.png 1272w, https://substackcdn.com/image/fetch/$s_!fa5y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff5383f6-2189-44fd-92d6-5ac85cabd592_1999x1217.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fa5y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff5383f6-2189-44fd-92d6-5ac85cabd592_1999x1217.png" width="1456" height="886" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ff5383f6-2189-44fd-92d6-5ac85cabd592_1999x1217.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:886,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Screenshot of Hugging Face Datasets main view.&quot;,&quot;title&quot;:&quot;Screenshot of Hugging Face Datasets main view.&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Screenshot of Hugging Face Datasets main view." title="Screenshot of Hugging Face Datasets main view." srcset="https://substackcdn.com/image/fetch/$s_!fa5y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff5383f6-2189-44fd-92d6-5ac85cabd592_1999x1217.png 424w, https://substackcdn.com/image/fetch/$s_!fa5y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff5383f6-2189-44fd-92d6-5ac85cabd592_1999x1217.png 848w, https://substackcdn.com/image/fetch/$s_!fa5y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff5383f6-2189-44fd-92d6-5ac85cabd592_1999x1217.png 1272w, https://substackcdn.com/image/fetch/$s_!fa5y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff5383f6-2189-44fd-92d6-5ac85cabd592_1999x1217.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Screenshot of Hugging Face Datasets main view.</figcaption></figure></div><p>It includes datasets for common tasks such as text classification, translation, and question-answering, as well as more specialized datasets for unique challenges in the field.</p><p>So now that we know what it is, let&#8217;s get our hands dirty &#128165;</p><h2><strong>Getting Started with Hugging Face</strong></h2><p>Before you can start exploring Hugging Face, you&#8217;ll need to install it on your local machine.</p><h3>Installation</h3><p>First, you should combine the<code> transformers</code> library with your favorite deep learning library, either <code>TensorFlow</code> or <code>PyTorch</code>.</p><p>The transformers library can be easily installed using <code>pip</code>, Python&#8217;s package 
installer.</p><pre><code><code>pip install transformers</code></code></pre><p>To get the full capability, also install the <code>datasets</code> and <code>tokenizers</code> libraries.</p><pre><code><code>pip install tokenizers datasets</code></code></pre><p>Hugging Face&#8217;s model hub offers a huge collection of pre-trained models that you can use for a wide range of NLP tasks. There are a bunch of things we can do with LLMs. </p><p><strong>The first thing we can do is directly use a pre-trained model. </strong></p><h3>1. Using Pre-trained Models</h3><h4><strong>#1 Select a Pre-trained Model</strong></h4><p>First, you need to select a pre-trained model. To do so, we go to the <strong><a href="https://huggingface.co/models">Model Hub</a></strong>.</p><p>Imagine we want to infer the sentiment corresponding to a string of text. So we can easily browse only the models that perform `Text Classification` tasks by selecting the Text Classification button on the left-sidebar.</p><p>Models on the Hub appear ordered by Trending by default. Usually, the top results are the most widely used ones. 
</p><p><em>So, we select the second result, which is the most used sentiment analysis model.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ev9g!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec83d08-0bfd-4a38-a633-22a6bdd7dc8c_1999x1239.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ev9g!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec83d08-0bfd-4a38-a633-22a6bdd7dc8c_1999x1239.png 424w, https://substackcdn.com/image/fetch/$s_!Ev9g!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec83d08-0bfd-4a38-a633-22a6bdd7dc8c_1999x1239.png 848w, https://substackcdn.com/image/fetch/$s_!Ev9g!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec83d08-0bfd-4a38-a633-22a6bdd7dc8c_1999x1239.png 1272w, https://substackcdn.com/image/fetch/$s_!Ev9g!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec83d08-0bfd-4a38-a633-22a6bdd7dc8c_1999x1239.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ev9g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec83d08-0bfd-4a38-a633-22a6bdd7dc8c_1999x1239.png" width="1456" height="902" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aec83d08-0bfd-4a38-a633-22a6bdd7dc8c_1999x1239.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:902,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Screenshot of Hugging Face Model Hub main view. Selecting Text Classification models.&quot;,&quot;title&quot;:&quot;Screenshot of Hugging Face Model Hub main view. Selecting Text Classification models.&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Screenshot of Hugging Face Model Hub main view. Selecting Text Classification models." title="Screenshot of Hugging Face Model Hub main view. Selecting Text Classification models." srcset="https://substackcdn.com/image/fetch/$s_!Ev9g!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec83d08-0bfd-4a38-a633-22a6bdd7dc8c_1999x1239.png 424w, https://substackcdn.com/image/fetch/$s_!Ev9g!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec83d08-0bfd-4a38-a633-22a6bdd7dc8c_1999x1239.png 848w, https://substackcdn.com/image/fetch/$s_!Ev9g!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec83d08-0bfd-4a38-a633-22a6bdd7dc8c_1999x1239.png 1272w, https://substackcdn.com/image/fetch/$s_!Ev9g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec83d08-0bfd-4a38-a633-22a6bdd7dc8c_1999x1239.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div 
class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Model Hub. Selecting our model. </figcaption></figure></div><p>To use it, we need to copy the corresponding name of the model. It can be found within the top section of its specific view.</p><h4><strong>#2 Load a pre-trained model</strong></h4><p>Now that we already know what model to use, let&#8217;s use it in Python. 
First, we need to import the <code>AutoTokenizer</code> and <code>AutoModelForSequenceClassification</code> classes from <code>transformers</code>, along with <code>pipeline</code>, which we will use later to run the model.</p><p>Using these Auto classes will automatically infer the model architecture from the model name.</p><pre><code><code>from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "lxyuan/distilbert-base-multilingual-cased-sentiments-student"

# We define a model object from the pre-trained checkpoint
model = AutoModelForSequenceClassification.from_pretrained(model_name)</code></code></pre><h4><strong>#3 Prepare your input</strong></h4><p>Next, load a tokenizer for our model. The transformers library facilitates this step by inferring the tokenizer to use from the name of the model we have chosen.</p><pre><code><code># We call the tokenizer class
tokenizer = AutoTokenizer.from_pretrained(model_name)
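
# Quick illustration (this sentence is just an example): calling the
# tokenizer turns raw text into a dict with "input_ids" and an
# "attention_mask", which is what the model actually consumes
encoded = tokenizer("I love this tutorial!")
print(encoded["input_ids"])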
</code></code></pre><h4><strong>#4 Run the model</strong></h4><p>Generate a pipeline object with the chosen model, the tokenizer, and the task to be performed. In our case, sentiment analysis. If you initialize the classifier with only the task, the pipeline class fills in default values for the model and tokenizer, which is handy for quick tests but not recommended in production.</p><pre><code><code># Initializing a classifier with a model and a tokenizer
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
# When passing only the task, pipeline infers both the model and the tokenizer.
classifier = pipeline("sentiment-analysis")
</code></code></pre><p>We can execute this model by introducing some input.</p><pre><code><code>output = classifier("I've been waiting for this tutorial all my life!")
print(output)</code></code></pre><p>And we will obtain the results right away!</p><p><em>Which leads to the following (and final) step&#8230;</em></p><h4><strong>#5 Interpret the outputs</strong></h4><p>The model will return an object containing various elements depending on the model&#8217;s class. For this sentiment analysis example, we will get:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j97a!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c045ace-ae69-445e-aa72-b0e984c15ac3_1062x60.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j97a!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c045ace-ae69-445e-aa72-b0e984c15ac3_1062x60.png 424w, https://substackcdn.com/image/fetch/$s_!j97a!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c045ace-ae69-445e-aa72-b0e984c15ac3_1062x60.png 848w, https://substackcdn.com/image/fetch/$s_!j97a!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c045ace-ae69-445e-aa72-b0e984c15ac3_1062x60.png 1272w, https://substackcdn.com/image/fetch/$s_!j97a!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c045ace-ae69-445e-aa72-b0e984c15ac3_1062x60.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!j97a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c045ace-ae69-445e-aa72-b0e984c15ac3_1062x60.png" width="1062" height="60" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c045ace-ae69-445e-aa72-b0e984c15ac3_1062x60.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:60,&quot;width&quot;:1062,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Obtained output.&quot;,&quot;title&quot;:&quot;Obtained output.&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Obtained output." title="Obtained output." srcset="https://substackcdn.com/image/fetch/$s_!j97a!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c045ace-ae69-445e-aa72-b0e984c15ac3_1062x60.png 424w, https://substackcdn.com/image/fetch/$s_!j97a!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c045ace-ae69-445e-aa72-b0e984c15ac3_1062x60.png 848w, https://substackcdn.com/image/fetch/$s_!j97a!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c045ace-ae69-445e-aa72-b0e984c15ac3_1062x60.png 1272w, https://substackcdn.com/image/fetch/$s_!j97a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c045ace-ae69-445e-aa72-b0e984c15ac3_1062x60.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>In this instance, the input string has 
been classified with the &#8220;Positive&#8221; label, with a confidence score of 0.579. This score reflects the model&#8217;s certainty in its classification.</p><p><strong>A second task we can do using HF is fine-tuning a model. </strong></p><h3>2. Fine-tuning models</h3><p>Fine-tuning is the process of taking a pre-trained model and updating its parameters by training on a dataset specific to your task. This allows you to leverage the model&#8217;s learned representations and adapt them to your use case.</p><p>Imagine we need to use a text-classifier model to infer sentiments from a list of tweets. One natural question that comes to mind is: </p><p><em>Will this pre-trained model work properly?</em></p><p>To make sure it does, we can take advantage of fine-tuning by training a pre-trained Hugging Face model with a dataset containing tweets and their corresponding sentiments so its performance improves.</p><p><strong>Here&#8217;s a basic example of fine-tuning a model for sequence classification:</strong></p><h4><strong>#1. Choose a pre-trained model and a dataset</strong></h4><p>Select a model architecture suitable for your task. In this case, we want to keep using the same sentiment analysis model. </p><p><strong>However, now we need some data to train our model. </strong>And this is precisely where the <code>datasets</code> library kicks in. We can browse all the datasets on the Hugging Face Hub and find the one that fits best.</p><p><strong>In my case, I&#8217;ll be using the twitter-sentiment-analysis dataset. 
</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ovdM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe88a8a37-c53c-4e0f-8603-a0efd5446606_1999x1206.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ovdM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe88a8a37-c53c-4e0f-8603-a0efd5446606_1999x1206.png 424w, https://substackcdn.com/image/fetch/$s_!ovdM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe88a8a37-c53c-4e0f-8603-a0efd5446606_1999x1206.png 848w, https://substackcdn.com/image/fetch/$s_!ovdM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe88a8a37-c53c-4e0f-8603-a0efd5446606_1999x1206.png 1272w, https://substackcdn.com/image/fetch/$s_!ovdM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe88a8a37-c53c-4e0f-8603-a0efd5446606_1999x1206.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ovdM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe88a8a37-c53c-4e0f-8603-a0efd5446606_1999x1206.png" width="1456" height="878" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e88a8a37-c53c-4e0f-8603-a0efd5446606_1999x1206.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:878,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Screenshot 
of Hugging Face Datasets Hub main view. Selecting Sentiment analysis datasets.&quot;,&quot;title&quot;:&quot;Screenshot of Hugging Face Datasets Hub main view. Selecting Sentiment analysis datasets.&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Screenshot of Hugging Face Datasets Hub main view. Selecting Sentiment analysis datasets." title="Screenshot of Hugging Face Datasets Hub main view. Selecting Sentiment analysis datasets." srcset="https://substackcdn.com/image/fetch/$s_!ovdM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe88a8a37-c53c-4e0f-8603-a0efd5446606_1999x1206.png 424w, https://substackcdn.com/image/fetch/$s_!ovdM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe88a8a37-c53c-4e0f-8603-a0efd5446606_1999x1206.png 848w, https://substackcdn.com/image/fetch/$s_!ovdM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe88a8a37-c53c-4e0f-8603-a0efd5446606_1999x1206.png 1272w, https://substackcdn.com/image/fetch/$s_!ovdM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe88a8a37-c53c-4e0f-8603-a0efd5446606_1999x1206.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"></svg></button></div></div></div></a><figcaption class="image-caption">Datasets section.</figcaption></figure></div><p>Now that we know which dataset to use, we can initialize both the model and the dataset.</p><pre><code><code>from datasets import load_dataset

model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Loading the dataset to train our model
dataset = load_dataset("mteb/tweet_sentiment_extraction")
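
# Peek at the splits: the object is a DatasetDict with "train" and "test"
print(dataset)

# Preview the training subset as a pandas DataFrame
print(dataset["train"].to_pandas().head())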
</code></code></pre><p>If we check the dataset we just downloaded, it is a dictionary containing a subset for training and a subset for testing. If we convert the training subset to a DataFrame, it looks as follows:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0YAl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7359fe2f-137a-4fed-b24e-b60401e03c1d_1246x874.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0YAl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7359fe2f-137a-4fed-b24e-b60401e03c1d_1246x874.png 424w, https://substackcdn.com/image/fetch/$s_!0YAl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7359fe2f-137a-4fed-b24e-b60401e03c1d_1246x874.png 848w, https://substackcdn.com/image/fetch/$s_!0YAl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7359fe2f-137a-4fed-b24e-b60401e03c1d_1246x874.png 1272w, https://substackcdn.com/image/fetch/$s_!0YAl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7359fe2f-137a-4fed-b24e-b60401e03c1d_1246x874.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0YAl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7359fe2f-137a-4fed-b24e-b60401e03c1d_1246x874.png" width="1246" height="874" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7359fe2f-137a-4fed-b24e-b60401e03c1d_1246x874.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:874,&quot;width&quot;:1246,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;The data set to be used.&quot;,&quot;title&quot;:&quot;The data set to be used.&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The data set to be used." title="The data set to be used." srcset="https://substackcdn.com/image/fetch/$s_!0YAl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7359fe2f-137a-4fed-b24e-b60401e03c1d_1246x874.png 424w, https://substackcdn.com/image/fetch/$s_!0YAl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7359fe2f-137a-4fed-b24e-b60401e03c1d_1246x874.png 848w, https://substackcdn.com/image/fetch/$s_!0YAl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7359fe2f-137a-4fed-b24e-b60401e03c1d_1246x874.png 1272w, https://substackcdn.com/image/fetch/$s_!0YAl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7359fe2f-137a-4fed-b24e-b60401e03c1d_1246x874.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" xmlns="http://www.w3.org/2000/svg"></svg></button></div></div></div></a><figcaption class="image-caption">The dataset we are using.</figcaption></figure></div><h4><strong>#2. Prepare Your dataset</strong></h4><p>Now that we have our dataset, we need a tokenizer to prepare it so the model can parse it. The <code>text</code> column of our dataset needs to be tokenized before we can use it to fine-tune our model.</p><p>This is why the second step is to load a pre-trained tokenizer and tokenize our dataset so it can be used for fine-tuning.</p><pre><code><code>tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)</code></code></pre><h4><strong>#3. Build a PyTorch dataset with encodings</strong></h4><p>The third step is to generate a training and a testing dataset. The training set will be used to fine-tune our model, while the testing set will be used to evaluate it.</p><p>Usually, the fine-tuning process takes a lot of time. </p><p><em>(To keep this tutorial fast, we randomly sample both subsets so your computation time is lower.)</em></p><pre><code><code># Randomly sample 1,000 examples from each split
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
</code></code></pre><h4><strong>#4. Fine-tune the model</strong></h4><p>Our final step is to set up the training arguments and start the training process. The transformers library contains the <code>Trainer</code> class, which takes care of everything.</p><p>We first define the training arguments together with the evaluation strategy. Once everything is defined, we can train the model with the <code>train()</code> method.</p><pre><code><code>from transformers import Trainer, TrainingArguments
import numpy as np
import evaluate

training_args = TrainingArguments(output_dir="trainer_output", evaluation_strategy="epoch")

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
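
# Sanity check of the argmax step with toy logits (no model needed):
# row 0 predicts class 1 and row 1 predicts class 0, so against labels
# [1, 1] the accuracy comes out to 0.5
toy_logits = np.array([[0.1, 0.9], [0.8, 0.2]])
toy_preds = np.argmax(toy_logits, axis=-1)
print((toy_preds == np.array([1, 1])).mean())  # 0.5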


trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)
trainer.train()
</code></code></pre><h4><strong>#5. Evaluate the model</strong></h4><p>After training, evaluate the model&#8217;s performance on a validation or test set. Again, the <code>Trainer</code> class already provides an <code>evaluate()</code> method that takes care of this.</p><pre><code><code># Returns a dict of metrics (e.g. eval_loss and eval_accuracy) on the eval set
trainer.evaluate()
</code></code></pre><p>Our fine-tuned model achieves an accuracy of 70%.</p><p>Now that we have improved our model, how can we share it with the community? </p><p><em>This brings us to our final step&#8230;</em></p><h4>#6. Sharing Models</h4><p>Once we&#8217;ve fine-tuned our new model, the best idea is to share it with the community.</p><p>Hugging Face makes this process straightforward. First, we need to install the <code>huggingface_hub</code> library.</p><p>A requirement for this final step is an active token to connect to your Hugging Face account. <strong><a href="https://huggingface.co/docs/hub/security-tokens">You can easily get one following this guideline.</a></strong> When working in a Jupyter Notebook, we can import the <code>notebook_login</code> function.</p><pre><code><code>from huggingface_hub import notebook_login

notebook_login()
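
# Once the login above succeeds, the fine-tuned model and tokenizer can be
# uploaded with push_to_hub ("my-finetuned-sentiments" is a hypothetical
# repository name; pick your own)
model.push_to_hub("my-finetuned-sentiments")
tokenizer.push_to_hub("my-finetuned-sentiments")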
</code></code></pre><p>This will display a login prompt within our Jupyter Notebook. We just need to submit our token, and our notebook will be connected to our Hugging Face account.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SKD6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ef073e1-e2fa-4ee0-b5f9-e3534d98cb7b_855x501.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SKD6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ef073e1-e2fa-4ee0-b5f9-e3534d98cb7b_855x501.png 424w, https://substackcdn.com/image/fetch/$s_!SKD6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ef073e1-e2fa-4ee0-b5f9-e3534d98cb7b_855x501.png 848w, https://substackcdn.com/image/fetch/$s_!SKD6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ef073e1-e2fa-4ee0-b5f9-e3534d98cb7b_855x501.png 1272w, https://substackcdn.com/image/fetch/$s_!SKD6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ef073e1-e2fa-4ee0-b5f9-e3534d98cb7b_855x501.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SKD6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ef073e1-e2fa-4ee0-b5f9-e3534d98cb7b_855x501.png" width="855" height="501" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ef073e1-e2fa-4ee0-b5f9-e3534d98cb7b_855x501.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:501,&quot;width&quot;:855,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Hugging Face login dialogue&quot;,&quot;title&quot;:&quot;Hugging Face login dialogue&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Hugging Face login dialogue" title="Hugging Face login dialogue" srcset="https://substackcdn.com/image/fetch/$s_!SKD6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ef073e1-e2fa-4ee0-b5f9-e3534d98cb7b_855x501.png 424w, https://substackcdn.com/image/fetch/$s_!SKD6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ef073e1-e2fa-4ee0-b5f9-e3534d98cb7b_855x501.png 848w, https://substackcdn.com/image/fetch/$s_!SKD6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ef073e1-e2fa-4ee0-b5f9-e3534d98cb7b_855x501.png 1272w, https://substackcdn.com/image/fetch/$s_!SKD6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ef073e1-e2fa-4ee0-b5f9-e3534d98cb7b_855x501.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" xmlns="http://www.w3.org/2000/svg"></svg></button></div></div></div></a></figure></div><p><strong>After this, the model will be available for everyone in our Hugging Face profile.</strong></p><h3><strong>4 use-cases you can start doing today</strong></h3><p>If we want to standardize any NLP process, Hugging Face makes it incredibly simple, allowing us to build any pipeline in just three steps:</p>
      <p>
          <a href="https://www.databites.tech/p/how-to-actually-get-started-with-b80">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How to Actually Get Started with SQL]]></title><description><![CDATA[CS16 - A clear (and human) guide to get started without drowning]]></description><link>https://www.databites.tech/p/how-to-actually-get-started-with-228</link><guid isPermaLink="false">https://www.databites.tech/p/how-to-actually-get-started-with-228</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Wed, 22 Oct 2025 10:02:42 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e79bace7-d73f-409e-9830-c05b7103c75a_976x704.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Many of you have been asking how to get started in the data world. I know it can seem <strong>complex</strong> and <strong>intimidating</strong>, <em>but fear often clouds our vision. </em></p><p>That&#8217;s why I want to remind you all that SQL is still the number one data language and the easiest one to learn. </p><p>If you&#8217;re looking to break into this field, there&#8217;s no better advice than&#8230;</p><blockquote><p>START</p><p>LEARNING</p><p>SQL</p><p>RIGHT</p><p>&#8230;</p></blockquote>
      <p>
          <a href="https://www.databites.tech/p/how-to-actually-get-started-with-228">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How to Actually Get Started with Python]]></title><description><![CDATA[CS15 - A clear (and human) guide to get started without drowning]]></description><link>https://www.databites.tech/p/how-to-actually-get-started-with-5e7</link><guid isPermaLink="false">https://www.databites.tech/p/how-to-actually-get-started-with-5e7</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Tue, 14 Oct 2025 10:02:52 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/eaa3293c-87f6-405a-985f-7f92a93b27f2_976x704.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>You&#8217;ve wanted to learn Python for a while&#8230;</strong></p><p><em>Too many tabs, not enough progress? </em></p><blockquote><p>This guide cuts the noise and gives you a shippable path. </p></blockquote><p><strong>Only the pieces that actually move you forward.</strong></p><h1>Why this, why now</h1><p>Python is the most versatile &#8220;one language, many careers&#8221; tool: analytics, ML, web, scripting, automation, LLM apps&#8212;you name it.<br>If you learn it now, you co&#8230;</p>
      <p>
          <a href="https://www.databites.tech/p/how-to-actually-get-started-with-5e7">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How to Actually Get Started with LLMs]]></title><description><![CDATA[A clear (and human) guide to get started with LLMs without drowning]]></description><link>https://www.databites.tech/p/how-to-actually-get-started-with</link><guid isPermaLink="false">https://www.databites.tech/p/how-to-actually-get-started-with</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Wed, 08 Oct 2025 13:33:57 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/aa672bbc-1acb-447c-b2a4-5259717b2089_976x864.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>LLMs are moving faster than your backlog. </p><p><strong>Feeling behind?</strong> You&#8217;re not.<br>Today&#8217;s issue compresses the essentials (what matters, what doesn&#8217;t) into a buildable path. </p><blockquote><p>Minimal theory, maximum leverage.</p></blockquote><p><strong>Following my Transformers cheat sheets (<a href="https://www.databites.tech/p/cs8-the-transformers-architectur">architecture</a>, <a href="https://www.databites.tech/p/cs9-the-transformers-architecture">encoder</a>, <a href="https://www.databites.tech/p/cs10-understanding-the-decoder-part">decoder</a>), today we go end-to-end: </strong></p><ol><li><p><strong>What to learn</strong></p></li><li><p><strong>What to build first</strong></p></li><li><p><strong>How to avoid the rabbit holes.</strong></p></li></ol><p>&#9888;&#65039; <em>It&#8217;s a longer, denser issue &#8212; but it&#8217;s meant to be a keeper. 
Bookmark it, steal the prompts, and ship something this week.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databites.tech/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databites.tech/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>TL;DR (paste this in your notes)</h2><ul><li><p><strong>LLMs &#8800; magic.</strong> Learn the <em>Transformer + tokens + pretrain&#8594;post-train&#8594;inference</em> pipeline.</p></li><li><p><strong>Start &#8220;outside in.&#8221;</strong> Ship value via APIs or open models first; fine-tune later.</p></li><li><p><strong>Leverage &gt; novelty.</strong> Framing, evaluation, and alignment matter more than training a giant from scratch.</p></li></ul><div><hr></div><h2>Why this, why now</h2><p><strong>Understanding LLMs and GenAI is crucial for everyone, from seasoned data professionals to beginners, as they are set to revolutionize text data processing and our future. </strong>With new models and applications constantly emerging, it&#8217;s essential to stay updated and maintain sharp skills in this rapidly evolving field.</p><h2>#1 <strong>Understanding the Basics</strong></h2><h4>What are LLMs?</h4><p>Large Language Models are a type of artificial intelligence trained on extensive text datasets. These models can generate human-like text, understand context, and even carry on conversations. 
They&#8217;re used in various applications, from chatbots to content creation and beyond.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!f8Sn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7caff1e8-11f1-4f6a-add4-2d01e181c18a_3327x876.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!f8Sn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7caff1e8-11f1-4f6a-add4-2d01e181c18a_3327x876.png 424w, https://substackcdn.com/image/fetch/$s_!f8Sn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7caff1e8-11f1-4f6a-add4-2d01e181c18a_3327x876.png 848w, https://substackcdn.com/image/fetch/$s_!f8Sn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7caff1e8-11f1-4f6a-add4-2d01e181c18a_3327x876.png 1272w, https://substackcdn.com/image/fetch/$s_!f8Sn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7caff1e8-11f1-4f6a-add4-2d01e181c18a_3327x876.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!f8Sn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7caff1e8-11f1-4f6a-add4-2d01e181c18a_3327x876.png" width="574" height="150.9903846153846" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7caff1e8-11f1-4f6a-add4-2d01e181c18a_3327x876.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:383,&quot;width&quot;:1456,&quot;resizeWidth&quot;:574,&quot;bytes&quot;:145258,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!f8Sn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7caff1e8-11f1-4f6a-add4-2d01e181c18a_3327x876.png 424w, https://substackcdn.com/image/fetch/$s_!f8Sn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7caff1e8-11f1-4f6a-add4-2d01e181c18a_3327x876.png 848w, https://substackcdn.com/image/fetch/$s_!f8Sn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7caff1e8-11f1-4f6a-add4-2d01e181c18a_3327x876.png 1272w, https://substackcdn.com/image/fetch/$s_!f8Sn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7caff1e8-11f1-4f6a-add4-2d01e181c18a_3327x876.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><strong>So&#8230; why are they so popular?</strong></p><p>LLMs are popular due to their ability to generate coherent, contextually relevant, and grammatically accurate text. 
<strong>Their exceptional performance on diverse language tasks and the accessibility of pre-trained models have democratized AI-powered natural language understanding and generation.</strong></p><h4>LLMs&#8217; core components</h4><p>Key concepts of LLMs include:</p><ul><li><p><strong>Transformer Architecture: </strong>The backbone of LLMs, featuring self-attention mechanisms that let the model weigh the importance of each word in a sentence.</p></li><li><p><strong>Tokenization</strong>: Breaking text down into manageable pieces, or tokens. This is performed by <strong>tokenizers</strong>. </p></li><li><p><strong>Pre-training:</strong> Training the model on a large corpus of text to learn language patterns, grammar, and context.</p></li><li><p><strong>Fine-tuning:</strong> Adapting the pre-trained model to specific tasks using smaller, task-specific datasets.</p></li><li><p><strong>NLU (Natural Language Understanding):</strong> The ability to understand and interpret human language.</p></li><li><p><strong>NLG (Natural Language Generation):</strong> The ability to generate coherent and contextually relevant text.</p></li><li><p><strong>Prompt Engineering: </strong>Crafting input prompts to guide the model towards generating desired outputs, essential when working with models via API access.</p></li></ul><h4>Main Differences between LLMs and Deep Learning Models</h4><p>LLMs differ from other deep learning models primarily due to their size and their use of self-attention mechanisms. 
Key differentiators include:</p><ul><li><p><strong>Transformer Architecture:</strong> This revolutionary design underpins LLMs and has transformed natural language processing.</p></li><li><p><strong>Contextual Understanding:</strong> LLMs capture long-range dependencies in text, enhancing their contextual comprehension.</p></li><li><p><strong>Versatility:</strong> They excel in various language tasks, including text generation, translation, summarization, and question-answering.</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databites.tech/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databites.tech/subscribe?"><span>Subscribe now</span></a></p><h2>#2 <strong>How to get started with LLMs?</strong></h2><h4>1. Understanding the Transformer Architecture in LLMs</h4><p>Now that you&#8217;re familiar with LLMs, let&#8217;s delve into the Transformer architecture that powers these models. 
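Before the formal tour, it can help to see the core operation in code. Here is a minimal single-head sketch of scaled dot-product attention in NumPy (a simplification for intuition: real implementations add learned query/key/value projections, multiple heads, and masking):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each position mixes the value vectors V, weighted by how well
    its query matches every key (softmax of scaled dot products)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq, seq) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # contextualized outputs

# Toy example: 3 tokens with 4-dimensional embeddings,
# self-attention means Q = K = V = the input itself.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4)
```

In a real Transformer, Q, K, and V come from learned linear projections of the input, and many such heads run in parallel.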
The original Transformer, introduced in the paper <em><a href="https://arxiv.org/abs/1706.03762">Attention Is All You Need</a></em>, revolutionized natural language processing.</p><h4>Key Features:</h4><ul><li><p><strong>Self-Attention Layers:</strong> Allow the model to focus on different parts of the input sequence.</p></li><li><p><strong>Multi-Head Attention:</strong> Enables the model to attend to information from different representation subspaces.</p></li><li><p><strong>Feed-Forward Neural Networks:</strong> Process the output from the attention mechanism.</p></li><li><p><strong>Encoder-Decoder Architecture:</strong> Facilitates tasks like translation.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!F4X4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d80fe6d-d7d9-4435-a5f6-e594d7ef9c1c_4133x4529.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F4X4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d80fe6d-d7d9-4435-a5f6-e594d7ef9c1c_4133x4529.png 424w, https://substackcdn.com/image/fetch/$s_!F4X4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d80fe6d-d7d9-4435-a5f6-e594d7ef9c1c_4133x4529.png 848w, https://substackcdn.com/image/fetch/$s_!F4X4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d80fe6d-d7d9-4435-a5f6-e594d7ef9c1c_4133x4529.png 1272w, 
https://substackcdn.com/image/fetch/$s_!F4X4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d80fe6d-d7d9-4435-a5f6-e594d7ef9c1c_4133x4529.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!F4X4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d80fe6d-d7d9-4435-a5f6-e594d7ef9c1c_4133x4529.png" width="508" height="556.8461538461538" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1d80fe6d-d7d9-4435-a5f6-e594d7ef9c1c_4133x4529.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1596,&quot;width&quot;:1456,&quot;resizeWidth&quot;:508,&quot;bytes&quot;:870800,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!F4X4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d80fe6d-d7d9-4435-a5f6-e594d7ef9c1c_4133x4529.png 424w, https://substackcdn.com/image/fetch/$s_!F4X4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d80fe6d-d7d9-4435-a5f6-e594d7ef9c1c_4133x4529.png 848w, https://substackcdn.com/image/fetch/$s_!F4X4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d80fe6d-d7d9-4435-a5f6-e594d7ef9c1c_4133x4529.png 1272w, 
https://substackcdn.com/image/fetch/$s_!F4X4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d80fe6d-d7d9-4435-a5f6-e594d7ef9c1c_4133x4529.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Transformers Architecture</figcaption></figure></div><p>Remember, you can learn more about it <a href="https://www.databites.tech/p/cs8-the-transformers-architecture">in the following article</a> about the Transformers Architecture. </p><h4>2. 
Pre-training LLMs</h4><p>Now that you understand the fundamentals of LLMs and the transformer architecture, it&#8217;s time to explore pre-training LLMs. Pre-training is crucial for enabling LLMs to grasp human language by exposing them to huge amounts of text. </p><p><strong>This part is (usually) performed by companies like OpenAI, Google, DeepSeek, Meta, or Anthropic. </strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rKfp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92e1989-5bd2-4b5d-a9de-0e4b565c4721_3347x1752.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rKfp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92e1989-5bd2-4b5d-a9de-0e4b565c4721_3347x1752.png 424w, https://substackcdn.com/image/fetch/$s_!rKfp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92e1989-5bd2-4b5d-a9de-0e4b565c4721_3347x1752.png 848w, https://substackcdn.com/image/fetch/$s_!rKfp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92e1989-5bd2-4b5d-a9de-0e4b565c4721_3347x1752.png 1272w, https://substackcdn.com/image/fetch/$s_!rKfp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92e1989-5bd2-4b5d-a9de-0e4b565c4721_3347x1752.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rKfp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92e1989-5bd2-4b5d-a9de-0e4b565c4721_3347x1752.png" width="462" 
height="241.78846153846155" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d92e1989-5bd2-4b5d-a9de-0e4b565c4721_3347x1752.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:762,&quot;width&quot;:1456,&quot;resizeWidth&quot;:462,&quot;bytes&quot;:265118,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!rKfp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92e1989-5bd2-4b5d-a9de-0e4b565c4721_3347x1752.png 424w, https://substackcdn.com/image/fetch/$s_!rKfp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92e1989-5bd2-4b5d-a9de-0e4b565c4721_3347x1752.png 848w, https://substackcdn.com/image/fetch/$s_!rKfp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92e1989-5bd2-4b5d-a9de-0e4b565c4721_3347x1752.png 1272w, https://substackcdn.com/image/fetch/$s_!rKfp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92e1989-5bd2-4b5d-a9de-0e4b565c4721_3347x1752.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4>Key Concepts:</h4><ul><li><p><strong>Objectives of Pre-training:</strong> LLMs learn language patterns, grammar, and context through exposure to extensive text corpora. <strong>Key tasks include masked language modeling and next sentence prediction.</strong></p></li><li><p><strong>Text Corpus for Pre-training:</strong> LLMs are trained on diverse and massive datasets, including web articles, books, and more, with billions to trillions of text tokens. 
Common datasets include C4, BookCorpus, The Pile, and OpenWebText.</p></li><li><p><strong>Training Procedure:</strong> Understand the technical aspects such as optimization algorithms, batch sizes, and training epochs, and learn about challenges like mitigating data biases.</p></li></ul><p>For further learning, <a href="https://stanford-cs324.github.io/winter2022/lectures/training/">check out the module on LLM training from CS324: Large Language Models.</a> </p><p>As training an LLM from scratch requires a lot of resources, we can access pre-trained models directly via API (OpenAI, Google&#8230;) or use open-source models from Hugging Face. </p><h4>3. Accessing and Using LLMs</h4><p>In today&#8217;s landscape, accessing and utilizing LLMs has become easier than ever, thanks to both commercial APIs and open-source platforms. </p><h5>Using Commercial APIs </h5><p>The most common provider is OpenAI with its GPT models, but others, like Anthropic, offer similar APIs. </p><ul><li><p><strong>API Access:</strong> OpenAI provides robust API access to its models, such as GPT-4 and ChatGPT, allowing developers to integrate powerful language capabilities into their applications.</p></li><li><p><strong>Ease of Use: </strong>With simple HTTP requests, you can send text prompts to the API and receive generated responses. 
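Concretely, such a request is just a small JSON document. A sketch of how you might assemble one (the model name, parameter names, and shape here mirror common chat-completion APIs but are placeholders; always confirm against your provider's documentation):

```python
import json

def build_chat_request(prompt: str, model: str = "gpt-4o-mini",
                       temperature: float = 0.7, max_tokens: int = 256) -> dict:
    """Assemble the JSON body for a typical chat-completions request.
    Field names follow the common chat-API convention; treat them as
    illustrative and check the provider's reference before relying on them."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,   # higher = more random sampling
        "max_tokens": max_tokens,     # cap on generated tokens
    }

body = build_chat_request("Summarize self-attention in one sentence.")
print(json.dumps(body, indent=2))
# Sending it is then a single authenticated POST to the provider's
# chat endpoint, typically via the official SDK.
```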
The API supports various parameters to fine-tune the behavior of the model, such as temperature, max tokens, and more.</p></li><li><p><strong>Applications: </strong>This API is versatile and can be used for chatbots, content generation, summarization, translation, and other NLP tasks.</p></li></ul><h5>Using Open-Source Models (Hugging Face)</h5><ul><li><p><strong>Model Hub: </strong>Hugging Face offers a vast repository of open-source models, including versions of GPT, BERT, T5, Mistral, Meta&#8217;s Llama and many more, which can be accessed for specific tasks.</p></li><li><p><strong>Transformers Library:</strong> The Transformers library by Hugging Face provides a comprehensive toolkit for using and fine-tuning these models. It supports multiple frameworks, including TensorFlow and PyTorch.</p></li><li><p><strong>Ease of Use: </strong>With Hugging Face, you can load pre-trained models with just a few lines of code and fine-tune them on your dataset. The library also offers utilities for tokenization, training, and deploying models.</p></li></ul><h4>4. Fine-Tuning LLMs</h4><p>Once we know how to access and use pre-trained LLMs, the next step is understanding the process of fine-tuning and how to train them for specific tasks. Fine-tuning tailors pre-trained models to perform tasks like sentiment analysis, question answering, or translation with greater accuracy and efficiency.</p><h5>Why Fine-Tune LLMs?</h5><ul><li><p><strong>Task-Specific Performance:</strong> While pre-trained LLMs have a general understanding of language, fine-tuning is essential to excel in specific tasks by learning their unique nuances.</p></li><li><p><strong>Efficiency:</strong> Fine-tuning leverages the pre-trained model&#8217;s knowledge, reducing the data and computation needed compared to training from scratch. 
This process requires a much smaller dataset.</p></li></ul><h5>Fine-Tuning LLMs with access to their weights</h5><ol><li><p><strong>Choose the Pre-trained LLM:</strong> Select a pre-trained model that suits your task. For instance, for question-answering, choose a model designed for natural language understanding.</p></li><li><p><strong>Data Preparation:</strong> Prepare a labeled dataset for your specific task, ensuring it is properly formatted.</p></li><li><p><strong>Fine-Tuning Process:</strong></p><ul><li><p>Use parameter-efficient techniques to fine-tune the model, considering LLMs have tens of billions of parameters.</p></li><li><p>If you don&#8217;t have access to the weights, explore alternative approaches or frameworks that facilitate fine-tuning without direct weight manipulation.</p></li></ul></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4z6w!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094475f9-dd64-4ad9-8a28-71376ea39355_4793x1778.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4z6w!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094475f9-dd64-4ad9-8a28-71376ea39355_4793x1778.png 424w, https://substackcdn.com/image/fetch/$s_!4z6w!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094475f9-dd64-4ad9-8a28-71376ea39355_4793x1778.png 848w, https://substackcdn.com/image/fetch/$s_!4z6w!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094475f9-dd64-4ad9-8a28-71376ea39355_4793x1778.png 1272w, 
https://substackcdn.com/image/fetch/$s_!4z6w!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094475f9-dd64-4ad9-8a28-71376ea39355_4793x1778.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4z6w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094475f9-dd64-4ad9-8a28-71376ea39355_4793x1778.png" width="1456" height="540" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/094475f9-dd64-4ad9-8a28-71376ea39355_4793x1778.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:381128,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!4z6w!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094475f9-dd64-4ad9-8a28-71376ea39355_4793x1778.png 424w, https://substackcdn.com/image/fetch/$s_!4z6w!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094475f9-dd64-4ad9-8a28-71376ea39355_4793x1778.png 848w, https://substackcdn.com/image/fetch/$s_!4z6w!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094475f9-dd64-4ad9-8a28-71376ea39355_4793x1778.png 1272w, 
https://substackcdn.com/image/fetch/$s_!4z6w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094475f9-dd64-4ad9-8a28-71376ea39355_4793x1778.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>By following these steps, you can adapt pre-trained LLMs to achieve optimal performance on your desired tasks. <a href="https://www.kdnuggets.com/7-steps-to-mastering-large-language-model-fine-tuning">You can read more about it here. 
</a></p><h5>Fine-Tuning LLMs Without Access to Model Weights</h5><p>When you don&#8217;t have access to an LLM&#8217;s weights and must use an API, you can still adapt the model&#8217;s behavior using in-context learning and prompt tuning.</p><ol><li><p><strong>In-Context Learning:</strong> Leverage the LLM&#8217;s ability to learn from provided examples. By giving input-output examples within the prompt, the model can perform tasks without explicit fine-tuning.</p></li><li><p><strong>Prompt Tuning:</strong></p><ul><li><p><strong>Hard Prompt Tuning:</strong> Modify the input tokens directly in the prompt to guide the model&#8217;s output.</p></li><li><p><strong>Soft Prompt Tuning:</strong> Concatenate the input embedding with a learnable tensor. Prefix tuning is a related approach where learnable tensors are used with each Transformer block, not just the input embeddings.</p></li></ul></li><li><p><strong>Parameter-Efficient Fine-Tuning Techniques (PEFT):</strong></p><ul><li><p><strong>LoRA and QLoRA:</strong> These techniques allow fine-tuning by introducing a small set of learnable parameters, called adapters, instead of updating the entire weight matrix. QLoRA, for instance, enables fine-tuning a 4-bit quantized LLM on a single consumer GPU with minimal performance loss. Note that LoRA-style methods do require loading a model&#8217;s weights locally, so they apply to open models rather than API-only ones.</p></li></ul></li></ol><p>By using these methods, you can adapt LLMs for specific tasks efficiently, even without direct access to the model&#8217;s weights. 
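To make in-context learning concrete: it amounts to packing demonstrations into the prompt itself. A minimal sketch (the sentiment task, labels, and formatting are invented for illustration):

```python
def few_shot_prompt(examples, query,
                    instruction="Classify the sentiment as positive or negative."):
    """Build a few-shot prompt: an instruction, then input->output
    demonstrations, then the new input the model should complete."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # the model continues from here
    return "\n".join(lines)

demos = [("Loved every minute of it.", "positive"),
         ("A tedious, predictable mess.", "negative")]
prompt = few_shot_prompt(demos, "Surprisingly fun and heartfelt.")
print(prompt)
```

The assembled string is sent as an ordinary prompt; no weights change, which is exactly why this works through an API.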
Here are some resources to explore further: </p><ul><li><p><strong><a href="https://www.datacamp.com/tutorial/quantization-for-large-language-models">Quantization for LLMs: Reduce AI Model Sizes Efficiently</a></strong></p></li><li><p><strong><a href="https://medium.com/geekculture/prompt-engineering-course-openai-inferring-transforming-expanding-chatgpt-chatgpt4-e5f63132f422">Prompt Engineering Course by OpenAI &#8212; Inferring, Transforming, and Expanding with ChatGPT</a></strong></p></li></ul><p>And don&#8217;t forget to check out my webinar about fine-tuning DistilBERT and Mistral 7B!</p><div id="youtube2-SnGXzb0adLQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;SnGXzb0adLQ&quot;,&quot;startTime&quot;:&quot;1s&quot;,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/SnGXzb0adLQ?start=1s&amp;rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h4>5. Alignment and Post-Training in LLMs</h4><p>LLMs can sometimes generate content that is harmful, biased, or misaligned with user expectations. 
Alignment involves adjusting an LLM&#8217;s behavior to align with human preferences and ethical standards, aiming to reduce the risks of biased, controversial, or harmful content.</p><h5>Techniques to Explore:</h5><ul><li><p><strong>Reinforcement Learning from Human Feedback (RLHF):</strong> This method uses human annotations on LLM outputs to train a reward model, guiding the model to produce more desirable outputs.</p></li><li><p><strong>Contrastive Post-Training:</strong> This technique leverages contrastive methods to automatically create preference pairs, refining the model&#8217;s responses to better match user expectations.</p></li></ul><p>By employing these techniques, you can enhance the alignment of LLMs, ensuring they produce content that is safe, ethical, and aligned with human values.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databites.tech/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databites.tech/subscribe?"><span>Subscribe now</span></a></p><h4>6. Evaluating LLMs</h4><p>Evaluating the performance of LLMs is crucial to assess their effectiveness and identify areas for improvement. Key aspects of LLM evaluation include:</p><ol><li><p><strong>Task-Specific Metrics:</strong> Select appropriate metrics for your specific task. 
For example:</p><ul><li><p><strong>Text Classification:</strong> Use metrics like accuracy, precision, recall, and F1 score.</p></li><li><p><strong>Language Generation:</strong> Metrics such as perplexity and BLEU scores are commonly used.</p></li></ul></li><li><p><strong>Human Evaluation:</strong> Have experts or crowdsourced annotators assess the quality of generated content or model responses in real-world scenarios.</p></li><li><p><strong>Bias and Fairness:</strong> Evaluate LLMs for biases and fairness, especially when deploying them in real-world applications. Analyze performance across different demographic groups and address any disparities.</p></li><li><p><strong>Robustness and Adversarial Testing:</strong> Test the LLM&#8217;s robustness by subjecting it to adversarial attacks or challenging inputs to uncover vulnerabilities and enhance model security.</p></li></ol><h4>7. Continuous Learning and Adaptation</h4><p>To keep LLMs updated with new data and tasks, consider these strategies:</p><ol><li><p><strong>Data Augmentation:</strong> Continuously augment your dataset to prevent performance degradation due to outdated information.</p></li><li><p><strong>Retraining:</strong> Periodically retrain the LLM with new data and fine-tune it for evolving tasks to ensure the model stays current.</p></li><li><p><strong>Active Learning:</strong> Implement active learning techniques to identify instances where the model is uncertain or likely to make errors. 
Collect annotations for these instances to refine the model.</p></li></ol><p>Additionally, to mitigate common issues like hallucinations, explore techniques such as retrieval augmentation.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databites.tech/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databites.tech/subscribe?"><span>Subscribe now</span></a></p><h2>#3 Building and Deploying LLM Applications</h2><p>Once you&#8217;ve developed and fine-tuned an LLM for specific tasks, the next step is to build and deploy applications that harness the LLM&#8217;s capabilities. This involves creating practical, real-world solutions that make the most of your LLM&#8217;s potential.</p><h4>Building LLM Applications</h4><p>When developing applications that leverage Large Language Models (LLMs), consider the following:</p><ol><li><p><strong>Task-Specific Application Development:</strong></p><p>Tailor your applications to meet specific use cases, such as web interfaces, mobile apps, chatbots, or integrations into existing software systems.</p></li><li><p><strong>User Experience (UX) Design:</strong></p><p>Prioritize user-centered design to ensure your LLM application is intuitive, user-friendly, and meets the needs of your target audience.</p></li><li><p><strong>API Integration:</strong></p><p>If your LLM acts as a language model backend, create RESTful APIs or GraphQL endpoints to facilitate seamless interaction with other software components.</p></li><li><p><strong>Scalability and Performance:</strong></p><p>Design your applications to handle varying levels of traffic and demand. 
Optimize for performance and scalability to provide a smooth and reliable user experience.</p></li></ol><h4>Deploying LLM Applications</h4><p>Now that you&#8217;ve developed your LLM application, it&#8217;s time to deploy it to production. Here are key considerations for a successful deployment:</p><ol><li><p><strong>Cloud Deployment:</strong></p><p>Deploy your LLM applications on cloud platforms like AWS, Google Cloud, or Azure. These platforms offer scalability, reliability, and easy management of resources.</p></li><li><p><strong>Containerization:</strong></p><p>Use containerization technologies such as Docker and Kubernetes to package your applications. This ensures consistent deployment across various environments and simplifies scaling and management.</p></li><li><p><strong>Monitoring:</strong></p><p>Implement robust monitoring solutions to track the performance of your deployed LLM applications. This allows you to detect and address issues in real time, ensuring optimal performance and reliability.</p></li></ol><p>Practical experience is crucial. 
Here&#8217;s how you can get hands-on:</p><ul><li><p><strong><a href="https://www.youtube.com/watch?v=l4HTEf0_s70&amp;list=PLuI8kc1bqP2junKkKVD-5441I8G7oXDTm">Welcome to the Hands-on LLM Course</a></strong> by Pau Labarta Bajo</p></li><li><p><strong><a href="https://www.youtube.com/watch?v=Ku9PM26Cc2c">Hugging Face and PyTorch Lightning</a></strong> by <a href="https://www.linkedin.com/in/jonkrohn/">Jon Krohn</a>.</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databites.tech/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databites.tech/subscribe?"><span>Subscribe now</span></a></p><h3><strong>A final note</strong></h3><p>If you&#8217;ve made it this far, you&#8217;ve already taken the first step: understanding that this isn&#8217;t about knowing everything&#8212;it&#8217;s about moving forward bit by bit.</p><p><strong>With patience, curiosity, and consistency.</strong></p><p>No one starts out knowing.</p><p>But we all start in the same place: by taking the first step.</p><p><em>Are you in?</em></p><p>Hope to see you in the community soon!</p><p>Sincerely,</p><p>&#8212; Josep</p><div><hr></div><h2>Your turn</h2><p>Some final resources to check:</p><ul><li><p><a href="https://arxiv.org/abs/1706.03762">Attention Is All You Need</a> (must read)</p></li><li><p>My illustrated Transformers saga (<a href="https://www.databites.tech/p/cs8-the-transformers-architectur">architecture</a>, <a href="https://www.databites.tech/p/cs9-the-transformers-architecture">encoder</a>, <a href="https://www.databites.tech/p/cs10-understanding-the-decoder-part">decoder</a>)</p></li><li><p><a href="https://stanford-cs324.github.io/winter2022/lectures/modeling/">Module on Modeling from Stanford CS324: Large Language Models</a></p></li><li><p><a 
href="https://huggingface.co/learn/nlp-course/chapter1/1">HuggingFace Transformers Course</a></p></li></ul><div><hr></div><h2><strong>Are you still here? &#129488;</strong></h2><p>&#128073;&#127995; I want this newsletter to be useful, so please let me know your feedback!</p><div class="poll-embed" data-attrs="{&quot;id&quot;:387267}" data-component-name="PollToDOM"></div><div><hr></div><p>Before you go,<strong> tap the &#128154; and the restack buttons at the bottom of this email to show your support</strong>&#8212;<em>it really helps and means a lot!</em></p><p><em>P.S. Share with the coworker who thinks self-attention is a personality trait.</em></p><p><strong>Any doubt? Let&#8217;s start a conversation! &#128071;&#127995;</strong></p>]]></content:encoded></item><item><title><![CDATA[You’re Using ChatGPT Wrong (According to 700M Users)]]></title><description><![CDATA[Notes #13 - Why asking > doing, and how to turn prompts into business decisions.]]></description><link>https://www.databites.tech/p/why-most-people-dont-use-chatgpt</link><guid isPermaLink="false">https://www.databites.tech/p/why-most-people-dont-use-chatgpt</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Tue, 16 Sep 2025 10:02:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!T-e9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51a0eef-c638-4f47-9523-5bb7a5d551b5_830x784.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Hey everyone! 
&#128075;&#127996;</strong></p><p>Josep here, back with your weekly bite of career insights and encouragement &#10024;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!T-e9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51a0eef-c638-4f47-9523-5bb7a5d551b5_830x784.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!T-e9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51a0eef-c638-4f47-9523-5bb7a5d551b5_830x784.png 424w, https://substackcdn.com/image/fetch/$s_!T-e9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51a0eef-c638-4f47-9523-5bb7a5d551b5_830x784.png 848w, https://substackcdn.com/image/fetch/$s_!T-e9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51a0eef-c638-4f47-9523-5bb7a5d551b5_830x784.png 1272w, https://substackcdn.com/image/fetch/$s_!T-e9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51a0eef-c638-4f47-9523-5bb7a5d551b5_830x784.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!T-e9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51a0eef-c638-4f47-9523-5bb7a5d551b5_830x784.png" width="830" height="784" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b51a0eef-c638-4f47-9523-5bb7a5d551b5_830x784.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:784,&quot;width&quot;:830,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1723419,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.databites.tech/i/173735987?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51a0eef-c638-4f47-9523-5bb7a5d551b5_830x784.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!T-e9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51a0eef-c638-4f47-9523-5bb7a5d551b5_830x784.png 424w, https://substackcdn.com/image/fetch/$s_!T-e9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51a0eef-c638-4f47-9523-5bb7a5d551b5_830x784.png 848w, https://substackcdn.com/image/fetch/$s_!T-e9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51a0eef-c638-4f47-9523-5bb7a5d551b5_830x784.png 1272w, https://substackcdn.com/image/fetch/$s_!T-e9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51a0eef-c638-4f47-9523-5bb7a5d551b5_830x784.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Enjoying a biking day from Rotterdam to Delft! &#128154;</figcaption></figure></div><h3>A quick gut-check:</h3><p>When you picture ChatGPT, what&#8217;s the first image that pops up?<br>Someone cranking out SQL? Debugging Python? Auto-drafting emails?</p><p>That was my picture too, until I dug into a new OpenAI study cover&#8230;</p>
      <p>
          <a href="https://www.databites.tech/p/why-most-people-dont-use-chatgpt">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How to Become a Data Scientist]]></title><description><![CDATA[A clear (and human) guide to get started without getting lost]]></description><link>https://www.databites.tech/p/how-to-become-a-data-scientist</link><guid isPermaLink="false">https://www.databites.tech/p/how-to-become-a-data-scientist</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Mon, 15 Sep 2025 13:56:07 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/afb8123c-7433-4503-8ab0-f17ef22a36a9_1465x1296.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you&#8217;re reading this, you probably suspect it already: <strong>data science is a fascinating field&#8230; and also overwhelming. </strong></p><p>With so many languages, tools, and possible paths, it&#8217;s easy not to know where to start.</p><p>That&#8217;s why one of the questions I get most is: </p><blockquote><p>How do you become a data scientist?</p></blockquote><p><strong>This article is my attempt to answer it clearly. </strong></p><p>I won&#8217;t promise mag&#8230;</p>
      <p>
          <a href="https://www.databites.tech/p/how-to-become-a-data-scientist">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[SQL COURSE PROBLEM #4]]></title><description><![CDATA[SQL Crash Course - Managing Financial Services Database]]></description><link>https://www.databites.tech/p/sql-course-problem-4</link><guid isPermaLink="false">https://www.databites.tech/p/sql-course-problem-4</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Fri, 06 Jun 2025 11:22:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!RryE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29d8ee22-a790-4cbe-9cac-63125f0c89d7_1380x962.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="pullquote"><p><em>All the course material is stored in the <strong><a href="https://github.com/CornelliusYW/SQL-Crash-Course">SQL Crash Course repository</a></strong>.</em></p></div><p>Hi everyone! <strong>Josep</strong> and <a href="https://open.substack.com/users/6000855-cornellius-yudha-wijaya?utm_source=mentions">Cornellius Yudha Wijaya</a> from <a href="https://open.substack.com/pub/cornellius">Non-Brand Data</a> here &#128075;&#127995;</p><p>As promised, today we are publishing the next two issues of our <a href="https://www.databites.tech/p/launching-the-sql-crash-course-from">SQL Crash Course &#8211; From Zero to Hero!</a> &#128640;</p><p>I am sure you are here to continue our <strong>SQL Crash Course Journey!&#128218;</strong></p><p>If this is your first time or you&#8217;ve for&#8230;</p>
      <p>
          <a href="https://www.databites.tech/p/sql-course-problem-4">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[The Importance of Context]]></title><description><![CDATA[Notes #12 - Don&#8217;t just show data. Tell the story that moves people.]]></description><link>https://www.databites.tech/p/the-importance-of-context</link><guid isPermaLink="false">https://www.databites.tech/p/the-importance-of-context</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Tue, 03 Jun 2025 10:02:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!wI6A!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eed8149-b89a-4c91-8589-e386d2bf2761_1294x1296.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Hey everyone! &#128075;&#127996;</strong></p><p>Josep here, back with your weekly bite of career insights and encouragement &#10024;</p><p>Last week, we unpacked how to position ourselves for luck.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;9bcfd1a8-d51b-42ec-a7b0-d3ba66dec9ba&quot;,&quot;caption&quot;:&quot;Hey everyone! 
&#128075;&#127996;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot; Position Yourself for Luck&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech Tech Writer @KDnuggets @DataCamp &#128073;&#127995;Inquiries in rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-05-28T10:02:48.977Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d60872-4d4b-4344-b5e4-f28d501e8a47_1646x1644.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.databites.tech/p/position-yourself-for-luck&quot;,&quot;section_name&quot;:&quot;Josep's Notes &#128640;&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:164624046,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:3,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;DataBites&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe930fbab-b8df-40ef-9676-3d9ca5d49eae_714x714.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>This week, let me ask you this:<br>Have you ever built a chart that <em>technically</em> made sense&#8230; but no one seemed to get it?<br>You 
showed the data, but it didn&#8217;t land. </p><p>It didn&#8217;t inspire action. </p><p>It didn&#8217;t spark con&#8230;</p>
      <p>
          <a href="https://www.databites.tech/p/the-importance-of-context">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[SQL COURSE PROBLEM #1]]></title><description><![CDATA[SQL Crash Course - Managing your own newsletters]]></description><link>https://www.databites.tech/p/sql-course-problem-1</link><guid isPermaLink="false">https://www.databites.tech/p/sql-course-problem-1</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Thu, 29 May 2025 12:02:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!YPzg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5ec4a4b-f38b-46cf-92a0-4287d031a25e_1380x962.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="pullquote"><p><em>All the course material is stored in the <strong><a href="https://github.com/CornelliusYW/SQL-Crash-Course">SQL Crash Course repository</a></strong>.</em></p></div><p>Hi everyone! <strong>Josep</strong> and <a href="https://open.substack.com/users/6000855-cornellius-yudha-wijaya?utm_source=mentions">Cornellius Yudha Wijaya</a> from <a href="https://open.substack.com/pub/cornellius">Non-Brand Data</a> here &#128075;&#127995;</p><p>As promised, today we are publishing the next two issues of our <a href="https://www.databites.tech/p/launching-the-sql-crash-course-from">SQL Crash Course &#8211; From Zero to Hero!</a> &#128640;</p><p>I am sure you are here to continue our <strong>SQL Crash Course Journey!&#128218;</strong></p><p>If this is your first time or you&#8217;ve for&#8230;</p>
      <p>
          <a href="https://www.databites.tech/p/sql-course-problem-1">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[ Position Yourself for Luck]]></title><description><![CDATA[Notes #11 - How to shift your environment, act with agency, and make better decisions that attract opportunity.]]></description><link>https://www.databites.tech/p/position-yourself-for-luck</link><guid isPermaLink="false">https://www.databites.tech/p/position-yourself-for-luck</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Wed, 28 May 2025 10:02:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zKxA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d60872-4d4b-4344-b5e4-f28d501e8a47_1646x1644.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Hey everyone! &#128075;&#127996;</strong></p><p>Josep here, back with your weekly bite of career insights and encouragement &#10024;</p><p>Last week, we unpacked why <strong>mindset beats raw talent</strong> &#8212; and how the smallest shift in belief can unlock big transformation.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;683b3d39-d958-4890-9b40-303c12f9de07&quot;,&quot;caption&quot;:&quot;Hey everyone! 
&#128075;&#127996;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Why Resilience Is the New Hard Skill&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech Tech Writer @KDnuggets @DataCamp &#128073;&#127995;Inquiries in rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-05-20T10:02:33.375Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35c24ac9-df4f-4491-b8cb-e474f1c2914d_890x892.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.databites.tech/p/why-resilience-is-the-new-hard-skill&quot;,&quot;section_name&quot;:&quot;Josep's Notes &#128640;&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:163989029,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:17,&quot;comment_count&quot;:1,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;DataBites&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe930fbab-b8df-40ef-9676-3d9ca5d49eae_714x714.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>This week, we go even deeper. 
</p><p>&#127744; <strong>What do you do when the world is changing faster than your plans can keep up? </strong>You improve your <em>position</em> &#8212; &#8230;</p>
      <p>
          <a href="https://www.databites.tech/p/position-yourself-for-luck">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Understanding Autoencoders]]></title><description><![CDATA[TheGuestBites #2 - Daniel - Compress, Clean, and Discover Patterns with Neural Networks]]></description><link>https://www.databites.tech/p/understanding-autoencoders</link><guid isPermaLink="false">https://www.databites.tech/p/understanding-autoencoders</guid><dc:creator><![CDATA[Daniel]]></dc:creator><pubDate>Sat, 24 May 2025 10:02:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ClpM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F951ba09a-d8d6-4e4a-98de-8e281d953508_1332x1320.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div><hr></div><p>We&#8217;re kicking things off with Daniel Garc&#237;a, machine learning educator and creator of <em><a href="https://iamdgarcia.substack.com/">The Learning Curve</a></em><a href="https://iamdgarcia.substack.com/">.</a> In this issue, Daniel dives into the world of <strong>autoencoders</strong> &#8212; a neural network technique that goes far beyond just copying data.</p><p>Through real-world demos and crisp explanations, he reveals how autoencoders help us <strong>compress complex data, clean up noisy signals, detect anomalies, and even generate new content</strong>. 
Whether you're curious about smarter data workflows or the magic behind generative models, this breakdown makes a foundational concept feel both approachable and powerful.</p><p>&#8212; Josep</p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ClpM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F951ba09a-d8d6-4e4a-98de-8e281d953508_1332x1320.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ClpM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F951ba09a-d8d6-4e4a-98de-8e281d953508_1332x1320.png 424w, https://substackcdn.com/image/fetch/$s_!ClpM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F951ba09a-d8d6-4e4a-98de-8e281d953508_1332x1320.png 848w, https://substackcdn.com/image/fetch/$s_!ClpM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F951ba09a-d8d6-4e4a-98de-8e281d953508_1332x1320.png 1272w, https://substackcdn.com/image/fetch/$s_!ClpM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F951ba09a-d8d6-4e4a-98de-8e281d953508_1332x1320.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ClpM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F951ba09a-d8d6-4e4a-98de-8e281d953508_1332x1320.png" width="1332" height="1320" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/951ba09a-d8d6-4e4a-98de-8e281d953508_1332x1320.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1320,&quot;width&quot;:1332,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1265545,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.databites.tech/i/164146477?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F951ba09a-d8d6-4e4a-98de-8e281d953508_1332x1320.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ClpM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F951ba09a-d8d6-4e4a-98de-8e281d953508_1332x1320.png 424w, https://substackcdn.com/image/fetch/$s_!ClpM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F951ba09a-d8d6-4e4a-98de-8e281d953508_1332x1320.png 848w, https://substackcdn.com/image/fetch/$s_!ClpM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F951ba09a-d8d6-4e4a-98de-8e281d953508_1332x1320.png 1272w, https://substackcdn.com/image/fetch/$s_!ClpM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F951ba09a-d8d6-4e4a-98de-8e281d953508_1332x1320.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Hi! I&#8217;m <a href="http://www.linkedin.com/in/iamdgarcia">Daniel Garc&#237;a</a>, the tinkerer behind <em><a href="https://iamdgarcia.substack.com/">The Learning Curve</a></em>, where I break down machine learning concepts into real-world demos and practical insights for anyone curious about AI. </p><p>In this issue, we&#8217;ll explore a tool that&#8217;s both powerful and deceptively simple: <strong>autoencoders</strong>.</p><p>While they&#8217;re often introduced as models that &#8220;just copy their input,&#8221; the truth is much richer. </p><p>By learning to compress and reconstruct data, autoencoders unlock everything from smarter data cleaning to generative art. 
If you&#8217;ve ever wondered how machines learn to see structure in chaos, this one&#8217;s for you.</p><h2><strong>What is an Autoencoder?</strong></h2><p>An autoencoder is a type of neural network designed to take in data, compress it, and then reconstruct it as closely as possible to the original. But it&#8217;s not just copying &#8212; the key feature is a <strong>bottleneck</strong>, a deliberately small hidden layer that limits how much information the model can store.</p><p>This constraint forces the model to <em>learn the most important features</em> of the input data. It must filter out noise, redundancy, and irrelevant detail in order to create a useful summary &#8212; called the <strong>latent representation</strong>.</p><p>Once trained, autoencoders can be used for compression, denoising, anomaly detection, and &#8212; in some variants &#8212; even creative generation of new data.</p><h3><strong>1. Why Should You Care?</strong></h3><p>Let&#8217;s take a closer look at how autoencoders show up in the real world and why they&#8217;re worth adding to your machine learning toolbox.</p><h4><strong>1.1 Compression (Dimensionality Reduction)</strong></h4><p>Autoencoders are one of the most flexible tools for dimensionality reduction. When working with high-dimensional data like images, sensor arrays, or audio signals, storing or processing that data in full can be expensive. Autoencoders solve this by learning a more compact version.</p><p>Instead of manually engineering features, the network automatically learns a compressed form that keeps the structure but discards unnecessary detail. 
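</p>

<p>To make the bottleneck idea concrete, here is a minimal sketch (assuming PyTorch; the 784-input and 32-unit latent sizes are illustrative choices, not from this issue):</p>

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: squeeze the input through a small bottleneck
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: reconstruct the original from the latent code
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)      # latent representation
        return self.decoder(z)   # reconstruction

model = Autoencoder()
x = torch.randn(16, 784)         # a dummy batch of 16 samples
x_hat = model(x)                 # reconstructed batch, same shape as x
loss = nn.functional.mse_loss(x_hat, x)  # reconstruction error to minimize
```

<p>The encoder output <code>z</code> is the compressed summary: each 784-dimensional input is forced through just 32 numbers, so training drives the network to keep only what matters for reconstruction.</p>

<p>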
This is especially useful for speeding up downstream models or visualizing data in two or three dimensions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!r3wZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a8228e8-5828-423b-8018-0b7ba2bc3f6e_1600x1463.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!r3wZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a8228e8-5828-423b-8018-0b7ba2bc3f6e_1600x1463.png 424w, https://substackcdn.com/image/fetch/$s_!r3wZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a8228e8-5828-423b-8018-0b7ba2bc3f6e_1600x1463.png 848w, https://substackcdn.com/image/fetch/$s_!r3wZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a8228e8-5828-423b-8018-0b7ba2bc3f6e_1600x1463.png 1272w, https://substackcdn.com/image/fetch/$s_!r3wZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a8228e8-5828-423b-8018-0b7ba2bc3f6e_1600x1463.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!r3wZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a8228e8-5828-423b-8018-0b7ba2bc3f6e_1600x1463.png" width="488" height="446.1043956043956" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1a8228e8-5828-423b-8018-0b7ba2bc3f6e_1600x1463.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1331,&quot;width&quot;:1456,&quot;resizeWidth&quot;:488,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!r3wZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a8228e8-5828-423b-8018-0b7ba2bc3f6e_1600x1463.png 424w, https://substackcdn.com/image/fetch/$s_!r3wZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a8228e8-5828-423b-8018-0b7ba2bc3f6e_1600x1463.png 848w, https://substackcdn.com/image/fetch/$s_!r3wZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a8228e8-5828-423b-8018-0b7ba2bc3f6e_1600x1463.png 1272w, https://substackcdn.com/image/fetch/$s_!r3wZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a8228e8-5828-423b-8018-0b7ba2bc3f6e_1600x1463.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image by Author (Daniel Garc&#237;a from The Learning Curve)</figcaption></figure></div><h4><strong>1.2 Denoising (Data Cleaning)</strong></h4><p>A common real-world problem is noisy data. Whether you&#8217;re scanning documents, recording audio, or collecting signals from sensors, the data you get is rarely clean.</p><p>A <strong>denoising autoencoder</strong> solves this by training the model to take in noisy input and predict the clean version. This forces the network to ignore irrelevant variations and reconstruct only the core signal.</p><p>It&#8217;s a data-driven way to clean inputs without needing to handcraft filtering rules. 
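A minimal, framework-free sketch of that setup in NumPy (the random-phase sine waves, tiny linear encoder/decoder, noise level, and learning rate are all illustrative stand-ins; a real denoiser would be a deeper, nonlinear network trained on your actual data). The one detail that defines a denoising autoencoder is the loss: the model reads the noisy input but is scored against the clean target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "clean" data: random-phase sine waves (a stand-in for real signals).
n, d, k = 512, 32, 8                      # samples, input dim, bottleneck dim
t = np.linspace(0, 2 * np.pi, d)
freqs = rng.integers(1, 4, size=(n, 1))
phases = rng.uniform(0, 2 * np.pi, size=(n, 1))
clean = np.sin(freqs * t + phases)
noisy = clean + rng.normal(0, 0.3, size=clean.shape)

# A tiny linear autoencoder trained by plain gradient descent.
W1 = rng.normal(0, 0.1, size=(d, k))      # encoder weights
W2 = rng.normal(0, 0.1, size=(k, d))      # decoder weights
lr, losses = 0.1, []
for step in range(500):
    z = noisy @ W1                        # encode the NOISY input
    recon = z @ W2                        # decode back to signal space
    err = recon - clean                   # ...but score against the CLEAN target
    losses.append(float((err ** 2).mean()))
    grad = err * (2.0 / err.size)         # d(loss)/d(recon)
    gW2 = z.T @ grad                      # backprop through the decoder
    gW1 = noisy.T @ (grad @ W2.T)         # backprop through the encoder
    W1 -= lr * gW1
    W2 -= lr * gW2

print("loss at start:", losses[0])
print("loss at end:  ", losses[-1])
```

Swap `clean` for `noisy` in the error line and you are back to a vanilla autoencoder; that single change is the whole trick.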
If you&#8217;re dealing with messy datasets, this can make a huge difference in performance downstream.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iOQ5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febff34ab-8b64-4730-b407-53956dfc17d9_860x414.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iOQ5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febff34ab-8b64-4730-b407-53956dfc17d9_860x414.png 424w, https://substackcdn.com/image/fetch/$s_!iOQ5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febff34ab-8b64-4730-b407-53956dfc17d9_860x414.png 848w, https://substackcdn.com/image/fetch/$s_!iOQ5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febff34ab-8b64-4730-b407-53956dfc17d9_860x414.png 1272w, https://substackcdn.com/image/fetch/$s_!iOQ5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febff34ab-8b64-4730-b407-53956dfc17d9_860x414.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iOQ5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febff34ab-8b64-4730-b407-53956dfc17d9_860x414.png" width="860" height="414" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ebff34ab-8b64-4730-b407-53956dfc17d9_860x414.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:414,&quot;width&quot;:860,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iOQ5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febff34ab-8b64-4730-b407-53956dfc17d9_860x414.png 424w, https://substackcdn.com/image/fetch/$s_!iOQ5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febff34ab-8b64-4730-b407-53956dfc17d9_860x414.png 848w, https://substackcdn.com/image/fetch/$s_!iOQ5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febff34ab-8b64-4730-b407-53956dfc17d9_860x414.png 1272w, https://substackcdn.com/image/fetch/$s_!iOQ5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febff34ab-8b64-4730-b407-53956dfc17d9_860x414.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image from the <a href="http://en.wikipedia.org/wiki/Autoencoder">Autoencoder Wikipedia website.</a></figcaption></figure></div><p></p><h4><strong>1.3 Anomaly Detection</strong></h4><p>Autoencoders are also a powerful tool for spotting things that don&#8217;t belong. By training a model on normal data &#8212; for example, regular sensor readings or typical user behavior &#8212; it becomes very good at reconstructing those patterns.</p><p>But when an unusual input comes along, the autoencoder struggles. Its reconstruction will be poor, and the <strong>reconstruction error</strong> will spike. 
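The scoring logic can be sketched in a few lines of NumPy. Here a linear autoencoder fit in closed form via SVD (which makes it equivalent to PCA) stands in for a trained network, and the "sensor" data is synthetic; both are illustrative assumptions, but the reconstruction-error idea is identical for a deep model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "normal" sensor data: 5 correlated channels driven by 2 latent factors.
latent = rng.normal(size=(2000, 2))
mix = rng.normal(size=(2, 5))
normal = latent @ mix + rng.normal(scale=0.05, size=(2000, 5))

# "Train" on normal data only: the top-2 principal directions play the role
# of the learned encoder/decoder weights (tied).
mean = normal.mean(axis=0)
_, _, Vt = np.linalg.svd(normal - mean, full_matrices=False)
V = Vt[:2].T

def reconstruction_error(x):
    z = (x - mean) @ V            # encode into the 2-dim latent space
    recon = z @ V.T + mean        # decode back to sensor space
    return float(((x - recon) ** 2).sum())

typical = normal[0]
anomaly = typical + np.array([0.0, 0.0, 3.0, 0.0, 0.0])  # one channel spikes

print("error on typical reading:  ", reconstruction_error(typical))
print("error on anomalous reading:", reconstruction_error(anomaly))
```

Flagging an input is then just a threshold on this error, chosen from the error distribution of held-out normal data.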
That spike is a useful signal: something about this input is different from the norm.</p><p>This technique is widely used in fraud detection, predictive maintenance, cybersecurity, and monitoring for system failures.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Z1tT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67bc6ee7-d04e-422e-9f84-758f12142edf_548x295.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Z1tT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67bc6ee7-d04e-422e-9f84-758f12142edf_548x295.png 424w, https://substackcdn.com/image/fetch/$s_!Z1tT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67bc6ee7-d04e-422e-9f84-758f12142edf_548x295.png 848w, https://substackcdn.com/image/fetch/$s_!Z1tT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67bc6ee7-d04e-422e-9f84-758f12142edf_548x295.png 1272w, https://substackcdn.com/image/fetch/$s_!Z1tT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67bc6ee7-d04e-422e-9f84-758f12142edf_548x295.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Z1tT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67bc6ee7-d04e-422e-9f84-758f12142edf_548x295.png" width="668" height="359.5985401459854" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/67bc6ee7-d04e-422e-9f84-758f12142edf_548x295.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:295,&quot;width&quot;:548,&quot;resizeWidth&quot;:668,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Z1tT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67bc6ee7-d04e-422e-9f84-758f12142edf_548x295.png 424w, https://substackcdn.com/image/fetch/$s_!Z1tT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67bc6ee7-d04e-422e-9f84-758f12142edf_548x295.png 848w, https://substackcdn.com/image/fetch/$s_!Z1tT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67bc6ee7-d04e-422e-9f84-758f12142edf_548x295.png 1272w, https://substackcdn.com/image/fetch/$s_!Z1tT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67bc6ee7-d04e-422e-9f84-758f12142edf_548x295.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image by Author. Daniel Garc&#237;a from The Learning Curve. </figcaption></figure></div><h4><strong>1.4 Generating New Samples</strong></h4><p>Not all autoencoders are just for cleaning or compressing data. Some are built to <em>generate new data</em> entirely.</p><p><strong>Variational Autoencoders (VAEs)</strong> treat the latent space as a probability distribution rather than a fixed point. This allows the model to sample new points in that space and decode them into plausible new outputs.</p><p>In practice, this enables you to create new images, sounds, or sequences based on the structure learned from training data. 
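The two ingredients behind that generative ability fit in a few lines of NumPy: the reparameterization trick used to sample a latent point, and the closed-form KL penalty that keeps the latent distribution close to a standard Gaussian. The `mu` and `log_var` values below are illustrative; in a real VAE they come out of the trained encoder, and the sampled `z` would then be passed through the trained decoder to produce a new output.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_latent(mu, log_var):
    """Draw z ~ N(mu, sigma^2) via z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

mu = np.array([0.5, -1.0])        # example encoder outputs
log_var = np.array([-0.2, 0.1])

z = sample_latent(mu, log_var)
print("sampled latent point:", z)
print("KL penalty:", kl_to_standard_normal(mu, log_var))

# A latent code that is already standard normal pays no KL penalty:
print("KL at mu=0, log_var=0:", kl_to_standard_normal(np.zeros(2), np.zeros(2)))
```

The reparameterization step is what keeps sampling differentiable, so the whole model can still be trained by backpropagation.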
It&#8217;s one of the most creative and experimental branches of unsupervised learning.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!g6-I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f73967e-73c1-4793-8e40-e31fa0aa4c9b_511x453.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g6-I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f73967e-73c1-4793-8e40-e31fa0aa4c9b_511x453.png 424w, https://substackcdn.com/image/fetch/$s_!g6-I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f73967e-73c1-4793-8e40-e31fa0aa4c9b_511x453.png 848w, https://substackcdn.com/image/fetch/$s_!g6-I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f73967e-73c1-4793-8e40-e31fa0aa4c9b_511x453.png 1272w, https://substackcdn.com/image/fetch/$s_!g6-I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f73967e-73c1-4793-8e40-e31fa0aa4c9b_511x453.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!g6-I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f73967e-73c1-4793-8e40-e31fa0aa4c9b_511x453.png" width="511" height="453" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f73967e-73c1-4793-8e40-e31fa0aa4c9b_511x453.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:453,&quot;width&quot;:511,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!g6-I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f73967e-73c1-4793-8e40-e31fa0aa4c9b_511x453.png 424w, https://substackcdn.com/image/fetch/$s_!g6-I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f73967e-73c1-4793-8e40-e31fa0aa4c9b_511x453.png 848w, https://substackcdn.com/image/fetch/$s_!g6-I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f73967e-73c1-4793-8e40-e31fa0aa4c9b_511x453.png 1272w, https://substackcdn.com/image/fetch/$s_!g6-I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f73967e-73c1-4793-8e40-e31fa0aa4c9b_511x453.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image by Author (Daniel Garc&#237;a from The Learning Curve)</figcaption></figure></div><h3><strong>2. How Autoencoders Work</strong></h3><p>The architecture of an autoencoder is made up of two main parts:</p><ul><li><p><strong>Encoder</strong>: This compresses the input data into a smaller internal representation (the latent vector). It&#8217;s like summarizing a paragraph into a sentence.</p></li><li><p><strong>Decoder</strong>: This takes that compressed summary and tries to recreate the original data as closely as possible. The goal is to make the output look just like the input.</p></li></ul><p>The model is trained by minimizing a <strong>reconstruction loss</strong>, a measure of how different the output is from the original. This could be mean squared error (for continuous data like images) or binary cross-entropy (for normalized or binary data).</p><p>Without the bottleneck, the network would just memorize and copy the data. 
But with the bottleneck, it is forced to <em>learn patterns and compress meaningfully</em>.</p><h3><strong>3. Autoencoder Variants</strong></h3><p>Autoencoders come in many forms. Here&#8217;s a quick guide to the most common variants and what they&#8217;re useful for:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xclz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930db4a5-f498-44e3-aa83-e5cb1118169d_1318x448.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xclz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930db4a5-f498-44e3-aa83-e5cb1118169d_1318x448.png 424w, https://substackcdn.com/image/fetch/$s_!Xclz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930db4a5-f498-44e3-aa83-e5cb1118169d_1318x448.png 848w, https://substackcdn.com/image/fetch/$s_!Xclz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930db4a5-f498-44e3-aa83-e5cb1118169d_1318x448.png 1272w, https://substackcdn.com/image/fetch/$s_!Xclz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930db4a5-f498-44e3-aa83-e5cb1118169d_1318x448.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Xclz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930db4a5-f498-44e3-aa83-e5cb1118169d_1318x448.png" width="1318" height="448" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/930db4a5-f498-44e3-aa83-e5cb1118169d_1318x448.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:448,&quot;width&quot;:1318,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:102034,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databites.tech/i/164146477?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930db4a5-f498-44e3-aa83-e5cb1118169d_1318x448.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Xclz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930db4a5-f498-44e3-aa83-e5cb1118169d_1318x448.png 424w, https://substackcdn.com/image/fetch/$s_!Xclz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930db4a5-f498-44e3-aa83-e5cb1118169d_1318x448.png 848w, https://substackcdn.com/image/fetch/$s_!Xclz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930db4a5-f498-44e3-aa83-e5cb1118169d_1318x448.png 1272w, https://substackcdn.com/image/fetch/$s_!Xclz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930db4a5-f498-44e3-aa83-e5cb1118169d_1318x448.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Table by Author (Daniel Garcia from the Learning Curve)</figcaption></figure></div><p>Let&#8217;s dig a little deeper into the more interesting ones:</p><h4><strong>3.1 Sparse Autoencoders</strong></h4><p>In this version, we don&#8217;t necessarily make the latent space smaller &#8212; instead, we encourage the model to activate only a few neurons at a time. This sparsity leads to representations where different neurons specialize in detecting different features.</p><p>We add a penalty during training that discourages the model from turning on too many neurons. The result is a more interpretable and often more robust model that can still extract useful features.</p><h4><strong>3.2 Contractive Autoencoders</strong></h4><p>These are designed to be resistant to tiny changes in the input. 
They include a penalty that discourages the model from making big changes in the encoding in response to small changes in the input.</p><p>This is useful for tasks where inputs may be noisy or jittery, but we want the model to focus on the stable patterns.</p><h4><strong>3.3 Variational Autoencoders (VAEs)</strong></h4><p>VAEs change the way we think about the latent space. Instead of mapping inputs to a single point, they map them to a probability distribution &#8212; usually Gaussian. This enables you to sample new points and generate new outputs.</p><p>To make this work, VAEs add an extra penalty term that encourages the latent space to stay well-behaved (smooth, continuous, and compact). This is what allows them to generate data that looks convincingly real.</p><h3><strong>4. Three Mini-Projects to Try</strong></h3><p>Let&#8217;s make this practical. Here are three hands-on projects you can try to explore different use cases of autoencoders.</p><p><strong>Project 1 &#8211; Compression with MNIST</strong></p><p>Train a basic autoencoder on MNIST, a dataset of grayscale images of handwritten digits. 
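A framework-free sketch of that training loop is below. Synthetic 8x8 "digit-like" arrays stand in for MNIST so the sketch runs offline (swap in the real dataset via Keras or torchvision); the layer sizes, learning rate, and epoch count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic stand-in for MNIST: 10 random 8x8 prototypes plus pixel noise,
# flattened to 64 values in [0, 1].
protos = rng.random((10, 64))
labels = rng.integers(0, 10, size=256)
images = np.clip(protos[labels] + rng.normal(0, 0.05, (256, 64)), 0.0, 1.0)

d, k = 64, 16                              # input dim, bottleneck dim
W1 = rng.normal(0, 0.1, (d, k))            # encoder weights
W2 = rng.normal(0, 0.1, (k, d))            # decoder weights (sigmoid output)
b2 = np.zeros(d)
lr, epochs, snapshots, losses = 2.0, 5, [], []

for epoch in range(epochs):
    for _ in range(200):                   # full-batch gradient steps
        z = images @ W1                    # encode
        recon = 1.0 / (1.0 + np.exp(-(z @ W2 + b2)))  # sigmoid decode
        err = recon - images
        dpre = err * recon * (1.0 - recon) * (2.0 / err.size)
        gW2, gb2 = z.T @ dpre, dpre.sum(axis=0)
        gW1 = images.T @ (dpre @ W2.T)
        W1 -= lr * gW1
        W2 -= lr * gW2
        b2 -= lr * gb2
    losses.append(float((err ** 2).mean()))
    snapshots.append(recon[:8].copy())     # save a few reconstructions per epoch

print("per-epoch loss:", [round(l, 4) for l in losses])
```

Plotting the saved `snapshots` side by side with the originals gives you the epoch-by-epoch view described next.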
After each training epoch, save a set of reconstructions and compare them to the originals.</p><p>You&#8217;ll be able to visually track how the model learns to compress and reconstruct the data over time &#8212; starting with blurry blobs and ending with recognizable digits.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!T8CZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4edaf2b3-085e-47f5-ba97-d935e135ae8a_600x154.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!T8CZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4edaf2b3-085e-47f5-ba97-d935e135ae8a_600x154.png 424w, https://substackcdn.com/image/fetch/$s_!T8CZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4edaf2b3-085e-47f5-ba97-d935e135ae8a_600x154.png 848w, https://substackcdn.com/image/fetch/$s_!T8CZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4edaf2b3-085e-47f5-ba97-d935e135ae8a_600x154.png 1272w, https://substackcdn.com/image/fetch/$s_!T8CZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4edaf2b3-085e-47f5-ba97-d935e135ae8a_600x154.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!T8CZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4edaf2b3-085e-47f5-ba97-d935e135ae8a_600x154.png" width="600" height="154" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4edaf2b3-085e-47f5-ba97-d935e135ae8a_600x154.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:154,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!T8CZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4edaf2b3-085e-47f5-ba97-d935e135ae8a_600x154.png 424w, https://substackcdn.com/image/fetch/$s_!T8CZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4edaf2b3-085e-47f5-ba97-d935e135ae8a_600x154.png 848w, https://substackcdn.com/image/fetch/$s_!T8CZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4edaf2b3-085e-47f5-ba97-d935e135ae8a_600x154.png 1272w, https://substackcdn.com/image/fetch/$s_!T8CZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4edaf2b3-085e-47f5-ba97-d935e135ae8a_600x154.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Image by Author (Daniel Garc&#237;a from The Learning Curve)</figcaption></figure></div><p><strong>Project 2 &#8211; Audio Denoising</strong></p><p>Record yourself saying a short phrase like "machine learning rocks" while there&#8217;s background noise (e.g. fan or vacuum cleaner). 
Alternatively, add artificial Gaussian noise to a clean recording.</p><p>Convert the audio into a spectrogram, then train a denoising autoencoder to reconstruct the clean signal. The model will learn to ignore the noise and focus on the speech signal.</p><p><strong>Project 3 &#8211; Anomaly Detection in Sensor Data</strong></p><p>Use a dataset of sensor readings from an industrial process or simulated IoT environment. Train an autoencoder only on normal data. Then introduce some outlier readings (e.g., spikes, drops, or irregular behavior).</p><p>Monitor the reconstruction error over time. When it spikes, it&#8217;s likely an anomaly. This is a powerful technique for predictive maintenance and safety monitoring.</p><h3><strong>5. Common Questions About Autoencoders</strong></h3><h4><strong>Is this just fancy PCA?</strong></h4><p>They&#8217;re related &#8212; both compress data &#8212; but PCA is linear and deterministic. Autoencoders are nonlinear and can be scaled and customized for many more types of input.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AlCI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d6b349-687c-4566-a068-c6676a1653d2_497x265.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AlCI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d6b349-687c-4566-a068-c6676a1653d2_497x265.png 424w, https://substackcdn.com/image/fetch/$s_!AlCI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d6b349-687c-4566-a068-c6676a1653d2_497x265.png 848w, 
https://substackcdn.com/image/fetch/$s_!AlCI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d6b349-687c-4566-a068-c6676a1653d2_497x265.png 1272w, https://substackcdn.com/image/fetch/$s_!AlCI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d6b349-687c-4566-a068-c6676a1653d2_497x265.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AlCI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d6b349-687c-4566-a068-c6676a1653d2_497x265.png" width="497" height="265" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f5d6b349-687c-4566-a068-c6676a1653d2_497x265.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:265,&quot;width&quot;:497,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AlCI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d6b349-687c-4566-a068-c6676a1653d2_497x265.png 424w, https://substackcdn.com/image/fetch/$s_!AlCI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d6b349-687c-4566-a068-c6676a1653d2_497x265.png 848w, 
https://substackcdn.com/image/fetch/$s_!AlCI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d6b349-687c-4566-a068-c6676a1653d2_497x265.png 1272w, https://substackcdn.com/image/fetch/$s_!AlCI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d6b349-687c-4566-a068-c6676a1653d2_497x265.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Image by Author (Daniel Garc&#237;a from The Learning Curve)</figcaption></figure></div><ol><li><p><strong>How do I pick 
the latent size?<br></strong>A good rule of thumb is to start with log base 2 of your input size. From there, adjust based on how well the model reconstructs data and whether it generalizes well.</p></li><li><p><strong>Why are my reconstructions blurry or inaccurate?<br></strong>Check your latent size, your loss function, and whether your train/test split is correct. Blurry outputs often mean your model doesn&#8217;t have enough capacity or hasn&#8217;t trained long enough.</p></li><li><p><strong>Can I generate new data with a vanilla autoencoder?<br></strong>Not reliably. You&#8217;ll need a Variational Autoencoder (VAE) or a GAN if you want to generate novel samples.</p></li></ol><h3><strong>6. Wrapping Up</strong></h3><p>Autoencoders are a foundational tool in the machine learning world. While they&#8217;re often described as models that "just copy the input," the reality is that they learn how to compress and represent the essence of your data. Once trained, they can do much more than reconstruction &#8212; they can clean, compress, detect, and even create.</p><p>Over the coming weeks, I&#8217;ll be publishing walkthroughs on <em>The Learning Curve</em> showing exactly how to build each of the three mini-projects above, step by step. That means you&#8217;ll not only understand the theory &#8212; you&#8217;ll get working code, visualizations, and practical insights to make it your own.</p><div><hr></div><p><a href="https://www.linkedin.com/in/iamdgarcia/">Daniel</a> is an <strong>ML engineer and writer</strong> of <a href="https://iamdgarcia.substack.com/">The Learning Curve</a>, a newsletter that makes AI make sense&#8212;no hype, no jargon. 
<strong>He&#8217;s been through every stage of academia</strong> (yes, <em>all the way to a PhD</em>), worked in startups and consulting, and now shares the kind of lessons he wishes he&#8217;d had when he started: <strong>clear, practical, and fluff-free.</strong></p>]]></content:encoded></item><item><title><![CDATA[SQL Crash Course – Getting into Practice! 👨🏻‍💻]]></title><description><![CDATA[SQL Crash Course Theory Ends, and Hands-On Begins! &#128640;]]></description><link>https://www.databites.tech/p/sql-crash-course-getting-into-practice</link><guid isPermaLink="false">https://www.databites.tech/p/sql-crash-course-getting-into-practice</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Fri, 23 May 2025 10:02:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0uya!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7f40ab-eb8f-4cc0-b9be-c1a1e2b0808b_2870x1465.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey everyone! Josep and Cornellius here with one more week of SQL learning! 
&#128075;&#127995;</p><p>Can you believe it&#8217;s already been <strong>2 months</strong> since <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Cornellius Yudha Wijaya&quot;,&quot;id&quot;:6000855,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8981f4c2-dc6b-42f7-a2bd-9751653af571_512x512.jpeg&quot;,&quot;uuid&quot;:&quot;9c7a043d-f76d-4e3a-964b-8e495ce09f42&quot;}" data-component-name="MentionToDOM"></span> (from <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Non-Brand Data&quot;,&quot;id&quot;:37262,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/cornellius&quot;,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/6c0e1cde-d120-4029-8ffd-2a8c7c6e4504_1280x1280.png&quot;,&quot;uuid&quot;:&quot;24794f1a-7a13-4a47-b5c0-1e53363fa81a&quot;}" data-component-name="MentionToDOM"></span> ) and I (from <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;DataBites&quot;,&quot;id&quot;:2143185,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/rfeers&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e930fbab-b8df-40ef-9676-3d9ca5d49eae_714x714.png&quot;,&quot;uuid&quot;:&quot;560fa677-00b7-46c2-a9e4-66e5e062ef0b&quot;}" data-component-name="MentionToDOM"></span> ) kicked off our <strong>SQL Crash Course</strong>? 
&#127881;<br>What started as a fun idea quickly turned into a full-blown series &#8212; and today, we&#8217;re sharing a <strong>recap of everything we&#8217;ve covered&#8230; and what&#8217;s coming next!</strong> &#128588;&#127995;</p><h2>&#128293; What&#8217;s inside the course?</h2><p>We&#8217;ve structured the course into <strong>7 key modules</strong> to take you from zero to SQL hero:</p><ol><li><p><strong>Introduction</strong> &#8211; What SQL is and why it matters</p></li><li><p><strong>SQL Fundamentals</strong> &#8211; Basic commands, filtering, and aggregation</p></li><li><p><strong>Intermediate SQL</strong> &#8211; Joins, unions, and functions</p></li><li><p><strong>Advanced SQL</strong> &#8211; Subqueries, CTEs, recursion, and views</p></li><li><p><strong>Database Operations</strong> &#8211; CRUD, schema changes, and optimization</p></li><li><p><strong>Crafting Good SQL Queries</strong> &#8211; Best practices for writing efficient queries</p></li><li><p><strong>Real-world Problems</strong> &#8211; Applying SQL to practical challenges</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0uya!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7f40ab-eb8f-4cc0-b9be-c1a1e2b0808b_2870x1465.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0uya!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7f40ab-eb8f-4cc0-b9be-c1a1e2b0808b_2870x1465.png 424w, https://substackcdn.com/image/fetch/$s_!0uya!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7f40ab-eb8f-4cc0-b9be-c1a1e2b0808b_2870x1465.png 848w, 
https://substackcdn.com/image/fetch/$s_!0uya!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7f40ab-eb8f-4cc0-b9be-c1a1e2b0808b_2870x1465.png 1272w, https://substackcdn.com/image/fetch/$s_!0uya!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7f40ab-eb8f-4cc0-b9be-c1a1e2b0808b_2870x1465.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0uya!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7f40ab-eb8f-4cc0-b9be-c1a1e2b0808b_2870x1465.png" width="1456" height="743" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c7f40ab-eb8f-4cc0-b9be-c1a1e2b0808b_2870x1465.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:743,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:490572,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.databites.tech/i/158754537?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7f40ab-eb8f-4cc0-b9be-c1a1e2b0808b_2870x1465.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0uya!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7f40ab-eb8f-4cc0-b9be-c1a1e2b0808b_2870x1465.png 424w, 
https://substackcdn.com/image/fetch/$s_!0uya!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7f40ab-eb8f-4cc0-b9be-c1a1e2b0808b_2870x1465.png 848w, https://substackcdn.com/image/fetch/$s_!0uya!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7f40ab-eb8f-4cc0-b9be-c1a1e2b0808b_2870x1465.png 1272w, https://substackcdn.com/image/fetch/$s_!0uya!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7f40ab-eb8f-4cc0-b9be-c1a1e2b0808b_2870x1465.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h3>&#9989; What We&#8217;ve Covered So Far</h3><p>We&#8217;ve completed <strong>Modules 1 through 6</strong> &#8212; all the theory and best practices you need.<br>You can catch up on each lesson here:</p><h4><strong>1&#65039;&#8419; Introduction to SQL</strong></h4><ul><li><p> <strong>#1. What is SQL? &#8594; <a href="https://www.nb-data.com/p/2-what-is-sql">link</a></strong></p></li><li><p><strong>#2. Why Learn SQL? &#8594; <a href="https://www.databites.tech/p/2-why-learn-sql">link</a></strong></p></li><li><p><strong>#3. Relational Data &amp; Models &#8594; <a href="https://www.databites.tech/p/3-relational-data-and-models">link</a></strong></p></li></ul><h4><strong>2&#65039;&#8419; SQL Fundamentals</strong></h4><ul><li><p><strong>#4. Basic Commands (SELECT, FROM, WHERE) &#8594; <a href="https://www.nb-data.com/p/4-sql-basic-commands">link</a></strong></p></li><li><p><strong>#5. Sorting &amp; Limiting (ORDER BY, LIMIT) &#8594;</strong> <strong><a href="https://www.databites.tech/p/5-sorting-and-limiting">link</a></strong></p></li><li><p><strong>#6. Aggregate Functions (SUM, AVG, COUNT, etc.) &#8594; <a href="https://www.nb-data.com/p/6-aggregate-functions">link</a></strong></p></li></ul><h4><strong>3&#65039;&#8419; Intermediate SQL</strong></h4><ul><li><p><strong>#7. JOINS (INNER, LEFT, RIGHT, FULL) &#8594; <a href="https://www.databites.tech/p/7-joins-left-right-inner-and-full">link</a> </strong></p></li><li><p><strong>#8. UNION &amp; UNION ALL &#8594;</strong> <strong><a href="https://www.nb-data.com/p/8-union-and-union-all">link</a></strong></p></li><li><p><strong>#9. Case Expressions &#8594; <a href="https://www.databites.tech/p/9-case-expressions">link</a></strong></p></li><li><p><strong>#10. 
Functions (String, Date, Numeric) &#8594; <a href="https://www.nb-data.com/p/10-functions-string-date-numeric">link</a></strong></p></li></ul><h4><strong>4&#65039;&#8419; Advanced SQL</strong></h4><ul><li><p><strong>#11. Subqueries &#8594; <a href="https://www.nb-data.com/p/11-subqueries">link</a></strong></p></li><li><p><strong>#12. Common Table Expressions (CTEs) &#8594;</strong> <strong><a href="https://www.databites.tech/p/12-common-table-expressions-ctes">link</a></strong></p></li><li><p><strong>#13. Recursion &#8594;</strong> <strong><a href="https://www.nb-data.com/p/13-recursion">link</a></strong></p></li><li><p><strong>#14. Views  &#8594;</strong> <strong><a href="https://www.databites.tech/p/14-views">link</a></strong></p></li></ul><h4><strong>5&#65039;&#8419; Database Operations</strong></h4><ul><li><p><strong>#15. CRUD operations (INSERT, UPDATE, DELETE) &#8594;</strong><em> </em><strong><a href="https://www.nb-data.com/p/15-crud-operations">link</a></strong></p></li><li><p><strong>#16. Database modifications (ALTER, DROP, CREATE) &#8594;</strong> <strong><a href="https://www.databites.tech/p/16-database-modifications">link</a></strong></p></li><li><p><strong>#17. Indexing &amp; Optimization &#8594; <a href="https://www.nb-data.com/p/17-indexing-and-optimization">link</a></strong></p></li></ul><h4><strong>6&#65039;&#8419; Crafting Good SQL queries</strong></h4><ul><li><p><strong>#18. Modular Code &#8594;<a href="https://www.databites.tech/p/18-generating-modular-code">link</a></strong></p></li><li><p><strong>#19. SQL Execution Order &#8594; <a href="https://www.databites.tech/p/19-sql-execution-order">link</a></strong></p></li><li><p><strong>#20. Query Optimization &#8594; <a href="https://www.nb-data.com/p/20-query-optimization">link</a></strong> </p></li></ul><p>So the following question is&#8230; </p><h3>&#128284; What&#8217;s Next?</h3><p>Now it&#8217;s time to <strong>put theory into practice</strong>! 
Over the next few weeks, we&#8217;ll release hands-on exercises and projects to help you apply what you&#8217;ve learned:</p><h4>7&#65039;&#8419; Real-World Problems (Quick Wins)</h4><p>Easy-level problems, perfect for 30&#8211;60 minutes of practice:</p><ul><li><p>Problem 1 &#8594; 29th May</p></li><li><p>Problem 2 &#8594; 29th May</p></li><li><p>Problem 3 &#8594; 5th June</p></li><li><p>Problem 4 &#8594; 5th June</p></li></ul><h4>&#129514; Mini-Projects (Deeper Dives)</h4><p><strong>Medium-difficulty projects</strong> to consolidate your skills:</p><ul><li><p>Mini-Project 1 &#8594; 19th June</p></li><li><p>Mini-Project 2 &#8594; 19th June</p></li></ul><h4>&#128163; Final Projects (End-to-End Challenges)</h4><p><strong>Advanced, real-life projects</strong> released in multiple parts:</p><ul><li><p>Project 1 &#8594; 26th June (with multiple issues in the following weeks)</p></li><li><p>Project 2 &#8594; 26th June (with multiple issues in the following weeks)</p></li></ul><p>We&#8217;ll share full project briefs and walkthroughs &#8212; so stay tuned!</p><div><hr></div><h2>&#128161; Where to Follow Along?</h2><p>We&#8217;ll continue posting weekly updates in our newsletters:</p><ul><li><p><strong><a href="https://www.databites.tech">DataBites</a></strong> (<em>by Josep</em>)</p></li><li><p><strong><a href="#">Non-Brand Data</a></strong> (<em>by Cornellius</em>)</p></li></ul><p>&#128073; Check out the <strong><a href="https://github.com/CornelliusYW/SQL-Crash-Course">GitHub repo</a></strong> and stay tuned for the first post!</p><div><hr></div><p>Let&#8217;s dive in and <strong>make SQL less scary, more fun, and way more useful!</strong> &#128640;</p><p><strong>Josep &amp; Cornellius</strong></p>
]]></content:encoded></item><item><title><![CDATA[Why Resilience Is the New Hard Skill]]></title><description><![CDATA[Notes #10 - The skill that keeps you growing when plans fall apart]]></description><link>https://www.databites.tech/p/why-resilience-is-the-new-hard-skill</link><guid isPermaLink="false">https://www.databites.tech/p/why-resilience-is-the-new-hard-skill</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Tue, 20 May 2025 10:02:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!K1vK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35c24ac9-df4f-4491-b8cb-e474f1c2914d_890x892.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Hey everyone! &#128075;&#127996;</strong></p><p>Josep here, back with your weekly bite of career insights and encouragement &#10024;</p><p>Last week, we explored how <strong>mindset, not talent, shapes your future, and how a small shift in belief can unlock big growth.</strong></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;64324a62-8bc1-42aa-bb2a-e6fe0b0d9cec&quot;,&quot;caption&quot;:&quot;Hey everyone! 
&#128075;&#127996;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Fixed Mindset vs Groth Mindset&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech Tech Writer @KDnuggets @DataCamp &#128073;&#127995;Inquiries in rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-05-13T10:02:28.108Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3116b939-1d96-4d79-94a2-7f33d6839e88_1028x1036.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.databites.tech/p/fixed-mindset-vs-groth-mindset&quot;,&quot;section_name&quot;:&quot;Josep's Notes &#128640;&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:163454232,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:1,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;DataBites&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe930fbab-b8df-40ef-9676-3d9ca5d49eae_714x714.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>This week, we go even deeper. 
</p><p>&#127744;  <strong>What happens when the world keeps changing faster than your plans can keep up?</strong></p>
      <p>
          <a href="https://www.databites.tech/p/why-resilience-is-the-new-hard-skill">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[#19 SQL Execution Order]]></title><description><![CDATA[SQL Crash Course #19]]></description><link>https://www.databites.tech/p/19-sql-execution-order</link><guid isPermaLink="false">https://www.databites.tech/p/19-sql-execution-order</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Thu, 15 May 2025 11:03:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!nQRO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b59985c-3a44-4fde-b097-eab424ab26d1_1380x962.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="pullquote"><p><em>All the course material is stored in the <strong><a href="https://github.com/CornelliusYW/SQL-Crash-Course">SQL Crash Course repository</a></strong>.</em></p></div><p>Hi everyone! <strong>Josep</strong> and <a href="https://open.substack.com/users/6000855-cornellius-yudha-wijaya?utm_source=mentions">Cornellius Yudha Wijaya</a> from <a href="https://open.substack.com/pub/cornellius">Non-Brand Data</a> here &#128075;&#127995;</p><p>As promised, today we are publishing the next two issues of our <a href="https://www.databites.tech/p/launching-the-sql-crash-course-from">SQL Crash Course &#8211; From Zero to Hero!</a> &#128640;</p><p>I am sure you are here to continue our <strong>SQL Crash Course Journey!&#128218;</strong></p><p>If this is your first time or you&#8217;ve for&#8230;</p>
      <p>
          <a href="https://www.databites.tech/p/19-sql-execution-order">
              Read more
          </a>
      </p>
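Execution order is easiest to internalize by running it. As a companion to the lesson linked above, here is a minimal sketch using Python's built-in sqlite3 module (the toy table and values are invented for illustration, they are not course material): WHERE filters rows before GROUP BY forms the groups, while HAVING filters the groups afterwards.

```python
import sqlite3

# Toy data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 10), ("ana", 200), ("bob", 50), ("bob", 60), ("cat", 5)],
)

# WHERE runs before GROUP BY: small rows never reach the groups.
where_first = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders WHERE amount > 20 "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(where_first)  # ana loses her 10, cat disappears entirely

# HAVING runs after GROUP BY: groups are built from all rows, then filtered.
having_after = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer "
    "HAVING total > 100 ORDER BY customer"
).fetchall()
print(having_after)
```

Comparing the two result sets makes the FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY pipeline concrete: ana's total is 200 in the first query but 210 in the second, because WHERE removed one of her rows before the sum was computed.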
   ]]></content:encoded></item><item><title><![CDATA[Fixed Mindset vs Growth Mindset]]></title><description><![CDATA[Notes #9 - Why a Growth Mindset Will Take You Further Than Talent Ever Could]]></description><link>https://www.databites.tech/p/fixed-mindset-vs-groth-mindset</link><guid isPermaLink="false">https://www.databites.tech/p/fixed-mindset-vs-groth-mindset</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Tue, 13 May 2025 10:02:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!dzj0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3116b939-1d96-4d79-94a2-7f33d6839e88_1028x1036.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Hey everyone! &#128075;&#127996;</strong></p><p>Josep here, back with your weekly bite of career insights and encouragement &#10024;</p><p>Last week, we explored <strong>how </strong><em><strong>soft skills</strong></em><strong> like emotional intelligence are becoming the new superpowers in an automated world.</strong></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;eb6bc240-76c7-4733-ad51-da0ce23a8179&quot;,&quot;caption&quot;:&quot;Hey everyone! 
&#128075;&#127996;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Why Soft Skills Are the New Hard Skills&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech Tech Writer @KDnuggets @DataCamp &#128073;&#127995;Inquiries in rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-05-06T10:02:25.908Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bc29622-1d0f-4b1b-9958-6873fa224109_1042x1342.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.databites.tech/p/why-soft-skills-are-the-new-hard&quot;,&quot;section_name&quot;:&quot;Josep's Notes &#128640;&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:162956699,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:14,&quot;comment_count&quot;:4,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;DataBites&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe930fbab-b8df-40ef-9676-3d9ca5d49eae_714x714.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>This week, we go deeper.<br>&#127744;  <strong>Beneath every skill&#8212;technical or human&#8212;lies the mindset that fuels 
it.</strong></p>
      <p>
          <a href="https://www.databites.tech/p/fixed-mindset-vs-groth-mindset">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[#18 Generating Modular Code]]></title><description><![CDATA[SQL Crash Course #18]]></description><link>https://www.databites.tech/p/18-generating-modular-code</link><guid isPermaLink="false">https://www.databites.tech/p/18-generating-modular-code</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Fri, 09 May 2025 11:02:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!QHOl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99261a6f-e2fa-4c74-be13-00baae534558_1380x962.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="pullquote"><p><em>All the course material is stored in the <strong><a href="https://github.com/CornelliusYW/SQL-Crash-Course">SQL Crash Course repository</a></strong>.</em></p></div><p>Hi everyone! <strong>Josep</strong> and <a href="https://open.substack.com/users/6000855-cornellius-yudha-wijaya?utm_source=mentions">Cornellius Yudha Wijaya</a> from <a href="https://open.substack.com/pub/cornellius">Non-Brand Data</a> here &#128075;&#127995;</p><p>As promised, today we are publishing the next two issues of our <a href="https://www.databites.tech/p/launching-the-sql-crash-course-from">SQL Crash Course &#8211; From Zero to Hero!</a> &#128640;</p><p>I am sure you are here to continue our <strong>SQL Crash Course Journey!&#128218;</strong></p><p>If this is your first time or you&#8217;ve for&#8230;</p>
      <p>
          <a href="https://www.databites.tech/p/18-generating-modular-code">
              Read more
          </a>
      </p>
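In the spirit of the modular-code lesson linked above: CTEs let you name each step of a query and build the next step on top of it, instead of burying subqueries inside one another. A minimal sqlite3 sketch (table and numbers invented for illustration, not course material):

```python
import sqlite3

# Toy sales table, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100), ("north", 50), ("south", 30)],
)

# Each CTE is a named, readable step; the final SELECT reads like a summary.
query = """
WITH region_totals AS (
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
),
big_regions AS (
    SELECT region, total FROM region_totals WHERE total > 60
)
SELECT region, total FROM big_regions ORDER BY region
"""
result = conn.execute(query).fetchall()
print(result)
```

The same logic could be written as one nested subquery, but the named steps are easier to read, test, and reuse, which is the core of modular SQL.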
   ]]></content:encoded></item><item><title><![CDATA[Why Soft Skills Are the New Hard Skills]]></title><description><![CDATA[Notes #8 - Why mastering emotional intelligence, adaptability, and empathy is your ultimate career advantage in the age of AI.]]></description><link>https://www.databites.tech/p/why-soft-skills-are-the-new-hard</link><guid isPermaLink="false">https://www.databites.tech/p/why-soft-skills-are-the-new-hard</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Tue, 06 May 2025 10:02:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Q_r7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bc29622-1d0f-4b1b-9958-6873fa224109_1042x1342.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Hey everyone! &#128075;&#127996;</strong></p><p>Josep here, back with your weekly bite of career insights and encouragement &#10024;</p><p>Last week, we explored <strong>how showing up&#8212;even when uninspired&#8212;can unlock your best creative work.</strong></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;3ffc21fa-9c22-4a78-8def-69d016105f03&quot;,&quot;caption&quot;:&quot;Hey everyone! 
&#128075;&#127996;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;You Don&#8217;t Need to Feel Inspired to Get Things Done&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech Tech Writer @KDnuggets @DataCamp &#128073;&#127995;Inquiries in rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-04-29T10:01:51.393Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1897a58-27aa-4791-9c63-0168599748f7_1040x1040.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.databites.tech/p/you-dont-need-to-feel-inspired-to&quot;,&quot;section_name&quot;:&quot;Josep's Notes &#128640;&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:162354465,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;DataBites&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe930fbab-b8df-40ef-9676-3d9ca5d49eae_714x714.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>This week, we&#8217;re shifting focus to a different kind of edge: the human one.<br>&#127744; <strong>our 
most valuable skills might not be technical&#8212;they&#8217;re human.</strong></p>
      <p>
          <a href="https://www.databites.tech/p/why-soft-skills-are-the-new-hard">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[ML - What It Is, How It Works & Why It Matters]]></title><description><![CDATA[CS14 - An ML beginner-friendly guide with a visual cheatsheet.]]></description><link>https://www.databites.tech/p/ml-what-it-is-how-it-works-and-why</link><guid isPermaLink="false">https://www.databites.tech/p/ml-what-it-is-how-it-works-and-why</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Mon, 05 May 2025 14:02:28 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/736707c9-9eda-46d2-ab01-ff9b20f5ef19_1465x1057.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Machine learning is all around us, from your Netflix recommendations to the voice behind your phone&#8217;s assistant.<br>But how does it work? </p><p>And how can you get started?</p><p>Today I&#8217;m bringing a simple introduction to <strong>Machine Learning</strong>, <strong>its</strong> <strong>types</strong>, and <strong>real-world examples</strong> &#8212; plus giving you a cheatsheet to keep things crystal clear. &#128588;&#127995;</p><p>So let&#8217;s get started with the full-resolution cheatsheet &#128071;&#127995;</p>
      <p>
          <a href="https://www.databites.tech/p/ml-what-it-is-how-it-works-and-why">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[#16 Database Modifications ]]></title><description><![CDATA[SQL Crash Course #16]]></description><link>https://www.databites.tech/p/16-database-modifications</link><guid isPermaLink="false">https://www.databites.tech/p/16-database-modifications</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Fri, 02 May 2025 10:02:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!2Y-t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75944a1c-58ad-4eba-8285-b85dda072d6a_1380x962.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="pullquote"><p><em>All the course material is stored in the <strong><a href="https://github.com/CornelliusYW/SQL-Crash-Course">SQL Crash Course repository</a></strong>.</em></p></div><p>Hi everyone! <strong>Josep</strong> and <a href="https://open.substack.com/users/6000855-cornellius-yudha-wijaya?utm_source=mentions">Cornellius Yudha Wijaya</a> from <a href="https://open.substack.com/pub/cornellius">Non-Brand Data</a> here &#128075;&#127995;</p><p>As promised, today we are publishing the next two issues of our <a href="https://www.databites.tech/p/launching-the-sql-crash-course-from">SQL Crash Course &#8211; From Zero to Hero!</a> &#128640;</p><p>I am sure you are here to continue our <strong>SQL Crash Course Journey!&#128218;</strong></p><p>If this is your first time or you&#8217;ve for&#8230;</p>
      <p>
          <a href="https://www.databites.tech/p/16-database-modifications">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[You Don’t Need to Feel Inspired to Get Things Done]]></title><description><![CDATA[Notes #7 - Why maniacal momentum beats waiting for the perfect moment, EVERY SINGLE TIME.]]></description><link>https://www.databites.tech/p/you-dont-need-to-feel-inspired-to</link><guid isPermaLink="false">https://www.databites.tech/p/you-dont-need-to-feel-inspired-to</guid><dc:creator><![CDATA[Josep Ferrer]]></dc:creator><pubDate>Tue, 29 Apr 2025 10:01:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rNRG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1897a58-27aa-4791-9c63-0168599748f7_1040x1040.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Hey everyone! &#128075;&#127996;</strong></p><p>Josep here, back with your weekly bite of career insights and encouragement &#10024;</p><p>Last week, we dove into how <strong>GenAI is </strong><em><strong>not</strong></em><strong> the real threat</strong>, but how <em>you</em> adapt to it might be.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;23be9ec7-dcb2-49bb-841f-8ec9814d80e4&quot;,&quot;caption&quot;:&quot;Hey everyone! 
&#128075;&#127996;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;GenAI Won&#8217;t Replace You &#8212; But Someone Who Knows How to Use It Will&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech Tech Writer @KDnuggets @DataCamp &#128073;&#127995;Inquiries in rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-04-22T10:01:25.219Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F025a7b7b-0665-42e5-b456-adffc08f6741_1040x1464.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.databites.tech/p/genai-wont-replace-you-but-someone&quot;,&quot;section_name&quot;:&quot;Josep's Notes &#128640;&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:161863295,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:7,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;DataBites&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe930fbab-b8df-40ef-9676-3d9ca5d49eae_714x714.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>This week, we&#8217;re taking a different kind of plunge &#8212; one that&#8217;s a little more personal, a 
little more uncomfortable, and incredibly important:<br>&#127744; <strong>Your work doesn&#8217;t have t&#8230;</strong></p>
      <p>
          <a href="https://www.databites.tech/p/you-dont-need-to-feel-inspired-to">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>