<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:webfeeds="http://webfeeds.org/rss/1.0" version="2.0">
  <channel>
    <title>Dringtech</title>
    <link>https://dringtech.com/</link>
    <atom:link href="https://dringtech.com/feed.rss" rel="self" type="application/rss+xml"/>
    <lastBuildDate>Sat, 20 Dec 2025 01:17:50 GMT</lastBuildDate>
    <language>en</language>
    <generator>Lume v3.0.0</generator>
    <item>
      <title>Weeknotes 2024-W47</title>
      <link>https://dringtech.com/blog/2024/weeknotes-2024-W47/</link>
      <guid isPermaLink="false">https://dringtech.com/blog/2024/weeknotes-2024-W47/</guid>
      <description>Project progress on Bradford 2025 and a draft schema for event data publishing.</description>
      <content:encoded>
        <![CDATA[<p>A few weeks with no weeknotes! Finally getting round to it late on a Friday
evening, so this one might be a bit short.</p>
<p>Good, if slow, progress on publishing the Bradford 2025 data. Volunteering data
is now nearly ready to go. In preparing for this, I've developed some minimal
data governance which identifies corporate risk and seeks sign-off from key
personnel. I've also made links with the communications team, who have expressed
concern that going public will mean they have a more difficult job. My counter
argument is typically that people would criticise in any case, and having the
facts in the form of data should make things easier. We've committed to getting
the data published next week. Watch this space!</p>
<p>Meanwhile, I've been working with colleagues at Open Innovations to plan the
next releases. Some minor progress on technical adapters, which will mean that
pulling the data is easy to do.</p>
<p>Not a lot of other work, although I did develop a
<a href="https://github.com/dringtech/json-schemas/blob/main/schemas/culture/draft/2024-10/events.schema.json">schema for publishing event lists</a>
as part of some work I did to help publish the Hebden Bridge Picture House
listings on the
<a href="https://whatsonhebdenbridge.com/">What's On Hebden Bridge site</a>. Planning to do
the same for The Trades Club.</p>
<p>Links, in no particular order:</p>
<ul>
<li>A very clever
<a href="https://piccalil.li/blog/making-content-aware-components-using-css-has-grid-and-quantity-queries/">design for content responsive layouts</a>
using modern CSS.</li>
<li><a href="https://www.benjystanton.co.uk/blog/inclusive-design-resources/">A list of articles on inclusive design</a></li>
<li>The <a href="https://weird.one/">Weird One</a> prosocial web, where you can set up a home
on the web away from the big social media networks. Mine is here:
<a href="https://giles.dri.ng/">https://giles.dri.ng/</a></li>
<li>A list of
<a href="https://european-alternatives.eu/alternatives-to">European hosted alternatives to US services</a>.</li>
<li>Some
<a href="https://turso.tech/blog/simple-trick-to-save-environment-and-money-when-using-github-actions">guidance on more efficient GitHub workflows</a></li>
</ul>
]]>
      </content:encoded>
      <pubDate>Fri, 22 Nov 2024 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Weeknotes 2024-W42</title>
      <link>https://dringtech.com/blog/2024/weeknotes-2024-W42/</link>
      <guid isPermaLink="false">https://dringtech.com/blog/2024/weeknotes-2024-W42/</guid>
      <description>
Missed a week, but now back on it. Some very satisfying progress on a design system and some data governance. Exciting software development ideas.
      </description>
      <content:encoded>
<![CDATA[<p>I missed weeknote-ing last week, given everything that was going on, so this will be a bumper edition…
except that one of the projects is still shrouded in secrecy…</p>
<p>The Bradford 2025 open data work has been progressing at pace. Lots of work on
data governance processes and
<a href="https://open-innovations.github.io/bradford-2025/design/">the design system</a>.
The basic volunteering data is more or less ready to go, once we get approval to make it open.
We've also had some very productive discussions about the ongoing work, which is likely to
be fairly intense for the next couple of months, then drop down to maintenance.
I'm really enjoying the slightly deeper work around setting up governance, design systems
and skills transfer to the evaluation team.</p>
<p>With this in mind, I've also started a short course on
<a href="https://www.futurelearn.com/courses/evaluation-for-arts-culture-and-heritage-principles-and-practice">Evaluation for Arts, Culture, and Heritage: Principles and Practice</a>
that's been set up by the Centre for Cultural Value at Leeds University.
I've met a few of the folk from the CCV through Leeds 2023 and Bradford 2025 culture work,
and my initial impressions of the course are very favourable.
It's reinforcing some of the things that I've discussed when talking about culture data,
notably the difficulty of codifying the actual evaluation outcomes, rather than just the
monitoring aspects. My view, which I'm interested to test as the course unfolds,
is that the monitoring data is a useful backdrop to the actual work of evaluation.
If it's doing its job well, the backdrop provides evidence of the impact of
cultural stuff. There is a tendency for overreach of aims: can culture solve all of society's ills?
Should it? More on this in coming weeks!</p>
<p>Back at the monitoring level, I will need to slightly refine my
<a href="https://dringtech.com/blog/2024/weeknotes-2024-W40/#small-value-masker">data masking technique</a>,
as I realised during the week that if only one value was below the clip level, the replacement value
of the average would be the same as the original value! Looks like all the highly trained statisticians
at the ONS know what they're doing! On that basis, I'll resort to dropping the raw values if there are
fewer than 3 (?) values.</p>
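<p>A sketch of the refined logic (illustrative Python of my own, not the project code): if too few values fall below the clip level, suppress them outright rather than averaging, since a group of one would average to itself.</p>

```python
def mask_small_values(values, clip=5, min_group=3):
    """Mask values below the clip level.

    With fewer than `min_group` small values, averaging would leak the
    original (a group of one averages to itself), so suppress instead.
    """
    small = [v for v in values if v < clip]
    if len(small) < min_group:
        # Too few to mask safely: drop the raw values entirely
        return [v if v >= clip else None for v in values]
    avg = sum(small) / len(small)
    return [v if v >= clip else avg for v in values]
```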
<div style="display: flex;flex-wrap:wrap;">
<p style="flex-basis:20rem;flex-grow:1;">
One thing I did on the secret project that I can talk about, I reckon, is build a new visualisation
of links between clusters. I've been visualising clusters of
<a href="https://www.gov.uk/government/publications/standard-industrial-classification-of-economic-activities-sic">UK Company SIC codes</a>,
which although not a perfect marker, are at least relatively well documented, and for more traditional
organisations are probably OK. I used a variant of
<a href="https://observablehq.com/@d3/bilevel-edge-bundling">Bilevel Edge Bundling</a>
which ended up being quite a useful visualisation. The server rendering is done as a <a href="https://lume.land/">Lume</a>
component, with some interactivity (highlighting, not updating) added using some minimal Javascript.
Here's what it looked like:
</p>
<img style="margin-inline:auto;flex-basis:20rem;flex-shrink:0;" src="https://dringtech.com/assets/uploads/bilevel-edge-bundling.png" />
</div>
<p>I've latterly been pondering running my own instances of useful services as a way of decoupling from
cloud services which are increasingly being enshittified and sullied with AI. I've currently got an instance
of <a href="https://openproject.org/">Open Project</a> running, where I sometimes plan larger bits of work.
I've tinkered with a few other services, but the two which would logically fit next to that are
<a href="https://www.freeipa.org/">FreeIPA</a> to manage users and access and
<a href="https://forgejo.org/">Forgejo</a> to manage software development (this is the software that powers <a href="https://codeberg.org/">Codeberg</a>).
No firm plans, but I found myself re-researching this and wanted to capture it here.</p>
<p>Some while ago I presented the work I did on the <a href="https://data.leeds2023.co.uk/">Leeds 2023 data site</a>.
The organising team has written up a
<a href="https://www.nottingham.ac.uk/clas/departments/culturalmediaandvisualstudies/research/visioning-a-creative-and-cultural-county/blog/blog-posts/visioning-an-audience-data-strategy.aspx">blog post about the Visioning an audience data strategy event</a>, and kindly sent it to me.</p>
<p>Links, in no particular order:</p>
<ul>
<li>There was some discussion over on Mastodon about <a href="https://www.tldraw.com/">tldraw infinite canvas component</a>,
which was being recommended, but appears to have gone closed-source.
An alternative is the <a href="https://dgrm.net/">DGRM editor</a>. Both look interesting!</li>
<li>Matt Edgar boosted a
<a href="https://petafloptimism.com/2024/10/08/wibble-y-wobble-y-pace-y-wace-y/">blog post about Stewart Brand's pace layers</a>
as applied to services. The conversion of infrastructure to OpEx, and the vesting of more and more
operational stuff in bundled services, was an interesting perspective.</li>
<li>I keep meaning to get into <a href="https://www.kaggle.com/">Kaggle</a>, particularly the competitions.</li>
<li>I should consider joining a union. <a href="https://utaw.tech/">United Tech and Allied Workers (UTAW)</a> looks like a good one.</li>
<li>I came across the brilliant <a href="https://htmlforpeople.com/"><q>HTML is for People</q> course</a> that Blake Watson has created.
It's a really great introduction which I'm going to try to get the kids to have a look at!</li>
<li>More from the CCV: this time a
<a href="https://www.culturalvalue.org.uk/transforming-cultural-sector-data/">project that is aiming to scope out a national Cultural data observatory</a>.</li>
</ul>
<p>And finally (phew!), I ended up going down a rabbit hole on a quote that someone shared and which rang true for me:</p>
<figure style="border-inline-start: 0.5rem solid var(--color-green);">
<blockquote>
<p>
The Biggest Problem in Communication Is the Illusion That It Has Taken Place
</p>
</blockquote>
<figcaption style="font-size:0.9em;font-style: oblique; margin-inline-start: 3rem;margin-block-start:0.5rem;">
&mdash; <a href="https://quoteinvestigator.com/2014/08/31/illusion/">Find out who Quote Investigator thinks (probably) said this</a>
</figcaption>
</figure>
]]>
      </content:encoded>
      <pubDate>Fri, 18 Oct 2024 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Weeknotes 2024-W40</title>
      <link>https://dringtech.com/blog/2024/weeknotes-2024-W40/</link>
      <guid isPermaLink="false">https://dringtech.com/blog/2024/weeknotes-2024-W40/</guid>
      <description>Scope wobbles, massive progress and data masking techniques.</description>
      <content:encoded>
        <![CDATA[<p>Short one this week, as it's been a heck of a time for various reasons.</p>
<p>Most of the week has been consumed with some rocks thrown into a project about
the scope of an initial phase. Let's just say that managing expectations and
getting feedback early is very important! It's a pity, as the work is progressing
well overall. Consultancy is hard!</p>
<p>Other projects have been more productive. The Bradford 2025 data publishing
programme attained automated extract from the first source system. Quite a
milestone, which allows / requires getting the data governance bit sorted out.
To this end, I've documented the process and drafted some risks to be discussed
with the risk owner. This should lead to the data being released, with any luck.
We could, by the end of next week, have an initial site sorted out.
Watch this space.</p>
<p>One of the steps on the data release was the handling of small numbers in summaries
with the risk of personal identification. Often small numbers are suppressed in
released data to avoid this eventuality, but there are other techniques.
<a href="https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/methodologies/comparisonofposttabularstatisticaldisclosurecontrolmethods">ONS has published information on disclosure control</a>, but in our case,
a simpler technique would suffice. <span id="small-value-masker">I built a small value masker which bisects the
dataset based on a clip level, then replaces the numbers below the clip with the
average of the small numbers.</span> The thinking here is that the overall distribution
and totals should be similar in this case. I'll do some more detailed analysis of
this next week, but for now, it'll do!</p>
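<p>As a sketch (in Python, with names of my own invention; the actual implementation may differ), the masker might look like this:</p>

```python
def mask_small_values(values, clip=5):
    """Replace values below the clip level with the average of the
    small values, roughly preserving the total and distribution."""
    small = [v for v in values if v < clip]
    if not small:
        return list(values)
    avg = sum(small) / len(small)
    return [v if v >= clip else avg for v in values]
```

<p>Because each below-clip value is replaced by the group average, the overall total is unchanged.</p>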
]]>
      </content:encoded>
      <pubDate>Fri, 04 Oct 2024 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Weeknotes 2024-W39</title>
      <link>https://dringtech.com/blog/2024/weeknotes-2024-W39/</link>
      <guid isPermaLink="false">https://dringtech.com/blog/2024/weeknotes-2024-W39/</guid>
      <description>Projects and planning meetings, plus a brief delve into digital marketing.</description>
      <content:encoded>
        <![CDATA[<p>Loads of progress on the Bradford 2025 data publishing project.
I had the first in-person working day with the team on Wednesday,
and we managed to get one of the key datasets extracted and a minimal process working.
This built on the Leeds 2023 data, and I was pleased to dramatically simplify
the state tracking code I wrote for that extract.</p>
<p>This code is needed because the volunteering system does not track when checkpoints
(used to track the onboarding process for volunteers) were achieved.
The Leeds solution (for the same system) was pretty arcane, and meant that the
dataset needed to contain a hashed version of the ID field to match with incoming
data.
The new solution reduces the information stored along with the hash to a list of
checkpoints and dates.
The first step in extracting the data from the system is to establish whether any of the
hashes have a new checkpoint and, if so, to append the updates to the list of state dates.
This avoids some unpleasant round-tripping on the data.
Once this is done, the (possibly updated) state data is read and turned into a lookup
returning the states as a dictionary.
This dictionary replaces the hash, meaning that this potentially personal and disclosive
data is removed from the dataset.</p>
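<p>A minimal sketch of that flow (illustrative Python; the names and storage format are my assumptions, not the actual pipeline):</p>

```python
import hashlib
from datetime import date

def hash_id(raw_id: str) -> str:
    # One-way hash so the raw ID never needs to appear in the dataset
    return hashlib.sha256(raw_id.encode()).hexdigest()

def update_states(states, incoming, today=None):
    """Record the date each checkpoint is first seen for each hashed ID.

    `states` maps hashed ID -> {checkpoint: date first seen};
    `incoming` maps raw ID -> list of checkpoints currently achieved.
    """
    today = today or date.today().isoformat()
    for raw_id, checkpoints in incoming.items():
        seen = states.setdefault(hash_id(raw_id), {})
        for checkpoint in checkpoints:
            # Only set the date the first time a checkpoint appears
            seen.setdefault(checkpoint, today)
    return states
```

<p>The published dataset then carries the checkpoint-to-date dictionary in place of the hash, so the potentially disclosive identifier never leaves the pipeline.</p>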
<p>The hush-hush project also progressed, and I was able to play back the data work
to the client, with some excellent feedback. We have a pilot organisation workshop
next week, and will hopefully be able to speak more about the project after that.
The data science work that I mentioned in <a href="https://dringtech.com/blog/2024/weeknotes-2024-W38/">last week's weeknotes</a>
has progressed, with a fuzzy match of part of the Companies House data.
The next step on that is to add in the identification of personal names.</p>
<p>Finally, today, I met with a digital marketer who is working with one of my clients.
He needed some help getting the Google search console linked in to GA-4, which needed
a bit of tinkering and archaeology to work out where their DNS records were managed!
Will be interesting to keep abreast of this work and see how digital marketing data
is linked together.</p>
<p>A few links:</p>
<ul>
<li>An excellent <a href="https://freeradical.zone/@missiggeek/113170518046838948">visualisation / flowchart of what is considered personal data from missiggeek on Mastodon</a></li>
<li>A brilliant <a href="https://web.archive.org/web/20240919122353/https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web">article about what ChatGPT is, with analogy to lossy compression</a></li>
<li>The <a href="https://h3geo.org/">H3 hexagonal based coordinate system</a> developed by Uber for geospatial analysis.</li>
</ul>
]]>
      </content:encoded>
      <pubDate>Fri, 27 Sep 2024 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Weeknotes 2024-W38</title>
      <link>https://dringtech.com/blog/2024/weeknotes-2024-W38/</link>
      <guid isPermaLink="false">https://dringtech.com/blog/2024/weeknotes-2024-W38/</guid>
      <description>
Project kickoffs, progress and some honest-to-goodness data science. Also, multiple examples of DNS being to blame for stuff.
      </description>
      <content:encoded>
        <![CDATA[<p>This week, the Bradford 2025 data site work kicked off.
An excellent meeting where we collaboratively agreed on the areas of focus.
We're already expanding on the Leeds 2023 work, so I'm hopeful we'll be building
some solid culture infrastructure for the team, and legacy for the broader region.</p>
<p>The other project is progressing well too, although is still under a communications
blackout, hence me not mentioning what it is explicitly!
I've had cause to use some actual data science techniques in building an exciting
bit of data infrastructure. Firstly, I'm matching very large datasets
(several million rows in the reference), and this was slowing things down.
Chucking this into DuckDB rather than processing with PETL or Pandas totally sped
this process up.
Right tool for the job.
As an added bonus, the DuckDB bindings for Python have a <code>.df()</code> method which returns the
result set as a Pandas dataframe. Winner!
I've also used some fuzzy matching using <code>thefuzz</code> to identify spelling mistakes in the
source dataset and enhance matching.
Next up will be starting to try to identify personal names in the source data so that I
can separate organisations from individuals, and handle them separately.
I've identified <a href="https://en.wikipedia.org/wiki/Named-entity_recognition">Named-entity recognition (NER)</a>
as the means to do this, and early experiments using the NLTK NER modules are looking
promising. Quite a lot of tuning to be done here.
Finally, I'm going to extend the simple DuckDB SQL matching to take account of fuzzy matches
between the source and reference sets.</p>
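<p>The fuzzy-matching idea can be sketched with the standard library's <code>difflib</code> as a stand-in for <code>thefuzz</code> (which exposes a similar 0-100 ratio); names here are my own, not the project code:</p>

```python
from difflib import SequenceMatcher

def fuzzy_ratio(a: str, b: str) -> int:
    # 0-100 similarity score, similar in spirit to thefuzz's fuzz.ratio
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)

def best_match(name, reference, threshold=85):
    """Return the closest reference name, or None below the threshold."""
    score, match = max((fuzzy_ratio(name, ref), ref) for ref in reference)
    return match if score >= threshold else None
```

<p>A misspelling like <q>Acme Lmited</q> scores well above the threshold against <q>Acme Limited</q>, so it resolves to the canonical reference name.</p>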
<p>In other work, I spent quite a lot of time on network fixing. One of my clients, for whom
I've set up a Sentinel license server running on a laptop accessed via a VPN, got a
new router, so I had to open up the VPN ports again. Annoyingly I hadn't written down
exactly what I did, so it took a bit longer than it should have to work out that the port
mapping (51820/udp) shouldn't be limited! Anyway, sorted now.
Another client had a major DNS failure when one of their suppliers decided to turn off a cPanel
server that they had been using, and no longer needed. Unfortunately, their main domain was
associated with this service, and when it went away, everything broke. Badly.
That took a couple of hours to unpick and get back on the road to recovery.
And for the hat-trick, a data microsite that OI had created had its DNS records removed,
so stopped working.
Thankfully, I was able to direct them to fix it fairly quickly, and managed to find the
original email trail from two years ago.</p>
<p>All of this left me thinking that for many small (and not-so-small) businesses this network
configuration was something close to magic. I have begun work on a workbook, focussed on small
businesses, which collects important data about the critical infrastructure that keeps their
website, email and other important services up-and-running. It would encourage the business
owners to review this data frequently so that (for example) credit cards don't run out and
lead to a break in service. Furthermore, it would contain advice about practical matters,
such as which email addresses to use as the primary contact for critical accounts to avoid being locked
out in the event something goes wrong.</p>
]]>
      </content:encoded>
      <pubDate>Fri, 20 Sep 2024 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Weeknotes 2024-W37</title>
      <link>https://dringtech.com/blog/2024/weeknotes-2024-W37/</link>
      <guid isPermaLink="false">https://dringtech.com/blog/2024/weeknotes-2024-W37/</guid>
<description>Some thoughts on how and why to automate and encapsulate stuff, and an idea for a business.</description>
      <content:encoded>
        <![CDATA[<p>Full disclosure, I'm almost a week late writing these, so they'll be a bit patchy!</p>
<p>A couple of big thoughts: one about value of various types of work and
one about a business model that might be worth exploring.</p>
<p>I was pondering the things that we spend time on in projects and things we don't.
In an ideal world the balance between time spent on the hard technical stuff
and time spent on the even harder people-related stuff should tip towards the
people-related. I've invested a lot of time in ensuring I know how to make the
techy stuff get out of the way. You want a website: I know how to build it.
You want some visualisations? I know how to do that, and moreover, I've
<a href="https://open-innovations.github.io/oi-lume-viz/">invested time with the people I work with</a>
on making it easy for anyone in the team to create a pretty good map, or chart
or whatever. Pipelines? I've got patterns that help me make robust and repeatable
data processing pipelines pretty fast.
This means that we can prototype at speed and get stuff done.
However, there are times, particularly in more open-ended projects,
when delivering fast is not really desirable. This isn't because we need to spend
longer building the tech, but because we need to spend more
time dealing with the <em>really</em> interesting stuff, which tends to be squishy, human
stuff. That might be researching, pondering, discussing with colleagues, exploring
prior art. Sometimes, delivering fast is a sure-fire way to miss the point.</p>
<p>I'm sure at various points in the week those thoughts were better defined,
so I might loop back to them and see if I can tighten them up a bit.</p>
<p>The other 'biggish thought' I had was about the idea of founding a co-op which
delivers IT services to other co-ops. This came out of a tech incident that I was
called upon to help out with at Equal Care Co-op. They are a small team, and don't
have a dedicated IT Service team. This is arguably problematic given the dependence
that people are beginning to have on the tech. Getting a full service team, with
cover dedicated simply to Equal Care, is likely to be beyond their means, and
it feels like signing up with a commercial organisation might not mesh well with
their ethos and instruments. Maybe a like-minded team, constructed to service a
number of organisations could have the economies of scale to deal with this.
Some initial challenges are seed funding the organisation, defining the offer
and recruiting both service delivery team and the client organisations.
Would this be a co-op of co-ops: each providing their IT folk to support others?
Would there be a TUPE element to this if acting as a de facto outsourcing organisation?
More to think about, but worth developing.</p>
<p>That'll do for now: it's nearly time for the next weeknote!</p>
]]>
      </content:encoded>
      <pubDate>Fri, 13 Sep 2024 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Weeknotes 2024-W36</title>
      <link>https://dringtech.com/blog/2024/weeknotes-2024-W36/</link>
      <guid isPermaLink="false">https://dringtech.com/blog/2024/weeknotes-2024-W36/</guid>
      <description>Progress on the cultural-sector projects I'm working on and some R&amp;D, including reviving Open Audience.</description>
      <content:encoded>
        <![CDATA[<p>This week has seen
further clarification on scope for the project that started when I was on holiday,
progress on the contract for the first Bradford 2025 data delivery,
the start of the refresh of <a href="https://openaudience.org/">Open Audience</a>
and a bunch of R&amp;D on optimising SvelteKit.</p>
<p>One thing that I've struggled with a bit on the new project is lack of clarity on the direction of travel.
A colleague (hi, Sarah!) describes this as the North Star, which is great imagery:
I just need to know that I'm heading in broadly the right direction.
I'd raised the idea of writing down some (possibly / probably wrong) hypotheses that we could
use to help set this direction without being too prescriptive.
The same colleague then shared this excellent
<a href="https://benholliday.com/2015/07/16/everything-is-hypothesis-driven-design/">blog post by Ben Holliday about hypothesis driven design</a>
which pretty well encapsulates my reasons for wanting to write something down.
We (me and Sarah) have been working through some high-level / very broad hypotheses
and realised we can use these to identify some lower-level / more fine-grained hypotheses
which can really shape the work.
I've come to realise this is slightly in tension with the design &quot;double diamond&quot;,
which aims for breadth in the first instance.
My response to this is that without
the direction setting, the initial diamond risks effort radiating in all directions from
a point, resulting in a circle. The hypotheses set enough of a direction
to decide what to consider in the first diamond.
I may write this up soon.</p>
<p>My SvelteKit optimisation has resulted in a much better understanding of
how to control what the bundler creates. This is captured in a repo, ready to be
written up. As I was doing this, I was also implementing some of the recommendations
in the Social Value site I'm building. This uses DuckDB-WASM on the client side,
which is excellent... a full OLAP database in a browser? Crikey!
It is, however, quite costly on the network, as it has to download 35MB of Web Assembly code
to start the database engine up.
In a search for an alternative, I've come across <a href="https://oakserver.org/acorn">the @oak/acorn framework</a>
which will allow me to serve just the JSON. Much better for this use case.
I have, however, discovered that I cannot host this on <a href="https://deno.com/deploy">Deno Deploy</a>,
which is my go-to platform.</p>
<p>The final bit of work is refreshing <a href="https://openaudience.org/">Open Audience</a>, a tool
which <a href="https://tomforth.co.uk/">Tom Forth</a> built using 2011 Census data which
can build a profile of attendees at events based on their postcodes.
This has been buzzing around some of the culture work that I've been involved in for a while
and given we now have 2021 Census data, I thought I'd rebuild at least the dataset.
The data has changed ever so slightly, so it might not be possible to recreate completely,
and I might need to rethink the frontend.
I have some other ideas, including:</p>
<ul>
<li><strong>Open Audience as a service</strong>: An API which provides the profile based on postcodes</li>
<li><strong>Open Audience language bindings</strong>: Wrappers in Python / Javascript to allow the data to be used easily in pipelines and web-pages</li>
</ul>
<p>I did have a look at <a href="https://storybook.js.org/">Storybook</a>, and it looks promising as a way of
documenting web design libraries. I'm also wondering if there's a way of using it as an SSG,
as I suspect it might be overkill for some of the work.</p>
<p>Finally, I've added an <a href="https://dringtech.com/feed.rss">RSS feed for this site</a>, so you can read my witterings in your favourite feed reader.
I like <a href="https://netnewswire.com/">NetNewsWire</a>, FWIW, but there are many others.</p>
<p>Plans for next week:</p>
<ul>
<li>Refine the culture project hypotheses</li>
<li>Blog about the SvelteKit optimisations</li>
<li>Research adding <em>ActivityPub</em> to this site.
There's a really nice (albeit incomplete!) <a href="https://maho.dev/2024/02/a-guide-to-implement-activitypub-in-a-static-site-or-any-website/">series of blog posts by Maho Pacheco</a>
covering how to do this.</li>
</ul>
<p>Some links I came across:</p>
<ul>
<li><a href="https://landsat.gsfc.nasa.gov/apps/YourNameInLandsat-main/index.html">Your name in Landsat images</a></li>
<li><a href="https://www.map.signalbox.io/?location=@53.69127,-1.99726,9.922Z">Signalbox live train locations</a>. NB Proved not to be that accurate when I tested it from an actual train!</li>
<li>A really nice <a href="https://digital-land.github.io/blog-post/open-data-and-the-planning-data-platform/">blog post about the importance of clear Open Data licensing by Mike Rose and Kieran Wint</a>.</li>
</ul>
<p>Finally, I really like this pull quote by Ted Chiang, shared in this <a href="https://mas.to/@gleick/113058537194470078">toot by James Gleick on Mastodon</a>:</p>
<blockquote>
<p>The programmer Simon Willison has described the training for large language models as
‘money laundering for copyrighted data,’ which I find a useful way to think about the
appeal of generative-A.I. programs: they let you engage in something like plagiarism,
but there’s no guilt associated with it because it’s not clear even to you that you’re copying.<br>
<em>Source <a href="https://www.newyorker.com/culture/the-weekend-essay/why-ai-isnt-going-to-make-art">https://www.newyorker.com/culture/the-weekend-essay/why-ai-isnt-going-to-make-art</a></em>.</p>
</blockquote>
]]>
      </content:encoded>
      <pubDate>Fri, 06 Sep 2024 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Weeknotes 2024-W35</title>
      <link>https://dringtech.com/blog/2024/weeknotes-2024-W35/</link>
      <guid isPermaLink="false">https://dringtech.com/blog/2024/weeknotes-2024-W35/</guid>
      <description>I start weeknoting (again!), do some bikeshedding and come back from holiday to a range of projects.</description>
      <content:encoded>
        <![CDATA[<p>I've decided to start writing weeknotes for myself.
In part to keep track of <em>all the stuff</em> but also to ensure I keep in the habit of writing.
As with all projects, this requires a degree of
<a href="https://en.wiktionary.org/wiki/bikeshedding#English">bikeshedding</a> and / or
<a href="https://en.wiktionary.org/wiki/yak_shaving#English">yak shaving</a>.</p>
<p>Given one of the hardest things to do in tech is deciding on the name of things,
my first task is coming up with distinct names for each of what I assume will be an almost endless collection of blog posts.
I briefly considered naming them like Friends episodes,
but this could quite quickly descend into farce.
I decided on simply the year and <a href="https://www.calendar-365.com/week-number.html">week number</a>,
according to the ISO standard of weeks starting on Monday.
I note from the link above that Monday is Labor Day in the US,
which I think has something to do with wearing white shoes.</p>
<p>This week has been mostly return from holiday and getting my head round
a couple of culture-related projects that I'm involved in as a freelancer with <a href="https://open-innovations.org/">Open Innovations</a>.
The first is the Bradford 2025 City of Culture,
for which I wrote an <a href="https://open-innovations.github.io/bradford-2025/strategy/">Open Data Strategy</a> just before my break.
I've agreed the scope of the next piece of work to start delivering on it.
It builds on the <a href="https://data.leeds2023.co.uk/">LEEDS 2023 Data Microsite</a> that I built with the OI team.
The other project is along the same lines, but much broader, and earlier in the discovery phase.
I consequently spent a fair amount of time reading what I could lay my hands on
and hypothesising about what we might build.
Very happy to be wrong about assumptions at this stage!</p>
<p>Away from Open Innovations,
I've finally broken the back of a website upgrade for the
<a href="https://hebdenbridgepicturehouse.co.uk/">Hebden Bridge Picture House</a> which should enable me to upgrade from (out of support) PHP 7.3.
It turns out that some changes in the Textpattern software have invalidated the way I built the site.
The Textpattern community were very helpful,
responding to <a href="https://forum.textpattern.com/viewtopic.php?id=52408">a topic I posted in the Textpattern forums</a> with some really useful suggestions.</p>
<p>I've also been refreshing a prototype site I'm building for <a href="https://www.chyconsultancy.com/">social value consultants CHY Consultancy</a>.
I've been mindful of recent discussions about JavaScript framework bloat, and I agree wholeheartedly with the criticism.
I really like static site generators, and tend to use <a href="https://lume.land/">Lume</a> as my weapon of choice.
In fact, this very post is built with it.
There are times when having a framework that can be easily extended to do more powerful client-side stuff is useful,
but I really don't like the likes of React, Angular, et al.
I've long been a fan of <a href="https://svelte.dev/">Svelte</a> for componentised client-side code, particularly where the need is a bit more heavyweight.
The prototype uses <a href="https://kit.svelte.dev/">SvelteKit</a>, which builds on this beautifully.
By setting a combination of the <a href="https://kit.svelte.dev/docs/page-options"><code>prerender</code>, <code>csr</code> and <code>ssr</code> options</a> per page,
it's even possible to <a href="https://kit.svelte.dev/docs/adapter-static">generate a static site</a>.
Prerendering can be turned off for any pages which need live server-side rendering (e.g. dynamic pages based on routing parameters)
or client-side processing (e.g. highly interactive islands on a web page).
Anyway, I've been tinkering to make this perform well.</p>
<p>Stuff added to the list to do or look at next week:</p>
<ul>
<li><a href="https://storybook.js.org/">Storybook</a> frontend UI workshop.</li>
<li>Draft a blog post about Generative AI, inspired by a very long and rambling pub chat that I had with a friend last Sunday night.</li>
<li>Draft a blog post about using SvelteKit to make efficient websites.</li>
</ul>
<p>Finally, and in the spirit of full disclosure,
I have just spent 20 minutes automating appending the week of the year to the post title
as a <a href="https://lume.land/docs/core/processors/">Lume preprocessor</a>.
It uses the pretty handy <a href="https://date-fns.org/"><code>date-fns</code></a> library, with the format string <code>RRRR-'W'II</code>.
This is next-level yak shaving.</p>
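As a sanity check of that format string: GNU <code>date</code> has equivalent specifiers, <code>%G</code> for the ISO week-numbering year (date-fns' <code>RRRR</code>) and <code>%V</code> for the ISO week number (<code>II</code>). A quick sketch, assuming GNU coreutils:

```shell
# ISO week-numbering year and week, matching date-fns' RRRR-'W'II
date -u -d '2024-08-30' '+%G-W%V'   # 2024-W35

# Note RRRR/%G rather than yyyy/%Y: around New Year the ISO
# week-numbering year can differ from the calendar year.
date -u -d '2024-12-30' '+%G-W%V'   # 2025-W01
```

The second example is why <code>RRRR</code> matters: 30 December 2024 falls in week 1 of 2025.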
]]>
      </content:encoded>
      <pubDate>Fri, 30 Aug 2024 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Kitchen sync</title>
      <link>https://dringtech.com/blog/2024/kitchen-sync/</link>
      <guid isPermaLink="false">https://dringtech.com/blog/2024/kitchen-sync/</guid>
      <description>
        Automatically building and deploying a static website made simple. A few useful tips and pointers concerning use of rsync in a secure way, and dealing with differing user identities.
      </description>
      <content:encoded>
<![CDATA[<p>In my work with <a href="https://open-innovations.org/">Open Innovations</a> (and elsewhere), I frequently create static websites. These suit the work as they don't need much in the way of hosting. Most of the production sites are hosted on GitHub Pages, which works really well. The slight drawbacks are the inability to password-protect pages and the quite reasonable limitation of one GitHub Pages site per repo.
Recently the sites have been getting more complex, with longer-running development processes, so I decided that it was time to host a <strong>dev</strong> version. Luckily, OI has a cloud server running Apache, so all I needed to do was upload the built site to an appropriate directory.</p>
<p>In my goal of automating all the things, I wanted to make this happen whenever anyone pushed to the <code>dev</code> branch in the repository.
It's pretty simple to <a href="https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-including-branches-and-tags">trigger a workflow on a push to a specific branch</a>.
The question was what to run to transfer the files.</p>
<p>The answer to this question is <code>rsync</code>, which you can read all about on the <a href="https://en.wikipedia.org/wiki/Rsync">Wikipedia page for <code>rsync</code></a>. In short, this venerable tool allows files to be transferred and synchronised over connection protocols such as SSH.</p>
<p>The actions pipeline now has the following stages:</p>
<ol>
<li>Build the site into the build folder using the site builder. In my case the builder is <a href="https://lume.land/"><code>lume</code></a>, which deposits the compiled site in <code>_site</code>.</li>
<li>Use <code>rsync</code> to transfer the build folder to the dev host.</li>
</ol>
<p>Step 1 is easy enough, and in any case out of scope for this post! Step 2 needs a bit of careful thought.</p>
<p>The basic incantation is</p>
<pre><code class="language-bash">rsync --recursive --delete $SOURCE_PATH $SSH_HOST:$SSH_PATH
</code></pre>
<p>This recurses through the source path, uploading any new or changed files and deleting any orphans. The source path should end in a <code>/</code> to avoid including the directory itself! Using the environment variables <code>SOURCE_PATH</code>, <code>SSH_HOST</code> and <code>SSH_PATH</code> means that the configuration can be altered and reused for multiple potential targets, which could be useful.
You can <a href="https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#env">set environment variables in GitHub workflows</a>.</p>
<p>When using <code>rsync</code> interactively, it's usual to run as a personal user account, for which SSH credentials (keys, etc.) are likely already set up.
It's not, however, a great idea to use a personal account in an automated pipeline, so I created a locked-down <strong>bot</strong> user.
Passing credentials into a GitHub Actions environment is also slightly fiddly.
It's theoretically possible to set up SSH keys and a config file, but <code>rsync</code> allows a slightly easier approach: the <code>--rsh</code> option, which allows exact specification of the remote shell command.</p>
<pre><code class="language-bash">rsync ... \
  --rsh=&quot;sshpass -e ssh -o StrictHostKeyChecking=no -l $SSH_USER&quot; \
  ...
</code></pre>
<p>This allows the SSH password to be provided in the <code>SSHPASS</code> environment variable (managed via the <a href="https://www.redhat.com/sysadmin/ssh-automation-sshpass"><code>sshpass -e</code></a> command).
It also specifies the user to connect with (<code>SSH_USER</code>) and allows overriding other <code>ssh</code> options such as <code>StrictHostKeyChecking</code>.</p>
<p>So now we have a working sync command which signs in as our bot user. All is not well, however, as the uploaded files and directories are owned by the bot user.
Given these are web content, we'd ideally like them to be owned by the <code>www-data</code> user with a group ownership of <code>www-data</code>.
Thankfully we can provide another option, <code>--rsync-path</code>, which defines the command that is run in the shell created by the connection.</p>
<pre><code class="language-bash">rsync ... \
  --rsync-path=&quot;sudo -u www-data rsync&quot; \
  ...
</code></pre>
<p>This will run the remote rsync command as the user <code>www-data</code>, meaning that the files are written with appropriate ownership and permissions.</p>
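For the <code>sudo -u www-data rsync</code> call to work non-interactively, the bot account needs a matching sudoers rule on the server. Something along these lines should do it (the user name <code>deploybot</code> and the rsync path are assumptions; check the real path with <code>which rsync</code>):

```
# /etc/sudoers.d/deploybot-rsync
# Allow the bot to run rsync (and nothing else) as www-data, with no password
deploybot ALL=(www-data) NOPASSWD: /usr/bin/rsync
```

Editing this via <code>visudo -f /etc/sudoers.d/deploybot-rsync</code> guards against a syntax error locking you out.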
<p>The cherry on the cake is specifying the <code>--info</code> flag to write a report on completion of the sync. The final command is:</p>
<pre><code class="language-bash">rsync \
  --rsh=&quot;sshpass -e ssh -o StrictHostKeyChecking=no -l $SSH_USER&quot; \
  --rsync-path=&quot;sudo -u www-data rsync&quot; \
  --info=STATS2 --recursive --delete \
  $SOURCE_PATH $SSH_HOST:$SSH_PATH
</code></pre>
<p>Wrapping this up in a GitHub Actions script is fairly simple using the <a href="https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun">actions file <code>run</code> directive</a>. Of course, it can also be packaged in another runner, such as a <code>deno.json</code> task or an NPM script. This is left as an exercise for the reader!</p>
<p>I hope that this has been helpful. I'm sure future me will also be thankful!</p>
]]>
      </content:encoded>
      <pubDate>Thu, 09 May 2024 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>One hundred duck-sized horses</title>
      <link>https://dringtech.com/blog/2023/one-hundred-duck-sized-horses/</link>
      <guid isPermaLink="false">https://dringtech.com/blog/2023/one-hundred-duck-sized-horses/</guid>
      <description>
        I've been exploring the capabilities of DuckDB to query data from partitioned parquet files. This blog post collects some useful hints in getting this working with DuckDB-WASM for use in client-side javascript.
      </description>
      <content:encoded>
        <![CDATA[<blockquote>
<p><a href="https://knowyourmeme.com/memes/horse-sized-duck">Would you rather fight 100 duck-sized horses or one horse-sized duck?</a></p>
</blockquote>
<p>I’ve recently been trying out the capabilities of DuckDB to drive visualisations. There’s something quite astounding about writing SQL in client-side JavaScript.
My current platform of choice is Svelte. I’ve come up with some patterns for using DuckDB within a Svelte app — to be written up another day.</p>
<p>Suffice it to say that the official <a href="https://duckdb.org/docs/archive/0.9.1/api/wasm/instantiation">DuckDB WASM client setup guide</a> is a great start.
The basic pattern is to prepare parquet files with the data, and query those via DuckDB in the following format:</p>
<pre><code class="language-sql">SELECT
  strftime(date, '%x') AS date,
  value
FROM 'data.parquet'
WHERE code == 'The Code'
ORDER BY date;
</code></pre>
<p>The only prerequisite is that the parquet files need to be registered when the database connection is made:</p>
<pre><code class="language-javascript">await db.registerFileURL(
  'data.parquet',
  'data.parquet',
  DuckDBDataProtocol.HTTP,
  false
);
</code></pre>
<p>So far, so good, but if the parquet files are large, it'd be nice to take advantage of partitioning to avoid shipping the entire file to the browser.
It's pretty easy to create a partitioned dataset using libraries such as pandas:</p>
<pre><code class="language-python">df.to_parquet(path='data/', partition_cols=['variable'])
</code></pre>
<p>We now have a dataset partitioned by variable name, and can in theory write queries as follows:</p>
<pre><code class="language-sql">SELECT
  strftime(date, '%x') AS date,
  value
FROM 'data/**/*.parquet'
WHERE code == 'The Code'
ORDER BY date;
</code></pre>
<p>The slight wrinkle is that you still need to register each file as before.
It appears that <code>registerFileURL</code> doesn't support wildcards, so each file has to be registered independently.
Having discovered this, I decided that a manifest file would be a sensible way of dealing with the potentially very large number of files that need to be registered.
A simple way to create this is using shell commands and <code>jq</code>.</p>
<pre><code class="language-sh">find data/ -type f |\
  jq --raw-input --slurp 'split(&quot;\n&quot;) | map(select(length &gt; 0))' &gt; manifest.json
</code></pre>
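The manifest step can be checked end-to-end without any real parquet files; a <code>map(select(length > 0))</code> filter drops the empty string that splitting on the trailing newline would otherwise leave behind:

```shell
# Fake a hive-style partition layout, as pandas would write it
mkdir -p 'data/variable=a' 'data/variable=b'
touch 'data/variable=a/part-0.parquet' 'data/variable=b/part-0.parquet'

# Build the manifest, filtering out the trailing empty entry
find data/ -type f -name '*.parquet' |
  jq --raw-input --slurp 'split("\n") | map(select(length > 0))' > manifest.json

jq length manifest.json   # 2
```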
<p>I then fetch the manifest and register each parquet file in the JSON array as follows:</p>
<pre><code class="language-js">const manifest = await fetch('manifest.json').then(r =&gt; r.json());
await Promise.all(
  manifest.map(p =&gt; db.registerFileURL(p, p, DuckDBDataProtocol.HTTP, false))
);
</code></pre>
<p>Here are the rough results for a simple test database that I set up.
This is running on my local network, so the impact on a slower network would be greater.
The whole database is 2.5 MB as a single parquet file, which is already a massive saving on the source 18 MB CSV file.
It's worth noting that subsequent calls were much faster.</p>
<table>
<thead>
<tr>
<th style="text-align:right">Measurement</th>
<th style="text-align:center">Monolithic parquet</th>
<th style="text-align:center">Partitioned parquet</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:right">Network transfers</td>
<td style="text-align:center">11 requests</td>
<td style="text-align:center">6 requests</td>
</tr>
<tr>
<td style="text-align:right">Network payload</td>
<td style="text-align:center">3.6 MB</td>
<td style="text-align:center">81.3 kB</td>
</tr>
<tr>
<td style="text-align:right">Time for first query</td>
<td style="text-align:center">659 ms</td>
<td style="text-align:center">278 ms</td>
</tr>
<tr>
<td style="text-align:right">For next query</td>
<td style="text-align:center">~100 ms</td>
<td style="text-align:center">~20 ms</td>
</tr>
</tbody>
</table>
<p>Limitations I ran into, each of which could do with a bit more digging:</p>
<ol>
<li>The libraries I was using don't seem to allow more than 1024 partitions to be created.</li>
<li>I tried using Brotli compression, but the DuckDB WASM library didn't seem to like it.</li>
<li>Within some build systems, I sometimes had to manipulate the URL, prefixing it with the server URL.</li>
</ol>
]]>
      </content:encoded>
      <pubDate>Mon, 06 Nov 2023 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>