godforsaken.website is one of the many independent Mastodon servers you can use to participate in the fediverse.
godforsaken.website is a uk-based mastodon instance boasting literally thousands of posts about bumholes and UNESCO world heritage sites

#epub

6 posts · 6 participants · 0 posts today

Open source software / service idea: a service that detects and replaces the fonts in EPUB files.

It could work offline using the browser's local storage (serialising the EPUB into IndexedDB).

If you know of such a service, please share!

#edition #libre #epub
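The detection half of that idea can be sketched in a few lines. This is only an illustration, assuming the EPUB's stylesheets have already been read as text; the function name `listEmbeddedFonts` is hypothetical, and a regular expression is a rough stand-in for a real CSS parser.

```javascript
// Sketch: list the font families declared in @font-face rules of a stylesheet.
// Assumes cssText is the text of one CSS file taken from an EPUB.
function listEmbeddedFonts(cssText) {
  const families = new Set();
  // Capture the body of each @font-face block (no nested braces expected).
  const faceRule = /@font-face\s*\{([^}]*)\}/gi;
  for (const match of cssText.matchAll(faceRule)) {
    const family = /font-family\s*:\s*([^;]+)/i.exec(match[1]);
    if (family) {
      // Trim whitespace and strip surrounding quotes from the family name.
      families.add(family[1].trim().replace(/^["']|["']$/g, ""));
    }
  }
  return [...families];
}
```

Replacing a font would then mean rewriting the matching `src` entries and swapping the font files, which this sketch does not attempt.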

Extracting content from an LCP "protected" ePub

shkspr.mobi/blog/2025/03/towar

As Cory Doctorow once said "Any time that someone puts a lock on something that belongs to you but won't give you the key, that lock's not there for you."

But here's the thing with the LCP DRM scheme; they do give you the key! As I've written about previously, LCP mostly relies on the user entering their password (the key) when they want to read the book. Oh, there's some deep cryptographic magic in the background but, ultimately, the key is sat on your computer waiting to be found. Of course, cryptography is Very Hard™ which makes retrieving the key almost impossible - so perhaps we can use a different technique to extract the unencrypted content?

One popular LCP app is Thorium. It is an Electron Web App. That means it is a bundled browser running JavaScript. That also means it can trivially be debugged. The code is running on your own computer, it doesn't touch anyone else's machine. There's no reverse engineering. No cracking of cryptographic secrets. No circumvention of any technical control. It doesn't reveal any illegal numbers. It doesn't jailbreak anything. We simply ask the reader to give us the content we've paid for - and it agrees.

Here Be Dragons

This is a manual, error-prone, and tiresome process. This cannot be used to automatically remove DRM. I've only tested this on Linux. It must only be used on books that you have legally acquired. I am using it for research and private study.

This uses Thorium 3.1.0 AppImage.

First, extract the application:

./Thorium-3.1.0.AppImage --appimage-extract

That creates a directory called squashfs-root which contains all the app's code.

The Thorium app can be run with remote debugging enabled by using:

./squashfs-root/thorium --remote-debugging-port=9223 --remote-allow-origins=*

Within the Thorium app, open up the book you want to read.

Open up Chrome and go to http://localhost:9223/ - you will see a list of Thorium windows. Click on the link which relates to your book.

In the Thorium book window, navigate through your book. In the debug window, you should see the text and images pop up.

In the debug window's "Content" tab, you'll be able to see the images and HTML that the eBook contains.

Images

The images are the full resolution files decrypted from your ePub. They can be right-clicked and saved from the developer tools.

Files

An ePub file is just a zipped collection of files. Get a copy of your ePub and rename it to whatever.zip then extract it. You will now be able to see the names of all the files - images, css, fonts, text, etc - but their contents will be encrypted, so you can't open them.

You can, however, give their filenames to the Electron app and it will read them for you.

Images

To get a Base64 encoded version of an image, run this command in the debug console:

fetch("httpsr2://...--/xthoriumhttps/ip0.0.0.0/p/OEBPS/image/whatever.jpg")
  .then(response => response.arrayBuffer())
  .then(buffer => {
    let base64 = btoa(
      new Uint8Array(buffer).reduce((data, byte) => data + String.fromCharCode(byte), '')
    );
    console.log(`data:image/jpeg;base64,${base64}`);
  });

Thorium uses the httpsr2 URL scheme - you can find the exact URL by looking at the Content tab.
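The Base64 step above appends one character at a time, which can get slow for large files. A chunked variant is one possible alternative; this is a generic sketch, the helper name `bytesToBase64` and the chunk size are my own choices, and it assumes `btoa` is available (browsers and recent Node.js versions provide it).

```javascript
// Sketch: convert a Uint8Array to Base64 in chunks rather than byte by byte.
// chunkSize is kept small enough to stay within argument-count limits.
function bytesToBase64(bytes, chunkSize = 0x8000) {
  let binary = "";
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // fromCharCode accepts many code units at once via apply().
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}
```

It would be called as `bytesToBase64(new Uint8Array(buffer))` in place of the `reduce` expression.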

CSS

The CSS can be read directly and printed to the console:

fetch("httpsr2://....--/xthoriumhttps/ip0.0.0.0/p/OEBPS/css/styles.css")
  .then(response => response.text())
  .then(cssText => console.log(cssText));

However, it is much larger than the original CSS - presumably because Thorium has injected its own directives in there.

Metadata

Metadata like the NCX and the OPF can also be decrypted without problem:

fetch("httpsr2://....--/xthoriumhttps/ip0.0.0.0/p/OEBPS/content.opf")
  .then(response => response.text())
  .then(metadata => console.log(metadata));

They have roughly the same filesize as their encrypted counterparts - so I don't think anything is missing from them.

Fonts

If a font has been used in the document, it should be available. It can be grabbed as Base64 encoded text to the console using:

fetch("httpsr2://....--/xthoriumhttps/ip0.0.0.0/p/OEBPS/font/Whatever.ttf")
  .then(response => response.arrayBuffer())
  .then(buffer => {
    let base64 = btoa(
      new Uint8Array(buffer).reduce((data, byte) => data + String.fromCharCode(byte), '')
    );
    console.log(`${base64}`);
  });

From there it can be copied into a new file and then decoded.
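The decoding step is not spelled out in the post. One way to do it, assuming Node.js is installed and the console output has been pasted into a text file, is sketched below; `base64ToBytes` and the file names in the usage note are hypothetical.

```javascript
// Sketch: turn pasted Base64 text back into bytes (Node.js).
// Accepts either bare Base64 or a data: URI such as data:image/jpeg;base64,...
function base64ToBytes(text) {
  const payload = text
    .replace(/^data:[^,]*,/, "") // drop an optional data URI prefix
    .replace(/\s+/g, "");        // drop line breaks added by copy and paste
  return Buffer.from(payload, "base64");
}
```

For example, `require("fs").writeFileSync("Whatever.ttf", base64ToBytes(require("fs").readFileSync("font.b64", "utf8")))` would write the decoded file.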

Text

The HTML of the book is also visible on the Content tab. It is not the original content from the ePub. It has a bunch of CSS and JS added to it. But, once you get to the body, you'll see something like:

<body>
    <section epub:type="chapter" role="doc-chapter">
        <h2 id="_idParaDest-7" class="ct"><a id="_idTextAnchor007"></a><span id="page75" role="doc-pagebreak" aria-label="75" epub:type="pagebreak"></span>Book Title</h2>
        <div class="_idGenObjectLayout-1">
            <figure class="Full-Cover-White">
                <img class="_idGenObjectAttribute-1" src="image/cover.jpg" alt="" />
            </figure>
        </div>
        <div id="page76" role="doc-pagebreak" aria-label="76" epub:type="pagebreak" />
        <section class="summary"><h3 class="summary"><span class="border">SUMMARY</span></h3>
         <p class="BT-Sans-left-align---p1">Lorem ipsum etc.</p>
    </section>

Which looks like plain old ePub to me. You can use the fetch command as above, but you'll still get the verbose version of the xHTML.

Putting it all together

If you've unzipped the original ePub, you'll see the internal directory structure. It should look something like this:

├── META-INF
│   └── container.xml
├── mimetype
└── OEBPS
    ├── content.opf
    ├── images
    │   ├── cover.jpg
    │   ├── image1.jpg
    │   └── image2.png
    ├── styles
    │   └── styles.css
    ├── content
    │   ├── 001-cover.xhtml
    │   ├── 002-about.xhtml
    │   ├── 003-title.xhtml
    │   ├── 004-chapter_01.xhtml
    │   ├── 005-chapter_02.xhtml
    │   └── 006-chapter_03.xhtml
    └── toc.ncx

Add the extracted files into that exact structure. Then zip them. Rename the .zip to .epub. That's it. You now have a DRM-free copy of the book that you purchased.
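One packaging detail worth checking: the EPUB container format expects the mimetype file to be the first entry in the archive and to be stored without compression. Not every reading app enforces this, but some zip tools reorder or compress entries by default. The sketch below inspects only the first local file header of an archive; the function name `startsWithStoredMimetype` is hypothetical and Node.js is assumed.

```javascript
// Sketch: check that a zip archive begins with an uncompressed "mimetype"
// entry containing "application/epub+zip". buf is a Node.js Buffer.
const EPUB_MIMETYPE = "application/epub+zip";

function startsWithStoredMimetype(buf) {
  if (buf.length < 30) return false;
  // Local file header signature "PK\x03\x04", read as little-endian.
  if (buf.readUInt32LE(0) !== 0x04034b50) return false;
  const method = buf.readUInt16LE(8);    // 0 means stored (no compression)
  const nameLen = buf.readUInt16LE(26);
  const extraLen = buf.readUInt16LE(28);
  const name = buf.toString("latin1", 30, 30 + nameLen);
  const dataStart = 30 + nameLen + extraLen;
  const data = buf.toString("latin1", dataStart, dataStart + EPUB_MIMETYPE.length);
  return method === 0 && name === "mimetype" && data === EPUB_MIMETYPE;
}
```

It can be run against a file with `startsWithStoredMimetype(require("fs").readFileSync("book.epub"))`, where `book.epub` is a placeholder name.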

BONUS! PDF Extraction

LCP 2.0 PDFs are also extractable. Again, you'll need to open your purchased PDF in Thorium with debug mode active. In the debugger, you should be able to find the URL for the decrypted PDF.

It can be fetched with:

fetch("thoriumhttps://0.0.0.0/pub/..../publication.pdf")
  .then(response => response.arrayBuffer())
  .then(buffer => {
    let base64 = btoa(
      new Uint8Array(buffer).reduce((data, byte) => data + String.fromCharCode(byte), '')
    );
    console.log(`${base64}`);
  });

Copy the output and Base64 decode it. You'll have an unencumbered PDF.

Next Steps

That's probably about as far as I am competent to take this.

But, for now, a solution exists. If I ever buy an ePub with LCP Profile 2.0 encryption, I'll be able to manually extract what I need from it - without reverse engineering the encryption scheme.

Ethics

Before I published this blog post, I publicised my findings on Mastodon. Shortly afterwards, I received a LinkedIn message from someone senior in the Readium consortium - the body which has created the LCP DRM.

They said:

Hi Terence, You've found a way to hack LCP using Thorium. Bravo! We certainly didn't sufficiently protect the system, we are already working on that. From your Mastodon messages, you want to post your solution on your blog. This is what triggers my message. From a manual solution, others will create a one-click solution. As you say, LCP is a "reasonably inoffensive" protection. We managed to convince publishers (even big US publishers) to adopt a solution that is flexible for readers and appreciated by public libraries and booksellers. Our gains are re-injected in open-source software and open standards (work on EPUB and Web Publications). If the DRM does not succeed, harder DRMs (for users) will be tested. I let you think about that aspect

I did indeed think about that aspect. A day later I replied, saying:

Thank you for your message. Because Readium doesn't freely licence its DRM, it has an adverse effect on me and other readers like me.

  • My eReader hardware is out of support from the manufacturer - it will never receive an update for LCP support.
  • My reading software (KOReader) have publicly stated that they cannot afford the fees you charge and will not be certified by you.
  • Kobo hardware cannot read LCP protected books.
  • There is no guarantee that LCP compatible software will be released for future platforms.

In short, I want to read my books on my choice of hardware and software; not yours. I believe that everyone deserves the right to read on their platform of choice without having to seek permission from a 3rd party. The technique I have discovered is basic. It is an unsophisticated use of your app's built-in debugging functionality. I have not reverse engineered your code, nor have I decrypted your secret keys. I will not be publishing any of your intellectual property. In the spirit of openness, I intend to publish my research this week, alongside our correspondence.

Their reply, shortly before publication, contained what I consider to be a crude attempt at emotional manipulation.

Obviously, we are on different sides of the channel on the subject of DRMs. I agree there should be many more LCP-compliant apps and devices; one hundred is insufficient. KOReader never contacted us: I don't think they know how low the certification fee would be (pricing is visible on the EDRLab website). FBReader, another open-source reading app, supports LCP on its downloadable version. Kobo support is coming. Also, too few people know that certification is free for specialised devices (e.g. braille and audio devices from Hims or Humanware). We were planning to now focus on new accessibility features on our open-source Thorium Reader, better access to annotations for blind users and an advanced reading mode for dyslexic people. Too bad; disturbances around LCP will force us to focus on a new round of security measures, ensuring the technology stays useful for ebook lending (stop reading after some time) and as a protection against oversharing. You can, for sure, publish information relative to your discoveries to the extent UK laws allow. After study, we'll do our best to make the technology more robust. If your discourse represents a circumvention of this technical protection measure, we'll command a take-down as a standard procedure.

A bit of a self-own to admit that they failed to properly prioritise accessibility!

Rather than rebut all their points, I decided to keep my reply succinct.

As you have raised the possibility of legal action, I think it is best that we terminate this conversation.

I sincerely believe that this post is a legitimate attempt to educate people about the deficiencies in Readium's DRM scheme. Both readers and publishers need to be aware that their Thorium app easily allows access to unprotected content.

I will, of course, publish any further correspondence related to this issue.

Chrome debug screen.
Terence Eden’s Blog · Extracting content from an LCP "protected" ePub
More from Terence Eden

🆕 blog! “Extracting content from an LCP "protected" ePub”

As Cory Doctorow once said "Any time that someone puts a lock on something that belongs to you but won't give you the key, that lock's not there for you."

But here's the thing with the LCP DRM scheme; they do give you the key! As I've written about previously, LCP mostly relies on the user…

👀 Read more: shkspr.mobi/blog/2025/03/towar

#debugging #drm #ebooks #epub

Continued thread

And done.

I can now extract a full #ePub which was "protected" by Readium's #LCP Profile 2.0.

I haven't found a way to reverse engineer the encryption. Nor have I cracked any cryptographic keys. I haven't misused anyone's computer, nor connected to anyone else's computer and run code there.

I simply politely asked the app to give me the unencrypted files and it did so.

It is a slow, tedious, and manual process. But, crucially, it works.

Will blog it all up next week.

Continued thread

I think you should be interested if you want to publish #essays, #poetry, #articles, #newsletters, news, and essentially anything that could fit the old #blog paradigm.

One significant difference is that the character limit will be discarded for a pixel limit. A post will be a page. A thread may be downloaded as an #epub.

Code snippets will be a first-class entity for tech lit.
Scientific references will be a first-class entity for science lit.
If you're a writer, you will have what you need.

Got my tablet set up to ease out of the Kindle ecosystem. But when I went to download a book from calibre it wanted to open it in Google Books.

So what app do y'all use to read epubs on an Android device because I don't like the idea of uploading my books to Google Books to read.

I finished getting my 170+ #eBooks off #Amazon and into #Calibre and stripped the #DRM from them. As an interesting aside, something that I was aware of but hadn't really considered is I can have Calibre scan news websites daily and create #ePub books to read on my device.

Now I'm just trying to find the right reading app to use on my device. The leading app right now is #Librera, but I miss having shelves / groups in it. But I love the integration of #ProjectGutenberg and #InternetArchive.

https://mastodon.social/@richardrawson/114036725076795944

Mastodon · Richard E. Rawson, Psy.D., MBA (@richardrawson@mastodon.social)
🚨 Amazon is Removing a Key Kindle Feature. Starting February 26, 2025, Amazon will remove the "Download & Transfer via USB" option. This means you won't be able to manually download your purchased Kindle books; you'll have to access them through Amazon's ecosystem. If you value offline reading or personal backups, now's the time to download your books while you still can. #Kindle #Amazon #Ebooks #DigitalOwnership https://www.theverge.com/news/612898/amazon-removing-kindle-book-download-transfer-usb

Now that #Amazon will soon stop offering downloads of #Kindle e-books, it is time

(1) to leave #DRM hell for good, and

(2) to download your own library while you still can and make it DRM-free. How to do that is explained here:

umatechnology.org/how-to-de-dr

UMA Technology · How to De-DRM Kindle Books Using Calibre [2024] - UMA Technology
Guide to De-DRM Kindle Books with Calibre in 2024
Continued thread

Project Gutenberg is a library of over 70,000 free eBooks

Choose among free epub and Kindle eBooks, download them or read them online. You will find the world’s great literature here, with focus on older works for which U.S. copyright has expired. Thousands of volunteers digitized and diligently proofread the eBooks, for you to enjoy.

📚 gutenberg.org
:mastodon: @gutenberg_org

Project Gutenberg · Project Gutenberg is a library of free eBooks.