ClippyKindle package

Submodules

ClippyKindle.DataStructures module

class ClippyKindle.DataStructures.Book(title, author='')

Bases: object

Data structure for storing all highlights/notes/bookmarks for a given book.

__init__(title, author='')

Initialize a Book object.

Parameters
  • title (str) – Title of the book.

  • author (str) – Optional; Author of the book.

cutAfter(cutDate)

removes all data in this Book object that was modified on or after the provided timestamp

Parameters

cutDate (datetime.datetime) – cutoff date for preserving data in this Book

Returns

None

cutBefore(cutDate)

removes all data in this Book object that was modified on or before the provided timestamp

Parameters

cutDate (datetime.datetime) – cutoff date for preserving data in this Book

Returns

None
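The cutoff semantics described for cutAfter and cutBefore can be illustrated with a minimal, self-contained sketch. It does not use the Book class: the item dicts, their "date" key, and the helper names cut_after and cut_before are assumptions made for illustration only. Unlike the documented methods, which modify the Book and return None, these helpers return new lists.

```python
from datetime import datetime

# Hypothetical stand-in items; the real Book stores Highlight/Note/Bookmark objects.
items = [
    {"text": "first", "date": datetime(2021, 1, 1)},
    {"text": "second", "date": datetime(2021, 6, 1)},
    {"text": "third", "date": datetime(2021, 12, 1)},
]


def cut_after(items, cut_date):
    """Drop items dated on or after cut_date (keep strictly earlier ones)."""
    return [item for item in items if item["date"] < cut_date]


def cut_before(items, cut_date):
    """Drop items dated on or before cut_date (keep strictly later ones)."""
    return [item for item in items if item["date"] > cut_date]


cutoff = datetime(2021, 6, 1)
kept_early = cut_after(items, cutoff)   # only items before the cutoff remain
kept_late = cut_before(items, cutoff)   # only items after the cutoff remain
```

Note that an item dated exactly at the cutoff is removed in both cases, matching the "on or after" and "on or before" wording above.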

static fromDict(d)
Returns

A new Book object populated with the values from a provided dict (e.g. read from a JSON file)

getDateRange()

retrieves the datetime of the earliest and latest item (note, highlight, or bookmark) stored in this book (the datetime of an item is the timestamp at which it was originally added to the book)

Returns

(tuple of datetime.datetime objects) first object in tuple is the earliest date, second is the latest (if the book has no items, the earliest will be returned as None, and the latest as the datetime at epoch 0)

getName()

returns a string containing the book’s title and author (if known) e.g. “How to Live on 24 Hours a Day by Arnold Bennett.md”

sort(removeDups)

sorts arrays self.highlights, self.notes, and self.bookmarks. Each array is sorted by (increasing) location in the book (ties are broken by the date recorded). Optionally removes duplicates within each array.

Parameters

removeDups (bool) – set True to remove suspected duplicates within self.notes, self.highlights, self.bookmarks. the oldest item in each set of duplicates is the one preserved (the one last modified)

Returns

None

toCSV()

converts this book object to a CSV file (columns sorted by location in book increasing)

Returns

Array of lists representing each row (can be written to csv file later).
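Since toCSV is described as returning a list of rows rather than writing a file, the rows still have to be serialized. The following is a minimal sketch using Python's standard csv module. The example rows and their column names are hypothetical; the actual columns produced by toCSV are not documented here.

```python
import csv
import io

# Hypothetical rows in the "list of lists" shape described above.
rows = [
    ["type", "loc", "content"],
    ["highlight", 120, "A sentence, with a comma."],
    ["note", 245, "My own remark"],
]


def rows_to_csv_text(rows):
    """Serialize a list of row lists to CSV text using the stdlib csv writer."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv_text(rows)
```

To write to disk instead, the same writer can be pointed at a file opened with newline="" as the csv module documentation recommends.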

toDict()

converts this book object to a dict (which can be jsonified later)

Returns

A dict storing all the data in this book.

Return type

(dict)

class ClippyKindle.DataStructures.Bookmark(loc, locType, date)

Bases: object

Data structure for storing info about a single bookmark

__init__(loc, locType, date)

Bookmark class constructor

Parameters
  • loc (int) – page or location value this bookmark was made at

  • locType (str) – “page” or “location” (identifies what location type this bookmark uses)

  • date (datetime.datetime) – date this bookmark was made

static fromDict(d)
Returns

A new Bookmark object populated with the values from a provided dict (created with toDict())

Return type

(Bookmark)

isDuplicate(other)

returns true if provided Bookmark object can be considered a duplicate of this object

Parameters

other (Bookmark) – other Bookmark object to compare this object to

Returns

true or false.

Return type

(bool)

toDict()
Returns

Dict representing this object.

Return type

(dict)
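The toDict()/fromDict() pairing is meant to survive a trip through JSON. The sketch below shows that round trip with a stand-in class, not the real Bookmark. The class name BookmarkSketch, the "locType" key, and the date format string are assumptions; only the "loc" and "dateStr" field names appear elsewhere in this reference.

```python
import json
from dataclasses import dataclass
from datetime import datetime

# Assumed date format; the format actually used by ClippyKindle is not documented here.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


@dataclass
class BookmarkSketch:
    """Hypothetical stand-in mirroring the documented constructor arguments."""
    loc: int
    locType: str
    date: datetime

    def toDict(self):
        # datetime objects are not JSON serializable, so the date is stored as a string.
        return {
            "loc": self.loc,
            "locType": self.locType,
            "dateStr": self.date.strftime(DATE_FORMAT),
        }

    @staticmethod
    def fromDict(d):
        return BookmarkSketch(
            d["loc"], d["locType"], datetime.strptime(d["dateStr"], DATE_FORMAT)
        )


original = BookmarkSketch(42, "page", datetime(2021, 3, 14, 15, 9, 26))
# Round trip: object -> dict -> JSON text -> dict -> object
restored = BookmarkSketch.fromDict(json.loads(json.dumps(original.toDict())))
```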

ClippyKindle.DataStructures.GCS(string1, string2)
Returns

The greatest (longest) common substring between two provided strings (returns empty string if there is no overlap)

Return type

(str)
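One way to compute a longest common substring with the behavior described above is via difflib from the standard library. This is a sketch of the technique, not the implementation used by GCS. The function name longest_common_substring is hypothetical.

```python
from difflib import SequenceMatcher


def longest_common_substring(string1, string2):
    """Return the longest contiguous run shared by both strings ('' if none)."""
    # autojunk=False disables difflib's heuristic that ignores very frequent characters.
    matcher = SequenceMatcher(None, string1, string2, autojunk=False)
    match = matcher.find_longest_match(0, len(string1), 0, len(string2))
    return string1[match.a:match.a + match.size]


shared = longest_common_substring("the quick brown fox", "a quick brown dog")
nothing = longest_common_substring("abc", "xyz")
```

Here shared includes the surrounding spaces, because they are part of the contiguous overlap, and nothing is the empty string.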

class ClippyKindle.DataStructures.Highlight(loc, locType, date, content)

Bases: object

Data structure for storing info about a single highlight

__init__(loc, locType, date, content)

Highlight class constructor

Parameters
  • loc (tuple) – (int locStart, int locEnd)

  • locType (str) – “page” or “location” (identifies what location type this highlight uses)

  • date (datetime.datetime) – date this highlight was made

  • content (str) – book text stored in this highlight

static fromDict(d)
Returns

A new Highlight object populated with the values from a provided dict (created with toDict()).

isDuplicate(other, fuzzyMatch=True)

returns true if provided Highlight object can be considered a duplicate of this object

Parameters
  • other (Highlight) – other Highlight object to compare this object to

  • fuzzyMatch (bool) – true if we should consider Highlights with overlapping content (but not exactly the same) to be duplicates (default: True)

Returns

true or false.

Return type

(bool)

toDict()
Returns

A dict representing this object.

Return type

(dict)

class ClippyKindle.DataStructures.Note(loc, locType, date, content)

Bases: object

Data structure for storing info about a single note

__init__(loc, locType, date, content)

Note class constructor

Parameters
  • loc (int) – page or location value this note was made at

  • locType (str) – “page” or “location” (identifies what location type this note uses)

  • date (datetime.datetime) – date this note was made

  • content (str) – text contents of the note

static fromDict(d)
Returns

A new Note object populated with the values from a provided dict (created with toDict())

isDuplicate(other, fuzzyMatch=True)

returns true if provided Note object can be considered a duplicate of this object

Parameters
  • other (Note) – other Note object to compare this object to

  • fuzzyMatch (bool) – true if we should consider Notes with overlapping content (but not exactly the same) to be duplicates (default: True)

Returns

true or false

Return type

(bool)

toDict()
Returns

A dict representing this object

Return type

(dict)

ClippyKindle.DataStructures.sortDictList(arr)

helper function for sorting a list of objects (representing Highlight/Note/Bookmark objects) in order by (increasing) page/location within the book (ties broken by date recorded).

Parameters

arr (list of dict objects) – list of dicts that contain (at least) the fields “loc” and “dateStr” (these dicts should have been created by a call to toDict())

Returns

original list of dicts except now reordered

Return type

(list of dict objects)
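The ordering described for sortDictList (location first, date as tie-breaker) maps naturally onto a tuple sort key. The sketch below is illustrative only: the function name sort_dict_list, the integer "loc" values, and the date format are assumptions, and it returns a new list rather than reordering the original.

```python
from datetime import datetime

# Assumed format for the "dateStr" field; the real format is not documented here.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def sort_dict_list(arr):
    """Sort dicts by increasing 'loc', breaking ties by the parsed 'dateStr'."""
    return sorted(
        arr,
        key=lambda d: (d["loc"], datetime.strptime(d["dateStr"], DATE_FORMAT)),
    )


items = [
    {"loc": 30, "dateStr": "2021-01-02 10:00:00"},
    {"loc": 10, "dateStr": "2021-05-01 09:00:00"},
    {"loc": 30, "dateStr": "2021-01-01 10:00:00"},
]
ordered = sort_dict_list(items)
```

Parsing the date string before comparing avoids relying on the string form happening to sort chronologically.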

Module contents

class ClippyKindle.ClippyKindle

Bases: object

Does the work of parsing either a “My Clippings.txt” file or a previously exported JSON file created from a “My Clippings.txt” file, using classes from ClippyKindle.DataStructures for storage.

static parseClippings(fname, verbose=False)

parses the notes/highlights/bookmarks stored in a kindle clippings txt file (printing any errors) and returns the data as an array of dicts (each dict representing the data from one book).

Parameters
  • fname (str) – file path to txt file to parse (e.g. “My Clippings.txt”)

  • verbose (bool) – Optional; defaults to False. (TODO: use verbose param with options 0 (print nothing), 1 (print everything), and 2 (print errors only))

Returns

list of Book objects

Return type

(list of DataStructures.Book)

static parseJsonFile(fname)

parses the notes/highlights/bookmarks stored in a JSON file previously created with ClippyKindle and returns an array of Book objects.

Parameters

fname (str) – file path to json file to parse (e.g. “collection.json”)

Returns

list of Book objects

Return type

(list of DataStructures.Book)

ClippyKindle.dateToStr(dateObj)

converts a provided datetime object to a string with the desired formatting

ClippyKindle.strToDate(dateStr)

converts a provided string (of the desired formatting) to a datetime object
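dateToStr and strToDate are described as inverse conversions, which is the usual strftime/strptime pairing. A minimal sketch follows. The format string and the snake_case function names are assumptions, since the "desired formatting" is not specified in this reference.

```python
from datetime import datetime

# Assumed format; substitute whatever format the package actually uses.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def date_to_str(date_obj):
    """Format a datetime as a string using the shared format."""
    return date_obj.strftime(DATE_FORMAT)


def str_to_date(date_str):
    """Parse a string in the shared format back into a datetime."""
    return datetime.strptime(date_str, DATE_FORMAT)


stamp = datetime(2020, 7, 4, 8, 30, 0)
as_text = date_to_str(stamp)
round_tripped = str_to_date(as_text)
```

Keeping a single format constant shared by both directions is what makes the round trip lossless, at least down to the precision the format encodes (seconds in this sketch).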