
Building a database for all coins of a specific type: IT help requested


Roerbakmix


2 hours ago, Rand said:

Thank you. This is a very important point. I need to be doubly vigilant about when taking notes on the occurrence of publicly announced events becomes scraping. I do not keep any information from Biddr outside my Watch list on their account, so this would not be an issue this time.

To be clear, I was referring to legitimate means of obtaining such information, either from Biddr owners or any authorised third parties, with any charges as appropriate.

Do you say that such information cannot be (legitimately) obtained, or are you unaware that it can be obtained?

I think @Ed Snible is referring to this paragraph:

4.1. Any action that affects the website or part of it in any form is prohibited. This includes in particular:

  • the use of software or any sort of equipment that could affect the proper function of the website,
  • the use of so-called Web Scrapers, Web Robots and other software to systematically collect data from the website of biddr.com,
  • actions that overload the biddr.com infrastructure in an unacceptable way,
  • blocking, deleting, overriding or modifying any content on the biddr.com website or disturbing its proper function in any other way.

This is to protect the site and its infrastructure from anything that could affect its normal availability and functionality. Biddr is first and foremost an auction platform that needs its resources for running auctions, and not a data store for crawlers and the like.

@Rand are you missing a particular auction house on acsearch? Although not all auctions are publicly available, all data is collected and as soon as acsearch has permission to publish them, they are moved to the public domain. Some auction houses prefer not to have their auctions available in the archives, and some we haven't asked yet.

Edited by SimonW

Thank you, @SimonW. I was referring to the list of closed auctions on Biddr (auction name and date only) https://www.biddr.com/closed. I see many dealers on the list that I do not know.

The coins I am interested in (Anastasian gold; I bought a few through Biddr) are well covered by ACSearch, and it is not an issue if I miss some on Biddr that are not in ACSearch. There would be very few.

My worry is that if I do not take note of the completed auctions (name/date) that do not have relevant coins, I will not know about this in the future if they are mentioned somewhere.

I know nothing about web scrapers; I had to Google them today. I should have phrased my earlier sentence better: 'Past auctions are listed on Biddr, but manually recording them one by one takes a lot of time, and I do not know how to automate this.' There are 165 pages of completed auctions, and it would take ages to record them one by one manually. Even if I attempted it, I doubt that taking manual notes would count as using web scrapers, but I will leave Biddr alone to avoid any possibility of wrongdoing.


9 minutes ago, Rand said:

I will leave Biddr alone

By leaving it alone, I mean ignoring past auctions. I will continue to use it for what it is for. It is my favourite auction platform.

Edited by Rand

@Rand Web scraping (https://en.wikipedia.org/wiki/Web_scraping) is done by an automated software bot and can cause all kinds of problems (or not, depending on the skill of the author).  I am not a lawyer.  Biddr's term 4.1.a says no affecting website function.  Term 4.1.b says no scraping.  So I don't do either.

For most auctions, Biddr allows a PDF of the catalog to be downloaded.  A single person could spend 30 minutes per week downloading all of next week's catalogs and make them available privately to researchers.  (I unsuccessfully tried to convince the ANS to have one of their library interns do it.  I also suggested to BCD to have his librarian do it, but that library is shutting down.  I can't remember if I suggested this to @rNumis.)

Currently there are no restrictions on downloading PDF catalogs and using them for research, beyond copyright.  Please let's keep it that way!  It would break my heart if tomorrow Biddr said it was against their rules to train an AI using manually downloaded catalogs.

Edited by Ed Snible


I used to create a fair number of databases when I worked for the county hospital, using Access.  The discussions in this thread are beyond my expertise, such as it is.  I haven't worked with Access seriously since 2015, so I'm rusty to say the least.  Still, I think every now and then about creating an in-house database for the collection.

My basic rule is to keep the database simple and efficient, and not to require laborious data updates.  I've found that too many gongs and whistles when creating a database just cause headaches and weariness somewhere down the road.
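
To make the "keep it simple" rule concrete, here is a minimal sketch of a one-table log for the kind of manual notes discussed earlier in the thread (auction name and date, plus whether it held relevant coins). It uses Python's built-in sqlite3 rather than Access, and the table name, column names and sample values are all assumptions made up for illustration; entries are typed in by hand, nothing is collected from any website.

```python
import sqlite3


def open_auction_log(path=":memory:"):
    """Open (or create) a one-table log of auctions already checked by hand."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS auctions (
            house              TEXT NOT NULL,   -- auction house name (hypothetical column)
            title              TEXT NOT NULL,   -- auction name as listed
            closed_on          TEXT NOT NULL,   -- ISO date, e.g. 2024-01-15
            has_relevant_coins INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (house, title, closed_on)
        )
        """
    )
    return conn


def record_auction(conn, house, title, closed_on, has_relevant_coins=False):
    """Add one manually noted auction; re-entering the same auction is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO auctions VALUES (?, ?, ?, ?)",
        (house, title, closed_on, int(has_relevant_coins)),
    )
    conn.commit()


def already_checked(conn, house, title):
    """Return True if this auction has been noted before."""
    row = conn.execute(
        "SELECT 1 FROM auctions WHERE house = ? AND title = ?",
        (house, title),
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    log = open_auction_log()
    record_auction(log, "Example House", "E-Auction 1", "2024-01-15")
    print(already_checked(log, "Example House", "E-Auction 1"))
```

A single table with a composite primary key means there is nothing to maintain beyond adding a row per auction, and entering the same auction twice does no harm.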


3 hours ago, Ed Snible said:

  I can't remember if I suggested this to @rNumis)

@Ed Snible  ...probably this?  (from the Pegasi thread)

On 12/18/2023 at 2:06 PM, Ed Snible said:

It would be wonderful if someone was saving catalogs via download, "print to PDF", or crawling sites.  For every ancient coin auction.

This data could become valuable in the future.

I recall agreeing with this and it did prompt a review on my side of what's currently available at the different consolidators. That's kind of ongoing and is already proving to be not that straightforward.

As part of that work, I was dismayed to find (unless I'm reading it wrong) that most of the SIXBID 'collectors archive' has disappeared behind a paywall. That was a wonderful free resource. Now, any lots and images older than 6 months are going to cost $$$$ to see. They're also claiming to be offering scanned *old* catalogs for even more money, but the examples they show in their advertisements are clearly just scans from the University of Heidelberg's Catalog digitization project (freely available and online). I have to wonder what is going on. Anybody know?

Edited by rNumis

Don't worry, @Rand, it's perfectly fine to browse past auctions; that's why we keep them online. You may also copy auction information, lot images and descriptions for your personal use.

Web scrapers are programs that systematically download an entire website (or parts of it), sometimes making tens of thousands of requests in just a few minutes, and can be a huge burden on the infrastructure. If many aggressive scrapers run at the same time, the effect can be like a DDoS attack, blocking out regular users, or, if the infrastructure scales automatically, greatly increasing costs.

@Ed Snible, if you are interested in the PDF catalogs, I can send you a list of all PDF URLs. You can then download them all at once with a download manager. Using them to train AI is not forbidden at the moment, but since we are not the copyright holders, I can't promise that it will stay that way. Content creators around the world are starting to take action against AI services that make money from models trained on their content.
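
For anyone who would rather script this than use a download manager, a minimal sketch of fetching a supplied list of PDF URLs one at a time with a pause between requests might look like the following. It assumes the list was provided by the site owner, as offered above; the function names, the 5-second default delay and the idea of naming files after the last URL path segment are all assumptions for illustration, not anything Biddr specifies.

```python
import time
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def filename_for(url):
    """Use the last path segment of the URL as the local file name."""
    name = Path(urlparse(url).path).name
    return name or "catalog.pdf"


def fetch_bytes(url):
    """Download one URL and return its content."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def download_all(urls, dest_dir, delay_seconds=5.0, fetch=fetch_bytes, sleep=time.sleep):
    """Save each URL into dest_dir, one at a time, pausing between requests.

    Files that already exist are skipped, so the job can be resumed.
    Note: two URLs ending in the same file name would collide; the
    second one is skipped in this simple sketch.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = dest / filename_for(url)
        if target.exists():
            continue
        if saved:
            # Pause before every request after the first to keep the load low.
            sleep(delay_seconds)
        target.write_bytes(fetch(url))
        saved.append(target)
    return saved
```

Fetching sequentially with a fixed pause is slower than a parallel download, but it keeps the request rate far below the "tens of thousands of requests in a few minutes" pattern described above.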

